Automation of penicillin adverse drug reaction categorisation and risk stratification with machine learning natural language processing

Joshua M. Inglis, Stephen Bacchi, Alexander Troelnikov, William Smith, Sepehr Shakib

Research output: Contribution to journal › Article › peer-review

12 Citations (Scopus)


Background: The penicillin adverse drug reaction (ADR) label is common in electronic health records (EHRs). However, there is significant misclassification between allergy and intolerance within the EHR, and most patients can be delabelled after an immunologic assessment. Machine learning natural language processing may be able to assist with the categorisation and risk stratification of penicillin ADRs.

Objective: The aim of this study was to use text entered into an EHR to derive and evaluate machine learning models to classify penicillin ADRs and assess the risk of true allergy.

Methods: Machine learning natural language processing was applied to free-text penicillin ADR data extracted from a public health system EHR. The models were developed by training on a labelled dataset: ADR entries were split into training and testing datasets, which were used to develop and test a variety of machine learning models. These models were compared with categorisation by a simple algorithm using keyword search.

Results: The best performing model for the classification of penicillin ADRs as being consistent with allergy or intolerance was the artificial neural network (AUC 0.994, sensitivity 0.99, specificity 0.96). The artificial neural network also achieved the highest AUC in the classification of high or low risk of true allergy (AUC 0.988, sensitivity 0.99, specificity 0.99). All ADR labels could be classified using these machine learning models, whereas a small proportion were unclassifiable using the simple algorithm because they contained no keywords.

Conclusion: Machine learning natural language processing performed similarly to expert criteria in classifying and risk stratifying penicillin ADR labels. These models outperformed simpler algorithms in their ability to interpret free-text data contained in the EHR. The automated evaluation of penicillin ADR labels may allow real-time risk stratification to facilitate delabelling and improve the specificity of prescribing alerts.
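As a rough illustration of the keyword-search baseline and the sensitivity and specificity figures mentioned in the abstract, the following Python sketch classifies free-text entries by keyword lookup and scores the result. It is not the authors' algorithm: the keyword lists, the function names `classify_by_keyword` and `sensitivity_specificity`, the rule that allergy keywords take precedence, and the example entries are all assumptions made for illustration.

```python
import re

# Hypothetical keyword lists chosen for illustration only; the study's
# actual keywords are not given in the abstract.
ALLERGY_KEYWORDS = {"anaphylaxis", "rash", "hives", "urticaria", "angioedema"}
INTOLERANCE_KEYWORDS = {"nausea", "vomiting", "diarrhoea", "headache"}


def classify_by_keyword(entry):
    """Return 'allergy', 'intolerance', or 'unclassifiable' for one entry."""
    words = set(re.findall(r"[a-z]+", entry.lower()))
    # Assumption: an allergy keyword takes precedence when both kinds appear.
    if words & ALLERGY_KEYWORDS:
        return "allergy"
    if words & INTOLERANCE_KEYWORDS:
        return "intolerance"
    # No keyword present: a keyword search cannot assign a class.
    return "unclassifiable"


def sensitivity_specificity(y_true, y_pred, positive="allergy"):
    """Sensitivity and specificity with `positive` as the positive class.

    Assumes y_true contains at least one positive and one non-positive
    label; otherwise a denominator would be zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Made-up example entries, not data from the study.
entries = [
    "Rash and hives after amoxicillin",
    "nausea + vomiting",
    "childhood reaction, details unknown",
]
predictions = [classify_by_keyword(e) for e in entries]
```

The third entry contains none of the listed keywords, so this baseline leaves it unclassified. A trained text classifier always returns a class for any input, which matches the abstract's observation that every ADR label could be classified by the machine learning models.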

Original language: English
Article number: 104611
Number of pages: 5
Publication status: Published - Dec 2021
Externally published: Yes


  • Adverse drug reaction
  • Electronic health records
  • Machine learning
  • Natural language processing
  • Penicillin

