TY - JOUR
T1 - Estimation of the prevalence of adverse drug reactions from social media
AU - Nguyen, Thin
AU - Larsen, Mark E.
AU - O'Dea, Bridianne
AU - Phung, Dinh
AU - Venkatesh, Svetha
AU - Christensen, Helen
PY - 2017/6
Y1 - 2017/6
N2 - This work aims to estimate the degree of adverse drug reactions (ADR) for psychiatric medications from social media, including Twitter, Reddit, and LiveJournal. Advances in lightning-fast cluster computing were employed to process large-scale data, consisting of 6.4 terabytes containing 3.8 billion records from all the media. Rates of ADR were quantified using the SIDER database of drugs and side-effects, and an estimated ADR rate was based on the prevalence of discussion in the social media corpora. Agreement between these measures for a sample of ten popular psychiatric drugs was evaluated using the Pearson correlation coefficient, r, with values between 0.08 and 0.50. Word2vec, a novel neural learning framework, was utilized to improve the coverage of variants of ADR terms in the unstructured text by identifying syntactically or semantically similar terms. Improved correlation coefficients, between 0.29 and 0.59, demonstrate the capability of advanced techniques in machine learning to aid in the discovery of meaningful patterns from medical data, and social media data, at scale.
AB - This work aims to estimate the degree of adverse drug reactions (ADR) for psychiatric medications from social media, including Twitter, Reddit, and LiveJournal. Advances in lightning-fast cluster computing were employed to process large-scale data, consisting of 6.4 terabytes containing 3.8 billion records from all the media. Rates of ADR were quantified using the SIDER database of drugs and side-effects, and an estimated ADR rate was based on the prevalence of discussion in the social media corpora. Agreement between these measures for a sample of ten popular psychiatric drugs was evaluated using the Pearson correlation coefficient, r, with values between 0.08 and 0.50. Word2vec, a novel neural learning framework, was utilized to improve the coverage of variants of ADR terms in the unstructured text by identifying syntactically or semantically similar terms. Improved correlation coefficients, between 0.29 and 0.59, demonstrate the capability of advanced techniques in machine learning to aid in the discovery of meaningful patterns from medical data, and social media data, at scale.
KW - Adverse drug reactions
KW - Consumer health informatics
KW - Drug informatics
KW - Social media
KW - Word embedding
KW - Word representation
UR - http://www.scopus.com/inward/record.url?scp=85017169552&partnerID=8YFLogxK
U2 - 10.1016/j.ijmedinf.2017.03.013
DO - 10.1016/j.ijmedinf.2017.03.013
M3 - Article
C2 - 28495341
AN - SCOPUS:85017169552
SN - 1386-5056
VL - 102
SP - 130
EP - 137
JO - International Journal of Medical Informatics
JF - International Journal of Medical Informatics
ER -