Prediction of population health indices from social media using kernel-based textual and temporal features

Thin Nguyen, Duc Thanh Nguyen, Mark E. Larsen, Bridianne O'Dea, John Yearwood, Dinh Phung, Svetha Venkatesh, Helen Christensen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Abstract

Since 1984, the US has annually conducted the Behavioral Risk Factor Surveillance System (BRFSS) surveys to capture health behaviors, such as drinking or smoking, and health outcomes, including mental, physical, and general health, of the population. Although such information at the population level (for example, for US counties) is important for local governments to identify local needs, traditional datasets may take years to collate and to become publicly available. Geocoded social media data can provide an alternative reflection of local health trends. In this work, we propose novel textual and temporal features to predict both the percentage of adults in a county reporting “insufficient sleep”, a health behavior, and their health outcomes. The proposed textual features are defined at a mid level and can be applied on top of various low-level textual features. They are computed via kernel functions on the underlying features and encode the relationships between individual underlying features over a population. To further improve the prediction of the health indices, the textual features are augmented with temporal information. We evaluated the proposed features and compared them with existing features using a dataset collected from the BRFSS. Experimental results show that the combination of kernel-based textual features and temporal information predicts both the health behavior (best performance: ρ = 0.82) and the health outcomes (best performance: ρ = 0.78) well, demonstrating the capability of social media data to predict population health indices. The results also show that the proposed features achieved higher correlation coefficients than the existing ones, increasing the correlation coefficient by up to 0.16, which suggests the potential of the approach for a wide spectrum of population-level data analytics applications.
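The abstract describes mid-level textual features obtained by applying kernel functions to low-level features, augmented with temporal information and evaluated by correlation (ρ). The sketch below is one possible reading of that idea, not the authors' implementation. It assumes that each county is represented by a users × low-level-features matrix, that a Gaussian (RBF) kernel is applied between pairs of feature columns, that temporal information is an hour-of-day posting histogram, and that ρ is Spearman's rank correlation. All function names and the `gamma` parameter are hypothetical.

```python
import numpy as np


def kernel_features(X, gamma=1.0):
    """Mid-level features for one county (hypothetical sketch).

    X has shape (n_users, d): column j holds low-level textual feature j
    (e.g. the frequency of one word category) for every user in the county.
    Each output value is a Gaussian (RBF) kernel between two feature columns,
    so it encodes how closely those two features track each other over the
    county's population.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise differences between columns: diff[u, i, j] = X[u, i] - X[u, j].
    diff = X[:, :, None] - X[:, None, :]
    # Mean (not sum) of squared differences, so counties with different
    # numbers of users yield kernel values on a comparable scale.
    K = np.exp(-gamma * np.mean(diff ** 2, axis=0))
    # K is symmetric with ones on the diagonal: keep each feature pair once.
    rows, cols = np.triu_indices(X.shape[1], k=1)
    return K[rows, cols]


def temporal_features(post_hours):
    """Fraction of a county's posts made in each hour of the day (0-23)."""
    counts = np.bincount(np.asarray(post_hours, dtype=int), minlength=24)
    return counts / counts.sum()


def county_features(X, post_hours, gamma=1.0):
    """Kernel-based textual features augmented with temporal features."""
    return np.concatenate([kernel_features(X, gamma), temporal_features(post_hours)])


def spearman_rho(a, b):
    """Spearman rank correlation; this simple version assumes no ties."""
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    return float(np.corrcoef(rank_a, rank_b)[0, 1])


# Toy county (made-up numbers): 2 users x 3 low-level features, 3 posts.
X = [[1.0, 1.0, 0.0],
     [2.0, 2.0, 0.0]]
f = county_features(X, post_hours=[0, 0, 23])
print(f.shape)  # (27,): 3 feature pairs + 24 hourly bins
```

In a full pipeline, the per-county vectors would be fed to some regression model, and its predicted percentages would be compared with the BRFSS values using a rank correlation such as `spearman_rho`. The abstract does not specify the regressor, so none is assumed here. For real data with tied values, `scipy.stats.spearmanr` handles ties by averaging ranks.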

Original language: English
Title of host publication: 26th International World Wide Web Conference 2017, WWW 2017 Companion
Publisher: International World Wide Web Conferences Steering Committee
Pages: 99-107
Number of pages: 9
ISBN (Electronic): 9781450349147
Publication status: Published - 2017
Externally published: Yes
Event: 26th International World Wide Web Conference, WWW 2017 Companion - Perth, Australia
Duration: 3 Apr 2017 to 7 Apr 2017

Publication series

Name: 26th International World Wide Web Conference 2017, WWW 2017 Companion

Conference

Conference: 26th International World Wide Web Conference, WWW 2017 Companion
Country/Territory: Australia
City: Perth
Period: 3/04/17 to 7/04/17

Keywords

  • Cognitive computing
  • Feature engineering
  • Geo-referenced tweets
  • Kernel-based features
  • Online texts
  • Population health indices
  • Prediction
  • Temporal information
  • Textual features

