Automated topic analysis for restricted scope health corpora: methodology and comparison with human performance

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

This paper addresses the problem of identifying topics that describe the information content of restricted-size sets of scientific papers extracted from publication databases. Conventional computational approaches, based on natural language processing with unsupervised classification algorithms, typically require large numbers of papers to achieve adequate training. The approach presented here instead combines a simpler word-frequency-based method with context modeling. An example is provided of its application to corpora drawn from a curated literature search site for COVID-19 research publications. The results are compared with those of a conventional human-based approach and show partial overlap in the topics identified. The findings suggest that computational approaches may provide an alternative to human expert topic analysis, provided adequate contextual models are available.
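
As a rough illustration of the general idea named in the abstract (word frequencies combined with a context model), the following minimal sketch counts terms in a small corpus and maps them to topics. It is not the authors' method: the stopword list, the `CONTEXT_MODEL` term-to-topic mapping, the function names, and the sample texts are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical stopword list and context model; neither comes from the paper.
STOPWORDS = {"the", "of", "and", "in", "a", "to", "for", "with", "on", "is"}
CONTEXT_MODEL = {
    "vaccine": "prevention",
    "mask": "prevention",
    "ventilator": "treatment",
    "antiviral": "treatment",
}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(documents):
    """Count non-stopword terms across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(t for t in tokenize(doc) if t not in STOPWORDS)
    return counts


def topic_scores(documents, context_model=CONTEXT_MODEL):
    """Sum term counts per topic, using the context model as a term-to-topic map."""
    scores = Counter()
    for term, count in term_frequencies(documents).items():
        topic = context_model.get(term)
        if topic is not None:
            scores[topic] += count
    return scores


# Made-up sample titles, not data from the paper.
papers = [
    "Vaccine trials and mask use in the community",
    "Antiviral therapy with ventilator support",
    "Mask mandates and vaccine uptake",
]
print(topic_scores(papers).most_common())
```

Because the scores depend only on raw counts and a hand-built mapping, a sketch like this needs no training data, which is the property the abstract contrasts with unsupervised classification; its usefulness rests entirely on how well the context model covers the corpus vocabulary.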

Original language: English
Title of host publication: Proceedings of the 54th Annual Hawaii International Conference on System Sciences, HICSS 2021
Subtitle of host publication: January 4-8, 2021
Editors: Tung X. Bui
Publisher: University of Hawai'i at Manoa
Pages: 775-781
Number of pages: 7
ISBN (Electronic): 9780998133140
DOIs
Publication status: Published - 2021
Event: 54th Annual Hawaii International Conference on System Sciences - Virtual, Online
Duration: 4 Jan 2021 to 8 Jan 2021

Publication series

Name: Proceedings of the Annual Hawaii International Conference on System Sciences
Publisher: University of Hawai'i at Manoa
ISSN (Print): 1530-1605
ISSN (Electronic): 2572-6862

Conference

Conference: 54th Annual Hawaii International Conference on System Sciences
Abbreviated title: HICSS 2021
City: Virtual, Online
Period: 4/01/21 to 8/01/21
Other: The Hawaii International Conference on System Sciences, in its 54th year, is one of the longstanding scientific conferences and is highly ranked among information systems conferences. Diverse disciplines unified by a focus on information technologies are woven together in a matrix structure of tracks and themes. By attending HICSS you are not only reaching the audience of your track and mini-track; you also have the opportunity to learn about what is happening in related fields and meet leaders in those fields.

Keywords

  • Text analytics
  • Topic analysis
  • Natural language processing
  • Keyword extraction
  • Term frequency
