Assessing the System-Instruction Vulnerabilities of Large Language Models to Malicious Conversion Into Health Disinformation Chatbots

Natansh D. Modi, Bradley D. Menz, Abdulhalim A. Awaty, Cyril A. Alex, Jessica M. Logan, Ross A. McKinnon, Andrew Rowland, Stephen Bacchi, Kacper Gradon, Michael J. Sorich, Ashley M. Hopkins

Research output: Contribution to journal › Review article › peer-review

Abstract

Large language models (LLMs) offer substantial promise for improving health care; however, some risks warrant evaluation and discussion. This study assessed the effectiveness of safeguards in foundational LLMs against malicious system-level instruction to operate as health disinformation chatbots. Five foundational LLMs (OpenAI's GPT-4o, Google's Gemini 1.5 Pro, Anthropic's Claude 3.5 Sonnet, Meta's Llama 3.2-90B Vision, and xAI's Grok Beta) were evaluated via their application programming interfaces (APIs). Each API received system-level instructions to produce incorrect responses to health queries, delivered in a formal, authoritative, convincing, and scientific tone. Ten health questions were posed to each customized chatbot in duplicate. Exploratory analyses assessed the feasibility of creating a customized generative pretrained transformer (GPT) within the OpenAI GPT Store and searched the store to identify whether any publicly accessible GPTs appeared to respond with disinformation. Of the 100 responses to health queries posed across the 5 customized LLM API chatbots, 88 (88%) contained health disinformation. Four of the 5 chatbots (GPT-4o, Gemini 1.5 Pro, Llama 3.2-90B Vision, and Grok Beta) generated disinformation in 100% (20 of 20) of their responses, whereas Claude 3.5 Sonnet did so in 40% (8 of 20). The disinformation included claims that vaccines cause autism, that HIV is airborne, that particular diets cure cancer, that sunscreen is harmful, genetically modified organism conspiracies, myths about attention-deficit/hyperactivity disorder and depression, that garlic can replace antibiotics, and that 5G causes infertility. Exploratory analyses further showed that customized GPTs within the OpenAI GPT Store could currently be instructed to generate similar disinformation. Overall, LLM APIs and the OpenAI GPT Store were shown to be vulnerable to malicious system-level instructions that covertly create health disinformation chatbots. These findings highlight the urgent need for robust output screening safeguards to ensure public health safety in an era of rapidly evolving technologies.

Original language: English
Pages (from-to): 1172-1180
Number of pages: 9
Journal: Annals of Internal Medicine
Volume: 178
Issue number: 8
DOIs
Publication status: Published - Aug 2025

Keywords

  • large language models
  • artificial intelligence
  • chatbots
  • safeguards
  • malicious instruction
  • health disinformation
