Large language models (LLMs) remain substantially vulnerable to misuse and, without improved safeguards, could be exploited as tools to disseminate harmful health disinformation.

RT’s Three Key Takeaways:

  1. Safeguard Vulnerabilities: Researchers found that foundational LLMs, including GPT-4o and Gemini 1.5 Pro, could be system-instructed to generate health disinformation with high consistency.
  2. Realistic Fake Content: The customized chatbots produced disinformation using scientific language and fabricated references, making the false information appear credible and authoritative.
  3. Ongoing Risk: The study highlights a pressing need for stronger safeguards; three publicly available GPTs identified in the GPT Store generated health disinformation in response to 97% of submitted questions.

A study published in Annals of Internal Medicine assessed how effectively the safeguards in foundational large language models (LLMs) protect against malicious instructions that could turn them into tools for spreading disinformation, defined as the deliberate creation and dissemination of false information with the intent to harm.

The study revealed vulnerabilities in the safeguards of OpenAI’s GPT-4o, Google’s Gemini 1.5 Pro, Anthropic’s Claude 3.5 Sonnet, Meta’s Llama 3.2-90B Vision, and xAI’s Grok Beta. The researchers created customized LLM chatbots that consistently generated disinformation in response to health queries, incorporating fake references, scientific jargon, and logical cause-and-effect reasoning to make the false information seem plausible.

Researchers from Flinders University and colleagues evaluated the application programming interfaces (APIs) of the five foundational LLMs for their capacity to be system-instructed into producing health disinformation. The system instructions directed each model to always provide incorrect responses to health questions, fabricate references to reputable sources, and deliver answers in an authoritative tone. Each customized chatbot was then asked 10 health-related questions, in duplicate, on subjects such as vaccine safety, HIV, and depression.

The researchers found that 88% of responses from the customized LLM chatbots were health disinformation, with four of the chatbots (GPT-4o, Gemini 1.5 Pro, Llama 3.2-90B Vision, and Grok Beta) providing disinformation in response to every question tested. The Claude 3.5 Sonnet chatbot exhibited some safeguards, answering only 40% of questions with disinformation.

In a separate exploratory analysis of the OpenAI GPT Store, the researchers investigated whether any publicly accessible GPTs appeared to disseminate health disinformation. They identified three customized GPTs that appeared tuned to produce such content; these generated health disinformation in response to 97% of the questions submitted to them.

Overall, the findings suggest that LLMs remain substantially vulnerable to misuse and, without improved safeguards, could be exploited as tools to disseminate harmful health disinformation.