Can AI Chatbots Be Reliable Sources for Health Advice?

For over a year, Abi from Manchester has relied on ChatGPT, a prominent AI chatbot, to help with her health concerns. The convenience of instant access to information has made AI a tempting tool, especially when healthcare professionals are hard to reach. With some AI systems performing well on medical assessments, the question remains: can we place our trust in chatbots like ChatGPT, Gemini, and Grok? Or does their advice carry greater risks than a traditional internet search? Abi, who experiences health anxiety, has found that chatbots offer more personalized guidance than generic search results, often narrowing her queries down to more specific concerns.

Abi’s Mixed Experience

Abi recalls a time when she suspected a urinary tract infection. The chatbot analyzed her symptoms and suggested visiting a pharmacist, which led to a correct diagnosis and treatment. “It felt like a collaborative approach,” she explains. “It was almost like talking to a doctor.” Yet her trust was tested in January, when she injured her back while hiking. The AI wrongly suggested she may have punctured an organ, prompting her to rush to A&E. After three hours in the emergency department, Abi realized the diagnosis had been overly alarmist, highlighting the chatbot’s capacity for misjudgment.

“ChatGPT told me I’d punctured an organ and needed A&E immediately,” Abi says. “But the pain eased, and I knew I wasn’t in critical condition.”

The growing use of AI in health contexts raises concerns among experts. Prof Sir Chris Whitty, England’s Chief Medical Officer, warned earlier this year that while people are increasingly turning to these tools, the advice they receive is often “both confident and wrong.” Researchers at the University of Oxford explored this issue by testing AI’s ability to interpret health scenarios. In one experiment, doctors crafted detailed cases ranging from minor ailments to emergencies. When presented with complete information, AI achieved 95% accuracy, impressing the team. However, results dropped significantly when 1,300 participants engaged in conversational interactions with the chatbots.

Human-AI Communication Challenges

Prof Adam Mahdi, part of the Oxford study, notes that conversational exchanges complicate AI’s ability to diagnose accurately. “When people talk, they omit details, change their minds, or get distracted,” he explains. This dynamic was evident in a scenario involving a subarachnoid haemorrhage, a severe brain bleed requiring urgent care. Subtle variations in how users described their symptoms to ChatGPT produced vastly different recommendations, including the incorrect advice to avoid bed rest. In these conversational tests, accuracy plummeted to 35%, with two-thirds of cases yielding wrong diagnoses or care suggestions.

“The way people gradually convey information affects the outcome,” Mahdi says. “It’s not just about the data, but how it’s interpreted in real time.”

Dr Margaret McCartney, a GP in Glasgow, highlights a key distinction between AI chatbots and search engines. “Chatbots create the illusion of a personal connection, whereas Google searches let you sift through multiple sources to assess reliability,” she points out. This sense of personalization may shape how users perceive the advice, making them more susceptible to specific recommendations. A separate analysis by the Lundquist Institute for Biomedical Innovation further underscores the need for caution, emphasizing that chatbots can influence health decisions in ways traditional searches do not.