Oxford Study Questions AI’s Advantage in Medical Advice

Randomised study finds AI offers no advantage over standard web searches for health decisions

Tue Feb 10 2026

Key Points:

  • AI did not outperform internet searches in helping patients make medical decisions.
  • Human use of AI led to frequent misinterpretation of symptoms and advice.
  • Researchers warned of safety risks as reliance on AI health advice grows.

LONDON: An Oxford University study has found that artificial intelligence systems are no more effective than traditional methods such as internet searches when it comes to helping patients make decisions about their health.

The research, published in the journal Nature Medicine, assessed whether AI tools actually improve patient triage — the process of deciding what medical action to take — amid growing public reliance on chatbots for medical advice.

The authors said the findings were significant as increasing numbers of people are turning to AI for health guidance, despite limited evidence that such tools offer safer or better outcomes than existing resources.

Researchers from the University of Oxford’s Internet Institute worked with medical doctors to design 10 clinical scenarios, ranging from mild illnesses such as the common cold to severe, life-threatening conditions, including haemorrhages that cause bleeding on the brain, according to Reuters.

In initial tests conducted without human participants, three large language models — OpenAI’s GPT-4o, Meta’s Llama 3 and Cohere’s Command R+ — correctly identified medical conditions in 94.9 percent of cases.

However, they selected the correct course of action, such as calling an ambulance or visiting a doctor, in only 56.3 percent of cases on average. The companies involved did not respond to requests for comment.

The research team then recruited 1,298 participants across Britain and divided them into groups. Participants were asked to investigate symptoms and decide what to do next using either AI tools, their own experience, a standard internet search, or the UK’s National Health Service website.

When human participants interacted with the tools, relevant medical conditions were identified in fewer than 34.5 percent of cases, and the correct course of action was chosen in fewer than 44.2 percent of cases, a result no better than outcomes from participants using traditional information sources.

Adam Mahdi, associate professor at Oxford and co-author of the study, said the findings highlighted a “huge gap” between AI’s technical potential and its real-world use by the public.

“The knowledge may be in those bots; however, this knowledge doesn’t always translate when interacting with humans,” Mahdi said, adding that further research was needed to understand why this gap exists.

The researchers analysed around 30 interactions in detail and found that patients often provided incomplete or inaccurate information. At the same time, AI systems were sometimes found to generate misleading or incorrect responses.

In one case, a patient describing symptoms of a subarachnoid haemorrhage — a potentially fatal condition involving bleeding on the brain — was correctly advised by AI to seek immediate hospital care after reporting a stiff neck, light sensitivity and the “worst headache ever.”

However, another patient describing similar symptoms but referring to a “terrible” headache was advised to lie down in a dark room instead.

The research team plans to conduct similar studies in different countries and languages, and over longer periods, to determine whether these factors affect AI performance in healthcare settings.

The study was supported by the data company Prolific, the German non-profit Dieter Schwarz Stiftung, and the governments of the United Kingdom and the United States.
