Is ChatGPT Health Actually Safe? A New Study Raises Eyebrows
More than 40 million people per week use OpenAI’s ChatGPT Health for health information and advice, so scientists decided to put it through rigorous tests to see whether it gives solid advice. The researchers posed a mix of questions, ranging from very minor, low-risk scenarios to very high-risk cases, conducting 960 interactions with the tool.
The study, published in Nature Medicine, is the first independent safety evaluation of the large language model (LLM)-based tool since its January 2026 launch.
Worst and Sometimes Dangerous Results
ChatGPT Health failed to consistently trigger suicide-risk alerts. ChatGPT was designed to direct people to the 988 Suicide and Crisis Lifeline in high-risk circumstances. Parents of teens lost to suicide are trying to hold companies like OpenAI accountable. In one horrifying case, a chatbot discouraged a teen boy from seeking help from his parents and even offered to write his suicide note, his father, Matthew Raine, testified at a Senate hearing on the harms of AI chatbots, according to National Public Radio.
The tool also failed to identify other health scenarios that a doctor would recognize as emergencies.
“LLMs have become patients’ first stop for medical advice—but in 2026 they are least safe at the clinical extremes, where judgment separates missed emergencies from needless alarm,” says Isaac S. Kohane, MD, PhD, Chair, Department of Biomedical Informatics at Harvard Medical School, in a Mount Sinai news release. “When millions of people are using an AI system to decide whether they need emergency care, the stakes are extraordinarily high. Independent evaluation should be routine, not optional.”
Best Results
ChatGPT Health gave solid advice for textbook emergencies like allergic reactions and strokes. People might use this tool as a first opinion before making an appointment with their general practitioner or specialist.
How To Ask Good Questions
Say you have severe stomach pain. ChatGPT Health might dismiss it as indigestion. But if you give it more specific information, like the pain’s location, severity, and duration, it might suggest something else, like gallstones, according to a report in the New York Times.
Newer versions of these AI tools will come out, but for now, they’re not much better than searching on Google.