Here is something that happened to my friend Karen last month. She woke up at 3 AM with chest tightness, shortness of breath, and tingling in her left arm. Instead of calling 911, she opened ChatGPT Health, described her symptoms, and asked what she should do.
The AI told her it was probably anxiety. Suggested breathing exercises. Recommended she schedule a doctor's appointment "within the next few days."
Karen is fine. Her husband heard her wheezing, ignored the chatbot's advice, and drove her to the ER. It was a mild cardiac event. The ER doctor told her if she'd waited the "few days" ChatGPT suggested, she might not have been fine.
I bring this up because a new study just proved Karen's experience wasn't a fluke — it's the norm.
The Nature Medicine Study: 51.6% Failure Rate on Emergencies
In February 2026, researchers led by Dr. Ashwin Ramaswamy at the Icahn School of Medicine at Mount Sinai published the first independent safety evaluation of ChatGPT Health in the journal Nature Medicine — one of the most prestigious medical journals in the world.
The methodology was rigorous. The team created 60 realistic patient scenarios covering everything from mild illnesses to genuine emergencies. Three independent doctors reviewed each case and agreed on the appropriate level of care. Then they ran each scenario through ChatGPT Health under different conditions — changing gender, adding test results, including comments from family members — generating nearly 1,000 total responses.
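To make the scale of that methodology concrete, here's a minimal sketch (my own illustration in Python, with a hypothetical split of perturbations, not the authors' actual protocol) of how 60 base scenarios fan out into roughly a thousand model responses once you vary a few details and repeat runs:

```python
# Illustrative only: a hypothetical perturbation grid, not the study's real protocol.
from itertools import product

base_scenarios = [f"scenario_{i:02d}" for i in range(60)]    # 60 patient vignettes
genders = ["female", "male"]                                 # varied detail
added_details = ["none", "lab_results",                      # varied details
                 "family_comment", "friend_says_its_nothing"]
repeats = range(2)                                           # repeated runs per variant

runs = list(product(base_scenarios, genders, added_details, repeats))
print(len(runs))  # 60 * 2 * 4 * 2 = 960, i.e. "nearly 1,000" responses
```

The exact perturbation categories above are my assumption; the paper's examples (gender, test results, family comments) are the only ones described here.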
The headline finding: In 51.6% of cases where someone needed to go to the emergency room immediately, ChatGPT Health told them to stay home or book a routine appointment.
Read that number again. More than half the time, when someone was having a real emergency, the AI said it wasn't urgent.
Where It Fails — And Where It Doesn't
To be fair (and I'm trying to be), ChatGPT Health performed well on textbook emergencies. Stroke symptoms? It correctly flagged those. Severe allergic reactions? Got those right too. The model clearly knows the classic presentations.
But medicine isn't textbooks. The study found critical failures in atypical presentations:
The Asthma Case
In one scenario, a patient presented with asthma symptoms that included early warning signs of respiratory failure. ChatGPT Health identified the warning signs — it literally noted them in its response — and then advised the patient to wait and monitor. Dr. Ramaswamy described this as "identifying the fire alarm and then suggesting you keep cooking."
The Diabetic Crisis
In another case involving diabetic ketoacidosis (DKA) — a condition that can kill you within hours if untreated — the platform recommended home monitoring. For context, DKA has a mortality rate of 1-5% even with hospital treatment. Without it, you're rolling dice with your life.
The Suicide Ideation Failure
This one genuinely disturbed me. The researchers tested a scenario with a 27-year-old patient who mentioned thinking about "taking a lot of pills." ChatGPT Health, in multiple runs, failed to escalate this to an emergency recommendation. Dr. Ramaswamy said this was the finding that concerned him most, and I don't disagree.
The Over-Triage Problem
And here's the other side of the coin, which makes this even more confusing: in 64.8% of cases where no urgent care was needed at all, ChatGPT Health told the person to seek immediate medical attention. So the system simultaneously under-reacts to real emergencies and over-reacts to non-emergencies.
My friend Derek — an ER nurse in Philadelphia who I've quoted on this site before — sees the downstream impact of this. "We already have a problem with people coming to the ER for things that could wait for their primary care doctor," he told me. "If an AI is sending every person with a headache to emergency, that's more volume in our waiting room, longer wait times for people who are actually dying. And the people who are actually dying might be sitting at home because the same AI told them they were fine."
It's a worst-of-both-worlds scenario. Over-triage wastes resources. Under-triage kills people.
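For readers who want those two headline numbers pinned down, here's a minimal sketch of how under-triage and over-triage rates are typically computed from physician-labeled cases. The toy data is invented for illustration and is not drawn from the study:

```python
# Minimal sketch: how under-triage and over-triage rates fall out of labeled cases.
# The case list below is made up for illustration, not the study's data.
cases = [
    # (physician_consensus, model_advice)
    ("emergency", "stay_home"),
    ("emergency", "er_now"),
    ("routine",   "er_now"),
    ("routine",   "routine"),
]

emergencies = [c for c in cases if c[0] == "emergency"]
non_urgent  = [c for c in cases if c[0] == "routine"]

# Under-triage: real emergencies the model failed to send to the ER.
under_triage = sum(advice != "er_now" for _, advice in emergencies) / len(emergencies)
# Over-triage: non-urgent cases the model escalated to the ER.
over_triage  = sum(advice == "er_now" for _, advice in non_urgent) / len(non_urgent)

print(f"under-triage rate: {under_triage:.1%}")  # study reported 51.6%
print(f"over-triage rate:  {over_triage:.1%}")   # study reported 64.8%
```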
The "Friend Said It's Nothing" Effect
One of the study's most alarming findings is buried in the methodology section. When the researchers added a detail in which the "patient" mentioned that a friend had suggested their symptoms were nothing serious, ChatGPT Health was nearly 12 times more likely to downplay those symptoms.
Think about that. An AI system making healthcare recommendations is being swayed by what a fictional friend said. That's not how triage works. A doctor doesn't hear "my buddy Dave thinks it's nothing" and suddenly decide your chest pain isn't worth investigating. But ChatGPT Health apparently does.
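To make that "nearly 12 times" figure tangible, here's a back-of-the-envelope example of how such a risk ratio falls out of the counts. The numbers below are invented for illustration; the paper's exact counts aren't reproduced here:

```python
# Invented counts, for illustration only -- not the study's data.
downplayed_with_friend    = 36   # downplayed responses when a friend dismisses the symptoms
total_with_friend         = 80
downplayed_without_friend = 3    # downplayed responses in the unmodified scenario
total_without_friend      = 80

risk_with    = downplayed_with_friend / total_with_friend        # 0.45
risk_without = downplayed_without_friend / total_without_friend  # 0.0375
print(risk_with / risk_without)  # 12.0 -- roughly a 12x relative risk
```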
Alex Ruani, a doctoral researcher at University College London who studies health misinformation, called this "unbelievably dangerous." She noted: "In one of the simulations, more than eight times out of ten — 84% — the platform sent a suffocating woman to a future appointment she would not live to see."
OpenAI's Response — And Why It's Not Enough
An OpenAI spokesperson told The Guardian that the study "did not reflect how people typically use ChatGPT Health in real life" and that the model "is continuously updated and refined."
I have two problems with this. First, 40 million people ask ChatGPT health-related questions every day, according to Axios. The idea that none of them are describing emergencies in ways similar to the study's scenarios is not credible. Second, "continuously updated and refined" is not a safety guarantee — it's a PR statement. Show me the safety benchmarks. Show me the independent audits. Show me the recall rate on genuine emergencies improving over time with actual data.
What This Means for You (Practical Guidance)
1. Never Use AI as Your Primary Emergency Triage
If you are experiencing symptoms that could be a medical emergency — chest pain, difficulty breathing, sudden severe headache, signs of stroke (face drooping, arm weakness, speech difficulty), or suicidal thoughts — call 911 or go to the ER immediately. Not ChatGPT. Not Google. Not WebMD. Call 911.
2. Understand What AI Health Tools Are Good For
AI chatbots can be useful for non-urgent health information: understanding a diagnosis you've already received, learning about medication side effects, or preparing questions for your doctor. They are not replacements for clinical judgment, especially in time-sensitive situations.
3. If You Use These Tools, Always Escalate Uncertainty
If an AI health tool says "this is probably nothing but see a doctor if it gets worse" and you feel genuinely unwell, skip the "probably nothing" part. Go see someone. A $200 urgent care visit that turns out to be unnecessary is infinitely better than a $0 chatbot consultation that misses a cardiac event.
4. Talk to Your Elderly Relatives About This
People over 60 are both the most likely to face genuine medical emergencies and the demographic increasingly using AI tools for health guidance. If your parents or grandparents are using ChatGPT Health, have a conversation. Make sure they know that for anything serious, the answer is always a real doctor or 911 — not an AI.
The Bigger Regulatory Question
The FDA regulates medical devices. It regulates diagnostic software. It does not currently regulate general-purpose AI chatbots that provide health recommendations to 40 million people a day. This is a gap that, as Ruani put it, "could feasibly lead to unnecessary harm and death."
The WHO's 2024 guidance on AI in healthcare calls for "independent evaluation and public transparency of AI systems used in health settings." The CDC has published no specific guidance on consumer AI health tools. We are, essentially, running a massive uncontrolled experiment on the general public.
I don't think AI health tools should be banned. I think they should be held to the same safety standards as any other product that can influence life-or-death medical decisions. Right now, they're not. And the Nature Medicine data shows exactly what happens when they're not.
Karen was lucky. Not everyone will be.
Disclaimer: This article is for informational purposes only and does not constitute medical advice. Always consult a qualified healthcare professional for medical concerns. In a medical emergency, call 911 (US), 112 (EU), or your local emergency number immediately. Sources referenced include Nature Medicine, the World Health Organization, the FDA, and the CDC. The National Suicide Prevention Lifeline is available 24/7 at 988 (call or text).