Ask Dr. Maia · Issue 2 Part 2 · May 18, 2026


ChatGPT Health, Claude, and Copilot: the verdict (Part 2)

Part 1 covered what the Health button actually is and whether it works. Today: who is most affected, where regulators land, and how to use these tools without getting hurt.


Where Part 1 left off

Last week, Part 1 traced what changed when five major technology companies launched dedicated AI health products in the first quarter of 2026. The short version: less than the branding implies. All five run on the same general-purpose large language models that were already on your phone. None has published a prospective clinical safety trial. None has FDA clearance for any clinical indication. A Nature Medicine stress test found that ChatGPT Health undertriaged 52% of gold-standard emergency conditions in structured testing, and that its crisis-intervention messaging activated in a pattern opposite to the clinically correct one. General accuracy sits at around 70% on routine questions: meaningful, but insufficient for high-stakes clinical decisions.

That is the diagnosis. Part 2 is the prescription: who is reaching for these tools, where regulators stand, and what to actually do.


The access dimension

The KFF Tracking Poll (February-March 2026, n=1,343 nationally representative U.S. adults) documents who is actually reaching for these tools:

  • Uninsured adults use AI for mental health at more than twice the rate of insured adults (30% vs. 14%).
  • Among adults ages 18-29 who used AI for health advice, 38% cited lack of access to a regular health provider as a major reason; 29% cited inability to afford a provider.
  • Among adults with annual household incomes below $40,000, one in three cited not being able to afford a health care provider as a major reason for using AI.

A Pew Research Center survey (April 2026) found that uninsured Americans use AI chatbots for health information more often than insured Americans, even after accounting for age, income, and other factors. Eleven percent of uninsured respondents used AI for health information often or extremely often, compared with 7% of those with insurance.

Most AI tools were not tested on patients who resemble the populations now reaching for them most. Consider the range of structural variables: insurance status, geography (rural or urban zip code), age, language preference, income, health conditions, and genetic and family history. Given the right combination of those factors, anyone can be underrepresented in training data, and the people now relying most heavily on consumer health AI are often underrepresented across several dimensions at once.

The training data representation problem is concrete, not theoretical. A June 2025 npj Digital Medicine study from Cedars-Sinai tested four leading LLMs on identical psychiatric scenarios, varying only the stated race of the patient. Several models offered different treatment recommendations depending on the patient's race. Two omitted ADHD medication recommendations when the patient was identified as Black. One suggested guardianship in a depression case, but only when the patient was identified as African American. A May 2025 Stanford study found that LLMs perform substantially worse for speakers of less-resourced languages.

The “AI health arms race” paper in the Journal of Multidisciplinary Healthcare (Ahmed and Othman, 2026) puts a structural frame around this: the design infrastructure of all five major health AI platforms assumes broadband connectivity, premium subscriptions, connected wearables, and English-language proficiency. The KFF report confirms the subscription picture directly. Perplexity Health, Claude for Healthcare’s personal health integrations, and Amazon Health AI all require paid subscriptions. ChatGPT Health is available on free tiers. Copilot Health is currently free, though Microsoft has signaled it will move to a paid model. The tools offering the most personalized features through direct record integration are increasingly behind a paywall, which puts them furthest from the population most reliant on them.

The KFF access brief (April 30, 2026) summarizes the design problem: AI can compound access gaps when the underlying training data is not representative of all patients, and when deployment design assumes infrastructure that lower-income or rural populations do not have.


The regulatory reality

The regulatory landscape for general LLMs used as consumer health tools is, in a word, incomplete.

FDA. No general-purpose LLM has been cleared or approved by the FDA for clinical decision-making, triage, diagnosis, or treatment recommendations. The FDA’s 2026 Clinical Decision Support Software guidance lays out a four-criteria test for when a software function is not a medical device and therefore falls outside FDA oversight. The function must (1) not analyze medical images or complex signals, (2) be based on clinically accepted data, (3) present its recommendations for independent review by a health care professional (HCP), and (4) be intended to support rather than replace HCP judgment. Consumer-facing general LLMs that give triage or symptom advice directly to patients, with no clinician in the loop, do not clearly meet criteria 3 and 4. The same guidance makes transparency expectations more explicit for AI-driven tools, calling for plain-language descriptions of training data representativeness and clinical validation results. None of the five consumer health AI platforms has published documentation that would meet those expectations.

The FDA has also drawn a clear line on accountability. A 2026 warning letter to a drug manufacturer, concerning inappropriate reliance on AI in manufacturing (GMP) documentation, made clear that “AI did not tell us” is not a defensible compliance posture. The principle carries over to clinical settings: someone must own the evidence, the limitations, and the harm response.

FTC. The FTC’s Healthcare Task Force is signaling coordinated enforcement targeting health AI marketing claims and unsubstantiated clinical efficacy claims. The December 2025 GLP-1 telehealth settlement established the template: AI-enabled health products with deceptive pricing or unsubstantiated clinical claims are an enforcement target. Consumer health AI platforms making clinical-sounding statements without peer-reviewed evidence face growing FTC scrutiny.

EU AI Act. The EU AI Act’s high-risk provisions become enforceable on August 2, 2026 for most high-risk systems, namely those listed in Annex III, which include emergency healthcare patient triage systems. Health AI that qualifies as a regulated medical device, which can include clinical decision support and symptom-assessment tools, is classed as high-risk through the Act’s product-safety route, with those obligations applying from August 2, 2027. Requirements include conformity assessments, human oversight documentation, technical documentation of training data representativeness, and incident reporting systems. U.S.-headquartered platforms with European users are in scope regardless of FDA status. As of May 2026, none of the five major consumer health AI platforms has publicly confirmed EU AI Act compliance. Penalties for noncompliance can reach €15 million or 3% of global annual turnover, whichever is higher. Less than three months remain before the August 2026 deadline.

OpenAI’s blueprint. The most-watched regulatory proposal of the week comes from a vendor: OpenAI’s “Keeping Patients First” blueprint, published May 6, 2026, which proposes federally aligned regulatory sandboxes operated by states, health systems, and clinicians under common national guardrails, without displacing FDA medical-device authority. Former ONC chief David Blumenthal told STAT News the proposals are “somewhat reasonable” but that OpenAI is “trying to have their cake and eat it too.” The piece worth holding onto is what the proposal omits. The Coalition for Health AI (CHAI), which OpenAI joined as a member in 2024, has spent four years building the multi-stakeholder coalition and independent assurance-lab network that exactly this kind of sandbox would need. CHAI’s assurance-lab vision stalled in 2025, and the OpenAI blueprint arrives in the vacuum left behind. That makes the operator question central, because whoever runs a sandbox shapes where pilots happen and who absorbs the risk. When sandbox sites concentrate in community health, rural, and safety-net settings, residual risk concentrates with the populations who can least afford a tool that fails in pilot. A future issue will take this up directly.

AMA. In May 2026, the American Medical Association formally asked Congressional leaders to require federal safeguards for AI health chatbot developers, including mandatory data retention limits, protections against unauthorized access, and transparency requirements. This is the first time organized medicine has gone on record requesting specific legislative action on consumer AI health tools.

HIPAA. Consumer health AI applications, when used outside a covered entity’s system, are not covered by HIPAA. A user who connects their medical records to ChatGPT Health, Perplexity Health, or Copilot Health via a third-party integration is operating in a legal environment where their health data has fewer protections than the same information would have in a conversation with a covered provider.

A May 2026 security study indexed on Semantic Scholar found that a public-facing medical RAG (retrieval-augmented generation) chatbot exposed its internal configuration and its 1,000 most recent user conversations without authentication. The exposed conversations included symptoms, medications, and mental health questions. The population most likely to disclose sensitive health information to a free tool is largely the population with the fewest alternatives and the least ability to audit the privacy posture of the tool it is using.


The HAIRA-framed verdict

The HAIRA framework, which Dr. Maia co-developed, asks five questions about any health AI system: What is the safety profile? Who is accountable when it fails? How reliable is it across the populations it will serve? Does it expand or contract access? How transparent is it about its own limitations? (Full citation: Hussein R, Hightower M, Beaulieu-Jones B, et al. Healthcare AI Governance Readiness Assessment (HAIRA): a peer-reviewed maturity model. npj Digital Medicine. 2026;9:236.)

In the consumer adaptation used here (HAIRA-RA, tailored to consumer decision-making), three tiers emerge for the current landscape:


Tier 1: General LLMs used directly for health questions (ChatGPT, Claude, Gemini, Perplexity without the health layer)

Safety: Documented failure modes at clinical extremes, especially emergency triage. Crisis-safeguard activation is inconsistent. Safety guardrails can themselves increase distress in mental health contexts.

Accountability: No clear accountability pathway when the tool gives wrong advice. No FDA oversight for clinical use. No prospective harm reporting system.

Reliability: Approximately 70% accuracy on general health questions; substantially lower on emergency triage (ChatGPT Health, built on the same underlying models, undertriaged 52% of emergencies in structured testing). Performance varies by language, demographic characteristics, and how queries are framed.

Access: Widely accessible, including free tiers. Reaches populations with limited clinical alternatives. Training data underrepresents those same populations. Off-hours use by mobile users is the behavioral signature of the access gap.

Transparency: Disclaimers are present (the tool is not for diagnosis) but buried. No public disclosure of training data representativeness by subgroup. No clinical validation evidence published.

Verdict: Use with explicit limits. Appropriate for general health education, appointment preparation, and understanding test results in plain language. Not appropriate for triage, diagnosis, or as a primary health resource. In a medical emergency, close the app and call 911 or go to the emergency department.


Tier 2: General LLMs wrapped in health-specific products, with some clinical oversight (ChatGPT Health, Copilot Health, Claude for Healthcare, Perplexity Health)

Safety: Same underlying models as Tier 1. Privacy-isolated environments reduce some data risk. Physician review of some responses claimed but not independently verified. Emergency triage failures documented for ChatGPT Health specifically (Nature Medicine, 2026). No prospective clinical safety trial published for any platform.

Accountability: Slightly clearer, with named companies, published privacy policies, and regulatory exposure under FTC and EU AI Act. Still no FDA clearance for any clinical indication. HIPAA does not cover consumer-side integrations.

Reliability: Health-specific system prompts and physician review may improve some performance, but no peer-reviewed evidence of improvement over the base model on clinical outcomes. KFF notes that “accuracy questions remain unresolved” across all five platforms.

Access: Mixed. Personalized features (record integration, wearable data) are largely behind paid subscriptions, reducing access for the population most reliant on free tools. ChatGPT Health and Copilot Health are currently available on free or lower-cost tiers.

Transparency: Better than Tier 1 for data practices (health data not used for training). No published model cards, subgroup performance data, or clinical validation reports.

Verdict: Conditionally appropriate for health education and personal health data organization. Not yet appropriate for symptom triage or clinical decision support. Ask three questions before relying on any of these platforms: (1) Is there published clinical evidence for this tool’s health-specific accuracy? (2) Is the health data I upload protected from advertising use and third-party sharing? (3) Does the platform route me to a clinician or emergency services when I describe symptoms that warrant urgent care?


Tier 3: General LLMs wrapped in health-specific products, without clinical oversight or evidence, or with documented harms

Safety: No peer-reviewed clinical evidence. Documented harms in analogous products (companion AI, wellness-only tools with mental health claims). No crisis routing. No age verification in many cases.

Accountability: No identifiable accountability pathway. Disclaimers present but unenforceable in practice. FTC enforcement possible but not yet applied at scale.

Reliability: Unknown or demonstrably poor. Hallucination risk is documented in general LLMs; no subgroup performance data published.

Access: Often free, which extends reach into the access-constrained population and therefore raises the potential for harm when the tool fails.

Transparency: Minimal. No training data disclosure, no clinical validation, no limitations statement.

Verdict: Not appropriate as a health resource. If a product makes clinical-sounding statements, connects to personal health data, and cannot point to peer-reviewed evidence for its health-specific accuracy, treat it with the same skepticism you would apply to any unlicensed medical advice. Recommend against use for health decisions.


Practical guidance

For patients and families

  1. Ask one question before you trust a health AI tool: “Where is the peer-reviewed evidence that this tool works for health decisions?” If the answer is a marketing page rather than a published study, treat it as health education, not health care.

  2. Understand what your data becomes. If you connect medical records or lab results to a consumer AI tool, read the privacy policy before uploading. Confirm whether your health data can be used for advertising, shared with third parties, or used to train future models. Consumer health AI tools are generally not covered by HIPAA when used outside a clinical system.

  3. For urgent symptoms, do not substitute AI for emergency care. Chest pain, difficulty breathing, stroke symptoms (face drooping, arm weakness, speech difficulty), severe allergic reactions, and any symptom that feels like a medical emergency require 911 or an emergency department. In the Nature Medicine study, ChatGPT Health undertriaged 52% of emergency conditions. That failure rate falls hardest on users without a clinical fallback.

For clinicians

  1. Screen patients for AI use as a health information source, the same way you screen for supplement use. Patients often do not volunteer this information. A JAMA Psychiatry commentary from April 2026 recommends routine screening because AI-generated health information is now affecting clinical encounters, and providers need to know what their patients have been told.

  2. Document your patient-facing AI guidance in the chart. The AMA’s letter to Congress and the FTC’s enforcement posture both signal that the regulatory environment is shifting. Documenting that you discussed AI tool limitations with a patient, and which tool the patient is using, protects both the patient and the clinical record.

  3. Connect patients without insurance or a regular provider to verified low-cost resources. For patients using consumer AI as a primary health resource because of access barriers, concrete alternatives include federally qualified health centers (which use sliding-scale fees and accept uninsured patients), the Health Resources and Services Administration (HRSA) provider locator, and findtreatment.gov for mental health and substance use treatment. AI is reaching this population because the system is failing it; help close that gap where you can.


If you are in crisis right now

This section is here because health content reaches people at hard moments. If you are in a medical or mental health emergency right now, please stop here and use one of the resources below. Everything else in this newsletter can wait.


Medical Emergency
Call 911 immediately. If you are experiencing chest pain, difficulty breathing, signs of stroke, or any symptom that may require emergency care, do not search for AI advice. Call 911.

988 Suicide and Crisis Lifeline
Call or text 988 (available 24/7, free, confidential, in English and Spanish). 988lifeline.org

Crisis Text Line
Text HOME to 741741 (available 24/7). crisistextline.org

Nurse Advice Line
If you have insurance, call the number on the back of your card and ask for the nurse advice line. Most major insurers offer one 24/7. A licensed nurse can help you determine whether your symptoms need emergency care, urgent care, or can wait for a scheduled appointment.

SAMHSA National Helpline
Free treatment referrals, 24/7. Call 1-800-662-4357. samhsa.gov/find-help/national-helpline

Veterans Crisis Line
Dial 988, then press 1, or text 838255. veteranscrisisline.net


Closing

Here is what I want you to take from this two-part issue.

The general-purpose AI on your phone has become, for tens of millions of people, the most accessible health resource they have. That is not primarily a story about technology. It is a story about a healthcare system that costs too much, is too hard to navigate, and leaves too many people without a clinician they can call at 2 a.m. Consumer health AI is filling a gap that should not exist. The question is whether it fills it safely.

The evidence as of May 2026 gives a differentiated answer. These tools can help with health education, appointment preparation, and understanding test results. They cannot reliably triage emergencies. They cannot replace a clinician’s judgment. They have not been tested, at any scale, on the populations who are reaching for them most. And the regulatory infrastructure that would require them to prove they are safe for those populations is still under construction, with the most important deadline less than three months away.

In the meantime, the woman at 2 a.m. in rural Alabama is still on her phone. The access crisis that put her there is older than ChatGPT and will outlast whatever Health button the next platform launches. What Ask Dr. Maia can offer is the information to use these tools for what they are actually capable of, know their limits before those limits matter, and push toward the services that serve the people who need them most.

Take care of yourself. Take care of each other.

Dr. Maia

Bottom line: The population reaching for consumer health AI most heavily is disproportionately uninsured, lower-income, rural, or non-English-speaking, and underrepresented in training data, often across multiple dimensions at once. Federal regulation is incomplete: no general-purpose LLM has FDA clearance for clinical indications, FTC enforcement is signaled but not yet at scale, and the EU AI Act’s high-risk deadline is August 2, 2026. A vendor-authored regulatory proposal (OpenAI’s “Keeping Patients First”) arrived this month into the vacuum left by stalled multi-stakeholder governance. The HAIRA-framed verdict: Tier 1 general LLMs are acceptable for health education only; Tier 2 health-wrapped products are conditionally appropriate for education and personal health data organization; Tier 3 products without evidence are not appropriate as health resources. In a medical emergency, call 911.


Ask Dr. Maia is educational content. It is not medical advice and does not create a doctor-patient relationship. If you are in a medical emergency, call 911. If you are in crisis, call or text 988 in the US. © 2026 Ask Dr. Maia. All rights reserved.


Sources for Part 2

  1. KFF Tracking Poll on Health Information and Trust: Use of AI for Health Information and Advice. February-March 2026 (n=1,343). KFF

  2. Pew Research Center: Health information from social media and AI rated more convenient than accurate. April 7, 2026. Pew

  3. Cedars-Sinai study: Racial differences in LLM psychiatric treatment recommendations. npj Digital Medicine. June 2025. cedars-sinai.org

  4. Stanford: Digital divide, AI LLMs, exclusion of non-English speakers. May 2025. news.stanford.edu

  5. Ahmed MM, Othman ZK. The AI health arms race: a critical perspective on Big Tech and the widening access gap. Journal of Multidisciplinary Healthcare. 2026. Dove Medical Press

  6. KFF: The Growing Use of Artificial Intelligence in Health Care and Implications for Disparities. April 30, 2026. kff.org

  7. FDA Clinical Decision Support Software Guidance. January 2026. fda.gov

  8. FDA warning letter: inappropriate AI reliance in GMP documentation. RAPS. April 17, 2026. raps.org

  9. FTC Healthcare Task Force: Debevoise analysis. March 31, 2026. debevoise.com

  10. EU AI Act compliance: August 2026 high-risk deadline. Holland & Knight. April 27, 2026. hklaw.com

  11. OpenAI policy blueprint, “Keeping Patients First.” May 6, 2026. openai.com

  12. STAT News: OpenAI policy blueprint, “unleashing AI potential in health care.” Mario Aguilar, May 6, 2026. statnews.com

  13. AMA urges Congress to set AI chatbot safeguards. Healthcare Info Security. May 1, 2026. healthcareinfosecurity.com

  14. Semantic Scholar: Patient-facing RAG medical chatbot exposed private chats and system prompts. May 1, 2026. semanticscholar.org

  15. Hussein R, Hightower M, Beaulieu-Jones B, et al. Healthcare AI Governance Readiness Assessment (HAIRA): a peer-reviewed maturity model. npj Digital Medicine. 2026;9:236.

  16. JAMA Psychiatry: Commentary on screening patients for AI health tool use. April 2026. pubmed.ncbi.nlm.nih.gov/41533367

  17. Hugo H, et al. ChatGPT Health performance in a structured test of triage recommendations. Nature Medicine. 2026. nature.com/articles/s41591-026-04297-7 (Part 1 anchor; cited here for context)


Read Part 1 for the diagnostic half: what the Health button actually is, what the clinical accuracy data says, and the safety guardrail paradox.
