
Cursor's 'Sam' AI Support Debacle: What Went Wrong

Cursor's AI support agent invented a fake device-login policy to explain away a bug. Fortune covered it. Here's what every company deploying AI support should learn from their mistake.


A Customer Asked About Their Subscription. The AI Made Up an Answer.

In April 2025, customers using Cursor, the AI-powered code editor, started noticing something off about their support interactions. Users were getting mysteriously logged out when switching between devices. When they contacted support, a friendly agent named "Sam" replied via email, confidently explaining that the logouts were "expected behavior" under a new policy limiting Cursor to one device per subscription. No such policy existed. Sam had invented it entirely. Fortune picked up the story after screenshots flooded Hacker News and Reddit.

The backlash wasn't just about the AI being wrong. Support agents get things wrong sometimes. The backlash was that Cursor never told anyone Sam was an AI. Customers thought they were talking to a human employee named Sam who happened to be fabricating company policy.

The Three Failures That Compounded

Cursor made a specific set of mistakes that turned a fixable problem into a PR disaster.

First, they let a general-purpose LLM answer technical support questions without guardrails. When users reported a real bug (a race condition causing logouts on slow connections), the right answer was "we're looking into this." Instead, Sam generated a plausible-sounding but entirely fictional policy explanation. The answer read well. It was also fabricated.

Second, there was no disclosure. California, the EU, and several other jurisdictions now require companies to tell users when they're talking to AI. Even without legal requirements, customers feel deceived when they find out. The trust damage from non-disclosure is worse than the trust damage from a wrong answer.

Third, Cursor had no human escalation path. When Sam gave a wrong answer, there was no mechanism for the customer to reach a real person who could fix it. The AI was the only layer. That's a support architecture that works until it doesn't, and when it doesn't, you have no safety net.

Why LLMs Freestyle on Billing Questions

Large language models are pattern-completion machines. Ask one to explain why users are getting logged out, and it'll generate something that sounds like a reasonable explanation. If the real cause is a race condition bug, the model might invent a "one device per subscription" policy because that's a plausible pattern from its training data.

This is the fundamental problem with using raw LLMs for customer support. They're confidently wrong in ways that create real liability. A hallucinated policy isn't just embarrassing. If a customer relies on it and changes their behavior, you own the consequences. (See the Chevy Tahoe story below, or Air Canada's chatbot case where a tribunal ruled the airline had to honor a discount the bot invented.)

The fix isn't to stop using AI. It's to use AI that operates within defined boundaries. A classification system that identifies "this is a refund request" and routes it to a verified response or a human is fundamentally different from a generative model that improvises an answer.
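To make the distinction concrete, here's a minimal sketch of classify-then-route logic. The intent labels, routing rules, and function names are illustrative assumptions, not Supp's actual API:

```python
# Hypothetical sketch: classify-then-route instead of freeform generation.
# Intent labels and responses here are made up for illustration.

# Pre-approved, human-written answers keyed by intent
VERIFIED_RESPONSES = {
    "password_reset": "You can reset your password under Settings > Account.",
}

# Intents that must never receive a generated answer
ESCALATE_INTENTS = {"refund_request", "billing_question", "policy_question", "bug_report"}

ESCALATE = "ESCALATE_TO_HUMAN"

def route(intent: str) -> str:
    """Return a verified answer or escalate -- never improvise one."""
    if intent in VERIFIED_RESPONSES:
        return VERIFIED_RESPONSES[intent]
    if intent in ESCALATE_INTENTS:
        return ESCALATE
    # Unknown intent: default to a human, not a guess
    return ESCALATE
```

The key property is that every path ends in either pre-approved text or a handoff; there is no branch where the system composes a policy on the fly.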

What Cursor Should Have Done

The technical fix is straightforward. For any question involving technical issues, account status, billing, or policy details, the system should pull from a structured knowledge base or escalate to a human. It should never generate freeform explanations for why something is happening. If the knowledge base doesn't have a match, say "I'll get a team member to look into this." That's it.
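The "match or escalate" rule above can be sketched in a few lines. The knowledge base entries, matching threshold, and fallback text are assumptions for illustration (a production system would use embedding search rather than string similarity):

```python
# Hypothetical sketch of "answer from the KB or escalate, never generate".
from difflib import SequenceMatcher

# Human-verified Q&A pairs; contents are example assumptions
KNOWLEDGE_BASE = {
    "how do i change my email": "Go to Settings > Account > Email to update it.",
    "why was i logged out": "We're investigating an open bug that can log users "
                            "out when switching devices. No policy has changed.",
}

FALLBACK = "I'll get a team member to look into this."

def answer(question: str, threshold: float = 0.75) -> str:
    """Return the closest verified KB entry, or escalate below threshold."""
    q = question.lower().strip("?! .")
    best_score, best_answer = 0.0, FALLBACK
    for kb_question, kb_answer in KNOWLEDGE_BASE.items():
        score = SequenceMatcher(None, q, kb_question).ratio()
        if score > best_score:
            best_score, best_answer = score, kb_answer
    return best_answer if best_score >= threshold else FALLBACK
```

An off-topic question like "can I get a refund?" falls below the threshold and triggers the fallback instead of an invented policy.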

At Supp, we handle this with a purpose-built classifier that identifies what the customer is asking (315 intents across 13 categories) without generating a response. "Refund request" gets classified and routed. The system never invents a refund policy because it never generates policy text.

Classification costs $0.20. A resolution that triggers an action (creating a ticket, notifying your team on Slack) costs $0.30. Compare that to the cost of honoring hallucinated refund promises at scale.

The Disclosure Question Is Settled

After Cursor, after Air Canada, after a dozen similar incidents, the debate about whether to disclose AI in support is over. You disclose. The EU AI Act requires transparency for AI systems interacting with humans. California's SB 243 targets companion chatbots specifically, but the regulatory direction is clear. Even in jurisdictions without specific laws, the FTC has signaled that non-disclosure of AI in customer interactions is a deceptive practice.

But disclosure alone doesn't fix the problem. Telling customers "you're talking to AI" and then having the AI fabricate policies is arguably worse than not disclosing at all. It says: we know this thing makes stuff up, and we deployed it anyway.

The real lesson from Cursor isn't "disclose your AI." It's "don't let your AI answer questions it can't verify." Classification and routing are safe. Freeform generation on sensitive topics is not.

How to Audit Your Own AI Support

If you're running any AI in your support stack, run this check today. Send it five billing questions, three refund requests, two questions about features on specific plan tiers, and a couple of bug reports for known issues. Compare every answer against your actual docs and known bugs. If the AI gets even one answer wrong, or invents a policy to explain a bug, you have a Cursor problem waiting to happen.
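That audit can be automated with a small harness. Here, `ask_support_ai` stands in for whatever AI you actually run, and the probe questions and required facts are placeholder assumptions you'd replace with your own docs and known bugs:

```python
# Hypothetical audit harness: probe the support AI and flag answers
# that miss a fact your docs say must be present.

def audit(ask_support_ai, probes):
    """Run probe questions; return (question, reply) pairs that failed."""
    failures = []
    for question, must_contain in probes:
        reply = ask_support_ai(question)
        if must_contain.lower() not in reply.lower():
            failures.append((question, reply))
    return failures

# Example probe set -- swap in your real billing, refund, and bug questions
probes = [
    ("Why do I get logged out when I switch devices?", "known bug"),
    ("What is your refund window?", "30 days"),  # use your actual policy
]
```

Run it against your live support channel, not a staging copy; the failure mode you're hunting for only shows up with the real model and the real prompt.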

Then check your escalation path. Can a customer reach a human within two interactions? If the answer is no, you're one viral screenshot away from a Fortune article.

See How Supp Avoids This

$5 in free credits. No credit card required. Set up in under 15 minutes.
