AI & Technology · 8 min read

AI Hallucinations in Customer Support: When Chatbots Lie to Your Customers

Air Canada's chatbot invented a refund policy. A DPD bot swore at a customer. When AI makes things up, your company pays. Here's why classification-first beats generation-first for support.


Air Canada Lost a Court Case Because Their Bot Made Stuff Up

In 2022, Air Canada's chatbot told a grieving customer he could book a full-fare flight and apply for a bereavement discount retroactively. No such retroactive policy existed. The customer booked. Air Canada refused the discount. The customer took the airline to a tribunal. Air Canada argued the chatbot was "a separate legal entity" responsible for its own words.

In 2024, the tribunal didn't buy it. Air Canada paid.

This is what hallucination looks like in production. Your AI confidently states something that sounds right, reads well, and is completely false.

The Problem Is Structural, Not Fixable With Better Prompts

LLM-based chatbots generate text. That's the whole point. You feed them your knowledge base, tell them to answer questions, and hope they stick to the source material. Sometimes they do. Sometimes they don't.

The failure mode isn't random gibberish. It's plausible-sounding fiction. A bot might combine two real policies into one fake policy. It might correctly describe your return window but invent the process for initiating a return. It might quote a price that was accurate six months ago.

Prompt engineering helps reduce the frequency. RAG (retrieval-augmented generation) helps ground responses in real documents. But neither eliminates hallucination. You're still running a system whose fundamental mechanism is "predict the next likely word." When the next likely word happens to be wrong, you have a problem.

In January 2024, DPD's chatbot swore at a customer and called the company "the worst delivery firm in the world" after that customer prompted it past its guardrails. That's an extreme example, but the everyday version is worse because it's invisible: a bot quietly gives a customer the wrong answer, the customer acts on it, and nobody catches the mistake until it becomes a complaint or a lawsuit.

Generation-First vs. Classification-First

Most AI support tools today are generation-first. The customer writes something. The AI generates a response. The response might be helpful. It might be fabricated. You won't know until a human reviews it (and at scale, nobody reviews every response).

Classification-first works differently. The customer writes something. The AI classifies the intent: is this a refund request, a billing question, a password reset, a shipping inquiry? Then it triggers a pre-set action for that intent. The AI never generates prose that could contain hallucinated facts.

Think of it this way: generation-first is an open-book exam where the student sometimes makes up quotes. Classification-first is a multiple-choice exam where the answers are pre-verified.
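Here's a minimal sketch of what that looks like in code. The intent labels, the classify() stub, and the 0.80 threshold are illustrative assumptions rather than any specific product's API; the point is that the model only picks a label, and everything the customer sees comes from pre-approved actions.

```python
from dataclasses import dataclass

@dataclass
class Ticket:
    text: str
    customer_email: str
    order_id: str

# The model's only job is to pick one of these labels. (Labels are illustrative.)
INTENTS = ["refund_request", "billing_question", "password_reset", "shipping_inquiry", "other"]

# Pre-approved actions keyed by intent; the stub strings stand in for real workflow calls.
ACTIONS = {
    "refund_request":   lambda t: f"Started refund flow for order {t.order_id}",
    "billing_question": lambda t: "Sent pre-approved billing FAQ reply",
    "password_reset":   lambda t: f"Sent reset link to {t.customer_email}",
    "shipping_inquiry": lambda t: f"Sent tracking status for order {t.order_id}",
    "other":            lambda t: "Escalated to a human agent",
}

def classify(text: str) -> tuple[str, float]:
    """Hypothetical classifier call; in practice this hits your classification API."""
    if "refund" in text.lower():
        return "refund_request", 0.94
    return "other", 0.40

def handle_ticket(ticket: Ticket) -> str:
    intent, confidence = classify(ticket.text)
    if confidence < 0.80 or intent == "other":
        return ACTIONS["other"](ticket)  # don't guess; a person handles the edge cases
    return ACTIONS[intent](ticket)       # no free-form prose is ever generated

print(handle_ticket(Ticket("I need a refund for my last order", "a@example.com", "ORD-123")))
```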

The trade-off is real. A classification system can't have a freeform conversation about your product. It can't explain the nuances of a complex policy in paragraph form. But it handles the 70-80% of support tickets that fit into known categories faster and with zero hallucination risk.

What Hallucination Actually Costs

The legal exposure is the headline risk. But the everyday cost is subtler.

Customer trust erosion

When a customer gets a wrong answer and discovers it later, they stop trusting your support entirely. They'll call in and ask for a human every time, which defeats the purpose of automation.

Internal cleanup time

Someone has to fix the mess. A hallucinated promise means a support agent spending 15 minutes explaining why the AI was wrong, plus potentially issuing a goodwill credit to keep the customer.

Compliance violations

In regulated industries (finance, healthcare, insurance), wrong information isn't just embarrassing. It's a regulatory event. One hallucinated claim about coverage or eligibility can trigger an audit.

Silent churn

The scariest hallucinations are the ones customers don't report. They get bad information, quietly lose confidence in your product, and leave. You never connect the churn to the chatbot interaction.

How to Prevent It

If you're using a generation-based bot, these steps reduce (but don't eliminate) hallucination:

Constrain the knowledge base tightly

Don't dump your entire help center into the bot's context. Curate what it can access. Fewer sources mean less room for the model to mix and match facts into fiction.
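If you're running a RAG bot, one way to enforce that curation is to filter retrieval against an explicit allowlist before anything reaches the model's context. This sketch uses made-up document IDs and a toy keyword match standing in for a real retriever:

```python
# Restrict what the bot can see to a hand-curated allowlist of vetted documents.
# Document IDs and the keyword match are illustrative, not a specific vendor's schema.

ALLOWED_SOURCES = {"refund-policy-v3", "shipping-faq", "account-security"}

HELP_CENTER = [
    {"source_id": "refund-policy-v3", "text": "Refunds are available within 30 days of purchase."},
    {"source_id": "old-pricing-2023", "text": "Plans start at $9 per month."},  # stale; not allowlisted
    {"source_id": "shipping-faq",     "text": "Orders ship within 2 business days."},
]

def build_context(question: str, max_docs: int = 3) -> str:
    words = question.lower().split()
    # Toy relevance check; a real system would use an embedding or keyword retriever here.
    relevant = [d for d in HELP_CENTER if any(w in d["text"].lower() for w in words)]
    vetted = [d for d in relevant if d["source_id"] in ALLOWED_SOURCES]
    return "\n\n".join(d["text"] for d in vetted[:max_docs])

print(build_context("when do refunds apply"))  # only the vetted refund policy comes back
```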

Set up automatic escalation for low-confidence answers

Some LLM APIs expose token log-probabilities you can turn into a rough confidence signal; with others you can infer confidence through a separate self-check. Either way, when confidence drops below a threshold, hand off to a human instead of guessing.
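A minimal sketch of that check, assuming your LLM client can return per-token log-probabilities (not every provider does). Averaging them is a crude confidence proxy, not a calibrated probability, and the 0.75 threshold is a placeholder to tune against your own audit data:

```python
import math

CONFIDENCE_THRESHOLD = 0.75  # placeholder; tune against audited conversations

def answer_or_escalate(token_logprobs: list[float], draft_answer: str) -> str:
    """Send the draft only if its average token probability clears the threshold."""
    avg_prob = math.exp(sum(token_logprobs) / len(token_logprobs))
    if avg_prob < CONFIDENCE_THRESHOLD:
        return "ESCALATE: routing to a human agent"  # hand off instead of guessing
    return draft_answer

# A fairly confident draft vs. a shaky one (log-probabilities are illustrative).
print(answer_or_escalate([-0.05, -0.10, -0.02], "Your refund will arrive in 5-7 days."))
print(answer_or_escalate([-0.9, -1.4, -0.7], "Our policy allows retroactive discounts."))
```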

Audit regularly

Pull 50 random bot conversations per week. Check the facts. This catches patterns that individual customers won't report.
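The audit doesn't need much tooling. A minimal sampling script, assuming your bot logs export to CSV with conversation IDs and bot replies (column names are placeholders for whatever you actually log):

```python
import csv
import random

SAMPLE_SIZE = 50  # the weekly sample suggested above

def sample_for_audit(log_path: str, out_path: str) -> None:
    """Pull a random sample of bot conversations into a sheet a reviewer can fact-check."""
    with open(log_path, newline="") as f:
        rows = list(csv.DictReader(f))
    sample = random.sample(rows, min(SAMPLE_SIZE, len(rows)))
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["conversation_id", "bot_reply", "fact_check_notes"])
        writer.writeheader()
        for row in sample:
            writer.writerow({
                "conversation_id": row["conversation_id"],
                "bot_reply": row["bot_reply"],
                "fact_check_notes": "",  # reviewer fills this in
            })
```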

Consider whether you need generation at all

This is the uncomfortable question. If 75% of your tickets are "where's my order," "I need a refund," "reset my password," and "how do I cancel," you don't need a language model crafting unique prose for each one. You need accurate classification and reliable actions.

A $0.20 classification that's right 92% of the time and never invents policy is, for most teams, a better deal than a $0.99 generation that's right 95% of the time but occasionally fabricates commitments your company has to honor.
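For concreteness, here's that comparison as back-of-the-envelope arithmetic. The prices and accuracy rates are the figures above; the error costs (a cheap misroute for classification, the 15-minute cleanup plus occasional goodwill credit for a hallucinated promise) are hypothetical placeholders to swap for your own numbers:

```python
# Rough expected cost per ticket. Error costs below are hypothetical placeholders.

def expected_cost(price: float, accuracy: float, error_cost: float) -> float:
    return price + (1 - accuracy) * error_cost

# A classification error is usually a misroute a human catches and redirects (assumed cheap).
classification = expected_cost(price=0.20, accuracy=0.92, error_cost=2.00)

# A generation error can be a fabricated commitment: agent cleanup time plus the
# occasional goodwill credit (assumed figure).
generation = expected_cost(price=0.99, accuracy=0.95, error_cost=15.00)

print(f"classification: ${classification:.2f} per ticket")  # ~$0.36
print(f"generation:     ${generation:.2f} per ticket")      # ~$1.74
```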

The Honest Trade-Off

I'm not going to pretend classification solves everything. If your support requires nuanced, multi-paragraph explanations of complex topics, you need generation. Some products are complicated enough that pre-built responses can't cover every scenario.

But most products aren't that complicated. Most support tickets are repetitive. And for repetitive tickets, the safest AI is one that can't make things up because it was never designed to.

The Air Canada case set a precedent: companies are liable for what their AI says. That should change how you think about which type of AI you deploy.

Try Hallucination-Free Classification

$5 in free credits. No credit card required. Set up in under 15 minutes.
