
When Support Automation Goes Too Far

Air Canada's chatbot invented a refund policy. DPD's bot cursed at a customer. Klarna's customer satisfaction tanked. Here are the patterns behind automation disasters.


A Collection of Disasters

In January 2024, a frustrated customer asked DPD's chatbot to write a poem, and it obliged with a verse about how terrible DPD is, calling itself "useless" and cursing. The screenshots went viral on X with over a million views. DPD disabled the AI portion of their chatbot the same day.

Air Canada's chatbot told a grieving customer he could book a full-fare ticket and claim the bereavement discount retroactively, a policy that didn't exist. A tribunal ruled the airline was responsible for its chatbot's advice. Cost: C$812 in damages and fees, plus permanent internet infamy.

Klarna's AI handled the equivalent workload of 700 support agents as the company shrank its overall headcount from 5,000 to under 4,000. Customer satisfaction dropped. The CEO publicly admitted they went too far and started rehiring.

These aren't edge cases. They're what happens when companies automate aggressively without guardrails.

Pattern 1: Letting LLMs Freestyle

DPD and Air Canada share the same root cause: they gave a large language model access to customers and let it generate freeform responses. LLMs are excellent at generating text that sounds right. They're terrible at knowing when they're wrong.

When a frustrated DPD customer asked the chatbot to write a poem about itself, it happily complied, producing verses about how useless it was. The model had no concept of "appropriate customer interaction." It generates the most likely response to the input, and the most likely response to "write a poem about how bad DPD is" is a poem about how bad DPD is.

When Air Canada's chatbot was asked about bereavement fares, it generated a plausible-sounding answer: book now, apply for the discount retroactively. Many airlines offer bereavement fares, so the answer sounded right. The model didn't check what Air Canada's policy actually allowed. It generated what sounded right.

The fix: don't let LLMs generate freeform customer-facing text without review. Use them for classification (identifying intent) and retrieval (finding the right article), not for generation (writing new text). Or if you do use generation, constrain the output to pre-approved response templates.
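Here's a minimal sketch of that shape: the model only classifies, and every sentence the customer sees was written by a human. The intent labels, template text, confidence floor, and stub classifier are all illustrative, not any specific product's API.

```python
# Classification-first handling: the model picks an intent; the customer
# only ever sees human-reviewed wording.

APPROVED_TEMPLATES = {
    "order_status": "Your order {order_id} is currently: {status}.",
    "password_reset": "A password reset link has been sent to {email}.",
}

CONFIDENCE_FLOOR = 0.85  # below this, hand off instead of guessing


def classify_intent(message: str) -> tuple[str, float]:
    """Stand-in for a real intent model; returns (intent, confidence)."""
    text = message.lower()
    if "order" in text:
        return "order_status", 0.92
    if "password" in text:
        return "password_reset", 0.90
    return "unknown", 0.30


def handle_message(message: str, context: dict) -> dict:
    intent, confidence = classify_intent(message)

    if confidence < CONFIDENCE_FLOOR or intent not in APPROVED_TEMPLATES:
        # Never let the model improvise an answer it isn't sure about.
        return {"action": "escalate_to_human", "reason": intent}

    return {"action": "reply", "text": APPROVED_TEMPLATES[intent].format(**context)}
```

A customer asking "Where is my order?" gets the order_status template filled in from known data; a message the classifier isn't confident about gets a person, not a guess.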

Pattern 2: Automating Too Much, Too Fast

Klarna's mistake was speed, not strategy. Automating support with AI is smart. Cutting headcount aggressively while letting AI handle everything (including complex financial disputes) is reckless.

The AI handled volume. It answered questions. But the questions it answered poorly (billing disputes, payment plan modifications, emotional complaints) were the ones that mattered most for customer satisfaction and retention.

The fix: automate in tiers. Start with the simplest 20% of your tickets. Get that working well. Expand to 40%. Keep measuring customer satisfaction on AI-handled conversations specifically. If satisfaction drops, stop expanding and fix what's broken.
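The expansion gate can be a few lines: refuse to widen the AI's scope while satisfaction on AI-handled conversations is below target. A sketch follows; the tiers past 40% and the exact threshold are stand-ins for whatever numbers you choose.

```python
# Tiered rollout gate: only expand the share of tickets the AI may handle
# while AI-specific satisfaction stays healthy.

TIERS = [0.20, 0.40, 0.60]   # fraction of ticket volume the AI may touch
MIN_AI_CSAT = 0.75           # yes-rate on "Did this solve your problem?"


def next_tier(current_tier: float, ai_csat: float) -> float:
    if ai_csat < MIN_AI_CSAT:
        return current_tier  # stop expanding; fix what's broken first
    higher = [t for t in TIERS if t > current_tier]
    return higher[0] if higher else current_tier
```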

Pattern 3: No Escape Hatch

The most common complaint across all AI support failures: customers couldn't reach a human.

When the AI couldn't help, there was no obvious path to a person. Some implementations deliberately hide human escalation to keep automation metrics high. This is short-term thinking. A customer who wants a human and can't reach one doesn't become a satisfied AI-handled ticket. They become an angry ex-customer.

The fix: make human escalation visible and easy in every AI interaction. One click. No hoops. If a customer asks for a human, they get one. Period. Your automation rate will be lower, but your customer retention will be higher.
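The escape hatch itself is almost trivially small, which is the point. In the sketch below, the trigger phrases and queue name are placeholders; what matters is the short-circuit: a request for a human skips every other branch.

```python
# Escalation check that runs before anything else the bot might do.

HUMAN_REQUEST_PHRASES = ("human", "agent", "real person", "representative")


def route(message: str) -> str:
    text = message.lower()
    if any(phrase in text for phrase in HUMAN_REQUEST_PHRASES):
        return "handoff:support_queue"  # one step, no hoops, no clarifying questions
    return "bot:continue"
```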

Pattern 4: Measuring Resolution Instead of Satisfaction

AI support vendors love "resolution rate" as a metric. The AI responded and the customer didn't come back. Resolution!

But "didn't come back" might mean "gave up." Or "went to a competitor." Or "called the credit card company instead." Resolution rate doesn't distinguish between satisfied customers and defeated ones.

The fix: measure satisfaction on AI conversations separately. Ask a simple question after AI interactions: "Did this solve your problem?" Track the answer. If fewer than 75% say yes, your "resolutions" are partially fictional.
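As a sketch of what "measure it separately" means, assuming each closed conversation records who handled it and the customer's answer to that question (None if they never answered):

```python
# AI-specific satisfaction, kept apart from overall support CSAT.

def ai_satisfaction(conversations: list[dict]) -> float:
    answered = [c for c in conversations
                if c["handled_by"] == "ai" and c.get("solved") is not None]
    if not answered:
        return 0.0
    return sum(c["solved"] for c in answered) / len(answered)

# Below 0.75, treat the "resolved" count with suspicion.
```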

The Common Thread

Every automation disaster shares one characteristic: the company prioritized efficiency over experience. They asked "how many conversations can AI handle?" instead of "how well does AI handle conversations?"

The companies that get AI support right ask different questions. What should AI handle? Where should it defer to humans? How do we know it's working? How easy is it for customers to override the AI?

What Actually Works

Classification-first, not generation-first. Identify what the customer wants, then trigger a known action. Don't generate novel text.

Confidence thresholds. If the AI isn't sure, it should say so and route to a human. A fast "let me connect you with someone" is better than a slow, wrong answer.

Conservative scope. Automate password resets and order tracking before you automate billing disputes and complaint resolution.

Visible human escalation. Always.

Separate metrics. Track AI satisfaction independently from overall support satisfaction. They're different numbers and you need both.

See a Safer Approach

$5 in free credits. No credit card required. Set up in under 15 minutes.
