Your Product Is Down and Tickets Are Flooding In: The Outage Support Playbook
Outages are inevitable. The support experience during an outage determines whether customers wait patiently or start writing angry tweets. Here's the tactical playbook.
The First 15 Minutes Define Everything
Your monitoring alerts fire. The product is down. In 15 minutes, your support queue will triple. In 30 minutes, customers will be on Twitter. In an hour, someone from sales will forward you an email from a prospect asking if your product is reliable.
What you do in those first 15 minutes determines whether this is a manageable incident or a brand-damaging event.
Before the Outage: Build the Infrastructure
If you're reading this during an outage, skip to the next section. If you're reading this on a calm Tuesday, do these things now.
Get a status page live
Use Statuspage, Instatus, or even a simple static page. The URL should be memorable, such as status.yourcompany.com. Put the link in your app's footer, in your help center header, and in your chatbot's fallback responses.
A status page does two things. It gives customers a place to check that isn't your support queue. And it signals that you take reliability seriously enough to have monitoring infrastructure.
Write your outage templates in advance
You need four messages pre-written and approved:
The acknowledgment: "We're aware of an issue affecting [service]. Our engineering team is investigating. We'll update this page every 30 minutes."
The investigation update: "We've identified the cause. [Brief non-technical description]. We're working on a fix. ETA: [time or 'we'll update in 30 minutes']."
The resolution: "The issue has been resolved. [Service] is operating normally. We'll publish a full post-mortem within 48 hours."
The follow-up: "Here's what happened, why, and what we're doing to prevent it." This one you write after the incident, but having the template structure ready saves time.
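One way to keep the first three messages ready to send is to store them as fill-in templates. The following is a minimal sketch, not a prescribed implementation: the dictionary keys, the `render` helper, and the placeholder names (`service`, `interval`, `description`, `eta`) are all hypothetical, and only the wording comes from the messages above.

```python
from string import Template

# Hypothetical keys and placeholder names; the wording mirrors the
# pre-written messages. The follow-up is written after the incident,
# so it is not templated here.
OUTAGE_TEMPLATES = {
    "acknowledgment": Template(
        "We're aware of an issue affecting $service. Our engineering team is "
        "investigating. We'll update this page every $interval minutes."
    ),
    "investigation": Template(
        "We've identified the cause. $description. We're working on a fix. "
        "ETA: $eta."
    ),
    "resolution": Template(
        "The issue has been resolved. $service is operating normally. "
        "We'll publish a full post-mortem within 48 hours."
    ),
}


def render(kind, **fields):
    """Fill in a template; substitute() raises KeyError if a field is missing."""
    return OUTAGE_TEMPLATES[kind].substitute(**fields)


print(render("acknowledgment", service="login", interval=30))
```

Using `substitute` rather than `safe_substitute` is deliberate in this sketch: a message with an unfilled placeholder fails loudly instead of reaching customers with a stray `$service` in it.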
Set up auto-classification for outage tickets
When an outage hits, 80% of incoming tickets will be the same message: "Is the app down?" or "I can't log in" or "Getting an error." If your support tool can auto-detect these and respond with a link to your status page, you just saved your agents from answering the same question 200 times.
At an assumed price of $0.20 per classification, handling 500 outage tickets automatically costs $100. Having two agents answer the same 500 tickets by hand can consume their entire shift.
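If your support tool has no built-in intent detection, a crude keyword matcher illustrates the idea. This is a sketch under stated assumptions: the patterns, function names, and reply wording are hypothetical, the status URL is the example address from earlier, and a real classifier would need tuning against your own ticket history. The last function reproduces the cost arithmetic, using the $0.20 per-ticket price as its default.

```python
import re

STATUS_URL = "https://status.yourcompany.com"  # example URL; replace with yours

# Hypothetical patterns covering the three typical outage messages.
OUTAGE_PATTERNS = [
    re.compile(r"\b(is|are)\b.*\bdown\b", re.IGNORECASE),
    re.compile(r"\bcan['’]?t (log ?in|sign ?in|access)\b", re.IGNORECASE),
    re.compile(r"\b(getting|seeing) an? error\b", re.IGNORECASE),
]


def is_outage_ticket(text):
    """Return True if the ticket text looks like an outage report."""
    return any(p.search(text) for p in OUTAGE_PATTERNS)


def auto_reply(text):
    """Return a status-page reply for outage tickets, or None for everything else."""
    if is_outage_ticket(text):
        return f"We're aware of an issue and are investigating. Live updates: {STATUS_URL}"
    return None


def classification_cost(tickets, price_per_ticket=0.20):
    """Total cost of classifying a batch of tickets at a flat per-ticket price."""
    return round(tickets * price_per_ticket, 2)


print(auto_reply("Is the app down?"))
print(classification_cost(500))
```

Tickets that do not match return `None`, so they fall through to the normal queue rather than receiving an irrelevant status-page reply.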
During the Outage: The Minute-by-Minute Playbook
Minute 0-5: Confirm and communicate internally
Don't update customers until you've confirmed the outage is real and widespread (not a single user's browser cache). Check your monitoring. Ping engineering. Once confirmed, move fast.
Internal communication matters as much as external. Your sales team needs to know. Your account managers need to know. Send a Slack message to #all-company: "We're experiencing an outage affecting [service]. Status page is being updated. Direct all customer inquiries to [link]. Do not speculate on cause or timeline."
That last sentence prevents well-meaning colleagues from telling customers different things.
Minute 5-15: Update the status page and set up auto-responses
Flip the status page to "Investigating." Post the acknowledgment template. Enable auto-responses for incoming tickets that match outage-related intents.
Your auto-response should contain exactly this: what's happening, that you're aware, a link to the status page, and a commitment to update at a specific interval. Nothing else. No apologies yet (save those for the resolution). No ETA guesses. No technical details.
Every 30 minutes: Update even if nothing changed
"We're still investigating. No new information yet. Next update at [time]." That sentence takes 10 seconds to write and prevents hundreds of "any update?" tickets.
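The holding update is easy to generate so that nobody has to work out the next timestamp under pressure. A minimal sketch, assuming updates are stamped in UTC (an assumption, not something the playbook specifies) and using hypothetical function names:

```python
from datetime import datetime, timedelta


def next_update_time(last_update, interval_minutes=30):
    """When the next status update is due."""
    return last_update + timedelta(minutes=interval_minutes)


def is_update_overdue(now, last_update, interval_minutes=30):
    """True once the promised update interval has elapsed."""
    return now >= next_update_time(last_update, interval_minutes)


def holding_update(last_update, interval_minutes=30):
    """The 'nothing new' message with the next update time filled in."""
    nxt = next_update_time(last_update, interval_minutes)
    return (
        "We're still investigating. No new information yet. "
        f"Next update at {nxt:%H:%M} UTC."
    )


print(holding_update(datetime(2024, 1, 1, 2, 0)))
```

Wiring `is_update_overdue` into a reminder for whoever owns the status page is one way to make sure the 30-minute promise is actually kept.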
The worst thing you can do during an outage is go silent. Silence breeds speculation. Speculation breeds anger. Anger breeds tweets.
When you have an ETA: Be honest about confidence
"We expect to have this resolved within 2 hours" is fine if you're genuinely confident. "We're making progress but don't have a reliable ETA yet" is better than giving a time and missing it. Missing your own deadline is worse than not giving one.
Prioritization During the Outage
Not all outage tickets are equal.
Priority 1: Customers who lost data or money
If your outage caused failed transactions, lost work, or billing errors, those customers need immediate human attention. No auto-response. Flag these and route them to a senior agent.
Priority 2: Enterprise customers with SLAs
Check your contracts. Some customers have uptime guarantees with financial penalties. Your account managers should be proactively reaching out to these customers, not waiting for them to contact you.
Priority 3: Customers asking "is it down?"
Auto-respond with a link to the status page. These are the majority of tickets and require zero human effort if you've set up automation.
Priority 4: Unrelated tickets
Normal support tickets don't stop during an outage. Customers still have billing questions and feature requests. These can wait until the outage is resolved, but don't let them pile up for days.
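The four tiers above can be expressed as a simple routing rule. This is a sketch only: the `Ticket` fields and queue names are hypothetical, and in practice the flags would come from your support tool and CRM rather than being set by hand.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    text: str
    reports_loss: bool = False    # lost data, failed transaction, billing error
    has_sla: bool = False         # enterprise customer with an uptime guarantee
    outage_related: bool = False  # "is it down?" style message


def priority(ticket):
    """Map a ticket to the playbook's priority tiers (1 is most urgent)."""
    if ticket.reports_loss:
        return 1
    if ticket.has_sla:
        return 2
    if ticket.outage_related:
        return 3
    return 4


# Hypothetical queue names, one per tier.
ROUTES = {
    1: "senior_agent",      # immediate human attention, no auto-response
    2: "account_manager",   # proactive outreach
    3: "auto_response",     # status page link
    4: "normal_queue",      # handled after the outage
}


def route(ticket):
    return ROUTES[priority(ticket)]


print(route(Ticket("My payment failed twice", reports_loss=True, outage_related=True)))
```

The order of the checks matters: a customer who lost money and also asks whether the app is down still goes to a senior agent, not to the auto-responder.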
After the Outage: The Post-Mortem
Publish a post-mortem within 48 hours. Public, not internal-only. Customers respect transparency.
Include: what happened, when it started and ended, how many customers were affected, root cause, what you're doing to prevent it from happening again, and any compensation (if applicable).
Skip the corporate language. "A configuration change in our database cluster caused a cascading failure" is more trustworthy than "We experienced a service disruption." People can tell when you're being vague on purpose.
Compensation guidelines
For outages under 1 hour: acknowledge and apologize. No credit needed for most products.
For outages between 1 and 4 hours: consider a proactive service credit for affected customers. Even a small credit (one day free, 10% off next month) turns a negative experience into a positive memory.
For outages over 4 hours: proactive outreach from account managers. Credit should be automatic, not something customers have to ask for.
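These thresholds can be encoded so the compensation decision is not improvised after each incident. A minimal sketch with hypothetical action names; the guidelines do not say which tier an outage of exactly 4 hours falls into, so placing it in the middle tier is an assumption.

```python
def compensation_action(outage_minutes):
    """Map outage duration to the guideline tiers.

    Assumption: exactly 60 minutes counts as the middle tier, and so does
    exactly 240 minutes, since the guidelines leave the boundaries open.
    """
    if outage_minutes < 60:
        return "acknowledge_and_apologize"
    if outage_minutes <= 240:
        return "consider_proactive_credit"
    return "automatic_credit_and_account_manager_outreach"


for minutes in (20, 90, 300):
    print(minutes, compensation_action(minutes))
```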
The Meta-Lesson
Every company has outages. AWS goes down. Google goes down. Your product will go down. Customers understand this.
What they don't understand, and won't forgive, is silence, confusion, and the sense that you were caught off guard. The companies with the best reputations for reliability aren't the ones that never go down. They're the ones that communicate so well during incidents that customers feel informed and respected throughout.
Build the infrastructure now. Write the templates now. Set up the auto-classification now. Future you, at 2 AM with a pager going off, will be grateful.