How to Build a Support Quality Scorecard
Most teams measure support quality by gut feel. A scorecard gives you real data on what's working, what isn't, and where to coach. Here's how to build one that people actually use.
You hire a second support agent. They seem great in the interview. Two weeks in, you start getting complaints. Responses are technically correct but cold. Customers feel dismissed. One sends a screenshot to your CEO.
You didn't have a way to catch this because you weren't measuring quality. You were measuring speed and volume, and on those metrics, the new hire looked fine.
Why Gut Feel Doesn't Scale
When you're the only person doing support, quality is easy. You know if you're doing a good job because you see every response and every customer reaction.
The moment you add a second person, you lose visibility. By the time you have 5 agents, you're seeing maybe 10% of conversations. You're making quality judgments based on whichever tickets happen to land in front of you.
A scorecard fixes this by giving you a consistent framework for evaluating support interactions. Not every interaction. A random sample. Usually 5 to 10 per agent per week. That's enough to spot patterns and coach effectively.
What to Measure
You need 4 to 6 criteria. More than that and reviewers won't fill it out consistently. Fewer and you're not getting useful signal.
Here's a framework that works for most teams:
Accuracy: did the agent give the correct answer? Score 1-5. This is non-negotiable. A friendly, fast response that's wrong is worse than a slow one that's right. Weight this heavily, 30% of the total score.
Tone and empathy: did the agent match the customer's emotional state? Score 1-5. A customer who's frustrated about a billing error needs acknowledgment before a solution. A customer asking a quick factual question doesn't need a paragraph of empathy. Matching the temperature is what matters. Weight at 20%.
Completeness: did the agent address everything the customer asked? Score 1-5. Customers who ask two questions and only get one answered have to follow up. That wastes everyone's time. Weight at 20%.
Efficiency: did the agent resolve it in a reasonable number of exchanges? Score 1-5. Some agents ask for information they could find themselves. Some give vague answers that require follow-up. Fewer exchanges is better, as long as quality doesn't suffer. Weight at 15%.
Communication clarity: was the response easy to understand? Score 1-5. No jargon the customer wouldn't know. No ambiguous phrasing. Short sentences when possible. Weight at 15%.
Total: 100 points. Each criterion's 1-5 score is scaled to its weight, so a 4 on the 30% accuracy criterion is worth 24 points. An agent scoring 80+ is doing well. 60-79 needs coaching on specific areas. Below 60 needs a serious conversation.
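If you want the arithmetic spelled out, here's a minimal sketch of that calculation. The weights are the ones above; the criterion keys, function name, and example scores are just illustrative.

```python
# Weighted scorecard: each criterion is scored 1-5, then scaled by its weight.
WEIGHTS = {
    "accuracy": 30,
    "tone_and_empathy": 20,
    "completeness": 20,
    "efficiency": 15,
    "clarity": 15,
}  # sums to 100

def scorecard_total(scores: dict[str, int]) -> float:
    """Convert 1-5 scores into a 0-100 total using the weights above."""
    total = 0.0
    for criterion, weight in WEIGHTS.items():
        score = scores[criterion]
        if not 1 <= score <= 5:
            raise ValueError(f"{criterion} must be scored 1-5, got {score}")
        total += (score / 5) * weight
    return round(total, 1)

# Example: a correct and complete response that was a little cold and a little wordy.
print(scorecard_total({
    "accuracy": 5,
    "tone_and_empathy": 3,
    "completeness": 5,
    "efficiency": 4,
    "clarity": 4,
}))  # 86.0 -> clears the 80+ "doing well" bar
```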
How to Actually Use It
The scorecard is a coaching tool, not a punishment tool. The moment agents feel like it's being used to justify write-ups or firings, they stop caring about it.
Weekly reviews work best. Set aside 30 minutes. Pull 5 random tickets per agent. Score them. Share the scores with the agent individually, not publicly.
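If your helpdesk can export resolved tickets, pulling the weekly sample can be a few lines of code instead of a manual scroll. A sketch, assuming each exported ticket is a dict with an "agent" field; the field name and structure are assumptions, and the only real requirement is that the sample is random, not hand-picked.

```python
import random
from collections import defaultdict

def weekly_review_sample(tickets, per_agent=5, seed=None):
    """Pick up to `per_agent` random tickets for each agent.

    `tickets` is a list of dicts, each with at least an "agent" key
    (however your helpdesk exports them); the shape is illustrative.
    """
    rng = random.Random(seed)
    by_agent = defaultdict(list)
    for ticket in tickets:
        by_agent[ticket["agent"]].append(ticket)
    return {
        agent: rng.sample(agent_tickets, min(per_agent, len(agent_tickets)))
        for agent, agent_tickets in by_agent.items()
    }
```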
Focus on patterns, not individual tickets. If an agent scores low on empathy across four out of five reviews, that's a coaching opportunity. If they score low on one out of five, that might just be a bad day.
Calibration sessions matter too. Once a month, have all reviewers (usually you and a team lead) score the same 5 tickets independently, then compare. If you gave a ticket a 4 on tone and your lead gave it a 2, you need to align on what "good tone" looks like. Without calibration, the scores drift and become meaningless.
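Here's a small sketch of that calibration check: two reviewers score the same tickets, and any criterion where they land more than a point apart gets flagged for discussion. The data shapes and the one-point threshold are assumptions, not part of the framework above.

```python
def calibration_gaps(reviewer_a, reviewer_b, max_gap=1):
    """Compare two reviewers' 1-5 scores on the same tickets.

    Each argument maps ticket_id -> {criterion: score}. Returns
    (ticket_id, criterion, score_a, score_b) tuples that disagree by
    more than `max_gap` -- the ones worth talking through in the session.
    """
    gaps = []
    for ticket_id, scores_a in reviewer_a.items():
        scores_b = reviewer_b.get(ticket_id, {})
        for criterion, a in scores_a.items():
            b = scores_b.get(criterion)
            if b is not None and abs(a - b) > max_gap:
                gaps.append((ticket_id, criterion, a, b))
    return gaps
```

The "4 on tone versus 2 from your lead" example above is exactly what this surfaces: a two-point gap on the same ticket.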
Common Mistakes
Scoring every ticket. You'll burn out in a week. Random sampling gives you enough data to coach effectively. 5 per agent per week is the sweet spot for teams under 20.
Weighting speed too heavily. Speed matters, but it's already measured by your response time metric. The scorecard should focus on quality. An agent who responds in 2 minutes with a wrong answer shouldn't outscore an agent who responds in 10 minutes with the right one.
Making the criteria too subjective. "Was the response good?" is useless. "Did the agent correctly identify and resolve the customer's issue?" is measurable. Write criteria that two different reviewers would score similarly.
Not sharing results with agents. A scorecard that only management sees doesn't improve anything. Agents need to know what they're being evaluated on and how they're doing. Transparency builds trust and gives them something to work toward.
Ignoring positive scores. If an agent consistently scores 90+, tell them. Recognition costs nothing and keeps your best people from burning out.
What About AI-Assisted Quality?
If you're using AI to handle some support interactions, you need a separate quality framework for those. Have a human reviewer evaluate AI responses weekly, scoring them on accuracy and completeness.
When AI gets something wrong, that's not a coaching conversation. That's a system improvement: update the response, refine the classification, add an edge case handler.
The scorecard for human agents stays focused on the interactions AI can't handle, the complex, emotional, or ambiguous ones. These are inherently harder, so don't compare AI accuracy rates (which look great on simple tickets) to human scores on complex tickets. Apples and oranges.
A Simpler Alternative
If a full scorecard feels like overkill for your team size (and it might, if you have 1-2 agents), start with binary quality checks.
Read 3 random tickets per agent per week. For each one, ask two questions: "Was the answer correct?" and "Would I be happy receiving this response?" Yes or no. Track the percentage over time.
If you're hitting 90%+ on both, your quality is fine. If you're below 80% on either, you need to dig deeper, and that's when the full scorecard becomes worth it.
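Tracking this doesn't need tooling, but if you want it in code, here's a minimal sketch that turns the yes/no answers into the two percentages. The data shape is an assumption; the thresholds are the ones above.

```python
def weekly_quality_rates(checks):
    """`checks` holds one (correct, happy_to_receive) bool pair per reviewed ticket.

    Returns the percentage of "yes" answers for each question -- the number
    to watch over time (90%+ is fine, below 80% means dig deeper).
    """
    if not checks:
        return {"correct_pct": 0.0, "happy_pct": 0.0}
    n = len(checks)
    correct = sum(1 for c, _ in checks if c)
    happy = sum(1 for _, h in checks if h)
    return {
        "correct_pct": round(100 * correct / n, 1),
        "happy_pct": round(100 * happy / n, 1),
    }

# Example week: 6 tickets reviewed across two agents.
print(weekly_quality_rates([(True, True), (True, False), (True, True),
                            (False, True), (True, True), (True, True)]))
# {'correct_pct': 83.3, 'happy_pct': 83.3}
```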
The point isn't to build a perfect measurement system. The point is to look at your support regularly and have a consistent way to talk about what "good" means. Everything else is refinement.