Purpose-Built Models vs General LLMs: Why Specialization Wins for Support
A model trained on 315 support intents outperforms GPT-4 for support classification. Here is why, and when general models still have a place.
The Specialist vs The Generalist
A general-purpose LLM is a brilliant generalist. It can discuss philosophy, write SQL queries, and draft marketing copy. It has read the entire internet. But when you ask it to classify a customer support message into one of 315 specific intents, it is working outside its comfort zone.
A purpose-built classification model does one thing: it reads a support message and tells you what the customer wants. It was trained on hundreds of thousands of labeled support messages. Every weight in the model is optimized for this single task.
The difference in performance is significant.
Why Specialization Wins
Accuracy. A purpose-built classifier hits 92% accuracy on support messages out of the box. A general LLM with a good prompt might hit 80 to 85%. That 7-to-12-point gap is the difference between "mostly works" and "reliably works."
Speed. Classification takes 50 to 200ms. LLM inference takes 2 to 8 seconds. For a customer waiting for a response, that gap matters. For a system processing thousands of messages in batch, it is the difference between minutes and hours.
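The batch claim is easy to check with back-of-envelope arithmetic using the per-message latencies quoted above (the 10,000-message batch size is an illustrative assumption):

```python
# Back-of-envelope batch latency at the per-message figures quoted above:
# ~100 ms per classification vs ~5 s per LLM call, over a 10,000-message batch.
BATCH_SIZE = 10_000

classifier_seconds = BATCH_SIZE * 0.1  # ~100 ms per message
llm_seconds = BATCH_SIZE * 5           # ~5 s per message

print(f"Classifier: {classifier_seconds / 60:.0f} minutes")  # ~17 minutes
print(f"LLM:        {llm_seconds / 3600:.1f} hours")         # ~13.9 hours
```

Even at the classifier's slow end and the LLM's fast end, the batch finishes in minutes rather than hours.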
Cost. A small classification model runs on modest hardware. A large language model requires expensive GPU compute. At scale, this translates directly to cost per message.
Predictability. A classifier returns one of 315 defined intents. It cannot hallucinate a new intent. It cannot give a wrong but confident-sounding answer. If it is not sure, the confidence score tells you, and you route to a human. This predictability makes it safe to automate actions based on the result.
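The confidence-based routing described above can be sketched in a few lines. This is a hypothetical illustration: `Classification` and the 0.85 threshold are assumptions you would replace with your own model's output type and a threshold tuned on your own data.

```python
# Hypothetical sketch of confidence-based routing. The 0.85 threshold is an
# assumption; tune it against your own precision/recall requirements.
from dataclasses import dataclass

@dataclass
class Classification:
    intent: str        # always one of the defined intents -- never a new one
    confidence: float  # 0.0 to 1.0

def route(result: Classification, threshold: float = 0.85) -> str:
    """Automate only when the model is confident; otherwise escalate."""
    if result.confidence >= threshold:
        return f"automate:{result.intent}"
    return "human_review"

print(route(Classification("password_reset", 0.97)))  # automate:password_reset
print(route(Classification("refund_request", 0.52)))  # human_review
```

Because the intent field can only hold a defined label, downstream automation never has to parse free-form text or guard against an invented category.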
Consistency. Same input, same output. Every time. There is no prompt engineering to maintain, no temperature settings to tune, no version changes that break your workflow.
The 315 Intents
What does a purpose-built support model actually know? Here is a sampling of the 315 intents it classifies:
- password_reset: Customer wants to reset or change their password
- refund_request: Customer wants money back for a purchase
- order_tracking: Customer wants to know where their order is
- subscription_cancel: Customer wants to cancel their subscription
- bug_report: Customer is reporting a software bug
- feature_request: Customer is suggesting a new feature
- pricing_inquiry: Customer wants to know about pricing
- payment_failure: Customer's payment did not go through
- account_deletion: Customer wants to delete their account
- shipping_delay: Customer's order is late
- two_factor_setup: Customer needs help with 2FA
- invoice_request: Customer needs an invoice copy
- data_export: Customer wants to export their data
Each intent maps to specific categories (billing, technical support, account management, etc.) and can trigger specific automated actions.
Where General LLMs Still Win
Purpose-built models are not the answer for everything:
Complex troubleshooting. When a customer describes a multi-step technical issue, a general LLM can reason about the problem and suggest solutions. A classifier just tells you the intent is technical_issue.
Response generation. When you need to draft a personalized reply, a general LLM writes better than any template. Use it to draft responses for human review.
Knowledge base search. LLMs are excellent at finding relevant documentation and summarizing it for the customer.
Multi-turn conversations. When the customer needs to go back and forth to resolve something, a conversational model handles that better than a single-shot classifier.
The Right Architecture
Use both, in the right order:
1. Classifier first. Every message gets classified. Fast, cheap, reliable. This determines the intent and confidence.
2. Rules engine second. High-confidence classifications trigger automated actions. No LLM needed.
3. LLM third (optional). Low-confidence messages or complex intents get passed to an LLM for response drafting. A human reviews before sending.
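The three steps can be sketched as one pipeline function. Everything here is a hypothetical stand-in: `classify_intent` and `draft_with_llm` are stubs for your classifier and LLM clients, and the threshold and action table are assumptions.

```python
# Sketch of the classifier-first pipeline. classify_intent and draft_with_llm
# are stubs standing in for real model calls; thresholds and actions are assumed.
AUTOMATABLE = {"password_reset": "send_reset_link", "order_tracking": "send_tracking_status"}
CONFIDENCE_THRESHOLD = 0.85

def classify_intent(text: str) -> tuple[str, float]:
    # Stub: a real system calls the purpose-built classifier here.
    return ("password_reset", 0.97) if "password" in text.lower() else ("other", 0.40)

def draft_with_llm(text: str, intent: str) -> str:
    # Stub: a real system calls an LLM API here, only for the cases that need it.
    return f"Draft reply for intent '{intent}' (pending human review)"

def handle_message(text: str) -> dict:
    intent, confidence = classify_intent(text)                        # 1. classify everything
    if confidence >= CONFIDENCE_THRESHOLD and intent in AUTOMATABLE:
        return {"route": "automated", "action": AUTOMATABLE[intent]}  # 2. rules engine
    return {"route": "human_review",                                  # 3. LLM drafts, human reviews
            "draft": draft_with_llm(text, intent)}
```

The LLM sits at the end of the funnel, so it is only invoked for the minority of messages the cheap, fast path cannot handle.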
This architecture gives you the speed and reliability of a specialized model for the majority of messages, with the flexibility of a general model for the edge cases.