The following is the virtual receptionist for a dental clinic, powered by GPT-5.1 (the model Retell AI suggests).
GPT-5.1 treats turn boundaries as a hard, prompt-driven property. It executes what you literally wrote. If a step in your prompt contains a multi-clause string ending in → end_call, GPT-5.1 reads that as one composite instruction: "say all of this, then end the call." It does this even when the prompt contains two explicit wait rules.
A voice agent that fires end_call prematurely loses the caller. There's no correction loop. The line is dead. The next time that customer needs your service, they remember the awkward AI experience and call your competitor.
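To make the failure concrete, here is a minimal sketch of an application-side safeguard that refuses to hang up while a question is still unanswered. It is an illustration only: `CallState` and `allow_end_call` are hypothetical names, not part of Retell's API, and "contains a question mark" is an assumed stand-in for "this utterance asked the caller something."

```python
from dataclasses import dataclass


@dataclass
class CallState:
    """Tracks whether the agent has asked something the caller has not answered yet."""
    awaiting_answer: bool = False

    def agent_says(self, text: str) -> None:
        # Assumption: any agent utterance containing "?" asks the caller a question,
        # even if more text (such as a goodbye) follows it in the same string.
        if "?" in text:
            self.awaiting_answer = True

    def caller_says(self, text: str) -> None:
        # Only actual caller speech clears the pending question.
        if text.strip():
            self.awaiting_answer = False


def allow_end_call(state: CallState) -> bool:
    """Return True only when no question is still waiting for the caller's reply."""
    return not state.awaiting_answer
```

With a composite line such as "Is there anything else I can help with? Thanks for calling, goodbye.", `allow_end_call` stays false until the caller has actually spoken, which is the behaviour the prompt was meant to produce.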
The following is the same virtual receptionist (exact same prompt) for a dental clinic, powered by Claude Sonnet 4.6 (the model Retell AI does NOT suggest).
Claude Sonnet 4.6 understands my instructions so well that the agent still works fine even if I remove the following key rule:
"Every question step requires a wait for the caller's answer before continuing."
This means you can write shorter, cleaner prompts on Sonnet without losing reliability, which is one of the genuine advantages of using a stronger model.
You can engineer around GPT-5.1's literal reading. It's just expensive in a different way.
To get GPT-5.1 to behave as reliably as Claude, I've ended up adding things like:
- Explicit "STOP. Wait for the caller's spoken response. Do not call any function until the caller has spoken." blocks on every question step.
- Dedicated terminal steps for end_call, with Key Rules restricting which step can trigger it.
- Branching logic written out explicitly because the model won't infer it.
- Round after round of test calls because edits to one part of the prompt destabilise behaviour elsewhere.
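Some of these guardrails can be checked mechanically before a test call is ever placed. The sketch below assumes the prompt has been split into named steps; the `Step` structure, the `lint_steps` helper, and the exact wait-rule wording are all hypothetical and do not reflect Retell's prompt format. It flags question steps that lack a wait rule and any `end_call` that sits outside a dedicated terminal step.

```python
from dataclasses import dataclass
from typing import List

# Assumed wording of the wait rule; adjust to whatever your prompt actually uses.
WAIT_RULE = "Wait for the caller's spoken response"


@dataclass
class Step:
    name: str
    text: str               # full instruction text for this step
    terminal: bool = False  # True only for the dedicated end_call step


def lint_steps(steps: List[Step]) -> List[str]:
    """Return a list of human-readable problems found in the prompt steps."""
    problems = []
    for step in steps:
        asks_question = "?" in step.text
        ends_call = "end_call" in step.text
        if asks_question and WAIT_RULE not in step.text:
            problems.append(f"{step.name}: question without an explicit wait rule")
        if ends_call and not step.terminal:
            problems.append(f"{step.name}: end_call outside a terminal step")
        if ends_call and asks_question:
            problems.append(f"{step.name}: question and end_call in the same step")
    return problems
```

A check like this does not replace test calls, but it may catch the composite "question plus end_call" pattern each time the prompt is edited.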
That's real engineering cost: hours of authoring time per agent, plus the ongoing maintenance of a brittle prompt. It doesn't appear in Retell's per-minute pricing, but it's not free.
Sonnet 4.6 typically needs none of this. The same flow runs in 30–40% fewer prompt tokens and works on the first deployment.
On Retell, GPT-5.1 runs at roughly $0.04 per minute. Claude Sonnet 4.6 runs at roughly $0.08 per minute. On paper, GPT wins on cost.
But in production, that's not always the case.
To get a cheaper model like GPT to behave as intended, you end up adding far more guardrails (which increases prompt size and, in turn, call cost) and far more testing. And there's still no guarantee the agent will behave as expected.
The 2x per-minute cost is real. The gap between 95%+ reliability and intermittent failure is bigger. A voice agent that doesn't understand the difference between a question and a closing tag isn't a cheaper voice agent. It's a broken one.
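If you want to run the numbers for your own agents, a rough model is to divide everything you spend (usage plus prompt engineering time) by the calls that actually complete. The function below is a minimal sketch of that arithmetic. The per-minute rates are the ones quoted above; every other input (call length, success rate, setup hours, hourly rate, call volume) is a placeholder you would replace with your own measurements.

```python
def cost_per_successful_call(rate_per_min, avg_minutes, success_rate,
                             setup_hours=0.0, hourly_rate=0.0, calls=1):
    """Total spend (usage + engineering time) divided by the calls that succeed."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    usage = rate_per_min * avg_minutes * calls   # per-minute platform charges
    engineering = setup_hours * hourly_rate      # prompt authoring and testing time
    successful_calls = calls * success_rate
    return (usage + engineering) / successful_calls


# Hypothetical inputs for illustration only, not measured results:
# 3-minute calls, 1,000 calls, $50/hour engineering time.
gpt = cost_per_successful_call(0.04, 3, success_rate=0.80,
                               setup_hours=10, hourly_rate=50, calls=1000)
claude = cost_per_successful_call(0.08, 3, success_rate=0.95,
                                  setup_hours=2, hourly_rate=50, calls=1000)
```

The point of the sketch is the shape of the calculation rather than any specific result: once failed calls and authoring hours are counted, the per-minute rate is only one term in the total.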
Don't take my word for it. Test it yourself.