If, for whatever reason, you cannot reduce the token size of your voice agent prompt but still want to reduce its latency, experiment with a different TTS engine.
Consider the following voice agent:

Let's call this agent ‘Agent A’.
The estimated latency of ‘Agent A’ is 1070-1250ms.
It uses the ‘Gemini 3.1 Flash Lite’ model, its token size is 3593-5733 tokens, and it costs $0.089/min.
‘Agent A’ uses the ‘GPT-4o mini TTS’ engine:

Create a copy of ‘Agent A’ and call it ‘Agent B’.
Change the TTS engine of ‘Agent B’ to ‘Elevenlabs Flash V2’:

The estimated latency of ‘Agent B’ is now 820-1000ms (down from 1070-1250ms).
It still uses the ‘Gemini 3.1 Flash Lite’ model, its token size is still 3593-5733 tokens, but it now costs $0.114/min instead of $0.089/min.

So just by changing the TTS engine, you reduced the estimated latency by about 250ms, at an extra cost of $0.025/min.
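The trade-off can be put into numbers. The following is a minimal sketch that uses only the figures quoted for the two agents. The `AgentEstimate` class and its field names are hypothetical, made up for illustration, and are not part of any voice platform's API.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class AgentEstimate:
    # Hypothetical container for the estimates shown in the agent editor.
    name: str
    latency_ms: Tuple[int, int]  # (low, high) estimated latency in ms
    cost_per_min: float          # estimated cost in USD per minute


def compare(before: AgentEstimate, after: AgentEstimate) -> Dict[str, object]:
    """Return latency saved (low, high) and extra cost when moving from before to after."""
    saved_low = before.latency_ms[0] - after.latency_ms[0]
    saved_high = before.latency_ms[1] - after.latency_ms[1]
    # Round to avoid floating-point noise in the cost difference.
    extra_cost = round(after.cost_per_min - before.cost_per_min, 3)
    return {
        "latency_saved_ms": (saved_low, saved_high),
        "extra_cost_per_min": extra_cost,
    }


agent_a = AgentEstimate("Agent A", (1070, 1250), 0.089)
agent_b = AgentEstimate("Agent B", (820, 1000), 0.114)

result = compare(agent_a, agent_b)
print(result)
```

With these figures, both ends of the latency range drop by 250ms, and the cost rises by $0.025/min. Whether that trade is worth it depends on your call volume and how sensitive your callers are to delay.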
The TTS engine is the underlying system that converts text into speech.
Think of it as the technology: Elevenlabs Flash V2, OpenAI's GPT-4o mini TTS, Cartesia, PlayHT, and so on.
Each engine has its own architecture, latency profile, streaming behaviour, and pronunciation quality.
The engine is what determines how fast audio starts, how smoothly it streams, and roughly what it costs per minute.
The voice is a specific character (such as "Grace", "Olivia", a male British voice, or a young American female voice) speaking through that engine. The voice defines the tone, accent, pitch, and personality.
The same TTS engine can offer dozens of different voices, and a similar-sounding voice often exists across multiple engines, but it won't behave identically because the engine underneath is different.
When I compared the two agents, I wasn't really comparing two voices. Both used ‘Grace’. I was comparing two TTS engines that each host their own version of ‘Grace’.
The ‘Grace’ on ‘GPT-4o mini TTS’ and the ‘Grace’ on ‘Elevenlabs Flash V2’ are different underlying voices that share a name and persona.
The latency, streaming behaviour, and cost differences came from the engine swap, not from any change in voice character.
If you ever want to genuinely compare two voices, keep the engine the same and change only the voice within it.
If you want to compare engines, keep the voice persona the same (as I did) and swap the engine.
If you make both changes at once, you can't tell which one caused the result.
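The one-variable-at-a-time rule can be checked mechanically before you run a comparison. This is a minimal sketch, assuming each agent is described by a plain dictionary. The keys `llm`, `tts_engine`, and `voice` are hypothetical names chosen for illustration, not fields from any real platform.

```python
from typing import Dict, List


def changed_fields(a: Dict[str, str], b: Dict[str, str]) -> List[str]:
    """Return the sorted list of settings that differ between two agent configs."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


def classify_comparison(a: Dict[str, str], b: Dict[str, str]) -> str:
    """Label a comparison by what changed between the two agents."""
    diff = changed_fields(a, b)
    if not diff:
        return "identical"
    if diff == ["tts_engine"]:
        return "engine comparison"
    if diff == ["voice"]:
        return "voice comparison"
    # More than one setting changed, so the cause of any difference is unclear.
    return "confounded: " + ", ".join(diff)


# Hypothetical configs mirroring the two agents in this article.
agent_a = {"llm": "Gemini 3.1 Flash Lite", "tts_engine": "GPT-4o mini TTS", "voice": "Grace"}
agent_b = {"llm": "Gemini 3.1 Flash Lite", "tts_engine": "Elevenlabs Flash V2", "voice": "Grace"}

# A third, made-up config that changes engine and voice together.
agent_c = {"llm": "Gemini 3.1 Flash Lite", "tts_engine": "Elevenlabs Flash V2", "voice": "Olivia"}

print(classify_comparison(agent_a, agent_b))
print(classify_comparison(agent_a, agent_c))
```

The first comparison changes only the engine, so any latency or cost difference can be attributed to it. The second changes both the engine and the voice, so it is flagged as confounded.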