rayvoc.ai


Time to First Audio (TTFA)

Time to first audio (TTFA) is the elapsed time between a caller finishing their utterance and hearing the first audible sound of the agent’s response. Measured voice-to-voice — from the caller’s last syllable to the agent’s first, at the caller’s ear — it is the latency metric that corresponds to what a person on the phone actually experiences, and therefore the only honest headline number for voice AI responsiveness.

TTFA is the sum of the latencies of every stage in the response path: endpointing (deciding the caller is done), speech recognition finalization, the LLM’s time-to-first-token, text-to-speech time-to-first-byte, and network/media transport in both directions. In a typical pipeline, the LLM’s time-to-first-token is the largest and most variable slice, but a fixed endpointing timeout or an extra network hop can quietly dominate the total.
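The additive breakdown can be sketched in a few lines of Python. This is a minimal illustration, not a description of any platform's instrumentation: the `TurnLatency` class, its field names, and the millisecond values are all hypothetical, chosen only to show how a fixed endpointing timeout can outweigh the LLM stage.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TurnLatency:
    """Per-stage latencies for one conversational turn, in milliseconds.

    Field names are hypothetical; they mirror the stages listed above.
    """
    endpointing_ms: float        # deciding the caller is done
    asr_finalization_ms: float   # speech recognition finalization
    llm_ttft_ms: float           # LLM time-to-first-token
    tts_ttfb_ms: float           # text-to-speech time-to-first-byte
    transport_ms: float          # network/media transport, both directions

    def ttfa_ms(self) -> float:
        """TTFA is the sum of every stage in the response path."""
        return sum(getattr(self, f.name) for f in fields(self))

    def dominant_stage(self) -> str:
        """Name of the stage contributing the most latency."""
        return max(fields(self), key=lambda f: getattr(self, f.name)).name


# Illustrative numbers only: a "300 ms model" behind a 500 ms endpointing timeout.
turn = TurnLatency(
    endpointing_ms=500.0,
    asr_finalization_ms=80.0,
    llm_ttft_ms=300.0,
    tts_ttfb_ms=120.0,
    transport_ms=150.0,
)
print(f"TTFA: {turn.ttfa_ms():.0f} ms, dominated by {turn.dominant_stage()}")
```

With these assumed values the turn takes 1,150 ms even though the model itself responds in 300 ms, and the largest single contributor is endpointing rather than the LLM.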

The benchmarks that matter: humans leave roughly 200–300 ms between conversational turns; responses beyond about one second feel robotic, and callers start saying “hello?” or hanging up. Measured platform TTFAs in 2026 range from roughly 600 ms at the fast end to 950–1,450 ms for common configurations. Speech-to-speech models compress the pipeline: Grok’s voice model measures around 780 ms TTFA, versus roughly 1,490 ms for GPT-4o Realtime.

Why it matters for voice agents

Vendors quote partial latencies — model time-to-first-token, TTS synthesis speed, “API latency” — that each describe one stage while ignoring the rest. A platform can advertise a 300ms model and still deliver 1.5-second turns. TTFA closes that loophole: it includes everything, in the order the caller experiences it, including the telephone network.

Measurement discipline matters as much as the metric. TTFA should be measured over real phone calls (browser demos skip the carrier leg and flatter results), with production-shaped prompts, and reported as p50 and p95 — never the average — because the long tail of slow turns is what callers remember. Platforms that show per-call, per-stage TTFA breakdowns let you find regressions with data instead of vibes.
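To see why percentiles are preferred over the average, here is a small sketch that computes p50 and p95 with the nearest-rank method. The `percentile` helper and the sample values are assumptions made for illustration; they are not measurements from any real platform, and other percentile definitions (for example, interpolated ones) would give slightly different numbers.

```python
import math
from statistics import mean


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical per-turn TTFA samples in milliseconds, with one slow outlier.
ttfa_ms = [620, 640, 650, 660, 680, 700, 720, 750, 900, 2400]

p50 = percentile(ttfa_ms, 50)
p95 = percentile(ttfa_ms, 95)
avg = mean(ttfa_ms)
print(f"p50={p50} ms  p95={p95} ms  mean={avg} ms")
```

In this made-up sample the mean stays under one second, while p95 exposes the 2.4-second turn that a caller would actually remember.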
