rayvoc.ai


Your models, our wiring. Swap any stage without rewriting the agent.

Point Rayvoc at any OpenAI-compatible endpoint — OpenAI, Anthropic, Grok, vLLM-hosted open source, your fine-tunes. Bring your own TTS and STT, or use the managed defaults. The agent stays the same; only the config changes.

Any OpenAI-compatible LLM endpoint
3 swappable stages: STT, LLM, TTS
0 code changes to switch models
Lower per-minute rate on BYO keys

One interface, every model

The OpenAI chat completions API has become the de facto wire protocol for LLMs — OpenAI ships it, Anthropic and Grok expose compatible endpoints, and vLLM serves any open-weight model behind it. Rayvoc builds on that: give the platform a base_url and an API key, and your agent streams against that endpoint. A hosted frontier model, a fine-tune trained on your transcripts, or a Llama variant on GPUs you control all plug in identically.
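To show what "a base_url and an API key" amounts to on the wire, here is a minimal Python sketch. It assumes the standard OpenAI-style /chat/completions path, Bearer-token auth, and server-sent-event streaming. It is not Rayvoc's client: build_chat_request and iter_text_deltas are hypothetical helpers, and the HTTP call itself is left out so the example runs offline.

```python
import json
from typing import Iterable, Iterator


def build_chat_request(base_url: str, api_key: str, model: str,
                       messages: list[dict], temperature: float = 0.4) -> tuple[str, dict, bytes]:
    """Return (url, headers, body) for a streaming chat completions call."""
    url = base_url.rstrip("/") + "/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "stream": True,  # ask the server for server-sent events
    }).encode("utf-8")
    return url, headers, body


def iter_text_deltas(sse_lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from OpenAI-style SSE 'data:' lines."""
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

To actually send it, pass url, body, and headers to urllib.request.Request with method="POST" and iterate over the response, decoding each line. A hosted model and a vLLM server that both implement this streaming format return the same chunk shape, which is why they plug in identically.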

The speech stages are equally open. In pipeline mode — streaming STT → LLM → TTS — you can bring your own recognition and synthesis engines or use Rayvoc’s managed defaults, independently per stage. Or skip the pipeline entirely with speech-to-speech mode on Grok voice models, which also power our 20-language multilingual agents. The how-it-works page covers both modes in depth.
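Per-stage independence can be pictured as three narrow interfaces. The Python sketch below is a simplification: it processes one turn at a time instead of streaming, and the STT, LLM, TTS, and Pipeline names are hypothetical, not part of any Rayvoc SDK.

```python
from dataclasses import dataclass
from typing import Protocol


class STT(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LLM(Protocol):
    def reply(self, text: str) -> str: ...


class TTS(Protocol):
    def synthesize(self, text: str) -> bytes: ...


@dataclass
class Pipeline:
    # Each field can be a managed default or your own engine, chosen independently.
    stt: STT
    llm: LLM
    tts: TTS

    def turn(self, audio: bytes) -> bytes:
        """One conversational turn: caller audio in, agent audio out."""
        text = self.stt.transcribe(audio)
        answer = self.llm.reply(text)
        return self.tts.synthesize(answer)
```

Swapping a stage means handing the pipeline a different object for that one field; the other two stages and the turn logic stay as they are.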

agent-models.json
{
  "llm": {
    "provider": "openai_compatible",
    "base_url": "https://vllm.internal.acme.com/v1",
    "api_key_ref": "secret://acme-vllm-key",
    "model": "acme-support-ft-v3",
    "temperature": 0.4
  },
  "stt": { "provider": "managed_default" },
  "tts": {
    "provider": "byo",
    "base_url": "https://tts.acme.com/v1",
    "voice": "nova-warm"
  }
}
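A config like this is easy to sanity-check before deploying. The sketch below assumes required fields per provider that we inferred from the example above (for instance, that openai_compatible needs base_url, api_key_ref, and model) and that keys are always given as secret:// references; Rayvoc's actual schema may differ, and validate_models_config is a hypothetical helper.

```python
import json

# Assumed required fields per provider, inferred from the example config;
# the real schema may require more or fewer.
REQUIRED = {
    "openai_compatible": ("base_url", "api_key_ref", "model"),
    "byo": ("base_url",),
    "managed_default": (),
}
STAGES = ("stt", "llm", "tts")


def validate_models_config(text: str) -> list[str]:
    """Return a list of problems found in an agent-models.json document."""
    config = json.loads(text)
    problems = []
    for stage in STAGES:
        spec = config.get(stage)
        if not isinstance(spec, dict):
            problems.append(f"{stage}: missing stage block")
            continue
        provider = spec.get("provider")
        if provider not in REQUIRED:
            problems.append(f"{stage}: unknown provider {provider!r}")
            continue
        for field in REQUIRED[provider]:
            if field not in spec:
                problems.append(f"{stage}: {provider} requires {field!r}")
        # Assumption: raw keys should never appear inline in the config.
        ref = spec.get("api_key_ref")
        if ref is not None and not str(ref).startswith("secret://"):
            problems.append(f"{stage}: api_key_ref should be a secret:// reference")
    return problems
```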

Model lock-in is a real liability

Model leadership flips every few months. The best price-performance for a support agent today may come from a different provider next quarter, and a platform that hard-wires you to one vendor’s models turns every flip into a migration project. Bringing your own models is insurance: when a better or cheaper model ships, you change two lines of config, run the same calls, and compare.

Comparison is the part most platforms can’t actually deliver, because they don’t show you where time goes. Rayvoc records a per-stage latency waterfall on every call (transport, recognition, model time-to-first-token, and synthesis time-to-first-audio), so an A/B test between two LLM providers is a data question, not a vibe check. If your self-hosted endpoint is adding 400ms of time-to-first-token, the dashboard says so. Our latency guide explains how to read the waterfall, and our low-latency voice AI page covers what we do on our side of it.
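As an illustration of that A/B comparison, the sketch below takes per-call waterfall records and reports the per-stage median difference between two LLM providers. The record fields (llm_provider, transport_ms, stt_ms, llm_ttft_ms, tts_ttfa_ms) are hypothetical, not Rayvoc's export format. Medians are used because a handful of slow calls would skew a mean.

```python
from statistics import median

# Hypothetical field names for the four waterfall stages.
WATERFALL_STAGES = ("transport_ms", "stt_ms", "llm_ttft_ms", "tts_ttfa_ms")


def stage_medians(calls: list[dict], provider: str) -> dict[str, float]:
    """Median of each waterfall stage over the calls that used `provider`."""
    rows = [c for c in calls if c["llm_provider"] == provider]
    if not rows:
        raise ValueError(f"no calls for provider {provider!r}")
    return {s: median(r[s] for r in rows) for s in WATERFALL_STAGES}


def compare(calls: list[dict], a: str, b: str) -> dict[str, float]:
    """Per-stage median difference, b minus a (positive means b is slower)."""
    ma, mb = stage_medians(calls, a), stage_medians(calls, b)
    return {s: mb[s] - ma[s] for s in WATERFALL_STAGES}
```

If only the self-hosted endpoint's time-to-first-token is higher, compare returns a positive llm_ttft_ms and roughly zero for the other stages, which isolates the slowdown to the model rather than to transport or speech.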

Mix and match, including the telecom layer

Openness runs through the whole stack. Just as you can bring models, you can bring your own carrier over SIP — or use Rayvoc-native numbers in 100+ countries. Tool calling works the same against any model that supports JSON-schema functions, so webhooks and integrations survive a model swap untouched.
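Here is what that looks like in the OpenAI chat-completions format: a tool declared with a JSON Schema parameters object, and a dispatcher that executes the model's tool_calls and builds the role "tool" replies. The lookup_order tool and its handler are hypothetical stand-ins for a webhook. The point is that nothing in the dispatcher depends on which model produced the call, assuming the endpoint returns tool calls in this format.

```python
import json
from typing import Any, Callable

# A tool definition in the OpenAI "tools" format; lookup_order is hypothetical.
LOOKUP_ORDER = {
    "type": "function",
    "function": {
        "name": "lookup_order",
        "description": "Fetch the status of a customer order.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}


def lookup_order(order_id: str) -> dict:
    # Stand-in for a webhook call to your order system.
    return {"order_id": order_id, "status": "shipped"}


HANDLERS: dict[str, Callable[..., Any]] = {"lookup_order": lookup_order}


def run_tool_calls(assistant_message: dict) -> list[dict]:
    """Execute each tool call and return the 'tool' messages to send back."""
    results = []
    for call in assistant_message.get("tool_calls", []):
        fn = call["function"]
        args = json.loads(fn["arguments"])  # arguments arrive as a JSON string
        output = HANDLERS[fn["name"]](**args)
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(output),
        })
    return results
```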

Pay for what you actually use

With your own keys, inference is billed by your providers at your negotiated rates, and Rayvoc charges a lower per-minute platform rate. Teams with existing OpenAI or Anthropic commitments get to spend them. The exact numbers are on the pricing page, and the reasoning behind them is in our guide, Voice AI pricing explained.
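Because your total on BYO keys also includes what your providers bill you, it is worth running the arithmetic on your own traffic. The sketch below compares the two plans; every rate is a placeholder you supply (none are Rayvoc's published prices), and it works in integer cents to avoid floating-point rounding. compare_plans is a hypothetical helper.

```python
def compare_plans(minutes: int, managed_rate_cents: int, byo_rate_cents: int,
                  provider_cents_per_min: int) -> dict[str, int]:
    """Monthly cost in cents: managed stack vs. bring-your-own keys.

    managed_rate_cents: platform rate per minute with inference included.
    byo_rate_cents: platform rate per minute when you bring your own keys.
    provider_cents_per_min: what your own providers bill you per minute.
    """
    managed = minutes * managed_rate_cents
    byo = minutes * (byo_rate_cents + provider_cents_per_min)
    return {
        "managed": managed,
        "byo": byo,
        # BYO stays cheaper while provider spend per minute is below this value.
        "break_even_provider_cents_per_min": managed_rate_cents - byo_rate_cents,
    }
```

With placeholder rates of 10 and 6 cents and 3 cents of provider spend per minute, 1,000 minutes come to 10,000 versus 9,000 cents; at 5 cents of provider spend per minute the comparison would flip.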

Frequently asked questions

Which LLMs can I use with Rayvoc?

Anything that speaks the OpenAI API: OpenAI and Anthropic models, Grok, open-source models served from vLLM or similar hosts, and your own fine-tunes. You provide a base URL and key; Rayvoc streams to it. Grok voice models are also supported natively in speech-to-speech mode.

Can I bring my own TTS and STT engines too?

Yes. Each stage of the pipeline — STT, LLM, TTS — is independently swappable. Use your own engines for all three, use the managed defaults for all three, or mix: many teams bring a fine-tuned LLM and keep the managed speech stack.

Does switching models mean rebuilding my agent?

No. Prompts, tool definitions, telephony routing, and analytics live at the agent level, not the model level. Swapping the LLM is a configuration change — change the base URL and model name, and the agent keeps its phone numbers, tools, and history.
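The split between agent-level and model-level state can be sketched with two immutable Python dataclasses. The field names are hypothetical and far smaller than a real agent; the point is that swap_llm rebuilds only the llm block, so the prompt, tools, and phone numbers carry over unchanged. An optional api_key_ref covers the case where the new endpoint also needs a different key.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class LLMConfig:
    base_url: str
    model: str
    api_key_ref: str


@dataclass(frozen=True)
class Agent:
    # Agent-level state: untouched by a model swap.
    prompt: str
    tools: tuple[str, ...]
    phone_numbers: tuple[str, ...]
    # Model-level state: the only part a swap rewrites.
    llm: LLMConfig


def swap_llm(agent: Agent, base_url: str, model: str,
             api_key_ref: Optional[str] = None) -> Agent:
    """Return a copy of the agent pointed at a new endpoint and model."""
    new_llm = replace(
        agent.llm,
        base_url=base_url,
        model=model,
        # Keep the existing key reference unless a new one is given.
        api_key_ref=api_key_ref if api_key_ref is not None else agent.llm.api_key_ref,
    )
    return replace(agent, llm=new_llm)
```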

Is BYO cheaper than the managed stack?

Yes — when you bring your own model keys, you pay your providers directly for inference, and Rayvoc charges a lower per-minute platform rate since managed inference is not included. The pricing page has both rates side by side.

Run your stack, not ours

Every account starts with a 14-day free trial — 1 concurrent channel, a real phone number, and full platform access.