Three Architectures — Which One Do You Need?
TalkifAI agents come in three types:
- Pipeline = voice agent: an assembly line of STT → LLM → TTS
- Realtime = voice agent: one provider handles audio end-to-end
- Text = chat-only agent: no audio, REST API + streaming text
Voice vs Text — The Key Rule
| Capability | Pipeline | Realtime | Text |
|---|---|---|---|
| Inbound phone calls | ✅ | ✅ | ❌ |
| Outbound calls | ✅ | ✅ | ❌ |
| Browser voice test | ✅ | ✅ | ❌ |
| Text Chat API | ✅ | ✅ | ✅ |
| Website chat widget | ✅ | ✅ | ✅ |
| Requires STT/TTS setup | ✅ | ❌ | ❌ |
| Lowest cost | ❌ | ❌ | ✅ |
Which to choose? If you need voice calls, pick Pipeline or Realtime. If you only need a website chat widget or text API, pick Text — it’s simpler and cheaper.
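The capability table can also be read as a simple lookup. The sketch below is illustrative only: the dictionary keys and the `architectures_for` helper are hypothetical names, not part of any TalkifAI API. It just encodes the rows of the table and filters out architectures that lack a required capability.

```python
# Capability matrix transcribed from the table above.
# All identifiers here are hypothetical and exist only for this example.
CAPABILITIES = {
    "inbound_calls": {"pipeline", "realtime"},
    "outbound_calls": {"pipeline", "realtime"},
    "browser_voice_test": {"pipeline", "realtime"},
    "text_chat_api": {"pipeline", "realtime", "text"},
    "website_chat_widget": {"pipeline", "realtime", "text"},
}


def architectures_for(required):
    """Return the architectures that support every required capability."""
    candidates = {"pipeline", "realtime", "text"}
    for capability in required:
        # Keep only architectures that also support this capability.
        candidates &= CAPABILITIES[capability]
    return sorted(candidates)


if __name__ == "__main__":
    print(architectures_for(["website_chat_widget"]))
    print(architectures_for(["inbound_calls", "text_chat_api"]))
```

If Text appears in the result, it is usually the one to pick, since the table marks it as the lowest-cost option.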
Pipeline Architecture
How it works
When to choose Pipeline
Mix Providers
Deepgram STT + Gemini LLM + Cartesia TTS — pick the best from each category.
Control Costs
Pair a cheaper STT with an affordable LLM to lower cost while keeping quality comparable.
Specific Voice
A particular ElevenLabs voice you love? Only possible with Pipeline.
First Time Building
Pipeline is more forgiving. Better starting point for new users.
Provider Options
- STT (Audio → Text)
- LLM (The Brain)
- TTS (Text → Audio)
| Provider | Best For | Accuracy |
|---|---|---|
| Deepgram Nova 2 ⭐ | Real-time, most languages | Excellent |
| OpenAI Whisper | High accuracy, non-English | Excellent |
| AssemblyAI | Accents, noisy environments | Very Good |
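To make the assembly-line idea concrete, here is a minimal sketch of a pipeline as three swappable stages. It assumes nothing about TalkifAI internals: the `Pipeline` class and the `fake_*` stages are hypothetical stand-ins, not real provider SDK calls. The point is only that each stage can be replaced independently, which is what makes mixing providers possible.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Pipeline:
    """Hypothetical model of a Pipeline agent: three independent stages."""
    stt: Callable[[bytes], str]   # audio -> transcript
    llm: Callable[[str], str]     # transcript -> reply text
    tts: Callable[[str], bytes]   # reply text -> audio

    def handle_turn(self, audio: bytes) -> bytes:
        # Each stage feeds the next, like stations on an assembly line.
        transcript = self.stt(audio)
        reply = self.llm(transcript)
        return self.tts(reply)


# Stand-in stages so the sketch runs without any provider account.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = Pipeline(stt=fake_stt, llm=fake_llm, tts=fake_tts)

if __name__ == "__main__":
    print(pipeline.handle_turn(b"hello"))
```

Swapping one provider for another corresponds to passing a different callable for that stage while leaving the other two untouched.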
Realtime Architecture
How it works
When to choose Realtime
Speed is Critical
Responses arrive in roughly 200–400ms, so the conversation feels natural.
Natural Interruptions
Users can interrupt the agent mid-sentence — just like a real conversation.
Already Using OpenAI/Google
If you already have an OpenAI or Google key, Realtime delivers the best value.
Premium Experience
The highest quality conversation — as close to human as current AI allows.
Realtime Providers
| Provider | Model | What Makes It Special |
|---|---|---|
| OpenAI Realtime | gpt-4o-realtime-preview | Best quality, full function calling support |
| Gemini Live | gemini-2.0-flash | Google’s latest, competitive pricing |
Text Architecture
How it works
When to choose Text
Website Chat Widget
Embed a chatbot on any website. No microphone required — users type their messages.
Lowest Cost
No STT or TTS costs. You only pay for LLM tokens — the cheapest option.
Mobile App Chat
Build in-app chat experiences where voice isn’t appropriate (e.g., in a meeting).
API-First Integrations
Integrate AI into your own app, CRM, or support portal via REST API.
Text Agent Providers
| Model | Provider | Best For |
|---|---|---|
| GPT-4o-mini ⭐ | OpenAI | Most use cases: fast and affordable |
| GPT-4o | OpenAI | Complex reasoning, nuanced responses |
| Gemini 2.0 Flash | Google | Fast, cost-effective alternative |
Text agents support all the same features as voice agents: custom functions, knowledge base, subagents, memory (Graphiti), and post-call analysis — except anything audio-related.
Decision Guide — Which One to Pick?
Side-by-Side Comparison
| Feature | 🔧 Pipeline | ⚡ Realtime |
|---|---|---|
| Response time | 500–800ms | 200–400ms |
| Provider choice | Any combination | OpenAI or Google only |
| Voice variety | Any TTS voice available | Provider’s built-in voices |
| Cost control | Fine-grained per component | Single provider pricing |
| Function calling | ✅ Fully supported | ✅ Supported (OpenAI Realtime) |
| Interruption handling | Good | Excellent |
| Own API key required | Optional | Required |
| Recommended for beginners | ✅ Yes | Not recommended |
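The comparison can be condensed into a small decision helper. This is a sketch under stated assumptions: the function name, its parameters, and the order in which the rules are applied are all hypothetical, and the 500ms threshold is simply the lower bound of the Pipeline response time listed in the table.

```python
def choose_voice_architecture(
    max_response_ms: int,
    needs_custom_voice: bool = False,
    needs_provider_mix: bool = False,
    has_own_api_key: bool = False,
) -> str:
    """Pick 'pipeline' or 'realtime' using the rows of the comparison table.

    Hypothetical helper; the rule ordering is an assumption for illustration.
    """
    # Only Pipeline offers any TTS voice and any provider combination.
    if needs_custom_voice or needs_provider_mix:
        return "pipeline"
    # Realtime requires your own API key; for Pipeline it is optional.
    if not has_own_api_key:
        return "pipeline"
    # Pipeline responds in roughly 500-800ms, so a tighter budget
    # points to Realtime (roughly 200-400ms).
    if max_response_ms < 500:
        return "realtime"
    return "pipeline"


if __name__ == "__main__":
    print(choose_voice_architecture(300, has_own_api_key=True))
    print(choose_voice_architecture(700, has_own_api_key=True))
```

Note that the sketch lets voice and provider requirements win over the latency budget; a real project might weigh these differently.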
Switching Architecture Later
You can change architecture after creation; nothing is locked in:
- Open Studio → Agent Settings
- Select the new architecture type
- Configure the required fields
- Save and re-activate
Next Steps
Create an Agent
Now that you understand architectures, build your agent.
Choose a Voice
TTS provider and voice selection guide.