Overview

The Voice Agent Runtime is a Python microservice that handles all real-time voice processing. It's separate from TalkifAI Studio (the web app) and runs on a dedicated Google Cloud VM for performance.

Repository: Livekit-Production-Agent
Stack: Python, FastAPI, LiveKit Agents SDK

Key Responsibilities

| Function | Implementation |
| --- | --- |
| LiveKit agent workers | entrypoint.py — Agent entry point |
| STT/LLM/TTS orchestration | providers/ — Provider integrations |
| Session lifecycle | session/ — Session creation and management |
| Function calling | tools/ — Custom and built-in tools |
| Recordings | services/recording_service.py → GCS |
| Transcripts | services/transcription_service.py |
| Noise cancellation | rnnoise_wrapper.py |
| Post-call analysis | services/analysis_service.py |
| Billing | services/billing_service.py |
| REST API | main.py — FastAPI on port 8000 |

Architecture

LiveKit Server

      │  Agent Worker connects

┌─────────────────────────────────┐
│      entrypoint.py              │
│      (LiveKit Agent Worker)     │
│                                 │
│  Session Factory                │
│  ┌──────────────────────────┐   │
│  │  VoiceSession            │   │
│  │  ┌────────┐ ┌─────────┐  │   │
│  │  │  STT   │ │  LLM    │  │   │
│  │  └────────┘ └─────────┘  │   │
│  │  ┌────────┐ ┌─────────┐  │   │
│  │  │  TTS   │ │ Tools   │  │   │
│  │  └────────┘ └─────────┘  │   │
│  └──────────────────────────┘   │
└─────────────────────────────────┘


  FastAPI REST API (port 8000)
  ├── /agent    — Agent config endpoints
  ├── /chat     — Text-based chat
  ├── /batch    — Batch calling system
  ├── /sip      — BYOC/telephony
  └── /recording — Recording management

Session Lifecycle

1. User joins LiveKit room


2. Agent worker spawned (entrypoint.py)


3. Agent fetches config from Studio API
   (agent ID, prompt, providers, functions)


4. Session initialized
   ├── STT pipeline started
   ├── LLM context loaded (with memory if configured)
   ├── TTS provider initialized
   └── Tools registered


5. Live conversation loop
   ├── STT: User audio → transcript
   ├── LLM: Transcript → response (+ optional function calls)
   └── TTS: Response → audio


6. Session end (user hangs up / end_call tool)
   ├── Recording uploaded to GCS
   ├── Transcript saved to DB
   ├── Post-call analysis triggered (if configured)
   └── Billing event sent to Billing Service
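
The six steps above can be sketched as plain Python. This is a stdlib-only illustration of the control flow; the class and method names are placeholders, not the real implementation in session/ and entrypoint.py:

```python
import asyncio
from dataclasses import dataclass, field

@dataclass
class VoiceSession:
    """Illustrative stand-in for the runtime's session object."""
    agent_id: str
    events: list = field(default_factory=list)

    async def start(self, config: dict) -> None:
        # Step 4: initialize STT, LLM context, TTS, and tools.
        for step in ("stt", "llm", "tts", "tools"):
            self.events.append(f"init:{step}")

    async def handle_turn(self, user_audio: bytes) -> str:
        # Step 5: one iteration of the conversation loop.
        transcript = f"<transcript of {len(user_audio)} bytes>"  # STT
        response = f"reply to {transcript}"                      # LLM
        self.events.append("turn")
        return response                                          # handed to TTS

    async def close(self) -> None:
        # Step 6: teardown side effects.
        for step in ("recording", "transcript", "analysis", "billing"):
            self.events.append(f"end:{step}")

async def run_demo() -> VoiceSession:
    session = VoiceSession(agent_id="demo")
    await session.start(config={})
    await session.handle_turn(b"\x00" * 320)  # one fake audio frame
    await session.close()
    return session
```

The key property the real runtime shares with this sketch is ordering: teardown (recording upload, transcript save, analysis, billing) always runs after the conversation loop ends, regardless of how the session terminated.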

Provider Integrations

STT Providers (providers/stt.py)

| Provider | Model | Characteristics |
| --- | --- | --- |
| Deepgram | Nova 2 | Best real-time accuracy |
| OpenAI | Whisper | High accuracy, slightly slower |
| AssemblyAI | Universal 2 | Strong with accents |

LLM Providers (providers/llm.py)

| Provider | Models |
| --- | --- |
| OpenAI | gpt-4o, gpt-4o-mini, gpt-4-turbo |
| Google | gemini-1.5-pro, gemini-1.5-flash, gemini-2.0-flash |

TTS Providers (providers/voice.py)

| Provider | Characteristics |
| --- | --- |
| Cartesia Sonic | Low latency, natural |
| OpenAI TTS | High quality, multiple voices |
| ElevenLabs | Most natural, emotion-aware |
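
A common pattern for orchestrating swappable providers like these is a registry keyed by the provider name from the agent config. A stdlib-only sketch (the class names are placeholders; the real classes in providers/ wrap the vendor SDKs):

```python
from typing import Callable, Dict

# Placeholder provider classes; the real ones wrap vendor SDKs.
class DeepgramSTT:
    name = "deepgram"

class WhisperSTT:
    name = "openai"

STT_REGISTRY: Dict[str, Callable[[], object]] = {
    "deepgram": DeepgramSTT,
    "openai": WhisperSTT,
}

def build_stt(provider: str):
    """Instantiate an STT provider from its config name, failing loudly on typos."""
    try:
        return STT_REGISTRY[provider]()
    except KeyError:
        raise ValueError(f"unknown STT provider: {provider}")
```

The same lookup shape works for LLM and TTS providers, so the session factory only ever deals with strings from the agent config.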

Batch Calling System

The runtime includes a Redis-backed batch calling system (batch_system/):
  • ARQ for async job processing
  • Two-level concurrency: Org limit + Batch limit
  • Auto-retry: 3 attempts per call
  • Scheduling: IANA timezone support
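
The two-level concurrency check can be sketched with in-memory counters standing in for the Redis-backed ones (limits and method names here are illustrative):

```python
class ConcurrencyGate:
    """Admit a call only if both the org-wide and per-batch limits allow it."""

    def __init__(self, org_limit: int, batch_limit: int):
        self.org_limit = org_limit
        self.batch_limit = batch_limit
        self.org_active = 0
        self.batch_active: dict = {}

    def try_acquire(self, batch_id: str) -> bool:
        # Level 1: org-wide cap across all batches.
        if self.org_active >= self.org_limit:
            return False
        # Level 2: per-batch cap.
        if self.batch_active.get(batch_id, 0) >= self.batch_limit:
            return False
        self.org_active += 1
        self.batch_active[batch_id] = self.batch_active.get(batch_id, 0) + 1
        return True

    def release(self, batch_id: str) -> None:
        self.org_active -= 1
        self.batch_active[batch_id] -= 1
```

In production both counters would live in Redis so that all ARQ workers share the same view; the acquire step would need to be atomic (e.g. a Lua script) rather than two separate reads.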

Environment Variables

# Core
DATABASE_URL=            # PostgreSQL connection
LIVEKIT_URL=             # LiveKit server WebSocket URL
LIVEKIT_API_KEY=         # LiveKit API key
LIVEKIT_API_SECRET=      # LiveKit API secret

# AI Providers
OPENAI_API_KEY=
GOOGLE_API_KEY=
DEEPGRAM_API_KEY=
CARTESIA_API_KEY=

# Storage
GCS_BUCKET_NAME=         # Google Cloud Storage for recordings
GOOGLE_APPLICATION_CREDENTIALS=

# Billing
BILLING_SERVICE_URL=     # Internal billing service URL

# Redis (Batch Calling)
REDIS_URL=
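
These variables are typically validated once at startup rather than read ad hoc. A minimal stdlib sketch of that pattern (the Settings class is illustrative; the runtime may use pydantic or similar):

```python
import os
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Settings:
    database_url: str
    livekit_url: str
    redis_url: Optional[str]  # only required when batch calling is enabled

    @classmethod
    def from_env(cls) -> "Settings":
        def require(name: str) -> str:
            value = os.environ.get(name)
            if not value:
                # Fail fast at boot instead of mid-call.
                raise RuntimeError(f"missing required env var: {name}")
            return value

        return cls(
            database_url=require("DATABASE_URL"),
            livekit_url=require("LIVEKIT_URL"),
            redis_url=os.environ.get("REDIS_URL"),
        )
```

Failing fast on missing required variables surfaces misconfiguration at deploy time rather than during a live call.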

Deployment

The runtime is deployed on a Google Cloud VM (not serverless) because:
  • LiveKit agent workers need persistent WebSocket connections
  • Low-latency audio processing benefits from dedicated compute
  • Batch calling workers need long-running processes
Scaling: Deploy multiple VMs behind a load balancer; each VM runs independent agent workers.

See Dockerfile and .github/workflows/ for CI/CD details.