Back to Documentation
Configuration

ASR & LLM BYOK Setup: Extreme Elasticity & Vendor-Neutral Architecture

Seamlessly mix and match premium third-party cloud services with private, locally hosted models.

CXMind adheres to a strict "Vendor-Neutral" design philosophy, empowering enterprises to find the perfect equilibrium between cost, accuracy, and regulatory compliance. Through our BYOK (Bring Your Own Key) model, you can seamlessly mix and match premium third-party cloud services with private, locally hosted models.

ASR Connection Pool Architecture

The CXMind Go Ingestion Engine is engineered specifically for high-concurrency voice stream processing, centered around a robust WebSocket multiplexing mechanism.

  • Millisecond First-Token Latency (TTFT): The engine maintains a pool of persistent, "warm" connections to upstream providers (such as Azure, Google, Tencent, Deepgram, etc.). This bypasses the overhead of complex TCP/TLS handshakes for every new call.

Intelligent Routing & Load Balancing:

  • Round-Robin Routing: Entering PCM audio streams are assigned to an available TaskHandler execution handle, drawn from a health-checked (Ping-Pong) pool.
  • High-Density Multiplexing: Each individual WebSocket connection can multiplex up to 10 concurrent audio streams. A single node easily supports massive concurrent channels backed by this architecture.

Resilience & Auto-Recovery:

  • Exponential Backoff: The system employs an exponential backoff algorithm to automatically rebuild broken tunnels during network jitter.
  • Local Buffer Guard: During reconnection phases, the engine protects local buffers to prevent data loss, ensuring zero "token loss" once the link is restored.

LLM Configuration & Gateway Routing

The CXMind LLM adaptation layer is fully compatible with the OpenAI API specification, making the integration of new models as simple as a configuration update.

1. Hybrid Deployment Strategies

  • Cloud LLMs: Integrate industry-leading models like GPT-4o, Claude 3.5, or Gemini 1.5 Pro via API for complex reasoning or multi-language tasks.
  • Self-Hosted Engines: For high-volume, standardized workflows, CXMind supports offline endpoints driving Llama 3, DeepSeek, or Qwen via Ollama or vLLM.

2. BYOK Governance

  • Key Group Isolation: Configure independent API keys for different projects or tenants to enable precise billing telemetry and rate-limit control.
Dynamic Fallback:

Scenario: If a cloud provider triggers a Rate Limit or latency exceeds a defined threshold, the gateway can automatically reroute traffic to a local DeepSeek node, ensuring uninterrupted service.

Data Privacy & Compliance

The BYOK architecture is not just a cost-saving measure; it is a compliance tool.

  • Sensitive Data Masking: Before transmitting data to a public LLM, CXMind can perform local PII (Personally Identifiable Information) scrubbing.
  • On-Premise Loops: For highly classified calls, the system can be locked to use only local ASR (e.g., Faster-Whisper) and local LLMs, ensuring data never leaves the private network.

Need more help or have a specific architecture question?

Contact Engineering Support