Ingestion Engine Architecture: Deep Dive into High-Throughput Edge Capture
The CXMind Ingestion Engine is a horizontally scalable, high-performance edge service written in Go. It is designed to process tens of thousands of UDP packets per second, turning raw network traffic into structured, actionable voice data in real time.
Core Processing Pipeline
High-Performance Packet Sniffer:
- Multi-Mode Capture: Supports either listening on UDP port 9060 for HEP v3 (Homer) encapsulated traffic or capturing directly from physical NICs using AF_PACKET and zero-copy modes.
- Kernel-Level Filtering: Uses BPF (Berkeley Packet Filter) expressions to discard non-VoIP traffic inside the kernel, so unwanted packets are never copied to user space. This sharply reduces copy and context-switching overhead.
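The HEP v3 capture mode above can be illustrated with a minimal sketch of the framing step. It assumes the publicly documented HEP v3 layout (a 4-byte `HEP3` magic, a 2-byte total length, then chunks of vendor ID, type ID, length, and body) and is not the engine's actual code. The names `hepChunk`, `parseHEP3`, and `buildHEP3` are hypothetical.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// hepChunk is one HEP v3 chunk: vendor ID, type ID, and body.
type hepChunk struct {
	Vendor uint16
	Type   uint16
	Body   []byte // sub-slice of the datagram, not a copy
}

// chunkPayload is the chunk type carrying the captured packet (e.g. SIP).
const chunkPayload = 0x000f

// parseHEP3 validates the "HEP3" magic and walks the chunk list.
func parseHEP3(pkt []byte) ([]hepChunk, error) {
	if len(pkt) < 6 || string(pkt[:4]) != "HEP3" {
		return nil, errors.New("not a HEP3 datagram")
	}
	total := int(binary.BigEndian.Uint16(pkt[4:6]))
	if total < 6 || total > len(pkt) {
		return nil, errors.New("bad HEP3 total length")
	}
	var chunks []hepChunk
	for off := 6; off < total; {
		if total-off < 6 {
			return nil, errors.New("truncated chunk header")
		}
		vendor := binary.BigEndian.Uint16(pkt[off:])
		typ := binary.BigEndian.Uint16(pkt[off+2:])
		clen := int(binary.BigEndian.Uint16(pkt[off+4:])) // includes the 6-byte header
		if clen < 6 || off+clen > total {
			return nil, errors.New("bad chunk length")
		}
		chunks = append(chunks, hepChunk{Vendor: vendor, Type: typ, Body: pkt[off+6 : off+clen]})
		off += clen
	}
	return chunks, nil
}

// buildHEP3 assembles a datagram for the demo (hypothetical test helper).
func buildHEP3(chunks ...hepChunk) []byte {
	pkt := []byte{'H', 'E', 'P', '3', 0, 0}
	for _, c := range chunks {
		hdr := make([]byte, 6)
		binary.BigEndian.PutUint16(hdr[0:], c.Vendor)
		binary.BigEndian.PutUint16(hdr[2:], c.Type)
		binary.BigEndian.PutUint16(hdr[4:], uint16(6+len(c.Body)))
		pkt = append(pkt, hdr...)
		pkt = append(pkt, c.Body...)
	}
	binary.BigEndian.PutUint16(pkt[4:], uint16(len(pkt)))
	return pkt
}

func main() {
	pkt := buildHEP3(hepChunk{Type: chunkPayload, Body: []byte("INVITE sip:bob@example.com SIP/2.0\r\n")})
	chunks, err := parseHEP3(pkt)
	if err != nil {
		panic(err)
	}
	for _, c := range chunks {
		fmt.Printf("vendor=%d type=0x%04x body=%q\n", c.Vendor, c.Type, c.Body)
	}
}
```

In a real listener the datagram would come from a socket bound to port 9060 (for example via `net.ListenUDP`), and the body of the payload chunk would be handed to the SIP parser.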
Concurrent SIP Parser:
- State Machine Management: Extracts Call-ID, From/To tags, and CSeq headers to maintain a lightweight in-memory state machine, tracking the full call lifecycle from INVITE and 200 OK to BYE.
- Zero-Allocation Parsing: Represents header values as sub-slices of the original packet buffer rather than copied strings, which keeps the parsing hot path free of heap allocations even under heavy concurrency.
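The sub-slicing idea can be sketched as follows. This is an illustrative example rather than the engine's parser: `headerValue` is a hypothetical helper, and it ignores SIP compact header forms and folded header lines.

```go
package main

import (
	"bytes"
	"fmt"
)

var crlf = []byte("\r\n")

// headerValue returns the value of the first header called name as a
// sub-slice of msg. No bytes are copied; the result aliases msg.
func headerValue(msg, name []byte) ([]byte, bool) {
	rest := msg
	for len(rest) > 0 {
		line := rest
		if i := bytes.Index(rest, crlf); i >= 0 {
			line, rest = rest[:i], rest[i+2:]
		} else {
			rest = nil
		}
		if len(line) == 0 {
			return nil, false // blank line: end of the header section
		}
		colon := bytes.IndexByte(line, ':')
		if colon < 0 {
			continue // no "name: value" shape on this line
		}
		// The request line also contains a colon but never matches a header name.
		if bytes.EqualFold(bytes.TrimSpace(line[:colon]), name) {
			return bytes.TrimSpace(line[colon+1:]), true
		}
	}
	return nil, false
}

func main() {
	msg := []byte("INVITE sip:bob@example.com SIP/2.0\r\n" +
		"Call-ID: a84b4c76e66710\r\n" +
		"CSeq: 314159 INVITE\r\n\r\n")
	if v, ok := headerValue(msg, []byte("call-id")); ok {
		fmt.Printf("%s\n", v) // prints a84b4c76e66710
	}
}
```

Because the returned slice aliases the packet buffer, it is only valid until that buffer is reused. A value that must outlive the packet, such as a Call-ID used as a map key, has to be copied at that point.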
SDP Negotiation & SRTP Decryption:
- Media Tracking: Automatically parses SDP payloads to track the negotiated media IPs, ports, and encryption suites.
- In-Memory Decryption: Extracts ephemeral SRTP keys on the fly. For maximum security, keys reside strictly in volatile memory for AES-GCM/AES-CTR streaming decryption and are never persisted to disk.
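As a rough illustration of the media-tracking step, the sketch below pulls the connection address, audio port, and SRTP key material out of an SDP body. It assumes keys are exchanged via SDES `a=crypto` lines (RFC 4568) and that the session has a single IPv4 audio stream; the source does not state either. `mediaInfo` and `parseSDP` are hypothetical names, and the key in the demo is a dummy value.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strconv"
	"strings"
)

// mediaInfo holds what a decryptor would need for one audio stream.
type mediaInfo struct {
	IP    string
	Port  int
	Suite string // e.g. AES_CM_128_HMAC_SHA1_80
	Key   []byte // master key and salt, held in memory only
}

// parseSDP scans an SDP body line by line for c=, m=audio and a=crypto.
func parseSDP(sdp string) (mediaInfo, error) {
	var m mediaInfo
	for _, line := range strings.Split(sdp, "\n") {
		line = strings.TrimSpace(line) // tolerate CRLF line endings
		switch {
		case strings.HasPrefix(line, "c=IN IP4 "):
			m.IP = strings.TrimPrefix(line, "c=IN IP4 ")
		case strings.HasPrefix(line, "m=audio "):
			f := strings.Fields(line) // m=audio <port> <proto> <fmt...>
			if len(f) < 2 {
				return m, fmt.Errorf("malformed media line %q", line)
			}
			port, err := strconv.Atoi(f[1])
			if err != nil {
				return m, fmt.Errorf("bad media port: %w", err)
			}
			m.Port = port
		case strings.HasPrefix(line, "a=crypto:") && m.Suite == "":
			f := strings.Fields(line) // a=crypto:<tag> <suite> inline:<key>[|params]
			if len(f) < 3 || !strings.HasPrefix(f[2], "inline:") {
				return m, fmt.Errorf("malformed crypto line %q", line)
			}
			b64, _, _ := strings.Cut(strings.TrimPrefix(f[2], "inline:"), "|")
			key, err := base64.StdEncoding.DecodeString(b64)
			if err != nil {
				return m, fmt.Errorf("bad inline key: %w", err)
			}
			m.Suite, m.Key = f[1], key
		}
	}
	if m.IP == "" || m.Port == 0 {
		return m, fmt.Errorf("SDP has no audio media description")
	}
	return m, nil
}

func main() {
	dummyKey := strings.Repeat("A", 40) // 30 zero bytes, for illustration only
	sdp := "v=0\r\nc=IN IP4 192.0.2.10\r\nt=0 0\r\n" +
		"m=audio 49170 RTP/SAVP 0\r\n" +
		"a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:" + dummyKey + "|2^20\r\n"
	m, err := parseSDP(sdp)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s:%d suite=%s keyBytes=%d\n", m.IP, m.Port, m.Suite, len(m.Key))
}
```

The decoded key lives only in the returned struct, which matches the stated goal of never persisting key material to disk.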
DSP Goroutine Pool:
- Adaptive Jitter Buffer: Handles RTP packet reordering and loss concealment to ensure a continuous, high-quality audio stream.
- Stream Transcoding: Uses optimized CGO bindings or native Go implementations to transcode various codecs (G.711, G.729, Opus) into standard 16-bit PCM at 8 kHz or 16 kHz for downstream ASR consumption.
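For the simplest of the listed codecs, G.711 mu-law (PCMU), the transcoding step can be written in pure Go. The sketch below uses the standard mu-law expansion; the function names are hypothetical, and G.729 or Opus would need a real codec library instead.

```go
package main

import "fmt"

// ulawToPCM expands one G.711 mu-law byte to a 16-bit linear PCM sample.
func ulawToPCM(u byte) int16 {
	u = ^u // mu-law bytes are stored bit-inverted
	sign := u & 0x80
	exponent := (u >> 4) & 0x07
	mantissa := int(u & 0x0F)
	sample := ((mantissa << 3) + 0x84) << exponent // add bias, then scale by segment
	sample -= 0x84
	if sign != 0 {
		return int16(-sample)
	}
	return int16(sample)
}

// decodePCMU decodes an RTP payload of mu-law bytes into dst, reusing
// dst's backing array when it is large enough to avoid allocations.
func decodePCMU(payload []byte, dst []int16) []int16 {
	dst = dst[:0]
	for _, b := range payload {
		dst = append(dst, ulawToPCM(b))
	}
	return dst
}

func main() {
	buf := make([]int16, 0, 160) // 20 ms of 8 kHz audio
	pcm := decodePCMU([]byte{0xFF, 0x80, 0x00}, buf)
	fmt.Println(pcm) // prints [0 32124 -32124]
}
```

Passing a preallocated `dst` per worker goroutine keeps the decode loop allocation-free, in the same spirit as the zero-allocation parser.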
Distributed State Sync
Because UDP load balancers often spread a call's signaling and media packets across different instances, the Ingestion Engine is designed so that no instance is the sole holder of call state, which allows seamless elastic scaling:
- Real-time State Broadcasting: Upon processing critical signaling (e.g., a 200 OK establishing a media session), the engine immediately broadcasts conversation metadata to a Redis Pub/Sub cluster.
- Multi-Node Synchronization: This mechanism keeps every node synchronized, whether it is handling the signaling or the media stream. If an ingestion node fails, other replicas can take over the session from the state snapshot in Redis.
- Horizontal Scalability: Deployment via containers allows for dynamic scaling based on traffic peaks. Nodes remain decoupled, achieving coordination through consistent hashing and the Redis backbone.
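The consistent-hashing part of this coordination can be sketched with the standard library alone. The example assumes calls are keyed by Call-ID and that each node is placed on the ring at several virtual points; neither detail is confirmed by the source, and `ring`, `newRing`, and `nodeFor` are hypothetical names. Publishing to Redis is omitted because it would require a third-party client.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// ring maps hash points on a 32-bit circle to node names.
type ring struct {
	points []uint32          // sorted hash points
	owner  map[uint32]string // point -> node name
}

func hash32(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// newRing places each node at `replicas` virtual points on the circle.
func newRing(nodes []string, replicas int) *ring {
	r := &ring{owner: make(map[uint32]string)}
	for _, n := range nodes {
		for i := 0; i < replicas; i++ {
			p := hash32(fmt.Sprintf("%s#%d", n, i))
			if _, taken := r.owner[p]; taken {
				continue // skip the rare hash collision
			}
			r.owner[p] = n
			r.points = append(r.points, p)
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// nodeFor returns the node owning the first point at or after the key's
// hash, wrapping around to the start of the circle.
func (r *ring) nodeFor(callID string) string {
	if len(r.points) == 0 {
		return ""
	}
	h := hash32(callID)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0
	}
	return r.owner[r.points[i]]
}

func main() {
	r := newRing([]string{"ingest-a", "ingest-b", "ingest-c"}, 64)
	fmt.Println(r.nodeFor("a84b4c76e66710@pc33.example.com"))
}
```

The useful property is that removing one node only reassigns the calls that node owned. Calls owned by the surviving nodes keep their owner, which limits churn during scale-in or failover.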
Key Performance Metrics
| Metric | Performance | Notes |
|---|---|---|
| Packet Throughput | 50,000+ PPS / Core | Optimized UDP handling via Go Runtime |
| Memory Footprint | < 200 MB (standby) | Low static memory overhead when idle |
| Decryption Latency | < 1ms | Stream-based AES processing for SRTP |
| Sync Latency | < 10ms | State sync via Redis Pub/Sub |