ClickHouse MergeTree Topology: Scaling CDR Aggregation
Scaling aggregation capabilities for millions of Call Detail Records (CDRs).
Continuous, high-volume text ingestion (transcription data) quickly turns disk I/O into the bottleneck for traditional row-oriented relational databases. CXMind therefore uses ClickHouse as its primary data store for large-scale computation and real-time aggregation.
1. Schema Meta-Table Design
sip_messages
- Purpose: Temporary storage for raw SIP message traces (HEP v3).
- Optimization: Utilizes the MergeTree engine with an automated TTL (Time-To-Live) mechanism (e.g., 7-day rolling expiry) to prevent disk exhaustion from raw packet storage.
call_events
- Purpose: The core table for high-level event indexing and trend analysis.
- Partitioning: Partitions by toYYYYMMDD(event_time) to accelerate daily historical queries by skipping irrelevant data parts.
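To illustrate why daily partitioning helps, the following Go sketch lists the partitions a time-range query would have to read. It is not CXMind code: the names partitionID, partitionsForRange, and utc are hypothetical, the partition ID is shown as a string for readability, and UTC is assumed as the time zone of event_time.

```go
package main

import (
	"fmt"
	"time"
)

// utc is a small helper (hypothetical) for building UTC timestamps.
func utc(year, month, day, hour int) time.Time {
	return time.Date(year, time.Month(month), day, hour, 0, 0, 0, time.UTC)
}

// partitionID mirrors the value of toYYYYMMDD(event_time), assuming UTC:
// every event on the same calendar day lands in the same partition.
func partitionID(t time.Time) string {
	return t.UTC().Format("20060102")
}

// partitionsForRange lists the daily partitions that a query over
// [from, to] must read. All other partitions can be skipped entirely.
func partitionsForRange(from, to time.Time) []string {
	var ids []string
	y, m, d := from.UTC().Date()
	day := time.Date(y, m, d, 0, 0, 0, 0, time.UTC)
	for !day.After(to.UTC()) {
		ids = append(ids, partitionID(day))
		day = day.AddDate(0, 0, 1)
	}
	return ids
}

func main() {
	// A 29-hour window touches three daily partitions, however much
	// history the table holds.
	fmt.Println(partitionsForRange(utc(2024, 3, 30, 22), utc(2024, 4, 1, 3)))
	// prints [20240330 20240331 20240401]
}
```

A query that filters on event_time benefits from this pruning automatically; a query without such a filter has to consider every partition.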
transcription_segments
- Purpose: Records every sentence fragment, speaker ID, and high-precision timestamp extracted from audio.
- Query Routing: Relies on call_id hashing as the sharding key. This ensures all segments belonging to the same conversation are co-located on the same shard, eliminating the need for expensive cross-node joins during context analysis.
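The co-location property of hash-based sharding can be sketched in Go as follows. This is an illustration only: in a ClickHouse cluster the routing is performed by the sharding expression of the Distributed table, not by application code, and FNV-1a is used here purely as a stand-in hash. The Segment type and the shardFor and groupByShard functions are hypothetical names.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Segment is a simplified, hypothetical transcription segment.
type Segment struct {
	CallID  string
	Speaker string
	Text    string
}

// shardFor maps a call_id to a shard index in [0, numShards).
// numShards must be greater than zero. The same call_id always
// yields the same shard, which is what keeps a conversation together.
func shardFor(callID string, numShards uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(callID)) // Write on a hash.Hash never returns an error
	return h.Sum32() % numShards
}

// groupByShard buckets segments by the shard their call_id hashes to.
func groupByShard(segments []Segment, numShards uint32) map[uint32][]Segment {
	byShard := make(map[uint32][]Segment)
	for _, s := range segments {
		shard := shardFor(s.CallID, numShards)
		byShard[shard] = append(byShard[shard], s)
	}
	return byShard
}

func main() {
	segments := []Segment{
		{CallID: "call-1", Speaker: "agent", Text: "Hello, how can I help?"},
		{CallID: "call-2", Speaker: "customer", Text: "I have a billing question."},
		{CallID: "call-1", Speaker: "customer", Text: "My line keeps dropping."},
	}
	for shard, segs := range groupByShard(segments, 2) {
		fmt.Printf("shard %d holds %d segment(s)\n", shard, len(segs))
	}
}
```

Because every segment of call-1 hashes to the same shard, a per-conversation query can be answered by a single node.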
2. Cluster Configuration Strategies
For large-scale production communication exchanges, we recommend a sharded, replicated high-availability topology (e.g., 2 shards with 2 replicas each, for 4 nodes in total):
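- ReplicatedMergeTree Engine: Powered by ClickHouse Keeper (or ZooKeeper) for data replication. If one replica of a shard fails, the surviving replica continues to serve that shard. Replication is asynchronous by default, so zero data loss on failover is only guaranteed when inserts are acknowledged by a quorum of replicas (the insert_quorum setting).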
- Go-based Ingestion Buffering: To absorb massive incoming bursts, the ingestion container uses a local BadgerDB instance to persist records on disk until they are flushed in batches.
High-Performance Asynchronous Writing (Atomic Bulk Inserts)
- Individual INSERT statements are strictly forbidden.
- The system enforces bulk inserts triggered by either buffer_size (e.g., 10,000 records) or buffer_interval (e.g., 2 seconds), maximizing ClickHouse's sequential write capabilities.
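The size-or-interval flush rule can be sketched in Go as shown here. This is a minimal in-memory illustration, not the CXMind ingestion service: it omits the BadgerDB persistence described earlier, and Record, Batcher, NewBatcher, and the flush callback are hypothetical names. In a real deployment the callback would issue one bulk INSERT per batch. Both bufferSize and bufferInterval are assumed to be greater than zero.

```go
package main

import (
	"fmt"
	"time"
)

// Record is a simplified, hypothetical row awaiting insertion.
type Record struct {
	CallID string
	Text   string
}

// Batcher collects records and passes them to flush as one batch when
// bufferSize records are pending or bufferInterval has elapsed.
type Batcher struct {
	in   chan Record
	done chan struct{}
}

func NewBatcher(bufferSize int, bufferInterval time.Duration, flush func([]Record)) *Batcher {
	b := &Batcher{in: make(chan Record), done: make(chan struct{})}
	go func() {
		defer close(b.done)
		ticker := time.NewTicker(bufferInterval)
		defer ticker.Stop()
		batch := make([]Record, 0, bufferSize)
		emit := func() {
			if len(batch) == 0 {
				return // nothing pending, skip the empty insert
			}
			flush(batch)
			batch = make([]Record, 0, bufferSize) // fresh slice: flush may keep the old one
		}
		for {
			select {
			case r, ok := <-b.in:
				if !ok {
					emit() // drain what is left on shutdown
					return
				}
				batch = append(batch, r)
				if len(batch) >= bufferSize {
					emit() // size trigger
				}
			case <-ticker.C:
				emit() // interval trigger
			}
		}
	}()
	return b
}

// Add queues one record; it blocks while a flush is in progress.
func (b *Batcher) Add(r Record) { b.in <- r }

// Close flushes pending records and waits for the worker to exit.
func (b *Batcher) Close() {
	close(b.in)
	<-b.done
}

func main() {
	var sizes []int
	// The long interval keeps this demo deterministic: only the size
	// trigger and the final drain fire.
	b := NewBatcher(3, time.Hour, func(batch []Record) {
		sizes = append(sizes, len(batch))
	})
	for i := 0; i < 7; i++ {
		b.Add(Record{CallID: "call-1", Text: fmt.Sprintf("segment %d", i)})
	}
	b.Close()
	fmt.Println(sizes) // prints [3 3 1]
}
```

Running the flush on a single goroutine keeps batches ordered and means the callback never executes concurrently with itself, at the cost of blocking producers while an insert is in flight.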