System Design: WhatsApp / Real-Time Chat Application
What’s the Goal?
Enable real-time, end-to-end encrypted messaging between users (1-on-1 or in groups), with support for media, read receipts, message sync across devices, and offline delivery.
Core Components
a. User Service
- Manages user profiles, devices, and online status
- Handles authentication and session management
- Stores user metadata (not message content)
b. Messaging Service
- Routes messages between senders and receivers
- Manages message delivery status
- Ensures reliable message delivery
c. Connection Service
- Manages WebSocket connections for real-time communication
- Tracks online/offline status of users
- Optimizes connection management for mobile devices
d. Storage Service
- Stores message history and media
- Maintains conversation threads
- Handles message retention policies
e. Notification Service
- Sends push notifications for offline users
- Manages notification preferences
- Integrates with platform-specific notification services (APNs, FCM)
f. Media Service
- Processes uploads/downloads of images, audio, video
- Optimizes media for different devices and connections
- Manages efficient storage and delivery
g. Encryption Service
- Ensures end-to-end encryption for all messages
- Manages key exchange and verification
- Protects metadata where possible
h. Group Service
- Manages group membership and permissions
- Handles message fanout to group members
- Optimizes group message delivery
i. Sync Service
- Keeps message history synced across user devices
- Manages conflict resolution
- Handles partial/selective sync
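As a rough illustration of the Group Service's fanout step, the sketch below expands one group message into one envelope per recipient. The `Envelope` type and `fan_out` function are hypothetical names introduced here, not part of any real API, and reusing a single ciphertext for every member assumes a shared group key; with pairwise encryption each envelope would carry its own ciphertext.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    group_id: str
    msg_id: str
    recipient: str
    ciphertext: bytes  # opaque to the server under E2EE


def fan_out(group_id, msg_id, sender, members, ciphertext):
    """Create one envelope per group member, skipping the sender."""
    return [
        Envelope(group_id, msg_id, member, ciphertext)
        for member in sorted(members)  # sorted only to make the output deterministic
        if member != sender
    ]
```

Each envelope can then go through the same online/offline delivery path as a 1-on-1 message.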
How Backend Processes a Message
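Stage | Details |
---|---|
1. Send Message Request | The sender's client encrypts the message on the device and sends it to the Messaging Service over its WebSocket connection. |
2. Validate & Store | The backend checks the sender's session and persists the encrypted message so it survives until delivery. |
3. Delivery Decision (Online vs Offline) | If the recipient is online, the message is pushed over their WebSocket connection; if not, it is queued and the Notification Service sends a push notification (APNs/FCM). |
4. Receipt Updates | Delivery and read status changes are sent back to the sender as receipt events. |
5. Message Sync | The Sync Service brings the user's other devices up to date with the new message. |
6. Group Chat Delivery (Fanout Model) | For group messages, the Group Service looks up the members and fans the message out to each of them. |
7. End-to-End Encryption (E2EE) | Throughout the pipeline the server handles only encrypted content; only recipient devices can decrypt it. |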
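The online/offline delivery decision in stage 3 can be sketched in a few lines. This is a minimal in-memory model, not a production design: `Router`, `Message`, and the `push_requests` list are hypothetical stand-ins for the Connection, Messaging, and Notification Services, and a plain list stands in for a live WebSocket.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class Message:
    msg_id: str
    sender: str
    recipient: str
    ciphertext: bytes  # the server never sees plaintext


@dataclass
class Router:
    # user_id -> messages pushed over a live connection (stand-in for a WebSocket)
    online: dict = field(default_factory=dict)
    # user_id -> messages waiting for the user to reconnect
    pending: dict = field(default_factory=lambda: defaultdict(deque))
    # user_ids for which a push notification was requested
    push_requests: list = field(default_factory=list)

    def connect(self, user_id):
        """Register a connection and flush anything queued while offline."""
        inbox = self.online.setdefault(user_id, [])
        while self.pending[user_id]:
            inbox.append(self.pending[user_id].popleft())
        return inbox

    def disconnect(self, user_id):
        self.online.pop(user_id, None)

    def route(self, msg):
        """Deliver immediately if the recipient is online, else queue and notify."""
        if msg.recipient in self.online:
            self.online[msg.recipient].append(msg)
            return "delivered"
        self.pending[msg.recipient].append(msg)
        self.push_requests.append(msg.recipient)
        return "queued"
```

In a real deployment the pending queue would live in durable storage or a message broker rather than in process memory, so queued messages survive a server restart.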
Scaling the System
Component | Scaling Strategy |
---|---|
WebSocket Connections | Use a pool of servers with sticky sessions + load balancer |
Message Queues | Kafka/SQS to queue undelivered messages and decouple services |
Database | Shard user data + messages by user ID or region |
Media Storage | Store large files in object stores (S3/GCS) + serve via CDN |
Caching | Redis for quick lookups (e.g., online status, recent messages) |
Read Receipts | Use event streams to update message states in real time |
Notification Service | Scale independently with mobile push (FCM/APNs) |
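To make the "shard by user ID" row concrete, here is a minimal sketch of mapping a user ID to a shard. The function name `shard_for` and the shard count of 16 are assumptions for illustration. A stable hash from `hashlib` is used because Python's built-in `hash()` for strings is randomized per process and would route the same user to different shards on different servers.

```python
import hashlib

NUM_SHARDS = 16  # assumed shard count, chosen only for illustration


def shard_for(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a user ID to a shard index that is stable across processes."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_shards
```

Plain modulo hashing remaps most keys whenever the shard count changes, which is why consistent hashing or a lookup directory is often preferred once shards need to be added or removed.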
Technical Challenges & Solutions
1. Connection Management
- Challenge: Maintaining millions of concurrent WebSocket connections
- Solution:
  - A pool of specialized connection servers dedicated to holding long-lived WebSocket connections
  - Heartbeat mechanisms to detect stale connections
  - Graceful connection handling for mobile devices (battery optimization)
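A heartbeat check can be as simple as recording when each connection was last heard from and reaping the ones that have been silent too long. The sketch below assumes a 30-second timeout and uses a hypothetical `HeartbeatTracker` class; over WebSocket, the heartbeats themselves would typically be ping/pong frames.

```python
import time


class HeartbeatTracker:
    def __init__(self, timeout_s=30.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock      # injectable so the logic can be tested without sleeping
        self.last_seen = {}     # conn_id -> time of the last heartbeat

    def beat(self, conn_id):
        """Record a heartbeat (or any traffic) from a connection."""
        self.last_seen[conn_id] = self.clock()

    def reap_stale(self):
        """Forget and return connections silent for longer than the timeout."""
        now = self.clock()
        stale = [c for c, t in self.last_seen.items() if now - t > self.timeout_s]
        for conn_id in stale:
            del self.last_seen[conn_id]
        return stale
```

A longer heartbeat interval saves battery on mobile devices at the cost of slower detection of dead connections, so the timeout is a tuning decision rather than a fixed constant.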
2. Message Ordering
- Challenge: Ensuring messages appear in correct order across devices
- Solution:
  - Lamport timestamps or vector clocks
  - Server-assigned sequence numbers per conversation
  - Client-side reordering when necessary
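The combination of server-assigned sequence numbers and client-side reordering might look like the sketch below. `Sequencer` and `ReorderBuffer` are hypothetical names, and the sequencer is a single in-memory counter per conversation; a real system would need that counter to be durable and owned by one writer per conversation.

```python
from collections import defaultdict
from itertools import count


class Sequencer:
    """Server side: hands out increasing sequence numbers per conversation."""

    def __init__(self):
        self._counters = defaultdict(lambda: count(1))

    def next_seq(self, conversation_id):
        return next(self._counters[conversation_id])


class ReorderBuffer:
    """Client side: releases messages strictly in sequence order."""

    def __init__(self):
        self.next_expected = 1
        self.held = {}  # seq -> payload that arrived ahead of a gap

    def receive(self, seq, payload):
        """Return the payloads that are now deliverable, in order."""
        if seq < self.next_expected:
            return []  # already delivered, so treat as a duplicate
        self.held[seq] = payload
        ready = []
        while self.next_expected in self.held:
            ready.append(self.held.pop(self.next_expected))
            self.next_expected += 1
        return ready
```

For example, if message 2 arrives before message 1, the buffer holds it and releases both once message 1 shows up.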
3. Offline Message Delivery
- Challenge: Ensuring reliable delivery when users go offline/online
- Solution:
  - Message queue for undelivered messages
  - Message retention policies
  - Delivery receipts and retry mechanisms
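One way to combine delivery receipts with retries is an outbox that keeps resending a message, with exponential backoff, until the recipient acknowledges it. The `Outbox` class, the 1-second base delay, and the 60-second cap below are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Pending:
    msg_id: str
    attempts: int = 0
    next_attempt_at: float = 0.0


class Outbox:
    def __init__(self, base_delay_s=1.0, max_delay_s=60.0):
        self.base_delay_s = base_delay_s
        self.max_delay_s = max_delay_s
        self.pending = {}  # msg_id -> Pending, until a delivery receipt arrives

    def enqueue(self, msg_id):
        self.pending.setdefault(msg_id, Pending(msg_id))

    def due(self, now):
        """Return message ids to (re)send now and schedule their next retry."""
        ready = []
        for p in self.pending.values():
            if p.next_attempt_at <= now:
                # Delay doubles with each attempt, up to the cap.
                delay = min(self.base_delay_s * 2 ** p.attempts, self.max_delay_s)
                p.attempts += 1
                p.next_attempt_at = now + delay
                ready.append(p.msg_id)
        return ready

    def ack(self, msg_id):
        """A delivery receipt arrived, so stop retrying this message."""
        self.pending.pop(msg_id, None)
```

Because a retry can race with a late acknowledgement, this gives at-least-once delivery, so the receiving side needs to deduplicate by message ID.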
4. Media Handling
- Challenge: Efficiently storing and delivering large media files
- Solution:
  - Progressive upload/download
  - Multiple resolutions based on device/connection
  - Background compression and optimization
  - Separate storage path from message content
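Progressive upload is often implemented by splitting the file into chunks so that an interrupted transfer can resume from the missing pieces instead of restarting. The sketch below is one such arrangement under stated assumptions: `split_chunks` and `UploadSession` are hypothetical names, and in an end-to-end encrypted design the client would encrypt the file before chunking it.

```python
def split_chunks(data: bytes, chunk_size: int):
    """Split a byte string into fixed-size chunks (the last may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


class UploadSession:
    """Server-side state for one resumable upload."""

    def __init__(self, total_chunks: int):
        self.total_chunks = total_chunks
        self.received = {}  # chunk index -> bytes

    def put(self, index: int, chunk: bytes):
        if not 0 <= index < self.total_chunks:
            raise IndexError(f"chunk index {index} out of range")
        self.received[index] = chunk  # re-sending a chunk is harmless

    def missing(self):
        """Chunk indexes the client still needs to send."""
        return [i for i in range(self.total_chunks) if i not in self.received]

    def assemble(self) -> bytes:
        if self.missing():
            raise ValueError("upload incomplete")
        return b"".join(self.received[i] for i in range(self.total_chunks))
```

After a dropped connection, the client asks for `missing()` and uploads only those chunks; the message itself then carries just a reference to the stored media object.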
5. Multi-Device Sync
- Challenge: Keeping conversation state consistent across devices
- Solution:
  - Central message store with device-specific cursors
  - Conflict resolution strategies
  - Selective sync for older messages
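The "central message store with device-specific cursors" idea can be sketched as an append-only log plus one position per device. `ConversationLog` and its method names are hypothetical, and the log is kept in memory only for illustration.

```python
class ConversationLog:
    def __init__(self):
        self.messages = []  # index i holds the message with sequence number i + 1
        self.cursors = {}   # device_id -> highest sequence number acknowledged

    def append(self, payload):
        """Store a message and return its sequence number."""
        self.messages.append(payload)
        return len(self.messages)

    def pull(self, device_id, limit=100):
        """Return up to `limit` (seq, payload) pairs the device has not acknowledged."""
        start = self.cursors.get(device_id, 0)
        batch = self.messages[start:start + limit]
        return [(start + i + 1, payload) for i, payload in enumerate(batch)]

    def ack(self, device_id, seq):
        """Advance the device's cursor; it never moves backwards."""
        self.cursors[device_id] = max(self.cursors.get(device_id, 0), seq)
```

Each device syncs at its own pace: a phone that has acknowledged sequence 2 pulls only newer messages, while a newly linked laptop with no cursor starts from the beginning. The `limit` parameter is one simple way to support partial sync of older history.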
Security Considerations
End-to-End Encryption Implementation
- Each user generates public/private key pairs on device registration
- Public keys are exchanged via the server
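- Messages are encrypted on the sender's device using keys derived from the recipient's public key (production protocols such as the Signal Protocol use key agreement to set up symmetric session keys rather than encrypting each message directly with the public key)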
- Server only sees encrypted content
- Key verification through QR codes or verification numbers
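Key verification numbers are typically derived by hashing both users' public keys so that each side can compute the same short code and compare it out of band. The function below is an illustrative sketch only: `verification_code` is a hypothetical name, and the format (six groups of five digits from a SHA-256 digest) is an assumption, not the algorithm any particular messenger uses.

```python
import hashlib


def verification_code(key_a: bytes, key_b: bytes, groups: int = 6) -> str:
    """Derive a short numeric code from two public keys, independent of order."""
    if not 1 <= groups <= 8:
        raise ValueError("groups must be between 1 and 8")
    # Sort so both users get the same code regardless of who is 'a' or 'b'.
    first, second = sorted([key_a, key_b])
    # Length-prefix each key so different key pairs cannot concatenate to the same bytes.
    material = (
        len(first).to_bytes(2, "big") + first
        + len(second).to_bytes(2, "big") + second
    )
    digest = hashlib.sha256(material).digest()
    parts = []
    for i in range(groups):
        chunk = int.from_bytes(digest[i * 4:(i + 1) * 4], "big")
        parts.append(f"{chunk % 100000:05d}")
    return " ".join(parts)
```

If the server had substituted a different public key for either user, the two devices would compute different codes, which is what the QR scan or number comparison is meant to reveal.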
Data Protection
- Message content encrypted in transit and at rest
- Metadata minimization where possible
- Automatic message expiry options
- Secure deletion mechanisms
Privacy Features
- Read receipt controls
- Last seen privacy options
- Profile photo visibility settings
- Group invitation controls
Conclusion
Building a WhatsApp-like system requires careful consideration of real-time communication, encryption, offline capabilities, and scale. The architecture must balance immediate message delivery with reliability, while ensuring end-to-end encryption protects user privacy. By separating concerns into specialized services and implementing proper scaling strategies, you can create a messaging system that handles millions of concurrent users while maintaining performance and security.