System Design: Notification System

A deep dive into designing a scalable, reliable multi-channel notification system supporting in-app, push, and email alerts.


A notification system is essential for keeping users engaged and informed. Whether it’s a social platform alerting you to a new message or an e-commerce site confirming an order, timely communication builds trust and drives interaction. Designing such a system at scale is challenging because it needs to cater to millions of users, respect individual preferences, and maintain reliability across several delivery channels. In this post we’ll explore how to architect a backend notification service that delivers real-time and batched messages through in-app alerts, push notifications, and email. We’ll also cover strategies to handle retries, prevent duplicates, and monitor the health of the system.

1. Goals and Requirements

Before diving into the architecture, let’s clarify our objectives. We need a system that can:

  • Send alerts across multiple channels: in-app (the familiar bell icon), push notifications for mobile/web, and email messages.
  • Scale to millions of users without degraded performance.
  • Respect user preferences—some might only want in-app alerts, others might prefer email, or a mix of all channels.
  • Handle reliability concerns with retries and deduplication so users don’t receive multiple copies of the same message.
  • Support both real-time notifications (e.g., someone liked your photo) and batched digests (e.g., a daily summary of new followers).

With these requirements in mind, we can design a flexible system that grows with our user base and remains responsive even during traffic spikes.

2. Typical User Flow

To understand what our system must do, consider a scenario within a social network. User A likes a post created by User B. We want User B to be notified about this action as quickly as possible. That notification might appear immediately within the application, or it might trigger a push notification on B’s phone. If B has email alerts enabled, they could also receive a message summarizing the activity. The flow is as follows:

  1. Trigger event – The application emits an event when User A clicks “like”. This event contains metadata about the actor (User A), the receiver (User B), the object acted upon (the post), and the type of event (like).
  2. Enqueue event – The event is placed onto a message queue so it can be processed asynchronously. This decoupling protects the main application from spikes in activity.
  3. Process event – A worker service reads the event from the queue, checks User B’s preferences, generates a notification payload, and then dispatches that payload to the appropriate delivery channels.
  4. Deliver message – Each delivery channel (in-app, push, email) is handled by its own specialized service that ensures the message is delivered. If sending fails, the service retries according to a backoff strategy.

This flow might look straightforward for one event, but we need to consider many edge cases. What if User B has blocked User A? What if the push notification service is temporarily unreachable? And how do we ensure we never send the same notification twice if a worker crashes midway through processing? We’ll address these issues as we design the system.

3. Architecture Overview

The architecture consists of multiple components working in concert. At a high level we have:

  • Notification API – Exposes endpoints for other services to trigger notifications. Each request to this API validates input and publishes a message to the Notification Queue.
  • Notification Queue – A message broker such as Kafka or RabbitMQ. It acts as a buffer between the API and processing workers. By controlling the rate at which we consume messages, we can handle bursts without overwhelming downstream systems.
  • Processor Worker – Dequeues events and performs the heavy lifting: fetching additional information about the event, checking user preferences, creating the message payload, and fanning out to delivery services.
  • Preference Service – Stores per-user settings for notification channels and topics. For instance, some users might disable email notifications or only want push alerts for important events.
  • Channel Services – Separate services (In-App Writer, Push Service, Email Service) that handle the specifics of each delivery mechanism. They operate independently so a failure in one channel (e.g., email provider outage) does not affect others.
  • Databases and Caches – In-app notifications are typically stored in a database so the user can view them later. We also rely heavily on caching (e.g., Redis) for quick lookup of user preferences and deduplication data.

The decoupled nature of these components ensures that our system is fault tolerant. If one part fails, the rest can continue functioning and process backlog once the issue is resolved.

4. Event Flow Detailed Walkthrough

Let’s examine the step-by-step flow in more detail.

Step 1: Trigger Event

When an action occurs in our application, the responsible service calls the POST /notify endpoint with relevant data. Here’s a simplified JSON payload:

{
  "event_type": "like",
  "actor": "UserA",
  "receiver": "UserB",
  "object": "Post123"
}

This endpoint performs some quick validation to ensure required fields are present. It should be lightweight to avoid slowing down the calling service. After validation, the API publishes the event to the notification queue and returns success, allowing the calling service to continue its own work.
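A minimal sketch of this validate-then-enqueue behavior, using Python's in-process queue.Queue as a stand-in for the real broker; the function name handle_notify and the response shapes are assumptions for illustration, not a prescribed API:

```python
import json
import queue

# Stand-in for the real message broker (Kafka/RabbitMQ in production).
notification_queue = queue.Queue()

REQUIRED_FIELDS = {"event_type", "actor", "receiver", "object"}

def handle_notify(raw_body: str):
    """Validate a /notify request body and publish it to the queue.

    Returns an (http_status, response_body) pair. Validation is kept
    deliberately cheap so the calling service is not slowed down.
    """
    try:
        event = json.loads(raw_body)
    except json.JSONDecodeError:
        return 400, {"error": "invalid JSON"}

    missing = REQUIRED_FIELDS - event.keys()
    if missing:
        return 400, {"error": f"missing fields: {sorted(missing)}"}

    notification_queue.put(event)  # enqueue for asynchronous processing
    return 202, {"status": "accepted"}
```

Returning 202 (Accepted) rather than 200 signals that the notification was queued, not yet delivered.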

Step 2: Add to Notification Queue

The queue acts as a shock absorber, allowing us to handle bursts of events. For example, a celebrity might post something that receives thousands of likes within minutes. Without a queue, the API workers might buckle under the load. By placing events in Kafka or RabbitMQ, we smooth out spikes and gain durability. Messages remain in the queue until successfully processed by the worker.

Step 3: Processor Worker

The processor is where the core logic lives. Once it dequeues a message, it might need to fetch additional context from the database—for example, the full text of the post or profile information about the actor. Next, it queries the Preference Service to see how the receiver wants to be notified. The worker composes a user-friendly payload, such as:

UserA liked your post 'How to scale systems'

At this point the worker also generates a deduplication key. One common strategy is to hash a combination of the actor, receiver, event type, and object. This hash is stored briefly (e.g., 24 hours) in a cache. If the same payload appears again, we consider it a duplicate and discard it. This avoids scenarios where the user receives multiple notifications because of a bug or repeated requests.

Step 4: Fan Out to Channels

Once the worker has built the notification, it sends the payload to each channel that the user has enabled. If User B wants in-app and email notifications but not push, the worker will only communicate with the In-App Writer and Email Service. Each channel operates independently:

  • In-App Writer – Inserts a new record into the notifications table, marking it as unread. The user will see this appear under the bell icon on their next visit or when polling for updates.
  • Push Service – Uses a push provider such as FCM (Firebase Cloud Messaging) for Android/web or APNs (Apple Push Notification service) for iOS. It must handle device tokens and ensure messages are formatted for each platform.
  • Email Service – Builds an email template and sends it via a service like Amazon SES or SendGrid. This service may queue messages for later processing to avoid hitting provider rate limits.
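The fan-out step above can be sketched as a small dispatcher that consults preferences before touching any channel. The channel names and callable-per-channel structure here are illustrative assumptions:

```python
def fan_out(payload: dict, prefs: dict, channels: dict) -> list:
    """Send the payload to every channel the receiver has enabled.

    `prefs` maps channel name -> bool (from the Preference Service),
    `channels` maps channel name -> a delivery callable (In-App Writer,
    Push Service, Email Service). Returns the channels actually used.
    """
    delivered = []
    for name, send in channels.items():
        if prefs.get(name, False):  # disabled or unknown channels are skipped
            send(payload)
            delivered.append(name)
    return delivered
```

Because each channel is just an independent callable, a failure inside one send can be caught and retried without blocking the others.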

Step 5: Sender Retries and Status Tracking

Each channel service is responsible for its own retry logic. If the push provider fails or times out, the Push Service will retry several times using an exponential backoff strategy. After a certain number of attempts (e.g., three to five), the message might be moved to a Dead Letter Queue (DLQ) for later inspection. Logging is critical here: we track whether each attempt succeeded, failed, or was skipped due to user preferences.

Step 6: In-App Notification View

For in-app notifications, the front-end periodically fetches unread notifications from the database. When the user clicks the bell icon, these notifications are marked as read. The data might look like this:

{
  "text": "UserA liked your post",
  "read": false,
  "timestamp": "2025-06-05T12:34:56Z"
}

Storing read/unread status allows us to show counts and manage batch dismissals. Because we use a database, we can also implement simple pagination so the user can browse older notifications.
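A sketch of that unread-fetch-with-pagination query, using an in-memory SQLite table as a stand-in for the production database; the column set mirrors the simplified schema discussed later:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE notifications (
        id INTEGER PRIMARY KEY,
        receiver_id TEXT,
        message TEXT,
        read INTEGER DEFAULT 0,
        created_at TEXT
    )
""")
# Seed 25 unread notifications for one user.
rows = [("UserB", f"event {i}", 0, f"2025-06-05T12:00:{i:02d}Z") for i in range(25)]
conn.executemany(
    "INSERT INTO notifications (receiver_id, message, read, created_at) VALUES (?, ?, ?, ?)",
    rows,
)

def fetch_unread(user_id: str, page: int, per_page: int = 10) -> list:
    """Newest-first page of unread notifications for one user."""
    return conn.execute(
        """SELECT message, created_at FROM notifications
           WHERE receiver_id = ? AND read = 0
           ORDER BY created_at DESC
           LIMIT ? OFFSET ?""",
        (user_id, per_page, page * per_page),
    ).fetchall()
```

An index on (receiver_id, created_at) keeps this query fast as the table grows.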

5. Data Modeling

A robust data model is key to supporting new features. Here’s a simplified schema for in-app notifications and preferences:

Users
------
- id (primary key)
- email
- name

Preferences
-----------
- user_id (foreign key to Users)
- channel_in_app (boolean)
- channel_push (boolean)
- channel_email (boolean)
- digest_frequency (enum: immediate, daily, weekly)

Notifications
-------------
- id (primary key)
- receiver_id
- actor_id
- event_type
- object_id
- message
- read (boolean)
- created_at

The preferences table can be expanded to handle more granular topics (e.g., likes vs. comments) or quiet hours during which notifications are muted. The Notifications table might store additional data like a link to the related content. Indexing by receiver and created_at ensures we can quickly fetch the latest alerts for each user.

6. Managing User Preferences

Handling user preferences correctly is crucial for a good user experience. We might design a simple REST API that allows the front-end to update settings or query them on demand. However, repeatedly hitting the database during notification processing would be slow. A better approach is to cache preference data in Redis with a relatively short TTL. The Processor Worker checks the cache first; if there’s a miss, it reads from the database and populates the cache. This speeds up event processing and reduces load on the primary database.

Edge cases include what happens if the user updates their preferences while a notification is mid-flight. The simplest solution is to treat preferences as eventually consistent—changes will apply to new notifications once the cache expires. If strict consistency is required, we could include a version number with the preferences and implement optimistic locking or subscribe to a change feed that invalidates caches immediately.

7. Channel-Specific Details

Each delivery channel comes with unique challenges.

In-App Notifications

  • Storage – Typically we keep these in a relational database (PostgreSQL, MySQL) because queries are straightforward: fetch unread notifications for a user, mark as read, paginate older notifications.
  • Real-Time Updates – To deliver in-app alerts instantly, we might use WebSockets or long polling. When the processor writes a new notification, it publishes an event on a message bus that the real-time gateway relays to connected clients.
  • Retention – We might only keep a certain number of old notifications per user (say, the last 100) to save storage, archiving or deleting older records periodically.

Push Notifications

  • Device Tokens – Users may have multiple devices. We store tokens in a separate table with fields for platform, token value, and a last_seen timestamp. When a device unregisters, we remove or deactivate the token.
  • Payload Limits – Push services often limit payload size. We keep messages brief and include a deep link to the relevant page in our app.
  • Security – APNs and FCM require secure certificates or server keys. Rotating these keys must be part of our operational checklist.

Email Notifications

  • Templates – For maintainability, we separate email templates from code. We might use templating engines to inject dynamic data into a consistent layout.
  • Rate Limits – Email providers enforce per-second or per-day limits. The Email Service queues outgoing messages and processes them at a safe rate, using additional queues if needed.
  • Bounces and Complaints – We subscribe to bounce notifications from the provider to keep our sender reputation high. Repeatedly sending to invalid addresses could get us blocked.
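The rate-limit point above is often handled with a token bucket: sends draw tokens, and the bucket refills at the provider's safe rate. A minimal sketch, with rate and capacity values as assumptions:

```python
import time

class TokenBucket:
    """Allows `rate` sends per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def try_send(self) -> bool:
        """Consume one token if available; refill based on elapsed time."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should requeue the email for a later attempt
```

When try_send returns False, the Email Service leaves the message on its queue instead of dropping it, so nothing is lost while staying under provider limits.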

8. Queueing and Asynchronous Processing

We rely heavily on asynchronous processing for scale. The Notification Queue decouples event producers from consumers. We can horizontally scale the Processor Worker by running multiple instances that share the queue. When traffic spikes, we spin up more workers; when traffic drops, we scale down to save resources.

Kafka is a popular choice here because of its durability and throughput. Each partition can be processed by a separate worker, and message ordering is preserved within partitions. For simpler setups, RabbitMQ or cloud-managed solutions like AWS SQS can also work. The key point is that the queue should handle millions of messages per day without losing data.
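Partitioning by user, as described above, boils down to a stable hash of the receiver id modulo the partition count. A sketch, with the partition count as an assumption:

```python
import zlib

NUM_PARTITIONS = 12  # assumption; match your topic's configured partition count

def partition_for(receiver_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a receiver id to a stable partition number.

    A stable hash (not Python's per-process-randomized hash()) guarantees the
    same user always lands on the same partition, preserving per-user ordering.
    """
    return zlib.crc32(receiver_id.encode("utf-8")) % num_partitions
```

Kafka's default partitioner does the equivalent when the event is keyed by receiver id; the point is that ordering holds per user, not globally.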

9. Reliability and Retry Mechanisms

At scale, failures are inevitable. Network hiccups, provider outages, and bugs can all cause messages to fail. We design for at-least-once delivery, which means it’s possible for a user to receive a notification twice if something goes wrong. To compensate, our writes must be idempotent—if we try to insert the same notification twice, the database should ignore the duplicate or overwrite the existing record without changing the state. Deduplication hashes stored in Redis help with this.

Retries are implemented with exponential backoff. For example, if the push service fails, we might retry after 1 second, then 3 seconds, then 9 seconds, and so on up to a maximum delay. Too many retries can create additional load or duplicate messages, so we limit the total number and move persistent failures to a DLQ. Operators can later inspect these messages to diagnose issues.
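The retry policy above (1s, 3s, 9s, capped attempts, then a dead letter queue) can be sketched as follows; the function name and parameter defaults are illustrative assumptions:

```python
import time

dead_letter_queue = []  # parked messages for operator inspection

def send_with_retries(send, message: dict, max_attempts: int = 4,
                      base_delay: float = 1.0, factor: float = 3.0) -> bool:
    """Call `send` with exponential backoff: delays of 1s, 3s, 9s, ...

    After `max_attempts` failures the message is moved to the dead letter
    queue instead of being retried forever.
    """
    delay = base_delay
    for attempt in range(max_attempts):
        try:
            send(message)
            return True
        except Exception:
            if attempt < max_attempts - 1:
                time.sleep(delay)
                delay *= factor
    dead_letter_queue.append(message)
    return False
```

Production versions usually add jitter to the delay so many failing workers don't retry in lockstep against an already struggling provider.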

For email, we store a status field: pending, sent, failed. If a message fails after all retries, we mark it as failed and potentially notify an administrator. The same status tracking applies to push notifications, where we might also log specific error codes from the provider (invalid token, quota exceeded, etc.). This data feeds into monitoring dashboards and alerts.

10. Deduplication and Idempotency

Deduplicating notifications ensures users aren’t bombarded with identical messages. Consider a scenario where User A likes a post, but due to a glitch the event is sent twice. Without deduplication, User B would receive two identical alerts. To prevent this, the Processor Worker computes a hash like hash(actor_id, receiver_id, event_type, object_id) and stores it in Redis with a TTL. Before delivering a notification, the worker checks if the hash already exists. If it does, the event is skipped. This technique is fast because it operates in memory, and the TTL ensures memory usage doesn’t grow indefinitely.
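A sketch of that dedup check; a plain dict with expiry timestamps stands in for Redis (where a single SET with NX and EX would do the same atomically):

```python
import hashlib
import time

_seen = {}  # dedup key -> expiry timestamp; Redis in production
DEDUP_TTL = 24 * 3600  # 24 hours, as in the text

def dedup_key(actor_id: str, receiver_id: str, event_type: str, object_id: str) -> str:
    """Stable hash over the fields that identify one logical notification."""
    raw = f"{actor_id}:{receiver_id}:{event_type}:{object_id}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()

def is_duplicate(key: str, now=None) -> bool:
    """First sighting stores the key with a TTL; repeats within the TTL are duplicates."""
    now = time.time() if now is None else now
    expiry = _seen.get(key)
    if expiry is not None and expiry > now:
        return True
    _seen[key] = now + DEDUP_TTL
    return False
```

Note the check-then-set here is not atomic across processes; that is exactly why the real implementation leans on Redis, where SET NX EX makes the test and the write a single operation.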

Idempotency also applies to database writes. Each notification row can include a unique external ID derived from the hash. If the worker tries to insert a duplicate row, the database’s unique constraint prevents duplication. This protects against double processing if a worker crashes midway and reprocesses the same message after restarting.
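The unique-constraint protection above can be sketched with SQLite's INSERT OR IGNORE (PostgreSQL's ON CONFLICT DO NOTHING is the equivalent); the column names follow the simplified schema from earlier:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE notifications (
        id INTEGER PRIMARY KEY,
        external_id TEXT UNIQUE,   -- dedup hash computed by the processor
        receiver_id TEXT,
        message TEXT
    )
""")

def insert_notification(external_id: str, receiver_id: str, message: str) -> bool:
    """Insert a notification row; the UNIQUE constraint silently absorbs replays.

    Returns True if a new row was written, False if it already existed
    (e.g. a worker crashed and reprocessed the same message).
    """
    cur = conn.execute(
        "INSERT OR IGNORE INTO notifications (external_id, receiver_id, message)"
        " VALUES (?, ?, ?)",
        (external_id, receiver_id, message),
    )
    return cur.rowcount == 1
```

Because the database enforces uniqueness, this stays correct even when two workers race on the same message, which the in-memory dedup check alone cannot guarantee.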

11. Scaling Strategies

Handling millions of users requires thoughtful scaling techniques across the entire system.

  • High Throughput Events – The queue absorbs spikes. We can partition by user ID so each partition maintains order for a subset of users while enabling horizontal scaling.
  • Slow Channels – Email tends to be slower than push or in-app alerts. We buffer outgoing emails in a separate queue or store them in a table processed by a cron job or background worker. This prevents slowdowns from affecting real-time channels.
  • Caching – User preferences, deduplication hashes, and frequently accessed metadata are cached in Redis. Cache invalidation is controlled via TTLs or explicit updates from the preference service.
  • Load Balancing – API endpoints and channel services are fronted by load balancers. Autoscaling groups add or remove servers based on CPU or queue length metrics.
  • Database Partitioning – As the notifications table grows, we might shard it by user ID or use a NoSQL store like Cassandra, which can handle large write volumes and distribute data across many nodes.
  • Batching – For low-priority notifications or digest emails, we group multiple events together. This reduces overhead and prevents overwhelming users with dozens of emails. A nightly batch job can collect all notifications for the day and send a single summary email.
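The batching idea in the last bullet can be sketched as a job that groups a day's events by receiver and renders one summary per user; the event shape and summary wording are assumptions:

```python
from collections import defaultdict

def build_digests(events: list) -> dict:
    """Group a day's events by receiver and render one summary line each."""
    by_receiver = defaultdict(list)
    for event in events:
        by_receiver[event["receiver"]].append(event)

    digests = {}
    for receiver, items in by_receiver.items():
        likes = sum(1 for e in items if e["event_type"] == "like")
        follows = sum(1 for e in items if e["event_type"] == "follow")
        digests[receiver] = f"Today: {likes} likes, {follows} new followers"
    return digests
```

A nightly job would feed this the day's accumulated events and hand each summary to the Email Service as a single message per user.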

12. Monitoring and Metrics

Without observability, we can’t guarantee reliability. Key metrics to monitor include:

  • Queue length and processing rate – If the queue grows without being consumed, we may need to add workers.
  • Success and failure rates per channel – A sudden spike in push failures might indicate a provider outage or expired credentials.
  • Latency from event creation to delivery – Users expect near real-time feedback, especially in social applications.
  • Deduplication rate – Too many duplicates might mean our upstream services are sending repeated events.
  • Bounce and complaint rates for email – High values can damage our sender reputation.

Alerts should trigger when thresholds are exceeded. We also log every step of the pipeline. For example, when the Processor Worker receives a message, it logs the event ID and user ID. When a push notification is sent, we log the provider’s response. Aggregated logs help debug failures and provide insight into user engagement patterns.

13. Security Considerations

Because notifications often contain personal data, security is paramount. Some best practices include:

  • Authentication and Authorization – Only authorized services should call the Notification API. We can enforce authentication with API keys or OAuth tokens. Additionally, we must verify that the actor is allowed to trigger notifications for the receiver (e.g., you can’t like a private post without permission).
  • Data Encryption – Use HTTPS for all network communication. Sensitive data such as email content should be encrypted in transit and, if necessary, at rest.
  • Throttling – Malicious actors might try to spam the system with events. Rate limiting at the API layer protects our infrastructure and prevents abuse.
  • Auditing – Keep audit logs of who triggered notifications, when, and which channel was used. This can help track abuse or debug issues with privacy settings.

14. Real-Time Delivery vs. Batching

Not all notifications are equal. A friend request might warrant immediate delivery, while periodic marketing emails can be grouped into a digest. Our system differentiates between these types.

  • Real-Time – The Processor Worker pushes events to channels instantly. This is critical for actions where the user expects immediate feedback, such as receiving a direct message.
  • Batching – For less urgent updates, we store events in a separate table with a scheduled send time. A batch job runs hourly or daily to compile a summary and send it in one message. Users can choose their preferred digest frequency through preferences.

Balancing real-time and batched notifications ensures we deliver timely information without overwhelming users or our infrastructure. It also provides flexibility as our product grows; we can add new categories of notifications and decide how each should be handled.

15. Putting It All Together – An End-to-End Example

Let’s revisit the initial scenario with a deeper look at each layer.

  1. User Action – User A clicks “like” on User B’s post.
  2. Event Creation – The Post Service sends a request to the Notification API with the event payload.
  3. Queue Insertion – The API validates the payload, converts it into a standard internal format, and publishes it to Kafka.
  4. Worker Consumption – A Processor Worker listening on the Kafka topic receives the event. It fetches additional data about the post and checks User B’s preferences from Redis.
  5. Deduplication Check – The worker computes a hash of the event and looks it up in Redis. If not found, it adds the hash with a TTL.
  6. Payload Generation – The worker composes a message like “UserA liked your post ‘How to scale systems’” and records the event in the Notifications table.
  7. Fan Out – Based on preferences, the worker sends the payload to the In-App Writer, Push Service, and Email Service. Each service acknowledges receipt.
  8. Channel Delivery – The Push Service contacts FCM/APNs, the Email Service enqueues a message with SES/SendGrid, and the In-App Writer updates the database. If any channel fails, its service retries according to policy.
  9. User Receives Notification – User B sees the alert in real time or in a later digest, depending on preferences.
  10. Monitoring – Metrics are updated (success rate, delivery time). If failures exceed a threshold, an alert is triggered for investigation.

This example demonstrates how the pieces interact to deliver a single notification. In production, this process runs at massive scale, with thousands of events per second across many channels.

16. Conclusion

Designing a notification system involves far more than sending a message from point A to point B. We must think about scale, reliability, user preferences, and the subtleties of each delivery channel. By using a message queue, processing workers, and channel-specific services, we decouple components so failures in one area don’t cascade throughout the system. Deduplication, retries, and monitoring keep the system robust, while caching ensures low latency.

As your user base grows, you may add features like personalized ranking (so users see important notifications first), machine-learning-driven engagement scores, or user-specific quiet hours. The design outlined in this post lays a strong foundation to support such enhancements. With careful planning around data modeling, preference management, and monitoring, you can deliver a reliable, scalable notification system that keeps users engaged without overwhelming them.