What are common caching strategies in system design?

Caching improves latency and reduces load by storing frequently used data closer to the application or user. Common strategies include cache-aside, read-through, write-through, write-behind, write-around, TTL-based caching, and CDN caching.

The Short Answer

Caching stores frequently used data somewhere faster to access than the original source of truth.

The goal is usually to reduce latency, reduce database load, absorb traffic spikes, and improve user experience.

The hard part is not putting data in a cache. The hard part is deciding when the cache should be read, updated, expired, refreshed, or ignored.

The Real Problem Caching Solves

Imagine a product page that gets thousands of requests per minute. Each request needs product details, pricing, inventory hints, review summaries, and recommendation data.

Without caching, every request may hit the database or multiple downstream services.

Without Cache

User request
  ↓
App server
  ↓
Database hit every time

With Cache

User request
  ↓
Cache hit
  ↓
Database avoided

But caching creates new questions: how fresh does the data need to be, what happens when data changes, and what happens if the cache is down?

Problem Context 1: Read-Heavy Product Details

Suppose product details are read constantly but updated relatively rarely.

This is a perfect fit for cache-aside, also called lazy loading. In cache-aside, the application checks the cache first; on a miss, it reads from the database, stores the result in cache, and returns it. This is one of the most common database caching strategies.

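Product getProduct(String productId) {
    // 1. Try the cache first.
    Product cached = cache.get(productId);

    if (cached != null) {
        return cached; // cache hit
    }

    // 2. Cache miss: read from the source of truth.
    Product product = database.findProduct(productId);

    // 3. Store the result with a TTL so later reads are served from cache.
    cache.set(productId, product, ttl);

    return product;
}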

Why It Works Here

Only popular products enter the cache. Rarely viewed products do not waste cache memory.

Main Tradeoff

The first request after a miss is slower because it still has to hit the database.

Cache-aside is often the first strategy to mention in interviews because it is simple, flexible, and common for read-heavy systems.

Problem Context 2: Data Should Be Fresh After Writes

Suppose users update their profile, and the next read should usually see the updated data.

One option is write-through caching. The application writes to the database and updates the cache as part of the same write operation, so the write is not acknowledged until both have been updated.

void updateUserProfile(UserProfile profile) {
    database.update(profile);
    cache.set(profile.id(), profile, ttl);
}

This works well when the cache is a shared distributed cache such as Redis because all application servers read from the same cache.

User Request
  ↓
App Server A
  ↓
Redis Cache
  ↓
Database

However, many systems introduce a second cache layer inside each application server:

App Server Local Memory Cache (L1)
            ↓
Redis Distributed Cache (L2)
            ↓
Database

Local caches are extremely fast because they avoid a network call to Redis, but they introduce a new challenge: stale data.

Imagine Server A updates a user profile:

Server A:
    database.update(...)
    redis.set(...)

Server B:
    still has old value in local memory

Now Server B may continue serving stale data until its local cache is refreshed or invalidated.

Why It Works Here

Reads after updates are likely to see fresh data because the cache is updated immediately after the database write.

Main Tradeoff

Writes become slower and local caches may require additional invalidation mechanisms to avoid stale data.

Common solutions include:

  • Short TTLs on local cache entries
  • Redis Pub/Sub invalidation messages
  • Kafka or event-driven cache invalidation
  • Versioned cache keys
  • Avoiding local caches for highly dynamic data

In simple systems, write-through usually means updating a shared cache such as Redis after updating the database.

In larger distributed systems, the harder problem is keeping multiple local caches synchronized after the update.

Keeping Multiple Servers Consistent

Once a system grows beyond a single application server, cache consistency becomes more challenging.

Suppose we have:

Users
  ↓
Load Balancer
  ↓
Server A
Server B
Server C
  ↓
Redis
  ↓
Database

If Server A updates a user profile, how do Servers B and C know their cached copy is now stale?

Several approaches are commonly used.

Option 1: Redis Only (No Local Cache)

Application servers do not keep local copies of cached data. Every cache lookup goes directly to Redis.

App Server
     ↓
Redis
     ↓
Database

When Server A updates Redis, all other servers immediately see the updated value because everyone reads from the same shared cache.

Why it works
  • Simple architecture
  • No cache synchronization problems
  • Fresh data visible immediately

Tradeoffs
  • Every cache hit requires a network call
  • Redis latency becomes part of every request
  • Redis availability becomes critical

Option 2: Local Cache + Redis

Each application server maintains a small in-memory cache in addition to Redis.

Local Cache (L1)
       ↓
Redis (L2)
       ↓
Database

Requests first check local memory. Only if the data is missing do they query Redis.

Why it works
  • Extremely fast reads
  • Reduced Redis traffic
  • Lower latency for hot data

Tradeoffs
  • Servers can hold different versions of data
  • Stale reads become possible
  • Additional invalidation mechanisms required

Option 3: Local Cache + Short TTL

Instead of synchronizing caches immediately, each server accepts a small amount of staleness.

User Profile
TTL = 30 seconds

If Server B has stale data, it naturally expires after a short period and is refreshed from Redis.

Why it works
  • Simple implementation
  • No messaging infrastructure required
  • Good enough for many systems

Tradeoffs
  • Users may briefly see old data
  • Updates are not immediately visible everywhere
  • Choosing the right TTL can be difficult
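
The short-TTL approach can be sketched as a small per-server cache that reloads an entry once it is older than the TTL. This is a minimal illustration, not a production cache: the class name ShortTtlLocalCache, the injectable clock, and the loadFromNextLayer callback (standing in for a Redis lookup) are all assumptions made for the example.

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical per-server L1 cache; names are illustrative, not from a real library.
final class ShortTtlLocalCache<K, V> {
    private record Entry<V>(V value, long expiresAtNanos) {}

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlNanos;
    private final LongSupplier clock; // injectable so expiry can be exercised without sleeping

    ShortTtlLocalCache(Duration ttl, LongSupplier clock) {
        this.ttlNanos = ttl.toNanos();
        this.clock = clock;
    }

    // Returns the local copy while it is fresh; otherwise reloads from the
    // next layer (for example Redis) and keeps the result for one TTL.
    V get(K key, Function<K, V> loadFromNextLayer) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.get(key);
        if (entry != null && now - entry.expiresAtNanos() < 0) {
            return entry.value();
        }
        V fresh = loadFromNextLayer.apply(key);
        entries.put(key, new Entry<>(fresh, now + ttlNanos));
        return fresh;
    }
}
```

In a real service the clock would typically be System::nanoTime. With a 30-second TTL, a server holding a stale profile serves it for at most 30 seconds before the next read reloads it.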

Option 4: Local Cache + Pub/Sub Invalidation

When a server updates data, it also publishes an invalidation event.

Server A updates profile
          ↓
Publish invalidation event
          ↓
Servers B and C receive event
          ↓
Evict local cache entry

Redis Pub/Sub, Kafka, or another message broker can be used to distribute invalidation events.

Why it works
  • Very fast local cache reads
  • Near real-time cache consistency
  • Scales well across many servers

Tradeoffs
  • More moving parts
  • More operational complexity
  • Lost invalidation events can cause stale data
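
The invalidation flow above can be sketched in a single process. In this illustration InvalidationBus is a hypothetical in-memory stand-in for a broker such as Redis Pub/Sub or Kafka, and the sharedStore map stands in for Redis plus the database; neither reflects a real client API.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical in-process stand-in for a message broker.
final class InvalidationBus {
    private final List<Consumer<String>> subscribers = new CopyOnWriteArrayList<>();

    void subscribe(Consumer<String> onInvalidatedKey) {
        subscribers.add(onInvalidatedKey);
    }

    void publish(String key) {
        subscribers.forEach(subscriber -> subscriber.accept(key));
    }
}

// One application server with its own local (L1) cache.
final class AppServer {
    private final Map<String, String> localCache = new ConcurrentHashMap<>();
    private final Map<String, String> sharedStore; // stands in for Redis plus the database
    private final InvalidationBus bus;

    AppServer(Map<String, String> sharedStore, InvalidationBus bus) {
        this.sharedStore = sharedStore;
        this.bus = bus;
        // Evict the local copy whenever any server publishes this key.
        bus.subscribe(localCache::remove);
    }

    String read(String key) {
        // Local hit if present; otherwise load from the shared store and keep a copy.
        return localCache.computeIfAbsent(key, sharedStore::get);
    }

    void update(String key, String value) {
        sharedStore.put(key, value); // write the shared copy first
        bus.publish(key);            // then tell every server, including this one, to evict
    }
}
```

With a real broker the publish step crosses the network and can be delayed or dropped, which is why this approach is often paired with a TTL on local entries as a backstop.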
In interviews, a strong answer is:

"If freshness matters, I would start with Redis as a shared cache. If latency becomes a concern, I would introduce local caches and use Pub/Sub invalidation or short TTLs to keep them synchronized."

Problem Context 3: Very High Write Volume

Suppose a system records lots of events, counters, or activity logs. Writing synchronously to the database on every change may be too expensive.

A write-behind strategy writes to cache first and persists to the database asynchronously later.

void recordEvent(Event event) {
    // Fast path: update the counter in the cache only.
    cache.increment(event.counterKey());

    // async worker later flushes updates to database
}

Why It Works Here

Writes feel fast because the request does not wait for the database every time.

Main Tradeoff

If the cache or async pipeline fails before flushing, data may be delayed or lost unless the pipeline is durable.

Write-behind can improve write performance, but it needs careful durability and failure handling.
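
One way to picture the asynchronous flush is a buffer of pending counter deltas that a background worker drains. This is a sketch under simplifying assumptions: WriteBehindCounters and the persistDelta callback are hypothetical names, the buffer lives in process memory rather than in Redis, and nothing here is durable.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

// Hypothetical write-behind buffer for counters.
final class WriteBehindCounters {
    private final ConcurrentHashMap<String, Long> pending = new ConcurrentHashMap<>();

    // Hot path: only touches memory, never the database.
    void increment(String counterKey) {
        pending.merge(counterKey, 1L, Long::sum);
    }

    // Called by a background worker, for example on a timer.
    // Each counter is removed atomically and its delta is handed to the callback.
    // If the callback fails after the removal, that delta is lost, which is
    // exactly the durability risk described above.
    int flush(BiConsumer<String, Long> persistDelta) {
        int flushedKeys = 0;
        for (String key : pending.keySet()) {
            Long delta = pending.remove(key);
            if (delta != null) {
                persistDelta.accept(key, delta);
                flushedKeys++;
            }
        }
        return flushedKeys;
    }
}
```

Batching has a side benefit: many increments to the same key collapse into one database write per flush.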

Problem Context 4: Avoid Polluting the Cache

Suppose users upload large reports or rarely accessed documents. Writing every new object into cache may waste memory.

A write-around strategy writes directly to the database and does not immediately populate the cache.

void saveReport(Report report) {
    database.save(report);

    // do not cache immediately
    // cache later only if users actually read it
}

Why It Works Here

Rarely read data does not fill the cache.

Main Tradeoff

The first read after write may be slower because the cache does not have the data yet.

Write-around is useful when many writes are unlikely to be read soon.

Problem Context 5: Fast Content Near the User

Suppose you serve images, JavaScript bundles, CSS files, videos, or public article pages to users around the world.

A CDN cache (Content Delivery Network) stores content at edge locations closer to users. This reduces latency and reduces origin server load.

User
  ↓
Nearby CDN edge
  ↓
Origin server
  ↓
Database / storage

Why It Works Here

Static or mostly static content can be served from a nearby edge cache.

Main Tradeoff

Cache invalidation and stale content become important when content changes.

Problem Context 6: Rapidly Changing Data

Suppose you cache comments, activity feeds, leaderboards, or inventory-like data that changes frequently.

One practical approach is TTL-based caching. Every cache key gets an expiration time, and the system accepts that data may be slightly stale for a short period.

cache.set(
    "leaderboard:daily",
    leaderboard,
    Duration.ofSeconds(5)
);

AWS recommends applying TTLs to cache keys in most cases, and notes that short TTLs can be a practical way to protect a hammered database query while evaluating a more elegant solution.

Why It Works Here

A short TTL reduces database load while limiting how stale the data can become.

Main Tradeoff

Users may briefly see old data.

Problem Context 7: Expensive Data That Must Stay Warm

Suppose a homepage, recommendation block, or pricing summary is very expensive to compute and gets requested often.

A refresh-ahead strategy refreshes cached data before it expires, so users are less likely to experience a slow cache miss.

Cache entry exists
  ↓
Near expiration
  ↓
Refresh in background
  ↓
User sees warm cache

Why It Works Here

Users avoid slow misses for very hot or expensive data.

Main Tradeoff

The system may refresh data that nobody ends up requesting.
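
A rough sketch of refresh-ahead for a single cached value follows. The class name RefreshAheadValue, the refreshAfter threshold, and the injectable clock and executor are assumptions made for illustration; real caching libraries expose this behavior through their own configuration.

```java
import java.time.Duration;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical single-entry cache; all names here are illustrative.
final class RefreshAheadValue<V> {
    private final Supplier<V> loader;
    private final long ttlNanos;
    private final long refreshAfterNanos;
    private final Executor background;
    private final LongSupplier clock;
    private final AtomicBoolean refreshing = new AtomicBoolean(false);

    private volatile V value;
    private volatile long loadedAtNanos;
    private volatile boolean loaded;

    RefreshAheadValue(Supplier<V> loader, Duration ttl, Duration refreshAfter,
                      Executor background, LongSupplier clock) {
        this.loader = loader;
        this.ttlNanos = ttl.toNanos();
        this.refreshAfterNanos = refreshAfter.toNanos();
        this.background = background;
        this.clock = clock;
    }

    V get() {
        long age = clock.getAsLong() - loadedAtNanos;
        if (!loaded || age >= ttlNanos) {
            reload(); // cold or fully expired: this caller pays for the slow load
        } else if (age >= refreshAfterNanos && refreshing.compareAndSet(false, true)) {
            // Close to expiry: schedule one background refresh and keep serving the warm value.
            background.execute(() -> {
                try {
                    reload();
                } finally {
                    refreshing.set(false);
                }
            });
        }
        return value;
    }

    private void reload() {
        value = loader.get();
        loadedAtNanos = clock.getAsLong();
        loaded = true;
    }
}
```

For example, with a 60-second TTL and a 45-second refresh threshold, a request arriving at second 50 still gets the cached value immediately while a single background task recomputes it.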

Common Cache Layers

In-Memory Local Cache

Fastest access, but each application instance has its own copy. Good for small reference data.

Distributed Cache

Redis or Memcached-style cache shared by many application servers. Good for cross-instance coordination.

Database Query Cache

Stores expensive query results, but invalidation can become tricky when underlying rows change.

CDN / Edge Cache

Stores public content near users. Great for static assets and cacheable pages.

The Hard Part: Cache Invalidation

Cache invalidation means removing or refreshing old data when the source of truth changes. Redis describes invalidation as removing old data from the cache so the system can avoid serving outdated data and improve cache usefulness.

Common invalidation approaches include:

  • expire keys with TTL
  • delete cache keys after database writes
  • update cache immediately after writes
  • publish events that tell services to evict keys
  • version cache keys when data models change

A cache can make a system faster, but a stale cache can make the system confusing or incorrect.
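
Two of the approaches listed above, deleting the key after a database write and versioning cache keys, can be sketched together. ProfileStore, SCHEMA_VERSION, and the key format are hypothetical, and the two maps stand in for a real database and cache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical repository; the two maps stand in for a real database and cache.
final class ProfileStore {
    // Bump this when the cached shape changes so entries written under the old format are never read again.
    static final int SCHEMA_VERSION = 2;

    private final Map<String, String> database = new ConcurrentHashMap<>();
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    static String cacheKey(String userId) {
        return "profile:v" + SCHEMA_VERSION + ":" + userId;
    }

    String read(String userId) {
        // Cache-aside read: fill the cache from the database on a miss.
        return cache.computeIfAbsent(cacheKey(userId), key -> database.get(userId));
    }

    void update(String userId, String profile) {
        database.put(userId, profile);  // 1. change the source of truth
        cache.remove(cacheKey(userId)); // 2. delete the cached copy; the next read reloads it
    }
}
```

Deleting rather than overwriting keeps the write path simple, at the cost of one extra miss on the next read.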

Cache Stampede / Thundering Herd

A cache stampede happens when many requests miss the cache at the same time and all hit the database or downstream service together.

Popular key expires
        ↓
1,000 requests miss cache
        ↓
1,000 database queries
        ↓
database spike

Common protections include:

  • request coalescing or single-flight loading
  • locks around cache rebuilds
  • jittered TTLs so many keys do not expire together
  • refresh-ahead for very hot keys
  • serving stale data briefly while refreshing in the background
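
Two of these protections, single-flight loading and jittered TTLs, can be sketched as follows. SingleFlightLoader and JitteredTtl are hypothetical helper names; the sketch only deduplicates concurrent loads within one process and does not store the result, so it would sit in front of a cache write rather than replace it.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Function;

// Hypothetical single-flight helper: concurrent callers for the same key share one load.
final class SingleFlightLoader<K, V> {
    private final ConcurrentHashMap<K, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();

    V load(K key, Function<K, V> loader) {
        CompletableFuture<V> mine = new CompletableFuture<>();
        CompletableFuture<V> existing = inFlight.putIfAbsent(key, mine);
        if (existing != null) {
            // Another caller is already loading this key: wait for its result
            // instead of sending a second query to the database.
            return existing.join();
        }
        try {
            V value = loader.apply(key);
            mine.complete(value);
            return value;
        } catch (RuntimeException | Error failure) {
            mine.completeExceptionally(failure); // waiting callers see the failure too
            throw failure;
        } finally {
            inFlight.remove(key, mine); // later callers start a fresh load
        }
    }
}

final class JitteredTtl {
    // Returns a TTL in [base, base + maxJitter] so keys written together do not expire together.
    static Duration withJitter(Duration base, Duration maxJitter) {
        long extraMillis = ThreadLocalRandom.current().nextLong(maxJitter.toMillis() + 1);
        return base.plusMillis(extraMillis);
    }
}
```

In the diagram above, the 1,000 concurrent misses for the popular key would collapse into one database query per application server, with the other callers waiting on the shared result.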

Choosing the Right Strategy

Strategy       | Problem Context                    | Main Risk
Cache-aside    | Read-heavy data loaded on demand   | First miss is slower
Write-through  | Data should be fresh after writes  | Slower writes
Write-behind   | High write volume                  | Data loss risk without durable async pipeline
Write-around   | Avoid caching rarely read writes   | First read after write is slower
Short TTL      | Fast-changing data                 | Brief stale reads
Refresh-ahead  | Hot expensive data                 | Unneeded refresh work

The Interview-Friendly Explanation

Caching improves latency and reduces load, but each strategy has a consistency tradeoff. Cache-aside loads data on demand and is common for read-heavy systems. Write-through keeps cache fresher but slows writes. Write-behind improves write latency but needs durable failure handling. Write-around avoids cache pollution. TTLs and invalidation control staleness. For distributed systems, also discuss Redis, cache stampedes, stale data, and what happens if the cache fails.

Common Interview Follow-Ups

What is the most common caching strategy?

Cache-aside is one of the most common strategies. The application checks the cache first, reads from the database on a miss, stores the result in cache, and returns it.

Why is cache invalidation hard?

Because the cache is a copy of data. When the source of truth changes, every cached representation that depends on that data may need to be updated or removed.

What is a cache stampede?

A cache stampede happens when many requests miss the cache at the same time and all hit the database or downstream service together.

Should every piece of data be cached?

No. Cache data that is expensive to fetch or compute and is read often enough to justify cache memory and invalidation complexity.

What happens if Redis is down?

Usually the system should degrade gracefully. Depending on the product, it may bypass cache and hit the database, serve stale data, shed load, or return an error for noncritical features.
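
The "bypass cache and hit the database" option might look like the sketch below. RemoteCache is a hypothetical interface rather than a real Redis client API, and the example assumes cache failures surface as unchecked exceptions.

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical cache client interface; a real client would wrap Redis or similar.
interface RemoteCache {
    Optional<String> get(String key);

    void set(String key, String value);
}

final class ResilientReader {
    private final RemoteCache cache;
    private final Function<String, String> database; // stand-in for a database lookup

    ResilientReader(RemoteCache cache, Function<String, String> database) {
        this.cache = cache;
        this.database = database;
    }

    String read(String key) {
        try {
            Optional<String> cached = cache.get(key);
            if (cached.isPresent()) {
                return cached.get();
            }
        } catch (RuntimeException cacheUnavailable) {
            // Cache is down: serve from the database and skip the cache write.
            return database.apply(key);
        }
        String value = database.apply(key);
        try {
            cache.set(key, value);
        } catch (RuntimeException cacheUnavailable) {
            // A failed cache write should not fail the request.
        }
        return value;
    }
}
```

Bypassing the cache sends the full read load to the database, so this fallback is usually combined with timeouts, a circuit breaker, or load shedding.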

Final Takeaway

Caching is not one pattern. It is a set of tradeoffs. A good system design answer explains what data is being cached, why it is safe to cache, how it is invalidated, how stale it can be, and what happens when the cache fails.