Interview Prep

System Design Interview Questions

Distributed systems, messaging, scalability, caching, queues, consistency, and backend architecture interview prep.


Kafka Partitions and Ordering Explained

Kafka guarantees ordering within a partition, not across the whole topic. The key idea is choosing the right message key so related events go to the same partition.

Kafka, Distributed Systems, Messaging, Ordering
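
To illustrate why the message key matters, here is a minimal sketch of key-based partition selection. The class and method names are hypothetical, and CRC32 is only a dependency-free stand-in: Kafka's default partitioner hashes the key bytes with murmur2, so real partition numbers will differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper, not part of the Kafka client API.
final class KeyPartitionSketch {
    // Maps a key to a partition index in [0, numPartitions).
    // Assumption: CRC32 stands in for Kafka's murmur2 hash.
    static int partitionFor(String key, int numPartitions) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        // getValue() is non-negative, so the remainder is a valid index.
        return (int) (crc.getValue() % numPartitions);
    }
}
```

Because the result depends only on the key and the partition count, every event keyed by the same order ID lands in one partition and keeps its relative order. Changing the partition count changes the mapping.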

What are idempotency keys and why are they important?

Idempotency keys prevent duplicate side effects when clients retry requests. They are critical for payments, order creation, purchases, booking systems, and any API where running the same operation twice would be dangerous.

Reliability, APIs, Distributed Systems, Payments
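
A minimal in-memory sketch of the idea, assuming the client sends a unique key with each logical request. `IdempotentExecutor` is a hypothetical name, and a real service would keep results in a durable store with an expiry rather than in a map.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Hypothetical sketch: remembers the result of each idempotency key.
final class IdempotentExecutor {
    private final ConcurrentMap<String, String> results = new ConcurrentHashMap<>();

    // Runs the operation at most once per key; retries get the stored result.
    String execute(String idempotencyKey, Supplier<String> operation) {
        // computeIfAbsent is atomic per key, so concurrent retries
        // cannot both trigger the side effect.
        return results.computeIfAbsent(idempotencyKey, k -> operation.get());
    }
}
```

A retried payment request with the same key returns the first charge's result instead of charging again.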

Exactly-once vs at-least-once delivery in Kafka

At-least-once means messages are not lost but may be processed more than once. Exactly-once means the final processing effect happens once, which usually requires transactions, idempotency, or careful system design.

Kafka, Distributed Systems, Reliability, Idempotency

What are common rate limiting strategies in system design?

Rate limiting controls how many requests a client can make within a time period. Common strategies include fixed window, sliding window, token bucket, and leaky bucket.

System Design, Rate Limiting, APIs, Scalability, Distributed Systems
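
Of the strategies listed, the token bucket is a common one to code in an interview. The sketch below is one possible single-process version with hypothetical names; the caller passes the current time explicitly so the behavior is easy to test. A distributed limiter would need shared state, which this does not cover.

```java
// Hypothetical single-process token bucket.
final class TokenBucket {
    private final double capacity;        // maximum burst size
    private final double refillPerSecond; // sustained request rate
    private double tokens;
    private long lastRefillNanos;

    TokenBucket(double capacity, double refillPerSecond, long nowNanos) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.tokens = capacity; // start full so an initial burst is allowed
        this.lastRefillNanos = nowNanos;
    }

    // Returns true if the request may proceed, consuming one token.
    synchronized boolean tryAcquire(long nowNanos) {
        double elapsedSeconds = (nowNanos - lastRefillNanos) / 1_000_000_000.0;
        tokens = Math.min(capacity, tokens + elapsedSeconds * refillPerSecond);
        lastRefillNanos = nowNanos;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

In production code the caller would typically pass `System.nanoTime()` as the time argument.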

What is consistent hashing and why is it useful?

Consistent hashing distributes keys across servers in a way that minimizes remapping when servers are added or removed. It is useful for caches, sharding, load balancing, and routing users to stable backend nodes.

System Design, Consistent Hashing, Distributed Systems, Caching, Sharding
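
The remapping property can be seen in a small hash ring. This is a sketch under assumptions: the class name, the 100 virtual nodes per server, and the CRC32 hash are all illustrative choices, not a recommendation for production.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Hypothetical hash ring with virtual nodes.
final class HashRing {
    private static final int VIRTUAL_NODES = 100; // assumed value
    private final TreeMap<Long, String> ring = new TreeMap<>();

    private static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    void addNode(String node) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            // putIfAbsent keeps the earlier owner if two positions collide.
            ring.putIfAbsent(hash(node + "#" + i), node);
        }
    }

    void removeNode(String node) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            // Only remove positions this node actually owns.
            ring.remove(hash(node + "#" + i), node);
        }
    }

    // Walks clockwise to the first position at or after the key's hash.
    String nodeFor(String key) {
        if (ring.isEmpty()) throw new IllegalStateException("no nodes in ring");
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }
}
```

Removing one server only moves the keys that server owned; keys on the other servers keep their assignment, which is what keeps cache hit rates stable during scaling.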

How would you safely transfer money between accounts?

A safe money transfer system must handle concurrency, consistency, deadlocks, retries, duplicate requests, and distributed system failures while ensuring money is never lost or duplicated.

System Design, Concurrency, Transactions, Distributed Systems, Consistency, Backend
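
For the concurrency and deadlock part of this question, one common in-process approach is to always lock the two accounts in a fixed order, for example by account ID. The `Account` class below is a hypothetical sketch of only that part; retries, duplicate requests, and cross-service failures need separate mechanisms such as idempotency keys and database transactions.

```java
// Hypothetical in-memory account; amounts are in cents to avoid
// floating-point rounding.
final class Account {
    final long id;
    private long balanceCents;

    Account(long id, long balanceCents) {
        this.id = id;
        this.balanceCents = balanceCents;
    }

    synchronized long balanceCents() {
        return balanceCents;
    }

    // Returns false and changes nothing if funds are insufficient.
    static boolean transfer(Account from, Account to, long amountCents) {
        if (amountCents <= 0) throw new IllegalArgumentException("amount must be positive");
        if (from.id == to.id) throw new IllegalArgumentException("accounts must differ");
        // Lock the lower ID first so opposite-direction transfers between
        // the same two accounts cannot wait on each other forever.
        Account first = from.id < to.id ? from : to;
        Account second = (first == from) ? to : from;
        synchronized (first) {
            synchronized (second) {
                if (from.balanceCents < amountCents) return false;
                from.balanceCents -= amountCents;
                to.balanceCents += amountCents;
                return true;
            }
        }
    }
}
```

Both balances change while both locks are held, so no other thread using these methods can observe money missing or duplicated mid-transfer.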

LRU Cache Design

Understand how an LRU cache works, how Java's LinkedHashMap can implement it, and what interviewers expect beyond the code.

Caching, Java, HashMap, LinkedHashMap, System Design
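
A minimal sketch of the LinkedHashMap approach: switch the map to access order and override `removeEldestEntry`. The class name `LruCache` is illustrative, and this version is not thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// LRU cache built on LinkedHashMap's access-order mode.
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        // accessOrder = true moves an entry to the end on get and put,
        // so the head of the map is always the least recently used entry.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    // Called by LinkedHashMap after each insertion; returning true
    // evicts the eldest (least recently used) entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}
```

With capacity 2, putting `a` and `b`, reading `a`, then putting `c` evicts `b`, because `a` was used more recently.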

Why Use Kafka?

Understand when Kafka and similar messaging systems become useful: moving from direct service calls to durable event-driven communication.

Kafka, Event Driven Architecture, Messaging, System Design

What are common caching strategies in system design?

Caching improves latency and reduces load by storing frequently used data closer to the application or user. Common strategies include cache-aside, read-through, write-through, write-behind, write-around, TTL-based caching, and CDN caching.

System Design, Caching, Redis, Scalability, Performance, Distributed Systems
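
Cache-aside is usually the first strategy to explain, so here is a small sketch of it. The names are hypothetical, an in-memory map stands in for a cache such as Redis, and the `loader` function stands in for the database read.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical cache-aside wrapper around a slower data source.
final class CacheAside<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // assumed stand-in for a database read

    CacheAside(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        V cached = cache.get(key);
        if (cached != null) return cached;     // cache hit
        V loaded = loader.apply(key);          // cache miss: read the source
        if (loaded != null) cache.put(key, loaded);
        return loaded;
    }

    // Call after a write to the source so the next read reloads fresh data.
    void invalidate(K key) {
        cache.remove(key);
    }
}
```

This sketch has no TTL and lets concurrent misses load the same key twice; both are deliberate simplifications.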

What are tail latency and p99 latency?

Latency is the time from when a client sends a request until it receives the result from the server. Tail latency describes the slowest requests in a system. p99 latency means 99% of requests are faster than this value, while the slowest 1% are at or above it.

System Design, Latency, p99, Performance, Distributed Systems, SRE
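
As a concrete illustration, a percentile can be computed from recorded latencies with the nearest-rank method. This is one of several percentile definitions, and monitoring systems often use approximations such as histograms instead; the class name is hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper using the nearest-rank percentile definition.
final class Percentiles {
    // Returns the smallest sample such that at least p percent of
    // samples are less than or equal to it.
    static long percentile(long[] samples, double p) {
        if (samples.length == 0) throw new IllegalArgumentException("no samples");
        if (p <= 0 || p > 100) throw new IllegalArgumentException("p must be in (0, 100]");
        long[] sorted = samples.clone(); // do not reorder the caller's array
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

The tail matters because an average can look healthy while the p99 is many times slower, and a page that makes several backend calls is more likely to hit at least one slow call.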

What is a dead letter queue and why is it useful?

A dead letter queue stores messages that could not be processed successfully after retries or validation failures. It helps the main pipeline keep moving while preserving failed messages for debugging, alerting, and replay.

System Design, Messaging, Kafka, DLQ, Reliability, Distributed Systems
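
The flow described above can be sketched as a consumer that retries a handler a fixed number of times and then parks the message. Everything here is illustrative: the names are hypothetical, and an in-memory list stands in for a real dead letter topic or queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical consumer with bounded retries and a dead letter list.
final class DlqConsumer {
    private final int maxAttempts;
    private final Consumer<String> handler;
    private final List<String> deadLetters = new ArrayList<>(); // stand-in for a DLQ

    DlqConsumer(int maxAttempts, Consumer<String> handler) {
        this.maxAttempts = maxAttempts;
        this.handler = handler;
    }

    void consume(String message) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                handler.accept(message);
                return; // processed successfully
            } catch (RuntimeException e) {
                // Fall through and retry; a real consumer would log and back off.
            }
        }
        // Retries exhausted: keep the message for inspection and replay
        // instead of blocking the messages behind it.
        deadLetters.add(message);
    }

    List<String> deadLetters() {
        return deadLetters;
    }
}
```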

Deadlock prevention strategies

Deadlock happens when threads wait forever on each other's locks. The most practical prevention strategy is consistent lock ordering, plus timeouts, smaller lock scope, and better system design.

Concurrency, Deadlock, Locks, Java, System Design
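
The timeout strategy can be sketched with `ReentrantLock.tryLock`, which gives up after a bounded wait instead of blocking forever. The helper name and the choice to return `false` on timeout are assumptions; callers would usually retry or report an error.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;

// Hypothetical helper: acquire two locks with timeouts.
final class TimedLocking {
    // Runs the action while holding both locks. Returns false if either
    // lock could not be acquired within the timeout.
    static boolean runWithBoth(Lock a, Lock b, long timeoutMillis, Runnable action) {
        try {
            if (!a.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) return false;
            try {
                if (!b.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) return false;
                try {
                    action.run();
                    return true;
                } finally {
                    b.unlock();
                }
            } finally {
                a.unlock(); // released even when b times out
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }
}
```

Timeouts turn a potential deadlock into a recoverable failure, but they do not remove the cause, so they work best alongside consistent lock ordering.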

Availability concepts in system design

Availability means the system can successfully serve users when they need it. This page explains how availability is measured, improved, and discussed in system design interviews.

Availability, Reliability, SLO, System Design, High Availability

Timeouts and retries in system design

Timeouts prevent services from waiting forever, while retries help recover from temporary failures. Used badly, retries can overload dependencies and cause cascading failures.

Reliability, Timeouts, Retries, Backoff, System Design
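
One common way to keep retries from overloading a dependency is capped exponential backoff with jitter. The sketch below covers only the retry side, and its names and parameters are hypothetical; the call being retried is assumed to enforce its own timeout.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Supplier;

// Hypothetical retry helper with capped exponential backoff and full jitter.
final class Retry {
    // Upper bound for the delay before the next attempt:
    // base * 2^(attempt - 1), capped at capMillis.
    static long backoffMillis(int attempt, long baseMillis, long capMillis) {
        long exponential = baseMillis << Math.min(attempt - 1, 20);
        return Math.min(capMillis, exponential);
    }

    static <T> T withRetries(Supplier<T> call, int maxAttempts, long baseMillis, long capMillis) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt == maxAttempts) break;
                // Full jitter: sleep a random time in [0, backoff] so that
                // many clients do not retry in lockstep.
                long bound = backoffMillis(attempt, baseMillis, capMillis);
                long delay = ThreadLocalRandom.current().nextLong(bound + 1);
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted during backoff", ie);
                }
            }
        }
        throw last; // all attempts failed
    }
}
```

This sketch retries every `RuntimeException`; real code should retry only errors that are likely to be temporary and only operations that are safe to repeat.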

Circuit breaker pattern in system design

A circuit breaker protects your service from repeatedly calling a failing dependency. It fails fast, gives the dependency time to recover, and helps prevent cascading failures.

Reliability, Circuit Breaker, Resilience, Microservices, System Design
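
A minimal sketch of the closed, open, and half-open behavior, assuming a simple consecutive-failure threshold. The class name, parameters, and explicit time argument are hypothetical choices made for readability and testing; production libraries track failure rates and limit half-open trial calls more carefully.

```java
import java.util.function.Supplier;

// Hypothetical circuit breaker driven by consecutive failures.
final class CircuitBreaker {
    private final int failureThreshold;
    private final long openMillis;
    private int consecutiveFailures = 0;
    private long openedAtMillis = -1; // -1 means the circuit is closed

    CircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    // Simplification: the lock is held during the call, which serializes callers.
    synchronized <T> T call(Supplier<T> action, Supplier<T> fallback, long nowMillis) {
        if (openedAtMillis >= 0 && nowMillis - openedAtMillis < openMillis) {
            return fallback.get(); // open: fail fast without calling the dependency
        }
        try {
            T result = action.get(); // closed, or a half-open trial call
            consecutiveFailures = 0;
            openedAtMillis = -1;     // success closes the circuit
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                openedAtMillis = nowMillis; // open, or re-open after a failed trial
            }
            return fallback.get();
        }
    }
}
```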

Graceful degradation in system design

Graceful degradation means keeping the most important user flows working even when optional features or dependencies fail.

Reliability, Availability, Graceful Degradation, Resilience, System Design

Reliability concepts in system design

Reliability means a system performs its intended function correctly and consistently over time. It includes availability, correctness, durability, recovery, observability, and predictable behavior during failures.

Reliability, Availability, SLO, Error Budget, System Design

Health checks in system design

Health checks help infrastructure decide whether an instance is alive, ready for traffic, or should be restarted. They are essential for load balancing, deployments, and auto-recovery.

Health Checks, Reliability, Availability, Kubernetes, System Design