1. Scalability
Design systems that handle growing load either by adding resources (CPU, memory) to existing machines (vertical scaling) or by adding more machines (horizontal scaling).
- Prefer horizontal scaling where possible.
- Stateless services simplify scaling: any instance can serve any request, so instances can be added or removed freely.
2. Availability & Reliability
Keep the system responsive and correct despite failures.
- Use redundancy and failover.
- Design for graceful degradation.
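Redundancy with failover and graceful degradation can be sketched as follows. This is a minimal illustration, not a production pattern: `call_with_failover` and its parameters are hypothetical names, and each replica is modeled as a plain callable.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(replicas: Sequence[Callable[[], T]], fallback: T) -> T:
    """Try each replica in order; return the fallback if all of them fail."""
    for replica in replicas:
        try:
            return replica()
        except Exception:
            continue  # this replica failed; fail over to the next one
    # Every replica failed: degrade gracefully instead of raising.
    return fallback
```

A real system would usually add timeouts, health checks, and retry limits; the sketch only shows the control flow.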
3. Partitioning (Sharding)
Split data/work across multiple nodes to reduce per-node load.
- Shard by user ID, geography, or time-window depending on access patterns.
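As a minimal sketch of sharding by user ID, a stable hash can map each user to one shard. The shard count and the `shard_for` helper are assumptions made for illustration.

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count, for illustration only


def shard_for(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a user ID to a shard index using a stable hash.

    hashlib is used instead of the built-in hash(), which is randomized
    per process for strings and could route the same user to a different
    shard after a restart.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Modulo-based sharding remaps most keys when the shard count changes; consistent hashing is a common alternative when shards are added or removed often.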
4. Data Consistency & CAP Tradeoffs
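Understand CAP (Consistency, Availability, Partition tolerance): a distributed system cannot guarantee all three at once. Because network partitions cannot be ruled out, the practical choice during a partition is between consistency and availability.
- Choose between strong consistency (e.g., synchronous replication) and eventual consistency (e.g., asynchronous replication) based on requirements.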
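The difference between synchronous and asynchronous replication can be illustrated with a toy model. The `Primary` class below is hypothetical and uses in-memory dicts in place of real replicas; it only shows when replicas see a write.

```python
from typing import Dict, List, Tuple


class Primary:
    """Toy primary node that replicates writes to in-memory replicas."""

    def __init__(self, num_replicas: int, synchronous: bool) -> None:
        self.data: Dict[str, str] = {}
        self.replicas: List[Dict[str, str]] = [{} for _ in range(num_replicas)]
        self.synchronous = synchronous
        self._pending: List[Tuple[str, str]] = []

    def write(self, key: str, value: str) -> None:
        self.data[key] = value
        if self.synchronous:
            # Strong consistency: every replica is updated before returning.
            for replica in self.replicas:
                replica[key] = value
        else:
            # Eventual consistency: return now, replicate later.
            self._pending.append((key, value))

    def replicate_pending(self) -> None:
        """Apply queued writes to all replicas (the asynchronous catch-up)."""
        for key, value in self._pending:
            for replica in self.replicas:
                replica[key] = value
        self._pending.clear()
```

In asynchronous mode a read from a replica can return stale data until `replicate_pending` runs; in synchronous mode the write pays the replication cost up front.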
5. Caching
Reduce latency and load with caches (CDN, in-memory caches like Redis).
- Cache invalidation is hard; use TTLs and versioned keys to bound staleness.
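A minimal sketch of the TTL and versioning ideas, assuming an in-process dict rather than Redis. `TTLCache` and `versioned_key` are hypothetical names, and the injectable clock exists only so expiry can be exercised without sleeping.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory cache whose entries expire ttl_seconds after being set."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # lazily evict the expired entry
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (self._clock() + self._ttl, value)


def versioned_key(name: str, version: int) -> str:
    """Embed a version in the key; bumping it makes old entries unreachable."""
    return f"{name}:v{version}"
```

With versioned keys, a writer increments the version instead of deleting entries, and stale entries simply age out through the TTL.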
6. Load Balancing
Distribute traffic across servers to avoid hotspots and single points of failure.
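One of the simplest distribution strategies is round-robin. The sketch below assumes a fixed list of server names and a hypothetical `RoundRobinBalancer` class; it ignores server health and is not thread-safe.

```python
import itertools
from typing import Iterator, List


class RoundRobinBalancer:
    """Hand out servers in a fixed rotating order."""

    def __init__(self, servers: List[str]) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        # Copy the list so later changes by the caller do not affect rotation.
        self._cycle: Iterator[str] = itertools.cycle(list(servers))

    def next_server(self) -> str:
        return next(self._cycle)
```

Usage: `RoundRobinBalancer(["api1", "api2", "api3"])` returns `api1`, `api2`, `api3`, then `api1` again on successive `next_server()` calls.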
7. Observability
Logs, metrics, and traces help detect and diagnose problems quickly.
- Instrument key business flows; set meaningful alerts.
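Instrumenting a business flow can be as small as a decorator that records latency and error counts. The names below (`instrumented`, `latencies`, `errors`) are hypothetical, and module-level dicts stand in for a real metrics backend.

```python
import time
from collections import defaultdict
from functools import wraps
from typing import Any, Callable, DefaultDict, List

latencies: DefaultDict[str, List[float]] = defaultdict(list)  # seconds per call
errors: DefaultDict[str, int] = defaultdict(int)              # failures per flow


def instrumented(flow_name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Record the duration and failure count of every call to the wrapped function."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            except Exception:
                errors[flow_name] += 1
                raise  # re-raise so callers still see the failure
            finally:
                latencies[flow_name].append(time.perf_counter() - start)

        return wrapper

    return decorator
```

An alert could then be defined on, say, the error count or a latency percentile for a given flow name.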
8. Security & Privacy
Design with a threat model, least privilege, and encryption both in transit and at rest.
// Components (high level)
// - API servers (stateless)
// - Database (sharded), e.g., user-based shards
// - Cache (Redis) for lookups
// - Message queue for async analytics
// Flow:
// 1. Client -> Load Balancer -> API Server
// 2. API checks cache for short->long mapping
// 3. If miss, read from DB and populate cache
// 4. Writes: generate ID, write to DB and invalidate related caches
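Steps 2 to 4 of the flow above can be sketched as a cache-aside read path with invalidation on write. This is an illustration under stated assumptions: `ShortLinkService` is a hypothetical name, plain dicts stand in for the sharded database and Redis, and the counter-based ID scheme is invented for the example.

```python
import itertools
from typing import Dict, Optional


class ShortLinkService:
    """Cache-aside reads and invalidate-on-write, with dicts as stand-ins."""

    def __init__(self) -> None:
        self.db: Dict[str, str] = {}     # stand-in for the sharded database
        self.cache: Dict[str, str] = {}  # stand-in for Redis
        self._ids = itertools.count(1)   # assumed ID scheme: a simple counter

    def resolve(self, short_id: str) -> Optional[str]:
        # Step 2: check the cache for the short -> long mapping.
        long_value = self.cache.get(short_id)
        if long_value is not None:
            return long_value
        # Step 3: on a miss, read from the DB and populate the cache.
        long_value = self.db.get(short_id)
        if long_value is not None:
            self.cache[short_id] = long_value
        return long_value

    def create(self, long_value: str) -> str:
        # Step 4: generate an ID, write to the DB, invalidate related caches.
        short_id = format(next(self._ids), "x")
        self.db[short_id] = long_value
        self.cache.pop(short_id, None)
        return short_id
```

The cache is filled only on reads, so a freshly written mapping costs one database lookup on its first read and is served from the cache afterwards.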
Architecture snippet (ASCII)
//           Internet
//               |
//         Load Balancer
//           /   |   \
//   API1      API2      API3  (stateless)
//     |         |         |
//     +-> Cache (Redis) <-+
//     |                   |
//     +-> DB (Shard A/B/C)+-> Message Queue -> Analytics