
High Concurrency Database Design: Read-Write Separation, Sharding & Caching Strategies
Introduction: Database is the Most Common Bottleneck in High Concurrency Systems
The system is slowing down. You check monitoring—CPU is fine, memory is fine, network is fine.
The problem is the database.
Queries pile up, connection counts spike, and response time climbs from 50 ms to 5 seconds. This is the most common scenario in high concurrency systems.
From experience, 90% of system performance issues originate in the data layer.
This article will take you from bottleneck analysis through read-write separation, database sharding, Redis cache design, and those three headache-inducing cache problems: penetration, breakdown, and avalanche.
If you're not familiar with overall high concurrency architecture, we recommend first reading What is High Concurrency? Complete Guide.
1. Database Bottleneck Analysis
Before optimizing, you need to know where the problem is.
1.1 Connection Limit
Every database connection consumes memory. MySQL's default max_connections is 151; PostgreSQL's is 100.
When concurrent requests exceed the connection limit, new requests can only wait. Wait too long and they timeout, users see errors.
Symptoms:
- "Too many connections" errors
- Application can't get connections
- Requests queuing
Solutions:
- Use connection pools (HikariCP, PgBouncer)
- Increase max_connections (but there are limits)
- Read-write separation to distribute connections
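HikariCP and PgBouncer are production-grade poolers (Java and PostgreSQL respectively); to show the underlying idea, here is a minimal, illustrative Python pool backed by SQLite. The class name and sizes are our own choices for the sketch, not part of any library:

```python
import queue
import sqlite3

class ConnectionPool:
    """Minimal illustration: pre-open N connections and reuse them,
    instead of paying the connect cost on every request."""
    def __init__(self, size=5, dsn=":memory:"):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            # check_same_thread=False so pooled connections can be
            # handed between threads (fine for this demo)
            self._pool.put(sqlite3.connect(dsn, check_same_thread=False))

    def acquire(self, timeout=1.0):
        # Blocks until a connection frees up; raises queue.Empty on timeout,
        # which is exactly the "requests queuing" symptom above
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        self._pool.put(conn)

pool = ConnectionPool(size=2)
conn = pool.acquire()
assert conn.execute("SELECT 1").fetchone() == (1,)
pool.release(conn)
```

Real poolers add health checks, leak detection, and max-lifetime rotation on top of this core loop.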
1.2 Read-Write Contention
Databases have locking mechanisms. A write may lock a row or even the whole table, forcing other requests to wait.
In high concurrency scenarios, hot data is frequently read and written, lock contention becomes severe.
Symptoms:
- Many requests stuck in a "waiting for lock" state
- Write operations block reads
- Database CPU high but throughput low
Solutions:
- Read-write separation (reads don't affect writes)
- Optimistic locking instead of pessimistic locking
- Reduce lock hold time
- Hot data in cache
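To make "optimistic locking instead of pessimistic locking" concrete, here is a sketch using SQLite and a version column (table and column names are ours). No lock is held while reading; the `WHERE version = ?` clause detects a concurrent update at write time:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, stock INTEGER, version INTEGER)")
db.execute("INSERT INTO items VALUES (1, 10, 0)")

def deduct_stock(conn, item_id):
    """Optimistic lock: read freely, then update only if the row
    is still at the version we read. rowcount == 0 means someone
    else got there first, so the caller should retry."""
    stock, version = conn.execute(
        "SELECT stock, version FROM items WHERE id = ?", (item_id,)).fetchone()
    if stock <= 0:
        return False
    cur = conn.execute(
        "UPDATE items SET stock = stock - 1, version = version + 1 "
        "WHERE id = ? AND version = ?", (item_id, version))
    return cur.rowcount == 1

assert deduct_stock(db, 1) is True  # stock 10 -> 9, version 0 -> 1
```

The trade-off: under heavy contention on the same row, retries pile up, so optimistic locking fits read-heavy workloads best.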
1.3 Single Machine Capacity Limit
How much data can one database hold? How many QPS can it handle? Both have limits.
When data reaches hundreds of millions or billions of rows, single machine query performance significantly degrades.
Symptoms:
- Query time increases with data volume
- Insufficient disk space
- Backup and restore times too long
Solutions:
- Database sharding
- Archive old data
- Use distributed databases
2. Read-Write Separation Implementation
Read-write separation is the first step in high concurrency database optimization.
2.1 Principle Explanation
Most applications are "read-heavy, write-light." On an e-commerce site, roughly 90% of requests are browsing products; only 10% are purchases.
Read-write separation approach:
- Primary (Master): Handles all write operations
- Replica (Slave): Handles read operations, can have multiple
Write requests → Primary
Read requests → Replica 1 / Replica 2 / Replica 3
Primary data is synchronized to replicas through replication.
2.2 Implementation Methods
Application Layer Implementation
In code, determine SQL type and decide whether to send to primary or replica.
# Pseudocode: route by statement type. Note this naive check misses
# SELECT ... FOR UPDATE, which must go to the primary.
sql_upper = sql.lstrip().upper()
if sql_upper.startswith("SELECT") and "FOR UPDATE" not in sql_upper:
    connection = replica_pool.get_connection()
else:
    connection = primary_pool.get_connection()
Middleware Implementation
Use dedicated middleware for automatic routing:
- MySQL Proxy
- ProxySQL
- MaxScale
- ShardingSphere
Middleware benefits: Transparent to application, no code changes.
Cloud Managed Solutions
Major cloud databases support read-write separation:
- AWS Aurora: Automatic read-write split endpoint
- GCP Cloud SQL: Create Read Replica
- Azure SQL: Configure Read Scale-Out
2.3 Replication Lag Issues
Replica data isn't synchronized in real time; there's lag (usually milliseconds to seconds).
This causes the "just-written data can't be read" problem.
Solutions:
- Force primary reads: For queries requiring high timeliness, force reads from primary
- Lag-aware routing: Monitor replica lag, switch back to primary when lag is too high
- Causal consistency: Read with the write's GTID to ensure replica has synced
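The "force primary reads" option above can be as simple as remembering who wrote recently. Here is an illustrative routing sketch (the one-second window and function names are assumptions for the demo, not a standard API):

```python
import time

RECENT_WRITE_WINDOW = 1.0  # seconds; assumes replicas catch up within this window
_last_write = {}           # user_id -> timestamp of that user's last write

def record_write(user_id):
    _last_write[user_id] = time.monotonic()

def choose_datasource(user_id):
    """Read-your-own-writes routing: if this user wrote within the
    window, their reads go to the primary; everyone else can safely
    read from a replica."""
    wrote_at = _last_write.get(user_id)
    if wrote_at is not None and time.monotonic() - wrote_at < RECENT_WRITE_WINDOW:
        return "primary"
    return "replica"

record_write(42)
assert choose_datasource(42) == "primary"
assert choose_datasource(7) == "replica"
```

GTID-based causal consistency is stricter than this time window, but the window version is far easier to bolt onto an existing app.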
3. Database Sharding Strategy
When a single-machine database reaches its limits, sharding is the next step.
3.1 When to Shard
Signals to shard:
- Single table exceeds 10 million rows (rule of thumb)
- Single database exceeds 500GB
- Single database connections insufficient
- Query performance continuously declining
When NOT to shard:
- Data volume still small, optimizing SQL and indexes is enough
- Haven't done read-write separation and caching
- Team lacks distributed database experience
Sharding significantly increases complexity—not a light decision.
3.2 Database Sharding Strategies
Vertical Sharding: Split by business
Put different business tables in different databases:
- User DB: users, user_profiles
- Order DB: orders, order_items
- Product DB: products, categories
Benefit is business isolation—one DB down doesn't affect others.
Horizontal Sharding: Split by rules
Split same table's data across multiple databases:
- User DB 1: user_id 1-10 million
- User DB 2: user_id 10-20 million
Benefit is breaking single machine capacity limits.
3.3 Table Sharding Strategies
Vertical Table Sharding: Split columns
Split one wide table into multiple tables:
- users: id, name, email (frequently used columns)
- user_details: id, bio, avatar, preferences (infrequently used columns)
Reduces single row size, improves query performance.
Horizontal Table Sharding: Split rows
Split data into multiple tables by rules:
- orders_202401: January 2024 orders
- orders_202402: February 2024 orders
Or by ID modulo:
- orders_0: order_id % 4 = 0
- orders_1: order_id % 4 = 1
- orders_2: order_id % 4 = 2
- orders_3: order_id % 4 = 3
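Both routing rules above boil down to a tiny pure function. A sketch (shard count and table-name format are illustrative choices):

```python
from datetime import date

NUM_SHARDS = 4

def route_order_table(order_id: int) -> str:
    """Modulo routing: the same order_id always lands in the same table."""
    return f"orders_{order_id % NUM_SHARDS}"

def route_by_month(created: date) -> str:
    """Time routing: one table per month, easy to archive old months."""
    return f"orders_{created:%Y%m}"

assert route_order_table(10) == "orders_2"
assert route_order_table(11) == "orders_3"
assert route_by_month(date(2024, 1, 15)) == "orders_202401"
```

Modulo spreads load evenly but makes resharding painful (changing NUM_SHARDS moves most rows); month-based routing makes archiving trivial but concentrates writes on the current month's table.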
3.4 Sharding Key Selection
The sharding key determines which database/table each row goes to, so choosing it well is crucial.
Good sharding key characteristics:
- Frequently used in queries: Most queries include this field
- Evenly distributed: Data distributes evenly, no skew
- Rarely changes: Changing sharding key causes data migration
Common sharding keys:
- User ID (user_id)
- Order ID (order_id)
- Time (created_at)
- Tenant ID (tenant_id, SaaS scenarios)
3.5 Cross-Database Query Challenges
After sharding, the biggest challenge is cross-database queries.
Problem 1: Cross-database JOIN
User table is in user DB, order table is in order DB. How to query "user's all orders"?
Solutions:
- Application layer assembly (query user first, then orders)
- Data redundancy (order table stores username)
- Wide table (pre-merge related data)
Problem 2: Cross-database Pagination
Orders are in 4 databases, need to get "latest 10 orders"—how?
Solutions:
- Each DB gets 10, application layer merges and sorts
- Search engine (Elasticsearch) for queries
- Avoid deep pagination
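The "each DB gets 10, merge and sort" approach looks like this in application code. A sketch over in-memory lists standing in for per-shard query results (the tuple shape is our assumption):

```python
import heapq

def latest_orders(shards, page_size=10):
    """Cross-shard pagination: ask every shard for its own newest
    page_size rows (each already sorted newest-first), then merge
    the streams and cut the combined list back to page_size.
    `shards` is a list of lists of (created_at, order_id) tuples."""
    per_shard = [shard[:page_size] for shard in shards]
    # heapq.merge keeps the combined stream sorted; reverse=True
    # because inputs are sorted newest-first (descending)
    merged = heapq.merge(*per_shard, reverse=True)
    return list(merged)[:page_size]

shard_a = [(5, "a5"), (3, "a3"), (1, "a1")]
shard_b = [(6, "b6"), (4, "b4"), (2, "b2")]
assert latest_orders([shard_a, shard_b], page_size=3) == [(6, "b6"), (5, "a5"), (4, "b4")]
```

Note why deep pagination hurts: page 1000 requires every shard to return 1000 × page_size rows before the merge, which is exactly what the "avoid deep pagination" advice is about.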
Problem 3: Global Unique ID
After sharding, can't use auto-increment ID because each DB would generate duplicate IDs.
Solutions:
- UUID (simple but takes space)
- Snowflake ID (Twitter's approach)
- Database segment (Meituan Leaf)
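A Snowflake-style generator packs a timestamp, a machine ID, and a per-millisecond sequence into one 64-bit integer, so every machine can mint unique, roughly time-ordered IDs with no central coordinator. A sketch (the custom epoch and bit layout follow Twitter's published scheme; the class itself is illustrative):

```python
import threading
import time

class SnowflakeId:
    """Snowflake-style ID: 41 bits timestamp (ms) | 10 bits machine | 12 bits sequence.
    Sketch only: a clock moving backwards is not handled here."""
    EPOCH = 1704067200000  # custom epoch (2024-01-01 UTC), an arbitrary choice

    def __init__(self, machine_id: int):
        assert 0 <= machine_id < 1024  # must fit in 10 bits
        self.machine_id = machine_id
        self.sequence = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF  # 12-bit wraparound
                if self.sequence == 0:
                    # 4096 IDs exhausted in this millisecond: spin to the next one
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - self.EPOCH) << 22) | (self.machine_id << 12) | self.sequence

gen = SnowflakeId(machine_id=1)
a, b = gen.next_id(), gen.next_id()
assert a < b  # IDs are strictly increasing on one machine
```

Because the timestamp occupies the high bits, these IDs sort by creation time, which also makes them usable as a time-based sharding key.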
4. Redis Cache Design
Redis is standard equipment for high concurrency systems. Used correctly, it can cut database load by 90%.
4.1 Cache Architecture
Request → Application Layer → Redis Cache —(miss)→ Database
                                   └─(hit)→ return directly
Cache Aside Pattern (most common)
- Check cache first
- If cache miss, check database
- After query, write to cache
- Return result
On update:
- Update database
- Delete cache (not update)
Why "delete" instead of "update"? Because concurrent updates might cause cache and database inconsistency. After deletion, next request will re-read from database.
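The Cache Aside read and update paths fit in a few lines. This sketch uses a dict-backed `FakeRedis` stand-in so it runs without a server; the key format and TTL are illustrative choices:

```python
import json

class FakeRedis:
    """Dict-backed stand-in for a Redis client (demo only)."""
    def __init__(self):
        self.store = {}
    def get(self, key):
        return self.store.get(key)
    def setex(self, key, ttl, value):
        self.store[key] = value  # TTL ignored in this toy stand-in
    def delete(self, key):
        self.store.pop(key, None)

cache = FakeRedis()
db = {1: {"id": 1, "name": "Alice"}}  # pretend database table

def get_user(user_id):
    """Cache-aside read: check cache, fall back to DB, backfill cache."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    row = db.get(user_id)
    if row is not None:
        cache.setex(key, 300, json.dumps(row))  # 5-minute TTL
    return row

def update_user(user_id, fields):
    """Cache-aside write: update the DB, then DELETE (not update) the cache."""
    db[user_id].update(fields)
    cache.delete(f"user:{user_id}")

assert get_user(1)["name"] == "Alice"  # miss -> DB -> backfilled into cache
update_user(1, {"name": "Bob"})        # write invalidates the cached copy
assert get_user(1)["name"] == "Bob"    # next read repopulates from the DB
```

Swapping `FakeRedis` for a real `redis.Redis` client keeps the same shape, since `get`/`setex`/`delete` mirror real Redis commands.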
4.2 Caching Strategies
What to cache?
- Hot data (frequently read)
- Expensive computation results
- Rarely changing data
How long to cache?
- TTL (Time To Live) sets expiration
- Set based on data characteristics: hot products 5 minutes, user data 1 hour, static config 1 day
- Add random value to avoid simultaneous expiration
How big should cache be?
- Estimate hot data size
- Reserve 20% space for traffic spikes
- Set eviction policy (LRU most common)
4.3 Distributed Locks
High concurrency scenarios need distributed locks to protect shared resources.
Scenario: Flash sale inventory deduction. 100 requests deducting simultaneously—without locking, might oversell.
Redis Distributed Lock Implementation:
# Lock
SET lock_key unique_value NX PX 30000

# Unlock (use Lua script for atomicity)
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end
Notes:
- Set expiration time to avoid deadlock
- Use unique value to identify lock owner
- Verify it's your lock when unlocking
For more reliable distributed locks, consider Redlock algorithm or Redisson.
For more flash sale design details, see High Concurrency Transaction System Design.
4.4 Performance Tuning
Connection Pool Configuration
Don't create new connections for every request. Use connection pools, pre-establish connections for reuse.
Pipeline Batch Operations
Send multiple commands together, reduce network round trips.
pipe = redis.pipeline()
pipe.get("key1")
pipe.get("key2")
pipe.get("key3")
results = pipe.execute()
Choose Correct Data Structures
- Simple Key-Value: String
- Object attributes: Hash
- Leaderboards: Sorted Set
- Queues: List
- Dedup counting: Set / HyperLogLog
5. Cache Problem Solutions
Use caching and you'll run into these three problems. They're interview staples, and you'll hit them in production sooner or later.
5.1 Cache Penetration
Problem Description:
Querying data that doesn't exist: the cache has nothing, the database has nothing, so every request falls through to the database.
Malicious attackers can use this to overload your database.
Solutions:
Solution 1: Cache Empty Values
When database finds nothing, cache an empty value (with short TTL).
GET user:999999 → null (cached)
No database query needed
Solution 2: Bloom Filter
Add a Bloom filter layer before cache. Non-existent keys get intercepted, never hitting database.
Request → Bloom Filter → Cache → Database
              └─ key definitely absent: return directly
Bloom filters have some false positive rate but no false negatives.
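To make the false-positive/no-false-negative trade-off concrete, here is a toy Bloom filter: k hash positions set in an m-bit array, membership means all k bits are set. Production systems would use RedisBloom or a tuned library, so the sizes here are purely illustrative:

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter. May answer "maybe present" for a key that was
    never added (false positive); never answers "absent" for a key
    that WAS added (no false negatives)."""
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = 0  # Python int as an m-bit bitset

    def _positions(self, key):
        # Derive k independent positions by salting the hash with the index
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key):
        return all(self.bits >> p & 1 for p in self._positions(key))

bf = BloomFilter()
for uid in ["user:1", "user:2", "user:3"]:
    bf.add(uid)
assert bf.might_contain("user:2")  # added keys always pass
# An absent key is usually rejected; a small false-positive chance remains
```

Because deletions would clear bits shared by other keys, plain Bloom filters don't support removal; the usual workaround is periodic rebuilds or a counting variant.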
5.2 Cache Breakdown
Problem Description:
The moment a hot key expires, massive requests simultaneously hit the database.
Imagine Double 11's bestselling product—the moment cache expires, tens of thousands of requests simultaneously query the database.
Solutions:
Solution 1: Mutex Lock
Only one request queries database and updates cache, others wait.
def get_with_lock(key):
    value = redis.get(key)
    if value is not None:
        return value
    # NX + PX: acquire the lock WITH an expiry, so a crashed holder
    # can't deadlock everyone else
    if redis.set(lock_key, 1, nx=True, px=10000):
        try:
            value = db.query(key)
            redis.set(key, value)
        finally:
            redis.delete(lock_key)
        return value
    sleep(0.1)
    return get_with_lock(key)  # retry; only one request rebuilds the cache
Solution 2: Never Expire + Background Updates
Hot keys never expire, background tasks update regularly.
Cache: Never expires
Background task: Updates every 5 minutes
5.3 Cache Avalanche
Problem Description:
Large number of keys expire simultaneously, all requests hit database. Or Redis itself goes down.
Solutions:
Solution 1: Add Random Value to Expiration
ttl = base_ttl + random.randint(0, 300) # Base TTL + 0-300 seconds random
redis.setex(key, ttl, value)
Solution 2: Multi-layer Caching
Local cache + Redis + Database. If Redis goes down, local cache holds the line.
Solution 3: Redis High Availability
Use Redis Sentinel or Redis Cluster, avoid single point of failure.
Solution 4: Circuit Breaker Degradation
When database pressure is too high, trigger circuit breaker, return default values or error messages.
Database can't handle the traffic? From read-write separation to sharding, every step has traps. Book an architecture consultation and let experienced consultants help plan your data layer optimization.
6. Cloud Database Selection
If you're on the cloud, you don't need to manage databases yourself. But you do need to choose the right service.
6.1 AWS Solutions
| Service | Type | Use Cases |
|---|---|---|
| RDS | Relational | Traditional apps, MySQL/PostgreSQL |
| Aurora | Relational (Cloud Native) | High concurrency MySQL/PostgreSQL, auto-scaling |
| DynamoDB | NoSQL (Key-Value) | Ultra-high throughput, serverless |
| ElastiCache | Cache | Redis/Memcached managed |
Aurora Highlights:
- 5x faster than RDS MySQL
- Auto-scales to 128TB
- 15 read replicas
- Cross-AZ high availability
6.2 GCP Solutions
| Service | Type | Use Cases |
|---|---|---|
| Cloud SQL | Relational | MySQL/PostgreSQL/SQL Server |
| Cloud Spanner | Relational (Distributed) | Global consistency, massive scale |
| Firestore | NoSQL (Document) | Mobile apps, real-time sync |
| Memorystore | Cache | Redis managed |
Cloud Spanner Highlights:
- Horizontally scalable relational database
- Globally distributed
- Strong consistency (rare for a globally distributed database)
- 99.999% SLA
6.3 Azure Solutions
| Service | Type | Use Cases |
|---|---|---|
| Azure SQL | Relational | SQL Server ecosystem |
| Cosmos DB | NoSQL (Multi-model) | Global distribution, multiple APIs |
| Azure Cache | Cache | Redis managed |
Cosmos DB Highlights:
- Multi-model (document, key-value, graph, column-family)
- Globally distributed
- Multiple consistency levels to choose
- Millisecond response
6.4 Selection Comparison Table
| Need | AWS | GCP | Azure |
|---|---|---|---|
| MySQL Managed | Aurora | Cloud SQL | Azure SQL |
| Massive Scale Relational | Aurora | Spanner | Azure SQL Hyperscale |
| Global Distributed NoSQL | DynamoDB Global | Spanner / Firestore | Cosmos DB |
| Redis Managed | ElastiCache | Memorystore | Azure Cache |
For more detailed cloud architecture comparisons, see Cloud High Concurrency Architecture.
7. Practical Case Study
Case: E-commerce Order System Optimization
Background:
- Daily orders: 500,000
- Peak QPS during promotions: exceeds 10,000
- Database frequently couldn't handle it
Optimization Steps:
Step 1: Add Caching
- Product info in Redis, TTL 5 minutes
- Database reads reduced by 70%
Step 2: Read-Write Separation
- Primary for writes, two replicas for reads
- ProxySQL for routing
- Replicas handle 80% of read traffic
Step 3: Order Table Sharding
- Shard by month: orders_202501, orders_202502
- Archive historical orders to cold storage
- Hot data query performance improved 3x
Results:
- QPS increased from 3,000 to 15,000
- Database CPU usage dropped from 90% to 40%
- System stable during promotions
FAQ
Q1: When should I use Redis?
Any "read-heavy, write-light" hot data is a good fit for Redis: sessions, product info, leaderboards, counters, distributed locks.
Q2: Does read-write separation affect data consistency?
There will be lag, usually milliseconds to seconds. For "just wrote and need to read" scenarios, force reads from primary.
Q3: How to do reporting after sharding?
After sharding, cross-database queries are troublesome. Reporting should use a separate data warehouse (like ClickHouse, BigQuery), syncing data from business DBs on schedule.
Q4: Can NoSQL replace relational databases?
Depends on scenario. Scenarios needing complex queries, JOINs, transactions—relational databases remain preferred. NoSQL suits high throughput, flexible schema scenarios.
Q5: What if cache and database are inconsistent?
Use "delete cache" instead of "update cache" strategy. If still inconsistent, set shorter TTL, or use CDC (Change Data Capture) for sync.
Conclusion: Data Layer Optimization is Key to System Performance
Database is the lifeline of high concurrency systems. Optimize correctly and 10x performance improvement isn't a dream.
Key Takeaways:
- Database bottlenecks usually come from connections, lock contention, single machine capacity
- Read-write separation is the first step—replicas share read pressure
- Sharding breaks single machine limits but greatly increases complexity
- Redis cache reduces 90% database pressure
- Cache penetration solved with empty values or Bloom filters
- Cache breakdown solved with mutex locks or never-expire
- Cache avalanche solved with random TTL and high availability
- Cloud databases are hassle-free, but choose the right service
Extended Reading:
- What is High Concurrency? Complete Guide
- High Concurrency Architecture Design
- High Concurrency Testing Guide
- Python vs Golang High Concurrency
- High Concurrency Transaction System Design
- Cloud High Concurrency Architecture
Need a Second Opinion on Architecture?
Good data layer design can save multiples in costs. If you're:
- Struggling with a database that can't handle the traffic
- Evaluating whether to shard
- Choosing cloud database services
Book an architecture consultation and let's review your data layer architecture together.
All consultation content is completely confidential, no sales pressure.