
High Concurrency Database Design: Read-Write Separation, Sharding & Redis Caching Strategies | 2025

13 min read
#High Concurrency Database #Redis #Read-Write Separation #Database Sharding #Cache Penetration #Cache Breakdown #Cache Avalanche #MySQL #Cloud Database



Introduction: Database is the Most Common Bottleneck in High Concurrency Systems

The system is slowing down. You check monitoring—CPU is fine, memory is fine, network is fine.

The problem is the database.

Queries pile up, connection counts spike, and response times jump from 50ms to 5 seconds. This is the most common scenario in high concurrency systems.

From experience, 90% of system performance issues originate in the data layer.

This article will take you from bottleneck analysis through read-write separation, database sharding, Redis cache design, and those three headache-inducing cache problems: penetration, breakdown, and avalanche.

If you're not familiar with overall high concurrency architecture, we recommend first reading What is High Concurrency? Complete Guide.


1. Database Bottleneck Analysis

Before optimizing, you need to know where the problem is.

1.1 Connection Limit

Every database connection consumes memory. MySQL default max connections is 151, PostgreSQL is 100.

When concurrent requests exceed the connection limit, new requests can only wait. If they wait too long, they time out and users see errors.

Symptoms:

  • Too many connections error
  • Application can't get connections
  • Requests queuing

Solutions:

  • Use connection pools (HikariCP, PgBouncer)
  • Increase max_connections (but there are limits)
  • Read-write separation to distribute connections
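To make the pooling idea concrete, here's a minimal sketch. It uses sqlite3 and `queue.Queue` as stand-ins; a production system would use a real driver plus HikariCP or PgBouncer, which add health checks, leak detection, and sizing logic this toy omits.

```python
import queue
import sqlite3

class ConnectionPool:
    """Minimal pooling sketch: pre-open N connections, hand them out, take them back."""
    def __init__(self, dsn, size=5):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(sqlite3.connect(dsn, check_same_thread=False))

    def get(self, timeout=2.0):
        # Blocks until a connection frees up; raises queue.Empty on timeout,
        # which is the "requests queuing" symptom above made explicit
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        self._pool.put(conn)

pool = ConnectionPool(":memory:", size=2)
conn = pool.get()
conn.execute("SELECT 1")
pool.release(conn)
```

The key property: connection count is capped at pool size no matter how many requests arrive, so the database never sees a connection spike.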

1.2 Read-Write Contention

Databases use locking. A write may lock a row or an entire table, forcing other requests to wait.

In high concurrency scenarios, hot data is read and written frequently, so lock contention becomes severe.

Symptoms:

  • Many requests in waiting for lock state
  • Write operations block reads
  • Database CPU high but throughput low

Solutions:

  • Read-write separation (reads don't affect writes)
  • Optimistic locking instead of pessimistic locking
  • Reduce lock hold time
  • Hot data in cache
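The optimistic-locking idea can be sketched with a version column: read the version, then update only if nobody changed it in the meantime. This uses sqlite3 for illustration; table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, stock INTEGER, version INTEGER)")
conn.execute("INSERT INTO products VALUES (1, 100, 0)")

def deduct_stock(conn, product_id):
    """Optimistic lock: no row lock held during the read; the WHERE clause
    on version detects concurrent writers at update time."""
    stock, version = conn.execute(
        "SELECT stock, version FROM products WHERE id = ?", (product_id,)
    ).fetchone()
    cur = conn.execute(
        "UPDATE products SET stock = stock - 1, version = version + 1 "
        "WHERE id = ? AND version = ? AND stock > 0",
        (product_id, version),
    )
    return cur.rowcount == 1  # False: a concurrent writer won, caller retries

assert deduct_stock(conn, 1)  # stock 100 -> 99, version 0 -> 1
```

Unlike `SELECT ... FOR UPDATE`, no lock is held between read and write; losers simply retry, which works well when conflicts are rare.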

1.3 Single Machine Capacity Limit

How much data can one database hold? How many QPS can it handle? All have limits.

When data reaches hundreds of millions or billions of rows, single machine query performance significantly degrades.

Symptoms:

  • Query time increases with data volume
  • Insufficient disk space
  • Backup and restore times too long

Solutions:

  • Database sharding
  • Archive old data
  • Use distributed databases

2. Read-Write Separation Implementation

Read-write separation is the first step in high concurrency database optimization.

2.1 Principle Explanation

Most applications are "read-heavy, write-light." E-commerce sites have 90% of requests browsing products, only 10% are purchases.

Read-write separation approach:

  • Primary (Master): Handles all write operations
  • Replica (Slave): Handles read operations, can have multiple

Write requests → Primary
Read requests → Replica 1 / Replica 2 / Replica 3

Primary data is synchronized to replicas through replication.

2.2 Implementation Methods

Application Layer Implementation

In code, determine SQL type and decide whether to send to primary or replica.

# Pseudocode: route by statement type
# (note: locking reads like SELECT ... FOR UPDATE must still go to the primary)
sql = sql.lstrip().upper()
if sql.startswith("SELECT") and "FOR UPDATE" not in sql:
    connection = replica_pool.get_connection()
else:
    connection = primary_pool.get_connection()

Middleware Implementation

Use dedicated middleware for automatic routing:

  • MySQL Proxy
  • ProxySQL
  • MaxScale
  • ShardingSphere

Middleware benefits: Transparent to application, no code changes.

Cloud Managed Solutions

Major cloud databases support read-write separation:

  • AWS Aurora: Automatic read-write split endpoint
  • GCP Cloud SQL: Create Read Replica
  • Azure SQL: Configure Read Scale-Out

2.3 Replication Lag Issues

Replica data isn't synchronized in real time; there's lag (usually milliseconds to seconds).

This causes the "just-written data can't be read" problem.

Solutions:

  • Force primary reads: For queries requiring high timeliness, force reads from primary
  • Lag-aware routing: Monitor replica lag, switch back to primary when lag is too high
  • Causal consistency: Read with the write's GTID to ensure replica has synced
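One pragmatic way to combine these ideas is "read your writes" routing: route a user's reads to the primary for a short window after their own write. The sketch below is an assumption-laden illustration; the names and the 2-second window are made up and should be tuned to your measured replica lag (in practice the timestamp would live in the session or Redis, not a process-local dict).

```python
import time

RECENT_WRITE_WINDOW = 2.0  # seconds; set above your observed replica lag
last_write_at = {}         # user_id -> time of that user's last write

def record_write(user_id):
    last_write_at[user_id] = time.monotonic()

def pick_pool(user_id):
    """Route a user's reads to the primary right after their own write."""
    wrote = last_write_at.get(user_id)
    if wrote is not None and time.monotonic() - wrote < RECENT_WRITE_WINDOW:
        return "primary"
    return "replica"

record_write(42)
assert pick_pool(42) == "primary"  # just wrote: avoid reading a stale replica
assert pick_pool(7) == "replica"   # no recent write: replicas are fine
```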

3. Database Sharding Strategy

When single machine database reaches its limits, sharding is the next step.

3.1 When to Shard

Signals to shard:

  • Single table exceeds 10 million rows (rule of thumb)
  • Single database exceeds 500GB
  • Single database connections insufficient
  • Query performance continuously declining

When NOT to shard:

  • Data volume still small, optimizing SQL and indexes is enough
  • Haven't done read-write separation and caching
  • Team lacks distributed database experience

Sharding significantly increases complexity—not a light decision.

3.2 Database Sharding Strategies

Vertical Sharding: Split by business

Put different business tables in different databases:

  • User DB: users, user_profiles
  • Order DB: orders, order_items
  • Product DB: products, categories

Benefit is business isolation—one DB down doesn't affect others.

Horizontal Sharding: Split by rules

Split same table's data across multiple databases:

  • User DB 1: user_id 1-10 million
  • User DB 2: user_id 10-20 million

Benefit is breaking single machine capacity limits.

3.3 Table Sharding Strategies

Vertical Table Sharding: Split columns

Split one wide table into multiple tables:

  • users: id, name, email (frequently used columns)
  • user_details: id, bio, avatar, preferences (infrequently used columns)

Reduces single row size, improves query performance.

Horizontal Table Sharding: Split rows

Split data into multiple tables by rules:

  • orders_202401: January 2024 orders
  • orders_202402: February 2024 orders

Or by ID modulo:

  • orders_0: order_id % 4 = 0
  • orders_1: order_id % 4 = 1
  • orders_2: order_id % 4 = 2
  • orders_3: order_id % 4 = 3
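Both routing rules above reduce to a one-line function that maps an order to its table:

```python
import datetime

def shard_by_id(order_id, shards=4):
    """ID-modulo routing, matching the orders_0..orders_3 scheme above."""
    return f"orders_{order_id % shards}"

def shard_by_month(created_at):
    """Time-based routing, matching orders_202401 / orders_202402 above."""
    return f"orders_{created_at:%Y%m}"

assert shard_by_id(10) == "orders_2"
assert shard_by_month(datetime.date(2024, 2, 5)) == "orders_202402"
```

Modulo spreads load evenly but makes time-range queries fan out to every table; monthly tables make range queries cheap but concentrate writes in the current month. Pick based on your dominant query pattern.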

3.4 Sharding Key Selection

The Sharding Key determines which database/table data goes to. Selection is crucial.

Good sharding key characteristics:

  • Frequently used in queries: Most queries include this field
  • Evenly distributed: Data distributes evenly, no skew
  • Rarely changes: Changing sharding key causes data migration

Common sharding keys:

  • User ID (user_id)
  • Order ID (order_id)
  • Time (created_at)
  • Tenant ID (tenant_id, SaaS scenarios)

3.5 Cross-Database Query Challenges

After sharding, the biggest challenge is cross-database queries.

Problem 1: Cross-database JOIN

User table is in user DB, order table is in order DB. How to query "user's all orders"?

Solutions:

  • Application layer assembly (query user first, then orders)
  • Data redundancy (order table stores username)
  • Wide table (pre-merge related data)

Problem 2: Cross-database Pagination

Orders are in 4 databases, need to get "latest 10 orders"—how?

Solutions:

  • Each DB gets 10, application layer merges and sorts
  • Search engine (Elasticsearch) for queries
  • Avoid deep pagination
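The "each DB gets N, application merges" approach is a k-way merge. A sketch with toy data (each shard returns its own newest-first page of `(order_id, created_at)` pairs):

```python
import heapq

shard_pages = [
    [("o9", 900), ("o5", 500)],
    [("o8", 800), ("o7", 700)],
    [("o6", 600), ("o1", 100)],
]

def latest_n(pages, n):
    # heapq.merge requires each input sorted by the key; newest-first pages
    # are ascending under key=-created_at, so the merge yields newest first
    merged = heapq.merge(*pages, key=lambda row: -row[1])
    return list(merged)[:n]

top = latest_n(shard_pages, 4)
assert [order_id for order_id, _ in top] == ["o9", "o8", "o7", "o6"]
```

Note why deep pagination hurts: for global page 100, every shard must return its first 100×page_size rows before the merge, so cost grows with page depth on every shard at once.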

Problem 3: Global Unique ID

After sharding, can't use auto-increment ID because each DB would generate duplicate IDs.

Solutions:

  • UUID (simple but takes space)
  • Snowflake ID (Twitter's approach)
  • Database segment (Meituan Leaf)
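A sketch of the Snowflake layout (41-bit millisecond timestamp, 10-bit machine id, 12-bit per-millisecond sequence). This is illustrative only: a real deployment must also handle clock rollback and coordinate machine ids, and the custom epoch below is an arbitrary choice.

```python
import threading
import time

class SnowflakeId:
    """64-bit IDs: 41-bit timestamp (ms) | 10-bit machine id | 12-bit sequence."""
    EPOCH = 1704067200000  # custom epoch (2024-01-01); arbitrary for this sketch

    def __init__(self, machine_id):
        assert 0 <= machine_id < 1024  # 10 bits
        self.machine_id = machine_id
        self.sequence = 0
        self.last_ms = -1
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF  # 4096 ids per ms
                if self.sequence == 0:  # exhausted this millisecond: spin to the next
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - self.EPOCH) << 22) | (self.machine_id << 12) | self.sequence

gen = SnowflakeId(machine_id=1)
a, b = gen.next_id(), gen.next_id()
assert a < b  # ids rise monotonically on one machine
```

Because the timestamp occupies the high bits, IDs sort roughly by creation time, which keeps index inserts append-like, something UUIDs can't offer.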

4. Redis Cache Design

Redis is standard equipment for high concurrency systems. Used correctly, it can cut database load by 90%.

4.1 Cache Architecture

Request → Application Layer → Redis Cache → Database
                    ↓
              Cache hit returns directly

Cache Aside Pattern (most common)

  1. Check cache first
  2. If cache miss, check database
  3. After query, write to cache
  4. Return result

On update:

  1. Update database
  2. Delete cache (not update)

Why "delete" instead of "update"? Because concurrent updates might cause cache and database inconsistency. After deletion, next request will re-read from database.
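The full Cache Aside flow, read and update, fits in a few lines. Plain dicts stand in for Redis and the database here; with real Redis the cache write would be a SET with a TTL.

```python
cache = {}                           # stand-in for Redis
db = {"user:1": {"name": "Alice"}}   # stand-in for the database

def get_user(key):
    """Cache-aside read: check cache, fall back to the DB, populate the cache."""
    value = cache.get(key)
    if value is not None:
        return value
    value = db.get(key)
    if value is not None:
        cache[key] = value  # with real Redis: SET key value EX ttl
    return value

def update_user(key, value):
    """Cache-aside write: update the DB, then DELETE the cache entry (don't update it)."""
    db[key] = value
    cache.pop(key, None)  # the next read repopulates from the fresh row

get_user("user:1")
assert "user:1" in cache           # read populated the cache
update_user("user:1", {"name": "Bob"})
assert "user:1" not in cache       # write invalidated it
```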

4.2 Caching Strategies

What to cache?

  • Hot data (frequently read)
  • Expensive computation results
  • Rarely changing data

How long to cache?

  • TTL (Time To Live) sets expiration
  • Set based on data characteristics: hot products 5 minutes, user data 1 hour, static config 1 day
  • Add random value to avoid simultaneous expiration

How big should cache be?

  • Estimate hot data size
  • Reserve 20% space for traffic spikes
  • Set eviction policy (LRU most common)

4.3 Distributed Locks

High concurrency scenarios need distributed locks to protect shared resources.

Scenario: Flash sale inventory deduction. 100 requests deducting simultaneously—without locking, might oversell.

Redis Distributed Lock Implementation:

# Lock
SET lock_key unique_value NX PX 30000

# Unlock (use Lua script for atomicity)
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end

Notes:

  • Set expiration time to avoid deadlock
  • Use unique value to identify lock owner
  • Verify it's your lock when unlocking
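The lock-and-unlock flow above can be sketched in Python. `FakeRedis` is an in-memory stand-in exposing just the two atomic operations the lock needs (the `SET ... NX PX` and the Lua compare-and-delete); against real Redis you'd call `redis.set(key, token, nx=True, px=ttl)` and run the Lua script instead.

```python
import threading
import time
import uuid

class FakeRedis:
    """In-memory stand-in for the two atomic Redis operations the lock needs."""
    def __init__(self):
        self._data = {}
        self._mutex = threading.Lock()

    def set_nx_px(self, key, value, ttl_ms):
        with self._mutex:
            entry = self._data.get(key)
            if entry and entry[1] > time.monotonic():
                return False  # someone else holds an unexpired lock
            self._data[key] = (value, time.monotonic() + ttl_ms / 1000)
            return True

    def compare_and_delete(self, key, value):
        with self._mutex:
            entry = self._data.get(key)
            if entry and entry[0] == value:
                del self._data[key]
                return True
            return False  # not our lock (it expired and was re-acquired)

def with_lock(r, key, work, ttl_ms=30000):
    token = str(uuid.uuid4())  # unique value identifies the lock owner
    if not r.set_nx_px(key, token, ttl_ms):
        return None            # lock busy; caller can retry or give up
    try:
        return work()
    finally:
        r.compare_and_delete(key, token)

r = FakeRedis()
assert with_lock(r, "lock:stock", lambda: "deducted") == "deducted"
```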

For more reliable distributed locks, consider Redlock algorithm or Redisson.

For more flash sale design details, see High Concurrency Transaction System Design.

4.4 Performance Tuning

Connection Pool Configuration

Don't create new connections for every request. Use connection pools, pre-establish connections for reuse.

Pipeline Batch Operations

Send multiple commands together, reduce network round trips.

import redis

r = redis.Redis()
pipe = r.pipeline()
pipe.get("key1")
pipe.get("key2")
pipe.get("key3")
results = pipe.execute()  # all three GETs in a single round trip

Choose Correct Data Structures

  • Simple Key-Value: String
  • Object attributes: Hash
  • Leaderboards: Sorted Set
  • Queues: List
  • Dedup counting: Set / HyperLogLog

5. Cache Problem Solutions

Use caching and you'll encounter these three problems. They're interview must-knows, and you'll hit them in production.

5.1 Cache Penetration

Problem Description:

Someone queries data that doesn't exist. The cache doesn't have it and neither does the database, so every request goes straight to the database.

Malicious attackers can use this to overload your database.

Solutions:

Solution 1: Cache Empty Values

When database finds nothing, cache an empty value (with short TTL).

GET user:999999 → null (cached)
No database query needed
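The empty-value trick extends the cache-aside read with a sentinel for "we checked, the row doesn't exist." Dicts stand in for Redis and the database here; with real Redis the sentinel would get a short TTL (e.g. `SETEX key 60 __NULL__`) so newly created rows become visible soon.

```python
SENTINEL = "__NULL__"  # marker meaning "checked the DB, row doesn't exist"
cache = {}             # stand-in for Redis
db = {}                # the queried user genuinely isn't in the database

def lookup_user(key):
    hit = cache.get(key)
    if hit is not None:
        return None if hit == SENTINEL else hit
    value = db.get(key)
    # Cache the miss too; in real Redis give it a short TTL
    cache[key] = value if value is not None else SENTINEL
    return value

assert lookup_user("user:999999") is None   # first call hits the DB
assert cache["user:999999"] == SENTINEL     # later calls stop at the cache
```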

Solution 2: Bloom Filter

Add a Bloom filter layer before cache. Non-existent keys get intercepted, never hitting database.

Request → Bloom Filter → Cache → Database
           ↓
       Non-existent returns directly

Bloom filters have some false positive rate but no false negatives.
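A tiny Bloom filter sketch shows where those properties come from: k hash positions over an m-bit array, where a miss on any bit proves absence. The sizes here are toy values; production systems use RedisBloom or a tuned library with m and k derived from the expected item count and target false-positive rate.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: k hash positions over an m-bit integer."""
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = 0

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item):
        # False means definitely absent (no false negatives);
        # True means "probably present" (small false-positive rate)
        return all(self.bits >> p & 1 for p in self._positions(item))

bf = BloomFilter()
bf.add("user:1")
assert bf.might_contain("user:1")
```

At startup you'd load all valid IDs into the filter; requests whose key fails `might_contain` are rejected before touching cache or database.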

5.2 Cache Breakdown

Problem Description:

The moment a hot key expires, massive requests simultaneously hit the database.

Imagine Double 11's bestselling product—the moment cache expires, tens of thousands of requests simultaneously query the database.

Solutions:

Solution 1: Mutex Lock

Only one request queries database and updates cache, others wait.

value = redis.get(key)
if value is None:
    # NX + EX in one command: the lock carries a TTL, so a crashed
    # holder can't deadlock everyone else
    if redis.set(lock_key, 1, nx=True, ex=10):
        try:
            value = db.query(key)
            redis.set(key, value)
        finally:
            redis.delete(lock_key)
    else:
        sleep(0.1)
        return get_with_lock(key)  # Retry; the lock holder is filling the cache
return value

Solution 2: Never Expire + Background Updates

Hot keys never expire, background tasks update regularly.

Cache: Never expires
Background task: Updates every 5 minutes

5.3 Cache Avalanche

Problem Description:

Large number of keys expire simultaneously, all requests hit database. Or Redis itself goes down.

Solutions:

Solution 1: Add Random Value to Expiration

ttl = base_ttl + random.randint(0, 300)  # Base TTL + 0-300 seconds random
redis.setex(key, ttl, value)

Solution 2: Multi-layer Caching

Local cache + Redis + Database. If Redis goes down, local cache holds the line.

Solution 3: Redis High Availability

Use Redis Sentinel or Redis Cluster, avoid single point of failure.

Solution 4: Circuit Breaker Degradation

When database pressure is too high, trigger circuit breaker, return default values or error messages.


Database can't handle the traffic? From read-write separation to sharding, every step has traps. Book an architecture consultation and let experienced consultants help plan your data layer optimization.


6. Cloud Database Selection

If you use cloud, you don't need to manage databases yourself. But choose the right service.

6.1 AWS Solutions

| Service | Type | Use Cases |
| --- | --- | --- |
| RDS | Relational | Traditional apps, MySQL/PostgreSQL |
| Aurora | Relational (Cloud Native) | High concurrency MySQL/PostgreSQL, auto-scaling |
| DynamoDB | NoSQL (Key-Value) | Ultra-high throughput, serverless |
| ElastiCache | Cache | Managed Redis/Memcached |

Aurora Highlights:

  • 5x faster than RDS MySQL
  • Auto-scales to 128TB
  • 15 read replicas
  • Cross-AZ high availability

6.2 GCP Solutions

| Service | Type | Use Cases |
| --- | --- | --- |
| Cloud SQL | Relational | MySQL/PostgreSQL/SQL Server |
| Cloud Spanner | Relational (Distributed) | Global consistency, massive scale |
| Firestore | NoSQL (Document) | Mobile apps, real-time sync |
| Memorystore | Cache | Managed Redis |

Cloud Spanner Highlights:

  • Horizontally scalable relational database
  • Globally distributed
  • Strong external consistency (rare for a globally distributed database)
  • 99.999% SLA

6.3 Azure Solutions

| Service | Type | Use Cases |
| --- | --- | --- |
| Azure SQL | Relational | SQL Server ecosystem |
| Cosmos DB | NoSQL (Multi-model) | Global distribution, multiple APIs |
| Azure Cache | Cache | Managed Redis |

Cosmos DB Highlights:

  • Multi-model (document, key-value, graph, column-family)
  • Globally distributed
  • Multiple consistency levels to choose
  • Millisecond response

6.4 Selection Comparison Table

| Need | AWS | GCP | Azure |
| --- | --- | --- | --- |
| Managed MySQL | Aurora | Cloud SQL | Azure SQL |
| Massive-scale relational | Aurora | Spanner | Azure SQL Hyperscale |
| Globally distributed NoSQL | DynamoDB Global Tables | Spanner / Firestore | Cosmos DB |
| Managed Redis | ElastiCache | Memorystore | Azure Cache |

For more detailed cloud architecture comparisons, see Cloud High Concurrency Architecture.


7. Practical Case Study

Case: E-commerce Order System Optimization

Background:

  • Daily orders: 500,000
  • Peak QPS during promotions: exceeds 10,000
  • Database frequently couldn't handle it

Optimization Steps:

Step 1: Add Caching

  • Product info in Redis, TTL 5 minutes
  • Database reads reduced by 70%

Step 2: Read-Write Separation

  • Primary for writes, two replicas for reads
  • ProxySQL for routing
  • Replicas handle 80% of read traffic

Step 3: Order Table Sharding

  • Shard by month: orders_202501, orders_202502
  • Archive historical orders to cold storage
  • Hot data query performance improved 3x

Results:

  • QPS increased from 3,000 to 15,000
  • Database CPU usage dropped from 90% to 40%
  • System stable during promotions

FAQ

Q1: When should I use Redis?

Whenever you have "read-heavy, write-light" hot data, it's suitable for Redis. Including: Sessions, product info, leaderboards, counters, distributed locks.

Q2: Does read-write separation affect data consistency?

There will be lag, usually milliseconds to seconds. For "just wrote and need to read" scenarios, force reads from primary.

Q3: How to do reporting after sharding?

After sharding, cross-database queries are troublesome. Reporting should use a separate data warehouse (like ClickHouse, BigQuery), syncing data from business DBs on schedule.

Q4: Can NoSQL replace relational databases?

Depends on scenario. Scenarios needing complex queries, JOINs, transactions—relational databases remain preferred. NoSQL suits high throughput, flexible schema scenarios.

Q5: What if cache and database are inconsistent?

Use "delete cache" instead of "update cache" strategy. If still inconsistent, set shorter TTL, or use CDC (Change Data Capture) for sync.


Conclusion: Data Layer Optimization is Key to System Performance

The database is the lifeline of a high concurrency system. Optimize it well and a 10x performance improvement is realistic.

Key Takeaways:

  1. Database bottlenecks usually come from connections, lock contention, single machine capacity
  2. Read-write separation is the first step—replicas share read pressure
  3. Sharding breaks single machine limits but greatly increases complexity
  4. A Redis cache can cut database load by up to 90%
  5. Cache penetration solved with empty values or Bloom filters
  6. Cache breakdown solved with mutex locks or never-expire
  7. Cache avalanche solved with random TTL and high availability
  8. Cloud databases are hassle-free, but choose the right service



Need a Second Opinion on Architecture?

Good data layer design can save multiples in costs. If you're:

  • Database can't handle traffic, need optimization
  • Evaluating whether to shard
  • Choosing cloud database services

Book an architecture consultation and let's review your data layer architecture together.

All consultation content is completely confidential, no sales pressure.


