
High Concurrency Database Design: Read-Write Separation, Sharding & Redis Caching Strategies | 2025

13 min read
#High Concurrency Database #Redis #Read-Write Separation #Database Sharding #Cache Penetration #Cache Breakdown #Cache Avalanche #MySQL #Cloud Database



Introduction: Database is the Most Common Bottleneck in High Concurrency Systems

The system is slowing down. You check monitoring—CPU is fine, memory is fine, network is fine.

The problem is the database.

Queries pile up, connection counts spike, and response times jump from 50ms to 5 seconds. This is the most common scenario in high concurrency systems.

From experience, 90% of system performance issues originate in the data layer.

This article will take you from bottleneck analysis through read-write separation, database sharding, Redis cache design, and those three headache-inducing cache problems: penetration, breakdown, and avalanche.

If you're not familiar with overall high concurrency architecture, we recommend first reading What is High Concurrency? Complete Guide.


1. Database Bottleneck Analysis

Before optimizing, you need to know where the problem is.

1.1 Connection Limit

Every database connection consumes memory. MySQL default max connections is 151, PostgreSQL is 100.

When concurrent requests exceed the connection limit, new requests can only wait. If they wait too long, they time out and users see errors.

Symptoms:

  • Too many connections error
  • Application can't get connections
  • Requests queuing

Solutions:

  • Use connection pools (HikariCP, PgBouncer)
  • Increase max_connections (but there are limits)
  • Read-write separation to distribute connections
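To make the pooling idea concrete, here's a minimal sketch. It uses sqlite3 and `queue.Queue` as stand-ins; a production system would use a real driver plus HikariCP or PgBouncer, which add health checks, leak detection, and sizing logic this toy omits.

```python
import queue
import sqlite3

class ConnectionPool:
    """Minimal pooling sketch: pre-open N connections, hand them out, take them back."""
    def __init__(self, dsn, size=5):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(sqlite3.connect(dsn, check_same_thread=False))

    def get(self, timeout=2.0):
        # Blocks until a connection frees up; raises queue.Empty on timeout,
        # which is the "requests queuing" symptom above made explicit
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        self._pool.put(conn)

pool = ConnectionPool(":memory:", size=2)
conn = pool.get()
conn.execute("SELECT 1")
pool.release(conn)
```

The key property: connection count is capped at pool size no matter how many requests arrive, so the database never sees a connection spike.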

1.2 Read-Write Contention

Databases use locking. A write may lock a row or an entire table, forcing other requests to wait.

In high concurrency scenarios, hot data is read and written frequently, so lock contention becomes severe.

Symptoms:

  • Many requests in waiting for lock state
  • Write operations block reads
  • Database CPU high but throughput low

Solutions:

  • Read-write separation (reads don't affect writes)
  • Optimistic locking instead of pessimistic locking
  • Reduce lock hold time
  • Hot data in cache
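The optimistic-locking idea can be sketched with a version column: read the version, then update only if nobody changed it in the meantime. This uses sqlite3 for illustration; table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, stock INTEGER, version INTEGER)")
conn.execute("INSERT INTO products VALUES (1, 100, 0)")

def deduct_stock(conn, product_id):
    """Optimistic lock: no row lock held during the read; the WHERE clause
    on version detects concurrent writers at update time."""
    stock, version = conn.execute(
        "SELECT stock, version FROM products WHERE id = ?", (product_id,)
    ).fetchone()
    cur = conn.execute(
        "UPDATE products SET stock = stock - 1, version = version + 1 "
        "WHERE id = ? AND version = ? AND stock > 0",
        (product_id, version),
    )
    return cur.rowcount == 1  # False: a concurrent writer won, caller retries

assert deduct_stock(conn, 1)  # stock 100 -> 99, version 0 -> 1
```

Unlike `SELECT ... FOR UPDATE`, no lock is held between read and write; losers simply retry, which works well when conflicts are rare.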

1.3 Single Machine Capacity Limit

How much data can one database hold? How many QPS can it handle? All have limits.

When data reaches hundreds of millions or billions of rows, single machine query performance significantly degrades.

Symptoms:

  • Query time increases with data volume
  • Insufficient disk space
  • Backup and restore times too long

Solutions:

  • Database sharding
  • Archive old data
  • Use distributed databases

2. Read-Write Separation Implementation

Read-write separation is the first step in high concurrency database optimization.

2.1 Principle Explanation

Most applications are "read-heavy, write-light." E-commerce sites have 90% of requests browsing products, only 10% are purchases.

Read-write separation approach:

  • Primary (Master): Handles all write operations
  • Replica (Slave): Handles read operations, can have multiple

Write requests → Primary
Read requests → Replica 1 / Replica 2 / Replica 3

Primary data is synchronized to replicas through replication.

2.2 Implementation Methods

Application Layer Implementation

In code, determine SQL type and decide whether to send to primary or replica.

# Pseudocode: route by statement type
# (note: locking reads like SELECT ... FOR UPDATE must still go to the primary)
sql = sql.lstrip().upper()
if sql.startswith("SELECT") and "FOR UPDATE" not in sql:
    connection = replica_pool.get_connection()
else:
    connection = primary_pool.get_connection()

Middleware Implementation

Use dedicated middleware for automatic routing:

  • MySQL Proxy
  • ProxySQL
  • MaxScale
  • ShardingSphere

Middleware benefits: Transparent to application, no code changes.

Cloud Managed Solutions

Major cloud databases support read-write separation:

  • AWS Aurora: Automatic read-write split endpoint
  • GCP Cloud SQL: Create Read Replica
  • Azure SQL: Configure Read Scale-Out

2.3 Replication Lag Issues

Replica data isn't synchronized in real time; there's lag (usually milliseconds to seconds).

This causes the "just-written data can't be read" problem.

Solutions:

  • Force primary reads: For queries requiring high timeliness, force reads from primary
  • Lag-aware routing: Monitor replica lag, switch back to primary when lag is too high
  • Causal consistency: Read with the write's GTID to ensure replica has synced
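One pragmatic way to combine these ideas is "read your writes" routing: route a user's reads to the primary for a short window after their own write. The sketch below is an assumption-laden illustration; the names and the 2-second window are made up and should be tuned to your measured replica lag (in practice the timestamp would live in the session or Redis, not a process-local dict).

```python
import time

RECENT_WRITE_WINDOW = 2.0  # seconds; set above your observed replica lag
last_write_at = {}         # user_id -> time of that user's last write

def record_write(user_id):
    last_write_at[user_id] = time.monotonic()

def pick_pool(user_id):
    """Route a user's reads to the primary right after their own write."""
    wrote = last_write_at.get(user_id)
    if wrote is not None and time.monotonic() - wrote < RECENT_WRITE_WINDOW:
        return "primary"
    return "replica"

record_write(42)
assert pick_pool(42) == "primary"  # just wrote: avoid reading a stale replica
assert pick_pool(7) == "replica"   # no recent write: replicas are fine
```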

3. Database Sharding Strategy

When single machine database reaches its limits, sharding is the next step.

3.1 When to Shard

Signals to shard:

  • Single table exceeds 10 million rows (rule of thumb)
  • Single database exceeds 500GB
  • Single database connections insufficient
  • Query performance continuously declining

When NOT to shard:

  • Data volume still small, optimizing SQL and indexes is enough
  • Haven't done read-write separation and caching
  • Team lacks distributed database experience

Sharding significantly increases complexity—not a light decision.

3.2 Database Sharding Strategies

Vertical Sharding: Split by business

Put different business tables in different databases:

  • User DB: users, user_profiles
  • Order DB: orders, order_items
  • Product DB: products, categories

Benefit is business isolation—one DB down doesn't affect others.

Horizontal Sharding: Split by rules

Split same table's data across multiple databases:

  • User DB 1: user_id 1-10 million
  • User DB 2: user_id 10-20 million

Benefit is breaking single machine capacity limits.

3.3 Table Sharding Strategies

Vertical Table Sharding: Split columns

Split one wide table into multiple tables:

  • users: id, name, email (frequently used columns)
  • user_details: id, bio, avatar, preferences (infrequently used columns)

Reduces single row size, improves query performance.

Horizontal Table Sharding: Split rows

Split data into multiple tables by rules:

  • orders_202401: January 2024 orders
  • orders_202402: February 2024 orders

Or by ID modulo:

  • orders_0: order_id % 4 = 0
  • orders_1: order_id % 4 = 1
  • orders_2: order_id % 4 = 2
  • orders_3: order_id % 4 = 3
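Both routing rules above reduce to a one-line function that maps an order to its table:

```python
import datetime

def shard_by_id(order_id, shards=4):
    """ID-modulo routing, matching the orders_0..orders_3 scheme above."""
    return f"orders_{order_id % shards}"

def shard_by_month(created_at):
    """Time-based routing, matching orders_202401 / orders_202402 above."""
    return f"orders_{created_at:%Y%m}"

assert shard_by_id(10) == "orders_2"
assert shard_by_month(datetime.date(2024, 2, 5)) == "orders_202402"
```

Modulo spreads load evenly but makes time-range queries fan out to every table; monthly tables make range queries cheap but concentrate writes in the current month. Pick based on your dominant query pattern.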

3.4 Sharding Key Selection

The Sharding Key determines which database/table data goes to. Selection is crucial.

Good sharding key characteristics:

  • Frequently used in queries: Most queries include this field
  • Evenly distributed: Data distributes evenly, no skew
  • Rarely changes: Changing sharding key causes data migration

Common sharding keys:

  • User ID (user_id)
  • Order ID (order_id)
  • Time (created_at)
  • Tenant ID (tenant_id, SaaS scenarios)

3.5 Cross-Database Query Challenges

After sharding, the biggest challenge is cross-database queries.

Problem 1: Cross-database JOIN

User table is in user DB, order table is in order DB. How to query "user's all orders"?

Solutions:

  • Application layer assembly (query user first, then orders)
  • Data redundancy (order table stores username)
  • Wide table (pre-merge related data)

Problem 2: Cross-database Pagination

Orders are in 4 databases, need to get "latest 10 orders"—how?

Solutions:

  • Each DB gets 10, application layer merges and sorts
  • Search engine (Elasticsearch) for queries
  • Avoid deep pagination
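The "each DB gets N, application merges" approach is a k-way merge. A sketch with toy data (each shard returns its own newest-first page of `(order_id, created_at)` pairs):

```python
import heapq

shard_pages = [
    [("o9", 900), ("o5", 500)],
    [("o8", 800), ("o7", 700)],
    [("o6", 600), ("o1", 100)],
]

def latest_n(pages, n):
    # heapq.merge requires each input sorted by the key; newest-first pages
    # are ascending under key=-created_at, so the merge yields newest first
    merged = heapq.merge(*pages, key=lambda row: -row[1])
    return list(merged)[:n]

top = latest_n(shard_pages, 4)
assert [order_id for order_id, _ in top] == ["o9", "o8", "o7", "o6"]
```

Note why deep pagination hurts: for global page 100, every shard must return its first 100×page_size rows before the merge, so cost grows with page depth on every shard at once.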

Problem 3: Global Unique ID

After sharding, can't use auto-increment ID because each DB would generate duplicate IDs.

Solutions:

  • UUID (simple but takes space)
  • Snowflake ID (Twitter's approach)
  • Database segment (Meituan Leaf)
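A sketch of the Snowflake layout (41-bit millisecond timestamp, 10-bit machine id, 12-bit per-millisecond sequence). This is illustrative only: a real deployment must also handle clock rollback and coordinate machine ids, and the custom epoch below is an arbitrary choice.

```python
import threading
import time

class SnowflakeId:
    """64-bit IDs: 41-bit timestamp (ms) | 10-bit machine id | 12-bit sequence."""
    EPOCH = 1704067200000  # custom epoch (2024-01-01); arbitrary for this sketch

    def __init__(self, machine_id):
        assert 0 <= machine_id < 1024  # 10 bits
        self.machine_id = machine_id
        self.sequence = 0
        self.last_ms = -1
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF  # 4096 ids per ms
                if self.sequence == 0:  # exhausted this millisecond: spin to the next
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - self.EPOCH) << 22) | (self.machine_id << 12) | self.sequence

gen = SnowflakeId(machine_id=1)
a, b = gen.next_id(), gen.next_id()
assert a < b  # ids rise monotonically on one machine
```

Because the timestamp occupies the high bits, IDs sort roughly by creation time, which keeps index inserts append-like, something UUIDs can't offer.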

4. Redis Cache Design

Redis is standard equipment for high concurrency systems. Used correctly, it can cut database load by 90%.

4.1 Cache Architecture

Request → Application Layer → Redis Cache → Database
                    ↓
              Cache hit returns directly

Cache Aside Pattern (most common)

  1. Check cache first
  2. If cache miss, check database
  3. After query, write to cache
  4. Return result

On update:

  1. Update database
  2. Delete cache (not update)

Why "delete" instead of "update"? Because concurrent updates might cause cache and database inconsistency. After deletion, next request will re-read from database.
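The full Cache Aside flow, read and update, fits in a few lines. Plain dicts stand in for Redis and the database here; with real Redis the cache write would be a SET with a TTL.

```python
cache = {}                           # stand-in for Redis
db = {"user:1": {"name": "Alice"}}   # stand-in for the database

def get_user(key):
    """Cache-aside read: check cache, fall back to the DB, populate the cache."""
    value = cache.get(key)
    if value is not None:
        return value
    value = db.get(key)
    if value is not None:
        cache[key] = value  # with real Redis: SET key value EX ttl
    return value

def update_user(key, value):
    """Cache-aside write: update the DB, then DELETE the cache entry (don't update it)."""
    db[key] = value
    cache.pop(key, None)  # the next read repopulates from the fresh row

get_user("user:1")
assert "user:1" in cache           # read populated the cache
update_user("user:1", {"name": "Bob"})
assert "user:1" not in cache       # write invalidated it
```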

4.2 Caching Strategies

What to cache?

  • Hot data (frequently read)
  • Expensive computation results
  • Rarely changing data

How long to cache?

  • TTL (Time To Live) sets expiration
  • Set based on data characteristics: hot products 5 minutes, user data 1 hour, static config 1 day
  • Add random value to avoid simultaneous expiration

How big should cache be?

  • Estimate hot data size
  • Reserve 20% space for traffic spikes
  • Set eviction policy (LRU most common)

4.3 Distributed Locks

High concurrency scenarios need distributed locks to protect shared resources.

Scenario: Flash sale inventory deduction. 100 requests deducting simultaneously—without locking, might oversell.

Redis Distributed Lock Implementation:

# Lock
SET lock_key unique_value NX PX 30000

# Unlock (use Lua script for atomicity)
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end

Notes:

  • Set expiration time to avoid deadlock
  • Use unique value to identify lock owner
  • Verify it's your lock when unlocking
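The lock-and-unlock flow above can be sketched in Python. `FakeRedis` is an in-memory stand-in exposing just the two atomic operations the lock needs (the `SET ... NX PX` and the Lua compare-and-delete); against real Redis you'd call `redis.set(key, token, nx=True, px=ttl)` and run the Lua script instead.

```python
import threading
import time
import uuid

class FakeRedis:
    """In-memory stand-in for the two atomic Redis operations the lock needs."""
    def __init__(self):
        self._data = {}
        self._mutex = threading.Lock()

    def set_nx_px(self, key, value, ttl_ms):
        with self._mutex:
            entry = self._data.get(key)
            if entry and entry[1] > time.monotonic():
                return False  # someone else holds an unexpired lock
            self._data[key] = (value, time.monotonic() + ttl_ms / 1000)
            return True

    def compare_and_delete(self, key, value):
        with self._mutex:
            entry = self._data.get(key)
            if entry and entry[0] == value:
                del self._data[key]
                return True
            return False  # not our lock (it expired and was re-acquired)

def with_lock(r, key, work, ttl_ms=30000):
    token = str(uuid.uuid4())  # unique value identifies the lock owner
    if not r.set_nx_px(key, token, ttl_ms):
        return None            # lock busy; caller can retry or give up
    try:
        return work()
    finally:
        r.compare_and_delete(key, token)

r = FakeRedis()
assert with_lock(r, "lock:stock", lambda: "deducted") == "deducted"
```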

For more reliable distributed locks, consider Redlock algorithm or Redisson.

For more flash sale design details, see High Concurrency Transaction System Design.

4.4 Performance Tuning

Connection Pool Configuration

Don't create new connections for every request. Use connection pools, pre-establish connections for reuse.

Pipeline Batch Operations

Send multiple commands together, reduce network round trips.

import redis

r = redis.Redis()
pipe = r.pipeline()
pipe.get("key1")
pipe.get("key2")
pipe.get("key3")
results = pipe.execute()  # all three GETs in a single round trip

Choose Correct Data Structures

  • Simple Key-Value: String
  • Object attributes: Hash
  • Leaderboards: Sorted Set
  • Queues: List
  • Dedup counting: Set / HyperLogLog

5. Cache Problem Solutions

Use caching and you'll encounter these three problems. They're interview must-knows, and you'll hit them in production.

5.1 Cache Penetration

Problem Description:

Someone queries data that doesn't exist. The cache doesn't have it and neither does the database, so every request goes straight to the database.

Malicious attackers can use this to overload your database.

Solutions:

Solution 1: Cache Empty Values

When database finds nothing, cache an empty value (with short TTL).

GET user:999999 → null (cached)
No database query needed
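The empty-value trick extends the cache-aside read with a sentinel for "we checked, the row doesn't exist." Dicts stand in for Redis and the database here; with real Redis the sentinel would get a short TTL (e.g. `SETEX key 60 __NULL__`) so newly created rows become visible soon.

```python
SENTINEL = "__NULL__"  # marker meaning "checked the DB, row doesn't exist"
cache = {}             # stand-in for Redis
db = {}                # the queried user genuinely isn't in the database

def lookup_user(key):
    hit = cache.get(key)
    if hit is not None:
        return None if hit == SENTINEL else hit
    value = db.get(key)
    # Cache the miss too; in real Redis give it a short TTL
    cache[key] = value if value is not None else SENTINEL
    return value

assert lookup_user("user:999999") is None   # first call hits the DB
assert cache["user:999999"] == SENTINEL     # later calls stop at the cache
```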

Solution 2: Bloom Filter

Add a Bloom filter layer before cache. Non-existent keys get intercepted, never hitting database.

Request → Bloom Filter → Cache → Database
           ↓
       Non-existent returns directly

Bloom filters have some false positive rate but no false negatives.
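A tiny Bloom filter sketch shows where those properties come from: k hash positions over an m-bit array, where a miss on any bit proves absence. The sizes here are toy values; production systems use RedisBloom or a tuned library with m and k derived from the expected item count and target false-positive rate.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: k hash positions over an m-bit integer."""
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = 0

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item):
        # False means definitely absent (no false negatives);
        # True means "probably present" (small false-positive rate)
        return all(self.bits >> p & 1 for p in self._positions(item))

bf = BloomFilter()
bf.add("user:1")
assert bf.might_contain("user:1")
```

At startup you'd load all valid IDs into the filter; requests whose key fails `might_contain` are rejected before touching cache or database.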

5.2 Cache Breakdown

Problem Description:

The moment a hot key expires, massive requests simultaneously hit the database.

Imagine Double 11's bestselling product—the moment cache expires, tens of thousands of requests simultaneously query the database.

Solutions:

Solution 1: Mutex Lock

Only one request queries database and updates cache, others wait.

value = redis.get(key)
if value is None:
    # NX + EX in one command: the lock carries a TTL, so a crashed
    # holder can't deadlock everyone else
    if redis.set(lock_key, 1, nx=True, ex=10):
        try:
            value = db.query(key)
            redis.set(key, value)
        finally:
            redis.delete(lock_key)
    else:
        sleep(0.1)
        return get_with_lock(key)  # Retry; the lock holder is filling the cache
return value

Solution 2: Never Expire + Background Updates

Hot keys never expire, background tasks update regularly.

Cache: Never expires
Background task: Updates every 5 minutes

5.3 Cache Avalanche

Problem Description:

Large number of keys expire simultaneously, all requests hit database. Or Redis itself goes down.

Solutions:

Solution 1: Add Random Value to Expiration

ttl = base_ttl + random.randint(0, 300)  # Base TTL + 0-300 seconds random
redis.setex(key, ttl, value)

Solution 2: Multi-layer Caching

Local cache + Redis + Database. If Redis goes down, local cache holds the line.

Solution 3: Redis High Availability

Use Redis Sentinel or Redis Cluster, avoid single point of failure.

Solution 4: Circuit Breaker Degradation

When database pressure is too high, trigger circuit breaker, return default values or error messages.


Database can't handle the traffic? From read-write separation to sharding, every step has traps. Book an architecture consultation and let experienced consultants help plan your data layer optimization.


6. Cloud Database Selection

If you use cloud, you don't need to manage databases yourself. But choose the right service.

6.1 AWS Solutions

| Service | Type | Use Cases |
| --- | --- | --- |
| RDS | Relational | Traditional apps, MySQL/PostgreSQL |
| Aurora | Relational (Cloud Native) | High concurrency MySQL/PostgreSQL, auto-scaling |
| DynamoDB | NoSQL (Key-Value) | Ultra-high throughput, serverless |
| ElastiCache | Cache | Managed Redis/Memcached |

Aurora Highlights:

  • 5x faster than RDS MySQL
  • Auto-scales to 128TB
  • 15 read replicas
  • Cross-AZ high availability

6.2 GCP Solutions

| Service | Type | Use Cases |
| --- | --- | --- |
| Cloud SQL | Relational | MySQL/PostgreSQL/SQL Server |
| Cloud Spanner | Relational (Distributed) | Global consistency, massive scale |
| Firestore | NoSQL (Document) | Mobile apps, real-time sync |
| Memorystore | Cache | Managed Redis |

Cloud Spanner Highlights:

  • Horizontally scalable relational database
  • Globally distributed
  • Strong external consistency (rare for a globally distributed database)
  • 99.999% SLA

6.3 Azure Solutions

| Service | Type | Use Cases |
| --- | --- | --- |
| Azure SQL | Relational | SQL Server ecosystem |
| Cosmos DB | NoSQL (Multi-model) | Global distribution, multiple APIs |
| Azure Cache | Cache | Managed Redis |

Cosmos DB Highlights:

  • Multi-model (document, key-value, graph, column-family)
  • Globally distributed
  • Multiple consistency levels to choose
  • Millisecond response

6.4 Selection Comparison Table

| Need | AWS | GCP | Azure |
| --- | --- | --- | --- |
| Managed MySQL | Aurora | Cloud SQL | Azure SQL |
| Massive-scale relational | Aurora | Spanner | Azure SQL Hyperscale |
| Globally distributed NoSQL | DynamoDB Global Tables | Spanner / Firestore | Cosmos DB |
| Managed Redis | ElastiCache | Memorystore | Azure Cache |

For more detailed cloud architecture comparisons, see Cloud High Concurrency Architecture.


7. Practical Case Study

Case: E-commerce Order System Optimization

Background:

  • Daily orders: 500,000
  • Peak QPS during promotions: exceeds 10,000
  • Database frequently couldn't handle it

Optimization Steps:

Step 1: Add Caching

  • Product info in Redis, TTL 5 minutes
  • Database reads reduced by 70%

Step 2: Read-Write Separation

  • Primary for writes, two replicas for reads
  • ProxySQL for routing
  • Replicas handle 80% of read traffic

Step 3: Order Table Sharding

  • Shard by month: orders_202501, orders_202502
  • Archive historical orders to cold storage
  • Hot data query performance improved 3x

Results:

  • QPS increased from 3,000 to 15,000
  • Database CPU usage dropped from 90% to 40%
  • System stable during promotions

FAQ

Q1: When should I use Redis?

Whenever you have "read-heavy, write-light" hot data, it's suitable for Redis. Including: Sessions, product info, leaderboards, counters, distributed locks.

Q2: Does read-write separation affect data consistency?

There will be lag, usually milliseconds to seconds. For "just wrote and need to read" scenarios, force reads from primary.

Q3: How to do reporting after sharding?

After sharding, cross-database queries are troublesome. Reporting should use a separate data warehouse (like ClickHouse, BigQuery), syncing data from business DBs on schedule.

Q4: Can NoSQL replace relational databases?

Depends on scenario. Scenarios needing complex queries, JOINs, transactions—relational databases remain preferred. NoSQL suits high throughput, flexible schema scenarios.

Q5: What if cache and database are inconsistent?

Use "delete cache" instead of "update cache" strategy. If still inconsistent, set shorter TTL, or use CDC (Change Data Capture) for sync.


Conclusion: Data Layer Optimization is Key to System Performance

The database is the lifeline of a high concurrency system. Optimize it well and a 10x performance improvement is realistic.

Key Takeaways:

  1. Database bottlenecks usually come from connections, lock contention, single machine capacity
  2. Read-write separation is the first step—replicas share read pressure
  3. Sharding breaks single machine limits but greatly increases complexity
  4. A Redis cache can cut database load by up to 90%
  5. Cache penetration solved with empty values or Bloom filters
  6. Cache breakdown solved with mutex locks or never-expire
  7. Cache avalanche solved with random TTL and high availability
  8. Cloud databases are hassle-free, but choose the right service



Need a Second Opinion on Architecture?

Good data layer design can save multiples in costs. If you're:

  • Database can't handle traffic, need optimization
  • Evaluating whether to shard
  • Choosing cloud database services

Book an architecture consultation and let's review your data layer architecture together.

All consultation content is completely confidential, no sales pressure.


