
Python vs Golang High Concurrency: FastAPI, asyncio and Goroutine Practical Comparison | 2025

13 min read
#Python#Golang#High Concurrency#FastAPI#asyncio#Goroutine#GIL#Performance Comparison#Technology Selection

Introduction: Language Choice Affects System Ceiling

"Python is too slow, not suitable for high concurrency." "Go is great at everything, just hard to write."

You've definitely heard both statements. But which is true?

Language choice does affect system performance limits. Python is flexible with a rich ecosystem, but has GIL limitations. Go natively supports high concurrency, but has a steeper learning curve.

This article uses real data to compare Python and Go performance in high concurrency scenarios, helping you make the right technology choice.

If you're not familiar with basic high concurrency concepts, we recommend first reading What is High Concurrency? Complete Guide.


1. Python's High Concurrency Limitation: GIL

1.1 What is GIL

GIL (Global Interpreter Lock) is CPython's Global Interpreter Lock.

Simply put: only one thread can execute Python bytecode at a time.

No matter how many threads you create, and no matter how many CPU cores you have, Python bytecode execution is effectively serialized: threads take turns holding the GIL.

1.2 GIL's Impact

CPU-Intensive Tasks

Multi-threading is completely useless. Running calculations with 10 threads won't be faster than 1 thread.

# This doesn't help
import threading

def cpu_intensive():
    total = 0
    for i in range(10_000_000):
        total += i

# 4 threads won't be faster than 1: the GIL serializes bytecode execution
threads = [threading.Thread(target=cpu_intensive) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

I/O-Intensive Tasks

The GIL is released during I/O waits, so multi-threading still helps for network requests, file reading/writing, and database queries.

# This works because the GIL is released during I/O waits
import threading
import requests

def fetch_url(url):
    return requests.get(url)

# Other threads can run while one is blocked on the network
threads = [threading.Thread(target=fetch_url, args=(url,)) for url in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()

1.3 Ways to Bypass GIL

Method 1: asyncio (Coroutines)

Don't use multi-threading, use single-threaded coroutines. Switch to other tasks during I/O waiting.

import asyncio
import aiohttp

async def fetch_url(session, url):
    async with session.get(url) as response:
        return await response.text()

async def main():
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_url(session, url) for url in urls]
        results = await asyncio.gather(*tasks)

Method 2: multiprocessing (Multi-Process)

Each process has its own GIL. Multiple processes means truly parallel execution.

from multiprocessing import Pool

def cpu_intensive(n):
    total = 0
    for i in range(n):
        total += i
    return total

if __name__ == '__main__':
    with Pool(4) as p:  # 4 processes
        results = p.map(cpu_intensive, [10_000_000] * 4)

Method 3: Use C Extensions

Libraries like NumPy and Pandas are implemented in C under the hood and release the GIL during heavy computation, so numerical work can truly run in parallel across threads.
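The same effect is easy to observe with a C-backed standard-library module: zlib also releases the GIL while compressing, so threads doing compression genuinely overlap on multiple cores. A minimal sketch (the payload size and thread count are arbitrary):

```python
import threading
import zlib

# zlib's C core releases the GIL during compression, so these
# threads can actually run in parallel on multiple cores
data = b"high concurrency " * 100_000  # ~1.7 MB payload

def compress_into(results, i):
    results[i] = zlib.compress(data)

results = [None] * 4
threads = [threading.Thread(target=compress_into, args=(results, i))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Each thread produced a valid compressed copy of the payload
assert all(zlib.decompress(r) == data for r in results)
```

Pure-Python loops in those threads would still serialize; it is only inside the C call that the lock is dropped.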


2. Python High Concurrency Solutions

2.1 asyncio Coroutines

asyncio is the asynchronous framework added to Python's standard library in 3.4. Core concepts:

  • Event Loop: schedules and runs coroutines
  • Coroutine: a cooperative task, defined with async def
  • await: suspends the coroutine until an async operation completes

import asyncio

async def say_hello(name, delay):
    await asyncio.sleep(delay)  # Non-blocking wait
    print(f"Hello, {name}!")

async def main():
    # Run three coroutines simultaneously
    await asyncio.gather(
        say_hello("Alice", 1),
        say_hello("Bob", 2),
        say_hello("Charlie", 3),
    )

asyncio.run(main())
# Total time is only 3 seconds, not 6 seconds

Suitable Scenarios:

  • Large amounts of network I/O (API calls, web scraping)
  • Database queries
  • File reading/writing
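For the web-scraping case above, you usually want to cap how many requests are in flight rather than fire thousands at once; asyncio.Semaphore gives you bounded concurrency. A minimal sketch, with asyncio.sleep standing in for the real HTTP request and placeholder URLs:

```python
import asyncio

async def fetch(sem, url):
    async with sem:  # at most `limit` coroutines pass this point at once
        await asyncio.sleep(0.01)  # stand-in for an aiohttp/httpx request
        return f"fetched {url}"

async def crawl(urls, limit=10):
    sem = asyncio.Semaphore(limit)
    return await asyncio.gather(*(fetch(sem, u) for u in urls))

urls = [f"https://example.com/page/{i}" for i in range(100)]
results = asyncio.run(crawl(urls))
print(len(results))  # 100 results, never more than 10 requests in flight
```

The same pattern protects downstream services (databases, third-party APIs) from being overwhelmed by your own concurrency.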

Unsuitable Scenarios:

  • CPU-intensive computations
  • Need to call non-async blocking libraries
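The second unsuitable case has a standard escape hatch: since Python 3.9, asyncio.to_thread pushes a blocking call onto a worker thread so the event loop keeps running. A minimal sketch, with time.sleep standing in for a blocking library call:

```python
import asyncio
import time

def blocking_query(n):
    # Stand-in for a sync library call (e.g. a blocking DB driver)
    time.sleep(0.05)
    return n * 2

async def main():
    # Both calls run in worker threads; the event loop is not blocked,
    # so they overlap instead of running back to back
    return await asyncio.gather(
        asyncio.to_thread(blocking_query, 1),
        asyncio.to_thread(blocking_query, 2),
    )

results = asyncio.run(main())
print(results)  # [2, 4]
```

This keeps an otherwise-async service responsive, though the blocked threads are still subject to the GIL for any CPU work they do.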

2.2 FastAPI + uvicorn

FastAPI is a modern Python web framework with native async support. Performance far exceeds Flask.

from fastapi import FastAPI
import httpx

app = FastAPI()

@app.get("/products/{product_id}")
async def get_product(product_id: int):
    # Async call to external API
    async with httpx.AsyncClient() as client:
        response = await client.get(f"https://api.example.com/products/{product_id}")
    return response.json()

@app.post("/orders")
async def create_order(product_id: int, quantity: int):
    # Async database operation
    # (assumes `database` is an async driver client, e.g. Motor for MongoDB)
    order = await database.orders.insert_one({
        "product_id": product_id,
        "quantity": quantity,
    })
    return {"order_id": str(order.inserted_id)}

Performance Data (4C8G Server):

  • Flask + gunicorn: ~1,000 QPS
  • FastAPI + uvicorn: ~5,000 QPS
  • FastAPI + uvicorn + async DB: ~8,000 QPS

Deployment Methods:

# Development environment
uvicorn main:app --reload

# Production environment (multi-worker)
uvicorn main:app --workers 4 --host 0.0.0.0 --port 8000

# Or use gunicorn + uvicorn worker
gunicorn main:app -w 4 -k uvicorn.workers.UvicornWorker

2.3 multiprocessing Multi-Process

Use multi-process when CPU-intensive computation is needed.

import asyncio
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import cpu_count

from fastapi import FastAPI

app = FastAPI()
executor = ProcessPoolExecutor(max_workers=cpu_count())

def heavy_computation(data):
    # CPU-intensive computation; each worker process has its own GIL
    result = 0
    for i in range(10_000_000):
        result += i * data
    return result

@app.post("/compute")
async def compute(data: int):
    # Offload to the process pool so the event loop stays responsive
    loop = asyncio.get_running_loop()
    result = await loop.run_in_executor(executor, heavy_computation, data)
    return {"result": result}

Illustration 1: Python asyncio Event Loop Diagram

3. Golang's Native High Concurrency Advantages

3.1 Goroutine Principles

Goroutine is Go's lightweight thread. Compared to OS threads:

Feature               OS Thread    Goroutine
Memory                ~1-8MB       ~2KB
Creation Cost         High         Very Low
Context Switch Cost   High         Very Low
Quantity Limit        Thousands    Hundreds of Thousands

Go's runtime schedules large numbers of Goroutines onto a small number of OS threads (M:N scheduling).

package main

import (
    "fmt"
    "time"
)

func sayHello(name string) {
    time.Sleep(1 * time.Second)
    fmt.Printf("Hello, %s!\n", name)
}

func main() {
    // Launch 1000 goroutines simultaneously
    for i := 0; i < 1000; i++ {
        go sayHello(fmt.Sprintf("User%d", i))
    }

    time.Sleep(2 * time.Second)
}

Launching 1000 Goroutines only increases memory by about 2MB. If using Java/Python threads, it might need 1-2GB.

3.2 Channel Communication

Go's philosophy: Don't communicate by sharing memory; share memory by communicating.

Channel is the communication pipe between Goroutines.

package main

import "fmt"

func producer(ch chan<- int) {
    for i := 0; i < 10; i++ {
        ch <- i  // Send to channel
    }
    close(ch)
}

func consumer(ch <-chan int) {
    for num := range ch {  // Receive from channel
        fmt.Println("Received:", num)
    }
}

func main() {
    ch := make(chan int, 10)  // buffered channel

    go producer(ch)
    consumer(ch)
}

Channel Advantages:

  • Avoid lock complexity
  • Natural synchronization mechanism
  • Cleaner code

3.3 sync Package

When traditional synchronization mechanisms are needed, Go provides the sync package.

package main

import (
    "fmt"
    "sync"
)

func main() {
    var wg sync.WaitGroup
    var mu sync.Mutex
    counter := 0

    for i := 0; i < 1000; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            mu.Lock()
            counter++
            mu.Unlock()
        }()
    }

    wg.Wait()
    fmt.Println("Counter:", counter)  // 1000
}

Common Tools:

  • sync.WaitGroup: Wait for a group of Goroutines to complete
  • sync.Mutex: Mutex lock
  • sync.RWMutex: Read-write lock
  • sync.Once: Execute only once
  • sync.Map: Concurrency-safe Map

4. Performance Benchmark Comparison

Enough theory, let's look at real data.

4.1 Test Environment

  • Machine: 4C8G cloud VM
  • System: Ubuntu 22.04
  • Python: 3.11 + FastAPI 0.104 + uvicorn
  • Go: 1.21 + Gin
  • Load Testing Tool: k6

4.2 HTTP API Performance Comparison

Test Scenario: Simple JSON response

# Python FastAPI
@app.get("/ping")
async def ping():
    return {"message": "pong"}

// Go Gin
r.GET("/ping", func(c *gin.Context) {
    c.JSON(200, gin.H{"message": "pong"})
})

Results:

Metric        Python FastAPI    Go Gin
QPS           12,000            45,000
P50 Latency   3ms               1ms
P99 Latency   15ms              5ms
Memory        80MB              20MB

Conclusion: Go is 3-4x faster in simple API scenarios.

4.3 CPU-Intensive Task Comparison

Test Scenario: Calculate Fibonacci sequence

# Python (CPU-intensive, GIL becomes bottleneck)
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)

@app.get("/fib/{n}")
def calc_fib(n: int):
    return {"result": fibonacci(n)}

// Go
func fibonacci(n int) int {
    if n <= 1 {
        return n
    }
    return fibonacci(n-1) + fibonacci(n-2)
}

r.GET("/fib/:n", func(c *gin.Context) {
    n, _ := strconv.Atoi(c.Param("n"))
    c.JSON(200, gin.H{"result": fibonacci(n)})
})

Results (n=35):

Metric        Python    Go
QPS           15        200
P50 Latency   2.5s      180ms
P99 Latency   3.5s      250ms

Conclusion: For CPU-intensive tasks, Go is 10+ times faster.

4.4 I/O-Intensive Task Comparison

Test Scenario: External API call + Redis query

# Python (async shines here; `redis_client` is assumed to be an
# async Redis client, e.g. redis.asyncio)
@app.get("/data/{id}")
async def get_data(id: int):
    async with aiohttp.ClientSession() as session:
        async with session.get(f"https://api.example.com/{id}") as resp:
            api_data = await resp.json()

    cache_data = await redis_client.get(f"cache:{id}")
    return {"api": api_data, "cache": cache_data}

// Go (ctx and redisClient are assumed to be initialized elsewhere)
r.GET("/data/:id", func(c *gin.Context) {
    id := c.Param("id")

    var wg sync.WaitGroup
    var apiData, cacheData interface{}

    wg.Add(2)
    go func() {
        defer wg.Done()
        resp, err := http.Get("https://api.example.com/" + id)
        if err != nil {
            return
        }
        defer resp.Body.Close()
        json.NewDecoder(resp.Body).Decode(&apiData)
    }()
    go func() {
        defer wg.Done()
        cacheData, _ = redisClient.Get(ctx, "cache:"+id).Result()
    }()
    wg.Wait()

    c.JSON(200, gin.H{"api": apiData, "cache": cacheData})
})

Results:

Metric        Python async    Go
QPS           3,000           5,000
P50 Latency   25ms            20ms
P99 Latency   80ms            50ms

Conclusion: In I/O-intensive scenarios, the gap narrows to 1.5-2x. Python async performs well in these scenarios.


5. Use Case Analysis

5.1 Scenarios for Choosing Python

Rapid Prototyping

  • Concise syntax, fast development
  • Rich third-party libraries
  • Suitable for MVP and quick validation

Data Science / Machine Learning

  • NumPy, Pandas, TensorFlow ecosystem
  • Jupyter Notebook friendly
  • Data processing pipelines

I/O-Intensive Web Applications

  • Combined with async, decent concurrency achievable
  • CRUD APIs, admin dashboards
  • Non-extreme traffic scenarios

Existing Python Tech Stack

  • Team familiar with Python
  • Existing codebase in Python
  • Easier to recruit Python engineers

5.2 Scenarios for Choosing Go

High-Performance API Services

  • Need extremely high QPS
  • Latency-sensitive
  • Resource-constrained environments

Infrastructure / Tools

  • CLI tools
  • Proxy services
  • Kubernetes ecosystem (largely written in Go)

Microservices Architecture

  • Inter-service communication performance critical
  • Need to handle large numbers of concurrent connections
  • Containerized deployment (Go compiles to single binary)

CPU-Intensive Services

  • Real-time computation
  • Data processing
  • Encoding conversion

5.3 Decision Flowchart

Do you need high performance?
    ├─ No → Python (faster development)
    └─ Yes → Is it CPU-intensive?
              ├─ Yes → Go (no GIL limitation)
              └─ No → Is it I/O-intensive?
                        ├─ Yes → Python async also works
                        │       (Go still slightly better)
                        └─ No → Choose based on team experience

Illustration 2: Python vs Go Use Case Comparison

6. Hybrid Architecture Recommendations

You don't have to pick sides. Many companies use both.

6.1 Architecture Patterns

Pattern 1: Python for Business Layer, Go for Gateway

User → Go API Gateway → Python Business Service
                      → Python Business Service
                      → Python Business Service

Go Gateway handles high concurrency connections and routing, Python handles complex business logic.

Pattern 2: Python for CRUD, Go for Computation

Web Requests → Python FastAPI (CRUD operations)
Computation Tasks → Go Service (high-performance processing)

Extract CPU-intensive parts and implement in Go.

Pattern 3: Assign by Team Capability

Team A (Python background) → User Service, Order Service
Team B (Go background) → Real-time Communication, Push Service

Let teams use familiar languages, use Go for critical performance paths.

6.2 Inter-Service Communication

Hybrid architectures need standardized inter-service communication:

HTTP/REST

  • Simple and universal
  • Both Python and Go support it
  • Suitable for low-frequency calls

gRPC

  • High performance (based on HTTP/2 + Protobuf)
  • Strongly typed (IDL-defined interfaces)
  • Suitable for high-frequency inter-service calls

// user.proto
syntax = "proto3";

service UserService {
  rpc GetUser (GetUserRequest) returns (User);
}

message GetUserRequest {
  int64 user_id = 1;
}

message User {
  int64 id = 1;
  string name = 2;
  string email = 3;
}

Both Python and Go can generate code from proto files, maintaining interface consistency.

6.3 Practical Case Study

Case: E-commerce Platform

Service            Language    Reason
Product Service    Python      Mainly CRUD, fast development
Order Service      Python      Complex business logic
Search Service     Go          High QPS requirement
Push Service       Go          Long connections, high concurrency
Data Analytics     Python      Data science ecosystem
API Gateway        Go          Performance critical

Need Technology Selection Advice? Language choice affects long-term development. Schedule Architecture Consultation, let experienced consultants help you analyze the most suitable tech stack.


FAQ

Q1: Is Python really unsuitable for high concurrency?

Not entirely true. Python + async performs well in I/O-intensive scenarios. But CPU-intensive scenarios are indeed limited by GIL. The choice depends on your specific scenario.

Q2: Is Go hard to learn?

Go's syntax is simple and the official documentation is excellent; anyone with a programming background can get started in 1-2 weeks. The harder part is internalizing the design philosophy behind Goroutines and Channels.

Q3: Can PyPy improve Python performance?

PyPy (JIT compiler) is indeed 2-5x faster than CPython. But ecosystem compatibility is limited, not all libraries are supported.

Q4: Rust is faster than Go, why not use Rust?

Rust indeed has higher performance, but the learning curve is steeper and development speed is slower. Go is the balance point between performance and development efficiency. Unless you have extreme performance requirements, Go is the more pragmatic choice.

Q5: How do I convince my team to try Go?

Start with small projects, like an internal tool or CLI. Expand after gaining experience. Don't start by converting core business to Go.


Conclusion: No Silver Bullet, Choose Based on Scenario

Python and Go each have advantages; the key is matching your scenario.

Key Takeaways:

  1. Python has GIL limitations, but asyncio can handle I/O-intensive tasks
  2. FastAPI + uvicorn is the best combination for Python high concurrency
  3. Go's Goroutines are lightweight and efficient, naturally suited for high concurrency
  4. For CPU-intensive scenarios, Go is 10+ times faster
  5. For I/O-intensive scenarios, the gap narrows to 1.5-2x
  6. Hybrid architecture is a pragmatic choice



Need a Second Opinion on Architecture Design?

Technology selection is a long-term decision; wrong choices are costly. If you're:

  • Evaluating whether Python or Go is better for your system
  • Planning the tech stack for microservices architecture
  • Considering migrating from Python to Go

Schedule Architecture Consultation, let's analyze your requirements and technology choices together.

All consultation content is completely confidential, with no sales pressure.


