AWS Lambda Error Handling Complete Guide: 502, 503, 504 Error Solutions
Introduction: When Lambda Fails
3 AM, your phone rings.
"The API is down, everything returns 502!"
You groggily open your laptop, staring at CloudWatch error logs, with no idea where to start.
This scenario is all too common.
Lambda error messages can be hard to interpret. 502, 503, 504 all look like "broken," but the causes are completely different.
This article will help you build a systematic error troubleshooting mindset, so next time you encounter problems, you can identify the cause in 5 minutes.
If you're not familiar with Lambda basics, consider reading AWS Lambda Complete Guide first.

Lambda Error Types Overview
Lambda errors fall into two main categories.
Invocation Errors vs Function Errors
Invocation Errors:
Lambda service-level issues where the function never started executing.
Common causes:
- Concurrency limit reached
- Insufficient permissions
- Resource not found
- Invalid request format
Function Errors:
Function started executing but failed during the process.
Common causes:
- Code syntax errors
- Unhandled exceptions
- Out of memory (OOM)
- Execution timeout
HTTP Status Code Mapping
When Lambda is used with API Gateway, errors convert to HTTP status codes:
| Status Code | Error Type | Common Cause |
|---|---|---|
| 400 | Request Error | Invalid request format |
| 403 | Permission Denied | Invalid API Key, insufficient IAM permissions |
| 500 | Internal Error | Lambda code threw unhandled exception |
| 502 | Bad Gateway | Lambda response format error |
| 503 | Service Unavailable | Concurrency limit, service unavailable |
| 504 | Gateway Timeout | Lambda execution timeout |
Let's dive deep into each error.
502 Bad Gateway Explained
502 is the most common Lambda error.
Good news: it's usually easy to fix.
Common Causes
Cause 1: Response Format Error (Most Common)
API Gateway expects a specific response format. If Lambda returns the wrong format, you get 502.
# Error: returning a string directly
def lambda_handler(event, context):
    return "Hello World"  # 502!

# Error: body is not a string
def lambda_handler(event, context):
    return {
        "statusCode": 200,
        "body": {"message": "Hello"}  # body must be a string, 502!
    }

# Correct format
import json

def lambda_handler(event, context):
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": "Hello"})
    }
Cause 2: Out of Memory (OOM)
When Lambda uses more memory than configured, it's forcibly terminated.
You'll see in CloudWatch Logs:
Runtime exited with error: signal: killed
Cause 3: Unhandled Exceptions
Code throws an exception without try-catch, Lambda terminates abnormally.
# This causes 502
def lambda_handler(event, context):
    data = event["data"]  # KeyError if event has no "data" key
    return {"statusCode": 200, "body": data}
Solutions and Code Examples
Solution 1: Fix Response Format
Create a standard response function:
import json

def create_response(status_code, body, headers=None):
    response = {
        "statusCode": status_code,
        "headers": {
            "Content-Type": "application/json",
            "Access-Control-Allow-Origin": "*"
        },
        "body": json.dumps(body) if isinstance(body, dict) else body
    }
    if headers:
        response["headers"].update(headers)
    return response

def lambda_handler(event, context):
    try:
        # Business logic
        result = {"message": "success"}
        return create_response(200, result)
    except Exception as e:
        return create_response(500, {"error": str(e)})
Solution 2: Increase Memory Configuration
For OOM issues:
- Go to Lambda function settings
- Increase Memory Size (e.g., from 128MB to 256MB or 512MB)
- Monitor memory usage during execution
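To see how close a function runs to its limit, you can parse the REPORT line that Lambda writes to CloudWatch after every invocation. A small sketch (the sample log line below is made up):

```python
import re

def parse_report(line):
    """Extract memory stats from a Lambda REPORT log line."""
    size = int(re.search(r"Memory Size: (\d+) MB", line).group(1))
    used = int(re.search(r"Max Memory Used: (\d+) MB", line).group(1))
    return {
        "memory_size_mb": size,
        "max_memory_used_mb": used,
        "usage_pct": round(used / size * 100, 1),
    }

# Sample REPORT line (values are invented for illustration)
report = ("REPORT RequestId: abc Duration: 123.45 ms Billed Duration: 124 ms "
          "Memory Size: 128 MB Max Memory Used: 120 MB")
stats = parse_report(report)
if stats["usage_pct"] > 90:
    print("Close to the limit - consider a larger Memory Size")
```

If `Max Memory Used` regularly sits near `Memory Size`, OOM kills are likely and the memory setting should be raised.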
Solution 3: Complete Error Handling
def lambda_handler(event, context):
    try:
        # Safely get parameters
        data = event.get("data", {})
        name = data.get("name", "Unknown")
        # Business logic
        result = process_data(name)
        return create_response(200, result)
    except ValueError as e:
        # Known business error
        return create_response(400, {"error": str(e)})
    except Exception as e:
        # Unexpected error, log details
        print(f"Unexpected error: {str(e)}")
        return create_response(500, {"error": "Internal server error"})
Quick Checklist
When encountering 502, check in order:
- Is response format correct? (statusCode + body string)
- Does CloudWatch Logs show "signal: killed"? (OOM)
- Are there unhandled exceptions? (Check log error messages)
- Is body a string? (Cannot be dict or list)
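The format checks in this list can be automated in unit tests with a small helper. A sketch, not an official validator; it only covers the proxy-integration rules mentioned above:

```python
def would_cause_502(response):
    """Rough pre-flight check of the Lambda proxy-integration response shape."""
    if not isinstance(response, dict):
        return True  # e.g. returning a bare string
    if not isinstance(response.get("statusCode"), int):
        return True  # statusCode must be an integer
    body = response.get("body")
    if body is not None and not isinstance(body, str):
        return True  # body must be a string, not a dict or list
    return False
```

Run it against your handler's return value in tests to catch format-related 502s before deployment.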
For more API Gateway integration details, see Lambda + API Gateway Integration Tutorial.
Can't fix the error? Book free consultation and let experts diagnose the problem.
503 Service Unavailable Explained
503 means Lambda service temporarily cannot process the request.
Common Causes
Cause 1: Concurrency Limit Reached
Lambda accounts have a default 1,000 concurrent execution limit (can request increase).
When all concurrency is used, new requests get 503.
Cause 2: Reserved Concurrency Set Too Low
If you set Reserved Concurrency for a specific function, requests exceeding this value are rejected.
Example: Set Reserved Concurrency = 10, 15 requests arrive simultaneously, 5 get 503.
Cause 3: Provisioned Concurrency Configuration Issues
When using Provisioned Concurrency, if configured quantity can't handle traffic spikes, excess requests may be throttled.
Solutions
Solution 1: Request Concurrency Limit Increase
- Go to AWS Service Quotas
- Search "Lambda concurrent executions"
- Request quota increase (usually 1-3 business days)
Solution 2: Adjust Reserved Concurrency
Evaluate the function's actual needs and set an appropriate Reserved Concurrency value.
Alternatively, remove the Reserved Concurrency limit and let the function share the account-level quota.
Solution 3: Implement Retry Mechanism
For scenarios that can tolerate brief delays, implement exponential backoff retries on the client:
import random
import time

import requests

def call_api_with_retry(url, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = requests.get(url)
            if response.status_code == 503:
                # Exponential backoff with jitter
                wait_time = (2 ** attempt) + random.random()
                print(f"503 error, retrying in {wait_time:.2f}s...")
                time.sleep(wait_time)
                continue
            return response
        except Exception:
            if attempt == max_retries - 1:
                raise
    raise Exception("Max retries exceeded")
Solution 4: Use SQS Buffer
Put requests into an SQS queue and let Lambda process them at a stable rate:
Client → SQS Queue → Lambda (batch processing)
This avoids throttling issues from traffic bursts.
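A minimal sketch of the consuming Lambda in this pattern, assuming an SQS event source mapping with partial batch responses (`ReportBatchItemFailures`) enabled so that only failed messages are retried:

```python
import json

def lambda_handler(event, context):
    """Consume SQS messages in batches at Lambda's own pace."""
    failures = []
    for record in event["Records"]:
        try:
            payload = json.loads(record["body"])
            # ... business logic using payload goes here ...
        except Exception:
            # Report this message ID so only it gets retried
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

With this shape, a burst of client traffic lands in the queue instead of hitting the concurrency limit directly.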
To understand Provisioned Concurrency cost impact, see AWS Lambda Pricing Complete Guide.

504 Gateway Timeout Explained
504 means Lambda execution time exceeded API Gateway's limit.
Common Causes
Cause 1: Lambda Execution Timeout
Lambda itself has 15-minute execution limit.
But with API Gateway, limits are stricter:
- REST API: 29 seconds
- HTTP API: 30 seconds
If execution exceeds this limit, API Gateway returns 504.
Cause 2: External API Latency
Lambda calls an external service (a database or third-party API) that responds too slowly:
# If this request takes 35 seconds, it will timeout
response = requests.get("https://slow-external-api.com/data")
Cause 3: Cold Start + Processing Time
Cold Start can take 1-10 seconds (depending on language and package size), plus normal processing time, may exceed 30-second limit.
Solutions
Solution 1: Optimize Code Execution Time
- Reduce unnecessary computations
- Parallel process independent tasks
- Use more efficient algorithms
import asyncio
import aiohttp

async def fetch_one(session, url):
    async with session.get(url) as resp:
        return await resp.json()

async def fetch_all(urls):
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_one(session, url) for url in urls]
        return await asyncio.gather(*tasks)

# Call multiple APIs in parallel instead of sequentially
results = asyncio.run(fetch_all(urls))
Solution 2: Increase Lambda Timeout Setting
Adjust Timeout in Lambda settings:
- Go to function settings
- General configuration → Timeout
- Set appropriate value (max 15 minutes)
Note: This doesn't solve API Gateway's 30-second limit.
Solution 3: Switch to Async Processing
For long-running tasks, use async mode:
1. Client sends request
2. Lambda puts task in SQS/EventBridge
3. Immediately returns 202 Accepted + task ID
4. Background Lambda processes task
5. Client polls for result or receives WebSocket notification
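Steps 2-3 can be sketched as a single submit handler. The queue send is injected here so the sketch stays testable; a real implementation would pass in a wrapper around SQS `send_message` with your queue URL (both are placeholders in this example):

```python
import json
import uuid

def submit_handler(event, context, send_message=None):
    """Accept the request immediately and hand the real work to a queue."""
    task_id = str(uuid.uuid4())
    task = {"task_id": task_id, "payload": json.loads(event.get("body") or "{}")}
    if send_message is not None:
        send_message(json.dumps(task))  # enqueue for a background Lambda
    return {
        "statusCode": 202,  # Accepted: processing continues asynchronously
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"task_id": task_id}),
    }
```

The client gets a task ID back within milliseconds, and the 30-second API Gateway limit no longer constrains the actual work.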
For more async processing architecture, see Lambda + EventBridge Event-Driven Architecture Practice.
Solution 4: Use Provisioned Concurrency to Reduce Cold Start
# Enable Provisioned Concurrency
# 1. Publish Lambda version
# 2. Set Provisioned Concurrency for that version
# This eliminates Cold Start but has additional cost
Timeout problems hard to fix permanently? This is usually an architecture design issue, not just parameter tuning.
Book architecture consultation and let us help redesign the flow.
Debugging Tools and Techniques
With the right tools, debugging becomes much easier.
CloudWatch Logs Viewing Method
Every Lambda execution automatically logs to CloudWatch Logs.
Finding Logs:
- Go to CloudWatch → Log groups
- Search for /aws/lambda/function-name
- Select the most recent Log stream
Key Log Messages:
# Normal execution
START RequestId: xxx
... your print output ...
END RequestId: xxx
REPORT RequestId: xxx Duration: 123.45 ms Billed Duration: 124 ms Memory Size: 128 MB Max Memory Used: 64 MB
# OOM error
Runtime exited with error: signal: killed
# Code error
[ERROR] KeyError: 'data'
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 5, in lambda_handler
    data = event["data"]
Using Log Insights Query:
# Find all errors
fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 20
# Find longest running requests
fields @timestamp, @duration
| sort @duration desc
| limit 10
X-Ray Request Tracing
X-Ray traces complete request paths to find performance bottlenecks.
Enable X-Ray:
- Go to Lambda function settings
- Monitoring and operations tools → Enable "Active tracing"
Add Tracing in Code:
from aws_xray_sdk.core import xray_recorder
from aws_xray_sdk.core import patch_all

# Automatically trace HTTP requests, database calls, etc.
patch_all()

@xray_recorder.capture('process_data')
def process_data(data):
    # This function's execution time will be traced
    transformed_data = data  # placeholder for real transformation logic
    return transformed_data
View Trace Results:
- Go to X-Ray → Traces
- View service map, find slow nodes
- Click individual requests for detailed time breakdown
Local Testing (SAM Local)
Test locally before deploying to catch issues faster.
Install SAM CLI:
# macOS
brew install aws-sam-cli

# Windows: use the MSI installer from AWS

# Verify installation
sam --version
Run Lambda Locally:
# Create test event
echo '{"name": "test"}' > event.json
# Run function locally
sam local invoke MyFunction --event event.json
# Start local API
sam local start-api
This catches most errors before deployment.

Error Handling Best Practices
Beyond solving current errors, building protective mechanisms is more important.
Structured Error Response
Unified error response format makes frontend handling easier:
import json

def create_error_response(status_code, error_code, message, details=None):
    body = {
        "error": {
            "code": error_code,
            "message": message
        }
    }
    if details:
        body["error"]["details"] = details
    return {
        "statusCode": status_code,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body)
    }

# Usage example (inside a handler)
return create_error_response(
    400,
    "INVALID_INPUT",
    "Name is required",
    {"field": "name"}
)
DLQ Setup
Dead Letter Queue (DLQ) captures failed events, preventing data loss.
Setup Steps:
- Create SQS Queue as DLQ
- Go to Lambda function settings
- Asynchronous invocation → Destinations
- On failure: Select SQS Queue
Process DLQ Messages:
# Another Lambda processes DLQ messages
def dlq_handler(event, context):
for record in event['Records']:
# Log failed event
failed_event = json.loads(record['body'])
print(f"Processing failed event: {failed_event}")
# Can send notifications, save to database, etc.
notify_team(failed_event)
Monitoring and Alerts
If you use Terraform to manage Lambda, you can include these error handling settings in Infrastructure as Code. See Terraform Lambda Deployment Tutorial.
Set up CloudWatch Alarms for real-time notification when issues occur:
Key Metrics:
- Errors: Function error count
- Throttles: Throttled requests
- Duration: Execution time
- ConcurrentExecutions: Concurrent execution count
Set Up Alerts:
# Using CloudFormation/SAM
ErrorAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    AlarmName: LambdaHighErrorRate
    MetricName: Errors
    Namespace: AWS/Lambda
    Dimensions:
      - Name: FunctionName
        Value: !Ref MyFunction
    Statistic: Sum
    Period: 60
    EvaluationPeriods: 1
    Threshold: 5
    ComparisonOperator: GreaterThanThreshold
    AlarmActions:
      - !Ref AlertTopic
FAQ
What's the difference between 502 and 500 errors?
A 500 usually means your Lambda code threw an exception and API Gateway received the error response correctly; a 502 means the Lambda response was malformed or the function terminated abnormally, so API Gateway couldn't parse it. Fixing a 502 usually starts with checking the response format and memory configuration.
How to view Lambda's detailed error logs?
Go to CloudWatch Logs, search for /aws/lambda/function-name Log group. Logs show complete error messages and stack traces. You can also use Log Insights for more complex query analysis.
What's the maximum Lambda Timeout?
Lambda itself can be set up to 15 minutes (900 seconds). But with API Gateway, REST API limits to 29 seconds, HTTP API limits to 30 seconds. Tasks exceeding this time should use async processing.
How to avoid timeout caused by Cold Start?
Using Provisioned Concurrency eliminates Cold Start. Or optimize code package size, choose languages with faster Cold Start (Python, Node.js), and move initialization logic outside Handler to execute only once.
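The last tip, moving initialization outside the handler, can be sketched as follows. `CONFIG` is a placeholder for any expensive setup (SDK clients, config loading, connection pools):

```python
import json

# Module scope runs once per execution environment (i.e. at cold start),
# so expensive setup belongs here, not inside the handler.
CONFIG = {"table": "orders"}  # placeholder for real initialization work

def lambda_handler(event, context):
    # Warm invocations reuse CONFIG without paying the setup cost again.
    return {
        "statusCode": 200,
        "body": json.dumps({"table": CONFIG["table"]}),
    }
```

Every warm invocation then skips the setup entirely, which directly shrinks the time budget consumed before your business logic starts.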
Conclusion: Building Robust Error Handling Mechanisms
Errors cannot be completely avoided.
What's important is building comprehensive detection and handling mechanisms so when errors occur, they can be quickly discovered and fixed.
Key Points Recap:
| Error Code | Main Cause | First Check |
|---|---|---|
| 502 | Response format error | return format, OOM |
| 503 | Concurrency limit | Account quota, Reserved Concurrency |
| 504 | Execution timeout | API Gateway 30-second limit |
Next Steps:
- Set up unified error handling patterns for all Lambda functions
- Configure CloudWatch Alarms to monitor key metrics
- Configure DLQ for critical functions to prevent data loss
- Regularly review error logs to identify potential issues
Need Expert Help with Lambda Issues?
If you're:
- Dealing with recurring Lambda errors
- Optimizing Lambda execution performance
- Designing more stable Serverless architecture
Book architecture consultation, we'll respond within 24 hours.
Proper architecture design can prevent 90% of runtime errors.
References
- AWS Official Documentation: Lambda Error Handling
- AWS Official Documentation: Troubleshooting Lambda
- AWS Official Documentation: CloudWatch Logs Insights
- AWS Official Documentation: X-Ray for Lambda
- AWS Blog: Error handling patterns in Amazon API Gateway and AWS Lambda