
AWS Lambda Error Handling Complete Guide: 502, 503, 504 Error Solutions

12 min read
#AWS Lambda#Error Handling#502 Bad Gateway#503 Error#504 Timeout#CloudWatch Logs#X-Ray#DLQ#Debugging#Serverless


Introduction: When Lambda Fails

3 AM, your phone rings.

"The API is down, everything returns 502!"

You groggily open your laptop, staring at CloudWatch error logs, with no idea where to start.

This scenario is all too common.

Lambda error messages can be hard to interpret. 502, 503, 504 all look like "broken," but the causes are completely different.

This article will help you build a systematic error troubleshooting mindset, so next time you encounter problems, you can identify the cause in 5 minutes.

If you're not familiar with Lambda basics, consider reading AWS Lambda Complete Guide first.

Illustration 1: Lambda Error Monitoring Dashboard

Lambda Error Types Overview

Lambda errors fall into two main categories.

Invocation Errors vs Function Errors

Invocation Errors:

Lambda service-level issues where the function never started executing.

Common causes:

  • Concurrency limit reached
  • Insufficient permissions
  • Resource not found
  • Invalid request format

Function Errors:

Function started executing but failed during the process.

Common causes:

  • Code syntax errors
  • Unhandled exceptions
  • Out of memory (OOM)
  • Execution timeout

HTTP Status Code Mapping

When Lambda is used with API Gateway, errors convert to HTTP status codes:

| Status Code | Error Type | Common Cause |
| --- | --- | --- |
| 400 | Request Error | Invalid request format |
| 403 | Permission Denied | Invalid API Key, insufficient IAM permissions |
| 500 | Internal Error | Lambda code threw unhandled exception |
| 502 | Bad Gateway | Lambda response format error |
| 503 | Service Unavailable | Concurrency limit, service unavailable |
| 504 | Gateway Timeout | Lambda execution timeout |

Let's dive deep into each error.


502 Bad Gateway Explained

502 is the most common Lambda error.

Good news: it's usually easy to fix.

Common Causes

Cause 1: Response Format Error (Most Common)

API Gateway expects a specific response format. If Lambda returns the wrong format, you get 502.

# Error: Returning string directly
def lambda_handler(event, context):
    return "Hello World"  # 502!

# Error: body is not a string
def lambda_handler(event, context):
    return {
        "statusCode": 200,
        "body": {"message": "Hello"}  # body must be string, 502!
    }

# Correct format
import json

def lambda_handler(event, context):
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": "Hello"})
    }

Cause 2: Out of Memory (OOM)

When Lambda uses more memory than configured, it's forcibly terminated.

You'll see in CloudWatch Logs:

Runtime exited with error: signal: killed

Cause 3: Unhandled Exceptions

If code throws an exception without a try-except, Lambda terminates abnormally.

# This causes 502
def lambda_handler(event, context):
    data = event["data"]  # If event has no "data" key, throws KeyError
    return {"statusCode": 200, "body": data}

Solutions and Code Examples

Solution 1: Fix Response Format

Create a standard response function:

import json

def create_response(status_code, body, headers=None):
    response = {
        "statusCode": status_code,
        "headers": {
            "Content-Type": "application/json",
            "Access-Control-Allow-Origin": "*"
        },
        # body must always be a string: serialize dicts and lists
        "body": body if isinstance(body, str) else json.dumps(body)
    }
    if headers:
        response["headers"].update(headers)
    return response

def lambda_handler(event, context):
    try:
        # Business logic
        result = {"message": "success"}
        return create_response(200, result)
    except Exception as e:
        return create_response(500, {"error": str(e)})

Solution 2: Increase Memory Configuration

For OOM issues:

  1. Go to Lambda function settings
  2. Increase Memory Size (e.g., from 128MB to 256MB or 512MB)
  3. Monitor memory usage during execution

Solution 3: Complete Error Handling

def lambda_handler(event, context):
    try:
        # Safely get parameters
        data = event.get("data", {})
        name = data.get("name", "Unknown")

        # Business logic
        result = process_data(name)

        return create_response(200, result)

    except ValueError as e:
        # Known business error
        return create_response(400, {"error": str(e)})

    except Exception as e:
        # Unexpected error, log details
        print(f"Unexpected error: {str(e)}")
        return create_response(500, {"error": "Internal server error"})

Quick Checklist

When encountering 502, check in order:

  • Is response format correct? (statusCode + body string)
  • Does CloudWatch Logs show "signal: killed"? (OOM)
  • Are there unhandled exceptions? (Check log error messages)
  • Is body a string? (Cannot be dict or list)

For more API Gateway integration details, see Lambda + API Gateway Integration Tutorial.


Can't fix the error? Book free consultation and let experts diagnose the problem.


503 Service Unavailable Explained

503 means Lambda service temporarily cannot process the request.

Common Causes

Cause 1: Concurrency Limit Reached

Lambda accounts have a default limit of 1,000 concurrent executions (you can request an increase).

When all concurrency is used, new requests get 503.

Cause 2: Reserved Concurrency Set Too Low

If you set Reserved Concurrency for a specific function, requests exceeding this value are rejected.

Example: with Reserved Concurrency = 10, if 15 requests arrive simultaneously, 5 of them get 503.

Cause 3: Provisioned Concurrency Configuration Issues

When using Provisioned Concurrency, if configured quantity can't handle traffic spikes, excess requests may be throttled.

Solutions

Solution 1: Request Concurrency Limit Increase

  1. Go to AWS Service Quotas
  2. Search "Lambda concurrent executions"
  3. Request quota increase (usually 1-3 business days)

Solution 2: Adjust Reserved Concurrency

Evaluate the function's actual needs and set an appropriate Reserved Concurrency value.

Or remove the Reserved Concurrency limit entirely, letting the function share the account quota.

Solution 3: Implement Retry Mechanism

For scenarios that can tolerate brief delays, implement exponential-backoff retries on the client:

import random
import time
import requests

def call_api_with_retry(url, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = requests.get(url, timeout=10)
            if response.status_code == 503:
                # Exponential backoff with jitter
                wait_time = (2 ** attempt) + random.random()
                print(f"503 error, retrying in {wait_time:.2f}s...")
                time.sleep(wait_time)
                continue
            return response
        except requests.RequestException:
            if attempt == max_retries - 1:
                raise
    raise Exception("Max retries exceeded")

Solution 4: Use SQS Buffer

Put requests into an SQS queue and let Lambda process them at a stable rate:

Client → SQS Queue → Lambda (batch processing)

This avoids throttling issues from traffic bursts.
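The worker side of that diagram is a plain SQS-triggered handler. A minimal sketch (handle() is a placeholder for your business logic, and partial-batch responses require "ReportBatchItemFailures" enabled on the event source mapping):

```python
import json

def handle(payload):
    # Placeholder for real business logic
    print(f"processing {payload}")

def lambda_handler(event, context):
    # An SQS trigger delivers up to batch-size records per invocation
    failures = []
    for record in event["Records"]:
        try:
            handle(json.loads(record["body"]))
        except Exception:
            # Report only failed messages so successful ones aren't retried
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```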

To understand Provisioned Concurrency cost impact, see AWS Lambda Pricing Complete Guide.

Illustration 2: Lambda Concurrency Limit and Throttling Diagram

504 Gateway Timeout Explained

504 means Lambda execution time exceeded API Gateway's limit.

Common Causes

Cause 1: Lambda Execution Timeout

Lambda itself has a 15-minute execution limit.

But with API Gateway, the limits are stricter:

  • REST API: 29 seconds
  • HTTP API: 30 seconds

If this time is exceeded, API Gateway returns 504.

Cause 2: External API Latency

Lambda calls an external service (database, third-party API) that responds too slowly:

# If this request takes 35 seconds, it will timeout
response = requests.get("https://slow-external-api.com/data")
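A quick safeguard (not a full fix) is to always pass an explicit timeout, so a slow dependency fails fast with a handleable error instead of consuming the whole invocation. The values below are illustrative:

```python
import requests

def fetch_data(url):
    try:
        # 3 s to connect, 5 s to read: fail fast, well under the gateway cap
        response = requests.get(url, timeout=(3, 5))
        response.raise_for_status()
        return response.json()
    except requests.Timeout:
        # Turn a silent hang into an explicit, handleable error
        return None
```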

Cause 3: Cold Start + Processing Time

Cold Start can take 1-10 seconds (depending on language and package size); added to normal processing time, this can exceed the 30-second limit.

Solutions

Solution 1: Optimize Code Execution Time

  • Reduce unnecessary computations
  • Parallel process independent tasks
  • Use more efficient algorithms
import asyncio
import aiohttp

async def fetch_one(session, url):
    async with session.get(url) as response:
        return await response.json()

async def fetch_all(urls):
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_one(session, url) for url in urls]
        return await asyncio.gather(*tasks)

# Call multiple APIs in parallel instead of sequentially
results = asyncio.run(fetch_all(urls))

Solution 2: Increase Lambda Timeout Setting

Adjust Timeout in Lambda settings:

  1. Go to function settings
  2. General configuration → Timeout
  3. Set appropriate value (max 15 minutes)

Note: This doesn't solve API Gateway's 30-second limit.
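Within that constraint, one defensive pattern is to check the invocation's remaining time via the context object and return partial results before the gateway gives up. Here process() is a hypothetical stand-in for your real per-item work:

```python
import json

def process(item):
    # Hypothetical stand-in for real per-item work
    return item.upper()

def lambda_handler(event, context):
    safety_margin_ms = 1000  # stop when less than 1 s remains
    results = []
    for item in event.get("items", []):
        if context.get_remaining_time_in_millis() < safety_margin_ms:
            break  # return what we have instead of letting the gateway 504
        results.append(process(item))
    return {"statusCode": 200, "body": json.dumps(results)}
```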

Solution 3: Switch to Async Processing

For long-running tasks, use async mode:

1. Client sends request
2. Lambda puts task in SQS/EventBridge
3. Immediately returns 202 Accepted + task ID
4. Background Lambda processes task
5. Client polls for result or receives WebSocket notification

For more async processing architecture, see Lambda + EventBridge Event-Driven Architecture Practice.

Solution 4: Use Provisioned Concurrency to Reduce Cold Start

import boto3

# Applies to a published version or alias, not $LATEST; eliminates Cold Start
# at additional cost. Function name and values are illustrative.
boto3.client("lambda").put_provisioned_concurrency_config(
    FunctionName="my-function",
    Qualifier="1",  # version number or alias
    ProvisionedConcurrentExecutions=5,
)

Timeout problems that are hard to fix permanently are usually an architecture design issue, not just a matter of parameter tuning.

Book architecture consultation and let us help redesign the flow.


Debugging Tools and Techniques

With the right tools, debugging becomes much easier.

CloudWatch Logs Viewing Method

Every Lambda execution automatically logs to CloudWatch Logs.

Finding Logs:

  1. Go to CloudWatch → Log groups
  2. Search /aws/lambda/function-name
  3. Select recent Log stream

Key Log Messages:

# Normal execution
START RequestId: xxx
... your print output ...
END RequestId: xxx
REPORT RequestId: xxx Duration: 123.45 ms Billed Duration: 124 ms Memory Size: 128 MB Max Memory Used: 64 MB

# OOM error
Runtime exited with error: signal: killed

# Code error
[ERROR] KeyError: 'data'
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 5, in lambda_handler
    data = event["data"]

Using Log Insights Query:

# Find all errors
fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 20

# Find longest running requests
fields @timestamp, @duration
| sort @duration desc
| limit 10

X-Ray Request Tracing

X-Ray traces complete request paths to find performance bottlenecks.

Enable X-Ray:

  1. Go to Lambda function settings
  2. Monitoring and operations tools → Enable "Active tracing"

Add Tracing in Code:

from aws_xray_sdk.core import xray_recorder
from aws_xray_sdk.core import patch_all

# Automatically trace HTTP requests, database calls, etc.
patch_all()

@xray_recorder.capture('process_data')
def process_data(data):
    # This function's execution time shows up as a subsegment in the trace
    transformed_data = {"processed": data}  # stand-in for real transformation
    return transformed_data

View Trace Results:

  1. Go to X-Ray → Traces
  2. View service map, find slow nodes
  3. Click individual requests for detailed time breakdown

Local Testing (SAM Local)

Test locally before deploying to catch issues faster.

Install SAM CLI:

# macOS
brew install aws-sam-cli

# Windows
# Download the MSI installer from the AWS documentation

# Verify installation
sam --version

Run Lambda Locally:

# Create test event
echo '{"name": "test"}' > event.json

# Run function locally
sam local invoke MyFunction --event event.json

# Start local API
sam local start-api

This catches most errors before deployment.

Illustration 3: X-Ray Service Trace Map

Error Handling Best Practices

Beyond fixing the error at hand, it's more important to build protective mechanisms.

Structured Error Response

Unified error response format makes frontend handling easier:

def create_error_response(status_code, error_code, message, details=None):
    body = {
        "error": {
            "code": error_code,
            "message": message
        }
    }
    if details:
        body["error"]["details"] = details

    return {
        "statusCode": status_code,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body)
    }

# Usage example
return create_error_response(
    400,
    "INVALID_INPUT",
    "Name is required",
    {"field": "name"}
)

DLQ Setup

Dead Letter Queue (DLQ) captures failed events, preventing data loss.

Setup Steps:

  1. Create an SQS Queue to act as the DLQ
  2. Go to Lambda function settings
  3. Configuration → Asynchronous invocation
  4. Set the Dead-letter queue (or configure an On-failure Destination) to that SQS Queue

Process DLQ Messages:

# Another Lambda processes DLQ messages
def dlq_handler(event, context):
    for record in event['Records']:
        # Log failed event
        failed_event = json.loads(record['body'])
        print(f"Processing failed event: {failed_event}")

        # Can send notifications, save to database, etc.
        notify_team(failed_event)  # notify_team is your own helper (e.g. SNS publish)

Monitoring and Alerts

If you use Terraform to manage Lambda, you can include these error handling settings in Infrastructure as Code. See Terraform Lambda Deployment Tutorial.

Set up CloudWatch Alarms for real-time notification when issues occur:

Key Metrics:

  • Errors: Function error count
  • Throttles: Throttled requests
  • Duration: Execution time
  • ConcurrentExecutions: Concurrent execution count

Set Up Alerts:

# Using CloudFormation/SAM
ErrorAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    AlarmName: LambdaHighErrorRate
    MetricName: Errors
    Namespace: AWS/Lambda
    Dimensions:
      - Name: FunctionName
        Value: !Ref MyFunction
    Statistic: Sum
    Period: 60
    EvaluationPeriods: 1
    Threshold: 5
    ComparisonOperator: GreaterThanThreshold
    AlarmActions:
      - !Ref AlertTopic

FAQ

What's the difference between 502 and 500 errors?

A 500 usually means your Lambda code threw an exception but still produced a response API Gateway could parse; a 502 means the response format was wrong or the function terminated abnormally, so API Gateway couldn't parse the response at all. Fixing a 502 usually starts with checking the response format and memory configuration.

How to view Lambda's detailed error logs?

Go to CloudWatch Logs, search for /aws/lambda/function-name Log group. Logs show complete error messages and stack traces. You can also use Log Insights for more complex query analysis.

What's the maximum Lambda Timeout?

Lambda itself can be set up to 15 minutes (900 seconds). But behind API Gateway, REST APIs are limited to 29 seconds and HTTP APIs to 30 seconds. Tasks that exceed this should use async processing.

How to avoid timeout caused by Cold Start?

Using Provisioned Concurrency eliminates Cold Start. Or optimize code package size, choose languages with faster Cold Start (Python, Node.js), and move initialization logic outside Handler to execute only once.
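For the last point, a minimal sketch: module-level code runs once per container and is reused by every warm invocation (the config literal below is a hypothetical stand-in for expensive setup like opening database connections):

```python
import json

# Runs once per container (at cold start), then reused by warm invocations
HEAVY_CONFIG = json.loads('{"db_host": "example.internal"}')  # stand-in for expensive setup

def lambda_handler(event, context):
    # Warm invocations skip the module-level setup above entirely
    return {
        "statusCode": 200,
        "body": json.dumps({"db_host": HEAVY_CONFIG["db_host"]}),
    }
```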


Conclusion: Building Robust Error Handling Mechanisms

Errors cannot be completely avoided.

What's important is building comprehensive detection and handling mechanisms so when errors occur, they can be quickly discovered and fixed.

Key Points Recap:

| Error Code | Main Cause | First Check |
| --- | --- | --- |
| 502 | Response format error | return format, OOM |
| 503 | Concurrency limit | Account quota, Reserved Concurrency |
| 504 | Execution timeout | API Gateway 30-second limit |

Next Steps:

  1. Set up unified error handling patterns for all Lambda functions
  2. Configure CloudWatch Alarms to monitor key metrics
  3. Configure DLQ for critical functions to prevent data loss
  4. Regularly review error logs to identify potential issues

Need Expert Help with Lambda Issues?

If you're:

  • Dealing with recurring Lambda errors
  • Optimizing Lambda execution performance
  • Designing more stable Serverless architecture

Book architecture consultation, we'll respond within 24 hours.

Proper architecture design can prevent 90% of runtime errors.


References

  1. AWS Official Documentation: Lambda Error Handling
  2. AWS Official Documentation: Troubleshooting Lambda
  3. AWS Official Documentation: CloudWatch Logs Insights
  4. AWS Official Documentation: X-Ray for Lambda
  5. AWS Blog: Error handling patterns in Amazon API Gateway and AWS Lambda

Need Professional Cloud Advice?

Whether you're evaluating cloud platforms, optimizing existing architecture, or looking for cost-saving solutions, we can help

Book Free Consultation
