1. What is AWS Step Functions?
AWS Step Functions is a serverless orchestration service that lets you coordinate multiple AWS services into workflows. You define your workflow as a state machine using a JSON-based language called Amazon States Language (ASL).
Core Concept
Step Functions solves the problem of coordinating multiple Lambda functions and services. Instead of chaining Lambdas by having one invoke another (tightly coupled), Step Functions manages the flow, error handling, retries, and parallelism visually and declaratively. Think: workflow engine for serverless.
2. Why Step Functions?
- Lambda has a 15-minute timeout; Standard workflows in Step Functions can run for up to 1 year
- Visual workflow designer in the AWS Console
- Built-in error handling, retries, and catch blocks
- Parallel execution of branches
- Conditional branching (if/else logic)
- Wait states (pause for seconds, minutes, hours, or until a specific time)
- Audit trail — full execution history for every state
3. State Machine Concepts
State Machine
- The workflow itself — a collection of states and transitions
- Defined in Amazon States Language (JSON)
- Each execution gets a unique execution ID and full history
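As a rough illustration of what such a definition looks like, the sketch below builds a two-state machine as a Python dict and serializes it to ASL JSON. The state names and the Lambda ARN are hypothetical placeholders, and `dangling_targets` is a helper invented here for a quick sanity check, not part of any AWS SDK.

```python
import json

# Hypothetical two-step workflow; the Lambda ARN is a placeholder.
definition = {
    "Comment": "Minimal example state machine",
    "StartAt": "Validate",
    "States": {
        "Validate": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:Validate",
            "Next": "Done",
        },
        "Done": {"Type": "Succeed"},
    },
}


def dangling_targets(defn):
    """Return names referenced by StartAt/Next that are not defined states."""
    states = defn["States"]
    targets = {defn["StartAt"]}
    targets.update(s["Next"] for s in states.values() if "Next" in s)
    return sorted(targets - set(states))


asl_json = json.dumps(definition, indent=2)  # the text you would paste into the console
print(asl_json)
print(dangling_targets(definition))
```

Every transition is a plain string reference to another state name, which is why a check like this catches typos before deployment.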
States (Building Blocks)

4. Step Functions Workflow Types

5. Error Handling
Step Functions has built-in error handling at the state level:
Retry
- Automatically retry a failed state based on error type
- Configure: MaxAttempts, IntervalSeconds, BackoffRate
- Exponential backoff is supported (BackoffRate > 1)
"Retry": [
{
"ErrorEquals": ["States.TaskFailed"],
"IntervalSeconds": 2,
"MaxAttempts": 3,
"BackoffRate": 2.0
}
]
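The wait times implied by a retrier like the one above can be worked out with a few lines of Python. This is a simplified sketch: `retry_waits` is a hypothetical helper, and it ignores optional settings such as a maximum delay cap or jitter.

```python
def retry_waits(interval_seconds, max_attempts, backoff_rate):
    """Seconds waited before each retry: interval * rate**n for n = 0..max_attempts-1."""
    return [interval_seconds * backoff_rate ** n for n in range(max_attempts)]


# Values taken from the Retry block above.
waits = retry_waits(interval_seconds=2, max_attempts=3, backoff_rate=2.0)
print(waits)       # [2.0, 4.0, 8.0]
print(sum(waits))  # 14.0 seconds of waiting in total if every retry fails
```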
Retry 1: after 2s, Retry 2: after 4s, Retry 3: after 8s
Catch
- If all retries fail, catch the error and redirect to a fallback state
- Route to a cleanup Lambda, SNS notification, or Fail state
- Can catch specific errors or all errors (States.ALL)
"Catch": [
{
"ErrorEquals": ["States.ALL"],
"Next": "HandleErrorState"
}
]
Common Error Types
- States.ALL: Matches any error
- States.TaskFailed: Task state failed
- States.Timeout: State timed out
- States.Permissions: Insufficient permissions
- States.ResultPathMatchFailure: The ResultPath of a state cannot be applied to its input
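To see how error names are matched against Catch entries, here is a deliberately simplified model in Python. It assumes catchers are checked in order and that a catcher matches when its ErrorEquals list contains the exact error name or States.ALL; the real service has additional rules (for example, some terminal errors are not caught by States.ALL). The function name and the NotifyTimeout state are hypothetical.

```python
def first_matching_catcher(error_name, catchers):
    """Return the Next target of the first catcher matching error_name, else None."""
    for catcher in catchers:
        names = catcher["ErrorEquals"]
        if error_name in names or "States.ALL" in names:
            return catcher["Next"]
    return None


# Specific catchers go first; States.ALL acts as the fallback at the end.
catchers = [
    {"ErrorEquals": ["States.Timeout"], "Next": "NotifyTimeout"},
    {"ErrorEquals": ["States.ALL"], "Next": "HandleErrorState"},
]

print(first_matching_catcher("States.Timeout", catchers))
print(first_matching_catcher("States.Permissions", catchers))
```

Ordering matters in this model: if the States.ALL entry came first, the more specific timeout handler would never be reached.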
6. Service Integrations
Step Functions can integrate directly with 200+ AWS services without Lambda.

Direct Integration = No Lambda Needed
Step Functions can call DynamoDB, SQS, SNS, ECS, Glue, SageMaker, and many more services directly. You do NOT need a Lambda function as a "middleman" for simple API calls. This reduces cost, complexity, and latency.
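A direct integration is just a Task state whose Resource names the service action instead of a Lambda function. The sketch below shows one possible DynamoDB write; the table name, attribute names, and input field `$.orderId` are assumptions made up for illustration.

```python
import json

# Task state that writes to DynamoDB directly, with no Lambda in between.
# "Orders", "orderId" and "status" are hypothetical names.
save_order = {
    "Type": "Task",
    "Resource": "arn:aws:states:::dynamodb:putItem",
    "Parameters": {
        "TableName": "Orders",
        "Item": {
            "orderId": {"S.$": "$.orderId"},  # ".$" suffix: value is read from the state input
            "status": {"S": "RECEIVED"},      # literal value
        },
    },
    "End": True,
}

print(json.dumps(save_order, indent=2))
```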
7. Common Step Functions Patterns
Pattern 1: Sequential Processing
- Step A → Step B → Step C (each step runs after the previous)
- Example: Validate order → Process payment → Ship order
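The sequential pattern can be sketched as a small Python helper that links Task states with Next and marks the last one with End. The helper name `chain`, the step names, and the ARN prefix are hypothetical placeholders.

```python
ARN_PREFIX = "arn:aws:lambda:us-east-1:123456789012:function:"  # placeholder account/region


def chain(step_names):
    """Build a sequential state machine: each Task points to the next, the last one ends."""
    states = {}
    for i, name in enumerate(step_names):
        state = {"Type": "Task", "Resource": ARN_PREFIX + name}
        if i + 1 < len(step_names):
            state["Next"] = step_names[i + 1]
        else:
            state["End"] = True
        states[name] = state
    return {"StartAt": step_names[0], "States": states}


order_flow = chain(["ValidateOrder", "ProcessPayment", "ShipOrder"])
print(order_flow["States"]["ValidateOrder"]["Next"])  # ProcessPayment
```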
Pattern 2: Parallel Processing
- Multiple branches run simultaneously
- Example: Send email, update DB, and notify warehouse in parallel
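A Parallel state holds a list of Branches, each of which is a small state machine of its own. The following sketch builds the three-branch example as a Python dict; the branch names and the ARN prefix are hypothetical.

```python
ARN_PREFIX = "arn:aws:lambda:us-east-1:123456789012:function:"  # placeholder


def single_task_branch(name):
    """A branch is itself a mini state machine with its own StartAt and States."""
    return {
        "StartAt": name,
        "States": {name: {"Type": "Task", "Resource": ARN_PREFIX + name, "End": True}},
    }


notify_all = {
    "Type": "Parallel",
    "Branches": [
        single_task_branch("SendEmail"),
        single_task_branch("UpdateDatabase"),
        single_task_branch("NotifyWarehouse"),
    ],
    "End": True,
}

print([b["StartAt"] for b in notify_all["Branches"]])
```

The Parallel state completes only after every branch has finished, and its output is a list with one entry per branch.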
Pattern 3: Map (Fan-Out)
- Process each item in a list concurrently
- Example: Process each line item in a 100-item order (Map state runs 100 iterations)
- Supports max concurrency to limit parallel executions
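A Map state for the line-item example might look roughly like this. The input field `$.lineItems`, the state names, the ARN, and the concurrency value of 10 are all assumptions chosen for illustration.

```python
# Inline Map state: runs the ItemProcessor sub-workflow once per element of $.lineItems.
process_items = {
    "Type": "Map",
    "ItemsPath": "$.lineItems",  # hypothetical array in the state input
    "MaxConcurrency": 10,        # at most 10 iterations in flight at once
    "ItemProcessor": {
        "ProcessorConfig": {"Mode": "INLINE"},
        "StartAt": "ProcessItem",
        "States": {
            "ProcessItem": {
                "Type": "Task",
                "Resource": "arn:aws:lambda:us-east-1:123456789012:function:ProcessItem",
                "End": True,
            }
        },
    },
    "End": True,
}

print(process_items["MaxConcurrency"])
```

With a 100-item order, this configuration would still run 100 iterations in total, just never more than 10 at the same time.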
Pattern 4: Human Approval
- Use .waitForTaskToken to pause the workflow until a human approves
- Send the task token in an SNS or SQS message → human reviews → approver's callback returns the token (SendTaskSuccess or SendTaskFailure) → workflow continues
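The pausing side of the approval pattern can be sketched as a Task state that sends an SQS message and waits for the token to come back. The queue URL, field names, one-day timeout, and the "Approved" target state are hypothetical; `$$.Task.Token` is the context-object path that carries the token.

```python
# Task state that sends a message and then pauses until the token is returned.
request_approval = {
    "Type": "Task",
    "Resource": "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
    "Parameters": {
        "QueueUrl": "https://sqs.us-east-1.amazonaws.com/123456789012/approvals",  # placeholder
        "MessageBody": {
            "taskToken.$": "$$.Task.Token",  # token the approver must send back
            "orderId.$": "$.orderId",        # hypothetical input field
        },
    },
    "TimeoutSeconds": 86400,  # assumption: give up if nobody responds within a day
    "Next": "Approved",
}

print(request_approval["Resource"])
```

Setting a timeout is a design choice worth considering here, since otherwise a forgotten approval keeps the execution waiting.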
Pattern 5: Saga Pattern (Distributed Transactions)
- Coordinate a multi-step transaction across services
- If any step fails, execute compensating transactions to undo previous steps
- Example: Book flight → Book hotel → Book car. If the car fails: cancel hotel, cancel flight.
- Step Functions’ Catch blocks trigger the compensating actions
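One way to wire up the booking example is to give each step a Catch that jumps to the compensation for the step before it, with the compensations chained in reverse order. The sketch below is an illustrative layout, not the only one; all state names, the ARN prefix, and the helpers `task` and `compensation_path` are hypothetical.

```python
ARN_PREFIX = "arn:aws:lambda:us-east-1:123456789012:function:"  # placeholder


def task(name, next_state=None, on_error=None):
    """Build a Task state; on_error adds a catch-all that routes to a compensation."""
    state = {"Type": "Task", "Resource": ARN_PREFIX + name}
    if next_state:
        state["Next"] = next_state
    else:
        state["End"] = True
    if on_error:
        state["Catch"] = [{"ErrorEquals": ["States.ALL"], "Next": on_error}]
    return state


saga = {
    "StartAt": "BookFlight",
    "States": {
        "BookFlight": task("BookFlight", "BookHotel", on_error="BookingFailed"),
        "BookHotel": task("BookHotel", "BookCar", on_error="CancelFlight"),
        "BookCar": task("BookCar", on_error="CancelHotel"),
        # Compensations run in reverse order of the bookings.
        "CancelHotel": task("CancelHotel", "CancelFlight"),
        "CancelFlight": task("CancelFlight", "BookingFailed"),
        "BookingFailed": {"Type": "Fail", "Error": "BookingFailed"},
    },
}


def compensation_path(defn, failed_step):
    """Follow the Catch target of failed_step, then Next links, until a terminal state."""
    path = []
    name = defn["States"][failed_step]["Catch"][0]["Next"]
    while True:
        path.append(name)
        state = defn["States"][name]
        if "Next" not in state:
            return path
        name = state["Next"]


print(compensation_path(saga, "BookCar"))
# ['CancelHotel', 'CancelFlight', 'BookingFailed']
```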
8. Step Functions vs Other Services

Exam Tip
- "Coordinate multiple Lambda functions" = Step Functions
- "Workflow longer than 15 minutes" = Step Functions (Standard = up to 1 year)
- "High-volume short workflow" = Express Workflow
- "Human approval in workflow" = .waitForTaskToken
- "Call DynamoDB without Lambda" = Direct integration
- "Distributed transaction with rollback" = Saga pattern
- Standard = exactly-once, expensive
- Express = at-least-once, cheap, 5-min max