Testing AWS Fault Injection Service (FIS) templates using LocalStack
Learn how to test AWS Fault Injection Service experiments locally using LocalStack with Terraform for provisioning EC2 instances, defining FIS experiment templates to inject faults like stopping and terminating instances, and validating resilience testing outcomes without cloud costs.
Introduction
Testing how your application behaves during failures is an important part of building reliable systems. AWS Fault Injection Service (FIS) helps by letting you run controlled chaos experiments to see how your system responds to disruptions like instances stopping, network issues, or process terminations.
But testing directly in AWS can be slow, expensive, and risky, especially when you’re still building or refining your experiments.
This is where LocalStack can help. It lets you emulate FIS experiments locally, so you can iterate quickly, test safely, and avoid unexpected cloud costs.
In this guide, you’ll learn how to:
- Set up a local environment for FIS testing using LocalStack
- Use Terraform to provision EC2 instances with the right configuration
- Run fault injection experiments to stop and terminate instances
- Verify experiment outcomes by inspecting and validating system behavior
By the end, you’ll have a repeatable way to validate your application’s fault tolerance, entirely on your local machine.
Key Concepts
AWS Fault Injection Service (FIS) allows you to run controlled experiments that introduce faults into your AWS infrastructure to test system resilience. These experiments are defined using a JSON configuration and run using the CreateExperimentTemplate and StartExperiment APIs.
Core Components
An FIS experiment consists of the following components:
- Action: The type of fault to inject. For example, stopping an EC2 instance or sending a Systems Manager command.
- Target: The resources to apply the fault to. Targets are selected based on filters such as tags or instance IDs.
- Duration: The length of time the fault should persist.
These elements together form an Experiment. When the duration expires, FIS automatically stops introducing faults and, where supported, attempts to return the system to a stable state.
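FIS expresses durations as ISO 8601 strings; the template later in this guide uses "PT5M" for its SSM action. As a minimal sketch, the helper below converts such a string into seconds so you can reason about how long a fault will last. The function name `duration_seconds` is hypothetical, and the sketch assumes only the hours/minutes/seconds form is needed.

```python
import re

# Assumption: only the hour/minute/second form (e.g. "PT5M", "PT1H30M") is handled.
_DURATION_PATTERN = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def duration_seconds(value: str) -> int:
    """Convert an ISO 8601 duration such as "PT5M" into a number of seconds."""
    match = _DURATION_PATTERN.match(value)
    # Reject strings that do not match, and a bare "PT" with no components.
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported duration: {value!r}")
    hours, minutes, seconds = (int(part or 0) for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds


print(duration_seconds("PT5M"))  # 300
```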
Types of Actions
FIS actions can be grouped into two categories:
- One-time events: These perform a single API action. For example, aws:ec2:stop-instances and aws:ec2:terminate-instances stop and terminate an EC2 instance, respectively.
- Probabilistic API errors: These inject faults by altering API responses. For example, aws:fis:inject-api-unavailable-error returns HTTP 503 errors for a percentage of API requests.
LocalStack Support
LocalStack currently supports the following FIS actions:
- aws:ec2:stop-instances – Stops the specified EC2 instances.
- aws:ec2:terminate-instances – Terminates the specified EC2 instances.
- aws:rds:reboot-db-instances – Reboots the specified RDS instances.
- aws:ecs:stop-task – Stops the specified ECS task.
- aws:ssm:send-command – Sends a command via Systems Manager to the target EC2 instances.
You can define these actions and their associated targets in an experiment template, then execute the experiment and observe the changes in your infrastructure state.
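Before registering a template, it can help to confirm that every action it uses is one LocalStack emulates. The sketch below hard-codes the action list from this guide (which may change between LocalStack releases) and reports anything outside it. The names `LOCALSTACK_SUPPORTED_ACTIONS` and `unsupported_actions` are hypothetical helpers, not part of any LocalStack tooling.

```python
# Assumption: this mirrors the supported actions listed in this guide and may go stale.
LOCALSTACK_SUPPORTED_ACTIONS = {
    "aws:ec2:stop-instances",
    "aws:ec2:terminate-instances",
    "aws:rds:reboot-db-instances",
    "aws:ecs:stop-task",
    "aws:ssm:send-command",
}


def unsupported_actions(template: dict) -> list[str]:
    """Return the sorted action IDs in a template that are not in the supported set."""
    used = {action["actionId"] for action in template.get("actions", {}).values()}
    return sorted(used - LOCALSTACK_SUPPORTED_ACTIONS)


example = {
    "actions": {
        "StopWebServer": {"actionId": "aws:ec2:stop-instances"},
        "InjectErrors": {"actionId": "aws:fis:inject-api-unavailable-error"},
    }
}
print(unsupported_actions(example))
```

An empty list means every action in the template appears in the list above.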
These are the core components of running fault injection experiments with AWS FIS and LocalStack. Let’s get started with setting up the environment and running our first experiment.
Prerequisites
- localstack CLI with a LocalStack Auth Token
- Terraform & tflocal wrapper for running Terraform against LocalStack
- AWS CLI & awslocal for using AWS CLI commands against LocalStack
Step 1: Create the Terraform configuration
To run fault injection experiments locally, we need a few EC2 instances that FIS can safely target. In this step, we’ll define a Terraform configuration that spins up three EC2 instances, each with different roles and tags, making it easy to run experiments against one or more of them.
1.1: Define local variables
Create a new file named main.tf to define your Terraform configuration. We begin by defining a few local values to reuse throughout the configuration.
locals {
  ami_id = "ami-df5de72bdb3b"

  user_data = <<-EOF
    #!/bin/bash -xeu
    apt update
    apt install python3 -y
    python3 -m http.server 8000
    EOF
}

- ami_id refers to the AMI to use for our local EC2 instances (Ubuntu 22.04 in this case), provided by LocalStack.
- user_data is a startup script that installs Python and runs a basic HTTP server on port 8000.
This emulates a minimal but working application workload on each instance.
1.2: Create a security group
Next, define a security group that allows inbound HTTP traffic to port 8000.
resource "aws_security_group" "web_sg" {
  name        = "web-server-sg"
  description = "Allow HTTP inbound traffic"

  ingress {
    from_port   = 8000
    to_port     = 8000
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

- Ingress rule allows any external system to access the EC2 instance over TCP port 8000.
- Egress rule allows outbound traffic to any destination.
This will make it possible to interact with the running Python server during tests.
1.3: Launch EC2 instances
We define three EC2 instances: one simulating a web server, another as an API server, and a third as a background worker.
Web server
resource "aws_instance" "web_server" {
  ami             = local.ami_id
  instance_type   = "t2.micro"
  security_groups = [aws_security_group.web_sg.name]
  count           = 1
  user_data       = local.user_data

  tags = {
    Name        = "web-server"
    Environment = "production"
    Application = "web-service"
  }
}

API server

resource "aws_instance" "api_server" {
  ami             = local.ami_id
  instance_type   = "t2.micro"
  security_groups = [aws_security_group.web_sg.name]
  count           = 1
  user_data       = local.user_data

  tags = {
    Name        = "api-server"
    Environment = "production"
    Application = "api-service"
  }
}

Background worker

resource "aws_instance" "worker" {
  ami             = local.ami_id
  instance_type   = "t2.micro"
  security_groups = [aws_security_group.web_sg.name]
  count           = 1
  user_data       = local.user_data

  tags = {
    Name        = "worker"
    Environment = "production"
    Application = "background-worker"
  }
}

They all share the same AMI, instance type, and security group. Each instance runs the same Python server from the user_data script. The unique Application and Name tags distinguish their roles in this example.
1.4: Output instance details
To keep track of which resources were created, we output their instance IDs and public IP addresses.
output "web_server_id" {
  value = aws_instance.web_server[0].id
}

output "web_server_ip" {
  value = aws_instance.web_server[0].public_ip
}

output "api_server_id" {
  value = aws_instance.api_server[0].id
}

output "api_server_ip" {
  value = aws_instance.api_server[0].public_ip
}

output "worker_id" {
  value = aws_instance.worker[0].id
}

output "worker_ip" {
  value = aws_instance.worker[0].public_ip
}

These outputs can be used to verify experiment targeting and test connectivity. Use the public IP addresses to access the Python web servers running on port 8000 (e.g., http://<web_server_ip>:8000).
This configuration gives you a reproducible environment with basic EC2 instances that simulate workload diversity and are ready for chaos testing using FIS. Let’s move on to deploying them locally.
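Once the configuration is applied, the outputs can also be read programmatically. Terraform can print them as JSON with `terraform output -json`, and this sketch assumes `tflocal output -json` passes through to the same command. The helper below, with the hypothetical name `instance_ids`, picks out the outputs ending in `_id` and maps each role to its instance ID.

```python
import json


def instance_ids(terraform_output_json: str) -> dict[str, str]:
    """Map each role (e.g. "web_server") to its instance ID from `output -json` text."""
    outputs = json.loads(terraform_output_json)
    return {
        name[: -len("_id")]: entry["value"]
        for name, entry in outputs.items()
        if name.endswith("_id")
    }


# Sample shaped like Terraform's JSON output, using the instance IDs shown in this guide.
sample = json.dumps({
    "web_server_id": {"sensitive": False, "type": "string", "value": "i-5a5f02f6e0f220bc0"},
    "web_server_ip": {"sensitive": False, "type": "string", "value": "172.17.0.8"},
    "worker_id": {"sensitive": False, "type": "string", "value": "i-db1ff4dd7e120bff1"},
})
print(instance_ids(sample))
```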
Step 2: Deploy the Terraform configuration
Now that we’ve defined our infrastructure, it’s time to deploy it using tflocal, which is a wrapper around the Terraform CLI configured to work with LocalStack.
This will create three EC2 instances and a security group in your local environment, ready to be targeted by FIS experiments.
2.1: Start LocalStack
Before applying any Terraform config, make sure LocalStack is running and authenticated:
localstack auth set-token <YOUR_LOCALSTACK_AUTH_TOKEN>
localstack start

This will start LocalStack, with services like EC2, SSM, and FIS lazily loaded on demand.
2.2: Initialize Terraform
Inside your project directory, initialize Terraform using tflocal:
tflocal init

This sets up the backend and downloads the required AWS provider plugins. You should see a message confirming that Terraform has been initialized successfully.
2.3: Apply the configuration
Now deploy the resources:
tflocal apply

Terraform will:
- Show a preview of what it plans to create
- Ask for confirmation
- Create EC2 instances and the security group
- Output their instance IDs and IP addresses
You should see output similar to this:
Apply complete! Resources: 4 added, 0 changed, 0 destroyed.
Outputs:
api_server_id = "i-11982d2235aaa555b"
api_server_ip = "172.17.0.9"
web_server_id = "i-5a5f02f6e0f220bc0"
web_server_ip = "172.17.0.8"
worker_id = "i-db1ff4dd7e120bff1"
worker_ip = "172.17.0.7"

These instance IDs are useful later when targeting specific instances in an experiment.
With this step complete, our local AWS environment is now running three tagged EC2 instances. Next, we’ll define and run our first FIS experiment.
Step 3: Define and run the FIS experiment
Now that our EC2 instances are up and running in LocalStack, we can define a Fault Injection Service (FIS) experiment to test how our system behaves when parts of it fail.
This experiment will:
- Stop the web server instance
- Terminate the API server instance
- Run an SSM SendCommand on the worker instance
3.1: Create the FIS experiment template
Start by saving the following JSON into a file named create-experiment.json.
{
  "description": "Comprehensive FIS experiment for EC2 instances",
  "stopConditions": [
    {
      "source": "none"
    }
  ],
  "roleArn": "arn:aws:iam::000000000000:role/FisExperimentRole",
  "targets": {
    "WebServiceInstance": {
      "resourceType": "aws:ec2:instance",
      "resourceTags": {
        "Application": "web-service"
      },
      "selectionMode": "ALL"
    },
    "ApiServiceInstance": {
      "resourceType": "aws:ec2:instance",
      "resourceTags": {
        "Application": "api-service"
      },
      "selectionMode": "ALL"
    },
    "WorkerInstance": {
      "resourceType": "aws:ec2:instance",
      "resourceTags": {
        "Application": "background-worker"
      },
      "selectionMode": "ALL"
    }
  },
  "actions": {
    "StopWebServer": {
      "actionId": "aws:ec2:stop-instances",
      "targets": {
        "Instances": "WebServiceInstance"
      },
      "description": "Stop web server instance"
    },
    "TerminateApiServer": {
      "actionId": "aws:ec2:terminate-instances",
      "targets": {
        "Instances": "ApiServiceInstance"
      },
      "description": "Terminate API server instance"
    },
    "RunCpuStress": {
      "actionId": "aws:ssm:send-command",
      "parameters": {
        "documentArn": "arn:aws:ssm:us-east-1::document/AWSFIS-Run-CPU-Stress",
        "documentParameters": "{\"DurationSeconds\":\"120\"}",
        "duration": "PT5M"
      },
      "targets": {
        "Instances": "WorkerInstance"
      },
      "description": "Run a CPU stress test on worker instances"
    }
  }
}

This template:
- Targets EC2 instances by tags (Application=...) for each service type.
- Includes actions such as:
  - Stopping the web server instance (aws:ec2:stop-instances)
  - Terminating the API server instance (aws:ec2:terminate-instances)
  - Running a CPU stress test on the worker instance using an SSM command (aws:ssm:send-command)
- Has no stop conditions, so the experiment runs until all actions complete.
- Requires a Role ARN, which is ignored in LocalStack.
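Each action in the template refers to a target by name (for example, StopWebServer points at WebServiceInstance), so a typo in either place leaves the action with nothing to act on. As a small pre-flight sketch, the hypothetical helper `dangling_targets` below lists any action that references a target name missing from the "targets" section. It inspects only the JSON structure and makes no calls to FIS.

```python
def dangling_targets(template: dict) -> list[str]:
    """List "action -> target" pairs whose target name is not defined in the template."""
    defined = set(template.get("targets", {}))
    problems = []
    for action_name, action in template.get("actions", {}).items():
        for target_name in action.get("targets", {}).values():
            if target_name not in defined:
                problems.append(f"{action_name} -> {target_name}")
    return sorted(problems)


# Trimmed-down template with one deliberate typo in a target reference.
template = {
    "targets": {"WebServiceInstance": {}, "ApiServiceInstance": {}},
    "actions": {
        "StopWebServer": {"targets": {"Instances": "WebServiceInstance"}},
        "TerminateApiServer": {"targets": {"Instances": "ApiServiceInstanse"}},
    },
}
print(dangling_targets(template))
```

To check the real file, load it first with `json.load(open("create-experiment.json"))` and pass the resulting dictionary to the helper.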
Let’s register this experiment and run it locally.
3.2: Register the experiment template
Use the following command to register the experiment template with LocalStack:
awslocal fis create-experiment-template \
  --cli-input-json file://create-experiment.json

You should see a response containing the generated template ID in the experimentTemplate.id field. Keep this ID handy, as you’ll need it to start the experiment.

{
  "experimentTemplate": {
    "id": "dd2937cd-4cbc-4592-a7fc-dabfbad8bd0a",
    ...
  }
}

3.3: Start the experiment
Now that the template is registered, you can start the experiment using the template ID:
awslocal fis start-experiment --experiment-template-id dd2937cd-4cbc-4592-a7fc-dabfbad8bd0a

This triggers all actions defined in the template:
- The web server instance is stopped
- The API server instance is terminated
- The worker instance runs a mock CPU stress load using SSM
You should see a response confirming that the experiment is running:
{
  "experiment": {
    "id": "6a701848-ac05-45f5-a653-6c87b77d0de8",
    "state": {
      "status": "running"
    },
    ...
  }
}

To check the status of the running experiment, use the following command:

awslocal fis get-experiment --id 6a701848-ac05-45f5-a653-6c87b77d0de8

You’ll see the current state of the experiment, including the targets and actions being applied.
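Rather than re-running the status command by hand, you could poll until the experiment settles. The sketch below is one way to do that. `wait_for_experiment` and `awslocal_status` are hypothetical helper names, the set of terminal statuses is an assumption you may need to extend, and `awslocal_status` only works with LocalStack running and awslocal on your PATH.

```python
import subprocess
import time
from typing import Callable

# Assumption: these statuses mean the experiment will not change state any further.
TERMINAL_STATUSES = {"completed", "stopped", "failed"}


def awslocal_status(experiment_id: str) -> str:
    """Read experiment.state.status through the awslocal CLI."""
    result = subprocess.run(
        ["awslocal", "fis", "get-experiment", "--id", experiment_id,
         "--query", "experiment.state.status", "--output", "text"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def wait_for_experiment(
    fetch_status: Callable[[], str],
    timeout: float = 300.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status until it returns a terminal status or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"experiment still {status!r} after {timeout} seconds")
        sleep(interval)
```

With LocalStack running, a call might look like `wait_for_experiment(lambda: awslocal_status("6a701848-ac05-45f5-a653-6c87b77d0de8"))`. Passing the status function as a parameter keeps the polling loop testable without a live environment.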
Step 4: Validate the outcome
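After running the FIS experiment, the final step is to verify that the actions were applied correctly to each EC2 instance. You can use the DescribeInstanceStatus API to check the current state of each instance. Note that on AWS itself, this API reports only running instances by default, so you would add --include-all-instances there to see stopped or terminated ones.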
4.1: Check the web server
This instance was targeted by the aws:ec2:stop-instances action.
awslocal ec2 describe-instance-status \
  --instance-ids i-5a5f02f6e0f220bc0 \
  --output json \
  --query 'InstanceStatuses[0].InstanceState'

You should see the state as "stopped" if the experiment ran correctly.

{
  "Code": 80,
  "Name": "stopped"
}

If you try to access the web server at the address reported in the LocalStack logs, you should see a connection error, confirming that the instance is indeed stopped.
4.2: Check the API server
This instance was terminated by the aws:ec2:terminate-instances action.
awslocal ec2 describe-instance-status \
  --instance-ids i-11982d2235aaa555b \
  --output json \
  --query 'InstanceStatuses[0].InstanceState'

You should see the state as "terminated" if the experiment ran correctly.

{
  "Code": 48,
  "Name": "terminated"
}

If you try to access the API server at the address reported in the LocalStack logs, you should see a connection error, confirming that the instance is indeed terminated.
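If you want to turn these manual checks into assertions, the JSON printed by the commands above can be compared against the expected state. The following is a minimal sketch. `state_matches` is a hypothetical helper, and the code table lists the standard EC2 instance state codes.

```python
import json

# Standard EC2 instance state codes and their names.
EC2_STATE_CODES = {
    0: "pending",
    16: "running",
    32: "shutting-down",
    48: "terminated",
    64: "stopping",
    80: "stopped",
}


def state_matches(state_json: str, expected: str) -> bool:
    """Check that an InstanceState object has the expected name and a matching code."""
    state = json.loads(state_json)
    # Only the low byte of the code carries the instance state.
    code_name = EC2_STATE_CODES.get(state["Code"] & 0xFF)
    return state["Name"] == expected and code_name == expected


print(state_matches('{"Code": 80, "Name": "stopped"}', "stopped"))
print(state_matches('{"Code": 48, "Name": "terminated"}', "terminated"))
```

Both calls use the exact outputs shown in this step, so each should report a match.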
Summary
You’ve now completed a full FIS experiment:
- Created EC2 instances with specific tags
- Defined an experiment template
- Ran fault actions like stop and terminate
- Verified the results entirely in LocalStack
This setup gives you a safe, fast way to prototype and validate FIS experiments before running them in a live AWS environment.
However, keep in mind that there are some limitations. LocalStack does not support advanced selection modes like percentage-based targeting, and the roleArn field is ignored during execution.
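Because percentage-based targeting is not supported, it may be worth scanning a template for it before running locally. FIS selection modes take the forms ALL, COUNT(n), and PERCENT(n). The hypothetical helper below simply flags targets whose selectionMode uses the PERCENT form, as a rough guard under that assumption.

```python
def percentage_targets(template: dict) -> list[str]:
    """Return the names of targets whose selectionMode is percentage-based."""
    return sorted(
        name
        for name, target in template.get("targets", {}).items()
        if str(target.get("selectionMode", "")).upper().startswith("PERCENT(")
    )


template = {
    "targets": {
        "WebServiceInstance": {"selectionMode": "ALL"},
        "WorkerInstance": {"selectionMode": "PERCENT(50)"},
    }
}
print(percentage_targets(template))
```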
What to try next?
If you’d like more control over your local chaos experiments, LocalStack offers a Chaos API that allows you to simulate:
- API failures (custom HTTP error codes and messages for any service)
- Network effects like latency
- Probabilistic and customizable fault injection
The Chaos API focuses on declarative effects that impact AWS APIs (e.g., returning errors, adding delays) without invoking actual AWS resource actions. These effects are applied dynamically based on configuration rules and support all AWS services and operations (e.g., S3, Lambda, Kinesis) with no service-specific restrictions.
If you’d like to learn more, check out the Chaos API documentation.