Testing AWS Fault Injection Service (FIS) templates using LocalStack

Introduction

Testing how your application behaves during failures is an important part of building reliable systems. AWS Fault Injection Service (FIS) helps by letting you run controlled chaos experiments to see how your system responds to disruptions like instances stopping, network issues, or process terminations.

But testing directly in AWS can be slow, expensive, and risky, especially when you’re still building or refining your experiments.

This is where LocalStack can help. It lets you emulate FIS experiments locally, so you can iterate quickly, test safely, and avoid unexpected cloud costs.

In this guide, you’ll learn how to:

Set up a local environment for FIS testing using LocalStack
Use Terraform to provision EC2 instances with the right configuration
Run fault injection experiments to stop and terminate instances
Verify experiment outcomes by inspecting and validating system behavior

By the end, you’ll have a repeatable way to validate your application’s fault tolerance, entirely on your local machine.

Key Concepts

AWS Fault Injection Service (FIS) allows you to run controlled experiments that introduce faults into your AWS infrastructure to test system resilience. Each experiment is defined in a JSON experiment template, which you register with the CreateExperimentTemplate API and run with the StartExperiment API.

Core Components

An FIS experiment consists of the following components:

Action: The type of fault to inject. For example, stopping an EC2 instance or sending a Systems Manager command.
Target: The resources to apply the fault to. Targets are selected based on filters such as tags or instance IDs.
Duration: The length of time the fault should persist.

These elements together form an Experiment. When the duration expires, FIS automatically stops introducing faults and, where supported, attempts to return the system to a stable state.
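As a rough illustration of how these components fit together, the following Python sketch assembles a template-shaped dictionary from one target and one action. The helper name build_template and its arguments are hypothetical and exist only for this example; the key names and the placeholder role ARN mirror the template JSON used later in this guide.

```python
def build_template(description, target_name, tags, action_name, action_id, parameters=None):
    """Assemble a minimal FIS-style experiment template as a plain dict.

    build_template is a hypothetical helper for illustration, not part of any SDK.
    """
    action = {
        "actionId": action_id,                   # the fault to inject
        "targets": {"Instances": target_name},   # which named target it applies to
    }
    if parameters:
        # Action-specific settings, e.g. {"duration": "PT5M"} (ISO 8601, five minutes).
        action["parameters"] = dict(parameters)
    return {
        "description": description,
        "stopConditions": [{"source": "none"}],
        "roleArn": "arn:aws:iam::000000000000:role/FisExperimentRole",
        "targets": {
            target_name: {
                "resourceType": "aws:ec2:instance",
                "resourceTags": dict(tags),      # select resources by tag
                "selectionMode": "ALL",
            }
        },
        "actions": {action_name: action},
    }
```

Passing the result through json.dumps would give a document with the same overall shape as the template file created in Step 3.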

Types of Actions

FIS actions can be grouped into two categories:

One-time events: These perform a single API action. For example, aws:ec2:stop-instances stops the targeted EC2 instances, and aws:ec2:terminate-instances terminates them.
Probabilistic API errors: These inject faults by altering API responses. For example, aws:fis:inject-api-unavailable-error returns HTTP 503 errors for a percentage of API requests.

LocalStack Support

LocalStack currently supports the following FIS actions:

aws:ec2:stop-instances – Stops the specified EC2 instances.
aws:ec2:terminate-instances – Terminates the specified EC2 instances.
aws:rds:reboot-db-instances – Reboots the specified RDS instances.
aws:ecs:stop-task – Stops the specified ECS task.
aws:ssm:send-command – Sends a command via Systems Manager to the target EC2 instances.

You can define these actions and their associated targets in an experiment template, then execute the experiment and observe the changes in your infrastructure state.
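Before registering a template, it can be handy to check that it only uses actions from the list above. The sketch below is one possible way to do that; the function name unsupported_actions is hypothetical, and the set of action IDs is copied from this guide, so it may not match every LocalStack release.

```python
# Action IDs listed as supported in this guide (an assumption that may
# drift between LocalStack releases).
SUPPORTED_ACTIONS = frozenset({
    "aws:ec2:stop-instances",
    "aws:ec2:terminate-instances",
    "aws:rds:reboot-db-instances",
    "aws:ecs:stop-task",
    "aws:ssm:send-command",
})


def unsupported_actions(template):
    """Return the sorted action IDs used by a template that are not in SUPPORTED_ACTIONS."""
    used = {action["actionId"] for action in template.get("actions", {}).values()}
    return sorted(used - SUPPORTED_ACTIONS)
```

An empty list means every action in the template appears in the list above.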

These are the core components of running fault injection experiments with AWS FIS and LocalStack. Let’s get started with setting up the environment and running our first experiment.

Prerequisites

localstack CLI with a LocalStack Auth Token
Terraform & tflocal wrapper for running Terraform against LocalStack
AWS CLI & awslocal for using AWS CLI commands against LocalStack

Step 1: Create the Terraform configuration

To run fault injection experiments locally, we need a few EC2 instances that FIS can safely target. In this step, we’ll define a Terraform configuration that spins up three EC2 instances, each with different roles and tags, making it easy to run experiments against one or more of them.

1.1: Define local variables

Create a new file named main.tf to define your Terraform configuration. We begin by defining a few local values to reuse throughout the configuration.

locals {
  ami_id = "ami-df5de72bdb3b"

  user_data = <<-EOF
    #!/bin/bash -xeu
    apt update
    apt install python3 -y
    python3 -m http.server 8000
  EOF
}

ami_id is the LocalStack-provided AMI used for the local EC2 instances (Ubuntu 22.04 in this case).
user_data is a startup script that installs Python and runs a basic HTTP server on port 8000.

This emulates a minimal but working application workload on each instance.

1.2: Create a security group

Next, define a security group that allows inbound HTTP traffic to port 8000.

resource "aws_security_group" "web_sg" {
  name        = "web-server-sg"
  description = "Allow HTTP inbound traffic"

  ingress {
    from_port   = 8000
    to_port     = 8000
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

The ingress rule allows any external system to reach the EC2 instances over TCP port 8000.
The egress rule allows outbound traffic to any destination.

This will make it possible to interact with the running Python server during tests.

1.3: Launch EC2 instances

We define three EC2 instances: one simulating a web server, another as an API server, and a third as a background worker.

Web server

resource "aws_instance" "web_server" {
  ami             = local.ami_id
  instance_type   = "t2.micro"
  security_groups = [aws_security_group.web_sg.name]
  count           = 1
  user_data       = local.user_data

  tags = {
    Name        = "web-server"
    Environment = "production"
    Application = "web-service"
  }
}

API server

resource "aws_instance" "api_server" {
  ami             = local.ami_id
  instance_type   = "t2.micro"
  security_groups = [aws_security_group.web_sg.name]
  count           = 1
  user_data       = local.user_data

  tags = {
    Name        = "api-server"
    Environment = "production"
    Application = "api-service"
  }
}

Background worker

resource "aws_instance" "worker" {
  ami             = local.ami_id
  instance_type   = "t2.micro"
  security_groups = [aws_security_group.web_sg.name]
  count           = 1
  user_data       = local.user_data

  tags = {
    Name        = "worker"
    Environment = "production"
    Application = "background-worker"
  }
}

They all share the same AMI, instance type, and security group. Each instance runs the same Python server from the user_data script. The unique Application and Name tags distinguish their roles in this example.
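To see why distinct Application tags matter, here is a simplified Python model of tag-based selection over the three instances defined above. It assumes that every requested key/value pair must match, and it is only a sketch of the idea, not the selection logic FIS or LocalStack actually runs; the names INSTANCE_TAGS and select_by_tags are hypothetical.

```python
# Tags copied from the three aws_instance resources above, keyed by Name.
INSTANCE_TAGS = {
    "web-server": {"Name": "web-server", "Environment": "production", "Application": "web-service"},
    "api-server": {"Name": "api-server", "Environment": "production", "Application": "api-service"},
    "worker": {"Name": "worker", "Environment": "production", "Application": "background-worker"},
}


def select_by_tags(instances, wanted):
    """Return the sorted names of instances whose tags contain every wanted pair."""
    return sorted(
        name
        for name, tags in instances.items()
        if all(tags.get(key) == value for key, value in wanted.items())
    )
```

Filtering on the Application tag isolates a single instance, whereas filtering on Environment alone would match all three.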

1.4: Output instance details

To keep track of which resources were created, we output their instance IDs and public IP addresses.

output "web_server_id" {
  value = aws_instance.web_server[0].id
}

output "web_server_ip" {
  value = aws_instance.web_server[0].public_ip
}

output "api_server_id" {
  value = aws_instance.api_server[0].id
}

output "api_server_ip" {
  value = aws_instance.api_server[0].public_ip
}

output "worker_id" {
  value = aws_instance.worker[0].id
}

output "worker_ip" {
  value = aws_instance.worker[0].public_ip
}

These outputs can be used to verify experiment targeting and test connectivity. Use the public IP addresses to access the Python web servers running on port 8000 (e.g., http://<web_server_ip>:8000).
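A quick connectivity check can be scripted with the Python standard library. The sketch below only tests whether a TCP connection to a host and port succeeds; the function name port_is_open is hypothetical, and the host would be whichever IP address Terraform reports for your instance.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

Calling port_is_open with a web server IP and port 8000 should return True while the instance is running, and False once the instance has been stopped or terminated.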

This configuration gives you a reproducible environment with basic EC2 instances that simulate workload diversity and are ready for chaos testing using FIS. Let’s move on to deploying them locally.

Step 2: Deploy the Terraform configuration

Now that we’ve defined our infrastructure, it’s time to deploy it using tflocal, which is a wrapper around the Terraform CLI configured to work with LocalStack.

This will create three EC2 instances and a security group in your local environment, ready to be targeted by FIS experiments.

2.1: Start LocalStack

Before applying any Terraform config, make sure LocalStack is running and authenticated:

localstack auth set-token <YOUR_LOCALSTACK_AUTH_TOKEN>
localstack start

This will start LocalStack, with services like EC2, SSM, and FIS lazily loaded on demand.

2.2: Initialize Terraform

Inside your project directory, initialize Terraform using tflocal:

tflocal init

This sets up the backend and downloads the required AWS provider plugins. You should see a message confirming that Terraform has been initialized successfully.

2.3: Apply the configuration

Now deploy the resources:

tflocal apply

Terraform will:

Show a preview of what it plans to create
Ask for confirmation
Create EC2 instances and the security group
Output their instance IDs and IP addresses

You should see output similar to this:

Apply complete! Resources: 4 added, 0 changed, 0 destroyed.

Outputs:

api_server_id = "i-11982d2235aaa555b"
api_server_ip = "172.17.0.9"
web_server_id = "i-5a5f02f6e0f220bc0"
web_server_ip = "172.17.0.8"
worker_id     = "i-db1ff4dd7e120bff1"
worker_ip     = "172.17.0.7"

Note these instance IDs; you’ll use them in Step 4 to check each instance’s state after the experiment.
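If you would rather not copy the IDs by hand, the outputs can be read programmatically. The sketch below assumes the outputs were captured as JSON, for example with tflocal output -json, where each output is an object with a value field; the function name parse_outputs is hypothetical.

```python
import json


def parse_outputs(raw_json):
    """Flatten Terraform's JSON output text into a {output_name: value} dict."""
    return {name: entry["value"] for name, entry in json.loads(raw_json).items()}
```

The resulting dictionary maps names such as web_server_id and api_server_id to the values shown above.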

With this step complete, our local AWS environment is now running three tagged EC2 instances. Next, we’ll define and run our first FIS experiment.

Step 3: Define and run the FIS experiment

Now that our EC2 instances are up and running in LocalStack, we can define a Fault Injection Service (FIS) experiment to test how our system behaves when parts of it fail.

This experiment will:

Stop the web server instance
Terminate the API server instance
Run an SSM SendCommand action on the worker instance

3.1: Create the FIS experiment template

Start by saving the following JSON into a file named create-experiment.json.

{
  "description": "Comprehensive FIS experiment for EC2 instances",
  "stopConditions": [
    {
      "source": "none"
    }
  ],
  "roleArn": "arn:aws:iam::000000000000:role/FisExperimentRole",
  "targets": {
    "WebServiceInstance": {
      "resourceType": "aws:ec2:instance",
      "resourceTags": {
        "Application": "web-service"
      },
      "selectionMode": "ALL"
    },
    "ApiServiceInstance": {
      "resourceType": "aws:ec2:instance",
      "resourceTags": {
        "Application": "api-service"
      },
      "selectionMode": "ALL"
    },
    "WorkerInstance": {
      "resourceType": "aws:ec2:instance",
      "resourceTags": {
        "Application": "background-worker"
      },
      "selectionMode": "ALL"
    }
  },
  "actions": {
    "StopWebServer": {
      "actionId": "aws:ec2:stop-instances",
      "targets": {
        "Instances": "WebServiceInstance"
      },
      "description": "Stop web server instance"
    },
    "TerminateApiServer": {
      "actionId": "aws:ec2:terminate-instances",
      "targets": {
        "Instances": "ApiServiceInstance"
      },
      "description": "Terminate API server instance"
    },
    "RunCpuStress": {
      "actionId": "aws:ssm:send-command",
      "parameters": {
        "documentArn": "arn:aws:ssm:us-east-1::document/AWSFIS-Run-CPU-Stress",
        "documentParameters": "{\"DurationSeconds\":\"120\"}",
        "duration": "PT5M"
      },
      "targets": {
        "Instances": "WorkerInstance"
      },
      "description": "Run a CPU stress test on worker instances"
    }
  }
}

This template:

Targets EC2 instances by tags (Application=...) for each service type.
Includes three actions:
- Stopping the web server instance (aws:ec2:stop-instances)
- Terminating the API server instance (aws:ec2:terminate-instances)
- Running a CPU stress test on the worker instance using an SSM command (aws:ssm:send-command)
Has no stop conditions, so the experiment runs until all actions complete.
Requires a Role ARN, which is ignored in LocalStack.
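Because each action refers to its target by name, a typo in either place is an easy mistake to make. The following sketch checks that every target name used by an action is defined in the targets section; the function name undefined_target_refs is hypothetical, and the check covers only this one consistency rule, not full template validation.

```python
def undefined_target_refs(template):
    """Return (action_name, target_name) pairs whose target is not defined in the template."""
    defined = set(template.get("targets", {}))
    missing = []
    for action_name, action in template.get("actions", {}).items():
        # Each action maps a role such as "Instances" to a named target.
        for target_name in action.get("targets", {}).values():
            if target_name not in defined:
                missing.append((action_name, target_name))
    return missing
```

Loading create-experiment.json with json.load and passing the result to this function should return an empty list for the template above.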

Let’s register this experiment and run it locally.

3.2: Register the experiment template

Use the following command to register the experiment template with LocalStack:

awslocal fis create-experiment-template \
    --cli-input-json file://create-experiment.json

You should see a response containing the generated template ID in the id field of experimentTemplate. Keep this ID safe, as you’ll need it to start the experiment.

{
  "experimentTemplate": {
    "id": "dd2937cd-4cbc-4592-a7fc-dabfbad8bd0a",
    ...
  }
}
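When scripting these steps, the template ID can be pulled out of the response instead of being copied manually. The sketch below assumes the response JSON has been captured as text; both function names are hypothetical, and the argument list simply mirrors the start-experiment command used in the next step.

```python
import json


def template_id_from_response(response_text):
    """Return experimentTemplate.id from a create-experiment-template JSON response."""
    return json.loads(response_text)["experimentTemplate"]["id"]


def start_experiment_argv(template_id):
    """Build the argument list for starting an experiment from a template ID."""
    return ["awslocal", "fis", "start-experiment", "--experiment-template-id", template_id]
```

The list returned by start_experiment_argv could be handed to subprocess.run once LocalStack is running.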

3.3: Start the experiment

Now that the template is registered, you can start the experiment using the template ID:

awslocal fis start-experiment --experiment-template-id dd2937cd-4cbc-4592-a7fc-dabfbad8bd0a

This triggers all actions defined in the template:

The web server instance is stopped
The API server instance is terminated
The worker instance runs a mock CPU stress load using SSM

You should see a response confirming that the experiment is running:

{
  "experiment": {
    "id": "6a701848-ac05-45f5-a653-6c87b77d0de8",
    "state": {
      "status": "running"
    },
    ...
  }
}

To check the status of the running experiment, use the following command:

awslocal fis get-experiment --id 6a701848-ac05-45f5-a653-6c87b77d0de8

You’ll see the current state of the experiment, including the targets and actions being applied.
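Rather than re-running the status command by hand, you could poll until the experiment reaches a final state. The sketch below is one way to structure such a loop; wait_for_experiment and its fetch_status callback are hypothetical names, and the set of terminal statuses reflects the FIS experiment statuses as I understand them, so treat it as an assumption.

```python
import time

# Statuses after which the experiment no longer changes (assumed set).
TERMINAL_STATUSES = frozenset({"completed", "stopped", "failed", "cancelled"})


def wait_for_experiment(fetch_status, timeout=300.0, interval=5.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until it returns a terminal status or the timeout passes.

    fetch_status is any zero-argument callable returning the current status string.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"experiment still {status!r} after {timeout} seconds")
        sleep(interval)
```

In practice, fetch_status could wrap the get-experiment command above and read experiment.state.status from its JSON response. Injecting sleep and clock keeps the loop easy to exercise without real delays.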

Step 4: Validate the outcome

After running the FIS experiment, the final step is to verify that the actions were applied correctly to each EC2 instance. You can use the DescribeInstanceStatus API to check the current state of each instance.

4.1: Check the web server

This instance was targeted by the aws:ec2:stop-instances action.

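awslocal ec2 describe-instance-status \
  --instance-ids i-5a5f02f6e0f220bc0 \
  --include-all-instances \
  --output json \
  --query 'InstanceStatuses[0].InstanceState'

The --include-all-instances flag asks the API to report instances that are not running, which it omits by default on AWS. You should see the state as "stopped" if the experiment ran correctly.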

{
    "Code": 80,
    "Name": "stopped"
}

If you try to access the web server using the address reported in the LocalStack logs, you should see a connection error, confirming that the instance is stopped.

4.2: Check the API server

This instance was terminated by the aws:ec2:terminate-instances action.

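awslocal ec2 describe-instance-status \
  --instance-ids i-11982d2235aaa555b \
  --include-all-instances \
  --output json \
  --query 'InstanceStatuses[0].InstanceState'

With --include-all-instances set, the terminated instance is still reported, and you should see the state as "terminated" if the experiment ran correctly.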

{
    "Code": 48,
    "Name": "terminated"
}

If you try to access the API server using the address reported in the LocalStack logs, you should see a connection error, confirming that the instance is terminated.
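The two checks above can be folded into a single comparison of observed against expected states. The sketch below assumes you have collected each instance's InstanceState object (as printed by the commands in this step) into a dictionary keyed by instance ID; the function name state_mismatches is hypothetical.

```python
def state_mismatches(observed, expected):
    """Compare observed InstanceState dicts with expected state names.

    Returns {instance_id: (expected_name, observed_name)} for every mismatch;
    observed_name is None when no state was reported for that instance.
    """
    problems = {}
    for instance_id, want in expected.items():
        state = observed.get(instance_id)
        got = state["Name"] if state else None
        if got != want:
            problems[instance_id] = (want, got)
    return problems
```

An empty result means every instance ended up in the state the experiment was supposed to produce.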

Summary

You’ve now completed a full FIS experiment:

Created EC2 instances with specific tags
Defined an experiment template
Ran fault actions like stop and terminate
Verified the results entirely in LocalStack

This setup gives you a safe, fast way to prototype and validate FIS experiments before running them in a live AWS environment.

However, keep in mind that there are some limitations. LocalStack does not support advanced selection modes like percentage-based targeting, and the roleArn field is ignored during execution.

What to try next?

If you’d like more control over your local chaos experiments, LocalStack offers a Chaos API that allows you to simulate:

API failures (custom HTTP error codes and messages for any service)
Network effects like latency
Probabilistic and customizable fault injection

The Chaos API focuses on declarative effects that impact AWS APIs (e.g., returning errors, adding delays) without invoking actual AWS resource actions. These effects are applied dynamically based on configuration rules and support all AWS services and operations (e.g., S3, Lambda, Kinesis) with no service-specific restrictions.

If you’d like to learn more, check out the Chaos API documentation.