AWS AutoScaling Group & lifecycle hook

AWS AutoScaling Group lifecycle management

AutoScaling Service

Amazon Web Services AutoScaling is a service that automatically adjusts the capacity of a group of EC2 instances. It is possible to set a fixed number of instances or to set rules so that the group scales according to a metric. An AutoScaling Group (ASG) is a group of instances launched by the AutoScaling service on your behalf, using a Launch Configuration or a Launch Template. A good practice is to use AutoScaling for any instance that you run on AWS: thanks to the integrated autohealing based on system health checks, your instance is automatically recreated if something goes wrong. Without an ASG, if something happens to an instance, you have to recreate it manually or set up the health check yourself. AutoScaling has a lot of advantages: autohealing in case of the sudden loss of an instance, scheduled scaling, scaling based on metrics, aggregated monitoring, easy integration with Elastic Load Balancer, etc.

Lifecycle hooks

The main function of the AutoScaling service is to scale a group of instances up and down. When it scales down because an instance is not needed anymore, it terminates that instance. The termination is not graceful, so if you use the instance for processing, some tasks might still be running and it is possible to lose data. A nice AutoScaling feature that addresses this problem is the lifecycle hook. A hook can be configured to trigger on instance creation or termination. If a hook is set, AutoScaling pauses the creation or termination process and waits for a signal to continue. The idea is that you can let the instance finish its tasks, then send the signal to AutoScaling, and the instance will be deleted without any data loss. Neat. But Amazon does not provide anything out of the box on an instance to detect that a hook is set or to send a signal afterward. However, the AWS CLI and SDKs provide such a mechanism. So I made my own solution, leveraging the Boto3 library, to handle the lifecycle hook directly on the targeted EC2 instance.

Lifecycle hook manager

The idea is very simple: a Python script runs on a regular basis thanks to a cron job and checks whether a hook has been set by AutoScaling on the instance. If so, it runs a given command, then sends the signal to the AutoScaling Group to proceed. I called the script the lifecycle hook handler.
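To make the flow more concrete, here is a minimal sketch of the check, run, then signal sequence. It is not the actual script from the repository: the function names and the WAIT_STATES table are hypothetical, and `client` is assumed to be a Boto3 AutoScaling client (or any object exposing the same two methods).

```python
import subprocess

# Lifecycle states in which AutoScaling is paused and waiting for a signal.
WAIT_STATES = {
    "Pending:Wait": "autoscaling:EC2_INSTANCE_LAUNCHING",
    "Terminating:Wait": "autoscaling:EC2_INSTANCE_TERMINATING",
}


def pending_transition(client, instance_id):
    """Return (asg_name, transition) if the instance waits on a hook, else None."""
    response = client.describe_auto_scaling_instances(InstanceIds=[instance_id])
    for instance in response.get("AutoScalingInstances", []):
        transition = WAIT_STATES.get(instance["LifecycleState"])
        if transition:
            return instance["AutoScalingGroupName"], transition
    return None


def handle_hook(client, instance_id, hook_name, command):
    """Run `command` and signal AutoScaling if a hook is pending.

    Returns True when a hook was handled, False when there was nothing to do.
    """
    pending = pending_transition(client, instance_id)
    if pending is None:
        return False  # no hook set, the next cron run will check again
    asg_name, _transition = pending
    # The command's exit code is ignored in this sketch.
    subprocess.run(command, shell=True, check=False)
    # Tell AutoScaling that it can resume the paused transition.
    client.complete_lifecycle_action(
        LifecycleHookName=hook_name,
        AutoScalingGroupName=asg_name,
        InstanceId=instance_id,
        LifecycleActionResult="CONTINUE",
    )
    return True
```

On a real instance, the client would typically be created with boto3.client("autoscaling", region_name=...), and the instance ID would come from the instance metadata.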

The code can be found on GitHub.

Usage on an EC2 instance

In this example, the lifecycle hook manager just sends a message to all ttys, but it can run any shell command. The example requires an AWS account with a VPC set up and an AutoScaling Group with one running instance reachable over SSH.

The instance needs to have an IAM role with at least the following policy attached:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "autoscaling:CompleteLifecycleAction",
                "autoscaling:DescribeAutoScalingInstances",
                "autoscaling:DescribeLifecycleHooks",
                "autoscaling:RecordLifecycleActionHeartbeat"
            ],
            "Resource": "*",
            "Effect": "Allow"
        }
    ]
}

The following command will add a lifecycle hook on termination to the existing ASG:

aws autoscaling put-lifecycle-hook --lifecycle-hook-name test-hook --auto-scaling-group-name <asg_name> --lifecycle-transition autoscaling:EC2_INSTANCE_TERMINATING

You can check that the lifecycle hook has been created with:

aws autoscaling describe-lifecycle-hooks --auto-scaling-group-name <asg_name>
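If you prefer to script this step, the same two calls exist in Boto3. The sketch below is only an illustration: the function name ensure_termination_hook is hypothetical, and `client` is assumed to be a Boto3 AutoScaling client whose credentials are allowed to call autoscaling:PutLifecycleHook, which is not part of the instance policy shown above.

```python
def ensure_termination_hook(client, asg_name, hook_name="test-hook"):
    """Create (or update) a termination hook and list the hooks on the group."""
    # Same effect as `aws autoscaling put-lifecycle-hook`.
    client.put_lifecycle_hook(
        LifecycleHookName=hook_name,
        AutoScalingGroupName=asg_name,
        LifecycleTransition="autoscaling:EC2_INSTANCE_TERMINATING",
    )
    # Same effect as `aws autoscaling describe-lifecycle-hooks`.
    response = client.describe_lifecycle_hooks(AutoScalingGroupName=asg_name)
    return [hook["LifecycleHookName"] for hook in response["LifecycleHooks"]]
```

The returned list should contain test-hook once the hook has been created.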

Then you can use SSH to connect to the EC2 instance, get the hook manager script there and run the following command:

./lifecycle_hook_handler.py -c 'wall "Instance is being terminated"'

The -c parameter is used to specify the command that should be run if the hook is set. Nothing should happen because the lifecycle hook is not set yet.

You can then trigger the termination of the instance, but don’t terminate it manually from the AWS Console, otherwise it will be terminated without the hook being triggered. Instead, change the desired capacity of the AutoScaling Group from 1 to 0.
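The capacity change can also be done through Boto3. This is a small sketch, with the hypothetical function name scale_in_to_zero and `client` assumed to be a Boto3 AutoScaling client. Note that AutoScaling rejects a desired capacity lower than the minimum size of the group, so the sketch assumes the minimum size is already 0.

```python
def scale_in_to_zero(client, asg_name):
    """Ask AutoScaling to terminate the instances of the group by itself."""
    # Assumption: the group's minimum size is 0, otherwise this call fails.
    client.set_desired_capacity(
        AutoScalingGroupName=asg_name,
        DesiredCapacity=0,
        HonorCooldown=False,  # do not wait for a cooldown period to end
    )
```

Because AutoScaling itself performs the termination here, the lifecycle hook is triggered.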

You can then run the hook manager script again:

./lifecycle_hook_handler.py -c 'sleep 30 && wall "Instance is being terminated"'

This time, the hook should be set, so the command should run and the signal should be sent back to the AutoScaling Group. While the command is running, the manager sends a heartbeat every 10 seconds by default; otherwise AutoScaling would time out the hook. The script itself times out after 5 minutes by default; you can set this value with the -s parameter. The AutoScaling Group automatically times out the lifecycle hook after 48 hours, so the value should be lower than that (see the global timeout in the output of the describe-lifecycle-hooks command). After a timeout, the default action set when you created the lifecycle hook is executed: it either aborts or continues the termination or the creation. You can use the -a parameter to set the value.
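The heartbeat and timeout behaviour described above can be sketched roughly as follows. This is an illustration under assumptions, not the code of the script: the function name run_with_heartbeat and its parameters are hypothetical, the defaults simply mirror the 10 seconds and 5 minutes mentioned above, and `client` is assumed to be a Boto3 AutoScaling client.

```python
import subprocess
import time


def run_with_heartbeat(client, asg_name, hook_name, instance_id, command,
                       heartbeat_interval=10, timeout=300):
    """Run `command` and send heartbeats to AutoScaling while it is running.

    Returns True if the command finished before `timeout` seconds, else False.
    """
    process = subprocess.Popen(command, shell=True)
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Wait at most one heartbeat interval for the command to finish.
            process.wait(timeout=heartbeat_interval)
            return True
        except subprocess.TimeoutExpired:
            pass  # still running
        if time.monotonic() >= deadline:
            # Kills the shell; grandchild processes may outlive it.
            process.kill()
            process.wait()
            return False
        # Tell AutoScaling that the hook is still being processed.
        client.record_lifecycle_action_heartbeat(
            LifecycleHookName=hook_name,
            AutoScalingGroupName=asg_name,
            InstanceId=instance_id,
        )
```

After this function returns, the caller would complete the lifecycle action, or let the default action of the hook apply if the command timed out.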

You can use the -l and -g parameters to set respectively the lifecycle hook name and the AutoScaling Group name, which can be useful if you don’t want to run the manager on the instance itself. If you don’t specify those parameters, the manager will detect them automatically. The -r parameter can be used to specify the region where the instance is located.
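One possible way to implement such an auto-detection with Boto3 is sketched below. It is an assumption about how this can be done, not a description of the script's internals: the function name detect_hook is hypothetical, `client` is assumed to be a Boto3 AutoScaling client, and the instance ID is assumed to be known already (for example from the instance metadata).

```python
def detect_hook(client, instance_id,
                transition="autoscaling:EC2_INSTANCE_TERMINATING"):
    """Return (asg_name, hook_name) for the instance, or None if nothing matches."""
    response = client.describe_auto_scaling_instances(InstanceIds=[instance_id])
    instances = response["AutoScalingInstances"]
    if not instances:
        return None  # the instance does not belong to an AutoScaling Group
    asg_name = instances[0]["AutoScalingGroupName"]
    hooks = client.describe_lifecycle_hooks(AutoScalingGroupName=asg_name)
    for hook in hooks["LifecycleHooks"]:
        # Keep the first hook registered for the requested transition.
        if hook["LifecycleTransition"] == transition:
            return asg_name, hook["LifecycleHookName"]
    return None
```

If several hooks are registered for the same transition, this sketch only returns the first one, which is a case where naming the hook explicitly is safer.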

In the repository, there is an example cron job that reproduces the same scenario: the job checks the hook status every 5 minutes and runs the same command if a hook is set.

Since the code is open source, you can freely use this manager to handle the lifecycle hooks on your EC2 instances. Don’t hesitate to report a problem or propose an improvement with a GitHub issue, or even open a pull request if you want to contribute.