AWS DevOps Professional: CFN Signal, WaitCondition, and CreationPolicy
CREATE_COMPLETE is not the full story. Resources in CloudFormation with complex bootstrapping require a more bespoke solution.
Typical CloudFormation provisioning is as follows:
- Logical resources are outlined in a CFN template
- The template is uploaded to CFN to create a stack
- The stack is used to create/update/delete physical resources in AWS using the configurations in the template
Imagine a template that dictates the creation of an EC2 instance.
When the physical resource has been created, CFN marks it as complete and reports this in the Events tab.
All is well... right?
Well... maybe...? It's hard to tell. All we know is that there's an instance. CFN has no access to any other information beyond "it's there".
If there's a more complicated provisioning process - perhaps you're bootstrapping a public web server on the instance - CFN has no way of knowing if any further actions have been completed successfully.
The EC2 instance will be in the CREATE_COMPLETE state long before the bootstrapping finishes. Even if the bootstrapping fails, the resource will still show a successful creation.
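To make the problem concrete, here is a minimal sketch of such a template. The `ImageId` parameter, the `WebServer` logical name, the instance type, and the `httpd` install commands are illustrative assumptions (an Amazon Linux style AMI is assumed), not part of any particular course material.

```yaml
# Sketch only: an instance whose bootstrap CloudFormation cannot observe.
Parameters:
  ImageId:
    Type: AWS::EC2::Image::Id   # assumed: an Amazon Linux AMI in your region
Resources:
  WebServer:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: !Ref ImageId
      InstanceType: t3.micro
      UserData:
        Fn::Base64: |
          #!/bin/bash
          # Runs inside the instance after launch; CFN never sees the result.
          yum install -y httpd
          systemctl enable --now httpd
```

With a template like this, `WebServer` is reported as CREATE_COMPLETE whether or not the web server install ever succeeds.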
Here are a few ways we can make up for this limitation:
CloudFormation Signals
You can tell CloudFormation to wait for a certain number of success signals.
You can also configure a timeout that tells CFN how long to wait for those signals, in hours, minutes, and seconds (maximum 12 hours). If the timeout is reached before the required signals arrive, the resource fails and the stack operation fails with it.
This makes the resource wait: it won't shift to the CREATE_COMPLETE state until it has received the success signal(s).
A failure signal can also be sent, in which case the resource creation fails.
CreationPolicy
Typically used for EC2 instances or Auto Scaling groups, the CreationPolicy attribute lets you specify the number of success signals required and the timeframe in which to receive them.
  CreationPolicy:
    ResourceSignal:
      Count: '5'
      Timeout: PT1H20M45S
Count: Number of success signals required to set the resource to CREATE_COMPLETE (1 by default).
Timeout Format: PT#H#M#S, so the timeout in the above example expires after 1 hour, 20 minutes, and 45 seconds.
If you're like me then the PT part is probably bugging you. Just know that the value for Timeout must be given in the ISO 8601 duration format.
P marks the value as a period (a duration) and T introduces its time components (hours, minutes, seconds); that's all we need to know.
Check out the official AWS documentation on the CreationPolicy attribute for more.
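As a rough sketch of how the pieces fit together, the template below attaches a CreationPolicy to an instance and has the bootstrap script report back with the `cfn-signal` helper. It assumes an Amazon Linux AMI where the CloudFormation helper scripts are installed under `/opt/aws/bin`; the `ImageId` parameter, the `WebServer` logical name, the 15 minute timeout, and the `httpd` commands are illustrative choices.

```yaml
# Sketch only: WebServer stays in CREATE_IN_PROGRESS until it is signalled.
Parameters:
  ImageId:
    Type: AWS::EC2::Image::Id   # assumed: Amazon Linux with cfn helper scripts
Resources:
  WebServer:
    Type: AWS::EC2::Instance
    CreationPolicy:
      ResourceSignal:
        Count: 1          # one success signal is enough
        Timeout: PT15M    # ISO 8601 duration: 15 minutes
    Properties:
      ImageId: !Ref ImageId
      InstanceType: t3.micro
      UserData:
        Fn::Base64: !Sub |
          #!/bin/bash -x
          yum install -y httpd
          systemctl enable --now httpd
          # Report the exit code of the previous command:
          # 0 is sent as a success signal, anything else as a failure signal.
          /opt/aws/bin/cfn-signal -e $? --stack ${AWS::StackName} --resource WebServer --region ${AWS::Region}
```

Note that `$?` only reflects the last command before `cfn-signal`; a longer bootstrap script would need its own error handling to decide which exit code to report.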
WaitConditions
A WaitCondition is a specific logical resource.
It can depend on other resources; other resources can depend on it.
Like other resources, it must reach the CREATE_COMPLETE state for the stack to complete. However, this resource can gate-keep completion until it either reaches a timeout or receives the required signals through its "WaitHandle" (a WaitConditionHandle resource).
The WaitHandle generates a pre-signed URL for resource signals. "Pre-signed" means the authorization is already embedded in the URL, so whoever sends the signal doesn't need any AWS credentials.
A separate event might need to take place before the resource is fully provisioned.
This event can generate a JSON document with data that is passed back in the signal response.
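A minimal sketch of a WaitCondition setup might look like the following. The logical names (`BootstrapHandle`, `BootstrapWait`, `WebServer`), the `ImageId` parameter, and the JSON field values are hypothetical; the JSON keys (`Status`, `Reason`, `UniqueId`, `Data`) are the ones a wait condition signal expects. Unlike CreationPolicy, the WaitCondition `Timeout` is given in seconds.

```yaml
# Sketch only: the stack waits on BootstrapWait until the instance signals.
Parameters:
  ImageId:
    Type: AWS::EC2::Image::Id   # assumed: a Linux AMI with curl available
Resources:
  BootstrapHandle:
    Type: AWS::CloudFormation::WaitConditionHandle
  BootstrapWait:
    Type: AWS::CloudFormation::WaitCondition
    DependsOn: WebServer        # start waiting once the instance exists
    Properties:
      Handle: !Ref BootstrapHandle
      Count: 1
      Timeout: '900'            # seconds (15 minutes)
  WebServer:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: !Ref ImageId
      InstanceType: t3.micro
      UserData:
        Fn::Base64: !Sub |
          #!/bin/bash
          set -e   # stop on the first failure, so no success signal is sent
          yum install -y httpd
          systemctl enable --now httpd
          # PUT the JSON signal to the pre-signed URL (the Ref of the handle).
          curl -X PUT -H 'Content-Type:' \
            --data-binary '{"Status":"SUCCESS","Reason":"Bootstrap finished","UniqueId":"web-1","Data":"httpd is running"}' \
            "${BootstrapHandle}"
Outputs:
  BootstrapData:
    # The data passed back with the signal(s), keyed by UniqueId.
    Value: !GetAtt BootstrapWait.Data
```

In this sketch a failed bootstrap sends nothing, so the wait condition fails on timeout; sending `"Status":"FAILURE"` instead would fail it without waiting.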