TL;DR - I think you’re probably best off with Terraform

Preface

These are highly opinionated pieces. While they are based on many years of professional experience in the industry they don’t take into account individual nuance. Posts are mostly going to focus around AWS tooling as that’s where most of my experience is. What better way to kick off than with CloudFormation.

CloudFormation

You might hear some very opinionated (wait, this blog post is opinionated…) people claim that CloudFormation sucks. And I kind of agree. However if used in a very specific way it can be pretty useful.

CloudFormation originally only supported JSON templates. I believe the intention AWS had was that CloudFormation would only ever be written by tools (hence the later release of CDK), however that kind of sucked and people only ever hand crafted templates, leading to CloudFormation eventually supporting YAML.

This concept of machines only ever managing CloudFormation is important however. To understand why we have to understand how CloudFormation works. When a template is deployed CloudFormation spins up the resources, and saves the state and configuration of each resource. This allows CloudFormation to perform its infamous rollback process. It also allows CloudFormation to know what steps to take to perform updates. Now if users have been meddling with resources this all falls apart. Subtle changes in resources can break CloudFormation changes and even worse CloudFormation can rollback a failed release to what was in it’s own state rather than what was running prior to a stack update!

The ability to import resources into CloudFormation is new(ish) and support is very limited. If you have an existing stack - good luck.

This all sounds horrible, but if you think about what CloudFormation is trying to achieve you can make it work in your favour.

My CloudFormation rules:

This sounds like a lot of work - whats the pay off?

Terraform

I haven’t used OpenTofu in anger yet. It’s likely that everything I say here probably applies to OpenTofu as well and it’s worth while checking out - I just don’t have experience with it yet.

I’m going write a whole post on using Terraform so I will keep this a little bit more brief. Terraform has a lot of pitfalls and issues (can we please not store secrets in the statefile and have dynamic state backend config kthnkz) but the advantage is that it much easier for engineers to manage in an ad-hoc manner.

Unlike our CloudFormation scenario above we have:

It does come with its own drawbacks though:

If you have an environment with existing resources or engineers that are likely poke things manually - terraform is a good choice.

HCL language, while not perfect can lead to good grepability. You can still fall down a rabbit hole of module mess (this will be covered in my other post) but if you try to keep your terraform flat then managing bulk changes can be ok.

Pulumi, CDK, CDK for terraform, other code generators

Unless you have a very good reason, just don’t. They seem like a good idea. Your software engineers will think its a good idea. However from experience every single implementation I’ve seen or worked with has turned into a mess. I think they can work however they have too many footguns to remain manageable.

Software engineers will often abstract resources to the point that making changes is fragile, complex and time consuming. Finding out how a resource is created and configured requires following chains of abstractions (hope your IDE is configured correctly).

After all of this you still end up with a template. Which means you still need to understand how to debug the template + you have to map the generated template back to the code. So not only do you need to know the tool / code abstractions you still end up with all the pain of managing the output of templates anyway.

The other problem is grepability. When operating large infrastructure deployments there’s some tasks when abstractions like this become frustrating. Some examples:

When using code generators it can be sometimes hard to perform these tasks. Sometimes easier. But often harder as everyone will have created their own abstraction for their resources. Compared to a DSL (like Terraform’s HCL or the YAML for CloudFormation) we can search all the entire company’s repos for something like aws_s3_bucket and possibly even script changes to the resource. Using a tool like multi-gitter suddenly means that its possible to update resources to a new company policy or standard quickly.