6 Tips for Maturing Your Approach to Infrastructure-as-Code

Infrastructure-as-Code (IaC) has revolutionized cloud infrastructure management. It makes it easier than ever to automate deployment and configuration, to track changes in version control and deploy environments idempotently (running the same code twice yields the same infrastructure), and to scale applications and systems holistically, meaning everything from network components to compute resources, databases, and so on. Anyone who has worked in a technology shop with a mature approach to IaC understands the night-and-day difference between that and purely manual cloud management.

The rub, of course, is actually reaching a mature approach to IaC. Rome wasn’t built in a day, and neither is efficient, functional IaC that solves more problems than it creates. What follows are 6 tips from my own experience building an IaC practice that is both efficient and functional in real use, offered as paired DOs and DO NOTs:

*Note: I will lean toward language and terms from Terraform, the cross-cloud market leader in IaC, but these tips apply in principle to any IaC tool.

CODE MANAGEMENT

DO NOT write gigantic monolithic code that throws all resources and environments into a single template. People and teams using IaC for the first time are often tempted to simply translate their existing infra into a single IaC template. It seems to make sense at first, like taking a story passed down orally and writing it on paper for the first time. But if you’re hosting anything more complex than a single small app environment, this is the wrong approach. Managing unrelated resources in a single template leads to problems like a configuration issue on a resource for App X blocking deployment of a change to a resource for App Y. Terraform, for example, validates and plans the entire configuration together, so a configuration error anywhere in the code stops the whole run. Why should App Y languish, waiting on what could be critical updates, simply because something unrelated is misconfigured elsewhere? And this is just one example. The monolithic approach introduces a LOT of critical risks and inefficiencies, such as accidentally modifying infra for the wrong service or the wrong environment.

DO design your IaC to be modular and focused on specific use cases. One recent example I worked on is a set of Terraform templates that deploy connectivity to backend APIs by orchestrating changes across the cloud provider, firewall appliance, front door/traffic management proxy, and DNS, all focused on delivering a single specific service. In this case, each deployment has its own template and repo, and within each repo I use GitHub Environments to deploy distinct dev, stage, and prod environments. We want infra that belongs to the same use case managed together, and nothing else. Not only does this keep our infra from getting hung up by problems in infra for unrelated services, it also all but eliminates the possibility of accidentally modifying infra belonging to another service (as long as all configuration management is done through IaC).
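To make the isolation argument concrete, here is a toy Python model (not Terraform itself) contrasting one monolithic template with per-use-case stacks. It assumes, as a simplification, that any invalid resource inside a unit of deployment blocks that entire unit; `Stack`, `deploy_monolith`, and `deploy_modular` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Stack:
    """Hypothetical model of one deployable unit (one template/repo)."""
    name: str
    resources: dict[str, bool]  # resource name -> is its configuration valid?


def invalid_resources(resources: dict[str, bool]) -> list[str]:
    """Return the names of resources whose configuration is invalid."""
    return [name for name, ok in resources.items() if not ok]


def deploy_monolith(stacks: list[Stack]) -> dict[str, str]:
    # One template holds everything, so a single invalid resource
    # anywhere blocks the deployment for every app in it.
    merged = {f"{s.name}.{r}": ok for s in stacks for r, ok in s.resources.items()}
    status = "blocked" if invalid_resources(merged) else "deployed"
    return {s.name: status for s in stacks}


def deploy_modular(stacks: list[Stack]) -> dict[str, str]:
    # Each use case is validated and deployed independently,
    # so a broken App X no longer holds App Y hostage.
    return {
        s.name: "blocked" if invalid_resources(s.resources) else "deployed"
        for s in stacks
    }


if __name__ == "__main__":
    stacks = [
        Stack("app_x", {"firewall_rule": False, "dns_record": True}),
        Stack("app_y", {"api_route": True}),
    ]
    print(deploy_monolith(stacks))  # {'app_x': 'blocked', 'app_y': 'blocked'}
    print(deploy_modular(stacks))   # {'app_x': 'blocked', 'app_y': 'deployed'}
```

The only difference between the two functions is the unit of validation, which is exactly the point: splitting by use case shrinks how much one mistake can stall.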

ENVIRONMENT MANAGEMENT

DO NOT use completely different code bases for each environment of a specific app or use case. This may sound a bit contradictory to the Code Management point above, where I said IaC templates should be distinct and modular. “So am I using the same code or not? Make up your mind!” The risk here is environments that are no longer reasonably identical. Let’s say you use IaC to deploy infra for App Y with development, staging, and production environments, and you host each environment in a separate repo, with its cloud resources defined in templates that live only in that repo: Dev-AppY, Stg-AppY, and so on. A few weeks after the production deployment, some balloonhead goes into your Prod repo and deploys changes to critical infrastructure, but not to the lower environments. Congratulations, your app environments no longer match, which undermines your testing. How can we be sure an app that works in Dev will work in Prod if the environments are not reasonably identical? The bottom line is that nothing really prevents this: the code bases for each environment can drift apart, producing different infrastructure, and you might never know until you have a critical outage on your hands.
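One rough way to catch this kind of divergence, sketched in Python under the assumption that you can capture `terraform state list` output (one resource address per line) from each environment's state, is to compare the resource addresses each environment tracks. This only detects resources present in one environment but missing from another, not differing attribute values; the function names and sample addresses are illustrative.

```python
def parse_state_list(output: str) -> set[str]:
    """Turn `terraform state list` output (one resource address per line) into a set."""
    return {line.strip() for line in output.splitlines() if line.strip()}


def find_missing_resources(env_outputs: dict[str, str]) -> dict[str, set[str]]:
    """Map each environment to the addresses other environments have but it lacks.

    Only environments with gaps are included; an empty result means every
    environment tracks the same set of resource addresses.
    """
    per_env = {env: parse_state_list(out) for env, out in env_outputs.items()}
    all_addresses = set().union(*per_env.values())
    return {
        env: all_addresses - addrs
        for env, addrs in per_env.items()
        if all_addresses - addrs
    }


if __name__ == "__main__":
    dev = "module.api.aws_lb.this\nmodule.api.aws_route53_record.api\n"
    # Someone added a resource directly in the Prod repo only.
    prod = dev + "module.api.aws_wafv2_web_acl.hotfix\n"
    print(find_missing_resources({"dev": dev, "stage": dev, "prod": prod}))
```

A check like this is a safety net for separate code bases; it does not remove the underlying cause, which is that nothing forces the repos to stay in sync.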

DO use a single code base that deploys reasonably identical infrastructure across all environments. When managing environments for an application or service in the cloud, we want each environment to be reasonably identical. I say “reasonably” because there are often legitimate reasons for slight variation (instance sizes, for example), but environments need to be identical in the ways that matter most. We want a Dev environment that tells us what to expect in Prod when a service is deployed, and we want the environments managed in a way that makes critical differences between stages highly unlikely to arise. There are a few ways to do this, but a common method is to parameterize the code with per-environment input variables, such as .tfvars files in Terraform. You can design your infra code to accept all environment-specific inputs as variables, keep one tfvars file per environment, and deploy each environment by applying its tfvars file on top of the same code base (with a separate state per environment). This keeps every stage identical (barring manual changes in the console) and keeps it that way.
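As a minimal sketch of the “same code, different tfvars” workflow, the Python helper below builds the Terraform commands for each environment. The `envs/` directory layout, the file names, and the helper itself are assumptions for illustration; the flags it passes (`-backend-config`, `-var-file`, `-out`, `-input=false`, `-reconfigure`) are standard Terraform CLI options. Note that the only thing changing between environments is which files get passed in.

```python
import shlex

# Hypothetical layout: one shared code base, plus per-environment files
#   envs/<env>.tfvars        -> input variables for that environment
#   envs/<env>.backend.hcl   -> partial backend config (separate state per env)
ENVIRONMENTS = ("dev", "stage", "prod")


def plan_commands(env: str, var_dir: str = "envs") -> list[list[str]]:
    """Build the Terraform commands that plan one environment from the shared code."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env!r}")
    return [
        # Point the same code at this environment's state.
        ["terraform", "init", "-input=false", "-reconfigure",
         f"-backend-config={var_dir}/{env}.backend.hcl"],
        # Layer this environment's variables on top of the same code.
        ["terraform", "plan", "-input=false",
         f"-var-file={var_dir}/{env}.tfvars", f"-out={env}.tfplan"],
    ]


if __name__ == "__main__":
    # Print the commands; a pipeline would run each with subprocess.run(cmd, check=True).
    for env in ENVIRONMENTS:
        for cmd in plan_commands(env):
            print(shlex.join(cmd))
```

Rejecting unknown environment names is a small guard against typos quietly creating a brand-new, unintended environment.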

DEPLOYMENT AND AUTOMATION

DO NOT execute IaC manually from your local machine with local state files for anything beyond sandbox-level testing. Focusing on Terraform for ease of explanation: do not deploy your IaC by running “terraform apply” from your own terminal with the tfstate file on your local storage. First, once infrastructure is deployed, your state file holds the keys to the kingdom. Lose it, and you’ve lost the ability to keep managing that infra with Terraform. What if your hard drive conks out, or your VDI is decommissioned? The state file is also highly likely to contain sensitive values in plaintext, and it shouldn’t just sit on your computer for any desktop-hopper to see. Furthermore, running Terraform locally through manual commands takes pipeline automation out of the equation entirely. I can assure you, your competition is out there somewhere learning how to use automated deployments to be more productive and thorough in managing infrastructure. Local, manual execution is probably the least productive way to handle IaC.
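To see why a state file lying around on a laptop is a liability, here is a heuristic Python scanner. It assumes the JSON layout of recent Terraform state files (a top-level `resources` list whose entries hold `instances` with an `attributes` map, plus an `outputs` map with a `sensitive` flag) and flags attribute names that merely look like credentials; the pattern list is a guess, not an exhaustive rule. Values marked sensitive in Terraform are hidden from CLI output but still written to state in plaintext, which is why the file itself needs protecting.

```python
import json
import re

# Heuristic: attribute names that often hold credentials (an assumption, not exhaustive).
SUSPICIOUS_KEY = re.compile(r"password|secret|token|private_key|access_key", re.IGNORECASE)


def _leaves(value, path, key=""):
    """Yield (path, key, leaf) for every scalar inside nested dicts/lists."""
    if isinstance(value, dict):
        for k, v in value.items():
            yield from _leaves(v, f"{path}.{k}", k)
    elif isinstance(value, list):
        for i, v in enumerate(value):
            yield from _leaves(v, f"{path}[{i}]", key)
    else:
        yield path, key, value


def find_plaintext_secrets(state_text: str) -> list[str]:
    """Return addresses of values in a state file that look like plaintext secrets."""
    state = json.loads(state_text)
    hits = []
    for res in state.get("resources", []):
        addr = f"{res['type']}.{res['name']}"
        if res.get("mode") == "data":
            addr = f"data.{addr}"
        if res.get("module"):
            addr = f"{res['module']}.{addr}"
        for inst in res.get("instances", []):
            for path, key, leaf in _leaves(inst.get("attributes", {}), addr):
                if isinstance(leaf, str) and leaf and SUSPICIOUS_KEY.search(key):
                    hits.append(path)
    # Outputs marked sensitive are hidden in CLI output but still stored in state.
    for name, out in state.get("outputs", {}).items():
        if out.get("sensitive"):
            hits.append(f"output.{name}")
    return hits


if __name__ == "__main__":
    with open("terraform.tfstate") as f:  # a local state file, the thing to avoid
        for hit in find_plaintext_secrets(f.read()):
            print("possible plaintext secret:", hit)
```

Running something like this against a local state file tends to make the case for encrypted, access-controlled remote state better than any argument.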

DO use automated deployment pipelines with remote state management. First, the state file: store it in something like a cloud storage bucket (make sure versioning is enabled!) or a platform like HashiCorp Cloud Platform (HCP Terraform), where the file can be encrypted and access restricted. As Gandalf the Grey might say, “Keep it secret; keep it safe.” Second, you want to wring as much value as possible out of automated deployment pipelines, and tools like Jenkins and GitHub Actions let you do that. For example, consider the use case I described above for deploying connectivity to backend API services. My GitHub Actions workflow has steps that pull most of the static environment variables directly from the cloud platform, using scripts that search for and collect the necessary values. Logging is one specific example. Because my cloud environment follows carefully standardized naming conventions, I can run scripts during deployment that look at the target environment, locate the ID of the logging group resource designated for that type of app and environment (no need to find it manually in the console), and save it to tfvars. This may sound like a small thing, but multiply it by 10 or more variables and you save considerable time and reduce the margin for human error. Embrace automation; the possibilities are nearly endless.
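The naming-convention lookup described above might look roughly like the Python sketch below. Everything here is an assumption for illustration: the `lg-<app>-<env>` convention, the `CloudResource` record standing in for your provider’s listing API, and the helper names. In a real workflow the inventory would come from your cloud provider’s SDK or CLI, and the rendered lines could be written to a file such as `prod.auto.tfvars`, which Terraform loads automatically.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class CloudResource:
    """Stand-in for whatever your cloud SDK or CLI returns when listing resources."""
    name: str
    id: str
    kind: str


def expected_log_group_name(app_type: str, env: str) -> str:
    # Hypothetical naming convention: lg-<app type>-<environment>.
    return f"lg-{app_type}-{env}"


def find_log_group_id(resources: list[CloudResource], app_type: str, env: str) -> str:
    """Locate the single log group designated for this app type and environment."""
    wanted = expected_log_group_name(app_type, env)
    matches = [r.id for r in resources if r.kind == "log_group" and r.name == wanted]
    if len(matches) != 1:
        # Fail the pipeline loudly rather than deploying with a guessed value.
        raise LookupError(f"expected exactly one log group named {wanted!r}, found {len(matches)}")
    return matches[0]


def to_tfvars(values: dict[str, str]) -> str:
    """Render simple string values as tfvars lines: name = "value"."""
    # json.dumps gives a double-quoted, escaped string, which suits plain IDs.
    return "".join(f"{k} = {json.dumps(v)}\n" for k, v in sorted(values.items()))


if __name__ == "__main__":
    inventory = [
        CloudResource("lg-api-dev", "log-123", "log_group"),
        CloudResource("lg-api-prod", "log-456", "log_group"),
        CloudResource("lg-api-prod", "vm-789", "vm"),
    ]
    log_id = find_log_group_id(inventory, "api", "prod")
    print(to_tfvars({"environment": "prod", "log_group_id": log_id}), end="")
```

Raising an error when zero or several resources match is a deliberate choice: a pipeline that stops is easier to fix than one that quietly deploys against the wrong log group.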

When IaC is adopted poorly, it will create more problems than it solves. Don’t fall into the trap of adopting IaC practices that are simply convenient up front – consider the tips above and focus on adopting an IaC approach that will be efficient, functional, and actually solve problems. The effort WILL make your team more competitive!



Categories: Cloud, Tips
