Hello, I am Apollo Clark, a Cloud Architect, formerly with HashiCorp, with 13+ years of AWS experience, 4+ years of Azure experience, and 3+ years of GCP experience. Over the years I've worked with the largest financial services companies in the world, and with various US Dept of Defense (DoD) organizations, on projects with security requirements of PCI-DSS, HIPAA, FedRAMP, and GDPR. AWS is an amazing service capable of a wide variety of uses, but with that flexibility comes a lot of complexity that is easy to misconfigure. Unfortunately, even in 2022, a lot of cloud provider services are not secure by default. This guide is a list of the most common mistakes I've seen. Many organizations adopted AWS organically, without any centralized planning, given the ease of using an organization credit card to spin up infrastructure in minutes, versus going through months of approval and waiting for physical hardware to be delivered, installed, configured, and made available in on-prem VMware-based data centers. Whenever I worked with an organization, I would go through this quick list to get a sense of how technically mature it was, and to help it understand where there was room for improvement.
When I started using AWS in 2008 there were only a few services available: S3 file storage, EC2 VMs, and VPC Networking. There are now more than 120 services, with dozens of resources per service, and up to hundreds of configuration options per resource. AWS did not have Consolidated Billing (also called Centralized Billing or Multi-Account Billing) until 2010-02-09, and did not release AWS Organizations until 2017-02-27. Before 2014, I would often meet with AWS Architects who actively discouraged having multiple AWS Accounts, and who encouraged customers to have only a few. Now with AWS Organizations it is much easier to consolidate the management of multiple AWS Accounts. That history hopefully explains why AWS Multi-Account Security and Configuration is not easy to use and is missing various features.
AWS Landing Zone was released on 2018-06-14, and allows for "... automating the set-up of an environment for running secure and scalable workloads while implementing an initial security baseline through the creation of core accounts and resources." AWS Control Tower was released on 2019-06-24; it uses AWS Landing Zone and adds security features using AWS SSO, AWS Service Catalog, AWS Config, and AWS CloudTrail. Be sure to review the FAQ and the Limitations and Quotas page. AWS Control Tower is a powerful solution, but it may not be flexible enough, and it often requires replacing existing AWS Accounts. AWS Control Tower released a Terraform-based Account Factory on 2021-11-29.
- https://aws.amazon.com/about-aws/whats-new/2010/02/09/announcing-consolidated-billing-for-aws-accounts/
- https://aws.amazon.com/about-aws/whats-new/2017/02/aws-organizations-now-generally-available/
- https://aws.amazon.com/about-aws/whats-new/2018/06/introducing-aws-landing-zone/
- https://aws.amazon.com/about-aws/whats-new/2019/06/aws-control-tower-is-now-generally-available/
- https://docs.aws.amazon.com/managedservices/latest/onboardingguide/set-up-consolidated-billing.html
- https://aws.amazon.com/organizations/faqs/
- https://docs.aws.amazon.com/prescriptive-guidance/latest/migration-aws-environment/understanding-landing-zones.html
- https://aws.amazon.com/blogs/aws/category/management-tools/aws-control-tower/
- https://aws.amazon.com/controltower/faqs/
- https://docs.aws.amazon.com/controltower/latest/userguide/what-is-control-tower.html
- https://docs.aws.amazon.com/controltower/latest/userguide/limits.html
- https://www.mitocgroup.com/blog/my-architecture-aws-control-tower-vs-aws-landing-zone/
- https://aws.amazon.com/blogs/aws/new-aws-control-tower-account-factory-for-terraform/
- https://github.com/aws-ia/terraform-aws-control_tower_account_factory
- https://www.hashicorp.com/blog/hashicorp-teams-with-aws-on-new-control-tower-account-factory-for-terraform
- https://learn.hashicorp.com/tutorials/terraform/aws-control-tower-aft
- https://docs.aws.amazon.com/controltower/latest/userguide/taf-account-provisioning.html
- https://registry.terraform.io/modules/aws-ia/control_tower_account_factory/aws/latest
- https://www.hashicorp.com/resources/aws-terraform-landing-zone-tlz-accelerator
Oddly, AWS does not have a Multi-Account Inventory Management Service. Within Azure and GCP, you can query a JSON API for each Cloud Account. The best options I've seen so far are:
- https://aws.amazon.com/blogs/mt/tag/inventory/
- AWS Systems Manager (only for AWS EC2 VMs)
- Use AWS Systems Manager custom Inventory to locate Log4j files on managed nodes - AWS Blog
- VMware CloudHealth
- DivvyCloud
- ServiceNow
- Prisma Cloud, Asset Inventory - Palo Alto Networks
- kopicloud
- NCCGroup / aws-inventory (open source)
- DuoLabs / CloudMapper (open source)
- DuoLabs / CloudMapper Blog Post
- aws-auto-inventory (open source)
- scopely-devops / skew (open source)
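Whichever tool collects the data, a multi-account inventory usually ends up as a list of resource ARNs that has to be grouped by account and service. The following is a minimal sketch of that grouping step, assuming the ARNs have already been gathered by one of the tools above. The function names and the sample ARNs are hypothetical; the only AWS fact relied on is the standard ARN layout `arn:partition:service:region:account-id:resource`.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Tuple


class ArnParts(NamedTuple):
    partition: str
    service: str
    region: str
    account_id: str
    resource: str


def parse_arn(arn: str) -> ArnParts:
    """Split an ARN into its five fields after the leading 'arn' prefix."""
    # maxsplit=5 keeps any colons inside the resource part intact.
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return ArnParts(*parts[1:])


def count_by_account_and_service(arns: Iterable[str]) -> Dict[Tuple[str, str], int]:
    """Count resources per (account_id, service) pair."""
    counts: Counter = Counter()
    for arn in arns:
        parts = parse_arn(arn)
        # Some ARNs (for example S3 buckets) carry no account id.
        counts[(parts.account_id or "unknown", parts.service)] += 1
    return dict(counts)


if __name__ == "__main__":
    sample_arns = [  # hypothetical sample data
        "arn:aws:ec2:us-east-1:111111111111:instance/i-0abc",
        "arn:aws:ec2:us-west-2:111111111111:instance/i-0def",
        "arn:aws:s3:::example-bucket",
        "arn:aws:lambda:us-east-1:222222222222:function:my-fn",
    ]
    for key, count in sorted(count_by_account_and_service(sample_arns).items()):
        print(key, count)
```

Resources whose ARN has no account id are bucketed under "unknown" here; in practice the account would need to come from the credentials used to list them.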
Tagging is a surprisingly difficult endeavor within AWS. There is no single API call to auto-tag all untagged resources, nor can you define a collection of "Default Tags". Terraform has made this easier by introducing Terraform AWS Provider Default Tags on 2021-05-12. There is also an AWS Config Rule check, powered by AWS Lambda, to alert whenever a resource is missing required tags.
- Terraform AWS Provider Default Tags
- Terraform Enterprise - Sentinel Rule - enforce mandatory tags
- Validating Terraform plans with the Open Policy Agent (OPA)
- Pre-deployment Policy Checks for Terraform using OPA (Open Policy Agent)
- AWS Config - Managed Rules - required-tags
- Automatically tag new AWS resources based on identity or role (2020-11-02) - AWS Blog
- Automated Tagging for Cost Optimization (2020-08-03) - Intelligent Discovery
- Customized Resources Auto-Tagging in AWS (2020-04-20) - IT Next
- GorillaStack / auto-tag (open source)
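To make the required-tags idea concrete, here is a minimal sketch of the check itself, not of any of the tools listed above. It assumes the input has the shape returned by the Resource Groups Tagging API `GetResources` call (a list of `ResourceARN` plus `Tags` key/value pairs). The required tag keys, the function name, and the sample data are hypothetical, and treating an empty tag value as missing is a policy choice made for this example.

```python
from typing import Dict, Iterable, List, Mapping

# Hypothetical policy: every resource must carry these tag keys.
REQUIRED_TAG_KEYS = frozenset({"Owner", "CostCenter", "Environment"})


def find_missing_tags(
    resource_tag_mappings: Iterable[Mapping],
    required_keys: Iterable[str] = REQUIRED_TAG_KEYS,
) -> Dict[str, List[str]]:
    """Return {resource ARN: sorted missing tag keys} for non-compliant resources."""
    required = set(required_keys)
    missing: Dict[str, List[str]] = {}
    for mapping in resource_tag_mappings:
        # A tag with an empty or whitespace-only value counts as missing here.
        present = {
            tag["Key"]
            for tag in mapping.get("Tags", [])
            if tag.get("Value", "").strip()
        }
        absent = sorted(required - present)
        if absent:
            missing[mapping["ResourceARN"]] = absent
    return missing


if __name__ == "__main__":
    sample = [  # hypothetical sample data
        {
            "ResourceARN": "arn:aws:ec2:us-east-1:111111111111:instance/i-0abc",
            "Tags": [{"Key": "Owner", "Value": "team-a"}],
        },
        {
            "ResourceARN": "arn:aws:s3:::example-bucket",
            "Tags": [
                {"Key": "Owner", "Value": "team-b"},
                {"Key": "CostCenter", "Value": "1234"},
                {"Key": "Environment", "Value": "prod"},
            ],
        },
    ]
    for arn, keys in find_missing_tags(sample).items():
        print(arn, "is missing", keys)
```

A report like this only finds the gaps. Actually applying the tags still has to go through each service's own tagging call or an infrastructure-as-code change.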
AWS CloudWatch is NOT a replacement for a dedicated and centralized metrics monitoring tool like SignalFx (acquired by Splunk on 2019-10-02), ELK, DataDog, New Relic, Grafana, or Prometheus. Before 2019-11-08, AWS CloudWatch could only be configured to collect metrics from a single AWS Account, and it only added support for Cross Account Alarms on 2021-08-05. Support for exporting and importing Dashboards was added on 2017-07-05. There is no way to export or import existing Alerts, which means anything already configured cannot be reused and has to be rebuilt by hand each time. Finally, the querying capabilities of AWS CloudWatch are not robust. Given these limitations, there is not a robust community of available dashboards.
- https://www.splunk.com/en_us/investor-relations/acquisitions/signalfx.html
- https://aws.amazon.com/about-aws/whats-new/2019/11/amazon-cloudwatch-launches-cross-account-cross-region-dashboards/
- https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Cross-Account-Cross-Region.html
- https://aws.amazon.com/about-aws/whats-new/2021/08/announcing-amazon-cloudwatch-cross-account-alarms/
- https://aws.amazon.com/blogs/aws/new-api-cloudformation-support-for-amazon-cloudwatch-dashboards/
- https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Dashboards.html
- https://docs.aws.amazon.com/grafana/latest/userguide/dashboard-export-and-import.html
- https://awscli.amazonaws.com/v2/documentation/api/latest/reference/cloudwatch/get-dashboard.html
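As a rough illustration of reusing an exported dashboard, the sketch below takes a dashboard body (the JSON string that `get-dashboard` returns as `DashboardBody`) and points every widget that declares a `region` at a different region. The function name and the sample dashboard are hypothetical. This is one possible way to adapt an export, not an official migration procedure, and it deliberately leaves everything except the `region` property untouched.

```python
import json


def retarget_dashboard_region(dashboard_body: str, new_region: str) -> str:
    """Return a copy of a CloudWatch dashboard body with widget regions replaced."""
    body = json.loads(dashboard_body)
    for widget in body.get("widgets", []):
        properties = widget.get("properties", {})
        # Only widgets that already declare a region (e.g. metric widgets) are changed;
        # text widgets and others without a region are left as they are.
        if "region" in properties:
            properties["region"] = new_region
    return json.dumps(body, indent=2)


if __name__ == "__main__":
    exported = json.dumps(  # hypothetical exported dashboard
        {
            "widgets": [
                {
                    "type": "metric",
                    "x": 0, "y": 0, "width": 12, "height": 6,
                    "properties": {
                        "metrics": [
                            ["AWS/EC2", "CPUUtilization", "InstanceId", "i-0123456789abcdef0"]
                        ],
                        "region": "us-east-1",
                        "title": "CPU",
                    },
                },
                {"type": "text", "x": 0, "y": 6, "width": 12, "height": 2,
                 "properties": {"markdown": "# Notes"}},
            ]
        }
    )
    print(retarget_dashboard_region(exported, "eu-west-1"))
```

The resulting string could then be passed to `put-dashboard` in the target account. Resource identifiers inside the metrics (such as instance IDs) would still need to be replaced separately.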
Now that Cloud Providers are the top form of hosting compute resources, Cost Management has become very visible to IT Executives. The organic growth of cloud providers over the years was often small scale, with fewer than a hundred VMs and budgets under $100,000 USD / year. Those early experiments started in 2008 have evolved and grown over the past 10 years to become the primary source of revenue for many large enterprises. With that growth in usage has also come a dramatic increase in costs.
You need to do Inventory Management, Tagging / Establish Ownership, and Metrics Monitoring, before you can do Cost Management.
Inventory Management establishes WHAT you have; Tagging establishes WHO owns it and WHEN it was created and last changed; Metrics Monitoring establishes HOW it is being used.
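To show why cost management depends on those three inputs, here is a small sketch that joins them: each record is a resource from the inventory (WHAT), with an owner from its tags (WHO) and an average CPU figure from metrics (HOW). The record layout, field names, idle threshold, and sample numbers are all hypothetical. Real data would come from whatever inventory, tagging, and monitoring tools are in use.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def summarize_by_owner(
    resources: Iterable[Mapping],
    idle_cpu_percent: float = 5.0,  # hypothetical threshold for "idle"
) -> Dict[str, dict]:
    """Total monthly cost per owner and list the resources that look idle."""
    summary: Dict[str, dict] = defaultdict(
        lambda: {"monthly_cost_usd": 0.0, "idle_resources": []}
    )
    for resource in resources:
        # Without an owner tag the cost cannot be attributed to anyone.
        owner = resource.get("owner") or "UNTAGGED"
        entry = summary[owner]
        entry["monthly_cost_usd"] += resource["monthly_cost_usd"]
        if resource["avg_cpu_percent"] < idle_cpu_percent:
            entry["idle_resources"].append(resource["id"])
    return dict(summary)


if __name__ == "__main__":
    sample = [  # hypothetical sample data
        {"id": "i-1", "owner": "team-a", "monthly_cost_usd": 100.0, "avg_cpu_percent": 40.0},
        {"id": "i-2", "owner": "team-a", "monthly_cost_usd": 50.0, "avg_cpu_percent": 2.0},
        {"id": "i-3", "owner": None, "monthly_cost_usd": 25.0, "avg_cpu_percent": 1.0},
    ]
    for owner, entry in summarize_by_owner(sample).items():
        print(owner, entry)
```

The "UNTAGGED" bucket is the point of the exercise: any spend that lands there cannot be charged back or questioned until ownership is established.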
CloudFormation was released on to enable "Infrastructure as Code" to complement the rise of "Configuration as Code" with tools like Chef, Puppet, Ansible, and later SaltStack. HashiCorp Terraform was released on.