@avoidik
Forked from ejlp12/1_ecs_note.md
Created November 9, 2021 01:40
ECS Best Practices Notes

ECS Best Practices

  • Understand and check the Service Quotas for ECS/Fargate and other related services

  • Cluster

  • Use Amazon ECS-optimized AMIs.

    • Using a different OS is harder to maintain: you have to handle OS upgrades, patching, and Docker and ECS agent updates yourself
    • Subscribe to AMI update notifications.
  • Launching EC2 Container Instance

    • Don't use public IP address (Turn off Auto-assign Public IP)
    • Make EC2 instances immutable.
      • Better not to expose SSH for remote login; use AWS Systems Manager Run Command and Session Manager instead.
    • Use Spot Instances whenever possible, e.g. for development environments
      • Find instance types that are not frequently interrupted
      • Set the Spot maximum price a little higher than the average price
      • Use Spot Fleet to deploy the target capacity you request (expressed in terms of instances or a vCPU count)
    • Understand how the EC2 container instance works
      • Don't use reserved ports for your application (Linux TCP: 22, 2375, 2376, 51678, 51679, 51680)
      • Don't store log files or any persistent data in the container; it will fill up Docker storage
      • Look into the agent's /data directory (/var/lib/ecs/data on the host) for troubleshooting; it contains information about the cluster and the agent state
      • Set the container agent config (ECS_SELINUX_CAPABLE, ECS_APPARMOR_CAPABLE) if you harden the OS using SELinux or AppArmor
      • For better performance, tune ECS_IMAGE_PULL_BEHAVIOR and the image/task cleanup parameters based on how often you deploy
    • Optimize ECS task density using ENI trunking
    • {Day2} Set up automated updates of EC2 instances, since doing it manually is hard and error-prone
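
The agent tuning mentioned above (image pull behavior and cleanup parameters) is done through `/etc/ecs/ecs.config` on the container instance. The following is a minimal sketch of rendering that file; the helper name `render_ecs_config` and the chosen default values are assumptions for illustration, not recommendations from these notes.

```python
# Hypothetical helper: renders the contents of /etc/ecs/ecs.config.
# The variable names are real ECS agent settings; the values are only examples.
ALLOWED_PULL_BEHAVIORS = {"default", "always", "once", "prefer-cached"}


def render_ecs_config(cluster, pull_behavior="prefer-cached",
                      task_cleanup_wait="1h", image_cleanup_interval="30m"):
    """Return ecs.config text for a container instance joining `cluster`."""
    if pull_behavior not in ALLOWED_PULL_BEHAVIORS:
        raise ValueError(f"unsupported ECS_IMAGE_PULL_BEHAVIOR: {pull_behavior}")
    settings = {
        "ECS_CLUSTER": cluster,
        # Reuse cached images when present instead of pulling on every start.
        "ECS_IMAGE_PULL_BEHAVIOR": pull_behavior,
        # How long stopped task containers are kept before removal.
        "ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION": task_cleanup_wait,
        # How often the automated image cleanup runs.
        "ECS_IMAGE_CLEANUP_INTERVAL": image_cleanup_interval,
    }
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"


if __name__ == "__main__":
    print(render_ecs_config("demo-cluster"), end="")
```

Clusters that deploy many times a day may prefer shorter cleanup intervals, while rarely updated clusters can usually keep cached images longer.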
  • Fargate

  • Networking

    • Use a separate VPC; don't mix in other resources, e.g. EC2 instances that do not belong to the cluster.
      • Plan your VPC and subnet CIDRs; avoid the complexity of using multiple CIDRs in a VPC
      • Use an IP address planning tool
    • VPC & subnetting architecture patterns: https://containersonaws.com/architecture/
    • Keep the container registry as close as possible to your cluster (lower latency and faster docker pull).
      • The ECS cluster and ECR are better placed in the same Region
    • Use network mode = awsvpc for greater security via security groups and easier troubleshooting (using VPC Flow Logs)
    • Use network mode = host if you want the task to bypass Docker's built-in virtual network and map container ports directly to the EC2 instance's network interface
  • Task Definition

    • Don't store sensitive env variables in plain text in the task definition; reference them from Parameter Store instead, which is more secure.
    • Always set the healthCheck parameter in the container definition for tasks that will be part of an ECS service or use ECS Service Discovery.
      • Adjust the other health check parameters (interval, timeout, retries, startPeriod) based on your app's characteristics
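
As an illustration of the two task definition points above, here is a minimal sketch of a container definition that pulls its settings from Parameter Store via `secrets` and declares a `healthCheck`. The function name, the `/health` endpoint, the example ARN and the timing values are assumptions; the health check command also assumes `curl` is available inside the image.

```python
import json


def build_container_definition(name, image, port, parameter_arns):
    """Build one entry for the `containerDefinitions` list of a task definition.

    `parameter_arns` maps an environment variable name to the ARN of a
    Parameter Store parameter (hypothetical values in the example below).
    """
    return {
        "name": name,
        "image": image,
        "essential": True,
        "portMappings": [{"containerPort": port, "protocol": "tcp"}],
        # Values are injected at start-up instead of being stored in the definition.
        "secrets": [
            {"name": env_name, "valueFrom": arn}
            for env_name, arn in sorted(parameter_arns.items())
        ],
        "healthCheck": {
            # Assumes the app serves a /health endpoint and the image ships curl.
            "command": ["CMD-SHELL", f"curl -f http://localhost:{port}/health || exit 1"],
            "interval": 30,     # seconds between checks
            "timeout": 5,       # seconds before a check counts as failed
            "retries": 3,       # consecutive failures before "unhealthy"
            "startPeriod": 60,  # grace period for slow-starting apps
        },
    }


if __name__ == "__main__":
    definition = build_container_definition(
        "web", "example/web:1.0", 8080,
        {"DB_PASSWORD": "arn:aws:ssm:us-east-1:111122223333:parameter/demo/db_password"},
    )
    print(json.dumps(definition, indent=2))
```

The resulting dictionary can be passed as an element of `containerDefinitions` when registering a task definition; the task execution role must be allowed to read the referenced parameters.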
  • Service

    • Consider using a placement strategy
      • Use the Availability Zone (attribute:ecs.availability-zone) as the spread field, to spread the tasks being launched as evenly as possible across AZs
    • Service Discovery
      • Use Amazon ECS Service Discovery/Cloud Map (internal domain resolution) for inter-service communication inside a cluster.
      • Be aware of the difference between SRV and A records for DNS-based service lookup. An A record is simple; with SRV records you might need to change your app code, since the app has to resolve both the IP address and the port.
    • Highly recommended to use an ALB instead of a Classic Load Balancer (ELB): dynamic port mapping, more detailed monitoring and access logs
    • Use placement strategies and constraints to maximize resource utilization. CDK example, Terraform example
    • Tune scaling parameters: health check grace period and scaling cooldowns
    • Recommended to use target tracking scaling policies instead of step scaling policies. Common scaling metrics are CPU utilization or the request count per target of the ALB's target group.
    • Use API Gateway to expose services
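
The placement and scaling advice above can be sketched as the request fragments an ECS service and an Application Auto Scaling policy would use. The function names and the numeric defaults are assumptions for illustration. Note that `ECSServiceAverageCPUUtilization` measures the service's average CPU, not the CPU of the underlying EC2 instances.

```python
def service_placement(grace_period_seconds=60):
    """Service settings: spread tasks across AZs, then binpack on memory."""
    return {
        "placementStrategy": [
            {"type": "spread", "field": "attribute:ecs.availability-zone"},
            {"type": "binpack", "field": "memory"},
        ],
        # Gives new tasks time to start before load balancer health checks count.
        "healthCheckGracePeriodSeconds": grace_period_seconds,
    }


def cpu_target_tracking(target_percent=60.0, scale_in_cooldown=300,
                        scale_out_cooldown=60):
    """Target tracking configuration that keeps average service CPU near a target."""
    if not 0 < target_percent <= 100:
        raise ValueError("target_percent must be in (0, 100]")
    return {
        "TargetValue": float(target_percent),
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization",
        },
        # Scale in slowly and scale out quickly (example values only).
        "ScaleInCooldown": scale_in_cooldown,
        "ScaleOutCooldown": scale_out_cooldown,
    }


if __name__ == "__main__":
    print(service_placement())
    print(cpu_target_tracking(50))
```

Ordering matters in `placementStrategy`: the spread across Availability Zones is applied first, and binpacking on memory then fills instances within each zone.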
  • Observability

    • Send application logs to standard output and stream them to centralized logging. Take advantage of the awslogs log driver and CloudWatch Logs
    • Enable CloudWatch Container Insights to collect more detailed monitoring metrics and logs.
    • Use X-Ray for transaction tracing when troubleshooting performance.
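
For the logging point above, a container definition's `logConfiguration` can route stdout/stderr to CloudWatch Logs. This is a small sketch; the helper name and the example log group, Region and prefix are assumptions.

```python
def awslogs_configuration(log_group, region, stream_prefix):
    """Return a `logConfiguration` block that uses the awslogs log driver."""
    for label, value in (("log_group", log_group), ("region", region),
                         ("stream_prefix", stream_prefix)):
        if not value:
            raise ValueError(f"{label} must not be empty")
    return {
        "logDriver": "awslogs",
        "options": {
            "awslogs-group": log_group,
            "awslogs-region": region,
            # Groups the log streams of one service under a common prefix.
            "awslogs-stream-prefix": stream_prefix,
        },
    }


if __name__ == "__main__":
    print(awslogs_configuration("/ecs/demo-web", "us-east-1", "web"))
```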
  • Deployment

  • Security

  • Cost Optimization

    • Right-size EC2 container instances
    • Tag all container instances
    • Consider using EC2 Spot and Fargate Spot

What you need to know (be aware of) when using ECS on Fargate.

  • Limitations: Fargate does not support all of the task definition parameters. ref
    • Cannot use privileged mode
    • Must use awsvpc network mode -> each task gets an ENI and a primary private IP address
    • Cannot use GPUs
    • No placement constraints
    • Task CPU and memory (min: 0.25 vCPU, 0.5 GB RAM; max at the time of writing: 4 vCPU, 30 GB RAM)
    • Logging: awslogs, splunk, firelens, and fluentd
  • The Amazon ECS task execution IAM role is optional, but it is needed when the task must call other AWS services on your behalf, e.g. to pull images from ECR
  • Fargate platform version releases provide kernel or operating system updates, new features, bug fixes, or security updates
  • Automated scheduled task retirement: you will be notified by email
    • The task is stopped or terminated by AWS. If it is part of a service, it will be replaced automatically.
    • Reason:
      • Irreparable failure of the underlying hardware
      • Task has a security vulnerability
  • Fargate task recycling
    • When a security or infrastructure update is needed
    • No notification before recycling process
    • Only affects tasks that are part of a service (not standalone tasks)
  • Fargate makes no network throughput guarantees, nor does it guarantee equal CPU performance among tasks.
  • Expose Fargate services using API Gateway, VPC Link and an NLB
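
Fargate only accepts specific CPU/memory pairs within the limits listed above. The sketch below checks a requested size against the combinations up to 4 vCPU and 30 GB mentioned in these notes; the table is an assumption that reflects those limits, so check the current Fargate documentation before relying on it.

```python
# CPU units (1024 = 1 vCPU) mapped to the allowed memory values in MiB.
# Covers only the range stated in these notes (0.25 to 4 vCPU).
FARGATE_MEMORY_MIB = {
    256: (512, 1024, 2048),
    512: tuple(range(1024, 4096 + 1, 1024)),
    1024: tuple(range(2048, 8192 + 1, 1024)),
    2048: tuple(range(4096, 16384 + 1, 1024)),
    4096: tuple(range(8192, 30720 + 1, 1024)),
}


def is_valid_fargate_size(cpu_units, memory_mib):
    """Return True if the CPU/memory pair is one of the combinations above."""
    return memory_mib in FARGATE_MEMORY_MIB.get(cpu_units, ())


if __name__ == "__main__":
    for cpu, memory in [(256, 512), (256, 4096), (4096, 30720)]:
        print(cpu, memory, is_valid_fargate_size(cpu, memory))
```

Running a check like this before registering a task definition surfaces an invalid size early, rather than as an API error at deploy time.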