Your application’s Auto Scaling Group scales out too quickly, adds too much capacity, and stays scaled out after traffic
decreases. What should you do to fix this?
Set a longer cooldown period on the Group, so the system stops overshooting the target capacity. The issue is that the scaling system
doesn’t allow enough time for new instances to begin servicing requests before measuring aggregate load again.
Identify the bottleneck (the constraining resource) on the compute layer, select it as the new scaling metric, and set the metric
thresholds to the values at which response latency begins to be affected.
Raise the threshold of the CloudWatch alarms associated with your Auto Scaling Group, so that scaling takes a larger increase in demand to trigger.
Use larger instances instead of lots of smaller ones, so the Group stops scaling out so much and wasting resources at the OS level,
since the OS uses a higher proportion of resources on smaller instances.
Systems will always over-scale unless you scale on the metric for the resource that runs out first, that is, the one that
becomes constrained first. You also need to set that metric's thresholds at the point where latency actually starts to be
affected, so that adding capacity is justified rather than a waste of money.
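
As a rough illustration of that reasoning, the sketch below picks a scaling metric and a scale-out threshold from load-test data. Everything in it is an assumption made for the example: the `Sample` shape, the `pick_scaling_metric` helper, the sample numbers, and the simplification that the bottleneck is the resource with the highest utilization (as a percentage of its capacity) at the last load level where latency was still acceptable. It does not call any AWS API.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One load-test observation (hypothetical shape, for illustration only)."""
    utilization: dict[str, float]  # resource name -> percent of capacity used (0-100)
    latency_ms: float              # response latency observed at this load level


def pick_scaling_metric(samples: list[Sample], latency_limit_ms: float) -> tuple[str, float]:
    """Return (metric, scale_out_threshold) for the resource that constrains first.

    `samples` must be ordered by increasing load. The threshold is the
    utilization of the bottleneck resource at the last sample whose latency
    was still within the limit.
    """
    last_healthy = None
    for sample in samples:
        if sample.latency_ms > latency_limit_ms:
            break  # latency is now affected; stop at the previous sample
        last_healthy = sample

    if last_healthy is None:
        raise ValueError("latency exceeds the limit even at the lowest load")

    # Assumption: the most utilized resource just before latency degrades
    # is the one that runs out first.
    metric = max(last_healthy.utilization, key=last_healthy.utilization.get)
    return metric, last_healthy.utilization[metric]


# Made-up load-test results, ordered by increasing load.
samples = [
    Sample({"cpu": 20.0, "network": 35.0, "memory": 30.0}, latency_ms=40.0),
    Sample({"cpu": 35.0, "network": 60.0, "memory": 32.0}, latency_ms=42.0),
    Sample({"cpu": 50.0, "network": 85.0, "memory": 35.0}, latency_ms=45.0),
    Sample({"cpu": 60.0, "network": 97.0, "memory": 36.0}, latency_ms=120.0),
]

metric, threshold = pick_scaling_metric(samples, latency_limit_ms=60.0)
print(metric, threshold)
```

In this made-up data, CPU never exceeds 60% while network utilization is already at 85% when latency is about to degrade, so scaling on CPU would react to the wrong signal. The returned pair would then inform the metric and threshold you configure on the Group's scaling policy.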