As ARM architecture gains traction, especially with offerings like Amazon's Graviton processors, there is a significant opportunity for cost savings. Analyzing cloud costs is crucial for organizations experiencing rapid growth, as compute expenses often dominate the budget.
At Vim, "compute" is where we spend the most. This post dives into how we analyzed and assessed ARM: what the cost-saving potential is, how to review application compatibility, how to execute the migration, and how we tested and monitored it.
Cost Saving Analysis
Reading about Graviton adoption in the industry, and considering that most of our workload runs on Node.js, we recognized an opportunity for significant savings with little effort.
Cost savings vary by company (you can check yours here). In our case, savings could range from 10%-40%, depending on how pods are organized and how quickly we deprecate the Intel machines.
After completing our analysis, we decided that Graviton was the most compelling option and would have the biggest impact on cost savings.
But wait, is my application compatible?
Good question, and that's what we're going to figure out first. EKS runs your application from Docker images, so to get ARM pods we need ARM images in our image registry. Registries like Docker Hub clearly indicate which architectures an image was built for. For example, if your image were built only for amd64, a single option would appear in the architecture selector. The goal, however, is to support both architectures, which makes the transition smooth while new pods scale up and old pods scale down.
Looking at a Docker image on DockerHub, a dual-architecture image lists both linux/amd64 and linux/arm64 in the architecture selector.
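You can also check this from the command line. A minimal sketch, assuming Docker with the Buildx plugin is installed and using a placeholder image tag:

# Inspect the image's manifest list to see which platforms it provides
docker buildx imagetools inspect <user>/<repo>:<image_tag>
# The output should list both linux/amd64 and linux/arm64 among the manifest platforms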
So how do I get my app to be dual-architecture?
This depends on your Docker build script but is typically quite straightforward.
Docker addressed this with an extended CLI plugin called Buildx (invoked as `docker buildx`).
Its `build` command lets you specify which architectures to build the image for via the `--platform` flag.
An example of building a dual-architecture image would be as follows:
docker buildx build --platform linux/amd64,linux/arm64 -t <user>/<repo>:<image_tag> . --push
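Note that cross-building the arm64 variant on an x86 build machine usually requires a Buildx builder and QEMU emulation. A minimal setup sketch, assuming a reasonably recent Docker with the Buildx plugin (the builder name here is arbitrary):

# Register QEMU emulators so an x86 host can build arm64 image layers
docker run --privileged --rm tonistiigi/binfmt --install all
# Create and select a builder instance capable of multi-platform builds
docker buildx create --name multiarch-builder --use
# Bootstrap the builder and confirm linux/amd64 and linux/arm64 are listed
docker buildx inspect --bootstrap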
You'll want to update the build scripts of as many images in your organization as possible, so that later the migration becomes mostly a matter of AWS configuration: choosing which pods to scale on which architecture.
Ok, my apps are being built with dual architecture. Now what?
Now that your Docker images support both architectures, AWS can bring them to life in ARM pods. The container runtime pulls the image variant that matches the node's architecture, so if you publish both, you get to choose where each workload runs.
On to the cloud!
The first migration should be picked carefully: we want quick results and a visible impact on the bottom line of the invoice. Lower environments like development or staging are the natural first pick, of course, but more specifically, we started with RDS.
At Vim, we chose RDS because we have instances used for Extract, Transform, Load (ETL) processes, where the SLA is lower, and if a rollback is needed, no harm is done.
The migration itself is not complicated:
- Pick a database engine and version that is supported by Graviton (see "New – Amazon RDS on Graviton2 Processors" on the AWS blog).
- Change the instance type of your RDS instance to one of the Graviton instance classes. Choose the Graviton instance type that matches your requirements and modify the instance, for example as in the CLI sketch below.
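The same modification can be done from the AWS CLI. A minimal sketch, where the instance identifier and instance class are placeholders to be replaced with your own values:

# Move an RDS instance to a Graviton-based instance class
aws rds modify-db-instance \
  --db-instance-identifier my-etl-db \
  --db-instance-class db.r6g.large \
  --apply-immediately
# --apply-immediately triggers the change right away, which involves a short outage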
Gradual Migration
Now, how did we test that nothing is broken after such a change?
Well… in our case, we decided to add a few Graviton nodes to the K8s cluster in lower environments (like development or staging), and then add a nodeAffinity preference so that multi-arch pods favor the arm64 nodes:
nodeAffinity:
  preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      preference:
        matchExpressions:
          - key: kubernetes.io/arch
            operator: In
            values:
              - arm64
For deployments that do not yet support multi-arch, add a required rule that prevents them from being scheduled on Graviton nodes:
requiredDuringSchedulingIgnoredDuringExecution:
  nodeSelectorTerms:
    - matchExpressions:
        - key: kubernetes.io/arch
          operator: NotIn
          values:
            - arm64
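For completeness, adding those Graviton nodes in the first place could be done with eksctl. A rough sketch, where the cluster name, node group name, node count, and instance type are placeholders rather than our actual setup:

# Add a Graviton (arm64) managed node group to an existing EKS cluster
eksctl create nodegroup \
  --cluster my-cluster \
  --name graviton-nodes \
  --node-type m7g.large \
  --nodes 2
# The new nodes carry the label kubernetes.io/arch=arm64, which the affinity rules above key on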
After the pods are deployed, run your standard sanity checks and gradually remove the AMD nodes according to your needs.
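A quick way to verify placement with plain kubectl, as part of those sanity checks:

# List nodes together with their CPU architecture label
kubectl get nodes -L kubernetes.io/arch
# See which node (and therefore which architecture) each pod landed on
kubectl get pods -o wide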
Let’s see those gains!
The most enjoyable part is seeing the cost savings. We use a tool called CloudHealthTech, which allows you to monitor actual costs in your cloud provider.
After completing the hard part, scaling up Graviton pods and scaling down the amd64 ones, the savings show up nicely in the graph. In the left pane, that is under Reports → Cost → EC2 Instance.
If you're successful, you may see a graph similar to this (divided by CPU type):
Notice the c5n (x86) instances gradually scaling down and the r7g (Graviton) instances scaling up. The total cost (the combined height of both) is significantly lower than the baseline.
Good luck, and happy hunting for those cost savings 🙂