Cloud Tagging Best Practices for Better Cost Allocation, Part 2

In the previous blog, we covered tagging 101: how tagging helps with cloud cost reporting, how to promote a tagging strategy that supports cost attribution in your business and engineering context, and how to cultivate a culture of cloud resource ownership at scale.

This blog continues the series and discusses tagging strategies that work at scale and how to tag resources with Infrastructure-as-Code (IaC). We also suggest key-value pairs (tags) that could fit your environments and a tag hierarchy to organize them. Use this as a reference for your tagging enhancements.

Cloud Tagging Hierarchy

Tagging cloud resources is the first step toward cost visibility and attribution. Tags are needed at several levels to ensure precise attribution.

From a business perspective, you could tag based on your organization structure, usually cost centers and organization hierarchies.

From an infrastructure perspective, a cloud resource tagging hierarchy looks like the following:

  • Account/project level tags
  • Cloud Infrastructure resource level tags
  • Microservice tags (K8s)

Let’s explore each in detail.

ACCOUNT-LEVEL TAGS


These tags help with cost attribution at the account level (AWS) or project level (GCP) and are the foundation for budgeting and forecasting, as well as for building chargeback/showback models.

Note: Having tags defined as required in the list below will help your organization’s tagging strategy thrive. We’ve seen large enterprises use 1:1 mapping for accounts and teams. This works well when there’s a landing zone structure in place; otherwise, it adds a lot of overhead. 1:1 mapping also helps build precise cost attribution and chargeback models. (Network costs and other shared costs are quite tricky to handle, and this mapping helps with that.)

Note: These tags should be enough when an account/project is owned by a single engineering team and not shared with others. This level of granularity won’t be sufficient for cost attribution in shared accounts or for shared microservices. In the sections below, we’ll cover how to overcome that problem and allocate costs for shared services.

CLOUD INFRASTRUCTURE RESOURCE-LEVEL TAGS


A cloud account is often shared by several teams. In these multi-team scenarios, tagging has to be performed at the resource level to build cost attribution.

Note: From a cloud resource deployment perspective, the recommendation is to use environment (dev, qa, staging, prod, etc.) to identify resources from specific environments.

Tip: Resource Cleanup-related Tags (Save dollars with these tags + automation)

Pro tip #1: This tag helps to clean up resources that are no longer needed after a particular date.

Pro tip #2: This tag helps to shut down resources that are no longer needed after a particular date.

In my previous life, we stopped QA clusters using K8s labels in combination with CronJobs, and EC2 instances using AWS EC2 tags in combination with a Lambda function.
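The selection step behind that kind of automation can be sketched in a few lines. This is a minimal illustration, not the automation described above: the tag key `shutdown_after`, the ISO date format, and the shape of the inventory records are all assumptions made for the example. A real job would read tags from the cloud provider's API and then stop or delete the matching resources.

```python
from datetime import date

# Hypothetical tag key and ISO-8601 date format; the post does not name them.
SHUTDOWN_TAG = "shutdown_after"


def resources_past_shutdown(resources, today):
    """Return IDs of resources whose shutdown-date tag is earlier than `today`.

    `resources` is a list of dicts shaped like {"id": str, "tags": {key: value}}.
    Resources without the tag, or with an unparseable date, are left alone.
    """
    expired = []
    for resource in resources:
        raw = resource.get("tags", {}).get(SHUTDOWN_TAG)
        if raw is None:
            continue
        try:
            shutdown_date = date.fromisoformat(raw)
        except ValueError:
            continue  # malformed date: skip rather than stop the wrong resource
        if shutdown_date < today:
            expired.append(resource["id"])
    return expired


# Illustrative inventory; IDs and tag values are made up.
inventory = [
    {"id": "i-qa-1", "tags": {"env": "qa", SHUTDOWN_TAG: "2024-01-31"}},
    {"id": "i-qa-2", "tags": {"env": "qa", SHUTDOWN_TAG: "2024-03-15"}},
    {"id": "i-prod-1", "tags": {"env": "prod"}},
]
print(resources_past_shutdown(inventory, date(2024, 2, 1)))
```

Untagged resources are deliberately ignored, so a missing tag never causes a shutdown; enforcing that the tag exists is a separate policy decision.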

Security-related Tags

This is a curated list of tags that can come in handy for security teams, for example during incident response. While these tags do not directly contribute to cost allocation, they can help your security teams implement guardrails and automate processes.

MICROSERVICES TAGS (K8s LABELS)


Kubernetes Cluster-level Tags

All our Kubernetes clusters must contain the following tags at the instance level.

Kubernetes Resource-level Tags (popularly known as Labels):

In a microservice architecture, container resources must be labeled to achieve precise cost attribution. In Kubernetes, tagging is done via labels. All our Kubernetes resources (pods, nodes, etc.) must carry the following labels.

How to Tag AWS Resources via Terraform (tf)

AWS resources can be tagged with an Infrastructure-as-Code (IaC) tool; either Terraform or CloudFormation works. Terraform has gained popularity because it is cloud-agnostic and offers strong state management and integrations with cloud vendors.

Terraform Example

We must create a block like the one below in main.tf, where we call the tags module from terraform-utils.

# Account-level tags, defined once and reused by the provider below
module "account_tags" {
  source = "github.com/Organization-dev/terraform-utils/modules/tags/account-tags?ref=v0.1"

  account_name  = "devtest"
  service_owner = "Ops"
  requestor     = "ops@Organization.net"
}

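We then apply these tags to the resources managed by the AWS provider by populating default_tags with the output of the module defined in main.tf above.

provider "aws" {
  region  = "us-west-2"
  profile = "Organization-devtest"

  # Tags from the module are applied by default to resources this provider manages
  default_tags {
    tags = module.account_tags.tags
  }
}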

How to Tag Your K8s Resources

Here’s how to label your K8s resources via the labels section of a manifest file.

apiVersion: v1
kind: Pod
metadata:
  name: front-end-app
  labels:
    env: dev
    app: nginx
    service_owner: team-xyz
    service_name: nginx
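A labeling standard is easier to keep when manifests are checked before they are deployed. The sketch below shows one way such a check might look. It treats the four label keys from the example manifest as the required set, which is an assumption for illustration, and it works on a manifest that has already been parsed into a Python dict.

```python
# Label keys copied from the example manifest; treating them as the
# required set is an assumption made for this sketch.
REQUIRED_LABELS = ("env", "app", "service_owner", "service_name")


def missing_labels(manifest, required=REQUIRED_LABELS):
    """Return the required label keys that are absent or empty in a manifest dict."""
    labels = (manifest.get("metadata") or {}).get("labels") or {}
    return [key for key in required if not labels.get(key)]


# The example Pod with service_name left out on purpose.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {
        "name": "front-end-app",
        "labels": {"env": "dev", "app": "nginx", "service_owner": "team-xyz"},
    },
}
print(missing_labels(pod))
```

A check like this could run in CI against rendered manifests and fail the build when the returned list is not empty.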

Shared Services Cost Attribution

Shared costs range from taxes, support fees, and credits to databases and microservices. A shared service could be an S3 bucket, an RDS database consumed by multiple teams, a K8s microservice used to process data, an EMR cluster that processes data, etc. Building chargeback models for shared services is painful.

Cost attribution for taxes, support fees, and credits

  • These charges from the CSP are not split across accounts/projects; they’re usually charged at the payer account level with no granular attribution.
  • A fair way to charge these back (allocate them to engineering teams) is to split them in proportion to each team’s cloud spending.
    • Example: If Team A spends 10% of the total bill, 10% of taxes, support fees, and credits must be allocated to Team A.
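The proportional split described above is simple arithmetic, sketched here with made-up numbers. The team names, spend figures, and fee amount are illustrative assumptions; credits can be passed as a negative charge so that they reduce each team's bill by the same proportion.

```python
def allocate_proportionally(team_spend, shared_charge):
    """Split `shared_charge` across teams in proportion to each team's spend."""
    total = sum(team_spend.values())
    if total <= 0:
        raise ValueError("total team spend must be positive")
    return {team: shared_charge * spend / total for team, spend in team_spend.items()}


# Illustrative figures: team-a accounts for 10% of the total bill.
team_spend = {"team-a": 10_000.0, "team-b": 60_000.0, "team-c": 30_000.0}
support_fee = 5_000.0

shares = allocate_proportionally(team_spend, support_fee)
print(shares)
```

With these numbers team-a spends 10% of the bill and is allocated 10% of the support fee, matching the example in the list.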

Cost attribution for shared services like RDS and K8s (microservices)

Shared services chargeback is tricky. Every situation is unique, and there is no one-size-fits-all solution. Here are some pointers that can help address this problem.

A meaningful, quantifiable metric is vital in building a shared-service chargeback framework. Examples of such metrics could be:

  • Time taken for a query to run: This metric can help build a chargeback model that associates costs with customers, which in turn helps identify which customers are profitable.
  • Amount of data transferred/processed: This metric helps charge back costs on a system that is heavy on data processing.
    • One way to charge back here is to distribute costs to users in proportion to the amount of data processed.

For shared microservices, use systems metrics (% CPU utilization, % Memory utilization) together with # of API invocations and/or Network IOPS.

  • Use a combination of these metrics to proportionally distribute resource costs such as EC2, S3, RDS, etc.
  • You could use a formula that looks like this:
    • shared_service_cost =  cloud_resource_cost * (percent_distribution_of_metric1 * weight_of_metric1 + percent_distribution_of_metric2 * weight_of_metric2 +…)
    • cloud_resource_cost refers to the actual price of cloud resources such as EC2, S3, RDS, etc.
    • weight_of_metric needs consensus from within engineering teams and service owners
    • Use the metrics that make sense for your workloads. The proportional weight of metrics is a prerequisite to building this model.
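The formula above can be worked through with a small example. This is a sketch under stated assumptions: the metric names, usage figures, and equal weights are invented for illustration, and the code additionally requires the weights to sum to 1 so that the per-team amounts add up to the full resource cost.

```python
def usage_fractions(usage_by_team):
    """Convert raw per-team usage of one metric into fractions that sum to 1."""
    total = sum(usage_by_team.values())
    if total <= 0:
        raise ValueError("metric has no recorded usage")
    return {team: value / total for team, value in usage_by_team.items()}


def shared_service_cost(cloud_resource_cost, metrics, weights):
    """Per-team cost = cloud_resource_cost * sum(percent_distribution * weight).

    `metrics` maps metric name -> {team: raw usage}.
    `weights` maps metric name -> weight_of_metric; weights must sum to 1.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    costs = {}
    for name, usage in metrics.items():
        for team, fraction in usage_fractions(usage).items():
            share = cloud_resource_cost * fraction * weights[name]
            costs[team] = costs.get(team, 0.0) + share
    return costs


# Illustrative usage data for one shared service costing 1,000 per month.
metrics = {
    "cpu_seconds": {"team-a": 300.0, "team-b": 700.0},
    "api_invocations": {"team-a": 8_000.0, "team-b": 2_000.0},
}
weights = {"cpu_seconds": 0.5, "api_invocations": 0.5}

print(shared_service_cost(1_000.0, metrics, weights))
```

Here team-a uses 30% of the CPU but makes 80% of the API calls, so with equal weights it carries about 550 of the 1,000 and team-b about 450. Changing the weights shifts that split, which is why they need agreement from the teams involved.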

Note: This requires metrics such as CPU, memory, network IOPS, and storage to be collected in a centralized monitoring system like Prometheus or CloudWatch so that percent_distribution_of_metric can be reported for each metric.

AWS has published a blog post on implementing tenant-based cost allocation, and such a model can be applied to shared services cost attribution for K8s, RDS, and S3.

Parting Thoughts

Tagging cloud resources is essential for granular cost visibility and allocation. Tags are needed at multiple levels, including account/project-level and cloud infrastructure resource-level tags, to ensure precise cost attribution. The tags recommended here are what we’ve seen work best at scale in large enterprises.

In addition, tags can help with security and automation, both for incident response and for acting on cost savings opportunities. Shared services cost allocation can be tough; we have discussed metric-based strategies for allocating those costs.

Remember, though, that to be truly useful to the business, tags should be used in the context of business hierarchies and rolled up to teams, departments, business units, divisions, etc. When your infrastructure or business organization changes, it should be relatively easy to update your tags accordingly. Manually tagging and retagging resources is a surefire way to waste engineering resources and lower morale. Your cloud cost management tool should be able to make tag management simple.

Contact us to learn more about how Yotascale simplifies cloud tagging management.