
Cloud Tagging Best Practices for Better Cost Allocation, Part 1

Cloud tagging ensures visibility of public cloud resources and their associated costs. This visibility is a foundational first step to:

  •   Understanding cloud resources and allocation across teams,
  •   Performing root cause analysis (RCA) on cost anomalies,
  •   Remediating cost anomalies before they burn millions of dollars in unanticipated spend, and
  •   Reporting on cost metrics that matter to your business.

Many organizations need help implementing and enforcing unified cloud tagging policies for their multi-cloud infrastructure. We’re writing this blog series to assist you in developing a solid cloud tagging strategy that operates at scale, based on our years of experience developing tagging strategies from scratch at businesses ranging from startups to large enterprises. Here, you’ll learn how to develop a cloud tagging strategy that supports cost allocation in business and engineering contexts and how to implement this strategy throughout your teams to foster a culture for cloud cost ownership.

What is cloud tagging?

Tags are identifiers attached to resources that describe cloud resource ownership and association. Tags take the form of key-value pairs (yes, you guessed it, much like the fields of a JSON object!). They function much the same as real-world tags. For example:

  •   Labeling your kid’s belongings before they go to summer camp
  •   Adding a tag to your pet to identify/find them when lost
  •   Adding expiration dates to perishable foods
  •   If you look closely, your name is also a tag that identifies you!

In the cloud world, a lot of information can be used to describe resource ownership and association, in both technical and business terms. A single cloud resource can have multiple tags, each answering a question such as:

  •   What team is responsible for managing this resource?
  •   What customer does this resource support?
  •   What application does this resource support?
  •   Which environment or version of the application does this resource belong to?
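
As a minimal sketch, the tags on a single resource can be pictured as a small dictionary of key-value pairs, one per question above. The keys and values here are hypothetical examples, not a recommended taxonomy.

```python
# Hypothetical tag set for one cloud resource; keys and values are examples only.
resource_tags = {
    "team": "search-platform",      # who manages this resource?
    "customer": "acme-corp",        # which customer does it support?
    "application": "search-api",    # which application does it support?
    "environment": "prod",          # which environment does it belong to?
}


def describe(tags: dict[str, str]) -> str:
    """Render tags as sorted 'key=value' pairs for display or logging."""
    return ", ".join(f"{key}={value}" for key, value in sorted(tags.items()))


print(describe(resource_tags))
```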

Why do we need cloud tagging?

Just as tags can answer technical questions about specific cloud resources, tags can also help answer higher-level business questions about an organization’s strategies, plans, and return on investments.

Visibility: Reporting on cloud costs and fostering a culture of cloud cost awareness

To cultivate cost awareness, we must help engineers understand their costs. When we add tags to the resources of an internal service or product feature, the cost of those resources becomes visible to the engineers who own them. (We’ll discuss what tags to use across your cloud resources later in the blog series.) For example, if every resource carries a service_id tag whose value is the name of the owning team’s service, a team can view the cost of its service by filtering on its service_id tag value.
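
To make this concrete, here is a rough sketch of summing cost per service_id value. The line items are made-up data and cost_by_tag is a hypothetical helper; a real report would read your provider's billing export instead.

```python
from collections import defaultdict

# Hypothetical billing line items: (cost in USD, tags on the resource).
line_items = [
    (120.0, {"service_id": "checkout"}),
    (80.0, {"service_id": "search"}),
    (45.5, {"service_id": "checkout"}),
    (10.0, {}),  # a resource with no tags at all
]


def cost_by_tag(items, tag_key, missing="unallocated"):
    """Sum cost per value of tag_key; items lacking the tag go to `missing`."""
    totals = defaultdict(float)
    for cost, tags in items:
        totals[tags.get(tag_key, missing)] += cost
    return dict(totals)


print(cost_by_tag(line_items, "service_id"))
```

A team would then look up its own service_id value in the result to see what its service costs.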

Ownership: Support the finance team in producing precise budgets and forecasts

Tags can help build per-team or per-cost-center budgets, forecasts, and chargeback/showback models. Once tags are associated with engineering teams, finance can build budgets on top of individual team tags and notify teams when spending thresholds are reached. These models are fundamental to the FinOps practice.

Chargeback and showback models both depict costs at a product/team level. Chargeback sends expenses to a product/team P&L, and that team is accountable for the payments. In contrast, the showback model shows the charges by product/team, but payments come from a centralized budget.
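
The budget thresholds mentioned above could be checked along these lines. This is only a sketch: the team names, amounts, and the 80% default threshold are assumptions, and budget_alerts is a hypothetical helper rather than any cloud provider's API.

```python
def budget_alerts(spend_by_team, budget_by_team, threshold=0.8):
    """Return (team, fraction_of_budget_used) for teams at or past the threshold."""
    alerts = []
    for team, budget in budget_by_team.items():
        spent = spend_by_team.get(team, 0.0)
        if budget > 0 and spent >= threshold * budget:
            alerts.append((team, round(spent / budget, 2)))
    return sorted(alerts)


# Hypothetical month-to-date spend, grouped by a team tag, and monthly budgets.
spend = {"search": 900.0, "checkout": 300.0}
budgets = {"search": 1000.0, "checkout": 1000.0}
print(budget_alerts(spend, budgets))
```

The same per-team totals can feed either model: under chargeback they are billed to the team's P&L, under showback they are only reported.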

Unit economics: Measure what matters to your business

Tags play a vital role in measuring cloud unit economics, a system for objectively measuring how well your organization is performing against its FinOps goals and as a business in the market. (Unit economics is a topic in its own right, and we’ll talk about it in detail later in the series.) Typical metrics include profit margins, cost of goods sold (COGS), and customer lifetime value (LTV), but you might want to look at more specific measurements, for example:

  •   Costs per team/feature/service
  •   Costs per customer
  •   Costs per monthly active user (MAU)
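
A unit metric is simply a tagged cost total divided by a business volume. The sketch below assumes made-up monthly figures for one service; cost_per_unit is a hypothetical helper.

```python
def cost_per_unit(total_cost: float, units: int) -> float:
    """Divide a tagged cost total by a business volume such as customers or MAU."""
    if units <= 0:
        raise ValueError("units must be positive")
    return total_cost / units


# Hypothetical monthly figures for the resources behind one service tag.
monthly_cost_usd = 12_000.0
monthly_active_users = 48_000
print(cost_per_unit(monthly_cost_usd, monthly_active_users))
```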

Troubleshooting: Cost spike analysis and remediation

Tags aid first responders (FinOps engineers/cloud infra teams) in identifying where cost spikes originate, performing root cause analysis (RCA) on a particular service, and remediating the issue. Cost spikes can potentially burn thousands or millions of dollars before they are even identified. Having proper tags, and team-level budgets built on those tags, can help accelerate RCA and remediation.
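
One simple way to narrow down where a spike originates is to compare each service's latest daily cost with its recent average. This is a deliberately naive sketch with made-up numbers; the 2x factor is an arbitrary assumption, and real anomaly detection would usually account for trends and seasonality.

```python
from statistics import mean


def find_spikes(daily_cost_by_service, factor=2.0):
    """Flag services whose latest daily cost exceeds factor x the mean of prior days."""
    spikes = {}
    for service, costs in daily_cost_by_service.items():
        if len(costs) < 2:
            continue  # not enough history to compare against
        *history, latest = costs
        if latest > factor * mean(history):
            spikes[service] = latest
    return spikes


# Hypothetical daily costs in USD, grouped by a service tag (oldest first).
daily_costs = {
    "search": [100.0, 110.0, 90.0, 400.0],
    "checkout": [50.0, 55.0, 52.0, 60.0],
}
print(find_spikes(daily_costs))
```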

Cloud governance: Security enforcement

Tags can help you enforce security policies across a fleet of cloud resources. For example, suppose we have rolled out a tag called environment with the values dev, test, and prod. If a security patch needs to be applied to prod compute resources, automation scripts can select the resources tagged environment = “prod” and patch only those. Security tags can also record the severity level of cloud resources and add compliance labels.
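
The selection step of such a patching script might look like the sketch below. The inventory is hypothetical in-memory data; a real script would query your cloud provider for resources and then hand the selected IDs to your patching tool.

```python
# Hypothetical inventory of compute resources and their tags.
resources = [
    {"id": "vm-1", "tags": {"environment": "prod"}},
    {"id": "vm-2", "tags": {"environment": "dev"}},
    {"id": "vm-3", "tags": {}},
    {"id": "vm-4", "tags": {"environment": "prod"}},
]


def select_by_tag(resources, key, value):
    """Return the IDs of resources whose tag `key` equals `value`."""
    return [r["id"] for r in resources if r["tags"].get(key) == value]


patch_targets = select_by_tag(resources, "environment", "prod")
print(patch_targets)
```

Note that the untagged vm-3 is silently skipped, which is one reason tag coverage matters for security automation.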

Automation: Rightsizing and terminating

Tags are great for automation! You can write Lambda functions (AWS), or their equivalents in other clouds, that inspect tags and act on them to save costs. Once you have tagging enforcement in place, you can prune untagged or idle resources. (Consider mandating tags in Dev accounts; we’ll discuss these tactics later in the blog series.)
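
The decision logic inside such a function could be as small as the sketch below. The inventory records, the idle_days field, and the 30-day cutoff are all assumptions for illustration; the sketch only lists candidates and deletes nothing.

```python
# Hypothetical inventory records; a real script would fetch these from your provider.
inventory = [
    {"id": "vm-1", "tags": {"team": "search"}, "idle_days": 0},
    {"id": "vm-2", "tags": {}, "idle_days": 3},
    {"id": "vm-3", "tags": {"team": "ads"}, "idle_days": 45},
]


def prune_candidates(resources, max_idle_days=30):
    """Return (id, reason) pairs for resources that are untagged or idle too long."""
    candidates = []
    for resource in resources:
        if not resource["tags"]:
            candidates.append((resource["id"], "untagged"))
        elif resource["idle_days"] > max_idle_days:
            candidates.append((resource["id"], "idle"))
    return candidates


print(prune_candidates(inventory))
```

Producing a reviewable list first, rather than terminating resources directly, is a cautious default while teams are still adopting the tagging policy.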

What content goes into cloud tagging?

Think about the questions you or others at your organization will ask about your cloud resources and work backward from there. Who might be asking questions? What questions are they asking?

  •   Sr. Engineers/Tech Leads: Potential costs incurred for implementing OKRs/projects?
  •   Engineering Managers: Costs of applications?
  •   Product Managers: Profitability of products?
  •   Finance: Monthly costs and forecasts?
  •   Executives: Costs, profitability, forecasts, opportunities?

What information would you need to answer those questions? For example:

  •   How much was spent to support the customer we onboarded this month?
  •   What’s my spend in the EMEA region?
  •   What did it cost us in data transfer to add that new auto-complete capability to the search bar?
  •   Where can we save money on our cloud resources?
  •   Which of my applications is driving our cloud spend?

What are the challenges of cloud tagging?

Cloud computing environments can be extremely complex, with hundreds of thousands of resources in use across multiple cloud platforms. Just thinking about how to go about tagging all those resources can be stressful. Once you do dive in, it’s easy to get stuck in a loop of ideas about how to design and implement your tagging structure.

In any environment, though, relying on manual tagging does not scale. This is especially true in modern dynamic environments where new projects spin up daily, teams change, and resources are ephemeral. It can be like playing a never-ending game of “Whack-a-Mole” as your infrastructure evolves. Your engineers will not be happy: the more time they spend on tagging, the less time they spend on building great products and services.

Cloud tagging strategies and best practices

Here are a few tips to help get you oriented and ready to tackle your cloud tagging initiative. No matter what the current state of your tags today, it’s never too late to implement a thoughtful strategy that’s easy to adopt, enforce, and maintain.

Define the policy

  •   When devising your tagging taxonomy, start with how your business measures success and drill down into the teams, products, processes, and technologies that drive that success. What questions is the business trying to answer?
  •   Keep it simple at first; don’t go overboard. Settle on a few required tags, and have optional tags, too. (In the next blog, we’ll detail the tags that have worked at scale.)
  •   Define the key/value pair format. Keep it consistent, whether it is camelCase, all uppercase, or all lowercase.
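
Whichever format you choose, a small helper can convert incoming keys to it. The sketch below assumes lowercase snake_case as the chosen convention, which is just one option; normalize_key is a hypothetical helper.

```python
import re


def normalize_key(key: str) -> str:
    """Convert camelCase, spaces, or hyphens into lowercase snake_case."""
    # Insert an underscore at each lowercase/digit-to-uppercase boundary.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", key.strip())
    # Collapse runs of whitespace or hyphens into a single underscore.
    return re.sub(r"[\s\-]+", "_", spaced).lower()


for raw in ["serviceId", "Cost Center", "ENVIRONMENT", "cost-center"]:
    print(raw, "->", normalize_key(raw))
```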

Communicate the policy

  •   Share the reasons why tags are helpful; what are the benefits to the stakeholders? Go back to the questions the business may be asking.
  •   Post the policy centrally where all the teams implementing tagging will find it. Some teams use internal wikis, like Confluence or Notion.
  •   It works best when the tagging policy comes from a central infra or FinOps team.
  •   Give “tech talks” across engineering on how to roll out these tags to cloud resources and why they are needed; this helps speed up the rollout.
  •   Provide periodic updates to teams on their adherence to the policy.
  •   Appreciate engineering teams’ efforts. A small thank you goes a long way!
  •   Pro Tip: If you have a TPM org, involve those teams and get as much help from them as you can in rolling out these tagging policies across engineering.

Review the policy and your organization’s tag health regularly

  •   Start with soft enforcement. Hard enforcement—i.e., not allowing resource creation without tags—is not recommended at first, as this takes more of a “stick” approach and can increase friction between engineering and ops or finance teams.
  •   When you’re ready for hard enforcement, start mandating tags with Dev and Test teams, providing clear communication from Day 0.
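
The difference between soft and hard enforcement can be sketched as a single check with two modes. The required tag list is a hypothetical example, and check_tags is not a real provider API; in practice this logic would live in your provisioning pipeline or policy engine.

```python
REQUIRED_TAGS = ("team", "environment", "service_id")  # hypothetical policy


def check_tags(tags, mode="soft"):
    """Return missing required tags; in 'hard' mode, raise instead of reporting."""
    missing = [key for key in REQUIRED_TAGS if not tags.get(key)]
    if missing and mode == "hard":
        raise ValueError(f"resource creation blocked, missing tags: {missing}")
    return missing


# Soft enforcement: report the gap but let the resource through.
print(check_tags({"team": "search"}))
```

Starting in soft mode lets you report adherence to teams first, then switch Dev and Test to hard mode once the policy is well communicated.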

Automate tagging processes

  •   Messy and inconsistent tags can be cleaned up; mistagged resources can be retagged.
  •   Cloud resources with missing tags can be tagged based on other contextual data.
  •   New cloud resources can be tagged automatically based on defined policies.
  •   If taxonomies differ from cloud to cloud (e.g., AWS tags use different terms than Azure tags), tags can be “normalized” to bring multi-cloud costs into a single view.
  •   Tags can be grouped into hierarchies.
  •   Pro Tip: Use the AWS Resource Groups Tagging API to tag multiple resources with a single API call.
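
As an illustration of the normalization point above, the sketch below maps differently spelled keys from two clouds onto one canonical set. The alias table and sample tags are invented for the example and do not reflect any naming that AWS or Azure mandates.

```python
# Hypothetical alias table: provider- or team-specific keys -> canonical keys.
KEY_ALIASES = {
    "CostCenter": "cost_center",
    "cost-center": "cost_center",
    "Env": "environment",
    "env": "environment",
}


def normalize_tags(tags):
    """Map known key aliases to canonical keys and lowercase the values."""
    return {
        KEY_ALIASES.get(key, key): value.strip().lower()
        for key, value in tags.items()
    }


aws_tags = {"CostCenter": "CC-1042", "Env": "Prod"}
azure_tags = {"cost-center": "cc-1042", "env": "prod "}
print(normalize_tags(aws_tags) == normalize_tags(azure_tags))
```

Lowercasing values is a design choice that suits identifiers like these; skip it for values where case carries meaning.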

What about untaggable cloud resources?

Your tags will never be perfect, and you will likely always have some untagged or untaggable resources. A rudimentary way to deal with this is to allocate the cost of such services to teams or applications proportionally, or directly to a team in cases where a single team consumes the service. You could also pick an allocation metric that works for your organization. We’ve seen metrics like GB ingested per second and the amount of data processed by applications used to proportionally allocate untaggable costs, and they worked well!
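
Proportional sharing comes down to splitting a shared cost by each team's share of the chosen metric. Here is a minimal sketch with made-up usage numbers; allocate_shared_cost is a hypothetical helper, and the metric could be GB ingested or anything else that fits your organization.

```python
def allocate_shared_cost(shared_cost, usage_by_team):
    """Split shared_cost across teams in proportion to their usage metric."""
    total_usage = sum(usage_by_team.values())
    if total_usage <= 0:
        raise ValueError("total usage must be positive")
    return {
        team: shared_cost * usage / total_usage
        for team, usage in usage_by_team.items()
    }


# Hypothetical metric: GB ingested by each team's applications this month.
gb_ingested = {"search": 300, "checkout": 100, "ads": 100}
print(allocate_shared_cost(1000.0, gb_ingested))
```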

Even if a resource is untagged, you can still have visibility of its cost. Untagged resources can be grouped as “unallocated” for cost reporting until they can be tagged appropriately.

Read more in part 2

Continue reading Part 2 of this blog series, where we talk more about tagging best practices, offer ideas on what tags to use, and show how to implement tagging via Infrastructure as Code (IaC).
