Cloud Cost Anomaly Detection at Zoom

Cloud Cost Anomaly Detection Saved the Day, the Month, at Zoom

Managing cloud cost is a full-time job, but no single person in an organization can do it alone. Cloud FinOps roles and teams are popping up everywhere to drive cloud cost accountability at a cultural level. When they get the monthly bill from their cloud provider(s), they want to see progress. That could be measured in dollars, i.e., a lower cloud bill. More often, in growing SaaS companies, the cloud bill grows with the business, so progress might simply mean narrowing the variance between forecasted and actual spend.

The last thing anyone wants, though, is to be shocked when that monthly bill comes—and that’s what Zoom Video Communications managed to avoid earlier this year.

It was a Friday afternoon, mid-month, at Zoom. Yotascale’s cloud cost management product detected a $20,000 daily cost spike that quickly grew to $40,000. A cloud cost analyst at Zoom received an alert from Yotascale pinpointing the source of the anomaly and reasons for the increased cost. In this case, it came down to an unintentional misconfiguration made by a single engineer.

Here’s the thing: If the anomaly had gone undetected until the monthly cloud bill arrived, Zoom would have had to pay an extra $40K per day for 15 days. That’s a whopping $600,000 in unanticipated cloud cost—even more, if you consider that the bill usually arrives on the second or third day of the following month, and then it takes extra time to manually review the bill. Talk about shock.
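The arithmetic behind that figure is simple enough to sketch in a few lines of Python. The $40K daily overage and the 15 days come from the story above; the function name and the three-day billing delay are illustrative assumptions, not figures from Zoom or Yotascale.

```python
def unplanned_spend(daily_overage, days_undetected):
    """Total extra spend if a daily overage runs for the given number of days."""
    return daily_overage * days_undetected


# Figures from the story: an extra $40K per day for 15 days.
end_of_month = unplanned_spend(40_000, 15)

# Assumption for illustration: the bill arrives 3 days into the next month,
# and the overage keeps running until then.
with_bill_delay = unplanned_spend(40_000, 15 + 3)

print(f"${end_of_month:,}")     # $600,000
print(f"${with_bill_delay:,}")  # $720,000
```

Every extra day of delay adds another $40K, which is why the time it takes to review the bill by hand matters too.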

Thankfully, that didn’t happen. The analyst immediately worked with the engineer responsible to remediate the issue and saved Zoom from a runaway cloud cost catastrophe. Nice work, team!

What lessons can we learn here? I can think of a few.

Automation is your ally.
If you don’t have an automated system in place to alert you to these types of cost spikes in near-real time, you are doing yourself and your organization a disservice. Set up thresholds and use advanced tools to catch more nuanced anomalies as well as potential budget overruns. Be proactive now, so you can react faster the moment a cloud cost anomaly is detected. Waiting until the cloud bill comes—and the money is already spent—is a losing battle.
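To make the idea of a threshold concrete, here is a minimal sketch of a daily spike check. It is not how Yotascale detects anomalies; it is a simple trailing-average rule. The function name, the seven-day window, the $10K threshold, and the sample costs are all hypothetical.

```python
from statistics import mean


def find_cost_spikes(daily_costs, window=7, threshold=10_000):
    """Flag days whose cost exceeds the trailing-window average by more than threshold.

    Returns a list of (day_index, cost, baseline) tuples.
    """
    spikes = []
    for i in range(window, len(daily_costs)):
        # Baseline is the average of the previous `window` days.
        baseline = mean(daily_costs[i - window:i])
        if daily_costs[i] - baseline > threshold:
            spikes.append((i, daily_costs[i], baseline))
    return spikes


# Hypothetical data: a steady $100K per day, then jumps of $20K and $40K.
costs = [100_000] * 7 + [120_000, 140_000]

for day, cost, baseline in find_cost_spikes(costs):
    print(f"day {day}: ${cost:,.0f} vs. baseline ${baseline:,.0f}")
```

A fixed threshold like this catches large jumps but misses slow drift, and the trailing average creeps upward once a spike has run for several days. That is the gap the more advanced tools mentioned above are meant to cover.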

Discounts aren’t everything.
While it is true that the lion’s share of cloud cost savings typically comes from commitment-based discount programs, like Reserved Instances, Savings Plans, and Enterprise Discount Programs, this story shows that cloud cost anomaly detection can do its fair share in preventing waste. Think of it as another leg of the cloud cost management stool.

Engineers need cloud cost visibility.
Zoom has a centralized FinOps team and, in this case, the alert was sent to the cloud cost analyst by design. But if the engineer had received the alert directly, even more time and money could have been saved. Since this incident, Zoom has given all engineers access to Yotascale. The company also updated its cloud governance policies to prevent this type of incident from recurring.

“It’s great knowing we can count on Yotascale to spot these kinds of anomalies as they happen, so we can stop cost leakage from growing into a huge loss,” said Nick Konstantinou, senior cloud cost analyst at Zoom. “Not only that, but we can take the learnings and improve our cloud governance policies organization-wide. Moving forward, every engineer at Zoom will have access to Yotascale to help us contain costs and be even more efficient in the cloud.”

This story had a happy ending. Of course, no one wants to waste money. But as a FinOps practitioner, I’d much rather tell the CFO that we went over our monthly cloud budget by $60K than by $600K. Wouldn’t you?

In use at Zoom since 2019, Yotascale helped Zoom grow their cloud infrastructure efficiently as remote work skyrocketed during the pandemic. Read the case study for more.

 

 
