Client Background
Our customer, Appsurify.com, is a software development and testing company whose machine-learning-based products help other software development companies run their automated tests faster.
Business Challenge
The client's main challenge was high Microsoft Azure infrastructure costs. Even after optimizing on their side and reserving VMs for a year, they were still paying more than $10k per month for a single Kubernetes cluster. Their setup consisted of an Azure Kubernetes Service (AKS) cluster, a PostgreSQL server, and storage accounts.
Solution
Gart dedicated an Azure architect to work closely with the client’s team to analyze the current state of the cloud infrastructure, review the cost breakdown, and identify the root cause of the issue. After deep analysis, we discovered that most of the costs came from the Load Balancer and network traffic, even though there was no substantial incoming traffic from customers to the solution. Most of the traffic was internal, yet the client was being charged for it as external traffic.
The root cause lay in how the client's workloads accessed their data. Machine learning solutions process vast data sets to train their models. In our customer's case, the data was stored on a file share inside Azure, but the pods connected to the file share via its public endpoint, so all traffic to and from the file share was treated as external.
For instance, at peak load the client could generate 10 TB of traffic per day, costing the company up to $400 daily.
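To put these peak figures in perspective, the short Python sketch below derives the per-gigabyte rate they imply and what a month of sustained peak load would cost. It uses only the two numbers quoted above. The decimal terabyte (1 TB = 1,000 GB), the 30-day month, and the function names are assumptions made for illustration, and the result is an upper bound rather than the client's actual bill, since peak load is not reached every day.

```python
# Figures quoted in the case study: about 10 TB/day at peak, up to $400/day.
PEAK_TRAFFIC_TB_PER_DAY = 10
PEAK_COST_USD_PER_DAY = 400

# Assumption: decimal terabytes (1 TB = 1,000 GB).
GB_PER_TB = 1000


def implied_rate_per_gb(cost_usd, traffic_tb):
    """Return the effective price per GB implied by a daily cost and volume."""
    return cost_usd / (traffic_tb * GB_PER_TB)


def cost_over_days(cost_per_day, days=30):
    """Return the total cost if the same daily cost repeats for `days` days."""
    return cost_per_day * days


rate = implied_rate_per_gb(PEAK_COST_USD_PER_DAY, PEAK_TRAFFIC_TB_PER_DAY)
print(f"Implied rate: ${rate:.3f} per GB")
print(f"30 days at sustained peak: ${cost_over_days(PEAK_COST_USD_PER_DAY):,.0f}")
```

Even at a few cents per gigabyte, internal training traffic billed as external adds up quickly at this volume.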
We configured the file share with a private endpoint in the same VNet where the AKS nodes were deployed, so that all traffic was routed through the VNet as internal traffic and no longer affected the Load Balancer costs.
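One way to sanity-check a change like this from inside the cluster is to confirm that the file share's hostname now resolves to a private address. The Python sketch below is not the tooling used in the project. It is a minimal illustration that assumes DNS for the storage account is integrated with the private endpoint, and its helper names are hypothetical.

```python
import ipaddress
import socket


def is_private_ip(address):
    """Return True if the address falls in a private (non-public) range."""
    return ipaddress.ip_address(address).is_private


def resolved_addresses(hostname):
    """Resolve a hostname and return the set of IP addresses it maps to."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    return {info[4][0] for info in infos}


def resolves_privately(hostname):
    """Return True only if every resolved address is private."""
    addresses = resolved_addresses(hostname)
    return bool(addresses) and all(is_private_ip(a) for a in addresses)
```

Run from a pod or node in the VNet, a call such as `resolves_privately("<storage-account>.file.core.windows.net")` (with the real account name substituted for the placeholder) should return `True` once the private endpoint is in place. A `False` result suggests the pods are still reaching the share over its public address.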