AWS Cost Optimization and CI/CD Automation for Entertainment Software Platform

  • CI/CD Pipelines
  • Cloud Cost Optimization
  • Incident Management
  • Infrastructure Management
  • Monitoring
  • SRE
  • Technical Consulting
Client Background

Our client is a platform that connects artists from all over the globe with music curators. It promotes songs and artists while letting curators discover talented artists and earn rewards for exploring new music.

Business Challenge

The main business challenges were as follows:

1. Lack of transparency of processes in the team

The team had difficulty maintaining project visibility and making accurate decisions.

2. Unstable product with frequent downtime

The application experienced frequent downtimes, causing inconvenience to users and negatively impacting the overall user experience.

3. Postponed release schedule

The release schedule of the application had to be postponed multiple times due to various reasons, such as unforeseen technical issues, resource constraints, or changing business requirements. This delay affected the timely delivery of new features and hindered the ability to meet user expectations.

4. High, unoptimized infrastructure costs

The infrastructure costs associated with hosting and managing the application were high and not optimized. Inefficient resource allocation and improper scaling mechanisms led to unnecessary expenses, impacting the overall profitability of the project.

5. Inefficient allocation of project maintenance resources

There were challenges in efficiently allocating and managing project maintenance resources to ensure the right resources were available at the right time.

6. Non-scalable application

The infrastructure architecture was not designed to scale up or down with load, which put platform performance, availability, and self-healing at risk.

Regarding the technology stack, the main challenges were as follows:

The application was developed with Node.js and Express, an open-source server-side framework for web and mobile API applications. Node.js provides high performance for data-intensive applications that require real-time processing.

The application consists of two main parts: the backend and a cron server that runs periodic and routine background tasks. Both are hosted and deployed on AWS (Amazon Web Services).

The front end of the application is hosted as static content on AWS S3. This approach leverages the simplicity and cost-effectiveness of hosting static content on a scalable and globally distributed storage service like AWS S3.

Prior to collaborating with the Gart team of DevOps experts, the client’s solution was hosted on multiple EC2 machines and managed manually. The client had already chosen AWS as their hosting platform, and the Gart team stepped in to optimize and improve the deployment and management processes. 

The manual deployment process involved developers accessing the target server via SSH, fetching a branch from the repository, and triggering the build. In addition, the development and production Amazon EC2 machines were placed in the same account and the same virtual private cloud (VPC), so the two environments were not isolated from each other. This setup posed security and stability risks, as changes made in the development environment could affect the production environment.

Our customer had 4 different Amazon EC2 instances; each required the manual deployment procedure described earlier. This distributed workload increased complexity, maintenance efforts, and the costs of managing multiple server instances. Consolidating the deployment process and reducing the number of server instances would help streamline operations and reduce overhead.

The application was not containerized, meaning it lacked the isolation that containerization technologies like Docker provide. This absence of isolation between the development and production environments further increased the risk of configuration issues and compatibility problems when deploying the application. Containerization would offer improved deployment consistency, scalability, and ease of management.
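To make the containerized layout more concrete, here is a minimal sketch of how the workloads described above could be declared as Compose-style services. It is an illustration only: the service names, build paths, ports, environment variables, and the database image are assumptions, not details from the client's project.

```python
import json


def build_compose(db_image: str = "postgres:16") -> dict:
    """Return a Compose-style description of the workloads as a plain dict.

    Every name, path, port, and the database image is a hypothetical
    placeholder chosen for illustration.
    """
    node_env = {"NODE_ENV": "development", "DATABASE_HOST": "db"}
    return {
        "services": {
            "db": {"image": db_image},
            "backend": {
                "build": "./backend",
                "environment": dict(node_env),
                "ports": ["3000:3000"],
                "depends_on": ["db"],
            },
            "cron": {
                "build": "./cron-server",
                "environment": dict(node_env),
                "depends_on": ["db"],
            },
            "frontend": {
                "build": "./frontend",
                "ports": ["8080:80"],
                "depends_on": ["backend"],
            },
        }
    }


def startup_order(compose: dict) -> list[str]:
    """Order services so each one starts after everything it depends on."""
    services = compose["services"]
    ordered: list[str] = []

    def visit(name: str, seen: tuple[str, ...] = ()) -> None:
        if name in ordered:
            return
        if name in seen:
            raise ValueError(f"dependency cycle at {name}")
        for dep in services[name].get("depends_on", []):
            visit(dep, seen + (name,))
        ordered.append(name)

    for name in sorted(services):
        visit(name)
    return ordered


if __name__ == "__main__":
    compose = build_compose()
    print(json.dumps(compose, indent=2))
    print(startup_order(compose))
```

Because each workload is declared once with its own dependencies, the same description can back a developer's machine and a test environment, which is the consistency benefit containerization is expected to bring here.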

Our customer’s project solution architecture:

Solution

Prior to partnering with Gart, our customer defined the acceptance criteria for the services provided:

A) Introduce 4 types of environments:

  • Production environment – the same account as it is now, same VPC – don’t move production from the existing account
  • Create separate testing stage and dev environments: split workloads between production and non-production accounts. Spinning up a new development environment in AWS (including a new DB instance) should be reasonably quick
  • Local dev environment (create a local sandbox using e.g. docker-compose with all workloads: UI, backend, database, cron server). Cron servers on the old infrastructure should be switched off during migration to avoid duplicate job execution.

B) Create containers for:

  • Frontend application (React)
  • Backend application (Node.js)
  • Cron server application (Node.js)
  • Nest application (NestJS)

C) Introduce CI/CD pipelines for 4 containers (AWS ECS) for workloads
D) Introduce the job to run integration and UI tests
E) Introduce roles and policies for Developer and DevOps + viewer role
F) Introduce metrics and alerts for a cluster:

  • Memory utilization (with alert)
  • CPU utilization (with alert)
  • Disk space utilization (with alert)
  • Restarted containers (with alerts)
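As an illustration of how the four cluster metrics above could be turned into alerts, here is a minimal sketch of a threshold check. The metric keys and every threshold value are assumptions made for the example; the acceptance criteria name the metrics but not the limits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    metric: str
    threshold: float  # alert when the observed value is at or above this


# Hypothetical thresholds, chosen only to make the example concrete.
RULES = (
    AlertRule("memory_utilization_pct", 80.0),
    AlertRule("cpu_utilization_pct", 80.0),
    AlertRule("disk_utilization_pct", 85.0),
    AlertRule("container_restarts", 1.0),
)


def evaluate(sample: dict, rules: tuple = RULES) -> list[str]:
    """Return one alert message per rule that the sample breaches."""
    alerts = []
    for rule in rules:
        value = sample.get(rule.metric)
        if value is None:
            # A missing metric is flagged too: no data is not the same as healthy.
            alerts.append(f"{rule.metric}: no data")
        elif value >= rule.threshold:
            alerts.append(f"{rule.metric}: {value:g} >= {rule.threshold:g}")
    return alerts


if __name__ == "__main__":
    sample = {
        "memory_utilization_pct": 91.5,
        "cpu_utilization_pct": 40,
        "disk_utilization_pct": 85,
        "container_restarts": 0,
    }
    for alert in evaluate(sample):
        print(alert)
```

In a real setup this logic would normally live in the monitoring service's own alarm definitions rather than in application code; the sketch only shows the rule each alert encodes.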

G) Document the production deployment procedure as a guidebook.
H) Document the rollback procedure (in case of failure), which MUST include database restoration to the previous state (if database migration scripts were applied).
I) Achieve an acceptable level of Downtime on migration to new infrastructure – up to 1 hour.
J) Old servers should be switched off once workloads are migrated to the new infrastructure (switch from EC2 IP addresses to the ECS load balancer).
K) The QA team will accept the quality of the environment before making it publicly accessible.
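The requirement to restore the database whenever migration scripts were applied can be sketched as a small planning function. This is a hedged illustration, not the documented procedure: the `Release` fields, the step wording, and the idea of a snapshot taken before deployment are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Release:
    image_tag: str                              # hypothetical container image tag
    migrations_applied: bool                    # did this release change the DB schema?
    pre_deploy_snapshot: Optional[str] = None   # hypothetical snapshot identifier


def rollback_steps(failed: Release, previous_tag: str) -> list[str]:
    """List the recovery steps for a failed release, in execution order."""
    steps = [f"stop traffic to release {failed.image_tag}"]
    if failed.migrations_applied:
        # The database must return to its previous state before old code runs.
        if failed.pre_deploy_snapshot is None:
            raise ValueError("migrations were applied but no snapshot exists to restore")
        steps.append(f"restore database from snapshot {failed.pre_deploy_snapshot}")
    steps.append(f"redeploy image {previous_tag}")
    steps.append("run smoke tests before reopening traffic")
    return steps


if __name__ == "__main__":
    failed = Release("v2", migrations_applied=True, pre_deploy_snapshot="snap-before-v2")
    for number, step in enumerate(rollback_steps(failed, "v1"), start=1):
        print(number, step)
```

Raising an error when no snapshot exists makes the gap visible at planning time, which suggests taking a snapshot as a mandatory step before any deployment that runs migrations.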

Results

The Gart team successfully completed the project within the agreed time frame of 1.5 months. By partnering with DevOps professionals, the client's team achieved the following outcomes:

  • Infrastructure costs for computing resources were reduced by 10-15% by introducing containerization and 4 types of environments (Production, Dev, Local Dev environment, and Cron server)
  • Release management was automated by introducing a CI/CD process, minimizing the human factor
  • Downtime decreased from 1 hour to zero
  • Security was increased by following the AWS Well-Architected Framework for infrastructure
  • Unused servers were switched off after workload migration
  • Rollback procedures and disaster recovery documentation were prepared
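To put the 10-15% compute cost reduction in perspective, the arithmetic can be sketched as below. The monthly spend used in the example is a hypothetical figure, not the client's actual bill.

```python
def savings_range(monthly_compute_cost: float,
                  low_pct: float = 10.0,
                  high_pct: float = 15.0) -> tuple[float, float]:
    """Return the (low, high) monthly savings for a percentage reduction range."""
    if monthly_compute_cost < 0:
        raise ValueError("cost must not be negative")
    if not 0 <= low_pct <= high_pct <= 100:
        raise ValueError("percentages must satisfy 0 <= low <= high <= 100")
    low = monthly_compute_cost * low_pct / 100
    high = monthly_compute_cost * high_pct / 100
    return (round(low, 2), round(high, 2))


if __name__ == "__main__":
    # Hypothetical monthly compute spend in USD, for illustration only.
    low, high = savings_range(2000.0)
    print(f"monthly savings: {low:.2f} to {high:.2f} USD")
    print(f"annual savings:  {low * 12:.2f} to {high * 12:.2f} USD")
```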

Our cooperation has succeeded in meeting and exceeding the client's requirements. Our team resolved all of the client's problems with timely solutions and continues to provide SRE and technical support services.

Our customer gave us a 5-star review and left a testimonial on Clutch.

Cloud Consulting & Web Hosting for Social Networking Platform

Encounter similar challenges in your project? Book a consulting session with us!
