How DevOps and SRE Practices Can Ensure Project Scalability for Your Business

DevOps

SRE

Roman Burdiuzha

Cloud Architecture Expert, Co-founder & CTO of Gart

April 20, 2025

Table of contents

Understanding Scalability
Scaling for Success: The Proven Path to Revenue Growth and Cost Savings
How DevOps and SRE Practices Enable Scalability
DevOps vs. SRE: Complementary Strengths for Scaling
How Gart Solutions Drives Scalable Performance

Is your software ready for growth, or will it crumble under pressure?

Businesses are under immense pressure to innovate and grow. While technology is the backbone of these advancements, understanding its intricacies can be a daunting task for non-technical business owners. This is especially true when it comes to complex concepts like scalability.

Scalability is the ability of a system to handle increasing workloads and user demands. Without it, businesses risk experiencing slow performance, system crashes, and ultimately, lost customers. It’s the difference between a website that can handle a sudden surge in traffic during a holiday sale and one that crashes under the pressure.

This is where the disciplines of DevOps and Site Reliability Engineering (SRE) come into play. These complementary practices, which have gained significant traction in the tech industry, offer a roadmap for ensuring the scalability and resilience of your digital projects without sacrificing reliability.

This guide dives into how scaling delivers business ROI, the practices that make it possible, and the strategic partnership Gart Solutions provides.

Understanding Scalability

Pilots are easy, but scaling up is hard

Scalability is simply the ability of a system to grow and handle increased demand. Imagine a small restaurant that becomes incredibly popular. If it can’t expand its kitchen or seating, it will struggle to serve more customers. A scalable restaurant, on the other hand, can adjust its operations to accommodate the growing crowd.

The consequences of poor scalability can be dire for your business. Imagine your company’s website grinding to a halt during a major marketing campaign, frustrating potential customers and causing them to abandon their shopping carts or search for your competitors. Or consider the impact of a critical business application crashing under the strain of increased usage, leading to lost productivity, missed deadlines, and dissatisfied clients.

The damage goes beyond lost customers and revenue: a system that can’t handle increased demand also hurts a company’s reputation. Major online retailers like Amazon and ticket sales platforms have invested heavily in scalability to prevent these issues during peak periods. They understand that a seamless customer experience is crucial to their success.

Benefits of scalability in cloud computing

Scaling for Success: The Proven Path to Revenue Growth and Cost Savings

Recent research from the Boston Consulting Group (BCG) has shed light on the tangible business benefits of scaling digital solutions. The study, which covered approximately 2,000 global companies, found that scaling individual digital solutions can generate revenue increases of 9% to 25% and cost savings of 8% to 28% compared to the relevant baseline (Exhibits 2 and 3 of the BCG study).

But the real game-changer emerges when companies scale several digital solutions across the enterprise. In these cases, the research indicates that organizations can achieve an enterprise-wide revenue increase of almost 17%, along with a 17% reduction in costs.

Individual digital solutions saw 9–25% revenue growth and 8–28% cost savings.

Enterprise-wide scaling resulted in ~17% revenue increase and ~17% cost reduction.

The advantages of scaling digital solutions extend beyond just the financial bottom line. Businesses that successfully scale their digital capabilities also experience qualitative benefits, such as:

Reimagined customer experiences that drive loyalty and satisfaction
Greater ability to integrate digital and data ecosystems for competitive advantage
Stronger business resilience and adaptability to market changes
More inclusive and diverse workplaces that foster innovation

How DevOps and SRE Practices Enable Scalability

How exactly do DevOps and SRE make a system scalable? It’s a valid question, and one that deserves a clear, practical explanation. Let’s explore the key ways these complementary disciplines can future-proof your technology investments.

Automation

One of the core principles of DevOps is the automation of repetitive tasks, such as software deployment, infrastructure provisioning, and testing. By automating these processes, you can significantly reduce the time and effort required to scale your project. Imagine being able to spin up new servers or deploy the latest version of your application with just a few clicks – that’s the power of DevOps automation.

Infrastructure as Code (IaC)

DevOps and SRE emphasize the use of IaC, where your infrastructure is defined and managed using code, rather than manual, error-prone processes. This approach makes it much easier to replicate and scale your infrastructure as your business grows. It’s like having a digital blueprint that you can use to quickly and consistently build out new environments.
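To make the "digital blueprint" idea concrete, here is a minimal Python sketch of the declarative principle behind IaC: you describe the desired state, and a tool works out the change. It is an illustration only, not how Terraform or any specific tool is implemented; the `Environment` fields, the `plan` function, and the region and instance names are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Environment:
    """Declarative description of one environment (hypothetical fields)."""
    name: str
    instance_type: str
    instance_count: int
    region: str


def plan(desired: Environment, actual_count: int) -> str:
    """Compare the desired state with what is running and describe the change."""
    diff = desired.instance_count - actual_count
    if diff > 0:
        return f"{desired.name}: add {diff} x {desired.instance_type}"
    if diff < 0:
        return f"{desired.name}: remove {-diff} x {desired.instance_type}"
    return f"{desired.name}: no changes"


# One blueprint, reused for a larger environment by changing only what differs.
staging = Environment("staging", "small", 2, "eu-west-1")
production = replace(staging, name="production", instance_count=6)
```

Because the environment is data kept in version control, scaling production becomes a reviewed one-line change rather than a manual procedure.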

Continuous Integration and Continuous Deployment (CI/CD)

DevOps practices like CI/CD help to automate the entire build, test, and deployment pipeline. This means that changes to your codebase can be quickly and reliably rolled out to production, supporting faster iterations and scalability. Imagine being able to launch new features or updates without the risk of lengthy downtime or service disruptions.
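The core control flow of a build, test, and deploy pipeline can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the stage names and the `run_pipeline` helper are hypothetical, and real CI/CD platforms add isolation, caching, and rollback on top of this idea.

```python
from typing import Callable

# A stage is a name plus a step that returns True on success.
Stage = tuple[str, Callable[[], bool]]


def run_pipeline(stages: list[Stage]) -> list[str]:
    """Run stages in order and stop at the first failure,
    so a broken build never reaches the deploy stage."""
    log = []
    for name, step in stages:
        ok = step()
        log.append(f"{name}: {'ok' if ok else 'FAILED'}")
        if not ok:
            break
    return log
```

The early exit is the important design choice: every change takes the same path, and a failing test blocks the release automatically.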

Monitoring and Observability

SRE places a strong emphasis on monitoring and observability, which are essential for understanding the health and performance of your digital systems. By implementing robust monitoring tools and practices, you can quickly identify bottlenecks, performance issues, and other problems that may arise as you scale your project. This allows you to address challenges proactively, rather than waiting for your customers to experience the impact.
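As a rough illustration of what monitoring tools compute behind their dashboards, the sketch below derives two common indicators from raw request data: a latency percentile and a server-side error rate. The function names are hypothetical, and the nearest-rank percentile method is one of several definitions in use.

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least
    pct percent of all samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def error_rate(status_codes: list[int]) -> float:
    """Share of requests that failed on the server side (HTTP 5xx)."""
    if not status_codes:
        return 0.0
    failures = sum(1 for code in status_codes if code >= 500)
    return failures / len(status_codes)
```

Tracking a high percentile rather than the average matters when scaling, because a small share of slow requests is often the first visible sign of a bottleneck.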

Scalable Architecture

DevOps and SRE encourage the adoption of scalable architectural patterns, such as microservices, serverless, and cloud-native approaches. These modern architectural styles make it much easier to scale individual components of your project independently, rather than having to scale the entire system at once. It’s like building with Lego blocks – you can add or remove pieces as needed without disrupting the whole structure.

Read more: Cloud Scalability: Horizontal vs. Vertical Scaling of IT Infrastructures

Capacity Planning

SRE practices include proactive capacity planning, where you continuously monitor and forecast the resource requirements of your system. This allows you to scale your infrastructure and resources ahead of time, so that sudden spikes in demand do not cause performance issues or service disruptions.
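A very simple form of forecasting is to fit a straight line through recent usage and estimate when it will reach capacity. The sketch below assumes evenly spaced measurements (for example, peak requests per second per week) and a linear trend, which is a simplification; the function names are hypothetical.

```python
import math
from typing import Optional


def linear_trend(history: list[float]) -> tuple[float, float]:
    """Least-squares line through (0, y0), (1, y1), ...; returns (slope, intercept)."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two data points")
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(history))
    var = sum((x - mean_x) ** 2 for x in range(n))
    slope = cov / var
    return slope, mean_y - slope * mean_x


def periods_until_full(history: list[float], capacity: float) -> Optional[int]:
    """Periods after the latest measurement until the trend line reaches capacity.
    Returns None when usage is flat or falling."""
    slope, intercept = linear_trend(history)
    if slope <= 0:
        return None
    latest = len(history) - 1
    reach = (capacity - intercept) / slope
    return max(0, math.ceil(reach - latest))
```

With usage of 100, 120, 140, and 160 and a capacity of 200, the trend reaches the limit two periods after the last measurement, which is the lead time available to add resources.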

Incident Response and Resilience

DevOps and SRE focus on building resilient systems that can withstand failures and recover quickly. This includes implementing practices like chaos engineering, incident response, and self-healing mechanisms. By making your digital solutions more robust and reliable, you can ensure that they continue to function smoothly even as you scale to meet growing demands.
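A self-healing mechanism can be as simple as a loop that runs health checks and restarts whatever fails. The following Python sketch shows one pass of such a loop; the `heal` function, the service names, and the `restart` callback are hypothetical stand-ins for what an orchestrator or supervisor would provide.

```python
from typing import Callable


def heal(
    services: dict[str, Callable[[], bool]],
    restart: Callable[[str], None],
) -> list[str]:
    """Run each health check once and restart every service that is unhealthy.
    Returns the names of the services that were restarted."""
    restarted = []
    for name, is_healthy in services.items():
        try:
            healthy = is_healthy()
        except Exception:
            healthy = False  # a crashing health check counts as unhealthy
        if not healthy:
            restart(name)
            restarted.append(name)
    return restarted
```

In practice this pass would run on a schedule, and repeated restarts of the same service would escalate to a human through the incident response process.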

DevOps vs. SRE: Complementary Strengths for Scaling

Aspect	DevOps	SRE
Approach	Culture + automation tools	Reliability engineering with metrics
Scalability Enablement	CI/CD, IaC	Capacity planning, error budgets, resiliency
Goal	Fast, consistent releases	Reliable operation during growth
Focus	Development process optimization	System availability and error management
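The error budgets mentioned in the table can be made concrete with a little arithmetic: an availability target leaves a fixed amount of permitted downtime per period. The sketch below assumes a time-based availability SLO and a 30-day window; the function names are hypothetical.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability SLO over the window."""
    if not 0 < slo < 1:
        raise ValueError("slo must be between 0 and 1, e.g. 0.999")
    return (1 - slo) * window_days * 24 * 60


def budget_remaining(slo: float, downtime_minutes: float, window_days: int = 30) -> float:
    """Fraction of the error budget still unspent; negative means overspent."""
    budget = error_budget_minutes(slo, window_days)
    return (budget - downtime_minutes) / budget
```

A 99.9% target over 30 days allows roughly 43 minutes of downtime. While budget remains, teams can release quickly; once it is spent, the focus shifts to reliability work.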

By adopting these DevOps and SRE practices, you can unlock the true scalability of your digital projects, empowering your business to adapt and thrive in the face of changing market conditions and customer needs. It’s a strategic investment that will pay dividends for years to come.

Key considerations for scalability:

Vertical scaling: Increasing resources of existing hardware (e.g., CPU, RAM).
Horizontal scaling: Adding more servers or instances to distribute the load.
Load balancing: Distributing incoming traffic across multiple servers.
Caching: Storing frequently accessed data for faster retrieval.
Database optimization: Improving database performance to handle increased data volume.
Cloud computing: Leveraging elastic resources for on-demand scalability.
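Two of the considerations above, load balancing and caching, are easy to illustrate in code. The sketch below shows round-robin distribution across servers and an in-memory cache built on Python's standard `functools.lru_cache`; the class name, server names, and the `product_details` lookup are hypothetical.

```python
from functools import lru_cache
from itertools import cycle


class RoundRobinBalancer:
    """Hand out servers in rotation so each one receives an equal share of requests."""

    def __init__(self, servers: list[str]):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = cycle(servers)

    def next_server(self) -> str:
        return next(self._servers)


@lru_cache(maxsize=1024)
def product_details(product_id: int) -> str:
    """Stand-in for a slow database lookup; repeated calls with the same
    product_id are answered from memory instead of hitting the database."""
    return f"product-{product_id}"
```

Horizontal scaling then amounts to adding another name to the server list, while the cache reduces how often each server needs to reach the database at all.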

Understanding your business needs is the first step. What challenges are you facing? Are you looking to accelerate development, improve system reliability, or optimize costs? Having a clear picture of your requirements will help you find a partner that aligns with your objectives.

The capacity to scale your digital solutions is no longer a nice-to-have – it’s a strategic imperative. The companies that master this art will be well-positioned to outpace the competition, capitalize on growth opportunities, and future-proof their success.

The choice is clear: you can continue to rely on outdated, manually intensive processes that put your business at risk of performance issues, service disruptions, and lost revenue, or you can invest in the proven practices that will transform your digital operations and position your company for sustainable growth.

How Gart Solutions Drives Scalable Performance

Gart combines consulting and hands-on delivery across:

Automation services: IaC with Terraform, CI/CD pipelines
Observability platforms: Prometheus, Grafana, CloudWatch setups
Architecture design: Microservices, container orchestration (ECS/EKS)
Capacity forecasting: Scaling planning, cloud resource optimization
Incident readiness: Auto‑remediation, runbook development, SRE coaching

Scale your business without limits. Contact Gart today.

FAQ

What is horizontal vs vertical scaling?

Horizontal scaling adds more machines; vertical adds power (CPU/RAM) to existing ones. Both approaches serve different growth needs.

How does Infrastructure as Code support scalability?

IaC ensures infrastructure is repeatable, version-controlled, and quickly deployable—ideal for creating scalable environments.

Why is monitoring critical when scaling?

Monitoring catches bottlenecks and performance issues early, making scaling proactive rather than reactive.

How do DevOps and SRE contribute to project scalability?

DevOps and SRE practices contribute to scalability by automating processes, improving collaboration, enhancing monitoring and alerting, and fostering a culture of continuous improvement and rapid iteration.

What are some key DevOps practices that enhance scalability?

Key DevOps practices include continuous integration/continuous deployment (CI/CD), infrastructure as code, automated testing, and microservices architecture.

How does SRE impact business operations?

SRE improves business operations by focusing on system reliability, automating manual tasks, implementing effective monitoring, and using error budgets to balance innovation with stability.

Can small businesses benefit from DevOps and SRE practices?

Yes, businesses of all sizes can benefit from DevOps and SRE practices. These methodologies can be scaled and adapted to fit the needs and resources of smaller organizations.

What tools are commonly used in DevOps and SRE?

Common tools include version control systems (like Git), CI/CD platforms (like Jenkins or GitLab), configuration management tools (like Ansible or Puppet), and monitoring solutions (like Prometheus or Grafana).

Cloud

DevOps

Cost-Effectiveness: The Path to Sustainable DevOps and Cloud Solutions

Fedir Kompaniiets

February 26, 2025

Cost-effectiveness in DevOps and cloud strategy isn’t about finding the cheapest provider; it's about building scalable, sustainable, and efficient systems that reduce total cost of ownership while supporting long-term business growth.

What Does Cost-Effectiveness Mean in DevOps and Cloud?

Cost-effectiveness in this context refers to balancing investment with long-term value, not cutting corners. Instead of opting for the cheapest service or tool available, it’s about making strategic decisions that improve performance, reliability, and scalability over time.

Too often, organizations assume cutting IT spend or chasing free cloud credits is “efficient.” But this can backfire when hidden costs, performance bottlenecks, or non-scalable infrastructure come into play.

Why the Cheapest Option Isn’t Always the Best Long-Term Choice

There are cloud startup programs, but it's essential to approach them carefully. Often, businesses make mistakes in network design and services while using free cloud credits, leading to significant additional infrastructure costs once the free period ends.

One startup leveraged free credits from the Google Cloud Startup Program to quickly build its product. However, when the free period ended, they faced crippling infrastructure costs due to a lack of optimization.

Check this case study: DevOps for Microsoft HoloLens Application Run on GCP

Summary: Choosing the lowest-cost IT or cloud option often leads to technical debt, downtime, and scalability issues, costing more in the long run.

While it's tempting to lean into "free tiers" and minimal upfront expenses, these choices frequently come with hidden costs:

Limited functionality
Lack of support or SLAs
High overage charges after trial periods end

At Gart Solutions, we promote a sustainable approach that maximizes ROI while aligning with business goals, ensuring that every IT dollar contributes to performance, stability, and growth.

Sustainable IT Cost Reductions vs. Short-Term Cuts

Summary: Cutting costs for immediate savings often leads to long-term inefficiencies. True cost-effectiveness means aligning IT spending with business strategy and future-readiness.

In economic downturns, it’s natural for CIOs and IT leaders to seek cost savings. But reckless budget slashing can do more harm than good.

Avoid These 3 Common Mistakes:

Short-term focus: Cutting across the board can hinder future growth and innovation.
Overreliance on consultants: Consultants often suggest low-hanging fruit, leaving limited potential for long-term savings.
Neglecting stakeholders: Ignoring the impact of IT cuts on business operations can damage relationships and hinder outcomes.

Our Strategy for Cost-Effective DevOps and Cloud Solutions

Summary: We combine smart savings with strategic investments, helping clients avoid over-engineering while investing wisely in scalable, future-ready infrastructure.

Not every component of your infrastructure needs premium tools or enterprise licenses. At Gart Solutions, we guide clients through intelligent decision-making:

Where to optimize for cost (e.g., Spot VMs, autoscaling, open-source tools)
Where to invest for growth (e.g., security, automation, compliance tooling)

Our goal: make sure every dollar contributes to uptime, user experience, or innovation.

By carefully analyzing your needs and implementing smart strategies, we ensure that you're getting the most out of your IT investments. This approach not only reduces waste but also ensures that every dollar spent contributes directly to your business goals.

Read more: 20 Easy Ways to Optimize Expenses on AWS and Save Over 80% of Your Budget

Strategic Product Design as a Foundation for Cost Savings

The cornerstone of our cost-effective approach is strategic product design. We focus on laying down the right basic architecture from the start, emphasizing long-term stability and scalability.
This ensures that your IT solutions can adapt and grow with your business without encountering major issues or requiring extensive reworks.

Our solutions are designed with your future in mind. We create systems that can scale seamlessly as your business grows, allowing you to manage costs effectively at every stage of your journey.

One of the key benefits of our approach is the ability to avoid future technological problems related to growth, migration, or other common challenges. This forward-thinking approach prevents the need for costly overhauls down the line and provides a stable foundation for your ongoing success.

Case Study: Azure Spot VMs for Jewelry AI Vision

In one example, we helped a visual AI platform for the jewelry industry cut cloud costs by 81% using Azure Spot VMs. By redesigning workloads for elasticity and resilience, we optimized compute consumption without compromising performance.

Lesson: Design choices made early unlock compounding savings over time.

Check this cost optimization case study: Cutting Costs by 81%: Azure Spot VMs Drive Cost Efficiency for Jewelry AI Vision.

Understanding Cloud Costs in DevOps: OpEx vs. CapEx

Summary: DevOps-related cloud costs fall into two main categories: Operational Expenses (OpEx) and Capital Expenses (CapEx). Knowing the difference helps you budget and optimize more effectively.

Operational Expenses (OpEx)

OpEx refers to ongoing costs of running DevOps workloads in the cloud, such as:

Cloud instance runtime (compute)
Storage usage
Managed services (like databases or monitoring tools)
Traffic and bandwidth

These costs are typically pay-as-you-go and vary month-to-month.

Capital Expenses (CapEx)

CapEx refers to one-time or upfront investments, such as:

Reserved cloud capacity (e.g., AWS Reserved Instances)
On-premise infrastructure purchases
Software licenses or setup fees

Choosing CapEx can reduce monthly spending, but it requires commitment and forecasting.
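The trade-off between pay-as-you-go and an upfront commitment comes down to a break-even calculation. The sketch below is a simplified model with hypothetical prices; real provider pricing has more dimensions (term length, payment options, region), so treat the numbers as placeholders.

```python
def break_even_hours(
    on_demand_hourly: float,
    reserved_upfront: float,
    reserved_hourly: float,
) -> float:
    """Hours of use after which the upfront commitment becomes cheaper
    than paying the on-demand rate."""
    saving_per_hour = on_demand_hourly - reserved_hourly
    if saving_per_hour <= 0:
        raise ValueError("reserved hourly rate must be lower than the on-demand rate")
    return reserved_upfront / saving_per_hour


# Hypothetical prices: 0.50 per hour on demand versus 500 upfront plus 0.25 per hour.
hours = break_even_hours(0.50, 500.0, 0.25)
```

With these example prices the commitment pays off after 2,000 hours of use, which is why the forecasting mentioned above matters: a workload that runs less than that is cheaper on demand.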
What is FinOps and Why Does It Matter in Cost Optimization

Summary: FinOps (Financial Operations) is a framework that brings financial discipline into DevOps, ensuring cloud spending is aligned with business value and usage.

Defining FinOps in Simple Terms

FinOps helps teams:

Understand where cloud dollars are going
Predict costs before deploying
Optimize spend without stalling innovation

It's the bridge between engineering, finance, and operations.

Why FinOps is a Game-Changer

In traditional IT, budgets are fixed. But in the cloud, expenses are variable and usage-driven. That makes cost control harder, unless teams actively manage and monitor costs.

FinOps brings visibility and accountability across:

Engineers (who build infrastructure)
Finance teams (who manage budgets)
Product managers (who track business value)

Key FinOps Practices:

Real-time cloud cost reporting
Cost forecasting by team/project
Tagging resources for accountability
Optimization sprints focused on spend reduction

FinOps, or Financial Operations, is an evolving cloud financial management discipline that brings financial accountability to the variable spend model of cloud, enabling distributed teams to make business trade-offs between speed, cost, and quality.

How We Integrate FinOps Into Our DevOps Services

At Gart Solutions, we bake FinOps principles directly into our DevOps pipelines, so clients gain both infrastructure automation and cost control from day one.

Our FinOps Integration Approach Includes:

Cloud cost dashboards visible to stakeholders
Automated alerts for budget thresholds
Resource tagging and cost attribution per environment
Collaboration between engineers and finance on priorities

At Gart Solutions, we integrate FinOps practices into our DevOps and cloud services to further enhance cost-effectiveness and sustainability.
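Two of the practices above, cost attribution by tag and budget threshold alerts, can be sketched in a few lines of Python. This is an illustration under stated assumptions: the shape of the billing line items, the `team` tag, and the 80% alert threshold are hypothetical and do not reflect any specific provider's billing export.

```python
from collections import defaultdict


def cost_by_tag(line_items: list[dict], tag: str) -> dict[str, float]:
    """Sum cost per value of the given tag; resources without the tag
    are grouped under 'untagged' so they stay visible."""
    totals: dict[str, float] = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(tag, "untagged")
        totals[owner] += item["cost"]
    return dict(totals)


def over_threshold(
    totals: dict[str, float],
    budgets: dict[str, float],
    threshold: float = 0.8,
) -> list[str]:
    """Names whose spend has reached the given share of their budget."""
    return sorted(
        name
        for name, spent in totals.items()
        if name in budgets and spent >= threshold * budgets[name]
    )
```

Reporting the "untagged" bucket explicitly is a deliberate choice: unattributed spend is usually the first thing a FinOps review needs to shrink.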
Case Studies: Cost-Effective DevOps in Action

Case Study 1: DevOps for Microsoft HoloLens Application on GCP

Challenge: A startup used Google Cloud's free startup credits to launch an ambitious product. But when the credits expired, they faced massive costs due to inefficient network design and a lack of resource planning.

Solution: Gart audited the infrastructure, implemented CI/CD pipelines, and restructured the architecture to reduce dependency on costly services.

Outcome:

48% reduction in monthly infrastructure spend
Improved performance and deployment speed
A scalable setup ready for product launch

Lesson: Free credits can create hidden risks. A strategic DevOps partner can turn short-term wins into sustainable growth.

Case Study 2: Cutting 81% Cloud Costs with Azure Spot VMs for AI Vision

Challenge: A jewelry AI startup faced high compute bills due to heavy visual processing and machine learning workloads.

Solution: Gart moved workloads to Azure Spot VMs, refactored pipelines for fault tolerance, and automated cost monitoring.

Outcome:

81% reduction in compute costs
Zero downtime during migration
Flexible scaling for future growth

Lesson: Cost savings don’t require cutting features, just smart architecture.

Long-Term Benefits of a Cost-Effective DevOps Strategy

Summary: Sustainable DevOps isn’t just about saving money now. It helps your business scale smarter, reduce risk, and outperform competitors over time.

1. Lower Total Cost of Ownership (TCO)

You avoid patchwork fixes, re-platforming, and costly downtime. Efficient systems cost less to operate over years, not just months.

2. Greater Reliability

Fewer outages. Better performance. Happier users. And less stress for your team.

3. Future-Proof Architecture

With scalable infrastructure, your systems evolve with your needs, not against them.

4. Better Use of Internal Resources

Your team focuses on innovation instead of fixing things or firefighting budget issues.
DevOps Cost Decision Table: Cheap vs Sustainable

Understanding the difference between cost-cutting and cost-effectiveness is key. Here’s a side-by-side comparison that outlines why strategic investment outperforms bargain-basement decisions over time.

Criteria	Cheap DevOps Solution	Sustainable DevOps Solution
Initial Cost	Low upfront spend	Moderate, aligned with needs and future goals
Scalability	Poor – requires rebuild	Built to scale
Compliance Readiness	Lacks safeguards	Aligned with HIPAA, GDPR, etc.
Maintenance & Support	Limited or absent	Included, proactive monitoring
Total Cost Over 12–24 Months	High due to technical debt and rework	Lower due to long-term savings
Business Impact	Risk of downtime, slower innovation	Faster delivery, greater stability

Conclusion: The sustainable path pays off, not just financially, but in operational resilience, scalability, and growth enablement.

Cost Optimization Checklist for IT Leaders

Use this checklist to review your DevOps and cloud setup for waste, inefficiencies, and untapped savings.

✅ Infrastructure & Cloud Usage

Are we using reserved instances or spot pricing effectively?
Are workloads appropriately sized and scheduled?
Are we auto-scaling based on demand?

✅ Monitoring & Observability

Do we track cloud costs by team or project?
Are alert thresholds in place for spending anomalies?
Are we logging usage by service tags?

✅ DevOps & Automation

Are pipelines automated to prevent manual errors?
Are we deploying only what’s needed with IaC?
Are environments automatically shut down when idle?

✅ FinOps & Financial Governance

Do we review cloud spend weekly or monthly?
Are budgets and forecasts visible to Dev and Finance?
Have we assigned ownership for each cloud resource?

Conclusion

Sustainable DevOps isn't about spending less; it’s about spending smarter. At Gart Solutions, we believe that true cost-effectiveness is about creating sustainable, high-quality solutions that provide long-term value.
By focusing on strategic design, smart resource utilization, and future-proofing your systems, we help you build a robust IT infrastructure that supports your business goals while keeping costs under control.

At Gart Solutions, our mission is to help you achieve IT sustainability and financial efficiency together. Let’s build something that lasts, without overextending your budget.

Remember, indiscriminate cost-cutting can do more harm than good. A well-planned approach focused on long-term value is key to achieving sustainable IT cost reductions.

DevOps

SRE

SRE Monitoring: Golden Signals as a Key Metrics for System Reliability

Fedir Kompaniiets

February 9, 2025

Site Reliability Engineering (SRE) focuses on keeping services reliable and scalable. A crucial part of this discipline is monitoring, which is where the concept of Golden Signals comes into play. By focusing on just four “Golden Signals,” organizations can cut their incident response time in half.

Golden Signals help teams quickly identify and diagnose issues within a system. This post explores how SRE teams use these metrics (latency, errors, traffic, and saturation) to drive reliability and streamline troubleshooting in complex microservices environments.

What are the four golden signals in SRE?

SRE principles streamline monitoring by focusing on four key metrics (latency, errors, traffic, and saturation), collectively known as Golden Signals. Instead of tracking numerous metrics across different technologies, focusing on these four metrics helps in quickly identifying and resolving issues.

Latency: Latency is the time it takes for a request to travel from the client to the server and back. High latency can cause a poor user experience, making it critical to keep this metric in check. For example, in web applications, latency might typically range from 200 to 400 milliseconds. Latency under 300 ms ensures good user experience; errors >1% necessitate investigation. Latency monitoring helps detect slowdowns early, allowing for quick corrective action.

Errors: Errors refer to the rate of failed requests. Monitoring errors is essential because not all errors have the same impact. For instance, a 500 error (server error) is more severe than a 400 error (client error) because the former often requires immediate intervention. Identifying error spikes can alert teams to underlying issues before they escalate into major problems.

Traffic: Traffic measures the volume of requests coming into the system. Understanding traffic patterns helps teams prepare for expected loads and identify anomalies that might indicate issues such as DDoS attacks or unplanned spikes in user activity.
For example, if your system is built to handle 1,000 requests per second and suddenly receives 10,000, this surge might overwhelm your infrastructure if not properly managed.

Saturation: Saturation is about resource utilization; it shows how close your system is to reaching its full capacity. Monitoring saturation helps avoid performance bottlenecks caused by overuse of resources like CPU, memory, or network bandwidth. Think of it like a car's tachometer: once it redlines, you're pushing the engine too hard, risking a breakdown.

Challenges associated with monitoring saturation in microservices:

Complexity of Microservice Architectures: In microservice environments, various services are often built on different technologies (e.g., Node.js, databases, Swift). Each service may handle resource usage differently, making it challenging to monitor and understand overall system saturation accurately. Saturation occurs when resources such as CPU, memory, or network bandwidth are fully utilized, leading to degraded performance.

Resource Utilization Visibility: Since each microservice can have its unique metrics, gaining a clear view of overall saturation is difficult. Teams need to aggregate and standardize data from multiple services to accurately assess saturation levels. This can be time-consuming and requires expertise across different technology stacks.

Identification of Bottlenecks: Saturation often results in bottlenecks where some services are overloaded while others are underutilized. Pinpointing which service is causing the bottleneck in a complex system can be difficult without a cohesive monitoring approach like the one provided by SRE Golden Signals.

Dynamic and Variable Loads: In microservice architectures, traffic and resource demands can fluctuate rapidly, making it essential to monitor saturation in real-time.
Services must adapt to changes in load, but without proper monitoring, it's easy to miss critical saturation points that can impact overall system performance.

Why Golden Signals Matter

Golden Signals provide a comprehensive overview of a system's health, enabling SREs and DevOps teams to be proactive rather than reactive. By continuously monitoring these metrics, teams can spot trends and anomalies, address potential issues before they affect end-users, and maintain a high level of service reliability.

SRE Golden Signals help in proactive system monitoring

SRE Golden Signals are crucial for proactive system monitoring because they simplify the identification of root causes in complex applications. Instead of getting overwhelmed by numerous metrics from various technologies, SRE Golden Signals focus on four key indicators: latency, errors, traffic, and saturation.

By continuously monitoring these signals, teams can detect anomalies early and address potential issues before they affect the end-user. For instance, if there is an increase in latency or a spike in error rates, it signals that something is wrong, prompting immediate investigation.

What are the key benefits of using "golden signals" in a microservices environment?

The "golden signals" approach is especially beneficial in a microservices environment because it provides a simplified yet powerful framework to monitor essential metrics across complex service architectures. Here’s why this approach is effective:

▪️Focuses on Key Performance Indicators (KPIs)
By concentrating on latency, errors, traffic, and saturation, the golden signals let teams avoid the overwhelming and often unmanageable task of tracking every metric across diverse microservices. This strategic focus means that only the most crucial metrics impacting user experience are monitored.
▪️Enhances Cross-Technology Clarity
In a microservices ecosystem where services might be built on different technologies (e.g., Node.js, DB2, Swift), using universal metrics minimizes the need for specific expertise. Teams can identify issues without having to fully understand the intricacies of every service’s technology stack.

▪️Speeds Up Troubleshooting
Golden signals quickly highlight root causes by filtering out non-essential metrics, allowing the team to narrow down potential problem areas in a large web of interdependent services. This is crucial for maintaining service uptime and a seamless user experience.

By applying these golden signals, SRE teams can efficiently diagnose and address issues, keeping complex applications stable and responsive.

How to Monitor Microservices Using Golden Signals

Monitoring microservices requires a streamlined approach, especially in environments where dozens (or hundreds) of services interact across various technology stacks. Golden Signals provide a clear, focused framework for tracking system health across these distributed systems.

1. Start by Defining What You’ll Monitor

Each microservice should have its own observability pipeline for:

Latency – Measure the time it takes for a request to be processed from start to finish.
Errors – Capture both 4xx and 5xx HTTP codes or application-level exceptions.
Traffic – Monitor request rates (RPS/QPS) and message throughput.
Saturation – Track CPU, memory, thread usage, and queue lengths.

Tip: Integrate these signals into SLIs (Service Level Indicators) and SLOs (Service Level Objectives) to measure system reliability over time.

2. Use Unified Observability Tools

Deploy tools that allow you to collect metrics, logs, and traces across all services. Popular platforms include:

Datadog and New Relic: Full-stack observability with built-in Golden Signals support.
Prometheus + Grafana: Open-source, highly customizable metrics + dashboards.
OpenTelemetry: Instrument code once to collect traces, metrics, and logs.

3. Isolate Service Boundaries

Microservices should expose telemetry endpoints (e.g., /metrics for Prometheus or OpenTelemetry exporters). Group Golden Signals by service for clarity:

Microservice	Latency	Error Rate	Traffic	Saturation
Auth	220ms	1.2%	5k RPS	78% CPU
Payments	310ms	3.1%	3k RPS	89% Memory

4. Correlate Signals with Tracing

Use distributed tracing to map requests across services. Tools like Jaeger or Zipkin help you:

Trace latency across hops
Find the exact service causing spikes in error rates
Visualize traffic flows and bottlenecks

5. Automate Alerting with Context

Set thresholds and anomaly detection for each signal:

Latency > 500ms? Alert DevOps
Saturation > 90%? Trigger autoscaling
Error Rate > 2% over 5 mins? Notify engineering and create an incident ticket

How can the "one-hop dependency view" assist in troubleshooting?

The "one-hop dependency view" in application performance monitoring (APM) simplifies troubleshooting by focusing only on the services that directly impact the affected service. Here’s how it helps:

▪️Reduces Investigation Scope
Rather than analyzing the entire microservices topology, the one-hop view narrows the scope to immediate dependencies. This selective approach allows engineers to focus on the most likely sources of issues, saving time in identifying the root cause.

▪️Streamlines Root-Cause Analysis
By examining only the services one level away, the team can apply the golden signals (latency, errors, traffic, saturation) to detect any anomalies quickly. If a direct dependency is experiencing problems, it becomes immediately apparent without unnecessary complexity.

▪️Decreases Mean-Time-to-Recovery (MTTR)
With fewer services to investigate, the MTTR is significantly reduced. Engineers can identify and address the root issue faster, minimizing downtime and maintaining the application’s reliability.
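The one-hop idea combines naturally with the alert thresholds listed earlier (latency above 500 ms, error rate above 2%, saturation above 90%). The following Python sketch checks only the direct dependencies of an affected service against those thresholds. It is a simplified illustration, not how any APM product implements the view; the `Signals` class, the function names, and the "checkout" and "ledger" services are hypothetical, while the Auth and Payments figures come from the example table. Traffic is left out because the text gives no threshold for it.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    latency_ms: float
    error_rate_pct: float
    saturation_pct: float


def breaches(s: Signals) -> list[str]:
    """Names of the signals that exceed the alert thresholds."""
    found = []
    if s.latency_ms > 500:
        found.append("latency")
    if s.error_rate_pct > 2:
        found.append("errors")
    if s.saturation_pct > 90:
        found.append("saturation")
    return found


def one_hop_suspects(
    service: str,
    dependencies: dict[str, list[str]],
    signals: dict[str, Signals],
) -> dict[str, list[str]]:
    """Inspect only the direct dependencies of the affected service
    and report those with at least one breached signal."""
    suspects = {}
    for dep in dependencies.get(service, []):
        hit = breaches(signals[dep]) if dep in signals else []
        if hit:
            suspects[dep] = hit
    return suspects
```

In the usage below, "ledger" is in worse shape than "payments", but it is two hops from "checkout", so the one-hop view deliberately leaves it for a second step once "payments" has been examined.

```python
deps = {"checkout": ["auth", "payments"], "payments": ["ledger"]}
current = {
    "auth": Signals(220, 1.2, 78),
    "payments": Signals(310, 3.1, 89),
    "ledger": Signals(900, 5.0, 95),
}
suspects = one_hop_suspects("checkout", deps, current)
```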
Using the one-hop dependency view helps SRE teams keep the troubleshooting process efficient, especially in complex, interdependent service ecosystems.

Practical Application: Using APM Dashboards

Application Performance Management (APM) dashboards integrate Golden Signals into a single view, allowing teams to monitor all critical metrics at once. For example, the operations team can use APM dashboards to get insights into latency, errors, traffic, and saturation. This holistic view simplifies troubleshooting and reduces the mean time to resolution (MTTR).

Here's how they work together:

▪️ Centralized Monitoring with APM Dashboards: APM tools provide dashboards that centralize the key Golden Signals: latency, errors, traffic, and saturation. This centralized view allows operations and development teams to monitor the health of their applications in real time. By displaying these critical metrics in one place, APM tools simplify the identification of performance issues, making it easier to spot trends and anomalies that need attention.

▪️ "One Hop" Dependency Views: APM tools often support a "one hop" dependency view, which shows only the immediate downstream services connected to a problematic service. This feature is particularly useful in complex microservice environments where pinpointing the root cause of an issue can be daunting. By focusing on immediate dependencies, teams can quickly assess which services are functioning within normal parameters and which are experiencing issues, thereby speeding up the troubleshooting process.

▪️ Proactive Issue Detection and Resolution: Integrating Golden Signals into APM tools allows for proactive monitoring, where issues can be identified before they escalate into more serious problems. For example, if a service's saturation levels begin trending upwards, the APM tool can alert the team before users experience degraded performance.
This proactive approach helps reduce the mean time to resolution (MTTR) and improves overall service reliability.

▪️ Customization for Different Teams: APM tools can be customized for different stakeholders within the organization. While the operations team may focus on all four Golden Signals, development teams might create specialized dashboards that prioritize the signals most relevant to their services. This tailored approach ensures that both dev and ops teams are aligned and can address issues quickly, often even before they impact the end users.

In essence, the integration of SRE Golden Signals with APM tools empowers teams to maintain high levels of service performance and reliability by providing clear, actionable insights into the most critical aspects of their systems.

What is the significance of distinguishing 500 vs. 400 errors in SRE monitoring?

The distinction between 500 and 400 errors in SRE monitoring is crucial because it impacts how issues are prioritized and addressed. Here's a breakdown:

| Error Type | Cause | Severity | Response |
| --- | --- | --- | --- |
| 500 (server-side issue) | System/app failure | High | Immediate investigation |
| 400 (client-side request issue) | Bad input/auth | Lower | Monitor trends only |

500 Errors (Server Errors)

These indicate serious problems on the server side, such as downtime or crashes. They require immediate attention because they prevent users from getting a valid response from the service, often resulting in significant disruptions. For instance, a 500 error signals that something is failing within the server's infrastructure, meaning the request cannot be fulfilled no matter how the client phrases it. Therefore, these errors are more critical in incident response and may trigger alerts for the SRE team.

400 Errors (Client Errors)

These typically indicate client-side issues, where a request is invalid or needs adjustment, like when the requested resource doesn't exist or is restricted.
Such errors might be resolved simply by retrying or by the client correcting the request, so they're usually less urgent. Monitoring 400 errors can still reveal trends or user behavior that may require attention, but they rarely indicate systemic issues.

In summary, recognizing the difference allows SREs to prioritize resources on issues that directly affect the system's reliability and availability (like 500 errors) versus issues that may just need minor adjustments or retries.

SRE Monitoring Dashboard Best Practices

A well-structured SRE dashboard makes or breaks your incident response. It's not just about displaying data; it's about surfacing the right insights at the right time. Here's how to do it:

1. Prioritize Golden Signals Above All

Place latency, errors, traffic, and saturation front and center. Avoid clutter: these four are your frontline defense against performance issues.

Example Layout:
Top row: Latency (P50/P95), Error Rate (%), Traffic (RPS), Saturation (CPU, Memory)
Second row: SLIs, SLO burn rates, alerts over time

2. Use Visual Cues Effectively

Color code thresholds: green (healthy), yellow (warning), red (critical)
Sparklines for trend visualization
Heatmaps to spot saturation across clusters or zones

3. Break Down by Environment & Service

Segment dashboards by:
Environment (prod, staging, dev)
Service or team ownership
Availability zone or region

This helps you quickly isolate issues when incidents arise.

4. Integrate Logs and Traces

Link metrics to logs or traces:
Click on a spike in latency → see related trace in Jaeger or logs in Kibana
Integrate dashboards with alert management (PagerDuty, Opsgenie)

5. Provide Different Views for Different Teams

SRE/DevOps view: Full stack overview + real-time alerts
Engineering view: Deep dive into a specific service's metrics
Management view: SLO dashboards and service health summaries

Use templating (in Grafana or Datadog) so one dashboard serves multiple roles.

6. Regularly Review & Evolve Dashboards

Prune unused panels or metrics
Reassess thresholds quarterly
Add annotations for incidents or deployments

Dashboards should be living documents, not static reports. Learn from the official Google documentation.

Conclusion

Ready to take your system's reliability and performance to the next level? Gart Solutions offers top-tier SRE Monitoring services to ensure your systems are always running smoothly and efficiently. Our experts can help you identify and address potential issues before they impact your business, ensuring minimal downtime and optimal performance.

Discover how Gart Solutions can enhance your system's reliability today! See our IT Monitoring case studies (Monitoring Solution for a B2C SaaS Music Platform and Advanced Monitoring for Digital Landfill Management) to learn more about our SRE Monitoring expertise. After implementing Golden Signals, our customer reduced MTTR by 60% in under two months.

https://youtu.be/BqPXUxhshTM?si=EWFFu0JNYgJCj7g0

20 Easy Ways to Optimize Expenses on AWS and Save Over 80% of Your Budget

Fedir Kompaniiets

January 13, 2025

In my experience optimizing cloud costs, especially on AWS, I often find that many quick wins are in the "easy to implement - good savings potential" quadrant. That's why I've decided to share some straightforward methods for optimizing expenses on AWS that will help you save over 80% of your budget.

Choose reserved instances

Potential Savings: Up to 72%

Choosing reserved instances means committing to a one- to three-year term, with full, partial, or no upfront payment, in exchange for a discount. While planning for a year is often deemed long-term for many companies, especially in Ukraine, reserving resources for 1-3 years carries risks but comes with the reward of a maximum discount of up to 72%.

You can check all the current pricing details on the official website - Amazon EC2 Reserved Instances

Purchase Savings Plans (Instead of On-Demand)

Potential Savings: Up to 72%

There are three types of Savings Plans: Compute Savings Plan, EC2 Instance Savings Plan, and SageMaker Savings Plan.

AWS Compute Savings Plan is an Amazon Web Services option that allows users to receive discounts on computational resources in exchange for committing to a specific amount of usage over a defined period (one or three years). This plan offers flexibility in utilizing various computing services, such as EC2, Fargate, and Lambda, at reduced prices.

AWS EC2 Instance Savings Plan offers discounted rates exclusively for EC2 instances. The discount applies to a specific instance family within a chosen region, regardless of instance size, operating system, or tenancy.

AWS SageMaker Savings Plan allows users to get discounts on SageMaker usage in exchange for committing to a specific amount of usage over a defined period (one or three years).

The discount is available for one and three years with the option of full upfront, partial upfront, or no upfront payment.
The EC2 Instance Savings Plan can help save up to 72%, but it applies exclusively to EC2 instances.

Utilize Various Storage Classes for S3 (Including Intelligent-Tiering)

Potential Savings: 40% to 95%

AWS offers numerous options for storing data at different access levels. For instance, S3 Intelligent-Tiering automatically stores objects at three access levels: one tier optimized for frequent access, a 40% cheaper tier optimized for infrequent access, and a 68% cheaper tier optimized for rarely accessed data (e.g., archives).

S3 Intelligent-Tiering has the same price per 1 GB as S3 Standard: $0.023 USD.

However, the key advantage of Intelligent-Tiering is its ability to automatically move objects that haven't been accessed for a specific period to lower access tiers.

After 30, 90, and 180 days without access, Intelligent-Tiering automatically shifts an object to the next access tier, potentially saving companies from 40% to 95%. This means that for certain objects (e.g., archives), it may be appropriate to pay only $0.0125 USD per 1 GB or $0.004 per 1 GB compared to the standard price of $0.023 USD.

See the Amazon S3 pricing page for current details.

AWS Compute Optimizer

Potential Savings: quite significant

The AWS Compute Optimizer dashboard is a tool that lets users assess and prioritize optimization opportunities for their AWS resources.

The dashboard provides detailed information about potential cost savings and performance improvements, as the recommendations are based on an analysis of resource specifications and usage metrics.

The dashboard covers various types of resources, such as EC2 instances, Auto Scaling groups, Lambda functions, Amazon ECS services on Fargate, and Amazon EBS volumes.

For example, AWS Compute Optimizer reports underutilized or overutilized resources allocated for ECS Fargate services or Lambda functions. Regularly keeping an eye on this dashboard can help you make informed decisions to optimize costs and enhance performance.
Use Fargate in EKS for underutilized EC2 nodes

If your EKS nodes aren't fully used most of the time, it makes sense to consider using Fargate profiles. With AWS Fargate, you pay for the specific amount of memory and CPU your pod requests, rather than paying for an entire EC2 virtual machine.

For example, let's say you have an application deployed in a Kubernetes cluster managed by Amazon EKS (Elastic Kubernetes Service). The application experiences variable traffic, with peak loads during specific hours of the day or week (like a marketplace or an online store), and you want to optimize infrastructure costs. To address this, create a Fargate Profile that defines which pods should run on Fargate, then configure the Kubernetes Horizontal Pod Autoscaler (HPA) to automatically scale the number of pod replicas based on their resource usage (such as CPU or memory usage).

Manage Workload Across Different Regions

Potential Savings: significant in most cases

When handling workload across multiple regions, it's crucial to consider various aspects such as cost allocation tags, budgets, notifications, and remediation.

Cost Allocation Tags: Classify and track expenses based on different labels like program, environment, team, or project.
AWS Budgets: Define spending thresholds and receive notifications when expenses exceed set limits. Create budgets specifically for your workload or allocate budgets to specific services or cost allocation tags.
Notifications: Set up alerts when expenses approach or surpass predefined thresholds. Timely notifications help take actions to optimize costs and prevent overspending.
Remediation: Implement mechanisms to rectify expenses based on your workload requirements. This may involve automated actions or manual interventions to address cost-related issues.
Regional Variances: Consider regional differences in pricing and data transfer costs when designing workload architectures.
Reserved Instances and Savings Plans: Utilize reserved instances or savings plans to achieve cost savings.
AWS Cost Explorer: Use this tool for visualizing and analyzing your expenses. Cost Explorer provides insights into your usage and spending trends, enabling you to identify areas of high costs and potential opportunities for cost savings.

Transition to Graviton (ARM)

Potential Savings: Up to 30%

Graviton utilizes Amazon's server-grade ARM processors developed in-house. The new processors and instances prove beneficial for various applications, including high-performance computing, batch processing, electronic design automation (EDA), multimedia encoding, scientific modeling, distributed analytics, and machine learning inference on processor-based systems.

The processor family is based on ARM architecture, likely functioning as a system on a chip (SoC). This translates to lower power consumption costs while still offering satisfactory performance for the majority of clients. Key advantages of AWS Graviton include cost reduction, low latency, improved scalability, enhanced availability, and security.

Spot Instances Instead of On-Demand

Potential Savings: Up to 30%

Utilizing spot instances is essentially a resource exchange. When Amazon has surplus resources lying idle, you can set the maximum price you're willing to pay for them. The catch is that if there are no available resources, your requested capacity won't be granted.

There is also a risk that if demand suddenly surges and the spot price exceeds your set maximum price, your spot instance will be terminated.

Spot instances operate like an auction, so the price is not fixed. We specify the maximum we're willing to pay, and AWS determines who gets the computational power. If we are willing to pay $0.1 per hour and the market price is $0.05, we will pay exactly $0.05.

Use Interface Endpoints or Gateway Endpoints to save on traffic costs (S3, SQS, DynamoDB, etc.)
Potential Savings: Depends on the workload

Interface Endpoints operate based on AWS PrivateLink, allowing access to AWS services through a private network connection without going through the internet. Gateway Endpoints, which are available for S3 and DynamoDB, route traffic to those services through your VPC route tables at no additional charge. Either option can reduce traffic costs when accessing services like Amazon S3, Amazon SQS, and Amazon DynamoDB from your Amazon Virtual Private Cloud (VPC), mainly by keeping that traffic off NAT gateways.

Key points:

Amazon S3: With a Gateway Endpoint (or an Interface Endpoint) for S3, you can privately access S3 buckets without paying NAT gateway data processing charges for that traffic.
Amazon SQS: Interface Endpoints for SQS enable secure interaction with SQS queues within your VPC, again bypassing the NAT gateway.
Amazon DynamoDB: Using a Gateway Endpoint for DynamoDB, you can access DynamoDB tables from your VPC without additional data processing charges.

Additionally, Interface Endpoints allow private access to AWS services using private IP addresses within your VPC, eliminating the need for internet gateway traffic. Keep in mind that Interface Endpoints themselves are billed per hour and per GB processed, so compare that against your current NAT gateway costs.

Optimize Image Sizes for Faster Loading

Potential Savings: Depends on the workload

Optimizing image sizes can help you save in various ways.

Reduce ECR Costs: By storing smaller images, you can cut down expenses on Amazon Elastic Container Registry (ECR).
Minimize EBS Volumes on EKS Nodes: Keeping smaller volumes on Amazon Elastic Kubernetes Service (EKS) nodes helps in cost reduction.
Accelerate Container Launch Times: Faster container launch times ultimately lead to quicker task execution.

Optimization Methods:

Use the Right Image: Employ the most efficient image for your task; for instance, Alpine may be sufficient in certain scenarios.
Remove Unnecessary Data: Trim excess data and packages from the image.
Multi-Stage Image Builds: Utilize multi-stage image builds by employing multiple FROM instructions.
Use .dockerignore: Prevent the addition of unnecessary files by employing a .dockerignore file.
Reduce Instruction Count: Minimize the number of instructions, as each RUN, COPY, or ADD instruction adds a layer to the image. Group commands using the && operator.
Layer Ordering: Move frequently changing layers to the end of the Dockerfile so that earlier layers stay cached.

These optimization methods can contribute to faster image loading, reduced storage costs, and improved overall performance in containerized environments.

Use Load Balancers to Save on IP Address Costs

Potential Savings: depends on the workload

Since February 2024, AWS has billed for each public IPv4 address. Employing a load balancer can help save on IP address costs by using a shared IP address, multiplexing traffic between ports, applying load balancing algorithms, and handling SSL/TLS.

By consolidating multiple services and instances under a single IP address, you can achieve cost savings while effectively managing incoming traffic.

Optimize Database Services for Higher Performance (MySQL, PostgreSQL, etc.)

Potential Savings: depends on the workload

AWS provides default settings for databases that are suitable for average workloads. If a significant portion of your monthly bill is related to AWS RDS, it's worth paying attention to parameter settings related to databases.

Some of the most effective settings may include:

Use Database-Optimized Instances: For example, instances in the R5 or X1 class are optimized for working with databases.
Choose Storage Type: General Purpose SSD (gp2) is typically cheaper than Provisioned IOPS SSD (io1/io2).
AWS RDS Storage Auto Scaling: Automatically increase storage size as demand grows, so you don't have to over-provision up front.

If you can optimize the database workload, it may allow you to use smaller instance sizes without compromising performance.
Regularly Update Instances for Better Performance and Lower Costs

Potential Savings: Minor

As Amazon deploys new servers in its data centers to provide resources for running more customer instances, these new servers come with the latest hardware, typically better than previous generations. Usually, the latest two to three generations are available. Make sure you update regularly to effectively utilize these resources.

Take general-purpose M-family instances, for example, and compare how the price changes from one generation to the next. Regular updates can ensure that you are using resources efficiently.

| Instance | Generation | Description | On-Demand Price (USD/hour) |
| --- | --- | --- | --- |
| m6g.large | 6th | Instances based on ARM processors offer improved performance and energy efficiency. | $0.077 |
| m5.large | 5th | General-purpose instances with a balanced combination of CPU and memory, designed to support high-speed network access. | $0.096 |
| m4.large | 4th | A good balance between CPU, memory, and network resources. | $0.1 |
| m3.large | 3rd | One of the previous generations, less efficient than m5 and m4. | Not available |

Use RDS Proxy to reduce the load on RDS

Potential for savings: Low

RDS Proxy is used to relieve the load on servers and RDS databases by reusing existing connections instead of creating new ones. Additionally, RDS Proxy improves failover when a standby node is promoted to primary.

Imagine you have a web application that uses Amazon RDS to manage the database. This application experiences variable traffic intensity, and during peak periods, such as advertising campaigns or special events, it undergoes high database load due to a large number of simultaneous requests.

During peak loads, the RDS database may encounter performance and availability issues due to the high number of concurrent connections and queries. This can lead to delays in responses or even service unavailability.
RDS Proxy manages connection pools to the database, significantly reducing the number of direct connections to the database itself.

By efficiently managing connections, RDS Proxy provides higher availability and stability, especially during peak periods.

Using RDS Proxy reduces the load on RDS, which may let you run a smaller instance and so reduce costs.

Define the storage policy in CloudWatch

Potential for savings: depends on the workload, could be significant.

The storage policy in Amazon CloudWatch determines how long data should be retained in CloudWatch Logs before it is automatically deleted.

Setting the right storage policy is crucial for efficient data management and cost optimization. While the "Never expire" option is available, it is generally not recommended for most use cases due to potential costs and data management issues.

Typically, best practice involves defining a specific retention period based on your organization's requirements, compliance policies, and needs.

Avoid leaving the retention period undefined unless there is a specific reason. Setting one is already a saving.

Configure AWS Config to monitor only the events you need

Potential for savings: depends on the workload

AWS Config allows you to track and record changes to AWS resources, helping you maintain compliance, security, and governance.

AWS Config provides compliance reports based on rules you define. You can access these reports on the AWS Config dashboard to see the status of tracked resources.

You can set up Amazon SNS notifications to receive alerts when AWS Config detects non-compliance with your defined rules. This can help you take immediate action to address the issue. By configuring AWS Config with only the rules and resources you need to monitor, you can efficiently manage your AWS environment, maintain compliance requirements, and avoid paying for rules you don't need.
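As a rough illustration of limiting AWS Config to selected resources, the Python sketch below builds a configuration recorder definition that records only the listed resource types. It assumes the boto3 SDK and its `put_configuration_recorder` call; the recorder name, the role ARN, and the helper names `build_recorder` and `apply_recorder` are hypothetical placeholders, so adapt them to your account before use.

```python
def build_recorder(name, role_arn, resource_types):
    """Build a recorder definition that records only the given resource types."""
    if not resource_types:
        raise ValueError("list at least one resource type to record")
    return {
        "name": name,
        "roleARN": role_arn,
        "recordingGroup": {
            # Turn off "record everything" so only the listed types are tracked.
            "allSupported": False,
            "includeGlobalResourceTypes": False,
            "resourceTypes": sorted(set(resource_types)),
        },
    }


def apply_recorder(recorder):
    """Send the definition to AWS Config (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without the SDK

    boto3.client("config").put_configuration_recorder(
        ConfigurationRecorder=recorder
    )


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    recorder = build_recorder(
        name="default",
        role_arn="arn:aws:iam::123456789012:role/example-config-role",
        resource_types=["AWS::EC2::Instance", "AWS::S3::Bucket"],
    )
    print(recorder)
    # apply_recorder(recorder)  # uncomment to apply against a real account
```

Keeping the builder separate from the API call makes it easy to review the resource list, for example in a pull request, before anything changes in the account.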
Use lifecycle policies for S3 and ECR

Potential for savings: depends on the workload

S3 allows you to configure automatic deletion of individual objects or groups of objects based on specified conditions and schedules. You can set up lifecycle policies for objects in each specific bucket.

By creating transition policies using S3 Lifecycle, you can define the lifecycle of your objects and reduce storage costs.

These transition policies are defined by object age. You can specify a policy for the entire S3 bucket or for specific prefixes. Lifecycle transitions are not free: each transition is billed as a request.

By configuring a lifecycle policy for ECR, you can avoid unnecessary expenses on storing Docker images that you no longer need.

Switch to using GP3 storage type for EBS

Potential for savings: 20%

By default, AWS creates gp2 EBS volumes, but it's almost always preferable to choose gp3, the latest generation of EBS volumes, which provides more IOPS by default and is cheaper.

For example, in the us-east-1 region, the price for a gp2 volume is $0.10 per gigabyte-month of provisioned storage, while for gp3, it's $0.08 per gigabyte-month. If you have 5 TB of EBS volumes on your account, you can save about $100 per month (5,000 GB x $0.02) by simply switching from gp2 to gp3.

Switch the format of public IP addresses from IPv4 to IPv6

Potential for savings: depending on the workload

Since February 1, 2024, AWS has charged for each public IPv4 address at a rate of $0.005 per IP address per hour. For example, 100 public IP addresses on EC2 x $0.005 per public IP address per hour x 730 hours = $365.00 per month.

While this figure might not seem huge (without tying it to the company's capabilities), it can add up to significant network costs. Thus, the optimal time to transition to IPv6 was a couple of years ago or now.
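The IPv4 charge is simple arithmetic, so it is easy to rerun for your own address count. The short Python sketch below uses the $0.005 hourly rate and the 730-hour month quoted above; the function name `monthly_ipv4_cost` is just an illustrative choice, and the rate should be checked against current AWS pricing.

```python
HOURS_PER_MONTH = 730          # average month length used in the example above
IPV4_RATE_PER_HOUR = 0.005     # USD per public IPv4 address per hour


def monthly_ipv4_cost(address_count, rate=IPV4_RATE_PER_HOUR, hours=HOURS_PER_MONTH):
    """Monthly charge in USD for a number of public IPv4 addresses."""
    return round(address_count * rate * hours, 2)


# The worked example from the text: 100 addresses for a full month.
print(monthly_ipv4_cost(100))  # 365.0

# The same workload after consolidating behind 4 load balancer addresses
# (the count of 4 is a made-up figure for comparison).
print(monthly_ipv4_cost(4))    # 14.6
```

Comparing the two figures shows why consolidating public endpoints, or moving them to IPv6, is worth the effort even at half a cent per hour.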
Here are some resources about this recent update that will guide you on how to use IPv6 with widely used services: AWS Public IPv4 Address Charge.

Collaborate with AWS professionals and partners for expertise and discounts

Potential for savings: ~5% of the contract amount through discounts.

AWS Partner Network (APN) Discounts: Companies that are members of the AWS Partner Network (APN) can access special discounts, which they can pass on to their clients. Partners reaching a certain level in the APN program often have access to better pricing offers.
Custom Pricing Agreements: Some AWS partners may have the opportunity to negotiate special pricing agreements with AWS, enabling them to offer unique discounts to their clients. This can be particularly relevant for companies involved in consulting or system integration.
Reseller Discounts: As resellers of AWS services, partners can purchase services at wholesale prices and sell them to clients with a markup, still offering a discount from standard AWS prices. They may also provide bundled offerings that include AWS services and their own additional services.
Credit Programs: AWS frequently offers credit programs or vouchers that partners can pass on to their clients. These could be promo codes or discounts for a specific period.

Seek assistance from AWS professionals and partners. Often, this is more cost-effective than purchasing and configuring everything independently. Given the intricacies of cloud cost optimization, expertise in this matter can save you tens or hundreds of thousands of dollars.

More valuable tips for optimizing costs and improving efficiency in AWS environments:

Scheduled TurnOff/TurnOn for NonProd environments: If the Development team is in the same timezone, significant savings can be achieved by, for example, scaling the AutoScaling group of instances/clusters/RDS to zero during the night and weekends when services are not actively used.
Move static content to an S3 Bucket & CloudFront: To avoid paying compute charges for serving static content, consider utilizing Amazon S3 for storing static files and CloudFront for content delivery.
Use API Gateway/Lambda/Lambda Edge where possible: In such setups, you only pay for the actual usage of the service. This is especially noticeable in NonProd environments where resources are often underutilized.
If your CI/CD agents are on EC2, migrate to CodeBuild: AWS CodeBuild can be a more cost-effective and scalable solution for your continuous integration and delivery needs.
CloudWatch covers the needs of 99% of projects for Monitoring and Logging: Avoid using third-party solutions if AWS CloudWatch meets your requirements. It provides comprehensive monitoring and logging capabilities for most projects.

Feel free to reach out to me or other specialists for an audit, a comprehensive optimization package, or just advice.

FAQ

What is horizontal vs vertical scaling?

How does Infrastructure as Code support scalability?

Why is monitoring critical when scaling?

How do DevOps and SRE contribute to project scalability?

What are some key DevOps practices that enhance scalability?

How does SRE impact business operations?

Can small businesses benefit from DevOps and SRE practices?

What tools are commonly used in DevOps and SRE?
