About the Client
ReSource International is an Icelandic company that specializes in environmental solutions. They have developed elandfill.io, a digital solution for monitoring and managing landfill operations.
Challenge
- A scalable way to monitor and manage the elandfill.io platform.
- A cloud-agnostic approach to support future growth.
- Manageability by users with varying levels of technical expertise.
- Swift identification and resolution of issues to minimize downtime and operational disruption.
- Automation of monitoring processes and notifications.
Solution
To address these challenges, we developed the Resource Management Framework (RMF), a unified system for managing and monitoring digital solutions for landfills.
The RMF was designed with scalability in mind, allowing for future growth without being tied to a single cloud provider. The platform was initially hosted on Hetzner, which limited its future scalability. Building a cloud-agnostic solution imposed certain constraints on the DevOps team but ultimately provided greater flexibility.
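One common way to stay cloud-agnostic is to keep deployment logic behind a provider-neutral interface. The sketch below illustrates that idea only; the RMF's actual internals are not described here, so `ComputeProvider`, `HetznerProvider`, `OtherCloudProvider`, `roll_out`, and the service name are all hypothetical, and the provider classes return placeholder identifiers instead of calling any real cloud API.

```python
from abc import ABC, abstractmethod


class ComputeProvider(ABC):
    """Hypothetical provider-neutral interface (not the RMF's real API)."""

    @abstractmethod
    def deploy(self, service: str, version: str) -> str:
        """Deploy a service version and return a provider-specific identifier."""


class HetznerProvider(ComputeProvider):
    def deploy(self, service: str, version: str) -> str:
        # Placeholder: a real implementation would call the Hetzner Cloud API here.
        return f"hetzner:{service}:{version}"


class OtherCloudProvider(ComputeProvider):
    def deploy(self, service: str, version: str) -> str:
        # Placeholder for any other provider the platform might move to.
        return f"other:{service}:{version}"


def roll_out(provider: ComputeProvider, service: str, version: str) -> str:
    # Deployment code depends only on the interface, never on a specific cloud.
    return provider.deploy(service, version)


print(roll_out(HetznerProvider(), "landfill-api", "1.4.2"))
```

Switching providers then means passing a different `ComputeProvider` implementation, while the calling code stays unchanged.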
The platform’s modular structure includes several components, one of which is monitoring. This component provides a dashboard showing the status of the system. The dashboard displays:
- List of installed applications
- Application versions
- Status of each service
- Time of last status update (Last seen)
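A dashboard row of this kind can be modelled as a small record. The following is a minimal sketch, assuming a service counts as "down" once it has not reported for five minutes; the class name, field names, and timeout are illustrative assumptions, not details of the RMF.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ServiceStatus:
    """One dashboard row: application, version, and last-seen time (hypothetical)."""
    application: str
    version: str
    last_seen: datetime

    def status(self, now: datetime, timeout: timedelta = timedelta(minutes=5)) -> str:
        # Assumption: a service silent for longer than `timeout` is shown as down.
        return "up" if now - self.last_seen <= timeout else "down"


def render(rows: list[ServiceStatus], now: datetime) -> list[str]:
    # One formatted line per service, sorted by application name.
    return [
        f"{r.application:<12} {r.version:<8} {r.status(now):<5} {r.last_seen:%Y-%m-%d %H:%M}"
        for r in sorted(rows, key=lambda r: r.application)
    ]


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
rows = [
    ServiceStatus("web-ui", "3.0.1", now - timedelta(minutes=30)),
    ServiceStatus("api", "1.2.0", now - timedelta(minutes=1)),
]
for line in render(rows, now):
    print(line)
```

Deriving the status from the last-seen timestamp, instead of storing it separately, keeps the two columns from disagreeing.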
Management Dashboard
This user-friendly interface allows non-technical personnel to monitor production environments. It helps them quickly locate the source of an issue such as a login failure, pinpointing whether the problem lies in the UI, the backend, or the database, or is a performance problem.
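Locating the failing layer can be thought of as running one health check per layer and reporting the first that fails. This is a hypothetical sketch: the layer names follow the text, but the checks are stand-in lambdas rather than real probes, and both the bottom-up ordering (database first, since a database outage also breaks the layers above it) and the 500 ms latency limit are assumptions.

```python
from typing import Callable

# A check returns True while its layer is healthy.
Check = Callable[[], bool]


def diagnose(checks: list[tuple[str, Check]]) -> str:
    """Return the name of the first unhealthy layer, or "none" if all pass."""
    for layer, is_healthy in checks:
        if not is_healthy():
            return layer
    return "none"


def make_latency_check(measure_ms: Callable[[], float], limit_ms: float) -> Check:
    # Performance counts as healthy while measured latency stays within the limit.
    return lambda: measure_ms() <= limit_ms


# Hypothetical probes for a login flow, ordered from the bottom of the stack up.
checks = [
    ("database", lambda: True),
    ("backend", lambda: False),
    ("UI", lambda: True),
    ("performance", make_latency_check(lambda: 120.0, 500.0)),
]
print(diagnose(checks))  # backend
```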
Each new service is automatically added to the dashboard, enabling the platform to expand as needed.
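Automatic registration is often done with heartbeats: a service appears on the dashboard the first time it reports in. The sketch below assumes that model; the source does not say how the RMF discovers services, so `Registry` and its methods are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


class Registry:
    """Hypothetical heartbeat registry: any service that reports in is listed."""

    def __init__(self) -> None:
        # service name -> (version, last_seen)
        self._services: dict[str, tuple[str, datetime]] = {}

    def heartbeat(self, name: str, version: str,
                  now: Optional[datetime] = None) -> None:
        # Unknown services are added on first contact; known ones are updated.
        self._services[name] = (version, now or datetime.now(timezone.utc))

    def services(self) -> list[str]:
        return sorted(self._services)

    def version(self, name: str) -> str:
        return self._services[name][0]


registry = Registry()
registry.heartbeat("api", "1.0")
registry.heartbeat("billing", "0.3")
registry.heartbeat("api", "1.1")  # same service, new version
print(registry.services())
```

With this design no one has to edit the dashboard when a service is added; deploying the service is enough.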
Notifications
We configured alerts for critical services to notify users of issues such as memory usage exceeding predefined limits. These alerts can trigger scripts to resolve problems automatically.
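Threshold alerts like the memory example reduce to comparing each reading against a configured limit. This is a minimal sketch, assuming readings arrive as per-service metric values; the metric names (`memory_mb`, `cpu_pct`) and limit values are illustrative, not the platform's actual configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    service: str
    metric: str
    value: float
    limit: float


def check_limits(readings: dict[str, dict[str, float]],
                 limits: dict[str, float]) -> list[Alert]:
    """Return an Alert for every reading above its metric's limit.

    `readings` maps service -> {metric: value}; `limits` maps metric -> limit.
    Metrics without a configured limit are ignored.
    """
    alerts = []
    for service, metrics in sorted(readings.items()):
        for metric, value in sorted(metrics.items()):
            limit = limits.get(metric)
            if limit is not None and value > limit:
                alerts.append(Alert(service, metric, value, limit))
    return alerts


readings = {"api": {"memory_mb": 900.0, "cpu_pct": 40.0}, "web-ui": {"memory_mb": 200.0}}
limits = {"memory_mb": 512.0, "cpu_pct": 85.0}
print(check_limits(readings, limits))
```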
Integration with Microsoft Teams provides notifications on deployments, ensuring that all team members are aware of changes and potential impacts.
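A deployment notification can be posted to a Teams channel through an incoming webhook. The sketch below assumes a webhook that accepts a simple JSON body with a `text` field, as the classic Office 365 connector webhooks do; newer Workflows-based webhooks may expect an Adaptive Card payload instead. The webhook URL, service name, and message wording are placeholders, and only the Python standard library is used.

```python
import json
import urllib.request


def build_deployment_message(service: str, version: str, environment: str) -> bytes:
    # Simple {"text": ...} body; adjust if the webhook expects an Adaptive Card.
    text = f"Deployed {service} {version} to {environment}"
    return json.dumps({"text": text}).encode("utf-8")


def notify_teams(webhook_url: str, body: bytes) -> int:
    """POST the message to a Teams webhook URL and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


body = build_deployment_message("landfill-api", "1.4.2", "production")
# notify_teams("https://example.invalid/teams-webhook", body)  # placeholder URL
```

Building the payload separately from sending it keeps the message format testable without network access.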
By structuring the collected data, we laid the foundation for the next level of automation: an alert system that not only notifies users but also acts on alerts, for example by running scripts that fix the underlying issue.
For instance, when a notification about a CPU usage spike is received, a corresponding script is triggered to resolve the issue.
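Acting on alerts can be as simple as a lookup table from alert type to remediation command. This is a hypothetical sketch: the alert type names and script paths are placeholders (the source does not say which scripts the platform runs), and the command runner is passed in so it can be swapped out or faked.

```python
import subprocess
from typing import Callable, Optional

# Hypothetical mapping from alert type to remediation command (placeholder paths).
ACTIONS: dict[str, list[str]] = {
    "cpu_spike": ["/opt/rmf/scripts/handle_cpu_spike.sh"],
    "memory_limit": ["/opt/rmf/scripts/restart_service.sh"],
}


def handle_alert(alert_type: str,
                 runner: Callable[[list[str]], int]) -> Optional[int]:
    """Run the action registered for `alert_type`; return its exit code, or None."""
    command = ACTIONS.get(alert_type)
    if command is None:
        return None  # notify only: no automated fix is registered for this alert
    return runner(command)


def run_command(command: list[str]) -> int:
    # Default runner: execute the script and report its exit code.
    return subprocess.run(command, check=False).returncode


# Example: handle_alert("cpu_spike", run_command)
```

Alerts without a registered action still reach users as notifications, so automation can be added one alert type at a time.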