I believe we’re entering a golden age of observability: we can gather metrics from our applications and infrastructure, interpret them better with query languages and pretty dashboards, and get notifications in chatrooms and our on-call systems. All of this technology is at our fingertips, without any software licensing fees!
The challenge I see with these new tools is that they tend to assume ‘cloud-native’ infrastructure: the happy path for setup and configuration usually requires a container orchestration system.
There are a LOT of organizations (including my clients) that aren’t running large cloud deployments but would still benefit greatly from the power these tools provide.
For this reason, I created o11y-in-a-box, which enables smaller engineering, IT, and Ops teams to launch a single-host monitoring system in minutes. I’m not kidding.
This solution provides:
- Prometheus, a highly scalable time-series database and monitoring system;
- Loki, a lightweight log storage and querying system;
- Grafana, a feature-rich dashboarding, monitoring, and alerting app.
These tools together enable teams to easily:
- gather performance and availability metrics for their hosts, services, and network devices;
- monitor availability of network services such as websites;
- analyze server logs across all of their infrastructure;
- create meaningful dashboards to reveal service and product health;
- send alerts when services are down or unhealthy.
The only requirements are:
- an Ubuntu 22.04 server to run the monitoring software;
- Ansible installed on a local workstation to provision the above software on the server.
Code repo and instructions are available here: https://gitlab.com/certomodo.io/o11y-in-a-box
I love using tools like these to make software understandable and resilient. If you’d like to explore working with me to transform your team’s monitoring and alerting, please reach out!
(Image credit: capt.sopon)