By Prashant Mehta, 10/27/25

Introduction

If you’ve worked with me, especially in the past few months, you’ve heard me emphasize an idea that echoes throughout our incredibly talented technical team: the ability to solve problems quickly. I told Saketh Machiraju not long ago that the reason his timelines are so easily pushed back and so flexible is because “I expect you to fail repeatedly, because that’s what we have to get out of the way before prod.” Harsh? Maybe. But nobody on our team has done what we’re doing before, so we must build stacks that let us find problems quickly. More than just finding problems, we need to see them emerge in real time and be so hyper-vigilant about them that they become easy to solve before production and impossible to ignore in production.

Grafana Labs, a brief acknowledgement

We use Grafana Labs’ entire stack, services like alloy, mimir, grafana, loki, tempo, and more, to scrape our own internal metrics and provide the best possible platform to interface with the results of those metrics. We build on Grafana Labs’ tooling primarily because it’s open source and because it’s built on the OpenTelemetry stack, which mimir takes a few steps further by improving its scalability at the enterprise level. Do we plan on paying for Grafana? Probably not. Could we? Probably. The goal is to host our own SOC 2 Type II compliant stack without external reliance; if we pull this off, it may very well be one of our team’s greatest accomplishments.

Interactions and Design of Lighthouse

Following our California marine themes, we dub this project lighthouse, allowing us to shine light on problems and guide our ships safely to shore. Lighthouse presents a unified interface: it provides viewership into our various data sources over MCP and Grafana by extending their various APIs into one surface.

Grafana’s alloy does this incredibly well. Alloy can be treated like a pipe: any time you want to get metrics out of your mimir, loki, or tempo databases, you simply set up a tool through alloy to do so. This will drastically simplify our API and let us do cool things like wrap alloy in an MCP server and integrate it with our LLM, in addition to giving that LLM access to things like Alertmanager and other extended notifiers into our stack. Failing services and other anomalies can be detected quickly. Of course, the native front-end for all of this in lighthouse is Grafana; mimir, loki, and tempo will all link up there.
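To make the “link up there” step concrete, here is a minimal sketch of how the three backends could be wired into Grafana using its standard datasource provisioning file. The hostnames, ports, and the Mimir query path are assumptions about a typical single-cluster deployment, not our actual topology:

```yaml
# provisioning/datasources/lighthouse.yaml — hypothetical hostnames/ports
apiVersion: 1
datasources:
  - name: Mimir
    type: prometheus            # Mimir exposes a Prometheus-compatible query API
    url: http://mimir:9009/prometheus
  - name: Loki
    type: loki
    url: http://loki:3100
  - name: Tempo
    type: tempo
    url: http://tempo:3200
```

Dropping a file like this into Grafana’s provisioning directory means the dashboards come up already connected, with no click-through datasource setup.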

You can sort of think of it as applications to pipes, pipes to databases, and databases to centralized dashboards & exporters. In this scenario it looks something like:

Envoy --------\          /--> Mimir (Metrics) --\
Triton --------> Alloy --+--> Loki (Logs) -------> GCP Bucket
Applications -/          \--> Tempo (Traces) ---/

Mind you, your entire data layer (Envoy, Triton, etc.) provides metrics, logs, and traces. Because Alloy is a unified scraper, you can collect all of these feeds in OTel format and route them to the right backends, so they can all be viewed in the Grafana dashboard or reach other connected tools through another Alloy exporter.
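The fan-in/fan-out above can be sketched in Alloy’s own configuration language: one OTLP receiver takes everything from the data layer, then each signal type is routed to its backend. This is a minimal illustration, not our production config — the endpoints and component names are assumptions:

```
// Ingest OTLP from Envoy, Triton, and our applications (hypothetical endpoints).
otelcol.receiver.otlp "ingest" {
  grpc {
    endpoint = "0.0.0.0:4317"
  }
  output {
    metrics = [otelcol.exporter.prometheus.to_mimir.input]
    logs    = [otelcol.exporter.loki.to_loki.input]
    traces  = [otelcol.exporter.otlp.to_tempo.input]
  }
}

// Metrics -> Mimir via Prometheus remote_write.
otelcol.exporter.prometheus "to_mimir" {
  forward_to = [prometheus.remote_write.mimir.receiver]
}
prometheus.remote_write "mimir" {
  endpoint {
    url = "http://mimir:9009/api/v1/push"
  }
}

// Logs -> Loki.
otelcol.exporter.loki "to_loki" {
  forward_to = [loki.write.local.receiver]
}
loki.write "local" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push"
  }
}

// Traces -> Tempo over OTLP gRPC.
otelcol.exporter.otlp "to_tempo" {
  client {
    endpoint = "tempo:4317"
  }
}
```

The nice property of this shape is that adding a new source (say, a new application) means pointing it at the one OTLP endpoint; nothing downstream changes.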