By: Prashant Mehta, 12/3/25

Why would we self-host our own CI/CD?

Honestly, there are days when this is a really good question, and there are other days when it's a really stupid question. When you run an enterprise-level codebase, deployed as microservices but programmed as a monorepo, there are a lot of considerations to make. Running a large operation like Runway Avenue requires planning, which is why we introduced Radar and use Asana and other project management software. We do this because we want a company run by engineers and thinkers, not schedulers and "productivity specialists". That requires abstracting away the things engineers usually have to think about but hate thinking about.

The list of these things typically looks like this:

  • Helm Charts and K8s deployments
  • Monitoring, logging, observability pipelines
  • Build scripts / Testing the full services
  • Managing the Git Tree / PRs / Code Reviews

Engineers are often responsible for making sure these things are up to company standards, but the best way to implement them is to integrate them so that anything that needs to be done a specific way, in line with the rest of production, is centralized and reproducible. That takes a combination of tools; we have chosen GitLab CI/CD, Terraform, Buck2, Flux, and BuildBuddy.

GitLab

GitLab is just open-source GitHub. Don't overcomplicate it. You've got the same structure as normal Git with the same kinds of added-on features GitHub usually provides: branch management, automatic package building from scripted definitions, and some premium features like testing, security analysis, etc. I don't believe we plan on paying for the nicer stuff any time soon, but eventually we will, and when that happens GitLab's already competitive stack becomes a no-brainer compared to other CI/CD systems and Git managers.

We also manage a very large and very complex Kubernetes stack that relies on monorepo organization. GitLab is the central point where we define that organization, so the tools surrounding it can test, deploy, and package all of our code automatically.
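To make that concrete, here's a minimal sketch of what a monorepo .gitlab-ci.yml can look like, fanning out into per-service child pipelines so each service tests and packages itself while the top level stays the single point of organization. The service and path names here are hypothetical, not our actual layout:

    # .gitlab-ci.yml (top level of the monorepo) - hypothetical sketch
    stages:
      - test
      - build

    # Fan out to a per-service child pipeline, and only when that service changes.
    radar-pipeline:
      stage: build
      trigger:
        include: services/radar/.gitlab-ci.yml   # placeholder path
        strategy: depend                          # parent waits for the child result
      rules:
        - changes:
            - services/radar/**/*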

On top of that, we also have GitLab's integrated registry, which means we don't need a separate service for package and container registries. Putting it all together, this lets us manage large, key aspects of container distribution across GCP overload nodes and the Anton clusters we manage ourselves.
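On the registry side, a job can push straight to the project's built-in container registry using GitLab's predefined CI variables, so images never have to touch an external registry. A minimal sketch, with a hypothetical service name and path:

    # Hypothetical job: build an image and push it to the project's GitLab registry
    package-radar:
      stage: build
      image: docker
      services:
        - docker:dind
      script:
        - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
        - docker build -t "$CI_REGISTRY_IMAGE/radar:$CI_COMMIT_SHORT_SHA" services/radar
        - docker push "$CI_REGISTRY_IMAGE/radar:$CI_COMMIT_SHORT_SHA"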

Flux

This is where Flux comes in. Flux makes sure that whatever GitLab says about Kubernetes and our k8s configurations is the source of truth. In fact, if you make changes by hand to a k8s stack operated by Flux, it will revert them back to the original configuration within minutes. This ensures that if a change is made to prod, it is logged and tested before it is pushed to prod. More than that, if a change is made to any prod configuration, GitLab automations are triggered and every responsible member of the tech team is notified for the given repo. This keeps everything properly authorized and helps with our SOC 2 commitments around uptime and safety.
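Concretely, Flux watches the repo through a GitRepository source and applies a path from it with a Kustomization; the interval and prune settings are what give you that revert-within-minutes behavior. A sketch, where the URL, branch, and path are placeholders rather than our real configuration:

    # Hypothetical Flux objects; URL, branch, and path are placeholders.
    apiVersion: source.toolkit.fluxcd.io/v1
    kind: GitRepository
    metadata:
      name: platform
      namespace: flux-system
    spec:
      interval: 1m
      url: https://gitlab.example.com/platform/monorepo.git
      ref:
        branch: main
    ---
    apiVersion: kustomize.toolkit.fluxcd.io/v1
    kind: Kustomization
    metadata:
      name: prod
      namespace: flux-system
    spec:
      interval: 5m
      sourceRef:
        kind: GitRepository
        name: platform
      path: ./k8s/prod
      prune: true    # anything not in Git gets removed, i.e. drift is reverted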

Terraform

Clearly, this company loves giving IBM money. We use a lot of the major HashiCorp products, RHEL, and administrative tools like Cockpit, which manages our KVM configurations (more on that another time). Terraform exists simply to automate the installation process of physical nodes so that they are reproducible. This is often referred to as IaC (Infrastructure as Code) and lets code define the parameters for actual deployable systems. Terraform lets you take a blank physical node with a basic operating system and install all necessary utilities and configurations on top of it. This works well because most of these configurations are absolutely standardized. In our case, once RHEL is installed on a physical node, Terraform can script out the installation and parameters for KVM, and within each sub-operating system deploy further Terraform scripts that configure fully deployable systems and sync them against prod.

Since we use a k8s stack, deploying and connecting more nodes is relatively simple and doesn't change the overall complexity. That makes Terraform substantially less complicated, since Kubernetes can administer a worker or control-plane node and it'll just join the existing plane and sync through GitLab. In fact, it's likely that once our internal network exceeds 25GB/s (probably around the end of Y2, unless we decide to order Arista switches, in which case probably closer to mid Y3), we'd be able to get physical nodes running from scratch within an hour or less, especially if we assume we've cached VM images and snapshots that we can quickly load and deploy straight to new nodes.
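As a rough sketch of what that looks like in practice, here's a KVM guest defined as a Terraform resource using the community libvirt provider. The names, sizes, and image path are made up for illustration; this is not our actual node configuration:

    # Hypothetical sketch using the community libvirt provider.
    terraform {
      required_providers {
        libvirt = {
          source = "dmacvicar/libvirt"
        }
      }
    }

    provider "libvirt" {
      uri = "qemu:///system"
    }

    # Base image cached locally so new guests come up quickly.
    resource "libvirt_volume" "rhel_base" {
      name   = "rhel-base.qcow2"
      pool   = "default"
      source = "/var/lib/libvirt/images/rhel-base.qcow2"   # placeholder path
      format = "qcow2"
    }

    resource "libvirt_domain" "k8s_worker" {
      name   = "k8s-worker-01"   # placeholder name
      memory = 16384             # MiB
      vcpu   = 8

      disk {
        volume_id = libvirt_volume.rhel_base.id
      }

      network_interface {
        network_name = "default"
      }
    }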

Buck2

Why Buck2? Why not Buck? Why not Bazel? Hell, why not just write a 150-line script that compiles all of your packages, spins them into their containers, and makes them talk over open ports in your subsystem?

All valid questions; in fact, Rachit probably asked me the last one at some point. The answer is really simple. Compiling large repos sucks, but large repos are the best way to track the intricacies of larger deployments. If I can have all of pacific-coast in one repo, the best way to do that is with feature branches that all meet in larger version branches, which have compatibility tests and get progressively more detailed depending on what "resolution" of branch you're in.

Buck2 helps with that significantly. Build tools in general are fantastic because they use caching and codebase tracking to make sure you almost never compile the exact same code twice. The less you have to recompile, the faster your test cases go, and better yet, the less compute you spend doing a repetitive task. On top of compile caching, Buck2 is written in Rust, which gives it roughly twice the speed of the original Buck. As for Bazel, Buck2 is just easier to use from our reading. Meta also actually uses Buck2 daily; it manages a lot of their critical repos, and that's a peace of mind that Google's Bazel doesn't offer, since Bazel is the open-source version of their internal tool Blaze. We also found that Buck2 demands less specificity, which is something Rachit and I tend to prefer, since we just want a tool that compiles everything to our specification quickly and well enough.
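In practice, that means each package in the monorepo carries a small BUCK file declaring its targets and dependencies, and Buck2 only rebuilds targets whose inputs actually changed. A hypothetical example using the standard prelude rules; the target and path names are illustrative:

    # services/radar/BUCK - hypothetical targets, standard prelude rules
    rust_library(
        name = "radar_core",
        srcs = glob(["src/**/*.rs"]),
        deps = [
            "//libs/telemetry:telemetry",   # placeholder internal dependency
        ],
        visibility = ["PUBLIC"],
    )

    rust_binary(
        name = "radar",
        srcs = ["src/main.rs"],
        deps = [":radar_core"],
    )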

What the hell is BuildBuddy?

You may have heard about everything else on this list. In fact, I hope you have, because it would make the most sense. BuildBuddy seems like a curveball coming from me, but if you actually look at what they do and re-read the section on Buck2, it clicks: BuildBuddy focuses on a strategy called RBE (remote build execution). This means that any time Buck2 compiles, it goes to a remote server to build and return containers, packages, or test results.

No, we are not outsourcing the compiling. In fact, quite the opposite. BuildBuddy powers some of the largest open-source repos, because the company itself is open source. They provide compute nodes to those who wish to offload compute hours to RBE; however, we would rather own that process ourselves.
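Pointing Buck2 at a BuildBuddy executor we host ourselves is mostly .buckconfig plumbing. Roughly, it looks like the sketch below; the endpoint is a placeholder, and the exact keys should be checked against the Buck2 and BuildBuddy docs for whatever versions we land on:

    # .buckconfig - hypothetical remote-execution wiring; the endpoint is a placeholder
    [buck2_re_client]
    engine_address       = grpcs://buildbuddy.internal.example
    action_cache_address = grpcs://buildbuddy.internal.example
    cas_address          = grpcs://buildbuddy.internal.example
    # API key header for the self-hosted BuildBuddy instance (value redacted here)
    http_headers         = x-buildbuddy-api-key:REPLACE_ME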

Conclusion

Once you read about all of these systems in depth, it's hard to say they won't change the company's structure deeply. They will. We will have to pursue a monorepo structure to take full advantage of the tools we are baking into our production stack. This requires an update to the git-practices-and-standards (which will happen by either myself or Rachit another time) that forces us to organize branches into features, features into versions, and merge upwards to main. In fact, for a long time, main will remain blank until v0. From there we should expect our git best practices to become law, so much so that we will have pre-determined naming conventions and management policies that ensure clear direction when pushing to prod. This will make sure we don't have accidental prod merges, cross-version merges, or other mistakes that could lead to irreversible damage to prod or other critical company assets. In addition to maintaining IaC libraries, Flux syncing, and GitLab's registry, we must make sure we're following the practices that allow these services to do what they do best, and Buck2, BuildBuddy, and GitLab will help us organize our code to do that in a large monorepo format.
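To illustrate the branch flow (the actual naming conventions will be set in the updated git-practices-and-standards, so these names are placeholders):

    feature/radar-alerts      -> merges into  version/v0.2
    feature/anchor-registry   -> merges into  version/v0.2
    version/v0.2              -> merges into  main (only once tested and tagged)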

Anchor is the full system designed to automate the majority of these actions. Developers will have to write BUCK files, Terraform scripts, and k8s configurations for their software; the rest will be taken care of by Anchor. Anchor will automatically test code against your defined tests, move on to compiling, and update the registry as defined by the repo. This requires us to define these situations up front, but it will allow us to stay organized as we grow our developer teams. It will also let us amp up failure detection using Buck2's code query tools, GitLab's test reporting, and other insights we gain by operating this larger codebase-management system. On top of that, Anchor will work in tandem with our other internal services like Radar to make sure people stay up to date with project requirements and information. The goal of Anchor is for developers to spend the least amount of time defining projects and the most time actually solving problems and interacting with the code. We believe the tools we have selected will accomplish this, and we would love input if you have any suggestions on how we might move forward with Anchor. Please reach out to me on Slack if there are any amendments to be made as we continue to develop Anchor.
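For reference, the developer-owned surface for a single service under Anchor would look roughly like this; the layout is illustrative, not final:

    services/radar/
      BUCK          # build targets and tests that Anchor runs
      k8s/          # manifests that Flux syncs against the cluster
      terraform/    # any infrastructure the service owns
      src/          # the actual code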