By: Prashant Mehta, 10/21/2025. Updated: Prashant Mehta, 12/6/2025
Introduction to Storage Mechanisms
I think a good way to start this one is with the goal of pacific-coast. You may have noticed Runway Avenue's marine theme at some point: Radar is based on the idea of seeing things as they are around you while fostering a sense of exploration, and Anchor, our CI/CD pipeline, gives us something to hold fast to when faulty code tries to make its way up our ship. Pacific-coast? It represents the shoreline between the user and our data lakes and oceans. In fact, it started the marine theme.
Thematics and theatrics aside, the point of pacific-coast is to provide a storage layer that connects to our API and exposes simple CRUD functions, while pacific-coast itself handles caching, Postgres access, and all other relevant complex data structures used in our system. pacific-coast also has its own internal monitoring logic to ensure that caches in Kubernetes pods can shut down safely and retain non-expired items even when a given cache is under-utilized. We use memcached for this so that we get the speed of non-persistence, while Postgres remains our source of truth.
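To make that cache-plus-source-of-truth split concrete, here is a minimal sketch of a pacific-coast-style read path. This is not our actual API: the Item type, the lookupItem helper, the items table, and the libraries (the widely used github.com/bradfitz/gomemcache client alongside database/sql) are stand-ins to illustrate the flow of trying RAM first, falling back to Postgres, and repopulating the cache on the way out.

package pacificcoast

import (
	"database/sql"
	"encoding/json"

	"github.com/bradfitz/gomemcache/memcache"
)

// Item is a stand-in record type, not pacific-coast's real schema.
type Item struct {
	ID   string `json:"id"`
	Body string `json:"body"`
}

// lookupItem tries memcached first (fast, non-persistent) and falls back to
// Postgres (the source of truth) on a miss, repopulating the cache on the way out.
func lookupItem(mc *memcache.Client, db *sql.DB, id string) (*Item, error) {
	if cached, err := mc.Get("item:" + id); err == nil {
		var it Item
		if json.Unmarshal(cached.Value, &it) == nil {
			return &it, nil // cache hit: no disk was touched
		}
	}
	// Cache miss: read the source of truth.
	var it Item
	if err := db.QueryRow(`SELECT id, body FROM items WHERE id = $1`, id).Scan(&it.ID, &it.Body); err != nil {
		return nil, err
	}
	// Best-effort repopulation; the expiry bounds how stale an entry can get.
	if raw, err := json.Marshal(it); err == nil {
		mc.Set(&memcache.Item{Key: "item:" + id, Value: raw, Expiration: 300}) // 300 s = 5 min
	}
	return &it, nil
}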
There also exist alternatives. Google's BigQuery was considered, and hosting a relational database in AWS was considered early on, but our stack needed to support on-prem once significant concerns arose about cloud pricing and over-reliance on hardware we didn't own. MongoDB was also considered; Rachit would like me to mention that at one point it was high on our list, but he ultimately convinced me that the cons of a JSON document format were outweighed by the pros of a relational database holding metadata tags that point into block storage of some kind. A lot of this reasoning is discussed further down in the section on Postgres.
Discussing the elements: How our system works
To really understand storage systems we must cover the basics first. An item can be in one of two states: persistent or non-persistent. Most computer scientists will know that the difference is simply whether the data survives if the software operating on it dies. Postgres is persistent; memcached is not. But in being non-persistent, memcached is much quicker. This presents the dilemma.
How can we build a fast, distributed system without compromising on the persistence and consistency of data?
And the answer is the system that we maintain and build today. Recall that our entire platform is containerized, as described in our Systems Architecture overview. This means our code is designed to connect many pods to a single persistent storage layer. We run a single Postgres primary and spawn streaming replicas from it to assist with read speed and concurrency (vanilla Postgres does not support multiple concurrent primaries). That sounds like it takes care of most of our problems! We can handle a lot of read traffic this way, and writes too! But it's far from enough. In fact, the goal is to minimize reading from storage at all, even when that storage is 2.5” enterprise SATA SSDs and lightning quick. Data is meant to flow, and the faster you can retrieve data, the faster the flow.
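As a rough sketch of how that primary/replica split looks from application code, here are two connection pools: one pinned to the primary for writes, one pointed at the replicas for reads. The names are hypothetical, the DSNs come from deployment config, and the pgx driver is an assumption; the point is only that reads and writes take different paths.

package pacificcoast

import (
	"database/sql"

	_ "github.com/jackc/pgx/v5/stdlib" // Postgres driver registered as "pgx"
)

// DBPair pins writes to the single primary while reads fan out to replicas.
type DBPair struct {
	Primary *sql.DB // all writes and read-your-write queries
	Replica *sql.DB // read-only traffic; may point at a load-balanced replica set
}

// OpenPair wires both pools from deployment-supplied DSNs.
func OpenPair(primaryDSN, replicaDSN string) (*DBPair, error) {
	primary, err := sql.Open("pgx", primaryDSN)
	if err != nil {
		return nil, err
	}
	replica, err := sql.Open("pgx", replicaDSN)
	if err != nil {
		return nil, err
	}
	return &DBPair{Primary: primary, Replica: replica}, nil
}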
Memcached
memcached is non-persistent, which means everything lives in RAM. Not only that, but we can shard memcached by simply generating more pods. This lets us tune our resources really effectively through Kubernetes by creating “small” pods of about 2-4 GB and adding or removing them as needed. On top of that, our memcached client has a many-to-many relationship with servers, which means we can run as many caching clients and as many servers as we need across our entire stack. Need more RAM? Kubernetes will just create another pod. Need more throughput because your requests are coming from FAISS? Great, Kubernetes will just create more async clients, and because of our gomemcached interface we can keep adding clients whenever more access points are needed. This way we can actually under-allocate resources to pods and still have a rock-solid foundation that can handle a ton of users, while keeping a very low bare-minimum footprint. Tuning your Kubernetes configuration is always a good idea, but being able to shard between 2-10 servers across 2-10 clients on the fly makes it so much easier to scale your resources. If you're interested in reading more about memcached, a technical whitepaper will be made available here soon.
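Our gomemcached interface isn't public, but the open-source github.com/bradfitz/gomemcache client it resembles shows the many-to-many idea well: hand one client several server addresses and it hashes each key to exactly one of them, so adding a pod adds capacity without application changes. The DNS names below are hypothetical headless-service addresses, not our real topology.

package pacificcoast

import "github.com/bradfitz/gomemcache/memcache"

// newShardedCache fans one logical cache out across several small pods.
// Each key is hashed to exactly one of the listed servers.
func newShardedCache() *memcache.Client {
	mc := memcache.New(
		"memcached-0.memcached.svc:11211", // hypothetical pod addresses
		"memcached-1.memcached.svc:11211",
		"memcached-2.memcached.svc:11211",
	)
	mc.MaxIdleConns = 64 // one client is goroutine-safe and can be shared widely
	return mc
}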
PostgreSQL
Postgres is a fantastic database. There are few databases, and fewer open-source ones, that handle large amounts of data at incredibly quick speeds; Postgres is one of those rare combinations, which made it a natural choice for Runway Avenue's data platform. Yet Postgres has a limitation: it doesn't load data much faster or slower depending on which data you pull. How you query for that data matters far more than the number of columns you pull out. Granted, column count matters too, which is part of why relational databases aren't just one fat table. On top of that, Postgres by itself is really bad at handling two types of data that are critical to Runway Avenue's stack: time-series data and vectors. Both are central to our data platform and account for a large share of our query load. That said, this is mitigated by TimescaleDB, pgvector, and FAISS, and we'll talk later about how we utilize these technologies.
Postgres is probably your best choice for a modern relational database in 2025. It sustains a large amount of IOPS, has flexible data types, and whatever you can't store directly, you can always put into the table as a text link to a bucket object. Using a relational database was key for us, since relational databases are typically fast and, within reason, flexible enough.
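As a taste of what that looks like before the deeper discussion later, here is a hypothetical table that keeps structured fields and a pgvector embedding in Postgres while the heavy image bytes stay behind a plain text bucket link, plus a nearest-neighbor query. The schema, names, and 768-dimension embedding are illustrative assumptions, and it presumes the pgvector extension is installed.

package pacificcoast

import (
	"context"
	"database/sql"
)

// createListings is a hypothetical migration: structured fields and a pgvector
// embedding live in Postgres, while the heavy image bytes live in GCS behind a
// plain text URI. Run once, with pgvector available.
const createListings = `
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS listings (
    id        BIGSERIAL PRIMARY KEY,
    title     TEXT NOT NULL,
    image_uri TEXT NOT NULL,  -- e.g. gs://bucket/object; the bytes stay in GCS
    embedding vector(768)     -- pgvector column for similarity search
);`

// nearestListings returns the ids of the k rows closest to a query embedding,
// using pgvector's cosine-distance operator. emb is a pgvector text literal
// such as "[0.12,0.98,...]"; Postgres coerces it to the vector type.
func nearestListings(ctx context.Context, db *sql.DB, emb string, k int) ([]int64, error) {
	rows, err := db.QueryContext(ctx,
		`SELECT id FROM listings ORDER BY embedding <=> $1 LIMIT $2`, emb, k)
	if err != nil {
		return nil, err
	}
	defer rows.Close()
	var ids []int64
	for rows.Next() {
		var id int64
		if err := rows.Scan(&id); err != nil {
			return nil, err
		}
		ids = append(ids, id)
	}
	return ids, rows.Err()
}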
The majority of the JSON requirements we had would only have been useful in ML training, in which case we can just eat the Google Storage egress fees on retrain. On top of that, most ML engineers would have a stroke if every retrain of the recommender meant reclassifying your entire database. While this does happen from time to time, it's a strenuous process, which further implies that designing around JSON would be shortsighted.
With the background out of the way, we can talk about the fun stuff! Our Postgres deployment will be on a VM in the Anton cluster that uses Envoy to communicate at L4 (TCP). This means Postgres is logically isolated from the Kubernetes cluster and has the same access to it as a remote user request would. The only reason a remote user request would not work in this scenario is that the Postgres VM is not visible outside the private subnet. Its IP is discoverable via Envoy Proxy within Kubernetes, so as far as k8s is aware, the access point for Postgres lives inside the cluster. This will be discussed further in Production PostgreSQL at Runway Avenue.
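From an application pod's point of view, that indirection is invisible: you dial an in-cluster address and Envoy forwards the raw TCP stream to the VM. Everything in this sketch (the service hostname, database, and user) is hypothetical.

package pacificcoast

import (
	"database/sql"

	_ "github.com/jackc/pgx/v5/stdlib" // Postgres driver registered as "pgx"
)

// openViaEnvoy dials the in-cluster Envoy listener instead of the VM's private
// IP; Envoy relays the connection to the Postgres VM at L4.
func openViaEnvoy() (*sql.DB, error) {
	dsn := "postgres://pacific_coast@postgres-envoy.pacific-coast.svc.cluster.local:5432/radar?sslmode=require"
	return sql.Open("pgx", dsn)
}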
Google Cloud Storage (GCS)
Image data can be really heavy in some cases and requires a fast array with lots of storage. That is storage we don't have access to yet; one day we will, but for the moment image data is considerably heavier than we'd like to admit. To put it into perspective, image data far exceeds all of the other labels we hold on our data. On top of that, images are the primary thing users and the ML models look at to make a decision. We can't afford to store all of that on Anton in a persistent format.
On top of that, it's nice to pay very low fees to have Google store our encrypted data indefinitely. The TPM seal on Anton ensures this data cannot be decrypted without Raft quorum from the rest of the physical Anton nodes, so should anything bad-faith occur (data leaks, etc.), the contents in GCS will still be airtight. We also use GCS to store Coldline metrics, Lighthouse training data, and other indicators we may want to keep for far longer than a year.
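A minimal sketch of what "airtight" means in practice, using the official cloud.google.com/go/storage client's customer-supplied encryption keys: GCS encrypts each object with a 32-byte AES-256 key we provide per request and retains only a hash of it, never the key. The function name and the wiring to Anton's TPM-sealed key material are assumptions for illustration.

package pacificcoast

import (
	"context"

	"cloud.google.com/go/storage"
)

// putSealed uploads data encrypted with a customer-supplied AES-256 key, so
// GCS only ever holds ciphertext it cannot read. In our design the 32-byte
// key would come from Anton's TPM-sealed material (that wiring is omitted).
func putSealed(ctx context.Context, bucket, object string, key, data []byte) error {
	client, err := storage.NewClient(ctx)
	if err != nil {
		return err
	}
	defer client.Close()

	w := client.Bucket(bucket).Object(object).Key(key).NewWriter(ctx)
	if _, err := w.Write(data); err != nil {
		w.Close()
		return err
	}
	return w.Close() // the object becomes visible only after a clean Close
}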
The majority of this handling will fall to Rachit's team as they build out pacific-coast, all the way from APIs to internal-use SDKs that encode best practices and automate encryption handling and paths.