All right, welcome to my session on building a managed database service using Kubernetes operators. Before getting into the weeds, I want to start off explaining who I am and why you should listen to me.

So my name is Jimmy Zelinskie. I'm the founder of a company called AuthZed. We build a database called SpiceDB, and what SpiceDB does is store your authorization data: when you build a permission system for your applications, you eventually hit a couple of different types of problems.

Despite my background being in product, I still write code every day and carry a pager for the services that I build. Prior to founding AuthZed, I worked at a company called CoreOS, which got acquired by Red Hat, and at CoreOS I actually co-created a CNCF project called the Operator Framework, alongside some folks who are now members of the etcd team. What the Operator Framework lets folks do is more easily build operators, so they can customize Kubernetes and extend it in ways that make sense for running their own domains.
All right, to level-set for the talk: before anything, I always like to level the playing field and make sure that everyone is using and understands the same terminology before diving right in. To even be able to discuss this topic, we have to cover two major subjects. The first is: what's a managed database service? And the second is: what are Kubernetes operators?

A managed database service is pretty much you outsourcing the operational side of a database to a particular provider. So instead of you spinning up a database and managing it on top of your own hardware, or even cloud hardware, this is someone else doing that for you. They purely give you the details you need for your application to connect to that database, and then they're basically out of the way. You don't have to maintain a pager or anything like that to make sure that the database is operational and able to serve traffic.

There are two different types of providers that you can outsource to. There are cloud providers, which obviously have the expertise in running software on top of cloud environments; examples of that would be Amazon RDS and Google Cloud Platform's Cloud SQL. These providers offer the typical relational databases, but they also have individual services for more specialized databases.
The other type of expert you can outsource this to is the database providers themselves: folks like Cockroach Labs, selling CockroachDB Dedicated, and my own company, AuthZed, selling SpiceDB Dedicated. There are plenty of other database providers in the space that do something similar; Elasticsearch and Redis also come to mind as examples of these database-provider experts that offer these types of services.
All right. So then, what are Kubernetes operators? Operators are custom controllers for Kubernetes that encode application-specific logic. That basically means extending the Kubernetes API and teaching it about effectively new concepts that are specific to your domain.
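To make that concrete, here's a minimal sketch of the pattern using controller-runtime, the library underneath most Go operators. The DatabaseCluster type, its API package, and its fields are hypothetical, purely to show the shape of a reconcile loop; this is not the SpiceDB operator's actual code.

```go
package controllers

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"

	examplev1 "example.com/db-operator/api/v1" // hypothetical API package
)

// DatabaseClusterReconciler drives the real state of the cluster toward
// whatever is declared in a DatabaseCluster custom resource.
type DatabaseClusterReconciler struct {
	client.Client
}

func (r *DatabaseClusterReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// Fetch the custom resource this event refers to.
	var cluster examplev1.DatabaseCluster
	if err := r.Get(ctx, req.NamespacedName, &cluster); err != nil {
		// Deleted while we were queued; nothing to do.
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// Compare desired state (spec) with observed state and converge,
	// e.g. ensure a Deployment exists with the requested replica count.
	var deploy appsv1.Deployment
	err := r.Get(ctx, req.NamespacedName, &deploy)
	switch {
	case apierrors.IsNotFound(err):
		// ... create the Deployment for this database cluster ...
	case err != nil:
		return ctrl.Result{}, err
	case deploy.Spec.Replicas == nil || *deploy.Spec.Replicas != cluster.Spec.Replicas:
		// ... scale the Deployment to cluster.Spec.Replicas ...
	}

	return ctrl.Result{}, nil
}
```

The shape is what matters: the controller watches for changes, reads the declared spec, and edits real resources until they match, which is what lets the Kubernetes control plane stay the source of truth described next.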
The point of all of this is to improve how Kubernetes is able to handle running that application. But the even bigger idea is encoding your domain into Kubernetes, so that the Kubernetes control plane becomes the central interface for everything. It becomes the source of truth, and you can always use the standard tools, like your dashboard or kubectl, to query it and understand what is running in production.
All right, so without further ado, I'm going to talk about my anecdotal experience building SpiceDB Dedicated.

The reason I'm going to use this is not only familiarity, but also that we built the service semi-recently. There are a lot of other managed database services that are probably built on top of Kubernetes, but because of the recency of this one, I think it's probably more applicable to someone looking to build a similar service today, whether they're doing that to build their own product or to build out a platform engineering team internally at their business.

So the rest of the talk is basically me describing the system we've built: the decision-making process we went through, the way we've divided things up, and how we think about the different problems we had to solve.
At a 10,000-foot view, we can break this problem down into three major phases: provisioning, runtime, and then day-two operations. The provisioning side is how we create the customer environments.

This is actually pretty subtle when you're trying to understand which things need to be updated along with the lifecycle of Kubernetes itself, versus things that can be iterated on with changes to the application; that's one of the subtle aspects. Another big one is how you're going to promote changes to these different customer environments.

How are you going to roll out Kubernetes updates, or any changes to the aforementioned cluster configuration? And how are you going to do that in a way that is progressive, so that your customers, whether they have maintenance windows or are very sensitive to updates, can get updates at the regular cadence they're expecting? Then we move on to the runtime phase. The runtime phase is about what has to be running live while customers are using these systems.
This is where a managed database service differentiates itself from a lot of other workloads you might be running on Kubernetes, because the customers are actually going to be modifying the cluster itself in real time. That means we need to be able to not only manage our own configuration, but also respond to end users deciding they want to take actions like scaling their database cluster up or down.

All of these things complicate the actual production runtime of the system. Running a service in a way where, when different events happen, like scaling up, scaling down, or losing a node in Kubernetes, you can handle it without losing any performance or dropping any requests is very tricky, and it's something every database-as-a-service is going to need to manage, because you can't necessarily make changes to the application code that is talking to your database.
Specifically, as I said, customers can modify these environments, so we not only have to be able to reproduce our clusters, but also reproduce the state that the customers changed. We also have to power our own operational workload: we need to be able to aggregate metrics across customers, understand the health and state of the customer environments, and page our engineers, and things like that, when something is going wrong in a customer environment.

So I'm going to dive deeper into provisioning now. I'm going to list out some of the technologies and some of the core concepts that we've chosen to go with. I would say a lot of these technology choices are personal preference; I'm not saying you should choose one over the other, but I'm going to include why we ended up with the ones that we have.
These reasons are kind of organization-specific. If you have a company, for example, that has a ton of Terraform expertise, go ahead, use Terraform; I think that's going to be the better choice for you if that's where your company's expertise is. But we, for example, picked Pulumi: we're very comfortable writing Go code, and we ultimately wanted to build one binary that is our infra program and can manage all kinds of different things. So that's why we ended up picking Pulumi.
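For a flavor of what that looks like, here's a minimal Pulumi-in-Go sketch; the resource (a single S3 bucket via the Pulumi AWS SDK) is illustrative, not our actual infra program:

```go
package main

import (
	"github.com/pulumi/pulumi-aws/sdk/v6/go/aws/s3"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)

func main() {
	pulumi.Run(func(ctx *pulumi.Context) error {
		// Declare a cloud resource in plain Go; Pulumi diffs the declared
		// state against reality and converges, much like an operator does.
		bucket, err := s3.NewBucket(ctx, "backups", nil)
		if err != nil {
			return err
		}
		// Export an output so other tooling can consume it.
		ctx.Export("bucketName", bucket.ID())
		return nil
	})
}
```

Because the whole program is ordinary Go, one binary can branch per customer environment, which is what "one infra program that manages all kinds of different things" means in practice.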
For actually reconciling configuration onto a cluster, we use Argo CD. Flux is another example of a CNCF project that does this kind of continuous deployment, but we ultimately aligned on Argo, specifically because it has a nice web UI for checking the health of all the environments, and it also has nice functionality around actually applying the changes, like dry runs and pruning. You can also write Lua to extend Argo in some scenarios. Specifically for us: when you're creating operators, you're going to create custom definitions of healthiness in the status fields, and Argo can be extended with Lua to understand those, so it knows whether a custom resource you've created for your operator is healthy or not.
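On the operator side, that custom healthiness usually comes down to writing standard status conditions that a health check (Lua, in Argo's case) can read. Here's a minimal sketch of the Go half using the apimachinery condition helpers; the Ready condition name and the helper function are chosen for illustration:

```go
package controllers

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// setReady records a Ready condition on a custom resource's status.
// A small Argo CD Lua health check can then map this condition onto
// Healthy/Progressing/Degraded in the UI.
func setReady(conditions *[]metav1.Condition, generation int64, ready bool, reason, msg string) {
	status := metav1.ConditionFalse
	if ready {
		status = metav1.ConditionTrue
	}
	meta.SetStatusCondition(conditions, metav1.Condition{
		Type:               "Ready", // condition name chosen for illustration
		Status:             status,
		ObservedGeneration: generation,
		Reason:             reason,
		Message:            msg,
	})
}
```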
So that's super useful functionality there. For the actual configuration we apply to the cluster itself, we use Kustomize. We previously used CUE a lot, but we ultimately migrated to Kustomize because it was really easy to structure and it integrates directly with kubectl, so our engineers don't have to install any additional tooling. It's way easier to onboard engineers, because if you understand Kubernetes, you probably understand Kubernetes YAML manifests at least, and you're going to understand using Kustomize to some degree. It also lets us reuse a lot of tools off the shelf, because you can point at any manifest in a git repository, use that as a reference, and extend it with Kustomize. So as we adopt more and more of the standard community tools, we can just point Kustomize at those tools and get them vendored almost for free, or with very little modification.

If you're using CUE, you have to do all the legwork of importing and transpiling the YAML into CUE, and you're kind of on your own for a lot of the tooling and structure. But I imagine some of that will change over time, so it's not necessarily cut and dried; if you're watching this video six months from now, maybe the state of the world for CUE has improved dramatically.
Finally, we also use GitHub Actions, mostly because we can automate a bunch of the GitHub APIs for opening and merging pull requests, and that ties very much into the concepts I want to talk about. The high-level concepts we have are largely around our promotion process, which we call the ring model. The ring model is about bucketing customers into groups by stability, so that we can slowly roll out changes one phase at a time, one bucket of customers at a time.

For example, what we actually do is have a staging instance, and the staging instance gets every change pushed to it as part of continuous deployment. Then, when things look good, we promote that to what we call ring zero, and ring zero is our other testing environments, whether that's performance testing or just staging environments at AuthZed. Then, once that passes QA, we promote it to ring one, which is our rapid release phase: customers that have opted into getting updates sooner, but potentially less stable releases. And so on and so forth: we promote to ring two, which is more stable, and then ring three, which is more stable still, etc.
We know this model scales, because it's being used by big companies like Microsoft. And finally, we have GitOps, but "GitOps by bots" is how I want to talk about it, because while GitOps is great, making changes in some of these repositories can be very verbose and error-prone, and it can take a really long time. So what we actually do is have automations all around it: you can manually pick from a dropdown to say, "I want to promote this ring to this ring," and then bots handle the rest. You get the benefits of having everything checked into git, and if you had to manually override anything, you could; but a lot of the error-prone side of copying and pasting specific versions into specific places is all automated away. In the general case, you pretty much don't have to open your editor to make the changes that you want to see propagated to the system.
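As a sketch of what one of those bots boils down to, here's the promotion pull request opened with the go-github client; the org, repo, branch names, and workflow are invented for illustration:

```go
package main

import (
	"context"
	"fmt"
	"os"

	"github.com/google/go-github/v60/github"
)

func main() {
	ctx := context.Background()
	// In CI this token would come from the GitHub Actions secret store.
	client := github.NewClient(nil).WithAuthToken(os.Getenv("GITHUB_TOKEN"))

	// Open a pull request that bumps a ring's pinned monorepo commit.
	// The bot has already pushed the branch containing the one-line change.
	pr, _, err := client.PullRequests.Create(ctx, "example-org", "infra", &github.NewPullRequest{
		Title: github.String("promote: ring-1 -> newer monorepo snapshot"),
		Head:  github.String("bot/promote-ring-1"),
		Base:  github.String("main"),
		Body:  github.String("Automated promotion triggered from the dropdown workflow."),
	})
	if err != nil {
		panic(err)
	}
	// Humans can still review or override; the bot only removes the copy/paste.
	fmt.Println(pr.GetHTMLURL())
}
```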
So here is a drawing of our Kustomize configuration. We split it into three top-level folders: we have the bases, the features, and the overlays. If you're familiar with Kustomize, overlays are typically used for the end result; that's going to be the renderable thing that you can actually apply to a cluster. So we have a dev one, or actually variations of dev ones, and then we have customer-specific ones.
The customer-specific ones we keep in a separate repository: the infra repository that tracks all the customer environments, while the dev ones live in our monorepo, alongside the configuration itself. Overlays are composed of at least one or more bases plus a set of features, as sketched below. Examples of features are things like a Postgres database, or ECR for getting your images on that cloud provider, or GCR if you're using Google Cloud.
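As a rough sketch (directory names invented, not our exact repos), the layout is something like:

```
config/
├── bases/
│   ├── cluster/    # assumed-everywhere baseline, e.g. the monitoring stack
│   └── dev/        # extra bits that make kind/Docker Desktop look like prod
├── features/
│   ├── postgres/
│   ├── ecr/
│   └── gcr/
└── overlays/
    └── dev/        # kustomization.yaml composing bases + features
```

Each overlay's kustomization.yaml just lists the bases and features it composes, so something like `kubectl kustomize overlays/dev` renders the full manifest set for that environment.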
We break everything down into these different features that you can then compose together to build a working system. The bases, then, are the base layout for a cluster, installing the things that we want to assume are always going to be there. In the regular cluster base we have the monitoring stack that we deploy to absolutely every cluster, to make sure we have a baseline for understanding the health of every cluster that is not specific to any workload we deploy onto it. This gets used both on an infra cluster that we run centralized for our infra and operations team, and also on all the customer clusters as well. But then we also have this dev base, and the dev base is basically filling the gap between something like Docker Desktop Kubernetes or kind and what we get when we run Pulumi to generate a cluster on a cloud provider for an actual production environment.
That fills the gaps so the clusters look exactly the same: they have the same starting base, then we apply the base, and then whatever features are specific to that environment.

So here is the architecture of the GitOps pipeline. In our monorepo, as I said, the configuration lives alongside the code, which makes it so developers can iterate on the configuration and on the code for the different projects, spin that stack up locally on their machine, and test everything out. When that looks good, it gets committed to the monorepo. Then we have this other infra repo, which tracks the customer environments; customer environments are organized into rings, and those rings reference a specific commit SHA of the monorepo.
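In spirit, a ring is just a small record pinning a bucket of environments to a monorepo snapshot. A hypothetical sketch of that shape in Go (our infra repo actually stores this as configuration files, and these field names are invented):

```go
package rings

// Ring pins a bucket of customer environments to an exact snapshot of the
// monorepo's configuration. A promotion is then nothing more than a pull
// request that moves MonorepoSHA forward, one ring at a time.
type Ring struct {
	Name         string   // e.g. "ring-1" (rapid) or "ring-2" (stable)
	MonorepoSHA  string   // monorepo commit to render configuration from
	Environments []string // customer environments in this bucket
}
```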
That way you can point a ring at a particular snapshot of the monorepo's configuration at a point in time, and that's how we get all the version tracking and the ability to promote different versions of the configuration to different customer environments. Inside of that infra repo we also have the binary that manages Pulumi, which is what provisions the individual clusters.

We have configuration files for each customer environment in there as well, so that's the central source of what is represented in production. Every cluster is also deployed into its own cloud provider account: if you're running on Amazon, each customer runs in an AWS account that is individual to that particular customer. That's just the level of isolation we've chosen for the system, and it's not necessarily a hard requirement for every managed database service; we're just a security product,
so we take isolation a bit more seriously than a lot of other folks.

Then, finally, we have our centralized infra Kubernetes cluster. This is what runs Argo, and it runs Thanos, so that we can collect metrics, query them, and understand the runtime of our customer environments. What Argo is going to be doing is pulling the infra repo and asserting that each of the customer environments is synchronized to the proper state that environment is configured for.

It also makes sure that if anyone logs into a machine while debugging something and skews the configuration, it's going to be restored eventually by Argo. That way, even if a machine gets compromised, we have something that's going to reset the cluster and make sure that nothing is the way it shouldn't be. So that's the high level of our GitOps workflow. We have time, so let's move on to the runtime environment.
In the runtime, we have built two custom operators; this is the "with Kubernetes operators" portion of the talk, which is the meat and potatoes. We decided to split our system into two different operators.

One of them we make open source, because we want our customers, or any open source users, to be able to operationalize and run SpiceDB just as well as we can. This includes scaling SpiceDB, making sure that it doesn't drop traffic, and making sure SpiceDB knows how to self-cluster; it handles running migrations of the data changes across versions. It makes sure there's an update graph, so that you go from a supported version to a supported version, and it basically assures you zero downtime as you go through the upgrade process. This kind of logic all lives inside the SpiceDB operator.

Then we have the AuthZed operator, which is our proprietary operator, and this includes automations that largely rely on assumptions about how we've laid out our clusters.
So if a piece of functionality is tightly coupled to opinions and decisions about how to run a Kubernetes cluster, we keep it in the proprietary one, purely because it's not applicable to anyone else's deployment; it's only applicable to ours.

What we're actually doing is making it so that when a user logs into the dashboard for SpiceDB Dedicated, they're seeing a view of Kubernetes and the resources that live on the cluster. And when they, for example, choose to create a new SpiceDB cluster, they're actually talking to a JavaScript application that talks to the Kubernetes API and creates custom resources. That is how the core of everything functions.
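For flavor, what that dashboard request boils down to can be sketched with client-go's dynamic client. The SpiceDBCluster kind is from the open-source spicedb-operator; the group/version/resource and spec fields below follow its public examples, but treat the specifics (names, namespace, datastore) as illustrative:

```go
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load kubeconfig the way kubectl does (error handling kept minimal).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	dyn, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Create a SpiceDBCluster custom resource; the operator reacts to it.
	gvr := schema.GroupVersionResource{Group: "authzed.com", Version: "v1alpha1", Resource: "spicedbclusters"}
	cluster := &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": "authzed.com/v1alpha1",
		"kind":       "SpiceDBCluster",
		"metadata":   map[string]interface{}{"name": "dev"},
		"spec": map[string]interface{}{
			"config":     map[string]interface{}{"datastoreEngine": "memory"},
			"secretName": "dev-spicedb-config",
		},
	}}
	if _, err := dyn.Resource(gvr).Namespace("tenant").Create(context.TODO(), cluster, metav1.CreateOptions{}); err != nil {
		panic(err)
	}
}
```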
It's all using Kubernetes as the source of truth. Then, of course, we have all the additional tooling that composes our opinions for how to run Kubernetes: things like Contour, cert-manager, and the Prometheus operator. So at the core, the concepts of our runtime come down to centralizing everything into the Kubernetes control plane. You want to use that as your source of truth; it makes a convenient API for managing all these things.

For us, the control plane that our customers use to make changes is one and the same as the control plane that our operations team is managing. That gives us a convenient way to interact with the system: we don't have to build some kind of admin interface into the dashboards to get our operations team access to the customer control plane.
It's just one and the same control plane for us, and that's where a lot of the benefits come from. But the power of the operators is also that the customer-driven changes live in the cluster too. This is what enables a customer to log in, start making changes to the infrastructure, and have those apply immediately: all those automations live in an operator, not in a human that has to get paged, go to the cluster, and make a change to it.

Some of these namespaces get applied to absolutely every cluster, and some are exclusive to a particular cluster. The monitoring namespace, for example, gets deployed to absolutely all clusters that we run. This includes all the infrastructure we need for paging, alerting, metrics, and tracing for the applications running on the cluster base, and this goes all the way to non-SpiceDB customer clusters.
So that part is fully generic and can be reused across the company, but it gets specialized by the resources that get created in the other namespaces. Then we have the authzed-system and authzed-region namespaces. The difference between these two: the system namespace is what I would call the customer-facing control plane. Some customer environments actually run in multiple regions; say you have a Europe and a North America Kubernetes cluster deployment, so you have two individual clusters.

What ends up happening is you pick one as your control plane, and that's where the AuthZed operator runs, and that's where the dashboard runs; anything that's driving the information on the dashboard is going to live there. When you choose to provision something there, the AuthZed operator actually understands the configuration for the other regions that make up the customer environment, and it will create resources in the appropriate cluster.
The authzed-region namespace, then, is the thing that standardizes a cluster to be able to run SpiceDB. Primarily, it has the SpiceDB operator in it, which is going to sit there and watch for the requests to create clusters, or make changes to clusters, that the AuthZed operator creates as a reaction to a customer making a change in the dashboard. It creates those clusters inside of the tenant namespace, and the tenant namespace is where all the runtime customer data is.

This is where the systems they've provisioned live. It's the one that the operations team is mostly going to be inspecting, because these are the places where the customers are actually live, making changes. It's also what we typically focus on for backing up data, like customer-specific configuration: the things they have actually changed on the system.
Every other, smaller namespace in here is a cluster dependency. We use the Prometheus operator and kube-state-metrics just to make sure we've got the standard operational deployment for collecting metrics and observability from the cluster. I mentioned earlier that we use cert-manager and Contour as our ingress and PKI infrastructure, and we actually create two deployments of Contour, in the internal and external namespaces.

Those namespaces are for internal and external traffic. Because customer environments are often in VPCs (virtual networks), that traffic goes through a specific load balancer, and internet-facing traffic goes through the external load balancer. That's how we differentiate the two and do peering to internal networks at our customers' companies. Then, finally, we have Velero, which is going to do backups, and then all the kube-system-y namespaces that you get from the different cloud providers.
Cool. So, transitioning now to a final topic, the final phase: day-two operations. These technologies are the standard ones, and the reason you pick the standard ones is the high-level concept I want to mention here: the observability data isn't just for you, because you are building customer-facing infrastructure.

Some of this data you're going to pass on to your customers. They want to know what the latencies of the database are. They want to know how much CPU they're using; they want to know how much capacity they're using, whether they're going to have to scale up, and if they do scale up, whether that's going to affect their bill.

So it's not purely your decision which technologies you choose for these stacks, because they're potentially going to integrate with customer systems. Customers might want to ingest logs or traces or metrics from their database, as it's running, into their own systems, so that they can page their own engineers if something is going wrong inside of the managed database service. For that, we're using the standard Prometheus ecosystem for observability.
That's the Prometheus operator, kube-state-metrics, Thanos, Grafana, the works; and for traces, generically, just OpenTelemetry. And then, as I described before, backups need to cover more than just data. We're using the standard cloud provider datastore backups, the things that come with the datastores themselves, but we're also building APIs so that our customers can export data out of live systems, or stream that data to a replica that they run themselves, maybe on a completely different premise (on-prem) or in a backup environment.

So we're tackling this on both fronts. But the unique thing is actually not the backup of the data; it's the fact that you also have to back up the configuration, because if you restore the cluster and replay all of Pulumi and your configuration changes, that's not going to include any of the changes the customers have made to the control plane themselves.
That's where Velero comes in: we're continuously backing up the changes that customers are making to the clusters, so that if we have to restore a cluster, we can restore absolutely everything. The nice thing is that it's all decoupled in different ways: we can restore just the customer data if we need to, or restore to an older version of the cluster, or an older version of the configuration and all the namespaces that run in the cluster, because everything is broken into these three different categories.

So with that, I'd like to conclude. You can find me on social media in these three places: on Twitter, on Bluesky, and you can always email me. If you're interested in any of the projects I talked about: there's a link to SpiceDB Dedicated, and the open-source SpiceDB operator is available for exploring and learning how we went about automating the actual operational side of our database.
Examples of what's in there are custom informers; setting statuses according to properties of other resources you're managing; and things like being able to pause your operator, so that it stops reconciling and a human can come in and debug (see the sketch below). These higher-level patterns, the ones you always end up needing to implement but that aren't the core logic of the operator, we've abstracted in a way that you can import. And if you're more interested in SpiceDB itself, you can always join the SpiceDB Discord or look at our GitHub organization.
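That pause pattern, for instance, often amounts to one annotation check at the top of the reconcile loop. A minimal hedged sketch, reusing the hypothetical DatabaseClusterReconciler and imports from the earlier example (the annotation key is invented; check the spicedb-operator for the real mechanism it ships):

```go
// pausedKey is an illustrative annotation name, not the operator's real one.
const pausedKey = "example.com/paused"

func (r *DatabaseClusterReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var cluster examplev1.DatabaseCluster
	if err := r.Get(ctx, req.NamespacedName, &cluster); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}
	// While paused, leave the world exactly as it is so a human can debug.
	if cluster.GetAnnotations()[pausedKey] == "true" {
		return ctrl.Result{}, nil
	}
	// ... normal reconciliation continues here ...
	return ctrl.Result{}, nil
}
```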
We have plenty of other open-source projects all around the cloud-native ecosystem, covering basically all parts of the stack: operators, gRPC, the database itself, clients for the database, things like that. So, thanks for your time.