From YouTube: Sponsor Demo: DataStax - Cloud Native Cassandra: Deploying on Kubernetes with cass-operator
Description
During this demo, we will discuss Cloud Native Cassandra and how to deploy on Kubernetes with cass-operator.
Hi, I'm Cédrick Lunven, director of developer advocacy at DataStax, and today I want to show you how to deploy Apache Cassandra on Kubernetes using cass-operator. Let's get started.
The first thing to ask is: what is Cassandra? Well, Apache Cassandra is a distributed NoSQL database. You can install Cassandra on a single node, and that node would handle about one terabyte of data and 3,000 transactions per second, but it makes sense to install Cassandra on multiple nodes. In this architecture there is no master: the nodes all communicate with each other through a peer-to-peer protocol called gossip.
Those nodes can be grouped into rings, or data centers. With that, let's look at the use cases you have with this database.
Well, first, the more capacity you need, the more nodes you add; and if you also need more throughput, simply add new nodes. With that, Cassandra fits all the heavy-write, heavy-read use cases: time series, event streaming, log analytics, Internet of Things.
The second range of use cases for Cassandra leverages its availability. The data in the cluster is replicated multiple times, which means you can lose any of the nodes: it's not a big deal, there is no data loss, and the system is always on. Remember: peer-to-peer, no master, and replicated data. Yes, you can totally lose any of the nodes and it's not a big deal at all. So, for this range of use cases: caching, market data, pricing or inventory, and many, many more. Then, Cassandra is distributed.
Again, that opens a whole range of use cases: you might think of banking, retail, or any global company that would like to benefit from this distributed capability. And last, of course, cloud native. This runs on commodity hardware; losing any of the nodes, having a bad network or a bad disk is not that big a deal. Everything is done asynchronously, so you can totally implement hybrid-cloud or multi-cloud deployments on top of Cassandra.
So how does it work? In a Cassandra cluster, you would have one to many rings, and those rings can be distributed geographically.
Now, there is also something called the rack in the Cassandra architecture. It tells Cassandra how to distribute the data among the nodes: if you know that two nodes are on the same rack, or in the same geographical region, you may want to distribute the data so that you lose as little data as possible if you lose that rack. So now, let's see how we put all of this into Kubernetes.
Cassandra has been available in Docker for ages, but it is a stateful container and, as such, you need to provide a volume to handle the storage, export ports, and set some environment variables: quite a lot of parameters to add, right? So first, Docker Compose can come to the rescue, but even then, in a Cassandra ring there are some nodes tagged as seeds, identified by IP, and each time a node wants to join the cluster, it needs to be aware of those seed nodes. So it's not that easy to make a cluster scale.
You can scale nodes and you can scale seeds, but they don't behave the same way. So, of course, Kubernetes comes to the rescue.
We would need to create a dedicated custom resource to properly manage the lifecycle of a Cassandra cluster: how to start and stop nodes, make the cluster scale, or even stop pods while doing some updates of the configuration or a rolling restart. Everything will be listened to, scheduled and managed by an operator.
So let's see how you install the operator and how it works in a Kubernetes cluster.
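The installation boils down to a few kubectl commands. A minimal sketch, assuming you have downloaded the operator manifests from the cass-operator project (the manifest file name below is a placeholder):

```shell
# Create a dedicated namespace for the operator.
kubectl create namespace cass-operator

# Apply the operator manifests (CRD, service accounts, secrets, webhook
# and the operator deployment). The file name is a placeholder: fetch the
# manifests matching your Kubernetes version from the cass-operator project.
kubectl apply -n cass-operator -f cass-operator-manifests.yaml

# Check that the operator pod is up; apart from it, the cluster is empty.
kubectl get pods -n cass-operator
```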
Then I create a namespace to put everything in a single place and, as a first step in that namespace, I create a storage class.
Now you might say: there is a default storage class. But the volume you want to mount for each Cassandra node will be specific to your needs. You probably want to use local storage and have the disk as fast as possible; or, if you are running on GKE, maybe you have some dedicated, faster storage. So anyway, the storage class is really custom to your environment.
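As a sketch, such a storage class could look like the manifest below for pre-provisioned local disks; the class name and provisioner are assumptions and should be adapted to your platform (on GKE you would point at an SSD-backed provisioner instead):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: server-storage                      # hypothetical name, referenced later by the cluster YAML
provisioner: kubernetes.io/no-provisioner   # pre-provisioned local volumes
volumeBindingMode: WaitForFirstConsumer     # bind only once a pod is scheduled on a node
reclaimPolicy: Delete
```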
The cass-operator defines the custom resource definition, and a couple of accounts and secrets are needed to have the environment ready. If I go there and simply do `kubectl get pods`, you can see that I do have cass-operator running but, apart from that, my Kubernetes cluster is empty. To show you what just happened using a schema: I do have my cass-operator namespace.
It created a couple of secrets, services and custom resource definitions, plus security and webhook resources, all to handle the connection between the operator and the custom resources we create. We also defined a storage class, which here is the kind storage class, for my laptop. All right, so now, let's create a Cassandra cluster using the operator.
A
First,
I
will
define
a
yaml,
so
let's
create
a
single
node
cluster
for
cassandra.
You
can
see
that
I'm
using
the
cassandra
data
center
custom
resource
here
you
define
the
data
center
name
and
this
is
how
we
could
define
multiple
data
center
in
a
single
ring
or
in
a
single
cluster
or
maybe
even
having
multiple
cluster
with
multiple
rings.
Each
time
you
will
set
that
cluster
name
and
metadata
name
here.
So
we
are
using
cassandra
as
you
can
see
it's
an
open
source.
You define how many nodes you would like, you do the mapping with the storage class and, once the node is set up, we want to override the default configuration, providing cassandra.yaml keys and also JVM option keys.
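Put together, such a manifest could look like the sketch below; the exact versions, sizes and the storage class name are assumptions:

```yaml
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dc1                      # the data center name
spec:
  clusterName: cluster1          # shared by every data center of the same cluster
  serverType: cassandra          # open-source Apache Cassandra
  serverVersion: "3.11.7"
  size: 1                        # number of Cassandra nodes (pods)
  storageConfig:
    cassandraDataVolumeClaimSpec:
      storageClassName: server-storage   # maps each node to the storage class
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
  config:
    cassandra-yaml:              # overrides merged into cassandra.yaml
      num_tokens: 16
    jvm-options:                 # JVM option keys
      initial_heap_size: 800M
      max_heap_size: 800M
```

You would then apply it with `kubectl apply -n cass-operator -f cassdc.yaml` and watch the pods come up with `kubectl get pods -n cass-operator -w`.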
So what I do now is go in here, copy and apply this configuration. To be able to watch what's happening, I will watch the pods and see my sts (StatefulSet) get created: the operator will create a StatefulSet for each Cassandra rack.
A
If
I
want
to
show
you
what
it
looks
like
so
by
creating
this
custom
resource,
this
is
everything
that
we've
just
created.
So
we
created
the
custom
resource
dc,
one
it.
We
do
have
a
super
user
secret
sts
for
each
rack.
No
rack
has
been
provided
in
the
yaml
file,
so
it
will
create
one
by
default,
then
I
will
create
a
dedicated
pod
for
each
cassandra
node.
This
is
the
name
of
the
board
and
for
each
node
we
will
attach
some
persistent
volume
and
it's
all
done
by
the
sts.
So, to do so, I will copy my command here, simply applying this new YAML and, again, I will watch my pods to see what happens. As you can see, it immediately starts the new pods in the same StatefulSet. It will now start Cassandra and make the first node available; only when the first one is available will it start the second one, and this is the whole purpose of the operator: doing the steps in the proper order.
The state of the custom resource is not what it should be and so, as a consequence, the operator will make the cluster scale to match the new state. We move from a single pod in the StatefulSet to three pods, and this is exactly what's happening: zero, one, two, with all the pods started up one by one. But how does the operator know when to start another one?
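Scaling is declarative: you don't add pods yourself, you change the desired size in the manifest and re-apply it. A sketch, reusing the hypothetical manifest from before:

```yaml
spec:
  size: 3    # was 1; the operator adds the two missing pods one at a time
```

After `kubectl apply`, the operator notices the gap between the three declared nodes and the single running one, and reconciles.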
Because Cassandra alone is not Kubernetes-ready, at DataStax we had to create, update and open-source a Cassandra management-api service. This is a sidecar running in the pod, just to expose a REST API for Kubernetes to know the liveness and the readiness of each pod.
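In the pod template this shows up as ordinary HTTP probes pointed at the sidecar. A sketch; the port and paths are assumptions based on the management-api's REST interface:

```yaml
readinessProbe:
  httpGet:
    port: 8080                        # management-api sidecar
    path: /api/v0/probes/readiness    # node is up and able to serve requests
livenessProbe:
  httpGet:
    port: 8080
    path: /api/v0/probes/liveness     # process is alive; restart the pod otherwise
```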
And this is how it works: you deploy a new YAML, the operator matches the state declared in the YAML against what it should be, and it executes the commands to make the Cassandra cluster match the desired state. And with that, I'm done with the demo.
But you can do the same by using this QR code, and I expect to see all of you at the workshop, all together, where we can play even more, including Grafana and Prometheus monitoring, and some operations using Cassandra itself. Thank you very much, and see you.