From YouTube: Pivotal: Apache Cassandra on Pivotal CloudFoundry
Description
Speaker: Tammer Saleh, Director of Product - Cloud Foundry Services at Pivotal
Pivotal is dedicated to bringing best-of-breed data services to Pivotal CF, and there is no other open source data technology with as much potential as Cassandra. We’ll discuss the strategies and techniques for deploying and managing a multi-user Cassandra installation that integrates with Cloud Foundry.
- Making Cassandra manage itself
- Single-tenant versus Multi-tenant usage
- Deploying Cassandra with BOSH
- Cloud Foundry services architecture
Hi everybody. Today I'm going to talk about Pivotal Cloud Foundry's use of Cassandra and the product that we're building for deploying Cassandra clusters on premise, behind the firewall, in a kind of automated fashion. It's a partnership between Pivotal and DataStax that we're going to be releasing in the next couple of weeks.
So first, my name is Tammer Saleh. I'm the director of product for Pivotal Cloud Foundry. I'm actually in London; I flew in for this event, which is marvelous. I was also an engineer at Pivotal for a while and spent some time pairing on the runtime team. In the past I've run consulting companies, been director of engineering for Engine Yard, various other things.
Today we're going to talk about a bunch of things. It's a lot of content to get through, so I'm going to try and move through it quickly. Let me give you a quick overview of Pivotal Cloud Foundry, what it is and how it works. I'm going to talk about the services API, and I'm also going to talk about why we chose Cassandra as the data store to focus on, how we built Cassandra as a service available on the platform, and how we automated Cassandra operations.
Pivotal CF is an on-premise deployment of Cloud Foundry. It's very similar to Heroku in how you use it, and the idea is that we can give developers agility by removing all of the friction from getting their app from a developed application into production. A phrase that we like to use in Pivotal is "cf push is golden," and what we're really saying is that a single command to push your app into production is the user experience goal that we're focused on.
So real quick, let's talk about the services API and how that works. Let's look again at a kind of simplified version of the internals of Cloud Foundry. You can see here we've got, again, the operations manager, with BOSH behind it, on top of the infrastructure, and we've already deployed the Cloud Foundry runtime: we've got the router, some application instances, and the Cloud Controller, which is the API endpoint that manages the whole thing. Now, the operator has deployed a service next to it.
The service side is all about the service broker, which is a very small API endpoint that communicates with the Cloud Controller, and the concept of service instances that the application developer has requested. So let's go through the workflow here. I like the command line, because I'm a UNIX geek, so most of the stuff I talk about is going to be on the command line, but of course this all works through the dashboard as well. The application developer says: okay, cf marketplace.
Tell me what you have for data services. The Cloud Controller then reaches out to the service broker and says: what do you have? The service broker says: well, I've got Cassandra. And then the Cloud Controller says: here's what we have available, and one of the services is Cassandra. This is a view of the dashboard, but on the command line you just get that listing right there. So the application developer says: sweet, I like Cassandra, I want to get that, right?
So they create that service. When they run cf create-service, the Cloud Controller talks back to the services API on the broker and says: okay, give me an instance of this, and you can see the little blue instance that has now popped up. At that point the service broker has actually made a database for the application developer. Right now it's just sitting out there; there's nothing talking to it.
You can bind it to an application, so the application developer says: bind the service to my app. The blue dot on the left is the application; the blue dot on the right is the service instance. When they say bind it, it provisions a binding, and the binding gets returned as basically an environment variable that is exposed to the application instance. From that point forward, all communication goes directly from the application instance to the service instance. We don't get in the way.
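In cf CLI terms, that whole flow is just a few commands. A hedged sketch, assuming a broker that advertises a cassandra service with a small plan (the service, plan, and app names here are illustrative):

```sh
# List the data services the brokers expose (the marketplace)
cf marketplace

# Provision a service instance from a service and plan
cf create-service cassandra small my-cassandra

# Bind it to an app; the credentials then show up in the app's
# VCAP_SERVICES environment variable after a restage
cf bind-service my-app my-cassandra
cf restage my-app
```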
The reason for that two-phase approach, where you provision the service and then you bind it to the application, is so that you can bind that service to multiple applications. Depending on how you want to do your microservices architecture, you might want to share a database on the back end. But the point of this is that the services API is incredibly simple, incredibly easy to implement. This is literally all of the RESTful API calls for this API, and I'll point out that these are all the verbs we support as well.
There are only five permutations, so you can write a service broker trivially in something like Sinatra via Ruby, or via Java. It's very easy.
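As a sketch of how small that surface is, here are those five routes as a minimal Sinatra broker, assuming the v2 services API; the catalog contents and the empty handler bodies are placeholders, not Pivotal's actual broker logic:

```ruby
require 'sinatra'
require 'json'

# 1. Advertise the catalog of services and plans
get '/v2/catalog' do
  { services: [{ id: 'cassandra-service-id', name: 'cassandra',
                 description: 'Cassandra clusters on demand', bindable: true,
                 plans: [{ id: 'small-plan-id', name: 'small',
                           description: 'Shared-cluster keyspace' }] }] }.to_json
end

# 2. Provision a service instance (e.g. create a keyspace)
put '/v2/service_instances/:id' do
  status 201
  '{}'
end

# 3. Bind an instance to an app: create credentials and return them
put '/v2/service_instances/:id/service_bindings/:binding_id' do
  status 201
  { credentials: { username: 'some-user', password: 'some-pass' } }.to_json
end

# 4. Unbind: revoke the credentials
delete '/v2/service_instances/:id/service_bindings/:binding_id' do
  '{}'
end

# 5. Deprovision: tear the instance down
delete '/v2/service_instances/:id' do
  '{}'
end
```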
The complexity of doing a service is not in the API; it's in the implementation behind it, which we'll talk a lot about. So real quick, I want to talk about what services we're focusing on for Pivotal, and get into why Cassandra is the one that we're taking to production.
First we have Redis, which is a KV store. I'm sure everybody's heard of these databases, so I'm going to move through them quickly. Redis is one of my favorites; somebody referred to it as the AK-47 of data stores. It's very focused in what it can do and it's rock-solid, but it doesn't scale very well in terms of number of nodes. I think you can only do master-slave at the moment.
Clustering is coming out later, but nothing like Cassandra's. Neo4j for a graph database, Elasticsearch for full-text search, MongoDB for a fast and easy NoSQL store, memcache for caching, MariaDB for MySQL. We have our own services, like our Hadoop distribution, Pivotal HD, and Riak CS for an S3-compatible blob store. And then, of course, Cassandra: highly distributed, a heavy KV/column store. These are the focuses of what we're going to be releasing with Pivotal CF. Some of these are already released; some are in progress right now with the London team.
Of course, Netflix has shown that they were able to get up to, I think, a million writes per second with Cassandra, and again, linear scaling is very rare in computer science. To see a system that actually scales this linearly is really impressive, so we believe Cassandra is truly cloud scale. Cassandra supports multi-data-center deployments, and nowadays, given the cloud, a multi-data-center deployment is commonplace. I remember when I used to be a UNIX administrator for Citysearch.
We had two data centers, one in Los Angeles and one in Chicago, and that was cutting edge for us, to have that kind of distributed system. But even then it was just a master-slave failover. Nowadays, with Amazon and with various cloud technologies, it's very easy and commonplace to be deploying your application across multiple data centers, and Cassandra is the best data store to support that.
So let's talk about how we built Cassandra, in terms of the architecture, as a service that integrates with the platform. Again, the services API is actually quite simple. You've got these four main terms. You've got the service itself, which is kind of a meta term: it means, what exactly is the technology that you're deploying? In this situation, of course, it's Cassandra. You can choose among the various plans in that service, which is nice.
It gives the application developer a little bit of tweaking: I want a small Cassandra, or a large Cassandra. Then you provision the Cassandra instance, and then you bind it. Again, the binding is just a user account and some credentials that get passed back to the application.
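Those credentials surface to the application through the VCAP_SERVICES environment variable. A hedged sketch of what a Cassandra binding might look like there; the exact credential fields are up to the broker:

```json
{
  "cassandra": [
    {
      "name": "my-cassandra",
      "plan": "small",
      "credentials": {
        "node_ips": ["10.0.16.11", "10.0.16.12", "10.0.16.13"],
        "keyspace_name": "ks_f47ac10b",
        "username": "u_f47ac10b",
        "password": "s3cr3t"
      }
    }
  ]
}
```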
The real trick is what an instance is. Whenever you're building out a new service, you have to think long and hard about exactly what the user gets when they say: I want an instance of service X. So initially, the easiest way of getting a service out into the world is to do what we call a multi-tenant installation. We have this available right now in beta form for the Cassandra product, and what this means is that it's one Cassandra cluster that is divvied up amongst users. When I ask for an instance of Cassandra,
what I actually get is a single keyspace and a user that has access to that keyspace. This actually works fairly well with Cassandra in terms of scaling; as we've already seen, Cassandra can scale out linearly, and it's good for development, testing, and staging environments.
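As an illustrative sketch (not the broker's actual statements), provisioning one of those multi-tenant instances amounts to a keyspace plus a user scoped to it, with GUID-derived names as described below:

```sql
-- Create a keyspace and a user scoped to it; the GUID-derived
-- names and the replication settings are illustrative
CREATE KEYSPACE "ks_f47ac10b" WITH replication =
  {'class': 'SimpleStrategy', 'replication_factor': 3};

CREATE USER 'u_f47ac10b' WITH PASSWORD 's3cr3t' NOSUPERUSER;

GRANT ALL PERMISSIONS ON KEYSPACE "ks_f47ac10b" TO 'u_f47ac10b';
```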
Unfortunately, there are some limitations in Cassandra which mean that it's not good for heavy production use, and it's not good for strictly untrusted environments. So let's talk about a couple of those limitations. Keyspace visibility: this is the biggest one.
Basically, there's no way in Cassandra to say that user A cannot see the names of all the keyspaces in the system. Now, that's not a huge security concern for us, because in Cloud Foundry the keyspace names are actually just randomized GUIDs. A user can see that there are 20 keyspaces or 200 keyspaces, but that's not really a big deal.
A bigger problem with keyspace visibility is that in Cassandra there's also no way to say that user A cannot see the names of the tables inside a keyspace.
Now, the names of the tables inside the keyspaces are determined by the other users, so there is some information leakage there. That makes this only appropriate for semi-trusted environments. Now, Pivotal CF is designed to be deployed behind the firewall, inside private institutions, so it's still useful for staging and development environments.
There's also the noisy neighbor problem. In Cassandra there's no way to limit, for example, the amount of CPU used by queries against a single keyspace, the amount of memory used for queries against a single keyspace, or the amount of disk space used by a keyspace. We actually did some research to see if we could just make use of the underlying UNIX system quotas to deal with the disk quota, at least the keyspace size, and that would kind of work.
The problem is that the way Cassandra dumps what I think are its SSTables means that it would be easy to overrun that quota and then lose data, and that's absolutely unacceptable in our situation. So noisy neighbors are a problem with a multi-tenant Cassandra, and quotas as well: noisy neighbors for CPU, quotas for things like disk and memory.
So we have the multi-tenant Cassandra, which is good for application development and staging. But when we really want to push it out to production, we need to think bigger. At the production level, when a user requests a Cassandra instance, what they're actually going to get is a set of dedicated VMs that are set up as a Cassandra cluster. In the initial version we're just going to hard-code
that number, maybe three, and probably one number per plan, so it's easier to say: I want the small plan, which is three; I want the medium plan, which is, say, ten; I want the large plan, which is whatever the operator wants to make it. Now, this is truly production grade. It's dedicated VMs as a cluster; there are no noisy neighbors whatsoever, and there are no issues with quotas, because the operator can set the size of the disks on these VMs. The only issue with it is that it is quite expensive.
The operator is clearly going to want to limit who can provision a production-level Cassandra cluster. There's also a middle ground, and by the way, this is what we recently inceptioned, about a week ago: we had a big kickoff meeting where we defined the architecture and defined the stories (Pivotal is an agile organization), and we're actively building this version out for the production-level Cassandra. There's another architecture that we've been looking at closely, and it's something that we're considering. It's not quite clear
that this is something our customers are demanding yet, but it is a possible architecture, which is basically a mix between the two. It's a shared set of VMs, but when a user asks for a Cassandra instance, what they get is a set of Cassandra processes spread out across those VMs, configured as a cluster, wrapped inside containers. We use Warden for our containerization technology (you could also use Docker), and the key is that the VMs would be shared.
So a set of VMs, say three VMs, might be running 20 Cassandra clusters. Now, Linux containerization technology is, well, it's not new; it's actually been around for a long time, but it's constantly improving, and with what you can constrain with containers, it is possible to basically remove the noisy neighbor problem entirely. You can constrain, of course, how much disk space is being used, how much memory is being used, how much CPU; you can even constrain the networking aspect, so the clusters can only talk to each other.
You can do all kinds of things there.
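For a flavor of those constraints, here is a hedged Docker example; the image, names, and limits are illustrative, and the exact flags vary across Docker versions:

```sh
# Put one tenant's Cassandra process in its own network so clusters
# only see their peers, and cap its memory and CPU share
docker network create tenant-a
docker run -d --name cassandra-a1 \
  --memory 4g --cpu-shares 512 \
  --net tenant-a \
  cassandra
```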
The nice thing about this is that a user would get access to the entire keyspace, or rather all of the keyspaces. With a multi-tenant solution, a user just gets the one keyspace. Usually for Cassandra that's the right way to go, because usually you have a keyspace per application, but we have seen use cases where people want to be able to provision keyspaces on the fly in order to segregate data.
Let's see: Cassandra automation. Managing a Cassandra database is actually fairly complex: repairs and timings and such. We have a tool called BOSH which makes all of this much easier. BOSH is our tool for deploying everything Cloud Foundry. I actually used to run product for BOSH, so I was running the BOSH team, and I can say this because of that: BOSH is incredibly powerful. It's also incredibly painful to use. I like to make the analogy that BOSH is like... like this.
So BOSH is a tool in the same space as Puppet or Chef or Salt or Ansible, but it does things in a different way: it's predictable, it's repeatable, it's infrastructure-agnostic, and it's built for large-scale deployments. BOSH takes care of the entire lifecycle of a deployment. It takes care of provisioning the VMs, talking to the infrastructure to get that done. It takes care of configuring what's on the VMs, and it takes care of juggling the persistent storage for the VMs as well, all the networking, everything. BOSH is also very, like I said, predictable and repeatable, because BOSH compiles everything from scratch; it does it once, uploads it to a blob store, and uses that same set of blobs to lay down every VM that it deploys. That's also why BOSH is built for large-scale deployments. In addition, BOSH has features like canaries and rolling deployments, which you can configure.
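To make that concrete, here is an abridged, hedged sketch of a BOSH (v1-era) deployment manifest for a three-node Cassandra cluster; the release, job, and resource pool names are illustrative, not the actual product's:

```yaml
# Abridged deployment manifest; the networks, resource_pools, and
# compilation sections are omitted for brevity
name: cassandra-cluster
director_uuid: REPLACE-WITH-DIRECTOR-UUID

releases:
- name: cassandra
  version: latest

jobs:
- name: cassandra_node
  instances: 3
  templates:
  - name: cassandra
    release: cassandra
  resource_pool: medium
  persistent_disk: 65536   # BOSH tracks and re-attaches this disk
  networks:
  - name: default

update:
  canaries: 1              # update one canary node first
  max_in_flight: 1         # then roll the rest one at a time
  canary_watch_time: 30000-120000
  update_watch_time: 30000-120000
```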
BOSH also ensures that all the processes are running correctly on the VMs, the ones you've told it to watch, and it makes sure that the VMs stay healthy. That's a feature that we added to BOSH because of AWS.
It's the BOSH Resurrector. Kind of a side story here: we used to run our public Cloud Foundry installation on a vSphere installation, a vSphere data center that was absolutely production grade: EMC data storage, and I think EMC servers as well.
I think they have, like, guards with machine guns outside the building. Because of that, we didn't really have to worry about HA, because VMs never went away. And then, when I came onto Pivotal, my first task was to take this production platform, this public cloud, and move it over to AWS, which I like to refer to as the quantum flux of the cloud world: AWS has absolutely the worst API and is the least reliable of any of the infrastructures that you could possibly work with.
Basically, on each one of the VMs that's deployed by BOSH, BOSH has an agent that's constantly heartbeating back to the BOSH director. Now, if at any point one of those agents stops sending heartbeats (it could be a network partition, it could be anything, the agent could be down for some reason; BOSH doesn't really care), at that point BOSH just sends a signal to the infrastructure and wipes the entire VM off the network.
Now, again, BOSH is aware of the persistent storage of the VM, so nothing's lost: it just recreates that VM and then attaches the persistent store again. Now the agent's heartbeating and everything just works fine. This has proven to be one of the more important features of BOSH, especially when you're deploying to something like AWS.
There's a difference between precision and accuracy, which I think most of you probably understand. The interesting thing about Cassandra is that it's more important for the clocks on the various Cassandra VMs to be precise than accurate. I don't care if my Cassandra nodes think that it's 1912, as long as they all think that it's 1912, right?
So the way that you get that done is with ntpd, which is the service that ensures that your VMs have the right time set on them. You can tell it (they call it peering) to home to peers in a tiered fashion. This is a very unusual configuration for VMs in a network, but basically you say: all these nodes, I want them all to home to each other.
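As a hedged sketch of that peering setup (the host names are illustrative), each node's ntp.conf lists the other Cassandra nodes as peers rather than only upstream servers:

```conf
# /etc/ntp.conf on cassandra-node-1
# Peer with the other nodes so the cluster converges on a common time,
# keeping the clocks precise relative to each other even if not accurate
peer cassandra-node-2
peer cassandra-node-3

# Optionally still chase an upstream server for accuracy
server pool.ntp.org
```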
Another aspect of maintaining and managing a Cassandra cluster is the repair lifecycle. I'm sure everybody here has dealt with running repairs on a Cassandra cluster, and all the times you might have to do that by hand. Now, the challenge with Pivotal Cloud Foundry is that we're taking over the operations aspect of Cassandra on behalf of the operator, so we can't send any telemetry back to headquarters, because it's all deployed behind a firewall; most of the point of this is that companies don't want that stuff being sent out.
So there's a set of times when we run repairs. The first one is when we're decommissioning a node and we know that that VM is going away permanently. Now, we don't run repairs when we know that the node is coming right back. So if BOSH is doing a deployment and it has to recreate the VM for some reason, or if the Resurrector ran and had to recreate the VM, we don't run a repair then, because in those situations you could actually end up running a repair across the entire cluster and really degrading performance.
We also allow the operator to configure a time threshold: if the node has been down for longer than this period of time, then we run a repair when the node comes back. So if there was some downtime on the node for, say, twenty minutes, and the operator set 15 minutes as their threshold, when the node comes back we will run a repair on that node.
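A minimal sketch of that repair policy, assuming hypothetical helpers like decommissioning? and recreated_by_bosh? (this is illustrative, not the shipped code):

```ruby
# Decide whether a node needs a repair, per the policy above;
# `node` and `operator_threshold` are hypothetical names
def run_repair?(node, operator_threshold)
  return true  if node.decommissioning?     # node is leaving permanently
  return false if node.recreated_by_bosh?   # deploy or Resurrector: it's coming right back
  node.downtime > operator_threshold        # repair only after a long outage
end
```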
So the whole point of this is, like I said: nowadays, with technologies like Docker and all the new routing technologies that are coming out, it's not that difficult to produce kind of a toy platform. The real key is in producing something that manages itself and maintains itself, and that has highly available stateful data services behind it, and we believe that those are the key differentiators for Pivotal CF.
Our customers believe that as well. Our customers are really excited by the Cassandra product that we're building with the DataStax partnership, and it's one of the most important projects that we're working on right now. So anyway, I want to wrap it up quick. Thank you very much for your time. If this interests you, and if you're interested in living in London, we are hiring on the Cloud Foundry team over there.