From YouTube: SigBigData Meeting Demo (June 14, 2017)
Description
Cassandra on Kubernetes by Matthew Stump
All right, we're ready. Okay.

Hey, so I'm Matt Stump, the CEO and co-founder. What we've been doing for the past three years is helping IT people migrate legacy workloads from data centers to the cloud, so we've got customers in AWS and GCE.
Historically, we were a shop that dealt with big data systems — primarily Cassandra, but also a lot of things like Kafka. [inaudible] The customers that we're dealing with are really large deployments; one of our deployments is a Cassandra cluster of around six thousand nodes, and we're migrating that into GCE. And then we also primarily deal in regulated environments.
One of the issues that we've run into is that these people didn't have the appropriate staff to run these clusters — there's not a whole lot of large-scale Cassandra people. So we thought that instead of trying to build up those people, we would automate it on top of Kubernetes to make it more manageable.
We did a couple of things. The first is that StatefulSets, although they're very good at giving us EBS volume reattachment and the naming conventions around pods — making sure a pod keeps the same identity over time — don't cover everything. If you did things like lose an EBS volume, the data would go away, and nodes wouldn't know whether or not they're a new node or a replacement node.

There's a lot that you need to do when doing a rolling upgrade — flushing first, all the mechanics of actually running a cluster — so we put a lot of effort into making StatefulSets viable for Cassandra, and we've been pushing that to open source as well: little utilities that make this more viable. In addition to that — and I'll start sharing my screen.
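For reference, a minimal sketch of the "flush first" step: a preStop hook that runs nodetool drain before Kubernetes stops the Cassandra container during a rolling upgrade. This is a common pattern, not necessarily the exact utilities mentioned above; the image tag and grace period are illustrative.

    import json

    # Minimal sketch: flush Cassandra before Kubernetes stops the container.
    # "nodetool drain" flushes memtables and stops the node accepting writes,
    # which is the "flush first" step of a rolling upgrade.
    pod_spec_fragment = {
        "terminationGracePeriodSeconds": 180,  # give the drain time to finish
        "containers": [
            {
                "name": "cassandra",
                "image": "cassandra:3.11",  # illustrative image/tag
                "lifecycle": {
                    "preStop": {
                        "exec": {"command": ["/bin/sh", "-c", "nodetool drain"]}
                    }
                },
            }
        ],
    }

    print(json.dumps(pod_spec_fragment, indent=2))

kubectl accepts JSON as well as YAML, so a fragment like this can be merged into the StatefulSet's pod template.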
So we can pull up the Cassandra StatefulSet, which isn't that interesting, but when you dig into each one of these pods, we do things like add additional containers around it. So we'll have the Cassandra container, but then we'll also add in a log forwarder — in this instance we're using Filebeat — and we'll use the JMX exporter from Prometheus.
Initially, we tried to just have all the logs forwarded to whatever the default log source is for standard out, but one of the things that we ran into is that there are different teams that own different namespaces, and the ability to segregate logs and make them visible to the application developers — who are different from the operations people — was necessary. And then also some of these clusters are secured and private, or may have personally identifiable information; some of them do.
Standard out is just that — standard out — but then we also dump JSON-encoded logs to a shared volume between the Cassandra container and the logs container, and the log forwarder monitors those logs and forwards them to a log aggregator. The other thing that we ran into is with Cassandra's JMX metrics forwarding: by default it's exposing something like 5,000 metrics, and when you poll that with the Prometheus JMX exporter — especially if the node was under load, because we're doing a lot of load testing right now, testing the viability of EBS — it could kill the Prometheus scraper... sorry, the Prometheus JMX exporter; you'd get Jetty errors like "not enough resources". So we had to break that off into its own container as well.
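Putting the pieces described above together, the pod looks roughly like the sketch below: the Cassandra container and the log forwarder share an emptyDir volume for the JSON-encoded logs, and the JMX exporter runs as its own container. Image names, ports, and paths are illustrative assumptions, not the exact values from this deployment.

    import json

    # Sketch of the three-container Cassandra pod described above.
    # An emptyDir volume is shared by the Cassandra container (which writes
    # JSON-encoded logs into it) and the log forwarder (which tails them and
    # ships them to the aggregation tier). The JMX exporter runs as its own
    # container so a struggling exporter can't take Cassandra down with it.
    pod_template_spec = {
        "volumes": [{"name": "cassandra-logs", "emptyDir": {}}],
        "containers": [
            {
                "name": "cassandra",
                "image": "cassandra:3.11",  # illustrative
                "volumeMounts": [{"name": "cassandra-logs",
                                  "mountPath": "/var/log/cassandra"}],
            },
            {
                "name": "log-forwarder",  # e.g. Filebeat
                "image": "docker.elastic.co/beats/filebeat:5.4.0",  # illustrative
                "volumeMounts": [{"name": "cassandra-logs",
                                  "mountPath": "/var/log/cassandra",
                                  "readOnly": True}],
            },
            {
                "name": "jmx-exporter",  # Prometheus JMX exporter
                "image": "example/jmx-exporter:latest",  # illustrative image
                "ports": [{"name": "metrics", "containerPort": 9404}],
            },
        ],
    }

    print(json.dumps(pod_template_spec, indent=2))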
B
So
it's
really
a
three
container
pod
for
Cassandra
itself
within
the
rest
of
the
ecosystem
we
went
through
and
we
made
it
Selenia
Cassandra,
yes
one,
but
we've
also
done
things
like
create
the
staple
sets
for
a
three-tier
deployment,
action,
three-tier
deployment
of
log
stash
and
I'm.
Sorry,
a
free
tier
deployment
of
a
lawful
search
as
well.
So
we
have
a
client
here,
master
tier.
Then
we
have
a
faithful
steps
of
the
actual
data
notes
as
well
and
then
and
which
makes
that
we
can
get
the
data
notice
up
as
necessary.
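As a rough sketch of that three-tier split (names, replica counts, and role flags are illustrative assumptions, not the actual manifests): a stateless client tier, a master tier, and a StatefulSet for the data nodes so the data tier can be scaled on its own.

    import json

    # Illustrative three-tier Elasticsearch layout: client and master tiers
    # plus a StatefulSet for the data nodes, which is the tier that scales.
    def workload(kind, name, replicas, roles):
        return {
            "kind": kind,
            "metadata": {"name": name},
            "spec": {
                "replicas": replicas,
                "template": {
                    "spec": {
                        "containers": [{
                            "name": "elasticsearch",
                            "image": "elasticsearch:5.4",  # illustrative
                            # Role flags shown as env vars here; a real manifest
                            # might set them in elasticsearch.yml instead.
                            "env": [{"name": k, "value": v} for k, v in roles.items()],
                        }]
                    }
                },
            },
        }

    tiers = [
        workload("Deployment",  "es-client", 3, {"NODE_MASTER": "false", "NODE_DATA": "false"}),
        workload("Deployment",  "es-master", 3, {"NODE_MASTER": "true",  "NODE_DATA": "false"}),
        workload("StatefulSet", "es-data",   6, {"NODE_MASTER": "false", "NODE_DATA": "true"}),
    ]

    print(json.dumps(tiers, indent=2))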
And we've also done the same for Prometheus as well. We are starting to run into the limitations of the number of metrics per second that Prometheus is able to collect: right now the Prometheus server, with a standard cluster under load, is routinely pushing 50 to 60 megabits per second down to disk. We're working on reducing the number of metrics that are collected from these deployments to make that smaller. In addition to Cassandra, we're doing the same stuff for Zookeeper and Kafka as well.
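One common way to cut the metric volume is to whitelist only the MBeans you care about in the Prometheus JMX exporter config; the sketch below is illustrative, and the ObjectName patterns are examples rather than the filters actually used here.

    # Sketch of a trimmed-down Prometheus JMX exporter config for Cassandra.
    # Only client-request and table-level MBeans are exported; everything
    # else is dropped. Requires PyYAML (pip install pyyaml) to emit YAML.
    import yaml

    jmx_exporter_config = {
        "lowercaseOutputName": True,
        # Only collect MBeans matching these ObjectName patterns.
        "whitelistObjectNames": [
            "org.apache.cassandra.metrics:type=ClientRequest,*",
            "org.apache.cassandra.metrics:type=Table,*",
        ],
        "rules": [
            # Catch-all rule: export whatever the whitelist lets through.
            {"pattern": ".*"},
        ],
    }

    print(yaml.safe_dump(jmx_exporter_config, default_flow_style=False))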
[Audience question]

Yeah — it's a really interesting initiative, and they're really, really nice. The other thing that we've been doing, speaking on that topic, is that, since we're in a regulated environment, one of the things we've had to do is get through audits — with all the AWS roles and all the containers and VPCs and all that sort of stuff.
B
So
we've
sponsored
Chris
to
make
it
so
that
we
are
able
to
use
our
own
AWS
rules
and
we
are
able
to
the
audit
the
interesting
that
it's
going
to
spin
up
and
we
do
have
self-hosted
containers
for
the
entire
community
employment.
After
that,
we
can
bring
it
through
ways,
security
skins
and
produce
that,
as
an
audit
report
to
our
customers
right
now,
all
of
our
workloads
are
pre-production,
but
we're
looking
at
moving
a
bank
in
the
asia-pacific
region
to
cops
for
a
Kassandra
workload.
B
As
you
said,
Sandra
I'm,
sorry
that
cue
grenade
uses
internally
to
the
mystical
Sutton
to
the
legacy
network.
So
all
these
people
they
have
existing
Cassandra
deployments.
Those
won't
necessarily
go
away,
at
least
not
initially
ability
and
includes
McLeod
and
Cassandra,
because
they
use
constant
protocol
is,
will
broadcast
the
IP
address
that
it's
that
urban
a
signs
to
it,
but
that's
not
how
real
network.
B
So
what
we're
having
to
do
right
now
is
we
have
to
use
the
first
Network
you
can
cause
like
really
where
we
find
sorry
that
note
that
you
were
reminded
Todd
to
a
node
as
one
of
one
Sandra.
You
know
it
allowed
convert
for
kubernetes
and
that
and
we
just
sized
the
instances
for
freely.
So
we
use
different,
auto
scaling
groups
and
different
affinity
and
anti
penny
rules
to
make
sure
that
they
know
to
get
the
points
to
the
correct,
kubernetes
node
and
that's
a
great
thing.
It
is
dedicated
to
a
production,
exchanger
engineering.
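Roughly, the pod spec for that approach looks like the sketch below: host networking so gossip broadcasts a routable address, a required anti-affinity rule so two Cassandra pods never share a Kubernetes node, and a node selector for the dedicated instance group. The label names are illustrative assumptions.

    import json

    # Sketch: host networking plus one-Cassandra-pod-per-node pinning.
    pod_spec_fragment = {
        # Use the node's real network so gossip broadcasts a routable address.
        "hostNetwork": True,
        # Only schedule onto the instance group dedicated to Cassandra.
        # ("nodepool" is an illustrative label an auto-scaling group might carry.)
        "nodeSelector": {"nodepool": "cassandra-prod"},
        "affinity": {
            "podAntiAffinity": {
                # Hard requirement: never put two Cassandra pods on the same
                # node, giving the one-to-one pod/node mapping described above.
                "requiredDuringSchedulingIgnoredDuringExecution": [
                    {
                        "labelSelector": {"matchLabels": {"app": "cassandra"}},
                        "topologyKey": "kubernetes.io/hostname",
                    }
                ]
            }
        },
    }

    print(json.dumps(pod_spec_fragment, indent=2))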
We are evaluating whether or not Calico, with its support for BGP, is an alternative moving forward, so that we can join the two networks; at the moment we're just testing it really lightly. What else... one of the other interesting things is that we were responsible for the original recommendation early on — because we've been doing this for a number of years, six years — to not use EBS and only use instance-store volumes. But we've reversed that: in testing, for both Elasticsearch workloads and Cassandra workloads, EBS is fast enough, especially the gp2 and io1 volumes.
For our core production workloads under load right now, we will typically push somewhere between 10,000 and 20,000 IOPS per Cassandra node, with 10 dedicated CPU cores, 64 gigabytes of memory, and dedicated [inaudible]. So that's pretty much what I wanted to share. Oh, one other thing that we're doing: we started off with Helm, and Helm has a lot of issues.
The ability to understand who's deployed what and what versions things are on — essentially all the control that you would want from an enterprise standpoint — right now is enforced through labels in Helm and convention, and there's nothing that really stops somebody from modifying those out from underneath you or otherwise not doing the correct thing. So we're switching to Service Catalog and brokers for each of these technologies, so that we can spin up not only the core service but everything around it. One of the things that we're doing is generating operators.
We've developed a Prometheus operator to manage all these instances and automate the day-to-day events, so the teams don't necessarily need an operations manager. So that's what we're doing, and I can answer any questions or show you code about what we're doing. It's not the most exciting demo — it's just watching kubectl pods coming up.
[Audience question]

Not at the moment; I mean, we're still early on. It's pretty simple — the brokers are pretty simple — so right now it's working for us; we haven't run into issues. The main issue for us has actually been the stability of the API servers: we have to be monitoring that and be ready to bounce the API server nodes fairly regularly.
[Audience question]

That's something we contributed back to Kubernetes — it wasn't doing any rack awareness, but what we're doing is pulling in the labels from the underlying cloud. We deploy our kops deployments across three AZs, and we use affinity rules and anti-affinity rules to make sure that the nodes are placed evenly across the three AZs.
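For the even spread across AZs, the kind of rule involved looks like the sketch below: a preferred anti-affinity term keyed on the zone label the cloud provider attaches to each node. The app label is an illustrative assumption.

    import json

    # Sketch: spread Cassandra pods across availability zones using the zone
    # label that the cloud provider / kops attaches to each node.
    # ("failure-domain.beta.kubernetes.io/zone" was the standard zone label
    # at the time of this talk.)
    zone_spread_affinity = {
        "podAntiAffinity": {
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {
                    "weight": 100,
                    "podAffinityTerm": {
                        "labelSelector": {"matchLabels": {"app": "cassandra"}},
                        "topologyKey": "failure-domain.beta.kubernetes.io/zone",
                    },
                }
            ]
        }
    }

    print(json.dumps(zone_spread_affinity, indent=2))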
One thing that we have run into — I haven't dug into 1.7 yet — is that there are still a lot of issues with pinning; running different instance groups and dealing with the AZs becomes a little bit awkward.
One thing we do plan on changing: right now we're using just the DNS name of the underlying pod to do seed discovery, but we're going to switch that to a Kubernetes-specific seed provider and a Kubernetes snitch, which will make it easier to consume. Okay, now we're pushing...
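For reference, a sketch of what DNS-based seed discovery amounts to: with a headless service in front of the StatefulSet, the first few pods get stable DNS names that can simply be listed as Cassandra seeds. The service and namespace names are illustrative.

    # Sketch of DNS-based seed discovery for a Cassandra StatefulSet.
    # A headless service ("cassandra") gives each pod a stable DNS name of the
    # form <pod>.<service>.<namespace>.svc.cluster.local, so the first couple
    # of pods can be handed to Cassandra as seeds (e.g. via the CASSANDRA_SEEDS
    # env var that common Cassandra images accept).
    def seed_list(statefulset="cassandra", service="cassandra",
                  namespace="default", seeds=2):
        return ",".join(
            f"{statefulset}-{i}.{service}.{namespace}.svc.cluster.local"
            for i in range(seeds)
        )

    print(seed_list())
    # -> cassandra-0.cassandra.default.svc.cluster.local,
    #    cassandra-1.cassandra.default.svc.cluster.local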