From YouTube: This Week In Cassandra: Tales from the Field 4/15/2016
Description
Link to blog referenced in video: http://bit.ly/1MwM8c4
A: Alright, we're back with another This Week in Cassandra. This week we have with us Ryan Svihla from DataStax. He has seen about 500 great clusters and maybe 500 broken clusters; not really sure if that ratio is correct, but he's seen a lot of clusters in his life. He's going to be talking with us today about some common anti-patterns and things that people just constantly seem to do wrong, which is a really good time. Ryan, thank you for joining us. Nice to have you.

Thanks for having me, appreciate it.
A: We also have Luke Tillman and Sheree ("Hi, I'm also at DataStax"). So it's a DataStax crew today; not a bad time. You know, we're people too, so we get to talk. So, first big news in the Cassandra community: I got my first haircut of the year. I mean, there were lots of rumors leading up to it. It's confirmed, the hair has been cut; everybody can relax now. But also we have the Cassandra 3.5 release, which has just gone out.
B: Actually, for the tick-tock branch we have only four bug fixes, but most of them are critical. In fact, two of them are critical bug fixes for SASI. One of them is about SASI going OOM if you have two big SSTables, and the second one is a really, really nasty index corruption. So if you are planning to use SASI, please use 3.5 and not 3.4.
B: The idea of the test was: what if I want to full scan (not really a full scan, but fetch a third of my table) out of Cassandra into Spark for analytics, right? It is an analytics scenario, so I want to use SASI to filter down to a little bit of my data, and this is how it works. So I'm just fetching the data out. Okay.
A: Very cool. So, 3.5: we talked about this before, the odd/even thing. You can tell what kind of release it is by the number at the end; if it's an odd number, it's a bug-fix release. 3.5 is just bug fixes, there were no new features. So if you're sort of bleeding edge, but you don't want to be quite as bleeding edge as you could be, maybe you should think about rolling out odd releases.
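That odd/even convention is mechanical enough to sketch in a few lines. A minimal illustration of the tick-tock numbering rule just described (the helper name is ours, not anything in the Cassandra tooling):

```python
def tick_tock_release_type(version: str) -> str:
    """Classify a Cassandra 3.x tick-tock release by its minor version.

    Per the rule above: odd minor numbers (3.1, 3.3, 3.5, ...) are
    bug-fix-only releases; even ones (3.0, 3.2, 3.4, ...) add features.
    """
    minor = int(version.split(".")[1])
    return "bug-fix" if minor % 2 == 1 else "feature"
```

So 3.5 classifies as bug-fix only, while 3.4 and the upcoming 3.6 carry new features.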
A: Maybe if you don't care, maybe you're more like Luke and you're like, "I'll just put anything into production, I don't even care if it's been tested by anybody, just go for it." You know, you're more of a free spirit. Then you could put in 3.4, or perhaps 3.6 when it comes out, right?
A: Thank you. So the next thing we had: there was a blog post on Planet Cassandra I want to talk about, on going multi-DC. This is an interesting blog post because of what it talks through: this guy wrote up his own approach.
D: Maybe there's a race condition, maybe a millisecond, but I always want to get the right answer. And so where this sort of thought process surprises people, at least I find, is this: if you have a transient failure, and let's say that you fail a write to your local data center (maybe one of your replicas, or two of your replicas, were down), you're going to fail over to that remote data center. But now you have this extra latency.
D: This is where a lot of people who do this actually don't like the result. In the end, one, they don't like the extra latency that they get when they fail over, and two, it doesn't cover the common failure modes that occur. So you end up in this sort of anti-Goldilocks world: you are too slow to satisfy your SLAs and you're too inconsistent to satisfy your implied contract.
D: Because your app servers were talking to a number of Cassandra nodes, and so now you're going over a particularly constrained pipe and sending all that data over. And if you've thought about that and you've sized for that, there's no problem with it. Like, I have customers that have over 10-gig links between data centers; you know, they're actually close to each other, they're actually just different failure zones, and so they have the latency where they can go, okay.
D: All these app servers are going to talk to this other data center, and that's fine, that's completely acceptable. But most people that do this haven't thought it through, and it's one of these running-with-scissors type processes. Just because you can do it doesn't mean it's a good idea, yeah. And so you definitely really want to be careful; you want to think about it, alright.
A: So you hit an interesting point a little bit earlier, where you talked about having a 10-gig link. I think one of the interesting things that a lot of people find confusing is the difference between bandwidth and latency, right? Like, you could have, you know, a 10,000-gigabit connection, but with the distance between your data centers you're just fundamentally limited by the speed of light there, right?
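A back-of-the-envelope way to see the bandwidth-versus-latency point is to compute the physical floor on round-trip time. This sketch assumes signals in fiber propagate at roughly two-thirds of c; real routes are longer and add switching delay, so treat the result as a hard lower bound, not an estimate.

```python
SPEED_OF_LIGHT_KM_S = 299_792  # speed of light in vacuum, km/s
FIBER_FACTOR = 2 / 3           # rough propagation speed in optical fiber

def min_rtt_ms(distance_km: float) -> float:
    """Lower bound on round-trip time between two sites, in milliseconds."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_FACTOR)
    return 2 * one_way_s * 1000
```

For data centers roughly 4,000 km apart (about US East to Western Europe), min_rtt_ms(4000) comes out around 40 ms, and no amount of bandwidth changes that.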
A: That's, yeah, a common, common mistake people make: "I have these huge connections." Like, yeah dude, you're not breaking physics.
C: I was just going to say, I think he kind of points out in this article too (if you're listening to this and you go and check out the article on Planet Cassandra) the different ways that you can fail over. So as opposed to doing this sort of round-robin, or this DC-aware kind of failover policy in the driver, he does talk about failing over, you know, basically the entire data center, and uses Netflix...
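The contrast being drawn here (per-query spillover in the driver versus failing over a whole data center at the app level) can be sketched with a toy host-selection function. Host and DC names are made up; this illustrates the two policies, not the driver's actual implementation.

```python
# Two toy failover strategies over made-up hosts in two data centers.
HOSTS = {"us-east": ["e1", "e2", "e3"], "eu-west": ["w1", "w2", "w3"]}

def dc_aware_plan(local_dc, up):
    """Driver-style per-query failover: local-DC hosts first, then remote."""
    local = [h for h in HOSTS[local_dc] if h in up]
    remote = [h for dc, hs in HOSTS.items() if dc != local_dc
              for h in hs if h in up]
    return local + remote

def whole_dc_failover(local_dc, up):
    """App-level failover: stay entirely local until the whole DC is down,
    then move every query to a remote DC instead of mixing the two."""
    plan = [h for h in HOSTS[local_dc] if h in up]
    if plan:
        return plan
    return [h for dc, hs in HOSTS.items() if dc != local_dc
            for h in hs if h in up]
```

With two local replicas down, the first policy quietly sends some queries across the WAN and eats the latency discussed above; the second keeps everything local until you consciously fail the whole data center over.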
C: I mean, we love to use Netflix as the example, you know, but yeah, he totally uses Netflix and talks about failing over at the app level. And I think the point that Ryan made is important, because a lot of times the reason, you know, that people are doing multi-data-center Cassandra in the first place is because they want either a disaster-recovery kind of scenario, or they want to get the data closer for better latencies, you know, a better customer experience, essentially, right?
C: They want their clients that are in Europe to get, you know, low latency, a good experience with their site. And so this idea of having, like, your European data center potentially spewing all these queries over to your us-east data center: you're essentially going to make the latency worse and kind of defeat the purpose of why you did it, why you went multi-DC in the first place. So, interesting read; definitely recommend checking out the article. Yeah, right.
A: So this is kind of an interesting segue for us. One of the many really cool parts of having you on, Ryan, is we get to talk about the other ways in which people have tried to do things, and, you know, either succeeded, or (I think) you've seen a lot of their failures. I mean, you were a solutions architect for a long time; now you're doing a lot of support and really kind of helping people through problems that they see.
D: Nothing is as common as people doing QUORUM, or EACH_QUORUM, more often than they probably intend to for their use case. Mm-hmm. So that's really common, and generally misunderstandings of consistency levels. And one common, very common manifestation of that is the downgrading-consistency retry policy, where people will have a very aggressive initial retry policy and then, of course, it'll fall back all the way to ONE, which never...
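For reference, the arithmetic behind those consistency levels: a quorum is a simple majority of replicas, and EACH_QUORUM demands that majority in every data center independently. A small sketch:

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond to satisfy QUORUM: a simple majority."""
    return replication_factor // 2 + 1

def each_quorum(rf_per_dc: dict) -> dict:
    """EACH_QUORUM requires a quorum in every data center independently."""
    return {dc: quorum(rf) for dc, rf in rf_per_dc.items()}
```

With RF=3 you need 2 of 3 replicas for QUORUM; with RF=3 in two data centers, EACH_QUORUM needs 2 replicas in each DC for every write, which is why it fails the moment any single DC loses two replicas.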
D: Yeah, and actually I've had some really good debates with really smart people defending that, and I respect it, and I think it's fine if you understand the nuance. I don't want to tell people not to run with scissors; you know, if you know what you're doing and you're okay with the ONE, then whatever. But it's a transient failure that triggers the lower consistency, the lower policy, and so you wouldn't know about it.
A: There is, there is a case for it. I worked for a company where we were doing some analytics, basically, right, and we tried our best to have the information be really, really accurate. But there were times where, like, you know, maybe there was a problem, and so we would rather show our page to someone with a result and a thing that says, like, "warning: this may not be totally correct, but here's something anyway." And so...
A: Yeah, that's the case; it's not just like, "oh, whatever." Like, no: if you're comfortable with being like, "yeah, I downgraded to ONE and I don't need to tell anybody," if you feel like you don't actually need to tell people, then you probably could have used ONE in the first place. I completely agree with you, yeah. But that's the nuance, right? It's like, if it meant, if...
D: No, no, I mean you have to just be honest with yourself about what a consistency level means at some point, right? So if you're going to use any database that accepts multiple writes, whether that's in a global data-center sense or not (because, you know, this applies whether you're using MySQL or Oracle and you're using multi-master replication), you have multiple sources of truth. As soon as you have that, you have to be aware of that inconsistency.
D: In fact, they should have been using lightweight transactions; that's exactly their use case. Locking mechanisms are another case where you should use lightweight transactions, and not, like, a downgrading-consistency retry policy. But I mean, this stuff keeps coming up because it sounds good. It sounds like, "oh, I want to be really available; that's why I picked Cassandra in the first place." It's very seductive, yep. And I've had multiple people say...
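The lightweight-transaction use case mentioned here is compare-and-set: the write applies only if a condition on the current value holds. A toy model of that contract (Cassandra actually implements it with a Paxos round across replicas, not a local dict; the names here are illustrative):

```python
class ToyLWT:
    """Compare-and-set semantics of a lightweight transaction's IF clause."""

    def __init__(self):
        self.rows = {}

    def insert_if_not_exists(self, key, value):
        # Mirrors INSERT ... IF NOT EXISTS: applied only when key is absent.
        if key in self.rows:
            return False, self.rows[key]  # not applied; current value echoed back
        self.rows[key] = value
        return True, value

    def update_if(self, key, expected, new):
        # Mirrors UPDATE ... IF col = expected: applied only on a match.
        if self.rows.get(key) != expected:
            return False, self.rows.get(key)
        self.rows[key] = new
        return True, new
```

A lock built this way is race-free because the condition check and the write are one atomic step, which is exactly what a downgrading retry policy cannot give you.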
A: Yeah, that's... I mean, the deletes, okay, the deletes are an interesting one, because obviously you can have zombie data, right? Like, we need to get those tombstones in place, on every replica, before gc_grace_seconds expires. But there are tons, there are absolutely cases where you need to do repair even if you've never issued a delete. And you have worked with people who have just been like, "whatever."
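The zombie-data timeline described here can be made concrete. This toy model (replica state reduced to whether the tombstone arrived, with a made-up gc_grace window) shows why repair has to land before the tombstone is purged:

```python
GC_GRACE = 10  # a deliberately tiny gc_grace_seconds, for illustration

def delete_resurrects(replicas, deleted_at, repair_at, now):
    """True if the deleted value can come back from a replica that
    missed the delete, after tombstones are purged.

    replicas: list of dicts; a replica that saw the delete has a
    "tombstone" key. repair_at is None if repair never ran.
    """
    missed = any("tombstone" not in r for r in replicas)
    repaired_in_time = repair_at is not None and repair_at < deleted_at + GC_GRACE
    tombstone_purged = now >= deleted_at + GC_GRACE
    return missed and not repaired_in_time and tombstone_purged
```

Once the tombstone is purged, nothing marks the stale replica's copy as deleted, so the next read repair happily spreads it back: the zombie.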
D: Yeah, so I actually did a blog post about it, called "Cassandra repair should be called mandatory maintenance." Instead of nodetool repair it should be, like, nodetool mandatorymaintenance or something like that, or, you know, a warning should go in the logs: "you haven't run repair in X, Y and Z." It's just a very necessary thing.
D: I think a lot of people come from a background where you don't have to run this sort of operation on the database, and, because of the nature of what a mind shift Cassandra is, I can see the argument for "maybe it's something we should be handling on our own, maybe it's part of a server process." Yeah.
D: You know, where people over-tune compaction, or they make so much of the node a compaction engine that it has nothing left to do anything else. Like, my all-time favorite: they unthrottle the compaction, which people will use to get out of a problem, but a lot of customers run that as their normal default; the compaction throughput setting isn't throttled at all, and then they'll submit a ticket for "why do I have bouncing latency?" Okay, so this is really common.
D: But again, you have to remember the historical context. Like, sometimes I sound like I'm being hard on people, but this is all new, this is all complicated. We'd had 30 years of third-normal-form training and different, you know, strategies with databases, and now we're telling people the opposite thing. And, you know, they haven't had to think about compaction in the way that we now just, you know, have to. For them, tuning is not ideal; we need some more auto-tuning in Cassandra, I think.
D: To make this something that they don't have to get wrong. But yeah, so compaction is really tough for people to tune correctly. I see people crank concurrent compactors really high; that's very common. They'll have spindles and forty-eight cores, and they'll have concurrent compactors at, like, 48, and their 7200 RPM hard drives will be dying, spindles going everywhere. Yeah.
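The spindle anecdote suggests a sizing intuition: compaction parallelism should be bounded by what the disks can absorb, not by core count. The heuristic below is our illustration of that intuition, not official tuning guidance:

```python
def suggest_concurrent_compactors(cores: int, data_disks: int, ssd: bool) -> int:
    """Toy rule of thumb: never run more parallel compactions than the
    storage can service; spinning disks hate concurrent sequential streams."""
    streams_per_disk = 2 if ssd else 1
    return max(1, min(cores, data_disks * streams_per_disk))
```

On the 48-core box with one 7200 RPM spindle from the story, this says 1 compactor, not 48.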
A: Well, even, you know, even in the Cassandra world, or just the tech world in general, enough has changed in the last five years (let's forget about the last 30 years) that the advice you may have gotten five years ago, or three years ago, about, say, JVM tuning may be completely different right now. For instance, all the advice that everybody came out with two years ago about tuning the JVM for, let's say, write-heavy workloads is, like, horrible.
A: It's a problem, because you need a really big new-gen size if you want to do read-heavy workloads, because you have a lot of these short-lived objects, and if your new gen is too small then they get promoted prematurely. You end up with a lot of full GCs, because you're just constantly reading a ton of data off disk and pushing it into your old gen, and then you full-GC.
A: So it's hard to keep up with it. And you've got this distributed system, and, you know, maybe you have it spread across 500 servers, and it's just like, how does this thing behave? It's just hard to understand sometimes, I think. So yeah, I agree, there are a lot of areas where we can improve these things; like, there's a lot of auto-throttling that can happen.
A: Yes, one hundred percent on that; kind of like what you just brought up. G1 will auto-size the number of new-generation regions, old gen, survivor; like, it will take care of everything for you, eventually. I think, you know, you'd probably want to see that out of your database too. Yep.
A: The last thing I wanted to touch on with you was storage, because people... even I hit this, and I'm not dealing with people in the field as much as you are; I'm dealing with people that are new to the Cassandra world. Like, Luke, you and I have gotten this question a million times, and Ryan, you've probably seen this a bunch of times. I mean, in the beginning you were like, "dude, don't say SAN around me."
D: Oh boy. So recently I had a ticket where somebody had four nodes per VM and they were all sharing the same one. That was pretty bad, and it was, like, 7200 RPM spindles in a RAID 5 configuration, so that's pretty bad. The worst was: I had a six-node cluster all sharing the same LUN, and that one was falling over.
A: Okay, so those are two things I wanted to mention. The last thing, yeah: Sheree, what have you seen in the way of gnarly SAN business, with users using SANs?
B: I didn't really see sad people using SANs, because I stop them before they use it. But I did have a very interesting question, like: "oh, by the way, we did some performance benchmarks, but the results are really, really bad." So I asked them, "how did you do your performance test?" "Oh, we popped up, like, five virtual machines, and then we started injecting data, and then it just failed." Well, it's five virtual machines sharing the same disk; sorry, there is no magic, you always hit the same issues.
C: That's funny. I like people benchmarking on SANs. I think one thing that people don't know (and this is true of benchmarking Cassandra in general) is, you know, they'll run a benchmark that runs for five or ten minutes, or half an hour, or something like that, and data never ever actually gets flushed to disk in the first place, or they don't run it long enough for compaction to actually kick in. So they kind of get this false sense of security that, you know, "oh, my SAN's..."
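A quick sanity check for the short-benchmark trap: estimate how long the run must last before a memtable flush (and, later, compaction) even happens. The numbers below are illustrative assumptions, not Cassandra defaults:

```python
def seconds_until_first_flush(memtable_mb: float, write_mb_per_s: float) -> float:
    """How long a steady write load takes to fill the memtable once;
    before this point a 'disk benchmark' never touches the disk."""
    return memtable_mb / write_mb_per_s
```

With a 2 GB memtable filling at 5 MB/s, seconds_until_first_flush(2048, 5) is about 410 seconds, so a five-minute benchmark finishes before the first flush, let alone the first compaction.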
D: I guess, really, before anybody gets started with Cassandra: make sure that you think about things as how they have to work, not as magic, not as "Cassandra's just handling it for you magically." This is a distributed database, and if you don't understand how something works, ask. Definitely ask on IRC; there are some really helpful people there. Yep.