From YouTube: This Week in Cassandra: Digging into Sysdig 4/1/2016
Description
Link to blog referenced in video: http://www.planetcassandra.org/blog/this-week-in-cassandra-digging-into-sysdig-412016/
A: So we've got a couple of JIRAs moving here that I want to take a look at before we get to the Sysdig topic. The first one I wanted to touch on is this DTCS JIRA. You can find the link for it in the This Week in Cassandra blog post, but there's a DTCS alternative that's been proposed. Effectively, there's a compaction strategy called date-tiered compaction that's specifically meant to handle time series workloads, and there's an alternative called time window compaction, which was written by Jeff Jirsa and submitted, and he's trying to get it pushed into mainline. So that's kind of what we're looking at there.
B: Sort of the controversy, or the reason why this is a JIRA ticket (and maybe to back up just a little bit and give some history) is that when the date-tiered compaction strategy first came out, there were some issues with it, which are actually linked if you go and look at this JIRA ticket. There were some performance problems, and then there were some concerns about usability. Would that be fair to say? It was hard to configure and hard to use, and so time window compaction kind of came out of users' frustration: yes, they wanted something for working with time series data, but they wanted something that was easier to configure. And the way I read some of this, they wanted to get rid of the tiering part of the compaction strategy as well. And so now we've kind of got this ticket.
B: And that's where they're trying to decide, you know: can we just merge time window compaction strategy in? Is it one or the other, where we have to pick between the two? Are users going to be confused if there are two sort-of time series compaction strategies? And I think the reason we want to highlight this is maybe not necessarily to take a side one way or the other. Maybe... I don't know, John, you like taking sides.
B: I have an opinion, like you say that I do, but to the users that are out there: if anybody's listening to this and working with time series data, and maybe using one of these two compaction strategies, I'd say hey, go take a look at this ticket. The more information we have from you out there in the community on how you're using things, the better this decision is going to come out, basically. Yeah.
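(For readers following along: switching between the two strategies under discussion is a table-level option in CQL. A rough sketch; the table name and option values below are made up for illustration.)

```sql
-- Date-tiered compaction (DTCS), the original time series strategy:
ALTER TABLE metrics WITH compaction = {
  'class': 'DateTieredCompactionStrategy',
  'base_time_seconds': '3600'
};

-- The proposed alternative, time window compaction (TWCS),
-- which trades the tiering logic for fixed time windows:
ALTER TABLE metrics WITH compaction = {
  'class': 'TimeWindowCompactionStrategy',
  'compaction_window_unit': 'DAYS',
  'compaction_window_size': '1'
};
```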
A: We definitely need this. This is a great case of community feedback being absolutely necessary to make the right decision, right? Like, if everybody in the world is using date-tiered compaction successfully, then merging in another thing that does sort of the same thing as a simplified version may not be appropriate. Gianluca, what's the compaction strategy you guys are using over there?
C: Yeah, essentially the main use case for our biggest Cassandra cluster is actually storing big time series data. They're not time series in the typical way you would imagine: we don't just store numbers, we store time series of large binary blobs. So for us, a time series is a series of big binary data at a periodic time interval across time, and we have different timelines, let's say, that go from data with a few seconds of granularity, to data with a few minutes, to a few hours.
C: Yeah, we're running everything on AWS. At the moment we're on c3.4xlarge, so very expensive but very fast SSDs. We're thinking about experimenting with the i2 family to get a larger, essentially a more dense cluster and reduce the cost per gigabyte, but so far the c3 with SSD has been good.

A: Okay.
C: We have essentially different clusters; we have several applications, and we have clusters of all sizes. I wouldn't say that we're overly big: I think our biggest cluster is on the order of 30 nodes or something like that. But we constantly experiment, you know, with fake data and traffic replayers, and so we come up with pretty articulated test scenarios. For example, just a few days ago, while I was doing the troubleshooting that eventually led to the blog post that appeared last week...
C: ...you log into a container and do nodetool status, and you see 60 instances, and you can effectively measure all the cluster activity with a single set of tools, without having to log into each server as if they were separate instances. So yeah, we definitely experiment with quite a lot of those things. Cool.
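(As an aside, the "log into a container and check the ring" step looks roughly like this; the container name is hypothetical, and this sketch is not taken from the episode.)

```shell
# Run nodetool inside one of the Cassandra containers (name is made up)
docker exec -it cassandra-1 nodetool status

# Each entry in the output is one instance in the ring, e.g. 60 entries
# for a 60-node test cluster packed onto a single Docker host.
```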
A: Very nice. The next JIRA that I wanted to take a look at is adding support for a GROUP BY clause to SELECT statements; that's CASSANDRA-10707, again linked to in the blog post. I'm already excited about this. I really enjoy the constant stream of features that are coming out as a result of tick-tock. Like, every two months it's Christmas, and it's shipping new stuff constantly.
A: Oh God, there's...
B: Right. You know, when people asked about aggregations at, like, Cassandra Days and stuff, our answer was typically "use Spark" or "do it in your application", that kind of answer. So yeah, you're right, this is definitely an exciting feature. Yeah.
A: And even if you can do it, you know, you're definitely better off having a good, solid data model where you can avoid some of the crazy stuff anyway. And I think a lot of the aggregations that we're going to see are on time series data; that's where it's really interesting.
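(For context, the GROUP BY support in CASSANDRA-10707 applies to prefixes of the primary key, which fits the time series case being described. A sketch with an invented table, not taken from the episode:)

```sql
-- Hypothetical time series table (names are made up)
CREATE TABLE readings (
  sensor_id text,
  day       date,
  ts        timestamp,
  value     double,
  PRIMARY KEY ((sensor_id, day), ts)
);

-- With GROUP BY, per-partition aggregates can be computed server side
-- instead of in Spark or in the application:
SELECT sensor_id, day, avg(value), max(value)
FROM readings
GROUP BY sensor_id, day;
```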
A: Speaking of which, I think the thing that we wanted to talk about with you, Gianluca, is your blog post from last week, hoping you could elaborate a little bit on your experiences with Cassandra and how you're using Sysdig. And when I say "using" I should really say "writing": you mentioned that you wrote the first line of code of Sysdig, which is pretty awesome. Why don't you tell us a little bit about Sysdig, and kind of the why?
C: Where other people are usually bothered by the fact that they need to do some troubleshooting, I get quite excited. I say, wow, now we have a system problem that we need to address. And while the ecosystem for troubleshooting is essentially very, very good in the open source world (I mean, in Linux there are some extremely killer tools, simple ones and more advanced ones such as perf, tools like that), we thought that with sysdig, essentially, we could do something similar...
C: ...that would make troubleshooting much, much easier, and that's what we tried to achieve with sysdig. Essentially, it can be thought of as a sort of replacement for, and enhancement of, all the traditional troubleshooting tools. So I like to think of sysdig as a tcpdump not just for your network but really for your entire system, with some nice additions. And it turns out...
C: ...that sysdig has a huge advantage when you start talking about containers, because with containers you now have all your processes, such as Cassandra, living in a completely isolated environment, and if you're going to use the traditional troubleshooting tools to dig into live issues, you're going to face significant challenges. For example, let's be practical. In the example that I mentioned before, I have a Cassandra cluster running entirely on Docker on a single machine, which, again, was for testing purposes, but can actually really happen.
C: All inside Docker containers. If you want to measure, for example, the internode traffic activity, you're not going to be able to easily do this with netstat or iftop, mostly because each Cassandra instance lives in its own network namespace. So each Cassandra instance, from the point of view of the system where you're running your netstat, is almost a different network machine, because it has a completely different network stack.
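(This is where sysdig's container awareness comes in. A command sketch; the container name and filter below are illustrative, not taken from the episode.)

```shell
# sysdig sees events from every network namespace on the host, so it can
# attribute traffic to containers. -pc adds container context to the output.
sudo sysdig -pc -c topconns container.name=cassandra-1

# Or rank all processes on the host by network I/O, containers included:
sudo sysdig -pc -c topprocs_net
```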
C: So if you run netstat, you're not going to be able to see, for example, the connections that your Cassandra hosts are making among themselves in the cluster, and this, for example, is something that is extremely interesting to monitor. Because, following up on my blog post again (I didn't have the time to write this part up), what we found out essentially is that not only were we able to get a much lower response time by doing that optimization, but we were also able to massively reduce the network traffic. So the measurements were essentially...
C: Essentially, after doing the troubleshooting, it all came down to the fact that, due to the way we had designed our schema, the coordinator, when it was asking the replicas "give me the data for this specific query", was sending over the network all the columns of the table, which I didn't really need and hadn't really asked for.
C: This was essentially the same problem that I explained in my blog post, but brought to the network level. And so by making the optimization that we were able to spot, by refactoring the schema, we were able to bring the internode traffic back down; I believe it's about four times less traffic now, which is incredible, because the CPU on all our Cassandra nodes went down and, most important, our AWS bill for our Cassandra cluster shrank by almost half.
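(The blog post itself isn't quoted in the episode, but the class of fix being described, where you stop dragging columns you didn't ask for across the network, often amounts to splitting a large blob out of a wide row. A purely illustrative sketch, not the actual Sysdig schema:)

```sql
-- Before: metadata and a large payload share one row, so the read path
-- can end up shipping the blob between nodes even for metadata queries.
CREATE TABLE samples (
  series_id text,
  ts        timestamp,
  meta      text,
  payload   blob,     -- large binary time series sample
  PRIMARY KEY (series_id, ts)
);

-- After: the payload lives in its own table, so metadata reads never
-- touch the blob, and internode traffic carries only what was asked for.
CREATE TABLE sample_meta (
  series_id text, ts timestamp, meta text,
  PRIMARY KEY (series_id, ts)
);
CREATE TABLE sample_payload (
  series_id text, ts timestamp, payload blob,
  PRIMARY KEY (series_id, ts)
);
```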
C: So this was really a massive success. And probably to a very expert Cassandra user this might seem straightforward, but if you don't think of yourself as an expert Cassandra user, then start measuring things, like, yeah, like the network traffic that you see, and start doing some system troubleshooting as well; that can help you essentially shed a lot of light on your problem. So here's what we did, and this is essentially the typical use case you would use sysdig for with containers. So yeah. Cool.
A: This is one of those things that, when you start to explain it, makes a ton of sense. Like, you should absolutely know what's going on in your network. Yeah, but it's also one of those things that's easily overlooked, and I think a lot of people will kind of view stuff like this as premature optimization. They'll quote Donald Knuth and be like, "premature optimization"...
A: ..."we don't need to do that". But I think it falls into that same category as, like, understanding how the JVM works. Oh yeah, look: about three years ago I had an application running in production, you know, at my previous company, and we were seeing some pretty bad performance problems, and when we finally dug in and really tried to understand what was happening in the JVM, just from changing the heap size and the new gen we were able to make...
A: ...queries go from about 25 to 30 milliseconds down to about three. And going 10 times faster is definitely not a premature optimization; it's an absolutely ludicrous improvement. And I think what you're talking about is in that same category. It's something where a lot of people will just kind of assume, you know, maybe looking at this as a waste of time...
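(The "heap size and new gen" knobs mentioned are ordinary JVM sizing flags; in Cassandra they are usually set in cassandra-env.sh. The values below are placeholders to show the shape, not recommendations:)

```shell
# cassandra-env.sh fragment (sizes are illustrative only)
MAX_HEAP_SIZE="8G"   # becomes -Xms8G -Xmx8G: fixed total heap
HEAP_NEWSIZE="2G"    # becomes -Xmn2G: young generation, where short-lived
                     # request objects can be collected cheaply
```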
A
We're
looking
at
this
we're
not
going
to
see
a
big
improvement,
but
to
be
this
thorough
is,
is
so
important
because
you
can
really
get
some
huge
wins
and-
and
I
imagine
that,
like
you
know
that
this
is
one
of
those
things
where
you
know
the
network
traffic
is
going
to
impact,
because
there's
lots
more
data.
Coming
in
on
to
your
coordinator,
it's
going
to
affect
your
GC
times
so
between
life,
JVM
tuning
and
understanding,
your
schema.
You
can
end
up
with
like
crazy
performance
variations,
so
it's
one
of
those
things
where
it's
like.
A
You
know.
People
are
don't.
If
you
don't
take
the
time
to
understand,
what's
going
on
the
cluster,
you
could
run
into
really
bad
results
and
it's
a
distance.
You
know
it's
conf.
It
is
complex.
It's
a
distributed
system.
It's
not
something!
That's
that's
easy
to
get
right
away
and
it
takes
a
long
time
to
get
good
at
it,
but
this
is
absolutely
how
you
get
good
at
it.
B: So, well, another thing that's kind of cool, though, is that we always preach the importance of schema, right? I mean, every event we ever do, we always talk about data modeling, about getting your schema right. And I'd say, having talked to people that are out in the field, even, you know, as often as we are, working directly with customers, they say like ninety percent of the problems...
B: I was talking to Ben Bromhead at Instaclustr the other day, and he said ninety percent of the problems they see are still schema related: people getting their schema wrong. But it's pretty cool that you can have a schema problem like this and then end up using system tools, like sysdig, to actually figure out: oh yeah, this is something that we could solve with a schema change. So it's kind of funny how schema problems manifest themselves as stuff like, you know, network problems, or I/O problems, that kind of thing. Yep, yeah.
A: You'd better have Sysdig on your resume. Awesome. Alright, well, I think that's everything. Oh, one last thing: a call for papers. The Cassandra Summit CFP is open. We have a link at the bottom of this week's blog post, so definitely check that out and submit some talks. There are going to be a lot of them, and the review process is crazy, but yeah, we love good talks at this thing; the Cassandra Summit is always big time. Alright, let's wrap this one up. Thank you very much.