From YouTube: This Week In Cassandra 2/5/2016
Description
Link to blog post discussed: bit.ly/1mhskwB
A: Are we live? All right, hello! Everyone, welcome to our first This Week in Cassandra. We're going to be talking about what's been happening in the Cassandra ecosystem over the last week. We're going to take a look at blog posts and new projects; in the blog post that accompanies this video you can see any new job postings; and we're going to look at some of the JIRAs and merges that have happened in the open source code base of Cassandra itself. So this in particular is really exciting for us.
B: One thing to mention: there's a Q&A tab if you're watching the Hangout live, so you can ask questions. Just be aware that there's a chance we might not get to a question in time, or it might show up a little late; the Hangout tends to be a little bit weird. So if we don't answer your questions in the Hangout, we can answer them on Twitter, and that way other people might benefit from the answer too.
B: Absolutely. If you look at what he's talking about, there are some really fundamental points in there. If you do data modeling, of course you know you need a partition key, but this is more of a "why", along with the downsides of doing it wrong. Actually, I saw some really cool stuff in there, like how he talks about secondary indexes, which I don't think a lot of people consider: when you're using a secondary index together with a partition key, you do get some benefit from that. Yeah.
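The point about combining a secondary index with a partition key can be sketched in CQL. All table, index, and column names here are hypothetical, just to illustrate the routing difference:

```sql
-- Hypothetical schema: sensor readings partitioned by sensor_id.
CREATE TABLE readings (
    sensor_id    uuid,
    reading_time timestamp,
    status       text,
    value        double,
    PRIMARY KEY (sensor_id, reading_time)
);
CREATE INDEX readings_status_idx ON readings (status);

-- Index-only query: the coordinator may have to fan out to many nodes.
SELECT * FROM readings WHERE status = 'error';

-- Index plus partition key: the query is routed only to the replicas
-- for that one partition, so just those nodes consult the index.
SELECT * FROM readings
 WHERE sensor_id = 123e4567-e89b-12d3-a456-426655440000
   AND status = 'error';
```

That second form is the benefit being described: the partition key restriction keeps the index lookup local to one partition's replicas instead of touching the whole cluster.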
C: The data model I ended up with, because I needed those guarantees, ended up trying to use single-partition operations, and I even did a meetup talk on it, because it was kind of an interesting thing that people sometimes miss, especially if you're a newcomer. So, very good information for sure. Yep.
B: Astyanax is an old friend. I think that was one of the first really big contributions from Netflix to Cassandra open source, because it was a bit of a wild west with drivers back then; this was four or five years ago. It was their internal driver, and it validated a lot, but it was a Thrift driver, and of course that was the gist of the article.
B: Now that Thrift is, well, I guess deprecated, they'll no doubt leave it frozen. There's no reason for them to continue spending time working on it when they're moving forward with a different plan. Yep.
B: It was interesting, because Ilan, who I think was one of the main drivers behind Astyanax, and I were talking once, and he was the first one to turn me on to this idea of using a token-aware driver. When he told it to me, it was one of those things where you go, I guess that's kind of obvious now that I think about it. It just kind of shocked me for a minute; I had to stop eating my pizza.
A: Yeah, eliminating extra network hops: whenever you can take advantage of that, it's absolutely worth it. So, what else have we got here? A multi-threaded program to count the rows of your Cassandra table. Hey, I love a good project that does exactly one thing.
B: You know, that is an interesting problem to me. Brian and I were actually talking about this before he did it. The funny background on this is that he was getting ready to take a cross-country flight and he needed a project. He was like, what's something that sounds simple but is not easy to do? And counting how many rows you have is exactly that.
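The core idea behind a multi-threaded row counter is to split the partitioner's full token range into sub-ranges and count each one on its own thread with a query like `SELECT COUNT(*) FROM ks.tbl WHERE token(pk) > ? AND token(pk) <= ?`. Here is a minimal sketch of that splitting logic; `count_in_range` is a stand-in for the actual query so the logic can run without a cluster:

```python
from concurrent.futures import ThreadPoolExecutor

# Murmur3Partitioner's token range.
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1

def split_token_range(n_splits):
    """Divide [MIN_TOKEN, MAX_TOKEN] into n contiguous (start, end] ranges."""
    width = (MAX_TOKEN - MIN_TOKEN) // n_splits
    bounds = [MIN_TOKEN + i * width for i in range(n_splits)] + [MAX_TOKEN]
    return list(zip(bounds[:-1], bounds[1:]))

def count_rows(count_in_range, n_threads=8):
    """count_in_range(start, end) would run the COUNT(*) token-range query;
    here it is any callable, so the splitting can be tested standalone."""
    ranges = split_token_range(n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return sum(pool.map(lambda r: count_in_range(*r), ranges))
```

Each sub-range query stays small enough to finish without timing out, which is exactly why a plain `SELECT COUNT(*)` over a big table is "not easy".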
A: That's the perfect justification. So, I get to do a shameless plug for my own blog post, on working with Python, Cassandra, and the object mapper. Right now there's no asynchronous programming behind the mapper: you can't, say, create multiple rows asynchronously; it has to be a synchronous process. I show how you can do all of this stuff with gevent, which the underlying driver supports.
C: It was a big pain until then. I mean, it's continuation-style programming, like the callback hell you know from JavaScript and some other languages, until they added async and await. It's funny, they're actually thinking about adding those keywords to JavaScript now too.
A: It's not very fun; you write all these little functions up top and it just gets really hairy. This is nice because, with gevent, it just patches the standard library in Python and yields on any I/O call, so you can have these micro-threads and basically be constantly doing stuff, actually using your CPU instead of just waiting around. It's awesome.
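The gevent trick is to make blocking calls yield so other work proceeds while you wait on I/O. As a rough standard-library illustration of that same "overlap the waits" idea (using asyncio rather than gevent, with a sleep standing in for the driver's network round trip):

```python
import asyncio
import time

async def insert_row(i):
    # Stand-in for one network round trip to Cassandra; a real version
    # would await a driver call instead of sleeping.
    await asyncio.sleep(0.05)
    return i

async def insert_many(n):
    # All n "inserts" wait concurrently, so total time is roughly one
    # round trip rather than n round trips, which is what the purely
    # synchronous object-mapper path would cost.
    return await asyncio.gather(*(insert_row(i) for i in range(n)))

start = time.monotonic()
results = asyncio.run(insert_many(20))
elapsed = time.monotonic() - start
```

With gevent the same overlap happens transparently: monkey-patching makes the standard library's blocking calls cooperative, so ordinary-looking synchronous code yields on every I/O call.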
A: Well, I think this is understandable. I think gevent is a really good wrapper that abstracts away the internals. So, there's another blog post about one of the features in Cassandra 3.0, the new token allocation algorithm, which I think is pretty cool; that's on the DataStax blog. Have you guys looked at this feature already? What's your take on it?
C: When I read it, I'll admit to not understanding all of the explanation of the algorithm involved. But what I took away was that, basically, the idea was for people using vnodes, which is everybody on newer versions of Cassandra; since when was that the default, 2.0?
C: So I guess what I took away is that it's going to try to be smarter about how it distributes vnodes around the cluster, and which machines they're distributed to, and you even get some parameters to play with to give it hints, to be smarter, especially if you've got an unbalanced cluster with some more powerful machines. Maybe you're phasing some new, more powerful machines into an existing cluster, that kind of thing.
B: What it does, I think, is fix a day-zero problem with vnodes that I'm just glad has finally been solved, because the original code was just a randomizer: it took a random token out of the Murmur3 range and said, here, just use this. That's the reason we had 256 tokens.
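For reference, switching the new allocator on is a cassandra.yaml change. A sketch (option names as in Cassandra 3.0; the keyspace name here is hypothetical):

```yaml
# Number of vnodes per node. The old behavior picked these at random,
# which is why such a large default (256) was needed to even things out.
num_tokens: 256

# New in 3.0: allocate this node's tokens so that ownership is balanced
# for the replication settings of the named keyspace, instead of
# choosing them randomly.
allocate_tokens_for_keyspace: my_keyspace
```

The option takes effect when a node bootstraps, which is why it matters most when growing or rebalancing a cluster.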
B: All right. Interestingly enough, this isn't in here, it's from a previous blog post, but we just had that post about vnode and JBOD alignment, and I think that's great, because now we can start getting into more intelligent, operations-type decisions, where we say: I'm going to create so many different JBOD disks and I'm going to match that up to the number of vnodes. You're making those kinds of deliberate decisions instead of just throwing out a number.
A: Yeah, and this JIRA you're talking about, I think this is where it allocates vnodes to disks, and it prevents a whole bunch of failure scenarios. If you lose one disk, you don't necessarily affect every single range of tokens that the node is responsible for; it only affects some of them, so you can just replace the one disk, and it's a less chaotic operation than running RAID across your entire data center. Hopefully.
B: We're on a tangent, but it's a good point. It gets people away from having to do craziness like RAID 1 versus RAID 0, which one am I going to do, because I've got to do something with my disks and I'm not really sure what the allocations are going to look like, or whether JBOD is going to be right. I mean, there are a lot of questions that come up in operations.
A: Definitely. The last thing that we've got in here is logging the generated CQL from the Spark connector. I'm actually really happy to see this, because I was just talking about it with somebody. The big problem, if you're not familiar with the Spark connector, is that you can be executing these big batch jobs against Cassandra and have no idea what CQL is being thrown at Cassandra from Spark, because, especially if you're using DataFrames, it's doing a lot of the work for you. So this is a post from Ryan Svihla talking about the different ways of seeing the generated CQL, and I think it's really helpful. If you're using the Spark connector, to me, you should absolutely read this post. That's just my point of view. Read it right now.
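One way to surface the generated CQL is to raise the log level for the connector's classes. A sketch for a log4j.properties file; the logger category below is an assumption, so check the package names for the connector version you actually run:

```properties
# Assumed logger category for the Spark Cassandra Connector; DEBUG-level
# output from its write/read paths includes the statements it builds.
log4j.logger.com.datastax.spark.connector=DEBUG
```

The post itself covers the different places this output shows up, so treat this as a starting point rather than the whole answer.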
B: I get it, but I guess that goes to the same problem: when you have algorithms generating a query for you. It's like trusting a pit bull: good dog, good dog, then it bit my arm. It could happen, right? You could get a really whack query going, because it's non-deterministic. You've got a good chance it's fine, but as a practitioner you need to have visibility.
A: So, we've got an event coming up. I think we should take a second to talk about Cassandra Day LA on February 17th. If you're in the LA area, you're definitely going to want to check this out, especially if you don't have a lot of experience with Cassandra. Also, it's my birthday.
A: Let's talk a little bit about some JIRA updates. This is where I get to play around with code that's not even released yet, while it's still changing; I love this stuff. The first one I want to mention is an improvement. It sounds like a small one, but it can be a really big deal: fixing the Spark Cassandra connector to support reading from materialized views. That just didn't work, apparently, with 3.0.
A: You don't need to be memorizing every single JIRA, unless you think it's fun. But this one is huge, so we'll probably spend some time blogging about the actual implementation details. The gist of it is that you can do prefix searching, wildcard searching, suffix searching; there's some awesome stuff coming in. And it's fast. This is fast.
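Those prefix and suffix searches land as SASI indexes. A sketch of what the CQL looks like (table and index names are hypothetical):

```sql
-- Hypothetical table with a SASI index on last_name.
CREATE TABLE users (id uuid PRIMARY KEY, last_name text);

CREATE CUSTOM INDEX users_last_name_idx ON users (last_name)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {'mode': 'CONTAINS'};

-- Prefix search:
SELECT * FROM users WHERE last_name LIKE 'Smi%';

-- Suffix search (possible because the index mode is CONTAINS):
SELECT * FROM users WHERE last_name LIKE '%son';
```

The index mode matters: a PREFIX-mode index only supports the `'Smi%'` form, while CONTAINS trades more index space for the suffix and substring queries.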
C: I'll just second what everybody's saying. It's super exciting, being able to do a LIKE. It's bringing us closer and closer to that SQL kind of parity; not that that should be our goal, because we're not a relational database, but it's pretty cool to be able to do the kinds of queries that you're used to doing.
B: I also think this has always been an issue with data modeling in Cassandra: people tend to use secondary indexes a little more liberally than they should, thinking, well, this is the right choice if I'm going to bend my big data model to my will. But if you use them in an incorrect way, they can turn into a real downside, and it's almost an anti-pattern; I know I've had some people say so.
A: But there's a lot of really good performance work in it too. I mean, I don't want to look too deep into the internals, like I said, because it's pretty hairy, but there are some really clever things being done in here that enable really good performance. You still have the problem of trying to do secondary indexes across a distributed database, so you could hit every machine, but...
A: The impact of doing them on each machine is going to be so much less that it's absurd. And the implementation: I keep reading about it, I keep looking through the code, and honestly it just gets more and more impressive. Everything I learn about it, it looks better and better.
A: I'm really pumped to try this stuff out with Spark. So next week I'm going to be playing with Spark and this, and seeing how much of a performance impact we can get from having secondary indexes that are essentially trivial: adding them doesn't make a write take any longer, so you get the same write performance, and you get better read performance and flexible querying that you just could not do before.
A: Grouping is a separate ticket, but that is being worked on too. And there's support for arbitrary numbers of predicates that get built into an actual tree of ANDs and ORs, with a real query planner that can combine operations to make for more efficient lookups. That stuff is all slated to come out; I think it's just being done in stages right now. And I actually prefer that: I'd rather see a little done at a time, as opposed to these huge patches that land and then everybody's rebasing for weeks.
B: That's the dream of tick-tock, right? The way we do tick-tock releases, the even ones are the feature releases, but we do them often enough that something like this could be worked on enough to get into 3.4, where it's functional and people can start using it. There's a company out there, a user, ready to put it into production; God bless them for that. But, you know, don't everyone do that. Still, that's great.
A: Yeah, so I'm loving the stuff that's coming out in tick-tock. I wrote a blog post about what happened in 3.2, and I'll be doing the same thing for every single release, just covering what's happening, like the SASI stuff that's coming out in 3.4. I suspect every feature release is going to be just amazing.
A: So, I think that covers everything we had in here that we wanted to talk about, projects and things like that. Do we have any Q&A?
B: That's a physics question, because... [inaudible].
B: So, what we want every week, John, is this: we're going to do this blog post every week, just to get you caught up, so you don't have to try to track it all the time like we do; we're kind of maniacs for this. What we want to see is projects, blogs, things that are interesting that are happening in the community, and we'll do this weekly.
B
If
you
have
a
job
posting,
you
really
want,
and
we've
already
got
a
couple
for
next
week-
ready
go
ahead
and
send
those
our
way
just
at
Planet
Cassandra
on
Twitter
and
we've,
and
if
it's
more
than
what
you
could
do
in
a
tweet,
then
add
us.
There
hit
us
up
on
Twitter.
We
can
get
into
a
email
exchange,
but
we
want
to
know
about
stuff,
that's
happening
so
that
we
can
convey.
It
is
back
out
to
the
Cassandra
community.
A: All right, anything else? I can't think of anything. I'm done.