Apache Cassandra Use Case Interviews from Cassandra Summit Europe 2014, 2 Feb 2015

From YouTube: Apache Cassandra at Spotify with Jimmy Mårdell, Tech Product Owner

Description

Speakers: Jimmy Mårdell, Tech Product Owner at Spotify & Patrick McFadin, Chief Apache Cassandra Evangelist at DataStax

A

So welcome to the European Cassandra Summit 2014, Jimmy. Thank you very much. So Jimmy, why don't you introduce yourself? Tell us what you do.

B

So yeah, I'm Jimmy from Spotify, and I'm the tech product owner of a team at Spotify that's responsible for the Cassandra infrastructure at Spotify.

A

All right, Spotify. Of course I know what it is. I love it. What about those people who live under a rock and have never heard of Spotify? What are you telling people about it?

B

So Spotify is a music streaming service, bringing you the right music for every moment, be it on computer, mobile or tablet. We're essentially your music wardrobe. We have all the music that you could ever want to listen to.

A

I like the all-you-can-eat family.

B

Yeah, so the difference between us and many others is that you don't have to buy the music yourself. You're buying access to music.

A

So I would suspect that you probably have a few data challenges there.

B

Yeah, we have a lot of challenges.

A

So what are some of the use cases that you use Cassandra for?

B

So we're using Cassandra as really our de facto database when we want to store data that needs to be highly available and used by many users, and we have a lot of users. For example, all the playlists that you have created are stored in big Cassandra clusters, but we also use Cassandra for social networking. We're using Cassandra for storing music collections, recommendations and everything.

A

So that was actually one of the first presentations I saw you guys do, on playlists. You had a really interesting playlist application. Of course I appreciate, as a Spotify user, that my data is probably going to be there: when I save something to a playlist, it will stay. But what would you say was one of the main reasons you use Cassandra?

B

So when we started at Spotify, we had this problem: how do you store data reliably for millions or hundreds of millions of active users? And because we're building our own data centers, because we want really low latency when you play music, we had to figure out on our own how to host our own databases. We tried out many different kinds of databases. Relational ones didn't scale very well for us, as you know. We tried out some other ones, and then we found this Dynamo paper, which is famous.

A

The world-famous Dynamo paper, yeah.

B

We started building something on top of that. It was miserable. Okay, let's make a second attempt. But hey, wait, we have some guys here that have done something based on this as well, which seems to be everything we want, right? It's based on Dynamo, and it uses immutable log-structured storage. Immutable, fine, that seems very durable. So we decided to give it a shot.

A

All right, I'm going to ask the question, because I know people are probably thinking this: do you actually store music files in Cassandra? No, we don't. Okay, that's a bad use case, right?

B

Yeah, that's a bad use case, because you typically store structured data in Cassandra. Big data files, like music, MP3 files or whatever format you're using, you typically store in big storage or a CDN, or something like that.

A

So if you were to summarize a use case, I mean, if someone was learning how to use Cassandra or wanted to use Cassandra, what would be a great use case for Cassandra?

B

We use Cassandra for many different use cases. The way Spotify works, we have a microservice architecture, so we have a lot of small services doing one thing and doing that one thing well, and each of these services needs to store data. The data that a service needs to store is usually fairly simple. It can be a simple key-value store, and you can use Cassandra basically as a key-value store, because it's an excellent key-value store. It brings all this durability and everything. But then, of course, Cassandra is so much more.

B

You can also use it to store time series data. It's excellent for time series data, and it's excellent for storing many different kinds of data. It's really this high availability and multiple data center support that has made us use Cassandra for almost anything. So I think there's almost nothing that you couldn't use Cassandra to store.

A

So recently, and we don't have to talk about exactly what the feature was, you and I were talking about how multi-data center is becoming the problem to solve, right? Would you say that's probably one of the primary reasons?

B

Indeed. When we were talking about the storage problem, one of the concepts that we tried out and really regretted was something called home siting, which meant that, as a user, your data was only stored in the data center where your user account was created.

B

I still think that's a fairly common thing, and it works horribly. With data centers, you want to do maintenance, you can have routers or switches going down, and we want to be able to shift a user from one data center to another without losing the data, right? Using multiple data center replication in Cassandra gives us that.
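Editor's illustration: in Cassandra, multiple data center replication is configured per keyspace through the `NetworkTopologyStrategy` replication class. The sketch below only builds the CQL statement as a string; the keyspace name `playlists`, the data center names `eu_dc` and `us_dc`, and the replica counts are hypothetical and are not taken from the interview.

```python
from typing import Dict


def create_keyspace_cql(keyspace: str, replicas_per_dc: Dict[str, int]) -> str:
    """Build a CQL statement that keeps replicas of a keyspace in several data centers."""
    if not replicas_per_dc:
        raise ValueError("at least one data center is required")
    options = ["'class': 'NetworkTopologyStrategy'"]
    # Sort the data center names so the generated statement is deterministic.
    for dc_name, replica_count in sorted(replicas_per_dc.items()):
        options.append(f"'{dc_name}': {replica_count}")
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} "
        "WITH replication = {" + ", ".join(options) + "};"
    )


# Hypothetical names: three replicas on each side of the Atlantic.
statement = create_keyspace_cql("playlists", {"eu_dc": 3, "us_dc": 3})
print(statement)
```

With a keyspace defined this way, every row has replicas in both data centers, which is what allows a user to be served from either one.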

A

And no doubt you've had a data center go down before.

B

Yeah, either because of the provider doing something bad, or something that has gone wrong, or any kind of problem there can be. I mean, you can have network partitions between data centers. There was an example.

A

Yeah, yeah.

B

A couple of months ago there was this shark that took a bite of the transatlantic cable, and this is actually no joke, it was causing real problems with our replication between clusters. But it was fixed later on.

A

Well, I always tell people that if you're getting anywhere close to 100% uptime, you have to be in multiple data centers. Yes. It absolutely has to be. So it's clear that that's what you're doing right now.

B

Yes. Also, multiple data centers give much better latency.

A

True, that's probably true. Now, with your microservices architecture, does that meld well with what Cassandra does for you? I mean, like being distributed?

B

Yes. What we're doing with Cassandra when it comes to the microservice architecture is that we're actually using many Cassandra clusters. That's something I believe not so many companies are doing. I think the more common scenario is that you have just one or two, or very few, Cassandra clusters, you have many keyspaces to store a lot of data in there, and then you have one or two DBAs handling all that. Instead we're going in a different direction.

B

We have very many Cassandra clusters, one for each of these microservices, which makes each cluster much easier to maintain and easier to profile for performance. And if something goes bad, you're only affecting one part of the entire service.

A

So that's interesting. That means that you're pretty decentralized with your Cassandra. So do you have one team that manages all your Cassandra installations? How does that get managed?

B

So once upon a time, until about a year ago, we had, yes indeed, one team that did all the DBA work, managing Cassandra and working with Cassandra upstream. But then there was a shift about a year ago where the operational responsibility was pushed out from this centralized team, and this is not only Cassandra but any operations within Spotify, from one team to all the teams actually developing the features. So we're pushing out the responsibility: share the load, spread the pain. And as a developer,

B

when you have to get up in the middle of the night to fix the code, you get very much more careful about what you put in production.

A

That's becoming more of a common theme, and I think that's great. I mean, it makes you move a lot faster. And that's good: as a user, I like to see more features coming and less downtime. It's awesome.

B

So when we made that change, we thought, okay, this will probably not work well initially, but in the long run it will work well. But it actually turned out that uptime went up right away from the start. Oh, that's great. Yeah, so we were positively surprised by that.

A

So you're doing a talk today. What are a couple of highlights, things you're going to be talking about?

B

So I will talk about exactly that: how we operate Cassandra with this decentralized model, spreading the pain, not having a small team operating all the Cassandra clusters. I will explain how we came to use Cassandra, and then I'll talk a bit about repairs, which, as Jonathan explained this morning, is a very common pain in Cassandra. And finally, I will talk about date-tiered compaction, which is a recent invention that we did at Spotify and which is now available.

A

So that one is actually something you and I share a little common ancestry on, because that was something that I was very interested in early on, and then I was in Stockholm last spring and I met Björn. So now Björn works for Spotify. Yeah, he does. Can you tell a little bit about how that came about?

B

Yeah, so about a year ago, our main Cassandra engineer, Marcus, came up with this idea that for time series data, none of the compaction strategies, size-tiered or leveled, is actually perfect for the use case. So he suggested we could do something different about this, and then we had this master's thesis student who wanted to work on Cassandra, and we thought that was a perfect fit.

B

So that's how it really got started.

A

So really it's about having a different compaction strategy built around the fact that you know the data is immutable and that it's going to have a long tail, and to stop recompacting data that's already been compacted.
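Editor's illustration: the idea described here, grouping files by when their data was written so that old data is left alone, can be sketched as follows. This is a simplified model and not Cassandra's actual date-tiered compaction code. The `SSTable` type, the one-hour base window and the growth factor of 4 are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class SSTable(NamedTuple):
    name: str
    newest_write: int  # timestamp in seconds of the newest data in the file


def tier_for_age(age: int, base_window: int, growth: int) -> int:
    """Return 0 for the newest time window, 1 for the next, larger one, and so on."""
    tier, upper_bound = 0, base_window
    while age >= upper_bound:
        tier += 1
        upper_bound *= growth
    return tier


def group_by_tier(sstables: List[SSTable], now: int,
                  base_window: int = 3600, growth: int = 4) -> Dict[int, List[str]]:
    """Group SSTable names by the time window their newest data falls into."""
    tiers: Dict[int, List[str]] = defaultdict(list)
    for table in sstables:
        tier = tier_for_age(now - table.newest_write, base_window, growth)
        tiers[tier].append(table.name)
    return dict(tiers)


def compaction_candidates(tiers: Dict[int, List[str]], min_files: int = 2) -> Dict[int, List[str]]:
    """Only windows holding several files are worth compacting together."""
    return {tier: names for tier, names in tiers.items() if len(names) >= min_files}


now = 1_000_000
tables = [
    SSTable("a", now - 100),      # a few minutes old
    SSTable("b", now - 1_000),    # still inside the first hour
    SSTable("c", now - 5_000),    # over an hour old
    SSTable("d", now - 100_000),  # more than a day old
]
tiers = group_by_tier(tables, now)
print(tiers)
print(compaction_candidates(tiers))
```

In this model only the two recent files share a window and get merged. The older files sit alone in their windows and are not rewritten again, which is the "stop recompacting" part of the description.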

B

So it's really organizing the data based on knowledge of what the writes look like.

A

So are you using that in production right now? You have this in production right now, so what have you seen? What are some of the benefits you've seen?

B

So it's only been in production for like two or three weeks, so unfortunately we haven't gathered much data on it yet. We have several use cases for it, and we're using it right now for a case with TTL data. One other benefit of date-tiered compaction is that if you have a lot of TTL'd data, you can actually drop entire SSTables at a time, because you're not mixing old and new data.

A

Because all of the time series data, or the data that's in a certain time frame, fits in one file.

B

Yes, exactly, and then you can drop it. With TTL you sometimes get tombstones that you can't easily compact away, but with date-tiered compaction you can actually drop the entire SSTable, because you're not mixing old and new data.
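Editor's illustration: a file can only be deleted whole once even its newest data has passed its TTL, which is why keeping old and new data in separate files matters. The sketch below shows that core condition only. The real logic in Cassandra has more conditions than this, and the `SSTable` type, the timestamps and the single TTL per file are assumptions made for the example.

```python
from typing import List, NamedTuple, Tuple


class SSTable(NamedTuple):
    name: str
    newest_write: int  # timestamp in seconds of the newest data in the file
    ttl: int           # time to live in seconds, assumed equal for all data in the file


def split_expired(sstables: List[SSTable], now: int) -> Tuple[List[str], List[str]]:
    """Separate files whose data has all expired from files that still hold live data."""
    expired, live = [], []
    for table in sstables:
        # The newest data expires last, so it decides the fate of the whole file.
        if table.newest_write + table.ttl <= now:
            expired.append(table.name)
        else:
            live.append(table.name)
    return expired, live


tables = [
    SSTable("day1", newest_write=100, ttl=50),
    SSTable("day2", newest_write=150, ttl=50),
    # A file that mixes old and new data stays alive because of its newest write.
    SSTable("mixed", newest_write=190, ttl=50),
]
expired, live = split_expired(tables, now=200)
print(expired, live)
```

The `mixed` file shows the problem with strategies that blend old and new data: one recent write keeps the whole file on disk, expired data included.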

A

So that actually sounds like a great idea: you're just deleting a file.

B

Yes, and that works very well for time series. We're still very early on in the testing phase, so I can't give you any numbers, but when it was developed, we did run it in survey mode for a long time in production and saw that the performance was much better.

A

So are you looking at using this more for an operations need, or is it for performance, for read performance?

B

I would say read performance, mostly.

A

And that's really one of the keys: it gives you the same read performance as leveled compaction, and a lot less compaction, like size-tiered.

B

A typical case, where we're trying this out now, is that we have this Cassandra cluster that monitors all the services at Spotify, thousands of them, and there are a lot of metrics that come in. It's your standard time series use case, and then we have all the graphs and everything. The most common thing is that you only want to look at the graphs from the latest day or latest week, but we store the data for an entire year, right?

B

All that old data is there, and you can query it when you want to, but usually you ask for the latest data. So this is exactly what you would like to use date-tiered compaction for.

A

So I'm excited about this, because this has been something I've been wanting for at least a year or two, and I've been anxious to watch and see how it goes. So I'll probably be hitting you up again to see how things are going. Maybe the next summit talk will be on how it works in production.

B

Yeah, I would be happy to talk about that.

A

So, final thought. When someone's getting started with Cassandra, what advice would you give them, especially in a large organization?

B

If you get started with Cassandra today, you should realize that you will get exposed to CQL immediately, but CQL is not SQL. So you really need to understand how Cassandra works. You can't just assume, oh, now I can use Cassandra as a relational database, because you really need to understand data modeling. There are a lot of excellent guides out there on how to get started with that, but also on how to set up Cassandra clusters. Make sure you read up on best practices.

B

I think the documentation on the DataStax website is just fantastic. Just take your time to read through that, and you won't make very many mistakes, at least.

A

And if you make a mistake, there's plenty of people to help.

B

And you learn from making mistakes.

A

Well, thank you for being a part of that community too. I did a meetup there, and I think that was probably one of my biggest ones.

A

It's a great place to do a meetup. You have a great space. Okay, well, thank you very much for being here today. Thank you.