From YouTube: Sony: PlayStation4 and Cassandra - Journey Continues
Description
Speaker: Alexander Filipchik, Principal Software Engineer
It has been 2 years and 20 million+ consoles sold since the PlayStation 4 launch, and Cassandra is still alive and well within our infrastructure. We will cover various aspects of running Cassandra at large scale, share our findings, and discuss some tricks that can make your lives easier. We will share how we handle varying use cases, from batch analytics using Spark to real-time personalized search. And just like before, we will be having a raffle (last time, one lucky attendee walked away with a brand new PS4 Destiny edition!).
B: Thanks, everyone, for coming. I know why you guys are really here, but it's all right: you have to sit through our presentation first, and hopefully you'll get something out of it. So, we launched the PS4 probably less than two years ago, and we've done a lot with Cassandra. We probably made every mistake in the book, and we've learned from it: we've been able to scale, handle crazy amounts of traffic, and really learn a lot. We want to share what we've learned with you all, especially from this past year.
C: And also, you'll get this really beautiful raffle ticket, so keep it with you until the presentation ends; and if you want, just keep it afterwards, like art. Note the number on it: at the end we'll try to pick a winner by generating a random number. I think it will be fun. And be careful with this thing, by the way: I spent two hours yesterday stamping them manually, and I'm not really good at it.
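The draw they describe, generating a random number over the stamped tickets, amounts to something like this sketch (the ticket numbers here are made up for illustration):

```python
import random

def pick_winner(ticket_numbers, seed=None):
    """Pick a raffle winner by generating a random index into the ticket list.

    `seed` is only there to make a draw reproducible for testing; a live draw
    would leave it unset.
    """
    rng = random.Random(seed)
    return rng.choice(ticket_numbers)

# Hypothetical stamped tickets handed out at the door.
tickets = [101, 102, 114, 230, 517]
winner = pick_winner(tickets)
```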
C: Why would you want to listen to us, actually? Slides! That's what people put up slides for, right? So, I'm Alex, a principal software engineer with Sony Network Entertainment. Here's my PlayStation Network ID, Laser Toy, so you guys feel free to add me as a friend: there's functionality that allows you to add me as a friend and find me on PSN, and then we'll talk about it later.
B: The PS4 has really risen in popularity; it's one of the fastest-growing consoles ever. This gives you a feel for the kind of user base PlayStation Network deals with: we deal with over 65 million active users, and on record we have several hundred million users. So you can imagine the amount of interactions that happen, and the amount of user-level content that we have to store in our Cassandra clusters.
B: It's quite a large task, and when you think about what PlayStation Network is, you've got to start thinking outside of PS4 and PS3. We're very much a multi-platform set of services that provides a broad set of experiences: PS4, PS3, Vita, phones, and even televisions, such as Samsung TVs. We have experiences on those as well, and we'll continue to expand and innovate in that direction.
B: Just imagine having two million users logging into your services around midnight and immediately seeing that traffic spike; it was quite an interesting challenge. And as you can see, our growth has continued: we're well over 20 million consoles right now, and we've grown to ten times the amount that we started with. One thing you don't see here is the traffic spikes themselves; this chart is only PS4s sold, but we also deal with traffic spikes.
C: Yes, that is a very sensitive topic. We have a really, really big competitor, and there are a lot of very smart people working for Microsoft. It's very hard to compete with Microsoft and its bright engineers, but we were actually able to do really, really well. To give you a sense of the competition, of how tough it is:
C: So what does it mean for us? It means that we need to iterate fast. We need to bring new features, and we also want to make sure that our services are up and running, that they can scale, and that our back end doesn't fall apart every month or every year. We don't want to see any outages with 100% impact; we want good latencies and a really great customer experience, or rather, gamer experience. Okay.
B: So what do we develop? If you think about all the functionality that we provide, it's not just saying, okay, we have a PS4, or some services inside the PS4; outside the PS4 there's quite a bit of experience too. For example, we actually have a very large social network. Like I mentioned before, we have several hundred million users and over 65 million active users, so you can imagine a social network of that scale. We keep track of all the user activities: what they do, what they play, who their friends are.
B: We have a pretty rich social graph at Sony. We also do things like maintaining the game and video libraries and being able to search through those libraries; Cassandra is actually part of that search solution, which Alex will talk about shortly. We do authentication, we have new features like communities coming out, there are parts that power the store, and we use it for general-purpose things like caching. And there's a lot more. If you look, we have a whole suite of products that we provide, and some are very innovative.
B: For example, we were one of the first to do live streaming TV over IP: we have the PlayStation Vue product, which is extremely innovative, always on the cutting edge. We have game streaming. We have an assortment of products; we've partnered with Spotify for our music business. As you see, there are a lot of use cases we have to handle. What kind of back end do we need to support all those use cases? When we tried to make the tough decision a couple of years ago, the question was:
B: What database solution can provide us with scalability and low latency, and also let us build all these new experiences? There were a lot of debates, a lot of fights, a lot of technologies that we looked over, and Cassandra was the winner. I look back at those meetings and those discussions, and I'm pretty happy with the decision. So that kind of gives you an idea.
B: If you look at all these experiences on the PS4, Cassandra is in there. It may not be obvious, but it's actually behind so many of the experiences on the PlayStation, and this is just the PS4; we're not including other platforms, which also use it. In terms of traffic: our Cassandra clusters are based around serving the end user. We're not talking about analytics data here; we're talking about serving the customer live. So we have a whole bunch of clusters, and we have many nodes per cluster.
C: Right, so let's now talk about real stuff; the advertising part is over, guys. So, friend search: what is that, actually? It's personalized search. We had several use cases where we needed it. For example, as a PSN user, I want to be able to find a friend of mine on the platform.
C: I want to be able to search by name; I want to be able to search by online ID. And if you think about it as just general search: how many users are there on this planet? Seven billion possible gamers. So the solution seems pretty obvious: you just put everything in Solr, scale it up, and then you'll be able to search it. But we had several interesting requirements.
C: Give me all the PlayStation 4 games; give me all the preordered PlayStation 4 games; and it can go on and on and on. So Solr is not really a good solution, let's say, to handle such use cases. We had to build something else.
C: I'll be talking about the friends use case only here. So we have the PlayStation 4 and we have Cassandra; the problem is, as you probably noticed, this big empty area on the slide. I'd love to tell you there's just one arrow going down, but it will not be that simple. The first problem that we needed to solve is: how do we get the social graph? A social graph can be very, very big. It could be billions and billions of edges. It cannot fit in one box's memory; it has to be distributed.
C: You probably want to be able to access it really fast, in parallel. So the obvious solution was to build a microservice on top of it, and Cassandra was a good choice to store this data. The problem is, there are several problems. The first one: you can have those power users with a lot and a lot of friends, and then how do you store that? Good question. How many of you use CQL? Thrift?
C: Right, so if you use CQL, you probably know that there's this partition thing which lives on only one node. With Thrift it's similar: it's just one row, it lives on the same box, on the same node, and it can be replicated. But if you keep adding friends, if you model it as an account ID plus a bunch of friends, then this thing gets really big, and loading it into memory
C: all the time can be expensive, and it will be slow. Also, people create new connections and people break connections, so a lot of updates go into this thing, and a lot of tombstones. We had very interesting challenges there. So we decided in the end: let's just build another service on top of it, let's cache everything locally, and let's have a way to evict the cache. The truth is, memory is much, much faster than going over the network, so if the data is local, access is just blazingly fast.
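The "cache locally, with a way to evict" idea can be sketched as a small LRU cache in front of the database. This is a minimal illustration, not their actual service; `loader` stands in for whatever call fetches a friend list from Cassandra:

```python
from collections import OrderedDict

class LocalGraphCache:
    """Tiny LRU cache for friend lists: keeps hot partitions in local memory
    and evicts the least-recently-used entry when the cache is full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, account_id, loader):
        if account_id in self._data:
            self._data.move_to_end(account_id)   # mark as recently used
            return self._data[account_id]
        friends = loader(account_id)             # fall back to the database
        self._data[account_id] = friends
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)       # evict the coldest entry
        return friends
```

A real deployment would add TTLs and invalidation on friend add/remove; the point is just that repeat traversals stay in local memory.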
C: You can probably go through the whole graph in a matter of seconds, rather than hours if you were trying to load it from a database. So, next thing: the real-time indexer. Why did we need it? When you search for something among your friends, for example you want to find someone who is a friend of your friend, by name, sorted, and all that stuff, you could store those personalized indices somewhere. We tried it, and we failed, actually: you'll notice very soon that the data size grows exponentially.
C: We have these layers that can give us the whole graph really fast, or pieces of the graph, partitions, let's say, and then we just index it in real time. And there are good technologies to do it, like Lucene, for example: a very good one, used inside Solr, and Elasticsearch uses it too. So we can do the same thing, for sure. And then the last piece: you actually don't want to be re-indexing every time, because indexing is CPU-intensive.
C: So here goes the last piece of the puzzle: the in-memory personal index. It's a cache, a distributed cache. It keeps user sessions, let's say, for some time. So when you go to the platform and you search, we're not rebuilding indices. How fast is this thing, and how many accounts are we processing? We're processing millions of accounts per second (that's the number of accounts being added to the index), and search latency is less than one millisecond on average after the index is built. So we learned several good lessons, and we found several bugs.
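The per-user, session-scoped index described above can be sketched as a cache keyed by account, where the expensive build happens once and is reused until a TTL expires. Everything here is illustrative (the build function, the TTL, the substring search); it only shows the shape of the trick, not their implementation:

```python
import time

class PersonalIndex:
    """Per-user in-memory index with a TTL: build once per session, then serve
    repeated searches from memory instead of re-walking the social graph."""

    def __init__(self, ttl, build_fn, clock=time.monotonic):
        self.ttl = ttl
        self.build_fn = build_fn        # expensive: walks the user's graph
        self.clock = clock
        self._entries = {}              # account_id -> (expires_at, index)

    def search(self, account_id, query):
        now = self.clock()
        entry = self._entries.get(account_id)
        if entry is None or entry[0] < now:
            index = self.build_fn(account_id)
            self._entries[account_id] = (now + self.ttl, index)
        else:
            index = entry[1]
        # Stand-in for a real Lucene-style lookup: simple substring match.
        return [name for name in index if query.lower() in name.lower()]
```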
C: So, if you use Astyanax (actually, as I saw, a lot of people don't use Astyanax, but that doesn't mean the CQL drivers might not have a similar issue): in Astyanax you can specify the number of connections per node which your application will try to open, and there is an issue with it. The number of connections always grows; there's no mechanism to shrink the pool. When load spikes, the app server will open as many connections as possible; when the load goes away, the connections stay open.
C: And then, inside the Thrift implementation, there are buffers, and buffers always grow; they never shrink. When we tried to store indices, which can be big, here is what we saw: load spikes, a lot of connections get opened, connections get recycled, and sooner or later a huge payload is fetched or written through a connection. The buffer gets bigger and bigger and bigger, and the application just leaks memory. So there's a fix for it: just evict connections.
C: Watch your connections, and if they are growing too big, or if you see too many idle connections, just kill them. Maybe kill connections every several minutes, for example, to make sure there are no leaks in the buffers.
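The eviction policy just described, watch the connections and drop the ones that sit idle or whose buffers have grown too large, might look like this sketch. The pool internals and thresholds are illustrative, not Astyanax's actual API:

```python
import time

class EvictingPool:
    """Toy connection pool that tracks per-connection idle time and buffer
    size, so a periodic sweep can close idle or bloated connections."""

    def __init__(self, max_idle_s, max_buffer_bytes, clock=time.monotonic):
        self.max_idle_s = max_idle_s
        self.max_buffer_bytes = max_buffer_bytes
        self.clock = clock
        self.connections = []   # each: {"last_used": t, "buffer": n_bytes}

    def checkin(self, conn):
        conn["last_used"] = self.clock()
        self.connections.append(conn)

    def evict(self):
        """Run every few minutes: drop idle or bloated connections."""
        now = self.clock()
        kept = [c for c in self.connections
                if now - c["last_used"] <= self.max_idle_s
                and c["buffer"] <= self.max_buffer_bytes]
        evicted = len(self.connections) - len(kept)
        self.connections = kept
        return evicted
```

A real pool would also close the underlying sockets on eviction; the sketch only shows the bookkeeping.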
C: Another one, which might apply to CQL as well. When you want to fetch friends of friends, if you store it as account to friends, the first query will be "give me all of my friends," and the second query will be "give me all the friends of my friends." It feels like a range query, where you do a row slicing. The default implementation in Astyanax goes like this: even if you use token awareness, it will go to a random node. Actually, if you go into their code, the method that is supposed to find the nodes responsible for a range of keys just returns a no-op implementation, so the client picks a random coordinator.
C: The coordinator then does all this magic and sends the result back. Is that optimal? Not really: it's an extra network hop. So let's go to the slide. The improvement would be to do something like this: you know the token ranges, so you can technically find all the right coordinators in your ring, because you know the ring topology; Astyanax gives you this information. Then you can query those nodes directly, so it will be one network hop.
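The token-aware routing they describe, using the ring topology to find the node that owns a key and querying it directly, reduces to a consistent-hashing lookup. This is a minimal sketch with a stand-in hash, not Cassandra's actual partitioner or the Astyanax API:

```python
import bisect
import hashlib

class TokenRing:
    """Hash a partition key onto the ring and walk clockwise to the first
    token, giving the owning node, so a client can skip the random
    coordinator and go to the right replica in one network hop."""

    def __init__(self, nodes_with_tokens):
        # list of (token, node) pairs, as exposed by the driver's ring metadata
        self.ring = sorted(nodes_with_tokens)

    def token_for(self, key):
        # Stand-in for the partitioner's hash function.
        return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2 ** 64)

    def owner(self, key):
        tokens = [t for t, _ in self.ring]
        i = bisect.bisect_left(tokens, self.token_for(key)) % len(self.ring)
        return self.ring[i][1]
```

With replication, the next replicas clockwise from the owner would be valid targets too; the sketch only routes to the primary.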
C: Cassandra has an interesting mechanism for redistributing load, and, going back to the previous slide, that suggests another idea. As I said, a row can grow; a partition can grow. We found that, just as Cassandra uses multiple tokens per node and distributes load by splitting the token range, you can do the same thing yourself inside Cassandra, and that's what we did for several big rows.
C: We were just splitting them into multiple buckets based on column names, for example based on the identifiers here. The problem with it: you'll need to do more reads to fetch everything. With CQL you can probably do a sequential scan, but with Thrift you'll need to do at least N reads to fetch all those rows.
C: But writing is fast, and you won't blow up your memory when a row is extremely huge. The worst case would be, for example, a friendly account who knows everyone on PlayStation: that would be a really, really big row. All right, then, I'll skip this one.
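The bucket-splitting trick, appending a hash-derived suffix to the row key so one logical wide row becomes N physical partitions, can be sketched like this. The bucket count and key format are illustrative assumptions, not their schema:

```python
import zlib

NUM_BUCKETS = 16

def bucket_key(account_id, friend_id):
    """Spread one logical wide row across NUM_BUCKETS physical partitions by
    hashing the column name (here, the friend id) into a bucket suffix,
    e.g. 'acct42_7'. Writes stay fast and no single partition grows unbounded."""
    bucket = zlib.crc32(friend_id.encode()) % NUM_BUCKETS
    return f"{account_id}_{bucket}"

def all_bucket_keys(account_id):
    """Reading the whole friend list back requires N reads, one per bucket,
    which is the trade-off noted above."""
    return [f"{account_id}_{b}" for b in range(NUM_BUCKETS)]
```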
C: So, the row cache seems like a good idea. I mean, why not utilize memory? Stuff in memory is very fast: Redis is fast, memcached is fast, and we thought it could be a good idea to use it. So we tried it, and it was painful.
C: Honestly, it didn't go really well, even though we thought we had a really good use case for it. We got into GC hell. It wasn't complete GC hell, but after a day of running, instances were going into a weird GC mode, firing GCs one after another, one after another, and CPU kept climbing.
C: Yes, and it takes longer to restart: if you configure it to store the cache on disk, then when you restart or bounce a node, it takes time to rewarm all the caches. So we just decided that there are better ways to spend memory than to use it on the row cache. Spark, yeah.
C: So we thought we could use Spark to run analytics on top of our production data; or maybe we could have a second DC only for analytics, stream data into it in real time, and run Spark on top of that separate Cassandra. We ran some tests, and we didn't really like it. So right now we use Spark with Cassandra differently: we use Spark to monitor Cassandra. We use Spark to process all the metrics, and then all those metrics go into our logging infrastructure.
B: Okay, so I want to talk about another kind of use case, which is designing for migration. What do I mean by that? Well, change is inevitable. We launched a little under two years ago, but the user base has grown, a lot of new features have been added, and things don't always scale the way you initially thought. A column family that exists in a keyspace may have grown more than you expected.
B: Different sets of clusters may have more traffic than you anticipated when you first designed everything. And the thing is, because we have so many interactions, interaction between users, interaction between games and users, users interacting with different features at different times, game launches, totally new things are happening all the time. What does this do? It may put more load on different aspects of your cluster. So what really is the strategy here, and how did we handle it? I think people who've done
B: data migration have probably done very similar things, but when you deal with massive amounts of data, it's very crucial to know what your strategy is and how to make sure you don't affect the end user. So here is the sense of it. Let's say you have an application that handles multiple use cases: these mixed use cases kind of fit under one feature, so it makes sense to put them in this application. Okay.
B: Well, this feature should have its own Cassandra cluster, because that's what we should do: have a cluster per feature, and it manages the different use cases, which is good. But then what happens is that sometimes you'll have unknown or unexpected waves of traffic coming in, which basically bloats the load on a particular user journey, a user flow, and there's an impact on the whole cluster.
B: You may have some content that is a little too wide, and it basically affects other use cases, other flows inside the user experience. What's even worse is, if there's a critical path in one of those other use cases, you're really compromising that flow. So there's a need to change. We've had situations like this, where we reached a point where our cluster's usage was badly imbalanced, and we needed to do something about it. So what do we do? We do a migration.
B: What that means is that we take that aspect, that use case, and move it out: move its data out, move its back end out, so that we can scale it independently and not affect the other use cases' uptime. So what do we do? We basically create a second cluster. They're not connected; they're actually completely independent. We create a second cluster, and then we restore from backup.
B: We take consistent backups and restore them using sstableloader, and from the application point of view we do double writes and double reads while this is happening. It is a little more expensive, but when you're trying to move a large amount of data, make sure the end user is not affected, keep uptime, and take no maintenance window, you've got to do this kind of in-place approach.
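The double-write, double-read pattern during the migration window can be sketched as a thin wrapper in front of both clusters. The dict-backed "clusters" and the read-preference rule here are illustrative assumptions, not their actual client code:

```python
class DualWriteStore:
    """Migration-window wrapper: every write goes to both the old and the new
    cluster; reads prefer the old (still authoritative) cluster and fall back
    to the new one for data that only exists there."""

    def __init__(self, old_cluster, new_cluster):
        self.old = old_cluster
        self.new = new_cluster

    def write(self, key, value):
        self.old[key] = value
        self.new[key] = value   # extra cost, but keeps the clusters converging

    def read(self, key):
        if key in self.old:
            return self.old[key]
        return self.new.get(key)
```

Once the clusters are validated as consistent, "cutting the cord" is just pointing reads and writes at the new cluster and dropping the wrapper.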
B: What we also do is run integrity scripts: things where we can iterate through both clusters and just validate, and see how far apart they are. The idea is that over time these clusters should eventually become very consistent with each other. Then, over time, when we see that it's okay, we cut the cord, and now the application is a lot happier, the database is a lot happier, and in general we're in a good spot with moving our data around.
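An integrity script of the kind just described, iterate through both clusters and count how far apart they are, might look like this sketch (keys and values stand in for whatever rows the script samples):

```python
def compare_clusters(old, new, sample_keys):
    """Compare a sample of keys across the old and new clusters and report
    the differences. Run repeatedly during a migration; the mismatch counts
    should trend toward zero before cutting the cord."""
    missing, different = [], []
    for key in sample_keys:
        if key not in new:
            missing.append(key)
        elif old.get(key) != new.get(key):
            different.append(key)
    return {"checked": len(sample_keys),
            "missing_in_new": missing,
            "mismatched": different}
```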
B: And I think the key problem with managing this by just putting up new DCs is that it means transferring so much data, and it pretty much kills your live cluster's network I/O. Okay, so, moving on to how we can take this a little bit further: obviously, in that previous approach you're still bound by the application itself. Here we're doing the same thing, but the main point is that you want to totally isolate that user flow. So we do the same thing: restore from backup.
B: In some cases, when you first looked at a feature, you may have thought you knew everything that should happen, but sometimes it doesn't happen that way; use cases change. So when I say design for migration, there are some things that we do to evaluate how we can design our applications to account for this potential scenario. What you want to do is really anticipate the critical areas in your feature, and basically push them into a separate keyspace.
B: When you isolate things in a separate keyspace, it's so much easier to restore backups, so much easier to set up a second DC if you have to, and so much easier to move the data around; with everything in one column family it gets a lot harder. Okay, so some critical areas: if you have different kinds of security requirements around your data, if it potentially has some kind of user information that you don't want to share with the whole family of services, segment it out.
B: If you have anything that's unbounded, or potentially a lot of large data, or something that creates a lot of edges, you should split it out into its own area. And if a lot of different services depend on a particular set of data, maybe put it in a spot where you can create new clusters if need be, and just separate out the data.
B: So for our applications, we always separate our connection pools for each critical section. From the application point of view, that makes it a lot easier to make updates and to split things out into separate applications if we have to. And most of all, just monitor everything: always monitor the load on the different keyspaces and different column families, and, above all, monitor while you're doing the migration itself and make sure that your consistency is in a good state.
C: All right, my favorite section: issues. Everyone in this room probably attended multiple sessions yesterday, so you know that Cassandra is rock-solid, bulletproof: no bugs, no issues. You cannot bring the thing down; it's not possible, period. Well, you can, and I think it can be fun. Some of you probably attended the presentation about PagerDuty's experiences with Cassandra. Cassandra is a really good database.
C: Also, you should know what is deployed in your production right now, because if you're going to get the call on a Friday night, you might not be in the best condition to do the troubleshooting and figure it out by yourself. So it might be better to at least know ahead of time that, well, we deployed this version, and these are all its open issues, so we might face them in production. And then, when a new version is released,
C: it's much easier to think about: if we upgrade, what are we getting from the upgrade? That's also why running multiple versions of Cassandra in your production makes it very, very hard to track all the changes, because now you need to know, well, we have versions 2.0.15, 2.0.14, 2.0.13, 1.2, and it just gets hard. So upgrades are a good idea. You probably don't even want to run two-year-old Cassandra clusters; or at least, if you're running them, you should know what is going on with them
C: and what the possible issues inside are. Yeah, this is my favorite one. Two Cassandra rings, right? They should not connect if you don't want them to connect; when you have two rings set up as one DC and a second DC, those will be communicating, but in this example you see this confused node in between. What happened: we had one cluster, we had a second cluster, and at some point this node, which was being decommissioned, crashed and died. We removed it, and those clusters were no longer using it. Later,
C: we brought it up again. It's all elastic; we use Amazon, all EC2 instances, so we just assigned it to the other cluster, and we thought everything was okay, until we saw that the first cluster still thinks that node belongs to it. The first cluster was showing it as down; the second cluster was showing it as up. And we thought, well, okay: at least Cassandra is not streaming data, so everything should be fine. We were planning to actually deal with it, and then I was running
C: a sequential scan to prepare data for a Spark analysis, and I found that when I queried that first cluster, some nodes responded with "keyspace not found." I thought, well, how is that possible? It's definitely there. I started digging, and I found that our client library wasn't aware that the node was down: from the client's point of view, it calls describe, it gets all the nodes, the node looks fine, so it just thinks this one belongs to that cluster.
C: It queries it, the latency is okay, so it was actually going to the wrong cluster. We didn't have any data loss, because the first cluster wasn't really mission-critical, but I was really surprised that it's even possible. Then there was a second problem that we saw, and it's a puzzle, so help me with it. At some point in time, X, we saw a spike in load averages across our Cassandra instances.
C: Right, it looked like a lot of users had just come online with a brand-new game, like Destiny or something; TCP connections from the application to the nodes spiked. You probably see the thing, right? We had a cluster that we were ready to decommission; we had streamed the data off, and it was just sitting there, ready to be decommissioned. And then, definitely, a spike of connections came around.
C: Yeah, but no, that wasn't it this time. All right, so this is interesting; it's another issue: MemoryMeter. It's the thing that monitors how much memory is available to the JVM on the instance, so Cassandra knows when to start flushing memtables, and apparently there is an issue with it. What happened: Amazon killed our node, and we had to decommission it. We started the decommission, it streamed data to other nodes, and the streaming part can trigger it: MemoryMeter goes into an endless loop.
C: It consumes one hundred percent of CPU, and the Cassandra nodes effectively hang: they cannot process data, and they become really, really slow. And we were using vnodes, which made the issue even worse: instead of streaming to just two neighbors, with vnodes it blasted data to all of our Cassandra nodes, so a lot of them went into this hell. For the solution, we had a DataStax engineer
C: who helped us a lot with it, and the solution was to just do rolling restarts until the issue goes away, and then do an upgrade to the latest DataStax version (we use DataStax). So that's why I said you want to know what's deployed. Another interesting one: last time, when we presented last year, we said we went through RAID 0 and RAID 1 and ended up using two disks. We thought it was a good idea. It's not: there are two problems with it. First, a compaction starts, right:
C: the red here is data, the blue is empty space, this one is used space. The first compaction compacts onto disk 2; then the cluster can't start another compaction, because it's not scheduling them really well, and at some point the disk is full. The current behavior is that the node will just go down; but at that point we were running a version that did an interesting thing: it actually decommissioned itself. The node said, fine,
C: "I'm done here, sorry, I can't handle it anymore," streamed its data to the other nodes, and just left. And we were like, well, I thought I had 20 nodes in this cluster; why is it only 19, and then 18? What is going on here? It's fixed now; the default behavior is to just stop. Yeah, so that's actually what was going on: this health checker was removing the node from the ring because the disk was close to full. Not a very nice message. Yeah, EBS: EBS is what we were using to save the other nodes.
D: I had a question about your trick. It looked like what you did was basically take the account ID and append an underscore and a number to it, and I was curious, since you did it that way, whether you considered adding another column for that bucket and then basically making the partition key the two of them combined. It seems
[exchange inaudible]
B: Okay, so the migration: a lot of it was in the planning, and a lot of it was interesting, but the actual execution was probably within a week's time. Then, for data validation, we let it run for a couple of weeks, until we saw that the difference in the data wasn't there.
B: We're using EBS primarily to offload these large compactions. It depends on what kind of compaction strategy you use, size-tiered or leveled or whatever, but if you don't have that fifty percent of free space, sometimes you will get into that situation. You don't want to get into that situation, but if you do, you need somewhere for it to compact, so we throw EBS on there and let it compact. And at least in our case, what we've had is that data is being deleted but not really being cleaned up.
G: Good review, yes. So the migration strategy you talk about sounds pretty similar to how we've been thinking about it. One kind of unanswered question was how we might deal with multi-region replication there, because you can do double writes in the same region, and that is great, but as soon as you have another region, multi-master, that's replicating, it seemed...
B: Well, we've had situations where we did multi-datacenter, and one of the big killers was just the network. For example, we had one datacenter here and one in Japan, and with transferring data over the wire, once you get behind on data, you will never catch up; you just gather hints, for example. So the mechanism you always end up with is a queue: you queue the writes and then use the queue to apply them.
B
Then
you
we
still
a
ways
to
validate
the
data,
because
you
know
you'll,
never
with
the
amount
of
Rights
come
in
you'll,
never
be
able
to
be
one
hundred
percent,
confident
that
the
data
is
exactly
the
same
and
the
way
we're
approaching
migrations.
We
don't
necessarily
use
the
data
center
connection
and
we
don't
run
repairs
like
that.
Basically
amount
of
data
we
have,
we
don't
want
to
take
down
a
live
class
or
so
yeah.
You
use
a
QE
recognize
anything.
B: Yeah, so initially we actually went through a lot of graph databases. We tried OrientDB; we tried to look into Neo4j and things like that. For our particular use case, the thing is, we wanted to go for a very personalized search, but we were very greedy: along with personalized search, we also wanted global search. We want to be able to search across all PSN users, or to search within my local graph.
B: I want to search within my own library, for example. And the problem is, when you have this kind of personalized aspect in one big index, you can imagine how big that index grows and the performance hits you'll run into: your reads take an incredible hit, and you're always re-indexing the entire thing. So I think, for us, on the graph question,
B: we moved out of it because the number of users we had, and the number of connections between users, exceeded what the technologies we were looking at could handle. For example, Neo4j at the time had that kind of master-slave architecture, and without building your own sharding mechanism it was very difficult to do. That's why we leaned on the Cassandra distributed approach, and on doing the personal index to handle searching within someone's own social graph. Yeah.
B: The thing is, the way we back up, we push everything into S3. If we do increments, we actually have our own hand-written scripts that do incremental backups. We have nodes that have over 500 gigs; we have 700-gig nodes, even things that get close to a terabyte, and you can imagine that just trying to back up the entire thing for a hundred-node cluster would take forever. So we do incremental backups, and backing up that way is quite fast. As for restoration, it could actually take days to do that.
B: The idea is that we're not doing any kind of weird replication or anything like that that puts strain on the cluster; we're doing it completely separately, so we have a little bit of luxury in the time. Usually we'd take those backups from S3, put them out, and depending on how big that is, that's the data transfer time. Thank you.
C: Yeah, but it's a little different. Okay: 114.