Description
Speakers: Alexander Filipchik & Dustin Pham, Staff Software Engineers at Sony Network Entertainment
Since the launch of the PlayStation 4, many of the PSN features have been delivered using Cassandra. We will be talking about our experience as we launched one of the most popular gaming consoles in the world on well over 300 nodes.
- Why we picked Cassandra
- Exactly what PSN features for PS4 are powered by Cassandra
- The infrastructure used to deploy our clusters
- How we monitor system health
- How we design, test and deploy
- Issues we faced and lessons learned along the way
Alex: I'm glad so many people want to learn about our experience with Cassandra. Usually our competitor is Microsoft, but today our competitor is Netflix, right? A big Cassandra user. And I'm glad that we're actually interesting to other people, that they want to learn from us. So today we're going to talk about our experience with Cassandra.

As some of you might know, we're a very small team. Six months before the launch we were not sure if we were going to use Cassandra in production. After we made the decision, in those six months we designed our horizontally scalable system, deployed it to production, and survived the launch. We kind of won the console war, and it's in production, full speed, right now.

So, myself: I'm Alexander Filipchik, I go by Alex. I worked at Google for a little bit, I worked at Betfair, and when I joined Sony, it was really funny, I found that our particular team, the San Francisco team, really works in a startup environment with a startup ethic. That actually helped us a lot during the launch preparation.

Dustin: So yeah, I'm an avid gamer; working for Sony has been a dream come true for me, and making this work was very valuable for my career. Hopefully we can share some of those lessons with you. Who's here to hear us, and who's here for the free PS4? I'm kind of curious. Okay, all right, just the one honest person. Alright, okay.

Alex: Today people are talking about journeys towards Cassandra, so we'll talk about ours too. We'll talk about the Cassandra-backed features: which particular features of the PlayStation 4 infrastructure, and the Sony infrastructure in general, use Cassandra behind the scenes. Then I'll talk about the ops stuff, and we'll finish with our lessons learned, because we think we learned a lot of interesting lessons. We did so many things wrong, but it all turned out to be okay: fail fast and be successful at the end, right?

Dustin: We can't talk about a journey without talking about the challenges our particular team had to deal with. For one, like we said before, we're a small team. Sony is a big company, but little do people know, there are a lot of teams that make it happen, and they're all a little bit different; they all do things in different ways. Our team in particular was pretty small, very much like a startup, which means we didn't have some of the usual operational support.

B
So
it
was
that
was
one
of
the
big
challenges
legacy
support.
We
also
have
to
support
some
of
our
previous
years
released
features
and
make
sure
that
those
are
still
performing
to
scale
any
prada
shoes
we'd
have
to
kind
of
this
juggle.
Support
of
that,
and
also
creation
of
new
features,
hardware
deadlines,
this
one's
an
interesting
one.
I
don't
know
how
many
software
companies
get
to
deal
with
this,
but
we
had
a
very
hard
deadline.
When
it
came
to
the
hardware
hardware
was
going
to
be
released
november
29th
of
2013.
There
was
no
changing.
B
That
firmware
was
going
to
be
flash
on
to
the
console
there's
no.
Turning
back.
We
had
to
make
sure
all
our
iPads
were
built
and
up
to
scale.
We
have
to
meet
that
deadline.
There's
no
saying
we're
going
to
push
an
alpha,
ps4
or
something
like
that.
You
have
to
really
deliver
and
also
we
have
to
scale
at
peak
time.
B
So
why
Cassandra?
Some
of
these
sound,
very
basic,
but
they're
really
mattered
a
lot
to
us
when
deciding
this
technology,
for
example
the
strong
community
being
a
small
team,
we
needed
a
technology
that
we
can
rely
heavily
on
the
community
and
their
learnings.
So
we
don't
make
the
same
mistakes.
We
made
new
mistakes,
but
it's
it's:
okay,
horizontally,
scalable
architecture,
duh,
yeah,
yeah,
sure
you
really
want
this,
but
we
knew
relaunching
with
the
million
consoles.
We
knew
this
would
grow.
B
We
really
want
to
have
a
database
that
would
scale
with
our
user
base
good
performance,
another
no-brainer.
But
honestly,
yes,
we
need
to
be
able
to
serve
our
customers
with
with
low
latency
high
reliability
and
just
be
able
to
provide
a
very
responsive
service
for
them.
Cost-Effective,
Cassandra
great
for
running
on
commodity
hardware.
B
I
guess:
maintenance
costs
are
significantly
less
than
if
you're
talking
about
like
an
Oracle
or
you
know,
MySQL
or
something
it's
a
little
bit
more
cost
effective
to
to
maintain,
and
it
was
a
brand
new
adventure
for
us.
It
was
a
good
to
as
our
kind
of
culture,
this
kind
of
delve
into
some
of
the
new
technologies
and
really
make
it
work
in
a
prod
setting.
Alex: All right, so you're probably wondering what kind of features are powered by Cassandra. What's interesting: when the recruiter called and asked me, do you want to join Sony, I was like, why? It's a hardware company, why are you calling me? I'm a software engineer; I want to own the web part of this universe. And then I learned that Sony has a lot of services; actually, the whole thing is about software, not about hardware. So this is a PlayStation 4, an actual real screenshot.

What's New is one of the features powered by Cassandra. It's all internal, our small Facebook, which is actually quite powerful; not all the features are exposed as of now, but it works like a mixture of Facebook and Twitter. The video library: we do streaming, like Vudu or Hulu, one of our competitors, and if you're following the announcements, we'll be rolling out cloud TV soon, so it will get very interesting very soon.

The games library ("My Library") is powered by Cassandra. What's interesting about the game library: it's essentially entitlement storage, personalized data about what a user owns, and we built an interesting custom solution on top of Cassandra that allows us to run very, very fast queries to pull user-related data.

Notifications are powered by Cassandra. The Live Area is powered by Cassandra too; it's a place where you can get information about a particular game: what your friends are doing, what trophies your friends are getting, what's going on in a game. You can see recent activities and related items, like add-ons; all of it is powered by Cassandra. Our store catalog, the web catalog, is powered by Cassandra too, and the PlayStation catalog is one of the biggest out there.

It's on the same scale as the Apple catalog or the Microsoft Xbox Live catalog: a lot of products, every single game, and a lot of movies. Pre-order functionality is powered by Cassandra; it helped us a lot during the Destiny launch, another big title, and I'll talk about it later. The PlayStation Plus feature is powered by Cassandra too; PlayStation Plus is kind of our community offering.

Recommendations use Cassandra as well, I would say. The download list: if you buy a game online using our web store, you can push it to one of your PlayStation 4s or PlayStation 3s, or a PlayStation Vita; Cassandra too. The Share button: you want to share your experience, create a small movie and upload it, or a screenshot; Cassandra. And identification too; there's a separate talk being given about that 2.0 stuff.

Dustin: Alright, let's talk a little bit about the opsy stuff, the infrastructure. Our Cassandra is hosted in both the cloud and physical data centers. Fun fact: we're at several hundred nodes and growing, and we basically arrived at the approach of one cluster per feature. I think when we started out we split clusters by read/write patterns and all that, trying to be really smart and figure out clever ways to do it, but then we moved away from that.

That turned out to be the most robust way of doing it. Vnodes and assigned tokens: I'll talk a little bit more about that, but we have kind of a mix right now; certain clusters are vnodes, some other ones are assigned tokens. And we use the Astyanax client; we're not using any of the other drivers currently. Okay, that's for the PS4 cloud nodes.

B
So
this
is
not
the
complete
Cassandra
ecosystem,
but
this
is
the
kind
of
part
that
we're
responsible
for
data
throughput
we're
talking
about
gigabytes
per
second,
we
transfer
a
lot
of
data
in
between
clients
and
our
Cassandra.
Our
read
rights
over
200,000,
a
second
that's
what
data
size?
We
have
tens
of
terabytes.
This
is
mainly
user
serving
data,
it's
not
like
user
event.
Data
which
is
are
actually
significantly
larger
than
this.
Our clusters: like I said before, we initially split them based off read/write patterns, and it didn't work out for us. We had some interesting downtime due to putting a little bit too much into a particular cluster, so we moved away from that; it kind of coupled all of our systems. Our seeds are referenced by DNS names. This allows us to change seeds behind the scenes without having to change any configuration in applications or anything like that.

It just works, and it's actually been a great help for us; it lets us do maintenance in a much easier way. Major compaction: we do quite a lot of compactions for some column families. I don't know if anybody's run into the situation where your SSTables keep growing and for some reason they're not shrinking; sometimes a full compaction is the only thing that really gets the dead data out of there.

A typical node: as you can tell from this, we're on AWS, so we have m2.4xlarges, and we're also using some i2.2xlarges, about the same size but SSD-backed, which really helps with compactions. That's a typical node in our world, and we use the ephemeral disks. I'll talk about disks later: how we're configuring them, what we did during launch, and some of our lessons there.

We put the commit log on the root partition, which helps when you need to restart or rebuild a node. Right now we're doing everything via the topology file. I know there are things like the EC2 snitch, but at the time we launched we went with the topology file, which we're currently managing with Chef. It used to be a pain to maintain; now it's a little bit more automated, but in retrospect maybe we should have investigated a different way of maintaining our clusters.

Okay, next is the way we set up our replicas: we stripe them between availability zones, with adjacent tokens alternating between AZs. So obviously, if an AZ goes down, we still have some nodes up. We're using Chef to maintain all of that right now; that's how we built it out.

However, with RAID 0, obviously, if you lose one device, you lose the array. We had situations where this would happen, and that's why we thought, okay, maybe we should do RAID 1. You're probably asking why: you have a replication factor, why would you even need to do this when you can handle a disk going out?

B
You
haven't
no
go
out
you're,
going
to
put
more
load
on
the
adjacent
nodes
to
transfer
the
data
back,
especially,
we
have
a
lot
of
data,
so
this
reliability
at
one
point
seemed
very
important
to
us,
but
the
problem
with
this.
Obviously
you
get
half
the
space
and
also
your
you
know
your
throughput
changes
as
well.
So
right
now,
what
we're
doing
we
just
we
broke
the
raid,
we
split
it,
and
now
we
have
configured
casana
just
to
write
the
two
partitions.
So now we have part of the data on one partition and part on the other disk. That's what we're doing right now; we had to split the RAID because we were running out of space, so that was some interesting emergency maintenance. Plus resizing: for many of our assigned-token clusters there's no magic here, we just double them, that's it. Moving nodes we could do, but it's a very network-heavy operation; we've found the network has always been kind of a big problem for us, which I'll talk about in an upcoming slide.

B
We
had
this
kind
of
learning
of
thrift
payload
size,
so
we
did
hit
this
limit,
which
we,
with
the
current
code,
that
we
have
how
we're
handling
particular
users.
We
had
some
really
power
users
out
there
who
would
own
I,
don't
know
how
many
was
maybe
like
of
like
10,000
games
crazy,
but
it's
true
and
we're
having
this
huge,
just
really
important
user
report
that
hey
some
things
are
not
updating.
I'm
can't
see
certain
kind
of
games
in
my
in
my
in
my
library
and
that's,
obviously,
a
problem.
The temporary fix right now is to increase those thresholds (the Thrift framed transport size limits in cassandra.yaml) to allow such a user to write his data into our store. Not ideal; we may have to look into changing the code, but for now this was how we got around that. Bouncing nodes: we had a situation with vnodes where you would add a node and all the nodes would have a different view of the cluster. Half your nodes think the other half is dead, and that half thinks the other ones are dead.

They didn't agree on anything. It turns out that gossip in our version, with vnodes, was super chatty: when nodes tried to communicate, it took such a long time to figure out which nodes were up that they would actually be reported as down. So we changed the phi_convict_threshold to give the nodes a little bit more leeway in deciding whether a node is up or down, and that kind of helped with that situation. Next, inter-data-center latency.

Some of the ways we got around that: for one, sharding the data, always replicating regional data into its regional data centers and not replicating it to the other data centers; that saved a lot of data transfer. We also added another tunnel to let the data transfer across. But that was one of the problems with having a VPN pipe; it's kind of dangerous.

Alex: All right, so one of the important lessons we learned: you want to monitor everything. Every single possible metric you can collect, go and collect it. It helps a lot, because when things go wrong, they go wrong very fast; we saw a domino effect. You might think taking out a hundred nodes is not possible. It is possible.

A
You
can
lose
all
your
cluster
100
notes,
cluster
in
matter
of
20
minutes
right,
it's
cascade
and
very
fast
out
of
memory
errors,
for
example
very
dangerous
ones,
corrupted
SS
tables
another
another
interesting
things
that
like
that
was
one
of
the
reasons
we
actually
we
switch
from
read/write
patterns
to
feature
by
cluster.
That's
to
make
sure
our
features
are
independent
and
some
small
feature
cannot
kill
bigger
feature
right
so
just
turn
just
to
make
sure
we're.
A
Not
we
don't
have
this
coupled
system
sitting
in
one
database
yeah
because
interesting
things
happening
so
we
use
nature's.
We
using
Cuban
elastic,
search
to
search
our
logs
vision,
graphite
form
application
matrix.
We
not
only
monitor
Cassandra
metrics,
like
internal
Cassandra
metrics.
We
also
monitor
what
is
the
latency
between
our
apps
and
Cassandra's.
That's
important
for
us,
because
really
I'm
not
really
interested
in.
A
What's
going
on
inside
Cassandra
right,
if
it
given
me
two
milliseconds
to
one
millisecond
latency
between
my
app
and
Cassandra,
it's
fine,
if
it's
eight
millisecond
or
if
it
spikes
up
to
a
second,
that's,
not
good,
and
then
internally
everything
could
be
looking
very,
very
good.
I
mean
it
could
be
one
bed
node
and
then
all
our
notes
are
fine,
but
there's
another
thing
about
Cassandra
and
then
now
I
don't
know.
If
you
saw
it
Spotify
presentation
last
year
explained
it
very
well.
A
One
bed
note
can
actually
can
actually
because
of
one
bad
note,
your
application
latency
can
spike
right.
It's
not
like
one
note
goes
down
and
all
the
writes
and
reads
against
this
note:
spiking
or
everything
can
bounce,
and
there
is
an
explanation
yeah
you
can.
You
can
actually
it's
kind
of
very
simple
explanation
because
of
the
internal
implementation
and
cues
in
the
way
Cassandra
redistributes
reads
and
writes.
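To make that concrete: measuring the app-to-Cassandra latency can be as simple as wrapping every call in a timer and shipping the numbers to Graphite. A minimal sketch, assuming the Dropwizard (Coda Hale) Metrics library and its Graphite reporter; the talk names Graphite but not a specific client library, and the host and metric names here are placeholders:

```java
import java.net.InetSocketAddress;
import java.util.concurrent.TimeUnit;

import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;
import com.codahale.metrics.graphite.Graphite;
import com.codahale.metrics.graphite.GraphiteReporter;

public class AppSideLatency {
    static final MetricRegistry registry = new MetricRegistry();
    // The app-side view of Cassandra: what the service actually experiences,
    // independent of Cassandra's own internal metrics.
    static final Timer cassandraRead = registry.timer("cassandra.client.read");

    public static void main(String[] args) {
        // Ship metrics to Graphite every 10 seconds (placeholder host).
        Graphite graphite = new Graphite(new InetSocketAddress("graphite.internal", 2003));
        GraphiteReporter.forRegistry(registry)
                .convertDurationsTo(TimeUnit.MILLISECONDS)
                .build(graphite)
                .start(10, TimeUnit.SECONDS);

        Timer.Context ctx = cassandraRead.time();
        try {
            // the actual Astyanax read would go here
        } finally {
            ctx.stop();  // records one app-to-Cassandra round trip
        }
    }
}
```
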
We also use AWS CloudWatch, not for everything, just to check things sometimes. App-level monitoring is Graphite. OpsCenter: we have it, we use it sometimes, but we rely more on our own stuff, on what we built. So, Destiny. How many people have heard about this game? A very, very big game; a lot of people were waiting for it. It was launched on Tuesday. Oh, cool. What's interesting about the Destiny launch is that our RPS actually spiked to five times the average.

A lot of people were waiting for midnight, and we couldn't afford any downtime at all. And everything went smoothly, actually. So here are some stats. This line is our RPS against one of our APIs, not against the whole ecosystem, just to give you some idea. At that time, Monday midnight, it's usually around 530 requests a second, and traffic was already higher even before the launch; then right at midnight we got a huge spike of RPS, and it was a very big hit.

On the bottom, and this is logarithmic, you can see our Cassandra latencies for some of our APIs, like the get-index latency and the get-catalog latency. You might wonder why we see two or three milliseconds here while the red line is probably 25 milliseconds.

The thing is, our production systems don't operate on tiny payloads; we don't transfer tiny bits of data, sometimes we transfer a lot of data. For example, one of those calls transfers on average 100 kilobytes from or to Cassandra per request. That's why some latencies are higher, but compared to the legacy system the latency is still very, very low. So, the lessons learned section. It was a very interesting launch; it was a very crazy launch.

Microsoft, with the Xbox launch, was very close. They had more people, probably maybe ten times more people than we had, and we didn't have any choice: our dates were not movable and we just had to make it work. We couldn't afford any downtime on launch day; we knew we were going to sell 1 million consoles right away and, in the following months, another one to two million consoles. And we didn't have any migration from an old system to Cassandra.

Some of the interesting lessons we learned were about the Astyanax client and cross-DC latencies. We made a mistake in a config file. It was a very small mistake, but it gave us a lot of headaches: the Astyanax config was not set up to send requests to the local data center, so our apps were going to random data centers, for example from the US to Japan. The latencies were terrible. That was fixed prior to launch.

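For reference, a minimal sketch of the kind of Astyanax setup this lesson is about. The cluster, keyspace, seed, and data-center names are placeholders, and it assumes Astyanax's setLocalDatacenter knob on the connection pool configuration; the commented line is the kind of thing that was missing:

```java
import com.netflix.astyanax.AstyanaxContext;
import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.connectionpool.NodeDiscoveryType;
import com.netflix.astyanax.connectionpool.impl.ConnectionPoolConfigurationImpl;
import com.netflix.astyanax.connectionpool.impl.ConnectionPoolType;
import com.netflix.astyanax.impl.AstyanaxConfigurationImpl;
import com.netflix.astyanax.thrift.ThriftFamilyFactory;

public class LocalDcKeyspaceFactory {
    public static Keyspace connect() {
        AstyanaxContext<Keyspace> context = new AstyanaxContext.Builder()
            .forCluster("prod")        // placeholder cluster name
            .forKeyspace("library")    // placeholder keyspace
            .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
                .setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE)
                .setConnectionPoolType(ConnectionPoolType.TOKEN_AWARE))
            .withConnectionPoolConfiguration(new ConnectionPoolConfigurationImpl("pool")
                .setPort(9160)
                .setSeeds("cassandra-seed1.internal:9160")  // DNS-named seeds, per the talk
                // The crucial line: pin requests to the local DC so a US app
                // never round-trips to Japan.
                .setLocalDatacenter("us-east"))
            .buildKeyspace(ThriftFamilyFactory.getInstance());
        context.start();
        return context.getClient();
    }
}
```
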
Another interesting problem: imbalanced node traffic. Has anyone seen it, when you have two very hot nodes and another node is just doing nothing? It was another interesting thing. Astyanax by default uses an MD5 hash algorithm to find the node that owns a particular row key, but our Cassandras were using Murmur3. So what was actually happening is that Astyanax was constantly picking the wrong coordinator nodes, and because Murmur3 tokens are 64-bit while MD5 is a 128-bit hash, I believe,

you can imagine that every hash generated by Astyanax was off. That's why we had two very hot nodes.

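A hedged sketch of the client-side fix, assuming an Astyanax release recent enough to ship Murmur3Partitioner: token-aware routing only helps if the client hashes row keys the same way the cluster does, so the MD5-based default has to be overridden.

```java
import com.netflix.astyanax.connectionpool.impl.ConnectionPoolType;
import com.netflix.astyanax.impl.AstyanaxConfigurationImpl;
import com.netflix.astyanax.partitioner.Murmur3Partitioner;

public class PartitionerFix {
    // Against a Murmur3Partitioner cluster, Astyanax's MD5-based default
    // computes the wrong token for every key, so TOKEN_AWARE routing keeps
    // picking the wrong coordinators (the two-hot-nodes symptom above).
    public static AstyanaxConfigurationImpl tokenAwareConfig() {
        return new AstyanaxConfigurationImpl()
            .setConnectionPoolType(ConnectionPoolType.TOKEN_AWARE)
            .setPartitioner(Murmur3Partitioner.get());  // match the server side
    }
}
```
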
DNS caching in the JVM, another interesting thing. On AWS, different things can happen. For example, Amazon can send you a notification saying, you know what, your instance, the physical instance, will be decommissioned in two weeks, or in three weeks; or it can actually just go down.

Amazon had a pretty bad outage one day where they lost a whole availability zone, and that's a reality; it's not that it could happen, it happens all the time. Using DNS names for seeds helped us, because we had the ability to swap instances behind the scenes. But if you're going to follow the same strategy, remember that by default Java caches DNS names; you can change that property.

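The property in question is the JVM's DNS cache TTL. A minimal sketch of capping it programmatically; the 60-second value is an arbitrary example, not necessarily what Sony used:

```java
import java.security.Security;

public class DnsCacheConfig {
    public static void main(String[] args) {
        // With a security manager installed, the JVM caches successful DNS
        // lookups forever by default, so a seed swapped behind a DNS name
        // would never be re-resolved. Cap the cache instead. These can also
        // be set in $JAVA_HOME/jre/lib/security/java.security.
        Security.setProperty("networkaddress.cache.ttl", "60");
        Security.setProperty("networkaddress.cache.negative.ttl", "10");
    }
}
```
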
The CPU utilization numbers here are not correct; it looks high, but we captured this after we fixed the issue. Other interesting lessons: a single bad node can raise latencies significantly. Your latency will spike and then go down, but it won't go down to the same level as before, and that situation requires fast action, because the effects can cascade to other nodes.

When you see a latency spike, you don't actually know what's going on with your cluster without checking some stats; it could be that one of the nodes just died, or something else. We had a situation where one of our nodes started to fail for no reason. The way we found it is that we saw three other nodes gathering hints, gigabytes and gigabytes of hints, and they were constantly failing to transfer those hints to that one node.

Our strategy for such a situation, at least what we learned with our limited operational support: we just rebuild the node. We don't bother fixing a bad node; we just rebuild it in real time. Depending on the instance size and the data size it takes from several minutes to several hours, but it's actually safe: you won't see any downtime if you're careful, and it's very fast.

Yep, taking out an entire Cassandra cluster is very, very easy. I can give you some real examples. If you write a lot, you can fill up memtables very, very fast, and then out-of-memory can follow, and out-of-memory will cascade across the whole cluster. We saw it, not in production, but during our stress testing across all our APIs.

The first fix was scaling up, and the second fix: we compress everything we write to Cassandra. We're trying to minimize traffic as much as possible, because it's easy to scale our app-level layer by adding more Tomcats (we use Java), and it's much, much easier to do the compression on the app side, on the client side, than to write raw data to Cassandra, wait until it compresses it, and risk killing it.

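A minimal sketch of what client-side compression before a write can look like, using plain java.util.zip; the talk doesn't say which codec or serialization they actually used:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class ClientSideCompression {
    // Compress in the app tier: Cassandra then stores and ships fewer bytes,
    // and spends no CPU compressing on the hot path. Tomcats are easy to add;
    // Cassandra nodes are not.
    public static byte[] compress(String payload) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buf)) {
            gzip.write(payload.getBytes(StandardCharsets.UTF_8));
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // The compressed blob would then be written as an ordinary column
        // value, e.g. via an Astyanax MutationBatch (omitted here).
        byte[] blob = compress("{\"titleId\":\"CUSA00001\",\"entitlements\":[]}");
        System.out.println("compressed to " + blob.length + " bytes");
    }
}
```
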
Another thing: CQL queries. I'm not sure about the latest Cassandra 2.1, but on the version we run you can kill a Cassandra node by running the wrong query. You can actually do it if the data is not indexed. Be especially careful if you created your schemas using Thrift, using a Cassandra client, and then try to do a stupid thing like, how many rows do I have, let me do a count. It can bring a node down; you'll see out-of-memory.

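If you really need a row count from the client, a safer pattern is to page through the token ranges instead of asking one coordinator to materialize everything. A hedged sketch using Astyanax's AllRowsReader recipe, with a placeholder column family:

```java
import java.util.concurrent.atomic.AtomicLong;

import com.google.common.base.Function;
import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.model.Row;
import com.netflix.astyanax.recipes.reader.AllRowsReader;
import com.netflix.astyanax.serializers.StringSerializer;

public class SafeRowCount {
    static final ColumnFamily<String, String> CF_LIBRARY =
        new ColumnFamily<>("library", StringSerializer.get(), StringSerializer.get());

    // Streams keys in small pages rather than running one huge query
    // (the kind that OOMed the node).
    public static long count(Keyspace keyspace) throws Exception {
        final AtomicLong rows = new AtomicLong();
        new AllRowsReader.Builder<String, String>(keyspace, CF_LIBRARY)
            .withPageSize(100)                      // small token-range pages
            .withColumnRange(null, null, false, 0)  // keys only, skip column data
            .forEachRow(new Function<Row<String, String>, Boolean>() {
                public Boolean apply(Row<String, String> row) {
                    rows.incrementAndGet();
                    return true;  // keep iterating
                }
            })
            .build()
            .call();
        return rows.get();
    }
}
```
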
You could see an out-of-memory exception. Corrupted SSTables: that's a funny one we learned two weeks ago. We have a big cluster that's powering the library feature, user-level data, tens of terabytes of data, and the same cluster hosts a small keyspace, probably not very small, but not as big as the first feature. And then I get this error in production, corrupted SSTables: one node started throwing it, and it died with out-of-memory.

So that's a good example of one bad node actually raising app latencies; they spiked, and we removed the node. Another thing: if we see that we have a bad node, the first thing we do is usually just remove it from the ring, or just stop the Thrift protocol on it. We can afford hints; it's fine that the other nodes will be gathering hints, but we need to fix it as fast as possible.

Dustin: Okay, so one of the things we ended up monitoring a lot more, after many of the issues we had during launch, is memtable flush frequency. Like Alex was saying, we had a single node cause significant latency across the entire app layer during, I guess, Christmas time.

We would just see the latency of some of our services creep up, then do somewhat of a step up, and it continued to get worse. It always boiled down to a single node that was flushing memtables a lot, and for a while we just dealt with it by restarting it. What's funny is that just restarting that one node brought all the app latency back down, until it happened again.

The root cause of that one was heavy traffic, and also the hot nodes. When we had the two-hot-nodes thing, we were using token-aware routing, and with the bad hashing all the traffic went to one guy coordinating everything; fixing that helped alleviate this particular problem as well. So it was, you know, bad usage on our side. Hinted handoff: it's good to know that hinted handoff is happening.

B
That
stuff
will
happen,
but
it's
also
good
to
know
who
they're
failing
to
garbage
collection.
You
know
you
always
want
to
avoid
those
whole
GCL
situations,
compaction,
Alex,
we'll
talk
a
little
bit
more
about
compaction
later,
but
but
yeah.
That's
seeing
how
frequently
it's
happening
if
you're
caught
always
compacting
and
kind
of
data
sizes
that
that
you're
dealing
with
their
you
may
need
to
kind
of
re-evaluate
or
maybe
add,
more
nodes
or
figure
out
how
why
it's
compacting
so
much
and
histograms
so
another
Sony,
a
guy
is
making
a
husband
about
histograms
this
afternoon.
What we found: VPNs are a dangerous bottleneck; the more data transfer, the more easily your nodes get backed up. We can easily rebuild a node to fix things, we talked about that. Backups: data replication helps, but it doesn't protect against data corruption. Denormalization costs: they always say denormalize, but it does come with a cost, because disk is cheap but EC2 instances are not. We TTL almost everything to keep the size small.

We have quite a few column families with TTLs, more than I have fingers to count on. And because we have a lot of TTLs: you may want to adjust your gc_grace period. If the data is not that important, and when you have nodes down you don't care about zombie records coming back, adjust gc_grace_seconds, just to make sure that data eventually gets removed. Transactions: be creative; 2.1 is supposed to solve that. Okay, load test with real data.

Replication strategy: based on your different usage patterns, you really want to tailor your replication strategy. Whether or not the data is a source of truth and really important, make sure you have a good replication strategy for where the data is located: different data centers hold their local data. User-level versus app-level data: app-level data usually needs more replicas, or else you create a lot of hot spots. Plus, cluster-wide commands should be staggered; global repairs are bad.

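A minimal sketch of what a per-data-center replication setup looks like when a keyspace is created through Astyanax; the DC names and replica counts are placeholders, not Sony's actual topology:

```java
import com.google.common.collect.ImmutableMap;
import com.netflix.astyanax.Keyspace;

public class RegionalKeyspace {
    // Region-local user data: replicate only in the data centers that serve
    // those users, so writes don't cross the (dangerous) VPN pipes.
    public static void create(Keyspace keyspace) throws Exception {
        keyspace.createKeyspace(ImmutableMap.<String, Object>builder()
            .put("strategy_class", "NetworkTopologyStrategy")
            .put("strategy_options", ImmutableMap.<String, Object>builder()
                .put("us-east", "3")   // placeholder DC names / counts
                .put("eu-west", "3")
                .build())
            .build());
    }
}
```
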
Alex: Quick conclusions. We started with vnodes; from an operational perspective it's really easy, and we don't have a lot of DevOps, so we thought it was a good idea. But there was increased chattiness on the gossip protocol with vnodes (that was one year ago; the bug is supposed to be fixed, but still), slowness in repairs and cleanups, and Astyanax doesn't like vnodes because of its ring-polling mechanism.

Also, one of our vnodes clusters crashed one week before the launch, and we spent all day trying to bring it up. The next day we decided it was too risky: we converted seventy percent of our clusters to assigned tokens. We used a custom-built streaming library to stream data between the vnodes cluster and the non-vnodes cluster.

That was done in one day, four days before the official launch. Compaction: yeah, our worst enemy. Large SSTables mean high CPU and longer compactions, very painful, very, very painful. Sometimes we disable compactions and then run them when our traffic goes down. Leveled compaction: we tried it on several nodes in production; it trades CPU and I/O, and the startup time is interesting. It's not entirely clear why, but because a lot of files are generated, you'll spend much more time on startup now.

If you stop a node, it won't take one minute to start. Updates and removals eat up disk: some of our nodes had two hundred gigabytes of real data and four hundred gigabytes of removed or updated data. So that's it, that's our little journey towards Cassandra, and this is the last slide, of course.