Description
Speaker: Patrick McFadin, Chief Evangelist at DataStax
Slides: http://www.slideshare.net/planetcassandra/c-summit-eu-2013-apache-cassandra-20-data-model-on-fire
Functional data models are great, but how can you squeeze out more performance and make them awesome? Let's talk through some example Cassandra 2.0 models, go through the tuning steps, and understand the tradeoffs. Many times just a simple understanding of the underlying Cassandra 2.0 internals can make all the difference. I've helped some of the biggest companies in the world do this, and I can help you. Do you feel the need for Cassandra 2.0 speed?
So we have 30 minutes to get through my slides, and we're going to do this together, all right. So: Patrick McFadin, now Chief Evangelist; that's kind of a new thing for me. I also do data solutions for DataStax, and because of that I get to work with customers. I get to work with people in open source, which is great, and I just get a lot of use cases. Part of what really helps, I think, for the community is that I get a variety. Where you may have one specific use case, I get to talk to you and find out what it is, and then I can take it to the next person and say: oh, you know what, somebody else just did that, or maybe somebody else did something like that. So it's great because I get to talk about that, and I have some actual use cases today that we're going to talk about. So let's get into it.
First things first: as you know, the data model is king. I talk about this all the time, and with Cassandra 2.0 we've got all kinds of new stuff to talk about. I've done my three data modeling talks, and this is just an update on that, but it's always getting better. There is so much we can get out of our data models, and there's so much bad you can get in data models as well, so we have to understand the underlying storage engine.

That's what I want to get into a little bit today. As I talk about data models, we talk about things like using the partition keys and the clustering columns. Well, that really points to the underlying storage engine, what's going on underneath, because that's the real power in your data model. It's not just creating the right field names and using camel case and stuff like that.
A lot of it is just understanding where your performance is, and I do a lot of that, where we look at performance as the primary motivator for our data model. We want to make sure it's performant and not going to fall over under load. So I want to talk a lot about that, and the topic today is going to be tuning as well. I'm hesitating because I'm going to dig into something today that no one else has, I don't think, so just strap in.
This is going to be fun, and we brought Elvis along with us, so this is going to be fun. First things first, let's talk about new features: lightweight transactions. Now, transactions are more of an ACID problem, right? If you're using a relational database like MySQL, lightweight transactions are not what you're going to get; you're going to get full transactions. But in a distributed system like Cassandra, lightweight transactions are very useful for a variety of edge cases, and distributed systems have a problem.
Some of them are solved in really bad ways, such as a distributed lock or an external locking system. So let's talk about this particular problem. Here's my problem, and this is a complicated slide. The problem I'm going to have is whenever I'm creating a new user account, so: a new user account over on this side.

I have one process, and this process says: hey, I'm going to select from this user table and look for the existence of a user. Well, this is a read before write, which is an anti-pattern in Cassandra, because you have this moment. Here's how it would come down. At t0, I look for pmcfadin (this guy, it might be me) and it comes back with zero rows. Okay, that user doesn't exist in the users table. Awesome! Okay! Well, at t1...
Another process asks the same question for that particular username, which is a primary key. It comes back with zero at that time, and from t0 to t1 could be a few milliseconds. So process 1 says: well, there's nothing there, so I'm going to go ahead and create this record. So this is me being created, Patrick McFadin, and it writes that into the database. Well, process 2 is unaware that this happened, and process 2 is going to say: hey, there's nothing there, I'm going to go ahead and do it too. So at t3, this guy Paul McFadden from Oracle overwrites my record, and that's not cool. So: sad panda, right? And that's a reality that happens. It's a sad reality. This is the edge case that we have warnings about.
This is why you don't do a read before write: because there's no guarantee that the next write isn't going to potentially overwrite the thing that was written before it. So at t2 this was written, at t3 this was written; t3 wins, wrong record. So when I go to log in with my username and password, it's going to reject me, saying you don't exist, but I just got an acknowledgement that I created my account. That is not workable.
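The racy sequence being described looks like this in CQL (a sketch; the `users` table and its columns are illustrative, not taken from the slides):

```sql
-- Anti-pattern: read before write. Both processes run this pair.
SELECT * FROM users WHERE username = 'pmcfadin';  -- t0 and t1: 0 rows

-- t2 and t3: both processes conclude the name is free and write.
-- The later write silently overwrites the earlier one.
INSERT INTO users (username, firstname, lastname)
VALUES ('pmcfadin', 'Patrick', 'McFadin');
```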
Okay, so how do we fix that? Well, I think you probably know the punch line: lightweight transactions. What a lightweight transaction is going to give us is this edge case solved. In this case, I go ahead and create this account at t0, and I use IF NOT EXISTS. That's new syntax.

Now, what's cool is it's going to come back and say it was applied: true. It's going to do an existence check to make sure the record isn't already there. That's cool. It uses Paxos to make sure that it is the one creating that record exclusively, and when it's applied, it says true and you're good to go. Well, that's good; that's insurance from the application side. As you guys know, I love writing applications.
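In CQL, the check and the create collapse into a single statement (again a sketch; the `users` schema is my illustration, not from the slides):

```sql
-- Only insert if no row with this primary key already exists.
-- Paxos guarantees at most one of the racing inserts wins.
INSERT INTO users (username, firstname, lastname)
VALUES ('pmcfadin', 'Patrick', 'McFadin')
IF NOT EXISTS;

-- The first writer gets back   [applied] = True
-- A racing second writer gets  [applied] = False
-- plus the existing row, instead of silently overwriting it.
```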
I hope you know that, because applications are what we're trying to build. Databases are just another way to store data, but applications are really where the fun and excitement is. It's not fun and exciting when I have to write all kinds of crappy application code to make sure I'm not overwriting records. The database should take care of that, and it does. So in this case, I've got an existing record. What happens later when Paul comes along?
Well, "later" might be milliseconds later. When that happens again, our application code is written with IF NOT EXISTS, so now what happens is it's going to come back with a false and say: no, dude, that's gone, man. You can't do that. And that's what I really wanted when I use IF NOT EXISTS. Now it's stopping me, it's saying hold on there, and it's not going to stomp on my record. From the application side, that's easy to deal with.

Whenever you say IF NOT EXISTS, I expect it to check for that and come back to me if it does exist. Now I can go back and say: I'm sorry, that username already exists, pick a new one, and kick it back to the user. Exactly what I wanted in the first place. No more of this "whoops, I overwrote your record." So: the fine print.
It's meant to solve those edge cases. It's not meant for everything, but you have to have awareness of what it can do. Of course, there's no free lunch: it's using Paxos, which means we're going to do more round trips. But man, it is very useful, and I'd rather do it there than in my application code. I put in this shout-out to ZooKeeper; I've seen a lot of Cages.

Cages was a project that used ZooKeeper to create this exclusive lock, and that has edge cases as well. We're eliminating those edge cases by putting the lock closer to where your data is, which is where it should be. So there is a bit of latency, and it can vary based on the size of your cluster, but I would say: load test it, make sure it works as part of your system, but only use it when you need it.
So what about a real use case? That was cool. If you saw my talk, my third talk on data modeling, I had this form versioning use case, and it was a real world example. It was almost like a setup for this talk; I'm glad everyone played along. Gotcha. Here's the punchline: it didn't work.

Well, it worked, but it actually used ZooKeeper to manage the locking, and that is really making me sad. So we're going to fix that today; we're going to right the wrong. This is from "The Next Top Data Model," which I did at the summit in San Francisco. The use case was a form versioning system that used this working version, but the idea was that we have this user-exclusive lock.
As someone is editing a form, they want to be the only one editing it. Somebody even called me out on it in the questions: isn't there some potential problem there? Yes, there is, and you'd have to create some sort of cage around it with ZooKeeper, or use lightweight transactions. There is a potential that this thing could get overwritten. Let me explain what I mean. You might have this problem where (and this is the exact slide I used) there's a problem.
I didn't put the danger zone in there, but when you insert the first version of this form, you want to make sure that one person is the only one locking that form. So when you say "update this particular row, set locked_by to me," well, in between when you create the form and when you update the locked_by, there's this blank zone of potential danger. You could have someone come along and do it right on top of you, and that's unacceptable, in my opinion. So let's try to fix that.

Now, of course, there's a punchline: you use LWT, the lightweight transactions. What I'm going to do now is, whenever I create the record, I'm going to use the IF NOT EXISTS syntax, and there you go: I'm the only one creating this particular version of the form. So if, a millisecond later, someone else comes along and wants to create a version of that as well, it's blocked because of the primary key, the primary key being the form id and the username.
That's what I wanted, right? I don't want somebody else updating my record. It's going to accept my updates, but if Dude comes along (because there's a guy named Dude) and he tries to update my version of my form, it's going to reject it, and that's exactly what I wanted. I wanted to make sure that I'm the only one updating this record. It's very exclusive, and we're managing this at the database level, not through something like ZooKeeper outside, which has its own problems.
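Sketched in CQL, the two halves of that lock look something like this (the table and column names are my guesses at the slide's schema, not verbatim):

```sql
-- Claim the version: only one writer can create it.
INSERT INTO working_version
  (form_id, username, version, locked_by, contents)
VALUES (1138, 'pmcfadin', 1, 'pmcfadin', 'first draft')
IF NOT EXISTS;

-- Subsequent edits re-check the claim with an IF clause, so a
-- racing writer gets [applied] = False instead of stomping the row.
UPDATE working_version
SET contents = 'edited draft'
WHERE form_id = 1138 AND username = 'pmcfadin' AND version = 1
IF locked_by = 'pmcfadin';
```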
Let's just make sure it works. You do not want to go with pure blind luck and chance on things like your records potentially getting overwritten in your database. Bad news. I've tried that, and I always get bitten by it. So now we have a very formal way, inside the database, using exact language and exact syntax, to make this work. Perfect. That's exactly what I wanted.
So I'm going to do my exclusive locks with existence checks, making sure I'm the exclusive one writing a record, and I'm going to use the IF clause to keep making sure I'm the only one doing it from then on. Of course, understand that there's more latency involved. For form versioning, I'm not so worried about it; maybe for other applications I will be. So this is where you really need to load test. I'm going to stress that, because I work with people all the time: you just need to understand what's going on inside your database. There's no magic, there's no free lunch; just understand what the trade-off is. All right, so enough about features.
First of all, with Cassandra 2.0 we have some changes that I personally like, and 1.2 has some as well. Let's talk about those. These are three things that I like a lot: first, single-pass compaction; second, the hints to reduce SSTable reads; and finally, faster index reads from off-heap. Yeah, those are three features that probably never made it onto anybody's slides except mine, but that's cool, because then no one else stole my thunder.
So why is this so important? Why am I excited about it? Because, like I said, I work with people all the time, and they're like: oh yeah, I just want to squeeze one more millisecond out of my queries. Well, there's a lot to do to get that, and I'm all about trying to get you there as well. So why are those three important?
Just a quick review: when you do a read, it goes to Cassandra, and from there it has to go to the disk. That's an important thing: if it's not in memory, it's got to go to the disk, and if it has to go to more SSTables, that's more latency. Disk seeks are the evil in a read, and minimizing the number of seeks you do on the disk is the key to happiness and fortune and speed.
I've got an example here of some disk latencies. If you're using SATA, 7200 RPM disks, really slow disks, it takes 12 milliseconds to do an average seek. That's 12 milliseconds!

It ain't gonna happen. Well, it could happen if it pulled it out of memory, but if it has to go to disk, you've got 12 milliseconds. 10,000 RPM? Okay, getting better. 15,000 RPM, a little better. And this is just rotational speed, which is what gives you that seek; you still have transport on top of that. Now, I put this on here as a little pepper: SSDs are awesome, right? That's 0.04 milliseconds on a seek.
Don't ever use shared storage, because these numbers are really optimistic for local storage. If you start using shared storage such as NFS, you will be really, really bad off. This just happened to me, three times in one week, a couple weeks ago: three different users, all with horrible performance problems on queries, all of them using shared storage.
That was the common theme between the three of them, and when I say shared storage, I mean NFS. Here's a horror story; this is a bloodbath. If you were at the meetup last night, I talked about this. The bloodbath was: they were using NFS, and every seek was taking around 100 to 300 milliseconds.
So when you're doing lots of those a second, maybe hundreds or thousands a second, it burned out. It just couldn't keep up, because NFS is terrible with latency. I'm going to bring that home a little bit, because I like to do that, but I just don't want you to call me or send me an email and say: man, my Cassandra sucks, this is the worst database I've ever used. I'm like: well, what are you using for storage? NFS?
Well, don't call me, then. Try something else. Maybe go down to... well, you don't have Fry's around here; we have that in Silicon Valley. But go to Fry's and buy a $50 hard disk. It's probably going to be faster than that million-dollar storage array that you bought, and that's kind of the sad reality. I get a lot of that: "you know how much money we spent on our shared storage?"
It should work. Yeah, it should, but it doesn't. Sorry. I used to buy storage all the time, and it was great. The sales guys were awesome. They would take me out to lunch, dinner, whatever I wanted. They took me to baseball games. It was awesome. But it doesn't work really well for what we do. We're building distributed systems, not a single point of failure, and that's what you're building with a shared storage box. I had an EMC array. Sorry, EMC! I had an EMC array.
So let's talk about measurement now, a quick diversion. All right, I'm going to go into something a little deep here; I hope you guys keep up with me. This is going to be good. I've been doing this a lot lately, and it has been really useful.
cfhistograms. Okay, I know, it's just one little topic, one little tiny topic, but it is going to save your life. Well, all right, maybe not save your life, that's a little extreme, but it's going to make your life a lot easier when you work with query optimization and with Cassandra, just understanding what's going on. cfhistograms are histogram statistics on a lot of things, per table. They're collected when you do a read, when you do a write, when the SSTable is flushed to disk, and when you do a compaction. So there's a varied set of statistics gathered about this particular column family, or table. And here's the syntax: it's part of nodetool. cfhistograms, then put in the keyspace and the table.
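As a concrete invocation (the keyspace and table names are placeholders, and the output below is an abbreviated sketch of what 1.2/2.0-era nodetool prints, not taken from the slides):

```
$ nodetool cfhistograms my_keyspace my_table

my_keyspace/my_table histograms
Offset  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                  (micros)       (micros)      (bytes)
1       107       0              0             0               0
2       2         0              0             0               0
...
```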
Now, what you get out of it is this nice little setup. But whenever I put that on the screen (I've done this before: I'll be on site with some user and I'll say, all right, let's bring up cfhistograms on our big screen here, and boof, it goes on the screen), I get the same look from everybody: that is like a wall of numbers, and I don't even know what it means. So, all right: today you're going to learn, so just be ready.
But what do they even mean? I think this is where the breakdown is: there are no units on one side, and then, what direction do I read it in? So let's go through that. The very first column is the offset, and that doesn't have any units. It's just buckets, buckets of numbers, and they're kind of meaningless until we go further into which column we're actually going to look at next.
So let's go to the next column: SSTables. When we look at SSTables, we get a number: how many reads, and how many SSTable seeks it took to satisfy them. The offset is how many SSTables had to be seeked to satisfy x number of reads. So in this particular one, there were 107 reads that took one SSTable seek, and there were two reads that took two SSTable seeks. All right.
So what I'm seeing here is the efficiency of my queries from Cassandra's point of view: how many seeks did it have to do? If I start seeing those numbers crawl down, say, if it was doing 250 seeks to find my data, there's something wrong going on, and we can fix that, but we just need to know that it's happening. This is how I get some idea of how much disk activity it really takes to get to my data.
The next one is write latency, and that's in microseconds, not milliseconds. Microseconds. And that's pretty easy, actually: five writes took 250 microseconds, ten of them took 800 microseconds. Those are pretty typical numbers, but that way I get an idea, and these are exact numbers; there's a one-to-one relationship. It's not an estimate. That's how long it took from the storage engine. The next one is pretty similar: it's read latency. Fifty reads took 800 microseconds; 300 took 1250 microseconds, or about 1.2 milliseconds.
So this gives you a really good idea of what's going on from the storage engine side. Now, the next one is a completely different unit: how many partitions, and at what size. So I had five partitions that were 1250 bytes. This is really important when you're trying to find really large partitions, which can impact your performance.
I've seen this before: way down here, you have 30,000 one-gig partitions. Wow, that's a little slow, because it has to churn through a gig of data to get to any of it. Or: I saw one data model where they had one row that was 30 gigs. I'm like, why did you do that? "Well, you said create wide rows." You created one. Yeah, okay, we'll fix that. And that really happened.
It was kind of a funny moment. I was like: wow, you really did create a wide row. The cell count is how many cells are in the storage row, and that's also pretty interesting. I've had people say: I thought I was only putting five cells in there, how come there's 30,000? Oh, let's go back and look at your data model. Yeah, that's wrong. So this is just a way to measure what you're actually doing in your system. So what are we going to do with this?
Your data model plus histograms: this is how you're going to measure this thing. We say: test, measure, repeat. So now, whenever you create your data model and start running some load tests, we have a way to measure it. So let's talk about a real world example. This is a real customer (well, I've done this a lot), a real person, a real user. They had a really tight SLA on reads. Actually, what happened was this guy kind of stuck his neck out.
So I'm going to show you some real numbers. The problem was that they had this variability that they needed to reduce, and every time they loaded data, it increased the amount of latency that they had. This may be a problem you've seen before: hey, I just loaded up my system and my read latencies are horrible. So, a real cfhistograms. Sorry for the wall of numbers, but one of the things I do with cfhistograms is sometimes I don't actually look at the actual numbers. I kind of back up a little bit, fuzzy-vision it a little, and you see lumps. You see this nice lumping in the middle here; that shows you where most of the stuff is. Now, the tighter it is, that's good; the wider it is,
the more you're getting just this big variability. So that's the fuzzy part. But what I'm also looking at is that most of these requests (these are read requests) were in that 600 microsecond to 4.7 millisecond range. Okay, that's what the number is, and we were trying to tighten that up a bit. But the problem here is: look at the top, at what it was taking.
It was going down to four seeks to get to my data on some of those. Now, it didn't go over four seeks; that's cool, that's what I was hoping for. I've seen it really bad, in the 20 to 30 seeks to get to the data. But I knew the real key here was that whenever they were loading up all their data, we were looking at compactions, and compactions were falling behind. When compactions fall behind, your statistics go bad.
A
Your
bloom
filters
are
getting
a
little
stale
and
it's
just
going
to
take
more
seeks
to
find
your
data
so
well
yeah.
They
had
disco
problems,
and
so
we
kind
of
did
some
work
on
that,
and
it's
like
the
focus
of
that
particular
tuning
exercise
was
all
right:
let's
tune
up
your
disc
and
make
it
so
your
compactions
keep
up
good.
So
we
did
that
boom.
All of those numbers improved by two milliseconds, which was awesome, and you can see the causality: we took away that one seek, and the whole histogram just bumped up. So we started really seeing those numbers come out, and we did a lot more tuning on top of that and got things much better; I have the final numbers. So what about this partition size, though? That was another thing we were looking at. They had about 6K to 8K of data.
Okay, it wasn't a lot of data, but we were going to manage that as well. On the partition size, we looked at tuning this as an option, because it was a size in bytes, and it's all about those reads. I'm just going to touch on a couple of things here; I don't want to get into this too much right now. I am going to do a talk on performance tuning exclusively, and we'll go more into it then, but here are another couple of things to explore.
A
Is
your
index
interval
and
the
index
intervals
the
samples
that
are
taken
whenever
your
data
is
written
out,
whether
or
not
it
creates
an
index
or
not?
And
so
sometimes
in
in
certain
use
cases
you
can
lower
that
for
faster
access,
but
what
happens
is
you're
creating
more
index
and
it
uses
more
memory,
so
the
trade-off
is
you're
using
more
memory,
but
you
do
get
faster
response,
so
that
was
one
of
the
things
we
kind
of
tweaked
with
a
little
bit
the
column
index
size.
A
How,
when
it
adds
indexes
to
those
columns
as
well.
So
if
you're
doing
a
lot
of
partial
row
reads,
maybe
making
it
smaller
will
index
that,
so
you
get
to
your
seeks
faster.
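In the 1.2/2.0 era, I believe both of those knobs live in cassandra.yaml; the values below are the shipped defaults, shown only to anchor the discussion, so tune and load test rather than copying them:

```yaml
# Samples one partition-index entry per this many keys.
# Lower = more index held in memory, fewer seeks per read.
index_interval: 128

# Granularity of the within-partition column index.
# Smaller = finer-grained seeks for partial-row reads, larger index.
column_index_size_in_kb: 64
```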
You know I'm big on p99 and p95, the 99th and 95th percentiles, because they show what you should expect; they show whether or not you have variability. We had 9 milliseconds at the 95th percentile, and this is production data. This is a live system, not a load test, as their system is being hit by real users in the real world, and they're doing 10,000 transactions per second at peak, 220 million a day. So this isn't an idle system, and we're doing 9 milliseconds at the 95th percentile.
That was awesome, but it really came down to this: we understood what the problems were, and we were able to measure them, and that's really the key. Knowing what the problem is, you can solve it. There were little things; I could tell you the million things that we did, but your use case is going to be different. Just know that's where you go to get data: you've got to look at your cfhistograms.
They had enough bandwidth, because they were using SSDs, that we could increase the number of readers. So these are the three things we did for this particular model: making compactions work better and changing these three values, measuring it all the while with cfhistograms. We got it pretty good. I mean, that's pretty good, right? I would be pretty happy with that; actually, I was. So here's one more thing to look at: the two-hump problem.
I call this the two-hump problem because I see it all the time. (Thank you.) The two-hump problem is: hey, my read latency looks awesome, except for right now. You see these humps. What is that? It's compaction again, disk I/O, or worse, something else happening on your disk. This is something I have to point out, because it shows you that you have disk latency somewhere: everything is cool right up until something else impedes your disk, and then you'll see it. And granted,
these are not crazy numbers; there are worse numbers out there. This just happens to be one particular user that I had some data from. I collect these things like baseball cards or something. Hey, can I have your cfhistograms? Thanks. I won't put your name on there.
A
If
you
want
to
send
me
yours,
great
I'll,
collect
it
so
the
I
what
we
did
is
we
throttled
down
compactions
on
this
particular
one
because
they
weren't
really
falling
behind.
But
what
happened
was
they
were
just
letting
it
go
crazy?
So
we
throttled
it
down
a
little
bit.
We
did
some
disc
tuning,
but
I
mean
the
other
thing.
Is
you
can
just
ignore
it?
This
one?
I
would
ignore
it.
I
535
microseconds
for
a
read
who's
going
to
cry
about
that.
A
That's
amazing,
but
in
this
case
you
just
know
that
there
was
a.
There
was
some
impedance
on
the
disk
and
I
think
it's
probably
more
expressed
because
of
the
really
crazy
low
numbers
most
of
that
data
was
being
served
out
of
memory
anyway.
Aaron Morton does these internals talks; just put your thinking cap on, because they're pretty heavy, but he talks about these internals, and I've used his blog, The Last Pickle, so much, just trying to understand what is going on in the internals. If you want to become supermodel-crazy-good at query tuning, understanding a bit about how the storage engine works will really help.
The other thing, bottom line: the disk is the number one problem when it comes to your data model, because you can build the best data model in the world and put it on crappy hardware, your NFS. Don't do that; I've already told you, don't do that, because then you're going to be hating yourself. You'll think your data model's wrong, but it's not; your data model is great. It was just running on horrible hardware.
You can find that out by learning how to measure with cfhistograms. Go out and try it on your own systems. Try it out; it's really easy to see and use, and I bet everybody who's running a production system right now is probably logged in. That dude right there, he's doing it right now.
A
Yes,
is
he
all
right
cool
he's
like
oh
man,
we
got
the
two-hump
problem,
yeah
everyone's
gonna,
walk
away
here
and
going.
I
had
the
two-hump
problem
and
and
then
load
test
your
data
models,
because
you
won't
see
those
problems
until
your
system's
under
load.
Do
not
do
you
could
do
traces.
Traces
are
great,
but
when
your
systems
under
load
those
when
things
come
out
so
understand
that.
A
So
that
is
my
really
short
talk.
I
hate
to
have
in
30
minutes,
but
I
have
a
few
minutes
for
questions.
Here's
my
other
three
talks.
If
you
haven't
seen
those
check
them
out,
they're,
hopefully
very
helpful,
and
if
they're
not,
then
I'll
do
more
so
I'll
go
ahead
and
questions
anyone
yeah
and
what
about
installations
when
one
disk
is
used
by
one?
This
block
is
used
by
several
virtual
machines.
A
That
is
a
recipe
for
disaster
as
well,
because
what
you're
doing
is
you're
impeding
access.
If
you
get
down
into
the
disc
internals,
it's
going
to
be
about
what
the
disc
head
is
doing,
if
you're,
using
if
you're
using
spinning
disc,
you
have
a
head,
that's
sitting
on
top
of
a
platter
and
it
has
to
move
around.
[Audience question inaudible.] In that case, it depends on the speed. If it's not a speed problem, use lightweight transactions to do it: IF EXISTS or something like that, or not even IF EXISTS, just the IF clause for transactional locking. But there are probably faster ways to do it in that case. Yep, next. Who else do we have in the back? [Audience question:] Do lightweight transactions work across data centers, and what's considered local?
It's just like quorum: quorum is across data centers, or you can use local quorum for the local data center.

The first-class behavior is, of course, across data centers. I actually argued quite a bit about having the local version of that, and I was told no at first, but now it's there. But if you're thinking about, for instance, a username that you want to make sure is the only one getting created, you have to do that across your entire cluster, no matter where it is, across data centers. So that's the first way it works.