Description
Apache Cassandra Project Chair, Jonathan Ellis, looks at all the great improvements in Cassandra 1.2, including Vnodes, Parallel Leveled Compaction, Collections, Atomic Batches and CQL3.
A: I wanted to talk about a feature that's a little bit of makeup for 1.1. In Cassandra 1.1 we added support for concurrent schema changes, but arguably the most important case of all we didn't address, which is that in 1.1 you can have multiple clients altering your schema across your cluster, except that you can't create column families, or tables, safely concurrently. So you can alter them: you can add columns, you can add indexes and so forth.
A: You can do that in parallel in 1.1, but not the actual creation of the table definition itself, so in 1.2 we finally addressed that. So now you can programmatically create and drop tables, for temporary tables or if you're using a table-per-customer kind of model, and that's totally okay because it can be done programmatically.
A: Now, the next big thing in 1.2 that ties into the fat-node support, but is useful to everyone, not just people running on large servers, is virtual nodes. Historically we've had a one-token-per-Cassandra-server model in our consistent hashing partitioning. So in this example, the node F, colored red here, has one token assigned to it; with virtual nodes, what we're going to do is split that up.
A: You can increase that number if you need to; you can even have different nodes in the cluster with different numbers of virtual nodes, which can help if you have heterogeneous hardware in your cluster. You can give newer, more powerful machines more virtual nodes, and they'll take more of the load from your cluster that way. On this slide I've just colored in three virtual nodes to give you the idea.
So the problem that this solves is that it allows operations to be distributed across the cluster in a more fine-grained way. In particular, there are a couple of places where that's particularly important, which are adding machines to the cluster, or replacing them if you have a failure.
So let's look at the case where I have a cluster and then one of my machines dies, and it's not just a bad power supply or something that I can replace easily.
A: It's actually catastrophic: we've lost all the data on that machine and I need to rebuild it.
So in the scenario where you have one token per machine, in Cassandra 1.1 and earlier, you can rebuild that dead machine from its peers in the cluster, the ones its data is replicated to. And it is commonly the case, as in this example, that each range of data is replicated to three machines in the cluster. So node F, which is our dead machine, had ranges C, D and E, and you can see which nodes share those: the two choices for each range that it's replicated to. What that means is that for each of those ranges, we can stream it from its peers in the cluster concurrently, meaning that we can rebuild range C and range D and range E in parallel, but we don't get the benefit of the entire cluster.
A: In other words, in this example I have fifty percent of my cluster participating in the rebuild, because I have three ranges to stream and six nodes in the cluster. But as the cluster size grows, if I have a hundred-node cluster, I am only using three percent of that cluster's capacity to do the rebuild. So if I can spread it out across the entire cluster, my rebuild is going to happen that much faster and we'll get it over with that much sooner.
A: So when we parallelize this with virtual nodes, you can see that even in this simple example of only three virtual nodes per machine, I'm already able to parallelize that rebuild across the entire cluster, because the ranges that I'm assigning to the different replicas are so much smaller and more fine-grained that I can have everyone participate.
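As a back-of-the-envelope model of that math (my simplification, not Cassandra's actual streaming logic): if each range is streamed from one distinct peer, the fraction of the cluster participating in a rebuild is roughly:

```python
def rebuild_fraction(cluster_size: int, ranges_streamed: int) -> float:
    """Fraction of the cluster that can stream during a rebuild,
    assuming each range comes from one distinct surviving peer."""
    peers = min(ranges_streamed, cluster_size - 1)
    return peers / cluster_size

# One token per node, RF = 3: the dead node held three ranges,
# so 3 of 6 nodes stream -> 50% of the cluster participates.
print(rebuild_fraction(6, 3))      # 0.5

# Same scheme on a 100-node cluster: still only three peers stream.
print(rebuild_fraction(100, 3))    # 0.03

# With vnodes (say 256 small ranges) nearly every node participates.
print(rebuild_fraction(100, 256))  # 0.99
```

This matches the numbers in the talk: 50% participation on six nodes, 3% on a hundred, and near-total participation once the ranges are fine-grained.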
So that's the main idea. Note that it's not really about extra performance in terms of what your clients are doing, reading and writing data to your tables in Cassandra.
A: I've configured replication across the cluster, so then having to run RAID on each machine to replicate that data locally as well is wasting fifty percent of my disks.
So if I'm at a larger scale, where I've got a large amount of relatively cold data per machine and I want to make the most cost-effective use of my hardware, what I really want to do is use those disks directly.
A: When I do have a disk fail, in prior releases Cassandra really didn't know how to detect that, and it would keep on trying to write new data to the failed disk, and it would just cause problems in your cluster until we disabled that machine and took it out and fixed it, assuming you didn't have RAID going on underneath to hide that failure from it. So what we did in 1.2 is we added disk failure policies, and the ability to recognize disk failure, to Cassandra.
A: But if you care more about availability than consistency, that option is there for you to go ahead and keep running. The next feature that we worked on for the fat-node support is to move a bunch of our storage engine internals off of the Java heap.
So what that's about is that the amount of memory we have available on the servers has, by and large, kept pace with the amount of disk space that we're trying to address. So it's not really a problem per se that we have internal structures in the storage engine that require memory proportional to the disk space used; that's not really a problem by itself.
The problem is that we've had all those structures on the Java heap, and the Java heap, or more specifically the Java garbage collector, has not kept pace with the growth of memory. Which is to say that Java heaps are basically stuck at about eight gigabytes, maybe up to sixteen, as the largest you can make the Java heap before the garbage collector pause times start getting large enough to affect your application in a bad way.
So what we want to do is take those structures that we have and move them into native memory, off of the heap, where we can clean them up using manual reference counting instead of relying on the Java garbage collector to do it.
A: So this is the main feature related to application development that isn't CQL-specific. What we're doing is taking the concept of Cassandra batches, which we've had forever, and making them another level more robust by adding support for atomic batches.
So as a review: Cassandra batches are sort of the analog to a transaction in Cassandra. The difference is that in a transaction you have a bunch of steps that happen sequentially, and at any point you have the option to roll back that transaction and say never mind: I ran into a deadlock, or someone else modified this data out from under me, so I had to abort. You have that ability to roll back the transaction. Cassandra batches only go in one direction.
You give Cassandra the entire batch at once, and Cassandra says: okay, I'm going to make this happen.
A: So there's no concept of rollback; there's just the concept of: here's a bunch of updates that I need you to make happen, all together.
So those updates that you give Cassandra in the batch are still individual row changes that you want to make, but I've represented them on this slide as three different colors, red, yellow and blue, and those different rows in the batch are ultimately going to live on different replicas.
So the coordinator node that the Cassandra client is talking to is responsible for sending those out to the appropriate replicas, and the problem with these classic batches is that if I send a batch like this, and then the coordinator itself fails before it completes sending out that batch, then I have basically created another inconsistency scenario in my cluster.
What we said in the past is that the client is responsible for reconnecting to a different coordinator and retrying that batch to deal with that coordinator failure, but that's the wrong solution.
A: If you want a non-atomic batch, if you want the old behavior, you actually have to say BEGIN UNLOGGED BATCH, and that will be the opt-out here, because there is a performance overhead, on the order of thirty percent, to the atomic batch.
So if you're one hundred percent certain that it's okay if you get partial batch completion, or for whatever reason you're okay with that, then yes, you do have the option to fall back to the old behavior and get that slight performance edge.
A: But in the general case, we think that it's more important to do the safe thing by default. So we made atomic the default batch behavior in CQL.
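A sketch of the two batch forms in CQL 3 (the tables and values here are hypothetical, not from the talk's slides):

```sql
-- Logged (atomic) batch: the default in CQL 3 as of Cassandra 1.2.
BEGIN BATCH
  INSERT INTO users (user_id, email) VALUES (1, 'alice@example.com');
  INSERT INTO users_by_email (email, user_id) VALUES ('alice@example.com', 1);
APPLY BATCH;

-- Opting out, keeping the old, faster, non-atomic behavior:
BEGIN UNLOGGED BATCH
  INSERT INTO users (user_id, email) VALUES (2, 'bob@example.com');
  INSERT INTO users_by_email (email, user_id) VALUES ('bob@example.com', 2);
APPLY BATCH;
```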
Now, we didn't really have that option for the Thrift API, so what we did was add a separate method: you have the batch_mutate method for the old behavior, and we added a new atomic_batch_mutate method for the new behavior.
A: So those of you using Thrift clients, or Thrift-based clients, will want to be aware of that, and of how those different clients decide to expose it. Moving on to the CQL updates that we've done in 1.2: this is where I think we've really gotten a handle on the common pain points that people have had with doing data modeling. We've basically been exposing the Cassandra storage engine more or less directly to people and saying: here, make sense of this, and write your app so that it makes sense to the Cassandra storage engine. We've been reasonably successful with that approach, but I think long term we don't want to do that.
A: So to do this, we've done a couple of things. We fleshed out the support we had for compound primary keys, and with that, the mapping of user-facing tables to storage engine column families under the hood; and then we've also added new features like collections, to allow you to more easily handle common denormalization scenarios in your application.
A: I need to partition my data, but I don't particularly need the partitions in any particular order.
So this is a really simple example that we've been able to do for a while, and what I'm going to do now is talk about how, in 1.2, you know, we're not fundamentally changing what Cassandra does. I'm not trying to make it more of an SQL; we're not going to do joins or anything like that. What we are making it able to do is expose Cassandra's strengths in a way that makes more sense.
A: To talk about that, I want to give an example of a song and playlist manager, and show how we go from the Thrift and command-line based schema to the CQL 3 based schema in Cassandra 1.2.
So the first column family, or table, that we're going to talk about is just the song data itself. This is what the definition looks like if you're defining it with the Thrift-based command line.
A: So I create column family songs; I give it a key validation class (I colored the keys light green on this slide to show that they're distinct from the column data); then I give it a comparator; and then for each column I give it a column metadata entry with the name and a validation class. So okay, this is still fairly straightforward.
A: It's kind of verbose and a little bit clunky to remember the exact syntax, but it's very straightforward, and mapping that to CQL is also straightforward. So this becomes CREATE TABLE songs: I have a uuid primary key, and then I have some columns defined. I left the data column off, because that's just a binary blob and I don't want to put three megabytes of data on the slide, but the other three columns are represented here.
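A sketch of roughly what that CQL 3 definition looks like (the exact column types here are my assumption; the binary data column is left off, as on the slide):

```sql
CREATE TABLE songs (
    id uuid PRIMARY KEY,
    title text,
    album text,
    artist text
);
```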
A: You know, the fields are defined sparsely, so for each row I have to repeat the title cell, and I specify artist for each row, and I repeat that in my storage engine cells. So when we move to the CQL representation, this looks a lot more familiar, because we have the column names defined up front.
A: Where things start to get more interesting is when I start talking about how, in CQL, you define your columns up front. People who are familiar with earlier versions of Cassandra may start to get worried: wait, are you going to take my dynamic columns away from me? No, we're not doing that; we're just helping you represent them in a more straightforward fashion.
A: ...a key validation class and a comparator. So this is an example of what we used to call a dynamic column family: I don't have any column metadata specified, because each of my cells is going to have a name determined at runtime, whatever the application decides to insert. So my cell names and my column names, at the user level, are not going to correspond one-to-one anymore. So here's an example of some data there.
A: So this is the part where your brain explodes a little bit; if it doesn't, then you're in really good shape. How we map this to a table in CQL 3 looks something like this: we're going to go ahead and create the table song_tags, and we're going to give it the id for the key, and the tag name is going to be our other column.
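A hedged sketch of that mapping (the table and column names here are mine): the song id is the partition key and the tag name is the clustering column, so one storage engine row holds all of a song's tags:

```sql
CREATE TABLE song_tags (
    id uuid,
    tag_name text,
    PRIMARY KEY (id, tag_name)
);
```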
A: So you can see that for that uuid, we had a single storage engine row that becomes multiple rows exposed in the CQL result set. In fact, we have one row per storage engine cell, and the storage engine cell name becomes the tag_name value in the result set representation. And note that I put an orange box around the parts that correspond here, to fix which parts go together: basically, that's the partition that we're going to guarantee all goes to the same server,
A: so you can fetch it efficiently. So we've taken that one orange storage engine row and split it up into the two rows in the CQL result set, and I've still boxed them in orange to show the partition those are coming from: they're coming from the same storage partition. Now, one thing that does come up is: can I have a partition key that is itself composed of multiple columns? And the answer is yes, you can; you use parentheses to denote that.
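For instance, a hypothetical sketch (not from the slides) where two columns together form the partition key:

```sql
-- (artist, album) together form the partition key; title is the
-- clustering column within each partition.
CREATE TABLE songs_by_album (
    artist text,
    album text,
    title text,
    PRIMARY KEY ((artist, album), title)
);
```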
A: The take-home message from that is that in moving to this representation in CQL 3, we've been very careful to make sure that we're not actually losing any functionality. So if at any point in this you get the impression, hey, it looks like I can't do something that I used to be able to do from Thrift, then it's probably my fault for explaining it poorly, because that's been our goal the whole way through: to make sure we're not losing any functionality.
A: ...at the storage engine layer, and without having to rewrite any existing data, which is what you had to do if you were taking the JSON blob approach in prior releases. So the last example I wanted to talk about here is the playlists themselves.
So the way we do that in the Thrift world is we're going to say that each song in a playlist is going to be this composite column, where the cell name is composed of the different pieces of data that we want to track for that song.
A: So that's what we've got here, where the comparator is a composite type of three UTF-8 components, and one of those components is the title, one is the artist and one is the album. So how does that map to that storage engine row? The way we map this to the CQL world is similar to how we did it with the song tags: we're going to have a compound primary key.
A: Alright, so moving along; if you do have questions about it, we can still take them at the end with everything else. There's one more piece about the CQL mapping of tables to storage engine column families that's worth discussing, and that is how CQL 3 exposes the comparator. The comparator is what we use, with these dynamic column families, to make sure we give the results back in the order the application wants them in.
A: So CQL is a query language that's kind of transport-agnostic. You can perform CQL queries over Thrift; there's an execute_cql3_query method in Thrift that most of the existing clients use to access CQL. We've also added, for 1.2, a native CQL protocol, but as part of the upgrade path that we want to provide, we don't want you to have to rewrite your entire application on top of a new protocol client to access any of this. So there's definitely the potential for a gentle upgrade path: continuing to use the Thrift API that your application is built on, and then adding new features in CQL if that makes sense. And clients like Hector on the Java side and pycassa on the Python side are taking that approach of exposing methods to perform CQL queries from what is basically a Thrift-based client.
A: Now, it's also worth pointing out that the ability to access data that was created one way or the other is bi-directional. I've been talking here about how CQL is able to access data that was created from Thrift; in any of these examples, for instance this one, I can actually pull data out of this column family with a CQL query without having to go through and do this CREATE TABLE.
A: So if I already have this column family defined, I can go into the CQL shell and say SELECT * FROM playlists, and what Cassandra will do is assign column names to the different composite components. So you'll get it back with column1, column2 and column3 instead of artist, title and album, and then you can go ahead and say ALTER TABLE playlists RENAME column1 TO title, and rename column2 to artist, and so on.
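That session might look roughly like this (a sketch; the playlists column family and the generated positional names follow the example just described):

```sql
SELECT * FROM playlists;
-- the composite components come back as column1, column2, column3

ALTER TABLE playlists RENAME column1 TO title;
ALTER TABLE playlists RENAME column2 TO artist;
```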
A: That way you can give CQL that extra metadata without having to do any rewriting of your data on disk. And you can access things the other way as well.
If I create this playlists table, I can still access that data from Thrift, if I understand how it's mapped to the storage engine under the hood; playing around with the command line interface can help with that.
A: Alright, so the last piece of this puzzle is how CQL handles exposing the comparator to the CQL world: ordering rows within a partition. In the playlist example we didn't particularly care what order our songs are stored in; sure, maybe we could store them in chronological order or something, but I think an example that makes a little more sense intuitively is to talk about
A: something like Twitter, or something like Facebook, where I'm following my friends and I want to see their updates in some kind of chronological order. So here's an example of a way I could store a Twitter timeline, meaning the tweets that the people I follow have made, in Cassandra. I'll have a table timeline; I'll have user_id, that's me, the person whose friends' tweets we're tracking here; and then we'll have tweet_id, tweet author and tweet body for each tweet.
A: So the tweet author is the person who made the tweet, and the user_id is the person who's following them. What I want to do is get a list of all the tweets of the people that user X is following, and I want to get those in chronological order. So what I do, in my compound primary key definition, is make the user_id my partition key, that's the first entry; and then any other components of the primary key are going to get turned into the comparator. So by having this compound primary key definition of (user_id, tweet_id), I'm saying that within the user_id partition, tweets should be sorted by tweet_id; and since I've defined tweet_id as a time-based uuid, a version 1 uuid, I'm going to get those in chronological order. So when I say SELECT * FROM timeline WHERE user_id equals my id, I get the tweets back in chronological order without having to do any work.
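A sketch of the timeline schema and query just described (the types and the uuid literal are my placeholders):

```sql
CREATE TABLE timeline (
    user_id uuid,
    tweet_id timeuuid,   -- version 1, time-based uuid
    author text,
    body text,
    PRIMARY KEY (user_id, tweet_id)
);

-- Rows within the user_id partition come back sorted by tweet_id,
-- i.e. in chronological order:
SELECT * FROM timeline
 WHERE user_id = 62c36092-82a1-3a00-93d1-46196ee77204;
```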
A: Alright, so switching gears just a little bit, but still in the realm of what we've done with CQL: we're moving towards exposing more of the information Cassandra has about itself through CQL. There are a couple of categories of information that we have. The first one is about the schema and the table definitions that we have, and then the other is more about what other nodes are in the cluster and what we know about them. So, for the first one:
A: All of these, by the way, are going to be in the system keyspace; that's where I'm running these example queries. So for the schema, we have three tables in the system keyspace that deal with the schema information. The first is schema_keyspaces, and that looks very, very simple: a number of pieces of data about each keyspace that we have defined.
A: We also have schema_columnfamilies and schema_columns. I won't show example data for those, because there are a lot more columns in each of them, but that's where the data about the tables you define is stored.
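Hedged examples of poking at those tables (column names as I understand them in 1.2's system keyspace; verify against your own cluster):

```sql
SELECT keyspace_name, strategy_class, strategy_options
  FROM system.schema_keyspaces;

SELECT columnfamily_name, comparator
  FROM system.schema_columnfamilies
 WHERE keyspace_name = 'music';
```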
One other thing that's worth pointing out here is about strategy_options, and other places where we have a set of options in the schema, for instance compression options or compaction options, all of which are per column family. We changed those to be maps, now that we have this native map data type in Cassandra, so the syntax has changed a little bit. The way you would define those now is with the map syntax of curly braces and colons, which is what you see in the data coming back in this query here, rather than with the old, kind of ad hoc, colon-delimited option format from 1.1 and earlier.
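For example, a hypothetical table definition using the 1.2 map syntax for those per-table options:

```sql
CREATE TABLE songs (
    id uuid PRIMARY KEY,
    title text
) WITH compaction = {'class': 'LeveledCompactionStrategy'}
  AND compression = {'sstable_compression': 'SnappyCompressor'};
```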
A: Then, for the information that we have about the cluster itself, we have two tables. We have the local table, which is about the node itself, the machine that we're talking to; and then a slightly bigger table is the peers table, which is: for everyone else in the cluster, what do I know about them? In the peers table we know what their network address is, what schema version they know about, what data center and rack they're in, and their tokens, plural now, because of virtual nodes.
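A sketch of querying those cluster-metadata tables (column names as I understand them in 1.2; verify against your own cluster):

```sql
SELECT partitioner FROM system.local;

SELECT peer, data_center, rack, schema_version, tokens
  FROM system.peers;
```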
B: [inaudible]
A: Yes; so effectively, the short answer is yes, it would be in schema_columns, and you can play with that and kind of see how that looks, but yeah, that's where it is.
A: The other thing I wanted to point out here is that this gives you everything you need to know about how Cassandra routes your data to the different replicas in the cluster. In the local table, I store what partitioner I'm using; in the keyspaces table, I store what replication strategy I'm using; and then in the peers table, I also have the data center and rack for each machine in the cluster.
A: So, given those pieces of information, a client can actually determine, for a row that it wants to query, where that row lives in the cluster, and use a connection pool to connect directly to a node that has that data locally, rather than going over an extra hop through a coordinator that doesn't have the data locally. So Hector, Astyanax, existing clients like that, already like to do this, and again, our goal with CQL is to give you all the functionality that you had exposed before, nothing less.
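A toy sketch of that token-aware routing (the names and the stand-in hash are mine; Cassandra 1.2's Murmur3 partitioner is not reproduced here):

```python
import bisect
import hashlib

def token_for(key: bytes) -> int:
    # Stand-in for the partitioner's hash function.
    return int.from_bytes(hashlib.md5(key).digest()[:8], "big")

def owner(ring, token: int) -> str:
    """ring: sorted (token, node) pairs. The owner of a token is the
    first node whose ring token is >= it, wrapping around the ring."""
    tokens = [t for t, _ in ring]
    i = bisect.bisect_left(tokens, token) % len(ring)
    return ring[i][1]

def route(ring, key: bytes) -> str:
    """Pick the node to contact directly for this partition key."""
    return owner(ring, token_for(key))

ring = [(0, "10.0.0.1"), (2**61, "10.0.0.2"), (2**62, "10.0.0.3")]
print(route(ring, b"some-partition-key"))
```

A real client would build `ring` from system.local and system.peers, and keep one replica per range rather than a single owner.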
A: So the request starts off in blue: that's the coordinator I'm talking to from the CQL shell, the 127.0.0.1 node. Then the coordinator sends that request to a replica node in the cluster, the .2 node, colored in red; and then the replica machine sends the reply back to the coordinator, back in blue, which will then give it back to the CQL shell. That's kind of the life cycle of a very simple request here.
A: Now, if I wanted to do this from Thrift, what I need to do is call the trace_next_query method, and then Cassandra will collect the tracing data into the system_traces keyspace, and then you can pull it back from there programmatically if you're not doing this from cqlsh. But I wanted to give an example of how this can be useful. One kind of common anti-pattern that people come up with, when they're first exposed to Cassandra, is: hey, I've got data partitioning and ordering within the partition, I can build a queue.
A: ...rather than having a machine that missed the delete replay the original data back and have it reappear, which we don't want. So I've represented that on this slide by striking out the deleted entries. They're still there, in the sense that we have a tombstone saying this entry doesn't exist anymore; but if I do a select query, they won't be included in the results, because they've been deleted.
A: So if I go ahead and insert a hundred thousand entries into this queue, and then delete all of them, and then run my select-the-next-entry-from-the-head-of-the-queue query, which is this query at the top here, then what you will see is that as the number of entries in your queue that you've inserted and deleted increases, your query slows down. And request tracing will show you what the problem is.
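A toy model of why (mine, not the real storage engine): reading the head of the queue has to scan past every tombstone left by the deletes before it finds a live cell:

```python
def select_head(partition):
    """partition: (value, is_tombstone) pairs in sorted order.
    Returns the first live value plus how many cells were scanned."""
    scanned = 0
    for value, is_tombstone in partition:
        scanned += 1
        if not is_tombstone:
            return value, scanned
    return None, scanned

# 100,000 inserted-then-deleted entries, then one live entry:
partition = [(i, True) for i in range(100_000)] + [(100_000, False)]
value, scanned = select_head(partition)
print(value, scanned)  # one live row, but 100,001 cells scanned
```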
B: Thank you very much. So we have about 10 minutes and we've got some questions coming in. If you would like to ask Jonathan a question, please go to the Q&A tab in WebEx and type your question there; I'll read them out to Jonathan and we'll try to get through as many as we can in 10 minutes. So let's get cracking. The first one is from Jerry; this one is regarding vnodes: with vnode support, can we reboot N (greater than RF) nodes at the same time safely?
A: That's a good question. You know, vnodes are basically separate from your replication factor, so you're still going to have the same number of replicas across the cluster. The difference is that, instead of having the same three or so nodes replicating all of your data, instead of them being responsible for all of a given range of data, we're splitting that range up into much smaller pieces and scattering those across the cluster.
A: So instead of sharing data with a few neighbors, you're sharing data with the entire cluster, but each other member has a much smaller piece that it shares with you. The total number of replicas you have is the same, and so the guidance, that you can't lose more than one machine and still be able to do quorum queries with three replicas, is unchanged.
B: Great, I can take the next question; I love it when I can answer a question. This is from Kevin: is this webinar available for viewing later? Yes, it is. All of our webinars are available for viewing, both from the datastax.com website and also from the new community website, planetcassandra.org. We try to get the archives up and posted within 24 hours, and we will email you when the archive is available.
B: [inaudible]
A: So, we've actually explored that; we've actually explored moving the memtables off-heap as well. The problem is that the reference counting we need to do for that gets really, really hairy, because what we want to be able to do is free that memory up as soon as it's flushed; but if we have a client request that is using that memory, because it was accessing that memtable...
A: ...just because of the way the code is organized and how that works out, excuse me, that's a pretty hairy thing to tackle. So what we've done instead is: when you add data to a column family and it goes into the memtable, we actually copy that into a large byte buffer; we allocate one-megabyte byte buffers and then copy your data into those.
A: So we're basically doing arena allocation within the memtable. What happens is that those arenas we've created, the megabyte-sized buffers, get tenured fairly quickly, so they don't interfere with your GC anymore after a few minutes of being up and active. So, ideally, I would like to cut the GC load even further by moving those off-heap, but I think what we're doing is a good compromise against the complexity of implementation. Okay.
A
So,
let's
feel
are
basically
the
only
hint
you
need
to
know
is
that
you
need.
You
want
to
be
on
the
most
recent
minor
release
of
your
series
before
upgrading
and
if
you
news
text
file,
it
will
have
details
on
when
exactly
this
is
necessary,
but
as
a
rule
of
thumb,
it's
best.
If
you're
on
10
and
update
to
10
10
or
whatever
the
most
trees
tomorrow
releases,
and
then
you
can
jump
straight
to
one,
not
two
from
there.
You
don't
you
don't
so
far.
Let
me
put
it
this
way.
A
So
far,
we've
never
made
you
stop
or
we
never
made.
You
do
intermediate
upgrades
or
still
fully
backwards,
compatible
activist
layer,
all
the
way
back
to
0
dot,
six
I
think,
but
you
do
want
to
be
on
that
most
recent
minor
release,
because
sometimes
there's
network
protocol
issues
where
it's
not
it's
not
compatible
with
doing
a
live,
rolling,
upgrade
a
part
of
your
quest
at
a
time
unless
you're
on
that
most
recent
minorly.
B: [inaudible]
A: I can give you a historical perspective on that with DataStax Enterprise 2: when we first released it, it was on Cassandra 1.0.10, and then we did a minor release, DSE 2.1, where what we released was based on Cassandra 1.1.0. So we're going to see the same thing,
A: I expect, with DataStax Enterprise 3: first it will be released on Cassandra 1.1, and then we'll do a minor release that incorporates Cassandra 1.2. As to how long that will take, all I can say is that our goal is to be faster than it was with the 1.0 to 1.1 migration.
A: If you really want to do that... I don't think it does anything particularly special around connection pooling; it may not even do connection pooling, and may just say you're responsible for doing it yourself or with some other library. You'd really need to ask that project about it; I'm not super familiar with it at this point.