From YouTube: Cassandra Community Webinar | Intro to Apache Cassandra
Description
Speaker | Aaron Morton (Apache Cassandra Committer)
Date | Wednesday, October 10 @ 11AM PST
Join Aaron Morton, DataStax MVP for Apache Cassandra, and learn the basics of the massively scalable NoSQL database. This webinar is 101-level and will examine C*'s architecture and its strengths for powering mission-critical applications. Aaron will introduce you to concepts such as Cassandra's data model, multi-datacenter replication, and tunable consistency.
Christian: This webinar is part of a series, and the recording will be archived on datastax.com. Just a little housekeeping: at the end of the presentation, Aaron will be taking Q&A, and you can either submit your questions on Twitter via the Cassandra QA hashtag, or you can use the WebEx Q&A panel; either way, we will be monitoring both. So, just a little bit about Aaron: he's very well known in the Apache Cassandra community.
Aaron: Sorry, there we go, now I'm the presenter. Good morning, good afternoon, everyone. As Christian said, my name's Aaron. I've probably spoken to some of you on the cassandra-user list; that's a great resource for questions, run from the Apache site. If you have any afterwards, I'd also encourage you to ask questions on the Twitter hashtag, as Christian said, or through WebEx.
Cassandra was famously started by Facebook, and they used it to provide search for their inbox feature. They donated it to the Apache Software Foundation in 2008, and in 2010 it became a top-level project at Apache, which means it came out of incubation: it was deemed to be a mature, stable project. Since then we've had a number of major releases, and the project has really taken on a life of its own, with a great number of people who are very vocal about how they use Cassandra.
Let's just take a step back to the foundations of Cassandra. It is rooted in some papers which I think are accessible for anyone to read, and if you've got a deeper interest in why Cassandra made some of the design choices it did, I encourage you to look these up. In 2006, Google released a paper called Bigtable, which described a data system they had at the time, and Cassandra borrows the column family data model from Bigtable.
You might have a write master and three read slaves on a Postgres or MySQL installation; in a case like that, what you're really doing is trying to get lower latency on your reads, and on your writes as well, by separating the writes off from the reads, and to get higher throughput. That's something Cassandra can often help with, simplifying that installation down to one canonical datastore. The next point is operations.
Cassandra helps us move to different hardware: we can take nodes down and bring them back up again on new hardware, all the time keeping the system operational. And lastly, the data model: the column family data model is a very flexible system. It's not as rigid as the tabular data model that you get in a relational database, and people often see that as a plus.
If you'd like to know more about where Cassandra is a good fit for what you do, one of my MVP colleagues will be talking about that in a couple of weeks, and I encourage you to sign up for that webinar as well. Let's move on to talk about Cassandra as a clustered system. In this example here I've got four nodes in the cluster, and I'm just drawing them in a ring. This line does not represent network connectivity; it just represents the way we normally think of the cluster, as a ring.
Say we want to store a row of data. In Cassandra we have rows, like you have them in a relational database, and we have what we call a row key, which is analogous to a primary key in a relational database. Because we want high availability and fault tolerance, we want to store the row with key 'foo' three times; we call this the replication factor in Cassandra. So we could just go ahead and store this row on nodes one, two and three, but we really want to understand how we arrive at that placement.
So if we have four nodes, each will have twenty-five percent of the load, and we can then very easily understand what's going to happen to all the computers in our cluster. Consistent hashing also allows us to minimize the key movement when nodes join or leave the cluster. As I said, we want Cassandra to be continuously available, continuously handling requests; we want to be able to scale it up and scale it down, and consistent hashing helps with this.
So that's how we've got some randomization in there, and that randomization helps us keep a consistent load on all the nodes. And instead of thinking about the range of tokens as a number line, if we think about it as a token ring we get to do a couple of interesting things. You can think of this token ring like a clock: on a clock, when you count from 58 to 59, the next value is 0. It's a similar idea in our token ring here.
If we count around from 0 up through 98, 99, the next value is zero, so we have a continuous range of tokens, and that allows us to apply operations onto that continuous range. In my example here I've used tokens from the range 0 to 99; in real life the token range is a 128-bit integer, with values up to 170 billion billion billion or something like that. So we have a very large, essentially infinite, range of tokens.
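To make that concrete, here is a minimal Python sketch of hashing a row key onto a toy 0-to-99 ring; the real partitioner works over the huge integer range just described, but the idea is the same:

    import hashlib

    RING_SIZE = 100  # toy range 0..99; the real token space is ~128 bits

    def token_for(row_key):
        """Hash a row key onto the toy token ring (deterministic, evenly spread)."""
        digest = hashlib.md5(row_key.encode()).hexdigest()
        return int(digest, 16) % RING_SIZE

    print(token_for('foo'))  # the same key always lands on the same token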
A token places each node on the ring. If you look at this model here, node one has a token of 0, node two has a token of 25, and so forth. We can then give each node what we call a token range. Now, it's hard to get across the idea that this token range is not permanently owned by a node; it just helps us when it comes time to find out which nodes should store the data.
Node two has a token range of 1 to 25, because a token range starts one after the previous node's token value, which is 0, and goes around to include the node's own token value. If we look at node 1, its token range starts at 76, which is one past the previous node's token value, and here we go with the clock counting: it wraps around through 99 to include node 1's own token of 0.
We put the row key's token onto the token ring, and we can work out which node has a range which covers that token, and so we can find the first replica for our row key. Again, let's emphasize here that this is a peer-to-peer system: node one is not in any sense a master, and its replica is in no way more authoritative for 'foo' than any other node's. It is simply the first one that we found when working out the replicas. But we want more than one replica, so we have this idea of a replication strategy.
The simple strategy takes the nodes, orders them by their token, and then simply counts around until it gets to the replication factor's worth of nodes. So in this case we took the row key, we created the token from it, and we mapped that onto the token range; that token range is owned by node one, so that's our first replica.
We then count around and we get to node 2 and node 3, and again, there is no difference here in terms of one being a master and one being a slave, or one somehow being more important than the others. All these replicas are the same, and it's simply by convention that we've gone clockwise around the ring, or that the token ranges start to the left of the node.
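Here is a sketch of that counting, continuing the toy ring from before (node tokens 0, 25, 50, 75); it illustrates the idea rather than Cassandra's actual code:

    from bisect import bisect_left

    NODE_TOKENS = [0, 25, 50, 75]  # where the four nodes sit on the toy ring

    def replicas(token, rf=3):
        """Find the node whose range covers `token`, then walk clockwise
        until we have `rf` distinct nodes (the simple strategy's counting)."""
        tokens = sorted(NODE_TOKENS)
        # the owner is the first node token >= our token, wrapping like a clock
        start = bisect_left(tokens, token) % len(tokens)
        return [tokens[(start + i) % len(tokens)] for i in range(rf)]

    print(replicas(80))  # -> [0, 25, 50]: past 99 we wrap to the node at token 0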
Wherever there's a simple way of doing things there's normally a more complicated way too, and here we call that the network topology strategy. In Cassandra, that allows us to use a replication factor per data center. A data center, to Cassandra, is not only a physical building; it is a collection of nodes inside your cluster. So it could be that you're running a Cassandra cluster that has some nodes in an AWS region on the East Coast and some in an AWS region on the West Coast. Or it could be that you have one building, and inside that building
you have all of your Cassandra nodes, but you split them into two data centers: one of those can handle the public-facing web transactional load, and the other can handle an internal-facing analytical load. This allows you to separate the workloads. The analytical load could be powered by Hadoop, which we connect with easily, and that lets your transactional data be available on the analytical side almost instantaneously, so your internal people can analyze it without having any impact on the public-facing side.
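Extending the toy sketch above for the per-data-center idea; the data center labels attached to the nodes here are invented for illustration:

    from bisect import bisect_left

    NODE_DC = {0: 'east', 25: 'west', 50: 'east', 75: 'west'}  # token -> DC label

    def replicas_per_dc(token, rf_by_dc=None):
        """Walk the ring clockwise, keeping nodes until every data center
        has its own replication factor's worth of replicas."""
        rf_by_dc = rf_by_dc or {'east': 1, 'west': 1}
        tokens = sorted(NODE_DC)
        need = dict(rf_by_dc)
        start = bisect_left(tokens, token) % len(tokens)
        chosen = []
        for i in range(len(tokens)):
            node = tokens[(start + i) % len(tokens)]
            if need.get(NODE_DC[node], 0) > 0:
                chosen.append(node)
                need[NODE_DC[node]] -= 1
        return chosen

    print(replicas_per_dc(80))  # one replica in 'east' and one in 'west'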
I have to apologize here: you'll notice that the nodes in the East Coast data center have tokens that are one off from the ones in the West Coast data center. This is just a little bit of Cassandra implementation leaking through. We do think of this as one entire cluster, and every node in the cluster has to have a unique token, so the way we normally deal with this is to just add one when we add a node in another data center. If we had a third data center here, its first node might have a token of two.
So we have the simple snitch, and again, the idea of the simple snitch is to do things as simply as possible: it places all the nodes into the same data center and the same rack. We have a property file snitch, which allows you to say, through configuration, that this node is in this data center and this rack, and there are also some other snitches which infer the data center and the rack from the IP address, and the like.
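For the property file snitch, that configuration lives in a plain properties file shipped with Cassandra (conf/cassandra-topology.properties); a minimal sketch, with made-up addresses and names:

    # conf/cassandra-topology.properties (used by the property file snitch)
    # node IP = data center : rack
    192.168.1.101=DC1:RAC1
    192.168.1.102=DC1:RAC2
    192.168.2.101=DC2:RAC1
    default=DC1:RAC1    # anything not listed falls back to this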
The network topology strategy, in a case like this where nodes are in different data centers and different racks, does its best to replicate your data onto each rack in each data center. This means that if you deploy Cassandra into AWS with nodes in, say, three availability zones, and you lose an availability zone, meaning you lose a whole physical building, as has happened a couple of times in the last few years,
your Cassandra installation can continue working and can continue serving requests. This is how, in the past, companies like Netflix and SimpleGeo have been able to handle these large-scale failures in AWS and keep serving requests. On top of these snitches, which give us a static understanding of your cluster, we have this thing called the dynamic snitch, which is a little bit inside baseball, but I think it illustrates what it means to be a peer-to-peer system.
One of the things we use the snitch for is to understand which nodes are physically close to each other, so we can direct traffic in the most efficient way. But just because two nodes are in the same rack doesn't always mean that one is the closest; we want to know which one is actually the fastest.
The dynamic snitch runs on each node, and it watches the request and reply traffic between nodes and develops an understanding of which nodes have the best performance, from the point of view of the node that it's running on. We don't have a global server that looks at your entire cluster and develops a view of things from 30,000 feet. The dynamic snitch gives an idea of what it means to be a peer-to-peer system.
The node is developing a view of the rest of the cluster on its own, from its own perspective, and it's using that information to make decisions without consulting other nodes or some sort of master server. More on that later: we also have this idea of gossip, which takes this to the next level.
Again, apologies here, this slide's been mangled a little by WebEx. When it comes time to store data, because we have a peer-to-peer system, every node in Cassandra is the same and they can all perform the same actions. The client doesn't have to be aware of where its data is going to be stored: it can connect to any node in the cluster and ask it to store or read data, and we call the node that it connects to the coordinator for the request.
We don't have to go to a master server, and we don't go to one node, ask it to do some work, and then have it pass the message on to the next node in a chain. In this example, the client connects to node 4 and asks it to store our 'foo' row; node 4 uses its view of the cluster to make one hop to node 1, and at the same time it sends the message to node 2 and node 3.
So this gossip thing that I've mentioned previously is a really neat and simple idea that allows the nodes to share information. Every second, each node talks to between one and three other nodes and says: this is what I know about me, and this is what I know about everyone else. A node doesn't have to connect to every node in the cluster; it just connects to between one and three, sends out these little bits of information, and within a short time the information has spread to everyone.
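A toy sketch of one gossip round; the data structure is invented just to show the shape of the exchange, where fresher information (a higher version number here) wins on merge:

    import random

    def gossip_round(states):
        """Each node shares everything it knows with 1-3 random peers;
        the newer version of each fact wins on the receiving side."""
        nodes = list(states)
        for node in nodes:
            peers = random.sample([n for n in nodes if n != node],
                                  k=random.randint(1, 3))
            for peer in peers:
                for key, version in states[node].items():
                    if version > states[peer].get(key, -1):
                        states[peer][key] = version

    # what each node believes about every node, keyed by node name
    states = {n: {n: i * 10} for i, n in enumerate(['n1', 'n2', 'n3', 'n4'])}
    gossip_round(states)
    print(states['n1'])  # n1 will usually have learned entries from its peers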
So again, if we had a client and a coordinator in a multi-data-center setup, the coordinator in the left-hand-side data center knows about the nodes in the right-hand-side data center. It doesn't have to ask someone else when it wants to send data over to the other data center, and it has options for reducing the network bandwidth used between the two.
There are lots of machines involved here, and you start to wonder what happens when they fail. What happens when a node is unavailable? How does that impact the client? Each request from a client specifies this thing we call the consistency level, which tells the coordinator how many nodes to wait for. The client says: I want you to store this value for me, and when this many nodes have completed the request,
B
Let's
consider
that
request
complete
success,
and
please
come
back
to
me
if
the
nodes,
don't
repeat
that,
if
those
don't
return
back
to
you
in
time,
consider
that
a
failure
consistency
levels
are
built
in
Pakistan
grill.
We
have
some
basic
ones
of
any,
which
means
which
is
only
available
when
you're
doing
all
right,
one
two
and
three
quorum,
which
is
the
most
common
news.
We also use, at consistency level, LOCAL_QUORUM, which is a quorum in the data center where your request started, and EACH_QUORUM, which is only available for writes and waits for a quorum in each data center that we're storing the data in. The idea of a quorum is a very simple and important one. The quorum is the replication factor divided by two (take the floor of that) plus one. So for a replication factor of 3 the quorum is two; if the replication factor goes up to four or five,
the quorum is three, and so forth. As you increase your replication factor, you get more redundancy. Most of the time we see people using a replication factor of three and working at quorum, which means the data is on three nodes, and as long as two of those nodes participate in a request and agree, we consider that request to be successful.
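The arithmetic is tiny; a trivial sketch:

    def quorum(replication_factor):
        """A quorum is floor(RF / 2) + 1: a strict majority of the replicas."""
        return replication_factor // 2 + 1

    for rf in (2, 3, 4, 5):
        print(rf, quorum(rf))  # -> 2:2, 3:2, 4:3, 5:3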
When we look at this on the cluster diagram, we can see the client connects to node 4; nodes 1 and 2 are up and running, and in this example let's say node 3 is down before the request starts, and node 4 knows it's down. There are other situations where node 3 might become unavailable after the request has started, and we'll address those shortly.
So let's say the client asks node 4 to write some data. Node 4 looks at its view of the cluster and says: well, nodes 1 and 2 are available, but node 3 is offline. That's okay: I've got two replicas I can store this on, so I can meet the consistency level that the client has asked me to use. It goes ahead and does the request, returns to the client, and simply says: I stored this
on at least the minimum number of nodes that you asked me to. While it was running this request, node 4 knew that node 3 was down, so it stored this thing we call a hint, to hand off later: when node 4 notices node 3 come back up again, it will send the requests that node 3 missed over to it, and all the other nodes in the cluster will be doing this as well.
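A heavily simplified sketch of that idea; the class and method names are invented for illustration, and this is nothing like Cassandra's real implementation:

    class Coordinator:
        """Sketch of hinted handoff: remember writes a dead replica missed."""

        def __init__(self, send):
            self.send = send   # function(node, mutation) that delivers a write
            self.hints = {}    # down replica -> the writes it missed

        def write(self, replicas, live, mutation):
            for node in replicas:
                if node in live:
                    self.send(node, mutation)  # normal delivery
                else:
                    # node is down: store a hint to replay later
                    self.hints.setdefault(node, []).append(mutation)

        def node_back_up(self, node):
            for mutation in self.hints.pop(node, []):
                self.send(node, mutation)      # replay what it missed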
If we look at the table here (WebEx has conveniently put the headline over it): in the left-hand column we have some different columns, 'purple', 'monkey' and 'dishwasher', and across the top we have nodes 1, 2 and 3. The red text is the value that wins according to the timestamp. In the first column, 'purple', node 3 doesn't have a value, while nodes one and two do, and their values agree because they have the same timestamp.
So we use that. For the 'monkey' column, node 3 has a value, but it has a lower timestamp, and so we'll use the 'biggins' value, which carries the higher timestamp. And for the last column, node 3 has a value with a higher timestamp than the other two, so in that case we'll use the value from node 3.
The important point there is that when the values on the nodes are different, we have to handle that to get a consistent read. So if, for example, we do a read, and nodes one and two return the value 'cromulent' while node 3 returns the value 'biggins', node 4 has to decide which is the correct value. We've seen that it does that with timestamps: it takes the winner, and it also makes sure that, later on, node 3 will actually give us back the right value.
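The reconciliation rule itself is tiny; a sketch using the slide's made-up values, where each replica response is a (value, timestamp) pair:

    def reconcile(replica_responses):
        """Last write wins: keep the value carrying the highest timestamp.
        That winner is also what read repair pushes back to stale replicas."""
        return max(replica_responses, key=lambda pair: pair[1])

    responses = [('cromulent', 10), ('cromulent', 10), ('biggins', 7)]
    print(reconcile(responses))  # -> ('cromulent', 10)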
This idea of having a quorum of nodes feeds into this thing we call strong consistency. If the number of nodes involved in the write of a piece of data, plus the number of nodes involved in a read, is greater than the number of replicas, then Cassandra will be strongly consistent, which means that after every write we will read back the value that was written. The normal way we go about achieving that is using a quorum for the read and a quorum for the write.
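That condition is easy to state in code; a trivial sketch:

    def strongly_consistent(write_nodes, read_nodes, replication_factor):
        """R + W > N guarantees the read set and the write set overlap in at
        least one replica, so every read sees the latest write."""
        return write_nodes + read_nodes > replication_factor

    print(strongly_consistent(2, 2, 3))  # QUORUM write + QUORUM read on RF 3 -> True
    print(strongly_consistent(1, 1, 3))  # ONE + ONE on RF 3 -> False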
B
There
are
times
when
people
reduce
their
consistency
because
they
want
to
get
more
availability
and,
in
that
case,
they're
working
in
this
eventually
consistent
world,
and
that
means
that
Cassandra
take
actions
behind
the
scenes
so
that
all
the
reads
eventually
return.
The
same
results
we've
seen,
one
of
those
things
called
hinted:
handoffs,
there's
another
idea
of
read
repaired
and
we
also
have
some
scheduled
remote
repairs
that
you
can
run
on
the
command
line.
That's a very quick introduction to the cluster. You can see that by having a number of nodes, having a replication factor of three or more, and using a quorum consistency level, you get the best of both worlds: you get your data replicated, you get high availability, because you can afford to have one node down in that scenario, and you get a strongly consistent system.
Let's have a brief look at the data model. What I've said so far is pretty incomplete: I said we have rows, they have a key, and we have some columns in them. Let's dig into that a bit more. Cassandra has the idea of a keyspace, which is analogous to a database, and we put our columns in these things called column families; more and more we're also calling them tables now, and you'll see both terms out there.
The column family is, as the naming implies, a collection of columns, and different rows can have different columns in different column families. Inside a column family, all the columns are ordered and addressable by their name. It's a very flexible model: it allows us to store only the data that we need, because we don't prescribe what the columns are. We don't have a schema that says every row has the columns user name, first name,
B
Last
name:
when
it
comes
time
to
store
the
user,
you
can
just
saw
the
user
name.
If
that's
all,
you've
got
if
you've
got
their
first
and
last
name
as
well.
You
can
store
those
as
we've
seen
before
the
rose
around
unit
of
replication.
We
take
a
road
entire
copy
of
that
is
on
each
of
our
replicas
and
that
the
column
family
is
our
unit
of
storage.
When
you
get
into
central
you'll
see
that
a
column
family
equates
to
some
files
on
disk
Colin
families
are
also
our
unit
of
querying.
The interesting thing is that the value is optional, but the name is not. So you can often get to a situation where the mere presence of a column in a column family imparts some information, and we see that sometimes in the sorts of data models people create: columns exist that don't have a value, but the name is important and carries some information in itself.
So we have ASCII and UTF-8, and integers and longs, and things like UUIDs, and these really nice things called counters, which are our distributed counter. Counters can only be used for column values, but they allow you to do things like counting web hits: count the number of people who visited a product page, something like that. On top of this basic data model we have composite data types, which allow you to combine two or more basic types into one structure. For example, you might have a column name which is a timestamp plus a user name.
B
I'll
be
discussing
data
modeling
further
ones,
that
november
seventh
and
next
month
in
the
next
webinar
that
I'll
be
doing
on
data
modeling
again,
we've
blasted
through
the
data
model
very
quickly
there.
Just
to
recap:
cassandra
has
column
families
which
are
also
becoming
more
knowing
there's
tables
in
a
Cassandra
column.
Family
columns
are
all
optional
and
the
values
optional,
and
you
can
store
a
different
number
of
columns
in
different
rows.
So in this example here, you can see I get a connection to a column family just by name, and when I want to insert some data I just supply the row key and a dictionary of column names and column values. If I want to do a get, I can get one row just by saying "get me everything on that row", providing the row key, or I can say "here's the row key, and I just want this column name". I can also do a multiget there.
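In code, that looks roughly like the sketch below, using pycassa, one of the Python clients from this era; the keyspace, column family and column names are made up:

    import pycassa

    # connect to the cluster and name the column family we want to work with
    pool = pycassa.ConnectionPool('Keyspace1', server_list=['localhost:9160'])
    users = pycassa.ColumnFamily(pool, 'Users')

    # insert: the row key plus a dictionary of column name -> column value
    users.insert('foo', {'username': 'foo', 'first_name': 'Aaron'})

    # get everything on one row by its row key
    print(users.get('foo'))

    # or ask for just one column name from that row
    print(users.get('foo', columns=['username']))

    # multiget: several rows at once
    print(users.multiget(['foo', 'bar']))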
When we look at CQL, things are a little bit different, but they're also the same, because they're the same across whatever language you're using them in. So this would work the same whether you're using CQL through Python or CQL through Java: you would see the same things. When we do an insert here, it's very familiar to people coming from SQL: we insert into the column family the key and the column names, and that creates a row for us.
I can get back all the columns in a row by doing a select star, or I can get back named columns from a row by doing a select and specifying the column names.
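A sketch of these statements, plus the multiget and delete mentioned next, through the early Python CQL driver; the table and column names are made up, and the syntax is the CQL 2 style of this period:

    import cql  # the early DB-API style driver for CQL

    conn = cql.connect('localhost', 9160, 'Keyspace1')
    cursor = conn.cursor()

    # insert the row key and some column values, creating the row
    cursor.execute("INSERT INTO Users (KEY, username, first_name) "
                   "VALUES ('foo', 'foo', 'Aaron')")

    # select star: all the columns on one row
    cursor.execute("SELECT * FROM Users WHERE KEY = 'foo'")

    # named columns only
    cursor.execute("SELECT username, first_name FROM Users WHERE KEY = 'foo'")

    # multiget: several row keys at once
    cursor.execute("SELECT * FROM Users WHERE KEY IN ('foo', 'bar')")

    # delete one column, or the entire row
    cursor.execute("DELETE first_name FROM Users WHERE KEY = 'foo'")
    cursor.execute("DELETE FROM Users WHERE KEY = 'foo'")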
Again, I can do a multiget, where I specify multiple row keys, and I can do a delete, where I delete the entire row or delete just one column from the column family, as in the sketch above. And that's all the time we've got for now, because we want to leave what's left for questions.
Christian: Actually, I believe we're going to do that one on the Tuesday, so it will not be on the Wednesday. OK, Aaron's first question... sorry, Aaron, I learned how to say that when you introduced yourself. So, the first question: if the write fails, that is, it is not able to write to the quorum, does it roll back the writes that did succeed? And a follow-up to that would be: when writing, does it attempt to write to the quorum, or attempt to write to all and just wait for acks from a quorum?
Aaron: OK, so I'll answer those in reverse order. Writes are sent to all replicas for a row key; we don't trim that down to just the consistency level's worth of nodes. We're actually a little bit cleverer than that: as you're bringing new nodes into the cluster, we know that they can't handle reads yet, but we want to get the new data over to them because they may become replicas for it. So the write goes to all the nodes, including the ones that we know, in the future,
B
You'll
want
to
do
reads
from,
and
we
don't
roll
back
the
right
if
it
fails
to
reach
quorum.
We
fail
and
return
back
a
quite
an
error
to
the
client
and
remember
those
timestamps
that
we
have
on
all
the
columns
and
remember
that
when
we
compare
them
we're
just
you
put
with
putting
on
where
you
select
the
highest
timestamp,
it
means
that
our
write
operations
are
eating
item
at
their
always
get
this
wrong
idiomatic
by
the
potent.
B
So
that
means
that,
if
you
send
that
right
and
it
fails,
you
can
send
your
client
can
send
it
again,
perhaps
sending
it
to
a
different
mode.
In
the
cluster
and
that
right
can
succeed
now,
if
another
client
in
your
application
somewhere
else
has
sent
a
different
right,
but
it
had
a
higher
timestamp
that
will
win
so
I'm
sense.
Allow
us
to
resend
our
requests
that
fail,
and
we
don't
deal
with
rolling
back
a
transaction
when,
when
we
don't
reach
the
consistency
level.
Christian: Thank you very much. Questions are coming fast and furious right now. I can answer one: will the slides be available? Yes, absolutely: we post these slides on datastax.com, along with the archive where you can watch the presentation again. Next question, from Prytan: what does slice by names, or by column, mean?
Aaron: So slicing by name means that you're specifying the names of the columns that you want to get. This is probably a little bit of old-school Cassandra leaking into the modern era through me. Slice by name normally means you say that you want to get columns A and C from the row with key 'foo'. The other approach is to say: I just want to get the first 10 rows... sorry,
B
The
first
10
columns
from
the
row
with
key
food
into
two
different
approaches
to
things
the
first
one
often
use
where
you
might
have
an
orm
us
or
you've
got
a
bit
of
a
data
model
that
your
application
layer
understands,
and
you
know
that
the
user
may
have
these
ten
columns
and
the
next,
the
other
one,
the
odds,
the
one
where
you
don't
know
the
column
names
working
situations.
Where
you
have
time
series
data,
often
you
might
be
storing
events
storing,
tweet,
storing
signals
that
are
coming
in
of
hardware.
B
Aaron: You can deal with that at your application layer with some locking or something like that, or you can deal with it by repairing the inconsistency in your application when you read. So you read something, it comes back, you pivot from that and go and get the real entity, and then discover: oh, that entity doesn't have foo equal to bar anymore, so I'll discard it from the index.
Aaron: Yeah, in a two-node cluster where the replication factor is two, the quorum of two is two, unfortunately, so with an RF of two you cannot afford to lose one node. If you've got a two-node cluster and your replication factor was one, then your quorum would be one. This is why we often suggest people start at a replication factor of three: it gives you the ability to lose one node, which is great.
Aaron: So, in Cassandra, when you're doing a read, we know the nodes to go to and we get there in one network hop. We did put in a bit of a trick there for network efficiency, as I mentioned in the introduction... well, it wasn't in the introduction. We don't send the full read request to all of the replicas for a row: for ninety percent of the requests, we send it to just the number of nodes
B
We
need
to
achieve
the
consistency
level
for
ten
percent
of
the
requests
we
send
it
to
all
of
the
nodes,
and
we
only
ask
one
of
those
nodes
involves
to
send
us
back
for
salt
data
and
we
say
some
network
bandwidth
there
and
the
others
send
back
a
digest
of.
What's
going
on
so
I'm,
not
sure
I've
answered
your
question
there
very
well,
but
we
cassandra
is
made
to
handle
data
being
on
all
on
different
nodes
and
we
have
design
choices
in
there
that
allow
us
to
save
networkers
on
networking
and
with.
B
It
would
be
one
and
reputation
factors.
One
is
might
not
be
as
crazy
as
it
sounds.
I
think
you
could
use
that
in
a
situation
where
you
really
wanted
to
record
you,
some
high
throughput
data
and
you
didn't
have
the
resources
that
has
lots
of
modes,
but
definitely
would
suggest
that
and
you
and
you
add
you
get
to
that
replication
factor
three
and
then
you
get
all
the
benefits
of
having
a
highly
available
system.
Aaron: If you're talking about that sort of query, then it would have to go to every node in the cluster, which is what secondary indexes do, and then, when it got there, it would have to do an inefficient LIKE-type query, maybe with a percent sign in front and one at the back, and you'd have to scan over every value. There isn't anything like that in Cassandra.
B
When
you're
using
g,
two
or
three
to
get
the
advantages
of
being
able
to
have
that
expressive
power
in
a
text-based
API,
you
need
to
tell
t
two
or
three
what
your
schema
is.
So
you
need
to
say:
hey
is
it
has
a
create
table
statement
and
you
say
my
user
has:
please
cut
these
columns
to
Barb
ads
and
then,
when
you
do
a
select,
see
two
or
three
has
some
understanding
of
what
it
does
and
they
can
project
back
to
you.
B
The
data
structure
that
you're
expecting
and
once
they
create
people
I,
don't
want
to
scare
people
off.
We
don't
we.
We
do
have
an
ultra
stable
statement,
but
it
is
not
like
an
ultra
table
statement
in
in
relational
database
on
disk.
We
don't
store
empty
gaps,
we're
not
story
get
in.
Indeed
null
columns
when
you
do
an
altar
table,
we're
not
taking
rocks
and
we
don't
update
the
data
in
place.
The
schema
that
you
tell
cql
is
a
schemer
so
that
there's
select
statements
were
an
insert
statement.
Aaron: Good question. It can be a natural key or a surrogate key, and the same row key can be used in all column families. People often use natural keys, because the key doesn't have to be something like an int, and you don't have to join up and have an identity statement or something like that. So people often use natural keys.
Aaron: So I'm going to assume that you have the data files somewhere. There are a couple of ways you can do that. In Cassandra we have a snapshot backup system, which flushes everything to disk and creates some hard links; we also, I believe coming up in 1.2, have an incremental backup system. And the nice thing, when you look at Cassandra on disk, is that it is just a number of files, and they have names with the keyspace and the column family in the name, laid out sensibly.
If you want to move your keyspace's data files into a different keyspace, you can do that by turning the node off and just moving the files. There's also a nodetool command which will reload data files from disk, I think without you having to shut the node down.
Aaron: Do they have different...? Yes, say they have different timestamps, because the clients are on different machines. The race over who wins starts when they choose their timestamps for the requests, so that part is kind of outside the scope of Cassandra; what we care about is that one of them wins. Each of those requests gets processed on all three nodes, and we can look at any one of those nodes, because they all do the same job.
B
There's
a
couple
of
ways:
it
goes,
it's
a
quiet
ones,
request,
finishes
and
completes
and
inclined
to
tease,
request
finishes
with
starts
and
finishes
after
both
after
that
happened,
the
request
with
the
highest
timestamp
will
be
the
one
whose
values
and
now
the
current
values,
let's
say
that
they
both
get
processed
at
the
same
time,
because
in
Cassandra
we
have
a
thread,
pool
handling
your
right
and
it's
entirely
conceivable
that
they
both
require
get
processed.
At
the
same
time,
we
now
have
the
idea
of
row
level
isolation.
Christian: OK Aaron, thank you so much; I think that was a record for how many questions we got through so quickly. Thank you for a great presentation, and thank you to everyone on the line. See you back here in two weeks. The archive of this webinar will be available in the resources section of datastax.com this Friday, and we'll send you out an email. Thanks, everybody.