Apache Cassandra: Cassandra Community Webinar Series, 13 Aug 2015

From YouTube: Webinar | Oracle to Cassandra Core Concepts Guide Pt. 3

Description

Tired of timeouts? Cursing your cursors? Join the distributed revolution and bring your dev team into application nirvana. You won’t believe how easy it is to be code complete on your next big project. We will show you how to lead your devs away from the clutches of the DBA and be in control of their own data destiny. Discover the methodology that will make your Cassandra project epic.

A

All right, a week already, Rachel.

B

It's gone so fast.

A

Yeah, this is part three: The Return of the Query. We...

B

...started with A New Hope, right? Then it was The Return...

A

...of the Query? No... of the Datum? Oh...

C

...Strikes Back! Oh yeah.

A

Then The Datum Strikes Back, and now The Return of the Query, yeah. Okay, so obviously there is a Star Wars montage there, which...

B

...we didn't follow through on in the slides at all, you know.

A

That's fine. Believe me, there's going to be plenty of Star Wars this year, with a new movie coming. It's going to be outrageous.

B

And you can assume plenty of Star Wars-themed stuff.

A

Plenty of that, right, exactly. So we are live in Austin, Texas, today, and it is approximately 105 degrees Fahrenheit outside, so we're going to stay inside and do a webinar. Yay.

A

Where did we leave off? Let's recap, because this is obviously going to be watched asynchronously later on. So, part 1 and part 2 are in the bag.

B

That's right. In part 1 we had an overview of why Cassandra is an always-on distributed system, and why that's even important. How did that happen? Pretty much it happened because of Steve Jobs, we decided, right? Yeah.

A

He created this problem. We put that thing in our pocket, and it takes more attention than anything.

B

I hear you. Okay, uh-huh.

A

You know, part two was really more of an in-depth look at what Cassandra data modeling is about. It's not application design; we talked about the data model and the query language, CQL, in more detail, and that will translate into what we're going to do today.

Today, we're going to talk about actually building an application, building on those first two topics: this is how it works, and these are the details of how to make a data model work, and why. Now let's talk about really building an application.

B

So if you haven't watched the last two, part two is probably the most important one here. We are not going to be going through CQL at all. We're going to assume you understand what we mean when we talk about a partition key and a clustering column specifically, and we're also not going to go into how data is distributed, or why; again, that's in part one. So, yes, fair warning.

A

Let's start this by setting the Wayback Machine. Let's go back: we're going to data model like it's 1999.

B

Were you more concerned about Y2K at that point?

A

At that point I had given up on COBOL, so I wasn't really concerned about it yet. I think I was doing this all the time: data modeling for Oracle web applications. That's when it really got serious for me, and it was always the same thing, and you can go way back. So this is a quick sketch of me and Rachel in one of our early whiteboard sessions in the cave. It's a funny picture, but what do you see there?

This is an ERD, right? One-to-many relationships. Data modeling was an exercise where you started by looking at your data. This is the process, and I think you'll probably remember it: you take the data domain and say, hey, we have a problem in our data; here's the scope of all the things we need to deal with, store, and query on; let's go through a couple of design sessions and work up our data model.

And that was all about normalizing data, right?

B

In traditional OLTP systems, yes.

A

I mean, it was a big deal.

C

If you were going to do something...

B

...differently, like denormalize your data, you had to justify it, right?

A

And in the case of building a web application, denormalizing was not okay.

B

Normalizing was what we did, exactly.

A

And you tried to work it up to as many normal forms as you could. Why? Because you didn't know what the application was going to do, so you just built out the model, and everybody waited to see one of these, right? Here's your application. Right? Why?

B

So why do people normalize data? What was the driving theory behind it? Why was normalization the right thing to do?

A

Well, if you take it way, way back, you get Peabody and Sherman to set the Wayback Machine to the '70s. Record format was what came before relational format: everything lived in one huge record on a mainframe, or in multiple files, and that was because you had limited space. Relational was about creating relationships between data that you could then join together using a query, and the normal forms gave you the most flexibility for those queries.

B

And reduced the amount of space you used, right?

A

Right. You would reduce the amount of data duplication and save every little byte, and that was great back when I had a 20-megabyte hard disk, right? Yeah.

B

I think I had a 20-meg one, too.

A

Now you can't even get an SD card like that. So we're going to have an example today; I got a little ahead on the slides, but this is a real application, and we're going to introduce it to you right now. Those of you who have seen other DataStax presentations about Cassandra and web applications might be familiar with it. This is KillrVideo. It is kind of like a YouTube competitor.

A

It is 100% real, and at the end of the presentation we'll give you all the details of how you too can have your very own KillrVideo. But we're going to walk through it.

B

It's a real sample application, just to make that clear.

A

A real application, yes. It's not actually a competitor to YouTube, but given the amount of effort that went into it...

So this should either strike fear or happiness in your heart if you have done this before. I say fear, because this is where it gets pretty crazy.

B

[inaudible]

A

Right, yeah. Okay, this is actually an Oracle data modeling product. For this presentation, I fully immersed myself back in the Oracle world: I have Oracle 12 running on my laptop, I'm using all the Oracle data modeling products, and yes, well...

B

I remember installing Oracle was a real nightmare. I'm sure it's gotten better, right?

A

It was not, yeah. That install... and so, anyway.

So this is an entity relationship diagram of KillrVideo. You can see the relationships: for instance, if you look at the upper right, you see users have many video events and have many comments, and the relationships are marked by constraints, so you can map out everything here.

This is a pretty good diagram, and as an application developer I can look at it and say, hey, I know what queries I need. If I say, well, I would like to know, for a user, all the video events for a particular video, that's a query you could ask of this, right?

B

Take a step back. You said that this was the data model for KillrVideo. It's KillrVideo, but not the Cassandra version, right?

A

Right, that's if I were going to take a relational standpoint. Actually, we went through a design session on this, and I'll post the SQL schema for it. We went through a design session the way we always would, but this is before doing anything: we don't even know what the application is really going to do, other than what we know about the data, so we're building the model first. Okay.

If we drill into the user side of things, you can see that my data is nicely normalized. There's no duplication of data, which is good; that's third normal form. And so, when I go to take this, the...

B

The key, the whole key, and nothing but the key, right?

A

You can then get the physical data model, which is where you create the tables. This is the users table, so I create the users table, and it has some parts in here that we need, like constraints: I'm going to say that email is going to be unique.

B

That would help, right? Email is probably as good as anything for that.

A

But I have to tell my data model: hey, I want this to be unique. And then, your favorite, an index. Yeah.

B

Why are you putting this index there? What made that decision?

A

Well, potentially I'm going to query the users table, and I want to look up by email. Mm-hmm. You could leave the index off, but what would that create?

B

Oh, the lovely full table scan.

A

Table scan, yeah. So by indexing, what I'm really saying is that I'm presupposing I will use email in a query: an equality query, maybe greater-than or less-than, maybe even a LIKE. This is just saying I'm going to query on email. And then there are foreign key constraints, which give me the linkage between users and videos: if you notice, there is a user ID in the videos table, so users have many videos, right?

That is clear in this model because of the foreign key constraint at the bottom, where it says FOREIGN KEY on user ID.
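For readers coming from the relational side, the normalized starting point just described can be sketched in a few lines. This is a minimal illustration, not the schema from the slides: it uses Python's built-in `sqlite3` module as a stand-in for Oracle, and the table and column names are hypothetical. It shows the three pieces discussed above (a UNIQUE constraint on email, an index that presupposes lookups by email, and a foreign key from videos to users) plus the read-time join this design depends on.

```python
import sqlite3


def build_relational_schema() -> sqlite3.Connection:
    """Create an in-memory, normalized users/videos schema (hypothetical names)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
    conn.executescript("""
        CREATE TABLE users (
            user_id   INTEGER PRIMARY KEY,
            firstname TEXT,
            lastname  TEXT,
            email     TEXT NOT NULL UNIQUE      -- constraint: one account per email
        );
        -- Index that presupposes equality lookups by email.
        -- (SQLite already backs UNIQUE with an implicit index; this mirrors the slide.)
        CREATE INDEX users_email_idx ON users (email);
        CREATE TABLE videos (
            video_id INTEGER PRIMARY KEY,
            user_id  INTEGER NOT NULL,
            name     TEXT,
            FOREIGN KEY (user_id) REFERENCES users (user_id)  -- users have many videos
        );
    """)
    return conn


def find_user_by_email(conn: sqlite3.Connection, email: str):
    """Equality lookup served by the email index."""
    return conn.execute(
        "SELECT user_id, firstname, lastname FROM users WHERE email = ?", (email,)
    ).fetchone()


def videos_for_email(conn: sqlite3.Connection, email: str) -> list:
    """A read-time join: easy on one server, not available in Cassandra."""
    rows = conn.execute(
        """SELECT v.name FROM videos v JOIN users u ON u.user_id = v.user_id
           WHERE u.email = ? ORDER BY v.video_id""",
        (email,),
    ).fetchall()
    return [name for (name,) in rows]
```

The join inside `videos_for_email` is the piece that has no Cassandra equivalent, which is why the rest of the session builds tables around queries instead.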

B

[inaudible] It has many, right?

A

Back in the old cave, generally, yeah. I don't think there's anything wrong with this table, although someone might tell me I'm wrong.

B

Sure, yeah. Let's just say this is for illustrative purposes, right?

A

It's a legitimate data model. So what's next, then?

B

Looks like a fork in the road.

A

Welcome to the fork in the road. It's funny, because we found a picture of an actual fork in the road, but really this is what we're going to do now with Cassandra. It's different, right?

B

Totally different, yeah.

A

That's why you're here: you want to know the difference. I gave you your old, familiar pastures, so to speak, and now we have a fork in the road. We're no longer going to do that; we're going to go in a different direction. First thing you have to know...

B

...you need to know your application ahead of time. It's all about the application and the views, and that actually stems from some of the practices of denormalization in general. When I used to data model back in the days of data warehousing, doing my star schemas and all that, I would spend a lot of time on this. I worked for Vail Resorts doing this, so I got to spend time on ski lifts actually having these meetings, asking people what they wanted. What questions does my marketing team need answered? So I'd say, all right, tell me what you need. Then I would go interview all the people and ask, okay, what data do you have? And then I would build these big ETL tools that would take the data from my source systems and put it into my star schema, which would hopefully answer the questions my marketing people had.

A

Now, with data warehousing you don't necessarily know the questions; it's just, hey, here's a collection of data I know I can use. I can use some really epic SQL to figure it out: some really interesting queries with subqueries, joins, and outer joins, and those took their time. Yeah. So Cassandra is completely the opposite, and you need to know more about your application before you model. So go back to video number one; I believe I said this, or you said this.

B

One of us said it.

A

Cassandra is the database for applications. This is what we're trying to accomplish: we have applications that need uptime and need scaling, and users just aren't going to put up with anything else. They're not okay with downtime, so we're building...

B

...or slowdowns.

A

...or slowdowns, too. And we're building an application on a database, so we have to think about the application first, and then we build our models. I think that's really interesting, and it's probably where I get the most questions: how do I build these models when I'm not even sure what I need?

B

And what do these models look like? Are they like our old models, or are they different? What's the deal?

A

Let's deal with that. So again, think before you model. We're going to walk through that process. Okay.

A

We're going to take a top-down approach here. It's not that we have a database without any relationships; it's not a relational database, but our data has relations.

B

I mean, there are relationships inherent in everything. Every user...

A

...has many videos. And this is actually taken from our formal data modeling class, DS220, which is available online.

B

On DataStax Academy, right?

A

Right. So there is a formal data modeling process if you want to take your game up a little bit, but I'm going to walk you through just the highlights here, just to give you an idea. Okay.

B

Those are M's and N's, is that right?

A

Yes, and that just acknowledges the fact that there are one-to-many and many-to-many relationships: what the relationships are, and then some of their properties.

B

How would you come up with this? Where did this data domain come from?

A

This is when we start talking about our application. We'd say, we're going to build KillrVideo; here are the things that need to be there. This is a standard application design discussion at the whiteboard: hey, we're going to have this thing, and it's going to take over YouTube. What are we going to need?

B

Who would be the people involved in this conversation?

A

In this case, probably developers and architects, and product people: make sure the product domain people are involved. These are the people involved here, because we want to make sure we've got all the parts we need. Yes, okay, we need a user with a first name and last name; check. So we're not getting into the physical data model yet. This could actually live outside of Cassandra as well, which is interesting, but that will quickly change.

So let's model our queries. We have to ask these questions: what is our application workflow? You know, I think we can do that.

B

It almost seems like it's better to do this after your application is built, or no?

A

No, no, I'll show you. It will be part of the design process. Okay.

B

So you might be doing this later than you would in a traditional design? Okay, so...

A

You might actually...

B

...start the database stuff earlier? Yeah.

A

Much earlier, yes, because you're building your application and asking, well, how am I going to access that data based on the application workflow? And then I need to know these queries; they're critical. This isn't something you can just gloss over. In relational, you can fake your way around a bad data model with good SQL, or bad SQL. You really don't get that choice with Cassandra. That's why we're walking through the process, and, you know, this is what makes everything work: you don't get joins here.

B

I think we talked about that in the first couple of episodes, right?

B

Here's a sample.

A

This is from that design session. We talked about not just the entities; this is really about our application. It's a top-down workflow. If you look at the left here, where a user logs into the site, here are the things we will probably want to show them: basic information about the user. Hey, hi...

B

...user, welcome back.

A

Personalization. How about the videos that were added by that one user, and...

B

...the last five videos that you uploaded? Exactly.

A

Yeah, and show the comments as well. And this is what...

B

...you thought of, the last...

A

...five, yeah, exactly. And this is still product and development people and architects talking, the same group...

B

...same group, same whiteboard.

A

Notice a subtle difference here: I don't think I ever really did this in relational. Maybe after the fact, when we'd say, oh, we could write SQL to do this. Mm-hmm. We weren't really thinking about this much ahead of time.

B

No. You had the DBA in the back of the room, saying no...

A

"You're going to need an index on that."

C

Like a cop, yeah, yeah.

A

"I'm going to block that for a minute."

Yeah. We now have a pretty good idea of what we need from our database. Based on all that, look at what we've got: we actually have some interesting questions now that we can address with tables, and we're progressing towards the physical data model, which is our actual CREATE TABLE statements. So, when a user logs in: find users by email address. I need that, yep.

B

[inaudible] again, yeah.

A

And if we look at our videos, we're starting to see something here: I need to find all videos by a tag, and I need to find a video by an ID.
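The whiteboard questions above are the real input to a Cassandra model: each one becomes a table whose partition key is the value the query supplies. The helper below is hypothetical (it is not part of any DataStax tool), and the column types are assumptions chosen for illustration rather than the actual KillrVideo schema; it simply renders a list of access patterns as CQL `CREATE TABLE` statements to show the query-first shape.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical column types for illustration; not the exact KillrVideo schema.
TYPES = {
    "email": "text", "userid": "uuid", "firstname": "text", "lastname": "text",
    "videoid": "uuid", "tag": "text", "name": "text",
}


@dataclass(frozen=True)
class AccessPattern:
    """One question the application asks, captured before any table exists."""
    table: str
    partition_key: str                 # the value the query looks up by
    clustering: tuple[str, ...] = ()   # orders/uniquifies rows inside a partition
    columns: tuple[str, ...] = ()      # what the screen needs returned


def cql_table(p: AccessPattern, types: dict[str, str] = TYPES) -> str:
    """Render one access pattern as a CQL CREATE TABLE statement."""
    cols = [p.partition_key, *p.clustering, *p.columns]
    col_defs = ", ".join(f"{c} {types[c]}" for c in cols)
    key = ", ".join([f"({p.partition_key})", *p.clustering])
    return f"CREATE TABLE {p.table} ({col_defs}, PRIMARY KEY ({key}));"


# Questions from the whiteboard workflow, one table each.
WORKFLOW = [
    AccessPattern("users_by_email", "email", columns=("userid", "firstname", "lastname")),
    AccessPattern("videos", "videoid", columns=("userid", "name")),
    AccessPattern("videos_by_tag", "tag", clustering=("videoid",), columns=("name",)),
]

if __name__ == "__main__":
    for pattern in WORKFLOW:
        print(cql_table(pattern))
```

Nothing here starts from entities: the tables fall out of the questions, and two questions about the same entity (by ID, by tag) produce two tables.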

A

These are indexes, aren't they?

B

Well, why not?

A

Let's get into the reasoning. I'm going to leave this slide up for when we put this on SlideShare, so you can have it whenever you read it. You should already have gone through this: we know our queries, and we denormalize for all of them. That's how you do the data model, and this should be familiar from last week, right? So let's look at how we do this with users. Here's a physical data model.

B

For those who weren't there last week: if this looks a lot like SQL, that's because it does. This is the syntax you would use to create tables in Cassandra.

A

Yes. This is Cassandra Query Language, and it is not...

B

...smoke and mirrors. This actually compiles and runs.

A

And this is the physical data model that's present in KillrVideo. If you recall, with the SQL version I had a users table, and then I indexed the email, right?

B

Because we need to search by email: we want to be able to say, hey, when I have this particular email address, bring back whatever information is associated with it, right?

A

And now what I had to create here, if you look on the left, is a different table.

B

Right, where we made email the primary key, which in this case means that's how the data gets distributed around the ring. So when we say we want email address equals rachel@datastax.com, it will know exactly which nodes to pull that record from. That's how it becomes extremely efficient in a distributed system, and why you actually need to have that in your query.

A

This is a critical concept, and I don't want to make a blanket statement, but most of the time when you would do a CREATE INDEX in a relational database, you could probably create a table in Cassandra to do the same job. That gives you the full distribution we learned about last week: the partition key gets hashed, and the data gets spread out over a much larger ring. You take advantage of that, and you get great uptime and amazing speed.
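Why does a table keyed by email make the lookup cheap? Because the coordinator can compute where the row lives from the key alone. The toy model below illustrates that under loudly simplified assumptions: one token per node, replication factor 1, and MD5 standing in for the Murmur3 hash that Cassandra's default partitioner actually uses. The class names are hypothetical.

```python
import bisect
import hashlib


class TokenRing:
    """Toy ring: node i owns every token in (tokens[i-1], tokens[i]]."""

    def __init__(self, node_names):
        self.nodes = list(node_names)
        step = 2**64 // len(self.nodes)
        # One evenly spaced token per node (a simplification: real clusters use vnodes).
        self.tokens = [step * (i + 1) - 1 for i in range(len(self.nodes))]

    @staticmethod
    def token(partition_key: str) -> int:
        # Stand-in hash: Cassandra's default Murmur3Partitioner uses a different function.
        digest = hashlib.md5(partition_key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def owner(self, partition_key: str) -> str:
        i = bisect.bisect_left(self.tokens, self.token(partition_key))
        return self.nodes[i % len(self.nodes)]  # tokens past the last one wrap to the first node


class ToyCluster:
    """Replication factor 1, no failures: just enough to show single-node reads."""

    def __init__(self, n_nodes=8):
        self.ring = TokenRing(f"node{i}" for i in range(n_nodes))
        self.data = {node: {} for node in self.ring.nodes}
        self.nodes_contacted = 0

    def insert(self, partition_key, row):
        self.data[self.ring.owner(partition_key)][partition_key] = row

    def select(self, partition_key):
        # The coordinator hashes the key itself, so it goes straight to the owner.
        self.nodes_contacted += 1
        return self.data[self.ring.owner(partition_key)].get(partition_key)
```

However many rows are loaded, a lookup by the partition key touches exactly one node in this model (one replica set, in a real cluster).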

B

Awesome. But okay, now the email address is stored in two places, you know?

A

It is, and that's where we get to denormalization, right? But first of all, let's talk about indexes, because if I'm not creating an email index, the question is why not. Okay.

B

Well, there are things called secondary indexes in Cassandra. It is possible for you to create a table like the one we had on the previous slide and put an index on email address, but we know that email is pretty much unique. If you think about it, each node is going to index its own data. So node 80 up there is going to take all the data for the user IDs that fall in its range and index it.

So if you're looking for a specific email address like rachel@datastax.com, the coordinator node isn't going to know where that email lives. It's going to have to ask that node, and that node, and that node, and that node, until it finds rachel@datastax.com, because it's only going to exist once. That seems like a whole lot of asking for not a whole lot of return.
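The cost B describes can be counted out. In this hypothetical sketch, each node indexes only its own rows, as Cassandra's built-in secondary indexes do, so a lookup by email has to go to every node, while a hand-built table partitioned by email needs exactly one. Placement uses hash-modulo-node-count purely for brevity; Cassandra itself assigns token ranges, and a real index query is planned by token range rather than by looping over a node list.

```python
import hashlib


def owner(key: str, n_nodes: int) -> int:
    """Simplified placement: hash modulo node count (Cassandra really uses token ranges)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big") % n_nodes


def build_cluster(users, n_nodes=8):
    """Partition the users table by userid; each node keeps a local email index."""
    nodes = [{"rows": {}, "email_idx": {}} for _ in range(n_nodes)]
    for row in users:
        node = nodes[owner(row["userid"], n_nodes)]
        node["rows"][row["userid"]] = row
        node["email_idx"][row["email"]] = row["userid"]  # indexes only this node's rows
    return nodes


def query_secondary_index(nodes, email):
    """The coordinator can't tell which node holds the email (or that it's unique), so it asks all of them."""
    matches, contacted = [], 0
    for node in nodes:
        contacted += 1
        userid = node["email_idx"].get(email)
        if userid is not None:
            matches.append(node["rows"][userid])
    return matches, contacted


def build_index_table(users, n_nodes=8):
    """Hand-built users_by_email table, partitioned by the email itself."""
    nodes = [{} for _ in range(n_nodes)]
    for row in users:
        nodes[owner(row["email"], n_nodes)][row["email"]] = row["userid"]
    return nodes


def query_index_table(nodes, email):
    """The email alone tells the coordinator which node to ask."""
    return nodes[owner(email, len(nodes))].get(email), 1
```

On an eight-node ring the secondary-index path asks eight nodes to return one row; the index-table path asks one, and that gap widens as the cluster grows.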

A

And it's going to get worse as your cluster grows, right?

B

Yeah. And this is only an eight-node ring, which is a starter ring for most people. Now compare a case where the thing you're indexing, the thing you're looking for, is gender, for example. Gender might have five values, five unique values associated with it, so when you say, hey, I want everybody who's female, you're bringing back on average 20% of the data, lots of pointers to rows. Okay, that's a little less inefficient than going to every node for a single row, so there it may make more sense. But under the covers, Cassandra is creating hidden index tables when you create a secondary index. So if it's doing something in a way that's only efficient some of the time, why don't you just take matters into your own hands and build index tables yourself? That's kind of the point, right?

A

Yeah. And you know, secondary indexes have been around for a while, right?

B

Exactly, and people ask all the time: should I be using secondary indexes? Why do they even exist?

A

I tell people this all the time, and I think it's a really decent guide, so take this away: secondary indexes are for convenience, not for speed. Okay.

B

And not every question is going to fall into the category where you can build your own index. Say you wanted to do a query where email address is LIKE '%@yahoo.com', so you want to pull out all your Yahoo addresses. You could model that if it were very important, but something DataStax Enterprise gives you is the ability to put Solr and Lucene indexes on top of your Cassandra data, so that would actually be a pretty good use case. Or when you need to search across all of the things in a table, you might want to look at using that.

A

We see that a lot in production. Using Lucene from a distributed standpoint makes a lot more sense than using secondary indexes in...

B

...that case. All right.

A

And there's...

B

...probably a presentation somewhere out there on that, I think.

A

There is, actually. So this is a long-winded explanation of why not indexes, but I think it's important, because it's a common pitfall when you come from relational to Cassandra and you see, hey, I can do a CREATE INDEX. It's not the same thing.

B

It's not, and it makes you feel warm and fuzzy, but it's going to come back and bite you. Yeah.

A

Look, this is just fair warning, and understanding it is important. Now you know the reason why; it's not just "no," now you know why. And new features are coming in 3.0 that are going to change this story even more, so this database is moving in the right direction.

A

So when I created this table... I hear a lot of people use the term materialized view for this: it's like creating a materialized view of your data, but it's also kind of like creating an index of your data.

B

Let me remind everybody that indexes are not free, and they are not cheap, in relational databases. Just because you can create one really quickly, it still needs to be managed, it still needs to be maintained, and it still takes up a lot of disk space.

A

One example: if I had a high-insert table in the Oracle world, a table being inserted into heavily, I would drop indexes to try to speed up the insert times, because you have to maintain those indexes in real time.

B

And in my world, when I was doing data warehousing, I think 75 percent of my storage was dedicated to indexes.

A

Right, that can take a lot of space. So yeah, there's no free lunch; it's about understanding the trade-offs. So I have this concept: I need to look up videos by user, and I created a table to go with it. I have this user videos table on the right, and what I want to point out is what we talked about earlier: this whole idea of denormalizing data, where normalizing was about less data duplication, one copy of your data for all to use.

With denormalizing, we're going back to the idea that we can have multiple copies of our data, and that trade-off is okay. There's still a storage cost, but you're not creating indexes either.
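The user videos table trades storage for a single-partition read. In the sketch below, an in-memory class stands in for that table: the partition is the user ID, rows are ordered newest first, and the video name and preview image location are copied in so the page needs no second lookup. The class and field names are hypothetical and only loosely follow the table on the slide.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from uuid import UUID, uuid4


@dataclass(frozen=True)
class UserVideo:
    userid: UUID                  # partition key
    added_date: datetime          # clustering column (newest first)
    videoid: UUID                 # clustering column (tie-breaker)
    name: str                     # duplicated from the videos table
    preview_image_location: str   # duplicated from the videos table


class UserVideosTable:
    """In-memory stand-in for a denormalized user videos table."""

    def __init__(self) -> None:
        self._partitions: dict[UUID, dict[tuple[datetime, UUID], UserVideo]] = {}

    def upsert(self, row: UserVideo) -> None:
        # Writing the same primary key twice overwrites, as a Cassandra INSERT does.
        part = self._partitions.setdefault(row.userid, {})
        part[(row.added_date, row.videoid)] = row

    def latest(self, userid: UUID, limit: int = 5) -> list[UserVideo]:
        # One partition holds everything the page needs: no join, no second lookup.
        # (Cassandra keeps rows sorted on disk; this sketch sorts at read time.)
        rows = self._partitions.get(userid, {}).values()
        return sorted(rows, key=lambda r: (r.added_date, r.videoid.int), reverse=True)[:limit]


if __name__ == "__main__":
    table = UserVideosTable()
    me = uuid4()
    table.upsert(UserVideo(me, datetime(2015, 8, 13), uuid4(), "Return of the Query", "/img/rotq.jpg"))
    for row in table.latest(me):
        print(row.name, row.preview_image_location)
```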

B

Well, you are creating an index; you just have more control of it. You get that control at the cost of it not being an easy button, but I think eventually we're going there: Cassandra will have sort of easy-button indexes. Easier-button indexes, anyway.

A

And you can understand why we do this: whenever I ask for user videos, I don't get a join, so I can't go get the name and the preview image location from the videos table. What I'm going to get in a single query on user videos is those two columns, and I need to return them. If you think about the web and the speed the web requires, or mobile, I'm going to do a single query so I can get my data back within milliseconds.

B

Milliseconds, yeah. Something I like to ask, though: do you know how long it takes to blink your eyes?

A

I think it's an eighth of a second or so.

B

About 200 milliseconds.

A

And we're talking milliseconds here, right. But...

B

Most of the clients I work with have a 20-millisecond SLA, right? That's a tenth of the blink of an eye.

A

You get ten of those in one blink? Yes, that's great, yeah.

B

But that's the speed you're required to deliver.

C

And if anything slows...

B

...it down, right? If your app isn't responding, and we've all done this, you say, forget this, and you go elsewhere on the Internet.

A

I've built these data models before, where the 95th-percentile SLA was 10 to 20 milliseconds, and a single query is the only way to do that. So we have some considerations when duplicating data, and here are some of the main points. The trade-off, of course, is that you need to maintain it, and that is an application problem: you need to maintain two sets of data, in two tables.

A

There are libraries and drivers that I love that help you do that. For instance, cqlengine, which is now part of the DataStax Python driver, has some nice features for maintaining two tables at the same time from a single piece of data. But this is on the developer: the developer has to be in charge of this particular problem, and again, it comes down to what you're trying to accomplish with your data in your application.
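Keeping the copies in step is application code. The sketch below is a hypothetical repository class (not the cqlengine API) that writes a video to both a by-ID table and a by-user table in one call and updates both on rename. Against a real cluster, the two statements would typically be sent together in a logged batch (BEGIN BATCH ... APPLY BATCH in CQL), which guarantees that both writes are eventually applied but does not isolate them from concurrent readers.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from datetime import datetime
from uuid import UUID, uuid4


@dataclass(frozen=True)
class Video:
    videoid: UUID
    userid: UUID
    name: str
    preview_image_location: str
    added_date: datetime


class VideoStore:
    """Hypothetical repository that keeps two query tables in step."""

    def __init__(self) -> None:
        self.videos: dict[UUID, Video] = {}                   # answers "video by ID"
        self.user_videos: dict[UUID, dict[UUID, Video]] = {}  # answers "videos by user"

    def add_video(self, video: Video) -> None:
        # One logical write, two physical writes. Against a real cluster these would
        # usually be sent together in a logged batch.
        self.videos[video.videoid] = video
        self.user_videos.setdefault(video.userid, {})[video.videoid] = video

    def rename_video(self, videoid: UUID, new_name: str) -> None:
        # Every denormalized copy must change too; forgetting one is the classic bug.
        updated = replace(self.videos[videoid], name=new_name)
        self.videos[videoid] = updated
        self.user_videos[updated.userid][videoid] = updated


if __name__ == "__main__":
    store = VideoStore()
    vid, uid = uuid4(), uuid4()
    store.add_video(Video(vid, uid, "draft title", "/img/preview.jpg", datetime(2015, 8, 13)))
    store.rename_video(vid, "Return of the Query")
    print(store.user_videos[uid][vid].name)
```

The design choice is deliberate: the write path gets more complex so that each read path stays a single-partition lookup.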

B

So the developer needs to take a more active role in understanding how Cassandra actually works. Oh, great.

A

Now, we've heard people say that you don't need to understand Cassandra internals to use it well. This is the internals discussion. It's not the full-bore DBA internals, but this is how it works under the hood, and you need...

B

...as a developer, to understand this stuff. Mm-hmm.

A

And look at the successful people that are using Cassandra: they understand this, and they work with it. Yeah.

B

And it's not that it's difficult; it's just different.

A

Just a different way to do it. Yeah. Well, I also want to talk about another issue. Last week we learned how the primary key works: Cassandra hashes the partition key, and that partition goes to a single node and its replicas. So let's look at this data model, where we have the latest videos bucketed by day.

That means the partition key is a date. It's just a string, but it's year-month-day, and that means each partition holds a single day, right?
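With the upload day as the whole partition key, every upload on a given day lands in the same partition and therefore on the same replicas. The helper names below are hypothetical, and the YYYYMMDD string format is an assumption based on the year-month-day description above.

```python
from collections import Counter
from datetime import datetime, timedelta


def day_partition(added: datetime) -> str:
    """Partition key for a latest-videos table: the upload day as YYYYMMDD (assumed format)."""
    return added.strftime("%Y%m%d")


def partition_sizes(upload_times) -> Counter:
    """How many rows land in each day partition."""
    return Counter(day_partition(t) for t in upload_times)


if __name__ == "__main__":
    start = datetime(2015, 8, 13)
    busy_day = [start + timedelta(seconds=7 * i) for i in range(10_000)]
    print(partition_sizes(busy_day))
```

However busy the day gets, the counter has a single key: all ten thousand uploads in this example go to one partition.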

B

So that means every partition will be for a particular day: all...

A

...the videos that have...

B

...been added on that particular day. That's what we're saying, right?

A

Yes. And if we're at YouTube scale, that could get pretty crazy, right? And what does that mean? It could create a hotspot. Right, okay, so let's...

B

To give an example of when a whole bunch of videos might be uploaded on a single day: the opening of the Olympics, for example, or...

A

...when people take pictures and upload those, or when particular cat videos really blow up.

B

Right, yeah, like Grumpy Cat [inaudible]. YouTube is full of cats, so it's just got to...

A

...go crazy. But we don't know when, and that's the problem.

B

You never know when the cat... you never know what a Kardashian is going to do. Exceptions are...

A

That's right. Videos are going to be uploaded, and we have to think about this as a scaling problem. So this is where developers and DBAs now have to work together. Hey, well...

B

Wait, what? Oh no, they're going to work together? Dogs...

A

...and cats, I get ya. So yes: as a DBA, my job was really to keep developers from pushing code into production. "You're not going to put that in my system." It was more after the fact. There was some consultative thing going on, but...

B

You were the DBA in the corner, saying no.

A

I was the DBA in the corner, but I was also a developer; I had to play both roles. And so this is an important thing. Let's look at how we can avoid the hotspot.

A

In that data model, I'm going to add something: this arbitrary thing, a bucket number. This is one way to do it. Maybe we create an integer with ten possible values, right, so we can spread that data out.

B

Bucket numbers. They're artificial, I think, is the point here.

A

You can do the bucketing... well, use seconds, sometimes, yeah, right. There's ten, yeah; I take the tens of seconds.

B

Just some sort of bucket, some sort of way of making your rows narrower, yeah.
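One way to read the bucket idea, following the "use seconds" suggestion: add a small integer to the partition key so a single day is split across ten partitions. The names and the seconds-modulo-ten rule below are hypothetical choices for illustration. The cost shows up on the read side: fetching a day's latest videos now means querying every bucket and merging the results.

```python
from __future__ import annotations

import heapq
import itertools
from datetime import datetime, timedelta

BUCKETS = 10  # hypothetical: "an integer with ten possible values"


def partition_key(added: datetime) -> tuple[str, int]:
    """(day, bucket): the day alone would put a whole day in one partition."""
    return added.strftime("%Y%m%d"), added.second % BUCKETS  # one possible bucketing rule


def insert(table: dict, added: datetime, videoid: str) -> None:
    rows = table.setdefault(partition_key(added), [])
    rows.append((added, videoid))
    rows.sort(reverse=True)  # keep each partition newest-first, like a DESC clustering order


def latest_for_day(table: dict, day: str, limit: int) -> list:
    """A read for one day must now visit every bucket and merge the results."""
    partitions = [table.get((day, b), []) for b in range(BUCKETS)]
    merged = heapq.merge(*partitions, key=lambda row: row[0], reverse=True)
    return list(itertools.islice(merged, limit))


if __name__ == "__main__":
    table: dict = {}
    start = datetime(2015, 8, 13, 12, 0, 0)
    for s in range(30):
        insert(table, start + timedelta(seconds=s), f"v{s}")
    print(len(table), latest_for_day(table, "20150813", 3))
```

Choosing the bucket count is the design knob: more buckets spread writes further, but every read of a day fans out to that many partitions.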

A

B

It seems like light, wine rose, we're still not saying don't do wide routers with large.

A

Partitions, yes, I know you're back in the thrifty yeah.

C

But large partitions and.

A

So the bucket number y bucket, what are they going to do twisted it? So.

C

I have an example.

A

B

Our big cluster that.

A

Was a node cluster, it's the big one right and, if I'm just using your Monday, this is just a drill. It's going to hammer the hell out of that one note and what will eventually I mean for all day. So here's the Kardashians a little kid.

B

Or today, yes, today, is just that. That note is going to get.

A

B

Today, today,.

A

All the rest of we're gonna sit around watch as the thing goes up in.

B

A

B

A

They can Apple for making that happen for me. So what happens when we now add the bucket number well, this is I think you can see. Now you get.

C

A

A plan that spreads that load out, which is what we wanted from the partition key in the first place. If you use a different partition key, it will do that, and if we add a bucket number, it guarantees that the writes are split across several partition keys instead of one. That gives you a good spread across your nodes. You don't run into hotspots.
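A minimal Python sketch of the bucketing idea. It assumes a hypothetical events table whose CQL partition key would be something like `PRIMARY KEY ((day, bucket), event_time)`, and it assumes the bucket comes from the event's seconds field modulo 10; hashing an ID into ten buckets would work the same way. It only counts how many writes land in each partition, with and without the bucket.

```python
from collections import Counter
from datetime import datetime, timedelta

NUM_BUCKETS = 10  # assumption: the "integer of ten things" from the talk


def partition_key(event_time: datetime, bucketed: bool) -> tuple:
    """Partition key for a hypothetical events table keyed by day."""
    day = event_time.strftime("%Y-%m-%d")
    if not bucketed:
        return (day,)
    # Assumed scheme: derive the bucket from the seconds field.
    return (day, event_time.second % NUM_BUCKETS)


def writes_per_partition(events, bucketed: bool) -> Counter:
    """Count how many writes each partition key receives."""
    return Counter(partition_key(t, bucketed) for t in events)


# One hour of writes, one per second, all on the same Monday.
monday = datetime(2015, 8, 10)
events = [monday + timedelta(seconds=i) for i in range(3600)]

hot = writes_per_partition(events, bucketed=False)
spread = writes_per_partition(events, bucketed=True)
print(len(hot), max(hot.values()))        # 1 3600: one partition takes every write
print(len(spread), max(spread.values()))  # 10 360: ten partitions share the load
```

The cost of this design is on the read side: fetching a whole day now means querying all ten buckets and merging the results.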

B

Better. Okay, I'll take that, and

A

I don't have a flaming thing for this, because there's no need. Now, sequences: right after indexes, the other thing I hear people ask for all the time is sequences, yeah.

C

A

This is an Oracle sequence: I created a sequence with INCREMENT BY 1, START WITH 1, and a CACHE of 10. And what do I use this for? I could say user_sequence.NEXTVAL, and I'm guaranteed the very next value, in order and non-duplicate, and

B

Let's see how that works when you have more than one node.

A

B

It's a very hard problem to define, right.

A

This assumes that you will be on one node. And if you've ever had your sequences get completely locked up in the Oracle world, I feel your pain. It's really a mess whenever the sequences get whacked somehow or the cache gets messed up, and it happens; I've had that happen a few times. You have to deal with this problem, and this doesn't work in distributed systems. So when you ask for a sequence, I know what you're thinking: how do I get a unique number, right?

B

Because you always want to handle uniqueness in your application.

A

So we have a user table. Let's go back to the user table: it uses a UUID instead of an ID, right? What

B

Is that? What is it?

C

What the heck is a UUID?

A

I get that question all the time. This is what we're going to use to identify the unique user. UUID stands for universally unique identifier, and

C

A

If you're in Microsoft land, it's a GUID.

C

A

It's a standard, so I can generate one anywhere; every library does it. It's a 128-bit number, which means there are a lot of them, and the guarantee of uniqueness is pretty solid. I'm sure there's a collision somewhere in the universe at some point in time, but

C

A

C

A

It's guaranteeing that you're going to have a unique number generated on the client. So it's from the client's point of view, but

B

That's interesting, because people always ask me: well then, how do we get it? How do I query on a UUID, right?

A

B

Like, I don't just have those things

A

Memorized. You don't, and so in this case I'm not going

C

A

To use the UUID to do a search. Instead, whenever I create a user, the client

C

A

Creates the UUID, instead of querying

C

A

And saying, can you give me the next value? In a large-scale system, that's always going to mean either you're locking or you're playing dice. You're just going to spin

C

The wheel. Yeah, and

A

With the UUID generated on the client, you just insert. It's much faster that way.
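To make the contrast concrete, here is a small Python sketch. `NodeLocalSequence` is a hypothetical stand-in for a counter that each node keeps without coordination, which is roughly what a sequence becomes once there is no central lock; the client-side alternative uses the standard library's `uuid.uuid4()`, a random 128-bit UUID. (Cassandra also has a `timeuuid` type for time-based version 1 UUIDs.)

```python
import uuid


class NodeLocalSequence:
    """Hypothetical uncoordinated per-node counter (a sequence without a central lock)."""

    def __init__(self) -> None:
        self.value = 0

    def nextval(self) -> int:
        self.value += 1
        return self.value


node_a, node_b = NodeLocalSequence(), NodeLocalSequence()
# Two nodes each hand out "the next value" and collide immediately.
print(node_a.nextval(), node_b.nextval())  # 1 1


def new_user_id() -> uuid.UUID:
    """Generated on the client: no round trip, no lock, no shared counter."""
    return uuid.uuid4()


user_ids = [new_user_id() for _ in range(10_000)]
print(user_ids[0].version)  # 4 (random UUID)
print(len(set(user_ids)))   # 10000: no duplicates in practice
```

Avoiding the duplicate in the first half would require every node to coordinate on each call, which is exactly the locking the UUID approach sidesteps.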

B

Then, how do you get it back? That's.

A

Why we created the index, if you recall. Oh yes, whenever we have the user credential

B

A

B

A

Was email, yeah. Well, we created an index table where we have the email address as the primary key, and one of the columns is the user ID. So we could do a search on email, which

B

Is much more reasonable. It

A

Means you insert into two tables when you create a user, but you get a fast lookup for that, you get a guaranteed unique number, and you don't have to worry about locking, because locking will kill you. We

B

Talked about that last week: race conditions if you're trying to insert things by email address, right?
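A minimal in-memory Python sketch of the two-table pattern. The dictionaries are hypothetical stand-ins for a `users` table keyed by `user_id` and a `users_by_email` index table keyed by email. `dict.setdefault` plays the role of insert-if-not-exists so that only the first writer claims an email; in Cassandra that check would need a lightweight transaction (`INSERT ... IF NOT EXISTS`), which is expensive, and without it two clients racing on the same email could both succeed.

```python
import uuid
from typing import Dict, Optional

# Hypothetical stand-ins for two tables:
#   users:          primary key user_id
#   users_by_email: primary key email, holds user_id
users: Dict[uuid.UUID, dict] = {}
users_by_email: Dict[str, uuid.UUID] = {}


def create_user(email: str, name: str) -> Optional[uuid.UUID]:
    """Insert into both tables; return the new id, or None if the email is taken."""
    user_id = uuid.uuid4()  # generated client-side, no sequence
    # Insert-if-not-exists on the index table: only the first writer wins.
    if users_by_email.setdefault(email, user_id) != user_id:
        return None
    users[user_id] = {"email": email, "name": name}
    return user_id


def find_user_by_email(email: str) -> Optional[dict]:
    """Fast lookup: email -> user_id -> user row (two key lookups, no scan)."""
    user_id = users_by_email.get(email)
    return None if user_id is None else users[user_id]


ada = create_user("ada@example.com", "Ada")
print(find_user_by_email("ada@example.com"))       # {'email': 'ada@example.com', 'name': 'Ada'}
print(create_user("ada@example.com", "Imposter"))  # None: email already claimed
```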

A

It's not good, yeah. And you could use a lightweight transaction, right? But not me.

B

I complained: diamond-plated.

A

Very expensive. It's like the diamond iPhone: do you really need one? Topic number three, of all-time importance for users:

B

This is always fun, because, I think, one of our big clients gave you a how-many-page document of fear on this? 20?

A

B

A little color: the fear document.

A

B

Consistency. Oh yeah, the CAP theorem, and it's not so consistent, and what do we do? Is the whole world going to fall apart? Yes. The thing is, Cassandra is commonly known as eventually consistent, but I like to say tunably consistent, because you can be as consistent as you want to be. You just pay for it in terms of time. So why are we bringing it up here?

B

It's because it's part of your application design to determine how consistent you need each one of your reads and each one of your writes to be. You can set a default on the client to always use a certain consistency level, but keep in mind that on every query, you control how consistent you want it to be. So let's go through a little bit about what that means.

A

So what I have here is a table of different consistency levels; let's explain these with an example. And this,

C

A

Again, is where developers and, most likely, DBAs should confer. Yes, because the trade-offs are going to be in both camps, and

B

Probably, I would say, your developers will think they need everything really consistent, and you need to be able to say, no, you don't. And so there needs to be this

A

Is where some type

B

Of conversation.

A

B

A

Concept: adult supervision by DBAs. Thank you. I know.

C

You think you need.

A

This. What I love in my conversations is, like, oh, you cute little developers.

C

A

You need that? But here's what you really want. And so this is going to enable that conversation, right?

B

A

Here's looking bad as.

B

So let's talk a little bit about consistency. We've got our famous client and our nodes, and we've got replicas, so replication factor 3 here. These are concepts that were all discussed in the first part of this series, so if this does not make any sense to you, you might want to go back and watch it.

A

But we're going to glue the two concepts together. And this is a very small ring, if you notice the token range, yeah.

B

It's very small, yeah. Looks like a peach. All right, so we're going to do a write at consistency level ONE, and what that means is that we only need one of the replicas to acknowledge that it has the write. It

A

Will still write to all three.

B

But it only needs the acknowledgment from one; it doesn't have to gather them from all three. Oh look, one of them's got it, it goes back to the client, and we score. We have a good write. That's

A

All it really means. From the client's point of view, I just needed one. It doesn't mean I'm going to write to only one; I'm just saying I need the acknowledgment from one.
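A toy Python model of what consistency level ONE means for the client, assuming replication factor 3 and made-up per-replica latencies. The write is sent to every replica, but the client-visible latency is that of the `required`-th acknowledgment, and a replica that is down (`None`) simply never acknowledges. It deliberately ignores hints, timeouts, and coordinator details.

```python
from typing import List, Optional


def client_latency_ms(replica_latencies: List[Optional[float]], required: int) -> Optional[float]:
    """Latency the client sees, or None if too few replicas are up to satisfy `required`.

    The write goes to every replica; the client only waits for `required` acks.
    A replica that is down is represented as None and never acknowledges.
    """
    acks = sorted(lat for lat in replica_latencies if lat is not None)
    if len(acks) < required:
        return None  # unavailable at this consistency level
    return acks[required - 1]


rf3 = [12.0, 3.0, 40.0]  # hypothetical per-replica latencies (ms), RF = 3
print(client_latency_ms(rf3, required=1))                 # 3.0: fastest replica wins
print(client_latency_ms(rf3, required=3))                 # 40.0: ALL waits for the slowest
print(client_latency_ms([None, None, 40.0], required=1))  # 40.0: two replicas down, write still succeeds
```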

B

Yeah, an ack from one of the nodes. So

A

Think of what kind of data would be like this. I usually use a common kind of concept: log data. If I'm writing log data, I just want to make sure it's in there. I don't really need consistency to be very high, so I'm just throwing it at the database as fast as possible. Consistency in this case is not my concern; my concern is just that it gets written to the database. You

B

A

B

It'll eventually get there.

A

Right. And there's a great talk on this, by the way, by Christos Kalantzis.

B

Yeah, look

A

Him up. Awesome talk about this. Yes,

B

"Eventual consistency does not equal hopeful consistency," right? It's a really good watch, 50 minutes, I think. That's all it is. It's

A

B

And there have been some really good studies on all

A

This. So now, why this trade-off? Why do we need this? In this situation I'm back to writing at ONE again. Oh,

B

My nodes are down, maybe because I had hotspots in there and they're up in flames, or

A

Some idiot unplugged them in the rack again.

B

Toddlers and chickens running around the data center. So

A

We have this problem we have to avoid, and this is where the uptime issue gets addressed. With a consistency level of ONE, the trade-off is

C

A

Going to be fast: I only require one. I'm

B

Always going to be available, always.

A

I mean, as long as there's one replica available, you're

C

Going to write.

A

To it and it'll acknowledge it, and your client is keenly unaware. That is exactly what your users are hoping for, but.

B

Behind the scenes, Cassandra has mechanisms in place to make sure that those nodes that have the X's on them right now eventually get this data, right.

A

So this is beyond this conversation, but there's a concept of hints, and that's where the

B

The repairs and yeah.

A

B

A

It goes back to the Dynamo paper, but this is managed in the background for you as a service, and as

B

You see it's a service and.

A

As this stuff is getting written to one server, you know that it will eventually get put onto the rest.

B

That's where the eventual.

A

Consistency comes from. If

B

A

Are not online? No.

B

And they, but eventually they will get the data. And believe us, and maybe we'll do a talk on this in the future: there are many, many different release valves along the way to make sure the data gets there. It's not just one. I'm

C

B

Right, there are three or four of them, and the last one is a completely anti-entropy thing, where it's

C

B

The data is going to be

C

B

In lockstep at the end. So that is what we mean by eventual, right?

A

So, a new trade-off: now we move up the level of consistency to QUORUM, and QUORUM

B

Which is, as people like to say, 51%. I like using the formula, which is

C

B

Replication factor divided by two, rounded down, plus one.
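The formula as a short Python sketch, with the division taken as integer division (rounding down), which is what makes replication factor 3 give a quorum of 2. The brute-force check also shows why reading and writing at QUORUM behaves consistently: two quorums together exceed the replica count, so any read quorum shares at least one replica with any write quorum.

```python
from itertools import combinations


def quorum(rf: int) -> int:
    """Replication factor divided by two (rounded down), plus one."""
    return rf // 2 + 1


def quorums_always_overlap(rf: int) -> bool:
    """Brute force: does every write quorum share a replica with every read quorum?"""
    q = quorum(rf)
    sets = [set(c) for c in combinations(range(rf), q)]
    return all(w & r for w in sets for r in sets)


for rf in (1, 2, 3, 5):
    # replication factor, quorum size, replicas that can be down, overlap?
    print(rf, quorum(rf), rf - quorum(rf), quorums_always_overlap(rf))
# 1 1 0 True
# 2 2 0 True
# 3 2 1 True
# 5 3 2 True
```

Note the RF 2 row: a quorum of 2 out of 2 tolerates no failures, which is one reason replication factor 3 is such a common choice.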

A

B

A

You're the math major, yeah.

B

I like the formula.

A

I like 51%. Um, but what does that mean? The

B

Trade-off, right?

A

I know, we could argue about it, but the trade-off here is that if I have 51%, I'm getting really good, strong consistency, because 51% of three replicas means at least

C

A

Or by your formula,

B

A

At least two. Still the same answer: at least two of those replicas have to say, I got this. So when I go to write that data out, I say I want QUORUM, and one of them dies, or

C

A

Absolutely cool, because we have two nodes that can respond back that they've got this, and your client is again keenly unaware. But you're getting good, strong consistency in your data, and

B

If you read and write everything at QUORUM, you're basically reading and writing everything as mostly consistent. Mostly

A

Consistent. And this is a really worthwhile thing to mention too: we're talking about writes in this case, but this applies to reads and writes, and they're completely independent per call in the client, and

B

The clients have the ability to downgrade your consistency settings. So if you start at QUORUM and it can't fulfill QUORUM because the world has ended, it will retry at ONE. You can set that; that's completely in your control. Absolutely.
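A Python sketch of the downgrade idea. `Unavailable`, `execute`, and `execute_with_downgrade` are hypothetical stand-ins, not a driver API; the DataStax drivers of this era shipped a `DowngradingConsistencyRetryPolicy` built on the same idea. The trade-off shows up in the return value: when the retry succeeds at ONE, the operation was acknowledged by fewer replicas than originally requested.

```python
from typing import Sequence


class Unavailable(Exception):
    """Hypothetical stand-in for 'not enough live replicas for this level'."""


def required_replicas(level: str, rf: int) -> int:
    return {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}[level]


def execute(live_replicas: int, rf: int, level: str) -> str:
    """Pretend to run a query; fail fast if the level cannot be met."""
    needed = required_replicas(level, rf)
    if live_replicas < needed:
        raise Unavailable(f"{level} needs {needed} replicas, {live_replicas} alive")
    return level


def execute_with_downgrade(live_replicas: int, rf: int,
                           levels: Sequence[str] = ("QUORUM", "ONE")) -> str:
    """Try each level in order and return the one that succeeded."""
    last_error = None
    for level in levels:
        try:
            return execute(live_replicas, rf, level)
        except Unavailable as err:
            last_error = err
    if last_error is None:
        raise ValueError("no consistency levels given")
    raise last_error


print(execute_with_downgrade(live_replicas=3, rf=3))  # QUORUM
print(execute_with_downgrade(live_replicas=1, rf=3))  # ONE: downgraded, weaker guarantee
```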

A

B

A

This is one of those fundamentals: understanding the relationships.

B

A

Not anything to be scared of, but this is a very fundamental thing: understanding the relationship between your replication factor and your consistency level. Again, developers and DBAs working together. Yeah, don't

B

Cross the streams.

A

Don't cross the streams. You know what happens: Stay Puft Marshmallow Man. The drivers: now we're going to shift gears into actually writing code. I

B

Just segued into the drivers. Like, the drivers have these options here.

A

Wow, seven options. Very good.

B

Well done. All right, set up a transition. There's

A

A very good transition. So when we go to write our application, the drivers are a critical part of this. Obviously, they're our API, how we get hold of Cassandra from our application. So what do you have to choose from? Just a quick tour here of drivers: at DataStax, we've devoted ourselves to trying to create a very comprehensive and consistent package

C

Of drivers. Not

A

Eventually, underrepresented.

C

A

Um, where, you know, we take some of the more popular languages, and things change all the time, right? I'm sure Rust will show its face here pretty soon. But Java, C#, Python, Node.js, Ruby, C++, and PHP are all supported

C

A

From DataStax, and in the community we have drivers for Clojure, Go, Erlang, and there are others that I see popping up from time to time. But these drivers, what are they going to do for you, and why would I say they're consistent? The experience is consistent. Here's an example: this is a slice of Java code.

B

A

It's good, but what I want to point out here is that whenever we create the cluster, conceptually, from a programmer standpoint, it should be: I want to connect to a cluster. So I connect to just a contact point. Now, in this case it's localhost, because I'm doing it on my own

C

A

But I don't want to have to think about managing the cluster inside my code. If servers are coming and going, all that stuff is happening in the background. The

B

Servers kind of die, though.

A

They do, and you should not have to A, rewrite your application code, or B, deploy a new configuration file. And man, that was one of the hard things to manage in production environments: configuration files, making sure they're all correct. What I have covered here is part of the consistent driver experience that we have. We have a way to connect to the cluster, either by adding some seeds or by DNS name, and then we can specify retry policies, like what happens when bad things happen.

A

How do we manage that? And then load balancing: how do we make sure that requests are being spread out among all the replicas very nicely? For instance, in this case I create a token-aware policy, so I'm actually looking at the partition key and not going to just any coordinator; I'm going straight to where the replicas are, and I'm keeping it in the same data center. It's not crossing the data center WAN, yeah.
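To illustrate what a token-aware, local-data-center policy decides, here is a small Python model. The six-node ring, its token values, the 160-token space, the two-replicas-per-DC setting, and the MD5-based token function are all hypothetical; Cassandra's default partitioner uses Murmur3 over a far larger token range, and real replica placement also considers racks. In the actual drivers none of this is written by hand: it is configured, for example by wrapping a `DCAwareRoundRobinPolicy` in a `TokenAwarePolicy`.

```python
import bisect
import hashlib
from typing import List

# Hypothetical ring of (token, node, data center), sorted by token.
RING = [
    (10, "n1", "dc1"), (35, "n2", "dc2"), (60, "n3", "dc1"),
    (85, "n4", "dc2"), (110, "n5", "dc1"), (140, "n6", "dc2"),
]
TOKENS = [token for token, _, _ in RING]
TOKEN_SPACE = 160  # hypothetical; real Murmur3 tokens span -2**63 .. 2**63 - 1
RF_PER_DC = 2      # hypothetical replication factor in each data center


def token_for(partition_key: str) -> int:
    """Stand-in partitioner: MD5 instead of Murmur3, folded into TOKEN_SPACE."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % TOKEN_SPACE


def local_replicas(partition_key: str, local_dc: str) -> List[str]:
    """Walk clockwise from the key's token, keeping the first RF_PER_DC local nodes."""
    start = bisect.bisect_left(TOKENS, token_for(partition_key))
    picked: List[str] = []
    for step in range(len(RING)):
        _, node, dc = RING[(start + step) % len(RING)]
        if dc == local_dc:
            picked.append(node)
            if len(picked) == RF_PER_DC:
                break
    return picked


# A token-aware client sends the request straight to one of these replicas,
# and the DC-aware part keeps it from crossing the WAN to dc2.
print(local_replicas("user:42", "dc1"))
```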

B

A

Very fast. So I'm showing you this in Java, but guess what: same experience in something like Python. So it's

B

Pretty nice, it.

A

Is very nice, you.

B

Get the same stuff uh-huh.

A

I mean, the two languages I use the most are Java and Python, and when I use the Cassandra driver, I don't have to rethink how I connect and do things with the cluster. It's different syntax, but I still get these three things, and that's the consistency, and you should understand how this works. That's about all I'm going to get into. I mean, we have plenty of talks about writing application code, but I

B

Want you to imagine right.

A

Language-specific talks, tons of information on this, but I just want to keep this on point: when you're doing it, this is where, again, you can work with your DBAs to make sure that the contact points and load balancing policies are correct for what you're deploying. So, finish up here, yeah.

B

Well, I mean, if you are listening to this talk, chances are that you have experience with Oracle, and probably Oracle has paid for your car. Oh, and

C

Your friends having.

B

Those skills in various places are your livelihood. There is some fear that happens when you say, okay, if I adopt a new technology, am I going to

C

B

Going to happen? Like, what about my job? Am I going to lose my job because we're going to bring in this new thing? Was that your experience, though?

A

Not at all, and I think I'm living proof of that. But we have, you know, some roles here that I could easily define just by looking at the separated roles from when you're an Oracle developer or an Oracle DBA. These are the things that were separated, but what I want to show you now is that things are changing right now.

B

A

We now have a bit of a Venn diagram overlap, where the DBAs and developers do need to work together, but they still have their own separate roles. Yes,

B

There's definitely more overlap here.

A

C

A

Driver setup, like I just mentioned: picking the right consistency levels and things like that. Capacity planning used to be all DBA, yeah, but developers have a hand in that, because they're going to be creating the tables that drive capacity, right? So they have to be a part of that capacity plan. So it looks bright.

B

Yeah, well, let's see why. When Patrick and I were on the plane yesterday, using in-flight Wi-Fi at 30,000 feet, we were wondering how bright the future is. What is the job market like? What does the job situation look like for people who are using Cassandra? It looks

A

B

And it wasn't bad at all. This

A

Is the percent growth in jobs on Indeed.com, taken, as Rachel said, as of yesterday at 30,000 feet. And why are we showing this to you? Um, well, if you're listening to this webinar, you're probably a professional in the field, and I've

A

Had this question brought to me sometimes, not as much lately but in the past: am I risking my career, or making a potentially career-ending choice, by choosing to work with something like Cassandra? And I think this evidence shows pretty clearly: no. There's high demand for qualified Cassandra people out there, developers, DBAs, and

B

You know what, we're actually giving you the chance to become certified, so you can be that person on your team. Oops.

A

I mean that would Frizzle yeah.

B

A

We'll put this online. Before we get to that, we're going to make you smarter. If you want to actually do this, this is all living online: go to KillrVideo.com. Luke Tillman is the guy who wrote the C# version of this, and it's coming in different versions soon. This is on GitHub, so just go nuts. The application is really cool. All the schema is there, you get a top-down view, and there are even some great demonstrations of how it works if you're an application developer.

A

This is one to look at for sure, because if you're like me, you like examples. All right,

C

A

I know, that's right. I didn't warn you that I had 40 slides, not 39. Now

C

B

No lies. I want to say that we are doing our first certification, partnering with O'Reilly, at the Cassandra Summit this year, so you can get certified and become the badass in your company, the first person

C

B

To be certified, to bring this type of technology in.

A

An O'Reilly certification, that's what

B

Did that totally check that.

A

Is legit, yeah. And think about this as your next big career move; it's really going to enhance your career. I get recruiter emails all the time: I need Cassandra people. Think about it in those terms, because taking this will make a difference. So these three webinars should give you enough information to learn more. They should make you somewhat dangerous, but go to DataStax Academy, academy.datastax.com, and learn as much as you can. Plenty of courses, plenty

C

A

All free. Learn, learn, learn, learn, show up at Summit, get your certification.

B

The training the day before is our certification training.

A

Yep, and you get 25%

B

Off. Okay, if you're like me and won't actually watch online videos, and you need somebody to stare you in the face, that's a bonus. Well,

A

It's comprehensive, yes, so I think.

C

A

At the end here so yeah.

C

A

Got five minutes. Kevin in the sky, do we have questions? Good.

C

Hi, yes, we do have questions. First question:

C

If servers or Cassandra nodes come back after a long period, like six to eight hours, how do the replicas work?

A

Six to eight, or sixty-eight?

C

Six to eight.

B

It can be 68. Good, yeah, I think I've had servers down for 68.

A

B

A

Okay, you want to take this, or shall I? All right. Through the hints that we talked about. And again, this is a huge topic, but I will try to nutshell it as best as possible: when a node is down, other nodes will store the changes for it.

A

These are called hints, and the hints are stored locally on those nodes until that node comes back online. Now, there's a configuration value that lets you choose how long those nodes will keep storing hints. After that, it doesn't act like a FIFO buffer; it just won't collect hints anymore. And the default is three hours, and

B

And you can change that based on how long your outages last, how long it takes you to replace hardware, or how much disk space you're going to have on the thing, because, I mean, if you collect hints forever, it

A

B

Costs you. It's going to be a lot. Yeah.
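A toy Python model of the hint window. The three-hour value matches the default described in the answer (in the `cassandra.yaml` of this era the setting is `max_hint_window_in_ms: 10800000`); the timeline of missed writes is made up. It shows which missed writes get hinted for a replica that is down for six hours, the case in the question, and which ones are left for repair.

```python
from datetime import timedelta
from typing import List, Tuple

MAX_HINT_WINDOW = timedelta(hours=3)  # default: max_hint_window_in_ms: 10800000


def split_missed_writes(missed: List[timedelta]) -> Tuple[List[timedelta], List[timedelta]]:
    """Split writes a down replica missed (offsets since it went down) into
    (hinted, needs_repair). Once the replica has been down longer than the
    window, other nodes stop storing hints for it."""
    hinted = [t for t in missed if t <= MAX_HINT_WINDOW]
    needs_repair = [t for t in missed if t > MAX_HINT_WINDOW]
    return hinted, needs_repair


# Hypothetical: one write every 30 minutes during a six-hour outage.
missed = [timedelta(minutes=30 * i) for i in range(1, 13)]
hinted, needs_repair = split_missed_writes(missed)
print(len(hinted), len(needs_repair))  # 6 6
```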

A

So say it's outside the three-hour window: when the node comes back online, it will replay the three hours of hints it has, and for anything left, you just do a repair. You run repair. If it's beyond the default, which is 10 days of gc_grace_seconds, which is how long tombstones

C

A

Live inside your system, because beyond that, you want to kill that node off, yeah.
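Putting the two thresholds together, a rough Python decision helper for the six-to-eight-hour question. The defaults are the ones given in the answer (three hours of hints; `gc_grace_seconds` of 864000, that is, 10 days); the function name and its return strings are hypothetical summaries, not Cassandra output.

```python
from datetime import timedelta

MAX_HINT_WINDOW = timedelta(hours=3)  # max_hint_window_in_ms default (10800000 ms)
GC_GRACE = timedelta(days=10)         # gc_grace_seconds default (864000 s)


def recovery_plan(downtime: timedelta) -> str:
    """Hypothetical summary of how a returning node catches up."""
    if downtime <= MAX_HINT_WINDOW:
        return "hint replay"
    if downtime < GC_GRACE:
        # Hints cover only the first three hours; repair fills in the rest.
        return "hint replay + repair"
    # Tombstones may already be purged on other replicas, so rejoining as-is
    # risks resurrecting deleted data: remove the node and bootstrap it fresh.
    return "removenode + re-bootstrap"


for hours in (2, 8, 68, 24 * 12):
    print(hours, recovery_plan(timedelta(hours=hours)))
# 2 hint replay
# 8 hint replay + repair
# 68 hint replay + repair
# 288 removenode + re-bootstrap
```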

B

A

B

Just kick it out of the ring.

A

Gently, yeah, take it out gently. You want to do a removenode and re-bootstrap it back in, because

B

Only when you plant come back yeah.

A

A