Apache Cassandra Cassandra Community Webinar Series, 29 Jul 2015

From YouTube: Webinar | Oracle to Cassandra Core Concepts Pt. 1

Description

Oracle is a database. So is Cassandra. And that’s about it as far as how they are similar. How they are different though… well, that’s when things get good. Have you tried to scale your Oracle database to handle 1M simultaneous users (without ruining your mental health and close personal relationships)? How about creating 100% uptime with active-active datacenters? You don’t even want to think about it, do you? Spend some time with us to learn how Cassandra can make you into the database rockstar you know you are.

A

So what are we talking about today? I think it's a tried-and-true topic, isn't it?

B

So you know, you've got some experience or something with this other database.

A

Yeah, a little bit, you're right. So we're going to talk about Oracle and how you can go from Oracle to Cassandra. This is probably the number one topic at a lot of places I go. I don't know about you, but there's just a lot of Oracle out there, right? They've been around for 30 years now, something like that, and it's an established database.

B

People are comfortable with it; they seem to use it a lot. I think you even...

A

For 15 years, yes, pretty actively. I made a lot of money doing it, I'm pretty happy to say, and I can't say anything bad about it. I mean, from a usability standpoint it gets pretty complex, but for the workloads that I had given it, it was a solid choice. There were a lot of choices out there. But why don't we just jump into why we make these choices? How about that? Sounds good.

A

So here's the problem, people: we like to collect data. And I think I say this a lot, Rachel: we're like data hoarders, right?

D

You talk about you and me going old.

B

I do like data. We...

A

Love our data, yeah. Look at my phone, it's full of it. I have to keep buying a bigger one. But the data problem, that we love to collect data, goes back to ancient Egypt. This is kind of setting the Wayback Machine; it was way before any databases, but we collected data. We collected data about crop yields and we collected data about financial transactions. This is a ledger.

A

This is probably one of your first real databases, right there, opened up and looking at you.

B

Is that like an ancient Egyptian ledger?

A

I believe this is from the 1800s, I think, somewhere in there anyway.

A

It's just: hey, I gave you money, I took money, and the balance. This is how you made sure that things got done right, and if you had two books, then you were going to get in trouble.

B

Flashback to accounting class, yeah.

A

It's pretty bad. And of course we collected things into tables. Train tables have been around for as long as trains, probably, and that was the tabular format, the columns and rows. It's interesting how we came upon that beforehand. Collecting data has always been a problem with humans. We are collectors, and we really started hitting this first data plateau. Now Rachel, I'm going to go through these data plateaus as a concept, but the plateau that I'm talking about is where we started to limit ourselves out. Now, do you know where this is?

A

You know what this is? It is busy. This is the reservation office in Little Rock, Arkansas, for American Airlines.

A

I know, it's like, well just write on the deer tongue. And this was an interesting place, because this was the only place you could get a reservation for an airline in the 1950s. Now, in the 1950s in the United States there was a lot of extra income, people started flying, and they were taking thousands and thousands of phone calls. Do you see, like, in the foreground? There's a guy here with a little oxygen delivered.

A

On the plane, they put a little paperclip next to the piece of paper with the seat assignment, and when you called in, they clipped the seat: okay, that seat's taken. As you can imagine, there are some scaling problems with this.

B

Like I said, I used to call in to confirm our seats on the plane way back.

A

Right, if you can imagine, then this was not the most efficient way of doing it, and so American Airlines was stuck. They had a huge problem, and luckily there was a company out there that had solutions. So the first data solution, which was really using a computer, is a system called Sabre, right? Sabre was built by IBM. It is still here today, folks: if you book an airline, you're probably going through Sabre. And it was amazing, because what they were able to do from a central call office was collect information from people and put them on a flight, and people got on the plane and they had a seat. It was all very organized and centralized, and really that was the first solution. It was awesome, and it started this whole industry.

A

So let's pause a second for homage. Okay, done. But what happened? Who did you call if you wanted a database, Rachel?

B

Big Blue. I've worked there, yeah.

A

Probably many people on the phone have. So IBM was the one who made the big database, and if you needed a database, you called Big Blue. Now, there were competitors, but let's face it, there's that old adage: no one ever got fired for buying IBM. And so that was the 1960s and 1970s; for 20 years they dominated the database industry, probably 30 years. But things do change. We hit this data plateau again. Here we are: this is actually a picture from 1977, I believe. IT workers of the world, unite.

A

You had to wear a white shirt with a black tie, that was like the rule, and your room-sized computer is ready. And the plateau was that not everyone could afford a room-sized computer. That's really interesting. So we have a new solution: now we have this microcomputer revolution that's kicking up, and here come all these new players.

A

Okay, one that stands out above the rest, though, is Oracle. They built a relational database that works on a variety of systems and more or less supplanted what IBM was doing pretty easily. This is where I come along. I started using Oracle in the 1990s, because I ended up working with a database that was relational, because I learned that in university. So it dominated from the '80s. Well, I couldn't say the '80s; the '90s is really when it started dominating. And I don't know how far back you go, Rachel, but I'm sure you were along there somewhere, yeah.

B

Somewhere around there and.

A

Of course, this happened, right? The world got all Internet-y. But anyway, the internet became a thing and the whole world became data-connected, and when you look at the scaling problems that could bring, it dwarfed anything you could put on a microcomputer. But we have this problem, right? I mean, what was your impression of the Internet at certain times?

B

Well, we sat and waited a lot. "Is it done yet? Is it downloaded yet?" I mean, I remember going off and, you know, you

A

Would do so many other things.

B

While waiting for something to download, yeah.

A

Exactly, and so we have this problem, right, which is we put too many people on there and we get this website-is-slow problem. And it wasn't just because it was a dial-up modem. It was because we were building bigger and bigger databases, but the database was the problem. This was what I dealt with a lot in the late '90s, when I worked at dot-coms and did a lot of consulting and was trying to make money being an Oracle consultant: my first answer was "buy a bigger machine." And I love this one.

A

This is a Sun Enterprise 4500, I think, or a 450. I loved that it had wheels. Portable, awesome. But you know, that was the thing: you got money from your VC and you bought a bigger machine. Awesome. 2005 is really, I put that in my mind as a moment when we had this problem, the thundering herds. And it was just because everybody and their brother was getting on the internet, and you never knew when things were going to come.

A

I worked in education, and the thing that we had to deal with was last-minute deadlines. Of course, everybody showed up at the last second, right? And what happens when we have thundering herds? This problem. Ugh, right? And this is getting pretty close to where we are now. Now, this is a big moment. Everyone knows where they were when this happened, right? You remember where you were, Rachel?

A

Who was the dude who got up on stage and said, "You're gonna buy a million or a billion or a trillion of these things"?

B

I totally said no at the time, as I do speak into my MacBook Air, yeah.

A

Right, I said no. I'm like, "Oh, that'll be the next, whatever, iPod." No! It's now dominating my life.

B

I don't know. So I'm not a very good predictor of hardware technology, no.

A

You shouldn't be on this webinar at all. But you come around, right? You come around. Yeah, exactly. So this iPhone thing: well, now every phone out there has a screen that looks somewhat like that, with an app on it that probably has a computer listening to it somewhere, and without a doubt there is zero chance you will not have a database involved. And that has created a problem, because now everyone expects it to be online all the time. You can have the next cool app that just takes off, and that thundering herd is going to stampede you. And look at the competitive space: you go on the iPhone App Store or you go on Google Play and you look for an app, and you can find three or four competitors instantly. That means that to be relevant, you have to have something that works. So slow is as good as dead, right?

A

So here we are. Welcome, everyone, gather around to the third data plateau we've hit.

A

So this is what we did, right? Because it was like, we always used Oracle. I used Oracle, and you would use Oracle.

B

Maybe a little MySQL in here and there, but sure.

A

SQL Server, sure. SQL Server was out there; I had a bit of that in my life, and some Postgres and Informix when I wanted to go old school.

C

A

The choice at that particular time was: okay, you want to be safe, go with Oracle. And everything that connected to the Internet needed to talk to something. Look at all these applications: ad and mobile applications, web applications, gaming, telemetry, stock markets. None of these really fit well anymore because of the problem that's inside: it's a single-server relational database. Now, we can build it out and do more with it. But is that really the case?

B

Well, it was designed to solve the problems of 30 years ago, not designed to solve the problems of now. And you know, Larry Ellison is a cool dude, but he couldn't see the future.

A

But wait a minute, it's called Oracle, so it's supposed to see the future. We'll just leave it.

A

Just kidding, I know. No, but that was actually a cool name, perhaps giving this a cool name. But

B

Wasn't Cassandra able to see the future? Okay, never mind.

A

Nobody believed her. So look at this potential. I will propose to you, the listening audience, a third database solution, and that is Apache Cassandra. It is born and bred around this new problem set. It was built to tackle this issue that we created for ourselves, which is: hey, let's just put a billion people on a particular application and make them happy.

D

What year was Cassandra started?

A

It was 2008, so that's

B

After big ole, hey got no blasts, mm-hmm.

C

B

Like they went, oh maybe this isn't going to work. Maybe we need a better solution, but.

A

The thing is, though, Cassandra came out in 2008, but that wasn't when it was conceived. There was a lot of thinking before then. It goes back to the Dynamo paper, which was 2007, and the Google Bigtable paper, which was 2006, and those things had a life inside of Amazon and Google before then. So these problems were being solved with computer science

A

Earlier at those companies that I highlighted, Google and Amazon. There they are, looking down the barrel of the gun of a billion users, and they are solving those problems quickly with computer science. And just like a centrifuge, those things are spinning out and flying out and finding their way into projects and open source, rather

B

Than trying to put together a sharded MySQL system with bubblegum and toothpicks. That

A

Seems really stable. Not. Well, I'm glad you mentioned that. So let's talk about what happens if we bring in the Oracle architecture here. I'm going to talk about this from a practitioner's standpoint; I built a lot of these systems in my day. So if you'll indulge me, Rachel.

D

Always do argue.

A

Because what I want to show you is how I built these systems, or at least some of the technology that you could bring to bear to scale up and make it work, right? So these should be familiar.

A

So here's my single Oracle instance, running whatever: 10g, 11g, whatever version. So we have an Oracle instance running on a single server. When I get to the point where I need to scale, I go up, right? So the next step is going to be more CPU, and that's always an easy one.

A

I can call my server vendor, Dell, HP, whoever, and I can say, "I need a bigger box." Awesome. And with more CPU I'm going to need more memory, and if I'm running more CPU and more memory and I'm running a lot, I probably will want to use the Database Resource Manager, DBRM. If you have a choice of not using DBRM, go with that; I've never been able to make it work exactly the way I wanted to.

A

But that's an option, and you have to manage, you know, 32 CPUs in a box. You have to manage that. OK, so now I have all the CPU and memory, and I'm going to add more disks, which probably means that I'm going to run out of slots on the server, so I'm going to put in a SAN. Awesome. But if I'm running a SAN, I'm probably going to use ASM, because I'm going to manage that storage properly, and managing storage on an Oracle server is really a dark art, and a good way to make money too.

A

So this is going to solve my scaling problems in a single server, great. And if I need to do more application scaling, which may always be the case, I can add in TimesTen. TimesTen, of course, is an in-memory database that works pretty well with the Oracle RDBMS. It's a package.

B

Is that your only choice for caching?

A

For caching? Well, Coherence is another product that Oracle sells, but it is not integrated as much, and it is a key-value store. Some people use Coherence more as a replacement for memcached; TimesTen is more integrated, so that's going to give you the speed you want in that space. When you start getting into real scale problems, you're going to need to add more servers, more of that, and there's only a certain size you can get.

A

I think I told you the story about how I used to buy 6U to 8U servers, and those are ridiculous. But you...

D

I think you need a forklift or something.

A

For those of you who are new to data centers, there's this little lifter thing, because humans can't lift these things up and not lose a finger. So you have to put this little forklift thing underneath this huge server and lift it up into the rack, and it's a huge operation. At that point you're kind of laughing to yourself: maybe that room-sized computer wasn't a bad idea, because at least it was already in the room. So what's the other way?

A

You do this by adding more servers instead of a bigger server, because you run out of space, which means you're probably going to use a SAN. Actually, you will use a SAN, and if you're using a SAN and multiple servers with Oracle, you're going to be in RAC territory. That's Real Application Clusters, and this is about the most viable way of running several servers. Now, you can do a sharded architecture as well, but RAC is how you get more fluid failover and things like that.

A

But to use RAC you're going to have to use Clusterware, Fast Application Notification in case things go bad, and definitely Cache Fusion, so that your data is up to date and working well and fast and everything else. It's pretty complex. I can tell you right now, I've never seen a RAC system work flawlessly or perfectly out of the box. I've always had interesting failover issues, but it's definitely a way to scale. Interesting. But

B

They're really cheap, right? Like, you know, anybody could go out and buy one, yeah.

A

Well, anybody can buy one if you have a huge budget. You have to pay for licenses, and you get a little SAN out of the deal, and that's going to cost you a lot of money. Plus, well, the thing that keeps it affordable: you can only go up to a hundred nodes with RAC. After that, forget about it.

B

What happens if you need more than 100 nodes?

A

Nothing. There is no bigger.

D

I smell a plateau coming there.

A

That would be one. So GoldenGate: another cool integrated product that you can use for active-active, somewhat active-active, transactions across multiple data centers. It has different modes where you could have a fan-out. But when you use GoldenGate, guess what you lose: ACID transactions.

B

So you're saying that if you go to the Golden Gate, you drop acid? What?

A

Yeah, it might actually help to make the docs a little more interesting if you've dropped acid, but you don't get ACID transactions, and that's a bummer, because a lot of people love those and they're gone.

A

Don't do that. So a little step down from that is using Data Guard. Data Guard is a way to manage your secondary databases, so you can have failover, which is nice in those cases, but there is a mean time between failure. You can go with active or standby; active costs more money, which means you have to pay for reading that data. Now, all of this data is now protected and scaling and everything, up to a certain point. What if you want to analyze it?

A

Yeah, that's right! We're going to do some ETL, so.

B

D

B

But what about other places, like Netezza or Vertica, or anything like that?

A

That was your previous employer. You tell me.

D

A

I know you're good at that. Sure, you can use any of those other data warehousing technologies, but you're going to be doing ETL. There's nothing in place.

B

OK, but what about in-line? Why, in a business where you've got all your data in that database, shouldn't you just be doing analysis there? I mean, I hate it. Just to be completely upfront, I spent years of my life writing ETL jobs to move data from source systems to data marts and data warehouses.

A

Right and it could be a pain right and I mean I'm, not a slide. I may not slide here. I can't I can't anymore, so this particular slide now I. There probably another way, and we I'm bet you're gonna tell me all about it too.

A

But if you look at this slide, I don't think this is an unreasonable layout. Granted, you might not use Data Guard and GoldenGate together, or you might, actually; they're useful for different scenarios. Or you may not use GoldenGate with RAC, for instance; maybe you will. And then, of course, the final solution is using Exadata, which kind of wraps it all up into a big box, and it's literally a forklift. I mean, you have to have a forklift to bring it in, it's that big. But that's Exadata.

A

But this is complex, right? This is a lot of stuff going on right now.

B

A

B

Goodness or what.

D

Yeah right I mean there's lots.

B

Of different options, that's why the problem this.

A

Is why my bill rate is $375 an hour for Oracle consulting: it's complex. And how do complex things fail? In complex ways. So it's the back

D

Of a publicly sale.

B

In simple ways too, you see, like, you know, somebody letting loose a toddler in

A

A data center, right. Yeah, and then there's just more billable hours for me. So when we simplify this...

D

B

If you go back a slide, I want to point out something: notice that Patrick put some nice little boxes around stuff and put some words like "uptime" and "scale" on them. We were trying to group stuff together to give you an idea of why you're going to be doing certain things. I think you can go ahead to the next slide.

B

We can simplify everything back down to a single server again, and now we're going to talk about a single server. The whole point of this webinar is to introduce people who are familiar with Oracle to the concepts behind Cassandra: what is similar in Cassandra, what's different. So, starting with a single server: Cassandra was designed from the get-go to be distributed. You can run Cassandra on a single server; most of us engineers here at DataStax have it running on our laptops.

B

I have it running on my little MacBook Air. But it is designed first and foremost to be distributed, and it's designed to be distributed for two reasons.

A

Well, one thing is because that's the only way you're going to have any guarantees of uptime. A single server

B

Will fail. And any guarantees of scale.

A

Good. That single server is going to fail. There's absolutely zero chance that you're going to have a single server last forever.

B

And how big does a single server have to be? Are we talking about more CPU, you know, DL380s or Superdomes or whatever? No, we're talking about commodity hardware, because Cassandra was built to be scaled out to provide continuous uptime and scale, but by using hardware that wasn't going to break the bank, or by using clouds,

B

Or Google Compute; all that stuff will work great. So one of these commodity

A

Hardware, yeah.

D

A

You're saying I will never have to use that forklift again?

B

Never say never, because, you know, the one thing about Cassandra: it can actually scale to more than 100 nodes. So

A

That's awesome. Okay, let's do that thing. So what do I need to do? Okay,.

B

All right take you off my ass.

B

One of those nodes is, you know, a minimum of about eight CPUs, 32 gigs of RAM, and about a terabyte of spinning disk, or maybe, you know, up to terabytes of SSD. So we're not talking about massive machines. Of course you can put in more, but for the most part that's a pretty typical Cassandra node. So, as we were talking about, Cassandra is designed from the get-go to be distributed. So now your database is distributed around a ring, and all those nodes in there are peers. There's no master node, there's no slave node.

D

There's no single.

B

B

Well, your next slide so.

C

Let's talk about how the data is distributed.

B

There's actually the ability to put a range of data on each node, out of 2 to the 127th. That's a really, really big number, and

B

Numbers that big are too big for my slides, so I'm actually just going to simplify it down to ten in this case. But realize that you can put more than eighty primary keys inside your database. With Cassandra you're cool there. That's how we can get more than 100 servers. Kidding, right?

D

B

Each node is responsible for a certain range or ranges of data. Next slide. And if you want to add more nodes to this cluster, if you need to add capacity, you need to scale, you don't have to just scale your individual servers, though you can. You can also just easily add more servers to your cluster.

B

In this case, we've added two new nodes. Next.

B

Those nodes are going to get pieces of data from the other nodes. Next. And the cluster will automatically reconfigure itself so that, instead of each node being in charge of ten tokens, it's going to be in charge of eight tokens, and all those nodes are going to contribute data to the new ones to bring them online. So now...
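The rebalancing described here, from ten tokens per node down to eight after two nodes join, can be illustrated with a small Python sketch. It is a toy model and not Cassandra's implementation: the 80-token ring, the evenly spaced node tokens, and the helper names `even_tokens`, `owner`, and `ownership` are all assumptions made for illustration. A real cluster uses a vastly larger token space and streams the affected ranges in the background.

```python
from bisect import bisect_left

RING_SIZE = 80  # toy token space 1..80 (assumption; real rings are far larger)


def even_tokens(node_count, ring_size):
    """Hypothetical helper: place node_count nodes at evenly spaced tokens."""
    return [ring_size * (i + 1) // node_count for i in range(node_count)]


def owner(node_tokens, token):
    """A token belongs to the first node at or after it, wrapping around the ring."""
    ring = sorted(node_tokens)
    return ring[bisect_left(ring, token) % len(ring)]


def ownership(node_tokens, ring_size):
    """Count how many tokens each node is responsible for."""
    counts = {node: 0 for node in node_tokens}
    for token in range(1, ring_size + 1):
        counts[owner(node_tokens, token)] += 1
    return counts


before = ownership(even_tokens(8, RING_SIZE), RING_SIZE)   # eight nodes
after = ownership(even_tokens(10, RING_SIZE), RING_SIZE)   # two nodes added
print(sorted(set(before.values())), sorted(set(after.values())))  # [10] [8]
```

In this sketch every node ends up with an equal share again after the two nodes join, which is the effect the slide walks through.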

A

So I'm going to break the flow in here, but you know, that's what I do. When I showed the RAC slide, that's a shared-everything configuration. How does that differ in the system here?

B

One of our favorite topics: SANs. So no, this is a shared-nothing architecture. It's exactly the opposite. Putting a SAN behind a Cassandra cluster is

B

Like the single best way to fail. I mean, that's a webinar all on its own, and if you're interested in more details there, please don't hesitate to hit me up or Patrick up on Twitter, or any of your local DataStax engineers.

B

But for the most part, just don't do that. So when I go to see clients, which I do a lot, you go talk to these big organizations and you tell them the type of hardware they need, and everybody gets excited, and they go talk to procurement, and the answer is like, "No.

B

We have to buy SANs, because SANs are best for databases." And then you hit your head on the table for a while, and you try to explain IOPS and you explain physics, and they're like, "We're sorry, this is all that we can buy." So if you're in that situation, don't despair, we can help you. There is a way out of the SAN nightmare. Yeah.

A

Well, the SAN nightmare is, I mean, it just adds complexity and cost. Shared-nothing really is the best way to distribute, because it makes it easier in the long run, yeah.

D

A

Afraid to well.

B

You just put together this database that is designed for no single point of failure, and then you put a SAN behind it, which creates a single point of failure.

A

If you don't have a single point of failure, we will provide you with one.

B

Just keep in mind that Cassandra is designed to be always on, never down. So as you add new nodes to the system, the streaming of data from the other nodes is actually a background process. Everything is a background process in Cassandra, because it's designed to never, ever have to be brought down. So you add a new node to the system, you upgrade your system:

B

All of that is never going to bring your system down. But what happens if one of those nodes fails? So say node 80, at the top there, decides to conk out on me. It would be really good if I had copies of that data someplace that's not on backup.

A

I advanced your slide, sorry, but anyway.

D

It's alright. You can manage.

A

That okay, good cuz, you're, professional and.

B

Then your special sliders answer is.

A

Not okay, anyway,.

D

B

D

Solution to this size.

B

Passing balls around yeah.

D

A

Webex people I know you're on the line fix this okay anyway,.

B

Okay, so the application comes down, and the application is going to write to the Cassandra ring. Now, as I mentioned earlier, all these nodes are peers, so any one of these nodes can act as the coordinator. It doesn't have to be just the one at the top. There's a driver that sits on the application that has a number of different policies for how to round-robin or retry any of those particular nodes. So
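As a rough illustration of what a round-robin policy with retry amounts to, here is a minimal Python sketch. It is not the driver's actual API: `pick_coordinator`, the node names, and the `is_up` callback are hypothetical names invented for this example.

```python
def pick_coordinator(nodes, is_up, start=0):
    """Try each peer once, beginning at position `start`, and return the first
    one that is reachable. Any peer can coordinate, so there is no special node."""
    for offset in range(len(nodes)):
        node = nodes[(start + offset) % len(nodes)]
        if is_up(node):
            return node
    raise RuntimeError("no node is reachable")


nodes = ["node10", "node20", "node30", "node40"]  # hypothetical peers
down = {"node20"}                                 # pretend one peer is down

# Successive requests start from successive positions (round robin);
# a request that lands on the down node moves on to the next peer.
chosen = [pick_coordinator(nodes, lambda n: n not in down, start=i) for i in range(4)]
print(chosen)  # ['node10', 'node30', 'node30', 'node40']
```

The point of the sketch is only that coordinator selection lives in the client and needs no master node.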

C

B

I'm just using an example of one here, but keep in mind that any of these nodes can and will be the coordinator. So the application writes a row of data. The primary key of that data, the partition key, is hashed and assigned a token value. Go ahead and go to the next slide. That token value can actually be written in any number of places, and you, as a database administrator, decide how many copies of that data you want around the ring. The most common number is three.

B

Mostly, that way, if one of your nodes goes down, you can be fixing whatever was wrong with the first one, lose a second one, and still have a third copy available to serve requests. So I like three. There's also a Monty Python reference somewhere in there, but three's a good number.
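The write path just described (hash the partition key to a token, then keep several copies around the ring) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Cassandra's code: the 80-token ring and the node tokens mirror the simplified slide, MD5 is only a stand-in hash, and `token_for` and `replicas` are hypothetical helper names. Placing copies on the next nodes clockwise is the simplest placement scheme.

```python
import hashlib
from bisect import bisect_left

RING_SIZE = 80                                  # toy token space (assumption)
NODE_TOKENS = [10, 20, 30, 40, 50, 60, 70, 80]  # eight peers, as on the slide


def token_for(partition_key):
    """Hash the partition key to a token in 1..RING_SIZE (MD5 as a stand-in)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE + 1


def replicas(node_tokens, token, replication_factor):
    """Return the node owning `token` plus the next nodes clockwise,
    until `replication_factor` distinct nodes have a copy."""
    ring = sorted(node_tokens)
    start = bisect_left(ring, token) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)] for i in range(count)]


print(replicas(NODE_TOKENS, 75, 3))  # [80, 10, 20]
row_token = token_for("user:42")     # hypothetical partition key
print(replicas(NODE_TOKENS, row_token, 3))
```

With three copies, the loss of node 80 in this toy ring still leaves the same row readable from nodes 10 and 20.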

C

A

And so I think, really, because it's been proven time and time again, and there's some math behind this, a replication factor of 3 is a good trade-off for space, because these are going to be replicated three times, and also for uptime. Another thing is how you can ensure your data is consistent at quorum: because you have three nodes, you're going to need 51%. That means two nodes online. So that's some of the reasoning behind it.
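The "51%" arithmetic can be written out as a short Python sketch. The function names here are made up for illustration; the formula is the usual majority rule, a quorum being more than half of the replicas.

```python
def quorum(replication_factor):
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def tolerable_failures(replication_factor):
    """Replicas that can be offline while a quorum is still reachable."""
    return replication_factor - quorum(replication_factor)


def reads_see_latest_write(write_acks, read_acks, replication_factor):
    """Read and write sets must overlap in at least one replica."""
    return write_acks + read_acks > replication_factor


print(quorum(3), tolerable_failures(3))                  # 2 1
print(reads_see_latest_write(quorum(3), quorum(3), 3))   # True
print(reads_see_latest_write(1, 1, 3))                   # False
```

So with three replicas, quorum reads and writes keep working with one replica down, while anything served after losing two replicas has to settle for a weaker consistency level.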

D

C

B

So now that the data is distributed to three nodes, we can lose two of the nodes and still maintain uptime. But what happens if our data center goes down? And data centers do go down. There are chicken cobblers being let loose every day and all data centers it's obscure in society, or there are natural disasters, or, you know, actual real reasons that a data center might go down.

A

Well, regions go down in AWS. This happens

A

In Amazon. I mean, I've had that happen to me personally. It's reality, right? It's going to happen.

B

Then there are times like what happened last November, when Amazon decided to reboot a random 10% of the nodes in AWS, right?

B

Nobody knew which nodes it was going to be; they were just going to pop out of existence. So what is everybody going to do?

D

Everybody, don't you might or might not have downtime? No, you can't. You can't put.

B

Up your app for PR, no.

A

No, no. I mean, Netflix is a great user. They talk a lot about these two concepts. They run active-active, of course, because they're on Amazon a hundred percent. They know that there's going to be downtime, so they run active-active, and they have a great webinar or discussion on that.

A

You can go look it up. But they also talked a lot about the individual nodes, right? Because when the AWS reboot happened (thank you, Amazon), they didn't have any downtime, because they were configured correctly and they had a good replication factor and a good consistency level. As nodes were blinking out (I think they lost like 300 nodes in that reboot process), they didn't have a second of downtime, because they were ready and the database was built to withstand that.

B

When Netflix goes down, it's not just bad PR. It's like national news. I mean, the president gets involved. And if it's Friday night, my five-year-old has a fit.

D

A

Exactly, not a good thing. So I think Netflix is a good example of those who hit that next data plateau, right? They knew they couldn't get where they needed to go. They are increasing stock value for their shareholders, and they're also making their users happier by keeping things online. They're like the cable company now: they have to be on all the time. They're better than the cable company.

B

D

B

You have to be better than the cable companies nowadays, that's true, yeah.

B

So the product already knows how to replicate data. Instead of just sending three pieces of data out in a single data center, it will also send another one out to all the other data centers. I have two data centers on this slide, but you can scale to

C

As many as you want.

B

We'll just send out one copy over there. It picks a coordinator in the other data center. Then the coordinator (good advancing, thank you, Patrick) will replicate the data properly around that

C

Data center now.

B

Only because these are slides and I want to look pretty. Do we have the same number of nodes in the both datacenter and the same replication factor? That's because cemetry looks nice, but cemetry isn't real life, so these data centers actually do not need to be symmetrical. They do not need to have the same number of nodes nor the same replication factor. They also don't need to be in the same place. They don't need to be in actual physical data centers they could be wanting to be in the cloud want to be in on-premise.

B

One could be an azure like there. This is a this is a completely heterogeneous environment. That's.
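The point about asymmetric data centers can be sketched in a few lines of Python. This is a toy model, not Cassandra's actual placement algorithm: the data center names, node names, and the `replicas_for` helper are all hypothetical, and the "walk the node list from a hashed starting point" rule is a simplification chosen for this example. What it does show is that each data center keeps its own number of copies, independent of how many nodes it has.

```python
import hashlib


def token(key: str) -> int:
    """Deterministic hash of a key (md5 here purely for illustration)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def replicas_for(key, datacenters, replication):
    """Pick replicas per data center: each DC has its own node list and its own RF."""
    placement = {}
    for dc, nodes in datacenters.items():
        rf = min(replication.get(dc, 0), len(nodes))
        start = token(key) % len(nodes)
        placement[dc] = [nodes[(start + i) % len(nodes)] for i in range(rf)]
    return placement


# Hypothetical, deliberately asymmetric layout: six on-prem nodes, three cloud nodes.
datacenters = {
    "on_prem": [f"onprem-{i}" for i in range(1, 7)],
    "cloud": [f"cloud-{i}" for i in range(1, 4)],
}
replication = {"on_prem": 3, "cloud": 2}  # different replication factor per data center

placement = replicas_for("user:42", datacenters, replication)
for dc, nodes in placement.items():
    print(dc, nodes)
```

In a real cluster this per-data-center setting is expressed in the keyspace's replication options rather than in application code.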

A

It's a typical arrangement. I've seen plenty of on-prem setups that have another data center in a cloud environment because they ran out of room, and this will enable that for sure. We had a discussion about how one user moved from on-prem to the cloud and then back to on-prem without any downtime, based on this replication strategy. So there are some interesting things you can do with this.

B

You know, Hollywood hasn't caught up. I just watched Terminator, and the entire idea was that Skynet was in a single data center. That's the whole end of the movie. There's technology that solves that.

B

Uptime and scale. Cassandra, again, is designed to be always available, always on: multiple data centers, various copies of the data within each data center. But we also talked about caching, so this thing also has to be fast, because we talked about down being down, but slow is also down. Remember those thundering herds back in the 90s, with your website being slow? That's not going to cut it anymore either. If you're on an app and you can't get to where you want, it's so easy to go to another app; you can probably download another app in that time. So speed is very, very important as well. But if you put in a caching layer, again, you are introducing more complexity, you're introducing another single point of failure, and that's not what we're looking for. So Cassandra is designed to do incredibly fast transactional read and write work.

B

So, first off, we have the application. It talks to the coordinator node, we saw that earlier, and it sends out a write request. That write request is going to hit a commit log that lives on disk. This commit log provides durability, and it is append-only and written sequentially, so you've got one disk that sits there and writes sequentially, always appending, and then, when the file is filled up, it starts another one. Very, very simple. As soon as it hits the commit log, it acknowledges back to the coordinator and the write is good.

B

At the same time, it's also going to write to memory, to what's called a memtable. Once the data is in the memtable, it is queryable, and anybody coming from another node or another application process can read the data out of memory. There's only so much memory, so eventually those memtables flush to disk into something that's called sorted string tables.

A

Muriel sorry, thank you, but.

B

Those SSTables are also immutable, and the data in there is sorted and written sequentially. The memtable memory is cleared and the data lives on disk. Those disks can be SSDs or they can be spinning disks. Cassandra was actually designed to provide fast reads and writes on spinning disks, but it also works very, very well on SSDs, of course. Eventually those SSTables are merged together in a process called compaction. Compaction keeps your reads from being too spread out and speeds them up.
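The write path just described (append to the commit log, write to the memtable, flush to an immutable sorted table, compact later) can be sketched as a toy Python class. Everything here is an assumption made for illustration: the `Node` class, the tiny `memtable_limit`, and in-memory lists standing in for files on disk. Real Cassandra manages commit log segments, flush thresholds, and compaction strategies very differently.

```python
class Node:
    """Toy model of one node's write path; lists stand in for files on disk."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []   # append-only, written sequentially
        self.memtable = {}     # in-memory, queryable as soon as it is written
        self.sstables = []     # immutable, sorted tables; newest is last
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # durability first
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()
        return "ack"

    def flush(self):
        # Sort the memtable and freeze it as an immutable table.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs the log

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest table wins
            for k, v in table:
                if k == key:
                    return v
        return None

    def compact(self):
        merged = {}
        for table in self.sstables:  # oldest to newest, so newer values overwrite
            merged.update(table)
        self.sstables = [tuple(sorted(merged.items()))]


node = Node(memtable_limit=2)
for key, value in [("a", 1), ("b", 2), ("a", 10), ("c", 3), ("d", 4)]:
    node.write(key, value)

tables_before = len(node.sstables)  # two flushed tables, so reads may touch both
node.compact()
print(tables_before, len(node.sstables), node.read("a"), node.read("d"))
```

Before compaction a read for `a` may have to look in more than one table; afterwards there is a single table to check, which is the "keeps your reads from being too spread out" effect.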

A

So I really feel like this is a critical thing: this is not an in-memory database, even though data does go into a memtable. I hear people ask, "Is this an in-memory database?" No, because it writes to a commit log, and that is a durable write, so it is on disk. Then, once it's in an SSTable, it no longer needs to be in the commit log. So we're still using disk a hundred percent across the board.

A

So disk matters. You don't get away with magic: if you're using 7200 rpm SATA, the seek time of those is going to be the problem, right? So just keep in mind, this is still a disk database.

B

And disks can only spin so fast.

A

Only spin so fast, right. Physics.

A

Back in the day, when I was a DBA... This is from OTN. These are the duties of database administrators, and it's a reasonable list of things you should know to be a decent database administrator.

B

You donate time.

B

This was pulled off yesterday. This wasn't back in the day; this is right now.

A

We yanked it, but I think it's just as relevant. I mean, it has to do with getting data, user security, keeping your data safe, that kind of thing. These are all basic duties. There's nothing complicated in here. It's not like how to set up a RAC cluster or anything like that. You could almost apply this to any database, MySQL, anything. But what I think would be interesting, Rachel, is if we talk about how things translate. So I'm coming from this world; how do things translate into the world of Cassandra?

B

Luckily, we have tools for this. We have tools to make things easier for people who are Oracle or MySQL or SQL Server DBAs to get used to how things work in Cassandra. What's currently highlighted here in pink or red, or whatever color shows up on your screen, are tasks that are handled by OpsCenter. OpsCenter is available for download, and you can use it against Apache Cassandra or DataStax Enterprise.

B

So here's the Best Practice Service, which will actually go through your system and give you some ideas on whether or not your replication is set up correctly, or your performance is good, or your security is set up right, or your backups. The next one takes a look at your ring. Right now we have a Cassandra cluster that also has Solr search and analytics, and we're seeing what is green, what is red, and what is yellow.

B

All of those tell you the health of your individual nodes. Then we also have the basic dashboard view: your cluster health, your utilization, your load. This is completely customizable and will give you all the stats of your system, just like you're used to with Enterprise Manager.

A

Coming from the world of Oracle and using Enterprise Manager, I think this is a pretty similar tool, where it gives me an all-in-one look and lets me monitor things. Now, I say this a lot: you put a thousand nodes in a cluster, you'd better have something that can watch those. It's not something you want to spin up on your own.

A

Now, people do, but you don't have to. OpsCenter can manage a lot of this for you, and it helps you with a lot of things, like updating the servers in a regular fashion without having any downtime. It's pretty cool.

A

Doing a massive upgrade of your database without any downtime is a bit of a trick, but not so much with this.

B

You can also spin up new clusters pretty quickly in AWS or Azure.

A

That's true. That's actually really cool. I use that quite a bit myself.

B

Just real quick, like, I want a five-node cluster, I need to do some testing. It's just done for you.

B

This is a little doc slide just to show you, hey, yep, there's documentation, just like OTN, all available on datastax.com, but also that all backups can be managed through OpsCenter or the command line, whichever one you prefer. Something people talk a lot about with Cassandra is, well, if you have all this replication, do you need backups? And of course you do, because your data's not going to go away, but there might be a mistake in your application.

B

I know no developer has ever put a bug in an app that might have written something bad to the database. So you can manage all your backups and your restores directly from here.

A

The only time I've ever done a restore is for a developer. I've never had a database failure; it's always been a developer failure. But remember, test your restores, even in the Cassandra world. You need to know that your restores are working. You only get one point for a backup; you get 99 for the restore. Do that. So how about security?

A

This is the one that I think a lot of people don't think exists in NoSQL. There have been some interesting blogs over the past few years about how NoSQL and security are not friends. I think that's an old notion, don't you?

B

Right. I mean, we've got a number of clients that have done PCI compliance on top of Cassandra.

A

And PCI compliance, of course, isn't about the database, it's about the process. But then you've got to have a database that can support those processes, with things like auditing and whatnot, and that's really important. Security is always advancing. I think that's something you'll see a lot of changes in, with security increasing through the years, because it's just something that has to be transparent and usable. Basically, security is like the last thing you want to think about, or the last thing anyone thinks about when they're building something. You're checking boxes at the end. It really should be integrated.

A

Okay, so we're back to my duties here. We checked off a few boxes, that's cool, got all that. What more do we need to look at? Now, this seems more along the lines of what a developer or an architect would want to do: designing a database, creating a database, managing objects. Managing objects is a DBA role thing, and even looking at new features. So what do we have for that?

D

Well, thank you for saying this, Patrick. We've got a tool called DevCenter.

A

Lose your writing to that tonight because.

D

I ever wanted a job and introduce you to prices right all.

B

So you're familiar with being able to trace queries, because there might be a performance problem somewhere, to see what is going on inside the internals of how this query is communicating with the different nodes. DevCenter will do that for you.

B

It will give you the ability to write queries, manage your updates and your inserts, or create tables. And here's just a quick look, for those who are not already familiar with Cassandra, at how you actually interact with Cassandra. This looks vaguely familiar, doesn't it? It's not quite SQL, but it's called CQL, and it's designed specifically so people who know SQL can interact well with Cassandra. This is Cassandra's native interface.

A

I will interject here. We will go way in depth on this in our next webinar, so this is just a preview of that. CQL and data modeling, going from a relational standpoint to this, we will cover in the next webinar.

B

There's management here too, so being able to create tables and, you know, primary keys, all very familiar from tools like Toad.

A

Sorry, sir, you.

D

Didn't go. That's good! Okay!.

A

So the last thing was, of course, and I totally blew it, the ETL situation that we landed on last time. I just don't like ETL at all, because it's costly and it's not efficient, and for a lot of reasons it can also lose data. It's just something that I would rather not do. So what do we get with Cassandra in this case?

B

I know I was able to do it at 375 an hour for ready help job. Sir.

A

That was definitely available, but.

B

This is the one thing about DataStax Enterprise, which is built on top of Apache Cassandra, that I think, for me, is the killer app, because you've got your OLTP data.

B

You've got your transaction data in Cassandra, and it uses its native ability to replicate to virtual data centers. Again, they could be in the same rack or in the same data center, or they could be someplace completely different, so you have some workload isolation. You can run search-type queries with Solr integrated on top of Cassandra, or you can do OLAP or streaming queries with Spark on top of Cassandra, or Hadoop on top of Cassandra, or Cassandra on top of Hadoop; I'm not sure which way that would go.

A

That one is, unfortunately, a little less integrated, but I think it's fine for what it is. It's bring-your-own-Hadoop: you can have your own Hadoop cluster, and we provide connectors so that it will pull and push data to the Hadoop cluster from the Cassandra cluster.

B

The idea here is that you don't have ETL. The data will be automatically replicated in real time to these clusters. So you can integrate search into your application. You can put a BI tool, Tableau or anything out there with an ODBC or JDBC connection, on top to allow your users to do ad hoc queries without interrupting your OLTP transaction processing.

A

Right. I think we've seen some really interesting applications, and this is what it comes down to. So let's wrap this up into a neat ball and take this back to the plateau that we're dealing with. Cassandra in itself is a database that can respond to what we've created, which is mobile apps, IoT, and the web. Those are all creating a lot more demand and scaling problems that we haven't yet seen.

A

Well, we have now; we're in it. But in addition to that, and this is where DataStax Enterprise helps out too, you do need some extra goodies in there, like being able to do search on top of the same data and being able to analyze that data. That's really critical now, because you're not just collecting data and storing it; you've got to do something with it to make money with your application. And this is really the important takeaway: when you need to make money with your database, you use Cassandra; when you need to count money, you use Oracle.

A

That's what it comes down to. If you look at the users that are using Cassandra, they're making money. This is a part of their bottom line, and they are relying on Cassandra to make sure that it's up, online, and ready to rock while their users are around with their wallets out, ready to go. More importantly, I think it just gives everyone a better experience because, let's face it, slow is as good as down anymore.

A

Take that and that's something we can all relate to right.

A

So where do we end up? Oh, there's a catch.

B

A little bit of a catch. There's a lot of catches right there, yeah.

A

Thank you.

B

This is a good Google image search right there. So we've hopefully gone through and explained to you how some of the traditional methods in Oracle are done in Cassandra, and how you can build an application that is always on and highly scalable with Cassandra. But there is a catch. There are some things that you need to change about the way you think in order to get the most out of the system, and that is going to be the topic of our next two webinars. Next week's topic is going to be data modeling: how do you data model for Cassandra? And the following week, we're going to talk about how you change your development methodology.

B

So how do you need to change your organization, or the way your organization works together, in order to take advantage of Cassandra and use the new technologies to the best of your ability?

A

I'm super excited about part 2. I've been doing data modeling talks for three or four years now, actually four years, I checked, and the concept of data modeling from relational to Cassandra is getting more refined, I think. You're probably going to hear some of the same things again, but hopefully some new things, and we'll highlight some of the newer features that are in Cassandra and probably what's coming in 3.0.

A

But we need you to understand that this isn't going to kill you. Now, if you're going to watch one of these, watch that one. Well, you already watched this one, so watch both if you like, but number two is going to be really critical, especially for developers, because understanding your data model is the first thing you need to get right whenever you're building a successful application. All right.

A

We want to see you up close and personal, and that's going to be at Cassandra Summit 2015 in Santa Clara, two days of fun and excitement. We are also going to be doing a certification with O'Reilly Media, so this is a big, big deal. There's a certification test, and we have training you can sign up for. That is the paid part of this.

A

Now, it is free to go to, and if you want a priority pass, that basically guarantees that you can get into certain sessions. I'm going to tell you, there are going to be thousands of people here; it will be a big event. So getting that priority pass is pretty important if you want to guarantee spots, because there are going to be some really hot talks, and it gives you that guarantee. Rachel and I both have priority pass codes, so you can pick a winner there. It doesn't matter; you're the winner.

A

You get 50% off the priority pass, and for certification you get 25% off. You can pick mine or you can pick Rachel's, but we really do want to see you. Register now; it is filling up fast. Get your hotel rooms.

A

I mean, do all that now if you can, because you don't want to wait. Last year, people waited until the last second and it was really tough, because a lot of people didn't get to go; it was just full, and we don't want to have that. We're also doing it online; a lot of it is going to be online. So if you can't make it for some reason, you won't miss out. And of course all of the talks will be videoed, and we will have that all available on our YouTube channel as well.

A

So if you can't make it, don't despair. You will eventually be able to see some of the talks, actually almost all of them. But really, the important thing is that when you're there, you get to talk to people. I think this is the most important thing: talking to people, relating experiences, finding out how they're doing it. That is really something. So I think that is it. Oh, we're going to have to take some questions. Let's see, who has the Q&A? That's the first question I have.

C

Yeah, I've got the Q&A. I'll just ask some questions to you guys.

A

Devin, can you pick a few of those out for us, please?

C

There's a good amount of these, so for all the questions that we don't get to, we will try to throw a little blog together for everybody.

A

I actually saw a comment just recently that Exadata is shared-nothing, and that is true. I have run Exadata in production, and I should be very clear: yes, that is a shared-nothing architecture, although it is a very specific architecture. You buy a box, and it's huge. If you want to do multi-data-center, then you have to use GoldenGate on that. But I will make that clarification.

C

Here's a question: will the nodes that have special applications, such as BI tools, still be treated as nodes that normally receive and store data?

A

On that one: every node is the same. So if you're running multi-data-center with analytics and search, the nodes in each data center will all be the same, in that they're running Cassandra and whatever extra. So search nodes would be running Solr on top of Cassandra, and analytics nodes are Cassandra and Spark. But if you're running Cassandra only, then each Cassandra node is independent.

A

There's no difference between those nodes other than which data they're primarily responsible for. If you make a request to any node, it will find your data. There are no specific nodes that are set aside just for queries. I know that's a very common pattern with a lot of databases, like, here are the query nodes and here are the data nodes. There's none of that: no master/slave architecture or anything like that. It's all peer to peer. So the short answer is no.

A

The long answer is why.
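The "ask any node and it will find your data" answer can be illustrated with a small Python sketch of a peer-to-peer ring. This is a toy under stated assumptions: the `Ring` class, the node names, md5 as the hash, and the "next nodes clockwise" replica rule are all chosen for this example and are not Cassandra's actual partitioner or replication code. The point is only that every node can compute who owns a key, so any node can act as coordinator.

```python
import hashlib
from bisect import bisect_left


class Ring:
    """Toy peer-to-peer ring: every node can locate the owners of any key."""

    def __init__(self, node_names, replication_factor=3):
        self.rf = replication_factor
        self.ring = sorted((self._token(name), name) for name in node_names)
        self.storage = {name: {} for name in node_names}

    @staticmethod
    def _token(text):
        return int.from_bytes(hashlib.md5(text.encode()).digest()[:8], "big")

    def replicas(self, key):
        """Owners of a key: first node at or after its token, then the next ones clockwise."""
        tokens = [t for t, _ in self.ring]
        start = bisect_left(tokens, self._token(key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]

    def write(self, coordinator, key, value):
        assert coordinator in self.storage  # any member may coordinate
        for owner in self.replicas(key):
            self.storage[owner][key] = value

    def read(self, coordinator, key):
        assert coordinator in self.storage
        for owner in self.replicas(key):
            if key in self.storage[owner]:
                return self.storage[owner][key]
        return None


ring = Ring([f"node-{i}" for i in range(1, 6)])
ring.write("node-1", "user:42", {"name": "Rachel"})

# Ask every node in turn; each one acts as coordinator and finds the same row.
answers = [ring.read(node, "user:42") for node in ring.storage]
print(ring.replicas("user:42"), answers[0])
```

No node in this model is a dedicated query node or a master; the only difference between nodes is which keys they happen to hold.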

A

All right, Devin hit us with another one.

C

Here's a question: what are the biggest differences between Cassandra and Hadoop?

B

I can go ahead and take that. They're very, very different beasts. Hadoop is designed to do collection: take tons and tons of machines, throw a ton of data at it, and, you know, do some plodding through it with MapReduce in order to figure out answers to deep questions. It is much more of an OLAP system, designed to do data lakes and basically process the data of the universe.

B

That's what it was designed to do: work through the internet. Cassandra, on the other hand, is an OLTP system. It is designed for high-speed reads and writes: little tiny bits of data, short requests, data coming in and out at all times. They work on completely different file systems. They are completely separate projects, but they can run in conjunction with each other. So you can take your Cassandra data and move it into Hadoop if you prefer to do your analytics there, but you keep Cassandra for its uptime, for its scale, and for transaction processing.

A

Yeah, they're not the same systems, not at all. All right, Devin, another one.

A

Devin, still there? Oh yeah, all right. One more question, and then we can be done with this.

C

Can you run Spark or Solr on different nodes from Cassandra nodes?

A

I love it. I kind of answered that, but let me be very clear. When you're running DataStax Enterprise, you run them separately: whenever you run Cassandra only, that will be in one data center; when you run Spark and Cassandra together, they are in another data center; and Solr is in a different data center. This is for workload isolation and also just for task isolation.

A

We want to make sure that these nodes are there for a specific reason. Using data center segregation gives you some options, and that's why it's done that way. Spark is going to use so much CPU, so much memory, and so on, that it is potentially going to be a different type of node, so you'll want to make sure that those nodes are there. Remember, what we keep taking advantage of here is the basic part of Cassandra, which is replication.

A

It will replicate your data, no matter what, and that's what makes this work. We just take advantage of that basic fact. So I think that's it. We will be collecting these questions as well, so if we didn't get to your question, we'll try to follow up with a Q&A blog post. Just keep your eye out for an email when it gets posted.

A

That's just so we make sure we get to all of your questions; I know you have a lot. Of course, hit us up on Twitter. I see a few people already have on my Twitter account right now. Great, we love to hear from people. And if you see a Cassandra Day coming to a town near you (our next one is next month in New York), just come on by. We have a lot to talk about there, and you can ask your questions there as well. Lots of experts.

A

So thank you very much for spending an hour with us and we will see you next time.