From YouTube: Cassandra Summit 2015 Keynote
Description
Featuring Billy Bosworth, CEO of DataStax, Jonathan Ellis, Apache Cassandra™ Project Chair, and Scott Guthrie, EVP of Microsoft.
The Cassandra Summit 2015 Keynote dives into the continued rise of NoSQL databases, Cassandra 3.0, and live demos featuring the leading distributed database technology, Apache Cassandra.
Wait a minute, how come I'm up here alone? Where's Rachel? She just left me hanging. Okay, all right! Well, all right, we'll just keep going, alright! So the first thing we're going to do this morning is a riveting live demo, right? Let's bring up the demo here. So I have a couple of data centers here; this is going to be really awesome. Those are Raspberry Pis running Cassandra. They're really, really running Cassandra. So I have data center two over here, data center one over here: KillrVideo. Anybody know about KillrVideo?
Oh yeah, we're going to dig into it. We're going to do a code dive; we're going to see how it works. It's amazing, and we have it running right now. I have a status page up here that shows all the nodes running, you know, something you might see on a plasma screen in your data center or something like that, but it's just showing how the thing is running, right? But I'm doing this alone. All right, where the hell is Rachel? I mean, she should be here. Well, I'll just keep going. Okay!
Hold on a minute: no, no, no, no. Hey, Bad Monkey! What are you doing? Oh no, bad, bad! Oh no! Oh no! Put that down! Put that down! What are you doing? He cut the network cable.
And despite a tornado, that actually happened. But I mean, this is what we needed, because it was mission critical. And when I think of mission critical, it could be anything from monitoring a heartbeat to something as trivial as a cat video. But it doesn't matter: your application is the critical part, and if you're in the 21st century, down is dead. You cannot survive with downtime and you can't survive with slow. And you know, because you're here, that Cassandra is the database that can deliver this.
You were going to be the authors of a new way of life; the expansion process was going to occur, and you were going to hold the power of authoring this new future. I predicted that the expansion would start to happen pretty quickly after last year's event, but I was actually a little wrong on that one, because the expansion has happened so much quicker than I think even we would have imagined. It has been a spectacular thing to watch us go beyond Planck time (for the nerdy ones in the audience) and into the inflationary period. We're watching this stuff grow at an incredible rate. A few statistics from last year's conference: over 2,600 people decided to join us on site, to register, to come to the event. We had more than a thousand people who watched selected portions of the conference via streaming video, and we had sixty sessions that were created for you and by you.
That was remarkable, because it wasn't too long before that when we were having conferences with 20 to 25 sessions, and we were calling you the week before saying, hey, can you present? We need a few more sessions to get filled. Well, that's all changing very radically. In fact, I want you to do something, nothing too weird, I won't ask you to get up and hug anybody or anything like that, but stretch out a little bit, because I do want to ask for a show of hands on something.
Let's put those numbers next to the statistics for this year's conference and think about where we are today. More than 6,100 people tried to get here today by registering. For the first time in our history, we had to stop the registrations: we're out of space. The real fire marshal, not Patrick with the hat, but the real fire marshal, said we can't have any more people in here, so we had to actually stop the registration process. More than 5,000 people are going to be watching this event online, and I think it's going to be far more than that.
137 sessions over the next two days. About five months ago, our community team, who is responsible for shepherding the process (DataStax does not decide which sessions get presented; that's a very democratic process that you all vote on, but we do shepherd that procedure), walked into my office with a binder about that thick, and I said, what's that? And they said, these are the abstract submissions.
I can't even imagine what these numbers are going to be as we go forward, but this is incredible in one year's time. Most companies don't even make it this far, in the size of a conference, throughout their entire life, and to watch this happening and unfolding right in front of our eyes, and having you be a part of it, is just remarkable. So in case it's not obvious to you yet: you have chosen wisely in investing your career. Thank you for the free applause, right?
You have chosen wisely. Making these choices with your energy and talent is a non-trivial exercise, so you are making a bet, and you're making it with the most important thing you have in life: your time. That's something we all have in common. You are choosing a technology that is special, but what makes it so special? It does some amazing things, but a lot of software does amazing things. I think what's different about Cassandra is that it is foundational; it's bedrock. And when you have a technology like that, not an ancillary technology but a foundational technology, all of a sudden you can build upon it in amazing and incredible ways: you can build applications, you can build your career, and you can build entire companies on this technology. At DataStax, we are very privileged to be so integrated and intertwined with this technology.
We have our offering called DataStax Enterprise, through which we are very delighted to bring people world-class support, enterprise functionality and features, and very easy-to-use tools. That's great and exciting, and it's a privilege to do that for our customers. But for us, that's just the beginning; that's just the foundation. And for you, in your world, in your applications and your companies, it's probably much the same.
You see this technology as a bedrock foundation, and then you can start to free your mind to think very differently about how you build applications, serve customers, and bring context to transactions, whether it's with technologies like Spark or Solr or in-memory or a variety of other things. How can you take those ancillary technologies, plug them into this bedrock foundation, and then do things in new and interesting ways?
I think the best way to really understand that is to hear from people who live this every day. At DataStax we have a community team; they're called evangelists, and you are their audience. They live to be passionate about you, about how to bring you the wisdom and knowledge and education and examples to help this community grow, and to learn from you and bring it all together.
Two of our best are John Haddad and Luke Tillman. They've done lots of really good, creative work to make examples for people to learn from and understand. How do you think about building a modern application? How is it different from the way things used to be? Is it more difficult? Is it easier? Is it just different? Hearing from them directly, I think, will help us all gain a very tactile understanding of what some of this future can be. So would you please join me in welcoming to the stage John and Luke.
Thank you very much, Billy. As Billy mentioned, I am Luke Tillman. And I'm John Haddad. And we are technical evangelists for DataStax. So before John and I joined DataStax, believe it or not, we were actually responsible for putting things in production. Before we started living this sort of glamorous life of evangelists, they actually let us write code and put things into production.
Right, and it's important to us that we continue to understand what that's like. We're talking about provisioning servers in the cloud, we're talking about reading documentation and understanding how things work, and we're also talking about writing code. There's only one way that we can continue to understand this: we have to keep building things.
So you saw it in the earlier demo: we decided to build something called killrvideo.com, and KillrVideo is a video sharing web application powered by DataStax Enterprise and Microsoft Azure. If you go to killrvideo.com, that's "killer video" without an e, you will find not only the live demo but also links to things like the source code and the CQL schema, as well as a whole bunch of other resources to help you get started building your own applications on top of Apache Cassandra.
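The schema itself isn't shown in the talk, but as a purely hypothetical sketch of what a CQL table for videos in an app like this might look like (the real schema is linked from killrvideo.com):

```sql
-- Hypothetical sketch only; the real KillrVideo schema is linked
-- from killrvideo.com.
CREATE TABLE videos (
    videoid uuid PRIMARY KEY,  -- unique id for each video
    userid uuid,               -- who uploaded it
    name text,                 -- title
    description text,
    location text,             -- URL of the video asset
    tags set<text>,
    added_date timestamp
);
```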
Yep. Now what if we want to build something a little bit more complex? What if we want to build something like a personalized recommendation engine? You've probably seen recommendation engines before. Maybe you bought something at an online store and you've gotten suggestions for other things you might be interested in buying, or maybe you rated a movie highly on Netflix and you got recommendations to watch other movies that are very similar.
A
You
haven't
heard
of
apache
spark
before
you're
going
to
hear
a
lot
about
it
over
the
next
couple
of
days,
and
so
apache
spark
is
just
a
framework
for
doing
distributed
computing
and
we
spend
a
lot
of
time
at
datastax,
making
sure
that
spark
works
well,
not
only
with
open
source
Cassandra,
but
we've
also
taken
and
integrated
it
in
to
datastax
enterprise
and
so
with
our
recommendation
engine.
Instead
of
building
it
from
scratch,
we're
going
to
leverage
one
of
sparks
machine
learning,
algorithms
called
alternating
least
squares
and.
F
F
The model lets us get recommendations based on the movies that other people, people who have traits in common with me, have already seen and liked. And this is really cool: we're going to take the predictions that we get out of this algorithm and we're going to store them in Cassandra, so that we can show them on the homepage. So this is really, really cool, but that's not the coolest thing about this. The thing that's amazing is that we managed to build this in less than a day and in under a hundred lines of code.
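The demo's Spark code isn't shown in the talk, but the core idea of alternating least squares can be sketched in plain Python on a toy ratings matrix: hold the item factors fixed and solve for each user factor in closed form, then swap, and repeat. This is a hypothetical rank-1 illustration, not Spark MLlib's implementation.

```python
# Toy alternating least squares on a tiny ratings matrix.
# Rows are users, columns are movies, 0.0 means "not rated".
RATINGS = [
    [5.0, 4.0, 0.0],
    [4.0, 0.0, 1.0],
    [0.0, 1.0, 5.0],
]

def als(ratings, iterations=20):
    n_users, n_items = len(ratings), len(ratings[0])
    users = [1.0] * n_users  # one latent factor per user
    items = [1.0] * n_items  # one latent factor per movie
    for _ in range(iterations):
        # Fix item factors; each user factor has a closed-form solution.
        for u in range(n_users):
            num = sum(ratings[u][i] * items[i] for i in range(n_items) if ratings[u][i])
            den = sum(items[i] ** 2 for i in range(n_items) if ratings[u][i])
            users[u] = num / den
        # Fix user factors; solve each item factor the same way.
        for i in range(n_items):
            num = sum(ratings[u][i] * users[u] for u in range(n_users) if ratings[u][i])
            den = sum(users[u] ** 2 for u in range(n_users) if ratings[u][i])
            items[i] = num / den
    # A predicted rating is just the product of the two factors.
    return [[users[u] * items[i] for i in range(n_items)] for u in range(n_users)]

preds = als(RATINGS)
```

In the real system, those predictions would then be written back to a Cassandra table keyed by user, so the homepage can fetch them with a single query.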
Now, that's absolutely unbelievable. This is something that would have been ludicrously hard just a few years ago, and now anyone in this room can do it. Okay, you don't need a team of PhDs in machine learning to be able to do this. Hey Luke, do you have a PhD? I do not have a PhD. Do you have a PhD? I don't have a PhD. So what's really cool about this, as Luke mentioned, is that this is an open source project, Apache Spark, but it's also baked right into DSE, and it's really great and easy to use.
It's really not enough to just ask people to tag everything and only be able to search on those tags. Users expect that they can do things like search on the title, search on the description, and search on other metadata that's available. And so to do that, we've leveraged the search feature built into DSE as well.
If you've never seen this before, this is what it looks like to turn on search on our videos table in KillrVideo. It's just a single line from the command line, and once we have this turned on, we can start sending Solr queries to Cassandra. If you've never seen Solr query syntax before, this is what it looks like to send a simple query for "cassandra" in the description of a video. And one of the cool things about DSE Search is that Solr querying is baked right into CQL, the query language for Cassandra, so when I want to send this search to Cassandra, I can actually do a CQL query that looks something like this. I'm also not just limited to simple queries like this one, where we look for "cassandra" in the description: because we're based on Apache Solr, we have all of the power and the flexibility of the Solr project behind us. So when somebody actually uses the search box on KillrVideo, we really send a query over.
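The slides aren't reproduced in the transcript, but the pieces being described would look roughly like this: a one-line command to enable search on the table, and then Solr syntax embedded in an ordinary CQL query. Treat this as a hedged reconstruction; the exact core-creation syntax varies by DSE version.

```sql
-- Enable search on the table (run from a shell, not cqlsh):
--   dsetool create_core killrvideo.videos generateResources=true
--
-- Raw Solr syntax for "the word cassandra in the description field":
--   description:cassandra
--
-- The same search expressed directly in CQL via solr_query:
SELECT * FROM killrvideo.videos
WHERE solr_query = 'description:cassandra';
```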
One of the things that's really amazing about this, similar to the Spark work that we've already looked at, is that this wasn't particularly difficult. In fact, this took only a couple of hours and maybe about 20 lines of code to get the search functionality. That's pretty cool: you don't need a team of search experts in order to add search to your site that's already backed by Cassandra. Hey Luke, are you a search expert?
Yeah, I'm a search expert, yeah. So what have we learned here? Hopefully we've all taken away that adding these features, which used to be really complex and really hard, can now be done in a trivial amount of time. This is absolutely amazing. We no longer have to choose between a scalable operational data store and one that's easy to use.
Hopefully, in an audience of this size, some of you at least are as old as I am, and you remember the heady days of something we used to call internet development. When we were doing our internet development, we were looking at all kinds of new technologies and trying to figure out how this new world was going to sort out, and you probably remember the day when you first took a look at something called ASP.NET. Anybody remember those days? Come on, am I the only one that old? All right, we've got some old-timers in here.
It's a bit simple, and it's easy that way. So, Scott, you still work for Microsoft, I assume. This audience is largely going to be very open source oriented, very Linux oriented; there are a lot of them. Are you in the right place? Are you sure you meant to be at this conference? I hope so! Okay, so we've been talking quite a bit about the fact that there's this traditional sort of disconnect between the technologies that many of you are very well versed in and the traditional technologies of Microsoft.
That's a good point. It goes back to things that happened in July: Billy joined me at a Microsoft event, our big worldwide partner conference, and we kind of talked about the same question, which is, gosh, why is he here? And I think part of it is hopefully representative of the kind of new type of Microsoft that we're really looking to build, which is one where we put customers really at the center of everything that we do and then work back and figure out: how do we best meet their needs, and how do we enable them to transform their businesses, build great applications, and really leverage the cloud to transform everything that they do? And that means being able to use every technology. So we love Windows, we love Linux, and that means we love open source. Not only are we looking to embrace open source and enable it to work great with our platform; we're also taking even our core platform, things like ASP.NET and the .NET Framework, and open sourcing that as well, contributing our own code into the ecosystem. So yeah, that's a big change for us, and we're still early in the journey, but hopefully it opens up a whole bunch of possibilities and enables some great partnerships, like the one we have with DataStax today.
I mean, I think every customer right now, every organization, is really looking at how to embrace the cloud in a pretty substantial way, and that journey is going to look different for different organizations. Some have large existing on-premise investments that are going to take years to depreciate or years to migrate, and there are others that are startups, don't have any existing legacy, and can move even faster.
One of the things that we often talk about, and it's kind of core to our strategy, is how we enable a world where you can take advantage of hyperscale: where, when you have a runaway hit application, you can scale up to any amount of capacity, and where you can basically run your solution all over the world, close to your customers, so you have the optimum performance that can really both please them and, frankly, drive your revenue. Then you marry that hyperscale with an enterprise-grade platform, and then the hybrid capability to maintain maximum flexibility. I think that combination of hyperscale, hybrid, and enterprise grade ends up being super powerful, and it basically means you can go tackle any type of scenario.
When you're having these very practical discussions with somebody, and you go in and they have these six data centers and they're trying to go through a data center rationalization project and trying to optimize for these things: are these some of the core topics that you find yourself engaging with at a strategic level, with the CIO or the VP of IT or the people who are trying to figure this out? Yeah.
I mean, I think right now we're at the stage in the industry around cloud adoption, certainly at the enterprise level, where I'm finding pretty much every enterprise is grappling with these problems, because they do require a lot of things to work out and figure out: what does that journey look like?
One of the things that you and I were talking about offline, because it is a surreal joy to get to talk to Scott about the development world, is how application architectures have changed. I think when people talk about cloud, they automatically start to assume things like operational cost reduction: you don't need as much power, as much footprint, and you don't need the cost of administering all those machines. I think that's important, but again, when you have a database like Cassandra, it starts to enable you to distribute the database in an active state.
There's another really important element that I know many of you are really trying to optimize for, and that's low local latency. Great alliteration there, but low local latency is the key, because with these apps being at this scale, the throughput becomes so important when that app is successful. Sometimes the worst cause of failure, unfortunately, is that the app succeeds: it needs to scale, and then it falls over.
I think, you know, there have been a lot of studies on this, on what the cost is of, say, 10 or 15 milliseconds of latency in your application, for a mobile or web-based solution, to any customer, whether a consumer or an enterprise. Basically, every millisecond you add has a cost, and when you can keep latency low, you really have a winning solution. And so one of the things that we've focused on, for example at Azure, is that we now have 19 of what we call data center regions around the world. To put that in perspective, that's actually more than AWS and Google combined, and they're literally all over the world: North America, South America, Europe, Asia, Australia, Japan. We even have two regions in mainland China; we're the only Western cloud provider that operates there.
They're up and running in Beijing and Shanghai today. Wow. And so you can basically take a DataStax or Cassandra based solution and deploy it into one or all of those different data centers, and you get to choose where you want to run your code. The beauty is, if you want to reach the Australian market, you can make sure your app is running in Australia.
Great, yeah. I do think that with availability, we always think of availability as this kind of dramatic outage, but in today's world, if an app is slow, it might as well be unavailable: not just for the CSAT and the bounce rate, but because it will literally start to queue up to a state where it can't catch up anymore. So the throughput has to be there, and that low local latency is really important. Yep. So, some closing thoughts.
We're all here and we get to benefit from some of your industry experience; you talk to global customers on a regular basis. One thing I love about Scott as an executive is that he really, truly is a person of the people: he is out a lot with customers. So if you were going to pass on just a couple of nuggets of wisdom, so that we don't bang our heads on the same walls that others have before us, what would you leave us with? Well,
I think if you're building apps in this modern era, there are many lessons. If I had to pick three, I'd say, first: really design for agility. Think hard, as you start building and scaling solutions, about how you have the flexibility and the agility to react, because different things are going to happen, whether it's a natural disaster or someone coming into your office and saying, look, I need this feature done quickly.
Yeah, and I need to be able to scale around the world. So having an engineering system and a set of platform choices that give you that agility is probably the most critical thing in the cloud space. The second one I'd say is: have good monitoring, because it's one thing to be able to react to a tornado; it's another thing to know that the tornado actually hit you. And I'd say the number one thing we typically run into for any online system is, gosh, I wish I had more monitoring data in place to understand what the problem was. I think the third one is really: just don't throw any data away. It's amazing how much value you can extract from the data that you're already storing inside your applications. You saw the demo a little bit earlier in terms of how you can apply Spark and Solr and others to your data, and I'd say in general you can really find and extract huge amounts of value from that data.
A very different mindset from how we thought even five years ago, and certainly ten years ago, about how to build these apps. So thank you very much for that wisdom, Scott. Now, we did do something with the KillrVideo demo that was a big jump: we just assumed that the infrastructure was in place when we began. We've just announced some more about our strategic partnership together as DataStax and Microsoft, and part of that is the experience that we want you to have when you're working with an Azure environment and working with a DataStax cluster. So would you be willing today to actually show us what that will feel like, in real time? Sure, absolutely. Come on over and let's get started.
I enter a password here, and we'll call it "Scott demo," like the first time I did it. Then basically I can choose where I want to run this around the world. As I mentioned, we've got 19 regions open for business today; you can basically just click any of them from this list, and I'm choosing the West US.
Then I'm just going to hit OK, and you'll notice what we've done is integrate the DataStax specific settings into the experience as well. So, as I mentioned, you could do a single VM, but to be more impressive, let's say a 90-node Cassandra cluster, which we will basically install and coordinate for you. It's as simple as picking it from the drop-down list.
I can choose the size of machines I want to run, and our largest VMs can do half a terabyte of RAM and 7 terabytes of local SSD storage, which is pretty screaming fast if you've got 90 of those deployed around the world. Then basically I can just go ahead and enter my DataStax username and password, which is the hardest part of the demo, since I have to remember what it is.
You'll then see the summary; we click OK and basically just confirm it. If you don't have a license, you can actually buy it and transact it directly as part of this experience. Click OK, and we're now deploying a 90-node Cassandra DataStax cluster in the western half of the US. This will take a few minutes, at which point I've got a fully working system up and running. Here's one I deployed in East Asia yesterday, and you can see basically all the nodes that are now running.
If I want to drill into any of these directly in the Azure management console, you can see I can actually do that: I can see all the settings, the network settings; it's pulling CPU percentage directly from Azure. And, going back to how much we love Linux, we've even integrated the Linux serial console output directly into the management portal, as you can see here.
It's all in one place, with full role-based access control; everything's set up. Beyond this 90-node cluster we're deploying, we're even working on a template, which actually went out yesterday, that will also allow you to deploy multiple regions simultaneously and set up a virtual private network automatically, with the same sort of two-minutes-to-wow factor you saw here. And then basically the beauty is that you can use all the DataStax tools against this; so here's that 90-node cluster up and running.
If you're interested in more of that KillrVideo application and you want to dive deep into the data model, how the code was constructed, or the queries that were used: in the partner pavilion area you can find our Meet the Experts centers, and they will help you go very deep under the hood of that application if you'd like. So thank you very much, and thanks also to our engineering teams. One of the things that has made this a great partnership is that we've got executive alignment between Scott and me.
Next up, I would like to introduce Mr. Jonathan Ellis. I have had the privilege of working with Jonathan now for over four years at DataStax. Jonathan was one of the founders, along with Matt Pfeil, and Jonathan is a pretty special person. I continue to be impressed more each year as I get to know him, and not just for the benefits that he brings to the technical world on the Cassandra side, where he and his team do amazing things.
He and his team do amazing things, but they are also so passionate about every one of you. They really want the community experience to be phenomenal, and they've done so much work to that end. So, without further ado, I would like to have you help me welcome on stage the Apache Cassandra chairman, Mr. Jonathan Ellis.
Thank you, Billy. I'd also like to thank Scott and the evangelists for those fantastic demos. I would like to be the third person to get on stage here and tell you to check out that KillrVideo application and see how easy it is to build a modern application with Apache Cassandra and DataStax Enterprise. We've put a lot of effort at DataStax over the last couple of years into bringing you a first-class experience in building those applications with our Cassandra drivers.
This is the kind of challenge that modern applications need to be able to cope with, and they need a new generation of infrastructure that's designed to handle this kind of event. Gartner recognized this in 2013 and replaced their OLTP database report with one covering a broader category of all operational databases, including next-generation technology like Cassandra: the category you're looking at if you're eBay or Instagram or Salesforce and you're looking to build a new application or extend an existing one.
Now for me, when I'm thinking about a category, it helps to have specific examples to wrap my mind around it, and so in the database industry I like to think of databases along two axes. On the top here we have the operational category, which is where you run your business, as opposed to the analytical category on the bottom, which is where you run reports against what happened in your business. In the upper left we have the Oracle and SQL Server operational systems, and in the upper right you have the next-generation operational systems like Cassandra. And people ask me: well, what's the difference between that technology from the 90s, Oracle and SQL Server and so forth, and Apache Cassandra? What makes it different and better suited to modern web, mobile, and IoT applications? There are three key properties of a modern operational database that Cassandra does better than anyone else in the industry.
Those are the ability to be always on, the ability to scale, and the capacity to deliver high performance, so I want to take a couple of minutes and talk about each of these in turn. When Web 1.0 hit in the late 90s, that was the first beginning of a change in customers' expectations. Google launched, or started their company, in 1998, and a lot of people in this room have been using Google almost as long as they've been alive. Every time you go to google.com, you expect to get that search box back.
You know, it would be unthinkable to go to Google and get back a page that says "we're down for planned maintenance, come back at 6 a.m." It's just unthinkable. Ten years after Google was started, Steve Jobs introduced the iPhone, and this same expectation of availability started propagating to the mobile world; today mobile is the dominant player, bigger than desktop in a lot of markets. But there's a problem with this new world of needing to design for scale and availability.
Along with it comes something which is perhaps even more scary, which is that the architecture is brittle. So, for instance, suppose that instead of one machine going down, all my machines are alive, but I have a network partition: a switch fails, and now some of the machines, the machines on the lower left here, can talk to each other, but they can't talk to the machines in the upper right, and vice versa.
And so what will happen, if you're not careful, is that each of those halves of your network will elect masters and start accepting requests. This is called a split-brain scenario, and when you have multiple masters accepting updates for a given partition, you're going to introduce corruption. And this isn't just a theoretical problem: Arman gave a talk a couple of years ago about how this exact problem happened to his MongoDB cluster.
The architecture that I've shown you is basically a simplified version of how MongoDB works, and Arman described how there was a network partition, MongoDB got confused, multiple masters were elected, and he had corruption in his database. This is how MongoDB has achieved the industry-leading reputation that it has today.
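The failure mode being described can be sketched in a few lines. This is a toy illustration of the election logic only, not how MongoDB or any particular database implements it: if each side of a partition may elect its own master, you get two masters; if a strict majority is required, at most one side can.

```python
# Toy split-brain illustration: a 5-node cluster splits into two sides.
CLUSTER = {"n1", "n2", "n3", "n4", "n5"}
side_a = {"n1", "n2", "n3"}  # one half of the network partition
side_b = {"n4", "n5"}        # the other half

def naive_masters(sides):
    # Any non-empty side elects its own master: split brain.
    return [min(side) for side in sides if side]

def quorum_masters(sides, cluster_size):
    # Only a side holding a strict majority may elect a master.
    return [min(side) for side in sides if len(side) > cluster_size // 2]

split_brain = naive_masters([side_a, side_b])          # two masters accept writes
safe = quorum_masters([side_a, side_b], len(CLUSTER))  # at most one master
```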
By contrast, Cassandra manages your data without any master-slave replication setup. Each replica in a Cassandra cluster can handle your updates independently of the others, so even if two of those replicas are down, it's no problem: I don't need to take any heroic failover actions. Cassandra keeps on working, because it's designed to tolerate that kind of failure.
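Concretely, replication in Cassandra is configured per keyspace, and the tolerance for downed replicas follows from the consistency level you query at. An illustrative sketch with made-up datacenter names and values, not the demo's actual configuration:

```sql
-- Three replicas per datacenter; every replica is a peer, no master.
CREATE KEYSPACE killrvideo
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'DC1': 3,   -- hypothetical datacenter names
    'DC2': 3
  };

-- In cqlsh, at consistency level ONE a write succeeds as long as any
-- one of the three local replicas is up, so two replicas being down
-- is tolerated, as described above.
CONSISTENCY ONE;
INSERT INTO killrvideo.videos (videoid, name) VALUES (uuid(), 'demo');
```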
Now it's true that in an extreme situation, like Patrick and Rachel destroying every node in your cluster, Cassandra can't deal with that. But in a more realistic scenario, this kind of design can mitigate real-world failures. Maybe the best petri dish for infrastructure failure is Amazon Web Services, and I say that not to throw stones at Amazon, because they're the best in the business at this. But even though they have so much experience and so much expertise,
You
still
have
a
roughly
one
major
outage
a
year.
It's
almost
it's
almost
uncanny,
how
that
happens
like
clockwork,
and
so
what
I
take
away
from
this
is
that
now,
even
if
you
are
the
best
in
the
business,
then
you
do
need
to
plan
for
that
kind
of
outage,
because
it's
going
to
happen
so
in
2011,
EBS
took
down
us
East,
EBS,
again,
elb
bad
network
hardware,
reboot
apocalypse
and
then
most
recently,
dynamodb
metadata
service.
G
So Cassandra can help you deal with these. Christos from Netflix described how, during last year's reboot scenario, where ten percent of all the VMs in Amazon were rebooted, they lost over 200 Cassandra machines; over 20 of those didn't come back at all, and there was zero downtime. By the way, Christos is here at the conference; he'll be here tomorrow to give his talk, and one of the things he'll discuss is how Cassandra helped mitigate the most recent downtime.
G
The next thing I want to talk about is scale. It's absolutely critical to be able to scale to the largest workloads in the industry; Apple was here last year talking about their 75,000 Cassandra nodes. But it's arguably even more important to be able to scale as your business grows, and ProtectWise is a good example of that. They started two years ago with a three-node Cassandra cluster. Today they're at three hundred Cassandra nodes, so Cassandra has grown with their business by two orders of magnitude, and made that happen smoothly and seamlessly.
G
So when everything is going well, you have Cassandra replicating to multiple regions, and you can geolocate your users and send them to the closest region for the fastest possible response time. That's nice to have. But this is really critical when things don't go well and you lose one of the regions: again, I don't need to do any failover events inside Cassandra.
G
Finally, I want to talk about performance. Again, there's a right way and a wrong way to build a database to deliver maximum performance, and the wrong way is to build it on top of an abstraction layer that keeps you from taking advantage of what your modern hardware has to offer. This is a diagram of the Hadoop file system, HDFS. The details aren't as important as the fact that HDFS is designed for moving large replicas of data at once: the block size in HDFS is 64 megabytes. HDFS is kind of the eighteen-wheeler of big data: you put a lot of boxes in it and it ships those all to the same place. It doesn't accelerate particularly quickly, but it's very cost-effective and efficient at moving large amounts of data from one place to another. That's what it's designed for. The problem is when you take this file system, which is designed for an analytical workload, and you try to build an operational database on top of it.
G
By contrast, Cassandra manages its storage locally and manages its replication natively, to get the best possible performance out of your hardware. We can build on the primitives that the operating system gives us, like mmap and fadvise, to pull data into memory when we need it most and give you the best possible performance.
G
DataStax contracted with a company called End Point earlier this year to benchmark the top NoSQL systems: Cassandra, Couchbase, HBase, and MongoDB. If you asked someone who was familiar with the industry what workload Cassandra would be most appropriate for, he'd probably tell you a write-heavy workload. That's what we've got in this first graph, where I have operations per second on the y-axis for a ninety percent writes workload. Across the x-axis, you see a two-node cluster doubling up to an eight-node cluster, with Cassandra, in blue, doing very well against its industry peers. But you might be surprised, then, to look at the read-dominated workload: Cassandra is actually doing even better, relatively speaking, on this workload. So Cassandra is really a general-purpose system that can handle a wide variety of workloads. This is the balanced workload, where we're doing fifty percent reads, fifty percent writes, and you can see that some of Cassandra's competition has trouble with the contention between those reads and those writes; they conflict with each other and reduce performance. And then, finally, we did want to recognize that even operational databases will need to do some light analytics as part of your application workflow, such as the machine learning that Luke and John showed us with Spark against the KillrVideo demo.
G
In fact, most of them didn't, so we split it up into two releases. We said we're going to try to get people the new features as fast as possible, so we split out the features that don't depend on the new storage engine into 2.2, and the features that do depend on the new engine are going to be in 3.0, a little bit later. 3.0 is in release candidate now; we released that on Monday, and we expect it to be generally available in October.
G
Now, you know that Cassandra thinks about the world in terms of rows and columns: I can have a table that looks like this, and I can insert data into it with a CQL statement like this. Starting with 2.2, I can do all of this in JSON as well. The syntax for that says INSERT INTO table name JSON, and then I give it a JSON document literal, and Cassandra will parse that and transform it into its native representation.
G
This is designed to allow Cassandra to integrate seamlessly into a world of JSON-based microservices. There's actually a subtle difference in the two statements that I gave here: you'll notice that in the CQL statement I generate my user ID with the built-in function now(), which creates a time-based UUID, whereas in the JSON example I'm giving it a UUID literal. That's deliberate, because in the JSON world the idea is that I'm consuming data from another service, rather than generating it myself.
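As a sketch of what those two statements likely looked like on the slides (the table and column names here are assumptions, not from the talk; the syntax is Cassandra 2.2's):

```sql
-- Hypothetical users table
CREATE TABLE users (
    user_id timeuuid PRIMARY KEY,
    name text,
    email text
);

-- Plain CQL insert: now() generates a time-based UUID server-side
INSERT INTO users (user_id, name, email)
VALUES (now(), 'Jim', 'jim@example.com');

-- JSON insert: the UUID arrives as a literal from another service
INSERT INTO users JSON
  '{"user_id": "99051fe9-6a9c-11e5-b949-38ef78858dd0",
    "name": "Jim", "email": "jim@example.com"}';
```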
G
CQL allows you to nest data structures in your Cassandra rows, and we can expose those to JSON as well. We introduced collections back in Cassandra 1.2; here's a table that has tuple, set, and list column types, and I can insert into that table with CQL like this. Note that my tuple is represented with parentheses around the values, the set has curly braces, and the list has square brackets, so I have different representations for each of those literals in CQL.
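A minimal sketch of the three literal forms he describes (the table and columns are invented for illustration):

```sql
-- Hypothetical table mixing a tuple, a set, and a list
CREATE TABLE users_collections (
    user_id uuid PRIMARY KEY,
    location frozen<tuple<float, float>>,  -- tuple literal: parentheses
    emails set<text>,                      -- set literal: curly braces
    top_songs list<text>                   -- list literal: square brackets
);

INSERT INTO users_collections (user_id, location, emails, top_songs)
VALUES (
    uuid(),
    (37.77, -122.42),
    {'jim@example.com', 'jim@work.com'},
    ['Song A', 'Song B']
);
```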
G
The other way to nest data inside Cassandra is with user-defined types. A simple user-defined type looks like this, where I have an address with a street number and a street name. Then I can use that in my table definition and say I have a street-address column that is an address value, and I can insert into that with CQL, and with JSON as well. So a user-defined type becomes a subdocument in the JSON world, and we can take this further.
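A sketch of that address type and its two insert forms (field names are assumptions based on the description above):

```sql
-- The simple address type described above
CREATE TYPE address (
    street_number int,
    street_name text
);

CREATE TABLE users_udt (
    user_id uuid PRIMARY KEY,
    street_address frozen<address>
);

-- The CQL literal for a UDT uses curly braces with named fields
INSERT INTO users_udt (user_id, street_address)
VALUES (uuid(), {street_number: 123, street_name: 'Main St'});

-- In JSON form, the UDT becomes a subdocument
INSERT INTO users_udt JSON
  '{"user_id": "99051fe9-6a9c-11e5-b949-38ef78858dd0",
    "street_address": {"street_number": 123, "street_name": "Main St"}}';
```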
G
We can nest things arbitrarily deeply in a Cassandra table and expose those to JSON. The reason you'd want to do this is that you want to denormalize the data your application needs for a request into a single Cassandra row, so you're not having to join data from different machines together in a given request. Nesting data into a Cassandra table makes that easier.
G
So let's take a slightly more complicated address definition, where now I have a set of phone numbers associated with each address, and then I'm going to let users have multiple addresses. Now my users can have a home address and a work address, and we'll just let you supply any kind of address you want.
G
We've also added support for role-based authorization. What this is intended for is larger enterprises that are managing multiple members of a team, where we'd want to avoid the problem where, through human error, some members of the team have different permissions than the others. So you can create an accounting role.
G
You can assign permissions to that role, and you can assign users to that role, and Cassandra makes sure that all stays in sync. As you add users to and remove users from your accounting team, all you need to do is assign and unassign that role, and the possibility of mismatched permissions goes away.
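A sketch of that accounting example in Cassandra 2.2's role syntax (the keyspace and user names are invented):

```sql
-- Create a role and grant it permissions once
CREATE ROLE accounting;
GRANT SELECT ON KEYSPACE finance TO accounting;
GRANT MODIFY ON KEYSPACE finance TO accounting;

-- Team members get the role, not individual permissions
CREATE ROLE alice WITH PASSWORD = 's3cret' AND LOGIN = true;
GRANT accounting TO alice;

-- When alice leaves the team, one statement removes everything
REVOKE accounting FROM alice;
```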
G
We've also added user-defined functions, and the idea here is that we want you to be able to push logic closer to the data in the Cassandra cluster. As a very simple example, this is how you would define a function that computes the mathematical sine, in Java, in Cassandra: you say CREATE FUNCTION, you give it the parameter list, you give it the language name, and then you give it the function body.
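A sketch of what that slide's sine function likely looked like in 2.2 syntax (the table and column in the SELECT are assumptions taken from his later description):

```sql
-- A sine function in Java: parameter list, language, then the body
CREATE FUNCTION sin (input double)
    RETURNS NULL ON NULL INPUT
    RETURNS double
    LANGUAGE java
    AS 'return Math.sin(input);';

-- Invoked like any built-in function, here on a column called value
SELECT sin(value) FROM measurements;
```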
G
That's all there is to it. Out of the box we support Java and we support JavaScript, but we support integrating with any JSR 223-compatible language. So if you want to write functions in Ruby, you just drop the JRuby jar in your classpath, and you can create functions in Ruby in Cassandra.
G
I can invoke this function like this in my SELECT statement, where I'm invoking the function on a column called value. You may be thinking that this actually doesn't look terribly useful, because whether I compute the sine in the Cassandra cluster or in my application code doesn't really matter a great deal: I'm pulling the same amount of data back to my application server either way. And you're right.
G
The more important thing that these user-defined functions enable is the ability to compute aggregates on the server side. An aggregate is a little more complicated, because I need to define an intermediate state function that accumulates the values being processed, and then I need to define a final function that gives me the final value from that intermediate state.
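A sketch of a server-side average built from those two pieces, closely following the Cassandra 2.2 documentation's example (the table and column names are assumptions):

```sql
-- State function: accumulates a running (count, sum)
CREATE FUNCTION avg_state (state tuple<int, bigint>, val int)
    CALLED ON NULL INPUT
    RETURNS tuple<int, bigint>
    LANGUAGE java
    AS '
        if (val != null) {
            state.setInt(0, state.getInt(0) + 1);
            state.setLong(1, state.getLong(1) + val.intValue());
        }
        return state;';

-- Final function: turns the intermediate state into the answer
CREATE FUNCTION avg_final (state tuple<int, bigint>)
    CALLED ON NULL INPUT
    RETURNS double
    LANGUAGE java
    AS '
        if (state.getInt(0) == 0) return null;
        return Double.valueOf((double) state.getLong(1) / state.getInt(0));';

-- The aggregate ties them together with an initial condition
CREATE AGGREGATE average (int)
    SFUNC avg_state
    STYPE tuple<int, bigint>
    FINALFUNC avg_final
    INITCOND (0, 0);

SELECT average(value) FROM measurements;
```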
G
This is a lot more useful, because I can combine thousands of values at the server and compute the average without the data having to leave the server, and then I can just send the average that the client wanted back to it. That's a much more efficient use of network resources.
G
Finally, for 2.2 we added commit log compression. You'll remember that for 2.1 we put a lot of effort into performance for CQL reads and writes, but we were starting to be bottlenecked by commit log performance and by the amount of data we could push to a single disk, and so we've alleviated that bottleneck in 2.2. You can see Cassandra 2.2 in green: it's not only faster overall, but it gives you more consistent performance. Commit log compression is kind of experimental in 2.2; it's off by default, but it's going to be turned on by default in 3.0, and I encourage you to play with it. You can turn it on on a single machine, make sure everything's still stable, and then roll it out to the rest of your cluster.
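For reference, turning it on is a single setting in cassandra.yaml (this is the 2.2-era option; a sketch, so check the yaml shipped with your version for the exact defaults):

```yaml
# cassandra.yaml: compress commit log segments as they are written
commitlog_compression:
  - class_name: LZ4Compressor
```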
G
Finally, the date-tiered compaction strategy is a new compaction strategy that we created during the 2.2 time frame, but it's not actually limited to 2.2, because we designed compaction strategies to be pluggable. We were actually able to take this back to 2.1 and even 2.0 without risking the stability of the system. So I'm going to show you two graphs that illustrate date-tiered compaction performance and what it's designed for.
G
It's designed to handle dense time-series workloads and allow you to pile a lot of data, even cold data, onto a single box. What I've done in this workload is actually push out all the way to 18 terabytes of data on a single machine. This graph here is the read performance; I've put leveled compaction at the bottom, size-tiered compaction in the middle, and date-tiered compaction at the top, and you can see how each of those compaction strategies compares across this data set. Leveled compaction, as you're probably not too surprised to find out, falls off a cliff at about two and a half terabytes of data, and you're probably not surprised because leveled compaction is designed for read-mostly workloads, and this one is ninety percent writes.
G
So that's what's coming in 2.2: we have a lot of new features across the board. As mentioned earlier, we pulled the new storage engine out into 3.0, and that touches everything in the system and enables new features that we couldn't do with our old engine. Our old engine has served us well, but fundamentally, at the atomic level, it thinks of data in key-value pairs, and so to deliver features like materialized views we needed the new engine.
G
We don't need to panic; it's designed to handle that. One of the ways it's designed to handle that is with something called hinted handoff: if any replica doesn't acknowledge an update, for whatever reason, the coordinator will write what's called a hint to itself that says, once that node comes back online, or I'm able to talk to it again, we'll send that missed update over. Delivering that hint is called handoff, so that's where the feature name comes from. The problem with hinted handoff, historically, is that hints were stored like ordinary data and deleted once delivered.
G
That
deletion
creates
a
tombstone
in
the
commit
log
in
the
mem
table
and
on
disk
again,
and
we're
still
not
done
because
right
now
in
this,
in
this
picture,
I
have
the
original
hint
in
one
data
file
and
the
tombstone
that
says
it's
deleted
in
another.
So
I
actually
need
to
compact
those
together
to
reclaim
that
disk
space,
and
so
this
is
there's
a
lot
of
overhead
in
this
design
that
doesn't
actually
help
us
in
in
hint
delivery,
because
we
don't
care
about
indexing
hints
by
their
idea
or
random.
G
Access
to
two
different
hints
that
Cassandra's
storage
engine
is
designed
to
provide
all
we
care
about
is
store
a
hint
safely
and
then
bulk
deliver
them.
So
430
we
created
a
very
simple
custom
storage
engine
for
hints,
and
it
just
looks
like
this,
where
I'm
going
to
create
a
flat
file
for
when,
when
a
replica
doesn't
ignore
my
update
and
I'll
just
append
hints
to
that
file
and
I'll
do
that
for
every
replica
and
the
cluster
that
I
need
to
store
hints
for
and
then,
when
the
replica
comes
back
online
and
I
deliver.
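The flat-file design he describes can be sketched in a few lines of Python. This is an illustration of the idea only, not Cassandra's actual implementation: one append-only file per unreachable replica, written sequentially, replayed in bulk, and removed with a single unlink instead of per-hint deletes and tombstones.

```python
import json
import os

class HintLog:
    """Append-only hint storage: one flat file per dead replica."""

    def __init__(self, directory, replica_id):
        self.path = os.path.join(directory, f"hints-{replica_id}.log")

    def append(self, mutation):
        # Sequential append: no index, no random access, no tombstones.
        with open(self.path, "a") as f:
            f.write(json.dumps(mutation) + "\n")

    def replay(self):
        # Bulk delivery: read every hint in order, then drop the whole file.
        if not os.path.exists(self.path):
            return []
        with open(self.path) as f:
            hints = [json.loads(line) for line in f]
        os.remove(self.path)  # one unlink reclaims all the space at once
        return hints
```

The point of the sketch is the contrast: delivery cost is a sequential scan plus one file deletion, with none of the compaction overhead described above.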
G
It varies widely based on your cluster size, so I have here the most extreme possible example, where I have a two-node cluster and one of the nodes is down, and I'm writing hints for it to the other. This is extreme because, as my cluster size grows, the responsibility for storing hints for a dead node gets spread across all the nodes in the cluster. So in a ten-node cluster, you'd expect the difference to be about, you know...
G
I can create the view and say SELECT * FROM songs, and my primary key now is going to be not just the ID but first the album and then the ID. Doing that tells Cassandra to partition the data by the album, and so now I can go and say SELECT * FROM songs_by_album for a given album, and Cassandra can get me that song list. You're probably thinking this sounds a lot like indexes, and functionally they're very similar: I can take my songs table, create an index on the album, and select from the table using that index. The difference is that indexes are managed locally; each node in the cluster indexes the data that it owns. What that means is that when I go and ask an index what songs are in this album, Cassandra has to scatter that query across the entire cluster, and then each node looks it up in its local index.
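A minimal sketch of the view he describes, in 3.0 syntax, next to the functionally similar index (the songs schema is an assumption based on the columns he names):

```sql
-- Base table, partitioned by song id
CREATE TABLE songs (
    id uuid PRIMARY KEY,
    title text,
    album text
);

-- Materialized view: Cassandra repartitions the data by album
CREATE MATERIALIZED VIEW songs_by_album AS
    SELECT * FROM songs
    WHERE album IS NOT NULL AND id IS NOT NULL
    PRIMARY KEY (album, id);

SELECT * FROM songs_by_album WHERE album = 'Some Album';

-- The locally managed alternative: a secondary index
CREATE INDEX songs_album_idx ON songs (album);
SELECT * FROM songs WHERE album = 'Some Album';
```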
G
So
as
a
consequence
of
this,
if
I
go
from
my
six
node
cluster
here
and
I'm
doing
10,000
requests
per
second
and
I
go
up
to
a
60
node
cluster,
then
I'm
still
going
to
be
pushing
around
10,000
notes
per
second
because
of
the
bottleneck
has
become
the
scatter
and
the
gather
parts
of
the
operation,
like
contrast,
a
materialized
view,
since
cassandra
is
repartitioning
that
data.
In
your
view,
I
only
have
to
go
to
a
single
replica.
G
Now, the read performance of a materialized view is absolutely identical to the read performance of a normal table: it's the exact same code path, managed the exact same way. But I want to give you some idea of what to expect from the write performance in your cluster as you introduce materialized views, so we're going to look at that a couple of ways. First, let's take writing to a table without any materialized views, at the top here in purple; then, in the red line, I have a single materialized view that I've added; and then in the purple at the bottom I've got five materialized views. The rule of thumb is that adding a materialized view to your table will cost about ten percent of your performance, and this is done just with the standard cassandra-stress tool. But we also wanted to look at this a slightly different way and say: given that I need to denormalize this data for my application, I either need to do that myself, in application code, or I can let materialized views do it.
G
In my base table of playlists here, as I'm inserting more and more data into each of those playlists, I'm increasing the time I spend in lock contention against the base table, because materialized view maintenance needs to take out exclusive locks against the base table to keep the views consistent with the base table data. As a rule of thumb, that contention starts to become meaningful at about 200 rows per partition.
G
So we looked to Intel's tick-tock development process for inspiration. What Intel does is split their microprocessor development into a "tick" of smaller transistors and a new manufacturing process, and a "tock" of a new microarchitecture. By splitting those up, they reduce the possibilities for error and for conflicts, and you get a more reliable schedule. The way we're applying that to Cassandra is that every other monthly release will be a pause in feature development and just include bug fixes.
G
Now, if you've been paying attention to the industry recently, Intel has actually had some hiccups in their tick-tock process, and we're realistic enough to acknowledge that we'll probably have some hiccups as we move to this development process as well. So, in parallel with the tick-tock process, we're going to continue to deliver traditional stabilization releases of the 3.0 line: we'll release 3.0.1, 3.0.2, and so forth, and each of those will not include any new features at all, but strictly contain bug fixes.
G
And for that entire 3.0.x series, it's going to be a hundred percent compatible. I hope you're as excited as I am about the new features in Cassandra 2.2, what's coming in 3.0, and the new development process that will allow us to continue delivering new features at a regular cadence and keep Cassandra the best operational database in the industry for years to come.
E
Jonathan, before you head off, I want to get some help from you here. Let me get this right: I'm gonna roll this out, you grab that end, and let's see what we're looking at here. Don't burn my fingers; you're cutting me here. Walk back so I don't fall off the stage. All right: what we have here is signatures from our inaugural certification process for Apache Cassandra, sponsored by O'Reilly. So big congratulations to everybody on the list, and we will make that available for everybody down in the partner pavilion area. Jonathan, thank you.
E
So, a lot of deep stuff there, right? Jonathan is nothing if not thorough, and we will always get a good look at what's going on with the technology and the inner bowels of the systems and how they work; that's one great thing about being in the open source community. But that does lead to a challenge that is vitally important for us to help you solve, and that is the ecosystem itself.
E
How fast can we get the ecosystem up to speed on how to use and take advantage of these great new features? Our training program that we ran yesterday was far, far more successful than we had anticipated. We had planned for 600 people; that was our capacity. We started to try to push it to 650, because we had a lot of inbound demand; then we tried seven hundred; and we ended up, I don't know how, putting 737 people through the training yesterday. That was done in partnership with O'Reilly for the certification.
E
That is the kind of thing that is going to take this technology and once again put another accelerant on it, so that you can take this knowledge and start to apply it in very meaningful ways. Not everybody in here is going to need to know how hints are handled at that level, but what you will need to know is how to build a fantastic data model to support your application.
E
You will need to know the basics of how you get your data in and out of the system, so our training initiatives are going to be centered around helping this world move much, much faster. Please take advantage of them. We have a lot of free offerings available online where you can come and get yourself trained at your own pace. But do it, and do it right, because when you do it right, you're helping yourself, you're helping your company, and you're helping everybody see the value that you bring to the market by getting things like a certification.
E
So let's learn how to do this stuff right out of the gate; it'll make everything go much faster. Next, I want to thank a very special group of people. We got the opportunity to talk to Microsoft, with Scott, and see what they're doing with us, but the partner pavilion, as I said earlier, is just off the charts.
E
This year we've got, I know, well beyond 35 partners back there, and I know many of you were there this morning, when the room was already getting packed. Just looking at our gold partnerships for a second is very representative of what's happening in the industry: what you're seeing is market leaders, people that have had market-dominant positions for decades, realizing that they have to get in this game too.
E
That is fantastic, when you see that kind of endorsement from a traditional ecosystem come to an event like this. And then, finally, there's a third classification of partners that you'll see, which is the startup crowd, building the next couple of decades of these giant companies that are going to define our industry. We really want to thank them for all their effort, and I know you won't be disappointed when you go back and spend some time talking with them and seeing what they're all up to. But finally, and most importantly, it is really all about you.
E
This is where we want to thank you for everything that you've done: for the time, the dedication, the passion, and the energy that you're applying to learning this new market and this new technology. Everything I said last year about you being the authors of this new world is a hundred percent true, but I believe it has risen by an order of magnitude when I see what's happening now in the market. When you go to these 137 sessions and you listen to what's going on, you realize this is in your control.
E
This is your future. You get to write an entire industry, and that doesn't happen very often, guys. It just doesn't happen very often; it can't. The ecosystem can't sustain that in anything other than a couple of decades at a time. So it is a real privilege to be able to stand here and thank you for your participation, and to help you get accelerated, get creative, and get passionate about what you're doing. So thank you all very much, and I hope you have a wonderful conference. Enjoy yourselves.