Description
Speaker: Dave Gardner, Senior Engineer at Hailo
Slides: http://www.slideshare.net/planetcassandra/no-whistling-required-cabs-cassandra-and-hailo-by-dave-gardner
Hailo has leveraged Cassandra to build one of the most successful startups in European history. This presentation looks at how Hailo grew from a simple MySQL-backed infrastructure to a resilient Cassandra-backed system running in three data centres globally. Topics covered include: the process of migration, experience running multi-DC on AWS, common data modeling patterns and security implications for achieving PCI compliance.
I'm excited, and a little bit nervous, to be in the United States today: my first trip stateside and my first Cassandra Summit. Today I'm going to be talking to you about Cassandra at Hailo. When I proposed this talk, that's about as far as I took it, really. I just thought: I'll come and talk about my use case, I'll figure out the details later. And then, when it came to the point where I was actually writing my slides, I was having some difficulty formalizing it, because Cassandra is something that we use without having to put a lot of energy into it, and we don't really think about it too much. From a Cassandra perspective that's absolutely fantastic, but from a talk perspective it made my life slightly hard. And it hasn't always been this way.

I'm from the UK, from London, and I started using Cassandra back in 2010. Back then we were using version 0.6, and I think it's fair to say that in those days Cassandra was not something that you could just use. That led me to start the meetup group in London; it's the longest-running Cassandra user group in the world. That's my claim to fame, and I founded it. The motivation really was to try and find some people who were using this database in 2010 and who could pretty much tell me how to use it. Back then a common theme of the user group was war stories: people would come and tell the tale of how they got burned and how it blew up on them, and we learned, and we went forward.

So fast forward to today, 2013, and it's quite impressive really to think how far Cassandra has come in that time. Now we're on version 1.2, and I think Jonathan's talk yesterday really brought home to me just how many new features are coming. That leaves me nowadays with a database that I haven't really had to think about in the same way that I would have done in, say, 2010, and that got me thinking about my talk, because back in 2010 a good Cassandra talk would usually involve some pain, some real effort to get the thing working, whereas nowadays we don't really have that. So what am I going to talk about today?
I'll cover the development perspective, but also operations and management, which I have some contact with, increasingly less so; it was quite a valuable experience for me to go and talk to those people. But before I get started on Cassandra, I'm going to tell you a little bit about Hailo. Hailo is the taxi app: an app that runs on your iPhone or Android, and at the press of a button you can hail a licensed taxi to come and pick you up. So this is in London, and all you have to do is hit a button.

So that's Hailo in a nutshell: we're making it really easy to get a cab. To give you some context on the sort of technology platform that we operate: Hailo has come a long way since November 2011, when we launched in London. It's now the world's highest-rated taxi app; we've got over 10,000 five-star reviews, we've got over half a million registered passengers, and a hail is accepted around the world once every four seconds.
So we've come a long way, and you can see that in the cities we operate in: we now operate in 10 cities globally, from Tokyo to Toronto. We've made a lot of progress, and that's not the end of the story either. Hailo is a company that's growing, and we really have global ambitions right now.
So with that in mind, we'll start to look at Cassandra and how we ended up with it. When Hailo launched in November 2011, we didn't use Cassandra. At that point Hailo was a platform built by quite a small number of people, a team of three or four backend engineers. We had a couple of web applications based on PHP and MySQL, we had a kind of Java backend to do most of the heavy lifting, and we were resilient within a single availability zone in AWS.

So why did we end up using Cassandra? What was the motivation behind adopting it? Well, before launch the focus of Hailo was all about features: we needed to get the platform ready to deliver the core experience of Hailo in London. Once we launched, we had a slightly different focus. We knew we wanted to expand globally, and we knew we wanted to become a utility, so we wanted this really resilient, reliable system. We wanted Hailo to always work: if you wanted to get a taxi, we wanted to be able to get you a taxi. We didn't want any downtime; we didn't want any periods where we were having difficulty. That desire for greater resilience seemed to be a good fit with Cassandra's design and its high-availability characteristics. The international expansion plans seemed to be a good fit for Cassandra's global, multiple data centre replication. And then we had expected growth: we were going to invest in marketing, we had plans for global expansion, so we wanted a database that didn't get in the way of those plans.

The path to adoption of Cassandra was largely a unilateral decision; it was really developer-led. This was back in the days when we were running out of a boat on the Thames; we had quite a small office, we were all in the same room, and fundamentally the development team decided to adopt it.
The way we went about bringing it into our architecture was that we took the PHP/MySQL web apps and broke down the functionality they provided into independent services that each did one job well, and those services used Cassandra as the data store. Then slowly we started to hollow out the functionality of those web apps and replace it with these services.
This is an example of entity storage: this is where we store our customer details in Cassandra. The row key here is a 64-bit integer, and we're using a kind of Snowflake-style globally unique number generation. The column names, things like created timestamp and email, are the Cassandra column names, and the values are the actual property values. This is an oft-used pattern and quite a straightforward way of using Cassandra: we have one row per record, and the rows get distributed around the globe, which is quite nice.

What you don't want to do is read the whole record, change one thing, and then write the whole record back, because if you do that you've got the potential for a race condition, with one write overwriting another. By just changing the one column you avoid that: you're basically saying, set this piece of information.
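As a rough illustration, here is what that single-column update might look like with the Astyanax client mentioned later in this talk. The column family name, field names and the CustomerStore wrapper are hypothetical, not Hailo's actual schema; keyspace is an Astyanax Keyspace handle (see the connection sketch later).

```java
import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.MutationBatch;
import com.netflix.astyanax.connectionpool.exceptions.ConnectionException;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.model.ColumnList;
import com.netflix.astyanax.serializers.LongSerializer;
import com.netflix.astyanax.serializers.StringSerializer;

public final class CustomerStore {
    // One row per customer: 64-bit Snowflake-style id -> columns named after properties.
    private static final ColumnFamily<Long, String> CF_CUSTOMERS =
            ColumnFamily.newColumnFamily("customers", LongSerializer.get(), StringSerializer.get());

    private final Keyspace keyspace;

    public CustomerStore(Keyspace keyspace) {
        this.keyspace = keyspace;
    }

    // Mutate just the one column; no read-modify-write, so no race condition.
    public void setEmail(long customerId, String email) throws ConnectionException {
        MutationBatch m = keyspace.prepareMutationBatch();
        m.withRow(CF_CUSTOMERS, customerId).putColumn("email", email, null); // null = no TTL
        m.execute();
    }

    // Fetching the whole entity is a single-row read.
    public ColumnList<String> fetch(long customerId) throws ConnectionException {
        return keyspace.prepareQuery(CF_CUSTOMERS).getKey(customerId).execute().getResult();
    }
}
```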
This gives you an idea of the kind of read and write workload we're doing. This is just one of our entities, the customer entity, and you can see the rates are quite low: we're peaking at about fifty reads a second, and the write rate is really low. That gives you an indication that we're not using Cassandra because we have a big data problem, or even a really high volume problem; we're using Cassandra for other reasons.
So, when you take a journey, we'll be sending you messages, potentially SMS messages and emails, and we keep a record of those for things like customer service, and so that customers can request that we send them a copy of the receipt and things like that. What we're doing here is storing all of that information under one row. The date is the row key, so 2013-06-01 is the row key, and within each day we're storing all of the emails and messages sent under that one row. The column name here is a TimeUUID, that's a type 1 globally unique identifier, and baked into it is the concept of when it was generated. Cassandra can understand these, so Cassandra is able to order the columns by time. What you end up with is one row that contains all of this stuff sent, ordered by time.
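A minimal sketch of that layout, again with Astyanax; the column family name and payload are hypothetical, while TimeUUIDUtils and RangeBuilder are helpers from Astyanax's documented API:

```java
import java.util.UUID;

import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.MutationBatch;
import com.netflix.astyanax.connectionpool.exceptions.ConnectionException;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.model.ColumnList;
import com.netflix.astyanax.serializers.StringSerializer;
import com.netflix.astyanax.serializers.TimeUUIDSerializer;
import com.netflix.astyanax.util.RangeBuilder;
import com.netflix.astyanax.util.TimeUUIDUtils;

public final class CommsLog {
    // Row key is the day (e.g. "20130601"); column names are TimeUUIDs,
    // so Cassandra keeps everything sent that day ordered by time.
    private static final ColumnFamily<String, UUID> CF_COMMS = ColumnFamily.newColumnFamily(
            "communications", StringSerializer.get(), TimeUUIDSerializer.get());

    private final Keyspace keyspace;

    public CommsLog(Keyspace keyspace) {
        this.keyspace = keyspace;
    }

    public void record(String day, String message) throws ConnectionException {
        MutationBatch m = keyspace.prepareMutationBatch();
        m.withRow(CF_COMMS, day).putColumn(TimeUUIDUtils.getUniqueTimeUUIDinMicros(), message, null);
        m.execute();
    }

    // The hundred most recent messages for a day, newest first.
    public ColumnList<UUID> recent(String day) throws ConnectionException {
        return keyspace.prepareQuery(CF_COMMS)
                .getKey(day)
                .withColumnRange(new RangeBuilder().setReversed(true).setLimit(100).build())
                .execute()
                .getResult();
    }
}
```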
Or, instead of storing everything for the day under one row, you can store everything ever sent to a given customer under one row, forever. That works out for us because the volumes aren't that high: we're not expecting to send millions and millions of messages to people, so the rows stay quite small. And that leads into the key considerations for time series.

This is an example of the kind of read and write workload we're doing for this particular communications case. With this one we've got a slightly higher write rate, which is the green line, than read rate, so it's flipped around, and you can see it follows our pattern of use: rush hour, busy evenings and so on. We do have higher volume streams, though. This is our stats event stream, which is time series data as well; it's our highest volume data, and with it we're peaking at around five thousand write operations per second. The read rates, you can see, are really sporadic. This feeds a kind of reporting system that we use, and you can see that on the Friday there's some general traffic, people requesting stuff out of our platform, and then at the weekend it kind of disappears; it goes down to nothing. So that shows our highest volume case.

The key consideration for time series really is to choose the row key carefully, because what you don't want to do is pour all of the records into one row. With our use case it works out, because we denormalize on write: generally one record will update three or four different rows, and we're storing a sensible number of records per row.
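To sketch that denormalize-on-write idea: one incoming record fans out into several rows, one per query we know we'll need. The column families and fields below are hypothetical, purely illustrative of the pattern:

```java
import java.util.UUID;

import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.MutationBatch;
import com.netflix.astyanax.connectionpool.exceptions.ConnectionException;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.serializers.StringSerializer;
import com.netflix.astyanax.serializers.TimeUUIDSerializer;
import com.netflix.astyanax.util.TimeUUIDUtils;

public final class EventWriter {
    private static final ColumnFamily<String, UUID> CF_BY_DAY = ColumnFamily.newColumnFamily(
            "events_by_day", StringSerializer.get(), TimeUUIDSerializer.get());
    private static final ColumnFamily<String, UUID> CF_BY_CUSTOMER = ColumnFamily.newColumnFamily(
            "events_by_customer", StringSerializer.get(), TimeUUIDSerializer.get());
    private static final ColumnFamily<String, UUID> CF_BY_CITY = ColumnFamily.newColumnFamily(
            "events_by_city", StringSerializer.get(), TimeUUIDSerializer.get());

    private final Keyspace keyspace;

    public EventWriter(Keyspace keyspace) {
        this.keyspace = keyspace;
    }

    // One record updates three rows in a single batch, so each read path
    // later is a cheap single-row, time-ordered slice.
    public void write(String day, String customerId, String city, String payload)
            throws ConnectionException {
        UUID when = TimeUUIDUtils.getUniqueTimeUUIDinMicros();
        MutationBatch m = keyspace.prepareMutationBatch();
        m.withRow(CF_BY_DAY, day).putColumn(when, payload, null);
        m.withRow(CF_BY_CUSTOMER, customerId).putColumn(when, payload, null);
        m.withRow(CF_BY_CITY, city).putColumn(when, payload, null);
        m.execute();
    }
}
```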
The client libraries we're using at Hailo: we're using the Astyanax Java client, which is the Netflix open source project; we're using phpcassa for PHP; and we're using gossie for Go. We're not using CQL at the moment; we're using the older-style Thrift-based RPC clients, and for us that seems to suit what we're doing right now. We might move to CQL in the future.
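For completeness, here's roughly what bootstrapping one of those Thrift-based clients looks like, following the Astyanax getting-started documentation; the cluster, keyspace and seed values are placeholders:

```java
import com.netflix.astyanax.AstyanaxContext;
import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.connectionpool.NodeDiscoveryType;
import com.netflix.astyanax.connectionpool.impl.ConnectionPoolConfigurationImpl;
import com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor;
import com.netflix.astyanax.impl.AstyanaxConfigurationImpl;
import com.netflix.astyanax.thrift.ThriftFamilyFactory;

public final class CassandraClient {
    public static Keyspace connect() {
        AstyanaxContext<Keyspace> context = new AstyanaxContext.Builder()
                .forCluster("HailoCluster")                 // placeholder name
                .forKeyspace("main")                        // placeholder keyspace
                .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
                        .setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE))
                .withConnectionPoolConfiguration(new ConnectionPoolConfigurationImpl("pool")
                        .setPort(9160)                      // Thrift port
                        .setMaxConnsPerHost(3)
                        .setSeeds("127.0.0.1:9160"))        // placeholder seed
                .withConnectionPoolMonitor(new CountingConnectionPoolMonitor())
                .buildKeyspace(ThriftFamilyFactory.getInstance());
        context.start();
        return context.getClient();
    }
}
```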
The other half of our use case for Cassandra is analytics. One of the things we lost when we migrated data to Cassandra was the ability to conduct the sorts of queries where you'd say: count these, calculate the sum of this, or take an average with a GROUP BY clause. That's something we did a lot of in the PHP/MySQL web apps; we'd have analytics to be able to count how many jobs a driver had done, or how many jobs a customer had done. The migration to Cassandra meant we lost that ability, because Cassandra doesn't have the ability to say SELECT COUNT(*), or SUM, or the rest.

So we use a product called Acunu Analytics, which gives us that sort of facility back. We predefine query templates, which basically means we have to know ahead of time how we want to query the data, and then Acunu will write all the data to Cassandra, denormalizing it massively on write, such that we can query it in real time. We quite like it because the integration is very straightforward for us: it just gives us this facility without our having to really work at it, so we find it very helpful. This is an example of what we can do with this tool but wouldn't be able to do with Cassandra.
This is a query language specific to Acunu called AQL, and you can see that if you've ever done any SQL you'll recognize what we're doing, and if you've ever used Cassandra you'll recognize that you wouldn't be able to do this with raw Cassandra; the facility just doesn't exist. We can do things like grouping by different time periods, and one of the features we're just starting to explore is the dashboard side of it.

This is relatively new, and we're using it to give us some operational insight into how Hailo is running. This is an example of plotting customer demand over time, and one of the newest dashboard features is the geographic heat maps. Most of Hailo's data is geographic in nature: we have customer demand at a specific location and driver supply at a specific location.
On to the lessons learned from the development perspective. This is the main one, really: people joining our team, joining our company, generally come with a background of SQL experience. Generally people will have ten years' experience of MySQL or something; Cassandra experience is unlikely. I don't think anyone has joined our company with prior Cassandra experience. So that's a challenge, something we have to work around and mitigate, and I think that leads on to the second point, which is that some people can shoot themselves in the foot, and we have to guard against that.

So, some of the lessons learned on the development front. One of the things I think is really important is to have an advocate in the team, and I've kind of taken that role in our company, but perhaps I haven't been quite as on it as I should have been. I think it's important to try and get everyone on board. We've got people joining the company continually, so you need to sell them the dream; you need to tell them.

I think CQL can encourage a SQL mindset. It's good in one way, because you can get people on board quicker: the way they interact with the database is closer to how they used to use SQL. But at the same time it's kind of dangerous, because I think it's important with Cassandra to understand the underlying storage engine: to understand the colocation of data, and to understand how to play to its strengths, denormalizing and things like that.
I think the overall feeling from the team is that Cassandra has allowed a very small team of operations people to achieve things they wouldn't have considered before it existed. The main point here is the global active-active replication, and the fact that we can do that with a really small team.

This is Hailo; this is where we operate at the moment, where we've got offices on the ground and people doing stuff. We're all the way from Tokyo and Osaka, obviously London, where we started, and Dublin; we're down in Madrid and Barcelona in Spain; and then we're over in the US and Canada: Toronto, Montreal, Washington, Boston, and of course New York, where we've just launched. So there are a lot of places, and you can get a feel for why the replication story of Cassandra was so important for us.
This is what our setup looks like. We run two clusters of Cassandra in production, and each cluster has six nodes in each region; we're fully on AWS. We're in ap-southeast-1, the Singapore region, which services Tokyo traffic and the Japan market; we're in us-east-1, Virginia, which covers off North America; and we're in eu-west-1, which covers off Europe. Basically, we've separated our clusters into two: we've got a stats cluster and an operational cluster.
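The multi-DC layout boils down to keyspace replication settings. As a hedged sketch, assuming a recent Astyanax version's createKeyspace helper: the DC names echo the regions above but actually depend on the configured snitch, and the replication factors are illustrative, not Hailo's real numbers.

```java
import com.google.common.collect.ImmutableMap;
import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.connectionpool.exceptions.ConnectionException;

public final class KeyspaceSetup {
    // NetworkTopologyStrategy places replicas per data centre, which is what
    // lets each region serve LOCAL_QUORUM reads and writes on its own.
    public static void create(Keyspace keyspace) throws ConnectionException {
        keyspace.createKeyspace(ImmutableMap.<String, Object>builder()
                .put("strategy_class", "NetworkTopologyStrategy")
                .put("strategy_options", ImmutableMap.<String, Object>builder()
                        .put("eu-west", "3")        // illustrative RF per DC
                        .put("us-east", "3")
                        .put("ap-southeast", "3")
                        .build())
                .build());
    }
}
```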
We've done that because the operational cluster is the thing that's needed to get a taxi: if this cluster stops working, our app stops working. The stats cluster is less important; it's not on the critical path. So we've split out the use cases and the workloads, really to isolate the potentially huge volumes of data being ingested into the stats cluster from the operational side of things. There will be a third data centre for the stats cluster too; we just haven't quite got around to doing that migration yet.
In each of our regions we're operating in an Amazon Virtual Private Cloud, and we're using OpenVPN links between the VPCs to connect them. We're using m1.large machines at the moment with provisioned IOPS, so we're paying Amazon to give us guaranteed levels of I/O. That means we're running on EBS, which is, I guess, quite unusual; most people aren't running on EBS. On the operational cluster we're looking at about 100 GB per node at the moment, and on the stats cluster about 600 GB; this is with compression.
The way we do backups is reasonably caveman: we take SSTable snapshots. We were then uploading these to S3, but we found that was saturating all our network bandwidth and causing issues, so now we just take EBS snapshots of the SSTable snapshots, and that's instant. This is one of the reasons we use EBS: that ability to take snapshots quickly. We're not using any clever tools, like the Netflix tools, that would allow us to do smarter things.
We chose that approach because the operations guys chose it, because it's quite uncomplicated, apparently; I don't know much about it really, and the tests that we did suggested it added about a one percent I/O performance overhead, so it's quite manageable. We use OpsCenter, which is the DataStax tool; we use the free version, and the ops guys are quite enthusiastic about it as a tool.
We feel that it gives new staff an easy way in to getting to grips with what Cassandra is and how it operates, just through the ability to have these simple screens of data that tell you what's going on. So it's a quick win: it's a free thing you can install and use multi-DC.

Multi-DC is one of the main motivations for our adoption of Cassandra, and it's really the big success story in how we think of Cassandra, in that when we launched our Singapore region, all we had to do was bring up some machines and sort of type a few things in, and it was online and active: zero downtime. And it's been rock solid; we haven't had any problems at all with it.
We read and write at the LOCAL_QUORUM consistency level, and in order to make that work we run repairs on a schedule that go around the nodes, just to make sure that all the data is in sync. If you went to Jason Brown's talk yesterday, he was talking about repair making sure that any inconsistencies are dealt with; we do that on a rolling basis around the cluster each night. We've also recently started to use compression.
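As a small sketch of what reading and writing at LOCAL_QUORUM looks like in Astyanax (shown per-operation here, though it can equally be set as a configuration-wide default; the column family is the hypothetical one from earlier):

```java
import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.MutationBatch;
import com.netflix.astyanax.connectionpool.exceptions.ConnectionException;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.model.ColumnList;
import com.netflix.astyanax.model.ConsistencyLevel;
import com.netflix.astyanax.serializers.LongSerializer;
import com.netflix.astyanax.serializers.StringSerializer;

public final class LocalQuorumExample {
    private static final ColumnFamily<Long, String> CF_CUSTOMERS =
            ColumnFamily.newColumnFamily("customers", LongSerializer.get(), StringSerializer.get());

    private final Keyspace keyspace;

    public LocalQuorumExample(Keyspace keyspace) {
        this.keyspace = keyspace;
    }

    // A write acknowledges once a quorum of replicas in the local DC accept,
    // so a slow or severed cross-DC link never blocks the operational path.
    public void write(long customerId, String column, String value) throws ConnectionException {
        MutationBatch m = keyspace.prepareMutationBatch()
                .setConsistencyLevel(ConsistencyLevel.CL_LOCAL_QUORUM);
        m.withRow(CF_CUSTOMERS, customerId).putColumn(column, value, null);
        m.execute();
    }

    // Reads likewise wait only on local replicas; the nightly rolling repairs
    // reconcile whatever drifts between data centres.
    public ColumnList<String> read(long customerId) throws ConnectionException {
        return keyspace.prepareQuery(CF_CUSTOMERS)
                .setConsistencyLevel(ConsistencyLevel.CL_LOCAL_QUORUM)
                .getKey(customerId)
                .execute()
                .getResult();
    }
}
```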
We were at the point where our stats cluster was running at about one and a half terabytes per node, and obviously with Cassandra you need about fifty percent headroom to be able to do a major compaction. So we needed about three terabytes per node, which I think was actually more than we had available, and at that point we didn't really want to add more nodes; I'm going to talk about that a little bit later from the management perspective.

We didn't want to scale this out, really for cost reasons, so we thought we'd try out compression, just to see what happened. We just turned it on; it was very straightforward, it just worked, and it gave us enormous savings. We're down to about six hundred gigabytes per node now. It's very easy to accomplish, and we just ran nodetool upgradesstables to apply the compression to all the historical SSTables.
So, lessons learned operationally. I think the main one is that Cassandra doesn't demand a lot of your attention; it certainly doesn't demand a lot of ours. I don't know if anyone here has used OpsCenter a lot, but this is a view in OpsCenter of our stats cluster, and as an operations person you kind of want to see the circles all the same size, and you don't want to see lots of streaming going on. This cluster shows circles that are very much not the same size and quite a lot of streaming going on, but it still works; it's still carrying on, soldiering on fine. It was only when I took this screenshot for the presentation that I realized our cluster was quite so out of balance. So the point is, thinking of where we are as a company now, we probably need to invest a little bit more up front in Cassandra, because Cassandra has been very good to us.
So finally, the management perspective, and this is the one that was the biggest eye-opener for me when I was preparing this presentation; it's not something that, as a developer, I'd really thought about before. This is a quote from our VP of Operations, who was saying that the days of the quick and dirty are over, and his point really was that, technically, our management believe Cassandra is a perfectly fine solution.
The first main area is really about a concern that we're putting a lot of data into a datastore and we can't get it back out; that's the perception from management. Now, I don't necessarily think that's true. There are ways of querying Cassandra in an ad hoc fashion: if you're running DSE, you can just run Hive, and if you want to bake it yourself, you can run Hadoop and Hive. But the perception amongst management is that we, as a company, can't; that we've chosen this technology that basically means we can't query our data. And I guess, as developers, we focused on the operational side. We're a taxi app; you need to get a taxi; we're going to focus on that use case. We haven't necessarily considered the management use case of "I want to get some data out".

What that's meant is that, back when we were all in one room on our boat on the Thames, management could pretty much go up to any member of the team and ask how many of something we'd done, and they'd be able to type a quick SQL query and answer the question. With Cassandra they can't do that: the number of people who are able to accomplish that diminishes.
There is a caveat to this, which is that our relational data is actually at the point now where you can't really do that either. We've still got a lot of data in relational data stores that we're slowly migrating, and that data has reached the point where you can quite easily lock tables and cause problems for production by running queries against it; the sorts of things we would have been able to do a while ago.
The second question is: is there business value in storing this data? That leads back to my point earlier about the stats cluster, where we got to the point of one and a half terabytes per node in a 12-node cluster. It's costing us money to store that, and the question is: are we getting business value from storing that data, or are we doing it for no reason? I think, coming from a development background, there's a danger that you do it because you can. Cassandra gives you a tool that will just do it: you can keep adding nodes, you can keep storing as much data as you want, and that's fantastic. But the question from management is: is there a reason, a business reason, for doing so?
Another interesting point is singing from the same hymn sheet; I don't know whether Americans understand that turn of phrase. Basically, one of our very senior engineers, one of the founding engineers, wasn't 100% sold on Cassandra. He was unsure that the advantages we were going to get from it outweighed the disadvantages, and I think we just proceeded anyway, really, without getting buy-in from him. Then, when business concerns would surface, that kind of lack of consistency within the development team would potentially exacerbate the problem. What we should probably have done, and this is probably on me, is made more of an effort up front to get all the people in the development team on board. I don't think that would have been that hard if I'd actually put the energy into doing so; it's just that I didn't. And then, finally: provide solutions.
We should have invested time and effort up front in providing, fundamentally, an ad hoc query interface to Cassandra. I think that would have headed off a lot of the perception from management that this thing's not queryable, and I don't think it would have been that hard to do. We should have provided those solutions probably from day one, and what we could then have done is turn the graph from earlier into something that looks a bit more like this, where we're saying that pretty much everyone who can query SQL would be able to query Cassandra. So that's something we'll be looking to do. So, to wrap up.
At Hailo, we really like Cassandra. We like the solid design principles that it's founded on and the fact that it's designed to be distributed from day one; I think that's a really important point. We like the high-availability characteristics and the easy multi-DC setup; they're kind of the two killer features for us. We don't have an enormous volume of data, and we don't have an enormous volume of read and write requests particularly, but what we do have is a need to run on three continents, and a need to run something that is going to be very resilient and reliable.

And then there's the simplicity of operation. I think it's easy to overlook that, but Cassandra for us has been very, very easy to operate. We haven't really had to put any energy in at all, perhaps to our detriment, and that's the long-term cost you're paying every week: you put a database in, you've got to maintain it, you've got to operate it for years. That simplicity, the fact that all the nodes are the same and there aren't many moving parts, makes life a lot easier for the ops team. For successful adoption, I think it boils down to not many things.
It boils down to having someone internally who's going to sell the dream and get everyone on board; get the developer who isn't one hundred percent sure and convince them up front. Get everyone to learn the fundamentals: when people join the team, have a way of teaching them about Cassandra before you throw them in at the deep end; you know, stop them shooting themselves in the foot.

Invest in tools; that's something we should have done, and I think it's an important point for Cassandra adoption. Developers are used to being able to just execute queries when they're building their software, to get stuff out, to see how it's running, to debug it, and with Cassandra it would have been useful for us, I think, to have those tools up front: a kind of Hadoop integration to be able to do batch analytic queries. And then finally, keep management in the loop; make sure you explain the trade-offs of the decisions you're making. If you're adopting a NoSQL store, it's not all going to be positives. Every decision you make is going to have trade-offs, so make those clear up front: say, we're making this decision for the right reasons, but these are the things we're going to be giving up.
You know, potentially we're giving up a more widely used technology that people have experience with, in terms of SQL, to get these other things. So, the future for Hailo: we're going to continue to invest in Cassandra as we expand globally; we've got big plans to launch in hundreds of cities. And next year we're going to hire some people, I think, to look at Cassandra specifically. We're probably at the point in our business now where we need someone who is an expert within our company: as we start to rely on Cassandra more and more, as it becomes our primary data store, we're going to want to have those skills in house, so we'll probably start to recruit that person soon. We're also going to focus on expanding our reporting facilities.
This comes down to the batch analytics side of things, really: giving people those tools to be able to answer those questions quickly and easily. And then finally, in terms of the business, Hailo has aspirations to extend our network, a network of a million consumer installs with this kind of virtual wallet, beyond cabs. Cabs are step one; we're going to move into other areas, and we're going to continue to hire the best engineers in London, NYC and Asia.
So the question was about the analytics side. I think the main thing we're thinking is that we need to be able to execute arbitrary queries against the Cassandra datastore, and that will probably take the form of Hadoop. We're already using Acunu, which gives us the kind of real-time, business-focused analytics for things like in-app stats: when a driver looks at their stats within our driver app, that's already powered this way; we've got that sorted.
Yeah, that's a good point; a nice loaded question there. So the question was about how you would arrange the topology, and the way we would do it is to have another data centre. Where we've currently got three, one each in Europe, America and Asia, we'd probably have another one in London, and we'd replicate probably just one replica to that data centre. So we'd have a complete copy of all the data in one data centre, and that would be used solely for reporting and analytics.
So, a good question about backups. I think there is some nervousness amongst management about Cassandra, and I think that's one of the main reasons we're running on EBS rather than ephemeral storage: there's that potentially irrational fear that if we stopped all of our nodes in AWS and then brought them back up, we'd have no data left if we were on ephemeral, which is a slightly absurd view, but that's kind of the one driving force behind EBS. In terms of the backups, I guess it's mainly disaster recovery: what if we introduce corruption, what if we accidentally delete all of our records? We wanted to keep historical snapshots in time, so that we could go back a week, a month, and with EBS we can do that.
As for using the backups to restore a new cluster to do reporting on: we've done two exercises, in kind of a year of Hailo, where we've said, right, let's test this, let's see if we can actually recover data, and both of those exercises were positive, i.e. we were actually able to fire up a brand new cluster from the backups. But it was a very time-consuming process to go through, which is why we've only done it twice, and I guess there's a sort of hope that if we did it today it would work, but we don't actually know that. So I guess it would be nice to have something automatic, perhaps, where you could press a button and verify that it's all working.
It's not like we're storing everything under one row. Every now and again you find someone who's got a data model that stores everything under three rows, and they have a 1,000-node cluster and wonder why it's not very well balanced; we don't really have that. Effectively, all we really need to do is finish the compression, make sure that's fully done, and then run repair to make sure all these inconsistencies are dealt with. All the data will be streamed from the nodes where it needs to come from, to make sure all the nodes have all the data they need, and theoretically, once we've done that, it should look a bit more like our other cluster, which is pretty well balanced. So basically: just run repair, pretty much.