From YouTube: Red Hat: Scalable Geospatial Indexing with Cassandra
Description
Speakers: Jonathan Halliday, Software Engineer at Red Hat & Rebecca Simmonds, Research Associate at Newcastle University
From sensor networks to social media analytics, the handling of geospatial data is increasingly important to many applications. In this session we discuss the challenges of indexing events by location and illustrate how to model spatial indexes using new features in CQL3.
A
Hi, my name's Rebecca Simmonds and I'm currently a research associate at Newcastle University, having just finished off my PhD. I'm here to talk to you about scalable geospatial indexes in Cassandra with my colleague Jonathan from Red Hat. First I want to tell you about a few of the things I've done. I've been working on a scalable system to analyze social media, specifically Twitter, using Cassandra at the back end.
A
So the idea is that we have this real-time analysis on top, but Cassandra underneath to do the historic analysis, so that we can get queries back in near-real-time and be able to combine the two. I'm also involved in some research projects, one of them called SiDE, which is Social Inclusion through the Digital Economy, and there we do some cross-disciplinary projects. So I've worked with geographers and psychology students to combine the system I'm working on with analyzing the data for real use cases.
A
So we started off the project by looking at different metadata of tweets, things like hashtags, IDs and the timestamps. We also, obviously, looked at geolocated data. Working with the geographers, it was obvious that geolocated data was going to be quite a hot topic between us both, and that's why we focused on it more; Jonathan was also quite interested in geospatial data, so that's how we all ended up working together. We've been working on a project that ran a collection over six weeks, and it collected tweets around an event called IDAHO.
A
We ended up collecting 10.5 million tweets, and the problem was being able to visualize this data for the different disciplines to actually analyze it. So, starting to think about this: each tweet has a geolocation, a latitude and a longitude, so it's already two-dimensional and quite hard to deal with. If you think about storing each tweet as a single point, that's no use at all.
A
You can't go through every single point in the database checking whether it's the correct one, so this led me to think about bounding boxes: basically dividing the geographical area into different bounding boxes and then using the bounding box a point falls in as its key. However, if you're going through each bounding box, once again it takes too long, too many searches; it's not near-real-time, and it would be quite difficult to bring back a lot of data as well.
A
So this led us on to using a different structure in memory to be able to cut down the searches we were doing. We've used a tree structure, and there are different ways we can use this; Jonathan will talk about that a bit more, I'm just here for the demo. So I want to show you the demo now, as you can see on the screen, through the web browser. We're on Wi-Fi, so I'm hoping that it works. As you can see, it's a map, and I've zoomed in to London.
A
Each of the numbers shows a cluster of tweets; there's the area for that one cluster of tweets. As you zoom in, it goes down to individual markers, so that the user can look at each tweet and the information provided, and be able to analyze it. What we've done with this demo is add in another dimension: we're looking at time as well. So we've not only returned all of the markers for this geospatial bounding box, we've also used time in it.
A
So if you click on this button, we get a lovely graph that shows you the six weeks of data this area has collected. It shows you the rate of tweets per day, and the geographers were particularly interested in that: they'd like to look at different areas and see whether people are actually tweeting, especially in more deprived areas that might not have the money to get the technology; they'd like to see whether people tweet there anyway.
A
When we analyzed this sort of data at Newcastle, it turned out that people were actually sacrificing quite a lot just to be able to post tweets, so that was quite interesting. And then, if you click on one of the bars, so this is the 19th of the fifth, it changes all the data to just that day; that's all of the data within that bounding box for the day. If you press the play button, you get a nice sweep through the days, and as the bars carry on going it adds more and more as time goes.
A
So the hope is that we can use this for topics as well, and be able to see the spread of information. Using this technique, and the in-memory structure that Jonathan will talk about, we've actually managed to get the responses back so quickly that I was struggling with putting the markers onto the map in the browser; that seems to be the bottleneck at the minute. So I'll pass you over to Jonathan now.
B
Yes, that looks good. Alright, hello, I'm the Red Hat half of this partnership. So, how many of you are here because you've got some kind of problem like the one we just saw, where you've got a vast amount of data, some part of that data is a spatial element, some kind of location information, and you need to index it? That describes a good chunk of the audience.
B
And whatever it is the rest of you do, okay. So the case I wanted to solve was that more and more of the data that's coming in has some kind of spatial component to it, particularly now we're seeing data coming in from sensor networks. Anything and everything can be instrumented; you already are, because you're carrying around a mobile phone that's got location sensors in it. And we want to be able to query this data efficiently. The dimensions of who did what, and when they did it: those are fairly easy.
B
They're linear things, so the ability to handle that kind of data is already built into Cassandra; anything that can be put on a line is easy. Locations are more difficult, because they're inherently multi-dimensional. In the simple case a location is just two dimensions, x and y, and a geolocation is differentiated from a general location just by being expressed in terms of latitude and longitude.
B
So there's an increasing need for this kind of thing, and I wanted to look at whether Cassandra was going to be a good platform for that kind of indexing. Historically, there's been this view that Cassandra is essentially a huge index construction kit, a toolkit for building exactly this kind of thing, so at first glance it should be a very good tool for this kind of job. Indexing the built-in types is pretty much done for you: column families are inherently distributed index-organized tables, so they just work great.
B
Fortunately, that's exactly the kind of thing I like doing: I like open platforms that are toolkits with extension points. So let's look at what's already there that might be useful for this kind of problem. CQL seems to be the way forward. How many of you are still using Thrift, just out of curiosity? A few, okay. The early adopters are kind of saddled with it: they went into production early, their systems are very stable and work fine, and they keep using them. But people who are not yet in production are mostly starting from CQL.
B
I think CQL is the starting point these days. User-defined types, mentioned briefly earlier today, are already there in 2.1. Asynchronous queries are the other one I want to touch on: if, like me, you come from a JDBC background as a programmer, you're used to a database access API that's inherently synchronous, but using an asynchronous API brings you certain advantages, particularly with the distributed nature of Cassandra.
B
So that's what we have now as building blocks, but I'm also looking forward to new things coming in 3.0. Jonathan Ellis has hinted at a couple of these, and if you were in this room for the previous session you saw some of the user-defined functions work that's going on. Distributed (global) indexes also look very similar: a table, plus bespoke code inside Cassandra to keep them in sync.
B
So what can we use from this suite of building blocks to let us model spatial types easily? User-defined types are a great way to extend the type system inside the database, so you can define new types. In this case we'll start by defining a point, which is simply an x and y coordinate; if it's geospatial data, that x and y would be latitude and longitude. Just to make life difficult, by convention those are the other way around: you put the y coordinate first. Don't ask.
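In CQL that first type looks roughly like this; a minimal sketch for Cassandra 2.1, with illustrative field names rather than the exact ones from the slides:

    CREATE TYPE point (
        x double,   -- longitude
        y double    -- latitude
    );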
User-defined types work well with the Cassandra toolchain. Historically, if you've wanted to do this kind of thing, your only option was to do the serialization in the client, using protobuf or whatever your preferred format is, and then you're essentially writing a blob to the database, and the database can't look inside that blob. It has no native understanding of what the content is. So that's a bit of a pain during the development cycle.
B
You can do things like query it in CQL, but what you get back is just a byte buffer, and if you're doing that in cqlsh it renders it as a hex string, which is kind of meaningless. What you'll notice as you transition to using this new feature is that suddenly the toolchain helps you out much more: if you do a query in cqlsh against a type, it'll render it in more or less JSON format for you in the shell, so it makes debugging a lot easier.
B
You can key on them, so this is similar to what you would once have done by concatenating several columns together to form your key, but the database will just handle it for you now. Don't ask me why they introduced this 'frozen' term; apparently it's not a marketing tie-in with Disney. It's something to do with keeping the type stable in future versions, so schema evolution is going to be an issue here, I think. For simple types that aren't likely to evolve, like a point, you're probably on safe ground.
B
It's less clear to me, frankly, how this is going to work with complex types, like mapping entire JSON documents into a type in the database, where you need to evolve the schema over time. But right now it's a great start for us: we can define points and work with them natively in the database. Points, though, only really allow us to do exact-match lookups. If events all happen at exactly the same point, or group of points, that's fine, but events are kind of messy and happen all over the place.
B
So we need a new construct: an area, or bounding box. I've chosen to express that as a pair of points, the two diagonally opposite corners of the box. There are other ways to represent boxes, of course, but that one's the simplest, and it shows that you can now nest these types in the database. So you can have arbitrarily complex structures built up from other types, and of course you can include primitives inside these types as well.
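A sketch of that nesting, again with illustrative names. Note that in Cassandra 2.1 a user-defined type nested inside another type has to be marked frozen, which comes up again in a moment:

    CREATE TYPE boundingbox (
        min frozen<point>,   -- one corner
        max frozen<point>    -- the diagonally opposite corner
    );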
B
So that gives us the two basic constructs we need in the database. Now, how do we actually build an index out of these? Well, we're going to take our coordinate plane, whatever that is (if it's the surface of the earth, for example, we've got latitude and longitude), and that gives us essentially a grid we can break it down into. We're going to divide up the surface into what is essentially a piece of graph paper, a piece of squared paper.
B
This allows us to take a vast number of points and map them down into a smaller, finite number of buckets, or regions. So our index is going to look like: the region, which is the container for the events; the specific location, the point where they actually happened within that region; and then some bit of data. If we're treating this as a completely denormalized index-organized table, the data will be a serialization of our entire event structure, but it could equally well just be a UUID key into another table.
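Put together, the index table could look something like this sketch, where the region key is just an opaque grid-cell identifier:

    CREATE TABLE spatial_index (
        region   text,             -- grid cell the event falls in (partition key)
        event_id timeuuid,         -- clustering column: unique and time-ordered
        location frozen<point>,    -- the exact point within the region
        data     blob,             -- serialized event, or a UUID key into another table
        PRIMARY KEY (region, event_id)
    );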
B
So this is the fundamental building block: what we've got here is essentially a distributed index. Any particular event is going to be assigned to a region according to where it happened, and that region in turn is a primary key, so it's assigned to some part of the Cassandra ring, a physical node in the ring.
B
So now we have all this data inside Cassandra, and we need to be able to search it. For many applications, like the demo we saw earlier, what you want is a visualization: you've got a particular area of interest, the bounding box that defines the area in which you want to retrieve events, and you need to find out which regions in the underlying grid, in the indexes you've got, correspond to it.
B
So you need some kind of function in your client that will calculate which cells in the index, which regions, overlap this bounding box of interest. That gives you back a list of regions, and then you perform an IN query against Cassandra. This is just a query where you say: give me back all the data from my index as long as the condition holds that the region for the event matches one of the regions of interest in my list here.
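For a fixed-resolution grid, that client-side function is just arithmetic. A minimal sketch in Java, assuming cells are CELL degrees on a side and keyed as "row:col" strings; all names here are hypothetical:

    import java.util.ArrayList;
    import java.util.List;

    public class GridMath {
        static final double CELL = 1.0;   // grid resolution in degrees; the key tuning parameter

        // Return the keys of every grid cell that overlaps the query box.
        public static List<String> overlappingRegions(double minLat, double minLon,
                                                      double maxLat, double maxLon) {
            List<String> regions = new ArrayList<>();
            for (int row = (int) Math.floor(minLat / CELL); row <= (int) Math.floor(maxLat / CELL); row++)
                for (int col = (int) Math.floor(minLon / CELL); col <= (int) Math.floor(maxLon / CELL); col++)
                    regions.add(row + ":" + col);
            return regions;
        }
    }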
B
The conventional way to do this is with a single statement, as you see there on the screen, and that fans out from the coordinator: the client sends the coordinator a single statement, and the coordinator breaks it down into pieces and sends them to each of the nodes that owns one of the regions. Potentially that's up to as many nodes as there are regions you're querying, so you get some degree of parallelism there.
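In other words, something along these lines, with the region keys invented for the example:

    SELECT location, data
    FROM spatial_index
    WHERE region IN ('51:-1', '51:0', '52:-1', '52:0');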
B
The alternative is to decompose the query in the client and issue the per-region queries asynchronously, instead of having to wait for the coordinator to assemble all the results from all the nodes, where latency is obviously bounded by the slowest node. We saw a great reduction in tail latency here. Unfortunately, we also saw an explosion in the amount of code we had to write in the application. There is an asynchronous API, and there's a fairly good blog on the DataStax website about how to use it if you're using the Java driver, but it does involve some additional plumbing which, in my view, really requires understanding the driver.
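A rough sketch of that decomposition with the DataStax Java driver (2.x era): fire one statement per region with executeAsync, then gather the futures. With a token-aware load-balancing policy, each single-partition statement can be routed straight to a node that owns the data. Table and key names follow the earlier sketches:

    import com.datastax.driver.core.*;
    import java.util.ArrayList;
    import java.util.List;

    public class RegionFetcher {
        // Fetch all rows for the given regions, one async query per region.
        public static List<Row> fetchRegions(Session session, List<String> regions) {
            PreparedStatement ps = session.prepare(
                "SELECT location, data FROM spatial_index WHERE region = ?");
            List<ResultSetFuture> futures = new ArrayList<>();
            for (String region : regions)
                futures.add(session.executeAsync(ps.bind(region)));   // fire them all first
            List<Row> rows = new ArrayList<>();
            for (ResultSetFuture f : futures)                         // then collect results
                for (Row row : f.getUninterruptibly())
                    rows.add(row);
            return rows;
        }
    }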
B
So we need to do some further filtering here. We need an operation that says: yes, we know it's in the right region, but part of that region overlaps the area of interest and part of it doesn't, so we need to filter a bit further. We need this 'contains' operation, and ideally we'd like to do it on the server side, because that saves us having to move all this data to the client, and also we've got much more processing resource on the server side.
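The predicate itself is trivial, which is what makes shipping whole rows to the client just to apply it feel so wasteful. As a client-side fallback it is simply:

    // Keep a point only if it falls inside the area of interest; the goal
    // discussed here is to run this same test server-side instead.
    public static boolean contains(double minLat, double minLon,
                                   double maxLat, double maxLon,
                                   double lat, double lon) {
        return lat >= minLat && lat <= maxLat
            && lon >= minLon && lon <= maxLon;
    }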
B
We've got a parallel cluster there, whereas the client is a single node and would have to do all the work itself. So, theoretically, we should get much better performance if we can push this to the server, but there's no native way to do it yet. If you were here in the previous session, you'll have seen a preview of functionality that's coming down the line to do this using user-defined functions.
B
It's not there yet, but I'm kind of impatient, so I started poking around to try and figure out how to do this in the existing code base, and it turns out there's an extension point called QueryHandler. Has anyone actually played with this, or was even aware of it? A few of you. Okay, that's about what I expected: it's not been well publicized, but it's very powerful.
B
It allows you to intercept the CQL string that's been sent to the coordinator and essentially rewrite it, so you could completely replace CQL if you liked. What I wanted to do was simply extend CQL with a new keyword, a new method effectively, that would allow me to pass the filtering parameters down to the node that's handling the coordination and the CQL parsing, which, because I'm using the asynchronous API and decomposing on the client side, is also the data-owning node.
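In outline, the hook looks roughly like the sketch below. The QueryHandler interface and the cassandra.custom_query_handler_class system property are real (Cassandra 2.x era), but the method signatures here are approximate and the rewrite logic is purely illustrative; check the interface in your exact Cassandra version before relying on it:

    import org.apache.cassandra.cql3.*;
    import org.apache.cassandra.exceptions.*;
    import org.apache.cassandra.service.QueryState;
    import org.apache.cassandra.transport.messages.ResultMessage;

    // Registered by starting each node with:
    //   -Dcassandra.custom_query_handler_class=com.example.GeoQueryHandler
    // Rough outline only: the real interface also has prepare(),
    // processPrepared() and processBatch(), which can delegate similarly.
    public class GeoQueryHandler implements QueryHandler {
        public ResultMessage process(String query, QueryState state, QueryOptions options)
                throws RequestExecutionException, RequestValidationException {
            // Spot our CQL extension (say, a trailing CONTAINS(...) clause),
            // strip it back to standard CQL, delegate to the normal processor,
            // then filter the rows before they are returned to the client.
            String standardCql = stripContainsClause(query);   // illustrative helper
            return QueryProcessor.instance.process(standardCql, state, options);
        }

        private String stripContainsClause(String query) { return query; /* sketch */ }
    }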
B
That involves a fair bit of work currently; hopefully it'll be much, much less work once we have user-defined functions in version 3. But it's worth it: if you have this kind of performance issue, this kind of problem where you want to do some additional filtering and you can't currently express it in CQL, it's well worth looking at what you can do to move that to the server, because it will bring down your tail latency substantially.
B
The reason there's such a big range here, from a very modest improvement to quite a substantial one, is to do with the resolution of the grid. If you have a grid where the cells are roughly the same size as the area of interest you're querying against, that's great; but in some cases you can have cells in the grid that are much, much larger, and in that case a lot of each cell doesn't overlap, so you can eliminate quite a lot of data at the server side.
B
With this server-side filtering technique, those are the cases where you see a much bigger improvement, so what you're really measuring is the proportion of data that you can eliminate at the server. And of course, once you have this technique, you're no longer limited to the contains method that tests whether the point is inside the area of interest: you could push additional filters down to the server, filters based on some other characteristic of the event, such as the user or the time.
B
So we see that there's a huge range of results we can get, according to what the grid resolution is. Clearly this is a critical tuning parameter for this kind of application, and the fundamental trade-off is that the number of grid cells determines your parallelism: because you're dealing with wide rows in the database, your partition key is the grid cell.
B
So essentially each disk seek, each unit of disk scanning, is a grid cell. From that point of view, more cells is better: you've got a lot of irrelevant data in big cells, and if you make the cells smaller you've got less data to filter out and therefore it's quicker, because you're reading less off disk, and reading data off disk and then just discarding it is kind of a waste. But at the same time, you want to keep the number of disk seeks down.
B
You don't want a huge number of cells, because your disk head will be jumping around all over the place, and that's going to slow you down a lot; it's typically the bottleneck in Cassandra read operations. So from that point of view, fewer cells is better. You still want enough to keep all your disk spindles busy, but not really more than that: you don't want to be doing more than one seek per disk per query if you can help it.
B
So what you'll notice as you try to build this kind of application is that some of them have a natural resolution that suits them quite well, but more often than not it's query-dependent, as with so many things in Cassandra. So again, denormalization is the way to go: Cassandra is built on the idea that you should have a column family that suits each of your queries, that structures the data in a way that suits the particular application or sub-application.
B
So look at ways you can pre-aggregate the data. I was on a property website the other day, for example, looking at new houses, and one of the things it does in cluttered city-centre locations is, if there's an apartment building, pre-aggregate all the places for sale in that building into a single marker on the map. So it's a two-level aggregation.
B
You can see on the map which buildings have apartments for sale, and then you click on one and it brings up a separate tab with the list of properties. That kind of technique can help reduce the clutter in your visualization, and it reduces the number of items in the index as well: you've effectively got a two-level index, split between things that are rendered spatially and things that are tabulated separately.
B
The other reason you don't want to make those grid cells too big is the write load. If you have a grid cell that spans, say, a big city, and you're doing social media analytics, a dense population centre like that is going to generate a lot of data, and it's all going to go to one physical node in the ring, and that's going to cause problems. So you sometimes want to break busy cells like that down into smaller sub-cells, to help spread the load across the cluster and avoid hotspots.
B
This skew is a huge problem for geospatial applications, particularly anything with user-generated data; just think about the cities versus the oceans. You've got huge tracts of the Earth's surface where basically nothing happens in social media terms, because there aren't any people there, and then you've got cities like London and New York where a lot happens. So a fixed-size grid suits visualizations quite well; it's the right technique to use if you're rendering things on a map at a particular resolution. But it's not always the best structure, for the load reasons we just saw.
B
So sometimes tying these different-resolution grids together into a tree structure is worth considering. If you've studied computer science, your idea of a tree is probably based on the binary tree: each parent node has two child nodes, and they're balanced, and the reason they're balanced is that the performance of the tree is proportional to its depth.
B
The kind of tree structures we use for geospatial data differ a little. First of all, you're going to divide the tree based on this nested grid. So your top-level grid is going to be, well, let's use geohashing, just 32 cells, and each of those is going to have 32 sub-cells, and so on. So you're not necessarily dealing with trees of two children; it can be anywhere from two up to thirty-two or more, according to how you want to build your grid. But also, they're no longer balanced.
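Geohashing is convenient here precisely because the levels nest by string prefix: each extra character subdivides a cell into 32 children, so a cell's parent is just its key minus the last character. A compact sketch of the standard encoder:

    public class Geohash {
        private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

        // Standard geohash: interleave longitude/latitude bits, 5 bits per character.
        public static String encode(double lat, double lon, int precision) {
            double minLat = -90, maxLat = 90, minLon = -180, maxLon = 180;
            StringBuilder hash = new StringBuilder();
            boolean lonBit = true;   // geohash starts with a longitude bit
            int bits = 0, ch = 0;
            while (hash.length() < precision) {
                if (lonBit) {
                    double mid = (minLon + maxLon) / 2;
                    ch <<= 1;
                    if (lon >= mid) { ch |= 1; minLon = mid; } else { maxLon = mid; }
                } else {
                    double mid = (minLat + maxLat) / 2;
                    ch <<= 1;
                    if (lat >= mid) { ch |= 1; minLat = mid; } else { maxLat = mid; }
                }
                lonBit = !lonBit;
                if (++bits == 5) { hash.append(BASE32.charAt(ch)); bits = 0; ch = 0; }
            }
            return hash.toString();
        }
    }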
B
In areas where you have a lot of data, you want a deep tree, to help split that data amongst several nodes; in areas where next to nothing happens, you can afford to keep it in big lumps, because there's relatively little of it. This is going to add some complexity to your application. It's one of those things you do at the end, if your performance isn't where you want it to be; an optimization technique you bring in, because it does add a lot of extra work to the code, and the reason for that is the tree splits.
B
You have a node; it starts off empty; you fill it up with data; and at some point you say, okay, this one's getting too big. Now it represents the geographic area of a big city and I've got lots and lots of data for it, so I want to split it into two or n sub-cells. In a k-d tree you're splitting two ways, in a quadtree four ways, with geohash thirty-two: take your pick, there's not a lot to choose between them in performance terms.
B
But to do that split, because the structure is distributed, you need some kind of locking or caching mechanism, and this is where the extra complexity comes and bites you. You can do distributed locking, but it tends to be rather slow; it's not my preferred method. My applications tend to be able to tolerate a little bit of temporary inconsistency, so I go for a caching solution, where I keep the tree structure in the client.
B
So when a split happens, there's a period of time during which a client has a stale cache: it only sees the parent node until it refreshes its cache, and it's going to keep writing to that parent and querying it for information. Meanwhile, other clients with different cache timeouts might already have started writing new data to the child nodes, and that's a problem, because my client isn't going to see those writes; it's not looking at the children.
B
I can actually solve that one using a vertical slice through the tree. Because I've got all this lovely parallelism, I can use asynchronous queries in the API and simply query all the different resolution grids at once: the parent and, potentially, the children.
B
If the children don't exist yet, because there's been no need to create them, Cassandra will simply return an empty result set, and that's very cheap: the bloom filters will just say there's nothing there. Because we're keying on the grid cell as the partition key, the bloom filters help us out a lot. So the read side is not too bad; we just pay a small extra CPU and network cost to have Cassandra handle it using the bloom filters.
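A sketch of that read-side vertical slice, reusing the prefix property of geohash keys and the async pattern from earlier (names are hypothetical): query every depth at once and let the levels that were never split bounce cheaply off the bloom filters:

    import com.datastax.driver.core.*;
    import java.util.ArrayList;
    import java.util.List;

    public class VerticalSlice {
        // One async single-partition query per tree level; empty levels cost
        // almost nothing because the bloom filters reject them immediately.
        public static List<ResultSetFuture> queryAllDepths(Session session,
                                                           PreparedStatement byRegion,
                                                           String finestCell, int minDepth) {
            List<ResultSetFuture> futures = new ArrayList<>();
            for (int depth = minDepth; depth <= finestCell.length(); depth++)
                futures.add(session.executeAsync(byRegion.bind(finestCell.substring(0, depth))));
            return futures;
        }
    }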
B
The problem is the writes, because you need to write to the correct depth in the tree, and if some other node has performed a split and created child nodes while your cache is stale, you're still writing to the parent. You don't want to check the structure of the tree on every write; that's very, very expensive, and read-before-write is considered an anti-pattern in Cassandra, so you want to avoid it if possible. There are various ways to do that.
B
If you happen to have some kind of event bus set up to your clients already, you can simply send out a notification that the tree structure has changed. I did it by polling instead: I've got a cache in the client; I write to the parent node until the cache timeout, and then I go and refresh my tree structure from the database and start writing to the children if they exist. Because of this, and because the reads go through a vertical slice of the tree, writes that landed at a stale level still get picked up.
B
So, as you can tell, it's kind of hairy; it's complicated to code, but it does work, and this index design plays to the eventual-consistency strengths of Cassandra. Whew, okay, last topic: cloud. No presentation is complete without some mention of cloud, but this one's rather obscure. This is a ship called the Flying Cloud. Does anyone know what she's famous for?
B
No? I didn't think so; nor did I until a little while ago. The Flying Cloud was a sailing clipper in the 1850s, and she became famous for setting the speed record from New York to San Francisco, which in those days meant going all the way down one side of the continent, around Cape Horn, and all the way up the other side. It took an average of 200 days; not exactly a pleasure cruise.
B
The Flying Cloud did it in 89 days, less than half the average time. And this is relevant, I promise. It's relevant because the way she did it was based on the work of Matthew Maury. He was a sailor, although not on that ship; he worked in the Depot of Charts and Instruments for the US Navy, what's now the Naval Observatory, and they were essentially the IT department of their day.
B
They looked after all the data for the Navy: its clocks, its chronometers, its charts. But it was also the repository for the logbooks from all the voyages, so Matthew Maury, whom we would now call a data scientist, went through all these logbooks and aggregated all the observations of wind and weather, and the data set is actually still available today. Here's a visualization I found on the web. This is a great example of skew.
B
It's a great example of a game-changing use of data analytics, long before Cassandra and Hadoop and all the modern tools we take for granted: this was all done with pen and paper. I rather like that story. I like the idea that data analytics can radically change the domain it's operating over. So, a quick wrap-up, and then I'll let you go to lunch, I promise.
B
In my view, the features we have now in CQL, and the forthcoming features, particularly in terms of distributed indexes and global indexes, can make Cassandra a very good platform for geospatial indexing. Getting the performance does still take a little bit of work; you might be lucky and not need it.
B
Do the simplest thing that you think will work, and then start adding the complexity a piece at a time until you get it to perform well. And keep in mind what Matthew Maury did: he built a data analytics app, and the record it enabled the Flying Cloud to set stood for a hundred and thirty-six years. If you get anywhere close to that, you're doing really well; if you manage to build something that gets you remembered at a conference next century, great. Alright, that's it for me. Thank you very much.
A
B
If you have a vast amount of data, if you're trying to index, say, the entire tweet firehose, and by some miracle all of it were geo-referenced (it currently isn't; a proportion of it is, and we're a few years away from all of it having geolocation data on it), at that point, yeah, you probably do need the tree structure. But come back next year: we'll have a new suite of tools in Cassandra to do more advanced versions of this, and it'll get easier over time. Right now it's early days; you have to build more yourself.
B
Pushing more of this functionality into the engine? Okay, so there are generic things that are not specific to geospatial, usage patterns that work regardless. Take the Java driver's support for parallel queries: I would love to be able to hand it an IN query plus a flag that says "decompose this on the client and handle it for me", rather than being forced to write all that code with futures myself. That kind of thing, I think, is a great candidate for pushing into the driver or into the engine.
B
It would be a low priority for the core developers right now; I think that might change over time, as more and more people start saying "I've got geospatial data, what can you do to help me?". But it might be that we instead see some kind of rich client library growing up, so this thing, if I open-source it, might evolve into the de facto client add-on for doing this kind of work.
C
B
You're still paying to read off disk. There are several bits of I/O: the disk read itself, which obviously is expensive, particularly for spinning disks; a network transfer from there to the coordinator; and then the transfer from the coordinator to the client. Now, because I'm using this decomposition pattern in the client, where I take my IN clause and break it down into queries that the driver then sends direct to each data-owning node, I've eliminated the hops through the coordinator already, so that network I/O is gone. But yes, I'm still pulling things off disk and then throwing some of them away, and I'm getting away with that because it's still a single seek and a sequential read before I throw things out. It would be worse if I were doing random seeks and then throwing away the results of those.
C
B
Yeah. So what you see happening sometimes is that if your grid is at too fine a resolution, you have vast numbers, thousands and thousands, of potential buckets, and a lot of them are very, very sparse: maybe one item, or maybe they don't exist at all. Now, if they don't exist at all, they bounce off the bloom filter, so you don't pay a disk cost for those.
B
You still pay a network cost, but it's cheap. If the cells do exist but have very, very little data in them, then you've degenerated to the case where you're seeking all over the place, and that's nasty. So you don't want to make the grid too fine, and that's where having the structure of the tree cached really helps you: the parts where you need the tree to be deep, because there's a lot of data in them, can be deep, but you don't make it universally deep, because lots of it would be sparse.
B
No, because it's spread out: the tree is distributed across the cluster, so the server doesn't have any inherent knowledge of the tree. It knows the individual nodes of the tree, because they're the partition keys, but it has no understanding of what a tree is or how the bits relate. That intelligence is all in the client.
B
There is a pattern in which you make the coordinator smarter, so you make it in charge of the tree; you promote it to being the client of your application, basically. In the same way that in CQL you send the IN clause to the coordinator and the coordinator breaks it down and sends it to the data owners, you could do this the same way: put the intelligence in the coordinator, send it the geospatial query, and have it understand the tree.
B
That's
great,
if
your
clients
a
short-lived
or
if
they're
memory
constrained,
because
they
don't
need
to
cache
the
tree
structure
anymore,
they
can
get
all
the
performance
just
by
relying
on
the
coordinator
to
do
it
personally.
I
would
follow
that
pattern
by
having
an
app
server
somewhere.
That
was
the
client
and
having
dumb
clients.
That
talked,
you
know
rest
or
something
to
the
app
server,
which
then
talked
to
Cassandra
for
me,
but
that's
because
I'm,
a
middleware
guy
and
I
I
do
absolutely
right.
Your
mileage
may
vary
depending
on
your
your
environment,.