From YouTube: NYC* 2013 - "Building a Scalable Time-Series Database with Cassandra" at BlueMountain Capital
Description
Speakers: Jake Luciani and Carl Yeksigian, BlueMountain Capital
SlideShare: http://www.slideshare.net/planetcassandra/nyc-tech-day
This talk will focus on our approach to building a scalable time-series database for financial data using Cassandra 1.2 and CQL3. We will discuss how we deal with a heavy mix of reads and writes, as well as how we monitor and track the performance of the system.
A: Alright, so the data that we store is financial time-series data. It can be tick-level data, or it can be data that the traders actually put in, so it can be very sparse — but it's time-series data, so it looks kind of like this: a bunch of discrete points at many intervals of time. You want to be able to store both of these efficiently, and you want to be able to query them efficiently.
So what are the queries that our users ask us? There's a time-series query and there's a cross-section query. The time-series query is basically: here's a start, here's an end, here's a periodicity, and between the start and the end, at certain intervals, I want data. So here the start is 10 a.m., the end is 2 p.m., and we want it at every one minute — the start, the end, and the periodicity define the query.
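The start/end/periodicity query described above can be sketched as expanding the query into the sample timestamps it asks for — a minimal illustration, not their implementation; the function name and dates are mine:

```python
from datetime import datetime, timedelta

def sample_times(start, end, period):
    """Expand a (start, end, periodicity) query into the sample
    timestamps the service must answer for, endpoints inclusive."""
    t, out = start, []
    while t <= end:
        out.append(t)
        t += period
    return out

# 10 a.m. to 2 p.m. at one-minute periodicity -> 241 sample points
times = sample_times(datetime(2013, 3, 20, 10, 0),
                     datetime(2013, 3, 20, 14, 0),
                     timedelta(minutes=1))
print(len(times))  # 241
```

The service then has to produce one value per sample timestamp, which is where the filtering discussed later comes in.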
In this one there are two different data points, the Microsoft and the Apple price, and we're asking for the data as of 11 a.m. — that as-of time is the only component of the query. Cross-sections are for random data: we don't know ahead of time what all the components of a cross-section query are going to be. So if we optimized for the cross-section, that means we're storing thousands of writes, and we can get inconsistent queries across the writes — and we also need bitemporality.
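The cross-section query — latest value per series at or before a single as-of time — can be sketched like this. The series names and toy data are hypothetical, and timestamps are minutes-since-midnight for brevity:

```python
import bisect

# Toy cross-section input: per series, a sorted list of (timestamp, value).
series = {
    "MSFT.last": [(600, 28.1), (655, 28.2), (700, 28.3)],
    "AAPL.last": [(601, 455.0), (659, 454.2)],
}

def cross_section(series, as_of):
    """For each series, the latest point at or before `as_of` --
    the as-of time is the only component of the query."""
    out = {}
    for name, points in series.items():
        i = bisect.bisect_right([t for t, _ in points], as_of)
        if i:
            out[name] = points[i - 1][1]
    return out

print(cross_section(series, 660))  # as of 11:00
# {'MSFT.last': 28.2, 'AAPL.last': 454.2}
```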
So we can't optimize for both cases. Let's optimize for the time series — it's a really simple data model to put into Cassandra. This is how it gets stored; this is how you would represent it in Cassandra 1.1. Basically you have the ticker, you have the name, the two times, and then the value — so, Apple's last price for two days ago, as of yesterday, or when we found out about yesterday's last price.
In CQL3 it's a pretty simple table — it was a pretty simple mapping from the data that we want to store to the CQL3 that we want to write. Here we have the ID, which we store as a binary; the property, which would be last price; we store ticks, because we have to represent dates somehow; and then we have the value, which is just bytes.
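The table they describe behaves like a sorted map keyed by (id, property) with rows clustered by ticks. A toy stand-in — the class and exact key shape are my sketch, not their schema:

```python
import bisect

class TickTable:
    """Toy stand-in for a CQL3 table shaped like
    PRIMARY KEY ((id, property), ticks) with a bytes value.
    Clustering by ticks keeps each partition sorted, which is
    what makes the start/end slice query cheap."""
    def __init__(self):
        self.partitions = {}  # (id, property) -> sorted [(ticks, value)]

    def put(self, id_, prop, ticks, value):
        row = self.partitions.setdefault((id_, prop), [])
        bisect.insort(row, (ticks, value))

    def slice(self, id_, prop, start, end):
        """Like: SELECT ... WHERE id=? AND property=?
                 AND ticks >= ? AND ticks <= ?"""
        row = self.partitions.get((id_, prop), [])
        lo = bisect.bisect_left(row, (start, b""))
        hi = bisect.bisect_right(row, (end, b"\xff" * 8))
        return row[lo:hi]

t = TickTable()
t.put(b"AAPL", "last", 100, b"\x01")
t.put(b"AAPL", "last", 200, b"\x02")
t.put(b"AAPL", "last", 300, b"\x03")
print(t.slice(b"AAPL", "last", 100, 200))  # the first two points
```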
So this is the time-series query that we want to write. It doesn't even include periodicity — it's just a start and an end: give me all the data in between those. And then here's the cross-section that we want to write: given some as-of point, give me the last knowledge time that we know. So we're reading way too much data back. We get every single point between the start and the end, even if there are, you know, a million knowledge times — if we rewrote the data a million times and we only care about it at every one minute, we would get all million points back and we have to filter it down to one, and we have to filter on the knowledge time as well.
We want to be able to know what we knew — we want to know the last point that we knew about. And we're building a service, not an app: our users are running their applications on a huge grid and they're hammering us very quickly. Basically they want to go as quickly as possible, and we're the limiting factor. So we have our service, Olympus, on top of Cassandra, and that's what does the filtering for us.
Here's the type of filtering that we have to do: we filter everything by knowledge time, and we filter the time-series queries by periodicity. Cassandra gives us back a ton more data than we actually want to use, so we're filtering — right now we filter 200,000 points down to the 300 that we actually return.
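The two filters described above — knowledge time and periodicity — can be sketched together. This is my illustration of the idea, not their service code; per periodicity bucket it keeps the latest value time, at its latest knowledge time at or before the as-of point:

```python
def filter_points(points, start, end, period, as_of):
    """Service-side filtering sketch. `points` are
    (value_time, knowledge_time, value) tuples as returned raw by
    the store; keep one point per periodicity bucket."""
    best = {}
    for vt, kt, val in points:
        if not (start <= vt <= end) or kt > as_of:
            continue  # knowledge-time filter
        bucket = (vt - start) // period
        # within a bucket, prefer later value time, then later knowledge time
        if bucket not in best or (vt, kt) > best[bucket][:2]:
            best[bucket] = (vt, kt, val)
    return [best[b] for b in sorted(best)]

raw = [
    (0, 1, "a"), (0, 5, "a'"),   # same value time, corrected later
    (30, 2, "b"), (61, 3, "c"),
]
# one-minute periodicity (60s), everything known as of kt=10
print(filter_points(raw, 0, 120, 60, 10))  # [(30, 2, 'b'), (61, 3, 'c')]
```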
One thing that we've been discussing is push-down filters: basically, instead of having the service layer do the filtering, we would push that down into the Cassandra layer. We do downsamples on write. We would also have the periodicity handled on the coordinator, so rather than having it go back to the service to be done, the coordinator would be able to do it with its local cache. And the values that we're storing aren't all doubles — we want to store blobs, basically.
Some values belong together; sometimes we have some complex value that we want to store. So we use Thrift for this: Thrift gives us a typed, extensible schema, and the union types give us an easy way to deserialize. This is an example of a Thrift union — we have, say, two ints, and then a double. A value can be one of those three, but it can't be all three, so getting it back out is unambiguous.
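The Thrift IDL itself isn't reproduced in the transcript, so as a sketch of the one-of-N semantics a Thrift union gives you — field names here are hypothetical:

```python
class Value:
    """Sketch of Thrift-union semantics: a value is exactly one of
    i32 / i64 / double, never more than one at a time, and reading
    the wrong arm is an error rather than a silent coercion."""
    __slots__ = ("kind", "value")
    _KINDS = ("i32", "i64", "dbl")

    def __init__(self, kind, value):
        if kind not in self._KINDS:
            raise ValueError(f"unknown union field {kind!r}")
        self.kind, self.value = kind, value

    def get(self, kind):
        if kind != self.kind:
            raise TypeError(f"union holds {self.kind}, not {kind}")
        return self.value

v = Value("dbl", 455.72)
print(v.get("dbl"))  # 455.72
```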
B: Thanks. But that was the easy part, right — all this data modeling for our consumers. The really hard part, as anyone who's used this knows, is scaling: figuring out how this is going to support our business requirements. And the first rule of scaling — the one that everyone always tries and that never works — is: "I'm just going to make everything go as fast as possible; I'm going to buy the biggest machines, and I'm going to set all the settings to 200 trillion, and I'll never have a problem at any scale." That never really works. The real key to scaling this type of system is that you have to think about what is the right hardware for this workload.
How can we deal with the JVM? It ends up being ninety-five percent of the unknowns, because for every different workload you're going to have different kinds of heap fragmentation and all sorts of other problems that you may not have on other workloads. So you have to think about: how do I read this data? How do I write it? How frequently does that happen, and what are the typical queries that I can tune for?
The next thing is that you have to tune Cassandra for your workload. Cassandra comes out of the box with some really sensible defaults, but you can't really go to production with those default settings, because there are so many factors — and so many knobs that Cassandra gives you.
So the first rule is: you can't fix what you can't measure. What we do is use this great open source project called Riemann — if you haven't heard of it, you should definitely check it out. You can push metrics into Riemann, and it's basically an event processing system: you can build alerts, you can build aggregations, you can proxy off to systems like Graphite, and you can build some really useful things.
Every metric that's tracked inside of Cassandra gets pushed out to Riemann, and what's cool about Riemann is that you get two different views right out of the box. The first is that you can push metrics into its dashboard: you can build these real-time dashboards of what the latest metric is.
It uses WebSockets, so it all runs in real time in the browser, and you can configure these dashboards and save them down, and set up dashboards for all sorts of things — you can see what the current load is, what the current everything is. And at the same time, Riemann will push everything off to Graphite after it goes through the streams.
When you configure streams, you can basically say: any metric that has the word "cassandra" in it, send to this Graphite server; any metric that has "myapp" in it, send to that Graphite server — or send it to both. And as you can see in Graphite, having these two together means you can answer questions like: what's going on right now?
We wrote a C# driver; there are Java drivers — drivers for every language — and there's protocol buffers support, so you can write your own little application metrics. If you're writing an application that queries Cassandra, does something with the result, and sends it back out, you build a metric around that task as well, and it gets pushed out alongside everything else. So you can build your own dashboards in one single integrated system. It's really, really valuable — it's helped us figure out a lot of issues as we hit them.
The other big tool I want to push is VisualVM. I don't know how I didn't know about this, being a Java developer, but it's the tool that comes with Sun's JDK, along with a process called jstatd — and if you start that up, you get all these real-time metrics out of the tool. You can basically connect to any Java process, and you can see things like
the Eden space filling up; you can watch objects get promoted, watch the old gen fill up, and watch all of the garbage collections happening. It gives you the ability to tweak some garbage collection settings, watch what happens, and come back — you don't necessarily have to have it all worked out in advance. So it's a really useful tool. You can also profile the application live.
You can do all sorts of really cool things, and it has plugins built in — highly recommended. So, on to actual scaling. Those are the tools that we use to help figure out what's going on and how we can make it better. The next thing is some of our machine setup: we're using SSDs for all the hot data.
Currently all of our data is hot, but over time, as the years go on, we're going to be able to move data off of SSD. We have a JBOD config — which means "just a bunch of disks" — instead of having something like a RAID 0. If you have RAID 0, then basically if you lose one disk, the whole node is gone.
If you do something like RAID 5, then you're losing, you know, thirty percent of your disk space. JBOD is just the idea that you have a bunch of different mount points and you randomly throw data onto random drives, so if one of those drives dies, all the other drives are still working. Cassandra 1.2 has this built in now. It works really well — it's actually helped us a bunch of times; we've lost more than one.
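The JBOD placement idea can be reduced to a few lines — a toy model of the behavior described, not Cassandra's actual disk allocator; mount names are made up:

```python
import random

class Jbod:
    """Toy JBOD placement: each new SSTable goes to a random healthy
    mount point, so losing one disk loses only that disk's files,
    not the whole node (as RAID 0 striping would)."""
    def __init__(self, mounts):
        self.disks = {m: [] for m in mounts}

    def write(self, sstable):
        mount = random.choice(list(self.disks))
        self.disks[mount].append(sstable)
        return mount

    def fail(self, mount):
        # everything on the other mounts stays readable
        return self.disks.pop(mount)

jbod = Jbod(["/data1", "/data2", "/data3"])
for i in range(9):
    jbod.write(f"sstable-{i}")
lost = jbod.fail("/data2")
surviving = sum(len(v) for v in jbod.disks.values())
print(len(lost) + surviving)  # always 9: a disk failure, not a node failure
```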
Another thing I recommend if you're using SSDs: you need as many cores as you can get your hands on, because it's very CPU-limited. The I/O falls away, because you have all these really fast seeks, and all of a sudden the bottleneck becomes CPU. Cassandra's built to run with very high concurrency — the JVM does a great job with it — so I would definitely put in as many cores as you can get.
We use 10-gig network cards and jumbo frames, which is the idea that you try to shove as much information as you can into each TCP packet. Yeah — so JBOD's been a lifesaver. I actually wanted to put some information in here: there was a BIOS update from our hardware vendor that we didn't know about ahead of time. They pushed that BIOS update and basically said: this fixes a problem where, when you do lots of sequential reads for a long period of time,
you could end up losing the drive — which is exactly what happens in a compaction, right. So all of a sudden, we were doing compactions over a weekend and these drives started failing, and we were like, what's going on? It turns out it was this bug — but JBOD worked around it, so at least it got most of the way through.
We installed the BIOS update, and then everything came back and was working fine. The black magic of the JVM — this is the next trick. I put down the sort of tweaks that we've done. We run a 12-gig heap, which is about as high as we could push it. The Eden space is 1.6 gigs. The survivor ratio we actually shrank, because we didn't really have a promotion problem, so it gives us a little more room for heap.
We use compressed oops, which means that if your heap is less than 32 gigs, the JVM doesn't have to use a long for each pointer. And then there's this other thing, which we actually just opened a Cassandra ticket on: there's a feature that should be the default if you use the server flag, but it's not — and it gives us about a 15-percent boost on reads just by adding this TLAB flag. What it is is thread-local allocation: in Java, when you're creating new objects,
if you don't have this turned on, the JVM basically uses one shared system to allocate new things onto the heap; with it turned on, allocation is done per thread. So you get a lot more throughput — things don't get bogged down in locks. Now, configuration changes. This is all very detail-oriented; we wanted to go through and list all the little things that we've set, because I think it can be pretty useful for everyone. Hinted handoff
we set to a single thread, versus the default, with the hundred-kilobyte throttle. This is sort of just what we figured out: when it ran with multiple threads and a larger throttle limit, there was too much CPU being spent and we couldn't really keep up with reads and writes. The memtable size we set to 2048 — this is for a 12-gig heap, so it leaves enough room for the memtables. And reads are what we really wanted to focus on, which we really tried to tune for.
Even though Cassandra is well known as a write-heavy system — or rather, writes work really well — it is really good for reads too; you just have to be very careful and tune things. And in the Cassandra community, we're really focused on trying to make this a top priority: we want reads to be just as fast as writes.
On the server side, for the Thrift service, we use the half-sync/half-async (HsHa) server, since we have this giant compute cluster coming in and hitting these nodes like crazy — they kind of hit it and then stay open, so if you have one thread per request, obviously that's not going to scale. Compaction we've set to four threads for multithreaded compaction. That's a good balance, because we have 16 cores currently, and it leaves four cores for compaction, and
the rest are available to do reads and writes. And we turned off the internode compression, which is new, because it was causing too much GC: basically, each message it gets off the wire, it has to decompress into a buffer — and since we were running on a 10-gig network, what's the point of having compression? Everything's going to be fast; at least, we haven't hit any limits yet. So this is the point of the talk that Jonathan referred to earlier this morning.
Leveled compaction has been adopted by a bunch of systems. What it does is create a certain number of levels, where each level is ten times the size of the previous level, and you fix the size of your SSTables for each level — so each SSTable is going to be,
I think the default is five megabytes; we set it to something like 64 megabytes. In level 1 there will be 10 SSTables, in level 2 there will be one hundred, in level 3 there'll be a thousand, and so on.
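That level arithmetic — fixed SSTable size, each level ten times the previous — works out as follows; a small sketch using the 64 MB setting they mention:

```python
def leveled_layout(levels, sstable_mb=64, fanout=10):
    """Per-level SSTable count and total size under leveled
    compaction: SSTable size is fixed, and every level holds ten
    times as many SSTables as the one before (L1 = 10, L2 = 100, ...)."""
    return [(n, fanout ** n, fanout ** n * sstable_mb)
            for n in range(1, levels + 1)]

for level, count, mb in leveled_layout(3):
    print(f"L{level}: {count} sstables, {mb} MB")
# L1: 10 sstables, 640 MB
# L2: 100 sstables, 6400 MB
# L3: 1000 sstables, 64000 MB
```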
And what that allows you to do is — well, it's not randomly ordered: each level is sorted by row key, so you can actually just ask the leveled manifest, "hey, which SSTables does this row belong to?", and it'll say "these n SSTables should have this row," and then you go check them. That's versus size-tiered, which is kind of exponential — it keeps growing and growing over time, and there have been talks where people have had SSTables of like 300 gigs or something like that.
So in order to work around that, you can use leveled compaction; it's a good workaround for wide rows. It allows us to handle our use case, which is that sometimes we want just a particular point, and sometimes we want a time slice. Now, the problem with leveled compaction — and you can see it in here, in the yellow — is that in levels 1 through 5 we only have to check a couple of SSTables, but level 0, which is sort of the raw, freshly flushed SSTables —
you always have to check all of them. And what ends up happening is that it breaks badly: under high write load, you can't keep up with the compactions in leveled, so you end up with this gigantic effect — you keep writing out these SSTables, the compaction can't keep up, and you end up having more and more and more of these SSTables. It almost defeats the purpose of leveled, because the point of it is that you're trying to limit the number of SSTables,
but in this scenario you'll have to check a huge number every time. So what ends up happening under a high-read, high-write load — which is what we have — is that your reads go way, way down.
So what we decided to do — and it's pretty conceptually simple — is combine the two compaction strategies, so that for level 0 we use size-tiered. As long as an SSTable is in level 0, the leveled compaction hasn't picked it up yet; it's just sort of sitting there. We know all those SSTables are relatively small, because they were just flushed, so we can size-tier them together. It's really not a huge burden on the system, and we end up cutting down on the number of SSTables that we have to read from in level 0. And in the meantime, once one of those larger SSTables does get picked up,
it actually goes faster, because the SSTables are now much larger. To get into level 1, the compaction strategy picks a random 32 level-0 SSTables, so if those are all larger, you end up getting more throughput into the system anyway. So that was the compaction stuff.
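The hybrid strategy reduces to one decision plus a size-tiering pass over level 0. A sketch of the idea, with a deliberately minimal bucketing rule (sizes within a factor of two get merged) — not the actual patch:

```python
def pick_strategy(level):
    """The hybrid described above, reduced to its decision: freshly
    flushed level-0 SSTables get size-tiered together (they're all
    small), while leveled compaction handles every level above."""
    return "size-tiered" if level == 0 else "leveled"

def size_tier(sstable_sizes_mb, bucket_factor=2):
    """Minimal size-tiering: group SSTables whose sizes are within a
    factor of each other, then merge each bucket into one table."""
    buckets = {}
    for size in sorted(sstable_sizes_mb):
        for key in buckets:
            if size <= key * bucket_factor:
                buckets[key].append(size)
                break
        else:
            buckets[size] = [size]
    return [sum(b) for b in buckets.values()]  # merged table sizes

flushed = [60, 64, 62, 61, 500]  # MB; one straggler
print(pick_strategy(0), size_tier(flushed))  # size-tiered [247, 500]
```

Fewer, larger level-0 tables means fewer SSTables per read, and bigger inputs each time leveled promotes a batch into level 1.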
Now, this other issue that we had was compression: a lot of the CPU time is spent on compression. With SSDs you really want to get the most bang for your buck, so we wanted to keep as much data compressed as possible — but it's very CPU-intensive, because you keep rereading the same compressed block over and over, even if it's in the page cache. So there's a faster compression that came out relatively soon after Snappy, called LZ4 — in benchmarks it's forty percent faster than Snappy — and there's a guy who works on Solr who wrote a Java implementation of it. It's really nice.
It has basically a pure Java implementation; it has a Java Unsafe one — Unsafe is this magic API that you're not supposed to know about, but everyone uses — and there's also a pure C version. You can see in this benchmark that Snappy's over on the left, and the Unsafe one, which doesn't require any native hooks, runs at the same speed as Snappy. That's a huge win right there, because then you can run on a lot more platforms — and, at the same time, the JNI one is faster.
We didn't see that in practice, because the blocks that Cassandra compresses are so small and so much time is spent going back and forth through JNI — but it did cut down on the 95th-percentile latency. The overall throughput stayed the same, but our latency dropped a bit, so that's good. And finally, the CRC check is another huge area where a lot of time is spent. When you're profiling and looking at this, the CRC check for each compressed block
currently uses just a pure Java method, which is really slow — it ends up causing a 2x performance hit when you do the CRC check. There's a chance — a percentage chance that you can set per column family — where you can say: I only want ten percent of the block reads to do the CRC check. Now, in Hadoop they actually did a benchmark, and they found that a pure JNI version of the CRC check runs 30x faster than the Java version, so it might make sense to move that into JNI.
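The per-column-family CRC chance is easy to picture — verify the checksum on only a fraction of block reads, trading a little corruption-detection coverage for CPU. A sketch of the idea (the function and error message are mine), using `zlib.crc32`:

```python
import random
import zlib

def checked_read(block, stored_crc, crc_chance=0.1):
    """Sketch of a probabilistic CRC check: only `crc_chance` of
    compressed-block reads pay for checksum verification."""
    if random.random() < crc_chance:
        if zlib.crc32(block) != stored_crc:
            raise IOError("corrupt compressed block")
    return block

data = b"compressed bytes"
crc = zlib.crc32(data)
checked_read(data, crc, crc_chance=1.0)          # always verify: passes
try:
    checked_read(b"flipped bits!", crc, crc_chance=1.0)
except IOError as e:
    print(e)  # corrupt compressed block
```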
I want to throw up our current stats. We currently have 12 nodes in two data centers; we're running at RF 6, which means we have six copies. It's kind of wasteful, but it gives us a guarantee that we can either lose a data center, or lose a node in each data center.
We can do 150,000 writes per second for our data size; we can also do 100,000 reads per second for the cross-section queries. We have over 6 billion points, and uncompressed it's about 2 terabytes. Our latencies are down there below — those are actually our Olympus-level latencies, not even Cassandra's: that's after it comes out of Cassandra and then goes through our service. And that's it. I don't know — probably blew through that, didn't I? All right.
B: We've got 15 minutes, so I want to open it up to questions — or if you want to go back over anything you didn't understand, or anything else. Yeah — so, the patch for the hybrid compaction: the only thing is that it's not, you know, pretty enough, because it makes assumptions — our leveled compaction is set to a 64-megabyte thing, so it actually looks for a fixed number of megabytes available to do the size-tiering.
A: So if we're providing one minute — the last value that we have for that minute — Olympus will take the raw results that Cassandra has, which is every single data point (microsecond-level data points), and it'll roll that up into a single data point for that minute. Then it also does the knowledge-time filtering: we can go back and update a value many times, but we really only care about the last value. So Olympus does both of those components.
B: And the main idea is that all the writes come through the service layer and go out through the service layer, so we can downsample the data as it comes in. Based on the query, we figure out which downsample to talk to — but within that, you might ask for something like every third Tuesday that happens to be at the end of a month, right. So you can't plan for all of those downsample points.
B: Well, yeah — one of the other motivations for LZ4 is that the Snappy implementation we use in Cassandra doesn't work in Java 7. It may end up getting fixed, but in the meantime it's sort of a hard barrier for making Cassandra go to Java 7. I think Jonathan probably mentioned that — Java 7 is going to be the de facto version; I mean, Cassandra already works in Java 7.
But one of the barriers is that all the SSTables — all the column families — have compression turned on by default, using Snappy. Now that this was committed, the new default is going to be LZ4, which works with Java 7. So if you're upgrading from Java 6 to Java 7, you have to recompact all your data with the new compression scheme, and then everything will work great in Java 7.
A: Yeah — so, can moving averages be calculated in Olympus? It's not something that we support right now, but it is something that we will add — and again, it's something where it goes through all this data that we have and provides the filter in the service.
A: Yes — those are the points we went through on Java tuning and tuning Cassandra. It's because when you read and write at the same time, the performance hurts: you're doing in-memory operations — not locks, exactly, but you're reading data that could be being written by another client.
B: If you go back to the earlier slide — this is this whole bitemporal time series; there are actually two dimensions, right. We never actually update points in place; we create a new point with a later knowledge time. What that allows us to do is — because what you want is to be able to go back in time and say: when something happened yesterday, when we had the bad data, this is what happened. So we want to be able to simulate that, or, you know, we also want to have the ability to fix it.
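The append-only bitemporal idea can be sketched in a few lines — corrections add a new point with a later knowledge time instead of overwriting, so you can replay exactly what was known at any earlier moment. The class and the prices are illustrative only:

```python
class BitemporalSeries:
    """Append-only bitemporal sketch: each point carries both a value
    time (when it happened) and a knowledge time (when we learned it)."""
    def __init__(self):
        self.points = []  # (value_time, knowledge_time, value)

    def record(self, value_time, knowledge_time, value):
        self.points.append((value_time, knowledge_time, value))

    def as_of(self, value_time, knowledge_time):
        """The value for `value_time` as it was known at `knowledge_time`."""
        known = [(kt, v) for vt, kt, v in self.points
                 if vt == value_time and kt <= knowledge_time]
        return max(known)[1] if known else None

s = BitemporalSeries()
s.record("2013-03-19", "2013-03-19", 451.0)   # yesterday's (bad) price
s.record("2013-03-19", "2013-03-20", 454.5)   # corrected today, not overwritten
print(s.as_of("2013-03-19", "2013-03-19"))    # 451.0 -- what we knew then
print(s.as_of("2013-03-19", "2013-03-20"))    # 454.5 -- after the fix
```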
B: Yeah — our consumers are much, much larger and greedier than our Cassandra cluster. So what we try to do is optimize: the functional use case works, right — it does what it's supposed to do — but it doesn't always do it the best way.
...one data point or something like that, so we're trying to be smart about when we pull the data. But I think the longer-term goal is — you know, the fact that Cassandra's open source, and we're a strong development shop — we really want to make sure that Cassandra has these functionalities. We're working with the Cassandra committers and everyone else to try to make sure that the idea of being able to push a filter down into Cassandra is something that everyone could use.
Q: I was kind of curious about the hybrid compaction strategy. If you take one of those really super-huge SSTables from level zero and push it up to level one — because of the size limit on SSTables in leveled compaction, I guess you would take that one SSTable and suddenly there are, like, a hundred and fifty or whatever in level one. Is the leveled compaction able to handle that?
A: Yeah, we did — oh, sorry: the question is, did we do a comparison against other tick-level databases? The problem with a lot of them is that they don't answer the other query, the cross-section. They're optimized for the time series but not the cross-section, and we need to be able to do both.