Apache Cassandra Cassandra Summit 2015, 14 Mar 2016


From YouTube: DataStax Startup Panel: Cassandra Summit 2015


B

Hello and good evening, everyone. My name is Matt Pfeil, and I'm one of the two co-founders of DataStax, and welcome to the startup panel. One of the things that we do at DataStax is provide our software for free to startups, and we do that with the goal of getting more people familiar with the software and getting feedback on what's good and bad. Usually they tell us more about the bad than the good, which is actually good. A great quote is: bad news is good news, good news is no news, and no news is bad news.

B

Think about it. Up on this stage, and I'll have everyone introduce themselves in a second, is a group of people who have worked very intimately as a part of that program, as well as with the software, and we're going to keep this relatively free-flowing in terms of some questions about what you've experienced, what you don't like, and some recommendations for the greater audience, and we'll see where it goes.

B

This is a rather intimate audience. If you have any questions or comments, please yell; raise a hand first, but then scream, and we would love to incorporate that feedback. So why don't we go from right to left on our stage and introduce yourselves.

A

Hi, my name is Eric Lubow. I'm the CTO and co-founder of SimpleReach. I've been working with Cassandra since 0.2, which, if I remember correctly, is before Riptano, which is also before DataStax.

A

So I have a very long and storied history with it, and I've watched it grow up over the years. As anybody who goes to my talk tomorrow will hear, I've encountered the absolute worst of it, but it has also enabled my company to do what we do today, so I'm pretty pumped about some of those things.

B

Why don't you give a couple seconds on what your company actually does?

A

Sure. So what my company does is content analytics. We help publishers and brands and agencies and e-commerce companies get the most out of the content that they create or provide.

C

Right, I'm Sam Bisbee, the CTO at Threat Stack, a Boston-based company. We do continuous security monitoring for Linux servers that are typically deployed in clouds like Amazon. The vast majority of our engineering team had never used Cassandra prior to working at Threat Stack, so that was an interesting experience. We had also been scaling up as a time-series data company, trying to shove over a hundred thousand inserts per second into Cassandra, two terabytes a day, and so learning how to operationalize that and then also delete data efficiently was pretty interesting.

D

I'm Ilya, the CTO for Channel IQ, and we do e-commerce intelligence for brands, for compliance monitoring and for where-to-buy solutions. I've been working with DataStax since version 0.8 at another startup. My startup was acquired by Channel IQ, so I've had the privilege of taking our Cassandra knowledge and incorporating it into another team that hasn't had that knowledge.

E

My name's Sebastian Estevez. I actually work for Matt at the startup program. I run the startup program from a technical perspective, so all these guys get help from me on kind of an ad hoc basis, and hopefully some of you guys in the audience also already do, or will.

B

To put more flavor there, Sebastian has an interesting job. DataStax, as a corporation, is an enterprise company, so we sort of count our customers by the hundreds, if not low thousands, whereas with the startup program you have a large number of companies come in, and he is the only technical person, except for you who recently joined, who has worked with all of those hundreds of accounts.

B

So the interesting thing about what Sebastian gets that no one else at DataStax gets is that you see hundreds of customers on a month-by-month basis and you get to solve all their technical issues, so, needless to say, he's a little crazy from time to time. So getting into this, what I'd love to learn a little bit from each of you is: what was your original motivation for getting into Cassandra? What could you not do with something else out there?

B

What could you do better? And why don't we follow that up with: what was your first surprise interacting with the software?

A

First surprise.

A

I'll skip the 0.2 stuff, because way back before 1.0, it was a surprise if anything worked.

A

So as of 1.0, the biggest reason that we actually got into using Cassandra is that we had some serious high-velocity, high-volume injection issues, ingestion issues, rather, that we had to take care of. So we started out with trying to do counters in Redis. Then we looked at Couchbase, we looked at HBase. We looked at basically anything else that could be used to count things.

A

We did, and then quickly found out that the hardest thing to do in computers is count. When you talk to people who are non-technical, they don't believe that, but that's actually, I think, the reality. So we've spent the last couple of years trying to figure out how to make Cassandra count things very well, and I think, mostly thanks to Sebastian,

A

we've figured it out.

B

And just to get some perspective: you mentioned large write ingestion and counts. Throw some numbers on that.

A

So in terms of counters, at peak we do about 400,000 counter operations a second. At the low end, we do maybe about a hundred thousand counter operations a second, and each counter operation could be anywhere from, you know, one increment on two hundred cells to maybe, I don't know, a couple thousand increments on a couple thousand cells on a single row, and that's just on one table.

A

So you know in terms of volume, we may not do a whole hell of a lot of volume compared to some of the companies out there, but we happen to do it with like very very difficult data structures.

C

So one of the big first issues that we ran into was we went through implementing roughly every anti pattern in the data model that you could, because we had all had experience on effectively every other database quote-unquote.

C

So, you know, we're a multi-tenant platform, so we implemented multi-tenancy at the column family level, and we just completely drowned our cluster in files and tables. We quickly learned. We never ran into the counting problem because we had other solutions for that, but pretty much everything we were doing was pre-materialized views, and that's actually been pretty successful for us.

C

So far, we've had great success with either very wide and shallow schemas or very narrow and deep schemas: so either a schema that drives a very specific widget, or a schema that will drive, for example, go look up a process for event on a given environment and then contextualize that against everything else in that process tree, every network event in that process tree at that point in time, but then also historically over the last 30 days, and go pull that all together. Yeah.

D

So one of the problems we've had: we've also done a lot of the anti-patterns. We built a pretty complex queuing system in Cassandra.

D

Nobody should ever do that, and they should use something like RabbitMQ; that we learned quickly. We've done other crazy things, like attempt to store sessions in Cassandra for .NET. You probably should also not do that. So it's definitely been a learning experience since 0.8. Originally, we started looking at Cassandra when we were first building our startup, purely because we knew we'd be processing a ton of data with a ton of updates as we tried to index content online.
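To make the queue anti-pattern above concrete, here is a minimal sketch in Python. It is a toy model, not Cassandra's storage engine: it only assumes that a delete leaves a marker (a tombstone) behind rather than removing data immediately, so a consumer reading from the head of a queue-like partition has to scan past everything already consumed. `ToyPartition` and its methods are hypothetical names invented for this illustration.

```python
class ToyPartition:
    """Toy model of one partition used as a queue (illustrative only)."""

    def __init__(self):
        # Each cell is [value, deleted_flag]; deleted cells linger as tombstones.
        self.cells = []

    def enqueue(self, value):
        self.cells.append([value, False])

    def dequeue(self):
        """Return (value, cells_scanned), scanning past tombstones to find live data."""
        scanned = 0
        for cell in self.cells:
            scanned += 1
            if not cell[1]:
                cell[1] = True  # the delete writes a marker instead of removing the cell
                return cell[0], scanned
        return None, scanned


queue = ToyPartition()
for i in range(1000):
    queue.enqueue(i)

first_value, first_scan = queue.dequeue()   # cheap: nothing has been deleted yet
for _ in range(998):
    queue.dequeue()
last_value, last_scan = queue.dequeue()     # expensive: walks every earlier tombstone
print(first_scan, last_scan)
```

In this model the cost of each read grows with the number of items already consumed, which is the general shape of the problem the speaker is warning about; a real cluster eventually purges tombstones during compaction, which this sketch deliberately leaves out.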

B

So you don't deploy something directly. Why don't you give some insight into some of the common use cases you see, why people start to use the software, and then some of the common things you see people make mistakes on?

E

Yeah, sure. So the folks that sign up for the program I divide largely into two camps. There's folks that have a system in place that's running on a database, and it's close to falling over or is already falling over, and they need to make moves, and then there's the camp

E

that kind of has heard about those experiences and read about them, and is being proactive and building a system that's going to need to scale. It doesn't need to yet, and those folks usually have a lot more time to figure out how to do things the right way. Cassandra surprises me all the time. Actually, yesterday I learned something about TTLs and SSTables that I wasn't aware of: specifically, that if you change TTLs on an SSTable, it only affects writes that come after the fact.

E

So little things like that will always keep coming. But one of the important things I'd like to make everyone aware of is just that Cassandra and DSE are both very tunable. They have a lot of levers, and one of the ways to be really successful with the products and get things to work well is to find out what levers to use.
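The TTL behavior mentioned above (a changed TTL only affects later writes) can be sketched with a small Python model. This is a simplified illustration under one assumption: the expiry time is computed once, at write time, from whatever the default TTL is at that moment. `ToyTable` and its fields are hypothetical names, not a Cassandra API.

```python
class ToyTable:
    """Toy table whose rows get their expiry fixed at write time (illustrative only)."""

    def __init__(self, default_ttl):
        self.default_ttl = default_ttl  # seconds; 0 means "never expire"
        self.rows = {}                  # key -> (value, expires_at or None)

    def write(self, key, value, now):
        # The expiry is stamped on the row when it is written.
        expires_at = now + self.default_ttl if self.default_ttl else None
        self.rows[key] = (value, expires_at)

    def read(self, key, now):
        value, expires_at = self.rows.get(key, (None, None))
        if expires_at is not None and now >= expires_at:
            return None  # expired
        return value


table = ToyTable(default_ttl=100)
table.write("old", "written before the change", now=0)

table.default_ttl = 10                  # change the table-level default
table.write("new", "written after the change", now=0)

print(table.read("old", now=50))        # still readable: it kept the 100-second expiry
print(table.read("new", now=50))        # gone: it was written with the 10-second expiry
```

The practical consequence in this model is that shortening a TTL does not reclaim space held by rows written earlier; they live out the expiry they were written with.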

B

So we talked a little bit about Cassandra and what you guys are doing there, but at DataStax there's an enterprise-grade edition of Cassandra that you guys have access to for free. That software includes things like search capabilities through Solr technology, and analytics through the Hadoop stack as well as Spark. It also includes things like performance packages, by doing things with main memory, security, and OpsCenter. Can you guys talk briefly about which of those features, if any, you are using, and what your motivation was for using those over other things in the field?

A

Sure. So we use Spark, which is part of the Hadoop stack, ish. We use Solr, and I think that's probably the sort of extent. We also use OpsCenter. Sure, we use OpsCenter.

B

Can you go with why and what?

A

Yeah, I'm getting there. Let's try to think while I talk; I am not good at multitasking. So the reason we use Spark is that one of the things that is an issue with counters, and, like I said, we count a lot of things, whether it's counting page views or counting unique users. We count unique users by using something called HLL+, which is HyperLogLog plus, which means we store bits of data in very large sequences.

A

That's also something Cassandra was not built to do, an anti-pattern that Cassandra fights us on regularly. So in order to double-check ourselves, we use Spark. Spark is a framework that allows us to run against a large amount of raw data and check our aggregated data to make sure we're approximately correct.
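For readers unfamiliar with the unique-user counting technique mentioned above, here is a minimal sketch of plain HyperLogLog in Python. It is not the "plus" variant the speaker names and not the panelist's implementation; the register count (`p=12`, i.e. 4096 registers), the SHA-1 hash, and the class name are assumptions made for this illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal plain HyperLogLog sketch (illustrative; not the HLL+ variant)."""

    def __init__(self, p=12):
        self.p = p                      # number of index bits
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m
        # Bias-correction constant; this formula is the usual one for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")          # 64-bit hash value
        idx = h >> (64 - self.p)                       # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)          # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            estimate = self.m * math.log(self.m / zeros)
        return estimate


hll = HyperLogLog()
for i in range(50000):
    hll.add(f"user-{i}")
    hll.add(f"user-{i}")   # duplicates do not change the estimate
print(round(hll.count()))
```

The sketch stores only one small integer per register, so the whole structure is a fixed-size blob regardless of how many users are counted. The estimate is approximate by design, which is consistent with the speaker's point about re-checking aggregates against raw data.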

A

We also use Solr. Solr is a full-text search engine, and we use it not just for text; we also score articles and we score some of the content that we check in very interesting ways, and we use Spark to search against that and log that and run those with intersections against some of our counter data, and help provide sort of recommendation algorithms in a way similar to what you guys saw this morning with the team of John and Luke.

A

And those things have been amazing and made our life easier, because the operational administration of Solr and of Spark, which would normally require Hadoop, is actually kind of a pain in the butt. Because we already know Cassandra, it's actually the devil we know in this case, and we know the headaches that come with administering it. Not having to administer another set of distributed systems, or two other sets or three other sets of distributed systems, has certainly alleviated quite a few headaches.

A

I would say that OpsCenter helps with that, but I'm not personally a guy who spends a whole lot of time in there. But it can be useful.

C

Great. So the usefulness of OpsCenter for us has been kind of a bell curve. Actually, it was not useful in the beginning, because we were running OpsCenter on the same cluster we were trying to monitor. Pro tip: don't do that. Also, the number of tables just wasn't making it happy. Then we fixed those things and we saw a lot of value from it. But then, we were large Librato and Grafana users.

C

We use Librato for long-term historical metrics, and then we use Graphite and Grafana for one-second or ten-second resolution metrics, which we keep for up to seven days in our data center, and we have actually seen more value out of pulling JMX stats out there, because we're able to more easily join that against other stats from our application stack.

C

We are very interested in possibly using Solr. You asked earlier why we went to Cassandra, and I forgot to answer that. We used a lot of Elasticsearch, and we were trying to do things on Elasticsearch that thou shalt never do.

B

And what are those things?

C

Pretty much all the types of queries I already described previously. It was trying to pull out too much data. We were trying to do non-inverted indices with an inverted index.

C

You know, even today I question very heavily our usage of Lucene, and potentially Solr, just because all of our data is highly structured. We don't do analysis, so Lucene is just kind of unnecessary, and we might actually end up implementing an inverted index through a sparse matrix on Cassandra instead. We won't be doing Hadoop, because we banned Hadoop. We also banned ZooKeeper, and we've already got Spark running. We basically take data off RabbitMQ, 10-minute batch into S3, and then pull that into Spark because sparks driven was just looming and waking file.

C

Spark Streaming seems like it'll be really cool in two years.
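The idea of an inverted index over highly structured data, stored as a sparse matrix, can be sketched as follows. This is a hedged illustration in Python, not the speaker's design: it assumes each `(field, value)` pair acts as a row and each event id as a column, with only the non-empty cells stored. In Cassandra terms that might correspond to a partition per term with event ids as clustering columns, but that mapping is an assumption. `ToyInvertedIndex` is a hypothetical name.

```python
from collections import defaultdict


class ToyInvertedIndex:
    """Inverted index over exact (field, value) terms; no text analysis involved."""

    def __init__(self):
        # (field, value) -> set of event ids. Only non-empty cells are stored,
        # which is what makes the term-by-event matrix sparse.
        self.postings = defaultdict(set)

    def index(self, event_id, event):
        for field, value in event.items():
            self.postings[(field, value)].add(event_id)

    def query(self, **terms):
        """Return ids of events matching every field=value term given."""
        sets = [self.postings.get(term, set()) for term in terms.items()]
        return set.intersection(*sets) if sets else set()


idx = ToyInvertedIndex()
idx.index(1, {"host": "web-1", "user": "root"})
idx.index(2, {"host": "web-1", "user": "deploy"})
idx.index(3, {"host": "db-1", "user": "root"})

print(idx.query(host="web-1"))
print(idx.query(host="web-1", user="root"))
```

Because the terms are exact values from structured fields, there is no tokenizing or scoring step, which is the part of a full-text engine the speaker describes as unnecessary for their data.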

C

Why did we ban ZooKeeper? I'm trying to decide whether to give the facetious response or the actual one, but yeah, the actual one. The actual one is that we have very few pieces of our infrastructure that are actually master-slave, and we've had too many prior experiences of putting ZooKeeper into a split-brain situation and having the associated pain, and it's just completely unnecessary. For all the compute in our stack,

C

we live by a kill -9 life cycle, so it really wouldn't be that helpful for us. As for the rest of what we consume in DSE, we've enjoyed the security components and we've also enjoyed some of the performance patches, but also mostly Sebastian. Yeah, he was a feature.

B

Put that on the website and.

E

It's good another.

B

The security company uses the security features that are in DSE. So that's good. It's like a checklist item there.

D

So outside Cassandra, we use Spark and we use Solr. When we were introducing Cassandra at Channel IQ, a lot of existing employees at Channel IQ were like, "Well, we shouldn't use Solr. We should use Elasticsearch," to which I was like, "Well, why would we manage another thing when most of this functionality is built into DSE?" Over time, they kind of saw it our way, and we stuck with Solr and it's worked.

D

We use Spark for everything from text analytics, to generating visitor analytics for different graphing purposes that we provide to our customers, to matching products to one another. We use OpsCenter to monitor everything and to run repairs, which sometimes work; most of the time, not so much. Yeah.

D

Well sink. That's a.

E

Alright, so I don't use any of these things in production, but on a lot of the DSE integrations: when you think about the Apache suite of open-source products, you can wire all these things together, and sometimes it takes duct tape, and sometimes versions change. One of the things that DSE gives you is just time to value and simplicity, and so, you know, with a couple API calls, like you guys saw this morning,

E

a couple API calls, you have search and you have analytics on your Cassandra data, and a lot of customers, all of our startup customers, are enjoying that and are benefiting from those.

B

So sorry about that, but Eric just made the comment that, I quote, "you hit the jackpot with him," referring to Sebastian, and my rebuttal was that he's proof that either luck or God exists, maybe both. But he's been a blessing to hundreds of startups and really knows his stuff. Go ahead.

A

I actually want to make a point of this, because um I don't, I don't think anybody really unless you've gone through hell as as a start-up and when I say gone through hell, I mean, like some startups, have a little bit of trouble.

A

They'll get, like, a website outage, and it's annoying, and it might be the end of the world if your startup is just a website. But if you have serious back-end infrastructure and you've had downtime or really big problems with that serious infrastructure, you actually have to sit and question a lot of times: am I going to be able to survive this? Is my company going to be able to survive this? And Sebastian has spent,

A

I would say, to give you an idea: if he was married, he would be divorced now. He has spent that much time with my company alone, and had to spend overnights,

A

late nights in our office, and we're just one of however many hundreds of startups that he's had to work with. I would say that, on the whole, the talk that I'm giving tomorrow is really 95-plus percent a result of his being able to be a part of, like a full-time employee of, my company, as well as DataStax, and at least these two guys and everybody else.

A

So if nothing else that you would take away from this like understand that having access to somebody like that is incredibly valuable, even if they are not a full-time employee of your company. So now you can continue and.

B

We do not actually list him as a feature of the startup program on the website, but, you know, we could put it on there. All right. So why don't we actually go through and just briefly talk. The point of this was a little bit like: we're not up here to preach the good stuff. You can read the marketing material for that. This is actually where I'm trying to convey things

B

That would be lessons learned of things to either pay attention to if it ever happens to you or things to avoid altogether. So you know and that's- and we we've done a panel like this at each of the startups, where it's basically like the horror stories, this one just got appropriately titled slightly nicer than what it's been in the past. Why don't you go through and talk about?

B

You know, a really, really bad experience. But at the end there, take us through why it was bad and a sense of how you could have avoided it or what should have been done, so the audience can learn how they could avoid that if it ever happens to them. You're going to be good.

A

I have a lot of them.

A

That's that's. That's.

A

Great question. He asked, if I have so many bad stories, why am I still using it? Well, the answer is that for a lot of the technical problems that my company has to solve, there aren't really solutions. And the thing is that, in order to build a company that is a technical company,

A

Sometimes you have to solve problems that don't really have solutions and that's hard and you take the best of what's available and you make do and you hope that it works and like there's a lot of that and- and you know, there's a reason that Matt and I have become such good friends over the past four-plus years, because um we've we've hit some really hard times, and you know I could tell the story of counters up here and like the hell that it's been through for us for many years, but if you're really interested in how we solve that, like I'm, going to do 40 minutes on that tomorrow morning, I think it's much more interesting to, rather than tell a very specific horror story, to kind of convey.

A

The idea that, like sometimes there's no ideal solution and sometimes will hit the fan- and you know you the best you can do- is sometimes it's the anti-pattern. Sometimes it's you know breaking the rules and everything that we've had to do over the past four-plus years to solve the hardest.

A

problems have been breaking the best practices and breaking the rules. The biggest thing that I can say as a takeaway, without telling a specific story, is that the best thing you can do for yourself is understand your use case to the point where you can understand your failure cases to the best of your ability, and then figure out how to break the rules to make them work, and that sounds incredibly counterintuitive.

A

But after four-plus, five-plus years of breaking rules left and right, I can tell you that you have to know the rules in order to break them, and it takes a company to push the limits the way DataStax does. I feel like a fanboy, in a very weird way, considering how much bad stuff I've been saying over the years. You have to learn how to learn.

A

You have to learn how things are done the right way in order to learn how to do things the wrong way, and we've done enough stuff the wrong way in order to make it the right way. It's counterintuitive, but I feel like anybody up here would just nod their head and be like, "Yeah, that makes sense." It really doesn't make sense until you've been through the fire.

C

I'm blanking somewhat on what the original question was, but: tell a horror story. Okay.

B

Probably about Cassandra, but you want to go up.

C

No, it's about Cassandra. So we originally had this brilliant idea. I had worked with a lot of other databases; my prior company was a database company. I had a very fundamental concern coming in to doing time series on Cassandra, where everybody was telling me, "Hey, use TTLs," and I'm like, "Well, no, the whole point of the database is to not delete my data." I don't trust Cassandra because of all the horror stories

C

I've heard, much like this, because I don't trust when the Cassandra community implements a feature for the first time. And so I was just going to basically go in and do data cleanup and reclamation myself. That was a very bad idea. We got into a situation where we were running our 36-node cluster of i2.2xlarges at like 85 to 90 percent disk utilization per node for a very long time.

C

We were playing all sorts of compaction games, and then this horror story got even worse when we had some other issue in Cassandra that caused queues to back up in RabbitMQ. And so then we were just playing this ping-pong game of, like, all right,

C

well, we need to clear the queues, because some company had spun up two agents on their servers that, because of the forking mechanisms and just how I/O worked on their box, almost tripled our ingestion rate into our pipeline from two servers, right? And so we were fighting that fire.

C

But then we realized oh crap, we're trying to optimize this queue to write faster into this thing, that's about to run out of disk, and so we were basically just going back and forth for a good seven days of not sleeping and fixing that but I love, Cassandra, I.

A

Can tell ya I along those lines. I'll tell one quick horror story, because that reminded me of a very entertaining one. So.

C

So part of the moral of the story was to set TTLs, but that doesn't actually fully fix the problem, because of all the sessions this week on DTCS problems. We were also on size-tiered compaction; that was another problem. The moral of that story is it's very hard to pick out just one, because there were many things: there's a hundred micro-fixes to go and address the fundamental problem. I think one of the things that Sebastian alluded to earlier was that

C

there are so many tunables in Cassandra itself, and then toss in the JVM, which is black magic, and you're in a world of pain, because we're a startup of people who haven't done large-scale operations of Cassandra before. Literally, it's like Mr. DevOps and me, and the rest of the people are doing what they should be doing, which is building product and value, and it's the two of us just making it work. And we way understaffed

C

Cassandra. I mean, anybody going into Cassandra who's going in with less than, like, two or three full-time engineers who really understand the data model, the JVM, and operations of the cluster, I can't recommend it. We have come out the other side of that tunnel, through all the fires and everything, and now Cassandra is one of the most stable, happy parts of our polyglot data platform.

C

But, you know, there are moments when you're sitting there fighting fires, and you're like, "Wow, I really hope this works. I think it will. It's probably fine."

B

And I will say, in terms of the ecosystem, the first Cassandra Summit was five years ago. This is the sixth one, and there were over 6,000 registrants.

B

The first one had about 145 people, so the ecosystem's gotten bigger, and one of the really nice things about a bigger ecosystem is there is a partner network now. There's an ecosystem of companies whose sole job is to actually run other people's Cassandra deployments as a service, whether that's in a public cloud or managed on your own hardware. I would highly recommend, unless you really want to, don't run it yourself.

C

There's that, but part of this is that we're a startup, and that's not affordable. It's very hard as a startup, where the ratio of engineers to databases is already bad enough, for us to then go out and bring in contract-to-hire or just contractors or MSPs or anything. It's really not a cost-effective model for us.

B

Just don't think you're a good negotiator, but that's a different discussion.

B

This is around JVM and special skills or details.

C

There's way too much content in there. I would say you get really fine-grained in the metrics. So there's just general JVM stuff: get really fine-grained in the metrics, really understand all the different spaces, Eden space, et cetera. Like, if you're going to go roll out Java 8, or you're going to go roll out

C

G1GC. Like, he's been telling us to roll out G1GC for months, and we haven't, because it actually slowed down our ingestion rate, and we've had to go back and finely tune things to try and figure out where that actual problem is coming from.

C

So it's less a statement about Cassandra; just don't trust the JVM. We're using DataStax Enterprise on Ubuntu 12.04, and we're using Java 7, maybe? Sure. No.

E

Yeah well, okay,.

C

No java comes with.

A

anything; you have to download it separately, just because of Oracle, so that's your choice. But I think the huge takeaway from that is: monitor and instrument every single possible thing, and worry about whether or not you've over-instrumented or over-monitored later, because that will probably never be a thing. I was going to tell a story, but I'm going to let Ilya go, because we're getting the sign.

B

Whenever you do yours, I'd actually maybe touch on some of the previous technology you used before you get to the horror story, because you have a very interesting past with other databases. Yeah.

D

So when I came into Channel IQ, it was a company that started off on SQL Server and decided that SQL Server could no longer handle the load from the types of things that its customers were

D

looking for. And so Channel IQ raised a bunch of money, decided that they were going to invest in a big database system and create a whole new product, spent ten-plus million dollars, hired tons of engineers, and spent a year and a half building this on top of MarkLogic. For those who haven't heard, MarkLogic's probably biggest customer is healthcare.gov, and that probably went about as well as it did at Channel IQ. So Sam's in the audience.

D

He's got a lot of experience with the SQL side and the MarkLogic side. We actually turned off all our MarkLogic instances a few weeks ago. We bought a cake to celebrate the death of MarkLogic, and nobody could be happier. But as far as our horror stories, I mean, we've used DSE for years. I could tell you that you guys do a great job coming up with features and marketing them, and they never work.

D

So I'll talk a little bit about one recent case. OpsCenter launched this feature of performance services and slow queries, and it sounds great, right? It's going to help you find the thing that makes your stuff run slow, and it'll help you fix it. Sounds amazing. The only problem is, when you turn it on, it will crash Cassandra eventually, and eventually became every six hours for us, and when you go turn it off, it actually doesn't turn off.

D

And then, when you tell this to Sebastian, he goes, "No, it should turn off, it can't be that," and you plumb the source code in DSE, and you tell them, "No, it's right there. It doesn't turn off." So yes, I would say our biggest lesson is, even though we wait for fixes and whatever the next version is, some of the default stuff and what launches in new versions usually doesn't work the first time around.

A

Can I throw one out there? A couple years back, right before the New York City Cassandra Summit, I had a node decommission. Like, it just decommissioned itself, and I woke up in the middle of the night to the node just decommissioning, and port 9160, the CQL port, stopping listening, and I was freaking out. I'm like, I don't know who at my company would decommission a node without talking to me. I started going through bash histories.

A

I was trying to figure out what was going on, and then I started digging through the logs and I found a log entry that said "out of disk space, decommissioning." Apparently, at some point, someone on the DSE team thought it would be a cool idea, when the node runs out of space, to just decommission the node, like just remove itself from the cluster. So I woke up and freaked out and I sent this bug report to, at the time, I don't remember

A

who it was; we've had so many people help us over the years. I sent it to somebody and he goes, "There's no way we would do something this stupid." So he forwarded it to someone else, and the email thread was just five or six people saying, "There's no way, someone else must have done it, there's no way someone would do something this stupid." Then someone found the patch, and they did a git blame and removed the name from it, and they were like,

A

"No, we did it. We're not telling you who did it, but we did it." So we've had some very entertaining DSE feature stories over the years. Nodes just like to disappear.

E

Alright, so one thing with the performance services: it does turn off if you set it to zero, all right? You have to set it to zero. I'm just saying. But no, so, to be fair, I do get a lot of cries for help, let's call it that, kind of like, "We're in a firefight, help me out."

E

It really helps when those come with, like, a full stack trace, with some information on what's happening OS-wise, with a little bit of background on what's actually wrong, rather than "my node died" kind of thing. I mean, sometimes that's actually what it is, but very rarely. So I'm going to do a talk tomorrow on things that I see very often with the startups, hundreds of them, and at the end I think I want to spend some time just outlining ways that you can get better at getting to the problem and get better at digging yourself out of a hole, potentially with the startup program's

E

Assistance. So I do encourage that so.

B

We've got just a couple minutes left. Are there any questions from the audience that anyone would like to ask really quickly? Right up front: if you scream it, I can repeat it. What's up?

B

For the video, that was, and I quote, "date-time compaction"? No?

A

Date-tiered.

B

Date-tiered compaction: "Oh God, why?" I think. Okay.

A

So, to be fair, the idea of date-tiered compaction is actually really, really good. The problem is, the implementation is really, really bad. In the absolute best-case scenario, it's bad. And the folks over at... I can remember his name, I can't remember the company. I said it before. You remember? Okay, that's fair. Oh, I remember it now: the folks over at CrowdStrike did a really good job of elucidating

A

why date-tiered compaction is a complete and utter failure in its current instance, to the point where, for example, if you do anything with any sort of streaming, so a bootstrap or a remove, which is another side horror story: we couldn't remove or add nodes for like two years.

A

That's not an exaggeration; it actually was like two years. You know, to the point where, if you remove or add a node or do any sort of streaming thing, it triggers a major compaction everywhere around the cluster. So date-tiered compaction, when they get it to work right, is actually brilliant for time-series data, and, as he'll be able to tell you, it will massively reduce your I/O. But if you're in his use case, it will reduce your I/O; if you're in our use case, it'll quadruple it.

A

So it really just depends on being able to be smart about the pattern you use and understanding the implications, rather than saying, "This is a good idea, I'm going to go ahead and implement it," and then being like, "I don't know why it didn't work," because you don't really fully understand the hell you just imparted on yourself.

C

So I wouldn't go quite that far, because I do think one component of DTCS that is really good is that it's better than size-tiered compaction.

C

Yeah, I mean, yeah. So, you know, for our use case, we insert only. That's our entire data model. We never delete, we never update. Bankers don't use erasers, right? And so date-tiered compaction strategy does a very good job of combining data and then dropping it. The problem that we run into is that it combines too much data, and the calculus you need to do to figure out why it would combine data the way it does, I don't have time for that. You know, it's a lot.

C

It was a very clever solution to what should be a very simple problem, right? You know, I was able to implement something on InnoDB MySQL. I went and implemented an object store on MySQL like it was 1995, right? But it was very simple, because you're able to create tables per day, you're able to partition based on time. Being able to bucket based on insertion time, that's all that we wanted, and, once it times out, to actually unlink

C

the file. I would have loved to build that on Cassandra, but I could not reliably reclaim disk. As a time-series data company that sells based on retention period, if I sell you 15 days of retention, every second I hold past that 16th day, that's just COGS flying out the window. That's all margin.
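
[Editor's sketch, not from the panel: a minimal Python illustration of the "bucket by insertion time, then drop the whole bucket" idea described above. The `events_` prefix, the 15-day default, and both function names are hypothetical, and the sketch only decides which buckets are expired; actually dropping a table or unlinking a file is left to whatever store is in use.]

```python
from datetime import date, datetime, timedelta, timezone

PREFIX = "events_"      # hypothetical bucket-name prefix
RETENTION_DAYS = 15     # hypothetical retention period


def bucket_for(inserted_at: datetime) -> str:
    """Return the per-day bucket name for a timezone-aware insertion time."""
    return PREFIX + inserted_at.astimezone(timezone.utc).strftime("%Y%m%d")


def expired_buckets(buckets, today: date, retention_days: int = RETENTION_DAYS):
    """Return buckets whose day is older than the retention cutoff.

    The cutoff day itself is kept, so at least `retention_days` full
    days of data are always retained.
    """
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in buckets:
        day = datetime.strptime(name[len(PREFIX):], "%Y%m%d").date()
        if day < cutoff:
            expired.append(name)
    return expired
```

[Because every row in a bucket shares the same insertion day, expiry is a whole-bucket operation and needs no per-row deletes, which is the property the speaker says he wanted.]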

B

So this is the end; we have no more time. But, first and foremost, I'll say two things. This group of guys up here, as sarcastic as they are, as blunt as they can be from time to time, also represent some of the smartest people I've ever met in my life. They're some of my favorite people on the planet. Well, three of the four of you are; you can guess which one. And they know this stuff inside out.

B

I think these guys would be hired by any company on the planet, virtually any time, if it wasn't for your less-than-award-winning personalities here and there. With that said, if anyone does have any questions or wants to deep dive on anything, I think you two have speeches tomorrow, on the ends. Did you have one? You do? No?

C

I didn't qualify.

B

You didn't qualify, I'm sorry. And you had one today, right? So I would definitely recommend seeing them. I think they're great speakers. Actually, I haven't seen you speak, but you're a good speaker. Yeah, you're good in private, at least. And I think we're also going to go over to the hotel bar for a drink,

B

if anyone wants to deep dive on any of the technical stuff or just storm whatever's in their brains, because it's a mountain of knowledge that they have. So thank you guys. As always, we'll repay it with Sebastian's time, and we'll go from there.

B