From YouTube: CNCF SIG Storage Meeting 2019-10-09
A: Okay, I think we can start. We have two documents that we wanted to review this time around. The first is the database paper that Subbu has been working on, and the second is the project review process that Theron has been working on. Subbu, do you want to give an update on where we are on the database doc? I know that Quinton has put some feedback into the doc, and I've just written up some notes that I would like to go through as well.

C: Sure.
C: So that is one... that's, I believe, the main change I had made in the document.

A: Understood.

C: Okay, and there was also, you know, you weren't there in our last discussion?

A: No, I wasn't, sorry, yeah.
C: There was a discussion about what a database is in general. Cassandra, for instance, was considered: whether it should be a database or not. I think it was borderline, so we decided to include it. But we already have a key-value store section, and so maybe, I'm thinking...
A: So I think that's a fair point. I wrote some notes at the end of the document, just because I couldn't find the right place to add comments. Go ahead, yeah. I think one of the things, you know, that we were discussing in the past was in the rest of the white paper.

A: So, I mean, in previous sections where we found these sorts of things, we would even make just a little table which said: for these three types of topologies, these are the relevant attributes that you should expect. It kind of helps to select different usage patterns, or different types or classes of database, for different use cases.
D: I'm in favor of the table too. In terms of a table, there are just so many potential databases that I don't think you could cover all the dozens or maybe even hundreds that are out there. So first pigeonhole them, with some popular examples in each category, and then cover in the table the topology differences, the factors involved in wanting to split.
D: I agree with that. It's just that sometimes you can have a bunch of text, and somewhere in there throwing in a popular example might solidify it in somebody's mind if they're familiar with it. I don't want us to be kingmakers here, though, and nominate whatever examples we give as, you know, the foremost examples in the category.
A: We should include references to some popular examples that allow people to apply context to what they already know, because that helps them understand the document. For example, it's kind of nuts to describe an object store without mentioning S3, and it's kind of nuts to talk about the key-value store without talking about etcd. Especially where...
A: Maybe some of those projects are CNCF projects as well. So I think having a handful of examples is fine, especially if they're generic household names. But I agree in general: we're not aiming to be kingmakers here, but where an example serves the purpose of clarifying the document, I think that's fine.
A: There is a question mark as to, you know, do you assume this is a service? Do you build this yourself? And similarly, for example, when we were discussing topologies in the white paper, we said: sharded systems are great at balancing out the load, but one of the disadvantages is that you might have operational requirements to rebalance workloads if you get the shards wrong in the first place.
A: Those are some of the pros and cons for each of the different topologies. So I think it's completely fair to say that if you want a really big, distributed, scalable database, and you want strong consistency rather than eventual consistency, then things like strict timekeeping are a complexity factor, and if you're considering it, this isn't something you can just ignore.
D: Redis definitely counts. I mean, it can be used as a cache, but you certainly have the option of putting backing storage on it, and I've encountered many instances where that's done. So there's no doubt to me that Redis is in the database category. I think it's more of a key-value store, though it is, you know, a form of database. I mean, etcd is a key-value store too. Yeah.
C: So that's the thing, because we already have a separate section for key-value stores, and I'm wondering, I even thought, maybe we should merge the two. Many of them are evolving into databases, and some databases are now starting to offer key-value APIs, I mean.
C: I think what I will do is definitely mention that these lines are blurring and that sometimes things can fall into both categories; I think that's important to mention. And then maybe we can take another shot at putting the two together and see how that looks, as a separate attempt, and then, yeah.
A: Right. Well, I know that, for example, things like Envoy are actually adding database-protocol-level sharding and things like that to the proxy layer. For example, I know that they added support for MySQL, and they have support for Redis, and they actually do a bit of sharding themselves.

C: Oh, no.
C: Let's go through your comments and make sure that they are all covered. Consistency, and eventual consistency: we are going to cover that. Do we? Yes, the first and second are a combined topic. Yep. NoSQL and document databases: for now we'll choose a category and put them there, and then we can look at merging.
C: Cassandra: I think last time we agreed that it should be in databases; I think it's a clear example of a borderline case. Yep. CockroachDB, strong consistency: that goes back to the first item. Yep. Databases built on an underlying key-value store: yeah, I think that's a good point. We should fold that into the NewSQL category, because that's where they are most relevant. Databases and/or caching layers: I think maybe I will add an edit for that section to mention it.
B: ...how the SIGs interact with the TOC long term. It's still pretty rough, but I wanted people to weigh in. I didn't intend this to be a polished document by any means, but just to solicit feedback and come up with a better process. The main points that the TOC agreed on were time-boxing, meaning there should be a responsibility on the TOC to review things in a timely manner and provide a response back, and that there should be a way for projects to understand why they are rejected.
B: Whether it's "if these things are fixed", or "it's not cloud native by design and therefore it's rejected". I think they have a problem of not saying no, or of just letting things sit because they don't want to say no, and I think that has to stop. They also agreed we don't have a good process now for what this responsibility is, because I told them I feel like we end up with duplication.
B: Now we have SIGs: we review things, we give them a recommendation, and then the projects still end up presenting to the TOC and we start over from scratch. So how do we also make it a more efficient process? It makes sense to have the subject-matter experts review the projects, do the due diligence, and give a recommendation. So how do we do that better?
A: Yeah, definitely. I think we should all review this. If I'm just looking at this in terms of sections, we're kind of defining what we expect out of the TOC, and the process and the timeline, and also what the process in the SIG itself should be, in terms of when they hand it over to us.
B: So I just wanted to point that out; it's under the project rejection recommendation, projects marinating in the Linux Foundation public group. We were trying to find out what that level was. There was no decision made on it, but then they talked about, well, maybe we need somewhere a project just goes and gets more contributions, or fixes governance, et cetera. And to me, I always thought that's what the intention of sandbox was. So I think there's definitely still some...
A: There's this perception that the goalposts were changing, that maybe different criteria were getting applied to different projects, and I think having this process formalized will hopefully remove that issue. I'm not entirely convinced that we need something more than a sandbox, honestly, or something less than a sandbox, personally, because I think the sandbox is, very intentionally, broad.
B: I was a bit surprised by the suggestion, but I'm just hoping that this helps, because I told them my concern: one, there's no time bound around any of these things. Keycloak, which we proposed, has been in for over a year waiting to be given a decision, and in that time not only has the entire TOC changed, but it seems like the criteria by which projects are judged have changed, right? So things have to be time-boxed.
B: Time-boxed, and people have to understand the criteria against which they're getting judged. Because though I know they have good intentions, I hear these rumblings of unfairness, right? "This project got in, but my project didn't get in, and why was it rejected?" And I think they also haven't been doing that in public.
B: They do it quietly in the background, but that doesn't help other projects learn what they should be doing. I think it's okay to say a project doesn't fit based on this criteria, and to have that be public information. I don't think that needs to be done in private, so I'm hoping there is more transparency that comes out of this.
B: Yes. Note that Joe Beda brought that up: he doesn't want to have a checklist of criteria where a project can say, "well, I met all of these, so you have to accept me." I tried to make the language in there address things like that: these are the minimum viable criteria, not the completeness of what it means to be accepted. It just needs to be worded in a way that gives the TOC flexibility but also gives good enough direction for these projects.
A: I think it's definitely a step in the right direction. For what it's worth, my two cents is that the sandbox is a great place where a project can mature and gain its first few steps under the foundation. And I think a lot of the challenge, especially the comments about gaming it, right, is because the sandbox shouldn't be about that.
A: I think the key thing that's missing here is that the guidelines around not marketing, and keeping the sandbox projects as a separate category, are really, really important. A lot of the question marks that we keep hitting around removing all those are because sandbox projects then do get directly marketed, and things like that, so people do see sandbox projects as "getting in", and I think that's a challenge here.
B: Well, I agree, and it was unsettling to hear such diverse comments from the TOC about what people thought sandbox should be. I think some people have really high expectations of what a sandbox project should be, whereas I feel like a long time ago we removed the due diligence for sandbox, for the exact reason that a project could grow and flourish and expand its community there. So things like that have to be...
C: The fact that, when they were discussing sandbox projects, they were talking about accepting them in the hundreds, which probably won't scale for these KubeCon SIG talks, right? If suddenly a hundred of them want to present, that's probably not going to work out, eventually.
B: Feel free to add comments and suggestions; you can put them directly in the doc as comments. I want it to be a community-driven document. I don't feel like I necessarily need ownership of any of this; I just would like to include everyone. And I think we've certainly run into some of these things, being one of the new SIGs reviewing projects, so your input is desired.
A: Cool. Are you aiming to have this done by any particular date, or to hand it off by any particular date?

B: No.
B: It was a last-minute thing: "hey, can you join and talk about your doc?" I said sure, and I didn't commit to dates or finalization. They wanted some time to review. I can bring that up on the next public call and we can figure out a date to target. Ideally, I think we would want it done by KubeCon; we would want to have this criteria set forth and published so new projects can look at it.

A: Agreed, yeah.
A: Okay, so what I wanted to do next, the last thing on the agenda, is to discuss the benchmarking and performance paper, which I've only just started putting together. I kind of want to apologize for letting this slip: I was meant to set up a call a couple of weeks back, and then life happened and it didn't quite happen as I planned. So what I'd like to do just now is discuss some of the ideas and perhaps have a little bit of a brainstorm.
A: The things that we absolutely don't want to touch: we're not going to be publishing benchmark numbers, and we're not going to be providing our own vendor or product or project comparisons. It's all about providing the end user with the ability to run their own tests. Thoughts, comments, questions?
A: Different people might want to publish different benchmarks, tweaking everything for a particular use case, and I don't really want to get into that arena, where we're having to argue the pros and cons, or how you compare apples to apples between different benchmarks and different numbers. So what I'd really like to focus on is giving people the ability to run their own tests in their own environment.
A: So if they're looking to test two projects, or two tools, or two service providers, or two storage vendors, in their own cluster, they can use the tools to compare them in their own environment. Obviously, what that allows end users to do is then publish their own numbers, but those would be their numbers for their environment, as opposed to our numbers in some hypothetical environment.
A: Well, yeah, and this is why I said I kind of wanted to document common pitfalls. For example, storage systems will go faster if they're replicating with loose consistency, or asynchronously rather than synchronously, and they will perform amazingly if the entire data set fits in cache and isn't actually hitting physical media. These are things we can have a few paragraphs on, to actually document them.
A
So
people
know
what
they're
comparing
but
I,
don't
really
want
to
get
into
the
into
the
complex
scenarios
of
trying
to
justify
why
a
particular
system
has
a
particular
number,
because
I
think
that's.
If,
if
you
know,
if
promoting
a
particular
project
is
a
can
of
worms
describing
the
performance
of
a
particular
project
is
a
gigantic
kind
of
worms.
I'd.
A: This is more a case of: we've written a paper, people understand the different aspects and attributes of a storage system, and what we're trying to do here is give them the capability of measuring one of those attributes, which happens to be performance. The next one might well be something like consistency, for example; then we might suggest different tools to test those kinds of conditions. But I don't actually want to be in the business of publishing marketing numbers, so to speak. Yeah.
B: So do we need a disclaimer that people can't use those numbers to feed their own comparisons? I don't at all disagree with the reasoning behind it, but I also don't want it to be used as a weapon against other people, using the benchmarks for their own purposes. Even though we don't publish numbers, how do we prevent other people from taking those numbers and doing comparisons?
A: So, first off, a couple of things. In the first instance, we're not building a tool or a framework; we're describing tools which are publicly available anyway, and most of those have some sort of disclaimer of their own. Either way, we're not building a tool, so it's not as if somebody is going to use "the CNCF storage tool" or whatever. Secondly, I think the CNCF has fairly well documented things around trademarks and the like.
D: I agree. And the other thing, in terms of stopping them from a legal perspective: we're pretty much on an open-source license that allows people to fork. Since anything we produce would be under an open-source license, I don't see how people couldn't take whatever it is, declare that they had forked it, and therefore do whatever they want.
C: We should talk about how to configure the tools, and basically not how to configure the database that they are testing against, right? That's what it amounts to, yeah, exactly. And mention that this benchmark tests this type of workload, so make sure that the workload you intend to run in production matches what the benchmark is trying to do.
A: So initially, what I was thinking of doing was focusing on volumes and databases as the things to measure. I know that we could also potentially do key-value stores, but I don't have a ton of experience in that space, unless somebody wants to help with that area. But certainly in the volume space there are...
A: There are a number of good open-source tools, including the obvious ones like fio, where we can document the different types of benchmarking criteria: block sizes, random versus sequential, read/write ratios, caching versus non-caching, compressed versus uncompressed, dedupe versus non-dedupe. All of those obvious things which are fairly well understood. And I think with databases there's certainly quite a bit we can do.
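A minimal sketch of the kind of fio run being described, covering the criteria mentioned above: block size, random versus sequential access, read/write mix, and direct I/O so the host page cache doesn't inflate the numbers. The target path and all parameter values here are illustrative placeholders, not recommendations:

```shell
# Hypothetical fio job: 70/30 random read/write mix, 4 KiB blocks,
# direct I/O to bypass the page cache, run long enough (time_based)
# to get past any short-lived SSD cache behavior.
fio --name=randrw-test \
    --filename=/mnt/testvol/fio.dat \
    --size=10G \
    --bs=4k \
    --rw=randrw \
    --rwmixread=70 \
    --direct=1 \
    --ioengine=libaio \
    --iodepth=32 \
    --runtime=600 \
    --time_based \
    --output-format=json
```

Running the same job against two storage classes in the same cluster, changing only the target volume, is the apples-to-apples comparison the discussion is aiming for.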
A: With things like, you know, sysbench or something like that. With key-value stores, perhaps there's the YCSB benchmark suite, which is quite popular, but I would probably need a bit of help to structure that bit of the document. So what do you guys think about focusing on volumes and databases as a first step?
C
Yeah,
so
we
have
a
via
ourselves
lon,
both
suspension,
TPCC
benchmarks,
and
when
I
was
at
YouTube,
we
actually
ran
YC
us
be
against
her
with
us,
but
those
are
kind
of
I.
Don't
know
how
much
why
CSP
has
evolved.
Since
then,
yeah
I
mentioned
that
I
I
could
have
the
person
who
ran
the
benchmarks
document
about
how
it
should
be
run,
since
bench
is
fairly
straightforward.
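For the database side, a sketch of the usual sysbench OLTP flow (prepare, run, cleanup) against a MySQL-compatible endpoint such as a Vitess vtgate. The host, credentials, table counts, and durations are placeholders to be tuned for the environment under test:

```shell
# Hypothetical sysbench OLTP read/write benchmark against a
# MySQL-compatible database. Load the test tables first:
sysbench oltp_read_write \
    --mysql-host=db.example.internal --mysql-port=3306 \
    --mysql-user=bench --mysql-password=secret \
    --tables=16 --table-size=1000000 \
    prepare

# Run for 10 minutes with 64 client threads, reporting every 10 s:
sysbench oltp_read_write \
    --mysql-host=db.example.internal --mysql-port=3306 \
    --mysql-user=bench --mysql-password=secret \
    --tables=16 --table-size=1000000 \
    --threads=64 --time=600 --report-interval=10 \
    run

# Drop the test tables afterwards:
sysbench oltp_read_write \
    --mysql-host=db.example.internal --mysql-port=3306 \
    --mysql-user=bench --mysql-password=secret \
    --tables=16 \
    cleanup
```

As with the volume tests, the point is that end users run this against their own databases; only their own numbers for their own environment come out of it.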
A: I mean, that would be the SSD write cliffs, which would be one of those things I would suggest comes under common pitfalls. People run these sorts of tests, and if they don't run them for long enough, for example, they only get to see the cached behavior of the SSD, and then it slows down over time.
A: So if we're agreed on some of these concepts, I could put a quick outline together, and then it would be really awesome if we could have a separate call for the people who are really interested in helping write some of the content, and we could split up the work, if that makes sense.