Description
This talk will show how we created a framework to benchmark service meshes: how to create large use-and-throw clusters, how to pipeline metrics into persistent storage, how to choose the right metrics to get a holistic view of the performance of the mesh, how we (ab)use Grafana charts to get around the limitations of a time series database, the tweaks to the wrk2 tool needed to get the job done, and so on.
A
Awesome, I hope you all can hear me all right. So thanks, Nisha, for the introduction, and hello everyone, whatever time of day it is for you; I hope you are all doing well, and thanks for coming to our talk.
A
First of all, I'd like to thank Hannah for setting the stage for us, because now attendees have an understanding of what a service mesh is, and Thilo and I can build on top of that. It's a really good segue into the performance aspect of service meshes. So yeah, next slide.
A
So we've already been introduced, but still: I'm Suraj Deshmukh. I work on Lokomotive, the Kubernetes distribution from Kinvolk, and my co-speaker is Thilo. He is Director of OS and Security and leads the Flatcar Container Linux team; Flatcar Container Linux is a new avatar of CoreOS Container Linux. On to the next slide, and a little bit of introduction of my employer, Kinvolk.
A
We do software right from the kernel up to the application level, with Kubernetes in between, of course. In one line, we like to say we are the Linux and Kubernetes experts. And, like Nisha said, we'll take questions at the end. Next slide.
A
I'll talk about the agenda first because, as Simon Sinek likes to say, let's start with why. So we'll look at why we have chosen a certain way to collect metrics and the rationale behind it, everything from sample size to statistical spread, and then we'll go on to look at the implementation: the engineering aspects of how we set up this framework and how we collect the various metrics from the various clusters.
A
And finally we will end on a practical note by looking at a demo of how it all fits together. So at the end of the talk you will have learned how we built the benchmark framework and what we learned from it. If you feel like using it, you'll get a good idea of how, and if you want to build something similar, you'll also get some inspiration.
A
So from here, over to Thilo.
B
Thank you, Suraj. So let's start with some theory, the rationale behind the benchmarks that we did. The metrics that we'll be looking at indeed cover three of the four golden signals.
B
So thanks for bringing this up, Hannah. The overall goal of our benchmark is to determine the cost of operating a service mesh. It is generally a comparative benchmark, so we are looking at the differences between different service meshes, and we're looking at regular use cases, so there won't be any cluster overload.
B
The data we collect focuses mostly on request/response latency which, as Hannah already raised, is the one thing you can't easily compensate for by just throwing money at your cluster.
B
We'll also look at CPU and memory usage of both the control plane of the service mesh and the sidecars. And in our benchmarks we have control metrics; those are there to make sure that we really don't run into overload situations. So we'll be looking at the request/response error rate of our load generator, and we'll be looking at CPU and memory usage of the application under load and of the benchmarking tool, and whether those saturate the limits of the nodes they run on, which would obviously not be a regular use case.
B
Sample size and statistical spread are something I've seen ignored in quite a few benchmark results that have been published out there. We run our benchmarks on clusters that, like many clusters, actually run in an infrastructure-as-a-service environment that our IaaS provider serves us from their data centers, and we have limited control over this environment.
B
We could have noisy networks, basically neighbors that do a lot of network traffic, which impacts latency, buggy top-of-rack switches and things like that, and we want to identify those and basically exclude them from the results. And then there are variations that we need to include rather than exclude, to cover the statistical spread of the data that we're collecting.
B
There may be some variety in the servers and the network equipment that we're using: we don't know if all of the hardware runs the same firmware versions, we don't know about hardware revisions, and that is just diversity that the environment introduces. You just can't escape it; it's always there if you use data centers you don't host yourself, so you need to make sure to have it covered in your benchmarks.
B
So we basically repeat the same runs multiple times, both on the same cluster one after the other, as well as on multiple clusters that have the same hardware specs, and we basically try to gauge the statistical spread that we're seeing and that we need to cover in the data.
B
So if you look at a chart like this, it's neat, you see latency, right? But this chart is a lie, because it's a single snapshot; it doesn't tell you anything. What you want to see is at least the range of values that your tests run into.
B
Otherwise you may even be looking at an outlier, but in any case you will have no idea about the minimum/maximum spread of the data that you're collecting, and that's particularly important when you're comparing service meshes that are supposed to add minimal latency. All right. Another thing that we're doing: we want our benchmark to be user-experience-centric. Hannah had this great animation in her talk where you could see the inside of your cluster, of your microservices environment, and you could see requests and responses, each with individual latencies, going back and forth.
B
This is not what we want to measure. We like to take the position of the user, and the user always uses the whole application at once. Your application consists of individual microservices, and we want to cover that in our benchmarks: a single user action fed into your application, which consists of multiple microservices, will cause many microservice endpoints to be called.
B
So the benchmark we're going to run is user-centric and will basically interface with your cluster as one big application, instead of just covering individual endpoints' latencies; that's very important to us. And then we have a very specific way of measuring latency overall, and it also factors into the user-centric side of things. There's this developer, Gil Tene, who can explain it a lot better than I can, and there's a YouTube talk of his that you should watch; he coined the term "coordinated omission".
B
That's discarding data in a way that basically feels like it's not there, but in fact it usually is user-impacting. Taking coordinated omission into account allows us to reflect the user experience of wait time when looking at latency, so instead of measuring requests per second individually, we actually measure committed RPS over time.
B
To give you an example: if you have a user action that causes 100 endpoint requests in your cluster, and your application commits to, on average, 100 RPS across services, that's 10 milliseconds per request on average, so the user expects one second of wait time. But if one of those hundred requests stalls for one second and all the others complete in 10 milliseconds, then the user will see 200 percent of the wait time they expected, and we don't think that many traditional ways of measuring latency actually reflect that.
B
If you look at the example in terms of statistics, we have 99 requests that complete in 10 milliseconds and one request that takes a second, and this is roughly what it looks like. If you take the average on a per-request basis, you get 20 milliseconds on average; you get 5 milliseconds at p25, because that's below the 10 millisecond average; p50 is obviously 10 milliseconds, because that's what the application guarantees; p75 is 15 milliseconds, and even p99 doesn't really show the stall.
B
But if you instead look at it over time, we see 980 milliseconds' worth of requests with 10 millisecond response times on average, and then we see one second that has a thousand-millisecond response time. If you factor that into the equation, you see that your average latency, if you measure over time and not over individual requests, is about 500 milliseconds, because you spent two seconds on something that should have taken you one second, and the latency is reflected much better in the percentiles of what you're measuring.
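To make the arithmetic of that example explicit, here is a sketch of the calculation (assuming the request mix from the example above: 99 requests at 10 ms and one at 1000 ms):

```latex
% Per-request average: every request counts once, so the single stall barely moves it.
\bar{L}_{\mathrm{req}} = \frac{99 \cdot 10\,\mathrm{ms} + 1 \cdot 1000\,\mathrm{ms}}{100} = 19.9\,\mathrm{ms} \approx 20\,\mathrm{ms}

% Time-weighted average: each request is weighted by the wall-clock time it occupies,
% so the one-second stall dominates the roughly two seconds of total time.
\bar{L}_{\mathrm{time}} = \frac{99 \cdot (10\,\mathrm{ms})^2 + 1 \cdot (1000\,\mathrm{ms})^2}{99 \cdot 10\,\mathrm{ms} + 1 \cdot 1000\,\mathrm{ms}} \approx 507\,\mathrm{ms} \approx 500\,\mathrm{ms}
```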
B
So that's the approach that we're taking. And how do we fix that when measuring latency, on the technology level? Well, to get started we need to feed the expected, or committed, requests per second into our benchmark. If you commit to 100 requests per second, that's one request every 10 milliseconds: we expect the first request to go out at 0 milliseconds, the second at 10 milliseconds, the third at 20 milliseconds.
B
So if one of those has more than 10 milliseconds of latency, then the succeeding request will not go out in time, and that again is easy to map in software. Our fix is that instead of measuring latency from the point in time where the request actually goes out, you start measuring latency at the point in time where the request should have gone out, and that gives us this great time focus and a very, very user-centric view of things.
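As a rough illustration of that fix, here is a minimal sketch of schedule-based measurement. This is not the actual wrk2 patch; the target URL and the rate are made up, and GNU date with millisecond output is assumed.

```bash
#!/usr/bin/env bash
# Minimal sketch of coordinated-omission-aware measurement (not the wrk2 code).
TARGET="http://app.example.com/endpoint"   # hypothetical endpoint
RATE=100                                   # committed requests per second
INTERVAL_MS=$(( 1000 / RATE ))             # one request every 10 ms

start_ms=$(date +%s%3N)
for i in $(seq 0 99); do
  intended_ms=$(( start_ms + i * INTERVAL_MS ))   # when this request SHOULD go out
  now_ms=$(date +%s%3N)
  if (( now_ms < intended_ms )); then             # ahead of schedule: wait
    sleep "$( awk -v ms=$(( intended_ms - now_ms )) 'BEGIN { print ms / 1000 }' )"
  fi                                              # behind schedule: send immediately
  curl -s -o /dev/null "$TARGET"
  done_ms=$(date +%s%3N)
  # Latency is counted from the intended send time, so a stall in an earlier
  # request also inflates the latency recorded for the requests it delayed.
  echo "request ${i}: $(( done_ms - intended_ms )) ms"
done
```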
And with that, let's hand over to Suraj for the implementation part.
A
Sure, thanks Thilo. I think we can start from the next slide, yeah. This slide shows what the benchmarking setup looks like. We have one controller cluster; this is all Lokomotive, which is Kubernetes behind the scenes, and all the logos that you see behind are components that get deployed. So the controller cluster has a Prometheus deployed, with storage backed by OpenEBS; this storage is quite large, so that it can store all the metrics coming from the various clusters.
A
The controller cluster is going to help us visualize all the metrics we have scraped. Down below you see all the leaf clusters that have been deployed from the controller cluster. Each leaf cluster has various components: OpenEBS again, to back the storage of Prometheus, plus Contour and MetalLB.
A
They help you expose your application over the internet, because we deployed on the Packet cloud, and the right way to expose an application over the internet on Packet is using MetalLB. And the endpoints on the leaf clusters are then scraped by the controller cluster.
A
We also have ExternalDNS in use here, so that when Prometheus is deployed on these leaf clusters, a DNS entry is made for it in AWS, so that the controller cluster knows where to scrape from. And Linkerd and Istio, as you can see, are the two service meshes we benchmarked; they are deployed as needed as the tests progress.
A
We also have metrics-server; there is no logo for it, but it is needed because Istio uses the Horizontal Pod Autoscaler and autoscales as the load increases. We will later also see how our controller cluster learns about the various Prometheus instances running on the leaf clusters. So yeah, over to the next slide.
A
This whole thing that happens at the root level, on the controller cluster, is all done by one Helm chart, the orchestrator Helm chart. So, apart from OpenEBS and Prometheus, like I said before, this chart is also deployed on the controller cluster, and it does various things. It has multiple jobs: they download Helm, they download the charts as well, and Terraform, which is needed by the Lokomotive deployer, lokoctl.
A
It also builds lokoctl, because we were also experimenting with it, so we could always give it a commit and it would build lokoctl. All these binaries and Helm charts and everything are available in one volume, so the orchestrator application has access to them.
A
Now, the orchestrator Helm chart has one Golang application, which actually runs multiple Kubernetes jobs, and these jobs go and create the leaf clusters. If a job reports a failure, it is started again, because it's a Kubernetes job, and since it is backed by a volume we don't lose any manifest or configuration that was used for creating a cluster.
A
If there was a failure in the job, that used to turn out to be very cumbersome, so we don't delete anything now: even after jobs are completed or failed or whatever, the configs stay there, and someone can always go back using a debug pod that is always running and then, say, delete the cluster, or whatever is needed. And, like I said, we have this one volume which has all the binaries and configs and everything, so it is shared across all the jobs. And finally, about the scripts.
A
This is all backed by bash scripts right now, for the jobs that start leaf clusters, and none of these scripts were baked into Docker images, because any time you want to make a change you don't want to build a Docker image, push it to a registry, pull it again and test it.
A
The best way to do this kind of thing is to create a ConfigMap from those scripts and mount it as a volume, so any time you make a change it's only a helm upgrade and the thing starts again. That's one of our learnings.
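A minimal sketch of that pattern (resource, file, and chart names here are made up, not the ones from the orchestrator chart): the scripts live in a ConfigMap that the job mounts, so a change is just an edit plus a helm upgrade, with no image rebuild.

```bash
# Package the benchmark scripts into a ConfigMap instead of a container image.
kubectl create configmap benchmark-scripts \
  --from-file=scripts/ \
  --dry-run=client -o yaml | kubectl apply -f -

# A job (normally templated in the Helm chart) mounts the ConfigMap and runs a script.
cat <<'EOF' | kubectl apply -f -
apiVersion: batch/v1
kind: Job
metadata:
  name: create-leaf-cluster
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: runner
          image: alpine:3
          command: ["sh", "/scripts/create-cluster.sh"]
          volumeMounts:
            - name: scripts
              mountPath: /scripts
      volumes:
        - name: scripts
          configMap:
            name: benchmark-scripts
EOF

# After editing a script, re-render and restart via a Helm upgrade of the chart
# that templates the ConfigMap and the jobs (release/chart names illustrative).
helm upgrade orchestrator ./orchestrator
```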
So yeah, over to the next slide, thanks.
A
So, like I said, every job is backed by this orchestrator application: it runs Kubernetes jobs which deploy the leaf clusters, and then it deploys ExternalDNS, Prometheus, Grafana and all that, except for Istio and Linkerd, because we do those later when we start the benchmark runs. And these jobs are running on the controller cluster, the root cluster.
A
It is while deploying these child, or leaf, clusters that we do the registering part: this is where the root Prometheus gets to know about the child Prometheus instances, and that's how the root Prometheus scrapes from the children.
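One way such a registration can look on the root Prometheus is a federation scrape job pointing at the leaf's Prometheus. This is only a sketch: the job name, DNS name, and the way the additional scrape config is delivered are illustrative, with the DNS record being the one ExternalDNS creates for the leaf Prometheus.

```bash
# Append a federation scrape job for a freshly created leaf cluster to the
# root Prometheus, delivered here as an additional-scrape-configs Secret.
cat <<'EOF' > leaf-1-federation.yaml
- job_name: leaf-cluster-1-federate
  honor_labels: true
  metrics_path: /federate
  params:
    'match[]':
      - '{job=~".+"}'
  static_configs:
    - targets:
        - prometheus.leaf-1.example.com:9090   # DNS record created by ExternalDNS
EOF

kubectl -n monitoring create secret generic additional-scrape-configs \
  --from-file=prometheus-additional.yaml=leaf-1-federation.yaml \
  --dry-run=client -o yaml | kubectl apply -f -
```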
Then, during the benchmark runs, we also deploy the Pushgateway before starting the runs. Let's see what the benchmark runs look like; over to the next slide, thanks.
A
As you can see, these are three nested for loops. For every requests-per-second value, like 500, 1000, 1500, we run the loop five times, and we run it for three types of service mesh: Linkerd, Istio, and no service mesh (which we still call a service mesh in the code). So every time, the service mesh is installed first; if it's bare metal, I mean if there is no service mesh, then we don't do anything, we just return from that function.
A
Then we install the emojivoto application. Emojivoto is a dummy application that Linkerd ships, and we use it because we needed to simulate a microservices architecture. After that, the run-benchmark function deploys the actual wrk2 Helm chart; this is where the job takes all these parameters and starts firing all the requests.
A
And finally we run the merge job, so that all the metrics are sent to the Pushgateway. And when both of those jobs are done, we delete the emojivoto application again and clean up the mesh.
A
We clean it up every time because, let's say, when you deploy Istio and you are deploying this application, the proxy is injected by the mutating webhook that is always running. So we need different proxies for different service meshes, and we need no proxy when there is no service mesh. That's why we install every time and clean up every time at the end.
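Condensed, the loop just described looks roughly like this. It is only a sketch: the chart paths, release names, and job names are illustrative, not the exact ones from the service-mesh-benchmark repository.

```bash
#!/usr/bin/env bash
set -euo pipefail

for rps in 500 1000 1500; do                            # committed requests per second
  for run in 1 2 3 4 5; do                              # repeat for statistical spread
    for mesh in linkerd istio none; do                  # "none" == bare metal
      if [ "$mesh" != "none" ]; then
        helm install "$mesh" "./charts/$mesh"           # install the mesh under test
      fi
      helm install emojivoto ./charts/emojivoto         # demo microservices application

      helm install benchmark ./charts/wrk2 \
        --set rps="$rps" --set run="$run" --set mesh="$mesh"
      kubectl wait --for=condition=complete --timeout=30m job/wrk2-benchmark

      helm install merge ./charts/metrics-merger        # sends merged results to Pushgateway
      kubectl wait --for=condition=complete --timeout=10m job/metrics-merger

      helm uninstall merge benchmark emojivoto          # clean up the application ...
      if [ "$mesh" != "none" ]; then
        helm uninstall "$mesh"                          # ... and the mesh, so no proxy leaks over
      fi
    done
  done
done
```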
A
So that is the core of what happens there; over to the next slide, yeah. At the node level, this is how a leaf cluster looks. There are multiple workload nodes where the emojivoto applications are running, and then there is one benchmark node, which is running your wrk2 application. So all the requests from the benchmark node, from wrk2, are fired at these emojivoto applications, and you can see that all of the nodes are of the same machine type.
A
This machine type is available on the Packet cloud, and the controller is just a single node, because there is not that much load on the Kubernetes control plane, so one is enough. Over to the next slide. So, like I said before, wrk2 is used to generate the load and measure latency, and emojivoto serves as the demo app, and it is deployed multiple times, as multiple applications.
A
You can always tweak how many applications you want, depending on how much you want to stress the whole thing. Over to the next one. And so, Lokomotive: I think I didn't mention it before, but when I say components, we have this notion of components, where all the Helm charts are sort of packaged, and the configs that these components provide are supported by Kinvolk as a part of Lokomotive.
A
Linkerd and Istio we have added as experimental right now, and if you want to check them out, you can just download the lokoctl binary and deploy the cluster and these components.
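For reference, that flow with lokoctl looks roughly like this. It is a sketch, and the component names are illustrative for the experimental integrations mentioned here, so check the Lokomotive docs for the exact names.

```bash
# Provision a Lokomotive cluster from an existing cluster configuration,
# then add the experimental service mesh components (names illustrative).
lokoctl cluster apply
lokoctl component apply experimental-istio-operator
lokoctl component apply experimental-linkerd
```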
A
Once you get the slides, you'll have the links as well. Sure, over to the next slide. So the earlier image that we saw was the node-level flow of data; this is what happens at the pod level. You can see that the wrk2 job is, first of all, sending all these HTTP requests to the various applications, and once all the metrics are collected by wrk2, they're pushed to the Prometheus Pushgateway. Now, why the Pushgateway, you might ask?
A
Prometheus is very good, but it has a pull mechanism for metrics. For something that is very short-lived, like a Kubernetes job, that is not very efficient: by the time Prometheus discovers the job and starts scraping metrics from it, the job might have died, and you might lose out on metrics.
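As an illustration of the push model (the metric names, values, and gateway address below are made up; the real metrics come out of wrk2's reporting), a short-lived job can hand its results to the Pushgateway with a plain HTTP request before it exits:

```bash
# Push results from a short-lived job to the Prometheus Pushgateway so they
# survive after the job's pod is gone.
PUSHGATEWAY="http://pushgateway.monitoring.svc.cluster.local:9091"

cat <<'EOF' | curl --data-binary @- "${PUSHGATEWAY}/metrics/job/wrk2-benchmark/instance/run-1"
# TYPE benchmark_latency_ms gauge
benchmark_latency_ms{percentile="p50"} 4.2
benchmark_latency_ms{percentile="p99"} 12.7
# TYPE benchmark_requests_total counter
benchmark_requests_total 6000
EOF
```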
A
So the Pushgateway acts as a stopgap solution here: you push all your metrics to the Pushgateway, and then Prometheus scrapes them from the Pushgateway. And as you can see here, every leaf cluster has a Prometheus and, like I explained earlier, the root cluster then scrapes from these Prometheus instances. We haven't shown it here, but that's what happens: the root scrapes from these Prometheus instances, so we have the metrics from all the leaf clusters in one root cluster.
A
You can see metrics from the various clusters there, which is how we increase the spread. So yeah, on to the next slide, and now we can see the demo of how it all happens.
B
Just have a look at the repo later and see which parts you can use. What I'm going to demo now, if I quickly show this, is basically what we're going to be looking at: a single leaf cluster. It'll be pretty low level, and I'll start a single benchmark for you. Before we can start a benchmark, obviously we need to provision; the technology behind provisioning a cluster is pretty amazing, but watching a cluster actually being provisioned is kind of like watching paint dry.
B
Here are the pods, and in the single namespace we see the three microservices the emojivoto application consists of. All of those microservices will have endpoints, and we will cover all the namespaces and all the microservices in them. All right. Now, the results of those benchmarks will be displayed in a Grafana dashboard that we call the benchmark cockpit. It is set up to give you an overview of individual benchmarks; you'll be able to introspect benchmark data and you'll be able to see the benchmark running. Now let's start the benchmark.
B
Now, there's my helm command; it's reasonably complex, so it's a good idea to have it in your shell history. This basically deploys wrk2 and tells wrk2 that there are five applications, that's the five emojivoto instances, to benchmark. It sets a committed RPS, which is 50 RPS because we're going to take it low here, it sets the duration to two minutes and 24 connections, that's 24 wrk2 threads, and it gives an initialization delay of 10 seconds.
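The invocation looks something like this. This is a sketch: the chart path and value keys are illustrative, with the actual names living in the service-mesh-benchmark repository.

```bash
# Deploy the wrk2 benchmark job with the parameters used in the demo:
# five emojivoto target applications, 50 committed requests per second,
# a two-minute run, 24 wrk2 connections and a 10 second init delay.
helm install benchmark ./wrk2 \
  --set appCount=5 \
  --set rps=50 \
  --set duration=2m \
  --set connections=24 \
  --set initDelay=10s
```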
B
And wrk2 is running, and you should be seeing a new entry in the benchmark list here; that's basically the currently active benchmark. We see that it's eight percent done and, after a refresh, as the benchmark is running you can see the average and current RPS being emitted. It's not stellar, it's just 50 RPS, and that's for demo purposes; obviously you folks do your own gauging and determine your own RPS rates.
B
In the middle section of the dashboard you can introspect service mesh resource usage, which we currently don't see because we're not benchmarking a service mesh in this run. If we scroll down a bit, we see our control data, so we have the memory usage and the load generated by the benchmark tool. It's pretty low: the load is 0.0-something and it takes 1.4 gigabytes of RAM, and we have plenty left.
B
You see the same here for all of the emojivoto applications, that's the microservices, so we have a kind of control mechanism to validate that we're really not overloading anything, and we also have a bird's-eye view of the cluster, which is the load on all of the cluster nodes; that is also quite useful to know.
B
We have latency percentiles here, and we have a very detailed breakdown of latency percentiles down here to dive deeper. Now, single benchmark runs are very nicely introspectable using this dashboard, but to really get a summary of things, we created a summary dashboard. This will show you comparative latencies of service meshes compared to bare-metal runs, and since Grafana and Prometheus weren't really built to display non-time-series data, this is all a little manual.
B
So what we need to do now in order to refresh this, in order to feed our benchmark run into it, is to run a separate job, the metrics merger.
B
It should have completed by now; it's done, and we can refresh this dashboard and we'll see a new entry having popped up here. What we're doing here is almost a little abusive to Grafana's charts; it's just to basically display all of the different percentiles that we have in the data.
B
Since the Prometheus Pushgateway will continuously feed the merged data into Prometheus, just waiting a little will give us the display that we need, and then we have an overview of all of the runs. The first section of the dashboard has a comparative overview, and we can scroll down and introspect the bare-metal, Linkerd and Istio percentiles individually. I did a few runs before this presentation just to warm up a little data for you, so there's some data in this dashboard already. Now, something else this dashboard is very good for is spotting outliers, and there is a bare-metal outlier right here.
B
If you look at the higher percentiles, we see a spike in latency, and it's a little more than 80 times the latency that we've seen in every single other benchmark. So chances are that we had a noisy neighbor or something else going on here. The dashboard offers you the option to actually exclude runs, which we're going to do now: this is the offending run, it's now excluded, and the data can be looked at without this run. So, as Suraj mentioned, the controller cluster will have the merged latencies of every single one of the leaf clusters.
B
So there's a special version of this dashboard for the controller cluster that gives you information on the various benchmarks run on the specific leaf clusters as well. Now, if you're interested in diving into a specific benchmark here, for instance, for some reason we want to introspect this specific run, we can click the link, and that will take us to the benchmark cockpit again, and it is frozen in time at exactly the time span where this benchmark happened.
B
So we can, for instance, introspect all of the running benchmark data, we can look at the results, we can see if there are any transport errors that have happened, or we can get a detailed breakdown of the latencies; basically everything that the cockpit has to offer. As a closing note, I said that we're benchmarking whole applications and that we consider the cluster to be the application.
B
So one of the statistics we have in the results section is, for every single endpoint, the actual number of calls, of requests, that this specific endpoint received. This is something that you can configure in the benchmark Helm chart.
B
So if you want a different distribution of endpoint calls, you can just edit a simple text file and it will get you there. And that concludes our presentation.
C
So, thank you very much. This was very interesting to me, and I've read the article, I think it's already one year old, the article you wrote about benchmarking Linkerd and Istio. Actually, when we built servicemesh.es, we were thinking about doing a benchmark, and then we were like: no, this is crazy.
C
You have to know many things to do benchmarking correctly, and reading your article and also the comments, people were like, no, you need to do this and that; so I think this was a lot of work.
C
So thank you very much for doing it. And my question is: the benchmark you just did, did you do it with the latest versions?
B
Those are the in-development versions that we have in Lokomotive. Suraj has more details on that; he started migrating Istio and Linkerd into Lokomotive as components. This is ongoing work and, as Suraj mentioned, it's experimental, so we would by no means call this a comparative benchmark right now. What we're presenting today is just the automation environment around it, and it's by accident that it's Istio and Linkerd, because that's what we integrated with Lokomotive; it can be done with any mesh.
A
Yep, about the Istio version, I think it's the one before the latest. I mean, when we did the second avatar of the service mesh benchmark, we wanted to, you know, make it a framework so that others can use it as well.
A
We integrated Linkerd and Istio, so we use the Istio operator to deploy Istio, and for Linkerd it is the latest stable, not the edge release. Okay.
C
So you're already using the Istio operator component, I mean, yeah, okay. I would expect that the performance has improved since 1.5, because I've only seen blog posts and benchmarks for old versions, which is not very interesting if you look at the current versions. So I saw in the benchmark that they are pretty much the same, right, if you look at the current Istio and Linkerd versions?
B
Yeah, but I mean, this is really not quantifiable data; it's just something I fired up yesterday on Packet where I ran a few things, so there's no optimization going on here and we haven't looked into any of that. The thing we didn't really quite finish one year ago, when we did the initial benchmark, is that we were quite dissatisfied with the level of automation that we had in the benchmark suite, you know, and that has changed a lot in the recent months; there have been significant improvements in, I think, all parts of it. So we are set up now to do another round of benchmarks, we just didn't do it yet, and I wouldn't start with 50 requests per second when comparing the two; I guess they can deliver slightly more.
B
Yeah, I mean, we're saying that we're testing clusters in regular operating conditions, but I think you can push it a little further than 50. That was just to get a quick demo done and have no, you know, live-demo badness happening to us, so I was driving it the safe way.
B
So we were looking at other test applications a little; they're not hard to integrate as long as they're automated, and if they can be deployed in an automated way, then it's pretty straightforward. The thing with emojivoto is that it turns out to be quite efficient. So the first thing we did, because we have emojivoto pretty well integrated in the test suite, is that we just tried to push it as hard as we can on bare metal.
B
It turns out this thing can do a lot of requests per second without introducing much additional latency and with no errors at all, so it's perfect for use as a test application. I don't want to talk badly about other demo applications, but we also tested the bookinfo, or books, one that comes with Linkerd, and we couldn't really push it harder than 50 or 100 requests per second without massive errors and delays.
B
That is probably a good thing if you want to ship a demo application for a service mesh, because then you have something to debug, right; it's not a good thing if you want to run a benchmark. If there is any other target application that we should be looking at, we're taking patches and PRs; the whole service-mesh-benchmark project is an open source project on GitHub. If you want to automate things, yes, please.
B
Yeah, absolutely. So, as mentioned, the service mesh benchmark suite, that version 2.0 that we've worked on and that's now pretty much done, has several layers. You don't even need to go as far as deploying your own Lokomotive cluster: if you have a cluster up and running, you can use the benchmark pod straight away, as long as you have a Grafana where you can put the dashboards, which are also in the repository.
B
So there are JSON exports of the Grafana dashboards, for the cockpit and the summary one. You just use Helm to deploy the wrk2 pod, you configure your own applications and endpoints in the Helm chart before deploying, and then this thing just runs.
B
Of course, if you want to do a comparative benchmark, like "which service mesh is best for me", you would need to do manually what is automated in the service mesh benchmark suite: you basically have to remove whatever service mesh you're running to get a bare-metal benchmark, add it back, then remove it again and put on the other service mesh that you want to benchmark against. And that is all good, right, because it's still your cluster and your own application; but if you want a comparison between multiple service meshes, then having that second layer of abstraction, where Lokomotive basically does all of the legwork for you, is probably a better idea.
C
If I want to add, let's say, Traefik Mesh, or Consul, or something else, do I just have a file where I place the install commands, and then I can integrate it as a service mesh?
B
For service meshes, what we're currently doing, and this should by no means limit the way you would do things, is implementing them as components in Lokomotive, yeah. That's good.
B
Lokomotive is also an open source project, right, so to answer your question: you would go to the Lokomotive project and either fork it or create a PR there that adds your mesh as a component, which is easy and straightforward if the service mesh is just a Helm chart, or if there's a straightforward operator for it, like for Istio. Then you basically use Lokomotive for the lowest level of automation, you put the service mesh benchmark framework on top of it, and it will just run. If that scares you, you're by no means limited to that: you can always have your own custom scripting around running the benchmarks and only base things on the service-mesh-benchmark repo, installing and removing your service mesh using other scripting. That works too.
B
There's a metrics and benchmarking working group at the CNCF, and that's probably the right level at which to establish this kind of thing, because I feel benchmarking is something that will always need an independent body to carry it out; it's slightly political, and, well, we've noticed that with our first benchmark post, but that's how it is.
B
And I get it, I mean, I get why people get kind of, you know, agitated when they read things about their favorite service mesh.