Description
Presented by: Danny Abukalam
Full schedule: https://pad.ceph.com/p/ceph-month-june-2021
A
Okay, yeah: improving CosBench for Ceph benchmarking. At SoftIron we spend a lot of time doing benchmarking. It typically falls into two categories: we have baseline performance testing of our nodes, which we just do on a regular basis, and then we have specific benchmarking that we do on a customer-by-customer basis.
A
Coming back to CosBench, I'm actually talking about the second thing. Obviously Ceph is tunable and configurable for different workloads, so quite often we're trying to help our customers understand how to configure and architect a Ceph cluster for whatever they're trying to do.
A
CosBench started off very much as a research project, a research paper and that whole shebang, but it very quickly graduated into a project with lots of users, and I think the reason for this is that it was built to be modular: it made use of Java's OSGi framework to break down the different parts, such as the controller, the driver, the web UI, the scheduler and the API plugins.
A
So why does it keep coming back over and over again? Well, there are lots of benchmark tools out there, although not very many are designed specifically for the purpose of benchmarking an S3 object store, or even any storage behind HTTP. In terms of open source tools, fio is really powerful; I'm sure many are familiar with it.
A
It has lots of bells and whistles, it's great for benchmarking block storage in general and has an RBD backend. It's pretty mature, it's been around for 20 years, it's very well documented, and fio also has an HTTP backend which you can use for S3 benchmarking. There's also a tool called hsbench ("hotsauce bench"), which I think came out of the Ceph community as well, and it's a fork of another popular S3 benchmarking tool.
A
There are tools supporting file, block, object and a number of other custom interfaces, but in spite of all of this, we keep coming back to CosBench. So why is this the case?
A
It's a benchmark in the true sense of the word. We might be able to get three percent more out of another tool, or five percent out of something else, but while we have tools that we use to squeeze every byte per second that we need out of the cluster, sometimes that's less important than having consistency across different systems. So we keep coming back to it.
A
People use CosBench, and that gives it value. Another good thing about CosBench is that it has support for lots of different proprietary storage technologies. Because these were added over the years, it means that you can use one XML file and benchmark many different storage clusters using those vendor-contributed plugins.
A
In terms of architecture, CosBench has a controller-driver architecture, which not all benchmarking tools have, and that means you can scale it fairly easily to very large clusters. It's also fairly straightforward to figure out if your driver nodes are becoming a bottleneck for your storage cluster: whether you need more driver nodes, or whether you've hit the limit in terms of generating load. Then lastly, it also supports RADOS.
A
I think someone contributed a RADOS plugin a few years ago, and that still works really well, so we can use CosBench to benchmark both S3 and RADOS.
A
This is what the CosBench workflow usually looks like. You have an XML file (CosBench only ever runs one workload file at a time), and you feed the XML into the controller node, which you can do via the web interface or the command line. CosBench will then run the different work stages that you specify sequentially, fan the work within each work stage out to your drivers in parallel, and these will then go and do the HTTP calls.
A
Typically those are HTTP gets and puts, and the cool thing about this as well is that CosBench has the ability to address multiple RADOS gateway endpoints. You can give it a number of HTTP endpoints, and if you structure your workload correctly then you don't actually have to worry about load balancing, because it will balance the load across your endpoints for you, so that's fairly good as well.
A
This is what a CosBench workload looks like. You can see at the beginning I've specified the storage type, S3, and given it an access key, a secret key and my endpoint; in this case it's a very basic single-endpoint workload file. Then you have different work stages. Typically you'll have some pre-benchmark work stages which prepare things: create the bucket and write the data, so that you have some data to read and write from. Then you'll have your benchmarking work stages; here I've got a hundred percent write, and I've got a 70/30 read-write mix. That's another thing which not all benchmarking tools can do: mixed read-write workloads. Some do, but not everything does, so being able to add a ratio for that kind of mixed benchmark matters. Then, after you've done your main work stages, you have a couple of post-benchmark stages that essentially clean up, deleting the data and then disposing of the bucket as well.
A
I've
actually
skipped
in
this
work.
I've
skipped
the
creation
of
the
data,
because
I
have
my
right
work
stage
first,
but
you
need
to
be
careful
with
that,
sometimes
because
you
typically
read
faster
than
you
write,
and
so,
if
you,
if
you
do
a
time-based
workload
or
you
can
sometimes
end
up
with
an
overflow,
so
it's
used.
This
is
quite
lazy.
It's
much
safer
to
just
have
a
right
data,
but
if
you're
careful
you
can
get
rid
of
them.
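A workload file of the kind described here might look roughly like the following sketch. The endpoint, keys, object counts and sizes are placeholders, and the exact selector syntax is best checked against the CosBench documentation:

```xml
<workload name="s3-demo" description="prepare, mixed read/write, clean up">
  <storage type="s3"
           config="accesskey=AKEXAMPLE;secretkey=SECRET;endpoint=http://rgw.example:7480"/>
  <workflow>
    <!-- pre-benchmark stages: create the bucket, write some data -->
    <workstage name="init">
      <work type="init" workers="1" config="containers=r(1,1)"/>
    </workstage>
    <workstage name="prepare">
      <work type="prepare" workers="8"
            config="containers=r(1,1);objects=r(1,1000);sizes=c(4)MB"/>
    </workstage>
    <!-- the actual benchmark: 70% reads, 30% writes for 5 minutes -->
    <workstage name="mixed-70-30">
      <work name="mixed" workers="16" runtime="300">
        <operation type="read" ratio="70"
                   config="containers=c(1);objects=u(1,1000)"/>
        <operation type="write" ratio="30"
                   config="containers=c(1);objects=u(1001,2000);sizes=c(4)MB"/>
      </work>
    </workstage>
    <!-- post-benchmark stages: delete the data, dispose of the bucket -->
    <workstage name="cleanup">
      <work type="cleanup" workers="8" config="containers=r(1,1);objects=r(1,2000)"/>
    </workstage>
    <workstage name="dispose">
      <work type="dispose" workers="1" config="containers=r(1,1)"/>
    </workstage>
  </workflow>
</workload>
```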
A
This is typically what we do when we're benchmarking with CosBench. I've taken out some of the other peripheral nodes that we use, like management interfaces and storage management.
A
So I've just got the Ceph roles here: the OSDs, the monitors and the RADOS gateway. You can see we have out-of-band management, but at the bottom you can see it's all plugging into the same switch. You can have a separate S3 network, or you can do whatever you like, but we typically have a separate S3 network, and then our CosBench controllers and drivers plug in to the relevant network.
A
And then you can use the right IP based on either HTTP or RADOS. The other thing is that typically we do bonding. I tried to do the diagram with bonding, but it was a lot clearer when I didn't, so I thought I'd just keep it this way; you can bond the NICs, and it's the same thing essentially.
A
CosBench has a web interface. It's not that great. It's okay sometimes to just watch the cluster and see how things are going, but I tend not to use it. I tend to use Grafana and Prometheus to watch what's happening on the cluster, and a really useful command is ceph osd pool stats, so I can see what Ceph is reporting and check that it matches up with what I'm seeing on the other side, and get some validation there. Other than watching what's going on, you can also kick off workloads and generate workload files from the interface, but it's more effort than it's worth in my opinion, so I just don't bother with the web interface at all. I tend to just submit workload files using the command line; I'm not a big fan of the interface, really.
A
So what sucks with CosBench? It turns out quite a lot of things. Firstly, there's no build system, so you can't really build it easily from source today. If you want to build it, you have to manually construct a development environment in Eclipse and figure out how that works.
A
It distributes all the built binaries in the repo, which is obviously always bad. The Java it uses is now fairly ancient, partially because it hasn't seen much development in many years, but it also means that you have to faff about even just to run it, let alone build it.
A
You have to muck around with getting an old Java, which can sometimes be fun on modern distros. So yeah, the build system is basically non-existent, and actually I think it's a lot more difficult getting it to build on Linux than it is on Windows; I don't even know if we've managed to get it to build on Linux before with the Eclipse kind of strategy.
A
That's what I mean: the build system is not great. The project is also kind of dead, so it's unmaintained. While there are some forks that add vendor plugins and a few other things, there hasn't really been real development on the core of CosBench in a long time, maybe more than five or six years. The last release was more than five years ago, and if you try to run the last release, it actually doesn't work.
A
You have to run the penultimate release; there's something wrong with the last release, I can't remember exactly what. So there are all these different traps as a user: trying to use CosBench for the first time, you fall into many of them, and it can be quite fun.
A
Obviously there's no build system, so how are we going to distribute anything? Basically, if you want to install it on your nodes, it's just a very manual process. I imagine that people in the community that still use CosBench have probably written their own scripts to do this in an automated way, but everyone has to figure this out themselves.
A
So there's no package, no package management, no Docker, no nothing. And not only that: just to get the project to run, you have these bootstrappy batch scripts that it relies on, and they're pretty flaky and not very reliable. So that's another issue that kind of sucks. Then I think the most annoying thing, but also the most easily solved thing, is the workflow. No one really likes writing XML, or at least no one I know does, and constructing workload files is really a pain to do manually. And once you've got a bunch of XML files that you're happy with, on top of that you're going to have to parse the results from the runs, and because you have a lot of drivers, you have to go in and figure out what the aggregate result is by adding up all of the results from your different drivers. CosBench kind of has something that does this in the web interface, but sometimes it's there and sometimes it's not; it tends to just vanish, and then you have to do it manually. So it's very unreliable.
A
So you end up getting a calculator out and summing results. Once you've done that ten times, you get kind of irritated; once you've done that a hundred times, well, it's not great. So another pain point is the workflow.
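The manual aggregation being described, adding up per-driver numbers to get a cluster-wide figure, is the sort of thing a small script handles. A minimal sketch, assuming each driver has produced a CSV with a Throughput column (the column name and file layout here are hypothetical, not the exact CosBench log format):

```python
import csv
import glob

def aggregate_throughput(pattern):
    """Sum the per-driver throughput rows matched by a glob pattern.

    Assumes one CSV per driver with a 'Throughput' column; adapt the
    column name to whatever the actual CosBench logs contain.
    """
    total = 0.0
    for path in glob.glob(pattern):
        with open(path, newline="") as fh:
            for row in csv.DictReader(fh):
                total += float(row["Throughput"])
    return total
```

With a file per driver, a single call replaces the calculator session: aggregate_throughput("results/driver*.csv").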
A
So we have an internal branch of CosBench, and over time we've tried to make our lives easier just dealing with it. What progress have we made? I think the biggest thing is that we've actually introduced a build system.
A
One of my colleagues spent quite a bit of time getting it to build with Maven, so we can now throw Eclipse out the window: from Linux, just type mvn and build and we're off. That's really great, and I think a big change that we've found pretty useful.
A
On top of that, we've also packaged CosBench for Debian. SoftIron's OS is a fork of Debian, so I'd imagine that our packages would work on any Debian system, maybe even Ubuntu, though we haven't tried it. We build these with GitLab CI, and when you install the packages they place the various pieces of CosBench in conventional system directories, so you have configuration in /etc.
A
I think we have some stuff in /var/lib, and so it's all a lot more tidy. We also have systemd services, so you can start, stop and restart the various CosBench services, the driver and the controller, relatively easily with a single command. Once you've edited the configuration file in /etc to specify your driver and controller nodes, you just literally run the systemctl start command and you're good to go. And then we've also done some work to make it easier to generate CosBench workload files.
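The systemd integration being described could look something like the unit below. This is a sketch of the idea, not the actual packaging: the unit name, paths and script names are placeholders.

```ini
# /lib/systemd/system/cosbench-driver.service  (hypothetical layout)
[Unit]
Description=COSBench driver
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
# node roles and ports configured once in /etc
EnvironmentFile=/etc/cosbench/driver.conf
ExecStart=/usr/share/cosbench/start-driver.sh
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

With something like this installed, starting a driver node really is a single command: systemctl start cosbench-driver.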
A
From our standard benchmarking stack we can specify a workload, and it will go away and build the XML that we want. It doesn't support everything, but it does some basic things. We also have some scripting that helps us parse the CSV files that CosBench logs, so the results gathering as well: we have some scripts for generating the workloads going in, and then we have some scripts for pulling out the results on the other end. Actually, after we did this, I realized that the Ceph benchmarking tool, CBT, has, I think, done some of that as well. I think it supports CosBench as a backend, but I haven't actually looked at it in anger, so I'm not sure if those scripts might be better or our parsing stuff is better; it's definitely worth exploring. And then the other thing is that we've got some slightly better documentation as well.
A
All the CosBench docs are in this PDF, this long academic kind of scroll, 30 pages or so, and it's also not very clear in some cases what the different variables do and what the syntax is like when you're building an XML file; it's not immediately obvious.
A
So now you can clone the repo, build and run CosBench, or install the package, then get started and make a basic workload file. That's a lot easier as well. So we have this internal branch with a few of these changes. There's still more that we can do, I think, to make CosBench a bit easier, but I definitely find that some of the changes we've already made make it a bit easier.
A
We want to tidy up this branch and make it available in the open, ideally at some point in the next month or so, so that other people can take advantage of the efforts that we put in, if it's useful to people. If people are still using CosBench in anger, then that's great; maybe somebody else will find this useful as well. We'll probably only have packages for Debian to begin with.
A
If other people want packages for other distros, maybe we can look at it, or maybe they could quite easily build their own packages and contribute; that would be welcome. We're just generally interested to see who else is still using CosBench, and if there are others within the Ceph community that are still using it, it would be great to collaborate and figure out how we could work together.
A
So that's a summary of what we've improved with our CosBench branch. I welcome any questions.
B
Yes, we can hear you, okay. I just wanted to understand one thing. In my testing I generally use CosBench for performance analysis of one of the S3 products which Red Hat has, namely NooBaa. One challenge which I faced with CosBench is that the data stream which it generates is actually a completely non-random data stream; it's like 100 percent dedupable.
B
So is there any thought process where you are trying to do an input stream where we can, as a user, specify, okay, 50 percent dedupable, or 25 percent dedupable, or 75 percent dedupable? Any thoughts on that one.
A
So your question is: if you want to test the benefits of deduplication in NooBaa, I guess, with CosBench, because the input stream is reproducible every time, it makes it very difficult to benchmark and see that. Yeah, that's right.
A
Have you looked at some of the variables within the CosBench PDF for how you actually generate the data? There are a number of different knobs there. I'm not sure if you could achieve what you want today with CosBench, but I suspect you might be able to.
B
Okay, so what I do generally is this: there's another CosBench which is available, which was actually edited by Nexenta. What they have done, probably, is edit the input stream to generate completely random input data. So I use that, but ideally I would want to have a feature which will allow me to select any percentage, like 50 or 20 percent, as you were mentioning. I will go back and check in the CosBench documentation if there is any way to specify workloads in a certain way which does that.
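The feature being asked for, a data stream with a user-chosen dedup ratio, could be sketched roughly as follows. This is purely illustrative of the idea; it is not something CosBench provides, and a real implementation would live in the data generator of the driver:

```python
import os

def make_stream(size, dedup_ratio, block=4096):
    """Build a buffer where roughly `dedup_ratio` of the blocks are
    copies of one repeated block (dedupable) and the rest are random.

    A sketch of the requested feature, not CosBench behaviour: a
    dedup_ratio of 1.0 reproduces the fully dedupable stream the
    question describes, 0.0 gives fully random data.
    """
    repeated = b"\x00" * block          # the block that deduplicates
    n_blocks = max(1, size // block)
    n_dup = int(n_blocks * dedup_ratio)
    out = bytearray()
    for i in range(n_blocks):
        out += repeated if i < n_dup else os.urandom(block)
    return bytes(out[:size])
```

A 50-percent-dedupable 40 KiB object would then be make_stream(40 * 1024, 0.5); a smarter generator would also shuffle the duplicate blocks so they are not all contiguous.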
A
Okay, that's interesting. Maybe offline, if you send me your workload file and also point me to the Nexenta fork you talk about, because I don't think I've seen that before; if you're happy to, I'd be happy to look at this and see if I have any observations or thoughts.
A
Yeah, do you use that with the CBT tools, or do you have your own stack?
C
I think both. The perf and scale team at Red Hat right now has their own setups and automation, and there must be many others. There's certainly a lot of folks that would like to see something more lightweight and easier to deal with, but we do agree that once you have it going, it can do a good job.
C
Chris Blum of Red Hat wrote a Golang translation of little parts of it, and I've never fully evaluated it, but I don't think that has gone anywhere. And MinIO wrote their own sort of client-driver setup, called Warp, and I've been meaning to have someone look at that, but I haven't got any feedback yet from anybody that does regular benchmarking.
A
Yeah, we've also written our own Go benchmarking tool internally, which I mentioned briefly. It does a lot more than CosBench does in terms of interfaces, and it has a similar driver-controller mechanism, which is pretty cool. Another thing that we might look at doing is open sourcing that at some point; we typically use it very extensively internally when we're not using CosBench.
A
I did see the gosbench blog post as well; it looks very interesting. I also know Mark, markhpc, Mark Nelson I think, has done a lot of work in this space as well. I think he's one of the main guys behind CBT, right?
A
Well, there are no more questions. Thank you very much.