From YouTube: Ceph Month 2021: Ceph Project Update
Description
By Sage Weil and Josh Durgin
Slides: https://www.slideshare.net/Inktank_Ceph/202106-ceph-project-update/Inktank_Ceph/202106-ceph-project-update
Ceph month schedule: https://pad.ceph.com/p/ceph-month-june-2021
Sage: All right, welcome everyone to Ceph Month 2021. Josh and I are going to give a quick update on the Ceph project, and then we'll be moving on to a few other talks. I'm going to start with a little bit of background that will be familiar to anyone who's been using Ceph but should be helpful for those who are new; talk a little bit about Ceph Month; give an update on the Ceph Foundation and what we've been up to; give a quick update on the Sepia lab, where we do all our upstream testing and release work; talk a little bit about telemetry; and then talk a bit about the future and what we see coming next. So, first, a little about Ceph.
These are all nice words, but they're a little bit meaningless on their own, so in more concrete terms: Ceph is open source software, and it's software-defined, meaning it doesn't rely on any particular piece of hardware because you can run it on any sort of commodity parts: commodity servers, IP networks, hard drives, SSDs, VMs, and so on. And it's unified, in that a single cluster can serve object, block and file workloads. Ceph is free and open source software; that means you have the freedom to use the software, free as in beer.
A
You
have
the
freedom
to
introspect,
modify
and
share
free
is
in
speech,
so
you
get
source
code
and
you
can
modify
it
as
you
see
fit.
That
gives
you
the
freedom
from
lock-in
to
a
particular
vendor
and
gives
you
the
freedom
to
innovate
by
extending
and
improving
the
platform.
Ceph is also reliable. We believe Ceph should be a reliable storage system built out of unreliable components, with no single point of failure.
We provide data durability via replication or erasure coding, and there are no interruptions of service from rolling upgrades or from online expansion or contraction of the cluster. Ceph is always designed to favor consistency and correctness over performance, although obviously we try to get both.
Ceph is also designed to be scalable. It's an elastic storage infrastructure, which means the cluster may grow or shrink over time as your needs change or as hardware is refreshed.
A
That
means
you
can
add
a
remove
hardware
while
the
system
is
online
and
under
load,
and
we
allow
seth
to
scale
up
with
bigger
and
faster
hardware,
we
scale
out
by
adding
more
hardware,
more
systems
and
more
racks
to
a
single
cluster
to
provide
more
capacity
and
performance,
and
we
also
allow
you
to
federate
multiple
clusters
across
multiple
sites,
with
asynchronous
replication
features
and
disaster
recovery
capabilities
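As a sketch of what online growth and shrinkage looks like in practice, assuming a cephadm-managed cluster and the ceph CLI, with hypothetical host names and addresses:

    import subprocess

    def ceph(*args: str) -> None:
        """Run a ceph CLI command, raising on failure."""
        subprocess.run(["ceph", *args], check=True)

    # Scale out: enroll a new host and let the orchestrator create OSDs
    # on its unused devices; data rebalances while the cluster stays online.
    ceph("orch", "host", "add", "node5", "10.0.0.5")
    ceph("orch", "apply", "osd", "--all-available-devices")

    # Contract: drain a host's daemons before removing it from the cluster.
    ceph("orch", "host", "drain", "node5")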
On top of all this, Ceph is a unified system. That means we can provide object, block and file interfaces from the same storage cluster, all built on the RADOS component, which handles all the replication and data distribution in the system.
We release Ceph every year in March, so we have a stable named release every 12 months, and we provide backports (bug fixes and security updates) for two releases. That means a release reaches its end of life shortly after the release that comes two years later; for example, Nautilus is reaching end of life right about now, now that Pacific is out, and we just did a security and bug fix release a couple of weeks ago.
We provide Debian and RPM release packages on the upstream Ceph website, as well as container images, and recently we've been working on some process improvements in this area: we're getting better at doing security hot fixes quickly and getting them out simultaneously for multiple stable branches, and we're working on a more regular cadence for those stable releases. So, a little bit about Ceph Month, which is where you are right now, June 2021.
We did not plan an in-person event this year either, obviously, because the pandemic is still continuing, but we wanted to give people an update on the project and provide a venue. We didn't want a full-blown virtual conference that took all day and had you sitting in front of Zoom for extended periods of time, so the idea is to spread it out over several weeks in bite-sized chunks.
A couple of hours at a time, with a couple of talks, to bring people in and try to make it more interactive, so we can have more questions and interactivity with the speakers, and also some freeform discussion so users can network and share experiences.
The schedule pad should be in the chat; feel free to add questions there. At the bottom of each particular day there's also an area to add any topics you want to discuss after the presentations.
After these presentations we'll have all the developers online, and other users and so on, and we can talk about whatever you want. This first week we have this project update, and we'll talk about RADOS and the new Windows support we're working on. Next week will be RADOS Gateway and a number of performance talks. The week after that is RBD, the dashboard and more lightning talks, and then the final week we'll have CephFS and cephadm.
A
So
please
open
up
the
pad
and
add
whatever
you'd
like
next
year.
We
would
like
to
do
another
cephalocon
and
assuming
we
do,
it
would
be
in
march
2022,
but
we
don't
have
any
definite
plans
yet,
but
we
don't
know
where
we
would
hold
it.
We
could
go
back
to
doing
it
in
seoul,
which
is
where
we'd
plan
to
do
it
two
years
ago
or
one
year
ago,
but
didn't
we
could
also
go
for
something
in
north
america
after
seoul
was
canceled.
So if you have suggestions about which region and so on, please let us know. We hope to have an in-person event, but it's an open question whether we want to invest in making it more of a hybrid event where you can also participate remotely, or whether we should just focus on a more traditional conference. We're interested in your feedback, so please let us know. Next, a bit of an update on the Ceph Foundation.
We now have a number of premier members, twelve of them, with a couple of new entrants: Bloomberg joined in the last year. We have a number of general members as well; new members there are Cloudbase and Vexxhost. And then there are a number of associate members too.
These are educational institutions, government institutions, nonprofits, and so on. As for current projects the foundation is focused on: one that we started a couple of years ago is an effort to improve the Ceph documentation, and we have a full-time technical writer, Zac Dover, who has been on contract for a couple of years now. Hopefully you've noticed some improvements in the Ceph docs as a result of his efforts.
We also have a website update that's been in progress for a while now, the last several months, maybe even a year. That effort has been spearheaded by SoftIron, one of our premier members, and we're very grateful for their efforts. We're shifting away from WordPress, so it's going to be a static site generated from GitHub.
A
You
can
go
ahead
and
take
a
little
source
code
there
and
there's
a
there's,
a
sort
of
a
tentative
site,
whatever
development
site,
we're
hoping
to
launch
this
within
the
next
month
or
so
so
stay
tuned.
There's also a training effort. JC Lopez, who was responsible for a lot of the training material way back in the Inktank days, is helping put together a lot of that, and we're hoping to get an initial set of Ceph 101-type courses online. These could be hosted either on the edX platform or on the Linux Foundation's own platform.
A
They
can
support
both
sort
of
cell
flat
or
instructor
led
courses.
So
there's
the
potential
to
have
more
advanced
courses
later,
even
like
paid
certifications
and
so
on.
We
don't
really
have
any
specific
plans
here.
We're
focused
right
now
on
just
having
some
initial
free
material
there
to
spread
the
level
of
suffix
expertise
within
the
community,
but
we're
excited
about
pursuing
this
new
direction.
Sorry, my headset always turns off. We're drastically reducing the amount of money the foundation has been spending with public cloud providers to host a lot of the infrastructure, by buying hardware to put in the Sepia lab instead; it's much more cost effective. At this point we're basically only putting public-facing content in OVH, so things like the tracker, the Ceph website and download.ceph.com.
A
So
we're
pretty
happy
about
that.
We're
saving
a
bunch
of
money
there
and
buying
lab
hardware.
So
that's
hardware
for
doing
builds
for
doing
ci
tests,
all
the
github
checks
and
stuff
that
we
do
and
we're
also
expanding
the
stuff
cluster.
That's
in
the
lab
that
we
use
to
store
all
of
our
stuff
results,
and
so
on.
Windows
support
is
coming
along
we're
contracting
with
cloud
base
to
do
a
lot
of
this
initial
development
and
to
set
up
all
the
upstream
ci
and
testing
and
stuff
to
do
that.
We're very excited about that; we're going to hear more about it from Alessandro in about 45 minutes. There's also a new marketing committee. Right now SoftIron, Canonical and Red Hat are participating, but we're eager to have anybody else join in. The idea here is to coordinate upstream, project-focused marketing activity, press, and so on for the community.
So we're excited about that. And Josh, I can turn it over to you if you want to talk about new stuff in the lab.

Josh: Sure.
I had a Google Summer of Code student last summer who changed the scheduling model in the lab so that we could actually run larger-scale tests by locking many machines at once, without competition for lab resources. There's more work in progress on this by another developer, Aishwarya, who is moving the current in-memory queue into a Postgres database, which will allow much more intelligent scheduling of our test lab.
That should help us use the lab much more efficiently and get a lot more interesting testing done, especially at larger scales. Another area we want to focus on going forward is downgrade testing. This would mean testing downgrades within a given major release, so from one minor point release to the previous one, or perhaps the previous few.
Another major aspect of the future is different architectures. A big one is Arm: Ampere donated a bunch of hardware to the lab, and we're now building packages and container images for CentOS 8 and Ubuntu 20.04 (Focal) there.
We had to address a number of issues in podman and Quay to support multiple architectures, but things are improving there, and we've already seen a number of users try this out on Arm hardware, as small as Raspberry Pis for home clusters. We're excited to see where Arm goes in the future.
In terms of broader usage, one topic we wanted to talk about today is telemetry. Ceph has had a telemetry module that allows users to opt in to sharing some of their cluster data with the Ceph developers, and there's a public dashboard that displays a lot of information, such as which versions are running. Currently we have over a thousand clusters reporting telemetry, with, as you can see, over 300 petabytes of storage.
The orange here is Octopus (these are the different versions) and the little bit of red there is Pacific. Pacific had just been released around that time; we see its line starting to increase while Octopus levels off, so users are upgrading to Pacific now that it's out, which is great to see.
Telemetry is all opt-in; it requires you to explicitly acknowledge the license for it, and there are a number of different channels you can choose to opt in or out of. There's the basic channel, which has basic metadata about the cluster, like how big it is and what version it is. There's crash metadata, which really helps us developers understand what kinds of problems people are actually running into in real life, and whether something we're seeing maybe once out of 100 runs in our tests is a very common problem in the field or really limited to our hardware.
We've already seen some great use of this: when we had a buggy version of tcmalloc in the container image that was causing a bunch of crashes in RGW, we saw a bunch of those show up in the crash telemetry.
There's also a device channel for capturing metrics about which kinds of disks are in use, as well as health data like SMART from those disks. This helps us improve our disk failure prediction model, which tries to give you an idea of whether your device is likely to fail soon or not.
There's an optional identity channel that lets you provide information like an email address, in case developers would like to contact you about your cluster. And in the future we're working on a performance channel, to add more granular information about how the cluster is used: what kinds of read-versus-write patterns we see, what kinds of rates there are, and other information like that.
That will help us understand how well Ceph is working at different levels with real workloads, and what kinds of real workloads we see in real life. This will help developers optimize Ceph, and could potentially make tuning for users more of a closed-loop feedback process.
You can grab the JSON report that includes all this information and inspect it yourself, to see precisely what is being shared and what isn't.
On the next slide we have a bit of information from the most recent Ceph user survey, asking whether folks enable telemetry. About a quarter of the respondents have it enabled on every cluster, another 15% or so have it enabled on some clusters, and around 60% don't have it enabled yet. There's a wide range of reasons for this.
The most common answer is that the cluster is on a protected network that doesn't have access to the internet. Fortunately, we have a solution for this: you can configure a proxy (such as a SOCKS proxy) for the telemetry module and have it report through that, so that one's pretty easy to solve.
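A sketch of that workaround; the module's documented knob is a proxy URL on the mgr, and the host and port here are hypothetical:

    import subprocess

    # Route telemetry uploads through a proxy reachable from the cluster.
    subprocess.run(["ceph", "config", "set", "mgr",
                    "mgr/telemetry/proxy", "https://10.0.0.1:8080"],
                   check=True)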
The second most common answer is "I haven't gotten around to it yet."
I'd also like to talk a little bit about where we see Ceph going in the future in general. I think there are three major aspects we want to focus on. First off, of course, is basic reliability, which is the baseline for a storage system: you expect it to be stable and not lose your data. Secondly, there's the out-of-the-box user experience.
That means helping you set up more end-user protocols that would otherwise take more effort than with other systems. So you can really easily set up a small cluster as a kind of NAS replacement, and we're working on turnkey support for NFS, object storage and Samba coming in Quincy.
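A rough sketch of what that turnkey flow looks like with the orchestrator-backed `ceph nfs` commands; the names are hypothetical and the exact argument order has varied between releases, so treat this as illustrative and check `ceph nfs export create cephfs -h` on your version:

    import subprocess

    def ceph(*args: str) -> None:
        """Run a ceph CLI command, raising on failure."""
        subprocess.run(["ceph", *args], check=True)

    # Deploy an NFS-Ganesha service named "nfs1" via the orchestrator ...
    ceph("nfs", "cluster", "create", "nfs1")
    # ... then export a CephFS filesystem "myfs" at the pseudo-path /myfs.
    ceph("nfs", "export", "create", "cephfs", "myfs", "nfs1", "/myfs")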
There are also a number of new classes of devices coming down the pipe, at varying stages of development; some are already available for purchase today, some are still in standardization phases. We're looking at a number of these, especially in the context of the Crimson rework of the OSD, to take advantage of these incredibly fast storage devices, which have really shifted the bottleneck over the last five to ten years from the disk to the CPU.
We'll see a lot more discussion about what exactly that means for Crimson later in Ceph Month, but for now, be aware that this is where we're dealing with all these different kinds of new hardware and new devices, and Crimson is being designed in a way that's flexible and composable enough to work with any of them.
Another major emerging technology is NVMe over Fabrics. The NVMe protocol has been around for a long time as a local storage protocol for devices, especially flash devices; more recently, the fabrics side of it has enabled access to remote devices as well, potentially without involving the CPU on the other side.
There's a project underway already to look at tackling this on the client side, presenting an RBD device over NVMe over Fabrics as an alternative to iSCSI. This could be used for something like bare metal as a service for cloud infrastructure, where you're exposing a disk directly to a host and it looks like a plain NVMe disk, so almost any kind of host could talk to it directly.
There's some new hardware out there, like SmartNIC-style hardware such as NVIDIA's BlueField chip, that allows you to do this without involving the host CPU at all.
There's also some discussion of further future plans for Crimson, a kind of second phase, which would involve changing the replication style of the OSD to avoid involving the CPUs on the replicas in the replication path at all. That would be a massive decrease in CPU requirements: if you have three-times replication, cutting out the CPU on those two replicas would be a two-thirds reduction, which would be huge. That's a very large project.
Finally, there's a large ecosystem of software around Ceph that Ceph integrates with. Some of these things are maturing, like Rook in the Kubernetes space. There's also more support for Knative via RGW, especially with the bucket notification APIs, and things like S3 Select, which let you work with all kinds of software, like Spark, that treat a very large scale object store almost like a database. There's also a lot of effort around data movement in general and multi-site, with interoperability between a private cloud and a public cloud in particular.
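As a sketch of the S3 Select idea against RGW, using boto3; the endpoint, bucket, object and query here are hypothetical, and RGW implements a subset of the S3 Select SQL surface:

    import boto3

    # Point the standard S3 client at an RGW endpoint (hypothetical URL).
    s3 = boto3.client("s3", endpoint_url="http://rgw.example.com:8000")

    # Run a SQL filter server-side instead of downloading the whole object.
    resp = s3.select_object_content(
        Bucket="logs",
        Key="2021/06.csv",
        ExpressionType="SQL",
        Expression="SELECT s._1, s._3 FROM s3object s WHERE s._3 > '500'",
        InputSerialization={"CSV": {}},
        OutputSerialization={"CSV": {}},
    )
    # The response payload is an event stream; print the matching records.
    for event in resp["Payload"]:
        if "Records" in event:
            print(event["Records"]["Payload"].decode())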
B
This
will
probably
be
described
a
bit
more
in
the
rgw
talks
later
this
month,
but
some
of
it
involves
some
kind
of
replication
between
a
private
data
center
and
a
public
cloud,
as
well
as
within
a
public
cloud
and
being
able
to
choose
different
backends
for
each
of
those
there's.
Also
some
newer
emerging
de
facto
standards
in
in
the
industry
like
apache
aero
4k.
These are general data formats used by a large number of tools, especially in the machine learning and data science space. They store data in a very efficient format for queries, and they enable queries to be saved as something like what you might call a materialized view in the relational database world, so you can have multiple steps in your pipeline that create these views of the data and expose them to subsequent steps, without any of the data, or any computation, leaving the cluster.
B
So,
what's
coming
after
quincy?
Well,
we
haven't
figured
out
a
name
exactly
yet,
but
there
we
have
an
ether
pad
of
the
url
there
and
the
current
leader
is
seth
rogen.
So
please
go
and
vote
because
I'm
not
sure
that
that's
the
last
name
we
can
come
up
with
here.
That's apparently the name of the squid at the Vancouver Aquarium.
Yeah, I think, well, I know I will discuss Crimson at least briefly in a minute, and we want to schedule a larger Crimson update later in the month.
Sage: Right, I put the links in the chat for the pad with our names. Usually I go through Wikipedia or something similar and try to find all the common names for cephalopod species that start with R. I haven't.