From YouTube: Keynote: What's Planned for Ceph Octopus - Sage Weil, Co-Creator, Chief Architect Ceph
Description
Keynote: What's Planned for Ceph Octopus - Sage Weil, Co-Creator, Chief Architect & Ceph Project Leader, Red Hat
About Sage Weil
Red Hat
Ceph Project Leader
Madison, WI, USA
Sage helped build the initial prototype of Ceph at the University of California, Santa Cruz as part of his graduate thesis. Since then he has led the open source project with the goal of bringing a reliable, robust, scalable, and high performance storage system to the free software community.
Thanks everyone, welcome back to day 2. I hope everyone is having a great time. I know I've been really enjoying this conference and I'm looking forward to talking to more of you today. So I thought we would take a few minutes and talk a little bit about what is coming in the next Ceph release, Octopus. Yesterday I talked about the priorities that we set a year ago and what we did for Nautilus, so I wanted to give everyone a glimpse of the priorities we're thinking about for Octopus.
This is a lovely picture of an octopus that I found through Google Images. I mentioned yesterday that we used to think of the Ceph priorities in terms of components, the ones we picked out a year ago, but upon reflection, and after a discussion on Saturday with some developers, we revised that thinking to group things into five categories, or five themes. It's somewhat all-inclusive, in the sense that this is everything we feel like we should be doing.
I'm not sure I would describe them as priorities; I'd rather call them themes, because all of these are important and we really need to be looking at all of them. But I'm going to go through them and give you a glimpse of some of the things that we're talking about for Octopus. These aren't necessarily guarantees that they're going to be in the next release, because we have to revise our planning as we go, but this is really what we're thinking about.
I'm going to start with usability: making Ceph easier to use, easier to manage, easier to consume, and easier to operate at scale. I think the biggest piece in this category is the orchestrator API that I mentioned before. We really want to improve the cluster's ability to reach out and talk to either Rook or some sort of bare-metal orchestration tool, currently based on SSH. That's the goal; how far exactly we get remains to be seen, because then it goes on to NFS daemons and iSCSI gateways (those are actually in pretty good shape already), but we want to have some basic set of functionality all in place. Part of the goal here is to take the venerable, historical ceph-deploy tool and capture that same functionality. This paves the way to having a standard set of processes and documentation that are consistent for all users, whether you kick things off with Rook or use that bare-metal approach. We can have consistent documentation for replacing disks and so on.
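
The orchestrator work described above lives behind a common API in the Ceph manager. As a minimal sketch only, with invented class and method names (this is not the actual ceph-mgr orchestrator interface), the idea is that Rook, an SSH-based deployer, or anything else can sit behind the same small set of operations, which is what makes one consistent set of documented procedures possible:

    # Hypothetical sketch of an orchestrator backend abstraction, loosely
    # inspired by the ceph-mgr orchestrator idea in the talk. Names are invented.
    from abc import ABC, abstractmethod
    from typing import List

    class OrchestratorBackend(ABC):
        """Operations Ceph would ask any orchestrator (Rook, SSH, ...) to do."""

        @abstractmethod
        def list_hosts(self) -> List[str]: ...

        @abstractmethod
        def create_osd(self, host: str, device: str) -> None: ...

        @abstractmethod
        def restart_daemon(self, daemon: str) -> None: ...

    class SSHBackend(OrchestratorBackend):
        """Bare-metal backend: run commands over SSH on each host."""

        def list_hosts(self) -> List[str]:
            return ["node1", "node2", "node3"]      # would come from an inventory

        def create_osd(self, host: str, device: str) -> None:
            print(f"ssh {host} ceph-volume lvm create --data {device}")

        def restart_daemon(self, daemon: str) -> None:
            print(f"ssh ... systemctl restart ceph-{daemon}")

    def replace_disk(backend: OrchestratorBackend, host: str, device: str) -> None:
        # The same documented procedure works regardless of which backend is used.
        backend.create_osd(host, device)

    replace_disk(SSHBackend(), "node2", "/dev/sdb")

The point is that a procedure like "replace a disk" is documented once against the abstraction, no matter which backend carries it out.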
So I'm really looking forward to that. One of the things this unblocks is the possibility of finally automating upgrades. Every time we have a major release, there's something like a 15-item checklist: upgrade the daemons in this order, double check that this health state is set this way, run this command, and so on. The set of steps varies a little bit between each release, but we'd like to be able to automate this more. The challenge here is figuring out the division of responsibilities between the orchestration tool that is probably going to kick this process off and how much Ceph can manage on its own. So we're discussing how we can do that, so that the manager module can handle all of those internal Ceph dependencies and careful gating and sequencing and so forth, but still leverage the orchestration API to go through the automated OSD restarts and make sure your PGs re-peer and all of that, so the upgrade can proceed in a nice fashion. So this is coming and we're excited about it.
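
To make the gating idea concrete, here is a hedged sketch, not Octopus code, of what automating that checklist might look like: walk the daemon types in the usual documented order and only move on once the cluster reports itself healthy again. The "ceph health" and "ceph versions" commands used here exist today; restart_daemon() is a placeholder for whatever the orchestrator backend would actually do.

    # Hedged sketch of rolling-upgrade gating, not actual Octopus code.
    import json
    import subprocess
    import time

    UPGRADE_ORDER = ["mon", "mgr", "osd", "mds", "rgw"]   # usual documented order

    def cluster_healthy() -> bool:
        out = subprocess.run(["ceph", "health", "-f", "json"],
                             capture_output=True, text=True, check=True).stdout
        return json.loads(out).get("status") == "HEALTH_OK"

    def restart_daemon(name: str) -> None:
        print(f"restarting {name} on the new version ...")   # placeholder

    def rolling_upgrade(daemons_by_type: dict) -> None:
        for dtype in UPGRADE_ORDER:
            for daemon in daemons_by_type.get(dtype, []):
                restart_daemon(daemon)
                # Gate: wait for PGs to re-peer and the cluster to settle
                # before touching the next daemon.
                while not cluster_healthy():
                    time.sleep(10)
        # "ceph versions" can then confirm every daemon runs the new release.
        print(subprocess.run(["ceph", "versions"],
                             capture_output=True, text=True).stdout)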
One of the other items that is going to be worked on for Octopus in the usability category is taking the thing we did for rbd top, which lets you identify the top I/O users, and doing the same thing for CephFS. It's a small thing, but very useful for actual operators.
In the quality category, I think the biggest effort here is around the telemetry and crash reporting that I mentioned yesterday. That's new in Nautilus, so Nautilus now has the ability to phone this information home, once we can convince users that it's in their best interest to turn it on. But right now it's just getting dumped into a database, so we have to build all the backend tools that let us introspect, analyze, and identify trends, so that if we push out a Ceph release and people start seeing a particular Ceph crash, and those reports start trickling back into the database, we can actually notice and preemptively go figure out what's going on and fix the bug as quickly as possible. We also want to build the back-end infrastructure tools so that developers can browse through those data sets and identify whether a particular crash is happening on specific versions, or maybe it started at this point and stopped at that point, whatever the correlating factors are. We also want to make sure we're doing everything we can to get users to turn on the telemetry, and if users don't want to do it, figure out why. So we have to make sure that's an ongoing conversation and that we're very careful about not phoning home data that we shouldn't.
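
For reference, the pieces being described are the telemetry and crash modules that shipped in Nautilus. A small illustrative wrapper (the "ceph telemetry" and "ceph crash ls" commands are real; the script itself is just a sketch) that an operator could use to preview exactly what would be sent before opting in:

    # Illustrative wrapper around the Nautilus telemetry and crash modules.
    import subprocess

    def ceph(*args: str) -> str:
        return subprocess.run(["ceph", *args], capture_output=True,
                              text=True, check=True).stdout

    print(ceph("telemetry", "show"))   # preview the exact report before enabling
    ceph("telemetry", "on")            # opt in to phoning the report home
    print(ceph("crash", "ls"))         # crash reports collected by the crash module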
There's an effort that was kicked off a couple of months ago called DocUBetter, which is focused simply on making the Ceph documentation better. It's a team that meets every other week, a couple of different times over the course of the month, and they discuss all of the documentation infrastructure: the tooling on the back end that automatically generates the documentation on the website, and also where the content gaps are. We think the Ceph documentation is one of the key areas where we can invest to help grow the user community and help onboard developers and so on.
So if you know anybody who's interested, or you're interested in participating in that effort, definitely let us know. We're also continually looking at our automated test suite to make sure we're doing everything we can to ensure that Ceph is high quality. Individual component teams are meeting regularly to review what the test coverage is and to brainstorm ways we can expand it and write new tests for parts of the code that currently aren't being sufficiently tested.
We need to reinvest in that effort and make sure we're running those tests regularly. We've also discussed in the past, but haven't fully implemented, a test suite that does downgrade testing, so that within a major release, Nautilus for example, if you install a point release and it causes a problem, you'll be able to downgrade back to a previous stable version within the same series.
Performance. One of the main focus areas here is RADOS QoS. We've had a QoS infrastructure design that's been partially implemented in RADOS for quite a while now, and in fact there was a great talk yesterday from the folks from ZTE about their efforts around this and some of their pending changes. That's all great, but largely this effort has been blocked because it depends on carefully managing the queue depth in BlueStore. All the QoS prioritization happens at a higher level, and once you commit to doing that work, if you're feeding your prioritized commands into a deep queue with high latency, then it's pretty ineffective. The trick we're trying to solve is figuring out how to manage that queue depth in an automated, self-tuning fashion, so that it will automatically adapt to a slow hard disk or a very fast NVMe, given different workloads, different I/O sizes, and so on. So we're going to put some real effort in this cycle into addressing that problem, because that's going to unblock being able to deliver this as a general solution.
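
As an illustration of the self-tuning idea, and not of any actual BlueStore mechanism, a tiny controller that shrinks the allowed queue depth when measured device latency rises and probes upward when there is headroom might look like this (all numbers are arbitrary):

    # Illustrative sketch only: one way a backend could self-tune its queue
    # depth. This is not BlueStore code; the thresholds are made up.
    class QueueDepthTuner:
        def __init__(self, target_latency_ms: float = 5.0,
                     min_depth: int = 4, max_depth: int = 256):
            self.target = target_latency_ms
            self.min_depth = min_depth
            self.max_depth = max_depth
            self.depth = min_depth

        def observe(self, measured_latency_ms: float) -> int:
            """Feed in a latency sample; returns the new allowed queue depth."""
            if measured_latency_ms > self.target:
                # Too much queueing in the device: back off multiplicatively so
                # the higher-level QoS scheduler regains control over ordering.
                self.depth = max(self.min_depth, self.depth // 2)
            else:
                # Latency headroom: probe upward additively to keep the device busy.
                self.depth = min(self.max_depth, self.depth + 1)
            return self.depth

    tuner = QueueDepthTuner()
    for sample in [2.0, 2.5, 3.0, 9.0, 4.0]:      # pretend latency samples in ms
        print(tuner.observe(sample))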
There's also ongoing work on BlueStore in general to improve performance. There are two efforts here: one is around sharding the RocksDB that's used for metadata internal to BlueStore, so that the effects of compaction are less impactful and the space utilization is more efficient; the other is looking at a RocksDB fork, currently a fork called TRocksDB, that essentially separates the key portion of the data and the value portion of the data into different I/O streams to improve compaction behavior. Initial testing has shown that the combination of these two changes has had a significant impact on BlueStore performance, so we're excited about both of them.
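
To illustrate the key/value separation concept (a sketch of the general idea only, not of TRocksDB or RocksDB internals): if large values live in an append-only stream and the sorted structure holds only keys and small pointers, compaction has far less data to rewrite.

    # Conceptual sketch of key/value separation into different streams.
    # Not TRocksDB or RocksDB code.
    class SeparatedStore:
        def __init__(self):
            self.value_log = []   # append-only stream of (key, value) records
            self.index = {}       # "sorted" key structure holds only small pointers

        def put(self, key: str, value: bytes) -> None:
            self.value_log.append((key, value))          # large data: sequential append
            self.index[key] = len(self.value_log) - 1    # small data: key -> offset

        def get(self, key: str) -> bytes:
            return self.value_log[self.index[key]][1]

        def compact_index(self) -> int:
            # Compaction only rewrites the small index entries, not the values,
            # which is where the reduced write amplification comes from.
            return sum(len(k) for k in self.index)       # bytes rewritten (keys only)

    s = SeparatedStore()
    s.put("object_1", b"x" * 4096)
    print(s.get("object_1")[:4], s.compact_index())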
In the CephFS space, there's an investment in making create and unlink operations function asynchronously. In CephFS workloads, the latency tends to be dominated by the fact that you have a round trip to the MDS for each create or unlink, so being able to do those asynchronously can unblock things like untar and rm -rf and make them go much faster. It's complicated, but the team is working through it and we're very excited about making this sort of leap forward in the protocol.
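
A rough back-of-the-envelope example of why those round trips dominate (the numbers are made up purely for illustration):

    # Back-of-the-envelope illustration (made-up numbers): per-file MDS round
    # trips dominate metadata-heavy workloads like untar, and overlapping
    # (asynchronous) creates removes most of that wait.
    files = 100_000          # files in the archive
    rtt_ms = 1.0             # one MDS round trip per create, in milliseconds

    synchronous = files * rtt_ms / 1000            # every create waits for the MDS
    asynchronous = files * rtt_ms / 1000 / 64      # e.g. ~64 creates kept in flight

    print(f"synchronous creates:  ~{synchronous:.0f} s of pure round-trip latency")
    print(f"64-way overlapped:    ~{asynchronous:.0f} s")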
And finally, there were several sessions yesterday about the Crimson effort, and that effort is continuing. The focus initially is getting an end-to-end implementation so that we can test the full I/O path, observe how well it's doing and how it behaves, and validate a lot of our initial assumptions about how the software should be designed and put together. Based on that, we can figure out what the next steps are.
To remind everyone: RGW has multi-site federation and replication that's managed at a site granularity, but there are a couple of key things we're doing to revise the way RGW and its multi-site support are structured, to do the next iteration, v3, of this. The first is bucket-granularity control of those multi-zone, multi-site federated replication relationships. One is support for a sort of pass-through storage, so that when you put into a bucket, it's actually written through to S3 or to Azure or something like that; that is, you can use RGW as a protocol translator to give you a consistent API endpoint across different topologies and different clouds. One is bi-directional replication of a bucket to an external object store, so you can have an RGW bucket and an S3 bucket that are mirrored, active-active, writable, and replicating in both directions. And the final capability is having individual objects within a bucket be able to tier out to external storage, expanding on the current lifecycle policy that allows you to tier within a Ceph cluster, so you can also tier out to something like Glacier or its moral equivalent. And finally: we have multi-site capabilities in Ceph today in RGW and in RBD's async mirroring capability.
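
For context on the protocol-translator point: RGW already speaks the S3 API, so the same client code can target an RGW endpoint or a public cloud just by changing the endpoint URL. A minimal sketch using boto3, with a placeholder endpoint and credentials:

    # The same S3 client code works against RGW or AWS S3, which is what makes
    # the pass-through / protocol-translator idea attractive. The endpoint URL
    # and credentials below are placeholders, not real values.
    import boto3

    s3 = boto3.client(
        "s3",
        endpoint_url="http://rgw.example.com:8080",   # point at RGW instead of AWS
        aws_access_key_id="ACCESS_KEY",
        aws_secret_access_key="SECRET_KEY",
    )

    s3.create_bucket(Bucket="demo")
    s3.put_object(Bucket="demo", Key="hello.txt", Body=b"hello from rgw")
    print(s3.get_object(Bucket="demo", Key="hello.txt")["Body"].read())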
We need to finally take a look at what we're going to do in the CephFS space. Initially this is probably going to take the form of a snapshot-and-periodic-sync type of capability for disaster recovery, but we're currently brainstorming ideas about how we could do more online, bi-directional replication within the filesystem and how to address those use cases.
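
A hedged sketch of what a snapshot-and-periodic-sync approach could look like from the outside (CephFS snapshots really are taken by creating a directory under .snap; the destination, paths, and interval here are placeholders, and real tooling would need incremental transfer, retention, and error handling):

    # Hedged sketch of the "snapshot and periodic sync" disaster-recovery idea.
    import os
    import subprocess
    import time
    from datetime import datetime, timezone

    SRC = "/mnt/cephfs/projects"            # locally mounted CephFS directory
    DEST = "backup-site:/srv/dr/projects"   # placeholder remote rsync target

    def sync_once() -> None:
        snap = datetime.now(timezone.utc).strftime("dr-%Y%m%dT%H%M%S")
        os.mkdir(f"{SRC}/.snap/{snap}")                  # take a CephFS snapshot
        subprocess.run(["rsync", "-a", "--delete",
                        f"{SRC}/.snap/{snap}/", DEST], check=True)

    while True:
        sync_once()
        time.sleep(3600)   # "periodic": once an hour in this sketch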
The last piece, the ecosystem space, is probably an unsurprising story: we're continuing to invest in our integrations with, and relationships with, Kubernetes and Rook. We want to make Ceph the obvious storage choice for container infrastructure. OpenStack, of course, is a huge ecosystem that we're already well integrated with and heavily invested in, and we want to continue to make those users happy. Analytics is an emerging use case we're looking at that's seeing a lot of traction: data lakes, big data, AI/ML analytics backed by RGW. And we'll keep our eyes out for new ecosystems that are growth opportunities and places where Ceph can really shine.
I mentioned yesterday that we're thinking about changing our release cadence from 9 months to 12 months. We actually tweeted out a poll; the results are maybe not decisive, but leaning towards the 12-month cadence. So this is going to be an ongoing discussion point for the community. If you have an opinion here, let us know; you'll see some threads on the list and so forth. But the net of it would be that instead of upgrading every 18 months, you could upgrade every 24 months, every two years, at the limit.
I just want to say a few words about how you can get involved in these efforts. Ceph is an open community, open source project; the more people that help us, the more we can do. We use all the usual free and open source tools and processes. We have the ceph-devel and ceph-users mailing lists; you can go to the website and sign up for those if you're not on them already. We're on IRC all the time, so you can talk to us there.
One of the easiest ways to get involved as a developer is to go on GitHub, look at pull requests, and help review code. One of the hardest things for the core development team to do is ensure they're setting aside time to review new pull requests, but it's also one of the most important things we should be doing in order to bring new developers into the community.
So it's a good place to focus. And of course, just opening tickets, opening bugs, and commenting on existing bugs if you're seeing them in your environment is extremely helpful, so we know how to prioritize issues and what we should be fixing. The documentation, as I mentioned, is a priority and a focus, to make it as good and as helpful as possible and to make it easy for users to on-ramp.
There's now a new link in the upper right of any documentation page that goes directly to GitHub to let you open a pull request to edit the documentation. So if you see inaccuracies or typos or whatever, it's super easy to make those changes and propose them, and I encourage you to do that; it's a good way to contribute. We have lots and lots of meetings. We use video chat because the Ceph development team is distributed all over the world, and there's a public community calendar.
We're going to send this link, this URL, out somewhere else so that you don't need to copy it down, but there's a public calendar that has all of our stand-up meetings and all of our weekly meetings on various topics. All these meetings are open. Some of them are focused towards users and are very easy for people to join and discuss things.
Others are the daily stand-ups for developers, so developers can join and ask about the pull requests they've opened, ask about bugs, and so on, but you're welcome to drop in on any of them and talk to people. On YouTube, we have a Ceph channel that has a ton of video content. All of the talks here at Cephalocon are being recorded and will go on this channel; all the talks from last year's Cephalocon in Beijing are there, along with our weekly meetings.
Most of our weekly meetings are recorded and available there. We also have code walkthroughs on lots of different Ceph components, and several of those walkthroughs are targeted specifically at new contributors: how to get your development environment set up, how to write your first patch, how the Ceph code is organized and how to approach it. So I definitely encourage you to look at this as a good resource.
And finally, we're in the process of revising the getting-started and getting-involved pages on the Ceph website so that they have links to all of these different resources, and we'll make sure those are easy to find. Lastly, Cephalocon is only once a year, but we have Ceph Days all over the world in various geographies, so if you're trying to connect with your local Ceph community, this is a great way. The next one is going to be in the Netherlands, and there's one planned at CERN in September.
There's going to be one in London and one in Poland. And if you haven't had a Ceph Day in your area and you'd like to organize one, the hardest part is usually finding a venue that can hold 100-ish people; then just talk to us and we'll help you set it up. It's not actually that difficult, and we would love to continue this successful program in new areas. So thank you very much.