From YouTube: Ceph Month 2021: Project Aquarium - An easy-to-use storage appliance wrapped around Ceph
Description
Presented By: Joao Eduardo Luis & Alexandra Settle
Full schedule: https://pad.ceph.com/p/ceph-month-june-2021
Project Aquarium is a new open source project to build an easy-to-use appliance wrapped around the Ceph project. The project started development in January 2021 and has become a passion project for the storage team at SUSE. In this talk, Joao Luis and Alexandra Settle will take you through a demo of what the team has achieved so far, talk a little bit about the architecture of Aquarium, and what's next for the project.
B: Hi, I'm Alex Settle, I'm the software engineering manager at SUSE and, for the purposes of Aquarium, I'm also the product owner.
B: So, what are we going to go through today? What is this Aquarium thing? Why are we doing it? What are our objectives, and how are we going to do this? We'll go through a little demo; to clarify, the demo is pre-recorded, because we don't trust the demo gods. Then we'll have a quick recap of our outlook: next steps, where we're going, what our plans are, and if we have time we'll have a little Q&A. Alright, let's dive in. So, what is Aquarium?
B: The million dollar question. Aquarium is an open source project. The storage team at SUSE, who previously worked on developing the SUSE Enterprise Storage product, has now been working on a new approach to Ceph development, deployment and management. This is an opinionated version of a storage appliance and, to be completely frank, we're still working out what that opinion is.
B: We're constantly developing and iterating on that definition, but we do know we want this to be something that can provide and manage data without an application context, mostly to simplify the day-one installer experience.
B: We've split this into two clearly defined work streams. At this point they're relatively obvious, and you'll get a sense for the theme: Gravel, which is our back end, obviously the base layer of the aquarium, and Glass, which is our front end, what you look through to check out the aquarium. To be completely honest, we're really enjoying our own theming. Alright: why are we doing this?
B: Well, the perception is that Ceph is too complex for the average user, and this is feedback we've received a lot, having been part of the Ceph community for such a long time. This complexity stems from Ceph's flexibility, which supports a huge matrix of use cases, and from the effect those choices have on available capacity, performance and availability.
B: We believe in abstracting complex concepts behind simple terms, so our main goal is simplification, where a user is not required to configure individual daemons or disk layouts. Instead, they provide a high-level specification of what they want in terms of availability and usable capacity, and we translate all of that into a deployment layout. We also believe in a really tight naming scheme, so I think that's always worth calling out. Alrighty: architecture.
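To illustrate the idea of translating a high-level specification into a deployment layout, here is a minimal sketch. This is not Aquarium's actual code; the function name and the fields are hypothetical, and the real solver would weigh far more than raw capacity.

```python
# Toy sketch: turn a high-level spec (desired replicas, desired usable
# capacity) into a layout, checking feasibility against available disks.
# All names here are hypothetical; Aquarium's real logic is more involved.

def plan_layout(usable_tb: float, replicas: int, disks_tb: list[float]) -> dict:
    """Return a feasible layout, or raise if the disks cannot satisfy it."""
    raw_needed = usable_tb * replicas       # replication multiplies raw usage
    raw_available = sum(disks_tb)
    if raw_available < raw_needed:
        raise ValueError(
            f"need {raw_needed} TB raw, only {raw_available} TB available"
        )
    return {
        "replicas": replicas,
        "raw_needed_tb": raw_needed,
        "raw_available_tb": raw_available,
        "usable_tb": min(usable_tb, raw_available / replicas),
    }

# 4 TB usable at 3x replication needs 12 TB raw; four 4 TB disks suffice.
layout = plan_layout(usable_tb=4, replicas=3, disks_tb=[4, 4, 4, 4])
```

The point of the sketch is the direction of the translation: the user states availability and usable capacity, and the system works out whether the hardware can deliver it.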
C: Okay, so Aquarium itself is a Python backend serving an Angular front end. It's basically a systemd service: the system starts Aquarium on boot. We've chosen not to containerize it, because Aquarium relies on tools like cephadm and other host system binaries, so containerizing it would make everything a bit weird. The purpose of this system service is mostly to manage and monitor Ceph, and to do some deploying when we need to.
C: Aquarium itself will be running on every single host in the system, so it's essentially a cluster on top of a cluster, which is always fun. Should the nodes need to communicate with each other, they do so over websockets, mostly because the framework we're using for the HTTP backend already supports websockets, and it was just easier.
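To show the shape of that node-to-node messaging, here is a toy stand-in. Aquarium uses websockets via its HTTP framework; this sketch substitutes stdlib asyncio streams so it stays self-contained, and every name in it is hypothetical.

```python
# Toy sketch of one Aquarium-like node pushing an event to another.
# Stdlib asyncio streams stand in for the websocket transport.
import asyncio
import json

async def serve_node(host: str, port: int, inbox: list) -> asyncio.AbstractServer:
    """A receiving node: accept one newline-delimited JSON message and record it."""
    async def handle(reader, writer):
        data = await reader.readline()
        inbox.append(json.loads(data))
        writer.close()
        await writer.wait_closed()
    return await asyncio.start_server(handle, host, port)

async def send_event(host: str, port: int, event: dict) -> None:
    """A peer node pushes an event, e.g. 'a service was deployed here'."""
    reader, writer = await asyncio.open_connection(host, port)
    writer.write(json.dumps(event).encode() + b"\n")
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def demo() -> list:
    inbox: list = []
    server = await serve_node("127.0.0.1", 0, inbox)       # port 0: pick a free port
    port = server.sockets[0].getsockname()[1]
    await send_event("127.0.0.1", port, {"type": "service-deployed", "name": "nfs"})
    await asyncio.sleep(0.1)                               # let the handler finish
    server.close()
    await server.wait_closed()
    return inbox

received = asyncio.run(demo())
```

The design point from the talk survives the substitution: peers exchange small event messages over whatever transport the backend framework already provides.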
C: We use etcd to keep that state, mostly because etcd does what we need, but also because, on top of already having a cluster, we didn't want to also implement a consensus algorithm and a key/value store and all of that. We just let etcd deal with it instead of implementing it ourselves. This way we don't have the concept of follower nodes and a leader node, and each node can serve the front end to the user without any particular constraints.
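A tiny sketch of why a shared key/value store removes the leader/follower split. The dict below stands in for etcd (which adds the consensus and replication the talk mentions); the class and key names are hypothetical.

```python
# Toy sketch: with a shared store, any node can serve any request.
# A plain dict stands in for etcd here; real nodes would use etcd's API.
from typing import Optional

class SharedState:
    """Stand-in for an etcd client held by one Aquarium node."""

    def __init__(self, backend: dict):
        self.backend = backend   # shared by every node, like an etcd cluster

    def put(self, key: str, value: str) -> None:
        self.backend[key] = value

    def get(self, key: str) -> Optional[str]:
        return self.backend.get(key)

# Every node points at the same store, so no node is the designated leader:
store: dict = {}
node_a = SharedState(store)
node_b = SharedState(store)

node_a.put("/cluster/join-token", "s3cr3t")          # written via one node...
token_seen_by_b = node_b.get("/cluster/join-token")  # ...readable from any other
```

In the real system etcd also makes this safe across machines and failures, which is exactly the part the team did not want to reimplement.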
C: While Aquarium itself is a Python backend and an Angular front end and all of that stuff, running as a system service, it runs from an image that we have been purposely building.
C: This is meant to be a Tumbleweed-based RAM disk, which is meant to be run directly from a USB stick or to be PXE booted.
C: This image is where Aquarium lives. It starts on boot and, if we reboot, Aquarium makes sure that all the persistent state is kept on a system disk that is created upon deployment. Mostly this is meant for the upgrade path, so that upgrading Aquarium does not mean re-flashing a disk or upgrading an OS; it just means changing the image and rebooting the server. At least that's the hope: that this will make upgrading Aquarium easier, and dissociates upgrading Aquarium from upgrading Ceph.
C: Gravel itself has three major components, to some extent. One of them deals with the local node on which that particular service is running, and keeps track of disks, CPU load, things like that, that might be interesting to have access to and are necessary before we actually deploy the initial node, before cephadm has access to that information.
C: Another deals with deploying the initial node or having nodes join the cluster, and then we have a resource abstraction module that basically translates Ceph concepts into something the web front end can consume in a fancy manner, so that those resources can be abstracted for the user.
C: We aim at having an easy and guided install that works out of the box. We are abstracting these concepts: we don't have pools or gateways, we offer services. We offer a file service, which can have multiple backends, and we intend, at some point in the near future, to have use-case-driven deployment, in the sense that the user specifies a set of wants and the system provides a solution, or feedback on its feasibility.
C: Okay, so, as mentioned before, this is a pre-recorded demo, so you get to see the whole thing. Okay, cool. To start, this is a web browser connecting, in this case, to a VM running Aquarium. When we connect to the host at the specified port, we get this splash page welcoming the user to Aquarium. Moving on, we go through a set of guided steps: we are creating a new cluster.
C: We get to see the devices we have available, which was not clear enough there, but essentially the service will pick a solution from the disks we have available, choosing one of them as the system disk. That's where we will persist all the data we need to persist.
C: We get immediate feedback based on the resources we have available, maybe the number of replicas or the capacity they have available, and as the service is being deployed, it is creating the pools in the back end and calling all the necessary bits and bobs in Ceph so that we get a nice dashboard. Well, it will eventually be fancier, but right now it's providing the information that we actually need in terms of services.
C: For that, we will need to choose the join-existing-cluster setup, again set up the hostname, and we will need the IP address of an existing server, any existing server, and the authentication token that we generated upon initial deployment.
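A join-token flow like the one described could look roughly like this stdlib-only sketch. It is hypothetical: Aquarium's actual token format and validation are not shown in the talk, and both function names here are made up.

```python
# Hypothetical sketch of the join-token handshake: the first node mints a
# token at initial deployment; a joining node presents it, and the existing
# node compares it in constant time. Not Aquarium's actual code.
import hmac
import secrets

def generate_join_token() -> str:
    """Created on the initial node when the cluster is first deployed."""
    return secrets.token_hex(16)   # 128 bits of randomness, hex-encoded

def token_matches(expected: str, presented: str) -> bool:
    """Constant-time comparison, so timing doesn't leak the token."""
    return hmac.compare_digest(expected, presented)
```

The constant-time comparison is the one non-obvious choice: a naive `==` on secrets can leak information through response timing.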
C: We had to take a bit of creative liberty here too, and cut all the boring parts between then and now, but you can see that the system shows both hosts, the existing one and the newly added one, and we can add a new service from the second host, and it will show up on the first host as well.
C: So both nodes are basically able to perform tasks and see live data in the cluster as they operate it; it's a bit irrelevant which node you're actually doing anything from. And this bit here is just to show that we actually can write data to the cluster. We provide you with a helper with mount points for each service; in this case, in the background, we've got a script.
B: Alright, so what's next? I think what we've got is something pretty exciting, but we still need to really be thinking about where we're going from here. It's a new, budding project, so in the short term we're going to focus largely on dashboard improvements, getting the object service up and running, running from a USB stick, and PXE booting on real hardware. And, to be honest with you: tests, lots of tests. We're working on an extensive testing plan to get ourselves up to scratch.
B: I think one of the most important things is testing, and that's always been such a high bar in the Ceph community; that's something we want to keep following along with. In the medium term, upgrades are going to become a priority, along with the block service and a resource-constraint solver. And in the long term, world domination, obviously. No, our priorities going forward are going to be community development.
B: This is a passion project for our team and we want to be open and honest with everybody who wants to get involved; we're here for a good time and a long time. As a bonus, we're very proud of how far we've come in such a short period of time. We started this project in January and we've been working with some seriously emerging engineers, and we're open to anyone and everyone: come check out what we've got, what we're about, and provide some feedback.
B: We do, as I've put there, have a developer roadmap. This is just sort of a glance at all the things that we're really hoping to achieve over the next, I think, seven or eight milestones we put down; you all can correct me if I'm wrong there.
B: Subject to change, indeed. Cool, so that's us! Our project is open; links are here, or you can contact either one of us, or anyone in the storage team, for more information. We're on Slack, we've got a GitHub discussions board, and we meet three times a week to cover as many different time zones as possible: one meeting that's more for APJ, one for Europe, and one for the Americas.
B: We have a basic getting-started guide, from zero to hacking, and we'd love to see what people think. So, thanks very much. I can probably stop sharing my screen, if anybody has any questions.
D: Yeah, Ernesto here. I'm interested in knowing what the lessons learned from the dashboard experience were for you, compared to this one.
D: Yeah, yeah, I am interested in knowing your lessons learned, or how you apply them to this, from your past experience with the Ceph dashboard.
C: It's really hard to compare, because the projects, and the ways we are going, are really, really different. What I'd say is that we're factoring in the bits that are most useful without actually copying the existing dashboard, because we don't want this to be competing with the dashboard. I think they follow two different approaches, at least at this moment in time, and they are, or at least we are hoping that they are, helpful in two different contexts: one in which you want access to as much of Ceph's control as possible, and another where you can afford to relinquish some control and just use abstractions and whatnot.
D: Yeah, kind of, thanks. One extra question: is it possible, then, to switch or jump from this Aquarium dashboard to the Ceph dashboard as a fallback mechanism, or is that completely hidden?
F: I wanted to ask if the front end is making use of the dashboard back-end API in order to get its visibility of the cluster and whatever it's manipulating, or if it's disabling the dashboard module. I'm just curious how it is that you're interacting with Ceph itself.
C: We're using the Python librados bindings, mostly. Initially we actually attempted to use the dashboard API, but we realized that a lot of that API is targeting the dashboard itself and is not necessarily, or at least is not yet in a state to be, an API to Ceph. As such, we chose to go with the Python librados bindings, actually issuing commands directly to the manager and the monitors.
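A rough sketch of that approach: Ceph mon/mgr commands are JSON objects with a `prefix` field, sent through librados's `mon_command`. The command-building part below is plain plumbing; the connection part is illustrative only, needs the `rados` bindings and a live cluster, and the config path is an assumption.

```python
# Sketch: build a Ceph monitor command as JSON, then (optionally) send it
# via the librados Python bindings. Only the builder runs standalone.
import json

def build_mon_cmd(prefix: str, **kwargs) -> str:
    """Ceph mon/mgr commands are JSON objects keyed by a 'prefix' string."""
    cmd = {"prefix": prefix, "format": "json"}
    cmd.update(kwargs)
    return json.dumps(cmd)

def cluster_status(conffile: str = "/etc/ceph/ceph.conf") -> dict:
    """Illustrative: requires the python3-rados package and a running cluster."""
    import rados  # imported lazily; not available without a Ceph install
    cluster = rados.Rados(conffile=conffile)
    cluster.connect()
    try:
        ret, outbuf, outs = cluster.mon_command(build_mon_cmd("status"), b"")
        return json.loads(outbuf)
    finally:
        cluster.shutdown()
```

Issuing commands this way talks to the monitors and manager directly, which is the distinction the answer draws against going through the dashboard's REST API.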
F: Okay, I had a different question, about the hardware side of things: are you targeting particular models or particular hardware for the servers themselves, or is this meant to sort of run on anything? I'm just curious.
C: The tricky thing about deciding on the actual hardware is that, first of all, one would have to have access to the hardware to actually, you know, validate it, and on top of that it can be tricky overall. But I don't think, at this point in time, we are looking for specific vendors or models or whatever for the hardware.
B: Yeah, I mentioned earlier that we're still discussing a bit what opinionated means, and honestly that's part of the question. It's quite a double-edged thing: as much as we can say, yes, this is exactly the way we want things to go, we do recognize that it's an open source project; people can take it and do what they want with it.
B: So it all really sort of depends on the future and, I would probably say, after this year I think we might have a better idea of where we're going with regard to hardware. But it's going to take some time and probably a lot of thought.
C: So, if we have a Ceph outage, Aquarium still needs to be working, to be able to at least eventually help the user recover the cluster. As such, we cannot rely on, for instance, the monitors' key/value store to keep the state we need to operate Aquarium; we need something that is not in Ceph to keep that state. The benefit of etcd is that it already has a consensus algorithm and is already able to function in a clustered way, so we don't have to implement that in Aquarium.