From YouTube: 2018-02-27 Rook Community Meeting
C: All right, the recording has started, so let's get going. This is the February 27th, 2018 community meeting, so let's jump right into the agenda. One of the first things I wanted to talk about today was any issues we have open for 0.7 and whether we're going to need to do any releases to address them. I know of one fix so far that has been backported to the 0.7 branch, for being able to publish the Helm charts to the new bucket.
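(The publish step being referred to is roughly of this shape: package the chart, regenerate the repository index, and sync it to the bucket. This is only a sketch, not the actual promote job; the chart path, URL, and bucket name below are placeholders.)

```sh
# Sketch of a typical "publish Helm charts to a bucket" step.
# Paths, URL, and bucket name are placeholders, not the Rook promote job.
helm package ./rook-operator
helm repo index . --url https://charts.example.com
aws s3 sync . s3://example-helm-charts/
```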
D: I've not been able to test the promote step, because that only runs when we actually run the promote build, so I'm hoping it works. The problem was it was pointing at the wrong repo, the wrong bucket. But I think we should run it; that's the one way to find out. So I'd say, once we get everything else in, let's do a build and see if we can deploy it and whether it works.
C: My question about that, or rather two questions. The first one would be: was the bucket name change associated with the transfer of ownership to the CNCF? Yes, okay, got it. And then, is it possible to rerun the 0.7 build? Is it idempotent, in the sense that you can do a promotion and it will overwrite the existing 0.7, or do we need to cut a 0.7.1 for this?
C: Okay. Then I tried to catch up on the conversation about the new Ceph version yesterday. Is it that we're picking up 12.2.3 because the new packages are released and the build will automatically pick them up, or was there a specific commit to our repo to point it at 12.2.3?
D: So one of the problems with how we're building things right now, until they make changes upstream, is that we pick up whatever is in their apt repositories. Right now we're using luminous, and they've just upgraded luminous to 12.2.3. So every time we build a Docker image we're going to pick up the latest; there is no way to control which minor version we pick up.
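(A minimal sketch of what pinning a point release would look like at image build time, if the packages stayed available under versioned names; the package set and version string here are illustrative, not the actual Rook build scripts.)

```sh
# Sketch only: pin a specific Ceph point release at image build time instead of
# taking whatever is currently latest in the luminous apt repository.
apt-get update
apt-get install -y --allow-downgrades \
  ceph-common=12.2.2-1xenial \
  ceph-mon=12.2.2-1xenial \
  ceph-osd=12.2.2-1xenial
```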
A: Well, by stopping the agents. An agent will have reused Docker images from previous builds to make builds more efficient. So since we wanted to pick up Ceph 12.2.3, I thought, let's just recycle the build agents, and that's why I recycled them. Then the Docker cache was gone, a new build came up and picked it up. Does that answer your question?
D: It should be. For 0.8 I think this will all be fixed, and we'll talk about some of the background. If we start using different images, we'll just use the ceph-container one, and those are going to be labeled with specific tags down to the minor version. I honestly don't know how they're going to solve the problem of: if there was a security update to the base OS image and the apt repository has moved on, I don't know how they're going to rebuild it with an old version.
D: Well, the caches are super safe. The only reason this issue happens is that we don't have package pins for Ceph. If there is a change in upstream Ubuntu, even when we build we will pick up the new Ubuntu release despite the caching, because we run all the base image builds with --pull, which means that if there is a change somewhere upstream we will pick up the whole chain. So it seems very safe to me, except for this one issue we identified.
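(In other words, roughly the behavior below: --pull forces Docker to re-check the upstream base image instead of trusting the cached copy, so any change in the base image chain propagates into the build. The image name is just illustrative.)

```sh
# --pull re-checks the base image on every build, so the cached layer chain is
# refreshed whenever anything upstream changes.
docker build --pull -t rook/ceph-base .
```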
C: Alright, that sounds good for 0.7 to me. I also wanted to discuss the much higher rate of continuous integration pipeline failures we're seeing now. This may or may not be related to the Jenkins agents being reset yesterday, but the failures we previously believed to be random, we're now seeing at a much higher rate. Travis, did you happen to get any insight on that from last night? I did not follow up myself.
A: I was thinking more about this, and we have seen a pattern where agents that are reused, the same EC2 instances, have a higher rate of failure. We had several master builds in a row fail, which confirmed that, because they ran one hour after the other. This morning I started the master build again; all the agents had been stopped because no builds had been running, and the build succeeded. So there's something about warm agents that tends to cause these issues. There must be builds leaving things behind or running over each other.
C: A couple of questions on that, then. I would have a preference, if it's possible, for figuring out what the stale state is. If it's something that should be the responsibility of the test framework itself to clean up, and it's not cleaning up appropriately, then I would like to be able to fix it there, right?
C: Definitely, yeah. I don't know if we know exactly what that is, but it would be good if we could figure it out in any reasonable timeframe. The other question would be: do you know what the current Jenkins configuration is, in terms of how long an agent instance will sit idle before it gets terminated completely, such that the next build that comes up will get a fresh, completely new EC2 instance?
A: Then, for the agents, I guess I don't know if there are different types of agents, but there's the main agent that runs the build, and then there are the other individual agents that run each of the Kubernetes 1.6, 1.7, and 1.8 end-to-end tests, right? Those agents seem like they start up and then go away.
A: And the most common failure when this happens is that it tries to delete the operator or the cluster, or start one that doesn't exist; anyway, something where it exists and it shouldn't, or it doesn't and it should, and so it fails. The only way that could happen, it seems, is state left over from a previous build. But yeah, I'll look again and see if maybe there's some logging to add to help track it down or something.
C: ...for that very quickly, and be able to reach a consensus about how best to approach this. Then we're going to have to do some refactoring, which obviously has to be front-loaded, across the repository to be able to enable this and have other backends besides just Ceph, with one potential backend to start with.
D: I'd say also that Minio has a Helm chart today and it's fairly limited; it doesn't even do erasure coding, and doesn't even set up Minio to do some of the things it's capable of. That's why I think an operator controller for Minio makes a lot of sense, so that seems like an interesting backend. As the first other backend to add besides Ceph, Minio makes it interesting.
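(For reference, the existing upstream chart mentioned here can be installed with a single Helm 2 command, sketched below with an arbitrary release name; its defaults are a basic standalone deployment, which is exactly the limitation being discussed. Distributed mode, erasure coding, and day-2 operations are what a Minio operator would automate.)

```sh
# The upstream Minio Helm chart of the time (Helm 2 syntax); the defaults give a
# basic standalone deployment without erasure coding.
helm install --name minio stable/minio
```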
A: Part of that design too. When I started this design, what I felt I was missing, and am now realizing, is including: well, what's the scenario? What would be the motivation in this case for Minio users to come to Rook to then go run Minio, right? So I guess: here's the what, here's the why, and the motivation.
C: You, Alexander, yeah, and Travis. One of the points that Bassam had just made, I thought, was a fairly persuasive argument for the why of Minio, or the benefit that it brings: the existing Helm chart, Bassam said, is very limited, and a lot of the features that Minio is capable of can't even be enabled with it, like erasure coding.
D: Right. I mean, this is a common pattern and actually a core premise of Rook: most storage systems and stateful workloads go as far as using StatefulSets, but don't go any further today. As a result, you don't get to do things like scaling them, dealing with consistency requirements while they're scaling, quorums, setting up failure domains, migrating and replicating data across failure domains, or being sensitive to how often that happens.
C: I realize I was on mute for a while there, sorry. So, Kubernetes, the next item in the list. Kubernetes has a couple of different options for extending the platform and defining your own custom resources. Currently Rook is using custom resource definitions, which are somewhat limited and rigid in what they enable you to do. API aggregation is a much more flexible way of defining and extending the Kubernetes API and gives you a lot of features, like validation and multi-versioning.
C: Custom logic in there, like merging for patches, support for upgrades, things like that. So front-loading this work, if we're going to change from using CRDs to API aggregation, would be important as well, because that's part of defining and solidifying our API surface area, right? Yeah.
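(For a sense of what's in place today: Rook's current extension point is a plain CRD along the lines of the sketch below, based on the v1alpha1 types; treat the details as illustrative. This is the form to which the validation and multi-versioning limitations above apply.)

```sh
# A sketch of the CRD-based approach in use today (apiextensions v1beta1, which is
# what Kubernetes 1.7/1.8 provide). Unlike an aggregated API server, a plain CRD
# gives no admission-time validation, defaulting, or multi-version conversion.
kubectl apply -f - <<EOF
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: clusters.rook.io
spec:
  group: rook.io
  version: v1alpha1
  scope: Namespaced
  names:
    kind: Cluster
    plural: clusters
    singular: cluster
EOF
```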
D: Now, I'd argue that our story right now is completely busted. We have to ask the user to literally go transform an old CRD by hand if we upgrade it, and we have to write tools that live outside of it. And it looks like, as of 1.7, well, CRDs were introduced in 1.7 and API aggregation was introduced in 1.7, and it looks like older...
C: So the next item I had on the roadmap here is our existing CI pipelines and build/release promotion pipelines. All of that is hosted in a Jenkins instance that we need to migrate over to something hosted and owned by the CNCF. I don't know what all that entails, but I don't think it's entirely trivial, and it's probably going to take a fair amount of work to do. Is there...
C: Okay. So there's still work, I believe, and I don't know how well it's captured, so somebody can perhaps comment on this, for running these unprivileged, and for security issues in general. I think a lot of these were associated with what we saw on OpenShift, and I don't know the full scope of the work that's left there, but I know it's not completed.
D: So I think there was a bunch of cleanup, including unit test cleanup, to get there: getting the Rook API and CLI out of the tree. That enables the whole security design around namespaces, and removing or reducing the number of service accounts, and all of that. There's already work that started there, and I think it got stuck on the fact that there are just so many unit tests that use the Rook API, so it's going to need someone to go through it.
C: So the next one I have here is about running on arbitrary persistent volumes. That's something we've had a desire to do for a couple of milestones, but it has been kicked down the road a couple of times. My understanding of this feature is that it would enable Rook in general to be backed by, say, Google persistent disks if you're running in Google, or EBS volumes if you're running in Amazon; anything that can be surfaced as a persistent volume could be used as the, for lack of a better term...
D: Basically there is a notion of a lower storage cluster substrate which we build storage systems on top of. Usually it's local disk, but it doesn't have to be, and I think that ties together running on arbitrary PVs and whatever we end up doing as the storage cluster scope, or whatever it ends up being, using that. So I think this is a very critical feature; honestly, right now we're on this whole host-based storage approach.
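(As a rough illustration of what "backed by arbitrary PVs" could mean in practice, the cluster would consume claims like the sketch below instead of raw local disks. The storage class name and size are assumptions, not an agreed design.)

```sh
# Hypothetical sketch: a PVC against a cloud StorageClass (EBS gp2 here) that the
# storage cluster could consume as its substrate instead of raw local disks.
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rook-osd-data-0
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: gp2
  resources:
    requests:
      storage: 100Gi
EOF
```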
C: All right. I think we've still got some issues with shutting down nodes, draining nodes, restarting nodes, all that sort of thing; there are still issues floating around there that have not been well addressed. So effort on that would still help. We see issues about that pop up on Slack with a non-zero frequency as well, so that would go a long way toward helping the user base.
C: Then we have some Ceph-specific features and improvements that we'd like to accomplish. One of them is finishing off the work that was started in 0.7: we have support for adding and removing nodes in a storage cluster right now, and we actually have a lot of work that goes toward supporting individual disks, but that's not officially supported by the operator at the cluster level yet, and there are a number of prerequisites for that to happen. One of them would be the architecture change of having a single OSD per pod.
C: ...and there's also the Ceph manager balancer module. It seems like, in terms of being able to keep PGs balanced across the cluster, that is a cheap win: we can enable it by just having the operator get that module loaded by the Ceph manager instances, and then we start getting that behavior fairly cheaply.
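(Roughly, the "cheap" enablement described here is the operator issuing something like the commands below against the mgr; the mode shown is one of the options in luminous, so treat the exact invocation as a sketch.)

```sh
# Enable the mgr balancer module and turn on automatic PG balancing.
# crush-compat is one of the modes available in luminous; upmap is the other.
ceph mgr module enable balancer
ceph balancer mode crush-compat
ceph balancer on
```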
D: One thing, Travis, that I'm curious about: I believe there is interest in the Red Hat team to use it in the mimic time frame, which I believe is May, or at least May-ish. So I'm hoping that we can get more people involved in getting the Ceph backend up to, you know, production levels in mimic, if it's going to be deployed with it.
C: And then it's also important to note here, too, that the work around multiple storage backends and refactoring the codebase is going to have parts and pieces of the codebase moving around. So we'll need to make sure we do this up front and quickly, so that we're not stepping on toes for the Ceph-specific features we wanted to do, right?
C: Okay, so yes, I was thinking that removing the Rook API and CLI needs to be done soon; that could be done in 0.8. But then all the security work that it enables could probably get bumped to 0.9 instead. So we'll start with this and then do the later security work in the next milestone.
C: Okay, so we're...
C: Yeah, and we may need to be okay with that to get to the right architecture. Still, running the operator in high availability: it's still a single point of failure. Right now, if the operator dies for some reason, there's nothing besides the standard Kubernetes support; there is no master/slave failover, I believe.
C: We have durability of state as well. That was in a previous milestone, and it's uncertain. I know it's about supporting local storage and local volumes, and being able, in a disaster recovery situation, to make sure the configuration, the metadata, and so on can be regenerated. Didn't somebody recently hit that, Travis, where they accidentally deleted all of the CRDs, the operator, all that sort of stuff, which deleted a lot of things, and they had to rebuild their cluster from what was existing on disk? Did that go well for them?
A: They didn't end up doing it, actually. They said it was just a test cluster, but they wanted to know if it was possible in case it had been a production cluster. So I said, oh, and I think he opened a ticket so we could document that, but yeah, I haven't gone there yet to actually try it. It should be possible, but I think people would feel better if it was documented and tested.
C: So we'd already discussed moving to API aggregation, which would give us the ability to do some of these features: richer validation of the CRDs, status and progress, all of that. So we'll get the ability to do those with API aggregation, but there's still work associated with actually enabling and supporting those types of features, beyond just making the switch to API aggregation.
C: We already talked about snapshots; we moved that from 0.8 to 0.9. Then the Ceph features I have here are to improve data placement and pool configuration, for which there's been some discussion, that I can't recall right now, from the Red Hat team. I believe that fits better in the Ceph platform itself, as opposed to at the Rook level. So there may be more discussion there, especially about what's going to be happening in the mimic time frame that may address some of this more appropriately at the Ceph level.
C: And then also for CephFS: for block devices with RBD, we have dynamic volume provisioning and the PVCs, persistent volume claims, but we do not have the same experience for CephFS, so enabling that might be something to address. I haven't heard...
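(For comparison, the existing block/RBD experience being referenced is roughly a StorageClass plus a PVC like the sketch below. The provisioner and pool names follow the 0.7-era examples, but treat the details as illustrative; the idea would be a similarly simple flow for the shared filesystem.)

```sh
# The dynamic provisioning flow that exists for block today: a Rook StorageClass
# plus an ordinary PVC. Names and pool are illustrative of the 0.7-era setup.
kubectl apply -f - <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-block
provisioner: rook.io/block
parameters:
  pool: replicapool
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myapp-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: rook-block
  resources:
    requests:
      storage: 10Gi
EOF
```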
C: And so that's all I have for 0.9. For 1.0, to get there, a lot of the types we have defined, our custom resources, will need to make it to a stable or v1 version. That's obviously a hugely nebulous goal that breaks down into a whole lot of smaller tasks to get to that point.
C: So the next step here is to update the ROADMAP.md in the Rook repo with some of this. That'll be part of a pull request, which then gives the opportunity for more commentary to solidify this before it gets committed to the repo. When I open up the pull request, I'll take a stab at some of these items, you know, which types could go to beta, and then we can discuss it in the pull request. Does that sound reasonable?