From YouTube: CNCF SIG Storage 2020-09-23
A
I'm going to give everyone two more minutes to join. It doesn't look like we have very many people.
A
The first item is the Dataset Lifecycle Framework; I believe you're joining to talk about that. The project went through a sandbox review with the TOC yesterday, and they had some questions about how it differentiates from the COSI KEP. So I wanted to start with that discussion, if we could, please.
D
Okay, so for that context: I shared the frequently asked questions. I just compiled them today, because the documentation is still lagging and it's a bit difficult to digest what exactly the framework is trying to accomplish. The comments were fair about it not being very clear what the framework adds to the ecosystem and how it compares with COSI or with the other CSI plugins. So I can just share my screen for the wiki, if that works.
D
Okay, can you see my screen? Yes? Okay, I'll just go through everything, because it's only a few questions. So, what does the framework do exactly? It brings one new custom resource definition, the Dataset, which is basically a pointer to an existing data source; the current implementation supports both S3 and NFS. Is it just a CRD? No.
D
We map every Dataset that you create to one PVC that users can directly mount into their pods, and the logic is implemented as a normal Kubernetes operator.
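To make that idea concrete, a Dataset manifest might look something like the sketch below. The API group, version, and field names here are illustrative assumptions, not the framework's published schema:

```yaml
# Hypothetical Dataset manifest: apiVersion, kind, and spec fields are
# illustrative assumptions, not the framework's actual schema.
apiVersion: datasets.example.io/v1alpha1
kind: Dataset
metadata:
  name: genomes-sample
spec:
  type: COS                        # S3-compatible cloud object storage
  endpoint: https://s3.example.com
  bucket: genomes-sample-bucket
  secret-name: genomes-s3-creds    # credentials held in a Secret
```

The operator would then reconcile an object like this into a PVC of the same name (`genomes-sample`) that pods can mount directly.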
Now, what is the motivation for this work? Why did we start looking at this? When the Container Storage Interface was introduced, more and more storage providers became available in Kubernetes environments.
D
From our perspective, though, the barrier is still a bit high for non-expert, non-power users of Kubernetes to leverage the available CSI plugins and gain access to remote data sources from their workloads.
D
So that's what we're trying to enhance: the user experience of data access in Kubernetes. We bring a higher level of abstraction, the Dataset, and we take care of all the necessary work of invoking the appropriate CSI plugin, configuring it, provisioning, and, in the end, handing a PVC to the end users to consume. Of course, we're not looking to replace CSI.
D
If you go through our framework, we have implementations of CSI plugins that are standalone, so you can take that part of the code and use it on its own. We have CSI S3 and CSI NFS implementations that are actually open source, and I will talk about what we did there. Our aspiration is to be a meta-framework for CSI plugins; the comparison I like to make is that, the same way Kubeflow makes machine learning frameworks accessible,
D
we want to make CSI plugins and PVCs accessible in Kubernetes environments. As for the COSI proposal: of course we're not competing with COSI, and we're not currently aiming to have that functionality
D
rewritten as part of our framework. When we started the project almost a year ago, the only CSI plugin we were aware of was the CSI S3 plugin I'm pointing at here, and we actually maintain a fork of it, because there were some dependencies with the sidecars and so on; we keep it updated as-is in our repo. In the future, when the COSI interface becomes part of Kubernetes, of course we'll stop
D
maintaining our forked version of CSI S3 and directly invoke COSI to create a PVC for buckets in a cloud object store. COSI also aims to manage the full lifecycle of a bucket, provisioning it and configuring access, which is actually beyond our scope; buckets and S3 are just one of the types of Dataset we want to support. There are also some additional benefits on the roadmap, for which we have some initial implementations.
D
We feel that after you introduce the concept of a Dataset as a higher-level abstraction, you can also build higher-level orchestration on top, so I think we can achieve improvements in terms of performance. We try to present a pluggable caching interface, and we have an example implementation of how this would work: it would be completely transparent to the user, and caches could be provisioned depending on the type of Dataset without the user explicitly
D
specifying them or configuring the cache on their own. We also feel there might be interest in the security aspect, because imagine we could have a common layer of access management for the credentials of the different types of data sources. Whether you have normal S3 credentials, like a secret access key and an access key ID, or in the same fashion a username and password, they would all be part of the same access management layer, and we believe there is some interest in that.
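As a sketch of that common credentials layer (the Dataset schema shown is an assumption for illustration, not the framework's documented one), the credentials could be defined once in a Secret that each Dataset references by name:

```yaml
# Illustrative only: credentials live once in a Secret and are
# referenced from the Dataset, instead of being copied into every pod.
apiVersion: v1
kind: Secret
metadata:
  name: genomes-s3-creds
type: Opaque
stringData:
  accessKeyID: AKIAEXAMPLE            # placeholder values
  secretAccessKey: example-secret-key
---
apiVersion: datasets.example.io/v1alpha1   # hypothetical API group
kind: Dataset
metadata:
  name: genomes-sample
spec:
  type: COS
  endpoint: https://s3.example.com
  bucket: genomes-sample-bucket
  secret-name: genomes-s3-creds       # the common access-management hook
```

An NFS-backed Dataset could reference a username/password Secret through the same field, which is the uniformity being described here.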
D
I would like to point out, and give a shout-out to, the people who have embraced the framework even at this very early stage. The European Bioinformatics Institute, and David specifically, are running a PoC with DLF and Kubeflow on their cloud infrastructure. Basically, they're using pipelines, the 1000 Genomes data are on S3 buckets, and they're rewriting their pipelines to use the Dataset convention, because before DLF the S3 credentials were repeated and plugged in everywhere, like environment variables.
D
Instead, with the framework, they feel it's more convenient for the user to digest and to write their pipelines.
D
There is also interest from the Open Data Hub; you can see a relevant issue showing interest in integrating DLF directly into Open Data Hub, and there's a proposal there that is actually very close to the Dataset specification we support in our code. If you look in their code, DLF is actually forked, and it's under evaluation for whether it can support the implementation.
D
So I'll pause right now and take any questions or comments that you might have.
A
I just had one question. I thought the purpose of using the PVC was to leverage the way that we mount volumes, to keep that inherent functionality for the Datasets. Switching to COSI, it's not going to be exactly the same, but regardless, the idea is still that you want to provide an easy, accessible way to reach a specific dataset. So it's almost like pre-populating a bucket, basically, and allowing a pod to point to it directly.
D
I need to study the COSI proposal a bit more; I wasn't sure that creating a PVC was part of it. But the more it's integrated into the Kubernetes environment, the better for us, because then we don't have to do any new types of orchestration on our own. As for the PVC: imagine that there's a one-to-one mapping, one Dataset, one PVC.
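Under that one-to-one mapping, a user's pod would consume the Dataset through the identically named PVC, like any other claim. The names here are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: analysis
spec:
  containers:
    - name: worker
      image: python:3.9
      command: ["python", "/data/run.py"]
      volumeMounts:
        - name: dataset-vol
          mountPath: /data            # remote data appears as a local path
  volumes:
    - name: dataset-vol
      persistentVolumeClaim:
        claimName: genomes-sample     # PVC the framework created for the Dataset
```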
D
We think of the Dataset as a user-facing thing: the user is aware of Datasets, and in their pods they just use a PVC already configured by the framework. We're also envisioning scenarios where another provider, let's say, creates the Dataset pointers; the Dataset objects would be created in the cluster by one persona, and the other persona would be a simple user,
D
a user who just wants to launch pods and run workloads that mount the data, without configuring or finding PVCs and all that. We are also adding an admission controller to inject those PVCs into the pods based on labels, but this is an additional feature.
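The label-driven injection might look roughly like the following; the label keys are an assumption for illustration, not necessarily the framework's documented convention:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: analysis
  labels:
    dataset.0.id: genomes-sample   # hypothetical keys: the admission
    dataset.0.useas: mount         # controller matches on these labels
spec:
  containers:
    - name: worker
      image: python:3.9
      command: ["python", "/data/run.py"]
      # no volumes or volumeMounts declared here: the admission
      # controller would inject the PVC for the named Dataset
```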
D
The PVC works on its own as a normal PVC in Kubernetes, and that's our goal: not to replace PVCs, not to replace CSI, but instead to offer a more user-friendly interface, and maybe even configure it for users who don't want to be bothered with provisioning and creating the configuration for their PVCs.
A
Okay, this is very useful, so thank you for putting together this FAQ. I think it will help clarify some of the questions the TOC had. I'm not sure what the review cycle is, and it doesn't look like Amye's on; maybe it'll be pushed out until October again, when they can get back to it, but I'll find out and let you know.
D
The other thing I would say is that we were striving to get more feedback. The new process is a bit unclear to me: it started with a pull request, then it was a Google form, and it was a bit hard to keep track of where the review was happening, so I didn't get a chance to explain a bit more.
D
I understand it should be part of the framework and the documentation, but if there's anything I can answer, or any better explanation or demo I can give, I would be happy to.
A
Right. It's perfectly fine that you find the new process a little confusing, because it's only been in place since August and I don't think it's ironed out yet. The idea of the new process was to simplify the way projects get into sandbox and not require a presentation.
A
The idea was that the TOC could look over the questionnaire that was filled out, and then, if they needed additional information, they would kick it back to the SIG, as they did here, and we would provide that information. So you're not required to do a presentation; you already presented to the SIG, we recorded that, and we provided that information. That communication is still being ironed out, but I think you've provided everything you need. If not, I'll reach back out to you and let you know.
A
Yeah, thanks. I'll send this over to the TOC, find out when their next meeting is, and get back to you then. Thank you for this. All right, moving down the list: Kiran, I believe you had some updates on the licensing questions we had from the last time we talked about OpenEBS.
C
Yes, I have updated the agenda doc with the information I've been trying to put together. The action item was to list all the OpenEBS top-level repositories and what dependencies those repositories have on other projects. I can share my screen and quickly walk through that, if that's okay.
C
I hope you can see my screen now. Yeah? Perfect. Along with cStor, I also updated the remaining repositories. Since cStor was the first one that had a lot of open questions, I'll cover that, and then we can go back to the other ones.
C
The main concern with cStor is that it depends on ZFS, and ZFS itself is a CDDL-licensed project. What OpenEBS did was start from the OpenEBS ZFS project, which is actually the kernel ZFS.
C
You can build kernel modules with that, and OpenEBS ported it to run in user space. So OpenEBS cStor, which is a fork of the OpenEBS ZFS code, holds the modifications for making it work in user space, and the actual functionality of how OpenEBS uses ZFS for storing data, along with the replication and high-availability features added on top, is all part of the OpenEBS libcstor and istgt repos.
C
Now, prior to the last call, the way the code was being built was that the OpenEBS cStor repo was actually pulling in the changes from libcstor and building the binary, and I think that was highlighted as an issue. So now we have turned that around: libcstor is the one that actually contains the main entry point, if you will, that instantiates the binary, and it uses the OpenEBS cStor code as a library, just like any project uses any other dependency.
C
So that's where it is now. We're still keeping it open for discussion and trying to understand whether this is okay or anything further needs to be done, and we'll leave it at that.
A
Yeah, I think this helps clarify how you're using it and how it's being built differently, and I think it addresses the questions. I would probably need to run this by Alex and Quinton; I think we have to have consensus from the leads before we move forward, and then it would, of course, still go through the due diligence as normal. But before that, does anyone have any questions or concerns around this that they want to bring up?
A
Off the cuff, it seems to satisfy the concerns we had, but let me talk with Alex and Quinton and get back to you, Kiran. Does that work?
C
Last time, when we had the presentation, I think this was the first thing we had to get across before assigning the reviewer. I don't think anybody is assigned yet. Okay.
A
Okay, the last item we had was Pravega. Justin Cormack from the TOC offered yesterday, or the day before, to run the due diligence, but we also need a tech lead from the SIG. Are there any volunteers?
B
Actually, it would maybe be nice, at the next meeting, if we could go through the due diligence process, because I myself do not know it very well, and I don't know if many people do. I don't know if that would help.
E
Aaron, this is Tom. I would like to help draft that process and see it in action. Pravega is an interesting technology, and if there's a chance for a newbie to watch along and see it go through the paces of coming in, that would be great.
A
Yeah, that'd be awesome. I don't think we have a good process today; in the past it's just been whoever has the time, and we go through it and work together, and it's not necessarily repeatable or comprehensive. So absolutely, if you have time to volunteer and want to do that, that would be awesome, Tom. Thank you.
A
All right, that's all I had for the agenda today. Does anyone else have anything they want to talk about or bring up?