From YouTube: Kubernetes SIG Storage 20170928
Description
Kubernetes Storage Special Interest Group (SIG) Meeting - 28 September 2017
Meeting Notes/Agenda: https://docs.google.com/document/d/1-8KEG8AjAgKznS9NFm3qWqkGyCHmvU6HVl0sk5hwoAE/edit#heading=h.vcj8dh41d9r2
Find out more about the Storage SIG here: https://github.com/kubernetes/community/tree/master/sig-storage
Moderator: Saad Ali (Google)
Chat Log:
09:11:04 From hekumar : can you hide that zoom window and maximize spreadsheet?
09:11:28 From hekumar : Palak/Saad ^
09:24:20 From jinxu : Saad, My headphone is not working
A: All right, so the first thing is 1.9 planning. We are at the end of Q3, starting Q4; 1.8 shipped yesterday, Wednesday. For planning: we have a new PM [name unclear] who is going to be working with SIG Storage and reporting back to the PM SIG. It used to be Matt [surname unclear], but Matt is working with other SIGs now. Palak, do you want to take over? I'll stop sharing my screen and you can share yours.
A: We're going to ask you to explain what the feature is, then try to figure out what the priority is, and hopefully assign someone to actually work on it. For folks who are new to the SIG, this is an opportunity to step in and contribute. This is where we figure out what we're going to be working on, who's going to be working on it, who is going to be reviewing it, and that kind of thing.
A: We're not going to freeze it right now. It's not going to be frozen until the feature freeze date for Kubernetes, which tends to be about two or three weeks into the quarter — I'm not sure what it is this time — so we have until then to revise it. But we should have something nailed down before the new quarter begins, which is October 1st, next week. We will be able to revise this at least in the next Storage SIG meeting.
A: I would probably call this a P2. I think this is a really nice feature to have. It's going to make testing of the existing in-tree volume plugins much easier, and it'll also make it easier to add support for these volume plugins where you don't necessarily have the ability to install the dependencies on the host OS. That said, if we don't end up making it for any reason, it wouldn't be the end of the world. Jan, do you agree it's a P2?
A: So the hope this quarter is — we're going to have the face-to-face meeting in a couple of weeks, and I'm hoping to send out a design doc probably a week before that. Then hopefully we can finalize the design and start working on the implementation, and I would like to drive this to alpha this quarter. That would mean a new CSI volume plugin being introduced in-tree that would enable CSI volume drivers natively in Kubernetes.
A: We've got Brad and Steve, but I think I can probably pull Michelle or Jing to help with the review as well, so we're pretty well staffed on it. Priority-wise, I would consider this a P1 for our group: enabling out-of-tree volume plugins that are containerized, and making CSI work, should be a high priority for this group. Does anybody agree or disagree?
H: My thinking is, if it's separate, you could, you know, have the design as priority one and then the implementation as P2. Okay.
A: So this is an effort to try to come up with a generic mechanism to handle arbitrary topology in storage. When I say topology, I essentially mean availability zones in Kubernetes. Today we kind of hack around this for cloud providers: we hacked in the zone and region handling code, it's passed around as an annotation, but it's not very flexible. The code in the scheduler is actually hard-coded to look for those specific annotations and do whatever it needs to do with them.
A: What we want is to be able to handle topology more generically for storage, such that it doesn't matter whether it's region, zone, on-prem racks, or whatever other arbitrary topology it is — our code should be able to handle it generically. This is something that I've run into on the CSI side of things, and Michelle has kind of run into this with local storage as well, because ultimately local storage is a topology where, essentially, where you have...
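The generic topology handling described above can be sketched as a simple matching rule: a volume carries arbitrary label-key requirements, and a node satisfies the volume if its labels match every one of them. This is only an illustrative sketch, not scheduler code; the zone label key is the real legacy one, while the rack key is a hypothetical on-prem example.

```python
# Illustrative sketch of generic topology matching (not the actual scheduler
# code). Instead of hard-coding zone/region keys, a volume can require any
# set of label keys/values, and a node satisfies the volume if its labels
# match every required key.

LEGACY_ZONE_LABEL = "failure-domain.beta.kubernetes.io/zone"  # the key hard-coded today

def node_satisfies_topology(node_labels: dict, volume_topology: dict) -> bool:
    """Every topology key the volume requires must be present on the node
    with one of the allowed values."""
    return all(
        node_labels.get(key) in allowed_values
        for key, allowed_values in volume_topology.items()
    )

# A zonal cloud disk and an on-prem rack-local volume use the same mechanism:
zonal_volume = {LEGACY_ZONE_LABEL: {"us-central1-a"}}
rack_volume = {"example.com/rack": {"rack-7"}}  # hypothetical on-prem topology key

node = {LEGACY_ZONE_LABEL: "us-central1-a", "example.com/rack": "rack-7"}
```

With this shape the scheduler logic no longer needs to know whether the key means a zone, a region, or a rack.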
I: So we did the same work in 1.8 — we already have a design — so we need to carry on with this work in 1.9. I would like this to be done in 1.9, and add more volume types [unclear], and, yeah. Does that sound fair, priority-wise? I think it should be; that's what it was in the last quarter as well.
J: I don't [unclear] on that. This is what I mean: we already have the external snapshot controller, it's being worked on, and there's confidence that during 1.8 it's going to be stabilized. Hopefully bringing it in-tree — the API and the plugins — will help us get the functionality it has and reduce a lot of the redundancies we have with an external controller.
A: The challenge with bringing it in-tree is that we're going to have to make sure we go through all the appropriate architectural reviews, and especially make sure that we run it by the workloads team. I think one of the big outstanding questions around the snapshotting feature is how it is going to interact with the end-to-end flow, and as long as we make sure that we get all those approvals, I think this would be a reasonable goal. Okay.
A: I think we're going to start with a feature repo bug that says: this is the work that we intend to do. Then we need to go and create a design doc saying: this is the feature that exists out-of-tree, here's how it currently works, and here's the proposal for pulling it in-tree — we plan to add this controller, we plan to extend the API in this way, and this is the way that it should be used end to end by users.
E: So the main goal for PV taints and tolerations — it's sort of similar to the node taint/toleration feature — is mostly health monitoring, where you can have some entity monitoring the health of your disks, and if for some reason your disk becomes inaccessible or unhealthy, you can taint it. Then you can set tolerations in your app to specify whether or not your application is capable of releasing its binding on its volume. So the major use here is for distributed file systems, where they have...
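The taint/toleration behavior described above can be sketched as a matching rule modeled on how node taints work today. The PV taint API was only a proposal at this point, so the field names and the "NoBind" effect here are hypothetical, purely to illustrate the matching semantics.

```python
# Illustrative sketch of PV taint/toleration matching, modeled on node
# taints; the PV taint API itself was a proposal, so the field names and the
# "NoBind" effect are hypothetical.

def pod_tolerates_volume(volume_taints: list, pod_tolerations: list) -> bool:
    """A pod keeps its volume binding only if every taint on the volume is
    matched by some toleration declared by the pod."""
    def tolerated(taint):
        return any(
            tol["key"] == taint["key"] and tol["effect"] == taint["effect"]
            for tol in pod_tolerations
        )
    return all(tolerated(t) for t in volume_taints)

# A health monitor taints the PV when the disk becomes unhealthy:
taints = [{"key": "example.com/disk-unhealthy", "effect": "NoBind"}]
# A distributed file system that replicates data can tolerate that taint:
tolerations = [{"key": "example.com/disk-unhealthy", "effect": "NoBind"}]
```

An untainted volume is tolerated by any pod; a tainted volume is released by pods that declare no matching toleration.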
K: So, to give a bit of background on where this currently is: the static provisioner is running as a DaemonSet, so it's running on every single node, and every single pod has access to creating, deleting, and modifying PVs. This has been insecure, and we want to limit that scope. So the new design we came up with is to only have that access on a master, on a trusted node, and then have the pods on every other node act as worker nodes.
K: So the workers will be reading the file system, discovering new volumes, and performing the deletes for the provisioner, whereas the master has all the central logic for determining when to trigger discovery, or to modify PVs, etc. So right now I have a proposal out waiting for further review opinions — just waiting for some security clearance before I move on to implementing it.
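The master/worker split described above can be sketched in a few lines: workers only read the node's filesystem and report what they find, while PV state is mutated only by the trusted master. This is an assumed structure for illustration, not the actual local-volume provisioner code.

```python
# Sketch of the trusted-master split (assumed structure, not the actual
# local-volume provisioner code): workers only touch the node's filesystem;
# PV create/delete decisions happen only on the master.

import os
import tempfile

def worker_discover(discovery_dir: str) -> list:
    """Worker (DaemonSet pod): read the local discovery directory and report
    candidate volume names; workers get no PV create/delete API access."""
    return sorted(os.listdir(discovery_dir))

class Master:
    """Trusted master: the only component that creates PV records, driven by
    what the workers report."""
    def __init__(self):
        self.pvs = set()
    def reconcile(self, node: str, discovered: list) -> list:
        for name in discovered:
            self.pvs.add((node, name))
        return sorted(self.pvs)

# Demo: one node reports two local disks; only the master mutates PV state.
demo_dir = tempfile.mkdtemp()
for name in ("disk-a", "disk-b"):
    open(os.path.join(demo_dir, name), "w").close()
master = Master()
pv_state = master.reconcile("node-1", worker_discover(demo_dir))
```

The security win is that a compromised worker pod can at worst misreport paths on its own node; it can no longer create or delete arbitrary PV objects cluster-wide.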
L: Yeah — I'm not sure if Nick is here or not, but basically this is part of the capacity isolation feature, so I think Nick is working on it and I will help to review it. The target is beta, because we have some other considerations, but this is alpha — yeah, it's just part of the isolation feature, already worked on in 1.7 and 1.8. And the priority is 1 — or, you know, I'm not sure we have a 0. No, we don't have a 0, so it's 1.
A: It's not entirely clear how useful they are in their current state. The ask here, for line item 15, is for someone to review the logs that we're putting out for things like mount calls, attach calls, provisioning, deleting, etc., and make sure that the right things are making it to the logs and that we have sufficient information for debugging. Line item 16 is the same thing, except it's asking to do that for events; the ask there would be: are we attaching the events to the correct objects?
A: For example, if there is an error on the attach call and an event is generated, is it surfaced to the user in a way that they can use that information and understand what went wrong? So it's more: examine the system, find where the gaps are, and fill those in. Any volunteers for these two items would be great. This is a nice intro item for someone who's just joining the community, because you get to dig around the whole codebase and it's not extremely difficult to ramp up on. Yeah.
L: So there are still some issues with, like, [local] volume management, and I have a tab in this document called "stability" that lists some of the issues we've discovered so far. We would like to address them, and anyone, internal or external, who would like to look at them — that would be great. It would definitely help to understand the volume management part of our core codebase. Yeah.
A: It might be worthwhile to actually break the stability item out into multiple actual tasks, rather than just a generic stability item, because that's hard to track. The second thing is trying to gauge the priority for each one of the bug fixes after we break them out, and figuring out how much effort we should be investing in fixing them. I know there are a number of bugs that exist in the system; I'm not sure what their priority is, because we don't really understand how frequently they're happening.
A: I would really like us to invest this quarter in improving our metrics and logging, to get a better gauge of how frequently these things are happening, and let that feed into the priority for these bugs. But to Hemant's and Jing's point, we do have some of these areas identified: there are issues, we're not entirely sure how frequently they happen, and it's worth trying to fix them, but maybe not as a P1 — we could mark them as P2 at the most.
I: I think Jing and I can work together and try to have separate line items for each category of bugs. But one thing that you mentioned — maybe you guys will have time to discuss — is, like, how often this is happening. The metrics that we have so far will only tell you, okay, a volume attach failed, and why it failed. Maybe we should have a metric of, like, how many pods are failing to start because of volume issues.
Q: You know, I think it would be great — maybe on the sig-storage list — if somebody could start a thread that just says "ideas on metrics", and maybe we start with the mailing list and just throw in everyone's bag of ideas for useful metrics. And then we can, you know, hopefully in 1.9 start finding ways to expose those signals and make them consumable. You know, one of the challenges — the Google team got together and we were like, you know, 1.9: stability...
Q: We have our ideas of interesting signals that would help us make more intelligent decisions in the future, but the real question is — you know, everybody should be able to contribute to what those signals are for 1.9. So, you know, if anyone wants to go to the list and say, hey, here are the signals that we think are useful — then, yeah.
I: We have — yeah, like last quarter we implemented the volume operation metrics; the previous quarter we added the cloud provider [operation] metrics; and then last quarter we added the PVC metrics. So we have consistently been adding metrics. We haven't been listing them in one place, because sometimes the names change — so maybe we should list them, and that could be an item in itself. But those are the three metrics that I'm aware of that have been added. So.
Q: This is where my ignorance of the Kubernetes code comes out. In the kernel, I know that there are tracepoints, and you can always, you know, programmatically probe and find a list of tracepoints that let you understand where you can collect and gather signals — and there are like three subsystems to do that, though I think they converged a while back. So when we put in these signals today...
Q: This is the part where I feel sometimes things, you know, stop too soon. When you're in engineering, when you're a developer, that's really great, because you understand the system and you understand the code layout really well. But if you are, let's say, a support engineer or a cluster admin, and something's going on with your volumes, and you're not really sure what or why, and you don't know where to look — you know, all of those signals may be way too much to juggle.
Q: I guess I've got two points. One is, if we don't understand how we're going to do that, just adding the signals is, you know, maybe premature — like, we should probably figure out how we want to consume them before we... you know, as we're thinking — oh sorry, let me organize my thoughts. Now seems like a good time to come up with a high-level list of all of the areas that we think we're...
I: Yeah, I have documented some in the PRs themselves — how to consume them and how to read them — and there's documentation on kubernetes.io that I owe, but it's not exhaustive. And there was some resistance from the sig-instrumentation folks about listing individual metrics [unclear] when my changes were submitted, but...
A: Some of that actually could be on the side of the folks who are actually running the Kubernetes cluster. The first and most important part is actually surfacing the metrics; the second part is collecting them and then making them easy to view. The part that Hemant has worked on is surfacing it, and I think that belongs on the open-source side of things. How you collect it can actually be specific to each deployment.
I: I think they haven't provided any specific guidance about that — sorry — so we need to... I would like to talk to them again, maybe, and get an idea, like, okay: it's too hard right now to figure out what those endpoints are — which endpoints the metrics are on, how to set them up; it's all too different. I would like to talk to them again — I think I can do that this quarter — and figure out if we can find a way to programmatically generate [the list] rather than have a static page. But I don't know if they will have an immediate solution, or if the solution is, as they said, deployment specific — how one deploys Kubernetes — it could be deployment specific, so...
C: I mean, are the metrics prepended with something to indicate that they're storage metrics, so it's easy to filter on those? Are you saying the piece that's hard to set up is the actual gathering infrastructure, not adding the pieces of storage we care about? Is it divided out that way, I guess? Can we have, like, a demo of it?
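The prefix question above is easy to illustrate: Kubernetes exposes metrics in the Prometheus text format, so if the storage metrics share a common name prefix, a consumer can pick them out with a plain prefix match. The `storage_operation_` prefix used here is an assumption for illustration; the exact names would come from the list the SIG discusses maintaining.

```python
# Sketch of prefix-based filtering of a Prometheus text exposition. The
# "storage_operation_" prefix is an assumed convention for illustration.

def storage_metrics(exposition_text: str, prefix: str = "storage_operation_") -> list:
    """Return the sample lines (ignoring # HELP/# TYPE comments) whose metric
    name starts with the given prefix."""
    return [
        line for line in exposition_text.splitlines()
        if line and not line.startswith("#") and line.startswith(prefix)
    ]

sample = """# HELP storage_operation_duration_seconds Storage operation duration
storage_operation_duration_seconds_count{operation_name="volume_attach"} 3
apiserver_request_count{verb="GET"} 120
"""
```

A shared prefix means the gathering side (Prometheus relabeling, dashboards) can select all storage signals with one rule, which separates the "surfacing" problem from the "collection" problem discussed above.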
Q: Sounds like we have a lot of questions about how to consume these metrics. Would anyone volunteer — maybe at the next SIG Storage meeting — to give a demo? I think Aaron has a great idea, so that we can all sit down and see how it's done. And if that person can also reach out to sig-instrumentation and hear if they're ready to give any guidance, that would be great. So.
So.
A
I
Q
So
that
sounds
like
a
great
action
item
so
that
we
sounds
like
there's.
Two
one
is
come
up
with
a
list
of
the
the
signals
that
we
want
to
see
that
we
think
will
be
helpful
and
then
camonte
and
jane
can
come
back
next
meeting
with
a
mini
demo
of
or
or
a
presentation
or
like
just
a
report
of
how
we
feel
they
should
be
consumed
or
where
we
are
today.
Yeah.
B: Sounds good. So I think line items eighteen and nineteen would connect to twenty-three then. We want to have all the bug fixes, and some of the issues that we are getting, take surfacing these metrics into account when thinking about how we want to plan and prioritize them.
A: So this was a line item I added; I would prioritize this as P3. It's a cool, fun item to investigate and design, and maybe push to alpha this quarter if we have the bandwidth. The idea here is: currently we create an emptyDir when a pod starts, it exists for the lifecycle of the pod, and then we delete it. If anybody else gets access to that machine, or for some reason that data doesn't get cleaned up...
A: ...all that information is sitting unencrypted and available to everyone. The suggestion is to possibly use filesystem encryption to encrypt the directory when we create it, and put the encryption key in the kernel keyring. So if for any reason the data is not cleaned up — perhaps because kubelet crashed, or for whatever reason — or somebody else gets access to the actual local host disk, they won't necessarily have access to the data that was written in that emptyDir.
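The security property being proposed above can be shown with a toy: if the key lives only in memory (in the real design, in the kernel keyring, with kernel filesystem encryption such as fscrypt doing the work), the bytes left behind on disk are useless. This XOR-keystream sketch is purely illustrative and is not the proposed mechanism or a real cipher to use in production.

```python
# Toy illustration only — NOT the proposed mechanism. The real design would
# use kernel filesystem encryption with the key in the kernel keyring; this
# just shows that leftover ciphertext is unreadable without the in-memory key.

import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Derive a deterministic pseudo-random keystream from the key
    (counter mode over SHA-256)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor_crypt(key: bytes, data: bytes) -> bytes:
    """Encrypt/decrypt by XOR with the keystream (the operation is symmetric)."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

secret = b"pod scratch data"
key = b"held only in the kernel keyring"   # never written to disk
on_disk = xor_crypt(key, secret)            # what an un-cleaned emptyDir would contain
```

If kubelet crashes before cleanup, `on_disk` is all an attacker with the physical disk can recover; without `key` the plaintext is gone.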
A: We would have to design this to be an opt-in feature, because it would necessarily break some of the assumptions about emptyDir around lifecycle, but that could be hammered out in the design. It's a cool, fun feature to work on. I think David on our team was interested in taking a look at it — I'm not sure if you're still interested, David? Yeah.
D: About this request: so far SIG Storage, I feel, is focusing majorly on persistent storage, and we have talked about a few aspects of availability, like replication. I don't know — it may be my ignorance — whether data backup has been talked about; maybe I'm not aware of it. What I see is that customers who are using Kubernetes are actively talking about it. We have some backup vendors, so I've reached out at VMware...
Q: On the backup question: I thought that there were a lot of, like, technologies out there that would allow you to back up from cloud providers and even sync both ways. And my other question is: are we sure that this is functionality that belongs in the Kubernetes control plane? Is that what you're suggesting, or are you just suggesting to have some sort of guidebook or suggestion of how to do this, even outside of it? Yes.
Q: In any case — I think he may be muted and listening — Alfred Fuller is one of the storage experts at Google who knows a lot about what's been going on, sort of, outside of Google, in enterprise and on-prem environments, and we've asked him to, you know, also be around, to be a participant in the SIG. He has said that he's got the time and will be able to be focused.
Q: So, you know, please make him feel welcome — he's got a lot of experience in both building storage systems and storage APIs. You know, he's probably going to be focusing on snapshots with Jing, I think, but for questions like the one that you just asked about — you know, what's available for backups in cloud providers — it might be another area where he might either have expertise or know someone who does.
A: Okay, cool — we are just about out of time. Unfortunately, we're not going to have time to do the demo that Eric had planned today for his Docker volume driver Flex driver, but we're going to put him first thing at the next meeting so that we have enough time for that. The last thing I wanted to mention was the face-to-face meeting: it is going to be October 10th and 11th, and the location has been confirmed.
A: Unfortunately, if you haven't RSVP'd but you're still interested in attending, you can add yourself to the remote attendees list, and if anybody drops out we can pull folks in. Otherwise, we'll have a Zoom setup in the meeting room for folks who are going to be able to attend remotely. And I believe Dell EMC and Portworx are going to be hosting a dinner the first day. Steve, do you have anything you want to add to that?
Q: I think we may even want to — our agenda is getting so packed now that it may make sense to almost order it again over email before the meeting. It's definitely a bummer that someone had a demo all prepped and we didn't get it done as we were sort of enumerating over the priorities for planning. So much, so...