From YouTube: kubernetes sig-aws 20190531
A
Hello everyone, it is Friday, May 31st. This is SIG AWS. We have a very light agenda today. Please do add items to the agenda if you would like to discuss them, and feel free to put your name on there as well. I put a link in the chat; I'll paste it again now, since I just joined and don't have the history. While you're doing that: I am your moderator and facilitator, Justin Santa Barbara, and I work at Google. A reminder that this meeting is being recorded and will be put on the internet, so please be mindful of our code of conduct, be a good person, and don't say anything mean about eventual consistency, which seems to be the first item on the agenda. And with that, I guess we can go right over to you; I think you added that, right?
B
Yeah, okay. So the problem is that we are seeing a bunch of test failures, and likely they fail in production too, where we run DescribeInstances or DescribeVolumes and the call returns stale attachment state. Say a volume is attached to node X. We perform a detach, then we poll whether it is detached, and it says, okay, I'm detached now. So we try to attach it somewhere else, and then the same DescribeVolumes call, if you call it again, now says the volume is still attached to the node we just detached it from. We have observed a similar problem in the DescribeInstances code path: we detach from a node, we poll until it reports detached, but then someone else tries to attach, checks again, and it says, oh, this new instance still has the volume. So that's the problem we are seeing, and I don't know if it is recent or what happened, but we haven't seen this kind of issue before, where the same API call will tell you X and then tell you Y. We're thinking we'll have to audit the whole codebase and see.
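
For concreteness, here is a minimal sketch of the detach-then-poll pattern being described, in Go using aws-sdk-go; the helper name and volume ID are illustrative, not the actual cloud-provider code. The bug is that even after this poll observes "available", a later DescribeVolumes call can still return the old, stale attachment.

```go
package main

import (
	"fmt"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

// waitForDetach polls DescribeVolumes until the volume reports
// "available" (i.e. detached). With eventual consistency, a read made
// *after* this returns can still show the old attachment, which is the
// inconsistency described above.
func waitForDetach(client *ec2.EC2, volumeID string) error {
	for i := 0; i < 30; i++ {
		out, err := client.DescribeVolumes(&ec2.DescribeVolumesInput{
			VolumeIds: []*string{aws.String(volumeID)},
		})
		if err != nil {
			return err
		}
		if len(out.Volumes) == 1 && aws.StringValue(out.Volumes[0].State) == ec2.VolumeStateAvailable {
			return nil // detach confirmed... by this one read
		}
		time.Sleep(time.Second)
	}
	return fmt.Errorf("volume %s not detached after polling", volumeID)
}

func main() {
	client := ec2.New(session.Must(session.NewSession()))
	if err := waitForDetach(client, "vol-0123456789abcdef0"); err != nil {
		fmt.Println(err)
	}
}
```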
A
Sounds
like
fun
now
can
I
ask
a
clarifying
question:
do
we
think
that
this
is
because
we're
using
two
different
read
methods,
or
do
we
think
that
read
methods
in
general
can
like
sort
of
use,
cache
data
or
whatever,
and
then,
when
we
go
to
do
a
what
I
would
call
a
write
operation
like
an
attached,
but
it
actually
does
like
a
real
check.
What
do
we
think
the
like
yeah.
B
It's the read operation that is giving us cached data. For instance, in one case, what is happening is: when you are trying to attach a volume to a node X, before we attach, to save on the number of mutable API calls we make, we perform a DescribeVolumes on the volume to check whether the volume is available. If the volume is not available, we conclude that it cannot be attached.

And the AWS documentation says that the correct way to handle this is to poll with exponential back-off, but we already do that. It's just that we don't know when we can rely on the value of a read API call. It sounds like we always have to do mutable API calls to get the correct, accurate state of a volume or instance, right?
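
As a point of reference, the poll-with-exponential-backoff pattern the AWS documentation recommends looks roughly like this in Kubernetes code; a sketch using wait.ExponentialBackoff from k8s.io/apimachinery, with a hypothetical isVolumeAvailable read and the 1.2 factor mentioned later in the discussion. Note that backing off does nothing about staleness: every iteration can still read stale data.

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// isVolumeAvailable is a hypothetical stand-in for a DescribeVolumes
// read; backoff only spaces the reads out, it cannot make them accurate.
func isVolumeAvailable(volumeID string) (bool, error) {
	// ... call DescribeVolumes and inspect the volume state ...
	return false, nil
}

func main() {
	backoff := wait.Backoff{
		Duration: 1 * time.Second, // initial delay
		Factor:   1.2,             // growth factor per retry
		Steps:    10,              // give up after 10 attempts
	}
	err := wait.ExponentialBackoff(backoff, func() (bool, error) {
		return isVolumeAvailable("vol-0123456789abcdef0")
	})
	if err != nil {
		fmt.Println("volume never became available:", err)
	}
}
```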
A
Which
would
mean
basically
that,
like
I
guess
what
we
can
do
is
we
basically
have
to
be
able
to
recover?
We
assume
that
all
our
reads
are
bad,
and
then
we
just
do
a
sort
of
optimistic
right
and
we
just
have
to
recover
from
the
idea
that
are
optimistic.
Right
may
have
been
bad,
I,
guess,
yeah
or
may
even
based
on
based
on
other
stale
data.
B
And
the
reconciler,
in
both
the
the
tackiest
controller
and
and
in
the
cubelet
site,
is
not
very
well.
You
know
like
return
to
recover
from
something
like
that
in
in
one
click
case,
at
least
a
volume
is
permanently
stuck
actually
like.
It
will
never
attach,
like
the
water
part,
will
never
mount
so
and.
A
We
do
have
CSI
coming.
I
do
wonder
whether,
whether
it's
just
that
you're
noticing
this
more
or
whether
something
has
changed
underneath
us,
in
other
words
like
if
it's,
if
nothing,
has
changed
and
you're
just
better
observing
it,
then
maybe
we
should
sprint
towards
CSI
and
make
sure
the
fixing
CSI,
because
it
feels
like
a
lot
of
the
time.
We've
been
constrained
by
the
interface
that
we've
been.
We
have
exposed
to
us
the
attach
detach
controller,
and
maybe
we
could
make
sure
that
CSI
has
a
more
appropriate
interface.
I
guess
or
a
more
powerful
interface.
B
Csi
has
same
problems
that
entry
travel
has
at
least
right
now,
because
we
were
just
noticing
and
see.
I
said
never
also
has
written,
with
the
assumption
that
you
know
like
that.
The
read
calls
returned
accurate
data
so
and
CSI
itself
will
not
fix
the
problems
that
we
have
because
still
uses
it
a
little
controller.
It's
just.
This
calls
are
going
to
external
driver
now.
D
B
A
And is the error that we're observing mostly on detach or mostly on attach? Because on attach we do have evidence: we can go and look at when the volume actually shows up on the node, right? Whereas on detach, I feel like we don't have any evidence after the volume disappears; there's a second step, right: the device disappears, and then the volume can still be in a detaching state.
B
So what happens is: we try to attach it, and before we attach it we do a DescribeVolumes, and DescribeVolumes says the volume is not available, so we return an error that this volume cannot be attached. When the attach/detach controller sees that the volume cannot be attached, it raises an error called a dangling error, and that adds the volume to the actual state of the world. But in truth the volume is not actually attached to that node, because DescribeVolumes just returned stale data. So in that case the volume is reported as attached in the node status, and the kubelet waits for the volume to appear on whatever path DescribeVolumes returned, and the volume never appears there. So the pod is blocked forever.
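
A sketch of the failure mode just described, in plain Go with illustrative names (the real code paths live in the legacy AWS cloud provider and the attach/detach controller): a stale read at the pre-attach check turns into a dangling error, the controller records the volume as attached to the old node, and the kubelet waits for a device path that never materializes.

```go
package main

import "fmt"

// danglingError mirrors the "dangling" mechanism described above; the
// type and the describeVolume helper are illustrative, not the actual
// Kubernetes types.
type danglingError struct {
	volumeID, attachedNode string
}

func (e *danglingError) Error() string {
	return fmt.Sprintf("volume %s is still attached to node %s", e.volumeID, e.attachedNode)
}

// describeVolume stands in for DescribeVolumes and may return stale data.
func describeVolume(volumeID string) (state, attachedNode string) {
	return "in-use", "node-x" // stale: the real detach already completed
}

// attachDisk is the decision point: on a stale read it raises the
// dangling error, the controller then marks the volume attached to the
// *old* node, and the pod blocks forever waiting for the device.
func attachDisk(volumeID, targetNode string) error {
	if state, node := describeVolume(volumeID); state != "available" {
		return &danglingError{volumeID: volumeID, attachedNode: node}
	}
	// ... issue the real AttachVolume call ...
	return nil
}

func main() {
	fmt.Println(attachDisk("vol-0123456789abcdef0", "node-y"))
}
```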
B
We can try to work around some of these limitations, fixing them with core Kubernetes constructs to work around some of these API limitations, but I don't have a good answer other than to fix some of the way we handle this, like surfacing this one level up so that the coordinators can deal with it correctly, yeah.
A
I also think that we should be more willing to expend quota to save ourselves from serious bugs, particularly if the quota is spent on user-initiated operations like an attach or a detach, rather than on background polling. Background polling is what really kills us: adding any more calls there is really bad. But I feel that adding calls to operations is less bad, because if we get throttled, we just basically don't continue. We may even slow down the speed at which we can launch pods or whatever, which is not great, but it's not as bad as just getting stuck, having no quota at all, which is what happens if you do an overly aggressive, continuous background poll, for example.
A
Think
we
have
a
wind
coming
anyway
on
some
calls
because
of
the
I,
don't
know
what
it
was
introduced,
but
there
was
something
about
the
polling
of
volumes
on
an
attached
right
where
I
think
there's
a
PR
I
think
you
coming
tonight
amount,
whereas,
like
we're
gonna
introduce
like
exponential,
back-off
polling
and
right
now,
it's
a
pretty
aggressive,
like
is
every
second
polling
for
30
seconds.
So
we
should
get
some
polls
there.
Yeah.
B
So, on exponential back-off: whatever we are currently implementing, we just back off by a certain amount and retry. But the AWS SDK says there should be a Retry-After header present in the responses, and that's never actually present. Does Amazon know about it? Can this be fixed?
B
There's no issue filed for it; sorry, I'll log something. It's just that we observed this while trying to fix the exponential back-off. Right now we are basically backing off blindly; I don't know how AWS's quota works internally. Let's say we are doing a two-minute operation and 1.2 is our exponential backoff factor; then we just back off like that. But the AWS SDK says you should back off until the Retry-After header; what if we retry in between with another request? Is it still counted against the quota? So I'm just trying to say: it might be best if our exponential back-off, when we throttle API requests, could use the Retry-After header, but currently we cannot. Right now we throttle when we get a RequestLimitExceeded error from Amazon, but we are throttling by our own heuristic; there's no logic, or not much logic, to it. If we could get a Retry-After header from Amazon, then we could.
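
A sketch of what honoring such a header could look like, in Go; the header name is the standard HTTP Retry-After, and the 1.2 fallback factor echoes the one mentioned earlier, but EC2 does not actually send the header today, which is exactly the complaint.

```go
package main

import (
	"fmt"
	"math"
	"net/http"
	"strconv"
	"time"
)

// nextDelay prefers a server-provided Retry-After header and falls
// back to a heuristic exponential backoff when it is absent, which for
// EC2 today is always.
func nextDelay(resp *http.Response, attempt int) time.Duration {
	if resp != nil {
		if ra := resp.Header.Get("Retry-After"); ra != "" {
			if secs, err := strconv.Atoi(ra); err == nil {
				return time.Duration(secs) * time.Second
			}
		}
	}
	// Heuristic fallback: one second grown by a 1.2 factor per attempt.
	return time.Duration(float64(time.Second) * math.Pow(1.2, float64(attempt)))
}

func main() {
	// With no response (or no header), attempt 3 waits about 1.7s.
	fmt.Println(nextDelay(nil, 3))
}
```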
A
Okay, yes, yeah. I mean, I am guilty of writing the terrible global retry handler, which is very hand-wavy and heuristic, but it is designed exactly as you say, because there's basically no other way to know when you are at the limits; they aren't accessible. Yeah, and so we basically have to just back off, and we do it sort of globally across the, well, across the process.
A
I have approved it, thank you. Cool, thanks. Sorry about, like, leaving you work you shouldn't have had to do; I apologize for that, but thank you for it, Mike. Cool, and then, yeah, that'd be great. And then we have another item on the agenda, Ritesh, is that correct? Multiple pods on the same node. Do you want to talk us through the issue?
A
So
there's
an
issue
linked:
eight
of
us
EB
SCSI
driver
number,
295
Chris.
That
looks
like
you
see
seeds
who
I
presume
cryptid
Chang,
who
I
presume
is
the
correct
person
to
look
into
this
yeah
I,
don't
know
if
I
don't
know.
If
there's
anything
in
particular,
we
should
discuss
further
on
that
Ritesh.
Essentially,
it's
not
possible
to
share
the
PV
back
with
the
EBS
volumes
to
share
between
pods.
On
the
same
note,
if
the
pods
are
deployment
objects,
whereas
it
does
work
with
stateful
sets,
that
sounds
odd.
A
Okay, well, I will certainly read this issue. Did someone say something? Sorry. I will certainly read this issue. This is surprising, because Deployments and StatefulSets should both end up as pods, and the volume logic shouldn't really change: the mount happens at the pod level, so it shouldn't really matter whether they used a Deployment or a StatefulSet. But just because something is surprising does not mean that it is not true. So this is certainly interesting, and yeah, I think, yeah.