From YouTube: OpenShift Commons Gathering 2019 Santa Clara: Future of cgroups (Breard, Heo & Brandenburger)
Description
OpenShift Commons Gathering 2019 Santa Clara
Future of Linux Control Groups
Ben Breard, Red Hat
Tejun Heo, Facebook
Filipe Brandenburger, Google
A (Ben Breard): I've been mainly focused on everything we're doing on the core OS, the RHEL side of the house, and how we're tying into OpenShift, which was talked about earlier this morning. I've also spent a lot of time working on systemd and some of the larger container technologies and so forth, and we're going to talk about a super important topic today. If you're familiar with control groups, raise your hand; has anybody ever heard of these? So, like half the room?
A: No, no, leave your hands up now if you've ever logged into a Linux box within the last five years. I'm a little surprised at that, but okay. Now what about if you've logged into Facebook, come on, guys, or used some of Google's services? Every hand in this room should be up; alright, that's pretty much everybody. Okay, so you have all interacted with Linux control groups, whether you've known it or not. This is one of the primary kernel APIs that we use for containerization, isolation, accounting, and all this kind of stuff. So anyway, we're really excited that these guys came to be with us today; thank you guys for being here, it's a big deal that they're here. Tejun Heo is the upstream maintainer for control groups, from Facebook. So, Tejun, tell us.
C (Tejun Heo): If you think about a web server, this is from our production web servers; we have a lot of web servers, but everybody has a lot of web servers, right. If you think about a web server in a large fleet, it's not going to just have that web server on it. It's going to have a lot of other things running: Chef or other machine-maintenance stuff and all that, and sometimes those things go wrong.
C: You run Chef, it runs yum and whatever, and somebody makes one innocuous one-line change, and sometimes that just leaks a lot of memory. Imagine that happening; this is simulating that. The purple line, just consider the little purple line, is the RPS, requests per second, while we're doing load testing, so the test web server is fully loaded.
C: If you look at the first red line, that's where we start a 10 MB/s memory leak in a part of the system which is just a support part, the management part, not the main workload, and it starts leaking memory. After about, I don't know, four or five minutes, it consumes whatever memory is left in the system, and the system starts thrashing. Then it dips, because there's no memory, and this is a hard-disk machine.
C: Hard disks are really slow; if you run out of memory, you're accessing the hard drive, and it's slow, so the graph dips. The kernel reclaims some memory, it manages to come up again, then it comes down again, and it just dies there. That flat line is just loss of data points; the machine completely checks out after a while. We disabled our remediation mechanism for this test, so it stays down longer.
C: With remediation it would come back a little bit sooner, but it's still the same thing. After about, I don't know, half an hour, the machine cold-reboots and then comes back up again. Now imagine this happening, synchronized, across a lot of machines, and that does happen in the fleet. It's kind of surprising when it happens, and it's really scary: some change rolls out, some bug triggers at the same time everywhere. That's really scary.
C: If this happens on a lot of machines at Facebook, Facebook is going down; nobody's going to be happy, everybody's getting paged, so it's not a happy situation. Now look at the green line: that's the same thing, exactly the same testing, but with resource control set up to protect the main workload from the rest of the system. At the first 10 MB/s leak we started the same thing: it drops a bit, drops a bit more, then recovers, and it's completely fine. So we started it again, another leak; same thing.
C: So from purple to green, that's a lot of improvement, right? We all want to have that. We are the resource control group at Facebook, and this is our mission statement: work-conserving full-OS resource isolation. To unpack that a bit: work-conserving means that we don't want to pay. We want to have resource isolation, but we don't want to pay overhead, nominally. It's not like we can go into our production tiers
C: and ask the teams to put severe restrictions on their allocations. We want them to be able to keep doing whatever they've been doing, and we want to layer resource isolation transparently on top; that's our goal. And if you think about it, it sounds simple, right? If you have control groups, which, we're told, can categorize workloads and distribute resources, it should be easy. Now, "fbtax" is the term we use for the management part of the system.
C: Every host in our fleet has to pay a tax to be inside the Facebook fleet, so there's fbtax, and so the project became fbtax2. We want to protect the main workload from malfunctions in the tax part, and we chose this project because it's the minimum: if you have working resource isolation, this should be possible, and it's the minimum you should be able to achieve. So this is the minimum viable product in terms of memory and IO isolation.
C: We mostly investigated memory and IO isolation, because CPU isolation is easier and its difficulties are of a different kind, so we concentrated on memory and IO for this project. And these are the requirements. When something misbehaves in system.slice, or in the rest of the system that is not the main workload, the impact on the main workload should be limited; the main workload should be able to survive.
C: It might not be 100% perfect, but the impact should be something like 10-20 percent for a short while, so that the fleet can stay up. And, as I said before, we didn't want our applications to be changed at all; we wanted resource control to be layered on top transparently. And of course it has to be work-conserving; we don't want any performance regression. We can't sell this if teams have to pay five or ten percent; it just doesn't fly. So, it sounds simple, right?
C: So, one of the problems... I don't see any clock here; let me... okay, sure. So there were a lot of challenges in different areas, but the biggest one, the one in terms of memory management, memory control, was this. If you look at cgroup v1, there are two knobs for memory control: one is memory.limit_in_bytes, and the other one is memory.soft_limit_in_bytes.
C: There are subtle differences between them, but what they ultimately do is put a hard cap on how much memory that cgroup can consume, and this never really worked well. We still tried to use it; I mean, it's the obvious thing to do, right? I want to protect the main workload from system.slice, so we put a memory limit on system.slice, and things should be fine. It didn't really work out, because it turned out that under load, machines are often oversubscribed.
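For reference, the v1 approach being described looks roughly like this; a sketch, assuming the legacy memory controller is mounted at /sys/fs/cgroup/memory and systemd has created system.slice there, with illustrative values:

    # Hard-cap system.slice's memory in cgroup v1 (the approach that didn't work out)
    echo 1G > /sys/fs/cgroup/memory/system.slice/memory.limit_in_bytes
    # The "soft" variant only takes effect under global memory pressure
    echo 512M > /sys/fs/cgroup/memory/system.slice/memory.soft_limit_in_bytes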
C: Not constantly; if the machine were constantly oversubscribed, it couldn't sustain the workload. But it would nominally, temporarily oversubscribe here and there when something happens, and it might throttle a bit, but the machine would be able to sustain that. The problem with putting hard limits on memory consumption is that if you put the limit too low, if you restrict the management part too hard, then the whole system suffers, because the management part is constantly thrashing.
C: If you set a memory limit on something, and it's lower than that thing's natural working set, it's going to generate a lot of IOs, because it doesn't have a lot of memory. The kernel's memory management kicks in and kicks out what it thinks are cold pages, which are actually the active working set, and soon after, you fault them back in. That just generates a lot of IO, and whether you have swap or not doesn't really matter.
C
All
your
code
pages
get
swapped
kept.
You
know
48
out
and
48
back
in
so
that
just
generates
all
other
wires
and
if
you're
heading
like
IO
storm
happening
in
in
the
measurement
part.
This
point
of
fact
your
main
workload
right
if
main
okhla
test
anything
any
I/o,
it's
gonna
get
affected,
and
so
yeah
tell
us
another
problem
that
we
noticed
and
if
you
remember
the
first
scrap
that
I
showed
you
right.
There's
like
this.
Can
you
mean
a
stretch
where
there's
no
data
point
being
reported
right?
C: The machine is still alive: it's powered up, it's running full tilt; if you look at the energy consumption from the management interface, it's consuming all the power there is. The problem is that the kernel's way of recognizing that the system doesn't have enough memory is kind of crude, and, in a sense, it has to be really conservative, because you don't want the kernel to be killing things willy-nilly.
C: So the kernel's criteria for triggering OOM kills are really conservative, and that often means you fall into a condition where the system is really not doing anything; the only thing it's doing is thrashing, but the kernel would still think that it seems to be making forward progress. So your service is down, but the kernel thinks everything is okay.
C: That's how you get that twenty-minute stretch of the machine being unresponsive, and then something external has to resolve it by rebooting the machine. Obviously that's not good, and it also combines with the first point: if you set a memory limit, the really interesting thing is that you can fall into this thrashing condition even with free memory available.
C
A
cigarette
has
memory
limit
the
workload
you
know
hits
against
it
and
it
goes
it
tries
to
go
over
buddy
can't
so
it
keeps
thrashing,
and
then
they
can
actually,
you
know,
bring
down
the
whole
system
to
make
the
whole
system
on
these
pansit.
So,
by
selling
memory
limit
you
actually
made
your
system
worse.
C: And on the IO side: if you create a journal entry, that creates strict ordering there, so you can't really defer it; it just has to be executed right away. But if you think about it, it should still be charged to the one who caused that IO. None of the existing IO controllers did that, which means that if somebody causes a lot of metadata IOs, or a lot of swap IOs, they would get away with it without being charged, and that obviously breaks isolation. So we worked a couple of years on all this.
C: It took a lot longer than we expected, and these are the solutions that we came up with. In cgroup v2 there are memory.low and memory.min; so there's low, min, high, and max. High and max: high is the best-effort limit, max is the absolute limit; if you try to go over it, you're going to get killed. Low is the other way around: low is a best-effort guarantee; the kernel might break it if it's in an emergency.
C: Min is stricter than that: the kernel would kill something else before breaking it. So low and min lift up, and high and max push down. And another really nice property that we added to low and min is that the protection is proportional, in the sense that, let's say your working set size is 10 gigabytes, and it varies over time; say it swings between 9 and 11 gigabytes, and that's without any problem.
C: What that does is that you can set the protection at, say, 8 gigabytes, and it will keep the protection proportional, gradually fading the protection beyond that point, so you don't have to get the number exactly right. You can just ballpark it conservatively, and it'll still give you sufficient protection. That made configuration a lot easier, and we can basically use almost the same configuration everywhere; not everywhere, but almost everywhere, so it helps a lot in terms of operational simplicity.
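As a sketch of those v2 knobs for the 10-gigabyte working-set example above (the paths and slice name are illustrative; this assumes a unified hierarchy at /sys/fs/cgroup with the memory controller enabled):

    cd /sys/fs/cgroup/workload.slice
    echo 8G > memory.low     # best-effort protection; a conservative ballpark below the ~10G working set
    echo max > memory.max    # no hard cap on the main workload, so it stays work-conserving
    cat memory.high          # "max" unless set; high is the best-effort limit, max the absolute one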
C: And Josef Bacik on our team implemented io.latency. This is completion-latency-based IO control, and the one thing special about this controller, which we hope to add to other controllers too, is that it handles back-charging, meaning that if a cgroup does a shared IO, like a metadata or swap IO, it will be let through, because otherwise there would be priority inversions; but the cgroup gets charged later. Like a credit card: you spend first, but you get charged later, and you pay for it.
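A minimal sketch of what configuring io.latency looks like; the device numbers and target value are illustrative, and the interface takes "MAJOR:MINOR target=<time>" with units as documented in the kernel's cgroup-v2 admin guide (microseconds, to my reading):

    # Ask io.latency to keep this cgroup's completion latency at or below
    # the target on device 8:0 (typically /dev/sda; adjust for your disk)
    echo "8:0 target=10000" > /sys/fs/cgroup/workload.slice/io.latency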
C: So it maintains overall isolation. And the thing is, I said that memory and IO are conjoined: if you try to control memory, you have to control IO together with it; otherwise you're just pushing on one side and having it leak out the other side. This is one of the fundamental differences between cgroup v1 and cgroup v2: cgroup v1 is per-controller, per resource type, and everything is completely independent.
C: In cgroup v2 there's a concept of a resource domain: when you create memory pressure, it's tied to the same resource domain that the IO controller can look at, so you can control both memory and IO on the same resource domain. That's one of the critical enabling things about cgroup v2 that makes this possible.
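Mechanically, the way both controllers end up active on the same resource domain in v2 is via cgroup.subtree_control; a sketch, with an illustrative slice name:

    # Enable the memory and io controllers for children of the root cgroup
    echo "+memory +io" > /sys/fs/cgroup/cgroup.subtree_control
    # Each child now sees both controllers on the same resource domain
    cat /sys/fs/cgroup/workload.slice/cgroup.controllers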
C: And we also added PSI, pressure stall information. What PSI tells you is how short of a specific resource the workload is. For example, if it says a workload is under 20% memory pressure, it means the workload is 20% slower because it didn't have enough memory over the averaging window; it has different averaging intervals. That helped a lot in terms of allocating resources and monitoring workload health.
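PSI is exposed as plain files, system-wide and per cgroup; a quick look (output shape abbreviated, cgroup path illustrative):

    # System-wide memory pressure (kernels >= 4.20 with PSI enabled)
    cat /proc/pressure/memory
    #   some avg10=1.53 avg60=0.87 avg300=0.73 total=...   <- some tasks stalled
    #   full avg10=0.22 avg60=0.11 avg300=0.09 total=...   <- all tasks stalled
    # Per-cgroup equivalent on cgroup v2
    cat /sys/fs/cgroup/workload.slice/memory.pressure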
C: If you remember what I said about the kernel OOM killer: the problem with the kernel OOM killer was that it couldn't tell whether the workload was healthy or not, so it would kick in too late to be useful. Using PSI, we get a canonical way of telling whether the workload is healthy or not. If something is slowed down by, I don't know, if your web server is slowed down by 40%, it's obviously not healthy; it's not doing a good job. And the kernel OOM killer would never kick in at that level; the kernel OOM killer only kicks in when the pressure goes up to ninety-something percent.
C: So, based on PSI, excuse me, we implemented oomd; it's open source, like everything else. It watches system metrics; PSI is the main source, but it also watches other metrics, and it's really configurable. So you can express things like: if the workload is suffering more than 5%, and the management part is doing more than this, then we know the system part is messing up the workload.
C: One problem there was that ext4's journaling creates really bad priority inversions, where a high-priority cgroup would end up waiting for a low-priority cgroup. I'm sure it can be fixed, but our team has a lot more btrfs expertise than ext4 expertise, so we fixed everything in btrfs and we're just switching over to btrfs. But this should be fixable in other filesystems too.
C: So, that's that. This is a similar test; we're in the process of certifying, or, you know, qualifying, this on different service tiers and deploying it. This is a more modern SSD machine, and again, the green line at the top: three memory leaks, and it doesn't even matter; it's fine.
C: The purple line, that's not good, but the difference is more striking now, because we have more IO, better IO. And this is memcached; it's a similar test, and the graph coloring is not great, but the green line at the top, which is barely visible, is the protected machine, and the orange line, which is going away, is obviously the unprotected one. This was with, I think, a 50 MB/s leak.
C: So we have this minimum viable product, in terms of work-conserving memory and IO isolation, in this fbtax2 setup, which is the workload-protection and host-protection scheme. And I said that it's the minimum viable product. What that means is that of the pieces I talked about, all these things, if you take out one, it's not going to work; I mean, it may work to a certain extent.
C: What we're working on next is, first, no regression whatsoever in terms of disaster readiness, meaning that when the main workload wants to spike up, it should be allowed to, as if there were no side workload. And the other thing we're working on, which might be more interesting to you guys, I guess, is when you put multiple containers or workloads on the same system and you want to say: this guy gets 20%, this guy gets 40%. That doesn't work reliably yet, mostly because of IO isolation, so we're working on that.
C: So, one takeaway that I want to leave you with: as I said, having just one component configured doesn't really help you much; it might even hurt you. So it might be an interesting thing to think about: if anybody wants resource isolation in their system, it's not a single knob; it's a whole profile of configurations that protect all the affected sides. And with that, I'm going to hand it over.
B (Filipe Brandenburger): Of course, we want all the components, and we want to look at PSI, and we're not all the way there yet, but cgroup v2 is basically the first step toward that. So, earlier in the session we were talking about the components in the stack, and runc is basically the component that ends up running the container. So whether you're using OpenStack, or, sorry, OpenShift, or Kubernetes, whether you're using CRI-O or containerd,
B: or Docker, you mainly end up using runc, and runc created this library called libcontainer to abstract all these steps of creating a Linux container. That's what we see today, and that's basically what we want. But libcontainer is not cgroup-v2-friendly, and that's what we're trying to fix. systemd is the path toward cgroup v2, because everybody loves systemd, and essentially systemd has been embracing it.
B: In hybrid mode, it's basically using version 1 to control all of the limits, but version 2 is already mounted there; and there's unified mode, in which only cgroup v2 is mounted, and that's where we want to go. Now, libcontainer has this systemd cgroup driver, and it makes all these cgroup-v1 assumptions, writing directly to the cgroup tree, and it doesn't work at all with the unified hierarchy. So I'm starting a plan, you know, a three-step plan, to fix the systemd cgroup driver.
B: The first step is actually setting systemd properties instead of writing to the cgroup tree. When you start a systemd unit, be it a service unit or a scope unit (a scope unit is mostly what container managers use), you can tell it which kind of memory limits, CPU limits, and so on to use. So you're basically abstracting it: you're telling systemd, these are the limits I want.
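That first step looks roughly like this from the command line; a sketch, where the wrapped command and the unit name are illustrative:

    # Ask systemd for a scope with resource properties instead of writing
    # into the cgroup tree; systemd maps these onto v1 or v2 as appropriate
    systemd-run --scope -p MemoryHigh=2G -p CPUWeight=200 -- /usr/bin/my-workload
    # Properties can also be changed on a running unit
    systemctl set-property my-container.scope MemoryHigh=1G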
B: Then systemd can figure out whether it's on cgroup v1, or cgroup v2, or, in the future, cgroup v3; it's basically going to give you that kind of API. But while systemd is what you use to write these properties and modify them, reading the statistics is something where you want to go directly to the cgroup tree, which was delegated to you. So you're going to have to detect whether you're running on the unified hierarchy or not, but there are some simple and documented ways to do this.
B: You check the filesystem type at /sys/fs/cgroup: if it's a cgroup2 filesystem directly, then you know you're using the unified hierarchy, and you can detect the hybrid case as well.
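Spelled out, that check looks like this:

    stat -fc %T /sys/fs/cgroup
    #   cgroup2fs -> unified hierarchy (pure v2)
    #   tmpfs     -> v1 or hybrid; in hybrid mode the v2 tree is
    #                additionally at /sys/fs/cgroup/unified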
B: And step three is to fix delegation. Delegation is a concept in systemd where you create a scope unit and you give it to the container manager, in this case runc or containerd or the kubelet, and once they get this unit, they're free to use the subtree below it however they want.
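A sketch of what requesting a delegated subtree looks like (the manager binary is illustrative):

    # A scope whose cgroup subtree systemd promises not to touch,
    # so the container manager can create sub-cgroups inside it
    systemd-run --scope -p Delegate=yes -- /usr/bin/my-container-manager
    # In a unit file the equivalent is "Delegate=yes", or a controller list
    # such as "Delegate=memory pids" to delegate only some controllers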
B: One problem with doing this is that the OCI specs, which basically come from the Docker image and Docker specs created a few years ago, were written with cgroup v1 in mind, because that's what was pervasive at the time, and so a lot of the items the spec lets you set don't really map to cgroup v2. In some cases we can do translations; in some cases we can ignore settings: if you set something that's not available on cgroup v2, we can ignore it.
B: But the main thing is that one of the big motivations for moving to cgroup v2 is that we want to start making good use of these limits. With the memory limits, traditionally we only had the hard limit at the top and the soft limit for a, sort of, reservation. That's not as good as the new reservation limit, which is memory.low. And the hard limit, for instance, in cgroup v1, is something we don't even really use right now in Kubernetes, because we don't want OOMs in our containers; we're basically monitoring and evicting pods instead, and we would really like to be able to put some pressure on containers when they're going above their assigned limits, and memory.high is actually a great knob to do that. So we want to start using those new limits, and we'll probably need to add that to the OCI specification as well.
A: Fantastic. So, it's interesting: for well over a decade we've had cgroup v1 in place, so it's one of the most well-established underlying APIs that we have to work with. It is pervasive, and any type of resource allocation today uses it. Contrast that, though, with what we hear from customers and those running large OpenShift or Kubernetes environments.
A: There's a big push right now toward running this stuff on bare metal, because, well, why should I pay the virtualization tax if I already own these systems? One of the challenges there, and we were actually talking about this over lunch, is that there is overhead in the system, and cgroup v2 is not a magic bullet that's going to just solve all of that challenge, but it's actually probably one of the best knobs and levers we can pull.
A: When you look at it from the operating-system point of view, we gain a lot from sensible defaults in Linux; if you've ever done any performance tuning, you know there's a reason why things are set the way they are out of the box. So when you install RHEL or Fedora or any flavor of Linux, you expect these three things listed at the top to basically just work out of the box.
A: Otherwise it's not a good experience, and that's why this work is so important. When we look at other container engines, systemd-nspawn and LXC already support v2, so it'll be fantastic once OCI and the runc stuff work as well. Now, on the Kubernetes side, this is actually probably the longest road we have ahead of us, because, again, we talked earlier about how the OCI specs are very v1-centric; well, so is the kube API, in several ways.
A: So this is probably the longest road, but we can't actually start on it until we get the higher-level stuff done. So, sorry, y'all have got some homework. The work is going on in the specs; there are meetings happening on this stuff, like this week.
A: This is all actively in progress right now. But there's an interesting lesson here that we can all learn from: when you write a technical spec, there's a cost to writing it against a very specific implementation, and that's kind of why we're having to go in and deal with this now. And then there are some of the other controllers that are used commonly in containers:
A: a few of these haven't actually landed upstream; they're mostly done, but they haven't gone out in a mainline kernel. I think the cpuset one landed, and freezer is in line, so that'll all be lined up here really soon. But again, the thing that really concerns me about this is that we don't want the ecosystem to be dependent on one version versus the other. That's bad; it's a bad experience. We don't want a deb-versus-RPM kind of situation here.
A: So when things like OpenJDK do a quick check and read the cgroups to see, am I running in a container, or do I have the whole system, right now that's a v1-specific call. We need to get to a place where more of user space isn't written around a particular implementation of cgroups, because otherwise that problem goes up to the cluster view:
A: we have to actually track and taint nodes, or label nodes, rather, to know where things can run, and that's a problem nobody wants to deal with. So we've got to get over to v2 before this gets out of control. On that note, from the distribution side of the house, we're working on flipping Fedora to default to v2 in Fedora 31, so that's the November timeframe, and the criteria for that is that libvirt
A: and the runc stuff have to be in place, or else we can't flip the switch. We know Kubernetes is not likely to be done with that at that point, and of course you can always easily boot a system into v1; that's not a problem, so you'd opt into v1 in that case. On the RHEL side of the house, by the way, RHEL 8 is in beta; everybody here has used it, of course. Just for the recording: all heads are nodding.
A: You're laughing; we don't have to cut this out. So RHEL 8 is going to continue to default to v1, but in an upcoming minor release (I said 8.1, but maybe 8.2) we will also have full support for v2. So RHEL 8 is our release that's going to kind of bridge this gap and live in this dual cgroup world.
A: I did reach out to some of the fantastic people at SUSE; they don't have a specific date, but they think it may be possible to flip, maybe even before Fedora, so we'll see how that goes. I'm not actually aware of other distributions' plans, but once one distro goes through this stuff, a lot of people normally follow suit.
A: So now, if anybody wants to try v2, it's super easy to get your hands on it. In fact, you can just mount it: mount type cgroup2, "none", and then pass it a path. By the way, "none" is a '90s reference, so if you're under the age of 30 you may not remember that; that's okay. It's really easy to just get the hierarchy and start looking at the controllers.
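That one-liner, spelled out (the mount point is illustrative):

    mkdir -p /mnt/cgroup2
    mount -t cgroup2 none /mnt/cgroup2   # "none" is the dummy source device
    ls /mnt/cgroup2                      # cgroup.controllers, cgroup.subtree_control, ...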
A: A better way to use v2, though, is actually to boot your system with the systemd unified cgroup hierarchy option: you just pin that on the kernel boot line, everything works, and you have the unified hierarchy.
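Concretely, that's one parameter on the kernel command line; on Fedora or RHEL that could look like this (the grubby invocation is an illustration):

    # Boot every installed kernel with the unified (v2) hierarchy
    grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=1"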
A: systemd will do a best-effort translation of any of the old settings, so CPU shares become CPU weight and so on for these higher-level controls; that's the newer terminology there. So it's really easy to just fire this up and actually use this stuff. If you have systems today that aren't doing containers, virt, and kube, this is stuff you can go ahead and leverage to get the benefits, like Tejun walked us through, that Facebook's getting, which is fantastic. I can't wait.
A: Okay, and just a couple of other quick hitters here. You can run this hybrid mode, where you have v1 and v2 available, but the same controller can't be used in both places, so it's kind of useless; we really recommend you pick one or the other, that's ideal. And then, if you just want to disable v1 controllers one at a time, you can do that as well. So that's really all we had.
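Disabling individual v1 controllers is also a kernel command line parameter; a sketch:

    # Keep specific controllers out of v1 so the v2 hierarchy can own them:
    #   cgroup_no_v1=memory,io    (specific controllers)
    #   cgroup_no_v1=all          (all of them)
    grubby --update-kernel=ALL --args="cgroup_no_v1=memory,io"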
A: We just wanted this to be an awareness talk: what is the value of cgroup v2, and why is it important that we go there? We don't want the ecosystem to split; we don't want more user space being attached to one version or the other. This should be a low-level implementation detail that your container runtime or your wonderful init system abstracts away for you. That's really why we wanted to raise awareness with everybody. We've got a few links here; again, we'll make these slides available.