From YouTube: ING's Container Hosting Journey by Robbin Siepman
Description
ING has been building its Container Hosting capabilities since 2018. In this talk we'll share the details of this "mission":
- how we built it
- what does it look like
- what use cases we support (and which ones we do not (yet?) support)
- how we secure this environment
- how we dealt with year-on-year exponential growth
- what mistakes we commonly detect with workloads hosted on our platform
- custom code we build and are open-sourcing
And of course a short "demo" to show this isn't just slideware.
A: So, good afternoon everyone, from your best Italian host ever. Here we are again: we have all had lunch, we all have our refreshments, I see. No? Okay.
A: We now have Robbin Siepman from ING, a lead developer. I must say it's beautiful to have this: this is our first consumer of Kubernetes coming on stage. We always see vendors getting their talks through, and as long as it's not a product pitch that's fantastic, you guys are doing a great job, but it's also nice to see consumers, and the troubles they go through to enable Kubernetes within their organization. So, Robbin.
B: Thank you, hello. So the mic is working, that's good, and everybody has a full stomach, I hope, so that's nice. I'm going to talk about ING's container hosting journey. Like the description says, we have our own private cloud and we're building container hosting there, and one of the services that we offer on that container hosting is what we call namespace as a service. This talk is mainly about what that infrastructure actually looks like and how we built it along the way.
B: So first, a little bit about me. My name is Robbin, and I'm the lead developer in the ING Container Hosting Platform team; that's what that acronym stands for. ING is a bank, but we write a lot of software. We are very present in Europe, but we are all over the world, and in fact the Container Hosting Platform team, the team that I'm in, is also a team with many nationalities in it. I think that's very cool. And today I'm going to talk about a few things.
B: First, why do we actually do namespace as a service, as opposed to maybe giving out full clusters in the private cloud? Then, what does that stack actually look like in our private cloud infrastructure? And then I'm going to zoom in on how we built the namespace as a service on top of our OpenShift; so I already spoiled it.
B: We run OpenShift instead of plain Kubernetes, so OpenShift on top of Kubernetes. I'm going to talk a little bit about the dependencies that we have to build this journey, and then I'm going to zoom in on a few of the controllers that we have, some of the applications. So there's some Python code in there, there's some Golang code in there; I hope you're excited for that. And, of course, a demo. I recorded it, though, so it's not a live one, but it's a demo nonetheless.
B: So the fact is that in ING we have full clusters, right, and we don't want to give full clusters to consumers, because then we give out a lot of nodes, a lot of resources, and they will not be fully utilized. But if we build a multi-tenant cluster and we offer namespace as a service, then we are in full control of the compute and we can give each namespace exactly the resources that it needs, right?
B: So if one namespace of one application requires 10 CPU cores and 10 gigs of memory, we give it to them, and another namespace requires something else, et cetera. So we spread it out over the cluster, but we are in control of the compute, and that gives us a number of advantages, also in terms of compliance. The people that request the namespace don't have to worry about the underlying infrastructure, right; the cluster just works.
B: They don't have to know what compute nodes they are running on, or whether those are patched, stuff like that. They take the namespace that we give them as a service and they also get the compliance on top of that. So they know that the platform they are running on has been penetration tested, that it has all the risk controls in place, and so on.
B: It does mean that these teams, and we have a lot of them, hundreds of teams that use the same Kubernetes cluster, are in control of the stuff that they deploy in their namespace. So the compliance aspect of the cluster, yes, that's in the container hosting team's control, but anything that you deploy on top of that, the application team needs to maintain. We will zoom in on that later.
B: So what use cases do we actually support on our clusters? We have two main use cases. One of them is twelve-factor apps. I'm wondering, show of hands, who here knows what a twelve-factor application is? Quite a lot; that's good, that's good.
B: For those who don't know twelve-factor: the easiest way I would describe it is that you focus on your app being stateless, which makes it very easy to scale, and if one of your pods gets killed it doesn't matter, you just spin up a new pod. It's all about portability, scalability and so on. It means that on the Kubernetes cluster itself there are no persistent volume claims; there is no persistence at all.
B: If you need persistence in an application, you connect to an external database, external as in it's not inside the Kubernetes cluster, but still within your own cloud, right. This is the main type of workload that we offer for consumers. Then, for data services providers, teams that really know a lot about how to handle persistence and how to handle storage, for those workloads we actually offer storage which is built on top of Portworx.
B: For us those are completely separate clusters. All right, so this namespace as a service offering that I'm talking about: what does that actually mean? Why do you have to build so much stuff for it? You can just do oc adm new-project and you're done, right? Well, not really, because if you do oc adm new-project, how do you know which users have access to that project? You need to give people access to it, right, a group. Maybe you're also going to need some networking in there.
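To illustrate the kind of extra objects a bare oc adm new-project does not set up, here is a minimal sketch using the official Kubernetes Python client: it creates a namespace and binds a group to the built-in edit ClusterRole in it. The namespace and group names are made up, and this is not ING's actual automation.

```python
# Minimal sketch (not ING's automation): create a namespace and grant a
# hypothetical group edit rights in it, the kind of steps a bare
# "oc adm new-project" leaves for you to do yourself.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running in a pod

namespace = "example-app"             # hypothetical namespace name
group = "team-example-developers"     # hypothetical AD/LDAP group

client.CoreV1Api().create_namespace({"metadata": {"name": namespace}})

client.RbacAuthorizationV1Api().create_namespaced_role_binding(
    namespace,
    {
        "metadata": {"name": f"{group}-edit"},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": "edit",
        },
        "subjects": [
            {"apiGroup": "rbac.authorization.k8s.io", "kind": "Group", "name": group}
        ],
    },
)
```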
B: And furthermore, we have multiple data centers in ING. So if one data center goes down we can move to the other one, but it also means that if you request a namespace in one data center, it needs to be available in the other. So those are actually a lot of steps that oc adm new-project doesn't do, and we have automated all of that; that's mostly what this presentation is about. To do all of that we have a lot of components, and I've listed them here.
B: One of them is the ICHP API, which knows all about the different clusters and different data centers and how to orchestrate them. Then we have the project controller, which ensures that namespaces exist on the cluster. And yeah, there's more: there are auth delegators, more controllers, an image reporter, a pod resource mutator, a quota autoscaler; there's a huge list of them, about 36 at the moment. Obviously I can't cover all 36 of them right here, but I will zoom in on three of them.
B: So I'm going to talk about the ICHP API, the project controller and, what is it, the quota autoscaler in this presentation. But before I do that, I want to show you what our infrastructure actually looks like. So for ING: we have a data center, well, we have multiple data centers, and then we have a team that offers bare metal as a service.
B: They make sure that the physical compute nodes that come in are registered and ready to be consumed by platforms. Then on one side we have Azure DevOps; well, I should say our one pipeline, which happens to run in Azure DevOps. And then we have a lot of APIs that we depend upon. We have the CMDB, that's for asset registration; we have a charging endpoint; there is some monitoring and logging in there, security monitoring, and networking. These are all APIs that are offered in the ING private cloud.
B: Then we have our infra code, so everything for us is infrastructure as code, and we provision nodes via that, using an IPI installation, running OpenShift 4.10 by the way, and then we get our OpenShift Container Platform. So this is just an OpenShift installation without any of our own components on top of it yet. Then we have all the applications that you just saw; they're also infra as code, via GitOps.
B: We deploy them via Argo CD on the cluster, and then we truly have the ING container hosting platform. You can also see that these components connect to the APIs that are there, right, the internal infra APIs. And then, finally, we can offer the namespace as a service on top of that. So we have a cloud portal inside ING, very similar to what the public clouds have, where you can click, like, hey, I want a new namespace, and you go through it, and then actually you call our APIs and the consumer gets their namespace. And of course, once the consumers have their namespace, they want to deploy their application, so they have their consumer code, also in the one pipeline, and that goes into the namespace.
B: So let me zoom in on some of the applications that we have. The first one I'm going to cover is called the project controller. The project controller is written in Python and it's for namespace configuration management. When we originally started building this component we were very well versed in Python in the team, but there was no Python operator framework for Kubernetes, so we built our own, which we call scaffolds, and I will zoom in on that. What the project controller actually does is take a specification of a project inside ING.
B: So, for example: hey, I want a new project with this name, I need these resources, and it's bound to this group. The project controller then creates all the resources associated with it: it creates the namespace, it creates resource quotas, it creates role bindings and so on. But let me zoom in on the scaffolds framework first. The way the scaffolds framework works is that you have your own application, this is if you're building Python code, right, and then you import this StreamWatch.
B: The StreamWatch is offered by the framework and it listens to a Kubernetes object. This can be a custom resource that you have defined yourself, or maybe it's config maps, or whatever you please, right. Then, via a watch, the StreamWatch listens to that object and calls event listeners. EventListener is also a class that the scaffolds framework offers, and that's what you inherit from. So you have a bunch of these event listener classes.
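The scaffolds framework itself is only shown on a slide here, so as a rough approximation, this is what such a stream watch looks like with the official Kubernetes Python client; the custom resource group, version and plural are invented for the example, and the handle function stands in for the chain of event listeners.

```python
# Rough approximation of the stream-watch idea (not the real scaffolds API):
# watch a custom resource cluster-wide and hand every event to a callback.
from kubernetes import client, config, watch

config.load_kube_config()
custom = client.CustomObjectsApi()

# Hypothetical CRD coordinates for an ICHP-project-style custom resource.
GROUP, VERSION, PLURAL = "hosting.example.ing", "v1", "ichpprojects"

def handle(event):
    # Stand-in for the ordered chain of event listeners the framework calls.
    obj = event["object"]
    print(event["type"], obj["metadata"]["name"])

w = watch.Watch()
for event in w.stream(custom.list_cluster_custom_object, GROUP, VERSION, PLURAL):
    handle(event)  # ADDED / MODIFIED / DELETED events for each project spec
```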
B: This might seem a bit abstract, but if I zoom in like this, you can see that you implement a bunch of event listeners. For example, in terms of the project controller, an event can be: hey, I want to create a new namespace. So you have an event listener for a namespace, you have an event listener for a resource quota, and then a new specification comes in, so a new custom resource comes in; let's say somebody tries to create an ICHP project.
B: Then it goes to the first event listener, which says: all right, I need to create a namespace. If that's successful it moves on to the next one, which needs to create a resource quota; all right, cool. Then it moves on to the next one, which creates the role bindings, and so on. And the cool thing is that all these event listeners are called in order, but if one of them fails, so let's say one of them fails, then the ones that have already been called will be rolled back. That's what you see there with the rollback.
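In Python, the ordered chain with rollback described above could look roughly like this; the class and method names are assumed for illustration, not the real scaffolds ones.

```python
# Illustrative shape of an ordered listener chain with rollback (class and
# method names are assumed, not the real scaffolds framework).
class EventListener:
    def apply(self, spec):      # create or update the resource this listener owns
        raise NotImplementedError

    def rollback(self, spec):   # undo whatever apply() managed to do
        raise NotImplementedError


def process(spec, listeners):
    applied = []
    try:
        for listener in listeners:          # namespace, resource quota, role bindings, ...
            listener.apply(spec)
            applied.append(listener)
    except Exception:
        for listener in reversed(applied):  # a later listener failed: roll back the earlier ones
            listener.rollback(spec)
        raise
```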
B: Another component we have is called the ICHP API. The component you just saw, the project controller, is very cluster specific; it only knows about its own cluster. But the ICHP API knows about the different clusters, and that's what you see here. So we have a user, they go to the workflow, and they click: hey, I want a new namespace.
B: Well, that's a lot of orchestration, a lot of things that need to happen, and all of it is of course time consuming. First, let me take a look at what the API spec actually looks like. These are the things that we offer: you can get some information about your namespace, you can create one, update it and delete it, and you can also patch it.
B: For example, if you only want to update your resource specifications, then you can do that.
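As a purely hypothetical example of that patch use case (the URL, payload shape and auth header are invented for illustration, not the real ICHP API contract), such a call could look like this:

```python
# Hypothetical illustration of patching only the resource specification of an
# existing namespace; endpoint, payload and auth are made up for the example.
import requests

resp = requests.patch(
    "https://ichp-api.example.ing/v1/namespaces/my-namespace",
    json={"resources": {"cpu": "4", "memory": "16Gi"}},  # only the quota changes
    headers={"Authorization": "Bearer <token>"},
    timeout=30,
)
resp.raise_for_status()
```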
But in order to go through all these steps and get a proper namespace, these are the steps that we need to take. A request comes in, and first we need to get some network information, and that network information needs to be registered in the CMDB; that's the asset registration part. And then, lastly, if that is all complete, we need to charge for it and we actually need to create the namespace on the cluster.
B: Each of those steps takes a second or so, right, but we do them all at the same time to speed up the process. It does mean that if we are in the first stage, we don't want to wait all the way until stage three to find out that, hey, maybe there is a naming conflict, right. So in the first stage we already do what we call a check stage: we do some sanity checks.
B
If
the
request
is
likely
to
succeed,
and
only
when
the
request
is
likely
to
succeed,
then
we
continue
with
the
Run
stage
and
then
we
do
some
actual
networking
and
so
on
so
yeah
if
the
flow
is
all
good,
but
at
the
last
time
you
know
like
at
the
last
unit
something
fails.
Then
we
are
in
a
little
bit
of
trouble
because
we
have
executed
a
lot
of
actions
right,
we've
created,
we
have
the
asset
registration
already
and
the
networking
is
done,
but
somehow
we
cannot
create
this
namespace
on
the
cluster.
B: Now, what does that actually look like in code? All the steps that you saw in the diagram are there in the code. We have this first curly bracket, that is a stage block so to speak, and you can see the network creation step there. Then we have the other steps, the CMDB create, the charging and the namespace, but they are dry runs in the first stage. And then we have all the other steps, right; I hope this is clear to see. And then, finally, we execute everything in the run stage, the cluster actions, and reply. All right.
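The stage-and-unit idea can be sketched roughly as follows; the function names and the dry-run flag are stand-ins for illustration, not the orchestration code that will be open sourced. Stages run one after another, the units inside a stage run concurrently, and the check stage runs every unit as a dry run so a doomed request fails early.

```python
# Sketch of the stage/unit idea (stand-in names, not ING's orchestration code):
# stages run sequentially, units within a stage run concurrently, and the
# check stage dry-runs the units so a doomed request fails early.
from concurrent.futures import ThreadPoolExecutor

def create_network(req, dry_run=False):    print(f"network   dry_run={dry_run}")
def register_cmdb(req, dry_run=False):     print(f"cmdb      dry_run={dry_run}")
def create_charging(req, dry_run=False):   print(f"charging  dry_run={dry_run}")
def create_on_cluster(req, dry_run=False): print(f"cluster   dry_run={dry_run}")

def run_stage(units, req, dry_run_for=()):
    """Run one stage: all units concurrently, some of them only as dry runs."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(u, req, dry_run=(u in dry_run_for)) for u in units]
        for f in futures:
            f.result()  # re-raises a failed unit, so later stages never start

def create_namespace(req):
    units = [create_network, register_cmdb, create_charging, create_on_cluster]
    run_stage(units, req, dry_run_for=units)  # check stage: every unit as a sanity check
    run_stage(units, req)                     # run stage: execute everything for real
```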
And the last component I want to show you is called the quota autoscaler. We found that on our clusters, when you request a namespace, the requester of the namespace has to fill in how many resources their application is going to consume, and at the moment you are requesting it, that is very hard to estimate, right?
B: Some people haven't even started writing their application yet, and we are already asking them to give an estimate of how much they are going to use. So that's very tricky. And on top of that, you then start building your application, right, and you also need to fill in the pod resource requests, so how many CPUs your pod is going to use, and then again you're going to make an estimate, and maybe adjust it after you've seen it running for a while.
B: So what we do is look at the namespace resource quota. I hope everybody knows what a resource quota is, because we're all Kubernetes people here, but in case you don't: a resource quota is basically a limitation on the compute resources that your namespace can consume. So the sum of all the pod resources in your namespace cannot exceed what it says in your resource quota, okay? This is also what you pay for, by the way; in the ING private cloud everybody pays for their resource quota.
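As a generic refresher (values invented, nothing ING specific), a namespace ResourceQuota created through the Kubernetes Python client looks like this; the sum of all pod requests and limits in the namespace may not exceed the hard numbers.

```python
# Generic ResourceQuota example (made-up values, nothing ING specific): the sum
# of all pod requests/limits in the namespace may not exceed these hard caps.
from kubernetes import client, config

config.load_kube_config()
client.CoreV1Api().create_namespaced_resource_quota(
    namespace="example-app",
    body={
        "metadata": {"name": "compute-quota"},
        "spec": {
            "hard": {
                "requests.cpu": "4",
                "requests.memory": "4Gi",
                "limits.cpu": "8",
                "limits.memory": "8Gi",
            }
        },
    },
)
```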
B: Now, if we notice that that resource quota is getting full, we can watch that using this ICHP quota scaler component, and when it's almost full, we can automatically do calls to our charging endpoint and to the ICHP API. We say: hey, this quota is almost full, this team needs more resources. Then we call our automation, which increases the resource quota. This behavior is managed by a custom resource called a quota scaler object, which looks something like this. I hope it looks familiar, because it's almost the same as a horizontal pod autoscaler.
B: So you have some behavior, and you can also set some minimum values and some maximum values, and in the behavior you say what you would like the CPU ratio to be. So if you put down the values 50 and 70, it would mean: I want my resource quota to be between 50 and 70 percent utilized. If you put down 100, it means you want 100 percent efficient usage of your resource quota.
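The arithmetic implied by that 50/70 example can be sketched like this; it is not the real controller, just an illustration of picking a new quota so that current usage lands back inside the target band, clamped between the configured minimum and maximum.

```python
# Not the real quota autoscaler, just the arithmetic implied by the 50/70
# example: resize the quota so current usage falls inside the target band.
def desired_quota(used_cores, current_quota, low=0.50, high=0.70,
                  min_quota=1.0, max_quota=64.0):
    utilization = used_cores / current_quota
    if low <= utilization <= high:
        return current_quota                  # already inside the band, do nothing
    target = (low + high) / 2                 # aim for the middle of the band
    proposed = used_cores / target
    return round(min(max(proposed, min_quota), max_quota), 1)

print(desired_quota(used_cores=3.5, current_quota=4))  # ~5.8 -> scale up
print(desired_quota(used_cores=1.0, current_quota=4))  # ~1.7 -> scale down
```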
B: I want to share some metrics with you on how that affects the resource allocation on a cluster. Not too long ago we were running on OpenShift 3.11, and this was what the CPU allocation looked like. We had over 9,000 cores allocated in resource quotas; next to that you see the actual requests. So this is what the resource quotas looked like.
B: You can see that the memory usage in terms of requests is quite good, and for CPU, yeah, there is a lot more burst to it. Now we built another cluster, an OpenShift 4 cluster, and there we implemented the quota scaler, and you can see the difference, right: we went from 9,000 cores to, give or take, what is it, 2,000 or 1,700, and the same goes for memory.
B: So users no longer have to worry about what they fill in for their namespace resource quota; it scales automatically. One other thing we did: for dev and test workloads we saw that there were a lot of high CPU requests on a pod level, and those also allocate resources on the cluster, but for dev and test namespaces, yeah, they can be a bit more flexible, right?
B: So for dev and test, we automatically scale down the CPU requests to 10 millicores, but we still allow users to set their limits as high as they want. For those who don't know: if you request resources in a pod, you have resource requests and limits. Requests are resources that you are guaranteed to get, whereas limits are only reached if the resources are available on the cluster, and since we have so many compute nodes on the cluster, with us you're almost bound to hit your limits anyway.
B: This allows us to gain in this margin here. We have the requests as everyone filled them in, and then for dev and test we force them lower, and that gives us a nice bump there. Obviously, for memory we can't do that, because memory is not as compressible as CPU. So here are some other metrics: here you can see the quota scaling, and there you see the pod resource mutator I was talking about.
B: By implementing the quota scaler feature we saved 7,600 CPU cores in namespace resource quotas, and with the mutation for the dev and test namespaces we saved another 500 CPU cores in requests. Yeah, so I already mentioned this, right: the pod resource mutator, for development and test namespaces only, forces the pod CPU requests to be 10 millicores, but the CPU limits can be kept as they are.
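The effect of that mutator can be illustrated with a small sketch (not ING's actual webhook code): for dev and test namespaces, every container's CPU request is forced down to 10 millicores while its CPU limit is left untouched.

```python
# Sketch of the pod resource mutator's effect (not ING's webhook code): in dev
# and test namespaces, force the CPU request to 10m but keep the CPU limit.
def mutate_pod(pod: dict, is_dev_or_test_namespace: bool) -> dict:
    if not is_dev_or_test_namespace:
        return pod
    for container in pod["spec"]["containers"]:
        resources = container.setdefault("resources", {})
        requests = resources.setdefault("requests", {})
        requests["cpu"] = "10m"   # guaranteed CPU becomes tiny...
        # ...while resources.get("limits", {}).get("cpu") stays as the team set it
    return pod
```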
All right. Last but not least, I have a demo.
B: It will be a very quick demo. It is a video, but it shows the full stack as I've just told you in the story. We're going to request a namespace via the ICHP API; after the namespace has been created, we're going to scale up some workload in it, and then we're going to see the quota autoscaler in action.
B: All right, here we go. So there we go: here is a namespace specification for the ICHP API. It has a name, we have a workload type, and we have the resources that we want to have. Then we post this payload to our automation, right, we create a namespace, and this creates the namespaces on multiple clusters.
B: We can also see what operations it did. We can see that there was some networking in there, there was some asset registration, and finally it was created on the cluster.
B: Those are all the steps. Now if we actually look at what is on the cluster: the automation on the cluster creates the ICHP project object again. It has the same name, it has the same quotas, it has the same workload type, and there's the name. This is then picked up by the project controller, which in turn creates the namespace and creates the resource quota, so you can see that whatever is in the specification is reflected in the resource quota there. It also creates role bindings.
B: So: which groups have access to this namespace. And you automatically get a quota scaler. Now that we have our namespace, let's deploy something in it. I have this nginx pod here; it has some CPU requests, it has some CPU limits. So let's make it happen: it's starting up, and we can see we are using some of our quota, right. We're using 250 megabytes out of one gigabyte, and in our quota we're using one out of four on the CPU limit, so we can scale up to four replicas before we hit our resource quota.
B: So we have four pods running, and now we scale it up to six. We get all these scary error messages, because, well, it's forbidden, right; you're trying to break out of your quota. But this is where the quota autoscaler comes in. It's actually listening to these events, and it knows that you are trying to scale up beyond your quota, and it calculates, based on the specifications of the replica set, how many resources it needs.
B: There's more, but I have one more slide. So I showed you a lot of stuff: I showed you some Python code, and I showed you some other stuff.
B: I mean, it's really nice to boast about it, but it's not very useful for you unless you can touch it yourself. So, can you have all this code? And the answer is yes; yes, you can. At KubeCon we will open source this, live. Yes, very cool. And what will you actually get? You will get the code for the quota autoscaler, the full quota autoscaler component. You will also get the operator, the Python operator framework; we will open source that as well.
B: You will not get the project controller yet; we are also planning on open sourcing that, maybe, but for now you will get the Python operator framework. And the orchestration you saw, where we had all these lines with the stages that we run sequentially and the units all running concurrently, that's what we call orchestration, and that will also be open source.
C:
B: Yeah, we use everything that you see here in production, but the components are all replicated across the different clusters. So for DTA there is an instance of the ICHP API and the quota autoscaler running, and for production we have another instance running, across multiple clusters as well. So yes, everything you see is running in prod.
B: The reasoning behind this graph is that we noticed that for dev and test a lot of resources are allocated, and for production workloads we are not very eager to scale down the requests. If a team says, hey, they need that many resources to run in production, we believe them; we're not going to touch it. But for dev and test we can be a bit more aggressive, yeah.
D: Similar workload, and I wonder; well, we've made some different decisions, and we actually used some market solutions, like Capsule, for creating namespaces automatically by the developers themselves, and also the vertical pod autoscaler, but that last one, yeah, it's not working properly for us. So maybe this is convenient to use, but I'm wondering: why is ING developing these solutions itself? Because there are some solutions in the market.
B: Yeah. So for the namespace as a service part, we have a lot of interfaces inside ING that are very ING specific, like, for example, our networking implementation; that is just very ING specific, so we cannot ask a vendor to build that for us. And for the second part, the scaling part: it actually works with the horizontal pod autoscaler as well as vertical pod autoscaling, you can use both of them, and this quota autoscaler kind of runs on top of that.
B: You can see it as a management layer for your namespace resource quotas instead. And we actually talked with multiple vendors and multiple companies, and none of them had built it yet; so, therefore, we did it, and you can also use it once it's open source. I hope that answers your question. Okay, thank you. Yeah.
B: So what we do, because we run many, many clusters and all these clusters can have the same namespace, right, depending on how many times you want to replicate it: what we register in the CMDB is the namespace name combined with an identifier of the cluster. So let's say your namespace is "example"; then we will have example-one, example-two, example-three, with one, two, three being the cluster identifiers.
F:
B: There is a limit. From the top of my head I think we have something like 35 CPU cores and 150 gigs of RAM per namespace. It can be that some of the workloads that we have actually have an extremely high load, and then they use all of those resources and perhaps need more; usually those are, yeah, big consumers, and we talk to them about their use case, because if a team hits these limits without talking to us, then usually they are over-committing the resources, meaning they are requesting more resources than they actually need.
G:
B: Yeah, so we currently only provide state for the applications of data services providers. For most of our consumers it's all stateless, twelve-factor, but within ING we have providers that offer data services, for example the ELK stack; you might know the ELK stack.
H: Thank you. I have a small question: since you're basically a private cloud service provider, how many clients do you have at this moment, more or less?
B: So inside the cluster we have a little over 2,000 namespaces, and I think, together with non-prod and prod, we run roughly 5,000 pods, and that's per environment, right; we replicate everything in another data center as well. What is the metric you're looking for? Or...
H: Because I was actually curious about the more technical part of this, which is the persistence of the Kubernetes clusters that you're using under the hood. Basically, did you go with a custom solution for storing the state, or are you still using the standard one, which is etcd?
B: Yeah. Okay, thank you. We found etcd, I think, performant, but keep a good eye on the metrics, and you may also need to fine-tune it, right. There are quite some nice articles about it, where you can say: okay, I have a huge cluster, what metrics or what parameters do I need to fiddle with. Yeah, all right. If you have more questions, I'll still be walking around, so thank you very much for listening and have a great day.
A: Well, thank you very much, Robbin. I am 100% sure that people will come find you, not necessarily for good things. Now we're going to restart at 2:30 with Ara, and she's going to show us the Gateway API. So you can stick around, or you can go get some refreshments, but please be on time; we're going to start at 2:30 sharp. Thank you.