Description
Don’t miss out! Join us at our next event: KubeCon + CloudNativeCon Europe 2022 in Valencia, Spain from May 17-20. Learn more at https://kubecon.io The conference features presentations from developers and end users of Kubernetes, Prometheus, Envoy, and all of the other CNCF-hosted projects.
Cloud Native 中的科学 | Science in Cloud Native - Ricardo Rocha, Computing Engineer, CERN
So the fact that a container is a very well-defined unit that can be easily deployed in different environments also helps a lot with reproducibility, which is key for science. The fact that we can take an existing component, even an old component, and deploy it on a current or, say, more modern infrastructure that will come up in 10 years is really making a huge impact in terms of redoing old analyses and making sure that things can be taken forward in the future.
The second part is this idea, again, of having a container built once, in a well-defined way, and being able to run it anywhere, on different platforms. Scientific infrastructure is very often heterogeneous: scientists will try to use as much infrastructure as is available to them, and this means having to comply with different systems underneath.
So the fact that people are standardizing on containers and container orchestration APIs really means that this task is being simplified for end users. And finally, which is also related to what we just talked about: once you have this single unit where you have wrapped your code and data, sharing these units with your colleagues is much easier.
So all of this together means that the infrastructure itself is very much simplified, and by doing this and having access to these tools, scientists can spend a lot more time doing actual science than maintaining the underlying infrastructure.
Even with these standardized APIs around the cloud native tools, there are still, of course, a few challenges, and I will highlight three today. The first one is software distribution: when we start thinking that scientific workloads are made of thousands or tens of thousands of individual pieces, pushing the software to where the analysis is going to be done, and doing this in an efficient manner, is key.
The second one is rootless environments. A lot of the infrastructure scientists have access to has very strict policies in terms of what people are able to run and how. These are shared environments, so running workloads unprivileged is a requirement. And the third is advanced scheduling.
This is where the biggest differences compared to traditional IT happen when you're running scientific workloads: things like batch-like workloads, where queuing, priorities and fair share are very important. I'll cover a bit more of that as well. So, starting with software distribution.
Ideally, container images would be very well layered and optimized. Having images that are over 10 gigabytes and not really well layered is not uncommon in the scientific field; in reality, there are even images of, for example, 15 or 20 gigabytes.
In total, the actual workload will require less than six percent of that to run properly, so it's very inefficient to have to download the full image before starting your workload. If you consider that these clusters can be huge, hundreds or thousands of nodes, then this problem is even bigger.
You need to pull the images across all the nodes, and if the images are very large, this imposes huge pressure in terms of network and storage. If, in addition, you're running thousands or tens of thousands of parallel jobs, the problem is made even worse. To help with this, as I mentioned, the idea would be to have optimized images, but this is not always a possibility.
The alternative is this idea that, instead of downloading the full image before deploying your workload, you instead do a kind of remote mount of the image and gradually download only the content that is actually required and requested by the workload after the container is running. So this means you can have a flat startup time for your container and then access the actual container image contents as the workload requests them.
One example of an implementation of this is the remote snapshotter in containerd. It uses a concept called a seekable tar (the stargz, or eStargz, format). If you know how a Docker image works, underneath it's pretty much a set of tarballs, one for each layer in the image. The smart concept used here is that a tar of tars is still a valid tar, so it's fully backwards compatible with the existing container image formats.
But by doing a tar of tars, you end up with a seekable tar, so you can basically navigate the tar to find, for example, individual files that are being requested by the workload. And this is pretty much how it happens.
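The seekable-tar idea can be sketched in a few lines: build an index of where each member's bytes live inside the archive, then serve an individual file with a single seek-and-read instead of unpacking everything. This is only a toy illustration of the concept (eStargz additionally keeps per-chunk offsets in a table of contents and fetches them with HTTP range requests against the registry); the file names here are made up.

```python
import io
import tarfile

# Build a small tar archive in memory with two hypothetical files.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name, data in [("app/run.sh", b"echo hello\n"), ("app/data.bin", b"\x00" * 1024)]:
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

# Index pass: record where each member's payload starts and how long it is.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    toc = {m.name: (m.offset_data, m.size) for m in tar.getmembers()}

# "Lazy" read: fetch a single file with one seek + one read,
# without extracting or scanning the rest of the archive.
offset, size = toc["app/run.sh"]
buf.seek(offset)
print(buf.read(size))  # → b'echo hello\n'
```

With a remote image, the seek-and-read step becomes a range request over the network, which is what lets the container start before the layer has been fully downloaded.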
Instead of downloading the image before launching the workload, the runtime will launch the container and expect that the data will be made available when needed. In terms of performance, this has a dramatic impact.
You can see a pretty much flat startup time, no matter the image you're using. In these cases they are actually pretty small images, but if you extrapolate to images of 15 or 20 gigabytes, the startup time will be very similar. Of course, the workloads can then be a bit slower as they request the data, but considering we are only requesting a very small amount of the total data, there's a very big impact in terms of reduced network pressure and storage needed on the nodes as well.
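To get a feel for the network-pressure argument, here is a back-of-the-envelope calculation with the numbers from the talk (a 20-gigabyte image, roughly 6% of it actually read, a 1000-node cluster). The figures are illustrative, not measurements:

```python
image_gb = 20          # a large scientific image, as mentioned in the talk
fraction_used = 0.06   # the workload reads roughly 6% of the image contents
nodes = 1000           # a large cluster

full_pull_gb = image_gb * nodes                   # every node downloads everything
lazy_pull_gb = image_gb * fraction_used * nodes   # only requested content moves

print(f"full pull: {full_pull_gb:.0f} GB, lazy pull: {lazy_pull_gb:.0f} GB")
# → full pull: 20000 GB, lazy pull: 1200 GB
```

Even before considering startup time, that is more than an order of magnitude less data crossing the network and landing on node storage.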
There is already support for this kind of deployment in Docker, Podman, BuildKit and containerd, so there's already quite a lot that can be done using this project.
Having support in containerd also means that tools like kind, minikube and Kubernetes (using a distribution called Usernetes), as well as k3s, are already an option for trying out this type of workload.
These are features that are really required for traditional HPC or HTC (high throughput computing) types of workloads. The first I'll mention here is priority queues. This is the idea that, as you want to maximize the usage of the clusters, you actually allow workloads to be queued before being submitted, and these queues have priorities for higher and lower priority workloads.
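A minimal sketch of the priority-queue idea, using Python's heapq: workloads are queued rather than rejected when the cluster is full, and higher-priority entries are admitted first, FIFO within a priority. Job names and priority values are invented for illustration; real batch systems (Volcano, for example) add preemption, quotas and much more.

```python
import heapq

queue = []    # min-heap entries: (negative priority, arrival order, job name)
_order = [0]  # arrival counter, to break ties within a priority level

def submit(job, priority):
    # heapq pops the smallest entry first, so negate the priority.
    heapq.heappush(queue, (-priority, _order[0], job))
    _order[0] += 1

def admit_next():
    """Admit the highest-priority queued job, or None if the queue is empty."""
    return heapq.heappop(queue)[2] if queue else None

submit("nightly-reprocessing", priority=1)
submit("urgent-analysis", priority=10)
submit("calibration", priority=10)

admitted = [admit_next() for _ in range(3)]
print(admitted)
# → ['urgent-analysis', 'calibration', 'nightly-reprocessing']
```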
This is something that does not exist in the built-in schedulers, but there are multiple projects focusing on it. The second requirement is fair share.
This is the notion that you want to optimize, again, the usage of the cluster, so you allow some teams or users to have more workloads running than their usual quota would allow, if other users are not completely using their own quota; over time, this should compensate.
So you want to balance this so that everyone gets what their expected quota should be over a longer period of time.
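The fair-share logic can be sketched as a scheduler that always dispatches the next job for whichever team is furthest below its entitled share of recent usage. Team names and quota values here are made up for the example:

```python
# Share of the cluster each team is entitled to, and usage accumulated so far.
quota = {"atlas": 0.5, "cms": 0.3, "theory": 0.2}
usage = {"atlas": 0.0, "cms": 0.0, "theory": 0.0}

def next_team():
    # Pick the team furthest below its fair share: lowest usage/quota ratio.
    return min(quota, key=lambda t: usage[t] / quota[t])

# Simulate dispatching 10 equally sized jobs.
dispatched = []
for _ in range(10):
    team = next_team()
    usage[team] += 1.0
    dispatched.append(team)

print(dispatched.count("atlas"), dispatched.count("cms"), dispatched.count("theory"))
# → 5 3 2
```

Over the 10 jobs, the dispatch counts converge to each team's quota (50%, 30%, 20%), which is exactly the long-run balancing described above.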
The third requirement is gang scheduling. This is the idea of submitting multiple jobs at the same time. It is critical for workloads like MPI, where you need communication between the different pieces, so you need to be able to schedule multiple workloads at the exact same time; otherwise they wouldn't run properly. This is also something that has to be built into the scheduler.
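Gang scheduling can be sketched as an all-or-nothing admission check: a group of pods (say, the ranks of an MPI job) is only placed if every member fits at once; otherwise nothing is placed, so no rank sits idle holding resources. Node sizes and gang sizes here are arbitrary:

```python
def gang_schedule(free_slots, gang_size):
    """Place a gang only if all of its members fit simultaneously.

    free_slots: list of free pod slots per node. Returns the per-node
    allocation as (node index, pods) pairs, or None if the gang
    cannot run as a whole.
    """
    allocation, remaining = [], gang_size
    for i, free in enumerate(free_slots):
        take = min(free, remaining)
        if take:
            allocation.append((i, take))
            remaining -= take
    return allocation if remaining == 0 else None  # all-or-nothing

print(gang_schedule([2, 2, 1], gang_size=4))  # fits across two nodes: → [(0, 2), (1, 2)]
print(gang_schedule([2, 2, 1], gang_size=6))  # only a partial fit: → None
```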
The last one I will mention: we talked about workload distribution in multiple heterogeneous environments, and another requirement is to do this across multiple clusters. Again, the goal is always to maximize access to whatever resources are available, and multi-cluster is one of them, of course. There are some projects that are really putting an effort into providing this in our ecosystem.
The first one I'll mention is Volcano. It's the cloud native batch system, and it really tries to offer all the functionality of a traditional batch system, but using cloud native APIs and tools.
The second one is Admiralty, and here the focus is more on the multi-cluster part. They do this by having this notion of proxy pods on a top-level cluster, which then have the actual workload pods running on child clusters.
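The proxy-pod notion can be caricatured as follows: the top-level cluster keeps a lightweight proxy entry per workload and delegates the real pod to whichever child cluster has the most spare capacity. This is a toy model of the idea, not Admiralty's actual algorithm; cluster names and capacities are invented.

```python
# Child clusters with their remaining capacity (in pods).
children = {"cluster-cern": 3, "cluster-cloud": 5}
proxies = {}  # top-level proxy pod -> child cluster running the real pod

def delegate(pod):
    # Pick the child with the most spare capacity and record the proxy mapping.
    child = max(children, key=children.get)
    if children[child] == 0:
        return None  # no capacity anywhere: the pod stays pending
    children[child] -= 1
    proxies[pod] = child
    return child

for pod in ["analysis-1", "analysis-2", "analysis-3"]:
    delegate(pod)

print(proxies)
# → {'analysis-1': 'cluster-cloud', 'analysis-2': 'cluster-cloud', 'analysis-3': 'cluster-cern'}
```

The point of the pattern is that the top-level cluster only ever sees cheap proxy objects, while the heavy lifting happens wherever capacity exists.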
The third one is Armada, again focusing on batch workloads; they focus on the scheduling and running of these workloads specifically on Kubernetes. And finally I'll mention the Virtual Kubelet. This is a way of masquerading the kubelet, or the node in a Kubernetes cluster, from the actual resources that serve the node. In reality this can be an actual node, but it can also be a remote API, including the API of an external Kubernetes cluster or a serverless platform.
So all of this together really tries to achieve these goals of improving access to all types of resources for scientists, using the concepts that scientists are already used to. There's a lot more going on.
One of the really promising developments comes also from SIG Scheduling, where they are trying to onboard all of these concepts into the Kubernetes scheduler. This is something that will be evolving fast, and I'm really looking forward to seeing progress there.
A
So
this
comes
to
the
end
of
my
talk
today.
I
hope
I
gave
an
overview
of
the
what
the
excitement
towards
cloud
native
in
the
science
area
is
and
what
the
challenges
that
still
exist
are
as
well.
A lot of this discussion happens in groups like the CNCF Research User Group, and I put here the link. These projects also fall under the Technical Advisory Group (TAG) Runtime in the CNCF, so this is also where a lot of the discussion happens.
So I hope this is only a teaser, and I look forward to KubeCon in May in Valencia with a lot more news in this area. And for everyone listening: enjoy KubeCon China, and I hope to see you all soon.