From YouTube: 14 Shifter
Description
Part of the NERSC New User Training on June 16, 2020.
Please see https://www.nersc.gov/users/training/events/new-user-training-june-16-2020/ for the training day agenda and presentation slides.
So, great. I'm Shane Canon, also with the Data and Analytics Services Group. I'm one of the developers and maintainers of Shifter, and I also work on a number of container-related efforts. I'll be talking about container use at NERSC. Quickly, we'll go through an intro to containers, talk about the role of Shifter, walk through some Shifter in action, and then look at container usage at NERSC. So first, an intro to containers and Shifter. It's likely that many of you have used containers already.
Apologies if this is something you're already familiar with, but in the container space, the big kahuna is still Docker. That's the most popular approach to using containers, and what Docker really did is provide a simple set of tools to build, ship, and run applications or services. You use the docker tool to build images that capture all of the application's requirements; you put those in a recipe, or you can commit them manually. You build this image and then push it to Docker Hub, which is sort of like GitHub for images, or you can push it to a private registry, and then you can use that to share those images. Then you can go to an execution host or a system with a Docker engine (and now there are other ways), and do something like docker run to pull that image down and execute it.
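To make the build-ship-run cycle just described concrete, here is a sketch of the three commands involved. The image and repository names are hypothetical placeholders; the commands are written into a small script rather than executed, since they assume a machine with Docker installed.

```shell
# Sketch of the Docker build/ship/run cycle. Image and registry
# names (myuser/myapp) are hypothetical examples.
cat > docker-workflow.sh <<'EOF'
#!/bin/sh
docker build -t myuser/myapp:1.0 .   # build: turn a Dockerfile recipe into an image
docker push myuser/myapp:1.0         # ship: publish to Docker Hub or a private registry
docker run --rm myuser/myapp:1.0     # run: pull the image if needed and execute it
EOF
chmod +x docker-workflow.sh
echo "wrote docker-workflow.sh"
```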
So what exactly is in an image? You can think of it as kind of like a snapshot of your host's file system, if you were using Linux, for example. It includes the base OS, the Linux operating system, so all the libraries that are included with the distribution. Typically it could also include other libraries that you've installed, other tools, the user code itself (the part you're really interested in executing), and it can include data.
But there are some limitations on that. It can also include runtime settings: things like environment variables, working directories, and how you want to execute the application. For network-based services there are other things you can include, but those are typically not relevant to HPC container use cases. So those of you that have used Docker and are familiar with it know it's pretty cool, and you might wonder: why don't we just put Docker on Cori and let people use that? The primary reason (and this is actually starting to change) has been security. The Docker security model is kind of an all-or-nothing thing. If you're able to run Docker on a system, that almost means you effectively have root permissions on that system, unless it uses some really recent features.
Another issue is system architecture: on our Cray systems, as you've heard, we don't have local disk, and Docker kind of assumes that, so that creates some barriers. We also want it to integrate and play nice with our resource management system, Slurm, and there are some mismatches there. Then, and this is really not as much of an issue these days, there used to be a big problem with the kernel requirements Docker needed, but these days what we're running is usually modern enough. And the last one is just complexity: it's another thing that we would have to manage on the system.
So for these reasons, we at NERSC developed Shifter. It's an example of an HPC container runtime, and actually one of the first to be developed; we wrote it back in 2015, I think, maybe even earlier. What we wanted to do was leverage as much of the Docker ecosystem as we could, so all the build tools and image repositories and so on, and just replace the runtime piece to make it more amenable to HPC systems. We really wanted to make sure we addressed those security issues, but we also thought about things like scaling and performance, and Shifter is really designed to address those.
So why do users like using containers and Shifter? One thing is that you can develop an application on your desktop, test it locally, push it, and then run it on, well, I've still got Edison on this slide, I do need to update these slides, so run it on Cori, without additional work. It also enables you to solve your own dependency problems yourself. Rather than having to ask a NERSC staff member, "can you install this other package for me?", you can just build the image the way you want it, put in any tools you need, and the world is your oyster. You can even pick a different OS than what we provide on the system. For example, if you're not a big fan of SUSE, you could use CentOS or Ubuntu or something like that, and that may make it easier to get the packages you need for your application. And then, as we've seen some examples of in Maurice's talk:
it can actually improve application performance, especially startup, in some cases, and I've got a slide talking about that. Finally, it can improve reproducibility and sharing. Since an image is a durable thing, it's easier to go back and reuse that same image years down the road; even if things on the system have changed, the image has not, so it can be good for making sure something can run over a long period of time.
So again, why is it useful for science? I touched on some of these. One is the reproducibility angle: you can keep these images in a registry, you can reference them in publications, and that makes it easier for somebody to go and try to reproduce your work. Another is portability: I can run an image on my laptop, I can run it on a cloud system, if other sites have an HPC container runtime you can run it there, and you can run it on Cori, or Perlmutter in the future. And it can really reduce effort because, rather than everybody trying to rebuild and recreate the same environment, you can all share it; for example, you can have a community image that everybody just knows to use and that has already been validated. So, very quickly, Shifter and Docker in action. We'll start with creating an image, and somewhere on here you'll notice that it says "laptop".
We don't currently allow you to build images at NERSC, mainly because right now you'd need Docker installed, and for the reasons I listed above we can't have Docker on the system. So that's why you can't currently build images on Cori. But on your laptop or workstation or somewhere else, you would create a Dockerfile.
For example, this is an older syntax, but you usually have a MAINTAINER line that says who built the image; that's optional, but good practice. Then you typically use these two primitives, RUN and ADD, to build up the image. RUN basically says: inside this environment I'm building, do this operation. You're basically creating another layer of the onion for this environment, and you just keep building up from there. The final product is an image.
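A minimal Dockerfile along these lines might look like the following; the base image, package, and script names are hypothetical examples, and the recipe is written out from a shell heredoc so it stays self-contained.

```shell
# Write a small Dockerfile illustrating MAINTAINER, RUN, and ADD.
# Base image, package, and file names are hypothetical examples.
cat > Dockerfile <<'EOF'
FROM ubuntu:20.04
MAINTAINER Jane Doe <jdoe@example.com>

# Each RUN executes inside the build environment and adds a layer
RUN apt-get update && apt-get install -y python3

# ADD copies files from the build context into the image
ADD app.py /app/app.py
EOF
echo "wrote Dockerfile"
```

You would then build and push it with the docker commands described earlier.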
Once your image is built and ready, you can go to Cori and run it. Actually, you don't even need that module load; boy, there are a couple of anachronisms in these slides, I'm realizing. So here it shows an example of how to submit that job through the batch system. You can also run it on the login nodes as well.
This shows one good practice, which is that you can specify the image you want to use inside your batch submission script, and the integration we've done between Shifter and Slurm will make sure that the image is prepared and ready to run. When you're trying to run at scale, this is the practice you want to use. The key thing here is that the only real difference in how you run the application is the extra "shifter" in the middle.
So you say srun like you normally would for launching with Slurm, but before your application you just put shifter in the path. Alternatively, you can specify the image as part of the shifter command, but again, if you're running at scale, you want to specify it in the batch script.
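A batch script along the lines described might look like this; the image name, node count, and task count are hypothetical placeholders, and the script is written to a file here so the pieces are visible together.

```shell
# One-time step beforehand: pull the image into Shifter's registry
# on the system (image name is a hypothetical example):
#   shifterimg pull docker:myuser/myapp:latest

# A minimal Slurm batch script using the Shifter/Slurm integration.
cat > submit.sh <<'EOF'
#!/bin/bash
#SBATCH --image=docker:myuser/myapp:latest
#SBATCH --nodes=2
#SBATCH --time=00:10:00

# The only difference from a native run is the extra "shifter"
# between srun and the application:
srun -n 64 shifter python3 /app/app.py
EOF
echo "wrote submit.sh"
```

Specifying the image in the `#SBATCH --image=` line is what lets Slurm stage the image before the job starts.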
One question is: can you use Shifter with MPI applications? The answer is yes; we've got that integration already done on the system. If you build your application using a fairly recent version of MPICH and follow this recipe, then when we run that application with Shifter, it will automatically bind, kind of map, the MPI libraries for the Cray system into your container environment, and you'll get optimal performance, just as if you had built it natively on the system.
This is the common model that we recommend, and there's documentation on the documentation site on how to do this. This is just a quick example: we base off of a base image that we developed a while ago that has MPI already installed, we ADD in our example application, compiling it with mpicc, and then we can build that image just like we showed before and use this recipe to execute it.
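A sketch of that MPI recipe follows; the base image name and source file are hypothetical placeholders (NERSC's documentation lists the recommended MPICH base image).

```shell
# Dockerfile for an MPI application, based on an image that already
# has MPICH installed. Base image and file names are hypothetical.
cat > Dockerfile.mpi <<'EOF'
FROM nersc/mpich-base:latest
ADD helloworld.c /app/helloworld.c
# Compile against the container's MPICH; at run time Shifter maps
# the Cray-optimized MPI libraries into the container automatically.
RUN cd /app && mpicc -o helloworld helloworld.c
EOF
echo "wrote Dockerfile.mpi"
```

The resulting image would then be launched like any other Shifter job, e.g. `srun -n 32 shifter /app/helloworld`, with the MPICH ABI compatibility providing the native libraries.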
Here we're running it at not even really large scales, but we show a Python application being started from different file systems, which could include the Lustre scratch file system or the GPFS file system, and in general the best performance we get is with Shifter. It's for the reasons that Laurie was describing earlier: Python has to go through and walk the file tree to build up the namespace for all the libraries that are present, and if you think about every node, every process starting up, each one has to do that operation, and it just leads to a lot of metadata operations on the file system. With something like Lustre, all of that traffic goes back to a small number of nodes, and they become the bottleneck. With Shifter, that can be resolved locally on the node, because it has all the metadata already in the image.
So again, this is something we identified early on. Hold on just a second, my network went unstable for a second. Okay, can people still hear me? Okay, yeah, good; I've been having internet issues the last couple of days. Okay, quickly: Shifter versus Docker, just a few things that you need to be aware of, and the biggest are the first two. One is that processes run as yourself, not as root. Sometimes you'll run into images that just weren't designed with this in mind, and it can cause problems; typically there are easy ways to work around that, but it's something to watch out for. Another is that Shifter mounts the images read-only. This is really critical to how the scaling works. Some images are designed to make changes to the configuration at runtime, maybe in some directory.
You can, just like with Docker, do volume mounts, so you can take a path outside the container and make it present inside the container at a path of your choosing. This can be useful, for example, for always having your data show up as /data; that way, as you move between different systems, you can abstract out the location of things.
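As a sketch of that /data pattern (the source directory, image name, and application are hypothetical examples), the mapping can go on the shifter command line or in an `#SBATCH` line:

```shell
# On the command line the mapping looks like:
#   shifter --image=myuser/myapp:latest \
#           --volume=$SCRATCH/mydata:/data myapp
# In a batch script the same mapping goes in an #SBATCH line
# (paths and names below are hypothetical examples):
cat > volume-example.sh <<'EOF'
#!/bin/bash
#SBATCH --image=docker:myuser/myapp:latest
#SBATCH --volume=/global/cscratch1/sd/myuser/mydata:/data

# The application always sees its input at /data, regardless of
# where the data actually lives on the host.
srun -n 32 shifter myapp --input /data
EOF
echo "wrote volume-example.sh"
```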
There's also a feature called the per-node write cache. That creates something like a local disk on a node. It doesn't have quite as good performance as a real local disk, but it can give good performance for certain patterns. For example, if you want to run a database on a compute node, which we've had people actually do for some specific reasons, this feature can be useful.
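The per-node write cache is requested through the same volume syntax; the paths, image name, and cache size below are hypothetical examples, and the exact form is in the NERSC Shifter documentation. Each node gets its own private writable space presented at the target path:

```shell
# Batch script requesting a per-node write cache (hypothetical
# paths, image, and size; see the NERSC docs for the exact syntax).
cat > pernode-cache.sh <<'EOF'
#!/bin/bash
#SBATCH --image=docker:myuser/mydb:latest
#SBATCH --volume=/global/cscratch1/sd/myuser/db:/var/lib/db:perNodeCache=size=100G

# Each node sees its own private, writable /var/lib/db, which suits
# patterns like a database running on a compute node.
srun -n 4 shifter start-db.sh
EOF
echo "wrote pernode-cache.sh"
```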
Another point: I can't remember if we had earlier talks about Spin. Spin is our system for hosting persistent edge services. It's also container-based, but there's a distinction: Shifter is what we use for running containerized HPC jobs, applications that run for a period of time and exit, whereas Spin is what we'd use to run containerized services, so that might be things like databases or web portals or web services, for example.
And just to show: while we originally designed Shifter for data-intensive kinds of applications, things that maybe we didn't envision running at super large scales, we have seen examples of people running at the full size of Cori. This is probably the biggest hero run that I've seen with containers: a cosmic microwave background simulation that ran across all of Cori's KNL pool at full scale. In this case, Shifter was really pivotal to getting these jobs to run, because it was a Python application along with a number of other libraries.
All right, and then a little bit about usage of containers at NERSC. This is a snapshot from 2018; I'm trying to generate one for this past year, but you can see a number of different applications that have run using containers. A lot of these happen to come from high-energy physics and experimental use cases, but we do see others besides those as well. We've gone from containers being a very marginal amount of usage in 2014 to the six-to-eight-percent range over the past couple of years. There have been thousands of different images that people have pulled down and run, and over 900 unique users have used Shifter at some point in time. To learn more, there's information on the docs site, and again, just like we've said with the other things, we're also looking for ways to improve the documentation.
So if you have suggestions on things to add, please let us know. There's other training material out there that you can look at: we've given tutorials at things like SC and ISC and the Exascale Computing Project annual meetings, so those are out there if you want to consult them, and then there are numerous resources, of course, on Docker. And that's it; maybe I didn't have time to add a slide on it, but I guess we're at time, so I'll just leave it there and see if there are any questions. Okay.
Okay, so let me think about it. I'm going to interpret the question as: can I use a container to run GPU applications and get native performance? (Just a second, let my network recover.) The answer is yes: we've already got recipes, I think, up on the docs site on how to use containers on the GPU nodes, and there are some flags that are enabled by default, so for the most part it should just kind of magically work.
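Since the GPU support flags are enabled by default per the answer above, a GPU run looks much like any other Shifter run; the image name, application, and resource options below are hypothetical examples, and the NERSC docs site has the current recipes.

```shell
# Batch script for a GPU job through Shifter (hypothetical image and
# application; per the talk, GPU flags are on by default, so the
# invocation is the same shape as a CPU run).
cat > gpu-example.sh <<'EOF'
#!/bin/bash
#SBATCH --image=docker:myuser/cuda-app:latest
#SBATCH --constraint=gpu
#SBATCH --gpus-per-node=1

srun -n 1 shifter ./my-cuda-app
EOF
echo "wrote gpu-example.sh"
```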