From YouTube: Intro to GPU: 02 How to Use Cori GPU
But today the resources we have to work with are the Cori GPU nodes, and so I'm going to give you a really high-level overview of how to access them via Cori, the resources that we have available, and some of the software on the system. So just a couple of slides, and then we'll try and get everyone logged in and accessing the GPU nodes via Slurm, so that we don't have to eat into the hands-on exercise time later.
Okay, so first things first: it always helps to read the manual, and our information for the GPU nodes is on a slightly different website than the standard NERSC documentation pages. So I'd like to point everyone to docs-dev.nersc.gov/cgpu, which has a couple of pages about the nodes themselves: the hardware information, the CPU and GPU information, including some of the node topology that people are really interested in, especially for MPI use. It's also got instructions on how to access the nodes via Slurm.
Accessing them involves a little bit of module shuffling, and the syntax is slightly different from the usual Cori node-access syntax. We also have information about the software on the machine. The software on these nodes is again slightly different than a lot of the standard Cori software that you might be used to working with, so we've got information about the particular software that we support on the GPU nodes, some code examples (both source code and compilation), and some frequently asked questions that we get, or have gotten, from early access users.
Okay, so the most basic thing we'll do (I'll go through another couple of slides and then leave probably 10 to 15 minutes for everyone to do this) is to get onto the GPU nodes. You'll first want to log in to Cori, and then we generally recommend doing a module purge, which gets rid of all of the module files that are loaded by default. This is because the Cori GPU nodes have a different operating system and a slightly different software environment than the Haswell and KNL nodes, so the vast majority of the standard Cori software will not work on the GPU nodes. There are a couple of exceptions to this, which I think Helen is going to cover in detail later, but our standard advice is: purge all the modules and then load only the ones that you'll need (CUDA, GCC, etc.).
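That recommended sequence looks something like the following sketch; the exact module names and versions are assumptions on my part, so check docs-dev.nersc.gov/cgpu for the current list.

```shell
# On a Cori login node: clear every module loaded by default,
# since most standard Cori software will not work on the GPU nodes
module purge

# Load only what you need for GPU work; the names and versions
# here are illustrative, not a definitive list
module load gcc
module load cuda
```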
So, in addition to having a slightly different operating system, the nodes live under a separate Slurm controller. That's why we have you purge the modules and then load what's called the esslurm module. With that loaded, you'll use the same standard Slurm commands, but they talk to a slightly different Slurm controller. If anyone's ever used the transfer QOS, you might have used the esslurm module before; the Cori GPU nodes are also in the esslurm controller.
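In practice that looks something like this sketch; the esslurm module name matches what the docs describe, but the follow-on commands shown here are just examples.

```shell
# esslurm points the standard Slurm commands (salloc, sbatch, srun,
# squeue, ...) at the separate controller that manages the GPU nodes
module load esslurm

# Same familiar commands as on Cori proper, different controller
squeue -u $USER
```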
Another thing that is different about the Cori GPU nodes from the standard Haswell and KNL nodes is srun. In some cases on Haswell or KNL you will use srun, but it's not mandatory per se. On the GPU nodes, in order for the system to see the GPUs when you're running any given piece of software, all commands must be run through srun. So say you're doing a really simple command: you log in, get a GPU node, and try something like nvidia-smi without using srun. It's going to tell you: I can't find the GPUs, I have no idea what you're talking about. This trips me up a lot when I'm doing development work. So this is one key thing to keep in mind for the tutorial later today when trying out any examples: everything you do should be prefixed with srun, and then it will pick up the GPUs that you've requested on the node.
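A sketch of the difference (the error text from nvidia-smi is paraphrased here, not a verbatim quote):

```shell
# Inside a GPU allocation, without srun the GPUs are not visible:
nvidia-smi
# -> reports that no devices were found

# Prefix the same command with srun and it picks up the GPUs
# you requested on the node:
srun nvidia-smi
```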
Another thing that I should point out about requesting the GPUs is that you'll probably want to think about it a little bit differently than Haswell or KNL nodes in terms of resource allocation. Unless you're doing something like specifically requesting a shared QOS, or a shared job, for a Haswell or Knights Landing node, you'll get the full node: you have all the cores, you have all the resources on the node. This is not true for the Cori GPU nodes. We don't have a lot of them relative to the number of users that we have, so the Cori GPU nodes are shared by default. What that means is that we anticipate that users will be good citizens and request only what you need. So for small examples, you'll want just one GPU; every node has eight GPUs on it, and for the small examples we have today I think one should be sufficient. Helen, do you have any multi-node or multi-GPU examples? Okay, yeah.
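So a good-citizen request for today's exercises might look like this; the account name is a placeholder, and the flag spellings are assumptions, so check the Cori GPU docs for the exact syntax.

```shell
# Request one GPU out of the eight on a node for a small example
salloc -C gpu -N 1 --gres=gpu:1 -t 30 -A <your_account>

# Only request all eight if you genuinely need the whole node
salloc -C gpu -N 1 --gres=gpu:8 -t 30 -A <your_account>
```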
So, the compilers that are available for use on the GPU nodes are GCC, the PGI compiler (which Jack talked about), the Intel compiler, and LLVM. The GPU nodes support MPI via Open MPI and MVAPICH2. Helen, later, will be talking about using OpenMP and OpenACC directives. We have a couple of different versions of the CUDA SDK available on the nodes, and also, since I know a lot of people are interested in machine learning, we have modules for TensorFlow and PyTorch.
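One way to see what is installed; the module names in this sketch are guesses, and the actual names on the system may differ.

```shell
# List the builds of the CUDA SDK and the ML frameworks on the nodes
module avail cuda
module avail tensorflow pytorch

# Load a specific one, e.g. for machine-learning work
module load tensorflow
```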
And so please let us know if people are interested, I guess, in TensorFlow or PyTorch for a future tutorial. Okay, so this is the time for the login exercise. We'll have everyone log in and get a local copy of the training materials, and then also try to log in to a GPU node, or request a GPU, and then I'll just kind of float around, I guess, for the rest of the time.
Super basic question: does the compilation have to happen on the GPU node, or can it be done on a normal Cori node? Good question. Generally, as long as you do the module purge and then load only the specific modules you want, you can do the cross-compilation on something like a login node. But in general we suggest that people compile things on the GPU nodes. You can usually cross-compile things successfully, but standard practice is to salloc first and then do everything else.
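Putting that standard practice together, a session might look roughly like this; the module names, flags, and the saxpy.cu source file are all illustrative assumptions.

```shell
# salloc first: get onto a GPU node before building anything
module purge
module load esslurm gcc cuda
salloc -C gpu -N 1 --gres=gpu:1 -t 30 -A <your_account>

# ...then do everything else there, running each step through srun
srun nvcc -o saxpy saxpy.cu
srun ./saxpy
```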