From YouTube: Using Python and Jupyter on Perlmutter
Description
Part of the Using Perlmutter Training, Jan 5-7, 2022. Slides and more details are available at https://www.nersc.gov/users/training/events/using-perlmutter-training-jan2022/
Hi everybody, thanks for being here. In this short 30 minutes we're going to try to cram in a lot of information, so please bear with us. We'll have three presenters and I'll be the first: my name is Laurie Stephey, I'm a scientific data architect, and you'll also be hearing from Daniel Margala and Rollin Thomas.
Okay, so what we're going to try to cover here is general Python advice; a lot of this will apply to both Cori and Perlmutter. We're going to talk about using Python on GPUs, I'm sure a topic many of you are curious about. We'll talk about using Jupyter on Perlmutter, and we're going to try really hard to leave about five minutes at the end for an open Q&A, so that would be a good opportunity for you to ask us questions. All right, so let's get started. I assume most of you have used Cori.
So there are a few differences, and I'll try to highlight the most important things here. One thing hopefully you'll notice and like is that we'll have a more uniform Python and Jupyter environment: the Python 3 Jupyter kernel will be based on the current default python module. Right now it's a little bit different, but this should make it easier to keep track of your packages.
That said, many of the best practices we've been advertising on Cori are still in effect. For good performance, you can install your software stack on our global common file system. This is a project-based file system, so you don't have your own directory, but you can install in your project's directory. If you run out of space, file a ticket and we can try to get you more. Another best practice is to use a custom conda environment for any software you need that is not in the module we provide. This is a nice sandbox: it's easy to get rid of if something goes wrong, and it's also very easy to use in Jupyter by converting it into an ipykernel. And finally, of course, we are still encouraging Shifter, especially at large scale, and this is also easily used in Jupyter as a kernel.
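As a sketch of that conda-environment-to-kernel conversion (the environment name and packages here are just examples, not the exact commands from the slides):

```shell
# Hedged sketch: make a custom conda environment usable as a Jupyter kernel.
# "myenv" is an illustrative name; pick whatever suits your project.
module load python
conda create -n myenv python=3.9 ipykernel -y
conda activate myenv
# Register the environment so it shows up in Jupyter's kernel list.
python -m ipykernel install --user --name myenv --display-name "MyEnv"
```

Because the kernel spec lives under your home directory, the same environment then appears as a selectable kernel the next time you log in to Jupyter.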
What we tell you here can help you avoid some of these issues. First, these systems share file systems, and that means a lot of your customizations, like in your dot files for example, will be shared. You most likely don't want the same software packages and environments on both systems, so be careful. First of all, try not to put too much in your dot files, and if you do, you need to periodically review them to make sure you still want what's there.
One tip is to append the system name to your environment names so you can keep track, but of course you can do this however you like. And pip can be a little more dangerous now with the shared file systems, so we'll cover that. Okay, so one big change on Perlmutter now, and actually coming shortly to Cori, is that you'll finally be able to use the conda activate command. Some of you may be familiar with our current setup, which requires either source activate, which is here (I don't know if you can see my mouse), or using conda init followed by conda activate, which is not so good because it adds customization to your .bashrc.
So now, with the new setup, which is on Perlmutter today, you can just load the python module and conda activate and deactivate with no changes to your setup. So that's great, please try it out! If you want to know more, we have some pending updates to the docs that you can check out.
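As a sketch of the difference (the environment name is illustrative):

```shell
# Old Cori-style setup: either "source activate myenv", or running
# "conda init" once, which edits your .bashrc behind your back.
#
# New setup on Perlmutter: load the module and activate directly,
# with no changes to your dot files.
module load python
conda activate myenv
# ... do your work ...
conda deactivate
```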
All right, so for achieving parallelism: mpi4py is a very common library, and a lot of our user questions have been around how to build and use mpi4py. We provide it in the python module, so it's already there; you can load it and test it out. But if you're going to build your own, as I think many of you will, this is the recipe that we have iterated on and settled upon for now.
So we recommend the GNU programming environment because, at least last I tried, we can't build mpi4py with the NVIDIA compilers. You'll need the cudatoolkit and python modules, and of course you'll build your custom environment. You'll need this flag here so that you can access the GPU, and finally there's some other stuff in here that we'll talk about on the next slide. Note that when you've built mpi4py with CUDA support, it will ask for the CUDA libraries at runtime.
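The slide itself isn't reproduced in the transcript, but the recipe being described is roughly of this shape (the exact flags and versions here are an assumption based on NERSC's published mpi4py guidance, not a quote from the slide):

```shell
# Hedged sketch of building CUDA-aware mpi4py on Perlmutter.
module load PrgEnv-gnu        # GNU programming environment
module load cudatoolkit       # provides the CUDA libraries mpi4py will want at runtime
module load python

# Build inside a custom conda environment.
conda create -n mpi4py-env python=3.9 -y
conda activate mpi4py-env

# Force a source build against the Cray compiler wrapper so pip can't
# reuse a wheel built for the other system.
MPICC="cc -shared" pip install --force --no-cache-dir --no-binary=mpi4py mpi4py
```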
That's how it works. Okay, and finally, be careful with pip. Pip will try to be clever and help you: it will search our shared file systems for packages and say, oh hey, I found an mpi4py, let's use that. But if it's finding one that you built on Cori, you won't want that, because it doesn't work here. So be careful.
Also, when you're pip installing, use the force flag. This will force a rebuild, so it will prevent pip from using packages built for the wrong system; this appeared in our mpi4py recipe. Okay, finally, if you do use pip install --user, and I know some of you do, it will install to the place specified by the PYTHONUSERBASE environment variable, which is here. This is in your search path as defined in the module, which can be a little bit tricky depending on how you've set up your environment.
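A hedged sketch of what that looks like in practice (the path is illustrative; the python module sets a system-specific value for you):

```shell
# pip --user installs go wherever PYTHONUSERBASE points, so per-system
# values keep Cori and Perlmutter packages from colliding.
export PYTHONUSERBASE="$HOME/.local/perlmutter"   # illustrative path

# --force-reinstall makes pip rebuild rather than reuse something it
# found that was built for the other system.
pip install --user --force-reinstall --no-cache-dir mpi4py
```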
Hi, thanks Laurie. So today, in a couple of minutes, I hope to present a few slides on how to get started with GPUs and Python on the NERSC systems, in particular Perlmutter.
There are many Python GPU frameworks out there. Some of the very popular ones serve as drop-in replacements for the core backbone libraries like NumPy and SciPy, as well as pandas and scikit-learn. Those popular GPU frameworks are CuPy and a number of libraries in the RAPIDS ecosystem. There's also cuNumeric, which is not currently available or working on Perlmutter, but it aims to be a drop-in replacement for NumPy that also scales across many GPUs. And there are the popular machine learning libraries such as PyTorch and TensorFlow, which also support a lot of general GPU computing features, such as working with array data structures and performing linear algebra operations on them.
One nice thing about some of these libraries, and the ecosystem in general... well, I guess there are two sides to it. All of these libraries are trying to accomplish different things and have implemented things in different ways, so some of them are better at certain things than others, and there are a lot of trade-offs to consider when choosing a framework.
There is some effort in various places to coordinate and share a common data structure format, so you can pass the array structures created in these different libraries between each other, and there's also an effort in the community to standardize what it means to have an array API. On the right here there's a little screenshot of the API for the average function from a bunch of these libraries, and they all have a slightly different interface. That can be a bit of a headache when trying to mix and match these libraries, but the good news is that there's an effort to coordinate and improve that going forward.
So, getting started on Perlmutter: as Laurie mentioned, you'll want to use the cudatoolkit and python modules. You can start by loading those modules, and you have to pay attention to the version that's brought in by the cudatoolkit module. In this case I'm highlighting that the version of CUDA is currently 11.4 by default.
So that's important if, for example, you wanted to use CuPy: when you install CuPy, you have to make sure you specify the version that corresponds to that CUDA version. And then it's just as simple as that: after installing CuPy, you can import cupy as cp, just as you would import numpy as np, create an array on the GPU device, and print it out. So with just a few lines of code you're already doing GPU computing in Python.
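A minimal sketch of those few lines (with a NumPy fallback so the example also runs on machines without a GPU; on Perlmutter you would install a CuPy wheel matching the loaded CUDA version, e.g. a cuda-11.4 build):

```python
# CuPy is a drop-in replacement for NumPy on the GPU: same API, same code.
# Fall back to NumPy here so this sketch runs on CPU-only machines too.
try:
    import cupy as xp      # arrays live on the GPU device
except ImportError:
    import numpy as xp     # same interface on the host

a = xp.arange(6, dtype=xp.float64).reshape(2, 3)  # create an array
total = a.sum()                                   # reduce on device (or host)
print(float(total))  # 15.0
```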
I don't think we have time to really go into more advanced GPU programming, but this slide is meant to show an example of writing your own lower-level kernels using Numba's just-in-time compilation, and also how you can mix and match Numba and CuPy, as well as use this generalized array API to operate on that data.
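As an illustrative sketch of a hand-written Numba kernel (this is not the slide's exact example; it falls back to plain NumPy when no CUDA device is available, so it runs anywhere):

```python
import numpy as np

# Use Numba's CUDA JIT if a GPU is present; otherwise do the same math on CPU.
try:
    from numba import cuda
    HAVE_GPU = cuda.is_available()
except ImportError:
    HAVE_GPU = False

x = np.arange(5, dtype=np.float32)
y = np.ones(5, dtype=np.float32)
out = np.empty_like(x)

if HAVE_GPU:
    @cuda.jit
    def saxpy(a, x, y, out):
        i = cuda.grid(1)              # global thread index
        if i < x.size:
            out[i] = a * x[i] + y[i]
    saxpy[1, 32](np.float32(2.0), x, y, out)  # launch 1 block of 32 threads
else:
    out[:] = 2.0 * x + y              # CPU fallback, same arithmetic

print(out.tolist())  # [1.0, 3.0, 5.0, 7.0, 9.0]
```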
Another thing that's very useful as you're starting off and trying to understand what your GPU code is actually doing is to run a profiler and visualize what's actually running on the GPU versus the CPU. You can use the NVIDIA Nsight Systems profiling tool to profile your application, and you can use the standard NVTX markers and ranges to decorate your code and add your own labels that you'll be able to see in the profile.
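A hedged sketch of decorating a region with an NVTX range, using the `nvtx` Python package when it's installed and a no-op stand-in otherwise so the snippet runs anywhere. You would then run something like `nsys profile python myscript.py` and look for the labeled range in the Nsight Systems timeline:

```python
# Label a region of code with an NVTX range so it appears in the profile.
try:
    from nvtx import annotate                 # real NVTX ranges
except ImportError:
    from contextlib import contextmanager

    @contextmanager
    def annotate(message, color=None):        # no-op fallback
        yield

with annotate("my-compute-region", color="green"):
    total = sum(i * i for i in range(1000))   # stand-in for real GPU work

print(total)  # 332833500
```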
And finally, if you're just getting started, you're probably wondering: is my code a good fit for a GPU? There was a lot of discussion about this in the presentations earlier today, so just to reiterate some of those points: CPUs are good at doing things very quickly with low latency, and GPUs are better at high throughput.
Hi, I'm Rollin Thomas from the Data Science Engagement Group at NERSC, and I'm going to talk to you a little bit about Jupyter on Perlmutter: what you need to know moving from Cori to Perlmutter. I've got about seven slides to answer three high-level questions that I think will be on people's minds. These are, basically: how do I make sure that I can run notebooks on Perlmutter Phase 1?
B
How
do
I
actually
make
sure
that
I'm
using
the
gpus
from
jupiter
on
promoter
and
is
there
anything
special
that
you
need
to
keep
in
mind
when
moving
workflows
from
cory
to
promoter
using
jupiter
all
right,
so
the
first
thing
you'll
need
to
do
is
just
make
sure
that
you
can
access
promutter
with
jupiter
by
logging
in
at
jupiter.nurse.gov,
and
if
you
could
advance
to
the
animation
just
real
quickly
what
you
ought
to
see
when
you
log
in
and
go
to
the
home
or
the
console
page.
B
Is
this
extra
new
row
above
the
quarry
set
of
buttons?
There
should
be
probably
three
buttons
there
for
you
if
that
row
doesn't
show
up
for
you,
then
what's
needed
is
for
you
to
be
to
have
promoter
added
to
your
list
of
server
logins
in
iris.
If
you
can't
figure
out
how
to
get
that
done,
file
a
you
can
file
a
ticket.
You
everybody
here
in
the
training,
probably
should
be
able
to
see
this
line
of
buttons.
All right, so what do the different buttons actually do when you click them? The far left button, labeled Shared CPU Node, is the equivalent of running on a Cori shared CPU node, except it's on Perlmutter. These are shared login nodes: your usage is not charged and will not be charged in the future, and there are no limits currently on CPU, GPU, or memory usage. And like on Cori,
you can use this configuration mainly for debugging, testing, and developing Jupyter notebooks, things that are not supposed to be compute-intensive or really long-running; that's really more for jobs. The middle button, Exclusive GPU Node, gets you a notebook running on a Perlmutter Phase 1 compute node, all to yourself.
When we start charging, the account that will be charged will be some sensible default, probably your default GPU account if we have something like that. There's a six-hour time limit there, so you can run a notebook for up to six hours, which is the longest you can run a job on Perlmutter right now anyway.
This is good for interactive GPU work where you're using the GPUs a lot, and for things that need to run a little longer or are more compute-intensive, things that won't fit in the resources you get on the shared node. And then finally, the button all the way to the right pops up a menu that allows you to configure the parameters of the job your notebook is going to run in, and this is a way for you to use more than one node.
B
You
can
use
up
to
four
nodes,
so
16
gpus
here
you
can
customize
how
your
slurm
allocation
works.
If
you
don't
want
to
use
your
default,
your
default
account
you
can
switch
to
another
account,
use
a
reservation
or
whatever.
What you want to use this for is when you need to scale up to multiple nodes. If four nodes turn out not to be enough for your needs, please contact me; I'd love to hear about your needs and we can work on getting you the resources you want. Next slide. Again, how do you actually make use of the GPUs? What do you actually have on the login nodes? I think it was mentioned earlier in the training:
there is one shared A100 on the login nodes right now. It's kind of a free-for-all: if you're using it, you're the person using it, and if somebody else is using it, you're not going to be able to. In the future we might set up Multi-Instance GPU on the login nodes so that more people can use the GPU and actually interleave work on it more cleanly. The Exclusive GPU Node option gets you four A100s, and then, of course, with the configurable option
you can have up to 16 A100s. Next slide. And how do you make sure that you're actually using the GPUs on the node you're on? You've probably seen nvidia-smi; you can run that from the notebook or from the terminal tab in JupyterLab.
There is a lab extension that we have installed; there's this kind of funny icon, I think it's supposed to depict a GPU card, the third icon down on the left-hand side of the JupyterLab user interface. If you click that, you'll see a few different dashboards that you can open up for GPU utilization, memory consumption, PCIe throughput, and things like that. This lets you monitor GPU usage on the node that you're on.
If you're a Python user who likes to use the Dask framework for distributed parallelism, it is possible to use the Dask dashboard to monitor everything that's happening in your cluster, observe progress, and profile your workflow, which is really what it's for. Right now, the way you do that is to open a separate browser tab and copy the dashboard URL into the address bar. However, there is another lab extension that we used to have installed a long time ago, but it was kind of problematic.
Maybe in the next deployment of our JupyterLab stack we'll bring that back, and it will provide a similar kind of interface to what you get from NVDashboard, which is what I showed on the previous slide, except it'll let you see all of the activity happening across all the nodes you're using in your Dask cluster.
Okay, this is my last slide: what are the high-level things you should know moving, as a Jupyter user, from Cori to Perlmutter?
One big difference is the shared-node notebook configuration. On Cori there are only four nodes dedicated to Jupyter use, which are repurposed login nodes: out of all 24 login nodes we took four and set them aside for Jupyter. So about 250 users on a typical day are crammed into those four nodes, and there are CPU and memory limits in place to keep those nodes healthy.
On the Perlmutter side there are many more login nodes, 40 of them, and we don't restrict Jupyter notebooks to any particular subset of those nodes right now. That means Jupyter notebooks run alongside SSH-based login sessions. We don't have any resource limits in place right now, so all kinds of fun things can happen there. On the exclusive and configurable notebook side of things:
with the exclusive-node notebooks on the CPU side on Cori, if you've used those you might have noticed it's a little bit flaky; the number of nodes available is actually pretty small, and you need to ask to be let into a special QOS to access them. On Perlmutter, Jupyter jobs are much more first-class: when you click the exclusive node button, you really shouldn't have to wait too long for a notebook to start up.
We have no plans to require people to have access to a special QOS for that. We're keeping an eye on the allocation success rate, that is, when you request a Jupyter notebook from the hub on a compute node, are you actually able to get it quickly, or do you run into issues? We're watching that, and we'll make adjustments to policy to try to keep it as responsive as possible.
The last point I want to make here is to touch on setting up your Python, sorry, your Jupyter kernels. All the documentation on the Jupyter page at docs.nersc.gov that applied previously to Cori basically still applies to Perlmutter. One thing you'll probably notice is that, as you pull in things like the NVIDIA libraries (sorry about the background noise, my daughter's at home doing school today, I don't know if you can hear her), your conda environments will pick up a lot more of these bigger libraries, and you may need to avail yourself of the global common software file system. Right, that's it for me.
A
Okay,
thanks
so
I'll,
just
quickly,
wrap
up
and
try
to
point
you
all
in
a
helpful
direction.
So
we
have
a
lot
of
documentation.
We've
been
updating
our
docs
almost
daily
now,
since
we've
installed
promoter,
so
we
hope
you
find
them
helpful.
Some
of
the
pages
you
might
check
out
are
just
the
general
promoter
page.
We
have
a
whole
page
about
python,
so
this
talks
about
khan
environments,
condo
channels.
We
have
python
on
promoter
where
we
point
out
system
specific
things.
A
You
should
know
about
most
of
what
we
just
talked
about
here,
the
jupiter
page
rolling
just
mentioned,
and
we
have
a
pretty
handy
search
bar
so
check
it
out.
You
can
type
in
mpi
for
pi
and
find
what
you
need
pretty
quickly
and
if
you
can't
find
what
you
need
in
your
docs,
please
submit
a
ticket
we're
very
friendly
and
we
we
want
to
help
you
and
make
sure
you
can
get
your
work
done.
So, in summary: welcome! We are here to help and to make sure you can make productive use of this brand new, exciting system. Check out our docs, file a ticket, and I think we left a few minutes for questions, so we're all here if you have questions about GPUs, Jupyter, or just general Python.