From YouTube: 15 Deep Learning
Description
Part of the NERSC New User Training on June 16, 2020.
Please see https://www.nersc.gov/users/training/events/new-user-training-june-16-2020/ for the training day agenda and presentation slides.
Okay, good. So, hi, I'm Mustafa, one of the machine learning engineers at NERSC. I'll be talking to you in the next 20 minutes about the different deep learning capabilities that we have at NERSC, how to access the resources and how to use them efficiently, and I'll also give some links to further resources on how to do certain aspects of deep learning at NERSC.
So, to start with: this talk is focused on deep learning. We do, of course, also have the traditional machine learning software stack at NERSC — that is standard, and I'll talk a bit about it — but I'll focus mostly on deep learning. As you all know by now, deep learning is on the way to transforming science.
There's a lot of excitement about the potential of deep learning for solving many problems: some that we can't solve with traditional methods, and some that we can solve with traditional methods but where deep learning will help us do it faster. I won't go through the whole list of examples here.
I just want to highlight that within the DOE there are a lot of big investments, either already here or coming, for different applications of deep learning to the various sciences that the DOE is involved with, and there is a roughly 300-page report that came out of the town halls on AI for Science.
Machine learning is a particular way of doing AI using statistical methods, and deep learning is a subclass of machine learning that focuses on using neural networks to solve the same kinds of tasks. With neural networks we can actually solve even more tasks than we could with classical machine learning methods, but it's still a subset of machine learning that focuses on neural networks.
The approach we have for supporting deep learning works at multiple levels. First of all, we focus on making sure that we have a software stack that is optimized for performance on our machines. We work closely with the hardware vendors and also the software vendors — in this case, for PyTorch and TensorFlow, with Facebook and Google — to make sure the software gives the best performance that we can get out of our machines. So that's the software stack.
For accessing the resources, we provide tools that enable large-scale deep learning through the batch system. And finally, we also organize and conduct training and consulting programs for applications of deep learning to science; I'll talk a little bit about this later on. So, first, the software stack.
If you look at the plots to the right here — the first plot, this one; I don't know if you can see my cursor. There are different libraries that are popular for deep learning and machine learning in general. People use scikit-learn, and that's something that runs smoothly on our machines. For the purpose of this talk we'll focus on the things related to deep learning, which are Keras, TensorFlow 2.0, and PyTorch. These are the most popular libraries.
There are a few ways to use these at NERSC. One of them is just to use the NERSC modules, which I'll talk about; that's the most popular approach, at least for prototyping and testing things out in the beginning — people just use the NERSC modules. You can see that 89% of users do that. You can also set up your own conda environment
A
If
you
want
to
further
customize
beyond
what
you
can
do
with
the
modules
and
then
you
can
also,
some
people
prefer
to
build
from
source
or
to
use
shifter
and
shifter
usage
is
actually
on
the
rise
for
report,
a
software
stack.
So
that
is
lesson.
That's
that
I'll
be
talking
about
each
one
of
these
to
do
so.
All
of
this
is
for
single
node
or
a
single
GPU
performance
to
do
beyond
to
go
beyond
that.
as you know, if you have some experience with deep learning, it requires a lot of data and can take a long time to train, so you might need to distribute your training. That's another aspect that we focus on: how to do distributed training on our machines. I'll talk a bit about this later, and I'll also mention Shifter containers.
For the modules, it's pretty simple. We have TensorFlow modules and we also have PyTorch modules, and TensorFlow includes Keras. So if you want to use Keras, we provide Keras backed by the TensorFlow backend, which is TensorFlow 2.0, so you can just use the TensorFlow module. The way to check which versions are available is to run module avail tensorflow or module avail pytorch, which gives you a list of them; then you can load the one you want.
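(For reference, a minimal sanity check after loading one of those modules; the exact version strings you see will depend on which module you load.)

    # After loading a TensorFlow module, verify what you got.
    # Version numbers are whatever the module provides, not guarantees.
    import tensorflow as tf

    print(tf.__version__)        # the TensorFlow 2.x build from the module
    print(tf.keras.__version__)  # Keras ships inside TensorFlow 2.0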
If you want to add other packages on top of a module without having to create your own conda environment, you can just use pip install --user, which will install any package you want into your user area on top of the TensorFlow or PyTorch module. Sorry, I just want to check on time. Okay, so.
You can also create your own conda environment if you ever need to. And as I mentioned earlier, Shifter is becoming more common now for running deep learning stacks: we provide images of PyTorch and TensorFlow that are optimized for best performance on our machines. This is currently only for Cori GPU, and containers are what we plan to rely on in the future on Perlmutter.
If you want to submit your batch jobs with these images, we recommend that you use the Slurm batch options for Shifter rather than the interactive command shown here: you specify the image and the volumes to mount in the sbatch options, and after that you just run your script. If you get to use it and you have some feedback, please let us know — we are interested.
If things are working great, we'd love to hear about that. Especially if things are not working great for you, or you see a degradation in performance compared to the module or the bare-metal versions, please let us know. So that's in terms of how to access the software stack.
These are the general recommendations for how to do things. We recommend that you use the modules — that is the easiest way right now — and if you are on Cori GPU, you can also try the containers; that's what we recommend and encourage. If things don't work out for you, or you want to build your own conda environment or your own image, please check the docs to see how to get the best versions of TensorFlow or PyTorch for developing and testing your workflow.
If you're running on the batch system directly from a command line, then using the interactive QOS is the best way to do this: you get resources almost immediately and then you can do your prototyping. Or you can use Jupyter, which also just requests resources from the batch system.
One thing we recommend you do when you run your code, once you finish prototyping and you're about to run at full scale, is to check the utilization of the CPU or the GPU that you're using. If the utilization is significantly under 100%, that usually means there are some bottlenecks in your workflow, and the most common bottleneck is the data pipeline. Essentially, if you're running on a GPU, for example, the data pipeline cannot keep up with how fast the GPU is processing the data, so the GPU sits waiting for the data pipeline to fetch more data. There are multiple recommendations there.
The first one is to use the framework-specific API for data loading: for example, for PyTorch there is the PyTorch DataLoader, and for TensorFlow there's tf.data.Dataset, which we recommend you use. Those also give you enough knobs to tune the data ingestion pipeline, and hopefully that will give you the best performance.
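(For reference, a minimal sketch of what tuning those knobs can look like; the dataset, batch size, and worker counts below are illustrative assumptions, not NERSC-specific settings.)

    # Minimal sketch: framework data-loading APIs with tunable knobs.
    # Values here (batch size, num_workers, prefetch) are illustrative only.
    import torch
    from torch.utils.data import DataLoader, TensorDataset

    dataset = TensorDataset(torch.randn(1024, 16), torch.randn(1024, 1))
    loader = DataLoader(
        dataset,
        batch_size=64,
        num_workers=4,    # parallel worker processes feeding the device
        pin_memory=True,  # faster host-to-device copies on GPU
    )

    # TensorFlow equivalent with tf.data:
    # import tensorflow as tf
    # ds = tf.data.Dataset.from_tensor_slices((x, y)).batch(64)
    # ds = ds.prefetch(tf.data.experimental.AUTOTUNE)  # overlap I/O with compute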
If that doesn't work and you're on the CPU, you can use the burst buffer for staging your data. If you are on the GPU nodes, you can use the SSDs that are at /tmp on the Cori GPU node itself.
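(A minimal sketch of that staging idea, assuming your dataset is a single file on the parallel file system; the paths below are hypothetical examples.)

    # Minimal sketch: copy the dataset to node-local SSD before training.
    # Paths are hypothetical placeholders, not actual NERSC paths.
    import os
    import shutil

    src = "/global/cscratch1/sd/myuser/dataset.h5"     # hypothetical source
    dst = os.path.join("/tmp", os.path.basename(src))  # node-local SSD

    if not os.path.exists(dst):
        shutil.copy(src, dst)

    # Point your DataLoader / tf.data pipeline at `dst` instead of `src`.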
If you have any questions about that, please just ask, or send me an email, or file a ticket, and I'll send you more guidance on that. If everything works and you're sure the data pipeline is fine, but you still have utilization issues, then you might want to start looking into some profiling tools.
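(As one example of such a tool — a minimal sketch using PyTorch's built-in autograd profiler; the model and input are placeholders for your own workload.)

    # Minimal sketch: profile a forward pass with PyTorch's autograd profiler.
    import torch
    import torch.nn as nn
    from torch.autograd import profiler

    model = nn.Linear(128, 10)   # placeholder model
    x = torch.randn(64, 128)     # placeholder batch

    with profiler.profile(record_shapes=True) as prof:
        model(x)

    print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))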
For distributed training: first, for TensorFlow we recommend that you use Uber's framework, Horovod. We test every version of this on our machines and make sure that it gives the best performance. Horovod, on a scale of a few nodes — whether it's a few tens of CPU nodes or a few GPU nodes, meaning tens of GPUs —
should give you almost ideal scaling if there are no I/O bottlenecks. That's what we recommend you use with TensorFlow, and you can also use it with Keras.
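(A minimal sketch of the Horovod-with-Keras pattern, assuming Horovod is installed alongside TensorFlow; the model, optimizer, and dataset are placeholders, and the processes would be launched with srun or mpirun.)

    # Minimal sketch: data-parallel training with Horovod + tf.keras.
    import tensorflow as tf
    import horovod.tensorflow.keras as hvd

    hvd.init()  # one process per rank, launched via srun/mpirun

    model = tf.keras.Sequential([tf.keras.layers.Dense(10)])  # placeholder
    opt = tf.keras.optimizers.Adam(0.001 * hvd.size())  # scale LR with ranks
    opt = hvd.DistributedOptimizer(opt)                 # all-reduces gradients

    model.compile(loss="mse", optimizer=opt)

    callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]
    # model.fit(dataset, callbacks=callbacks,
    #           verbose=1 if hvd.rank() == 0 else 0)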
For PyTorch, we recommend that you use DistributedDataParallel, which is part of PyTorch itself. This is a class that helps you do distributed training.
There is one class called DataParallel, which is much easier to use — it's just one line — but it's not efficient, so down the line it would be a bit problematic to scale beyond a couple of GPUs. So we recommend that you use the DistributedDataParallel class. With DistributedDataParallel you actually need to specify the communications backend: on the CPU partition we recommend that you use the MPI backend, and on the GPU partition a GPU-aware backend such as NCCL.
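(A minimal sketch of the DistributedDataParallel setup; the backend choice, environment variables, and model are illustrative assumptions that depend on how the job is launched.)

    # Minimal sketch: PyTorch DistributedDataParallel setup.
    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP

    if torch.cuda.is_available():
        # GPU nodes: NCCL backend; env:// init assumes MASTER_ADDR,
        # MASTER_PORT, RANK, and WORLD_SIZE are exported by the launcher.
        dist.init_process_group(backend="nccl", init_method="env://")
        model = nn.Linear(128, 10).cuda()  # placeholder model
        ddp_model = DDP(model, device_ids=[torch.cuda.current_device()])
    else:
        # CPU nodes: MPI backend picks up rank/size from the MPI launcher
        # (requires a PyTorch build with MPI support).
        dist.init_process_group(backend="mpi")
        ddp_model = DDP(nn.Linear(128, 10))

    # ddp_model trains like a normal model; gradients are all-reduced across ranks.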
Now a few words about workflow tools. First of all, Jupyter: as I mentioned earlier, it's available — you can access it through the NERSC JupyterHub; I'm sure you've seen that in the talks today — and it should work on both the Cori CPU and GPU nodes. For monitoring your training, the most popular training-monitoring framework is TensorBoard; both TensorFlow and PyTorch users use TensorBoard, it's very easy to use, and there's a lot of community around it.
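(A minimal sketch of logging metrics for TensorBoard from PyTorch; the log directory and the logged value are placeholders.)

    # Minimal sketch: write training metrics for TensorBoard from PyTorch.
    from torch.utils.tensorboard import SummaryWriter

    writer = SummaryWriter(log_dir="logs/example-run")  # hypothetical path

    for step in range(100):
        loss = 1.0 / (step + 1)  # stand-in for a real training loss
        writer.add_scalar("train/loss", loss, step)

    writer.close()
    # Then point TensorBoard at the "logs" directory to view the curves.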
There is also tooling for hyperparameter optimization and running many experiments. It has integration with the scheduler — it integrates seamlessly with Slurm — and that will save you a lot of time: you can just run it out of the box, and we have examples that you can access to see how to run it on Cori GPU. It's very popular in the community, a lot of people are using it, and it supports many different backends and different algorithms.
All you need to change is two lines: the number of nodes that you're using, and which PyTorch or TensorFlow version you're using. Then you just run your code and it should work out of the box. And it can use multiple nodes — not only a single GPU node with 8 GPUs, but multiple GPUs across multiple nodes. In this case I'm using two nodes that have 8 GPUs each, so it's running sixteen experiments at any time, as I mentioned earlier.