From YouTube: Day 1 Intro to AI Lecture & CNN Primer & Keras 101
Good morning, evening, afternoon, everybody. I just wanted to say, Stephen, that was awesome. I could have listened and looked at way more projects.

So this is the AI for Science Bootcamp — yeah, thanks for being here. A little about myself: I'm a senior data scientist and an AI scientist for the NVIDIA AI Technology Center at the University of Florida. I'm also a site lead there, so they have a system, not as big as Perlmutter: it's 140 DGX A100 nodes, which comes out to over a thousand A100 GPUs.

We have huge participation. One of the main goals is being able to talk fundamental deep learning well enough to read papers, tutorials, and blogs and understand what's going on. That would be huge for me, if you leave here today and tomorrow like that. Second would be being able to use these notebooks that we're going to go through, pick your problem set, and drop it right in where the problem set is in the notebook, with your data. All right.

And real quick, like I mentioned, I've got about an hour to talk, then we're going to take a 15-minute break, and then we're going to do two labs. They're going to be in Keras, and we're going to be classifying MNIST; we'll be doing a Keras 101 and a CNN lab. So it's a pretty short day: only an hour of me talking, and then you all get to go into it.

So I'm going to try to make this as exciting as an hour of high-level AI can be, so bear with me, but I think we're on the right track. So: intro to AI, and you can see that in parentheses I have DL, ML, DS — deep learning, machine learning, data science, AI — all synonymous today in industry, for some reason, as the buzzword for AI. You can look on LinkedIn.

So where we're going is this whole idea of a new way to code. We're going to look at traditional programming, back before there was machine learning / deep learning / AI (not really AI), and then where we're at today. So, traditional programming: you had a hardcore coder, a hardcore programmer, and they had a task they needed to accomplish, and they had expert knowledge in that task.

You know: if/or statements, loops, boolean things, et cetera — a ton of things that we just go through. You know, I have two kids, three and one, and it's the same thing: I'm trying to teach my three-year-old that if the stove feels warm, it's probably hot. She doesn't just look at the stove and know it's hot; you've got to try to teach that. So that's traditional programming.

And then we get to today, software 2.0. We've got this awesome optimizer — this is a robot called Adam, which is actually one of the most used optimizers we have in deep learning — and we're going to feed it a ton of examples, and we're going to have machine learning understand, from the examples and the optimizer, finding this space in this manifold world that our data lives in: a function that explains everything we want. So you can see: the task and expert knowledge from before — that expert knowledge is now replaced with just a ton of data, a ton of examples, and hopefully they're labeled, for our sake. For now let's just say they're labeled, to make this a little simpler of an analogy to pick up.

Now let's look at what this kind of looks like. So if we have this task — we want the probability of it raining, that's our task — and we know we can input temperature, pressure, and moisture levels from some sensor we collect, somebody could go in and code function one that says: if temperature is... let's say we're in Florida, where it's 100 degrees all the time and it can still rain, so that's kind of a moot point, but you get the idea. If temperature is 100, then pressure is whatever and moisture is high, then go to the next function, and then again and again and again. What that could look like is something like this, some hand-written function. Now this, you know, is a little more difficult; this is a lot more than just functions one and two, which I made at such a high level. So we have to put in temperature, pressure, and moisture, and then you update the mass, update the momentum, update the energy, do macrophysics, do microphysics, and you finally get some prediction on participation — precipitation. That's converting expert knowledge into these functions, and each one of these would be its own function. That's very intense and very labor-intensive to write. And now we get into this learned-function idea, this machine learning function, where we go straight from data to a prediction. So you can see how these two compare: one very labor-intensive and very complex, and the other a learned function based just on data that we have that's labeled. It's amazing; it really shifted everything we did and everything we do now in this whole domain.
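For reference, here is a minimal sketch of the two approaches being contrasted: a hand-written rule with made-up thresholds on one side, and the same task learned from a handful of toy labeled examples on the other. All thresholds, numbers, and labels below are hypothetical; they only show the shape of the two approaches.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# "Traditional programming": expert knowledge encoded as hand-written rules.
def rain_probability_rules(temp_f, pressure_hpa, moisture):
    if moisture > 0.8 and pressure_hpa < 1005:   # made-up thresholds
        return 0.9
    if moisture > 0.6:
        return 0.5
    return 0.1

# "Software 2.0": the same mapping learned from labeled examples instead.
X = np.array([[95, 1002, 0.85],    # temperature, pressure, moisture (toy data)
              [80, 1015, 0.30],
              [100, 1000, 0.90],
              [70, 1020, 0.20]])
y = np.array([1, 0, 1, 0])         # 1 = it rained, 0 = it did not
model = LogisticRegression().fit(X, y)
print(model.predict_proba([[90, 1004, 0.75]])[:, 1])  # learned P(rain)
```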
So, today: learn to use this new approach and revolutionize science. I do not know why that's in there — we're not going to actually touch any real-world science today; that's going to be tomorrow. Today we're just going to do two easy primer Jupyter notebooks on the Curiosity cluster.

The machine Deep Blue — I think it was called that — executed and beat tons of people at chess, and then from the 80s and early 2000s, well, I guess up to 2010, machine learning took over. It's a subset of AI, obviously, but instead of having expert systems, we had machine learning algorithms that learned from data. But the data had to be transformed into some handcrafted features — handcrafted features, so feature engineering was a huge component over that span. That's decades, right — anyway, from the 80s to 2010, feature engineering. There are probably tons of people with PhDs right now — well, not tons, but a large number with PhDs — who did machine learning where the feature engineering could have been the dissertation: a whole PhD in doing feature engineering on time series data, on this specific time series dataset that's collected from a sensor to do this XYZ thing.

But now we can make it super fast — well, super fast, faster — and we'll just learn everything from data. We can learn our output and our features from data, and that's where we are. So that's why, when I use that quote, AI, ML, DL, data science can kind of encapsulate all of this, because you're doing science on data, data analysis.

When should we use traditional machine learning? Before we get into that, let's look at this difference again — I hope I hit on it a little bit: this feature extraction. That's the main difference. You have input data, you have a classifier, but feature extraction, you know, used to be a human in the loop, or some feature extraction technique that a human made specially for that task. Something like, I don't know, 72 SIFT feature locations, and if you showed the image at a different angle it could still pick up those SIFT locations and get those features, so that whole idea of translation invariance was there. And then those features, after going through a bunch of steps to get them, would be put through a classifier. One of the best ones at the time was an SVM; SVMs were in the lead on all the benchmark image classification data.

RAPIDS — XGBoost with RAPIDS — is in, like, every Kaggle competition you can think of, and Kaggle is an open data science competition website, you know.

So if we have a small set of features, maybe only 10 or 100 pieces of data too, you might want to look at traditional machine learning. It's notorious: deep learning does need a lot of data, and that's a huge research area that I'm very passionate about. My dissertation was trying to do few-shot generation using deep learning on time series data, because there are a lot of instances — you know, if we want to push AI and deep learning applications to make the world better — a lot of instances that don't have a lot of data collected. So if we only have a few, because it's expensive, or maybe that occurrence doesn't happen more than once every 10 years, things like that — there needs to be a way to generate more data, so we can use these tried-and-true deep learning algorithms, or come up with better few-shot and zero-shot classifiers, detectors, et cetera.

We have something called CuPy that you can run on a GPU and get a 10-100x speedup — no joke, no joke. Pandas is another popular one; we have something called cuDF. cuDF and pandas are so close to one-to-one. To get a 10x-100x speedup, you can literally take the environment you're running your code in, with NVIDIA RAPIDS in it, and instead of importing pandas as pd, just import cudf as pd, and everything should be one-to-one. All right, there might be one or two functions that aren't, right now — very rare — but then all your data frames will be on the GPU and every little thing you do will get a 10x-100x speedup. It's amazing!
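The drop-in swap being described looks roughly like this — a sketch assuming a RAPIDS environment with cuDF installed and an NVIDIA GPU available; the CSV file and column names are hypothetical.

```python
# import pandas as pd       # the usual CPU version
import cudf as pd           # GPU version -- the DataFrame API is close to one-to-one

df = pd.read_csv("sensor_readings.csv")                  # hypothetical dataset
hourly_mean = df.groupby("hour")["temperature"].mean()   # runs on the GPU
print(hourly_mean.head())
```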
That's a side note, though; we're not really focused on that today, but RAPIDS is something I'm very interested in too. I teach a course on RAPIDS and use RAPIDS a lot with my researchers at UF and other researchers. It's just really strong: it cuts down the amount of time you're waiting to load in a dataset. So I will tell this story. Back in my lab we had a gigantic dataset of infrasonic data — time series data from infrasound — and it was typical that you would leave the lab, load in the data, come back the next day, and your data would be loaded and you could start working. That's how long it took to load this humongous dataset. When I started at NVIDIA, I told my advisor about RAPIDS. I said, hey, just try this out, you've got nothing to lose. The PhD student who was in the lab at the time did the same method we always do — load in the data, get ready to go home for the evening — and it was done before he got to his vehicle.

We have three inputs here: x1, x2, x3. We've got weights for each one of those that connect to an output y, and that's what we're trying to find. We let the data, and some activation functions inside each neuron, figure out what this function is to get the best y, and you can think of it as a generalization of curve fitting too.
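A single neuron of that picture, written out as a minimal NumPy sketch — the inputs, weights, and bias here are made up; in practice the optimizer learns the weights and bias from data.

```python
import numpy as np

def neuron(x, w, b):
    # weighted sum of the inputs plus a bias, squashed by a sigmoid activation
    z = np.dot(w, x) + b
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])    # x1, x2, x3 (made-up inputs)
w = np.array([0.1, 0.4, -0.2])    # the weights training would learn
b = 0.05
print(neuron(x, w, b))            # the output y
```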
The big difference between these two is this: here we just have, in this case, 2D data, some floating point numbers, and we're trying to find the line of best fit. You can think of this from back in the day when you did your chem labs and you were in Excel and had to do that for the first chemistry lab. That's the whole idea; that's all we're doing: find a function. And here we just feed in data, we know what our outputs are going to be, and we do this optimization throughout the whole thing. So that's a very broad way of fitting what deep learning is doing into an analogy of curve fitting.

But let's have a little fun and look at some examples on images, real-world data. Here's one called lunar crater identification via deep learning. You can see you have some digital elevation map — I call it a map, let's just call it an image, I don't want to upset anyone that actually knows what this is — and you have your ground truth, and then you have your predictions, and you can see this is actually pretty straightforward and pretty good results. This is from a team, I think from the University of Toronto, and the point is to automatically detect craters on the moon. That's the whole point, and their model was able to recover 92 percent of the craters in their test set, and that's amazing. The blue circles are the ones it got right and the purple circles are the ones it got wrong in this middle image, and that's remarkable, considering there's a lot more blue than there is purple. Now, on this one to the right, I'm not sure about the predictions; there must be a way they're scaling out the predictions, so if they're too small, maybe they're just not plotting them. I'm not sure, I'd have to look more into that paper, but they were able to identify... oh, that's what it is, that makes more sense.

So: U-Net is very popular, especially in medical segmentation — anything, really; segmentation is pretty neat.

It's pretty remarkable. Sunspot prediction — not on people, but on the sun. They get the dataset down to something they can use with a deep learning algorithm, they train the convolutional network on small crops only and predict on full-resolution images, and the results are pretty good again. It enabled them to label 1.5 million images, where it would have taken probably 1.5 million days to do it with their slow handcrafted algorithm. So, just remarkable.

All right, deep breath, everyone. We have about 34 minutes left and I have a chunk of slides to go through. So now we'll get into a little bit of training — what does it take to train these algorithms, what are we doing when we train them — and then we'll get into the implementation basics, what you need to do deep learning. Basically, spoiler: there are GPUs involved.

So in this training-versus-inference framing, we can think of training as this awesome Lego that looks like a Transformer, no pun intended, and you're going to try to build this Transformer out. So this is training: think of it as your deep learning algorithm, your machine learning algorithm — you go out to your data and you're trying to piece together this training, with all this data and this algorithm, to get the optimization right, the best, most optimal use of your model. And then we have our inference phase. This is not a Transformer — I don't even know what this is, it's like an old cartoon robot, I think. Nonetheless, we've got this bot, and now it's time to apply the learned model. So that's the difference: we train, and then we use that model that's been trained — it has converged, hopefully — and we can deploy it in inference mode, or test mode, or in the wild, anything like that, where it's actually looking at real-world use cases, data that it's never seen, and it's applying everything it learned from training to that data. And then you can continuously do this with online learning.

There could be a maximum we're trying to find, but for this case let's just say we're trying to find the minimum loss. So if this is error — reconstruction error, let's say, there it is, reconstruction error — we want that to be as small as possible, because we want our reconstruction to look just like the input. And then we have an optimizer, which is basically the strategy to search this manifold space, this optimization space, to find the optimal parameters, to get our model to have the best weights, to have the minimum loss. So it all kind of works together. There are many choices to be made, though. Here is one of those techniques that we use — it is the technique we use, not just one of them — to find a solution for this: gradient descent. Here in this 3D space, we start at some random point with random weights; that could be our first pass through the data. We compute the gradient of that loss function and we send it back — we'll talk a little more about this — and then we take a step in the descending direction, gradient descent, and we move a little bit, and we try to get to some optimum.
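A minimal sketch of that gradient-descent loop on a toy one-parameter loss — the loss function here is invented purely to show the update rule.

```python
def loss(w):
    return (w - 3.0) ** 2          # toy loss, minimized at w = 3

def grad(w):
    return 2.0 * (w - 3.0)         # its gradient

w = -5.0                           # start at some random point, as on the slide
lr = 0.1                           # learning rate: the size of each step
for step in range(50):
    w -= lr * grad(w)              # step in the descending direction
print(w, loss(w))                  # w approaches 3, loss approaches 0
```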
In this case, we stop when the error is small. Now, here it looks like there are two saddle points that could have similar errors; hopefully our manifold space, our optimization space, is well-behaved and doesn't have a ton of saddle points, but there's nothing you can be too sure about. A lot of these optimizers help with that, and here is one: Adam. Adam stands for adaptive momentum, and it works well for many image problems for sure; basically it's a way to jump over local minima to get to a global minimum.

And I mentioned backpropagation: compute the gradient efficiently, assigning weights. So we do a forward pass of all our data through the network — all our data gets pushed through the network, and all these weights remain constant. Remember the weights: think of these as x1, x2, x3, x4 and so forth. They go forward to the next layer, we have weights assigned to each of these connections, everything's fully connected, so there's a ton of weights, as you can see with all these lines, and we get to some output prediction, and we have a loss function. So we compute the gradient of that loss function and propagate it backwards to update those weights — all those weights — and you can see that with the blue line and the red line too. That's the goal: we're trying to update the weights, these parameters, with the gradient from the succeeding layer, the layer further along, so it back-propagates all the way through. And then each weight can be nudged a little bit, some small amount in some direction, to obtain this great function that gives us that optimum. That's why we train on a lot of data, to make sure that backpropagation is a little more sophisticated, and we train for many epochs — an epoch being one pass through the entire dataset — and those epochs give us a better understanding of what's going on in the data to update this function. All right.
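In Keras, that whole loop — Adam, the loss, backpropagation, epochs — is set up in a few lines. A hedged sketch; the layer sizes and the training arrays are placeholders, not the lab's exact model.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(3,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam",               # the Adam optimizer discussed above
              loss="binary_crossentropy",
              metrics=["accuracy"])

# x_train / y_train are assumed to be labeled arrays you already have;
# each epoch is one full pass through them.
# history = model.fit(x_train, y_train, epochs=10, batch_size=32)
```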
This is a pet peeve of mine, so here's the soapbox: I don't know why we talk about PyTorch autograd when this whole thing is in TensorFlow and Keras, but they do the same thing — TensorFlow/Keras has automatic differentiation too. Basically, we don't have to compute the gradient; we let the framework compute it for us and take care of the backpropagation for us. So for this entire slide, all you've got to take away is that these deep learning frameworks make this effortless for us. You can use PyTorch's autograd, or TensorFlow's automatic differentiation in Keras, which is part of TensorFlow, and it will go ahead and compute the gradient for you and do all of that; you just have to give it the function.
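For example, TensorFlow's automatic differentiation via tf.GradientTape: you define the function, and the framework returns the gradient.

```python
import tensorflow as tf

w = tf.Variable(2.0)
x = tf.constant(3.0)

with tf.GradientTape() as tape:
    loss = (w * x - 1.0) ** 2        # any differentiable function of w

dloss_dw = tape.gradient(loss, w)    # gradient computed for you
print(dloss_dw.numpy())              # 2 * (w*x - 1) * x = 30.0
```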
Here's some random data, and we assign some validation data and training data, and you're like, what is that, what's the difference? We have predictions too — that's our black line — so we actually do a great job fitting this line. But are we overfitting? That's a word we use a lot, overfitting, and you're like, what does that even mean?

So let's talk a little more about our data in that sense. We have a whole pile of data, and what we're going to do with it is break it up into training data for training the model, validation data — there's that validation word — for hyperparameter tuning, and test data for the final evaluation. So in a training loop, you're going to train the model, and each epoch it's going to see some data it's never seen before, and it's going to validate on that.
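A minimal sketch of that split; the proportions are arbitrary here, and x/y are assumed to be your full arrays of examples and labels.

```python
from sklearn.model_selection import train_test_split

# Hold out a test set for the final evaluation, then carve a validation set
# out of what remains for hyperparameter tuning.
x_trainval, x_test, y_trainval, y_test = train_test_split(x, y, test_size=0.15)
x_train, x_val, y_train, y_val = train_test_split(x_trainval, y_trainval, test_size=0.15)

# Keras can also take the validation set directly in the training loop:
# model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=20)
```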
And you need a GPU. Do you really need a GPU? Yes — I work for NVIDIA, I have to say that. You can do CPU, but it just takes forever, so GPUs really make life simpler for us. Then there are different deep learning frameworks, Python-based: we have PyTorch and Keras/TensorFlow. Before, when I would touch on this, I said I knew no one that used MXNet or Julia, and Stephen, with his awesome plots, showed that he has five people using these combined.

I used TensorFlow when it first came out, when we had to use Bazel to get it into C/C++, and that was miserable. Now it's all in PyTorch, which is really great. And then, for our case, we use Jupyter notebooks. I'm a big IDE guy, because I like debugging and seeing variables and everything on the fly, but Jupyter notebooks are really popular because you have something you can present. You know, if you're going to make this open source — everyone loves reading blogs, especially blogs with code — why not have a blog with code that you can execute, like a Jupyter notebook or a Google Colab?

And then there's the NVIDIA GPU Cloud registry: we have a ton of stuff on the NVIDIA GPU Cloud. We have containers that are optimized for NVIDIA GPUs, and we have a ton of SDKs like Clara — well, MONAI; MONAI is our medical imaging SDK. Our TA here today is a MONAI expert, so I'm just giving her a shout-out; she saves my life a lot at UF.

Deep learning and GPUs — everyone's doing great, we've got 20 minutes left. All right, so this is actually pretty cool code. I love verifying my GPUs; it's very important when I'm doing multi-GPU things, which we don't touch on here, so I do apologize for that, but we do a ton of it that you can access.

Funny story: I had just started my job prior to NVIDIA, where I ran an AI prototyping effort for the DoD. They got me this sweet workstation with two GPUs, and I was like, yeah, this is great. I thought, because they were NVLinked and everything — these two GPUs, which it shows right here — that I was doing multi-GPU work, and I kept getting OOM errors, out of memory. I was like, what is going on here? One of the GPUs wasn't even PCIe'd into the board. So if I had just run one piece of code to see what devices I had available, it probably would have saved about four weeks of misery at that job. Nonetheless, this is important stuff: one line for TensorFlow, one line for Keras, just to find out whether you have GPUs and which GPUs you have; PyTorch is a couple of lines, just to get more printouts. Doing this is very important, and they do it in the lab.
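The checks being referred to look roughly like this:

```python
import tensorflow as tf
print(tf.config.list_physical_devices("GPU"))   # TensorFlow / Keras: one line

import torch
if torch.cuda.is_available():                   # PyTorch: a couple of lines
    print(torch.cuda.device_count())
    print(torch.cuda.get_device_name(0))        # name of the first visible GPU
```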
So you don't have to worry about copying this down. And then, you know, I mentioned this before: GPU usage in deep learning frameworks is simple; they just let us do it, it makes life easy. These deep learning frameworks — we won't talk about Julia — but Keras automatically uses your GPU in TensorFlow 2. PyTorch is actually a little more robust, because you can do a bunch of things on the CPU too, for debugging, and it will let you know if you have a tensor that's on the CPU and you forgot to put it on the GPU. And then the coolest thing NVIDIA ever did for a terminal is nvidia-smi, the System Management Interface. It lets you look at your GPUs. You can look at it in real time by typing `watch nvidia-smi`, and you can see everything there is: your GPU fan usage, the temperature they're running at. When we were running a super large language model on this new SuperPOD at the University of Florida, we actually had our temperatures get up to 80 degrees Celsius — I think higher, into the 80s — and it flagged; we were flagged by that. We were actually monitoring that and flagging it, because that means really slow training, something's not cooling correctly, and we had to troubleshoot it quickly or else things just don't perform up to par.

Each one of the neurons in the layers and the output — not in the input, obviously — has an activation function. Sigmoid and tanh were used first, but we had a lot of trouble with those because they cause errors in training: sometimes things would just blow up, things would vanish, things didn't do well. So the tried-and-true now is ReLU, the rectified linear unit, which clamps negative values to zero — that's max(0, x) — and it gives you some non-linearity in your network, and it has learned a bunch of really cool stuff. It's used a lot in CNNs, tons in CNNs, and pretty much anything now, really. There's still some tanh or sigmoid usage, but leaky ReLU we use a lot in GANs, generative adversarial networks, because sometimes we need some negative output from that activation function. Now you could be thinking, wow, what does that even mean, what does that look like? Do not fear, because I have this to share.
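The two activations in plain NumPy, to make the shapes of the curves concrete; the leak slope below is a typical choice, not a fixed rule.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)              # max(0, x): negatives clamped to zero

def leaky_relu(x, alpha=0.2):
    return np.where(x > 0, x, alpha * x)   # small negative slope instead of a hard zero

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))          # [0.   0.   0.   1.5]
print(leaky_relu(x))    # [-0.4 -0.1  0.   1.5]
```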
So this is a lot of fun. If we have our activation function as sigmoid and we're going to try to learn this distribution over here — that's the whole point, we're going to try to find the decision boundaries of these two classes — so if you hit play and actually follow it over here... there it goes, hopefully everyone can see that. Let's see, maybe I'll make it a little bigger. It's still learning — you can see the training data, it is learning — and we'll just stop it here at a thousand-ish. You see, we didn't do too hot. But let's just change it to ReLU, same exact network: inputs x1 and x2, two hidden layers with two neurons each, each one with ReLU activation. What you want to see, though, is this training loss and test loss decrease — let's just go to town, crazy. Anyway, a fun little thing to play with. Let's get back to the slides; cool, I'll move that out of the way so we're not looking at it.

Okay, so this is a deeper neural network: more layers, more levels of abstraction. It's from a super old paper — this is actually a deep belief network — and it's learning objects, pieces of objects; not really whole objects, but like a nose, an eye, here's an ear, things like that. Then the closer you get to the output, the higher-level features come out, so we're actually learning objects, which here is whole faces. There's a popular paper out — I think it was Google — where a unit learned the cat, and they were like, oh my gosh, this is the craziest thing ever, it learned the whole cat: when a cat is fed into the network, that's the unit that fires when those lower levels fire, and boom, you know it's a cat. The difference being, that is a deep belief network; CNNs and MLPs work similarly: if you have a bunch of layers, the layers closest to the input will have low-level features and the layers closest to the output will have higher-level features. All right, eight minutes to go, we're doing great, we are all stars. I hardly see anything in chat — awesome, we'll keep going.

So, what are CNNs used for? CNN stands for convolutional neural network. Problems with translational invariance are why they came about, but they're used in 1D, in audio and time series — invariance in time — and they're used in 2D, of course: computer vision, 2D spaces.

And there are multiple different computer vision tasks that we might want to do. There's classification — we'll look at the top row; I like talking about dogs and cats. The bottom row, though, is more science, weather-related, so I'll try my best. Here's a picture of a cat; classification asks, what is this a picture of? This looks like a cyclone, a tropical cyclone — what is this a picture of? That's classification: show it an image, tell me what it is, what class does that image belong to?

Object detection: you tell me what those objects are. Basically it's classification plus localization on multiple objects, so we have two cats, a duck, and this adorable puppy, and this is where they're at. Same down here: this looks like it has spotted six cyclones, tropical cyclones, and I'm just going to go out on a whim and say this is an atmospheric river, just because it seems like a buzzword right now. And then instance segmentation is exactly that, object segmentation — it gives us the damage, so we need to figure that out; and the atmospheric river, here's that.

So there are some that are pretty simple, like, oh, you see some airplanes, there's a good chance this is a runway; baseball diamond, beach, buildings. It's the ones like mobile home park versus medium residential versus dense residential that are tough to decipher. So it's pretty powerful what these deep learning algorithms can do on images, given the right data and labels.

Now, why not just train a regular fully connected network on images? Well, I mentioned before, everything's connected, fully connected — each neuron is connected to the next, and so forth. So if we have a megapixel image — one megapixel is one million pixels — our input's already one million. That's one million weights we automatically have to learn if our next layer had just one neuron. Now, if I had two, this goes up; three, four, five, you get the idea. They just don't scale well at all, so CNNs were created, which kind of help with that issue, and we'll talk a little more on that too. There's also objects in nature — the translation invariance: objects in nature look the same from place to place.
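A quick way to see that scaling problem in Keras, using a hypothetical 1000x1000 single-channel image; the counts come straight from count_params.

```python
import tensorflow as tf

dense = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(1000, 1000, 1)),
    tf.keras.layers.Dense(1),          # one neuron fully connected to every pixel
])
conv = tf.keras.Sequential([
    tf.keras.layers.Conv2D(1, kernel_size=3, input_shape=(1000, 1000, 1)),
])
print(dense.count_params())   # 1,000,001 weights for a single neuron
print(conv.count_params())    # 10: a 3x3 kernel plus a bias, reused everywhere
```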
That never really took off; there are some papers on it, but the implementation was rough and, application-wise, it really wasn't there, so CNNs are still the tried and true. So what is a convolution? It's just a small matrix transformation applied at each point on the image, typically through some convolutional kernel. In this case it's a three-by-three edge detector kernel — a feature detector kernel, sorry, not edge detector — and you just put it over the exact location on the image, you update that middle pixel, see, and you just slide it along. You might be thinking, wow, that is very... I know we're getting close to time; that's okay.

This is really cool: it talks about what a convolution is, what's going on, what a neuron is, what a tensor is, what a layer is, how you update kernel weights, what each layer of the network does, and then, down here, it'll actually look at each network. You can see that kernel — see that little kernel, here it is — sliding around this image and updating as it goes, then it updates.

So this is when you have a five-by-five input but your kernel is a two-by-two.
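The arithmetic behind those sliding-kernel animations is just the usual output-size formula; a small sketch:

```python
def conv_output_size(n_in, kernel, padding=0, stride=1):
    # out = (in - kernel + 2*padding) // stride + 1
    return (n_in - kernel + 2 * padding) // stride + 1

print(conv_output_size(5, 2))             # 5x5 input, 2x2 kernel -> 4x4 output
print(conv_output_size(7, 3, padding=1))  # 7x7 input, 3x3 kernel, pad 1 -> 7x7
```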
And if you have a seven-by-seven with a one-pixel pad, when you run a three-by-three you can see how it's updating the seven-by-seven with the padding. I'm going to change that — anyway, check that out, play with it; some of the notebooks go over this really well too, and at the end of this there's a video tutorial on how you can go in and play even more, just to try to understand what CNNs are doing at a little higher level. All right, so with that we are at time.

So, you know, back in the day, when people had to feature-engineer, like we talked about, somebody spent forever figuring out: hey, how can I get edges from a convolutional filter, and what would that look like? So Sobel came in and he was like, oh, if we have this filter we'll get horizontal edges, and if we have this filter we'll get vertical edges — or vice versa — and boom, if you apply those, that's what you get. It's pretty neat. But now CNNs actually go in and learn the optimal filters.
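The Sobel kernels being described, applied with an off-the-shelf 2D convolution; the image here is just random placeholder data.

```python
import numpy as np
from scipy.signal import convolve2d

sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])       # responds to vertical edges
sobel_y = sobel_x.T                    # transposed: horizontal edges

image = np.random.rand(28, 28)         # stand-in for a real grayscale image
edges_v = convolve2d(image, sobel_x, mode="same")
edges_h = convolve2d(image, sobel_y, mode="same")
```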
Actually, I thought it was... and then AlexNet is what spurred the deep learning revolution: using a GPU, they accelerated the convolutional portion of the CNN, how it updates and backprops, and everything blew up since then. That went to VGG and Inception in 2014, ResNet in 2015, Xception, ResNet-50, and DenseNet in 2019, and these were all state of the art in ImageNet classification, every benchmark across domains.

'88 — I knew it was '88. LeNet first started it, looking at MNIST for the USPS. This was super slow, because there were no GPUs at the time utilizing acceleration, so SVMs, support vector machines, with some feature engineering actually outperformed it, and it kind of died right after that publication. Then Alex came in with Hinton and won ImageNet with a CNN that trained at an accelerated pace compared to LeNet, and people started paying attention.

Then you get layers of different sizes, and VGG is used a ton for feature extraction right now. And you're like, well, I thought we didn't have to do that. It's true, but you can actually use these networks' features and then use those features in different downstream tasks as well, like video.
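A hedged sketch of that kind of feature extraction with a pretrained VGG16 in Keras; the input size and the 10-class head are placeholders for whatever your downstream task is.

```python
import tensorflow as tf

base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False                 # freeze the pretrained feature extractor

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),   # your own classes
])
```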
So it's crazy. Here's a plain CNN, and you can see that ResNet is just the plain CNN with some skip layers. DenseNet came in and said, hey, what if we just made everything connected, so every CNN layer can know what's going on from previous CNN layers — so it has a more universal understanding of everything being learned and trained from the data — and they did. It's huge, it's pretty dense, and it takes a hot minute to train even on GPUs, but it works. And then we get to vision transformers. This is not from the original paper, but I like the mushroom one that's shown here. The idea is just taken from NLP, where you have sentences: you do some type of patching or tokenizing, you flatten it, you get your position embedding, and then you use this transformer encoder, which is this over here to the right. Very powerful, and like I said, it's just taking over.

So that's it, that's all I had for right now. Let me put my camera back on — hopefully my internet's better, because, you know, what happens when your wife comes home and gets on her phone... It's 10:50 your time, 1:50 my time, so we'll take a break for 10 minutes, come back at 11, and we will get on Curiosity and start the labs. I'll actually talk a little bit about the lab, and we'll look at some challenges here — I'm still sharing my screen, so that's good — just the challenges, to spur some conversation, and then we'll talk about the lab. Okay, but right now go ahead and take your bio break. Thanks so much for listening to me talk for an hour and five minutes — I do apologize for that — and I'll see you back here in 10 minutes.

So, while people are funneling in — we had 100-some, now we're down to 90, that's okay — I'll talk about some challenges. And Stephen, see what I'm telling you: that presentation you gave is awesome. But some of the challenges they have: it's labeling large quantities of data.

Transfer learning — that's gigantic, especially for things like image classification, where you take one of those huge pre-trained networks I showed you, trained on ImageNet, and just use it for whatever downstream task you have. You know, we do a DLI course on fundamentals of deep learning, but this is neat: they're transferring everything they learned in an Omniverse environment, the simulated environment of this robot arm, and then they just take it and fine-tune it on the real dataset in real life. And then PINNs, enforcing physical constraints — that's just awesome. FourCastNet has a lot to do with that, with the Fourier neural operator and things like that, keeping things physical.

MNIST classification — this has been tried and true, thousands of papers on it. I think the benchmark now is something like 99.97 percent accuracy, so it's almost perfect; it's not that difficult. And this is a 2D CNN that you could use to train and test and get a huge accuracy on it. So here's your data you're going to load, with a category called x_val and y_val, which is the part of the training data that you use for evaluation. But what we're going to look at is slightly more interesting: it's called Fashion-MNIST. It's 10 different classes of little thumbnails of different types of clothing and bags — t-shirt, trousers, pullover, dress, and so forth — and this is something they try to preach a lot in these boot camps; someone came up the other time I taught this with a discrepancy in it, but we'll go with this six-step approach.
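Loading the dataset is the first of those steps; Fashion-MNIST ships with Keras:

```python
import tensorflow as tf

# 60,000 28x28 grayscale training thumbnails in 10 clothing classes,
# plus a 10,000-image test set.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
print(x_train.shape, y_train.shape)    # (60000, 28, 28) (60000,)
```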
Let me share my screen... here we go, let me get these things — these things, I don't know — all right, so here we are. Hopefully everyone got here. So today we're only going to do "intro to DL", so you'll click on that directory — hope everyone can see that. When you launch the lab, you'll see "intro to DL", "tropical cyclone intensity", and "start here". So you can start with "start here" and check out the tropical cyclone intensity estimation tomorrow. And you can see I do have a GPU: if you go to this plus, you can actually run that nvidia-smi I was talking about in Jupyter by putting an exclamation mark before nvidia-smi, and you can see we are indeed on A100s — 80-gig A100s — so they're powerful, the most powerful GPU you can have right now that's in production. So you're going to have a lot of fun; you have a very powerful machine on your hands.

I don't know why it's called that, but that's where we start: the CNN primer. Please take your time and read this; it is so well put together, it's unbelievable. It goes through that six-step approach that I kind of glossed over, and then, each step of the way, they talk about the data, they talk about pre-processing the data, why you're doing this, why you're doing that — you understand it from the code, and then they have things written about why you're doing it. Here's a great one: the pixel values of a grayscale image range from 0 to 255, so they want to normalize them to between 0 and 1 for your train and test data. And then, every lab, it seems, I get this question — why are we doing this? — and it says it right here: the normalization of the pixels helps by keeping the process by which the gradients are computed well behaved.
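That normalization step is one line per split, continuing from the Fashion-MNIST arrays loaded above:

```python
x_train = x_train.astype("float32") / 255.0   # scale 0-255 pixel values to 0-1
x_test  = x_test.astype("float32") / 255.0
```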
So take your time; really gather what you want out of this lab. Now, you could in theory hit "run all" and get it done, but you don't really learn anything from that — we're here to learn. Some of you here might already know everything in this primer course; if that's the case, I'm sorry, you can run through it, maybe do a quick refresher, and hang out with us for another hour. So you'll do part two first: this is an MLP. You'll go through data pre-processing and defining your model, and you're just going to make a dense network, an MLP, and then after you're done with that you'll go to CNNs.
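A minimal sketch of such a dense MLP for the 28x28 thumbnails; the lab's exact layer sizes may differ.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),            # lose the 2D structure
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),           # 10 clothing classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(x_train, y_train, validation_split=0.1, epochs=5)
```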
Okay, and please read through that one too. It's really good; they go through the convolution and things of that sort. It's extremely well written and you'll learn a ton from the notebooks. So take your time, ask questions in your group, ask questions to the TA in the group, anything like that.

There is one thing I want to note: the end of each notebook has something highlighted — shut down the kernel. In the cloud, when we're doing this on A100s with TensorFlow, when the notebook is up it automatically allocates that GPU. So when we get into tomorrow's stuff in our labs, we'll get a lot of "hey, I got OOM errors, I got all these errors" — it's typically because no one went down and shut the kernels down. To do that, we're on our directory tab right now, this little folder; you'll go right below it, run everything through it, and then, before you move on to CNNs, just come in and shut down the kernel for part two. Okay. And then, if you want to keep this, go to File and you can download it as a Jupyter notebook.

E: Awesome. Yeah, I'm going to hit "open up breakout rooms". We have seven different groups, so seven different rooms; each room has a TA. The TA will introduce themselves briefly, and then you guys can go ahead and jump in. Make sure you're asking questions if you've got them — you can always post in Slack, doesn't matter. You're automatically going to be moved here in a couple of seconds.

E: Oh cool, so let's wrap up with Q&A. I hope today was a good start to our learning; we've got a lot of hands-on stuff tomorrow. So with that we can go ahead — did you want to go through any of the results from the notebooks or anything?

A: Thinking about it — a question, yeah, we could. So I think there was — and I brought it up to you — an error coming up where the kernel wasn't setting itself, so maybe some people hit that; hopefully everyone nailed it. I think we saw with the dense... well, wait a second, I'll just go ahead and pull up the screen. Do any of the TAs want to talk about anything while I'm doing this? Sorry, I didn't think to ask.

C: Caleb, in my session there was a question that we just thought of, out of the breakout room, just in the last minute as I was trying to answer. There was a question about the convolution layers: there's a parameter called filters, which is either 64 or 32, and the question was, is that something that is calculated? Basically, how are we determining the number that we need to use in this filters parameter for the Conv2D layer?

A: Great, thanks for relaying that. I saw Peter shake his head too, and I think I saw someone in another room as well. So my claim on this, all the time, is: there's no exact number to use. We usually base a lot of this stuff off of papers we read, models that have been successful, so in that fashion we look at a paper and go, oh, they did really well with this architecture. And what you'll see with that is that typically they're powers of two, and that's just because GPUs work faster in powers of two — that's how our threads and blocks and warps and all of that are calculated and formulated. So if you have something that's a power of two, it flows a lot smoother. But there's no calculation for how many filters to use, just like we don't know the perfect kernel size to use either.
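For reference, the parameter in question; the 64 below is a conventional power-of-two choice, not a computed value.

```python
import tensorflow as tf

conv = tf.keras.layers.Conv2D(filters=64,          # how many distinct kernels to learn
                              kernel_size=(3, 3),
                              activation="relu",
                              input_shape=(28, 28, 1))
```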
G: Yeah, that's basically what I said: you start off with models that people have used for their application and then you fine-tune it for your own application, and usually 64, 128, 256 are the numbers.

G: Yeah.

I: At least theoretically, you can base it on the local receptive field that each kernel size would have on your image — that would capture the size of the features that you really want to be captured — and the number of layers would determine whether your total receptive field covers your entire image or not. At least theoretically; practically, you need more experimentation with it.

A: Cool. Any other questions? Sorry, I pulled up the lab; I don't know if anybody wants the benefit of walking through it or not. I think we saw with the first part — which is part two, funnily enough — that adding dense layers didn't help the performance much. It might have given a little bit of a bump, but we really didn't see any of those bumps until we moved to the CNN in the second lab, and that's just a testament to how powerful CNNs are and how they pick up on different features that we're obviously not going to learn from a dense network on flattened input, because we lose that spatial coherency when we flatten an image.

H: So I have 64 layers of these matrices. How do you calculate these matrices — do you apply the filter layer by layer, or do you have other ways?

H: No, no, I'm not talking about the numbers. So let's say we ask it to be 64 layers — do you just apply the filter 64 times?

G: Actually, I didn't get the full question. So, yeah, you have 64 filters. Is the question, how do I compute these filters? I mean, these are all learnable parameters, and the optimization algorithm essentially learns the optimal weights and biases for each filter; to compute the activation in each layer, it's just a matrix-matrix multiply. That's it.

G: How is it done under the hood — or maybe you could elaborate a little bit more?

H: Oh, I think I understand what you're saying. So you're saying you just do that multiplication 64 times?

G: Yeah. If you have 64 different filters, then you do that convolution operation 64 times, but that convolution operation under the hood is just a matrix-matrix multiply — that's how you implement a convolution underneath. And yes, you basically get a tensor which is the size of the image, or whatever it is depending on whether you have padding or not, and since the operation is done 64 times, your output will have 64 channels.
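A quick check of that: one Conv2D layer with 64 filters turns a single-channel input into a 64-channel output of the same spatial size (with "same" padding).

```python
import tensorflow as tf

x = tf.random.normal((1, 28, 28, 1))                  # one grayscale image
y = tf.keras.layers.Conv2D(64, 3, padding="same")(x)  # 64 learned filters
print(y.shape)                                        # (1, 28, 28, 64)
```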
A: Yeah, thank you, you nailed it. There's a question in the chat: "It might be a little off topic, but I was curious about the execution of these deep learning models on GPUs. Are these executed on the Tensor Cores or the regular SMs? I'm not exactly sure of the differences between these, but the A100 data sheet mentions their FLOPS performance separately."

B: Yeah, the reason that you're able to use mixed precision in deep learning is because you're basically calculating approximations during each epoch. So you actually don't need a full floating-point representation, a full 32-bit representation, of your values, so you're able to reduce that precision and multiply your compute throughput in the process — because, like I said, you're just calculating an approximation; it doesn't have to be perfect.
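In Keras, opting into that is one line (available in recent TensorFlow 2 releases); the float16 compute is what maps onto the Tensor Cores.

```python
import tensorflow as tf

# compute mostly in float16, keep variables in float32 for numerical stability
tf.keras.mixed_precision.set_global_policy("mixed_float16")
```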
B: Thanks, Robbie. That's not true for scientific applications, though — that's why in the scientific HPC world we don't use Tensor Cores. I mean, they're starting to use some mixed precision for some intermediary calculations in certain applications, but for the most part, most people are using at least single precision, if not double precision, in those types of algorithms. It's really just algorithm-specific, and depends on how much accuracy you need.

J: Sorry, so my question is: when these DL models are being trained on the Tensor Cores, are the SMs underused? I mean, do they just stay idle?

B: I'm trying to find a diagram for you, but basically, inside each SM you have a number of different compute units, and it varies for each architecture we've released. That's also, for example, why double-precision compute is slower than single-precision compute: typically on our SMs you have half as many double-precision floating-point units, ALUs, as you do single precision. So, when we release a new GPU, we'll state the single-precision, double-precision, and mixed-precision results, but the main number that's usually advertised historically used to be the single-precision floating-point compute number. More recently they've started to highlight deep learning performance, because it's obviously really popular and really widely used.

F: Yeah, thank you. I just discussed with Peter, and he gave a very good explanation of the difference between this fitting and our traditional fitting, and machine learning uncertainties. He told me that, in order to get uncertainty similar to what we did in traditional fitting, we need a lot more work, like training the model several times, and we need to take into account several sources of uncertainty in order to get a good estimate.

I: So in full Bayesian approaches you have this MCMC kind of sampling, right, Markov chain Monte Carlo stuff, and that's inherently sequential in nature; I'm not sure there are algorithms there that are really parallelizable, and in such cases it may not be beneficial to use a GPU. But there are certain other flavors, like Gaussian process regression, where you make certain analytical assumptions, and in such cases it boils down to matrix-vector kinds of multiplications, so there the GPU actually really helps.

A: Solid, great. And then for your previous question, Mr. Wine, I think if you typed it in the Slack it'd be a little easier for us to get to — it's a pretty long question. I think Praveen did a great job answering it, and I agree that it would be specific to the use case. Go ahead and put it in the Slack and I think we'll get a better understanding and maybe a better answer from it.

A: Okay, so we have 10 minutes till the end — or, if there are no other questions, we can go ahead and wrap up now, and I'll see you all tomorrow at nine. Yes, nine.