From YouTube: 13 - Deep Learning for Science at NERSC - Prabhat
Description
Deep Learning for Science School 2019 - Lawrence Berkeley National Lab
Agenda and talk slides are available at: https://dl4sci-school.lbl.gov/agenda
A
This group also serves the entire NERSC user base, which is about 7,000 users across the globe. So if you have any questions on using the NERSC machines, that group is sort of the first place to go to. Prabhat also wears many hats: he's a computer scientist by training, but he also works across a broad range of sciences, as you'll see today, and he's published in pretty much every domain science and computer science. And most recently, the award that shot the group into fame was the Gordon Bell Prize last year.
B
Right, thanks, thanks Nick. So I do want to thank you all for, you know, sharing lunch here; there is a tendency to enjoy lunch in the great California weather. So thanks for, you know, bearing with me, I guess, for the next hour or so. So I'm going to be talking broadly about deep learning for science. I think much of the foundational stuff that has been covered in the first couple of days has been fairly generic, but this is the deep learning for science summer school.
B
There are many, many methods that you can bring to bear, and chances are that, you know, as a student, as a postdoc, as a researcher, you already are using several of these methods. So classical linear algebra is obviously relevant; you might have classical image or signal processing tasks that are useful in some cases in science; graph sampling and graph analytics become important; and really, I'll call out that statistics in many ways is the foundational technology behind machine learning and AI. So that's something to keep in mind.
B
Obviously people have vanilla, you know, statistical significance tests and so forth; those will always be important, always be relevant, for scientific data analysis. Now, of course, with the AI revolution and deep learning upon us, it's impossible to ignore the three circles on the left, but just keep in mind that, as a day-to-day practitioner, you will be using many of these tools down the line.
B
They are saying that they would really like access to advanced statistics and advanced machine learning capability, so I think we see that requirement coming in bottom-up. Now, the DOE is, I would say, fashionably late to deep learning; all of these things took off in 2012. I think we've certainly been running a lot of workshops capturing requirements; we ran a couple of workshops wherein, I think, users articulated what the deep learning requirements were and what potential approaches there might be.
B
So I'm going to chat a little bit about these three classes of applications in the next few slides. And by the way, these are the two workshop reports that came out of those two meetings. Shortly, in the next two or three months, we're going to have a bunch of AI town halls in the DOE, wherein many researchers will essentially try to articulate what the open challenges are in the domain sciences and what needs to happen differently in the computer science area to accommodate, you know, these emerging requirements.
B
You may know that there is now a presidential AI initiative, which is really helping, I think, various federal agencies launch programs in AI. Now, the reason I put this up is because, as students, as postdocs, as researchers, this is top-down funding coming to you. So you probably ought to prepare yourself, or team up with others, to respond to funding calls when they become a reality.
B
What is a specific problem that can be solved? So this is my attempt at tabulating various domain science areas in the DOE: specific areas like astronomy, cosmology, particle physics, climate, genomics, light sources, materials science, colliders, plasma physics along the columns, and then along the rows are typical statistical tasks that you might want to solve. So perhaps you have a pattern classification problem; hopefully you know what that means by now. Maybe you have a regression problem, which I think was called out early on Monday: you want to predict a continuous-valued quantity.
B
Those are typically cast as supervised problems. Then you have a range of unsupervised problems: clustering, or dimensionality reduction. There's the anomaly detection task, and then we also have tasks like designing inexpensive surrogate models, or essentially designing experiments. So broadly, I think in the DOE we are leaning towards bunching these up. And I'm sorry, I should have mentioned that everywhere there is an X on this table means that that domain science has that requirement.
B
So in the DOE, I think we are bunching these requirements into three classes of applications. All of these applications, pattern classification, regression, clustering, dimensionality reduction, anomaly detection, you can think of as analytics problems; really, the vast majority of tasks is in this space. More and more, I think, what we are realizing is that, despite all of the big exascale and pre-exascale machines that we have, it is still not possible for us to simulate the universe in all of its exquisite fidelity.
B
So we do need to design surrogate models, and that's where deep learning can help augment, replace, and enhance current simulation tools; that's where that comes into play. Now, the last one, deep learning for control, is an interesting topic: if you have, say, a light source or a telescope or a microscope or a network or a supercomputer or a data center, can you somehow control it more efficiently? So the self-driving car analogy: how does that apply to the DOE? That's where control comes in.
B
So we do think that a table of this form, these three buckets, sort of broadly characterizes what the DOE might want to do, and need to do, in this space. Alright, so I think over the last two days, and I'm sure even before that, you all know by now that deep learning is working in industry, right? State-of-the-art results in computer vision, state-of-the-art speech recognition results, game-playing systems, self-driving car systems: everything is being deployed in practice.
B
So around five years ago, we asked ourselves this question: well, deep learning is working, and I think we saw this coming; it works for commercial applications, but can it work for science? There are certainly similarities between deep learning for industry and deep learning for science, and the similarities are in the kinds of tasks that we need to solve. We certainly have pattern classification, regression, those sorts of tasks. But there are some differences; in particular, scientific data looks different.
B
We have many more channels than just RGB; our channels typically are associated with high precision or high accuracy. The kinds of noise and artifacts that we have inside big scientific data are very different from what you might see in a commodity camera. But most importantly, the structure of patterns, the structure of clusters, the structure of anomalies are very different from what you might see in the ImageNet dataset.
B
So my favorite analogy is: if you have a megapixel camera and you go around clicking images of the world, every image is one point in a million-dimensional space. Now, a contemporary climate simulation has about a million pixels as well, so every image, every frame, from a climate simulation is a point in a different million-dimensional space.
B
So the question to ask really is: can deep learning, as this statistical inference machinery, learn to separate out patterns in both of these spaces? And obviously, you wouldn't have a workshop, you wouldn't have a talk on this, unless this was working. So over the last five years, we've been systematically exploring deep learning for a range of applications, and yes, sure enough, we find that it works. About three years ago we wrote this O'Reilly blog.
B
Three years feels like a very, very long time now, but we wrote this O'Reilly blog on the patterns, on essentially, I think, the success stories that people were seeing at that point in time. I'm going to touch upon some of these later today, but feel free to check out that blog later on. Alright, so what I want to do next is to walk through one science area, climate science, in detail, and then just skim through a few other use cases.
B
You know, Karthik did a breakout on deep learning for climate yesterday, so I think some of you have seen these slides before, but I'll just go through these nevertheless. So for us at Berkeley Lab and NERSC, we really care about science questions in the end. At the end of the day, we need to be able to target an important science problem and solve it with deep learning. So in this case, the important science problem that we're going after is to understand climate change.
B
So in the past, climate change has been characterized by very simplistic quantities: how is the global annual mean temperature going to change by the end of the century? How is the sea level going to rise by the end of the century? So even though you have a million pixels in an image, you're compressing all of that data, years' worth of information, down to a single number, and then all you try to track is how that number changes over a hundred years.
B
But there is exquisite fidelity in such datasets. If you were to pull out some satellite images, for example, you can see a hurricane growing in the Gulf of Mexico, you can see an atmospheric river making landfall in California, you can see an extratropical cyclone in the Northeast, you can see a weather front. So more and more, people who care about climate change and its impact in the places where they live, in a city like Berkeley or New York and so forth...
B
They want to find out whether these patterns are going to change, essentially, their impact, and where, if they did. I think the good news here is that we now have climate models that can produce such patterns. So this is a state-of-the-art CAM5 quarter-degree model, and you can see that these simulations can produce tropical cyclones and atmospheric rivers, extratropical cyclones, and other weather patterns. So it's certainly within our capability to simulate...
B
...potentially what will happen in the future. But now imagine if this movie, instead of playing out for four months, had played out for 100 years or a thousand years: there is no way that you, as a human expert or a scientist, could reliably, and in an unbiased fashion, pull out where these patterns are. So you really need a computer vision capability, an automatic tool, to find patterns in such large datasets. Now, I could replace this climate simulation by maybe a time-lapse...
B
...microscope image, maybe a time-lapse telescope dataset; the analogy still holds true, in that you have massive amounts of spatiotemporal data that you want to be automatically analyzing and finding patterns in. So again, if we draw an analogy between what needs to happen in the computer vision context, for images of cats and dogs, versus what needs to happen in science, we share the same problems. Given a dataset, we need to say whether there is a pattern.
B
Is there a cat in this image or not? We need to solve the localization problem: draw me tight bounding boxes around the objects of interest. In the detection formulation of the problem, given an image with multiple objects, overlapping, occluded, and so forth, we might want to find essentially different kinds of bounding boxes, all variably positioned and sized. And then, finally, segmentation is the problem wherein these boxes aren't good enough.
B
You really need a per-pixel prediction of where these patterns are. So over the last three years, we've essentially shown that one can apply deep learning to all of these problems and get state-of-the-art results; it's totally doable. So what I'm going to do next is to walk you through three slides which touch upon what kinds of architectures we've developed to solve these problems.
B
So first, about three years ago, it was unclear in our minds whether deep learning could be applied to the output of a simulation; like I mentioned, these two million-dimensional spaces, real images and simulation output, are very different. So we started with a fairly simple AlexNet-style architecture: a simple input layer, convolution, pooling, a fully connected layer, and then the job of the network is to predict a binary label.
B
Is there a tropical cyclone in this image? A different network predicts: is there an atmospheric river in this image or not? A third network predicts: is there a weather front in this image or not? Now, I think one of the things that we did carefully, and I would certainly encourage you all to do the same, is that, apart from implementing this AlexNet thing, we did our due diligence and implemented other baseline machine learning architectures.
B
So: simple logistic regression, simple k-nearest neighbors, support vector machines, random forests. And of course, there are lots of parameters associated with many of these methods, so you do some form of hyperparameter optimization to make sure that you've chosen reasonable parameters, give these methods the best chance that they can have on this dataset, and see how well they do. So yes, indeed, I think we found that deep learning based methods do give you state-of-the-art accuracy, but I think...
B
Most importantly, one of the things that we learnt early on was that all of the numbers, the predictive accuracies, were all fairly high; all numbers were 85 percent or higher. So I think this was a very important lesson learned, in that there may be a tendency to just jump to deep learning as the very first thing you try, and I think it's important to first characterize how easy or hard the problem is to begin with.
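The baseline discipline described here, trying classical models with a hyperparameter search before reaching for deep learning, can be sketched as follows. This is a minimal illustration with synthetic data and illustrative parameter grids, not the speaker's actual climate setup:

```python
# Sketch of the baseline comparison described above: classical models with a
# small hyperparameter search, evaluated before trying deep learning.
# The dataset and grids are synthetic placeholders, not the climate data.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

baselines = {
    "logreg": (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
    "knn": (KNeighborsClassifier(), {"n_neighbors": [3, 5, 9]}),
    "rf": (RandomForestClassifier(random_state=0), {"n_estimators": [50, 100]}),
}

scores = {}
for name, (model, grid) in baselines.items():
    search = GridSearchCV(model, grid, cv=3)   # small cross-validated grid
    search.fit(X_tr, y_tr)
    scores[name] = search.score(X_te, y_te)    # held-out accuracy

print(scores)
```

The point, as in the talk, is not which baseline wins but that the resulting accuracies calibrate how hard the problem is before any deep network is trained.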
B
Alright, so I think that exercise proved that convolutional architectures could handle spatial datasets produced by simulations, and that the binary classification problem was approachable. So then what we did was to come up with semi-supervised architectures, the challenge here being that, while we may have labels for perhaps two or three classes in this dataset, there are many other patterns that we do not have labeled data for.
B
There is essentially a bottleneck layer, and you can ask the architecture to make predictions of patterns, box locations, and class types for the labels that it has. But it is semi-supervised in that there is an unsupervised component of this architecture, wherein you force the architecture to recreate the input: match the dimensionality of the input, produce images that match the spatial size and also the number of channels that the dataset has, with the constraint that you force the network to go through a bottleneck layer.
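The semi-supervised idea above, one bottleneck feeding both a reconstruction head and a classification head, can be sketched in a few lines of numpy. This is a toy dense forward pass with made-up shapes, standing in for the convolutional architecture described in the talk:

```python
# Minimal numpy sketch of the semi-supervised idea above: one network with a
# bottleneck layer serving two heads, a reconstruction head (unsupervised)
# and a classification head (supervised). Shapes and data are toy values.
import numpy as np

rng = np.random.default_rng(0)
n, d_in, d_bottleneck, n_classes = 32, 64, 8, 3

x = rng.normal(size=(n, d_in))           # input batch (e.g. flattened patches)
labels = rng.integers(0, n_classes, n)   # labels exist only for some patterns

W_enc = rng.normal(scale=0.1, size=(d_in, d_bottleneck))
W_dec = rng.normal(scale=0.1, size=(d_bottleneck, d_in))
W_cls = rng.normal(scale=0.1, size=(d_bottleneck, n_classes))

z = np.tanh(x @ W_enc)                   # bottleneck: compact representation
x_hat = z @ W_dec                        # decoder matches input dimensionality

logits = z @ W_cls                       # classification head
p = np.exp(logits - logits.max(axis=1, keepdims=True))
p /= p.sum(axis=1, keepdims=True)

recon_loss = np.mean((x - x_hat) ** 2)                # unsupervised term
cls_loss = -np.mean(np.log(p[np.arange(n), labels]))  # supervised term
total_loss = recon_loss + cls_loss   # train jointly so z captures all patterns

print(recon_loss, cls_loss, total_loss)
```

Training would minimize the joint loss; the design choice is that the reconstruction term forces the bottleneck to encode patterns the labels never mention.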
B
So if this can be made to work, if a single network can both produce accurate labels and boxes and recreate the dataset, then it probably means that this bottleneck layer has essentially learned all of the interesting patterns; it's effectively become a meaningful latent representation of the dataset. By the way, I should mention that I have not done all of this work; there are a number of folks who've been contributing to these projects, and they're all called out at the bottom of the slide.
B
These are some folks that we work with. So the output of this trained network looks like the following: you essentially have a global image, and there are multiple weather patterns in this image. Ground truth is in green: you have many tropical cyclones, you have an extratropical cyclone, and you have an atmospheric river. And then red is what the network is predicting, so a single network is predicting these three events and their locations. Obviously there are artifacts; we are missing out on a few events.
B
The scale isn't quite right and there is an offset, but that was a known limitation of this architecture at that point in time. I should say that when we attempted this, when we tried to run semi-supervised learning at scale on this big dataset, we found out quickly enough that you could not run this on a single GPU or a single CPU. So essentially what we did was to scale this architecture to all of Cori.
B
So Cori is the machine that we have at NERSC; it has about, say, 10,000 Knights Landing nodes, and we were successful in scaling this architecture out across all of the system. At that point in time, I think the conventional wisdom was that deep learning would not work well on CPUs, but I think we proved that you can get a fairly high level of performance, and the scaling itself. So this is the largest example of a deep learning application on a CPU-based system: we were able to achieve 15 petaflops...
B
...of performance in scaling this up. The next step on that slide is segmentation, which again is a much more computationally demanding task. So this is the Tiramisu U-Net architecture that we used; you've probably seen versions of this by now, I guess, in the summer school. The input image is a million pixels with 16 channels, and the output image is a million pixels with three channels; the different channels correspond to pixel-wise predictions: is there a tropical cyclone here?
B
Is there an atmospheric river here, or is it just background? There is an encoder piece: you start off with this big representation and essentially come up with a compact representation. Then there is a decoder piece that will produce an image at the same native resolution as the dataset, and a bunch of skip connections; it's really hard to get such deep networks to converge, so we used those as well. Now, when we optimized this and ran it on a single Volta GPU...
B
...essentially, what we found is that just passing one image through this entire network requires about 40 teraflops' worth of compute, and we had, I think, about 200,000 images. So you really need a large-scale compute resource to get this thing to run and converge. So we did that on the Summit system.
B
Summit is the number one machine in the Top500, actually, around the world, and we were successful in scaling this U-Net architecture to 27,000 Volta GPUs. For the first time, I guess, in the community, this particular deep learning application was able to exceed an exaflop: that's a billion billion floating-point operations per second, in half precision, in FP16 mode. And everything hung together at that point: the code was working, the thing was converging, scaling well, and so on and so forth.
B
But we care about science, I guess, in this talk, so this is the final result of the trained network. You give it a global image, 16 channels, a million pixels, and out come all of these prediction masks. The black contour lines are what the human heuristic has specified the ground truth to be, and the blue dots and the red dots are what the network is predicting: where the features are. So we do see a fairly high-quality agreement between the human heuristic and the network output.
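Agreement between a per-pixel prediction mask and a ground-truth mask, of the kind shown on this slide, is commonly scored with intersection-over-union. A minimal sketch, using toy masks rather than the actual model output:

```python
# Per-class intersection-over-union (IoU) between a predicted segmentation
# mask and a ground-truth mask: 0 = background, 1 = tropical cyclone,
# 2 = atmospheric river (toy masks, not the actual network output).
import numpy as np

def iou(pred, truth, cls):
    p, t = pred == cls, truth == cls
    inter = np.logical_and(p, t).sum()
    union = np.logical_or(p, t).sum()
    return inter / union if union else 1.0  # absent class counts as perfect

truth = np.zeros((8, 8), dtype=int)
truth[2:5, 2:5] = 1          # a "tropical cyclone" blob in the ground truth
pred = np.zeros((8, 8), dtype=int)
pred[3:6, 2:5] = 1           # predicted blob, offset by one pixel

print(iou(pred, truth, 1))   # → 0.5 (6 shared pixels out of 12 in the union)
```

The one-pixel offset halving the score mirrors the talk's point that scale and offset artifacts show up directly in this kind of per-pixel agreement metric.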
B
You know, Karthik talked about ClimateNet yesterday; I'm not going to go there. But there is a bottleneck here, in that what we've shown so far is that deep learning based solutions can match human heuristics; can they go beyond, and bypass, human heuristics completely? ClimateNet is a project that is essentially headed in that direction, as Karthik mentioned.
B
This particular project won the Gordon Bell Prize last year, and I find it really remarkable that we had the Turing Award for the AI grandfathers, I guess, in the same year, and then, in the field of HPC, this is the biggest award, and again, that's also in the AI space.
B
So again, I think enabling a deep learning application to hit an exaflop is really a big, big accomplishment that we're quite proud of at NERSC. Alright, so I'm going to switch tracks now, from climate science to cosmology, and we'll pick two problems in cosmology. One problem is around predicting cosmological constants. It is very common in science, especially in the computational approach to any science, to have a theoretical model which you then code up and run. But often people are interested in knowing: is my theory right?
B
Does the model actually make sense? Does it match reality, and so on and so forth? So one of the things that we tried to do is to take these theory models: essentially you plug in three numbers (there are eight or so, but we plug in three) and we run out many, many simulations. So, given a choice of three parameters, we run an N-body simulation up to the age of the universe.
B
We end up with a box with dark matter particles in it, and then essentially we turn this over to a deep learning system, saying: hey, if I just gave you this box, and gave you a regression task in which you are supposed to predict these three model parameters, can you do a good job at that? So there are some colleagues at CMU, Shirley Ho and others, who essentially showed that you can.
B
These are the network's predictions: the ground truth is the diagonal line, and the model, having run at different concurrencies, is able to match the parameter estimates, I think, reasonably well. We are taking this work in a few other directions in the future: predicting many more parameters, and, when you try to create a 3D convolutional network, things sometimes don't fit into memory, so we are working on that.
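The regression task described here, mapping a simulated dark-matter box back to the parameters that produced it, can be sketched with a linear model standing in for the 3D convolutional network. Everything below is synthetic: the "boxes" come from a fake linear forward model, not an actual N-body code:

```python
# Sketch of the parameter-regression task above: learn a map from a 3D
# density box to the parameters that generated it. Ridge regression on
# flattened boxes stands in for the 3D convolutional network; the boxes
# are synthetic, generated from 3 hidden parameters plus noise.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_sims, box = 200, 8                      # 200 toy "simulations", 8^3 boxes
params = rng.uniform(size=(n_sims, 3))    # three cosmological parameters

# Fake forward model: each box responds linearly to its parameters.
response = rng.normal(size=(3, box ** 3))
boxes = params @ response + 0.01 * rng.normal(size=(n_sims, box ** 3))

X_tr, X_te, y_tr, y_te = train_test_split(boxes, params, random_state=0)
model = Ridge(alpha=1.0).fit(X_tr, y_tr)
pred = model.predict(X_te)

# Predictions vs ground truth cluster near the diagonal, as on the slide.
mse = np.mean((pred - y_te) ** 2)
print(mse)
```

A real box has no such linear response, which is exactly why a 3D CNN is used; this sketch only shows the shape of the inverse problem (box in, parameter vector out).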
B
Now the next project, whose first author is, I think, sitting right here. I think the last talk was really excellent in exposing you to GANs and how GANs are doing a wonderful job of modeling the distributions of bedrooms and Persian cats and celebrity faces. So that's great; I mean, I think some of the facial features that are being captured now are just remarkable. But we care about maybe more socially meaningful applications.
B
I think the practical challenge here is that, despite all of the big machines that we have at our disposal, we really cannot explore all of the parameters that can characterize the universe. So can we use a GAN, can we train a GAN, to produce synthetic universes? If we can create a GAN that can produce a realistic universe, then maybe we can explore which parameters work best, and so on and so forth.
B
So essentially, what Mustafa did, again a few years ago (this work was done a couple of years ago, which is a remarkably long time ago, I guess, in the deep learning world), was to show that, yes, indeed, GANs that are carefully constructed can start generating and producing images that are statistically identical to the training data. There's a range of diagnostics that we can bring to bear on this problem; Emily walked through the diagnostics that people use. Mostly, I think, you appeal to perceptual quality.
B
...and, you know, a few power-spectrum-based diagnostics that convinced us that the GAN was doing the right thing. Alright, so switching from cosmology to astronomy: one of my favorite projects is Celeste, and essentially what we did in Celeste was to propose a graphical model. I think Emily walked you through graphical models and variational inference. So we did the right thing.
B
I think we had statisticians work with astronomers to propose a graphical model that would essentially capture the dependence between galaxies and stars and how CCD counts might arise on a sensor. And what we found, I think, is that purely relying on variational inference doesn't quite work; the estimates aren't quite as accurate.
B
So one of the things that we have tried to do over time is to replace some of the boxes in the Celeste graphical model with deep learning. Essentially, what we now do is to replace a mixture of Gaussians, which we were using earlier in the Celeste graphical model, with a variational autoencoder that does a better job at modeling the spatial distributions of galaxies. And that seems to, again, work quite well.
B
Now, a completely different domain again; we'll have a talk about the LHC next. But you know, the Large Hadron Collider is an exquisite instrument; it's one of the most expensive instruments in science. It can produce data at a rate that no file system, no computer system in the world, can handle. So often what physicists will do is to encode the particle detection...
B
...logic in FPGAs right next to the detector, so that you can throw away, discard, uninteresting data and reduce the data volume to a gigabyte a second. At that point the data is fed off to disks and so on and so forth. So what we did, again a few years ago, was to explore whether this hand-tuned, handcrafted particle detection logic that is baked onto the FPGA can be replaced by a convolutional architecture, and again, sure enough, it can.
B
You can compress the networks, quantize them, and essentially bake them into FPGAs or other special-purpose logic right next to the instrument, and chances are that you're going to get good performance. So Steve Farrell, sitting right here, has really, I think, led the charge in exploring graph neural nets and essentially solving the track-finding problem. You have, you know, onion-ring-like detector arrays around the LHC, and when the collision happens, particles will hit detectors, and your job is to chain them up in the form of tracks.
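The tracking setup just described, hits as nodes and candidate track segments as edges between adjacent detector layers, can be sketched as a tiny graph with one round of neighbor aggregation. The numbers are toy values, not the actual architecture:

```python
# Toy version of the tracking-as-a-graph idea above: hits are nodes with
# (layer, position) features, edges connect hits on adjacent detector
# layers, and one round of neighbor averaging mixes information along
# candidate track segments. Not the actual LHC graph network.
import numpy as np

# 6 hits: (layer index, transverse position)
hits = np.array([[0, 0.0], [0, 5.0],
                 [1, 0.4], [1, 5.2],
                 [2, 0.9], [2, 5.1]])

# Connect hits on adjacent layers (candidate track segments).
n = len(hits)
adj = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if abs(hits[i, 0] - hits[j, 0]) == 1:
            adj[i, j] = 1.0

# One message-passing step: each hit averages its neighbors' features.
deg = adj.sum(axis=1, keepdims=True)
messages = adj @ hits / np.maximum(deg, 1)
updated = np.concatenate([hits, messages], axis=1)  # node + neighbor summary

print(updated.shape)   # (6, 4): original features plus aggregated info
```

A real graph network would learn edge weights so that true track segments score high and spurious ones low; this sketch only shows the hits-to-graph encoding.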
B
So this can be formulated as a graph problem. I'm sure Steve is going to touch upon that, hopefully in the poster session later, but I think one of the things that we've been exploring, and have found, is that graph neural networks can do better than the traditional methods that physics folks have used so far. Last year we worked on a different problem: this is something called the IceCube detector; it's in Antarctica.
B
So under the ice sheet there are essentially holes that were drilled, with these sensors being deployed in these arrays; that's the Eiffel Tower for scale. Essentially, as neutrinos stream through the universe, they hit this detector array, a few of these domes will light up, and your job is to figure out which signatures correspond to a neutrino versus some other background...
B
...radiation. So again, that community was using a certain physics baseline, hand-tuned by postdocs over a decade, and by using a 3D convolutional network, and now a graph convolutional network, we can effectively improve on the sensitivity of this experiment. This one won an ICML best paper award last year as well, so I think this is one of the first useful applications of graph neural nets to a scientific problem. Alright, so I'm going to come back to this table.
B
You know, the reason I brought this up early on; so I guess we'll come to that reason now. Essentially, this is the landscape of problems: the majority of them are in the data analytics space, and there are some simulation and control problems. And essentially, I think, over the last five years, what we've learned is that these architectures are relevant and can be applied to these problems, as I think was covered early on, on Monday.
B
So if you have labeled data, that is, you have a supervised problem, then, essentially, depending on the nature of your problem: if you have a 2D image, a 2D convolutional architecture may be a reasonable starting point; if you have a 3D volume, a 3D convolutional architecture might make sense; if your dataset is unstructured, or if there is a natural graph property to the data, you can explore a graph convolutional net. And, you know, I'm told that Lou gave a really good talk this morning on sequences.
B
So while we don't really have language problems, strictly speaking, you know, at Berkeley there isn't that much text modeling, you can treat time as a sequence, and then LSTMs or RNNs can be applied there. And of course, if you have space-time problems, then you can create hybrid architectures that either do space-time convolutions, or there is a hybrid with an LSTM and a convolutional block in action.
B
So that's maybe something general to keep in mind. And I would say that, across the board, whenever there's been enough training data, I think we've seen state-of-the-art accuracy; so this, I'm quite confident, is working, and it's working well. Now, for unsupervised problems, we have explored autoencoders for looking at, you know, the intrinsic dimensionality of a dataset, finding clusters in the dataset, and so forth, but I would say that our results aren't as conclusive, I think.
B
In some cases it works well, in others not as much, and it certainly is harder to, I guess, get unstuck once this is not working. For surrogate models, you know, I touched upon the first CosmoGAN project; there's going to be a talk about CaloGAN, and there are certainly other people exploring GANs for simulations. So that, I think, is quite relevant and potentially a methodology for enhancing simulations; I think there's some speculation that variational autoencoders could also work in this space. For control problems...
B
...almost certainly, reinforcement learning techniques that are being successfully applied elsewhere can be applied to scientific domains. I think much of the challenge is that our experiments aren't really hardwired for this: there isn't really an end-to-end loop which allows for the possibility of collecting a lot of training data, and then the instrumentation such that you can take a signal from an automated reinforcement learning system and plug it in. So that's going to take a little bit more time to work out. Now, anomaly detection is a question mark.
B
You know, whether deep learning is the principled solution for anomaly detection: by definition, anomalies are events that you have very, very few examples of, very little data, so certainly, I think, it remains to be seen whether deep learning is the right method. Alright. So after having done this for five years, there are a few things that we've learned along the way.
B
So I think there are some short-term challenges that we are now better able to articulate, and then there are some longer-term challenges, so I just want to walk you through what we see as the short-term challenges. So: complex data is natural in science, right? Even though on the web there's a lot of complexity around images and video and text, I think there's much more complexity in science. Our data comes in all form factors: 2D images, 3D volumes, 4D space-time datasets, multispectral imaging.
B
Sometimes the data is dense, but sometimes the data is natively sparse; sometimes a graph structure is really the best way to represent your dataset. So if you download something like Keras or PyTorch or TensorFlow, does it natively support these kinds of data at this point in time? Not really. So I think making sure that the entire software infrastructure can natively support these modalities, that's an issue. Hyperparameter optimization is an issue; I think later, maybe tomorrow or today, there's a talk on HPO.
B
Just because you read something on the web and maybe coded up AlexNet, that doesn't mean that that's the best choice of architecture. So if you go ahead and write a paper saying, "Oh, I chose AlexNet, I got 85% as my accuracy, here's the paper, it's the new state of the art," you need to be a little more rigorous about it.
B
You need to at least make some attempt at exploring some reasonable architectures that could have done better. And really, I think there are very few people at Google and Facebook and OpenAI who really know how all of these parameters interact with each other: how many layers you should use, learning rate schedules, so on and so forth.
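Absent that expert intuition, even a simple automated search over those knobs goes a long way. Below is a minimal sketch of random-search hyperparameter optimization; the `validation_loss` surface is entirely made up (standing in for a real train-and-evaluate cycle), as are the parameter ranges:

```python
import math
import random

# Hypothetical validation-loss surface standing in for a full train-and-
# evaluate cycle; here the "best" network has lr ~ 1e-3 and 4 layers.
def validation_loss(lr, n_layers):
    return (math.log10(lr) + 3) ** 2 + 0.1 * (n_layers - 4) ** 2

def random_search(n_trials=200, seed=0):
    """Sample hyperparameters at random and keep the best trial."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        lr = 10 ** rng.uniform(-5, -1)   # learning rate on a log scale
        n_layers = rng.randint(1, 8)     # network depth
        loss = validation_loss(lr, n_layers)
        if best is None or loss < best[0]:
            best = (loss, lr, n_layers)
    return best

best_loss, best_lr, best_layers = random_search()
```

Random search is only the simplest option; the automated HPO services that centers deploy typically layer Bayesian or population-based strategies on the same loop.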
B
So I think what we need are automated capabilities, so that all of you don't have to become experts in HPO, but folks at supercomputing centers and in the cloud can essentially deploy some of these capabilities that you can then use. Now, performance and scaling is certainly an issue. I think if you have a dataset that's a gigabyte or tens of gigabytes in size and you try to train on that, chances
are that it's going to take hours and days, and if one network is going to take you hours and days, then there is no way that you're going to be doing hyperparameter optimization. Now, if you are amongst the chosen few whose domain actually has tens of terabytes, hundreds of terabytes or more, and you'd like to apply deep learning to these datasets, then there's no way that that's going to happen on a single machine.
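The standard way past a single machine is data-parallel training: every node holds a full copy of the model, computes gradients on its own shard of the data, and the gradients are averaged (an allreduce) before one synchronized update. A toy sketch with a one-parameter model, simulating the nodes sequentially:

```python
# Toy data-parallel SGD: each "node" computes a gradient on its own data
# shard; the gradients are averaged (the allreduce step) and every node
# applies the identical update, so all copies of the model stay in sync.
def local_gradient(w, shard):
    # Gradient of mean squared error for the 1-D model y = w * x.
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def data_parallel_step(w, shards, lr=0.005):
    grads = [local_gradient(w, s) for s in shards]  # one per node, in parallel
    g = sum(grads) / len(grads)                     # allreduce: average
    return w - lr * g                               # same update on every node

# Synthetic data from y = 3 * x, split round-robin across 4 "nodes".
data = [(float(x), 3.0 * x) for x in range(1, 17)]
shards = [data[i::4] for i in range(4)]

w = 0.0
for _ in range(100):
    w = data_parallel_step(w, shards)
# w converges to the true slope, 3.0
```

This is the "one network, many nodes" pattern; in practice, frameworks such as Horovod or PyTorch's DistributedDataParallel handle the allreduce and the per-node sharding.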
B
So it is really quite important that deep learning runs in a performant way on single-node architectures, but then also scales to multi-node architectures, and that's the reason why we at NERSC and others in the DOE have been really pushing hard on scaling deep learning. Now, these are what I sort of view as technical challenges; I think the computer scientists and the engineers can certainly address those. But this next one is more of a sociological challenge.
B
I guess one should explore deep learning, but if you don't have labeled data, then you're stuck. So essentially, I think what we are finding, at least in the domains that I've seen, is that people are coming to the realization that if only they could have enough labeled data, they could convert this to a supervised problem and hence apply the deep learning hammer. I think much of the emphasis is now shifting towards how we go about acquiring large labeled datasets. So this, you know, I don't think a computer scientist can do.
B
Computer scientists can develop systems like Amazon Mechanical Turk; they can develop web portals. But really it's up to the domain science community to come together and run labeling campaigns and so on and so forth. So this is very much a sociological challenge, which I think will play out in the coming years. All right, so I think these are actually easy; I think one way or the other we're going to get to these in the next
B
one to three years. But the longer-term challenges I think are worth noting, and many of you are getting started in your careers and are obviously interested in deep learning for science. So one, I think you should be aware of these; and two, if you have an opportunity to write a grant or lead an exciting program, then maybe this is something you can think of.
B
So one: I think the lack of theory in deep learning definitely bugs a lot of people, and I think it's bugging a lot of domain scientists to the extent that they don't want to adopt these methods. So I think as methodologists, as practitioners, it behooves us to think about characterizing what the limits really are. I mean, you can't just say that this thing is going to get more and more powerful, just get me more data, just get me more compute.
B
It's not going to work, so I think we need to characterize what the limits really are. What are the limits of supervised architectures? What are the limits of unsupervised architectures? What are the limits of semi-supervised architectures? I think we need to say something there. You know, frankly, GANs are looking really, really intriguing; I mean, I think the face results were just amazing, and we are seeing promising results in science as well. But I think the question that one of you asked around the generalization limits of GANs, that's a central issue.
B
If all of your training has happened in a certain parameter regime, what can you say about the GAN making predictions in an extrapolated regime? So we really have to make some statements about the generalization properties of GANs. All right, so the interpretability issue is again quite fundamental. Again, a domain science may have a well-established workflow, and now you're going to pull out an analytics piece and drop in a deep learning piece, and suddenly this workflow becomes a black box, and again, some people are not comfortable with that.
B
So if you know something about your domain science, if you know that the domain has some laws, conservation laws, PDEs that are relevant to that domain, then you should build them in. The other way is to introspect it or visualize it. So if I have an architecture that's doing wonderfully well, and I would like to use that and factor it into my workflow, then I think we need to be able to explain what this network does: visualizing it, explaining what the network is
learning in terms of semantic features that are relevant to the domain. I think we just need to have those tools. Now, uncertainty quantification is also important. Again, I think I've mentioned maybe a few times that in science, apart from the observation, the error bars are equally important. So we do have to say something about how confident the network is, how much uncertainty perhaps we have in the whole network, and essentially, I think, develop more end-to-end uncertainty quantification, which not enough
people have looked at so far. I'm going to get to this slide.
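One inexpensive way to attach error bars to a network's prediction, shown here only as an illustrative sketch, is Monte Carlo dropout: keep dropout turned on at inference time, run the same input through the network many times, and read the spread of the outputs as an uncertainty estimate. The two-weight "network" below is hypothetical, standing in for a real trained model:

```python
import random
import statistics

def forward(x, rng, p_drop=0.5):
    """One stochastic pass: randomly drop hidden units, as at train time."""
    weights = [1.0, 2.0]                  # stand-in trained weights
    kept = [w for w in weights if rng.random() > p_drop]
    scale = 1.0 / (1.0 - p_drop)          # inverted-dropout scaling
    return sum(kept) * scale * x

def predict_with_uncertainty(x, n_samples=1000, seed=0):
    """Mean prediction plus a spread to report as an error bar."""
    rng = random.Random(seed)
    ys = [forward(x, rng) for _ in range(n_samples)]
    return statistics.mean(ys), statistics.stdev(ys)

mean, spread = predict_with_uncertainty(2.0)
```

The same loop works unchanged around any model whose dropout can be left active at test time; the spread is only an approximate uncertainty, not a calibrated one.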
Next, I think it is worth calling this out. So frankly, I think the deep learning protocol as it exists right now is very simple, right? Somehow you get more data and just throw more data at your deep learning architecture. And then, if you have underfitting, you throw a more complex architecture at your dataset. So this protocol of throwing more data or throwing a more complex network at the problem is just not satisfying. I mean, I
think if you were to establish a contrast with how applied mathematicians have thought about computational modeling over the last 40 years, the way they go about it is: they think about what physical system they are studying; they think about what governing equations might be relevant to that system; they design solvers.
B
The solvers will typically have some analytical proofs. They will then discretize these; they will think about things like CFL conditions, which give them a handle on what processes can and cannot be resolved. They will think about convergence of these solvers, and then they think about the implementation on an HPC machine, do performance optimization, scaling, and so forth. So this is how applied
mathematicians have worked over the last 40 years, and arguably they have definitely succeeded in building bridges and planes that we trust, and trains, and so on and so forth. Compare that again to the deep learning protocol: it seems very, very simple. So I think the question is, insofar as deep learning is applicable to science, if we know something about the domain, then how do we build a protocol
B
that's more sophisticated? So I'm quite sure that this will happen, that we will be adapting and enhancing this protocol, but I think more effort is needed in this space. So, coming back to the long-term challenges: I think these are really the problems that will require attention over the next five to ten years. It's not going to happen in a year; it's going to take much more time. But I think as a community we really should be thinking about these and working on these. So, I just want to conclude.
B
So this was meant to be, I think, a broad-brush talk on how deep learning is making successful inroads into science. We just touched upon some problems in cosmology, astronomy, climate; some talks are coming up next on chemistry and high-energy physics. You will note one common theme here, in that all of these domains are computationally savvy.
B
Sorry, what I mean is: they have simulation tools, they have a handle on their datasets, they have machinery in place. So I think that's what we're seeing: computationally savvy domains are adapting and have been successful in applying deep learning to their workflows, but domains that have not been computationally savvy (I don't want to pick on examples, but there are many) are, I think, having a harder time.
B
There are certainly a lot of opportunities in this space, in the deep-learning-for-science space, to work on societally important problems, so I think you all should certainly think of that. You know, I characterized two classes of challenges: some short-term, one to three years, which I think will happen; but then there are certainly some long-term challenges that will take five to ten years
B
to solve. And we are certainly open to collaboration; I think one of the reasons we are having this event here is that we've been working in this area for a number of years now. There are plenty of opportunities to work on domain science problems, to think about the theory behind some of the longer-term challenges, to think about software and hardware infrastructure for the short-term challenges. So if any of these sound interesting to you, please come and talk to us. There are certainly some internships that are available.
B
Some of you were asking about that: the group hires 10 to 15 interns every summer, and you're welcome to let us know if you're interested. We have something called NESAP, the NERSC Exascale Science Applications Program, and we are now hiring postdocs for that program. So if you care about some of these applications, and optimizing them and scaling them on big machines,
B
there are opportunities for that. And going forward, I think as these town halls kick in and there's going to be a bunch of top-down funding, I do anticipate there will be staffing opportunities. So if you're interested in research positions or engineering positions, there are certainly opportunities here at NERSC. All right, so I'm going to stop there, and I'm happy to take questions.
B
Yes. I think one mode in which you can fill a big machine is what is called capacity mode: independent networks running on independent nodes. And the other is capability mode: one network running in a synchronized fashion. We were certainly in the second bucket, so there was one network running in a data-parallel fashion on all of Summit. There was another team, a Gordon Bell finalist from Oak Ridge, that was essentially doing hyperparameter tuning at scale.
B
Yes, I think data sharing has really been a long-standing issue, I would say, in the science community, and it's been unclear who really owns that problem. I think deep learning will bring that problem to the fore. So I guess I can say this: I think there is definitely a desire now in the community to create a hub for both models and datasets, and you, as domain scientists, may choose to contribute. Right now, I mean, if you go to this,
B
you can certainly put up your data in it and make it publicly available. We can have Globus endpoints connected to that so that the download is easier. So I think there will be more robust support for sharing datasets going forward; I think that's going to happen one way or the other. I think the new, unique requirement that's coming up now is around model sharing. Again, you develop your five architectures, you write your paper, you move on; the next materials scientist who comes along,
B
you know, what does he or she learn from you? So if you're open to sharing your model as well, beyond just your dataset, then there should be a mechanism for them to tap into the networks you've obtained. And again, we talked about this hyperparameter optimization problem: if they just want to tweak your model in a few ways, then they ought to be able to do that, and if they just start from scratch, they may never get to whatever you were able to achieve.
B
You know, the hope is that eventually anyone outside your group, anyone in science, can reproduce your figure, your deep learning accuracy for the dataset that you had. And I feel that with Jupyter notebooks, with a model repository, with a data repository, that's going to happen. Yes, so I guess maybe you have a few questions there. One is around the labeling procedure not showing all the fields; so that is an easy one.
B
Once you have enough labeled data, we create a unified model that does a good job of segmenting these known patterns. And then perhaps there is a new pattern that you come in with, and at that point you don't need to retrain the network or train it from scratch; maybe there is some transfer learning that you can do to adapt
B
some of the later layers in your architecture for this new problem that is more relevant to you. So I think my hope is that with transfer learning we'll be able to circumvent that issue. Now, that having been said, why do these big computing centers exist? It is so that you can take on problems of this kind.
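The adapt-the-later-layers idea can be sketched as a freeze mask over the parameters: gradient updates are applied only to the unfrozen later layers, so the pretrained early features are reused as-is. The layers here are single scalars purely for illustration:

```python
def fine_tune_step(weights, frozen, grads, lr=0.1):
    """Apply a gradient step only where the layer is not frozen."""
    return [w if is_frozen else w - lr * g
            for w, is_frozen, g in zip(weights, frozen, grads)]

pretrained = [0.7, 1.3, 2.0, 0.4]        # weights from the original task
frozen     = [True, True, False, False]  # keep the early feature layers fixed
grads      = [0.5, -0.2, 0.8, -0.1]      # gradients from the new task's data

adapted = fine_tune_step(pretrained, frozen, grads)
# Early layers are untouched; only the later layers move toward the new task.
```

Real frameworks express the same idea by marking early layers as non-trainable (or excluding their parameters from the optimizer) before fine-tuning on the new pattern's data.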
B
So if there is a completely new domain, you know, cryo-EM, and you now have a gold-standard dataset for cryo-EM, but these pictures, these images, are 10K by 10K by 1K, and you really have to train a network data-parallel and model-parallel at scale, then that's why we are here. I mean,
that's why we work with industry: to develop tools that can scale to that extent. So I feel that once a few key people in different domains have led the charge in creating a few central models, then other people will be able to adapt their networks. But someone certainly has to take the initiative to run these models at scale, and I think you can team up with people in the DOE or the NSF to make that happen. All right, good. So I guess you can enjoy 15 minutes of sunlight, and I...