From YouTube: 16 - Deep Learning for Quantum Chemistry - Justin Smith
Description
Deep Learning for Science School 2019 - Lawrence Berkeley National Lab
Agenda and talk slides are available at: https://dl4sci-school.lbl.gov/agenda
All right, this is great. I'm excited to give a talk at an actual machine learning conference, because normally I'm talking to chemists, and they don't always get the machine learning aspects of what we're doing. The last talk, Jennifer's, was really good because it gives you an idea of what can be done in chemistry; we're coming at things from a slightly different direction.
So how can we simulate the physics of atoms using machine learning? Interesting cases here might be protein simulation, the simulation of liquids, or the simulation of materials. The way we've traditionally done this is that we have some potential energy function, which is a function of the positions of the atoms in the system, and of the electronic structure as well if you're dealing with a quantum mechanics problem.
We can get forces from this as the negative gradient of that energy function, and these forces are guaranteed to be conservative because of the kind of function the energy is, and then we carry out dynamics using Newtonian physics. It's very simple. This just gives you an idea of what one of these simulations looks like, and this one was actually carried out entirely with machine learning; we're doing these types of simulations right now at Los Alamos National Lab.
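To make the dynamics step concrete, here is a minimal sketch of one velocity Verlet update driven by forces taken as the negative gradient of a potential energy function. The `energy_fn` argument and the finite-difference gradient are placeholders of my own (a real model would supply an analytic or backpropagated gradient); this is not the speaker's code.

```python
import numpy as np

def forces(positions, energy_fn, eps=1e-4):
    """Forces as the negative gradient of the potential energy,
    estimated here with central finite differences as a stand-in
    for the analytic/backprop gradient a real model provides."""
    f = np.zeros_like(positions)
    for i in range(positions.shape[0]):
        for d in range(3):
            plus, minus = positions.copy(), positions.copy()
            plus[i, d] += eps
            minus[i, d] -= eps
            f[i, d] = -(energy_fn(plus) - energy_fn(minus)) / (2 * eps)
    return f

def velocity_verlet_step(positions, velocities, masses, energy_fn, dt):
    """One step of Newtonian dynamics with the velocity Verlet integrator."""
    f = forces(positions, energy_fn)
    velocities = velocities + 0.5 * dt * f / masses[:, None]
    positions = positions + dt * velocities
    f_new = forces(positions, energy_fn)
    velocities = velocities + 0.5 * dt * f_new / masses[:, None]
    return positions, velocities
```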
We're actually scaling these up to millions of atoms at this point, doing shock simulations through materials and looking at how the wavefront propagates. The traditional techniques for doing this would be your quantum mechanics methods, which were briefly mentioned in the last talk, where you're solving the time-independent Schrödinger equation.
You have this Hamiltonian that describes the different physics of your problem, but the real key here is that we're treating the electrons in the system as a probabilistic cloud, and so we have a very expensive, iterative eigenvalue problem that we have to solve to get an energy out. There are many techniques for doing this: semi-empirical methods, density functional theory, post-Hartree-Fock methods, and the cost grows rapidly with the number of electrons in the system.
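For reference, the eigenvalue problem being described is the time-independent Schrödinger equation, with N the number of electrons. The scaling exponents below are the commonly quoted textbook ones for these method families, not figures taken from the talk.

```latex
\hat{H}\,\Psi = E\,\Psi ,
\qquad
\text{cost} \;\sim\;
\begin{cases}
\mathcal{O}(N^{3}) & \text{semi-empirical, DFT (roughly)} \\
\mathcal{O}(N^{5})\text{--}\mathcal{O}(N^{7}) & \text{post-Hartree--Fock, e.g.\ coupled cluster}
\end{cases}
```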
So as the system grows in size, these methods get very expensive, up to N to the seventh for the higher-cost coupled cluster methods. Typically in organic chemistry, people would instead employ force fields to simulate something bigger, something like a protein in a binding pocket. The way these work is that you have what is effectively a many-body decomposition of your system: you describe torsional four-body interactions, you have a bonding term, and so on, as sketched below.
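A minimal sketch of the kind of additive force-field decomposition being described, using the generic textbook functional forms; the particular terms shown are illustrative rather than the specific force field the speaker has in mind.

```latex
E_{\mathrm{FF}}
= \sum_{\mathrm{bonds}} k_b\,(r - r_0)^2
+ \sum_{\mathrm{angles}} k_\theta\,(\theta - \theta_0)^2
+ \sum_{\mathrm{torsions}} V_n\,[1 + \cos(n\phi - \gamma)]
+ \sum_{i<j} \left[\frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} + \frac{q_i q_j}{r_{ij}}\right]
```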
Okay, I'm not going to spend much time here, because I think we've seen this about 20 times now. But the idea is that we're doing supervised learning: we have inputs, we have some unknown function that you're putting your inputs into, you get a label, and these labels are the things you're trying to fit to.
So it's the standard supervised learning process. For deep neural networks, you have an input layer and hidden layers; you have an input vector, you apply your weights, get your pre-activation, apply an activation function, and you get your output. I won't spend much time on this unless someone doesn't know how it works yet. Sorry, we're getting there for our application, but the input could be whatever; it depends on your problem, it's your descriptor. For our application, what we end up inputting is effectively coordinates.
The problem is that our reference function is very slow, and we don't want to have to run molecular dynamics on it because it would be intractable. Also, if we take care in the way we build our datasets and in the way our machine learning model is built, in the end we can train on small systems with fewer atoms and then later test on things with more atoms. This is the idea of extensibility: extending up to bigger systems.
So what are these deep learning potentials? There are a couple of types that have shown up in the literature in the past. For the first type, you have some kind of fixed-size vector that describes an entire molecule, so it could be internal coordinates, which means distances between atoms and angles between them. You feed this into a deep learning model, you get a property, and that's it; very simple. The approach we believe to be better came out probably back around 2005.
By 2007 people had started using this approach, where you have a fixed-size vector describing the chemical environment around a given atom. So for the three atoms in this water molecule, you have three vectors; you feed these through your deep neural network model, get an atomic property for each, sum them up, and then your total property is the sum of the individual site properties.
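Here is a minimal PyTorch-style sketch of that atomic decomposition, with one small subnetwork per element and the total property taken as the sum of per-atom outputs. The layer sizes, element list, and `descriptors` input are placeholder choices of mine, not the speaker's actual architecture.

```python
import torch
import torch.nn as nn

class AtomicNetworkPotential(nn.Module):
    """Total property = sum of per-atom predictions, one subnetwork per element."""
    def __init__(self, descriptor_dim, elements=("H", "C", "N", "O")):
        super().__init__()
        self.nets = nn.ModuleDict({
            el: nn.Sequential(
                nn.Linear(descriptor_dim, 64), nn.CELU(),
                nn.Linear(64, 64), nn.CELU(),
                nn.Linear(64, 1),
            )
            for el in elements
        })

    def forward(self, species, descriptors):
        # species: list of element symbols, one per atom
        # descriptors: (n_atoms, descriptor_dim) environment vectors
        site_energies = [
            self.nets[el](descriptors[i]) for i, el in enumerate(species)
        ]
        return torch.stack(site_energies).sum()
```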
I want to give you an idea of what we mean when we talk about something being transferable. A lot of people talk about machine learning needing to generalize; in our case this means it needs to be transferable and extensible. Transferable just means able to predict on things not in the training set.
Extensible is a subset of that, which means able to predict on things larger than what's in your training dataset. This is important because we can't run quantum mechanics to get data about a protein in a box of water that's thirty-five thousand atoms; it wouldn't finish in my lifetime if I tried to run it. So we want to be able to train on small things and then eventually have the model predict on these bigger systems.
The problem with these fixed-size, whole-molecule descriptors is that if you add, say, another molecule to the system and have a water dimer, that fixed-size descriptor is no longer fixed: you now have more bonds, more distances, more angles between atoms, so it grows, and then your deep neural network needs to be retrained.
This is a great place to talk about putting physics back into your model. Why not build a potential that just predicts forces? Because it wouldn't be conservative. We know from physics that the energy function should be conservative, meaning the forces should conserve energy if we run time-domain simulations on them. One way you could predict forces is to just directly predict them as a property, like this.
The problem is that since machine learning inherently has some error, you're never going to get conservative forces this way: the error prevents the predictions from actually representing a conservative vector field. The better way is to have your model predict the energy and take forces as its gradient, which is essentially the definition of a conservative vector field.
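Stated compactly, the distinction being made is the standard one (my paraphrase, not a slide from the talk): forces obtained as the negative gradient of a scalar energy are curl-free and do no net work around a closed path, whereas directly predicted forces carry no such guarantee, so energy drifts during dynamics.

```latex
\mathbf{F}_i \;=\; -\,\nabla_{\mathbf{r}_i}\,E(\mathbf{r}_1,\dots,\mathbf{r}_N)
\qquad\Longrightarrow\qquad
\oint \mathbf{F}\cdot d\mathbf{r} \;=\; 0
```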
So, to represent these chemical environments, look at this oxygen here. You have an atom at some distance, another atom at another distance and an angle. If another molecule is outside some cutoff radius, we don't consider it, because typically in chemistry, when something moves far enough away, the interactions drop off to zero fairly quickly. That's not always the case, so this is an approximation.
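To make neighbor contributions vanish smoothly at that cutoff radius, potentials in this family typically use a smooth cutoff function; the cosine form below is the common Behler-style choice, shown as an illustration rather than the exact function used in the speaker's models.

```python
import numpy as np

def cosine_cutoff(r, r_cut):
    """Smoothly scales neighbor contributions to zero at the cutoff radius."""
    r = np.asarray(r, dtype=float)
    fc = 0.5 * (np.cos(np.pi * r / r_cut) + 1.0)
    return np.where(r < r_cut, fc, 0.0)
```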
There are some common components used to make these descriptors invariant. Back in 2007, Behler and Parrinello introduced descriptors that look something like this. The idea is that if you have an atom centered at zero and some neighbor j at, say, 3 angstroms away, you evaluate these peaked functions and reduce over all j atoms into a spectrum, which is a fixed-size descriptor vector. Once you sum over all the atoms, this basically describes the distribution of atoms in your local chemical environment.
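A minimal sketch of one such radial symmetry function (the Behler-Parrinello G² form), reducing over all neighbor distances into a fixed-size vector; the eta and shift values are arbitrary illustrative choices, not the parameters used in the talk.

```python
import numpy as np

def radial_symmetry_functions(r_ij, r_cut=5.2, etas=(4.0,), shifts=np.linspace(0.5, 5.0, 16)):
    """Sum Gaussians of neighbor distances, damped by a smooth cutoff,
    into a vector whose length is independent of the number of neighbors.

    r_ij: 1D array of distances from the central atom to its neighbors.
    """
    r_ij = np.asarray(r_ij, dtype=float)
    fc = 0.5 * (np.cos(np.pi * np.clip(r_ij / r_cut, 0.0, 1.0)) + 1.0) * (r_ij < r_cut)
    features = []
    for eta in etas:
        for rs in shifts:
            features.append(np.sum(np.exp(-eta * (r_ij - rs) ** 2) * fc))
    return np.array(features)
```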
We've released modifications to these descriptors that just work better in certain conditions. This one is actually cool: the mu and sigma are learnable parameters, so we're having the machine learning model learn what the descriptors should look like, which makes a little more sense. We've published three models on this topic so far. The original one, in 2017, was one of the first that showed extensibility and transferability within what I would say is an entire class of molecules: organic molecules.
We trained this thing on about 22 million different DFT calculations, of different molecules, then tested it on other organic molecules, and we're getting really good performance out of it. The second model brings back some long-range interactions through an iterative update, and it actually acts more like a graph convolutional neural network; we'll talk more about how that works in a second. And then there's a third model, which I'll also talk about. The first one is, in my opinion, the simplest.
You have these radial descriptors that build the spectrum I was talking about, summing over your j neighbors, and you build these angular descriptors, which sum over your j,k neighbor pairs, and you concatenate these, differentiated by element type. So if your neighbor j is a hydrogen, you put that contribution in a specific place in the array; if your neighbor k is a different type, you put it in a different part of the array. You sum over all your neighbors and feed this in here.
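A small sketch of that element-resolved bookkeeping: each neighbor's contribution is accumulated into the block of the descriptor assigned to its element, so the final vector length is fixed regardless of how many neighbors there are. The element ordering and block size are illustrative choices, not those of the speaker's model.

```python
import numpy as np

ELEMENTS = ("H", "C", "N", "O")  # fixed element ordering defines the blocks

def species_resolved_descriptor(neighbor_species, neighbor_features, block_dim):
    """Accumulate per-neighbor feature vectors into per-element blocks, then concatenate."""
    blocks = {el: np.zeros(block_dim) for el in ELEMENTS}
    for el, feat in zip(neighbor_species, neighbor_features):
        blocks[el] += feat  # sum over all neighbors of this element type
    return np.concatenate([blocks[el] for el in ELEMENTS])
```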
It's just a dense, fully connected neural network, the most basic thing ever; you get a per-atom energy prediction, sum over those, and you get your total energy. It's a very simple concept. Then for forces, of course, you just take the gradient of this: do a backpropagation through the network, through the descriptors, and back to the atomic coordinates, and that's what gives you the forces on the atoms.
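For the force part, here is a minimal sketch of backpropagating from the summed energy to the coordinates with PyTorch autograd. Here `model` stands for any differentiable energy model mapping coordinates to a scalar total energy (for instance, a per-atom network like the earlier sketch wired to differentiable descriptors); it is not the speaker's specific code.

```python
import torch

def energy_and_forces(model, coordinates):
    """Forces as the negative gradient of the predicted energy w.r.t. coordinates."""
    coordinates = coordinates.clone().requires_grad_(True)
    energy = model(coordinates)                                   # scalar total energy
    forces = -torch.autograd.grad(energy, coordinates)[0]         # backprop to coordinates
    return energy.detach(), forces
```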
The HIP-NN-style neural network works more like a graph convolutional model. Basically, you take this radial information, build your radial descriptor vector, and then you do this interaction, which looks like a graph convolution, between neighboring sites, or between neighboring neural networks. With this communication you feed into a dense network (actually, these are recurrent neural nets), and then you carry out this interaction iteratively, over and over again.
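As a rough illustration of what such an iterative, graph-convolution-like interaction does, here is a generic message-passing update over a neighbor list: each atom's features are refreshed from a distance-weighted sum of its neighbors' features. This is a generic sketch of the idea, not the actual HIP-NN equations.

```python
import torch
import torch.nn as nn

class InteractionLayer(nn.Module):
    """One generic message-passing pass between neighboring atomic sites."""
    def __init__(self, feature_dim):
        super().__init__()
        self.message = nn.Linear(feature_dim, feature_dim)
        self.update = nn.GRUCell(feature_dim, feature_dim)  # recurrent-style state update

    def forward(self, features, neighbor_index, weights):
        # features: (n_atoms, feature_dim); neighbor_index: (n_atoms, max_nbr) int indices
        # weights: (n_atoms, max_nbr) distance-dependent coefficients
        msgs = self.message(features)[neighbor_index]        # gather neighbor messages
        agg = (weights.unsqueeze(-1) * msgs).sum(dim=1)      # distance-weighted sum
        return self.update(agg, features)                    # update each atom's state
```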
You sum up your predictions along the way until your end result is an energy at site i, and summing over those produces your total energy. Because these interaction layers are iterative, you're actually able to pull in longer-range information than whatever cutoff you've put on your model. And then the AIMNet-style neural network potential is the monster of these things, which was recently accepted in Science Advances. You build your radial and angular descriptors.
You combine these with atomic feature vectors, which are now learnable vectors describing the type of atom you have. In the very first iteration, all hydrogens have exactly the same vector, and all carbons have exactly the same vector. You combine them, sum over all j neighbors, and feed it through a neural network, and you get an update to your atomic feature vectors. So in the next iteration, not all hydrogens are equal, and not all carbons are equal.
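A rough sketch of that idea: per-element embedding vectors that are shared across atoms on the first pass, then updated per atom from the local environment on each subsequent pass. This is a schematic of the concept, with invented layer shapes and update rule, not the published AIMNet architecture.

```python
import torch
import torch.nn as nn

class AtomFeatureUpdater(nn.Module):
    """Element embeddings, identical per element at t=0, become atom-specific after updates."""
    def __init__(self, n_elements, feat_dim, env_dim):
        super().__init__()
        self.embed = nn.Embedding(n_elements, feat_dim)       # shared per-element vectors
        self.update = nn.Sequential(
            nn.Linear(feat_dim + env_dim, feat_dim), nn.SiLU(),
            nn.Linear(feat_dim, feat_dim),
        )

    def forward(self, species_idx, env_descriptors, n_passes=3):
        feats = self.embed(species_idx)                       # t = 0: all H alike, all C alike
        for _ in range(n_passes):
            feats = feats + self.update(torch.cat([feats, env_descriptors], dim=-1))
        return feats                                          # now atom-specific
```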
Oxygens are not all equal either, so it's like learning what type of atom each one should be described as. Then you have another vector, a latent space that you're learning, and you concatenate that with the new atom-type vector; this gets fed through the atomic neural network, another atomic neural network on each site. From there you can predict different quantities, such as the energy at that site, summed for your total energy. It can actually predict charges as well.
We show that it predicts other quantities that are very useful to chemists, and the neat thing is that through this process you get these iterative updates, indexed by t, of the atomic feature vectors. What you can see in this plot is the DFT charge, which is our reference level of theory, against our model's predicted charge at iteration 1.
What we're looking at is the charge on this sulfur with respect to changing the functional group R, with the different R groups shown in the plot. The key here is that these are all outside of the model's cutoff radius; the model can't see further than 4.6 angstroms explicitly. But through this iterative update, at t = 1 you see that the prediction is flat: it's not learning anything about that changing R group beyond the cutoff radius. Then at later iterations,
it starts matching the DFT better, because it's actually able to pull in longer-range information from the chemical environment. To give you an idea of how accurate these things are, we did a big active learning study where we let the model drive itself through chemical space to figure out what new data should be included in the next iteration. Our model is shown here on this correlation plot: this axis is our model, this is the reference level of theory, which is density functional theory, and it's a log plot.
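As an aside on the active learning loop mentioned above: a common way to let a model choose its own new data is to train an ensemble and flag configurations where the ensemble members disagree. The sketch below illustrates that general query-by-committee idea; the `.predict` interface and threshold are assumptions of mine, not a description of the exact procedure used in this work.

```python
import numpy as np

def select_for_labeling(candidate_descriptors, ensemble, threshold):
    """Pick candidates whose ensemble energy predictions disagree the most.

    ensemble: list of trained models, each exposing a .predict(descriptors)
              method (hypothetical interface for this sketch).
    Returns indices of candidates to send for new reference (e.g. DFT) calculations.
    """
    preds = np.stack([m.predict(candidate_descriptors) for m in ensemble])  # (n_models, n_candidates)
    disagreement = preds.std(axis=0)            # high spread = models uncertain here
    return np.where(disagreement > threshold)[0]
```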
So actually the vast majority of the data is pretty spot-on in our prediction. These two, DFTB and PM6, are semi-empirical methods we're showing just as a baseline, to show you that even compared with quantum mechanics methods themselves, we fit very accurately to our level of theory. These are faster quantum mechanics methods which are considered less accurate, but they're what people would typically use to go simulate something that's a couple hundred atoms. And sorry, yes, this is a test set.
These molecules, every one of them, are bigger than anything in the training dataset. This is a protein that's got about 350 atoms, this one is a protein with some 212 atoms, and these are force predictions, by the way, not direct energy predictions. We wanted to look at how well this could actually be used in a bulk-phase simulation, so we took this protein with a drug ligand in the binding pocket.
The reason this is interesting is that drug companies would be interested in running these simulations and actually obtaining the free energy of this thing binding inside the pocket; then they can get an idea of whether it's, say, relatively more stable than another molecule they're looking at. This system is about 35,000 atoms. This is something you could never run with quantum mechanics; it just wouldn't happen, you'd probably run out of memory. And the drug ligand contains seven different elements.
The big thing here is that the timing is amazing: we're able to get a time step, that is, a single force calculation for all 35,000 atoms, done in about 300 milliseconds or so. And the simulation actually converged to about a 1 angstrom RMSD, which means the structure is not falling apart, so it's showing you that it's actually retaining the crystal structure that we originally got from the Protein Data Bank.
Now, the really cool thing about this is: imagine you've only trained a model on organic chemistry. What would you expect it to do if you then showed it something that's more of a materials problem, like a whole bunch of carbon in a box at a high temperature? Most people would say it wouldn't work, that you'd get garbage out or something. But actually, we ran this simulation, 24 thousand atoms at 2,500 Kelvin, and we start this model.