Description
More about this lecture: https://dl4sci-school.lbl.gov/tess-smidt
Deep Learning for Science School: https://dl4sci-school.lbl.gov/agenda
A: Good morning, everyone. Welcome again to another Deep Learning for Science School lecture. I'm really excited to have Tess Smidt with us today to talk about symmetry and equivariance in neural networks, one of the most important topics we need to tackle in order to incorporate physical knowledge into our neural network architectures.

A: Tess Smidt is the 2018 Alvarez Postdoctoral Fellow in Computing Sciences here at Berkeley Lab. Her current research interests include building neural networks from first principles for rich data types, accelerating existing techniques, and creating new capabilities for computational chemistry and materials science.

A: Tess earned her PhD in physics from UC Berkeley in 2018. As a graduate student, she used quantum mechanical calculations to understand and systematically design the geometry and corresponding electronic properties of atomic systems. Tess has been working on developing neural networks for applications in chemistry and the physical sciences for a while now. She interned on Google's Accelerated Science team, where she developed a new type of convolutional neural network called tensor field networks that can naturally handle the 3D geometry of physical systems.

A: This is a topic very close to what she will be talking about today, so I'm really pleased to have Tess here. Tess, thank you for joining us. For everyone else: please remember to ask questions in the Q&A, not in the chat; that helps us route the questions to Tess. And yes, please ask questions.
B: Well, Mustafa, thank you so much for that lovely introduction, and thank you so much for having me. Hello, everyone, I'm Tess. I'm really excited to be here and to share a bit of what I love about symmetry and equivariance in neural networks. It's a really large topic, so I'm not going to be able to completely do it justice today, but I'm certainly going to try, and I'm going to focus on things that are relevant to the scientific questions you're interested in investigating.

B: Because this is Zoom and I can't see all of your lovely faces, what I will need your help with is asking questions whenever I'm saying something that doesn't make sense. I will also be monitoring the Zoom chat, and I'm going to check in pretty frequently: before I go on to the next slide or the next concept, I'll be checking the chat. I would really, really love it.
B: If we don't get past slide five, I mean, that'd be a bit surprising, but that would be fine. And so, with that, I first want to explain what this image is that I'm showing you. I wanted to give an example of why symmetry and equivariance matter in scientific data, so I thought this would be a good example. On the, let's see, on the left (I always have trouble with left and right), so on the left:

B: We have a water molecule that's just rotating in 3D space, and then on the right we have this matrix, and this matrix happens to represent the electronic interactions between the different orbitals on the different atoms of water. So you have the oxygen orbitals and the two sets of hydrogen orbitals, and it's a matrix because you're basically asking how strongly these orbitals interact with respect to each other.
B: The reason I'm showing you this diagram is that while the water molecule being rotated looks pretty simple (you're like, okay, a neural network should probably be able to understand that a water molecule is rotating, that's not too hard), the problem is that a lot of the quantities we're interested in in science look like this. This is actually the Hamiltonian matrix of water, which, as you can see, transforms in a very complicated and hard-to-recognize pattern as the molecule rotates.

B: So if you're asking a neural network to learn that all of those different matrices actually mean the same thing, because they're all just the same Hamiltonian for water in different rotations, you can imagine that might be difficult. This is where symmetry comes in: if I know that I can rotate a water molecule into any coordinate system, and that the Hamiltonian matrix also rotates with any coordinate system, then I know I only actually need one of these Hamiltonians to reconstruct any of them.
B: So this is kind of the power of symmetry if you employ it in machine learning models; I'll specifically talk about neural networks, but this applies more broadly to other methods.

B: Let's see if I can get my cursor over here. Okay, so a quick outline. I want to talk about what kinds of assumptions are actually built into neural networks and into their operations, and then I want to talk more about why symmetry appears in scientific problems, or just in general computational tasks. Then, at some point in that explanation (I may have actually reordered things a little bit), I will describe symmetry: invariance versus equivariance.
B: These two words come up a lot together and are often used interchangeably, but they actually mean something very different; related, but different. Invariance means things don't change under some transformation; equivariance means things do change. Physicists in the audience will also recognize the word covariance: a covariant tensor is an equivariant quantity. Once I've disentangled those two definitions, which may actually happen combined with point four, I'll describe how you actually make models symmetry aware, the reasoning for why you do it certain ways versus others, and the pros and cons of each approach. Then I'm going to do a case study: I'll describe how you get equivariance to Euclidean symmetry, specifically in what are called Euclidean neural networks, which has now become a superset of things like tensor field networks, Clebsch-Gordan nets, and 3D steerable CNNs.
B: We've kind of rebranded them all as Euclidean neural networks, since they are all equivariant to the same symmetry. I think that's going to be something that will probably be new for a lot of people, but it should hopefully be insightful as to what considerations you need to think about when making these models. Okay.

B: Point five: then we'll talk about the consequences of making your model symmetry aware. Often you make a model symmetry aware for one reason, and then you get a bunch of consequences that were not things you intended, and I can give some very concrete examples from when we made tensor field networks. Then I'll give a brief recap, and then I'll put up a slide that just has a bunch of resources (not exhaustive, but better than none), with links and papers that might be of interest to you.
B: Okay, I don't see any questions, so I'll go ahead. Neural networks are specially designed for different data types, and assumptions about the data type are actually built into the operations. In these next figures, W is going to roughly represent the neural network or something learnable (some learnable parameters), and x is going to be whatever I give to that neural network.

B: If I have a bunch of data arrays, I might use a dense neural network. This is often considered the most general type of neural network; basically, it's some linear operation mixed in with a bunch of nonlinear operations. But it's all operating on these data vectors, and the assumption built into the network is that the components of the vector are independent.
B: I don't need any special consideration for how these different components relate; there's nothing special in the relationship between those pieces of data. If I have a 2D image, I might use a convolutional neural network, and what's baked into a convolutional neural network is that the same feature can appear anywhere in the image and mean the same thing.

B: So if I get a fluffy ear in this part of the image or a fluffy ear in that part of the image, it's a fluffy ear; they're not different types of fluffy ears, they're just fluffy ears. And this has an assumption of locality: pixels next to each other have a closer relationship, so if I want to learn about this pixel, I should look at the pixel next to it.
B: If we have text: I know recurrent neural networks have maybe been superseded by transformer networks, but they're still good for illustrating this point. Recurrent neural networks expect sequential data, and specifically the assumption is that the next input or output depends on what came before.

B: And then graphs. If you have a graph, you might use a graph neural network, and I put graph convolutional neural network because there's actually an explosion of types of graph neural networks and they can all do very different things. I'm going to talk more specifically about graph convolutional neural networks, where you have an aggregation approach that depends on your nearest neighbors. But this is a case where you have some graph, which is just topological data, and the nodes have features on them.
B: The edges can have features too, and the network passes messages between the nodes via the edges; that's kind of a broad definition. And then I'm putting this one in a special box because, as I said, it's probably the one you're less familiar with: if you have 3D physical data, so data that was generated in 3D space by an experiment or in a simulation,

B: you might want to consider using a Euclidean neural network, because for data in 3D Euclidean space you always have the freedom to choose your coordinate system, and that assumption is baked into the network. So as you can see, there are a lot of assumptions going on that I think a lot of people don't talk about with these different networks, and the key thing about these assumptions is that symmetries actually emerge from them.
B: For 2D images, we have translation symmetry. For recurrent neural networks it's kind of interesting: you have time translation symmetry, but only in the forward direction, because you're using the same module over and over again. You're assuming that, regardless of when something happens in time, it should be interpreted in the same way; the module still takes in the history, but the module itself is identical.

B: So it has a sort of forward time translation symmetry. For graph convolutional neural networks, not in all cases but in most cases, the desire is to have permutation symmetry: I can order the nodes any way in my computer, but the graph should still mean the same thing if I permute them, and I'll talk about that a little bit more. And then lastly, for Euclidean neural networks, you have Euclidean symmetry. So I'll quickly check whether anyone has questions there.
B: I'm also going to switch the power from my iPad to my laptop to make sure my laptop doesn't die. Okay, I don't see any questions, so I will continue. So yes, symmetry emerges when different ways of representing something mean the same thing, and you can talk about symmetry in many different ways. You can talk about the symmetry of the representation, the space of possibilities that something can be. Operations themselves can preserve symmetry.

B: So if a certain space has a given symmetry, does this operation muck that up or does it preserve that symmetry? And then objects themselves, objects existing in these spaces and representations, can have symmetry. So it can get really confusing as to what in the world people actually mean by symmetry. I'll leave the operations that preserve symmetry for later in the talk, but I'll at least give some examples of what it means for a representation to have symmetry versus an object having symmetry.
B: For example, I've highlighted this little kagome tiling; that's a specific basket weaving, but it also shows up in materials science, in case people are wondering where the name came from. You have these upside-down-triangle features, and if I'm assuming translation symmetry in 2D, I'm saying that all those upside-down triangles are the same; they mean the same thing. But this right-side-up triangle pattern does not mean the same thing.

B: It would mean the same thing if I had 2D Euclidean symmetry, where I was including rotations and things like that, but it's not included if I purely have translation symmetry, which is the symmetry assumed by a convolutional neural network. So convolutional neural networks do not think of those as the same thing: they don't think of the orange box and the blue box as the same thing. Okay, so what about the symmetry of 2D objects?
B: Something I think is kind of interesting is that the boundary of your image actually breaks translation symmetry. A purely convolutional neural network, if you don't tack on a dense network at the end, can apply to an image that's infinitely large; it generalizes to that case, so it can operate on images of any size. The only thing that limits what size of image you can use is typically the dense layer

B: that's used for classification, for example. So that's kind of interesting: you can have these sub-symmetries within the image. But once you have the boundary, you've sort of fixed the origin. If you put back periodic boundary conditions, you would recover discrete translations: you don't have continuous translation of 2D space, but you have discrete translations, like if you had a unit cell.
B: I don't see anything, so we'll go ahead. Okay, let's talk about permutation symmetry. Permutation symmetry is the symmetry of sets, and it means that if I have 10 items, I can list them in any order, but it's still the same set of 10 items. In the case of graphs, where this comes up, I might have a graph with a particular topology, like this graph in pink here, and I might have the data for each node stored in memory,

B: you know, in array locations 0, 1, 2, 3, 4, 5. But the graph means the same thing if I decide to reorder those nodes. So I don't want my algorithm to be sensitive to the order, because the only reason those nodes are ordered is the details of how computers compute: the fact that they need to store things in memory and that they need to compute.
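To make that concrete, here is a minimal sketch of my own (with made-up node features, not anything from the talk): a readout that sums over node features gives the same answer no matter how the nodes happen to be ordered in memory.

```python
# Minimal sketch (illustration only): a sum over node features is
# permutation invariant, so reordering the nodes in memory changes nothing.
import numpy as np

rng = np.random.default_rng(0)
node_features = rng.normal(size=(6, 4))      # 6 nodes, 4 features each

perm = rng.permutation(6)                    # an arbitrary reordering in memory
readout_original = node_features.sum(axis=0)
readout_permuted = node_features[perm].sum(axis=0)

assert np.allclose(readout_original, readout_permuted)
```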
B: So it's funny, because with symmetry there's sort of the platonic ideal of what are called groups; the group of 3D rotations is this platonic ideal, but normally, when we think of rotations, we're thinking of three-by-three rotation matrices or an axis and an angle. The actual group is sort of the ideal of what it means to operate on something with some operation, but how you actually go about expressing that transformation depends on your representation.

B: It depends on what I am acting on. Am I acting on Cartesian space with a three-by-three matrix? Am I acting on spherical harmonics? What am I acting on? So I'm not sure if that really answers your question, but I certainly do want to; I believe the question you're getting at is important.
B: Yes, okay, thanks for the clarification. So what Michael was saying is that I'm just talking about the order of the nodes in memory and not necessarily saying that these nodes are the same. For example, if I look at node three in the top picture, it's connected to three different other nodes, but in the bottom picture it isn't; and yes, when I'm labeling them, I'm labeling them just as they are in memory, not as them being the same.

B: Cool, it's very exciting; thanks for asking questions, folks, love it. Okay. So if I have a graph, I can also say this graph has a certain symmetry, and I think this is what's touching on Michael's question. If you've heard the term graph automorphism, this is where it comes in: specific nodes are indistinguishable because they have the same global connectivity. If I look at their nearest neighbors, they look the same; next-nearest neighbors, they look the same; and so forth.
B: So if I look at their entire connectivity, you can have nodes that look the same. In this little bowtie graph, you can see that the two orange nodes are symmetrically indistinguishable, as are the blue nodes. So hopefully that answers a bit of Michael's question, and now let me check on the next question.

B: Okay, so yeah, permutation symmetry is really important in any neural network where you basically have nodes or points or geometry: anything where you have to store it in memory but it needs to be treated as a set. And, you know, even when you do a convolution on images,
B: one of the ways that convolutions get away from this problem is that they're dealing with geometry. How you apply the operation doesn't at all depend on which index something has in memory; all that matters is: are you in my neighborhood, and what is my relative distance?

B: Okay, Carrie Anne is asking if I can read out the question that I'm answering. Absolutely, got it; oh, I see, it doesn't appear because some of them are in the panel. Okay. So the question was: how is permutation symmetry applicable to neural networks? Neural networks, in my understanding, are directed acyclic graphs, so order matters. I think the picture being referred to here is the picture with all the nodes and connections that people often use to describe dense neural networks. So, yeah:
B: I don't think about neural networks that way, because I find it confusing. When I think of dense neural networks, I literally think about this picture here (I hope you can see my cursor): I basically just have matrix operations applying to vectors. That's what I think about when I think of a dense network, so I don't think of the neural network itself as needing permutation symmetry; the symmetry is about the actual operations and the data they act on.

B: I'm not sure if that answers your question, but I can try again later too. Thank you guys so much for the questions; I will go ahead and proceed. We talked a bit about permutation symmetry; now I want to talk about Euclidean symmetry, and again I want to emphasize that the reason we have symmetry in 3D space, despite the fact that the world is messy and asymmetric, is that we always have the freedom to choose our coordinate system, and the physical system should mean the same thing.
B: So it's kind of unintuitive, because the world is not symmetric and yet there's an underlying symmetry of 3D space. Basically, if we took everything out of 3D space and were left with empty 3D space, we wouldn't be able to tell the difference if we'd moved a bit, rotated a bit, or inverted our coordinate system; it would all look the same. So the underlying space, the underlying representation, has the symmetry.

B: But we can also talk about the symmetry of geometric objects, and I think this concept is more familiar to folks, especially in chemistry and materials science. They're much more familiar with the symmetry of geometric objects; they're also familiar with the symmetry of 3D space from dealing with things like geometric tensors, 3x3 matrices, elasticity tensors, and all that jazz. But I think when people say an object has symmetry, they usually mean the symmetry of the object and not of the underlying representation.
B: This benzene molecule, for example, is highly symmetric. It has a six-fold rotation axis, several mirror planes, and several two-fold rotation axes, so it's a highly symmetric molecule; in case you're interested, the point group is D6h. I just wanted to introduce the concept that the representation has a symmetry and the object can have a symmetry.

B: So I hope that's a little clarifying. Now let's talk about how we actually go about making models symmetry aware, and I'm going to use the example of how we make a model that understands the symmetry of an atomic structure, taking benzene as the example again. The most general way to express any geometry is to use coordinates, but this is problematic because, in general, coordinates are sensitive to translations, rotations, and inversions: the numbers change.
B: So if I want to give these numbers to a neural network, chances are it's going to be sensitive to any of these changes. There are roughly three different ways we might handle this. Approach one: it doesn't matter, we're doing deep learning, throw all the data at the model and see what you get. That is a very valid approach, and in many cases it will actually work, with enough training and GPU burning and things like that.

B: Approach two is to convert your data to a representation the neural network can't possibly mess up. You've basically modded out, gotten rid of, all of these sensitivities to any choice of coordinate system or, in the case of graphs, any sort of permutation. You just have, say, a number or a set of numbers that describe things about the graph without actually addressing any individual nodes. That would be an invariant representation.
B: So basically you mod out all the tricky bits, and that's a fairly good approach too. Approach three: if there's no model that naturally handles the symmetry of your system, you can build one.

B: I guess the point is rigor, or, you know, you have certain guarantees in certain cases. So approach one is basically the approach of data augmentation: if I just show my model enough rotated examples, it'll eventually get the idea that these things are the same. And while it's true that it does learn to give the same output for them, it doesn't actually learn that they are the same. So yeah, data augmentation, and also putting constraints in your loss function, which is typically very similar to data augmentation.
B: So I'm going to treat data augmentation and loss function constraints as roughly the same thing; these are very good approaches, and I'll discuss them. Approach two is basically changing your input, making your data fit the model rather than making the model fit your data. And then approach three, which I'll go into in more detail, is more similar to what I talked about on my first slide, where these different models build in certain assumptions about data types.
B: Great, okay: let's talk about data augmentation. I said this a little bit on the previous slide, but it's always good to re-emphasize: data augmentation is the brute-force approach to teaching your model to be symmetry aware, or at least to emulate it. Again, there's not really a guarantee that it's always going to behave predictably when you rotate an object, but in most cases it will. In 2D, for images, people typically get away with about a two- to ten-fold augmentation, and there are various reasons for that: image

B: datasets tend to be pretty large and have a lot of rotations within them already. But if you have 3D data, like molecules in 3D space, then data augmentation gets very expensive.
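To make the cost concrete, here is a minimal sketch of rotation augmentation for a point cloud (my own illustration, not code from the talk; it uses SciPy's rotation utilities): every training example gets duplicated once per sampled rotation.

```python
# Minimal sketch of 3D rotation augmentation for a point cloud (illustration only).
import numpy as np
from scipy.spatial.transform import Rotation

def augment_with_rotations(points, n_copies=10, seed=0):
    """Return n_copies randomly rotated versions of an (N, 3) point cloud."""
    rotations = Rotation.random(n_copies, random_state=seed)
    return np.stack([rotations[i].apply(points) for i in range(n_copies)])

water = np.array([[0.00, 0.00, 0.00],     # toy coordinates, not real geometry
                  [0.76, 0.59, 0.00],
                  [-0.76, 0.59, 0.00]])
augmented = augment_with_rotations(water, n_copies=10)
print(augmented.shape)  # (10, 3, 3): ten rotated copies of the same molecule
```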
B: What kind of data... sorry, okay, I'll finish this slide and then I'll get to the question. Sorry, I'm realizing that I'm very distractible. So yes, it's very expensive. I wanted to give this example: if you're training without symmetry and you want to train a model to recognize that this is a cube,

B: you'd have to show all these different rotations of the cube (I'm assuming this is a 3D model; I'm showing a 2D picture, but it's a 3D model), and each of these rotations looks very different. But if my model knows that things are the same up to, you know, maybe some rotation or inversion or translation,
B: I only need one example, and that's very helpful. Okay, now I'm going to go to the Q&A. What kind of data type is used for the benzene example? Yeah, so that data type would technically be a geometry, or a geometric tensor, but you can also make invariant representations of the benzene molecule. So it's actually kind of an open question how you want to handle that. Does that answer your question?

B: I do want to emphasize that data augmentation does make sense in many cases, particularly if it's very difficult to formalize when two things are similar; you're like, okay, these things are kind of similar, but I can't formally show how I transform this one into that one. That can be quite difficult. Or if there's some quantity that you want to conserve that isn't so easy to write down,
B: basically, anything that's hard to articulate as a group or as a transform. Then by all means feel free to add that to your loss function terms or to use data augmentation, especially for things that are kind of messy, like anything that's approximately similar, perturbations, things like that. So I think there are cases where data augmentation makes sense.

B: They don't tend to come up in what I do, just because I'm always dealing with geometry, and so you really do need to deal with geometric tensors, but yeah, it totally can be appropriate for your needs. Okay, I will go back to the Q&A. The question is: is there a benefit in terms of self-supervised learning with the equivariant nature of data? Does that influence how the neural network structure is also made equivariant, for symmetric self-supervised learning?
B: I'm not exactly sure what you mean by self-supervised learning; I don't know if that means something like an autoencoder, which is kind of unsupervised or semi-supervised. But I would say that, in general, having equivariance and symmetry in your network just always helps if you can make that assumption safely

B: with your data, which in many scientific data sets you can: permutation symmetry and the symmetry of 3D space are pretty good assumptions to make. That's only going to help, because you're basically allowing your model to focus on the actual data rather than on learning what a rotation is doing. So I think it makes your model able to focus on the more pertinent features, the things you actually wanted it to pay attention to. So I would say it probably helps; I don't know if that answers your question.
B: But that's my answer. Great, another question: can we learn the symmetry if it's hard for us to write down? Yes, you can. I didn't put this in the resources, but there's this really pretty paper by Taco Cohen and friends called homeomorphic variational autoencoders, something like that, and there are actually a lot of people working on this. I do not work on this, so there is a way to do it,

B: but I don't know how to do it; again, I use the same group for everything. There are ways of doing this, I just don't often know what they are. On the last slide I have a link: there's actually a workshop tomorrow on equivariance and data augmentation, and I think there are at least three or four talks on learning symmetries or learning invariants from data, so that would probably be a better source of information to answer your question. Okay, cool, thanks for the questions. I'm gonna go ahead and continue.
B: So we talked about data augmentation; that's approach one. Let's talk about approach two. Let's say I have some invariant representation that basically sweeps under the rug all the complexities of the symmetry of how I would most naturally represent my object. An invariant representation might be: I have a molecule, and I say, well, an invariant representation is how many atoms it has, how many carbon atoms and how many hydrogen atoms, because no matter how many times I rotate my molecule,

B: it has the same number of atoms. That's what we mean by an invariant representation: you featurize the data object to have features that don't change under a change of coordinate system or a change of permutation, anything like that.
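Here is a minimal sketch of that idea (a toy example of my own, not the SOAP-style descriptors discussed later): featurize a molecule by element counts and sorted pairwise distances, both of which are unchanged by rotating, translating, inverting, or re-indexing the atoms.

```python
# Toy invariant featurization: element counts plus sorted pairwise distances.
# Both are unchanged by rotations, translations, inversions, and atom reordering.
import numpy as np
from collections import Counter

def invariant_features(symbols, coords):
    counts = Counter(symbols)                        # e.g. {'C': 6, 'H': 6}
    diffs = coords[:, None, :] - coords[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    iu = np.triu_indices(len(symbols), k=1)
    return counts, np.sort(dists[iu])                # sorting removes atom order

symbols = ["O", "H", "H"]
coords = np.array([[0.0, 0.0, 0.0], [0.76, 0.59, 0.0], [-0.76, 0.59, 0.0]])
print(invariant_features(symbols, coords))
```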
B: The nice thing about invariant representations is that you can throw them at any neural network and it can't mess them up.

B: It'll be fine. And the nice thing, too, is that if you make a really good invariant representation, you can use it with any machine learning algorithm; it doesn't have to be neural networks. I think you see this a lot in cheminformatics and materials informatics: people have spent many, many years of their lives crafting these gorgeous invariant representations of chemical and material systems that are very effective for certain things.
B: I will say, however: if you can craft a good representation, that's fantastic, that's great, but deep learning's specialty, its secret sauce, is its ability to learn representations. So if you really want to use a specific invariant representation, you may want to go with a different machine learning model. There are so many machine learning models that are more interpretable than neural networks; neural networks are sort of the most black-boxy.

B: So if you're not using a neural network to learn those features, you may just have a better time, or be able to get more out of your model, by using something like a kernel method or a decision tree. Those are all fantastic models for getting concrete insights from your data. So those are my thoughts on that one. Okay, one question in the Q&A: an invariant representation will need feature engineering effort, right? If so, what's the advantage of deep CNNs in terms of automatic feature extraction?
B: Okay, this touches on what we were just talking about. Yes, invariant representations do need feature engineering; that's sort of the definition, so you can think of an invariant representation as equivalent to feature engineering. The advantage of a deep CNN in terms of automatic feature extraction is, well, the fact that it's automatic. Also, something that can be nice is if you really want to use the fact that your model is differentiable and be able to,

B: you know, update your input with respect to gradients, which is really a fun game to play: I put my input through the model and I get some output, but what if I want the output to be slightly different, how does that change the inputs? So if you want to use this differentiability, which can be especially fun for geometry, that can be nice. What are other reasons?
B: I guess, if you want to be open to being able to toggle on and off different interactions between the inputs and outputs. This is something more relevant to Euclidean neural networks, where all the interactions in the network have very specific data types, so you can actually turn some of them off and turn others on, and you can use that to run experiments on how important, say, vector-vector interactions or 3x3-matrix-by-3x3-matrix interactions are.

B: So I think it allows you to run experiments. That's what is really interesting and powerful about neural networks: if you craft them well, they become little toys that you can experiment with to learn about your data, what does and doesn't work.
B: I'm in the middle of a different point, but let's really hone this notion of invariance versus equivariance. An invariant does not change under any transformation; it's the same number. You know, mass: your mass is the same regardless of your orientation.

B: Otherwise, that'd be a fantastic weight loss program; you could just gain or lose mass by rotation. But that's not the case; it's a scalar. So invariants do not change, and things that are equivariant change deterministically.
B: They do change under a specific operation, but if you handed me that operation, I would know exactly how the quantity transforms. Let's take the example of a 3D vector. A 3D vector has three properties: it has a magnitude, which I'm picturing here in orange with the bars; it has a direction, which I'm picturing with the pink arrow; and it has a location, the little purple dot. If we consider how these properties change under translation, rotation, and inversion, they each transform differently. The magnitude of a vector is invariant.

B: If I have two particles and some relative distance between them, that distance, that magnitude, doesn't change no matter how I change my coordinate system. So relative distances are invariant under rotation, translation, and even inversion. There's some interpretive group theory dance for you.
B: The direction: it doesn't matter where the vector is located, it still points in the same direction. However, if I rotate it or invert it, it is different; it transforms predictably. I can apply a rotation matrix or an inversion operator to see how that vector changes, but it is different.

B: Last but not least, the location. Points in 3D space are sensitive to all of these: translation, rotation, and inversion, except in the very special case where the point is at the center of rotation or inversion. But generally speaking, they do change if I rotate around another point or invert across another point. Okay, I'll look at the Q&A.
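A quick numerical check of those three behaviors (a sketch of my own, with arbitrary numbers): the magnitude of a relative vector survives a random rotation plus translation, the direction rotates with it, and an absolute position picks up both effects.

```python
# Sketch: how the three properties of a vector behave under a rigid motion.
import numpy as np
from scipy.spatial.transform import Rotation

rng = np.random.default_rng(1)
p1, p2 = rng.normal(size=3), rng.normal(size=3)
R = Rotation.random(random_state=1).as_matrix()
t = rng.normal(size=3)

r = p2 - p1                          # relative vector between two particles
r_new = (R @ p2 + t) - (R @ p1 + t)  # the same vector after rotating + translating

assert np.isclose(np.linalg.norm(r), np.linalg.norm(r_new))  # magnitude: invariant
assert np.allclose(r_new, R @ r)                             # direction: equivariant
assert not np.allclose(p1, R @ p1 + t)                       # location: changes
```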
B: Quick question: what types of properties in materials or chemistry or physics depend strongly on symmetry? All of them. Okay, that sounds maybe a bit facetious, but it actually is all of them. I don't know if any of you have had a professor who says "and by symmetry we can see this, and by symmetry we can see that"; these are the types of arguments that a lot of people use to describe materials.

B: Here's an example: if I have a crystal that is symmetric under inversion, so I can take (x, y, z) to (-x, -y, -z), there is absolutely no way that that material can host any property that looks like a vector. So it can't have a polarization; polarization is the first one that comes to mind. And this impacts what the elasticity tensor looks like.
B: So if something has inversion symmetry, basically how much it compresses in all directions is the same; it has an isotropic elasticity. I think that's correct; I'm pretty sure that's correct.

B: I think that's the case. So there are a lot of them: normal modes of materials, or normal modes of molecules, meaning how molecules wiggle in response to light, are symmetry dependent. So,
B: there are a lot of them, and I certainly have not done full justice to this question. All right, I'm going to go to the next question: what about discrete rotations, would you say, like the cubes? Yeah, the cubes have a symmetry, definitely. I maybe should have included a slide specifically about space groups and point groups. You have 3D Euclidean space, and then there are these subgroups, which are space groups and point groups, and space groups are how you can tile patterns in 3D space.

B: So ignore the heading at the top, because it doesn't relate to what we're talking about at the moment. But yes, you can have discrete rotations and discrete translations, and that can still be a symmetry; these are actually subgroups of Euclidean symmetry. What's really cool is that if you have a network that has Euclidean symmetry, you get all of these subgroups for free. I have Euclidean symmetry in space, but if I put a sphere in space, then the symmetry of my sphere is only 3D rotations and inversions, which is the group called O(3).
B: If I put a cone into 3D space, I still have rotational symmetry this way, and I have mirrors along here, but I don't have continuous symmetry like this. If I have a cube in 3D space, I have lost all continuous symmetries, but I still have discrete rotations.

B: I have a three-fold axis along the diagonals, four-fold axes along the faces, and a bunch of other symmetries; this is the point group Oh, the octahedral point group. The octahedron has the same symmetries as the cube, fun fact. And then space groups are what you get if you, let's say, take a cube and tile it in 3D space. And actually this is wrong: it should be 230 for the space group number.
B: Okay, so hopefully this clarifies a little bit what invariance versus equivariance means. And yeah, it's totally okay if something is equivariant to only discrete rotations; it doesn't have to be all rotations. In this special case, for this vector, it is all rotations, but a cube is invariant under certain discrete rotations. Actually, a vector is invariant under rotations around its own axis, so maybe that connects more closely to the question that was asked.

B: Okay, I wanted to give an example. Again, I'm a bit biased because I work on atoms, so all my examples are for atoms; I hope those of you who do not work on atoms do not feel alienated. But I wanted to give an example of some invariant featurization algorithms and how they can actually be really sophisticated and, especially, very expressive if well crafted. One such invariant representation is SOAP kernels, and I have a link to the paper on my resources slide.
B: There are equivariant operations that then produce invariant quantities, and I'll talk about what it means to be an equivariant versus an invariant operation on a later slide, but I just wanted to touch on this before we move from representations to models. A SOAP kernel roughly works like this: let's say I have this ethane molecule, where the yellow atoms are carbon and the hydrogen atoms are blue, and I'm going to project the local neighborhood of those carbon atoms onto spherical harmonics; I won't go through exactly what that procedure is.

B: There is actually a backup slide showing it that I'm happy to go through if that's of interest to folks, but you can basically see that if my carbon atom is here, this roughly represents where my hydrogen atoms are for this first carbon, and then this represents where the hydrogens, and this carbon atom over here, are for the other carbon. So we have these two signals, and you can see that they are equivariant quantities, because if we rotate them, they look different.
B: But what we're going to do, now that we have these two equivariant quantities, is basically perform a dot product, kind of the tensor equivalent of a dot product, and that's what we're doing down here. All this shows is what that looks like numerically, and I'm separating it by which spherical harmonic the coefficients are generated from; there are more spherical harmonics the higher in frequency you go. So this shape just corresponds to these numbers, and what we can do is a dot product: basically multiply these elements.

B: So this element multiplies this element, and then we sum across the l's, and we get this number here. So we get these seven scalars. This is a simplified version, but it's a relatively sophisticated operation, and it turns out that these types of operations, if you have enough of them (you do this for all the different atoms in your local environment), can be an extremely expressive invariant representation of geometry.
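Here is a rough sketch of the flavor of this construction (a simplification of my own, not the actual SOAP implementation): project neighbor directions onto spherical harmonics, then contract the coefficients within each l; the resulting per-l "power spectrum" does not change under rotation.

```python
# Rough sketch of SOAP-flavored invariants (a simplification, not the real SOAP):
# project neighbor directions onto spherical harmonics, then sum |c_lm|^2 over m
# within each l.  That per-l power spectrum is unchanged by rotations.
import numpy as np
from scipy.special import sph_harm

def power_spectrum(neighbor_vectors, l_max=3):
    x, y, z = neighbor_vectors.T
    r = np.linalg.norm(neighbor_vectors, axis=1)
    theta = np.arctan2(y, x)              # azimuthal angle
    phi = np.arccos(z / r)                # polar angle
    spectrum = []
    for l in range(l_max + 1):
        c_lm = np.array([sph_harm(m, l, theta, phi).sum() for m in range(-l, l + 1)])
        spectrum.append(np.sum(np.abs(c_lm) ** 2))
    return np.array(spectrum)

neighbors = np.random.default_rng(2).normal(size=(4, 3))   # toy local environment
print(power_spectrum(neighbors))
```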
B: That's just to give an example. Okay, I'm going to look at the Q&A: do these networks work well, achieve high accuracy, on crystal structures that have periodic boundary conditions? Okay, I'm going to assume you mean Euclidean neural networks. Euclidean neural networks can handle periodic boundary conditions very naturally, and we haven't yet tested them much on crystal structures, just because we don't have enough people using them; if you would like to do so, I would love to chat with you about that.

B: But if you look at analogous models like SchNet, which is sort of an invariant version of Euclidean neural networks, they get pretty good accuracy on several crystal prediction tasks, and our network should do as well, if not much better, purely from an expressivity point of view: the operations we have in that network are just more expressive, they can express more complex interactions. So in theory it should, but we haven't totally demonstrated it yet, just because of time and person power. So yeah, great, okay.
B: Now let's talk about invariant versus equivariant models, because we talked about invariant representations, but you could still, for example, give your network your data in its full, unaltered glory, with all the messiness of translations and rotations, and have the model operate in a way that only acts on invariant quantities. So it only acts on relative distances rather than relative distance vectors; it doesn't have the xyz components, it just has the distance. An invariant model would handle the distance;

B: the equivariant one would handle a vector. For a function to be equivariant (and this is general; it's not with respect to any specific group, just any set of operations), the function is equivariant if we can either act on the inputs or act on the outputs. In the case of rotation, I could take my molecule and rotate my molecule, or, let's say I was predicting forces on a molecule,
B: I could instead rotate the forces that were predicted for that molecule. I can do it in either order; the symmetry operation commutes with the function. So I can apply it either to the inputs or to the outputs. For the case of an invariant function, what that means is that g is the identity: the inputs to that function are invariant quantities, and the outputs of that function are also invariant quantities. So that's what it means to be an equivariant versus an invariant function.
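Written out, the condition is f(g · x) = g · f(x) for every g in the group. Here is a minimal numerical sketch of my own (not from the talk): the center of mass of a point cloud is equivariant to rotation, while the radius of gyration is invariant.

```python
# Sketch of the equivariance condition f(g . x) = g . f(x), checked numerically
# for two simple functions of a point cloud.
import numpy as np
from scipy.spatial.transform import Rotation

def center_of_mass(points):             # equivariant: rotates with the input
    return points.mean(axis=0)

def radius_of_gyration(points):         # invariant: a plain scalar
    return np.linalg.norm(points - points.mean(axis=0), axis=1).mean()

points = np.random.default_rng(3).normal(size=(5, 3))
R = Rotation.random(random_state=3).as_matrix()

assert np.allclose(center_of_mass(points @ R.T), center_of_mass(points) @ R.T)
assert np.isclose(radius_of_gyration(points @ R.T), radius_of_gyration(points))
```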
B: Okay, a question: is the importance of detecting these symmetries, or of symmetry-preserving neural networks, to save computation, or am I missing something? This is great. There are many reasons. Probably the easiest one to motivate is getting rid of data augmentation, so it does save a lot of training time, and it saves on how many parameters you need. And actually, I'm about to talk about this next, but I'll talk about how your data goes further with equivariant functions.

B: There are other reasons too. There are a lot of these kind of unintuitive consequences, which I'll talk about, that end up being super beneficial. As a spoiler alert: Euclidean symmetry has a bunch of really interesting consequences, space groups, point groups, geometric tensors, second-order phase transitions. You would think you needed thermo for that, but it actually falls out naturally from studying Euclidean symmetry, and I'll show you some examples of that if we have time. How are we doing on time?
B: I think it's easier to interpret what the model is doing, and depending on how you craft your operations, you can again experiment a bit more, asking: does this interaction help model my data, yes or no? But in order to ask those questions, you do have to have a pretty tailored network, so putting effort into the network makes it a better tool for answering scientific questions.

B: Okay, so why limit yourself to equivariant functions? Why not use a more general function? One of the reasons is that you can substantially shrink the space of functions you're searching over. Let's say I have inputs and I have outputs, and I want some function that maps my inputs to my outputs, and I know that these pieces of data have some symmetry, say Euclidean symmetry.
B: I know that I can change my coordinate system. Now, you could just do data augmentation and teach the model to learn about rotations and things like that, or I could have a model that inherently has those operations built into it under the hood, that respects those operations. So if I take all learnable functions, that's a huge, huge space; then there's all equivariant functions, which is a much smaller space. It's still a very large space, but it's smaller, and physics lives only in equivariant functions: all physical phenomena are equivariant.

B: The functions you really wanted to learn are the overlap of equivariant functions and functions constrained by your data. So by constraining your network to be equivariant, your data goes a lot further, because you're being a lot more specific; you're narrowing in on which function you want. Okay, I want it to be equivariant: cool, you've narrowed down that space. And then I have this data and the function needs to be compatible with that: okay, well, that narrows it down to this spot.
B: So it makes your data a lot more powerful. Okay, I'll look at the Q&A really quick: do we need to implement the function g as a neural network layer? Great question: no, this is just a property of the function. It's more of a condition that your neural network must satisfy.

B: You never have to know what g is, because you prove it for all g, so it just is the case. And this is really interesting, because you don't have to know the symmetry of the object for the symmetry to be preserved by the network. The network doesn't know the symmetry of your object; it never does. It just says: I'm going to act in a way that preserves whatever symmetry this thing has. And what's really interesting is that, computationally speaking, knowing what something is versus preserving a certain property it has
B: are two very different computational tasks. It's really easy to make something have permutation symmetry, whereas it's actually really difficult to tell whether two graphs are the same; graph isomorphism is a very hard problem. But ensuring that if they are the same, you get the same answer,

B: that's a lot easier. Now, another graph might also give the same answer, and that's why isomorphism is hard. But yeah, I think that's really important: symmetry-equivariant networks don't actually know the symmetry of whatever you're giving them, they just can't violate it. You're just kind of handicapping them, tying their hands behind their back, saying: okay, you will preserve rotation and translation and inversion operations.
B: Great question, okay. Another question: have you been benchmarking Euclidean neural networks on different prediction tasks against other high-performing models, such as different types of graph neural networks? I have not, but my good colleague Ben Miller has; well, I mean, we did, but Ben really did all the heavy lifting.

B: We do have a paper out on arXiv right now, benchmarked against QM9, which is a data set of small molecules, and I have the arXiv number on the resources slide. It is a top performer on predicting the dipole moment, the magnitude, which is interesting because it's secretly a vector quantity even though it's a scalar in that data set. So yes, we have, but we would like to do more benchmarking; again, there's only so much time in a day.
B: Okay, another question: systems governed by partial differential equations, like, say, fluid turbulence, have symmetries and conservation laws. Yes, they do. You've shown some excellent examples from atomic physics and chemistry; do you have pointers on how I might derive symmetries of PDEs and incorporate them effectively, either through data transformations or by constructing clever models? And then they say something about Lie algebras being incomprehensible math. I'm totally there with you; Lie algebra is a bit of a mess. I mean, it's very elegant, but

B: it's not the most fun thing to learn. Yeah, so PDEs are really interesting. There's a bunch of work on neural ODEs and on constructing models where, if you kind of know the partial differential equation you're solving, you can fit certain parameters that are unknown, and this is out of my element. Stephan Hoyer at Google (I worked with him while I was on the Google Accelerated Science team),
B: this is something he's super passionate about, and I think Mustafa and Steven, I think you guys are probably more plugged into this than I am, so maybe you can provide some suggestions. I know very little about using neural networks to solve partial differential equations.

B: I think there's a lot of really interesting stuff coming out, and I think those methods, coupled with a Euclidean neural network, could be super powerful, so I'm really excited about any potential applications there. Yeah, I was talking to Stephan about that once and saying that'd be kind of the next step for some of the stuff they're doing in 3D. But I'm sorry, I'm not as helpful discussing partial differential equations; it's a little bit outside my knowledge zone.
B: Cool, thanks, Mustafa. All right, another question: does applying symmetry work well with transfer learning? Does applying equivariant constraints make it more challenging to apply your network, trained on one data set, to another? Would you have to change symmetries and constraints so it will work on the other data set? Great question. Typically you would do transfer learning between, say, different sets of molecules, so I would typically stay within the same symmetry. I don't think it would make it harder.

B: I can't immediately think of a use case where you would want to apply, say, a permutation-symmetric graph network to something that's Euclidean. I mean, you could, but typically, again, the data type kind of governs which neural network you use. You can definitely get very good transfer learning between differing molecules or different sizes; say you train on ten different molecules with a very high-accuracy calculation and you extrapolate to a hundred. These types of methods actually do really well with that. So that's it.
B: It does seem to be very helpful, and what's been interesting is that there are invariant models too (you can take a Euclidean neural network and make an invariant version), and I was actually discussing with some colleagues yesterday that we were trying to figure out exactly why the equivariant models seem to be so much more data efficient than the invariant ones. It intuitively makes sense: okay, you're able to extract more geometric information more easily. But we don't totally understand, mathematically, the specific mechanism allowing for that.
B: They're able to do much, much better transfer learning. Okay, and then one more question, and then I'll go to the next slide: are you applying Euclidean neural networks after you've obtained data, like some MD data, or applying your neural network during the MD simulation?

B: You can use a trained Euclidean neural network to generate molecular dynamics forces, but typically you first have to train it on some data. So you can do both, but typically you want to do it with a trained network first. There are certain cases where you can actually use the network to uncover certain things, and that's in later slides, which I'm not sure we'll get to, but that's okay. Hopefully that answers your question.
B: Okay, what I might do is go through some slides for a bit and then we can go back to questions, because I haven't even explained to you how Euclidean neural networks work, and you guys are asking me a bunch of questions about them. So I'd better get to that part, so that we can have even more fun questions.
B
B
So
what
I
want
to
point
you
to
is
the
functions
that
you
actually
wanted
to
learn.
They
may
actually
be
an
invariant
function.
In
that
case,
you
are
great.
You
have
nothing
to
worry
about.
However,
it
could
be
that
it
was
like
an
almost
invariant
function,
but
there
was
a
bit
of
equivalence
and
in
that
case,
you're
going
to
be
throttled
on
accuracy
and
there's
certain
situations
where
inherently,
what
you
wanted
to
learn
was
an
equivalent
function
and
the
invariant
function
is
just
not
going
to
be
able
to
do
it.
B
B
So
this
is
kind
of
the
three
different
situations
you
can
find
yourself
in
okay,
so
I
want
to
talk
a
minute
about
convolutions
and
how
it
relates
to
local
versus
global
symmetry,
because
this
is
really
interesting
and
this
kind
of
touches
upon
a
lot
of
the
questions
you
guys
have
been
asking
about:
transfer
learning
so
convolutions
capture,
local
symmetry.
B
So
they
just
look
at
the
kind
of
other
points
around
it
or
the
other
geometry
or
pixels
around
it,
and
the
data
on
those
pixels,
and
it's
it's
only
through
interactions
with
features
in
subsequent
layers
that
yield
sort
of
a
more
global
symmetry.
So,
to
give
an
example,
this
is
a
rubidium,
manganese
chloride
crystal,
and
it
has
these
octahedral
motifs
in
the
crystal
that
occur
in
different
orientations
and
locations.
B
So,
if
we're,
if
we
have
a
convolution
that
is
equivalent
to
3d
rotation,
it
will
understand
that
these
two
octahedron
are
the
same
thing.
However,
they
may
be
in
different
contexts
now,
in
this
particular
case,
their
environments
look
pretty
similar
because
they're
symmetrically
the
same
atom,
basically
but
kind
of
as
you
go
further
out
and
further
out,
as
you
go
from
layer
to
layer,
it
will
eventually
see
oh
well.
My
system
doesn't
have
octahedral
symmetry.
It
actually
has
some
space
group
symmetry
due
to
translations
and
kind
of
the
local.
B
Okay,
so
now
we're
going
to
talk
about
how
do
euclidean,
neural
networks
achieve
euclidean
symmetry
equivalence
and
I'm
going
to
do
this
at
a
fairly
high
level,
because
normally,
if
I
give
a
talk
just
on
this,
it
is
an
hour
so
just
to
kind
of
give
context.
So
euclidean
neural
networks
are
very
similar
to
convolutional
neural
networks,
but
rather
than
operate
on
images.
B
We
operate
on
points
because
for
the
cases
that
were
the
first
applications
that
we
were
most
interested
in,
we
were
interested
on
atomic
structures
and
it's
just
a
lot
easier
to
represent
an
atomic
structure
as
a
point
cloud,
but
these
methods
very
readily
transfer
back
over
to
images.
So
you
can
use
euclidean
neural
networks
on
on
fixed
grids.
Meshes
any
data
type
is
totally
fine.
B
Because we're using points, we use what's known as a continuous convolution. Rather than having a fixed grid for every convolution center to apply its operation on, we compute the relative position between each of the neighbors, and then we plug that r vector into our filter function. So our filter function is actually a function that takes in that r vector, versus just being a pixel grid; there's a slight difference. And of course, one thing that's very different about it is that not only do we have 3D translation equivariance, we also have 3D rotation equivariance, so the network will know that this is a benzene molecule no matter how I translate or rotate it. So again, a quick recap: translation equivariance, the fact that I can identify the bunny rabbit in this image whether it's over here or over there, stems from the fact that we're going to use a convolutional neural network, and the main feature of a convolutional neural network that allows it to be translation equivariant is that it only uses relative coordinates.
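To make the idea of a continuous convolution concrete, here is a minimal sketch of my own (not the speaker's code or any particular library's implementation): each point gathers messages from its neighbors by evaluating a filter function on the relative position vector, instead of indexing a fixed pixel grid. The Gaussian-times-vector filter below is an arbitrary illustrative choice.

```python
import numpy as np

def filter_fn(r_vec):
    """Toy continuous filter: evaluated on a relative position vector,
    not looked up on a pixel grid. (Illustrative choice, not the real filter.)"""
    r = np.linalg.norm(r_vec)
    return np.exp(-r**2) * r_vec  # returns a 3-vector "message"

def continuous_convolution(points, features):
    """For each point, sum filter(r_ij) * feature_j over its neighbors j."""
    out = np.zeros((len(points), 3))
    for i, x_i in enumerate(points):
        for j, x_j in enumerate(points):
            if i == j:
                continue
            r_ij = x_j - x_i  # relative coordinates -> translation equivariance
            out[i] += filter_fn(r_ij) * features[j]
    return out

points = np.random.randn(5, 3)   # a small random point cloud
features = np.ones(5)            # one scalar feature per point
print(continuous_convolution(points, features))
```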
B
Oh, there's a quick question: can you share your slides with us? I think Mustafa is handling how those are shared, so I'm going to just note that to Mustafa. Okay, yes, so translation equivariance: we have convolutional neural networks. But how do we handle rotation equivariance? Harkening back to some of the things we talked about, we could do data augmentation, so we just show a bunch of rotated bunny rabbits, or the molecules in a rotated bunny rabbit. Or we could do something invariant: an invariant model where we only care about the distance, not about the relative direction of one pixel with respect to another or one atom with respect to another, so that would be some radial function. Really, though, we want a network that preserves the geometry and exploits the symmetry of the problem.
B
So we have something that's similar to a convolutional neural network, except we have very special filters, and everything in our network is tensor algebra: rather than scalar multiplication, we have the tensor generalization of that, and I'll talk about that on the next slide. Our convolutional filters are separable into two components: a learned radial function, basically saying "if I'm this far away, what's my output," which we then multiply by a spherical harmonic to handle the angular distribution. The essential reason for this, and I'm happy to go into more detail at the end of the talk, is that spherical harmonics have very beautiful properties under rotation, very nice transformation properties. If I give you a linear combination of spherical harmonics with l equals two and then I rotate my coordinate system, that signal still only has l equals two components. You can think of it as the angular frequency being preserved: the specifics of which l equals two spherical harmonics describe the signal will change, but the frequency of the signal doesn't change.
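As a rough illustration of this separable filter structure (a sketch of my own, not the actual tensor field network filters): the filter value on a relative position r is a learned radial profile R(|r|) multiplied by spherical harmonics of the direction. Here the "learned" radial function is a tiny hand-written stand-in for an MLP, and only the real l = 1 spherical harmonics are shown, which are proportional to the normalized (y, z, x) components.

```python
import numpy as np

def radial_mlp(r, w1, w2):
    """Stand-in for a learned radial function R(|r|): a one-hidden-layer MLP."""
    h = np.tanh(w1 * r)        # hidden layer; weights are placeholders
    return float(w2 @ h)

def sph_harm_l1(r_vec):
    """Real l=1 spherical harmonics, proportional to (y, z, x)/|r|."""
    return r_vec[[1, 2, 0]] / np.linalg.norm(r_vec)

def filter_l1(r_vec, w1, w2):
    """Separable filter: learned radial profile times angular part."""
    r = np.linalg.norm(r_vec)
    return radial_mlp(r, w1, w2) * sph_harm_l1(r_vec)

rng = np.random.default_rng(0)
w1, w2 = rng.normal(size=8), rng.normal(size=8)   # placeholder "learned" weights
print(filter_l1(np.array([1.0, 2.0, 0.5]), w1, w2))
```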
B
Okay, I'll take one quick question: "One symmetry that I don't think you've mentioned is scale invariance. What sort of models can exhibit scale equivariance?" Okay, this is a really interesting question. For Euclidean symmetry we don't assume scale invariance, and the reason for this is that physics is different at different length scales. There are models that handle scale equivariance; I didn't link to any of them, but I know Daniel Worrall was working on some of them.

B
There are definitely papers on this. In a lot of cases they'll do it by augmenting their filters: if you have a filter with fixed learned parameters, you make, say, five copies of the filter that basically zoom the filter in and out. So you can do it in that sort of way, or you can do dilated convolutions, but I think there are also more rigorous ways of doing it.

B
I'm just not aware of them offhand, but it is something that's discussed in the literature. It's very relevant for image recognition, because if you want to recognize a cat in an image, you want to recognize a tiny cat as well as a close-up of the cat. So it's very relevant, and there is work on this, but I am largely ignorant of it.
B
So thanks for the question. Okay, so yeah, we have special filters based on spherical harmonics and learned radial functions, and then we basically have to replace all the scalar operations in our network. Normally you get your filter function and your input, and you just multiply them and sum. Multiplication in this case can no longer just be scalar multiplication; it actually has to be a tensor multiplication, or a tensor product. To give an example of why this is necessary:

B
how do you multiply two vectors? There are sort of three different answers. I could take two vectors and compute their dot product, and that gives me a scalar, an invariant quantity. I could take two vectors and compute their cross product, and that gives me back a vector. Or I could take the outer product, and that will actually give me a matrix. So those are three different ways you could combine or multiply vectors.
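A quick numerical illustration of these three "multiplications" of two vectors (just plain NumPy, not code from the lecture):

```python
import numpy as np

a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 2.0, 0.0])

scalar = np.dot(a, b)     # dot product   -> invariant scalar (l = 0)
vector = np.cross(a, b)   # cross product -> (pseudo)vector   (l = 1)
matrix = np.outer(a, b)   # outer product -> 3x3 matrix, a rank-2 tensor

print(scalar)   # 0.0
print(vector)   # [0. 0. 2.]
print(matrix)   # 3x3 array
```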
B
It turns out that when you put them in neural networks, everything in the network is a geometric tensor, which surprised us, and I can talk about that a little bit more. Everything in the network now has to obey the rules of tensor algebra, and if you've ever heard of things like Clebsch-Gordan coefficients or Wigner 3j symbols, that's where these come into play; and if that doesn't mean anything to you, don't worry about it. Okay.

B
So what do you get from this? Because this seems very mathematically not-fun; what do you get from it? Well, if I give a Euclidean neural network a molecule and a rotated version of it, and I want to predict molecular dynamics forces, for example, the forces predicted for the rotated version will be the same as for the original, modulo the rotation of the molecule.
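This property is easy to check numerically. Below is a minimal sketch of my own (with a toy placeholder standing in for the real model) of the test you would run: predict forces for a molecule and for a rotated copy, and confirm that rotating the input is the same as rotating the output.

```python
import numpy as np

def predict_forces(positions):
    """Placeholder for an equivariant model. Here: a toy pairwise spring force,
    which happens to be exactly rotation equivariant, standing in for the network."""
    forces = np.zeros_like(positions)
    for i in range(len(positions)):
        for j in range(len(positions)):
            if i != j:
                forces[i] += positions[j] - positions[i]
    return forces

# rotation by 30 degrees about the z axis
theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])

pos = np.random.randn(4, 3)                     # a small "molecule"
f_of_rotated = predict_forces(pos @ R.T)        # rotate input, then predict
rotated_f = predict_forces(pos) @ R.T           # predict, then rotate output
print(np.allclose(f_of_rotated, rotated_f))     # True for an equivariant model
```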
B
Additionally, these networks generalize very well to molecules with similar motifs, and again that's because the convolutions are sensitive to local symmetry. They're able to understand local symmetry, and then through exchanging messages between different atoms you get a global symmetry picture, but that happens in a hierarchical fashion.

B
Okay, another thing is what happens if I have a Euclidean neural network and I show it these unit cells. There's a very easy way to articulate periodicity in Euclidean neural networks, basically using graphs. So when I have a crystal, represented by a 3D box that I then tile in 3D space (this is silicon), I can choose to represent it in the smallest unit cell, the primitive cell, the conventional unit cell, or some supercell. To a Euclidean neural network these all look the same; you're guaranteed to get the same per-atom output regardless. So rather than worrying about putting your unit cell in a specific convention, you just give it whatever unit cell you have on hand, as long as it's numerically precise enough to have the symmetry you want.
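To make the per-atom claim concrete, here is a small sketch of my own (not the lecture's code) using a simple invariant per-atom descriptor on a 1D periodic chain: a sum of Gaussians over periodic neighbor distances within a cutoff. The same atom gets the same value whether the chain is described with a one-atom cell or a doubled supercell.

```python
import numpy as np

def per_atom_descriptor(cell_length, positions, cutoff=6.0):
    """Invariant per-atom descriptor for a 1D periodic chain:
    sum of exp(-d^2) over all periodic neighbor distances within a cutoff."""
    n_images = int(np.ceil(cutoff / cell_length)) + 1
    descriptors = []
    for x_i in positions:
        total = 0.0
        for x_j in positions:
            for n in range(-n_images, n_images + 1):
                d = abs(x_j + n * cell_length - x_i)
                if 0.0 < d <= cutoff:
                    total += np.exp(-d**2)
        descriptors.append(total)
    return np.array(descriptors)

primitive = per_atom_descriptor(1.0, [0.0])        # one atom per cell
supercell = per_atom_descriptor(2.0, [0.0, 1.0])   # same chain, doubled cell
print(primitive, supercell)                        # identical per-atom values
print(np.allclose(primitive[0], supercell))        # True
```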
B
Then
then
that'll
be
fine
with
the
network
and
then
going
back
to
this
first
example
that
I
showed
on
my
slide.
If
you
train
a
euclidean
neural
network
on
a
single
water
molecule
in
a
single
water
hamiltonian,
you
naturally
get
back
the
rich
variety
of
what
this
matrix
looks
like
under
rotation.
So
this
is
what
you
get.
B
This
is
the
payoff
of
all
the
math
that
which,
by
the
way,
is
under
the
hood,
so
you
don't
have
to
deal
with
it,
which
is
why
we
made
a
framework
for
doing
this
and
I'll
link
that
in
okay,
so
we're
coming
up
a
bit
on
time,
so
I
want
to
just
quickly
go
through
some
unintended
intuitive
consequences
of
equivariance.
B
So suppose I asked you, for example: let's take this bow tie graph and partition it with a permutation equivariant function into two different sets, so basically split it up into two even groups using ordered labels. I want you to train a model that learns to predict zero-one or one-zero on each of the nodes.

B
What you'll find is that the network can't do it, and the reason why is not because your model is broken but because the question is ill-posed. You can imagine: oh well, why didn't it learn either the left graph or the right graph, why didn't it learn one of these partitions? The thing is, the network can't distinguish these two outputs, and so the best thing it can do is produce their average; it'll actually produce a degeneracy.

B
There's nothing to distinguish the orange partition or the blue partition as being first or second; they are themselves sets of partitions. And so this is something where you're like, oh darn it, I really wanted to use this for graph partitioning, and I really need this permutation equivariance, but now I can't use it. So you have to figure out how to ask your question in a way that still respects the symmetry of your data type. That's kind of an interesting unintended consequence.
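Here is a tiny numerical way to see the averaging effect just described (a sketch under my own assumptions, not the lecture's experiment): on the bow-tie graph, swapping the two "wings" is a graph symmetry, so the two degenerate target labelings are mapped onto each other by a permutation the model cannot distinguish. The best symmetric least-squares fit to either valid answer is their average.

```python
import numpy as np

# A 5-node bow-tie graph: nodes 0,1 = left wing, 2 = shared center, 3,4 = right wing.
# Two degenerate targets: the same partition, viewed before and after swapping the wings.
target_a = np.array([0.0, 0.0, 0.0, 1.0, 1.0])   # group 0 = left wing + center
target_b = np.array([1.0, 1.0, 0.0, 0.0, 0.0])   # group 0 = right wing + center

# A permutation equivariant model cannot prefer one wing over the other, so its
# output must be invariant under the wing swap. The symmetric least-squares fit
# to either valid answer is the average of the two degenerate targets:
symmetric_prediction = (target_a + target_b) / 2
print(symmetric_prediction)   # [0.5 0.5 0.  0.5 0.5] -- the degenerate "average" output
```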
B
Okay, I see there's a question here: "Interesting phenomena such as coherent structures in spatiotemporal systems can be defined as local deviations from the symmetries of the system. The symmetries can be hard to define simply, but presumably can be learned by the neural network. I'm wondering if a neural network that has learned the underlying symmetries of the system can be used to detect coherent structures in new data as broken symmetries. Can you think of a way to build a neural network that does that?"

B
So thank you for the question. One thing that I love about working with these networks is that I've learned a lot; it's really changed how I think about symmetry. One of the ways is that I don't think of symmetry as an on-or-off thing. When we talk about space groups, a structure is either in a space group or not, but when it comes down to it, what symmetry really is is the cancellation of certain interactions.
B
So,
like
you
know,
if
I
have
an
atom
to
my
left
and
an
atom
to
my
right,
I
can
get
cancelling
contributions
if
they're
like
equally
spaced
from
me,
but
if
one's
slightly
to
the
right
or
you
know
slightly
perturbed,
it's
still
a
fairly
symmetric
configuration,
I
will
mostly
get
cancellation
from
those
two
quantities
and
so
again
these
networks,
don't
necessarily,
I
don't
know
specifically
about
making
a
network
that
detects
partial
symmetries.
B
But
even
if
you
do
have
something,
that's
perturbed
that
it's
going
to
be
hard
for
the
network
to
fight
the
fact
that
it
still
looks
mostly
symmetric
and
this
this
actually
does.
This
is
physical.
We
often
think,
oh,
if
I
you
know,
take
a
hill
and
I
roll
down
the
hill
or
like.
If
I
perturb
a
little
bit,
then
I
roll
down
the
hill,
but
that's
assuming
dynamics
and
dynamics
can
make
small
changes
grow.
B
So
unless
you
learned
like
a
really
huge
weight
to
be
paid
on
to
this
very
small
difference,
it
won't
really
be
able
to
amplify
that
perturbation.
I'm
not
sure
if
that
actually
answered
your
question,
but
I
think
it
was
related.
B
You
can
craft,
I
believe
you
can
use,
for
example,
including
neural
network,
to
detect
space
group
symmetries.
I
think
you
can
do
that.
How
you
would
articulate
it
is
actually
a
little
complicated
because
of
how
you'd
represent,
like
certain
symmetry
operations,
and
I
assume
that
nuance
would
transfer
over
to
other
groups
if
you
in
case
you're
interested
in
things
that
are
not
the
euclidean
group,
but
again,
as
far
as
for
building
neural
networks
that
can
learn
or
detect
certain
symmetries.
B
I
highly
recommend
the
workshop
that
I'm
going
to
put
on
the
resource
slide
because
they
might,
they
might
have
some
better
answers
for
you,
okay
and
then
one
more
question.
When
optimizing
euclidean
neural
networks,
do
we
have
issues
of
getting
stuck
and
degenerate
subspaces?
Yes,
you
do,
and
I
will
talk
about
that
in
another
slide.
So
first
I
want
to
just
emphasize
that.
Okay,
we
have
some
of
this,
so
equivalencies
again
can
have
unintended
consequences,
and
so
the
input,
intermediate
and
output
data
of
neural
networks
must
be
geometric.
Tensors.
B
We
didn't
realize
this
when
we
started,
we
just
wanted
a
network
that
had
rotation
equivalence.
We're
like
all
I
wanted,
was
rotation
experience
and
I
got
geometric
tensors
space
groups.
All
this
other
stuff
kind
of
goes
to
show
that
sometimes
it's
worth
going
through
the
hassle
of
getting
that
echo
variance,
because
you
may
end
up
getting
a
bunch
of
things
that
you
secretly
wanted,
but
you
didn't
know
that
you
could
get
so
geometric
tensors.
B
These
really
just
lovely
objects,
I'm
extremely
biased,
because
I
work
with
them
all
the
time
but
essentially
like
let's
say
I
have
a
three
by
three
matrix
which
I'm
representing
with
these
colors
and
the
x
x
and
x
y
and
all
this
stuff.
I
could
equally
represent
that
as
a
linear
combination
of
spherical
harmonics
as
a
shape.
So
it's
interesting
because
geometric
tensors
can
be
as
much
thought
about
as
geometric
shapes
as
a
numerical
objects,
and
then
you
also
get
a
bunch
of
really
interesting
data
types.
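As a concrete version of "a 3x3 matrix as a combination of spherical-harmonic-like pieces" (my own sketch of the standard decomposition, not code from the talk): any 3x3 matrix splits into an l = 0 part (the trace), an l = 1 part (the antisymmetric piece, equivalent to a pseudovector), and an l = 2 part (the symmetric traceless piece), and these pieces don't mix under rotation.

```python
import numpy as np

M = np.random.randn(3, 3)

l0 = np.trace(M) / 3.0 * np.eye(3)                       # l = 0: isotropic (scalar) part
l1 = (M - M.T) / 2.0                                      # l = 1: antisymmetric part (a pseudovector)
l2 = (M + M.T) / 2.0 - np.trace(M) / 3.0 * np.eye(3)      # l = 2: symmetric traceless part

print(np.allclose(M, l0 + l1 + l2))   # True: the three pieces rebuild the matrix
print(np.trace(l2))                    # ~0: the l = 2 piece is traceless
```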
B
When you deal with geometric tensors, for example, in 3D space there are four distinct vector-like quantities. There's the classic vector, which is equivariant under inversion, equivariant under rotation, and invariant under reflection along the vector. A pseudovector does not flip when you invert space. Then, if you have a double-headed ray, it has a lot of the same properties as a vector, except you can invert it. And then you have things like a spiral, which you can rotate by 180 degrees, and you can invert it.

B
So there are all these little data types, and you can actually tell the network specifically, "this is a spiral." That granularity of defining what your object is is super, super fun and super useful. Okay, so Euclidean neural networks can also only produce outputs that have equal or higher symmetry than the inputs.
B
So
these
are
two
different
situations:
I'm
going
to
input
either
a
tetrahedron,
so
I'm
going
to
input
a
tetrahedron
geometry
or
an
octahedron
geometry,
and
these
are
the
outputs
of
three
randomly
initialized
models
that
have
been
asked
to
generate.
Spherical
harmonics
from
l
equals
zero
to
l
equals
six,
and
you
can
see
that
these
shapes
all
look
substantially
different,
but
they
all
have
the
same
symmetry
as
whatever
I
gave
it
as
input.
B
So
there's
many
different
ways
to
produce
signals
that
have
a
certain
symmetry,
and
so
this
is
sort
of
something
that
we
realized
after
the
fact,
but
that's
been
useful
and
one
reason
why
it's
useful
is
that
you
can
use-
and
this
this
pertains
to,
I
believe,
nicholas's
question
about.
Do
you
end
up
getting
degeneracy
issues?
B
Yes,
you
do,
but
you
can
also
figure
out
a
way
to
get
out
of
them
because
of
the
equivariance
of
your
network,
so
equivalent
neural
networks
can
be
used,
and
this
this
follows
for
any
equivalent
neural
network,
not
just
euclidean
ones.
The
euclidean
ones
produce
nice
pictures,
but
this
also
works
for
permutation,
equivalent
neural
networks
and
other
things.
You
can
use
them
as
symmetry
compilers
and
you
can
use
them
to
find
symmetry
implied
missing
data.
B
So
I'm
going
to
take
two
tasks:
I'm
going
to
start
off
with
some
geometry:
let's
say
this
rectangle
a
specific
rectangle.
So
I'm
going
to
show
this
specific
rectangle,
I'm
going
to
say
I
want
you
to
learn
displacements
of
these
points
to
form
a
square
and
the
task
two
is
kind
of
reversing
this.
I'm
going
to
give
you
a
square,
and
I
want
you
to
distort
the
square
into
this
rectangle
and
basically,
what
we
show
is.
It
can
do
the
first
task,
no
problem,
it's
going
from
low
symmetry
to
high
symmetry.
B
It
can't
do
the
second
task,
and
the
reason
for
this
is
because
it
is
symmetrically
ill-defined
because
the
question
is
well
which
rectangle
did
you
want?
There's
two
degenerate
rectangles.
If
I'm
a
rotation
equivalent
network,
I'm
like
well,
I
can't
tell
whether
you
want
the
one
around
what
it's
oriented
around
y
or
the
one
around
z,
because
you
could
just
or
x
you
could
just
change
your
coordinate
system
and,
if
you
this
is
in
a
recent
paper
that
we
have
on
the
archive
and
it's
it's
in
the
resource
slide
later.
B
This
is
what
the
network
output
is.
So
if,
instead
of
articulating
the
displacements
as
vectors,
I
do
it
as
in
terms
of
spherical
harmonics.
So
we
kind
of
see
the
full
symmetry
of
the
problem.
What
you
can
see
is
that
so
the
blue
points
on
the
left.
The
blob
is
just
supposed
to
overlap
with
the
orange
point.
That
means
it's
doing
a
good
job
if
it
overlaps
with
the
orange
point,
it's
doing
a
superb
job,
so
the
one
on
the
left
perfect,
the
one
on
the
right.
B
You
can
see,
there's
actually
a
degeneracy
in
its
predictions.
That's
why
the
lobes
are
smaller,
because
you're,
basically
averaging
two
signals.
So
if
you
imagine
kind
of
the
normally
sized
blob
in
a
normally
sized
blob-
and
you
sum
them-
you
get
you
get
this
or
you
get
like
an
average
yeah.
So
it's
going
to
average
those
two,
those
two
symmetrically
degenerate
choices,
so
that's
kind
of
cool,
but
what's
even
cooler
is
that
we
can
actually
use
gradients
of
the
network
to
the
input
to
figure
out.
to figure out how we would need to change the input such that we could do this task. The input on each of these individual points is just a one, just a number, a scalar. But if I allow it to have additional inputs, which I initially set to zero, that can be higher-order spherical harmonics,

B
what it learns is: oh, you can have l equals two or l equals four contributions on each of these points, and that breaks symmetry such that it can fit the model. And what this actually means is that it's choosing to say the y direction is different from the x direction. That's what this blob is showing you: you can see it starts off as a sphere and then it deforms, and so it's kind of going,

B
okay, I'm symmetric, I can't tell the difference between, you know, minus x and minus y, those look the same, it's like a double-headed ray, but I can tell that this direction is different from that one. That's what it's able to learn from gradients. And so in the paper that I have linked below, we basically mathematically prove that all equivariant neural networks have this property and that you can use it to find symmetry-breaking inputs.
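Here is a rough sketch of the gradient trick just described, under my own assumptions (PyTorch autograd, a toy placeholder model, and a zero-initialized extra input feature; this is not the paper's actual code): you attach higher-order input features initialized to zero, backpropagate the task loss to those features, and the nonzero gradient components indicate which symmetry-breaking inputs the task implicitly requires.

```python
import torch

def model(positions, extra_features):
    """Placeholder for an equivariant network; here just a differentiable toy
    that mixes the geometry with the extra per-point features."""
    return positions.sum(dim=0) + extra_features.sum(dim=0)

positions = torch.randn(4, 3)
# Extra input features (standing in for higher-order spherical harmonic channels),
# initialized to zero and marked as requiring gradients.
extra = torch.zeros(4, 3, requires_grad=True)

target = torch.tensor([1.0, 0.0, 0.0])   # a target output for the toy task
loss = ((model(positions, extra) - target) ** 2).sum()
loss.backward()

# The gradient on the zero-initialized features indicates which input components
# would need to be "turned on" to break the symmetry and fit the task.
print(extra.grad)
```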
B
It's okay, yeah. So just a quick recap, and since you'll have the slides I might just skip through this: we've talked about symmetry; we've talked about how to make a model symmetry-aware; we've talked about the difference between invariant and equivariant and why you might want to use an equivariant neural network; and we've seen that these models can have unintuitive consequences when you embed these symmetries.

B
So I want to give a big shout-out to my collaborators and the developers of e3nn, which is the open-source repository that we use for Euclidean neural networks; if you're interested, this is the repository for you. I also want to give a shout-out to my friend Tawny, who helped me with some of the graphics in this presentation.
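For readers who want to experiment, here is a minimal sketch of the kinds of building blocks involved, written against the e3nn library's o3 module as I understand it (treat the exact function names and arguments as assumptions to check against the current e3nn documentation): spherical harmonics of relative positions, combined with features through an equivariant tensor product.

```python
import torch
from e3nn import o3  # assumes a recent e3nn release with the o3 module

# Spherical harmonics (l = 0, 1, 2) of some relative position vectors.
rel_pos = torch.randn(10, 3)
sh = o3.spherical_harmonics([0, 1, 2], rel_pos, normalize=True)

# An equivariant "multiplication": combine vector features with the spherical
# harmonics, producing scalar, vector, and l = 2 outputs (the tensor-algebra
# generalization of scalar multiplication discussed in the talk).
tp = o3.FullyConnectedTensorProduct("1x1o", "1x0e+1x1o+1x2e", "1x0e+1x1o+1x2e")
vec_features = torch.randn(10, 3)
out = tp(vec_features, sh)
print(out.shape)  # 10 points, 1 + 3 + 5 = 9 output components
```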
B
I want to give a shout-out to the tensor field networks team that I was part of; this was one of the first implementations of Euclidean neural networks. And then here are all the resources, and I'm going to leave it at that. Feel free to reach out to me via email if you have questions beyond this lecture, and with that said, I'll take any remaining questions that folks have. But yeah, thanks.

A
Thank you, Tess, for this great talk and for all the work that you put into all the illustrations and graphics; it's really awesome, a very good overview of the field. And thanks to everyone for the questions. I'm not sure if there are more questions; from what we see in the chat, I think it's just people commending you on the lecture and the slides.
A
And you're a lively speaker; you've given animated explanations of things. So yeah, okay, I think we have one last question.

B
One thing about variational autoencoders: at least in the context of Euclidean neural networks, where I've done some work on making variational autoencoders, it's hard to do with discrete geometry in a way that I'm happy with, but it is possible. You want to make sure that your middle-layer representation isn't just reduced to scalars; or, if you do reduce it to scalars, you will need to put orientation information back in before generating.
B
This
is
actually
something
that
I'm
exploring
with
some
of
my
collaborators.
We
have
some
slides
here,
so
we've
been
working
on,
oh
where'd,
it
go.
B
There
yeah
so
we've
been
interested
in
working
on
variational
autoencoders,
where
we
basically
take
local
environments
and
encode
them
as
scalars,
so
learning
invariant
representations
and
then
being
able
to
pop
them
back
up
into
a
geometric
object,
but
you
need
to
introduce
coordinate
frames
somehow,
and
so
the
the
point
of
that
paper
is
to
talk
about
all
the
nifty
ways
that
one
can
do
that
that
are
still
faithful
to
the
problem.
So
I
don't
know
if
that
completely
answers
your
question,
but
it
maybe
answers
some
aspect
of
it.
B
Can
you
post
the
link
to
the
workshop
on
the
slack
channel?
I
am
not
actually
on
the
slack
channel,
but
I
think
mustafa
can't.
B
Yeah
so
yeah
sorry,
so
this
this
workshop
tomorrow
should
be
really
good.
It
starts
at
6am
pacific
time
because
it's
an
east
coast
workshop,
but
I
really
the
the
talks
in
the
beginning
are
definitely
worth
it.
There's
some
great
speakers,
particularly
I've
taco
cohen,
actually
came
to
berkeley
lab
a
while
ago
and
is
a
good
colleague
of
mine
and
he's
giving
a
talk
and
that'll
be
on
a
lot
of
these
natural
graph
networks.
So
that
should
be
really
interesting.
A
Sounds
good,
I
think
that's
the
last
of
the
questions
again.
If
you
have
more
questions,
you
want
to
look
at
the
material
and
slides
and
you
have
more
of
them,
and
maybe
there
will
be
like
more
of
the
talks
tomorrow
and
you
have
questions
on
those
especially
to
tests.
Please
post
them
on
slack
channel
and
I'll
find
tests
trying
to
get
her
to
answer
some
of
those
questions.