From YouTube: 19 - Interpretability - Been Kim
Description
Deep Learning for Science School 2019 - Lawrence Berkeley National Lab
Agenda and talk slides are available at: https://dl4sci-school.lbl.gov/agenda
Her research focuses on building interpretable machine learning, and the vision of her research is to make humans empowered by machine learning, not overwhelmed by it. Before joining Brain, she was a research scientist at the Allen Institute for Artificial Intelligence and an affiliate faculty in the Department of Computer Science and Engineering at the University of Washington, and she received her PhD from MIT. Please help me welcome Been.

I'm curious, and I'm sure some other speakers have done this, but I'm wondering if we can do a raise of hands so that I have a sense of who has how much experience in machine learning. Who has trained a machine learning model of any kind? Oh! Who has trained a deep learning neural network of any kind? Who uses TensorFlow? Just kidding. Who uses PyTorch? Interesting. Do you regularly read deep learning papers?
Oh my. Have you read interpretability papers? Oh. Have you tried (not trained, but used) an interpretability method? Alright. Have you designed your own interpretability method? All right, cool. This gives me a really good idea.

All right, so I'm gonna talk. Who doesn't want to start their talk with xkcd? So here's xkcd, asking a friend: "This is your machine learning system?" "Yep. You pour your data into this big pile of linear algebra and then collect the answers on the other side." "What if the answers are wrong?" "Well, just stir the pile until they start looking right."

So now you have all trained, or most of you have trained, a model. Who thinks there is some truth in this cartoon?
All right, yes, I agree. So I think there's some truth in this cartoon. And is that a problem? I think so, because machine learning is such a powerful tool. We use it to make money, perhaps to help people improve their lives, and there's this whole industry and hype around it, which myself, and perhaps you also, are part of. But when you use this powerful tool without knowing how it works, you might do something that you didn't intend to do.

Over the last decades, the machine learning community has been responding to the need to understand how this tool works. This is the number of papers; I just did a Google Scholar search. But it's really important to know that this is not a new problem. Who here has read expert systems papers from the 70s and 80s? Yeah, so a few of you. The problem of interpretability is something we had already thought about as a community back in the day, but I think it's safe to say that we quite haven't solved the problem. So why now?
Why do we care now? Complexity and prevalence. Complexity: we have a lot of computers, computers are cheaper, we can put a lot of data into those cheap computers, and we can make models a lot more complicated than we ever could before. Things are much more complex than before. Prevalence: they're everywhere. Let's say you want to escape from technology and you want to go camping this weekend.

Chances are that the way they manage the storage of your camping equipment, they might be using machine learning too. So it's everywhere, and we now have to care. But you might say: "Well, hang on, I heard this thing that decision trees are interpretable." And it is an important question. If that's true, then we should all study decision trees, we should just optimize the hell out of them, and then we're done, right? So let's do a little bit of an exercise.
I'm gonna have you do an action at each slide. I'm gonna show you a tree that looks like this, and you follow it. Are there more than a hundred people in this room? I think there's more than 100 people in this room, so that's a yes. Less than 200? I think so, okay, cool. And it is definitely not raining, because it's California. And once you do yes and yes: left hand, love it. So if it's a no and a yes, you stop. Okay, so at each slide, and we're gonna see three trees.

Alright, I think... oh, this is like some left hands, some right hands, and some left hands where they're instantly in a different time zone. That's true. Maybe their weather is sunny, yes, and then the time is morning: left. Oh, maybe it was confusing. Oh yeah, it's not the mirror image of it. Okay, so let's stand; all right. Now I'm gonna add one or two more layers to this tree, okay? And I would like you to do the same thing. You ready?
Some left hands, some right hands, and some clapping. All right, let's see. Cloudy? No. Are you greater? Yes. Afternoon? No. Are you greater? Yes. Are the people less? No. People in the first row, less? No. So I think it's clapping. Good job. So this has five to six features. This is very small data, five to six features, and this is like a five-layer tree; it's like a really tiny problem in machine learning, right? And in addition: can someone tell me what the overall logic was?

So maybe it was a problem with the interface. Maybe the tree, with branching coming out of everywhere, was just not good. So maybe something like rules, rule lists, would work. Something like this: if this condition was not true, then you go to ski or not; if you want to check whether there's a new episode of Rick and Morty, you have to check that everything else was false. Or maybe rule sets, something like this, where we clump them together into sets: an OR or AND relationship between these modules, or not.
Some of these solutions do work for some applications, so I hope I convinced you that we still need to stay in for the rest of the lecture to hear what other people did in their work. Then you might say: "Well, that sounds hard. Even five-layer trees are too complex, so is it even possible at all? Are we ever gonna get there? There's this superhuman-performance network like AlphaGo; are we ever gonna understand it?" Well, the point of interpretability is not about understanding every single bit about the model.

Not quite, and I'll get to that in a bit, but let's just define what our goal is. This is my goal for interpretability, and yours might be different; in fact, I'll come back to what your goals might be, because you're all scientists, from what I heard. But my goal is the following. I think the goal should be building tools so that we can help people use machine learning more responsibly. I really do believe that a lot of people want to do the right thing.
They build a model to serve a big population in the world, and when they add a new term in the cost function, they really do want to understand: what's the impact of that extra term? What kind of feedback would it create? What kind of social impact might it have? But a lot of times they may not have the right tool to investigate and answer that question. So a tool that enables answering that question in some way is useful, and that is my goal.

So this is not a complete list, but here are some of the ways we can help. One: I think the tool should help you make sure that your values are aligned in the model, such as fairness, and that your domain expert knowledge, such as medical knowledge or scientific knowledge, is well reflected in the model when you want it to be. And I think it's extremely important that we keep in mind that this is not just for computer scientists, or for you guys who have all trained machine learning models or neural networks.
It has to work for everyone. In the medical domain, doctors have the critical knowledge to make the right decision, but they may not have a computer science or machine learning background. It's rare to see someone who has both an MD and a CS PhD; they do exist in Brain, there are a couple of them, but it's rare. And those are the times when interpretability becomes most critical. It's also important to talk about what our goals are not. It's really not our goal to make everything interpretable. There are plenty of applications out there where interpretability is overkill.

You just don't care that much, so don't spend energy on it if that's not your goal. It's not about understanding every single bit about the model, but it's also not about being against developing AutoML or other complex nets or architectures; we just have more work to do. And, importantly, it's really not about gaining trust. Gaining trust and interpretability are separate problems. In fact, if your only interest is to gain trust, the best strategy is to look at psychology, because humans are very easy to deceive.
There are well-established results showing that we're very gullible, and if that's your goal, you should just look at that, not at revealing the truth, which is what interpretability is about. What do I mean? There's this recent paper, a beautiful paper that came out a couple of months ago now, where they thought that, given a medical image, machine learning was capturing something to predict facts about the patient: whether the patient will deteriorate after a fracture. But they found out later that it was really reading which machine the image was taken with, what model the machine was: all the confounding variables that you don't want the machine to get a signal from. And in this case, what an interpretability method should really do is tell the truth, and tell the humans that you should not trust this model.

So that sounds good: now we just go back and write down what our goals are, and optimize them mathematically. And that's a great start as a computer scientist, right? But not quite, because interpretability is a fundamentally underspecified problem. What does that mean? Well, I'll give you some examples. Safety:
can we figure out all the possible unit tests such that, if a car passes them, then it's a safe autonomous car? No, right? You're familiar with the trolley problem, the moral discussion, an interesting hypothetical problem: you're driving a train, and if you pull a lever you kill these four people who happen to be roped to the track, or you kill this one person, and you only have those two options. And they ask you questions: what do you think if this one person was pregnant?

What if this one person was related to you? These are really difficult questions, hypothetical, but they kind of shed light on the fact that there is no right answer. It's a very difficult, underspecified problem: what is safe autonomous driving? Then there's science, and this is a good audience for this one. You can do machine learning to discover something new, and because it's something new, you don't know how to write a loss function for it, so it's underspecified. Other times you might have mismatched objectives. What do I mean? Say you're a doctor.
So, because it's a challenge in the problem definition, it's not something more data or a cleverer algorithm can help with. In fact, if you think about it, regular supervised machine learning has a similar problem, because accuracy, which is great and much more specific than interpretability, isn't everything you ever want. Maybe it's precision, maybe it's recall, maybe it's something about this particular group that you definitely want to protect; maybe it's about accuracy in that group.

Probably a good example is AlphaGo. I find AlphaGo amazing (I wasn't involved in any of it), because the way it plays Go is so different from how humans would have done it. Go players looked at how it played move 37 and they got excited. They said: "this is an alien Go player; it's playing beautifully; I would like to learn from it." And for that kind of problem, perhaps you don't need interpretability: it's a beautiful problem as it is, and you're learning something.
Stock prices are something else I usually use as an example, but it depends on how you feel about finance businesses. We also don't need interpretability for sufficiently studied problems, like airplanes. How many people, maybe the aerodynamicists, actually know exactly why planes fly? Right: not in theory, but in practice. You measured it: you used the pitot tube to measure the pressure, the Navier-Stokes equations; you read it, and it's hard. I don't really exactly understand it.

I'm from a mechanical engineering background, but I trust it, because I'd rather take a plane to go to Boston than a road trip; that takes a long time. So I trust it, and I think most of the time it works. So I would rather use it, and I don't worry about it too much.

So if machine learning comes to a stage where a lot of people kind of accept it, because empirically it works and you accept the risk, then you may not need interpretability. And some other times, when you don't want people to game the system, maybe you don't want to reveal everything. A good example is a credit score. So we don't always need interpretability.
So that sounds good. Once you decide that you need interpretability, that's great; then it comes with all the cousins: fairness, accountability, trust, and causality, right? Not quite. Fairness, trust, and the other things are not the same thing. Interpretability may help reveal fairness problems, and perhaps help users gain trust, but not the other way around. They are different problems. So, great.

So now you have all the goals and everything in mind, and you've made your beautiful interpretability method, and you go: "Okay, here's my method, look at this picture, it's beautiful." And that's it, because there's no good way to evaluate interpretability methods? I hear this a lot, and I think it hinders the progress of this field, because we can evaluate interpretability methods. I'm going to speak a little more about this idea. The high-level idea is that you can make a toy dataset such that you know
where in the image, for example, is important, or should be important, because you made up the dataset, and then test whether your attribution method (a method that identifies which part of the picture is important) is doing the right thing. So I'll talk about that a little bit. So: you can evaluate interpretability methods.

All right, so now we're gonna go into actually studying the technical stuff; we're doing well on time. Alright. So before jumping in, let's talk about some ingredients for interpretability methods. First, you can think about interpretability as an optimization problem, where you have some quality function Q that evaluates your explanation E. Your goal, under this quality function, is to find the best E that maximizes that quality function, and the quality function can be measured via human experiments.
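In symbols, this ingredient is just the following (a minimal formalization, using the notation implied here: E is an explanation from some candidate space, and Q is the quality function, scored for example by human experiments):

```latex
E^{*} \;=\; \arg\max_{E \in \mathcal{E}} \; Q(E)
```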
Now, you have to explain this complex thing to humans. The end user, the customer, matters. The only reason we have interpretability methods is for final human consumption, and who that final user is is very important; it fundamentally changes the problem of the interpretability method. If you're a newbie, then I can only give you so much; if you're more of an expert, I can pack in a lot more information.

The things you might want to learn from the model might be very different depending on your task, and that is very, very important. Whether you're interested in understanding, overall, what the model does for this particular class of credit card approvals or rejections is different from wanting to explain just this one person: one customer who's complaining about why his or her credit card was rejected.

The former we call a global explanation, and if you're just interested in one data point, that's what we call a local explanation. And of course it depends on low- and high-stakes task domains. Sometimes you have a lot of time to look at one data point, but if you're a medical expert and you have to make a decision like this, then you don't have that much time. So there are a lot of trade-offs that you have to think about when you design these methods.
There are three types of interpretability methods. We're going to go through some of them more briefly, and I'm going to spend a little more time here. First is explaining data: you don't have any models, no models involved, but you just want to look at the data and get some understanding of how your data looks. Is it garbage? Is it missing too many fields? In fact, a lot of the problems you see in the real world, or in research problems, come from the data. You have to look at the data, a lot of the time.

Second, you might have the bandwidth to build a new model: to train a new model from scratch so that it's inherently interpretable. We'll go over some methods for how you might be able to do that. But sometimes, often at Google, you have a legacy model that a lot of people worked on for many, many years. I can't just come in as a new Googler and say, "I'm going to change your model; let's now use all interpretable models." That just doesn't work. So then, in that case, what do you do?
I remember when I was in college: an hour-and-a-half lecture is long. If I'm interested in the subject, like psychology, I'm in; but if I'm listening to, say, chemistry (sorry to the chemists in the audience), I am falling asleep after 45 minutes; I'm gone. So please ask questions if you're falling asleep, and you can stand up and stand in the back; that's no problem. Okay, questions welcome. All right.

So, first: explaining data. We have a running example, simple 2D data. You have the first class, blue circles, and then red Xs, so this is two-dimensional data. One of the simplest possible ways, and I hope you're doing this, is getting the mean and standard deviation of the data for the two classes. But for instance, if you gather these two classes that have a couple of features and they completely overlap, then you kind of have no hope.

One way to go beyond this is, instead of giving you just a mean and a variance, I give you an example. Let's say we cluster them, and instead of saying "oh, the mean is like two and three," I give you this picture and say: there are two clusters; one cluster's dogs look like this, and the other cluster's dogs look like that. Simple k-means works surprisingly well. But what about these guys?
B
They're
gonna,
be
left
alone,
and
sometimes
when
your
classifier
or
clustering
method
is
acting
funky
very
often
that
these
guys
are
the
problems.
So
you
definitely
want
to
know
about
these
guys.
So
here's
a
one
way
to
do
it
with
what
we
did
is
is
a
paper
from
2016
Europe's.
We
picked
a
prototype
that
is
kind
of
maturity
that
represents
the
maturity
and
then
given
we
feel
a
distribution
over
this
prototype
first
and
then,
given
that
we
try
to
learn
another
distribution
that
captures
the
difference
between
majority
and
the
overall
distribution.
B
So
we
want
to
capture
these
minorities
that
are
not
too
minor,
like
it's,
not
outlier,
but
it's
significant
enough
that
you
have
to
see,
and
we
of
course,
leverage
this
kernel
MMD
trick.
That
gives
you
really
nice
guarantees
and
computational
efficiency,
and
here
are
some
examples
results.
We
have
two
dog
classes
prototypes.
You
see
the
face
of
the
dog
and
three
typical
pictures
for
criticisms.
We
see
dog
in
a
rabbit,
costume,
cute,
Santa,
Claus,
black
and
white
picture
pictures
with
dogs
without
their
kind
of
face
their
faces
are
hard
to
see.
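Here is a rough sketch in the spirit of that prototypes-and-criticisms idea. It is not the paper's exact MMD-critic objective: it greedily picks prototypes using a squared-MMD-style proxy, then picks criticisms where the kernel witness function is largest in magnitude.

```python
# Simplified prototypes-and-criticisms sketch with an RBF kernel.
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def prototypes_and_criticisms(X, n_proto=3, n_crit=2, gamma=0.5):
    n = len(X)
    K = rbf_kernel(X, X, gamma)
    col_mean = K.mean(axis=1)      # similarity of each point to the data
    protos = []
    for _ in range(n_proto):
        best, best_gain = None, -np.inf
        for j in range(n):
            if j in protos:
                continue
            S = protos + [j]
            # proxy objective: coverage of the data minus redundancy
            gain = 2 * col_mean[S].sum() - K[np.ix_(S, S)].mean() * len(S)
            if gain > best_gain:
                best, best_gain = j, gain
        protos.append(best)
    # witness: where data density and prototype density disagree most
    witness = col_mean - K[:, protos].mean(axis=1)
    crits = [i for i in np.argsort(-np.abs(witness)) if i not in protos][:n_crit]
    return protos, crits

X = np.random.RandomState(0).randn(40, 2)
X[:5] += 4.0                       # a small minority cluster
print(prototypes_and_criticisms(X))
```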
So those are some examples of explaining the data itself. Of course, I'm sweeping a huge body of literature from the visualization and HCI communities under the rug, but there's a lot more work that we have to do here. I'm not an expert in HCI, so whenever I go to HCI conferences I'm like, "please help us, we need your expertise." But next up, we're gonna build a model. The next type of interpretability method is building inherently interpretable models. So what are they? Again, our data: points, two classes.

First, one of the simplest ways to do it is to learn rules. You've seen this before; a decision tree is a type of this style of model. There's lots of work on rule lists. Cynthia Rudin from Duke, especially, has done some great work, including learning certifiably optimal rule lists, so you can run this and you have the optimal rule list under some assumptions.
You can fit a simpler function for each feature, for example. So here there's a lot of blue dots, and I projected them onto this dimension, the f2 dimension (I got rid of f1 and projected everything onto f2), and you might have some nice distributions like this. And in fact there's a nice family of methods called generalized linear models, which Rich Caruana did some awesome work on.

You can fit a linear model, of course, or you can put another function on your prediction variable y and again have a linear model; sigmoid functions are a type of this. These are all kinds of generalized linear models. You can make it a little more expressive by fitting functions for each feature; that's another way to do it. And what Rich Caruana showed in his paper, from the 2000s (I'm forgetting exactly when; a while ago), is that he can match a highly expressive neural network's performance with his simple generalized additive model.
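Here is a minimal backfitting sketch of a generalized additive model, y ≈ b + f1(x1) + f2(x2), with each f_j a simple bin-wise smoother. This illustrates the shape of the GAMs discussed here, not Caruana et al.'s exact training procedure (they boost shallow trees per feature).

```python
# Backfitting a tiny GAM: alternate fitting each feature's shape
# function to the residual of the others.
import numpy as np

def fit_gam(X, y, n_bins=10, n_iters=20):
    n, d = X.shape
    edges = [np.quantile(X[:, j], np.linspace(0, 1, n_bins + 1)) for j in range(d)]
    bins = [np.clip(np.searchsorted(edges[j][1:-1], X[:, j]), 0, n_bins - 1)
            for j in range(d)]
    b = y.mean()
    f = [np.zeros(n_bins) for _ in range(d)]       # per-feature shape functions
    for _ in range(n_iters):                       # backfitting loop
        for j in range(d):
            partial = y - b - sum(f[k][bins[k]] for k in range(d) if k != j)
            for m in range(n_bins):                # bin-wise mean smoother
                mask = bins[j] == m
                if mask.any():
                    f[j][m] = partial[mask].mean()
            f[j] -= f[j].mean()                    # keep the intercept identified
    return b, f, edges

rng = np.random.RandomState(0)
X = rng.uniform(-2, 2, size=(500, 2))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.randn(500)
b, f, _ = fit_gam(X, y)
print("intercept:", round(float(b), 2))            # each f[j] is plottable
```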
Another method you can use is model distillation. You have a complex model, a neural network, and it did all the heavy lifting. Then you train another model using the input and output of that model: you forget about the true labels, and now you train a model using the input and the predicted labels from this heavy-lifting neural network, and you build a simple model from that. That's called model distillation, or a type of model distillation.
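A minimal sketch of that recipe, on synthetic data with scikit-learn: train a neural-net teacher, then fit a small, readable tree on the teacher's predicted labels instead of the true ones.

```python
# Model distillation sketch: the student mimics the teacher, not the data.
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_moons(n_samples=1000, noise=0.2, random_state=0)

teacher = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=1000,
                        random_state=0).fit(X, y)
teacher_labels = teacher.predict(X)       # forget the true labels

student = DecisionTreeClassifier(max_depth=3, random_state=0)
student.fit(X, teacher_labels)            # train on the teacher's outputs

print("fidelity to teacher:", student.score(X, teacher_labels))
print(export_text(student, feature_names=["f1", "f2"]))
```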
What I really find powerful, from my experience, is example-based interpretability methods. What they are is similar to what we looked at with k-means: instead of saying that this class is defined by this rule or this function, we just give you the examples. This class looks like these; that class looks like those.

Why is this powerful? Humans use a lot of context in their reasoning. In fact, there are famous studies on firefighters and the way they make quick decisions when something happens: "okay, you go there, you go to that post, and we're gonna do this." The way they do this is what they call recognition-primed decision making, and what that is, is they basically think about all the other situations they have seen before.
One way to do a post-training explanation is to simply get rid of one feature and see how things look; that's typically called an ablation test. So, for example, if you have a picture, or categorical data, you remove one of the factors, like age. Basically, you just shuffle that feature, so that age no longer gives you any information (it's not a real human anymore), and you see how much the accuracy drops. If the feature was holding a significant signal, then accuracy will drop more than if it wasn't. That's the ablation test.
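A minimal sketch of that shuffle-one-feature test, on synthetic data: shuffle a single column so it carries no signal, and measure how much held-out accuracy drops.

```python
# Permutation-style ablation test sketch.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = rng.randn(2000, 4)
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)     # features 2, 3 are noise

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
base = model.score(X_te, y_te)

for j in range(X.shape[1]):
    X_shuf = X_te.copy()
    rng.shuffle(X_shuf[:, j])                     # destroy feature j only
    drop = base - model.score(X_shuf, y_te)
    print(f"feature {j}: accuracy drop {drop:.3f}")
```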
Beyond simply doing that, there's a smarter way of doing this. There's a paper by Pang Wei Koh and Percy Liang at Stanford; they use a tool called influence functions, which is pretty well known outside of the machine learning community, to determine which images in the training set most influenced the prediction for this picture. What they showed (this is kind of a hypothesis, I guess) is that Inception is more expressive and learned better representations, so it picked out fish that are actually relevant, whereas an SVM, which might have learned more superficial features, had just picked out pictures with similar colors.

The challenge with this method is that it's computationally very expensive. You've got to invert some matrices, and once you resort to pseudo-inverses, things kind of fall apart. But there has been a lot of nice follow-up work to make this method a little more efficient.

The second method, which I'm gonna spend a little bit of time on, is sensitivity analysis: fitting a linear model or function, gradient-based methods. These are all of a similar flavor. What do I mean? Well, here's LIME. You've probably heard of LIME? Okay, cool. So this is one of the most widely used methods. What they do is really simple and elegant.
Here's what it is. You have this decision boundary, say a red class and a blue class, and you have a data point that you want to explain. What you do is randomly sample data points around it and then fit a linear function. Now you have a linear function, and you look at the weights and see which feature was important or not important. Now, if you're thinking, "does that even work?", your intuition might be right.
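Here is the core of that recipe as a minimal sketch (the full LIME library adds interpretable binary features, kernel choices, and feature selection on top of this):

```python
# Local linear surrogate sketch: sample around x, query the black box,
# fit a distance-weighted linear model, read off the coefficients.
import numpy as np
from sklearn.linear_model import Ridge

def lime_explain(predict, x, n_samples=500, width=1.0, seed=0):
    rng = np.random.RandomState(seed)
    Z = x + width * rng.randn(n_samples, x.size)   # local perturbations
    p = predict(Z)                                 # black-box queries
    d2 = ((Z - x) ** 2).sum(axis=1)
    w = np.exp(-d2 / (2 * width ** 2))             # closer samples count more
    surrogate = Ridge(alpha=1.0).fit(Z, p, sample_weight=w)
    return surrogate.coef_                         # local feature importances

# toy black box: probability driven almost entirely by feature 0
black_box = lambda Z: 1 / (1 + np.exp(-(3 * Z[:, 0] + 0.1 * Z[:, 1])))
print(lime_explain(black_box, np.array([0.5, -1.0])))
```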
It's a very sensitive method: depending on how you sample the data, your linear classifier might be all over the place. There's nice work by a couple of people from MIT that showed the lack of robustness of LIME, and after that a body of people moved in to improve the robustness of such methods.

Another body of methods is called saliency maps, and I'm gonna talk at length about these. What is a saliency map? You have a picture like this, and you end up getting a picture like that. What is it? Well, you just look at every single pixel and take the first-order derivative of the probability of, say, "starfish" with respect to every single pixel in the image. Intuitively, that means: if I change this pixel a little and the probability changes quite a bit, then the pixel is important.
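A minimal sketch of such a vanilla gradient saliency map in PyTorch. A tiny random CNN stands in for the classifier, just so the snippet runs end to end.

```python
# Vanilla gradient saliency: d(class score) / d(input pixels).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                      nn.Linear(8, 10))
model.eval()

def vanilla_saliency(model, image, target_class):
    image = image.clone().requires_grad_(True)
    score = model(image.unsqueeze(0))[0, target_class]
    score.backward()                              # gradient w.r.t. pixels
    return image.grad.abs().max(dim=0).values     # max over color channels

image = torch.rand(3, 32, 32)
sal = vanilla_saliency(model, image, target_class=3)
print(sal.shape)                                  # torch.Size([32, 32])
```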
It's just a simple sensitivity test, and we built bells and whistles around it to make it fancier; that's the whole body of work called saliency maps. A lot of them are based on gradients, and you end up getting images just like this, where this is the vanilla gradient, and with this other method (oh yeah, this is my work, and I'll criticize this work very soon) after you do some fancy things you get something cleaner.

The first problem with this type of approach is that when you look at just one data point at a time, you're looking at basically this: here is one of the features, f1, and you're taking the first-order derivative of the probability of the class. Now, because you're locally fitting this linear function, you might have these two data points, that guy and that guy, that are very similar in both values of f1 and f2 but have completely different explanations, because God knows what your function does. Your function might be very peaky, and then the explanation will be all over the place.

Well, what's wrong with that? Well, a couple of problems. If you present explanations that are conflicting, and humans look at them and say, "oh, these two patients look very similar, but they have completely different explanations," it's only a matter of time before you completely lose their confidence, and the expert might say, "oh yeah, no, no thank you, machine learning is not ready." And the second problem is that, even if this is true, as a human it's really hard to grasp the whole idea.
We have a limitation in our memory. If you want to have some general notion of how this model works, this may not be the way you want to do it.

Here's a second problem. So again, just to recap our problem definition: you have a model that someone else trained, and your goal is to answer what the evidence for the prediction was. What caused the prediction? So, for example, you have a neural network, you pass in a picture like this, and it predicts that this is a junco bird. Cool. So then I'm asking: why was this a junco bird? A saliency map will give you something like this, and what this means (and this is important) is that these pixels are the evidence for the prediction: these pixels are why this bird is predicted as a bird. Cool, that sounds fine. Then we can ask a simple sanity-check question: if these pixels are the evidence for the prediction, then when the prediction changes, the explanation should change.

In fact, we can think about an extreme case: when we make the network garbage, just randomize everything, then the explanation should really change. Now the network is garbage and doesn't know anything about the bird, so the probability is very low that, by accident, you would get the same explanation. So we did that sanity-check work with some awesome PhD students, including Julius, at MIT at the time. What we did is the following: we took a network that's been beautifully trained, predicting well, doing the right thing, and we get this.
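Continuing the PyTorch sketch above (reusing `model`, `image`, and `vanilla_saliency`), here is a minimal version of that model-randomization sanity check: reinitialize the weights, recompute the saliency map, and compare. A rank correlation near 1 between the two maps would be a red flag.

```python
# Model-randomization sanity check sketch (in the spirit of this work).
import copy
import torch

def randomized_copy(model):
    rand_model = copy.deepcopy(model)
    for p in rand_model.parameters():      # re-draw every weight
        torch.nn.init.normal_(p, std=0.05)
    return rand_model.eval()

def rank_correlation(a, b):
    ra = a.flatten().argsort().argsort().float()
    rb = b.flatten().argsort().argsort().float()
    ra, rb = ra - ra.mean(), rb - rb.mean()
    return (ra * rb).sum() / (ra.norm() * rb.norm())

sal_trained = vanilla_saliency(model, image, target_class=3)
sal_random = vanilla_saliency(randomized_copy(model), image, target_class=3)
print("rank correlation:", rank_correlation(sal_trained, sal_random).item())
```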
Based on these two pictures, the belly looks pretty important still, and the cheek is still important; and if they look the same to you as they do to me, I think that's a problem. So we did this for many methods. Let me actually show you the next picture... oh, it's not here; we added a lot more methods. So we did a lot of different ones: this is the vanilla gradient, this is SmoothGrad, integrated gradients, and so on. In the last column you're looking at a completely random network, completely random, completely reinitialized, and even in the first and second columns here the prediction is random; the prediction is completely garbage. But except perhaps for Grad-CAM (although even it somehow recovers where the bird was), a lot of other methods seem to be pointing at where the bird is. So we were shocked by this result, and we asked ourselves: are we crazy?

Yes? Thank you, great question. Yes: the darker red or purple is more important than the lighter one. I'm not completely sure. So, when the network is completely random, it's just a random projection, right? It's an input image, and it takes that image and randomly projects it into some higher dimension, and you shrink it, right? I'm not sure how much, and I can make these random weights any way I want; they don't have to be Gaussian.
No, this is completely random. I don't think we're doing anything; we haven't trained anything, right? So there's no batch normalization during training, nothing. But we did think about how to characterize it a little bit. I'll talk about the deep image prior work, which came out around 2017 and observed a very similar symptom. The culprit is the convolution operation. It turns out the convolution operation itself is such a good feature extractor, even if it was never trained, which is kind of shocking, and I don't understand why.

But it just is. So if you have a completely random network and you pipe an image through it (so this is untrained), and you collect activations in some layer and use those activations to train a separate model, like a linear classifier or an SVM; again, this is like a random projection, right? But apparently it works pretty well. Yeah, it's fun to be in this field.

Every day shocking results come out. Great question, great question: we don't know, and I think that's the problem. So I'll talk about a method that I thought of after seeing this failure mode, to kind of get over the idea of being constrained to pixel space: we can use higher-level concepts, like color, like texture, or like something else, and I'll talk about that method. There... although we had like a thousand; we took Inception v3, which has a thousand classes. Great question.
So, if you take the vanilla gradient (and there's a lot of work doing this; they call it discriminability): if you have two classes in the image and you take the gradient with respect to one class versus the other, then these two should be different, right? Unfortunately, a lot of methods, including mine, don't really have that. If there's a thing in the middle, a dog and a cat, then they will happily highlight all the things in the middle.

There are methods that do try to distinguish them, and add this as a loss function to do so. But I don't think, in this sanity-check context, anything would change just because you have that. Other questions? Yeah, I like this question, and I get this question a lot. I think the problem is that the promise was that these pixels are why this bird is a bird.

It didn't say, "here is a set of pixels that may or may not be the cause of the prediction." If the claim was about recall ("I will give you some hundred things, and maybe one of them might cause the prediction"), then perhaps. But even then, I have some questions: why do those two happen to be exactly the same? My conjecture is that the convolution operation itself is just looking at the image.
It loves the image and loves edges, because it is kind of built to pick up edges, and somehow the first-order derivative, even though it was taken with respect to the final logit layer, is still highly biased toward just the image itself, and it has less to do with the prediction. It might have something to do with the prediction, but not as much as you think it does.

Extending this further, I think about this a lot: you have a cancer image, right, and some big company puts out a medical tool that says, "here's a cancer image, here's your own cell specimen," and then your doctors will look at saliency maps to tell whether you have cancer or not. I would be very worried. I don't know if it's giving you precision or recall. By having some attention map, am I biasing the doctor to miss this other feature that he or she should have seen? That's what I would be worried about.

Yes, I see. I wish I had other pictures; we're not cherry-picking. To answer (and actually this is a good point to point out): we do quantitative evaluations in the paper over many, many pictures, ten thousand or so, where that's not necessarily true, and we use three different metrics, because none of these metrics is perfect: Pearson correlation coefficients and other computer-vision methods that compare the similarity between two things. I think it depends on the method.
So let's look at it. I think for this guy, this thing is definitely more birdy than this random thing; this one is completely random, and it's the only one that's random. But Grad-CAM... I don't know what it's doing, but it kind of, somehow, recovers the bird again. So I think it depends; I think that's an interesting question. We do see amazing performance on language models, which don't have convolution layers, but I think there's something special about the convolution layer, for sure, that a lot of us don't understand. Maybe we need scientists to work on it.

I would take this... you have a question? Okay, all right, we can talk about this more later on, but let's move on to my next crazy experiment. We were shocked by this, and we said, "okay, let's do something crazy again." What we did is we took the MNIST dataset, we shuffled the labels, and we trained the network. So this time we trained the network, but with random labels, and we got some saliency map results.
Remember, this network never learned what 0 is, because 0 through 9, all the digits, were randomly labeled. However, if you look at this explanation, I can still kind of see the number 0. Yeah, it is pretty different, but still, if I were given just that explanation, I'm not sure I could tell that these are saliency maps from a random model.

Right, so what can we learn from this? Well, it's something I encounter very often: our confirmation bias. This entire field of saliency maps has been developed over many years; many people looked at these pictures, and they expected to see a bird; they saw a bird; they liked it; "this is right." Including myself.

I had worked on saliency maps too. And that's something that we see repeatedly: when humans see evidence that agrees with their hypothesis, we love it. "Of course it's right; of course I'm right." That's a feature of us, maybe, not a bug, but it's something that we have to take into account when we're designing these methods. I briefly talked about this with the deep image prior work; another paper also mathematically proved that some of these methods are just reconstructing the image.
They have nothing to do with the prediction. Around the same time, however, some papers went out of their way to check with doctors and experts whether having these maps is helpful, and they showed that it was. So perhaps there is something there; perhaps showing some candidate set of pixels is useful in some way. But we need to do a lot more study to figure out what it is and how to amplify that signal.

Recent work by Sanjeev Arora's group from Princeton followed up and suggested a very simple, elegant fix to pass this test, which I find pretty awesome. An awesome start. But, you know, this is really a low-bar test; come on, we should have been passing this test a long time ago. I can't believe it took many, many years to come up with this dumb test. So I paused and thought: what if there were harder tests? Can we come up with a harder test? So this is what we did.
This is work with Sherry. The goal is to have a benchmark so that you can evaluate your interpretability methods; this is what I roughly alluded to earlier. We make up a dataset such that we know which part of the data is important, or shouldn't be important, and we see how the attribution methods, or saliency methods, map to it. So how did we do this? We took the MiniPlaces dataset, which has just a bunch of scenes (forest, kitchen, stadium), and then, from another dataset, MS COCO,

we took patches of objects, cut-out pieces. So here I have a dog, and I also have a backpack and ten other different things. Then we paste this thing into every single scene image, for all scene classes. And what does that give us? Well, the dog is everywhere. The dog is in every single picture, so the dog is not important for classifying scenes. We verify this more computationally in the paper, but we showed that, yeah, the dog doesn't matter for the prediction, because I know; I made up this dataset.
So an attribution method, or a saliency method, should not highlight the dog. And we can take a second step, which is what I just said: we can make this even more complicated by adding the dog to only some classes, just the forest. Then the dog is now giving signal to help classify the forest class, so the attribution on the dog in the forest class should be higher than on dogs in any other class, and we can do this relatively, one by one, and have a relative measure. So that's what we did in our paper.

We suggest three metrics as a start for measuring this, but we focus on false positives. I think interpretability methods will have a very similar history, in the way we measure how good they are, to traditional machine learning: there, we first started with accuracy, and then we said, "oh wait, maybe precision and recall matter," and then we went to AUC, and now we're thinking about robustness, adversarial attacks, and all sorts of other things.

So here, what we mean by a false positive is when the interpretability method said something was important, but the model didn't think so. That's the part of the metric we're focusing on with this dataset, and we suggest three metrics. I'm going to talk a little more about the first one, and only briefly about the second and the third.
For input dependence rate, we took a picture and ran an optimization (stochastic gradient descent) to make the dog really, really important. And the question is: well, now that I've made it really, really important (we know it is, because we optimized for it), the attribution should increase.

Input independence rate is kind of like an adversarial example. How many people have had the adversarial examples lecture yet? Oh, okay, I see. Okay, cool. So input independence rate is something like this: I again take a picture, and I place a dog at a location, and I change only the dog's pixels such that the network effectively doesn't see the dog. I can do this using gradients: I make sure that at every single layer, nothing changes when I add these pixels. This is kind of the basis of adversarial examples; if you change something, I can either make the network go crazy, or I can make the network do nothing. It's just a simple gradient-based attack, or approach. That's what we do in those two measures.
This might be better to explain first: the first model is trained to classify scenes (forest, stadium, kitchen); the second model is trained to classify objects (dog, backpack, and so on). We expect the attribution where the dog would have been to be very different between these two models, and that's the score we measure. We do this for many, many images, ten thousand I believe, and we average them, and here is a quantitative measure using this metric, the model contrast score. This is Grad-CAM, and these are all the methods you just saw in the previous section (the vanilla gradient, SmoothGrad, and so on), plus TCAV, which is the technique I'm going to talk about soon. On this scale, one is the best, although there's a little nuance there; higher is better.

The message you should take away: this dataset is open-sourced (we're open-sourcing the model and everything), so that hopefully we stop making this mistake, where we're kind of doing the ostrich thing,
where things don't work and you don't want to see it: the sanity check doesn't pass on your method, and you just kind of ignore it. Let's not do that. Let's evaluate. And this is a low bar, again; you can do much better, more sophisticated tests, and we can make better benchmark datasets too, but this is a starting point.

What this work gives us, though, is a wish list: what could we have done better? The saliency map relied on a cue that is based on humans. A human has to look at it and subjectively judge, like what you said: "oh, the belly is important." But maybe it's the shape of the belly, maybe it's the texture of the belly; you have to reason about it. And it used the pixel as the medium. But, you know, humans don't think in pixels. I don't look at a dog picture and say, "look at pixel number 2135; isn't that cute?" I say, "oh, the fluffy thing, that's cute," right? So maybe we can have a more quantitative quality function, and use something more human-friendly: high-level concepts like fluffiness or texture or color, instead of pixels, which are artificial. Perhaps that will help lay people understand machine learning models better.
You don't have to know that computers process images in pixels; we could have been living in a world where analog computers are everywhere. It didn't have to be digital. Well, I guess there are reasons why it has to be digital, but maybe there's a parallel universe where everyone is using analog computers, and there we would be living in a completely different space: pixels would not be the way we communicate.

Let's use this saliency map that we've been working with to help us think about what we really want as an explanation. Here is a type of saliency map for this picture. Now, as I stare at this very carefully, I see this human in front of the cash machine. So that made me think: maybe the existence of the "human" concept in this picture mattered for the "cash machine" prediction. And, oddly, this cart wheel behind the human is also highlighted. That's a little weird; why the cart wheel?

Perhaps that also mattered for some real reason. And if these concepts did matter, I would like to know which one mattered more, because if the human mattered more, that might be a little more comforting than if the wheels mattered more; and whether this is true for all cash machine pictures: should I be worried about this or not? So: a more global explanation. Who watches Rick and Morty?
If you haven't, I highly recommend it. It took me a while to get used to it (I think it's very American culture; I grew up in South Korea), but once you get used to it, it's beautiful. Looking forward to the next season. I'm not getting paid to say this, but I love Rick and Morty. There's this character in Rick and Morty who always says, "I can do..." Anyway. Our character says: well, no. We can't express these concepts that we want the explanation for, like humans and wheels, in pixels, especially across many images. And it would have been fine if you had these two things as input features, but you didn't: I just came up with them. I just looked at this picture, and I had this insight: of course, humans are always in front of cash machines. So you didn't have that as a feature.

So wouldn't it be great if we had a quantitative way to measure how important these concepts you just came up with are? That's what we did: Testing with Concept Activation Vectors, TCAV. I really wish I had named it TACO. I regret this, I regret it deeply, but I couldn't quite figure out how to get the O in there.
So it's TCAV, and what it does is this: you have a concept, like race or gender, and you want to measure whether that concept was important to your prediction; and we can give you a concept-based explanation only if the concept was indeed important, even if the concept wasn't part of the training. So let's be concrete; here's a concrete example. Let's say you have a model that takes a picture and predicts whether there is a doctor in the picture or not, and I want to know whether the gender concept mattered in this prediction.

If the concept, the crazy concept that you came up with, didn't have anything to do with the prediction, then it will say: "no, I don't know what you're talking about; sorry, I can't give you an answer." All right. So for the running example, we're gonna have a zebra, and I am curious whether the "striped" concept was important in predicting zebra or not. First and foremost, you may say: okay, "striped", a high-level concept, it makes sense; but what do you even mean by this? How do you get it? How do you express it?
We do the simplest possible thing. What is it? Well, you, the user, provide some examples of the concept: in this case, striped shirts. And you also provide some random images; as long as the majority of them are not striped, you're fine. And you have this network that you already trained, the one you're interested in interpreting.

Now, what we do is simply collect the activations, the activations f_l, for these pictures, the concept pictures and the random pictures, and you simply train a linear classifier that separates the two and find the vector that is orthogonal to the decision boundary. And what is this vector? Well, it's just a vector that points from the direction of the random activations to the direction of the concept activations. This is not new.
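A minimal sketch of learning such a concept activation vector (CAV). The activations are faked with random arrays here; in practice they come from a real network's layer.

```python
# CAV sketch: linear classifier on layer activations; the unit normal
# to its decision boundary is the concept activation vector.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
acts_concept = rng.randn(100, 512) + 0.5   # f_l(striped images), stand-in
acts_random = rng.randn(100, 512)          # f_l(random images), stand-in

X = np.vstack([acts_concept, acts_random])
y = np.array([1] * 100 + [0] * 100)

clf = LogisticRegression(max_iter=1000).fit(X, y)
cav = clf.coef_[0] / np.linalg.norm(clf.coef_[0])   # unit normal = the CAV
print(cav.shape)
```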
This is like... Vincent talked about word2vec and that vector for gender; despite the nuances there, the linearity of such high-level concepts has been shown over and over again in many papers, and we're just doing the same thing. It's just the simplest way. Now we have this vector. What are we going to do with it to get that quantitative score I talked about, the TCAV score? It's also really simple; to this audience in particular it might seem like a really toy example. We use the directional derivative.

What is it? Well, you just take the probability of zebra and take the derivative with respect to that vector we just got, the "stripedness" vector. So what is this, intuitively? Well, it means: if I move my image slightly toward being more like the concept, or slightly less like the concept, how much would the probability of zebra swing? If it swings a lot, it's an important concept.
If it doesn't, it's not an important concept. It's pretty simple. And we do this for many zebra pictures, and then we do the simplest thing, which is to say: among all the zebra pictures I have, how many of them returned a positive directional derivative? In other words, if I had a hundred zebra pictures, for how many of those zebra pictures did having the concept increase the probability of zebra?
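Written out, matching the notation used here (f_l(x) the activations at layer l, h_{l,k} the map from those activations to the logit of class k, and v_C^l the CAV), the per-example sensitivity and the TCAV score are:

```latex
S_{C,k,l}(x) \;=\; \nabla h_{l,k}\!\big(f_l(x)\big) \cdot v_C^{\,l},
\qquad
\mathrm{TCAV}_{C,k,l} \;=\; \frac{\left|\{\, x \in X_k : S_{C,k,l}(x) > 0 \,\}\right|}{\left|X_k\right|}
```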
Now, this is the simplest definition. You can put in an inequality; you can flip it to test negation, or absence; all sorts of things that fit your bill. But this is just the one number that's simplest. Great, and that's pretty much the entire framework. It's very simple. I view this as a canvas that someone else, someone smarter than me, can take and make fancier, maybe more principled and nicely complete; but that's really it, that's the base. Yes: we have follow-up work where we were trying to discover concepts.

I'll talk about that a little bit; it's a lot harder. If we fully solve that problem, we've solved AI. Now, and it's actually an interesting, important insight: the concept actually doesn't have to come from the training distribution. It doesn't have to be zebras; these are actually just clothing pictures.

But one more question: we have this CAV that we learned in embedding space, but we know that in high-dimensional spaces things are funky. Adversarial examples actually leverage that type of characteristic; because it's high-dimensional, our intuition is out the window, so things are a little funky. So we want to make sure that the CAV we have didn't return sensitivity, a high directional derivative, by chance. There are two ways to check this, qualitatively and quantitatively, but I'll only show you the quantitative one.
You will see something like this nice description. What we do is train another set of CAVs that are random, against a random "concept." What does that mean? It just means that, instead of stripes, we use some random pictures: pictures, or data points, that are still roughly from the training distribution.

So this is a meaningless concept, a straw counterpart, and we do some t-testing (or Welch's test, better said) to decide whether the means of these two distributions are statistically significantly different; and only then do we say this concept might mean something. I think there are better tests. If you have a better, tighter test, let me know; I would love to hear about it. But it's a starting point, and I think it's better than nothing. So that's what we do.
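A minimal sketch of that test with SciPy's Welch option (ttest_ind with equal_var=False); the scores below are made up for illustration:

```python
# Welch's t-test on TCAV scores: real concept CAVs vs. random CAVs.
import numpy as np
from scipy import stats

tcav_concept = np.array([0.81, 0.77, 0.84, 0.79, 0.82])  # stripes vs. random
tcav_random = np.array([0.52, 0.47, 0.55, 0.49, 0.51])   # random vs. random

t, p = stats.ttest_ind(tcav_concept, tcav_random, equal_var=False)
print(f"t = {t:.2f}, p = {p:.4f}")   # small p: concept is likely meaningful
```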
A qualitative test, a second way or a double proof, as they say, in addition to the quantitative method, is that you sort your input pictures with respect to the CAV that you trained and see whether the order aligns with your notion of that concept. If it doesn't, then that means your CAV is not good.

So that's it. I'm gonna show you some results. I might leave some time for questions, so I'm gonna maybe skip the sanity-check experiment. What that does, simply... oh, I guess I didn't include it. I did a very similar thing with the dog-and-scene example, and I confirmed that TCAV is able to match the truth, at least in this toy case. And we compared that with saliency maps: we did human experiments to show that saliency maps mislead people very easily, and when they're misled, they're very confident that their answers are right, because, again, they see what they like, and they say, "I'm as confident as I will ever be."
I like this; it's my favorite part of the results section. But second: we were excited after the sanity-check experiments, and we were like, okay, let's go wild and run this on two widely used ImageNet classification networks. So we did that. We tested different types of concepts: colors, race, and objects. For the first, the color concepts, we tested the fire engine class: red and green come out high. Green is high because a lot of fire engines are on grassy fields, we found out.

This makes sense, unless you're from Australia. Anybody from Australia? Okay. I actually met somebody who knew exactly where this picture was taken: it's from Canberra. It's not everyone in Australia, it's just Canberra, the capital; there's this fire station that has a yellow fire engine. In fact, yellow fire engines are better than red. Can anyone think of why? Oh yeah, good point; actually, maybe that's another reason.
But you know, when we looked at this graph, it made sense to us, yet it wouldn't make sense to people from Canberra, Australia, and that's showing some geographical bias that the model might have learned. Another one, of course, is racial biases, which we confirmed; that's consistent with previous findings. And if you remember the very first DeepDream blog post, where they classified dumbbells: they showed, "oh, we found this neuron, number, I don't know, 37, 38, that shows muscular arms," and they said, hmm, maybe for dumbbell classification the muscular arm is relevant.

So I wanted to test this. I went and Google-searched arms and other object images, and we can now quantitatively say: yeah, arms did matter. For this case I didn't have a dataset, so I collected 33 examples of arms from Google Images, and it passes the statistical testing and everything. And in fact, this is something that was surprising: as more people at Google use this method, people find that you just don't need that many examples to learn
that vector. There's a big caveat, though: maybe there are some domains where you do need more concept examples, but we've seen that 15 to 33 was the right number for the images we used, cancer images and others.

So, going back to the goal that we initially started with: I'm hoping that some of these tools we develop are helpful for you to ensure that your values are aligned in your model. So, really excited, we wanted to go to a real domain, which is working with doctors. We looked at a diabetic retinopathy application. Diabetic retinopathy, or DR, is a treatable condition: if you discover it early, you don't lose your sight, but if you discover it late, that's not good news. Medical Brain has a model that can classify DR pretty well, but our question was: is this model using something that doctors use? Is it using it as if it were a doctor making the decision? Because people are looking into this model and deploying it to areas where there are no doctors.
Then we asked the model, using TCAV: what do you think? For the class that was most severe (at that point, the hope is lost), it looks like it's doing what doctors would have done. The green concepts are the ones that doctors would like to see, and they're high; the red ones are low. Makes sense, and the model's accuracy is pretty high. The story is a little different where the model's accuracy is mediocre.

This is the mild level of DR, and in that case it looks like the model is paying attention to a concept that doctors would not have looked at. We were puzzled by this, and we dug further, and we found out that DR level one has a lot of confusion with DR level two, the next level of severity, even for doctors, and DR level two has a lot of this feature: the one where the vessel has kind of been blown up.
I'm blanking on the name, but it's that: observing a blood vessel that has been blown up, or swollen. So that was a mystery, and then we realized that we should probably clean up the labels before going further. And again, I think this brings us back to the goal: when you want to confirm that domain expert knowledge is being used in the way we would have used it, if that's one of the things you want to know, this tool might be useful.

It's also important to know about some other things that our work points out: limitations of TCAV. The first limitation is that the concept has to be expressible using examples. If you want to express something higher-level, like love, or some other super-high-level thing, I don't know how to do that; TCAV wouldn't work for you. And users need to provide the concept. I find it really interesting that this divides people into two groups: the HCI folks and the domain experts love this. They say:
"oh yeah, I can give you examples, and now it can speak my language." Whereas if you talk to anyone in Brain, they'll be like, "yeah, but can you automatically discover it? Can you take the humans out of the loop?" And we can do this already, sort of. I think both problems are important: one, that you can impose our language on the model, and two, that you can discover something that maybe we never knew before. And that's TBD; we have follow-up work.

However, the danger of this is in what happens when humans see these discovered patches. For example, when you see these patches: what's the concept? Anything else? The first thing that people think of is "floor." But what if it was the brown color? Again, you're kind of injecting your confirmation bias when you interpret this new concept, so there's plenty of challenge left here in how to tease that out. Yeah, you can hardly see it; it's this thing in the floor of the basketball court. I see.
So there are three basketball-class pictures, and our goal is to find a patch of pixels that can make up a cluster, and that becomes our new, discovered concept. And we discovered this concept cluster; these are just ball patches, zoomed in and blown up, and there are many others, like arms and balls and other things. So there's a lot of work there. It's also important to note that none of this is causal; I think causal inference plays an important role in interpretability.

We recently worked on extending this work to causal TCAVs, but the computations are a lot harder. To do it you need a VAE for your model, and things get kind of complicated: you have to add a lot of assumptions; you have to have a causal graph that assumes there are no confoundings other than everything we list in the causal graph. There's a trade-off. You can do this; the code is there if you want to look at it.

And here's a self-promoting slide. I was thrilled that regular people, who are not computer scientists, really started looking at this, which made me so excited, because my original goal was to build a tool for lay people: people who are actually out in the world saving people and solving problems, not, like, PhDs from computer science. A doctor, not the fake kind of doctor. Well, many of you guys are also; I call myself a fake doctor.
So, I talked about this at Google I/O, which was awesome; it made my dad happy more than anything. I went to UNESCO in Paris to receive an award; the UNESCO center has this professional stage, and that was a pretty awesome experience. I was also thrilled to see responses from inside academia. There's this paper (I'm part of it, but I didn't do much work; it's really Carrie Cai's work) where she used this concept vector to help doctors sort images. This is prostate cancer: pictures of pieces of cell specimens. Doctors found it very useful, and it was one of the best paper honorable mentions at the CHI conference, the top HCI conference, this year.

I forgot to put this in, but a couple of weeks ago I met this person from the Aerospace Corporation; his name is Eric, and he used TCAV to interpret a model that predicts storms. So he used it to ask: for this storm concept, is it about the eye of the storm? Is it about something else, the color, other things?
So we have a couple of minutes. We talked about these three methods; however, I think there are a lot more interesting things that I didn't cover. Adversarial attacks: you can attack an interpretability method and make it completely screw up; totally possible. You want robust explanations: if things change a little bit, you still want to give a stable explanation. Another thing that I'm getting much more excited about these days is interpretability for science: can you interpret a superhuman-performance model to discover something that we didn't know before?

Can we add a piece of knowledge to the knowledge of humanity? I think that's a very exciting topic, and I think a lot of people in this audience specifically would love to collaborate on it. If you have interesting data, if you have a model that's interesting, about earthquakes, or mental health, depression or autism, that sort of thing, I'm super excited; let me know if you're interested. Bin Yu at Berkeley is doing some awesome work looking at neuroscience mixed with interpretability for science.
So, in the interest of time, I'm gonna skip this part. I think I've already lectured you that we need to do evaluation properly. We need to remember that humans are biased and irrational. If you love reading: The Undoing Project, about Danny Kahneman, the Nobel Prize winner in economics. He wrote... oh, I guess someone else wrote that book. The Undoing Project beautifully describes what kind of crazy biases humans have, and that we can't get away from them. It's a beautiful book; highly recommend.

We need help from the HCI community, who know how to build interface designs, workflows, and so on; very important. And, by the way, I'm a dog person, but I like putting cats in these slides; they're funny. I think it's a really exciting time to be in the field of interpretability. There are so many things we can do, and we can help people: people who want to do the right thing. We can help them do the right thing and give them the power to be more responsible. Thank you.