From YouTube: 10 - Fairness and Ethics in ML - Emily Denton
Description
Deep Learning for Science School 2019 - Lawrence Berkeley National Lab
Agenda and talk slides are available at: https://dl4sci-school.lbl.gov/agenda
A: Folks, now we have the last lecture of the day. We have with us Emily Denton from Google's Research and Machine Intelligence group. Emily works on developing tools and techniques to promote fair, inclusive, and ethical AI. She was actually very, very generous and agreed to give two talks at this school, so we have her today talking about this topic, and then tomorrow talking about generative models. Awesome. Just a little bit more about Emily: she's particularly interested in detecting and mitigating harmful bias in computer vision systems.
A: When you're doing machine learning for people and social applications, this kind of work is very important, but you can also think about how these issues have analogs in the kind of science work we're going to do: we want models that are robust against issues with our data. Emily got her PhD from the Courant Institute at NYU in machine learning, where her research focused on unsupervised learning and generative modeling. So thank you, Emily.
B: When machine learning is typically taught, we tend to focus on a very narrow range of metrics, and the goal is typically to pick a metric and optimize for it; in that framing, model C is what we should pick here. In this talk we're going to be thinking about other issues that come into play when we're developing models that are going to be out in the world.
B: I'm just going to read this quote from the Guardian from a couple of years ago: "While the neural networks might be said to write their own programs, they do so towards goals set by humans, using data collected for human purposes. If the data is skewed, even by accident, the computers will amplify injustice." So, at a high level:
B: Algorithmic unfairness refers to the myriad ways in which harmful societal biases can get embedded in algorithms and lead to unjust or prejudicial treatment of people based on all sorts of sensitive characteristics, like race, gender, sexual orientation, disability status, and so on. Without really careful consideration of the societal context within which these systems are being built, patterns of structural inequity that are reflected in datasets can easily become embedded in these models. I'm just going to go through a couple of examples of what I mean by this.
B: Before that, though, I want to make a quick clarification point. There's a distinction between the way the term bias is used in statistics and the type of bias I'm referring to here. Machine learning is, at its core, about discrimination between categories; throughout this talk, I'm going to use bias to refer to unjustified discrimination against people, not the core machine learning use of the term. Okay. So here's a really good example.
B: This was proprietary software developed by the company Northpointe, and the algorithm used information about defendants' socioeconomic status, family background, neighborhood crime, and other such factors to reach a supposed prediction of their individual risk. ProPublica did an audit of this algorithm and found that the tool correctly predicted recidivism about 61% of the time, but there was a clearly racialized pattern in the errors that were made: the tool was twice as likely to falsely flag a Black defendant as a future criminal.
B: Things like this have serious potential to exacerbate existing inequality in our society. Machine translation is another example where social biases get embedded into a system: here we see that stereotypically male words get translated with a male pronoun, whereas stereotypically female words get defaulted to female pronouns. This is something Google actually fixed earlier this year, so that's good. Okay, so there are lots of different ways of characterizing the different types of harm that can result from ML systems.
B: One is material harm: if you have a hiring algorithm that is disproportionately recommending certain groups of people to be hired, that would fall into this category. Representational harm occurs when technology reinforces stereotypes or diminishes specific groups, and thinking about this kind of harm is really important because it highlights the fact that machine learning plays a really important role in representations of identity in our society.
B: Another thing I think is often overlooked in the ML fairness world is the tendency of Silicon Valley, generally, to propose technical fixes to social problems, often without fully understanding the social problem and the structural conditions underlying it.
B: Most of us, I think, come from science and engineering backgrounds, and so we're taught to abstract away the problem, get to the meat of it, and try to come up with a nice, simple solution. Often that can be part of a solution, but it isn't the whole thing, and I think it's really important to recognize when we're falling into these kinds of patterns. This is often even worse when data-driven prediction systems are touted as more objective and more neutral because they are based on data.
B: This is a form of exclusion and subordination built into the various ways in which priorities are established and solutions are defined in the tech industry. Similarly, the allure of objectivity is really dangerous: as this scholar puts it, when bias is routed through technoscience and coded as scientific and objective, it becomes even more difficult to challenge and to hold individuals and institutions accountable. This is why it's really, really important, from the beginning of the design process, to understand what problems we're trying to fix.
B: Is there a technical solution? Is that the entire solution? Are you engaging with the relevant stakeholders? And so on. So, a really quick thing: why is this important to all of you? You're all coming from different science backgrounds, and a lot of the harms I'm going to go through today come from models being built on social data and deployed in a social setting, which may not seem immediately relevant to you.
B: First, and this is a little cliche, but with great power comes great responsibility. You're learning a lot of really powerful tools this week, and you might initially think you're only going to leverage these tools in your own fairly narrow discipline, but machine learning is increasingly touching every aspect of our lives. So I think it's really important that, as you acquire these skills, you understand how these systems are being deployed, and also understand that this is a pretty unregulated area: there are a lot of different ways in which machine learning models are being used, so be conscious of the broader ethical considerations.
B: Even if you don't think these considerations immediately apply to your work right now, they might. Secondly, I'd like to emphasize that no science or technology is ever developed in a vacuum. Technologies are often depicted as neutral, as developed outside of political and social contexts, but this is, I think, and a lot of scholars argue, just false. Most science and technology
B: inadvertently or explicitly embodies particular social relations. There are a lot of classic examples of this, which I'll skim through for the sake of time, but we've seen that the ways in which we design the material world have the potential to reflect and reinforce social hierarchies, or also to subvert them. Similarly, people might say physics is physics, it's neutral, it's science, but take modern photography as an example: it is very much not value neutral.
B: Modern photography was developed with a very white norm coded into camera sensors, and this now affects a lot of different sensors that are deployed in the world; it took a very long time for the photography industry to even take notice of this. There's another great paper I'll direct you to, by Ben Green, whose basic argument is that data scientists need to recognize themselves as political actors engaged in the normative construction of different aspects of society. Okay.
B: And then finally, this is important for you because a lot of the best practices we'll see for ethics-informed design and development are also just good practices generally. A lot of them deal with accountability, transparency, and interpretability, and that's important for everybody in machine learning. Cool. So the typical ML modeling paradigm is broken down into data collection and then model choices, and a common way of thinking about this, that I think a lot of people hold, is the following.
B: The data reflects the world, and then, if we fit an algorithm to it, the better we fit it, the better the model is going to be. I'm also going to focus a little bit on the downstream use cases of all of this. So, starting with data: a lot of people think that data just reflects the world. As I said, it doesn't; data is never neutral.
B: It's always some kind of representation of reality, filtered through various human processes. So I'm going to go through a couple of different types of biases that might get into your dataset. Again, a lot of the harms resulting from these occur within a social setting, but understanding dataset bias is relevant everywhere we use data. Sampling bias occurs when a dataset is not representative of the underlying population of interest.
B: We see a lot of common image datasets in machine learning exhibiting gender, racial, and geographic biases. You may also have biases based on the types of instruments you're using to collect your data, the times of day, the conditions, and all of these kinds of things; it's really important to be cognizant of all of this when you're developing your datasets.
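As a minimal sketch of this kind of composition check, here is what auditing a dataset's metadata might look like with pandas; the columns and values are made-up placeholders:

```python
# Minimal sketch of auditing a dataset's composition before training.
# The metadata and column names here are hypothetical placeholders; in practice
# you would have one row of metadata per example in your dataset.
import pandas as pd

meta = pd.DataFrame({
    "label":   ["cooking", "cooking", "coaching", "coaching", "cooking", "coaching"],
    "gender":  ["F", "F", "M", "M", "F", "M"],
    "country": ["US", "US", "US", "UK", "DE", "US"],
})

# How is the data distributed across countries and genders?
for col in ["country", "gender"]:
    print(meta[col].value_counts(normalize=True), "\n")

# Are demographics balanced within each label?
print(pd.crosstab(meta["label"], meta["gender"], normalize="index"))
```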
B: This is just another visualization that highlights that about 60 percent of the data comes from the six most represented countries, in North America and Europe. Unequal distribution of demographics within each class is also an important thing: this dataset was found to have significant gender biases across the different activities present in it. Human reporting bias is another issue. This basically means that the frequency with which people write about actions and outcomes doesn't reflect real-world frequencies.
B: This is a nice example from learning on text: if you just trained a model on text, you would think that people are murdered far more often than they exhale, but obviously that's not true. So again, the things our data are telling us are not necessarily representative of what is actually happening in the world. I like this example.
B: If you ask people what they see here, they'll typically say bananas, or a bundle of bananas, and if you show people this image instead, they'll say green bananas. This is because people tend to mention the things that are not prototypical: yellow is a very prototypical color for a banana, so it won't be mentioned, whereas green is novel and interesting and salient in this particular image, so it gets mentioned. This can also carry over into social stereotypes, right?
B: Furthermore, stereotypes are internalized associations that occur through natural processes of learning and categorization, and implicit biases are super pervasive. They operate largely unconsciously, and they can automatically influence the ways in which people see the world. These are really, really important when we're having people annotate our datasets, and I think something that is really overlooked in machine learning is the question of what types of cultural biases are getting into your datasets
B: purely through the annotation process. Here, for example, this might look like captioning one of these images "doctor" and the other "nurse". Racial stereotypes are also really important; again, it's important to tie these back to the kind of social setting in which these systems may be deployed. So, ultimately, data is not a neutral reflection of reality; that's just really important to remember throughout.
B: Do some adversarial testing, find some new data, try to understand these patterns. And again, this has happened before; throughout history we've seen the light-skinned male body taken, consistently, as the default, time and time again. We've seen this with camera sensors, as I mentioned earlier, we've seen it with crash-test dummies, and we see it in medical research. So the types of things we're encountering here are not new. Cool, so again, disparities in accuracy.
B: This is looking at toxicity classification of comments. None of these example comments are toxic, but the model predicted otherwise, just because in the sources the model was trained on, certain identities were overwhelmingly referenced in offensive ways and underrepresented in positive ways. Smart Reply is another example. I'm going to skip through some of these for the sake of time, but you can look through the slides; there are a lot of examples of this. Cool, so, some solutions. Again, many of the solutions start at the data level.
B: Collecting representative training data is obviously a first step; this is an example of some work that did this, with IBM releasing a diverse face dataset. Testing models on representative data to try to find breaking points, a kind of adversarial testing, is really important. Algorithmic audits: this is some work where folks were looking at public APIs and doing really thorough and rigorous testing. Dataset transparency efforts are also really important.
B: This is an example of something that applies very, very broadly, and I would love for it to catch on more widely. There are a couple of different frameworks for comprehensive dataset documentation that have been proposed, such as datasheets for datasets, nutrition labels, and data statements. Each of these provides a slightly different framework for documenting dataset collection and annotation methodologies, and the overarching goals are twofold. First, for dataset creators, the aim is to provide a process that encourages a reflexive
B: look at the way the dataset was created, distributed, and maintained, as well as making clear any underlying assumptions that went into the dataset's creation; dataset creators are also encouraged to think about harms and implications of use, things like that. Then, on the dataset consumer side, this really facilitates informed decision making. I think this is something that should just generally be adopted by people who are creating and publishing datasets.
B: So basically, in summary, if you're working with data, collecting data, annotating data, or using existing data, there's a wide range of questions I think people should always be asking, such as: Where did the data come from? Who collected it? Is the data representative? Is there measurement error or reporting bias? If there are annotations, are they subjective? Are there statistics on inter-annotator agreement? Could the annotations themselves be reflecting harmful social stereotypes? Does the data itself reflect patterns that would be harmful to reproduce or amplify?
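A minimal sketch of how these questions could be captured in a lightweight, datasheet-style record that travels with a dataset; the fields below paraphrase the questions above and are illustrative, not an official documentation schema:

```python
# Lightweight, datasheet-style record paraphrasing the questions above.
# Illustrative structure only, not the official datasheets-for-datasets schema.
from dataclasses import dataclass, field

@dataclass
class DatasetDocumentation:
    name: str
    provenance: str                  # where did the data come from, who collected it?
    collection_method: str           # instruments, time period, sampling strategy
    known_sampling_gaps: list = field(default_factory=list)   # under-represented groups or conditions
    measurement_or_reporting_bias: str = ""
    annotation_process: str = ""     # who annotated, instructions, inter-annotator agreement
    subjective_labels: bool = False
    stereotype_risks: str = ""       # could annotations encode harmful stereotypes?
    harmful_patterns: str = ""       # patterns it would be harmful to reproduce or amplify
    intended_uses: list = field(default_factory=list)
    out_of_scope_uses: list = field(default_factory=list)
```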
B: Okay, so, moving on, I'm going to talk a little bit about designing and testing with fairness and inclusion in mind. The first thing we could consider is looking at multiple evaluation metrics when training and evaluating a model. This is important because each evaluation metric provides different types of information about your model, and understanding the implications of different types of errors is useful for reasoning about acceptable trade-offs between false positives and false negatives, and so on.
B: In addition to looking at multiple error metrics, something you can do is break down your quantitative analysis by different groups of data. Within the ML fairness community this typically looks like breaking down your dataset along demographic or cultural lines, but you could also imagine doing this type of analysis broken down by the conditions under which the data was obtained.
B: You might break things down by different devices, things like that, really just trying to understand more fine-grained patterns in your model's performance. This is a good example of some early work that did this, the Gender Shades paper. They adopted this approach, and the key thing is to look not just at unitary groups, here gender-based groups and skin-tone-based groups, but also at intersections of those groups, because this can tell you more than aggregated statistics can.
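A minimal sketch of this kind of disaggregated evaluation, assuming a pandas DataFrame with hypothetical y_true, y_pred, and subgroup columns:

```python
# Minimal sketch of disaggregated evaluation: several error metrics per subgroup
# (and per intersection of subgroups) instead of one aggregate score.
# Column names ("gender", "skin_tone") are illustrative, not from any real dataset.
import pandas as pd
from sklearn.metrics import confusion_matrix

def group_metrics(df, group_cols):
    """Return sample count, accuracy, false positive rate, and false negative rate per group."""
    rows = []
    for keys, g in df.groupby(group_cols):
        tn, fp, fn, tp = confusion_matrix(g["y_true"], g["y_pred"], labels=[0, 1]).ravel()
        rows.append({
            "group": keys,
            "n": len(g),
            "accuracy": (tp + tn) / len(g),
            "fpr": fp / (fp + tn) if (fp + tn) else float("nan"),
            "fnr": fn / (fn + tp) if (fn + tp) else float("nan"),
        })
    return pd.DataFrame(rows)

# df has columns y_true, y_pred, gender, skin_tone (hypothetical):
# print(group_metrics(df, ["gender"]))               # unitary groups
# print(group_metrics(df, ["gender", "skin_tone"]))  # intersectional groups
```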
B: So, model cards. This is another really nice framework; it's complementary to the dataset documentation frameworks I was just describing. The idea here is a formalized set of protocols that puts forward model evaluation with fairness and inclusion in mind. Actually, before I go on: the goals of this are basically twofold, very analogous to the dataset documentation work.
B: For model creators, it encourages thorough and critical evaluation, as well as really thorough consideration of intended uses and things like that; for model consumers, it facilitates informed decision-making. Basically, a model card lays out the model details, the intended uses, and the different factors relevant to the analysis.
B: Again, this might mean breaking down the analysis by different devices, different conditions, different subgroups, things like that; a brief summary of what the training data and testing data looked like; the metrics that are proposed, why they are relevant, and what the trade-offs between the different metrics are; then the quantitative analysis itself; and then, additionally, an ethical considerations section, which I think is increasingly important to include with any kind of model.
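A minimal sketch of a model-card-style record mirroring the sections above; the field names are illustrative rather than the exact schema of the model cards framework:

```python
# Skeleton of a model-card-style record following the sections described above.
# Field names are illustrative; see the "Model Cards for Model Reporting" paper
# for the actual framework.
model_card = {
    "model_details": {"name": "...", "version": "...", "owners": "..."},
    "intended_use": {"primary_uses": [], "out_of_scope_uses": []},
    "factors": ["device type", "capture conditions", "demographic subgroups"],
    "metrics": {"chosen": ["false positive rate", "false negative rate"],
                "rationale_and_tradeoffs": "..."},
    "training_data": "brief summary of the training data",
    "evaluation_data": "brief summary of the evaluation data",
    "quantitative_analyses": "disaggregated results, e.g. per-subgroup FPR/FNR",
    "ethical_considerations": "risks, sensitive uses, potential harms",
}
```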
B: Okay, so now I'm going to get into something called fairness definitions. There's been a real flurry of work recently on translating complex social notions of fairness into precise mathematical formulations. These are frequently referred to as fairness definitions, and I have "fairness" in quotes here because I think it's really important to recognize that these are mathematical abstractions and formulations; they are attempts to capture different social notions of fairness.
B: Generally speaking, these definitions assume that each data instance is associated with some protected attribute. These attributes frequently reflect protected groups in society that have often been historically marginalized, and they are leveraged in different ways throughout the definitions. Many of the definitions try to capture something related to a couple of different notions of social fairness: disparate treatment and disparate impact. Disparate treatment is often thought of as explicit discrimination based on some sensitive attribute, whereas disparate impact is more about the longer-term, disproportionate effects on a group.
B: One thing that people have tried is fairness through unawareness, and this basically just means: don't let your predictor have direct access to any of the sensitive attributes, like race or gender and so on. This framework has pretty severe limitations, simply because sensitive attributes have a lot of proxies. For example, due to the history of segregation in this country, zip code is a really, really good proxy for race.
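A minimal synthetic sketch of why this fails, using made-up features in which a zip-code-like variable acts as a proxy for group membership:

```python
# Minimal synthetic sketch of why "fairness through unawareness" fails: the
# sensitive attribute is dropped, but a proxy feature (a fake "zip code")
# still encodes it almost perfectly. All numbers and feature names are made up.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
group = rng.integers(0, 2, n)                   # sensitive attribute (never shown to the model)
zip_code = group * 10 + rng.integers(0, 3, n)   # proxy: zip codes cluster by group
income = rng.normal(50 + 10 * group, 5, n)      # another feature correlated with group
# Historically biased label: depends on group membership, not just "merit"
label = (income + 15 * group + rng.normal(0, 5, n) > 65).astype(int)

X = np.column_stack([zip_code, income])         # "unaware" features: no group column
pred = LogisticRegression().fit(X, label).predict(X)

for g in (0, 1):
    print(f"group {g}: positive prediction rate = {pred[group == g].mean():.2f}")
# The rates differ sharply even though the model never saw `group` directly.
```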
B: Demographic parity is another definition that people have put forward, and it basically asks that the rate of positive predictions be equal across different groups, where again these groups might be defined along racial or gender lines: the rate of positive predictions should be equal regardless of the sensitive variable. So if we had a system that was, I don't know, deciding whether or not to hire people for the job of CEO, then this definition would say you have to hire
B: equal numbers of men and women, for example. Equality of odds is another condition. This is basically mathematically the same as the last one, except now we're also conditioning on the true value of the target variable.
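In symbols (the notation here is mine, not from the slides: Ŷ is the model's prediction, Y the true label, and A the sensitive attribute), the two definitions read roughly as:

```latex
% Demographic parity: the positive prediction rate is independent of the group A
P(\hat{Y} = 1 \mid A = a) = P(\hat{Y} = 1 \mid A = b) \quad \text{for all groups } a, b

% Equality of odds: the same condition, additionally conditioned on the true label Y
P(\hat{Y} = 1 \mid A = a, Y = y) = P(\hat{Y} = 1 \mid A = b, Y = y) \quad \text{for all } a, b \text{ and } y \in \{0, 1\}
```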
Okay. So, I believe there's an entire talk later this week on interpretability methods, so I'm not going to go into too much detail, but I'll speak briefly about a few things. Basically, neural networks are black boxes: they take inputs and they produce outputs.
B: This is cool, and they're really powerful, but it also means that, if we're using these systems in ways that are going to affect people in really significant ways, we need to understand why predictions are being made. This is important for understanding the causes of different types of biases, but also for giving people a feeling of control and a mechanism for coming back and saying: hey, I feel like this decision was unfair. So, TCAV is a really nice method.
B: I believe the creator of TCAV is coming tomorrow, so I think she'll go into a lot more depth, but I really like this method. Basically, it lets you ask a question like: how important is the concept of gender to this "doctor" classifier? It's an interpretability method that works in a kind of high-level conceptual space, as opposed to providing very low-level, pixel-wise explanations, and so it's quite nice for understanding biases of classifiers.
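A rough, simplified sketch of the idea, with placeholder arrays rather than a real model (see the TCAV paper and library for the actual method): fit a linear classifier separating activations of concept examples from random examples, take its normal as the concept direction, and measure how often the target-class logit increases along that direction.

```python
# Simplified sketch of concept activation vectors; placeholder data, not the
# actual TCAV implementation.
import numpy as np
from sklearn.linear_model import LogisticRegression

def tcav_score(concept_acts, random_acts, logit_grads):
    """concept_acts, random_acts: (n, d) activations at one layer.
    logit_grads: (m, d) gradients of the target-class logit w.r.t. those activations."""
    X = np.vstack([concept_acts, random_acts])
    y = np.r_[np.ones(len(concept_acts)), np.zeros(len(random_acts))]
    cav = LogisticRegression().fit(X, y).coef_[0]   # concept direction (normal of the separating plane)
    cav /= np.linalg.norm(cav)
    sensitivities = logit_grads @ cav               # directional derivatives along the concept
    return (sensitivities > 0).mean()               # fraction of inputs where the concept pushes the logit up

# Placeholder usage with random data:
rng = np.random.default_rng(0)
print(tcav_score(rng.normal(1, 1, (50, 32)),
                 rng.normal(0, 1, (50, 32)),
                 rng.normal(0, 1, (200, 32))))
```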
B: Counterfactual methods are another really nice approach. The idea here is to try to isolate causal effects by asking questions of the form: if this one specific thing had changed, all else being equal, how might the prediction be different? There's a huge history of these types of analyses being done outside of the machine learning context, where they're referred to as audit studies. People have looked at bias in hiring decisions, for example, by sending out sets of resumes with only the names changed, so you would have a stereotypically female name or a male name, or names that correlate with different racial groups, and then looking at the callback rates while everything else in the resume stays the same.
B: These papers are interesting to read if you're interested in machine learning fairness, but I would follow them immediately with this paper by Issa Kohler-Hausmann, who talks a lot about how race and gender are not really entities that can have counterfactual causality. This has been looked at a lot in social statistics; there's a huge history of this. It really only makes sense to talk about something having counterfactual causality
B: if it is meaningful to talk about two units that are identical except for that one thing, and given the ways in which race and gender structure every aspect of our society, this is kind of a nonsensical thing to talk about. So again, interesting papers, but I think they need to be followed up with a much more nuanced understanding of how racial and gender-based oppression work in our society. Counterfactual fairness in text: this is a nice thing.
B: Basically, if you flip certain tokens in the text, how does the model's prediction change? And this is another approach that uses generative techniques to manipulate images and see how the classifier's response changes.
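A minimal sketch of this kind of counterfactual token substitution for a text classifier; `score_toxicity` below is a hypothetical stand-in for whatever model is being probed:

```python
# Minimal sketch of counterfactual token substitution for a text classifier:
# swap identity terms, all else equal, and compare the model's scores.
def counterfactual_scores(sentence, term_pairs, score_fn):
    """term_pairs: list of (original, substitute) identity-term pairs."""
    results = {"original": score_fn(sentence)}
    for old, new in term_pairs:
        if old in sentence:
            results[f"{old} -> {new}"] = score_fn(sentence.replace(old, new))
    return results

# Hypothetical usage (score_toxicity is a placeholder scoring function):
# counterfactual_scores("I am a straight person", [("straight", "gay")], score_toxicity)
# Large gaps between the original and substituted scores flag identity-dependent behavior.
```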
Cool. So, visualizations and other exploratory tools can also answer questions and point to surprising issues in machine learning systems. This is a tool called Facets; it's an open-source visualization tool that I believe came out of Google.
B: It can basically help you understand and analyze machine learning datasets, so go look at that. Then TensorFlow Model Analysis (TFMA) is another tool: it provides scalable, sliced or full-pass metrics, lets you look at your model's performance broken down by different subgroups, and gives really nice aggregate statistics. It's another useful tool.
B: Another important thing to consider, throughout all of these questions, is who got a say and who was consulted in the design and development process. As we've seen, machine learning models learn from historically collected datasets, and so populations that have experienced individual or structural biases in the past are often the most vulnerable to harm resulting from these models. So it's really important to include the voices of domain experts, but also the perspectives of people with lived experience.
B: This is just a nice paper; I've focused a lot on vision and NLP applications so far, and this one looks at machine learning used within a clinical setting. We see machine learning being increasingly used to improve diagnosis, treatment, and health system efficiency, and this paper really looks at how model design, data collection, and all of these things could be done in a proactive way to increase health equity. Cool. So we've looked at data.
B: Earlier I talked about some facial analysis systems and how a lot of studies have shown that they fail to recognize darker-skinned individuals. While understanding these kinds of differential patterns in performance is important, there are a lot of examples of technologies where just equalizing statistics across different groups is not going to be sufficient to solve the problem. Face recognition is an example where the technology is dangerous if it works really well for everybody, and it's dangerous if it doesn't.
B: There were no hits, but this guy looks a lot like Woody Harrelson, so they go off and google Woody Harrelson, pull a photo of Woody Harrelson, feed that photo into the face recognition system, get a hit from that, and then go investigate that lead. Obviously, this is not the way machine learning models are supposed to be used; this is pretty dark. But these are the kinds of scenarios in which the technology might be used, and
B: ultimately, if these tools are going to be released into the world, we need to be thinking about these things from the beginning. Gender classification is another example: there's no way to make gender classification "good"; this isn't an issue of minimizing error statistics across different groups, and there's some further reading here. Sexual orientation classification is another thing where, just, don't build it. There are a couple of papers here that just won't die; they keep
B: resurfacing. Some colleagues of mine wrote a great Medium post last year that debunked one of these papers; read that, it's good. And this is a startup that markets itself as being the first to market with proprietary computer vision and machine learning technology for profiling people and revealing their personality based only on their facial image, and it will predict things like "high IQ", "white-collar offender", and "terrorist" just from an image. So again, just think about it if you're building something like this.
B: Yeah. This is another one; it isn't from the same startup, but this is another paper that came out predicting criminality. Again, some colleagues of mine wrote a blog post basically showing that this article is all about how the angles between the nose and the corners of your lips are predictive, but that's also predictive of things like smiling. So really, what are you doing here?
B: So, just be aware of the history of the types of problems that you're trying to solve and the ultimate impact they could have. This is a good blog post, written in response to that criminality-prediction work, but it also goes through a whole history here, so I would recommend reading that as well. Overall, social context is important.
B: A lot of machine learning fairness work focuses very specifically on equalizing error statistics across different groups, and this is a really important part of understanding how technologies are working and a really important thing to consider when you're building and designing machine learning technologies, but assessing fairness purely at an algorithmic level is insufficient. Ultimately, we need to understand: how are these systems going to be used?
B: What are the social structures that they're going to slot into? We need that sort of fully contextualized understanding when we're trying to build fair, equitable, just, and inclusive machine learning models. So, I initially phrased things like this, right? You had your data, you had your model, and you might think about intended uses, and hopefully you're documenting all of this, and hopefully you're...