Description
Panel: The Future and Ethics of AI with Alex Housley of Seldon, Frederick Kautz of Doc.ai and Daniel Riek of Red Hat.
Filmed October 28th, 2019 in San Francisco.
B
Okay, so I'm Erin Boyd; I spoke a little bit earlier about storage, and I'm going to be your moderator. If you are just joining us, I work for Red Hat in the Office of the CTO, primarily working on hybrid cloud and multi-cluster capabilities for storage. But enough about me; I'm not the smartest one on this panel.
A
Good afternoon. I'm Alex Housley, founder and CEO of Seldon. We're an open-source machine learning deployment platform providing model serving, model management and governance. So, good question: my most controversial view is really that this AGI thing that people are talking about is probably unlikely to happen.
A
The topic quite quickly moves on to this kind of singularity, and the world of superintelligence, and I think that is quite a way off, if not unlikely to happen at all. There are many more exciting things changing the world, revolutionising whole industries, with the technology that we currently have available today. So I think it's really good to focus on that.
D
I'm Frederick, head of edge infrastructure over at Doc.ai, which does medical AI. I've also worked extensively in the open source community, in networking, and one of the things I've focused on is bringing things like AI into the infrastructure. My most controversial opinion, I think, among many: I don't think we're doing enough socially to work out what to do when AI starts to replace jobs, and we need to start focusing on that now and not wait.
B
Those are all really great points; thank you. I'm glad you touched on how AI is going to improve our lives. It's not necessarily doom, except maybe for some jobs; for the most part, AI seems to be seen as a bit of a savior, enhancing our lives, a new technology that's going to revolutionize everything. But we're also seeing how that technology can exploit people, their data and their privacy. So, since this panel is about AI and ethics: tell me why we need ethics in AI. Frederick?
D
So I think you framed it very well. AI is going to be everywhere, and it's something that is absolutely going to improve many aspects of our lives. But unlike other tools, AI is particularly interesting in that, as we start to build more AI models, the types of things we decide to build, or excuse me, the types of things we decide to train them on, the types of biases that exist in the datasets, are amplified.
D
It's not just "hey, I have this tool, I'm going to use the tool once." It's "I built this tool, it's fully automated, it learns, and it's going to do this thing over and over and over again." So we need to make sure that when we build something, it reflects well on our tech systems, on how we approach things.
D
And so we need to form these kinds of thoughts and have these kinds of discussions, which are absolutely important to have, so that we can all be aware of the issues. Even if we don't have all the answers today, just the fact that we're a little bit more aware of them means we can drive in the right direction and come up with a fairer and more beneficial outcome.
A
I'd say it's not so much about enforcement, but more about how you can put in place the processes, tools, etc. to enable an organization to actually deploy machine learning models in a fair and ethical way. Data science in itself was, and in many cases still is, a big challenge for organizations.
A
So that covers the kind of challenge between data science and DevOps, and how the two teams collaborate through deployment. But the things which really matter to an organization as a whole, and to execs and people at board level, are: will my organization get fined by regulators? Will we get reputational damage? Will we kill people by accident?
C
Yeah. The way AI is used today, it's basically in use cases where you have limited liability, or where you are scapegoating someone else with the liability. If I drive a self-driving car from a famous car maker from California, it drives itself on the highway at, well, I'm not going to say how fast it's going.
C
When a crash happens, it's a big scandal, and it's a whole different story from humans: self-driving cars have a much better track record than humans, as in they kill fewer people per million miles driven. But still, if it happens once or twice, it's a big, big scandal. And the way they're working around it, basically, is by telling you, as the driver, that you're still responsible, even though everyone knows you're not living up to that responsibility.
C
The whole point of having that car is so you don't have to be. That works for now, but it doesn't work in the long term. And if you look at most serious applications of AI, the lack of explainability, the lack of control around it, is the biggest inhibitor to the actual use of AI in many very beneficial areas. We are confining it to these kinds of scapegoat areas, or confining it to giving advice.
B
And when we talk about liability, that also enters into the realm of privacy. When we create a new model and we're training that model, we're using personal data most of the time. So what is being done within the community to help protect users' data, or randomize the data as the model learns, so that we're protecting user data and lowering the liability of the models being created? Do you want to start off with that, Frederick?
D
Sure. So there are a couple of things you can start off with. Very commonly, people start with things like anonymized datasets. I think we need to be a bit careful with those, though, because even if you have a dataset that's anonymized in isolation, the moment you start to pair it up with Twitter data or Facebook data, you can often de-anonymize many of these datasets.
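To make that concrete: the pairing Frederick describes is known as a linkage attack, and it is easy to demonstrate. A minimal, purely hypothetical Python sketch (the records are invented; the merge on shared quasi-identifiers stands in for joining a leaked dataset against scraped social-network data):

```python
import pandas as pd

# "Anonymized" medical records: names removed, quasi-identifiers kept.
medical = pd.DataFrame({
    "zip":       ["94107", "94107", "10001"],
    "birthdate": ["1980-02-01", "1975-06-12", "1980-02-01"],
    "sex":       ["F", "M", "F"],
    "diagnosis": ["diabetes", "flu", "asthma"],
})

# A public profile scraped from a social network.
public = pd.DataFrame({
    "name":      ["Alice Example"],
    "zip":       ["94107"],
    "birthdate": ["1980-02-01"],
    "sex":       ["F"],
})

# Joining on the quasi-identifiers re-attaches a name to a diagnosis.
linked = public.merge(medical, on=["zip", "birthdate", "sex"])
print(linked[["name", "diagnosis"]])   # Alice Example -> diabetes
```

Even with names stripped, the combination of zip code, birthdate and sex is frequently unique enough to single out one person, which is why anonymization in isolation is not sufficient.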
D
So, in terms of trying to protect user information, this is something I think we should have a lot of training and focus on: how do we develop and use techniques that are designed to still learn the signal of a population, the signal of your dataset, but not learn any individual part of that dataset? There are techniques emerging. We have things like federated learning, where you leave the data where it is, remotely; you send the model over to it and you train on it.
D
You send the results back, so you never have to centralize the data.
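To illustrate the loop he is describing (broadcast the model, train locally on data that never moves, send only weights back), here is a minimal federated-averaging sketch in Python on synthetic data. It is not Doc.ai's implementation, just the shape of the idea:

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, steps=10):
    """Train the received model on one client's private data."""
    w = weights.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)   # MSE gradient
        w -= lr * grad
    return w                                     # only weights leave the client

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])

# Three clients, each holding data the server never sees.
clients = []
for _ in range(3):
    X = rng.normal(size=(100, 2))
    clients.append((X, X @ true_w + rng.normal(scale=0.1, size=100)))

global_w = np.zeros(2)
for _ in range(20):
    # Server broadcasts the model; clients train locally and reply.
    updates = [local_update(global_w, X, y) for X, y in clients]
    global_w = np.mean(updates, axis=0)          # federated averaging

print(global_w)   # approaches [2, -1] without centralizing any data
```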
You also have other techniques like differential privacy, where you add noise at certain points while you're training the model. What this noise does is add plausible deniability into the model itself, in such a way that it becomes very difficult to extract information about any given user out of it. But the noise is centered around zero.
D
So you still preserve the signal. This technique is actually used very often for sensitive questions in statistics. You might ask a person, "Hey, have you tried cocaine in the past year?" If you just ask that question flat out, people will say no for a variety of reasons. But if you put the person into, let's say, an isolated box, give them a coin, and say: okay, flip the coin.
D
If the coin comes up heads, answer the question truthfully. If it was tails on the first flip, flip the coin again; the second flip replaces your real answer: if it comes up heads you write yes, if it's tails you write no. Then, when someone says "oh, you answered yes to this," you can say "I was answering the coin toss question," and so it gives them plausible deniability.
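That coin-flip protocol is known as randomized response, and simulating it shows both halves of the trick: each individual answer is deniable, yet the population-level rate is recoverable because the noise process is known. A short sketch with a hypothetical 10% true yes-rate:

```python
import random

def randomized_response(truth: bool) -> bool:
    """Heads on the first flip: answer truthfully.
    Tails: flip again, and let the second coin answer for you."""
    if random.random() < 0.5:       # first coin came up heads
        return truth
    return random.random() < 0.5    # second coin: heads -> "yes"

# A sensitive survey where 10% of respondents would truthfully say yes.
population = [random.random() < 0.10 for _ in range(100_000)]
answers = [randomized_response(t) for t in population]

# P(yes) = 0.5 * true_rate + 0.25, so true_rate = 2 * P(yes) - 0.5.
observed = sum(answers) / len(answers)
print(f"observed {observed:.3f}, estimated true rate {2 * observed - 0.5:.3f}")
```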
D
It turns out those same techniques work while you're training models. We can apply these kinds of techniques in such a way as to help preserve privacy during training.
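In the training setting, the analogous move (in the spirit of DP-SGD) is to clip each record's gradient contribution and add zero-mean noise to the aggregated update, so no single record dominates while the average signal survives. A rough sketch on a toy linear model; the noise_scale here is arbitrary, not a calibrated privacy guarantee, and real systems also track a privacy budget:

```python
import numpy as np

rng = np.random.default_rng(1)

def dp_gradient_step(w, X, y, lr=0.1, clip=1.0, noise_scale=0.5):
    """Clip per-example gradients, average, then add zero-mean noise."""
    per_example = 2 * (X @ w - y)[:, None] * X          # one gradient per record
    norms = np.linalg.norm(per_example, axis=1, keepdims=True)
    per_example *= np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    grad = per_example.mean(axis=0)
    grad += rng.normal(scale=noise_scale * clip / len(X), size=grad.shape)
    return w - lr * grad

X = rng.normal(size=(1000, 2))
y = X @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=1000)

w = np.zeros(2)
for _ in range(200):
    w = dp_gradient_step(w, X, y)
print(w)   # the noise is centered on zero, so the signal is preserved
```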
So even if you have no intention of ever sharing the model, perhaps the model is stolen by some group of attackers and ends up on the dark web; you still have some protection for the users you trained the model on.
D
So I think it's very important that these kinds of techniques become not only well known but mature and standardized across the industry. They do require more data to train on, but as we develop as an industry, we're going to get better at training on large quantities of data, and also develop techniques that still allow us to train on smaller datasets while maintaining these privacy properties.
B
After you've developed the model and you've done what you can to anonymize the data or add noise, so that it's "fair," quote-unquote, you also have to be able to say how you got that result. Where is the explainability around it? Alex, do you want to talk about that?
A
Yeah. One of the big challenges around machine learning is that effectively you're pushing large datasets through complex algorithms and producing a model which has millions of features and, effectively, rules which are not interpretable by people. People often refer to these models as a black box, and there's really a trade-off between the performance or accuracy of the model and its interpretability.
A
If we take that self-driving car example, the car will crash less with a deep learning neural network model, which is totally uninterpretable, than with something more transparent. So the challenge there is really: how do we still produce an explanation that is interpretable by humans, but doesn't require you to use a substandard model? There's a variety of techniques emerging, mostly through open research and open source projects.
A
You're then able to present an explanation back, whether to a data scientist looking to debug the model, or to someone sitting on a customer service desk who needs to speak to a customer. It's possible to explain a prediction in terms of which features had what impact on the output, and it can be very easily visualized.
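One simple, model-agnostic way to produce that kind of feature-impact readout is permutation importance: shuffle one feature at a time and measure how much the model's score degrades. A small sketch with a stand-in "trained model" (illustrative only; the production explainers Alex is alluding to are more sophisticated):

```python
import numpy as np

def permutation_importance(predict, X, y, metric):
    """Score drop when each feature's values are shuffled in turn."""
    rng = np.random.default_rng(0)
    baseline = metric(y, predict(X))
    drops = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])    # destroy feature j's signal
        drops.append(baseline - metric(y, predict(Xp)))
    return drops                                # bigger drop = more impact

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = 3 * X[:, 0]                                 # feature 1 is pure noise
predict = lambda X: 3 * X[:, 0]                 # stand-in for a trained model
r2 = lambda y, p: 1 - ((y - p) ** 2).sum() / ((y - y.mean()) ** 2).sum()

print(permutation_importance(predict, X, y, r2))  # large drop only for feature 0
```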
Another technique we're seeing a lot of, which is very helpful for interpretability, is called counterfactual instances. This will tell you what you'd need to change on the input features to get a different output. So, for example, if you've been declined a loan, it would tell you what you'd need to change on the loan application for it to be approved; it might say get a higher salary, or whatever. And so that's a different type of question to ask the explainer.
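A toy version of that counterfactual search, on a hypothetical loan scorer (both the model and the features are invented for illustration; real counterfactual methods search across all features for a minimal, plausible change):

```python
import numpy as np

# Hypothetical, already-trained scorer: approve when score > 0.
# Features: [salary in $k, debt in $k].
def score(x):
    return 0.05 * x[0] - 0.1 * x[1] - 2.0

def counterfactual(x, feature, step=1.0, max_steps=1000):
    """Nudge one feature until the decision flips; report the change."""
    cf = x.astype(float).copy()
    for _ in range(max_steps):
        if score(cf) > 0:                       # flipped to "approve"
            return cf
        cf[feature] += step
    return None                                 # no flip within budget

applicant = np.array([40.0, 10.0])              # declined: score(x) = -1.0
cf = counterfactual(applicant, feature=0)       # vary salary only
print(f"approved if salary rises from {applicant[0]}k to {cf[0]}k")
```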
A
So that's where we see explanations: it's not just one type of question; there are lots of different questions, and this is only just starting to become accepted and understood. But from the work we've been doing at Seldon, we believe a lot of the techniques which are now available really are at the standard where they should be officially adopted by regulators, and financial services and other regulated industries should be able to use these techniques.
C
Some of these techniques will actually use machine learning themselves to watch the machine learning, and you get pretty complex systems where you have to input your code and your training data. I think you need sufficient transparency on both of those to actually be able to trust it. Sure, you can always put measurements around it, but you can only measure what you know to measure; it gets very, very complex and very hard to trust.
C
There's also an arms race around AI. You cannot prevent AI; if anyone thinks we can just not do it, that's ridiculous. Actually, that would be unethical in itself, because we can prove that AI saves lives. At Red Hat Summit we had a customer case of detecting sepsis through AI, and they could prove that they saved lives with it, and there are plenty of examples like that. So we have to do it; the benefit of AI is so clear.
C
So this is not about limiting AI; it's about making sure that it's beneficial. The only way you can do that is by creating transparency and equal access for everyone, so you avoid an arms race and all of that, and the only way to really do that is with open source, from my point of view.
D
If we start taking a look at bias, there are more than a few areas where bias can come in. On one side, when you look at bias from the open source angle, you look at what techniques are used to train things, and we want to make sure those techniques are well understood, well known and well researched; the more eyeballs you can get on these techniques, the better off you are.
D
At the same time, I don't think open source alone can solve many of the bias problems. For example, when you're working in the medical space, you have HIPAA data that you may want to train certain models on, models that are used to save lives, as was described. If we don't account for bias within those datasets, then we may end up with scenarios where people from minorities, or people in poverty, end up with worse outcomes than people who currently have significant resources.
D
So we need to make sure we address it from multiple angles. Having an open source model, or an open source tool that you work with, helps in a variety of areas, and it also builds interest in learning how to tackle these problems: you can see "this is how we fixed a bias issue, and here's an open source example of how we solved it."
D
That alone means that even if someone does it in closed source, they've learned from the open source, or maybe used an open source tool to make it happen. So I do think open source plays a very important role in reducing bias, but it's certainly not the only thing we need to do.
A
From a privacy perspective: I'm from the UK, and in Europe we have this thing called the GDPR, which puts lots of annoying pop-ups on people's websites. Ultimately, what it's trying to do is request a specific opt-in for using your data. Over the last couple of decades or so, it had become generally accepted that you could opt in just by visiting a website or using a service, without reading the long terms and conditions.
B
And
so
talking
a
little
bit
more
about
data
companies
like
pin
screen
that
can
create
realistic
videos
of
someone
talking
and
things
they
didn't
actually
say.
What
are
we
doing
in
open
source
to
you
know
create
data
provenance
knowing
where
the
data
is
coming
from
and
making
sure
that
what's
being
presented
is
actually
where
it
came
from
originally,
oh.
A
So there's some work from open source projects like ModelDB, which has been doing a good job on this, and there are various efforts connected to the open source ML platforms. Another one that Seldon's connected with, called Kubeflow, is trying to figure this out at the moment as well. Ultimately it will come down to standards and metadata, and the various tools that are part of that pipeline.
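The core of that standards-and-metadata idea can be sketched in a few lines: fingerprint the exact training inputs and store the lineage next to the model artifact. This is not ModelDB's or Kubeflow's actual API, just the underlying concept with made-up names:

```python
import hashlib
import json
import time

def dataset_fingerprint(path):
    """Content hash of a training file, so the exact inputs behind
    a model can be verified later."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def record_lineage(model_name, data_paths, code_version):
    """Write a provenance record alongside the model artifact."""
    record = {
        "model": model_name,
        "trained_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "code_version": code_version,
        "datasets": {p: dataset_fingerprint(p) for p in data_paths},
    }
    with open(f"{model_name}.lineage.json", "w") as f:
        json.dump(record, f, indent=2)
    return record

# Example (hypothetical paths):
# record_lineage("loan-scorer", ["train.csv"], code_version="abc1234")
```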
C
The thing is, you gave the example of deep fake videos, where we've learned that video evidence is actually not reliable anymore because it can be faked ever more convincingly. And part of the problem there is that any solution to that has privacy implications in itself, like when you start source-signing every piece of data that you generate.
C
You
know
you,
video
camera,
basically
signs
all
the
videos,
so
Mithen
itself
becomes
a
problem
because
you
just
eliminated
the
ability
to
have
anonymous
videos,
and
things
like
that,
like
so
I
think
there
are
a
bunch
of
areas
where
we
have
to
find
broader
answer
in
society
and
maybe
start
thinking
about
reducing
the
stakes
a
little
bit
right,
because
some
of
these
things
are
problem
like.
Why
is
privacy
increasingly
like
we
had
a
face
where
no
one
cared
anymore?
C
Know
so
they
like
it,
they
are
there
some
deeper
question.
There
are
not
technology
questions
that
we
have
to
answer,
because
you
know
technology
will
force
us
right.
It
has
forced
us
here
who
is
the
technology
we
already
have
today
and
we
can
predict
where
this
is
going
to
go,
that
it's
going
to
increase
and
we
will
have
to
get
around
to
that.
You
can
go
into
things
like
citizens,
scores
and
stuff
like
that,
or
you
know
we
in
the
u.s..
D
No, and that's a tough one. I'll scope this around AI and ethics, as opposed to just "hey, how do you do AI right?" So I think part of it is: take a look at the thing you want to build, and take a look at the impact it's going to have.
D
There are risk models you can use now that don't require AI to develop: what is the risk if we build this particular system? What if it goes wrong? What if something breaks? What are the risks we're taking on with this? Use that to help develop something like a threat model, and develop the system in such a way that you can work out: okay, if I put AI here, what are the risks?
D
And from there, don't skimp on the time necessary to work out, number one, whether it should even be something you build in the first place. But assuming you decide yes, it's worth it, we're going to build it, then don't skimp on working on the efficacy, on trying to work out: is this thing actually doing what I think it is? And go towards explainability and fairness and so on, especially if it's in a more important area.
D
It's really a mindset: not just throwing something out there because you saw it work. I'll give you a quick example. When I was very first starting out in AI, a couple of weeks in, I was super excited: my model had 87.5% accuracy. And then I looked at the data, and 87.5% of the answers were "no."
D
Five
percent
of
the
answers
were
no,
and
so
my
model
was
saying
no
to
everything
because
it
thought
okay,
this
is
great.
It's
working
and
I
was
like
super
excited
and
then
I
realized
I
had
to
spend
more
time
to
work
out.
Okay.
Well,
what
what
do
I
need
to
do
in
order
to
make
this
model
right?
I
know,
that's
an
extreme
example,
but
these
type
of
things
are
going
to
come
up.
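The trap Frederick describes is the classic majority-class baseline, worth computing before trusting any headline accuracy number. A tiny sketch:

```python
from collections import Counter

labels = ["no"] * 875 + ["yes"] * 125      # 87.5% of answers are "no"

# A "model" that ignores its input and always predicts the majority
# class scores 87.5% accuracy while having learned nothing at all.
majority = Counter(labels).most_common(1)[0][0]
accuracy = sum(1 for y in labels if y == majority) / len(labels)
print(majority, accuracy)                  # -> no 0.875

# Sanity-check any model against this baseline, and look at
# per-class recall, not just overall accuracy.
```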
A
These give you a prompt for where you should be investigating. There are also packs of information and checklists for people who are on boards and running projects, which can help prevent them from getting something wrong by accident. That's one of the biggest problems here: it's a complex space, and it's very easy to get something wrong if you're not looking in the right areas.
C
I'll pile onto that: understand the problem space, and be specific. In data science you're often happy when you get 99% right, but if you apply that to IT security, an intrusion just needs to happen once and then you're screwed. So there is a difference: what we're doing today, in most cases, is in those areas where 99 percent is great.