From YouTube: What's Next? Machine Learning on OpenShift Panel - Matt Farrellee, David Aronchick, Kris Overholt
Description
OpenShift Commons Gathering December 5th 2017 Austin, Texas
What's Next? Machine Learning on OpenShift Panel
Panelists: Matthew Farrellee (Red Hat), David Aronchick (Google), Kris Overholt (Anaconda)
Tushar Katarki, Red Hat, Moderator
Tushar Katarki (Moderator): Obviously, I'm very delighted to have you all here. I know it's late in the afternoon and the refreshments are waiting for us. This is the last panel, so I will try to keep it very exciting and hopefully not put you to sleep. I am Tushar Katarki, a product manager on the OpenShift team at Red Hat, and this is the panel discussion on AI/ML on Kubernetes and OpenShift.

I'm going to say this a couple of times so that it sinks in: we have a new SIG that we are creating in OpenShift Commons, which Dan mentioned. So, and obviously I'm going to plug it now and at the end, go and sign up on OpenShift Commons for this SIG if you're interested in this topic, especially toward the end, once you've heard from this esteemed panel here.

So the logistics are: I thought we would do some introduction to the topic and to the panelists for about 10-15 minutes, then get into the main discussion itself with some questions, then I'll give you an opportunity to ask some questions for, say, ten minutes, and then we'll wrap up with some more plugs. Does that make sense? All right, cool.

So, a brief introduction to this topic. It's very exciting, right? AI, artificial intelligence, and machine learning are already touching our lives: be it driverless cars, be it personal assistants like Alexa and Siri, be it Netflix recommending your favorite movie or Pandora your favorite music, or be it optimizing your energy usage with Nest thermostats. There is some AI in there; it's really already there. We are saying it's new, "what's next," but in some ways it's already there and affecting our lives, so I'm very excited to talk about this today. My personal favorite, quote-unquote, use case really was this thing.
The one thing is, AI is not easy, right? AI has been talked about for a long, long time in academia. So why has it not happened so far? It's because we didn't have things like cloud, and we didn't have things like big data. So it's certainly building upon a couple of huge technology trends, but those technology trends and those use cases are also moving very fast, and AI is building upon that. So it is complicated. It involves a number of languages (we'll talk about that), it means a number of frameworks, and it is very computationally intensive and resource hungry, so you really want to optimize the use of it. And it touches a number of different roles, data scientists being an obvious new one.

So we have all this complexity, and one of the advantages that we have, which we're going to talk about today and over the next few days, is containers. Simplistically put, hopefully we can contain some of this complexity away; that's how I'm looking at it. Containers are also lightweight, they are fast and efficient, and they're portable across a hybrid cloud footprint. And Kubernetes and OpenShift have really emerged, as you saw in Chris Wright's keynote, as a very powerful container platform; in fact you might call it the de facto platform. So it is great to moderate this panel
on AI/ML on OpenShift and Kubernetes. So with that, let me start with the introduction of the panel. I'll start with David. David Aronchick is a product manager at Google. He's the product manager for Google Container Engine and has been shipping software for 20 years in various roles; he has been a founder of three startups and has had stints at Microsoft, Amazon, Chef, and obviously at Google.

Next we have Kris Overholt. He is a product manager at Anaconda, the product manager for the data science platform. He has expertise in distributed systems, data engineering, and computational science workflows. He has a PhD in civil engineering from UT Austin, and prior to Anaconda he was at the National Institute of Standards and Technology (NIST), Southwest Research Institute, and the University of Texas at Austin. Hi, and anything else?
Kris Overholt: Hello everyone, thanks for having me here; it's an honor. The headquarters of Anaconda is about five blocks over, so this is our home base. We have about a hundred employees here. If you're not familiar with Anaconda, it's the leading open source data science distribution. We have just about 5 million unique data science users around the world, on Windows, Mac, and Linux, and it's a lot of the foundational pieces of data science and machine learning, and a gateway for folks to get into things like notebooks and machine learning with TensorFlow. So I'm excited to talk about that here today.
Tushar Katarki: Five million. That's a big number. Thanks, Kris. And last but not least, we have Matt Farrellee, a senior engineer and architect from Red Hat. He is a founding member of the radanalytics project, which he's going to talk about in a bit, and which is an open source data analytics and ML platform based on Apache Spark on OpenShift and Kubernetes. He was one of the founding members of Sahara (I could say that, right?) in OpenStack, which is the OpenStack big data processing project, and he was involved in the University of Wisconsin Condor project, which some of you might know; that was the high-throughput computing project, with some of the early pioneers in distributed cluster computing. Matt, say hi.

Matt Farrellee: Hi.
Tushar Katarki: All right, thanks. I've known Matt for several years now, including through that University of Wisconsin Condor project. All right, so let's dive right into it. That was the introduction, so let's dive right in, so that we have a common understanding. What I'd like to ask is: what does AI and ML mean to you? I'll start with you, David, and we'll circle around, or we can go in any order. What does it mean to you? Scope it a little bit: what is it, and what is it not?
David Aronchick: Absolutely. So I always think it's funny, and somewhat not great, when people joke about AI coming to murder us all, and I am as guilty of that as anyone. But please: there are many, many people in the world who do not understand AI and ML, and hearing it, even as jokes, from the people who do know is not so great. So let me plead about that.

When I think of ML, I basically think there are three categories of problems that it really unlocks that you've never seen before. The first are where things are hard today but they're tractable; you could at least potentially do it. That might be, if you had a million pictures, identifying which pictures had dogs in them. You could do that today with a standard algorithm, you could do it today with humans, and so on and so forth. It would just take a really long time.

Then there's the second category, which is problems you know how to solve, but effectively they would be impossible to solve with computers at all, and that would be something like beating Go, for example. That's something where we know the rules of Go, and theoretically you could beat it, but no, you could not use standard computational techniques today to beat Go, or be better than humans at that problem.

And then the third is where we can't even really describe how to solve the problem. We know what a solution looks like, or when we've succeeded, but there's no algorithm that we could come up with to solve it, and that would be something like identifying cancer in radiology, for example. We have kind of a generalized heuristic, but if you got ten doctors together, they might still disagree. Yet even now you have AI, excuse me, ML, being able to look at this problem, make an assessment, and be better than humans are today, and it's still improving even beyond that.

So that's all to say: those are the three commonalities of those problems, and that's where I'd say the definition of ML is. ML is being able to solve a problem without necessarily understanding exactly the method used to get there. And that's not great, right?
Kris Overholt: Yeah. When I think of machine learning, if I initially scoped that to a library, say in Python or R, it's just a collection of algorithms, statistical algorithms, that can be applied to different data sets. So, sort of: import a library, run it on some data. That's the beginning of machine learning. But I often think of it on an implementation timeline: how we make that useful for other people and how we democratize it.

The next two big stages I think of are these. You have a library, it has some statistical functions, you do things like model selection; there are dozens of steps beyond that. Once you have something sort of working ("it works on my machine"), how do I share this and make it useful for the rest of the world to build and improve on, and to match up with the open source philosophy? I think things like standardized formats, whether it's serializing data or sharing models in efficient ways, allowing other people to build on that modularly, are a big deal. And when you start working with larger and larger groups, whether that's a foundation or an enterprise team, things like reproducibility, governance, and traceability of those models become very important. Then there's deploying that out, so people don't have to follow dozens of steps to get it up and running; it should be very easy to get up and running in any environment: HPC, cloud, on-premise.

And then beyond that stage of deployment and usability really comes the consuming of it, right? We actually want our greatest audience to be able to consume that in an interactive visualization or just a browser. There may be a complicated technical stack underlying that, and it all starts at the library and infrastructure level, but we think a lot about this, and as I've watched different industries evolve, it's all been about consolidating these models and APIs on a common framework and common tool set, to really democratize the audience of people building and consuming.
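The "import a library, run it on some data" starting point Kris describes can be sketched minimally in plain Python. The hand-rolled nearest-neighbor classifier below is a stand-in for what would normally be a single library call (for example from scikit-learn); the data and function names are illustrative, not from the panel.

```python
# Minimal sketch of "run a statistical algorithm on some data":
# a 1-nearest-neighbor classifier written in plain Python.

def nearest_neighbor_predict(train_points, train_labels, query):
    """Return the label of the training point closest to `query`."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    best = min(range(len(train_points)),
               key=lambda i: sq_dist(train_points[i], query))
    return train_labels[best]

# Toy data set: two clusters in a 2-D feature space.
X = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1)]
y = ["a", "a", "b", "b"]

print(nearest_neighbor_predict(X, y, (0.1, 0.0)))  # prints "a"
```

Everything after this step (sharing, reproducibility, deployment) is the implementation timeline the panel goes on to discuss.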
Matt Farrellee: Interesting answers. I take kind of more of an engineering approach to it. I think of AI as a large body of research that's been ongoing for many, many decades. Within that you have things like knowledge representation and machine learning, and then within machine learning you have things like neural networks and deep learning and whatnot. So I kind of think of it from a structured perspective that way.

When it comes to the scope or the impact, it's more about how AI and machine learning are giving us ways to interpret the world, interpret all the data that's around us, and giving us new ways to interact with the world and with other people. You gave some examples of machine learning apps that people might interact with on a daily basis if they could, say, buy a Tesla or something like that, but in reality AI and machine learning are really ubiquitous already. Google search is an example of this that's part of people's lives at this point.

On the question of what it is not: I'm going to kind of violate David's comments a little bit, but it's not the destruction of humanity, it's also not the savior of humanity, and really, it's also not a salad dressing, although given how hyped it is right now, people might say it is.
Tushar Katarki: You know, one of the standard things you hear in this context is, "Oh, it'll kill all the jobs," and I think it's not even that. Just take the example of driverless cars: driverless doesn't mean that you can sleep inside. For the next 10-15 years you'll probably still have to pay some attention.
Matt Farrellee: It may not cause as many of these revolutions that people think are going to destroy humanity or whatnot. And, spoiler, we're still here, and things are, for the most part, getting better. The same thing will happen with AI: I suppose as a lot of the hype settles down, the reality of what people can do with it, and how it interacts with your life, actually becomes more clear.
David Aronchick: I'd like to support that; I actually just want to say one thing on top of it. Yes, it won't kill us, and it won't kill all the jobs or anything like that, but only if all the people in this room and the people watching are responsible and think about it. You are all technology implementers and creators, and so on. Please do think as you're doing this stuff; don't rely on someone else doing the hard work and being aware of it.
Tushar Katarki: Very good, and that's a good introduction to the topic. So next, what I was going to ask each one of you (and you can go in any order) is: what is a favorite use case for you? Something that you get excited about: "Oh, I want to make this work today, because it'll solve this problem." What gets you excited?
Matt Farrellee: One: I've never been particularly good at foreign languages. I took lots of foreign languages in high school and college, but I never really immersed myself in environments where I could actually use them to communicate with people. I think the translation capabilities that are coming out right now are really going to make it much easier for people to communicate, and much easier for me to communicate with people I wouldn't otherwise be able to.
Kris Overholt: For me, a little bit of my background is in civil engineering, specifically life safety systems and building protection systems, so I've really kept an eye on building systems and building integration as it comes together across many different manufacturers. If you look at a building, inherently it's sort of boring: a boxy structure with rooms. But as soon as you start recording and logging information like energy usage, temperature, and occupancy, and you put all of that together in aggregate, you actually get a really beautiful picture of how that building behaves as it interacts with people and people interact with it. At a city scale it helps things like emergency responders, and it's a nice example of how to bring something that was formerly static online, something we can monitor over time that becomes integrated with many, many different subsystems of many different types. So to me, the complexity of that, and how we wrap it all up into some useful metrics for people, is pretty awesome.
David Aronchick: I spend all my time in this space now, so I'm always astounded at everything. I'll try and keep it super brief. I have a talk tomorrow, and I'm going to give away some of the stuff I'm talking about, but one example that I love is from Google. As you may know, we have a lot of data centers, and we hire some smart data center people, and there's this term in data centers called PUE: power usage effectiveness. Our data center engineers looked at all these fans and water and cooling and things like that, and they kind of look like signals for ML. So we hooked it up, and the power usage effectiveness dropped, bam, right down. Literally, and we're very public about this, we saved 15% on our power just by using ML against these data centers.
David Aronchick: The incredibly simple summary is: you take two AIs, or excuse me, two ML models, and you pit them against each other. The first one tries to figure out a solution, and the second one tries to figure out something that breaks the first one's solution. You just force them to go against each other.
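The "pit two models against each other" idea David describes can be illustrated with a toy sketch (my own minimal example, not Google's system): a proposer holds a candidate model, and an adversary keeps supplying the inputs where that model is currently wrong, driving the proposer toward the true rule.

```python
import random

random.seed(0)

# Ground truth the proposer is trying to learn: points >= 0.6 are "positive".
TRUE_THRESHOLD = 0.6

def proposer_predict(threshold, x):
    # The proposer's current model: a simple threshold classifier.
    return x >= threshold

def adversary_find_counterexample(threshold, trials=200):
    """Search for an input where the proposer's current model is wrong."""
    for _ in range(trials):
        x = random.random()
        if proposer_predict(threshold, x) != (x >= TRUE_THRESHOLD):
            return x
    return None  # no counterexample found; the models (approximately) agree

threshold = 0.0  # proposer's initial, deliberately bad, model
for _ in range(1000):
    x = adversary_find_counterexample(threshold)
    if x is None:
        break
    # Nudge the proposer's model toward correcting this mistake.
    threshold += 0.05 if x < TRUE_THRESHOLD else -0.05

print(round(threshold, 2))  # settles near 0.6
```

Real adversarial setups (GANs, self-play) use learned models on both sides, but the feedback loop is the same shape: one side's failures become the other side's training signal.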
Tushar Katarki: Cool, very cool. All right, so what I'll do next is try to get deeper into it. I talked earlier about how this all sounds very exciting, but it's also complex, and it's not easy. So, starting with David: what are some of the challenges in AI and ML that you see today, and how is Kubernetes playing a role in it? And maybe this is a time to talk about one of the things that you were going to talk about.
David Aronchick: Sure. The first is really the approachability of ML today. If I sat any of you non-ML practitioners down and walked you through what the average ML person does, you'd be shocked at how absolutely tribal and back-of-the-envelope ("oh, maybe I'll tweak this number") the way it is right now, which is really disturbing for a bunch of us folks who think there should be a standard process. We're going through this exploration, and I think we'll get a lot better there.

But part of it relates to the second big problem, which is real transparency and understanding: being able to probe into a model. If I run an application today, I can attach to the process and see exactly what's being called at what time. With an ML model, everything is very bespoke: you kind of piece together whatever works and however you would like to approach it, and that's not great, because it means standard tooling doesn't work anymore. You now need to not only create this stack, but then, on the other half of that, create a set of tooling that supports your stack and lets you introspect, lets you understand what's going on, lets you auto-tune, and all those various things.

That said, that's where I hope Kubernetes can help us. Today, people generally build their stacks from the ground up. They understand exactly what version of Python, what libraries they're running, what networks, all these various things, and that's just too much for the average data scientist to approach; the data scientist shouldn't have to think about that. That's where Kubernetes has really changed the game. It creates this wonderful standard abstraction over the infrastructure that you're running on, and not just an abstraction, but actually rich objects that allow you to interact with various components of the platform and, to your point, help you wire a bunch of services together. So I am very optimistic, and like I said, I'll talk in just a little bit about what I think the future looks like for ML on Kubernetes.
Kris Overholt: Definitely. In terms of data science with Python and R, for many of our users, if you sort of climb the stack over the past couple of years with them, Anaconda solved the problem of "I need to get Jupyter up and running with TensorFlow, with all of its Fortran and C dependencies, as quickly as possible, across Windows, Mac, and Linux." So that was a good thing.

The next question was: now I want to share this analysis, or this model, or this server, or this visualization, with my buddy on a different operating system. So they need to install system packages, allow these things on their firewall, get these other libraries, and oh, they missed the version of this. So Docker was a nice addition to something like the Anaconda Python distribution: sort of where the open source Conda package manager left off in terms of environments, it took over and said, now I can bake everything into an image, and it's very portable.

And then the next layer was resource management, scalability, and orchestration, and that's where Kubernetes came in. Because what we found about a year ago was that our users were building amazingly different things, amazing things, with Anaconda, and the last mile was: now what? How do I deploy this thing out as a data science deployment? Now they no longer have to worry about things clobbering one another in an environment; it all just works at that abstraction layer. These are people who just want a microservice: they just want their model to run, to share it, and to run it alongside others so they can build on top of it, without having to go through all of that over and over and over. So this has been very important for enterprises adopting things like machine learning, for large organizations working together, and for really democratizing environments.

The fact that they can deploy to the cloud or on-prem without changing anything is a game changer. That means we don't have to switch APIs every single time we want to deploy somewhere. So really, in the last year, data science deployment of anything, from interactive visualizations to models to machine learning libraries and all of the above, has just become that much more pervasive through things like Kubernetes, containerization, isolation, and orchestration.
Matt Farrellee: So, quickly: I think Kubernetes has done a tremendous job of providing the API, the interface, that's expected by operations people, by sysadmins, by developers. It has really codified a lot of their best practices from many, many decades of experience. With AI and machine learning, there is a shift in the way that the systems operate and the way that they're built, which Kubernetes is going to have to adapt to, to some extent. There's an understanding that I think is really being formed (we'll talk about it with data later) of how data science operates, what expectations data scientists have, and then what expectations the things that they build have on the infrastructure that they're running on.

A concrete example that we usually use is thinking about rot. Rot from a developer perspective: you've deployed some piece of code, and rot is something that happens over long periods of time, usually with some sort of dependency changes or some sort of input changes, and things fail, and they fail in a fairly drastic fashion. The infrastructure, Kubernetes, understands how to deal with systems that do that. AI and machine learning systems are inherently more statistically based. They don't give you that clear "boom, something fails." They just start performing sub-optimally, and the challenge is detecting that suboptimal performance and being able to build infrastructure that can respond to it.

The second thing we're doing is starting to have the conversation as to how we influence, how we give back, what these best practices are. Chris mentioned the radanalytics work that's going on; this is an output which is starting to capture some of the understanding that we've built up over the last number of years using machine learning.

And then the third thing we're adding to this is that we're actually putting out services and software for our customers that don't have a big "AI/machine learning" stamp on them, because in the end it's a tool to do something, but that are actually powered by AI and machine learning. Chris mentioned two earlier today; you mentioned the Red...
Tushar Katarki: Who wants to talk about some of the work that is happening? I mean, we talked a little earlier about the Kubernetes resource management working group, some of the work that is happening with respect to GPUs and the enablement of that, and some of the work that is happening in Kubernetes. Who wants to talk about that?
Matt Farrellee: I'll throw in a couple of words. There's work happening with those working groups around the kind of hardware technology that is becoming more and more important for these machine learning algorithms, making sure that that hardware is exposed and accessible to the algorithms that are actually running on top of OpenShift. There's also the performance-sensitive application pod work that's happening, to really make sure that Kubernetes has a very solid foundation.
Tushar Katarki: All right, so the next question I was going to tee off is this. One of the things that everybody is thinking about, as everybody makes these decisions, is: why should we care now? Can you address that, especially around how data is important, and that even if you decide to do something today, you might not have collected the data?
David Aronchick: That's fine. So I think that we are awash in data in a way that we've never been before. Literally, we are collecting data from every movement: every device, Fitbit trackers, the sensors in this room reading heat, thermostats, and so on and so forth, all the way up to the largest possible number of queries, user behavior, and things like that. So we're at this phase where it's just absolutely transformative relative to data, and I think there will be a really important transformation that goes on. Researchers out there nowadays would argue that, with all the hardware investments and all the data investments, we have everything that's necessary to make these great decisions.

Building a model is very, very small versus ingesting, getting rid of outliers, feature engineering, transforming it, moving the data around in a pipeline in a regular way. Let alone, after that comes out: are you tracking it? Are you being responsible security-wise? All that good stuff. I think a lot of this is gated on the process and the pipelines, rather than just the actual implementation of building your model.
Kris Overholt: That's a lot of hard work that goes into that, and things like standard data formats and best practices, for things like Apache Parquet or columnar data stores, have become so important. From the Python and R data science perspective, Python can connect to just about every data format and data source you can imagine; it's part of the Python data science philosophy that it just connects to all sorts of remote data and compute sources. But what really happens is we're seeing users exercise this: for a given problem, it's best to use Parquet stored in this particular data store, for performance reasons, for training reasons. So we see these patterns get exercised in different verticals.

Another interesting thing I've seen in the last year is generating synthetic data for training. Sometimes you just don't have enough data. It started with needing to do model selection on natural language processing or image classification, and we've seen really interesting use of generating huge datasets in parallel that can be used for the iterative training process, and then you can bring in the real data on a rolling basis.

So between those two things, data formats and data storage, especially with many remote sources: it is hard work, and it's something that was recognized up front in Kubernetes and containers, and it's going to be hard work to continue to maintain. But we're going to learn the high-value connections of things like standardized data formats. Whether the training is MNIST classification or NLP, it's orders of magnitude of difference in performance when you use the right tool for the right job, with the right data format and the right data storage. So that's all starting to come together, and I think we're learning a lot about that together, even in the open source and cloud activity that's going on.
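The synthetic-data idea Kris mentions can be sketched with a toy example (my own illustration, not Anaconda tooling): generate a large labeled training set cheaply from a known rule, fit a trivial model to it, then apply that model to a handful of "real" points that arrive later.

```python
import random

random.seed(42)

# Known rule used to label synthetic samples: class 1 if x + y > 1.
def true_label(x, y):
    return 1 if x + y > 1.0 else 0

# Generate a large synthetic training set cheaply and in bulk.
synthetic = [(random.random(), random.random()) for _ in range(5000)]
labels = [true_label(x, y) for x, y in synthetic]

# "Train" a trivial model: estimate the decision threshold t in "x + y > t"
# from the boundary between the two classes in the synthetic data.
pos = [x + y for (x, y), lbl in zip(synthetic, labels) if lbl == 1]
neg = [x + y for (x, y), lbl in zip(synthetic, labels) if lbl == 0]
threshold = (min(pos) + max(neg)) / 2

# Real data arrives later, on a rolling basis, and reuses the same model.
real_points = [(0.9, 0.8), (0.1, 0.2), (0.7, 0.6), (0.3, 0.3)]
predictions = [1 if x + y > threshold else 0 for x, y in real_points]
print(threshold, predictions)  # threshold near 1.0; predictions [1, 0, 1, 0]
```

In practice the generator is much richer (rendered images, templated sentences, simulations), but the workflow is the same: synthetic data bootstraps model selection before enough real data exists.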
David Aronchick: Thank you. Well, my plug is for my talk tomorrow, so please come, but I do want to talk about something that we're doing, just between everyone in the room and those on Facebook. We're launching something which is designed to solve exactly a lot of the problems that we talked about here on stage. It's called Kubeflow, and the idea is that it is a standard ML stack for running ML on top of Kubernetes. It is not about re-implementing all the great, hard work that's out there in the world: TensorFlow, XGBoost, scikit-learn, anything like that, or any of the UIs or any of the transformation tools. Much in the same way that Kubernetes didn't go and re-implement a database or a serving tool or anything like that, it just allowed you to take that containerized tool and spin it up in a very elegant way, and not just elegant, but also portable and very scalable. So you could deploy it to your laptop, you could deploy it to a GPU rig, you could deploy it to a cluster, all with the same command, repeatably. And that is something that we're very, very happy to get out the door, because this is something that we hear so often from customers: "Oh geez, I wanted to go do ML, but I had to completely re-implement that stack, or I had to build it myself, or my data scientists had the wrong version of Python and so everything failed."
Kris Overholt: So if you haven't tried Anaconda, it's a free download for Windows, Mac, and Linux, with up to a thousand libraries for Python and R in any area you can think of: image classification, natural language processing with NLTK and gensim, Jupyter notebooks, and machine learning. We've been very busy adding more and more libraries, including TensorFlow with GPU support, and the nice thing is you just conda install it. It's all precompiled across Windows, Mac, and Linux, which makes it very easy to use, and free to use, on any of those platforms.

And then we have Anaconda Enterprise, which you can sign up for a 30-day trial of. It's pretty much the manifestation of a data science platform, with collaboration, authentication, and security, but it's all powered by the underlying Anaconda distribution and the Conda package manager. So if you haven't used it, and you're tired of living in dependency hell and dealing with Fortran and C system libraries when doing machine learning, try out Anaconda and let us know what you think.
Matt Farrellee: Cool. I'm going to hook into that here too. I think one of the really important things that we should be looking at, when it comes to something like Kubeflow and what David is going to talk more about, is that there are many, many organizations out there who have been producing bespoke solutions for building these pipelines, building these flows, trying to put them into production, trying to address how data scientists work and how operations folks work.
Tushar Katarki: I'm going to put in one more plug too. If you go to commons.openshift.org, halfway down the page there is an ML working group that we're starting up on the OpenShift Commons. So if you're interested in this, and you want to get involved and hear more about the best practices and lessons that we're learning around Kubeflow, please sign up there as well. So, do we have any questions in the audience? I know it's towards the end of the day. There's...
[Audience question, inaudible.]
Now, for me, it's a little bit more rudimentary, but it's exciting to watch our users going through the process of, you know, turning their batch jobs into models that are constantly training, constantly running, instead of sort of this daily or weekly thing. That's exciting, because there's a lot of wasted time that goes into these daily iterations, as opposed to just bringing something online and having it run on an ongoing basis. And the other part that's interesting, again.
D
It's not cutting edge, but it's watching our users refactor the way that they work into microservices. So what would have previously been a monolithic image classifier with a UI built onto it, with a very specific declarative way of doing something, let's say recognizing images or edges, is now completely different. What we're seeing our users do in the past year is build a specialized API that just does the classification, and a specialized front end for that.
D
That front end is modular and can swap between the different backends. So actually watching that roll out to the larger masses, and not just the bleeding-edge developers, is really nice to watch, and it lets us sort of focus on the best tool for the job instead of a monolithic approach to everything. So we're starting to see those projects get deprecated and sometimes broken up into microservices that are actually healthier than the original monolith. So it's exciting to watch that happen as things get adopted more and more across the industry. So.
E
Just two quick things then. One is, I want to add on to David's comments about transfer learning. People need to watch this space, as the vast majority of the complexity that's happening in data science is in the data engineering work, the model design, and whatnot, and being able to reuse that as a developer is going to be hugely empowering. So that's really key to watch out for, to your question about something happening in the predictive space.
E
A
I mean, at least from my perspective, to add to that: something that you mentioned, using the digital exhaust, like logs and metrics and stuff like that. How do we make our systems smarter in terms of scaling, or in terms of even better fault tolerance, et cetera? That is something of a lot of interest for us from the Red Hat perspective. One more question: all right, hi.
G
E
We really need to become data literate. We need to understand what the sources are. We need to be teaching people to understand what data is, how data is used, and what the potential is. And really, from a personal perspective, also looking at understanding what, to use the term, our data exhaust is in the world today, I think.
D
I think a big piece of empowering both the producers and consumers of ML and AI is transparency and reproducibility of the models and the analyses themselves. So I think a bad example is treating something as a black box: it only runs in a certain environment, and we don't really know why it works so well, but it works great and saves us money. That's not a good approach. You know, I come from civil engineering, very hands-on physical engineering. I think ML, to me, is the same.
D
I often relate to it in this way: when I think about the hyperparameters or distributions that are going through, I want to see those all the way through. I don't ever want to see a step where I don't really know what happened to that distribution or hyperparameter, but it looks good. So I think, as producers, we need to be very careful to document and to be very open about that.
D
D
C
I think those are both terrific answers. If I were going to generalize a little bit, there are two key things that these both factor into. Anyone familiar with Chaos Monkey knows it's the Netflix tool that they used to actively kill machines at random to tease out issues. Technology is not neutral. We need to be aware of that. We are technologists; we need to be aware that it is not neutral.
C
It has a positive or a negative effect, and it is up to us to be our own chaos monkeys for the technology we roll out. We need to be probing in every possible way, and be mindful: hey, have I checked to make sure that this model I rolled out doesn't actively bias against a certain population, whatever that might be? Have I checked what the edge cases look like here? And hey, is this an area.
F
C
F
To follow on: you guys were talking about it from the perspective of a producer of ML technologies, and I'm thinking about it as a consumer of ML technologies. Is there development of some kind of transparency guidelines that we can use to figure out, okay, when an ML model makes a certain decision, why is it making that decision? And is there a way to tell if I'm consuming, say, Google's version of this algorithm versus Microsoft's version of this algorithm?
F
A
C
You will never know that; yeah, that's their goal. But with any AI you like, or any solution (and this is not AI- or ML-specific), they're just going to experiment. They want to see whether or not a new thing works. So that's the problem. Well, not the problem; you know, I know technology is not the solution or a panacea or anything like that. My hope is that, and I know.
C
This is my job, to pitch my new thing, but my hope is that by creating standardized ML stacks with somewhat standardized, reusable components, we will develop standardized, reusable transparency tools for that. Right now, though, it is impossible. For example, there are two very, very popular image recognition models out there right now, ResNet and ImageNet. They're both very, very successful; they both perform better than a human right now. You could not use transparency analysis tools that you built for one with the other.
C
They are just completely different layouts and models, and so on. And so my hope is that by building some of these standards, you can do it. But let me make a pitch out there: I would love someone to build Chaos Monkey for ML, meaning you don't need to introspect into the model. You could build this and say: hey, I have a set of multiple different population types as data that I can feed into your model and get the results back on the other end, and it doesn't have to be real humans.
C
They can be totally anonymized. But if, at the end, your model comes out and it's biased, then you're like, eh, something bad is happening here, and that didn't give you the transparency that we all should demand. And literally there are a hundred PhDs working on introspecting into models today. But at least then we have some awareness. And so I will pitch that, and I will endorse it and find Google engineers to help you, if you want to lead.
C
You know, that kind of thing exactly, but test cases where it's not like we know what the population source was. The population source is not made available to the model; you just hand these objects in, and some results come out the other side, and that test on the other side looks for bias against populations.
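A minimal sketch of the "Chaos Monkey for ML" idea just described: treat the model as a black box, feed it cohorts of anonymized records that differ only in group membership, and compare outcome rates on the other side. The model, records, and tolerance here are all made up for illustration:

```python
from typing import Callable, Dict, List

def positive_rate(model: Callable[[dict], bool], cohort: List[dict]) -> float:
    """Fraction of a cohort that receives a positive outcome."""
    return sum(model(record) for record in cohort) / len(cohort)

def probe_bias(model: Callable[[dict], bool],
               cohorts: Dict[str, List[dict]],
               tolerance: float = 0.1) -> bool:
    """Black-box probe: pass only if positive-outcome rates across
    population cohorts stay within `tolerance` of each other.
    The model is never introspected, only queried."""
    rates = [positive_rate(model, cohort) for cohort in cohorts.values()]
    return max(rates) - min(rates) <= tolerance

# A deliberately biased toy model: approves based on income,
# which in this synthetic data correlates with cohort membership.
toy_model = lambda record: record["income"] > 50

cohorts = {
    "group_a": [{"income": 80}, {"income": 90}],
    "group_b": [{"income": 30}, {"income": 40}],
}
# probe_bias(toy_model, cohorts) fails here: group_a is approved
# 100% of the time and group_b 0%, far outside the tolerance.
```

The point is exactly the one made above: the probe never sees inside the model, so the same harness works against ResNet, a competitor's API, or anything else you can query.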
C
E
So, two quick things. One, going back to my definition of what AI is, what machine learning is, and whatnot: we need to understand that there are some areas in machine learning, and neural networks and deep learning capabilities are one particular area, that are being applied a whole lot right now and that specifically have interpretability concerns associated with them.
E
There are other approaches that are better in some use cases and worse in others, like image recognition and speech, things like that, but that are interpretable. When it does come to neural networks, I think we need to extend your question: it's not just a focus on the model; it gets more to, sort of, what David is talking about.
E
I
Then one more question. Yeah, David, to your point: I think we had Chaos Monkey, like, ten years ago in the financial industry; they all used AI, and they never saw it coming. But what I wanted to ask is: where do you think the main contributions of AI will be when we talk about things like the self-driving data center? When I listen to your answers, David, I think you're hedging a little bit, in that you say right now the complexity is too high and we have to focus on abstraction.
I
C
So, if that's what you took away, I apologize; I don't think the complexity is too high at all. We have existence proofs of us solving that problem. I think, in my opinion, right now the problem generally relates to the approachability of using AI or ML for your data center, for your self-driving data center. That is too high, and by that I mean literally the interface between a model and your system is broken.
C
It's highly bespoke, meaning either I have to rewrite my model in some very specific way, or I have to build some crazy feature engineering tool to translate the data that I have into something that's actually usable, or, even if I get answers, do I have the correct feedback loop so that, as I take action on my answers, it's feeding back properly? All of that is broken right now. It's less about complexity and more that it's not very approachable or very implementable.
C
D
C
I'm not holding this up as the end-all, be-all solution, but my hope is that we're able to develop some standards as an industry around stacks: ways to ingest data, ways to spit out answers, getting feedback loops, all that kind of stuff. And to be clear, like I said, we have data centers where we do it; at Google we have internal services that literally self-drive our data centers. So it's absolutely possible; it's just, how do we make that available to everyone?
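One way to read that hope about industry standards is as a common contract for ingesting data, returning answers, and closing the feedback loop. A hypothetical sketch (this is not Kubeflow's actual API, just an illustration of the shape such a standard might take):

```python
from abc import ABC, abstractmethod
from typing import Any, List

class ServableModel(ABC):
    """Hypothetical standard contract: ingest data, answer queries,
    and accept feedback, regardless of what model sits behind it."""

    @abstractmethod
    def ingest(self, records: List[Any]) -> None:
        """Accept raw data in a standard format."""

    @abstractmethod
    def predict(self, record: Any) -> Any:
        """Return an answer for one record."""

    @abstractmethod
    def feedback(self, record: Any, outcome: float) -> None:
        """Close the loop: report the observed outcome so the model
        (or its next retraining run) can improve."""

class MeanPredictor(ServableModel):
    """Trivial implementation: predicts the running mean of the
    outcomes it has been fed back so far."""

    def __init__(self) -> None:
        self.outcomes: List[float] = []

    def ingest(self, records: List[Any]) -> None:
        pass  # nothing to precompute for this toy model

    def predict(self, record: Any) -> float:
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def feedback(self, record: Any, outcome: float) -> None:
        self.outcomes.append(outcome)
```

With a contract like this, the "self-driving" controller only depends on `predict` and `feedback`, so swapping a toy model for a production one is a deployment detail rather than a rewrite.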