From YouTube: MLCommons: Accelerating Machine Learning, with Diane Feddema (Red Hat), Peter Mattson (Google), and David Kanter (MLCommons)
Description
MLCommons: Accelerating Machine Learning with Benchmarks, Datasets and beyond
https://mlcommons.org/en/
Guest Speakers:
Diane Feddema (Red Hat)
Peter Mattson (Google)
David Kanter (MLCommons)
OpenShift Commons Gathering on Data Science
January 28, 2021
https://commons.openshift.org/gatherings/OpenShift_Commons_Gathering_on_Data_Science.html
To find out more about OpenShift Commons, please visit: https://commons.openshift.org
Diane Feddema: We're pleased to be here today at the OpenShift Commons Gathering, and the topic today is data science. I'm Diane Feddema from Red Hat; I work on the AI Services team. I'm here with David Kanter and Peter Mattson.
Peter Mattson: I'm Peter Mattson. I run ML Metrics for Google, and I'm interested in measuring all things about ML. Before that, I studied compilers at Stanford and worked with a startup called Stream Processors, where we did video for a while. Lots of different opportunities to try to make complex things go fast, which, as it turns out, is an eternal need. So I'm excited to be doing that for ML, and also trying to make ML better as we push forward.
David Kanter: I actually started a microprocessor company that was doing a fusion of compilers and hardware design to exploit more single-threaded performance. After that, I ended up consulting with a number of companies, one of which was Cerebras Systems, which is now, like Red Hat, a founding member of MLCommons, and that's how I got involved in this. I also have a little bit of background in benchmarking, which came in handy and is part of the reason I got involved.
David Kanter: And it's very exciting to be able to build this kind of open community. We really do appreciate the role that Red Hat is playing.
Diane Feddema: I don't know how many users are aware that MLCommons originated in MLPerf. What led you to start MLPerf, Peter? What were its goals, and how did it evolve into MLCommons?
Peter Mattson: Sure. About three years ago, we were looking around at ML, and in particular ML hardware, at Google, trying to understand how fast the different options were. We decided that we really needed a good ML performance benchmark, and there did not seem to be an industry-standard solution.
Peter Mattson: So we found everyone we thought had done the groundwork: folks like Greg Diamos from Baidu, who did DeepBench; the Stanford DAWNBench folks, Matei Zaharia and Peter Bailis; and the Fathom folks from Harvard. We got everybody in a room and put forth the challenge: should we try to come up with one benchmark everyone could use to measure ML training performance? And everyone thought that was a great idea.
Peter Mattson: So we came up with a set of rules and brought in a bunch more folks from industry, strong players like NVIDIA and Intel, and startups like Cerebras, which is how David got sucked in, and the benchmark really took off. We had our first set of rules out in the middle of 2018, and then results by the end of that year. We've had several rounds since then. 2019 was a big year of growth: we got into inference, and we got into HPC. In 2020 we continued to expand, and we also started MLCommons. The driving function behind that was that we were looking for a home for MLPerf and wanted to put it in a non-profit organization, but we wanted something focused on open engineering and ML, and we couldn't find that particular combination.
B
We
could
find
large
organizations
with
like
linux,
so
we
were
very
focused
on
open
engineering
in
general.
We
could
find
some
that
were
were
focused
on
ml,
in
particular
like
neurops,
but
they
were
more
event
oriented,
and
so
we
decided
to
start
one.
We
wanted
an
organization
that
that
was
their
their
reason
for
being
was
to
try
and
come
along
and
make
ml
better,
and
we
we
put
mlperf
into
mmo
comets
that
will
perf
is
still
very
much
going
strong
and
growing,
but
it
now
has.
Peter Mattson: We also looked at the field of ML, and we feel it's a very young industry. It has a tremendous amount of maturing to do as a field. It needs the same things that drove the industrial revolution: great ways of measuring things; good raw materials, which in the case of ML is data; and good, standard ways of making things, a shift from doing things in your basement to assembly-line production at high quality. We wanted to see whether we could form an organization that would answer that call, try to provide those things, and really move the field forward.
David Kanter: Yeah, so that's the driving motivation, and I think we ended up with three key pillars that we like to talk about. The first is benchmarks and metrics, which we've talked about with MLPerf. The second is building large open datasets, which we think are another key ingredient toward really democratizing the technology, in the same way that open source has enabled and fundamentally transformed software, whether as an art or as an engineering discipline.
David Kanter: Software today is utterly unrecognizable compared to 30 years ago, and the analogy is that data is the same kind of raw ingredient you need to start building up machine learning. The more large, open datasets we have, the more folks are able to extend ML capabilities, use them in products, and extend those benefits to the whole world.
David Kanter: And the third pillar is best practices. I like to think of this as removing friction, or perhaps the transition from sewing your own clothes to having an abstracted assembly line where there's a real flow. Today with ML there are a lot of things, whether it's model portability or even just deploying a model, that are tremendously high friction. If we want ML to become pervasive, we need to drive those sources of friction down, so that maybe in the future doing things with ML is almost as easy as grabbing a library off of GitHub, looking at the comments, and maybe asking some questions on Stack Overflow about gluing it together. That's a future we would love to move toward, and we are very fortunate that we went out and started talking about this vision.
David Kanter: It really resonated with a lot of companies. Red Hat is a founder; we've got about 39 companies that are founders and a total of over 60 members. Some of those members are individuals like myself, or academics associated with universities. So we've really built this tremendously vibrant community to focus on advancing innovation in machine learning and extending those benefits through all of society. And it's very much organized on the principles of open source: we're very open.
Diane Feddema: Okay, great. Are most of those members hardware companies, then? Can you give me a little bit of a breakdown there?
David Kanter: Sure, yeah. We absolutely have a lot of hardware companies; Peter named a few, like Intel and NVIDIA, as well as startups like Sentient and so forth. But we have a number of cloud services companies and software companies as well. We really see this as a big tent where there are a lot of folks who can play. To name an example of a more purist software company, in some sense, VMware is involved.
David Kanter: There are a number of ML software companies, and then a lot of cloud providers who provide computing services in one fashion or another, as well as very ML-focused companies: there are a couple of startups that focus on replicating experiments and things like that that are engaged. So it's a really lovely and diverse community, and across all geographies too.
David Kanter: Absolutely. I'll start off with one or two. The MLPerf benchmarks are pretty well known, but one of the things we are doing is trying to grow the footprint and move into some new areas that need attention in terms of ML. We started, as Peter mentioned, with training; I got involved and helped lead the inference benchmarks; and then one of the things we branched off to do was to start focusing on mobile phones and ML in that context. And then there are some efforts we have around the internet of things and tiny devices.
David Kanter: That's one way we've been expanding, with different projects on the metrics side. And then one of the things that actually brought us together, you and me most literally, was MLCube, which is one of our best-practices projects. MLCube is a set of conventions around containerization that help you abstract the machine learning away from all the other pieces of the infrastructure. I like to talk about this in terms of both portability and reproducibility.
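[Editor's note: to make the containerization idea concrete, here is a minimal sketch of what an MLCube-style task definition can look like, with the ML steps declared as container tasks with explicit inputs and outputs. The names, field values, and image are illustrative, not taken from a real project; see the MLCube repository in the MLCommons GitHub organization for the exact schema.]

```yaml
# Illustrative mlcube.yaml-style definition (all names are hypothetical).
name: mnist_example
description: Toy workflow whose steps run as containerized tasks.

# Each task declares its inputs and outputs, so a runner can wire up
# volumes and the ML code stays independent of the infrastructure.
tasks:
  download:
    parameters:
      outputs: {data_dir: data/}
  train:
    parameters:
      inputs: {data_dir: data/}
      outputs: {model_dir: model/}

docker:
  image: example/mnist-mlcube:0.0.1   # hypothetical image name
```

With a runner installed, a workflow like this is typically invoked one task at a time (for example, `mlcube run --task=train`), so the same container can run the same way on a laptop, a cluster, or a cloud service.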
David Kanter: We also have some dataset projects, and I'll let Peter talk about those.
Peter Mattson: As David said, there are three big pillars for us: benchmarks, best practices, and data.
Peter Mattson: I think in many ways datasets are the new code. They are the way you express what you want your machine learning product to do.
Peter Mattson: We can't build performance benchmarks without good datasets. You can't do good academic research on anything without good data. And a lot of the datasets we have now are not really the best for their task.
Peter Mattson: They were kind of created haphazardly: an academic group needed something specific, created the dataset, and then moved on. So there's a dataset out there, usually of very modest size compared to what industry actually uses, often under restrictive licensing terms, and it's not growing and evolving with the field. What we would really like to do with MLCommons is create a center of excellence for public datasets: a group that is really excited about making sure there are good public datasets out there that are growing and evolving to suit the needs of the field. For instance, we just announced the People's Speech, the largest, or soon to be the largest, publicly available speech dataset by an order of magnitude, with a diverse range of languages (I think it's over 60) and a more diverse range of speakers than what's available now.
Peter Mattson: We really want to push that forward, because if we can get this right, it makes speech-to-text technology accessible globally. We're also looking at potential datasets for recommendation systems, which are incredibly important in industry, and potentially a framework for building privacy-protecting medical datasets, or accuracy validation for people asking: will this model really work in clinical practice? We've got a wide range of projects we're looking into, all around this central theme of making good public data.
Diane Feddema: Okay, well, that is great. So if someone in the audience right now is really interested in getting involved in one of these areas you've discussed, I'm just wondering: where do you need contributors right now, and how could they go about getting on board and helping out?
David Kanter: Yeah, so first of all, like most open-source communities, we really love folks who show up. In fact, to give you an example of that: I originally showed up to a meeting at, I think, the Stanford faculty club, one of our early ones, that had been announced through a call on a newsgroup. I showed up, and eventually I did so much good work that I got punished and they made me executive director. Take that as evidence that we are an extremely open organization. If you go to our website, mlcommons.org, there's a page about getting involved, and it lists all of our working groups.
David Kanter: We talked about three or four projects here, but there are over 10 different working groups, covering everything from low-power embedded benchmarks to logging to algorithms. Each working group has chairs. Diane, you are one of the chairs for MLCube, so if you go to the MLCube page you'll see a bit about Diane and what the project focuses on.
David Kanter: So you can look through those. We are open to individual members, and many of our projects are open source in nature, so you can stop by GitHub, sign the CLA, and if you see some bugs, we always love those getting fixed. I think, again, like a lot of open-source communities, it's something where you get as much as you give; it's the potluck model. There are a number of folks who have wandered in somewhat randomly and found that it really fits their interests. Some of the folks on the dataset side are just phenomenally passionate about speech, and this is a really wonderful thing that aligns with what they want to do. So we'd love to see more folks getting engaged.
Peter Mattson: Both from industry and academia. We have quite a few faculty already involved, and we'd like more. We'd really like to maintain that balance, and a community that's really open and wants to push innovation and move the whole field forward.
Diane Feddema: Okay, fantastic. And then, if you want to get the links and things, go to mlcommons.org, is that right? Yep, okay. I think it's a great group of people, a very friendly group, so I'm glad I joined it. Thank you so much for being here today and talking to us.
David Kanter: Yeah, thanks for the invitation, and also thanks for all of your contributions to the community as well. It's been a great partnership.