From YouTube: MLCommons: Accelerating Machine Learning, with Diane Feddema (Red Hat), Peter Mattson (Google), and David Kanter (MLCommons)
Description
MLCommons: Accelerating Machine Learning with Benchmarks, Datasets and beyond
https://mlcommons.org/en/
Guest Speakers:
Diane Feddema (Red Hat)
Peter Mattson (Google)
David Kanter (MLCommons)
OpenShift Commons Gathering on Data Science
January 28, 2021
https://commons.openshift.org/gatherings/OpenShift_Commons_Gathering_on_Data_Science.html
To find out more about OpenShift Commons, please visit: https://commons.openshift.org
Diane Feddema: We're pleased to be here today at the OpenShift Commons Gathering, and the topic today is data science. I'm Diane Feddema from Red Hat; I work on the AI Services team. I'm here with David Kanter and Peter Mattson.
Peter Mattson: I'm Peter Mattson. I run ML Metrics for Google, and I'm interested in measuring all things about ML. Before that, I studied compilers at Stanford and worked with a startup called Stream Processors, where we did video for a while. Lots of different opportunities to try to make complex things go fast, which, as it turns out, is an eternal need. So I'm excited to be doing that for ML, and also trying to make ML better as we push forward.
David Kanter: I actually started a microprocessor company that was doing a fusion of compilers and hardware design to exploit more single-threaded performance. After that, I ended up consulting with a number of companies, one of which was Cerebras Systems, which is now, like Red Hat, a founding member of MLCommons, and that's how I got involved in this. I also have a little bit of background in benchmarking, which came in handy and is part of the reason I got involved.
David Kanter: And it's very exciting to be able to build this kind of open community. We really do appreciate the role that Red Hat is playing.
Diane Feddema: I don't know how many users are aware that MLCommons originated in MLPerf. What led you to start MLPerf, Peter? What were its goals, and how did it evolve into MLCommons?
Peter Mattson: Sure. About three years ago, we were looking around at ML, and in particular ML hardware, at Google, trying to understand how fast the different options were. We decided that we really needed a good ML performance benchmark, and there did not seem to be an industry-standard solution.
Peter Mattson: So we found everyone we thought had done the groundwork: folks like Greg Diamos from Baidu, who did DeepBench; the Stanford DAWNBench folks, Matei Zaharia and Peter Bailis; and the Fathom folks from Harvard. We got everybody in a room and put forth the challenge: should we try to come up with one benchmark everyone could use to measure ML training performance? And everyone thought that was a great idea.
Peter Mattson: So we came up with a set of rules and brought in a bunch more folks from industry, strong players like NVIDIA and Intel, and startups like Cerebras, which is how David got sucked in, and the benchmark really took off. We had our first set of rules out in the middle of 2018, and then results by the end of that year. We've had several rounds since then. 2019 was a big year of growth: we got into inference, and we got into HPC. In 2020 we continued to expand, and we also started MLCommons. The driving function behind that was that we were looking for a home for MLPerf and wanted to put it in a non-profit organization, but we wanted something focused on open engineering and ML, and we couldn't find that particular combination.
B
We
could
find
large
organizations
with
like
linux,
so
we
were
very
focused
on
open
engineering
in
general.
We
could
find
some
that
were
were
focused
on
ml,
in
particular
like
neurops,
but
they
were
more
event
oriented,
and
so
we
decided
to
start
one.
We
wanted
an
organization
that
that
was
their
their
reason
for
being
was
to
try
and
come
along
and
make
ml
better,
and
we
we
put
mlperf
into
mmo
comets
that
will
perf
is
still
very
much
going
strong
and
growing,
but
it
now
has.
Peter Mattson: We also looked at the field of ML, and we feel it's a very young industry. It has a tremendous amount of maturing to do as a field. It needs the same things that drove the industrial revolution: great ways of measuring things; good raw materials, which in the case of ML is data; and good, standard ways of making things, a shift from doing things in your basement to assembly-line production at high quality. We wanted to see whether we could form an organization that would answer that call, try to provide those things, and really move the field forward.
David Kanter: Yeah, so that's the driving motivation, and I think we ended up with three key pillars that we like to talk about. The first is benchmarks and metrics, which we've talked about with MLPerf. The second is building large open datasets, which we think are another key ingredient toward really democratizing the technology, in the same way that open source has enabled and fundamentally transformed software, whether as an art or as an engineering discipline.
David Kanter: Software today is utterly unrecognizable compared to 30 years ago, and the analogy is that data is the same kind of raw ingredient you need to start building up machine learning. The more large, open datasets we have, the more folks are able to extend ML capabilities, use them in products, and extend those benefits to the whole world.
David Kanter: And the third pillar is best practices. I like to think of this as removing friction, or perhaps the transition from sewing your own clothes to having an abstracted assembly line where there's a real flow. Today with ML there are a lot of things, whether it's model portability or even just deploying a model, that are tremendously high friction. If we want ML to become pervasive, we need to drive those sources of friction down, so that maybe in the future doing things with ML is almost as easy as grabbing a library off of GitHub, looking at the comments, and maybe asking some questions on Stack Overflow about gluing it together. That's a future we would love to move toward, and we are very fortunate that we went out and started talking about this vision.
David Kanter: It really resonated with a lot of companies. Red Hat is a founder; we've got about 39 companies that are founders and a total of over 60 members. Some of those members are individuals like myself, or academics associated with universities. So we've really built this tremendously vibrant community to focus on advancing innovation in machine learning and extending those benefits through all of society. And it's very much organized on the principles of open source: we're very open.
Diane Feddema: Okay, great. Are most of those members hardware companies, then? Can you give me a little bit of a breakdown there?
David Kanter: Sure, yeah. We absolutely have a lot of hardware companies; Peter named a few, like Intel and NVIDIA, as well as startups like Sentient and so forth. But we have a number of cloud services companies and software companies as well. We really see this as a big tent where there are a lot of folks who can play. To name an example of a more purist software company, in some sense, VMware is involved.
David Kanter: There are a number of ML software companies, and then a lot of cloud providers who provide computing services in one fashion or another, as well as very ML-focused companies: there are a couple of startups that focus on replicating experiments and things like that that are engaged. So it's a really lovely and diverse community, and across all geographies too.
David Kanter: Absolutely. I'll start off with one or two. The MLPerf benchmarks are pretty well known, but one of the things we are doing is trying to grow the footprint and move into some new areas that need attention in terms of ML. We started, as Peter mentioned, with training; I got involved and helped lead the inference benchmarks; and then one of the things we branched off to do was to start focusing on mobile phones and ML in that context. And then there are some efforts we have around the internet of things and tiny devices.
David Kanter: That's one way we've been expanding, with different projects on the metrics side. And then one of the things that actually brought us together, you and me most literally, was MLCube, which is one of our best-practices projects. MLCube is a set of conventions around containerization that help you abstract the machine learning away from all the other pieces of the infrastructure. I like to talk about this in terms of both portability and reproducibility.
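[Editor's note: to make the containerization idea concrete, here is a minimal sketch of what an MLCube-style task definition can look like, with the ML steps declared as container tasks with explicit inputs and outputs. The names, field values, and image are illustrative, not taken from a real project; see the MLCube repository in the MLCommons GitHub organization for the exact schema.]

```yaml
# Illustrative mlcube.yaml-style definition (all names are hypothetical).
name: mnist_example
description: Toy workflow whose steps run as containerized tasks.

# Each task declares its inputs and outputs, so a runner can wire up
# volumes and the ML code stays independent of the infrastructure.
tasks:
  download:
    parameters:
      outputs: {data_dir: data/}
  train:
    parameters:
      inputs: {data_dir: data/}
      outputs: {model_dir: model/}

docker:
  image: example/mnist-mlcube:0.0.1   # hypothetical image name
```

With a runner installed, a workflow like this is typically invoked one task at a time (for example, `mlcube run --task=train`), so the same container can run the same way on a laptop, a cluster, or a cloud service.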
David Kanter: We also have some dataset projects, and I'll let Peter talk about those.
Peter Mattson: As David said, there are three big pillars for us: benchmarks, best practices, and data.
Peter Mattson: I think in many ways datasets are the new code. They are the way you express what you want your machine learning product to do.
Peter Mattson: We can't build performance benchmarks without good datasets. You can't do good academic research on anything without good data. And a lot of the datasets we have now are not really the best for their task.
Peter Mattson: They were kind of created haphazardly: an academic group needed something specific, created the dataset, and then moved on. So there's a dataset out there, usually of very modest size compared to what industry actually uses, often under restrictive licensing terms, and it's not growing and evolving with the field. What we would really like to do with MLCommons is create a center of excellence for public datasets: a group that is really excited about making sure there are good public datasets out there that are growing and evolving to suit the needs of the field. For instance, we just announced the People's Speech, the largest, or soon to be the largest, publicly available speech dataset by an order of magnitude, with a diverse range of languages (I think it's over 60) and a more diverse range of speakers than what's available now.
Peter Mattson: We really want to push that forward, because if we can get this right, it makes speech-to-text technology accessible globally. We're also looking at potential datasets for recommendation systems, which are incredibly important in industry, and potentially a framework for building privacy-protecting medical datasets, or accuracy validation for people asking: will this model really work in clinical practice? We've got a wide range of projects we're looking into, all around this central theme of making good public data.
Diane Feddema: Okay, well, that is great. So if someone in the audience right now is really interested in getting involved in one of these areas you've discussed, I'm just wondering: where do you need contributors right now, and how could they go about getting on board and helping out?
David Kanter: Yeah, so first of all, like most open-source communities, we really love folks who show up. In fact, to give you an example of that: I originally showed up to a meeting at, I think, the Stanford faculty club, one of our early ones, that had been announced through a call on a newsgroup. I showed up, and eventually I did so much good work that I got punished and they made me executive director. Take that as evidence that we are an extremely open organization. If you go to our website, mlcommons.org, there's a page about getting involved, and it lists all of our working groups.
David Kanter: We talked about three or four projects here, but there are over 10 different working groups, covering everything from low-power embedded benchmarks to logging to algorithms. Each working group has chairs. Diane, you are one of the chairs for MLCube, so if you go to the MLCube page you'll see a bit about Diane and what the project focuses on.
David Kanter: So you can look through those. We are open to individual members, and many of our projects are open source in nature, so you can stop by GitHub, sign the CLA, and if you see some bugs, we always love those getting fixed. I think, again, like a lot of open-source communities, it's something where you get as much as you give; it's the potluck model. There are a number of folks who have wandered in somewhat randomly and found that it really fits their interests. Some of the folks on the dataset side are just phenomenally passionate about speech, and this is a really wonderful thing that aligns with what they want to do. So we'd love to see more folks getting engaged.
Peter Mattson: Both from industry and academia. We have quite a few faculty already involved, and we'd like more. We'd really like to maintain that balance, and a community that's really open and wants to push innovation and move the whole field forward.
Diane Feddema: Okay, fantastic. And then, if you want to get the links and things, go to mlcommons.org, is that right? Yep, okay. I think it's a great group of people, a very friendly group, so I'm glad I joined it. Thank you so much for being here today and talking to us.
David Kanter: Yeah, thanks for the invitation, and also thanks for all of your contributions to the community as well. It's been a great partnership.