From YouTube: CDF SIG MLOps Meeting 2020-05-21a

A: And a few of the people loitering, I guess. Should we kick off? Terry, over to you.

A: So: training, model, hyperparameter, endpoint, training pipeline, training set; there's probably more we can add. Parameter is probably an obvious one that I'm missing. You know, hyperparameters are things usually picked by a human, and the parameters are the things learned by training, although with newer algorithms even that's getting blurred, where machines pitch in and pick the other parameters too. So a model is the deployable unit, and a training pipeline is probably analogous to a normal software pipeline and may involve both software delivery and training, I guess.
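
For illustration, a minimal sketch of that distinction, assuming scikit-learn (the example and names are not from the meeting): the human picks the hyperparameters, and training learns the parameters.

```python
# Hyperparameters vs parameters, in miniature (illustrative example).
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=200, random_state=0)

# Hyperparameters: chosen by a human (or, increasingly, by an automated search).
model = SGDClassifier(alpha=1e-4, max_iter=1000, random_state=0)

# Parameters: learned from the data during training.
model.fit(X, y)
print(model.coef_, model.intercept_)
```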

A: The difference that's striking me about machine learning pipelines is that they are much longer running than a typical software pipeline. Maybe in industry some software pipelines can be pretty long, but if you had an end-to-end ML pipeline that involved training, it's reasonable to think it could be running for hours or days, probably even longer. Terry, what do you see?

A: I just spent most of today re-running a training job, because I was assigning a variable in a map in Python and I had left a comma at the end, so it converted it into a tuple for me. I was staring at it for hours, like, what is wrong? And then it was one of those things where you spot it and you just want to close the laptop.
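
That pitfall is easy to reproduce; a stray trailing comma makes Python build a one-element tuple (the names below are hypothetical):

```python
# The trailing-comma pitfall described above.
params = {}
params["learning_rate"] = 0.01,        # accidental trailing comma
print(type(params["learning_rate"]))   # <class 'tuple'> -> (0.01,)

params["learning_rate"] = 0.01         # what was intended
print(type(params["learning_rate"]))   # <class 'float'>
```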

A: Okay, well, when I run my own training it costs about nineteen US dollars an hour, so I won't do that casually. That's kind of the edge case, though; there are a lot of resumable stages in pipelines, I guess. I'm trying to think of other pipelines I've used where you might start it again but skip over a bunch of stages to get up to where you were. Or, in those cases, if it breaks at one point, is it broken and are you going to start again? What's typical?

E: I think there are going to be some interesting challenges on the larger projects, because typically what you're talking about is a situation where you have a large number of GPUs or TPUs and you're having to distribute elements of the solution across multiple nodes, so the intermediate products may not be something you can reuse easily if something goes wrong with the pipeline run. In the more complex situations there are certainly some very interesting technical challenges that will have significant cost implications.

C: So, and I'm asking just so I understand: when you're talking about the distinctions between ML pipelines and traditional pipelines, are we talking about anything qualitatively different, or is it just quantitatively different? So, for example, Kubernetes is doing very well in managing large traditional workloads, you know, restarting pods and things like this, managing memory usage and all that kind of jazz.

A: Terry would have something to say, because he's been using Tekton, which is kind of a generic pipeline thing, and I know David Aronchick from Microsoft sort of said there's no reason why they should be different. But to me the thing that leaps out is more the length of time that things run for; that's the most problematic, having worked on a lot of different pipelines over the years.

A: That's the thing that leaps out: not necessarily the number of steps, because I've seen boring software pipelines in enterprises that might have 50 steps or something, but this thing might run for three hours, or in Terry's case run for ten days, and cost a significant amount of resources.

A: Maybe it's more quantitatively different. Or maybe the qualitative difference is that there's data flying through. A trigger for one of these pipelines could be... I'm working on something where the training will be updated for a given account once a month, if you like; there's just no reason to do it more often. Or maybe there's some other event, a significant change in circumstances. So there's some trigger for this pipeline that's less frequent than just upgrading some software, but in that case there are chunks of data that have to go through the system.

E: I think this is where it gets quite interesting, because for trivial projects there isn't much difference. But when you start to deal with significant real-world machine learning problems, where you're trying to replace a human with similar levels of processing capability, then you start to hit some fundamental scaling challenges in the approaches that we have at the moment. So, as a reference, if we look at something like autonomous vehicles...

E: You hit some fundamental problems in a number of areas. Typically, when you're looking at that scale of complexity, people are currently using dedicated architectures, where you effectively have a whole data center dedicated to training one model, where you have racks and racks of...

E: Then your biggest problem is actually making sure that you have sufficient data distributed across the memory caches on all the GPUs to actually train against. If you think about it, the GPUs only have a limited amount of RAM; you might be working with 24 gigs of RAM on an individual card, and the card may be able to process all of that data relatively quickly.

A: So that's the pipeline in the large. But if it's a tool, say Tekton or Jenkins X or something like that, orchestrating it, it's not moving those bits around; it's orchestrating. Is the stuff that you've been working on, things like Tekton, capable of scaling to that, or not yet, or maybe in the future?

E: Well, typically they're FPGA cards, so they're effectively very similar in architecture to the GPU approach. You've got a card in a slot in a physical machine in a rack, and you might have, as I said, four of those cards in one physical machine, and then there'll be limits on how many cards you can have per virtual machine within the Kubernetes node.

E: It's about the limits of the PCIe bus. You're talking about moving data to the point at which you're saturating the PCIe bus, and therefore you're dealing with the limits of the physical hardware. So when you look at the dedicated compute units, they're not structured like a conventional server would be; it's basically a rack full of GPUs.

B: So, thinking about what I've read in the main document, about how important it is to be agnostic regarding the actual details of the technology we are using, and what you've just said about having pretty much a data center dedicated to training a specific model: this very much sounds as though Kubernetes is perfectly fine, but we just need an awareness that we're going to have a tiny little node that essentially calls out and says, data center, please do your thing.

E: There are a number of constraints at the moment. Obviously Kubernetes is designed to run on conventional server hardware, so typically it doesn't have the data throughput that you might need to work apace on a large model. But you also have an elastic scaling problem today, because compute hardware is allocated at node level rather than container level, so to actually be able to use containers to build things you have to have nodes that have been provisioned with GPU or TPU hardware. Sorry...
B
I'm
not
trying
I'm
not
suggesting
that
kubernetes
should
be
used
to
do
the
building
it
sounds.
It
seems
very
clear
that
kubernetes
is
not
going
to
be
in
a
position
to
do
that,
but
it
seems
reasonable
that
kubernetes
as
part
of
the
ml
ops
infrastructure
remains
entirely
viable.
So
long
as
it
is
simply
calling
out
to
whatever
custom
hardware
we
might
need
where
custom
hardware
in
the
cases
you've
been
talking
about,
might
literally
be
a
whole
data
center
optimized
for
cycling
up
petabytes
of
I've
done
similar
items.

A: I've done similar things like this, not with Kubernetes but with Mesos, in the past. When you had a piece of work to do, you would label it in such a way that effectively isolated that resource, so no one else would use it. So I imagine it would be similar in this: the training isn't actually happening inside a Docker container, or whatever post-Docker thing Kubernetes uses these days; it's more just shelling out to something else to actually do the work and collecting the results. Or...?

E: Exactly. Yes, it's just using Docker containers, and that's what Jenkins X is doing at the moment. You have some constraints in that you need to elastically scale your nodes in order to control your costs, because you'll be incurring charges for as long as GPU units are connected to an active node. So you need to take certain precautions to make sure that you're cleaning down your pipelines successfully after they've finished, and not leaving dead pipelines lying around consuming resources that you're having to pay for.

E: But there is a cap to scaling at the moment. Some of that cap could be addressed by infrastructure-level changes to Kubernetes itself, to allow us to treat node resources elastically, and that may be something where we can extend Kubernetes to fit better with the ML model of the world.

D: One thing here I would like to hear your thoughts on. I fully agree there are scale problems when we are training models, but I believe there is also a qualitative difference in what putting into production, or deploying, means for models. It means something different from usual software, where it's just putting out the binaries and starting to use them. When we are talking about models and new features, we also have to deal with the already-processed data.
D
So,
for
example,
let's
sing
in
classical
float
detection
system
and
we
were
created
a
new
version
of
the
of
the
model
that
classifies
clients
into
good
or
bad
and
things
with
a
ploy
that
new
model
into
production.
But
after
that,
that
is
not
enough.
We
need
to
reevaluate
all
the
existing
clients
with
a
new
model
to
get
more
accurate
information
about
which
clients
are
on
fraud
and
which
clients
are
they
get,
and
that
is
an
extra
step
also
with
a
lot
of
scale,
problems
that
typically
I,
don't
believe
happens
in
Jews
walls
of
development.

E: Working backwards, if you like: typically a model also requires some associated components, so it may need specific pre-processing for the data that it's going to infer on. So typically there will be a set of components that need to be passed from the training stage into the service that's going to implement an instance of the model, and one of the challenges in that space is that there are currently limited abstraction layers that allow you to encapsulate the model and the associated data...

E: ...a mechanism by which you can specify the structure of a model and pass that from one system to another without actually having to pass serialized classes or chunks of Python. That allows you to decouple your implementation from an explicit versioned set of dependencies. That's one of the problems that Jenkins X actually addresses directly.
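
As a concrete illustration of passing the model together with its associated pre-processing, here is a minimal sketch assuming scikit-learn (the meeting names no specific tool; the file name is hypothetical):

```python
# Bundle the pre-processing and the model so both travel from the
# training stage to the serving stage as a single artifact.
import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X_train, y_train = make_classification(n_samples=200, random_state=0)

bundle = Pipeline([
    ("scale", StandardScaler()),      # the pre-processing the model was trained with
    ("model", LogisticRegression()),  # the model itself
])
bundle.fit(X_train, y_train)

# The serving side loads one artifact and gets both components together.
joblib.dump(bundle, "model-bundle.joblib")
```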

E: So when you want to run a particular training, you need to specify a set of training data and a set of test data to operate from, and you want to be able to repeatedly go back to that set of data. So you need to be able to specify a versioned collection of data and then pass that versioned collection of data to the pipeline that's executing the training.

A: That versioned collection of data could be in something like S3, maybe with some content-addressable hash or immutable name, and then, if you were tracking everything else in the GitOps way, you could just say the data is over here, or use its fingerprint. It doesn't have to be in one monolithic thing, does it?
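
A minimal sketch of that fingerprint idea (the file path and the S3 layout are hypothetical): hash the dataset and track the hash, GitOps-style, instead of the data itself.

```python
# Content-addressable fingerprint for a dataset file.
import hashlib

def dataset_fingerprint(path: str, chunk_size: int = 1 << 20) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# A pipeline config could then reference s3://bucket/datasets/<fingerprint>.
print(dataset_fingerprint("training-data.csv"))
```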

A: I mean, things like Snowflake would let you do things like that; there are certain solutions. So, you mentioned test data and training data. Often you can take a set of data and slice off a bit for validation and a bit for test, and you do that randomly: you take a set, randomly pull out test, randomly pull out validation, whatever percentage ratio you want. You're saying, for it to be reproducible, that split...
A
Those
splits
should
be
kept
separate
as
well,
because
you
want
to
you
wanted
to
be
deterministic,
because
if
it
was,
if
you
were
just
randomly
picking
different
times
each
time,
even
though
the
whole
set
of
data
was
constant,
you
would
get
slightly
different
results
because
you'd
be
randomly
picking
a
different
test
subset
each
time.
So
would
you
have
to
keep
that
test
a
bit
like
stored
somewhere
as
well?
The.
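
One common answer, sketched minimally here assuming scikit-learn: fix and record the seed, so the same data always yields the same test subset and the split does not have to be stored separately to be reproduced.

```python
# Deterministic train/test split via a recorded seed.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42  # record random_state alongside the data hash
)
```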

E: The easiest way to understand this one is to think backwards from a real-world scenario. Typically your machine learning model is going to be a decision-making system operating in the real world. So if it's a control system for an autonomous vehicle, then if something has gone wrong, potentially it's killed someone.
E
So
you
need
very
clear
compliance
reasons.
You
will
always
need
to
be
able
to
have
an
audit
trail
that
goes
from
the
finished
model
backwards
to
the
source
data
and,
in
many
cases,
you'll
be
required
to
to
implement
some
some
level
of
visibility
on
on
the
why
certain
decisions
were
made
by
the
model
under
certain
circumstances.
E
But
you
also
have
the
scenario
that
you
know
in
the
event
of
a
a
serious
failing
in
the
system.
You,
you
will
need
to
retrain
the
model
and
then
do
regression
testing
to
demonstrate
that
the
model
behaves
the
new
model
behaves
differently
under
those
circumstances.
So
you
all
need
to
be
able
to
reproduce
a
set
of
conditions
and
and
check
back
against
potentially
earlier
training
sets
to
to
verify
the
behavior
and.

E: This is the big risk today: most of these things are effectively uncontrolled, because the training scripts themselves are being built in environments that you can't properly control, the data is completely uncontrolled, and in many cases there's not even a record of what training data was used to train which version of a model. So it's going to be very difficult, in an increasingly regulated environment, to meet the requirements of regulatory compliance unless we can provide pipeline systems that facilitate doing all of this automatically.
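
The missing record described here could be as simple as a manifest written by the pipeline at training time; a minimal sketch, with all field names hypothetical:

```python
# A manifest tying a model version back to its inputs, for the audit trail.
import json
import time

manifest = {
    "model_version": "fraud-classifier-v7",
    "trained_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    "training_data_fingerprint": "sha256:<dataset hash>",  # from the versioned store
    "split_seed": 42,
    "hyperparameters": {"learning_rate": 0.01, "epochs": 20},
    "training_code_commit": "<git commit hash>",
}

with open("model-manifest.json", "w") as f:
    json.dump(manifest, f, indent=2)
```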

D: ...serializing everything and sending it, and then we analyze the model and we deploy it. Well, it has to be deployed, so we need to run some checks, some of them beyond the testing already done during training, to make sure that the new environment is going to behave exactly like the testing one. What are your thoughts on that side?

A: To me, that's where things are fairly familiar. Imagine you had some third-party library you were using that updated once a week or once a month; there's some new feature for checking credit card numbers for fraud or something, and it's nothing to do with machine learning. Even though you're taking this binary and you trust its provenance and all that sort of stuff, you would still have your own suite of tests, regression tests or integration tests, whatever you want to call them, acceptance tests.

A: So at this point, the model that's being deployed is an artifact like anything else; it's not really special in that regard. It might be infinitely more complex in how it can misbehave, so yeah, there's more need to test and continuously monitor it. But in the simple case that sort of degrades to: it's just a piece of software that you're putting out there.

A: If it's, for example, doing anomaly detection or some sort of reinforcement learning, its behavior will change, and I suppose then you've got more: you want even deeper, scenario-based testing or regression testing, so that you know the bad things that once happened don't happen again.
A
Mean
that's.
That's
if
you
written
a
bunch
of
code-
and
you
know
handcraft
decision,
trees
or
use
the
rule
system,
you
can
come
across
the
same
thing.
It's
just
a
human
mystics
or
like
your
assumptions,
change
and
yeah
I!
Guess:
he'd!
Be
you
more?
What
you're
saying
is
you're
more
likely
to
encounter
this
stuff
with
sophisticated
models,
they're
more
sensitive
to
their
environment?
Maybe
yeah.

E: Well, this is the challenge: if you're explicitly coding something, you know what your assumptions are, and therefore you know when there's change. But when you're training a model, you don't actually know what features it's detecting; you just know that it's giving you a result that you were expecting from the data that you're measuring on the output side. So you don't actually know explicitly what is influencing the decision that the model is making. A classic example of that is: there were models trained to recognize...

A: You're getting a little leakage there, accidental leakage. I've been reading about some more interesting ensemble models that would help with things like that. Ensemble meaning there'd be that model trained, and there'd be other unrelated models, perhaps on different sets of data, maybe even off-the-shelf things, that would go: I recognize that as an x-ray, I recognize that as a ruler; and that itself becomes a feature that feeds into something else. So I think that's where the state of the art is moving next.
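
A minimal sketch of that kind of ensembling, assuming scikit-learn's stacking (the component detectors are stand-ins, not from the meeting): the unrelated models' outputs become features for a final model.

```python
# Stacking: auxiliary detectors feed a final model.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("detector_a", RandomForestClassifier(random_state=0)),
        ("detector_b", SVC(probability=True, random_state=0)),
    ],
    final_estimator=LogisticRegression(),  # consumes the detectors' outputs as features
)
stack.fit(X, y)
```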
A
There's
lots
of
amusing
examples
like
that
and
yeah
I'm
sure,
there's
more
or
maybe
apocryphal
stories
as
well?
Okay,
so
one
piece
of
feedback
from
last
week
that
someone
put
it
in
the
document
I'll
paste,
the
link
was
Google's.
Take
on
ml
ops
for
those
that
are
interested
I
had
a
bit
of
a
flick
through
of
it.
It
seemed
to
line
up
with
some
of
the
things
we've
been
talking
about.

A: The main thing I found of interest was the diagram near the top that shows how much work there is around data collection, feature engineering, monitoring and serving, while the actual machine learning code is a tiny little square in the middle. I thought that's a really nice diagram, and I showed everyone.

A: Anyway, that's Google's take on things, and they classify things from level zero, which is basically manual clicky-clicky notebooks, through to, I think, two, which is using CI/CD pipeline automation. So they have sort of three levels of maturity. I thought that was an interesting thing to note.
A
Another
thing,
a
minor
thing:
Jerry
was
there
was
a
ticket
on
the
chickens,
X
testing
I
just
wanted
to
ask
like
for
doing
like
acceptance,
tests
of
the
ml
quickstarts
that
would
result
in
a
web
app
running
right,
like
there'd,
be
a
web
app.
That
does
a
decision
like
if
you
start
a
given
email.
Quickstart,
you
end
it.
You
end
up
with
a
one
app
running,
but
to
two
pipelines.
Is
that
right.

A: There was an issue someone opened, which I was going to include in the document, with a bunch of interesting links to articles. I thought we could have a section in the roadmap, or the readme, of just interesting resources. And something else: I was asked by Tracy from the CDF whether we'd thought about doing some blogging on ML specifically for the CDF, which has a rapidly growing audience of users, sort of introducing the SIG and the concepts.

A: Of course, Saturday mornings are a shitshow, so... I was going to look through some of those other technology requirements, some of the things that I've come across. But one of the ones I thought that, if you had a chance, Terry, it would be good to have a crack at, is the implications of privacy and GDPR. I have no idea about any of that.
A
That's
if
you
had
anything,
looks
on
that'd
be
great
to
get
him
in
there.
Cuz
there's
a
whole
lot
as
I'm
still
learning
all
this
stuff
myself,
but
as
I
do
it
I
realized
that
I'm
getting
my
hands
on
all
sorts
of
data
and
flinging
them
here
and
storing
it
there
and
training
this
thing
here
and
and
like
how
does
that
relate
those
datasets
that
train
the
model?
How
do
they
relate
they've
got
PII
and
is
the
model
parameters
end
up?
Having
PII
and
them
I
don't
know
it's
theirs.
That's
a
big
open
question.
You.

A: There was an interesting talk, it might have been at a Linux Foundation event, about the IP of trained models. There's lots of work and prior art, and even legal test cases, on things like the GPL and the Apache License: what they mean, who owns things, copyright. But if a model is a binary that's compiled from data, who owns it?
A
How
does
that
copyright
leave
if
that
model
can
exist
without
that
data
input,
as
well
as
the
hyper
parameters
that
were
chosen
by
the
human
and
the
algorithms
in
all
of
the
feature?
Engineering
choose
a
human
thing,
partly,
but
there's
some
I
don't
know.
If
I
chose
in
the
technical
challenges
question
like
IP
management,
that's
probably
many
more
than
Linux
foundations.
They
I
agree.
That's
probably
thinking
about
that.
I
thought
that
was
an
amusing
idea
issues.

A: Then it's easy enough to copy, and if it has descriptors, that'd be good, yes. One other thing that just popped into my mind, with you mentioning before the sort of qualitative and quantitative differences stuff. One thing that's become apparent to me using this stuff: in a software development workflow you'll be working on an algorithm or tuning something.

A: Typically the feedback loop is pretty quick and you tend to work on one thing at a time. I'm finding, when I'm training models to do things, it can take hours to crunch through. So sometimes I might have a few different ideas to try, and I'll maybe try five of them at once, let them all run on different sets of clusters of machines, and then come back and sort of pick the winner: okay, that worked, that didn't, and that...

A: In theory, the sort of git-based approach that Terry worked on with Jenkins X lets you do that, where each one of the experiments could be a branch to go off on, or maybe even a pull request, and then you just come back, look at your pull requests, decide which one wins, pick the winner and delete the rest. That doesn't quite have a parallel elsewhere, but yeah.
A
It's
something
that's
so
this
is
more
for
developers
than
Amin
machine
learning
for
data
scientists
is
completely
natural
for
them,
so
they
don't
need
to
them.
But
for
me
this
was
like
a
it's
like
this
stuff
takes
a
long
time.
If
I
do
this,
serially
I'm
gonna
be
spending
weeks
working
on,
it's
like
I
need
to
man,
and
that's
something
I
still
struggle
with
is
like
managing
the
different
ones.
A
It's
like
well
I'll,
take
this
and
this
because
you
got
to
tweak
the
training
data,
tweak
the
parameters,
the
hub
parameters
and
and
leave
a
note
that
you
did
something
so
in
theory
that
sort
of
branch
wears
with
no
sort
of
pour
requested.
Notes-
or
you
know
something
that's
trafficked
as
a
version
thing
with
everything
together.

E: The vision in that space is to not just facilitate you being able to run parallel experiments with different feature sets, but also to actually enable you to create evolutionary systems, where you can have your models compete against each other and influence, or write, their own parameters.

C: That's a really interesting problem, because I'm thinking about this scenario where it takes a week to train this model, and you're three hours out from the end of the week and the system crashes. Okay, do we have to start this whole thing all over again, or is there some kind of break point that we can identify through that process?
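
A minimal, framework-agnostic sketch of such a break point, i.e. checkpointing (the names are hypothetical): persist training state periodically, so a crash resumes from the last checkpoint instead of restarting the whole run.

```python
# Periodic checkpointing so a crash loses at most one epoch of work.
import os
import pickle

CHECKPOINT = "train-state.pkl"
TOTAL_EPOCHS = 100

if os.path.exists(CHECKPOINT):
    with open(CHECKPOINT, "rb") as f:
        state = pickle.load(f)             # resume from the last saved epoch
else:
    state = {"epoch": 0, "weights": None}  # fresh start

for epoch in range(state["epoch"], TOTAL_EPOCHS):
    # ... one epoch of training would update state["weights"] here ...
    state["epoch"] = epoch + 1
    with open(CHECKPOINT, "wb") as f:
        pickle.dump(state, f)              # crash after this point resumes here
```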

C: Yeah, and it seems like that's what we're trying to inject: linear processes, or at least a linear overlay, onto a fundamentally nonlinear process, to enable us to kind of retroactively build up the state of the world at a particular point, to avoid building that state of the world from the raw data again. And then from that point, hopefully, we can just do the last day's worth of training rather than the last four days.

E: If what you're doing is hill climbing, then the point at which you start is actually going to influence the final outcome drastically, so you may get stuck in a local maximum that's different from the result you'd get from starting at a slightly different point. So there are many situations in the fundamental modelling where, depending on how you approach the problem, you may not actually have a repeatable solution: in many cases you can run the same training with the same training set...

B: As far as I can see, isn't this about capturing all the parameters and making it repeatable, rather than about being able to snapshot the process part way through and therefore resume from three days into the four-day training? I know they're both important and relevant problems, but I don't quite understand how they're linked.

E: ...and small differences get multiplied into a very different result across two runs. You have to consider that in a lot of cases you're doing parallel processing across multiple GPUs, then combining the results and doing more processing, so the sequence in which the combination happens matters.
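
A minimal sketch of why combination order matters: floating-point addition is not associative, so a parallel reduction that combines partial results in a different order can change the answer.

```python
# Floating-point addition depends on grouping/order.
import math

print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False

values = [0.1] * 10
print(sum(values))        # 0.9999999999999999 (left-to-right)
print(math.fsum(values))  # 1.0 (exactly rounded)
```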

A: And then I've got some more things to fill out there, and there might be some more technical challenges to fill in around checkpointing. Having said that, I think there's plenty of ground to cover. Like Terry said at the start, there are a lot of companies and enterprises doing interesting stuff with machine learning that aren't at this extreme end; the data sets I'm dealing with are, you know, under a gigabyte, and it still takes hours to train a thing.
A
That's
not
big
data
by
any
stretch
of
the
imagination
and
there's
a
lot
of
value
in
that.
So
there's
still
a
lot
of
good
stuff
to
be
done
there,
but
it's
worthwhile
thinking
about
these
sort
of
extreme
cases,
cuz.
It's
fast
moving
field
and
fast
bang
and
yeah
yeah.
Well,
thanks!
Everyone.
Thank
you.