Description
Presented by: Dmitry Petrov
As always, feel free to leave us a comment below and don't forget to subscribe: http://bit.ly/subgithub
Thanks!
Connect with us.
Facebook: http://fb.com/github
Twitter: http://twitter.com/github
LinkedIn: http://linkedin.com/company/github
About GitHub
GitHub is the best place to share code with friends, co-workers, classmates, and complete strangers. Millions of people use GitHub to build amazing things together. For more info, go to http://github.com
My name is Dmitry Petrov. Today we are going to talk about machine learning with Git and experiment tracking with Codespaces. When you create a machine learning model, you need to train it dozens, hundreds, sometimes thousands of times, and it is crucial to keep track of all the experiments you run, so you can return to the past and understand what happened yesterday or what happened last week. That is important for your team: to understand the progress of the project and how to reproduce your results.
Today we will learn how to simplify operations around machine learning and how to track experiments without introducing any additional pieces of infrastructure. Instead, we will be using Git, and the ecosystem around Git, to track your experiments and collaborate in a team.
A little bit about myself. I am a co-founder of a startup, Iterative.ai. We build tools for machine learning. We believe that machine learning engineers and researchers should live in the same ecosystem of tools as software developers, so that we can collaborate more efficiently and build our AI applications faster.
We need to use Git, GitHub, and the whole ecosystem around them to improve the efficiency of our teams. My experience comes from Microsoft: I was a data scientist at Microsoft for a number of years, and later I created an open-source tool, DVC, or Data Version Control, which helps you manage your datasets and models in storage while using them in your Git repositories, together with your modeling code. Today we will be talking about experiment tracking: how to use DVC to track your experiments while utilizing the Git ecosystem.
First, we will talk about experiment tracking and the importance of this experience: why we need it and how people use it to improve their efficiency. Then we will talk about how to get this experience without any additional pieces in your infrastructure: how to utilize the VS Code IDE and how to utilize Git to track the experiments.
Do you remember what metrics you got when you were training a model this morning? Do you remember what metrics your model had yesterday, or maybe a month ago? It is really complicated even for yourself; I am not even talking about a team. So you need a system, a place where this information is stored together and from which you can return to any experiment you ran before. The system should work like Git does for software engineering, but for machine learning you need a little bit more.
You need to remember your source code version, your metrics, your hyperparameters, the versions of your data: a lot more than you need for software development. This is where experiment tracking tools came from. In general, these tools need a few components. First, you need to log your metrics, source code, and hyperparameters somewhere, and you need a centralized place where you and your teammates can look at these metrics and hyperparameters and make data-driven decisions based on this information. And it is crucial to have the ability to reproduce the results.
That is why you need experiment tracking. The maximum value you can get from it is to reproduce results and be efficient; it helps you be efficient as a team. The first experiment tracking tool, at least that I know of, is Sacred, which was released in 2014. If you have been doing machine learning for a long time, you probably remember that tool. I will use it as a reference for the functionality of experiment tracking. You can use the web UI of Sacred.
This is the web UI, where you can see all the runs you made, with the run ID, all the hyperparameters you used (for example, a learning rate of 0.003 and a batch size) and the metrics your model produced for this particular run, for this particular set of hyperparameters. So you can return to the table and see what happened to the model yesterday or a month ago. In such a tool you can attach more to your experiments; you can go deeper and understand what is inside the models.
You can put in visuals and graphs, such as area under the curve and loss functions: all the information that helps you make data-driven decisions. Even the first experiment tracking tool supported this functionality. You can put in a loss function, a confusion matrix, anything that is needed to remember what happened in the past when you were working on the experiment. The success of Sacred and the rise of machine learning in the industry created many more tools for experiment tracking, such as TensorBoard, MLflow, and some others.
The majority of these tools appeared around 2018, but there is still a lot of progress in this area. Let's take a look at the architecture of these systems. When you work with an experiment tracking tool, you need to deal with an additional part of your IT infrastructure. You need to host the centralized table somewhere you can put the information, so you and your teammates can look at it. Sometimes you just need a database and a web UI on top; this is how the initial version of Sacred works, or MLflow.
Some solutions put the database and web UI in the form of SaaS, so you just need a login to the system, and then you will be able to stream your metrics and images to the SaaS, so you and your team can look at and visualize the experiments. This SaaS or service becomes an additional part of your machine learning infrastructure, and the picture becomes even more complicated.
When you look at the cloud training experience and need to train in a cloud, you need yet another component that manages your cloud infrastructure, provisioning, and so on. The system becomes very complicated: you just need to track experiments and run them in the cloud, and yet you have already added two extra components to your infrastructure. You double or even triple the complexity of ML operations. So the question we are asking today is: can we simplify this experience? Can we remove the moving parts from your IT infrastructure? How do we do that?
We need to reuse the infrastructure components that we already have. I am talking about Git as a source control tool; I am talking about your Git servers, such as GitHub, a VS Code extension, your IDE. Why do you need a separate UI and service for experiment tracking? Why not have this experience right in your IDE, in the place where you manage your code?
You won't need to switch back and forth between your development experience, where you write the modeling code, and your ML tracking experience, where you see your runs. This can dramatically improve our efficiency: efficiency around infrastructure and efficiency in our workflow. This is how we will structure our next steps: we will talk about each of these components and how to use them in your ML training process.
We are starting with the experiment tracking user interface: how we can get a rich interface right in your IDE, together with your code. You will have your coding experience, the best developer experience you can get, and at the same time you will track experiments right in the same application. Whether it runs on your laptop or in a cloud, you should have those pieces together, and when you run it in your IDE, you don't need additional infrastructure.
You don't need an additional server, because you already have one. All right, now the fun part begins. I am opening VS Code with my extension on my laptop; we start locally, and then we go to the cloud. This is a regular VS Code experience; you are probably familiar with it very well. This is my training code: I train something and I save my model, so very usual modeling stuff. What is special about my environment? In VS Code, I have a plugin installed for experiment tracking, so we can see the list of our experiments with metrics.
You can look at average precision, AUC, and the hyperparameters. So far we have only four of those; in a real case you might have dozens of metrics and hyperparameters, but let's keep it simple for the demo. You can see the entire history of my experiments, and now we can run a new one. I am choosing an experiment and running it. In this case, I will be changing the number of estimators and the maximum features. Right now I am using a very simple project: an NLP problem with a binary classifier using traditional machine learning.
The training takes only a few seconds and we will see the result soon. Here it is: a new experiment in the table. You can see the time and the new values of the metrics, and now we can compare the experiments. The current experiment performs slightly better than the experiment we started from. In this diagram, an experiment is actually a Git branch: there is the original branch we started from, and the difference, as you can see, is substantial. We can go deeper and look at the metrics and graphs that we produced during the experiment, such as this precision-recall curve.
You can see that our experiment performs slightly better than the original one. In addition to basic metrics, you can see more advanced visuals, such as a confusion matrix for our binary classifier. You can take a look at the feature importance graph and compare the two. Notice that the feature importance plot is just a regular image file, a regular PNG file that we produced as a result of our experiment, and the tool can pick it up and show it right here. Sometimes two experiments are not enough.
Why don't we take a look at an experiment that we ran, let's say, a month ago, and look at all three experiments? All three are here: you can see the IDs and all three graphs in one view. For the confusion matrix, of course, you cannot put one matrix over the others, so you see them separately, and the same goes for feature importance: this is the graph for the experiment you just ran, and those two are the older ones.
What you see here is a regular way of tracking machine learning experiments, but we have all this experience right in your IDE, next to the code. You can click and jump to your source code right away, modify the code, and after the modification just click run; it will train and preserve all the results, all the changes in source code, all the changes in hyperparameters, and produce the metrics in the table. This is how we integrate your development experience with your modeling experience.
You don't want to switch back and forth between the IDE where you write code and some UI in a separate place; you have everything here. How did we generate the metrics and the visuals? You need to instrument your source code with a library. In this case we are using the DVCLive library, but every experiment tracking tool has its own. You import the library and log the metrics, like here: this is how I log a curve.
This is how I log my precision-recall curve, and this is how I save my precision numbers. After this, you will see the result in the table.
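The demo instruments the code with DVCLive, whose exact calls are not reproduced here, so as a hedged sketch of the underlying pattern only (scalar metrics written to a small JSON file inside the repository, which Git can then version and diff), something like this stdlib-only snippet captures the idea; the file name metrics.json and the metric names are assumptions, not DVCLive's actual API:

```python
import json
from pathlib import Path

def log_metrics(metrics: dict, path: str = "metrics.json") -> None:
    # Write scalar metrics as a small, stable JSON file. Because the
    # file lives in the Git repository, every experiment commit carries
    # its own copy, and plain `git diff` can compare two runs.
    Path(path).write_text(json.dumps(metrics, indent=2, sort_keys=True))

# Hypothetical metric values, for illustration only.
log_metrics({"avg_prec": 0.925, "roc_auc": 0.946})
print(Path("metrics.json").read_text())
```

The point of the design is that the log is just a file: no database and no server, only content that travels with the commit.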
Before we jump into the details and see how it works under the hood, how we use Git to save the table, I want to mention some small details about how you can customize your table. You can remove some of the columns that you don't need anymore.
You can hide some of the hyperparameters that you don't need. When the table is customized, you can understand your metrics better, which helps you make data-driven decisions faster. But how do you get this experience? It is very simple. You have seen the DVC plugin installed in VS Code, and this is all you need to get the table: go to the extension marketplace, search for DVC, and install it. After the first run you will see the table, and you can start your journey in experiment tracking with VS Code.
Now we will go a little bit deeper and see how this works under the hood: how experiment tracking is implemented using Git, and how Git becomes the source of truth for the thousands of experiments that you run. You use the same set of tools and technologies both for your source code tracking and for your ML experiment tracking, and using Git for machine learning experiment tracking simplifies your workflow.
We will show you how to use Git branches and pull requests to collaborate with teammates on machine learning experiments, and how to do this efficiently using the tools that you already have. What is even more interesting, it helps you collaborate better with teams outside machine learning: your DevOps and IT infrastructure people, your software engineers, who know very well how to use Git.
You already know how to install the extension and how to see the table with your experiments, visuals, and all the information you might need. Now we will see how this is implemented and how we leverage Git to keep information about our experiments.
If you look at the table, you can notice the IDs of the experiments. Believe it or not, those IDs are Git commits, and by these IDs you can find the exact information about your experiments: the exact source code, metrics, and data, or at least pointers to the datasets.
Let's do a simple experiment. There are two different commits: the last experiment and the previous one. We can run a simple git diff command between the ID of one of those experiments and the ID of the other.
You can see that it is a simple JSON file with the numbers, and Git can show you the difference between the JSON files. If you are not happy with a raw JSON diff of the numbers, which is probably not the best experience, you can use DVC functionality to get the difference between the numbers: dvc metrics diff works pretty much the same way as git diff does, but it shows you a data-driven diff based on the metrics.
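As a hedged illustration of what a data-driven diff over such metric files means (the numbers below are hypothetical, and the real dvc metrics diff command does considerably more), comparing two flat metric dictionaries might look like:

```python
import json

def metrics_diff(old: dict, new: dict) -> dict:
    # For each metric present in both runs, report the old value,
    # the new value, and the numeric change, rather than a raw
    # line-by-line text diff of the JSON files.
    return {
        name: {
            "old": old[name],
            "new": new[name],
            "diff": round(new[name] - old[name], 6),
        }
        for name in new
        if name in old
    }

# Hypothetical metrics from two experiment commits.
previous = {"avg_prec": 0.913, "roc_auc": 0.941}
current = {"avg_prec": 0.925, "roc_auc": 0.946}
print(json.dumps(metrics_diff(previous, current), indent=2))
```

The output reads like a change report per metric, which is usually what you want when comparing runs.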
In the last session, we will be talking about training in the cloud. In many ML projects, you need cloud compute.
Sometimes you just need a GPU instance to run your deep learning model; sometimes you need a little bit more memory, and that is a constant source of pain for teams. You need to learn how to get these resources from the cloud, you need to set them up, you need to clone your repositories, you need to sync the data, and so on, and you must not forget to shut down the instance when training is done, so you won't be wasting money while your instance runs over the weekend.
A new project, Codespaces, unifies this experience. With Codespaces, you can get your compute right from the web UI of GitHub. You will get your GPU instance with your IDE installed and your Git repository cloned there; a good portion of the problems you have will be solved by a single click. In addition to all the benefits of getting compute in such a simple way, we will talk about the unification of your infrastructure with Codespaces and VS Code.
A
There
is
a
way
how
to
unify
your
environment,
on
your
laptop
and
in
a
cloud
so
you'll
have
the
same
set
of
libraries,
the
same
the
same
versions
and
the
same
source
code.
It
simplifies
the
way
how
you
operate,
especially
for
the
teams.
We will also talk about environments: how to unify environments on your laptop and in the cloud, so you won't have the problem of "it worked on my machine, but for no one else." Codespaces has a special technology to simplify this process and improve this experience.
Let's jump to the GPU part, the cool part. This is an easy way to run your ML training in a cloud. For this GPU demo, I will be using a different project, a deep learning project. On the web page of the project we just need to get to Codespaces, but for a GPU we need to configure the page a little bit.
So we are choosing the GPU instance. When you push the button, it will create a container. It might take a minute or two, so instead of waiting, I will go back and launch a container that I already have. Let me click here, and we will get our server with a GPU in a couple of seconds.
So that is our source code in the cloud, and we can see the great developer experience with VS Code while it runs in the cloud. Do we have a GPU? Let's check it out: that is a Tesla V100.
Not bad for our project, so let's start training. The DVC plugin is set up; we can go to the experimentation table, and from the table, let's pick an experiment and run it with a slightly different set of hyperparameters. So we are choosing this one and that one. While this is running, let's check our source code and how the deep learning project is instrumented with metrics. As before, we use the DVCLive library in Python; you can find our callback function. The callback helps you simplify the way you output your metrics.
This is an example of reporting metrics: this is how you log your loss function.
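The actual callback in the demo comes from DVCLive's framework integration; as a framework-agnostic sketch of what such a callback does (the class name, method name, and file layout here are assumptions, not DVCLive's real API), per-step values can be appended to a plain TSV file that a plotting tool can render live:

```python
from pathlib import Path

class MetricLoggerCallback:
    # Minimal callback-style logger: appends one row per training step,
    # producing a time series that a plot view can refresh while the
    # (usually slow) deep learning training is still in progress.
    def __init__(self, path: str = "logs/loss.tsv"):
        self.path = Path(path)
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text("step\tloss\n")
        self.step = 0

    def on_step_end(self, loss: float) -> None:
        with self.path.open("a") as f:
            f.write(f"{self.step}\t{loss:.4f}\n")
        self.step += 1

# Hypothetical loss values from a few training steps.
logger = MetricLoggerCallback()
for loss in (0.91, 0.55, 0.32):
    logger.on_step_end(loss)
print(Path("logs/loss.tsv").read_text())
```

A framework integration simply wires a method like this into the training loop's step or epoch hooks, so you never call the logger by hand.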
All right, the training is in progress. Let's take a look at the table. This is a table similar to the one you have seen before; we just use a different set of metrics and a different set of hyperparameters. What is special about deep learning is that training is usually slow, so you can watch the progress of the training in our plots. If you choose the experiment that we are running right now, you can see the progress over time.
The step number is increasing, and in real time you see the values of your loss function and your accuracy. Let's look at the metrics that we use for this particular project: accuracy, and in addition to the metrics, we log misclassified images. This image was assigned to the wrong class: it is a croissant which the model recognizes as a cat. That is a way you can attach more visuals to your project, so you can make a better decision about your next steps in the modeling process.
A GPU that you have right in GitHub brings you a completely new experience: you have all the experimentation tools in one place. But there is one more important component that we should talk about: containers, the way you unify your environment. You are probably familiar with Docker, but in Codespaces there is another way of creating a Docker image, and I would say an even simpler way.
Take a look at this devcontainer.json file.
It has a reference to the base container that we need, which uses Python 3. We use a few features: we use CUDA and we use Iterative's DVC feature.
If you create the image with those features, you will get all this functionality in your Docker container, in your development environment in the cloud. You can specify more items here, such as extensions: we use the Python extension from Microsoft.
We also use the YAML extension from Red Hat to visualize our files, and the Iterative DVC extension for the experimentation table that we have seen before, all in one place.
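Putting those pieces together, a devcontainer.json along these lines would describe such an environment; the base image and the feature identifiers below are illustrative assumptions rather than the literal file from the demo, while the three extension IDs are the marketplace identifiers for the extensions just mentioned:

```json
{
  "image": "mcr.microsoft.com/devcontainers/python:3",
  "features": {
    "ghcr.io/devcontainers/features/nvidia-cuda:1": {},
    "ghcr.io/iterative/features/dvc:1": {}
  },
  "customizations": {
    "vscode": {
      "extensions": [
        "ms-python.python",
        "redhat.vscode-yaml",
        "iterative.dvc"
      ]
    }
  }
}
```

The file is declarative: you list a base image, the features layered on top, and the editor extensions, and the same description builds the same environment anywhere.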
When you ask for a GPU instance, you will get this whole environment, so it helps you unify the environment for your team members. But what makes this really cool is the local experience.
You can use the same container in your local VS Code. Let me show you how it works: if you open this project on your laptop in VS Code, it will automatically ask whether you want to use this container. If you answer yes, you get the same container, the same environment with the same set of libraries, plugins, and all these features, in your local environment.
So now we have a unified environment everywhere, in the cloud and on your laptop; all the code you create and all the experiments you run work everywhere. Now, it doesn't mean I got a GPU on my machine, so training might run for days, but you get the point.
This is the way you can unify your environment and efficiently collaborate with your teammates and with your DevOps team, who are responsible for deploying the models: they will know exactly what libraries and what images you use.
It helps maximize the value of the investments you have already made: investment in your Git ecosystem and tools, in your Agile process, and in your team collaboration. All together, we can be more efficient in creating AI projects.
Thank you for your attention. Please reach out to me to discuss these ideas.