Description
More about this lecture: https://sites.google.com/lbl.gov/dl4sci/koustuv-sinha
Deep Learning for Science School: https://dl4sci-school.lbl.gov/agenda
B
Good morning, everyone, and welcome again to the Deep Learning for Science School. This is the fourth week of our webinar series this year. I'm very pleased to have Koustuv Sinha today to give us a lecture on reproducibility in machine learning. Koustuv is a PhD candidate at McGill University. He is working with Joelle Pineau and William Hamilton, and he is currently a research intern at Facebook AI. Koustuv's primary interest is in advancing the logical generalization capabilities of neural models in discrete domains such as language and graphs. He is also involved in organizing the annual Machine Learning Reproducibility Challenge, and he is serving as a reproducibility co-chair at NeurIPS 2019 and 2020. Koustuv, I'll give you the floor.
A
Thank you so much for the nice introduction, Mustafa. Welcome to the talk. In this talk I will go over four different agenda items. First, I will talk about why we need reproducibility in science, especially in machine learning. Then I will go through case studies that show the need for reproducibility, covering findings that came up in the literature recently. Then I will talk about what the machine learning community is doing about it, the steps that we as a community are taking around reproducibility. And finally, I will talk in depth about how you can perform reproducible research.

A
A brief background, beyond what Mustafa already gave: I'm a PhD student at McGill University and Mila, I'm an intern at Facebook AI, and I've been running reproducibility challenges for the last three years.
A
Now, reproducibility, replicability, robustness: these are different terms which are used in different contexts, and there has been a recent debate on their exact definitions. Just for the purposes of this talk, I'm going to set up a working definition. It may not be universally accepted, because there are certain things that overlap. By reproducible, I mean that if you use the same code and the same data given by the authors and you get the same results, then that is reproducible. Robustness is what we are looking at when you use different code but the same data. Replicability is using the same code but on a different set of data. And if both the code and the data are different but you still arrive at the same claims as the original paper, then that is the highest level of reproducibility, which is generalization. Now let's take these definitions into account.
A
What is the crisis that we are facing? Nature published a nice report in 2016 by M. Baker on the reproducibility crisis in science. When people were asked, and this was a set of respondents from different domains of science, not only machine learning, 52 percent of the respondents replied that yes, we are facing a significant crisis, and 38 percent responded that we are facing a slight crisis. If we look at the domains involved, chemistry and biology come up as the top two domains facing a lot of reproducibility problems. Looking at this, we might feel safe and think that computer science probably doesn't have that much of a reproducibility crisis. But that is not the case, although in computer science we can actually do a bit more about it.
A
So let's talk about certain findings on reproducibility in machine learning. Reproducibility should be easy, right, at least for us computer scientists: we have the same code, we have the same data, we might also have the same amount of computation, so we should get the same result. It seems that is not the case. Last year at NeurIPS 2019, Edward Raff published a paper, "A Step Toward Quantifying Independently Reproducible Machine Learning Research." In this paper he tried to reproduce 255 papers published from 1984 to 2017, and he found that 63.5 percent of the papers were reproducible and the rest were not. He was further able to reproduce 85 percent of the results after getting assistance from the original authors, compared to only 4 percent when the authors didn't respond; he could only reproduce 4 percent of the papers without any input from the authors. He then identified some significant factors which affect reproducibility.
A
That
includes
pseudo
code,
so
the
papers
that
he
reproduced,
like
basically
had
all
code
available,
but
that
is
not
always
the
case
in
case
of
machine
learning
research
these
days
because
of
several
companies
having
like
their
proprietary
use
of
code.
So
he
mentioned
that
pseudo
code
hyper
parameters
readabilities,
especially
in
equations
and
tables,
and
especially
the
amount
of
compute
that
is
needed.
These
are
kind
of
like
very
significant
factors
affecting
the
reproducibility.
A
Now
there
are
certain
limitations
of
this
work,
so
this
work
was
primarily
done
by
one
author
and
over
the
span
of
five
years,
so
that
might
introduce
certain
biases
in
the
study.
But
still
this
is
a
very
significant
study
which
shows
that
there
is
the
issue
in
reproducibility
in
machine
learning
research.
A
Now, if we dive a bit deeper into several subfields of machine learning, let us talk about computer vision. This is one field where people tend to think less about reproducibility, because the usual notion is that whatever we are doing should be reproducible: it's the same dataset, the same training pipeline, and people in vision essentially use model architectures such as ResNet or DenseNet, deep architectures that give you pretty good performance on a large set of standard datasets. But there have been several works showing that reproducibility is still a big issue. Xavier Bouthillier published the paper "Unreproducible Research is Reproducible" last year. In this paper he shows that if you vary the seeds of different models, different models show different variation in their errors. On the x-axis is the model error, and on the y-axis you see two representative datasets, including the MNIST dataset.
A
We see that even on the small dataset the variation with the seed really matters, and that is a big issue on the smaller datasets; on bigger datasets the variation is still there, but not as severe. Certain models, for example ResNet-101, are extremely prone to different variations of the seed. So the conclusion of the paper was that a single initialization seed is brittle, because most of the results reported right now are reported with only one seed. A better evaluation for a given model would be to evaluate on at least n seeds, but there is also no consensus right now on what n should be.
A
Let's talk about another subfield: generative adversarial networks, which have been a super popular area of research in machine learning over the last four or five years. Since their inception we have seen a lot of papers published and a lot of research on devising different GANs, to the point where GANs have become so good that they can generate realistic images and even realistic videos these days. But even GANs suffer from this problem. In 2017 a paper came out called "Are GANs Created Equal?" In that paper the authors ran a hyperparameter search with 100 different hyperparameter samples per model, and they showed that there is a large variation in the results. That shows how brittle these models are: if a hyperparameter search gives us this big a range, then we need to report it, and if we are not reporting it, then we are not truly investigating what the model performance is. No model was shown to be significantly more stable than the others.
A
Now, this raises the question of a limited computation budget. If you are, let's say, a university student, you do not have the budget to actually run 100 hyperparameter searches, so you will just run quite a few samples and get a certain result, but your result could turn out different if you ran on a large set of hyperparameters. That's why the best score achieved by your model should always be discussed alongside how many seeds and samples you ran, and we should always try to report faithfully the distribution of your scores rather than a single score.
A
Now, let's talk about another field which is very prone to reproducibility issues: reinforcement learning, where the idea is that an agent learns to interact with an environment by trial and error, receiving sparse feedback, and with experience it improves in real time. Reinforcement learning has been used in a lot of real-life problems, from robotics to even financial trading. There are several open-source implementations of the standard algorithms, but if you take those implementations, you might end up finding that none of them gives you the same results on the same task. That is really concerning, especially in the case of reinforcement learning, and people have shown that results vary a lot with different hyperparameter choices.
A
These are super, super sensitive. And if we look a little closer, let's say we take the same algorithm, the same code, the same hyperparameters, and we just vary the seeds. Even then, the expected reward achieved by these models differs. And if the expected reward differs, then what should one do? Basically, if I propose a model, you can come up with a different seed showing that the model doesn't work. So how many trials should we do? In these experiments, the number of trials people report is also not standardized: some people report five trials, taking five seeds, whereas some people report two to three trials. So this also needs to be standardized.
A
Another
thing
is
the
baseline,
so
this
happens
not
only
in
resourceful
learning
but
in
general
machine
learning,
as
well.
People
tend
to
under
report
their
baselines
so
that,
basically
you
want
to
show
that
okay,
my
model
is
superior
than
the
baseline,
so
people
do
not
take
care
of
them
and
people
just
report
baselines
by
copying
from
another
another
paper
which,
when,
if
you
like,
evaluate
on
different
baselines,
then
you
might
see
that
the
baseline
might
be
beating
your
model.
So
there
is
like
a
strong
positive
bias.
That's
happening
so
basically
coming
from
fair
comparisons.
A
So, first, let's talk about the open science movement. The open science movement is not specific to machine learning; it's a much more general movement which says that open science is transparent and accessible knowledge that is shared and developed through collaborative networks. That means that if you care about reproducible research, you should also care about open science, because that's when science is well disseminated among people. There exists a journal for reproducibility, and I want to talk about that journal first.
A
Since the issue of reproducibility is so important, the ReScience journal was set up. This journal is fairly general: it doesn't focus on machine learning only, but on any and all types of computational study, from computational neuroscience to computational medicine and so on, and people can submit reproducibility reports of published papers to this journal. The journal also has an extensive review process, so your reproduced work will be reviewed by a set of editors from the machine learning community.

A
The annual reproducibility challenge reports are also published in this journal. This journal is quite cool: it has open reviewing on GitHub, which means it doesn't have single-blind or double-blind reviewing, unfortunately, but you can still submit any work that you wish to replicate or reproduce, do a thorough analysis on it, and there is a large team of editors doing rolling review over the year. This journal essentially exists to give people a nice incentive to work on reproducibility.
A
Now, let's talk about checklists; this is more specific to machine learning research. My supervisor, Joelle Pineau, introduced the machine learning reproducibility checklist in 2018. This checklist is a set of items that you should check while you are submitting your paper. The checklist doesn't need to be exhaustive; these are just generic guidelines. The next version of this checklist was actually deployed during the review process of NeurIPS 2019, where the reviewers had access to the responses to this checklist, and from now on reviewers at NeurIPS, as well as at ICML, also have access to the answers to this reproducibility checklist.
A
Okay, I have a question which asks whether there is a link to the slides; the link is already shared in Slack. So, about this checklist: it is now part of the NeurIPS, ICML and ICLR submission guidelines. If you look into the checklist, you will find different sections on reporting models and algorithms, theoretical claims, how to report your datasets, how to report your code bases, what to include in your figures and tables, and so on.
A
This checklist is essentially a guideline for you to follow, and we did a lot of analysis on the responses to this checklist: what people reported during the initial submission versus the camera-ready submission at NeurIPS 2019. We collected those responses and did an analysis of the items we are primarily interested in. For example, the link to code was often not available during the initial submission, while it was much more available by camera-ready.
A
This is due to the code submission policies that were enforced at NeurIPS 2019, which I will talk about next. But one interesting, or I should say surprising, finding from this reproducibility checklist is that only 36 percent of papers judged error bars to be applicable to their results, while 87 percent saw clear value in defining the metrics and statistics used. If 87 percent see the value of defining the metrics, then roughly 87 percent should also find reporting error bars applicable. This is where we need to improve as a community, because we need much more stringent statistical validation of our models.
A
There is also some effect of the code submission policies. NeurIPS 2019 asked that code should be submitted, but it was not a strict enforcement; it was more like: you can submit your code, and we strongly suggest you submit it at least by the camera-ready deadline. Here you see that most of academia submitted their code in the initial submission phase, whereas industry submissions largely lacked code in the initial phase, but industry caught up by the camera-ready submission. On the right you see the graph of how many initial submissions turned from "no" to "yes"; essentially, a lot of papers added code by the camera-ready deadline. So this is all due to the different code submission policies.
A
Now, does the checklist affect the acceptance rate? This is a very interesting question, and as of now, no, we do not have statistical significance that the checklist affects it. But we found that reviewers who found the checklist useful gave higher scores. We asked how many reviewers found the checklist useful; 34 percent of the reviewers responded that they found it useful, and within that group we found a tendency to give higher scores to the papers that faithfully reported against the checklist.
A
This is quite interesting, and we are hopeful that in the following years the checklist will be given more and more importance, both by people submitting their papers and by reviewers. You can read more about our checklist and the statistical analysis in the reproducibility program report that we published on arXiv; I will share the links in Slack or in the Q&A later on. Another checklist came up last year, and this is very exciting.
A
This
is
a
checklist
from
papers
with
code
and
it
was
introduced
by
robert
stoznik,
and
this
is
called
as
the
ml
code
complete
list
checklist.
So
this
checklist
gives
you
like
a
nice
set
of
instructions
that
you
should
add
in
your
readme,
while
you're
open
sourcing
your
code.
A
So
they
did
a
like
a
nice
study
with
this
set
of
five
criterias
using
new
ribs
2019
repositories,
and
they
found
that
repositories
which
has
all
of
these
five
criterias
met.
Had
a
median
of
196.5
github
starts,
so
that
is
really
really
significant
number,
and
that
shows
that
if
you
do
follow
these
checklist,
your
research
will
be
more
widely
applicable
to
a
lot
of
people
and
a
lot
of
people
will
use
it
in
their
own
work,
so
yeah.
So
next
I
come
to
the
code
submission
policies
that
were
used
in
the
machine
learning
field.
A
Recently, at ICML 2019 and NeurIPS 2019, the community rolled out an explicit code submission policy. There are many concerns with a code submission policy, regarding dataset confidentiality, proprietary software and so on. When the policy was announced, it was written that code is expected only for accepted papers and only by the camera-ready deadline. But there was still a lot of pushback, because, for example, in the case of dataset confidentiality, a lot of industry researchers say that they cannot release their dataset, and that comes up specifically in medical imaging.
A
But
if
that
is
the
case,
then
one
workaround
is
to
like
provide
complementary
empirical
results
on
open
source
benchmarks,
and
that
would
probably
add
to
more
value
of
your
work.
Then
proprietary
software
is
like
a
common
like
common
claim
for
industry
researchers
as
well,
but
in
that
case
we
suggest
that
if
you
are
in
industry,
you
can
also
like
provide
some
minimal
code
base
which,
which
might
not
have
the
same
training,
but
that
has
the
similar,
like
expected,
results
on
a
small
benchmark.
A
So
that
would
help
a
lot
like
that
would
help
the
community
a
lot,
because
if
you
remember
like
there's,
a
lot
of
papers
came
out
like
bird
and
gpt3,
but
still
the
community
ended
up
replicating
them
using
their
own
code
within
weeks
and
months.
So
it's
like
having
a
proprietary
code
out
like
not
the
proprietary
code,
but
rather
a
simple
version
of
your
code
out.
A
That would be very helpful for the community. This graph shows how many papers are being released with their code. At NeurIPS 2017, when we started analyzing this, about 37 percent of papers shared code, whereas right now that number has reached more than 75 percent, which is very encouraging. This is where we want to go: all conferences should have close to one hundred percent of papers submitted with code.
A
Finally, within the open science movement, I want to talk about some steps we took in terms of reproducibility challenges. We introduced the ML Reproducibility Challenge at ICLR 2018. This challenge is quite unique: given a set of papers that have been accepted, or even papers that have been submitted to a conference, you take those papers, try to reproduce parts or all of the paper, and then submit a report on how well or how badly your reproducibility effort went.

A
But again, reproducibility is not a binary issue. You cannot just ask, "is this paper reproducible?"; that question is very difficult to answer, because a paper consists of a lot of different moving parts. That's why these kinds of challenges are important: people can dive deep into the different claims of the paper and try to work out, if certain things are not reproducible, why they are not reproducible, and that adds to the information of the original paper. The motivation of the challenge is not at all to be adversarial.
A
Now, how is the challenge structured? Essentially, we start with the process of claiming a paper. We want to encourage people to work on the reproducibility of many different papers, so we want to broaden how many papers are being considered. We added a maximum-claims limit per paper, because in our initial editions we found that people tend to replicate papers which are easy to do, which would lead to a lot of students working on the same paper.
A
I should also clarify who actually works on these reproducibility challenges. We see a lot of students and a lot of early-career researchers working on it, because this is a great way for students to quickly dive deep into the state-of-the-art machine learning literature, but we also see a lot of contributions from industry as well.
A
We have also divided it into three different tracks. There is the baseline track, where you work on the baselines used in the paper, because most of the time the baselines are not studied at all; you try to replicate the baselines and do ablation studies on them. Another track is to do ablation studies on the code given by the authors: you take the same code, but you run ablations on different model components and do hyperparameter searches, and that's how you end up learning more about the paper and adding to our understanding of it. And then, finally, the hardest is the replication study track, where you do not use the same code base as provided by the authors.
A
You create the code from scratch, and that turned out to be super challenging, but we were very glad to see that a lot of students tried this track, and it led to a lot of interesting discussions and interesting outcomes. The students, or whoever is working on it, can then submit their work. For NeurIPS 2019 we ran the review process on OpenReview, and OpenReview helped us a lot to set this up: they created an entirely separate portal for us, tied to the NeurIPS 2019 accepted papers, so people who are reading through the accepted papers can easily link to the corresponding reproducibility challenge reports. I encourage you to go to our OpenReview site to see the challenge reports; everything is public, including the reviews. We used the same pool of reviewers as NeurIPS, and luckily we had a lot of reviewers.
A
We had more than 63 universities and 10 institutions participating in the challenge, and we want to see these numbers grow. We had five machine learning courses throughout the world which registered specifically, making this challenge a mandatory part of their final project. If you are an instructor, this is a great opportunity to use the challenge as a final project, because of the timeline: we try to launch the challenge in the fall, so that by the end of fall your students can submit their reproducibility reports.

A
We were very glad about the students, and we worked closely with them to get their reports published, for whoever was selected in the top ten, and we want to continue this trend.
A
There is also related work in the community. Jesse Dodge has been working on EMNLP reproducibility; EMNLP is a conference in natural language processing, and he ran a similar reproducibility challenge with students at the University of Washington in winter 2020. It is very good to see that other venues are opening up to reproducibility challenges. There were also a number of workshops prior to this challenge, at ICML 2017, 2018 and 2019, organized by Rosemary Nan Ke, and all of these workshops had the same objective: getting people to submit reproducibility reports and then disseminating those reports.
B
Let's see, I think we have a couple of questions. Do you want to read the first one, Steve, since you also had a very similar question?
C
Sure, yeah. Koustuv, however you want to do it, we can read the Q&A items to you or you can view them. But let me read this first one, because I also wanted to generalize it a little bit: which machine learning architecture comes with the best reproducibility? And my generalization is more about whether you could expand on the different subfields of machine learning: which ones are the most and least reproducible, and are these related more to the cultures of the communities or to the stability of the methods? Are there interesting things that you've seen there?
A
Yes, that's a great question, and there is no definitive answer. The way I see it, the severity of the reproducibility problem actually varies with the subfield. If you're looking at reinforcement learning, the problem is more severe there because of the variation in training and in the environments, whereas in computer vision or natural language processing there are still issues, but they're not as deep; the variance across models still differs, though.

A
If you ask what kind of models would be more reproducible, that's a question we would really like an answer to. Right now the notion is that if you are working on, say, a large-scale model like BERT or GPT-3, these models could have better reproducibility because they are trained on a large dataset. But that is not quantified anywhere, and it is very hard to quantify, because you would have to run GPT-3 on your platform for n runs, which is a hugely costly operation. So there is no definitive answer to which types of architectures are more reproducible. All I can say is that whenever we report our own model's performance, we should give a good picture of its variance, so that readers get a good notion of how the different models behave.
B
Okay, maybe I can ask another question of my own. You know, at the end of the day, the real scholarly contribution of our research is really the code, right? It's not the...
A
Yes, thanks for the great question. There are a lot of challenges, which I briefly covered under the code submission policy. The major challenge is that a lot of people from industry are not able to submit the same code for proprietary reasons, and also, say in medical imaging, there are a lot of dataset restrictions. Right now, for example, a lot of people are working on COVID imaging; you cannot publish those datasets, because that is very sensitive patient information.

A
In these cases reproducibility takes a hit, and this occurs across all venues. Even in Nature, when computational medical scientists publish their work, some of these works are not accompanied by code at all, and that raises a lot of confusion.
A
One way to mitigate this is: if you have a proprietary dataset, that is fine, but you can also report similar results on open-source benchmarks. For example, in the medical community you can also report results on openly accessible chest X-ray datasets such as CheXpert or PadChest, so that people can at least verify your claims on those datasets. Now, in terms of reviewability, right now we do not require reviewers to run the code as it is given, but we want to move towards that. It is very difficult for reviewers as well to set up the same code, the same dependencies and so on. Some venues, especially in computational medicine or computational neuroscience, have tried to advocate for Jupyter notebooks to be submitted alongside your code, which have the nice property of replicability, so that reviewers can just run the notebooks. But that is more feasible if your work doesn't involve heavy machine learning training; if your training can be done on a CPU, you can do that, whereas for large machine learning models it becomes more challenging.
B
Thank
you,
I
think
yeah.
My
question
essentially
like
was
for
a
summary
of
what
you
already
described
yeah.
I
think
we
have
many
many
other
questions,
but
most
of
them,
as
I
see
them,
are,
will
probably
be
answered
in
the
second
part
of
your
talk.
So
maybe
I'll,
let
you
finish
your
talk
and
then
we
can
come
back
to
some
of
these
questions.
A
Okay, awesome. So far I was basically looking into the problems in the community, but now I want to talk about how you can perform reproducible research. This part follows the talk and also a blog post; I have already released the blog post, and I will share it with you after the talk so that you can go through the different suggested practices.
A
Let's start with a simple example. Say you have an awesome research idea involving transfer learning on the MNIST dataset, where transfer learning means learning from one task and generalizing to the next. You are very excited about setting up this project, you have looked into the prior research, and you are just starting to code, so you start with the basics. In this talk I'm assuming you're working with PyTorch, but you don't have to; you can work with TensorFlow as well.
A
So let's say you set up data loaders, a training loop and a test loop, and now you are running the experiments. But soon you figure out that you have too many arguments in your training: you have so many different model arguments, and they keep increasing as you progress in your research. What can you do about it? If you miss certain arguments, you have to add them as defaults in your argparse setup. There are a lot of nice tools available; one tool I strongly recommend is PyTorch Lightning, which helps in maintaining the different configurations that you run. It basically keeps a copy of the configuration in CSV files, so that you can refer back to what you last ran. But still, maintaining this long list of arguments is tricky, in my opinion. So, for easier management of configs:
A
You can use config files instead of argument parsers. Config files can be either JSON or YAML files; I personally prefer YAML, because there you can add comments. You can have an arbitrarily large number of configuration files and easily run large-scale experiments with them, because running your code becomes as simple as mentioning the name of the config file.

A
Now I want to plug certain libraries. There is a nice library known as Hydra, which you should check out, and Hydra also uses another library known as OmegaConf. Together these two libraries give you a lot of power over config files, because you can use inheritance in your config files, which is very useful. Say you have a base config file with your basic arguments; then, for each experiment, you just inherit that config file and only modify the particular values that you want to inspect in the current experiment. That gives you really nice leverage for maintaining config files. And importantly, you should also release these config files with your open-source code, so that people can replicate using them.
A
Okay, so you have set up your code and you are doing inference. Ideally, you first evaluate on your validation set to improve your model, and finally, when you are about to write your paper, you evaluate on your held-out test set. But for this you also need to save your model, and you need to save your best checkpoints.
A
So the next practice is effective checkpointing. Ideally, you should save as much as your resources permit, but resources are scarce, especially on HPC systems where you are given only a small storage quota to work with. In these cases it's best to save the last epoch as well as the best-performing epoch. You should have some validation metric by which you determine which training checkpoint is the best-performing one, and you save your model and your config files.

A
Some people even save a copy of their code, which enables greater reproducibility if you want to go backwards in time: if your model fails, you can load from a previous checkpoint. The example I'm showing is how PyTorch Lightning does it. PyTorch Lightning is a nice experiment-management library which writes checkpoints based on the epoch number, and you can also configure it to write checkpoints based on the validation performance, so you know exactly which checkpoint you should use.
A
The next practice is very important for reproducibility: you should log everything while training and evaluating, which includes validation metrics and training metrics. Ideally, you can also save your logs to the file system in a log file; you can use Python's logging module, which you can redirect to write to a file.

A
Beyond that, you can use the logging services that are available. One of the earliest, and still the most widely used in machine learning, is TensorBoard, which gives you a nice local visualization once you log your values. Logging is very simple: you just log metrics, and TensorBoard will show you the different metrics and plots.
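A small sketch of this with PyTorch's built-in TensorBoard writer; the metric names and the "runs/exp1" directory are hypothetical:

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/exp1")

for step in range(100):
    train_loss = 1.0 / (step + 1)  # stand-in for your real training loss
    writer.add_scalar("train/loss", train_loss, global_step=step)

# Record the hyperparameters alongside a final metric for this run.
writer.add_hparams({"lr": 1e-3, "seed": 0}, {"val/accuracy": 0.93})
writer.close()
# Then inspect the run with:  tensorboard --logdir runs
```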
A
Now, there has been some criticism of TensorBoard, because you cannot directly interact with the plots, so other entrants have come into the field, like Weights & Biases. This is a really cool platform where you can check interactive plots, go into interactive views of different hyperparameters, see which hyperparameters affect your training and learning, and get a large overview of different plots.

A
There are other logging systems available too, like Comet ML, Visdom and MLflow. A lot of people still prefer TensorBoard because you can deploy it locally. The issue with Weights & Biases and Comet ML right now is that you cannot deploy them locally, which might be a problem for industry people, because these systems tend to log a lot of information, including your system resources, and that might be proprietary information that you don't want exposed. TensorBoard also launched an online version recently, as a way to quickly share your results with your collaborators; it uploads your TensorBoard runs to Google's servers.
A
So you can make use of these different logging platforms. Now, as I said, practices one and three are about good experiment-management practice, and I highly recommend using PyTorch Lightning. It has a large, growing community, with really good practices for fast training, evaluation and validation built in; it exposes a lot of different loggers and a lot of different ways to save and evaluate your models.

A
Although, if you are a PhD student, I would recommend setting these things up from scratch once, because that gives you greater control over and understanding of what is happening and where; once you understand that, you can easily switch to these tools. After all, these are not just libraries, they are frameworks, so you essentially have to learn the framework, and if something changes in the framework later on, you also have to update your experiments for it.
A
Before running the experiments, I recommend you draw n seeds. It's ideal to use five seeds, but depending on your computation budget you can also use three. You set these seeds aside, keep them stored, and never touch them, and this is what you report your experiment results on. You should not optimize the seed; you should report on whatever seeds you have drawn, and you should also report those seeds in your paper as well as in your code.

A
Ideally, you should average over the different seeds, to help readers understand the model variance. I have included a simple snippet showing how to set your seed properly if you are using PyTorch; similar snippets are available for TensorFlow. You also need to keep an eye out for GPU reproducibility, because that is not guaranteed even by the PyTorch team, due to CUDA reproducibility issues; you just need to take care of that, and PyTorch recommends specific APIs to call when setting your seed. This is probably the most important part of making your work reproducible.
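A sketch along the lines of the seed-setting snippet shown on the slide, covering the Python, NumPy and PyTorch generators plus the cuDNN settings mentioned above:

```python
import random
import numpy as np
import torch

def set_seed(seed: int) -> None:
    random.seed(seed)                 # Python's built-in RNG
    np.random.seed(seed)              # NumPy RNG
    torch.manual_seed(seed)           # CPU RNG
    torch.cuda.manual_seed_all(seed)  # all GPU RNGs
    # CUDA/cuDNN determinism: slower, and still not a full guarantee,
    # but it removes one common source of run-to-run variation.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

set_seed(1234)  # one of the seeds you set aside up front
```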
A
The next practice is the most obvious: versioning your code. You should always start by setting up a git repository, but the key thing is that you should commit early and commit often, and you should write descriptive commit messages. In machine learning, one thing you can do is add the raw results directly in the commit message: you can say, I ran this experiment, and it gave me this result.

A
As an example, I show a commit message from the Hugging Face repository; they have a really nice way of committing the different functionalities they add to the repository. That is an open-source repository, so they have to maintain these standards; you do not necessarily have to do all of these things, but at least for your own sake you can add as much descriptive information as possible.

A
GitHub is your friend. You should also tag versions of your project at major decision points: say you want to incorporate a new model architecture or a new training setup, then prior to that you should tag the version along with the experiment results, and you should keep a separate branch for small proofs of concept. Essentially, use git and GitHub to their full potential.
A
Okay. The next thing, which a lot of people tend to overlook, is minding your data, or the datasets that you're using. Say that, in the transfer learning setup, you decided it would be a good idea to mix certain classes of Fashion-MNIST and MNIST digits together; Fashion-MNIST is another dataset where, instead of handwritten digits, you have small images of different types of clothing, bags, and so on.

A
Now you went a little too deep down this rabbit hole and ended up creating many different data splits, and finally you are not sure which data split you are using; you may even have overwritten one of your data splits, so you are not getting the same performance. So you should also keep track of your data, and this is practice number six. The easiest way to keep track of your data is to add it to a git version system.
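Not from the talk itself, but a small complementary sketch of data tracking: record a hash of the data file and the split indices next to your experiment outputs, so you can always tell which data split a result came from. The file names here are hypothetical:

```python
import hashlib
import json
import numpy as np

def file_md5(path: str) -> str:
    """Hash a data file in 1 MB chunks so large files don't blow up memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

rng = np.random.default_rng(0)          # fixed seed for the split itself
indices = rng.permutation(60000)
np.save("data/split_indices.npy", indices)

manifest = {"data_md5": file_md5("data/fashion_mnist_mix.npz"), "split_seed": 0}
with open("data/split_manifest.json", "w") as f:
    json.dump(manifest, f, indent=2)
```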
A
These are some easy things you can try, but you should also back up your data periodically, using Google Drive or AWS S3 buckets. There is also a recent entrant called Data Version Control, or DVC, which is essentially git for datasets: you add your data to it and it gives you a nice way to track your data as well.
A
Data tracking is very important. If you are releasing your own data, you should also consider adding a datasheet. Datasheets are very important, and I encourage you to read the paper "Datasheets for Datasets," where the authors propose adding something like a README for the dataset, containing the motivation, the composition of the dataset, the collection process, the pre-processing used, the use cases of the dataset, and how the dataset will be distributed and maintained. These are very important points to cover if you are releasing your own dataset.
A
So now you have done your experiments and you have really shiny plots. You show them to your supervisor, and your supervisor says, I don't like this plot, give me another plot. So you run back and replot, but then, after a certain number of weeks, you cannot find your plotting code again.

A
This matters because it helps a lot with paper production, both when submitting your paper and when doing the camera-ready. The best thing to do is to maintain notebooks: keep a set of Jupyter notebooks in a separate folder in your repository, with separate notebooks for data analysis, result analysis, plot generation and table generation. Why? Because when you get reviewer comments and want to update certain plots but not the others, you can just rerun the relevant plotting cells, and that's it. You should add these notebooks to GitHub, and GitHub also renders notebooks inline, so you can share your intermediate results with your peers and collaborators.
A
You can also supercharge your existing Jupyter notebooks by using the jupyter-contrib extensions, which give you a lot of powerful tools, from collapsing cells and headers to tables of contents. You may also want to share your results using Colab, Google Colaboratory, which gives you GPU and TPU runtimes, and you can also use Binder, another service which gives you Jupyter notebooks backed by a virtual machine.

A
When you need to update the results in your paper, you can rerun the cells, and you can use something like the papermill API so that you can run your notebooks with a different set of parameters from the original notebook. You don't have to worry about memorizing all of these tools.
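For reference, a minimal sketch of parameterized notebook execution with papermill; the notebook names and the "lr" parameter are hypothetical:

```python
import papermill as pm

pm.execute_notebook(
    "notebooks/plot_results.ipynb",          # the notebook you maintain
    "notebooks/plot_results_lr3e-4.ipynb",   # a new copy produced with the injected parameters
    parameters={"lr": 3e-4, "run_dir": "runs/exp1"},
)
```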
A
I have written all of this up in my blog post, and I will share it at the end of this talk. Another point: if you are maintaining tables, pandas has a nice to-LaTeX API, which I use a lot; it gives me nice LaTeX tables for my experiments without having to copy the results into the paper by hand.
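A sketch of going from a results DataFrame straight to a LaTeX table; the numbers and column names are made up for illustration:

```python
import pandas as pd

results = pd.DataFrame(
    {"model": ["baseline", "ours"], "accuracy": [91.2, 93.4], "std": [0.8, 0.5]}
)
# Paste the printed output directly into the paper instead of retyping numbers.
print(results.to_latex(index=False, float_format="%.1f"))
```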
A
The next practice is reporting the results. You should always try to report results with proper error bars. As I said, there is a problem with seeds: you should not run a grid search over your set of seeds. As I mentioned, you set aside a set of seeds and you run your model on them again and again, and that gives you a proper variance in the results.

A
Even when reporting a table, you should mention the confidence intervals and the variance in your paper. I defined different criteria earlier: one is multiple seeds; a higher bar of reproducibility is multiple datasets. You can also report your results with their variance across multiple datasets, which is a much higher bar of reproducibility, towards generalization. Even if your model has larger variance across different datasets, it is still encouraged to report it, because then people will have a better understanding of how your model behaves.
A
Okay, that was a lot of practices; we have just three more left, so bear with me. Practice number nine is managing dependencies, and this is very important. Irreproducibility often stems from software deprecation. To replicate a published work, the first thing to do is to match the same development environment, containing the same libraries that the program expects. Thus it is crucial to document the libraries, and their versions, that you use in your experiments. After your experiments are stable, you should leverage pip or conda to collect the requirements in a file, and you should add it to your repository.
A
When you use Python, this is a nice way to keep track of the different libraries, but there could be other factors at play as well, so it would be even better to use Docker or Singularity containers. With these, you can upload your Dockerfile or your Singularity file to a service, and then your setup is easily reproducible with the exact same environment.

A
Now, let's say you did not work with Docker from the start of your training, because training inside Docker is a bit trickier: you have to use nvidia-docker and run on systems which support it, and a lot of HPC systems do not support Docker out of the box; they support Singularity instead. In that case you can use something like repo2docker, which converts your existing repository into a Docker image, and then you can adapt it and use it.
A
Then comes the next practice: open-sourcing your research. After your paper is released, you should consider open-sourcing your work. This adds visibility to your paper and it encourages reproducible research; it is basically the hallmark of reproducibility. If you have good, well-documented code alongside your paper, everyone will love it and build on top of it, and that will also give you more citations.

A
There is a great service for this, Papers with Code, where you can list your code alongside the paper, so that people have more visibility of your released code.
A
Now
before
you
release
code,
there
are
some
pre-release
checklists
that
I
want
to
like
mention.
So
basically,
I
talked
a
lot
about
like
maintaining
different
commits
for
your
research,
but
then
those
commits
are
are
especially
for
you
to
read
and
for
you
to
understand
where
you
did
certain
changes
now.
The
way
we
do
research
is
very
messy.
We
tend
to
like
just
fix
small
things.
We
say
fix
this
bug
or
no.
A
This
is
not
working
and
stuff
like
that,
but
these
commit
messages
when
it
becomes
public,
it
might
become
like
a
a
bad
thing
for
the
people
to
read.
So
it's
ideal
to
squash
your
commits
in
the
public
branch
to
a
single
commit
before
you
make
your
repository
public,
so
that
helps
you
to
remove
the
unwanted
commits,
and
it
also
helps
you
to
like
remove
any
sensitive
information
that
you
might
have
in
your
commits.
So
you
should
also
make
sure
that
your
code
doesn't
contain
any
api
keys.
A
So
if
you're
using
like
like
weights
and
biases
and
comet
ml
a
lot
of
times,
we
forget
that
our
api
keys
are
still
in
the
code,
so
you
have
to
like
remove
them
before
releasing
your
code,
so
you
also
need
to
keep
an
eye
out
for
hard-coded
file
locations
so
that
people
can
run
your
code
in
a
separate
environment.
A
You
should
also
format
your
code
properly
to
improve
readability,
so
you
can
use
something
like
black.
So
it's
a
python
formatter
which
formats
your
code
in
a
nice
readable
way.
So
you
can
just
like
black
all
your
code
at
once,
like
you
can
just
run
black
then
star,
which
includes
all
your
codes
and
then
it
would
format
them
in
a
really
nice
way
and
the
final
part
very
important
is
to
document
your
code,
so
you
should
add
like
documentation
as
much
as
you
can.
I
know
it's
it's.
A
I know it gets a bit difficult to do all of these things at once, but document as much as you can, in your libraries and in your function calls. It would be especially great if you add the tensor dimensions of the inputs and outputs of your functions; that helps the machine learning community understand which tensors are going into and coming out of each function.
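A sketch of the kind of shape-annotated documentation suggested here; the encoder itself is a made-up example:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size: int = 10000, hidden_dim: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.gru = nn.GRU(hidden_dim, hidden_dim, batch_first=True)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        """Encode a batch of token ids.

        Args:
            tokens: long tensor of shape (batch, seq_len).
        Returns:
            float tensor of shape (batch, seq_len, hidden_dim).
        """
        embedded = self.embed(tokens)   # (batch, seq_len, hidden_dim)
        output, _ = self.gru(embedded)  # (batch, seq_len, hidden_dim)
        return output
```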
So practice 11 is effective communication via the README.

A
Once you release your repository, you should add the information from the ML Code Completeness Checklist to your README, so that your repository gets a lot of stars and a lot of visibility. And it's not only about publicity: people can easily replicate the results if you have already shown in the README how to do it. It's also really good to have a contributing guide, so that if people want to contribute to your project, they know what to work on and how.
A
These days it's also really valuable to release a blog post around your paper, a more informal document where you talk about the different things you worked on. And, just for visibility, people also post their paper and code on Twitter, because a lot of academic researchers discuss things on Twitter these days; that's also a good channel for effective communication. So, I'm at my last practice, and it has been a lot.
A
These are very important practices, and the last one matters just as much: test and validate your setup on a different machine. As I said, you have to take care of hard-coded paths, dependencies and everything else, but the best way to ensure that everything works is to use Google Cloud, AWS or a similar service to spin up a small environment, and then just test the inference of your model, or train one epoch of your model.
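A sketch of the kind of one-epoch or inference smoke test you might run on a freshly spawned cloud machine; the tiny model and input shapes here are stand-ins for your real repository code:

```python
import torch
import torch.nn as nn

def smoke_test() -> None:
    # Stand-in for loading your real config and model; in practice you would
    # import them from your own repository.
    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
    dummy_batch = torch.randn(4, 1, 28, 28)  # tiny fake MNIST-like batch
    with torch.no_grad():
        out = model(dummy_batch)
    assert out.shape == (4, 10)               # inference runs end to end
    print("smoke test passed:", tuple(out.shape))

if __name__ == "__main__":
    smoke_test()
```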
A
That gives you a good sense of whether your model runs as intended, and whether there are issues with hard-coded paths or dependencies you haven't mentioned. So these are the 12 practices I wanted to talk about, and the key takeaways of this talk are that reproducibility in machine learning is extremely important for the advancement of the field, and that the ML community is coming up with innovative ways to encourage it.
A
I will also share the blog post; it's already online on my website at cs.mcgill.ca, under my McGill username (with a tilde in front), under "Practices for Reproducibility." I will share the link in Slack and in the question-and-answer session. It is essentially a companion blog post to this entire talk; in it you will find all the tools that are necessary, or that you may want to use, for these best practices.
A
Finally, to conclude, I want to mention a nice phrase used by my supervisor, Joelle Pineau, at NeurIPS 2018: science is not a competitive sport. We tend to look at it that way; we try to beat other models by posting better models, to beat certain baselines, certain leaderboards and so on. But science is not only about beating certain baselines or models; it's about deeply analyzing what is happening and how we are advancing the field.
A
So we should all care about our work, and we should give enough time and care to our research. I understand that, due to different incentives, like the incentive to publish faster or to get your results out before you get scooped, these things tend to get overlooked. But if you care about your work, you should devote more time to it; the more time you devote to these issues, the more long-standing your work will be, and the more useful people down the line will find your work.
A
So thanks so much for listening to this talk. Thanks to Joelle Pineau, Shagun Sodhani, Jessica Forde, Matthew Muckley and Michela Paganini from Facebook AI Research for helping me put this talk together and giving suggestions, thanks to the hosts of this seminar for giving me the opportunity to talk, and thanks to all the people who have been involved in the reproducibility challenge, in ReScience and in building the checklist, and especially the OpenReview folks, who have been very helpful in setting up the platform.
B
Thank you, Koustuv. This was really a great talk, a tour de force of all the things related to reproducibility in practice; I certainly learned a lot. I think we have many questions; several of them have been answered in the second part of your talk, but let me go through a few of them. There are different levels. The first one is: is the reproducibility challenge still open to work on?
A
Yes, that's a great question. We will relaunch the reproducibility challenge for NeurIPS 2020. Right now, as in tomorrow, NeurIPS is going to announce the reviews for NeurIPS 2020, but the final paper acceptances will be released somewhere around the end of September or early October. That is exactly when we will launch our reproducibility challenge, and it will run towards the end of December or early January, so that you have enough time to work on it.

A
And if you are at a university, I would encourage you to contact your supervisors or professors beforehand, so that they can make participation a regular part of your course. We also list the participating courses on our website, so that everyone has good visibility of which courses and which institutes are taking part.
B
Another question is: which of these practices are, in your opinion, the hardest to adopt?
A
I would say the two hardest are managing dependencies and being disciplined about seeds. For dependencies: let's say you train on PyTorch 0.9 and then people upgrade to PyTorch 1.5; certain things might not work, so you have to mention explicitly which versions of the libraries you are using, and it would be ideal to share Singularity or Docker containers, but that's a lot of work for people to do. That's why services like repo2docker, Binder or Colab come up.

A
In the case of maintaining seeds, the problem is a bit different. If you are trying to show that your contribution is significant, and you set certain seeds early on and see that your model is not performing well, you will have the urge to change those seeds, and that is where the difficulty comes in. You should restrain yourself and say:
A
"Okay, I'm not going to change the seeds," because that is the fair assessment of reproducibility: you keep your seeds aside, you report whatever you get, and you focus on model improvements on those seeds that you have set aside. So I would probably call these two practices the most difficult, but I'm sure they are not that difficult if you are careful about it.
B
There is, I think, a question that is more of an opinion question. It says: publishing trained models means posting binary data, and GitHub-style repos are not well suited for that purpose. Is there a repository for massive binary data archives, and who will pay for such massive publicly available storage?
A
Right, that's a great question. To answer that: the PyTorch team has released PyTorch Hub, where you can upload your trained models. Otherwise, if storage is an issue, you can store the checkpoints on AWS S3; with S3 you can use a long-term archival format, which keeps the cost down, although it's still a bit costly.

A
If you are coming from industry or a lab, you should get your lab to fund it, and AWS S3 long-term storage is cheap enough. Otherwise, what I do personally is just upload the model checkpoints to my Google Drive, but essentially I pay for the Drive. That's one issue students have to face when they want to publish these model binaries. So you should work it out with your supervisor for a good option; a lot of labs have a common enterprise Google Drive which you can use to share these model binaries.
B
A
Yes. As a student, what you can do is take advantage of HPC resources. If you are a student in Canada, you have access to Compute Canada, which is a huge resource for all students studying in Canada, giving you access to a large array of GPUs.

A
In the US there isn't a Compute Canada-like setup, but there are a lot of HPC programs at different institutions, where you can apply with a project proposal and make use of them. But yes, for sure, I agree that if you are a student in a lab which does not have these GPU resources at hand, reproducibility gets a lot more difficult. In that case, the focus should probably not be on problems that require a lot of scale, but rather on problems that are more analysis-driven or that require deeper thinking about the different architectures.
A
To give you an example: it's very costly to train BERT- or Transformer-type language models, but it's easy to run inference on them, and that led to a large subfield using probing tasks, where probing tasks are simple linear models that you train on top of your language model to see whether the language model is learning syntactic or semantic cues. These kinds of research areas keep coming up, and there are a lot of exciting research directions you can work on where you do not need to scale up that much. For example, in reinforcement learning, most of the fundamental work can be demonstrated in smaller environments, which can easily be run on your own lab setup. So that's my suggestion.
B
Thank you. Another question is: does regularization have a positive effect on reproducibility?
A
Yes, regularization has a really strong effect on reproducibility, and a lot of papers tend not to focus on it. For example, you might propose a very fancy model, but if you apply regularization to your baseline, you may find that the baseline is in fact better than your proposed model, so we should be looking into that as well.

A
Recently there was a nice paper where people showed that you can train an MLP to replicate a CNN architecture by using L1 regularization, and these are really important findings. You have to keep this in mind while proposing new work, and that's why it is very important to first evaluate the baseline: if you evaluate the baselines thoroughly, using all these regularization techniques and ablations, then you will know exactly where your contribution fits in.
B
Maybe another question on how to get more of this. The question is: do you think there will be open-source courseware that pedagogically teaches researchers how to structure projects to encapsulate all these reproducibility techniques, since this would be useful for people outside of ML too?
A
Yes, I actually looked into this, and I found one Coursera course available on reproducibility, but I felt that course was a bit limited in content, so it would be great if the community came up with a proper reproducibility course. As far as I know, my supervisor plans to run such a course at McGill at some point, so I will ask her again once she has the necessary setup. To mention another course, there was one that looked into effective machine learning training pipelines; that was a more undergraduate-level course, I think at MIT.
B
Thank you again, Koustuv. I think there are several other questions, but maybe they can be taken on Slack; if the attendees want to follow up with their questions on Slack, that would be great.

A
Sure, sure, yeah.

B
Thank you for this great talk and for the many resources; I'm looking forward to reading your blog post, and I think some of the attendees already said that they're reading it. Thanks to everyone for joining the lecture today. Next week will be a break, and then after that we will have a lecture on uncertainty quantification in deep learning, so I hope that you join us then. Until then, please be safe.