From YouTube: Week 1 - Introduction to PyTorch - Evann Courdier
Description
More about this lecture: https://dl4sci-school.lbl.gov/evann-courdier
Deep Learning for Science School: https://dl4sci-school.lbl.gov/agenda
All right, so Mustafa just introduced the school, so I'm going to introduce Evann and then we'll kick it off. We're very pleased to have Evann Courdier joining us today from EPFL, that is the Swiss Federal Institute of Technology in Lausanne, Switzerland. Evann works on fast image segmentation networks for drones; he got a master's degree in general engineering and another one in math and machine learning. Evann is going to talk to us today about PyTorch. Last year at the school we did TensorFlow, but, as is appropriate, everybody is really loving PyTorch nowadays. So with that, Evann, I'll hand it over to you. We do have a Q&A feature on Zoom; hopefully everything goes smoothly, but this is our first one, so if there are any sort of weird hiccups we'll try to make it better for the next webinars. Please use the Q&A feature to submit questions.
Good, so welcome everybody to this introduction to deep learning with PyTorch. I'm Evann Courdier; I'm doing a PhD in deep learning and I'm working with PyTorch every day, so I'll try to walk you through the basics of PyTorch. What should you expect from this webinar? At the end, you'll have an overview of the PyTorch ecosystem, I'll get you to know all the basic tools for deep learning that PyTorch provides, and you'll be able to read and understand PyTorch code and fairly quickly start coding.
The format of this webinar is one hour to one hour fifteen minutes, and since we only have one hour, I assume basic knowledge of Python and object-oriented programming, some gradient-based machine learning, and maybe some experience with a numerical computing library like NumPy, for example. I'll go back and forth between slides and notebooks, so that I can give some explanation on the slides and then move on to the notebooks for real-life examples. So, let's dive in: what is PyTorch? It's a scientific computing library that is written for Python.
It's Python-first and it works very much like NumPy, but you additionally get GPU support, automatic differentiation, the optimization algorithms that go with it, and all the necessary tools for deep learning that you might want to use, from the loss functions to the optimizers and so on. It comes from the original Torch library, brought to Python, and in PyTorch a whole deep learning pipeline exists.
Let me give you some pointers: you have some dataset, some training loop, you get your model trained, you can put it into production using a production server, and you can do your training distributed. What we'll focus on for this presentation is how to get your data available for your model to be trained, and how the training loop is built.
What I want to do is basically to see each bit of the training loop and find out how to do it in PyTorch. The skeleton of a deep learning program is: first, create your model, load your data, and initialize all the hyperparameters that you might need. Then you have the training loop, where you get some samples, move them onto the GPU if you have one, compute the model's prediction and the loss, and then compute the gradients of the loss with respect to the parameters of your model and update those parameters.
So this is the training loop, and I want to focus on how each of these small bits is done in PyTorch. That's how I will build it, bit by bit, and it actually all starts with samples, which are tensors in PyTorch. Let's quickly see what a tensor is: these are multi-dimensional arrays, very much like in NumPy, and they behave similarly to NumPy arrays.
The interface is similar for creation, indexing, masking. Here is an example of how the torch interface is similar to the NumPy one, and let's jump in to see how it works in real life, with creating tensors and so on. So here we go; I hope everybody can see the screen properly. For this first notebook I will go through quickly, because things are quite simple, and you may want to go back to it later.
If you want to find more information, there is documentation for it. To use the PyTorch library, we need to import torch, and then I create a tensor using the torch.tensor function. As you can see, my tensor here has two dimensions; this is something you can check using the dim function. You also have the shape attribute and the size function to check the size of the tensor along each given dimension.
It has size two by three. You can perform a lot of operations in PyTorch, and there are two ways to call an operation: either you can call, for example, torch.operation on a tensor, or you can directly call the operation on the tensor itself. With the example of the sum operation, I can simply call torch.sum(x), or x.sum().
The second way of writing it is quite useful because you can then chain the operators. For example, if I want to compute the norm of the tensor x, I can simply square the tensor, compute the sum, and then take the square root. Chaining operators like this is quite common in PyTorch, and of course there is also a norm function that allows you to do the same thing.
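Both calling styles and the chained norm computation look like this; the tensor values are only an example.

```python
import torch

x = torch.tensor([3.0, 4.0])

# Two equivalent ways to call an operation:
s1 = torch.sum(x)   # function style
s2 = x.sum()        # method style on the tensor

# Method style lets you chain operators, e.g. the Euclidean norm:
norm = x.pow(2).sum().sqrt()   # norm of [3, 4] is 5
print(s1, s2, norm)
print(torch.norm(x))           # the built-in norm gives the same result
```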
One thing you want to pay attention to in PyTorch is whether your operation takes place out of place or is mutating the tensor in place. For example, here I look at the add operation, which adds a value to the tensor. Let's create a tensor a that is an identity matrix; if I apply add to a, we get a tensor that has indeed had five added everywhere, but the tensor a itself hasn't been changed.
There is an in-place version of this add operation, add_ with a trailing underscore, and here you can see that a has indeed been changed. That's something you want to pay attention to, because in certain contexts in PyTorch you might need to know whether your tensor has been changed in place or has been mutated. On a similar note, I want to show the difference between the assignment operator and the addition assignment operator.
Here I have a tensor a and I add 1 to it with a = a + 1; this operation creates a new tensor, and therefore, if I run this cell, the two tensors will be different objects. However, if I use the addition assignment operator a += 1, I get the same tensor, because a is modified in place by this operator. So that's something you might encounter, and you might want to be aware that in these two cases you get different behavior.
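The out-of-place vs. in-place behavior just described can be demonstrated like this (values chosen for illustration):

```python
import torch

a = torch.eye(3)          # identity matrix

b = a.add(5)              # out-of-place: returns a NEW tensor, `a` is unchanged
assert not torch.equal(a, b)

a.add_(5)                 # in-place (trailing underscore): mutates `a`
assert torch.equal(a, b)

# Assignment vs. addition assignment:
c = torch.zeros(2)
d = c
c = c + 1                 # creates a new tensor; `d` still points at the old one
e = torch.zeros(2)
f = e
e += 1                    # modifies `e` in place; `f` sees the change too
print(d, f)
```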
Something you use very often in deep learning is reshaping your tensors. In PyTorch you do this using the view function. Let's create a tensor that has one dimension and a size of six; here I can choose to view my tensor as a 2 x 3 tensor. You can also ask PyTorch to infer the missing dimension by putting -1, which is something you see quite often in code.
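A short sketch of the reshaping just described, including the inferred -1 dimension:

```python
import torch

x = torch.arange(6)        # 1-D tensor of size 6
y = x.view(2, 3)           # reshape to 2 x 3
z = x.view(2, -1)          # -1 lets PyTorch infer the missing dimension (3)
print(y.shape, z.shape)    # both torch.Size([2, 3])
```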
Something else that you use very often is a GPU. In PyTorch you might want to check first whether CUDA is available on your machine, with torch.cuda.is_available(). Here on my personal computer I don't have CUDA available, but let's see on this other machine: I first need to import torch, and then I should have CUDA available on this machine.
Here it's true, so let's create a tensor and move it to the GPU. There are two ways in PyTorch to do so. One way is to call .cuda() on the tensor itself. The first time you do so it will take a little while, and then your tensor is on the GPU; you can check that the device is written there. You may want to bring it back to the CPU, and there is a .cpu() function to do that.
This first approach is used in many older codebases, so you might need to know it. However, the new way is more convenient, because you can define a device at the beginning of your code and then you don't need to worry about whether you're on CPU or GPU. You can define a device using the torch.device function.
There you specify whether you want to be on CPU or GPU, and which GPU number you want to use, and then you can use the .to() function to move your tensor x to the corresponding device. For example, here I can move my tensor x to the GPU using that instruction. That's quite useful, as I said, because you can put a single line at the beginning of your code that checks whether CUDA is available, and if not, it falls back to CPU.
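The device pattern the speaker recommends can be sketched as below; it runs the same way whether or not a GPU is present.

```python
import torch

# Define the device once at the top of your script; everything after this
# line is written the same way on CPU and GPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

x = torch.ones(3)
x = x.to(device)           # moves to GPU if one is available, otherwise stays on CPU
print(x.device)
```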
All right, the two last bits I want to cover about tensors are the type conversion and the conversion to NumPy. To convert the type of a tensor, you can use the same .to() function as for moving it to a different device. Here I've created a tensor y, and we can see that by default it will be a float32 tensor; if you want, for example, to convert it to float16, you can do so with this kind of line.
Also here, I have y, and I can call the .numpy() function on it to get a NumPy array.
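Both conversions just mentioned, sketched with an illustrative tensor:

```python
import torch

y = torch.rand(2, 2)               # float32 by default
print(y.dtype)                     # torch.float32

y16 = y.to(torch.float16)          # the same .to() function converts the dtype
print(y16.dtype)                   # torch.float16

arr = y.numpy()                    # NumPy array sharing memory with a CPU tensor
print(type(arr))
```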
Alright, so that's basically it for the basics on tensors; there are a lot of functions that you will discover as you are using it. What I propose is that we start building our training loop right now. There will be two main bits: initializing the different parameters, models and so on, and then the training itself.
Here we can already set the device: we import torch and we set the device that we want, and then in the training loop we can already move our samples and labels to the proper device. Now let's see how we can get these samples and labels from a dataset. Let me move back to the slides for a quick recap on tensors; that's what you probably want to remember from what I have just shown.
I'll let you go through it as well when the slides are provided. This is what we've just done: we have moved the samples and labels to the GPU; we have seen how to do so. Now let's see how we can actually get these samples and labels from a real dataset. That's a crucial aspect of deep learning: to have data. So let's assume here we have training data.
We have a dataset, with samples that we represent with these squares and different labels that are represented by the colors. This could look like this on your file system, for example: you could have two folders, one for the train split, the other one for the validation split, and in each of these folders one folder per class; here, for example, a human-versus-dog dataset. Now, to make the link between your file system and your model that is waiting for input,
PyTorch provides a Dataset class. It's really a representation of the dataset as an object that you can query with an index and that will return the corresponding sample from the dataset. The only things that the dataset has to implement are the two functions __len__ and __getitem__. __len__ is a function that returns the number of items in the dataset, and __getitem__ is a method that, given an index, returns the corresponding sample and its label.
For example, in practice you could have a dataset with five samples. Then, if you call the len function on your dataset, it should return the number of samples in the dataset, so here five; and if you index it, asking for index four with the indexing syntax, then it should return the fourth sample in your dataset, alongside its label, so here "human".
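A minimal custom Dataset in the spirit of the dummy example shown later in the notebook; the class name and the choice of random numbers as data are illustrative.

```python
import torch
from torch.utils.data import Dataset

class DummyDataset(Dataset):
    """A toy dataset: n random numbers, each labelled with its own index."""
    def __init__(self, n=10):
        self.data = torch.rand(n)

    def __len__(self):
        # number of items in the dataset
        return len(self.data)

    def __getitem__(self, index):
        # returns a (sample, label) tuple for the given index
        return self.data[index], index

ds = DummyDataset(n=5)
print(len(ds))       # 5
print(ds[4])         # (fifth random number, 4)
```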
So then you have this Dataset. But when you work with data, you often need to process batches of samples and to shuffle the data, and the Dataset class doesn't provide this. For this, you will need to use a DataLoader. The DataLoader is another class that PyTorch provides; it takes a Dataset object as input, and it will help you with iterating over this dataset.
You can use it to create batches, to shuffle the data, to sample the data in a particular fashion, and you can also launch multiple workers to make the data loading faster. So here you can see how you use it in practice: you will be querying a mini-batch from the data loader, which will be querying each index from the dataset, and it then returns mini-batches of data that it might have shuffled, if you have configured it to do so.
Here you can see the three functions that are needed for the dataset; I haven't said it, of course, but you inherit from the Dataset class. Once you have done that, your dataset is ready to be used, so now, to actually instantiate an object, you would just run that line. Here you instantiate an object, an instance, and then you can already check what the data attribute contains. Here you can see that it has created ten random numbers, and we can check, for example, the first one.
The first one should be 0.88, and indeed it is, with the label 0; as you can see, it returns a tuple. You can check, for example, that if you query the fourth one, you should get the number that has the label four, so our dataset is working properly. Now that we have seen how a simple dataset works, we can already use it within the DataLoader.
For the DataLoader, there is the class that PyTorch provides called DataLoader, which you can load from torch.utils.data. Creating a DataLoader is very simple: you just need to provide the dataset you want to iterate on, then the batch size that you wish, and whether you want to shuffle it or not. Here I said that I wanted shuffling and a batch size of five, and then you can simply iterate through this data loader using a simple for loop: for sample, label in loader.
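The DataLoader usage just described, sketched with a stand-in TensorDataset so the example is self-contained:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# A stand-in dataset of 20 (sample, label) pairs.
ds = TensorDataset(torch.rand(20, 3), torch.arange(20))

# batch_size and shuffle are the two options discussed in the talk.
loader = DataLoader(ds, batch_size=5, shuffle=True)

for samples, labels in loader:
    print(samples.shape, labels.shape)   # torch.Size([5, 3]) torch.Size([5])
```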
Let's try this with the Alien vs. Predator dataset. This is a dataset that you can find on Kaggle, and it is a classification dataset between images of aliens and predators. On my repository you can find this dataset, with a train and a validation folder, inside which you have the two classes, alien and predator; in each of these folders you have multiple images that look like this.
Here is a predator, here is an alien, and now let's create a dataset class to load the data that is actually inside my folders. Here I have created this AlienPredatorDataset. As you can see, the structure is the same as the dummy dataset before: we have an __init__ function loading the data, a __len__ function returning the length, and a __getitem__ function giving a sample when you provide an index. In this case, you don't want to load all the images into memory.
Instead I store the list of file paths, so __len__ returns the number of paths I have in my dataset, and then __getitem__, for a given index, will return the image and the target. First, from the index, I can retrieve the path and the target label that I have defined before; I can open the image, and then I simply need to return it alongside the target. So here I get the path from the array, I open the image, and I return the image alongside the target.
Once this is done, let's run this cell. I can create a dataset, providing the root and the split, so the train split here, and I can already check the length of my dataset: len(dataset) will call the __len__ function, and here I can see I have 694 training samples. Let's run this and get the first sample. Here, as you can see, we get a tuple: a PIL image alongside the label 0; and here you get an image with the label 1.
Now let's see how to instantiate a simple transformation. You can instantiate a random crop, given the size of the crop here, and the crop transform will be a function that you can apply on an image. So let's take the first image in our dataset, and then I can simply apply this crop transform to our image.
Here we go: we get a random crop of our image, and each time I run this cell I get a different random crop. Now what we actually want is to crop our image and then convert it to a tensor, so I want to perform multiple transforms, and to combine multiple transforms,
we can use the Compose class from torchvision, whose role is to compose different transformations. So here I do the random crop and then I convert to a tensor, and if I apply this combined transform function to my image, I can see that, indeed, I get a tensor now. And if I check the shape of this tensor using the .shape attribute, you can see that I get a tensor of the crop size I wanted.
Here you go, and once that's done, you can check: I load the dataset again, but now, when I query for the first sample, I get a tensor rather than an image, with the corresponding class label. With that being said, we have a dataset that is ready to be used with our model: a dataset that provides tensors and that has a fixed size.
Yeah, a tensor whose first dimension has size five; the first dimension is the batch size. Here you can see I have asked for a batch size of five, and then you can see we have images, so three channels by the crop height and width. Alongside these samples I print the labels, and the labels here are the five labels corresponding to the images.
What is interesting is that the data loading code doesn't change depending on the dataset. We have built an alien-versus-predator dataset, but doing image classification is a pretty common task, and therefore in torchvision there is also a dataset class that automatically handles loading data from this specific folder structure. So here, if I provide the root of my dataset to the class ImageFolder, which is a class that exists in torchvision,
then it will automatically find the different classes, and basically it will do all the job that we have been doing in our dataset class. So let's run this ImageFolder: we provide the root and the transform that we want it to use, and we can verify that we indeed get a proper sample from our dataset. That's probably the main thing that you want to know for datasets.
We have seen how to build a dataset and how to use the DataLoader, and in the case of images you can use the transformations from torchvision. So now we can incorporate these bits inside a training loop. Let's take on the attempt to classify this alien-versus-predator data. Here I have added three lines to import the Compose transformation from torchvision, as well as the DataLoader and the ImageFolder; I have added some transformations, I load the dataset, and then I create the data loader.
[Answering a question about data augmentation] Alongside moving the image, you can also crop it, as you've seen, and you can do all kinds of cropping, rotation and so on. Of course you lose information when you crop the image, but usually the model will see that image multiple times with multiple crops, so that's not an issue. And of course, torchvision provides a lot of functions to augment the image, to rescale it and so on.
Let's move on then. All right, so now that we have seen how to get batches of samples and labels from the dataset and move them to the GPU, let's see how you can compute the model's prediction. First, how can you actually create a model and compute its prediction? PyTorch provides what we call modules, which are reusable model components that help you with building your networks.
PyTorch already provides a lot of built-in modules. Let's see how modules help to manage the parameters. A module class actually has to keep track of all the parameters in your model: you may have a lot of convolutional layers, linear layers, batch normalization and so on, each with parameters, and the module will actually be aware of these parameters. This allows you to quickly load and save the model, and it allows you to reset all the gradients of the parameters that we will be computing.
As I said, PyTorch provides a lot of predefined modules; there is the torch.nn submodule of PyTorch, which is a whole library dedicated to neural networks. They have convolutional layers, linear layers, activation functions, loss functions and so on.
Really, it's full of functions that you might need. Then, when you want to create your own modules, you can inherit from the torch.nn.Module class, and then you benefit from all the advantages that I talked about when I talked about managing parameters. All the modules from torch.nn, like convolutions, linear layers and so on, also inherit from this class, and when you want to inherit from torch.nn.Module, you need to implement two functions.
You need to implement the __init__ function, like the usual __init__ when you create a class, but in PyTorch specifically this is where you define all the sub-components of your network, so it can become aware of the parameters, because you define them in the __init__. Then, once you have defined what components your model has, you define in the forward function how all these components, the ones you have defined in the __init__, will be connected together.
That's the basics on modules; let's see how to use and create custom modules through some typical notebook examples. First, we will need to import torch, along with torch.nn, which we import as nn. Then let's see how you can instantiate a simple module that exists in the torch.nn library. Here I want to create a fully connected layer, which is called Linear in PyTorch.
It's actually simply a matrix multiplication plus a bias. To create this I need to provide the number of features in and out, so here five features in and two features out of this fully connected layer; that's why I call it here a linear regression model. Like this I have created a linear layer, and therefore my model has a weight and a bias parameter.
When you use a tensor as a parameter in the model, the model will be aware that this tensor is a parameter, because it has this Parameter class. For a module, you can get all the parameters: here I'm using the named_parameters function on the module and I iterate through them; it returns tuples with names and tensors, so here I print the name and the tensor's shape. You can see that in the linear regression model there is a weight of shape 2 x 5 and a bias of size 2.
B
It
is
just
parameters
that
returns
a
generator
of
for
all
the
parameters,
in
fact
the
model.
So
here,
if
I,
if
I
use,
if
I
use
this
function,
I
can
get
the
two
parameters,
weight
and
bias
that
that
that
is
in
my
daenerys
mother.
So
then,
how
can
we
use
this
model
so
a
motive
in
Python?
They
will
work
on
batches,
which
means
that
if
you
have
a
sample
of
feature
size
5,
for
example,
you
will
always
need
to
have
as
first
dimension
the
batch,
the
batch
size
and
then
a
second
dimension.
your feature size. So here I'm creating dummy data with random numbers: what I have is a tensor x with a batch size of 3 and then 5 values per sample. With this input of batch size three and feature size five, I can just call my model directly on the sample. Note that you don't call forward explicitly; that's quite important.
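Calling the model directly on a batch, as the speaker does, can be sketched as:

```python
import torch
import torch.nn as nn

model = nn.Linear(5, 2)

x = torch.rand(3, 5)      # batch size 3, feature size 5
out = model(x)            # call the model directly; do NOT call model.forward(x)
print(out.shape)          # torch.Size([3, 2])
```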
We can see that we get a batch size of three for the output, and then a feature size, if you wish, of the output of size two, which is the output size of the linear model that we have defined. All right, so here we have seen a bit how to use a predefined module in PyTorch, our linear regression model, and now let's see how you can actually build your own module.
As I said before, you need to inherit from nn.Module, and then here, in the __init__, I will just initialize the different modules from the torch.nn library that I want to use. Here I want to use two fully connected layers in a row, so I just define them here, and then in the forward function you can see that I am calling them successively.
First I call the linear1, I apply a ReLU function, then I call the linear2 on the output, and then I return this output. Just like this, my neural network module is ready to be used. You can note that I have used F.relu here, and you might wonder where this comes from: it comes from torch.nn.functional.
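The custom module just walked through can be sketched as follows; the class name and layer sizes are illustrative, not the speaker's exact values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoLayerNet(nn.Module):
    """Two fully connected layers with a ReLU in between."""
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        # Sub-modules defined in __init__ are registered, so the module
        # is aware of all their parameters.
        self.linear1 = nn.Linear(input_size, hidden_size)
        self.linear2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # forward() wires the components together.
        out = self.linear1(x)
        out = F.relu(out)       # functional counterpart of nn.ReLU
        return self.linear2(out)

model = TwoLayerNet(input_size=4, hidden_size=8, num_classes=2)
out = model(torch.rand(5, 4))   # batch of 5 samples
print(out.shape)                # torch.Size([5, 2])
```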
As you can see, it's kind of a mirror of nn: all the modules that you can find in nn, like linear, ReLU, convolutions and so on, you can also find as functions inside torch.nn.functional, which we use sometimes. So here I used the relu function from torch.nn.functional. All right, so now our network is ready to be instantiated; let's create one instance of our model. Here I just call the model with the parameters that I have specified:
the input size of my first linear layer, the hidden size, and the number of classes in the output of my second linear layer. And PyTorch provides a handy way of displaying a model: when you print it, you can see all the modules that are registered inside your network; so here I can see we have two linear layers.
However, the order here does not necessarily match the order in which you will apply them in the forward; it's just about what parameters are inside your network. And then, just as before, when you want to call your model on an input, you apply your model on the input x directly; note that you don't call forward, you directly call the model on the input x. Here I create an input x with batch size five and with input dimensions that match the input dimensions of my first linear layer.
When I run this, I get an output of size five by two, so the batch size and the output size of my linear layer. What I have done here is actually building a sequential model: it runs the different modules sequentially, and that's something you can do directly, because it is quite common. There is a class called torch.nn.Sequential, which will apply the modules sequentially; so here we apply three modules sequentially: a linear module, then a ReLU module,
then another linear module; and this will act very much exactly like what our custom module is doing, applying a linear, then a ReLU, then another linear module. This allows you to define a network much more quickly. This one is actually a Sequential, and then we run it the same way as you would have run the previous one; so here I get the same result.
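The same two-layer network, rewritten with nn.Sequential as the speaker shows (layer sizes again illustrative):

```python
import torch
import torch.nn as nn

# Equivalent to the custom module: linear -> ReLU -> linear, applied in order.
model = nn.Sequential(
    nn.Linear(4, 8),
    nn.ReLU(),
    nn.Linear(8, 2),
)

out = model(torch.rand(5, 4))
print(out.shape)        # torch.Size([5, 2])
```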
Batch size five, output size two. Right, then models can be moved to GPU, and you don't need to move each parameter individually: you can simply call .cuda() on your model to move all the parameters to GPU directly. Similarly to what I've said just before, .cuda() is the older way to do so, and the newer way is to call .to(device) on your model: assuming you have defined your device before, this will move your model to your device.
And probably the last thing we want to check on modules is how to store and load them. After you have trained your models, you might want to be able to save them and load them later. There are two ways to do so, and I'll go through both, because there is one that is easier but not necessarily always safe. Let me show you.
This first option, torch.save on the model, will actually pickle the model: it will use the pickle package from Python to save the model, and this means that what you save will be bound to the folder structure that you had. So if your model is defined in a specific file, and later, when you want to load your model, that file has another name, for example, you won't be able to load it. This might be a problem, and so there is a way to save, rather than the whole model,
just the values of the parameters, and that's what you can do with the state dict. The model has a state_dict function that will return all the parameters of this model. That's why it's actually useful to have the model aware of its parameters: it's able to return all of its parameters at once in the state dict. Then you can just save this state dict; the state dict is really a dictionary mapping the name of each parameter to the corresponding weights.
If I save my model that way, I'm also able to then load it, but this time I need to use a function called load_state_dict. I need to create my model first, which I didn't need to do before with the other way: I create my model first, and then I can use the load_state_dict function on my model to load the state dict that I have saved.
And here you can check that, indeed, I have the same values for the two models. The difference, and it's important to know this a bit, is that the first way is much simpler, but you need to make sure that your class names and the organization of your folders won't change; otherwise, use the state dict, which, rather than saving the model itself, simply saves a Python dictionary with all the parameters of your model. One last thing we may want to cover now is how to compute a loss with PyTorch.
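The state-dict save/load round trip just described can be sketched as below; the temporary file path and the single-layer model are placeholders for a real checkpoint.

```python
import os
import tempfile
import torch
import torch.nn as nn

model = nn.Linear(5, 2)
path = os.path.join(tempfile.mkdtemp(), "model.pt")

# Save only the parameter values: a dict mapping parameter names to tensors.
torch.save(model.state_dict(), path)

# To load, create the model FIRST, then load the values into it.
model2 = nn.Linear(5, 2)
model2.load_state_dict(torch.load(path))

# The two models now hold the same values.
print(torch.equal(model.weight, model2.weight))
```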
PyTorch already comes with a lot of predefined losses: we have the L1 loss, cross-entropy and so on, and they all exist inside the torch.nn package. So if we want to create, for example, an L1 loss function, you can just instantiate it this way, and then, to compute the loss between two tensors, we can just pass them as arguments to our loss function and get the value of the loss; so here the L1 loss between the two tensors.
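Instantiating and applying a predefined loss, as just described; the tensor values are illustrative.

```python
import torch
import torch.nn as nn

l1_loss = nn.L1Loss()                   # mean absolute error

pred   = torch.tensor([1.0, 2.0, 3.0])
target = torch.tensor([1.0, 0.0, 3.0])

loss = l1_loss(pred, target)            # mean of |pred - target| = 2/3
print(loss)
```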
So that's it for PyTorch modules, creating your own modules and using the ones from PyTorch. Now we can see how we can add this to our training loop. What we can do is simply instantiate a model that way, with a Sequential for example, and then move it to the proper device.
We also want to instantiate the loss function. Before, I had the L1 loss; here I want to use the cross-entropy loss that PyTorch provides, which is more suited for classification problems. Then I can simply, in my training loop, add two lines: the first applies my model on the samples, so we get predictions, and then we compute the loss between the predictions and the correct labels. That's what we do on these lines, and we get the loss.
B
We have covered most of the steps. What is next is to see how to compute the gradients and update the parameters with gradient descent. That's probably the most important step that we're going to see right now. Let's cover the gradient computation and then maybe have a small break for your questions.
So that's what we have done: computing the model predictions and computing the loss adds these two lines to the training loop.
B
Now, let's see a bit how you can compute gradients in PyTorch. PyTorch has a package called autograd which will allow you to compute the gradients of the different parameters. PyTorch is what we call a define-by-run framework, and this means that the backpropagation is defined by how the code is run, so every single run of the code can be different. That's what we will use: autograd, which is the automatic differentiation package of PyTorch. So, how does that work?
B
When you finish the computation, you can call backward on the tensor to directly compute all the gradients automatically. What you will usually have is a succession of computations that lead to a loss; we call loss.backward() to compute the gradients of the loss with respect to each parameter. So, before jumping into the notebook,
B
let's look at a quick example. Oh, one thing I forgot is that the gradients are accumulated into the tensor's .grad attribute. Note here that when you call backward, the gradient that is computed will be added to the previous gradient that was inside the tensor's .grad. It's not stored, it's accumulated, which means that if something is already there, you will add to the previous gradient. We will see a bit later why we do this.
B
It happens that way, so, as I said, here's a quick example of how that could work before coming to the notebooks. Here we have the computation graph for the previous example: we have an input x that we multiply by a weight matrix W, we add a bias b, we subtract the value of the label y, then we square it and we get a loss. So it's like a little linear model.
B
What you can note here is that x doesn't require grad, because we don't need to have the gradient of the loss with respect to x: we don't want to update x. However, we do want to update the weights W and b, and therefore we will set their requires_grad to True. Initially the gradients are set to None, and then what we want are the gradients of the loss with respect to each parameter, and so we will call the backward function to get that.
B
B
Usually this accumulation is used to provide some more flexibility when you work with gradients of complex models, but it won't be useful to us today, and it means that, after we have used the gradients to update our parameters, we need to set them back to zero. That's something you have to remember to do in PyTorch: set your gradients to zero after you have used them, so that on the next backward you will get fresh gradients.
B
Okay, great. So how does autograd work? Let's see it in practice. As I said, first you need to set the requires_grad property on your tensor. When you create a tensor, by default requires_grad will be False, as you can see. Then, if you want to change this, you can use the requires_grad_ function, with an underscore at the end, meaning in-place, which allows you to change the value of the requires_grad attribute. And here you can check that, indeed, we get a tensor x with requires_grad equal to True.
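In code, that check looks like this (a minimal sketch):

```python
import torch

x = torch.ones(2, 2)
print(x.requires_grad)     # False: new tensors don't track history by default

x.requires_grad_(True)     # trailing underscore = in-place change
print(x.requires_grad)     # True: autograd will now track operations on x
```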
B
So once that's done, you can see how autograd will actually track the operations. Let's do a simple addition: y = x + 2. Here you can see that our y tensor also has the requires_grad attribute equal to True, and that it has a gradient function. This grad_fn is actually a pointer for autograd to know which kind of operation it has to perform to compute the derivatives. We won't dive into this.
B
But it's interesting to see that, through these grad functions, autograd is building a graph of the computation, and that's what will allow autograd to compute the derivatives later on. So if we continue and do more computation on y, we can see that z will also require grad, and it will also have a gradient function, which is a different one.
B
So here it's the backward function defined for the mean, for example. And once we've seen that, we can reproduce the example that we had in the slides: a little linear prediction model. So we have an input x and a target y that I define here, and then a weight matrix W and a bias b.
B
I will make W and b require gradients so that we can update them, because these are the parameters that we want to update. You can check here our two tensors, and then we can just run the line to compute the loss. You can see that the loss is also a tensor that requires grad and that has a gradient function, so it knows what kind of operations were used to create it.
B
And now, if we check the grad attribute of W and b, which is where the gradient is stored, you can see that at the beginning they have no gradient: the grad attributes are set to None. Then, if we call backward on the loss tensor, it will compute the derivatives with respect to W and b. And if we check the gradients now, you can see that the grad attribute of W and b has been populated with some data; however, x and y, which don't require gradient, still have none.
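The same experiment can be sketched in a few lines (the shapes and values here are made up):

```python
import torch

x = torch.tensor([0.5, 1.0, -1.0])         # input: no gradient needed
y = torch.tensor(2.0)                      # target: no gradient needed
W = torch.randn(3, requires_grad=True)     # parameters we want to update
b = torch.tensor(0.0, requires_grad=True)

loss = ((W @ x + b) - y) ** 2              # squared error, as in the slide
assert W.grad is None and b.grad is None   # nothing populated yet

loss.backward()                            # autograd fills in .grad
assert W.grad is not None and b.grad is not None
assert x.grad is None                      # x never required grad
```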
Let's just confirm what I've said about gradients accumulating. So if I run this line again, computing the loss again with the same values, and then I call backward again, we can check that the gradients of W and b are now actually twice as much as before. So here you can see that the gradients accumulate, and if you don't want this to happen, you'll have to set the grad to zero between two backwards.
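A minimal demonstration of that accumulation behavior:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
x = torch.tensor([3.0, 4.0])

(w * x).sum().backward()
first = w.grad.clone()                     # d(sum(w*x))/dw = x = [3., 4.]

(w * x).sum().backward()                   # second backward, same values
assert torch.allclose(w.grad, 2 * first)   # gradients were added, not replaced

w.grad.zero_()                             # reset between backwards if unwanted
```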
B
So we have seen how to compute the gradients for our W and b tensors. How does it actually work for the model parameters, if we have a model? Here we have a quick example of how this could work. So let's create a neural net: here I create a simple sequential net.
A
A
B
I also create a loss function, so that I can compute the loss and then call backward. Here, if I ask for the 0 index of the neural net, it will actually return the first module that is inside the Sequential. So when running this cell, I actually get the first Linear layer, which has in_features 5 and out_features 10, and then I can check the weight of this module. We know that it's a Linear module,
B
so it has a weight. You can check this, and we can indeed see that it has shape 10 by 5. You can see again that it's a Parameter and that requires_grad is True. This is something we might not have paid attention to before, in the previous notebook, but that's a feature of the Parameter class: the parameter is registered by the model as being a parameter, and it automatically has requires_grad set to True.
B
Then, let's try to compute the loss from a prediction and some targets, and see how the gradients populate inside our neural net. So here I created some dummy data, x and y, two tensors. First you see there is the batch size, which is here and here, and then the input size of x is 5.
B
I can run my neural net, calling it on x, get some predictions, and then compute the loss between the predictions and the values we want, y, here. And I get a loss which has a gradient function: we have tracked the operations that happened to get to that loss. And then we can check the gradient attribute of the weight of our neural net.
B
Then, as I said, you need to zero the gradients before you actually do another prediction and update. To do that, modules have a specific function called zero_grad that you can call directly on the model, so here on the neural net, and this will zero the gradients of all the parameters. If we run that, we can check that afterward the gradient of our weight tensor is set to zero. That's something that you would need to do between each time
B
you see a sample in your training, during the epochs. One last thing that is useful to know is how to stop this history tracking from autograd. All your parameters have requires_grad True by default, and during inference, when you are not training, you don't want to update your parameters; you simply want to get some predictions, so you don't want to create a computation graph. You can do that with the no_grad context.
B
So if you write `with torch.no_grad():`, no computation graph will be built. If I run this here, you can see that the operation y = x to the power of two creates a tensor y that requires gradient. However, if you run the same operation inside the torch.no_grad context, the tensor that you create here doesn't require gradient, and you can see that the grad and the grad_fn of it are both None.
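The same check as in the notebook, sketched:

```python
import torch

x = torch.ones(3, requires_grad=True)

y = x ** 2                        # tracked: y has a grad_fn
assert y.requires_grad and y.grad_fn is not None

with torch.no_grad():             # inference: no graph gets built
    z = x ** 2
assert not z.requires_grad and z.grad_fn is None
```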
B
So those are the basic contents of autograd. You might want to go through it again to make sure you have understood everything. For now, let's just see how we can add it to our training loop. From what we had before, the only thing that we will actually add will not be inside the initialization part; it will be inside the training loop itself. So we add these two bits, with the loss backward function.
B
So after having computed the loss, we will compute the gradient with respect to each parameter of the model, and that's with loss.backward(). Then we will have to update the model parameters, and finally we will need to set the gradients to zero, as I have said, otherwise they would accumulate forever. Note that the exact place doesn't matter: I have put this line here, but you can also put it at the beginning of the training loop.
B
The only place where you don't want to put it is between the computation of the gradients and the update of the model with those gradients. So you don't want to put your zero_grad right here; otherwise, you will never update your model parameters, because your gradients will always be zero. Apart from that, this zero_grad call can go anywhere.
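Putting the three steps together, here is a minimal sketch with dummy data; the model, sizes, and learning rate are made up for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 2)                 # hypothetical tiny model
criterion = nn.CrossEntropyLoss()
lr = 0.1

x = torch.randn(8, 4)                   # dummy batch
y = torch.randint(0, 2, (8,))

start = criterion(model(x), y).item()
for _ in range(20):
    loss = criterion(model(x), y)
    loss.backward()                     # 1. compute the gradients
    with torch.no_grad():               # 2. gradient-descent update
        for p in model.parameters():
            p -= lr * p.grad
    model.zero_grad()                   # 3. reset grads (never between 1 and 2)
end = criterion(model(x), y).item()
```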
B
So that's it for our new training loop. Let's move back to the slides and see what we have added. From our training loop, we are nearly done: we have computed the gradients, and now we have this new step of zeroing the gradients that we have just seen in the demo. What remains is simply to now update the model parameters, and so to optimize the model for the task.
A
B
B
A
B
B
We will now see how to update the parameters using optimizers. So, how to optimize parameters in PyTorch? PyTorch has a torch.optim module where we can find lots of different optimizers. An optimizer that you construct will take as input a list of parameters that you want to optimize, and then, when you want to actually do an optimization step, you can call the optimizer's step function to update the parameters.
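A minimal sketch of that construct-then-step pattern (the model and learning rate are made up):

```python
import torch
import torch.nn as nn
from torch import optim

torch.manual_seed(0)
model = nn.Linear(3, 1)                        # hypothetical model
opt = optim.SGD(model.parameters(), lr=0.01)   # give it the params to optimize

before = model.bias.detach().clone()
loss = ((model(torch.randn(5, 3)) - 1.0) ** 2).mean()
loss.backward()       # fills .grad on every parameter passed to the optimizer
opt.step()            # uses those .grad values to update the parameters
opt.zero_grad()       # zeroes the grads of the same parameters
```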
B
So all the parameters that are passed to the optimizer will be retained inside the optimizer object, and that way the optimizer can access their grad attribute and update their values when you call optimizer.step(). Finally, we know that we have to zero the gradients; you could also use optimizer.zero_grad(). This function does the same thing for the parameters that were passed to the optimizer. So, maybe to make it a bit more concrete:
B
here we go. First, we want to import the optim package, so from torch I import optim here. This command allows me to check all the different optimizers that are provided by PyTorch: you have Adagrad, Adadelta, SGD, Adam. These are very common optimizers that are already implemented, and that you can then use out of the box.
B
As I said, you always need to provide the parameters when you instantiate one. So here I'm going to use the SGD optimizer; this is in the torch.optim submodule. To create this optimizer I need to provide the parameters, and here I can use the parameters() function that we have seen in the modules notebook, which allows us to directly get all the parameters of our model.
B
So let's see how it's used in practice. I define a loss function here, and dummy input and output: x, which is an input with batch size 15 and feature size 10, and a random label y. I compute the prediction with my neural net, so I can just run it on x and get predictions; I compute the loss corresponding to these predictions and the labels, and then I compute the gradients with backward, as we have seen just before.
B
We know that all the gradients of the parameters of the model have been computed here. Let's check what the value of the bias of the first linear layer in my neural net is. As you can see here, we have a bias with five values, and now optimizer.step() will use the gradient of this bias, that is, the gradient of the loss with respect to this bias, to update these parameter values.
B
So if we do optimizer.step() now, when we print the new value of the bias, we can see that we have slightly different values, because the optimizer has changed the value of the bias using the gradient. So that's it: you just have to create the optimizer with the proper parameters that you want to optimize, and then, after having computed the gradients, you simply need to call optimizer.step() to request a parameter update from the optimizer.
B
One thing you usually also want to do with optimizers during your training is to decay the learning rate. You might not want to keep the same learning rate all the time, and to do so, you have learning rate schedulers. In PyTorch there are a couple of learning rate schedulers that already exist and that allow you to change the learning rate during training.
B
B
So here, what you can see is that I print the learning rate of my optimizer, which is 0.1. Then I do a step of the scheduler, which is supposed to decay my learning rate by multiplying it by 0.8, and here I print the learning rate after the decay. You can see that, indeed, the scheduler has decreased the learning rate from 0.1 to 0.08. So that's how you could use a scheduler inside your training.
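That multiply-by-0.8 decay matches what `ExponentialLR` does, multiplying the learning rate by `gamma` at each scheduler step (a sketch; the dummy parameter is only there so the optimizer has something to hold):

```python
import torch
from torch import optim

params = [torch.zeros(1, requires_grad=True)]   # dummy parameter
opt = optim.SGD(params, lr=0.1)
sched = optim.lr_scheduler.ExponentialLR(opt, gamma=0.8)

print(opt.param_groups[0]["lr"])   # 0.1
opt.step()                         # normally: one (or more) training steps
sched.step()                       # then decay the learning rate
print(opt.param_groups[0]["lr"])   # 0.08
```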
B
It's not used all the time, but they are available in the package. Once you have seen that, we can just add this optimizer to our training loop to finally finish and have a full training loop. In the initialization part, what we need is to add this line, which instantiates a new optimizer here from the optim package, an SGD optimizer. I pass to the optimizer the model parameters, which are the parameters I want to optimize, and a given learning rate. And inside the training loop,
B
I added this line, which will use the gradients that autograd has computed with loss.backward(), and the optimizer will use these gradients to update the parameters. So you need to call the optimizer step after having computed the gradients with backward. After having used the gradients, you can discard them by setting them to zero, as I have said; rather than using model.zero_grad(), you could also use optimizer.zero_grad() here, and it does the exact same thing. So here we are; we are good.
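The full loop with an optimizer then looks like this (the model, sizes, and learning rate are again made up for illustration):

```python
import torch
import torch.nn as nn
from torch import optim

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)   # initialization part

x = torch.randn(16, 4)              # dummy data standing in for a loader
y = torch.randint(0, 2, (16,))

start = criterion(model(x), y).item()
for _ in range(30):                 # the training loop itself
    loss = criterion(model(x), y)
    loss.backward()                 # autograd computes the gradients
    optimizer.step()                # optimizer updates the parameters
    optimizer.zero_grad()           # same effect as model.zero_grad() here
end = criterion(model(x), y).item()
```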
B
B
So I think I don't have much time left, but I will try to quickly go through a complete example of how to build and train a full model on a task, the MNIST digit-classification task. Let me go through this quickly inside the Jupyter notebooks, and you can probably ask more questions later on Slack or after the webinar. So for this example, I have a couple of imports that you now know, and the network we want to build is LeNet.
B
So here is a picture, which is not very clear, of the network we want to build, and the details are here. What we want to have is a couple of convolution, ReLU, max-pooling layers: a convolution, a ReLU, max pooling, another convolution, ReLU, max pooling, and then a convolution with given input and output channels. And then, once I have decreased the size of the image using these convolutional layers, I can use fully connected layers to do the final classification.
B
So for such a network, you could write a custom module this way. Here I still inherit from nn.Module, and then I choose to create two Sequential models inside __init__. The first Sequential will be the convolutional part and the second one will be the fully connected part. So here you can see I used different modules from the nn package: Conv2d, ReLU, MaxPool2d.
B
These are modules that you can find in the nn package, depending on what you need; you can check them in the PyTorch docs. And then here I use Linear layers to form the fully connected part. Once I have defined the network in __init__, in forward I simply need to tell which is the way I want to combine them. So first I apply, on my input images,
B
the convolutional part, and I get some output. Then this line, if you check it, will flatten the output: it will actually take the tensor, which is usually four-dimensional, with the batch, the channel, the height and the width, and convert it to a tensor that just has a batch dimension and a feature dimension. So this is the flattening part.
B
And then you apply the fully connected part on the output of this flattening. So you apply the convolutional net, you flatten the output, and then you apply the fully connected part to get the final predictions. So here it is: with our __init__ and our forward, this is enough to define this convolutional network. So we run this cell, and then we can continue.
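Such a module could be sketched like this; the channel and unit counts follow the classic LeNet layout and are assumptions, not the exact notebook code:

```python
import torch
import torch.nn as nn

class LeNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Convolutional part: conv -> ReLU -> pool, twice.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
        )
        # Fully connected part doing the final 10-way classification.
        self.fc = nn.Sequential(
            nn.Linear(16 * 5 * 5, 120), nn.ReLU(),
            nn.Linear(120, 84), nn.ReLU(),
            nn.Linear(84, 10),
        )

    def forward(self, x):
        out = self.conv(x)               # convolutional part
        out = out.view(out.size(0), -1)  # flatten: keep batch dim only
        return self.fc(out)              # fully connected classifier

net = LeNet()
scores = net(torch.randn(2, 1, 32, 32))  # batch of 2 fake 32x32 images
```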
B
We can instantiate the network. So here I create the net with this instantiation, and I use the print function from PyTorch to be able to check that I have defined the proper network. Here you can see it prints all the operations that I want to perform; however, it's not necessarily the order in which I'm going to perform them, it's just what parameters and modules are inside my network. So let's check here all the parameters that are actually inside my model and that I want to optimize.
B
You can see that we have a couple of biases and weights for the convolutional layers, and a couple of biases and weights as well for the fully connected layers. These are all the parameters that I want to optimize during my training. We can also check, if you want, what would happen if we feed that network a random input: we take an input that has a specific size, so batch size, channel, height and width, pass it as input to our convolutional network, and we can see the output.
B
B
We need to load the data and then create our training functions. So, loading the data is now something that we know how to do: here we create two datasets, loading MNIST in two different splits. Here we load the train split, and here we load the test split of MNIST. This is a dataset that is already available in PyTorch, through torchvision.
B
So you don't have to create your own class to deal with this data; it will even download the dataset for you if you wish. We define a couple of transformations: we resize the images to 32 by 32, and then we convert them to tensors. We provide these transformations to our dataset, and then we create two loaders, one for the training set and another one for the test set. This is done simply as we have seen before.
B
B
Next, the function that will actually train our network. So here I have chosen to actually write a train function, and not just a training loop, so that I can call this function multiple times with different numbers of epochs. In this function, first I want to create the optimizer I want to use: in this case I chose Adam, to which I give the parameters and the learning rate.
B
The model has two modes, train and eval, and there is a slight difference between the two. In train mode, when you are using batch normalization or dropout layers, the dropout layers will actually drop certain connections inside your network, which is something you don't want to happen when you are using your network for inference. So, when you don't want to have the dropout active in inference, you want to set your model to eval mode.
B
However, when you train your network, you want your dropout to be active, and so you will set your model to train mode. Batch norm has a similar behavior: it will behave differently depending on whether you are in eval or train mode. So make sure, when you're training, to use the train mode, and then, when you're not training, to use the eval mode. And so here we have the whole training loop.
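The switch itself is one call on the model; a small sketch using dropout to show the difference:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 10), nn.Dropout(p=0.5))

model.train()                  # training mode: dropout zeroes random units
model.eval()                   # eval mode: dropout becomes the identity
assert not model.training      # the flag propagates to all submodules

x = torch.ones(1, 10)
with torch.no_grad():
    a, b = model(x), model(x)
assert torch.equal(a, b)       # eval mode is deterministic
```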
B
So if I run these two cells, I then have my train function and my accuracy helper defined. And here I will first define the device I want to use: in this case, on this laptop, I don't have a GPU, so we use the CPU. Then I move my convnet to the proper device, I feed all of that to my train function, and the training runs.
B
B
Let's just wait for our final accuracy here, which is actually quite impressive: we can see that we can actually learn the task and recognize MNIST images with just a simple convolutional network, and already get 96% accuracy. So that's something that is actually quite simple and easy to do in PyTorch.
B
So, to wrap up, we have seen a couple of things, but there is much more you can discover. There is torchvision, where you can find pretrained networks and datasets; there is a way to move to production with high-performance TorchScript. There are a lot more things that you can discover in PyTorch. And maybe, as a last word:
B
this is a nice quote from Andrej Karpathy; basically, trusting PyTorch makes your life better. I encourage you to check the GitHub repository that Steve already provided in the chat, where we have the notebooks I have presented during this webinar, along with detailed notebooks for offline study, with lots of comments, and there are a couple of other resources that you will want to check. All the great pictures I have taken from the deep learning book; this is an e-book.
B
A
A
A
Okay, so I think some folks have been dropping off already, but let me just say: apologies if we didn't get to anybody's questions; please feel free to drop them on Slack and we'll get to them. But maybe we can just get to a couple right now, if you're still interested in answering. So one of them, which I think is maybe worth touching on, and I don't know exactly what your level of familiarity is with other frameworks:
A
B
It's a polemic question, I guess. I haven't really used much TensorFlow stuff myself, so I would probably be biased toward PyTorch, but what I know is that, for now, the production part of TensorFlow is more mature. So if you want to put a model into production, it would be much easier using TensorFlow.
B
However, when you want to quickly tweak a network and test a couple of things, you can easily debug, you can easily look into your gradients, and do things that are much harder to do in TensorFlow. So I would say that PyTorch is usually more interesting on the research side of things, when you want to try out things that are not standard, which is not what everyone does.
B
If you want to do what most people do, and do it quickly, I guess Keras is probably your best friend, because with Keras you can really do something very quickly and at a very high level. PyTorch will help you tweak a bit more on these things and have more control, and TensorFlow Serving would be more production-ready. That must be what most people would say, yes.
A
Agreed. Let's see here, there's another question about whether we can define custom loss functions. This was a while ago and I didn't have a chance to bring it up, but maybe just comment on whether it's possible and how you can implement custom loss functions.
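For reference, custom losses in PyTorch are just functions of tensors; autograd differentiates through them like any built-in loss. A hypothetical example (the penalty term here is made up):

```python
import torch
import torch.nn as nn

# Any differentiable function of tensors can serve as a loss.
def my_loss(pred, target):
    return ((pred - target) ** 2).mean() + 0.1 * pred.abs().mean()

model = nn.Linear(3, 1)
pred = model(torch.randn(4, 3))
loss = my_loss(pred, torch.zeros(4, 1))
loss.backward()                          # gradients flow as usual
assert model.weight.grad is not None
```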
B
A
B
Again, with the link to the GitHub notebooks, you will find a live-notebooks folder, where you will find the notebooks I went through, in which I didn't put any comments, and also notebooks that I previously made for another course. There you will find assignments, where you will find much more, and they will be a bit longer. So I encourage you to go through these assignments to get all the explanations, and for any questions you can contact us through Slack.