From YouTube: Day 1 Intro to AI Lecture & CNN Primer & Keras 101
Good morning, evening, afternoon, everybody. I just wanted to say, Stephen, that was awesome. I could have listened and looked at way more projects.

So this is the AI for Science Bootcamp — yeah, thanks for being here. A little about myself: I'm a senior data scientist and an AI scientist for the NVIDIA AI Technology Center at the University of Florida. I'm also a site lead there, so they have a system, not as big as Perlmutter: it's 140 DGX A100 nodes, which comes out to over a thousand A100 GPUs.

We have huge participation. One of the main goals is being able to talk fundamental deep learning well enough to read papers, tutorials, and blogs and understand what's going on. That would be huge for me, if you leave here today and tomorrow like that. Second would be being able to use these notebooks that we're going to go through, pick your problem set, and drop it right in where the problem set is in the notebook, with your data. All right.

And real quick, like I mentioned, I've got about an hour to talk, then we're going to take a 15-minute break, and then we're going to do two labs. They're going to be in Keras, and we're going to be classifying MNIST; we'll be doing a Keras 101 and a CNN lab. So it's a pretty short day: only an hour of me talking, and then you all get to go into it.

So I'm going to try to make this as exciting as an hour of high-level AI can be, so bear with me, but I think we're on the right track. So: intro to AI, and you can see that in parentheses I have DL, ML, DS — deep learning, machine learning, data science, AI — all synonymous today in industry, for some reason, as the buzzword for AI. You can look on LinkedIn.

So where we're going is this whole idea of a new way to code. We're going to look at traditional programming, back before there was machine learning / deep learning / AI (not really AI), and then where we're at today. So, traditional programming: you had a hardcore coder, a hardcore programmer, and they had a task they needed to accomplish, and they had expert knowledge in that task.

You know: if/or statements, loops, boolean things, et cetera — a ton of things that we just go through. You know, I have two kids, three and one, and it's the same thing: I'm trying to teach my three-year-old that if the stove feels warm, it's probably hot. She doesn't just look at the stove and know it's hot; you've got to try to teach that. So that's traditional programming.

And then we get to today, software 2.0. We've got this awesome optimizer — this is a robot called Adam, which is actually one of the most used optimizers we have in deep learning — and we're going to feed it a ton of examples, and we're going to have machine learning understand, from the examples and the optimizer, finding this space in this manifold world that our data lives in: a function that explains everything we want. So you can see: the task and expert knowledge from before — that expert knowledge is now replaced with just a ton of data, a ton of examples, and hopefully they're labeled, for our sake. For now let's just say they're labeled, to make this a little simpler of an analogy to pick up.

Now let's look at what this kind of looks like. So if we have this task — we want the probability of it raining, that's our task — and we know we can input temperature, pressure, and moisture levels from some sensor we collect, somebody could go in and code function one that says: if temperature is... let's say we're in Florida, where it's 100 degrees all the time and it can still rain, so that's kind of a moot point, but you get the idea. If temperature is 100, then pressure is whatever and moisture is high, then go to the next function, and then again and again and again. What that could look like is something like this, some hand-written function. Now this, you know, is a little more difficult; this is a lot more than just functions one and two, which I made at such a high level. So we have to put in temperature, pressure, and moisture, and then you update the mass, update the momentum, update the energy, do macrophysics, do microphysics, and you finally get some prediction on participation — precipitation. That's converting expert knowledge into these functions, and each one of these would be its own function. That's very intense and very labor-intensive to write. And now we get into this learned-function idea, this machine learning function, where we go straight from data to a prediction. So you can see how these two compare: one very labor-intensive and very complex, and the other a learned function based just on data that we have that's labeled. It's amazing; it really shifted everything we did and everything we do now in this whole domain.
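For reference, here is a minimal sketch of the two approaches being contrasted: a hand-written rule with made-up thresholds on one side, and the same task learned from a handful of toy labeled examples on the other. All thresholds, numbers, and labels below are hypothetical; they only show the shape of the two approaches.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# "Traditional programming": expert knowledge encoded as hand-written rules.
def rain_probability_rules(temp_f, pressure_hpa, moisture):
    if moisture > 0.8 and pressure_hpa < 1005:   # made-up thresholds
        return 0.9
    if moisture > 0.6:
        return 0.5
    return 0.1

# "Software 2.0": the same mapping learned from labeled examples instead.
X = np.array([[95, 1002, 0.85],    # temperature, pressure, moisture (toy data)
              [80, 1015, 0.30],
              [100, 1000, 0.90],
              [70, 1020, 0.20]])
y = np.array([1, 0, 1, 0])         # 1 = it rained, 0 = it did not
model = LogisticRegression().fit(X, y)
print(model.predict_proba([[90, 1004, 0.75]])[:, 1])  # learned P(rain)
```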
So, today: learn to use this new approach and revolutionize science. I do not know why that's in there — we're not going to actually touch any real-world science today; that's going to be tomorrow. Today we're just going to do two easy primer Jupyter notebooks on the Curiosity cluster.

The machine Deep Blue — I think it was called that — executed and beat tons of people at chess, and then from the 80s and early 2000s, well, I guess up to 2010, machine learning took over. It's a subset of AI, obviously, but instead of having expert systems, we had machine learning algorithms that learned from data. But the data had to be transformed into some handcrafted features — handcrafted features, so feature engineering was a huge component over that span. That's decades, right — anyway, from the 80s to 2010, feature engineering. There are probably tons of people with PhDs right now — well, not tons, but a large number with PhDs — who did machine learning where the feature engineering could have been the dissertation: a whole PhD in doing feature engineering on time series data, on this specific time series dataset that's collected from a sensor to do this XYZ thing.

But now we can make it super fast — well, super fast, faster — and we'll just learn everything from data. We can learn our output and our features from data, and that's where we are. So that's why, when I use that quote, AI, ML, DL, data science can kind of encapsulate all of this, because you're doing science on data, data analysis.

When should we use traditional machine learning? Before we get into that, let's look at this difference again — I hope I hit on it a little bit: this feature extraction. That's the main difference. You have input data, you have a classifier, but feature extraction, you know, used to be a human in the loop, or some feature extraction technique that a human made specially for that task. Something like, I don't know, 72 SIFT feature locations, and if you showed the image at a different angle it could still pick up those SIFT locations and get those features, so that whole idea of translation invariance was there. And then those features, after going through a bunch of steps to get them, would be put through a classifier. One of the best ones at the time was an SVM; SVMs were in the lead on all the benchmark image classification data.

RAPIDS — XGBoost with RAPIDS — is in, like, every Kaggle competition you can think of, and Kaggle is an open data science competition website, you know.

So if we have a small set of features, maybe only 10 or 100 pieces of data too, you might want to look at traditional machine learning. It's notorious: deep learning does need a lot of data, and that's a huge research area that I'm very passionate about. My dissertation was trying to do few-shot generation using deep learning on time series data, because there are a lot of instances — you know, if we want to push AI and deep learning applications to make the world better — a lot of instances that don't have a lot of data collected. So if we only have a few, because it's expensive, or maybe that occurrence doesn't happen more than once every 10 years, things like that — there needs to be a way to generate more data, so we can use these tried-and-true deep learning algorithms, or come up with better few-shot and zero-shot classifiers, detectors, et cetera.

We have something called CuPy that you can run on a GPU and get a 10-100x speedup — no joke, no joke. Pandas is another popular one; we have something called cuDF. cuDF and pandas are so close to one-to-one. To get a 10x-100x speedup, you can literally take the environment you're running your code in, with NVIDIA RAPIDS in it, and instead of importing pandas as pd, just import cudf as pd, and everything should be one-to-one. All right, there might be one or two functions that aren't, right now — very rare — but then all your data frames will be on the GPU and every little thing you do will get a 10x-100x speedup. It's amazing!
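The drop-in swap being described looks roughly like this — a sketch assuming a RAPIDS environment with cuDF installed and an NVIDIA GPU available; the CSV file and column names are hypothetical.

```python
# import pandas as pd       # the usual CPU version
import cudf as pd           # GPU version -- the DataFrame API is close to one-to-one

df = pd.read_csv("sensor_readings.csv")                  # hypothetical dataset
hourly_mean = df.groupby("hour")["temperature"].mean()   # runs on the GPU
print(hourly_mean.head())
```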
That's a side note, though; we're not really focused on that today, but RAPIDS is something I'm very interested in too. I teach a course on RAPIDS and use RAPIDS a lot with my researchers at UF and other researchers. It's just really strong: it cuts down the amount of time you're waiting to load in a dataset. So I will tell this story. Back in my lab we had a gigantic dataset of infrasonic data — time series data from infrasound — and it was typical that you would leave the lab, load in the data, come back the next day, and your data would be loaded and you could start working. That's how long it took to load this humongous dataset. When I started at NVIDIA, I told my advisor about RAPIDS. I said, hey, just try this out, you've got nothing to lose. The PhD student who was in the lab at the time did the same method we always do — load in the data, get ready to go home for the evening — and it was done before he got to his vehicle.

We have three inputs here: x1, x2, x3. We've got weights for each one of those that connect to an output y, and that's what we're trying to find. We let the data, and some activation functions inside each neuron, figure out what this function is to get the best y, and you can think of it as a generalization of curve fitting too.
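A single neuron of that picture, written out as a minimal NumPy sketch — the inputs, weights, and bias here are made up; in practice the optimizer learns the weights and bias from data.

```python
import numpy as np

def neuron(x, w, b):
    # weighted sum of the inputs plus a bias, squashed by a sigmoid activation
    z = np.dot(w, x) + b
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])    # x1, x2, x3 (made-up inputs)
w = np.array([0.1, 0.4, -0.2])    # the weights training would learn
b = 0.05
print(neuron(x, w, b))            # the output y
```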
The big difference between these two is this: here we just have, in this case, 2D data, some floating point numbers, and we're trying to find the line of best fit. You can think of this from back in the day when you did your chem labs and you were in Excel and had to do that for the first chemistry lab. That's the whole idea; that's all we're doing: find a function. And here we just feed in data, we know what our outputs are going to be, and we do this optimization throughout the whole thing. So that's a very broad way of fitting what deep learning is doing into an analogy of curve fitting.

But let's have a little fun and look at some examples on images, real-world data. Here's one called lunar crater identification via deep learning. You can see you have some digital elevation map — I call it a map, let's just call it an image, I don't want to upset anyone that actually knows what this is — and you have your ground truth, and then you have your predictions, and you can see this is actually pretty straightforward and pretty good results. This is from a team, I think from the University of Toronto, and the point is to automatically detect craters on the moon. That's the whole point, and their model was able to recover 92 percent of the craters in their test set, and that's amazing. The blue circles are the ones it got right and the purple circles are the ones it got wrong in this middle image, and that's remarkable, considering there's a lot more blue than there is purple. Now, on this one to the right, I'm not sure about the predictions; there must be a way they're scaling out the predictions, so if they're too small, maybe they're just not plotting them. I'm not sure, I'd have to look more into that paper, but they were able to identify... oh, that's what it is, that makes more sense.

So: U-Net is very popular, especially in medical segmentation — anything, really; segmentation is pretty neat.

It's pretty remarkable. Sunspot prediction — not on people, but on the sun. They get the dataset down to something they can use with a deep learning algorithm, they train the convolutional network on small crops only and predict on full-resolution images, and the results are pretty good again. It enabled them to label 1.5 million images, where it would have taken probably 1.5 million days to do it with their slow handcrafted algorithm. So, just remarkable.

All right, deep breath, everyone. We have about 34 minutes left and I have a chunk of slides to go through. So now we'll get into a little bit of training — what does it take to train these algorithms, what are we doing when we train them — and then we'll get into the implementation basics, what you need to do deep learning. Basically, spoiler: there are GPUs involved.

So in this training-versus-inference framing, we can think of training as this awesome Lego that looks like a Transformer, no pun intended, and you're going to try to build this Transformer out. So this is training: think of it as your deep learning algorithm, your machine learning algorithm — you go out to your data and you're trying to piece together this training, with all this data and this algorithm, to get the optimization right, the best, most optimal use of your model. And then we have our inference phase. This is not a Transformer — I don't even know what this is, it's like an old cartoon robot, I think. Nonetheless, we've got this bot, and now it's time to apply the learned model. So that's the difference: we train, and then we use that model that's been trained — it has converged, hopefully — and we can deploy it in inference mode, or test mode, or in the wild, anything like that, where it's actually looking at real-world use cases, data that it's never seen, and it's applying everything it learned from training to that data. And then you can continuously do this with online learning.

There could be a maximum we're trying to find, but for this case let's just say we're trying to find the minimum loss. So if this is error — reconstruction error, let's say, there it is, reconstruction error — we want that to be as small as possible, because we want our reconstruction to look just like the input. And then we have an optimizer, which is basically the strategy to search this manifold space, this optimization space, to find the optimal parameters, to get our model to have the best weights, to have the minimum loss. So it all kind of works together. There are many choices to be made, though. Here is one of those techniques that we use — it is the technique we use, not just one of them — to find a solution for this: gradient descent. Here in this 3D space, we start at some random point with random weights; that could be our first pass through the data. We compute the gradient of that loss function and we send it back — we'll talk a little more about this — and then we take a step in the descending direction, gradient descent, and we move a little bit, and we try to get to some optimum.
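A minimal sketch of that gradient-descent loop on a toy one-parameter loss — the loss function here is invented purely to show the update rule.

```python
def loss(w):
    return (w - 3.0) ** 2          # toy loss, minimized at w = 3

def grad(w):
    return 2.0 * (w - 3.0)         # its gradient

w = -5.0                           # start at some random point, as on the slide
lr = 0.1                           # learning rate: the size of each step
for step in range(50):
    w -= lr * grad(w)              # step in the descending direction
print(w, loss(w))                  # w approaches 3, loss approaches 0
```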
In this case, we stop when the error is small. Now, here it looks like there are two saddle points that could have similar errors; hopefully our manifold space, our optimization space, is well-behaved and doesn't have a ton of saddle points, but there's nothing you can be too sure about. A lot of these optimizers help with that, and here is one: Adam. Adam stands for adaptive momentum, and it works well for many image problems for sure; basically it's a way to jump over local minima to get to a global minimum.

And I mentioned backpropagation: compute the gradient efficiently, assigning weights. So we do a forward pass of all our data through the network — all our data gets pushed through the network, and all these weights remain constant. Remember the weights: think of these as x1, x2, x3, x4 and so forth. They go forward to the next layer, we have weights assigned to each of these connections, everything's fully connected, so there's a ton of weights, as you can see with all these lines, and we get to some output prediction, and we have a loss function. So we compute the gradient of that loss function and propagate it backwards to update those weights — all those weights — and you can see that with the blue line and the red line too. That's the goal: we're trying to update the weights, these parameters, with the gradient from the succeeding layer, the layer further along, so it back-propagates all the way through. And then each weight can be nudged a little bit, some small amount in some direction, to obtain this great function that gives us that optimum. That's why we train on a lot of data, to make sure that backpropagation is a little more sophisticated, and we train for many epochs — an epoch being one pass through the entire dataset — and those epochs give us a better understanding of what's going on in the data to update this function. All right.
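In Keras, that whole loop — Adam, the loss, backpropagation, epochs — is set up in a few lines. A hedged sketch; the layer sizes and the training arrays are placeholders, not the lab's exact model.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(3,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam",               # the Adam optimizer discussed above
              loss="binary_crossentropy",
              metrics=["accuracy"])

# x_train / y_train are assumed to be labeled arrays you already have;
# each epoch is one full pass through them.
# history = model.fit(x_train, y_train, epochs=10, batch_size=32)
```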
This is a pet peeve of mine, so here's the soapbox: I don't know why we talk about PyTorch autograd when this whole thing is in TensorFlow and Keras, but they do the same thing — TensorFlow/Keras has automatic differentiation too. Basically, we don't have to compute the gradient; we let the framework compute it for us and take care of the backpropagation for us. So for this entire slide, all you've got to take away is that these deep learning frameworks make this effortless for us. You can use PyTorch's autograd, or TensorFlow's automatic differentiation in Keras, which is part of TensorFlow, and it will go ahead and compute the gradient for you and do all of that; you just have to give it the function.
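For example, TensorFlow's automatic differentiation via tf.GradientTape: you define the function, and the framework returns the gradient.

```python
import tensorflow as tf

w = tf.Variable(2.0)
x = tf.constant(3.0)

with tf.GradientTape() as tape:
    loss = (w * x - 1.0) ** 2        # any differentiable function of w

dloss_dw = tape.gradient(loss, w)    # gradient computed for you
print(dloss_dw.numpy())              # 2 * (w*x - 1) * x = 30.0
```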
Here's some random data, and we assign some validation data and training data, and you're like, what is that, what's the difference? We have predictions too — that's our black line — so we actually do a great job fitting this line. But are we overfitting? That's a word we use a lot, overfitting, and you're like, what does that even mean?

So let's talk a little more about our data in that sense. We have a whole pile of data, and what we're going to do with it is break it up into training data for training the model, validation data — there's that validation word — for hyperparameter tuning, and test data for the final evaluation. So in a training loop, you're going to train the model, and each epoch it's going to see some data it's never seen before, and it's going to validate on that.
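A minimal sketch of that split; the proportions are arbitrary here, and x/y are assumed to be your full arrays of examples and labels.

```python
from sklearn.model_selection import train_test_split

# Hold out a test set for the final evaluation, then carve a validation set
# out of what remains for hyperparameter tuning.
x_trainval, x_test, y_trainval, y_test = train_test_split(x, y, test_size=0.15)
x_train, x_val, y_train, y_val = train_test_split(x_trainval, y_trainval, test_size=0.15)

# Keras can also take the validation set directly in the training loop:
# model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=20)
```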
And you need a GPU. Do you really need a GPU? Yes — I work for NVIDIA, I have to say that. You can do CPU, but it just takes forever, so GPUs really make life simpler for us. Then there are different deep learning frameworks, Python-based: we have PyTorch and Keras/TensorFlow. Before, when I would touch on this, I said I knew no one that used MXNet or Julia, and Stephen, with his awesome plots, showed that he has five people using these combined.

I used TensorFlow when it first came out, when we had to use Bazel to get it into C/C++, and that was miserable. Now it's all in PyTorch, which is really great. And then, for our case, we use Jupyter notebooks. I'm a big IDE guy, because I like debugging and seeing variables and everything on the fly, but Jupyter notebooks are really popular because you have something you can present. You know, if you're going to make this open source — everyone loves reading blogs, especially blogs with code — why not have a blog with code that you can execute, like a Jupyter notebook or a Google Colab?

And then there's the NVIDIA GPU Cloud registry: we have a ton of stuff on the NVIDIA GPU Cloud. We have containers that are optimized for NVIDIA GPUs, and we have a ton of SDKs like Clara — well, MONAI; MONAI is our medical imaging SDK. Our TA here today is a MONAI expert, so I'm just giving her a shout-out; she saves my life a lot at UF.

Deep learning and GPUs — everyone's doing great, we've got 20 minutes left. All right, so this is actually pretty cool code. I love verifying my GPUs; it's very important when I'm doing multi-GPU things, which we don't touch on here, so I do apologize for that, but we do a ton of it that you can access.

Funny story: I had just started my job prior to NVIDIA, where I ran an AI prototyping effort for the DoD. They got me this sweet workstation with two GPUs, and I was like, yeah, this is great. I thought, because they were NVLinked and everything — these two GPUs, which it shows right here — that I was doing multi-GPU work, and I kept getting OOM errors, out of memory. I was like, what is going on here? One of the GPUs wasn't even PCIe'd into the board. So if I had just run one piece of code to see what devices I had available, it probably would have saved about four weeks of misery at that job. Nonetheless, this is important stuff: one line for TensorFlow, one line for Keras, just to find out whether you have GPUs and which GPUs you have; PyTorch is a couple of lines, just to get more printouts. Doing this is very important, and they do it in the lab.
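The checks being referred to look roughly like this:

```python
import tensorflow as tf
print(tf.config.list_physical_devices("GPU"))   # TensorFlow / Keras: one line

import torch
if torch.cuda.is_available():                   # PyTorch: a couple of lines
    print(torch.cuda.device_count())
    print(torch.cuda.get_device_name(0))        # name of the first visible GPU
```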
So you don't have to worry about copying this down. And then, you know, I mentioned this before: GPU usage in deep learning frameworks is simple; they just let us do it, it makes life easy. These deep learning frameworks — we won't talk about Julia — but Keras automatically uses your GPU in TensorFlow 2. PyTorch is actually a little more robust, because you can do a bunch of things on the CPU too, for debugging, and it will let you know if you have a tensor that's on the CPU and you forgot to put it on the GPU. And then the coolest thing NVIDIA ever did for a terminal is nvidia-smi, the System Management Interface. It lets you look at your GPUs. You can look at it in real time by typing `watch nvidia-smi`, and you can see everything there is: your GPU fan usage, the temperature they're running at. When we were running a super large language model on this new SuperPOD at the University of Florida, we actually had our temperatures get up to 80 degrees Celsius — I think higher, into the 80s — and it flagged; we were flagged by that. We were actually monitoring that and flagging it, because that means really slow training, something's not cooling correctly, and we had to troubleshoot it quickly or else things just don't perform up to par.

Each one of the neurons in the layers and the output — not in the input, obviously — has an activation function. Sigmoid and tanh were used first, but we had a lot of trouble with those because they cause errors in training: sometimes things would just blow up, things would vanish, things didn't do well. So the tried-and-true now is ReLU, the rectified linear unit, which clamps negative values to zero — that's max(0, x) — and it gives you some non-linearity in your network, and it has learned a bunch of really cool stuff. It's used a lot in CNNs, tons in CNNs, and pretty much anything now, really. There's still some tanh or sigmoid usage, but leaky ReLU we use a lot in GANs, generative adversarial networks, because sometimes we need some negative output from that activation function. Now you could be thinking, wow, what does that even mean, what does that look like? Do not fear, because I have this to share.
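The two activations in plain NumPy, to make the shapes of the curves concrete; the leak slope below is a typical choice, not a fixed rule.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)              # max(0, x): negatives clamped to zero

def leaky_relu(x, alpha=0.2):
    return np.where(x > 0, x, alpha * x)   # small negative slope instead of a hard zero

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))          # [0.   0.   0.   1.5]
print(leaky_relu(x))    # [-0.4 -0.1  0.   1.5]
```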
So this is a lot of fun. If we have our activation function as sigmoid and we're going to try to learn this distribution over here — that's the whole point, we're going to try to find the decision boundaries of these two classes — so if you hit play and actually follow it over here... there it goes, hopefully everyone can see that. Let's see, maybe I'll make it a little bigger. It's still learning — you can see the training data, it is learning — and we'll just stop it here at a thousand-ish. You see, we didn't do too hot. But let's just change it to ReLU, same exact network: inputs x1 and x2, two hidden layers with two neurons each, each one with ReLU activation. What you want to see, though, is this training loss and test loss decrease — let's just go to town, crazy. Anyway, a fun little thing to play with. Let's get back to the slides; cool, I'll move that out of the way so we're not looking at it.

Okay, so this is a deeper neural network: more layers, more levels of abstraction. It's from a super old paper — this is actually a deep belief network — and it's learning objects, pieces of objects; not really whole objects, but like a nose, an eye, here's an ear, things like that. Then the closer you get to the output, the higher-level features come out, so we're actually learning objects, which here is whole faces. There's a popular paper out — I think it was Google — where a unit learned the cat, and they were like, oh my gosh, this is the craziest thing ever, it learned the whole cat: when a cat is fed into the network, that's the unit that fires when those lower levels fire, and boom, you know it's a cat. The difference being, that is a deep belief network; CNNs and MLPs work similarly: if you have a bunch of layers, the layers closest to the input will have low-level features and the layers closest to the output will have higher-level features. All right, eight minutes to go, we're doing great, we are all stars. I hardly see anything in chat — awesome, we'll keep going.

So, what are CNNs used for? CNN stands for convolutional neural network. Problems with translational invariance are why they came about, but they're used in 1D, in audio and time series — invariance in time — and they're used in 2D, of course: computer vision, 2D spaces.

And there are multiple different computer vision tasks that we might want to do. There's classification — we'll look at the top row; I like talking about dogs and cats. The bottom row, though, is more science, weather-related, so I'll try my best. Here's a picture of a cat; classification asks, what is this a picture of? This looks like a cyclone, a tropical cyclone — what is this a picture of? That's classification: show it an image, tell me what it is, what class does that image belong to?

Object detection: you tell me what those objects are. Basically it's classification plus localization on multiple objects, so we have two cats, a duck, and this adorable puppy, and this is where they're at. Same down here: this looks like it has spotted six cyclones, tropical cyclones, and I'm just going to go out on a whim and say this is an atmospheric river, just because it seems like a buzzword right now. And then instance segmentation is exactly that, object segmentation — it gives us the damage, so we need to figure that out; and the atmospheric river, here's that.

So there are some that are pretty simple, like, oh, you see some airplanes, there's a good chance this is a runway; baseball diamond, beach, buildings. It's the ones like mobile home park versus medium residential versus dense residential that are tough to decipher. So it's pretty powerful what these deep learning algorithms can do on images, given the right data and labels.

Now, why not just train a regular fully connected network on images? Well, I mentioned before, everything's connected, fully connected — each neuron is connected to the next, and so forth. So if we have a megapixel image — one megapixel is one million pixels — our input's already one million. That's one million weights we automatically have to learn if our next layer had just one neuron. Now, if I had two, this goes up; three, four, five, you get the idea. They just don't scale well at all, so CNNs were created, which kind of help with that issue, and we'll talk a little more on that too. There's also objects in nature — the translation invariance: objects in nature look the same from place to place.
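A quick way to see that scaling problem in Keras, using a hypothetical 1000x1000 single-channel image; the counts come straight from count_params.

```python
import tensorflow as tf

dense = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(1000, 1000, 1)),
    tf.keras.layers.Dense(1),          # one neuron fully connected to every pixel
])
conv = tf.keras.Sequential([
    tf.keras.layers.Conv2D(1, kernel_size=3, input_shape=(1000, 1000, 1)),
])
print(dense.count_params())   # 1,000,001 weights for a single neuron
print(conv.count_params())    # 10: a 3x3 kernel plus a bias, reused everywhere
```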
That never really took off; there are some papers on it, but the implementation was rough and, application-wise, it really wasn't there, so CNNs are still the tried and true. So what is a convolution? It's just a small matrix transformation applied at each point on the image, typically through some convolutional kernel. In this case it's a three-by-three edge detector kernel — a feature detector kernel, sorry, not edge detector — and you just put it over the exact location on the image, you update that middle pixel, see, and you just slide it along. You might be thinking, wow, that is very... I know we're getting close to time; that's okay.

This is really cool: it talks about what a convolution is, what's going on, what a neuron is, what a tensor is, what a layer is, how you update kernel weights, what each layer of the network does, and then, down here, it'll actually look at each network. You can see that kernel — see that little kernel, here it is — sliding around this image and updating as it goes, then it updates.

So this is when you have a five-by-five input but your kernel is a two-by-two.
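The arithmetic behind those sliding-kernel animations is just the usual output-size formula; a small sketch:

```python
def conv_output_size(n_in, kernel, padding=0, stride=1):
    # out = (in - kernel + 2*padding) // stride + 1
    return (n_in - kernel + 2 * padding) // stride + 1

print(conv_output_size(5, 2))             # 5x5 input, 2x2 kernel -> 4x4 output
print(conv_output_size(7, 3, padding=1))  # 7x7 input, 3x3 kernel, pad 1 -> 7x7
```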
And if you have a seven-by-seven with a one-pixel pad, when you run a three-by-three you can see how it's updating the seven-by-seven with the padding. I'm going to change that — anyway, check that out, play with it; some of the notebooks go over this really well too, and at the end of this there's a video tutorial on how you can go in and play even more, just to try to understand what CNNs are doing at a little higher level. All right, so with that we are at time.

So, you know, back in the day, when people had to feature-engineer, like we talked about, somebody spent forever figuring out: hey, how can I get edges from a convolutional filter, and what would that look like? So Sobel came in and he was like, oh, if we have this filter we'll get horizontal edges, and if we have this filter we'll get vertical edges — or vice versa — and boom, if you apply those, that's what you get. It's pretty neat. But now CNNs actually go in and learn the optimal filters.
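The Sobel kernels being described, applied with an off-the-shelf 2D convolution; the image here is just random placeholder data.

```python
import numpy as np
from scipy.signal import convolve2d

sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])       # responds to vertical edges
sobel_y = sobel_x.T                    # transposed: horizontal edges

image = np.random.rand(28, 28)         # stand-in for a real grayscale image
edges_v = convolve2d(image, sobel_x, mode="same")
edges_h = convolve2d(image, sobel_y, mode="same")
```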
Actually, I thought it was... and then AlexNet is what spurred the deep learning revolution: using a GPU, they accelerated the convolutional portion of the CNN, how it updates and backprops, and everything blew up since then. That went to VGG and Inception in 2014, ResNet in 2015, Xception, ResNet-50, and DenseNet in 2019, and these were all state of the art in ImageNet classification, every benchmark across domains.

'88 — I knew it was '88. LeNet first started it, looking at MNIST for the USPS. This was super slow, because there were no GPUs at the time utilizing acceleration, so SVMs, support vector machines, with some feature engineering actually outperformed it, and it kind of died right after that publication. Then Alex came in with Hinton and won ImageNet with a CNN that trained at an accelerated pace compared to LeNet, and people started paying attention.

Then you get layers of different sizes, and VGG is used a ton for feature extraction right now. And you're like, well, I thought we didn't have to do that. It's true, but you can actually use these networks' features and then use those features in different downstream tasks as well, like video.
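A hedged sketch of that kind of feature extraction with a pretrained VGG16 in Keras; the input size and the 10-class head are placeholders for whatever your downstream task is.

```python
import tensorflow as tf

base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False                 # freeze the pretrained feature extractor

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),   # your own classes
])
```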
So it's crazy. Here's a plain CNN, and you can see that ResNet is just the plain CNN with some skip layers. DenseNet came in and said, hey, what if we just made everything connected, so every CNN layer can know what's going on from previous CNN layers — so it has a more universal understanding of everything being learned and trained from the data — and they did. It's huge, it's pretty dense, and it takes a hot minute to train even on GPUs, but it works. And then we get to vision transformers. This is not from the original paper, but I like the mushroom one that's shown here. The idea is just taken from NLP, where you have sentences: you do some type of patching or tokenizing, you flatten it, you get your position embedding, and then you use this transformer encoder, which is this over here to the right. Very powerful, and like I said, it's just taking over.

So that's it, that's all I had for right now. Let me put my camera back on — hopefully my internet's better, because, you know, what happens when your wife comes home and gets on her phone... It's 10:50 your time, 1:50 my time, so we'll take a break for 10 minutes, come back at 11, and we will get on Curiosity and start the labs. I'll actually talk a little bit about the lab, and we'll look at some challenges here — I'm still sharing my screen, so that's good — just the challenges, to spur some conversation, and then we'll talk about the lab. Okay, but right now go ahead and take your bio break. Thanks so much for listening to me talk for an hour and five minutes — I do apologize for that — and I'll see you back here in 10 minutes.

So, while people are funneling in — we had 100-some, now we're down to 90, that's okay — I'll talk about some challenges. And Stephen, see what I'm telling you: that presentation you gave is awesome. But some of the challenges they have: it's labeling large quantities of data.

Transfer learning — that's gigantic, especially for things like image classification, where you take one of those huge pre-trained networks I showed you, trained on ImageNet, and just use it for whatever downstream task you have. You know, we do a DLI course on fundamentals of deep learning, but this is neat: they're transferring everything they learned in an Omniverse environment, the simulated environment of this robot arm, and then they just take it and fine-tune it on the real dataset in real life. And then PINNs, enforcing physical constraints — that's just awesome. FourCastNet has a lot to do with that, with the Fourier neural operator and things like that, keeping things physical.

MNIST classification — this has been tried and true, thousands of papers on it. I think the benchmark now is something like 99.97 percent accuracy, so it's almost perfect; it's not that difficult. And this is a 2D CNN that you could use to train and test and get a huge accuracy on it. So here's your data you're going to load, with a category called x_val and y_val, which is the part of the training data that you use for evaluation. But what we're going to look at is slightly more interesting: it's called Fashion-MNIST. It's 10 different classes of little thumbnails of different types of clothing and bags — t-shirt, trousers, pullover, dress, and so forth — and this is something they try to preach a lot in these boot camps; someone came up the other time I taught this with a discrepancy in it, but we'll go with this six-step approach.
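Loading the dataset is the first of those steps; Fashion-MNIST ships with Keras:

```python
import tensorflow as tf

# 60,000 28x28 grayscale training thumbnails in 10 clothing classes,
# plus a 10,000-image test set.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
print(x_train.shape, y_train.shape)    # (60000, 28, 28) (60000,)
```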
Let me share my screen... here we go, let me get these things — these things, I don't know — all right, so here we are. Hopefully everyone got here. So today we're only going to do "intro to DL", so you'll click on that directory — hope everyone can see that. When you launch the lab, you'll see "intro to DL", "tropical cyclone intensity", and "start here". So you can start with "start here" and check out the tropical cyclone intensity estimation tomorrow. And you can see I do have a GPU: if you go to this plus, you can actually run that nvidia-smi I was talking about in Jupyter by putting an exclamation mark before nvidia-smi, and you can see we are indeed on A100s — 80-gig A100s — so they're powerful, the most powerful GPU you can have right now that's in production. So you're going to have a lot of fun; you have a very powerful machine on your hands.

I don't know why it's called that, but that's where we start: the CNN primer. Please take your time and read this; it is so well put together, it's unbelievable. It goes through that six-step approach that I kind of glossed over, and then, each step of the way, they talk about the data, they talk about pre-processing the data, why you're doing this, why you're doing that — you understand it from the code, and then they have things written about why you're doing it. Here's a great one: the pixel values of a grayscale image range from 0 to 255, so they want to normalize them to between 0 and 1 for your train and test data. And then, every lab, it seems, I get this question — why are we doing this? — and it says it right here: the normalization of the pixels helps by keeping the process by which the gradients are computed well behaved.
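That normalization step is one line per split, continuing from the Fashion-MNIST arrays loaded above:

```python
x_train = x_train.astype("float32") / 255.0   # scale 0-255 pixel values to 0-1
x_test  = x_test.astype("float32") / 255.0
```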
So take your time; really gather what you want out of this lab. Now, you could in theory hit "run all" and get it done, but you don't really learn anything from that — we're here to learn. Some of you here might already know everything in this primer course; if that's the case, I'm sorry, you can run through it, maybe do a quick refresher, and hang out with us for another hour. So you'll do part two first: this is an MLP. You'll go through data pre-processing and defining your model, and you're just going to make a dense network, an MLP, and then after you're done with that you'll go to CNNs.
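A minimal sketch of such a dense MLP for the 28x28 thumbnails; the lab's exact layer sizes may differ.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),            # lose the 2D structure
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),           # 10 clothing classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(x_train, y_train, validation_split=0.1, epochs=5)
```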
Okay, and please read through that one too. It's really good; they go through the convolution and things of that sort. It's extremely well written and you'll learn a ton from the notebooks. So take your time, ask questions in your group, ask questions to the TA in the group, anything like that.

There is one thing I want to note: the end of each notebook has something highlighted — shut down the kernel. In the cloud, when we're doing this on A100s with TensorFlow, when the notebook is up it automatically allocates that GPU. So when we get into tomorrow's stuff in our labs, we'll get a lot of "hey, I got OOM errors, I got all these errors" — it's typically because no one went down and shut the kernels down. To do that, we're on our directory tab right now, this little folder; you'll go right below it, run everything through it, and then, before you move on to CNNs, just come in and shut down the kernel for part two. Okay. And then, if you want to keep this, go to File and you can download it as a Jupyter notebook.

E: Awesome. Yeah, I'm going to hit "open up breakout rooms". We have seven different groups, so seven different rooms; each room has a TA. The TA will introduce themselves briefly, and then you guys can go ahead and jump in. Make sure you're asking questions if you've got them — you can always post in Slack, doesn't matter. You're automatically going to be moved here in a couple of seconds.

E: Oh cool, so let's wrap up with Q&A. I hope today was a good start to our learning; we've got a lot of hands-on stuff tomorrow. So with that we can go ahead — did you want to go through any of the results from the notebooks or anything?

A: Thinking about it — a question, yeah, we could. So I think there was — and I brought it up to you — an error coming up where the kernel wasn't setting itself, so maybe some people hit that; hopefully everyone nailed it. I think we saw with the dense... well, wait a second, I'll just go ahead and pull up the screen. Do any of the TAs want to talk about anything while I'm doing this? Sorry, I didn't think to ask.

C: Caleb, in my session there was a question that we just thought of, out of the breakout room, just in the last minute as I was trying to answer. There was a question about the convolution layers: there's a parameter called filters, which is either 64 or 32, and the question was, is that something that is calculated? Basically, how are we determining the number that we need to use in this filters parameter for the Conv2D layer?

A: Great, thanks for relaying that. I saw Peter shake his head too, and I think I saw someone in another room as well. So my claim on this, all the time, is: there's no exact number to use. We usually base a lot of this stuff off of papers we read, models that have been successful, so in that fashion we look at a paper and go, oh, they did really well with this architecture. And what you'll see with that is that typically they're powers of two, and that's just because GPUs work faster in powers of two — that's how our threads and blocks and warps and all of that are calculated and formulated. So if you have something that's a power of two, it flows a lot smoother. But there's no calculation for how many filters to use, just like we don't know the perfect kernel size to use either.
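For reference, the parameter in question; the 64 below is a conventional power-of-two choice, not a computed value.

```python
import tensorflow as tf

conv = tf.keras.layers.Conv2D(filters=64,          # how many distinct kernels to learn
                              kernel_size=(3, 3),
                              activation="relu",
                              input_shape=(28, 28, 1))
```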
G: Yeah, that's basically what I said: you start off with models that people have used for their application and then you fine-tune it for your own application, and usually 64, 128, 256 are the numbers.

G: Yeah.

I: At least theoretically, you can base it on the local receptive field that each kernel size would have on your image — that would capture the size of the features that you really want to be captured — and the number of layers would determine whether your total receptive field covers your entire image or not. At least theoretically; practically, you need more experimentation with it.

A: Cool. Any other questions? Sorry, I pulled up the lab; I don't know if anybody wants the benefit of walking through it or not. I think we saw with the first part — which is part two, funnily enough — that adding dense layers didn't help the performance much. It might have given a little bit of a bump, but we really didn't see any of those bumps until we moved to the CNN in the second lab, and that's just a testament to how powerful CNNs are and how they pick up on different features that we're obviously not going to learn from a dense network on flattened input, because we lose that spatial coherency when we flatten an image.

H: So I have 64 layers of these matrices. How do you calculate these matrices — do you apply the filter layer by layer, or do you have other ways?

H: No, no, I'm not talking about the numbers. So let's say we ask it to be 64 layers — do you just apply the filter 64 times?

G: Actually, I didn't get the full question. So, yeah, you have 64 filters. Is the question, how do I compute these filters? I mean, these are all learnable parameters, and the optimization algorithm essentially learns the optimal weights and biases for each filter; to compute the activation in each layer, it's just a matrix-matrix multiply. That's it.

G: How is it done under the hood — or maybe you could elaborate a little bit more?

H: Oh, I think I understand what you're saying. So you're saying you just do that multiplication 64 times?

G: Yeah. If you have 64 different filters, then you do that convolution operation 64 times, but that convolution operation under the hood is just a matrix-matrix multiply — that's how you implement a convolution underneath. And yes, you basically get a tensor which is the size of the image, or whatever it is depending on whether you have padding or not, and since the operation is done 64 times, your output will have 64 channels.
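A quick check of that: one Conv2D layer with 64 filters turns a single-channel input into a 64-channel output of the same spatial size (with "same" padding).

```python
import tensorflow as tf

x = tf.random.normal((1, 28, 28, 1))                  # one grayscale image
y = tf.keras.layers.Conv2D(64, 3, padding="same")(x)  # 64 learned filters
print(y.shape)                                        # (1, 28, 28, 64)
```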
A: Yeah, thank you, you nailed it. There's a question in the chat: "It might be a little off topic, but I was curious about the execution of these deep learning models on GPUs. Are these executed on the Tensor Cores or the regular SMs? I'm not exactly sure of the differences between these, but the A100 data sheet mentions their FLOPS performance separately."

B: Yeah, the reason that you're able to use mixed precision in deep learning is because you're basically calculating approximations during each epoch. So you actually don't need a full floating-point representation, a full 32-bit representation, of your values, so you're able to reduce that precision and multiply your compute throughput in the process — because, like I said, you're just calculating an approximation; it doesn't have to be perfect.
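In Keras, opting into that is one line (available in recent TensorFlow 2 releases); the float16 compute is what maps onto the Tensor Cores.

```python
import tensorflow as tf

# compute mostly in float16, keep variables in float32 for numerical stability
tf.keras.mixed_precision.set_global_policy("mixed_float16")
```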
B: Thanks, Robbie. That's not true for scientific applications, though — that's why in the scientific HPC world we don't use Tensor Cores. I mean, they're starting to use some mixed precision for some intermediary calculations in certain applications, but for the most part, most people are using at least single precision, if not double precision, in those types of algorithms. It's really just algorithm-specific, and depends on how much accuracy you need.

J: Sorry, so my question is: when these DL models are being trained on the Tensor Cores, are the SMs underused? I mean, do they just stay idle?

B: I'm trying to find a diagram for you, but basically, inside each SM you have a number of different compute units, and it varies for each architecture we've released. That's also, for example, why double-precision compute is slower than single-precision compute: typically on our SMs you have half as many double-precision floating-point units, ALUs, as you do single precision. So, when we release a new GPU, we'll state the single-precision, double-precision, and mixed-precision results, but the main number that's usually advertised historically used to be the single-precision floating-point compute number. More recently they've started to highlight deep learning performance, because it's obviously really popular and really widely used.

F: Yeah, thank you. I just discussed with Peter, and he gave a very good explanation of the difference between this fitting and our traditional fitting, and machine learning uncertainties. He told me that, in order to get uncertainty similar to what we did in traditional fitting, we need a lot more work, like training the model several times, and we need to take into account several sources of uncertainty in order to get a good estimate.

I: So in full Bayesian approaches you have this MCMC kind of sampling, right, Markov chain Monte Carlo stuff, and that's inherently sequential in nature; I'm not sure there are algorithms there that are really parallelizable, and in such cases it may not be beneficial to use a GPU. But there are certain other flavors, like Gaussian process regression, where you make certain analytical assumptions, and in such cases it boils down to matrix-vector kinds of multiplications, so there the GPU actually really helps.

A: Solid, great. And then for your previous question, Mr. Wine, I think if you typed it in the Slack it'd be a little easier for us to get to — it's a pretty long question. I think Praveen did a great job answering it, and I agree that it would be specific to the use case. Go ahead and put it in the Slack and I think we'll get a better understanding and maybe a better answer from it.

A: Okay, so we have 10 minutes till the end — or, if there are no other questions, we can go ahead and wrap up now, and I'll see you all tomorrow at nine. Yes, nine.