From YouTube: ReWork AI Summit Day Two Recap
Today I'll talk about the ReWork AI Summit, then do HTM Forum Q&A if I have time; I've got a 10 o'clock stand-up meeting. I'm not sure if there's a research meeting yet. I'm still waiting to hear back from the research team, all of them; nothing yet. Geoff is super tired today, so we'll see if Marcus has anything; he's not quite in the office yet, so I'm not sure if there'll be a research meeting. And then we will have a building-interesting-systems session, where I'm really happy with how the learning turned out.
I know that the last stream ended in failure, but I think I just needed a little sleep; everything's all good now. Actually, the connection distribution diagram looks great when you add learning, because you can see the connections changing over time. Honestly, I didn't think about that; it was just very serendipitous.
Yeah, I was totally tired at the end of that day. Mark: okay, thanks for noticing. So let's talk about day two. Let me see if I can get this set up so you guys can see my whole screen here.
The first topic was deep reinforcement learning; I think I did nothing but deep RL on this day. I kept running into quotes. These guys like to quote a lot, and they love to relate what they're doing to human intelligence in errant ways.
For example, the presenter said pretty quickly in his presentation that humans don't do everything from scratch. What he was talking about was prior knowledge: when we come into a new situation, we bring prior knowledge to that situation. That was his point, but we do learn to do everything from scratch. I mean, there are genetic predispositions in the way the neurons in your brain are structured, for sure. (My neck is better, Mark, thank you. Yes, 100%.) But all that prior knowledge is something that we learned from scratch.
A
So
this
was
about
priors
all
about
priors
in
robotics,
specifically
so
they're
trying
to
incorporate
priors
to
find
sort
of
the
sweet
spot
in
the
policy
parameter,
because
if
you
start
your
hyper
parameter,
search
in
a
bad
place
in
the
search
space,
you'll
never
get
to
optimum
parameters,
because
if
you
just
search
around
and
you're,
always
it's
always
less
sort
of
flat.
You
know
if
you
want
to
look
for
there's
these
sweet
spots
and
these
in
the
parameter
spaces.
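As a rough illustration of that point (a toy sketch of my own, not from the talk): a local search started on a flat part of a reward landscape gets no signal at all, while the same search started near the sweet spot climbs immediately. The prior is what chooses the starting point.

```python
import random

def reward(theta):
    # Toy landscape: flat almost everywhere, with one narrow peak near theta = 8.
    return max(0.0, 1.0 - abs(theta - 8.0))

def local_search(start, steps=200, step_size=0.1, seed=0):
    rng = random.Random(seed)
    theta, best = start, reward(start)
    for _ in range(steps):
        candidate = theta + rng.uniform(-step_size, step_size)
        if reward(candidate) >= best:  # flat region: every move looks equally good
            theta, best = candidate, reward(candidate)
    return best

print(local_search(start=0.0))  # started in the flat region: never finds the peak
print(local_search(start=7.9))  # a good prior starts near the peak and climbs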
First thing, 9:00 a.m.: fire engine. Okay, so I asked a question, because there was this theme in day one, certainly: a realization that RL needed to move towards generalization, to find a way to better generalize, and this is moving away from generalization. He sort of balked at that, and didn't really admit that this was moving away from generalization, but said, well, the priors can be applied if the priors are very general, meaning that they're about objects or something. But when I asked him whether general priors could be applied to different robotic situations, he said, well, that's something we'll have to research. So this incorporation of priors definitely seems to be a movement away from the area of generalization. Anyway, the technology he was talking about is called iLQG, which is model-based residual RL. I wasn't really too interested in what it was, because it was definitely very specific to one thing. Okay, next: the next presentation was interesting because it was about space and the exploration of space. There's a company called OffWorld; let's see, they have a website.
Honestly, there were some details about this. Let me find the website: OffWorld AI. Great vision; okay, wow. This is what they're going for: they've got a huge pie-in-the-sky vision about exploring and mining other planets. Specifically, they're focusing on creating robots that can mine on Earth. This is something Jeff Hawkins has talked about as a potentially really big application for truly intelligent machines. We're talking you need AGI for this; you definitely need something that can make decisions and learn all by itself, without some round trip back to Earth or back to some servers to do retraining after a new environment is observed. So, a huge vision, certainly, very ambitious, but it definitely felt like a big marketing pitch to me. I don't know; I mean, they have investors, they're doing research, they're focused on mining on Earth, and they have these prototypes of diggers and crushers. Oh sorry, back to the previous presentation for a second: talking about priors, he had this quote:
"I believe that everything you learn is framed in part in terms of what you've learned before; you learn delta coding based on your prior life experiences." Yeah, I totally believe this, but this is all about hard-coded priors for each specific situation, and that's the movement away from generalization I was talking about. It definitely wasn't transfer learning, from what I could tell.
"Wouldn't it be more like insect-level thinking rather than creative thinking?" Oh, you mean as far as the logistics stuff? No, I don't know. I think when you put these things out in the environment of Mars or wherever, they're going to run into situations that we never anticipated, and they're going to have to adapt and learn, or die. So I wouldn't say it's just insect-level thinking: they're going to have to do problem solving; they're going to face situations that an insect would not be able to solve,
I think. Yeah, it totally sucks to be stuck with those shackles, the deep learning shackles; that's the world that they certainly operate within, and I'll have some sum-up thoughts about this conference at the end. So, about the mining thing: they actually have these diggers and crushers that they're testing, real ones. Here's sort of an example. Okay!
So there's one of the diggers, and this is where they're at right now: they've got a robot arm, and it's got, you know, a chisel essentially, and they brought in an expert mason to chisel and show it the best way to do the chiseling, and they used what they called imitation learning. So where they're at is trying to maximize the amount of mass that's extracted with each attack, each chisel attack.
Now, they are trying to do navigation-type stuff too, but from what I can tell from the rest of the deep learning presentations, they're so far away from being efficient at this. A lot of these deep learning systems start with humans in the loop. Yeah, you could be right about the insects; I don't know, ants are pretty smart. Ants are pretty smart. Anyway.
So humans operate the robot to give it an example of what it should be doing, sort of like a goal, so it can have a jump start, and I think that's what they did here: they had a professional mason come in and do the chiseling. The one thing about this that I thought was interesting is that they do all their testing and training in the real world, which is different from a lot of other systems, because you can train so much faster and more efficiently in simulations.
But then when those agents transfer from the simulation into the real world, there's a big problem, because the real world is not a simulation. There are so many unexpected things, things you cannot model about physics, about reality, that exist in the real world, and the agents fall flat; their training does not transfer, in most cases I would say, from simulation to real world. So the interesting thing about this is that they're doing all this training in the real world, and the reward function for this example is all about mass mined per attempt. They also talked about using hallucinatory GANs. I didn't quite understand this; I sort of get what GANs are, but they were trying to do navigation with hallucinatory GANs, and I didn't quite see where they were going with that. So in summary, I think this company has a great vision, and it's cool to think about space exploration and that sort of thing, but it seems really far out, because to be successful at this you really have to have some pretty intelligent machines.
Really, deep RL seems to be the only game in town for people doing robotics, or at least the most promising. Hello Disillusion, thanks for joining; I like your Zelda emote. I'm talking about this conference I went to last week; most of it was deep reinforcement learning. There were some hard questions at the end of this presentation about retraining and resets.
The presenter, who I think was the founder of OffWorld, admitted that there must be retraining at the offworld location. With the approach they're taking, deep reinforcement learning, these are not online learning systems, so I feel like that's a brick wall, you know.
A
How
are
they
going
to
adjust
in
real-time
to
changing
situations
and
environments?
I'm
not
sure
they
are
there
are.
There
are
certainly
cases
where
machine
learning
discovers
surprising
solutions,
and
we
saw
that
and
when
I
talked
Friday
about
alpha
star
and
about
how
they,
the
tactics
that
alpha
star
had
against
humans
surprise
the
humans
they
did
not
expect
them
to
do
that,
and
even
in
the
go
alpha
go
again
that
really
surprised
the
the
go
players
that
the
tactics
were
unexpected
and
hadn't
been
seen
before
and
now
go
players
are
using
those
tactics.
An unstructured environment is one that's not controlled, that changes, or that's unknown, and it's good to try and focus on those environments, because those are the environments we want intelligent things to operate in. Today we don't have anything that can operate in unstructured environments. So we talked about what the problem is. One of the problems is creating a reward function, deciding where the reward comes from.
When you have unstructured environments, if you don't know what environment you'll be operating in, how do you know what actions to reward the agent for? So coming up with these RL reward functions is a hard problem, and expensive, because these things have to be trained so much, and that is expensive; it's all compute time. And diversity was a big deal at this conference: needing to learn a diverse range of skills is important, and the only way I can see that the creators of these systems add diversity is by hand-coding it. That means lots of supervision, and learning faster with less supervision is one of their goals, but it's hard to add action diversity without that supervision. Even in those situations (Mark Brown, you mentioned this) they come up with some interesting solutions to problems.
It's the diversity, I think, that creates those interesting solutions, because humans have to sort of inject ideas into these agents, like telling them: do a lot of this and then a lot of that. Then, once they have agents with certain strategies, they play them all against each other. That's one of the things deep reinforcement learning can do. It doesn't need a ton of data; not deep learning, deep reinforcement learning: you don't need a boatload of data like you do for deep learning networks, millions and millions of images. You can create these agents and play them against each other, so they sort of create the data themselves. But you do have to inject strategies, which is what they're calling diversity: different strategies, different ways of being rewarded for different actions, for each agent. Then, as these agents with different strategies play each other out, emergent behavior sort of comes out of that diversity,
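A minimal sketch of that self-play idea (entirely my own toy example, not from any talk): seed a pool of agents with different hand-injected strategies, then let a round-robin tournament generate the experience, with no external dataset required.

```python
import itertools
import random

_rng = random.Random(0)  # seeded so the tournament is reproducible

# Each "agent" is a hand-injected strategy; it sees the opponent's move history.
STRATEGIES = {
    "always_rock": lambda opp_history: "rock",
    "copycat":     lambda opp_history: opp_history[-1] if opp_history else "paper",
    "random_play": lambda opp_history: _rng.choice(["rock", "paper", "scissors"]),
}

BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def play_match(strat_a, strat_b, rounds=50):
    """Self-play: the agents' own games are the only 'training data'."""
    hist_a, hist_b, score = [], [], 0
    for _ in range(rounds):
        a, b = strat_a(hist_b), strat_b(hist_a)
        hist_a.append(a)
        hist_b.append(b)
        score += (BEATS[a] == b) - (BEATS[b] == a)
    return score  # positive means strategy A won more rounds

# Round-robin over the diverse pool: strategy matchups nobody hand-designed.
for (name_a, sa), (name_b, sb) in itertools.combinations(STRATEGIES.items(), 2):
    print(name_a, "vs", name_b, "->", play_match(sa, sb))
```

The point is the structure, not the game: the diversity is injected by hand (the strategy pool), and the matches themselves produce the data.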
That is, combinations of strategies that haven't been tried before. Yeah, diversity is huge, and it's such an interesting topic, because diversity is important in life as well, as we all sort of know. I think Mark said it's such a different way to work, as it only stimulates a single aspect of the brain: it applies simple point neurons, but the thing being emulated is the layers' problem space and manifold topology. Yeah, it's very hard to visualize these things.
There are population effects. Hey, thanks for the follow, Sam Griffin; I've watched your channel quite a bit, always raiding. Thanks for the raid, appreciate it. I'm talking about deep reinforcement learning; I went to a conference about it on Thursday and Friday, and I'm going over some of the presentations I saw there. Awesome. So we're talking about diversity right now in deep reinforcement learning, because it's very important to have diverse...
(There's a cop, for whoever wanted to see that motorcycle: that's a cop.) So, meta-RL is leveraging prior experience to try to quickly learn new tasks. I think it's sort of a way of transferring the learning you've had in other environments to learn new tasks, but it still requires supervision, and all of these tasks, even in meta-reinforcement learning, still need reward functions.
Here's just a peek into some of the math. I did not take pictures of the math at this conference, because I'm not a mathematician, I'm not really great at math, but a ton of these presentations are focused on robotics. So, towards supervised meta-reinforcement learning: there's still going to be a human in the loop. In this example you need a human trainer, and they even talked about language in the loop for reinforcement learning; a couple of presentations talked about this. This is about having basic language understanding, so you can give the agent commands. (Thank you, Chuck, for the follow, appreciate it.) Things like 'move to the left a bit' or 'move up a lot', stuff like that: if you can have basic commands like that, this is sort of the human in the loop, and it gives you the ability to have less intervention, because you're not necessarily retraining.
You can adjust the agent's behavior by giving it a few small commands, and then it will get its rewards and sort of learn from those adjustments.
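Here is a tiny sketch of what that human-in-the-loop adjustment could look like (my own illustration; the command strings and reward values are invented, not from the talk):

```python
# A human command shifts the agent's goal instead of triggering a full retrain.
COMMANDS = {
    "move left a bit":  (-1, 0),
    "move right a bit": (1, 0),
    "move up a lot":    (0, 3),
}

def apply_command(goal, command):
    """Shift the goal position according to a parsed human command."""
    dx, dy = COMMANDS[command]
    return (goal[0] + dx, goal[1] + dy)

def reward(position, goal):
    """The agent is rewarded when it reaches the (human-adjusted) goal."""
    return 1.0 if position == goal else 0.0

goal = (5, 5)
goal = apply_command(goal, "move left a bit")  # the human steers the target
print(goal)                  # (4, 5)
print(reward((4, 5), goal))  # 1.0
```

The reward machinery stays fixed; the language command only moves the target, which is why this needs less intervention than retraining.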
But you still have to have a human labeler. Take 'move closer to the green triangle': once the agent can understand commands like that, it can respond, and it gets a reward when it gets to the green triangle. It gets rewards for sort of higher-level commands, which are the tasks the agent is trying to accomplish. Mark says:
"If we did do the hierarchy with HTM, it would not be reward functions. This will be breaking some serious new ground, and huge new potholes to step in." Yeah, I agree. I think the hierarchy in the cortex is not what they mean when they talk about hierarchy; there's another presentation, I think, that talks about hierarchy in deep reinforcement learning, and I don't think it's the same thing as the hierarchy we talk about in HTM. Okay, so that was supervised meta-learning.
Unsupervised task and reward extraction from the environment: I think you just mentioned that; I'm not sure that's a thing. Oh, here it is. All right, so they start with random reward functions when they're talking about unsupervised meta-learning. Let's see if we can parse this slide. So, for unsupervised meta-learning:
Let's say you have your environment; here's the meta-RL loop. The meta-learned, environment-specific algorithm uses some reward function and has fast adaptation, but, I think, less overfitting to task distributions. Random reward functions: they start off with random reward functions, and those sort of evolve over time, and again they are diversity-driven. I don't know how that works.
When you're diversity-driven using random reward functions, I don't know how you identify diversity. But maybe that's how you do the diversity: you start with a predefined set of tasks, so those tasks could be where the diversity comes from.
Yes, but it's all about priors. A lot of this stuff requires priors; they can't get anywhere without the priors. So the takeaway from this presentation was that supervision is extremely important for solving deep reinforcement learning problems, and this is sort of a common theme. A lot of the takeaways from these presentations were: we need priors, we need supervision, we can't generalize.
So everybody wants to have autonomous robots, right? And everybody's looking to deep reinforcement learning for the answers, assuming that this is the way to get to autonomous robots. I mean, for most of these companies it's the only game in town; it's the most impressive form of deep learning we have at the moment. They're all referring back to these big milestones like AlphaStar and AlphaGo, all the games they can play and beat. And again, this woman made a reference to how children learn.
I'm always very wary about that, but anyway: humans remember their errors. The whole point of this presentation is that humans remember their errors and use them to learn faster, so how can we get deep RL systems to do the same thing? She's barking up the right tree here: learning how to learn is super important for autonomous agents, and she's trying to teach these robots how to move, and the effects of their actions, by separating out the errors that they've made and transferring those learning rates.
So you can separate the model from the memory of errors, as it says right here, or the learning rates, and then, when you go to a new task, you take that memory of all your errors with you, and apparently that's easier to transfer when you're talking about transfer learning. So these are experiments where they're trying to do that sort of thing, but they are super simple experiments, like a robot.
So this is the task they're talking about: a robot picking up nothing, that's one task (I don't know how you get a reward for that), a robot picking up a light object, and a robot picking up a heavy object. So this is a super simple model, one model per joint in the robot arm or whatever, and basically transfer learning that includes the errors and learning rates, and they try to transfer that to the other tasks.
They successfully took errors from a robot picking up no object and applied them to picking up a light object and a heavy object, and essentially it learned faster than starting from scratch. So that's what they did. One takeaway, and I heard this over and over again, is that "online learning is really hard." That's a direct quote.
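To make the "transfer the learning rates" idea concrete, here's a deliberately tiny sketch of my own (not the presenter's method): meta-learn a step size on one task by seeing which candidate shrinks the errors fastest, then carry that step size to a new task instead of starting from scratch.

```python
def sgd(target, lr, theta=5.0, steps=20):
    """Gradient descent on the toy loss (theta - target)^2 / 2."""
    for _ in range(steps):
        theta -= lr * (theta - target)
    return abs(theta - target)  # remaining error after training

def meta_learn_rate(target, candidates):
    """'Remember your errors': keep the step size whose errors shrank the most."""
    return min(candidates, key=lambda lr: sgd(target, lr))

# Meta-learn the rate on task A ("pick up nothing", target effort 0)...
best_lr = meta_learn_rate(target=0.0, candidates=[0.01, 0.1, 0.5, 1.0])
# ...then transfer it to task B ("pick up a light object", target effort 1).
print(sgd(target=1.0, lr=best_lr))  # small error: the transferred rate learns fast
print(sgd(target=1.0, lr=0.01))    # large error: a default rate learns slowly
```

What gets transferred here is not the model itself but a by-product of the errors made on the old task, which is the gist of "taking your memory of errors with you."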
Okay, so here's some more math stuff if you want to look at it. Here's the part about learning the loss function. You have to have this differentiable framework, high-dimensional, and she says both the meta-loss and the learning rate are learned. I don't totally understand what the meta-loss is, so I won't dig into the meta stuff.
Let's see, here's one thing: you still have to provide a task loss function. This is human-coded; you have to create it yourself. You can change the policy with more layers or other architectures, yeah, but they are just starting on this. This woman was very excited about it, but it was a very simple experiment; you're just talking about robots lifting one thing up and down, that's it. And that's state of the art.
Okay, next one: continually evolving machines, learning by experimenting. There were a lot of research scientists from Berkeley presenting here. The idea was, again, to equip reinforcement learning agents with prior knowledge; a theme at this conference was certainly starting with prior knowledge. He did ask some interesting questions. This is the first person to refer to object representation in any way, as far as I could tell. He asked the question: what is an object? Do you have to understand what an object is? And from what I could tell...
Did he really answer it? I don't know. And again, there's this reference to infant learning, saying that play is a form of experimentation. Absolutely spot-on: our brains are wired to enjoy learning and exploration, and that's the thing we have to add to these agents if we want them to be smart and to learn. We also imitate other agents, which is true; there was a lot of talk about imitation.
As far as I can tell, though, they don't think about it in the same way. So, trying, let's say, an inverse model of action prediction instead of pixel prediction. Oh right: usually in these robotics deep RL systems there's a camera and a robot, and the pixels from the camera are what's fed into the deep reinforcement learning system, just like when they do the Atari games; they feed every pixel into the system.
He also said random exploration is limiting, because the actions should increase exploration. That's a good point: if you just start with random actions, you don't explore the whole space; you end up covering just a little bit of it. What you want is semi-random, pseudo-random actions that optimize for exploring new space. That seems to be the right way of doing it, so they came to the same conclusion, which is good: driven by curiosity, which is a good concept.
So this is interesting: they are actually training agents to maximize their prediction error, because that tells them they're in a space that they don't understand. That's an interesting way to think about it, and they call this, in a way, self-supervision, because you're incentivized to go places you don't know and experience things you have no model of. Right, novelty-seeking; yes, exactly. But there's too much to learn.
So you need to use imitation. Again, going back to priors: imitation is about prior knowledge, at least in the deep RL world. So it's interesting that you can treat prediction error as a reward function to train the policy; that's kind of a cool idea. That's what I took away from that one.
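The prediction-error-as-reward trick can be sketched in a few lines (my own minimal toy, not their system): keep a forward model, and pay the agent the model's error, so visiting a transition the model already predicts earns nothing.

```python
# Forward model: (state, action) -> predicted next state, learned tabularly.
model = {}

def intrinsic_reward(state, action, next_state):
    """Curiosity bonus: the forward model's prediction error."""
    predicted = model.get((state, action))
    if predicted is None:
        return 1.0  # never-seen transition: maximally novel
    return abs(predicted - next_state)

def update_model(state, action, next_state):
    model[(state, action)] = next_state

# The first visit to a transition pays out; repeat visits do not.
print(intrinsic_reward(0, 1, 1))  # 1.0 (novel)
update_model(0, 1, 1)
print(intrinsic_reward(0, 1, 1))  # 0 (now predictable)
```

Because the bonus decays as the model improves, the agent is pushed toward exactly the states it cannot yet predict, which is the self-supervision framing from the talk.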
So, here's another Facebook AI Research one, called Habitat, a platform for embodied AI research. I was a little bit harsh on this. He didn't have a completely wrong definition of embodied AI, but from his perspective embodiment means being in a 3D world. Okay, I don't quite agree with that. I think you can be embodied as long as you can take actions in an environment and perceive the changes to the environment based on those actions. That's embodiment. It's not about a 3D world or an any-dimensional world; it's just about a world, a self, action or movement in that world, and the perception of the changes to that world.
So the Habitat thing he was talking about is a project he's been working on. One of the things I wanted to look at was this Matterport3D; if you're interested in 3D environments, it's for exploration for agents. This is really interesting, because they have this technology now where people will come into your house, for example, and set things up all around the house and take all these pictures, so that you can basically fly through your house and get this super-high-resolution graphical interface of your house. This is one of the pictures of someone's kitchen; you can go all through it, all of these objects are tagged, and you can interact with them. Well, not yet; his system creates something on top of that that can interact with them. So they're trying to create, like, an ImageNet for embodied AI, and they're calling it Habitat; it's a 3D simulator. And I was confused.
He kept talking about how the resolution is so high. So, like Unity: when humans play a video game, the refresh rate that they pay attention to... (thank you for the follow, FPjest) ...when humans play video games, the refresh rate that we notice is, I think, like 25 per second, something like that. Or no, it's 60! Sorry, 60 is right there on the slide.
It's 60 hertz; anything over 60 hertz we're not even going to notice, because humans do not process input that fast. Unity, I think, runs at like 25 hertz or something, and it looks fine to pretty much everybody, but it's at a higher resolution. What the machine would rather have is a smaller resolution but a faster refresh rate.
So I was totally confused by this, because if we're talking about hertz as refresh rate, that just seems like a ton of duplicate information. But he's not talking about actual perceived refresh rates; he's talking about how quickly the system... (thank you for the follow, IDontEvenCare12, appreciate it; I'm talking about a deep reinforcement learning conference I went to last week) ...how quickly the system can update, so that a deep RL system can learn fast.
That's amazing, that there's that much compute; imagine the carbon footprint of that. It's just amazing. Okay, next up (this is a dense conference, man): another Facebook AI researcher.
This one was all about the sense of touch, which I thought was cool; really the first person, the only person at all, talking about the sense of touch. He's emphasizing forces when you interact with the world, and this is something that most of the other people doing robotics are not even paying attention to. I mean, there's torque, and usually there are just hard-coded things like 'don't squeeze something too hard', but he's actually got a really interesting sensor that they've created. Most of the rest we already know: humans create models of the world, but robots don't, and he's trying to use tactile sensors. The thing is called GelSight, which is this interesting sensor; all of this was basically a presentation about this GelSight sensor.
It's hard to see, but this is it right here; there are two of them, on either side of this little thing, and this is sort of what it sees. This is an interesting little technology. Imagine you've got the GelSight sensor, and you put a camera inside of it, pointing up. I could probably draw this better.
On top of that cup you put this gel, and then you put a grid on it (you actually draw a grid on it), and it's very plastic, very flexible. Then, when it touches something, you can sort of see here where it says GelSight left and right: these are the impressions left on that gel by where it's touching, like this pitcher. You can see a very pointed one (this is the right one, where it's hitting the point) and you can see it's a lot of pressure, and this one is sort of more rounded, where it's touching the handle. So this is super cool, I thought, because you get the tactile information out of it. He was working on creating robots that could better understand and incorporate these types of tactile responses, and it helped with grasping. Like one of those visualizations where he'd show a picture of two robots
that essentially were grasping the same thing, and you'd say which one is going to fail and which one is not. You can't tell, and neither can the robot, from the picture, whether it's exerting enough pressure on the thing to actually hold it. But with something like this, you can tell exactly how much pressure is being exerted. So that was an interesting presentation.
This next one really is a window into where we're at right now as far as autonomous driving. He's working specifically on situations that you come across in urban driving. Pretty much all of the autonomous driving vehicles we have right now are highway-only; they can't make too many decisions. It's basically: navigate in a highway setting, don't drive it on the off-roads, don't drive it through the city, you can't do that sort of thing. So they're trying to tackle these really challenging environments.
You can't learn skills if there are no representations and no training on those situations, right. Most vision systems and motion systems are currently hard-coded, too. Deep reinforcement learning just needs a simulator, so they're doing all this in a simulator, but for urban environments deep RL is still not very successful, and among the reasons he's pointing to is that they are still using outdated reinforcement learning algorithms; you've got to remember deep RL is only like five years old.
So let me think, I'm trying to figure this out. Their experiments here use this lidar, and this is sort of an example of the image that the lidar gets. They actually put the lidar way up on top of the car, so it can look down and get a better perspective, and it tries to differentiate things that are up high versus down low, and to pull out dimensionality, essentially: reduce the dimensionality so that the RL agent doesn't have as much to process. They're using a variational autoencoder reconstruction pipeline, but there's still nothing similar to object understanding in this; it's still probability distributions, etc. So this is sort of what they're trying to get to: they have a map.
They've got objects that they've detected, like an egocentric state, and then a route. What they're basically working with is a bird's-eye-view image; that's what they're trying to reconstruct and make decisions off of, and this is sort of what the lidar looks like. The input is up here at the top, and then they reconstruct it. I don't know exactly what the difference between these is, but essentially they get to this mask down here.
So they take this and process, process, process, so that they can reduce the dimensionality and the reinforcement agent doesn't have to deal with all of those bits. They're not just feeding the lidar data straight into the deep RL; that would be way too much. So they're doing a little pre-processing.
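As a stand-in for that pre-processing step (my own toy, not their pipeline; the real version uses a learned VAE encoder rather than block-averaging), here's the dimensionality-reduction idea itself:

```python
def encode(grid, block=4):
    """Compress an NxN occupancy grid into one density value per block."""
    n = len(grid)
    latent = []
    for i in range(0, n, block):
        for j in range(0, n, block):
            cells = [grid[a][b] for a in range(i, i + block)
                                for b in range(j, j + block)]
            latent.append(sum(cells) / len(cells))  # occupancy density
    return latent

# An 8x8 "lidar" grid (64 inputs) becomes 4 numbers for the policy to consume.
grid = [[1 if (r + c) % 7 == 0 else 0 for c in range(8)] for r in range(8)]
latent = encode(grid)
print(len(grid) * len(grid[0]), "->", len(latent))  # 64 -> 4
```

The RL agent then sees only the small latent vector, which is exactly why they bother with the reconstruction pipeline instead of feeding raw lidar in.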
Yes, one of the presentations did talk about SLAM; I don't remember which one, but it was as a comparison, I think. So they've got all these model-free algorithms they're trying, and they showed all of these. I didn't take pictures, because it was just math, math, math: DDQN, which is a common one, TD3, and soft actor... soft actor something... Soft Actor-Critic.
A
These are some of the common deep reinforcement learning algorithms that are around right now. So CARLA is the simulator that they use; that's the name of the car simulator, CARLA, which is fun. He said they did it in a simulator because you wouldn't want to run a real RL car in the real world, which emphasizes that none of the self-driving cars are currently using reinforcement learning. This is not a technology that's currently deployed. There's only one place in this whole conference...
A
That I could tell had something actually deployed with reinforcement learning in the real world, which I'll get to. So what they tried to do was the roundabout; I think I have a picture of this. The goal was to navigate a roundabout. That's what this whole presentation's experiment boiled down to: how to navigate a roundabout. And it was basically: get into the roundabout, ignore the first exit, ignore the second exit, exit on the third exit.
A
Don't hit anything. So they're rewarded for progress: getting to the first exit, getting to the second exit, and then finally getting out of the roundabout. They would penalize for collisions and reward for progress through the roundabout, and they tried this with and without the surrounding vehicles. They found that the Soft Actor-Critic model worked best, but this is a very restricted experiment.
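The reward shaping described above can be sketched as a toy reward function. The structure (dense progress reward, checkpoint bonuses for each exit, collision penalty) follows the talk, but every number here is invented for illustration, not taken from the presentation.

```python
# Toy version of the roundabout reward shaping: small dense reward for
# progress, a bonus for each checkpoint (exit 1, exit 2, final exit),
# and a large penalty for collisions. All coefficients are made up.
def step_reward(progress, new_checkpoint, collided):
    reward = progress * 0.1      # dense reward for moving forward
    if new_checkpoint:
        reward += 10.0           # passed exit 1, exit 2, or left the roundabout
    if collided:
        reward -= 100.0          # "don't hit anything"
    return reward

print(step_reward(progress=5.0, new_checkpoint=True, collided=False))  # 10.5
print(step_reward(progress=2.0, new_checkpoint=False, collided=True))  # -99.8
```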
A
Okay, another robot one. I think this was the one that actually has something deployed for reinforcement learning. If you want to look into it, the company is called Osaro. So they're really working to do reinforcement learning for robotics tasks, and their company is trying to make products. Most robots, they say, have only level one autonomy, which means a single automated operation, and there's a ton of robots that have level one autonomy, and you can go look up...
A
The different levels of autonomy. I think there's five of them, five being fully autonomous, level one being just one single automated operation. So I think they're getting to like level two; they're trying to get to level two. So what they're trying to do is essentially target the warehouse space, because it's not an unstructured environment. You know, a warehouse is a structured environment: they know there's going to be a bin here and a bin there, and there's going to be stuff over here that needs to be put over there.
A
Let's see, so these are just some of the robots. Some of them are suction; most of them are suction. It's easier to pick things up with a suction cup than it is to grasp them, because all you have to do is put it in the right spot; you don't have to worry about... So all of these that I saw in her examples were a suction-cup sort of collection. So they have to collect real-world data. This is something you can't just do in simulations.
A
You do testing in simulations, but at the end you have to also test in the real world. All of their data and their models are in the cloud; they're in structured warehouse environments; only deep learning solutions are currently deployed. Here's the one robot that got up here on the top right, and there's a video of this if you're interested in looking at it, but it's not very exciting. Essentially there's a plate, right, and the plate has four bolts coming out of it.
A
This was a logistics application, which was interesting, not a robotics application, and it seemed to be somewhat successful. This is basically supply and demand: I've got all of these centers that supply product and all of these centers that need product, and keeping track of the best way to route the product to these centers at the right times is a big problem.
A
Do you go fast and expensive, which is like shipping with a truck, or do you go slow and cheap, which is like a train? So there are big logistics problems there, and this person, who was more in the operations space, was applying deep reinforcement learning to try and decide when to replenish stock to the different systems and what orders to make, and he seemed to be having some success with that.
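The fast-and-expensive versus slow-and-cheap trade-off can be made concrete with a toy rule. A real deep RL system would learn a policy like this from data; here it's a hand-written heuristic, and the mode names, transit times, and costs are all invented for illustration.

```python
# Hypothetical shipping modes: truck is fast but expensive, train is
# slow but cheap. Numbers are made up for the example.
MODES = {
    "truck": {"days": 2, "cost_per_unit": 8.0},
    "train": {"days": 7, "cost_per_unit": 2.0},
}

def choose_mode(days_until_stockout, units):
    """Pick the cheapest mode that still arrives before stock runs out."""
    feasible = {m: v for m, v in MODES.items()
                if v["days"] <= days_until_stockout}
    if not feasible:
        # Nothing arrives in time; take the fastest (least-bad) option.
        return "truck", MODES["truck"]["cost_per_unit"] * units
    mode = min(feasible, key=lambda m: feasible[m]["cost_per_unit"])
    return mode, feasible[mode]["cost_per_unit"] * units

print(choose_mode(10, 100))  # ('train', 200.0)
print(choose_mode(3, 100))   # ('truck', 800.0)
```

The RL version would replace the hand-written rule with a learned policy over demand forecasts and inventory state.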
A
So I thought that was cool. Okay, I was interested in this one because the word "reason" was in the title: learning to read and summarize and discover with deep RL. This is from Salesforce. So, let's see, he went through three different case studies. I lost interest in this when I realized that "reasoning" was a knowledge graph.
A
So really, the way they defined reasoning was having a pre-existing knowledge graph, which is prior information (it could be a very deep knowledge graph), and reasoning is basically navigating from one node to another. You know, so asking the question "what directors has Tom Cruise worked with," for example: the reasoning would be, okay, let's go to Tom Cruise and list all the movies.
A
So the first step is finding all of the movies, or all of the projects, that he's worked on, and then the next step is saying, for all those projects, who is the director, and then making a list of those things. So that was the type of reasoning that they're talking about, which is navigating a knowledge graph.
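The two-hop traversal can be sketched in a few lines. This is a minimal dict-based knowledge graph, not anything from the Salesforce talk, with just enough entries to show the actor-to-projects, project-to-director hops.

```python
# Minimal knowledge graph: hop from an actor to their projects (step 1),
# then from each project to its director (step 2).
graph = {
    "Tom Cruise": {"acted_in": ["Top Gun", "Minority Report"]},
    "Top Gun": {"directed_by": ["Tony Scott"]},
    "Minority Report": {"directed_by": ["Steven Spielberg"]},
}

def directors_for(actor):
    directors = []
    for movie in graph[actor].get("acted_in", []):             # step 1
        directors.extend(graph[movie].get("directed_by", []))  # step 2
    return directors

print(directors_for("Tom Cruise"))  # ['Tony Scott', 'Steven Spielberg']
```

"Reasoning" in that presentation's sense is learning which edges to follow, rather than hard-coding the two hops as done here.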
A
Okay, that's it. Okay, so a couple of final thoughts, and then I'm gonna take a break, prep for my stand-up meeting, and then we're gonna have a research meeting. So let's do this. This was worth going to, just for my own education. I know a lot more about deep reinforcement learning today, even though I have never created a deep reinforcement learning system. I think I understand the nomenclature a bit better, so it was good education...
A
For me. I was wondering what the state of the art currently was in deep reinforcement learning, and so it was a good indication of what the state of the art is. No one seems to have any illusions about what you can and can't do with deep reinforcement learning. When you talked about AGI at this conference, everybody thought AGI was a long way away. Thinking about it broadly, from the standpoint of deep RL, AGI is a long way away, which, you know, I agree with. There's a ton of challenges.
A
The major challenges are generalization and training speed and scalability, right? Those are the things that everybody acknowledges are the major challenges for deep RL, and no one really understands how to attack those problems. They know that we need benchmarks, so they're working on benchmarks, and people are working on lots of different ways to attack those problems. A lot of the tactics being applied today involve prior knowledge, which goes against the idea of generalization.
A
So one of the ways you can try to decrease training time and increase scalability is by injecting prior knowledge, which is completely the opposite of what you want to do to achieve generalization. So there's this dichotomy, and everyone agrees that this is a hard problem. There are a lot of hard problems that need to be addressed in the deep reinforcement learning world, and everyone, I think, understands that.
A
You know, there were a lot of references to that, like "if we can do that, we can do anything," sort of thing. But the problem is that it just doesn't transfer. It's very, very hard to take those very hard-coded solutions in those specific environments and transfer any of that learning into environments that are even a little bit different. Even a little bit different: that's a big, big challenge. So that's the recap of the conference. It was totally interesting, and the conference organization, ReWork, seems to do a good job.