From YouTube: 19 - Interpretability - Been Kim
Description
Deep Learning for Science School 2019 - Lawrence Berkeley National Lab
Agenda and talk slides are available at: https://dl4sci-school.lbl.gov/agenda
Her research focuses on building interpretable machine learning, and the vision of her research is to make humans empowered by machine learning, not overwhelmed by it. Before joining Brain, she was a research scientist at the Allen Institute for Artificial Intelligence and an affiliate faculty in the Department of Computer Science and Engineering at the University of Washington, and she received her PhD from MIT. Please help me welcome Been.

I'm curious, and I'm sure some other speakers have done this, but I'm wondering if we can do a raise of hands so that I have a sense of who has how much experience in machine learning. Who has trained a machine learning model of any kind? Oh! Who has trained a deep learning neural network of any kind? Who uses TensorFlow? Just kidding. Who uses PyTorch? Interesting. Do you regularly read deep learning papers?
Oh my. Have you read interpretability papers? Oh. Have you tried (not trained, but used) an interpretability method? Alright. Have you designed your own interpretability method? All right, cool. This gives me a really good idea.

All right, so I'm gonna talk. Who doesn't want to start their talk with xkcd? So here's xkcd, asking a friend: "This is your machine learning system?" "Yep. You pour your data into this big pile of linear algebra and then collect the answers on the other side." "What if the answers are wrong?" "Well, just stir the pile until they start looking right."

So now you have all trained, or most of you have trained, a model. Who thinks there is some truth in this cartoon?
All right, yes, I agree. So I think there's some truth in this cartoon. And is that a problem? I think so, because machine learning is such a powerful tool. We use it to make money, perhaps to help people improve their lives, and there's this whole industry and hype around it, which myself, and perhaps you also, are part of. But when you use this powerful tool without knowing how it works, you might do something that you didn't intend to do.

Over the last decades, the machine learning community has been responding to the need to understand how this tool works. This is the number of papers; I just did a Google Scholar search. But it's really important to know that this is not a new problem. Who here has read expert systems papers from the 70s and 80s? Yeah, so a few of you. The problem of interpretability is something we had already thought about as a community back in the day, but I think it's safe to say that we quite haven't solved the problem. So why now?
Why do we care now? Complexity and prevalence. Complexity: we have a lot of computers, computers are cheaper, we can put a lot of data into those cheap computers, and we can make models a lot more complicated than we ever could before. Things are much more complex than before. Prevalence: they're everywhere. Let's say you want to escape from technology and you want to go camping this weekend.

Chances are that the way they manage the storage of your camping equipment, they might be using machine learning too. So it's everywhere, and we now have to care. But you might say: "Well, hang on, I heard this thing that decision trees are interpretable." And it is an important question. If that's true, then we should all study decision trees, we should just optimize the hell out of them, and then we're done, right? So let's do a little bit of an exercise.
I'm gonna have you do an action at each slide. I'm gonna show you a tree that looks like this, and you follow it. Are there more than a hundred people in this room? I think there's more than 100 people in this room, so that's a yes. Less than 200? I think so, okay, cool. And it is definitely not raining, because it's California. And once you do yes and yes: left hand, love it. So if it's a no and a yes, you stop. Okay, so at each slide, and we're gonna see three trees.

Alright, I think... oh, this is like some left hands, some right hands, and some left hands where they're instantly in a different time zone. That's true. Maybe their weather is sunny, yes, and then the time is morning: left. Oh, maybe it was confusing. Oh yeah, it's not the mirror image of it. Okay, so let's stand; all right. Now I'm gonna add one or two more layers to this tree, okay? And I would like you to do the same thing. You ready?
Some left hands, some right hands, and some clapping. All right, let's see. Cloudy? No. Are you greater? Yes. Afternoon? No. Are you greater? Yes. Are the people less? No. People in the first row, less? No. So I think it's clapping. Good job. So this has five to six features. This is very small data, five to six features, and this is like a five-layer tree; it's like a really tiny problem in machine learning, right? And in addition: can someone tell me what the overall logic was?

So maybe it was a problem with the interface. Maybe the tree, with branching coming out of everywhere, was just not good. So maybe something like rules, rule lists, would work. Something like this: if this condition was not true, then you go to ski or not; if you want to check whether there's a new episode of Rick and Morty, you have to check that everything else was false. Or maybe rule sets, something like this, where we clump them together into sets: an OR or AND relationship between these modules, or not.
Some of these solutions do work for some applications, so I hope I convinced you that we still need to stay in for the rest of the lecture to hear what other people did in their work. Then you might say: "Well, that sounds hard. Even five-layer trees are too complex, so is it even possible at all? Are we ever gonna get there? There's this superhuman-performance network like AlphaGo; are we ever gonna understand it?" Well, the point of interpretability is not about understanding every single bit about the model.

Not quite, and I'll get to that in a bit, but let's just define what our goal is. This is my goal for interpretability, and yours might be different; in fact, I'll come back to what your goals might be, because you're all scientists, from what I heard. But my goal is the following. I think the goal should be building tools so that we can help people use machine learning more responsibly. I really do believe that a lot of people want to do the right thing.
They build a model to serve a big population in the world, and when they add a new term in the cost function, they really do want to understand: what's the impact of that extra term? What kind of feedback would it create? What kind of social impact might it have? But a lot of times they may not have the right tool to investigate and answer that question. So a tool that enables answering that question in some way is useful, and that is my goal.

So this is not a complete list, but here are some of the ways we can help. One: I think the tool should help you make sure that your values are aligned in the model, such as fairness, and that your domain expert knowledge, such as medical knowledge or scientific knowledge, is well reflected in the model when you want it to be. And I think it's extremely important that we keep in mind that this is not just for computer scientists, or for you guys who have all trained machine learning models or neural networks.
It has to work for everyone. In the medical domain, doctors have the critical knowledge to make the right decision, but they may not have a computer science or machine learning background. It's rare to see someone who has both an MD and a CS PhD; they do exist in Brain, there are a couple of them, but it's rare. And those are the times when interpretability becomes most critical. It's also important to talk about what our goals are not. It's really not our goal to make everything interpretable. There are plenty of applications out there where interpretability is overkill.

You just don't care that much, so don't spend energy on it if that's not your goal. It's not about understanding every single bit about the model, but it's also not about being against developing AutoML or other complex nets or architectures; we just have more work to do. And, importantly, it's really not about gaining trust. Gaining trust and interpretability are separate problems. In fact, if your only interest is to gain trust, the best strategy is to look at psychology, because humans are very easy to deceive.
There are well-established results showing that we're very gullible, and if that's your goal, you should just look at that, not at revealing the truth, which is what interpretability is about. What do I mean? There's this recent paper, a beautiful paper that came out a couple of months ago now, where they thought that, given a medical image, machine learning was capturing something to predict facts about the patient: whether the patient will deteriorate after a fracture. But they found out later that it was really reading which machine the image was taken with, what model the machine was: all the confounding variables that you don't want the machine to get a signal from. And in this case, what an interpretability method should really do is tell the truth, and tell the humans that you should not trust this model.

So that sounds good: now we just go back and write down what our goals are, and optimize them mathematically. And that's a great start as a computer scientist, right? But not quite, because interpretability is a fundamentally underspecified problem. What does that mean? Well, I'll give you some examples. Safety:
can we figure out all the possible unit tests such that, if a car passes them, then it's a safe autonomous car? No, right? You're familiar with the trolley problem, the moral discussion, an interesting hypothetical problem: you're driving a train, and if you pull a lever you kill these four people who happen to be roped to the track, or you kill this one person, and you only have those two options. And they ask you questions: what do you think if this one person was pregnant?

What if this one person was related to you? These are really difficult questions, hypothetical, but they kind of shed light on the fact that there is no right answer. It's a very difficult, underspecified problem: what is safe autonomous driving? Then there's science, and this is a good audience for this one. You can do machine learning to discover something new, and because it's something new, you don't know how to write a loss function for it, so it's underspecified. Other times you might have mismatched objectives. What do I mean? Say you're a doctor.
So, because it's a challenge in the problem definition, it's not something more data or a cleverer algorithm can help with. In fact, if you think about it, regular supervised machine learning has a similar problem, because accuracy, which is great and much more specific than interpretability, isn't everything you ever want. Maybe it's precision, maybe it's recall, maybe it's something about this particular group that you definitely want to protect; maybe it's about accuracy in that group.

Probably a good example is AlphaGo. I find AlphaGo amazing (I wasn't involved in any of it), because the way it plays Go is so different from how humans would have done it. Go players looked at how it played move 37 and they got excited. They said: "this is an alien Go player; it's playing beautifully; I would like to learn from it." And for that kind of problem, perhaps you don't need interpretability: it's a beautiful problem as it is, and you're learning something.
Stock prices are something else I usually use as an example, but it depends on how you feel about finance businesses. We also don't need interpretability for sufficiently studied problems, like airplanes. How many people, maybe the aerodynamicists, actually know exactly why planes fly? Right: not in theory, but in practice. You measured it: you used the pitot tube to measure the pressure, the Navier-Stokes equations; you read it, and it's hard. I don't really exactly understand it.

I'm from a mechanical engineering background, but I trust it, because I'd rather take a plane to go to Boston than a road trip; that takes a long time. So I trust it, and I think most of the time it works. So I would rather use it, and I don't worry about it too much.

So if machine learning comes to a stage where a lot of people kind of accept it, because empirically it works and you accept the risk, then you may not need interpretability. And some other times, when you don't want people to game the system, maybe you don't want to reveal everything. A good example is a credit score. So we don't always need interpretability.
So that sounds good. Once you decide that you need interpretability, that's great; then it comes with all the cousins: fairness, accountability, trust, and causality, right? Not quite. Fairness, trust, and the other things are not the same thing. Interpretability may help reveal fairness problems, and perhaps help users gain trust, but not the other way around. They are different problems. So, great.

So now you have all the goals and everything in mind, and you've made your beautiful interpretability method, and you go: "Okay, here's my method, look at this picture, it's beautiful." And that's it, because there's no good way to evaluate interpretability methods? I hear this a lot, and I think it hinders the progress of this field, because we can evaluate interpretability methods. I'm going to speak a little more about this idea. The high-level idea is that you can make a toy dataset such that you know
where in the image, for example, is important, or should be important, because you made up the dataset, and then test whether your attribution method (a method that identifies which part of the picture is important) is doing the right thing. So I'll talk about that a little bit. So: you can evaluate interpretability methods.

All right, so now we're gonna go into actually studying the technical stuff; we're doing well on time. Alright. So before jumping in, let's talk about some ingredients for interpretability methods. First, you can think about interpretability as an optimization problem, where you have some quality function Q that evaluates your explanation E. Your goal, under this quality function, is to find the best E that maximizes that quality function, and the quality function can be measured via human experiments.
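In symbols, this ingredient is just the following (a minimal formalization, using the notation implied here: E is an explanation from some candidate space, and Q is the quality function, scored for example by human experiments):

```latex
E^{*} \;=\; \arg\max_{E \in \mathcal{E}} \; Q(E)
```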
Now, you have to explain this complex thing to humans. The end user, the customer, matters. The only reason we have interpretability methods is for final human consumption, and who that final user is is very important; it fundamentally changes the problem of the interpretability method. If you're a newbie, then I can only give you so much; if you're more of an expert, I can pack in a lot more information.

The things you might want to learn from the model might be very different depending on your task, and that is very, very important. Whether you're interested in understanding, overall, what the model does for this particular class of credit card approvals or rejections is different from wanting to explain just this one person: one customer who's complaining about why his or her credit card was rejected.

The former we call a global explanation, and if you're just interested in one data point, that's what we call a local explanation. And of course it depends on low- and high-stakes task domains. Sometimes you have a lot of time to look at one data point, but if you're a medical expert and you have to make a decision like this, then you don't have that much time. So there are a lot of trade-offs that you have to think about when you design these methods.
There are three types of interpretability methods. We're going to go through some of them more briefly, and I'm going to spend a little more time here. First is explaining data: you don't have any models, no models involved, but you just want to look at the data and get some understanding of how your data looks. Is it garbage? Is it missing too many fields? In fact, a lot of the problems you see in the real world, or in research problems, come from the data. You have to look at the data, a lot of the time.

Second, you might have the bandwidth to build a new model: to train a new model from scratch so that it's inherently interpretable. We'll go over some methods for how you might be able to do that. But sometimes, often at Google, you have a legacy model that a lot of people worked on for many, many years. I can't just come in as a new Googler and say, "I'm going to change your model; let's now use all interpretable models." That just doesn't work. So then, in that case, what do you do?
I remember when I was in college: an hour-and-a-half lecture is long. If I'm interested in the subject, like psychology, I'm in; but if I'm listening to, say, chemistry (sorry to the chemists in the audience), I am falling asleep after 45 minutes; I'm gone. So please ask questions if you're falling asleep, and you can stand up and stand in the back; that's no problem. Okay, questions welcome. All right.

So, first: explaining data. We have a running example, simple 2D data. You have the first class, blue circles, and then red Xs, so this is two-dimensional data. One of the simplest possible ways, and I hope you're doing this, is getting the mean and standard deviation of the data for the two classes. But for instance, if you gather these two classes that have a couple of features and they completely overlap, then you kind of have no hope.

One way to go beyond this is, instead of giving you just a mean and a variance, I give you an example. Let's say we cluster them, and instead of saying "oh, the mean is like two and three," I give you this picture and say: there are two clusters; one cluster's dogs look like this, and the other cluster's dogs look like that. Simple k-means works surprisingly well. But what about these guys?
B
They're
gonna,
be
left
alone,
and
sometimes
when
your
classifier
or
clustering
method
is
acting
funky
very
often
that
these
guys
are
the
problems.
So
you
definitely
want
to
know
about
these
guys.
So
here's
a
one
way
to
do
it
with
what
we
did
is
is
a
paper
from
2016
Europe's.
We
picked
a
prototype
that
is
kind
of
maturity
that
represents
the
maturity
and
then
given
we
feel
a
distribution
over
this
prototype
first
and
then,
given
that
we
try
to
learn
another
distribution
that
captures
the
difference
between
majority
and
the
overall
distribution.
B
So
we
want
to
capture
these
minorities
that
are
not
too
minor,
like
it's,
not
outlier,
but
it's
significant
enough
that
you
have
to
see,
and
we
of
course,
leverage
this
kernel
MMD
trick.
That
gives
you
really
nice
guarantees
and
computational
efficiency,
and
here
are
some
examples
results.
We
have
two
dog
classes
prototypes.
You
see
the
face
of
the
dog
and
three
typical
pictures
for
criticisms.
We
see
dog
in
a
rabbit,
costume,
cute,
Santa,
Claus,
black
and
white
picture
pictures
with
dogs
without
their
kind
of
face
their
faces
are
hard
to
see.
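Here is a rough sketch in the spirit of that prototypes-and-criticisms idea. It is not the paper's exact MMD-critic objective: it greedily picks prototypes using a squared-MMD-style proxy, then picks criticisms where the kernel witness function is largest in magnitude.

```python
# Simplified prototypes-and-criticisms sketch with an RBF kernel.
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def prototypes_and_criticisms(X, n_proto=3, n_crit=2, gamma=0.5):
    n = len(X)
    K = rbf_kernel(X, X, gamma)
    col_mean = K.mean(axis=1)      # similarity of each point to the data
    protos = []
    for _ in range(n_proto):
        best, best_gain = None, -np.inf
        for j in range(n):
            if j in protos:
                continue
            S = protos + [j]
            # proxy objective: coverage of the data minus redundancy
            gain = 2 * col_mean[S].sum() - K[np.ix_(S, S)].mean() * len(S)
            if gain > best_gain:
                best, best_gain = j, gain
        protos.append(best)
    # witness: where data density and prototype density disagree most
    witness = col_mean - K[:, protos].mean(axis=1)
    crits = [i for i in np.argsort(-np.abs(witness)) if i not in protos][:n_crit]
    return protos, crits

X = np.random.RandomState(0).randn(40, 2)
X[:5] += 4.0                       # a small minority cluster
print(prototypes_and_criticisms(X))
```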
So those are some examples of explaining the data itself. Of course, I'm sweeping a huge body of literature from the visualization and HCI communities under the rug, but there's a lot more work that we have to do here. I'm not an expert in HCI, so whenever I go to HCI conferences I'm like, "please help us, we need your expertise." But next up, we're gonna build a model. The next type of interpretability method is building inherently interpretable models. So what are they? Again, our data: points, two classes.

First, one of the simplest ways to do it is to learn rules. You've seen this before; a decision tree is a type of this style of model. There's lots of work on rule lists. Cynthia Rudin from Duke, especially, has done some great work, including learning certifiably optimal rule lists, so you can run this and you have the optimal rule list under some assumptions.
You can fit a simpler function for each feature, for example. So here there's a lot of blue dots, and I projected them onto this dimension, the f2 dimension (I got rid of f1 and projected everything onto f2), and you might have some nice distributions like this. And in fact there's a nice family of methods called generalized linear models, which Rich Caruana did some awesome work on.

You can fit a linear model, of course, or you can put another function on your prediction variable y and again have a linear model; sigmoid functions are a type of this. These are all kinds of generalized linear models. You can make it a little more expressive by fitting functions for each feature; that's another way to do it. And what Rich Caruana showed in his paper, from the 2000s (I'm forgetting exactly when; a while ago), is that he can match a highly expressive neural network's performance with his simple generalized additive model.
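Here is a minimal backfitting sketch of a generalized additive model, y ≈ b + f1(x1) + f2(x2), with each f_j a simple bin-wise smoother. This illustrates the shape of the GAMs discussed here, not Caruana et al.'s exact training procedure (they boost shallow trees per feature).

```python
# Backfitting a tiny GAM: alternate fitting each feature's shape
# function to the residual of the others.
import numpy as np

def fit_gam(X, y, n_bins=10, n_iters=20):
    n, d = X.shape
    edges = [np.quantile(X[:, j], np.linspace(0, 1, n_bins + 1)) for j in range(d)]
    bins = [np.clip(np.searchsorted(edges[j][1:-1], X[:, j]), 0, n_bins - 1)
            for j in range(d)]
    b = y.mean()
    f = [np.zeros(n_bins) for _ in range(d)]       # per-feature shape functions
    for _ in range(n_iters):                       # backfitting loop
        for j in range(d):
            partial = y - b - sum(f[k][bins[k]] for k in range(d) if k != j)
            for m in range(n_bins):                # bin-wise mean smoother
                mask = bins[j] == m
                if mask.any():
                    f[j][m] = partial[mask].mean()
            f[j] -= f[j].mean()                    # keep the intercept identified
    return b, f, edges

rng = np.random.RandomState(0)
X = rng.uniform(-2, 2, size=(500, 2))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.randn(500)
b, f, _ = fit_gam(X, y)
print("intercept:", round(float(b), 2))            # each f[j] is plottable
```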
Another method you can use is model distillation. You have a complex model, a neural network, and it did all the heavy lifting. Then you train another model using the input and output of that model: you forget about the true labels, and now you train a model using the input and the predicted labels from this heavy-lifting neural network, and you build a simple model from that. That's called model distillation, or a type of model distillation.
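A minimal sketch of that recipe, on synthetic data with scikit-learn: train a neural-net teacher, then fit a small, readable tree on the teacher's predicted labels instead of the true ones.

```python
# Model distillation sketch: the student mimics the teacher, not the data.
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_moons(n_samples=1000, noise=0.2, random_state=0)

teacher = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=1000,
                        random_state=0).fit(X, y)
teacher_labels = teacher.predict(X)       # forget the true labels

student = DecisionTreeClassifier(max_depth=3, random_state=0)
student.fit(X, teacher_labels)            # train on the teacher's outputs

print("fidelity to teacher:", student.score(X, teacher_labels))
print(export_text(student, feature_names=["f1", "f2"]))
```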
What I really find powerful, from my experience, is example-based interpretability methods. What they are is similar to what we looked at with k-means: instead of saying that this class is defined by this rule or this function, we just give you the examples. This class looks like these; that class looks like those.

Why is this powerful? Humans use a lot of context in their reasoning. In fact, there are famous studies on firefighters and the way they make quick decisions when something happens: "okay, you go there, you go to that post, and we're gonna do this." The way they do this is what they call recognition-primed decision making, and what that is, is they basically think about all the other situations they have seen before.
One way to do a post-training explanation is to simply get rid of one feature and see how things look; that's typically called an ablation test. So, for example, if you have a picture, or categorical data, you remove one of the factors, like age. Basically, you just shuffle that feature, so that age no longer gives you any information (it's not a real human anymore), and you see how much the accuracy drops. If the feature was holding a significant signal, then accuracy will drop more than if it wasn't. That's the ablation test.
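A minimal sketch of that shuffle-one-feature test, on synthetic data: shuffle a single column so it carries no signal, and measure how much held-out accuracy drops.

```python
# Permutation-style ablation test sketch.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = rng.randn(2000, 4)
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)     # features 2, 3 are noise

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
base = model.score(X_te, y_te)

for j in range(X.shape[1]):
    X_shuf = X_te.copy()
    rng.shuffle(X_shuf[:, j])                     # destroy feature j only
    drop = base - model.score(X_shuf, y_te)
    print(f"feature {j}: accuracy drop {drop:.3f}")
```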
Beyond simply doing that, there's a smarter way of doing this. There's a paper by Pang Wei Koh and Percy Liang at Stanford; they use a tool called influence functions, which is pretty well known outside of the machine learning community, to determine which images in the training set most influenced the prediction for this picture. What they showed (this is kind of a hypothesis, I guess) is that Inception is more expressive and learned better representations, so it picked out fish that are actually relevant, whereas an SVM, which might have learned more superficial features, had just picked out pictures with similar colors.

The challenge with this method is that it's computationally very expensive. You've got to invert some matrices, and once you resort to pseudo-inverses, things kind of fall apart. But there has been a lot of nice follow-up work to make this method a little more efficient.

The second method, which I'm gonna spend a little bit of time on, is sensitivity analysis: fitting a linear model or function, gradient-based methods. These are all of a similar flavor. What do I mean? Well, here's LIME. You've probably heard of LIME? Okay, cool. So this is one of the most widely used methods. What they do is really simple and elegant.
Here's what it is. You have this decision boundary, say a red class and a blue class, and you have a data point that you want to explain. What you do is randomly sample data points around it and then fit a linear function. Now you have a linear function, and you look at the weights and see which feature was important or not important. Now, if you're thinking, "does that even work?", your intuition might be right.
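Here is the core of that recipe as a minimal sketch (the full LIME library adds interpretable binary features, kernel choices, and feature selection on top of this):

```python
# Local linear surrogate sketch: sample around x, query the black box,
# fit a distance-weighted linear model, read off the coefficients.
import numpy as np
from sklearn.linear_model import Ridge

def lime_explain(predict, x, n_samples=500, width=1.0, seed=0):
    rng = np.random.RandomState(seed)
    Z = x + width * rng.randn(n_samples, x.size)   # local perturbations
    p = predict(Z)                                 # black-box queries
    d2 = ((Z - x) ** 2).sum(axis=1)
    w = np.exp(-d2 / (2 * width ** 2))             # closer samples count more
    surrogate = Ridge(alpha=1.0).fit(Z, p, sample_weight=w)
    return surrogate.coef_                         # local feature importances

# toy black box: probability driven almost entirely by feature 0
black_box = lambda Z: 1 / (1 + np.exp(-(3 * Z[:, 0] + 0.1 * Z[:, 1])))
print(lime_explain(black_box, np.array([0.5, -1.0])))
```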
It's a very sensitive method: depending on how you sample the data, your linear classifier might be all over the place. There's nice work by a couple of people from MIT that showed the lack of robustness of LIME, and after that a body of people moved in to improve the robustness of such methods.

Another body of methods is called saliency maps, and I'm gonna talk at length about these. What is a saliency map? You have a picture like this, and you end up getting a picture like that. What is it? Well, you just look at every single pixel and take the first-order derivative of the probability of, say, "starfish" with respect to every single pixel in the image. Intuitively, that means: if I change this pixel a little and the probability changes quite a bit, then the pixel is important.
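A minimal sketch of such a vanilla gradient saliency map in PyTorch. A tiny random CNN stands in for the classifier, just so the snippet runs end to end.

```python
# Vanilla gradient saliency: d(class score) / d(input pixels).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                      nn.Linear(8, 10))
model.eval()

def vanilla_saliency(model, image, target_class):
    image = image.clone().requires_grad_(True)
    score = model(image.unsqueeze(0))[0, target_class]
    score.backward()                              # gradient w.r.t. pixels
    return image.grad.abs().max(dim=0).values     # max over color channels

image = torch.rand(3, 32, 32)
sal = vanilla_saliency(model, image, target_class=3)
print(sal.shape)                                  # torch.Size([32, 32])
```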
It's just a simple sensitivity test, and we built bells and whistles around it to make it fancier; that's the whole body of work called saliency maps. A lot of them are based on gradients, and you end up getting images just like this, where this is the vanilla gradient, and with this other method (oh yeah, this is my work, and I'll criticize this work very soon) after you do some fancy things you get something cleaner.

The first problem with this type of approach is that when you look at just one data point at a time, you're looking at basically this: here is one of the features, f1, and you're taking the first-order derivative of the probability of the class. Now, because you're locally fitting this linear function, you might have these two data points, that guy and that guy, that are very similar in both values of f1 and f2 but have completely different explanations, because God knows what your function does. Your function might be very peaky, and then the explanation will be all over the place.

Well, what's wrong with that? Well, a couple of problems. If you present explanations that are conflicting, and humans look at them and say, "oh, these two patients look very similar, but they have completely different explanations," it's only a matter of time before you completely lose their confidence, and the expert might say, "oh yeah, no, no thank you, machine learning is not ready." And the second problem is that, even if this is true, as a human it's really hard to grasp the whole idea.
We have a limitation in our memory. If you want to have some general notion of how this model works, this may not be the way you want to do it.

Here's a second problem. So again, just to recap our problem definition: you have a model that someone else trained, and your goal is to answer what the evidence for the prediction was. What caused the prediction? So, for example, you have a neural network, you pass in a picture like this, and it predicts that this is a junco bird. Cool. So then I'm asking: why was this a junco bird? A saliency map will give you something like this, and what this means (and this is important) is that these pixels are the evidence for the prediction: these pixels are why this bird is predicted as a bird. Cool, that sounds fine. Then we can ask a simple sanity-check question: if these pixels are the evidence for the prediction, then when the prediction changes, the explanation should change.

In fact, we can think about an extreme case: when we make the network garbage, just randomize everything, then the explanation should really change. Now the network is garbage and doesn't know anything about the bird, so the probability is very low that, by accident, you would get the same explanation. So we did that sanity-check work with some awesome PhD students, including Julius, at MIT at the time. What we did is the following: we took a network that's been beautifully trained, predicting well, doing the right thing, and we get this.
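Continuing the PyTorch sketch above (reusing `model`, `image`, and `vanilla_saliency`), here is a minimal version of that model-randomization sanity check: reinitialize the weights, recompute the saliency map, and compare. A rank correlation near 1 between the two maps would be a red flag.

```python
# Model-randomization sanity check sketch (in the spirit of this work).
import copy
import torch

def randomized_copy(model):
    rand_model = copy.deepcopy(model)
    for p in rand_model.parameters():      # re-draw every weight
        torch.nn.init.normal_(p, std=0.05)
    return rand_model.eval()

def rank_correlation(a, b):
    ra = a.flatten().argsort().argsort().float()
    rb = b.flatten().argsort().argsort().float()
    ra, rb = ra - ra.mean(), rb - rb.mean()
    return (ra * rb).sum() / (ra.norm() * rb.norm())

sal_trained = vanilla_saliency(model, image, target_class=3)
sal_random = vanilla_saliency(randomized_copy(model), image, target_class=3)
print("rank correlation:", rank_correlation(sal_trained, sal_random).item())
```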
Based on these two pictures, the belly looks pretty important still, and the cheek is still important; and if they look the same to you as they do to me, I think that's a problem. So we did this for many methods. Let me actually show you the next picture... oh, it's not here; we added a lot more methods. So we did a lot of different ones: this is the vanilla gradient, this is SmoothGrad, integrated gradients, and so on. In the last column you're looking at a completely random network, completely random, completely reinitialized, and even in the first and second columns here the prediction is random; the prediction is completely garbage. But except perhaps for Grad-CAM (although even it somehow recovers where the bird was), a lot of other methods seem to be pointing at where the bird is. So we were shocked by this result, and we asked ourselves: are we crazy?

Yes? Thank you, great question. Yes: the darker red or purple is more important than the lighter one. I'm not completely sure. So, when the network is completely random, it's just a random projection, right? It's an input image, and it takes that image and randomly projects it into some higher dimension, and you shrink it, right? I'm not sure how much, and I can make these random weights any way I want; they don't have to be Gaussian.
No, this is completely random. I don't think we're doing anything; we haven't trained anything, right? So there's no batch normalization during training, nothing. But we did think about how to characterize it a little bit. I'll talk about the deep image prior work, which came out around 2017 and observed a very similar symptom. The culprit is the convolution operation. It turns out the convolution operation itself is such a good feature extractor, even if it was never trained, which is kind of shocking, and I don't understand why.

But it just is. So if you have a completely random network and you pipe an image through it (so this is untrained), and you collect activations in some layer and use those activations to train a separate model, like a linear classifier or an SVM; again, this is like a random projection, right? But apparently it works pretty well. Yeah, it's fun to be in this field.

Every day shocking results come out. Great question, great question: we don't know, and I think that's the problem. So I'll talk about a method that I thought of after seeing this failure mode, to kind of get over the idea of being constrained to pixel space: we can use higher-level concepts, like color, like texture, or like something else, and I'll talk about that method. There... although we had like a thousand; we took Inception v3, which has a thousand classes. Great question.
So, if you take the vanilla gradient (and there's a lot of work doing this; they call it discriminability): if you have two classes in the image and you take the gradient with respect to one class versus the other, then these two should be different, right? Unfortunately, a lot of methods, including mine, don't really have that. If there's a thing in the middle, a dog and a cat, then they will happily highlight all the things in the middle.

There are methods that do try to distinguish them, and add this as a loss function to do so. But I don't think, in this sanity-check context, anything would change just because you have that. Other questions? Yeah, I like this question, and I get this question a lot. I think the problem is that the promise was that these pixels are why this bird is a bird.

It didn't say, "here is a set of pixels that may or may not be the cause of the prediction." If the claim was about recall ("I will give you some hundred things, and maybe one of them might cause the prediction"), then perhaps. But even then, I have some questions: why do those two happen to be exactly the same? My conjecture is that the convolution operation itself is just looking at the image.
It loves the image and loves edges, because it is kind of built to pick up edges, and somehow the first-order derivative, even though it was taken with respect to the final logit layer, is still highly biased toward just the image itself, and it has less to do with the prediction. It might have something to do with the prediction, but not as much as you think it does.

Extending this further, I think about this a lot: you have a cancer image, right, and some big company puts out a medical tool that says, "here's a cancer image, here's your own cell specimen," and then your doctors will look at saliency maps to tell whether you have cancer or not. I would be very worried. I don't know if it's giving you precision or recall. By having some attention map, am I biasing the doctor to miss this other feature that he or she should have seen? That's what I would be worried about.

Yes, I see. I wish I had other pictures; we're not cherry-picking. To answer (and actually this is a good point to point out): we do quantitative evaluations in the paper over many, many pictures, ten thousand or so, where that's not necessarily true, and we use three different metrics, because none of these metrics is perfect: Pearson correlation coefficients and other computer-vision methods that compare the similarity between two things. I think it depends on the method.
So let's look at it. I think for this guy, this thing is definitely more birdy than this random thing; this one is completely random, and it's the only one that's random. But Grad-CAM... I don't know what it's doing, but it kind of, somehow, recovers the bird again. So I think it depends; I think that's an interesting question. We do see amazing performance on language models, which don't have convolution layers, but I think there's something special about the convolution layer, for sure, that a lot of us don't understand. Maybe we need scientists to work on it.

I would take this... you have a question? Okay, all right, we can talk about this more later on, but let's move on to my next crazy experiment. We were shocked by this, and we said, "okay, let's do something crazy again." What we did is we took the MNIST dataset, we shuffled the labels, and we trained the network. So this time we trained the network, but with random labels, and we got some saliency map results.
Remember, this network never learned what 0 is, because 0 through 9, all the digits, were randomly labeled. However, if you look at this explanation, I can still kind of see the number 0. Yeah, it is pretty different, but still, if I were given just that explanation, I'm not sure I could tell that these are saliency maps from a random model.

Right, so what can we learn from this? Well, it's something I encounter very often: our confirmation bias. This entire field of saliency maps has been developed over many years; many people looked at these pictures, and they expected to see a bird; they saw a bird; they liked it; "this is right." Including myself.

I had worked on saliency maps too. And that's something that we see repeatedly: when humans see evidence that agrees with their hypothesis, we love it. "Of course it's right; of course I'm right." That's a feature of us, maybe, not a bug, but it's something that we have to take into account when we're designing these methods. I briefly talked about this with the deep image prior work; another paper also mathematically proved that some of these methods are just reconstructing the image.
They have nothing to do with the prediction. Around the same time, however, some papers went out of their way to check with doctors and experts whether having these maps is helpful, and they showed that it was. So perhaps there is something there; perhaps showing some candidate set of pixels is useful in some way. But we need to do a lot more study to figure out what it is and how to amplify that signal.

Recent work by Sanjeev Arora's group from Princeton followed up and suggested a very simple, elegant fix to pass this test, which I find pretty awesome. An awesome start. But, you know, this is really a low-bar test; come on, we should have been passing this test a long time ago. I can't believe it took many, many years to come up with this dumb test. So I paused and thought: what if there were harder tests? Can we come up with a harder test? So this is what we did.
This is work with Sherry. The goal is to have a benchmark so that you can evaluate your interpretability methods; this is what I roughly alluded to earlier. We make up a dataset such that we know which part of the data is important, or shouldn't be important, and we see how the attribution methods, or saliency methods, map to it. So how did we do this? We took the MiniPlaces dataset, which has just a bunch of scenes (forest, kitchen, stadium), and then, from another dataset, MS COCO,

we took patches of objects, cut-out pieces. So here I have a dog, and I also have a backpack and ten other different things. Then we paste this thing into every single scene image, for all scene classes. And what does that give us? Well, the dog is everywhere. The dog is in every single picture, so the dog is not important for classifying scenes. We verify this more computationally in the paper, but we showed that, yeah, the dog doesn't matter for the prediction, because I know; I made up this dataset.
So an attribution method, or a saliency method, should not highlight the dog. And we can take a second step, which is what I just said: we can make this even more complicated by adding the dog to only some classes, just the forest. Then the dog is now giving signal to help classify the forest class, so the attribution on the dog in the forest class should be higher than on dogs in any other class, and we can do this relatively, one by one, and have a relative measure. So that's what we did in our paper.

We suggest three metrics as a start for measuring this, but we focus on false positives. I think interpretability methods will have a very similar history, in the way we measure how good they are, to traditional machine learning: there, we first started with accuracy, and then we said, "oh wait, maybe precision and recall matter," and then we went to AUC, and now we're thinking about robustness, adversarial attacks, and all sorts of other things.

So here, what we mean by a false positive is when the interpretability method said something was important, but the model didn't think so. That's the part of the metric we're focusing on with this dataset, and we suggest three metrics. I'm going to talk a little more about the first one, and only briefly about the second and the third.
For input dependence rate, we took a picture and ran an optimization (stochastic gradient descent) to make the dog really, really important. And the question is: well, now that I've made it really, really important (we know it is, because we optimized for it), the attribution should increase.

Input independence rate is kind of like an adversarial example. How many people have had the adversarial examples lecture yet? Oh, okay, I see. Okay, cool. So input independence rate is something like this: I again take a picture, and I place a dog at a location, and I change only the dog's pixels such that the network effectively doesn't see the dog. I can do this using gradients: I make sure that at every single layer, nothing changes when I add these pixels. This is kind of the basis of adversarial examples; if you change something, I can either make the network go crazy, or I can make the network do nothing. It's just a simple gradient-based attack, or approach. That's what we do in those two measures.
This might be better to explain first: the first model is trained to classify scenes (forest, stadium, kitchen); the second model is trained to classify objects (dog, backpack, and so on). We expect the attribution where the dog would have been to be very different between these two models, and that's the score we measure. We do this for many, many images, ten thousand I believe, and we average them, and here is a quantitative measure using this metric, the model contrast score. This is Grad-CAM, and these are all the methods you just saw in the previous section (the vanilla gradient, SmoothGrad, and so on), plus TCAV, which is the technique I'm going to talk about soon. On this scale, one is the best, although there's a little nuance there; higher is better.

The message you should take away: this dataset is open-sourced (we're open-sourcing the model and everything), so that hopefully we stop making this mistake, where we're kind of doing the ostrich thing,
where things don't work and you don't want to see it: the sanity check doesn't pass on your method, and you just kind of ignore it. Let's not do that. Let's evaluate. And this is a low bar, again; you can do much better, more sophisticated tests, and we can make better benchmark datasets too, but this is a starting point.

What this work gives us, though, is a wish list: what could we have done better? The saliency map relied on a cue that is based on humans. A human has to look at it and subjectively judge, like what you said: "oh, the belly is important." But maybe it's the shape of the belly, maybe it's the texture of the belly; you have to reason about it. And it used the pixel as the medium. But, you know, humans don't think in pixels. I don't look at a dog picture and say, "look at pixel number 2135; isn't that cute?" I say, "oh, the fluffy thing, that's cute," right? So maybe we can have a more quantitative quality function, and use something more human-friendly: high-level concepts like fluffiness or texture or color, instead of pixels, which are artificial. Perhaps that will help lay people understand machine learning models better.
You don't have to know that computers process images in pixels; we could have been living in a world where analog computers are everywhere. It didn't have to be digital. Well, I guess there are reasons why it has to be digital, but maybe there's a parallel universe where everyone is using analog computers, and there we would be living in a completely different space: pixels would not be the way we communicate.

Let's use this saliency map that we've been working with to help us think about what we really want as an explanation. Here is a type of saliency map for this picture. Now, as I stare at this very carefully, I see this human in front of the cash machine. So that made me think: maybe the existence of the "human" concept in this picture mattered for the "cash machine" prediction. And, oddly, this cart wheel behind the human is also highlighted. That's a little weird; why the cart wheel?

Perhaps that also mattered for some real reason. And if these concepts did matter, I would like to know which one mattered more, because if the human mattered more, that might be a little more comforting than if the wheels mattered more; and whether this is true for all cash machine pictures: should I be worried about this or not? So: a more global explanation. Who watches Rick and Morty?
If you haven't, I highly recommend it. It took me a while to get used to it (I think it's very American culture; I grew up in South Korea), but once you get used to it, it's beautiful. Looking forward to the next season. I'm not getting paid to say this, but I love Rick and Morty. There's this character in Rick and Morty who always says, "I can do..." Anyway. Our character says: well, no. We can't express these concepts that we want the explanation for, like humans and wheels, in pixels, especially across many images. And it would have been fine if you had these two things as input features, but you didn't: I just came up with them. I just looked at this picture, and I had this insight: of course, humans are always in front of cash machines. So you didn't have that as a feature.

So wouldn't it be great if we had a quantitative way to measure how important these concepts you just came up with are? That's what we did: Testing with Concept Activation Vectors, TCAV. I really wish I had named it TACO. I regret this, I regret it deeply, but I couldn't quite figure out how to get the O in there.
So it's TCAV, and what it does is this: you have a concept, like race or gender, and you want to measure whether that concept was important to your prediction; and we can give you a concept-based explanation only if the concept was indeed important, even if the concept wasn't part of the training. So let's be concrete; here's a concrete example. Let's say you have a model that takes a picture and predicts whether there is a doctor in the picture or not, and I want to know whether the gender concept mattered in this prediction.

If the concept, the crazy concept that you came up with, didn't have anything to do with the prediction, then it will say: "no, I don't know what you're talking about; sorry, I can't give you an answer." All right. So for the running example, we're gonna have a zebra, and I am curious whether the "striped" concept was important in predicting zebra or not. First and foremost, you may say: okay, "striped", a high-level concept, it makes sense; but what do you even mean by this? How do you get it? How do you express it?
We do the simplest possible thing. What is it? Well, you, the user, provide some examples of the concept: in this case, striped shirts. And you also provide some random images; as long as the majority of them are not striped, you're fine. And you have this network that you already trained, the one you're interested in interpreting.

Now, what we do is simply collect the activations, the activations f_l, for these pictures, the concept pictures and the random pictures, and you simply train a linear classifier that separates the two and find the vector that is orthogonal to the decision boundary. And what is this vector? Well, it's just a vector that points from the direction of the random activations to the direction of the concept activations. This is not new.
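A minimal sketch of learning such a concept activation vector (CAV). The activations are faked with random arrays here; in practice they come from a real network's layer.

```python
# CAV sketch: linear classifier on layer activations; the unit normal
# to its decision boundary is the concept activation vector.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
acts_concept = rng.randn(100, 512) + 0.5   # f_l(striped images), stand-in
acts_random = rng.randn(100, 512)          # f_l(random images), stand-in

X = np.vstack([acts_concept, acts_random])
y = np.array([1] * 100 + [0] * 100)

clf = LogisticRegression(max_iter=1000).fit(X, y)
cav = clf.coef_[0] / np.linalg.norm(clf.coef_[0])   # unit normal = the CAV
print(cav.shape)
```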
This is like... Vincent talked about word2vec and that vector for gender; despite the nuances there, the linearity of such high-level concepts has been shown over and over again in many papers, and we're just doing the same thing. It's just the simplest way. Now we have this vector. What are we going to do with it to get that quantitative score I talked about, the TCAV score? It's also really simple; to this audience in particular it might seem like a really toy example. We use the directional derivative.

What is it? Well, you just take the probability of zebra and take the derivative with respect to that vector we just got, the "stripedness" vector. So what is this, intuitively? Well, it means: if I move my image slightly toward being more like the concept, or slightly less like the concept, how much would the probability of zebra swing? If it swings a lot, it's an important concept.
If it doesn't, it's not an important concept. It's pretty simple. And we do this for many zebra pictures, and then we do the simplest thing, which is to say: among all the zebra pictures I have, how many of them returned a positive directional derivative? In other words, if I had a hundred zebra pictures, for how many of those zebra pictures did having the concept increase the probability of zebra?
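Written out, matching the notation used here (f_l(x) the activations at layer l, h_{l,k} the map from those activations to the logit of class k, and v_C^l the CAV), the per-example sensitivity and the TCAV score are:

```latex
S_{C,k,l}(x) \;=\; \nabla h_{l,k}\!\big(f_l(x)\big) \cdot v_C^{\,l},
\qquad
\mathrm{TCAV}_{C,k,l} \;=\; \frac{\left|\{\, x \in X_k : S_{C,k,l}(x) > 0 \,\}\right|}{\left|X_k\right|}
```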
Now, this is the simplest definition. You can put in an inequality; you can flip it to test negation, or absence; all sorts of things that fit your bill. But this is just the one number that's simplest. Great, and that's pretty much the entire framework. It's very simple. I view this as a canvas that someone else, someone smarter than me, can take and make fancier, maybe more principled and nicely complete; but that's really it, that's the base. Yes: we have follow-up work where we were trying to discover concepts.

I'll talk about that a little bit; it's a lot harder. If we fully solve that problem, we've solved AI. Now, and it's actually an interesting, important insight: the concept actually doesn't have to come from the training distribution. It doesn't have to be zebras; these are actually just clothing pictures.

But one more question: we have this CAV that we learned in embedding space, but we know that in high-dimensional spaces things are funky. Adversarial examples actually leverage that type of characteristic; because it's high-dimensional, our intuition is out the window, so things are a little funky. So we want to make sure that the CAV we have didn't return sensitivity, a high directional derivative, by chance. There are two ways to check this, qualitatively and quantitatively, but I'll only show you the quantitative one.
You will see something like this nice description. What we do is train another set of CAVs that are random, against a random "concept." What does that mean? It just means that, instead of stripes, we use some random pictures: pictures, or data points, that are still roughly from the training distribution.

So this is a meaningless concept, a straw counterpart, and we do some t-testing (or Welch's test, better said) to decide whether the means of these two distributions are statistically significantly different; and only then do we say this concept might mean something. I think there are better tests. If you have a better, tighter test, let me know; I would love to hear about it. But it's a starting point, and I think it's better than nothing. So that's what we do.
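A minimal sketch of that test with SciPy's Welch option (ttest_ind with equal_var=False); the scores below are made up for illustration:

```python
# Welch's t-test on TCAV scores: real concept CAVs vs. random CAVs.
import numpy as np
from scipy import stats

tcav_concept = np.array([0.81, 0.77, 0.84, 0.79, 0.82])  # stripes vs. random
tcav_random = np.array([0.52, 0.47, 0.55, 0.49, 0.51])   # random vs. random

t, p = stats.ttest_ind(tcav_concept, tcav_random, equal_var=False)
print(f"t = {t:.2f}, p = {p:.4f}")   # small p: concept is likely meaningful
```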
A qualitative test, a second way or a double proof, as they say, in addition to the quantitative method, is that you sort your input pictures with respect to the CAV that you trained and see whether the order aligns with your notion of that concept. If it doesn't, then that means your CAV is not good.

So that's it. I'm gonna show you some results. I might leave some time for questions, so I'm gonna maybe skip the sanity-check experiment. What that does, simply... oh, I guess I didn't include it. I did a very similar thing with the dog-and-scene example, and I confirmed that TCAV is able to match the truth, at least in this toy case. And we compared that with saliency maps: we did human experiments to show that saliency maps mislead people very easily, and when they're misled, they're very confident that their answers are right, because, again, they see what they like, and they say, "I'm as confident as I will ever be."
I like this; it's my favorite part of the results section. But second: we were excited after the sanity-check experiments, and we were like, okay, let's go wild and run this on two widely used ImageNet classification networks. So we did that. We tested different types of concepts: colors, race, and objects. For the first, the color concepts, we tested the fire engine class: red and green come out high. Green is high because a lot of fire engines are on grassy fields, we found out.

This makes sense, unless you're from Australia. Anybody from Australia? Okay. I actually met somebody who knew exactly where this picture was taken: it's from Canberra. It's not everyone in Australia, it's just Canberra, the capital; there's this fire station that has a yellow fire engine. In fact, yellow fire engines are better than red. Can anyone think of why? Oh yeah, good point; actually, maybe that's another reason.
But you know, when we looked at this graph, it made sense to us, yet it wouldn't make sense to people from Canberra, Australia, and that's showing some geographical bias that the model might have learned. Another one, of course, is racial biases, which we confirmed; that's consistent with previous findings. And if you remember the very first DeepDream blog post, where they classified dumbbells: they showed, "oh, we found this neuron, number, I don't know, 37, 38, that shows muscular arms," and they said, hmm, maybe for dumbbell classification the muscular arm is relevant.

So I wanted to test this. I went and Google-searched arms and other object images, and we can now quantitatively say: yeah, arms did matter. For this case I didn't have a dataset, so I collected 33 examples of arms from Google Images, and it passes the statistical testing and everything. And in fact, this is something that was surprising: as more people at Google use this method, people find that you just don't need that many examples to learn
that vector. There's a big caveat, though: maybe there are some domains where you do need more concept examples, but we've seen that 15 to 33 was the right number for the images we used, cancer images and others.

So, going back to the goal that we initially started with: I'm hoping that some of these tools we develop are helpful for you to ensure that your values are aligned in your model. So, really excited, we wanted to go to a real domain, which is working with doctors. We looked at a diabetic retinopathy application. Diabetic retinopathy, or DR, is a treatable condition: if you discover it early, you don't lose your sight, but if you discover it late, that's not good news. Medical Brain has a model that can classify DR pretty well, but our question was: is this model using something that doctors use? Is it using it as if it were a doctor making the decision? Because people are looking into this model and deploying it to areas where there are no doctors.
Then we asked the model, using TCAV: what do you think? For the class that was most severe (at that point, the hope is lost), it looks like it's doing what doctors would have done. The green concepts are the ones that doctors would like to see, and they're high; the red ones are low. Makes sense, and the model's accuracy is pretty high. The story is a little different where the model's accuracy is mediocre.

This is the mild level of DR, and in that case it looks like the model is paying attention to a concept that doctors would not have looked at. We were puzzled by this, and we dug further, and we found out that DR level one has a lot of confusion with DR level two, the next level of severity, even for doctors, and DR level two has a lot of this feature: the one where the vessel has kind of been blown up.
I'm blanking on the name, but it's that: observing a blood vessel that has been blown up, or swollen. So that was a mystery, and then we realized that we should probably clean up the labels before going further. And again, I think this brings us back to the goal: when you want to confirm that domain expert knowledge is being used in the way we would have used it, if that's one of the things you want to know, this tool might be useful.

It's also important to know about some other things that our work points out: limitations of TCAV. The first limitation is that the concept has to be expressible using examples. If you want to express something higher-level, like love, or some other super-high-level thing, I don't know how to do that; TCAV wouldn't work for you. And users need to provide the concept. I find it really interesting that this divides people into two groups: the HCI folks and the domain experts love this. They say:
"oh yeah, I can give you examples, and now it can speak my language." Whereas if you talk to anyone in Brain, they'll be like, "yeah, but can you automatically discover it? Can you take the humans out of the loop?" And we can do this already, sort of. I think both problems are important: one, that you can impose our language on the model, and two, that you can discover something that maybe we never knew before. And that's TBD; we have follow-up work.

However, the danger of this is in what happens when humans see these discovered patches. For example, when you see these patches: what's the concept? Anything else? The first thing that people think of is "floor." But what if it was the brown color? Again, you're kind of injecting your confirmation bias when you interpret this new concept, so there's plenty of challenge left here in how to tease that out. Yeah, you can hardly see it; it's this thing in the floor of the basketball court. I see.
So there are three basketball-class pictures, and our goal is to find a patch of pixels that can make up a cluster, and that becomes our new, discovered concept. And we discovered this concept cluster; these are just ball patches, zoomed in and blown up, and there are many others, like arms and balls and other things. So there's a lot of work there. It's also important to note that none of this is causal; I think causal inference plays an important role in interpretability.

We recently worked on extending this work to causal TCAVs, but the computations are a lot harder. To do it you need a VAE for your model, and things get kind of complicated: you have to add a lot of assumptions; you have to have a causal graph that assumes there are no confoundings other than everything we list in the causal graph. There's a trade-off. You can do this; the code is there if you want to look at it.

And here's a self-promoting slide. I was thrilled that regular people, who are not computer scientists, really started looking at this, which made me so excited, because my original goal was to build a tool for lay people: people who are actually out in the world saving people and solving problems, not, like, PhDs from computer science. A doctor, not the fake kind of doctor. Well, many of you guys are also; I call myself a fake doctor.
So, I talked about this at Google I/O, which was awesome; it made my dad happy more than anything. I went to UNESCO in Paris to receive an award; the UNESCO center has this professional stage, and that was a pretty awesome experience. I was also thrilled to see responses from inside academia. There's this paper (I'm part of it, but I didn't do much work; it's really Carrie Cai's work) where she used this concept vector to help doctors sort images. This is prostate cancer: pictures of pieces of cell specimens. Doctors found it very useful, and it was one of the best paper honorable mentions at the CHI conference, the top HCI conference, this year.

I forgot to put this in, but a couple of weeks ago I met this person from the Aerospace Corporation; his name is Eric, and he used TCAV to interpret a model that predicts storms. So he used it to ask: for this storm concept, is it about the eye of the storm? Is it about something else, the color, other things?
So we have a couple of minutes. We talked about these three methods; however, I think there are a lot more interesting things that I didn't cover. Adversarial attacks: you can attack an interpretability method and make it completely screw up; totally possible. You want robust explanations: if things change a little bit, you still want to give a stable explanation. Another thing that I'm getting much more excited about these days is interpretability for science: can you interpret a superhuman-performance model to discover something that we didn't know before?

Can we add a piece of knowledge to the knowledge of humanity? I think that's a very exciting topic, and I think a lot of people in this audience specifically would love to collaborate on it. If you have interesting data, if you have a model that's interesting, about earthquakes, or mental health, depression or autism, that sort of thing, I'm super excited; let me know if you're interested. Bin Yu at Berkeley is doing some awesome work looking at neuroscience mixed with interpretability for science.
So, in the interest of time, I'm gonna skip this part. I think I've already lectured you that we need to do evaluation properly. We need to remember that humans are biased and irrational. If you love reading: The Undoing Project, about Danny Kahneman, the Nobel Prize winner in economics. He wrote... oh, I guess someone else wrote that book. The Undoing Project beautifully describes what kind of crazy biases humans have, and that we can't get away from them. It's a beautiful book; highly recommend.

We need help from the HCI community, who know how to build interface designs, workflows, and so on; very important. And, by the way, I'm a dog person, but I like putting cats in these slides; they're funny. I think it's a really exciting time to be in the field of interpretability. There are so many things we can do, and we can help people: people who want to do the right thing. We can help them do the right thing and give them the power to be more responsible. Thank you.