From YouTube: Social Cybersecurity WG: Targeted Knowledge Infusion To Make Conversational AI Explainable and Safe
Description
Presenter: Dr. Manas Gaur
Institution: University of Maryland, Baltimore County
A: Welcome, everyone, to the Social Cybersecurity Working Group. This is the first meeting of 2023, so although it's a month late, Happy New Year. I hope everyone is enjoying their 2023 so far, and I wish health, prosperity, productivity, and all the best for the rest of the year to everyone. Today we are joined by Professor Manas Gaur, who is doing exceptional work in knowledge graphs and knowledge infusion.

In a moment we will hear from Professor Gaur, but before we give him the floor, I would like to introduce everyone who is attending, starting with me. I'm Dr. Nitin Agarwal, Maulden-Entergy Chair and Distinguished Professor of Information Science, and also the founding director of the COSMOS Research Center here in Arkansas. I'm also a co-chair of the Social Cybersecurity Working Group, which operates under NSF auspices at the South Big Data Hub.

We are really grateful to Chris and the entire South Big Data Hub team for helping us coordinate these meetings. Now, we see a lot of participants here, so what we typically do is go around the room and ask individual folks to introduce themselves. I will call out your name, and in one breath, please introduce yourself. We'll start with Bala.
B:

A: Good to have you. Garima, next.

E:

F: Hello, everyone. I'm a research assistant, a graduate assistant, at the COSMOS lab, and I'm currently doing my Master's at UA Little Rock.

C:

B: Good morning, everyone. I'm a research assistant with COSMOS and a Master's student as well at the University of Arkansas at Little Rock. Thank you.

A: Perhaps Richard has issues with his microphone; we will come back to you, Richard. We have Vanessa next.
G:

A: Good to have you, Stephen. All right, I think we have gone through everyone. Richard, if your microphone is still giving you trouble, you can tell us about yourself in the chat; I will be monitoring the chat for the rest of the session. So, it is my distinct pleasure to introduce Professor Manas Gaur, who is going to talk to us about targeted knowledge infusion to make conversational AI explainable and safe. Manas is an assistant professor in the Computer Science and Electrical Engineering Department at the University of Maryland, Baltimore County.

Before UMBC, he held the position of Senior AI Researcher with the Knowledge and Dialogue team within the AI Center at Samsung Research America. He completed his PhD in computer science at the Artificial Intelligence Institute at the University of South Carolina. His PhD research was supported by the Eric and Wendy Schmidt Data Science for Social Good Fellowship, an AI for Social Good Fellowship from Dataminr Inc., an EPSRC-UKRI grant through the Alan Turing Institute, and an NSF EAGER grant on knowledge-infused learning. His most noted work, on knowledge-infused learning, parallels neurosymbolic AI in mental health.
H: Thank you, Nitin, and thank you, Chris, for inviting me. I want to share my screen. Yes, I think my screen is visible.

A:

H: Awesome. Thank you all for being here. The talk I want to give today is also part of the AAAI New Faculty talk that I will be giving at the AAAI conference in Washington, DC this year, and it is essentially the work I will be presenting for the South Big Data Hub working group workshop. It follows an idea I have been working on for the past couple of years, and it includes a lot of the data and information on conversational AI. Most importantly, I want to show you all why we need knowledge infusion to make conversational AI explainable and safe.
H: There has been a lot of discussion recently, starting with the first workshop on safety in conversational AI organized by Microsoft and Facebook, and over time a need has emerged for explanations and safety when you are dealing with conversational agents, because they are AI-based and they are automated. What you are giving to the model is what you are getting out of the model. So I wanted to make this talk more specific to the sensitive domains of AI, where AI adoption is really of critical concern. There has been some work along that line, which I will discuss, and it motivates today's presentation on knowledge infusion and how targeted knowledge infusion can occur in AI.
H: Let's start with a very general scenario that enlightened me last year, when I was interacting with DALL-E. I gave a simple query, one that is very easy to understand for all the folks in machine learning and artificial intelligence: I'm actually looking for an architecture that has data extraction, model training, grid search, and cross-validation as components. But the AI sees it differently.

There are some gender-related issues in the output, but I don't want to go into the details of that, because that's a different turf. Essentially, what we are looking at is the segmentation of this entire query into a bunch of words, which has a drastic consequence, as it loses the semantics behind the entire query that was given to the model.
H: On the other hand, if I just do a simple Google search, which has been there for almost a decade now, it uses the Google Knowledge Graph. So, intrinsically, you get your desired response simply from the query, and the desired architectures appear in front of you, which is what you really need.

What if a model, let's take DALL-E, had a retrieval engine that could retrieve sensible images and utilize those images to come up with a decent response? Then such a generation would not have happened. So there is a potential for a retrieval-based mechanism to work together with an automated generation mechanism to improve the capability of generation.
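A minimal sketch of that retrieval-plus-generation idea, assuming the sentence-transformers and transformers libraries; the toy corpus, the checkpoint names, and the prompt format are illustrative assumptions, not the systems discussed in the talk.

```python
# Sketch: retrieval-augmented generation. The corpus, checkpoints, and
# prompt format below are illustrative assumptions only.
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

corpus = [
    "An ML pipeline with data extraction, model training, grid search, and cross-validation.",
    "A web architecture with a load balancer, app servers, and a database.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
corpus_emb = embedder.encode(corpus, convert_to_tensor=True)

def retrieve(query, k=1):
    """Return the k corpus entries most similar to the query."""
    q_emb = embedder.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(q_emb, corpus_emb, top_k=k)[0]
    return [corpus[h["corpus_id"]] for h in hits]

generator = pipeline("text2text-generation", model="google/flan-t5-base")

query = "an architecture with data extraction, model training, grid search, and cross-validation"
context = " ".join(retrieve(query))
# The generator is grounded on retrieved text instead of free-running.
prompt = f"Context: {context}\nDescribe the requested architecture: {query}"
print(generator(prompt, max_new_tokens=64)[0]["generated_text"])
```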
H: But there are a lot of improvements that are required, and that's the very reason that ChatGPT has reinforcement learning with human feedback in the loop: it tries to take the human feedback and improve upon it. But essentially, that human feedback may or may not be correct.

That was just one example. What if I take a very simple example from mental health, but a not-so-sensitive case compared to the previous example? What if I say, "I am feeling tired"? That is pretty normal: after days of work, you go home and you are feeling tired. But what if I add another concept to it: "it's been a week since I have slept"?
H: The focus of the model is definitely on the sleep, and on whatever the sentence starts with and the information built on it, because it is a next-word prediction model: whatever it starts with, it has to follow in that sentence. So if it starts with sleep, the rest of the information will be catered to the sleep. Whereas if you look at this query, it is essentially pointing at terms like fatigue, exhaustion, and probably sleep disturbances, and I can do a simple Google search with such a query.

Google straight away points me to "a few bad nights" or insomnia as the topmost response, or related information I can get from Google Search. Now the question is: what is happening with the generation? Why is the generation happening in such an uncontrolled manner, and can it be controlled?
H: Those are the questions that drive our initiative towards explainability and safety. Along this line, DeepMind also analyzed such a scenario and introduced Sparrow, which is a decent, well-working conversational agent that talks to a human and answers questions using live Google search, something we were hypothesizing about on the previous slide. Essentially, they also use reinforcement learning, which is also part of ChatGPT. But what they have done that is very ingenious is that they guide the conversation with 23 rules.

These rules describe the safety mechanism of DeepMind's conversational bot, and the point that actually drove my research last year, probably the fall of last year, was this quote by Geoffrey Irving: dialogue is a good way to introduce safety in AI, because through dialogue the AI can understand whether it is going in the right direction and whether the generation is acceptable to the human or not.
H: So that's a good deduction to make: conversational models can be made AI-safe if we are able to utilize conversational datasets carefully crafted to make the generation safe. But even with Sparrow out there, the generation component is still a parametric memory, and it has to be focused on.

It still needs to look at the concepts within the query and not generate responses which are predefined as templates in a conversational agent, for instance, in this case, "I'm sorry to hear that" or "I can help you with that." Those have no connection to the severity.
H: ChatGPT does do a good job of connecting such a query to helplines so that the person gets help, and that is a step towards safety. But it is also required to walk a person through a conversation piece while passively activating the crisis lines, the crisis calls. Those things are actually part of safe conversations, because for a person who is in that kind of self-harm situation, giving the helpline definition alone would not be sufficient to complete the job.

That is what GPT-3, the precursor of ChatGPT, used to do. Hallucinating, by definition, means that the generated response deviates significantly from the subject matter and is often unreasonable, because if the response keeps changing, that means you are changing your definition, your perspective, or your understanding of that question. That's where we actually step into the concept of explainable AI, and why we need explainable AI as a core component within artificial intelligence systems. In the 1970s, Weick's sensemaking theory actually defined what an explanation means.
H: It says that explanations are always human-centered sentences which make sense to a human expert. They have never been some kind of heat map, some kind of weight matrix, or an importance score; rather, they are deliberate, human-centered sentences, or something of that sort, that make sense to a human expert.

Alternatively, if we look at this definition from the perspective of AI, can I say that explanations from AI are traces of attention? We are looking at attention models nowadays, and all the models we are working on are Transformer-based models. So we say that explanations from AI are traces of attention, which are essentially collective experiences that the model gathers from the training data.
H: Now the task we are entitled to is this: how can this AI model, which is gathering collective experience over training, connect to real-world entities and actionable definitions? If that can be done, then you are able to introduce an explainability component within the AI's generation behavior.

Why did we follow this process? What are we getting? The purpose of this approach is that now the AI's analysis can be understood easily by domain experts, and the only goal of introducing this AI explainability is that we do not want the model to generate hallucinating outcomes. Hallucinating means that the generated behavior of the AI is drifting away from the human-desired functionality: over time, we keep getting different generated responses.
H: That's where the study of explanations starts. Over the last few years there has been work on explanations, but from the system side. The very popular ways of explaining something are LIME and SHAP, which were the prominent explanation-based systems used in machine learning, and this is a schematic of how LIME and SHAP work.

What is interesting to observe is that knowledge can be a part of LIME and SHAP, as they can always introduce new features into the model. And, interestingly, you need a surrogate model to verify a black-box model, so your original black-box model is still a black box.
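As a concrete illustration of the surrogate idea, here is a minimal sketch using the lime package to fit a local linear surrogate around one prediction of a black-box text classifier; the toy data and labels are illustrative assumptions.

```python
# Sketch: a LIME surrogate explanation for a black-box text classifier.
# The toy training data and labels are illustrative assumptions.
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["I have not slept for a week", "I feel great today",
         "I am exhausted and cannot focus", "Had a wonderful walk outside"]
labels = [1, 0, 1, 0]  # 1 = concerning, 0 = not concerning (toy labels)

# The "black box" whose behavior we want to explain locally.
black_box = make_pipeline(TfidfVectorizer(), LogisticRegression())
black_box.fit(texts, labels)

explainer = LimeTextExplainer(class_names=["ok", "concerning"])
# LIME perturbs the input, queries the black box, and fits a local
# linear surrogate; the surrogate's weights serve as the explanation.
exp = explainer.explain_instance("I am tired and have not slept",
                                 black_box.predict_proba, num_features=4)
print(exp.as_list())  # [(word, local linear weight), ...]
```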
H: You are just using a surrogate model, a linear model, which is interpretable and explainable, and you are using it to explain what the black-box model is doing. Still, it does not establish the connection between the learned features of the model and real-world entities. That's where even Cynthia Rudin's 2019 paper argues that we need to build models that are inherently explainable, and that's where you can actually introduce the explainability aspect within the model natively.

So, if I want to turn LIME and SHAP into something inherently explainable, I would define a scenario where I would like knowledge to be part of the interpretable features. I want the knowledge to be able to map any input data to those interpretable features, and I make them part of my model.
H: One experiment that I conducted in 2019, something I was able to relate to Weick's theory, was to define user-level explanations. If I have an input text, I am able to generate some kind of attention maps, which is pretty much doable nowadays with Python libraries. What we are trying to do here is take the heat maps of the model and say: these maps are essentially concepts; they are words.

So we say: let's take a database that has some interlinked entities, and let's run queries over the database using these words. Essentially, we are trying to derive a trace of all the entities that are traversed over the entire database and see what those traces are. Is there any commonality in the traces, and can we construct a tree? Now the question arises: where do we stop?

We stop heuristically. What I thought, and it still needs a lot of improvement, was that maybe the prediction can be connected to a node in the knowledge base, or in the database, which has some similarity with the prediction; and if we reach that particular node, we can stop the trace. Then we want to take those traces and explore whether we are able to reach labels close to the predictions.
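A small sketch of that tracing idea, assuming a toy interlinked knowledge graph in networkx; the graph contents, the highlighted words, and the similarity test are all illustrative assumptions.

```python
# Sketch: trace attention-highlighted concepts through a small knowledge
# graph, stopping at a node close to the model's prediction. The graph,
# highlighted words, and similarity test are illustrative assumptions.
import networkx as nx

kg = nx.Graph()
kg.add_edges_from([
    ("tired", "fatigue"), ("fatigue", "exhaustion"),
    ("sleep", "insomnia"), ("insomnia", "sleep disturbance"),
    ("fatigue", "depression"), ("sleep disturbance", "depression"),
])

highlighted = ["tired", "sleep"]   # words with high attention weight
prediction = "depression"          # the classifier's output label

def similar(node, label):
    # Stand-in for a real embedding-similarity test between node and label.
    return node == label

for word in highlighted:
    if word not in kg or not nx.has_path(kg, word, prediction):
        continue
    trace = []
    # Walk from the highlighted concept toward the prediction; stop as
    # soon as a node sufficiently similar to the predicted label appears.
    for node in nx.shortest_path(kg, source=word, target=prediction):
        trace.append(node)
        if similar(node, prediction):
            break
    print(word, "->", " -> ".join(trace))
```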
H: So this was one of the first works on explanation, on how we introduce user-level explanations according to Weick's theory, and even to the definitions that Dr. Rudin mentioned.

This makes me feel that there is some kind of vertical we are looking into. We have generative conversational AI at the top, I'm able to introduce some good-enough foundational explanations, and I can express them through user-level explainability. Can I make these explanations, via this classification, an intrinsic component of an AI, so that conversational models can actually benefit from such a classification?
H: If I'm doing a classification, and I can say the classification is pretty good and very decent and I can explain it, then probably the generation would be explainable, because it has some synergy, some connection, with what you have done in the classification. But how do we complete this particular process? What is a core component that we can start with to have a generative, conversational AI that is explainable and safe?

That's the work which further drove me into the mental health aspect: trying to look at the various ways agents have been developed over time, how they have been interacting, what types of generations they produce, and whether there is a way to confirm that a generation is right or wrong.
H: For instance, take this conversation, which is pretty much about nervousness, a common thing these days among college and school-going students. The generations from such a model are pretty risky. I'm saying risky because there is no connection against which to verify whether such a generation is of worth or not. I'm now turning this entire generation problem into a classification problem: I'm introducing a classification component within the generative AI.

It tells the AI that if the generated question does not match some safety guidelines, something very similar to Sparrow, if it has no connection with a safety lexicon, or the generated questions do not match some existing questionnaires which are clinically approved and safe, then potentially the generated outcomes are not safe. And can we revise these questions by taking the help of those questionnaires? So there are two problems here.
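Before unpacking the two problems, a hedged sketch of the matching check just described: compare a generated question against clinically approved questions with embedding similarity and flag it when nothing matches. The paraphrased questionnaire items, the encoder checkpoint, and the 0.5 threshold are illustrative assumptions.

```python
# Sketch: flag a generated question as potentially unsafe when it is not
# similar to any clinically approved question. The paraphrased items and
# the threshold are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

approved = [
    "Little interest or pleasure in doing things?",
    "Feeling down, depressed, or hopeless?",
    "Trouble falling or staying asleep, or sleeping too much?",
]
model = SentenceTransformer("all-MiniLM-L6-v2")
approved_emb = model.encode(approved, convert_to_tensor=True)

def passes_safety_check(question, threshold=0.5):
    """True when the question is close to at least one approved item."""
    q_emb = model.encode(question, convert_to_tensor=True)
    return util.cos_sim(q_emb, approved_emb).max().item() >= threshold

print(passes_safety_check("How has your sleep been lately?"))
print(passes_safety_check("Why don't you just cheer up?"))
```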
H: One problem is a classification problem. The other is that, with the help of the classification, I am generating a new sentence, a new question. There are some simple and effective strategies for achieving such behavior. The first is to use external knowledge to completely transform the initial data: make your initial data very safe, so that any undesirable behavior is minimized to the largest extent. Another possibility is to develop some auxiliary tasks to confirm the natural language understanding capabilities of your model.

That's what we define as auxiliary tasks. Another point: how about tagging the data? We all know that deep neural networks actually learn via special tags, special tokens. Can I introduce some new tags, saying that this part of the sentence is knowledge, this part of the sentence is the query, this part is about my personal profile or my personal concerns, and this is the response? That means the model should look at these tags and should give its response according to the tags.
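A minimal sketch of that tagging strategy with Hugging Face, marking knowledge, query, profile, and response spans via new special tokens; the tag names and the base checkpoint are illustrative assumptions.

```python
# Sketch: teach a seq2seq model new segment tags so it can distinguish
# knowledge, query, profile, and response. Tag names and the base
# checkpoint are illustrative assumptions.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

tags = ["<knowledge>", "<query>", "<profile>", "<response>"]
tokenizer.add_special_tokens({"additional_special_tokens": tags})
model.resize_token_embeddings(len(tokenizer))  # make room for the new tags

example = ("<knowledge> Insomnia often co-occurs with fatigue. "
           "<profile> User reports a week without sleep. "
           "<query> I am feeling tired. <response>")
ids = tokenizer(example, return_tensors="pt").input_ids
# During fine-tuning, the model learns to condition its generation on the
# tagged segments rather than on an undifferentiated stream of words.
```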
H: Another such strategy: how about using keyphrase extraction or keyphrase generation approaches? Given a bunch of words, I create a phrase out of those words, training my model in an unsupervised fashion to generate some keyphrases, which can be n-grams, five-grams, and topic models all collapsed together in one system. That can be defined as a keyphrase extraction or generation approach.
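For instance, a minimal keyphrase-extraction sketch with the KeyBERT library; the library choice and parameters are assumptions for illustration, not what the talk used.

```python
# Sketch: unsupervised keyphrase extraction over n-grams. The KeyBERT
# library and its parameters are illustrative choices.
from keybert import KeyBERT

doc = ("I am feeling tired and it has been a week since I have slept; "
       "I cannot concentrate on reading or watching television.")

kw_model = KeyBERT()
phrases = kw_model.extract_keywords(
    doc,
    keyphrase_ngram_range=(1, 3),  # unigrams up to trigrams
    stop_words="english",
    top_n=5,
)
print(phrases)  # [(phrase, relevance score), ...]
```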
H: Let the user, let the human, tell me whether a generation is right or wrong, treat that as a reward, and improve upon it. Now, these rewards can either be maintained as a system, as ChatGPT does, or you can make them intrinsic to the model by introducing some reward functions. If I say rewards, how can I construct rewards? For instance, take this query: let's say, "bothered by trouble concentrating while reading the newspaper or watching television."

On the left is a generation by a T5 model, which is a legitimate language model, not with the capability of GPT-3, but with good enough capability to execute on GPUs. On the right is what you can see as a bunch of rewards that I constructed: natural language inference is one reward; a syntactic score, which I can construct as a metric that calculates syntactic scores; and there is another metric that calculates semantic scores.

Can I use these metrics to compute a reward? This reward is independent of your loss function, but it is what tells me whether the model's generation is moving toward the human-desired behavior or not. We look at these questions, and then this behavior tells me which set of questions needs to be generated and which set of questions can be ignored.
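A hedged sketch of such a composite reward, combining an NLI consistency score with a semantic-similarity score; the checkpoint names, the assumed NLI label order, the weights, and the stand-in syntactic score are all illustrative assumptions.

```python
# Sketch: a reward, computed outside the loss, that scores a generated
# question against the user's context. Checkpoints, weights, label order,
# and the stand-in syntactic score are illustrative assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer, util, CrossEncoder

semantic = SentenceTransformer("all-MiniLM-L6-v2")
nli = CrossEncoder("cross-encoder/nli-deberta-v3-base")

def reward(context, generated):
    # Semantic score: does the question stay on topic?
    sem = util.cos_sim(semantic.encode(context, convert_to_tensor=True),
                       semantic.encode(generated, convert_to_tensor=True)).item()
    # NLI score: softmax the logits; assumed label order for this
    # checkpoint is [contradiction, entailment, neutral].
    logits = nli.predict([(context, generated)])[0]
    probs = np.exp(logits) / np.exp(logits).sum()
    entail = float(probs[1])
    # Stand-in "syntactic" score: penalize non-questions (illustrative).
    syn = 1.0 if generated.strip().endswith("?") else 0.0
    return 0.4 * sem + 0.4 * entail + 0.2 * syn

ctx = "I have been bothered by trouble concentrating while reading the newspaper."
print(reward(ctx, "Do you also have trouble sleeping?"))
```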
H: So now, with all of this information available, how do we include it inside the model? We have talked about some of the strategies, and we have talked about the knowledge that is required. Now, how do we include it? I have given you some high-level information; let's dive into it with more concrete examples of how this information can take its place inside NLP. That's where the paradigm I worked on as part of my PhD comes into the picture: knowledge-infused learning.

How do we include this knowledge as a wholesome part of your AI learning strategies? That forms the very start of this upside-down pyramid.

Knowledge-infused learning is a method, or a class of methods, that involves the incorporation of the broader forms of knowledge that we have into AI formulations, so that they make the model's interpretations better: you can interpret the model, and the generations or explanations you are getting are at the user level.
H: You are able to match it to the user's expectations, and the user can comprehend it. Let's take an example scenario. On the left is a conversation piece that we were running through a lot of conversational agents. We took this conversation piece from Reddit, from the depression subreddit, which is a pretty huge community.

The generation on the left, as you see, is by people who are anonymously responding to this person, and if you look at the slide very closely, the questions being asked have never been part of the topics or the concepts within the query. That means the people who are asking the questions are actually reading the content, comprehending it, and then asking follow-up questions.

They are not asking questions about topics or terms that are already present in the query or in the user's post. Whereas if you look at an agent trying to work on it, in this case T5, we generate rather redundant questions, because such questions have already been answered by the user.
H: Even if you fine-tune the model, the questions look really good: if I do fine-tuning, I'm making my model very focused on this particular data, on this particular post. So you will definitely get good questions, but are they safe? Are they good to be asked of the user? That's where we say the safety is not there, even though the questions, even after fine-tuning, are very specific to the context.

From this series of experiments, we found that the conversations, even after fine-tuning, are still unsafe; we are not able to find diagnostically relevant information in the questions; and, most importantly, if I run my model today and rerun it after a day, my generations are completely different. So the hallucination is definitely still in the process.
H: So we took this example and tried to look at it from the perspective of reinforcement learning. We took the same post, generated with T5, fine-tuned on the Reddit corpus, and the generations are still different: a good question, but still unsafe. So what we did was introduce a sort of reward. I talked about rewards on a previous slide, but this is the reward we were introducing.

We say that there is a BLEU score. The BLEU score is a long-established metric which says that if you are able to generate a sentence that is human-understandable, human-comprehensible, it will give a legitimately good score; but it cannot account for the safety aspect. What we are trying to do is make sure that the generated question matches some clinical guidelines, in this case the PHQ-9, which is the Patient Health Questionnaire.
H: I will show you what exactly these are, but let's say this is an elongated list of all the questions. If you have the whole list of questions, you want to compute the similarity, check whether it is greater than a particular threshold, count on that, and make sure the count is significantly larger than zero; and if it is not, then potentially I will give it the value minus one. So here I'm actually looking at how divergent or convergent my generation is with respect to the PHQ-9 questions.
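A sketch of that reward as described: count the generated questions whose best similarity to the questionnaire clears a threshold, and return +1 when the count is above zero, -1 otherwise. The encoder checkpoint, the paraphrased PHQ-9 items, and the 0.4 threshold are illustrative assumptions.

```python
# Sketch of the questionnaire-alignment reward: +1 when enough generated
# questions clear the similarity threshold, -1 otherwise. Encoder choice,
# item paraphrases, and numbers are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
phq9 = [
    "Little interest or pleasure in doing things?",
    "Feeling down, depressed, or hopeless?",
    "Trouble falling or staying asleep, or sleeping too much?",
    "Feeling tired or having little energy?",
]  # abbreviated paraphrases, for illustration

def questionnaire_reward(generated, threshold=0.4):
    gen_emb = model.encode(generated, convert_to_tensor=True)
    phq_emb = model.encode(phq9, convert_to_tensor=True)
    # For each generated question, its best match among the items.
    best = util.cos_sim(gen_emb, phq_emb).max(dim=1).values
    hits = int((best >= threshold).sum())
    return 1 if hits > 0 else -1

print(questionnaire_reward(["Have you been feeling tired lately?"]))
```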
H: So: whether the generated question matches the expectations of the human or not. We talked about retrieval earlier; retrieval is the part I started my presentation with. Now we have a generator, but along with the generator we have an evaluator framework as well, which checks whether the generations match the guidelines or not. This is a manifestation of the PHQ-9 questionnaire, which is a set of nine questions, and every question has a particular score.

We saw significant improvement at the lower threshold of 0.4. When we started to increase the threshold, the quality definitely degraded. It is not yet convincing enough to be a deployed system, but it is certainly work in progress along this line. What we found is that the PHQ-9-based questions can intrinsically be part of such a network, and can also be used as a foundation for the evaluation network.

That's where we asked: can we make this PHQ-9, which right now is externally part of a loss function, just computing a similarity, intrinsically part of the network, part of the decision-making of the model? That's where we introduced the concept of process knowledge, because clinical knowledge is always process-oriented. It is not unstructured; it is structured, and every sequence of questions is predetermined.
H: No clinician asks the second question first and then the first question; based on the scenario, all the questions are asked in a sequence. So what are the things we want to work on for such a scenario? The first is that we transform the data with a particular knowledge-guided question tag.

It is kind of like setting a threshold again, but with a more conscious involvement of the process knowledge, and that's where we stepped into the safety aspect of the model. The model is now intrinsically taking on the safety aspect, where we are introducing this notion of clinical guidelines intrinsically inside the model. Safety, by definition, has three features, according to what I have seen in the literature.

Robustness is one of the features of safety: it is defined as the AI model continuing to operate within the safe limits of its specifications. The specifications are clinical guidelines or questionnaires, and you are trying to assure that the AI model adheres to these guidelines; if it does, you are able to show explainability in the model.
H: The first thing I talked about was tagging, and this is what I meant. Tagging means that for every input in my text, I am now introducing new tags, saying this is Q2, this is Q3, this is Q9, so that the model knows which part of the content is associated with which question in the PHQ-9. But this is just a dataset; we want to enforce the suggested approach as part of the model as well.

This is just an illustration of the PHQ-9, but essentially what we did was introduce new cross-attention blocks. So now you see that there are nine cross-attention blocks, corresponding to the nine different questions. Now, one can ask me: if you have nine cross-attention blocks, how do the nine different blocks have sufficient knowledge?
H: That's where we utilize lexicons, existing knowledge that people have built over time in mental health, and we use them to populate the definitions of these blocks, giving them sufficient knowledge so that they can compute the attention scores deterministically. That's what we introduced as PHQ-9 intrinsically inside the model, as an auxiliary task. The only thing I am not doing is a separate binary classification for each question type, type 1, type 2, and so on and so forth; instead, I am intrinsically making these blocks part of the PHQ-9 questionnaire. And what interested me is that I can simply replace the nine questions with any other questionnaire that mental health practitioners use, and this network starts to work accordingly.
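A minimal PyTorch sketch of one cross-attention block per questionnaire item, with each block's key/value memory populated from the question text plus lexicon entries; the dimensions, head count, and random stand-in embeddings are illustrative assumptions.

```python
# Sketch: one cross-attention block per PHQ-9 item, each attending from
# the post's token representations to a knowledge memory built from the
# question text plus lexicon entries. Dimensions and the random stand-in
# embeddings are illustrative assumptions.
import torch
import torch.nn as nn

DIM, N_QUESTIONS = 256, 9

class QuestionCrossAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, text_states, knowledge_memory):
        # Queries come from the text; keys/values come from the question's
        # knowledge memory, so the scores are grounded in that question.
        return self.attn(text_states, knowledge_memory, knowledge_memory)

blocks = nn.ModuleList(QuestionCrossAttention(DIM) for _ in range(N_QUESTIONS))

text = torch.randn(1, 40, DIM)  # encoded user post (stand-in)
# Per-question memories built from question text + lexicon (stand-in).
memories = [torch.randn(1, 12, DIM) for _ in range(N_QUESTIONS)]

per_question = [block(text, mem) for block, mem in zip(blocks, memories)]
# per_question[i][1] holds attention weights: which tokens item i+1 attends to.
```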
H: So what was the benefit of such a heavy exercise? What you see on top is a very convoluted attention representation from a self-attention model, and it gives you every possible highlighting in the content. You can put this attention map on a plot by computing the similarity between the attention matrices, because attention values are always between 0 and 1.

If you compute the similarity between those values and the questions, you can see whether those values map to any PHQ-9 question, because the highlighting is carried by the words. So you take these words and compute the similarity with the PHQ-9 questions, and you will see it gives you every possible highlighting in the PHQ-9, because there is no way to check which PHQ-9 question had a higher impact on this content than the others. That's where we wanted to look: can we have more adaptation?
H: So this was my hypothesis: why would we have such a uniform similarity across all the PHQ-9 questions, all the Patient Health Questionnaire questions? What we needed was that if I give PHQ-9 question 1 to the model, it should tell me which part of the content is highlighted; if I give it PHQ-9 question 2, which part of the sentences or of the paragraph is highlighted.
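A sketch of that per-question probe: score each sentence of the post against each PHQ-9 question and report, per question, the best-matching span; the encoder checkpoint, the example sentences, and the question paraphrases are illustrative assumptions.

```python
# Sketch: per-question highlighting by similarity, so each PHQ-9 item
# points at the part of the post it is answerable from. Encoder and
# example text are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
post = ["I have barely slept this week.",
        "Food does not interest me anymore.",
        "Work has been busy as usual."]
questions = ["Trouble falling or staying asleep, or sleeping too much?",
             "Poor appetite or overeating?"]

post_emb = model.encode(post, convert_to_tensor=True)
for q in questions:
    sims = util.cos_sim(model.encode(q, convert_to_tensor=True), post_emb)[0]
    best = int(sims.argmax())
    print(f"{q}\n  -> highlighted: {post[best]} (sim={float(sims[best]):.2f})")
```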
H: What we found is that only specific questions get good treatment from the content. That means those questions are answerable, or can be answered, from the context given by the user; and the questions which did not show up in this list are the potential questions that the clinician, that the model, should be asking the user for more information, if it is required.

Now, when I have this bunch of questions and I want to judge which question to ask, that's where the question comes in: what is the next safe question to generate? That's where we created a series of datasets which start with strategic guidelines: what type of questions need to be generated, and what categories these questions belong to, so that you are able to generate these questions accordingly. Up to this point, everything is automated.

Those are all existing resources, but at this point you need a specific resource that can tell you about safety. ChatGPT in this case does it by human feedback: they put the system online and we were giving the feedback, and that's where things become more relatable. In our scenario, we cannot put the system online, because it could be detrimental; so we need to build a kind of specialist dataset of this particular kind.
H: So what we did in this kind of dataset was introduce a specific algorithm, which is very similar to what ChatGPT introduced. Rather than doing a reinforcement-learning-guided approach with whatever they have proposed, we were defining specific scores that we can compute. The first score is simply the probability of the model. The second is: did the model generate a question of a particular tag or not?

That is a classification. The third point is: is there a similarity with some knowledge base, a knowledge similarity? That is again part of the classification task. And the fourth point is an intersection. So if you look, the first point in this algorithm is a generation, and the rest are all some sort of classification tasks.
H: What you are trying to say is that summing up all of these scores gives the entire loss score that you are computing in your model. So safety was introduced by safety lexicons, and explainability was constructed by using access to the Mayo Clinic knowledge base, checking how much the generated questions actually have some similarity with the knowledge base.
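A hedged sketch of summing those four scores into one training signal; every component function here is a hypothetical stand-in for the pieces just described (generation probability, tag classification, knowledge-base similarity, safety-lexicon intersection), and the unweighted sum is an illustrative assumption.

```python
# Sketch: the four scores described above summed into one loss-like
# signal. All component functions are hypothetical stand-ins, and the
# equal (unweighted) sum is an illustrative assumption.
def combined_score(context, question, tag, safety_lexicon,
                   lm_log_prob, tag_classifier, kb_similarity):
    s1 = lm_log_prob(context, question)      # 1. generation probability
    s2 = tag_classifier(question, tag)       # 2. right question tag? (0/1)
    s3 = kb_similarity(question)             # 3. similarity to a clinical KB
    tokens = set(question.lower().split())
    s4 = len(tokens & safety_lexicon) / max(len(tokens), 1)  # 4. lexicon overlap
    return s1 + s2 + s3 + s4

# Usage with toy stand-ins for each component:
score = combined_score(
    "I have barely slept this week.",
    "How long has your sleep been disturbed?",
    tag="Q3",
    safety_lexicon={"sleep", "disturbed"},
    lm_log_prob=lambda c, q: -2.1,       # stand-in log-probability
    tag_classifier=lambda q, t: 1.0,     # stand-in tag check
    kb_similarity=lambda q: 0.7,         # stand-in KB similarity
)
print(score)
```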
H: If I look at the fourth point, the intersection for safety, we say that when the generated text has a set of tokens that exactly overlap with the safety lexicon, we are able to say that those questions are pretty much the same, or decent enough to be generated. So that's how we are looking at it: we identified what questions should be generated.

We are now modeling which question to generate from the list we have, and then we are making sure that safe generation is going to happen among the generated questions the model provides, by introducing this scheme of explainability and safety. Irrespective of what the ROUGE score shows and what the BLEU score shows, we found that even after introducing such mechanisms of explainability and safety through this algorithm inside the model, we are not hurting the ROUGE score.
H: We are not hurting the BLEU score of the model either, but we are definitely improving the safety aspect, as well as the knowledge-capture aspect of the model. Here, when I say safety, I mean that we are making the model less unsafe rather than more safe.

It was just that we took the terminology and later figured out that it was actually a safety metric, but we submitted the work with that title, with safety as one of the metrics. Another point was knowledge capture, where we measure how many of the terms actually get captured within the Mayo Clinic knowledge base. There was definitely a big effect there that we do not show; we did not see too much capture with the knowledge, but essentially this is work in progress and things will evolve over time.
H: Among the generations we were testing, especially in this particular example, we were able to see that some of the questions do map to the PHQ-9. One question, which I have marked with a star, did not show up in the PHQ-9, but we cannot deny that such a question is safe, because it talks about some antidepressants.

That is generation which goes beyond the scope of the PHQ-9, but that's where you are actually looking at the capabilities of AI, because AI gives you the power of going beyond what has already been constructed by experts, yet with concepts that map to something you can actually refer to or check against.
H: This was a demo that we developed for the entire agent, which we will present in the AAAI 2023 demo track, and it is part of a series of work that we did on process-knowledge-infused learning, specifically trying to introduce explainability and safety intrinsically inside a deep neural network that is capable of generating language.

So, to summarize my talk: I walked you through explainable AI, where I talked about what inherently explainable means and what inherently explainable systems are in the mental health context; what hallucinations, risky generations, and incoherence are, and how they are perceived against real-world knowledge.

Over the years there has been a series of work: datasets that we have built, and new metrics that we have constructed specifically to test explainability, safety, and things of this nature, because metrics like accuracy and F-score fall short of this. That was work we accumulated into what we termed knowledge-intensive language understanding tasks. With this, I end my talk. Thank you all for your attention, and I'm open to any questions.