Description
Understanding communications between source and target requires deciphering the unique language, semantic and contextual characteristics reflected through sentiment, emotion, intention and divergent thinking. A context-aware and knowledge-enhanced computational approach to the analysis of these narratives breaks down this long-running and complex process into contextual building blocks that acknowledge inherent ambiguity, sparsity, and creativity.
Date: 04/02/20
Presenter: Ugur Kursuncu & Amit Sheth
Institution: University of South Carolina
A
All right, we'll get started, and it is my pleasure today to host two speakers. They will co-present a talk on underlying factors of extremism in cyber-social space: Dr. Amit Sheth and Dr. Ugur Kursuncu. I have known Dr. Amit Sheth for over 10 years now, maybe more, and he has done tremendous work in analysis, especially of social media and other forms of data. He is, as of recently, the founding director of the University of South Carolina's university-wide Artificial Intelligence Institute. He is a fellow of IEEE, AAAI, and AAAS. His current research themes are at the intersection of big data, including physical-cyber-social big data, semantic-cognitive-perceptual computing, knowledge-infused learning, and augmented personalized health. Along with Dr. Sheth, we have Dr. Kursuncu, who is a postdoctoral researcher on the AI Institute team; he received his PhD in computer science from the University of Georgia. His research is in the pursuit of gaining a better understanding of online human behavior with social impact, spanning multiple disciplines from computer science to cognitive, political and health sciences. Specifically, one leg of his work focuses on an enhanced analysis of online communications concerning malevolent activities run by ill-intentioned actors, or orchestrated by malicious groups, that are harming our society at large. So with that introduction, I would like to request Dr. Sheth and Dr. Kursuncu to tell us more about their work. Thank you.
B
Thank you for the kind introduction. Really, the real presenter is Ugur; I'm just getting things started. Ugur is coordinating an interesting activity in our group that involves the participation of a whole bunch of collaborators. A number of them are outside South Carolina, and you will see the names of some of them, or most of them, listed as part of the team on the slide. So good, let's go to the second slide here. Sure.
B
The fact that social media platforms have provided the ability for extremist groups to expand their activity and to recruit is reasonably well documented, and the concern is that the platforms themselves are making inadequate efforts. We will probably remark later on why that may be the case, and why research like the one we are presenting is necessary to get a handle on this, such that they can play a bigger role.
B
Governments want the platforms to act, to use their special responsibilities and remove harmful content. Recently I had a chance to give a talk, and it occurred to me, interestingly, that the amount of use of social media for negative activity might have exceeded the amount used for positive activity; or at least the negative activity is so large in percentage that it is becoming a big, big problem. Next page.
B
Even in the narrower context of COVID-19, you start seeing all these headlines. While in the past we have been aware of the use of these online platforms by the extremists, in any particular crisis or event that happens, these guys are really adapting very fast, it now looks like.
B
One of the best-known examples of misuse of social media platforms has been this problem of the travelers: around a thousand Americans during the time frame mentioned on the slide, and since 2011 three hundred Americans have attempted to travel, or have traveled, to these places, incited by jihadists.
B
Why that domain is very critical: as we will demonstrate to you, it is not a problem that can be solved by standard AI techniques or information retrieval techniques. A deeper understanding of the content and conversations is very important, and that requires you to really model the domains, which in our case will be religion, ideology and violence. Understanding the users and modeling the users is also very important: who is a recruiter, who is a follower, and what are the different stages of radicalization?
B
One of our collaborators is a political scientist at UMass, and he has done empirical work; relevant to our particular case is his radicalization scale. He has laid out the scale in five levels, from none to severe, and you can see the description of what those stages are, from mainstream views...
B
No
support
for
particular
moderation,
all
the
way
to
call
for
action
to
join
the
and
fight
the
use
of
violence,
and
there
are
a
bunch
of
indicators
that
have
been
identified
that
help
us
kind
of
connect.
The
content
with
the
you
know
this
different
stages
of
ridiculous
Asian.
So
these
are
the
kind
of
indie
concepts
of
topics
that
are
discussed
at
different
stages
of
radicalization.
B
And so the idea here is to analyze the content in context, which can give us a deeper understanding of the factors characterizing this radicalization process. As you see, we had access to some highly verified data of this process as actually carried out on the social network, and being able to study it lets us understand the process and how, in a methodical way, a non-extremist person is radicalized by a recruiter.
B
If you have wrong results, then you could have harmful outcomes. That is, if predictions are incorrect, extremists will succeed and you will not be able to counter these kinds of activities; but a false alarm is also very jarring: labeling somebody as an extremist when that person is not one can cause a huge backlash, and you have to be very worried about that. So the quality and reliability of this prediction is very important, but also very challenging.
C
So the data set that we have used in this work was verified by Twitter, and later the accounts were suspended by Twitter. The time frame was about seven years, and the data set includes 538 extremist users; we also have an equal number of non-extremist users, who were specifically drawn from a corpus that contains Muslim users. So, the reason that we actually picked Muslim users...
C
So we wanted to explore the data: what do we have in it? We needed to understand the concepts in the data, the language characteristics of the data, how these people are actually using religious concepts and other mainstream concepts that we see and hear all the time, every day, and what we see in the extremist content. So there are actually three...
C
...contexts: one is religion, another is ideology, and the other is hate. Most of the time they are trying to propagate their ideology, but what are they using to do that? They are using religion, and they are also trying to disseminate hatred against Western countries, using their ideology and their religion as well.
C
So that was one observation we made in the data set, and after that we went into the literature to see what it said about the contextual dimensions. It confirmed that political scientists working on religious extremism today also identify roughly the same contextual dimensions: religion, ideology and hate. We then decided that we needed to represent the data based on these different contexts of the content.
C
The reason this is an important approach is that the distribution of the prevalent terms may be different for each dimension, because, as we know, extremists come in and hijack mainstream concepts: they twist the meaning and assign other meanings that serve their ideology, changing the meaning of these concepts.
C
For example, jihad is an important concept in the religious literature and religious resources, but in extremist ideology jihad is also important, and the meaning is very, very different. To be able to detect extremist content and separate it from non-extremist content, jihad is a very important concept, and we need to disambiguate it across its different meanings. Then we can actually...
C
...represent this ambiguous term in our model based on its different meanings in different contexts. For example, jihad might appear in tweets with different meanings: one person might use jihad for something like striving to be a better person, saying "my jihad is always being kind"; on the other hand, another person uses jihad in the context of ideology, saying "the nation of jihad and martyrdom", so jihad is part of their ideological rhetoric.
C
On the other hand, they use jihad to disseminate hatred against the Western countries, or against their out-group, or against their enemy, and they are using jihad for that purpose as well. It is all the same concept but with different meanings, and to differentiate these from each other we use language models to represent the same term for extremists and non-extremists, so we can see the difference, as well as the similarity, with other keywords.
C
As
you
see
here,
the
jihad
is
actually
much
more
similar
to
other
extremist
concepts,
for
example
Allah
key
or
a
key
dollar
share
or
Islamic
state
media.
So
these
are
the
terms
actually
most
of
the
time
extremes
are
actually
using
Allah.
He
is
a
very
well-known
ideologue
and
Ikeda
is
actually
one
concept
that
the
Islam's
extremes
are.
They
keep
using
Islamic
state
media.
We
all
know
I'm
bothering
on
extremist
people
are
using
jihad,
for
maybe
in
our
course
religious
terms.
Quran
Muslims,
Imam.
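The neighborhood comparison described here can be sketched with cosine similarity; the vectors below are hand-set hypothetical stand-ins for embeddings from two per-dimension models, not trained values:

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Hand-set vectors standing in for two dimension-specific embedding models.
ideology_model = {
    "jihad":       (0.9, 0.1, 0.0),
    "martyrdom":   (0.8, 0.2, 0.1),
    "state_media": (0.7, 0.1, 0.2),
}
religion_model = {
    "jihad": (0.1, 0.9, 0.1),
    "quran": (0.2, 0.8, 0.0),
    "imam":  (0.3, 0.6, 0.0),
}

def neighbors(model, term):
    # Other vocabulary items ranked by similarity to `term`, highest first.
    return sorted(((cosine(model[term], vec), word)
                   for word, vec in model.items() if word != term),
                  reverse=True)

# Under the ideology model "jihad" sits near ideological rhetoric; under the
# religion model it sits near mainstream religious terms, and the two
# "jihad" vectors themselves are far apart.
```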
C
So
so
being
able
to
make
this
differentiation,
then
we
decided
to
create
different
contextual,
contextual
dimension
models
and
then
for
each
of
these
dimensions,
religion,
ideology
and
hate.
We
are
creating
a
language
model
and
to
be
able
to
do
that,
so
we
didn't
start
from
the
data
actually,
but
we
we
went
to
an
external
knowledge
resource
for
each
of
these
religion,
ideology
and
hate
emotions.
We
utilize
external
ground,
truth
resources,
the
corpora,
which
has
been
identified
by
our
domain
expert
political
scientist,
domain
expert
and
for
religion.
C
C
...we are using the Quran and hadith, because extremist users are actually making references to those resources. For ideology, we are using the books and lectures of the ideologues, who have been identified by the political scientists as well. For hate, we are using a hate speech corpus that was published before and is specific to Twitter. So these are a kind of modular contextual dimensions, and this approach can be applied to many other social problems as well, as long as we identify the correct contextual dimensions.
C
So
how
we
create
user
representations,
as
we
represent
the
words,
so
we
create
their
represent
representations
based
on
their
similarities,
with
respect
to
their
surrounding
surrounding
keywords,
as
well
as
the
whole
corpus.
So
for
that
reason
our
solution
is
actually
a
distributional
similarity
based
representation
and
we
are
doing
that
for
each
of
the
dimensions,
religion,
ideology
and
hate.
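A minimal sketch of such a distributional user representation, assuming a user's per-dimension vector is the average of that dimension model's word vectors over the user's tweets; the embedding values here are hypothetical:

```python
# Hypothetical per-dimension embedding; the real models are trained on the
# ground-truth corpora (Quran/hadith, ideologue texts, hate-speech corpus).
religion_model = {"quran": (0.2, 0.8, 0.0), "imam": (0.1, 0.7, 0.2)}

def user_vector(words, model, dim=3):
    # Average the dimension model's vectors over the words the user used;
    # words missing from the model are skipped.
    vecs = [model[w] for w in words if w in model]
    if not vecs:
        return (0.0,) * dim  # no evidence in this dimension (sparsity)
    return tuple(sum(component) / len(vecs) for component in zip(*vecs))

tweets = ["my jihad is always being kind", "reading the quran with my imam"]
words = " ".join(tweets).split()
vec = user_vector(words, religion_model)  # ≈ (0.15, 0.75, 0.1)
```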
C
So
after
we
generate
representations,
so
we
decided
to
just
compare
the
extremist
people
with
the
non-extremist
people
how
actually
similar
they
are
and
how
dissimilar
they
are
and
as
we
expected
so
for
religion,
they
are
very
similar,
very
strong
similarity,
because
non
x
rays,
people
are
Muslim
people
and
they're
using
the
religious
language
and
x
rays.
People
are
saying
as
well,
even
though
they
are
using
actually
the
religious
language
in
a
different
way.
But
there
is
some
similarity
here,
but
that
is
very
difficult
to
separate
from
each
other
for
a
language
model.
C
On
the
other
hand,
ideology
there
is
somewhat
similarity,
but
not
that
similar
and
for
hate.
Actually,
it
is
it's
not
that
similar
at
all,
even
though
there
are
some
similarities
between
some
of
the
users.
On
the
other
hand,
when
we
compare
extremist
people
within
each
other,
it
is
interesting
that
we
see
for
religion
so
for
religion
we
were
expecting.
Actually
still,
we
will
see
strong
similarity
between
extremist
users,
but
for
religion.
It
is
not
actually
the
case.
There
is
still
strong
similarity
between
extremist
users
most,
but
there
is
a
group
of
users
in
the
x-axis.
C
If
you
see
there
is
a
group
of
users
that
are
not
similar
at
all
with
with
other
users,
so
that
is
actually
creating
some
suspicions
there
may
be
in
our
extremist
dataset
there.
There
might
be
some
outlier
users,
all
pile
users
who
are
labeled
as
extremists,
but
they
may
be
actually
on
extremists.
So
this
is
just
suspicion
for
now,
and
then
we
carry
on
further,
for
I
mean
just
making
sure
of
the
suspicions
about
the
outlier.
We
actually
create
some
visualizations
for
the
extremist
users,
and
what
we
see
here
is
for
each
of
the
dimension.
C
The
ideology
is
blue,
religion
is
green
and
the
hate
is
red
for
religion
and
hate.
We
are
seeing
actually
a
group
of
users
circled
clusters.
They
are
very
independent
from
the
rest
of
the
group
and
they
are
actually
similar
and
far
from
others,
and
there
is
a
very
small
group
as
well
for
the
ideology,
but
it's
not
that
far
compared
to
religion
and
hate.
C
We
randomly
select
an
users
and
each
time
the
same
users
were
clustered
actually
separately
and
far
from
the
rest
of
the
group
for
religion
and
hate.
So
that
is
that
had
created
a
big
suspicion
that
the
data
set
that
has
outliers.
So
we
used
a
clustering,
hierarchical
clustering
algorithm
to
cluster
the
users
like
the.
C
Religion
of
the
mushroom
ideology
dimension
and
the
hate
dimension,
and
we
did
the
Cystic
of
analysis
whether
this
separation
actually
is
significant,
significant
and
and
we
see
that
it
is.
And
now
we
have
selected
some
random
sample
of
seven
four
seven
six
users
55%
and
gave
it
to
our
food
science
expert
and
he
validated
that
these
users
are
actually
outliers
and
a
couple
score
was
83%.
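A toy sketch of the clustering step: single-linkage agglomerative clustering over hypothetical user vectors, with two small far-away groups standing in for the outlier users (the talk's actual algorithm and features are richer than this):

```python
import math

def single_linkage(points, n_clusters):
    # Agglomerative clustering: start with singleton clusters and repeatedly
    # merge the two clusters whose closest members are nearest.
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters

# A dense "majority" group plus two small far-away groups that would raise
# the kind of outlier suspicion described above (vectors are invented).
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (50, 50), (51, 50)]
clusters = single_linkage(pts, 3)  # [[0, 1, 2], [3, 4], [5, 6]]
```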
C
The outlier users were mostly talking about things like marriage, Allah, Islamic leaders and so on, which are seemingly not strong indicators of extremist content. On the other hand, since we are using three different dimensions, we know that not all extremist users are equal: some users may be at the beginning of the radicalization process, some may be at the end of it, and for each stage or level of radicalization, the intensity of the radical ideology, religion or hate may be different.
C
So
for
some
users
we
might
have
some
sparse
representations
for
religion
for
ideology
or
for
hate,
and
that
is
going
to
impact
our
our
models
for
performance.
So
for
that
reason
we
have
created
an
approach
for
importing
these
partial
plantations
and
reuse
topic
of
similarity
of
the
contacts
to
be
able
to
account
for
the
sparsity
and
after
we
created
our
models.
So
what
we
have
seen
is
our
tried
national
model.
It
performs
best,
as
you
see
here,
by
the
way
Rih.
C
It
stands
for
religion,
ideology
and
hate,
and
the
reason
that
we
are
using
precision
gaze,
we
want
to
emphasize
the
reduction
in
miss
classification
of
non-extremist
users.
Is
that
chef
actually
point
out
before
we
need
to
reduce
false
positives
and
we
need
to
reduce
the
false
positives,
meaning
we
need
to
reduce
the
risk
classification
of
non
extremists
compared
to
reduction
in
miss
classification
of
extremes.
So
we
might
be
actually
a
better
off
missing
on
some
extremist
users
being
misclassified.
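The precision emphasis described here can be made concrete: with "extremist" as the positive class, a false positive is a non-extremist user flagged as extremist. A small self-contained sketch:

```python
def precision_recall(y_true, y_pred, positive="extremist"):
    # A false positive here is a non-extremist wrongly flagged as extremist,
    # the error the talk most wants to avoid.
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

y_true = ["extremist", "extremist", "non", "non", "non"]
y_pred = ["extremist", "non", "non", "extremist", "non"]
p, r = precision_recall(y_true, y_pred)  # p = 0.5, r = 0.5
```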
C
Based on the results, what we have observed in this study is that the domain-specific knowledge that we incorporate in the model creation is quite critical: it improves the reduction of false alarms of the models, and it also reduces the likelihood of an unfair mistreatment of non-extremist individuals, or any potential social discrimination.
C
On
the
other
hand,
studying
actually
different
dimensions,
contextual
dimensions
and
how
they
are
impacting
the
model
performance,
so
what
we
have
seen
here
is
the
extremist
use
potentially
employ
actually
religion
along
with
hate.
Maybe
this
is
a
part
of
their
hate
tactic,
so
they
are
actually
doing
this
purposefully
and
it
might
be
possible
that
they
are
just
fine,
their
hatred,
using
religious,
religious
rhetoric.
C
On
the
other
hand,
each
dimension
plays
different
roles
for
each
of
the
levels
of
the
radicalization
which,
which
was
like
five
levels
of
radicalization,
and
the
nuances
in
each
of
these
radicalization
is
quite
different,
because
each
of
the
dimensions
are
actually
playing
role,
different
roles
in
each
of
them
and
capturing
the
Monza's
in
terms
of
linguistic
or
semantic
characteristics.
We
can
actually
do
this
when
we
break
this
down
into
different
dimensions.
We
can
capture
those
nuances
in
a
more
granular
way
and.
C
We
have
more
actually
research
that
we
are
working
on,
how
to
better
understand
the
factors
of
online
extremism,
and
we
are
working
more
on
actually
other
aspects
of
the
of
the
radicalization
process,
how
these
people
are
able
to
perceive
these
people
and
maybe
another
time
we
can
talk
more
about
other
aspects
as
well.
Thank
you.
C
So these are the representations generated from language models; in this case we have used word2vec. To be able to create this, we are just measuring the distance between the representations using cosine similarity, and then creating this heat map.
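A sketch of the pairwise measurement behind such a heat map, with hypothetical two-dimensional user vectors:

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Hypothetical user vectors; the heat map is just the pairwise similarity matrix.
users = {"u1": (1.0, 0.0), "u2": (0.9, 0.1), "u3": (0.0, 1.0)}
names = list(users)
heatmap = [[cosine(users[a], users[b]) for b in names] for a in names]
# u1 and u2 are near-identical; u3 is orthogonal to u1.
```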
C
Actually, it was a group of volunteers called the Lucky Troll Club, and they were identifying likely ISIS supporters and reporting them to Twitter; Twitter was reviewing those reports and then suspending the accounts after verifying that they were actually ISIS supporters. We took the list of users as it was published, then we gathered their past tweets and combined the two data sets accordingly.
C
We created representations using these external resources. For example, for the religion dimension we created one word2vec model; for ideology, another model; and for hate, another model. Using these models we created the user representation of each user, meaning we aggregated the tweets for each user and then created a representation accordingly.
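The aggregation just described can be done in more than one order, and the choice matters when tweets differ in length. A toy illustration with one-dimensional hypothetical vectors:

```python
def avg(vecs):
    # Component-wise mean of a list of equal-length vectors.
    return tuple(sum(c) / len(vecs) for c in zip(*vecs))

# One-dimensional hypothetical embeddings; one short tweet, one longer tweet.
emb = {"a": (1.0,), "b": (3.0,), "c": (5.0,)}
tweets = [["a"], ["b", "c"]]

# Order 1: flat average over all words in all tweets.
flat = avg([emb[w] for t in tweets for w in t])              # (3.0,)
# Order 2: average within each tweet, then across tweets.
per_tweet = avg([avg([emb[w] for w in t]) for t in tweets])  # (2.5,)
```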
D
Do you average across the words in a tweet, or across all the words in all the tweets? Or do you first average across words and then across tweets? I'm asking because there are lots of different ways to do this, and the results depend on it, so I wanted to understand how you did that.
C
We have network information, but in this specific study we didn't look at the network interactions; how these people interact with each other is something we are also working on right now. It was interesting that when we looked at the information density for each of the dimensions for the users, there are differences between religion and ideology. For some of the users, the ideology content is much denser compared to the majority, and those ideology-heavy users are a small number of users.
C
So the information density was suggesting that there might be a group of users trying to disseminate ideological content to the followers they are trying to radicalize, and we are actually in the process of identifying those recruiters and followers at the intersection of network analysis and their content as well. Yeah.
E
No, I think it's fascinating. I am a traditional content analysis person; I feel that content is extremely powerful, and I have been a little frustrated that people have gone for network, not content. So wherever you can try to get a useful intersection of the two, instead of just talking about content or just talking about network, it's really, really useful, not just in this context but in general, in terms of leveraging their particular importance.
B
I think, to the points that have been made: what makes this much more exciting, at least for me, is that a lot of work has been done in content analysis, and often the tendency is to let the data speak and see what we can find, what patterns we can find. But when we go into this complex domain, you cannot capture these levels and the steps in the process without having the domain aspect of it, without actually building good domain models, and without taking the step of combining the statistical processing, the word2vec kind of stuff and embeddings, with domain infusion, as we do. You know, our group's key emphasis these days is knowledge-infused deep learning, and this is just one of the instances of that line of work that we are doing.
E
I think it's great, and you're totally preaching to the choir here, as I'm a political scientist. Right now we're looking at Russian disinformation narratives, but I'm starting with the narrative, not with suspected trolls or bots or anything like that, and it's complicated. But I completely agree with you about this signal-noise issue, and I totally think that having a ground-truth corpus is just really, really important.
C
So it is very likely that they failed to separate the accounts that are actually extremist from the non-extremist ones, and there are around 10 percent outliers; maybe that was the reason. On the other hand, it is important to generate true representations based on their meanings with respect to these contexts, religion, ideology and hate, because the very same concepts are going to be represented differently. For example, the vectors, the representations, that we generate here for religion...
C
The representation of the same concept, jihad, for religion is not going to be the same representation for ideology; they are going to be very different from each other, as I showed in the earlier slides. Over here the same concept is very similar to extremist concepts that we already know, and the other jihad representation was close to other concepts that we know are non-extremist.
C
So
this
is
actually
how
we
can
separate
or
the
same
business
ambiguities
concepts
from
each
other.
That
is
going
to
contribute
a
lot
to
the
performance
of
the
model,
specifically
reduction
in
false
alarms
and
once
any
anyone
wants
to
employ
this
kind
of
language
model,
they
need
to
account
for
false
alarm
and
their
implication
because
is
going
to
implicate
millions
of
people.
For
that
is
done.
This
is
significant.
I.
D
So did you use only original tweets, or did you also use the content of retweets? I was wondering because perhaps some of these people, who are not necessarily extremists from your perspective when you look at the way they write, have occasionally retweeted something posted by somebody else, and that might have affected Twitter's decision, but it may or may not have affected your classification, depending on whether or not you consider the content of retweets. So I was curious about how you chose.
C
We were also curious, actually, about these outliers: since the data set was verified by Twitter, what was the reason for that particular so-called mistake, if there is a mistake? The team that verified all the reported users, to reveal whether they were ISIS supporters or not, was actually a general abuse-detection team; they were the team verifying users who were abusing the platform, spreading bad behavior. So that was actually the case.
D
Thank you, that answers my question. Of course, that is always a challenge in these cases, because you can get relatively high, very good performance within the domain with cross-validation, but these kinds of tasks in real-world scenarios are really challenging when you go cross-domain.
D
We find that our models, which are for doing some different detection tasks, do really, really well within domain but very poorly out of domain. So that's always a challenge, of course, for all of these applications. But let me thank you guys.
A
Really great presentation; I learned a lot. Thanks also for the questions, which definitely helped us know a little bit more about the approach. If I may, I'd like to ask one question to Dr. Sheth and Dr. Kursuncu: given this study, and as we know that the behaviors keep evolving, how do you see the application of your research, or the models that you have developed? What would need to be done to detect new forms of behavior, to detect the same users with their new forms of behavior? Not sure if I made my question clear. Sure.
A
When
I
took
there
are
multiple
dimensions
to
that
question
one
is
we
sometimes
see
that
people
share
things
on
Twitter
when
they're
in
the
initial
stages,
but
then
they
move
to
some
private
platforms,
almost
closed
platforms
when
they
are
getting
close
in
recruitment
stages
or
level
of
for
it
from
your
scale.
Yes,
that
is
something
that
we
have
observed,
that
they
move
or
migrate
from
one
platform
to
the
other.
That
is
one
form
of
behavior.
C
On the other hand, we believe that dynamic knowledge graphs can address this question: we can represent the online behavior of these groups in a knowledge graph, and this knowledge graph can evolve over time depending on the changes in this behavior, so that we can incorporate that knowledge in a model in a continuous manner. This is still an open question for us: how you can continuously evolve this knowledge graph first, and then how you can incorporate it in the model.
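One way to sketch such a time-aware knowledge graph, as a hypothetical structure rather than the authors' actual design:

```python
from collections import defaultdict

class DynamicKG:
    # Minimal time-stamped knowledge graph: each (subject, relation) edge
    # records when an object was observed, so the graph can evolve.
    def __init__(self):
        self.edges = defaultdict(list)  # (subject, relation) -> [(object, t)]

    def add(self, subject, relation, obj, t):
        self.edges[(subject, relation)].append((obj, t))

    def current(self, subject, relation):
        # Most recently observed object for this subject/relation, if any.
        observed = self.edges[(subject, relation)]
        return max(observed, key=lambda pair: pair[1])[0] if observed else None

kg = DynamicKG()
kg.add("groupX", "dominant_topic", "open_recruitment", t=1)
kg.add("groupX", "dominant_topic", "migration_to_private_platforms", t=2)
# kg.current("groupX", "dominant_topic") now reflects the newer behavior.
```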
B
How do you detect that those concepts are now there as part of the conversation, and how do you enhance and extend your knowledge graph, or knowledge model, for that domain to include them? That part we have been working on for some time. I have not been able to give enough attention to it because of other problems going on, but it's been a theme that we will be working more on.
B
So
that
means
the
topic
drift
or
concept
reef,
a
kind
of
stuff
that
we
expect
to
capture
along
this
line.
There
is
also
increasing
value
for
incorporating
the
time
concept,
meaning
the
concern
when
a
particular
concept
becomes
dominant
in
a
particular
part
of
conversation,
and
when
we
incorporate
that
into
our
knowledge
graph
to
to
also
have
a
time
information
record.
That
also
becomes
part
of
the
you
know,
domain-specific
approach
that
we
pursue,
but
your
question
will
be
months
more
than
this
and
just
to
know
all
the
ways
that
we
need
to
look
at
I
would.