Description
Presenter: Dr. Ugur Kursuncu
Bio: Ugur Kursuncu is an Assistant Professor at Georgia State University. He received his Ph.D. in Computer Science from the University of Georgia, and was previously a Postdoctoral Fellow at the Artificial Intelligence (AI) Institute, University of South Carolina. His research is at the intersection of humans, society, and computers, designing socially responsible, human-centered intelligent systems that integrate knowledge to enhance contemporary AI and data science methods.
A: I'm going to talk about our work in the space of cyber social threats, focusing specifically on the online extremism research that we have been working on for a few years now.
A: I'll give an overall introduction covering our motivation, our problem statements, the challenges we face, and possible solutions, but I'm going to focus mostly on the contextual nature of extremism. Extremism is one problem in the space of cyber social threats, and I'll specifically talk about malevolent creativity and its impact on online extremism conversations on social media.
A: As you know, social media has been very useful, not only for people to express their opinions and have conversations on topics of interest, but also as a tool for researchers to understand human behavior, that is, why people behave the way they do. Organizations also take advantage of this tool to understand their audiences; companies, for example, learn about their customers through social media.
A: On the other hand, social media can also be misused, by individuals, by organizations, and by states as well. The problems that arise on online platforms include misinformation, extremism, toxicity, hate speech, and cyberbullying.
A: These problems can evolve organically on the platforms, but they can also be exacerbated by extremist groups, by terrorist organizations, or even by other organizations. In other words, they can be unintended, but they can also be deliberate. In this case, what we can do is first understand what kind of problems we are tackling, then try to detect them, and then counter or control the harm using social big data.
A: Of course, there are ethical considerations that we have to account for while performing these analyses and developing models with the techniques we come up with. To give an example, consider online toxicity, cyberbullying, and harassment.
A: These are increasingly important problems on these platforms. For example, students who experience bullying are nearly two times as likely to attempt suicide. More importantly, cyberbullying is increasing on social media among adolescents, teenagers, and high school students, and the bullying or harassment can target race, gender, sexual orientation, disability, or religion.
A: Online toxicity, cyberbullying, harassment, and extremism are just examples, but all of them share one characteristic, and that is context. Context is quite critical for developing reliable and robust analysis approaches, as well as models for detecting and countering these problems on social media.
A: When you look at the spectrum of harm on social media, there are different spaces where people are working with social media data, but some are more harmful than others, for example extremism, disinformation, harassment, or illicit drugs that we want to monitor. All of these more harmful communications share one characteristic, context. So what is the importance of context in these domains?
A: Even for humans, it is often very difficult to distinguish what is extremist from what is not, what is disinformation from what is not, or what is harassment from what is not. So when you develop an analysis or a model to detect this type of conversation on social media, or the users disseminating it, most of the time the model is going to get confused.
A: What is a harmful conversation and what is not? For example, consider a communication between two high school friends where one student is harassing the other, and that harassment is happening on social media too, with curse words or otherwise harmful language exchanged between them.
A: On the other hand, it's really difficult to label that exchange as harassment just because there is some harmful language in the content, because regular friends also share this type of language simply as a joke, and that is not necessarily harassment. So how are you going to distinguish a harassing conversation from a non-harassing one? Context becomes a critical component of the content here.
A: So the problem we are tackling is this: what is the impact of context? We are also focusing on malevolent creativity in the content, looking at the impact of context and malevolent creativity on individuals and groups in online echo chambers of cyber social threats, and asking how we can mine and leverage actionable insights from social media to address these questions.
A: In our methods, we incorporate external knowledge into machine learning algorithms. Our question, from a methodological perspective, is: can external knowledge help address these challenges? We use knowledge-infused learning, a method that leverages external knowledge in the form of corpora, lexicons, knowledge bases, ontologies, and knowledge graphs.
A: The contextual nature of online extremism is very important here, because the meaning of the content changes depending on the perspective of the user, and the power of extremism online has been quite influential so far, on the order of a thousand Americans between 1980 and 2011.
A: On the other hand, there are challenges in detecting this type of conversation, because the contextual nature of the problem is quite difficult. First of all, radicalization is a persuasive process over time: these communications are constructed and intended to shape, reinforce, or change the response of another person or group, to change their behavior, their point of view, or their belief system.
A: Most of the time they use domain knowledge for persuading individuals. Since we are talking about a specific type of extremism here, Islamist extremism, domain knowledge is critical, because most of the time the extremist groups, organizations, and individuals on social media draw on knowledge from the domain of the religion of Islam. So to understand how a person is persuaded, and what contextual nature the problem has, we need that domain knowledge.
A: We need specific knowledge about the religion of Islam so that we can incorporate it into our analyses and the models we develop, and, more importantly, whatever solution we come up with has to be a fair solution.
A: We see here that the phrases and topics in blue are mostly about ideology, for example Syria, Islamic State, Iraq, news coming from their media channels, Shia militia, ISIS attacks, and other political or ideology-related phrases. The green ones are mostly religious phrases and topics, for example Muslim, Allah, Islam, mercy, and Jannah.
A: These religious topics are most of the time also shared by non-extremist individuals and organizations, because Muslims generally talk about such topics. The ones in red, however, are mostly about harming another person or group.
A: For example: kill, attack, break, or destroy. Each of these themes requires a separate type of approach and consideration to understand its context, because each is a different context, and they need to be treated separately to understand what they are about and from what perspective people are talking about them.
A: As we saw before, extremist content is multi-dimensional, and the dimensions of its context are religion, ideology, and hate, as we identified. The distribution of prevalent terms in each dimension is different, so the meaning is going to be very different in each context as well.
A: For example, take the keyword jihad: the meaning of jihad in the religion dimension is going to be very different from its meaning in ideology or in hate. For this reason we need to use these contextual dimensions to disambiguate common terms. Jihad is a critical term that we need to use in our model, but we also need to disambiguate its meaning to be able to create a reliable model.
A: Purely data-driven machine learning models can distinguish or disambiguate the meanings of such terms to some extent, but they are not going to distinguish the meanings explicitly. Let's run through the same example with jihad.
A: In this case, the model is going to be confused, or it is not going to be as successful in distinguishing the meaning behind jihad, because the term is close to violence and at the same time close to God and prayer. These are conflicting meanings.
A: Now consider external knowledge in the form of a knowledge graph where we define the different meanings of jihad. On the right-hand side, jihad is related to violence, killing, attack, and hate; on the other hand, another meaning of jihad is related to prayer, God, and paradise.
A: Using this knowledge graph as an external resource gives us the opportunity to distinguish the meaning of the term in an explicit manner and incorporate that into a machine learning model, yielding explicitly separated meanings of the same keyword in the content.
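The idea of explicitly separating senses with a knowledge graph can be sketched in a few lines. This is an editorial illustration, not the speaker's actual system: the two sense neighborhoods and the overlap-based scoring rule are invented for the example.

```python
# Toy sketch: two senses of "jihad" represented as neighborhoods of
# related concepts in a hand-built knowledge graph, and a post scored
# against each sense by how many related concepts it mentions.
SENSES = {
    "jihad/hate": {"violence", "killing", "attack", "hate"},
    "jihad/religion": {"prayer", "god", "paradise", "kindness"},
}

def disambiguate(tokens):
    """Return the sense whose concept neighborhood overlaps the post most."""
    words = set(t.lower() for t in tokens)
    scores = {sense: len(words & related) for sense, related in SENSES.items()}
    return max(scores, key=scores.get)

post = "be kind always and keep prayer in your jihad".split()
print(disambiguate(post))  # → jihad/religion
```

A real system would use a curated knowledge graph and learned embeddings rather than bare set overlap, but the principle of an explicit, inspectable sense assignment is the same.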
A: Here is an example from the content using jihad. From a religious perspective, jihad is used in the first example as being about kindness and helping other people. It says: "Kindness is a language which the blind can see and the deaf can hear. Mind jihad, be kind always." So being kind to others is a jihad, and that is the religious meaning in that context. At the bottom, on the other hand, the content talks about jihad from a political perspective: "the nation of jihad and martyrdom can never be defeated."
A: So that one talks about the political perspective of jihad: martyrdom can be carried out for the sake of a national jihad, which refers to a political organization or a political meaning, and that is about ideology. On the right-hand side, however, the content is about harming others: it is about killing other people, and they call it supreme jihad.
A: That is the hate perspective. All of these perspectives give different contextual meanings to the same keyword, and it is critical to separate them and treat them as distinct semantic meanings of that keyword.
A: We can do this first using language models. We can use any language model, such as BERT or GPT-2, and they will give us the similarity of words learned from large corpora, based mostly on the distributional similarity of a keyword's representations, that is, on the keywords surrounding the term.
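As a rough illustration of distributional similarity (the principle behind such models, not their implementation), a word can be represented by the counts of words that appear around it and compared to other words with cosine similarity. The tiny corpus here is invented:

```python
from collections import Counter
from math import sqrt

corpus = [
    "jihad means prayer and devotion to god",
    "jihad means violence and attack",
    "prayer and devotion to god bring peace",
]

def context_vector(word, window=2):
    """Count the words appearing within `window` positions of `word`."""
    vec = Counter()
    for sentence in corpus:
        toks = sentence.split()
        for i, t in enumerate(toks):
            if t == word:
                for j in range(max(0, i - window), min(len(toks), i + window + 1)):
                    if j != i:
                        vec[toks[j]] += 1
    return vec

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u)
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

print(cosine(context_vector("jihad"), context_vector("prayer")))
```

Because "jihad" occurs in both religious and violent sentences, its context vector mixes both neighborhoods, which is exactly the ambiguity the talk describes.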
A: On the other hand, we can also capture context from a knowledge graph, as I mentioned before. From a knowledge graph, one solution comes through the distance between concepts: for each concept in the content we have an entity in the knowledge graph, and the knowledge graph also holds the relationships between these entities, and those relationships give you the contextual information of the content.
A: That is the contextual information you can extract from a knowledge graph.
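A minimal sketch of the distance-between-concepts idea, using shortest-path length over a hypothetical mini knowledge graph (the graph and concepts are illustrative, not from the speaker's resources):

```python
from collections import deque

# Hypothetical mini knowledge graph as an adjacency list; the shortest
# path length between two concepts serves as the contextual signal.
GRAPH = {
    "jihad": ["violence", "prayer"],
    "violence": ["jihad", "attack"],
    "attack": ["violence"],
    "prayer": ["jihad", "god"],
    "god": ["prayer", "paradise"],
    "paradise": ["god"],
}

def distance(src, dst):
    """BFS shortest-path length between two concepts (None if unreachable)."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, d = queue.popleft()
        if node == dst:
            return d
        for nxt in GRAPH.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return None

print(distance("jihad", "paradise"))  # → 3
```

Concepts that are few hops apart share context; in a real pipeline this distance would typically be replaced or complemented by relation types and graph embeddings.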
A: In this case, we first have a contextual dimension modeling approach with three different dimensions, and for the three dimensions we have different resources. For the ideology and hate dimensions we have corpora from books and lectures of ideologues, and a hate speech corpus; for religion we use the Quran and Hadith, and we also use a Quran ontology.
A: Very recently we also developed our own Islamist extremism ontology, and we are experimenting with that in this study too. Now, on to results.
A: When we develop a machine learning model using all three dimensions together, it performs better than models developed with only one or two dimensions. We are specifically using precision as the metric here, because we want to emphasize reducing the misclassification of non-extremist content.
A: We don't want the model to ignore the unfair misclassification of non-extremist content, because the implications in such a case would be very large when you consider deploying this kind of model on a platform used by millions of people. So we use precision as the main evaluation metric for these reasons.
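Precision captures exactly this concern: of everything the model flags as extremist, how much truly is? Every false positive is a non-extremist user mislabeled, which is the costly error described above. A small illustrative computation (the labels are made up):

```python
def precision(y_true, y_pred, positive=1):
    """Fraction of predicted positives that are truly positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if tp + fp else 0.0

y_true = [1, 0, 1, 1, 0, 0]  # 1 = extremist, 0 = non-extremist
y_pred = [1, 1, 1, 0, 0, 0]
print(precision(y_true, y_pred))  # 2 of 3 flagged are truly positive → ~0.667
```

The same quantity is available as `precision_score` in scikit-learn; the hand-rolled version just makes the false-positive cost explicit.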
A: On top of this, we are also trying to understand the malevolent creativity in extremist content. We are studying this to understand why extremist groups are so effective and efficient in influencing others online, because when you look at the statistics, they are very successful at persuading and influencing other people, sometimes over the course of a year or two, to the point where those people attempt to travel.
A: They decide to leave their families and attempt to travel to join ISIS or other terrorist organizations, and we want to understand why these extremist groups are so effective at influencing others.
A: Context is very important to detect, but what else matters in their content? We have been looking at the malevolent creativity they use in their content as well. This was previously studied for other topics in the space of cyber social threats, specifically misinformation, back in 2018.
A: That seminal work was done by researchers at MIT. They trained an LDA topic model on a large social media dataset, identified 200 topics, and compared false news and true news using the metrics of KL divergence, information uniqueness, and semantic distance. What they found was that falsehoods, the false news, spread significantly farther, faster, and deeper on social media than the true news.
A: They also found that false news is more novel than true news, and that people are more likely to share false news. Humans inherently tend to share, and be influenced by, novel content over content that is not novel. So that is the novelty they found in the content, and that also relates to malevolent creativity.
A: So what actually is malevolent creativity? Researchers in social science, specifically in educational psychology, identify four characteristics of malevolent creativity: novelty, originality, surprise, and divergence. Creativity is about the usefulness and novelty of ideas, but they also have to be relevant. For example, someone might talk about different topics and give new information all the time, but it has to be relevant too.
A: It has to be useful to the audience to be effective, and it has to be coherent throughout the content being provided. So for this reason, along with novelty, originality, surprise, and divergence, it also has to be relevant.
A: In this context we are looking at creativity in the extremism problem. As we work on online extremism with a focus on Islamist extremism, the question is how malevolent creativity is being employed in Islamist extremist content.
A: We compare the extremist content to non-extremist content to understand this, using three different metrics: one is KL divergence, another is cosine similarity, and the third is entropy, and with these we measure semantic distance. We look at the novelty and originality in the content. KL divergence is a measure of the difference between two probability distributions: if two distributions are similar, the KL divergence between them will be small.
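For discrete topic distributions this can be computed directly. A small sketch with invented distributions:

```python
from math import log

def kl(p, q):
    """KL divergence D(p || q) for discrete distributions over the same topics."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

same = [0.5, 0.3, 0.2]
print(kl(same, same))                        # 0.0 for identical distributions
print(kl([0.5, 0.3, 0.2], [0.1, 0.1, 0.8]))  # larger: more dissimilar
```

Note that KL divergence is asymmetric and assumes `q` is nonzero wherever `p` is; libraries such as SciPy expose the same quantity as `scipy.stats.entropy(p, q)`.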
A: So you can also view KL divergence as a measure of dissimilarity: the higher it is, the more dissimilar the content, and more dissimilar content implies more original or more novel content.
A: Looking at the heat maps, the value is much higher for the extremist content than for the non-extremist content: 67 percent of the extremist content is original, while for the non-extremist content it is only 29 percent.
A: That, by the way, is the religion perspective: for religion, extremist users are more original, at 67 percent, while non-extremist users are not that original, at 29 percent. For the ideology perspective we again look at the KL divergence.
A: Here the extremist group is 98 percent original and the non-extremist group is 95 percent, so the difference is not very large, but extremist users are still more original. When we look at hate, the extremist group is at 40 percent originality compared to one percent in the non-extremist group, so the extremist group again has more original, more novel content from a hate perspective.
A: Then we look at how different the two groups are under different metrics, using KL divergence and cosine similarity. The two behave differently: for cosine, higher means more similar; for KL, higher means more dissimilar. In this case, for the hate content, the extremist group is at around four percent similarity, meaning the rest is dissimilar, compared to around 13 percent similarity for the non-extremist group. The differences between the two groups are also statistically significant.
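The opposite directions of the two metrics can be sketched on invented topic distributions (these numbers are illustrative, not the study's data):

```python
from math import log, sqrt

def cosine(u, v):
    """Cosine similarity: higher means more similar."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def kl(p, q):
    """KL divergence: higher means more dissimilar."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

extremist, non_extremist = [0.7, 0.2, 0.1], [0.1, 0.3, 0.6]
print(cosine(extremist, non_extremist))  # low cosine → dissimilar
print(kl(extremist, non_extremist))      # high KL → dissimilar
```

So the same pair of distributions scores low on one metric and high on the other, which is why the talk interprets them in opposite directions.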
A: For entropy we use Shannon entropy, which is a measure of surprise. Looking at entropy, we find that extremist users have more unique content than non-extremist users. Based on the statistics and the distributions here, the minimum entropy belongs to a member of the non-extremist group, in orange, and the maximum entropy to a member of the extremist group, in blue, and we can compare those distributions.
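Shannon entropy over a user's word distribution can be sketched as follows (the token lists are invented): a user who repeats the same words has low entropy, while a varied vocabulary, as reported for the extremist group, pushes entropy up.

```python
from math import log2
from collections import Counter

def entropy(tokens):
    """Shannon entropy (bits) of the empirical word distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

repetitive = "attack attack attack attack".split()
varied = "jihad prayer attack paradise".split()
print(entropy(repetitive))  # 0.0: no surprise at all
print(entropy(varied))      # 2.0: four equally likely words
```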
A: Going back to context, we build a detection model using a contextual approach, where we learn representations with sequential models, an LSTM and a BiLSTM. As you know, LSTMs preserve long-range dependencies in the content, thereby capturing context, and we also extract knowledge from a knowledge graph, as you see here, in another approach.
A: We also model using a creativity approach. For creativity, as I mentioned before, the variation in the language and how unique the content is are both very important, and we see a difference between the extremist and non-extremist groups from an originality and novelty perspective. For this reason we use a variational autoencoder here, to better capture variations in language.
A: Again we use the knowledge graph, creating knowledge embeddings from it and using them in the classification.
A: Let's look at the results, alongside results from the previous study I told you about before, which used a random forest with the three dimensions.
A: With three dimensions, the random forest achieves 93 percent. With the LSTM and BiLSTM it goes to 94 and 95 percent with the three dimensions, and when we incorporate the knowledge embeddings it goes up to 96 percent. Very interestingly, with the variational autoencoder it goes up even further, to 98 percent.
A: We see the same or similar improvement in precision: with the variational autoencoder, precision improves from 97 to 98 percent.
A: Overall, what we learned from this study is that extremist users employ different tactics to persuade their targets, and malevolent creativity is an important tactic they use in their content creation, which lets them influence other people more effectively. Each dimension has different creativity characteristics based on linguistic and semantic cues: as we saw before, the creativity characteristics come out differently for the religion, ideology, and hate perspectives.
A: And, of course, domain-specific knowledge reduces false alarms, which helps prevent potential social discrimination. That is an especially important point, because when we deploy such a model on a platform used by millions of people, unfair mistreatment is likely to occur simply because of misclassifications by the model, and that would mostly affect non-extremist individuals who are not harmful in any way.
A: In conclusion, social media data gives us a tool to understand these dynamics: how these malicious groups and organizations deliver their content and messages to their audiences, and what kinds of tactics they use. More importantly, malevolent creativity has been very effective for extremism, as it was for misinformation and other online toxicity problems, and prior knowledge helps us address these problems.
C: Thank you very much. If people are thinking about questions, I can get us started. First of all, very good talk.
C: There were a couple of points where I was a bit curious, and one of them was at the very beginning, when you were mentioning context and showed some data about cyberbullying, which I have also done some research on. I was wondering: are these typically surveys, self-reported kinds of data, or are there computational tools behind them, like toxicity measures or anything like that? Do you know how this was collected?
A: This one, yes, these were surveys, I believe, and they come from other research organizations and papers as well, which you can look up.
C: Yeah, I understand, there were a lot of citations at the bottom; I was just curious whether these are mostly surveys. Okay, because I'm always curious whether there is a newer computational way to look at this. And sorry, one more: I think you said your dataset is mostly Twitter. Did you look at a particular time frame or geolocation, or was this mostly hashtags only?
A: Specifically, yes, we are using tweets, covering about seven years of data from 2011 to 2017, and these tweets were verified by Twitter.
A: Even though these tweets were verified by Twitter, we found some outliers, meaning accounts that had been verified as extremist tweets or extremist individuals but, looking at their content from a domain-expert perspective, would not be considered extremist. We identified these outliers, about ten percent of the users, who were labeled as extremists but were not extremists in the first place, and we removed them from our datasets.
A: After we identified those outliers and looked at the content of the extremist groups, they were mostly trying to promote their ideology. ISIS at the time ended up being able to hold some territories and claimed itself a state, and they were trying to promote that state and their ideology using ideological statements from the ideologues they follow, for example Imam Anwar al-Awlaki.
A: This person was the number one ideologue followed by ISIS and many other terrorist organizations as well, and they were promoting his ideas and the things he had said, along with other ideologues, and they were talking about attacks that ISIS carried out at the time, or air strikes, or killing other people; this type of content was being exchanged all the time.
A: On the other hand, they were also attributing this type of content to religious resources: they talk about the Quran, about Hadith, and about Allah. These are all concepts shared between extremist and non-extremist groups, and that creates a lot of confusion around them, so we had to disambiguate the meanings of these concepts as well.
C: I see, thank you very much. I do have a follow-up to that, but before I ask it, does anyone else have any questions?
B: Yeah, I've got a couple of questions. Good talk, too, by the way. On page 26, when you're looking at KL divergence, what are you measuring again? What are your x and y axes? Are those topics or tweets, or what are you measuring?
A: These are the extremist and non-extremist groups and how original their content is relative to each other. The groups are on the x-axis; on the y-axis are users, and we measure how original these users are with respect to each other, that is, whether they are repeating each other or providing novel content or novel information to their audience. If they are repeating each other, the content is not original or novel; if they are providing new, original information, it is novel.
A: So we look at KL divergence as a measure of semantic distance and try to see how original their content is.
A: It's not a sample; we are looking at all of the users here. We are just skipping some numbers on the axis because there is no space to put them all.
B
Of
them
yeah,
okay,
gotcha
yeah,
it
makes
sense.
Okay,
good
now,
that's
yeah,
very
interesting
concept
here
in
a
way
you're
using
that
the
other
question
I,
guess
somewhere
related
to
that.
In
order
to
do
that
comparison,
are
you
extracting
that
text
through
some
kind
of
topic
modeling
or
what
language
model?
How
are
you
extracting,
and
even
back
in
the
previous,
so
we
were
just
discussing
for
Thomas's
question
where
you
had
all
the
all
of
the
phrases
and
topics
extracted
modeling
and
what
are.
A: There we also use language models, word2vec, to create n-grams and key phrases; that's how we create these topics and key phrases.
C: All right, if not, I will be closing us out with a couple of announcements. Thank you again to our speaker. Two things coming up quite soon: the s9.22 conference.
C: Actually, people may want to see the slides. I don't know if there's a place to share that kind of resource, unless you want to exchange contacts. There you go, I guess, Billy.
C: If you want to reach out, yeah. So the submission deadline is coming up: it is going to be June 9th for the main track, but we have some others, like the PhD track on June 18th, so do check the website. Similarly, SBP is coming up in July, with the regular submission deadline on July 1st, and similarly for the other tracks.
C: The other tracks have varying deadlines, but that is the time frame you're looking at. Yes, thank you again for speaking, and thank you everyone for attending. With that, Chris, if there's nothing else, I think we can close things out.