From YouTube: Chatanomaly - Chetan Surpur
Description
2015 HTM Challenge Application submission (ineligible for prizes).
A
The first one we're going to do is called Chatanomaly. I'm going to note that some of these submissions are actually ineligible for prizes because they were worked on by Numenta employees, or by partners of Numenta and their employees. But I really felt bad about excluding them from the showcase. I mean, why not show what people have done? So I'm still going to show them.
B
Hello NuPIC, this is Chetan Surpur. I'm going to be demoing my NuPIC challenge project. This project is an example NuPIC application that lets you take your own Google Hangouts data and feed the message activity into NuPIC to detect anomalies in that activity. The project is on GitHub at this URL. You can follow the installation instructions here; one of them is to visit Google Takeout and download the Hangouts data that Google provides.
C
B
Once you've done that, you can look in this data directory and see that it has extracted all the conversations. Each of these numbers is the total number of messages in that conversation. So let's select one of these conversations and run it through NuPIC. I'll pick this one here, the first one. Running this command sends the data through NuPIC, puts the results in the output directory, and plots them.
B
This is going to take some time, so let's just let it run. Okay, we have some results, so let's take a look. Up at the top we have the number of messages over time: the x axis is time and the y axis is the number of messages in a one-hour aggregation window. At the bottom we have the anomaly score that NuPIC produced for each moment in time, and these red indicators mark where the anomaly likelihood was over fifty percent.
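The top plot is a count of messages per one-hour window. A minimal sketch of that aggregation step, assuming the messages have already been parsed into Python `datetime` objects; the helper name `hourly_counts` is hypothetical and not taken from the project's code:

```python
from collections import Counter
from datetime import datetime, timedelta


def hourly_counts(timestamps):
    """Bucket message timestamps into one-hour windows (hypothetical helper).

    Returns (hour_start, count) pairs for every hour between the first and the
    last message, so quiet hours appear explicitly as zero counts.
    """
    if not timestamps:
        return []
    # Truncate each timestamp to the start of its hour and count per hour.
    buckets = Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )
    series = []
    hour, last = min(buckets), max(buckets)
    while hour <= last:
        series.append((hour, buckets.get(hour, 0)))
        hour += timedelta(hours=1)
    return series


msgs = [
    datetime(2015, 1, 1, 9, 5),
    datetime(2015, 1, 1, 9, 40),
    datetime(2015, 1, 1, 11, 15),
]
print([count for _, count in hourly_counts(msgs)])  # [2, 0, 1]
```

Keeping the empty 10:00 bucket is a deliberate choice in this sketch, so that silences reach the model as zeros rather than disappearing from the series.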
B
So we can see that there are some anomaly indications when something unusual is happening. Here we had some spikes, over here the messages start getting more frequent, and here we have a big spike. The interesting thing is that there are only a few anomalies in this almost one-year window, and that's because NuPIC doesn't produce a lot of false positives. It only flags an anomaly when it's really sure that something unusual is happening.
B
That's how the anomaly likelihood algorithm works, so you can be sure you're not going to get a lot of annoying notifications from this system. You only get notified when something really unusual is happening. This is nice because with Google Hangouts, if you enable notifications on a group hangout, which is the case here, you get a lot of notifications. I tend to mute the conversation because there are too many, but it would be nice to get just a few when something really unusual is happening.
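The low alert count is credited to NuPIC's anomaly likelihood. Roughly, the idea is to model the distribution of recent raw anomaly scores and ask how improbable the latest short-term average is under that model. The sketch below is a simplified, from-scratch version of that idea, not NuPIC's `AnomalyLikelihood` class; the class name, window sizes, warm-up length, and the 0.99 alert cutoff are all assumptions chosen for illustration.

```python
import math
from collections import deque
from statistics import mean, pstdev


class SimpleAnomalyLikelihood:
    """Simplified anomaly-likelihood sketch (hypothetical; not NuPIC's code).

    Models recent raw anomaly scores as a normal distribution and reports how
    unusual the short-term average of the latest scores is under it.
    """

    def __init__(self, history_size=500, short_window=10, min_history=50):
        self.history = deque(maxlen=history_size)  # long-term raw scores
        self.short = deque(maxlen=short_window)    # most recent raw scores
        self.min_history = min_history

    def update(self, raw_score):
        """Add one raw anomaly score in [0, 1]; return a likelihood in [0, 1]."""
        self.short.append(raw_score)
        if len(self.history) < self.min_history:
            self.history.append(raw_score)
            return 0.5  # warm-up: not enough history, so report "no opinion"
        mu = mean(self.history)
        sigma = max(pstdev(self.history), 1e-6)  # guard against zero spread
        self.history.append(raw_score)
        z = (mean(self.short) - mu) / sigma
        tail = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z) for a standard normal
        return 1.0 - tail


detector = SimpleAnomalyLikelihood()
scores = [0.05, 0.15] * 100 + [1.0] * 10  # steady raw scores, then a burst
likelihoods = [detector.update(s) for s in scores]
alerts = [i for i, p in enumerate(likelihoods) if p > 0.99]
print(alerts[0] >= 200)  # True: no alerts during the steady stretch
```

Because a model like this hovers around 0.5 on steady input, it needs a cutoff well above 0.5 to stay quiet; a single high raw score is also smoothed by the short-term average, which is what keeps isolated blips from triggering notifications here.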
B
So that's the data set for one conversation. Let's take a look at another one and run a bigger data set. This conversation has almost 10 times as many messages. These are all group conversations in Hangouts, so there are a lot of very frequent messages, which is what this application is most useful for. Let's just let it run and take a look at the results.
B
Okay, that's completed and we have results. Here we can see there are a lot more messages in this conversation; it goes up to over 600 messages. This is a busier conversation, and we see some anomaly indications here. The interesting thing is that not all of these are for spikes. Some of them are just for changes in the frequency of the messages, just like in the previous conversation. So NuPIC is able to pick up anomalies that aren't just threshold-based.
B
What's interesting about this project is the integrations that could be built on this kind of approach. If it were integrated with Google Hangouts, it could solve the problem of too many notifications, especially in group conversations with a lot of activity. You don't want to be notified on every single message if it's not an important conversation, but you do want to come back to it when something interesting is happening and people are having an interesting discussion.
B
This could also be applied to online conversation forums, like the comment sections of popular news outlets. It would be nice to get email notifications when the message activity level was unusual; that would be a nice feature. There are a lot of places where people have conversations where it would be nice to get notified when something interesting is happening. So that's the project. You can visit the nupic.hangouts repo on GitHub to try it out for yourself with your own data. Thank you.
A
I just want to note that what Chetan did here is a pretty typical aggregation strategy for events over time, one we can apply to a lot of different data that simply represents things happening over time. If you don't want to evaluate the context or the content of each event, you can work from the timing of the events alone, which is what Chetan did here. So I'll open it up to the panel for questions or comments.
E
It's always really cool to see what people come up with in terms of how to use anomaly detection, and this is a really cool, creative application. One question I had: when you saw anomalies there, did you ever go back to the messages to see whether it was actually of interest? Was it really unusual?
E
B
I didn't get a chance to do that, but it would be the next interesting thing to do. Since the message activity level is so high, there are so many messages during that anomalous time period that it's hard to visualize them or do root cause analysis. But I think a cool next step would be to look at a sample of the messages during that time and see if the nature of the conversation has changed in some way.
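The proposed next step, reviewing a sample of messages from an anomalous period, could look roughly like this. It assumes the same one-hour windows as the plot and messages stored as `(timestamp, text)` pairs; the function name and data format are hypothetical and not part of the project.

```python
import random
from datetime import datetime, timedelta


def sample_messages_in_windows(messages, flagged_hours, k=20, seed=None):
    """Randomly sample up to k messages inside flagged one-hour windows.

    messages: list of (timestamp, text) pairs (hypothetical format).
    flagged_hours: datetimes marking the start of hours flagged as anomalous.
    """
    flagged = set(flagged_hours)
    in_window = [
        (ts, text)
        for ts, text in messages
        if ts.replace(minute=0, second=0, microsecond=0) in flagged
    ]
    rng = random.Random(seed)  # seedable so a review can be repeated
    return rng.sample(in_window, min(k, len(in_window)))


base = datetime(2015, 1, 1, 10, 0)
msgs = [(base + timedelta(minutes=m), f"msg {m}") for m in (5, 20, 45)]
msgs.append((base + timedelta(hours=2), "later, outside the flagged hour"))
picked = sample_messages_in_windows(msgs, [base], k=10, seed=1)
print(len(picked))  # 3
```

Sampling rather than reading everything keeps the review manageable when a flagged hour contains hundreds of messages.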
A
B
That's a great idea. Yeah, in some sense there could be two different metrics you're tracking for the conversation, just like with Grok, or with stocks, where you track multiple metrics for the same server. Here it could be the message activity level and the content of the messages, using Cortical.io. That would be a nice way to track conversations and get notified about different aspects of them.
E
F
B
I did not feed in time-of-day and day-of-week information as part of the data, but that would be an interesting investigation, because we do that for other applications; for example, Hot Gym uses time of day. So yes, I think it could perform better, or there could be more interesting anomalies, if I included that information. That's also a good idea.
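Hot Gym is cited here as an example that feeds in time of day. As background (worth checking against the NuPIC documentation), NuPIC's date encoder treats time of day as a periodic value, so late evening and early morning get similar representations. The sketch below is a from-scratch illustration of a periodic bit encoding in that spirit, not NuPIC's `DateEncoder`; the function names and the bit counts `n` and `w` are assumptions.

```python
from datetime import datetime


def periodic_encode(value, period, n=48, w=9):
    """Encode a periodic value as n bits with a block of w active bits.

    The active block wraps around the end of the array, so values on either
    side of the period boundary (e.g. 23:00 and 00:00) share active bits.
    """
    center = int(round((value % period) / period * n)) % n
    start = center - w // 2
    bits = [0] * n
    for i in range(w):
        bits[(start + i) % n] = 1
    return bits


def overlap(a, b):
    """Number of active bits two encodings share."""
    return sum(x & y for x, y in zip(a, b))


ts = datetime(2015, 6, 1, 23, 0)
time_of_day = periodic_encode(ts.hour + ts.minute / 60, 24)
day_of_week = periodic_encode(ts.weekday(), 7, n=21, w=3)
print(len(time_of_day + day_of_week))  # 69
print(overlap(periodic_encode(23, 24), periodic_encode(0, 24)) > 0)  # True
```

In this sketch the time bits would simply be concatenated with an encoding of the hourly message count, letting the model learn, for instance, that a quiet night is normal while a quiet weekday afternoon is not.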
B
I think it might depend on the timing there. When there was a big change in the message frequency, as you saw in some cases where the conversation got sparser or denser, there was an anomaly trigger. I guess what I could try is zeroing out a bunch of values and seeing whether it detects an anomaly; it should. It might also depend on the aggregation window, that is, whether it sees long silences relative to the aggregation window.
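The zeroing-out experiment and the point about the aggregation window can be illustrated with synthetic data: a three-hour silence appears as zero-count windows when aggregating per hour, but only as a dip when aggregating per six hours. The data and helper names are hypothetical, and this only prepares the series; it does not run NuPIC.

```python
from collections import Counter
from datetime import datetime, timedelta


def window_counts(timestamps, start, end, window):
    """Count events in consecutive windows of length `window` over [start, end)."""
    counts = Counter((ts - start) // window for ts in timestamps if start <= ts < end)
    n_windows = -(-(end - start) // window)  # ceiling division on timedeltas
    return [counts.get(i, 0) for i in range(n_windows)]


def synthetic_messages(start, hours, every_minutes, gap_start, gap_end):
    """One message every `every_minutes`, except inside [gap_start, gap_end)."""
    out, ts = [], start
    while ts < start + timedelta(hours=hours):
        if not (gap_start <= ts < gap_end):
            out.append(ts)
        ts += timedelta(minutes=every_minutes)
    return out


start = datetime(2015, 1, 1)
end = start + timedelta(hours=24)
msgs = synthetic_messages(
    start, 24, 10, start + timedelta(hours=6), start + timedelta(hours=9)
)
hourly = window_counts(msgs, start, end, timedelta(hours=1))
six_hourly = window_counts(msgs, start, end, timedelta(hours=6))
print(hourly.count(0))  # 3 silent one-hour windows
print(six_hourly)       # [36, 18, 36, 36]
```

With the wider window the silence never reaches zero, so whether a model can flag it depends on how large the drop is relative to normal variation in that series.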
A
D
Chetan, I know this is kind of a product request, but could this be hooked up to the mailing list, so that I could subscribe to be alerted when interesting conversations are happening there? That way my attention could be directed to something when it's really hot and being talked about, based on, I don't know, conversation submission frequency or something like that. It's just an idea.
B
I don't know how well it would work with the level of activity we see on the mailing list. The reason I chose Google Hangouts is the nature of the conversations: group messages especially happen really frequently, so there may be messages every hour. With the mailing list, it might take longer to train, for example. But if the velocity were high enough, it could be a useful application that the community could use.
B
You could just try it out, because it's available. My intention with this is also for it to be an example application for NuPIC: if you want to try out NuPIC with data that's relevant to you and that you understand well, this is something you can try, even more so than being a useful application in the real world.