From YouTube: Antrea Community Meeting 01/30/2023
Description
Antrea Community Meeting, January 30th 2023
A
Perfect. Good morning, good afternoon, good evening, and thanks for joining this instance of the Antrea community meeting. Depending on where you are in the world, it's either the 30th of January or the 31st of January, and today we have a presentation related to the Theia project. More specifically, we have Tushar presenting a throughput anomaly detector for Theia.

This will be a presentation that covers the implementation of this detector in the Theia solution, with the ClickHouse backend. So, well, I'd probably better stop talking now and let Tushar do all the talking. Tushar, please go ahead with your presentation.
B
All right. So Theia throughput anomaly detection is, as the name suggests, an anomaly detection technique: you are detecting abnormalities in the network. This is one of the features that we have implemented as part of network traffic analysis in Theia.

Why are there anomalies in the network? They could be caused by minor, simple reasons, or they could be caused by a threat in the network. So it is always better to have that analysis, to be told about these things beforehand, or at least to be able to determine that there is an anomaly, that something is going wrong in the network. This is where throughput anomaly detection comes into the picture.

We have used three algorithms for this. The first one is EWMA, the exponentially weighted moving average. It is used in time series analysis, and we use this model in order to figure out the throughput difference.
B
Basically, if there is a difference between the forecasted throughput and the actual throughput, and the difference is too big, we say that there is an anomaly. How do we detect that? This algorithm uses weighted points in order to compute the forecasted throughput: it gives higher weights to the newer points and lower weights to the older points, and that way it is able to figure it out. There is a whole derivation behind this equation, which we will not go through, because that would be too much. For now, just understand it like this: there is a throughput that should have been seen, and a throughput that we calculate.
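As a reference point, this is the standard EWMA recurrence (the general textbook form, not necessarily the exact parameterization Theia uses):

```latex
% EWMA forecast: S_t is the smoothed (forecast) throughput, x_t the
% observed throughput, and 0 < \alpha \le 1 the smoothing factor.
S_t = \alpha x_t + (1 - \alpha) S_{t-1}, \qquad S_0 = x_0
% Flag an anomaly when the observation deviates too far from the forecast:
\lvert x_t - S_{t-1} \rvert > \text{threshold}
```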
B
If the difference between the two is too big, that is where you flag an anomaly. The second one is ARIMA; I call it "a-rima", I'm not sure how it is pronounced, A-R-I-M-A, sorry if I'm pronouncing it wrong: the autoregressive integrated moving average model. Basically, it also builds on linear regression.
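For reference, the textbook ARIMA(p, d, q) form looks like this (a general sketch; Theia's exact parameter choices are not specified in the talk):

```latex
% y'_t is the throughput series after d-th order differencing;
% \phi_i are the AR coefficients, \theta_j the MA coefficients,
% \varepsilon_t the forecast errors.
y'_t = c + \sum_{i=1}^{p} \phi_i y'_{t-i} + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} + \varepsilon_t
% A point is anomalous when |y_t - \hat{y}_t| is large relative to the
% model's forecast error.
```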
B
What it will do is figure out a linear pattern in the model of the throughputs, and then, if there is a throughput which is far away from what the regression model predicts, we can say that yeah, there was an anomaly there, because it shouldn't have happened like that. So that is the ARIMA model for you.

And then there is the DBSCAN model: density-based spatial clustering of applications with noise. What it means is this: let's say that you have some throughput values that form a high-density region in one area; there could be multiple such clusters of dense points in any network. If you find any point, or a bunch of points, that are far away from these clusters, then you can obviously say that yeah, there is an anomaly.
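In standard DBSCAN terms, applied here to throughput points:

```latex
% \varepsilon-neighborhood of a throughput point p:
N_\varepsilon(p) = \{\, q \mid \operatorname{dist}(p, q) \le \varepsilon \,\}
% p is a core point iff \lvert N_\varepsilon(p) \rvert \ge \text{minPts}.
% Points density-reachable from core points form clusters;
% everything left over is noise, i.e. an anomaly candidate.
```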
B
So that is the big picture. Now, how did we implement throughput anomaly detection, and what actually happens in the back end? What we do is this: we have created one CRD, a custom resource definition, in the Theia software. A custom resource definition basically allows you to have a custom resource deployed on your cluster.

Let's say that I have made a custom resource for, for example, a throughput anomaly detection Spark job; then you can have a pod that will run specifically throughput anomaly detection Spark jobs. If you want a different custom resource, you can do that as well; CRDs are basically there for you to use any custom resource that you want in the cluster.

So what happens once you have created the CRD and started an instance of that custom resource? You are telling Kubernetes: here is a definition, and if there is a resource that obeys this definition, please accept it, matching it by the name and schema of the custom resource definition. I'm sorry, I'm using so many words over here, but basically it is like that. So once that CRD is deployed in Kubernetes, you can include a new object as a custom resource of that type.
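As a rough sketch, a custom resource of this kind could look like the following (the apiVersion, kind, and field names here are inferred from the talk rather than copied from the Theia repository, so treat them as assumptions):

```shell
# Hypothetical example of creating a TAD custom resource from YAML.
# jobType is the only required spec field, per the talk.
kubectl apply -f - <<'EOF'
apiVersion: crd.theia.antrea.io/v1alpha1
kind: ThroughputAnomalyDetector
metadata:
  name: tad-arima-example
  namespace: flow-visibility
spec:
  jobType: ARIMA
EOF
```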
B
To create it, you can either go through a YAML file or you can use our Theia CLI. For this demo we are going to use the Theia CLI, but you can obviously use YAML files as well, and you can also curl the base API. Basically, whenever you create a custom resource definition, you also create a base API, and all the resources based on that custom resource definition are going to be served under that base API.
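For example, one way to hit that base API directly, without the CLI, is through the Kubernetes API server (the group/version path below is an assumption; `kubectl api-resources` shows the exact group the CRD registers):

```shell
# Illustrative only: list the custom resources served under the CRD's API group.
kubectl get --raw \
  "/apis/crd.theia.antrea.io/v1alpha1/namespaces/flow-visibility/throughputanomalydetectors"
```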
B
It supports different things: you can do reads, you can do lists, you can do status, and a lot of other things under that specific API. So after you have created the CRD, you create a new job. What happens when you create a new job? The place where you use the CLI is the Theia Manager: the Theia Manager receives the CLI request and sends it to a controller.

The controller that is responsible here is the TAD controller, the throughput anomaly detection controller. This controller uses all the arguments that we have passed to the Theia Manager, and it builds a specific command that will help us invoke a Spark job.
B
Now, how will a Spark job be invoked? After the controller, the request goes to the Spark Operator. The Spark Operator is basically the place where you have already made sure that the images for the new job are present. The Spark Operator will take this custom resource and create a new instance; in our case, we'll have the TAD instance. What I mean by TAD instance is a Spark driver pod; that Spark driver pod is then going to create the executor pods. The executor pods are basically responsible for any action that you do in your specific application.
B
So we have an application that is going to do some things in the back end, and those things are going to happen in the Spark executor pod. What happens in the back end is basically this: the Spark executor is going to run the script, and the script is going to take all the arguments that we have passed to it.

It will go and analyze the data that is already present in the ClickHouse tables in our Kubernetes cluster, and from this ClickHouse data it will read the flows table. The flows table has the source IPs, source ports, destination IPs, destination ports, which protocol we are using, and the throughputs that we have seen until now, that is, the actual observed throughputs.
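To peek at those columns yourself, something like this works (the pod name is a placeholder; the exact schema lives in the Theia flow-visibility deployment):

```shell
# Sketch: describe the flows table inside the ClickHouse pod.
# Replace <clickhouse-pod> with the pod listed in the flow-visibility namespace.
kubectl exec -it <clickhouse-pod> -n flow-visibility -- \
  clickhouse client --query "DESCRIBE TABLE flows"
```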
B
That data will be passed to a specific algorithm, the algorithm that you choose as a user, and that algorithm will calculate the throughputs that should have been there. Based on the throughputs that have been calculated and the ones that were actually observed, we are going to see if there is any difference. If the difference is too high, then we can say that yeah, there is an anomaly. If the difference is not too big, then we can say it's just normal variation, so we'll just go ahead with it; it is not an anomaly.

I know this is a lot that I just threw at you here, so to see it in action and to understand it more, let's move towards the demo part of it.
B
So, as I was saying, there should be a CRD, there should be a custom resource definition, and this is the custom resource definition. Theia throughput anomaly detectors is the name of the resource that we are going to create, and the API is going to be served under the crd.theia.antrea.io group. As we know, there are always some fields that are required in any new resource.

For us, the only required field is the spec, and the spec has only one required field, and that is the job type. What the job type is, we will come back to later; for now, there is a job type, there can be a start interval and an end interval, there can be executor instances, driver core requests, and so on. Let me give you a little brief on this.
B
These last five fields, executor instances, driver core request, driver memory, executor core request, and executor memory, are basically the arguments that you send to a Spark job, so that Spark can reserve specific memory and specific cores for the driver as well as for the executor pods. The start interval and end interval are basically from where to where you want to look for anomalies. And the job type basically means which algorithm you are going to use: it could be ARIMA, it could be EWMA, or it could be DBSCAN.

As we can see, we currently have five pods running in our flow-visibility namespace.
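For anyone following along, those pods can be listed with:

```shell
# The Theia flow visibility components run in the flow-visibility namespace.
kubectl get pods -n flow-visibility
```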
B
What we can do now is go through the Theia Manager; our Theia CLI, sorry, is basically a normal CLI like the ones we see everywhere.
B
It has different commands that you can use: clickhouse, completion, help, policy recommendation, support bundle, and throughput anomaly detection, and the one that we are interested in is the last one. So let's go inside it, throughput-anomaly-detection, and let's look at its help. As you go into the help, it says the algorithm should be EWMA, ARIMA, or DBSCAN, and it also shows the aliases: you can use either throughput-anomaly-detection or tad.
B
So,
as
you
just
saw
me,
like
writing
a
whole
thing.
Instead
of
this,
we
can
just
write
that,
like
Tia
Tad
help,
so
that
also
works
now
the
commands
that
are
available
with
us
is
delete
list
retrieve,
run
and
Status.
The
one
that
we
want
to
do
right
now
is
run.
So,
let's
try
to
we'll
we'll
get
back
we'll
get
back
to
all
the
other
commands
as
well,
but
for
now
we
are
going
to
use
the
Run
command,
so
let's
say
Tia
that
run
and
let's
go
into
the
help
of
that.
B
As we go into the help of run, you see we have the different options that we saw in the CRD: driver core request, driver memory, end time, executor core request, executor instances, executor memory, start time, and then the type, which is the type of the algorithm that you are going to use. For this demo we are going to run with the type ARIMA, and let's give it a driver memory of 1GB.
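Reconstructed from the spoken demo, the invocation looks roughly like this (flag spellings are an assumption; `theia tad run --help` is the authoritative reference):

```shell
# Start a throughput anomaly detection Spark job using the ARIMA algorithm
# with 1GB of driver memory. "tad" is the alias mentioned in the talk;
# the algorithm flag is referred to as the "type" in the demo.
theia tad run --algo ARIMA --driver-memory 1G
```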
B
So basically, this is the command that you have to pass on the CLI side. Once you do that, you see "successfully started throughput anomaly detection job", with an ARIMA job name. So let's see which instance it has created. As we can see over here, we have a new Spark driver that has started running. So let's try and check its status; first, to check the status, we need to know the ID that it has.
B
We
can
obviously
fetch
ID
from
here,
but
let's
see
if
we
can
have
it
from
from
our
list
as
well
so
theatad
list
and
if
we
say
so,
it
says
like
there
is
a
name
that
is
this
and
then
there
is
a
status
that
is
running
right.
So
now
I
understand
like
this
could
be
an
ID
but
and
do
not
get
confused
due
to
name
in
this
I'll
get
I'll
show
you
why
I'm
saying
this
as
name.
B
Let
me
actually
show
it
quickly
to
you
guys
like
why
I'm
saying
this
as
a
name.
So,
as
you
remember,
I
told
you
there
will
be
a
base
API
So.
Currently
we
have
the
space
API
under
the
anomaly
detected
CR
entry
Ohio
with
the
resource
name
as
throughput
and
omelette
detector.
So
let's
try
and
get
this
I
guess.
I
do
not
have
the
token
here
here.
B
Okay,
we'll
get
back
to
that
thing
later.
Let's
try
to
do
it
over
here,
because
the
executor
board
is
started
running
so
they
are.
The
driver
has
already
included
an
Executor
port
and
the
executive
Port
is
running
right
now
and
it
is
doing
its
things.
Let's
see
what
is
the
status
of
this
job
so
to
do
that
we
have
status.
I
can
show
you
help
as
well.
So
it
says
like
in
help
you
can
just
show
the
name.
So
name
is
basically
like
you
can
either
just
write
the
ID
of
that.
B
So, for that reason, let's first run list, and the status will be for this specific ID from the list. If we go into it, it says 50% has been completed: that is the status of this job right now. Currently it is running, and it is showing the stages that it has completed so far.
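Approximately, the sequence being demoed is:

```shell
# List TAD jobs (name/ID and state), then query a specific job's progress.
theia tad list
theia tad status <job-name>   # use the name column printed by "list"
```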
B
It should take around two minutes. And as I told you on the slide, once the executor pod has executed all the functions and has done everything, it is going to populate a table inside the ClickHouse database, and the table name will be tadetector. So, on the side, we can go over to our ClickHouse.
B
We
can
go
towards
our
clickhouse
and
once
you
are
inside
clickhouse,
let's
see
if
it
has
the
tables,
and
we
can
see
there
is
a
TA
detector,
NTA
detector,
local.
Just
to
avoid
any
confusion.
These
tables
are
made
as
a
part
of
clickhouse
deployment
and
it
is
not
responsible
because
of
the
crd
that
you
create,
so
there
could
be
multiple
other
tables
that
has
nothing
to
do
with
the
crd
and
still
be
present
inside
the
clickhouse
database.
B
So,
as
we
see
the
driver
is
also
completed,
let's
see
the
stage
she
says
like
the
status
of
this
anomaly.
Detection
job
is
completed
so,
okay,
so,
let's
see
what
it
has
done.
B
Let's do a select all from, and the table name is tadetector, and you can see it has created 21 rows. Now, I'll show you what this data is in a better way later, but just to show you how many actual flows were present, among which we have seen the anomalies in these 21 rows: select all from flows, which is the input table that we use currently.
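The two queries from the demo, roughly (table names as shown on screen):

```shell
# Run inside the ClickHouse client of the flow-visibility deployment.
clickhouse client --query "SELECT count() FROM tadetector"  # result rows (21 in the demo)
clickhouse client --query "SELECT count() FROM flows"       # input rows (3000 in the demo)
```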
B
So,
as
you
can
see,
there
are
3000
rows
in
this
and
we
have
only
stored
the
data
for
the
you
know
in
which
there
is
an
anomaly.
So
we
are,
we
are
not
going
to
use
the
whole
data
store.
We
and
we
are
not
going
to
duplicate
the
storage
in
this.
So
as
like
just
to
keep
the
memory
short,
so
we
are
only
going
to
use
the
we
just
use
the
places
where
there
is
an
anomaly
and
now
to
see
it
in
a
better
shape.
B
Let's go to theia tad and back to its help. We have already seen list, we have already seen status; now we'll do retrieve. Retrieve basically gives you the result of what you have run, so instead of status, I'll just write retrieve over here.
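That is, something along these lines:

```shell
# Fetch the detection results (the anomalous flows) for a job.
theia tad retrieve <job-name>
```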
B
It prints a JSON output. You can see this is the ID, the ID that we were just using; this is the source IP, the source port, the destination IP, the destination port, and the place where we saw this anomaly: the throughput we calculated was a little different from what we observed, and there are two such flows from this port.
B
So that is one, and then you can see the next ones; if I have to guess, actually, not guess, the truth is that these are going to be the 21 rows. We are working on whether to write this data as JSON or whether we want to put it out as a table, just to make it easier for the user to go through it and figure out what is going on.
B
So this is basically the thing, and to show that, let me try to get the command; if I'm not wrong, I just missed the token, so let me try to find the token for now.
B
Now
see
yeah,
so
we
can
see
the
this
crd
has
been
registered
and
we
can
see
the
and
the
the
ID
is
over
here.
So
the
reason
why
we
use
it
as
a
name
is
because
in
the
metadata
we
have
created
it
as
a
name.
So
that's
why
we
keep
it
as
a
name
over
here,
so
that
is
pretty
much
it
and
the
one
only
only
command
that
is
left
is
now
the
delete
one.
B
So
I
can
show
you
guys,
like
Tia,
dad
delete
and
the
same
thing
just
keep
this
ID
and
it
has
been
deleted
now,
if
we
see
inside
we'll,
let's
go
and
try
in
the
detector,
so
the
table
has
been
cleared
if
there
would
have
been
any
driver.
B
That's
due
to
Port
running
that
is
also
cleared,
and
now,
let's
try
and
fetch
this
data
again
and
as
we
can
see
that
there
is
a
kind,
but
there
is
no
matter
there
is
no
items
that
are
present,
because
there
is
no
other
detection
job
going
on
just
to
verify
that
completely.
Let's
do
this
throughput
and
only
detection
list-
and
you
can
see
there
is
nothing
inside
this.
So
this
is
basically
throughput
and
ombre
Direction.
Now
what
we?
What
we
do?
What
do
we
do
actually
with
this
data
is
over
here?
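The cleanup steps shown, approximately:

```shell
# Delete the TAD job; this also clears the tadetector results table and any
# remaining Spark driver/executor pods. Then verify nothing is left.
theia tad delete <job-name>
theia tad list
```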
B
Now, what do we actually do with this data? It is over here. As you can see, we have the present throughput, the one that we already had, and then the calculated EWMA throughput and the ARIMA throughput. We see there are multiple spikes in this data, when we visualize the table that we just got from the flows together with the values that we calculated with the Spark job.
B
This
data
shows
us
that
there
are
multiple
bytes
one,
two
three
four,
five
six
in
particular,
but
we
can
see
that
the
arima
or
the
ewma
reaches
the
first
spikes
like
they're.
Quite
close,
they
are
not
that
far
away.
However,
when
we
move
towards
the
Ender
and
the
two
spikes
they
are
quite
far
so
that
that
is
where
we
see.
We
say
that
yeah
there
is
anomaly,
so
not
every
every
Spike
could
be
an
anomaly
but
yeah
some
spikes
could
be
an
anomaly,
and
this
thing
is
what
we
are
calculating.
B
We are figuring out whether it is an anomaly or not. So this is basically the presentation. I've taken the references from Yongming's paper and also from Subramanian's internship project. That is pretty much it from my side. Thank you, and let me know if you guys have any questions.
E
So, actually, this anomaly detection: what is the purpose of it? I mean, does it work across a single cluster, or can it work across multiple clusters?
B
Yeah, so basically the purpose of having throughput anomaly detection is just to figure out if there is an anomaly. As I said, it could be either because of some threat, or it could be just because something went wrong in the network, or something like that; but we should be aware that there is an anomaly going on in our network, and based on those anomalies we can figure out if it is a threat. So it is basically a part of network traffic analysis.
A
Your
question
is
operates
on
the
data
that
are
stored
in
the
Tia
flow
of
database
and
at
the
moment
these
are
the
data
which
are
sent
from
the
anterior
flow
aggregator,
which
pertain
only
a
single
cluster.
So
we
don't
have.
We
don't
run
the
analysis
across
data
for
multiple
clusters,
yet.
A
That's
right,
that's
right!
If
you
had,
if
we
are,
if
you
had
a
data
pertaining
a
network
flows
for
multiple
clusters,
then
you
know
it
would
also
identify
throughput
anomalies
in
traffic
across
multiple
clusters,
because
at
the
end
of
the
day,
the
throughput
anomaly
detector
is
an
algorithm
that
analyzes
data
in
a
in
a
network
flow,
and
you
know
it
doesn't
care
whether
certain
destinations
are
always
in
in
the
same
cluster
or
not.
E
So is it also database-based data, or is it just the network flows?
A
The
network
flows
are
stored
in
the
clickhouse
database.
That
to
share
was
mentioning.
Basically,
this
is
part
of
the
generic
of
T
architecture,
where
you
have
as
an
Indian
entry
agent.
We
have
the
flow
exporter
that
captures
Network
flows
and
sends
them
to
a
centralized
aggregator.
The
aggregator
does
correlation
between
flows,
to
match
sources
and
destination
and
then
sends
them
either
to
clickhouse
for
the
in-house
installation
or
to
snowflake
when
people
are
using
snowflake
as
a
backend
and
and
yeah,
and
then
that's
once
the
once.
The
data
are
in
the
click
of
database.
A
This
spark
job
analyzes
the
data
in
the
database.
That's
so,
let's
say
that
the
the
part
about
running
the
algorithm
and
analyzing
the
data
is
sort
of
a
different
pipeline
from
supplying
the
data
in
the
database.
A
Okay, good. Yeah, please, any more questions?
F
Is it possible to have it run in the background, instead of triggering jobs manually?
B
Yeah, so about this data: as we know, our cluster does not have too many anomalies, so we just took the throughputs from our cluster and then explicitly injected anomalies into them, just to see if we are able to accurately figure out whether there was an anomaly or not. So this data is basically input data taken from the cluster, with some anomalies injected, and then run through the EWMA and the ARIMA algorithms.
F
Well, if the user thinks that too many events are generated and that the detection is too sensitive, is there a way to tweak some parameters so that fewer events are generated? Basically, some things that may have been considered an anomaly before would not be considered an anomaly anymore with the new parameters, but you would still get events for the most extreme cases.
B
Yeah, I guess Yongming would be better placed to answer this, as this was basically based on his paper. I guess, since it is basically ML, it is about training and getting the resulting data, so that could be possible; but currently we are not working on that. Yongming can provide a better answer on it.
C
Yeah, definitely, it's just related to the arguments you pass to the algorithm. It could be tuned to be more sensitive, but if the user finds too many false positives or too many events, we could pass some arguments to make the algorithm less sensitive, or we could set a higher threshold to generate fewer events.
A
Hello, just a quick question from me regarding the efficiency, sorry, the effectiveness of those algorithms: is there a minimum duration of flows for the algorithms, to make sure that they can make an accurate detection?
B
Currently,
it
is
like
so
for
arima.
We
cannot
take
any
data
if
the
data
difference
between
the
start
and
the
stop
is
less
than
three
seconds.
A
And
whereas
for
the
others.
B
Well,
for
the
others
like
DB
scan,
so
that
is
like
cluster
based
thing,
so
that
does
not
include
any
time
restriction
and
I
I
guess
the
same
is
for
the
ewma,
so
that
does
not
have
any,
but
arima
has
like
three
seconds,
but
that
can
also
be
tuned.
We
just
used
L3
seconds,
but
that
can
also
be
tuned.
C
So I added that at some point; it's not exactly about the three seconds, it's more that a lack of data points will make the algorithm less accurate. For ARIMA at least, we need three data points for each connection, so it will depend on the interval of the Flow Aggregator: if the interval is 60 seconds, that means we need three minutes to have enough data points for each connection. So yeah, I just wanted to correct that.
A
Okay, perfect. It seems that that's all for this topic, so many thanks again to Tushar for this presentation and this demo. We hope to see the code merging into the Theia code base as soon as possible; that will be a great addition to the NTA capabilities of the Theia project. And I think that's all on this topic. Do we have any other topic that you would like to bring up for discussion today? So let's go now to open discussion, anything that you would like to discuss.
A
Okay, five, four, three, two, one, and that's it. So I would like to thank, as usual, everyone for attending. Thanks again to Tushar for your presentation, and we will have our next meeting on February the 14th. So that's all for today, and I wish everyone a good night, a good day, or a good afternoon. Thanks for joining again, and bye.