From YouTube: OpenShift Commons Briefing #55: Monitoring OpenShift & Detecting Performance Anomalies with CoScale
Description
In this session, CoScale's Peter Arijs and Samuel Vandamme demonstrate how to monitor your OpenShift environment with CoScale's container monitoring platform. CoScale tracks container metrics and lifecycle events, combined with detailed in-container application metrics, to give visibility into your full stack running on OpenShift.
Speakers: Peter Arijs, Product Marketing Manager, and Samuel Vandamme, Product Specialist – CoScale
A: So I'm going to let Peter start us off with a bit of an overview of CoScale, and Samuel is going to give us a deeper dive, with a demo, into their offering. The format for this is: if you have questions while people are talking, put them into the chat. Samuel or I, or one of the other folks who are on, will try to answer them, and once the presentations and the demo are done, we'll open it up for Q&A for everybody. So without any further ado, Peter, take it away.
B: Thank you very much, Diane, and you pronounced my name very well, so no problems there. And thank you, everyone, for joining. Let me say a few quick words about CoScale first. We offer, as Diane said, a monitoring solution. We call it full-stack performance monitoring, but it's really focused on microservices environments such as OpenShift, and our solution is a lightweight solution.
B: It's built specifically for production monitoring, and we use anomaly detection to find problems faster. We offer this as SaaS as well as on-premise, and we are firmly embedded in the container ecosystem, as an OpenShift Primed partner but also a Docker Ecosystem Technology Partner. So with that, let's talk a bit about how CoScale fits into the OpenShift ecosystem, and to do that, let's first have a look at the problem that OpenShift tries to solve.
B: When we look at the evolution of application architectures, we see a clear shift from monolithic applications running on physical servers or VMs in a data center towards much more agile development these days, with microservices that are supported by containers and cloud infrastructure. As we all know, on the infrastructure side, containers have become a fundamental building block for microservices. They offer an attractive way to build and package them and to ship them into production.
B
All
this
by
packing
all
of
the
dependencies
in
inside
of
our
containers
but
running
containers
in
production
and
half
scale
does
pose
a
new
set
of
challenges
compared
to
using
them
just
for
development.
You
have
to
start
worrying
about
things
like
orchestration
automation,
networking
and
storage
security,
hosting
disaster
recovery,
logging
and
monitoring
and
general
application
performance.
B: These are all questions you have to ask yourself when you move to production, and this is actually where Red Hat OpenShift comes in, because it offers a packaged container platform, built on Docker, Kubernetes, and various other components, that solves many of the issues I mentioned on the previous slide. As part of the platform there are also some basic logs and metrics, but OpenShift also has a strong ecosystem around it for more advanced capabilities, and this is exactly where CoScale comes in.
B: Now let's look into the monitoring aspect in a bit more detail. I think we all realize that monitoring is an important part of running an application in production, yet it seems that many people are still struggling with this when it comes to containerized applications. This is data from a recent survey by Cloud Foundry on the top challenges of running containers and microservices in production, and we can clearly see that monitoring is pretty high up there, just after container management actually: monitoring and troubleshooting microservices.
B: So let's look at these challenges in a bit more detail. The first obvious observation is, of course, that the number of containers is much higher than the number of servers, so the number of instances increases by an order of magnitude when we use containers. In a typical customer environment, we see customers use up to 10 or 20 containers per host, but we have even seen cases with 200 containers. So this is an immediate multiplication of the number of metrics to monitor.
B
The
second
aspect
is
that
containers
can
be
very
short-lived,
that
this
dynamic
aspect
also
introduces
challenges
in
rapidly
picking
up
matrix
from
containers
setting
relevant
alerts,
as
well
as
understanding
the
impact
of
container
life
cycles
on
performance.
A
third
aspect
is
when
we
compare
container
environments
with
monolithic
applications,
we
see
a
much
larger
diversity
of
application
technologies
used
across
containers
or
where
people
typically
use
the
technologies
that
best
suited
for
the
use
case
of
a
particular
microservice.
This
all
comes
together
in
an
overload
of
metrics
to
monitor
and
alert
on.
B: If you look a bit closer at how we would traditionally monitor a monolithic application, and compare that to a microservices application, we see that in a monolithic application we typically have three monitoring components. This is perhaps a bit simplified. At the infrastructure layer there are traditional system monitoring tools, where you look at typical resource metrics; then there's the application layer.
B: At the application layer you would typically use an APM tool, where you gain insight into the internals of your monolithic application. And finally, the end-user experience is typically monitored as well, using some form of browser instrumentation or another technique. Now for microservices, however, on a platform such as OpenShift, we see that an additional layer is introduced, and we now have a lot of smaller, lightweight, and loosely coupled application components that we need to monitor.
B: So, in order to understand application performance, we not only need to monitor these container instances themselves, but also the way they are orchestrated, the way they are tied to services, and finally also the services running inside the containers. This is actually where most APM tools start to have difficulties, and this is also the opinion of Cameron Haight, a research VP at Gartner. In one of his recent reports, he also claims that these new application architectures, including containers and microservices, are really stressing the capabilities of APM tools.
B: Now, why is this? Well, first of all, most legacy APM tools were designed maybe five or ten years ago, specifically for monolithic applications written in, for example, Java or .NET. And because of the nature of monolithic applications, understanding what's really going on inside your application, and the interaction between application components, requires you to have code-level visibility into the application.
B: In fact, most of these heavyweight monitoring tools will require you to install an agent inside your container, and this is really an anti-pattern, since containers should be limited as much as possible to a single process. You don't want to pollute your container by packaging an extra agent in there. A final aspect is that most existing tools have a hard time keeping up with dynamic environments, especially if they use static alerting. I'll tell you more about that a bit later.
B: So if you're looking for a monitoring tool for a containerized environment, what visibility should it really give us, and what metrics should we monitor? At the host level, we obviously still want to monitor resource metrics, the typical things: CPU, memory, disk, and so on. Typically you would also use an orchestration tool; in the case of OpenShift it's a flavor of Kubernetes, but there are other orchestrators out there. And at this level you want to monitor things such as the number of containers, how they are set up, and the relationships between services and containers.
B: This gives you more service-oriented visibility, like which containers run which service, or which containers are impacted when a particular service starts degrading. At the container layer itself, we also want to keep track of the relevant resource metrics, CPU, memory, and so on, as well as when these containers are started and stopped: their life cycles. And it doesn't stop at resource metrics.
B: Of course, we also want to know the requests going in and out of our containers, as well as application metrics from the services that are running in our containers. These could be things like NGINX or Redis or MySQL; all of these services you also want to monitor in quite some detail. And then finally, our application will serve some end users, and ultimately also a business, and we want to monitor relevant metrics from that perspective as well. These could be things like page load times or conversion rates.
B: So those are the sets of metrics that you want to monitor. How does CoScale handle that; what's our approach to monitoring microservices and containers? Well, we run one lightweight agent per host. It can be installed either directly on the operating system or in a privileged container, and with that agent we can get server resource metrics at the OS level. We can also get container and cluster resource metrics, typically using the APIs from Docker and the orchestrator, in this case OpenShift.
B: Now, there are other tools that do that as well, but CoScale actually goes one step further, because we have a very rich library of plugins for various application components, and we can configure these in such a way that, first of all, any new container that runs a service for which we have a plugin will automatically get monitored when the container starts, and, secondly, we will get very application-specific metrics from these containers without the need to install an agent in the container. This is quite a unique capability.
B: In addition, CoScale also has a real user monitoring component, where we use a little JavaScript snippet to get end-user experience metrics from the web browser. We also allow you to track unlimited custom metrics; we have various ways of doing that: scripting plugins or logging, leveraging our APIs. And on all of these metrics, and this is the important part, we run automated anomaly detection that lets us quickly detect abnormal behavior. A final point: we also track relevant infrastructure changes.
B: This provides extra context on what's going on in your environment: things like container lifecycle events or events from your orchestrator, but also things like new deployments or configuration changes. These are all things happening in your environment that can have an impact on performance, and by also capturing these events, with the various integrations that we offer, we provide that extra context. So this picture is a visual representation of the CoScale platform, with our lightweight agent and all of the plugins.
B: Well, not actually all of them; it's a representative part of the plugins that we support, plus the real user monitoring component and the integrations for various custom metrics and events. With this data we can obviously create nice dashboards, we are a monitoring tool after all, but we can also automatically detect abnormal behavior using our anomaly detection. So, anomaly detection: I want to spend a little more time on it, since it is one of the differentiating features of CoScale.
B: Why is it so important to use automated techniques such as anomaly detection? Just have a look at the explosion in the number of metrics to monitor when we compare a traditional monolithic application with a containerized environment. Basically, the number of containers acts as a multiplier on the number of metrics and alerts.

B: Not that static alerts don't work or are bad; they actually work very well for well-understood, consolidated metrics, for example the number of visitors on your site or some business metric that you have a good handle on, but not necessarily for those thousands of metrics coming from your containers and microservices. And these are not the only limitations; beyond the amount of data, there are other limitations as well, like how to deal with dynamic environments, which require you to constantly reset or reconfigure your alerts.
B: So if we look at the definition of an anomaly, which is basically a deviation from what is normal or expected, this means that if we can get pretty good at predicting expected behavior, we can also get pretty good at detecting anomalies. This is basically what we have focused on at CoScale. We principally look at the historic behavior of all the metrics we monitor and make a prediction based on that.
B: We also include a fair amount of domain knowledge in that, and if we see a deviation from this expected behavior, we give it an anomaly score depending on how large the deviation is, and then we alert when this anomaly score exceeds a certain threshold value. Now, this is a simple explanation; there are a lot more sophisticated things going on, but this is the basic concept that we apply.
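The loop Peter describes, predict the expected value from history, score the deviation, and alert past a threshold, can be sketched in a few lines. This is an illustrative toy, not CoScale's actual algorithm; the metric values and the cutoff of 3 are invented for the example.

```python
# Toy anomaly scoring: compare a new observation against the mean and spread
# of recent history, and alert when the deviation is unusually large.
from statistics import mean, stdev

def anomaly_score(history, observed):
    """Score how far `observed` deviates from the behavior in `history`."""
    predicted = mean(history)          # expected value from past behavior
    spread = stdev(history) or 1.0     # guard against zero variance
    return abs(observed - predicted) / spread

def should_alert(history, observed, threshold=3.0):
    """Alert only when the deviation is large relative to normal variation."""
    return anomaly_score(history, observed) > threshold

cpu_history = [31, 29, 33, 30, 32, 28, 31, 30]  # percent CPU, last samples
print(should_alert(cpu_history, 32))   # small deviation: no alert
print(should_alert(cpu_history, 55))   # large jump: alert
```

The point of scoring rather than thresholding the raw value is that the same code works unchanged for any metric, which matters when containers multiply the metric count.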
B: What we will also do is group metrics on which anomalies are occurring at the same time, to give you a better understanding of what's happening in your environment. This is an example screenshot, but I think Samuel will illustrate it a bit better in the demo later. On our anomaly timeline we have different metrics at the server, user, and business level showing abnormal behavior, and basically we see that certain services and certain containers are overloaded; this creates an increase in latency on our website.
B: We also see that there are more views on our website, and basically our conversion rate is impacted as well. So this consolidated view, giving you all the metrics together, gives you a really good view of what's happening in your environment, and a lot of context to understand what a performance problem actually is. We're also applying outlier detection, which is a different form of anomaly detection, where we look specifically at metrics from similar instances in a cluster, such as containers that are supporting the same service.
B: So if we see any container with different behavior compared to the rest of the cluster, we can also alert on it. In this example, we highlight containers with increased memory usage. In general, this kind of outlier detection requires less of a learning period than anomaly detection on time series data, but the basic idea really remains the same: you can quickly detect changes in performance without having to set up a lot of manual alerts. That's the basic premise of CoScale, so I'm going to end my part of the presentation here.
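Outlier detection across peer containers, as opposed to detection against a metric's own history, can be sketched with a robust statistic such as the median absolute deviation. This is a minimal illustration under invented container names and memory values; CoScale's real method is not described in the talk.

```python
# Flag containers whose metric deviates strongly from their peers backing the
# same service, using the median and median absolute deviation (robust to a
# few bad values, and needing no learning period).
from statistics import median

def outliers(samples, cutoff=3.5):
    """Return the names whose value is an outlier relative to the group."""
    values = list(samples.values())
    med = median(values)
    mad = median(abs(v - med) for v in values) or 1.0
    return [name for name, v in samples.items()
            if abs(v - med) / mad > cutoff]

memory_mb = {"web-1": 210, "web-2": 205, "web-3": 198, "web-4": 940}
print(outliers(memory_mb))  # ['web-4']
```

Because the comparison is across instances at one moment in time, this works the instant a new container joins the cluster, which matches the "less of a learning period" point above.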
A: I think Peter might have to stop sharing his screen.
C: Perfect, thank you. So, welcome to the CoScale application. If you create a trial with us, this is one of the first screens you will see after creating your account. It shows you the four main components of the CoScale platform. We have our real user monitoring, as Peter talked about, and we have integrations with a lot of third-party services.
C: We also have a lot of ways to do really custom integrations, both with config management, as we have a command-line tool, an API, and other methods of really tying your systems together with our monitoring. But today I'm mostly going to talk about the agents, because the agent, of course, is used to get server data, and specifically for this demo, OpenShift information and Docker information. So I'm going to click through, and I arrive on our agents page. This is the page where you can see all your servers and all the agents you've configured.
C: You probably recognize here the most popular open source tools, and specifically, we of course have support for OpenShift and Docker. I'm going to go into a little more detail about the Docker configuration a bit later. Now, because I've selected an OpenShift plugin, or a Docker plugin more specifically, I get two options for installing a CoScale agent.
C: Specifically for OpenShift, we have a configuration available that allows you to just add a DaemonSet to your OpenShift environment, and then the agent will automatically be deployed on every server that is part of OpenShift. So here I've quickly opened the OpenShift web interface, and the CoScale project contains my agent; you see I have four servers, each running one of our agents. I'm going to quickly show you the configuration for it as well: here is the DaemonSet.
C: Now, what information do we get from the CoScale agent? Peter also mentioned it a little in the slides. Because OpenShift runs a Kubernetes environment in the background, you're going to see a lot of the same concepts. So we get the data from replication controllers, we get the data from the services, and we get all the containers: which ones are running, and where are they running?
C: We also have a very powerful event system. Here, for example, you can see our replication controller overview, and every time you have an event of insufficient replicas, meaning that probably a container has crashed somewhere, you can clearly see this with our events, and you can go and research what happened. Below, we have our container overview, once at the service level and once at the host level. So you see here, I have 54 replication controllers.
C: Some are running five containers. I can clearly see which are more the helper containers started by Kubernetes, and you may have noticed that we sometimes show a different color for a container. This is because you can select a metric for each of these widgets and then set a threshold, which you choose yourself. I think in this case we've selected thirty and fifty percent CPU usage, and then, depending on the value we get back from the container, we color-code the container here.
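The widget logic just described, a chosen metric, user-picked thresholds (30% and 50% CPU in the demo), and a color per container tile, amounts to a small mapping. The function and container names below are illustrative, not part of the CoScale product.

```python
# Map a container's reported CPU usage onto a dashboard tile color using
# user-chosen warning and critical thresholds.
def tile_color(cpu_percent, warn=30.0, crit=50.0):
    """Return the color for a container tile given its CPU usage."""
    if cpu_percent >= crit:
        return "red"
    if cpu_percent >= warn:
        return "orange"
    return "green"

for name, cpu in {"api-1": 12.0, "api-2": 41.0, "api-3": 97.0}.items():
    print(name, tile_color(cpu))
```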
C: This way you can quickly see if some containers are maybe using too much CPU. In this case, it's clear that we have ninety-seven percent CPU usage; we might be impacting all the other containers running on the same machine. So this really gives you a bit of an overview of the entire environment.
C: Now, the next dashboard is a little more focused. This dashboard, as you can see here at the top with our dimension system, has just the data from the MongoDB replication controllers. I can quickly change this dashboard if I want to see data from other replication controllers, but here I get a little information on the container life cycle: I can see when containers were started, when they were stopped, what the exit code was, and also on which machines they are running.
C: I see what the CPU usage is, the memory usage, and network received and sent. Here again, you can set your own thresholds, so it's a very visual way of seeing whether a container is performing as you would expect. And then we have the event system that we talked about already: every time a container is started, it first sends the ready signal and then the running signal, saying: okay, I'm ready to get traffic from other servers. The next part I want to show you is a little more general.
C: This is a dashboard made by one of our customers, and they've chosen to put a lot of their services together. They run a microservice environment and they have a comments API, so a comments microservice, a product API, and a checkout API, all running on OpenShift, and they've chosen to put this information very clearly on the first dashboard that they open. So this is their home dashboard.
C: Let's say I see that my page load time is a little too high. I can click through on this tile, as we call it, and I arrive on a dashboard that was created specifically with real user monitoring information. So I get the page views coming from there, I get the page load time, I get my most popular pages and my slowest pages. Now, it might be that you see a page here that's a little too slow.
C: Here again you have the option to click through once more, because this is still the front end for the user, and now I've clicked and I arrive at the microservice level. I get the web microservice data from the containers that are delivering this web request; I get the latency and I get the error rate. This is just to show you how easy it is to link dashboards together and make a system that shows you the information you need.
C: Here we also have a couple of the alerts that were in this time frame, the anomalies, free memory, CPU load, and then another way of using our event system. This customer has integrated with our Mailchimp integration, so every time they send a mailing campaign, it's added to CoScale, and they're able to link it to performance problems, maybe, or changes in the metrics. They do the same for software deployments.
C: Peter mentioned that CoScale is a lightweight monitoring platform: we aim for very low resource usage on the servers we are monitoring, and for that reason we've made certain decisions in our design process. For example, we're not going to push the CPU load or the CPU usage of every single process running on my machine. But sometimes that's very valuable information: here I see a clear spike in my CPU, and I would like to know what happened at this time. It's for this reason that we added the forensics system.
C: The forensics system is a small, lightweight anomaly detection running in the agent, and when there is a sudden change, it takes a snapshot, a picture of the system, and sends it back to our platform. Then I can research it: okay, this spike was caused by the Docker daemon, probably deploying a new image, or something else. Now I want to jump back to the agents page, because I said I was going to explain our Docker monitoring a little more, especially because we do in-container monitoring.
C: The idea there is that the plugins you saw in the beginning, which are available if the agent is just installed on the host operating system, can all also be used to monitor what's happening inside a container. So let's say I find an Apache container, a container running the Apache software: I can get metrics from that Apache and monitor how it's actually performing. To show how we do this, I'm going to quickly open the configuration of our Docker image, our Docker plugin, excuse me.
C: The first thing is that it scales with your containers. If you're going from one Elasticsearch container to 25, it's not a problem: our Docker plugin is going to detect that, it's going to start more Elasticsearch plugins, and the data is going to be gathered, and you're able to see the data coming from each individual container, or all together at the image level or at the tag level. So we really allow you to compare data from previous versions to the new version.
C: So it's really a powerful system. The second advantage is that, because we start that plugin within the container itself, the configuration becomes a little easier. To give you an example: here we have the configuration for an NGINX plugin. CoScale gets a lot of its information from APIs and status calls.
C: So we need access to the NGINX global status page, or status page, and you might have noticed here that I use localhost. I hope it's clear on your screen, but I don't need to mount any ports, and I don't need to do any special configuration to be able to monitor this image. Because we start the plugin within the container, this localhost is just the container itself, so this port, in this case 8000, is just accessible without any additional configuration.
C: The other advantage is that the same holds for the file system: you don't need to mount any local disks on your host machine to be able to access this access log. This will just work, and the moment your container stops, this access log will be deleted. But that's fine, because CoScale has by that moment already gathered all the information from it.
C: It's a really handy way to monitor live, running containers. Now I want to show you a couple of dashboards that show a little of the advantage of having this system. Here I have a memcached dashboard, with general metrics coming from memcached: connections to memcached, network bytes received, the commands, and hits and misses. Now, you see that the commands metric had some changes: we used to be around 800 commands a second and we dropped down to 400, but we had some spikes, which is a little strange.
C: So what I can do is zoom in a little, and I can clearly see here that two containers were running, and all of a sudden one of the containers started misbehaving a bit, because it crashed, so the other container had to handle a lot more data. And if I look at the events, I see there were too few replicas, one was missing, and a little later a container was started, so we see the new line popping up here, with no manual work.
C: The other example is our NGINX dashboard. Here again we get the general dashboard, which you also get if you create a CoScale application, with the number of connections, the number of containers, the average latency, the request rate, and a nice heat map that shows me the performance of my containers over time, so I can quickly identify those that are maybe not performing as I'd like. And then here we have one more dashboard that shows me information on the latency of my website and the latency of all my requests.
C: So here we have a lot of containers delivering my website. You see at one point we added some new containers, because it seems there was an issue; these were probably handled by OpenShift itself, and then these new containers start delivering the website to the customer, and the data starts rolling in.
C: Okay, so now the last thing I want to show you, because Peter also mentioned this, and I think it's a very good point: in these new environments you have so many metrics to monitor and so many containers that it becomes very difficult to set meaningful static alerts that don't overflow your mailbox. But at the same time, you still need some warning that something happened in your system, and there we think that anomaly detection can really add value in these container environments. So, Peter also showed this.
C: This is the same anomaly we saw in the presentation. We have the anomaly at three levels, and we group them, so you can see here there was an anomaly on latency, we had a couple of anomalies on the request rate, and then we had an anomaly on CPU on both of those servers. I'm going to show some examples coming from containers, but just to show you where the screenshots from Peter came from: we can see that the latency of my website went up, and we have a nice dot plot.
C: This is across different pages, by the way. Something to note is that CoScale automatically builds a tree of your application and does anomaly detection on all individual pages, so if one page changes, you'll still be able to see this with the anomaly detection. Then we also have an anomaly on CPU usage; you can clearly see it went from thirty to fifty percent. This is, I think, a very good example, because normally you wouldn't set a static alert at 50 or 55 percent.
C: You would set it at 70, 80, or even 90 percent. But still, this is abnormal behavior for your server, and you would like to see what happened at this time. With the forensics snapshot I can then quickly research that NGINX was using more CPU, and this, of course, makes sense: I have more visitors, so my web server has more work. A different example, but more at the business level: we did a large proof of concept with a customer in the U.S.
C: They sent us a lot of their business data and our anomaly detection was applied to it, and we were able to find small issues, like here in this case, where the number of orders per minute all of a sudden dropped. If we zoom in a little, you'll see that they dropped to almost zero. So this was a big impact for them, and with the anomaly detection they were able to identify and fix it pretty quickly. And this is the last example: I have two anomalies here, one at the user level.
C: That one is the request rate, and the other is at the server level. I'm going to quickly open the user one: we went from around nine and a half requests to 14, and you see the anomaly detection system was able to quickly identify this. And if we take a look at the anomaly on CPU usage, this was detected on an Apache container, and this is a very clear anomaly where we go from zero percent, or very low CPU usage, to very high in a very short time. Again, it was automatically detected.
A: Thanks, Samuel, that's a great overview of how CoScale works and showcases the anomalies. Let me see if I can find the first question we had. Luke was asking about custom metrics from apps: are they supported as custom plugins? Because you have a lot of preconfigured plugins in there, but if someone wants something specific for their own apps, how would somebody go about customizing a plugin or creating a custom plugin?
C: You point it to your script or binary, and then CoScale, or rather the agent, runs this script every minute or every five minutes; this you can set up yourself, and then you can push data back to CoScale this way. So this is really more of a poll. You can also push metrics with our command-line tool. Together with our agent, if you install it as a package, or, well, we have a container available with the command-line tool, and with that you can easily push data.
C: I'll show it in a moment. Then we also have a plugin which we call our log plugin, and this is a really powerful tool. If you have existing log files that contain information that you need, which could be a latency or just a number, you can use regular expressions to get that information out of there. This is really an easy way to get data without having to make large changes to your environment. Then we also have the option to push data through StatsD and the CoScale API.
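The log-plugin idea, pulling a numeric metric out of an existing log file with a regular expression, no application changes required, can be sketched as follows. The log line format and field name `request_time` are made up for illustration; they are not CoScale's configuration syntax.

```python
# Extract a numeric metric (request latency in ms) from log lines with a
# regular expression, the way a log-scraping plugin conceptually works.
import re

LATENCY_RE = re.compile(r'request_time=(\d+(?:\.\d+)?)ms')

def latencies(lines):
    """Return request latencies (ms) found in the given log lines."""
    out = []
    for line in lines:
        m = LATENCY_RE.search(line)
        if m:
            out.append(float(m.group(1)))  # skip lines without the field
    return out

log = [
    'GET /checkout status=200 request_time=12.5ms',
    'GET /health status=200',                      # no latency field: skipped
    'POST /orders status=201 request_time=48ms',
]
print(latencies(log))  # [12.5, 48.0]
```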
C: If you really want to go and do a custom integration, we have a very mature API available that you can use. I'm going to quickly show the command-line tool. So here's an example of the command-line tool inserting data: this is the metric name, the level, and then the value. And just so you know, you can find more information on this in our documentation, at docs.coscale.com.
B: This is Peter. I just wanted to say there are also a few good examples on our blog for working with custom metrics. And as part of that question, I also saw that this person asked about monitoring specific transaction endpoints, like a specific request, and for that it's also worth noting that we recently introduced a new feature, basically active checks, that you can configure in our plugins.
A: And the other question, which Frederick sort of answered in the chat as well, was whether the anomalies are based on standard thresholds or are configurable via thresholds and predicted baselines. Maybe you could talk a little about that; this is an important piece.
D: Let's say, for example, it's CPU usage, and it's mostly tightly related to the request rate that is coming in. Then we will create a model that contains both the CPU and the request rate, and we will make a baseline of that which evolves with time: you have the per-hour derivatives, you have per-day, and so on. And so we'll create a different type of analysis for each of these metrics. For example, memory usage is not that dynamic.
D: You typically see it rising and going down, but not as fast as, for example, CPU usage, so it's a completely different model that we use there. We will automatically detect, based on the metric and based on the data, which model is the best fit for this type of data, and then generate the analysis based on that. There is no configuration needed: you don't have to set thresholds, and you don't have to specify what your metric will look like. It will be automatically detected, and we will have an automatic analysis for it.
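The time-aware baseline described here (per-hour, per-day patterns) can be sketched with a toy hour-of-day model: learn the expected value for each hour from history, so that a request rate that is normal at noon can still be flagged as abnormal at 3 a.m. This is a simplification of the idea, not CoScale's actual models; the sample values and the 50% tolerance are invented.

```python
# Toy seasonal baseline: expected value per hour of day, learned from history.
from collections import defaultdict
from statistics import mean

def hourly_baseline(samples):
    """samples: list of (hour_of_day, value). Returns hour -> expected value."""
    by_hour = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)
    return {hour: mean(vals) for hour, vals in by_hour.items()}

def is_abnormal(baseline, hour, value, tolerance=0.5):
    """Flag values deviating more than `tolerance` (50%) from the baseline."""
    expected = baseline[hour]
    return abs(value - expected) > tolerance * expected

history = [(9, 100), (9, 110), (9, 90), (3, 10), (3, 12), (3, 8)]
base = hourly_baseline(history)
print(is_abnormal(base, 9, 95))   # normal daytime load: False
print(is_abnormal(base, 3, 95))   # same load at 3 a.m.: True
```

A static threshold would have to treat both observations identically; the seasonal baseline is what lets the same value be fine at one hour and an anomaly at another.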
B: There was a question that we answered offline which was actually a good question; I would like to answer it in public. It's regarding our architecture and where we store data, and we want to be very open about our architecture, so I opened a slide here. Basically, we use very modern application components: our metric data is stored in Cassandra, event data is stored in Elasticsearch, and we also keep some metadata in Postgres.
B
Our
entire
architecture
is
such
that
it
can
be
perfectly
horizontally
scaled.
It
can
also
be
deployed
on
premise
in
a
doc
rised
environment,
so
that
that
also
makes
it
very
easy
to
set
up
and
and
scale.
We
recently
did
a
proof
of
concept
where
we
actually
handled
over
a
million
data
points
per
second.
So
that's
some
more
context
on
on
our
architecture.
A: Wow, very nice. Maybe not quite infinitely scalable, but very nice. I think that maybe answers Waleed's question about where the metric data is stored: it's Cassandra, Elasticsearch, and Postgres, so that's using some of the latest and greatest bits that are also part and parcel of OpenShift as well. We'll give everybody a few more minutes to see if there are any other questions.
B: Yeah, I want to thank everybody for attending, and if you're interested in trying out our solution on OpenShift, as I said, we are a Primed partner and our solution is available for everybody. Just go to coscale.com, start a free trial, and you can try it out for yourself for 30 days. And if you have any further questions, our contact details are here at the bottom of the slide, so feel free to reach out.