Description
Frederick Ryckbosch, founder and CTO of CoScale, joins us for a discussion of the performance considerations of running applications on OpenShift in production, and how to address them with CoScale's container monitoring platform. A detailed demo will be provided, including installation and configuration for OpenShift-specific insights.
A: Well, hello everybody, and welcome again to another OpenShift Commons briefing. This time I'm really pleased to have with us the folks from CoScale, Frederick Ryckbosch and Samuel Van Damme, who have been with us before. They're going to talk about using some of their CoScale tools and services for proactive performance management of OpenShift, and so I'm gonna let them introduce themselves. The format of this session is that there's a chat, so ask questions.
B: Now we will start with a couple of scenarios that we've seen happen at customers, and what the effects are that you can see on the OpenShift environment. Now, I know that Fred has put in a lot of time this week to set up a very nice OpenShift environment. So Fred, could you show us a little bit of what you've set up for us today?
C: Sure. So let me go to the OpenShift UI for that. As you can see here, we have a lot of things running: we have MySQL, nginx and so on running. But what does this thing actually do? It's a word count application. So if we go to this URL here, you can see how it works. Here you can submit some words, and then you get some statistics about the words that were entered before: you can see the most used words, the most entered words and the most entered letters.

Actually, it's a very simple application, very basic. However, somebody, well, over-engineered it a little bit. He put an nginx in front; nginx sends traffic to the receiver, which writes things into RabbitMQ. Then there is a service that picks it up and puts it into MySQL, and then there are other services that process the data and put it into Redis, and that's served to the customer again. So you can see those services here, and we can see the workers below.

We have some workers for calculating the letters, calculating the words, a processor and so on. So this is what we're doing in this application. Of course, I also installed CoScale and we're monitoring this environment. Here you can see that we are monitoring OpenShift, but also all the things running on OpenShift: nginx, Java processes, RabbitMQ, everything I just mentioned.
C: For creating a route, I just click this "create route" button, so I created the route for nginx. This means that traffic going to this URL will be sent to my OpenShift nodes. Once the traffic arrives at my OpenShift node, the OpenShift proxy will have a look at the URL, and based on the URL, it sees that it is the nginx-words hostname, it will send it to the nginx service. Then the nginx service will receive that request.

If we have a look at the nginx service and the code, we can actually see that the nginx service talks to the receiver, and here we provide an environment variable. So this is some PHP code for getting an environment variable; using an environment variable, it sets the receiver host. In our case, on this cluster, the environment variable is just set to "receiver", because the receiver is in the same namespace as the nginx. OpenShift will add it to DNS, and nginx will automatically be able to resolve the receiver.
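The service-discovery pattern Frederick describes, reading the backend host from an environment variable and falling back to the bare service name that OpenShift's in-namespace DNS resolves, can be sketched like this. The demo's code is PHP; this is a minimal Python equivalent, and the variable name RECEIVER_HOST is an assumption for illustration.

```python
import os

def receiver_host(default="receiver"):
    """Resolve the receiver's hostname from the environment.

    OpenShift registers each service in DNS within its namespace, so
    when the variable is unset we fall back to the bare service name,
    which resolves to the receiver service in the same namespace.
    """
    # RECEIVER_HOST is a hypothetical variable name for this sketch.
    return os.environ.get("RECEIVER_HOST", default)
```

The same idea lets you point the pod at a receiver in another namespace simply by setting the variable to a fully qualified service name, without changing the image.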
C: It depends on who you talk to, I think. There are a lot of services in here and they all have their own purpose, so they're single purpose. In that aspect this is a microservice environment. However, some people would say that the data has to be isolated per microservice. So if you look at the Redis: the Redis is being communicated to by three different services, so some people would say that violates a microservice architecture. So perhaps not for everyone, but I would consider this microservices. OK.
C: So we are seeing both, basically. Some customers are doing completely new environments, very greenfield technology, using microservices to get things going, and they use OpenShift for that, to scale it easily and so on. Others are coming from more legacy already, and they are putting their monolithic applications into OpenShift, and then they try to split off parts. They say: OK, this component looks very isolated, so we'll create a separate container for that component, and they will split it off.

On the business side, most people are attracted to this because if you split everything into small components and the components are more isolated, you can iterate faster on those components. So you can make sure that if you have a new feature that you want to add to one component, you can add that fast, without affecting the whole system.

So you don't have to build the whole system again; it's a lot faster to get this into production. And from a technical perspective they're really interesting, because now these smaller services can be distributed across multiple nodes. You can scale them really easily, and it's more resilient against failures by that, because you have multiple instances running on multiple nodes.
C: So let me go to a CoScale dashboard. On this dashboard we can see the free space on one of the nodes in the cluster. I will mention this first: I have a cluster of ten nodes. I have two infra nodes, I have three masters and I have five nodes. Here we can see the containers that are running. For my "words" namespace: I selected this namespace, and we can see the containers running for that namespace.

Now, if you look at this graph here, you can see the free disk space on one of the nodes. You can see the disk was actually almost full, and then somebody started a process that started filling up the disk, filling up the disk, and at this point we notice that there is an event. We can see here that there is an event: the status of node 1 changed from disk space sufficient to node out of disk.

This means that OpenShift will not schedule new pods on this node. However, what we also see here is that for node 1 there are still containers running. So it's not because it went out of disk that OpenShift says: I have to remove all the pods from this node. These are the types of things that you really want to get visibility into, right?
C: Yeah. So that's why you need in-container visibility. You have to have a look at the services that are running inside your containers, whether these are performing as you expected, because when the disk is running full, it might have an impact on these services, and you want to be aware of that. You want to make sure that it's not only when things crash that you're notified; you want to know in advance. Okay.
C: If I look at the deployment, then, if I'm fast enough, we will see that the container is creating at that moment, and now it's started. So we actually went from four instances of nginx to five with one click. Of course, you can also do this through the CLI, so that you can automate this. Okay.
C
It's
so
that's
a
very
important
question.
I
think
your
monitoring
has
to
be
aware
of
these
things,
so
you
have
to
make
sure
that
you're
monitoring
to
knows
what
is
going
on
right.
So
if
we
have
a
look
here,
we
can
see
for
the
nginx
I
preview
previously
scaled
it
up
from
one
container
to
three
containers,
and
that
is
that
is
what
we
can
see
right
here.
The
yellow
line
shows
that
there
was
one
container
running
at
this
point
in
time
and
then
at
3
o
clock
in
the
afternoon.
C
I
scaled
it
up
to
three
containers,
three
pots,
but
actually
what
happens
you
can
see
here?
So
you
can
see
this
container
kept
on
running.
So
the
green
area
indicates
where
the
container
was
running
at
3
o
clock.
Another
one
was
started.
Actually
two
containers
were
started,
the
second
one
exit
again
and
open
shift
scheduled
another
one
for
me.
So
we
killed
this
one
and
Oh
chief
that
ok,
you
asked
for
three,
so
we
will
schedule
another
one
for
you.
You
can
really
easily
see
in
this
graph
what
is
going
on
when
our
container
started?
C: Okay, so I think if we go to a dashboard, we can actually see this. So this is my nginx, and I can see here, for the whole service, that the CPU load dropped. The average CPU load for the whole service dropped here. So if we open this up, then we can see: OK, for the "words" namespace there is a replica set nginx. If I open this, I can see all the pods for that namespace, for that replica set. Excuse me. So right here I can see that we had one container running.

We can, of course, also have a look at this at the service level. So if I go back, if I click forward now on nginx, here I can actually see some more in-depth nginx metrics. I can see the number of requests that are coming in. If I click on the number of requests, then I can see that there is also a drop in the average number of requests being served.

That's strange, but it's very logical, right? We can again see multiple containers joined and they took over the requests. If I now stack this graph, I can actually see that this is very normal behavior: I had about two requests per second before we scaled up the replica set, and we can see that there are now three containers and they are all serving equal traffic for that service.
B: That really makes sense, of course. Maybe a question: you gave us an example of a node actually failing, but I can imagine a lot of scenarios where you want to do this. I don't know, you need to maintain a machine, or you're seeing hardware errors on a specific one and you think it's better to take it offline.
C: So, for example, when you require maintenance on your machines. This can happen, right: you have a security update that has to happen to the underlying operating system, and for that security update you have to reboot. So at that point you want to evacuate that node. You want to drain the node; you want to say to OpenShift: drain a node. It will evacuate all the pods and it will reschedule them on different nodes. I don't know whether you can do this in the UI.

That would be good information, but I know you can do it from the command line. So there it is: oc adm, the OpenShift administrator tool, and there you have options, and one of the options is drain. So here we can see "drain node in preparation for maintenance", so we can just say oc adm drain and then my node 2. If I do this, yeah, I will get some warnings; I want to ignore the daemon sets, so let's do that.
C: Actually, I created a graph so we can see the number of successful requests to the receiver. I know that the orange one is the one that is running on node 2, because I looked it up before, and I know there's only one of those running, and it's this one; it's running on that node. If we now go forward in time just a bit, then we'll see that at 14:50 node 2 started draining. So this is at this point, and then some strange stuff starts happening right here.

We can see that the number of successful requests is very low in this area, and that is because there is no connection to Redis. There is no Redis at that point, and things start failing. However, since OpenShift manages to get the pod up and running again, we can see here that requests are restored, and this yellow container is a new container that is scheduled on a different node. We can see that node here.
C: In your client code you want to make sure that you connect to another node in the cluster and try the request again, so these retry kinds of mechanisms are also really important. And the cool thing with these tools is that you can actually see this behavior really easily: you can actually see what the impact of a node drain is, what the impact of a disk running full is, and so on, on my application level. Yeah.
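A minimal sketch of such a client-side retry mechanism, trying the next node in the cluster when a connection fails during a drain. The function names are illustrative, not from the demo.

```python
def request_with_retry(nodes, send, attempts_per_node=1):
    """Try each node in turn until one request succeeds.

    `send` is a callable taking a node name and returning a response;
    it raises ConnectionError when the node is unreachable, e.g. while
    it is being drained. The last error is re-raised if every node fails.
    """
    last_error = None
    for node in nodes:
        for _ in range(attempts_per_node):
            try:
                return send(node)
            except ConnectionError as err:
                last_error = err  # node unreachable, move on to the next
    raise last_error
```

Most cluster client libraries (the Redis and RabbitMQ clients included) offer this behavior through configuration, so in practice you would enable their built-in retry rather than hand-rolling it.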
C
However,
there
are
other
things
like
this
throughput
Network
throughput
that
are
not
on
which
you
cannot
set
quotas,
so
this
is
a
limitation
of
the
Linux
kernel
which
in
which
it
is
not
possible
to
to
to
do
these
quotas
today,
so
home
just
can
also
not
do
it.
So
this
means,
if
you
have
one
container,
that
is
a
very
disk
intensive,
so
it
writes
a
lot
of
stuff
to
the
disk.
It
can
actually
consume
all
of
the
bandwidth
to
the
disk
and
another
container.
C
On
the
same
note,
if
it
also
requires
band
bandwidth
to
the
disk,
can
experience
problems
from
that,
so
it
can
be
starting
on
on
through
boots,
so
one
container
can
affect
another
container,
and
these
are
the
kind
of
things
that
you
want
to
see.
So
you
want
to
know
for
all
of
your
containers.
Okay,
what's
if
you
are
using
watch
what
memory
are
they
using
against
their
Koda?
But
you
also
want
to
know:
what's
the
network
traffic
what's
disk
throughput,
you
want
to
keep
an
eye
on
this.
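Disk and network throughput are usually exposed as cumulative byte counters, so a monitoring agent derives per-container throughput from the deltas between consecutive samples. A minimal sketch of that calculation, not CoScale's actual implementation:

```python
def throughput(samples, interval_seconds):
    """Convert cumulative byte counters into per-second rates.

    `samples` are successive readings of a counter such as bytes
    written to disk; each consecutive pair yields one rate for the
    sampling interval between them.
    """
    rates = []
    for prev, cur in zip(samples, samples[1:]):
        rates.append((cur - prev) / interval_seconds)
    return rates
```

Plotting these rates per container over time is what reveals a noisy neighbor: one container's throughput spikes while another's collapses on the same node.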
C
It's
very
important
that
you
do
this,
that
you
have
a
historical
view
of
this,
so
that
you
can
see
which
containers
should
can
be
scheduled
together
and
which
containers
you
should
keep
on
different
notes.
Actually,
so
you
can
use
mechanisms
like
note,
affinity
and
note
labels
to
make
sure
that
heavy
containers
are
not
scheduled
with
other
containers,
but
you
need
data
to
come
to
those
conclusions.
Yeah
make
sense.
C: That's a good question. Most of the companies we talk to start with an internal application. They have this internal application that they use to test this, where they start with: OK, there is no real end user impacted if we do this. They try it out with that, but now we are seeing the push for more, and once they gain experience with that, they start going to more customer-facing applications, and we see a lot of customer-facing stuff starting to happen right now.
B
Very
exciting,
to
see
all
this
happening.
Of
course
now,
maybe
coming
back
to
open
ship
I
can't
imagine
that
there's
some
scenarios
where
a
container
is
misbehaving
or
or
doing
something
it
it
shouldn't,
but
that
from
opens
shifts
point
of
view.
This
isn't
really
clear.
So,
for
example,
yeah
I
can
open
shift,
handle
every
type
of
container
issuer
to
container
crash.
So.
C: When a container crashes, OpenShift will help you, right: it will reschedule it and make sure that the pod gets back up and running. There are, however, a lot of situations where the container is not crashing and health checks appear to be healthy, so OpenShift thinks: OK, this container is doing what it's supposed to do. But actually, you have to look at other metrics inside of the container, more performance metrics: what are the latencies of the requests that are being done, and so on?
C: So this is the RabbitMQ still running in our test application, and we can see here some very global metrics, very high level: the number of channels, the number of connections, consumers, exchanges, queues and so on; message rates, how many messages are coming in at a moment; what the memory is looking like. And we can see here that there's a strange trend going on, right? At this point everything is fine; there are not a lot of messages in the queue.

This queue is used like a job queue: you put something on the queue, and then somebody else will pick it up and process that data. But at this point we can see that something is going on. Work starts piling up, or messages start piling up: you can see the request rate goes up, and these messages are not being handled. This causes the memory to go up, so this container starts consuming a lot more memory.
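The "messages piling up" pattern Frederick points out can be caught programmatically by checking whether queue depth keeps growing across samples. A minimal sketch; the window size is an arbitrary choice for illustration:

```python
def backlog_growing(queue_depths, window=3):
    """Flag a queue whose depth has grown monotonically over the last
    `window` samples, i.e. producers are outpacing consumers."""
    if len(queue_depths) < window:
        return False
    tail = queue_depths[-window:]
    return all(b > a for a, b in zip(tail, tail[1:]))
```

A real alert would also consider the absolute depth and the consume rate, since a healthy queue can grow briefly during a traffic burst.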
C: It keeps on going up, and at some point it will hit a limit and it will crash or restart or so on. At that point, of course, you will lose that data. In our case the RabbitMQ is not persistent, so if it restarts, then we will lose that data. If I click here, then I can get some more detailed information, and I can actually see that there are two queues.

So here we have the queue called "junk" that contains a lot of messages, and then the "messages" queue that is actually used by the application, which is being cleared often: the work is picked up, and that goes OK. So you can actually see here how to debug this. You can see: OK, there's strange behavior going on, a queue is filling up, and we can get down to the queue level: OK, it's this queue that's filling up, and then we can mitigate that.
B: So these are large environments. I think, when talking to customers, they start small and they start scaling up after some time, after they have their tests with the system. So I can imagine that you maybe start with 4 containers, but after a while you have 20 containers running the same application, and it becomes, I think, very difficult to do something like this, right? How do you monitor 20 different RabbitMQs and pick up when they're not behaving as they're supposed to? Yeah.
C: Definitely. So you want to have good dashboards that provide you good visibility, but it's not possible to look at these dashboards all of the time, and having somebody dedicated going through all the dashboards all of the time, that's not something we are interested in. So we have an anomaly detection mechanism that will alert you when there are big changes in your system, things that you should actually look at.
C: We can see that one of the containers is experiencing strange behavior: it's going to 100% CPU. Perhaps it's in some kind of a loop it can't get out of, so it starts consuming a lot of CPU. CoScale will look at all of the containers in a certain service. It will see that, OK, these things normally look the same; they have the same type of behavior, as we can see here: it's very regular, the same kind of behavior.

If something pops out of that behavior, then we'll alert users. So for this one, we can detect it within one minute; we can say: OK, there's something strange going on, especially because it's a very large anomaly. You can see it highlighted here in pink; that's an automatically detected anomaly from CoScale. Okay.
C
Exactly
so,
we
have
four
metrics
for
both
the
operating
system,
the
orchestrator,
the
containers
and,
of
course,
the
applications
inside
the
containers
and
for
all
of
those
metrics.
We
make
models
and
we
make
sure
that
the
models
are
calibrated
by
the
data
that's
coming
in
and
if
new
data
comes
in
then,
if
it
isn't
normally,
you
will
get
alerts
for
that.
Okay,.
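CoScale's anomaly detection is proprietary, but the idea of calibrating a model on incoming data and alerting on points that fall outside normal behavior can be illustrated with a simple z-score check. This is a stand-in for illustration, not the actual algorithm:

```python
def is_anomaly(history, value, threshold=3.0):
    """Flag `value` if it deviates more than `threshold` standard
    deviations from the mean of the calibration `history`."""
    from statistics import mean, pstdev

    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # A perfectly flat history: any change at all is anomalous.
        return value != mu
    return abs(value - mu) / sigma > threshold
```

In practice the model is recalibrated continuously as new data arrives, and seasonal patterns (daily traffic cycles, for example) have to be modeled too, or every morning's ramp-up would look like an anomaly.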
B: And I can also receive emails for this then? Yes? Okay, pretty cool. Yeah, I think all the data we've been showing has been coming from CoScale, which is pretty clear, I think. Now, I think the people on the webinar will probably be interested in: okay, how do I install this, how do I monitor my own OpenShift environments, how long will it take? Yeah.
C: So if we go to data sources, we can see we have an agent, and at this point the agent is installed on all of the 10 nodes in the cluster. Let's say I was starting and I wanted to recreate this thing; then what do I do? I create a new CoScale agent. In this case I will deploy it as a container; I will deploy it on OpenShift. I will talk about how to monitor the images, the containers inside your environment, later on, so we'll skip that step.

I can give it a name. Then we get install instructions. So let me first scroll down a little bit. Here we can see the step where the CoScale agent is actually deployed, and we can see that we are using a daemon set. A daemon set is a mechanism in OpenShift that deploys a certain container on all of the nodes in your cluster. This way the CoScale agent container is running on all of the nodes in your cluster.

You can see here that we are mounting the Docker socket and some other stuff, to make sure that we can get metrics from Docker and that we can get metrics from the underlying operating system. Another thing to notice here is that we are using privileged mode. In order to read metrics from the underlying operating system, we have to have privileged mode, and in OpenShift you have to do some stuff to make sure that privileged mode works. To make it easy for our customers, we actually include the instructions to do that as well here.

So in the first step we set up a security context constraint, which allows us to run a privileged container. You can also see the other things that we are using; you can review that and give this to your security team to see whether they like this. Then here we create a new CoScale project on your cluster, we create a service account for it, we add the security context constraint to that service account, and then we deploy the daemon set. So the installation is actually as simple as copying this and pasting it into your terminal. We can just do this right here. Of course, this was already done, but you see how that works, and that's the whole installation.
C: When you do the installation with the daemon set, you will see the resources from the operating system, the metrics from Docker and from OpenShift. There's a bit more configuration required to get the other metrics; we will talk about that right away, but I can show you what is already in the OpenShift dashboard. A lot of the widgets that I have been showing are present on these default dashboards.

So if you install it, you get these dashboards immediately. You can see how many containers are running, how many nodes I have, how many replication controllers, services and so on. I can see the events that are happening in my cluster, and more information about builds, deployments and so on. The cool thing is that this dashboard gives me a high-level view, right?

It's very high level, but I can click through on this. I can click through on the nodes and get this kind of view, same thing as we saw before. We can click on one of the containers here and go into that dashboard. We can see that the container was running here; I can zoom into that; I can see the events for that container. I can also click through to other technologies.
B: I guess, when talking to customers, you put a lot of the knowledge you've built into the dashboards again? Yes, exactly. Okay, now I noticed, and maybe some other people also noticed, these dropdowns on top. You have, like, replication controller and namespace in this case. Can you maybe explain a little bit what that is, or what that does?
C: Yeah, sure. Let's go back to the dashboard first. For our nodes, we can see that here I have selected the "words" namespace, so I can see all of the containers that are running for that namespace. If I now click a different namespace, for example "coscale", in which the CoScale agent is running, I can see those. Okay. After that, we can also filter, so we can find a certain service. I can look for my Redis, see where that is running, and so on. Yeah, it's pretty cool. There's, however, more.

So if we have a look at this dashboard: this is a service metric dashboard. This is also a template that you get out of the box. You can see the average CPU, memory, network traffic and disk I/O for all of your services, which might not be that useful, but you can open this up and actually drill down. So you can see all of your services, but I can also go to Kubernetes, for example, and there I can see there are nodes.

There are master nodes and regular nodes, so I can select one of the masters, or I can select one of the other nodes. If I split this up for all nodes, I can see per node what the behavior is. So if I click on this one, I will see which node that is; I can then pick out that node to inspect it further. There are two other dimensions here: we can also see that there is a disk dimension and an interface dimension. The interface is the network interface.

So if I'm interested in, for example, the network traffic that is going between the nodes, or publicly, I can click on that interface and see data for that; same thing for the other interfaces, of course. So this allows you again to start from a very high level and then drill down on certain aspects.
B
Ok,
cool
yeah,
I
think
or
maybe
a
quick
question
on
this
before
I
forget
is
so
this
system
really
means
I
can
create
a
dashboard,
for
example,
form
application
and
if
I
have
a
development,
namespace
is
staging
namespace
in
the
production.
Namespace
I
can
quickly
compare
the
performance
between
the
tree,
so
I
don't
need
to
create
3d
dashboards
to
see
the
same
information
pretty
much.
C: Okay, so you can have a look at that. By default there are some alerts that have been set by CoScale: you have the average load of the CPU, the free disk space and so on, and memory. If I click on this one, let's say the free disk space, I can edit the event, and it's very readable: I can say, if free disk space in percent is less than 10% for five minutes for these servers.
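The "free disk space less than 10% for five minutes" rule can be expressed as a check over the most recent samples. A minimal sketch, assuming one sample per minute; the parameter defaults mirror the rule shown in the UI:

```python
def low_disk_alert(samples, threshold_pct=10.0, duration_s=300, interval_s=60):
    """Fire when free disk space stays below the threshold for the
    whole duration.

    `samples` are the most recent free-space percentages, one per
    `interval_s`. Requiring every sample in the window to be low, not
    just the latest one, keeps a brief dip from paging anyone.
    """
    needed = duration_s // interval_s
    if len(samples) < needed:
        return False
    return all(s < threshold_pct for s in samples[-needed:])
```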
C: In this case you know which servers those are, for five minutes, and I can then set it on a certain container. So I can do it on the image: I can say all the containers that are running the nginx image, I want to do it for those. But I can also do it on a more granular level: I can do it for a certain replica set only, or I can do it for the whole deployment, for example, or for services, and so on. So it's very modular.

You can do it for one namespace or for all namespaces; it's very easy to create alerts for a very specific thing. You can also see that I'm not selecting containers. If you drill down, you will see actual containers, but that's not very relevant, because containers come and go a lot, so you want to do it at more of an aggregate level.
C: How would I do this? Interesting that you ask this; I have a very good example of it. So here we have "high memory on the calculate-letters deployment". As I showed you before, this one is: if Docker memory usage in bytes is greater than 200 megabytes for everything that is in the calculate-letters deployment, then I want to trigger an alert. We can see here that it actually goes up a lot more than 200 megabytes, and for that I will set a certain rule. The rule here is that a webhook is being executed.

We can read that right here: the action is either "triggered", "acknowledged" or "resolved", so the webhook will be sent for an alert that's being triggered right now or an alert that is being resolved right now, and the server field contains the pod that is affected by this alert. So this information will be sent to that URL. So let's have a look at our OpenShift dashboard.
C: We can see that this service, the webhook service, is also running in our OpenShift deployment. That's this one right here. I can show you the code; it's a really simple Python program. What it does: it exposes a debug route for the heap dump, and it checks the token. This is a very basic form of security, right; it should be over HTTPS and it should check the IP range of CoScale and so on, so don't use it in production.
C
But
it's
a
very
simple
example
with
a
very
simple
security
mechanism,
so
it
checks
the
token
first.
Then
it
checks
whether
the
eight
where
the
server
is
present
and
the
action
is
present
present
if
the
action
is
triggered.
So
if
the
alert
is
created
right
now,
then
we
will
extract
the
pot
in
from
the
server.
So
we
can
do
that
with
a
regular
expression.
We
can
get
the
pot
name
and
then
we
do
a
heap
dump,
so
we
take
a
heap
dump
for
that
pot.
So
what
is
happening
in
our
system?
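The webhook logic just described, verify the token, act only on freshly triggered alerts, and pull the pod name out of the server field with a regular expression, can be sketched like this. The payload shape and the pod-name pattern are assumptions for illustration; CoScale's real webhook format may differ.

```python
import re

def handle_alert(payload, token, expected_token):
    """Decide whether an incoming alert webhook should trigger a heap dump.

    Returns the pod name to dump, or None when no action is needed.
    Raises PermissionError on a bad token (the demo's basic security).
    """
    if token != expected_token:
        raise PermissionError("bad token")
    if "server" not in payload or "action" not in payload:
        return None
    if payload["action"] != "triggered":
        return None  # ignore "acknowledged" and "resolved" notifications
    # Hypothetical server-field format: extract the pod name after "pod/".
    match = re.search(r"pod/([\w.-]+)", payload["server"])
    return match.group(1) if match else None
```

The returned pod name is what the demo then passes to its take-heap-dump step.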
C: We see that the memory usage is growing for a certain container. At some point the alert gets triggered, and we say: OK, trigger this webhook that will take a heap dump, and that heap dump can then later on be analyzed to see which objects are consuming the most space, and you can actually optimize your service with that.

So, if you look at the take-heap-dump method, the thing that is being executed: we take the dump, we upload the dump and then we do a cleanup. Taking a dump is done using jmap; this is a Java utility for creating a heap dump from your JVM. We have to fill in the Java process ID, so we use this command right here for that. We use curl to upload it to a certain FTP server, and then we remove the heap dump from the container. Yeah.
C
This
command
is
being
executed
using
cube
CTL,
so
we
do
cube
CTL
exact
in
the
pot
that
was
provided
by
the
alert.
The
alert
provides
the
container
that
is
having
that
that
problem.
So
we
use
that
here
to
execute
a
certain
command
inside
of
that
container,
to
get
the
heap
dump
and
put
it
on
to
an
FTP
server.
Does
that
make
sense
I,
don't.
B: I don't know that much about Java, but I guess people that work with Java every day should be pretty excited about this. Okay, cool. So you can take actions now. I remember you showing us a little bit of application data. Probably the people on the line will be pretty interested in: okay, how do you now get that in-container data into CoScale? So how do you monitor the applications in the container?
C: Okay, so let me go back to our agent page. The step I just skipped was this step, where you can define the images. So right here I have defined that I want to monitor the Redis image, with a tag, with a Redis plugin. If I click Edit, I can see how the Redis plugin is configured, and it is using a certain connection, localhost and a port, and it's doing an active check. So this is how I configured my Redis monitoring for this image. Maybe go back.
C: Actually, you can provide the environment variables like this. You can just put in the environment variable; we will detect that you provided an environment variable here, we will see that the containers that are running have that environment variable, and we'll fill it in at runtime. So there is no need to set a fixed password and username on your containers. You can still do that using the native mechanisms, using the environment variables, using the config maps, and then here you can just use the environment variables. Okay.
C: What is the trick that we are using there? We can see here that there is another button, "generate Docker labels", where you can actually configure a plugin. So let's do it: same thing as before, for Redis, let's configure the Redis plugin with just the basic defaults, and then I will get a label.

This is a Docker label, so this is a label that I can put into my Dockerfile, and whenever a container is started and the image has a certain label, CoScale will pick that up and will start monitoring as defined by this label. So this means that, as a developer, you can set how your container should be monitored.

So, for example, if you change something in your container, if you for example add a metric to your JMX metrics, you can just add it here, add the label to your Docker container, and the metric will be automatically picked up when the container is started. So there is no need to run to operations to ask them to add this metric for me, for this container; you can do it yourself. You can add the label on your container, and then things will be started automatically.
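An agent watching container starts could implement this label-driven configuration roughly as follows. The label key and the JSON payload shape here are assumptions for illustration; the actual label text is generated by the CoScale UI.

```python
import json

def plugin_config_from_labels(labels, key="com.coscale.monitoring"):
    """Read a monitoring-plugin configuration embedded in a Docker label.

    `labels` is the container's label map as reported by the Docker
    API. Returns the parsed plugin configuration, or None when the
    image carries no monitoring label (i.e. nothing to start).
    """
    raw = labels.get(key)
    if raw is None:
        return None  # image did not opt in to in-container monitoring
    return json.loads(raw)
```

Because the configuration travels with the image, the same label works unchanged on every cluster the image is deployed to, which is exactly the dev-to-production portability Frederick describes next.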
C
C
C
B
C
Actually a good question. The way CoScale starts the plugins is that the agent that is running on all of the nodes will start a plugin for the containers inside the namespace of those containers. That means that these plugins can see everything that is local to the container, so they can use localhost:8004 as seen from within the container, and you actually don't have to expose that port at all. Even so, this is a port that only exports the status interface.
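Because the plugin runs inside the container's network namespace, it can reach a status port on localhost even though that port is never exposed outside. A small sketch of such a reachability check, using an ephemeral local port to stand in for the container-internal status endpoint:

```python
import socket

def port_open(host, port, timeout=0.5):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Demo: listen on a localhost-only ephemeral port, like a status endpoint
# that is never exposed outside the container, and check reachability.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]
print(port_open("127.0.0.1", port))  # prints True: reachable from "inside"
server.close()
```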
C
We also support that you can put /dev/stdout here, but if you have a container that does logging inside of the container, we can manage that as well: we can actually get to the file inside of the container, no reason to mount it anywhere or so on. So it's all very transparent; you can reason from the point of view of your container. Okay.
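One common Linux mechanism for reaching a file that only exists inside a container from an agent on the host is the container's root filesystem as exposed under /proc. This is a general technique, not necessarily exactly what CoScale does, and the pid and path below are placeholders:

```python
def host_path_for(container_pid, path_in_container):
    # /proc/<pid>/root is the container's root filesystem as seen from the
    # host, so an agent can tail a log file without any volume mounts.
    return f"/proc/{container_pid}/root{path_in_container}"

print(host_path_for(4242, "/var/log/app.log"))
# prints /proc/4242/root/var/log/app.log
```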
B
C
You just add the label on the container, and if the container is running on the first environment, it will be picked up there and monitoring will start; same thing for all the other environments. So there's no change between the different environments, which means you can actually test your monitoring on your staging environment, see that everything works perfectly there, then move it into production, and you will know that the configuration of your monitoring will be exactly the same. Cool.
B
C
So we pride ourselves on being a very lightweight monitoring solution. We make sure that everything runs very efficiently and that we don't put a lot of burden on your containers and on your hosts. This is really important because we're seeing a lot of containers running right now. It's not like before, where you had one process per machine and could afford 10% overhead because there was only one process.
C
B
A
C
A
C
I think that's a difficult one. We can run in some kind of degraded mode: if you don't have privileged mode, for example, the agent will still work and the plugins will still work, but some things that are shielded, for example I think the disk metrics are shielded in the proc filesystem, we cannot get without privileged mode. So the plugins will work and you will still gather data, but those specific metrics won't be available.
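A sketch of that graceful degradation: when a shielded /proc entry cannot be read without privileged mode, the collector falls back to reporting the metric as unavailable instead of failing. The exact entries involved are an assumption; /proc/diskstats is used here as the example the speaker mentions.

```python
def read_disk_metrics(path="/proc/diskstats"):
    # Some /proc entries are shielded without privileged mode; instead of
    # failing, fall back to a degraded mode where these metrics are absent.
    try:
        with open(path) as f:
            return f.read().splitlines()
    except OSError:
        return None  # degraded mode: disk metrics unavailable

print(read_disk_metrics("/proc/does-not-exist/diskstats"))  # prints None
```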
A
C
So in that case, the command that you get for installing has to change a little bit. Right here it tells you to use privileged mode; if you just leave that out, you will be able to deploy it in an environment where you don't have privileged mode, and you will get the degraded scenario.
C
A
The reason I'm curious about this is that I get a lot of people asking about monitoring solutions to use with OpenShift Dedicated or other hosted OpenShift deployments. So I think I hear you saying that we could use CoScale even if we were using someone else's hosted environment. Is that true?
A
That'd be very handy to have; a lot of people are asking for different monitoring tools, not just for Dedicated and Online. My experience with similar services is pretty limited to using them, like New Relic, to hack on and debug my own application. It's not at that operations level, the Kubernetes level, where you really shine, and it's pretty stunning how deep and operations-focused this is, but I think it also works for those of us who are writing applications.
A
I think we've asked every question here. Can you put your final slide up with how to contact you guys? That way, if anyone has any further questions, or anyone watching this video later has a question, that would be a great place to reach out to them.
C
Yes, definitely. The monitoring starts at the moment the service is started up. So if your services are very short-lived, we will actually start monitoring them at the moment they are started, and when they stop or die, the monitoring will be stopped for them. We also take a lot of scaling considerations into account for this: if you have a lot of these jobs, we make sure that it's still possible to visualize them and to see them over time, because you can get a lot of containers in that case.
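The lifecycle-driven monitoring just described can be sketched as follows; the event shape is an assumption modeled loosely on container runtime events, not CoScale's actual internals:

```python
# Monitoring follows the container lifecycle: start on a "start" event,
# stop on "die"/"stop", so even short-lived jobs are captured.
monitored = set()

def handle_event(event):
    cid, action = event["id"], event["action"]
    if action == "start":
        monitored.add(cid)
    elif action in ("die", "stop"):
        monitored.discard(cid)

for e in [{"id": "job-1", "action": "start"},
          {"id": "job-2", "action": "start"},
          {"id": "job-1", "action": "die"}]:
    handle_event(e)
print(sorted(monitored))  # prints ['job-2']
```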
A
D
Yeah, my question is more specific to the services that we are going to host on containers. With the new cloud-native architecture and the various microservices that we are going to build on these containers, there is a possibility, in most cases, that some of the microservices or applications that we developed may not be in use.
D
C
Yeah, definitely, it's a very good question. It comes back a bit to our word count application. If you have a look, it's difficult to do this at the OpenShift level, because judging from CPU usage, memory usage and so on, it's difficult to see whether a container is active or not. However, because we do in-container monitoring, we can see, for example, whether there are requests coming into these containers. So if you have microservices and you see that there are no requests for a certain period of time...
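The idle-detection idea can be sketched in a few lines: because requests per container are monitored, services with zero requests over the observation window can be flagged as candidates for retirement. The service names and counts below are made up:

```python
def idle_services(request_counts):
    """Return the names of services that received no requests in the window."""
    return sorted(name for name, count in request_counts.items() if count == 0)

counts = {"wordcount-api": 1520, "legacy-export": 0, "receiver": 87}
print(idle_services(counts))  # prints ['legacy-export']
```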
C
D
A
That would be great, awesome. All right, well, we won't detain you guys too much longer, because we did fill up that entire hour, but it was a spectacular show, and I'm so pleased that you didn't use slides through the entire thing. So thank you very much, because it was really useful information, and I know you guys have trial capabilities too. So folks on the call, if you want, give it a trial and check it out.
A
This is really a great service, and hopefully we can use it to gain some insights into our OpenShift deployments. So thanks Samuel and thanks Fred. For anyone who'd like to reach out, this podcast will be up online at the blog.openshift.com site shortly, and we'll also put it up on our YouTube channel. So thanks again guys, and have a great evening over there. Thank you.