Taylor: Hello, everyone, and welcome to Cloud Native Live, where we dive into the code behind cloud native. I'm Taylor Dolezal, a senior developer advocate at HashiCorp, where I focus on all things infrastructure, application delivery, and developer experience. Every week we bring a new set of presenters to showcase how to work with cloud native technologies. They will build things, they will break things, and they will answer your questions. In today's session, Alok and Cesar have joined us to talk about leveraging the CNCF observability tools for Kubernetes troubleshooting.
Alok: Taylor, so: I'm Alok. I'm the founder and CTO of OpsCruise, an observability company built on open source and CNCF telemetry, and I will also introduce Cesar Quintana, my colleague, who is the principal solutions architect at OpsCruise. Thank you. So the way we thought we would do this, before we set up the demo itself and go through that, the fun part, I've only...
Alok: Great. So, as mentioned by Taylor, we are talking about how to add intelligence and observability now that we have open source monitoring, right. You know, going through the standard confidentiality and legal notice, we'll skip over that.
Alok: You know, if you will, from top down, kind of what we call vertical, as well as across, and this is happening all the time. The third complexity is dynamism. We want to be agile, right; we want to add services, change any one component, scale out, scale in; you know, some things drop, something is brought up. All of this together is like a highly complex distributed system, and just looking at a couple of metrics is no longer sufficient. You know, things are changing.
Alok: You know, get to understand what's happening in real time, so they can detect quickly, find the real issues, and get back up and running. You know, it's the same things that you've heard: mean time to detect should be fast, don't waste time with false alerts, and get to the root cause, mean time to resolution, right.
B
B
C
B
B
So
they
can
actually
detect
the
problem,
isolate
it
and
and
analyze,
and
and
and
figure
out
what
the
resolution
should
be.
So,
if
you
think
about,
if
observability
has
to
be
really
intelligent,
they
have
to
establish
this
context,
this
understanding
and
surface
that
from
all
that,
you
know
effectively
called
noise.
That's
coming
in
all
the
data
that's
sitting
in.
If
you
can't
do
that,
then
we've
actually
made
the
life
of
a
typical
devops
and
sre
very
difficult.
So
that's
what
we
want
to
do,
so
our
thesis
is
help
leverage.
This.
Alok: ...six: why would this happen, given what I have seen? So essentially, think of it as almost anthropomorphizing what an ops person would do and what they understand. If we can put all of this in place and automate this pipeline, we have reduced the amount of work that ops spends today trying to understand: what does the application look like, who's talking to whom, and when.
Alok: ...the problem there, instead of setting thresholds; and if I do, how do I analyze it? If we can collapse that and reduce that, we have really done the right service to get the right level of intelligence in observability. So this flow, if you think about it, is what we will demo today, using what you're seeing on the left: essentially, build context, understand the application graph, understand the behavior to surface problems, detect it, and analyze it in context using all the telemetry we have.
Alok: Over to Cesar, because we want to get to the demo, and he'll tell you exactly how we leverage open source monitoring and use open CNCF OpenTelemetry to do this. So...
Cesar: Actually, Alok, if you could go back to that, we'll talk really briefly about those open source platforms that we're leveraging, if you could share.
Cesar: There it is, all right, let's go. Yeah, so again, everybody, my name is Cesar Quintana; I'm a principal solutions architect here at OpsCruise. And yeah, so, to add on to what Alok was mentioning, right: the whole premise of leveraging these open source platforms is that, you know, essentially the whole data collection layer has been commoditized, right. Observability data is now easier than ever to access, thanks to these, you know, powerful open source platforms, particularly around the CNCF, right.
Cesar: So what we set out in mind, right, is to build something and leverage these amazing tools to make everybody's life easier, right. So things like this: this is an example of our architecture, of how we're leveraging all this open source data and all these open source platforms. So, as you'll notice here, if you focus on that Kubernetes cluster square on the right side, what you'll see across the top, in the green, is your workloads, right.
Cesar: You know, pod one, two, three, four; these are essentially your own applications running whatever you're doing, whether you're running an e-commerce site, a financial trading platform, etc. This is what you're running inside your actual workloads. But underneath, in that light and dark blue, are the open source tools that are now so common throughout the IT landscape and in modern application environments, right. So, towards the bottom...
Cesar: ...in the dark blue, you'll see, you know, here in this reference architecture we're showing Jaeger, Prometheus, and Loki. It could be something else; this is just an example. We can leverage logs from other sources like Fluentd; I think somebody asked about Fluentd. It could be Loki, it could be Fluentd. And then we take metrics in from Prometheus, and then for traces we're leveraging Jaeger as a backend for our particular architecture.
Cesar: But we are supporting the OpenTelemetry libraries on the client side. That's one of the really cool things about the new standards: they're now well defined, which means that you could be using a mixture in your environment, of OpenZipkin and Jaeger, or the OpenTelemetry libraries themselves, and still have a unified backend where you're able to collect all that data and leverage it and use it, even though you're technically using disparate libraries throughout your enterprise. All right.
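The mix-and-match point Cesar makes can be sketched concretely: spans arriving in Zipkin-style and Jaeger-style JSON can be normalized into one schema so a single backend can group them by trace. This is a minimal illustration with small, made-up subsets of the two wire formats, not OpsCruise's actual ingestion code.

```python
# Minimal sketch: normalize spans reported by different tracing
# clients (Zipkin-style vs. Jaeger-style JSON) into one unified
# record, so a single backend can store and query them together.
# Only a few representative fields of each format are mapped here.

def normalize_zipkin(span):
    # Zipkin v2 spans use camelCase keys and microsecond timestamps.
    return {
        "trace_id": span["traceId"],
        "span_id": span["id"],
        "operation": span["name"],
        "start_us": span["timestamp"],
        "duration_us": span["duration"],
    }

def normalize_jaeger(span):
    # Jaeger JSON spans name the same concepts differently.
    return {
        "trace_id": span["traceID"],
        "span_id": span["spanID"],
        "operation": span["operationName"],
        "start_us": span["startTime"],
        "duration_us": span["duration"],
    }

zipkin_span = {"traceId": "abc", "id": "1", "name": "GET /checkout",
               "timestamp": 1000, "duration": 250}
jaeger_span = {"traceID": "abc", "spanID": "2",
               "operationName": "GET /checkout",
               "startTime": 1200, "duration": 300}

unified = [normalize_zipkin(zipkin_span), normalize_jaeger(jaeger_span)]
# Both spans now share one schema and can be grouped by trace_id.
by_trace = {}
for s in unified:
    by_trace.setdefault(s["trace_id"], []).append(s)
print(len(by_trace["abc"]))  # prints 2
```

The point is that the "disparate libraries" problem reduces to a field-mapping step once the formats are well defined.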
Cesar: So what you'll see here, how we've architected ourselves to be built, is again around these open source platforms, whether it's Fluentd, whether it's Loki and Prometheus, etc. They serve as, you know, your data collection and your data store. You don't have to go out and pay another vendor, you know, 10 or 15x for storing just metrics, right, when you can store them in your own infrastructure; we're all doing the same thing, right, just putting them inside of a long-term...
Cesar: ...you know, bucket, right, and so now that's under your control. And so we, for example, Promtail, right: if you start looking upward through the stack, in the light blue, Promtail will run as a DaemonSet and collect logs from all your nodes and from all your containers, right. And then you have, on top of that, node exporter, right.
C
Friction
has
an
export
for
prometheus
to
grab
the
metrics
from
from
the
nodes
themselves
and
going
above
that
you'll
see
c
advisor
collecting
data
from
from
the
containers
themselves
running
on
each
node,
and
then
we
also
leverage
ksm
exporter,
pretty
awesome,
grabbing,
kubernetes
object,
status,
data
and
all
those
are
going
to
be
fed
out
into
prometheus
or
to
loki
and
if
you're,
using
traces
again
to
jager
and
really
you
know
now,
even
just
with
that
you've
got
a
pretty
darn
functional,
observability
layer
right
now
you
have
metrics
and
they
have
traces.
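All of the collectors just named (node exporter, cAdvisor, kube-state-metrics) expose samples in the Prometheus text exposition format, which the server then scrapes. A minimal sketch of parsing that format follows; the sample metric lines are invented for illustration.

```python
import re

# Minimal sketch: parse Prometheus text-exposition lines of the form
#   metric_name{label="value",...} 123.4
# into structured (name, labels, value) samples, as a scraper would
# before storing them. Comment (#) and blank lines are skipped.
SAMPLE = """\
node_memory_Active_bytes 1.234e+09
container_cpu_usage_seconds_total{pod="web-1",namespace="shop"} 42.5
kube_pod_status_ready{pod="web-1",condition="true"} 1
"""

LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

def parse(text):
    samples = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # not a sample line
        name, labelblock, value = m.groups()
        labels = dict(LABEL_RE.findall(labelblock or ""))
        samples.append((name, labels, float(value)))
    return samples

for name, labels, value in parse(SAMPLE):
    print(name, labels, value)
```

Because every exporter speaks this one format, a single scrape loop covers node, container, and Kubernetes-object metrics alike.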
Cesar: Now you can go into different places and look at your logs. But what Alok was mentioning earlier is that smart layer, right: now you want to leverage all those pieces of data, bring them in together, and do something really, really powerful with having all that context, all that configuration data that we can grab from the Kubernetes API, and then just bring it all together.
Cesar: On top of that, you have metric data, configuration data, performance data, and event data from your cloud environments, right. More and more applications are hybrid, right; they're using, you know, whether it's VMs and Kubernetes, or serverless and PaaS; you have all these really, really hybrid environments. Again, it's this whole extreme production of data, and having one place, and easy ways to collect it all, is really what these open source platforms have allowed us to do, right.
Cesar: But going back to what I was mentioning about cloud: you also want a place where you can grab your data and bring it in. Talking about, again, serverless, or function as a service; these PaaS layers, which are only constantly growing, right. You know, you have these cloud caches and messaging services, cloud databases, etc.
C
So
you
know
what
obscure
sets
out
to
do
is
not
only
grab
that
open
source
data
in
leverage
collection
platforms,
but
also
bring
in
the
cloud
data
and
and
mess
it
all
together
and
build
something
really
really
rich
and
then
provide
actionable
data
based
on
that.
So
what
I'm
going
to
do
is
I'm
going
to
show
you
a
a
demo
of
obscures?
Oh
sorry,
look
did
you
want
to.
Cesar: Let's address that, yeah. No, so, as mentioned, right, we can take logs basically from, whether it's Loki, Fluent Bit, which is usually the thing, or Fluentd; those are usually the pieces we run into, right. And absolutely, you know, the whole point is to build a modular, flexible platform where you can grab data from, you know, whatever your preferred variant of that is, right. So yeah, absolutely; OpsCruise particularly provides support for Fluentd, Loki, and a few others as well. Yeah.
Alok: The metrics in this approach will still work, of course, with open CNCF tooling. We don't have to do proprietary agents or proprietary instrumentation; we can be sitting outside without being intrusive. So think of it that way: the real intelligence of observability is not how the metric got to us, or what it is. As long as we have coverage, that's the key, and coverage of all of these is needed. You can't just go on metrics and logs and traces independently; it doesn't give you the whole picture, you know.
Cesar: Thanks, thanks for that. All right, so now I'll share. Alok, I think you might have to stop. I can...
Cesar: Okay, awesome. So yeah, so this is our landing page for OpsCruise, and you can see there are quite a few pieces of data here. You know, this screen might look familiar to any of you who have used APM tools before: this is a real-time service topology map.
Cesar: Excuse me; a flow of how your services are interacting with each other, and I'm zooming in more, of course. Now, this is not even...
Cesar: This is nowhere close to some of the busiest environments, but you can see that it does get busy really, really quick, and that's one of the cool things about, you know, having all the configuration data and the really rich data that the underlying tools like cAdvisor collect: we get a lot of really rich object data along with the metrics and logs. So things like being able to understand, you know, the configuration data of these pieces allows us to also extract things like labels and tags.
Cesar: So when you have a busy environment, you might only want to filter, for example, on a particular namespace, right. I might only want to look at, you know, maybe the opscruise namespace, and so that really helps you cut down on some of that noise when you're trying to isolate an issue. But going back to kind of our premise, you know, what we're showing here is a mixture of quite a few different pieces of data; you're seeing the eBPF pieces. Again...
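The namespace filter Cesar applies is, at bottom, just a label selection over the discovered entities; a minimal sketch, with invented entity data:

```python
# Minimal sketch: cut down a busy service-topology view by keeping
# only entities whose Kubernetes namespace label matches a filter,
# the way the demo narrows the map. The entities are invented.
entities = [
    {"name": "frontend",   "labels": {"namespace": "shop"}},
    {"name": "cart-cache", "labels": {"namespace": "shop"}},
    {"name": "collector",  "labels": {"namespace": "opscruise"}},
]

def filter_by_namespace(entities, namespace):
    return [e for e in entities if e["labels"].get("namespace") == namespace]

visible = filter_by_namespace(entities, "shop")
print([e["name"] for e in visible])  # prints ['frontend', 'cart-cache']
```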
Cesar: We talked about cloud, so this demo happens to be running inside of AWS, but, you know, whatever cloud you're running on, you're going to have that PaaS layer, very likely. So being able to collect that data and bring it all together with your Kubernetes environments, you know, all monitored in a single place, is absolutely powerful. So if I click on, for example, that AWS RDS instance, you know, again, we're talking about the metrics.
Cesar: So if you look at this right side, we're collecting all those individual metrics: the read IOPS and the throughput, etc. This is a high-level summary, but what's important is the metrics, right. So I can go in here and look at all the individual metrics; that's one of the pillars of observability, and that's just for one entity. Same thing for a pod, right: this is a pod and a container. So if I click on a pod, same thing: I'm bringing back all this configuration data, all these labels.
Cesar: You know, what time this was created, what host it's running on. It's important to understand all these things, because when you're troubleshooting, you know: well, what time was this pod running? It was supposed to have been restarted five minutes ago; did we actually perform the restart, or was there an issue, you know, doing that rollout of the application? Well, look, it's been running since, you know, a couple...
Cesar: ...that rollout wasn't successful, right. Again, we've got metrics as well, and each entity has its own pieces of data, and it's important to be able to look at that data, again, in context for, you know, whatever problem you're troubleshooting.
Cesar: In this scenario, I clicked on this container; it happens to be the Jaeger agent. I click on this container and I'm getting, you know, additional data that's contextual for that particular container: the ports that are being exposed. But on top of that, you know, being able to see how the infrastructure is working, what things are related to what. So, for example, we have this contextual access to these different pieces, right. So if I click on this three-layer view, right, what it does is...
Cesar: It shows me this particular container and the pod it's running in, some details about it: the IP address, the image name that it's using, as well as some high-level metrics such as CPU and memory. But it also shows me what Kubernetes node this particular container is running on, as well as some of the neighbors, and the CPU and memory metrics for those neighbors, and then what cloud instance this Kubernetes node is running on top of, right.
Cesar: So when you're troubleshooting: I know I have some instances in, let's say you're running AKS, and you have some nodes in one particular subnet, or one particular availability zone, that are having connectivity issues, and you're trying to diagnose. You know, right...
C
You
know
this
little
click
you
can
understand
if
your,
if
your
container
happens
to
be
running
on
one
of
those
notes
and
things
like
the
region
and
how
much
storage
is
attached
to
it,
but
not
only
that
again,
as
we
mentioned,
the
the
the
rigorousness
of
all
this
of
all
this
data
and
the
ease
of
collecting
makes
it
really
really
simple
to
bring
it
all
together
and
now
we
can
look
at
the
infrastructure
map
that
we
call,
which
is
essentially
a
cloud
map
and
now
we're
looking
in
the
context
of
this
particular
cloud
instance
and
we're
looking
at
this
ec2
virtual
machine
and
looking
at
the
configuration
of
that
in
the
text
right
and
I'm
just
kind
of
showing
behind
the
scenes,
the
the
all
the
open
source
data
that
we're
actually
collecting
and
how
even
that
open
source
data
by
itself
makes
really
powerful.
Cesar: But once we combine the intelligence, which I'll talk about in a second, that's where things really start to take off. But as we mentioned, we're collecting data from the Kubernetes API and from the containers, so that's where we're grabbing, you know, the individual container metrics and the node metrics. We also have an understanding, for example, at a per-node view, right. So, instead of looking at it from a kind of application-centered view, I can look at the node level. Let's clear out some of these filters now.
Cesar: So you see we have five nodes running, and now I'm looking at each individual node, and I can see the workloads that are running on top of that node. I can click on metrics and get the metrics for that particular node; it'll load in just a second, but I'll go back, and then we can actually look at the configuration for the particular node itself.
Cesar: Yeah, so again, we're collecting all the configuration and metadata, not only of the containers themselves, but even of the nodes that you're running on; so things like the memory utiliz... sorry, the memory capacity, whether the node is ready. So you'll see here max memory, max storage, what version of Kubernetes they're running.
Cesar: And so, you know, here we see that we're running version 1.17 of the Kubernetes node, which is actually fairly updated, and the kernel version of the operating system that it's running on, et cetera. So we're bringing, again, all this data together, which is really, really empowered by all these open source layer tools. We're not using custom agents, we're not doing anything, you know, special; it's just leveraging all this data, but bringing it all together in a single place.
Cesar: On top of that, you know, I mentioned it's important to cover things like PaaS services and serverless. So, again, we also collect that kind of data. So you'll notice here: you saw an RDS instance, and I think I might have shown a load balancer as well. In this case, in this environment, I have, you know, an API gateway running with an S3 call-out, actually via serverless. So you'll see this API gateway, and again, I'm grabbing the data from that particular API gateway.
Cesar: Just like for the containers, we saw that particular entity's metadata.
Cesar: Now here it is for the API gateway, and some of the metrics as well; and the same thing for the serverless functions, right. I can see the ARN for that particular serverless function, the region, and I can click on metrics to drill down to that. So the whole point is to bring something that's all together. And finally, you know... actually, before I show that: I also did mention traces, and let me actually share this screen, because I think I'm not...
Cesar: I do want to show the traces before jumping on to something else. There we go, so, again: we also have our trace map view that we just recently announced, and so when you're leveraging, as we mentioned, distributed tracing, you know, we can collect all that data, again, in a single space, and now what we're doing is collecting the individual traces, and actually we're doing something pretty cool, which is what we call the trace map, and an identification of these trace paths.
Cesar: Is this... hopefully this is a little bit better?
Cesar: Let me know if there are still visibility issues, but I bumped it up just a little bit. Yeah, so again, I'm just showing off the tracing capabilities, again, just bringing everything all together in a single place. You can see here: this is the trace map, showing the different interactions from the frontend to the ad service to the product catalog service. But one of the really cool things, that is kind of unique, that we've been able to develop...
Cesar: Hopefully, hopefully this is a little bit better. I think I've hit the limit of my zooming-in capabilities; sorry, guys, I always thought it was a little bit bigger. Hopefully this is somewhat readable for you. Okay, so we've got the traces, we've got the trace maps, and... oh, it looks like I'm getting some, too much noise on my machine, so sorry about that; I'm seeing that in the chat. Hopefully I've turned off the notification sounds here.
Cesar: Hopefully that will stop interrupting. Okay, so we've got the trace map view, but we're also discovering what we call the trace paths. So these trace paths are not just... sorry, guys, give me just one second, I'm trying to...
Cesar: My apologies to everyone. Okay, so let me head back here. Okay: you know, we have auto-discovery, essentially, of not only the transactions themselves, which you are used to seeing in distributed tracing platforms, but we are also grabbing the identification of the paths themselves. You might have a transaction...
C
You
know
for
one
of
these
products
that
you
know
might
be
a
slash
checkout,
but
you
might
have
a
different
types
of
checkouts
for
maybe
a
class
right,
maybe
you're
selling
a
class
on
your
email
converse
site
versus
a
product
right.
So,
even
though
you
know
they're,
both
called
checkout
one
might
go
to
ad
service
and
another
one
might
go
to
the
checkout
services
and
product
catalog
service.
Cesar: So even though they're both named the same, we identify those differences between them, and then also perform automated anomaly detection, and profile those transactions separately from each other, right.
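Keying transaction profiles on the whole trace path, rather than on the endpoint name alone, can be sketched like this; the trace data and latencies are invented for illustration.

```python
from statistics import mean

# Minimal sketch: two traces may share an endpoint name ("/checkout")
# yet follow different service paths. Keying profiles on the path,
# not just the name, gives each variant its own baseline.
traces = [
    {"endpoint": "/checkout", "path": ("frontend", "ad-service"), "latency_ms": 40},
    {"endpoint": "/checkout", "path": ("frontend", "ad-service"), "latency_ms": 44},
    {"endpoint": "/checkout", "path": ("frontend", "checkout-service", "catalog"), "latency_ms": 180},
    {"endpoint": "/checkout", "path": ("frontend", "checkout-service", "catalog"), "latency_ms": 176},
]

profiles = {}
for t in traces:
    key = (t["endpoint"], t["path"])   # the trace path, not just the name
    profiles.setdefault(key, []).append(t["latency_ms"])

for (endpoint, path), latencies in profiles.items():
    print(endpoint, "via", " -> ".join(path), "avg", mean(latencies))
```

Profiling the two /checkout variants separately keeps the slow catalog path from masking, or falsely alarming on, the fast ad-service path.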
So that is, you know, some of the tracing. We won't delve too far into this, because I want to show really some of the magic behind what we can do now that we have all that really rich open source data, right. So let me stop sharing and re-share my other screen; just give me a second here.
Cesar: All right, all right. So, you know, some of the things that we can do now that we have all this open source data is that we can now start doing anomaly detection: detecting misconfigurations, misbehaviors.
Cesar: You know, one of the things I actually did not show, if I go back here really quickly, is that we're also collecting configuration data, not only at this kind of high-level metadata kind of view, but we're also showing the entire manifest. So if I click, and I'll just show what I did there: if I click on detailed view for this particular pod, right, now I'm looking at the actual manifest for this particular pod.
Cesar: So I can look at the details of what exactly is going on, without having to go inside the command line and figure out, you know, kubectl get pod -o yaml. This is way simpler, and it also helps keep everything in context and keep you inside of a single place.
Cesar: But now, with all this really rich data, and, you know, the other thing we do is we have what we call curated knowledge, because on top of all this, you do need to understand how these systems interoperate with each other and what kind of dependencies they have on each other. That's why we do build that relationship view, leveraging all the data. That's why we want to know what container is running on what pod...
C
I'm
sorry
on
what
node
and
what
node
is
running
on
top
of
what
piece
of
infrastructure
is
that
we
know
when
a
piece
of
infrastructure
is
down.
We
know
that
it's
affecting
you
know
the
the
container
that's
hosted
on
it
and-
and
you
know,
there's
a
lot
of
nuance
and
variance
to
the
kind
of
problems
that
can
arise.
But
having
again
this
richness
of
this
open
source
data,
it
makes
it
all
possible.
So
I'll
show
a
couple
of
a
couple
of
things
that
we
do
here.
Let
me
find
an
alert.
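The relationship view described here is essentially a hosting graph; a minimal sketch of walking it to list everything a failed piece of infrastructure affects. The topology is invented for illustration.

```python
# Minimal sketch: a relationship view as a parent -> children graph
# (cloud instance -> node -> pod -> container), so that when one
# infrastructure piece fails we can walk down and list every
# workload it affects.
children = {
    "ec2/i-abc123": ["node/worker-1"],
    "node/worker-1": ["pod/web-1", "pod/cart-cache"],
    "pod/web-1": ["container/nginx"],
    "pod/cart-cache": ["container/redis"],
}

def affected_by(entity):
    # Depth-first walk collecting everything hosted on `entity`.
    out = []
    for child in children.get(entity, []):
        out.append(child)
        out.extend(affected_by(child))
    return out

print(affected_by("node/worker-1"))
# prints ['pod/web-1', 'container/nginx', 'pod/cart-cache', 'container/redis']
```

The same walk upward (child to parent) is what lets a container-level symptom be traced back to a node or cloud-instance cause.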
Cesar: I think I was looking at this alert a little bit earlier, so I'll explain a little bit what this is, right. So in this case, we have a deployment problem, right, on our particular web server deployment: we're supposed to have a total of three replicas, and in this case... and, you know what, I'll bump up the text a little bit, because I know that was asked before. So, we're supposed to have a total of three replicas.
Cesar: In this case, we've only got two available replicas, and this has been going on for a little bit. So down here, you know, we provide some details: it's part of the shopping-cart namespace, it's the web server deployment, and here are some, you know, additional key-value pair details. But we'll go to the fun view; I know some of you guys love reading JSON, but I kind of like the UI just a little bit more.
Cesar: So when I click on this analyze view, what it shows us is what we call the contextual RCA, which is our fishbone RCA, right. So in this case, what we're showing is failure categories, across the top and bottom, that are affecting this particular deployment. So again, all this is being collected just through, you know, querying the Kubernetes API, and then the relationship of collecting the events and the containers, and linking all those pieces together.
Cesar: So we have a replica set scaling issue, right: we're having an issue scaling up an additional replica of that particular image, and now we're getting, actually, a back-off restart as well; but this is all really associated with the startup failure, right. And if I click on that, what it's going to tell me is that I have an invalid image name, right: so opscruise is spelled with one i, and it looks here like somebody spelled opscruise with two i's, and so that's a bad image name.
Cesar: You know, it took us all of, what, you know, three, four seconds to figure out that one of our replicas isn't coming up because of a bad image name. So it's those kinds of things, the richness of the data, that allow us to build these really, really quick root cause analysis pieces into something like OpsCruise, right. So, yes, you can do this from the command line.
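Under the hood, this kind of fishbone bucketing can start from the waiting reasons Kubernetes reports in pod status and events; a minimal sketch, with a category mapping that is illustrative rather than OpsCruise's actual taxonomy:

```python
# Minimal sketch: bucket Kubernetes container "waiting" reasons
# (as reported in pod status / events) into the kind of failure
# categories a fishbone RCA groups by. Category names are invented.
CATEGORIES = {
    "InvalidImageName":  "startup failure",
    "ErrImagePull":      "startup failure",
    "ImagePullBackOff":  "startup failure",
    "CrashLoopBackOff":  "restart failure",
    "CreateContainerConfigError": "missing configuration",  # e.g. absent ConfigMap
}

def categorize(waiting_reason):
    return CATEGORIES.get(waiting_reason, "uncategorized")

# e.g. a replica stuck because the image name was misspelled
# (hypothetical status record, mirroring the demo's two-i's typo):
status = {"pod": "web-server-7d9f", "reason": "InvalidImageName",
          "message": 'couldn\'t parse image reference "opscruiise/web:1.0"'}
print(categorize(status["reason"]))  # prints startup failure
```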
Cesar: It's, you know, a little bit more work; it'll probably take anywhere from, I don't know, 30 seconds to a couple of minutes. But, you know, multiply this times a thousand, times 5,000, that can happen in a month.
Cesar: You know, that's a lot of time saved for operations teams, right. And you'll also notice other ones; some of these are more complex, and, you know, these are just building blocks to what I'm going to show you in a sec, of these individual kinds of problem detections and anomaly detections. But you'll notice other categories, so things like a missing ConfigMap, right: if you reference a ConfigMap in your manifest that does not exist...
Cesar: You know, you're going to have a failure of your pods, so we'll highlight those things, or failed volume mounts, or even bad image tags. I think, I think I might actually have a bad image tag in here that I was looking at; it's a very, very similar scenario, but for the cart server. If I click on analyze: yep, you know, same kind of symptoms, no replicas, scaling issues, we're having back-off restarts going into a crash loop; but, you know, in this case we have an invalid image tag.
Cesar: This particular image tag does not exist right now. The other thing that I didn't go too far into, but that really is absolutely key, is machine learning, right. So, for all the individual services that you deploy onto your clusters, what happens is that, with data being collected from cAdvisor and from node exporter and from the discovery pieces, what we do is create a really rich behavior model, right. We detect what is normal behavior for your individual services, right.
Cesar: So, you know, we don't just look at one or two metrics like error rates and response time; actually, each one of the entities that I've shown you has its own behavior model, and there's a bunch of others that I didn't show you as part of this demo, but things like, if you're using databases like MongoDB, or a JVM, or an nginx container, right, and then the generic containers themselves, the nodes themselves.
Cesar: They all have their own behavior models, and we pick up a mixture of a lot of different metrics to understand what is normal behavior; and then, when we find what is abnormal, we have these types of alerts that are prefixed by ML, telling us that there is some sort of ML-detected performance violation, right. So if I click on this, in this scenario, you know, again, I'm going to get some details as to what happened, right.
Cesar: I get, you know... let me zoom in a bit: network transmit bytes increased by 540 percent, and layer-4 bytes for the outbound traffic increased, and inbound transmit bytes decreased, actually. So we don't only detect increases, but also abnormal decreases as well. But just like in the other scenarios...
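Flagging deviations in either direction against a learned baseline can be sketched with a simple z-score test. The baseline values, window, and threshold here are invented, and a production behavior model would be far richer than this:

```python
from statistics import mean, pstdev

# Minimal sketch: flag a metric sample as anomalous when it deviates
# from its learned baseline in EITHER direction, so both a large
# transmit-byte spike and an abnormal drop would trigger.
def percent_change(baseline, current):
    return (current - baseline) / baseline * 100.0

def is_anomalous(history, current, n_sigma=3.0):
    mu, sigma = mean(history), pstdev(history)
    # z-score test: unusually high OR unusually low both count
    return abs(current - mu) > n_sigma * sigma if sigma else current != mu

history = [100.0, 104.0, 98.0, 102.0, 96.0]   # bytes/s baseline window
spike, drop = 640.0, 2.0

print(is_anomalous(history, spike), round(percent_change(100.0, spike)))
print(is_anomalous(history, drop))
```

Both the spike and the drop fall far outside three standard deviations of the baseline, while a sample near the mean would not.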
Cesar: If I click on analyze, I can get a fishbone representation of what exactly is going on with the metrics, and why the ML triggered an anomaly in the first place. And so I'm going to zoom out just one piece. Just like you saw for the, you know, Kubernetes-specific deployment scenarios, now, in this fishbone RCA, we're looking at a container view: this particular cart cache had, you know, some deviation in its metrics. And actually, before...
C
Looking
at
at
this,
I'm
going
to
go
back
just
to
the
to
the
summary
screen
and
show
you
down
here.
If
I
click
more
details,
you
know,
speaking
about
the
ml
and
all
the
metrics
we
take.
These
are
all
the
different
metrics
for
just
the
generic
container
model
that
we're
looking
at
right.
So
again,
it's
not
just
one
or
two
or
three
metrics
we're
looking
at
transmittal
bytes
packets
in
packets
memory
failures,
cpu
utilization.
Cesar: All this data, particularly for the container, is again provided by cAdvisor, an open source tool, right. So again, going back to the analyze button: now we're seeing the actual pieces that actually triggered the ML, and so now you'll notice that the fishbone has changed from our startup failure; now we're showing memory and file system and CPU, and so, right away, we'll show you in red. You don't have to go and look at a chart for this specific thing.
Cesar: It's here, right. So I'm seeing CPU utilization has increased by close to 50 percent. I'm looking at demand, which is incoming requests: the response time has increased by over 1,700 percent. I'm looking at outbound, the supply side: response time has increased by 2,200 percent for outbound requests from cart cache. And then, not only that, our response size has increased from one to close to eight megs, right. And then, bringing in the Kubernetes layer, it's this whole image change, right; so again, bringing in the data from the Kubernetes API...
Cesar: I can see that I've got a recent image change that's likely contributing to this phase. Now, again, I'm glossing over a few details, because, in the interest of time, I want to show you guys how we bring, you know, a couple of these things together. This is, you know, an ML alert, and, by the way, you can see this automatically charts those important metrics down here.
Cesar: So you can see their behavior during the time of the anomaly, and, as mentioned, you know, you can drill down into any logs that might be coming in. Actually, I should probably show that; I don't think I did. So here's another example of an anomaly, a database server, and you'll notice here that you have different contextual access, right: application state. I'll show that, but we have a time-travel capability where, with all this data, the metric data from Prometheus, the log data from Fluentd or from Loki, the trace data...
Cesar: ...all this, you know: we build that real-time map that you guys saw, and all the configuration data; we take snapshots every five minutes, and I'll show you guys that in a second. But you can go back in time to see how your system was configured during the time of this particular anomaly. In this case, this goes back a day.
Cesar: So if I click that, it'll take me back, you know, one day before, and show me the entire config of my entire state at that time; but we'll go into that in a second. I can click on metrics to understand the metrics for that database server, or any events that are related to that. In this case, I want to show logs. So if I just click on the pod, or the container logs... oh, maybe it wasn't logging there, but I do want to show that we have contextual access to the logs.
Cesar: Actually, let me find... I just want to show, because I think we did not actually show logs. Let me see; I think maybe node exporter will be logging. Sorry.
C
Keep going, yeah, absolutely. So the thing I wanted to show is logs, because I absolutely did not show that, even though it's super important. Anything that's logging, we pick that up from your standard out. If you click on anything, whether it's an anomaly or, in this case, just a pod (I have a pod open), you get what we call the quick view. From its quick view,
C
One
of
the
one
of
the
links
you
have
is
for
logs
right,
so
I
can
just
click
on
logs
and
that
takes
me
straight
into
the
logs
for
that
particular
service.
Now
this
is
pretty.
You
know
static
logs
here,
but
I
can
you
know
it
is
searchable
right,
so
I
can
look
for
requests,
for
example,
or
conversion.
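At its simplest, that kind of log search is a case-insensitive substring filter over collected lines; a toy Python stand-in (purely illustrative, not how Loki actually indexes or queries logs):

```python
def search_logs(lines, term):
    """Return the log lines containing `term`, case-insensitively."""
    needle = term.lower()
    return [line for line in lines if needle in line.lower()]

logs = [
    "2023-01-01T00:00:01 GET /cart 200 request served in 12ms",
    "2023-01-01T00:00:02 conversion event recorded for user 42",
    "2023-01-01T00:00:03 GET /health 200",
]
print(search_logs(logs, "request"))     # the first line only
print(search_logs(logs, "conversion"))  # the second line only
```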
C
So
you
know,
depending
where
we
get
it
from
yeah,
correct
yeah.
I
I
think
I
don't
have
a
problem
here
that
has
logs
right
now,
but
but
we
do
surface
that
as
well.
So
if
you're
having
an
anomaly,
you
can
go
straight
into
the
logs
for
when
they're,
active
and
that'll
show
now.
What
I
do
want
to
show
with
all
this
really
put
together
is
I
I
I'm
going
to
show
you
an
alert
right,
so
we
we
have.
I
showed
you
guys.
C
You
know
how
we
collect
all
the
different
data,
the
architecture
right
again
we're
leveraging
just
purely
open
source
tools
here
to
collect
the
data
from
from
you
know
whether
it's
vms
or
whether
it's
kubernetes
etc
from
the
application
level,
mongodb
exporters
or
nginx
exporters,
as
well
as
the
traces,
whatever
open,
telemetry
compatible
library,
is
all
basically
built
on
open
source,
but
now
right.
What
we
have
here
is
again.
C
I
also
show
you
the
anomalies
on
how
like
specific
kubernetes
detection,
and
then
I
showed
you,
the
ml,
how
we,
how
we
automatically
detect
performance,
deviation
and
again
lots
of
different
metrics.
So
in
this
scenario,
we're
kind
of
tying
everything
together
right,
so
I
have
a
response
time
slo
breach
on
the
on
my
nginx
server.
So
I'm
going
to
click
on
that
and
again
here's
some
some
details
right.
I
have
an
slo
of
five
seconds.
C
My
response
time
is
over
15
seconds,
so
I
want
to
see
what's
going
on
right,
if
I
click
on
analyze,
what
this
is
going
to
do,
I'm
going
to
close
we'll
come
back
to
this
summary
in
a
second
I'm
going
to
close
that
piece.
What
this
is
doing
is
now
we're
showing
a
slice
of
the
actual
app
map
that
we
were
looking
at
earlier,
but
now
it's
it's.
It's
focused
on
the
time
frame
and
in
the
context
of
this
particular
anomaly
right,
so
we
so
your
route
is
essentially
here.
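The alert condition itself (a five-second response time SLO against an observed 15+ seconds) reduces to a threshold comparison; a hedged sketch of how such a breach check might look, with invented names:

```python
def evaluate_slo(observed_seconds, slo_seconds):
    """Return (breached, ratio): whether the SLO is breached and how far
    over (or under) the target the observed latency is."""
    return observed_seconds > slo_seconds, observed_seconds / slo_seconds

# Figures from the demo: a 5 s SLO, response time above 15 s.
breached, ratio = evaluate_slo(15.3, 5.0)
print(breached, round(ratio, 2))  # True 3.06
```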
C
You
know
at
nginx
you're,
seeing
a
slowdown,
but
we've
also
identified
what
downstream
services
are
involved
right.
So
we
have
nginx
itself
right.
So
this
is
the
kubernetes
service,
the
part
of
the
container
and
same
thing
service
spot
container
for
web
server
redis
service,
whereas
pod
redis
container
they've
got
a
cart
server,
service,
pod
and
container
you'll
notice.
Immediately
in
the
red
we've
highlighted,
so
we're
doing
fall,
domain
isolation
as
well.
C
Nobody
had
you,
don't
have
to
call
the
nginx
micro
service
team
whoever's
managing
that
you
don't
have
to
call
the
web
server
micro
service
team.
Whoever's.
Managing
that
could
be
a
couple
of
the
same
team
could
be
a
couple
of
different
teams.
You
don't
have
to
reach
out
to
them.
You
don't
have
to
go
inside
your
tools
and
look
at
the
metrics,
for
these
particular
we're,
showing
you
they're
healthy
right.
C
So
what
the
data
has
shown
us
from
the
data
we've
collected
from
these
containers
as
well
from
the
network
data
and
the
configuration
data
and
combined
with
rml
that
intelligent
layer
of
the
operations
is,
we've
highlighted
the
red
pieces
right.
So
our
container
for
redis
is
red.
Our
card
server
service
is
red,
and
so
our
potting
container,
so
we'll
kind
of
take
this
in
the
chain
and
see
what's
going
on.
So
we've
identified,
we
have
an
slo
failure
up
here,
we're
responding,
really
really
slow.
C
Now,
if
I
click
on
the
next
piece
in
the
chain,
I'm
showing
you
know
that
redis
container
is
problematic.
If
I
click
on
that,
what
it's
going
to
do,
it's
going
to
show
us.
This
is
a
separate,
technically
a
separate
anomaly
from
the
nginx
one,
but
the
ml
has
detected
that
this
is
very
much
related
and
you'll
see
a
few
different
failures
right,
so
you'll
see
that
we're
getting
an
increase
in
throttling
on
the
cpu,
the
user.
Second,
solar
spin
on
the
cpu
has
increased
by
about
10
percent,
but
really
interesting.
C
Actually,
here
is
you'll
notice.
The
response
time
normally
as
at
2.94
milliseconds
right
now
we're
at
over
two
seconds.
This
was
automatically
detected
and
then
also
super
important
is
our
error
rate
right?
Basically,
you
usually
have
zero
errors
right
now
our
error
rate
has
jumped
up
sorry
to
36
out
of
every
single.
Basically,
every
single
request
has
essentially
gone
into
an
error
mode,
so
something
is
wrong,
so
we're
gonna,
we're
gonna
go
back
here
and
and
just
see
what
rca
is
pointing
at
so
redis
is
calling
card
server.
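The detection described here, normally 2.94 milliseconds but now over two seconds, is at heart a comparison of current behavior against a learned baseline. A deliberately crude stand-in (a multiple-of-median rule; the actual ML is of course more sophisticated):

```python
from statistics import median

def is_anomalous(history_ms, current_ms, factor=10.0):
    """Flag `current_ms` when it exceeds `factor` times the historical
    median: a crude stand-in for a learned performance baseline."""
    baseline = median(history_ms)
    return current_ms > factor * baseline, baseline

normal = [2.7, 3.1, 2.9, 3.0, 2.94]  # typical Redis response times (ms)
flagged, baseline = is_anomalous(normal, 2100.0)  # now over two seconds
print(flagged, baseline)  # True 2.94
```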
C
Now,
if
I
click
on
cart
server,
I'm
gonna
see
the
alert
very,
very
clear
this
service,
the
cart
server,
doesn't
have
any
pod
to
serve
requests.
That's
a
very,
very
clear
indicator
that
obviously
redis
is
experiencing
a
bunch
of
response
time
failures
and
now
error
rate
failures,
because
there
are
no
requests
behind
this
kubernetes
spot.
There
are
no
pods
behind
this
service
to
serve
any
requests
and
to
look
into
just
a
little
bit
more
detail
if
I
click
on
this
particular
pod.
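"No pods to serve requests" is detectable by matching the service's label selector against ready pods, which is how Kubernetes populates a Service's endpoints. A simplified sketch with plain dicts standing in for API objects:

```python
def pods_backing_service(selector, pods):
    """Return the pods that match the service's selector and are ready,
    i.e. the endpoints the service would actually route to."""
    return [
        p for p in pods
        if p["ready"]
        and all(p["labels"].get(k) == v for k, v in selector.items())
    ]

selector = {"app": "cartserver"}
pods = [
    {"name": "cartserver-7d4f", "labels": {"app": "cartserver"}, "ready": False},
    {"name": "redis-5c9b", "labels": {"app": "redis"}, "ready": True},
]
backing = pods_backing_service(selector, pods)
print(len(backing))  # 0: the alert condition, nothing to serve requests
```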
C
Now
on
the
cart
server
to
see,
if
I
can
clean
what's
going
on,
it
looks
like
I'm
having
to
back
off
restart
and
if
I
look
at
the
details
of
that
alert,
it'll
actually
show
me
a
little
bit
more
detail
again.
These
are
all
separate
but
linked
together
problems.
If
I
click
on
the
analyze
tab.
Now
it's
going
to
show
me
the
real
root
cause
right.
We've
got
an
invalid
image
name
as
we
talked
about
earlier,
so
this
broken
image,
name
with
two
eyes
in
india.
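A pod stuck on a bad image surfaces in its container statuses with waiting reasons such as `InvalidImageName` or `ImagePullBackOff`. Here is a sketch of scanning a parsed pod object (shaped like `kubectl get pod -o json` output) for those reasons; the image name is invented to mimic the demo's double-"i" typo:

```python
IMAGE_FAILURE_REASONS = {"InvalidImageName", "ErrImagePull", "ImagePullBackOff"}

def image_failures(pod):
    """Yield (container, reason, image) for containers stuck on image problems."""
    for cs in pod.get("status", {}).get("containerStatuses", []):
        waiting = cs.get("state", {}).get("waiting")
        if waiting and waiting.get("reason") in IMAGE_FAILURE_REASONS:
            yield cs["name"], waiting["reason"], cs["image"]

pod = {
    "status": {
        "containerStatuses": [{
            "name": "cartserver",
            "image": "cartserviice:v2",  # hypothetical typo: two i's
            "state": {"waiting": {"reason": "InvalidImageName"}},
        }]
    }
}
print(list(image_failures(pod)))
```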
C
This
is
this.
This
is
I'll
zoom
in
a
little
bit,
so
you
can
see
that
isn't
too
much,
but
you
can
see
that
obscures
india
here,
showing
two
eyes
right.
So
we
have
this
invalid
image
name,
and
that's
really,
you
know
the
the
the
root
cause
of
the
issue,
and
it's
and-
and
it's
all
shown
right
here
in
a
matter
of
seconds,
all
right
in
general-
is
experiencing
a
a
response.
Time
slow
down
radius
is
saying
that
we've
got
an
increase
in
response
time
and
alert
and
errors
hard
servers
saying.
C
Well,
I
don't
have
any
pods
to
serve
and
the
pod
itself
is
saying.
Well,
I
can't
start
because
somebody
gave
me
a
bad
image
name
all
this
in
a
matter
of
you
know
about
20
seconds
right.
It
took
me
obviously
close
to
a
minute
and
a
half
to
explain,
but
all
this
understanding,
the
kubernetes
level
up
to
the
application
level
and
how
they
are
affecting
each
other
is
all
powered
by
these
open
source
tools.
C
Plus,
you
know
the
intelligent
layer
on
top,
which
is,
in
my
opinion,
pretty
darn
cool
in
the
interest
of
time.
There
are
things
I
wanted
to
show.
I
didn't
want
to
show
time
travel,
but
I
think
we're
pretty
close
out
of
time.
I
want
to
open
it
up
for
questions
so
with
that
I'll
turn
it
back
to
you
local
time,
yeah.
B
Really, what observability has to do, to have this intelligence, is be able to understand the full context of the application, across everything, across all the dependencies; users should not have to do that. So that's what we need to fill in. And then understand the application profile, the behavior, so they don't have to worry about how to detect problems or set thresholds.
B
We
want
to
take
that
off
the
table
and
then
contextually
analyze
everything,
because
now
we
have
rich
data
in
this
whole,
distributed
systems
that,
whether
it's
infrastructure
or
kubernetes
related
or
down
to
the
application
they
all
think
together.
You
don't
you
don't
need
six
different
folks,
looking
at
traces
logs
events
alerts
to
do
that.
That's
the
role
of
observability
in
this
new
world
and
thanks
to
open,
telemetry
and
open
source
term,
it
is
possible
to
do
that.
So
think
of.
B
The proof point is that you don't need to worry about running multiple siloed tools to really build that intelligence, or to reduce the amount of effort needed. I'll pause there. That was the whole point: take advantage of OpenTelemetry and the open source tooling the CNCF has been fostering. We are firm believers in that, and I hope you can leverage it too.
C
That's a great question; let me see if I have it in this environment here. So for a typical deployment, into maybe an on-prem cluster, we leverage Helm, right? Again, another open source tool. So we leverage Helm; it's these commands, about three or four... well, it's actually five commands.
A
Inside
could
you
bump
that
up
just
a
little
bit
yeah?
I
absolutely.
C
Yeah, I absolutely will; thank you. So essentially, if you have these existing tools already (because, as I mentioned, these are essentially the de facto standard for open source monitoring in all these modern environments, and most of the people we run into already have them), it's actually a little bit simpler. But if you don't have these tools, I think it's this last command that will deploy
C
You
know
all
the
the
the
the
the
open
source
tools,
if
you
don't
have
them
underneath
already,
but
essentially
it's
through
helm
right
these
these
commands.
You
know
these
these
five
commands
and
that
gets
you
from
a
green
field
cluster.
To
up
and
running
literally,
I
mean
copy
and
paste.
I
mean
you're
up
and
running
in
about
three
minutes
four
minutes
and
you
have
the
entire
environment
that
I
showed
the
only
thing
that's
not
available
right
off.
The
bat
is
the
ml,
because
it
you
know
again,
it
takes
a
couple
of
anywhere
from
I've.
C
Seen
I've
seen
ml
alerts
come
back
in
a
couple
of
hours
to
you
know,
24
hours,
this
is
usually
usually
that
sweet
spot,
but
everything
else
that
you
saw
within
you
know
three
to
five
minutes
of
deploying
you're
getting
all
that
data.
So
this
is
how
simple
it
is.
A
Awesome
awesome.
Thank
you
so
much.
The
next
question
I
saw
was:
is
it
free
or
or
kind
of
what
levels?
How
does
it
work?
Is
it
a
sas
or
is
it
something
you
can
host
on
your
own.
B
Interesting. So you're talking about when you don't have... what if you can collect these metrics yourself, you're saying, and push them to us? It's a little trickier with this; it depends on the context. Maybe we'd have to dig into a little more specifics, you know, what data, because remember, in order to understand the application context, we pull everything, being able to see the dependencies. So seeing
B
That
so
it
will
probably
be
specific,
so
we
can
take
this
offline,
and
this
you
know
attendee
has
something
specific
that
we
can
follow
up
and
things.
C
Yeah
yeah,
I
will
mention
you
know
it
it.
You
know
it
depends
on
how
your
edges
it
just
compare
like
if
you
flat
out,
don't
have
like
access
to
to
like
export
metrics
right,
I
mean
again,
you
know,
obscures
itself,
isn't
really
doing
much
on
the
collection
side.
It's
really
around.
C
You
know
having
prometheus
on
the
cluster
and
having
loki
or
fluentd
on
the
cluster
to
collect
that
data.
Really,
if
you
don't
have
access
to
there,
you
know
that
that's
that's
something
to
be
explored,
but
speaking
about
edge
itself.
You
know
we
have
recently
published,
like
a
joint
blog
with
verizon,
where
we're
talking
about
and
again
I
don't
know
what's
going
on
with
my
dnd
button.
I
know
somebody
mentioned
it.
I
don't
know
what's
going
on,
I
do
not
disturb.
I
promise.
I
turned
it
on,
but.
C
You
know
we
when
you're
running
kubernetes
clusters,
for
example
at
the
edge
or
running
workloads
at
the
edge.
It
absolutely
is
a
supported
model
right
again,
that
joint
blog,
I
mentioned
is
is
is
launching
a
kubernetes
cluster
on
aws
wavelength
and
you
know
with
kubernetes
and
observability
built
in
with
you
know,
with
the
op
screws,
and
you
know
it
functions
perfectly
fine.
But
again,
if
you
have
like
a
really
really
locked
down
edge,
and
that
might
be
something
we
can,
you
know
talk
offline
and
feel
free
to
reach
out.
A
Awesome
awesome.
Thank
you
next
question
next
question
is:
is
aws
bottle
rocket,
supported.
A
Yeah
amazon's
operating
system
bottle
rocket.
I
believe
that
it's
it's
kind
of
built
for
containers
and
and
running
things
on
that
front.
My
initial
thought
would
be
yes
because
of
the
interfaces
that
you've
chosen
to
bind
to.
You
know
cni
csi,
all
of
those
things,
but
so
long
as
those
are
supported.
That
should
be
good,
but
not
sure.
If
that
might
might
correlate
to
the
system,
metrics
might
be
the
specific
question.
C
Yeah
correct,
so
it
should
be.
You
know
I
don't
I
don't
remember
if
there
was
actually
somebody
that
is
using
aws
bottle
rocket,
but
again
we
actually
don't
build
in
necessarily
too
much
into
the
into
the
os,
because
as
long
as
they're
running
like
the
minimum
required
kernel
like
on
the
on
the
on
the
nodes
right,
which
is,
I
believe,
kernel
415
of
linux
and
up
yeah,
I
mean
we,
we
shouldn't
have
any
any
issues
supporting
that.
C
If
you
want
to
explore,
I
I
highly
suggest
signing
up
for
the
for
the
free
version
and
it
should
work.
I
don't
see
why
it
wouldn't
so
yeah
that
that's.
A
Awesome, awesome. The next question is: does the SaaS support SSO and SAML login?
C
Yes,
absolutely
yes,
so
I
believe
the
free
version
does
doesn't
have
like
that
quality
of
life.
There's
there's
some
of
those
things
that
you
know
are
like
more
enterprise
features,
but
yes,
absolutely
many
of
our
customers
are
using
like
azure
id
or
they
might
be
using
octa,
etc.
We
absolutely
support
that.
A
Excellent
next
question
is:
I
promise
keep
keep
peppering
you
until
we're
out
of
time
what
resources
does
ops
cruise
require,
I'm
guessing
that
might
be
pertinent
to
like
kubernetes
kubernetes
primitives,
like
node
storage
deployments,
config
maps.
C
Right. Typically, you know, for the actual open source collection tools, each one has its own requirements, but they're really small. I mean, we're talking about hundreds of millicores to run the open source collectors, like cAdvisor and node exporter; those are all really, really lightweight. The only piece that utilizes more resources is really Prometheus, and that just depends on, you know, how many objects you have in the cluster. We typically, for a small-size
C
Cluster
recommend,
maybe
like
in
maybe
like
a
two
two
cpu,
eight
or
twelve
gig
machine,
but
you
know,
as
you
scale
up
and
the
amount
of
objects
that
you
know.
C
I think I've seen (and I might be misremembering, so if you want more details, like really hard numbers, please reach out), I think we've seen close to five thousand containers being monitored by, at that point, maybe a 64-gig, four-CPU node powering Prometheus, and it's not fully used. But sometimes, when you scale out and you get a big spike in objects,
C
That's
really
when
it
uses
that,
but
prometheus
is
really
the
biggest
one
and
you'll
find
this.
You
know
it's
not
an
options,
it's
a
prometheus
piece,
but
other
than
that,
all
those
components
I
mean
you're,
talking
extremely
extremely
small
resource
requirements,
really
negligible
on
your
clusters.
A
Absolutely
absolutely
well
with
that.
Unfortunately,
we
are
at
time.
I
really
do
appreciate
everybody
reaching
out
and
asking
those
questions
like
all
good
things.
Streams
have
to
come
to
an
end,
so
we
are
at
that
point,
but.
A
if there is anyone looking to reach out to either of you, is there a good place to direct those questions?
B
Sure
you
can
reach
us
to
info
obscures.com
to
be
generic
enough,
and
you
can
also
ping
us
on
the
website.
Upscrews.Com
itself.
You
know
we
should
be
easy
to
find
us.
We
also
on
linkedin.
If
you
want
to
look
outside
our
office,
page
love
to
chat
with
you
guys.
B
This
was
interesting
and
I
know
exciting,
given
where
things
are
going
with
open,
telemetry,
open
source.
A
Well,
thank
you
both
so
much.
Thank
you.
Everyone
for
joining
the
latest
episode
of
cloud
native
live.
It
was
great
to
hear
from
eloque
and
caesar
we
really
again
love
the
interaction
and
questions
from
all
of
the
audience
join
us
next
week
to
hear
about
how
we're
going
to
be
building
stability
in
kubernetes
with
andy
suderman
of
pharaoh
ends.
Thank
you
all
for
joining
us
today
and
we
will
see
you
soon
have
a
go.