A
My name is Michael, and I go by dnsmichi, which is a little hard to pronounce in English: it's DNS, M-I-C-H-I. I got that nickname about 10 years ago when I was working in Vienna at the university, and I couldn't get rid of it after a while.
A
But it's not about me today. What should you expect in the next couple of minutes? Some stories from an open source monitoring maintainer, which I was in the past; diving into metrics with Prometheus, Grafana and kube-prometheus; some things about alerts and service level objectives; and then we're diving into chaos with Kubernetes, talking a little bit about DNS again, chaos, tracing, and some ideas and stories around observability and beyond, including OpenTelemetry. And obviously you might find some Lego images between the slides, so you might catch them all. I want to start with some stories on how we approach this.
A
We have Kubernetes up and running. There is the architecture, there are many components, there are many names to understand: nodes, pods, containers, deployments, services, APIs, ports, data sources. At some point my knowledge ends, and the question was: how could I monitor that?
A
What is important to me? Someone said we need monitoring, and I'm saying: okay, maybe availability monitoring or something like that, or some performance and resource monitoring, and we want to identify slow or blocking deployments. Now, the classic host and service model with state-based polling doesn't really apply to microservices, so we might be looking into metrics and logs even more. Do we need to understand all the components which are running to really feel that everything is working? Maybe, maybe not. What are the best practices? It can get overwhelming.
A
So we really need to figure out what is important now and what can be done later on. For the first iteration: Kubernetes has many different data sources where we can use service discovery, and within the CNCF ecosystem and the wider community I found Prometheus. It scrapes metrics endpoints, it has a time series database, you can calculate trends, you have dashboards and service level objectives, and from there it's easy to go to alerts and incidents.
A
A little bit about Prometheus itself: it's a huge picture. The most important thing is that we have the Prometheus server, and it allows us to query Kubernetes by service discovery and to monitor certain things.
A
You can play kind of Lego and take many things on top, but we want to focus really on Prometheus. Also keep in mind that it has a strong, feature-rich query language which allows us to calculate the metrics and the data we want to use. We can present it, we can consume it using an API, and the format is, I wouldn't say self-explanatory, but it's straightforward to really get insight into your monitoring pretty fast.
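To give a feel for it, here is a minimal Prometheus rule file in the standard YAML rule format; the node_cpu_seconds_total metric is a common node exporter metric used here as an assumption, not something taken from the slides:

```yaml
# rules.yml - a minimal recording rule sketch, assuming node_cpu_seconds_total
# is scraped (node exporter). Loaded via rule_files in prometheus.yml.
groups:
  - name: example-recording-rules
    rules:
      # Average CPU utilisation per instance over the last 5 minutes.
      - record: instance:cpu_utilisation:rate5m
        expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))
```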
A
Now, from the UI side, you might know that there are Grafana dashboards, but Prometheus also has a UI, most recently pretty much improved. One of the other things I recently found was Perses, which is in development at the moment at the CNCF: it is dashboards as code and could help automate certain things even more.
A
If you like Kubernetes and Prometheus: the Prometheus Operator is a feature-rich operator, and it also provides kube-prometheus on top, which allows you to deploy Prometheus, the node exporter, the Alertmanager, Grafana, and also certain best practices around alerts and dashboards. This is awesome, because you can immediately see something; it's not just that monitoring is deployed and it's empty.
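For rough orientation, the central piece the operator manages is a Prometheus custom resource. A minimal sketch, assuming the monitoring.coreos.com/v1 API and a pre-existing service account with scrape permissions:

```yaml
# A minimal Prometheus custom resource for the Prometheus Operator.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
  namespace: monitoring
spec:
  replicas: 2
  serviceAccountName: prometheus-k8s   # assumed to exist with scrape RBAC
  # Pick up every ServiceMonitor in every namespace (empty selector = all).
  serviceMonitorSelector: {}
  serviceMonitorNamespaceSelector: {}
  resources:
    requests:
      memory: 400Mi
```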
A
Now, what exactly will we be seeing within the kube-prometheus deployment? Lots of metrics. We do have custom metrics for the node status, for the resource usage, for the deployments, the number of pods, the network, and even more. At some point it has gotten quite overwhelming looking at all the dashboards, but it's great to have them so you can analyze them.
A
The other thing to mention is kube-state-metrics, which is also a project that gets automatically deployed, to figure out the health of the deployments, nodes and pods. Many different things are being abstracted, so you don't need to reinvent the wheel for the monitoring; you can just use that, and it's installed by default.
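To make that concrete, here is a minimal alert rule built on two real kube-state-metrics series (kube_deployment_spec_replicas and kube_deployment_status_replicas_available), written in the plain Prometheus rule file format; the duration and labels are illustrative assumptions:

```yaml
groups:
  - name: kube-state-metrics-examples
    rules:
      # Fire when a deployment has fewer available replicas than desired
      # for 15 minutes. Threshold, severity and wording are assumptions.
      - alert: DeploymentReplicasMismatch
        expr: |
          kube_deployment_spec_replicas
            != kube_deployment_status_replicas_available
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Deployment {{ $labels.deployment }} has missing replicas"
```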
A
Now, when we think of defining service level objectives, or SLOs, and alerts, we need to figure out: well, metrics are nice, but what's next? We want to define the definition of failure, and we want to do something because maybe a threshold has been violated or a rule has been matched, so we want to notify and raise an alert. And who gets notified? Potentially everyone? No.
A
It should be a responsible team and identified personas. Also, it shouldn't just be an alert where you say 'yes, nice', acknowledge it and go away; you want to really act on it and provide documentation for incidents, runbooks to fix problems and analyze things, and maybe even define corrective actions within the incident. The other thing to keep in mind is to iterate on every incident and everything which happens, like reducing the mean time to respond, or to resolve, depending on how you want to abbreviate it.
A
When we think of alerts with Prometheus, there are two parts: Prometheus alert rules, which are sent to the Alertmanager, and the Alertmanager itself, which allows you to do some grouping, some inhibition and some silencing, which means kind of acknowledging things for a given time. Then you can send the alerts to specific endpoints: APIs, classic email, or pagers, or other ways to notify about things.
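Since grouping, inhibition and silencing carry most of the weight here, a minimal Alertmanager configuration sketch may help; route and receivers follow the Alertmanager config schema (matcher-style inhibit rules need a reasonably recent Alertmanager), while the webhook URL and timings are assumptions:

```yaml
# alertmanager.yml - minimal sketch
route:
  # Collapse alerts that share these labels into one notification.
  group_by: ['alertname', 'namespace']
  group_wait: 30s        # wait before sending the first notification
  group_interval: 5m     # wait before sending updates for a group
  repeat_interval: 4h    # re-notify for still-firing alerts
  receiver: team-webhook
receivers:
  - name: team-webhook
    webhook_configs:
      - url: https://chat.example.com/hooks/alerts   # assumed endpoint
inhibit_rules:
  # Mute warnings while a critical alert with the same alertname fires.
  - source_matchers: ['severity="critical"']
    target_matchers: ['severity="warning"']
    equal: ['alertname']
```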
A
Now, for Kubernetes, there are some alerts defined by default by the Prometheus Operator and kube-prometheus. I've also found a website called Awesome Prometheus alerts (awesome-prometheus-alerts.grep.to), which has many, many more best practices, and you can easily copy and paste them. You should inspect what you really need, but it can be helpful to define infrastructure alerts for memory, CPU and even disk pressure, pod error rates, and also reachability, for example.
A
This is a lot to understand, and the way we can integrate it into the Prometheus Operator is through the so-called PrometheusRule custom resource definition, which allows you to wrap the Prometheus alert rule into the Kubernetes YAML format. This can be helpful to deploy everything in one format and not have five different ways to configure things with your Prometheus.
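A sketch of such a PrometheusRule; the 10 MB container memory threshold anticipates the demo later in this talk, while the names, labels and namespace are assumptions:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: dns-demo-rules
  namespace: monitoring
  labels:
    prometheus: k8s        # must match the operator's ruleSelector (assumed)
    role: alert-rules
spec:
  groups:
    - name: dns-demo
      rules:
        - alert: ContainerMemoryLeak
          # Working-set memory above 10 MiB for 2 minutes; the container
          # name is a placeholder for the demo application.
          expr: |
            container_memory_working_set_bytes{container="dns-demo"}
              > 10 * 1024 * 1024
          for: 2m
          labels:
            severity: critical
          annotations:
            summary: "Possible memory leak in {{ $labels.pod }}"
```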
A
Now, for the alert receivers: a chat can be something like 'I want to immediately see something', but oftentimes it's really needed to refine things and think about grouping, so that you don't have 500 alerts flooding your inbox at 3am in the morning, where you have no idea what's going on but still need to fix the problem because the customer is calling.
A
Maybe you want to reuse external ticket and issue systems to have a way to consolidate this for everyone, maybe even mailing lists. The other thing to think about is that you might get tired of too many alerts; it's called alert fatigue, I think. This is something which can lead to burnout after a while.
A
So it's really important to think about what should actually be alerted on, and not just be a notification number in your inbox that says 20K unread or something. And the thing is: how do we get these alerts? By default everything is working, and oftentimes you don't see anything. We can trigger alerts manually by killing a pod or doing something, but manual work is a pain, so there are probably better ways to cause some chaos in the Kubernetes monitoring.
A
You might be using Kube DOOM to kill some pods in a fun way. I'm not sure if that is allowed at work, but it's still an interesting project to try out and see whether the alerts are working or not.
A
I can break DNS and see how everything stops working, but in the end you want to add some sort of automated chaos, on a schedule, on a cycle, in a way that you can use it in a staging environment but also in a production deployment: testing in production. The idea is to really trigger alerts and service level objectives, to see that everything is working as expected, and you can react on that. Now, this brings me to observability and chaos engineering.
A
You can do it the boring way, like the German 'Chaos', or you look into cloud native deployments: chaos frameworks, chaos experiments and so-called instrumentation SDKs. I still like the German 'Chaos', but yeah. The idea is really to use a framework, and one of the frameworks I've found in the CNCF community is Chaos Mesh. There is also LitmusChaos and some other projects around this area.
A
It's easy to fail Kubernetes nodes, pods, whatever, or even hosts in your cloud infrastructure, and you can play around with degrading the network or failing the network, failing HTTPS or making it slower, and playing around with time, which is quite interesting.
A
If you maybe know Nagios from 10 years ago: it cannot reschedule a check when time is not in sync. Or maybe even DNS: when something is not resolving to an IP address, or is giving a wrong response, the application or the deployments might do something weird. And from a user perspective, Chaos Mesh allows you to run experiments the way you want.
A
You define an experiment once, or you can also use it continuously as a schedule, which can be helpful when you say you want to test it every day in the morning at 9am and see how it goes; probably not on the weekend. The first steps in generating some chaos can be to kill some pods, or not to kill them but to fail pods in a more gentle way.
A
The shown screenshot kind of forces the pods to fail, and Kubernetes detects that after a while; there's a CrashLoopBackOff. Okay, this is kind of expected, but I really want to see sort of the real chaos. So I was like: maybe we should dive into more practical examples rather than trying to click in the UI and do something. I was thinking of turning back time a little bit, to when I was a developer and we wrote a distributed monitoring application, and things only went wrong when DNS was failing in the customer environment.
A
It leaked memory and caused some other troubles, and we as developers were not able to reproduce that. It was like: fix this. No idea. Okay, then fixing the code, pretending it worked, releasing it and letting the customer test in production. This kind of turned into an endless burnout cycle, but in the end it was a good experience to learn from and to say: hey, there are certain scenarios I cannot even predict would happen in a production environment. And it got me thinking.
A
Maybe I can write a short application which simulates that behavior: just using a receive buffer which is like one megabyte, doing some DNS resolving, handling all the errors, and not freeing the buffer, so we are leaking memory intentionally. The thing is, I then talked about this at the Everyone Can Contribute cafe meetup, and Nicolas was like: yeah, I have my own chaos, I can scale Kubernetes to zero replicas.
A
I was like: oh, I didn't know that, interesting. That is one way, and it is a cool way to fail everything, and it got me thinking: yeah, how can we do that in a more automated way, only failing certain things in this scenario? So with Chaos Mesh you can automate this kind of DNS failure and say: I want to choose the type DNSChaos for that experiment, and the action should be 'error'.
A
You can also choose 'random', which provides random IP addresses as a response, and see what's happening. The other thing is that you can use selectors for namespaces and labels, of course, and patterns. In this example I'm checking specifically for o11y-dot-whatever and also for containerdays. I was actually looking into a demo some hours ago to really make this fail, and looking at the clock, I think I'm good in time.
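Put together, such an experiment could look roughly like this; DNSChaos and its action, mode, selector and patterns fields are the real Chaos Mesh API, while the namespace, labels and concrete patterns are assumptions standing in for the demo values:

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: DNSChaos
metadata:
  name: dns-error-demo
  namespace: chaos-mesh
spec:
  action: error             # or: random (returns random IP addresses)
  mode: all                 # apply to all matching pods
  selector:
    namespaces:
      - demo                # assumed target namespace
    labelSelectors:
      app: dns-demo         # assumed label on the target pods
  patterns:                 # only fail lookups matching these patterns
    - o11y.example.com
    - "*.containerdays.io"
  duration: 5m
```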
A
I can do the demo; I want to do the steps together with you now. One of the things is, I do have... oops, maybe I'm looking at it.
A
Yeah, and this one is actually resolving things. Then I need to make sure... this is the port forwarding, which is currently not working. Okay, we do have a chaos experiment over here in Chaos Mesh.
A
We do have the Prometheus query for the container memory usage, and this is currently something like 300 kilobytes, 200 kilobytes, something like that. When the experiment is starting, this should actually go up.
A
Let's create one from scratch, and this should be the DNS schedule for ContainerDays. You can also write it the same way as I did now: just create a new schedule, and you can simply upload a YAML definition, for example, and submit it. Everything gets pre-filled by default, it's called 'cd' now, and we force it to run.
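The uploaded YAML definition can look roughly like this: a Schedule wrapping the DNSChaos from before with a cron expression. The Schedule kind with a type-specific sub-spec is the real Chaos Mesh pattern; the names and the cron string are assumptions:

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: Schedule
metadata:
  name: cd-dns-schedule
  namespace: chaos-mesh
spec:
  schedule: "0 9 * * 1-5"   # every weekday at 9am, as suggested earlier
  type: DNSChaos
  historyLimit: 5
  concurrencyPolicy: Forbid
  dnsChaos:                 # the embedded experiment spec from before
    action: error
    mode: all
    selector:
      namespaces: [demo]
      labelSelectors:
        app: dns-demo
    patterns:
      - "*.containerdays.io"
    duration: 5m
```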
A
And if it doesn't work, yeah, then I blame it on the Wi-Fi. No?
A
Okay, we'll be getting back to that at the end of the talk. The thing I wanted to show you is that the container memory usage is going up, and I've defined an alert on a service level objective for 10 megabytes. Everything goes up while the chaos experiment is running, like in the screenshot, and before doing so I had also defined an alert which triggers, and you can see it in the Alertmanager. This is a screenshot from the first of September; I also have one from today.
A
Well, from yesterday. The idea is really to trigger an alert and see that something is failing just because it leaked memory. Now, the thing is: when you're generating lots of alerts and lots of things in your Kubernetes cluster, you might need grouping, you need additional context, and you also need to focus on the dashboards.
A
Of the kube-prometheus provided dashboards there are so many that it might be necessary to reduce the amount of data, correlate certain things, and also provide more context, to really deliver actionable insights into what's going on. Because when everything is burning, you really need focus, and you want to fix things fast enough to gain confidence and really look into things. There are many data points already available: we have metrics, we have the values, and we can create PromQL queries for alerts and service level objectives for ops.
A
It's still valuable to look at the golden signals defined by the Google SREs, like latency, traffic, errors and saturation, to get an immediate insight into whether the deployment, or the current cluster, is healthy.
A
The other thing is: document everything about customizations. Also think of onboarding, because when someone is joining the team and they don't really know about the current state of observability in Kubernetes, it can be really hard to understand it and get into the loop, basically. The goal should always be to immediately see what's important during an incident.
A
When you look into customization for kube-prometheus, you need to learn jsonnet, which can be done. You can develop your own rules and dashboards, you can monitor other namespaces, add applications, remove applications; and if you don't want to use Grafana but something else, for example, you can do that.
A
I've looked into the previous example of the container memory usage: this can be defined by adding a Grafana dashboard, defining the data sources, using the PromQL query, and just having it configured in one format and deployed into the cluster. Another advantage, another confidence thing, is that Prometheus automatically scrapes HTTP endpoints, like the /metrics endpoint: when your application provides a ServiceMonitor custom resource, kube-prometheus picks it up automatically.
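A minimal ServiceMonitor sketch; the kind and its selector/endpoints fields follow the monitoring.coreos.com/v1 API, while the labels, namespace and port name are assumptions about the application's Service:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: dns-demo
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: dns-demo        # must match the application's Service labels
  namespaceSelector:
    matchNames: [demo]
  endpoints:
    - port: metrics        # named port on the Service
      path: /metrics
      interval: 30s
```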
A
You don't need to think: hey, I need to configure a host, a service or the metric. Everything works with auto discovery, which is also a quite nice way to do it. Now, for a moment: we've talked a lot about metrics, but observability goes beyond metrics, and there are certain event types which we certainly see, like logs; we maybe have traces, profiling, error tracking, and even more event types. And then we're thinking: is this observability?
A
Maybe we want to have the look from above and see things which are the unknown unknowns, like DNS leaking some memory, or something else causing a failure which we haven't been aware of yet and never would have expected. For example, for logs: I've read many things, I've evaluated many things myself, and there are many different opinions as well. For a central log management solution with Kubernetes, you really need to figure out what is important to you.
A
Is it helpful to have a retention of logs for one year and pay, I don't know, a million dollars or something? Or is it just needed for live tailing, or for the past day, when the incident happened, to analyze something? There are various options available; this is just a random list of vendors and tools which you can look into. But in the end it's really about what is needed to solve the problems, rather than keeping the logs forever because we have endless storage, which we don't.
A
The other thing, which is in my opinion a little more interesting, is to look into tracing. A trace is like a log, but with spans that have start and end times; there is context, and there is metadata enrichment.
A
Even if I need to learn it, in the sense of adding it to my code as a developer, there are also ways to use auto instrumentation. Tracing provides a view into distributed environments with many different microservices and sources, and into how the client response gets calculated in the backend. So looking into tracing can be really helpful. One of the things which evolved over time, and I will only touch the surface now, is OpenTelemetry, a specification and framework which is vendor-neutral.
A
So everyone is working together on a specification. There is a collector, which can also be run as a sidecar, and the idea is to bring your own backend: if you want to store metrics, use Prometheus; if you want to store traces, use Jaeger tracing. On the other side, the client libraries and the ecosystem are very feature-rich, so there are many libraries already available or in development, and even auto instrumentation is available.
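For the bring-your-own-backend idea, a minimal collector configuration sketch: OTLP in, Prometheus and Jaeger out. The component names follow the collector configuration schema of that time; exact component availability depends on your collector version, and the endpoints are assumptions:

```yaml
receivers:
  otlp:
    protocols:
      grpc:                    # applications send OTLP over gRPC (port 4317)
exporters:
  prometheus:
    endpoint: 0.0.0.0:8889     # scrape target for the Prometheus server
  jaeger:
    endpoint: jaeger-collector:14250   # assumed Jaeger gRPC endpoint
    tls:
      insecure: true
service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheus]
    traces:
      receivers: [otlp]
      exporters: [jaeger]
```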
A
If you want to learn more, I would recommend checking out Dotan's talk and the lightning talks tomorrow; I think he's an expert in that area and can answer many questions. For now, for me, the focus is on traces. In Kubernetes, the components can send traces; there are certain patches upstream already available which you can enable. And the application can get instrumented, with traces being sent to the OpenTelemetry collector and stored in Jaeger, and we can use it to visualize, correlate and alert.
A
Why would I be doing that? One example could be: the client requests some data from an HTTP server; the server, the backend, queries some APIs, collects the data, and creates the client response. The idea is then to instrument nginx, or Apache, as the web server to see the traces. You can use, for example, the OpenTelemetry web server SDK and send traces to Jaeger, which is shown in the screenshot. That got me thinking.
A
How can I add chaos to that? Because I really want to see that the traces are getting longer when something is going wrong. So I thought: well, maybe add some sleep functions in my code, deploy it so that nobody sees it, behind a feature flag. And then I was like: no, that's crazy, I shouldn't be doing that.
A
Better to think of a chaos experiment for HTTP or the network, or to even put a stress test on the node or the pod to see what is going on. One of the things I did in the past week was to use Chaos Mesh to stress test the CPU and memory of a specific pod for this OpenTelemetry-instrumented nginx service, and I could see the trace time was increased by five times or something.
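A sketch of that stress test as a Chaos Mesh StressChaos resource; the stressors fields follow the real chaos-mesh.org/v1alpha1 API, while worker counts, sizes, duration and the selector are assumptions:

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: StressChaos
metadata:
  name: otel-nginx-stress
  namespace: chaos-mesh
spec:
  mode: one                  # pick one matching pod
  selector:
    namespaces: [demo]
    labelSelectors:
      app: otel-nginx        # assumed label of the instrumented service
  stressors:
    cpu:
      workers: 2
      load: 80               # target roughly 80% load per worker
    memory:
      workers: 1
      size: 256MB
  duration: 10m
```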
A
It went from two microseconds... or rather from two milliseconds to nine milliseconds, but it was a noticeable increase. And the idea which I got from that was: hey, we can link metrics to traces. So when something is breaking in the system and we see the graph going wild, maybe it's helpful to see the traces.
A
There was a talk about the future of Jaeger, and aggregated trace metrics were mentioned, so you can create metrics from traces received in OpenTelemetry. We can then again add alerts to the metrics, and add chaos to that, to see what's going on. The other thing is, like I mentioned before, traces for Kubernetes system components, which is now also a thing with Kubernetes. Long story short:
A
There is so much more to talk about, even beyond observability. I just want to pitch some ideas to you about what else is coming, or coming soon, and what is important to maybe have a look at. We're talking a lot about eBPF, which will also allow us to perform observability on the kernel level, or on specific syscall events, using it for observability but also for security and other things. And the thing is: how does it fit?
A
How does it fit with Prometheus metrics? Can we integrate that? Does it complement it? How do eBPF, service level objectives, alerts and chaos engineering work together?
A
Also, Cilium open sourced Tetragon at KubeCon, and I thought: well, I only have like 35 minutes today, but I will be building future talk stories and demos and other things for that. So we need to learn and learn and learn.
A
The other thing I want to bring to your attention is hacking Kubernetes, so the security part: thinking of impersonating the attacker, maybe using chaos engineering principles for pen testing (I'm talking too much, yeah), and maybe having observability for better security in that regard.
A
And even DevSecOps. For your own chaos, you should know the limits of chaos: avoid chaos inception, it doesn't make sense, and you shouldn't be running all experiments all the time, because it can harm existing workflows and teams. Maybe use staging environments to prevent data loss, and also keep in mind that chaos engineering doesn't solve all the reliability issues, but it can help with new perspectives. And I'm keeping it short because I'm losing my voice already: continuous observability.
A
Think of your workflows, CI/CD with merge requests, and use chaos engineering and metrics and observability in there to get feedback fast and avoid the DNS errors I made 10 years ago. Or think of continuous delivery and run chaos experiments in production. And last but not least: bring confidence with chaos. Bring chaos into observability, chaos workflows, alerts; iterate and innovate. The last thing is: maybe combine machine learning and chaos engineering, which would be fun, maybe Skynet in the future or something, but yeah.
A
Many, many ideas, many great things to learn. If you want to learn more, I would recommend checking out the talks tomorrow, and many more, which dive deeper into observability. I have a newsletter, and I also have new GitLab stickers if you want some later on, after the questions. Thanks for your attention.
B
A
One thing you can do: the Alertmanager allows you to define webhooks, for example, or a different transport, and it should be doable if you just configure the external webhook with your system, or something else, and move on from there. But you can ping me on Twitter if you need more information about this. Okay.
A
I'm hoping for the Perses project which I showed, at github.com/perses, where there's work underway. It's Apache 2.0 licensed; it's not AGPL like Grafana is now. So I'm really looking forward to seeing progress, and I think at KubeCon North America in late October there will be more updates. I will be there as well. Follow the project, I would say. Okay.
A
We don't know if that's really dashboarding already in the UI. The Prometheus UI was replaced, I think it's React-based right now, and there is a basic view on metrics and you can query things. I'm not sure how far they will go with dashboarding, or whether they focus on the aforementioned Perses project, but I'm really looking forward to someone building an alternative to the almighty Grafana, to be honest. Okay.
B
A
To be honest, I don't know, but I will look it up. Yeah, it's a great idea. Thanks.
B
A
I'm not sure if this will happen within the Prometheus Operator project. I'm pretty sure that vendors who integrate with Prometheus will provide an observability platform, or a DevOps platform, or whatever they are thinking of. I've seen that the term AIOps is coming back within observability, so that we get assistance with the many metrics, with observability and also with alerts; maybe in the future.
B
A
Probably you shouldn't be running tests while your chaos experiments are running, but on the other hand it would be interesting to see. I would decide whether you want to have all the teams informed and knowing about this, or whether you go the security way of pen testing and say: we are the red team, we don't tell anyone, and we're just deploying our chaos experiment and seeing how things are going. I think damage control, and estimating what could go wrong in production, is really important, so that you don't have like 10,000 users affected by chaos.