From YouTube: SIG Instrumentation 20200528
Description: SIG Instrumentation Meeting May 28th, 2020
A: I think we have one for metrics-server, but we don't have one for, you know, generic stuff or prototyping things, and I thought it might be nice.
B: Do we have precedent for this with API Machinery or something?
A: They're very liberal about creating things like that, I know.
A: The storage migrator stuff was in a SIG repo. Mostly, I think people are trying to get away from putting stuff into the main Kubernetes repo if it's not strictly necessary, and I think that's a desirable pattern — we shouldn't be constrained by kubernetes/kubernetes; we don't have to be.
B: Yeah, no, I agree, especially if this is not something that's intended to ship with everything and everyone — then it doesn't make sense there. I think it would be okay to create a repo. At the very least, we should have a clear purpose for any repo that we create.
B: I think it's okay if we say this is for tooling that we want to prototype and stuff like that, but at the very least it needs to be that, right? Yeah, I think that would be okay. Cool. I do wonder — I think we have repo creation as part of our governance. We do, if I remember correctly, so I think it just needs to —
B: I think there's literally a section in our governance about creating new projects for the SIG, so we do need to follow that, but other than that I don't really see an issue.
A: Awesome. Oh yeah, okay, so the next issue is also mine, which is — Elena assigned a TODO for me, I think. Hold on, let me find it. There's a GitHub issue, but I don't see it on the agenda — actually, it is on the agenda. I'm on my personal computer, because Zoom doesn't work on my other computer, so it's on the previous week's notes.
B: Linked it, I believe: fixing the apiserver_request_total metric before promoting it to stable.
A: Wasn't that the agenda item, or was it — was it stability? Okay, it was stability-to-GA that we wanted to talk about, or there was this other one. I conflated the two, I think.
B: I guess I'm marked for this one, but I think so, then. The next item that we have on the agenda for today is "revisit metric stability KEP graduation criteria" — is that the one you're talking about?
C: I'll add it. Han, let's skip that one until the end.
B: Okay, yeah, so let's go through this one then. Unfortunately, the link and —
B: Yeah, I guess this is actually Han's issue, so this is "moving API server metrics to stable."
A: So yeah, the API server request metrics are arguably the most important ones for cluster administrators to establish SLOs, and there are two metrics specifically which everyone establishes SLOs against: apiserver_request_total and the request duration histogram, which is the new one.
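
(For reference, a rough sketch of the kind of SLI queries built on these two metrics — the metric names are real, but the label choices and percentile target here are illustrative, not the SIG's agreed definitions:)

    # Availability SLI: fraction of requests that did not return a 5xx code.
    sum(rate(apiserver_request_total{code!~"5.."}[5m]))
      / sum(rate(apiserver_request_total[5m]))

    # Latency SLI: p99 request duration for single-resource reads.
    histogram_quantile(0.99, sum by (le) (
      rate(apiserver_request_duration_seconds_bucket{verb="GET"}[5m])))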
A: And there are perpetual issues with these metrics — cardinality issues — and it's also really easy for people to want to just add things to them, because everything hinges off of a request, right? So this thing has like 11 dimensions or something, and a lot of them don't make sense. We've had cardinality issues, we just recently had the security issue with it, and before we turn it stable, I think maybe we should strip it down to its bare essence — what is required for the SLO — so that, one, when we reduce it —
A: — we reduce the probability that it will become a security issue and we have to do something to it. And two, I have this link here where we, as the open source community, have defined SLO —
A: — like SLO-ish type things for Kubernetes, in terms of, like — are you talking —
A: So we actually have no way right now to surface whether or not we're even meeting these things. And I think if we go down this route and make the effort to turn these into true canonical Kubernetes SLO/SLI metrics, then, one, we should reduce the clutter, and two, we should be able to actually guarantee the SLOs that we're saying we will meet.
B: I mean, I think the latter part is SIG Scalability's responsibility, but I agree. The fact that these should be stable metrics is basically already proven by the fact that every time we mess with them, we first check with SIG Scalability, right? Because they will immediately feel the impact the moment we merge it. So I agree with that. There's one thing that I wanted to call out.
B: I don't know if we have Matthias on the call, but exactly based on these SLOs, Matthias has already created a couple of Prometheus rules that follow the Google SRE books' best practices on multi-window, multi-burn-rate error alerts and stuff like that. So I encourage folks to check those out. Can you —
B: That's actually part of one of our SIG projects, which is the kubernetes-mixin.
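
(A minimal sketch of what such a multi-window burn-rate rule looks like; the 99% availability target, thresholds, and windows follow the SRE Workbook's fast-burn example and are illustrative, not the mixin's actual values:)

    groups:
    - name: apiserver-slo
      rules:
      - alert: KubeAPIErrorBudgetBurn
        # Fires when the error rate burns a 99% SLO's budget 14.4x faster
        # than sustainable, confirmed over a long and a short window.
        expr: |
          (
            sum(rate(apiserver_request_total{code=~"5.."}[1h]))
              / sum(rate(apiserver_request_total[1h])) > (14.4 * 0.01)
          )
          and
          (
            sum(rate(apiserver_request_total{code=~"5.."}[5m]))
              / sum(rate(apiserver_request_total[5m])) > (14.4 * 0.01)
          )
        for: 2m
        labels:
          severity: critical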
B: Yeah, thank you, but yeah, I totally agree; we should definitely do this. One potentially controversial thing that I'm going to throw into the room: what if we got rid of the counter metric entirely and only used histograms? Because at the end of the day, they're actually duplicates — histograms already count all the requests, right, and you're going to need the histograms anyway. But you only really care about histograms for succeeding requests — or we would strip histograms to only report succeeding requests.
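
(The duplication is visible in PromQL — a histogram's implicit _count series is itself a request counter. One caveat, and part of the tension above: the label sets differ in practice; the response code lives on the counter and, historically, not on the duration histogram:)

    # Total request rate derived from the histogram's _count series...
    sum by (verb) (rate(apiserver_request_duration_seconds_count[5m]))

    # ...duplicates the dedicated counter, modulo differing label sets:
    sum by (verb) (rate(apiserver_request_total[5m]))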
B: Content type — I guess, yeah, what I'm trying to say is — or you already included that in here — that we should review all of those metrics. I was kind of focused on the request total metric, but yeah.
B: I do wonder — it does still account for the cumulative latency, right? So I'm not sure I would actually exclude it, but I would definitely make sure that we have appropriate metrics that show the webhook latency. And for kind of our control group — SIG Scalability obviously can't have non-standard, or I guess they shouldn't have any webhooks at all.
A: Yes, I think we should definitely have both, because you want the cumulative latency as well as the webhook latencies. And we need to do our best to reduce the cardinality, because if we introduce multiple metrics for these, then obviously you're just doubling it. It's really not that bad, though, considering the number of dimensions that we have — we can easily double these —
A: — if we get rid of a few things, right. But moving forward, we need a set of prescriptions for what to tell people to do if they want to add something that hinges off of this, right? People are just throwing stuff into this one because there is no set of prescriptions.
B: I'm not sure I'm following — are you referring to what you mentioned earlier, that people tend to want to throw anything that refers to a request into this metric?
A: Yes. So if we have guidance that says: if you have core request-related metrics, this is how you should do it — you should create your own metric with your own dimensions, and here is a Prometheus query that you can use to join the two metrics. That way, your metric —
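
(A sketch of what such a join can look like. kubernetes_build_info is a real info-style metric whose value is 1; using it to attach a git_version dimension to the request rate, rather than adding that label to apiserver_request_total itself, is the illustrative part:)

    # Pull an extra dimension in from a separate info-style metric
    # instead of baking it into apiserver_request_total:
    sum by (verb, instance) (rate(apiserver_request_total[5m]))
      * on (instance) group_left (git_version)
        kubernetes_build_info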
F: — point, because in the kubernetes-mixin, what I try to do is actually the count-based availability measurement, where we essentially sum up our requests by verb and count the codes over 28 days, and that basically blows up on almost every other deployment out there, because the cardinality is as high as you just said. So that would be awesome.
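
(The count-based measurement described here, roughly — a sketch; the 28-day window is from the discussion, the exact label matching is illustrative:)

    # Count-based availability per verb over 28 days:
    # the share of requests that did not return a 5xx code.
    sum by (verb) (increase(apiserver_request_total{code!~"5.."}[28d]))
      / sum by (verb) (increase(apiserver_request_total[28d]))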
B: Yeah, I think it's kind of our failure to educate the rest of the Kubernetes engineers on some of these topics. Some people are just new to this world, right, and that's fair and that's fine. We just need to — I guess we need to educate folks a bit more, yeah.
G: So my point is: should we promote — if I think about how I would understand the need for the user client label — there are some cases where people need those things, to have an aggregate or to be able to search which clients or which versions are in use, and I think this is more for debugging, more like structured logging, which gives you debug information.
G: What happened in my system, what events are in my system, what user clients spoke to my system — logs are about that, while metrics are more about defining state. So we should think about this. Some time ago I discussed with David making it easier for a Kubernetes developer to decide if they should write a log or a metric, and I think this falls under our responsibility.
G: We should, yeah, think about writing a big guidance document, because I'm currently writing the migration instructions for structured logging.
G: I looked into the current developer documentation, and I can quickly say that it's pretty lacking — there's mostly no info; metrics are described, but the logging docs are much, much smaller. I think we should think about building a vision for instrumenting Kubernetes: what should be a log, what should be a metric, and where people should look for it when they want to solve some problem —
A: — examples of things that should be in logs, right, because you do want these things, and you can break out log-based metrics or whatever with the right backend and do your counts that way. But there's stuff like that, and there are also things that you might want metrics on which are bounded, right? Like, for instance, we have this dry-run dimension, which is kind of an odd thing to be in an SLI metric, right?
B: Agreed. I think we have a couple more topics on today's agenda, if I'm not completely mistaken, so I think maybe we can each give this a little bit of thought and try to comment on Han's issue here — I would love to move this conversation forward.
B: Sorry, I guess I should have continued to share my screen. We have one more topic, which is housekeeping: wrapping up the metrics overhaul. Ken?
C: Yeah, I just added that to the agenda when I was looking through my GitHub notifications. We've gotten some pings from the enhancements team in terms of what's going on with this in 1.19, whether this is done, and I just wanted to check in with folks on where we are with that.
B: Sorry — I think you recently got the promtool linting stuff to work, right? So I think that's kind of a huge achievement. I think that was a pretty massive rabbit hole. I don't know if everybody followed this, but essentially he had to go up to the Prometheus project and extract things into a library and everything — it was pretty massive, but we finally made it, so yeah, huge props for that. That's awesome.
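
(For context, the linting referred to here can be run against any component's /metrics endpoint; the endpoint address below is a placeholder:)

    # Lint exposed metrics against Prometheus naming best practices;
    # promtool reads the metrics text format from stdin.
    curl -s http://localhost:8080/metrics | promtool check metrics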
B: Thank you, thank you, yeah. So I think we're one step further, but maybe I can share my screen one more time. We do have six minutes, and we can have a look at what else we had —
A: — left. Oh yeah, by the way, I have volunteered RainbowMango for maintainer at KubeCon China — awesome. I think that makes a lot of sense, given their contributions, which we really appreciate.
B: Does anybody know what we still have to do for this one?
H: I think we should add — attach the plan.
C: Down there, there are some comments from the folks managing enhancements.
A: Okay, yeah — if it doesn't break, then I think that what we have is okay.
C: I mean, this KEP is so old, and the scope of this KEP was not really something that a formal test plan might necessarily apply to. So I think we just need to go and do the paperwork and the housekeeping to say: look, this has all already been implemented, it's all working, it didn't break anything; we don't really have a formal test plan, because this was mostly metrics usability stuff. What else do we need signed off on?
B: I mean, we have tests for everything that we implemented, so I don't know exactly what they would be asking for. So, are there any volunteers to get the paperwork done with SIG Testing, then?
C: — following up on this one: RainbowMango, you've been running with this, so if you want to pull me in — if there's anything that you have on your agenda here to do for these folks, let me know, but otherwise I can just go back to them and say: I don't think we need this.
A: It has had regressions — they've had to change stuff — but yeah, I think they've made the changes, and everything works now.
B: Yes, we broke metrics, so they had to adapt things, so in that sense you're absolutely right. But to me it sounds like we're actually now at this step, so I guess we're literally at the point where we should be rediscussing the API server metrics, and that's even outside of the scope of this KEP. So once we get the okay from the other SIGs, I feel like we're done with this.
B: I guess we should not just talk to SIG Testing, but also cycle back one more time with — I think it's SIG Release, or whoever commented on this — just to kind of close the loop with everyone. But yeah, that would conclude what may have been the biggest work item we've ever done as this group, which I think everybody can be a bit proud of. Yay.
A: That's awesome, yeah. Oh, one other thing: we have a lot of things in the pipeline, and Marek just started the logs directory in component-base. So I was wondering if we wanted a top-level directory, mostly because — right now we're flat at the component-base directory level, so we have metrics and we have logs, but if we have an instrumentation directory, we basically don't have that problem anymore. And we're going to have tracing stuff, right — that's for sure, yeah.
B: So I guess, if I'm hearing this correctly, you're suggesting we create an instrumentation directory at the top level and then the three — and then —
G: So, to give context: component-base was proposed in a KEP as a way to refactor all the components, so it's managed by a working group, WG Component Standard, and I would leave the structure of the repository up to them, because they own it. There are lots of SIGs that will be owning different subdirectories, but currently it's up to them to agree, and I will just check, yeah.
A: For sure, we should definitely talk to — like, you know, Stefan — Stefan and Michael, I think, are — yeah, Michael.
B
Yeah
are
the
two
two
main
component
people
I
mean.
We
obviously
want
to
check
back
with
them,
but
it
doesn't
from
the
sick,
instrumentation
side.
I
don't
think
it's
controversial.