From YouTube: SIG Instrumentation 20200611
Description
SIG Instrumentation Meeting June 11th, 2020
A
Hey everyone, today is June 11th. Welcome to the SIG Instrumentation bi-weekly meeting. It looks like we have four items on the agenda. Frederick, do you want to take the lead on this?
B
Yeah, so I think this is something that has actually been in our backlog for quite a long time and we just hadn't had a chance to discuss it. The summary of the issue is that, currently, the paths of logs on disk only contain the namespace and pod name, plus the pod UID, but not the namespace UID.
B
If I remember correctly, the problem here was that, just like with pod names (which is why the pod UID is in the path), it's potentially possible to have clashes with namespace names as well. It's unlikely, but it can happen, and the proposal here is simply to add the namespace UID as well.
B
If we change this, my understanding is that we cannot break the existing paths; we would have to make the change additive and then potentially deprecate the current paths in the future. But I think David would be better placed to answer that.
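For context, here is a rough sketch of the on-disk layout being discussed. Both the "current" format and the additive variant are written from memory and should be treated as illustrative assumptions, not as the agreed proposal.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// Roughly the per-pod log directory kubelet uses today: namespace, pod name
// and pod UID are encoded in the path, but the namespace UID is not.
func currentPodLogDir(namespace, podName, podUID string) string {
	return filepath.Join("/var/log/pods",
		fmt.Sprintf("%s_%s_%s", namespace, podName, podUID))
}

// Hypothetical additive variant: a second, namespace-UID-qualified directory
// that could exist alongside the current one until the old layout is deprecated.
func proposedPodLogDir(namespace, namespaceUID, podName, podUID string) string {
	return filepath.Join("/var/log/pods",
		fmt.Sprintf("%s_%s_%s_%s", namespace, namespaceUID, podName, podUID))
}

func main() {
	fmt.Println(currentPodLogDir("default", "nginx", "pod-uid-1234"))
	fmt.Println(proposedPodLogDir("default", "ns-uid-5678", "nginx", "pod-uid-1234"))
}
```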
A
Okay, the next thing is the API call latency SLO. We were actually just talking about something related to this. Currently, SIG Scalability defines a bunch of SLOs on the API server, and...
A
Unfortunately, we can't actually guarantee any of these things, because it is possible for people to configure arbitrary webhooks. What would allow us to actually make guaranteeable claims is, for instance, having something like an internal API server request processing time.
B
Well, it's used for the scalability tests, yeah. So to me it sounds like, if you have webhooks that may interfere with this metric, then you have to adjust that SLO accordingly when you offer Kubernetes as a service or whatever; you're still actually interested in the entire time it takes for you to serve your users, right?
A
Yes, but if you look at the top level of that directory, you will see an slos.md, and these are not gating releases; these are user-facing SLOs.
B
Exactly, and I think that's kind of the problem and that's something we need to fix, but I think segregating the metrics is a possible solution. I personally don't think it's the solution we should go with. I think we should just change the wording there and say: this is our guarantee if you do not have webhooks; you need to think about what overhead a webhook introduces when you guarantee something to your users.
A
I mean, currently it's impossible to know how long an internal request takes in the API server, right? We have webhook latency metrics and we have request metrics, but they're not really joinable, right? You can't really...
B
But if webhooks are configured, users are still going to experience what they experience, right? And that is still correct to alert on. But, like, it is...
B
I do agree with you that we need a mechanism to say: wait a minute, yes, the request did take this long, but it was because of the webhooks and not because of something in the API server. That's what we're trying to answer, right? Typically the way I've seen this done is with histograms for the outgoing requests that go to the webhooks, and that is totally something you could use to inhibit API server alerts, for example, if that particular alert is firing. Does that make sense? I don't know if other monitoring systems have this kind of capability, but in Prometheus, for example, you would say: okay, there's an alert for webhook responses taking long; if that's firing, do not actually notify me about API server latency being high, because I know it's going to be high because of the webhooks.
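The inhibition being described could look roughly like the following Alertmanager rule. The alert names (WebhookLatencyHigh, APIServerLatencyHigh) and the label used for matching are assumptions for illustration, not alerts from any agreed-upon rule set.

```yaml
# Hypothetical Alertmanager inhibit rule: while the webhook-latency alert is
# firing for a cluster, suppress the API server latency alert for that cluster.
inhibit_rules:
  - source_match:
      alertname: WebhookLatencyHigh
    target_match:
      alertname: APIServerLatencyHigh
    equal: ['cluster']
```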
A
Because, yeah, they add up; they're killers of latency, right? You can literally have hundreds of webhooks, so you can actually be in a situation where no individual webhook is alerting, because the latency is actually quite reasonable for that webhook, but the aggregate total of your webhook processing time is extraordinarily high.
B
Like, say you have an aggregated API; we also track the entire latency there.
B
Yeah, it's entirely independent, yeah, a different API server, yeah. So you could argue that that's equally as much of a problem as webhooks, which I guess I agree with.
A
But we have "component" in our request metrics, and that should tell you whether or not the endpoint is aggregated, so it's actually not the same, because we can actually distinguish the two right now. Okay.
A
I mean, we can do the internal thing, but okay: if there's an alternative to doing internal processing time, I'm receptive to it. I'm just not sure, or at least I'm not understanding, what alternative you're suggesting.
B
I mean, as I said, I actually don't think there's anything wrong with this. I just think that we cannot promise it in the way that it is stated there right now. We're coming from the same direction, right? But I'm saying: because people can modify, can customize, their Kubernetes in various ways with webhooks, that does make it a custom experience, and they need to figure out what the SLOs for that are. For a standard Kubernetes without any webhooks, this is what we can guarantee, and everything else you're going to need to figure out yourselves.
B
I mean, okay, so you brought up the example that you wouldn't know which webhook is the origin of the problem, right? Potentially.
B
Yes. So let's say we did have the metric that captures the latency without webhooks; that's kind of what I understood you were proposing with internal processing time.
B
What would the next step be for actually investigating this?
B
You would still need to figure out which webhook, or which combination of webhooks, is actually causing this, right? Ultimately you're still trying to resolve the problem, no?
B
Yeah, I mean, if I'm understanding you correctly, we kind of run into the same problem as we do with CRDs or something, where we can essentially now have an arbitrary amount of metrics added, because it's dynamic.
A
Yeah, you know that it's in some combination of webhooks; you know that the total, cumulative webhook processing time for this specific endpoint was greater than what you expected. You don't know...
B
I'm still not totally convinced that that is necessarily the better thing to do, rather than a per-webhook histogram, or even a summary.
C
I think each component should have its own SLOs, or have its own signals, as to whether that component is performing correctly. So, in my mind, yes, you should add metrics to your webhooks so that you know how they're performing, but I do think it's also quite useful to have a signal just to see if the API server is doing its job correctly.
A
Yeah, I mean, it's kind of hard to tell right now, is the thing. It's actually basically impossible to tell whether it is a conjunction of webhooks or whether it's internal to the API server, and it takes a lot of digging and grepping through logs to figure out what something looked like during one request path. I mean... yeah.
A
Definitely, yeah. I just wanted to put it out there; it's not something I have an answer for, I was just curious what people thought about it. I will continue to think about it, and if anybody has additional thoughts on it...
B
I definitely think the intention is very good. I think there are a couple of solutions that we could go with.
A
We still have another thing on the agenda, so let's skip the run through the PR backlog this week. And Alexander, do you want to...
D
For some reason I can't share my screen. It says the host has disabled it.
D
Or I can just forward it... well, okay.
D
Yes, okay, all right, thank you. So, thank you so much for giving us an opportunity to come and talk to you about static analysis and dynamic analysis.
D
My name is Alex Czernikovsky, and here with me today I have Patrick. We're both part of the GTE security team, and we've been working on trying to solve the problem of reducing the possibility of leaking potentially sensitive information, like credentials, to logs. In the next 15 minutes...
D
So here's our agenda. As I mentioned, the objective today is to address some very good questions that were raised as part of the KEP. One of the questions was, concretely: do we need to actually extend klog, which is a core component of Kubernetes that a lot of other components in the ecosystem depend on?
D
Why can we not just use static analysis, which does the scanning completely outside of runtime, and get similar results? The TL;DR here is that we believe we should do both. As a matter of fact, in this KEP we decided to purposely focus only on the dynamic analysis, just the klog extension, and our plan is to follow up with another KEP that will propose adding a pre-submit check to Kubernetes...
D
...that does the static analysis. But, as you will see, and hopefully we will be able to explain this, for the best results we should focus on both types of analysis, dynamic and static.
D
So, excuse me, I will explain how we do static analysis. We'll also do a short demo of the tool that Patrick and I have been working on, which we've actually been running against Kubernetes, and we've actually found some leaks already.
D
So, as I mentioned, static analysis is something that we consciously decided not to include in this KEP, just to make sure that we have an actionable outcome from it: we want to add the possibility to inject additional processing into klog, and this processing will take care of detecting potential leaks. It is our intention to follow up with another KEP for the static analysis.
D
So, taint propagation analysis is based on the idea of identifying sensitive pieces of information as they enter the program, or a function, and tracing their potential flows through the program. In particular, we're interested in the events where this potentially sensitive data exits the boundary of the program. Such a boundary could be a log, for example, or it could be a database, or an RPC call, or an outbound HTTP call. In the parlance of taint propagation analysis, the sensitive piece of data coming in is called a source, and the moment when this data leaves the boundary of the program is called a sink. The idea of taint propagation analysis is very simple: we build the graph of the program, we first find the inputs, which become the roots of the graph, and we trace all the execution paths until we find a sink. We also have the concept of a sanitizer.
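To make the source, sink and sanitizer vocabulary concrete, here is a minimal sketch in Go. The names used (the redact helper, the klog calls, a Secret as the sensitive type) are illustrative assumptions, not code from the analyzer or from Kubernetes.

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/klog/v2"
)

// redact stands in for a sanitizer: whatever it returns is considered safe,
// so a flow of source -> sanitizer -> sink raises no finding.
func redact(s *v1.Secret) string {
	return fmt.Sprintf("secret %s/%s (%d keys)", s.Namespace, s.Name, len(s.Data))
}

func handle(s *v1.Secret) {
	// s is a source: a value that may hold credentials.

	// A sink (the log call) reached directly by the source:
	// this is the kind of flow the analyzer would flag.
	klog.Infof("got secret: %v", s)

	// The same sink reached only through the sanitizer: a normal flow.
	klog.Infof("got %s", redact(s))
}

func main() {
	handle(&v1.Secret{
		ObjectMeta: metav1.ObjectMeta{Namespace: "default", Name: "db-creds"},
		Data:       map[string][]byte{"password": []byte("hunter2")},
	})
}
```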
D
Basically, if the data passes through a sanitizer on its way to a sink, that is considered a normal code flow and we do not raise any concerns about it. So, as you're probably thinking about this, there are two parts to it. One part is the analyzer itself, the graph processing and so on, but you're probably already thinking: yes, this requires a lot of domain knowledge. For example, I could be the best static analysis developer in the world, but unless I knew which types in Kubernetes may contain sensitive information, my analyzer would not find anything.
D
In
other
words,
there
is
certainly
significant
input
that
is
required
from
the
community
to
describe
the
inputs
that
may
contain
potentially
sensitive
information,
and
the
same
goes
for
the
sanitizers,
and
the
same
goes
for
the
log
which,
which
pretty
much
kind
of
explains
the
main
architectural
characteristic
of
the
of
the
analyzer,
that
it
is
very
config
driven.
So
it
receives
a
config
that
defines
all
the
sources
all
the
sanitaries
on
the
logs
and
then
when
it
analyzes
the
graph,
it
makes
the
recommendations
about
potentially
risky
risk
operations.
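Purely as an illustration of what a config-driven setup of this kind could look like, here is a sketch; the field names and the match-by-qualified-name scheme are made up for this example and are not the analyzer's actual configuration format.

```go
package main

import "fmt"

// Hypothetical analyzer configuration: the domain knowledge (which types are
// sensitive, which functions sanitize, which calls are sinks) lives in data,
// so the community can extend it without touching the analyzer itself.
type Config struct {
	Sources    []string // type names whose values are considered sensitive
	Sanitizers []string // functions whose results are considered safe
	Sinks      []string // functions that tainted data must not reach
}

func main() {
	cfg := Config{
		Sources:    []string{"k8s.io/api/core/v1.Secret"},
		Sanitizers: []string{"redact"},
		Sinks:      []string{"k8s.io/klog/v2.Infof", "k8s.io/klog/v2.Errorf"},
	}
	fmt.Printf("%+v\n", cfg)
}
```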
D
So
there
is
another
aspect
of
of
the
there's
another
interesting
complexity
about
this
about.
This
analysis
is
something
we
call
propagators
so
like
pretty
much
the
same
picture
here
you
see
the
input,
but
then
we
took
the
input
and
we
converted
to
the
string.
So
basically,
let's
say
imagine
that
the
input
is
the
kubernetes
secret
and
we
call
the
string
method
on
it
and
now
the
variable
s
is
a
string
that
contains
potentially
contents
contains
credentials.
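A minimal sketch of that propagation case, assuming for illustration that the Secret is stringified with fmt.Sprintf before it reaches the log call; the point is that the analyzer has to carry the taint through the conversion.

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/klog/v2"
)

func logRequest(secret *v1.Secret) {
	// fmt.Sprintf acts as a propagator: the taint on `secret` (a source)
	// flows into the new string value s.
	s := fmt.Sprintf("%v", secret)

	// The sink no longer sees the Secret type directly, only a plain string,
	// yet it may still contain credentials; the analyzer must follow the
	// taint through the conversion to flag this.
	klog.Info(s)
}

func main() {
	logRequest(&v1.Secret{Data: map[string][]byte{"token": []byte("s3cr3t")}})
}
```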
D
So I think this is pretty much all the theory we need to understand how the static analysis works. With that, I will pass it over to Patrick, and he will do a short demo of the tool we've been working on.
E
I would also like to share my screen.
E
Here we go. So, share screen... here we go. All right, cool. So, yeah, pretty much exactly like Alex was saying, we did end up finding one live, in-the-wild example. Here we are in the kubelet token manager, and I primarily think of these static analysis tools as things that help carry developer mental load. Here we have just a token request refresh, and there's a very simple case: if the requested expiry is nil, that's an invalid request, so we're just going to throw it into a log, saying hey, this request wasn't valid, figure out what's going wrong. But in this particular case, the token request is defined over here in the top right as containing this status, which is optional, and the request itself might contain the token. So this is one of those instances where, yeah, this is maybe something where we should have explicitly removed the token; we don't know if it's there or not.
E
This may never have actually become an issue, but back at the end of April we ended up rewriting this to explicitly zero out that token before it was sent to the logs. This is the sort of thing static analysis is great at catching. So here we can use our vet tool and see that it's telling us that right here we've got this issue. And the big one, like Alex was saying, is catching propagation.
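Roughly the shape of the fix being described, as a paraphrased sketch with hypothetical names rather than the actual kubelet change:

```go
package example

import (
	authenticationv1 "k8s.io/api/authentication/v1"
	"k8s.io/klog/v2"
)

// Before (sketch): logging the whole TokenRequest may leak Status.Token
// if it happens to be populated.
//
//	klog.Errorf("expiration seconds was nil for token request: %v", tr)
//
// After (sketch): blank out the sensitive field on a copy before logging.
func logInvalidTokenRequest(tr *authenticationv1.TokenRequest) {
	redacted := tr.DeepCopy()
	redacted.Status.Token = "" // never let the token itself reach the log
	klog.Errorf("expiration seconds was nil for token request: %v", redacted)
}
```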
E
So here, if we were to specifically cast this thing to a string first and log that, you know, we want to make sure that we're still catching things like that as well. So we do have... oh, sorry, go for it.
E
Oh, okay, well, the two-minute wrap-up is that the interaction with the existing KEP would be this: the dynamic portion of the analysis inverts the whole thing. Instead of trying to find where an input can possibly reach, we look, at the point of logging, at what input is coming in, and we are able to integrate our sanitizers into that to identify the data coming in, probably using reflection. But yeah, it's a hard problem, and I think for the best results you should use both.
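One way the runtime side could look is sketched below, assuming a reflection-based filter that redacts suspicious string fields right before a value is logged. The field-name heuristic and the sanitize helper are assumptions made for illustration; they are not the mechanism specified in the KEP.

```go
package example

import (
	"reflect"
	"strings"
)

// sanitize inspects a struct (or pointer to one) at the point of logging and
// blanks out string fields whose names look sensitive. A real implementation
// would be driven by the same source/sanitizer knowledge as the static tool.
func sanitize(v interface{}) interface{} {
	rv := reflect.ValueOf(v)
	if rv.Kind() == reflect.Ptr {
		rv = rv.Elem()
	}
	if rv.Kind() != reflect.Struct {
		return v
	}
	out := reflect.New(rv.Type()).Elem()
	out.Set(rv)
	for i := 0; i < rv.NumField(); i++ {
		name := strings.ToLower(rv.Type().Field(i).Name)
		suspicious := strings.Contains(name, "token") || strings.Contains(name, "password")
		if suspicious && out.Field(i).Kind() == reflect.String && out.Field(i).CanSet() {
			out.Field(i).SetString("[REDACTED]")
		}
	}
	return out.Interface()
}
```

The point is only to show the inversion: the check runs where the data is about to leave the program, instead of over the whole call graph ahead of time.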
E
Certainly, knowing what sorts of things to look for in static analysis depends fundamentally on what developers are doing, and without any signal about the false negatives your static analysis is letting through, there's not really a great way to iterate on it unless you have something on the dynamic, runtime side saying: this is what you're missing. So, yeah, that's my two cents.
A
I think we should probably talk more about this. I would also like Marek to be here, because Marek has just done the static analysis bit for instrumentation recently. So, yeah, I think it sounds reasonable to me, but we are out of time. I am so sorry, and, I don't know, my alarm's going off, so, hey...