From YouTube: SIG - Performance and scale 2021-05-27
Description
Meeting Notes: https://docs.google.com/document/d/1d_b2o05FfBG37VwlC2Z1ZArnT9-_AEJoQTe7iKaQZ6I/edit
A
Okay, all right, all right. Welcome to SIG Scale, everybody. I put the document in the chat; add your name as an attendee when you get a moment. I'm going to share my screen.

Okay, you should be seeing our meeting-minutes Google doc. So for today's agenda I added a few items about tooling. We've had a number of discussions on the mailing list, and we've talked about different tools we could use to measure things, and measuring is one of the important initiatives that we want to go through.

We want to measure performance and we want to measure scale, so I could see there being sort of two tools, or at least two different verticals that we can go after: one for performance, one for scale. So I figured we could take the time today to discuss some of what's been said on the mailing list and see if we can capture some requirements, the details, prior art, everything we can think of that goes into solving some of these problems.

So I figured we could just start with one, see how far we get, and we can always see if we get to both. I think we start with performance; that was one of the topics with a lot of mailing list traffic. David, you talked about it a bunch, with some of what you looked at with the profiling, and then Fan had also mentioned he was interested in doing some work around this and some tooling work he's done. So I figured we could start there, and I think we could start with requirements. Does that make sense, that maybe we capture what the things are that we want in a tool that measures performance? So what do people think? I added four things; there's probably a lot more. So what are some other things, and I can write them down.
B
So I would move "go profiling"; there might be a third category here on the agenda. Okay, so we have tools to measure performance. I'm not sure "measure scale" would be accurate so much as "create stress at scale" or something like that. And then there's tools to, what would you even call it? It's not really measuring so much as, I consider profiling something that helps us address problems we found with performance, so it's not really measuring. I don't know what it's measuring.
A
Okay, that's fine, so we can call it that. And again, this could all be one tool, I don't really know; these could all just be features, whatever. But we could just start. I don't know if we need to start with this, but I guess so. Profiling: we could say this is a requirement, like we want to do profiling with this performance tool. Does that make sense? I think that's something that would fit here.

I think that's what we want to do: measure. We want to do some measurements of periods of time. We want to record phase changes; we want to know every time we go from pending to scheduling, we want to capture those places. We want to record... what else do we want in here, what are some other ways we can measure? Or do I consider this measuring code, David? Like we're measuring with your profiling, yeah.
C
The Go code: seeing how much time we spend in which function and stuff like that. And then there are our measurements, that's also the period-of-time stuff, that are also useful during operations, not necessarily run by a cluster admin but by us, running during tests and stuff. That's kind of two things. Okay.

For scale tests, possibly.
B
Yeah, okay, that can make sense. It's the different insights that we're looking at. So one is measuring externally, like how long it takes for certain things to occur, and then I guess the profiling is measuring internally: where we actually spend the most time in our actual functions and things like that.
B
Knowing how many times they are popping per key: so understanding, between going from posting a VMI to running, for example, how many times are we processing or syncing that virtual machine to get there. That gives us some indication of problems if the average amount of work we do is greatly increased, so we can lower the amount of work that we're doing to get it into that state.

That seems like it would improve things, possibly. So yeah, queue length, and maybe statistics on how often keys are called.
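As a rough illustration of the kind of debug metric being discussed here, the following is a minimal sketch using the Prometheus Go client. The metric names and the RecordSync helper are hypothetical, not existing KubeVirt code:

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

// Hypothetical debug metrics for queue pressure, only meant to illustrate the idea.
var (
	// Current depth of a controller work queue.
	workQueueDepth = prometheus.NewGaugeVec(
		prometheus.GaugeOpts{
			Name: "debug_workqueue_depth",
			Help: "Current number of keys waiting in a controller work queue.",
		},
		[]string{"controller"},
	)

	// How many reconcile passes an object needed, e.g. from VMI creation
	// until it reaches the Running phase.
	keySyncCount = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "debug_key_sync_total",
			Help: "Number of reconcile passes performed per object key.",
		},
		[]string{"controller", "key"},
	)
)

func init() {
	prometheus.MustRegister(workQueueDepth, keySyncCount)
}

// RecordSync would be called at the top of a controller's sync/execute function.
func RecordSync(controller, key string, queueLen int) {
	workQueueDepth.WithLabelValues(controller).Set(float64(queueLen))
	keySyncCount.WithLabelValues(controller, key).Inc()
}
```

Note that the per-key label would be high cardinality, which is one reason such counters would probably only be turned on as debug metrics during a test run, as discussed later in the meeting.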
C
Okay, yeah, those topics are kind of under the API pressure group: the pressure we cause by calling the API server a lot, or the pressure we get from the API because there's a lot of objects. Watching config maps, for example, can be very horrible, so stuff like that is related to all the caches and views we have. I'd like to see more numbers about how we behave at scale and whether our code is performant enough.
B
So sometimes we have trends that look good in small clusters, because maybe we're watching all objects in the cluster but there just aren't that many of them; but then that quickly multiplies as the scale of the cluster and the number of workloads and other types of objects increase, to the point where it's not just a linear progression in performance, it's worse, and those aren't always obvious.

I think we had go profiling here, but just measuring simple things like that, the CPU and memory usage of our controllers, is useful.
D
Okay, so yeah, I think it would also be good to measure the latency in the controller. For example, the pod creation and the VMI creation are not synchronized: the pod creation will most likely be very fast, but the VMI status update will come later, because the keys are piled up in the work queue waiting for an available worker to pick them up. The later that is, the higher the latency. Also, I think it would be good to measure the event handler callback in the controller: after the event enqueues a key in the worker queue, how soon will the key be picked up from the queue?
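A minimal sketch of the enqueue-to-pickup latency Fan describes, again assuming the Prometheus Go client; the names are made up, and client-go's workqueue package can already expose similar queue-duration metrics through its metrics provider, so in practice this may partly overlap with what exists:

```go
package metrics

import (
	"sync"
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// Hypothetical histogram of how long a key sits in the work queue between the
// event handler enqueueing it and a worker picking it up.
var queuePickupLatency = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name:    "debug_workqueue_pickup_latency_seconds",
		Help:    "Time between a key being enqueued and a worker starting to process it.",
		Buckets: prometheus.ExponentialBuckets(0.001, 2, 14), // 1ms up to roughly 8s
	},
	[]string{"controller"},
)

var (
	mu           sync.Mutex
	enqueueTimes = map[string]time.Time{} // when each key was last added
)

func init() { prometheus.MustRegister(queuePickupLatency) }

// OnEnqueue is called from the informer event handler, right after queue.Add(key).
func OnEnqueue(key string) {
	mu.Lock()
	defer mu.Unlock()
	if _, ok := enqueueTimes[key]; !ok {
		enqueueTimes[key] = time.Now()
	}
}

// OnPickup is called from the worker loop, right after queue.Get() returns the key.
func OnPickup(controller, key string) {
	mu.Lock()
	start, ok := enqueueTimes[key]
	delete(enqueueTimes, key)
	mu.Unlock()
	if ok {
		queuePickupLatency.WithLabelValues(controller).Observe(time.Since(start).Seconds())
	}
}
```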
A
Can you add that in there, Fan, what you just said, I think under queue length? Yeah, add that in there. Okay, yeah, sure, those are all good things. Okay, I'm thinking of some other things, like portability. That's another one I'd be interested in. Even if this is not necessarily a measurement, or just has to do with measurement, this is something that I want as a requirement: who's going to be using the tool, the different profiles. I want to use it as a developer.

We also want to use it in CI. Who else? I'm sure some QA people would probably love to use this. And then who else, and how can we hand this to people? Do we just give them a pod or something to run? That's another thing I'd be interested in, the way that we can move it around. So lightweight, something that's just portable: we just launch it in a cluster and then maybe it just takes some measurements and returns them, or something.
A
So we have, okay, so we have something, all right. So we do have some profiling, we're going to measure periods of time, we're going to... okay. Another thing: so we're measuring periods of time. How about creating... I'm thinking about how we control this. We could have, like... wait.

E
I was actually going to bring that up. I think we have a number of tools or things that do both, and I think it would be good to, I mean, we need to evaluate that and talk.
B
Having given this a little bit of thought, I like the idea of separating the two: separating the tool that creates the stress, and then having something else, because it might be disjoint how we monitor. We might be getting some statistics from Prometheus, we might be getting some from some other profiling tool. It's unclear to me exactly where all these measurements are going to live and what's appropriate, even. So, the idea of separating the two tools, or separating the ability to measure from the ability to generate load.
C
I think we already have a separate effort that focuses on creating stress and load tests. And the stuff I selected right now all kind of falls in the area of Prometheus metrics that they are already working on exporting from test runs and such. So that's all stuff they can work on in the load testing; it all feeds Prometheus-style metrics. The go profiling is more on the depth side, and maybe, if we had tracing, that could fill into that. But these here seem like usual metrics.
B
Possibly. I think I'm leaning towards that, if we could enable some sort of debug mode. I don't think we want to always export all these metrics to Prometheus, but it could be something like "enable debug metrics". And then, when we look at results over time periods, you can get that from Prometheus in the time series.

How do you all feel about using Prometheus? I'm a little uneasy about requiring Prometheus here. Is that something that people on the call feel comfortable with, or what are the thoughts?
A
What do you mean by requiring Prometheus? In other words... because I guess what I was thinking is that the output could just be anything that the person using the tool chooses. So it could be export to Prometheus, export to JSON, export to file; Prometheus would just be one format that we can use to produce the data.
B
Well, I mean, that sounds good. Here's the problem with that: when I'm looking at this list, some of this information already exists in Prometheus. So, specifically, something like API calls made: we can take a look at the API server, the metrics are already exposed there, and detect what comes from our controller pods, for example, and we can already get data about that. So it already exists in Prometheus.

C
Right now our metrics codebase seems to be very focused on Prometheus; we import the Prometheus code to generate our metrics.
C
That's a format that a lot of scraping services can read nowadays, but if we looked at something like OpenTelemetry, which is more generic around the whole metrics story, I think they support different exporters that can be configured: you say you want to export to Prometheus or to Stackdriver or Datadog or whatever. I think that's in their scope, and that's more generic.
A
Yeah, I mean, I guess what I'm thinking is that, like I talked about with the different personas, dev, QA, or whoever is using this, I'm just thinking in terms of their ramp-up: what do they need to do to get to the point of leveraging this tooling? And so, yeah, we're adding a dependency if we say you have to use Prometheus. So I guess what I'm saying is that some of the stuff, like you say, API calls, I can already see that it's recorded information, so we can capture it. Some of the other stuff I'm seeing, for instance phase changes, I could see being in almost both camps, because this doesn't exist right now; we don't have it.
If I'm going to record phase changes, it sounds like I'm going to write some code that records this, and then we're going to report it, expose it on an endpoint. And so for some of these, I'm wondering if they could go either way, yeah.

B
We can write a package of some sort that begins capturing this kind of developer insight, and then it can have different ways of exporting it: to Prometheus, or maybe locally and then aggregated via a sub-resource endpoint, or something like that.
B
There are things in the cluster that aren't going to be represented there unless we totally recreate the way these things are collected; then, sure.

A
Yeah, I don't think... I actually think we don't even need to change this. So I guess, this part right here: if I'm looking to measure performance, what I'm trying to get at is, what is the bare minimum, in terms of, okay, this is useful, I could hand this to QA, they can give me a measurement back, and there's very little on-ramp. Like, what are the two or three things I need?
Here's one, here's two, and here's three; that's it. And this stuff, that's great, but I consider it almost more advanced; I almost don't even want to touch it. I think that's good: we have that in Prometheus and we just leave it like that. And then these three things, these simple things, it's basically just a blob of text; we can just have that be our format.
C
Yeah, about the format advice: I see a slight issue with the more metrics-y stuff, because right now it's being output as a time series, it gets scraped. The Go code doesn't record a time series, and I don't think it should, because a database like that grows insanely. So right now there is an output of the Prometheus metrics every X seconds, and Prometheus scrapes that.
A
Yeah, let me also mention: this Prometheus reporting right here, what I'm hearing is that this isn't already baked into the code. Part of what I'm thinking is, if we have a tool that measures performance, would, for example, this recording happen all the time? If we record phase changes, is this something we do all the time, or only something we do when we want to specifically measure performance?

B
Something we enable as a debug metric.
C
For the phase changes, to some extent we might already have the metrics implicitly, because we update the conditions on our resources with timestamps, and there are the Kubernetes events to some extent. That could already be extracted; we would just build it into the operator more consistently.

I see different ways to do those phase changes. One could be metric style: when we reconcile, we record whether a resource switched conditions into ready, or a VM changed phase, and we record the timestamps into the time series. Or we add tracing that is annotated with the resource name and namespace, and we actually have a trace of how long it spends in which phase.
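A sketch of the first option, the metric-style recording of phase changes during reconcile, assuming the Prometheus Go client; the metric name, labels and the helper are hypothetical, not something that existed in KubeVirt at the time:

```go
package metrics

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// Hypothetical histogram of how long a VMI spent in its previous phase before
// the controller observed a phase change.
var vmiPhaseTransitionSeconds = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name:    "debug_vmi_phase_transition_seconds",
		Help:    "Seconds spent in the previous phase before transitioning.",
		Buckets: prometheus.ExponentialBuckets(0.1, 2, 12),
	},
	[]string{"from_phase", "to_phase"},
)

func init() { prometheus.MustRegister(vmiPhaseTransitionSeconds) }

// ObservePhaseChange would be called from the reconcile loop when the cached
// (old) object and the updated object disagree on the phase. lastTransition is
// the timestamp already recorded on the object's conditions for the previous change.
func ObservePhaseChange(oldPhase, newPhase string, lastTransition time.Time) {
	if oldPhase == newPhase {
		return
	}
	vmiPhaseTransitionSeconds.
		WithLabelValues(oldPhase, newPhase).
		Observe(time.Since(lastTransition).Seconds())
}
```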
B
So I think you're right, Ryan, in saying that this would be something new that we would have the potential of exporting in multiple ways.

A
Okay, I guess... Okay, yeah, I'm trying to think: maybe we can break this down a little bit more. I want to change this list a little bit; for a few of these, maybe we can break them apart. So: what currently exists in Prometheus already?

B
So we have API call information; that's exported by the Kubernetes API server.
B
So we have a lot of that, but it's coming from a different view than what we're talking about. I think it's still accurate; it tells us what calls are being made and how frequently, so I think it's good enough, but it's coming from the API server side rather than from the actual component that's making the API calls.

A
Should any of these be maybe just a little bit of code on the monitoring side that we're just missing, that could suddenly be something we can add to Prometheus? Is there anything like that that we see in here? Oh yeah.
Okay, so all of it would. Okay, so then this makes sense: if we created a new tool, it would be capable of doing go profiling, measuring periods of time, recording phase changes, the things that are focused on a period of time. There are additional debugging things that we do when we're generating load that we can capture and output.
B
The way I envisioned this tool that you're talking about, the deep profiling and maybe recording some of this developer information, is that we can create a sub-resource endpoint that turns on profiling within all our components to begin capturing all this information, and then, when we stop it, it can aggregate all that information and just give us back a report. That would give us some averages and maybe p99s of certain statistics and things like that we find interesting.
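A rough sketch of the per-component half of that idea: an HTTP handler pair that starts and stops an in-memory CPU profile, so an aggregator (a sub-resource endpoint, or later virtctl) could collect the result from every component. The endpoint paths and wiring here are assumptions for illustration, not the actual KubeVirt implementation:

```go
package profiler

import (
	"bytes"
	"net/http"
	"runtime/pprof"
	"sync"
)

var (
	mu  sync.Mutex
	buf bytes.Buffer // holds the raw pprof data between start and stop
)

// StartHandler begins a CPU profile for this component.
func StartHandler(w http.ResponseWriter, r *http.Request) {
	mu.Lock()
	defer mu.Unlock()
	buf.Reset()
	if err := pprof.StartCPUProfile(&buf); err != nil {
		http.Error(w, err.Error(), http.StatusConflict)
		return
	}
	w.WriteHeader(http.StatusAccepted)
}

// StopHandler stops the profile and returns the raw pprof data; the caller
// aggregates the results per component.
func StopHandler(w http.ResponseWriter, r *http.Request) {
	mu.Lock()
	defer mu.Unlock()
	pprof.StopCPUProfile()
	w.Header().Set("Content-Type", "application/octet-stream")
	w.Write(buf.Bytes())
}

// Register wires the handlers into a component's debug mux.
func Register(mux *http.ServeMux) {
	mux.HandleFunc("/debug/profiler/start", StartHandler)
	mux.HandleFunc("/debug/profiler/stop", StopHandler)
}
```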
B
Those same statistics can be exported to Prometheus if we want, and so people can use Prometheus to gain the same insights; but you can also get your nice printouts if you want.

A
Okay, so let me characterize this section. This is going to be: measurements that developers, or actually anyone who wants to, can turn on for a period of time during load, or something like that. Yeah, that's what we want to do with this section of the requirements, and then this is what we'll export to: Prometheus, file, standard out, I don't know, something like that.
B
Really simple. I even did a proof of concept on how we could call the sub-resource endpoint, enable something in all of our components for a period of time, stop it, and then gather results. It should be kept practical to start with. If it makes sense for us to start with just exporting this kind of deep-insight stuff into a file in JSON format or whatever, great.

C
I just think the gathering and getting it into a JSON file is actually the most work, compared to exporting most of this stuff to Prometheus, because you have to make up your own storage for that. You have to aggregate all this stuff somehow, and for the metrics, even for phase changes, I don't know how we would record that properly.
B
You don't have to store it anywhere; you can keep it all in memory and then export it over the network, and then it's aggregated at virtctl or something. When the sub-resource returns all this information, it's going to aggregate it all into some sort of JSON format, and then you're just dumping it to a file when you get it. So it's all in memory until it actually lands on your local machine.
C
Yeah, but that's already the hard part: we would have to store data in memory, and that could be quite a lot, when we already have systems that can do that for us, like Prometheus or any service that can scrape a metrics endpoint. Even the tracing that I'm still a big fan of and would like to see: it just exports traces and they get collected by something else; it doesn't store anything in memory, because that's too much.
A
Yeah, I guess what I'm saying is that giving someone the opportunity to export the data... I understand what you're saying about Prometheus, and it makes sense to me: it's important, it's a great tool and it gives us a lot of power.

It's just the dependency of having to have it. I think in a lot of cases we're going to leverage it, but I also think there's value in also having another data format, you know, whatever, JSON. But I think that's something we can evaluate as we go a little bit further. Like Dave was saying, whatever's easiest; I think we'll get a reasonable idea of that when we start.
B
It wouldn't be a time-series database; it would be a metric that's calculating a single value. We're not going to be taking samples of this over and over and over and presenting all those samples back, I don't think.

For work queue length or whatever, we'd probably have a running average and maybe a max and a min, or things like that, that we keep in memory; not a sample every millisecond. It wouldn't be time series. That's not what I envisioned, at least; maybe that's what others did.
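A minimal sketch of that in-memory alternative to a time series: a running aggregate that only keeps count, sum, min and max, which can later be dumped into the report. The type is hypothetical, shown only to make the idea concrete:

```go
package metrics

import "sync"

// RunningStat keeps a running average, min and max entirely in memory instead
// of storing every sample as a time series.
type RunningStat struct {
	mu            sync.Mutex
	count         int64
	sum, min, max float64
}

// Observe records one sample, e.g. the current work queue length.
func (s *RunningStat) Observe(v float64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.count == 0 || v < s.min {
		s.min = v
	}
	if s.count == 0 || v > s.max {
		s.max = v
	}
	s.count++
	s.sum += v
}

// Snapshot returns the aggregate values for inclusion in a report.
func (s *RunningStat) Snapshot() (avg, min, max float64, count int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.count > 0 {
		avg = s.sum / float64(s.count)
	}
	return avg, s.min, s.max, s.count
}
```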
A
Yeah, well, for some of the things, like phase changes, we're going to have timestamps for each of the events. Let's say we're watching for 10 minutes and we see a bunch of pod changes; we'll have however many pods that go through those phase changes, and we'll have times for them.

B
So maybe every VMI would be a key, and then you'd have a series of timestamps associated with each phase of that VMI, which would be exported.
C
Yeah, but then again, for the phase changes, I still think it wouldn't have to be something that runs in our operator or anything. This could be done by a tool that runs externally and exports this stuff, because it's getting it from Kubernetes events anyway; it's enough to watch the resource.

As far as I understood, for the phase changes it just has to be something that watches the VMI resource on the Kubernetes API server and records it itself. It doesn't have to be in the reconcile loop, in the operator itself. It can be external.
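A sketch of that external approach: a small client-go program that watches VMIs through the API server and prints phase-transition timestamps, without touching the operator. The group/version/resource and the overall wiring are assumptions; error handling and watch restarts are omitted for brevity:

```go
package main

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	vmiGVR := schema.GroupVersionResource{
		Group: "kubevirt.io", Version: "v1", Resource: "virtualmachineinstances",
	}

	w, err := client.Resource(vmiGVR).Namespace("").Watch(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}

	lastPhase := map[string]string{} // namespace/name -> last observed phase
	for event := range w.ResultChan() {
		obj, ok := event.Object.(*unstructured.Unstructured)
		if !ok {
			continue
		}
		phase, _, _ := unstructured.NestedString(obj.Object, "status", "phase")
		key := obj.GetNamespace() + "/" + obj.GetName()
		if phase != "" && phase != lastPhase[key] {
			fmt.Printf("%s %s: %q -> %q\n",
				time.Now().Format(time.RFC3339), key, lastPhase[key], phase)
			lastPhase[key] = phase
		}
	}
}
```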
A
I guess what I mostly wanted was to sort of protect us... So the idea, like I talked about with portability, is different personas, people who want to use different stacks. If I'm just doing regular development, I don't really have a Prometheus stack; I just have my kube cluster with KubeVirt on it, and, you know, what do I need to do to get all these additional metrics? Do I have to go and enable one? I don't know.

Maybe that's just the case I have to deal with, because I'm doing load testing or I'm doing performance work, and so I should have one. So maybe that's just right.
B
Sure, yeah, I think that might be the case. I know at least in the development environments that we're using, with cluster-up and stuff like that, there is a provider being developed in kubevirtci that's going to come with Prometheus and Grafana built in automatically, so it's going to be something that's more accessible to developers soon.
A
So I guess let me look at it from a different angle. If we enable this stuff to be scraped by Prometheus, I'm trying to think whether I can cover the same use cases. All right, two things. One of them is the on-ramp, which was one of my concerns; okay, if I get over that, that's fine.

The other thing is other output types. One of the things I like to do is take the data and make visualizations: with JSON I can take it and, if I wanted to, do it in Excel or any other format, so I have a little bit more flexibility. And I could just scrape Prometheus, right, because it's just JSON there anyway; I could just do that and then build it from there.
C
Prometheus, yeah, you can't just... Prometheus is not exactly JSON, I think. You can't just scrape Prometheus, or rather you can: if you don't want to run your own Prometheus, you can just build an application that understands the Prometheus format, which is very easy, and hit the endpoints yourself.
B
Right. I hate it, but I think that's the right approach: just see how far that gets us, and if we find it obviously just isn't going to give us the fidelity and response that we need, we'll create our own thing. There are some things that can't be replaced, like the go profiling, when it actually comes to understanding where we spend the most time and in what functions and things like that.

Certainly we'd have to build our own tooling to aggregate that. But really, all this other stuff I'm saying we can export as a metric. Like the phase changes: we can create that in our watch in the VMI controller, say, okay, we got a VMI, we see that the previous version didn't have this phase and now it's running, so we can export that information to whatever is exporting to Prometheus, so it gets out there.
C
Yeah, David, maybe some context on the go profiling, because you talked about aggregating. Did you have more in mind than what pprof outputs on an endpoint, or do you mean aggregating those per service?

B
Yeah, it would be aggregating the pprof information back, so that the binary pprof information would be aggregated back per component, where you could go in, okay...

You can imagine using virtctl: dev start profiling, then stop profiling and dump, and then it's going to dump all the aggregated pprof information into a zip file or something, yeah.
C
Okay, yeah. And I want to advertise this again: I think quite a few of these, including the work queues, could be something that profits from it, if we think more about adding tracing to our code in general, like OpenTracing, or whatever it's called now. You can get flame charts of what is going on in our operators. Kubernetes is doing this now to some extent, and it's pretty great.

To visualize it there are a few options; I think the most common one is called Jaeger, but I know I used one that was built into GCP for a while, and I don't know about the others. And you can annotate it; you could even have traces that let you query how a VMI progressed through the cluster.
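A sketch of what that tracing could look like in a controller, assuming the OpenTelemetry Go API; the tracer name, span name and attribute keys are made up, and the exporter/backend setup (Jaeger or otherwise) is omitted:

```go
package controller

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
)

// Controller stands in for a real KubeVirt controller in this sketch.
type Controller struct{}

// sync is a placeholder for the real reconcile work.
func (c *Controller) sync(ctx context.Context, namespace, name string) error { return nil }

// execute wraps one reconcile pass in a span annotated with the resource's
// namespace and name, so a backend such as Jaeger can show how long a given
// VMI spends in each step.
func (c *Controller) execute(ctx context.Context, namespace, name string) error {
	tracer := otel.Tracer("virt-controller")
	ctx, span := tracer.Start(ctx, "vmi-reconcile")
	defer span.End()

	span.SetAttributes(
		attribute.String("vmi.namespace", namespace),
		attribute.String("vmi.name", name),
	)

	// Expensive steps (pod creation, status update, etc.) could start child
	// spans from ctx here.
	return c.sync(ctx, namespace, name)
}
```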
C
I had a great time with that, and it gave a lot of insight into our operations at my previous job, because things got stuck; they were managing Kafka and Kafka was slow, and we figured that out through it.

A
Okay, yeah, I think that makes sense; this opens up a bunch of tooling for us, going the Prometheus route, and even just for these things there's a lot of tooling, and then there's even what we can do here with the profiling. So then, okay, to break this down: measure performance. It sounds like we have two things.
We have one tool, and it's specifically for the profiling and for doing tracing, and these are just going to be our requirements. And then we look at the rest of the stuff kind of as metrics. Those would basically be a package inside of KubeVirt, so not necessarily a tool, just something that is there, and then these we enable, right? We have some performance sub-resource that we enable. I think we already collected these.
D
Yeah, yes, that's something relating to how we present the result. About using sub-resources of the VMI to export the metrics: a concern here is that anything that changes on the VMI object will cause an enqueue and increase the queue length, which might increase the burden and overhead on the virt-controller. So I'm thinking, yeah, just brainstorming:

Could we use a CRD, or export the metrics somewhere other than the VMI object, like using the custom metrics API? Would that be better?
A
Okay, I think this could be something we look at as part of the design. Okay, I guess the next step: to me this makes sense as a set of requirements, and it's clear to me that this is its own tool and this part is in KubeVirt code, so we have, I guess, two different things. I think the next step on this is we need to have a formal design.

Does that make sense? Let's do two designs, one here, one here. Does anyone want to volunteer for one of these, and we can look at it, or we can do it together? I don't know; what do people think, does that make sense as the next step?
B
What I want to do, I think, is that the design is enabling a path for us to enable debug metrics, and these are all just items that we would add after the fact. So: how do we approach turning on debug metrics in KubeVirt and exporting them? And then we can just add all of our metrics to that package.
C
I think only a few of those I would make enable-and-disable, because the work queue length and measuring periods of time are kind of given by the metrics concept itself; work queue length would be something that I would always export, because it's a no-brainer and it helps with our scalability testing. And I feel we already have a few initiatives around metrics and alerts and all that; maybe we should...

I don't know about the work queue metric in detail right now, but if it's more of a high-and-low-watermark thing, that could be really helpful, because people sometimes wonder why their resource is not being reconciled in huge environments, and the ops team could see why: because the queue is full.
A
Yeah, I don't know, I think these need to be able to be turned on and off. At least from my perspective as a developer persona, I only really want to look at this when, for instance, I know I'm doing a load test. I want to have a specific time period that I expect it to be running for, because I have this tool generate load, I know when it starts,

I know when it ends, and I want to capture this stuff during that time period; the rest of the time I really don't care. And then, from the operator perspective, from the end-user perspective, what are you getting from this? I guess you can see, okay, it's taking a long time scheduling, but you could see that anyway. I guess for some of this stuff, yeah, I don't know, it's hard to say.
C
So if we talk in the Prometheus-style metrics context, for us exporting metrics is as cheap as it gets. The only thing we would probably disable is the export, not the recording, because checking whether it's enabled is more work than actually recording; we would just disable exporting the metric.
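One way this "always record, only gate the export" idea could look, sketched with the Prometheus Go client: debug metrics live in their own registry, and a wrapper Gatherer only includes them in the /metrics output while a debug flag is set. The names and the wiring are hypothetical:

```go
package metrics

import (
	"sync/atomic"

	"github.com/prometheus/client_golang/prometheus"
	dto "github.com/prometheus/client_model/go"
)

var (
	debugEnabled  int32                     // toggled e.g. from the KubeVirt CR or a debug sub-resource
	DebugRegistry = prometheus.NewRegistry() // debug metrics register here and are always recorded
)

// SetDebugEnabled flips whether debug metrics are included in the scrape output.
func SetDebugEnabled(on bool) {
	var v int32
	if on {
		v = 1
	}
	atomic.StoreInt32(&debugEnabled, v)
}

// GatedGatherer serves the base metrics always, and the debug metrics only
// while the flag is on.
type GatedGatherer struct {
	Base  prometheus.Gatherer
	Debug prometheus.Gatherer
}

func (g GatedGatherer) Gather() ([]*dto.MetricFamily, error) {
	out, err := g.Base.Gather()
	if err != nil || atomic.LoadInt32(&debugEnabled) == 0 {
		return out, err
	}
	extra, err := g.Debug.Gather()
	if err != nil {
		return out, err
	}
	return append(out, extra...), nil
}
```

The /metrics handler would then be built with something like promhttp.HandlerFor over this gatherer instead of the default one.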
C
It would not show up on the metrics endpoint, so we would only relieve pressure on the Prometheus that's scraping it, and the Prometheus side can say "I don't care about this metric" and not store it. And for measuring a period of time, you would either generate load in that time period and only look at that time span in your time series, or you would only scrape during that time, if you turn on your Prometheus just for that.

Yeah, because the metrics themselves are mostly atomic ints or floats.
A
Okay, I mean, it sounds to me like we kind of just captured a lot of the design, right? That's it: we just decided what we export, and doing that in a way that's configurable, say on the KubeVirt CR, sounds pretty natural to me. Yeah, okay.

Okay, all right. We're running long, but I guess the last question is: how do people want to handle this? If you want, I can write sort of a design for this; I think I can capture what we talked about today and some notes, and that could just be our design, like the KubeVirt CR, and how we export it.
C
One more addition: for some of those the amount of work required differs quite a bit. Work queue length might be surprisingly easy, but the phase changes might need more of a concept of how we actually record them and where we record them, or the latency, yeah, and stuff like that, I think.

A
I think we've carved it out enough that that's something we can discuss in the pull request. Whoever is working on it can kind of take the lead on that, because, yeah, you're right, it is a little different for each of them, but I think that's something we can discuss when we start digging into the code.
A
There is an issue; someone's on it, Roman's doing it. I don't know where it is, there's an issue somewhere; oh, I have it in another tab somewhere. It's looking to be added, I think, by default for CI. So that's actually a dependency for me to track.

B
On the developer flow: I'm not sure if you all are using cluster-up and cluster-down and things like that for development. Okay, yeah. Well, if you start to, at least for development (I understand it doesn't make sense for testing), that's an easy path to begin just gaining experience with Prometheus and Grafana and all that. Once that provider lands, it's just trivial at that point.
C
I'm using cluster-sync with my own cluster that has Prometheus on it, but I tried to get Prometheus into cluster-up before and I didn't have enough RAM.

No, I think it's 32, but it wasn't even the RAM, it was the CPU cores: there's a CPU-cores flag that defaults to two, and if you do a lot of work your two CPUs sadly die at some point. Okay, and yeah.
A
Okay, I think this is good. So I'll take the item: I'll capture everything we talked about here as a design, sort of a design document. I'll share it on the mailing list and in the doc, and we can have some people sign up to take whichever of these they want to work on, and we can start tackling it that way.

C
One more addition I have. The API calls: as I recently checked, the only way we see the API calls coming into our API is through the API metrics exported by Kubernetes, and they are not that great from what I've seen so far. One problem being that we only get latency across some endpoints, and they get polluted by the console and similar calls, so our request latency looks insanely high because people use the console.

A
Yeah, that's definitely something we could look at. Okay, great. We're already a few minutes over, so I think that's good. We'll look at this, and next time we can talk about scale or whatever; let's just focus on this, we've got to get our measuring down first. So let's do that and we'll start executing on it.