From YouTube: Kubernetes SIG Testing 2018-04-17
B
We kind of started with a retrospective of what we had done in the last year, and a lot of the recent performance isolation and, you know, low-level support for performance-sensitive applications work in the kubelet was driven out of that working group, and then there were incubating features that got offloaded to SIGs for code ownership.
B
Most of that stuff went to SIG Node, but stuff like CPU pinning, support for pre-allocated huge pages, and device plugins, like accelerator plugins, all kind of started in conversations in that group, and there are some more proposals that are on the backlog. But we came to a consensus in that forum that we've reached the point where even engineers that have done a lot of performance...
B
So to that end, there are really two goals, right. One is to detect regressions for the different kinds of vertical workloads that, you know, vendors and users care about: certain classes of performance-sensitive workloads like high-performance networking; machine learning training is another one that's come up a lot. There are certain use cases that the users who were pushing for these features are running, so it would be great to get some representation of that in a performance test suite that we could use to detect regressions over time. So, for example, if 1.11 makes this workload run 20% worse, it would be nice if we knew ahead of time. And then the other part, as I said, is to help us prioritize and evaluate the effect of future performance enhancements.
B
But that's really what this document is about. It's not very prescriptive about, like, where the code goes, or, you know, the testing infrastructure or anything like that. The next step coming out of the resource management working group was just to raise this topic within the SIG Testing forum and try to get some feedback, some guidance, and kind of help it along. So with that brain dump, is there any feedback?
B
I've already talked to Tim St. Clair, who's on the line, so I've got his thoughts already, but I'm open to anything anyone else can think of as to what would be a good place to start. Concretely we're thinking about, like, you know, initially, just for ease, maybe adding a subdirectory in test/e2e_node and then having it not run as part of the default set, but letting whoever is interested run it themselves, and then worrying later about integrating it into some...
B
That
saying,
is
implemented
such
that
you
could
add
more
policies
and
so
really
to
get
any
sort
of
signal
about
the
effectiveness
of
like
a
more
autopilot
like
policy.
We
wouldn't
want
to
run
it
against
a
suite
of
you
know,
well-known
workloads,
there's
also
the
question
of
Numa
or
kind
of
like
low-level
device,
topology
aware
resource
allocation
within
the
cubelet.
So
right
now
we
have
a
you
know:
device
manager,
subsystem
that
lets.
B
So
those
two
features
just
by
themselves.
You
know
you
can
use
them
in
isolation,
but
if
you're
on
a
multi
socket
system-
and
you
enable
both
features-
you're-
not
guaranteed
that
you're
going
to
be
pinned
to
a
cpu
on
the
same
Numa
node
as
your
device.
That
was
allocated
for
example.
So
that's
one
thing
that's
kind
of
on
the
horizon.
They
want
to
be
able
to
test
and
see,
like
you
know,
for
different
workloads.
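For concreteness, here is a minimal sketch in Go of the kind of pod where that gap shows up: a Guaranteed-QoS pod asking for exclusive CPUs, huge pages, and a device plugin resource. The image and the accelerator resource name are made-up placeholders, not anything from the meeting.

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// A Guaranteed-QoS pod: only limits are set, so requests default to the
	// same values. The integer CPU limit means the CPU Manager static policy
	// would pin the container to exclusive cores, and the device plugin
	// resource asks for one accelerator; nothing here (or in the kubelet at
	// the time of this discussion) ties the two to the same NUMA node.
	pod := &v1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "numa-sensitive-app"},
		Spec: v1.PodSpec{
			Containers: []v1.Container{{
				Name:  "app",
				Image: "example.com/latency-sensitive:latest", // placeholder image
				Resources: v1.ResourceRequirements{
					Limits: v1.ResourceList{
						v1.ResourceCPU:            resource.MustParse("4"),
						v1.ResourceMemory:         resource.MustParse("8Gi"),
						"hugepages-2Mi":           resource.MustParse("1Gi"),
						"example.com/accelerator": resource.MustParse("1"), // hypothetical device plugin resource
					},
				},
			}},
		},
	}

	fmt.Printf("limits: %v\n", pod.Spec.Containers[0].Resources.Limits)
}
```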
C
I guess that's where I'm trying to understand whether we're talking about node-level changes impacting cluster-level workloads, or... because to me, just reading through the doc, I'm not sure. When I think of node testing, I think of testing a single node in isolation, as opposed to more of a workload deployed into a cluster. Maybe my terminology is confused. Yeah.
B
So not multi-node, but, you know, on a single node. Maybe you have, like, an application whose performance you care about a lot, and you want to kind of pressure it by adding some aggressors, like something that uses a lot of cache or a lot of memory bandwidth (a STREAM workload, for example). That would be a way to kind of tease out, like, in what situations do these performance optimizations help?
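A minimal sketch of what such a cache/memory-bandwidth aggressor could look like, just to make the idea concrete (the buffer size and duration are arbitrary assumptions):

```go
package main

import (
	"runtime"
	"time"
)

// A crude memory-bandwidth aggressor: each worker repeatedly sweeps a buffer
// much larger than the last-level cache, forcing traffic to main memory and
// polluting shared caches for any neighbouring workload on the node.
func main() {
	const bufSize = 256 << 20 // 256 MiB per worker, assumed to exceed LLC size
	workers := runtime.NumCPU()

	for i := 0; i < workers; i++ {
		go func() {
			buf := make([]byte, bufSize)
			for {
				for j := 0; j < len(buf); j += 64 { // stride of one cache line
					buf[j]++
				}
			}
		}()
	}
	time.Sleep(10 * time.Minute) // run for roughly the duration of the test
}
```

You would schedule something like this next to the latency-sensitive pod and compare its latency with the aggressor on and off, with and without CPU pinning.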
A
[inaudible]

B
That's kind of a tricky one, you know. So for multi-socket, GCP, as far as I know, last time we checked, doesn't report multiple NUMA nodes, so even their biggest machine is just, you know, one logical socket. But, you know, we don't have to get everything in the first step; if we can start somewhere and then kind of iterate, that would be fine. Over time it would be good to maybe extend, you know...
B
A lot of these features are really kind of intended for the bare-metal use case. So if we could get some bare-metal machines in a different testing infrastructure, that would be cool, but, you know, we don't necessarily expect, like, CNCF or anyone to provide all the infrastructure, but there needs to be a place to host the code. It would be kind of nice if it was upstream, but, like I said, I'm open to all sorts of opinions. At this point it's kind of wide open.
D
The CNCF CI: at least Packet is an option, and that's, like, more or less bare metal, right. So there should be some way of wiring something in. I'm not saying... you know, it shouldn't be too disruptive, but, you know, Amazon, first and foremost, would have VMs that support, like, multi-socket. So this would be a path forward.
C
So that all makes sense to me; like, this totally seems like a good idea to do. I guess the other thing that jumps out to me is metrics collection. I would really love for somebody to correct me if I'm wrong, but I don't think we have the greatest support for metrics collection in our test tools today. So one example would be the scalability tests.
C
They measure, like, CPU and memory usage, I think, but that's some, like, hard-coded thing that lives somewhere in the test/e2e package. We don't support something nice, like continuously scraping a given set of Prometheus metrics into some kind of file that could be exported and then imported into something else later, and I feel like that's going to be the larger hurdle that you may have to overcome, but...
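Something along these lines is roughly the missing piece being described: a minimal sketch that polls a Prometheus-format /metrics endpoint and appends the raw samples to a file an e2e job could export. The endpoint and interval here are placeholder assumptions.

```go
package main

import (
	"io"
	"log"
	"net/http"
	"os"
	"time"
)

// Poll a Prometheus-format /metrics endpoint on a fixed interval and append
// the raw samples, with a timestamp header, to a single artifact file that a
// test job could upload alongside its other results.
func main() {
	const endpoint = "http://localhost:10255/metrics" // placeholder, e.g. a kubelet read-only port

	out, err := os.OpenFile("metrics-over-time.txt", os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0644)
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	for range time.Tick(10 * time.Second) {
		resp, err := http.Get(endpoint)
		if err != nil {
			log.Printf("scrape failed: %v", err)
			continue
		}
		io.WriteString(out, "# scrape "+time.Now().Format(time.RFC3339)+"\n")
		io.Copy(out, resp.Body)
		resp.Body.Close()
	}
}
```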
C
And then the other thing that jumps out to me is storing the test results. Our testing infrastructure doesn't directly store test results in protobuf format in GCS; we store test results in GCS as, like, JSON files or XML files, machine-parsable stuff, and then there's a separate job that goes through and scrapes GCS, converts that into data that gets scraped by something else and put into BigQuery, and something else reads from BigQuery and converts it into protobuf that maybe TestGrid serves out. So all that to say, like, human-readable but machine-parsable data is kind of the expected artifact, and then the rest of the machinery can take it from there. So...
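As an illustration of that kind of artifact, here is a sketch of emitting a junit-style XML results file from Go; the schema follows the usual JUnit conventions, and the test names and values are invented.

```go
package main

import (
	"encoding/xml"
	"log"
	"os"
)

// Minimal junit-style schema: a suite with a list of cases, where a failed
// case carries a <failure> element. Human-readable, but trivially machine
// parsable by whatever scrapes the artifacts out of GCS later.
type testSuite struct {
	XMLName  xml.Name   `xml:"testsuite"`
	Tests    int        `xml:"tests,attr"`
	Failures int        `xml:"failures,attr"`
	Cases    []testCase `xml:"testcase"`
}

type testCase struct {
	Name    string   `xml:"name,attr"`
	Time    float64  `xml:"time,attr"`
	Failure *failure `xml:"failure,omitempty"`
}

type failure struct {
	Message string `xml:"message,attr"`
	Text    string `xml:",chardata"`
}

func main() {
	suite := testSuite{
		Tests:    2,
		Failures: 1,
		Cases: []testCase{
			{Name: "[sig-node] cpu pinning keeps latency under budget", Time: 31.2},
			{Name: "[sig-node] numa-aligned device allocation", Time: 12.7,
				Failure: &failure{Message: "latency regression", Text: "p99 latency 20% above baseline"}},
		},
	}

	f, err := os.Create("junit_perf.xml")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	enc := xml.NewEncoder(f)
	enc.Indent("", "  ")
	if err := enc.Encode(suite); err != nil {
		log.Fatal(err)
	}
}
```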
C
Everyone's posting links to the docs faster than me. I don't have a great overview doc, but I think, looking at maybe the Gubernator README, it talks about the expected format that things should be posted in to GCS, but I don't really have something that concisely describes how things flow from one place to another. Okay.
F
We already chatted; like, he's pretty much... I gave him a couple of different options. The question that I posed to him is: how many people are going to watch this signal, and how is it going to block? Just because we have tests in TestGrid, and we actually even get the data up there, doesn't mean that it blocks things or that people even watch it. So having the signal and having the right watchers, I think, is the important part; the logistics matter.
C
I'm still somewhat on the hook to document this, but it's kind of a crawl, walk, run thing, where first let's make sure we can actually get the data into TestGrid, and then we'll make sure we have humans watching it, and then we'll make sure you have the machinery to enforce all this stuff. But right now we don't have a whole bunch of that documented very concisely, outside of, like, a Google Doc that I have and an issue that is probably marked as stale by fejta-bot.
C
New tests: calling them blocking and making sure they remain so, like, what criteria they have to meet to be blocking, and then how often we should verify that they're still meeting those criteria in terms of test duration and flakiness and ownership and responsiveness of those owners to test failures, and things like that. Well, I mean, that just made me interested in, like, mechanically, how would this work living outside of the repo? Or is it maybe an update from the community, like, where can you see this heading?
B
Yeah
I
think
I
mean
that's.
One
of
the
issues
is
that
it's
meant
to
be
kind
of
shared
responsibility,
although
that's
kind
of
a
bad
term,
nobody
likes
but
yeah,
sure
donorship.
Let's
say
you
know
any
any
sort
of
like
a
feature
owner
for
one
of
these
performance
features
should
be
interested
in
maintaining
the
test
and
there
are
a
bunch
of
different
organizations.
Yeah.
C
I'll tell you, like, the human-oriented process I put together when I was CI signal lead for a release a little while back. Step one, which we do enforce, is: make sure that every job that runs a suite of tests is owned by some SIG, so it should show up on some SIG dashboard somewhere. So, for example, SIG Scalability owns the performance-related jobs, so it's their responsibility to respond if those jobs go from green to red and to see that they stay green. But then all of the individual test cases inside, say, the correctness job that makes sure that Kubernetes is behaving like Kubernetes at five thousand nodes: SIG Scalability doesn't own the code for any of those features. So each of those individual test cases is owned by the SIG that's responsible for that feature, and it's kind of their job to go say: hey, you know, SIG Network or whatever, like, your new proxying feature isn't working at scale. You need to fix it.
C
At scale, like: your test might be working totally fine on AWS, and at small scale on GCE, but it's really broken at high scale. And so that's the shared responsibility: you have, like, a watcher on the wall (there's a SIG to watch the job), but the person responsible for owning the feature itself, or the thing that's actually being tested, will generally own the test case.
C
We don't really have machinery around automating this sort of notification of people. In practice, as a human, when I was applying this to jobs and tests that were blocking the release going out the door, it tended to work, and it's, like, dumb enough that I think it could be converted into something automated.
C
I mean, I kind of have it written up in a doc and I'll work it with the current CI signal person. Oh, like, the convention is basically just: make sure that the SIG responsible for the feature shows up in brackets in the test name. It's not, you know, very strongly enforced, but most of the test cases follow that pattern today; there are very few that don't, and so we can...
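For reference, that bracket convention looks roughly like this in a Ginkgo-style e2e test; the test body and the helper are invented for illustration:

```go
package e2e

import (
	"github.com/onsi/ginkgo"
	"github.com/onsi/gomega"
)

// The "[sig-node]" tag in the Describe string is what ties the test case back
// to the SIG that owns the feature; dashboards and triage tooling key off it.
var _ = ginkgo.Describe("[sig-node] CPU Manager performance", func() {
	ginkgo.It("should keep p99 latency within budget with exclusive CPUs", func() {
		p99Millis := measureWorkloadLatency() // hypothetical helper, stands in for a real benchmark
		gomega.Expect(p99Millis).To(gomega.BeNumerically("<", 50))
	})
})

// measureWorkloadLatency is a placeholder for whatever measurement the suite runs.
func measureWorkloadLatency() float64 { return 42 }
```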
C
We can start with that for now. Like, I personally feel the amount of time it's going to take you to get reliable test data and a meaningful enough signal to even block anything is probably, like, a 1.12 timeframe, by which time I would hope we would have something more enforceable in place in terms of defining what blocks and what doesn't, right.
C
So, for example, what I have advocated for in the past, and what release teams have never done, is to say, like, we are literally not moving forward in the release schedule until all of the tests on this dashboard are green. Well, you know, what we've always traditionally done is cut builds even though all the tests might be failing, which means alphas and betas go out the door on a predictable schedule, but we literally have no idea how functional they are, and then we start chasing that game when it comes time to actually cut release candidates and go through burndown and whatnot. So what the release team is trying this time, and it appears that it may stick, is to say: if you can get more tests passing, we will freeze later, so that, ideally, if things are stable, you don't have to go into a code freeze in order to stabilize everything.
F
Right, like, we have a bunch of tests that are still non-functional, and I have to go through and take ownership, whether I want to or not, for some signal, because we have product that went out that has problems, right. Because, like, every dot-zero release of Kubernetes (and I know Justin's not looking at the camera, but he can probably nod vigorously) every dot-zero release we've ever had has been a steaming pile of awful, and we fixed the problems afterwards really, really fast.
C
Help the release team, and please advocate that the release team actually have the power to delay the release schedule if the tests aren't passing. Like, I really tried to be very diligent about defining how stable a set of tests should be before we decide that, or, like, how flaky they should get before we decide, you know what, this isn't meaningful signal, and, like, who should be responsible for saying no, no, this feature absolutely must be in a release, and then how does the conversation proceed with, like...
C
Well,
if
you
want
feature
in
the
release,
you
actually
have
to
have
automated
tests
that
are
this
reliable
to
make
sure
that
feature
actually
works.
Otherwise
we're
definitely
shipping
a
broken
feature
and,
like
the
release
team
totally
in
my
opinion,
should
have
the
power
to
do
that.
It
seems
to
be
more
of
a
like
product
level,
conversation
and
I'm,
not
entirely
sure.
G
There is a structural bug in the process, which is that the merge window for the next release opens before the release team is decided, so right from the get-go the release manager can't really push back; right from the get-go they're behind the ball. One thing is that this is no longer true. Okay, can you describe what you mean a little bit there? So I don't think we've... concretely, I don't know that we've filled out the 1.11 team yet entirely, as it branches.
C
Sorry, but the tests weren't passing when we froze. You understand that, right? The tests should be better than just passing; it drives me bananas. So, in my opinion, I would much rather happily ship when it's done, a "we ship when it's ready" schedule, but what seems to be preferred by those who have a more marketing- or product-oriented bent is exactly that: ship on a predictable schedule, every quarter.
F
The question that I have a really difficult time answering is... it's like this imaginary force that no one can see, but it keeps pushing the universe. Maybe it's like dark energy, right: the expansion of the universe continues unabated, but, like, there's no particular person doing that pushing. It's so much become a pathological thing, where we continue down this road because we've done it so many times, even though, like, no one is actually doing the pushing, right.
C
Bonkers and bananas; it drives me up the wall. The fact that we had a very well-attended talk at the last KubeCon about how we should change the release scheduling process, and it has been completely swept under the rug, and each release basically already has a schedule ready to go before we've even done our postmortem. It's bananas. I agree, but I'm trying to suggest maybe this is more a release-oriented topic as opposed to a testing topic, for, like, we...
C
We
provide
signal
and
I
think
our
tools
mostly
do
a
good
job
of
that
now,
whether
they're
not
the
thing
that
generates
the
signal
is
meaningful
or
not,
is
kind
of
not
entirely
our
responsibility
if
I
were
to
for,
as
somebody
who's
been
a
CI
signally,
for
least
before
I
think
one
thing
we
could
do
better
is
make
tester
and
understand
a
little
bit
more
than
it
does
about
test
hierarchy,
because,
like
right
now,
I
can
go.
Look
at
a
summary
dashboard
and
see
perpetually
failing
test
cases
that
aren't.
C
Actually
they
don't
exist
anymore,
but
test
grid
has
some
has
hysteresis
we're
like
test.
Failures
can
show
up,
but
the
overall
test
doesn't
show
up
so
I
know
that
I
shouldn't
actually
look
at
that
job,
because
the
overall
job
is
passing
the
failure
user
left
over
residual
stuff.
So,
like
there's
stuff,
we
could
do
to
to
prescribe
more
of
a
signal
for
those
results,
but
ultimately
it's
the
responsibility
of
people
generating
those
test
results
to
do
stuff
with
them.
C
How do we notify, and how do we make this meaningful? To me, you know, people paid more attention when I, as a human being, came and bugged them, because they trusted that I was a little less noisy than a bot. So until we can figure out a way to make less noisy automated notifications, I still feel like it deserves some tight collaboration with the release team, first and foremost, which I'm happy to help with. And it's true, maybe this opens up the possibility of... right now.
C
I
talked
about
how
a
cig
owns
a
bunch
of
tests,
but,
for
example,
we
have
one
job
that
runs
all
the
tests
inside
of
GCE
and
then
each
individual's
safe,
just
like
Reggie
Jackson's
that
down
to
their
specific
set
of
tests,
and
so
it
could
be.
If
see
it,
you
know.
Sig
knows
tests
fling
can
cause
the
whole
job
to
fail
like
Signet,
where
it
doesn't
necessarily
see
that
and
it
makes
the
signal
go
easier
for
them.
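A tiny sketch of that "regex down" mechanic: one shared result set, filtered per SIG by the bracket tag in the test name. The names here are invented.

```go
package main

import (
	"fmt"
	"regexp"
)

// One job produces results for every test case; each SIG filters the shared
// list down to the cases tagged with its own bracket label.
func main() {
	allResults := []string{
		"[sig-network] Services should proxy traffic at scale",
		"[sig-node] CPU Manager should keep p99 latency within budget",
		"[sig-storage] PersistentVolumes should bind within 30s",
	}

	sigNode := regexp.MustCompile(`\[sig-node\]`)
	for _, name := range allResults {
		if sigNode.MatchString(name) {
			fmt.Println("sig-node dashboard:", name)
		}
	}
}
```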
C
So there was a thought at one point in time of: why don't we, like, regex down just the SIG Network tests and just the SIG Node tests and just the SIG UI tests and so on? But that creates a whole bunch more jobs for us, and my personal feeling is that it would result in us spending a lot more time standing up clusters and a lot less time doing useful stuff with them.