Description
Part of https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/143
I'm going to attempt to share my entire desktop. Someone let me know if that is just unreasonable, because I do have a relatively large screen, but I'll try to bump the font size. Can I get a thumbs up from jarv, or a thumbs down if it's horrible? It looks good? Alright.
So the work in progress for enabling auto-deploy of our Kubernetes workloads is still ongoing. There are a few puzzle pieces that we're closing in on, but at least we now have the ability to trigger an upgrade.
So all we need to know is the environment, whether this is a dry run, and the specific image that we want to upgrade our Sidekiq image to. In a future iteration we'll also be upgrading our Helm chart, once some other work that we're waiting on is in place.
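For illustration only, those three inputs could look something like the following in a manual CI job; the job name, variable names, and script are hypothetical, not the actual pipeline definition:

```yaml
# Hypothetical sketch of the trigger inputs described above.
trigger-sidekiq-upgrade:
  stage: deploy
  when: manual
  variables:
    ENVIRONMENT: "pre"       # which environment to upgrade
    DRY_RUN: "false"         # render the diff only, or actually apply it
    SIDEKIQ_IMAGE_TAG: ""    # the specific image tag to upgrade to
  script:
    # illustrative wrapper; the real tooling is not shown in this discussion
    - ./bin/upgrade-sidekiq --environment "$ENVIRONMENT" --dry-run "$DRY_RUN" --image-tag "$SIDEKIQ_IMAGE_TAG"
```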
To keep the scope limited, I'm limiting this to Sidekiq only, because that's what we're concentrating on. As for the Registry: if we get auto-deploy working properly in general, meaning that we get our Helm image values updated with the appropriate information, we'll be able to auto-deploy the Registry at the same time we enable auto-deploy of our Helm charts.
Yeah, so as we can see in our diff, the Sidekiq container is going from... we're running a horrendously old version for some reason, probably EE, and we're going to update that to 12.10, which is what I specified. Oh, and I saw something else: the dependencies container is getting the same image name, and that's the only thing in the diff, so I'm confident that, theoretically, this will work precisely the way we want it to.
If I could remove that, I could reveal our values, since we're not going to show anything special, but you can see we're pulling in yesterday's image. We're operating on the pre environment and we set dry run to false. Dry run being false is very important, because that controls a few things. One is the building of the proper pipeline: we want to make sure that we pull credentials that are not read-only, so we're using our writable service key instead of our read-only service key. But our diff shows the same thing: we're going to update our image.
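As a rough illustration of that behaviour — the variable names here are made up, and this is not the actual deploy script:

```yaml
# Hypothetical sketch: a dry run sticks to read-only credentials, a real run
# switches to the writable service key so the upgrade can actually be applied.
deploy:
  script:
    - |
      if [ "$DRY_RUN" = "true" ]; then
        export SERVICE_KEY="$READ_ONLY_SERVICE_KEY"
      else
        export SERVICE_KEY="$WRITABLE_SERVICE_KEY"
      fi
```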
That's the only change for both the Sidekiq and the dependencies containers, and we can see down here that the upgrade actually started, and now we're waiting for that deployment to be ready. I could sit here and do a watch with kubectl on the Sidekiq deployment, because this is pre and I have access to this cluster locally.
I mean, it does make this a bit simpler, because then you don't have to figure out the timestamp format, or the pipeline check, which is a different format than the image, so that would make this a little bit simpler. Yes, I'm a little bit worried that this image we're going to test with next might be, you know, more recent than what's on pre, because it was yesterday's, but we may have...
Instead of having some awkward default, our Helm file is configured to query the cluster to figure out what version of the Sidekiq image is running. That way, if we ever need to make a configuration change that does not include an image tag environment variable, it will automatically populate our Helm template with the appropriate Sidekiq image. That way, we don't accidentally change the image value unnecessarily.
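A minimal sketch of that idea, assuming it is implemented with Helm's built-in `lookup` function; the deployment name, namespace handling, and value names are guesses, not the actual chart:

```yaml
# Hypothetical: fall back to the image currently running in the cluster when
# no image tag is supplied, so config-only changes don't touch the image.
{{- $current := lookup "apps/v1" "Deployment" .Release.Namespace "gitlab-sidekiq" }}
{{- $runningImage := "" }}
{{- if $current }}
{{- $runningImage = (index $current.spec.template.spec.containers 0).image }}
{{- end }}
image: {{ .Values.sidekiq.image | default $runningImage }}
```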
Okay, well, the production cluster... well, all clusters except for pre are now updated with a version of the image. So we've got a few things we need to look into. One is improvements to validating that our images are being built properly, because the last two that should have been built are not available for whatever reason. So we need to figure out why that is, and maybe add an improvement to our pipeline to check the registry that the image is available. And then on the side...
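That registry check could be as simple as something like the following; the job name and image path are placeholders, and `docker manifest inspect` is only one of several tools that could do this:

```yaml
# Hypothetical pre-deploy check that the requested image tag actually exists.
check-image-exists:
  stage: check
  script:
    - docker manifest inspect "registry.example.com/gitlab/gitlab-sidekiq-ee:${SIDEKIQ_IMAGE_TAG}" > /dev/null
```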
The important part is that we had a configuration change made in Chef that was not reflected in Kubernetes. This is something that I'm documenting today; I've written up a document that I'm about to submit in a merge request. But we don't have any way of saying: hey, dear person making this configuration change, this impacts Kubernetes, please make sure that we get a pipeline configured to apply that change to our Kubernetes environments as well.
We've enabled the vertical pod autoscaler. This is not something we're going to turn on fully; in other words, we're not going to enable VPA to adjust the pod resource requests and limits automatically, because that's not something you want to do with HPA enabled, but it does give us some recommendations on what the VPA thinks is good.
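For reference, a recommendation-only VPA looks roughly like this; the object and target names are illustrative, not our actual manifests:

```yaml
# Hypothetical VPA in recommendation-only mode: it reports suggested requests
# but never modifies the pods, so it doesn't fight with the HPA.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: sidekiq
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: gitlab-sidekiq
  updatePolicy:
    updateMode: "Off"
```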
What are good values for limits and requests? I think we'll start with staging. Staging is only interesting for sidekiq-export, because that's where we have load generated. I don't think there's anything here that is too surprising. These terms — lower bound, target, uncapped target, and upper bound — I defined them up here.
I don't think I 100% understand what they are, but this is kind of a high-level explanation. Note that these are not limits but requests, so this isn't recommending what the limits should be, but the requests, and the request is the amount that's reserved for the pod when it gets provisioned on the cluster. So I guess what we want to compare is the requests that we have now versus what it is recommending. You can see that these are the current requests for CPU and memory.
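For context, the fields we're reading come from the VPA status and have this shape; the numbers below are made up purely to show the structure:

```yaml
# Shape of a VPA recommendation (status section); values are illustrative.
status:
  recommendation:
    containerRecommendations:
      - containerName: sidekiq
        lowerBound:     { cpu: 50m,  memory: 150Mi }
        target:         { cpu: 100m, memory: 250Mi }
        uncappedTarget: { cpu: 100m, memory: 250Mi }
        upperBound:     { cpu: 500m, memory: 1Gi }
```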
Interestingly, for Sidekiq, you can see that it actually thinks we should have much higher requests. I think this makes sense. Skarbek, maybe you even chimed in here, but I think what it's telling us is that our requests are too low, and that kind of makes sense, because they are pretty low as they stand now, right?
So I think it's even more interesting for the Registry. The problem we're trying to solve, what this issue is about, is the fact that we're getting a lot of evicted pods in production, and it appears that the node is just running out of memory. It's not that we're hitting the memory limit for the pod; we're hitting the memory limit for the node.
Yeah, yeah. So just imagine you start adding a bunch of idle pods, and then they start receiving traffic and the memory grows on each of them, and then boom, you run out of memory on the node. Yeah, so I think we just increase this slowly. We start with maybe a CPU request of 100 and a memory request of maybe 150, and do it in increments.
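Assuming those numbers mean 100 millicores and 150 MiB (the units weren't stated), the first increment would look something like this in the chart values:

```yaml
# Hypothetical first increment for the registry pods; units are assumptions.
resources:
  requests:
    cpu: 100m
    memory: 150Mi
```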
So I kind of want to take a small step first, just to make sure there isn't some assumption that we're missing here. I'd like to take a baby step first, then look at the behavior of the cluster, and then decide what to do next, because what we want to avoid is not having the node capacity, since it takes time to scale up, and ending up with something that affects the service in production.
Just a quick review of what's left in the epic. I think pretty much everything is actively being worked on, and this one has a doc. Yeah, so everything is assigned right now, so I think we're in pretty good shape there. Also, with regard to Postgres, given the nature of project export I just don't think it's putting a lot of load on it; it's bursty, so I think we'll probably be okay.
But it is really easy for things to creep into the scope. So how about this: on Monday we roll out a change to production so that we can start taking the traffic, and then have a task until Wednesday to manually bump Sidekiq if we don't have this done by then. When I say this, I mean the...
Anyway, what we could do is just set the jobs that do the triggers to allow_failure: true. That way we're not blocking the pipeline, but we could, outside of the deploy, investigate why Sidekiq is failing to deploy. That alone will get us some information as well, and we can make improvements in parallel to rolling this stuff out to production.
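A sketch of that, with a hypothetical job name and downstream project path:

```yaml
# Hypothetical: the trigger job may fail without blocking the rest of the pipeline.
trigger-sidekiq-deploy:
  stage: deploy
  allow_failure: true
  trigger:
    project: my-group/k8s-deploy-tooling
    strategy: depend
```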
We have alerting for it, right? So we can leave a change issue or something open, and then just have the SRE on call be aware: hey, this is where you go when you see this alert, and this is what you need to do to scale down so that production can continue as normal. And I kind of expect that over the full week we won't have a problem, but I'm not psychic.