Description
Enabling the project export queue in production https://gitlab.com/gitlab-com/gl-infra/production/-/issues/1736
B
Posted the previous week's goals in today's agenda, so we could start prioritizing goals. On the disk utilization metric fix: we pinged Ben, but I think we're going to take a look at this ourselves, which is perfectly fine; I just haven't had a chance to do that this week. So hopefully, if we are able to deploy successfully today, maybe that's something you can look at.
D
I don't really have much to cover here. It looks like we're in pretty good shape with logging. There's one outstanding issue, or one known outstanding issue, that I did uncover today while doing some additional failure testing: it looks like some log messages aren't being properly indexed by Elasticsearch. I'm doing a chart update to log 400 error reasons, so I'm working on that now to dig into what the problem is, but I added the issue to the epic, and that's the only outstanding issue we have right now, so otherwise logging is good.
D
B
So the last item on this list was determining why rollback did not occur after we had timed out a deployment. Turns out we had just missed an option in our helmfile. Configuration had been migrated from our Make-controlled set of helm command-line options into helmfile, which uses a configuration item for this. So this was actually a relatively easy fix, and it was done relatively quickly.
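A minimal sketch of the kind of helmfile setting that can get lost in such a migration; the exact option and file layout aren't named in the discussion, so the keys below are assumptions:

```yaml
# helmfile.yaml -- hypothetical sketch; the real repository layout may differ.
helmDefaults:
  wait: true      # block until resources report Ready
  timeout: 600    # seconds to wait before the release is considered failed
  atomic: true    # on failure or timeout, roll the release back automatically
```

With the flag-based invocation this behavior came from options like `helm upgrade --atomic --timeout`, which are easy to drop when moving to helmfile's configuration-driven setup.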
A
B
So I will share my screen. I'm gonna share the pipeline; I'll bump the font size for everyone's viewing pleasure. So I ran into an issue yesterday, where I also found two other secrets that were not populated in our GKMS vault, but these are already being populated inside of our infrastructure. So I ran a dry run earlier today with the corrections in place.
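For reference, a dry run of this kind of change could look roughly like the following; the environment name, release name, and chart path are placeholders, not the actual pipeline commands:

```sh
# Render and diff the release against the cluster without applying it
# (helmfile's diff subcommand requires the helm-diff plugin).
helmfile -e gstg diff

# Or, with plain helm, render the upgrade without committing anything:
helm upgrade --install --dry-run sidekiq ./chart -f values.yaml
```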
B
C
B
A
E
B
D
Yeah, I mean, I think what I was looking for was the reason for the failure on the Elasticsearch side, but Igor mentioned that that might be difficult to track down. So maybe time would be better spent increasing our logging on the client side, and you can either set the log level to debug, but that's really noisy, or the plug-in has a special option.
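The special option here is plausibly the fluentd Elasticsearch output plugin's 400-reason flag, which would line up with the chart update mentioned earlier; a minimal sketch, assuming fluent-plugin-elasticsearch, with the match pattern and host as placeholders:

```
<match gitlab.**>
  @type elasticsearch
  host elastic.example.internal   # placeholder
  port 9200
  # Logs Elasticsearch's stated reason for 400 Bad Request responses
  # without raising the whole client to the very noisy debug level.
  log_es_400_reason true
</match>
```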
D
B
B
Yeah, there's one remaining item that I recall, related to Bootsnap; that's going to improve the boot of the container. Okay, we're now Ready on this pod. The one thing I'm noticing, and I didn't really notice this until trying to push this into production, is that it takes a really long time for the Sidekiq pod to switch between Running and Ready. I think after this call I will spin up an issue to investigate why that is occurring.
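A sketch of how that investigation might start; the namespace and pod name are hypothetical:

```sh
# Inspect the readiness probe configuration and recent events for the pod.
kubectl -n gitlab describe pod <sidekiq-pod>

# Recent events, oldest first, filtered to readiness probe activity.
kubectl -n gitlab get events --sort-by=.lastTimestamp | grep -i readiness
```

Comparing the probe's initialDelaySeconds and periodSeconds against the observed boot time shows whether the gap is a slow application boot or an overly conservative probe.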
B
B
So we have our deployment; it's running version 12.8.5. We have our one pod running, and we're getting our memory usage; CPU usage, quota, and stuff will probably start populating after a little bit more time has passed. So we now have metrics, which is good. So let's look for our logs. Who's pinging me?
B
E
B
D
E
B
B
But it's looking for a queue called null; it's gonna look for work from a queue called null, which does not exist. Okay, this is simply here to make sure, one, the deployment worked and the configurations are sane, since the pods started up. So we know the configurations are sane, and we got past the issue that we found yesterday. So I think, really...
B
B
F
B
B
B
Let me make sure that image exists...
B
313835 this morning; six, seven, two, eight, one... seven, three, eight. Okay, so that image does exist in our registry. So this merge request, Jarv, it's number 164. This will do two things: it updates our staging image, that way it's just a sanity check that the image works in staging, and it will also update the image in production while simultaneously changing to the project export queue, allowing us to pull from project export.
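In the chart's values, that queue switch would look roughly like this; a sketch assuming the GitLab chart's Sidekiq pod definitions, not the actual contents of the merge request:

```yaml
gitlab:
  sidekiq:
    pods:
      - name: export
        # Previously pointed at a placeholder queue ("null") used only to
        # prove the deployment and configuration were sane; now pulls real work.
        queues: project_export
```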
D
B
B
D
A
Okay, just to be clear here: while what I do want to see is moving forward, I don't want us to move forward if neither of you is feeling confident that we should be, right? Like, if you need time to take a look at everything, go for it. I'm just wondering what kind of expectations we have with this double checking, right?
A
B
Double checking is going only as far as making sure the configurations look sane inside of the pod. Like, I want to be able to compare the configuration built inside of the pod to a Sidekiq node running in production. Beyond that, I'm confident in moving forward with the plan of action as is. I just want a bit of time to compare configuration files to make sure there's not something egregious that needs to be looked at or investigated.
B
B
E
C
A
B
Jarv, while I'm looking at configuration files: I updated that merge request with the ops pipeline. Do you want to just run through that with the people that are on this call, to show off the expected changes that we expect to see between the staging and production environments, and nothing in pre and canary?
D
Here's the ops pipeline. If we go to the pipeline, what we're expecting to see here are some changes for production and...
D
D
B
F
D
D
D
D
D
B
A
A
D
C
D
B
F
B
F
B
A
B
D
It'll be a brand new pod; we don't have any pods in production, so we didn't have to wait to cycle through all of them, since we have, like, no tests. So this is the first, and probably the only, deployment that's going to be this nice for production, because we're probably going to scale up. I don't know, we may not scale up that much, though. We'll see. We, we...
B
We have an issue to investigate what the deploys look like when you have more than one pod, because we may need to tune the timeout that helm waits for before it decides to roll back. These pods are taking a long time; like, it's been three minutes and the pod is yet to enter the Ready state. And if it's going to be like that for one pod, we're gonna run into a large issue when it comes to deploying in general.
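Tuning that window is the same `timeout` knob sketched in the helmfile snippet earlier, sized roughly as the number of pods times the observed startup time, plus margin. As a plain-helm sketch with placeholder names and value:

```sh
# Raise helm's wait window so a slow-starting fleet isn't rolled back
# mid-deploy. (helm 3 takes a duration string; helm 2 took plain seconds.)
helm upgrade sidekiq ./chart --wait --timeout 20m
```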
B
A
So, yeah, that means, in order of priority: we will keep this running, even if it's only with one pod, to collect data on how we are behaving with real-world workloads; and then, at the same time, increase the pressure on Distribution to actually work on speeding this up; and at the same time, for us, ensure that we don't deploy manually but with auto-deploy as well. There we go. If...
A
B
Create an issue; I just want to make sure it gets noted down, since I don't have the agenda on my screen. So the job succeeded, so we should now start being able to pull work. So I'm gonna pull up the logs, and because I can't auto-refresh, I'll just hit the button manually over and over and over again.
C
D
D
D
D
A
B
This is an alert that we have to ensure that we're not... wait, for clarity: did you get an alert or a page? A page? A page. So we probably have it as a page because, if we take, say, the GitLab registry, which is taking all the traffic: if we scaled up to its maximum pod count, something is probably wrong. So the fact that it exists is a valid alert/page, but we don't need to be paging for this, because we're in an experimental phase at this moment in time.
B
So we should silence that for quite a while, and we should probably revisit how those rules operate in situations such as this, because I think we were considering, at one point in time, maxing out the HPA to start with, because the time it takes for this pod to start was so lengthy at one point in time. So...
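Silencing could be done from the Alertmanager UI or with amtool; a sketch, with the alert name, matcher, and duration all placeholders:

```sh
# Silence the "HPA at max replicas" page while this service is experimental.
amtool silence add alertname="HPAMaxedOut" env="gprd" \
  --comment="Experimental Sidekiq k8s deployment; HPA max is expected" \
  --duration="4w"
```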
B
D
D
All of the project exports... and this will, in turn... there's a grace period of, like, 20 seconds or so to wait for the exports that are in progress to finish, but I don't really see much in the logs anyway, so I think it's going to be fine, and then the next job should hopefully be picked up by the pod. I think it's...