From YouTube: 2020-05-08 Sidekiq elasticsearch shard migration
Description
https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/237 , part of https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/112
A
All right, so the first thing we're going to discuss here is the Elasticsearch migration. I just wanted to bring up the conversation about the dedicated node pools. Sounds like we've agreed to put it on the default node pool for now, given that there isn't a better choice. I guess the issue for that is linked to the epic, if you want to comment on it. Does anyone have any additional thoughts on this, or not?
A
It's super easy. You know, it's a bit of a pain to decommission a pool, but as far as migrating to a new pool, that's super simple. And I think what we'll eventually do is move registry to a dedicated pool and then have this default pool as just the generic one we use for things that don't have a better place.
A
But that's cool. Next up is the Consul agent. So after Elasticsearch we're probably gonna look at GKE for WebSockets. The first thing we need to do is get the Consul agent installed in the cluster. I think we're just going to use the upstream chart for this; we're going to install it as a daemon set.

A
Unfortunately, they don't have an official, well, they have a chart, but it isn't published, so the way that they recommend installing it is to clone it and then install it locally. So we have some options here. One is that we mirror it to GitLab and then publish it on charts like David did. Or we could mirror it and add a sub-module to the project that's installing it, or we could just add the GitHub repo as a sub-module to the project that's installing it.
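For reference, a minimal sketch of the clone-and-install-locally flow described above, assuming the upstream hashicorp/consul-helm repository on GitHub and illustrative release and namespace names (none of these are confirmed in the discussion):

    # Clone the unpublished upstream chart
    git clone https://github.com/hashicorp/consul-helm.git
    # Install from the local checkout; the client agents run as a DaemonSet,
    # with the server component disabled because the Consul servers live elsewhere
    helm install consul ./consul-helm --namespace consul \
      --set server.enabled=false --set client.enabled=true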
C
Just, I mean, while we're suggesting those ideas, what about a pod that actually runs in Kubernetes that has that as the agent, and they make the call against that? The other thing is, it's not configured as, like, the system resolver, right? Like, the application has an IP address and a port in it, which means...
A
Why not? Yes, well, no, this is exactly what we're gonna do. We're gonna have the Consul agents running in a single pod as a daemon set, right, and then the problem is that we just need to install the chart, and the chart is managed upstream by HashiCorp Consul, and they don't publish it. They just have, like, a git repository, so we either need to mirror it or publish it ourselves. I just don't know what to do.
C
The question is whether this is, like, a one-off. If it's a one-off, it doesn't even really matter; it's not worth the discussion. But if this is a pattern, then it's worth considering, right? So how many other things do you think are there that are like this, that are going to follow the same pattern? Maybe, if you ask me, not that many, but, like, a handful, three or four, maybe.
A
I don't know. So I think doing it the sub-module way is the easiest, obviously, right? And having a sub-module that references GitHub is actually even the easiest thing, but I don't know if we want to have a dependency on GitHub for our stuff. So if we don't do that, that means we need to create a pull mirror and reference that.
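To make the two sub-module options concrete, a sketch with hypothetical paths (the mirror location and the vendor directory are illustrative, not decided here):

    # Option 1: sub-module referencing GitHub directly
    git submodule add https://github.com/hashicorp/consul-helm.git vendor/consul-helm
    # Option 2: sub-module referencing a GitLab pull mirror of the same repo
    git submodule add https://gitlab.com/our-group/consul-helm.git vendor/consul-helm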
B
I think we need the pull mirror in any case, you know. I mean, we remove a dependency, right, so one fewer thing that can go wrong. Let's say we need ops to deploy, right? So I would love to have the mirror on ops, because I don't want to be in a situation where everything is on fire, GitHub is down, and I can't deploy.
D
The only thing I've got is, I know we've been looking at the Elastic migration, or at least investigating it. I don't know how close we are to executing it, but I would like to stress that we really need to remove the allow-failure on the Kubernetes deployments in the deployer. We're still getting failures quite often; like, we had one today and I don't know why it failed, but it at least went through the entire pipeline before it got to the point where the job itself inside of Kubernetes failed.
D
See, I don't know how much you've been following along, but since we've had project export inside of Kubernetes, we've shifted over to the work that Scalability has been doing, and we have migrated over to shards now. So the memory-bound shard contains project export and some other random queue called new design version, something along those lines.
A
We have a bug in that sense. The GKE logs go to Elasticsearch, but we've seen a lot of errors because of the number of fields. And honestly, with Elasticsearch, you have to be very careful when you send logs to it, because if you have too many fields, it's not happy; if the data types change, it's not happy.
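The field-count problem being described maps to Elasticsearch's per-index mapping limit, which defaults to 1000 fields and can be raised per index. A minimal sketch, with a hypothetical index name and value:

    # Raise the total-fields cap on one index (name and limit are illustrative)
    curl -XPUT 'http://localhost:9200/gke-logs/_settings' \
      -H 'Content-Type: application/json' \
      -d '{"index.mapping.total_fields.limit": 2000}'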
B
There's no such thing as a post-deployment migration in Ruby on Rails; it's just something that we build on top of that, and so, schema-wise, either you run the migration or you don't. So when you do things locally and you have to prepare your merge request, you also have to run the post-deployment migrations, and this updates it, because...
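In GitLab's codebase this is controlled with an environment variable: post-deployment migrations run by default and are held back only when the variable is set. A sketch of the local workflow being described:

    # Run schema migrations while holding back post-deployment migrations
    SKIP_POST_DEPLOYMENT_MIGRATIONS=true bundle exec rake db:migrate
    # Then run everything, post-deployment migrations included
    bundle exec rake db:migrate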
B
It's just in the database; it's a schema version.
D
Lately I've been seeing a lot of SSL failures inside of our ops pipelines. I don't know where in our process that SSL failure is actually occurring, like whether it's reaching out to GitLab or whether it's reaching out to GitHub or whatever. I don't know yet, because I haven't delved deeply into it. But this is just a new one.
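One quick way to narrow this down would be to probe the TLS handshake against each endpoint the pipeline reaches out to; a sketch (the endpoints here are assumptions, not taken from the discussion):

    # Check the TLS handshake to each upstream the jobs talk to
    openssl s_client -connect gitlab.com:443 -servername gitlab.com </dev/null
    openssl s_client -connect github.com:443 -servername github.com </dev/null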
D
I've been capturing a list; so far I've got three items in issue 806, which is tied to the epic. I think you and I need to work together to figure out how to prioritize this particular issue, or epic, against all the other work that we've got going on. I just haven't had the time to pick up this topic at all.
A
Yeah, so normally, like 90 percent of the time, that happens because we roll stages, but if there's an omnibus change without a rails change, we need to tag CNG as well. So that's the fix for that. The fix for this I'm not really sure about; other than disabling the check, I'm not sure what else we can do there.
D
Can I suggest that, for this particular issue, we open up an issue with the charts for the distribution? I think proposing an option to skip this check would be wise, and I think it'll be met with good criticism, because you don't want this shipped to customers, but I think the ability for us to do this will be important, unless there's some other way.
D
Consider that there is a way inside of Kubernetes: you can run a kubectl command that will output something that gets stored in the event log when a failure occurs. The problem, though, is that this upgrade happens inside of Helm, so I don't know how to integrate a kubectl command that will look at a failed deploy and look back at the correct deployment to get that information, and I also don't know if we store the correct information in those logs.
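The event log mentioned here is queryable with kubectl; a sketch, assuming the release runs in a namespace named gitlab:

    # Recent warning events in the namespace, newest last
    kubectl get events --namespace gitlab \
      --field-selector type=Warning --sort-by='.lastTimestamp'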
D
So this would be something we would have to investigate, to see what we could do and whether we're storing the right data to get that information, so that we don't have to look at logs. But I'll create an issue to address that, because I'm curious about that as well. I'll add that to this exact topic; actually, that's the perfect place to put it. Okay.