From YouTube: Database Decomposition Overview
Description
Pat gives an overview of the decomposition work.
Notes (internal) https://docs.google.com/document/d/1c_NBT-5GSCYhs7CZP9PGQACsP6_wUWCCpDVqVn782dI/edit#
A
Does everyone have the link to the doc? Okay, so yeah, Alex asked me to give an overview of the decomposition work that we have done, so that everyone is on the same page. I'm not sure what everyone knows about it. Some people, like Simon, have been working on other things a lot, and Matt too.

A
Hopefully I don't say too many things that people already know, but at least hopefully everyone will understand the basics. So just for the overview, the current status of it (and I'm not super up to date on it): phase four was fully deployed on production, and that's using a separate connection for the CI and main database, but still pointing at the same primary.
A
There's been some things with migrations, and how that's working in migration 2.0, that need to be addressed first, and also, I think, they're still discussing and testing the rollout plan and how that's all going to play out. And this is not something I'm very familiar with either, because I didn't really work on this, but just for everyone's understanding: in CI we have these query analyzers that run, and that's basically supporting how we make sure that we're not doing these cross-database queries. I put the directory where they live in the document there.
A
Really, someone from the sharding team knows more about that than me, since they did all that work, but basically we're detecting that a query doesn't join two tables that are in different databases, and also that we're not talking to two databases in the same transaction, or writing to two databases in the same transaction.
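The checks described above can be sketched roughly like this; a minimal toy illustration, not GitLab's actual analyzer code (the real analyzers hook into ActiveRecord query instrumentation), with the table-to-schema mapping and the regex invented for the example:

```ruby
# Toy cross-database join detector. The real analyzers inspect parsed
# queries; here a naive regex stands in for table extraction.
SCHEMA_FOR_TABLE = {
  'projects'     => :gitlab_main,
  'users'        => :gitlab_main,
  'ci_builds'    => :gitlab_ci,
  'ci_pipelines' => :gitlab_ci
}.freeze

# Pull table names out of FROM/JOIN clauses (not a real SQL parser).
def tables_in(sql)
  sql.scan(/(?:FROM|JOIN)\s+"?(\w+)"?/i).flatten.uniq
end

# A query is a cross-database join if its tables map to more than one schema.
def cross_database_join?(sql)
  schemas = tables_in(sql).map { |t| SCHEMA_FOR_TABLE[t] }.compact.uniq
  schemas.size > 1
end
```

The same kind of bookkeeping, tracking which databases a transaction has touched, covers the "two databases in one transaction" case.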
A
If you weren't aware of that: I know people have questioned before how we know that that's happening. Also, CI now is working in decomposed mode by default. So if you want to run single database tests for some reason, there's a flag you can add to the MR, a single database flag, and that will also run the pipeline in single database mode, if you want to test something specifically, say for self-managed, and make sure it's working properly. But otherwise decomposed is the default on pipelines now.
B
Pat, on the query analyzers: the way we distinguish which tables belong to which database is still the YAML file, right?
A
Right, as far as I know. I have not heard discussion of plans to get rid of that file. I think that's still the way we have it working right now, and I linked to the file there, this gitlab_schemas file, if people weren't familiar with that. That's basically where we have it.
A
We have a gitlab_schema, which is basically an abstraction over the database. So there's gitlab_main, which is saying that anything that talks to the main schema can talk to the main database and the shared database tables, and then the same for CI: it can talk to CI and then the shared database tables.
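The rule Pat describes can be modeled as a mapping from a connection to the schemas it may touch. A minimal sketch, with the mapping invented for illustration (in GitLab the real source of truth is the gitlab_schemas YAML file):

```ruby
# Which gitlab_schema groups each database connection is allowed to access:
# each connection sees its own schema plus the shared tables.
ACCESSIBLE_SCHEMAS = {
  main: %i[gitlab_main gitlab_shared],
  ci:   %i[gitlab_ci gitlab_shared]
}.freeze

def accessible?(connection, schema)
  ACCESSIBLE_SCHEMAS.fetch(connection, []).include?(schema)
end
```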
A
So yeah, that ties into the next part, which is how the migrations work with this migration 2.0. I'm sure you saw the things in Slack about that and everything, but the general overview, if you're not familiar, is that we're making a hard rule, basically, that we have to split the DDL and the DML in migrations, and this goes way back to the very beginning of the project, basically to ease the transition.
A
So we're splitting the DDL and the DML, and in a DDL type migration you don't have to do anything differently than you would with a migration now; it's just going to run that DDL change against both databases. And then with the DML, that's when you're putting in this restrict_gitlab_migration.
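The split can be sketched with a toy migration base class. This is not GitLab's actual migration framework, just an illustration of the rule: an unrestricted (DDL) migration runs against every database, while a migration declaring restrict_gitlab_migration (DML) runs only against the database that owns its schema. Class and schema names are invented:

```ruby
# Toy model of the DDL/DML scheduling rule described above.
class Migration
  DATABASES = %i[main ci].freeze
  SCHEMA_TO_DATABASE = { gitlab_main: :main, gitlab_ci: :ci }.freeze

  class << self
    attr_reader :restricted_schema

    def restrict_gitlab_migration(gitlab_schema:)
      @restricted_schema = gitlab_schema
    end

    # Which databases this migration will execute against.
    def target_databases
      return DATABASES if restricted_schema.nil? # DDL: run everywhere

      [SCHEMA_TO_DATABASE.fetch(restricted_schema)] # DML: one database only
    end
  end
end

class AddColumnToProjects < Migration; end # DDL, no restriction

class BackfillProjectData < Migration # DML, restricted to gitlab_main
  restrict_gitlab_migration gitlab_schema: :gitlab_main
end
```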
A
So we have the single database, which I call the tracking database, basically where the records go; that's where the tracking data for the migration lives. But once the migration job starts running, there's really nothing preventing it from being able to talk wherever it needs to talk.
A
So you can still do the migration-local models, and add something that talks to CI and something that talks to main, and as long as you're not cross-joining and things, of course, there's no reason you can't do that. And that's also kind of the issue that we had with finalize recently; it came up, and we had a call yesterday with Camille.
A
So then the normal connection handling doesn't work quite the way we expect it to. From that call, Camille mentioned, and it makes sense, that really from the beginning it would probably have been better if we were using ApplicationRecord rather than ActiveRecord::Base to talk to the main database. Then we are only ever using something that's an ApplicationRecord, and we know we're always getting the correct connection.
A
Oh, and the other thing too, with the migrations, is there was this new addition of a class called MigrationRecord, which is basically a replacement for ActiveRecord::Base. So if you want to write a model local to the Rails migration, rather than inheriting from ActiveRecord::Base, you would inherit from this MigrationRecord. That's basically just an abstraction there.
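The indirection is simple to picture; a sketch with a plain class standing in for ActiveRecord::Base (all names illustrative):

```ruby
class ActiveRecordBase; end # stand-in for ActiveRecord::Base

# Migration-local models inherit from this instead of ActiveRecordBase
# directly, so the base class can be changed later without editing every
# migration that defined a local model.
class MigrationRecord < ActiveRecordBase; end

# A model defined inside a migration:
class Project < MigrationRecord; end
```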
C
A
I think it just works like any other model would work, but it's just so we're not referencing ActiveRecord::Base directly. So if we need to change that later down the road, that's easier to do. But yeah, I don't really know exactly how it works; that's a question for Camille, probably. Okay, and then the other piece of that ties in with the background migrations.
A
So we have the SharedModel there, which I think everyone is probably familiar with to some extent. Basically, what we're doing there is allowing the connection to be overridden on a per-thread basis. So we have these models that inherit from that: the batched background migration models, the partitioning models, all the pg schema tables that we have, like pg_index and pg_class, that wrap the pg_catalog stuff. They're all using the SharedModel, and then basically you just wrap things in a block and it's going to override the connection for the duration of that block. So you can point it to main.
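The per-thread override can be sketched with a thread-local. This is a toy version of the idea, not GitLab's actual Gitlab::Database::SharedModel implementation; the symbols used for connections are invented:

```ruby
# Minimal per-thread connection override, as described above: wrapping code
# in a block swaps the connection for the duration of that block only.
class SharedModel
  def self.using_connection(connection)
    previous = Thread.current[:shared_model_connection]
    Thread.current[:shared_model_connection] = connection
    yield
  ensure
    Thread.current[:shared_model_connection] = previous
  end

  def self.connection
    Thread.current[:shared_model_connection] || :main
  end
end

# e.g. a batched background migration model inheriting the behavior:
class BatchedMigration < SharedModel; end
```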
C
A
So that would be where this gitlab_shared schema comes in. That would include all those tables: the pg_catalog tables would all be part of gitlab_shared, the information_schema is part of gitlab_shared, and any database-specific tables that we would need to access outside of the application are also part of gitlab_shared.
A
And
so
then,
how
that
works
specifically
with
the
background
migration.
So
this
is,
we
have
the
legacy
background
migrations
in
the
batch
background,
migrations.
A
So
for
the
old
background
migrations,
we
have
two
workers
from
there's
a
worker
for
maine,
there's
a
worker
for
ci
and
they
basically
know
which
database
they
need
to
talk
to
it's
like
hardcoded
in
the
worker
and
then
when
we
schedule
we
would
schedule
the
job
when
the
migration
runs.
We
use
the
same
method
that
we
use
now,
which
is
like
the
cube
background
migration
by
integra
or
whatever,
and
it's
basically
it
would
look
at
like
the
connection
name.
So
it's
going
to
say:
oh,
the
connection
name
is
main.
A
Therefore, I'll get the worker that talks to main, and then everything keys off that, and if it's CI, then it would do CI. So that's how it's scheduling through Sidekiq to the specific worker: basically by the connection name of the database it's talking to. So, like, in the current state that we have right now.
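The routing described above amounts to a lookup from connection name to worker. A minimal sketch; the worker identifiers are illustrative, not GitLab's actual class names:

```ruby
# Route a background-migration job to the worker dedicated to the database
# (connection) the migration runs against.
WORKER_FOR_CONNECTION = {
  'main' => :background_migration_worker,
  'ci'   => :background_migration_ci_worker
}.freeze

def worker_for(connection_name)
  WORKER_FOR_CONNECTION.fetch(connection_name) do
    raise ArgumentError, "no background migration worker for #{connection_name}"
  end
end
```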
A
It's still hard-coded to use main only, because we're not really 100% sure how we'll roll it out, and we had that migration 2.0 which isn't finalized yet. So right now everything is still being scheduled directly on main, which is fine, since main and CI are the same primary right now. So the CI worker is basically never doing anything, because nothing is ever being scheduled against CI, and that's how it will be on self-managed too: that CI background migration worker is just never going to have a job scheduled against it.
A
So it will never have anything to do. For the batched background migrations, it's a little different, since we're using Sidekiq-cron to run everything. So it's not based on the jobs being scheduled; it's based on the database state. For that, we also added two workers, so there's, again, a worker for main and a worker for CI.
A
They know which database they're looking at, basically. The CI worker would basically always be running, but there's not any work for it to do right now; or rather, we wouldn't want it to run, basically, because it would see the same jobs that main is seeing.
A
We
just
have
a
feature
flag
there
and
the
future
flag
is
turned
off.
So
then,
once
we're
past
the
the
cut
over,
we
have
two
separate
databases
and
the
data
is
in
the
right
state.
Then
we
could
turn
the
ci
feature
flag
on
and
then
that
worker
will
just
start
picking
up
jobs
in
the
ci
database
and
running
them.
A
So that's the part that I think there's still discussion around; I'll find the issue and link it in the doc. But we talked a little bit about the rollout process for the background migrations, and for the legacy migrations particularly, I don't know that there's really a clear plan for how that would work.
A
It will probably basically be manual: we'll have to say, at some point prior to the cut over, don't schedule any new background migrations, because there's not really a great way to handle that, with it being in Sidekiq but having state in the database, and there's no way to pause execution. So it's just running sort of through the cut over, and things might get mixed up a bit.
A
So I think, for the old background migrations, there's going to have to be some manual process to manage that, or a process step ahead of time, to basically prevent that from happening. For the batched background migrations, what I suggested (and I think Camille had a different solution, which maybe is better) is, at least at a bare minimum: each of the workers has a feature flag, so we can turn those feature flags off, and then neither worker will do anything while we do the cut over.
A
At
that
point,
we
have
the
main
database
and
the
ci
database
that
data
is
mirrored
at
that
point.
Basically,
you
would
have
to
like
go
in
the
main
database
and
delete
any
ci
related
migrations.
If
there
are
any
in
the
ci
database,
you
delete
any
main
related
migrations.
Then
you
turn
the
features
flag
back
on
and
it
just
picks
up
where
it
left
off.
Everything
should
be
good,
so
I
think
what
camille
suggested
in
the
issue
was
actually
now.
C
So the migrations get scheduled on either database, but when the job actually runs, does it have a connection to each one, or does it only get a connection to its own database?
A
It gets passed the connection to its own database, and then if you want to talk to the other database, you just have to create a model or something, or use the ApplicationRecord, basically, to talk there. And the reason it passes the connection to its own database is so that we can have generic jobs, like the copy column job that we use for the primary keys.
C
A
Yeah, right, we did talk about that, and, you know, at this point...
A
That's
a
fair
question:
I
think
we
went
back
and
forth
on
that.
You
know
at
the
time
when
we
were
deciding
this
and
I
think
the
reason
we
decided
to
keep
it
fully.
Isolated
was
just
thinking
that
gives
us
more
operational
flexibility
because
we
can
basically
say
like
the
ci
worker
is
its
own
thing.
We
can
just
turn
that
off
or
we
could
pause
migrations
on
a
one
database
or
the
other,
and
it
just
seemed
like
a
better.
A
The one point that was brought up against that, which also makes sense, is that if we continue to decompose databases in the future, and down the road there's like five or ten or whatever databases, then maybe that becomes unwieldy, because then you have ten workers. But that's for the future; no one cares right now.
C
A
Yeah, so it'll be a little different for sure, because the GitLab migrations on main and CI can potentially run in parallel, like one on main and one on CI. But on self-managed, everything will still run basically single threaded, because for now they're only going to have one database, so the CI worker will still just be off by default.
A
Basically, I think it's going to be kind of tricky for the time being, and it's going to be new for the development team: figuring out whether they're using the right connection, how they're talking to where they need to talk, and how that can be confusing.
A
I know Camille just recently added a cop basically saying, in a background migration job, don't use ActiveRecord::Base; only use ApplicationRecord or Ci::ApplicationRecord. That helps clarify that you always know when you're talking to main and when you're talking to CI. And we're also trying to enforce that (or I added a cop, which is in review) for the background migration jobs, at least for the batched background migrations.
C
A
You
know,
there's
other
database
features
we
made
to
work
with
multiple
databases.
Those
are
pretty
straightforward.
I
think,
like
partitioning
is
pretty
straightforward.
It's
just
running.
I
think
it's
iterating
over
all
the
models
that
are
there
and
saying:
oh,
this
model's
on
main
connect
to
main
and
run
the
partitioning
management.
Oh,
this
model's
on
ci
connect
to
cine1,
so
you're
only
going
to
ever
have
the
partitions
for
those
tables
where
that
table
would
naturally
live,
even
though
the
schema
is
mirrored
like
a
ci,
or
I
mean
like
auto
events,
for
example,
is
partition.
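That per-database iteration can be sketched as grouping the partitioned models by the database their table lives on, then running partition management against that connection only. A minimal illustration with invented model entries:

```ruby
# Partitioned models and the database each one's table naturally lives on.
MODELS = [
  { table: 'audit_events',       database: :main },
  { table: 'ci_builds_metadata', database: :ci }
].freeze

# Returns { database => [tables to manage partitions for] }, so each
# connection only ever creates partitions for its own tables.
def partitioning_plan(models)
  models.group_by { |m| m[:database] }
        .transform_values { |ms| ms.map { |m| m[:table] } }
end
```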
A
When the schema gets dumped, it says: oh, this table has other tables that are dependent on it, so I'm going to move it to the very top of the dump file, so that I make sure I create that table before I create the partitions. Which is why you see this diff of the partition tables moving back and forth all the time, which is really unfortunate; there was discussion about that at some point.
A
And the re-indexing is also a multi-database feature that works, again, the same way. I think those tables are all shared between the databases, and the rake tasks and everything all work; they take a database parameter now, so you're just saying: I want to re-index index x on the main database, and it will run there. And I think it's also split so that eventually there could be two separate crons that run it, and we can run re-indexing in parallel in both databases on the weekends and things.
C
It all makes sense to me. Do you know much about... on the call yesterday, Camille was talking about pressure on PgBouncer due to increased numbers of connections. That's just because we're basically doubling all our connections during phase four, right?
A
Yeah
so
that
yeah
that
will
be
resolved
once
because
there's
a
separate
pg
bouncer,
separate
pg
bouncers
in
front
of
the
ci
cluster.
So
once
ci
starts
pointing
at
the
ci
cluster
for
real,
then
those
will
go
to
a
separate
set
of
pg
bouncers
and
then
we'll
have
this
really
the
same
distribution
we
have
now
or
even
less
really,
because
we
probably
need
less
connections
overall,
okay,
so
yeah,
that's
just
a
temporary
state!
We're
in
it's
a.
B
And today you mentioned that we split connections to the primary. Do we also split them to the replicas already?
A
Yeah, the replicas are already split; that was phase three, the prior phase. The CI cluster is being replicated from the primary, just like the main replicas, and phase three was already reading from it, so we're already doing CI reads from the CI cluster.
B
A
Well, I guess, are you talking about the main or the replicas? I mean, the primary or the replica, their writes or their reads connections; there's a lot of terminology.
A
Well,
for
the
for,
like
a
read,
write
connection,
it's
the
same
so
there's
main
connection,
there's
ci
connection
they're,
all
pointing
at
the
same
primary,
because
there's
only
one
primary
and
then,
but
for
the
there's
one
set
of
there's
like
the
current
set
of
replicas
that
are
coming
from
the
primary
which
are
for
maine
and
then
there's
a
new
ci
cluster,
which
is
coming
also
being
replicated
from
maine,
and
so
the
ci
connections
are
pointing
at
the
new
this
new
ci
cluster
for
the
reeds.
Only
though.
B
Okay, so for replicas we already have physically separated things, you could say. Okay, yeah, that makes sense. Is there some sort of overview for the infrastructure side of this? Like, how does it look today, and what are the next steps?
A
That's a good question. I can look; everything is probably linked off the main epic that has the seven phases, which is epic 6160, and I think everything is probably still linked from there. But I haven't been following it as closely recently, because I wasn't really working directly on that anymore. I think phase six was more the infrastructure set-up and test phase: testing the rollout and rollback.
A
Thoughts or questions? I mean, I don't really know if there's anything else I can talk about; I can't think of anything right now.
B
What do you think is the biggest challenge in completing this, if you had to name one?
A
Yeah, well, I think that background migration rollout is still up for debate; I really have to find that issue. And I think also making sure the developers are using it the right way, and I think that's going to make the database reviews sort of tricky.
A
You
know
to
make
sure
that
that's
all
working
the
right
way.
So
I
think
that's
definitely.
You
know
like
a
big
challenge,
at
least
for
this
team,
and
then
I.
A
And then, of course, the actual cut over, which this team really won't be working on, but that's pretty tricky too. Especially because (and I don't even really know what the resolution to this has been) we have these shared tables that are being replicated, so after the split you basically have duplicate data in those tables.
A
So
somehow
there
has
to
be
like
a
way
to
you
know
like
say
the
you
know.
Like
I
mean
some
of
these
things
aren't
maybe
that
critical,
like
let's
say
like
reindexing
the
ring
indexing
tables
are
in
both,
so
that
doesn't
really
matter
if
the
data
is
on
the
ci
side
for
a
little
bit,
because
it's
not
going
to
affect
anything.
A
Let's
say
like
to
lose
foreign
keys,
that's
a
shared
table.
So
that's
another
feature,
that's
like
that.
So
basically,
then
you're
going
to
have
you
know,
data
living
on
main
and
then
that
same
data
is
mirrored
on
ci
and
then
somehow
you
have
to
break
that
apart
and
then
you
know
handle
that-
and
maybe
adam
has
already
coded
that
in
such
a
way
that
that
will
take
that
into
account.
I
don't
know,
but
I
think,
there's
some.
A
I
think
that's
not
really
super
well
known,
and
I
think
just
due
to
the
you
know,
desire
to
keep
the
cut
over
as
short
as
possible.
That's
like
tricky,
because
you
can't
doing
the
cut
over
and
then
running
all
these
scripts
or
something
like
that
to
clean
up
all
the
data
that
makes
it
sort
of
tricky.
A
But
yeah,
I
don't
know
that
there's
really
a
good
plan
or
or
I
don't
know
how,
how
far
they
have
got
with
that
actual
testing
of
the
rule.
Because
then
we
started
seeing
all
these
other
application
level
blockers
that
came
out
before
we
even
got
to
that
point.
So
I
think
that
there's
been
a
lot
of
focus
on
fixing
those
other
pieces
of
the
app
that
aren't
working
correctly.
A
Also, on the CI database, we would truncate all the main tables, and vice versa, and I think that's still the plan, at some point relatively soon after the cut over. But yeah, I don't know what the status of that is. And I just remembered what I think Camille had suggested for the batched background migrations.
A
It's basically that we would add a parameter, or a new column on the table, containing the gitlab_schema. Then the main worker and the CI worker would run, and they would only poll for jobs that have the schema they're looking for. So the main worker would only try to run jobs that have gitlab_main, or however it's flagged there. And that could be a temporary column that's just added for the duration of the cut over, and then once it's done, we could...
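The suggestion amounts to each worker filtering the shared job table by the schema it owns. A minimal sketch; the job records and the gitlab_schema column name are illustrative of the idea, not the actual table layout:

```ruby
# Jobs tagged with the gitlab_schema they belong to, as in the suggested
# temporary column on the batched-migrations table.
JOBS = [
  { id: 1, gitlab_schema: :gitlab_main },
  { id: 2, gitlab_schema: :gitlab_ci },
  { id: 3, gitlab_schema: :gitlab_main }
].freeze

# Each worker polls only for the jobs matching its own schema.
def jobs_for_worker(jobs, schema)
  jobs.select { |j| j[:gitlab_schema] == schema }
end
```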
B
I just thought we could do cascading truncates and see if we got all the foreign keys right. Yeah, we should have them right, because, you know, with all that analysis going on up front in the codebase.
A
Yeah, yeah. And there's the... yeah.