Description
Yorick Peterse and Alessio Caiazza chat about how to approach filling a column in a database for GitLab merge request https://gitlab.com/gitlab-org/gitlab/-/merge_requests/27219.
A
Then there were some proposals to use a different approach, like a temporary ID column, and some PostgreSQL-specific approaches, so we're going to discuss the pros and cons of those and see if we can make things work. Let's see, so we have the temporary ID one; I think we, or at least I, realized it wasn't going to be as easy as we'd hoped.
A
Basically, then, the alternative was to use the PostgreSQL internal row and page IDs. Basically, for every row, PostgreSQL has a sort of internal ID: it's a tuple of the page number and the row number on that page that the row is on, so it's kind of like a composite primary key. The benefit there is that you can fetch rows using those IDs fairly quickly, kind of like a primary key. The downside...
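For reference, this internal ID is the hidden ctid column. A minimal sketch of looking at it, with table and column names assumed from the merge request under discussion:

    -- ctid is PostgreSQL's hidden (page number, row offset) tuple for each row.
    -- Table and column names are assumptions from the discussion.
    SELECT ctid, deployment_id, merge_request_id, environment_id
    FROM deployment_merge_requests
    LIMIT 5;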
B
I think what I understood, reading the original document that you linked, is that basically you can have holes in the sequence: if you delete something, that one just gets deleted, so in that case the row will have nothing there. So if you select the whole table and you also ask for the ctid column, which is this one.
A
Let me see, because if I just do a select count it times out, but we can get an estimate if we just use the table statistics. So it estimates, yeah, seventeen million rows; that's probably more than what there actually is. 2,049 rows per page, so then we have eight, nine thousand pages, basically.
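A sketch of pulling that estimate from the planner statistics rather than a full count (the table name is assumed):

    -- Estimated row count and number of heap pages, from pg_class,
    -- since a COUNT(*) over the whole table times out.
    SELECT reltuples::bigint AS estimated_rows,
           relpages          AS heap_pages
    FROM pg_class
    WHERE relname = 'deployment_merge_requests';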
A
Say we start with (0, 0), not sure if it's one-indexed; then you basically increment that up to (0, 2049), and then you start over: you go (1, 0), (1, 1), etc., and basically keep doing that. So I think what you could do, in theory, is, given a page number, fetch all the rows on it, which is up to 2,049, and basically load those into memory. I think the way you do that is with a cast.
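A sketch of fetching everything on one page that way, by casting (page, offset) pairs to the tid type (names and the page number are assumptions; row offsets within a page actually start at 1):

    -- Fetch all rows that live on heap page 123 by building their ctids.
    SELECT ctid, deployment_id, merge_request_id, environment_id
    FROM deployment_merge_requests
    WHERE ctid = ANY (ARRAY(
      SELECT format('(%s,%s)', 123, n)::tid
      FROM generate_series(1, 2049) AS n
    ));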
A
Yeah, there's a query here; let's see how that performs with explain analyze. So that's still... I can see a Tid Scan there, whatever it did. Yeah, so the IN would then contain 2,049 values for every page, and I think what you can then do is, given those rows, I guess you would fetch the unique deployment IDs and then do an update for every unique deployment ID to set the environment ID. So you get something like...
A
And then, like that, a bit bigger terminal, and that can go away. Then I'll share my screen again; I've just got the whole thing there. There you go. All right, so here on the right we have the SQL, so we basically end up at some point with something like: update deployment_merge_requests set environment_id = x where deployment_id in (...).
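A sketch of that update for one batch (table and column names are assumptions from the discussion, and the deployment IDs shown are hypothetical; the environment ID for each deployment would come from the deployments table):

    -- Fill environment_id for one batch of deployment IDs, taking the value
    -- from the corresponding row in deployments.
    UPDATE deployment_merge_requests dmr
    SET environment_id = d.environment_id
    FROM deployments d
    WHERE d.id = dmr.deployment_id
      AND dmr.deployment_id IN (101, 102, 103);  -- hypothetical batch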
A
Yeah, yeah, so we have to deal with the duplicates first, I guess. But I think removing the duplicates itself is a little easier, because what you essentially can do is a select count and then group by, I guess, the combination of deployment ID and merge request ID, or whatever the composite is, and then you get the rows where the count is greater than one and just remove them.
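A sketch of that duplicate check (column names assumed; the composite is whatever actually makes a row unique):

    -- Find (deployment_id, merge_request_id) pairs that occur more than once.
    SELECT deployment_id, merge_request_id, COUNT(*) AS occurrences
    FROM deployment_merge_requests
    GROUP BY deployment_id, merge_request_id
    HAVING COUNT(*) > 1;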
A
So you get the rows where the count is greater than one. Then, again, you have to fetch those rows, the duplicate ones, and do, I guess, a delete with a sub-select or something where you basically limit it, so that you only remove all but one row. Off the top of my head, that's something like a delete from the table where...
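One way to express that "all but one row" delete without a primary key is to lean on ctid again; a sketch under the same assumed names (on a table this size it would have to be batched rather than run as one statement):

    -- Keep one arbitrary row per (deployment_id, merge_request_id) pair and
    -- delete the rest, addressing the extras by their ctid.
    DELETE FROM deployment_merge_requests
    WHERE ctid = ANY (ARRAY(
      SELECT ctid
      FROM (
        SELECT ctid,
               row_number() OVER (PARTITION BY deployment_id, merge_request_id) AS rn
        FROM deployment_merge_requests
      ) numbered
      WHERE rn > 1
    ));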
A
Yeah, I think in this case we have to really break it down into, let's say, ten separate queries first and then see how we can sort of stitch those together. I'm just trying to think what the final query is going to look like and my head already explodes. Let's do this: how can we best test this without blowing stuff up? I will just start a Rails console, and there are probably some production engineers looking at this going: oh, production.
A
These are just all the rows, okay. Okay, so two things: we need to get rid of these duplicates, which means we have multiple rows where the merge request ID is the same and the deployment ID is different, but the environments they point to are the same. Yes; in other words, deployments A and B may point to environment A, and if there are, you know, two occurrences of a merge request, we only want one.
A
Normally, if you have a primary key, we can do something like: delete from blah where the ID is not the maximum of something. We don't have that here, so we have to delete by the ctids, I guess. Boy. Let's see, so, with 2,049 rows per page, how many pages do we still need to do? Let's say seven thousand... sorry, I think it was nine thousand, no, eight thousand six hundred, yeah.
A
226, interesting, so it keeps producing that. I wonder if, then, the...
A
Yeah, it's going to be tricky. The other option that I saw is that we can use cursors, which is basically stateful pagination on the database side, but you have to use a transaction for that, because they're scoped to a transaction. I think what we could do in that case is: you start a transaction, you get a limited number of rows, let's say 5,000 or whatever, do your updates, and then sort of move on, but I'm not sure how you'd figure out what sort of offset to use for the next transaction.
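A sketch of that cursor idea in plain SQL (in practice a migration would drive this from Ruby; the cursor name, table name, and batch size are assumptions):

    -- Cursors give stateful, server-side pagination, but only live inside
    -- a transaction, so the whole batch loop has to run within one.
    BEGIN;
    DECLARE dmr_cursor CURSOR FOR
      SELECT ctid, deployment_id
      FROM deployment_merge_requests;

    FETCH 5000 FROM dmr_cursor;  -- first batch; repeat until no rows come back
    -- ...run the updates for this batch, then FETCH the next one...
    COMMIT;                      -- ending the transaction closes the cursor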
A
So instead, it's an entirely valid solution to basically duplicate this table. There are some SQL statements we can actually use where you can create a copy of the table, including the indexes and everything; you fill it up and then, basically, you rename it. The issue then is that you have to keep those tables in sync, so you have to, say, use triggers to make sure of that, yeah.
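The statement being alluded to is presumably CREATE TABLE ... LIKE; a sketch, with a hypothetical name for the copy:

    -- Create an empty structural copy of the table, including indexes,
    -- defaults and constraints. The copied indexes get auto-generated names.
    CREATE TABLE deployment_merge_requests_copy
      (LIKE deployment_merge_requests INCLUDING ALL);
    -- Keeping the copy in sync with writes to the original while it is being
    -- filled would need triggers on the original table.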
B
So I'm not just replacing the table; I'm copying the incomplete data, in a complete form, into the new one. The duplicates will be deleted by this insert operation. In the meantime, the new code is already running, because we deploy it and it's a post-deployment migration. So the new code is already running, so every new deployment will get the full tuple, so also the environment.
A
So that could work. There's one issue there: if you do the delete of the old data and then insert the new data in two separate transactions, there's going to be a short period of time where merge requests won't be associated with deployments, because we've basically deleted them but haven't copied them over yet. That is effectively a gap.
A
That's going to take a while. I guess what you could do is sort of narrow that down, where instead of saying "delete everything, insert everything", you do it in groups: you basically iterate over the temporary table, grab, let's say, a thousand rows, and then you say, hey, in the target table, all rows with these deployment IDs, remove them, insert the new ones, etc. It still means that temporarily some of the data is not there, but the window is going to be much shorter.
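A sketch of one such group, under the same assumed names (the copy table name and the deployment IDs are hypothetical):

    -- Replace the rows for one batch of deployment IDs in the live table
    -- with the cleaned-up rows from the copy.
    BEGIN;
    DELETE FROM deployment_merge_requests
    WHERE deployment_id IN (101, 102, 103);            -- hypothetical batch

    INSERT INTO deployment_merge_requests (deployment_id, merge_request_id, environment_id)
    SELECT deployment_id, merge_request_id, environment_id
    FROM deployment_merge_requests_copy
    WHERE deployment_id IN (101, 102, 103);
    COMMIT;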
A
The table-swapping approach doesn't have that problem: assuming the schema matches, etc., you start a transaction, you just rename them, and you're done. That's a very cheap operation. The issue there being that, if you copy the table or set it up yourself, you probably also have to fix the names of all the indexes, sequences, etc.
A
So we use your approach, right: we create a temporary table and we fill it up with all the appropriate data, etc. Then at some point we determine, hey, we're ready to swap. We start a transaction, we basically lock both tables, then I guess we have to do one final check to make sure that any missing data is in this temporary table, we swap the names, commit, and then we delete the entire table that we have now, basically in favor of this new, quote-unquote, temporary table.
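A sketch of that swap step (table names are hypothetical, and the "final check" that copies over any stragglers is elided):

    -- Swap the freshly built table in for the old one. The renames are
    -- metadata-only and effectively instant; the lock blocks writes meanwhile.
    BEGIN;
    LOCK TABLE deployment_merge_requests, deployment_merge_requests_copy
      IN ACCESS EXCLUSIVE MODE;
    -- ...copy over any rows that arrived since the last sync...
    ALTER TABLE deployment_merge_requests RENAME TO deployment_merge_requests_old;
    ALTER TABLE deployment_merge_requests_copy RENAME TO deployment_merge_requests;
    COMMIT;

    DROP TABLE deployment_merge_requests_old;  -- and drop the old table afterwards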
A
That particular approach means, one, the table is compact, because we insert all this new data, so it might actually save some space; a benefit there. Also, the process of swapping is largely dominated by how long it takes to figure out, hey, do we have any remaining data that we need to take care of. From a coding perspective, the annoying part is mostly going to be fixing all the index names, sequence names, etc., so that they are the way Rails expects them to be.
A
I think PostgreSQL has a way where you can say "create table like this thing", and it will copy over the indexes and everything; they will just get some funny names. I think you can basically just do a little query that says: all these things, rename them to that. I think that is probably the least annoying approach; it certainly seems to be easier than the ctid approach.
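A sketch of that rename pass (both index names below are hypothetical, standing in for the auto-generated name and the Rails-style name the schema expects):

    -- After CREATE TABLE ... (LIKE ... INCLUDING ALL), rename the copied
    -- indexes back to the names Rails and the schema expect.
    ALTER INDEX deployment_merge_requests_copy_deployment_id_idx
      RENAME TO index_deployment_merge_requests_on_deployment_id;
    -- Renaming an index is a catalog-only change, so it is effectively instant.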
A
Let me see, so we swap the tables. I think technically you can take an existing index, for example, and change what table it points to, but then you have to start messing, I think, with the PostgreSQL catalogs internally, so it's probably just easier to recreate the index with a different name and then rename all of them, because, let me see, renaming indexes and everything is instantaneous.
A
That's a no-wait thing. And I think we've had this case in the past; we looked into that, I think, when we migrated the events table, and I think the approach makes a difference depending on the index types. Especially if you have trigram indexes, they take a long time to update, but for most regular b-tree indexes...
B
Yeah, I think that tomorrow I will also give it a chance and take a look at paginating over some other resources, because basically this table is linking three resources. So if I can paginate over, let's say, environments, and then use the ctid in the where clause to issue the update, maybe we can update this in place, right?
A
I think you could probably paginate by deployment ID, since, I mean, there are duplicates, but probably not that many, so that's probably saner and quicker. You'd have multiple rows with the same deployment ID, but if you basically sort them ascending, you can do a where clause with deployment ID greater than the last one. That might be even easier.
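A sketch of that keyset-style pagination (names, the starting ID, and the batch size are assumptions):

    -- Process the table in batches ordered by deployment_id: remember the
    -- largest ID in each batch and ask for the next batch above it.
    SELECT deployment_id, merge_request_id, ctid
    FROM deployment_merge_requests
    WHERE deployment_id > 0           -- last deployment_id from the previous batch
    ORDER BY deployment_id ASC
    LIMIT 1000;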
I'm not totally sure, but tomorrow I'll also take a look at the sort of table-swapping approach and see if that even makes any sense. All right, let me stop the recording.