Description
Andreas, Grzegorz and Yannis discuss alternative ways to improve the CI/CD data model.
A: Yeah, so hello everyone. This is a call about database partitioning, and I have the agenda document here. Let me paste that into the Zoom chat. There's nothing much in there for today; I just plan to take some notes.
A: So I wanted to discuss our plan for partitioning the ci_builds table, or another table that we are then going to migrate rows into, and I wonder what your thoughts are about that. I hope that you have had an opportunity to read the merge request and that you have some ideas, questions, or concerns. I would like to hear them.
B: Yeah, thanks for inviting us to this call and for bringing that topic up, because it has been a long, long-standing topic, right? We've been looking at that table for a long time, watching it grow and seeing the growing pains that come with it. We do have some troubles with that table and with the data model that surrounds it, so it's a great topic to look into.
B: I was looking at the recording that you did last week and also read the merge request and the discussion there. As a general comment to get started, it's perhaps worth saying that I think we need to look for a better data model for some parts of CI that are based on this table, and partitioning is one tool in our hands that we can use.
B: It's probably a good tool, and perhaps we can use it, but I think it's good to approach this with an open mind: look at what problem we're trying to solve and what changes we want to make to the data model, and partitioning can be one of those. We have more problems already identified that we want to discuss, right?
A: Yeah, I completely agree. Unfortunately, there are some constraints. For example, we cannot use just any database that we would like; we cannot introduce new technology, because that would mean shipping it on premises as well, and that limits us in many different ways.
A: So there are not a lot of tools we can choose from to model the CI/CD data better. We are already using object storage a lot, and we are using Postgres a lot, and if we can improve the data model using these tools, that would be great.
A: Otherwise it might be much more difficult and might go beyond the scope of the architectural change, so I try to focus on the main pain points. It's usually a good idea to see what the problems are and work from what we want to solve. We all know that the ci_builds table is very large and creates a lot of problems.
A: I plan to get them documented in the merge request and in the architectural blueprint. I have not done it yet, so I'm sorry for that. But as database experts you basically know what the problems are, so I'm not going to describe them all in this call. We're running out of primary keys (we are probably at around 50% capacity right now), and the table is super wide.
A: It's super long, and engineers are facing problems with statement timeouts. From what I've learned from Jose and Nikolai, there are many more problems stemming from the table size that happen inside PostgreSQL and are not really visible to engineers or even SREs: things that consume a lot of internal resources. That is not very tangible for people, but I'm sure such problems also exist.
B: Yeah, maybe a more tangible way of approaching this, which I know you mentioned before, is the performance problems we see with the table size. This is something where partitioning can help you, but you still have to understand how the application works with the data. Basically, if you partition it one way and the application works with it the other way, then it's still going to be bad, or maybe even worse, right?
A: I completely agree, and that's the reason I decided to get involved: I know how CI works, and while I'm not a database expert, that's why I hope to collaborate with you to make it happen, whether that's partitioning or something else that we decide to pursue.
A: But right now my best bet on what we should do is to create a new table, partition that table, redesign the application on the backend and frontend to actually make use of the other table as well, and gradually move old pipelines, which we would call archived pipelines, to that table.
A: We can also slice vertically and, for example, persist YAML, variables, commands, and options inside object storage for archived builds.
A: So these are my ideas, and I just wonder what you think about them, and what you think about this idea of introducing partitioning not by partitioning ci_builds itself, but by creating a separate table that we partition, and devising an iterative plan to move old pipelines and builds into the new partitioned table. This way we would leave ci_builds basically alone for now, but it would become much smaller and much easier to change in the future.
B: Perhaps it would help to expand a little bit on the experience we already have in GitLab with partitioning, because it's not the first time we're doing this. We have a previous example where we were working on the audit events model. That's what we picked at the time when we also introduced partitioning, and it works much in the way you described it: we had a very large table, the audit_events table.
B: We were trying to understand the application behavior: how do you access audit events, how do you work with them? In this case it was relatively simple, because it's a limited feature with relatively limited ways of accessing the data, and it turned out that the time dimension is always the dimension along which we work with the data.
B: That helped us make the decision that audit events are something we want to partition by time, because that's the dimension we always have when we access the data. And this is really where partitioning shines, because then you always have a key available that, when you query the database, lets you go to the right partition.
B: So you don't end up scanning all partitions, or one very large table, but only the one partition, or maybe the few, that you're interested in. That's where the performance benefits come from. And then what we did was, like you explained already, create a new table that is partitioned. It basically has an identical schema compared to the original table.
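A minimal sketch of the pattern described here, assuming PostgreSQL declarative range partitioning; the table and column names are illustrative, not the actual GitLab schema:

```sql
-- A large event-style table, range-partitioned on its time dimension.
-- Note the partition key has to be part of the primary key.
CREATE TABLE audit_events_part (
    id         bigserial,
    author_id  bigint NOT NULL,
    created_at timestamptz NOT NULL,
    details    text,
    PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);

CREATE TABLE audit_events_202101 PARTITION OF audit_events_part
    FOR VALUES FROM ('2021-01-01') TO ('2021-02-01');

-- Because created_at appears in the filter, the planner prunes every
-- partition except the one covering this range.
SELECT *
  FROM audit_events_part
 WHERE created_at >= '2021-01-15' AND created_at < '2021-01-16';
```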
B: In that case we even had the same problem with the integer primary keys, so we turned the new table's primary key into a bigint, an 8-byte integer.
B: The new table is partitioned, so there is already code that handles creating monthly partitions, because that is something you have to do as you go: not with schema migrations, which are more of a static kind of thing, but on a monthly basis.
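For illustration, the recurring piece of DDL that such partition-management code has to generate each month might look like this (names follow the sketch above and are assumptions):

```sql
-- Created ahead of time, each month, so inserts never hit a
-- missing partition.
CREATE TABLE IF NOT EXISTS audit_events_202102
    PARTITION OF audit_events_part
    FOR VALUES FROM ('2021-02-01') TO ('2021-03-01');
```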
B: So there is code that handles this, and then there was a background migration running for a while, basically copying records over and making sure that updates are also carried over. Then at some point, actually relatively recently, we swapped the tables, so the application picked up the new, partitioned table.
B
If
I'm
wrong,
I
think
we're
still
have
to
drop
the
old
table
that
is
still
around
so
at
the
moment
it
works
vice
versa,
so
we're
inserting
and
updating
copying
stuff
to
the
old
table
just
for
as
a
backup
mechanic,
but
we're
basically
ready
to
drop
that
table.
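A minimal sketch of the copy-and-keep-in-sync batch being described, assuming the batched upsert pattern; the bounds and names are illustrative:

```sql
-- One batch of a background migration copying rows into the new
-- partitioned table. The upsert makes rerunning the same batch
-- harmless and also carries row updates over.
INSERT INTO audit_events_part (id, author_id, created_at, details)
SELECT id, author_id, created_at, details
  FROM audit_events
 WHERE id BETWEEN 1 AND 10000      -- batch bounds tracked by the job
ON CONFLICT (id, created_at)
DO UPDATE SET details = EXCLUDED.details;
```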
B: Postgres 12 is something that fixes this, so this is going to get easier in a couple of months when we drop Postgres 11 support. And then I think the key question that we have to look at first is: what is the application behavior? How do we access that data?
A
And
I
think
that's
a
very
valid
question,
but
I
think
it's
basically
too
early
to
answer
that
question,
because
the
partitioning
k
we
choose
is
going
to
depend
on
how
we
model
access
to
archived
pipelines,
and
we
can
model
that
on
many
different
ways.
A
The
this
is
something
that
does
not
exist
yet
because
I'm
pretty
sure
that
we
cannot
partition.
Ci
builds
in
in
the
form
that
this
table
exists
today,
like
it's
too
big,
too
complex,
it
has
too
many
foreign
keys
constraints
and
a
lot
of
you
know
statement
groups
that
are
addressing
statement
groups
yeah
that
we
do
have
hundreds
of
them.
Basically,
so
I
feel,
like
you
know
it's:
it's
not
really
possible
to
partition.
Ci
builds.
A
What
is
possible
instead
is
to
build
a
completely
new
storage
model
for
archived
pipelines,
and
the
difference
is
going
to
be
that
the
aircraft
pipelines
do
not
require
being
processable.
I
mean
you
know.
Whenever
a
pipeline
is
archived,
you
will
be
able
to
only
see
it
and
access,
perhaps
for
the
api,
but
you
will
never
be
able
to
trigger
a
manual
action.
You
will
never
be
able
to
retry
a
build.
A
You
will
never
be
able
to.
You
know,
do
anything
with
that
pipeline,
except
of
visualizing
all
the
data
that
we
store,
because
I,
in
my
opinion,
data
durability,
is
important
and
removing
data
like
that.
That's
not
good,
not
a
good
product
decision
in
case
of
pipelines
that
may
be
relevant
to
user
even
after
years,
they
might
want
to
go
to
a
pipeline
that
deployed
like
version
x
created
years
ago
and
want
to
see,
for
example,
volume
for
variables
like
that.
So
I
think
that
we
should
not
remove
data.
A
We
can
move
data,
move
data
to
a
different
table
partition
or
to
object
storage,
but
it
the
data
should
be
there.
So
how
I
basically
envisioned
the
new
feature
and
new
data
model
for
archive
pipelines.
I
can
tell
you
describe,
I
can
describe
you
that
in
a
moment,
but
eventually
it
might
not
be
only
my
decision.
A: Take the table that lists all the pipelines, the latest 20, because we paginate it. At the top of the page you have running pipelines, pending pipelines, all pipelines, and I envision adding an archived pipelines tab where you would see a dropdown, for example 2019, 2020, 2021, and you choose a time range. We could do partitioning yearly, for example, and the moment you choose the range is the moment we know which partition we are going to access, and we display all the matching pipelines when you click on it.
A: Or by month; it's not clear to me yet whether we should partition yearly or monthly, as it depends on the quantity of data and other aspects. And we basically need to model that in the frontend and the UI, and refactor and change the backend to make it possible to display pipelines that are kind of different: they are going to be stored differently, in a different table, perhaps using a different model.
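A sketch of how the yearly idea could map onto partition pruning, with illustrative names (this ci_builds_archive is hypothetical, not an existing table):

```sql
-- A yearly-partitioned archive table: the year picked in the UI
-- supplies the bounds, so the planner touches a single partition.
CREATE TABLE ci_builds_archive (
    id         bigint NOT NULL,
    project_id bigint NOT NULL,
    status     text   NOT NULL,
    created_at timestamptz NOT NULL,
    PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);

CREATE TABLE ci_builds_archive_2020 PARTITION OF ci_builds_archive
    FOR VALUES FROM ('2020-01-01') TO ('2021-01-01');

-- Choosing "2020" in the archived-pipelines dropdown becomes:
SELECT id, status, created_at
  FROM ci_builds_archive
 WHERE project_id = 42
   AND created_at >= '2020-01-01' AND created_at < '2021-01-01'
 ORDER BY created_at DESC
 LIMIT 20;
```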
A: So that's basically how I see it. As for how we model it in the frontend, the product team will need to make a conscious decision about how users want to access their pipelines after they are archived, and this decision will actually influence which partition key or keys we are going to use. Right now it's too early to tell, but this is how I see it. Do you think it makes sense?
B: Sure. Like you said, this is not only a technical discussion; it's first and foremost a product decision to make. On the technical side alone, I wouldn't be able to see why, or whether, we would do it any differently if we didn't have archived pipelines. It could even be that, from a product side, it makes sense to always access your pipelines by month.
A: One reason is that pipeline queuing has been modeled at the database level, so whenever a runner asks GitLab to provide a build, we run a huge statement that is one page long if you print it, and it's very complex, using subqueries and complex joins. If we add partitioning to the equation, it might be very inefficient or even impossible to do that properly. Imagine queuing that spans multiple partitions.
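A simplified, hypothetical illustration of that conflict: if ci_builds were range-partitioned by time, a queuing query filters only on status and runner attributes, so no partition key is available and every partition has to be visited (the real statement is vastly larger):

```sql
-- No partition-key predicate, so the planner cannot prune: with N
-- partitions this fans out into N scans on every runner poll.
SELECT id
  FROM ci_builds
 WHERE status = 'pending'
   AND runner_id IS NULL
 ORDER BY id
 LIMIT 1
 FOR UPDATE SKIP LOCKED;
```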
A
What
happens
if
you
go
to
a
pipeline
that
you
created
a
year
ago
and
you
create
create
retrieve
build?
In
that
case,
the
new
build
is
going
to
be
created
and
it's
going
to
become
pending
and
which
partition.
Should
it
go
like
how
the
queueing
would
work
in
that
case.
So
this
is
one
of
many
many
caveats
of
how
ci
works
and
yeah.
B
Isn't
that
also
a
way
of
saying
when
we
look
at
the
current
data
model,
then
there
is,
we
basically
only
have
one
model
for
ci
bills
right.
This
is
also
the
reason
why
it's
so
wide,
because
we
cover
all
those
those
cases
and
it
sounds
a
bit
like
we're.
Also
mixing
up
different
needs
like
one
is,
storing
the
build
information
and
then
the
other
one
is
the
queuing
which
is
sort
of
a
way
of
saying
that
across
the
site
I
want
to
see
all
pending
builds.
B
This
is
not
going
to
work,
but
basically
maybe
we
can
make
that
a
part
of
the
discussion
that
we
say.
We
know
that
we
have
this
queuing
problem
and
we
also
need
to
store
the
data
in
some
way.
Perhaps
we
can
treat
those
as
separate
ones
and
find
a
good
solution
for
the
queueing
problem
and
also
for
the
for
the
data
storage
and
that
can
be
postposed
to
for
the
queueing
side.
But
it's
just
that.
We
split
those
needs,
would
that
make.
A
Sense,
I
completely
agree,
and
I
think
we
should
redesign
queueing
and
but
I
also
want
to
be
pragmatic.
A
It's
not
going
to
happen
overnight
or
before
we
devise
a
way
to
partition,
ci
builds
and
in
order
to
actually
redesign
queueing,
we
should
improve.
Ci
builds
table
anyway,
and
I
feel
like
these
are
like
separate
things
we
should
do
in
the
ci.
But
partitioning
has
a
bigger
priority
right
now,
because
of
the
all
the
database
quality
okrs
being
created
and
yeah.
It's
clear
that
you
know
we
are
also
running
out
of
the
primary
key,
so
redesigning
queueing
is
going
to
take
a
lot
of
time
as
well.
B: Can I ask a question?

A: Yes, of course.

B: Just from a strategy point of view: I know you were working on the sort of bigger-picture blueprint redesign for this, and it sounds a bit like we're narrowing the focus and concentrating only on the database side.
B: And saying, well, database partitioning is the one step that we want to take. How does it relate to the bigger picture? Is that something that's still in progress, or have you given that one up to focus on this?
B
So
we're
saying
that
we
need
to
do
database
partitioning
because
we
have
those
database
problems
right
and
at
the
same
time
I
know
there
is
a
lot
more
discussions
going
on
about
redesigning
ci
completely.
B
A: That's a good question. I think it's clear that we need to redesign CI in a way that can sustain further growth and scale even more, and there are limits to how efficient it can be. But I feel we should move incrementally, and in order to actually improve CI even more, we need a way to migrate.
A: We tried moving data out of the ci_builds table, while also updating every row, two or three years ago, and back then it was a big problem. It caused a ton of table bloat; it caused background migrations running for weeks. And right now it's even difficult to take a step towards migrating anything, because this table is too large.
A
So
in
my
mind
you
know,
addressing
partitioning
can
allow
us
to
slightly
improve
background
migrations
and
if
we
make
them
possibly,
we
should
make
it
possible
for
them
to
run
in
parallel.
For
example,
like
that's
many
one
of
many
ideas,
we
can
actually
resolve
that
problem
of
not
being
able
to
migrate
data
from
this
table
or
within
this
table,
and
this
can
actually
unblock
all
the
future
redesigns
and
improvements
for
the
ci
data
mode.
That
doesn't
make
sense.
B
It
does
yeah
and
it's
it's
very
painful
today-
to
do
these
kind
of
things
and
we
can
improve
that.
But
I
think
I
would
still
say
that
looking
just
looking
at
the
size
and
the
number
of
records,
if
we
want
to
migrate
the
data
and
copy
that
over
into
into
a
new
table
structure,
I
think
we
have
one
shot
of
doing
that
within
the
next
year.
Basically
or
you
know,
roughly
speaking,
but
that's
my
that's
my
feeling
about
this.
B
I
think
we
can
still
improve
on
the
on
the
background
migration
side,
copying
data
over
making
that
faster
and
all
that,
but
it's
still
a
huge
undertaking.
So
I
would
just
based
on
the
experience
with
audit
events
and
other
background
migrations
that
we
did.
I
would
expect
that
if
we
approach
that
we
copy
data
over,
we
have
a
good
chance
that
this
takes
a
long
time
for
us
to
come
to
finish
completely
right
and
that's
why.
B
I
think
we
need
to
be
very
careful
with
our
expectations
towards
partitioning
and
what
problems
this
is
solving
for
us.
A
Yeah
I
mean
transforming
data
and
copying
to
a
different
place
or
within
you
know,
all
creating
a
column
and
transforming
data
and
moving
it
to
separate
to
the
other
column
and
like
I
can
use
many
different
transformations
that
we
should
make.
For
example,
instead
of
having
a
hard
code
that,
like
hardcoded
like
environment,
referenced
as
a
character
varying
string,
we
could
create
a
separate
table,
have
a
environment,
you
know
row
and
then
reference
this
as
an
id
like
partially
is
done
already,
but
there
is
some.
B: I don't see how partitioning improves that for us. We would still update the full column; you still have to deal with all the records, and yes, those will live inside different partitions, but you still have to update all of them. So for those scenarios partitioning doesn't help us, I think, unless I'm missing something.
A
So
that's
that's
a
very
interesting
discussion
because
in
in
the
past,
I've
heard
about,
for
example,
being
able
to
run
more
vacuum
workers
to
make
it
easier
to
handle
table
bloat.
Whenever
you
update
that
many
records,
I've
heard
that
it
might
be
possible
to
actually
execute
migration
in
parallel
on
every
partition,
instead
of
doing
the
sequentially
record
by
record-
and
there
are-
you
know
a
few
other
things
that
I've
heard
could
be
possible
with
partitioning
that
are
not
possible
when
you
don't
have
partitioning.
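A hypothetical sketch of that per-partition parallelism, reusing the illustrative archive schema from earlier (a 2021 partition is assumed to exist alongside the 2020 one):

```sql
-- Separate workers can run these concurrently; each statement's cost
-- is bounded by its own partition, and autovacuum then cleans each
-- partition independently.
UPDATE ci_builds_archive_2020
   SET status = 'archived'
 WHERE status = 'success';

UPDATE ci_builds_archive_2021
   SET status = 'archived'
 WHERE status = 'success';
```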
B
That's
that's
true.
For
the
vacuum
side,
on
the
sort
of
more
the
operational
side,
this
becomes
easier
to
manage,
at
least
but
still
from
a
from
a
sort
of
application
perspective
from
how
the
how
you
deal
with
data.
How
do
you
write
those
migrations,
not
a
lot
of
changes,
so
you
don't
really
benefit
from
that
side
from
partitioning.
A
Like
I,
I
mean,
if
we
really
want
to
transform
cic
data
model,
we
would
need
to
improve
data
migration,
background
migrations
as
well
like
this
mechanism.
In
my
opinion,
it's
not
reliable
enough
right
now
and
yeah
and
there's
definitely
some
room
for
improvement.
If
you
want
to
make
it
much
faster
and
much
more
reliable.
A
And
I
I
think
that
you
know
partitioning
can
actually
help
with
migrating
data
data
once
we
improve
background
migrations
as
well.
But
what?
What
are
your
thoughts
about
that.
B
B
C: Partitioning won't solve every problem, but it can solve some problems. As you said, we could at some point start batching in a way that creates different batches over different partitions.
C: From my perspective, this is orthogonal to the discussion of how we can make background migrations better. We also agree that we have seen issues with background migrations not being reliable enough and not being able to be split properly: how do you define the batch size, how can you make batches dynamic?
C
Those
are
a
lot
of
problems
that
we
also
are
thinking
about,
but
I'm
not
sure
if
they
are
only
tied
to
partitioning
or
not
partition
tables.
This
is
something
we
have
to
work
towards,
but
this
is
also
something
vertical
to
all,
whether
we
need
partitioning
or
not
and
from
my
perspective
partitioning
is
needed
if
you
can
do
partition
pruning
properly,
and
that
means
that
when
you
clear
the
data,
when
you
are
trying
to
fetch
data,
that
you
can
focus
the
query
to
check
only
one
or
two
partitions
out
of
100.
B
C
Of
200
partitions
and
that
will
make
things
way
faster,
both
while
selecting
and
also
while
upgrading
the
updating
data,
and
I
think
that
that's
the
important
thing
to
focus
when
we
are
thinking
about
partitioning.
A
And
I
think
that
you
know
basically
I
think
that's
exactly
what
we
want
to
achieve.
We
want
to
make
ci
build
stable,
very
small,
because
every
time,
a
runner
processes
a
job.
We
do
see
a
ton
of
reads:
a
ton
of
rights
and
a
ton
of
filtering
and
searching
through
this
table
and
with
you
know,
introducing
another
table
called.
For
example,
ci
builds
archive
with
a
slightly
different
schema.
A
We
can
actually
make
ci
builds
much
smaller,
but
I
I
don't
think
it's
reasonable
approach
to
partition
ci
bills.
Right
now.
We
can
partition
ci
builds
archive,
but
we
might
decide
not
to
do
that
like
if,
if
the
benefit
is
not
going
to
be
significant-
and
it's
you
know
up
to
you
guys
to
tell
me
that
it
might
not
be
then
perhaps
the
first
reasonable
iteration
is
to
actually
extract
see.
A
I
built
archive
table
migrate,
a
ton
of
builds
to
that
table,
and
and
that's
it
perhaps
it's
going
to
make
ci
build
stable,
sufficiently
small
and
we
might
not
need
to
partition
the
cia
built
archive,
but
from
what
I've
heard
from
nikolai
and
jose.
B: Just as a general feeling, I think I still want to focus the discussion more on the problem we're trying to solve, or at least I don't understand what the exact problems are that we're trying to solve. To give an example, let's say we are concerned with the queuing mechanism that you described, where we have those gigantic queries that go across everything and are very slow, and presumably also pretty frequent.
B
If
that
was
the
problem
that
we're
trying
to
solve,
we
could
say
that.
Well,
we
leave
this
in
place
as
it
is
the
ci
builds
table,
but
we
find
a
good
model
that
allows
us
to
do
the
queueing.
B
We
make
that
as
as
narrow
as
possible,
and
then
we
we
have
duplicate
data,
because
that
information
presumably
also
lives
in
the
current
table.
That's.
B: In the case that I just explained, you don't migrate any data; you just start creating a new queuing mechanism and you start using it, and perhaps that takes a little bit of time to get into.
C: So if your approach is to go through in batches of ten thousand each time, it will take the same time. Some things will be faster: index lookups will be faster, other things will be faster. Maybe we can start discussing and thinking about a way to access multiple partitions at the same time, which would for sure make things faster, but it's not like it will make everything faster, because if you have to copy a billion things, you have to copy a billion things. That's it.
C
You
have
one
disk,
let's
say
one
disk
and
you
have
to
move
one
billion
things
from
one
place
to
another
place.
So,
however,
with
chunk
things,
the
cost
is
the
same.
The
lookups
will
may
be
faster
and
more
parallelized
version
may
be
available,
but
either
way
you
have
to
go
through
the
process
of
coping
a
billion
things.
C
That's
true,
that's
100
true,
but
if
you
and
the
and
we
agree
that
the
bloating
is
affected,
the
size
of
the
index
is
affected,
which
is
very
important
because
smaller
indexes,
you
have
a
bigger
partner
in
memory,
et
cetera,
et
cetera,
et
cetera.
But
this
is
not
like
it
will
move
you
from.
C
It's
not
like
you
will
go
down
to
two
days.
You
will
go
down,
but
what
we
have
to
understand
here
is
that
this
is
not
a
magical
one
and
what's
very
important
here-
is
that
with
partitioning,
because
this
is
a
physical
partition,
it's
a
physical,
changing!
You
win
something
you
lose
something
so,
for
example,
if
we
partition-
let's
say
by
creating
that,
but
you
want
to
process
by
expire
that,
because
you
want
to
to
process
expired
pipelines
if
those
are
correlated
in
this
case,
but.
C
You
will
have
to
visit
marketing
partitions,
and
this
is
a
random
example
I'm
giving
now,
which
means
in
that
case
you
won't
gain
anything
so,
for
example,
for
the
cleanup
jobs
that
go
in
the
expire,
job
pipelines,
those
won't
be
affected
and
maybe
worse
from
an
execution
perspective.
A
I
think
I
understand
you
know
how
to
actually
what
to
do
to
benefit
from
partitioning,
and
I
understand
that
partitioning
can
go
things
much
worse
when
we
are
not
mindful
about,
for
example,
adding
the
you
know
keys,
we
partitioned
right
with
the
workloads
right
that
that's
clear
what
I'm
trying
to
understand.
A: But it's not clear to me what message you're trying to convey, so can you elaborate on that?
B
I
think,
in
order
to
to
say
something
about
the
quality
of
a
partitioning
approach,
so
basically
pick
some
key
partition
a
table.
You
really
have
to
understand
what
what
is
the
problem
that
we
are
solving
like?
What
are
the
queries
that
we
are
making?
How
do
we
access
that
data?
How
does
that
partitioning
approach
help
us
there?
B
What
is
our
goal
right
if,
if
table
bloat
is
our
only
problem,
then
partitioning
is
probably
a
really
good
tool
right,
but
I
think
I
I
but
I
mean
we
know
we
have
more
problems
than
table
load
and
maybe
that's
not
the
most
important
one.
So
that's
what
I'm,
what
I'm
still
missing
from
that
conversation
is
sort
of
saying
what
is
what
what
are
ways
that
we
access
this
data?
B
A: We're seeing problems related to the size of the table in that every statement is very slow. We might even be using an index, but sometimes it's not enough; the table is just too large, and engineers are affected by that, because something works in the GDK, but then when it hits production it doesn't work, because the table is just too big.
A
We
have
had
like
a
lot
of
merch
requests
when
someone
wanted
to
migrate
the
data
because,
for
example,
this
column
is
not
used
anymore
and
we
would
like
to
extract
this
data
to
somewhere
else
or
you
know,
there
is
a
a
column
that
is
basically
nil
and
we
would
like
to
backfill
the
data
in
there
and
there.
A
You
know
many
problems
like
that
are
affecting
development
velocity
and
the
idea
I
had
is
to
actually
you
know,
introduce
the
ci
builds
archive
table
that
is
going
to
have
a
different
schema
and
if
we
move
95
percent
of
builds
to
that
table,
the
ci
builds
table
is
going
to
be
much
more
manageable.
It's
going
to
be
much
smaller.
The
index
size
is
going
to
be
much
smaller.
The
amount
of
row
and
reference
from
external
tables
is
going
to
be
much
smaller.
B
It
does
yeah,
but
it's
also
a
way
of
saying
that
everything
depends
on
the
data
access
yeah.
So
let's
say
you
have
a
very
large
table.
That
itself
is
not
a
problem.
It's
just
a
way
of
saying
how
much
data
there
is
right.
If
you
have
a
data
access
type.
That's
that's
always
looking
up
records
by
its
primary
key,
not
a
problem.
You
can
have
very
large
tables
for
that.
B
If
you
have
in
addition
to
that,
you're
also
scanning
on
a
column
and
have
a
very
expensive
query
where
you
can't
make
use
of
the
one
index,
only
large
table
becomes
much
bigger
of
a
problem.
So
basically
that's
what
I'm?
What
I
mean
with
looking
at
the
problems
that
we're
just
trying
to
solve
is
by
which
perspectives
do
you
have
on
the
data?
How
do
you
access
with
that
data.
A
So,
and
that's
that's
a
very
good
question,
if
you
formulate
this
way,
because
I
think
the
answer
is
that
we
have
so
many
patterns
of
using
this
table,
that
we
can't
even
tell
how
we
are
accessing
data,
we
presumably
do
have
hundreds
of
queries
that
are
accessing
ci
builds
and
there's
almost
no
way
to
optimize
that,
given
the
current
size
of
the
table
and
observability
mechanisms
like,
I
think
it
was
jose
or
nikolai
that
tried
to
collect
or
the
statement
groups
with
cia
build
stable
involved,
and
although
we
can
probably
duplicate
this
set
of
statement
groups
even
more
like
it's
quite
clear
that
if
there
is
more
than
a
few
hundred
queries,
different
queries
that
somehow
join
or
use
the
ci
builds
table.
A
So
it's
clear
that
you
know
trying
to
address
how
we
are
using
this
table
is
not
going
to
help
us
or
you
know
like
it
might
be
a
few
times
effort
there
is
like
the
application
is
too
complex.
Already
the
coupling
to
the
table
is
just
too
big.
What
we
can
do
instead
is
to
tell
the
application
that
look.
95
of
that
of
these
builds
are
archived.
A
We
can
move
them
to
a
completely
different
schema,
completely
different
table
and
you
will
need
to
deal
with
like
only
the
five
percent
and
that's
something
that
I
feel
is
attainable.
That
is,
you
know
something
we
can
do
because
optimizing
all
the
queries
optimizing
all
the
usages
patterns,
it's
simply
impossible,
given
the
size
of
this
application
in
the
amount
of
complexity
and
cutting.
B
That's
a
bit
surprising,
though
I
mean
you
know
just
coming
in
from
from
having
an
ideal
idea
about
how
to
approach
it
when
you
think
about
ci
data.
This
is
always
within
a
pipeline,
probably
within
a
project
or
at
least
within
a
namespace
right.
B
So
when
you
look
at
queries
today
that
we're
making,
what
I
would
expect
to
see
is
a
lot
of
queries
that
happen
inside
the
namespace.
You
go
to
the
pipeline,
tab
and
stuff
like
that,
and
it
can
well
be
the
case
that
these
queries
don't
have
any
additional
namespace
id
equals
or
project
id
equals,
filter
and
stuff
like
that
today,
right,
I.
A
Think
that's
not
needed.
The
only
thing
that
can
not
be
scoped
to
a
namespace
or
project
is
queuing
because
is
an
instance-wide
thing
right.
A
So,
however,
we
address
that
the
queuing
will
eventually
need
to
access
multiple
partitions
and
that's
the
reason
why
we
cannot
really
partition
by
the
namespace
or
project
we
might,
you
know
actually
need
to
rework
queuing,
but
the
extent
of
changes
would
be
so
huge
that
we
probably
would
need
to
build
a
separate
service
that
would
have
has
it
its
own
database
and
whenever
there
is
a
build,
we
would
need
to
push
the
push
the
bill
to
the
service
and
the
queuing
could
happen
there.
A
So
this
way
you
know,
we
could
actually
make
the
queuing
possible
when
we
partitioned
by
the
namespace
or
project.
But
I
we
try
to
like
there's
the.
If
there's
this
issue
about
extracting
cicd
daemon,
we
wanted
to
make
it
like
a
golang
based
service
with
a
separate
database,
a
separate
queuing
mechanism
that
would
all
only
hold
active
builds
that
are
being
either
processed
or
enqueued
right,
but
we
know
that
it's
not
going
to
happen
anytime
soon.
A
The
effort
and
the
investment
would
be
enormous,
and
that
does
not
seem
like
something
we
can
do
anytime
soon.
We
might
need
to
do
that
one
day,
but
not
anytime
soon.
A
So
with
with
that,
you
know
in
mind
that
we
cannot
really
partition
by
project
or
namespace.
B
Do
you
see
what
I'm
saying
when,
when
we
talk
about
looking
at
the
current
increase
and
not
being
able
to
spot
the
common
key
that
we
could
use,
this
doesn't
really
say
that
there
is
no
such
common
key
right.
B: I mean, I see what you're saying about queuing, and this goes back to maybe doing that differently or splitting those models. But perhaps we can take that one step further and, when looking at all those queries, figure out whether it would be possible to use project_id as a key, or something like that, just to explore those options a bit further.
A
Actually,
I
have
more
like
practical
question:
what
is
the
cost
of
having
a
partition
if
we
decide
to
partition
by
a
namespace
or
a
project,
and
we
have
like
at
one
of
three
users
that
are
going
to
have
one
pipeline?
We
are
going
to
create
a
partition
for
them
because
they
do
have
a
new
project
is
what
is
the
cost
of
maintaining
such
a
partition?
That
has
only
a
handful
of
entries.
B: What you can do is hash partitioning based on the project_id, for example. That doesn't give you same-size partitions, and there can be hotspots, so this can have some problems, but I would not recommend creating partitions for selected projects or something like that.
C: Projects will be grouped together roughly equally.
A
But
wait
like
if
we
devise
a
hashed
based
partitioning
key
and
we
have
let's
say
500
partitions.
Yet
we
do
have
millions
of
projects.
So
does
it
mean
that
it's
possible
that
multiple
partitions
are
going
to
hold,
builds
for
the
same
project.
C: No. If it was hashed by project_id, everything from the same project will be in the same partition, but you will have, whatever your ratio is, that many different projects in the same partition.
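A minimal sketch of the hash-partitioning idea just described, assuming PostgreSQL hash partitions (four partitions here instead of the 500 discussed; names are illustrative):

```sql
-- Every project_id hashes to exactly one partition, so a project's
-- builds are never split across partitions, but each partition
-- holds many different projects.
CREATE TABLE ci_builds_hashed (
    id         bigint NOT NULL,
    project_id bigint NOT NULL,
    status     text,
    PRIMARY KEY (id, project_id)
) PARTITION BY HASH (project_id);

CREATE TABLE ci_builds_hashed_0 PARTITION OF ci_builds_hashed
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE ci_builds_hashed_1 PARTITION OF ci_builds_hashed
    FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE ci_builds_hashed_2 PARTITION OF ci_builds_hashed
    FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE ci_builds_hashed_3 PARTITION OF ci_builds_hashed
    FOR VALUES WITH (MODULUS 4, REMAINDER 3);

-- A project_id filter prunes to the single matching hash partition.
SELECT count(*) FROM ci_builds_hashed WHERE project_id = 42;
```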
A
And
so
how
do
we
model
the
multi-level
nested
groups
in
that
case,
because
this
is
how
gitlab
works
right
there
are
multi-level,
like
group
can
include
a
group
and
include
a
group,
and
you
know
there
is
some
nesting
involved,
and
this
can
actually
affect
the
strategy.
But,
like
the
strategy
like,
we
can
explore
that
of
course.
But
how
do
we
fix
the
wing
in
that
particular
case?.
B
Right,
so
this
is
basically
there's
this
partitioning
scheme
where
you
always
have
the
project
id.
So
this
is
great
right
and
then
there
is
this
additional
perspective
where
you
say
like
give
me
all
the
pending
builds
based
on
their
status.
B
That
is
pending
right,
and
in
this
case,
what
you
do
is
actually
split.
Those
like
have
two
different
data
models
and
sort
of
make
make
one
it's
an
optimization
right.
It's
it's
a
way
of
saying
that
we
optimize.
A
Yeah,
so
it
means
that
you
know
we
we
cannot
have
a
separate
partition
for
pending,
builds
only
or
active
bills
like
for
queueing,
it
might
work,
but
it
it's
not
going
to
work
to
for
pipeline
visualization
right,
because
someone
might
want
to
see
a
pipeline
and
it
does
not
make
sense
to
scan
multiple
partitions
to
have
only
one
pipeline
graph.
If
we
you
know
make
pending,
builds,
be
written
to
a
separate
partition.
A
So
it
feels
like
you
know
that
two
mechanisms
are
not
very
like
compatible
right
now,
because
we
might
need
to
rework
cueing.
But
this
is
not
going
to
be
a
simple
problem
to
solve.
B
Yes,
but
that's,
I
think,
that's
ultimately.
The
point
is
that
we
have
different
ways
of
accessing
the
data
and
if
we
choose
partitioning
we
always
choose,
we
always
favor
one
right,
which
might
mean
that
it
could
still
work
for
for
the
other.
But
in
many
cases
it's
just
going
to
break
right
because
it
scans
all
those
partitions.
Well.
A
Not
not
really
because
we
can
have
two
different
tables
like
right
now
we
have
ci
builds,
but
we
can
have
ci
builds
and
use
it.
The
old
way
and
we
can
have
ci,
builds
archived
that
has
a
different
schema,
different
access
patterns
and
it's
basically
something
that
is
kind
of
you
know
separate
from
ci
builds.
We
might
you
know
at
the
end
of
the
month
migrate.
C
And
what
you're
saying
is
100
valid,
but
the
the.
B
C
There
is
that
this
is
unrelated
partitioning,
so
having
multiple
entities
or
marketing
models
that
we
are
going
to
use.
Those
are
valid
solutions
that
we
may
have
to
do
while
also
thinking
about
partitioning,
but
our
feedback
there
is
that
partitioning
is
not
going
to
solve
all
problems.
You
may
also
need
to
have
you
know
materialized
tables
or
secondary
models,
or
what
you
are
saying
so
that
you
can
support
multi
and
different
access
patterns
and
yeah.
A
I
completely
already,
I
think
that
no
one
says
here
that
you
know
partitioning
is
going
to
be
the
only
valid
solution.
I
mean
that
we
need
to
redesign
our
application
in
some
way
or
another.
The
idea
with
the
archive
build
seems
the
most
attainable
to
me,
and
then
you
know
these
are
like
having
separate
table
for
the
archived
builds
is
not
an
orthogonal
problem
to
partitioning,
because
if
we
want
to
partition
the
separate
table,
we
need
to
set
up
and
configure
partitioning
from
the
day
zero.
A
Otherwise
going
back
in
and
partitioning
the
table
once
we
move
all
the
data
in
there
like
it's
going
to
be
again
almost
impossible.
So
if
you
want
to
create
a
ci
builds
archive
table,
we
will
need
to
device
a
partitioning
model
for
it
before
we
actually
start
to
rebuilding
the
application
on
backhand
and
front
end,
because
the
partitioning
model
is
going
to
dictate
the
changes
we
need
to
make
on
backhand
and
front.
C
From
my
from
my
perspective-
and
I
assume
that
others
will
agree
in
this
call
as
a
database
guy,
if
you
could
give
me
a
solution
where
we
only
have,
we
have
a
partitions
that
are
below
50
gigabytes
and
I
will
be
happy.
So
it's
not
like
we
don't
like
partitions.
C
My
tables
small,
as
small
as
possible-
that's
the
best
case.
Possibly
we
I
have
a
fast
indexes.
Lookups
are
the
best,
but
we
have
to
think
about
how
to
manage
to
do
that.
And
then
there
are
a
lot
of
details.
I
want
to
to
address
something
that
jose.
C
Partitioning
multi-level
partition,
I
think
that
we
can
dive
into
those
as
we
move
forward
and
for
sure
there
are
solutions
like
you
know.
You
can
start
by
name
space
and
then
going
a
time
based
or
do
whatever,
but
most
probably
that's
after
we
discussed
the
core
accessing
mechanism.
B
Maybe
coming
back
to
the
blueprint
that
you
described,
I
think,
would
would
be
awesome
to
have
as
a
as
a
sort
of
document
describing
those
things
as
basically
the
how
do
we?
How
do
we
plan
to
access
data
or
how
do
we
access
data
today?
How
do
we
plan
to
change
that?
B
What
is
the
partitioning
strategy
and
how
does
that
help
for
those
cases
and
then
perhaps
also
dive
into
the
retention
strategy?
If
that
is
possible
like
do
we
need
to
keep
data
forever?
Where
do
we
not
do
that,
like
in
the
what
you
described,
we
have
the
archive
table
and
the
new
table
like
there
is
a
retention
strategy
on
a
new
table.
How
that
is
implemented.
A
So
that's
interesting
because
in
my
mind,
all
the
usage
patterns
for
the
new
table
that
is
going
to
hold
the
archive
builds
depend
on
how
we
partition
the
table,
because
how
we
partition
the
table
will
depend
on
the
foreign
keys.
We
can
have
constraints
we
can
have,
and
all
the
caveats
and
limitations
of
partitioning
will
need
to
be
addressed
in
the
application
in
how
we
can
model
access
to
that
table.
A: We do have patterns for ci_builds, but we know that these patterns are so chaotic and so unpredictable that we cannot really describe them and think about partitioning this particular table easily without completely reworking the patterns, and this rework would result in the new table of builds that are no longer active.
A
Processing
pipelines
how
eq
builds
how
we
update
them,
how
you
know
the
processing
works?
What
are
the
dependencies?
What
is
dark,
what
is
like
stage,
and
all
these
things,
like
that's
90
and
remaining
10
percent-
is
perhaps
how
we
visualize
that
and
moving
to
a
new
new
like
table
is,
is
going
to
require
reworking
this
10
percent
of
how
we
visualize
and
the
90
of
how
we
process
pipelines
is
not
going
to
be
changed
because
only
what
is
left
in
the
ci
builds
table
will
need
to
be
processed.
A: I don't know if you understand how it looks in my mind, but I feel that having a separate table is the only way forward that will allow us to iterate on this by making two-way-door decisions.
C
Can
I
ask
something:
how
are
we
going
to
what's
your
plan
about
moving
to
the
archives
debbie,
I
know
that
we
discussed
it
already,
but
can
you,
how
are
we
going
to
move
for
data?
Are
we
going
to
say
that
we
haven't.
A
Going
to
get
background
migration,
I
think
it
needs
to
be
an
in
application
logic.
It's
not
going
to
be
a
migration,
because
we
will
need
to
have
a
worker,
presumably
working
on
being
scheduled
by
some
kind
of
a
chrome
job
or
basically,
you
know
a
crown
worker
that
is
going
to
gradually
find
old,
builds
and
move
them
to
the
new.
A
Table,
okay:
in
that
way,
we
would
actually,
you
know,
wouldn't
need
to
be
concerned
about
using
backward
migrations,
especially
on
premises,
and
this
would
be
this
kind
of
mechanism
that
works
all
the
time
every
day,
every
every
week,
every
month,
it's
actually
moving
old
builds
with.
You
know
this
actually
message
to
a
user
that
such
build
moved
to
the
new
table
is
never
going
to
be
processable
again.
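A hypothetical sketch of what one batch of such a cron-driven mover could look like in SQL; the cutoff, batch size, statuses, and column list are illustrative assumptions, not the actual design:

```sql
-- Move one bounded batch of old, finished builds into the archive
-- table and delete them from the hot table in a single transaction.
WITH candidates AS (
    SELECT id
      FROM ci_builds
     WHERE finished_at < now() - interval '3 months'
       AND status IN ('success', 'failed', 'canceled')
     ORDER BY finished_at
     LIMIT 1000              -- bounded batch; the cron job reruns it
     FOR UPDATE SKIP LOCKED
), moved AS (
    DELETE FROM ci_builds b
     USING candidates c
     WHERE b.id = c.id
 RETURNING b.id, b.project_id, b.status, b.finished_at
)
INSERT INTO ci_builds_archive (id, project_id, status, finished_at)
SELECT id, project_id, status, finished_at FROM moved;
```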
A: That's a very good question, and I feel that data durability is important. In my opinion, and that's my personal opinion, we should not remove them. We can move them somewhere else, for example object storage, but they should be there, and we should also make it possible for users to filter them by month or by year through the API. So essentially I envision a completely new API endpoint for archived builds.
A: But ultimately, again, it's not an engineering decision; it's a product decision. We can tell product managers what is easier or better, but if they tell us that, for example, users should have access to enumerating all the builds they have in one POST or GET call, it means that partitioning this table this way might not be possible. So that's again this very interesting situation in which I believe we cannot really partition anything without input from the product team.
A: What is a detail and what is not a detail is very debatable. For example, people might want to enumerate failed builds, so is a build's status a detail or is it not? Is the runner ID a detail, or is it not? I do agree that no one wants to find a build by a full-text search of a script or a YAML variable used; this is not something you can do right now, and presumably it is never going to be something we will allow users to do.
A: We can archive those in object storage and only make them available whenever someone wants to visualize a particular build, but everything else that we store in the ci_builds table, and eventually in the ci_builds_archive table, can be considered not a detail. So that's tough, and this is my realization: it's incredibly surprising that a ton of product decisions need to be made before we actually know how to partition something. That's a very interesting insight.
B
I
just
thought,
maybe
just
as
an
idea
when
you,
assuming
that
normally
people
don't
interact
with
archived,
builds
a
lot
like
this.
Is
you
know
if
we
get
the
timing
right,
at
least
what?
If
you
can
go
into
your
project
and
sort
of
be
able
to
say
like
I
want
to
retrieve
all
my
builds
from
2017,
and
I
accept
that
this
takes
a
couple
of
seconds
minutes
whatever
we
grab
that
from
the
cheap
storage
and
we
import
that
into
the
database,
so
you
can
interact
with
that.
A
This
way
you
cannot
search
them
like
you,
you
cannot
really
like.
You
might
need
to
make
a
api
request,
for
example,
because
I'm
thinking
about
api,
because
people
tend
to
automate
a
lot
of
stuff
for
the
ci,
and
this
way
you
would
need
to
make
a
request
to
the
api
to
tell
gitlab
that
it
needs
to
hydrate,
see
I
build
stable,
given
like
the
time
span,
and
only
then
you
can
filter
that
and
then
after
some
time
it
would
remove.
Data
like
it
feels
complex
and.
A
But
it's
better
to
have
a
simple
solution
done
an
easy
solution
and
keeping
stuff
in
postgresql
feels
still
like
a
simple
solution.
A
B
Also,
the
most
expensive
one
in
a
sense,
that's
what
I
wanted
to
add
about
the
retention
strategy:
you're,
not
not
dropping
data
from
the
database.
That's
I
understand
the
reasoning
for
that,
but
it's
also
a
very
expensive
decision,
and
you
know
we.
B
I
think
we
from
just
from
a
global
perspective
on
the
database,
but
we
still
discuss
two
lessons
retention
strategies
because,
especially
for
free
users,
keeping
data
forever
is
expensive
right
and
it's
not
only
like
the
storage
cost,
but
it's
also
the
like.
We
can
see
the
engineering
side,
that's
a.
A
Good
question,
I
think,
that's
a
very
valid
discussion
and
we
should
have
more
discussions
about
data
durability
versus
you
know
the
cost
of
retaining
everything-
and
this
is
perhaps
discussion
we
should
have
with
product
team
about
you
know
how
to
model
that
for
users
like
because
we
want
him,
we
would
need
to
factor
in
the
plan
that
user
is
on.
Then
it's
going
to
be
it.
A
We,
it
will
need
to
behave
differently
in
on-premises
versus
on
github.com,
and
what
should
we
do
with
companies
that
are
huge
consumers
of
ci,
but
are
on
premises
like
in
in
their
case?
Is
that
like
fine
to
remove
data
or
move
it
somewhere
or
like
you
know,
and
suddenly
it
becomes
a
very
big
and
complex
discussion
about
how
to
model
even
such
a
simple
thing,
as
data
data
retention
for
c
builds
and
and
the
other
I
feel
like
it,
we
cannot
really
remove
or
move
anything
without
having
discussions
like
that.
A
Like
it's
not
expected
to
remove
data
from
database
for
users
running
their
gitlab
installations
on
premises
right,
they
don't,
they
might
not
even
have
object,
storage
configured
because
I'm
not
even
sure
that's
actually
a
required
setting
right
now
so
and
then,
like
is
the
object,
storage
the
place
we
should
move
data
to,
or
should
we
have
a
document
database
for
example,
or
something
like
that
I
can
now
we
start
discussion
about
introducing
a
separate
database
because,
in
my
opinion,
like
object,
storage
is
great,
but
it
might
not
be
the
best
database
out
there.
B
I
just
had
a
similar
thought:
is
it
right
to
think
that
the
archived
builds
data
is
read
only
then
so?
This
is
something
that
never
changes.
Yes
after
it's
been
archived.
Yes,
yes,
so
in
a
sense,
you
could
even
even
argue
that
this
is
sort
of
a
analytical
approach,
or
you
know
we
talked
about
similar
problems
when
you
have
an
analytical
queries,
analytical
data.
B
This
is
mostly
data
that
is
read-only
and
you
access
that
in
a
different
way
and
sort
of
we're
also
arguing
that.
Well,
maybe
we
should
model
for
that
right.
We
should
maybe
have
a
different
different
approach
than
we
are
used
to
with
with
the
application
that
has
has
data
that
is
rewrite
and
changes
all
the
time
which
eventually
goes
to
like
hey.
Do
we
actually
need
a
different
database
to
support
that
better,
even
though
postgres
is
really
good
at
that
too?.
A
Yeah,
so
that
that's
really
interesting,
so
I
I
just
wonder
if
one
action
point
after
this
meeting
could
be
scheduling
a
call
with
a
product
team
member
to
actually
discuss
expectations
around
data
retention
and
archiving
old
builds
like
because
you
know
engineers
we
might
be
surprised
by
what
the
requirements
are.
We
do
not,
you
know,
know
or
understand
all
the
usage
patterns
of
how
big
customers
are
using.
A: There are things that might be very important for them, for example using the API to get data out of GitLab to draw charts about their CI usage: how many failed or successful builds there are, and things like that. So again, it's kind of surprising how important the feedback from the product team might be on this.
B: It's probably going to be a lot. I think that's a really interesting topic, and we're only starting to dig in and understand it.
A: What do you think about rescheduling this call for next week at the same time? There is this agenda; I will link it to the meeting. Perhaps I will have more questions, perhaps you will have more questions, and perhaps I will gather some feedback from the product team, which might actually be interesting.
A
Okay.
So
thank
you
very
much.
I
would
like
to
you
know.
Thank
you
a
lot
for
joining
the
call.
I
think
it
was
very
useful,
like
this
is
a
complex
topic
and
we
need
to
you
know,
align
our
perspectives
and
expectations
and
hopes,
and
eventually
we
might
be
able
to
actually
devise
a
good
strategy
because
I
still
feel
like
my
idea
about
the
separate
table
is
just
an
idea
and
I'm
not
a
database
expert,
and
I
don't
pretend
to
be
so.
B
Yeah-
and
it's
also
the
other
way
around
for
sure
this
is
kind
of
kind
of
the
typical
problem
that
we
have
as
a
database
group.
We
know
very
little
compared
to
you
about
about
ci
domain
and
how
it
works
and
let
alone
any
product
considerations.
So
we
can,
you
know,
I
think
we
understand
our
partitioning
works
or
you
know
how
to
look
at
that
data
model
and
maybe
how
to
approach
it,
but
we
wouldn't
be
able
to
talk
about
anything
cfci
related
without
that
input.
So
that's
why
it's
great.
A
Thank
you
will
schedule
the
next
call
and
thank
you
very
much
and
have
a
great
day.