From YouTube: 2021-04-19 Database Scalability Working Group
A
So we can go ahead and get started. This is the Database Scalability Working Group; today is April 19th.

A
I have agenda item 1a, which is basically: the db-12 upgrade from this weekend was delayed, so we're shifting responsibilities around, and I have an MR there, basically to take over facilitator responsibilities, at least for the next month. Eric, I think I assigned you as a reviewer, though; if you want to merge it, feel free, or give me an approval, because I think that's what you were asking me for on Friday.
B
Sure, yeah, I'll get that merged for you. Thanks.
B
Yeah, from the daily stand-up that we do immediately before this meeting, I've got a question about how much time we think we just bought ourselves. The consensus right now is that in about six months, or October 15th, we're sort of out of database capacity again. So we have some ways to update that estimate.
B
But really, you know, time for analysis is kind of done, and we have to go with our best guess to make sure we focus the majority of ourselves on execution. So it feels like we should be launching the first aspect of sharding by the end of this upcoming quarter, which means July 31st, and that could mean a number of different things.
B
We have to figure that out as quickly as possible. It could be a sharding shim that is aware of only one shard but does no harm, and that would then allow us to multi-shard after it. It might be something that relieves some pressure off the namespaces table, or something. But it feels like that's an ambitious date to shoot for, and then, working back from that, the question would be: could we actually pick our sharding key?
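The "shim aware of only one shard" idea can be pictured with a short sketch. This is purely illustrative Python, not GitLab code; the names `ShardRouter` and `shard_for` are hypothetical:

```python
# Illustrative sketch of a "do no harm" sharding shim: it is aware of
# exactly one shard today, so every lookup resolves to the same database,
# but callers already go through a routing interface that can later map
# keys across many shards. All names here are hypothetical.

class ShardRouter:
    def __init__(self, shards):
        # shards: ordered list of connection names; today it has one entry.
        self.shards = shards

    def shard_for(self, sharding_key):
        # With a single shard this is a constant function (does no harm);
        # with N shards the same call becomes a real key-to-shard mapping.
        return self.shards[hash(sharding_key) % len(self.shards)]

# Today: every sharding key routes to the single existing database.
router = ShardRouter(["primary"])
assert router.shard_for("some-top-level-group") == "primary"

# Later: the same call sites keep working once more shards exist.
future = ShardRouter(["shard-0", "shard-1", "shard-2"])
assert future.shard_for("some-top-level-group") in {"shard-0", "shard-1", "shard-2"}
```

The point of such a shim is that application call sites adopt the routing interface before a second shard exists, so adding shards later is a data move rather than a code rewrite.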
C
Yes, I guess, like I mentioned, I see two main ways we can approach our database scalability program; I'm just not sure which one of them is easier and faster to execute. I mean, ultimately we likely need both of them to be implemented.
C
As for the timeline and sharding key, my perception is that I don't want to rule out any of the approaches, so I think committing now may be premature. But my goal is to iterate as quickly as possible, like in the next weeks — we can define that to be four weeks or whatever other number.
C
Also, how GitLab would behave in these different approaches, to actually validate the most sensible approach. Once we have that, I guess it's gonna be much easier to also figure out iterations, because, like you proposed, by the end of Q2 — if we pick one of these approaches, at least for the application sharding, I'd probably have some good iterations that would allow us to relieve the pressure. But I think one of the challenges right now is:
C
There is a lot of complexity related to how we shard our application and, to be honest, I don't know if the namespace approach is gonna be the easiest to do. I don't know how it turned out for others who did this. So I guess my perception is still that it appears the most sensible, but I want to validate them quickly, and we're gonna have people starting — basically, they are starting this week.
C
— on how we can use database-level partitioning and how we can use application-level partitioning, because each of them has a lot of associated complexities. And then maybe we will be able to very quickly show how GitLab actually runs on these approaches, so we could actually test that and so on, and get to understanding which of them is gonna be easiest to implement and gonna give us the best iteration.
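The two options being compared — database-level partitioning versus application-level sharding — can be sketched roughly as below. Both snippets are hypothetical illustrations (the table, the `DATABASES` map, and `database_for` are made up for this example), not GitLab's actual schema:

```python
# 1) Database-level partitioning: one logical table; the database engine
#    itself routes rows to partitions. PostgreSQL declarative DDL:
DATABASE_LEVEL_DDL = """
CREATE TABLE audit_events (
    id         bigserial,
    created_at timestamptz NOT NULL
) PARTITION BY RANGE (created_at);

CREATE TABLE audit_events_2021_04 PARTITION OF audit_events
    FOR VALUES FROM ('2021-04-01') TO ('2021-05-01');
"""

# 2) Application-level sharding: the application decides which physical
#    database a row lives in, e.g. keyed by the top-level group.
DATABASES = {0: "db-alpha", 1: "db-beta"}

def database_for(top_level_group: str) -> str:
    # Deterministic routing: the same group always lands on the same
    # database (a toy byte-sum stands in for a real consistent hash).
    digest = sum(top_level_group.encode())
    return DATABASES[digest % len(DATABASES)]

# The same group name always maps to the same physical database.
assert database_for("gitlab-org") == database_for("gitlab-org")
```

In the first approach the complexity lives in the database (partition management, constraints); in the second it lives in the application (routing, cross-shard queries, rebalancing), which matches the trade-off being discussed here.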
C
So I think that locking ourselves into picking the sharding key so quickly may lock us out of the better approaches — and I don't want to rush on that. I want us to understand how these different approaches connect with each other and pick the one that gives us the most scalability room. Because even if we pick the top-level group right now —
C
If you look at my iterations, we're not gonna migrate data from the existing database right away; it's gonna be a lengthy process. We're still gonna have the problem with the current database. There is a lot of complexity in how we can migrate this data online, which can rely on writing some additional extensions to perform logical replication or similar. So even if we picked top-level sharding today, it may not be the single answer that says we're gonna have headroom in six months.
C
It may mean that in six months we're gonna be able to start writing new information and significantly reduce the amount of new data coming to the current database, but it will not magically move the data from the current database yet. So I want to evaluate these solutions from the perspective of how quickly we can iterate and how easily and reliably we can do it on GitLab.com without introducing more problems, but also how well it answers —
C
— your estimates. Because in six months, I guess, if we continue growing data in the current form, we're gonna hit limits. But if we start sharding by the top-level group and we cannot efficiently move data between shards, by that time we're still gonna hit those limits, and it's not gonna solve that problem.
C
So that's why I kind of mentioned these two alternative proposals. One is database-level partitioning: we know that some tables are big; maybe we can approach these tables to change how they behave slightly, and maybe exactly that, kind of concurrently, gives us this headroom on the current database, while we have this other aspect that we are working on, which is more like the application-level sharding.
C
That's gonna give us a lot of headroom and much more even performance for all of the current GitLab, and also all the new people coming to GitLab in the six months from now. So my perception is that it's very likely there is no single solution for that problem. I think we need to figure out which of these solutions gives us the most headroom in that given time frame that we mentioned.
B
So I think the counter-argument is: one, we can't allow analysis and execution to go past that date where we think we're going to run out of capacity. So we can base the schedule off of what we think the most responsible way to make the decision is, but at the end of the day we have to fit this in the time box. So there has to be a solvable solution that fits in the time and resources and people we have available.
B
Two, the outside feedback we've got from, for example, Postgres.ai — and we have functional decomposition and sharding in our glossary — is that the big companies they work with do both. Just pick one and do it as quickly as possible. And if it's about plan A versus plan B, and both are better than where we are today, then pick one and do it, and maybe we need to do the other plan after the fact. But we can't —
B
I can't allow the scenario where we analyze and we wait and we delay and we run out of capacity again. So we need a bias for action. So, if you can, Camille, I want to understand: in the world you're putting forward, what is the work-back schedule? When do you think that decision is made and can be executed against? I'm open to seeing that.
C
Okay. I'm hoping that we can really make this decision in the next four weeks, basically, and stick to it, because I assume that's gonna be enough time to find all the complexities of the tools that we need and to reach a reasonable understanding of how the situation should look. Going back to your question —
C
I think it may be even more sensible to do both of them at the same time, but I'm not sure if we can do both of them with the same group. We cannot — but the CI DBs are something that needs to be worked on anyway by the CI team. So maybe the other aspect is:
C
So maybe, instead of doing that in sequence, maybe we should enforce this on ourselves — that we actually need to work on them right now. And then, as part of the sharding group, we can focus on how we can shard, maybe by the top-level namespace, but then the other group can focus on how we can partition.
B
I was going to say, I like that plan: yes, we're doing some analysis, but in parallel we're starting to execute one or maybe two other tracks. That feels much more comfortable. And if we happen to choose the wrong track based on our instincts now, the analysis can tell us a month from now — but we're not waiting, we're not pausing on taking action. That feels a lot more comfortable to me.
A
And just to be explicit, Camille, when you were talking about CI: if there's a particular area or group that we need to basically tell, "hey, part of your requirements now is to go and work on this," we don't have to have the sharding team working on that — we could have that team working in parallel to it. That is totally open for us to do, if we need to, with CI as an example.
C
So I guess the sooner we start, the faster we're gonna have much more headroom. And my biggest worry about the application sharding is that I just see a lot of complexities, and I just worry that we say it's gonna reduce the pressure on the GitLab.com main database in six months — it may reduce it, but it's not gonna —
C
— be immediate, because there's always a lot of back and forth and moving data around to rebalance, and rebalancing is a very important aspect, and a really risky thing to do on a living database that is constantly being written to. So I think, with the application sharding —
C
If this is the approach that we pick, we definitely can get to the point in six months where, very likely, the new projects — the new top-level groups — will be written to the new database. But it still doesn't resolve the problem of the current database and its sheer size; that's still not gonna go away. So whatever solution we take to make this problem smaller today, during this time frame, is gonna give us more headroom from both sides, basically.
D
I don't believe that the 12-day time frame is enough to do that, but I do think that we need to reach these decisions as quickly as we can, and as iteratively as we can, because I completely agree: we could probably analyze this for another six months and not actually get anywhere. I don't believe that is the intention. Camille has already produced a couple of proposals that will allow us to move in that direction.
D
I think the team that can actually execute on them is being formed this week, and I think we will work as fast as possible to make sure that happens. The other thing to highlight here — I think Christopher said this as well — is that there may be other dimensions that can happen in parallel, right? Sharding is one of the scaling patterns, right?
D
Yeah — and Camille, correct me if I'm wrong, but I think that is also the approach that we're going to go for, right? We have approaches that we want to evaluate and understand, and those will then influence the decision that we're going to make.
C
Second, GitLab is very, very complex in its data model, and not all of the approaches will fit — we're gonna need to discover exactly which parts of the application would be broken. And third, we need to actually have something that is easy for people to use later, knowing that sharding is being used, which is also part of our ability to iterate. So that's why —
C
We need to model these to understand how complex they are and what problems they have, and ensure that we understand very well exactly how they fit into the current GitLab.
C
I mean, I'm kind of for it. There is sharding on the application level; I know that different companies do it differently. I have my personal bias on the solution, but I also want to ensure that others can provide some insightful feedback on it. So distributing this prototyping, which would actually give us different perceptions of how different people see these approaches — and maybe someone is gonna propose something much better — I think that's essential.
C
I'd really like to time-box that, to not make us do it for months. I think it's reasonable to say that we spend four weeks on this; at the end of that time, we stop whatever we're doing, and we just pick and stick with the best solution out of that, to continue actually implementing it. So I guess my perception is to time-box that very heavily.
B
So Sid made a concrete proposal, though I'm not quite hearing a yes or no in answer to this question. I think his proposal is: go with our instincts now, rather than the output of any to-be-determined analysis — start prototyping the namespace thing, and then do the analysis. And maybe the analysis tells us it's something else, and then we change the prototype, or we add a prototype thread or something like that. But why not start tomorrow on the namespace prototype?
A
I think the short answer is that we're trying to build confidence, and whether we'll have confidence in 12 days, or whether we build towards that confidence in a month, is what we're talking about here. But if you asked today, "what are we doing?" — we're committed to namespace until we can find something that breaks us there. That's another way to say it, from that objective. But that's —
A
Moving on: Sid had a question about a follow-up on Elasticsearch. The team did a short write-up associated with it, and I just kind of put in my read on it. This decision was made largely three years ago, it looks like — or two and a half years ago. It didn't necessarily have great reasoning in the merge request, but, you know, the answer that I've heard is that it was the smallest unit. So then it was basically the bin-packing problem.
A
If you have smaller units, then in theory you can pack better. And the other aspect that I saw was that Elasticsearch is not a code change as much as it's a configuration change — from my perspective, though I know it's not that trivial.
E
Okay. The issue created a week ago gives two questions, and both relate to GitLab region.
E
I'm missing the kind of approach — and maybe that's missing for a good reason, and I only read the description — but I assume that if we now do per-project, then if you do a query across a group, you have to combine these results in GitLab the application, and that will give you lower-fidelity results than if they had been combined in Elasticsearch.
E
Yeah, but it's also now certainly a lot less important to change it, because I think I understood this as sharding between Elasticsearch clusters, but I understand now that it's sharding between Elasticsearch and something else.
F
To clarify: right now we only run one cluster. So if we are going to change to a multiple-cluster approach, then we have to do work at the application level. Now, the topic is not new — you created that issue like two years ago, and the team also had a similar issue created a few months back to investigate whether sharding per namespace will get us better search performance, especially on group-level search.
F
Yeah, so we have been working on performance from a different perspective. For example, the ongoing performance work we have been doing right now is to split the data not by namespace but by the data type. Before, we had everything in one single index; now we have it split into code, issues, merge requests, notes. So after we separated this — let's say, if you search —
F
Yeah, still one cluster, but issue search, even at the group level, is much faster than before. And just to quote: this morning we received feedback from one customer — I think it's a big customer that I won't name, because this is going to be a public recording — saying that after they re-indexed their whole cluster, it is working as intended: searches are fast, latency is way better now, and everything has improved overall. So that's just an example of some good results from the recent performance improvements on search.
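The index split described here — one monolithic index becoming per-data-type indices — comes down to routing each document type, and each search, to a smaller type-specific index. A minimal sketch, with hypothetical index names:

```python
# Illustrative only: route documents and searches to per-type indices
# instead of one monolithic index. Index names here are hypothetical.

MONOLITHIC_INDEX = "gitlab-production"

PER_TYPE_INDICES = {
    "code": "gitlab-production-code",
    "issue": "gitlab-production-issues",
    "merge_request": "gitlab-production-merge_requests",
    "note": "gitlab-production-notes",
}

def index_for(doc_type: str) -> str:
    # A group-level issue search now scans only the (much smaller)
    # issues index rather than every document type at once.
    return PER_TYPE_INDICES.get(doc_type, MONOLITHIC_INDEX)
```

Note this is still a single Elasticsearch cluster, as clarified above; only the index layout changes, which is why group-level issue search can get faster without any cross-cluster sharding.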
E
Okay, thanks for that — my misunderstanding. Then there's no need to re-evaluate the sharding key or anything like that; this is all fine. I didn't understand that it was still the same cluster; I thought they were separate clusters. Thanks.
A
So, in the interest of time — I want to be respectful — I'm going to jump past what's been done, because I think those items are mostly to read. Eric, I want to get to your comment on 4c, just kind of organizationally: did you want to move to a flat structure?
B
Yeah, I noticed it last meeting, and in past meetings, that when we have this highly opinionated agenda structure, inevitably something important doesn't quite fit into it, or things lead to discussion, as they did today. And so I wonder if just a simple flat list — letting the facilitator be free to determine what the most important stuff is that we should be talking about — tends to work a little bit better than trying to cram your thing in somewhere and then not quite getting to it later, and whatnot.
A
Okay, so we'll move to the flat structure, and I'll make sure to organize it in a fashion that gets the most critical items, hopefully, at the front — though I'm looking for feedback from Sid and Eric on whether something needs to potentially be reorganized. And then I just want to cover what's happening next. As we mentioned, there's a kickoff happening on Wednesday, Chen.
A
I don't know whether it's worth doing, but it may be worth looking to see if we can potentially move it to tomorrow, just because we do have a Wednesday working-group session, so it seems like that'd be better. And then the other question I had: it feels like the other thing we want to make sure happens —
A
— next is kind of our first prototype plans, and if, right now, it's just namespace, then that's fine, but at least we have to list it. Though I think, Camille, when you're talking about prototype plans, you're also talking about time decay, which we also need to do as well — that's kind of independent of the sharding discussion point. So does that sound reasonable, or do folks have feedback?
E
I'll try to see if we can move that sync meeting to tomorrow morning. I have a conflict with the database team meeting, but we can probably use the same time, because they're both database-related. And on moving this meeting — the sync meeting — to a cadence: we are trying to explore some other alternatives, because the team is distributed.
A
Cool, all right. Do you think we could at least have what we have so far in the prototype plans on Wednesday, just so that we can kind of articulate, "here are the plans"?
A
All right, I don't think there's anything else, unless there are any other questions. All right — thanks.