From YouTube: Review of plan to partition events table
A: Okay, cool. So basically the root issue is that the events table is very, very big. It's about one terabyte, I think, so quite big. Let me try and find this one... okay. So in this issue we have all the data about this table. It's used in user activity, project activity, group activity, contributions, and analytics. Those are the few places it's used, and it looks roughly like this.
A: So the main use case is in this calendar here, the activity calendar. If you click on one of the cells, it will show you the activity for that day, and we only show one user's data. You can see later why this is important. Basically it's all date-based, and we only show about one year of data, I think. If you keep scrolling you can see more, maybe, but it's really hard to scroll through one user's data. So that's the events table.
A: There's a Sidekiq background worker that removes old data, data that's three years old, but it's very inefficient. It only removes 10,000 events at a time; it would take about 20 years to remove all the data.
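A cleanup worker like the one described might look like the following batched delete — a sketch only, since the real worker's query isn't shown in the video and the table and column names are assumed:

```sql
-- Hypothetical sketch of the batched cleanup: remove 10,000
-- three-year-old events per run. At billions of rows, a cadence
-- of a few runs per day works out to decades of total runtime,
-- which is the inefficiency being described.
DELETE FROM events
WHERE id IN (
    SELECT id
    FROM events
    WHERE created_at < now() - interval '3 years'
    ORDER BY id
    LIMIT 10000
);
```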
A: No, we just delete it. That was a decision that was made years ago, back in the very early startup days.
A: It's all one year — we only show one year of data here, right. So yeah, that's the idea anyway.
A: So anyway, because this table is quite large, we want to figure out how to reduce its size, because a very big table like this is very hard to write to. It also costs money to store — a few thousand dollars, mind you, so it's not trivial — and it's also very slow to read now as well.
A: I think the crazy thing is that it's one terabyte, but the data itself is only 100 gigabytes; because it's so slow to read, we need like ten times that amount of indexes just to read it efficiently. So one of the great things about — well, how much do you know about partitioning?
B: I know the high level — it's a Postgres feature, right?

A: Yes, yes. You can also implement it at the application level, but I think we use Postgres.
B: So you set up some rule —

A: Yes, and the most common use is something related to a timestamp or a date.
B: Right, and then it will create these tables, but they are transparent to the application.

A: Yes, so it is basically — how do you say — like a view; the main table will still be there.

B: I think it will be renamed and then a new table will be added.

A: Yes.
A: Right, yeah, correct. It's been available since Postgres 10, I think, and they've progressively added more and more features, so by now it's pretty complete, which is very nice. I think when we first introduced it, it didn't support foreign keys and things like that, which made it really hard to use.
A
Oh,
yes,
that's
the
idea,
so
so
the
most
obvious
solution
for
you
for
the
events
table
is
to
partition
it
so
behind
the
sensors,
we'll
have
events
for
so
the
events
partition
for
December
2023
or
the
events
partition
for
January,
February
and
so
on
so
by
by
month
by
year.
It's
not
enough
because
if
we
split
one
terabyte
by
by
five
years,
say
it's
too
too
big.
So
the
the
part
each
partition
is
still
too
big
and
then.
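The monthly scheme being described maps directly onto Postgres declarative partitioning. A minimal sketch, assuming illustrative column names (the real events schema isn't shown):

```sql
-- Parent table partitioned by range on created_at; note the
-- partition key must be part of the primary key.
CREATE TABLE events_partitioned (
    id         bigint      NOT NULL,
    author_id  bigint      NOT NULL,
    action     smallint    NOT NULL,
    created_at timestamptz NOT NULL,
    PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);

-- One partition per month, e.g. December 2023 and January 2024:
CREATE TABLE events_202312 PARTITION OF events_partitioned
    FOR VALUES FROM ('2023-12-01') TO ('2024-01-01');
CREATE TABLE events_202401 PARTITION OF events_partitioned
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
```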
B: And it's growing — there will be more events.
A: So who knows. And then there is a very helpful guide that the database team has created, so I am mostly following that guide. It has steps one, two, three, four, and there are correspondingly two issues that I've created: one issue for steps one and two, and then the other issue for the rest.
A: Yeah, there's a lot of subtleties here. Oh yes, let's talk about other things — other things we could try besides partitioning. One is to somehow drop some of these indexes, because 80% of the table is indexes, but I think we need all the indexes, so... well.
B: For example — where your cursor is now, one line up, the one with the long name, this one.
A: We'd still have hundreds of gigabytes, so yeah, those are some of the things I've considered. The other thing, which is a lot more drastic: this is a polymorphic table — it actually stores like ten different types of events. Right — it stores push events, it stores project events, like I create a project, or I create a... it stores project membership events, right. So they all have slightly different logic.
A: It doesn't give you that much, because the access pattern is across all these target types, right. We just show all events for a given user or a given project.
A: Yeah — if you wanted to build that query, it would be like: you need this one, you need that one, a UNION, and then an ORDER BY created_at — kind of expensive. I mean, we could do something like that; we'd just need to change the UI/UX to something different. I think this doesn't show — this isn't shown here.
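The expensive cross-type query being gestured at would look roughly like this, assuming the events had been split into hypothetical per-type tables:

```sql
-- Hypothetical: with per-type tables, a combined activity feed
-- needs a UNION of every type plus a global sort — the expensive
-- part is the ORDER BY across the combined result.
SELECT id, author_id, created_at FROM push_events    WHERE author_id = 42
UNION ALL
SELECT id, author_id, created_at FROM project_events WHERE author_id = 42
ORDER BY created_at DESC
LIMIT 20;
```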
A
But
if
you
go
to
say
the
project
activity
close
set
for
no
reason
you,
you
start
to
see
that
we
actually
have
like
different
English
status,
show
by
Stephen
event
types
already,
but
then
there's
this
passkey
or
thing.
B: Another read-performance optimization is to denormalize into tables based on the usage patterns. So you would have project events, maybe, that are accessed using the project ID, and you could have another table like user events or something that is used for the my-activity feed. But that is actually duplicating data, because an event can be related to different...
B: That is something that could be done, but it will basically triple your storage demand. And —
A: ...to use ClickHouse, but ClickHouse is not available for self-managed yet, so...
A: The medium-term solution is to partition it, as far as I can see, but there is still a long-term problem: if your disk keeps growing, I think even partitioning doesn't keep up, right. So at some point you need to start removing data, or archiving data, or something like that.
A: Yeah — and when we need to delete it, or move a partition somewhere else, it's very easy, right. Yeah.
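This is the retention payoff of partitioning: removing a month of old data becomes a cheap metadata operation instead of a years-long batched delete. A sketch with hypothetical partition names:

```sql
-- Detach the expired month (optionally archive it first), then drop it.
ALTER TABLE events_partitioned DETACH PARTITION events_202312;
DROP TABLE events_202312;
```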
A: Okay, so what I've done is create issues for this. Firstly — the way the guide recommends doing it is that we create a copy of the table, and the copy is already partitioned. So we set up the partitioning, and there is Rails code to automatically create the required partitions. Then the second step is to copy the data from the old table to the partitioned events table.
A: That will take 40 days instead — who knows — which is nice.
B: Could we also — since at some point we will delete the old data — only create partitions for newly added data?
A: While the backfill is going on, I think it also creates a trigger so that whenever data is inserted —
A: — it copies it over as well, yeah. So it does that, and then the second thing — this is step three — is to clean up, to just throw away the background migration, and then step four is to basically swap the two tables around. So that's the thing.
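The insert-sync mechanism described — keeping the partitioned copy up to date while the backfill runs — is typically a trigger on the old table. A sketch with hypothetical names:

```sql
-- Mirror new writes on the legacy table into the partitioned copy
-- while the backfill catches up on historical rows.
CREATE FUNCTION sync_events_to_partitioned() RETURNS trigger AS $$
BEGIN
    INSERT INTO events_partitioned (id, author_id, action, created_at)
    VALUES (NEW.id, NEW.author_id, NEW.action, NEW.created_at)
    ON CONFLICT DO NOTHING;  -- the backfill may already have copied it
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER events_sync
AFTER INSERT ON events
FOR EACH ROW EXECUTE FUNCTION sync_events_to_partitioned();
```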
A
Okay,
let's
go
back
to
here.
So
the
problem,
the
thing
with
partitions
the
trade
over
partitions
is
that
yes,
it's
good,
but
if
you
want
to
query
data
across
two
partitions
say:
I
want
data
from
January
and
I
want
data
from
February.
It's
very
inefficient
to
do
that.
A
So,
ideally,
every
single
query
that
we
have
has
the
created
ad
thing
inside
the
inside
the
query
or
what
I
call
the
partition
key.
A
So
there's
a
few
ways
of
getting
all
the
queries
that
the
application
generates
for
this,
and
it
mostly
has
the
Creator
add
thing,
but
so,
but
not
everything.
So
one
of
the
one
of
the
issues
that
I've
created
is
to
go
change.
All
the
queries
to
to
having
to
create
it
at
inside.
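The change being described is adding the partition key to each query's predicate so the planner can skip irrelevant partitions. An illustrative before and after, with hypothetical names:

```sql
-- Before: no created_at predicate, so every partition is scanned.
SELECT * FROM events WHERE author_id = 42;

-- After: the created_at bound lets the planner touch only the
-- partitions overlapping the last year.
SELECT * FROM events
WHERE author_id = 42
  AND created_at >= now() - interval '1 year';
```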
B: Yeah, and in the past I've also used an index on a function — you can do that in Postgres — because timestamp indexes are huge: almost all values in the created_at column are unique, you know, there are microseconds.
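The functional index B mentions can be built in Postgres as an expression index on the date; a sketch (the timezone cast keeps the expression immutable, which Postgres requires for indexed expressions):

```sql
-- Day-granularity values repeat heavily, so this index is far
-- smaller than one on the raw microsecond timestamp.
CREATE INDEX events_created_date_idx
    ON events (((created_at AT TIME ZONE 'UTC')::date));

-- Queries must use the same expression to hit the index:
SELECT *
FROM events
WHERE (created_at AT TIME ZONE 'UTC')::date = DATE '2024-01-15';
```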
B: That index is updated when you insert or update a record, and then I could query it — a very small index that would give me everything for a certain date, and then I would narrow it down on something else, right.
B: The index becomes — yes — smaller and faster. Wow.
A: Yes, that is true. I think the good news is that Postgres does this automatically for us, which is really nice. I'm not sure about indexes, though, because I think you need to have the partition key as part of your primary key. So —
A: So your primary key now becomes (id, created_at), so it's already part of the primary key index. But I think what Postgres does is take the created_at and, in the background, do something like an index — but it's not really an index; I can't remember what they call it. It converts that into, like, the hash of the partition that it's looking for.
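The behavior described sounds like partition pruning: the planner compares the query's created_at predicate against each partition's bounds and skips the rest, with no extra index involved. It can be observed with EXPLAIN (hypothetical names again):

```sql
-- With pruning, only the partition whose range overlaps the
-- predicate appears in the plan (here, the January partition).
EXPLAIN
SELECT * FROM events_partitioned
WHERE created_at >= '2024-01-01' AND created_at < '2024-02-01';

SHOW enable_partition_pruning;  -- on by default since Postgres 11
```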
A: So we don't need special indexes to find the right partition, if you get what I mean. We don't need to create an additional index for that — Postgres naturally does that already, which is super nice.
A
The
good
news
about
all
these
queries
is
that
I
think
a
lot
of
queries
already
go
order
by
events.id,
so
it
should
be
quite
easy
to.
Hopefully
it's
quite
easy
to
convert
the
order
by
to
autobuy
e
events.created
at
comma
events.id.
A: That query — oh yeah, that's such a small link here. I should probably update this, because it's not very clear. Okay, there we go.
A
Okay
you're,
following
so
far:
okay.
B
A
A: So there is another table whose foreign key refers to the events table, but that one's only 200 gigabytes, so let's not talk about that.
A
Let's
talk
about
the
complex,
the
the
last
complexity
here,
which
is
that
they
we
have
a
Unix
and
unique
index
on
on
the
events
table,
but
a
unique
index
doesn't
have
created
at
so,
which
means
that
postgres
will
not
allow
this
unique
index
when
we
partition
it.
It's
just
part
of
the
rules,
I
guess
it's
too
expensive
to
have
a
weird
unique
index
like
that.
So
I
think
we
we
can't
drop
it
because
we
will
still
need
it.
A
So
I
think
what
we
need
to
do
instead
is
to
do
this,
which
is
move,
move
some
of
the
data
into
a
normalized
table,
and
then
we
can
keep
querying
that
data.
It's
I
think
this
is
this
system
the
most
complicated
bit,
I
think
the
other
bits
are
slow,
but
yeah,
but
it's
too
manageable.
This
one
I,
don't
know!
B: On this one — is this to guarantee uniqueness?
A
I,
don't
know
what
it
does
exactly
like.
Yeah,
maybe
I
think
it's
because
like
Wiki
Pages
like
when
we
create
a
Wiki
page
it
it
also
checks
into
repo,
and
it
does
it's
slightly
differently.
So
yeah
I'm
not
sure.
A
Okay,
anyway,
any
what.
B
Yeah
and
are
there
any
since
we
will
create
a
new
table
for
the
events?
Yes,
are
there
any
other
opportunities
for?
Because
now
we
start
with
a
new
table,
there's
also
a
chance
to
maybe
optimize
something.
A
Yes,
so
we
can
reorder
the
we
can
reorder
the
column
order,
yeah,
so
I've
linked
I've
put
it
as
a
note
in
one
of
the
the
issues
when
we
create
a
copy.
So
the
current
Table
order
is
not
not
efficient.
It's
it's
waste.
It
weighs
24
bytes
per
row,
so
24
bytes
times,
3
billion.
A: It's a few gigabytes. So if we can save those 24 bytes, we can save 14 gigabytes — so yeah, there we go. We should definitely do that.
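The waste comes from alignment padding: Postgres aligns 8-byte values on 8-byte boundaries, so interleaving narrow and wide columns leaves gaps. A hedged illustration (not the real events schema):

```sql
-- Padded layout: each smallint (2 bytes) is followed by a bigint
-- that must start on an 8-byte boundary, so 6 bytes are wasted
-- after each smallint.
CREATE TABLE events_padded (
    action     smallint,  -- 2 bytes + 6 bytes padding
    author_id  bigint,    -- 8 bytes
    visibility smallint,  -- 2 bytes + 6 bytes padding
    project_id bigint     -- 8 bytes
);

-- Repacked layout: wide columns first, narrow columns together,
-- so the padding largely disappears.
CREATE TABLE events_packed (
    author_id  bigint,
    project_id bigint,
    action     smallint,
    visibility smallint
);
```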
A
Yeah
yeah:
do
we
want
to
do
that.
B: And I would only do that if Rails' polymorphic associations can support changing it to an enum or something like that.
A
Yeah,
you
can
you
can
I
promise
I,
don't
know
how
how
the
backfield
works,
because
you
know
how
to
do
transformation
when
you
do
backfill,
because
the
new
table
is
going
to
have
a
different
data
type
and
then
the
old
table
is
going
to
have
Yeah
a
different
data
type,
so
I
don't
know
how
to
backtrack.
In
that
sense,
yeah.
B
Yeah,
it's
mostly
storage,
I'm
thinking
about
yeah
yeah.