►
From YouTube: 2019-08-09 - Database query optimization techniques: UNION push down and TOP-K with index
Description
We optimize a query to retrieve TOP-100 projects for a user, bringing down query runtime from 7s to less than 20ms.
Context: https://gitlab.com/gitlab-org/gitlab-ce/issues/65854
The tricks applied are:
1. UNION push down instead of OR
2. ORDER BY, LIMIT push down into UNION branches
3. Partial index to efficiently retrieve TOP-K by ORDER BY criteria
A
All
right,
so
I
would
like
to
quickly
record
with
you
about
a
query
optimization,
because
it
is
rather
interesting
to
see
those
like
tricks
that
can
be
applied,
and
it's
also
quite
an
interesting
query.
So,
let's,
let's
talk
right
into
that
came
from
looking
at
from
a
at
a
checkup
report
that
we
got
from
a
production
database
once
a
week
and
it
basically
looks
at
its
30
minutes
interval
and
records
all
the
all.
A
So
it
is
actually
fairly
fairly
standard.
Query
we
like
to
retrieve
projects
for
a
particular
user.
That's
my
my
user
ID
in
this
case
and
we'd
like
to
retrieve
the
private
ones,
are
the
ones
that
are
like
that
I'm
authorized
to
see,
and
in
addition
to
that,
the
ones
that
are
in
public.
So
basically
when
you're
it
could
lock
on
those
are
quite
a
lot
of
products,
because
we
have
a
lot
of
public
projects.
A
The
way
we
do,
that
is
we
basically
or
those
two
conditions
additionally
we're
looking
at
or
we
want
to
exclude
the
products
that
are
an
Englishman
and
the
caveat
here
is
that
we
also
ordered
by
created
at
an
ID
which
makes
me
think
that
comes
from
an
API
call
but
I'm
not
totally
sure,
but
and
we're
looking
at
the
top
top
100
projects
by
that
sort.
Water.
Basically,
and
what
happens
when
you,
when
you
run
that
currently
when
it
takes
like
six,
seven
or
eight
hours,
not
hours
seconds?
A
A
A
So
what
we
do
is
we
just
take
a
hundred
private
projects
and
one
hundred
public
ones,
and
we
make
sure
that
the
order
buy
is
right
because
you
want
to
you
want
to
take
the
most
recent
ones
by
create
that
time
stem
and
then
IB,
and
we
do
that
in
both
branches
here.
So
we
end
up
by
selecting
from
the
Union.
A
We
end
up
with
a
maximum
of
200
projects
here
and
actually
that
makes
it
simple
to
you
know,
sort
them
on
the
fly,
so
we
would
sort
them
both
created
at
the
my
ID
once
again,
because
now
we
have
this
common
set
of
like
both
private
and
public
ones.
We
sort
them
again
on
the
fly
and
then
we
take
the
top
hundred,
but
there
is
only
very
little
data
to
sort
by
so
that
is
quite
efficient.
A
A
Spend
most
of
the
time
for
this
curry,
so
the
curry
took
about
2.7
seconds
and
we've
spent
2.5
and
actually
scanning
this
index,
and
we
read
a
lot
of
data
from
that
forum
there
and
we
we
have
two.
We
have
an
index
for
the
visibility
level
here,
so
that
is
that
a
sufficient
to
retrieve,
but
the
data
we're
getting
back
is
quite
quite
a
lot
and
only
once
we
have
all
that
data.
A
We
do
the
sorting
this
is
here,
so
the
sorting
doesn't
come
from
the
index,
but
it's
it's
performed
on
the
fly
and
then
we
apply
the
limit.
So
we've
first
retrieve
all
that
one
point:
oh,
what's
that
one
point
four
million
rows
from
like
for
me:
we
do
the
sorting
on
the
fly
and
then
we
applied
a
limit
and
take
the
top
100
ones
right,
and
this
is
actually
I
think
the
major
optimization
that
we
can
do
here.
A
We
can
create
an
index
that
supports
both
the
condition
that
we're
looking
at
and
the
ordering
and
the
way
to
do
that
is
creating
a
partial
index
for
the
condition.
So
this
translates
to
the
same
condition
that
we're
using
in
the
query,
visibility,
level
in
1020's
or
internal
or
public
and
pending
delete
is
false.
This
is
the
partial
condition,
and
now
we
index
columns
that
directly
relate
to
the
columns
that
we
order
by.
A
So,
basically
you
can
go
to
the
index.
You
can
retrieve
the
first
hundred
curry's
and
you
don't
have
to
do
any
sorting
at
all
and
primarily
you
don't
have
to
retrieve
the
full
index
right.
So
you
only
take
the
top
hundred
ones
from
the
index
and
that's
very
efficient
to
do
so.
When
we
run
that
query
once
again,
hopefully
it's
very
fast-
and
it
actually
is
like
goes
down
to
18
milliseconds,
and
when
we
look
at
how
this
changed
the
plan.
A
That's
in
here
the
system
new
index
that
we
created,
we
do
index
only
scan,
which
is
also
very
efficient,
because
we
only
read
from
the
index,
and
this
is
actually
what
we
just
talked
about.
We
retrieved
the
first
yeah.
That's
the
first
hundred
rows
from
from
the
index.
We
do.
We
don't
do
any
sorting
on
that,
because
we
know
the
reading
from
that
from
that
index
will
give
us
the
expected
sort
order.
A
That
was
to
push
down
the
order
by
and
limit
losses
to
those
individual
branches
of
the
Union
query
and
then
provide
an
index
for
one
of
them
to
make
sure
that
retrieving
the
top
hundred
by
this
order
becomes
very
efficient,
and
by
doing
that,
we
were
able
to
speed
the
curry
up
from
I
think
initially,
it
was
about
six
some
seven
seconds
about
that
down
to
like
twenty
milliseconds.
That's
quite
good
and
does
those
tricks
are
quite
universal.
A
It
looks
quite
familiar,
so
I
mean
retrieving
projects
or
checking
project.
Authorization
is
something
that
we
do
in
a
lot
of
places.
So
hopefully
we
can
find
all
of
them
and
make
sure
that
we
use
the
better
create
or
the
optimized
version
for
that
there
I
hope
that
was
kind
of
interesting
I'm,
always
looking
for
feedback,
so
feel
free
to
comment
somewhere.
Where,
if
sent
me
a
message
once
back,
Thanks.