From YouTube: GitLab Pipeline Caching Showcase
Description
Engineering Productivity showcase highlighting how caches are used in the GitLab pipeline
A: Great, yeah. So the first question was: why is it important? I will start with that, just to show the improvements we made with the changes to our caching strategy. I think we made the changes about a year and a half ago now. With this new caching strategy, we reduced job durations by one to 20 minutes in some cases, and pipeline durations by between six and twenty minutes. So that was quite a lot.
A: Those were our goals at the time. We also saved approximately three thousand dollars of CI machine costs per month, and I don't recall any cache-related problems after that; at least we haven't had a lot, that's for sure. So now, about the strategy itself.
A: The goal with this new strategy was to make it very simple: not to have a lot of cases for when we should purge or update the caches, or what the cache keys should be. As you all know, one of the hardest problems is choosing cache keys, and knowing when to purge caches.
A: The important points of this strategy are that we follow the best practices defined in the GitLab docs. The first point is that every job should be able to pass without any cache, or even with an outdated cache.
A: That means caches are only there to speed up jobs; they are not there to make jobs pass. Otherwise, that's a bug. The second important item, regarding performance, is that all jobs must only pull from the cache and never push. This is just to avoid unnecessary uploads of caches that are, in most cases, identical.
A
So
that's
just
a
waste
of
time
and
bandwidth.
Basically,
if
you
do
that,
and
that's
like
pull
pull
push
so
pulling
and
pushing
from
into
the
cache
is
the
default
behavior.
So
you
need
to
change
that
and
the
goal
of
cache
is
to
avoid
like
restarting
dependencies
every
time,
so
ruby,
gems
nodes
and
go
packages.
A: If you look back at the first point, it also means that jobs must still run the dependency-install commands, because the cache could be empty or outdated. That works well for these package managers, because they handle outdated dependencies.
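A pull-only job of the kind described above could be sketched roughly like this in `.gitlab-ci.yml`. The job name, cache paths, and commands here are illustrative assumptions, not the actual GitLab configuration:

```yaml
# Hypothetical sketch of a pull-only job.
# "policy: pull" overrides the default "pull-push", so the job never uploads.
rspec:
  cache:
    key: ruby-gems          # fixed key, shared by all pipelines
    paths:
      - vendor/ruby
    policy: pull
  script:
    # The install step still runs: the cache may be empty or outdated,
    # and Bundler fetches only what is missing.
    - bundle install --path vendor/ruby
    - bundle exec rspec
```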
A: About the cache keys, we really took the simple route of having fixed cache keys, because that allows all pipelines to use the same cache. We do that because, and it's related to the point above, of what happens when a gem, or a package in general, is updated.
A
If
you
only
update
one
dependency
and
yeah
that
allows
to
limit
the
number
of
caches,
so
the
number
of
combinations,
basically
so,
instead
of
having
one
cache
per
like
and
one
cache
per
branch
but
like-
and
we
don't
have
only
one
cache-
we
have
maybe
10
different
caches,
so
that
would
you
know,
multiply
10
by
number
of
branches,
so
that
would
be
a
waste
of
storage
and
yeah
about
the
updates
of
the
caches.
A
Since
we
don't
update
in
in
jobs
in
general,
we
only
perform
the
update
every
two
hours
in
in
our
regular
scheduled
pipeline.
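The scheduled refresh could look something like this sketch, where a single job in the scheduled pipeline is the only one allowed to upload. The job name, key, and paths are assumptions for illustration:

```yaml
# Hypothetical sketch: only the scheduled pipeline refreshes the cache;
# every other job keeps "policy: pull".
update-ruby-gems-cache:
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule"'
  cache:
    key: ruby-gems
    paths:
      - vendor/ruby
    policy: push            # upload only, from this one job
  script:
    - bundle install --path vendor/ruby
```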
A: We are using the multiple-caches feature, which is quite recent; I think it was introduced this year. It allows us to have atomic caches, each specific to one thing: we have the Ruby cache, we have the Node cache, and then you can define a job cache that combines these two. That also reduces the number of jobs we need to update the caches.
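The multiple-caches feature mentioned above lets a job declare a list of cache entries instead of a single one. A minimal sketch, with assumed job names, keys, and paths:

```yaml
# Hypothetical sketch of the multiple-caches feature: atomic per-ecosystem
# caches, combined in one job (GitLab allows several cache entries per job).
rspec-system:
  cache:
    - key: ruby-gems        # the "Ruby cache"
      paths:
        - vendor/ruby
      policy: pull
    - key: node-modules     # the "Node cache"
      paths:
        - node_modules/
      policy: pull
  script:
    - bundle install --path vendor/ruby
    - yarn install --frozen-lockfile
    - bundle exec rspec
```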
A
So
that's
that
was
a
great
improvement
as
well,
and
I've
listed
just
two
specific
caches
definition.
A
So
there's
the
guitar
binaries
cache,
which
is
which
the
key
is
based
on
the
content
of
the
list
server
basement
file-
and
this
is
because
this
cache
stores
the
guitar
binaries
that
that
are
built
in
the
setup
test
and
job,
and
it's
just
simpler
to
rebuild
these
binaries
when
this
file
changes,
because
the
values
are
dependent
on
this
file
rather
than
you
know,
like
compare
the
version,
the
binaries
version
with
the
with
the
file
and
given
that
this
version
doesn't
change
very
often
it's
fine
and
the
second
specific
one
is
the
assets
cache
which
includes
compiled
frontend
assets,
and
it
also
includes
a
specific
assets.
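A file-based cache key of that kind can be expressed with `cache:key:files`, which derives the key from the commits that last changed the listed files. The file name `GITALY_SERVER_VERSION` and the cache path below are assumptions based on the GitLab repository layout, not quoted from the talk:

```yaml
# Hypothetical sketch: rebuild the cached binaries only when the
# version file changes. File name and path are assumptions.
setup-test-env:
  cache:
    key:
      files:
        - GITALY_SERVER_VERSION   # key changes when this file changes
    paths:
      - tmp/tests/gitaly          # assumed location of the built binaries
  script:
    - scripts/setup-test-env      # assumed build/install command
```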
B: I think there is an issue about caches for forks, because for now, suppose someone forks the project and creates their own merge request: that branch will use their own cache, and there's nothing for them unless they try to update the cache themselves, with labels or the merge request title.
A: It would be great to be able to allow forks to use the canonical cache, but there are probably some security issues with that. Still, I think that would be nice.
A: Because, yeah, our strategy doesn't really work for forks in that sense, except if forks define the scheduled pipeline that updates the caches themselves, I guess.
C: That was great, thanks, Rémy. Do we graph... this is probably an ignorant question, I apologize. Do we graph the times, like on one of the early slides with the improvements? Do we have that charted somewhere, so we can see the point at which the benefit started to come in?
A: Yeah, definitely, we do graph that. I'm sharing just a GitLab issue here, but the graphs that you see are from Sisense. I won't show you right now, because I'd need to log in and so on, but we graph all that; as you can see, we graph per pipeline type.
A
So,
for
example,
this
one
is
for
the
qa
pipeline
type.
So
these
are
the
pipelines
that
run
the
packaging
to
a
job.
This
one
is
from
the
front-end
pipeline
type
which,
which
deployed
the
review
apps,
and
this
one
is
from
the
code
pipeline
type.
So
mostly
like
back-end
pipeline,
we
could
say-
and
we
also
graph
per
job-
and
this
is
useful
to
detect
so
per
jobs.
Look
like
this,
for
example,
this
is
useful
to
detect
regressions,
usually,
and
it's
also
useful,
when
we
have
improvements
for
sure.
C: Yeah, I'm sure I'm not the first person to mention this, but it'd be awesome to have mini versions of these graphs embedded as part of the pipeline view. The data is there, but it's perhaps not close enough to be rendered inside the GitLab interface. It'd be so cool to be able to see that; the regression side is as interesting and fascinating as the improvement and performance side.
A: Yeah, totally agree, and if you recall, we discussed that a few weeks ago in the team meeting, because we had a regression in the setup-test-env job. I think it was this one, and I actually created a feature proposal to detect that at the merge request stage, rather than by looking at the graphs.
D: And I'll add that different teams in Verify are looking at Pipeline Intelligence, I think is what they're calling it: analytics to support those sorts of insights, so that customers can have more nudges and reminders around their CI minute usage when trends may be changing, good or bad, whether things are improving or getting worse. I'll have to dig up those issues, and, like Rémy said, he created an issue for that specific thing.
D: Cool, Rémy, that was great. Any final questions?