From YouTube: Running Hyperparameter Optimization Jobs in Parallel
Description
Part 2 of Exploring GitLab Pipelines for Hyperparameter Optimization
Last time, we created a very simple hyperparameter optimization pipeline that runs the trials sequentially. However, that quickly becomes impractical, since in real-world use cases training a single model can take hours, days, or even weeks. This week, let's explore how to run those trials in parallel using Dynamic Parent-Child pipelines!
Code: https://gitlab.com/gitlab-org/incubation-engineering/mlops/hyperparameter-tuning-exploration/-/tree/part_2_running_trials_in_parallel
Exploring GitLab Pipelines for Hyperparameter Optimization Series:
https://gitlab.com/groups/gitlab-org/incubation-engineering/mlops/-/epics/6
Welcome, everyone, to another edition of MLOps with GitLab. Today we continue the series on exploring GitLab pipelines for hyperparameter optimization, and this time we're going to talk about running the jobs in parallel. My name is Eduardo, and I'm an incubation engineer for MLOps.

Just to recap why we are doing this in the first place: hyperparameters configure how a model is trained, but choosing them is a very tedious, time-consuming, resource-intensive process, and it basically must be done through pipelines.
So we want to check whether GitLab fits the bill here, whether we can use GitLab for this use case. It's also a first step towards AutoML: if we ever want to do AutoML, hyperparameter optimization is one of its building blocks.
If you want to check out everything that is being covered here, and to follow the new videos that will come up soon, follow the hyperparameter exploration epic. As for what we did so far: in part zero, we explained what hyperparameter optimization is. If you're new, if you're arriving just now and want to understand a bit more of what we're doing here, please go back to that video.
There I take a little more time to explain all the edge cases and why this is important. Then, in part one, we created a very simple pipeline that runs hyperparameter optimization, but it runs the trials sequentially. And that is the thing about hyperparameter optimization. For example, the very simple pipeline that I created has, I don't know, 18 different hyperparameter combinations, each run five times, and each model training takes a second.
So it should take about two minutes to run everything. In the real world, though, it would usually take hours, days, possibly even weeks to train a single model.
And the number of hyperparameter combinations that you're going to be exploring is a lot larger than 18, so you cannot wait an entire month or a quarter just for the hyperparameters to be optimized. Sequentially running the trials doesn't really scale that well. So if we can't rely on running them sequentially, we have to do this in parallel. But how can we do this?
The thing is, our hyperparameters are not preset: they are defined in a hyperparameters.yaml file, so they are not fixed when I'm coding the CI. That actually allows us to do whatever we want. Basically, we can encode every kind of dynamic behavior we want into a static CI file that is created afterwards, right at runtime, and this is where we want to go.
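To give a concrete picture, a hyperparameters.yaml describing a grid of 18 combinations could look something like this; the keys are my own illustration, not necessarily the actual file in the repository:

```yaml
# hyperparameters.yaml - illustrative sketch, not the repository's actual file
model: RandomForestRegressor
cv_folds: 5                      # each combination is run five times
search_space:
  n_estimators: [10, 50, 100]    # 3 values
  max_depth: [3, 5, 10]          # 3 values
  min_samples_leaf: [1, 5]       # 2 values -> 3 x 3 x 2 = 18 combinations
```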
So let's take a look at how it looks right now. Here, in the hyperparameter repository, I already have a merge request ready, and here we can see the parent pipeline: only two jobs, one for generating the CI and another one for running it. Then here, on the downstream side, I can see the child pipeline, with its prepare, optimize, and publish stages. That is the entire pipeline we are taking a look at, and it is very similar to the old one. The only difference is that, instead of running the trials sequentially, now we run them in parallel.
This is actually what we wanted to achieve, and it's great that we can do that right now. So let's look at how we implemented this. First, we have the parent pipeline. Like I said, it's very simple: its only goal is to generate the optimization CI and then run that CI. It's a very small configuration file, and the only important thing is the script that generates the child CI file.
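As a sketch, the parent could look something like the following; the script and file names (generate_ci.py, child-pipeline.yml) are placeholders of mine, not necessarily the ones used in the repository:

```yaml
# .gitlab-ci.yml (parent) - minimal sketch of a dynamic parent-child setup
stages:
  - generate
  - run

generate-ci:
  stage: generate
  image: python:3.10
  script:
    - pip install jinja2 pyyaml
    # render the Jinja2 template against hyperparameters.yaml
    - python generate_ci.py hyperparameters.yaml > child-pipeline.yml
  artifacts:
    paths:
      - child-pipeline.yml

run-ci:
  stage: run
  trigger:
    include:
      - artifact: child-pipeline.yml
        job: generate-ci
    strategy: depend   # parent waits for, and mirrors, the child's status
```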
The script takes a template that I wrote in Jinja2, reads the hyperparameters.yaml file, and then creates this quite large CI file. You can see how the same job is repeated over and over: run trial 0, run trial 1, and so on. For each of the combinations that I have defined in hyperparameters.yaml, I have one run-trial job.
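The template could look roughly like this, with one job stamped out per combination; the variable and job names are assumptions for illustration:

```yaml
# child-pipeline.yml.j2 - sketch of the Jinja2 template
stages:
  - prepare
  - optimize
  - publish

{% for trial in trials %}
run_trial_{{ trial.id }}:
  stage: optimize
  image: python:3.10
  script:
    - python run_trial.py --trial-id {{ trial.id }}
{% endfor %}
```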
It was pointed out to me that there's a much simpler way to create this with the keyword parallel. It will create all of this automatically, but you still need to pass parallel the number of jobs that you want to start, so you would need dynamic pipelines either way. It's just that the result would become a lot cleaner to read, so I might want to implement that in the future.
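For reference, with parallel the generated child pipeline would shrink to a single templated job; only the job count still has to be rendered in. A sketch:

```yaml
# sketch: one job definition fanned out with `parallel` instead of 18 copies
run_trial:
  stage: optimize
  parallel: 18   # this number still has to be computed from hyperparameters.yaml
  script:
    # GitLab numbers the copies via CI_NODE_INDEX (1..CI_NODE_TOTAL)
    - python run_trial.py --trial-id $((CI_NODE_INDEX - 1))
```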
With that, the parent creates the child pipeline, which starts with two things. First, it prepares the data, still the same synthetic data that we used before, but it also prepares the trial files: it creates a file with all the parameters that each one of the trials is going to pick up. In theory, I could pass these directly when creating the job definitions, but using a file like this will help us in the next step, which is making this iterative. So it could be done a little bit differently.
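The prepared trial file could be as simple as one entry per trial; again, an assumed shape rather than the repository's actual format:

```yaml
# trials.yml - assumed shape of the prepared trial-parameters file
trials:
  - id: 0
    params: {n_estimators: 10, max_depth: 3, min_samples_leaf: 1}
  - id: 1
    params: {n_estimators: 10, max_depth: 3, min_samples_leaf: 5}
  # ... one entry for each of the 18 combinations
```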
So I create this file, and then the next step is run. Each one of the trials will pick up one of the items in that configuration file I just showed, train the model, and create a CSV file with the trial ID and the statistics for that training: what was the score, what was the training time, what was the fitting time for each of them.
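Put together, one rendered trial job could look like this; the job, script, and column names are illustrative assumptions:

```yaml
# sketch of a single rendered trial job in the child pipeline
run_trial_7:
  stage: optimize
  needs: ["prepare"]     # consumes trials.yml and the synthetic data
  script:
    # run_trial.py is assumed to read entry 7 from trials.yml, train the
    # model, and write results/trial_7.csv with columns such as:
    # trial_id,score,train_time,fit_time
    - python run_trial.py --trial-id 7
  artifacts:
    paths:
      - results/trial_7.csv
```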
Then we have the final part of the pipeline. The first piece here is collect results, where I read all of the files that were generated and compute statistics on them. I basically removed the scikit-learn part of the optimization and hand-implemented it, so that we could take control of how those things run. So even though we are still using scikit-learn for cross-validation, we don't use it for the optimization anymore.
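The aggregation can then be a single job in a later stage; because the trial jobs upload their CSVs as artifacts, a later-stage job sees them all by default. A sketch with assumed names:

```yaml
# sketch: aggregate the per-trial CSVs and rank the trials
collect_results:
  stage: publish
  image: python:3.10
  script:
    # collect_results.py is assumed to concatenate the CSVs and
    # compute the mean score per trial across the five runs
    - python collect_results.py results/*.csv > summary.csv
  artifacts:
    paths:
      - summary.csv
```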
That is the work that was done there. For example, here I can show the results being displayed: if I come over here, you can see the results for each trial. The last step comments on the merge request with the results of each one of the trials and which one is the best trial. Pretty cool.
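That last step can be done with GitLab's merge request notes API; here is a sketch, assuming a CI/CD variable GITLAB_API_TOKEN holding a token with api scope and the summary.csv from the previous job:

```yaml
# sketch: post the trial summary as a comment on the merge request
comment_results:
  stage: publish
  needs: ["collect_results"]
  rules:
    - if: $CI_MERGE_REQUEST_IID
  script:
    - |
      curl --request POST \
        --header "PRIVATE-TOKEN: $GITLAB_API_TOKEN" \
        --data-urlencode "body=$(cat summary.csv)" \
        "$CI_API_V4_URL/projects/$CI_PROJECT_ID/merge_requests/$CI_MERGE_REQUEST_IID/notes"
```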
One downside is that iteration is even slower than before. Before, we could at least use the online editors, like the pipeline editor and things like that, but here the .gitlab-ci.yml file is just the parent one. The child pipeline is a separate, generated file, so you can't really use the pipeline editor on it. What I did was create the child pipeline first, with some hand-typed trials, and when I was happy with that, I transformed it into a template. But the iteration time becomes a lot slower. One thing that would make it a lot easier and faster, and also a lot less resource-intensive, to implement a pipeline like this is the concept of checkpoints.
Now suppose that step five is failing, and it depends on the output of step four. If I want to fix step five, I have to rerun everything again, and that is very, very annoying. It would be really useful if I could store the state up until step four and then replay only step five afterwards, so that you don't need to rerun the entire pipeline. You don't need to rerun what was already correct: you keep that, and you just run the new steps. That would make the process a lot faster and a lot more useful, and it would mean you use a lot fewer resources as well. That would be pretty cool to see at some point.
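GitLab has no first-class checkpoints today, but something in that spirit can be approximated with the cache keyword: key the cache on a step's inputs and skip the work when the output is already there. A rough sketch, with names of mine:

```yaml
# rough sketch: poor man's checkpointing via a cache keyed on the inputs
step_four:
  stage: prepare
  cache:
    key:
      files:
        - hyperparameters.yaml   # cache is reused while the inputs are unchanged
    paths:
      - step_four_output/
  script:
    # redo the expensive work only if no cached output exists
    - test -d step_four_output || python step_four.py
```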
Coming next week, things will get fun. Right now, we know what hyperparameters we're going to use: even when we create the dynamic pipeline, we already know all the combinations. But what if we don't? What if we have to compute these hyperparameters at runtime of the child pipeline? What will that look like? That is our next step. This is where we start bending GitLab a bit to do things it wasn't supposed to do, and I'm very excited about that.