Description
We are starting a series of videos exploring whether GitLab Pipelines can be used for Data Science use cases. To kick this off, we go through what Hyperparameter Optimization is.
Hyperparameter Optimization with GitLab Epic: https://gitlab.com/groups/gitlab-org/incubation-engineering/mlops/-/epics/6
Incubation Engineering MLOps Updates: https://gitlab.com/gitlab-org/incubation-engineering/mlops/meta/-/issues/16
Hello, everyone, and welcome to another session on MLOps here at GitLab. My name is Eduardo, and I'll be talking a little bit about hyperparameter optimization and what it has to do with GitLab.
This will be a bit more of a conceptual chat rather than a walkthrough of something that was already done, but it is important to pave the way for what we're going to do next.
So why are we doing this? My role is to explore ML and data science use cases within GitLab, that is, to test our tooling against most of the use cases that might come up. Hyperparameter optimization is a process within the ML development life cycle that is very long and tedious, but it can produce great results and can become part of the CI process for machine learning. So over the next few weeks I'm going to explore a little bit what this looks like.
What is the user experience? Can our runners handle it? And the cool thing about working with hyperparameter optimization is that it paves the way towards AutoML, which we'll talk about a little bit later. So, starting from the beginning.
A machine learning model is an artifact created by applying an algorithm to a dataset that has been processed. So you have a dataset, you perform some processing (change the image size, convert the text, clean up some events, and so on), and then you apply an algorithm to it: XGBoost, random forest,
and many more. You train these together and you get a model artifact. But both the data processing step and the algorithms that you're going to use have configuration parameters: things you pass to the constructor of the model that change how it behaves, how it searches, how deep it goes into the data, or just tweaks. These we call hyperparameters.
Some examples of this that we're going to see: if you have a neural net, the number of hidden layers; if you have a decision tree, the maximum depth of the tree, or how many different cuts you can make; on a random forest, the sampling, that is, how many features you're going to use.
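As a concrete sketch of the idea above: in scikit-learn (the library used in the demo later in this video), hyperparameters are just constructor arguments, fixed before any training happens. The specific values here are illustrative, not tuned.

```python
# Hyperparameters are plain constructor arguments, set before training.
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(
    max_depth=5,      # maximum depth of each decision tree
    max_samples=0.8,  # fraction of rows sampled for each tree
    n_estimators=50,  # number of trees in the forest
)

# They can be inspected before the model has seen any data:
print(model.get_params()["max_depth"])  # 5
```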
These are called hyperparameters, and the thing is, choosing the right hyperparameters can have as much impact as choosing the right algorithm or improving the dataset itself. So it is a really important step of the process and it yields great results, but it takes a long time to perform.
What you usually do is something called hyperparameter optimization, which is the process of automatically choosing the hyperparameters for the model and for the data. It goes like this: you rank some trials (a trial is a group of hyperparameters, a hyperparameter configuration that you want to test), you train models for all of these trials, you compare the results, and out of this you choose the best model.
This can be done manually or automatically, and from this you can see that, at its core, hyperparameter optimization is a search problem. So everything that we know about search problems can also be applied here: how to rank the trials, how to choose the best trial, when to stop, when to start, and so forth.
For the first case you have, for example, grid search: you have a couple of different parameters you want to optimize for, each one with two, three, or however many values, and you just test all possible combinations. You create a huge list of combinations and test them all, so it's brute force. Random search is similar: you might generate all these lists, but you test them at random, and the hyperparameter values themselves come from random distributions.
So this reduces the work: even though you have about the same search space as before, with random search you select at random which part of the space you're going to look at. And then there is iterative creation. For both grid search and random search you come up with the values beforehand: you create all the values, you test all of them, you compare your results, and you come up with the best model. Iterative creation is a little bit different.
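The two up-front strategies can be sketched in a few lines of plain Python. This is a minimal illustration of how the trials get generated, not how any particular library implements it; a real run would train and score a model for each trial and keep the best one.

```python
import itertools
import random

search_space = {
    "max_depth": [2, 5, 10],
    "max_samples": [0.5, 1.0],
}

# Grid search: enumerate every combination up front (brute force).
grid_trials = [
    dict(zip(search_space, combo))
    for combo in itertools.product(*search_space.values())
]
print(len(grid_trials))  # 6 trials: 3 values * 2 values

# Random search: each trial picks every hyperparameter at random
# (here from the same lists; sampling from distributions also works).
rng = random.Random(0)
random_trials = [
    {name: rng.choice(values) for name, values in search_space.items()}
    for _ in range(4)
]
print(len(random_trials))  # 4 trials, chosen at random
```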
Instead of looking at every parameter, you try to be a little bit narrower about where you're going to search, and genetic algorithms and Bayesian methods are examples of this more iterative creation of trials. The second step is then testing the trials.
Like I showed before, you have the trials, you train a model for each, and then you compare results. But the thing is, you don't know whether the trial itself is good, or whether it was just a really good match between the trial and the particular data that was used. To improve this, to get a bit more of a notion of generalizability and see how the model will behave on new datasets, we use something called cross-validation. You take the initial dataset and split it into n folds, say one, two, three, four, five, and then you train five times. So for each trial, you train five models.
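The splitting step just described can be sketched as follows. This is a simplified illustration of k-fold cross-validation (no shuffling, fold sizes assumed even), not the exact implementation a library would use.

```python
def kfold_indices(n_rows, n_folds):
    """Yield (train, test) index lists, one pair per fold."""
    fold_size = n_rows // n_folds
    rows = list(range(n_rows))
    for fold in range(n_folds):
        start, stop = fold * fold_size, (fold + 1) * fold_size
        test = rows[start:stop]             # this fold is held out
        train = rows[:start] + rows[stop:]  # the rest is trained on
        yield train, test

# 10 rows split into 5 folds gives 5 train/test pairs, so for a
# single trial you would fit 5 models, each one evaluated on a
# different held-out fold.
splits = list(kfold_indices(10, 5))
print(len(splits))  # 5
```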
From the performance results you can then build a distribution of how well that hyperparameter configuration does. So, in the end, during this process you're going to train about (number of trials times number of folds) models, and you can quickly see how fast this grows. Training a machine learning model is often not something quick: it can take hours, days, sometimes seconds perhaps, but in general it is not cheap, it is not fast, and it's a very tedious process.
So this is why hyperparameter optimization is important, and why having it as a pipeline on GitLab is also really important, because then it can become part of the CI process itself. You can create a commit that changes the code for the model, and it already starts the pipelines to choose the best hyperparameters. You can use all the runners that you already have; you don't need the data scientist's or machine learning engineer's laptop for that.
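To make that concrete, a pipeline job along these lines could look roughly like the sketch below. This is a hypothetical `.gitlab-ci.yml` fragment: the image, the `train.py` script, and its flags are assumptions for illustration, not an existing project.

```yaml
# Hypothetical job: re-run the hyperparameter search on a GitLab
# runner whenever the model code changes. "train.py" is an assumed
# script, not part of any real repository.
hyperparameter-search:
  image: python:3.10
  script:
    - pip install scikit-learn
    - python train.py --search grid --folds 5
  rules:
    - changes:
        - model/**/*
```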
So that's pretty great, but we do need to explore a bit how GitLab will behave within this use case. And, like I said, this is the first step towards AutoML, because you can think of the choice of algorithm itself as a hyperparameter.
So this is a very small dataset, about 10,000 rows or so, and here I create a very simple algorithm, a random forest classifier, that will look at this data. I fit this model, and its accuracy is about 0.738 and the area under the curve is 0.725. For these metrics, one is the best and 0.5 is bad.
So what I'm going to do: for the random forest, if I open the documentation (I'm not going to do this now), it has many parameters that I can choose, and these are some of them. So I have a max depth, and I can say here that I'm going to test out two, five, and ten, and similarly for the other parameters.
Then I'm going to run a grid search with five folds on the CV, scoring on accuracy, and it's going to fit 60 different models: there are 12 candidates, so 3 times 2 times 2, times the 5 folds.
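The search described here can be sketched with scikit-learn's `GridSearchCV`. The grid below reproduces the 3 times 2 times 2 = 12 candidates over 5 folds (60 fits in total) mentioned in the demo, but the dataset is synthetic and the specific parameter values are assumptions, since the demo's data and exact grid are not included here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Stand-in dataset; the demo used a real ~10,000-row dataset instead.
X, y = make_classification(n_samples=200, random_state=0)

param_grid = {
    "max_depth": [2, 5, 10],       # 3 values
    "max_samples": [0.5, 0.8],     # 2 values
    "min_samples_split": [2, 10],  # 2 values
}

search = GridSearchCV(
    RandomForestClassifier(n_estimators=20, random_state=0),
    param_grid,
    cv=5,                # 12 candidates * 5 folds = 60 fits
    scoring="accuracy",
)
search.fit(X, y)

print(search.best_params_)                # the winning trial
print(len(search.cv_results_["params"]))  # 12 candidates
```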
This is a very small dataset, so it trained really fast, but it did run 60 different models. Then I can check my results and do some plots. For example, for max depth I can see that the test score increased when the max depth increased. I can also see over here that for max samples it tends to increase as well, and for min samples split it doesn't really change.
The best scores came mostly where max depth was high, and over here it just didn't really matter. So I can see, for example: if I increase my max depth, where do the results come in? They seem to be higher than if I...
Oh, it broke for some reason. Okay, let me just regenerate this plot; live coding is always like this. Okay. Sorry. So if I have a low max depth, the results are generally on the bottom side of the test score, but if I have a large one, they tend to be higher. So this is another way of looking at these parameters.
So this is a very quick example of what this can look like. This was just an introduction to what hyperparameter optimization is and why we are doing this. The next step is to create a very simple pipeline that performs this job on CI, so that I create a commit, I create a merge request, and it already computes the best values automatically and does everything for us. So thanks, and see you soon.