From YouTube: SEG MLOps Update - 12th November 2021
Description
Feedback and Ideas for Jupyter Support: https://gitlab.com/gitlab-org/gitlab/-/issues/343024
Feedback and Ideas for Pipeline Experimentation: https://gitlab.com/groups/gitlab-org/incubation-engineering/mlops/-/epics/5
All Updates: https://gitlab.com/gitlab-org/incubation-engineering/mlops/meta/-/issues/16
Hello everyone, and welcome to another MLOps Single-Engineer Group update. It's time for November 12, 2021. Welcome back, and we are going to talk about some very exciting things this week.
Just a reminder of what we're doing here. My vision for this SEG is simply to make GitLab a tool data scientists and machine learning engineers love to use. The thing is, especially with data scientists:
They don't use GitLab because they want to; they use it because they have to. GitLab, Git and CI are, a lot of the time, not really compatible with the data scientist workflow, and we are trying to bridge this gap a little bit and see what tooling is missing. What are the components? How can we make GitLab a better companion for data scientists?
So, on that note: first of all, yay, Jupyter support is live. We finished most of the first iteration of it. You can see it over here. It is not perfect, it is not ideal, and that was never our goal, but it is better than before. A lot better. Now you can clearly see what changed within a Jupyter notebook.
You can clearly see the differences. There are still a lot of things we want to do over here, but the first iteration is live and users can already see it. For self-managed customers it will ship in 14.5; for GitLab.com users it is already there.
Okay, I'm very happy with the results. We shared this on Twitter, and the reception by our customers was really, really great. They were very happy and very excited about it, both GitLab users and, well, our competitors were looking at this as well. I'm very happy with it, and we'll still be working on it.
There are still a lot of things we want to do, like I said, and if you have any asks, feature requests or any feedback, I will link the issue in the video description, so you can just come in and give your suggestion. I will take a look and then we can discuss, and I can see what I'm going to do next. Very exciting, very happy. It took two months, but it's finally out there. Okay.
Next one: with that done and out of scope now, the other thing we're working on. We are looking at feature implementation, which is the Jupyter notebook diff, but we also want to explore how data scientists could use the things we already have in GitLab.
One of the things data scientists and machine learning engineers do as part of the machine learning process is optimize the hyperparameters of the model. Hyperparameters are the parameters passed to the model itself, not learned in training. When you create a model, you can tune a lot of configurations of the algorithm; these are called hyperparameters, and this configuration can make a real impact on model performance.
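To make that distinction concrete (this sketch is my own illustration, not code from the video, and assumes scikit-learn is installed): hyperparameters are the constructor arguments you choose up front, as opposed to the parameters the algorithm learns from data.

```python
# Hyperparameters are chosen by you before training; they are not learned
# from the data. Here max_depth and min_samples_split configure the tree.
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(max_depth=3, min_samples_split=4)

# The learned parameters (the tree structure itself) only exist after fit():
X = [[0, 0], [1, 1], [0, 1], [1, 0]]
y = [0, 1, 0, 1]
clf.fit(X, y)

# The hyperparameters we set are still retrievable on the fitted model.
print(clf.get_params()["max_depth"])
```

Changing those two constructor arguments can noticeably change how well the fitted model generalizes, which is why they are worth searching over.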
So hyperparameter optimization is an additional step in the training pipeline, where you not only train with the data but also choose the best hyperparameters for your use case. This is a very long, tedious process: you run the model many, many times, and dedicated platforms are usually the way to go for that, either Kubeflow or AWS SageMaker. Here we want to explore a bit how we can use GitLab for that.
A
This
is
not
the
common
use
case
for
gitlab
pipelines.
We
understand
that
and
it's
more
about
exploring.
How
could
this
be
make
better,
and
can
we
use
this
as
it
is
right
now
so
to
be
able
to
do
that?
I,
the
code
is
like
this
is
the
issue,
but
over
here,
okay,
So what I did, instead of picking up a dataset that's out there: I created a simple model that has seven random variables and a function that, based on those seven variables, returns true or false. Then I removed three of these variables, which are called features. That means seven features are necessary to create my prediction target, but I only have four available, so I can use machine learning to try to estimate the value of y based on a, b, c and g. It's a very simple way of creating a model and a dataset that I can control.
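A sketch of that idea in plain Python (the variable names and the exact rule are my own, since the video doesn't show the generator code): derive a boolean target from seven random variables, then expose only four of them as features.

```python
import random

random.seed(0)

def make_row():
    # seven random variables, a..g
    a, b, c, d, e, f, g = (random.random() for _ in range(7))
    # the target depends on all seven variables...
    y = (a + b * c - d + e * f + g) > 1.5
    # ...but we only keep four of them as features, so a model has to
    # estimate y from partial information
    return {"a": a, "b": b, "c": c, "g": g, "y": y}

dataset = [make_row() for _ in range(1000)]
print(sum(row["y"] for row in dataset), "positive examples out of", len(dataset))
```

The appeal of a controlled generator like this is that you know exactly how much signal the four kept features carry, so any model's score is easy to sanity-check.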
Our focus here is not really the dataset itself, it's how to explore the pipeline, so this was okay for now. So what I did over here: I created a pipeline. Okay, let me show it.
A
Enter
the
visualize
okay,
so
what
I
did
to
create
a
very
simple
pipeline
that
first
of
all
fetch
the
data,
which
is
the
step
that
I
shared
just
to
create
a
data
set
and
I
created
an
artifact,
then
it
optimizes
the
hyper
parameters.
I
use
sqlearn
scalar
for
this
and
I
use
a
very
simple
algorithm
for
the
helper
parameter.
Optimization
hyper
of
hyper.
Hyperparameter optimization is basically a search problem. You have to run a search, and one of the ways is to test out all possibilities and choose whatever is best. This is what this algorithm does. It's very simple, not optimized.
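The exhaustive approach described here maps onto scikit-learn's GridSearchCV; the video names scikit-learn but not the exact class, so treat this as a plausible sketch rather than the project's actual code. The model, data and grid values below are my assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# a small synthetic classification problem standing in for the real dataset
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# try every combination in the grid and keep the best-scoring one
param_grid = {
    "max_depth": [2, 4, 8],
    "n_estimators": [10, 20],
}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)
```

With 3 × 2 = 6 candidates and 5-fold cross-validation, this single call already performs 30 model fits, which is exactly the combinatorial growth discussed below.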
You end up doing a lot of training with this, so you cannot go crazy with your hyperparameters, but it is a first skeleton, so that's okay as well. Then the next step publishes the results to the merge request.
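The pipeline shape walked through here (fetch data, optimize, publish to the merge request) could look roughly like this in `.gitlab-ci.yml`; the job names, script names and image are my assumptions, not taken from the actual project.

```yaml
stages:
  - fetch
  - optimize
  - publish

fetch_data:
  stage: fetch
  image: python:3.9          # assumed image
  script:
    - python fetch_data.py   # generates the synthetic dataset
  artifacts:
    paths:
      - data.csv

optimize:
  stage: optimize
  image: python:3.9
  script:
    - python optimize.py     # grid search over the hyperparameter grid
  artifacts:
    paths:
      - results.json

publish_results:
  stage: publish
  image: python:3.9
  script:
    - python publish.py      # posts the results back to the merge request
```

Passing `data.csv` and `results.json` between stages as artifacts is what lets each step run on a fresh runner.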
So if we come here, this is the visualization, but I can go to the merge requests and see a very simple merge request that I added. So, hello.
A
It
doesn't
really
matter,
but
if
you
come
over
here,
it
reads
this
file
of
where
is
it
reads
from
this
file
the
hyperparameter.ammo?
A
So
there
are
many
parameters
that
I
can
configure
on
this
model,
and
then
I
run
this
pipeline
over
here.
So we can look at the optimize step, this scikit-learn job, and you can see over here that it really runs many, many tries. It tries all possibilities within the specific scope that I gave, and not only that, it tries each one several times, because it uses cross-validation to score the models. For every configuration I have, it trains five times with a different training set.
This is to make sure that the score is representative, that it's not just one case where it went really well; within that hyperparameter combination, those really are the best results. So, for example, for this one it trains five times with different training data, for max features 2, max samples 500 and the given samples split, and it does that for every combination.
So if you look here, I had 12 candidates. I only tested three values for one hyperparameter and two each for the other two, which means only 12 combinations, but in the end there were 60 model fits.
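The arithmetic here (candidates times cross-validation folds) can be checked with a few lines of standard-library Python; the parameter names are my guesses at what the grid contained.

```python
from itertools import product

# assumed grid: three values for one parameter, two for each of the others
param_grid = {
    "max_depth": [2, 4, 8],
    "max_samples": [500, 1000],
    "min_samples_split": [2, 4],
}

# every combination of the three lists is one candidate configuration
candidates = list(product(*param_grid.values()))
cv_folds = 5

print(len(candidates))             # 12 combinations
print(len(candidates) * cv_folds)  # 60 model fits in total
```

This is the same 12-candidates, 60-fits summary that GridSearchCV prints in the job log.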
So it takes a lot of time to compute this. Not in this case, which is a very simple model with very little data, and I didn't care so much about that, so it's very fast; but this can explode very quickly, and that is why it's important to have it as a pipeline.
A
The-
and
this
is
a
limitation
of
this
one
with
this-
is
run
sequentially.
Ideally,
this
should
run
in
parallel
because
they're
independent
of
each
other.
So
I
can
just
go
crazy
and
fire
five,
six,
seven
concurrent
pipelines
so
that
I
can
get
results
faster
after
the
the
pipeline.
This
optimization
pipeline
runs.
Then
it's
the
next
step,
which
is
publish
results.
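One way to get the concurrency asked for here with what GitLab CI already ships is `parallel: matrix`, which fans a job out into one concurrent job per variable combination. This is a sketch under the assumption that the hypothetical `optimize.py` can take its slice of the grid from a CLI flag:

```yaml
optimize:
  stage: optimize
  image: python:3.9
  parallel:
    matrix:
      - MAX_DEPTH: ["2", "4", "8"]   # one concurrent job per value
  script:
    - python optimize.py --max-depth "$MAX_DEPTH"
```

Each job would then only search the remaining parameters for its assigned value, and a later stage could collect the per-job artifacts and pick the overall winner.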
Think about it: if you are running this on a recommendation page and you get two percent more conversion just because of this, that's insane. So, for now, all the publish step does is create this report, and that's it; it doesn't do much more than this.
But that doesn't mean that's where we want to stop. So if we come back over here, this one: this is the skeleton, the starting point.
We want to start getting more and more complex. First of all, there are many things we can do. We can use real data instead of the fake data we are using. We can implement an iterative pipeline: there are optimization algorithms that don't try everything, but work more like "let's try these five values, check the results, then come up with five new values", and so on and so forth; it's really a search problem.
We can use this for stress tests: suppose you are running TensorFlow on really, really large datasets. How would GitLab behave? How would the runners behave with this? Let's try that out as well, including the GPU runners we have available. And one other thing is the building experience.
It's really not nice to build these pipelines. It's not horrible, but it's a lot of trial and error. Is it possible to make it easier?
A
For
example,
can
you
transform
a
jupiter
notebook
into
a
pipeline
user,
pushes
a
jupiter
notebook
and
that
becomes
a
pipeline?
We
can
try
that.
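That idea is left open in the video, but the first half of it (turning a notebook into something a CI job can run) is mostly mechanical, because `.ipynb` files are plain JSON. Here is a standard-library sketch of extracting the code cells; how those cells would then be split into pipeline stages is the real open question.

```python
import json

def notebook_to_script(ipynb_json: str) -> str:
    """Concatenate the code cells of a Jupyter notebook into one script."""
    notebook = json.loads(ipynb_json)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            # cell sources are stored as lists of source lines
            chunks.append("".join(cell["source"]))
    return "\n\n".join(chunks)

# tiny hand-made notebook for demonstration
demo = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Fetch data\n"]},
        {"cell_type": "code", "source": ["x = 1\n", "y = x + 1\n"]},
        {"cell_type": "code", "source": ["print(y)\n"]},
    ]
})

script = notebook_to_script(demo)
print(script)
```

A CI job could run this conversion on push and execute the resulting script, with markdown cells dropped or kept as comments.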
So if you have any idea, anything you want us to try out, add a comment on the epic (linked in the description) and share it with us. We will try to look into it, and we will give priority to external suggestions. And yeah, that's it. Thank you very much, that's it for today. Bye.