From YouTube: SEG MLOps Update Jan 5th 2022 - Looking Forward
Description
Happy new year! In the first update of the year, we are taking the time to talk not about what we did, but about what we are working on and where we want to get with it.
Update Issue: https://gitlab.com/gitlab-org/incubation-engineering/mlops/meta/-/issues/42
User Personas: https://gitlab.com/gitlab-org/incubation-engineering/mlops/meta/-/issues/40
Rendered Notebook Diffs: https://gitlab.com/groups/gitlab-org/-/epics/7194
Glyter: https://gitlab.com/gitlab-org/incubation-engineering/mlops/glyter/-/tree/poc
Knowledge Repo: https://github.com/airbnb/knowledge-repo
A
Hello, everyone, and happy new year. This is the first SEG update of the year, and since it's a new year and time for new year's resolutions, we're going to do something different. We're not going to talk about what we did last week, because we were all on holiday. We're going to talk about what we intend to do over the next few months.
A
So far we've been working on Jupyter and on pipelines, but I want to share a little bit about where we want to get with each of those things.
A
If you want to check it out, most of this will end up there at some point. To achieve this vision, there are four areas on my mind right now: first, user personas; then rendered code review for Jupyter notebooks; then Glyter; and then an analytics repository. I have already spoken a bit about most of them here, except the analytics repository, but I'll go over everything.
A
It's no secret that data scientists and machine learning engineers were not considered target customers or target users for GitLab. We knew they used GitLab, but nothing was done for them, and that is reflected in the lack of a user persona and of user research in the area.
A
The skill sets that are generally used, the outcomes and everything else. This becomes repetitive, because you have to explain it over and over, and you are the one coming up with the ideas. If the other teams know about the use case from the get-go, they can come up with the ideas and the improvements themselves, not only me.
A
So I believe one of the main things we can do to increase our long-term impact for data scientists and machine learning engineers is to create the user personas. Not by ourselves, of course, but with the UX and user research departments: work with them, drive this creation and kickstart the user research in this area. Right now it's my word, it's me talking, and I want the users to talk.
A
I know what the users want. Well, some of it, because I was a user. I was a data scientist, I was there, I felt the pain points that I'm trying to solve, but the rest of the teams did not. So this is what we need to solve.
A
We did deliver the cleaner code review experience: cleaner diffs, where we transform the notebook into Markdown, remove a lot of unnecessary things and display that. That is already a big step in the right direction, but consider this scenario. Suppose I have this notebook here. Nothing special about it: a couple of images with, I don't know, just random data.
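To make the "transform into Markdown and remove unnecessary things" step concrete, here is a minimal sketch. It is not GitLab's implementation: the function name `notebook_to_markdown` is hypothetical, and dropping every output, execution count and metadata field is a simplifying assumption. It only relies on the notebook JSON layout (a `cells` list whose entries have `cell_type` and `source`).

```python
import json

FENCE = "`" * 3  # built programmatically so this example nests cleanly


def _source_text(cell):
    """Cell sources may be stored as a string or as a list of lines."""
    source = cell.get("source", "")
    return "".join(source) if isinstance(source, list) else source


def notebook_to_markdown(notebook_json):
    """Flatten a notebook into Markdown, keeping only the cell sources.

    Assumption for this sketch: outputs, execution counts and metadata
    are noise for review purposes and are dropped entirely.
    """
    notebook = json.loads(notebook_json)
    parts = []
    for cell in notebook.get("cells", []):
        text = _source_text(cell)
        if cell.get("cell_type") == "code":
            parts.append(f"{FENCE}python\n{text}\n{FENCE}")
        else:
            parts.append(text)
    return "\n\n".join(parts)
```

Diffing the two Markdown strings produced from the old and new versions of a notebook then gives a much shorter diff than diffing the raw JSON.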
A
And then I come with a what-if: what if we can do better here? What if, instead of displaying this Markdown, we actually had code reviews over the notebook itself, or rendered the blocks within the notebook in the diff? This is what I'm talking about. Mind you, this is a very poor image edit that I made in Google Slides, but the idea here is that we render each block and we do diffs. It's a different diff algorithm.
A
It's a different diff pattern; data scientists might understand that. You first diff the cells themselves, then you diff what's in each cell, and then you can create the display.
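The two-level idea described here (align cells first, then diff inside each changed cell) can be sketched with the standard library's `difflib`. This is an illustration under assumptions, not the algorithm the team is building: the function name `diff_notebook_cells`, the status labels, and the positional pairing of cells inside a changed region are all choices made for this example.

```python
import difflib


def _text(cell):
    """Cell sources may be stored as a string or as a list of lines."""
    source = cell.get("source", "")
    return "".join(source) if isinstance(source, list) else source


def diff_notebook_cells(old_cells, new_cells):
    """Return a list of (status, old_cell, new_cell, line_diff) tuples.

    Level 1 aligns whole cells; level 2 produces a line diff only for
    cells that were paired up but differ.
    """
    old_keys = [(c.get("cell_type"), _text(c)) for c in old_cells]
    new_keys = [(c.get("cell_type"), _text(c)) for c in new_cells]
    matcher = difflib.SequenceMatcher(a=old_keys, b=new_keys, autojunk=False)

    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        olds, news = old_cells[i1:i2], new_cells[j1:j2]
        # Pair cells positionally inside each region; leftovers are
        # reported as removed or added.
        for k in range(max(len(olds), len(news))):
            old = olds[k] if k < len(olds) else None
            new = news[k] if k < len(news) else None
            if old is None:
                status = "added"
            elif new is None:
                status = "removed"
            elif tag == "equal":
                status = "unchanged"
            else:
                status = "changed"
            lines = []
            if status == "changed":
                lines = list(difflib.unified_diff(
                    _text(old).splitlines(), _text(new).splitlines(),
                    lineterm=""))
            result.append((status, old, new, lines))
    return result
```

A renderer could then walk this list and decide per cell whether to show rendered Markdown, an image, or a line diff.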
A
I can discuss whether that's the best representation or not. I can discuss the code, I can discuss the questions, I can discuss the Markdown. I look at the notebook, not at the source file. Now, we know from users that they still want the raw diffs. There are still use cases where the raw diff is important, and we don't want to hide that, so we are also going to add the ability to toggle between raw and rendered.
A
But this is an idea of where we want to go. We want to be able to have the code review, to have the diff with rendered blocks: Markdown rendered, images rendered, code cells rendered, because this is what is necessary for a good code review on a Jupyter notebook. You cannot rely only on the code. You need the images themselves, because the images are the output, and they depend on each other.
A
So yeah, I'm very excited about this, and I'm very confident we can do this. It might take a while, but it's just something we need to do. To get there, these are the six items that I want to deliver in the next few months for the rendered Jupyter notebook diff, which is not a Markdown diff.
A
So: a diff algorithm specific to notebooks, render the images, render Markdown blocks, render LaTeX, render the code, and the ability to toggle between raw and rendered. We're already working on a lot of this. We're already working on toggling between the raw and rendered diff, we're already working on the notebook diff, and then rendering becomes the next step.
A
So that was rendered notebook diffs, and now we go into the third area, which is Glyter. Glyter is a library we created, a very simple one for now, more of a proof of concept (let me open it up here), that picks up a Jupyter notebook, for example this notebook over here.
A
So it has a couple of steps, and Glyter runs this notebook as a pipeline, so each one of the steps becomes a pipeline step. If I come here to the CI/CD pipelines, I can see on this one over here that there was one step for each of the steps in the original notebook.
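As a rough illustration of "one pipeline step per notebook step", the sketch below turns tagged code cells into a GitLab CI configuration. How Glyter actually marks and extracts steps is not described in this update, so everything specific here is an assumption: the `step:<name>` cell tag convention, the function name `notebook_to_ci`, and the `run_step.py` helper script are all hypothetical. Only `stages`, `stage` and `script` are real GitLab CI keywords.

```python
import json


def notebook_to_ci(notebook, notebook_path):
    """Build a GitLab CI configuration (as a dict) from tagged code cells.

    Hypothetical convention for this sketch: a code cell tagged
    "step:<name>" becomes one job in a stage of the same name, in
    notebook order.
    """
    stages = []
    jobs = {}
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        tags = cell.get("metadata", {}).get("tags", [])
        names = [t.split(":", 1)[1] for t in tags if t.startswith("step:")]
        if not names:
            continue
        name = names[0]
        if name not in stages:
            stages.append(name)
        jobs[name] = {
            "stage": name,
            # run_step.py is a made-up helper that would execute one step.
            "script": [f"python run_step.py {notebook_path} {name}"],
        }
    return {"stages": stages, **jobs}


def to_ci_file_content(config):
    """Serialize as JSON, which is also valid YAML."""
    return json.dumps(config, indent=2)
```

This ignores how state is passed between steps, which a real implementation would have to handle, for example through job artifacts.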
A
The key thing we need to remember here is that training models takes a lot of time and a lot of resources, sometimes resources that data scientists don't have on their machine, or data they cannot access from their machine. For many reasons, pipelines for MLOps are not only part of the Verify stage. They are that as well, but they are also part of the Create stage, and at the Create stage you cannot expect a data scientist to keep creating a commit every time they want to change the code.
A
They are prototyping. So imagine that you are prototyping your model and every time you change something you have to commit so that it triggers a pipeline. That doesn't make a lot of sense. But at the same time, right now all pipelines run with files that are in the repository. Even with Glyter, for my pipeline to run the notebook, it has to be in the repository. But what if we could run an arbitrary notebook on a pre-configured repository? Think about it.
A
I just pass the repository where this needs to be run and a notebook that is on my machine. It's not in a repository, it's not committed, it's probably just staged. I just changed a line or something. Take the thing we're building with Glyter right now, transforming a notebook into GitLab CI, but with an arbitrary notebook. I think we can do that without any changes to the GitLab codebase, just a lot of creative engineering.
A
So my strategy for tackling this, for trying to make an MVP, is to rely on the GitLab API to upload a notebook into the repository and then trigger a pipeline that is already configured, passing it the path of that file. The parent pipeline downloads that Jupyter file, transforms it into a GitLab CI configuration and runs it as a child pipeline.
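The client side of this MVP could be sketched as two requests against the GitLab REST API: one to the repository files endpoint to upload the notebook, one to the pipeline trigger endpoint to start the pre-configured pipeline. The functions below only build the request descriptions so they can be inspected without a network call. The function names, the commit message and the `NOTEBOOK_PATH` variable name are assumptions made for this example; the parent pipeline would have to be configured to read that variable.

```python
import base64
from urllib.parse import quote


def build_upload_request(api_url, project_id, branch, file_path, notebook_bytes):
    """Describe a 'create file' call to the repository files endpoint.

    The file path must be URL-encoded, including the slashes.
    """
    encoded_path = quote(file_path, safe="")
    return {
        "method": "POST",
        "url": f"{api_url}/projects/{project_id}/repository/files/{encoded_path}",
        "json": {
            "branch": branch,
            "encoding": "base64",
            "content": base64.b64encode(notebook_bytes).decode("ascii"),
            "commit_message": f"Upload {file_path} for an ad hoc pipeline run",
        },
    }


def build_trigger_request(api_url, project_id, ref, trigger_token, file_path):
    """Describe a pipeline trigger call that passes the notebook path.

    NOTEBOOK_PATH is a hypothetical variable name for this sketch.
    """
    return {
        "method": "POST",
        "url": f"{api_url}/projects/{project_id}/trigger/pipeline",
        "data": {
            "token": trigger_token,
            "ref": ref,
            "variables[NOTEBOOK_PATH]": file_path,
        },
    }
```

Sending the first request also needs an access token header, and an HTTP client of your choice can take these dictionaries more or less as they are. The parent job would then read `NOTEBOOK_PATH`, generate the CI configuration and hand it to a child pipeline.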
A
So that is a lot of creative engineering, which is a lot of fun. I'm very excited about trying this out, and I think it will work. If it doesn't, I'll just add more creative engineering to it. But the key problem right now is how to make it run with an arbitrary notebook and not rely on a file that's in the commit. So yeah, let's try it out.
A
Another thing on this that I haven't mentioned, and that is on my mind for pipelines, is the concept of checkpoints, especially for machine learning engineers. Suppose that you're training a model: downloading the data runs fine, training runs fine, and then something fails when uploading to the model registry.
A
Or, I don't know, it fails in testing or in computing the metrics, and I have to run everything again. It would be really interesting if I could run a new pipeline based on the output state of a previous pipeline. That's something that is on my mind. I don't think I'll be exploring it soon, but it has a lot of benefits, not only for data scientists but for our entire CI user base, and it's just there all the time.
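The checkpoint idea can be illustrated with a tiny planning function: given the ordered steps and the statuses from a previous run, decide where a new run should resume. This is only a sketch of the concept, not an existing GitLab feature; the function name, the status strings and the rule "resume at the first step that did not succeed" are assumptions.

```python
def steps_to_rerun(steps, previous_status):
    """Return the steps a new run should execute.

    Everything before the first step that did not succeed is assumed to
    be reusable from the previous pipeline's output state.
    """
    for index, step in enumerate(steps):
        if previous_status.get(step) != "success":
            return steps[index:]
    return []
```

In the example from the talk, a run where download and training succeeded but the registry upload failed would resume at the upload step only.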
A
Last and a little bit least: the analytics repository. When a company grows, so does the number of data scientists, and when the number of data scientists grows, they usually split into teams. They create a lot of knowledge, and it's impossible to consume all the knowledge that is created. We usually talk here about data scientists doing machine learning, but I would say that's not even 50% of the job of data scientists. More often than not, they're not doing machine learning.
A
They are not creating machine learning products. They are using data to create intelligence that helps the business make better decisions: creating metrics, testing metrics, creating cohorts for studies and things like that. In that sense, those analyses are usually developed in a Jupyter notebook or in R Markdown, and then a report is created in Google Docs or something similar, which they share with the stakeholders.
A
But the point is, when that Jupyter notebook is completed and they have created the report, the Jupyter notebook or the R Markdown file is pushed to a GitLab repo or something, and then it's completely forgotten forever.
A
When I was a data scientist at a previous company, we were about 200 data scientists, and we joked that we would never run out of work, because every two years we would just do the same work again: everybody forgot that the work, or the analysis, had been done in the first place.
A
I tackled this within that organization by deploying an analytics repository, which is basically a wiki for data scientists, where you would push Jupyter notebooks or R Markdown files into the tool. It would make them pretty and add the ability to comment, search, display and share. It was really well received by my colleagues, and it really helped with this idea of: okay, it is somewhere if I want to go search for it, every time I start a ticket or a new analysis.
A
The first thing I'm going to do is go into this repository and search what's in there, the same way you search for libraries when you are building new software or creating new infrastructure. You're not going to implement everything on your own. The first thing you do is see what's out there, and this solved that problem.
A
It was built on top of Airbnb's Knowledge Repo, a project that has been a bit dead for a while but does implement most of the necessary things. What we want to do here is not to deploy a Knowledge Repo. What I want to do is test something out.
A
I already did a POC of this a couple of years ago: use GitLab Pages to read all the notebooks within a repository, index all of them in SQLite or something similar, create a page for them, make them pretty, and add some functionality to search and discover them with tags and things like that.
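The indexing part of this idea could look roughly like the sketch below, which loads notebooks into an in-memory SQLite database and supports text and tag lookups. It is not the POC mentioned here. The function names, the table layout, taking the title from the first Markdown heading, and reading tags from a notebook-level `metadata.tags` list are all assumptions for illustration.

```python
import sqlite3


def _text(cell):
    """Cell sources may be stored as a string or as a list of lines."""
    source = cell.get("source", "")
    return "".join(source) if isinstance(source, list) else source


def build_index(notebooks):
    """Index (path, notebook_dict) pairs into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE notebooks (path TEXT PRIMARY KEY, title TEXT, body TEXT)")
    conn.execute("CREATE TABLE tags (path TEXT, tag TEXT)")
    for path, nb in notebooks:
        body = "\n".join(
            _text(c) for c in nb.get("cells", []) if c.get("cell_type") == "markdown")
        # Assumption: the first Markdown heading is the title, else the path.
        title = next(
            (line.lstrip("# ").strip() for line in body.splitlines()
             if line.startswith("#")),
            path)
        conn.execute("INSERT INTO notebooks VALUES (?, ?, ?)", (path, title, body))
        # Assumption: tags live in a notebook-level metadata.tags list.
        for tag in nb.get("metadata", {}).get("tags", []):
            conn.execute("INSERT INTO tags VALUES (?, ?)", (path, tag))
    return conn


def search(conn, term):
    """Find notebooks whose title or Markdown text contains the term."""
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT path, title FROM notebooks "
        "WHERE title LIKE ? OR body LIKE ? ORDER BY path",
        (pattern, pattern))
    return rows.fetchall()


def with_tag(conn, tag):
    """Find notebooks carrying a given tag."""
    rows = conn.execute(
        "SELECT n.path, n.title FROM notebooks n "
        "JOIN tags t ON t.path = n.path WHERE t.tag = ? ORDER BY n.path",
        (tag,))
    return rows.fetchall()
```

A static site job could run the indexing step at build time and publish the resulting pages and search data alongside the rendered notebooks.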
A
How can I say it: this is the one I'm most uncertain about. This is the one I have planned the least so far, at least within the GitLab ecosystem. I'm not sure if this is going to work, or how it is going to work, or if it's just going to be part of Glyter, something that makes life easier for others. But it's something I want to work on in the near future.
A
It makes me really excited about what's to come. If you have feedback, or if you want to see something else explored, we have this issue over here with our weekly updates. Feel free to drop by over there and leave a comment or something, and I will answer. That's it for today. Thanks, all, for sticking with me. Bye.