GitLab MLOps, 14 Dec 2021

From YouTube: SEG MLOps Update - December 14th 2021

Description

This week: running Jupyter notebooks as GitLab pipelines.

Feedback and Ideas for Jupyter Support: https://gitlab.com/gitlab-org/gitlab/-/issues/343024

Feedback and Ideas for Pipeline Experimentation: https://gitlab.com/groups/gitlab-org/incubation-engineering/mlops/-/epics/6

All Updates: https://gitlab.com/gitlab-org/incubation-engineering/mlops/meta/-/issues/16

A

Hello everyone, and welcome to another session on MLOps here at GitLab. My name is Eduardo, and this is the update for, what day is it, the 14th of December.

A

Today we're going to talk about pipelines, about Jupyter, and about further news from the industry. As a starter, remember the vision of the SEG, of what I'm doing here: I want to make GitLab a tool that data scientists want to use, not just have to use. We are doing that by looking at our portfolio and asking: where can we augment the workflow of data scientists and machine learning engineers with GitLab? What can we update? What can we change to make their lives better?

A

So, on that note, let's begin with what was done this past week, starting with Glitter. What is Glitter? Glitter is GitLab plus Jupyter. A very common workflow in data science goes like this.

A

The data scientist creates the model in a Jupyter notebook and does everything in that notebook, and then wants to put the model into production. But production doesn't really play well with Jupyter notebooks, so usually a machine learning engineer or a software engineer picks this up, or the data scientist migrates the code from Jupyter into a Python script. A lot is lost in this translation: it can take a lot of time, or the result doesn't work as expected.

A

So why not avoid moving the code away from Jupyter at all? This is where Glitter comes in. Glitter is a way of converting a Jupyter notebook, a sequence of cells, into a GitLab pipeline, and this week I created a PoC for it. This is part of our exploration series on GitLab pipelines for hyperparameter optimization. We're not really working on the hyperparameter side in this step, but more on the Jupyter side. It's really interesting, though. Suppose that I have this notebook over here, a very simple one.

A

It has three steps. There is something here that isn't shown: there is an extra cell over here. If I show the raw notebook, there is a configuration part as well. The three steps print "hello one", then "hello two", and then "hello three". So what do I do?

A

I come over here and create a parent pipeline that calls Glitter, and Glitter parses this notebook into a YAML file, a valid CI file, which then runs. It's a very simple script: it just picks up each cell, for example this one over here, and creates a step for it. This is a very early concept, very much a PoC. The generated pipeline in turn calls glitter run, which has a function that just executes the code within a single cell of a notebook.
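A minimal sketch of the generation step, assuming the notebook is standard .ipynb JSON and that only code cells become jobs (raw and markdown cells, such as the configuration cell, are skipped). The function names, the cell_N job and stage names, and the exact argument order of glitter run are assumptions for illustration, not the PoC's actual code. Each cell gets its own stage so that the jobs run in notebook order.

```python
import json


def load_notebook(path: str) -> dict:
    """Read an .ipynb file, which is plain JSON."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def code_cells(notebook: dict) -> list[str]:
    """Return the source of every code cell, skipping markdown and raw cells."""
    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # .ipynb files store source either as a list of lines or as one string
        sources.append("".join(source) if isinstance(source, list) else source)
    return sources


def notebook_to_ci(notebook_path: str, notebook: dict) -> str:
    """Build a child-pipeline YAML with one job (and one stage) per code cell."""
    count = len(code_cells(notebook))
    lines = ["stages:"]
    lines += [f"  - cell_{i}" for i in range(count)]
    lines.append("")
    for i in range(count):
        lines += [
            f"cell_{i}:",
            f"  stage: cell_{i}",  # one stage per cell keeps execution sequential
            "  script:",
            # hypothetical CLI call: run only the i-th code cell of the notebook
            f"    - glitter run {notebook_path} {i}",
            "",
        ]
    return "\n".join(lines)
```

In a parent pipeline, a job could write this text to a file, keep it as an artifact, and start it through GitLab's dynamic child pipelines (trigger with include: artifact). The job image would need the glitter command installed, and the sketch assumes the notebook path contains no spaces.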

A

I pass a path and an index, and it runs just that piece of code. On the pipeline, it looks like this: first it generates the notebook CI, then it runs the notebook CI and, like I mentioned, there is one job for each cell. For example, this is cell zero, and job zero prints "hello one", as expected.
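The per-cell runner could look roughly like the sketch below: load the notebook, pick the code cell at the given index, and execute its source. Counting code cells only is an assumption, chosen to match job zero printing "hello one" even though the notebook also has a configuration cell. run_cell, run_cell_from_file, and the command-line shape are hypothetical names. Plain exec does not understand IPython magics such as %matplotlib, and every job starts with a fresh namespace, which is why sharing state between cells remains open work.

```python
import json
import sys


def run_cell(notebook: dict, index: int) -> dict:
    """Execute the index-th code cell of a parsed notebook and return its namespace."""
    code = [c for c in notebook.get("cells", []) if c.get("cell_type") == "code"]
    source = code[index].get("source", "")
    if isinstance(source, list):
        source = "".join(source)
    namespace = {"__name__": "__main__"}
    # compile with a descriptive filename so tracebacks point at the cell
    exec(compile(source, f"<cell {index}>", "exec"), namespace)
    return namespace


def run_cell_from_file(path: str, index: int) -> dict:
    """Load an .ipynb file from disk and run one of its code cells."""
    with open(path, encoding="utf-8") as f:
        return run_cell(json.load(f), index)


# Hypothetical entry point: python glitter_run.py notebook.ipynb 0
if __name__ == "__main__" and len(sys.argv) == 3:
    run_cell_from_file(sys.argv[1], int(sys.argv[2]))
```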

A

This is a very early concept, and of course there is a lot to be done here: sharing state between cells, running multiple cells as part of one job, manual events, better configuration (for example, instead of running everything sequentially, can I run some cells in parallel?), and plenty of other things. But I think it's a very promising start, and something that nobody in the community is really doing.
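One possible approach to sharing state between cells, sketched here purely as an assumption and not something the PoC does, is to pickle the picklable top-level variables after each cell and restore them before the next one, passing the file between jobs as a CI artifact. Modules, functions defined inside a cell, and anything else that cannot be pickled are dropped, so a later cell has to re-import what it uses. run_with_state and the state file name are hypothetical, and unpickling should only ever be done on artifacts produced by your own pipeline.

```python
import os
import pickle
import types


def run_with_state(source: str, state_path: str) -> dict:
    """Run one cell's source with variables restored from, then saved to, state_path."""
    namespace = {}
    if os.path.exists(state_path):
        with open(state_path, "rb") as f:
            namespace.update(pickle.load(f))
    exec(source, namespace)

    saved = {}
    for name, value in namespace.items():
        # skip dunder entries such as __builtins__ and imported modules
        if name.startswith("__") or isinstance(value, types.ModuleType):
            continue
        try:
            pickle.dumps(value)
        except Exception:
            continue  # not picklable: this value does not survive to the next job
        saved[name] = value

    with open(state_path, "wb") as f:
        pickle.dump(saved, f)
    return saved
```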

A

There's this big bias, or not bias but line of thought, that anything that goes to production shouldn't be done in a Jupyter notebook. I kind of disagree with that: if the team is comfortable with notebooks and wants to use them, there's no reason not to. The problem is not the Jupyter notebook; it's the tooling, and the lack of tooling, that makes it complicated.

A

That is what we're tackling here. After the pipeline, we also worked on the Jupyter experience; those are the two main lines of work we are looking at right now. We started by refactoring the code for the diff that we released before. The first version was rushed: we wanted to publish something rather than be really thoughtful with the code base. But now that it is live and being used, we want to expand on it.

A

Refactoring was important for that. We also got user feedback, which was really good, because it means users care about this. We know that even if they like the rendered diff, they still want the raw diff; the raw diff is still important.

A

So what we're working on now is a way to show both of them at the same time. That is a bit challenging, because it means we need to know the mapping between line numbers so that comments land in the right place. For example, if I comment on the raw diff, the comment should display in the correct place on the rendered diff, and vice versa. It's technically a lot more challenging, but nothing that can't be done, so we are working on it.
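A minimal sketch of such a line-number mapping, under strong simplifying assumptions: the raw notebook is pretty-printed with one source line per JSON line (how Jupyter usually saves .ipynb files), and the "rendered" view is simply each cell's source preceded by a header line. The %% cell N header, SourceMap, and build_source_map are hypothetical; GitLab's rendered diff is a richer Markdown conversion. Keeping the mapping in both directions is what lets a comment made on either view be placed on the other.

```python
import json
from dataclasses import dataclass, field


@dataclass
class SourceMap:
    """Rendered text plus a bidirectional, 1-based line-number mapping."""
    rendered_lines: list = field(default_factory=list)
    rendered_to_raw: dict = field(default_factory=dict)
    raw_to_rendered: dict = field(default_factory=dict)

    def add(self, text: str, raw_line: int) -> None:
        self.rendered_lines.append(text)
        rendered_line = len(self.rendered_lines)
        self.rendered_to_raw[rendered_line] = raw_line
        self.raw_to_rendered[raw_line] = rendered_line


def build_source_map(raw_text: str) -> SourceMap:
    """Render each cell's source and record which raw line produced each rendered line."""
    smap = SourceMap()
    in_source = False
    cell = 0
    for raw_line, line in enumerate(raw_text.splitlines(), start=1):
        stripped = line.strip()
        if not in_source:
            if stripped.startswith('"source": ['):
                # header line for the cell, mapped to the "source" key itself
                smap.add(f"%% cell {cell}", raw_line)
                cell += 1
                # an empty list ("source": []) opens and closes on the same line
                in_source = stripped == '"source": ['
        elif stripped in ("]", "],"):
            in_source = False
        else:
            # each array element is one JSON string, possibly followed by a comma
            text = json.loads(stripped.rstrip(","))
            smap.add(text.rstrip("\n"), raw_line)
    return smap
```

To show a comment made on raw line r in the rendered view, look up raw_to_rendered[r]. Raw lines without an entry (JSON keys, metadata, brackets) have no rendered counterpart and would need to fall back to the nearest mapped line.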

A

It might take a while to get everything right, but it's really comforting that we are receiving this feedback and that users care about this. They care enough about the diffs that they come over, ask questions, and talk about it.

A

We are rendering about 12k Jupyter diffs per day, which is not a small number considering it was never something we gave attention to. 12k is quite a nice number, and we'll be building on top of it. Some extra news: Bailey shared something on her Twitter. She is a director of AI or ML at GitHub, I believe; I can't remember exactly. She asked the community on Twitter: what is the one thing GitHub could change for machine learning or data science?

A

I linked it right over here if you want to go over it later, but the highlights are very interesting. The majority of the answers talk about Jupyter, either Jupyter diffs or Jupyter rendering. It's clear this is a must. It was clear for us already, but it's nice to keep receiving signals that we are on the right track.

A

The second most requested thing is improvements to GitHub Actions for machine learning and data science, which is also something we are working on with our pipeline efforts. That's great: the two things users ask for the most are exactly what we had already decided to work on, so I'm very happy with that. There was even one comment that mentioned GPU runners for Actions, and it's great that we already have this too.

A

So it was very interesting to read, and it really improved morale. Up next: we'll finish the source map that I've been building. It's going well, and we should be done very quickly, at least implementing it at the library level. Integrating it with GitLab, with merge requests and with diffs, might take a bit of time, but we are going in that direction.

A

Meanwhile, the diff that we render is a Markdown version, and it doesn't show everything in the notebook. As a mitigation, for now we can try to render pretty diffs, like the notebooks themselves, using review apps. Next week I'll be creating a PoC for this to see how it goes. We also have a few customer conversations coming up.

A

So again, if you have ideas, or if you're interested in what we are doing here, comment on the issue, subscribe to the update epic, and reach out. Have a good one.