GitLab MLOps, 5 Jul 2022

From YouTube: GitLab MLFlow Integration - What Why and How

Description

As we gear up to start exploring an integration between MLflow and GitLab, we want to talk about why we are doing this in the first place, and where we want to go.

Epic for feedback: https://gitlab.com/groups/gitlab-org/incubation-engineering/mlops/-/epics/9

A

Hello, everyone. My name is Eduardo, and today we're going to be talking about GitLab and MLflow. If you follow my updates, I've been talking about MLflow for a while, and I'm going to start working on an integration. I think it's important to take a little bit of time to talk about what, why, and how we are doing this integration.

A

First of all, what is MLflow? MLflow is a model registry. A model registry is a component of the MLOps stack responsible for managing the machine learning models that we create: what is available in production or not, listing, searching, and all the other things that you usually do with a registry, the same as you do with a Docker registry.

A

It's a very popular open source project. But more important than what it is, is what it enables teams to do. First of all, track experiments. When we create machine learning models, a big, important part of this is hyperparameter tuning. We write the code, but the parameters that you pass to the code can make a lot of difference to the output.

A

So what we do is create an experiment where I run this training multiple times with different parameters. What MLflow makes easy is, given all of these runs that you just had, comparing them and exploring your hyperparameter space. So, for example, here I can compare these two hyperparameters on one or more different metrics that I'm interested in, and that makes it a lot easier.
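To make the idea of an experiment concrete, here is a minimal, library-free sketch of what tracking runs and comparing them across a hyperparameter space amounts to. It does not use MLflow's API; `Run`, `train`, `run_experiment`, and `best_run` are hypothetical names, and the "training" function returns a made-up error so the example stays self-contained.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Run:
    """One training run: the parameters passed in and the metrics that came out."""
    params: dict
    metrics: dict = field(default_factory=dict)


def train(learning_rate, depth):
    """Stand-in for real training (assumption): returns a made-up validation error."""
    return round(abs(learning_rate - 0.1) + abs(depth - 4) * 0.05, 4)


def run_experiment(grid):
    """Train once per parameter combination and record every run."""
    names = list(grid)
    runs = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        runs.append(Run(params, {"val_error": train(**params)}))
    return runs


def best_run(runs, metric):
    """Compare runs on one metric and return the one with the lowest value."""
    return min(runs, key=lambda run: run.metrics[metric])


runs = run_experiment({"learning_rate": [0.01, 0.1, 0.5], "depth": [2, 4, 8]})
best = best_run(runs, "val_error")
print(len(runs), best.params)
```

A tracking server adds persistence and a UI on top of this, but the underlying record is the same: parameters in, metrics out, one entry per run.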

A

Having these runs, the next thing that it makes a lot easier to do is create a model. You can register a model with MLflow and promote a run to a version of a model. Then you can track which version is available on staging, on develop, on production. You can search.

A

You can roll back, all those classic, necessary things to do with packages. And since this model is an MLflow package, kind of like a [unclear], it makes it really easy to create a server that will serve predictions from your model, and it makes it really easy to deploy.
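The registry behavior described above (registering a run as a model version, promoting a version to a stage, and rolling back) can be sketched with a toy in-memory class. This is an illustration only, not MLflow's implementation or API; `ModelRegistry`, its methods, the model name, and the stage names are all assumptions.

```python
class ModelRegistry:
    """Toy in-memory registry: numbered versions per model, one stage per version."""

    def __init__(self):
        self._versions = {}  # model name -> list of entries, index = version - 1

    def register(self, name, run_id):
        """Turn a run into the next version of the named model."""
        versions = self._versions.setdefault(name, [])
        versions.append({"version": len(versions) + 1, "run_id": run_id, "stage": "none"})
        return versions[-1]["version"]

    def promote(self, name, version, stage):
        """Move a version into a stage; whatever held that stage is archived."""
        for entry in self._versions[name]:
            if entry["stage"] == stage:
                entry["stage"] = "archived"
        self._versions[name][version - 1]["stage"] = stage

    def current(self, name, stage):
        """Return the version number currently in a stage, or None."""
        for entry in self._versions[name]:
            if entry["stage"] == stage:
                return entry["version"]
        return None


registry = ModelRegistry()
v1 = registry.register("churn", run_id="run-a")
v2 = registry.register("churn", run_id="run-b")
registry.promote("churn", v1, "production")
registry.promote("churn", v2, "production")  # v1 is archived
registry.promote("churn", v1, "production")  # rollback: v1 is live again
print(registry.current("churn", "production"))
```

In this sketch a rollback is just another promotion of an older version, which is why keeping every version around matters.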

A

So MLflow already has a lot of functionality for this, both to serve models and to deploy machine learning models to SageMaker and to Azure, not to GCP yet. It makes it really easy to go through the entire flow of creating, testing, managing, and deploying your machine learning models. That's really cool, but why?

A

Integrating with GitLab? First of all, users seem to really want it. We ran some small surveys to check the temperature on this and see what users wanted, and MLflow features consistently came out as the most voted. These are small sample sizes, but it does give an idea of what we're looking at here.

A

So users are very excited about this, apparently. Second, it makes sense with the DevOps platform. From what I spoke about a little bit earlier, MLflow, or model registration itself, touches Create, Verify, Package, Release, Monitor.

A

It really involves a lot of DevOps in itself, so it makes sense for it to be part of the DevOps platform. And third, GitLab can actually make MLflow easier.

A

If you go through the issues open on MLflow, you're going to see that a lot of the things mentioned there are things GitLab can help with. For example, the biggest pain point, the most voted issue, and something that comes up every time I speak to a user about MLflow, is authentication, permissions, and user support. MLflow doesn't do any of it, and GitLab could provide this through GitLab's own permission system. There are many instances where this can happen.
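As a rough illustration of what providing permissions through GitLab's own system could look like, the sketch below gates registry actions on a project member's access level. The numeric levels match GitLab's documented member access levels; the action names and the mapping from action to required level are purely hypothetical and not part of any announced design.

```python
from enum import IntEnum


class AccessLevel(IntEnum):
    """GitLab project member access levels (numeric values as documented by GitLab)."""
    GUEST = 10
    REPORTER = 20
    DEVELOPER = 30
    MAINTAINER = 40


# Hypothetical mapping (assumption): minimum level needed for each registry action.
REQUIRED_LEVEL = {
    "read_runs": AccessLevel.REPORTER,
    "log_run": AccessLevel.DEVELOPER,
    "promote_model": AccessLevel.MAINTAINER,
}


def can(user_level, action):
    """Return True if a member with this level may perform the action."""
    return user_level >= REQUIRED_LEVEL[action]


print(can(AccessLevel.DEVELOPER, "log_run"))
print(can(AccessLevel.DEVELOPER, "promote_model"))
```

The point of the sketch is that the model registry would not need its own user database: membership in the GitLab project already answers the question.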

A

And how will we work on this? The three underlying guidelines that we have in mind are: first, self-managed. It should work for self-managed; it's not only for GitLab.com. This is also for users that deploy behind their own servers. So it's self-managed first. Second point: it should just work. This shouldn't be.

A

The whole point is to make MLflow deployment easier, along with GitLab. So we want this to be zero overhead for users: you install GitLab and you have MLflow. And three, we want this to be more than just providing an MLflow instance. We want this.

A

We want to, one, make MLflow better by being part of GitLab. We want to make MLflow easier to install, easier to use, easier to manage. But we also want to make GitLab better by having MLflow integrated.

A

So we want to surface the right information. For example, how can I use MLflow at code review? What information can I show from MLflow on the commit page of a model, and so on and so forth? There are many places where we can help with this. It's a mutual gain thing. And the plan: this is how it starts. The first milestone that I'm looking at here is MLflow as a component of the GitLab installation.

A

So when you install GitLab or your GDK instance, the same way you have a component for Gitaly or whatnot, you have a component for MLflow. It is installed together.

A

Second milestone: authentication through piggybacking on projects. When you open a project, it will have its own MLflow. Each project has its own MLflow installation, and authentication is based on the project, so you get this for free. You also have to configure where artifacts are stored for MLflow; we can use GitLab storage for this. This is the first part, the installation. The second part is usage, how users actually use it. So, the first milestone for usage.

A

You can open the MLflow UI through gitlab.com/myproject/mlflow, so you're going to have the UI that I showed before. Second milestone, well, the same milestone actually: each project has its own tracking server, so it's gitlab.com/myproject/mlflow that you would use when you're creating the model, so that you can track the runs centrally in GitLab. At the beginning, the first milestone is to use the MLflow UI, but over time we want to rely less and less on the MLflow UI and migrate.

A

All the information into the GitLab UI, so we use MLflow as a backend for this, but the frontend will be GitLab. And that's what I had. If you have any feedback, this is the link for the epic for MLflow. I'm accepting any ideas, any opinions, whatever you want to say. It's very early, and of course this is incubation engineering. This is an exploration that we are doing. We want to explore if this is viable or not.

A

This is not really a promise, just me sharing my vision for this. Thank you for watching, and have a great day.