Description
An exciting update with the release of Model Experiments to all users, and the next step on the MLOps @ GitLab journey: Model Registry!
Demo: https://youtu.be/uxweU4zT40c
Documentation: https://docs.gitlab.com/ee/user/project/ml/experiment_tracking/#machine-learning-model-experiments
Feedback: https://gitlab.com/groups/gitlab-org/-/epics/9341
Handbook: https://about.gitlab.com/handbook/engineering/incubation/mlops/
Hi there, my name is Eduardo. I am the incubation engineer for MLOps at GitLab, and today I'm going to give an overview of GitLab Model Experiments and the MLflow integration as of version 15.11. Everything I show here is already available on gitlab.com, and for self-managed customers on 15.11 upon enabling a feature flag.
So what is model experiment tracking in the first place? When we are training a machine learning model, that model is a combination of three components: the code that you run to create the trained model, the data that the model is trained upon, and a set of configuration parameters that you pass into the model, the code, or the transformations, which we call hyperparameters. If we change any one of these components, we change the performance of the model. For example, a common practice is hyperparameter tuning: we try a whole set of hyperparameters and see which one gives the best performance.
A candidate is a set of these three components: code, data, and hyperparameters. An experiment is a set of candidates that are comparable with each other: once we have many variations of these candidates, we group them into an experiment to see which is the best candidate, over time or over a short period of time. Experiment tracking, then, is a tool, a registry, that tracks all of the candidates that were created so that we can compare them. It also tracks all of the artifacts that were created for each of the model candidates, so that we can go back and trace the lineage later. I'm using the term candidate, or model candidate, because each model candidate is a candidate to become a model version.
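These terms can be sketched in plain Python. The toy data, training loop, and metric below are invented for illustration: each hyperparameter setting, together with the code and data, forms one candidate, and the grid of candidates compared on the same metric forms an experiment.

```python
from itertools import product

# Toy dataset: (x, y) pairs for fitting y ≈ w * x (illustrative only).
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]

def train_and_score(lr, epochs):
    """Gradient descent on mean squared error; returns (weight, mse)."""
    w = 0.0
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    mse = sum((w * x - y) ** 2 for x, y in data) / len(data)
    return w, mse

# Each (lr, epochs) pair plus the code and data above is one *candidate*;
# the full grid, compared on the same metric, is an *experiment*.
candidates = []
for lr, epochs in product([0.01, 0.05], [10, 100]):
    _, mse = train_and_score(lr, epochs)
    candidates.append({"lr": lr, "epochs": epochs, "mse": mse})

best = min(candidates, key=lambda c: c["mse"])
```

An experiment tracker records every entry in `candidates`, not just `best`, so the comparison stays reproducible later.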
We're talking here about experiment tracking, but there's another component, called the model registry, that comes at the next step. The model registry holds versions that are exposed to the application and that are tracked by product metrics, whereas on experiment tracking the candidates are measured on model metrics like accuracy or AUC. So they are two different things, but each candidate can eventually become a model version; that is how you can think of the difference.
Looking at the DevOps lifecycle, the model registry is more associated with packaging and releasing a model, while model experiments is more associated with the create and verify stages. So how are we tackling this?
The most popular tool for model experiments, or experiment tracking, is MLflow, by Databricks, highly used by the community. It has a great client library, a great SDK. It defines models well, and it already has integrations with all the different libraries you can use to create a model.
It has a large user base, it's open source, and it's very popular, but it doesn't have some of the features you would want from a tool used inside a corporation. For example, it doesn't have user management, and you have to deploy it yourself: somebody has to go and create an internal deployment of MLflow. And there's no out-of-the-box integration with other tools.
So that's where GitLab comes in. We keep the MLflow client and re-implement the backend on the GitLab side, making GitLab a backend for MLflow. MLflow experiments are tied to a project, which gives you user management. If you want to track artifacts, we can log them directly into the GitLab package registry. Data scientists don't need to set up anything: you just change your URL and it automatically works with GitLab.
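The "just change your URL" setup looks roughly like this. This is a sketch based on the experiment-tracking documentation linked above; the host, project ID, and token values are placeholders to replace with your own.

```python
import os

# Placeholder values -- substitute your GitLab host, your project ID, and a
# personal access token with API scope.
gitlab_host = "https://gitlab.com"
project_id = 12345

# GitLab exposes an MLflow-compatible API per project; pointing the stock
# MLflow client at it is the only change a data scientist makes.
os.environ["MLFLOW_TRACKING_URI"] = (
    f"{gitlab_host}/api/v4/projects/{project_id}/ml/mlflow"
)
os.environ["MLFLOW_TRACKING_TOKEN"] = "<your-access-token>"

# From here the usual client calls work unchanged (commented out because
# they need a reachable GitLab instance):
# import mlflow
# mlflow.set_experiment("my-experiment")
# with mlflow.start_run():                  # one run == one candidate
#     mlflow.log_param("learning_rate", 0.05)
#     mlflow.log_metric("accuracy", 0.93)
```

No server deployment, no extra configuration files: the existing MLflow code keeps running as-is.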
So what was achieved in 15.11? I will not give a full demo on this one; I recorded a demo that you can see a little bit of over here, and I'll also attach the link below. In 15.11 we consolidated the storage of artifacts: now you can save an artifact from the MLflow client into GitLab, and it will organize all of the candidate artifacts for you very nicely. We also released to all users, so all users have access to model experiments.
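Artifact logging from the client side is the standard MLflow call. Here is a sketch: the file contents are invented, and the `mlflow` calls are commented out since they need a tracking URI pointed at a reachable GitLab instance.

```python
import json
import os
import tempfile

# A candidate's evaluation results, written out as a file to be logged.
metrics = {"accuracy": 0.93, "auc": 0.88}

artifact_dir = tempfile.mkdtemp()
artifact_path = os.path.join(artifact_dir, "metrics.json")
with open(artifact_path, "w") as f:
    json.dump(metrics, f)

# With the tracking URI pointed at GitLab, the standard client call uploads
# the file, and GitLab files it under that candidate in the package registry:
# import mlflow
# with mlflow.start_run():
#     mlflow.log_artifact(artifact_path)
```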
On gitlab.com you are now able to delete experiments and candidates, and you can download the experiment data as CSV. This was all shipped in 15.11.
But what about the next steps? Now that it's released, we are gathering feedback, and there are three ways we can move forward. One, we can improve the feature set itself: for example, add display of images, such as curves that you generate through your model, or add graphical comparison of candidates, like MLflow does, and many other things.
We can also improve the search for candidates, which is very simple right now, and candidate management within experiments in general. Two, we can start looking at how model experiments fit into the platform: how we can use the information that we collect from model experiments to improve your merge requests, for example. Say you create a merge request and that merge request creates a lot of candidates.
We could display the best one directly in the merge request itself, or integrate better with pipelines, add comments, a lot of things. Or, three, we can do something else and look into a different vertical, a different area to work on. And that's what we're going to do: model registry. This is the next step for Incubation Engineering MLOps.
Mainly because model experiments need a model registry. We already had feedback from users who gave the model experiments feature a shot, and they love how easy it is to set up. But since they don't have a model registry, the artifacts have nowhere to go.
We want to give users the ability to manage the whole lifecycle of a model directly on GitLab: from creation, with code review and pipelines, to the experiment phase, and then into the model registry, so that you can later deploy it to your application. It doesn't really matter how much effort we put into model experiments if we don't have a model registry in place, and that's where we are going. We'll also be following the same guidelines, the same design principles, that we had for model experiments.
We will aim for minimal changes for data scientists who already use some kind of tool, by integrating with the MLflow client for the model registry as well: zero setup, and code changes kept to a minimum. This is where we're going. I'll be starting on this as soon as next week, and I'm very excited about this new direction. That was what I had for today. Again, model experiments is already available.
If you want to give it a try, there's an epic for this with a feedback issue. You can check all of my updates in the handbook and in the docs; those are the docs for experiment tracking if you want to get started. Thanks for watching, and have a good one.