From YouTube: Gitlab Experiment Tracking - 15.10 Overview
Description
- What is experiment tracking?
- Demo of the feature and user experience at 15.10
- What's next for 15.11
Epic: https://gitlab.com/groups/gitlab-org/-/epics/9341
Feedback Issue: https://gitlab.com/gitlab-org/gitlab/-/issues/381660
All Updates: https://gitlab.com/gitlab-org/incubation-engineering/mlops/meta/-/issues/16
Hello, everyone, welcome to another update from Incubation Engineering MLOps at GitLab. My name is Eduardo, and today I'm going to give you an overview of what GitLab model experiments look like in 15.10, what the current status is, and where we're going from here.
First of all, what is model experiment tracking? When we are creating machine learning models, a machine learning model has three main components that define its behavior and its performance: the code that generated the model, the data that was used to train the model, and something we call hyperparameters, or configurations, that were passed to this code and to the training process.
Each combination of code, data, and hyperparameters will generate a candidate, so a candidate is something that might eventually become a model, and a set of comparable candidates, based on some specific performance metric or use case, is called an experiment. So if you change either the code, the data, or the hyperparameters, you're going to have a different candidate, and experiment tracking is a way to make sense of all these candidates that you generate across your training iterations as you evolve your model over time as well.
So it's a metadata store that helps keep track of the whole evolution of our models. And how are we doing this in GitLab? MLflow is great. MLflow is an open source library by Databricks, the most popular open source library for experiment tracking. It has a really large user base and a great client library, which is open source as well, but the issue here is that it doesn't provide a lot of the features that we expect nowadays in the enterprise world.
So, for example, it doesn't provide user management or authentication, and it makes you deploy it yourself, and so on and so forth. So what we do by having GitLab act as a backend for the MLflow client is that we provide user management, and we already provide an artifact registry.
And all of this with zero setup for data scientists. I'm going to show you very quickly how this works. So what we're doing here is re-implementing the MLflow backend inside GitLab, so GitLab works as the backend for the MLflow client, and I'm going to show how this works right now with a demo. So let me minimize this over here. On the right side here you have the code that is used to train a model. This is taken from the MLflow documentation; it's pretty simple.
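The script itself isn't shown in the transcript, but it looks roughly like the MLflow quickstart. A minimal sketch, assuming scikit-learn is installed; the dataset, model, and parameter values are illustrative, not the exact demo code:

    import mlflow
    import mlflow.sklearn
    from sklearn.datasets import load_diabetes
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split

    X, y = load_diabetes(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # Each run produces one candidate; everything logged below is its metadata.
    with mlflow.start_run():
        params = {"n_estimators": 100, "max_depth": 6}
        model = RandomForestRegressor(**params).fit(X_train, y_train)

        mlflow.log_params(params)  # the hyperparameters
        mlflow.log_metric("mse", mean_squared_error(y_test, model.predict(X_test)))
        mlflow.sklearn.log_model(model, "model")  # the model artifact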
I didn't change much from the original, so the code is the same code that you would use just to store your experiments into MLflow. And here I have a project that I just created. First things first, I need to create an access token so that MLflow can communicate with GitLab, so I'm going to go over here to access tokens, and I'm going to create a demo token. It needs the api scope and write_repository, and it will create a token for me.
So, with this token, I can already start training and recording to GitLab. Over here, if I go into ML experiments, I will see that I have nothing; no experiments have been tracked yet. So I'm going to go over here.
It's the same code that I used for MLflow; for GitLab the code doesn't change at all. The only thing that changes is that now I have to pass two additional variables for the run. The first one is the token that I just created, and the other one is the ID of this project, in this case 22. So over here I just pass this 22, and now I can go back into experiments, apply, and then I can run, and it will start creating.
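For reference, those two variables look roughly like this: a minimal sketch, assuming project ID 22 on gitlab.com. The /ml/mlflow path is GitLab's MLflow-compatible API endpoint, and the token value is a placeholder:

    import os

    # Point the unchanged MLflow client at GitLab instead of an MLflow server.
    os.environ["MLFLOW_TRACKING_URI"] = "https://gitlab.com/api/v4/projects/22/ml/mlflow"
    os.environ["MLFLOW_TRACKING_TOKEN"] = "<demo-token>"  # the access token created above

With these set, the unmodified MLflow client sends its runs to the GitLab project instead of to a self-hosted MLflow server.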
So if I reload over here, I can see that it has already created the experiment and is saving to it, without any changes to the code base at all; just by pointing the MLflow code at GitLab, it already starts tracking in GitLab. Over here I can already see that the candidates are being created, and moreover, if I go into artifacts, I can see that GitLab already stores the artifacts: it automatically stores the model artifacts in GitLab.
It stores all of the metadata generated by the MLflow client as well, so it stores the metrics and the parameters and everything that's needed. So, going back to the presentation.
So this is already available internally at GitLab, for colleagues, and they have already given some feedback on it. First of all, it solves a lot of the problem of managing user access to specific experiments or specific models, because it attaches to the project: you can manage the users that have access to a specific model or a specific experiment based on the users who have access to that project, so it makes that very straightforward. It also makes it very straightforward to store artifacts.
The user doesn't need to configure a bucket for MLflow or anything, since it already uses the GitLab package registry, so it makes it very straightforward: there's no setup necessary for the data scientist, who only needs to create a token, and that's it. But right now it only keeps track of the candidates; it really needs a model registry to bring this feature forward.
So what users want is to manage their model lifecycle, from creating an experiment (I'll talk about this a little bit more soon) to making a candidate from an experiment become a model. And the third point of feedback is that right now it doesn't really help users record the information that the training generates. So imagine if this runs on GitLab CI: we can pull all of the information from the CI, from all the environment variables, and cross-reference it with the logs, with the user that triggered that run, with the MR that is running.
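That cross-referencing doesn't exist yet, but to illustrate the idea, a training job could already attach a few of GitLab CI's predefined variables to a run as tags. A hypothetical sketch: CI_JOB_ID, CI_COMMIT_SHA, GITLAB_USER_LOGIN, and CI_MERGE_REQUEST_IID are real predefined CI variables, while the tag names are made up for this example.

    import os
    import mlflow

    # Hypothetical: record which CI job, commit, user, and MR produced this run.
    with mlflow.start_run():
        for tag, var in [
            ("gitlab.job_id", "CI_JOB_ID"),
            ("gitlab.commit", "CI_COMMIT_SHA"),
            ("gitlab.user", "GITLAB_USER_LOGIN"),
            ("gitlab.merge_request", "CI_MERGE_REQUEST_IID"),
        ]:
            if var in os.environ:  # only set when running inside a CI job
                mlflow.set_tag(tag, os.environ[var])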
So this is what we're going to do next. And going back to the subject of experiment tracking: experiment tracking and a model registry are different. They are very related, but they are different from one another. The model registry happens after the experiment. The experiment is at the create stage, when you're still creating your models, when you're still thinking about them and iterating, and the model registry happens when you're already deploying the model, so you have an artifact that you want to deploy.
A candidate from an experiment can be promoted to a model version, which can then be served to an application, so a model registry closes the loop that experiment tracking creates. Experiment tracking is when you're iterating very early on; sometimes the code is not even in the repository yet. The model registry is when you already have a model, usually one that's trained through a CI pipeline and passes all of the checks, and then you want to serve it. So they complement each other.
The difference between them is in the metrics. When you're tracking experiments, you care more about the model metrics, so area under the curve, precision, recall; when you're doing model registry, it's more about the usage metrics, like click-through rate, things like that. There's also the question of being user facing: experiments in experiment tracking are not user facing, while models in the registry are. And in experiment tracking you create artifacts either locally or in an automated way,
like in CI, whereas with a model registry you will most likely only create them in an automated way. And in terms of the DevOps stages that we talk about, experiment tracking is in the Create stage, while the model registry is more in the Package stage. So with this, we are at a point where it's usable, and we are improving it.
We are planning on releasing this as is in the next few weeks, for testing, to all users, so they can have a go at it, and we will keep working on it. So what I'm working on now, and what I'm going to be working on, is consolidating artifact storage, making it more user friendly and making it better; adding some basic features that are necessary, like managing experiments and candidates (deleting, for example, is not there yet); and afterwards, horizontal features.