Description
We are starting a series of videos exploring whether GitLab Pipelines can be used for Data Science use cases. To kick this off, we go through what Hyperparameter Optimization is.
Hyperparameter Optimization with GitLab Epic: https://gitlab.com/groups/gitlab-org/incubation-engineering/mlops/-/epics/6
Incubation Engineering MLOps Updates: https://gitlab.com/gitlab-org/incubation-engineering/mlops/meta/-/issues/16
Hello, everyone, and welcome to another session on MLOps here at GitLab. My name is Eduardo, and I'll be talking a little bit about hyperparameter optimization and what it has to do with GitLab.
This will be a bit more of a conceptual chat rather than a walkthrough of something that was already done, but it is important to pave the way for what we're going to do next.
So why are we doing this? My role is to explore ML and data science use cases within GitLab, that is, to test our tooling against most of the use cases that might come up. Hyperparameter optimization is a process within the ML development life cycle that is very long and tedious, but it can produce great results and can become part of the CI process for machine learning. So over the next few weeks I'm going to explore a little bit what this looks like.
What is the user experience? Can our runners handle it? And the cool thing about working with hyperparameter optimization is that it paves the way towards AutoML, which we'll talk about a little bit later. So, starting from the beginning.
A machine learning model is an artifact created by applying an algorithm to a dataset that has been processed. So you have a dataset, you perform some processing (change the image size, convert the text, clean up some events, and so on), and then you apply an algorithm to it: XGBoost, random forest,
and many more. You train these together and you get a model artifact. But both the data processing step and the algorithms that you're going to use have configuration parameters: things you pass to the constructor of the model that change how it behaves, how it searches, how deep it goes into the data, or just tweaks. These we call hyperparameters.
Some examples of this that we're going to see: if you have a neural net, the number of hidden layers; if you have a decision tree, the maximum depth of the tree, or how many different cuts you can make; on a random forest, the sampling, that is, how many features you're going to use.
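As a concrete sketch of the idea above: in scikit-learn (the library used in the demo later in this video), hyperparameters are just constructor arguments, fixed before any training happens. The specific values here are illustrative, not tuned.

```python
# Hyperparameters are plain constructor arguments, set before training.
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(
    max_depth=5,      # maximum depth of each decision tree
    max_samples=0.8,  # fraction of rows sampled for each tree
    n_estimators=50,  # number of trees in the forest
)

# They can be inspected before the model has seen any data:
print(model.get_params()["max_depth"])  # 5
```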
These are called hyperparameters, and the thing is, choosing the right hyperparameters can have as much impact as choosing the right algorithm or improving the dataset itself. So it is a really important step of the process and it yields great results, but it takes a long time to perform.
What you usually do is something called hyperparameter optimization, which is the process of automatically choosing the hyperparameters for the model and for the data. It goes like this: you rank some trials (a trial is a group of hyperparameters, a hyperparameter configuration that you want to test), you train models for all of these trials, you compare the results, and out of this you choose the best model.
This can be done manually or automatically, and from this you can see that, at its core, hyperparameter optimization is a search problem. So everything that we know about search problems can also be applied here: how to rank the trials, how to choose the best trial, when to stop, when to start, and so forth.
For the first case you have, for example, grid search: you have a couple of different parameters you want to optimize for, each one with two, three, or however many values, and you just test all possible combinations. You create a huge list of combinations and test them all, so it's brute force. Random search is similar: you might generate all these lists, but you test them at random, and the hyperparameter values themselves come from random distributions.
So this reduces the work: even though you have about the same search space as before, with random search you select at random which part of the space you're going to look at. And then there is iterative creation. For both grid search and random search you come up with the values beforehand: you create all the values, you test all of them, you compare your results, and you come up with the best model. Iterative creation is a little bit different.
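The two up-front strategies can be sketched in a few lines of plain Python. This is a minimal illustration of how the trials get generated, not how any particular library implements it; a real run would train and score a model for each trial and keep the best one.

```python
import itertools
import random

search_space = {
    "max_depth": [2, 5, 10],
    "max_samples": [0.5, 1.0],
}

# Grid search: enumerate every combination up front (brute force).
grid_trials = [
    dict(zip(search_space, combo))
    for combo in itertools.product(*search_space.values())
]
print(len(grid_trials))  # 6 trials: 3 values * 2 values

# Random search: each trial picks every hyperparameter at random
# (here from the same lists; sampling from distributions also works).
rng = random.Random(0)
random_trials = [
    {name: rng.choice(values) for name, values in search_space.items()}
    for _ in range(4)
]
print(len(random_trials))  # 4 trials, chosen at random
```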
Instead of looking at every parameter, you try to be a little bit narrower about where you're going to search, and genetic algorithms and Bayesian methods are examples of this more iterative creation of trials. The second step is then testing the trials.
Like I showed before, you have the trials, you train a model for each, and then you compare results. But the thing is, you don't know whether the trial itself is good, or whether it was just a really good match between the trial and the particular data that was used. To improve this, to get a bit more of a notion of generalizability and see how the model will behave on new datasets, we use something called cross-validation. You take the initial dataset and split it into n folds, say one, two, three, four, five, and then you train five times. So for each trial, you train five models.
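The splitting step just described can be sketched as follows. This is a simplified illustration of k-fold cross-validation (no shuffling, fold sizes assumed even), not the exact implementation a library would use.

```python
def kfold_indices(n_rows, n_folds):
    """Yield (train, test) index lists, one pair per fold."""
    fold_size = n_rows // n_folds
    rows = list(range(n_rows))
    for fold in range(n_folds):
        start, stop = fold * fold_size, (fold + 1) * fold_size
        test = rows[start:stop]             # this fold is held out
        train = rows[:start] + rows[stop:]  # the rest is trained on
        yield train, test

# 10 rows split into 5 folds gives 5 train/test pairs, so for a
# single trial you would fit 5 models, each one evaluated on a
# different held-out fold.
splits = list(kfold_indices(10, 5))
print(len(splits))  # 5
```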
From the performance results you can then build a distribution of how well that hyperparameter configuration does. So, in the end, during this process you're going to train about (number of trials times number of folds) models, and you can quickly see how fast this grows. Training a machine learning model is often not something quick: it can take hours, days, sometimes seconds perhaps, but in general it is not cheap, it is not fast, and it's a very tedious process.
So this is why hyperparameter optimization is important, and why having it as a pipeline on GitLab is also really important, because then it can become part of the CI process itself. You can create a commit that changes the code for the model, and it already starts the pipelines to choose the best hyperparameters. You can use all the runners that you already have; you don't need the data scientist's or machine learning engineer's laptop for that.
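To make that concrete, a pipeline job along these lines could look roughly like the sketch below. This is a hypothetical `.gitlab-ci.yml` fragment: the image, the `train.py` script, and its flags are assumptions for illustration, not an existing project.

```yaml
# Hypothetical job: re-run the hyperparameter search on a GitLab
# runner whenever the model code changes. "train.py" is an assumed
# script, not part of any real repository.
hyperparameter-search:
  image: python:3.10
  script:
    - pip install scikit-learn
    - python train.py --search grid --folds 5
  rules:
    - changes:
        - model/**/*
```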
So that's pretty great, but we do need to explore a bit how GitLab will behave within this use case. And, like I said, this is the first step towards AutoML, because you can think of the choice of algorithm itself as a hyperparameter.
So this is a very small dataset, about 10,000 rows or so, and here I create a very simple algorithm, a random forest classifier, that will look at this data. I fit this model, and its accuracy is about 0.738 and the area under the curve is 0.725. For these metrics, one is the best and 0.5 is bad.
So what I'm going to do: for the random forest, if I open the documentation (I'm not going to do this now), it has many parameters that I can choose, and these are some of them. So I have a max depth, and I can say here that I'm going to test out two, five, and ten, and similarly for the other parameters.
Then I'm going to run a grid search with five folds on the CV, scoring on accuracy, and it's going to fit 60 different models: there are 12 candidates, so 3 times 2 times 2, times the 5 folds.
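The search described here can be sketched with scikit-learn's `GridSearchCV`. The grid below reproduces the 3 times 2 times 2 = 12 candidates over 5 folds (60 fits in total) mentioned in the demo, but the dataset is synthetic and the specific parameter values are assumptions, since the demo's data and exact grid are not included here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Stand-in dataset; the demo used a real ~10,000-row dataset instead.
X, y = make_classification(n_samples=200, random_state=0)

param_grid = {
    "max_depth": [2, 5, 10],       # 3 values
    "max_samples": [0.5, 0.8],     # 2 values
    "min_samples_split": [2, 10],  # 2 values
}

search = GridSearchCV(
    RandomForestClassifier(n_estimators=20, random_state=0),
    param_grid,
    cv=5,                # 12 candidates * 5 folds = 60 fits
    scoring="accuracy",
)
search.fit(X, y)

print(search.best_params_)                # the winning trial
print(len(search.cv_results_["params"]))  # 12 candidates
```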
This is a very small dataset, so it trained really fast, but it did run 60 different models. Then I can check my results and do some plots. For example, for max depth I can see that the test score increased when the max depth increased. I can also see over here that for max samples it tends to increase as well, and for min samples split it doesn't really change.
The best scores came mostly where max depth was high, and over here it just didn't really matter. So I can see, for example: if I increase my max depth, where do the results come in? They seem to be higher than if I...
Oh, it broke for some reason. Okay, let me just regenerate this plot; live coding is always like this. Okay. Sorry. So if I have a low max depth, the results are generally on the bottom side of the test score, but if I have a large one, they tend to be higher. So this is another way of looking at these parameters.
So this is a very quick example of what this can look like. This was just an introduction to what hyperparameter optimization is and why we are doing this. The next step is to create a very simple pipeline that performs this job on CI, so that I create a commit, I create a merge request, and it already computes the best values automatically and does everything for us. So thanks, and see you soon.