From YouTube: Running Hyperparameter Optimization Jobs in Parallel
Description
Part 2 of Exploring GitLab Pipelines for Hyperparameter Optimization
Last time, we created a very simple hyperparameter optimization pipeline that runs the trials sequentially. However, that quickly becomes impractical, since in real-world use cases training a single model can take hours, days, or even weeks. This week, let's explore how to run those trials in parallel using Dynamic Parent-Child pipelines!
Code: https://gitlab.com/gitlab-org/incubation-engineering/mlops/hyperparameter-tuning-exploration/-/tree/part_2_running_trials_in_parallel
Exploring GitLab Pipelines for Hyperparameter Optimization Series:
https://gitlab.com/groups/gitlab-org/incubation-engineering/mlops/-/epics/6
Welcome, everyone, to another edition of MLOps with GitLab. Today we continue the series on exploring GitLab pipelines for hyperparameter optimization, and this time we're going to talk about running the jobs in parallel. My name is Eduardo, and I'm an incubation engineer for MLOps.

Just to recap why we are doing this in the first place: hyperparameters configure how a model is trained, but choosing them is a very tedious, time-consuming, resource-intensive process, and it basically must be done through pipelines.
So we want to check whether GitLab fits the bill here, whether we can use GitLab for this use case. It's also a first step towards AutoML: if we ever want to do AutoML, hyperparameter optimization is one of its building blocks.
If you want to check out everything that is being covered here, and to follow the new videos that will come up soon, follow the hyperparameter exploration epic. As for what we did so far: in part zero, we explained what hyperparameter optimization is. If you're new, if you're arriving just now and want to understand a bit more of what we're doing here, please go back to that video.
There I take a little more time to explain all the edge cases and why this is important. Then, in part one, we created a very simple pipeline that runs hyperparameter optimization, but it runs the trials sequentially. And that is the thing about hyperparameter optimization. For example, the very simple pipeline that I created has, I don't know, 18 different hyperparameter combinations, each run five times, and each model training takes a second.
So it should take about two minutes to run everything. In the real world, though, it would usually take hours, days, possibly even weeks to train a single model.
And the number of hyperparameter combinations that you're going to be exploring is a lot larger than 18, so you cannot wait an entire month or a quarter just for the hyperparameters to be optimized. Sequentially running the trials doesn't really scale that well. So if we can't rely on running them sequentially, we have to do this in parallel. But how can we do this?
The thing is, our hyperparameters are not preset: they are defined in a hyperparameters.yaml file, so they are not fixed when I'm coding the CI. That actually allows us to do whatever we want. Basically, we can encode every kind of dynamic behavior we want into a static CI file that is created afterwards, right at runtime, and this is where we want to go.
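To give a concrete picture, a hyperparameters.yaml describing a grid of 18 combinations could look something like this; the keys are my own illustration, not necessarily the actual file in the repository:

```yaml
# hyperparameters.yaml - illustrative sketch, not the repository's actual file
model: RandomForestRegressor
cv_folds: 5                      # each combination is run five times
search_space:
  n_estimators: [10, 50, 100]    # 3 values
  max_depth: [3, 5, 10]          # 3 values
  min_samples_leaf: [1, 5]       # 2 values -> 3 x 3 x 2 = 18 combinations
```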
So let's take a look at how it looks right now. Here, in the hyperparameter repository, I already have a merge request ready, and here we can see the parent pipeline: only two jobs, one for generating the CI and another one for running it. Then here, on the downstream side, I can see the child pipeline, with its prepare, optimize, and publish stages. That is the entire pipeline we are taking a look at, and it is very similar to the old one. The only difference is that, instead of running the trials sequentially, now we run them in parallel.
This is actually what we wanted to achieve, and it's great that we can do that right now. So let's look at how we implemented this. First, we have the parent pipeline. Like I said, it's very simple: its only goal is to generate the optimization CI and then run that CI. It's a very small configuration file, and the only important thing is the script that generates the child CI file.
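As a sketch, the parent could look something like the following; the script and file names (generate_ci.py, child-pipeline.yml) are placeholders of mine, not necessarily the ones used in the repository:

```yaml
# .gitlab-ci.yml (parent) - minimal sketch of a dynamic parent-child setup
stages:
  - generate
  - run

generate-ci:
  stage: generate
  image: python:3.10
  script:
    - pip install jinja2 pyyaml
    # render the Jinja2 template against hyperparameters.yaml
    - python generate_ci.py hyperparameters.yaml > child-pipeline.yml
  artifacts:
    paths:
      - child-pipeline.yml

run-ci:
  stage: run
  trigger:
    include:
      - artifact: child-pipeline.yml
        job: generate-ci
    strategy: depend   # parent waits for, and mirrors, the child's status
```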
The script takes a template that I wrote in Jinja2, reads the hyperparameters.yaml file, and then creates this quite large CI file. You can see how the same job is repeated over and over: run trial 0, run trial 1, and so on. For each of the combinations that I have defined in hyperparameters.yaml, I have one run-trial job.
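The template could look roughly like this, with one job stamped out per combination; the variable and job names are assumptions for illustration:

```yaml
# child-pipeline.yml.j2 - sketch of the Jinja2 template
stages:
  - prepare
  - optimize
  - publish

{% for trial in trials %}
run_trial_{{ trial.id }}:
  stage: optimize
  image: python:3.10
  script:
    - python run_trial.py --trial-id {{ trial.id }}
{% endfor %}
```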
It was pointed out to me that there's a much simpler way to create this with the keyword parallel. It will create all of this automatically, but you still need to pass parallel the number of jobs that you want to start, so you would need dynamic pipelines either way. It's just that the result would become a lot cleaner to read, so I might want to implement that in the future.
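For reference, with parallel the generated child pipeline would shrink to a single templated job; only the job count still has to be rendered in. A sketch:

```yaml
# sketch: one job definition fanned out with `parallel` instead of 18 copies
run_trial:
  stage: optimize
  parallel: 18   # this number still has to be computed from hyperparameters.yaml
  script:
    # GitLab numbers the copies via CI_NODE_INDEX (1..CI_NODE_TOTAL)
    - python run_trial.py --trial-id $((CI_NODE_INDEX - 1))
```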
With that, the parent creates the child pipeline, which starts with two things. First, it prepares the data, still the same synthetic data that we used before, but it also prepares the trial files: it creates a file with all the parameters that each one of the trials is going to pick up. In theory, I could pass these directly when creating the job definitions, but using a file like this will help us in the next step, which is making this iterative. So it could be done a little bit differently.
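The prepared trial file could be as simple as one entry per trial; again, an assumed shape rather than the repository's actual format:

```yaml
# trials.yml - assumed shape of the prepared trial-parameters file
trials:
  - id: 0
    params: {n_estimators: 10, max_depth: 3, min_samples_leaf: 1}
  - id: 1
    params: {n_estimators: 10, max_depth: 3, min_samples_leaf: 5}
  # ... one entry for each of the 18 combinations
```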
So I create this file, and then the next step is run. Each one of the trials will pick up one of the items in that configuration file I just showed, train the model, and create a CSV file with the trial ID and the statistics for that training: what was the score, what was the training time, what was the fitting time for each of them.
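Put together, one rendered trial job could look like this; the job, script, and column names are illustrative assumptions:

```yaml
# sketch of a single rendered trial job in the child pipeline
run_trial_7:
  stage: optimize
  needs: ["prepare"]     # consumes trials.yml and the synthetic data
  script:
    # run_trial.py is assumed to read entry 7 from trials.yml, train the
    # model, and write results/trial_7.csv with columns such as:
    # trial_id,score,train_time,fit_time
    - python run_trial.py --trial-id 7
  artifacts:
    paths:
      - results/trial_7.csv
```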
Then we have the final part of the pipeline. The first piece here is collect results, where I read all of the files that were generated and compute statistics on them. I basically removed the scikit-learn part of the optimization and hand-implemented it, so that we could take control of how those things run. So even though we are still using scikit-learn for cross-validation, we don't use it for the optimization anymore.
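The aggregation can then be a single job in a later stage; because the trial jobs upload their CSVs as artifacts, a later-stage job sees them all by default. A sketch with assumed names:

```yaml
# sketch: aggregate the per-trial CSVs and rank the trials
collect_results:
  stage: publish
  image: python:3.10
  script:
    # collect_results.py is assumed to concatenate the CSVs and
    # compute the mean score per trial across the five runs
    - python collect_results.py results/*.csv > summary.csv
  artifacts:
    paths:
      - summary.csv
```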
That is the work that was done there. For example, here I can show the results being displayed: if I come over here, you can see the results for each trial. The last step comments on the merge request with the results of each one of the trials and which one is the best trial. Pretty cool.
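That last step can be done with GitLab's merge request notes API; here is a sketch, assuming a CI/CD variable GITLAB_API_TOKEN holding a token with api scope and the summary.csv from the previous job:

```yaml
# sketch: post the trial summary as a comment on the merge request
comment_results:
  stage: publish
  needs: ["collect_results"]
  rules:
    - if: $CI_MERGE_REQUEST_IID
  script:
    - |
      curl --request POST \
        --header "PRIVATE-TOKEN: $GITLAB_API_TOKEN" \
        --data-urlencode "body=$(cat summary.csv)" \
        "$CI_API_V4_URL/projects/$CI_PROJECT_ID/merge_requests/$CI_MERGE_REQUEST_IID/notes"
```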
One downside is that iteration is even slower than before. Before, we could at least use the online editors, like the pipeline editor and things like that, but here the .gitlab-ci.yml file is just the parent one. The child pipeline is a separate, generated file, so you can't really use the pipeline editor on it. What I did was create the child pipeline first, with some hand-typed trials, and when I was happy with that, I transformed it into a template. But the iteration time becomes a lot slower. One thing that would make it a lot easier and faster, and also a lot less resource-intensive, to implement a pipeline like this is the concept of checkpoints.
Now suppose that step five is failing, and it depends on the output of step four. If I want to fix step five, I have to rerun everything again, and that is very, very annoying. It would be really useful if I could store the state up until step four and then replay only step five afterwards, so that you don't need to rerun the entire pipeline. You don't need to rerun what was already correct: you keep that, and you just run the new steps. That would make the process a lot faster and a lot more useful, and it would mean you use a lot fewer resources as well. That would be pretty cool to see at some point.
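GitLab has no first-class checkpoints today, but something in that spirit can be approximated with the cache keyword: key the cache on a step's inputs and skip the work when the output is already there. A rough sketch, with names of mine:

```yaml
# rough sketch: poor man's checkpointing via a cache keyed on the inputs
step_four:
  stage: prepare
  cache:
    key:
      files:
        - hyperparameters.yaml   # cache is reused while the inputs are unchanged
    paths:
      - step_four_output/
  script:
    # redo the expensive work only if no cached output exists
    - test -d step_four_output || python step_four.py
```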
Coming next week, things will get fun. Right now, we know what hyperparameters we're going to use: even when we create the dynamic pipeline, we already know all the combinations. But what if we don't? What if we have to compute these hyperparameters at runtime of the child pipeline? What will that look like? That is our next step. This is where we start bending GitLab a bit to do things it wasn't supposed to do, and I'm very excited about that.