Description
Presented by: Dmitry Petrov
As always, feel free to leave us a comment below and don't forget to subscribe: http://bit.ly/subgithub
Thanks!
Connect with us.
Facebook: http://fb.com/github
Twitter: http://twitter.com/github
LinkedIn: http://linkedin.com/company/github
About GitHub
GitHub is the best place to share code with friends, co-workers, classmates, and complete strangers. Millions of people use GitHub to build amazing things together. For more info, go to http://github.com
My name is Dmitry Petrov. Today we are going to talk about machine learning with Git and experiment tracking with Codespaces. When you create a machine learning model, you need to train it dozens, hundreds, sometimes thousands of times, and it is crucial to keep track of all the experiments you run, so you can return to the past and understand what happened yesterday or what happened last week. That is important for your team: to understand the progress of the project and how to reproduce your results.
Today we will learn how to simplify operations around machine learning and how to track experiments without introducing any additional pieces of infrastructure. Instead, we will be using Git, and the ecosystem around Git, to track your experiments and collaborate in a team.
A little bit about myself. I am a co-founder of a startup, Iterative.ai. We build tools for machine learning. We believe that machine learning engineers and researchers should live in the same ecosystem of tools as software developers, so that we can collaborate more efficiently and build our AI applications faster.
We need to use Git, GitHub, and the whole ecosystem around them to improve the efficiency of our teams. My experience comes from Microsoft: I was a data scientist at Microsoft for a number of years, and later I created an open-source tool, DVC, or Data Version Control, which helps you manage your datasets and models in storage while using them in your Git repositories, together with your modeling code. Today we will be talking about experiment tracking: how to use DVC to track your experiments while utilizing the Git ecosystem.
First, we will talk about experiment tracking and the importance of this experience: why we need it and how people use it to improve their efficiency. Then we will talk about how to get this experience without any additional pieces in your infrastructure: how to utilize the VS Code IDE and how to utilize Git to track the experiments.
Do you remember what metrics you got when you were training a model this morning? Do you remember what metrics your model had yesterday, or maybe a month ago? It is really complicated even for yourself; I am not even talking about a team. So you need a system, a place where this information is stored together and from which you can return to any experiment you ran before. The system should work like Git does for software engineering, but for machine learning you need a little bit more.
You need to remember your source code version, your metrics, your hyperparameters, the versions of your data: a lot more than you need for software development. This is where experiment tracking tools came from. In general, these tools need a few components. First, you need to log your metrics, source code, and hyperparameters somewhere, and you need a centralized place where you and your teammates can look at these metrics and hyperparameters and make data-driven decisions based on this information. And it is crucial to have the ability to reproduce the results.
That is why you need experiment tracking. The maximum value you can get from it is to reproduce results and be efficient; it helps you be efficient as a team. The first experiment tracking tool, at least that I know of, is Sacred, which was released in 2014. If you have been doing machine learning for a long time, you probably remember that tool. I will use it as a reference for the functionality of experiment tracking. You can use the web UI of Sacred.
This is the web UI, where you can see all the runs you made, with the run ID, all the hyperparameters you used (for example, a learning rate of 0.003 and a batch size) and the metrics your model produced for this particular run, for this particular set of hyperparameters. So you can return to the table and see what happened to the model yesterday or a month ago. In such a tool you can attach more to your experiments; you can go deeper and understand what is inside the models.
You can put in visuals and graphs, such as area under the curve and loss functions: all the information that helps you make data-driven decisions. Even the first experiment tracking tool supported this functionality. You can put in a loss function, a confusion matrix, anything that is needed to remember what happened in the past when you were working on the experiment. The success of Sacred and the rise of machine learning in the industry created many more tools for experiment tracking, such as TensorBoard, MLflow, and some others.
The majority of these tools appeared around 2018, but there is still a lot of progress in this area. Let's take a look at the architecture of these systems. When you work with an experiment tracking tool, you need to deal with an additional part of your IT infrastructure. You need to host the centralized table somewhere you can put the information, so you and your teammates can look at it. Sometimes you just need a database and a web UI on top; this is how the initial version of Sacred works, or MLflow.
Some solutions put the database and web UI in the form of SaaS, so you just need a login to the system, and then you will be able to stream your metrics and images to the SaaS, so you and your team can look at and visualize the experiments. This SaaS or service becomes an additional part of your machine learning infrastructure, and the picture becomes even more complicated.
When you look at the cloud training experience and need to train in a cloud, you need yet another component that manages your cloud infrastructure, provisioning, and so on. The system becomes very complicated: you just need to track experiments and run them in the cloud, and yet you have already added two extra components to your infrastructure. You double or even triple the complexity of ML operations. So the question we are asking today is: can we simplify this experience? Can we remove the moving parts from your IT infrastructure? How do we do that?
We need to reuse the infrastructure components that we already have. I am talking about Git as a source control tool; I am talking about your Git servers, such as GitHub, a VS Code extension, your IDE. Why do you need a separate UI and service for experiment tracking? Why not have this experience right in your IDE, in the place where you manage your code?
You won't need to switch back and forth between your development experience, where you write the modeling code, and your ML tracking experience, where you see your runs. This can dramatically improve our efficiency: efficiency around infrastructure and efficiency in our workflow. This is how we will structure our next steps: we will talk about each of these components and how to use them in your ML training process.
We are starting with the experiment tracking user interface: how we can get a rich interface right in your IDE, together with your code. You will have your coding experience, the best developer experience you can get, and at the same time you will track experiments right in the same application. Whether it runs on your laptop or in a cloud, you should have those pieces together, and when you run it in your IDE, you don't need additional infrastructure.
You don't need an additional server, because you already have one. All right, now the fun part begins. I am opening VS Code with my extension on my laptop; we start locally, and then we go to the cloud. This is a regular VS Code experience; you are probably familiar with it very well. This is my training code: I train something and I save my model, so very usual modeling stuff. What is special about my environment? In VS Code, I have a plugin installed for experiment tracking, so we can see the list of our experiments with metrics.
You can look at average precision, AUC, and the hyperparameters. So far we have only four of those; in a real case you might have dozens of metrics and hyperparameters, but let's keep it simple for the demo. You can see the entire history of my experiments, and now we can run a new one. I am choosing an experiment and running it. In this case, I will be changing the number of estimators and the maximum features. Right now I am using a very simple project: an NLP problem with a binary classifier using traditional machine learning.
The training takes only a few seconds and we will see the result soon. Here it is: a new experiment in the table. You can see the time and the new values of the metrics, and now we can compare the experiments. The current experiment performs slightly better than the experiment we started from. In this diagram, an experiment is actually a Git branch: there is the original branch we started from, and the difference, as you can see, is substantial. We can go deeper and look at the metrics and graphs that we produced during the experiment, such as this precision-recall curve.
You can see that our experiment performs slightly better than the original one. In addition to basic metrics, you can see more advanced visuals, such as a confusion matrix for our binary classifier. You can take a look at the feature importance graph and compare the two. Notice that the feature importance plot is just a regular image file, a regular PNG file that we produced as a result of our experiment, and the tool can pick it up and show it right here. Sometimes two experiments are not enough.
Why don't we take a look at an experiment that we ran, let's say, a month ago, and look at all three experiments? All three are here: you can see the IDs and all three graphs in one view. For the confusion matrix, of course, you cannot put one matrix over the others, so you see them separately, and the same goes for feature importance: this is the graph for the experiment you just ran, and those two are the older ones.
What you see here is a regular way of tracking machine learning experiments, but we have all this experience right in your IDE, next to the code. You can click and jump to your source code right away, modify the code, and after the modification just click run; it will train and preserve all the results, all the changes in source code, all the changes in hyperparameters, and produce the metrics in the table. This is how we integrate your development experience with your modeling experience.
You don't want to switch back and forth between the IDE where you write code and some UI in a separate place; you have everything here. How did we generate the metrics and the visuals? You need to instrument your source code with a library. In this case we are using the DVCLive library, but every experiment tracking tool has its own. You import the library and log the metrics, like here: this is how I log a curve.
This is how I log my precision-recall curve, and this is how I save my precision numbers. After this, you will see the result in the table.
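The demo instruments the code with DVCLive, whose exact calls are not reproduced here, so as a hedged sketch of the underlying pattern only (scalar metrics written to a small JSON file inside the repository, which Git can then version and diff), something like this stdlib-only snippet captures the idea; the file name metrics.json and the metric names are assumptions, not DVCLive's actual API:

```python
import json
from pathlib import Path

def log_metrics(metrics: dict, path: str = "metrics.json") -> None:
    # Write scalar metrics as a small, stable JSON file. Because the
    # file lives in the Git repository, every experiment commit carries
    # its own copy, and plain `git diff` can compare two runs.
    Path(path).write_text(json.dumps(metrics, indent=2, sort_keys=True))

# Hypothetical metric values, for illustration only.
log_metrics({"avg_prec": 0.925, "roc_auc": 0.946})
print(Path("metrics.json").read_text())
```

The point of the design is that the log is just a file: no database and no server, only content that travels with the commit.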
Before we jump into the details and see how it works under the hood, how we use Git to save the table, I want to mention some small details about how you can customize your table. You can remove some of the columns that you don't need anymore.
You can hide some of the hyperparameters that you don't need. When the table is customized, you can understand your metrics better, which helps you make data-driven decisions faster. But how do you get this experience? It is very simple. You have seen the DVC plugin installed in VS Code, and this is all you need to get the table: go to the extension marketplace, search for DVC, and install it. After the first run you will see the table, and you can start your journey in experiment tracking with VS Code.
Now we will go a little bit deeper and see how this works under the hood: how experiment tracking is implemented using Git, and how Git becomes the source of truth for the thousands of experiments that you run. You use the same set of tools and technologies both for your source code tracking and for your ML experiment tracking, and using Git for machine learning experiment tracking simplifies your workflow.
We will show you how to use Git branches and pull requests to collaborate with teammates on machine learning experiments, and how to do this efficiently using the tools that you already have. What is even more interesting, it helps you collaborate better with teams outside machine learning: your DevOps and IT infrastructure people, your software engineers, who know very well how to use Git.
You already know how to install the extension and how to see the table with your experiments, visuals, and all the information you might need. Now we will see how this is implemented and how we leverage Git to keep information about our experiments.
If you look at the table, you can notice the IDs of the experiments. Believe it or not, those IDs are Git commits, and by these IDs you can find the exact information about your experiments: the exact source code, metrics, and data, or at least pointers to the datasets.
Let's do a simple experiment. There are two different commits: the last experiment and the previous one. We can run a simple git diff command between the ID of one of those experiments and the ID of the other.
You can see that it is a simple JSON file with the numbers, and Git can show you the difference between the JSON files. If you are not happy with a raw JSON diff of the numbers, which is probably not the best experience, you can use DVC functionality to get the difference between the numbers: dvc metrics diff works pretty much the same way as git diff does, but it shows you a data-driven diff based on the metrics.
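As a hedged illustration of what a data-driven diff over such metric files means (the numbers below are hypothetical, and the real dvc metrics diff command does considerably more), comparing two flat metric dictionaries might look like:

```python
import json

def metrics_diff(old: dict, new: dict) -> dict:
    # For each metric present in both runs, report the old value,
    # the new value, and the numeric change, rather than a raw
    # line-by-line text diff of the JSON files.
    return {
        name: {
            "old": old[name],
            "new": new[name],
            "diff": round(new[name] - old[name], 6),
        }
        for name in new
        if name in old
    }

# Hypothetical metrics from two experiment commits.
previous = {"avg_prec": 0.913, "roc_auc": 0.941}
current = {"avg_prec": 0.925, "roc_auc": 0.946}
print(json.dumps(metrics_diff(previous, current), indent=2))
```

The output reads like a change report per metric, which is usually what you want when comparing runs.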
In the last session, we will be talking about training in the cloud. In many ML projects, you need cloud compute.
Sometimes you just need a GPU instance to run your deep learning model; sometimes you need a little bit more memory, and that is a constant source of pain for teams. You need to learn how to get these resources from the cloud, you need to set them up, you need to clone your repositories, you need to sync the data, and so on, and you must not forget to shut down the instance when training is done, so you won't be wasting money while your instance runs over the weekend.
A new project, Codespaces, unifies this experience. With Codespaces, you can get your compute right from the web UI of GitHub. You will get your GPU instance with your IDE installed and your Git repository cloned there; a good portion of the problems you have will be solved by a single click. In addition to all the benefits of getting compute in such a simple way, we will talk about the unification of your infrastructure with Codespaces and VS Code.
A
There
is
a
way
how
to
unify
your
environment,
on
your
laptop
and
in
a
cloud
so
you'll
have
the
same
set
of
libraries,
the
same
the
same
versions
and
the
same
source
code.
It
simplifies
the
way
how
you
operate,
especially
for
the
teams.
We will also talk about environments: how to unify environments on your laptop and in the cloud, so you won't have the problem of "it worked on my machine, but for no one else." Codespaces has a special technology to simplify this process and improve this experience.
Let's jump to the GPU part, the cool part. This is an easy way to run your ML training in a cloud. For this GPU demo, I will be using a different project, a deep learning project. On the web page of the project we just need to get to Codespaces, but for a GPU we need to configure the page a little bit.
So we are choosing the GPU instance. When you push the button, it will create a container. It might take a minute or two, so instead of waiting, I will go back and launch a container that I already have. Let me click here, and we will get our server with a GPU in a couple of seconds.
So that is our source code in the cloud, and we can see the great developer experience with VS Code while it runs in the cloud. Do we have a GPU? Let's check it out: that is a Tesla V100.
Not bad for our project, so let's start training. The DVC plugin is set up; we can go to the experimentation table, and from the table, let's pick an experiment and run it with a slightly different set of hyperparameters. So we are choosing this one and that one. While this is running, let's check our source code and how the deep learning project is instrumented with metrics. As before, we use the DVCLive library in Python; you can find our callback function. The callback helps you simplify the way you output your metrics.
This is an example of reporting metrics: this is how you log your loss function.
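The actual callback in the demo comes from DVCLive's framework integration; as a framework-agnostic sketch of what such a callback does (the class name, method name, and file layout here are assumptions, not DVCLive's real API), per-step values can be appended to a plain TSV file that a plotting tool can render live:

```python
from pathlib import Path

class MetricLoggerCallback:
    # Minimal callback-style logger: appends one row per training step,
    # producing a time series that a plot view can refresh while the
    # (usually slow) deep learning training is still in progress.
    def __init__(self, path: str = "logs/loss.tsv"):
        self.path = Path(path)
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text("step\tloss\n")
        self.step = 0

    def on_step_end(self, loss: float) -> None:
        with self.path.open("a") as f:
            f.write(f"{self.step}\t{loss:.4f}\n")
        self.step += 1

# Hypothetical loss values from a few training steps.
logger = MetricLoggerCallback()
for loss in (0.91, 0.55, 0.32):
    logger.on_step_end(loss)
print(Path("logs/loss.tsv").read_text())
```

A framework integration simply wires a method like this into the training loop's step or epoch hooks, so you never call the logger by hand.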
All right, the training is in progress. Let's take a look at the table. This is a table similar to the one you have seen before; we just use a different set of metrics and a different set of hyperparameters. What is special about deep learning is that training is usually slow, so you can watch the progress of the training in our plots. If you choose the experiment that we are running right now, you can see the progress over time.
The step number is increasing, and in real time you see the values of your loss function and your accuracy. Let's look at the metrics that we use for this particular project: accuracy, and in addition to the metrics, we log misclassified images. This image was assigned to the wrong class: it is a croissant which the model recognizes as a cat. That is a way you can attach more visuals to your project, so you can make a better decision about your next steps in the modeling process.
A GPU that you have right in GitHub brings you a completely new experience: you have all the experimentation tools in one place. But there is one more important component that we should talk about: containers, the way you unify your environment. You are probably familiar with Docker, but in Codespaces there is another way of creating a Docker image, and I would say an even simpler way.
Take a look at this devcontainer.json file.
It has a reference to the base container that we need, which uses Python 3. We use a few features: we use CUDA and we use Iterative's DVC feature.
If you create the image with those features, you will get all this functionality in your Docker container, in your development environment in the cloud. You can specify more items here, such as extensions: we use the Python extension from Microsoft.
We also use the YAML extension from Red Hat to visualize our files, and the Iterative DVC extension for the experimentation table that we have seen before, all in one place.
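Putting those pieces together, a devcontainer.json along these lines would describe such an environment; the base image and the feature identifiers below are illustrative assumptions rather than the literal file from the demo, while the three extension IDs are the marketplace identifiers for the extensions just mentioned:

```json
{
  "image": "mcr.microsoft.com/devcontainers/python:3",
  "features": {
    "ghcr.io/devcontainers/features/nvidia-cuda:1": {},
    "ghcr.io/iterative/features/dvc:1": {}
  },
  "customizations": {
    "vscode": {
      "extensions": [
        "ms-python.python",
        "redhat.vscode-yaml",
        "iterative.dvc"
      ]
    }
  }
}
```

The file is declarative: you list a base image, the features layered on top, and the editor extensions, and the same description builds the same environment anywhere.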
When you ask for a GPU instance, you will get this whole environment, so it helps you unify the environment for your team members. But what makes this really cool is the local experience.
You can use the same container in your local VS Code. Let me show you how it works: if you open this project on your laptop in VS Code, it will automatically ask whether you want to use this container. If you answer yes, you get the same container, the same environment with the same set of libraries, plugins, and all these features, in your local environment.
So now we have a unified environment everywhere, in the cloud and on your laptop; all the code you create and all the experiments you run work everywhere. Now, it doesn't mean I got a GPU on my machine, so training might run for days, but you get the point.
This is the way you can unify your environment and efficiently collaborate with your teammates and with your DevOps team, who are responsible for deploying the models: they will know exactly what libraries and what images you use.
It helps maximize the value of the investments you have already made: investment in your Git ecosystem and tools, in your Agile process, and in your team collaboration. All together, we can be more efficient in creating AI projects.
Thank you for your attention. Please reach out to me to discuss these ideas.