.NET Foundation ML.NET, 31 Dec 2019

From YouTube: Create Your First Machine Learning Pipeline in ML.NET - 2020 Update

Description

An update to an older video that creates a linear regression machine learning pipeline, this time using the latest version of ML.NET (v1.4).

Original video - https://www.youtube.com/watch?v=8gVhJKszzzI

Code - https://github.com/jwood803/MLNetExamples/tree/master/MLNetExamples/SimpleRegressionUpdate

ML.NET Playlist - https://www.youtube.com/watch?v=8gVhJKszzzI&list=PLl_upHIj19Zy3o09oICOutbNfXj332czx

Contact:
Twitter: https://twitter.com/JWood/
Blog: https://jonwood.co/

Gear used (affiliate links):
Mic - https://amzn.to/2YEXtxI
Mouse - https://amzn.to/2ZtASoQ

A

Hey everyone. A while back I made my first YouTube video, about creating a first machine learning pipeline with ML.NET. That was about a year and a half ago, and there have been a lot of changes since. In fact, I think I made it back when ML.NET was at version 0.1, and now we're at around version 1.4, so there have been a lot of breaking changes. What I wanted to do is build the same machine learning pipeline with ML.NET, but using the latest version.

A

Alright, so we're in Visual Studio here with just a regular .NET Core console project, and the first thing I'm going to do is install the ML.NET NuGet package (Microsoft.ML). Right now the latest version is 1.4. So we've got that, and we're going to use the same data as before, the salary data, which has years of experience and the salary for those years of experience.

A

So we're going to create a machine learning model that takes the years of experience and tells us approximately what salary somebody should be getting. The first thing in the newer versions of ML.NET is that we need to create a new MLContext, and this allows us to do pretty much everything we need to do within ML.NET.
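Creating the context described above can be sketched like this, assuming the Microsoft.ML 1.4 NuGet package is installed. The catalog variables are only there to show where the operations used later in the video live; the `seed` argument is an optional extra not used in the video, which makes random steps such as the train/test split repeatable.

```csharp
using System;
using Microsoft.ML;

class Program
{
    static void Main()
    {
        // MLContext is the entry point for loading data, transforms,
        // trainers, evaluation and prediction in ML.NET.
        // The seed is optional; it makes random steps such as the split repeatable.
        var context = new MLContext(seed: 0);

        // Each area of functionality hangs off a catalog property.
        DataOperationsCatalog data = context.Data;           // loading and splitting data
        TransformsCatalog transforms = context.Transforms;   // Concatenate and other transforms
        RegressionCatalog regression = context.Regression;   // regression trainers and metrics
        ModelOperationsCatalog model = context.Model;        // prediction engines

        Console.WriteLine(data != null && transforms != null && regression != null && model != null);
    }
}
```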

A

So I'm going to do a few things in this video: I'm going to load the data, then use that data to build a model, and then evaluate the model based on a metric.

A

Then I'll make a prediction using the model. To load the data, I'm going to use the context's Data property, and on there are several methods we can use. The one I'll be using is LoadFromTextFile. This is actually a generic method, and it takes in what I like to call the schema input, which basically tells ML.NET what the schema of our data looks like. It's going to be a class, and I'll call it SalaryData.

A

Now I'll let Visual Studio create this in a new file. In here I'll add a couple of fields. The first one is going to be the years of experience, which I'll make a float and call YearsExperience, and the second one is going to be the salary, which I'll also make a float. Now, for ML.NET to read this correctly, we need to add a couple of attributes. The first is the LoadColumn attribute, which takes a parameter telling it which column we're going to read from. The years of experience is the first column, so that's going to be zero.

A

It's zero-based here. Then we do the same for the salary; that's the second column, so it's one.

A

Another attribute that we'll use pretty often in ML.NET is called ColumnName, and I'm going to put in "Label" here, because we're using salary as the label for our machine learning model. What this does is tell ML.NET that we want our salary field to be named Label when it's used in ML.NET pipelines. So we're telling ML.NET to call it Label, but we get to use it as Salary within our application, and that just makes more sense within our application.
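The schema class described above can be sketched as follows. The field names `YearsExperience` and `Salary` are assumptions that follow the narration; the video's own code may spell them differently.

```csharp
using Microsoft.ML.Data;

// Input schema for the salary CSV file.
public class SalaryData
{
    // Column 0 (LoadColumn indices are zero-based): years of experience.
    [LoadColumn(0)]
    public float YearsExperience;

    // Column 1: the salary. ColumnName("Label") makes ML.NET see this field
    // as "Label", the default label column name for its trainers, while the
    // application code keeps calling it Salary.
    [LoadColumn(1), ColumnName("Label")]
    public float Salary;
}
```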

A

Instead of saying label all the time, we can say salary, or predicted house price, or something like that. So we have our schema input. There are a couple of parameters for the load method. The first is the name of the file we're going to load. Next, there are a couple of optional parameters: one tells it whether the file has a header, and this one does; the other is the separator we want to use, which by default is a tab.

A

But we're going to say that it's a comma instead, since, as we can see, our input file uses commas. That's all we need to load the data. Now, to build a model, we first need to create our pipeline. We don't have much data here, so there isn't a lot of data cleaning to do, but these pipelines can get pretty big. Ours is going to be really simple. The first thing I'm going to do is use a transform in ML.NET.
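The loading step described above can be sketched like this. The helper class `SalaryDataLoader` and its `LoadCsv` method are hypothetical names, and the sketch assumes the CSV starts with a header row, as described for the salary file. Keeping the method generic means it does not need its own copy of the schema class.

```csharp
using Microsoft.ML;

static class SalaryDataLoader
{
    // Loads a comma-separated file whose first line is a header row.
    // TInput is the schema class (SalaryData in the video); its [LoadColumn]
    // attributes decide which CSV column fills which field.
    public static IDataView LoadCsv<TInput>(MLContext context, string path) =>
        context.Data.LoadFromTextFile<TInput>(
            path,
            separatorChar: ',',   // the default separator is a tab
            hasHeader: true);     // skip the header line
}
```

Usage, assuming a `SalaryData` class like the one described and a file named `SalaryData.csv` (hypothetical name): `IDataView data = SalaryDataLoader.LoadCsv<SalaryData>(context, "SalaryData.csv");`. Note that `LoadFromTextFile` is lazy, so a wrong path or separator typically only surfaces once the data is actually read, for example during `Fit`.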

A

A transform is anything that turns the input data into something else, and we're going to use the Concatenate transform, which takes a couple of parameters. The first is the name we want for the output column; I'm going to call the output Features. The input is going to be YearsExperience, and we can add as many column names as we want in here, and then append to the pipeline as many times as we want.

A

Since that's all we need to do there, I'm also going to append the machine learning algorithm I want to use. It's going to be regression, because we're predicting a numerical value, and then we pick a trainer, which is the same thing as an algorithm. I'll choose the LbfgsPoissonRegression trainer. Now, there are a few parameters we can use. First there's the label column name, but that defaults to Label. If you remember, we told our salary field to be named Label, so we don't need to specify it here.

A

There's also a feature column name, which defaults to Features, and that's the main reason I did this transform: so we can just use those default parameters. We could skip the Concatenate and pass YearsExperience in as the feature column name instead. So that's our pipeline. The next thing is to fit our data onto it to create our model, so we call pipeline.Fit on our training data.
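The pipeline described above can be sketched as below; `SalaryPipeline` and `Build` are hypothetical names. The trainer's column names are spelled out only to show the defaults the speaker relies on, and calling `LbfgsPoissonRegression()` with no arguments should be equivalent. One hedge on skipping Concatenate: as far as I know, ML.NET 1.x trainers expect the feature column to be a vector of floats and reject a plain scalar float column with a schema-mismatch error, which is another reason Concatenate is handy even for a single input column.

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers;

static class SalaryPipeline
{
    public static EstimatorChain<RegressionPredictionTransformer<PoissonRegressionModelParameters>> Build(
        MLContext context) =>
        // Concatenate copies YearsExperience into a float vector column named "Features".
        context.Transforms.Concatenate("Features", "YearsExperience")
            // The column names below are the trainer's defaults, written out explicitly.
            .Append(context.Regression.Trainers.LbfgsPoissonRegression(
                labelColumnName: "Label",
                featureColumnName: "Features"));
}
```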

A

This basically takes our data and runs it through the pipeline, and that creates our model. Once it has done the Concatenate on everything, it runs the result through the algorithm to learn from the data, and then it outputs a model for us.
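The Fit step can be sketched as below, with an extra not shown in the video: reading back the learned weight and bias. `SalaryTrainer` and `Train` are hypothetical names, and the pipeline type assumed is a Concatenate followed by LbfgsPoissonRegression. ML.NET describes Poisson regression as assuming that the log of the expected label is linear in the features, so a prediction is exp(weight * x + bias) rather than a straight line.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers;

static class SalaryTrainer
{
    public static ITransformer Train(
        EstimatorChain<RegressionPredictionTransformer<PoissonRegressionModelParameters>> pipeline,
        IDataView trainingData)
    {
        // Fit runs the training rows through Concatenate and then hands the
        // resulting "Features" and "Label" columns to the trainer.
        var model = pipeline.Fit(trainingData);

        // The last transformer in the chain wraps the learned linear parameters.
        PoissonRegressionModelParameters parameters = model.LastTransformer.Model;
        Console.WriteLine($"Weight: {parameters.Weights[0]}, Bias: {parameters.Bias}");

        return model;
    }
}
```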

A

So now we already have the model, and that was actually not a lot of code. We can evaluate it, which means we take our model and see how well it performs on new data. To create some predictions, I'm going to call Transform on our model with some new data. In fact, I forgot to do something, one thing you'll pretty much always want to do: split our data. ML.NET gives us a nifty method for that.

A

We can call context.Data.TrainTestSplit, which takes in all the data we loaded here. We can also give it a test fraction, which I like to set to 20%, so it takes 20% of our data and saves that off as the test set, and the remaining 80% is our training data.
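The split described above can be sketched as follows; `SalarySplitter` and `Split` are hypothetical names. Rows are assigned to the two sets randomly, so results can vary between runs unless a seed is set, either on the `MLContext` or through `TrainTestSplit`'s optional `seed` parameter.

```csharp
using Microsoft.ML;

static class SalarySplitter
{
    public static (IDataView Train, IDataView Test) Split(MLContext context, IDataView data)
    {
        // testFraction: 0.2 keeps about 20% of the rows aside as the test set;
        // the other 80% becomes the training set.
        DataOperationsCatalog.TrainTestData split =
            context.Data.TrainTestSplit(data, testFraction: 0.2);

        return (split.TrainSet, split.TestSet);
    }
}
```

Usage: `var (train, test) = SalarySplitter.Split(context, data);`, then fit the pipeline on `train` and keep `test` for evaluation.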

A

We always fit on our training set, and we always evaluate on our test set. So now we have predictions on our test set. Basically, it takes all the inputs from the test set, the years of experience, runs them through the model, and gives us outputs for them. With those we can build metrics, and ML.NET gives us nifty methods for that as well.
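Scoring the test set can be sketched like this; `TestSetScorer` and `Score` are hypothetical names, and printing actual vs. predicted values is an extra for inspection that the video does not do.

```csharp
using System;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

static class TestSetScorer
{
    public static IDataView Score(ITransformer model, IDataView testData)
    {
        // Transform runs each test row through the fitted chain, adding
        // "Features" and the predicted "Score" column next to the original "Label".
        IDataView predictions = model.Transform(testData);

        // Read both columns back out and print actual vs. predicted salary.
        var actual = predictions.GetColumn<float>("Label");
        var predicted = predictions.GetColumn<float>("Score");
        foreach (var (label, score) in actual.Zip(predicted, (l, s) => (l, s)))
        {
            Console.WriteLine($"Actual: {label}, Predicted: {score}");
        }

        return predictions;
    }
}
```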

A

Since we're doing regression, we call Regression.Evaluate and give it our predictions. Because this data already has the labels, it compares what the model predicted with the actual values. We can write the result out to the console, and I'm going to use R squared.
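The evaluation call described above can be sketched as follows; `SalaryEvaluator` is a hypothetical name, and printing RMSE alongside R squared is an extra not in the video.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.Data;

static class SalaryEvaluator
{
    public static RegressionMetrics Evaluate(MLContext context, IDataView predictions)
    {
        // Compares the predicted "Score" column with the actual "Label" column.
        RegressionMetrics metrics = context.Regression.Evaluate(predictions);

        Console.WriteLine($"R squared: {metrics.RSquared}");          // closer to 1 is better
        Console.WriteLine($"RMSE: {metrics.RootMeanSquaredError}");   // in salary units
        return metrics;
    }
}
```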

A

The thing to know about R squared is that the closer it is to one, the better the model performs.
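For reference, the standard definition of R squared, where $y_i$ is the actual salary of test row $i$, $\hat{y}_i$ the predicted salary, and $\bar{y}$ the mean actual salary over the $n$ test rows:

$$
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
$$

Perfect predictions give 1, always predicting the mean gives 0, and a model worse than that can even go negative on test data. An R squared of 0.94, for example, means the squared prediction error is about 6% of the total squared deviation of the salaries from their mean.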

A

Then we'll go ahead and add a Console.ReadLine so the window doesn't disappear when I run it. So that's how you evaluate the model. Next, we can predict with our own new data, so I'll create a new SalaryData item here and just give it a years of experience. Let's do 1.1.

A

Now I get an error here, because 1.1 is a double and we declared the field as a float, so we add an F to the number to force it to be a float.
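The literal issue described above, in isolation; this is plain C# with no ML.NET involved.

```csharp
using System;

class FloatLiteralDemo
{
    static void Main()
    {
        // float years = 1.1;   // does not compile (CS0664): 1.1 is a double literal
        float years = 1.1f;     // the f (or F) suffix makes it a float literal
        double asDouble = 1.1;  // an unsuffixed decimal literal is a double

        Console.WriteLine(years.GetType());     // System.Single
        Console.WriteLine(asDouble.GetType());  // System.Double
    }
}
```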

A

To create predictions, we need a prediction function. To do this, we use context.Model.CreatePredictionEngine, which is also generic and actually takes two type parameters.

A

The first is the input type, which we already have as SalaryData. The second one they call the destination, but I like to call it the output schema: the schema of what we get out of the model. I'll call it SalaryPrediction, since we haven't created it yet, and then we just pass in the model.

A

I'll let Visual Studio create this for me again. In here I'll just create a property. It's going to be a float: our salary was a float on the input, so I'll use the same type for the output, and I'll call it PredictedSalary.

A

Now we also need an attribute here. It's going to be ColumnName, with "Score". ML.NET puts the prediction in a column called Score by default, but we can map it to PredictedSalary in our application. There we go, we have our prediction function, and with that we can create a prediction by calling predictionFunction.Predict and passing in our new data. Then we can write it out to the console.
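The output schema and prediction function described above can be sketched as follows. `SalaryPredictor` and `PredictSalary` are hypothetical names; the method is generic in its input type so the sketch does not need its own copy of SalaryData.

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;

// Output schema: regression models write their prediction to a column
// named "Score", and ColumnName maps that column onto PredictedSalary.
public class SalaryPrediction
{
    [ColumnName("Score")]
    public float PredictedSalary { get; set; }
}

static class SalaryPredictor
{
    // TInput is the input schema class (SalaryData in the video).
    public static float PredictSalary<TInput>(MLContext context, ITransformer model, TInput input)
        where TInput : class
    {
        // CreatePredictionEngine<source, destination> wraps the model for one-row predictions.
        // A PredictionEngine is not thread-safe, so don't share one across threads.
        PredictionEngine<TInput, SalaryPrediction> engine =
            context.Model.CreatePredictionEngine<TInput, SalaryPrediction>(model);

        return engine.Predict(input).PredictedSalary;
    }
}
```

Usage, assuming the SalaryData class described earlier: `float salary = SalaryPredictor.PredictSalary(context, model, new SalaryData { YearsExperience = 1.1f });`. For many predictions, create the engine once and reuse it rather than calling this helper in a loop.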

A

That's prediction.PredictedSalary. All right, so let's run this and make sure it works. Okay, there's one thing I always forget when I add a file to Visual Studio like this: setting it to be copied to the output directory.

A

When it builds, there we go. Our R squared is 0.94 (94%), which is pretty good, and our prediction for 1.1 years of experience is around 42,000 dollars. Very good. So that was the updated version of the previous video, and hopefully it shows you how to do the same thing with the newest version of ML.NET. Thanks for watching, and we'll see you all next time.