.Net Foundation .NET Conf 2018, 13 Sep 2018

From YouTube: S201 - Machine Learning in .NET (ML.NET)

Description

Ankit and Gal will share some of the latest features in ML.NET, the machine learning library for .NET.

A

Welcome to day 2 of .NET Conf. I'm Ankit Asthana, and I'm joined here by Gal Oshri, and we're going to be talking about what's new with Machine Learning .NET, or ML.NET. As you know, .NET is a great tech stack for building a variety of applications. You can build web apps, cloud apps, gaming apps, and IoT apps, and what we're specifically going to be talking about in this session is how we're making .NET great with ML.NET.

A

So, let's get started. Before we get there, though, I know that a lot of you folks are perhaps just getting started with machine learning. So Gal, can you perhaps give us a little bit of an overview of what machine learning is? Absolutely.

B

The way we think about machine learning is that it helps you program the unprogrammable. If I wanted to write a function that takes an image as input and tells us whether there's a face in it or not, it's not clear how you would start writing that function. Or say I wanted to write a function that takes the description of a shirt and outputs the price of the shirt.

B

You might look for some keywords within the description, but it's not clear whether that would work very well or how you would scale it to more items. Now, even though we don't know how to write that function of inputs and outputs, we have many examples of these inputs and outputs as data sets. So we can take a large number of images and actually label them as having faces or no faces in them.

B

Machine learning lets us take this data set and create this function, a model, that can take an image as input and predict whether there's a face in it or not. Now, there are many tasks in machine learning. To give a few examples: one is classification, where you're trying to decide whether something is A or B. In the case we looked at, is there a face or no face in this image? Another example is regression, where you're trying to predict how much or how many.

B

So if you're trying to predict the temperature tomorrow or the price of a shirt, that's a regression problem. Another is clustering, where you're trying to look at what the groups in the data are, for example the different topics in a set of news articles.

A

And today, with ML.NET, we're going to be showing you how we solve three real-world ML problems. So let's give you a taste of that with the first problem that we have here, which is GitHub issue classification.

A

So let me switch here from my PowerPoint presentation to the screen. What you're seeing here is the .NET CoreFX repo, and this is an example of how the .NET team is already using ML by building a classification model. What I've provided here is the title of an issue and the description of an issue, and I'm going to go ahead and submit this issue. So this is a new issue.

A

That's been opened, and what we're going to do with ML.NET here is classify the label, the tag for the area this issue really belongs to, using the description and the title that are here. So in the background, I'm just going to cheat a little bit and run this app, and what that's going to do is use the GitHub APIs to go and tag

A

this CoreFX issue. You'll see here that as soon as I ran this app in the background, it went ahead and labeled this as area-System.Net. When you look at the title and the description of the issue, you'll take a look at it and say, you know what, this is actually really good. This actually works well.

A

So this is one example that we're going to get into in detail: how we built this issue classification using ML.NET. This is something that the .NET team is already using today as a part of the CoreFX repo for issue tagging and so on. The other two examples that we will also get into are these: how do we actually build a movie recommender with ML.NET?

A

This is one of the new features that we added as a part of ML.NET 0.3. And then Gal is going to get into how we can now do image classification with ML.NET using deep learning, with the TensorFlow addition as a part of the ML.NET 0.5 story.

A

Before we go further, given that some of you folks might be new to what ML.NET is (it's fairly new; we released it at Build), we want to give you a little bit of an overview of what ML.NET is and how it currently positions itself against the other Microsoft products that are also in the ML space.

A

If you're new to machine learning, one great way to get started is to take a look at the pre-built ML from Microsoft, Azure Cognitive Services, as a part of the Microsoft platform. It's a great way for you to be able to do pre-built machine learning. In this code snippet, what you're looking at is that I'm using the Text Analytics API, which is a part of the Cognitive Services set, and I am providing it a product review. In this case, the product review is for a vacuum cleaner.

A

It says that this is a great vacuum cleaner, and then this API returns a sentiment: whether this review is positive or not. As you can see, in this case this is very, very simple to do. You're using an API which is backed by a pre-trained ML model, and it's returning you a sentiment. For a lot of scenarios, this just works very well, and you should use it.

A

Having said that, let's say we change the product review from "this is a great vacuum cleaner" to "this vacuum cleaner sucks so much dirt." So now we've changed the product review, and in fact this is a great product review for a vacuum cleaner. But what happens with pre-trained ML is that the sentiment that this is going to return

A

is now going to be only 9 percent positive, so it's actually negative sentiment here. This highlights some of the problems that there are with pre-trained ML. Pre-trained ML works well for a good number of scenarios, but in a lot of other scenarios you do need to build custom ML models where you bring in your own data. You train with your data.

A

You have your own algorithms, and then you run and deploy these models in production. That's really what ML.NET is about: ML.NET is a framework for building your own custom ML models. Now, as this is a framework, it has various components as a part of it. The first component that it has is APIs. What we've done here is we've taken

A

a look at the APIs that CNTK and a couple of other offerings had, and what we've come back with is one set of developer-friendly APIs for model consumption and model training. Something else that the base ML.NET NuGet package comes with is a list of components, or transforms. When you're doing machine learning, one of the most important activities you'll spend your time on is data pre-processing, data transforms, feature engineering, and so on, and for these kinds of tasks you require

A

these transforms that come with ML.NET. So things like categorical transforms, text transforms, and feature selection are all built into the framework, so you can use them for data pre-processing and feature engineering.

A

Something else that this framework comes with out of the box is a set of classical machine learning algorithms, and you can use these algorithms for the various ML tasks that Gal just talked about, like regression and classification. So we have linear learners, we have tree learners, we have SVM, we have learners for clustering like k-means, and so on.

A

In addition to this, the base ML.NET package also comes with some other stuff. As you can imagine, machine learning data has certain characteristics, and for these characteristics ML.NET comes with heterogeneous data types and homogeneous data types, which can help you with data massaging and transforming, which you'll eventually use as a part of the model-building phase. ML.NET also comes with some other utility aids, for, let's say, loading from a CSV, from a TSV, and so on.

A

So all this is essentially what makes up the ML.NET framework today.

A

The last thing that I also want to touch on briefly before we go further is that ML.NET is extensible, which means that as popular first-party or third-party frameworks come into being, we will wire them up as a part of ML.NET. One example is TensorFlow, which we're going to talk about later. You can then use TensorFlow and other popular first-party and third-party machine learning frameworks through the same set of consistent, developer-friendly APIs that we've built and are providing you as a part of the ML.NET package.

A

So that should give you a little bit of an idea about what this framework is and what some of the components are that it comes with, and as we do the demos you'll see these different things in action. One important thing to also call out here is that Python and R are great environments for machine learning and data science, and if you're familiar with them, that's actually a great choice.

A

We provide great tooling and experiences for them in Visual Studio and our products. But if you love .NET and you love C#, and you want to stay in C# and .NET, ML.NET is a great way for you to get started with machine learning. In terms of products that you might have already used, like Cognitive Services, ML.NET complements this experience. For example, Cognitive Services, as I mentioned, is pre-trained machine learning for the most part, whereas ML.NET provides the ability to build custom machine learning models internally in .NET.

A

Azure ML Studio is another product that we have in this space, which provides you a very graphical way of dragging and dropping components to compose machine learning models, and ML.NET complements that by providing a very nice code-first experience, which will extend and complement the experience as you use these products together.

A

The other thing to point out here is that, while ML.NET is only about four to five months old at this point externally, ML.NET has been used heavily within the company for the last decade. ML.NET is used by a lot of the iconic Microsoft products like Bing. For example, Bing uses ML.NET for click

A

ad predictions. Excel uses ML.NET for chart recommendations, a feature that you might have used. PowerPoint uses ML.NET for the Design Ideas feature, another feature that you might have used. Windows Hello uses ML.NET, and Windows Defender uses ML.NET.

A

So what we're trying to get across here is that ML.NET has had its roots within the company for the last decade, and it has powered some of the most iconic products. Even though you might not be familiar with ML.NET from an external perspective, you have probably used a feature which was built with ML.NET.

A

So, in a nutshell, to summarize, if you're just looking at ML.NET for the first time: ML.NET is a machine learning framework for .NET developers that allows you to build your own models.

A

The point of view we've taken with designing and developing ML.NET is that we want this to be developer focused, which means that we're going to optimize it for certain ML scenarios like recommendation, sentiment analysis, classification scenarios, predictive maintenance, and regression forecasting, things that are common ML tasks that you run into every day. ML.NET is proven and extensible, which means that we've had this within the company for the last 10 years, and what we're doing externally now is redesigning the APIs together with the community.

A

But what that means is that the base of ML.NET is extremely strong. And the last thing, which is just how we roll in .NET these days, is that it's open source and cross-platform, and you can check us out at our machine learning repo, which is the dotnet machinelearning repo. Here you will see that we already have about 3,500 stars and a bunch of forks. There's a lot of activity on this repo.

A

In addition to this, we also have a samples repo that you can check out, and with the samples repo, we're growing

A

this repo right now, but currently you can see it has C# and F# samples, and you can explore them and check them out as you get started with learning machine learning. Getting started with machine learning with ML.NET is also easy. If you land on the .NET website, the easiest way to find our getting-started docs is by going to the ML.NET getting started page, which is this landing page right here. When I click Get Started, you can follow the instructions here, and this will

A

help you get started with ML.NET by creating the console app, installing the ML.NET NuGet package, and then taking a look at the iris data set. You'll walk through a problem here which should allow you to build a model for classifying iris flowers. So that's some of the stuff that we wanted to mention as a part of what ML.NET is.

A

Next, we want to get into some of the demos that we have, but before we get there, we also want to briefly touch on what we've shipped as a part of the ML.NET releases over the last four months. Every month, roughly on the first Tuesday, we ship a point version of ML.NET. We've already shipped four versions at this point (0.2, 0.3, 0.4, and 0.5), and as a part of these different versions of ML.NET that we've shipped, we've added a lot of capabilities.

A

The first thing that we've added is the ability to do additional ML scenarios. If you want to do clustering, we've added learners and transforms for that. We've added recommendation as a scenario; this is the preview of recommendation in ML.NET. And with ML.NET 0.5, we've also added support for TensorFlow, which will allow you to do some deep learning scenarios like image classification.

A

In addition to this, we've also added some other data science capabilities here. We've added cross-validation as an option for training and testing your data, a variety of new transforms and additional learners, and support for ONNX. We've done some work in ML.NET 0.4 to improve our support for F# by supporting F# records. And lastly, we've also authored the ML.NET samples repo, which has some end-to-end examples that you can use to get started with ML.NET.

A

So I hope that gives you a little bit of an overview of what ML.NET is, what the different components in the framework are, and what we've been shipping every month for the last four months. Next, let's get into how we can use ML.NET to, let's say, build the GitHub issue classification model, the example that I just demoed a few minutes ago. Gal perhaps can walk us through that. Absolutely.

B

So if we go to the next slide, just to quickly remind ourselves: this is a classification problem, because we have the issue coming in with a title and a description, and we're trying to assign it to one of these classes, one of these areas that the .NET engineering team has decided on.

B

So this is an example issue where we see the title and the description, as well as the labels at the bottom right. The label that we're going to try to predict is area-System.IO; it says where in the code this issue is relevant. The features, which are the input to our model and the information available to us when making the prediction, are the title and the description. This is what the engineering lead would look at to make that decision.

B

The label in this case is area-System.IO. So when we want to train our model, we're going to take the features and the labels from all these examples and feed them to our learning algorithm to output a model. This model can then be used at prediction time to take in the features when a new issue comes in and predict which label is most relevant for that issue.

B

Taking a step back and looking at the end-to-end machine learning workflow, we start by loading data, so we bring in the data set of all the issues in the CoreFX repository. We then extract features from it and convert everything to numbers that the learning algorithm can understand. So we need to turn all the text that we have into numeric vectors that carry the information the learning algorithm needs to make good predictions. That's the extract-features stage, or feature engineering.

B

We then train our model using this data, and the output of this, the model itself, needs to be evaluated. We need to understand: is this model actually good? Will it perform well on new issues that come in? If we're happy with the results of this model, we can then deploy it into our app and start consuming it, using it to make predictions as new issues come in.

B

So, let's take a quick look at the code and, let's switch over to my laptop.

B

One thing I want to note is that these are the existing APIs that are available in ML.NET 0.5, the LearningPipeline APIs. This is what we released with ML.NET at Build a few months ago, but we knew from the beginning that this API would not cover all of the scenarios that ML.NET needs to enable. So, while I want to review all the concepts that are shown through these APIs, later in the session we will actually give you a sneak peek into what the new APIs look like.

B

So what I have here is a console app. I have the Microsoft.ML NuGet package, version 0.5. I also have a few data sets downloaded from the CoreFX repository, so issues train has all the issues with their area, their title, and the description. The ID does not matter to us right now.

B

What I'll do is start running this app, and we'll get to the breakpoint right after we've loaded the data into our learning pipeline. This learning pipeline will include the text loader, which brings in the data, and some transformations to extract the features and turn them into numeric vectors that the learning algorithm can understand. We will then have the learner itself, and finally we can do the training step. But let's look at this step by step. Over here I can actually open the pipeline and look at the top 10 rows within it.

B

If I pick one of these rows, I can get a preview of how the data was loaded, and right now we're separating the columns with this vertical bar. So you see the ID, the area, the title, and the description of the issue.

B

If I go one step forward, after this Dictionarizer I've converted the area to a label column which is numeric. It's of type key, which will be meaningful to the learning algorithm once we get to that stage. You can see that we added this new column over here in the preview. As a next step, I'm going to featurize the text, again turning it into a numeric vector that the algorithm can understand. Previewing this again, we see the title has been changed in place into numbers.

B

One step forward, we've featurized the description as well, so we have all of these as numbers now. What we need to do as a final step is concatenate these two columns into one features vector, so the learning algorithm can just take the features and the label columns and use them to train the model. Taking one final look, we see the label here and the features column, which concatenated the title and the description. Now I can train the model.
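The three preparation steps described here (label to numeric key, text to numeric vector, concatenation into one features vector) can be sketched in a few lines. This is a plain Python illustration of the concepts only, not the ML.NET LearningPipeline API; the word-count featurization, the function names, and the two sample issues are all assumptions made for the example.

```python
from collections import Counter

def build_vocabulary(texts):
    """Map every distinct lowercase word to a column index."""
    words = sorted({word for text in texts for word in text.lower().split()})
    return {word: index for index, word in enumerate(words)}

def featurize(text, vocabulary):
    """Turn text into a fixed-length vector of word counts."""
    counts = Counter(text.lower().split())
    vector = [0.0] * len(vocabulary)
    for word, count in counts.items():
        if word in vocabulary:
            vector[vocabulary[word]] = float(count)
    return vector

def label_keys(labels):
    """Assign each distinct label a numeric key, starting from 0."""
    return {label: key for key, label in enumerate(sorted(set(labels)))}

# Hypothetical issues, invented for illustration only.
issues = [
    {"area": "area-System.IO", "title": "File copy fails",
     "description": "copy throws on long paths"},
    {"area": "area-System.Net", "title": "Socket timeout",
     "description": "connect throws on timeout"},
]

title_vocab = build_vocabulary(issue["title"] for issue in issues)
desc_vocab = build_vocabulary(issue["description"] for issue in issues)
keys = label_keys(issue["area"] for issue in issues)

rows = []
for issue in issues:
    # Concatenate the two per-column vectors into one features vector.
    features = (featurize(issue["title"], title_vocab)
                + featurize(issue["description"], desc_vocab))
    rows.append((keys[issue["area"]], features))
```

Each entry in `rows` is a (label key, features vector) pair, which is the shape a learning algorithm consumes in the training step.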

B

Let me actually start running this. Now I can train the model by giving it an input and an output class. The input class represents the columns in the data set that we want to read, and the output class tells us what we want to keep out of the model at the end.

B

This training will take a while, but what happens after the training is that we can save the model as a .zip file to load it into a different app, or we can evaluate it on a different data set. In machine learning, it's important to evaluate the model on data that it hasn't seen yet, to make sure that it performs well in a realistic scenario where data comes in later and you didn't have it available when you trained the model. So over here we can evaluate it and look at metrics like micro accuracy, and finally we can actually take in a new issue.

B

We've created one here with a title and a description, and we can bring it into the model to get a prediction. The program has actually finished running, and you can see over here the micro accuracy and the area.
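As a rough illustration of the metric mentioned here, micro accuracy is the fraction of all held-out examples that were predicted correctly, whereas macro accuracy averages the per-class accuracy. The sketch below is plain Python, not the ML.NET evaluator, and the labels are invented for the example.

```python
def micro_accuracy(actual, predicted):
    """Fraction of all examples whose predicted label matches the actual one."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)

def macro_accuracy(actual, predicted):
    """Average of the per-class accuracy, so every class counts equally."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    per_class = []
    for label in sorted(set(actual)):
        indices = [i for i, a in enumerate(actual) if a == label]
        hits = sum(1 for i in indices if predicted[i] == label)
        per_class.append(hits / len(indices))
    return sum(per_class) / len(per_class)

# Hypothetical held-out labels: the model always answers "area-System.IO".
actual = ["area-System.IO", "area-System.IO", "area-System.IO", "area-System.Net"]
predicted = ["area-System.IO"] * 4

micro = micro_accuracy(actual, predicted)
macro = macro_accuracy(actual, predicted)
```

The gap between the two numbers in this toy case shows why it helps to look at both: a model can score well overall while failing entirely on a rare class.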

B

One thing that's important is not to pay attention right now to these exact APIs, but to the concepts that we covered in terms of bringing in the data, transforming it, training the model, then evaluating it and using it to make new predictions.

B

So, let's switch back to the slides.

A

Cool, thanks Gal. So that was a quick intro into how you can go about building GitHub issue classification models with ML.NET. If you want to follow the example, that sample is actually checked in to the ML.NET samples repo, so you can clone that and follow the code there.

A

One other capability that we released as a part of ML.NET 0.3 is the preview experience for being able to do recommendations as a part of ML.NET. So let's take an example of how we would use the ML.NET capabilities to build a movie recommender. As you can imagine, when building a movie recommender you can take various approaches, so let's cover a few popular ones that come to mind. The first approach that you could take is population averages.

A

What we're doing in this approach is that we pick a particular movie and take a look at the rating for that particular movie. Let's say that in this case we have The Princess Bride, a great movie to watch, and we take a look at the rating for this movie, which is about 8.1 on IMDb. Then what we come up with is a threshold, and we say that if the rating of a movie is higher than a particular threshold, we will go ahead and recommend this movie.

A

So all we're basically doing is using the population averages, coming up with a threshold, and saying that if the rating of this movie is higher than the threshold, let's recommend it; if it's not, let's not recommend it. This is a very simple approach, and for a number of cases it might actually work very well. But in reality, as you can imagine, there might be a scenario where someone doesn't like romantic and fantasy movies and hence doesn't like this recommendation. So this approach might not work very well in those scenarios.
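The population-average approach is simple enough to sketch directly. The snippet below is a minimal Python illustration; the 8.1 rating for The Princess Bride comes from the talk, while the second movie, its rating, and the threshold of 7.5 are hypothetical values chosen for the example.

```python
def recommend_by_population_average(average_ratings, threshold):
    """Recommend every movie whose average rating is above the threshold."""
    return [title for title, rating in average_ratings.items()
            if rating > threshold]

average_ratings = {
    "The Princess Bride": 8.1,   # rating quoted in the talk
    "Hypothetical Movie": 5.4,   # invented for illustration
}

# The threshold is an assumption; the talk does not state a value.
recommended = recommend_by_population_average(average_ratings, threshold=7.5)
```

Note that the function never looks at who is asking, which is exactly the weakness described above: every user gets the same list.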

A

Another approach that you can take towards movie recommendation is content-based filtering. Content-based filtering looks at some user preferences, for example the movie genre, and then comes up with recommendations based on that. So, for example, if a particular person likes Iron Man and likes Captain America, there's a stronger probability that this person might also like The Avengers.

A

Now again, there's a flaw in this approach as well: it could happen that, while the genre-based recommendation is correct, the movie by itself might not be a great movie.

A

The third approach, which we're going to talk about in detail here, is collaborative filtering. Collaborative filtering as a recommendation approach is becoming extremely popular, and it's used behind a lot of the popular recommendation systems that you see. So let's learn a little bit more about this. What collaborative filtering essentially says is that if person A, let's say Gal, has the same opinion as person B, say Cesar, on a particular issue,

A

then Gal, person A again, is more likely to have a similar opinion to person B on a different issue than a randomly selected person would. To illustrate this point, let's take a look at this example. What you're seeing in this table are three users on the left (Ankit, Gal, and Cesar), and on the right you're seeing different movies that we've seen. The check marks here indicate movies that we liked.

A

In my case, for example, I liked Home Alone and Terminator 2, and the crosses here mean that I did not like Heat, Mission Impossible, or Casino Royale.

A

So this is an example of data that you can collect as a part of the movie recommendation system you're building. Now the question that comes up is: given this data set, what is the probability of Gal liking Casino Royale? This is something that collaborative filtering can really help us with. In ML.NET 0.3, we added support for collaborative filtering, and what collaborative filtering tells us in this case is two things.

A

The first thing it tells us is that Gal and Cesar have similar tastes in movies, and the second is that, given Cesar liked Casino Royale, there's a higher likelihood that Gal also might like Casino Royale.
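To make the intuition concrete, here is a small neighbor-based sketch in Python. It is deliberately not the factorization machines learner used in the demo; it only shows the idea of weighting other users' opinions by how often they agree with you. Ankit's row and Cesar liking Casino Royale follow the talk; the remaining entries for Gal and Cesar are assumptions, since the slide's full table is not in the transcript.

```python
def agreement(prefs_a, prefs_b):
    """Fraction of movies both users rated on which they agree."""
    shared = [movie for movie in prefs_a if movie in prefs_b]
    if not shared:
        return 0.0
    matches = sum(1 for movie in shared if prefs_a[movie] == prefs_b[movie])
    return matches / len(shared)

def like_score(target_user, movie, ratings):
    """Agreement-weighted average of other users' opinions of the movie."""
    weighted_sum = 0.0
    total_weight = 0.0
    for user, prefs in ratings.items():
        if user == target_user or movie not in prefs:
            continue
        weight = agreement(ratings[target_user], prefs)
        weighted_sum += weight * prefs[movie]
        total_weight += weight
    return weighted_sum / total_weight if total_weight else None

# 1 = liked, 0 = did not like. Gal's and Cesar's rows are partly assumed.
ratings = {
    "Ankit": {"Home Alone": 1, "Terminator 2": 1, "Heat": 0,
              "Mission Impossible": 0, "Casino Royale": 0},
    "Gal": {"Home Alone": 0, "Terminator 2": 1, "Heat": 1,
            "Mission Impossible": 1},
    "Cesar": {"Home Alone": 0, "Terminator 2": 1, "Heat": 1,
              "Mission Impossible": 1, "Casino Royale": 1},
}

gal_score = like_score("Gal", "Casino Royale", ratings)
```

With these assumed values, Cesar agrees with Gal on every shared movie and Ankit on only one in four, so Cesar's liking of Casino Royale dominates the score.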

A

So let's go ahead and start building this model in code. I just want to call out here that there are various approaches for doing collaborative filtering. What we released in ML.NET 0.3 is factorization machines, a binary classification learner that we've added, which we're going to use for this demo.

A

Now that we've established the approach for building a movie recommender, in this case collaborative filtering, the next thing for us to figure out is which data set to use. MovieLens, for example, is a very popular data set for movie ratings. The data set has close to 20 million ratings across 27,000 movies by about 130,000 users. If you want to go in and check this data set out, there's a link here. This data set comes with two files here,

A

two CSVs. The ratings CSV provides things like the user ID, the movie ID, the rating, and the timestamp at which the review was provided. The second file that comes with this is the movies CSV, which has the movie ID, the title of the movie, and the genre of the movie.

A

So, in our case, now that we know the data set, let's try to model this problem. What kind of ML task is this? We know we're doing recommendation, but there's a catch here: in ML.NET 0.3 we released factorization machines, which is a binary classification learner. What that means is that it can only tell between A versus B.

A

So what we're going to do with our data is take our ratings data set and do some massaging of the data. We're going to transform this data set so that if the rating of a movie is greater than 3, we put a 1 there, which means the user liked this movie, and if the rating of the movie is lower than 3,

A

where the range is 1 to 5, we're going to go ahead and mark the rating as 0, which means the user did not actually like this movie. Modeling this problem this way, as a binary classification problem, will help us succeed, so let's go ahead with that. Now that we know it's a binary classification problem, the next phase is feature engineering, which means that I want to pick the features.

A

These are the features which will help me predict the rating of the movie. We're going to use two features in this case, the user ID and the movie ID, and then we're going to go ahead and use the rating field as a label, which is what we're trying to predict. Once we have the features and the labels, we can go ahead and fit our model, and that's what this picture is showing you. So our features again are the user ID and the movie ID, and the label,

A

what we're trying to predict, is the rating. Then, once our model is ready, at prediction time we will input the user ID and movie ID into this model, and what it's going to return is the predicted label, which is the rating of the movie as 0 or 1, along with the score field. I hope that makes sense. So let's see this app in action. I'm going to just flip here for a second.

A

Let me run this app here and then we'll get into how we build this.

A

The app that we built here is an ASP.NET app; it's an MVC app. As you can see, this app has a set of profiles. I have three users here: I have Ankit, I have Gal, and I have Cesar. So let's click on Gal and see what we see here.

A

What Gal's profile shows us here is the recently watched movies that Gal has seen, like Heat, Terminator, and Mission Impossible. The thumbs up or the thumbs down tells us whether Gal liked these movies or didn't like these movies. The next line here shows you a list of movies which are popular,

A

you know, hits at the box office like Face/Off or Titanic or Casino Royale again. And look at this little button here called Recommend. When I click this button, what's going to happen is that this is actually going to call into the ML.NET machine learning model, the recommendation model that we've built, and come back with a 0 to 100 rating for each movie.

A

The rating here is represented by this percentage that you see here with this burning icon, so a higher percentage here means a higher likelihood of Gal liking this movie.

A

In this case, you'll see that Casino Royale and Gladiator, which are two action movies, have a much higher rating, close to eighty percent each, which means that there's a strong likelihood that Gal might actually like these movies over some of the other movies mentioned here. And if you take a look at the predictions based upon some of the data that we used here, which is Gal's profile and Gal's recently watched movies like Mission Impossible and Terminator, which he upvoted, it will sort of make sense.

A

That makes sense in terms of why these two movies, which are also action movies and popular movies, were strongly recommended for Gal. Something else I also want to show you as a part of this app is how collaborative filtering is helping with some of Gal's predictions here. So let's take a look at Cesar's profile here. If you remember, in the slide we showed you that Cesar and Gal have very similar tastes in movies.

A

So what you're seeing here is that Cesar has also seen some of the same movies that Gal has seen, and he's also upvoted or liked Heat and Mission Impossible and, let's say, not liked Home Alone, very similar to the feedback Gal provided when he saw these movies. So let's go ahead and do the same thing for Cesar and click

A

the Recommend button here, which is going to bring in the 0-to-100 predictions. You're going to see here that Casino Royale again is predicted, or recommended, really high, at close to 85%, and Gladiator is recommended at about 80%. Now, if you look closely, we cheated a little bit here, because in the case of Cesar, Cesar has already seen Casino Royale and he upvoted it, and we've used this data for training.

A

But the important thing to note here is that, since Cesar is very close to Gal in movie tastes and they roughly have the same cohort of choices, the data that we provided from Cesar's profile, with him watching and upvoting Casino Royale, is what's really helping Gal's prediction be so high for Casino Royale. And that's really an example of collaborative filtering in action with ML.NET. To wrap this up,

A

you can also see my profile here, which is almost the polar opposite of Cesar's and Gal's. In my case, I actually like Home Alone and pretty much nothing else, and you're going to see that when I click Recommend, it comes back and says that my ratings, or my preferences, for some of these movies are much lower. So you'll see that even though these are still blockbusters and very popular movies that almost everybody liked, the ratings did drop a lot, by about 10 points here,

A

essentially. So let's quickly go over the code and show you how we built this. In this solution we have two projects. The first project is where we've actually built the model. Just to cover this in a little bit of depth: you're using the Microsoft.ML namespace and you're using the ML.NET NuGet package. These are the things you want to start with if you're getting started with ML.NET. Once you've done that,

A

the first thing we do, if you remember, is convert our data set, which currently provides user ID, movie ID, and ratings from 1 to 5, into a binary data set, which is 0 or 1 for dislikes and likes. So we call this wrangle-data, or massage-data, piece of code here, and what it does is go through and read the ratings CSV file, convert ratings greater than 3 into a 1, and convert ratings less than 3 into a 0.
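
[Editor's sketch] The rating conversion described here can be illustrated in a few lines of Python. This is not the demo's C# code; the function name, the column names, and the sample rows are made up for illustration. The talk does not say what happens to a rating of exactly 3, so this sketch assumes those rows are dropped.

```python
import csv
import io


def binarize_ratings(rows, threshold=3):
    """Map 1-5 star ratings to 1 (like) or 0 (dislike).

    Ratings equal to the threshold are skipped. The talk does not say how
    they are handled, so dropping them is an assumption of this sketch.
    """
    out = []
    for user_id, movie_id, rating in rows:
        rating = float(rating)
        if rating > threshold:
            out.append((user_id, movie_id, 1))
        elif rating < threshold:
            out.append((user_id, movie_id, 0))
    return out


# Tiny in-memory stand-in for the ratings CSV (made-up values).
sample_csv = "userId,movieId,rating\n1,10,5\n1,11,2\n2,10,3\n2,12,4\n"
reader = csv.reader(io.StringIO(sample_csv))
next(reader)  # skip the header row
binary = binarize_ratings(reader)
print(binary)
```

The result keeps the user and movie identifiers as strings, which matches the way the talk later treats them as categorical values rather than numbers.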

A

Once we've massaged the data, we're pretty much ready to go. We can start, again, by creating a learning pipeline, which is this first line of code. We can then add our data to this learning pipeline using the TextLoader API. As we add this data, we need to provide the input class. So that's this rating data class, which takes in the user ID and movie ID as strings, and then the field that we're trying to predict as the label.

A

Once we've created our pipeline and added our data using this line here, the next thing is to do a couple of transforms. If you remember, earlier, during the start, I mentioned that ML.NET is a framework and it comes with a lot of transforms. These are the transforms you need to massage your data into a feature vector, which is eventually what your learner is going to use.

A

So in this case, because this is categorical data, we have to perform this categorical hash one-hot vectorizer transform, and what this does is take this categorical data and convert it into a numeric feature vector.

A

Once you've converted these into feature vectors, this next line of code basically takes the two feature vectors, one for user ID and the second for movie ID, and combines them together to create one feature vector, which is what we're going to pass into the pipeline's Train API later.
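
[Editor's sketch] To make the two transforms concrete, here is a small Python illustration of hashing a categorical value into a one-hot vector and then concatenating two such vectors. It only shows the idea: the function names are hypothetical, CRC32 is used as a stand-in hash, and ML.NET's own hash function and vector sizes may differ.

```python
import zlib


def hash_one_hot(value, num_bits=4):
    """Hash a categorical string into one slot of a 2**num_bits vector."""
    size = 1 << num_bits
    # CRC32 is a stand-in; it is deterministic across runs, unlike hash().
    slot = zlib.crc32(value.encode("utf-8")) % size
    vector = [0.0] * size
    vector[slot] = 1.0
    return vector


def concat(*vectors):
    """Join several feature vectors into one longer feature vector."""
    combined = []
    for vector in vectors:
        combined.extend(vector)
    return combined


user_features = hash_one_hot("user:42")
movie_features = hash_one_hot("movie:7")
features = concat(user_features, movie_features)
print(len(features), sum(features))
```

Each categorical value lights up exactly one slot, so the combined vector has two non-zero entries, one identifying the user and one identifying the movie.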

A

The last thing to do before we can call pipeline.Train is to add the field-aware factorization machine binary classifier learner. This is the learner that allows us to do collaborative filtering, and we've had it since ML.NET 0.3. This data set has about twenty million records, so you can imagine training takes some time, and I'm not going to run it here. But you can follow this code; we're going to check in the sample sometime today. And once the model is trained,

A

we can then write it to disk and start using it in an app. So I hope that gives you a little bit of clarity in terms of how we built this model. The app that we're adding this model to is a very classic ASP.NET MVC app. So maybe we take a look at this movies controller here very quickly and bring up the Recommend function. One second. So here's my Recommend method, and let me just start this in debugger mode very quickly.

A

So let's go ahead and choose Gal again and click the Recommend button here, which is going to break into the Visual Studio debugger. Oops, it didn't do that. Oh, I think we didn't start debugging. So let me try that again.

A

It'll take a minute to do that. So here we go. We bring up Gal's profile and hit the Recommend button again, and what that's going to do is break into the debugger, into the Recommend method. As a part of the Recommend method, the first thing we do is pass the profile ID. So this is essentially the profile ID for Gal, which is being passed here.

A

The next thing we do is actually load the machine learning model that we built as a part of the other project, and this also allows us to talk about one of the other value props that ML.NET has. With ML.NET, you can write the model to disk, and then you can use that model app-local in your app,

A

if you like, as we're showing you here, or you can hide this model behind a Web API, publish it to Azure or any other cloud environment, and use it that way. ML.NET gives you the flexibility to do that. Loading a model is really easy: you use this PredictionModel type and its ReadAsync API. The ReadAsync API takes the input class and the output class, just like Gal was talking about. So let's take a look at those very briefly.

A

So the rating data class is exactly the same class you had before: user ID, movie ID, and the label. And the rating prediction is the output class, which stores the prediction as we make these calls to the model's Predict API.

A

So you can see that the rating prediction here has two fields. It has a Boolean field for the predicted label, which tells us, true or false, whether the model thinks the recommendation is good or bad. And then we also have another column here called score, which gives us a numeric value for this recommendation: the higher the score, the better the recommendation.

A

Once we've loaded the model via this PredictionModel ReadAsync API, we can go ahead to the next piece of code that I want to show very quickly, which is: I want to go through all these trending movies that you're seeing here, Face/Off, Titanic, Home Alone, and one by one get the rating, or the recommendation prediction, for them.

A

So I have this foreach loop here, and I'm just going to show you one prediction very quickly. This model.Predict is what takes in the features that we added, which are the movie ID and the user ID, and what it returns is a prediction. You can see in the debugger that the prediction returns two fields: the predicted label field and a score.

A

You can think of the score as a numeric value: the higher the score, the better it is. We use a function here called sigmoid, and what that's really doing is normalizing the score to a percentage value between 0 and 100, so we can present that to our user. So, in this case, when I run through that, you're going to see that the normalized score for this particular movie, which in this case is, I believe, Face/Off, is about 53%.
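
[Editor's sketch] The sigmoid normalization mentioned here is the standard logistic function, 1 / (1 + e^(-x)), scaled to a percentage. A minimal Python version follows; the function names are hypothetical, and the raw score of 0.12 is a made-up value chosen only because it maps to roughly the 53% quoted in the talk.

```python
import math


def sigmoid(x):
    """Logistic function, written to avoid overflow for large negative x."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def score_to_percent(raw_score):
    """Squash an unbounded model score into a 0-100 percentage."""
    return 100.0 * sigmoid(raw_score)


# 0.12 is a hypothetical raw score, not a value taken from the demo.
print(round(score_to_percent(0.12)))
```

A raw score of 0 maps to exactly 50%, positive scores map above 50%, and negative scores map below it, which is why a higher raw score reads as a stronger recommendation.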

A

So what this is telling us is that there's about a 53% likelihood that Gal will like Face/Off. In this foreach loop we traverse all the trending movies, and once we've done that, we'll have created our ratings array, which we then pass to the view for display as a part of this experience right here. So I hope that gives you an idea of how you could use ML.NET from the perspective of loading a model in an ASP.NET app and then using it.

A

We will check the recommendation sample into the machine learning samples repo soon as well, so you can actually follow along.

A

So that brings us to our next demo. I'm just going to move back to the slides here very quickly.

A

Give me one second here, and we're going to talk about the next thing we had in mind. So, just to summarize: for recommendation systems, we added field-aware factorization machines in ML.NET 0.3. It's a very popular learner used in click prediction and recommendation system competitions, and it performs really well. Here's the piece of code that we use for creating this learner,

A

if you want to follow along. A couple of other things: the model accuracy is about 72% for this model, and in order to get better results, you want to use more features. For example, we didn't actually take movie genre as a feature in our model; you can use that. And as ML.NET matures and moves towards ML.NET 1.0, we'll add more recommendation learners and so on, which will help improve the model accuracy even further.

A

The other thing that I want to very briefly touch on, and I'll come back with the demo for this later, is that we want to make sure that ML.NET is a great environment for F#. So if you want to do machine learning with F#, ML.NET is there for you, and as a part of 0.3 [inaudible]

A

[inaudible]

A

[inaudible] Absolutely.

B

Yep. So if we go to the next slide: I mentioned earlier that we released ML.NET with a LearningPipeline API at Build. This is a really good API for getting started with machine learning, and it gives you one concept, the learning pipeline, where you add your text loader, your transforms, and your learner, and then you can train that learning pipeline to get a model. However, this API has some very obvious limitations. One example is that you always have to end the pipeline with a learner; you have to have a learner in the pipeline.

B

So if you want to just do your data processing, save that data as a separate file, and come back to it later, you wouldn't be able to do that. Or if you wanted to add multiple files to use for training in the pipeline, that's also not very easy right now.

B

Another area that's really important is feature importance and model understanding. So if you want to look inside your model and

B

see what the weights or coefficients are, to understand why it's making the predictions that it's making, it's not really possible to do that right now with the LearningPipeline API. We also had a lot of different suggestions from the community in terms of how to make the API better and how to make it easier to use, and we knew from the beginning that we would want to make some updates before we reach ML.NET 1.0.

B

So this new API for ML.NET will actually be released in the next few weeks as an early version within ML.NET, and it will enable all of these different scenarios that I just mentioned, and more. One other example that came up is taking a model that you've already trained and using it as a starting point to train a different model, so you don't have to bring in all of your data set again and start from scratch. This is a really common scenario in many cases, like click prediction.

B

Now, we also want to use terminology parallel to existing ML frameworks like scikit-learn, so estimators and transformers, and you'll see that over here. We won't go into it in detail today, but it's something that you can learn about a bit later. And finally, I think one of the most exciting things about this new API is that it takes advantage of strong types in .NET, and you'll see how it can help guide me through which columns are available to me at each step when I'm creating my model.

B

One thing to mention is that all of this information is covered in a blog post that we posted yesterday, which introduced ML.NET 0.5 and also discusses the upcoming API changes. You can also learn a lot more in the GitHub repo, where all of this is being discussed in various issues.

B

So if we switch over to my laptop, we're going to look at another console app, again with the GitHub scenario. So if we switch over to the other laptop... awesome. You'll see over here that we start with the Microsoft.ML NuGet package, version 0.6. This is a preview of the next release, and it's already publicly available, so you can actually try it out and use it today.

B

Now, we have the same data set again, the issues train file, which has the GitHub issue label, the title, and the description. We'll only pay attention to some of the areas of this code for now. The first thing that I want to mention is this reader that we're creating. Before, we always created that input class where we defined which columns we want to use.

B

Now we actually bring in this text loader, and we say that we have the area in column index 1, the title in column index 2, and the description in column index 3. We're ignoring the ID, because we don't need it. Now, what I'm going to attempt to do is create the estimator.
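
[Editor's sketch] Picking named columns out of a delimited file by index, as the reader does here, looks roughly like this in Python. The tab delimiter, the header row, and the sample issue are assumptions made for illustration; only the column indices 1, 2, and 3 come from the talk.

```python
import csv
import io

# Column name -> zero-based index in the file; index 0 (the ID) is ignored.
COLUMNS = {"Area": 1, "Title": 2, "Description": 3}


def read_issues(text, has_header=True):
    """Read delimited text and keep only the columns named in COLUMNS."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header row if there is one
    return [{name: row[index] for name, index in COLUMNS.items()}
            for row in reader]


# Made-up sample row standing in for the issues file.
sample = (
    "ID\tArea\tTitle\tDescription\n"
    "1\tarea-System.Net\tSocket hangs\tConnect never returns\n"
)
issues = read_issues(sample)
print(issues[0]["Area"])
```

Declaring the columns once, up front, is what lets later steps refer to them by name instead of by position.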

B

This defines the pipeline and the steps that we're going to take to transform this data and turn it into a model, and what you can see later in this app is that we actually read in the data and provide it to the estimator through the Fit function to get the model.
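
[Editor's sketch] The estimator idea, where an object describes a step, learns from data through a fit call, and hands back a trained transformer, can be shown with a toy Python example. The class names are hypothetical and the min-max scaler is only a stand-in for a real pipeline step; this is not the ML.NET API.

```python
class MinMaxEstimator:
    """Hypothetical estimator: learns the min and max of a numeric column."""

    def fit(self, values):
        # Fitting inspects the training data and returns a trained object.
        return MinMaxTransformer(min(values), max(values))


class MinMaxTransformer:
    """The trained result: applies the learned range to any new data."""

    def __init__(self, low, high):
        self.low = low
        self.high = high

    def transform(self, values):
        span = self.high - self.low
        if span == 0:
            return [0.0 for _ in values]
        return [(v - self.low) / span for v in values]


train = [2.0, 4.0, 6.0]
model = MinMaxEstimator().fit(train)
print(model.transform([2.0, 5.0, 6.0]))
```

The estimator itself holds no learned state; everything learned from the training data lives in the object that fit returns, which is what gets reused at prediction time.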

B

So what I'm going to do is create an estimator, and I'm going to start with the reader. I can actually see that I have MakeNewEstimator available to me on this reader, so I'm going to try that. Then what I can do is append as many steps as I want to take the input and transform it to get the predictions from the model. So what you can see

B

is that, because this is strongly typed, I have the area, the title, and the description as the columns that are available to me. So what I'm going to do, first of all, is the same featurization that we did before, where we have to turn everything into numbers. If I just took the area and tried to do something like predict, I see that I have nothing available, because it's the wrong type; it's a string.

B

So what I can do is turn it into a key, and then I see that I have predict SDCA classification available, which is the linear learner.

B

Now, I don't want to do that yet, because I want to actually do some additional featurization first. So what I can do is create this label, and then again for the title and description I'm just going to do featurize text for both. You can see that IntelliSense is actually helping me here to understand which columns are available to me at each step and what the appropriate transformations and components are that I can use on each of them.

B

So at this point everything is numeric, but what I want to do, again, is concatenate the title and description into one features column. So I can take the title, and I see that I have concat available here. One thing that's important, because we're using a linear learner, is that we want to normalize the features. I'm typing that here now, but you can imagine that in the future

B

we'll have some helpers that can identify that you're using a linear learner and that you should use this normalize component. Now, I'm also going to keep the label and propagate it down to the next step.

B

Now I'm ready to start training. So what I can do is, first of all, keep the label, and then I'm going to create a score column that takes the label and predicts with SDCA classification. I'm going to provide it the features, and I'm going to provide a loss function that I defined above, but in the future this will be a default.

B

Now I want to change the data a bit so that it can be used for evaluation, and so that, instead of just taking the key that is the output from this model, I'll be able to look at it converted back to the string that we can attach to the GitHub issue. So, first of all, I'm going to propagate the label, I'm going to propagate the score, but I also want to create a predicted label by going to the score, which has this predicted label field.

B

Now, this is a key, so it's going to be the integer that represents the class of the issue from GitHub. What we actually want is the string, so I can go back here and do to-value and see that it will give me back the string that I want: it converts the key column to a column containing the corresponding value.
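
[Editor's sketch] The round trip from string label to integer key and back can be pictured with a small lookup table. The class below is a hypothetical Python illustration, and its zero-based, alphabetically ordered key numbering is an assumption of the sketch, not a statement about how ML.NET assigns keys.

```python
class KeyMapper:
    """Hypothetical two-way mapping between class labels and integer keys."""

    def __init__(self, labels):
        # Sort so the numbering is stable no matter the input order.
        self.values = sorted(set(labels))
        self.keys = {value: key for key, value in enumerate(self.values)}

    def to_key(self, value):
        """String label -> integer key (what the learner trains on)."""
        return self.keys[value]

    def to_value(self, key):
        """Integer key -> original string label (what we show the user)."""
        return self.values[key]


# Made-up area labels for illustration.
mapper = KeyMapper(["area-System.Net", "area-Infrastructure", "area-System.Net"])
key = mapper.to_key("area-System.Net")
print(key, mapper.to_value(key))
```

The learner only ever sees the integer, so the mapping has to be kept alongside the model in order to turn a predicted key back into a readable label.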

B

So now I'm done. This is the estimator that I'm going to use, and when I use the Fit method and give it the data, I get the model back. Maybe I'll start training this while we go through the rest of the code.

B

So the first thing that we might want to do is evaluate the model on some test data. This is a separate data set that we have available, and what we're doing here is just calculating the metrics on this data set.

B

So here we're going to print the micro-accuracy, which is how many of the issues in this test data set we predicted correctly. Over here we have MakePredictionFunction, which again takes an input and an output class that have the input fields that are available and the predicted label, which is the thing that we care about seeing at the end. We can then do predictor.Predict on a new GitHub issue and output the predicted label.
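
[Editor's sketch] Micro-accuracy, as described here, is the fraction of test examples whose predicted label matches the true label, counted over all examples regardless of class. A minimal Python version with made-up labels:

```python
def micro_accuracy(actual, predicted):
    """Fraction of examples predicted correctly, pooled over all classes."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one example")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


# Made-up labels: three of the four predictions match.
actual = ["area-A", "area-B", "area-A", "area-C"]
predicted = ["area-A", "area-B", "area-C", "area-C"]
print(micro_accuracy(actual, predicted))
```

Because every example counts equally, frequent classes dominate this number; a class with few test issues has little effect on it.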

B

So this will actually take a few minutes. I think what we want to do is switch back to the slides and go over one more scenario that we have available. Ankit mentioned deep learning earlier, and in ML.NET 0.5 we added a transform that does TensorFlow model scoring. So if you go to the next slide: one thing to mention really quickly about deep learning, and we don't have enough time to really explain it in detail, is that it's an area of machine learning

B

that's really revolutionizing areas like computer vision and speech recognition. The methods in deep learning, like neural networks, have been around for many years, but it's only in the last decade or so that they've been able to take advantage of the huge amounts of data and compute that are now available. So we can train image classifiers that detect whether there's a dog or a face or other objects within an image with really high accuracy.

B

So we added TensorFlow model scoring to ML.NET. Now, if you're not familiar with TensorFlow, it is one of the most popular frameworks for doing deep learning.

B

Now, when I say we added model scoring, what I mean is that if you have a TensorFlow model like Inception, you can bring that into your ML.NET pipeline, give it your input, and get a score out of it, get a prediction out of it. With the old APIs, you would always have to add a learner on top of that transform; with the new APIs, you can look at that score directly and use it.

B

So that's why, even though this transform is available in 0.5, we're going to show this demo to you with 0.6 and the new APIs. So if we switch back to the other laptop,

B

what we see here is again a console app with the ML.NET NuGet package, version 0.6, and what we're doing is using this TensorFlow model that was downloaded from this location.

B

The APIs are very similar in terms of the reader and the estimator, but now we have this image path as the input. So we have a couple of image files in this folder, and what we can do is load each one as an image, resize it, and extract the pixels from it, with a few parameters that match how the TensorFlow model itself was trained. Then we can apply the TensorFlow graph and give it the model location. The model in this case is a frozen model, in TensorFlow terminology.

B

Now, when we downloaded this model, we also got the Inception labels file. This tells us, for each class that the model might predict, what the corresponding item is that it has found: maybe it's a golden retriever, another type of dog, a car, or something else. So after I have my estimator, I fit it in the same way that I did before, I create the prediction function, and I give it the file name of an image.
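
[Editor's sketch] Turning the model's per-class scores into a readable answer is a lookup into the labels list at the position of the highest score. The Python below illustrates that step only; the label names and scores are made up, and the real labels file for Inception has far more entries.

```python
def top_label(scores, labels):
    """Return the label with the highest score, along with that score."""
    if len(scores) != len(labels):
        raise ValueError("need exactly one label per score")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best], scores[best]


# Made-up labels and scores standing in for the labels file and model output.
labels = ["golden retriever", "sports car", "teapot"]
scores = [0.91, 0.05, 0.04]
print(top_label(scores, labels))
```

The position of each score has to line up with the line order of the labels file, which is why the labels file is distributed together with the model.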

B

I can really quickly run this, and in this case it will detect the type of dog here. Now, it's not always going to be accurate, but Inception is actually a really good model, and it works for a large number of the images that we have tried over here.

B

So this is really a sneak peek into how you could use the TensorFlow model scoring API, or transform, with the new ML.NET APIs. So I think if we switch back to the slides...

A

Thanks, Gal, for demoing TensorFlow and the new API. One thing I just want to call out here is that this is the start of us adding deep learning to ML.NET. So some of the code there is going to look quite complex, and we're going to work on making it easier for you to score existing TensorFlow models, pre-trained TensorFlow models, or deep learning models in ML.NET.

A

It's going to take a little bit of time, but this is just a start for those folks who already know what TensorFlow is and how to use it, and who want to use TensorFlow as a part of ML.NET. So I just want to make that clear.

B

The new API, there are questions on that as well. Yeah.

A

So the new API is going to be available as a part of 0.6, the package that Gal just showed you, which ships sometime next month. One thing I just want to mention here as well

A

is that, in a lot of the code that we've shown you here, we're currently composing the model training code by ourselves, whether in the new API or the old API. One of the things we're also working on in parallel is a UI that will auto-generate the model training code and the model consumption code for you, which will really help you get started with ML.NET if this looks a bit complex or looks like too much to get started with.

A

We want to take a couple of questions here, but first we just want to go over this one last slide, which is about what's next with ML.NET. We're working on improving the API, as Gal just showed you, and we really want your feedback there. We're trying to bridge two worlds: we're trying to ease the machine learning learning curve, and we're also trying to give you a very powerful machine learning API that gives you all the flexibility to support and carry out the ML tasks

A

you want to do. We'll go ahead and add deep learning with TensorFlow, we're going to provide a UI in some time, which will allow you to simplify some of these tasks, and we'll go and innovate in both the languages, C# and F#, for machine learning, and also improve the tooling that we have in VS. Just one last thing I want to show you before we take questions: I skipped over one of the demos that we had here, which was the F# one.

A

So let me just go over that very quickly and point out one thing here.

A

It doesn't seem to come up. Let me try another thing very quickly. Oops.

A

I can't seem to find it.

B

Let me start taking a few questions. There's one question from Roger: can TensorFlow be fully used via ML.NET, and does that mean I do not have to use Python? So, we mentioned that the TensorFlow capabilities that have been added to ML.NET so far are for model scoring. If you have a trained TensorFlow model that you found online, or that you trained through TensorFlow in Python, you can bring that in and use it now. We want to expand these capabilities over time, but that's not available right now.

B

So if you're training your own model with your own custom data, you would still want to use TensorFlow directly, right.

B

Another question: does Azure Cognitive Services use ML.NET? So, not for all models, or not for many models. I think there are some where we're discussing things with the Cognitive Services team, but I don't think most of them are based on ML.NET right now.

A

So I think, yeah, I will have to go and check in the sample. There it is: the F# sample is available in our samples repo. The one thing I want to call out is that, if you're using records, you do have to decorate them with the CLIMutable attribute, which will make them mutable for the ML.NET API. So just keep that in mind. But I think this is what we had to show you today.

A

You can reach out to us on our GitHub machine learning repo's issues and ask us all the questions you're asking us here, some of the great questions Gal already covered. You can also reach us on Twitter and via email. But thanks so much for having us here. We'll have Cesar next, who's going to give you a little bit more background about what machine learning and AI are, and how you can use the different kinds of machine learning components that Microsoft provides along with ML.NET.

A

So thank you for watching us today, and hopefully you enjoyed our session.

B

Thank you very much for your time.