Description
ML Microservices with gRPC and Python by Andrej Baranovskij, Founder/Senior Developer at Katana ML
Andrej Baranovskij, Founder/Senior Developer at Katana ML, will present "ML Microservices with gRPC and Python". ML logic flow can be complicated, and it is advisable to split the implementation into different services: for example, a microservice for data preparation, another for model training, and so on. When the logic flow is split into different services, we need to transfer large amounts of data between them. Andrej will show how this can be achieved with gRPC.
Okay, so thanks for joining, everyone. For me right now it's quite late, actually half past midnight (I'm based in Lithuania, Europe), but I'm not sleepy, because I'm a developer and I never sleep, I just go to standby sometimes.
So it's fine for me, and thanks for the invite. I'm happy to talk about our experience with gRPC and how we use it. A very quick introduction to what we do: we are a machine learning startup, and we implement various solutions that help automate the enterprise.
One of the things we are working on right now, for example, is automatic data extraction from documents and automatically posting invoices, things like that. My background is software development, so I jumped into machine learning from software development, and what I saw from the start is that when you begin with machine learning, you usually start with notebooks and CSV files: you load a CSV file into the notebook, process the data, train a model, and then run inference.
If you have this spaghetti code where everything is in a single place, it becomes very hard to manage, especially because in machine learning you have to put quite a lot of logic into data processing, to clean the data, process it in a certain way, and so on. And if you do processing, model training, and inference in a single place,
it works up to some point, but when complexity grows it becomes hard to manage. By the way, if you have any questions in the meantime, just stop me and ask, and I'll be happy to follow up. So what we saw is that it doesn't work to run everything in one monolithic application with machine learning.
Yet that is what you would typically see when you start with ML: everything runs in a single place. So we decided that for our product we needed to split, to have separate services, each responsible for its own job.
Data fetching, data transformation, data cleaning, and data processing might be done by service A, and once the data is ready, it can be transferred to another service which does the model training based on that data. When the model is ready, there will be another service which serves an API and allows you to execute inference requests to do the processing. And then the question comes: okay, we can split, but there is one thing that is specific to ML.
You usually operate with large sets of data. When you train a model, you need a training set, a validation set, and a test set, and there is usually quite a lot of data, so you somehow need to transfer this data from one service to another. The first option that comes to mind would be to use JSON and REST calls over HTTP.
It works, but it's not convenient: the datasets are quite large and may have lots of attributes as well, and if you use JSON to send data between services, it adds extra complexity, because you receive JSON text and then have to parse all the data. The problem is especially acute with numerical data, where you may lose some precision, and so on.
So there are a lot of small details that are kind of hidden, but as you start to work with this, they come to the surface and things become quite complicated.
Then we started to look for alternatives, for other options we could use to implement communication between services, and we found that gRPC is quite suitable for our case; I'll show why later in the demo. With gRPC you can quite easily transfer data from one service to another, and you don't need to play with data parsing, because you are able to define a method with a return type, and this type can encapsulate multiple datasets in a single call.
The demo is based on an article that I wrote back in December on Towards Data Science, and the idea is to keep it simple. It uses a standard dataset that you will find in many ML examples, the Boston Housing dataset. It comes with a set of attributes that describe the price of a certain piece of real estate, a house, for example. To make the model slightly more interesting,
we are training it to forecast not only the single attribute, price, but also an additional attribute, the pupil-teacher ratio. So, based on a set of attributes, the model is able to predict a price for a house, using the neighborhood, maybe when the house was constructed, and so on, and additionally it is able to predict another attribute, called PTRATIO, just to make it more fun.
If you are interested in reading more about this sample model after the webinar, go to that article; all the source code is provided on GitHub and is referenced at the bottom of the article.
Okay, so let's jump to the source code. The demo application is quite simple, because I made an effort not to overcomplicate it, so that you can understand what we are doing here. This demo application is not our main product; I implemented a separate application, obviously, which specifically shows the main solution we are using, and it highlights the advantage of gRPC for the ML domain.
Okay, so first, there are two applications: the first one implements the data service, and the second one implements the training service.
The idea of the data service is that it should read and load the data, clean it up, remove the attributes that you are using as target attributes, then split the data into training and test sets, and do data normalization as well. In ML, when you have a dataset with certain numbers, it doesn't work to just send the raw array for training: if the scale differs between attributes, say one attribute ranges from one to ten and another from one to ten thousand, the model will not train very effectively.
So you need to do data normalization: you need to translate all the attributes onto the same scale, for example from minus one to one. This is a common task in ML, and this job is done in the data service as well.
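As a rough illustration, a z-score style normalization might look like the following sketch. This is my illustration, not the exact helper from the repo, and the exact scheme in the article may differ:

```python
import numpy as np

def normalize(train: np.ndarray, test: np.ndarray):
    # Compute scaling statistics on the training set only, then apply the
    # same transform to the test set, so no test information leaks into
    # the preparation step.
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    return (train - mean) / std, (test - mean) / std
```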
Once all the data is prepared, it is sent to the caller, and the caller in our case is the training service: another application, which uses gRPC to make a call to the data service.
It gets the data and, based on that data, runs the training. Essentially this is a very common ML flow, but here is the main difference: in most ML tutorials, or in any overview material for ML, you will see these two steps done in the same application.
So the idea here is to split that up and have two different applications do the job. Okay, if you look into the data service application, we have defined a data service proto file, and in this proto file we define messages for the request and the response. The request is quite simple, just one attribute, the test size.
With this parameter we have the option to split the original dataset into a training set and a test set, based on the percentage we specify in this test size parameter. For the response, we return multiple attributes: the normalized data for the train and test sets, the validation set, normalized as well, and additionally the target data for train, test, and validation.
The target attributes are used by the ML model when training runs: it tries to match the input attributes from the train, test, and validation sets against the corresponding target attributes, builds patterns, or rules, and later follows those patterns to work with unseen data.
Okay, and to keep it simple, a single service, BostonHousing, is defined, with one method, PrepareData, which accepts a parameter to set the test size and returns the response with all the data that was prepared. Then the next step, once we have this proto file, is to generate the gRPC client and server from it; a rough sketch of what such a proto file looks like follows.
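The definition in the repo will differ in its details, but based on the description above it is along these lines (the field names here are my illustration, not copied from the repo):

```proto
syntax = "proto3";

// Request: what fraction of the data goes into the test set.
message BostonHousingRequest {
  float test_size = 1;
}

// Response: normalized train/test/validation sets plus their target
// attributes, each field carrying a serialized NumPy array as raw bytes.
message BostonHousingResponse {
  bytes train_data = 1;
  bytes test_data = 2;
  bytes validation_data = 3;
  bytes train_targets = 4;
  bytes test_targets = 5;
  bytes validation_targets = 6;
}

service BostonHousing {
  rpc PrepareData(BostonHousingRequest) returns (BostonHousingResponse) {}
}
```

With grpcio-tools installed, the client and server stubs are typically generated with something like `python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. data_service.proto`; the README in the repo has the exact command used for this demo.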
For those of you who are interested in how we do it, I included a README file over here. And by the way, the source code for the demo I'm showing is also available on GitHub; this example is available to anyone, so you can try it out and play with it after the webinar as well. Now, what I like about gRPC: in the past I worked a lot with SOAP web services, and with SOAP, when you want to generate clients for the web service, you have the option to generate them,
but the generated code was always very cumbersome, with many generated classes, and it was hard to manage all of that. What I love about gRPC is that when we generate code on top of this proto file, there are just two files generated, and they are quite simple. This is what I like: you don't overcomplicate your application, and it stays easy to manage.
This is the standard code that you would see in the gRPC documentation, nothing special here: we call the start method on the server so it starts listening for incoming connections. And this class implements BostonHousingServicer, the one that was defined in the proto file; we import the script that was auto-generated from the proto file, and in it we implement the PrepareData method, the method that was defined in the proto file.
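For orientation, the skeleton of such a server looks roughly like this; the module and class names assume stubs generated from my proto sketch above:

```python
# Skeleton of the data service server; data_service_pb2 / _pb2_grpc are
# the two files generated by grpc_tools.protoc from the proto definition.
from concurrent import futures

import grpc
import data_service_pb2
import data_service_pb2_grpc


class BostonHousingServicer(data_service_pb2_grpc.BostonHousingServicer):
    def PrepareData(self, request, context):
        # Real logic: load, clean, split and normalize the data, then pack
        # each NumPy array into a bytes field (that step is shown later).
        return data_service_pb2.BostonHousingResponse()


def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    data_service_pb2_grpc.add_BostonHousingServicer_to_server(
        BostonHousingServicer(), server)
    server.add_insecure_port("[::]:50051")
    server.start()                 # start listening for incoming connections
    server.wait_for_termination()  # block the main thread


if __name__ == "__main__":
    serve()
```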
Right, and this is the main logic, where we handle everything. The first thing we do is call the prepare datasets method from the data helper, and if you look into the data helper, it is standard code that you would see in most ML examples.
Then we normalize the data for the train, test, and validation sets, and finally we return all those sets back to the caller. So the data is prepared, and in this case, as I mentioned already, standard ML code is used, the same code you would typically see in any ML implementation. Then we get the datasets back and print out information about them, just for debugging purposes.
At the end, we construct a response of the same type that was defined in the proto file, BostonHousingResponse.
Then, for each element of that response, we assign the value, and here is the tricky part: when we prepare the datasets we operate with NumPy arrays, not just simple Python arrays, and by default
gRPC doesn't support the NumPy type; you cannot simply send such an array through gRPC out of the box. The standard way to solve this is to create a byte array: the NumPy library allows you to save a NumPy array into a byte buffer using the standard save method, which is what we do here. We create a BytesIO object and then, using numpy.save, we basically copy the original NumPy array into that byte buffer.
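A minimal sketch of that packing step (the field names follow my proto sketch above, and the placeholder arrays just stand in for the prepared sets):

```python
from io import BytesIO

import numpy as np
import data_service_pb2  # generated from the proto sketch above


def pack(array) -> bytes:
    # np.save writes the array, including its dtype and shape, into the
    # buffer; getvalue() hands gRPC a plain bytes payload for the field.
    buffer = BytesIO()
    np.save(buffer, np.asarray(array), allow_pickle=False)
    return buffer.getvalue()


# Placeholder data standing in for the prepared Boston Housing sets.
normed_train_data = np.random.rand(404, 13)
train_targets = (np.random.rand(404), np.random.rand(404))  # price, PTRATIO

response = data_service_pb2.BostonHousingResponse(
    train_data=pack(normed_train_data),
    train_targets=pack(train_targets),
    # the test and validation fields are filled the same way
)
```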
Right, and the same is done for all the datasets, and then this data, converted from NumPy arrays to byte arrays, is sent in the response.
Now to the training service. Over here we have the same gRPC files that were generated in the data service; the same files are copied over. Then, just for demonstration purposes, a training service test script is created which initiates the training service, and then we call the run training method. That is the main method here, and what we do in it is, first of all, fetch the data: we call the fetch data method,
and this method gets the data from the data controller over here. In the fetch data method we use the gRPC API: we pass the input parameter, the test size, which will be used to split the data into training and test sets, and we make a call.
We call the PrepareData method and get back the response, and the response returns byte arrays. We then need to load them back, because when we run training we want to keep operating with NumPy arrays, so we load the byte arrays back into NumPy, again using the standard NumPy method, np.load. No hacking, nothing special, just the standard API.
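Sketched out, the fetch step might look like this; the channel address and field names are assumptions consistent with the earlier sketches:

```python
from io import BytesIO

import grpc
import numpy as np
import data_service_pb2
import data_service_pb2_grpc


def unpack(payload: bytes) -> np.ndarray:
    # Mirror of np.save on the server side: np.load restores dtype and shape.
    return np.load(BytesIO(payload), allow_pickle=False)


def fetch_data(test_size: float = 0.2):
    with grpc.insecure_channel("localhost:50051") as channel:
        stub = data_service_pb2_grpc.BostonHousingStub(channel)
        response = stub.PrepareData(
            data_service_pb2.BostonHousingRequest(test_size=test_size))
    # (the validation fields are unpacked the same way)
    return (unpack(response.train_data),
            unpack(response.train_targets),
            unpack(response.test_data),
            unpack(response.test_targets))
```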
The only tricky thing here concerns the target attributes array and how it was converted to a byte array. Since our model trains for multiple attributes,
two attributes to be precise, the targets object was a tuple containing two arrays. When this object was converted to a byte array, it was written as one outer array with two arrays inside, and that is how it was sent. So when we converted it back to a NumPy array, it did not come back as exactly the same object it was originally: it came back as a NumPy array with two arrays inside. We therefore do an extra step to convert it back to exactly the same
tuple object it was originally, because otherwise TensorFlow's fit method, the method which trains the model, would not recognize this target as one that is suitable for training.
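Under the same assumptions as above, that restore step is small; a sketch:

```python
import numpy as np

# After np.load, the two target series come back as one (2, n) array ...
loaded = np.array([np.arange(5.0), np.arange(5.0) * 2.0])

# ... so we split it back into the original tuple of two 1-D arrays, the
# layout the two-output training code expects (a list works equally well).
train_targets = tuple(loaded)  # (price_array, ptratio_array)
```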
So we do this little trick to return the data to its original shape, and once that is done, we return all the sets back to the main method from where it was called. The rest is kind of straightforward: we build the model, compile it, and then run a standard TensorFlow fit API call, which trains the model, and we pass it the data as NumPy arrays, the ones we brought back from the byte arrays.
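A compact sketch of that final step; the real architecture from the article will differ, this just shows a two-output model accepting the restored targets:

```python
import numpy as np
import tensorflow as tf

# Placeholder arrays in the layout the gRPC fetch returns.
x_train = np.random.rand(404, 13).astype("float32")
y_price = np.random.rand(404).astype("float32")
y_ptratio = np.random.rand(404).astype("float32")

# Two-output model: one regression head per target (price and PTRATIO).
inputs = tf.keras.Input(shape=(13,))
hidden = tf.keras.layers.Dense(64, activation="relu")(inputs)
outputs = [tf.keras.layers.Dense(1, name="price")(hidden),
           tf.keras.layers.Dense(1, name="ptratio")(hidden)]
model = tf.keras.Model(inputs=inputs, outputs=outputs)

model.compile(optimizer="adam", loss="mse")
# fit() neither knows nor cares that the arrays arrived over gRPC.
model.fit(x_train, (y_price, y_ptratio), epochs=100, verbose=0)
```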
Then it works the same as with the original datasets: TensorFlow doesn't know that the data actually traveled over gRPC from another service; it is transparent in this case. Okay, and just to show you that it actually works,
I can run the training service test. It makes a call to the other application, the data is prepared over there and sent back, and then the training loop runs; the training executes its 100 epochs and reports back information about the quality of the training, which is a standard thing as well. So we see that the communication works: we are able to make a call from the training service to the data service.
In a real application the logic for data processing would probably be way more complex, and you might have various checks and other things running as part of data processing. Having this logic runnable in separate services is good for maintenance, but not only that: it's also good at runtime, because when you have data processing running in one service and training running in a different one, you have more options for scalability.
You could run the data processing service on one machine, or in one container with certain resources, and run the training on a GPU, for example, to improve training performance, and so on.
It is also kind of natural, because compared to those ERP applications where you have a database in the background, in machine learning you typically don't have a database, so it's easier to split the logic into different services: you are not constrained by a database, which makes the split more natural.
So, that was my demo. The main point was to explain how we applied gRPC in the ML domain, to explain the specifics of ML, and why I think gRPC is useful for the ML use case.