From YouTube: Integrating Data Science and Application Development
Description
Data science projects can fail before ever being realized. Integrating data science into your application development workflow and tooling from the beginning can greatly improve chances of success. In this demo, we show how a data scientist uses Red Hat OpenShift Data Science, along with partners and open source software, to develop a model and integrate it as part of a heterogeneous application.
A: So when we think about the machine learning workflow, we see it as having four stages. It starts with preparing the data, then exploring that data and seeing what's there, and then we move on to developing the machine learning model. That really encompasses everything from feature engineering to testing out different machine learning techniques and using a range of machine learning libraries. As a data scientist, this is where I spend most of my time and energy.
A: Now, these first three stages are ones I'm pretty confident with as a data scientist, but the last one is often a pain point. We're not application developers, so deploying our model as part of a larger application can be pretty tricky. And that's what we're going to focus on today: the collaboration between data scientists and application developers, and the process of going from a machine learning model, perhaps developed in a notebook, to a model which is deployed and running as part of a larger application.
A: Now, as a data scientist, I shouldn't have to become a container or Kubernetes expert in order to reap the benefits of those tools, and that's why we're using Red Hat OpenShift Data Science, a new managed service from Red Hat targeted at data scientists. It enables me to work in my usual way, while working on top of OpenShift.
A: That makes the integration of my model output into an application easier. I'm going to begin by working in notebooks, specifically in a template that Chris has set up for me in Git. This makes it easy to ensure that the work I produce can easily be lifted into an application. That lifting is going to be done with source-to-image (S2I): Chris is going to show us how we can use S2I to take the output that I produce and put it into an application.
B: I have lots of problems to solve, but this one in particular is about my dogs. My dogs are digging up my yard, and I want to be able to find out when they're in there. So we need a service that can detect dogs and alert me when they're out there digging up my yard. Could you help me with that, Sophie?
B: Now that we've generated a new project, you can see some files that'll interest you. We've got some notebooks for you to get started and experiment with. Once you're done experimenting and you've got a good prediction function, you can drop it into this prediction.py file, and you can put your libraries, like TensorFlow or PyTorch, in this requirements.txt file so they get included in the service as well. Once you're done, save your files and push them up to Git, and it'll build and deploy automatically.
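As a rough sketch of the shape that template implies (the function name and return structure here are assumptions, not the template's actual contract), prediction.py would expose a single prediction function the service can call:

```python
# prediction.py -- hypothetical sketch of the prediction entry point.
# The real template defines its own contract; names here are illustrative.

def predict(args_dict):
    """Take a request payload and return a prediction response."""
    # In the real service this would run the model; here we return a
    # fixed-shape response so the surrounding service can be wired up.
    image = args_dict.get("image")
    return {
        "detections": [],          # filled in once a model is loaded
        "received_image": image is not None,
    }
```

The point of the convention is that the builder never needs to know anything about the model: it just wraps whatever `predict` does behind the service.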
A: So once I've logged into Red Hat OpenShift Data Science, I get taken to a dashboard that looks like this. What we can see here is a card for each of the applications I've got enabled in my environment; I've just got a few things enabled here. We can go over to the Explore tab and look at all of the applications we're able to install into our OpenShift Data Science instance, and for each of those applications we've also got associated resources.
A: These are things like documentation, quick starts to help you get going, and how-tos for little tasks, the kind of thing you end up looking up every time you're doing work. So let's go back to Enabled, where we can see one of these quick starts. If we click Start, we can see this "Creating your Jupyter notebook" quick start that steps you through your first experience getting going with JupyterHub. I'm going to go ahead and click Launch and launch JupyterHub.
A: This takes me to a server spawner page that looks like this, where I can set some options for my notebook server. I'm going to stick with the TensorFlow notebook image, because I'm going to be doing object detection, and TensorFlow is arguably the right framework to do that in. For container size, I'm going to stick with medium, but if I was doing something that needed larger resources I could select that here, and I can also request a GPU, given that they're enabled in my environment.
A: I want to add some environment variables. Today I'm going to be accessing data that's in an S3 bucket on AWS, so I add my AWS access key and secret key here. By doing this, they're going to be injected into my Red Hat OpenShift Data Science environment, so when I'm developing notebooks I can access them through environment variables. I'll go ahead and click Start my server, and we'll wait for the notebook server to start.
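Inside a notebook, those injected credentials can then be read with plain `os.environ`. The variable names below are the standard AWS ones, which is an assumption about what the spawner injects:

```python
import os

# Read the credentials the spawner injected into the notebook environment.
# AWS tooling such as boto3 picks up these exact names automatically.
aws_access_key = os.environ.get("AWS_ACCESS_KEY_ID", "")
aws_secret_key = os.environ.get("AWS_SECRET_ACCESS_KEY", "")

# True only when both credentials were actually injected.
credentials_present = bool(aws_access_key and aws_secret_key)
```

Keeping credentials in the environment rather than in the notebook itself also means they never end up committed to Git.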
A: We then use the AWS access key and secret key we set in the spawner to download an image that I've got stored in an S3 bucket. So I've downloaded this image here of these dogs; we can have a look at them. This is Max and Margo. We're going to transform that image into a tensor, so that TensorFlow models are able to process it. When we do that, we get something that looks like this.
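That transformation is typically just a reshape: decode the photo into a height x width x 3 array of uint8 pixels, then add a batch dimension, which is the input shape TensorFlow object-detection models generally expect. A minimal sketch, with a NumPy array standing in for the decoded photo:

```python
import numpy as np

# Stand-in for the decoded photo: a 480x640 RGB image of uint8 pixels.
image_np = np.zeros((480, 640, 3), dtype=np.uint8)

# Object-detection models expect a leading batch axis: (1, height, width, 3).
input_tensor = image_np[np.newaxis, ...]
```

In a TensorFlow notebook the same thing is often written with `tf.convert_to_tensor(image_np)[tf.newaxis, ...]`; the shape is what matters.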
A: We have a set of classes: these are the classes corresponding to the objects which have been detected. We also have the names of those classes in human-readable form, so you can see we've got dog, dog, and footwear, and then a range of detection scores denoting how confident the model is in the detections it's made. With a bit of standard code, we can recreate our image and plot the boxes from that object detection model on top of it.
A: Now, this model is doing a pretty good job of detecting the dogs. It knows that there are two dogs there, and if we filter out all of the predictions it made with a confidence of less than 50%, then the only things it recognizes are those dogs. So it's confident in those predictions compared to the other predictions it's made.
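That 50% filter is just a pass over the parallel class and score lists. A sketch with toy values (the key names mirror the TensorFlow Object Detection API's output dictionary, which is an assumption about the exact model in use):

```python
# Toy detection output shaped like the TF Object Detection API's results:
# parallel lists of human-readable class names and confidence scores.
detections = {
    "detection_classes": ["Dog", "Dog", "Footwear"],
    "detection_scores": [0.92, 0.88, 0.31],
}

# Keep only the detections the model is at least 50% confident about.
confident = [
    (cls, score)
    for cls, score in zip(detections["detection_classes"],
                          detections["detection_scores"])
    if score >= 0.5
]
```

With the toy scores above, only the two dogs survive the cut, which matches what the model did with the real photo.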
A: So now that we've got a model we know works well enough for this use case, I'm going to go ahead and do a couple of things Chris asked me to do. First up, I need to put all of the requirements for this model into a requirements.txt file. The three at the top Chris added for me, and the ones below are the requirements for my notebook and use case.
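As an illustration (the version pins here are made up, not the ones from the video, and Chris's three service packages are left as a placeholder), the resulting requirements.txt might look something like:

```text
# Added by Chris for the service wrapper
# ...

# Data-science requirements for the notebook and use case
tensorflow==2.5.0
matplotlib==3.4.2
numpy==1.19.5
```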
A
Specifically,
so
we've
got
tensorflow,
matplotlib
and
numpy,
and
we've
also
fixed
versions
for
those
I'm
also
going
to
go
ahead
and
create
a
prediction.pi
function.
So this is what Chris's application builder, the source-to-image builder we're going to use, is looking for; it's what the builder will use to make those predictions. Our prediction.py really just takes code from the notebook we had earlier: it loads the model, and then we've got a few functions to make predictions and clean up the output. There's also a function to test that it's working as expected, and indeed, when we make a prediction on the data, we get a prediction response. These values correspond to the bounding box that goes around the detected object, an estimate of the class, and the probability with which that class has been predicted.
B: All right, that's fantastic, Sophie. I'll pick it up where you left off. Let's take a look at this project and see what we've got. You created your model, you updated your prediction function, and you added your dependencies to the requirements.txt file. That means I can go ahead and build and deploy a new service straight from Git. Let's do that now.
B: Perfect, we have detected a dog. Well, Sophie, there we have it: my app works. The only problem is that I actually have to push this button to find and detect my dogs, and that's not really what I was going for. So I thought of something: if I could leave the camera running, take intermittent images, and push them up to Kafka, I could detect all the dogs as they're going, in kind of full motion. I think that's the answer.
B: Well, let me show you. The nice thing about building this as a custom app is that I can do whatever I want; I'm not locked into a certain thing, not a REST API, not a certain framework. In this case I did away with the REST API code and made a quick-and-dirty Kafka consumer. It's going to pick up those images off of one queue, do its prediction, dump those objects into another queue, and then I'll just read those in my app and display them.
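The relay loop itself can stay tiny. The sketch below shows its shape with plain callables standing in for the Kafka pieces; a real version would use something like kafka-python's KafkaConsumer and KafkaProducer in their place (that library choice is an assumption, not what the demo necessarily used):

```python
def relay(messages, predict, publish):
    """Consume image messages, run the prediction, publish the results.

    `messages` stands in for a Kafka consumer (any iterable of payloads),
    and `publish` stands in for a producer sending to the output queue.
    """
    for payload in messages:
        publish(predict(payload))

# Wiring it up with in-memory stand-ins for the two queues:
incoming = ["image-1", "image-2"]
outgoing = []
relay(incoming,
      predict=lambda img: {"source": img, "dogs": 1},
      publish=outgoing.append)
```

Because the loop only depends on an iterable and two callables, the same function works unchanged whether the queues are Python lists in a test or real Kafka topics in the deployed app.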
B: So I've got these messages with the images in them, and I need to go back and make sure that it works. But really, you already did all the hard work here, Sophie. I went ahead and took your prediction function, since it was in a single file, and just dropped it in here. Now I can go back and build this app the exact same way I did the last application, and it'll just run, just like before.
B: Right, and we used a really simple workflow, because it's just the two of us and we wanted something very simple. All we used was RHODS (Red Hat OpenShift Data Science), OpenShift, and Git. We could have done something far more complex: we could have used storage, pipelines, or deployment software, and we fully expect customers to do that, using partner software, open source software, and Red Hat managed software to come up with their own processes using all their favorite tools. But for us, this worked great.