From YouTube: Data Science as a Managed Service Audrey Reznik (Red Hat) OpenShift Commons Gathering 2021
Description
OpenShift Commons Gathering 2021
Data Science as a Managed Service on OpenShift
Audrey Reznik (Red Hat)
https://commons.openshift.org/index.html#join
Okay, I'm standing between you guys and lunch. My speaker notes didn't come up, so I'm going to wing this so that we can get done on time and everybody can eat. So, good morning. My name is Audrey Reznik. I'm a senior principal software engineer and data scientist with the Red Hat OpenShift Data Science team, and I'm going to talk to you about:
What's the deal with managed services and model delivery? If you're a data scientist, or if you work with data scientists supporting them, you'll know that when they create a model, there's a lot more to it than just creating the model. You want to be able to get data to the model, you want to be able to deploy the model and monitor it. So we're going to go into some of those items. We'll take a look at a model's role in an intelligent application.
We'll take a look at who uses managed services, and surprisingly, it's not just the data scientists when we're talking about intelligent application creation. Then we'll kind of look at how managed services help you along with model delivery, and where you find them. And finally, I'm going to click through a very quick demo of the Red Hat OpenShift Data Science platform and the managed services that are available on it.
So when we take a look at intelligent applications, we have to take a look at the model's role in them. Now, intelligent applications by themselves are not just one small thing: they are a distributed system. There are things that work in conjunction all the way from data verification to serving infrastructure and doing some configuration. And when we go ahead and take a look at these intelligent applications, we'll see that the model code is just a very, very small part of that. The model code, or the model itself, has to be able to make its way through this distributed infrastructure, so it has to have a way to interact with things such as feature extraction. It has to be able to interact with some of the analysis tools. And you look at that and go: wow, that could be really complicated.
So when we take a look at managed services, we can divide them into four groups, or four categories, and within those four categories we actually have a number of personas that are going to interact with them. If we take a look at the first category, we want to go ahead and gather and prepare the data. That means we're going to look at data storage, data lakes, data warehousing, and stream processing, and it's really our data engineers that are going to get totally excited about this category of managed services.
Then we go ahead into actually developing the model. So when we go ahead and develop the model, we're going to bring the data scientists in, and they're going to go ahead and, you know, create the model and work with the algorithms that they need to solve the particular business problem that they're trying to solve.
Then, once the model is deployed, we have to monitor it: is it giving us some of the answers that we thought we were going to get, or do we have to correct it and retrain it? That's where both the data scientists and the application developer or machine learning engineer will come together. Now, having all of these services, and having them available for everybody, can actually be a nightmare for IT operations, right?
You want to give your users the latest bells and whistles, but at the same time you want some sort of platform, or some sort of services, that you know you can be, how do I say it, very comfortable with, that you can depend on, and that you know are not going to help create any outages when they're actually being used to help create a model.
So let's go and take a look at kind of the model life cycle and where these managed services fit in. Now, remember I told you there were four. We wanted to kind of extract and transform the data. So, instead of building something ourselves, with the Red Hat OpenShift Data Science team we said: well, wouldn't it be cool if we could just go ahead and invite a whole bunch of different vendors, open source vendors, in? That way, you have a lot of choice.
So, when you're extracting and transforming the data, yeah, you could go ahead and use Apache Kafka streams to go ahead and pull in some of your data. But wouldn't it also be cool to use somebody like Starburst Galaxy so that you could go ahead and curate your data? You really want to unlock the value of your data by making it very fast and easy for you to access that data across the hybrid cloud.
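As a rough sketch of what that access pattern can look like: Starburst exposes a Trino-compatible SQL endpoint, so from Python you could query curated data along these lines. The host, catalog, schema, and table names below are placeholders for illustration, not details from the talk.

```python
# Minimal sketch of querying a Starburst/Trino endpoint from Python
# (host, catalog, schema, and table names are placeholders).
import trino

conn = trino.dbapi.connect(
    host="starburst.example.com",   # placeholder endpoint
    port=443,
    user="data-engineer",
    http_scheme="https",
    catalog="hive",                 # placeholder catalog
    schema="vehicles",              # placeholder schema
)

cur = conn.cursor()
cur.execute("SELECT plate_number, seen_at FROM sightings LIMIT 10")
for row in cur.fetchall():
    print(row)
```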
Next, we want to take a look at creating models. We want to be able to use a Jupyter notebook or something like that for some exploration, but maybe, at the end of the day, we're really interested in what Anaconda has to offer, because they might have an extensive set of data science packages or libraries that we could use in our Jupyter notebook projects when we're going ahead and doing some of the experimentation. Coming up next, another one of our independent software vendors, or ISVs, is IBM Watson Studio.
Now, when you're done with your model and your testing, and everything's done with your experimentation, what you want to do is actually go ahead and deploy those models as actual services. So you can use an ISV such as Seldon Deploy, and it's going to really help you simplify and accelerate the process of deploying and managing your machine learning models.
Now, this whole path that you're seeing, this curvy path, is kind of the model operations life cycle. I want you to keep that in mind, because we need to see where these data services, sorry, managed services, would actually live. So we're going to start with the Red Hat managed cloud platform. We want to have a platform that is very stable.
Now, on top of that, we have what we call our ISV managed cloud services. These are independent software vendors, such as Starburst Galaxy or Anaconda, that we have brought into our Red Hat OpenShift Data Science platform, so that you can use some of the OpenShift services that they have. And then we have customer-managed ISV software. So if you wanted, say, for instance, to take a look at quantization in a model, or to go and take a look at inferencing, you could use Intel OpenVINO, which, I apologize.
So this whole Red Hat OpenShift Data Science offering actually sits on AWS, so it's a cloud offering right now. And what I'm going to do is probably go through a demo. I think I have enough time for that; at least I won't have to have it as a live demo. But one thing that I wanted to mention about this entire platform is that we have depth and scale, basically without lock-in. The capabilities that we have are really in conjunction with Red Hat and our service partners that we brought into this ecosystem.
Okay, this is going to be very quick; I'm glad I'm going to be clicking through it. One of my colleagues actually worked with the London city metro, and the London city metro wanted to be able to monitor cars within the metro area. They wanted to be able to recognize license plates and see: is that car able to park here? Does that car actually have a tag so that it can use these certain metro ways? Is this car carrying somebody who did something bad that we want to track?
Okay, you get the idea. So here I have a picture of a car. What we're hoping the machine learning model will do is take that license plate, read it, and actually grab the plate numbers. Then, once we get those plate numbers, we can use Apache Kafka to go ahead and store that information, possibly generate an amber alert. In the meantime, we'll be pulling a lot of those license plates into our various warehouses and also into our vehicle registration database. And, of course, the City of London's metro services can then perform more business analytics on the data that we've gleaned.
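As a small sketch of the Kafka piece of that flow, publishing a recognized plate to a topic could look something like the following. The topic name, broker address, and message fields are placeholders for illustration rather than details from the talk.

```python
# Minimal sketch of publishing recognized plates to Kafka
# (topic name, broker address, and event fields are placeholders).
import json
from kafka import KafkaProducer  # kafka-python client

producer = KafkaProducer(
    bootstrap_servers="my-kafka-bootstrap:9092",              # placeholder broker
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),  # serialize dicts as JSON
)

event = {"plate": "LC63 ABC", "camera": "metro-cam-12", "timestamp": "2021-10-20T09:15:00Z"}
producer.send("license-plate-events", value=event)             # placeholder topic
producer.flush()                                               # make sure the message is sent
```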
So what does this look like if we are trying to use this Red Hat OpenShift Dedicated platform? Well, because the Red Hat OpenShift Dedicated platform sits on top of AWS, you are going to need a cluster to actually use Red Hat OpenShift Data Science, or what we affectionately call RHODS, and I know I'm going to hell for giving you the acronym there.
So we're going to click on the RHODS menu option, and then what you're going to see is basically a menu of the managed services, and, of course, there will be managed software available for you to actually work with in the background. You'll see that I've chosen one of those items, which is JupyterHub. Notice the other ones: when I go ahead and hit the Explore icon, these are all the different managed services that I can go ahead and choose to use, and there's plenty of documentation.
So I'm going to go ahead and choose JupyterHub, and what that's going to allow me to do is go into a Jupyter notebook image. I'm going to take this notebook image and then basically wrap it up in a container so that I can deploy it on OpenShift. But I want to customize it for myself. So the first thing that I'm going to do is say: okay, if I'm doing some machine learning, what am I going to be working with?
Am I going to be using just a standard data science package, which may contain things like NumPy or pandas or scikit-learn, or am I going to want to work with something such as this license plate detection, where I may have to use a lot of the PyTorch libraries that are available? So I'm going to click on PyTorch.
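As a rough illustration of the kind of work that PyTorch notebook image supports, here is a minimal inference sketch on a single image. The model file, input image, and preprocessing steps are hypothetical, not taken from the demo.

```python
# Minimal PyTorch inference sketch (model file, image, and preprocessing are hypothetical).
import torch
from torchvision import transforms
from PIL import Image

# Assume a trained model was saved earlier with torch.save(model, "plate_detector.pt").
model = torch.load("plate_detector.pt", map_location="cpu")
model.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),  # size is an assumption
    transforms.ToTensor(),
])

img = Image.open("car.jpg").convert("RGB")   # hypothetical input image
batch = preprocess(img).unsqueeze(0)         # add a batch dimension

with torch.no_grad():                        # no gradients needed for inference
    prediction = model(batch)

print(prediction)
```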
Now, if I want to go ahead and deploy this application, you're not going to deploy this application as a Jupyter notebook. I know people have done that; please don't do that. What we want to do is really package the model as an API, and in this case we're going to use Flask to help us accomplish this. Then we'll go ahead and launch our server to see if we've been able to successfully deploy something internally. Now we'll test that Flask app; we have a status of OK, and now we're ready to go back into our OpenShift Dedicated environment, where we first launched the RHODS platform from, to see if we can actually go ahead and deploy this on OpenShift.
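To make the "package the model as an API" step concrete, here is a minimal Flask sketch along the lines described. The endpoint names, the predict_plate helper, and the port are assumptions for illustration, not the actual demo code.

```python
# Minimal Flask wrapper sketch (endpoint names, helper, and port are assumptions).
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/status")
def status():
    # Simple health check, like the "status of OK" test mentioned in the talk.
    return jsonify({"status": "ok"})

@app.route("/predictions", methods=["POST"])
def predictions():
    image_bytes = request.get_data()        # raw image bytes sent by the caller
    result = predict_plate(image_bytes)     # hypothetical helper that runs the model
    return jsonify(result)

def predict_plate(image_bytes):
    # Placeholder: in the real application this would run the trained model.
    return {"plate": "UNKNOWN"}

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)      # port is an assumption
```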
Sometimes this can be very hard for a data scientist, because they like to save all of their code on their local laptop. But you, as the machine learning engineer working with this data scientist, are going to encourage them to check their code in to Git. Yes, you are. All right, so now, from the Git option, we're going to have some other things that we can do, such as the resources and advanced options.
What we're really interested in is making sure that we click on the routing option, because we want the route of this API. We need to be able to access the API from another location, so that route is very important. We're going to go ahead and copy it, and probably test it within a browser to make sure that we can actually hit that API.
And then, if you want to, you can go back into a Jupyter notebook, and then, using that route, you can use either curl or invoke a web request to actually see if you can hit that API successfully. Then, to test your deployed AI/ML application, you can take that route, go back into a Jupyter notebook, put in the actual API address, and give it an image.
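For example, a minimal way to exercise the route from a notebook might look like the following. The route URL, endpoint paths, and image file are placeholders, not values from the demo.

```python
# Minimal sketch of hitting the exposed route from a notebook
# (route URL, endpoint paths, and image file are placeholders).
import requests

route = "https://my-model-route.apps.example.com"   # the OpenShift route you copied

# Health check, equivalent to testing the route in a browser or with curl.
print(requests.get(f"{route}/status").json())

# Send an image to the prediction endpoint and print the response.
with open("car.jpg", "rb") as f:
    response = requests.post(f"{route}/predictions", data=f.read())
print(response.json())
```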
So all of this is on Red Hat managed cloud services, and again, this demo was concentrating on the Red Hat OpenShift Data Science portion. Remember, what we're trying to do with Red Hat OpenShift Data Science is to have a platform that's fairly open, so that if you have a specific open source vendor that you like, or if you have specific requirements where you want to use not only Red Hat products but open source products, you should have the choice to be able to do that.
So what did we learn today? Well, we learned that managed services, and in particular managed services for data science, are really important to a data scientist. They're just not going to sit there with their model. They have to have some way of actually going ahead and deploying their model, training it, and testing it, and they have to be able to do it in such an easy manner that they can accomplish that task themselves. And, of course, IT operations will be there to help them with that journey.