Description
Data Scientists and Red Hat: Better Together
Sherard Griffin (Red Hat)
OpenShift Commons Gathering on Data Science
January 28, 2021
https://commons.openshift.org/gatherings/OpenShift_Commons_Gathering_on_Data_Science.html
To find out more about OpenShift Commons, please visit: https://commons.openshift.org
I want to talk a little bit today about data scientists and Red Hat, and how we're better together. But before I dive into that, let's take a step back and look at why Red Hat decided it was best to get into the AI industry, and look at ways in which we can help our customers along those journeys.
We also saw that Red Hat itself needed AI in order to increase our open source development and production efficiency. We've looked at things like analyzing build logs with anomaly detection, to find interesting patterns or to discover things that may not be right with the way that we're building and developing the software. It's allowed us to increase that efficiency, get products out to customers a little bit faster, and adapt to the market quicker.
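The build-log anomaly detection mentioned above can be sketched in a few lines. This is a minimal illustration, not Red Hat's actual pipeline: the sample log lines are invented, and the model choice (an isolation forest over character n-gram features) is an assumption for the example.

```python
# Hypothetical sketch: flag unusual build-log lines with an isolation forest.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import IsolationForest

log_lines = [
    "INFO compiling module core",
    "INFO compiling module api",
    "INFO linking binaries",
    "INFO compiling module ui",
    "ERROR segfault in linker at 0x7f3a",  # the odd line in this invented sample
]

# Turn each line into character n-gram TF-IDF features so rare tokens stand out.
features = TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5)).fit_transform(log_lines)

# IsolationForest scores points by how easily they are isolated; predict() returns
# -1 for points it considers anomalous and 1 for inliers.
detector = IsolationForest(contamination=0.2, random_state=0).fit(features.toarray())
labels = detector.predict(features.toarray())

for line, label in zip(log_lines, labels):
    if label == -1:
        print("anomaly:", line)
```

In practice the features and thresholds would be tuned to the log format at hand; the point is only that unsupervised detection can surface "interesting patterns" without hand-written rules.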
The third thing here is: we saw that customers needed AI integrated into open source products and services to be able to leverage an intelligent platform. Now, what do I mean by that? If you think of a lot of the platforms that customers are using, like OpenShift, and also things like Red Hat Insights, where we're helping them manage their own infrastructure, AI allows those platforms to be smarter, to be more predictive, to be able to predict things before customers know about them. And it's been great.
In introducing those technologies, at the end of the day, customers benefit from these intelligent platforms, because they can react to their environment where perhaps humans aren't able to grasp all the data that's coming in, and make decisions as quickly as machine learning can. Now, how do we go about doing this? One of the foundational pieces of approaching the AI problem space was that we knew it had to be on an infrastructure that could run in a myriad of different areas.
So if you look at the bottom of this graph, we knew it had to run on physical, virtual, private clouds, public clouds, hybrid, as well as the edge. That had to be the baseline, where we needed to meet the customers where they were. On top of that, we also needed to bring hardware accelerators into the mix.
Some of the challenging machine learning initiatives that customers were embarking on benefit from the use of GPUs and FPGAs in that space, and also from being able to utilize self-service capabilities with the hybrid cloud. That's key with technologies like OpenShift and RHEL.
On top of all of that core infrastructure, that's where we started looking at where we need to work with the open source AI communities, as well as our partners, to provide the rest of that story. And so, when you look at the chart that shows a typical AI/ML initiative, it starts with setting the goals and preparing the data, goes all the way through developing and training the model, then deploying that model as a service, and getting some value and some data back from the model being generated.
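That lifecycle, from preparing data through deploying the model as a service and getting value back, can be sketched end to end. This is a minimal illustration under assumed tooling: scikit-learn for the model, a toy dataset, and pickling as a stand-in for handing the artifact to a real model server; it is not the actual product workflow.

```python
# Hypothetical end-to-end sketch of the AI/ML lifecycle described above.
import pickle

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# 1. Set the goal and prepare the data (a toy dataset stands in for real data).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 2. Develop and train the model.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# 3. "Deploy": serialize the trained model; a serving layer would load this artifact.
blob = pickle.dumps(model)

# 4. Get value back: the reloaded model answers prediction requests.
served_model = pickle.loads(blob)
accuracy = served_model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

In a platform like the one described in the talk, steps 3 and 4 would be handled by a model-serving service rather than a pickle round-trip, but the shape of the loop is the same.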
But data scientists also need to use the tools of their trade, tools like TensorFlow, Jupyter notebooks, Spark, and Python, and all of our partner technologies, to be able to solve their own challenging problems. And so when we did that, we not only opened this up for customers to use, but we also started using it internally ourselves, to bake all that intelligence I mentioned in the previous slide into our technologies and into our business processes, at the core of all of this.
Once we provided those tools, we realized what we were truly doing was democratizing access to the tools and democratizing the data for the data scientists. No longer are they burdened with having to know where all of the data resides. They have one platform that can run in all of these different data centers and all of these different cloud providers, and they don't have to carefully craft their machine learning models to only run on certain technologies.
But the key part of this is that all of the access to the tools and all of the access to the data is still governed by IT, yet in a way that gives data scientists their own self-service capabilities. They can spin up their tools and get access to their data without bogging down IT and having to work with IT to get all of these things done. IT can curate that process in the platform itself, and then the data scientists have the freedom to make the choices that they want.
When we looked at how we needed to provide the tools, it wasn't in just one space. We recognized that for a data scientist to use tools from beginning to end, from data ingestion all the way through to deploying their model, we had to work with partners that helped them along that journey. Some of those partners focus on data governance and security, some on data processing, some on databases as a whole, and then also on the hardware accelerators.
This is just a glimpse of the partners that we've worked with today, and there are many more to come. Now, I've talked about what we've done in the past; I want to talk about where we're going in the future. We're starting to transition from empowering data scientists with the hybrid cloud and democratizing the data, and now we're moving into improving the data science experience across the hybrid cloud. That's very challenging, but we're hearing from customers on their journey, and it's really resonating with what we're trying to do in the space as well.
We're looking at ways in which we can optimize data governance across the hybrid cloud. That's an interesting problem, because no longer are companies storing all of their data in one place; in fact, no longer are they storing it with one cloud provider. Everything is becoming fragmented because of the need to be as close to where the data is generated as possible. But it's also becoming fragmented because enterprises are getting so big, and there are so many tools out there, that different organizations are simply running processes and generating data differently.
But in order to get access to all that data, it's very key that we work with the data scientists to figure out the ways in which they're trying to bring that data together, and lots of efforts are going on right now to improve the services around that, as well as the technologies to break down those data silos.
We're also working with partners specifically to decrease the maintenance burden of the machine learning tools that they're offering, through automation, intelligence, and additional services. And this is key, because we don't want IT departments to be bogged down with maintaining all of the tools that data scientists need.
But if you can imagine building more intelligence into those tools, being able to know when things aren't quite healthy, and having self-healing or self-diagnostic capabilities, those are critical to having a platform that runs on its own. And so by working hand in hand with our partners, we're providing better tools for data scientists and for IT departments, tools that work in a way that provides more intelligence around what's going on.
The third thing I want to talk about is the area in which we're improving the usability of machine learning tools by minimizing infrastructure management. When I think of this, I think of the job of a data scientist: ultimately, it's not their responsibility to maintain the infrastructure themselves.
The ideal experience would be that they go in and use their tools the way they need to, but they don't care where those tools are running. It doesn't matter whether it's on-prem or in the cloud, and it doesn't matter whether it's Kubernetes or OpenShift or some other technology.
They just want a certain experience. And so now what we're looking at are ways in which we can abstract the infrastructure from the tools themselves, so that data scientists don't have to worry about infrastructure management. There's some exciting work going on there, happening both through the platform itself and through looking at ways in which we can provide a better managed experience for the customers.
Now, another area in which we're innovating and working with data scientists is the need for bringing AI to the edge. This is interesting because we want data scientists to have the capability to train at the core and then deploy at the edge.
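One way to picture "train at the core, deploy at the edge" is a single model artifact trained centrally, then replicated to many edge sites that each run inference locally. This is a hedged sketch: the site names are invented, and a pickle round-trip stands in for whatever registry or delivery mechanism the actual blueprint uses.

```python
# Hypothetical sketch: train once at the core, serve the same model at many edge sites.
import pickle

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Core data center: train a single model on centrally aggregated data.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
artifact = pickle.dumps(RandomForestClassifier(random_state=0).fit(X, y))

# Edge sites (invented names): each receives the same artifact and serves
# predictions close to where its data is generated.
edge_sites = ["factory-eu", "factory-us", "factory-apac"]
results = {}
for site in edge_sites:
    local_model = pickle.loads(artifact)          # in practice: pulled from a model registry
    results[site] = int(local_model.predict(X[:1])[0])  # stand-in for local sensor data
    print(site, "->", results[site])
```

Every site loads the identical artifact, so the same input yields the same answer everywhere; the hard parts the talk alludes to are distributing, versioning, and updating that artifact across a vast ecosystem of clusters.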
This is very critical for some of the workloads where customers have data centers all over the world. Traditionally, it's been a challenge for data scientists to build a model in one place and deploy it into a vast ecosystem of clusters.
Now they have the capability of doing it in many, and you can follow along with that project; down below you see the link. We call it our blueprint for industrial edge and industrial manufacturing. The last thing I'll talk about today is a really interesting project that we have going on. It's called Operate First, and it's in conjunction with the Mass Open Cloud.
If you're not familiar with the Mass Open Cloud, it's a public cloud where the industry, together with a lot of research institutes, has worked to build out a cloud where anyone can go in and collaboratively do work. Now we've extended our philosophy at Red Hat of how we do open source technology, and we've moved into the space of operations.
That's operations in an open public cloud, for anyone to take a look at and anyone to get involved in, and we're bringing machine learning to that environment so that the data scientists, the operations teams, all of the stakeholders, and the application developers can all work together. Even our partners work together, in one open way, so that we can enrich and better the AI community.
So there are some fascinating things going on in that space as well. It's a great test bed for new technologies and new concepts that companies and open source communities are working on, and it's also a great way for the data scientists to provide a feedback loop of what they need, so that the companies participating can listen and help create more technology to fulfill those needs. You can look at the URL below as well to see what's going on in that community.
So that's just a few of the things that are happening, but it's really exciting. I'll end with this note: innovation, specifically in the AI space, happens when we work together. That's why we're really focusing on our open source communities and on how we can work together to take things to the next level, to really look at what data scientists need, and to help with that journey.