Description
Join data experts Chris Blum and Michelle DiPalma, and data novice Chris Short, every other week for a hands-on Office Hour about Red Hat OpenShift Data Science. Come ready with your questions, and expect to learn a few things along the way.
A
Good morning, good afternoon, good evening, wherever you're hailing from. Welcome to another edition of the Data Services Office Hour. If you noticed, the name changed during the show, so we're going to talk a little bit about that today. I'm Chris Short, executive producer of OpenShift.tv, and I'm joined by Chris Blum. I'm very happy to have both of you here, and very happy to have learned that pronunciation this morning as well. So thank you very much for the lesson in French pronunciations this morning.
C
Yeah, so some of you, our true fans that always watch the office hour, are here, and they probably remember me. I was gone for a little bit because I live in Berlin, Germany, and a lot of people have decided to work from home, and that kind of overloaded the internet capabilities of Germany. So I was limited to 0.4 megabits upstream, and so I said:
C
Nope. So right now I've spent three months with my aunt, very remote. I don't have personal internet anymore, but we do have fiber internet here, 100 megabits, synchronous, and so I'm able to come back. And I hope you're not too sad not to see Michelle today.
C
Yes, she will come back. So that's not the only change, that I'm back. We also changed the name: we changed from the OCS Office Hour to the Data Services Office Hour, and that's not just a name change. It's also a change in our focus.
C
Previously we always talked about storage, storage, storage, and a lot of people just think about OCS as the solution to provide their persistent storage, their PVCs, and just for that single use case: just do the dumb stuff that a lot of other people can do. And it was sometimes difficult in these conversations to really position the true values of what our product does.
C
So with the name change to Data Services, we're going a step further. We're not just talking about storage, because that is a problem that has been solved over and over by multiple companies. We want to show you that we're thinking one step further; we're not just trying to provide some kind of storage capacity.
C
We have two parts of this. We have, let's call it, the old part: OCS is now renamed to ODF, OpenShift Data Foundation. And the other part, which is why we have Guillaume here now, is that we want to talk a little bit about data science, doing things on top of the storage, and that involves AI/ML workloads. We can do fun stuff with JupyterHub: just prototype something in Python, let it run. Guillaume will talk a little bit about that later.
D
Awesome, yeah. And that's the idea here, if I may: it's about changing the perspective on what we are proposing. It's not only about storage, which is the implementation, how you really do it, but much more about the business value. What can you do with the storage? Which, in fact, is: okay, I will work with my data for data science, or purely data for your applications, or things like that. But of course, at the end of the day, it's the same thing that is running.
D
You have to store your bits and pieces inside some storage, but now we want to lean a little bit more on this aspect: how can I use my data? How do I integrate my data inside my overall architecture, and not only leave storage at the end of the chain, as something that you don't even consider until you really need it? It has to be part of your architecture.
A
Right, like you can't just dump everything in one place, right? We need to think about this the way we used to think about how we partition disks: we put the boot volume at the very front. You have to think about all of your data, not in the same light, but you've got to think about it: what am I going to do with it? Is it just going to sit here? Do I have to keep it for regulatory reasons?
A
Can I do anything else with it while it's there? How do we manage all that? So the entire engineering effort behind Data Foundation, I think, takes the name and actually applies values to it. Chris, you mentioned values, and it's a very interesting proposition here. I like it a lot.
C
Yeah, so the Foundation is literally a foundation. What we had before is now the foundation for what we can put on top of it, and what we want to talk about is use cases. You tell us, more high level, what you want to achieve, and then we can talk about how ODF, or the data services, can support you in doing this. And one of the consequences is:
C
We just released a new version, ODF 4.7, and with that we also thought a little bit more about the pricing, so the pricing will be a lot easier to calculate with this new version. And we're starting on this data services approach, where we add things onto our foundation that you can then use. Beautiful.
C
Yeah, that's important. Someone needs to ask the hard questions. If you have any more hard questions, just write them in the chat.
C
We just released it; I think yesterday was the GA of 4.7. So in the last 12 hours it has worked... in the last 12 hours, we've killed it! Yes, awesome. But obviously, before the GA, we had a lot of internal discussions about this. We wanted to actually understand:
C
Are we doing the right thing? Do people want this? Do people understand this? There was a lot of conversation about how we should position this, how we should do it. And in those conversations, when people really understood what we want to do... obviously there were a lot of people that were sad that we were leaving that term "storage." A lot of people look at this and say, well, now we're not storage anymore.
C
What does that mean? It's a new term; it needs getting used to. But once it sinks in, and that's what we've seen internally here, people understand that we were limited before. We were limited to being a storage department that just cares a little bit about disks: how to partition those disks, how to make them available, how to be fast storage, or storage that only needs few resources. But now we can actually drive our conversations further. We can talk further about: hey, customer...
D
So it's also a change in the people we want to talk to: not only the sysadmins and the storage admins, but going a little bit broader, to the architects, the solutions architects, the CTO, the CIO. If you speak to a CIO about storage, they will say: oh no, that's the thing for my IT admins; I don't really care about storage.
D
Now, if you're talking about what you can do with the storage, what you can do with the data, then you have their attention. And anyway, I have experienced this shift over the past 10 years, with all the IT infrastructure components becoming more and more commodities, especially with the cloud. It's like this:
D
Let me take you back 10 years, back when I was working at Laval University. When we were out for an RFP for new servers or something like that, we, the architects team, would spend hours looking at the bus architecture, the processors, how it's handled, everything. Fast forward 10 years: oh, just bring me a server, an HP or whatever, I just don't care, because that's not relevant anymore.
D
What has become relevant is the containers that you are able to reschedule automatically, or your VMs, or things like that, but not the infra itself. I'm not saying it's not important, okay, but it has become so easy: oh, I can have a server from AWS, from Azure, or even on-prem; now I have all my pipelines to deliver VMs on my internal cloud, or things like that.
D
It's not really the subject anymore. Now the subject is: how can you deliver this to your devs, to your people, for them to be able to use it in one hour? Because that's what you are competing with, with AWS. Ten years ago, for servers: oh, you want a new server? Yeah, call us back in three weeks, because we have to order it, and then it will be delivered, and then we have to rack it and connect it and everything.
D
It's obvious for servers, but the same thing has been happening for storage. Storage is becoming a commodity, because: oh, you want object storage? Just go to AWS S3, and you have object storage. That means we have to deliver exactly the same experience, therefore leaning more on that aspect: what do you do with the storage? Of course, we will continue to speak and work really closely with the storage people, the pure storage people, because at some point you have to do this.
C
The biggest change is actually an area that Guillaume missed, right? Okay, so internally in Red Hat we merged two teams: what was, for me, the storage people, and then we got the data science people on board too. So the biggest change that you can see today is that the data science part has been added, and Guillaume can talk a lot more about this. The other thing is that now, with the 4.7 release, we're starting; it's now in ODF 4.7.
C
We started looking at the DR things, so this will gradually improve now, and you get a little preview of it in 4.7. 4.8 will already be a lot better, and then we're looking at the releases afterwards, where we can really release that. Yeah, so that is now available, but the biggest change is the data science part that has been added.
D
Nice, yeah. And it's not that we changed everything last week or something; the announcement was yesterday for the official new name. It's not about the changes that we put in the product, because this has been happening for a few months: adding more features towards this ease of consuming storage, or being able to deliver data services. So we've been doing that, and we will of course continue to do that.
D
That means integrating more things directly into the OpenShift console, into the OpenShift UI, so that it's easier for people to work with storage, especially as a dev. In fact, you don't even want to talk about storage; you just want to put your data somewhere.
D
I only want to talk about an API and an SDK. The rest is, in fact, not my skill set as a developer, and I don't want to learn more about it, because I have tons of other things to learn that are directly linked to what I do. Storage, again, is a commodity from this point of view. So we have already brought, and will continue bringing, more of those features, of those integrations, inside OpenShift, as much as we can.
D
Let me give you an example. In Ceph, since last year, we have this feature called bucket notification in object storage. That means whenever something is happening on your bucket (of course, you configure it), let's say you have uploaded a new image or something, this bucket has the ability to send a message, a notification, to an endpoint.
D
It's such a simple message, saying: hey, this file with this name has just been created inside this bucket. And we can send this message to different endpoints: an HTTP REST API, Kafka, MQ messaging. And then you are able to act upon this event. Okay, so that's the first illustration where you bring data in as an intelligent thing within your architecture, because now it's part of your event-driven architecture.
D
It's not just something where you dump your data and retrieve it when you need it; now it can totally be part of the architecture. And this feature, well, it's not that difficult to configure bucket notifications; it's pretty standard, and we reuse the same mechanisms and protocols as you have in AWS S3, so all the SDKs that are out there work, and everything. But still, it may be difficult for some people, so there is work there.
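As a rough sketch of what he describes, the S3-style notification configuration can be built like this; the bucket name, topic ARN, and endpoint are illustrative (not from the show), and the actual call, shown commented out, would go through an S3 SDK such as boto3 against the Ceph RGW endpoint.

```python
# Build an S3-style bucket notification configuration. Names here are
# illustrative; against Ceph RGW you would send this with an S3 SDK,
# e.g. boto3's put_bucket_notification_configuration.
def build_notification_config(topic_arn: str) -> dict:
    """Notification config that fires whenever a new object is created."""
    return {
        "TopicConfigurations": [
            {
                "Id": "new-image-uploaded",
                "TopicArn": topic_arn,
                "Events": ["s3:ObjectCreated:*"],  # any kind of object creation
            }
        ]
    }

config = build_notification_config("arn:aws:sns:default::images-topic")

# With boto3 (not executed here), the call would be roughly:
# s3 = boto3.client("s3", endpoint_url="https://rgw.example.com", ...)
# s3.put_bucket_notification_configuration(
#     Bucket="xray-images", NotificationConfiguration=config)
```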
D
There is work going on right now to bring this as a configuration part of the YAML definition, so a purely native Kubernetes way of programming things: oh, I want to have bucket storage and I want it to send events to this endpoint. Three lines of YAML, bam, you have your object bucket, you are able to work with it from your applications, and then you create this event-driven architecture.
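Requesting a bucket declaratively already works today through an ObjectBucketClaim; a minimal sketch might look like the following (the names are illustrative, and the notification wiring he mentions is the part still in progress, so it is not shown):

```yaml
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: my-images            # illustrative name
spec:
  generateBucketName: my-images
  storageClassName: openshift-storage.noobaa.io
```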
D
You don't even need to know how it's implemented behind the curtain. You don't even know if it runs on Ceph or whatever else, or how many nodes or replicas or things like that. Normally, your IT team is supposed to take care of that and provide you with the performant, scalable storage that you need to work with. That's the kind of thing that is happening.
C
So, in addition to what Guillaume said, one of our goals is to keep ODF in line with the goals that we already have with OCS, where you have an interface.
C
It's very simple to use and very integrated into the OpenShift experience. A lot of the other products that can give you storage on OpenShift platforms might not even be written for OpenShift; they're written for Kubernetes, and they sometimes work with OpenShift, and then there's a new version and there are compatibility issues.
C
ODF is developed primarily for OpenShift, and it works with OpenShift. It's deeply integrated; you get dashboards. And even though we add more and more features (we have a ton of features in Ceph, in the back end, that we can port to ODF), we still want to have that ease of use, so that you don't need dedicated storage people who have to understand it. It's just there; you can use it, and your regular people can use these products easily.
A
Regular people using the product easily: that's music to my ears. It really is, I mean, because we all know I'm the storage idiot on the show, right? And the data science idiot. I've helped the data science people embrace containers; now, can the data science people help me embrace some of what they're doing? That'd be awesome, yeah?
C
This was in the first couple of shows, if you want to go back to the archive, where Chris himself installed OCS, and it worked.
A
It's one of those things where I'm patiently just waiting for the next cool thing to come out about it. It's maintaining itself; it does what it does; it's an operator, it's going to handle things for me. I'm just waiting for cool features: is there anything else I can do with this? That is always the thing I think about: okay, the cluster's here, it's doing things for me.
A
Can it do more things for me? And that's what most people do with their infrastructure too: we'd like to take on a new project; can we do it with what we've got, or do we need to add something else? A lot of people sit there and think that, and it sounds like ODF is going to have some foundations in data science that will have folks thinking.
A
So where do folks go right now to learn more about ODF, data services, that whole gamut of things? I dropped one link in here that I found, OpenShift Data Foundation from the technologies section, but it's just a high-level overview. I'm assuming the docs and everything have been updated? What else?
C
Exactly. So we just updated our access.redhat.com site; let me just fetch the link for you. We try to make it more obvious there what we talk about, what data services is. And yes, you'll find the documentation there; you'll learn how to do it, and there's going to be a lot more material out there in the next couple of days that will talk about data services, how it is positioned, all the things that I talked about earlier.
A
So what are you most excited about, looking forward? Now folks, this is us talking about the future: there are no dates, no times being promised here, just keep that in mind; future talk is happening right now, so I'm not saying that this is promised in the next release, or promised ever. What are you most looking forward to as part of this change? I know it's bringing people together, which is always good. It's changing people's...
C
So you have an application, and the object storage can enable you, by adding new features, to actually do that. And now that we have the object bucket notifications, we can deliver that on all platforms, no matter where you run. Previously, maybe you only had that available in AWS, not on bare metal or anywhere else; now ODF follows you wherever you want to go.
C
The underlying technology is already in Ceph, so it's nothing new. It's not like we go out and say, okay... it's quite complicated to synchronize data across an internet link, and sometimes you want to do it synchronously.
C
So it's updated immediately on both sides. Sometimes you want to do it asynchronously. That difficult part is already handled, and it has already been used by customers, so even though it's a new ODF feature, it's not like you have to be careful or afraid to use it. But we want to make the user experience great.
C
Right now it's at that phase where Chris Blum can do it, and then we want to get it to the stage where Chris Short can do it, because it's easy, it's in the UI, and we have specific Kubernetes DR objects that we can use to describe how we want to do the synchronization. That's what I'm looking forward to, and that's also an area where talking about data services takes us.
A
So, talking hybrid here, let's think hybrid, since you've mentioned cross-cloud, or on-premises to cloud. What are the advantages of putting ODF across a fleet of clusters where data scientists can access it easily? There's a team over here, a team over there, one big bucket of data that they use: what is that experience going to be like for everybody using it? Like, if I'm pulling up a Jupyter notebook as a data scientist?
D
Well, for data scientists, if you work inside your Jupyter environment, you're already one layer up, so you shouldn't be concerned about storage. And there are different things you can do there. I guess the main interesting point brought by ODF is that it brings all three different types of storage you will need to make data science or data engineering happen. Okay, I'll take a first example.
D
There is a team in Ontario that I helped build a data science platform for their COVID-19 research. It's a loose group of 300 researchers from different organizations, the different ministries and agencies in Ontario, and they grouped together as a community to work on the data that was available for COVID-19. Short story: they were kind of fed up with the way the government was publishing the data, which was not really useful... well, the data was useful, but not for researchers, because it was not raw data.
D
It was not updated in the right way, so they took it upon themselves: okay, we'll do this data aggregation, data scraping, and recreate data sets that we can really work with. So I helped them set up this Open Data Hub environment, this data science platform environment, and they had these specifications: they wanted to be able to share notebooks, and they wanted to be able to share data between each other. How do you achieve that?
D
Normally, when you launch a notebook with JupyterHub, it's connected to your storage, but that's your storage, your own stock. But with ODF, oh no, we also have file system storage with CephFS. That means we are able to have those RWX volumes, meaning volumes that you can connect to multiple pods at the same time, and from this you can build a shared library, a shared library of notebooks or a shared library of data. That's the first step.
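In Kubernetes terms, such a shared volume is just a PersistentVolumeClaim with the ReadWriteMany access mode backed by CephFS; a minimal sketch follows (the claim name and size are made up, and the storage class name is the usual OCS one but may differ in your cluster):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-notebooks
spec:
  accessModes:
    - ReadWriteMany                    # RWX: many pods can mount it at once
  resources:
    requests:
      storage: 50Gi
  storageClassName: ocs-storagecluster-cephfs
```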
D
The second step is: oh yeah, but we want to be able to access all that data from many different points, and interconnect all those things together. Then object storage is much more suited for this kind of thing, and generally we tend to see more and more people shifting to object storage for this exact reason. It's easy to work with now; it's built into most of those scientific or data science libraries. And because of this disconnected mode, you know, it's not something...
D
It's not a file system that you mount on your server; it's only an HTTP request that you can make from wherever you are in the world. So this disconnection between your notebook, or the container or VM that is processing the data, and the storage makes it really well suited for data science environments. And that's also brought by ODF, because ODF has object storage as well. And then, at some point, you will need a database. Oh, for a database...
D
I would use block storage, because I need this block approach and an intensive-workload approach. Well, it's still ODF. So you see, that's where I find it interesting, because, granted, there are many different storage vendors that have fantastic offers in block or in object storage, but usually they don't have this fully integrated approach.
D
All across the board of storage, which is what you get in ODF. As we said, even you, Chris, are able to deploy it in a few clicks, and then you have file, block, and object, and then you are able to do mostly whatever you want, depending on the use case. Because data science and data engineering are exactly about this: you're always reinventing something, because the context changes, or you want to use new things or test new things. It's really different from a standard application.
D
Let's say I'm an insurance company and I want to do the architecture for my new application. Well, I will work a few months on my architecture; I will say, okay, I need this type of storage; I will go out, buy it, and then I bring everything in, and it will stay the same for five, ten years, right? Okay. That's not true in data science. In data science, what you are implementing now did not exist six months ago and will be obsolete six months from now.
D
So if you don't have this agility, being able to pick and choose the different types of storage that you need, or recreate architectures easily by just using PVCs (persistent volume claims) or object bucket claims, or things like that, it begins to get really, really difficult to work with. So again, I think the best thing is ODF being fully integrated into OpenShift.
D
That totally makes it the platform of choice to set up those data science environments. Plus, you're totally agnostic of the real infrastructure that is underneath, meaning whatever you are creating in AWS or Azure as a test: you know, you're trying your things just to learn more, or maybe you have a subscription to RHODS to begin to use Open Data Hub and OpenShift Data Science, and okay...
D
You see it fits my need, but I want to be able to do something on-prem. Yeah, you can totally do the same thing on-prem, because you're not tied to the specific storage that is brought by AWS, or, when RHODS will be on Azure, you're not tied to the specific storage that will be brought by Azure. So again, it's about flexibility, and I guess that's our main strength here.
D
Yeah, I can show you some of the things I'm doing. Let me share my screen.
D
Basically, a notebook is a web interface that connects you to a kernel, a kernel being the engine that will run your code. Okay, so here we can see I'm in my environment, so again a fully web environment, and I can see that I'm connected to a Python 3 kernel. That means whatever I run inside my notebook will be run against this kernel, and this kernel doesn't run on my computer; it's running on the cluster, on the OpenShift cluster, in the container that I've launched.
D
That's the first advantage of setting up this data science platform on top of OpenShift, because that means you can bring to your users the full capabilities of a cluster. I could do this from my iPad; it would work exactly the same way, but the code that I run will run on this cluster, maybe with 8 CPUs and two GPUs and 32 gigabytes of RAM, whatever I don't have on my iPad. It will still run. And here is the way it works with notebooks:
D
You enter your code into cells like this one, and this is a Python cell, this is Python code, okay, and you are then able to run those cells independently. So I will run the first one: I click on Run, and I have the result here. "This is what you entered: Hello world." Very basic, but it has run only this cell.
D
Now I want to run the other one. Perfect, and then it has run the same function, the function that I had created in my first cell, but with a new text. Okay, so it's an interactive way of developing your Python code. This is basic, and of course you can take notes: you have cells with code and cells with Markdown, and you can create your environment.
D
Yeah, that's why it has become so popular with data scientists. And it's called a notebook because that's exactly what you would do as a researcher doing experiments. You have your research notebook and you take notes: okay, here I'm running experiment number one with these parameters. You run the things... I don't know what you do; you're in chemistry, you mix up different liquids and see what happens, and you write the results there. It's exactly the same thing I'm writing here. Okay.
D
Let me first switch back to this view to give some explanation.
D
Okay, here it's working in this way: I have X-ray images that I am sending into a bucket, an object storage bucket. But because this bucket has been enabled with notification, every time I am sending a new image, it will send a notification to Kafka, to a Kafka topic. Nice, okay. And I have here in my OpenShift environment (everything runs in OpenShift in my environment) this, which is a Knative Eventing component.
D
It's listening to this Kafka topic, and whenever some message comes in, it will send this message here: I have a Knative Serving component with a serverless function, in which I have built my model, my AI model, that is able to recognize the risk of pneumonia, and in this container I'm making this risk assessment.
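A rough sketch of what such a serverless handler might look like, assuming the S3-style event is forwarded as JSON by Knative Eventing; the field names follow the S3 notification format, and the model call is stubbed out, so this is illustrative rather than the code from the demo.

```python
import json

def assess_pneumonia_risk(bucket: str, key: str) -> str:
    """Stub standing in for the AI model inference in the demo."""
    return "unknown"

def handle_notification(body: bytes) -> dict:
    """Parse an S3-style bucket event and run the (stubbed) risk assessment."""
    event = json.loads(body)
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]
    # In the real demo this downloads the X-ray and runs the AI model;
    # here the stub stands in for the prediction step.
    risk = assess_pneumonia_risk(bucket, key)
    return {"image": key, "risk": risk}
```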
D
Okay, and then I will save the results into a database. But here you see we have this workflow where we go from: okay, I'm sending my data to my data repository, which is an object bucket, which is taking part in this overall architecture, where it sends a message to Kafka and then to my risk assessment container. And we can see it live on this dashboard we have here. I started the generator a while ago, so I'm sending all those images into my object bucket.
D
I have here this counter that will count the images coming in. Then... (I think you need to zoom in a little.) Yeah, but it will mess up the layout a little bit.
D
Okay, but let me describe it; it's what I described here. I'm putting everything into my object storage, it's sent to a Kafka bus, and here I have my container that is doing the risk assessment. I have this counter.
D
The model is not always able to recognize exactly if there is a risk or not, so for further processing, the images are first anonymized and then sent to another process. And here I have all that data about the last images that were recognized, and so on, with the images themselves. But it's just to illustrate how you go from this one (oops, this one), which is your data science development environment. And here again, you are leveraging different things.
D
You are leveraging the block storage, because that's where my notebook is residing. You are leveraging object storage, because that's where the data set, with my 6,000-something raw images to train the model, resides. And that's a small data set; sometimes the data set that you have to work with is 500 terabytes of data.
D
Of course you don't put that on your USB key, and that means you have to have those bigger environments. That's where OpenShift plus Ceph, with its scalability, comes into play, because you can have those 500 terabytes of data residing with no problem in Ceph. And then you can have hundreds of data scientists using this central data set in object storage.
D
So again, that's what I find really interesting with the business proposition that we are making here. It's the same OpenShift plus ODF platform that you can use both for your day-to-day data science development and also for application production. You don't change your environment, and it's totally portable. So let's... yep.
A
That's incredibly powerful, right? To train models and be like, okay, taking this a step further: this patient had COVID, this patient didn't, what's the difference? We're going to have to get through this pandemic; there's going to be some aftermath. Something has to happen for these people that are dealing with the after-effects of COVID, and research is being done there.
A
My wife just told me the other day that some group in Europe developed something from mRNA, just like the COVID vaccines, but it's pandemic-agnostic; it doesn't matter. So it's like: okay, great, how did you do that? What data did you consume to figure out that you could create a vaccine to fight any coronavirus?
D
Yeah, you know, that's why data science has been on the rise for the past few years: because now we have the capabilities, the processing power, the techniques; we have everything to be able to train those models, to do real AI/ML. The mathematics part of this is really old, 30 or 40 years old, but until the mid-2010s we didn't have the real means to be able to leverage that. That's not true anymore.
D
That is easy, but there are many other things that were tried for COVID-19. For example, someone trained a model where you just cough a little bit on the phone, and it's able to detect if there is a risk or not. Here it's the same: it's about having those thousands of samples of people coughing, and training a model to be able to detect what the human ear cannot.
D
Obviously. So these are the tools that we are bringing, that have been brought into the world, that for the past few years were reserved for some specialists, and it was really difficult to use them, really difficult to implement.
D
Now it's a little bit more mainstream, and by bringing it on top of OpenShift, it's even more mainstream, because it's the standard platform that you may already have in your enterprise. Most customers I'm working with already have some OpenShift installation or some OpenShift knowledge, and now they are interested in this data science thing: oh yeah, we have this data, and maybe we think it will be useful; how can we do this? Well, you already have OpenShift.
D
C
Yeah, you notice, like, I have a bigger data set than I expected. Because of OpenShift, you can use a machine set: you scale it out with a different instance type that is bigger. You don't need to touch anything, because OpenShift is handling all the installation, and once you're done, you can get rid of it again. Yeah.
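Scaling out with a bigger instance type looks roughly like this. The MachineSet name, namespace defaults and instance type below are placeholders, not from the show, and the manifest is abbreviated to the fields that matter here; a real MachineSet also carries selectors, labels and a full providerSpec.

```yaml
# Abbreviated, hypothetical MachineSet sketch -- names and instance type
# are placeholders; only the fields relevant to this discussion are shown.
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: bigdata-workers            # placeholder name
  namespace: openshift-machine-api
spec:
  replicas: 3                      # raise for the big job, drop to 0 afterwards
  template:
    spec:
      providerSpec:
        value:
          instanceType: m5.4xlarge # a larger instance type for the bigger data set
```

With something like this in place, scaling out and back in is a single command each way, along the lines of `oc scale machineset bigdata-workers -n openshift-machine-api --replicas=3` before the job and `--replicas=0` once it's done.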
D
And I have customers I'm working with, they're doing exactly that. They have those huge processing jobs to do periodically, every 24 hours; you know, it takes tens of machines to be able to run that. But of course, as it runs in the cloud, they don't want to keep it, you know, running.
D
Now it's part of the workflow: at the beginning of the process they will just increase the machine set, it will spawn some new instances, then they will launch the process using those data science tools, you know, Spark and the rest. They will do the processing, it takes a few hours, and then, when everything has been done, they just, you know, scale down the cluster, and they save a lot of money.
A
I
mean
this
is
really
reminding
me
of
a
time
where
I
worked
for
a
financial
services
like
marketing
company
and
the
data
science
team.
We
were
having
so
many
problems
with
like
infrastructure
and
all
this
other
stuff
right,
like
oh,
my
model,
didn't
finish
running
before
the
spot
instance
shut
off,
and
now
I
wasted
all
that
time
and
money
right
so
openshift
like
it
puts
all
the
power
in
the
people's
hands
is
what
it
feels
like
right
like
I
don't
have
to
worry
about
some
other
team
or
some
other.
You
know
configuration
touching
my
workloads.
D
Right, like, and it's, you know, really close. We've known for a few years now all the benefits that OpenShift can bring to development, okay, in general: all this flexibility, agility and everything. It's about bringing the exact same advantages to data science. It's really well suited, now that most data science tools will run in containers.
D
That's perfect, and when you add Ceph with ODF to the mix, then you bring the scalability and the performance that you need for data science, because it's not only about, you know, storing a little data here and there. Now, more and more, people are talking about petabytes of data, and petabytes of data that have to be processed in as small a time as possible, meaning you have to have performance on the storage part, and that's where Ceph shines.
D
You know, especially with the predictability of performance: this perfectly straight line, where the more capacity you add, the exact same performance you get. That's really important in data science. You don't want to be like, okay, now that I'm reaching over one petabyte of storage for my specific stuff, the performance is totally dropping because the storage is not able to cope, to keep up with it. We don't have those kinds of issues with Ceph, so it's kind of bringing the best of both worlds, storage and Kubernetes, to data science.
D
That's
why
I'm
so
excited
you
know
to
work
with
it.
It's
yeah,
perfect
patch,.
C
Yeah,
but
so
in
my
daily
life,
I'm
not
actually
handling
a
lot
of
big
data
or
I'm
not
wearing
lab
coats
or
anything.
So
one
thing
that
I
want
to
mention
about
jupiter
hub
is
it's
not
just
to
to
do
what
jim
showed
us?
You
can
also
do
regular
development
in
it,
and
maybe
chris
you
can
share
in
the
chat,
a
link
that
I
just
said.
C
There's
like
a
list
of
all
kinds
of
kernels
that
you
can
use.
You
don't
mention
it
in
the
beginning.
The
kernel
is
the
language
that
you
write
in
your
notebook
and
there
are
kernels
for
pretty
much
anything
I
like
to
to
see
that
there
are
go
kernels,
so
you
can
write
your
go.
Applications
in
the
jupyter
notebook
in
your
browser,
share
it
with
anyone
or
one
thing,
that's
very
popular
and
that's
pretty
cool.
C
Is
you
have
an
ansible
kernel
so
if
you've
ever
written
an
ansible
playbook,
you
know
that
it's
hard
like
you,
you
write
it
and
you
want
to
have
it
so
that
you
can
repeatedly
run
it.
You
want
to
test
it.
You
want
to
document
it.
You
can
start
writing
your
ansible
playbook
in
jupiter,
notebook
and
test
it
in
there,
and
then
you
can
immediately
see
what
it
does,
what
the
output
is
and
all
of
that
that's
pretty
cool.
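A playbook you might iterate on cell by cell could start as small as this. The play below is a generic sketch, its hosts, task and message are made up for illustration, not one of the playbooks mentioned on the show:

```yaml
# Hypothetical minimal play to iterate on in a notebook cell --
# everything here is illustrative, not from the episode.
- name: Notebook-friendly demo play
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Print a message so the cell shows immediate output
      ansible.builtin.debug:
        msg: "Hello from a notebook cell"
```

Running the cell executes the play and shows the task output right below it, which is exactly the write, run, inspect loop being described.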
D
It's funny, because, you know, hardcore developers will always swear by their own IDE, you know, but when you come from a different background, or you're not, you know... I'm not a full-blown developer, that's not what I do. I have the same approach as Chris, you know, taking the best tool depending on what you want to do, and for Ansible, I've never done this before, but I have tons of Ansible playbooks to rewrite to deploy those demos into our RHPDS, so I totally see the point there. Oh no.
C
And you can document it in full markdown, so it's also great if you want to teach someone a certain language, or Ansible, whatever. There's also a bash kernel, so if you want to teach all those millennials what you can do in bash, then you can write a notebook, make it fancy with the markdown, and tell them exactly: hey, this is a for loop and that's how you do it. They can run it and immediately see what it does, what the output is.
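A notebook cell of the kind being described might look like this; the loop and its items are just an illustrative teaching example:

```shell
# A cell for a bash-kernel notebook: a basic for loop, with the
# "this is a for loop" explanation living in a markdown cell above it.
for fruit in apple banana cherry; do
  echo "processing ${fruit}"
done
```

Running the cell prints one `processing ...` line per item, so learners see the output directly under the code.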
A
So for folks that aren't aware, I have an intern this summer, and I'm very happy about that, because he gets to take notes and tell me what I'm doing wrong, because he has production, like, this kind of production experience in his background. Or not this kind, but movie production experience. So I'm sure he's blushing or whatever in the background, but yeah, I like talking about my intern. So yeah, some mood music as I'm searching for data here... AI/ML, is that...
C
Stuff in it. I like them, because I have this app on my phone: when I was a student I didn't always have everything in my kitchen, so sometimes I only had a scale. So I wanted to know, okay, how much does 100 grams of flour weigh, right? Yeah.
C
Milliliters of milk, is that, like, whole milk?
A
Exactly. All right, I mean, let's not belabor the point; we are approaching the top of the hour. Is there anything else we want to talk about before we sign off? We don't have any questions in chat, or at least I haven't seen any. I hope I haven't lost any by just not looking at YouTube and Twitch directly. Okay, no, I haven't. All right, so, yeah, anything you want to sign off with?
D
I
would
reiterate
that
you
know
for
the
the
part
I'm
working
on,
which
is
data
science
and
data
engineering.
The
important
thing
you
have
to
consider
when
building
the
thing
is
the
platform.
Okay,
it's
not
the
tools
only
by
themselves.
The
tools
are
easy
to
figure
out,
but
it's
a
platform,
and
here
running
those
kind
of
workloads.
D
You
know
aiml
or
statistical
workloads
or
pure
data
analysis
on
top
of
openshift,
with
everything
that
got
with
it,
you
know
odf
all
the
other
components
that
we
have
several
ass
and
and
so
on,
that
that
makes
a
great
platform.
So
that's
that
would
be
my
takeaway
from
this
beautiful.
A
I appreciate your time today, as always. Later on the channels, at 11 o'clock Eastern, 1500 UTC, we're going to be talking about the value of GitOps, and we're going to have some guests on, so please tune in for that. And until the next data science, or rather data services, office hour, we will see you then. Stay safe out there, everybody, for real.