Red Hat OpenShift Copenhagen 2018 | OpenShift Commons Gathering, 1 May 2018


From YouTube: Lightning Talks: Diane Feddema & Zak Hassan: Red Hat


A

Okay, so Zak and I are going to talk briefly about machine learning. We're both practitioners: we're using machine learning on Kubernetes right now, with the s2i tools that were talked about earlier. We are part of the radanalytics.io team, which is creating the tooling to make it really easy to run these machine learning algorithms and include them in your pipeline on OpenShift. So this is a really simple overview of the software stack: OpenShift, then our radanalytics tooling on top of that, then Apache Spark, which Zak will talk about next, and then your application.

A

Your application could be something like an online retail site. In my case, I have an application for running all of our performance tests, and I've added an intelligent portion to it: a machine learning component, which improves the user experience and does some prediction for me. So Zak will tell us a little bit about Spark now and what it does.

B

So we built an analytics platform on top of OpenShift, and Apache Spark is the core engine for our analytics. It comes with different APIs: you can use machine learning, streaming, or graph processing, as well as Spark SQL. It also comes with lots of language bindings, so if you want to do your work in Python, Scala, or Java, there are s2i builder images that you can use.

B

The benefit of using Spark is that it performs optimizations, it's lazy by default, and it has lots of other features. Think of your data being partitioned across many machines, and then being able to query that data and do other things with it as well.
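
The "lazy by default" and "partitioned across many machines" ideas can be sketched in a few lines of plain Python. This is a toy model, not Spark's API: the `LazyDataset` class and the `partition` helper are hypothetical names made up for illustration, and the "machines" are just in-memory lists. Transformations are only recorded; nothing runs until an action (`collect`) is called.

```python
from typing import Callable, List, Optional, Tuple


class LazyDataset:
    """Toy partitioned dataset: records transformations, runs them only on collect().

    Hypothetical illustration only; this is not Spark's API.
    """

    def __init__(self, partitions: List[list],
                 ops: Optional[List[Tuple[str, Callable]]] = None):
        self.partitions = partitions
        self.ops = ops or []

    def map(self, fn: Callable) -> "LazyDataset":
        # Transformation: returns a new plan, does no work yet.
        return LazyDataset(self.partitions, self.ops + [("map", fn)])

    def filter(self, pred: Callable) -> "LazyDataset":
        # Transformation: returns a new plan, does no work yet.
        return LazyDataset(self.partitions, self.ops + [("filter", pred)])

    def _run_partition(self, part: list) -> list:
        # Apply the recorded plan to one partition, in order.
        for kind, fn in self.ops:
            if kind == "map":
                part = [fn(x) for x in part]
            else:
                part = [x for x in part if fn(x)]
        return part

    def collect(self) -> list:
        # Action: only now is the plan executed, one partition at a time.
        out = []
        for part in self.partitions:
            out.extend(self._run_partition(part))
        return out


def partition(data: list, n: int) -> List[list]:
    """Split data into at most n contiguous chunks."""
    size = max(1, -(-len(data) // n))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


calls = []


def square(x):
    calls.append(x)  # track when the function actually runs
    return x * x


ds = LazyDataset(partition(list(range(10)), 3)).map(square).filter(lambda v: v % 2 == 0)
calls_before_action = len(calls)  # still zero: only a plan has been built
result = ds.collect()             # the action triggers the work
```

Because each partition is processed independently in `_run_partition`, the same plan could in principle be shipped to separate workers, which is the property the talk describes.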

A

Okay, and as you can see here with all these different APIs, if you are used to using R, you can use Spark with R, and if you're more of a database user, you can access Spark through the Spark SQL interface and view things much as you would in a relational database.

A

So machine learning was a field in computer science, and it is still highly interdisciplinary.

A

Primarily I've used it myself for the algorithms listed at the bottom: clustering, random forests, and regression. Here are some examples of how you might take a regular application that's just doing your transactions on the web and turn it into something that's using one of these machine learning algorithms. For instance, the Airbnb site uses alternating least squares to give you recommendations about places you might like to stay. Say you go to the site, and a place where you would normally like to stay is already booked.

A

They will use alternating least squares to give you a bunch of other recommendations about where you might want to stay instead. You can do clustering, where you might want to cluster all your customers and tailor their experience on your website based on which of these clusters they fall into. I personally have used random forests to help me with my performance monitoring: I'm able to pick the top ten configuration parameters that I've set in my experiment and see which ones are most influential on the overall performance of the codes that I'm running.
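
To make the alternating least squares idea concrete, here is a minimal sketch in plain Python. It is not Spark MLlib's implementation and not what any particular site runs: it assumes a rank-1 model (one number per guest, one per listing), made-up ratings, and hypothetical function names (`als_rank1`, `objective`, `recommend`). The "alternating" part is that the guest factors are solved exactly while the listing factors are held fixed, and then the roles swap.

```python
def objective(ratings, u, v, reg):
    """Regularized squared error over the observed ratings."""
    fit = sum((r - u[i] * v[j]) ** 2 for (i, j), r in ratings.items())
    penalty = reg * (sum(x * x for x in u) + sum(y * y for y in v))
    return fit + penalty


def als_rank1(ratings, n_guests, n_listings, reg=0.1, iters=20):
    """Rank-1 ALS: approximate rating(i, j) by u[i] * v[j]."""
    u = [1.0] * n_guests
    v = [1.0] * n_listings
    for _ in range(iters):
        # Hold listing factors fixed; each guest factor has a closed-form solution.
        for g in range(n_guests):
            seen = [(j, r) for (i, j), r in ratings.items() if i == g]
            num = sum(r * v[j] for j, r in seen)
            den = reg + sum(v[j] ** 2 for j, _ in seen)
            u[g] = num / den
        # Hold guest factors fixed; solve each listing factor the same way.
        for l in range(n_listings):
            seen = [(i, r) for (i, j), r in ratings.items() if j == l]
            num = sum(r * u[i] for i, r in seen)
            den = reg + sum(u[i] ** 2 for i, _ in seen)
            v[l] = num / den
    return u, v


def recommend(guest, ratings, u, v, n_listings, k=2):
    """Top-k listings the guest has not rated, by predicted score."""
    seen = {j for (i, j) in ratings if i == guest}
    scored = [(u[guest] * v[j], j) for j in range(n_listings) if j not in seen]
    return [j for _, j in sorted(scored, reverse=True)[:k]]


# Made-up data: (guest, listing) -> rating on a 1-5 scale.
ratings = {
    (0, 0): 5, (0, 1): 3,
    (1, 0): 4, (1, 2): 5,
    (2, 1): 2, (2, 2): 4, (2, 3): 1,
}
initial_loss = objective(ratings, [1.0] * 3, [1.0] * 4, 0.1)
u, v = als_rank1(ratings, n_guests=3, n_listings=4, reg=0.1, iters=20)
final_loss = objective(ratings, u, v, 0.1)
recs = recommend(0, ratings, u, v, n_listings=4, k=1)
```

A rank-1 model ranks unseen listings the same way for every guest, so it is closer to a popularity score than a personalized recommender. Real systems use a higher rank, where each alternating step becomes a small linear solve per guest or listing instead of a single division.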

A

So this just gives you some examples, a small subset of all the ML algorithms we have available in Spark. And this is the good news: I've done all this performance testing, and so far the overhead has been 10% running in Kubernetes instead of just on bare metal. Zak will talk to you a little bit about how easy it is to use this. You don't really have to be a data scientist to do this work.

A

The API is so easy that pretty much anyone can just try this out, and we have a website where you can see all of our examples and try them yourself.

B

So there's lots of tooling around that. When you're designing models, we do have data scientists on staff who work on algorithms. But then, once you train the model, you deploy the model, and then you can do things like predictions and solve different problems with your data. So I think it's very interesting.

A

Okay, so you can just check out our GitHub site at radanalytics.io if you want to see our code. Thank you.