From YouTube: ML on OpenShift SIG Briefing: Kubeflow on OpenShift Update with Trevor McKay (Red Hat)
Description
From the Machine Learning on OpenShift SIG meeting held on June 1, 2018.
All right, so as Matt said, my name is Trevor McKay. I'm a software engineer at Red Hat, a contributor to the radanalytics.io project, and also a member of the Kubeflow community. We've talked about Kubeflow in this meeting a few times earlier this year, and today I'm going to give you a high-level update on what is happening with Kubeflow, for the benefit of those watching on YouTube or those of you who may have missed earlier presentations.
We'll start with a very quick recap of what Kubeflow is, then take a look at how the community has grown and what's going on there, what the roadmap for the rest of the year looks like along with release plans, and then a question of particular interest for OpenShift Commons, which is: how is it fitting into OpenShift? A very important question for us. Additionally, I want to point out a few particular areas where I think OpenShifters, and folks from radanalytics.io in particular, can contribute.
And finally, if time allows, we'll see a simple demo of Kubeflow on OpenShift, proving that all the bits come along very nicely. Also, if you haven't seen Jeremy Lewi's presentation from KubeCon EU in early May, I posted a link here, and I'll add the slides to the meeting notes afterwards. It's a great presentation that goes into more depth, and I encourage you to check it out. When I asked Jeremy for up-to-date roadmap resources, he directed me to this presentation, so it's very relevant and pretty fresh. Also, just as a logistical note: if I say anything that contradicts Jeremy about the roadmap, he's right.
Okay, so you may have seen this slide before: what is Kubeflow? In a nutshell, it's about building portable machine learning solutions using Kubernetes. I'd also add that it's about lowering the bar to entry for deploying machine learning apps, and taking a lot of the orchestration burden off the shoulders of developers and data scientists.
So what does that platform look like? Well, you all know that building a model is central, but there are all these other pieces to do, and you may have seen this slide before as well. These are all the tasks that you need to complete to develop and manage a machine learning solution and get it into production. The ultimate goal of Kubeflow, then, is to provide tools that deliver the functionality in each of these boxes.
Right now, in the 0.1 release, the toolset essentially covers tasks directly associated with training, building, and serving models, but future releases will gradually fill in the other boxes, and I have a little more to say about that later. So how is the community doing? Well, honestly, the growth has been amazing. It's very exciting to see an open-source project like this take off in a matter of months. At KubeCon, for instance, there were eight talks touching directly on Kubeflow, which is amazing.
Here are some stats. I would bet dollars to donuts that most of these numbers have increased in the last month: lots of members, 20-plus organizations involved, lots of PRs. It's really quite incredible. Beyond just the numbers, there are now regular bi-weekly community meetings. The meetings are split between time zones on alternating Tuesdays, so that we've got the broadest coverage possible over the globe. If you really want to get a feel for what's going on in the community at a detailed level, I'd highly recommend attending some of the community meetings.
In addition to discussion of what's going on, there's usually a great demo from somebody in the community, so you can see how the technology is being applied. I'd also note that the Kubeflow GitHub org has now expanded to 16 different repositories, covering core components, testing infrastructure, the website, community and proposals, and some examples. This is also an awesome place to spend some time and dig into what's being developed. All right, so here is a high-level roadmap for the rest of the year.
The 0.1 release came out in early April, with core components around training and serving models. The 0.2 release is slated for June, and that is this month, for those of you on YouTube. The goal is to have a 1.0 release by the end of the year, with production-worthy components, and there's an additional goal to eventually move to a quarterly release cadence going forward.
That may mean that there's a 0.3 somewhere between June and December, but don't quote me on that. All right, so how is Kubeflow fitting into OpenShift? A very relevant question. Well, the short answer is: very, very well. Given some of the Kubeflow core principles, this shouldn't be a surprise at all. It's dedicated to being Kubernetes-native, which means it has hard dependencies on Kubernetes APIs, and the community makes a promise that it will run everywhere that Kubernetes runs; that's the goal. So naturally, since OpenShift is a Kubernetes distribution for the enterprise,
this works out just fine. However, since OpenShift has RBAC enabled out of the box and strives to be safe and secure by default, there are a few configuration commands you need to run in order to set up users and projects for installing and running Kubeflow. We need to give the user installing Kubeflow permissions to do things like register custom resources, and we need to modify security contexts and roles for service accounts in each project where we'll run it. But it's really just a couple of commands, as we'll see later in the demo, and aside from that, everything works out of the box. All right.
So it's no surprise that there is a ton of stuff to contribute to in general in the Kubeflow community, but I've called out just a couple of things here that I think might be of particular interest to this audience: to OpenShifters and folks involved in radanalytics.io.
First of all, there's a proposal currently being worked on for supporting multiple images for Kubeflow components based on different distros. Naturally, on the OpenShift side we're interested in this, because we think that the safety, security, and reliability of CentOS as a base image is a perfect complement to the safety, security, and reliability built into OpenShift from its foundation. Secondly, having OpenShift-oriented images may help us ameliorate the need for some of the extra config I called out in the previous slide. But this proposal needs to be fleshed out.
It's not finished yet or presented to the community, and obviously all the work to support it needs to be done, so this would be a great place to contribute. Another possible place for OpenShifters to contribute is in making the CI tests run on OpenShift infrastructure as part of the normal workflow. Obviously, if we want to have confidence that Kubeflow will continue to work flawlessly on OpenShift, we should have OpenShift infra in the test workflow, and there's work to do there.
We can also look at having Apache Spark CRDs added as Kubeflow components sometime in the future. If we recall the machine learning task diagram from earlier, there's the top line that deals with data handling: ingestion, engineering, pre-processing, and so on. Apache Spark is a performant distributed processing framework with a pretty broad feature set; it can do a lot of things, and I think it would be particularly suited to filling in those tasks from the top row of that diagram.
All right, and with that, we will shift to a simple demo showing this stuff being deployed on OpenShift. No actual data science today, more sort of nuts and bolts. Let me hide this... oops, that too. Okay, so where do I want to start? I want to start here. All right, so on this screen I have put together a small shell script. It runs commands straight out of the Kubeflow user guide, so there's nothing special here.
It doesn't do everything that you can do when launching Kubeflow, but it does the core essentials. For instance, you can launch metrics on usage and whatnot; I don't have that here. Basically, it constructs a directory to run in, initializes a project, sets the Kubeflow version that you want, sets up the registry so that you can get the Kubeflow packages, and installs them.
It builds a prototype, sets up some environment variables, and then deploys the components. And of course, the "ks" in these commands stands for ksonnet; this is the standard workflow. Now, from an OpenShift perspective, if you do this and you haven't done the other bits, you will see, excuse me, you will see something that looks like this.
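The sequence just described can be sketched along the lines of the ksonnet workflow in the Kubeflow 0.1 user guide; the app name, namespace, and version tag here are assumptions for illustration, not the exact values from the demo script:

```shell
# Create and enter a ksonnet app directory (app name assumed)
ks init my-kubeflow
cd my-kubeflow

# Point ksonnet at the Kubeflow registry for the desired version (tag assumed)
ks registry add kubeflow github.com/kubeflow/kubeflow/tree/v0.1.2/kubeflow

# Install the core Kubeflow package from that registry
ks pkg install kubeflow/core@v0.1.2

# Build the kubeflow-core prototype
ks generate kubeflow-core kubeflow-core

# Set the target namespace for the default environment (namespace assumed)
ks env set default --namespace kubeflow

# Deploy the components to the cluster
ks apply default -c kubeflow-core
```

The last `ks apply` is the step that actually talks to the cluster, which is where the permission error described next shows up on a default OpenShift setup.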
Everything will be fine and you'll be trucking along until you get down to this last line, which is about the inability, excuse me, the lack of permission to edit role bindings. Okay, so the solution is relatively simple. I won't do this live in the interest of time, but here is another little script I put together, called user.sh.
I'm running with "oc cluster up" here for an OpenShift instance, so all I need to do is become the admin user, add cluster-admin to the user that I'm using, and switch back. Once that runs, we will not get that error anymore.
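A user.sh along these lines might look like the following; the "developer" username is an assumption based on the oc cluster up defaults, not something confirmed in the talk:

```shell
# Become the cluster admin (oc cluster up provides system:admin)
oc login -u system:admin

# Grant cluster-admin to the installing user (username assumed)
oc adm policy add-cluster-role-to-user cluster-admin developer

# Switch back to the installing user
oc login -u developer
```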
So, while we're at it messing around with user settings, the other thing that you need to do, and I called it out in the earlier slide, is to change the permissions for some of the service accounts in the project.
You can do this easily, again, by just making a little shell script. Here we modify security contexts so that the Ambassador and JupyterHub images run without error in OpenShift, and then we give the tf-job-operator some extra privileges, because it needs to mess around with resources behind the scenes when it launches a TFJob.
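A hedged sketch of that script; the service account names (ambassador, jupyter-hub, tf-job-operator) and the specific SCC and role are assumptions, so check them against the accounts actually created in your project:

```shell
# Let the Ambassador and JupyterHub pods run with the user IDs their
# images expect (service account names assumed)
oc adm policy add-scc-to-user anyuid -z ambassador
oc adm policy add-scc-to-user anyuid -z jupyter-hub

# Let the TFJob operator manage resources behind the scenes when it
# launches a TFJob (role choice assumed)
oc adm policy add-role-to-user edit -z tf-job-operator
```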
So, if we look at our services, we have our JupyterHub load balancer here. We create a route to that, and then we go and visit it, and there is JupyterHub. We will call ourselves Danny, go in and launch a server using the default settings, and in a second here we get a notebook. There we go.
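Exposing the JupyterHub service with a route can be sketched like this; the service name tf-hub-lb is an assumption based on the Kubeflow 0.1 naming, not stated in the talk:

```shell
# Find the JupyterHub load balancer service (name assumed)
oc get svc

# Create a route to it and print the resulting hostname
oc expose svc tf-hub-lb
oc get route tf-hub-lb
```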
So you can see, this is pretty easy. Even with the extra RBAC considerations, it's really simple to set this up on OpenShift; it works just like it works anywhere else. We may be able to iron out some of those privileged bits in the future, but it's really not burdensome at all. That's all I've got for you today.