Red Hat OpenShift AI / Machine Learning | OpenShift Commons, 6 Apr 2018

From YouTube: OpenShift Commons ML Briefing JupyterHub, BinderHub, and repo2docker with Carol Willing Project Jupy

Description

Carol Willing (Project Jupyter) presents JupyterHub, BinderHub, and repo2docker, and how they all work on Kubernetes and on OpenShift, to the Machine Learning on OpenShift SIG of OpenShift Commons.

A

So my name is Carol Willing and I'm actually really speaking on behalf of our core JupyterHub team, which includes Chris, Yuvi, and Min. As far as Red Hat goes, as Diane mentioned to some of you folks earlier, Graham Dumpleton has done a lot of work and has been a great help to us over the past couple of years. And sorry, there's, like, a big Air Force jet flying across, so if there's some background noise, I am not in my normal spot. Okay, so let's get started.

A

One of the things I wanted to do was to sort of explain to folks where Jupyter comes from. Even if you've seen what the Jupyter tools are, you may not know; there's some misconception out there that we're a big corporation. Actually, the core group of Jupyter developers is in this photo here. Project Jupyter started in academia, and that's where our roots really are.

A

We were funded by a number of research grants and a few industry partners, but what you see in that slide is really the core of Jupyter, and we have many, many contributors beyond that. Our vision is really to help facilitate data science and scientific collaboration, and open source tools with which to do that. And so with that I will sort of move on. Those are some of our funders, and our mission is really to focus on usability, collaboration, and reproducibility in the sciences.

A

Reproducibility is key because you need to be able to show either a government agency or other people doing research in a similar area that you can get to the same answer, even if you're working on it three months later, six months later, a year later. So one of the interesting things about Jupyter is it lets you construct a computational narrative which you can share with other people, whether they're in your industry area or scientific specialty area, and it really helps with solving problems that span

A

multiple disciplines. For, I think, the benefit of the people that might be watching on YouTube: a notebook is basically a combination of narrative text, which is done in Markdown and/or LaTeX, and code and visualizations. And when I say code and visualizations, it's really being able to execute code on the fly and visualizations on the fly, as well as embed things like YouTube videos and other information that might be useful.

A

There's a recent academic paper that analyzed a million different notebooks up on GitHub. There are approximately 2 million up on GitHub right now, so this has grown really quickly since about 2013 or 2014.

A

Some of the things that we see are people developing interactive content for large education classes that maybe traditionally haven't had animation embedded within them; adding animation to their lectures actually makes it a much more powerful story.

A

We're seeing machine learning being used in a number of areas, including medicine, media, finance, and government, and things ranging from scikit-learn, PyTorch, TensorFlow, and many, many of the other rich libraries that are in Python, R, Julia, and other languages as well. In fact, one of the Nobel Prize winners at NYU has put together a comprehensive course based on notebooks, which can be taught in either Python or Julia, and it's really quite a fantastic series of notebooks. So, moving forward.

A

JupyterLab extends the concept of notebooks into more of a multi-windowed, web-based, extensible user interface. So you can have

A

windows; you can pull visualizations out of a particular notebook and have them off to the side where you can manipulate them; concepts like scratch pads; and really, you can maybe have prose in one window and actual calculations and other things in other windows.

A

So it's really taking the notebook, the classic notebook as we like to call it, and extending it in many different directions, and also giving users or companies the ability to extend the actual notebook functionality through plugins and other things. You can try that on Binder, and I'll get to what Binder is in a little bit. Then there's JupyterHub, which you can run on OpenShift, and there is a series of great blog posts by Graham Dumpleton, and GitHub repos that go with them,

A

that explain how you can do that easily on OpenShift. JupyterHub is basically the way that, if you have a group of users, you can give each person in that group their own notebook server, so that they don't have to deploy things locally. They can actually work off a web-based instance in the cloud. Particularly in education or in a research group,

A

that's very powerful, because you can pre-install all the dependencies that you need and have the same environment, or customized environments, for each person in your research lab or in an educational setting. There are different methods of authentication and different ways that you can, what we call, spawn a notebook server, because, you know, one size doesn't fit all: our supercomputer and high-performance computing users may have very different needs than somebody working in a small data science team within a company where the data science team is maybe 10 users.

A

So it is very flexible and extensible, and what we have done in the past year is develop a Zero to JupyterHub how-to guide, which uses Kubernetes. The guide itself sort of assumes that you've got a functioning version of Kubernetes on your cluster, and then it walks you through how to use Helm and the Helm charts that we provide to deploy JupyterHub so that you can use it within your own organization.
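
As a rough illustration of the Helm-based route described here: the Zero to JupyterHub guide of this period started from a small `config.yaml` holding a random proxy secret before running `helm`. The Python sketch below only generates that file's contents. The `proxy.secretToken` key is how the chart was documented around the time of this talk and should be treated as an assumption for other chart versions; `make_config_yaml` is a hypothetical helper name, not part of any Jupyter tool.

```python
import secrets


def make_config_yaml() -> str:
    """Return a minimal config.yaml body with a random proxy secret token."""
    # secrets.token_hex(32) gives 64 hex characters, the same shape of value
    # that `openssl rand -hex 32` produces on the command line.
    token = secrets.token_hex(32)
    # Key layout assumed from the chart documentation of that era.
    return f'proxy:\n  secretToken: "{token}"\n'


if __name__ == "__main__":
    # Print the file body; redirect it to config.yaml before calling helm.
    print(make_config_yaml(), end="")
```

The resulting file would then be passed to Helm as a values file when installing the chart; the exact chart name and version depend on the guide release being followed.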

A

Similarly, Red Hat and OpenShift have a way to deploy basically the same functionality that doesn't necessarily rely directly on the Helm charts that we have, but underneath there's very similar functionality. The architecture of JupyterHub is essentially this: there is a proxy to the outside world that takes in information from users. It will authenticate users with the hub. It will spawn, in many cases, Docker containers that will give each user their own sort of sandbox,

A

if you will, in which to do work. There's also this notion of services, where you might have a third-party service that you've developed.

A

Maybe it's a way to share reporting or, you know, presentations, and it's something that communicates with the hub but doesn't directly reside within the hub.
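
The proxy, hub, and per-user server arrangement described above can be pictured with a toy model. This is not JupyterHub's code: `ToyHub` and `ToyProxy` are hypothetical classes, the password dictionary stands in for a real authenticator, and a string address stands in for a spawned container. The one detail taken from JupyterHub itself is that each user's server lives under a `/user/<name>/` URL prefix.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyHub:
    """Toy stand-in for the hub: checks credentials and starts servers."""

    passwords: Dict[str, str]
    servers: Dict[str, str] = field(default_factory=dict)
    next_port: int = 9000

    def authenticate(self, user: str, password: str) -> bool:
        # Real deployments plug in OAuth, LDAP, etc.; a dict lookup stands in here.
        return self.passwords.get(user) == password

    def spawn(self, user: str) -> str:
        # One sandboxed server per user; reuse it if it is already running.
        if user not in self.servers:
            self.servers[user] = f"http://127.0.0.1:{self.next_port}"
            self.next_port += 1
        return self.servers[user]


@dataclass
class ToyProxy:
    """Toy stand-in for the proxy: the single entry point for all requests."""

    hub: ToyHub
    routes: Dict[str, str] = field(default_factory=dict)

    def login(self, user: str, password: str) -> str:
        if not self.hub.authenticate(user, password):
            raise PermissionError(f"login failed for {user!r}")
        prefix = f"/user/{user}/"
        # Register a route so later requests reach this user's own server.
        self.routes[prefix] = self.hub.spawn(user)
        return prefix

    def route(self, path: str) -> str:
        # Known user prefixes go to that user's server; the rest go to the hub.
        for prefix, target in self.routes.items():
            if path.startswith(prefix):
                return target
        return "hub"


if __name__ == "__main__":
    proxy = ToyProxy(ToyHub(passwords={"ada": "s3cret"}))
    print(proxy.route("/user/ada/tree"))  # not logged in yet, so the hub handles it
    proxy.login("ada", "s3cret")
    print(proxy.route("/user/ada/tree"))  # now forwarded to ada's own server
```

A service, in this picture, would be one more route registered with the proxy that talks to the hub without living inside it.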

A

This particular slide I'm not going to dwell on, but it's an example of how JupyterHub is being used at Harvard with large classes of students to do electronics and wearables and Python instruction, and it's pretty cool, so worth checking out later.

A

Binder is a new grant that we received last year, and the concept behind Binder is that you put in a URL or a GitHub repo, press a button, and it will launch an ephemeral notebook or environment with which folks can compute. It was funded by the Moore Foundation, and the aim is really to provide a mechanism with which folks can share reproducible research. So you would maybe have a research paper, and you would stick in a badge or a link to a Binder instance.
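
To make the badge-or-link idea concrete, here is a small sketch that builds a launch link and a Markdown badge for a GitHub repository. It assumes the public mybinder.org deployment and its `/v2/gh/<org>/<repo>/<ref>` URL layout; `binder_url`, `binder_badge`, and the `example-org/example-repo` names are hypothetical and only for illustration.

```python
from urllib.parse import quote


def binder_url(org: str, repo: str, ref: str,
               host: str = "https://mybinder.org") -> str:
    """Build a launch URL for a GitHub repository on a Binder deployment."""
    # Percent-encode each piece so a ref such as "feature/x" stays one path segment.
    parts = [quote(piece, safe="") for piece in (org, repo, ref)]
    return f"{host}/v2/gh/" + "/".join(parts)


def binder_badge(url: str, host: str = "https://mybinder.org") -> str:
    """Return Markdown for a clickable badge that points at a launch URL."""
    return f"[![Binder]({host}/badge_logo.svg)]({url})"


if __name__ == "__main__":
    # Hypothetical repository; substitute the repo that holds the paper's notebooks.
    link = binder_url("example-org", "example-repo", "main")
    print(link)
    print(binder_badge(link))
```

Pasting the badge line into a README or a paper's supplementary page gives readers the one-click launch described in the talk.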

A

By clicking that link, you would then go, and it would pull in all the data that was used to generate the research within the paper, and all the calculations. We are seeing great interest from the scientific community, in things ranging from the notebooks themselves being used to RStudio and Julia being used, and, you know, Nature has done some profiles on it and some other information as well. Because we realized that some people wanted to build their own Binder service, maybe to serve notebooks to an industry research group,

A

we also have, similarly to Zero to JupyterHub, instructions on deploying your own BinderHub. Down the road, we hope that all the different BinderHubs, at least in academia, will have some sort of federated model behind them, where, you know, it really opens up open research and collaboration.

A

The BinderHub architecture is very similar to JupyterHub running on Kubernetes, with a little bit of extra functionality. One of those bits of extra functionality is something we call repo2docker.

A

Those of you that are familiar with OpenShift may have heard of source-to-image, which Graham put a lot of work into, and actually repo2docker is built on some of the concepts that Graham put together with that. We had a great meeting with him last year that really moved us forward in terms of being able to specify and then build Docker images for custom data science and scientific programming environments. So we're very thankful to Graham for that. And, you know, there's the documentation for repo2docker.

A

You can use repo2docker either within BinderHub, as part of BinderHub, or you can also run it locally and, you know, build your own images and mix and match the different dependencies that you might need in order to do your research. And that is pretty much the basics that I had. I am happy to answer questions.
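
For the local route mentioned above, a minimal sketch of assembling a `jupyter-repo2docker` command line from Python. It assumes the `--ref`, `--image-name`, and `--no-run` options of the repo2docker command-line tool; `repo2docker_command` and the repository URL are hypothetical, and the sketch only prints the command rather than running it, since running it needs Docker and repo2docker installed.

```python
import shlex
from typing import List, Optional


def repo2docker_command(repo: str,
                        ref: Optional[str] = None,
                        image_name: Optional[str] = None,
                        run: bool = True) -> List[str]:
    """Assemble an argument list for the jupyter-repo2docker CLI."""
    cmd = ["jupyter-repo2docker"]
    if ref:
        # Build a specific branch, tag, or commit instead of the default.
        cmd += ["--ref", ref]
    if image_name:
        # Give the built image a predictable name.
        cmd += ["--image-name", image_name]
    if not run:
        # Build the image only; do not start a notebook server afterwards.
        cmd.append("--no-run")
    # The repository (URL or local path) comes last.
    cmd.append(repo)
    return cmd


if __name__ == "__main__":
    argv = repo2docker_command(
        "https://github.com/example-org/example-repo",  # hypothetical repo
        image_name="example-env",
        run=False,
    )
    print(shlex.join(argv))
```

To execute it, the returned list could be handed to `subprocess.run(argv, check=True)` on a machine where the tool and Docker are available.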

A

I wasn't anticipating going into a real deep dive, but I did want to sort of let folks know that we are a non-profit.

A

We are very much focused on research and the needs of scientists and data scientists across the world, and, you know, we are very much committed to building open tools that help people using machine learning be more effective. And that's pretty much what I had, Diane.

B

Well, I think it's great. I'm curious; a few more people have joined too, so I need to unmute everybody again and see if people have questions.

C

I think this is amazing, really, really good stuff. I noticed you were using Helm, and I was wondering if there were also Kubeflow packages available for all this good stuff, and if not, can we help?

A

We have been working with you guys on the Kubeflow project. You're using JupyterHub within that project, and we anticipate, you know, as we move forward, that there will be more collaboration. Right now, I think we are somewhat limited just by the fact that we are a non-profit in research,

A

in academia, in terms of resourcing big efforts. But we are happy to provide knowledge and, you know, work with community members as well. So.

C

Yeah, my hope was that architecture slide.

C

Folks having to do all those various proxies and things like that, that's certainly what we were hoping to try and tackle, or make much easier anyway, with Kubeflow. And so we are more than happy to help, you know, get your packages into Kubeflow and make it easy.

A

That would be great, that would be great.

A

Yeah, and I mean, I think one of the things that we've tried to do, even within the Jupyter team,

A

is, we recognize that scientists and data scientists may not be computer scientists and may not be, you know, fully versed in DevOps. So we wanted, with minimal provisioning effort, much like you would download an app from an app store, to be able to get to a point where you can have a fully functioning cluster with a number of data science tools available, TensorFlow being one of them, you know, PyTorch, scikit-image, you know,

A

Julia, Python, R. What we have found in working with folks both in industry and in education and academic research is that really there's not one particular tool that folks rely solely on. They may work primarily in one, but they wind up using all of them in terms of data cleaning and data analysis. So, yeah.

C

We've already seen that in Kubeflow. That's why Kubeflow is designed to support PyTorch and XGBoost, scikit-learn, NumPy.

C

We already have, like, upstream contributions for that, and so for us, more than anything, it's making sure that we... we would love to get your stuff in and connected to whatever a data scientist wants.

A

Absolutely. I'll be in Copenhagen, so we'll have time to chat more in detail, yeah.

C

I can't tell you how many times I've heard people ask for exactly what you just demoed here. This is really, really cool.

B

So you mentioned Copenhagen, and I'll put a pitch out here again: on May 1st in the evening, Carol and David and Diane Fatima and a number of other people will be at the 6:00 to 8:00 p.m. reception that we've set up for everybody from this community and the Kubeflow community to come together and have a beer and hear some lightning talks and talk about ML stuff on Kubernetes.

B

So if you're coming to KubeCon, please do join us that evening, and I'll put the link to that in there. Any other questions for Carol, other than, like, huge kudos? And what do you need from us, maybe, Carol?

A

You know, I think some of the things are: continue to support Kubernetes, because it has really, as a whole, streamlined how quickly and efficiently we can deploy things. I mean, there's still a ways to go, and Kubeflow sounds like a very promising way to get there, as well as some of the other projects like Service Catalog and some of the other things being done in the Kubernetes world that will make users from a variety of different academic and science and data science

A

arenas more productive. So I am very pleased, and I'm very much looking forward to meeting all of you in person in Copenhagen, too.

D

Maybe if I can add something: I am speaking from a university in Quebec City. It was not for a question but for big kudos, because, as end users, as scientists, that's exactly where we want to go. Well, we already use Jupyter and JupyterHub, and inside Compute Canada we are teaching the researchers to use the tools, and that's great. What we are trying to achieve, our objective, is to put the glue between all those tools that we have: so having JupyterHub spinning up notebooks inside containers, and

D

what we're working on right now is the other part, with Spark, with what Melanie's team is doing with radanalytics, to have at the same time, you know, a complete environment spun up from scratch, having the notebooks directly connected to Spark. Because we do mainly medical research, in that case we have to isolate the different environments, and the idea is to be able to spin up containers on demand and have the notebooks directly connected to a Spark instance

D

that would be spun up at the right time with the right connections, the right credentials, and everything. That's what we want to put in the hands of the researchers, because they have to do research, not tamper with all those tools that they have to build. That's the concept of science apps that we are trying to develop, and we are really relying a lot on all these tools. Thanks so much, Carol, for what you are doing.

D

It goes exactly in the right direction, especially with Binder, because we want our researchers to be able to manage their code, because now science is code. We want them to manage it in exactly the same way that IT people do, with versioning, with all those kinds of things that are brought by code management that they currently don't use.

D

So we have to put the glue between all these different tools, and that's what we are trying to achieve. Jupyter and all the things that the people here on this call are doing are very great for us. So that was a kudos from the end users.

A

On behalf of the entire Jupyter team, thank you. It is really what we're trying to do, and I think you captured it well: not just the tools, but also moving...

A

We recognize that science is moving more and more into the computational space and into open tools and cloud computing, and there's an education piece within the communities as well, to deliver these solutions effectively and in a way that people understand.

A

One of the things with scientists is, if you deploy something and you're not quite sure you deployed it correctly, you're not going to want to have your research, you know, over multiple years, rely on something that you're not sure is working. And I think that's where Kubernetes and Kubeflow and other projects really help take the DevOps stuff out of the scientists' hands and put it more into the hands of people who are very experienced with that,

A

whether they're working at, you know, whatever companies: Microsoft, Google, Red Hat. OpenShift has done a lovely job of the one-click kind of install. And Compute Canada is doing great stuff, not just within science, but also looking at education in the younger grades, in elementary school as well. So kudos to all the work you're doing.

D

For the scientists, making the most of all those tools also helps them not to have the burden of setting this up. Because normally, the way it works in a new lab is: okay, we have to do some calculations in Spark and whatever libraries you are using, and we want to set up notebooks, something like that. Usually what they do is they task a grad student to set up the environment, and it takes months, and it's difficult.

D

They have to learn it from scratch, and we know for sure that's not an easy task. That's one month, two months, three months of time that is totally lost for research. It's only time spent on infrastructure, and it shouldn't be the case.

A

Right, and that's where I hope, to Diane's question of what the folks on this call can do for us, it's really to put a strong emphasis on outstanding documentation, because having outstanding documentation, as well as good design underneath in the core technologies, will make the usability such that folks can get their work done with a minimal amount of DevOps effort by grad students or professors or students, for that matter.

D

That is really important, and here...

D

Fantastic job. It's very good right now, because it's pretty great for us. Yes.

B

Graham wanted to be here, but it's 3:00 a.m. in Australia, and he has done some really good documentation. I will find the blog post and put it in; there's a whole series on Jupyter. Yeah, thanks, Matt; he's just posted that into the chat. But if we need to do more... our problem, I think, at OpenShift, and I'll admit this openly, is that we sometimes document by blogging.

B

But what we really need to do is get it into the documentation on the OpenShift docs, docs.openshift.com, and I'm even thinking to get it into the Katacoda-hosted learn.openshift.com stuff. So there's some work still to be done. Yeah.

A

I'm happy to help with some of the documentation stuff. Like, maybe we do an hour or two sprint in Copenhagen and pull together some of the Kubeflow docs and really, you know, make it so that people can get started quickly. I think that is all of our goals. So.