From YouTube: 8. Jupyter Deployment at NERSC
Description
June 12, 2019 Jupyter Community Workshop talk by Rollin Thomas, National Energy Research Scientific Computing Center
So anyway, we're the production user facility for HPC and data for Department of Energy funded researchers. That's people at universities and other national labs around the country. This is all our stuff. We don't have Edison anymore; we basically unplugged it and it disappeared last week. But we're essentially a huge file system and network with computers that we attach and then detach every five years or so.
One of the things that's a trend for us and other HPC facilities is people showing up with experimental and observational data sets, at least within the Department of Energy. A lot of those people would previously have stood up a departmental cluster or depended on local resources at their university, and they'd write into their grant: hey, we need to purchase this hardware and it needs to run for five years.
Increasingly, DOE has been telling those people that they need to figure out a way to run at the HPC facilities, the leadership-class facilities, but mainly that means here. So there are all of these experiments showing up with lots of data, and the stuff they want to do with it is different from the kind of stuff we've been doing for the past twenty years or so. So we're shifting; well, we're adding, basically, not really shifting: we're adding a bunch of data analytics and machine learning.
And real-time data analysis, because a lot of what these people want to do is take data at a beamline on a synchrotron or something like that and figure out: how should I rotate the sample? That's a big computation, so I need to ship the data over to some place that can actually be sitting there, ready to do a larger parallel computation, and send the answer back. Then they want to look at that and decide, okay, I'm going to go five degrees more this way, or whatever.
A
So
this
kind
of
dynamic,
workflow,
human-in-the-loop
and
analysis
and
steering
of
experiments
is
something
that
we're
we're
really
looking
forward
to
helping
people
with
the
system.
We've
got
on
the
floor.
Right
now
is
Cory.
This
is
our
first
system
that
was
supposed
to
address
the
needs
of
simulation
and
data
people
at
the
same
time,
and
so
it
has
a
bunch
of
these.
What
we
like
to
call
data
features,
data
friendly
features,
namely
slurm,
which
is
a
kind
of
a
huge
engagement
for
us,
is
working
with
the
CERN
developers.
It's got Shifter containers, the burst buffer, Globus file transfer. We have nodes set aside for data transfer and for workflows, and then there was like one node for things like Jupyter as part of the contract, and I think the Jupyter part was the best part.
Why did we take over running a hub service? What we noticed early on was that users figured out how to run the notebook via SSH tunneling, and then they could do Jupyter stuff at NERSC. They wrote blog posts about it: here's how you use Jupyter at NERSC, you just install Jupyter and then you can set up this complicated SSH tunnel and use that. We figured maybe we should step in and help.
It was around that time, actually. So, okay, we wanted to embrace this and make people not have to do SSH tunneling and all of that. JupyterHub helped us do that by letting us standardize the service, authenticate people the way we wanted, educate them, and help them manage the process of setting it up, so hopefully they can just get started doing Jupyter stuff.
Here's our history at NERSC, I guess. We invited Fernando to give a NERSC user group talk back in 2013, and we stood up a Jupyter installation on some hardware where we'd asked, hey, are you throwing that away, can we use it? So we set up a JupyterHub instance there, and a couple of people used it and they liked it.
The thing that they liked was that we could mount the NERSC global file system, so they could see their data sitting on the project file system, basically. That's the place where we tell people to put their data so they can share it with other people. It's not the high-performance file system; you can hit it while your job is running, but it's for sharing. So people could make plots and do little data analytics tasks on that one node.
But the next thing we did was move the place where notebooks spawn to be inside Cori. Well, on Cori; I don't mean inside, because inside and outside Cori are different things. This is more outside Cori: it's on a login node that we repurposed, or actually set aside from the beginning, for running Jupyter. Cori has something like 24 login nodes, which is a lot for us; really only 12 of them are in the load balancer for users to actually SSH into.
The higher-numbered login nodes are reserved for things like big-memory work, or Jupyter, or file transfer workflows, and things like that. So we got one node for our notebooks, and maybe 20 users used it for a while. But I think somebody said yesterday: you give users a resource and it becomes theirs, right? So everybody started wanting it for themselves, all on one node, basically. That worked for a while, even though the way we did it was not the best in terms of proper Jupyter components.
People really like to do a lot of their stuff at NERSC simply because Jupyter is there. So this was great for maybe 20 or 30 users, but as more users came on, they started to notice that it was one node, and that they could crash it.
A
Maybe
we're
gonna
change
them
around
to
be
Jupiter
stuff,
and
so
a
lot
of
the
work
that
we
did
was
was
kind
of
socializing
that
our
architecture
so
I'll
do
a
little
architecture,
diagram
demo.
We run the hub for Cori on a container infrastructure called Spin. It's Rancher underneath; it's running the old Rancher scheduler, but soon it's going to move to Kubernetes, so we'll make that jump this year. So we run the hub there, with a few other containers sitting alongside: we've split out the database, and we're splitting out the proxy.
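As a rough illustration of what "splitting out" the database and proxy means in JupyterHub terms, here is a minimal jupyterhub_config.py sketch; the hostnames and the token source are placeholders, not the actual deployment described in the talk.

```python
# Hedged sketch: point the hub at an externally managed database and an
# externally launched configurable-http-proxy instead of the in-process defaults.
import os

# hypothetical external database (could equally be MySQL)
c.JupyterHub.db_url = "postgresql://jupyterhub@db.example.org:5432/jupyterhub"

# proxy runs in its own container; the hub only talks to its API
c.ConfigurableHTTPProxy.should_start = False
c.ConfigurableHTTPProxy.api_url = "http://proxy.example.org:8001"
c.ConfigurableHTTPProxy.auth_token = os.environ["CONFIGPROXY_AUTH_TOKEN"]
```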
We have a couple of extra services that we run alongside to cull notebooks: we let them sit idle for 24 hours and then we shut them down.
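Idle culling like this can be wired up as a hub-managed service; a minimal sketch using the jupyterhub-idle-culler package (the 86400-second timeout matches the 24-hour policy above; the rest is illustrative, not the exact configuration in the talk):

```python
# Hedged sketch: cull single-user servers after 24 hours of inactivity.
c.JupyterHub.services = [
    {
        "name": "idle-culler",
        "command": ["python3", "-m", "jupyterhub_idle_culler", "--timeout=86400"],
    }
]
c.JupyterHub.load_roles = [
    {
        "name": "idle-culler",
        "scopes": ["list:users", "read:users:activity", "read:servers", "delete:servers"],
        "services": ["idle-culler"],
    }
]
```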
We have a monitoring container that runs alongside as well and sends information to our central data collector; it sits off to the side of Cori in a Docker container. And then we have two custom components in the classic sense. We have our own authenticator, because we have multi-factor authentication; we started out with GSISSH, but that started going away.
A
It
was
a
real
pain
to
keep
that
working,
because
it
was
a
service
that
needed
to
run
when
the
node
came
up,
but
we
have
our
own
custom
Authenticator
that
uses
our
internal
authentication
mechanism
and
our
own
kind
of
spawner
infrastructure
that
lets
us
do
what
I'm
about
to
show
you.
So
this
is
our
Authenticator.
Here
we
have
an
internal
API
for
managing
generation
of
networks
with
our
multi-factor
authentication.
Once
a
user
is
authenticated,
they
can
choose
where
their
notebook
is
going
to
spawn.
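To make that concrete, here is a minimal sketch of a custom JupyterHub authenticator that delegates to an internal multi-factor REST endpoint; the endpoint URL, field names, and response handling are hypothetical stand-ins, not NERSC's actual API.

```python
# Hedged sketch of an MFA-backed JupyterHub Authenticator.
import json
from jupyterhub.auth import Authenticator
from tornado.httpclient import AsyncHTTPClient

class MFAAuthenticator(Authenticator):
    # hypothetical internal endpoint that verifies username + one-time password
    verify_url = "https://auth.example.org/api/verify"

    async def authenticate(self, handler, data):
        client = AsyncHTTPClient()
        resp = await client.fetch(
            self.verify_url,
            method="POST",
            headers={"Content-Type": "application/json"},
            body=json.dumps({"username": data["username"], "otp": data["password"]}),
            raise_error=False,
        )
        if resp.code != 200:
            return None  # reject the login
        # Returning a dict lets us attach auth_state (accounts, images, ...) later.
        return {"name": data["username"]}
```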
One option is in the center itself. So if Cori is down for maintenance, say, and they have a paper deadline and they want to make a plot, we don't want to have to tell those people sorry. So we have another container sitting inside Spin that allows users to start up a notebook inside a shared container and at least make their plots. With Kubernetes we might be able to do something a bit more normal to spawn over there.
On the login nodes on Cori, we've now been able to repurpose three nodes, three nodes total.
We might be able to get a couple more. Those are spawned using an SSH spawner that we've written. I think there are a couple of SSH spawners out there, but ours sits on top of asyncssh to do that.
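As a rough sketch of the idea (not NERSC's actual SSHSpawner), a spawner built on asyncssh starts the single-user server on a remote login node over SSH and reports the host and port back to the hub; the host name below is a made-up placeholder.

```python
# Hedged sketch of an asyncssh-based JupyterHub spawner.
import asyncssh
from jupyterhub.spawner import Spawner

class SimpleSSHSpawner(Spawner):
    remote_host = "login13.example.org"  # hypothetical high-numbered login node

    async def start(self):
        self.port = 8888  # in practice, pick a free port on the remote side
        cmd = " ".join(self.cmd + self.get_args())
        self._conn = await asyncssh.connect(self.remote_host, username=self.user.name)
        # env requests may be restricted by sshd; a real spawner handles this more carefully
        self._proc = await self._conn.create_process(cmd, env=self.get_env())
        return self.remote_host, self.port

    async def poll(self):
        # None while the remote process is still running, exit status otherwise
        return self._proc.exit_status

    async def stop(self, now=False):
        self._proc.terminate()
        self._conn.close()
```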
Once you have a notebook running on a login node on Cori, here's the thing: we've extended the internal network on Cori, the one all the compute nodes are on, which generally don't have routable IPs, out to these high-numbered login nodes. So you can have a notebook running on node 13 or 14 or 19 and talk to, say, a Dask cluster running in a job that you started up through a regular job submission; all the notebook needs to know is the internal IP of the head node of that job. Okay, so that's super popular.
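From the notebook side, that last step might look something like this; the scheduler address is a hypothetical internal IP of the job's head node.

```python
# Hedged sketch: connect a notebook on a login node to a Dask scheduler
# launched inside a batch job on the internal network.
from dask.distributed import Client

client = Client("tcp://10.128.0.45:8786")  # hypothetical internal IP, default scheduler port
client.submit(sum, range(10)).result()     # quick sanity check that workers respond
```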
We think this could probably end up being the most popular way people combine Jupyter and the batch queues, because the notebook gets to stay around basically forever. There's another way to do this, which is a straight batch spawner: start the notebook up in a job. We have an API for associating IPs on the fly with jobs, for a small number of IPs, so part of the job startup is to hit that SDN API and get an IP address.
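A hedged sketch of that second route, using the batchspawner package's SlurmSpawner with a job script that asks a hypothetical SDN endpoint for a routable IP before launching the server; the endpoint, its response handling, and the SBATCH options are placeholders, not the actual NERSC setup.

```python
# Hedged sketch: notebook-in-a-batch-job with an SDN call in the job prologue.
c.JupyterHub.spawner_class = "batchspawner.SlurmSpawner"
c.SlurmSpawner.req_runtime = "04:00:00"
c.SlurmSpawner.batch_script = """#!/bin/bash
#SBATCH --job-name=jupyter
#SBATCH --time={runtime}
#SBATCH --constraint=haswell
# Ask the (hypothetical) SDN API to attach a routable IP to this job,
# then start the single-user server as usual.
curl -s "https://sdn.example.org/api/attach?jobid=$SLURM_JOB_ID"
{cmd}
"""
```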
So I really want to point out that a lot of this is infrastructure services we've added on the center side, for the SSH and SDN pieces; Shane is here, and there's a lot of infrastructure work that went into it. It really helps to have somebody who understands how the center infrastructure works to be able to do these kinds of things. And then there's batchspawner. I think some extensions were shown yesterday; one of them is this JupyterLab Slurm plugin that lets you look at the queue and stop jobs and things like that. That's in development; actually, William over there has been working on it.
Alright. I didn't talk about our deployment model, but we do a monthly deployment cycle. Somebody said they have A/B, or dev and test, or whatever; we have test, stage, and then the production stack.
What I'm showing here is the staging one, which is the one right before we go to production. We've customized the authenticator, and we've customized the login page template so that we can stick in our flavor of multi-factor authentication. And I'm logging in
as myself, and I'm staff, so I see some things that non-staff people don't see. So this is our console, or our home page, and again this is customized as well. What we wanted was to enable users to pick one of these to run at a time. So you can run, say, the shared one,
and one of those is a test setup. Then over here, you have to be in a special QOS to be able to see the GPU nodes, which are actually running on a separate Slurm controller and aren't really part of Cori. Anyway, I can push these buttons and they'll do things. So this is starting up a GPU node job; it's fairly fast. But if I go to start up a job on the CPU nodes using that spawner, it's pretty slow, because our Slurm is super, super busy.
Okay, so it might take three minutes before your job starts up. And then this is something I'll come back to in a minute: our users are always saying, well, I want to go back to the console, but it always leaves this extra window open, and if I stop my server it makes the page gray and there are errors and stuff, and I don't like it. Let me log in as the test user, though; so I have a test user.
The point I want to make, though, is: how do we do that, right? We've got to know some stuff about me, and we've got to know some stuff about users. This matters because, if we have say an options form, we don't want to list all the accounts that exist at NERSC, just the ones associated with the user. We don't want to list all the Shifter images that are at NERSC, just the ones they should care about.
We don't want to list all the reservations that are in Slurm right now, just the ones the user can actually submit to. So we have internal services that we've exposed through a REST API that let us get at that information. We do have a little bit of difficulty getting at that information for the home page, but we have no problem getting it at the options form stage, because that's a callable, a callback, a coroutine. So we've shoehorned in a little fix there that can await it.
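For illustration, a per-user options form along these lines can be written as a coroutine that reads the user's auth_state (or calls an internal REST API) and only offers what applies to that user; the auth_state keys and form fields below are hypothetical.

```python
# Hedged sketch: build the spawner options form per user from auth_state.
async def options_form(spawner):
    auth_state = await spawner.user.get_auth_state() or {}
    repos = auth_state.get("repos", ["default"])  # hypothetical key
    choices = "\n".join(f'<option value="{r}">{r}</option>' for r in repos)
    return (
        '<label for="repo">Charge this job to:</label>'
        f'<select name="repo">{choices}</select>'
    )

c.Spawner.options_form = options_form
```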
I actually have a real production version of one piece of that here, for the account you charge to, so that's something we've developed. But things like which reservations are out there, or which of your images are there: those were done for this demo by SSHing to the machine and running a command, so we should fix that, or at least have a recipe for it. Anyway.
So we have a new machine coming, and Jupyter is going to need to work on that. This is what that page is going to look like, maybe in a year or so; you'll have all these options.
Okay, so this is the thing that we ran into: we have these things we want to expose to users at the center, our computational resources, our file systems. If they're going to be submitting jobs, we need to make it easy for them to say, use my default repo, or use this particular repo that I want to use for this job, or I have a reservation and I can only submit from this repo, or whatever.
Cori has login nodes that you just log into. Generally the regular login nodes are for compiling code, writing software, looking at your data, maybe a little bit of interactive stuff; but then you submit jobs to the batch queue, which runs on compute nodes, and those aren't normally accessible the same way from the outside.
Where do the Jupyter nodes sit? Yeah, these are those repurposed login nodes. I should mention that a normal login node has maybe 50 people on it at any given time, and a lot of them are sitting there hardly doing anything, like editing a file. On the Jupyter nodes, we currently have something like 200 notebooks running concurrently at any given time across the three nodes. When it was a hundred on one node, the node was falling over like every other day.
Cgroups memory limits are in place, so if anybody blows past them, we know, and they may just not know what happened; but hey, don't do that. We've also talked to the Systems Group about other alternatives, like putting people into jobs they don't actually know they're running; Slurm can do that. It might be, you know, just me in a cgroup, yeah.
We do recognize that there are some security things that we need to address, and that's one of them. Also, if somebody starts up a Bokeh server and they have an SDN entry, their Bokeh, or sorry, their Dask or Bokeh server is sitting there, for instance, and they don't have it behind TLS. People could look at what they're doing; people could send things that mess up their Bokeh server, I guess. But these are the things.
We don't, yeah. The way I look at it is, you could ask that about pretty much anything you could do with the SDN API, and so we're actually still reviewing whether that's the way we want to do this. One of the things we're talking about is whether we should run CHP as the relay for that, on a, you know, on a service node, yeah.
Yeah, I appreciate that, okay. And this is my last slide: the wish list, which I should have put in quotes; it's not really a wish list, it's just stuff we didn't really know how to do and then thought maybe there's a better way. All the stuff I talked about that refers to, like, what accounts a user can charge to, or their Shifter images, all of that we're just sticking on auth_state, which gets set when you authenticate and can be refreshed, which is great.
So if they go into our accounting system and sign up for a new repo allocation, we can probably pick that up. The difficulty was getting at it at the point where we needed it, which was on our home page with the named servers, basically. Grabbing hold of it there... I mean, you have to do it from inside a coroutine. So the get method there was okay, but it's kind of weird-looking that it's just sitting there going, hey, by the way, go find out everything this user can do.
Maybe there's a better place to put this; I think it's kind of jammed in there, but hey, it worked.
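For what it's worth, a minimal sketch of the refresh piece: JupyterHub's refresh_user hook can return updated auth_state without a new login. The helper that queries the accounting system is hypothetical, and the authenticator is assumed to be something like the earlier sketch.

```python
# Hedged sketch: periodically refresh auth_state so new allocations show up.
from jupyterhub.auth import Authenticator

class MFAAuthenticator(Authenticator):
    auth_refresh_age = 3600  # re-check roughly every hour

    async def refresh_user(self, user, handler=None):
        info = await fetch_account_info(user.name)  # hypothetical internal REST call
        return {
            "name": user.name,
            "auth_state": {"repos": info["repos"], "images": info["images"]},
        }
```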
On the security side, I know there was a Discourse topic about this, started maybe a couple of months ago. One thing is, we would really like to know about the possibility of notebook-level auditing: every cell, everything that goes through the xterm extension and terminado, we're interested in finding out how to log. The hub is fine; all that logging already goes up to our central data collector, and you don't have to do anything, that's part of our infrastructure. And there is interest from our networking and security group on the code review side of things, so if you were looking for somebody to do this, I know a person who would be really excited to do that. And then there's a thing about WrapSpawner that I could just be doing wrong, which is that there's like a default.
There needs to be a default spawner hanging out there, and right now, for me in the container, it's LocalProcessSpawner, and the only reason it doesn't work is that there's no getpwnam entry or whatever, so it just fails. But I think there should be a spawner that doesn't do anything; that would be a good default for this, unless WrapSpawner can do it a different way. But something that would say: whatever you did, you picked a name for your server, like abcdefg.
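For context, the wrapspawner package's ProfilesSpawner is configured with child spawner classes along these lines; the point above is that the wrapper wants some usable child class even before the user has chosen a profile. The class paths name real packages, but the profile entries, host, and partition are just illustrative.

```python
# Hedged sketch of a wrapspawner.ProfilesSpawner configuration.
c.JupyterHub.spawner_class = "wrapspawner.ProfilesSpawner"
c.ProfilesSpawner.profiles = [
    ("Shared login node", "shared", "sshspawner.sshspawner.SSHSpawner",
     {"remote_hosts": ["login13.example.org"]}),
    ("Compute node via Slurm", "batch", "batchspawner.SlurmSpawner",
     {"req_partition": "regular"}),
]
```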