From YouTube: JupyterHub Workshop, July 22 2016. Part I
Description
Part II is here: https://youtu.be/6p1xhi0P5dg
An online workshop about JupyterHub, including a system overview and a collection of lightning talks describing various deployment scenarios, tools and needs.
All talks and materials are available at https://github.com/jupyter/jupyterhub-2016-workshop
A
This will serve to collect, both for users, a kind of gallery of scenarios and tools that have been developed to use JupyterHub in various environments, and also for us, the team, to understand what the various and common problems and pain points are, what the similarities are, and what is happening in the ecosystem, both so that it can integrate better and so that we can identify problems that may need to be solved. I'm not going to spend too much time doing a lot of introductions.
A
B
So I'm going to start us off by setting the stage with the general JupyterHub architecture: how it works, what pieces we have, and then a little bit about where it's going and what people can do with it. Okay, just checking that everybody can see my screen with the first slide. That's working? All right, good.
B
So I'll start by talking about the architecture itself, how JupyterHub works and what's there. We start out, as I almost always do, with the notebook, as far as JupyterHub is concerned. The notebook is a document format, and it's also the environment in which you create those documents and run your computations, but most importantly for JupyterHub, it's a web application: a notebook server.
B
The notebook server is this web application where you have a Tornado-based notebook server and then different kernels in different languages (Python, Julia, R, what have you) talking over the network, and we view this whole thing as a single multi-process web application that users connect to over HTTP or HTTPS and WebSockets.
B
So from that perspective, what is JupyterHub? JupyterHub takes that notebook and adds some extra pieces to it. JupyterHub manages authentication so that users can log in and use their notebooks, authenticating as themselves, and then it spawns single-user servers on demand, so each user has their own notebook server. For each user there's one of these entire web applications, and it's the same application that you use when you run a Jupyter notebook on your laptop or that you get with tmpnb on try.jupyter.org.
B
B
What it adds is that, when you get a notebook, it does some verification with the hub. The hub sets a cookie on that URL, and then, when the single-user server gets a request with that cookie, it asks the hub whether that cookie corresponds to the user or users who should be allowed to access that server. The important part here is that the Authenticator, the class that implements authentication, is customizable, and the Spawner, the class that implements starting notebook servers, is also customizable.
B
The hub ships with some basic implementations, but anything that can authenticate users and anything that can start processes can be used in a JupyterHub deployment. A bit about when you should use JupyterHub: if you have a class where students need to do homework, you can deploy JupyterHub with nbgrader. If you have a short-lived workshop or are just teaching, you can use it as well, especially if installation of your tools is hard.
B
And when not to use JupyterHub: this is important, because the existence of some other services lets us limit the scope of what we want to do with JupyterHub itself. The basic principle of JupyterHub is that the notebook server and work environment that users get is authenticated and persistent. That means users log in, it's their stuff, and when they leave
B
they should expect that stuff to still be there. There's also tmpnb, which has a lot in common, but importantly tmpnb is anonymous: you just visit try.jupyter.org and you get a notebook server, and then, when you leave for a while, that server is gone. It doesn't save your work or anything; it's just for trying things out and demoing things. And then Binder is basically tmpnb plus pre-loading the environment based on GitHub repos, so it's useful for demos and interactive documentation for packages.
B
You can do live demos for people who then don't have to install your software, or, if you're doing reproducible research, people can follow along with your computation without going through the installation process. And then there's SageMathCloud which, most importantly, is hosted. It's a commercial offering that has hosted notebooks and other things, and it also provides real-time collaboration on projects and a variety of classroom features for use in an education context; those are the biggest selling points for SageMathCloud.
B
So if you're on your own system and want to use authentication that is not the basic PAM, there are two general categories of authentication. One is username and password, where JupyterHub will show the user a form with a username and a password field. Then your Authenticator just needs to implement one method that takes the data, which will be a dictionary of the form data from that HTML form, and you can get the username and password out of that.
B
Then you do whatever it is you do to verify that username and password. If authentication is successful, you return the user's name, and if it's unsuccessful, you just return None to indicate that this information does not identify any particular user. That's really all you need to do to implement a custom username-and-password Authenticator; it's just that one method. The second type, and perhaps more common I think, is using OAuth.
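A minimal sketch of the kind of username-and-password Authenticator described here. The class name, password dict, and check logic are made up for illustration; only the authenticate() method is the hook being discussed.

```python
from tornado import gen
from jupyterhub.auth import Authenticator

class DictionaryAuthenticator(Authenticator):
    """Checks passwords against a hard-coded dict; a real implementation
    would call whatever service actually verifies credentials."""
    passwords = {'alice': 'secret'}   # assumed example data

    @gen.coroutine
    def authenticate(self, handler, data):
        # `data` is the dict of fields submitted from the login form
        if self.passwords.get(data['username']) == data['password']:
            return data['username']   # success: return the username
        return None                   # failure: no user identified
```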
B
We have this OAuthenticator package, which is in the jupyterhub org on GitHub but is not part of the JupyterHub base installation. OAuthenticator provides some base classes for implementing anything that does the standard OAuth flow. You just need to give it a couple of URLs, and then you have a login handler, the HTTP handler that redirects to the OAuth service, and then the OAuthenticator is the Authenticator class that defines how to log in with OAuth.
B
So, some custom authenticators that exist, just as examples: OAuthenticator is a package that provides a collection of OAuth implementations and a base implementation for people to subclass and extend; it includes GitHub, Google, Bitbucket, Wikimedia, GitLab, and so on. Then there's REMOTE_USER, which can be used when you have some proxy, Apache or nginx or something, sitting in front of the JupyterHub application, that handles its own authentication, maybe using a campus Shibboleth plugin or something like that, and just sets the REMOTE_USER header.
B
In that case the hub isn't actually doing any authentication; it's just looking up the user header and trusting that the proxy in front of it has done the authentication. There's also an LDAP Authenticator, which talks to LDAP directly through the LDAP API rather than through PAM; sometimes you can use LDAP through PAM, and there are examples of that, but sometimes you want to talk to LDAP directly. You can write more authenticators using either of these patterns with relatively little work, based on what exists already. So, as an example, if you're using GitHub OAuth:
B
You go to the GitHub site to create an OAuth application, and you give it the particular callback URL that JupyterHub expects. Then you get a client ID and client secret that identify your application to GitHub, and you use these to set some environment variables to configure the GitHub authenticator. You install the OAuthenticator package, and then in your JupyterHub configuration file you tell it to use this Authenticator class, and then you're done: you've got a GitHub-OAuth-authenticated JupyterHub.
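A sketch of the setup just described. The credential values and hub URL are placeholders; they come from the OAuth application you register on GitHub.

```python
# set in the environment before starting the hub:
#   GITHUB_CLIENT_ID=<client id>
#   GITHUB_CLIENT_SECRET=<client secret>
#   OAUTH_CALLBACK_URL=https://hub.example.com/hub/oauth_callback

# jupyterhub_config.py
from oauthenticator.github import GitHubOAuthenticator
c.JupyterHub.authenticator_class = GitHubOAuthenticator
```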
B
B
But if you're using GitHub OAuth, that alone makes a bit less sense, because the odds that you want to give every user of GitHub access to your machine are pretty slim. So authenticators have a notion of a whitelist. This is a whitelist of usernames, as interpreted by your authentication service, that should be allowed access, and then there's also a list of admin users who should have access to the admin interface of JupyterHub and who can do things like add and remove users.
B
C
B
The whitelist in the config file is really just an initial condition for the whitelist; users can also be added at runtime, and then the next time you start the server it will load from both sources, the whitelist in the config file and anything that's been added to the database.
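As a concrete illustration of these two settings (the usernames are placeholders):

```python
# jupyterhub_config.py
c.Authenticator.whitelist = {'alice', 'bob'}   # initial set of allowed users
c.Authenticator.admin_users = {'alice'}        # may use the admin interface
```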
B
So once you've got a user identified by a username, the Spawner is the object that represents starting a notebook server, and there's a base class for this just like there is for authentication. To write a custom spawner, you subclass this and mainly define three methods. start() is the method that will start the notebook server, and there are some methods on the base class for the arguments to pass to the notebook server and the environment variables to load into its environment.
B
Then you implement the function that answers "how do I actually start this?", which can be: allocate an EC2 instance, submit a PBS job, any of these things. It must be a Tornado coroutine, for async reasons, because a lot of these spawners might take a really long time to start a server.
B
You can use yield statements to request a server, and then you don't return until that server is actually running. The main thing your spawner needs to do is communicate the IP and port of the server, so that the hub can add that entry to the proxy. This can be any IP and port; JupyterHub doesn't care anything about how you start the notebook server, other than that you start it somewhere
B
that is accessible on the network from where the proxy is running, and then you pass that back. In the next release of JupyterHub you'll pass it just by returning the IP and port; in the current stable version, you set these as attributes on the user's server object. Then there are the remaining methods: once you've got a notebook server, you have to be able to poll, that is, to check whether the server is still there, and you have to be able to stop it.
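A minimal sketch of those three methods, modeled loosely on the built-in LocalProcessSpawner; port handling and error checking are simplified, and a real spawner would talk to EC2, PBS, Docker, or whatever backend instead of a local subprocess.

```python
import subprocess
from tornado import gen
from jupyterhub.spawner import Spawner

class SimpleProcessSpawner(Spawner):
    proc = None

    @gen.coroutine
    def start(self):
        # tell the hub where the single-user server will listen, so the hub
        # can add the route to the proxy (port selection simplified here)
        self.user.server.ip = '127.0.0.1'
        self.user.server.port = 8888
        cmd = self.cmd + self.get_args()      # jupyterhub-singleuser + args
        self.proc = subprocess.Popen(cmd, env=self.get_env())

    @gen.coroutine
    def poll(self):
        # None means "still running"; an integer is treated as an exit status
        if self.proc is None:
            return 0
        return self.proc.poll()

    @gen.coroutine
    def stop(self, now=False):
        if self.proc is not None:
            self.proc.terminate()
```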
B
In order to do this, it needs to be able to reconstruct whatever state the spawner had, to reload the spawner from the database so that it can continue polling and stopping without being the actual object that started the notebook server. Those methods are just load_state and get_state, and the state can be any JSON-able dictionary.
B
I've included the example from the LocalProcessSpawner, where the state for the spawner is just the PID of the process. You save the PID so that you can reconstruct a new spawner from that saved PID, and it can then check on, stop, or kill the process; poll and stop just need the PID. Whatever information you need to reconnect to your spawned servers is what you would put here; for Docker it's the container ID, that kind of thing. So, an example of an Authenticator and Spawner working together:
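For reference, the state-persistence hooks described here follow this pattern (continuing the earlier subprocess sketch, with the PID as the only saved state):

```python
from jupyterhub.spawner import Spawner

class SimpleProcessSpawner(Spawner):
    pid = 0

    def load_state(self, state):
        """Restore saved state from the hub's database after a restart."""
        super().load_state(state)
        if 'pid' in state:
            self.pid = state['pid']

    def get_state(self):
        """Return the JSON-able dict the hub should persist for us."""
        state = super().get_state()
        if self.pid:
            state['pid'] = self.pid
        return state
```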
B
The Authenticator has two hooks, I believe they're called pre_spawn_start and post_spawn_stop, that are called with the spawner, roughly before and after it starts, and that lets you have Authenticator and Spawner pairs that need to pass information. For instance, the CILogon Authenticator and Spawner pass credentials using this: the Authenticator logs in, gets a client certificate, and then hands it over in the pre_spawn_start hook.
B
In that case the Authenticator and Spawner do need to know about each other; it's really an Authenticator/Spawner pair, and they don't necessarily work with others, because that behavior is not something general to all notebook servers. It's a feature that that Spawner class and that Authenticator class both know about and pass between them through this hook. So yes, there is a mechanism for the Authenticator to pass information to the Spawner, and the Spawner does have a handle on the Authenticator.
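A sketch of what such a pair of hooks might look like. The certificate fetch and the attribute the credential is stashed on are invented for the example; only the hook names come from the discussion above.

```python
from tornado import gen
from jupyterhub.auth import Authenticator

class CredentialAuthenticator(Authenticator):
    def fetch_certificate(self, username):
        # placeholder: a real implementation would contact a credential service
        return 'dummy-cert-for-' + username

    @gen.coroutine
    def pre_spawn_start(self, user, spawner):
        # called just before the user's server starts; a cooperating Spawner
        # class reads this attribute and uses the credential however it needs
        spawner.user_cert = self.fetch_certificate(user.name)

    @gen.coroutine
    def post_spawn_stop(self, user, spawner):
        # called after the user's server stops; discard the credential
        spawner.user_cert = None
```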
D
I just came in a little late. I actually looked at your code when we were working on that one, and you had something that I think would be generically useful, which is being able to tell the spawner what URL to contact the hub back on, because the one that's configured in JupyterHub may not be the right one that you want to tell the spawners to use. That may be something that would be generically useful to push up into the template or whatever. Yes.
B
B
Yeah, with Docker in particular, the best way to do that these days is with Docker overlay networks, which basically let you think about it as a regular network. When both things are in containers and you've used Docker links or Docker overlay networks, then you just need to pass basically the hostname of the hub container.
B
Great, yeah. And in that case, when the environment the hub is in and the environment the containers or spawners are running in differ, there may be something specific to the spawner that you need to set, because the hub doesn't actually know where it is relative to the spawner, but the spawner does. DockerSpawner in particular has a setting for the address the containers should use to reach the hub.
D
D
B
Yeah, right, clicking along. This is actually related to that: in your config file, loading the DockerSpawner is often not enough, because the hub API, the one the single-user servers use to authenticate with the hub, listens on localhost by default. We don't want to give other machines access to that API unnecessarily, but often, when you're running in Docker, localhost is not actually enough.
B
So if you're running the hub on the host and your containers in Docker, this bit of code in the config file tells the hub API to listen on the Docker virtual interface so that it's accessible to Docker containers. In that case you don't need any special configuration of the spawner; you're just telling the hub's API to listen on an interface that is accessible to all the containers. And DockerSpawner in particular has lots of configuration you can use.
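One common way to express that configuration is the following; the netifaces lookup and the docker0 interface name are an assumption about how the host is set up, not the only option.

```python
# jupyterhub_config.py
import netifaces

# make the hub API listen on the docker0 bridge so containers can reach it
docker0 = netifaces.ifaddresses('docker0')[netifaces.AF_INET][0]['addr']
c.JupyterHub.hub_ip = docker0
```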
B
You can mount data volumes, you can specify the container image, and you can do a bunch of networking configuration to give different containers access to different services, including the hub itself. As for more custom spawners that we have: there's DockerSpawner, which I've talked about, and SudoSpawner, which lets you run servers as the regular local users without giving root permissions to the hub.
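A few illustrative DockerSpawner options of the kind mentioned here; the image name and paths are placeholders.

```python
# jupyterhub_config.py
c.JupyterHub.spawner_class = 'dockerspawner.DockerSpawner'
c.DockerSpawner.container_image = 'jupyter/scipy-notebook'
# mount a per-user host directory into each container
c.DockerSpawner.volumes = {'/srv/users/{username}': '/home/jovyan/work'}
```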
B
B
So then, a bit about deploying JupyterHub. You can install it with conda, pip, and npm. It does have one non-Python dependency, so you can install it in one command with conda if you use the conda-forge channel, or you install the Python package with pip and the JavaScript package, the proxy, with npm. Some caveats for installation:
B
Often, if you install things in virtualenvs, those envs aren't readable by all users, so you need to be really careful: if you're making envs for JupyterHub that are owned by root and private, that won't work. You'll need to make sure that the JupyterHub installation itself is accessible to all your users when using local single-user servers.
B
Always use SSL, since this is an authenticated service where people can run code; just don't run it over plain HTTP, ever. You can use self-signed certificates, but one caveat there is that Safari refuses to connect WebSockets over untrusted certs. The solution to that is Let's Encrypt, which provides free SSL certificates for any domain. So if you have a domain, you can use SSL with JupyterHub, or you can do SSL termination at nginx or Apache or something.
B
The local, what are considered internal, services are all still HTTP-only; that's mainly just a lack of exposed configuration, there's no actual restriction. And we really want to set up PKI so that there's not just SSL encryption but actual SSL authentication, I don't know what the right words are, authentication like the Docker daemons do, so that the clients actually get client certificates to authenticate their API requests. But we haven't gotten there yet.
B
C
B
Then you definitely want SSL on the API as well, and we can do it; it's not hard, there are just a lot of things. So definitely, let's talk about that. Yeah, it's definitely not hard; well, the PKI stuff might be a little bit hard, but just turning on SSL is not hard, we just need to expose the options. So, the basics of running with SSL: this would be the config you need to run with SSL on the right default HTTPS port, and you run jupyterhub --generate-config to generate your default config file.
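The basic settings look like this; the certificate paths are placeholders for wherever your cert and key live.

```python
# jupyterhub_config.py  (create one with: jupyterhub --generate-config)
c.JupyterHub.port = 443
c.JupyterHub.ssl_cert = '/etc/letsencrypt/live/hub.example.com/fullchain.pem'
c.JupyterHub.ssl_key = '/etc/letsencrypt/live/hub.example.com/privkey.pem'
```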
B
Just like all the other Jupyter applications. More things: for a system-wide installation, you want to make sure that you install kernels and kernelspecs system-wide, which may not be the default behavior for some kernel installations like Julia and R. So make sure those kernelspecs are installed system-wide.
B
B
Generally, it's a good idea to run JupyterHub under some service manager, whether that's systemd or supervisord, which is what I use most often. It's a long-running process, so whatever service management tool you use, you can run JupyterHub with it.
B
It doesn't really care how you start it, so it's a good idea to run it with whatever tools you're familiar with. We have two reference deployments: one for deploying with Docker that uses docker-compose and puts everything in Docker, and one that's an example deployment for teaching that uses Ansible, doesn't use Docker, and sets up nbgrader on a local machine. General best practices: as always, you start with SSL.
B
The default is to use SQLite, because that's the only thing in the Python standard library, but it's a good idea to use Postgres or something similar if it's available. You can put nginx or some other web server in front of the configurable proxy to serve static pages. There's a cull-idle service script that can prune idle servers to save resources.
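Switching the hub database to Postgres, as suggested, is a one-line config change; the connection string here is a placeholder.

```python
# jupyterhub_config.py
c.JupyterHub.db_url = 'postgresql://jupyterhub:secret@localhost:5432/jupyterhub'
```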
B
That helps if you have a lot of users but only a few of them active at a given time. You can put Jupyter and IPython system-wide configuration in /etc/jupyter and /etc/ipython; you can put the configuration files there to affect everything. And then just back up your user data; that's always a good thing to do, especially if you're using Docker and things like that. As mentioned before, there's this REST API that you can use at runtime and from external services to manage JupyterHub itself.
B
You can add users, remove users, start and stop user servers, create new authentication tokens, talk to the proxy, and shut down the hub itself. A bit about where we're going and what's coming: services is the next major milestone, for things like the cull-idle script and nbgrader, to make those easier to run and manage. These are things that talk to the JupyterHub API and may not be started by JupyterHub itself. And then the next thing, once we've got
A
A
No, no worries. There was a lot of good discussion and we started a couple of minutes late, so I think, in the interest of sticking to the schedule, since we do have a solid half-hour at the end, I'd probably want to pass it on to Brian, who has the next slot. It's a little tight, and then we'll keep open discussion for the final round.
E
So I sort of have these two roles myself, being both a developer of Jupyter and a user of Jupyter as well, and the biggest thing I think you'll encounter as you deploy JupyterHub is these questions; there are a lot of questions to answer. Are you going to use the DockerSpawner or not? Which Authenticator are you going to use, PAM or OAuth? Which spawner, which I guess is the same as the Docker question. Are you going to run a proxy like nginx in front of JupyterHub or not?
E
How are you going to install and manage kernels: are you going to use conda, or the built-in system Python? Is nbgrader needed? How many users, and what are the hardware requirements? This is, I think, where there's still a lot of unavoidable pain in deploying JupyterHub right now, that there are all these questions, and the main point I want to make is that the answer,
E
the answer to these questions, will depend highly on what usage case you want to support, and my particular usage case has the following characteristics. There's no professional or full-time DevOps person; when I'm in teaching mode I'm spending most of my time teaching a course, and during that quarter I have essentially zero cycles to spend on DevOps and sysadmin-type tasks, so my deployment is optimized for that time requirement and expertise level. I typically have between 50 and 100 users across one, two, or three sections of a given course.
E
If we were ever in a situation where we had more than a hundred users, they would be spread across multiple sections and we could just deploy multiple servers, so basically we're finding that one to two sections per server works well. Also, my users are trusted; by that I mean I'm completely comfortable giving each of them a standard Unix shell account on these servers.
E
The other thing is that, because of the constraint on numbers and also the memory and disk-space requirements, I can always get away with a single server for a given JupyterHub deployment. As other people will be talking about later, the effort that goes into scaling JupyterHub across a cluster is much more complicated, and I'm definitely avoiding that. But the deploy must be repeatable: we're doing this deployment one to three times per quarter, a few times a year, so it's something we do often, and persistence is really important.
E
So again, I really want to emphasize the constraints here: I am simultaneously, or often, teaching the courses I'm deploying JupyterHub for. I have no TAs, no grad students, no sysadmins; I do absolutely everything. These courses take at least full time just for the instructional part, so I have zero time to mess with DevOps during the quarter. I have to have a solution that truly just works and where I'm spending a very, very small amount of time setting up each quarter.
E
E
These Ansible roles were originally developed by Jonathan Frederic, who's a core Jupyter contributor; in recent times I have sort of taken that over, and now that it's merged into the jupyterhub org on GitHub, Min and a few others have started to contribute as well. I've probably deployed a half dozen servers myself over the last year using this approach, and at this point the new-server build time is less than an hour, honestly.
E
Min, what's your sense of how long a build takes now that we're using conda for Python in this approach? Probably closer to 15 minutes. Okay, so you're basically looking at a new server from scratch, and that is once you have an instance up with a fully qualified domain name, DNS set up, and an SSL cert set up, so a very quick turnaround time. Some technical details: we're using the subprocess
E
spawner that uses just plain UNIX user accounts, and we're using the GitHub Authenticator, which maps those UNIX users onto GitHub usernames. There's an nginx proxy in front that does SSL termination and also serves static files related to the notebook server. And then we assume that you're using a real, signed SSL certificate, either one that you've obtained, usually through purchase, or we have a configuration that will just use Let's Encrypt. So we try to make this as easy as possible.
E
The persistent services that need to run are run using the supervisor package, and we use conda for Python packages, with occasional usage of pip for packages not available on conda. For configuration, there's a YAML file that allows you to specify the remaining choices: you can pick a path that you'd like to use for home directories, and you can enter the information to configure Google Analytics or New Relic monitoring.
E
E
You can cull idle servers and configure all of that, and then optionally set up a public HTML directory for all users. So even within those overall constraints there are still quite a few choices, and we've encoded all of that in this YAML config file. Future work: right now we're just running a Python 3 kernel on this, so some effort would be required to add other kernels to this setup, and we definitely plan on following the JupyterHub services work. Most of my deployments have been done on Rackspace, using their bare-metal servers.
E
Part of what is working really well with that is that the cost model of those bare-metal servers works out fairly well in terms of overall RAM and number of cores, but I also find that the automated backups available on Rackspace are fantastic: within a few minutes in their web user interface I can configure backups to run daily, and it works very well. Have you ended up trying this on Amazon or other services? No, I've only
E
deployed there so far. Now, I definitely would advise running JupyterHub on SSDs, both from the server perspective and also from the perspective of providing robust, fast access to users who are going to be hitting that filesystem often. One thing I wanted to comment on, and again this is my own personal perspective, other people here are going to talk a lot more about Docker: I spent a bit of time, enough time,
E
looking at and evaluating Docker for this usage case, and given my constraints, Docker would have complicated things dramatically. I think it would have turned something that was feasible for a non-DevOps person to do into something that you really had to invest a lot of time in. That's not to say, I do think for other usage cases Docker is absolutely the right tool for the job, but I think it's important to note that, depending on your usage case and the resources, skill set, and time investment available, Docker may complicate things dramatically.
E
We may eventually get to a point where, even for this usage case, Docker is a good option, but I think today, given the current model, it's not the best solution for this type of deployment. So I'm going to switch over here: this is the GitHub repository that we have. It's in the jupyterhub org; jupyterhub-deploy-teaching is the name. It is optimized for teaching, but again it's
E
B
Can I make one comment about Docker complicating things? I think it's specifically the nbgrader moving-files-around part; Docker in general is often the easiest way to do a lot of it, but for the nbgrader stuff Docker gets in your way quite a bit, and part of the point of the services, services in general and the sharing service in particular, is to make nbgrader on Docker easier. Yeah, absolutely.
E
F
E
When you use the GitHub OAuth approach, the user's account name matches their GitHub username, so they're not in any way temporary accounts. It would be, for example, perfectly acceptable to deploy this on a system where users already had persistent UNIX user accounts; that's very consistent with this model, and it would work quite well.
E
You don't have to create them ahead of time; there are configuration options in a couple of different places, and they're already set up correctly for this, so if the user accounts don't exist, they will automatically get created with the right options. That's, for example, why you need to specify the home directory: when those user accounts are created, it makes sure they're created with the right home directory if they don't exist.
G
E
H
E
So typically, for the first day of class (our classes are not too big, typically below 50 students), I will, the day before class, put out a Google survey and have the students enter their university username and their GitHub username. Before class I add the GitHub usernames through the UI, and then in class I have the students just verify that they can log on, and we debug any issues at that point.
E
I
E
Students will enter the wrong usernames in any possible way you might imagine, whether it's a different case or whatever, and so that is something I always have to go over the first day. I basically walk around with my laptop and talk to every student (we're in a lab-type setting) and I just say: hey, can you log on? If not, let's debug right now. And part of it is, I think, experience.
E
Experienced GitHub users are familiar enough to know that this username is going to be looked at by lots of people, so they want it to be a consistent case, no weird characters, and so on. I find that often, in this type of context, students are creating a GitHub account for the first time, and they're using weird cases, they're using special characters, basically things that tend to cause problems.
E
Part of the issue originally was that the regular expression that constrains GitHub usernames is quite different from the regular expression that constrains UNIX usernames, and in general GitHub is more liberal than the default Unix username conventions. I think we've worked out all those bugs and are setting it up so that basically any valid GitHub username should work.
A
I
Yes, okay, so I'm going to talk about, I guess, sort of the next step up. Brian's class size was less than 100; I'm going to talk about scaling it up to about 200 students. We did use Docker, so I'm going to talk about what challenges that solved and also what challenges we had to overcome with it.
I
The problem for our particular class, which is taught at UC Berkeley, was that we had 200 students and we didn't want to deal with local installation of the notebook for all 200 of them, so we decided we wanted to use a hosted solution like JupyterHub. But we also had the constraint that, like in Brian's class, students had to have persistent files, and they had to be files on a real file system. Unfortunately, 200 users means that all of these users probably can't be on the same computer.
I
So that was one particular challenge. We also had the constraint that students shouldn't be able to see each other's files, and then we wanted to integrate with services like nbgrader. So that is an overview of the issues we had to solve in this deployment, and I'm going to break this talk up into two parts.
I
The first is how we actually scaled out this deployment using Docker, and the second is how to integrate JupyterHub with services like nbgrader. Min sort of alluded to some of this stuff in his talk; I'm going to talk a little bit more about it. This is not the official services API, it's just how you can do it if you're building your own auxiliary things. So, to start out with: scaling up with Docker.
I
The first challenge we wanted to address is this issue of isolating the users from each other, because they were supposed to work on their problem sets independently, so we didn't want them to be able to see each other's files. The solution we had for that was to use Docker, because that gives each user their own nice isolated environment.
I
It also set us up to solve challenge two, which was to actually scale it up to the full number of users that we had, and like I said, that's too many to fit on just a single computer, so we somehow had to scale out JupyterHub to be distributed across multiple computers. The way we did that was to use Docker Swarm, and the nice thing about Docker Swarm is that it basically just acts like Docker.
I
For each of the node servers that you have, you run a Docker daemon there, and then you tell Swarm about those node servers, and then you talk to the Docker Swarm application just like you would talk to the normal Docker application. You tell it to start up a container, and it'll pick one of the node servers and start the Docker container on that one for you, rather than starting it up on the local machine.
I
For our class we had the constraint that storage had to be on a real file system, because we had exercises that involved file I/O and we wanted to use nbgrader, so we needed it to be a real file system. To solve that, we used NFS. So, to take a look at the big picture of how this whole deployment looks: we had a few different machines. The first was the hub server.
I
We also had the node servers and a file server. If you've seen me talk about this before, this is actually slightly different from the way I've described it previously, because my Rackspace blog post and the talk I gave at SciPy last year were about last spring's class, but we also taught it again in the fall and I made some changes to the fall deployment. So, for the hub server:
I
Of course we had JupyterHub. We switched to using the Google Authenticator, because Berkeley uses Google Apps, so we were just able to allow students to use their Berkeley IDs, and then we used the SystemUserSpawner to talk to Docker Swarm. The SystemUserSpawner is part of DockerSpawner and inherits from DockerSpawner, but whereas DockerSpawner runs everything as a generic user inside the container, the SystemUserSpawner runs the container as the student's actual system user with their home directory.
We
ran
NFS
and
one
of
the
changes
that
we
made
for
the
fall
was
to
run
the
file
server
as
a
separate
server.
So
we
had
our
NFS
server
on
the
separate
server
and
its
job
was
just
to
host
the
files
rather
than
having
them
be
hosted
together
the
hub
server.
And
then
we
ran
the
NFS
clients
on
both
the
node
servers
and
the
hub
server,
and
so,
in
addition
to
this
setup,
we
had
a
bunch
of
also
like
auxilary
services.
So
we
had
this
KO'ing
service
that
men
mentioned.
We
also
had
a
stat
service.
I
We had the formgrader, which is part of nbgrader, and then we also had Postgres for the JupyterHub database and the nbgrader database, and these services we also ran in Docker containers. Everything you see here in blue is in a Docker container, and at this point you can see it's starting to get quite complicated, so that leads us to our next challenge, which was how to actually manage everything.
I
We also used Ansible, but we used it together with a tool called docker-compose, which basically allows you to specify a configuration file with all the Docker containers that you want to run, either the names of the images or a location where the image should be built. Then you can specify all the dependencies between the Docker containers, all the environment variables that you need, and so on, and it makes it much, much easier to orchestrate something like this whole thing.
I
Let me just check how I'm doing on time; all right. So that was the big picture of how we actually scaled this up with Docker, and the next part I want to talk about is a little bit more about these other services I mentioned. We had this culling service, whose job is to shut off single-user servers if they're idle, so if a student hasn't accessed their server in a while, it'll be shut down.
I
That way we don't just end up having all of the student servers running all the time. We had a stats service, which kept track of who was using JupyterHub and when, and this gave us a sense of the load on the system. And then we had the formgrader service, which is part of nbgrader and is a web application for grading Jupyter notebooks. As for the way these three services work in conjunction with JupyterHub, well, there are a few different pieces.
I
The first question is how you actually talk to JupyterHub, and Min mentioned this: you can use the REST API. To do this, you first have to get an API token, so you run jupyterhub token on the command line, and it'll give you a token that you then use to access the API. Then you can do, for example, a GET request against /hub/api/users to get your list of users, and you pass in an Authorization header with the token that you created on the command line.
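A small sketch of that call; the hub URL and token are placeholders, with the token coming from running jupyterhub token on the hub machine.

```python
import requests

api_url = 'https://hub.example.com/hub/api'   # your hub's URL
token = 'abc123'                              # output of: jupyterhub token

r = requests.get(api_url + '/users',
                 headers={'Authorization': 'token ' + token})
r.raise_for_status()
for user in r.json():
    print(user['name'], 'server running:', user['server'] is not None)
```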
I
This will allow you to access the API. In the PDF that I have in the GitHub repository I said that the API was undocumented, but Min linked to the docs in his presentation, so I changed that really quickly; I'll update the PDF in the repo, but there are docs for that API. Both the stats and the culling services use this API, and that's all
I
they do; they just poll JupyterHub with that API and they don't do anything more complex. Services like the formgrader, however, are much more complex, because they have to do two things. The first is that they need to run behind JupyterHub, so they need a JupyterHub URL, and then for the formgrader we want to authenticate our instructors so they can access it, and not allow the students to.
I
I
You give it the proxy API token, which you had to set when you started the proxy, and then you tell it where your service is running. Once you do that, you can access your service at the same URL where JupyterHub is running, under /myservice, and you can use whatever route you want; it doesn't have to be /myservice, it could be, say, /nbgrader/class101 or whatever.
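Registering that route with the configurable-http-proxy looks roughly like this; the proxy API port, token, and target address are placeholders.

```python
import json
import requests

proxy_api = 'http://127.0.0.1:8001/api/routes'
proxy_token = 'proxy-secret'        # the CONFIGPROXY_AUTH_TOKEN you set

# route <hub-url>/myservice/... to wherever the service actually listens
requests.post(proxy_api + '/myservice',
              headers={'Authorization': 'token ' + proxy_token},
              data=json.dumps({'target': 'http://127.0.0.1:9999'}))
```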
I
The configurable-http-proxy documentation is there and gives some more details on what you can do with that. Then the most complex part, as I mentioned, is authenticating with JupyterHub. You have a set of users that you want to be able to access your service, and you want to ask JupyterHub whether those users are who they say they are. To do this, you again get an API token from JupyterHub, and then Min wrote this hub auth service.
I
This used to be a lot more complicated, and it still is quite complicated in nbgrader if you go and look at it, but with the hub auth helper it should be much, much easier: you basically use this HubAuth object to connect to JupyterHub and ask whether this user is authenticated with JupyterHub, and if they are, then in your application you can just decide whether or not that user is allowed in.
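Under the hood that check amounts to asking the hub who owns the login cookie on the incoming request, roughly as below; the hub address, cookie name, and token are placeholders and may differ in your deployment.

```python
import requests

hub_api = 'http://127.0.0.1:8081/hub/api'     # hub API address
api_token = 'abc123'                          # from: jupyterhub token
allowed_users = {'instructor1', 'instructor2'}

def user_for_cookie(cookie_value):
    """Return the username the hub associates with this cookie, or None."""
    r = requests.get(
        hub_api + '/authorizations/cookie/jupyter-hub-token/' + cookie_value,
        headers={'Authorization': 'token ' + api_token})
    if r.status_code != 200:
        return None
    return r.json()['name']

# in a request handler: look up the cookie, then check membership
#   name = user_for_cookie(self.get_cookie('jupyter-hub-token'))
#   grant access only if name in allowed_users
```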
I
So, as I mentioned, the example use case for this is the formgrader, and that is mostly the whole of my talk. I have a whole bunch of links here; I'll just go through them really quickly. Min already linked to the DockerSpawner and the SystemUserSpawner; there's the Google Authenticator that we used, and the reference deployment with Docker.
I
These next links are for my deployment, and they're out of date, so if you try to run it, it's not going to work, but it might still be useful just for reference. Our whole Ansible deployment is there in the first link, and then the docker-compose template that we used is there in the second link.
I
A
H
So I've been kind of trying to decide between a setup where I would use a load balancer on AWS EC2 and actually use auto scaling, versus a setup where I would use something like DockerSpawner. Part of that decision is actually moving the database and the file system off of the JupyterHub Docker instance and into separate connected services.
I
I don't know if I'm really the right person to ask; I don't have actual professional sysadmin experience, so I don't have too much experience using load balancers myself. At the time when we set up this deployment, DockerSpawner and Docker Swarm seemed to be the right things to use. It's certainly possible that using a traditional load balancer might be a better option; I haven't explored that, so I'm not sure.
H
It just seems like, as far as complexity goes, there are so many tools for both of these things out there, but if you're spinning something up on EC2 you can just set up an auto-scaling group and it does it all for you, as opposed to something like Docker Swarm, where you actually have to manually set all these settings, set up these servers, and point things at them.
G
H
B
K
I
That shouldn't affect your decision-making; I wouldn't recommend using those things out of the box anyway. That's our specific deployment for our class, and it's there for reference, basically. I think it might be helpful for people to take a look at it, and I know other people have based deployments off of it, but it's certainly not something that will work out of the box for someone, and it is very complex, so unless you're trying to do something very complex, you probably shouldn't use it yourself.
I
As for being out of date: there are some URLs that have changed, the version of JupyterHub has changed, the version of nbgrader has changed. We did the course in the fall and I haven't updated the stuff for the next iteration of the course; that's why it's out of date. But does that answer your question?
A
I suggest we hand it over to Ryan, and people can maybe file longer questions in the back of their brains for the later discussion section. Thanks so much, Jess. Everybody, can you hear me? Yes, we can; we're seeing a blank screen for the moment, so try to share your slides, and I have them here for backup if need be. Okay. That's perfect, you're on the screen, so, okay.
J
It's unique in that it has these so-called connector courses: there's a main class, and then there's a series of other classes where the students apply the skills they learn in the main class to discipline-specific projects. So there's one main instructor but then several connector instructors, and the students are coming into this with, in many cases, no knowledge of statistics or computer science.
J
So, like Jess, we didn't want these students to have to install a bunch of software on their devices, and JupyterHub makes it easy to give the students the tools to learn. The first semester this class was taught there were about a hundred students, and we ran JupyterHub on a local cluster in the department, and then the next semester
J
we used the local clusters as a fallback to the main instance on Azure. In the coming semester we'll have about the same number of students, and we're also planning on hosting it on Azure. In the future, depending on who you talk to and when, there are going to be anywhere between five hundred and fifteen hundred students taking this class, so that relates to scaling, which I'll talk about later. In our deployment we again have the same constraints as Jess, and fortunately she did everything,
J
so we just copied her compmodels deployment. As she said, there was a lot of configuration which was particular to her course; she was hosting at Rackspace, so we ended up modifying it in development. We didn't use nbgrader, and we tested on AWS, which was the first place we tried deploying. It took me a couple of months to learn about all the particulars; I wasn't familiar with Docker and Ansible, and spawners too, and all that stuff.
J
J
We had students fall back to the local cluster that we were running the previous semester, and it had different storage; each instance has its own NFS server. So we had some complexity that we introduced to ourselves in getting the students' notebooks synced between the JupyterHub servers. As I said, Min found and fixed that.
J
J
J
It was interesting learning about the whole stack, and in deploying JupyterHub I found that it's very important, particularly since I got it going on AWS and then a local server, in terms of redeploying it each time and making it reproducible, to really pay attention to the versions and the tags
B
J
of each component. So say you're using Ansible, and you're using the Ansible container to deploy: you need to make sure you're using Ansible 1.9 instead of 2.0, depending on how you set things up. And for everything that you specify in your Dockerfile and in your requirements, your conda, pip, and so on, I found it's useful to keep track of the versions there so that it can be reproducible. Looking to the future, we want
J
to reduce the deployment-specific stuff that we have and that we don't necessarily need, now that these reference deployments are up. One thing that was important to us, to the instructor John DeNero: he wanted the ability for students who were perusing the course text to be able to click on a button in the text and have the notebook appear in their JupyterHub, and that's the way we are distributing notebooks into the students' accounts. So a student wrote this service we're calling Interact.
J
We borrowed some of the code that Jess has in nbgrader and some other places for how to implement a hub-side service, so that when the students click on the link, the link specifies which GitHub repository the notebook can be found in, and the hub service pulls it down into the student's account. It'll also start up the student's server if it's not already running. The downside is that not all of these connector course instructors are familiar with git.
J
They may be history instructors or English instructors, so they may not have training in how to use git, and we're looking at other ways that we can distribute notebooks to our students. The other issue is what happens when these notebooks that are being distributed change and the students have made local changes: how to sync them up. There's the pgcontents contents manager; I think it's a cell-based notebook storage, so one could alter some cells and not others.
J
I don't know if, well, I'm not going that route. What we have looked at most recently is Jupyter Drive, so that would protect us to some extent if we did have multiple hubs, since it gives users the same storage via Jupyter Drive. Berkeley integrates with Google Apps, so everybody already has a berkeley.edu Google Drive. We just brought this online this last week, I think, and we'll need to do some testing to see how it operates in terms of local data file access, sharing, and that sort of thing.
J
It will just require some testing. One interesting thing is that we can potentially integrate with the Google Drive UI, so somebody could have a notebook in their Google Drive and then they could say "open with" and it'll open on our hub, so that's something we may explore. One question that the faculty have said
J
is important is how we get an ipynb that can be hosted anywhere on the Internet into our hub. I mean, we could pre-populate these notebooks on the system ahead of time, but it's important that they could specify a URL or some such thing and automatically, easily import it into the hub. So, Min's bullet points about content sharing.
C
C
A
Fernando here: yeah, that solution is actually quite seamless. It has the constraint that the content has to be on GitHub, but it's a very good starting point. I've seen it from the user side and it's completely seamless, and actually, because it's just URL-based, if you realize, oh, I need this other thing, you just go and hack the URL and give it a path, the reference on git, whatever they call it. I think it's on the Internet; I'll
J
send you a paste in the chat, maybe. We're actually looking at redoing some things with it, a version 2.0 for this semester, but the previous version is pretty straightforward. It's a Tornado, no, it's not a Tornado app, I forget what it is, but because of this new hub-side service feature that's been created recently, we're going to rebase on that and provide a little bit more feedback for when the git pull takes a while; we need to provide some feedback for that. But yeah, it works
J
well enough for us to use. Well, thanks. Sure. So, because we had multiple instructors, some of them were fine with the single-user Docker container environment, while others requested additional libraries to be installed, and so we needed to find a way to enable those faculty to deploy and test that software. And because a single-user server doesn't give students the ability to become root, it doesn't give the instructors that ability either.
J
So basically, we've set up our system-user containers so that, if people want to, they can run them on their local computer using Docker Beta, or I suppose any version of Docker, so that they can pull that system-user container down, have the same environment as in the course, and install libraries and try things out. I'm sort of thinking about giving people the ability to choose which image they want to run on the hub.
J
A student may only have one choice of which container to run, but for an instructor it might be handy to be able to start the student-type server, or a variant that gives them more privileges. So I'm playing around with this thing I'm calling an image spawner, which basically just gives you a form to choose which Docker image you want to start up.
J
Do we stick with Swarm or do we look at Kubernetes? Yuvi, who's at Wikimedia, I had a chat with him and he seemed very excited about what Kubernetes can do and its ability to scale onto multiple nodes, so we want to be able to look at that. Right now we're pre-creating nodes rather than dynamically creating them on demand.
J
So it's a bit wasteful. And there's this question about how well JupyterHub itself scales. I think we're maybe one of the larger deployments out there, and in the JupyterHub docs it says that JupyterHub is for modestly sized groups of users, so I don't know if we're modestly sized. So I think we need to start collecting data about,
J
you know, how long it takes a container to start up, how long it takes for a user to log in, and whatnot. Min and Yuvi have added this statsd tooling, so we want to start collecting that, so that if we do have some sort of issue we can call out for help and we'll have data to show to the JupyterHub devs: this is the problem we're seeing, and this is how the hub is behaving. Future directions:
J
Right now we're just going to keep on going with the hub, and if we have more users we're going to see how to deal with it; we'll try to knock down the issues as we come across them. There are these newer notebook-as-a-service kinds of efforts emerging; Microsoft has its notebooks.azure.com service, where we don't have to create the infrastructure, the Docker, the Docker Swarm, and so on, but we don't have the control of the server side that JupyterHub gives you.
J
J
So I was talking really fast; I hope I've covered everything. The links we have here: the last one is the textbook by John DeNero and Ani Adhikari, and the first two are information about the data science education program. I didn't go into too much detail about our deployment, again because we pretty much copied Jess's, just tweaking it for the local service provider. And that's it.
A
Thanks, Ryan, that was great. Timing-wise, we've basically eaten into all of our schedule slack and we're already, technically, three minutes into the ten-minute break. So maybe one quick question; otherwise I would stop the recording to give people a chance to take a quick break, and then we would resume in a few minutes.
J
So, I'm not heavily involved in the pedagogy; John DeNero and others may be able to chime in more, but they have interesting projects where they're using data science. In history, I think they were doing an evaluation of the slave trade; in English, there are certain things that can be ascertained about something as simple as the words in a book, what information you can extract from just the raw text.
J
A
I'd add that the short answer is to go to the Berkeley data science education link, and you can see, if you click on the connector courses, that there are actually a number of courses in the digital humanities and in other areas: there's a data and ethics course, there's a literature and data course, there's a "how does history count" course.
A
So there are a number of connector courses in the education program that come from the humanities and the social sciences, fields that traditionally haven't been using these tools as much, and there is a fairly active ecosystem here at Berkeley of courses in those areas. The education program, with the foundational course being open to all incoming freshmen regardless of their declared major, really is aimed at a very broad population.