National Energy Research Scientific Computing Center (NERSC) Jupyter Community Workshop June 11-13, 2019, 11 Jun 2019

From YouTube: 6. KBase and Jupyter

Description

June 11, 2019 Jupyter Community Workshop talk by Bill Riehl, Lawrence Berkeley National Laboratory

A

My name is Bill Riehl. I work here at Berkeley National Lab, and I'm going to talk about the DOE Systems Biology Knowledgebase, which uses the Jupyter Notebook as one of the main front ends for users to come in and do their analytical work. So first, what is KBase? KBase is a knowledge creation and discovery environment for both biologists and bioinformaticists. The more comprehensive version of what that means, for any biologist:

A

Anybody who has biological data can come into the KBase system, upload their data, use our resources to run a number of different analyses and pipelines on it, check out the results, interpret those results, document that interpretation, and share, including the notebook that they would use to do

A

This analysis work and the data itself, with other users. So the short version of the pipeline, the workflow that one would use, is that a user would start with data. And this being a Department of Energy project, a Berkeley National Lab project joint with a number of other national labs,

A

It's focused around environmental interactions: environmental biology, microbial biology, plant biology, and microbial communities. So users can come with data from DOE user facilities, or even things that they have garnered from NCBI, and upload it into our system. And that's not just as simple as throwing data files that they have on their laptops into a web page. What we mean by data integration at this step is that KBase is built around a pretty strong, centralized data model.

A

So if somebody comes in with a series of reads that come off of a DNA sequencer, it's not just uploaded as a file but transformed into a reads object that's stored and that's strongly typed. And the reason for that strong typing, and I'm not really going to have time to get into this in too much detail,

A

Is that we really want to build this as a knowledge base, so a way to relate data of different types that comes from different experiments and from analyses. We integrate all of that data together and try to build knowledge, and try to be able to build predictive biology out of all that. It's a pretty lofty goal, and after some time of getting the basics down, we're finally making headway towards that. But I won't have much time to get into that today.
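The idea of a strongly typed reads object, as opposed to a bare uploaded file, can be illustrated with a minimal sketch. The type name, field names, and validation rules below are assumptions made for illustration; they are not KBase's actual data model.

```python
from dataclasses import dataclass


# Hypothetical type and fields, for illustration only; not KBase's real schema.
@dataclass(frozen=True)
class PairedEndReads:
    name: str
    forward_file: str
    reverse_file: str
    read_count: int
    sequencing_tech: str

    def __post_init__(self):
        # Validation runs at creation time, so a malformed upload never
        # becomes a stored object.
        if self.read_count < 0:
            raise ValueError("read_count must be non-negative")
        if not self.forward_file or not self.reverse_file:
            raise ValueError("paired-end reads need a forward and a reverse file")


def import_reads(name, forward_file, reverse_file, read_count, sequencing_tech):
    """Turn loose upload parameters into a validated, typed object."""
    return PairedEndReads(
        name, forward_file, reverse_file, int(read_count), sequencing_tech
    )


reads = import_reads("soil_sample_1", "s1_R1.fastq", "s1_R2.fastq", "125000", "Illumina")
print(type(reads).__name__, reads.read_count)
```

Because every object carries a known type, downstream tools can check whether an input is acceptable before any work starts, which is one plausible reason to prefer typed objects over raw files.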

A

We can chat in the breaks if you like. Anyway, once data gets uploaded and integrated, the next step is to do what we call collaborative analytics, where users can come in and use apps and tools that we provide, or even provide their own apps and tooling to the KBase system, to do analysis of data. The way that this is done is through a fairly heavily modified version of the Jupyter Notebook.

A

The tools themselves are what we call apps, which are run by executing a cell. These get run through a job execution system; we're using HTCondor.

A

Again, all of this together, I just want to reiterate: the data that a user has uploaded and transformed to the data model, as well as the notebook itself and the analysis itself, and even the job status and job documents themselves, get bundled together and become a shareable unit in KBase. So if I upload things and work on them, I can share them with Shane or with Rowling or anybody here, who can also, at the same time, perform different analyses.

A

Take notes in that, and I'll be alerted when those are updated. We've also recently introduced a way to build up user groups, and a broader way, not just within the notebook but throughout the whole system, to manage data and workflows. [inaudible] before, and I know there are other people struggling with the concept, or struggling with the practice, of being a truly FAIR data sharing organization.

A

As mentioned, we have a series of apps that can be run on a gamut of biological data. We have about 200 right now, and it's very much not a closed system. KBase is a very open platform, so for anybody who has an app, an external app, or anything that they want to be able to use on our resources or integrate with our data model, there's an open SDK that's available to use. It's really made for the community. Now, a little bit about the architecture behind KBase.

A

So our interface, which we call the Narrative Interface, is built on the Jupyter Notebook and binds the data, the apps, and the analysis together. Behind all that are core services. The main data service that might be of interest is sitting on MongoDB, and that itself sits on top of a blob store, where the larger chunks of data like metagenome reads and so on are stored, as well as user and reference data.

A

Interestingly, we also store the notebooks themselves, not just as files in the system but as data units, and where do they wind up? In the blobs here, in the blob store.
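The split described here, a metadata service sitting on top of a store for large chunks of data, with notebooks saved as objects like any other data, can be sketched as follows. This is a toy model under stated assumptions: both stores are in-memory dictionaries, and the class names, record fields, and type strings are hypothetical rather than KBase's actual services.

```python
import hashlib
import json


class BlobStore:
    """Stand-in for the bulk data store: content-addressed bytes."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        blob_id = hashlib.sha256(data).hexdigest()
        self._blobs[blob_id] = data
        return blob_id

    def get(self, blob_id: str) -> bytes:
        return self._blobs[blob_id]


class ObjectStore:
    """Stand-in for the metadata service: small typed records pointing at blobs."""

    def __init__(self, blobs: BlobStore):
        self._blobs = blobs
        self._records = {}

    def save(self, name, type_name, payload: bytes, metadata=None):
        record = {
            "name": name,
            "type": type_name,  # hypothetical type label
            "size": len(payload),
            "blob_id": self._blobs.put(payload),
            "metadata": metadata or {},
        }
        self._records[name] = record
        return record

    def load(self, name):
        record = self._records[name]
        return record, self._blobs.get(record["blob_id"])


store = ObjectStore(BlobStore())

# Sequencing reads and the notebook document go through the same path.
store.save("soil_sample_1", "PairedEndReads", b"@read1\nACGT\n+\nFFFF\n")
notebook = {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 2}
store.save("my_narrative", "Narrative", json.dumps(notebook).encode("utf-8"))

record, payload = store.load("my_narrative")
print(record["type"], json.loads(payload)["nbformat"])
```

Keeping only small records in the metadata layer means listing and searching objects never has to touch the large payloads.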

A

The next bit is the execution engine. Once a cell that contains an app is clicked or run, that gets fired off and sent to the execution engine, which runs asynchronously and alerts the notebook when it's finished, so then the notebook will update itself. The developer interface I also won't touch on too much, but we do have a set of SDK tools for adding your own apps to the system, and it's pretty open and free to use.
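The fire-and-forget pattern described above, where a job is submitted, runs asynchronously, and the notebook is alerted on completion, can be sketched with Python's standard library. A local thread pool stands in for the real execution engine here, and `run_app`, the app id, and the `cell_state` dictionary are hypothetical names used only for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_app(app_id, params):
    """Hypothetical long-running app; the sleep simulates real work."""
    time.sleep(0.1)
    return {"app_id": app_id, "status": "completed", "output": params["output_name"]}


# Stand-in for what the notebook front end displays for one cell.
cell_state = {"status": "queued", "result": None}


def on_finished(future):
    # Invoked when the job finishes; this is the "alert" that lets the
    # cell update itself without polling in the foreground.
    cell_state["result"] = future.result()
    cell_state["status"] = cell_state["result"]["status"]


with ThreadPoolExecutor(max_workers=1) as engine:
    cell_state["status"] = "running"
    future = engine.submit(run_app, "assemble_metagenome", {"output_name": "assembly_1"})
    future.add_done_callback(on_finished)
    # The caller is free to do other work here while the job runs.

# Leaving the "with" block waits for the job, so the callback has run by now.
print(cell_state["status"], cell_state["result"]["output"])
```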

A

So you could just plug it right into the app catalog, and then it'll become available for anybody who would want to use your app. I'll do a brief demo here. I think some of the concepts make more sense when you really see them in action. So I have one up right now. This is the tutorial I have on JGI's metagenome assembly pipeline, which we have wrapped up as a single app in KBase, and this is a Jupyter notebook. It's pretty heavily modified.

A

We have a different set of templates that work on the front end that give us things like this data panel here. So this is the set of data objects that are associated with this narrative. Each one of these will tell you what its data type is. This one's an RNA-seq alignment; this one's a paired-end reads library. Opening it up will give you some options and tell you quite a bit about the metadata.

A

You can also drag and drop on here, which I won't do right now, since I don't want to eat up the network, but that will just automatically create a cell. Actually, yeah, I'll gamble, let's gamble. So this will pop a cell in place that, once this loads up, will show you a little bit about what that data object is and give you some details. And I just want to emphasize a little bit: we're not making up any new cell types here. This is a standard Jupyter notebook.

A

It just has a different interface on top of it. So even though this looks a little bit different, it is really just a code cell that gets executed and shows the result below, which is, in this case, a JavaScript widget. Oops.
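The point that no new cell types are involved can be made concrete with the notebook file format itself. In nbformat 4, every cell carries a free-form `metadata` dictionary, so a front end can decide how to render a cell while the cell remains an ordinary code cell. In the sketch below, the `kbase` metadata key, its contents, and the `show_data_object` call are assumptions for illustration, not the Narrative's real schema.

```python
import json

# A standard nbformat 4 code cell. Only the contents of "metadata" and
# "source" are hypothetical; the five top-level keys are the regular ones.
data_cell = {
    "cell_type": "code",
    "execution_count": None,
    "metadata": {"kbase": {"kind": "data_viewer", "object_name": "soil_sample_1"}},
    "outputs": [],
    "source": "show_data_object('soil_sample_1')",
}


def is_plain_code_cell(cell):
    """True if any standard notebook tool would treat this as a code cell."""
    return cell["cell_type"] == "code" and isinstance(cell["source"], (str, list))


def ui_kind(cell):
    """Which custom interface a modified front end might draw for this cell."""
    return cell["metadata"].get("kbase", {}).get("kind", "plain")


print(is_plain_code_cell(data_cell), ui_kind(data_cell))
print(json.dumps(data_cell, indent=1)[:40])
```

A stock Jupyter front end would ignore the extra metadata and show the source as regular code, which is what keeps such notebooks standard.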

A

Scrolling down a little bit, when the user gets to the point that they would want to run an app, there's what we call an app cell, which again is just another code cell, and this just gives a different interface for a user to create code. So if you're a power user, like probably everybody in this room, you can just enter code directly and execute it, but a number of our external users that are pure biologists or pure bench biologists,

A

Experimental biologists, aren't necessarily interested in writing the code themselves. So we also provide an interface where this cell becomes very well aware of what data is available in this narrative. So you can pick and choose what you want to run on for your inputs, decide what the output should be, and then just hit run, and that'll execute. And, cooking-show style, this would be the result of that. So there are two other things that become active here.
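An app cell that builds code from form selections, while knowing which data objects exist in the narrative, might work along these lines. This is a sketch under assumptions: `build_app_call`, the generated `run_app(...)` call, the app id, and the parameter names are all hypothetical and do not reflect the Narrative's real code generation.

```python
import ast


def build_app_call(app_id, data_inputs, options, output_name, available_data):
    """Generate the code string a form-driven app cell could place in a code cell.

    data_inputs maps parameter names to data object names; each one must
    already exist in the narrative (available_data).
    """
    missing = [obj for obj in data_inputs.values() if obj not in available_data]
    if missing:
        raise ValueError(f"not in this narrative: {missing}")
    params = dict(data_inputs, **options, output_name=output_name)
    return f"run_app({app_id!r}, {params!r})"


narrative_data = {"soil_sample_1", "soil_sample_2"}
code = build_app_call(
    "assemble_metagenome",
    {"reads": "soil_sample_1"},
    {"min_contig_length": 2000},
    "assembly_1",
    narrative_data,
)
print(code)

# The generated text is ordinary Python source, so it parses like any code cell.
tree = ast.parse(code)
```

Generating plain source text keeps the result inspectable: a power user can read or edit exactly what the form produced.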

A

So the job status gives you some clues about what the job itself is doing. This is really just the raw log that comes off of the execution server, and the final result of that job would show up in another tab here.

A

In this case, this is a report on what happened after running the metagenome assembly. And finally, these new data objects that were created don't just live in the narrative itself, or in the notebook, but become serialized and live in the KBase data system, and become available and pop up over here.

A

So that's the short version of that.

A

Finally, what's next for us. That was all living on the Jupyter Notebook right now, and actually a little bit of an older version of it as well, but we're all pretty excited about JupyterLab, especially with 1.0 coming out very soon. Congratulations, guys! So one project that we have in mind is that we want to adapt what are currently a series of nbextensions into a series of JupyterLab extensions. We've started that work, and some of it will be challenging, so I might be bugging some of the Jupyter folks that are here today.

A

We're also transitioning over to using JupyterHub for container management for notebooks, and for various other things. We have a custom system right now that spawns notebooks. Every notebook, or narrative (I'll probably stick with the name narrative), that you see in the Narrative Interface is a Docker container that's running for each individual user, and we'll be transitioning that over to be using JupyterHub. I keep tripping over those.

A

And finally, the next thing that we want to do is focus more on running jobs in batch, so using Jupyter, and a lot of the tools that have been brought up today already, to connect job running with the HPC access we have here at NERSC, and with apps that we have in the pipeline that are really designed to work on larger clusters, rather than on just a couple of nodes, which is the majority of the apps we have.

A

So that's the brief introduction to KBase.

A

Partly, I think it will be. Well, there are a few things we've talked about. One of the great things about JupyterLab, I think, is that everything is built in as an extension, so we can really build a flavor of JupyterLab that will look more or less like what the narrative looks like now. I mean, we already have some progress on that. I think I have that open, my little hacked-together version, so this is

A

We already have an extension that has, here's the list of apps that we have, here's the set of data that's available in a given notebook. Notebooks aren't really re-skinned yet. You have the file browser, which links to our data store [inaudible], so it's already kind of on the way towards looking like that. What is different,

A

You're right, is a lot of the other branding components, but my feeling is that JupyterLab is extensible enough that, while that will be quite a bit of work, I think it will mostly be doable. This is also an opportunity for us to update what our look and feel should be, and we've learned a lot of lessons over the past few years about what works and what really doesn't.

B

Design brief off to the lab.

C

Yeah, [inaudible]. Maybe also, on that other project, how many people are actually involved in the adaptation that

A

Are actually working on it right now? Yeah.

A

One and a half of us at the moment. This is pretty early stage.

B

I mean, are any other groups doing these kinds of, you know, JupyterLab extension kind of efforts? That might be another thing.

A

That'd be another fun breakout session.

A

Just go ahead and write a threat off.

A

Thank you all for letting me.