South Big Data Hub Data Sharing & Infrastructure Group, 11 Nov 2016

From YouTube: CI WG demo: GABBS (Geospatial Data Analysis Building Blocks)

Description

Date: 11/11/2016
Presenter: Carol Song
Institution: Purdue University
Midwest Big Data Hub

A

I'd like to introduce Carol Song. Carol is a senior research scientist and director of the Scientific Solutions Group at the Rosen Center for Advanced Computing at Purdue University. Her current research areas include distributed computing, advanced computing and data infrastructure, and science gateways. Carol has been leading Purdue's HPC program since 2007, served as PI of the TeraGrid resource partner project, and is currently the PI of Purdue's partnership in the XSEDE project.

A

She has a leading role in a number of NSF-funded interdisciplinary projects, including data interoperability by team, SDC, eye-fi and, most recently, a DIBBs project, Geospatial Data Analysis Building Blocks (GABBS). Carol received her PhD in computer science from the University of Illinois at Urbana-Champaign. I'm personally excited about this because GIS is my area. So thank you very much. All right, Carol, take it away.

B

Okay, hi everybody! Can you see my slides? Okay, great, because I'm always confused by my two-screen setup. So I'm going to give a quick overview of GABBS. GABBS stands for Geospatial Data Analysis Building Blocks. I probably have more slides than I can talk through; I just included some of them for the benefit of people who want a copy. So I'll talk about the GABBS architecture, give some examples, and talk about where we're at.

B

Listening to Kenton, I think we're probably all going through the same things, as we have been working with a lot of researchers over the years. We did a lot of these types of applications, and most of them were custom development. Even just putting datasets online with an interactive GIS-based interface, in a way that everybody can access, takes months, if not years.

B

So it came to a point where, you know, this is the type of request we ran into a lot: some scientists have models, they need to run simulations, and they update their input parameters and input data and run multiple times. These are sometimes pretty capable scientists who can do this computing, but they have a number of issues. For example, as they said, it ties up their laptop so they can't do other things, and they lose track of how many runs are done, where they put the results, and what they used for their parameters. Especially when it involves maps and things like that, they need a different set of skills. Also, a lot of people came to us when they started sharing their tools with other people, and they said, "How do I go about this?" There are a lot of aspects involved. And finally, when they have a paper published, they said, "How can I reference the software and set it up so other people can run it?"

B

So we get a lot of those. With that, we started the GABBS project, and the overarching goal is to lower the barrier: make it easier for people to visualize geospatial data and to use spatial data in their models and in whatever computation they're doing, and to do it in an open-source, community-driven fashion.

B

So what we've done in the project is build a geospatially enabled, integrated, self-service collaboration platform.

B

So this is a different approach from what Kenton presented: we're building an end-to-end platform for people to work together. The results we're hoping to see are broader participation in GIS-type data analysis, quicker dissemination, and support for some of the needs in the classroom. Our user community is basically anybody who needs to deal with spatial data, especially sharing it and making it available to other people. Specific to this project, the scientists we're working with are from hydrologic modeling and climate impact, people who deal with large amounts of weather data and disaster-related data products, and we're working with social scientists as well.

B

So our approach is really to build on an existing platform. Many of you are probably familiar with HUBzero, which has been funded by NSF for many years. It initially started by providing a platform for sharing computational tools, connecting to HPC and high-throughput resources, and in recent years to clouds and other resources. It also provides an environment for groups and projects to collaborate, and more and more classes and out-of-classroom training are using this platform. So that's what we're building on. It gives us a baseline that has a virtualized, cloud-based execution engine behind the web server, connected to high-performance, high-throughput, and cloud resources.

B

On top of that, in this project in particular, we're adding geospatial data and computation building blocks that allow people to do these things on their own, without a lot of programming and without deep knowledge of GIS. All of that will eventually be part of HUBzero, to support a variety of applications and needs. So that's, at a high level, a few words about the award, which was back with the 2013 awards. We are basically doing development and also providing services to the community.

B

Excuse me, there's something here. Okay, all right. I have the website for the project there. I'm going to skip these slides because they have a lot of things on them. The main point I want to illustrate here is that all these orange blocks are the new things we're adding to HUBzero.

B

Of course, a lot of the HUBzero stuff isn't shown here: the CMS, authentication, access control, all the basic stuff that you've come to expect. We're really focusing on building the rendering engine for the geospatial data, and, as you see from the other CI projects, we're leveraging iRODS, Globus, and so on; I'm not going to show that one. So the specific goals are toolkits for people to quickly put together GIS-based applications, and an integrated data management environment with built-in geospatial data support.

B

Developers who have done these things know that there are a lot of pieces involved, and for domain scientists without those resources in their research groups, doing these things is a tremendous barrier. So we want to build this in end to end, with built-in support for geospatial data, and also data visualization builders that allow people to quickly visualize their spatial data with no or minimal programming. The goal is also to provide a production system that people can come and use, or to let them set up their own if they want.

B

We have various ways of providing that. We have a VM that is almost ready for download, so you can start using it right away. There is also the HUBzero open-source software package that people can install, and we are actively working on AWS instances, so people can just click one button and start a HUBzero Amazon instance right away. Let's see, I'm going to give some examples for the first three, just so you have an idea of what's there.

B

The first one is the toolkits. These are libraries or toolkits with back-end support that people can leverage. For example, this is one group we worked with that has a global data downscaling application: they have a screen for their model setup, they run the model, and the results are displayed. In this interface you're looking at, all the map rendering and the controls come from the toolkit, so they don't need to do any of that; they basically drop in a widget from our library. This is a Python-based map library that we created based on QGIS. There's another example with weather data: it goes to a big database to access the data for the time periods people select, the variables, and the geographic features they want, in this case something like Indiana district boundaries shown on the map. Users can also interact with the cells on the map and plot the time series for a given cell.
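As a rough illustration of the "click a cell, plot its time series" interaction described above, here is a minimal Python sketch. It is not GABBS code: the toy grid, the variable, and the names `cell_time_series` and `nearest_index` are hypothetical, and it assumes a regular lat/lon grid held in memory rather than the database the talk mentions.

```python
from datetime import date

# Hypothetical gridded weather variable (e.g. daily max temperature, deg C).
# TEMPS[t][row][col]: one 2-D grid per time step on an assumed regular lat/lon grid.
LATS = [40.0, 40.5, 41.0]     # cell-center latitudes
LONS = [-87.0, -86.5, -86.0]  # cell-center longitudes
TIMES = [date(2016, 6, 1), date(2016, 6, 2), date(2016, 6, 3)]
TEMPS = [
    [[24.1, 24.8, 25.0], [23.9, 24.5, 24.7], [23.2, 23.8, 24.0]],
    [[26.0, 26.4, 26.9], [25.5, 26.1, 26.3], [24.8, 25.2, 25.6]],
    [[22.3, 22.9, 23.1], [22.0, 22.6, 22.8], [21.5, 21.9, 22.2]],
]


def nearest_index(centers, value):
    """Index of the cell center closest to value."""
    return min(range(len(centers)), key=lambda i: abs(centers[i] - value))


def cell_time_series(lat, lon):
    """Return [(date, value), ...] for the grid cell nearest the clicked point."""
    row = nearest_index(LATS, lat)
    col = nearest_index(LONS, lon)
    return [(t, TEMPS[k][row][col]) for k, t in enumerate(TIMES)]


if __name__ == "__main__":
    # A click near (40.6 N, 86.4 W) maps to the middle cell of this toy grid.
    for t, v in cell_time_series(40.6, -86.4):
        print(t.isoformat(), v)
```

The returned pairs are what a plotting widget would draw as the cell's time series.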

B

Another thing is Rappture, which originally came out of nanoHUB. It's a very quick way for people to lay out their tool interfaces without doing much programming. In this case, the model was set up with input data and parameters, and the output raster dataset gets overlaid on a map at the end, in an interface that the application developers didn't have to mess with. It comes with all these controls that they can use straight out of Rappture.

B

You can tweak all the layers, the transparency, and all that. So those are examples of the toolkits we're putting out there, already available on the production site. As far as integrated spatial data support goes, we're basically doing all the things on this list, search, metadata extraction, and so on, as automatically as possible, and we're also providing data services to link tools to datasets and vice versa. For the metadata extraction, we created iRODS microservices to do those types of things, and also to put the metadata into a Solr index for search.

B

Here's an example of a workflow. People can create a project area for themselves: basically they quickly fill in a couple of fields on a form and start, and then they have their project space. There are different kinds of storage providers they can plug in. Right now we have iRODS, which has the geospatial support, and you can pull up the files; it's just like a file explorer.

B

You can pull up the metadata, edit it, and save it, or you can do a quick preview of multi-layer geospatial vector or raster data.
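To make the idea of automatic spatial metadata extraction plus search concrete, here is a small Python sketch. The talk says GABBS does this with iRODS microservices and a Solr index; this sketch uses neither, and the datasets and function names (`extract_bbox`, `build_index`, `search`) are hypothetical. It only shows the underlying idea: derive a bounding box per dataset, then answer "what overlaps this area?" queries.

```python
# Hypothetical stand-in for automatic spatial metadata extraction plus search.
# The talk says GABBS uses iRODS microservices and a Solr index; neither API is used here.

DATASETS = {  # made-up datasets: name -> list of (lon, lat) vertices
    "watershed_a": [(-87.1, 40.2), (-86.6, 40.6), (-86.9, 40.9)],
    "stream_b": [(-80.9, 28.0), (-80.6, 28.3)],
}


def extract_bbox(points):
    """Bounding box (min_lon, min_lat, max_lon, max_lat) of (lon, lat) vertices."""
    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    return (min(lons), min(lats), max(lons), max(lats))


def intersects(a, b):
    """True if two bounding boxes overlap; touching edges count as overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def build_index(datasets):
    """Extract a small metadata record per dataset, as an indexer might."""
    return {
        name: {"bbox": extract_bbox(pts), "n_vertices": len(pts)}
        for name, pts in datasets.items()
    }


def search(index, query_bbox):
    """Sorted names of datasets whose extent overlaps query_bbox."""
    return sorted(
        name for name, meta in index.items() if intersects(meta["bbox"], query_bbox)
    )


if __name__ == "__main__":
    index = build_index(DATASETS)
    print(search(index, (-88.0, 39.0, -85.0, 42.0)))  # a box around Indiana
```

A real index would store many more fields (projection, layers, time range), but the bounding-box overlap test is the core of a simple spatial filter.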

B

For these files, if the file type is associated with certain tools, you can do things like open the file with a particular tool; there could be multiple tools. In this case, it's something called MultiSpec, which is for handling and processing hyperspectral and multispectral remote sensing data, among other types of data. So without leaving and going to another place, they can just open the tool and the data appears in it. They can do manipulations there and save the results back to the project area. There are other tools that allow people to combine different datasets into a map, with layers they can control.

B

A lot of times, the final step is that people are ready with their dataset after working on it, so they start a new publication. The second screen here shows that you can select the files that go into this data publication, and eventually, when you push publish, there is an approval step that can be set up if you want a human in the loop. It's probably too small to see here, but there's a DOI here for people to cite the work as a dataset.

B

So that's an example of an end-to-end workflow, as we call it. Another example is what we call GeoBuilder, a tool for people to show off or share their data without doing any programming. For example, these guys took this device out on a pontoon boat, taking wind-related measurements: wind direction, wind speed, and all those related variables. It came out as a spreadsheet of about twenty to thirty thousand rows, and we were able to just import the data, and right away they see where they went. Then they can select points along the way, see different plots of the variables they selected, and share this type of interactive interface with other people. Again, this is just something that's there that people can use right away.
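The GeoBuilder example above boils down to importing a spreadsheet of point measurements and letting users select points and summarize variables. A minimal Python sketch of those two steps follows; it is not GeoBuilder code, and the column names, sample rows, and functions (`load_track`, `select_in_box`) are hypothetical, since the talk does not describe the actual spreadsheet layout.

```python
import csv
import io

# Hypothetical spreadsheet layout; the talk does not describe the real columns.
SAMPLE = """lat,lon,wind_speed_ms,wind_dir_deg
40.42,-86.91,4.2,90
40.43,-86.90,5.1,95
40.45,-86.89,6.3,100
40.50,-86.85,3.8,110
"""


def load_track(text):
    """Parse CSV text into a list of dicts with float values."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: float(val) for key, val in row.items()} for row in reader]


def select_in_box(points, min_lat, max_lat, min_lon, max_lon):
    """Points whose position falls inside the (inclusive) selection box."""
    return [
        p for p in points
        if min_lat <= p["lat"] <= max_lat and min_lon <= p["lon"] <= max_lon
    ]


def mean(values):
    return sum(values) / len(values)


if __name__ == "__main__":
    track = load_track(SAMPLE)
    picked = select_in_box(track, 40.42, 40.45, -86.92, -86.89)
    print(len(picked), "points, mean wind speed",
          round(mean([p["wind_speed_ms"] for p in picked]), 2), "m/s")
```

A rectangular selection is the simplest case; a lasso or polygon selection would replace `select_in_box` with a point-in-polygon test.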

A

Wait, just out of curiosity, where is that map from? Is that the Intracoastal Waterway?

B

Um, no. That's just a map; I think it is sort of a compound stream.

Somewhere in Florida; one of those rivers, yeah.

A

I was curious because it looks familiar.

B

Okay, it looks familiar, yeah.

A

I think it's called, I just...

B

For the Melbourne area.

A

Yeah, that's Melbourne, where I grew up. Anyway, keep going.

B

Okay. Right, so we're also using these tools with students. For example, this was a summer camp we did, and as part of it we did some lessons using tools that let students study flooding in a remote sensing image and calculate the extent of the flooded areas. You can see the picture.

B

They were pretty focused on trying to get the numbers, and they felt that now, when they see a satellite image, they can say something about it. These are also part of the things that we're doing now.
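As a rough sketch of the kind of calculation the students did, estimating flood extent from a remote sensing image, here is a small Python example. The talk does not say which method the lesson used; this assumes a simple water classification with the McFeeters NDWI, (green - nir) / (green + nir) > 0, on a made-up two-band scene with an assumed 30 m pixel size. The function names are hypothetical.

```python
# Hypothetical two-band scene (reflectance values) and an assumed 30 m pixel size.
# Water is flagged with the McFeeters NDWI, (green - nir) / (green + nir) > threshold.

def classify_water(green, nir, threshold=0.0):
    """Return a 0/1 water mask from green and near-infrared bands."""
    mask = []
    for g_row, n_row in zip(green, nir):
        mask.append([
            1 if (g + n) > 0 and (g - n) / (g + n) > threshold else 0
            for g, n in zip(g_row, n_row)
        ])
    return mask


def water_area_km2(mask, pixel_size_m=30.0):
    """Count water pixels and convert to square kilometers."""
    water_pixels = sum(sum(row) for row in mask)
    return water_pixels * pixel_size_m ** 2 / 1e6


if __name__ == "__main__":
    green = [[0.10, 0.30], [0.25, 0.05]]
    nir = [[0.30, 0.10], [0.05, 0.20]]
    mask = classify_water(green, nir)
    print(mask, water_area_km2(mask), "km^2")
```

Running the same classification on images from before and after an event, and counting only pixels that switched from land to water, is one way to separate flooding from permanent water bodies.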

B

Okay, and this is not ready right now, but it is what we're working towards. A data publication looks like this right now: you have an area to describe your data, you can put in snapshots, and you can see what files are there. You can actually check within the bundle here. What we really want to do, and it's in the works, is that not only can you download the dataset, but you can also view it, do quick previews of things like this, or do more extensive manipulation of and interaction with the data. So these are the things that we eventually want to be able to provide.

B

So here, the only thing I want to mention is that we are working with other projects to build interoperability. iRODS: we're using it, so we have a lot of experience and lessons learned, and we're sharing those with the iRODS team.

B

We're also working with HydroShare to be able to publish resources across the two library infrastructures, and also on launching tools from one site to the other. Right now we're using iRODS FUSE, and we're still working out the issues in terms of launching tools on each other's sites. Globus we're using in various tools, and we're also looking into how we can make it more integrated with HUBzero. With Brown Dog, we've talked about various things.

B

It seems to be a good match. Some of these are easy to do, like using some of the conversions. We're more interested in the data extraction to get certain information, so we don't have to do that part ourselves.

B

Let's see what else I have. In general, we have this website called mygeohub.org, which is supporting a number of large projects right now, but individual users can also go there and use it for free. We have been releasing various pieces on this website throughout the last year, and our first official release is coming out in December. Between December and next year, we're going to have incremental releases, and in addition to hosted services, we are also going to put out AWS instances and an open-source release of the software packages.

B

Right now the site has users at 6 1 s, ik solving users per year, and we have a lot more visitors who come to take a look than users who actually use tools or download datasets. There are many things that we would like to work on through places like the Big Data Hub and with other groups.

B

Obviously, this kind of work takes a large team. I left my team picture out, but it's roughly 15 people, including professional staff and students, and you can find more information on these websites. That's it.

A

Awesome. So, questions for Carol? And Carol, if you can hold that slide up, I'm going to enter your information into the notes so that people can reach you. Okay.

A

Is anyone still on the phone? If you would like to add something...

A

I was muted here. We're very interested in, again, trying to do some of these operations remotely.

B

Right, so yeah, that is one direction that we're going, and by "we" I mean the whole HUBzero enterprise. Right now HUBzero uses OpenVZ containers, but I know that the HUBzero team is running Docker. I don't know whether it's in production yet, but I know they're testing it, because there are a lot of requests for running things remotely, for example, to deploy some tools closer to where the data resides. So I think that is the direction we're going.

B

It's doable at some point.

A

Very good. Okay, does anyone else have questions? We still have maybe five minutes before we go on to the next speaker.

A

Mike, could you maybe comment on how this fits within the ecosystem we were talking about today? Sure, and I'm hopefully not too far behind to be able to address it, honestly, but I think there's a very nice interlocking of all the DIBBs projects. For example, HydroShare would equally be interested in the capabilities, I think, that are in GABBS. I don't know if there are any HydroShare people in the room, but we were just talking with Ray about HydroShare in the session previous to this. So we've really taken the perspective that, in combination, all of these various DataNets and DIBBs provide a really good basis for a technical architecture for the hubs, and since they are working on containerizing GABBS at some point, we want to get into how to essentially productize the linkages between these different components so that they are deployable as a stack, a kind of cafeteria strategy. Everything that's been discussed sounds like it is very much pointing towards that. I'm really interested in, as I think was maybe hinted, what we're talking about with Brown Dog:

A

Where does the computation take place? It seems like a really big question that everybody is trying to solve, right? And through what sort of mechanism: whether something is high-throughput, or something has to run in parallel with a lot of communication, for things like coastal hazards or speed, right? So now I'm rambling, but those are just the points I heard. Can anybody react to that? Does anybody have any thoughts they want to add?

B

So one comment I would have is that with HydroShare, we actually sketched out an approach for before we can containerize everything (not everything is container-ready): using iRODS to federate in some fashion, get the data over for reasonably sized data, and be able to launch tools from HydroShare.

B

So I think we're kind of waiting on something, I forgot exactly what, but probably either FUSE or NS and SS in iRODS, something like that. So hopefully we get to something, at least before the container version, for sure.

A

But you're kind of actively working on that with HydroShare right now, is that sort of fair? Yeah, yeah. Because, you know, we can kind of match in by working with HydroShare on our side, and maybe that's a way to make the ends all kind of complement each other.

A

Okay, excellent. Well, given the time constraints, and I know it's now getting late on a Friday afternoon, we're going to move on. Thank you, Carol. That was great.