Status SWARM Orange Summit, 1 Aug 2018

From YouTube: Model, express and connect data from traditional relational system to RDF

Description

An interesting take on Swarm use came on Day 3 from Ameer Ahmed. In his talk “Ontologies for structured data in Swarm”, he presented an approach to categorising structured data such as museum catalogues.

A

Hi everybody. This is a two-part presentation. The first part is about my experience taking data from relational database systems to graphs, specifically to linked open data. My part covers how we did it for a particular industry, the cultural heritage sector. The second part, from the next speaker, covers how ontologies can be applied in Swarm and Datafund.

A

Just a quick background: I've been in this space for about 14 years, specifically with Artstor, Ithaka and the Mellon Foundation. I started off with the CDWA Lite schema, which is a schema for describing a work of art. The problem was that we have museums all around the world, and they all have heterogeneous data sources and different systems.

A

So the goal was, with the help of Getty Research, to define one schema, one ring to rule them all, and, using their money, to go around the world installing harvesting software and harvest all that data into one global repository. We started off with the Metropolitan Museum. We converted them to this standard, and that was our initial trial. After that, I think some people from German museums liked our model and took, I guess, the leadership.

A

They decided to change the project, and it was called LIDO, which again continued to work with Getty to go around the world implementing standards for museums, so you have universal access. The problem back in those days was that there were no systems to catalog art history.

A

So the idea of Shared Shelf, a way to catalog images and media for academic institutions, came into play. We took that proposal to the Mellon Foundation, they accepted it, and we ended up creating Shared Shelf. This is still all background for the talk. Shared Shelf is a platform that allows you to catalog objects of art using not only standard text but also vocabularies, specifically vocabularies provided by Getty.

A

These are thesauri relating to artists, works, places and concepts. I'll talk about that in a bit more detail later on. We started this with about seven big academic institutions, and right now I think about 180 institutions across the world utilize Shared Shelf.

A

Just to give you an example, this is a record of Abraham Lincoln in Shared Shelf. You can see you have titles, variant titles in different languages. You have the creator, who actually made the sculpture. Again, this is your linked authority. A linked authority in this sense means that the data is curated and all of it is referenced, so it's the authority.

A

Similarly, the schema goes on down, containing a bunch of authority records, and people utilize this to catalog.

A

While we were doing that, we were working with Columbia University, and they had another project in mind, which was the Built Works Registry. They claimed it to be the biggest curated data source for the built environment; it comprised about 42 different source collections around the world. They picked, I think, strategic partners from India to Italy to America, so we had a very diverse set specifically relating to the built environment.

A

So we thought, okay, we will use this project as a guinea pig to expose the data in the linked open data environment. A quick review of how we started this project with our partner institutions: we converted the data into a format and ingested it into Shared Shelf, the BWR staff links it to various authorities, and then finally they push it out to the public, who can then utilize that record.

A

Again, this is all for scholarly purposes. Either an institution or an individual user could contribute data. An example here is someone researching Mayan temples who finds a particular image that hasn't been cataloged.

A

They upload it to the community, to the local layer. The editorial board reviews it and finally approves and publishes the data in the community space, where it can then be utilized by people like Liz, a high school student doing research on Mayan temples. This is just the workflow of an individual versus an institution, and how data is contributed in this way. So, yes, all of this is just a setup for linked open data.

A

What is this? Well, it's a cluster of data sets. The goal is to make data openly available in a particular format and to provide links among the various data sets, so it becomes machine readable. As of a few weeks ago, there were about 1,200 data sets. It started off in 2008 with a few hundred; when we were doing it, it was around 500 to 750, and now it's about 1,200.

A

So how do you actually enter this linked open data club? I mean, these are very basic things. Well, you need to use URIs to identify objects.

A

When someone looks up those objects, you need to provide the data in a particular format, specifically RDF here. And finally, you have to link your data to other people's data, so discoverability goes up. So these are the four principles that you have to follow. So how did we join it?
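
As a rough illustration of the steps just listed (name objects with URIs, describe them in RDF, link out to other data), here is a minimal sketch that writes N-Triples by hand. The `example.org` URIs, the record ID and the helper names `literal` and `describe` are hypothetical and do not come from the talk; the choice of `rdfs:label` and `rdfs:seeAlso` is also an assumption made for the example.

```python
# All URIs below are hypothetical placeholders, not identifiers from the talk.
BASE = "http://example.org/work/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def literal(text):
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def describe(local_id, label, external_uri):
    """Return N-Triples lines describing one object."""
    subject = f"<{BASE}{local_id}>"  # name the object with a URI
    return [
        f"{subject} <{RDFS_LABEL}> {literal(label)} .",     # describe it in RDF
        f"{subject} <{RDFS_SEE_ALSO}> <{external_uri}> .",  # link to other data
    ]


if __name__ == "__main__":
    for line in describe("123", "Eiffel Tower", "http://example.org/other/eiffel-tower"):
        print(line)
```

A real deployment would more likely use an RDF library and serve these descriptions over HTTP; the hand-written strings only show the shape of the data.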

A

Well, going back to what I showed you before, Shared Shelf was our main platform. So we created an ontology for Shared Shelf, then we mapped that ontology to our data schema, and then we exposed our data through a semantic portal, essentially endpoints to access that data. Then you register the data set: you register your endpoint URL, and you also provide your entire data dump to datahub.io. Then, linking. We obviously have to link this to make it work, right? So we have to identify who we link to.

A

The Getty vocabularies are obviously the number one choice here, because they're not only funding the project, it's also an extensive vocabulary used in art history, for naming, for geographic places. There are other links too, like GeoNames and DBpedia. And how do we match them? Well, this also happened in phases. The first part was quite simple because we had the ID of the record, so it meant creating a very simple triple: taking that ID and saying it is the same as this other ID.

A

Once you have generated the link from one entity to another, you then provide those links to the ontology owners on the other side, so they can ingest that data on their side and reference back to you.
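
The ID-based linking described above can be sketched as follows. The talk only says the triple states that one ID "is the same as" another; using `owl:sameAs` as the predicate is an assumption, and the URI patterns and the function name `same_as_links` are hypothetical.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical URI patterns; the real ones are not given in the talk.
LOCAL_PATTERN = "http://example.org/record/{id}"
AUTHORITY_PATTERN = "http://example.org/authority/{id}"


def same_as_links(id_pairs):
    """Build N-Triples for (local_id, authority_id) pairs.

    Returns two lists: links to publish with our own data, and the
    reverse links that could be handed to the vocabulary owner.
    """
    outgoing, incoming = [], []
    for local_id, authority_id in id_pairs:
        local = LOCAL_PATTERN.format(id=local_id)
        remote = AUTHORITY_PATTERN.format(id=authority_id)
        outgoing.append(f"<{local}> <{OWL_SAME_AS}> <{remote}> .")
        incoming.append(f"<{remote}> <{OWL_SAME_AS}> <{local}> .")
    return outgoing, incoming


if __name__ == "__main__":
    ours, theirs = same_as_links([("42", "7001")])
    print(ours[0])
    print(theirs[0])
```

Keeping the reverse links in a separate list mirrors the step where the links are given to the owners on the other side so they can reference back.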

A

This is a quick review of what the XSD is. Over here in the middle, we have a work record. Again, this is how you catalog objects of art. So this is your container model, and this is your display record. For example, if you have Chartres Cathedral or the Eiffel Tower, that will be your work, and the various views of that building or object will be your display records. This part is fixed.

A

This part is somewhat flexible, meaning we allow you to model it any way you want. In this scenario, there's a part-whole relationship between these two. So this is our starting point, essentially. This is your relational system, and we have our data in this format.
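
To make the work record and display record model concrete, here is a small sketch using Python dataclasses. The class and field names are hypothetical, and treating the part-whole relationship as one work containing another is an assumption, since the slide being pointed at is not visible in the transcript.

```python
from dataclasses import dataclass, field


@dataclass
class DisplayRecord:
    """One view (for example a photograph) of a work."""
    record_id: str
    title: str


@dataclass
class WorkRecord:
    """Container for a work, its views, and any component works."""
    record_id: str
    title: str
    displays: list[DisplayRecord] = field(default_factory=list)
    parts: list["WorkRecord"] = field(default_factory=list)

    def all_displays(self):
        """Collect the views of this work and of every part, depth first."""
        found = list(self.displays)
        for part in self.parts:
            found.extend(part.all_displays())
        return found


# Illustrative records only; IDs and titles are made up for the example.
cathedral = WorkRecord("w1", "Chartres Cathedral")
cathedral.displays.append(DisplayRecord("d1", "West facade"))
portal = WorkRecord("w2", "North portal", displays=[DisplayRecord("d2", "Detail")])
cathedral.parts.append(portal)

print([d.record_id for d in cathedral.all_displays()])  # prints ['d1', 'd2']
```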

A

How do we convert it to go into LOD? Well, there's a transformation process from this XSD that I showed you into an ontology, which is the Shared Shelf ontology that we are going to use. Once we have that conversion, we can generate instances based on the data. These are just some examples of how we actually converted the XSD to its counterpart in the ontology. I'm just going to fly through that. And that's an example ontology: the same XSD that I showed you before is right here.
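
One simplified way to picture that conversion: each element name in the schema is mapped to a property in the ontology, and each XML record then yields an instance with one triple per mapped element. The sketch below assumes a hypothetical namespace, element names and record layout; the actual Shared Shelf mapping is not shown in the transcript.

```python
import xml.etree.ElementTree as ET

# Hypothetical ontology namespace and element-to-property mapping.
ONT = "http://example.org/ontology#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ELEMENT_TO_PROPERTY = {
    "title": ONT + "title",
    "creator": ONT + "creator",
    "location": ONT + "location",
}


def record_to_triples(xml_text, base="http://example.org/record/"):
    """Turn one XML record into N-Triples lines for an ontology instance."""
    root = ET.fromstring(xml_text)
    subject = f"<{base}{root.attrib['id']}>"
    # The root element name becomes the class of the instance.
    triples = [f"{subject} <{RDF_TYPE}> <{ONT}{root.tag}> ."]
    for child in root:
        prop = ELEMENT_TO_PROPERTY.get(child.tag)
        text = (child.text or "").strip()
        if prop is None or not text:
            continue  # skip unmapped or empty elements
        value = text.replace("\\", "\\\\").replace('"', '\\"')
        triples.append(f'{subject} <{prop}> "{value}" .')
    return triples


SAMPLE = """
<DisplayRecord id="42">
  <title>Wall Street building</title>
  <location>New York</location>
  <note>not mapped in this sketch</note>
</DisplayRecord>
"""

if __name__ == "__main__":
    for line in record_to_triples(SAMPLE):
        print(line)
```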

A

So it's a different view of it, and this is your display record. Again, the concept of work record and display record is key to understanding these components. Now an actual live example. This is your Shared Shelf system, with a display record, kind of very flat, as you can see. There's an ID right there. You publish through the website; the website is using an Omeka plug-in, essentially, so it's all automated, and then it publishes into this website. You can obviously visit it.

A

This is all open, and once you click on the semantic view, that's where you see that data in its RDF format, and from here we can actually visualize it. You can see the project is the Built Works Registry. It contains the Wall Street building, which is linked to the New York record, and from New York you can see other records that are linked to New York. Then, going over to the Getty vocabulary, you can actually navigate that thesaurus using broader terms and narrower terms.

A

You can go to New York County, or you can go down to Brooklyn. Again, it's a classic way of mapping to a thesaurus and then using the thesaurus to discover data in your own data set.
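
The idea of using broader and narrower terms to discover records can be sketched with a toy hierarchy. The three-level place hierarchy, the record IDs and the function names below are made up for illustration and are not taken from the Getty vocabularies.

```python
# Toy hierarchy: each term points to its broader term.
BROADER = {
    "Brooklyn": "New York City",
    "Manhattan": "New York City",
    "New York City": "New York State",
}

# Toy catalog: each record is linked to one place term.
RECORD_PLACES = {"rec1": "Manhattan", "rec2": "Brooklyn", "rec3": "Paris"}


def narrower_terms(term):
    """Return every term below `term` in the hierarchy, transitively."""
    found = set()
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for child, parent in BROADER.items():
            if parent == current and child not in found:
                found.add(child)
                frontier.append(child)
    return found


def find_records(term):
    """Find records linked to `term` or to any of its narrower terms."""
    wanted = narrower_terms(term) | {term}
    return sorted(rid for rid, place in RECORD_PLACES.items() if place in wanted)


if __name__ == "__main__":
    print(find_records("New York State"))  # prints ['rec1', 'rec2']
    print(find_records("Brooklyn"))        # prints ['rec2']
```

Searching on a broad term picks up records cataloged under narrower ones, which is the discovery benefit described above.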

A

This is just a quick overview of all the various components of Shared Shelf itself. It was a fairly large project: about 100 people worked on it over the course of 10 years, and there are various tools that came into play. This is the one section that we are talking about today, the one that allows you to export data into a linked open data environment.

A

Over the course of a year we did about 8 releases. We tried to do one release a month, but it didn't really work out that way. It was essentially taking baby steps: starting with something very small, a simple flat data set, which is a display record pointing to a local instance of a vocabulary, and ending the year with full-blown HL support. What does that mean?

A

It means that now, for any type of vocabulary you connect to, when you publish your data, it actually goes into the linked open data space with links to all the different cloud sets. One final point I want to make here: we also allow the project itself to become a vocabulary. That means you could catalog, say, I don't know, a list of temples, and if you wanted to, that could become the temple vocabulary, which could then be opened up and used by other people.

A

I'm going to talk briefly about data enrichment, just how that process works. Imagine you have a very simple record: "Eiffel Tower" as the title text, and the location is Paris. You take an export of this and import it into LODRefine. LODRefine is a project that allows you to work with messy data and also has links to various vocabularies in the LOD space. So we take that data and we define a particular data source, in this case DBpedia and the Thesaurus of Geographic Names.

A

Once you reconcile, you can see it picks up a match to that resource, and then we go back to that resource and get additional information. So once you've made the link to Paris in TGN, we can get all the variant names. Actually, sorry, that's the wrong one.

A

Once you've made the link to the Eiffel Tower, you can get all the variant names of the Eiffel Tower. This is just one example. Once you've got all the variant names, you can upload that data back into Shared Shelf. This is the use case for linked open data: being able to search your own data with other people's information.
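
The reconcile-then-enrich loop described above can be sketched without any external service. The tiny in-memory vocabulary, its URI and the function names are hypothetical stand-ins for a real reconciliation source, and exact matching on normalized names is a simplification of what a reconciliation tool does.

```python
# Hypothetical vocabulary entry standing in for an external authority.
VOCABULARY = {
    "http://example.org/vocab/eiffel-tower": {
        "label": "Eiffel Tower",
        "variants": ["Tour Eiffel", "La tour Eiffel"],
    },
}


def normalize(text):
    """Lowercase and collapse whitespace so near-identical names compare equal."""
    return " ".join(text.lower().split())


def reconcile(title):
    """Return the vocabulary URI whose label or variant matches the title."""
    wanted = normalize(title)
    for uri, entry in VOCABULARY.items():
        names = [entry["label"], *entry["variants"]]
        if wanted in {normalize(name) for name in names}:
            return uri
    return None


def enrich(record):
    """Copy the record, adding the matched URI and its variant names."""
    enriched = dict(record)
    uri = reconcile(record["title"])
    if uri is not None:
        enriched["same_as"] = uri
        enriched["variant_names"] = list(VOCABULARY[uri]["variants"])
    return enriched


def search(records, query):
    """Find records whose title or variant names match the query."""
    wanted = normalize(query)
    hits = []
    for record in records:
        names = [record["title"], *record.get("variant_names", [])]
        if wanted in {normalize(name) for name in names}:
            hits.append(record)
    return hits


if __name__ == "__main__":
    catalog = [enrich({"title": "Eiffel Tower", "location": "Paris"})]
    print(len(search(catalog, "tour eiffel")))  # prints 1
```

Before enrichment, a search for the French name would miss the record; after enrichment it is found, which is the point made in the talk.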

A

And this is just one last diagram, which shows how we actually make our project a controlled vocabulary. Essentially, what it comes down to is that every single project can be deployed in this environment and ends up with an endpoint. You can then share this endpoint with anybody who wants to reference your project. And that's it.