From YouTube: OpenActive W3C Community Call / 2020-06-17
Description
Improvements to RPDE
- The harvesting model has limitations
  * Difficult to gauge progress
  * Requirement to harvest from the beginning of time
  * Cannot query selectively
- Pagination data
- Parameterise the RPDE querystring
- Movable first page
A
So hello all, welcome to the W3C call for 17th June, and the topic today is going to be possible changes to the RPDE specification. The motive for this is pretty clear.
A
So I'm just going to share my screen here, so we can see the presentation.
A
No worries, no worries. For once I managed to start recording before everybody joined the call, so we're starting on time for once.
So, as I said, with RPDE the problem is that it's fairly slow, in the sense that you have to get all the data before you start working with it at all. And then a particular annoyance, simply on my part, but I suspect for data consumers as a whole, is that it's very hard to gauge progress.
A
So it takes a long time, and you also don't have any sense of how long you've got remaining. So, depending on the feed you're harvesting, it could be that you're going to be done harvesting in five minutes; it could take a matter of days in some cases of the larger feeds.
A
It's often the case you end up with a lot of less-than-relevant data, which could be data in the past. There's no requirement to delete obsolete or irrelevant data any more, so it can be the case you end up with a lot of stuff way in the past.
A
It's not possible to query RPDE selectively for, say, only a particular geographical location, a particular activity type or whatever. So we're just going to go over a couple of proposals for streamlining things a bit.
A
So in this scenario the client would be responsible for keeping track of how many items they'd already processed, what their progress was like, but at least they'd have a sense of where the end position was. Nick commented on this pretty recently, pointing out:
A
First of all, that this doubles the query load, in the sense that you have to make an additional query on the publishing side to support this: some kind of count query indicating precisely what the number of remaining items would be. And then, secondly, this ends up invalidating caching, of course, because if that number changes then the cache needs to be refreshed, so the efficiencies of caching with RPDE would be lost under that scenario.
A
The refinement proposed yesterday by Nick was to put this in the dataset site specification. So when you looked at a dataset site, in the JSON there would be an indication of the total number of items per feed.
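As an illustration of the idea, and not wording from any published spec: a per-feed item count could surface in the dataset site JSON-LD along these lines. The totalItems property name, the feed name and the URLs below are assumptions for illustration; the encodingFormat value is the standard RPDE media type.

```typescript
// Illustrative sketch only: a dataset site's JSON-LD with an assumed
// "totalItems" hint on each feed's distribution entry.
const datasetSite = {
  "@context": "https://schema.org/",
  "@type": "Dataset",
  "name": "Example Leisure Centre Sessions",
  "distribution": [
    {
      "@type": "DataDownload",
      "name": "SessionSeries",
      "contentUrl": "https://example.com/feeds/session-series",
      "encodingFormat": "application/vnd.openactive.rpde+json; version=1",
      // Assumed property: approximate count at render time, so harvesters
      // can gauge progress without a separate count endpoint.
      "totalItems": 12345
    }
  ]
};
```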
D
Yeah, I was gonna say this is a little bit over my head from a technical standpoint, but it seems to make sense logically.
C
Yeah, sorry, I was gonna say: definitely take the points that Nick has raised about optimization, because it would change the sort of performance and nature of the query quite a lot, especially relating to edge caching.
A
Right, okay. I guess the difficulty with the dataset proposal is that it's not really envisaged that the dataset site, as it stands right now, has to actually read the RPDE feeds, so the technical mechanism for populating the total items property is a bit unclear to me.
C
Oh, I can help with that. So the libraries that we currently have for dataset site generation are all dynamic, so you give it some properties. I guess it's designed dynamically primarily for the use case of the kind of white-label solutions.
C
You know, where you've got, like, a Gladstone, where you've got lots of different types of customers and they've all got their own dataset sites. And because it's all dynamically rendered right now, if your dataset site query, as well as querying the database for the organization name and everything else, also queried for the total number of records, which may be cached, then that would do it. For example, in Gladstone right now the dataset site is rendered from the database and is cached.
C
I think it's cached for 15 minutes, both on the server, which caches it in memory, and then using a CDN, if there's a CDN in front of it.
A
Right, okay. So then the only time that becomes a problem is if you've got a feed that takes a long time to consume: the number of total items actually could change significantly over the consumption time.
A
Kind of weird, but yeah, okay, doable. Better than ruining caching, I suppose, for the feed itself.
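A rough sketch of how a harvester might use such a hint to gauge progress follows. It assumes a totalItems value read from the dataset site, as above, and treats it as an estimate precisely because the total can drift while a long feed is being consumed.

```typescript
// Sketch: progress reporting during an RPDE harvest, given an approximate
// total from the dataset site. The last page of an RPDE feed is the one
// with zero items whose "next" URL points at itself.
async function harvestWithProgress(firstPageUrl: string, totalItemsHint: number) {
  let nextUrl = firstPageUrl;
  let processed = 0;
  while (true) {
    const res = await fetch(nextUrl);
    const page = await res.json() as { items: unknown[]; next: string };
    processed += page.items.length;
    const pct = Math.min(100, 100 * processed / totalItemsHint);
    console.log(`~${pct.toFixed(1)}% (${processed} of ~${totalItemsHint} items)`);
    if (page.items.length === 0 && page.next === nextUrl) break; // last page
    nextUrl = page.next;
  }
}
```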
B
To the relief of Charlie and Tom, it sounds like... I was gonna say, Tim, my only question, because I don't like to not understand things, although I probably will regret asking, is: what's the impact? Which is the nice general pointless question. But if we're Playwaze, which has no intention of taking, and doesn't take, any direct feeds from any booking systems, and we only intend to really use the imin feed on our own, what impact does that concept have in that case?
A
I suppose, if you're consuming... yeah, sorry. Because, Luke and Nick, imin offers an API integration, right? So RPDE is not something that Charlie would have to worry about on his end.
A
So it might help, I mean, a little bit, in the sense of planning out how long it'll take to consume a feed and that kind of thing. But if you're sitting behind that, it won't affect you at all. It's fine.
C
Thank you, thank you. Well, the other benefit is it doesn't break anything that's existing, because we already need to go around and update everyone's dataset sites when the new spec comes out anyway. So this isn't going to add any additional lobbying effort, or otherwise, to changing RPDE; if that was the thing that we need, it doesn't add any more than is already there.
A
Okay. And I suppose also, in terms of workflow, it's easy in that the dataset site specification still has to be written, so it's easy enough to add that line item in there. Okay, so I'll migrate that issue over to the dataset site specification repo then. I don't know if I described the next proposal in the best possible way. I don't think this one needs very much discussion either.
A
To be honest, looking at the thread, there already seems to be a lot of consensus around this. But the proposal is essentially, and Nick, please jump in if I'm mischaracterizing this, to allow harvesting to start not from the absolute beginning of the feed but essentially from now, meaning that you can start harvesting only opportunities that exist in the present or future, rather than having to pick up all the ones that have existed in the past.
A
So then the only point of debate was really about which approach to use. The difficulty is a little bit technical, in that creating that capacity to start harvesting from the moment the query is launched would, because of the way the specification is written, actually drop the first item in the feed.
A
But generally speaking, it seems, looking at the comments, everyone who could be bothered to comment seems to be keen on the first approach there. So even though it's a fairly, well, moderately significant change to the query itself, approach one seems like it's getting all of the votes right now.
C
Yeah, it was just because it makes the actual query a little bit more complicated. But I mean, I suppose, as it says, it's written as simply as it can be there, with the query with that extra line in the WHERE clause.
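For context, the baseline query from the RPDE specification's modified/id ordering strategy is sketched below; the table and parameter names are illustrative. The strict comparisons are what make the boundary item itself fall out of the results, which is the dropped-first-item edge case mentioned above. The inclusive variant in the trailing comment is purely an illustration of the kind of extra WHERE line being discussed, not the proposal's exact wording.

```typescript
// The RPDE "modified/id" ordering query, per the spec: a client's position
// is the (afterTimestamp, afterId) pair of the last item it saw, and the
// strict ">" comparisons exclude the boundary item itself.
const rpdePageQuery = `
  SELECT id, modified, data
    FROM opportunities
   WHERE (modified = :afterTimestamp AND id > :afterId)
      OR (modified > :afterTimestamp)
   ORDER BY modified, id
   LIMIT :pageSize
`;
// Illustration only: seeding a "start from now" first page at an existing
// item and keeping that item would need an inclusive boundary on the first
// request, e.g. (modified = :afterTimestamp AND id >= :afterId).
```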
C
But I suppose it's just that that query is the bit that's most often done wrong, and, because of the way that we test RPDE, which is just checking the invariants, it's going to be quite difficult to also test that this is done right without, for example, the test harness of the booking spec being enhanced to add... well, I suppose it already does do this, so using the testing of the booking spec.
C
You could do something like add an opportunity and check that it comes through, and then check that... so, check that this all works. But because of the kind of weird edge case around that first item, to actually check this properly you'd have to know the first item in the database, right, to be able to... or artificially insert the oldest item in a test suite and then check it came through.
C
So it's just really gnarly, you know, to actually validate this works, given that the biggest problem is with the query. But I think all of these involve some form of query changing, I think. So I suppose it's just, kind of, yeah, the lesser of the evils.
A
Yeah, and it's sort of inherent in the goal, isn't it? The testing becomes a bit more difficult, yeah, because it adds a variant, and it sort of can't help but add a variant.
A
You know, particularly with the larger feeds, I can imagine this taking processing time down from hours to minutes. Yes, sure, yeah. So that seems like a really, really valuable addition.
C
Sorry, yeah. No, no, carry on. I just remembered the reason that we didn't do the other stuff: it's because of the string constraint, that's right. So it has to work for every use case, because there is a simpler option available if you've got an ID which is not a string, which is what some of the other approaches were talking to. But I think it's fair to say a lot of people use strings as IDs, because they've got GUIDs involved. So...
A
Yeah, yeah. And then there are even assumptions being made about the ordering of the IDs on that one as well, aren't there? So, yeah.
A
Okay. And then the last one I think we've actually already covered, in that I think one of the frequently voiced difficulties with RPDE is that you can't query it; basically, that you can't say "I just want everything in this particular geographical area" or something like that. You have to download everything and then slice it up yourself.
A
But again, this just runs into a caching problem, doesn't it?
C
Yeah, yes, that's exactly it: the second that you start adding any types of parameters outside of what you've already got in there... In fact, even the limit parameter in the RPDE spec as it stands doesn't really work for caching, or at least, yeah, it creates problems. So you could... I mean, a good implementation...
C
...could just ignore that parameter, which is what a lot of the big, high-scale ones are doing: just override it with whatever. So yeah, anything outside of just where you are in the paging, which changes the pages, just radically increases the number of permutations of the dataset that you're able to download, and then undermines all the load management that you can do using the CDN at the moment.
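A toy illustration of that permutation blow-up, with made-up numbers: a CDN typically keys its cache on the full URL, so every extra filter parameter multiplies the set of distinct URLs that each have to be served from origin at least once.

```typescript
// Made-up numbers, purely to show the shape of the problem.
const pagePositions = 1_000;   // distinct page URLs in a feed today
const activityValues = 50;     // hypothetical ?activity= values
const areaValues = 200;        // hypothetical ?area= values

console.log(pagePositions);                               // 1,000 cacheable URLs
console.log(pagePositions * activityValues * areaValues); // 10,000,000 URLs
```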
A
Right. I feel like... yeah, it's tricky. I feel like we're a little bit imprisoned by our caching, in that we need everything to be sort of as static as possible so that we can cache it.
C
Well, I guess here's the thing, right. If we wanted to go down the road of closing these endpoints down, so they weren't fully open, and having API keys on them and everything... I mean, it's because of the caching that we're able to have all these endpoints fully open. And, you know, we've had several challenges: Legend, SportSuite, I guess anyone who's looked at this for scale, has kind of gone really into the detail of, you know, does this CDN strategy actually work?
C
Does it actually protect my servers? The answer is yes, it absolutely does, as it stands, because of the constraints we've put on the endpoints. If you wanted to do anything else, I think it's fair to say, from all the feedback that we've had, we would need to start adding API keys. And the challenge, as soon as you start doing...
C
...that, is that you get into situations where people who have opened the data can start to be very selective about who they decide to make the data available to, and then you end up on that slippery slope, a bit like we saw previously with where SportSuite were before they kind of realized the benefit of the open licence and looked at what the philosophy of OpenActive actually was.
C
Which was, you know: you have a form where you fill it out, and then they approve it and you get an API key, but they might not approve it. You know, I mean, I'm not saying it's SportSuite, but the implication is they might not approve it if you're a competitive organization, or if, you know, it doesn't quite... So you might have an open licence, and the ODI's view on this has always been that, you know...
C
...we probably want to make sure that open means open, and we're not relying on aggregators and others to, you know, redistribute the data with an open licence and have kind of those additional gatekeepers on there, just because we want to lower the barrier to entry in the market, really. And so the solution in RPDE at the moment is that there's inline filtering which you can use, which means that you can...
C
RPDE is designed so that if you want to slice the dataset, you can do that arbitrarily, based on the data in each payload of each item, and you can do that inline, so you load into your database exactly what you need; you don't need to store everything. But unfortunately you still do need to page through everything.
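A minimal sketch of that inline-filtering pattern, assuming the standard RPDE page shape: page through everything, keep only the items a caller-supplied predicate accepts, and keep deletions so stale records can still be purged.

```typescript
// Sketch: inline filtering while paging an RPDE feed. Only matching items
// (plus deletions, which downstream code needs in order to purge stale
// records) are retained; everything else is dropped on the floor.
interface RpdeItem {
  state: "updated" | "deleted";
  id: string;
  modified: number;
  data?: unknown; // absent on deleted items
}

async function harvestFiltered(firstPageUrl: string, keep: (item: RpdeItem) => boolean) {
  const kept: RpdeItem[] = [];
  let nextUrl = firstPageUrl;
  while (true) {
    const res = await fetch(nextUrl);
    const page = await res.json() as { items: RpdeItem[]; next: string };
    for (const item of page.items) {
      if (item.state === "deleted" || keep(item)) kept.push(item);
    }
    if (page.items.length === 0 && page.next === nextUrl) break; // last page
    nextUrl = page.next;
  }
  return kept;
}
```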
C
But of course, if those pages are cached, then the paging should actually be fairly quick, and you're benefiting from that edge caching to go through and only take what you need from the dataset and store it. So that's kind of, I guess, a philosophy point really, as much as anything. It's the reason those constraints are there.
A
It's interesting, because, I mean, there are open APIs, but yeah, depending on keys.
C
But it also increases the amount of work required for everyone, right? Because everyone needs to have an API management solution in place where they can manage API key distribution and manage signups. And a lot of these smaller organizations, like Our Parks, you know, they do the tech work and then, as we've seen, they've only touched it... you know, they touched it four years ago and now they've...
C
...just come back again to update it. But there's not really a high amount of resource being dedicated to managing access to these things.
C
Well, less that; it's more just a high volume of consumers asking different questions. Yeah, like, on the performance angle, I recently just had a look at all of the... I think...
C
It was all the SessionSeries data, all the SessionSeries data that imin harvests, and I think all of it, gzipped, was something like 50 megabytes or something, which can be downloaded...
C
...on an average UK connection in a second. So I think, at which point, you know, if everything is edge-cached and everything is included, then someone should, in theory, you know, if they're downloading just as frequently as they can, be able to get the entire dataset very quickly, and then they can decide at that point whether to keep that bit of data or just drop it on the floor, based on their own filtering, if they want.
C
But so, if we were to add query parameters that do geography-based filtering, then that reduces the amount of data. But as far as I can tell, there's not a great amount of data even when there are a lot of locations, and you don't get any of the benefit from the edge caching, so the data gets there much more slowly.
A
Was it...?
C
Yeah, the gzip... [inaudible].
C
Right, yes. Because the challenge with this whole infrastructure has never been the data volume; it's always been the real-time nature of it: the value of having the up-to-date information about when a session is and how many spaces are left, and having that coming live from source.
C
So if you wanted to get a static version of, you know, what the sector looked like at a certain point in time, you could probably get all the systems to pull that into a CSV and then shove it somewhere. But that's... yeah, that doesn't achieve the objective of creating that real-time view of what sessions are available tomorrow. And so that's, I guess, why this is geared towards making the data, like...
C
I guess that's why RPDE is about more the real-time element of it, rather than the kind of downloading from the beginning. Because you don't get the value from this from one download, right; the download in itself is great for the first time someone's got it. So I totally see some of the, you know, challenges around the first time...
C
...someone harvests the feed: you've got lots of stuff to download if you need to resync that feed. But the ongoing value you're getting from that, as a business, as a consumer or whatever, is not downloading the whole thing again. It's the fact that you can get those changes with a minimal amount of work and a minimal amount of bandwidth, and there's actually not very much going on there when you're just polling for changes.
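That steady-state mode is cheap to sketch: after the initial harvest, the consumer persists the last next URL and polls it; most polls return an empty page, so the ongoing bandwidth cost is small. A sketch, assuming the standard RPDE page shape:

```typescript
// Sketch: the steady-state "polling for changes" loop. The feed's last
// "next" URL is stable until new changes appear, so re-requesting it
// usually returns an empty page.
async function pollForChanges(
  lastNextUrl: string,
  onItems: (items: unknown[]) => void
): Promise<string> {
  const res = await fetch(lastNextUrl);
  const page = await res.json() as { items: unknown[]; next: string };
  if (page.items.length > 0) onItems(page.items);
  return page.next; // persist this and poll it again on the next cycle
}
```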
A
The problem with the filtering solution is: supposing you change your database schema, you then have to re-harvest, meaning you do it all from the top again, meaning you then sink the time again. So it's not very agile. I suppose it's good if you've already got an established business methodology and logic that says, okay, we can just keep on pulling in deltas, basically.
C
Well, actually, I would argue it's more agile, because you don't need to... The problem we've got with distributed querying is that every single type of query needs to be implemented on every single data provider. So if you want to add a postcode search, and that doesn't quite do it for whatever reason, you go around every single provider and get them to uplift their feeds to support that particular query parameter. And then, as the data evolves, so do the query parameters, and then, of course, you've got the challenge of versioning those parameters.
C
So if you change the definition of one, you've got to go through and make sure you've got something homogeneous; you've got to figure out a mechanism so that you get the same kind of data back from everyone. So I think... well, it could be viewed that the current situation as it stands, with the harvesting, is actually the most agile, because as a data consumer you can really radically change the kind of filtering you're doing; you can merge and munge the data.
C
So if what this is doing is giving you basically a fire hose of all the data, and you can choose how you cut it and do what you want with it depending on your use case, then that's optimizing for the thing that matters. It might have a bit of slowness in terms of downloading it, but you're literally saving years compared to trying to get that data from whatever APIs, or lack of APIs, are in the systems as they stand.
C
Yeah, absolutely, absolutely. No, and also you're more agile because you've removed the complexity: you've got access to all the data, and the approach you can use, which you can apply to every dataset, is, you know, filtering based on the standard, on whatever parameters you're interested in, and you can decide to apply new and interesting filtering to that. You know, you don't have to wait for everyone to implement the more-than-date field for the end date.
C
Right; like, you'd have to go through and wait for every data provider to do that, with the time that would take and the cost of all of those implementations. Instead, you just add that single line of code to your harvesting tool, press go, go have a cup of coffee, come back and then see if it's worked. Yeah, I mean.
A
My experience is that you go and have a cup of coffee, then you go to bed.
A
You wake up, check on how it's doing, go to bed again. I mean, you know, as the volume of data increases and as the kind of experimentation you want to do gets more bold, I think the time overhead becomes more and more oppressive, and certainly debugging is no fun, right?
C
Absolutely. And I guess that was the previous issue, right, around making sure that we can limit the amount of data in the feed to that useful set. Yeah, that significantly reduces it, yeah, you're right. Because if you're pulling from the beginning of time from GLL, you're looking at literally millions of records. Yeah, yeah.
A
But yeah, I think we shouldn't underestimate the burden that gets put on data consumers as a result of removing the burden from data publishers, that's all. But yeah, yeah. I feel like the caching is invaluable for a publisher, but it creates headaches for a consumer. But within the domain of the possible, yeah, this does do a nice job of making sure that there's actually some data published that consumers can use. Okay.
A
So I think this might just be a question that we revisit based on the ability to start harvesting from now rather than from the beginning of time, because, as you say, GLL has got millions of records, but of course most of those are historical. So we become much more agile if we can throw away obsolete data.
A
Is it simply a question of supplying the right tests and then communicating that out to providers?
C
Yeah, well, I guess the... what was the name of it? The retention period. So that's on the OpenActive docs already, and I know that we've been pushing that with the bigger feeds, to try to get those feed sizes down anyway, just as part of general discussion as they're moving to the next version of whatever they're doing.
C
It's probably not easy to add to the RPDE tests in the validator; those are just harvest tests, you see. We probably would need to do this more in the way that the booking suite does it, yeah: insert a record, check the record comes through, you know, that type of level, which is definitely, yeah, definitely feasible. As I mentioned, there are already similar tests in there.
C
So maybe it's just adding a couple of tests in there. I mean, you could even add it as a feature, in the way that that framework's currently built, the feature being the retention period, and if someone chooses to support that feature then it goes green, and then we just promote that feature along with the other features.
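As an illustration of that feature-flag idea, a publisher's configuration might opt in to a retention-period check alongside other features. The key names below are assumptions for illustration, not the test suite's actual configuration schema.

```typescript
// Purely illustrative configuration shape for opt-in feature flags; not
// the actual openactive-test-suite schema.
const testSuiteConfig = {
  implementedFeatures: {
    "dataset-site": true,        // dataset site validation
    "opportunity-feeds": true,   // full open-data harvest checks
    "retention-period": false    // flip to true once supported; test goes green
  }
};
```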
Actually, I was thinking this the other day: that OpenActive test suite is actually pretty good even if you don't implement booking, because it does do dataset site validation, and it does full harvesting of the data feed. I know that Josh at Playwaze found some bugs in the open data feed just by using the test suite. So it might be, and this is just an idea I had yesterday, that it might be worth looking at adding validation of the open data pages into the test suite, just doing that as it goes, or having an option set so you can turn that on.
Because what that then does is mean that you've got this kind of full-feed-download test harness you can use, and if you combine that with this feature, then you could almost imagine, like, a profile: you know, you can configure that test suite with a profile of features which is just open data, actually nothing to do with booking, so it just covers the open data.
A
Yeah, yeah, that's nice. So that suggests, I think, if the proposal is to add the total item count to the dataset site and to implement the movable first page, that suggests that we prioritize...
...getting a specification for the dataset site published, and testing against that, as part of getting these two improvements to RPDE actually practical and verifiable. And then the third point, about parameterisation, becomes much less critical. Yeah, okay, yeah. I know, that's kind of a nice way forward, I think: looking at the test suite as more than just booking, and extending it to cover open data, is a really nice way of tying that package up.
C
I suppose there's only one point, just about the dataset site specification: obviously that's something that is being implemented presently. I mean, whatever iteration of it is there in the GitHub issues is being used in the test suite, because there has to be something in there. And so I guess it's just a comment that there's obviously a present implementation happening against that, by necessity, with the current implementers; the Playwaze one is already underway, and others.
C
So I guess, in parallel to completing that test suite, which I know we all agree is an urgent thing to be done, it might also be that there's an urgency around finishing that dataset site spec, so that when things are done, you know, the definition of done means really done; not like Diana will have to revisit it when that other spec comes out in, like, a few months' time. That's just a thought.
A
Yeah, no, that's a good point. And I think it's a bit worrying that the dataset site specification has existed as a de facto standard for a long time now, right, as bits of JSON floating around in the ecosystem, which is tremendously helpful.
C
Yeah. Like, I even noticed the other day, I thought it was a bug actually, but it turned out to be, well, a feature, or a result of that: the case of contentUrl or something, I can't remember what... there's a contentUrl or accessURL, whatever the URL is that's being used for the booking spec stuff, which has come from DCAT, and it actually has a different case of "URL" than the schema.org stuff.
C
Oh no, all right: because DCAT uses accessURL, or whatever it is. Actually, you know, I checked, I actually checked DCAT's original spec, and yeah, they have "URL" capitalized, whereas nothing in schema.org has "URL" capitalized. And so, I mean, it sounds super basic, doesn't it, but obviously that's a conformance call we need to make, and that's something that now exists by default, as you say, because this has kind of evolved and not really been cross-checked. So just, kind of, yeah, even that stuff just needs to be... well.
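The mismatch being described, side by side: DCAT capitalizes "URL" in its property names, while schema.org does not, so mixed-vocabulary JSON-LD has to get each one exactly right.

```typescript
// DCAT: dcat:accessURL / dcat:downloadURL -- "URL" fully capitalized.
const dcatStyle = { "dcat:accessURL": "https://example.com/feed" };
// schema.org: url / contentUrl -- only the leading "U" capitalized mid-name.
const schemaOrgStyle = { "contentUrl": "https://example.com/feed" };
```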
A
Yeah, that was a long time ago. The schema.org conversation has gone off on a completely different tangent, for reasons nothing to do with OpenActive, but yeah, there has to be some kind of decision about how we reconcile DCAT, schema.org and...
C
...us. Well, so, Tim, I was going to ask you this question separately, but maybe, I mean, given it's relevant to the group: is it worth updating us all on the schema.org chat, and the WebAPI kind of conversation with Dan? I know it's been on my backlog for, like, six months to ask that, but...
A
I mean, there's not actually that much to say. Essentially, the WebAPI... well, the last time I looked, which was a couple of months ago now, the WebAPI conversation within schema.org had spiraled off into big questions of scoping, which were kind of outside of our remit, basically. So there's not a lot to update specifically on us, because it wasn't... it was not a dialogue of "we've got this proposal to make, and then schema.org thought that there were some problems with it".
A
It felt like WebAPI should be describing a much different range of services, and that conversation was swirling the last time I looked at it. So I don't have anything too valuable to add, beyond: it's not the right time to be making very concrete proposals with regard to WebAPI, because there's... right.
C
Are Dan's comments in addition to what was in schema.org, the content on GitHub? Or, I mean, basically: is all of what we're saying on GitHub, or is there stuff that's context from Dan that's not on GitHub? No?
A
Oh, no, sorry, not to do with OpenActive and schema.org; I mean with WebAPI and schema.org. Right, got it, yeah. Sorry. So I guess the place to look for updates on that, which is someplace that I should look for updates, is the schema.org mailing list and repos.
C
Yeah, completely makes sense. So I guess, for clarity, then: is there anything from your conversation with Dan that is not in the GitHub issue that's worth putting there?
A
Great... no, I wish I could say we had this great conversation, "here's how it's going to go, I'm lining it all up".
C
So I don't know if you saw that there's a separate W3C community group, like OpenActive, like this one, that's for WebAPI, and a couple of guys on there seem to have put together an initial spec on the GitHub repo, and that's part of the thread. And then it looks like that's there, and then there's a bunch of other thoughts, but no one's really bringing it together. It's just kind of... it's like opening the diamond up rather than closing it down.
A
I mean, I think the problem, and even just the name indicates this, is that the scope actually has to be pretty wide; the number of questions that have to be answered is quite high. I don't think OpenActive particularly... we can drive the conversation there if we want to, but I think it's going to be a long conversation.
A
There's been no movement on it since February... I haven't looked back in a long time. Okay, I mean, I could raise the issue again on that thread and say "here's the direction we wanted to go in", but I suspect that that will initiate a very...
C
Oh yeah, okay, that makes sense. So I guess, if we set it up in such a way that future conformance to schema.org would be... So I guess what this means is, because obviously part of the point of the dataset sites is that Google can index them, and others can index them, that there will be a necessary step at some point to align with what they're doing. And I guess it sounds like what we're saying is: we set this up such that everyone who's harvesting...
C
...so as long as we, like, you know... the version that we put out there now and the version that schema.org eventually decides on is the thing. And so I guess, if we set it up such that... yeah, like, we've done what we can to use schema.org terms, so that it minimizes the difference, I guess, and then just have some kind of obvious switch in the type, maybe like an OpenActive WebAPI, that means that someone who's consuming this can write, like, a simple switch statement and then do everything. Yeah.
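A sketch of that simple switch statement, with illustrative type names; the interim OpenActive type below is hypothetical.

```typescript
// Sketch: a consumer branching on the declared @type so an interim
// OpenActive type and an eventual schema.org type are both handled.
function handleServiceNode(node: { "@type": string }) {
  switch (node["@type"]) {
    case "oa:WebAPI": // hypothetical interim OpenActive type name
    case "WebAPI":    // schema.org pending type, if and when it lands
      return indexWebApi(node);
    default:
      return; // ignore everything else
  }
}
declare function indexWebApi(node: unknown): void;
```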
A
Yeah. And I mean, I think if we create a solid enough standard and people are using it, I think we're in a stronger position talking to schema.org, to say "actually, here's a syntax that should be supported, or should hold weight with you". But we are simply in a position where we have to decide first, I think. Okay.
A
That's it from me. Oh, thank you all. Apologies if some of this conversation seemed a little technically involved, but I think there were at least some clear actions going forward, I think quite actionable and reasonably urgent, so we'll be pushing those forward in the very near future. And thank you all for joining. I'll give you back 10 minutes of your day.