From YouTube: OpenActive W3C Community Call / 2020-06-17
Description
Improvements to RPDE
- The harvesting model has limitations
  * Difficult to gauge progress
  * Requirement to harvest from the beginning of time
  * Cannot query selectively
- Pagination data
- Parameterise the RPDE querystring
- Movable first page
A
So hello all, welcome to the W3C call for 17th June, and the topic today is going to be possible changes to the RPDE specification. The motive for this is pretty clear.
A
So I'm just going to share my screen here, so we can see the presentation.
A
No worries, no worries. For once I managed to start recording before everybody joined the call, so we're starting on time for once.
So, as I said, with RPDE the problem is that it's fairly slow, in the sense that you have to get all the data before you start working with it at all. And then a particular annoyance, simply on my part, but I suspect for data consumers as a whole, is that it's very hard to gauge progress.
A
So it takes a long time, and you also don't have any sense of how long you've got remaining. So, depending on the feed you're harvesting, it could be that you're going to be done harvesting in five minutes; it could take a matter of days in some cases of the larger feeds.
A
It's often the case you end up with a lot of less-than-relevant data, which could be data in the past. There's no requirement to delete obsolete or irrelevant data any more, so it can be the case you end up with a lot of stuff way in the past.
A
It's not possible to query RPDE selectively for, say, only a particular geographical location, a particular activity type or whatever. So we're just going to go over a couple of proposals for streamlining things a bit.
A
So in this scenario the client would be responsible for keeping track of how many items they'd already processed, what their progress was like, but at least they'd have a sense of where the end position was. Nick commented on this pretty recently, pointing out:
A
First of all, that this doubles the query load, in the sense that you have to make an additional query on the publishing side to support this: some kind of count query indicating precisely what the number of remaining items would be. And then, secondly, this ends up invalidating caching, of course, because if that number changes then the cache needs to be refreshed, so the efficiencies of caching with RPDE would be lost under that scenario.
A
The refinement proposed yesterday by Nick was to put this in the dataset site specification. So when you looked at a dataset site, in the JSON there would be an indication of the total number of items per feed.
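As an illustration of the idea, and not wording from any published spec: a per-feed item count could surface in the dataset site JSON-LD along these lines. The totalItems property name, the feed name and the URLs below are assumptions for illustration; the encodingFormat value is the standard RPDE media type.

```typescript
// Illustrative sketch only: a dataset site's JSON-LD with an assumed
// "totalItems" hint on each feed's distribution entry.
const datasetSite = {
  "@context": "https://schema.org/",
  "@type": "Dataset",
  "name": "Example Leisure Centre Sessions",
  "distribution": [
    {
      "@type": "DataDownload",
      "name": "SessionSeries",
      "contentUrl": "https://example.com/feeds/session-series",
      "encodingFormat": "application/vnd.openactive.rpde+json; version=1",
      // Assumed property: approximate count at render time, so harvesters
      // can gauge progress without a separate count endpoint.
      "totalItems": 12345
    }
  ]
};
```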
D
Yeah, I was gonna say this is a little bit over my head from a technical standpoint, but it seems to make sense logically.
C
Yeah, sorry, I was gonna say: definitely take the points that Nick has raised about optimization, because it would change the sort of performance and nature of the query quite a lot, especially relating to edge caching.
A
Right, okay. I guess the difficulty with the dataset proposal is that it's not really envisaged that the dataset site, as it stands right now, has to actually read the RPDE feeds, so the technical mechanism for populating the total items property is a bit unclear to me.
C
Oh, I can help with that. So the libraries that we currently have for dataset site generation are all dynamic, so you give it some properties. I guess it's designed dynamically primarily for the use case of the kind of white-label solutions.
C
You know, where you've got, like, a Gladstone, where you've got lots of different types of customers and they've all got their own dataset sites. And because it's all dynamically rendered right now, if your dataset site query, as well as querying the database for the organization name and everything else, also queried for the total number of records, which may be cached, then that would do it. For example, in Gladstone right now the dataset site is rendered from the database and is cached.
C
I think it's cached for 15 minutes, both on the server, which caches it in memory, and then using a CDN, if there's a CDN in front of it.
A
Right, okay. So then the only time that becomes a problem is if you've got a feed that takes a long time to consume: the number of total items actually could change significantly over the consumption time.
A
Kind of weird, but yeah, okay, doable. Better than ruining caching, I suppose, for the feed itself.
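A rough sketch of how a harvester might use such a hint to gauge progress follows. It assumes a totalItems value read from the dataset site, as above, and treats it as an estimate precisely because the total can drift while a long feed is being consumed.

```typescript
// Sketch: progress reporting during an RPDE harvest, given an approximate
// total from the dataset site. The last page of an RPDE feed is the one
// with zero items whose "next" URL points at itself.
async function harvestWithProgress(firstPageUrl: string, totalItemsHint: number) {
  let nextUrl = firstPageUrl;
  let processed = 0;
  while (true) {
    const res = await fetch(nextUrl);
    const page = await res.json() as { items: unknown[]; next: string };
    processed += page.items.length;
    const pct = Math.min(100, 100 * processed / totalItemsHint);
    console.log(`~${pct.toFixed(1)}% (${processed} of ~${totalItemsHint} items)`);
    if (page.items.length === 0 && page.next === nextUrl) break; // last page
    nextUrl = page.next;
  }
}
```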
B
To the relief of Charlie and Tom, it sounds like... I was gonna say, Tim, my only question, because I don't like to not understand things, although I probably will regret asking, is: what's the impact? Which is the nice general pointless question. But if we're Playwaze, which has no intention of taking, and doesn't take, any direct feeds from any booking systems, and we only intend to really use the imin feed on our own, what impact does that concept have in that case?
A
I suppose, if you're consuming... yeah, sorry. Because, Luke and Nick, imin offers an API integration, right? So RPDE is not something that Charlie would have to worry about on his end.
A
So it might help, I mean, a little bit, in the sense of planning out how long it'll take to consume a feed and that kind of thing. But if you're sitting behind that, it won't affect you at all. It's fine.
C
Thank you, thank you. Well, the other benefit is it doesn't break anything that's existing, because we already need to go around and update everyone's dataset sites when the new spec comes out anyway. So this isn't going to add any additional lobbying effort, or otherwise, to changing RPDE; if that was the thing that we need, it doesn't add any more than is already there.
A
Okay. And I suppose also, in terms of workflow, it's easy in that the dataset site specification still has to be written, so it's easy enough to add that line item in there. Okay, so I'll migrate that issue over to the dataset site specification repo then. I don't know if I described the next proposal in the best possible way. I don't think this one needs very much discussion either.
A
To be honest, looking at the thread, there already seems to be a lot of consensus around this. But the proposal is essentially, and Nick, please jump in if I'm mischaracterizing this, to allow harvesting to start not from the absolute beginning of the feed but essentially from now, meaning that you can start harvesting only opportunities that exist in the present or future, rather than having to pick up all the ones that have existed in the past.
A
So then the only point of debate was really about which approach to use. The difficulty is a little bit technical, in that creating that capacity to start harvesting from the moment the query is launched would, because of the way the specification is written, actually drop the first item in the feed.
A
But generally speaking, it seems, looking at the comments, everyone who could be bothered to comment seems to be keen on the first approach there. So even though it's a fairly, well, moderately significant change to the query itself, approach one seems like it's getting all of the votes right now.
C
Yeah, it was just because it makes the actual query a little bit more complicated. But I mean, I suppose, as it says, it's written as simply as it can be there, with the query with that extra line in the WHERE clause.
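For context, the baseline query from the RPDE specification's modified/id ordering strategy is sketched below; the table and parameter names are illustrative. The strict comparisons are what make the boundary item itself fall out of the results, which is the dropped-first-item edge case mentioned above. The inclusive variant in the trailing comment is purely an illustration of the kind of extra WHERE line being discussed, not the proposal's exact wording.

```typescript
// The RPDE "modified/id" ordering query, per the spec: a client's position
// is the (afterTimestamp, afterId) pair of the last item it saw, and the
// strict ">" comparisons exclude the boundary item itself.
const rpdePageQuery = `
  SELECT id, modified, data
    FROM opportunities
   WHERE (modified = :afterTimestamp AND id > :afterId)
      OR (modified > :afterTimestamp)
   ORDER BY modified, id
   LIMIT :pageSize
`;
// Illustration only: seeding a "start from now" first page at an existing
// item and keeping that item would need an inclusive boundary on the first
// request, e.g. (modified = :afterTimestamp AND id >= :afterId).
```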
C
But I suppose it's just that that query is the bit that's most often done wrong, and, because of the way that we test RPDE, which is just checking the invariants, it's going to be quite difficult to also test that this is done right without, for example, the test harness of the booking spec being enhanced to add... well, I suppose it already does do this, so using the testing of the booking spec.
C
You could do something like add an opportunity and check that it comes through, and then check that... so, check that this all works. But because of the kind of weird edge case around that first item, to actually check this properly you'd have to know the first item in the database, right, to be able to... or artificially insert the oldest item in a test suite and then check it came through.
C
So it's just really gnarly, you know, to actually validate this works, given that the biggest problem is with the query. But I think all of these involve some form of query changing, I think. So I suppose it's just, kind of, yeah, the lesser of the evils.
A
Yeah, and it's sort of inherent in the goal, isn't it? The testing becomes a bit more difficult, yeah, because it adds a variant, and it sort of can't help but add a variant.
A
You know, particularly with the larger feeds, I can imagine this taking processing time down from hours to minutes. Yes, sure, yeah. So that seems like a really, really valuable addition.
C
Sorry, yeah. No, no, carry on. I just remembered the reason that we didn't do the other stuff: it's because of the string constraint, that's right. So it has to work for every use case, because there is a simpler option available if you've got an ID which is not a string, which is what some of the other approaches were talking to. But I think it's fair to say a lot of people use strings as IDs, because they've got GUIDs involved. So...
A
Yeah, yeah. And then there are even assumptions being made about the ordering of the IDs on that one as well, aren't there? So, yeah.
A
Okay. And then the last one I think we've actually already covered, in that I think one of the frequently voiced difficulties with RPDE is that you can't query it; basically, that you can't say "I just want everything in this particular geographical area" or something like that. You have to download everything and then slice it up yourself.
A
But again, this just runs into a caching problem, doesn't it?
C
Yeah, yes, that's exactly it: the second that you start adding any types of parameters outside of what you've already got in there... In fact, even the limit parameter in the RPDE spec as it stands doesn't really work for caching, or at least, yeah, it creates problems. So you could... I mean, a good implementation...
C
...could just ignore that parameter, which is what a lot of the big, high-scale ones are doing: just override it with whatever. So yeah, anything outside of just where you are in the paging, which changes the pages, just radically increases the number of permutations of the dataset that you're able to download, and then undermines all the load management that you can do using the CDN at the moment.
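A toy illustration of that permutation blow-up, with made-up numbers: a CDN typically keys its cache on the full URL, so every extra filter parameter multiplies the set of distinct URLs that each have to be served from origin at least once.

```typescript
// Made-up numbers, purely to show the shape of the problem.
const pagePositions = 1_000;   // distinct page URLs in a feed today
const activityValues = 50;     // hypothetical ?activity= values
const areaValues = 200;        // hypothetical ?area= values

console.log(pagePositions);                               // 1,000 cacheable URLs
console.log(pagePositions * activityValues * areaValues); // 10,000,000 URLs
```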
A
Right. I feel like... yeah, it's tricky. I feel like we're a little bit imprisoned by our caching, in that we need everything to be sort of as static as possible so that we can cache it.
C
Well, I guess here's the thing, right. If we wanted to go down the road of closing these endpoints down, so they weren't fully open, and having API keys on them and everything... I mean, it's because of the caching that we're able to have all these endpoints fully open. And, you know, we've had several challenges: Legend, SportSuite, I guess anyone who's looked at this for scale, has kind of gone really into the detail of, you know, does this CDN strategy actually work?
C
Does it actually protect my servers? The answer is yes, it absolutely does, as it stands, because of the constraints we've put on the endpoints. If you wanted to do anything else, I think it's fair to say, from all the feedback that we've had, we would need to start adding API keys. And the challenge, as soon as you start doing...
C
...that, is that you get into situations where people who have opened the data can start to be very selective about who they decide to make the data available to, and then you end up on that slippery slope, a bit like we saw previously with where SportSuite were before they kind of realized the benefit of the open licence and looked at what the philosophy of OpenActive actually was.
C
Which was, you know: you have a form where you fill it out, and then they approve it and you get an API key, but they might not approve it. You know, I mean, I'm not saying it's SportSuite, but the implication is they might not approve it if you're a competitive organization, or if, you know, it doesn't quite... So you might have an open licence, and the ODI's view on this has always been that, you know...
C
...we probably want to make sure that open means open, and we're not relying on aggregators and others to, you know, redistribute the data with an open licence and have kind of those additional gatekeepers on there, just because we want to lower the barrier to entry in the market, really. And so the solution in RPDE at the moment is that there's inline filtering which you can use, which means that you can...
C
RPDE is designed so that if you want to slice the dataset, you can do that arbitrarily, based on the data in each payload of each item, and you can do that inline, so you load into your database exactly what you need; you don't need to store everything. But unfortunately you still do need to page through everything.
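A minimal sketch of that inline-filtering pattern, assuming the standard RPDE page shape: page through everything, keep only the items a caller-supplied predicate accepts, and keep deletions so stale records can still be purged.

```typescript
// Sketch: inline filtering while paging an RPDE feed. Only matching items
// (plus deletions, which downstream code needs in order to purge stale
// records) are retained; everything else is dropped on the floor.
interface RpdeItem {
  state: "updated" | "deleted";
  id: string;
  modified: number;
  data?: unknown; // absent on deleted items
}

async function harvestFiltered(firstPageUrl: string, keep: (item: RpdeItem) => boolean) {
  const kept: RpdeItem[] = [];
  let nextUrl = firstPageUrl;
  while (true) {
    const res = await fetch(nextUrl);
    const page = await res.json() as { items: RpdeItem[]; next: string };
    for (const item of page.items) {
      if (item.state === "deleted" || keep(item)) kept.push(item);
    }
    if (page.items.length === 0 && page.next === nextUrl) break; // last page
    nextUrl = page.next;
  }
  return kept;
}
```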
C
But of course, if those pages are cached, then the paging should actually be fairly quick, and you're benefiting from that edge caching to go through and only take what you need from the dataset and store it. So that's kind of, I guess, a philosophy point really, as much as anything. It's the reason those constraints are there.
A
It's interesting, because, I mean, there are open APIs, but yeah, depending on keys.
C
But it also increases the amount of work required for everyone, right? Because everyone needs to have an API management solution in place where they can manage API key distribution and manage signups. And a lot of these smaller organizations, like Our Parks, you know, they do the tech work and then, as we've seen, they've only touched it... you know, they touched it four years ago and now they've...
C
...just come back again to update it. But there's not really a high amount of resource being dedicated to managing access to these things.
C
Well, less that; it's more just a high volume of consumers asking different questions. Yeah, like, on the performance angle, I recently just had a look at all of the... I think...
C
It was all the SessionSeries data, all the SessionSeries data that imin harvests, and I think all of it, gzipped, was something like 50 megabytes or something, which can be downloaded...
C
...on an average UK connection in a second. So I think, at which point, you know, if everything is edge-cached and everything is included, then someone should, in theory, you know, if they're downloading just as frequently as they can, be able to get the entire dataset very quickly, and then they can decide at that point whether to keep that bit of data or just drop it on the floor, based on their own filtering, if they want.
C
But so, if we were to add query parameters that do geography-based filtering, then that reduces the amount of data. But as far as I can tell, there's not a great amount of data even when there are a lot of locations, and you don't get any of the benefit from the edge caching, so the data gets there much more slowly.
A
Was it...?
C
Yeah, the gzip... [inaudible].
C
Right, yes. Because the challenge with this whole infrastructure has never been the data volume; it's always been the real-time nature of it: the value of having the up-to-date information about when a session is and how many spaces are left, and having that coming live from source.
C
So if you wanted to get a static version of, you know, what the sector looked like at a certain point in time, you could probably get all the systems to pull that into a CSV and then shove it somewhere. But that's... yeah, that doesn't achieve the objective of creating that real-time view of what sessions are available tomorrow. And so that's, I guess, why this is geared towards making the data, like...
C
I guess that's why RPDE is about more the real-time element of it, rather than the kind of downloading from the beginning. Because you don't get the value from this from one download, right; the download in itself is great for the first time someone's got it. So I totally see some of the, you know, challenges around the first time...
C
...someone harvests the feed: you've got lots of stuff to download if you need to resync that feed. But the ongoing value you're getting from that, as a business, as a consumer or whatever, is not downloading the whole thing again. It's the fact that you can get those changes with a minimal amount of work and a minimal amount of bandwidth, and there's actually not very much going on there when you're just polling for changes.
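That steady-state mode is cheap to sketch: after the initial harvest, the consumer persists the last next URL and polls it; most polls return an empty page, so the ongoing bandwidth cost is small. A sketch, assuming the standard RPDE page shape:

```typescript
// Sketch: the steady-state "polling for changes" loop. The feed's last
// "next" URL is stable until new changes appear, so re-requesting it
// usually returns an empty page.
async function pollForChanges(
  lastNextUrl: string,
  onItems: (items: unknown[]) => void
): Promise<string> {
  const res = await fetch(lastNextUrl);
  const page = await res.json() as { items: unknown[]; next: string };
  if (page.items.length > 0) onItems(page.items);
  return page.next; // persist this and poll it again on the next cycle
}
```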
A
The problem with the filtering solution is: supposing you change your database schema, you then have to re-harvest, meaning you do it all from the top again, meaning you then sink the time again. So it's not very agile. I suppose it's good if you've already got an established business methodology and logic that says, okay, we can just keep on pulling in deltas, basically.
C
Well, actually, I would argue it's more agile, because you don't need to... The problem we've got with distributed querying is that every single type of query needs to be implemented on every single data provider. So if you want to add a postcode search, and that doesn't quite do it for whatever reason, you go around every single provider and get them to uplift their feeds to support that particular query parameter. And then, as the data evolves, so do the query parameters, and then, of course, you've got the challenge of versioning those parameters.
C
So if you change the definition of one, you've got to go through and make sure you've got something homogeneous; you've got to figure out a mechanism so that you get the same kind of data back from everyone. So I think... well, it could be viewed that the current situation as it stands, with the harvesting, is actually the most agile, because as a data consumer you can really radically change the kind of filtering you're doing; you can merge and munge the data.
C
So if what this is doing is giving you basically a fire hose of all the data, and you can choose how you cut it and do what you want with it depending on your use case, then that's optimizing for the thing that matters. It might have a bit of slowness in terms of downloading it, but you're literally saving years compared to trying to get that data from whatever APIs, or lack of APIs, are in the systems as they stand.
C
Yeah, absolutely, absolutely. No, and also you're more agile because you've removed the complexity: you've got access to all the data, and the approach you can use, which you can apply to every dataset, is, you know, filtering based on the standard, on whatever parameters you're interested in, and you can decide to apply new and interesting filtering to that. You know, you don't have to wait for everyone to implement the more-than-date field for the end date.
C
Right; like, you'd have to go through and wait for every data provider to do that, with the time that would take and the cost of all of those implementations. Instead, you just add that single line of code to your harvesting tool, press go, go have a cup of coffee, come back and then see if it's worked. Yeah, I mean.
A
My experience is that you go and have a cup of coffee, then you go to bed.
A
You wake up, check on how it's doing, go to bed again. I mean, you know, as the volume of data increases and as the kind of experimentation you want to do gets more bold, I think the time overhead becomes more and more oppressive, and certainly debugging is no fun, right?
C
Absolutely. And I guess that was the previous issue, right, around making sure that we can limit the amount of data in the feed to that useful set. Yeah, that significantly reduces it, yeah, you're right. Because if you're pulling from the beginning of time from GLL, you're looking at literally millions of records. Yeah, yeah.
A
But yeah, I think we shouldn't underestimate the burden that gets put on data consumers as a result of removing the burden from data publishers, that's all. But yeah, yeah. I feel like the caching is invaluable for a publisher, but it creates headaches for a consumer. But within the domain of the possible, yeah, this does do a nice job of making sure that there's actually some data published that consumers can use. Okay.
A
So I think this might just be a question that we revisit based on the ability to start harvesting from now rather than from the beginning of time, because, as you say, GLL has got millions of records, but of course most of those are historical. So we become much more agile if we can throw away obsolete data.
A
Is it simply a question of supplying the right tests and then communicating that out to providers?
C
Yeah, well, I guess the... what was the name of it? The retention period. So that's on the OpenActive docs already, and I know that we've been pushing that with the bigger feeds, to try to get those feed sizes down anyway, just as part of general discussion as they're moving to the next version of whatever they're doing.
C
It's probably not easy to add to the RPDE tests in the validator; those are just harvest tests, you see. We probably would need to do this more in the way that the booking suite does it, yeah: insert a record, check the record comes through, you know, that type of level, which is definitely, yeah, definitely feasible. As I mentioned, there are already similar tests in there.
C
So maybe it's just adding a couple of tests in there. I mean, you could even add it as a feature, in the way that that framework's currently built, the feature being the retention period, and if someone chooses to support that feature then it goes green, and then we just promote that feature along with the other features.
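As an illustration of that feature-flag idea, a publisher's configuration might opt in to a retention-period check alongside other features. The key names below are assumptions for illustration, not the test suite's actual configuration schema.

```typescript
// Purely illustrative configuration shape for opt-in feature flags; not
// the actual openactive-test-suite schema.
const testSuiteConfig = {
  implementedFeatures: {
    "dataset-site": true,        // dataset site validation
    "opportunity-feeds": true,   // full open-data harvest checks
    "retention-period": false    // flip to true once supported; test goes green
  }
};
```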
Actually, I was thinking this the other day: that OpenActive test suite is actually pretty good even if you don't implement booking, because it does do dataset site validation, and it does full harvesting of the data feed. I know that Josh at Playwaze found some bugs in the open data feed just by using the test suite. So it might be, and this is just an idea I had yesterday, that it might be worth looking at adding validation of the open data pages into the test suite, just doing that as it goes, or having an option set so you can turn that on.
Because what that then does is mean that you've got this kind of full-feed-download test harness you can use, and if you combine that with this feature, then you could almost imagine, like, a profile: you know, you can configure that test suite with a profile of features which is just open data, actually nothing to do with booking, so it just covers the open data.
A
Yeah, yeah, that's nice. So that suggests, I think, if the proposal is to add the total item count to the dataset site and to implement the movable first page, that suggests that we prioritize...
...getting a specification for the dataset site published, and testing against that, as part of getting these two improvements to RPDE actually practical and verifiable. And then the third point, about parameterisation, becomes much less critical. Yeah, okay, yeah. I know, that's kind of a nice way forward, I think: looking at the test suite as more than just booking, and extending it to cover open data, is a really nice way of tying that package up.
C
I suppose there's only one point, just about the dataset site specification: obviously that's something that is being implemented presently. I mean, whatever iteration of it is there in the GitHub issues is being used in the test suite, because there has to be something in there. And so I guess it's just a comment that there's obviously a present implementation happening against that, by necessity, with the current implementers; the Playwaze one is already underway, and others.
C
So I guess, in parallel to completing that test suite, which I know we all agree is an urgent thing to be done, it might also be that there's an urgency around finishing that dataset site spec, so that when things are done, you know, the definition of done means really done; not like Diana will have to revisit it when that other spec comes out in, like, a few months' time. That's just a thought.
A
Yeah, no, that's a good point. And I think it's a bit worrying that the dataset site specification has existed as a de facto standard for a long time now, right, as bits of JSON floating around in the ecosystem, which is tremendously helpful.
C
Yeah. Like, I even noticed the other day, I thought it was a bug actually, but it turned out to be, well, a feature, or a result of that: the case of contentUrl or something, I can't remember what... there's a contentUrl or accessURL, whatever the URL is that's being used for the booking spec stuff, which has come from DCAT, and it actually has a different case of "URL" than the schema.org stuff.
C
Oh no, all right: because DCAT uses accessURL, or whatever it is. Actually, you know, I checked, I actually checked DCAT's original spec, and yeah, they have "URL" capitalized, whereas nothing in schema.org has "URL" capitalized. And so, I mean, it sounds super basic, doesn't it, but obviously that's a conformance call we need to make, and that's something that now exists by default, as you say, because this has kind of evolved and not really been cross-checked. So just, kind of, yeah, even that stuff just needs to be... well.
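The mismatch being described, side by side: DCAT capitalizes "URL" in its property names, while schema.org does not, so mixed-vocabulary JSON-LD has to get each one exactly right.

```typescript
// DCAT: dcat:accessURL / dcat:downloadURL -- "URL" fully capitalized.
const dcatStyle = { "dcat:accessURL": "https://example.com/feed" };
// schema.org: url / contentUrl -- only the leading "U" capitalized mid-name.
const schemaOrgStyle = { "contentUrl": "https://example.com/feed" };
```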
A
Yeah, that was a long time ago. The schema.org conversation has gone off on a completely different tangent, for reasons nothing to do with OpenActive, but yeah, there has to be some kind of decision about how we reconcile DCAT, schema.org and...
C
...us. Well, so, Tim, I was going to ask you this question separately, but maybe, I mean, given it's relevant to the group: is it worth updating us all on the schema.org chat, and the WebAPI kind of conversation with Dan? I know it's been on my backlog for, like, six months to ask that, but...
A
I mean, there's not actually that much to say. Essentially, the WebAPI... well, the last time I looked, which was a couple of months ago now, the WebAPI conversation within schema.org had spiraled off into big questions of scoping, which were kind of outside of our remit, basically. So there's not a lot to update specifically on us, because it wasn't... it was not a dialogue of "we've got this proposal to make, and then schema.org thought that there were some problems with it".
A
It felt like WebAPI should be describing a much different range of services, and that conversation was swirling the last time I looked at it. So I don't have anything too valuable to add, beyond: it's not the right time to be making very concrete proposals with regard to WebAPI, because there's... right.
C
Are Dan's comments in addition to what was in schema.org, the content on GitHub? Or, I mean, basically: is all of what we're saying on GitHub, or is there stuff that's context from Dan that's not on GitHub? No?
A
Oh, no, sorry, not to do with OpenActive and schema.org; I mean with WebAPI and schema.org. Right, got it, yeah. Sorry. So I guess the place to look for updates on that, which is someplace that I should look for updates, is the schema.org mailing list and repos.
C
Yeah, completely makes sense. So I guess, for clarity, then: is there anything from your conversation with Dan that is not in the GitHub issue that's worth putting there?
A
Great... no, I wish I could say we had this great conversation, "here's how it's going to go, I'm lining it all up".
C
So I don't know if you saw that there's a separate W3C community group, like OpenActive, like this one, that's for WebAPI, and a couple of guys on there seem to have put together an initial spec on the GitHub repo, and that's part of the thread. And then it looks like that's there, and then there's a bunch of other thoughts, but no one's really bringing it together. It's just kind of... it's like opening the diamond up rather than closing it down.
A
I mean, I think the problem, and even just the name indicates this, is that the scope actually has to be pretty wide; the number of questions that have to be answered is quite high. I don't think OpenActive particularly... we can drive the conversation there if we want to, but I think it's going to be a long conversation.
A
There's been no movement on it since February... I haven't looked back in a long time. Okay, I mean, I could raise the issue again on that thread and say "here's the direction we wanted to go in", but I suspect that that will initiate a very...
C
Oh yeah, okay, that makes sense. So I guess, if we set it up in such a way that future conformance to schema.org would be... So I guess what this means is, because obviously part of the point of the dataset sites is that Google can index them, and others can index them, that there will be a necessary step at some point to align with what they're doing. And I guess it sounds like what we're saying is: we set this up such that everyone who's harvesting...
C
...so as long as we, like, you know... the version that we put out there now and the version that schema.org eventually decides on is the thing. And so I guess, if we set it up such that... yeah, like, we've done what we can to use schema.org terms, so that it minimizes the difference, I guess, and then just have some kind of obvious switch in the type, maybe like an OpenActive WebAPI, that means that someone who's consuming this can write, like, a simple switch statement and then do everything. Yeah.
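A sketch of that simple switch statement, with illustrative type names; the interim OpenActive type below is hypothetical.

```typescript
// Sketch: a consumer branching on the declared @type so an interim
// OpenActive type and an eventual schema.org type are both handled.
function handleServiceNode(node: { "@type": string }) {
  switch (node["@type"]) {
    case "oa:WebAPI": // hypothetical interim OpenActive type name
    case "WebAPI":    // schema.org pending type, if and when it lands
      return indexWebApi(node);
    default:
      return; // ignore everything else
  }
}
declare function indexWebApi(node: unknown): void;
```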
A
Yeah. And I mean, I think if we create a solid enough standard and people are using it, I think we're in a stronger position talking to schema.org, to say "actually, here's a syntax that should be supported, or should hold weight with you". But we are simply in a position where we have to decide first, I think. Okay.
A
That's it from me. Oh, thank you all. Apologies if some of this conversation seemed a little technically involved, but I think there were at least some clear actions going forward, I think quite actionable and reasonably urgent, so we'll be pushing those forward in the very near future. And thank you all for joining. I'll give you back 10 minutes of your day.