From YouTube: IETF113-PPM-20220325-1130
Description
PPM meeting session at IETF113
2022/03/25 1130
https://datatracker.ietf.org/meeting/113/proceedings/
A
Well, I don't have any advice on that.
A
It's start time, and I think nobody will feel slighted if they miss the first chair slide, so let's get started. Hi, my name is Ben Schwartz. Together with Sam Weiler, we are the chairs of Privacy Preserving Measurement, and this is the very first meeting of this brand new IETF working group. Thank you to Joe Salowey for being our surrogate chair in Vienna.

If you have any questions in the room, please take them to Joe. Sam and I are both on East Coast time in the United States, so if we say anything that seems a little... Okay, how's it going, Joe? Are we ready to go in the room?
A
Okay, well, as long as it's not too distracting in the room, maybe we can keep moving.

It's always relaxing to be the last session at IETF, since most people have had a chance to really look at this quite a few times this week, but just in case you haven't: this is legally binding, so please understand what's going on here and abide by these rules when participating at the IETF.

This is a hybrid meeting. If you're in the room, you probably know that you need to check in with your mobile device in order to be counted as attending this session; please do that.

So that's all we've got for this meeting. If anybody wants to raise questions about this agenda, please speak now; otherwise we'll hand it over to Eric Rescorla.
D
I am talking next, obviously, but I did have a question about the agenda, which is that we probably need to talk about adoption of the draft, because of that charter item. So that should go at the end of this, or, you know, at the beginning of this, along with the recent changes, et cetera.
A
Okay, I think we will note that: we will definitely raise the question of draft adoption after the presentations during this session. Thanks, and now it's yours; how do you want to handle slides?
D
I think I just press this button, which I can't find... then I will be good. Oh, now you have to press a button, yep.
A
Oh, I see, I need to stop sharing before you can start sharing. Or...
D
All right, you're right. So, for those of you who attended the BoF: this is actually a nearly identical presentation, so my apologies, not for giving it, but for the fact that you had to wake up 20 minutes earlier than you otherwise would have. If you've already seen this, you can feel free to go get a coffee or something, because really it's just a subset.

Apologies to the STAR people; I didn't actually cover STAR in this, but they're going to, so that'll be fine. And then I'll talk a little about MPC, the specific technologies, which are the ones that are larger than the scope of this working group. Again, this is mostly stuff in draft-gpew-priv-ppm.
D
I'm expecting Chris to cover a lot of the detail; this is just to give you some of the background. And I think we'll start there: there are a large number of situations where it's desirable to learn information about people. This is not only true on the internet but in life generally, in various forms of public research; the census is a good example.

You wish to learn information which is quite sensitive, like demographics, income, medical issues, those kinds of things. Having a sense, for instance, of how many people have had or currently have COVID is obviously relevant to the COVID response, as an example. Then there are commercial reasons, like product development: if you make a product, you'd like to know which features people use and don't use, so you can make them better or take them out.

You'd like to understand where your product is failing to work, and this is obviously very important for consumer products that people use and that go wrong. So if your browser doesn't work on a website, it's a pain to have customers have to report it to you; you'd like to know it didn't work, and you'd like to measure where it didn't work, so you know which failures are most important to address.

This is a big issue for Firefox, by the way. And then there are various behavioral measurements: how are people using your product? So, for example, discovering websites you didn't previously know about (people are now going to this website a lot; maybe it should be, I don't know, in the search index) and which information people are most interested in, again.

You know, maybe there's something that should be in a search index, or that you'd like to tell them about proactively. So, measurement is all over the place.
D
So this information is obviously super useful, but the problem is that it's also, in many cases, very sensitive. Take my example from earlier: having it known that someone had COVID, or has some other medical issue, is information they may not wish to be known. Income, sexual orientation, gender identity: these are things that people would like to know in aggregate, but that individually people do not necessarily want to disclose.

It turns out that even much less sensitive data, seemingly innocuous data, can be very, very revealing, especially when you take a lot of "less sensitive" data together. I've got this in scare quotes for a reason: it's not emphasis, it's reminding you that "less sensitive" is a matter of judgment when you put lots of data together from a lot of people. And so there's this famous incident where Target looked at somebody's shopping history and figured out that this girl was pregnant, and I've heard other reports from people of the same thing:

they'll have just been through some life change, and they'll suddenly start getting advertising that matches that life change. This concept is the foundation behind modern web advertising: that you should take a bunch of small pieces of information, glue them together, and use them for targeting. So it is definitely the case that collecting a lot of information, even if it's not normally sensitive, can be a real problem.
D
So there's this tension between information gathering, the power of the information gathering you'd like to have, and the privacy problems that it creates. The good news, however, is that the tension is created by technology; it's not inherent, because of what you mostly want to gather. I want to caveat that: that's not always the case (for web advertising, for tracking, it's super complicated), but for measurement you typically want to measure aggregates.

So you want to measure, say, the distribution of people's income. I don't care about anybody's individual income, just the distribution of income, or the distribution of household expenses. You want to know what fraction of people have had COVID; you don't want to know whether anybody individually had COVID, in general. Obviously there are cases, like the people in Indiana, where perhaps you might want to know whether somebody individually has COVID, but from my comfortable position here in my room, I don't care; I don't need to know which individuals have COVID. Then there are more complicated aggregates: you might want to measure the relationship between income and height, which, by the way, there is one; taller people, on average, have higher incomes.
D
Or, what are the most popular websites? Again, I don't need to know which websites someone individually went to; all I need to know is which ones are most popular. It's often necessary, however, not just to have gross aggregates like that: people slice the data in multiple ways. So you say, well, I want to know the distribution of people's income in California, or I want to take two arbitrary variables and compare them. I gave this income-and-height-distribution example, but you might want to say, well, COVID rates sliced by age, or by income, or by demographics, or other things. So again, the key point is that the individual values aren't necessary to do this work; what's necessary is to be able to work on the data in various ways.

As long as you can work on the data, the individual values are actually not particularly helpful, and, of course, they're a privacy problem. As someone who's done quite a bit of this work, I can tell you that one actually very rarely looks at data values individually once you get past even modest sizes, because it's simply not practical: you have hundreds of thousands of data points, and one data point doesn't help you very much. The only times you really do look at them...
D
...is when there's a giant outlier or something, and you're like: why is there one person who appears to have visited 50 million websites? That's obviously probably not a person. So, there are a variety of different measurement types that you'd like to collect. There are the simple statistical metrics and aggregates, like mean, median, sum, histograms, those kinds of things; the typical things you'd learn in a Stats 101 class. And then there are things like relationships between multiple values. These are still Stats 101 kinds of things: correlation coefficients, ordinary least squares, blah blah. This can, of course, get very complicated, up to machine learning algorithms; nothing we're talking about today will really let you do deep learning, unfortunately. And then there's a very specific task, but actually quite a common one, which is common strings, often called heavy hitters. The problem here is to say which strings are most common across people, and that can be a lot of different things.
D
A common example that Google gave in their RAPPOR paper is: what home pages are most common on people's machines? You don't want to know what home page someone has if they're the only person who has it, but if half the population has it, you'd like to know that. And the nice thing about all these values, just to pull back, is that once again they don't actually depend on the individual values; they only depend on knowing the desired aggregate. This heavy hitters problem obviously has the property that you only want the most common ones, not the individual ones, and that helps, by the way, mitigate the privacy problem with these heavy hitters. Because if someone has as their home page some Google Doc that they accidentally set as their home page (which happens), some Google Doc which is sensitive, well, you don't want to learn that, necessarily.

So one example use case is user interests: what kinds of sites are users visiting? And again, I don't need to know, and typically no one needs to know, exactly which site it is.
D
But you say, okay, bucket sites by topic (so, how many people are on medical sites), so you have a bucket per topic, and a number of visits and minutes people spent on each topic. But again, even if you're not measuring individual sites, the topics themselves can be sensitive. If someone is visiting a website associated with a particular medical condition, you might worry that they, or someone in their family, has that condition, and you don't want to have to collect that information. So one problem statement is: the distribution of time spent on each type of site. It's obviously possible to generalize this kind of problem to any time you have categorical information from users and you'd like to collect counts per category, so it's not just sites, necessarily; there are lots of other things as well. Okay, one thing you're going to hear throughout this presentation is that these sort of stylized problems can often be generalized: a lot of different kinds of measurement can fit these kinds of stylized problems.
D
Another use case, which we see very often, is this web compatibility problem. The web is really big, and websites sometimes will not work on a given browser: very often the website works fine on Chrome and doesn't work on Firefox, or it works fine on Firefox release builds and doesn't work on Firefox Beta. Ideally users will report these problems, but often they do not, and so even fairly gross breakage problems often don't get reported; and even if they do get reported, the latency of the system means that a lot of people experience the problem before you've had any chance to fix it.

This is a really quite large problem for browsers like Safari and Firefox, where sometimes web developers don't test on the product. The good news is that often you can detect breakage on the client directly: either directly (an API fails, or the site tries to use an API which doesn't exist) or because the user does what they call rage clicking, trying to reload the page over and over and over again in the hope that it will fix itself, which sometimes it does, by the way.
D
A very specific case of this (it isn't quite web compatibility, but it's a similar kind of problem) is that many websites do this thing called fingerprinting, where they measure persistent properties to create a browser fingerprint. This is an obvious threat to privacy; it's an alternative to cookies, and we see it being used sometimes when cookies aren't available, or when people don't trust cookies, or something. This also is often detectable on the client, because you can see uses of APIs that don't make any sense. An example would be: the site uses the WebRTC APIs, but then never actually makes a peer connection. So it's like, why are you collecting the IP address but not making a peer connection?

Now, it's very hard to learn about these issues at mass scale, even though modern browsers have what's called telemetry, which is to say they report data back to the manufacturer. They do it only with basically non-sensitive data, because for obvious reasons we wish to preserve the user's privacy, and we don't wish to learn which sites they're actually on. So this is the problem statement here: collect the sites where the client sees issues, but do it in a way where... actually, I've written the problem statement wrong here. The problem statement is to collect the sites on which clients, in aggregate, are seeing issues.
D
I don't care which sites you individually are seeing issues on, for the reasons indicated previously. So, pulling back: there are multiple kinds of privacy problems. One privacy problem is collecting sensitive data which is directly tied to identifying information. For instance, a concrete example: if you have a program which reports back what website everybody is going to, even if it's not explicitly tied to them, even though there's no email address in there, the IP address is enough to see who it is, right? So that's the first privacy problem: gathering sensitive data with user-identifying information directly attached to it in some way.

That's problem one. Problem two is, even if you don't have an identifier, collecting sensitive data along with some non-sensitive-appearing identifying information. As an example, Latanya Sweeney pointed out that if I just have your zip code, your gender, and your date of birth, that's enough to identify something like 87% of people in the United States. So a good example here: people's income individually isn't problematic, and their birthday isn't, and their zip code and initials aren't, on their own; but when you glue them all together, now I actually have everybody's (or not everybody's, but a lot of people's) income, individually. So we have to fix both of these problems or we're not going to be out of the woods.
D
Okay, so the sort of natural thing everybody thinks to do at this point is to say: well, what if we just collected this information without any identifiers?

Practically speaking, there's the sort of dumb way to do this, where you say: well, we'll strip the identifiers on our side, and we just promise we don't misuse them. People do that, and it's better than nothing. But technical controls are better than policy controls, and so the better thing to do is to strip identifiers on the client side: on a web browser, the client will strip out cookies and email addresses and stuff like that. That still leaves you with a lot of networking metadata, and you strip that out with a proxy: you've got some proxy which is not associated with, or at least not colluding with, the data collector, and you encrypt the report to the data collector, so the proxy can't see the data; then the proxy strips off the network metadata, the IP address, so the collector doesn't see the identifying information. The idea here is that the data is never concurrently identified and available: at any point it's either encrypted or de-identified. There are multiple technical ways to do this.
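A minimal sketch of the pattern just described, in Python. Everything here is illustrative: the field names are made up, and encrypt_to_collector() is a placeholder for a real public-key sealing operation (something HPKE-like), not any particular library's API. The point is only who can see what: the proxy sees the client's network identity but not the payload, and the collector sees the payload but not the network identity.

```python
from dataclasses import dataclass

def encrypt_to_collector(collector_public_key: bytes, payload: bytes) -> bytes:
    # Placeholder: a real deployment would seal `payload` to the collector's
    # public key so the proxy cannot read it. Here we just tag the bytes.
    return b"sealed:" + payload

@dataclass
class ProxiedSubmission:
    client_ip: str     # visible to the proxy only
    ciphertext: bytes  # readable by the collector only

def client_submit(collector_public_key: bytes, report: bytes, client_ip: str) -> ProxiedSubmission:
    # The client has already stripped its own identifiers from `report`;
    # it then encrypts so the proxy never sees the measurement.
    return ProxiedSubmission(client_ip=client_ip,
                             ciphertext=encrypt_to_collector(collector_public_key, report))

def proxy_forward(sub: ProxiedSubmission) -> bytes:
    # The proxy drops the network metadata (the IP address) and forwards only
    # the opaque ciphertext, so the collector never learns who sent it.
    return sub.ciphertext
```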
D
There are connection-level proxies (you know, IPsec, CONNECT, MASQUE) and there are application proxies like OHAI. So this is a great technology, and a technology which is very valuable in a number of cases. Unfortunately, it is imperfect. Good use cases here are when you have semi-sensitive data and you want to boost the privacy. As I mentioned, browsers do telemetry now, but they just throw away the IP address, hopefully, at the other end, which we try to do, but you don't know it's happening. Also: individual values where you don't need to dig into them (I'll get to this in a minute), and freeform data like JSON blobs, which is hard to compute on cryptographically. Also, by the way, anything where you need an answer back, because the multi-party computation stuff I'm going to talk about in a minute doesn't work for that; it's one-way only. So things like DNS requests and Safe Browsing queries and stuff like that do well with proxies. And do I have the bad use cases here? Yeah, good, bad use cases, fantastic.
D
So there are a number of cases where this is really useful. Unfortunately, there are cases where it falls down. One place it falls down is if you have what's called high-dimensionality data: say I have data with a lot of variables, and I want to look at the relationships between them, going back to my income-and-height example, for instance. It's also the case when you want subgroups (say, look, I want to look at only people with this nationality) or if you want to do correlation, regression, any kind of statistical processing. The reason is that, as I was saying earlier, the more you glue together these individually-low-sensitivity but high-dimensionality data sets, the more you can identify people. There's the example I gave of, you know, birthdays, whatever: it's perfectly natural to want to ask what the correlation is between zip code and income, and it's perfectly natural to ask what the relationship is between birthday and income, but if I glue all those things together, suddenly I have identifying information. There's this other famous example, of the Netflix data set, where Narayanan demonstrated that you could look at very small amounts of information about people's Netflix viewing histories and figure out who they were.
D
So the problem is that if you're just going to blanket-anonymize, you need to break the data apart; you need to take each value and send it separately, unlinkably. But then, when you do that, you can't do any of this kind of analysis.

It's also not very good for heavy hitters, because if you want to know only the top N values, then in order to actually figure out which ones are the top N, you have them all reported. So you say: look, what I'd like is to not see the Google Doc somebody has as their home page, but actually what happens is they all get sent to the server and you stack-rank them. So some technique for fixing that is really helpful. And, not to foreshadow this too much, that's one thing that STAR does (that's coming later): it tries to collect the data, but not see the values you don't care about. That's something that's in scope for the GPEW draft, for PPM, as well.
D
So the good news is that this is a situation where cryptography can help us. There's been quite a bit of work on how to address this situation in the past ten years, and we have cryptographic mechanisms to apply to this problem. The basic technology here is called MPC, multi-party computation, and the idea is that you have two servers, and the servers are non-colluding: what that means is that they interact with each other, but they're not working together to reveal your information.

The basic idea is that each client takes its data and splits it up between the two servers: it sends one piece to server A and one piece to server B, and the servers take all the data from all the clients and individually aggregate it. But the data is still encrypted, so what they're doing is a kind of homomorphic encryption: they're computing an aggregate over the encrypted data, and then they take their aggregate shares and bring them together, and the shares can be reconstructed to produce the actual value. So, just to think about where that leaves us: the servers individually know who the clients are, but they don't see the clients' individual values; and the collector gets to see the aggregates, but never gets to see any individual value associated with anybody's data.
D
What's really important to the trust model here: the client's requirement is that the two servers don't collude. If the servers collude, they can compute individual values, and it's totally game over. And the servers, in order to guarantee that they don't accidentally "collude", enforce various invariants like minimum batch sizes and query limits.

This is the hard privacy requirement for the client side. The collector's requirement is different: the collector doesn't care if the servers collude, but it does care that the servers execute the protocol correctly, and either server can distort the results. So, just to recap: for the client to be unsafe, both servers have to cheat; for the collector to be unsafe, only one server has to cheat. But from the privacy perspective, that's okay, because one server cheating can't actually break the privacy invariants. This is, by the way, difficult: both these properties are really hard to verify, especially collusion, which can happen via side channels, but point-in-time audits are sort of the state of the art. And again (sorry, I'm also losing my train of thought; it's five in the morning for me), this is not necessarily straightforward to verify, but generally the idea is that you can keep piling on more servers, and at some point the clients are satisfied.
D
Sorry, the first one of these that's really viable is Prio, and it's useful for computing simple numeric aggregates. The basic idea here (and sorry, there's a little bit of math) is that each client has an individual value x_i; say it's my height or my income, something like that, and I want to get the aggregate over those values.

This is all high-school math, which is very nice. Actually, it's elementary-school math. I generate some random value r_i in a finite field of size p, and I send server one basically my value minus r_i, modulo p (again, that's the elementary-school-math part of it), and I send server two r_i. You should be able to convince yourself pretty easily that if I take x_i minus r_i and I add r_i, then I get x_i back, so this is information-theoretically fine. Each server adds up the shares: server one adds up all of the (x_i minus r_i) values, and server two adds up all of the r_i values, right. Now, if you take those two sums and add them up (congratulations, addition is commutative), the r_i's and the minus-r_i's basically cancel out, and you get the sum of the x_i's. So what we've done is create a situation where neither server sees all the data, but you have the aggregate anyway.
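A concrete illustration of the share arithmetic just described, as a minimal Python sketch. This is a toy under obvious assumptions (one modulus, no proofs, no network), not the Prio wire format; it only shows that the random r_i values cancel when the two servers' sums are combined.

```python
import secrets

P = 2**61 - 1  # a prime; the real protocol fixes a specific finite field

def share(x: int) -> tuple[int, int]:
    """Split x into two additive shares mod P, so x = s1 + s2 (mod P)."""
    r = secrets.randbelow(P)
    return (x - r) % P, r          # share for server 1, share for server 2

def aggregate(values: list[int]) -> int:
    server1_sum, server2_sum = 0, 0
    for x in values:
        s1, s2 = share(x)
        server1_sum = (server1_sum + s1) % P   # server 1 only ever sees s1
        server2_sum = (server2_sum + s2) % P   # server 2 only ever sees s2
    # The collector combines the two sums; all the r values cancel out.
    return (server1_sum + server2_sum) % P

heights_cm = [175, 168, 182, 190]
assert aggregate(heights_cm) == sum(heights_cm) % P
```

In the real protocol the shares additionally carry validity proofs, which is the "not elementary school" part discussed next.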
D
As I say, this is also elementary-school math. The not-elementary-school part of this is what happens if the clients lie, and so there's a zero-knowledge proof indicating that the client didn't claim to be, you know, some absurd number of meters tall.

The good news is that even this really quite dumb encoding, this very limited system, can compute an enormous number of things. Just to give you a flavor: the arithmetic mean is easy to compute; once you have the sum, you divide the sum by the cardinality, that's obvious. You compute products by working in log space instead of linear space; this, by the way, is how slide rules work. If you remember slide rules: you can compute a geometric mean from the product the same way as the arithmetic mean; you can compute variance and standard deviation the same way; you can do boolean OR and AND, and min and max, and even ordinary least squares. It's just about finding the right encoding for the data. That's really, really nice, because you've got the same basic structure, and you just basically say: oh, now I'm encoding the data in a new way, and I can give you new things, right.

So, as I was saying, we have this problem with bogus data. There are two kinds of bogus data, one of which is plausible but false data. So I say: look, I'm really 175 centimeters tall, but I state 180 centimeters. This is a problem any surveying technique has.
D
People lie, and if you're going to trust them, you're going to trust them. The solution to this, as with any surveying technique, is that you live with some noisy data and you hope that the lying is unbiased (which, by the way, for height it probably isn't; people probably say they're taller than they are). And then there's the question of completely ridiculous data: say I claim I'm a kilometer tall.

Well, I'm not a kilometer tall, nobody is a kilometer tall, or worse, maybe they say they're negative one kilometer tall, and if you can't see the individual values, what do you do? In a standard system, what you do is take the data that comes in, look at it, and say: well, I'm not going to accept anybody who says they're a kilometer tall; I'll just throw it out. But obviously in Prio you can't do this; the data is encrypted.

So the solution to this, and this is the fancy part, is that each submission comes with a zero-knowledge proof of validity. In advance you say: I'm only going to accept people who say they're between 100 centimeters tall and 200 centimeters tall. Maybe there's someone who's more than 200 centimeters tall, which I think there is, but we'll just say, look, we'll call them 200 centimeters and it'll be okay, and similarly at the bottom of the range.
D
The zero-knowledge proofs just prove that the value is correctly encoded, with math that I will not attempt to explain, but perhaps Chris Patton can. The servers work together to validate the proof, because individually they shouldn't be able to learn anything real about the data, and you only aggregate submissions with valid proofs. So there's basically a filtering stage that I haven't shown.
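The proof math itself is out of scope here, but the filtering stage is easy to picture with a toy sketch. Note that jointly_verify_proof() below is a made-up stand-in for the real interactive check the two servers run together; in the actual protocol no single server can range-check a plaintext value the way this toy does.

```python
def jointly_verify_proof(value_cm: int, proof: object) -> bool:
    # Stand-in for the cooperative zero-knowledge validity check; here it just
    # range-checks the plaintext ("between 100 cm and 200 cm tall").
    return 100 <= value_cm <= 200

def filter_and_sum(submissions: list[tuple[int, object]]) -> tuple[int, int]:
    total, accepted = 0, 0
    for value, proof in submissions:
        if jointly_verify_proof(value, proof):   # the filtering stage
            total += value
            accepted += 1
    return total, accepted

total, n = filter_and_sum([(175, None), (180, None), (1_000_000, None)])
# The kilometer-tall submission is rejected; only the two plausible ones count.
```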
D
So you can also collect, you know, user interests with this technology. Every user gets a bucket (sorry, every user interest gets a bucket), so if I have 100 user interests, there are 100 buckets, and the user reports the amount of time spent in each bucket, including zeros, by the way: if I didn't spend any time on car websites, it still has to say zero. So let's say I wasn't on a car site, and you just use Prio to sum them up. It's really straightforward.

You get histograms (it's beautiful, right), and the servers again only learn the aggregates, not which category any individual value was in. This is all fine. As I noted in the footnote, if you also report the times squared, you can compute the standard deviation, in case you really need to compute standard deviations. So that's all sort of okay, except that it doesn't scale well.
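Concretely, the per-topic encoding described above amounts to each client submitting a fixed-length vector with one entry per bucket, zeros included, which the servers sum position-wise. A toy plaintext sketch (ignoring the secret sharing shown earlier) of why the aggregate is a histogram, and why report size grows with the number of buckets:

```python
TOPICS = ["cars", "medical", "sports", "news"]   # one bucket per user interest

def encode_minutes(minutes_by_topic: dict[str, int]) -> list[int]:
    # Every client reports a value for every bucket, including explicit zeros;
    # omitting a bucket would itself reveal "no interest".
    return [minutes_by_topic.get(t, 0) for t in TOPICS]

def sum_reports(reports: list[list[int]]) -> list[int]:
    # Position-wise sum; with Prio each addend would be secret-shared, but the
    # resulting aggregate is the same.
    return [sum(col) for col in zip(*reports)]

reports = [
    encode_minutes({"medical": 30}),
    encode_minutes({"cars": 10, "news": 5}),
    encode_minutes({"news": 20}),
]
print(dict(zip(TOPICS, sum_reports(reports))))
# {'cars': 10, 'medical': 30, 'sports': 0, 'news': 25}
# Report size is one integer per bucket, which is why this stops scaling
# once the category space gets large.
```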
D
If there are a hundred user interests, I have to report a hundred integers; if there are a million user interests, I have to report a million integers. The intuition for this is that if there are categories for which I have no value at all and I just report nothing, that's fine, you can certainly conclude there was no value, but now you know I wasn't interested, and you can assume that anything I did report a number for, I was interested in. So you have to report all the values, and it doesn't scale well at all.

So there's this newer technology called Poplar (I don't know why I still have it called "hits" here; I thought I replaced all of those) from the same people as Prio, which basically works by collecting strings. So you could do the same thing, but instead you basically say each interest is mapped to a string.

The way this works is that you can basically take the set of submissions (and, because the strings are unknown, you've got to figure out which ones are most popular) and you're able to ask this question: how many strings have prefix P, and how many have prefix P with a zero on the end versus a one on the end, right? So now you have a binary tree, and you can refine your search on the tree and find the most popular things. This is a very clever intuition and a very, very powerful scheme: it lets you collect, you know, the most popular URLs, for instance, or, as I say, basically any open-ended string.
D
So how would I imagine using this, what does a real use case look like? Every time a site is broken, the client creates a report for each site that's broken, and then you use this technology to learn the top sites, and then you go and try to investigate them, right. The nice thing is that the servers collectively only learn the most important sites, the ones you want to report on, and they don't learn the low-cardinality ones that you don't care about. Because, as I said, this is arranged in a binary tree, you start and say: here are all the reports, and then maybe half of them are on the left side of the tree; but at some point you get down and say, there are only 50 reports on the left side of the tree, and you're still way, way, way up at the top of the URL, because most of the URLs you still haven't seen; you're only going down one bit at a time. Okay.
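The binary-tree intuition can be sketched in a few lines. This toy counts prefixes over plaintext strings, which is exactly what a single Poplar server cannot do on its own, so it only illustrates the search strategy: keep extending prefixes that enough submissions share, and never enumerate the rare ones.

```python
def heavy_hitters(bitstrings: list[str], threshold: int) -> list[str]:
    """Toy prefix-tree search: keep extending prefixes that at least
    `threshold` submissions share; drop the rest without ever looking at them."""
    if not bitstrings:
        return []
    length = len(bitstrings[0])
    candidates = [""]
    for _ in range(length):
        next_candidates = []
        for prefix in candidates:
            for bit in "01":
                p = prefix + bit
                # In Poplar the servers compute this count from secret-shared
                # reports; here we just count plaintext for illustration.
                if sum(s.startswith(p) for s in bitstrings) >= threshold:
                    next_candidates.append(p)
        candidates = next_candidates
    return candidates

reports = ["0110"] * 60 + ["1010"] * 45 + ["1111"]    # one rare value
print(heavy_hitters(reports, threshold=40))            # ['0110', '1010']
```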
D
So the second piece of this puzzle is that sometimes you want to, as I said, dig into the data, right. I just described a sort of one-way, pre-programmed kind of measurement, but you can also have measurement where you want to dig into the data later. So you say: look, I have a distribution of income, but I really want to look at, you know, birthdays or whatever. The way you do this is you put the demographic data in the submission itself, and it basically gets carried along unencrypted (that's because it's basically non-sensitive, so you don't worry about it so much), and then the servers can slice the data: only ask for the aggregates over, say, a given set of birthdays, or a given set of zip codes, right.

So this is a powerful technology. The challenge is that if I allow you to make as many queries as you want, obviously you can slice it down to individual data values; and even if I say that you can only ask for sets of more than a thousand, what I can do is create two sets, one with a given user and one without, and then I can pull out user i's value. So there's a bunch of possible defenses here, which probably all get used together: minimum batch sizes; anti-replay, so you can't ask about the same data multiple times, or too many times; randomization noise for differential privacy; those kinds of things.
D
Okay, so those are the techniques; let's go through each of these proposals. As I said, there's a main document here, and the one which I believe is specifically in the charter is draft-gpew-priv-ppm, which I guess has its own version numbering here. It's a generic, modular protocol for any of these MPC-flavored schemes; it initially implements Prio and Poplar (which, I really thought I did a search-and-replace on these slides, but maybe I forgot to rerun it). Basically it's a pluggable system: it's compatible with anything that can fit into this MPC flavor. To give you a concrete example of that, the proposal Facebook and Mozilla worked out for Interoperable Private Attribution also fits reasonably well into this framework. It's built on top of web-service infrastructure, so it's easy to implement with existing stuff, et cetera, et cetera, the usual things.

And just very briefly: here's what the architecture looks like. Like I said before, you have these clients; each client sends its shares; the aggregators have this back-and-forth aggregate computation; and it goes to the collector. I'm not going to bother to really go into this at all, because I know Chris Patton has plenty of material on it. So now I'll take questions, if anybody has any.
A
Hi everyone, we're running a little bit behind schedule, and I'd love to get into the next presentations. Thanks, EKR, for that really clear and educational introduction. If anybody has burning questions, feel free to jump into the queue now, but also, let's start getting ready for our next presentation.
E
Oh! I guess I can share them from this.
F
Everyone can see that, right? It's working, cool, and I have full control; I am ready to go. Okay, yeah, thank you, EKR, that was a wonderful introduction to the crazy world we've just dived into with PPM. Okay, so I'm going to start off the next few talks, which are kind of describing some open issues that have come up over the last several months that we've been kicking around on this draft we're talking about. I have the pleasure of talking about one of these open questions, and it has to do with how clients upload their reports to the aggregators, the two servers that EKR talked about.

Okay, so I wanted to give a quick overview of how we think about the architecture of the system. We really think of PPM as being three sub-protocols that are executed simultaneously. The upload flow is: clients push their reports (these contain the input shares, the secret shares of their input) to the leader, and these are encrypted under the public keys of each of the aggregators, so the leader doesn't end up seeing them in the clear. Then there's the aggregate flow, which is where the aggregators interact with one another in order to verify the validity of the inputs they're consuming, as well as aggregate them and compute shares of the aggregate results.
D
If you could take two minutes and cover the material that I foolishly thought you were going to cover, which is basically just briefly describing how the architecture works in a slightly more general sense... You're only showing... I mean, I know this material, but make sure you cover exactly how people should think about the system as a whole; that would be useful.
F
Okay, yeah, that's what I'm attempting to do here. So if anything's not clear at the end of the slide, please get in the queue and ask questions, because we're going to bring up the pros and cons of two different approaches here. Okay, so, yeah: in the upload flow, clients upload their reports to the leader. The report contains the input shares, and they're encrypted to the public keys of the leader and the helper; that's so that the leader doesn't ever see the input shares in the clear. The aggregate flow is where the input validation that EKR described happens, as well as the aggregation of the input shares. And then, finally, the collect flow: there's the data collector, which we think of as a different entity, that interacts with the leader in order to get the final aggregate result, so it's asking the leader for the encrypted aggregate shares. Okay, so does anybody have questions about the architecture?
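For readers following along, here is a rough sketch of the shape of a report in the leader-upload architecture just described. The field and function names are illustrative assumptions, not the draft's wire format: the client produces one input share per aggregator, seals each to that aggregator's key, and sends the whole bundle to the leader, which can store but not read the helper's piece.

```python
from dataclasses import dataclass

@dataclass
class Report:
    task_id: bytes                 # the PPM task all parties agreed on
    timestamp: int                 # places the report in a batch window
    share_sealed_to_leader: bytes  # only the leader can decrypt this
    share_sealed_to_helper: bytes  # only the helper can decrypt this

def leader_handle_upload(report: Report, pending: list[Report]) -> None:
    # Upload flow (leader-upload model): the client sends the whole report to
    # the leader, which queues it; it stores but cannot read the helper share.
    pending.append(report)

def leader_start_aggregation(pending: list[Report]) -> list[tuple[bytes, int, bytes]]:
    # Aggregate flow: the leader forwards each helper share, plus enough
    # context to identify it, to the helper, and processes its own shares.
    return [(r.task_id, r.timestamp, r.share_sealed_to_helper) for r in pending]
```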
F
Yeah, so, as EKR pointed out, something that we're going to be assuming is that the leader and helper are not colluding. I think for now we should take that as granted, but the working group might want to discuss ways of actually verifying non-collusion. EKR, go ahead.
D
I have a leading question, which is: how do the leader and helper get coordinated, so that they're doing the same thing and know which measurements are being collected and what anything individually means, and those kinds of things?
F
Okay, yeah, so I guess what's missing here is what the PPM protocol is meant to specify. In the CFRG we're working on a document that describes the underlying crypto bits (so, Prio and Poplar, and other instantiations of the multi-party computation step), and all of the parties have to be configured to execute the same, what we call, verifiable distributed aggregation function, a VDAF. I was hoping to talk about tasks for 30 seconds. Okay, so, a task is... sorry, it's also early in the morning for me. When a client is uploading reports, it needs to know where to send reports and how to generate them, and this is determined by what we call a PPM task. The task is supposed to specify all of the things that all the parties need to agree on in order to do the computation. I'm not sure what else there is to say about that.
H
Do you hear me? Yes, okay, sorry, this resets all the time. So my question is: are you envisioning allowing any variations of this architecture? Maybe there are applications where the party collecting the data, the reports, is different from any of the aggregators, right? There might be a reason that it's easier to have a party online, different from the aggregators, that is collecting all the reports. An example of such a system is the deployment that we have in Exposure Notifications, where our collectors, our ingestion servers, are neither of the aggregators. And I can also imagine other applications where maybe you want to avoid the additional communication cost of the leader sending reports to the helper.
F
Yeah, sorry to interrupt, but that's what this presentation is about: an alternative architecture that's been proposed, and the goal of my talk is to weigh the pros and cons of the two different approaches. To your first question: yeah, I think that's totally on the table, but it's something that also needs to be discussed. The perspective I'm coming from is the current draft (which, I guess, we could have spent more time describing at a high level), but yeah, that's totally on the table; it would just need to be in the draft.
J
Yeah, in the previous slide deck we were talking about aggregators, and...
F
Yeah, I apologize for that. The leader and helper are both aggregators. There are three different kinds of roles: the client, the aggregator, and the collector; the leader and helper are just two different types of aggregator, and the only difference is that the leader is kind of holding the state of the aggregation flow. I was hoping that would be clear on EKR's slides, but I guess not; I apologize for that, I should have labeled this better.
F
Okay, I will go through this very quickly; I want to be sure to save time for everybody else. So this is what we have today, this leader upload flow, where clients send both encrypted input shares to the leader. An alternative flow, as Mariana suggested, is that we could instead have the client send a share of its report to each of the aggregators, so each of these report shares would just have the encrypted input share for that aggregator.

I think that in some sense the split upload model is a little bit more natural, and, as Mariana mentioned, this is what was already deployed for an earlier iteration of Prio. So why did we do it this way? I think the main motivation was that we wanted to make it as cheap as possible to stand up the helper aggregator. With the current architecture, only the leader has to be able to handle a lot of bandwidth and is exposed, like a normal web service, to client traffic. The aggregation flow has less bandwidth because, first of all, the leader doesn't have to send both encrypted input shares to its peer, so we save a little bit of bandwidth, and also the leader can throttle traffic if the helper is falling behind; and the helper doesn't have to be totally online the way the leader has to be in the upload flow. And then the collect flow is just the collector getting aggregate results, which is fairly cheap to do.

Another motivation is that there's kind of a race condition in split mode, where the leader is receiving report shares and can initiate the aggregation flow at any time: if the helper doesn't receive its report share before the leader receives its report share and begins aggregation, then we might have to drop that report, and there are ways to fix this in the protocol. And then the third point is that in the split upload world, the upload flow is more likely to fail, because there are two...
F
The big downside to the leader upload model, as Mariana suggested, is that the aggregation flow has higher-than-necessary bandwidth, because basically we're sending via the leader what the client could have sent itself, the report share. This is a significant problem for Poplar, because input shares are big and we're going to run the aggregation flow several times on the same set of reports in order to finish the computation of the heavy hitters; and higher bandwidth between the leader and helper means higher cloud egress costs between cloud providers, which is an important consideration for us.

So I think we have mainly two options, which is what I'd like to discuss, but I guess we should maybe just take it to the list, because we're running low on time. We can stick with the leader upload model and try to mitigate its downside, that is, not require re-transmitting the report shares in the protocol, and there's a question of whether that's enough to reduce the egress costs. Or we can consider adopting the split upload flow and... yeah, I don't have much time to talk about this; there are options for it, and it's possible that we can mitigate the downsides by kind of leaving it up to the deployment. There's been the suggestion of putting an ingester between the client and the leader and the helper, which can coordinate transmission of report shares and solve some of the coordination problems; and then there's this question that EKR brought up on the list, which is: in what sense is the ingester trusted or untrusted? So that's my time; I'll kick it back to Ben.
A
Okay, thank you, Chris. I want to make sure that we get through all the presentations on this topic. We did build in time for questions after the three discussions on this draft, so if folks have really high-priority, brief questions, jump in, but also, let's get set up for the next set of slides. Oops, Eric.
D
I just want to make sure (I've seen a bunch of discussion on the chat) that people understand what's at stake here. Primarily, what's at stake is whether or not the role of distributing the data to the helpers... So the role the leader has to perform is orchestrating the computation, and the role it is also performing is distributing the data to the helpers, right. At stake here is whether we should separate out those roles, and have the role of distributing the data to the helpers be done separately, directly by the client in this case, from the role of orchestrating the computation. So that's what's at stake here. It's not a security question; it's largely an operational question.
J
I think the other comment in the chat was that, if it's possible to describe the system such that the role coordinating the computation does not require participating in the computation, that might also make it cleaner, right; you could say there's the coordinator, and the coordinator may also play a role as one of the participants in the computation. But I don't understand the actual mechanics of the computation well enough to know whether those roles could be split out; I think that's the thing that was...
F
I think that's an interesting suggestion. I think it would be nice to cleanly separate those things; I don't know if it's that easy, at least to specify, but yeah, I think there's a lot to discuss here, and hopefully we can have a good discussion about this on the list.
D
Just to answer that question a little bit: I think the challenge is that, from the perspective of running the computation, it's actually quite straightforward to specify as if the leader were not one of the aggregators, because you're really just sending messages saying "compute this, compute this". But in order to actually do the computation, you have to know which shares are available, in order to describe them to each side so they can be aggregated. And so we'd have to invent some mechanism by which the leader learns which shares are available in order to orchestrate the computation, and that would be new protocol mechanics, or we'd have to just say it's magic. I'm sort of loath to describe protocol mechanics that no one's actually going to implement that way, but perhaps you could just say it was a magic channel.
K
All right, good morning, everyone. This is going to follow on the heels of some of the comments and questions that came up during Chris's presentation: in particular, how the collect piece of the protocol works, and what sort of requirements you need to have in place in order to ensure that the resulting PPM protocol has the desired privacy properties that EKR kind of alluded to in his overview.

It's going to get a bit technical in terms of how the collect flow works for the current instantiation of PPM, so if you have any questions along the way, feel free to chime in or pop into the queue or whatever, and I'll try to answer them. So, at the highest of levels: you've seen this diagram in different shapes and forms before, but this is how the collect flow basically works. Once aggregation is done, each aggregator produces an aggregate share, and then the collector will eventually query the leader for these aggregate shares and combine them together to get the aggregate result. Each aggregator also maintains, during the aggregation...
K
...flow of this particular interaction, all of the individual reports that were submitted by clients (the individual report shares themselves) as well as the aggregates that are accumulated as the aggregation happens, such that it can present them to the collector. The requirement here is: in what ways can the collector ask the aggregators for different aggregate outputs such that the privacy requirement is satisfied? By the privacy requirement, what we mean, basically, is that the aggregation output ensures that there's some minimum number of reports that went into it, where the minimum number of reports is something that's configured as part of a measurement task in the system, so all of the entities participating in a particular measurement agree on what this min batch size, this minimum threshold, is. And the requirement is that, no matter how the collector chooses to query and interact with the aggregators, it cannot produce or derive, in any way, an aggregate that was based on fewer reports than this batch size threshold.
K
There's also, of course, a correctness requirement, which is that the aggregation, whether it's triggered by collection or happens before collection, actually includes the same reports when the collect request comes in. So if there are n reports from n different clients (n shares of n reports from n different clients), those are all included in the same aggregate, and we don't have one aggregator aggregating some set of report shares and another aggregator aggregating a different set of report shares, because then the output would be garbage. So these are kind of the two informal goals of collection, and I'll try to walk through why PPM currently does not satisfy this, and ask some questions that hopefully get us towards thinking about how it might satisfy it later down the road.

During aggregation, aggregators keep track of individual report shares in batches, and batches are divided over time, where the length of time is some parameter we call the min batch duration; it might be a day, might be a week or an hour, whatever. A report is tagged with a particular timestamp, effectively, and that puts it in one of these time windows; and collect requests, when they are issued by the collector, indicate the time window over which the collector wants collection to occur.
K
Currently, the parameters of the collect request, in particular the time parameters, must align with batch window boundaries, so they must align, on the picture here, on t minus one, or t, or t plus one, or whatever, and that's effectively kind of the only constraint. The current validation for collect requests is composed of two steps: the first is to check that the time parameters do indeed align on these batch window boundaries, and the second is to check that, for the given time window specified by a single collect request, independent of any previous collect requests, the minimum batch size is met.
K
So, as an example, imagine you had a collect request for the window of t minus one to t. If you look at the two criteria here, it's indeed a valid window: it aligns on the boundaries, and (totally arbitrarily, but I tried to draw the picture this way) yes, it's indeed a valid size; there are enough reports, not too few, that go into this particular window. So both criteria are met, and this collect request would be satisfied.

Similarly, you could ask for this second collect request: again, the time window is valid because it aligns on the batch window boundaries, and the size is also valid, because it covers simply more reports than the previous collect request. Martin, I just noticed you're in the queue; I don't know when you joined, but is it a clarifying question?
L
How does the collector know? So, if the collector is going to ask for, say, t plus one to t plus two, which appears to have a total of one sample submitted, that's not going to meet your minimum batch size, right?
K
Right. I mean, the collector doesn't know how many things go into the aggregate until it asks, I guess, and it gets an output from the system indicating either that there were indeed enough things in that particular window, or no, there were not enough things in the window, so an aggregate could not be produced. But anyway: if you look at these two collect requests independently, they seem valid; they check both criteria. Unfortunately, however, if you only validate that these collect requests are valid in isolation, it's pretty trivial for a malicious collector to use the outputs to compute an aggregate that is composed of fewer than the threshold number of reports. If you think of each collect request as yielding a set of reports, then you can just compute the set difference between the output of the first collect request and the output of the second collect request, and in this particular example that set difference would yield exactly one report, which is unique to some client, breaking the informal privacy goal that we had at the beginning.
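The attack just described is easy to demonstrate. In this toy sketch, each collect request is treated as returning the set of reports in a time range; both queries individually satisfy a minimum batch size of four, yet their set difference isolates a single client's report, which is why validating requests only in isolation is not enough.

```python
MIN_BATCH_SIZE = 4

# Toy data: report id -> (batch window index, value).
reports = {
    "r1": (0, 170), "r2": (0, 180), "r3": (0, 175), "r4": (0, 182),
    "r5": (1, 190),                      # the lone report in window [1, 2)
}

def collect(start: int, end: int) -> set[str]:
    batch = {rid for rid, (t, _) in reports.items() if start <= t < end}
    # Validation as currently specified: each request is checked in isolation.
    assert len(batch) >= MIN_BATCH_SIZE, "batch too small"
    return batch

first = collect(0, 1)     # 4 reports: passes
second = collect(0, 2)    # 5 reports: passes
leaked = second - first   # {"r5"}: one client's report isolated by differencing
print(leaked)
```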
K
So
this
is
clearly
a
problem
and
something
we
need
to
fix.
So
it's
kind
of
an
open
issue
in
the
draft
right
now.
K
K
So,
for
example,
you
might
want
to
say
give
me
the
aggregate
for
this
particular
time
window
that
came
from
all
clients
that
have
the
specific
user
agent
string,
or
you
might
want
to
say,
give
me
all
the
aggregates
in
this
time
window
for
reports
that
came
from
this
particular
geographic
region.
To
allow
you
to
sort
of
drill
down
into
errors,
if,
like,
for
example,
these
are
like
measurements,
you're
collecting
from
the
perspective
of
what
web
browser
and
currently
the.
K: So, given these two things, the desire to maintain this informal privacy goal stated previously, as well as the potential flexibility you might want in time and space:

K: given any particular sequence of collect requests, the collector must not be able to produce, deduce, compute, whatever, an aggregate that is composed of fewer than the threshold number of report shares. And it would be fairly easy to enforce this rule if the protocol were aware of the space dimension. How that's actually done mechanically, how reports are tagged in space like they are in time, is sort of an open question.
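One way to picture tagging reports in space is to extend the collect request with an attribute field. This is purely a hypothetical shape, since the draft leaves the question open:

```python
from dataclasses import dataclass, field

# Hypothetical shape of a collect request carrying both a time window and
# a "space" constraint; nothing like this is specified in the draft.
@dataclass
class CollectRequest:
    batch_start: int    # must align on a batch window boundary
    batch_end: int      # must align on a batch window boundary
    attributes: dict = field(default_factory=dict)

req = CollectRequest(batch_start=0, batch_end=3600,
                     attributes={"user_agent": "browser-x"})
# The privacy threshold would then have to be enforced per
# (window, attributes) batch, since each such batch is another set the
# collector could difference against the others.
```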
K: Whether we even want to do that is an open question, but I think we understand fairly well how we would implement it. It's just a question of how these different constraints and parameters are expressed in the collect request.

K: Sybil attacks are a different issue entirely, I think, and I'm not trying to address them in this particular discussion.

K: We do have to have accommodations for Sybil attacks, like either clients themselves introducing random reports or leaders themselves introducing random reports, but that's separate from how the collector queries for things and tries to violate the privacy requirements.
K
Okay,
so
the
questions
I
have
for
the
group-
basically
I
first
is-
is
the
validation
problem.
Clear
in
particular,
is
as
like.
The
current
issue
in
the
draft
I
have
described
it.
K
Is
it
clear
and
understandable
to
folks
is
the
sort
of
informal
privacy
requirement
also
clear,
and
if
so,
how
do
we
want
to
sort
of
augment
the
protocol,
if
at
all,
to
accommodate
queries
to
or
allow
people
to
query
on
the
basis
of
time,
as
well
as
potentially
on
the
basis
of
space
or,
like
you
might
imagine,
just
simply
relying
on
the
fact
that
the
aggregation
protocol
itself
will
always
yield
or
the
output
of
the
aggregation
protocol
will
always
ensure
that
both
aggregators
agree
on
the
same
the
same
reports
that
went
into
a
particular
aggregate,
so
you
may
not
need
to
specify
in
in
full
detail,
for
example,
how
how
collection
requests
are.
K
Exactly
how
the
validation
criteria
is
is
to
be
enforced.
You
might
just
rely
on
the
fact
that
the
aggregate
protocol
sort
of
enforces
that,
for
you
and
there's
probably
other
questions
as
well.
Stephen.
I: Hi, Stephen Farrell. So, Chris, you talked about time and space; in Privacy Pass yesterday those were a different kind of space.

K: Yeah, I mean, I tend to think of it as just a bit string. You know, the bit string might encode the user agent, and the bit string might include geographic data, but I guess it depends on your definition of what space is. I think going back to the examples is probably the easiest way to think of space.

K: So, you know, imagine the space constraints being a fixed set of user agents, and each report being associated with a particular user agent or something.

K: That's right. I think that's a necessary requirement: if you want to allow people to query based on these additional parameters, then those parameters need to be visible to the people who enforce the validation criteria.

K: I think that's a good point, and I think this goes to how much query flexibility you want in the collect flow: whether or not you want to allow any of this additional data to be expressed and, as a result, be exposed to the aggregators. The utility of the collection depends on what information is available.
A: Thanks. In the interest of time, I think we should move to Tim's presentation.

N: All right, let's get into it. So bear with me: a lot of our material today was written under the assumption that folks have a lot of familiarity with the draft, so bear with me if a lot of this gets too technical or too far into the detail of the state of draft 01. Okay.
N: So we're going to talk now about the current authors' draft of the specification that we've been working on, as well as some work that a few of us have been doing that goes beyond that draft. Cool, all right.

N: So, in anticipation of this week's meeting, we submitted a new authors' draft of the PPM specification, draft 01, which contains quite a lot of changes relative to draft 00, which was submitted back at IETF 112 for the BoF. We're going to cover a number of those changes as we work through this deck. But first, let's talk about the status of this current draft. To briskly recap: this draft specifies PPM, which is a protocol framework for privately computing aggregate functions.
N: It's based on Prio, but we've generalized from there so that it can work with any instantiation of a verifiable distributed aggregation function (VDAF), which is a specification that's being discussed in the CFRG. PPM is designed to coordinate the execution of VDAFs across multiple non-colluding aggregators, and the sharing of inputs across both servers is how PPM deployments provide meaningful privacy to users.

N: So, as was explained to us earlier, this targets a variety of motivating use cases, ranging from simple statistics to the heavy hitters problem.
N: All right. We also covered earlier, very briskly, how PPM is composed of three different sub-protocols that execute simultaneously. The upload flow is for clients to secret-share their input and upload the shares to the aggregators.

N: The aggregation flow is where the leader and helper interact to verify the validity of inputs and then aggregate the reports and compute aggregate shares. And finally the collect flow, which Chris just gave us some insight into, is how the collector gets aggregate shares from the leader to produce final results.
N: So we know from some experience that draft 01 is almost fully implementable; at least the happy path of uploading reports and computing aggregates is implementable, and we feel it satisfies the key deliverables defined in the PPM working group's charter.

N: In particular, we have mechanisms for client submission of measurements, for the joint evaluation of proofs of validity of those measurements, and for the computation of aggregates to be delivered to some recipient, and all of these are defined in a way that makes them flexible enough to accommodate multiple underlying algorithms through the VDAF abstraction.
N: So we're going to discuss adoption of this draft later, I think, but I suppose I want to say at this stage that we think this draft is good enough to be adopted by the working group. Okay, more on that later, I suppose. Moving on: going past draft 01, we at ISRG and our colleagues on the Cloudflare Research team have been working on a set of proposed changes that we're calling the interoperability target.

N: Our goals here are to run a deployment of PPM using the Prio3-based VDAFs, since the Poplar1 VDAF isn't quite ready yet. It's also to let us hammer out a bunch of interesting protocol edge cases and some important error handling scenarios, and we're hoping to learn a lot about which parts of the specification are difficult, either to implement or operationally.
N: And what we learn can then be fed back into discussions in the working group as proposed changes to the specification. Okay, so let's look at where we're at today with development. The most substantial piece of progress we can report is that the specification of the Prio3 VDAFs has matured quite a bit. You can take a look at that in draft 01 of the VDAF document, which was presented to CFRG, I believe yesterday, by Chris Patton and his co-authors. We also have a complete implementation of that draft of the Prio3 VDAFs, along with test vectors for them, in libprio-rs, which is going to be used in both the Cloudflare and ISRG PPM implementations, and hopefully others in the future. Of course, you can find those documents and the implementation up on GitHub or on the Datatracker at these links.
N: We at ISRG also currently have a toy implementation of PPM at somewhere around draft 01. It's missing a number of important protocol features, and it's really just a toy that couldn't actually be deployed onto the internet. It can only talk to itself, not to any other implementations, and it has no persistence story, but it does demonstrate that the happy path of the protocol can be implemented end to end.

N: That also is up on GitHub if you're interested. And finally, both Cloudflare and ISRG are working on actual deployable implementations of the evolving interop target.
N
Okay,
now,
let's
turn
to
looking
at
like
what
what
proposed
changes
are
we
proposing
in
the
interoperable
protocol?
So
far?
So
let's
talk
about
the
aggregate
phase,
so,
as
we
discussed
earlier
right,
ppm
is
made
up
of
three
sub
protocols.
Upload
aggregate
and
collect
the
meat
of
the
complexity
lives
in
the
aggregate
flow,
and
it's
consists
of
the
coordinated
execution
of
a
vdaf
across
the
two
aggregators.
N
We
also
refer
to
this
frequently
as
preparing
inputs
and
what
that
means
is
taking
input,
shares
and
transforming
them
into
output
chairs
that
can
be
summed
into
aggregates.
So
what
that
means
depends
on
the
particular
vdaf.
It
could
just
mean
evaluating
the
the
validity
proof
or
in
some
other
vdfs
there
might
be
some
more
significant
transformation
all
right,
so
I
spelled
out
currently
in
draft
zero
one.
The
specification
lacks
sufficient
detail
to
really
be
implemented.
N
So
one
of
the
things
that
the
interop
target
does
is
to
be
more
detailed
about
how
to
detect
and
handle
disagreements
between
the
aggregators
we've
also
updated
ppm
to
use
the
current
verbs
and
message
types
are
defined
in
draft
zero.
One
of
the
vdif
specification.
N
So,
above
the
vdaf
level,
we
we
also
relative
to
draft
zero
zero.
Excuse
me:
relative
drop,
zero
one
have
eliminated
what
was
called
the
helper
state
blob.
So
in
back
and
draft
zero
zero,
we
had
the
goal
of
having
no
storage
requirements
for
helpers.
So
the
goal
there
was
to
foster
a
diversity
of
aggregator
operator
by
making
it
as
easy
as
possible
for
anyone
to
run
a
helper
with
minimal
infrastructure
requirements
or
operational
overhead,
but
aggregating
shares
into
vdaf
is
inevitably
a
stateful
process.
N
That's
because
the
coordinated
evaluation
of
the
proofs
is
a
multi-round
protocol
and
there's
a
state
that
carries
over
from
one
one
round
to
the
next,
the
vdfs
that
we
envisioned
currently
so
again,
that's
pre
the
different
303
based
ones
and
popular
one
are
all
two
round
protocols,
but
you
could
have
arbitrarily
many
rounds
and
bdafs
that
come
along
in
the
future.
N
Our
solution
back
and
direct
in
draft
zero
zero
drop
zero
one
was
to
shift
the
burden
of
storage
onto
the
leader
by
having
hold
on
to
the
helpers
encoded
state
across
the
sequence
of
aggregate
protocol
requests.
So
we
see
this
illustrated
in
the
sequence
diagram
over
here
on
the
right.
In
the
first
request,
the
leader
is
sending
to
the
helper
a
a
sequence
of
reports
of
encrypted
report
chairs,
along
with
some
other
parameters
needed
for
aggregation
upon
receipt.
N: The leader next combines the helper's message with its own and sends a sequence of combined prepare messages to the helper, that's one prepare message for each report, and it also sends to the helper the serialized state blob that it received previously. In this illustration, the second request from the leader happens to be sent to a different instance of the helper than the first time around; you might imagine that there are multiple replicas behind a load balancer.

N: It looks like this is fine, since all the state is in the serialized state blob. But of course there were several downsides to this design. The first is that you're going to spend extra bandwidth transmitting the state back and forth over multiple rounds of the protocol.
N
Further
its
contents
are
secret,
which
means
that
the
helper
implementation
has
to
be
responsible
for
encrypting
its
serialized
state
to
protect
it
from
the
leader
tampering
with
it
or
just
seeing
it.
And
of
course
we
have
to
stop
the
leader
from
replaying
old
states
into
the
helper,
so
anti-replay
means
the
helper
has
to
store
a
counter
or
something
like
that
to
prevent
state
rollbacks,
and
so
we've
already
failed
to
meet
our
goal
of
no
helper
storage.
N: We also have to prevent the leader from replaying a client report into the helper, to rule out attacks that would enable the leader to learn something about the client input. In draft 00 this was solved by defining a total ordering over report nonces and then requiring the leader to send nonces to the helper in ascending order.

N: The helper would then defend against replays by keeping track of the highest nonce it had ever seen and refusing any reports older than that. This doesn't work, though, if you have multiple helper instances working in parallel. In this illustration, the leader has carved up the work of aggregating the k reports that fall into some aggregation into three chunks. Each chunk meets the requirement of ascending nonces, but to meet the anti-replay requirement the helper instances have to share the highest-nonce counter, which is shown in this ambiguous cloud of storage on the right. So if helper 3 happens to do its work first and commits k as the highest nonce, then helpers 1 and 2 have to reject all the reports they get, which is obviously bad.
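A toy model of the draft-00 rule shows the failure mode: each chunk is internally ascending, but a shared highest-nonce watermark makes chunks that finish late look like replays:

```python
# Toy model of the draft-00 anti-replay rule: a single shared
# "highest nonce seen" watermark, rejecting anything not strictly newer.
class HighWatermark:
    def __init__(self):
        self.highest = -1

    def accept(self, nonce):
        if nonce <= self.highest:
            return False      # treated as a replay
        self.highest = nonce
        return True

shared = HighWatermark()
chunk1, chunk2, chunk3 = [0, 1, 2], [3, 4, 5], [6, 7, 8]

# Helper instance 3 happens to finish first and commits the highest nonce...
assert all(shared.accept(n) for n in chunk3)
# ...so instances 1 and 2 now reject every report in their chunks, even
# though none of them is actually a replay.
assert not any(shared.accept(n) for n in chunk1 + chunk2)
```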
N
So
in
draft
zero
one,
we
acknowledged
that
we
had
over
indexed
on
the
goal
of
lightweight
helpers
and
we
accepted
that
requiring
that
helpers
have
a
trusted
database
or
some
kind
of
trusted.
Storage
isn't
really
all
that
bad,
especially
since,
as
we
discussed
draft
zero,
zero
required
them
to
have
some
kind
of
storage
anyway.
N
They've
ever
seen
up
to
some
reasonable
data
retention
period
so
that
they
can
refuse
to
aggregate
a
report
if
they
know
they've
already
been
included
in
that
aggregation
helpers
also
have
to
keep
track,
of
which
batch
intervals
they've
serviced
a
collect
request
for
so
they
can
refuse
reports
new
reports
that
fall
into
those
intervals
to
mitigate
some
of
the
attacks
that
were
talked
about
earlier
by
chris
wood,
the
drop
zero
one
still
has
the
helper
state
blob
and
the
attended
anti-replay
counters.
N: Since we had already accepted non-trivial helper storage requirements, we decided in the interop target to take the next step and do away with the helper state blob altogether.

N: So instead we require helpers to store their own intermediate state. But to preserve the nice property that different rounds of the prepare protocol don't have to be serviced by the same helper instance, we introduced instead the concept of an aggregation job ID, which is assigned by the leader when it constructs aggregation requests and can be used later by a helper to look up the state associated with the preparation of a set of shares. Unlike the old helper state, the job IDs aren't secret and they don't require any replay attack mitigations.
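A sketch of the idea, with invented names: the helper keeps its own state table keyed by the leader-assigned job ID, so any replica with access to that storage can service the next round:

```python
# Sketch of helper-side state keyed by aggregation job ID. All names are
# invented for illustration; in practice the table would live in a shared
# database reachable by every helper replica.
helper_state_table = {}

def start_aggregation_job(job_id, report_shares):
    # Round 1: begin preparing the report shares and remember where we are.
    helper_state_table[job_id] = {"round": 1, "pending": report_shares}

def continue_aggregation_job(job_id, leader_prep_messages):
    # A later round: any replica looks the state back up by job ID. The ID
    # itself needs no secrecy and no anti-replay machinery.
    state = helper_state_table[job_id]
    state["round"] += 1
    # ... combine leader_prep_messages with state["pending"] here ...
    return state["round"]

start_aggregation_job("job-123", ["share-a", "share-b"])
continue_aggregation_job("job-123", ["prep-a", "prep-b"])
```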
N: Okay, in the interest of time I'm going to skip over this and jump ahead to the topic of gracefully recovering from input preparation failures. All right. This is a problem that we learned quite a lot about while operating the Exposure Notifications Private Analytics system over the last couple of years.
N: So first, let's recall some of the math about how aggregation over the secret shares works. Suppose we have n values, where each is sharded into one share for aggregator A and another for aggregator B, such that the two shares sum back up to the original value modulo some prime number p.

N: We compute aggregates by having each aggregator sum its sequence of shares; then the sum of those sums is congruent to the sum over the original inputs, again all modulo p. Okay, but for some large number of reports n, we expect that errors will occur, such that there will be cases where one aggregator happens to accept a different set of shares than the other.
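In symbols: each value v_i is split as v_i = (v_i_A + v_i_B) mod p, and the two per-aggregator sums recombine to the true total mod p. A runnable toy version:

```python
import secrets

p = 2**31 - 1               # a prime modulus (toy choice)
values = [3, 1, 4, 1, 5]    # the clients' true measurements

# Shard each value into two additive shares that sum to it mod p.
shares_a, shares_b = [], []
for v in values:
    a = secrets.randbelow(p)
    shares_a.append(a)
    shares_b.append((v - a) % p)

# Each aggregator sums only its own shares...
sum_a = sum(shares_a) % p
sum_b = sum(shares_b) % p

# ...and the two sums recombine to the aggregate of the original values.
assert (sum_a + sum_b) % p == sum(values) % p
```

The same arithmetic is why the two aggregators must agree exactly on which reports they summed, which is the failure mode discussed next.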
N
So,
if
10
shares
out
of
a
million
get
rejected,
we
still
would
like
to
be
able
to
aggregate
over
the
other
999
990
shares,
because
that's
still
a
lot
of
good
data,
but
we
have
to
make
sure
that
both
aggregators
are
using
the
same
set
of
shares
so
recall
that
for
each
share
v
sub,
I
a
that
is
to
say,
like
the
I
you
know,
report
with
the
iaf
report
and
the
share.
N
N
N
Like
say
you
know,
you're
measuring
something
like
how
many
times
did
a
user
click
a
button,
and
you
get
some
of
like
absurdly
huge
number,
so
in
the
scope
of
the
interrupt
target,
what
we
are
aiming
to
do
is
to
have
each
aggregator
include
some
data
in
aggregate
chairs
that
will
allow
us
to
detect
this
kind
of
problem
and
measure
how
bad
it
is.
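The talk doesn't say what that data is; one plausible shape, shown purely as an illustration, is a count plus an order-independent checksum over the nonces of the reports each aggregator included:

```python
import hashlib

# Hypothetical illustration only: a report count plus an XOR-combined
# hash over report nonces, which is independent of processing order.
def batch_fingerprint(nonces):
    checksum = 0
    for nonce in nonces:
        digest = hashlib.sha256(nonce.encode()).digest()
        checksum ^= int.from_bytes(digest, "big")
    return len(nonces), checksum

agg_a_included = ["n1", "n2", "n3"]
agg_b_included = ["n1", "n2"]        # aggregator B rejected one share

# Comparing the fingerprints attached to the two aggregate shares reveals
# the disagreement before a garbage aggregate is published.
if batch_fingerprint(agg_a_included) != batch_fingerprint(agg_b_included):
    print("aggregators disagree on the report set; do not publish")
```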
N: All right, so those are some of the highlights of what we've been batting around as we work towards an interoperability deployment. To reiterate, our goal here is to operate an experimental deployment and then come out of that with some learnings, some experience, and some data that will allow us to start some discussions in the working group and propose some changes to the protocol, and that hopefully will let us answer... oh, sorry, Benjamin.

N: No, no, finish your sentence. Okay, yeah, I'm just wrapping up the last slide. So the interesting questions that we're hoping people will discuss going forward, with what we learn, are these. For which interactions, and to what extent, should the PPM protocol specify authentication for requests and messages? Or, instead of doing that, maybe we should be specifying transport security requirements and letting deployments choose for themselves how to meet them. Also, there are some places where the protocol introduces shared secret parameters between actors, particularly between the aggregators.
N: How are we going to go about negotiating those, and particularly rotating them? And then, operationally, what is the life cycle of reports, or of the state associated with their processing? When is it acceptable for one of the participating servers to discard old data? And, to make that clear, might we need an explicit commit phase during the preparation protocol, such that both aggregators can have high confidence that they're aggregating over the same set of shares?

A: So if there are very brief clarifying questions, feel free to get in the queue, but otherwise let's bring up the STAR presentation. I believe Alex Davidson is presenting.
O: Yeah, okay, cool. So I'm Alex, and I'm going to be talking about STAR, which, as has been referenced previously, is an alternative protocol idea that could potentially fit into the PPM framework. So I'll just get straight to it. STAR is very similar to this Poplar1 approach that has been mentioned.

O: The idea is that we want to come up with a system that can find heavy-hitting, arbitrary data. Specifically, in building STAR we essentially want to provide k-anonymity for clients submitting arbitrary data. So the idea is that a number of clients would send data, and then, once you've received k reports all containing the same data point, the aggregating server will be able to reveal them.
O: The issue with Poplar, if we just focus on that, seeing as the functionality is quite similar, is that it's quite expensive to run, and also that having this aggregation process between multiple non-colluding servers during the aggregation phase was something that was quite difficult for us to get off the ground.

O: So STAR was an attempt to build a privacy-preserving system that allows you to reveal these heavy-hitting data points without having to run the aggregation collaboratively between non-colluding entities. Along the way, obviously, you want to preserve privacy, and there was also a goal of trying to use simple cryptographic primitives and techniques where possible, rather than introducing novel mechanisms for running this aggregation.
O: So the idea behind STAR, hopefully, is very simple. It's split into three phases. The first phase is the randomness phase, and the idea behind that phase is that we're trying to establish a scenario where different clients can, non-interactively, establish secret shares of the same value, using any old threshold secret sharing scheme, shares that you'll be able to combine together. So this randomness phase essentially allows the clients to establish correlated randomness depending on their measurement data point.
O: Later on, in the measurement phase, the clients sample secret shares of that measurement, it's a little bit more complicated than that, but roughly, and then send these measurements to the aggregation server. Then, in the aggregation phase, when the server receives k of these shares, where k is the threshold in the secret sharing scheme, it will be able to reveal the measurement. The randomness phase can be done in cooperation with a randomness server or, alternatively, locally, just derived from the measurement, except that that has known security issues, something that was already raised by previous protocol designs like Prochlo by Google.
O
One
of
the
nice
things
about
star
is:
you
can
include
like
additional
auxiliary
data
as
well
with
your
measurements,
but
the
the
threshold
itself
is
only
imposed
on
the
measurement
itself
and
then
I've
written
there's
this
notion
of
an
epoch
which
I'll
also
talk
about
which
is
defined
by
the
randomness
server
and
they're.
Randomly
serving
the
aggregations
say
that
here
are
non-colluding,
but
the
idea
is
that
they
don't
have
to
collaborate
in
the
aggregation
phase.
O: So what exactly is happening here? The idea is that you can sample randomness locally, using just a hash function defined over your measurement space. But that's only really going to work if you have high-entropy measurement distributions, and it's not clear whether such measurement distributions exist.

O: So one of the things that we're introducing with STAR is this remote way of running it, via an oblivious pseudorandom function (OPRF) controlled by the randomness server. Clients submit their measurements obliviously, and the server evaluates the oblivious pseudorandom function over them using a secret key which is tied to the epoch.

O: This randomness that the clients sample then goes into the message construction algorithm: the clients encrypt their data point along with any auxiliary data, deriving the encryption key from some portion of the randomness that they get. They secret-share the randomness that they used to derive the key, and that is the correlated aspect: you can construct shares consistently without the clients interacting. They also include a tag, which is important for security. The clients then send all these messages to the server; the server groups together all the messages with common tags, because these are deterministically derived from the measurement, and it can then recover the measurements themselves using the secret sharing recovery process. Cool.
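Here is a toy end-to-end sketch of the flow just described. It uses the hash-derived "local" randomness that the talk warns about (a real deployment would use the OPRF), a hash-derived polynomial for the k-out-of-n sharing, and XOR in place of real encryption; none of this is STAR's actual wire format:

```python
import hashlib, secrets

P = 2**127 - 1   # prime field for the toy Shamir sharing
K = 3            # threshold: measurement revealed once K reports arrive

def h(*parts):
    d = hashlib.sha256()
    for part in parts:
        d.update(part)
    return d.digest()

def make_message(measurement: bytes):
    # "Local" randomness derived by hashing the measurement (toy model;
    # the remote model gets r from the OPRF server instead).
    r = h(b"rand", measurement)[:15]    # 120 bits, fits in the field
    tag = h(b"tag", r)                  # deterministic tag for grouping
    key = h(b"key", r)                  # encryption key from part of r
    ct = bytes(a ^ b for a, b in zip(measurement.ljust(32, b"\0"), key))
    # Shamir-share r: coefficients derived deterministically from r, so
    # clients with the same measurement share one degree-(K-1) polynomial.
    coeffs = [int.from_bytes(r, "big")]          # secret in constant term
    coeffs += [int.from_bytes(h(b"coef", r, bytes([i])), "big") % P
               for i in range(1, K)]
    x = secrets.randbelow(P - 1) + 1    # each client picks a random point
    y = 0
    for c in reversed(coeffs):
        y = (y * x + c) % P
    return tag, (x, y), ct

def recover_r(points):
    # Lagrange interpolation at x = 0 recovers the shared value r.
    acc = 0
    for i, (xi, yi) in enumerate(points):
        num = den = 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * -xj % P
                den = den * (xi - xj) % P
        acc = (acc + yi * num * pow(den, -1, P)) % P
    return acc.to_bytes(15, "big")

# Once K clients report the same value, the server can decrypt it.
msgs = [make_message(b"example.com") for _ in range(K)]
assert len({t for t, _, _ in msgs}) == 1          # common tag groups them
r = recover_r([pt for _, pt, _ in msgs])
key = h(b"key", r)
plaintext = bytes(a ^ b for a, b in zip(msgs[0][2], key)).rstrip(b"\0")
assert plaintext == b"example.com"
```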
H: So I just have a clarifying question. The server here learns all the counts on tags in the clear?

O: Yes, that's correct. We're not protecting counts here.
O: Cool. So the security model that we use is obviously comparable with Poplar1, in that we need non-collusion of the randomness and aggregation servers for this to work. For clients, we consider a malicious adversarial server that may also control a subset of clients, to model this Sybil attack capability. And, as Mariana just pointed out, the sets of messages that encode the same measurement are leaked.

O: So even if you don't reveal the measurement itself, the deterministic tag leaks which subsets the messages belong to, and you can see that. But the goals here are confidentiality of measurements sent by at most k clients, and robustness of the aggregation. Cool.
O: So I just wanted to talk a little bit about the Sybil attack window for STAR, as this is the most damaging attack. Essentially, because this deterministic tag is present in the messages, it potentially allows the aggregation server to try to learn the message just by running an offline dictionary attack over the measurement space.

O: Obviously, with the local model, this is absolutely possible, and so one of the reasons that we use this remote model of sampling randomness is to shorten this attack window, and also to move the offline dictionary attack to something online that has to be carried out as queries to the randomness server. In doing so, we ensure that, as long as the client messages are sampled only in this window and then received afterwards, the aggregation server's attack capability is also limited to that window. Cool. Hello?
J: Sorry, can you go back a slide? I just wanted to check if I'm understanding the... sorry, one more slide. I'm just trying to wrap my head around this randomness server. The previous slide makes it look like the client is sending x, the secret value, directly to the randomness server itself.

O: Sorry, yeah, this diagram is not that helpful for explaining how the randomness server actually works. But okay, yeah: this OPRF is as defined in the oblivious pseudorandom functions draft that's coming out of the CFRG. So essentially the client's input is blinded beforehand; the server evaluates the function on the blinded value; and the client receives the response, unblinds it, and gets the real output value. Got it.
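For intuition, here is a toy discrete-log version of that blind/evaluate/unblind round trip. The parameters are deliberately tiny and insecure; the CFRG draft uses proper prime-order groups with real hash-to-group encodings, not anything like this:

```python
import hashlib, secrets

# Toy DH-OPRF in the subgroup of squares mod a safe prime.
p = 1019    # safe prime: p = 2q + 1
q = 509     # prime order of the subgroup of squares mod p
g = 4       # generator of that subgroup (a square mod p)

def hash_to_group(x: bytes) -> int:
    # Toy stand-in for a real hash-to-group map: hash into the exponent.
    e = int.from_bytes(hashlib.sha256(x).digest(), "big") % (q - 1) + 1
    return pow(g, e, p)

# Client: blind the measurement before sending it.
x = b"my measurement"
r = secrets.randbelow(q - 1) + 1
blinded = pow(hash_to_group(x), r, p)       # the server sees only this

# Server: evaluate with its epoch-scoped secret key k.
k = secrets.randbelow(q - 1) + 1
evaluated = pow(blinded, k, p)

# Client: unblind to obtain H(x)^k without ever revealing x.
unblinded = pow(evaluated, pow(r, -1, q), p)
assert unblinded == pow(hash_to_group(x), k, p)
output = hashlib.sha256(x + unblinded.to_bytes(2, "big")).digest()
```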
O: Okay, yeah. So, just as a quick comparison with Poplar1: in STAR, clients can send arbitrary auxiliary information with their data point, which may or may not be useful. As I mentioned, STAR's leakage reveals all the sets of messages that hide the same measurement, even if the threshold is not satisfied, which is obviously very important.

O: One of the things that Poplar1 does, and it may even be part of the functionality, is reveal all the heavy-hitting prefixes of strings, and for some of our use cases we only wanted to reveal the heavy-hitting string in its entirety, rather than the prefixes themselves.

O: And obviously STAR requires only a single aggregation server during the aggregation phase, which makes things a lot more cost effective, because you don't use as much bandwidth, and the computation itself is very minimal. In terms of how STAR could potentially fit into the PPM framework, or at least how we envision it:
O: the idea is that the leader and the collector in this diagram are, in STAR, the same entity, and there are just no helpers. Clients submit their reports via some mechanism, either OHAI or some other anonymizing proxy (it doesn't even have to be an anonymizing proxy, but using one massively improves privacy), to this entity, and the entity just performs the aggregation and learns the output.
O
So
we
we
we,
you
know,
as
I
mentioned
before,
star
is
kind
of
like
a
trade-off
between
trying
to
reduce
the
costs,
while
also
trying
to
maintain
some
like
meaningful
privacy
guarantees
so
and
also
not
having
like
noncluding
entities
work
together
to
perform
the
aggregation,
and
so
some
of
these
are
some
of
the
things
that
we're
trying
to
emphasize.
O
When
we
talk
about
the
advantages
of
star
and
also
for
functionality,
we
allow
auxiliary
data
which
may
or
may
not
be
useful
and
simple
cryptography
in
the
sense
that
we
don't
have
to
implement
quite
complex
new
protocols
in
order
to
build
the
aggregation
process,
and
so
just
to
conclude,
yeah.
O
We
think
star
provides
kind
of
some,
for
at
least
for
us
like
provides
like
a
private,
preserving
reporting
mechanism,
for
you
know,
entities
with
limited
resources
and,
without
you
know,
expert
implementation,
knowledge,
and
we
think
that
some
of
the
trust
assumptions
are
preferable
to
those
made
by
either
prior
or
popular,
and
so
just
to
finish.
O
Up
like
some
of
the
questions
we
wanted
to
ask
is
whether
the
working
group
is
kind
of
interested
in
this
as
a
alternative
protocol
spec,
and
if
it,
if
it
was,
would
this
star
draft
fit
into
like
the
working
group
and
also
the
ppm
kind
of
specification.
A: Thanks. Okay.

A: I have two things I wanted to say, the first not as chair. Okay, let me step back one more step. We have 12 minutes by my clock before the end of the session, and we have a lot of things that people want to talk about, so I'll try to be fast.

A: Could we represent that as one helper, and represent the aggregator as the other helper, and lay it out as a VDAF protocol, even if it's not exactly natural?
O: I think if there was some way of building extra functionality into the OHAI proxy, then maybe there would be some way of interacting with that in order to create some information and then sending the reports through to the leader, which could also be a single entity in the case of STAR, perhaps. But yeah.
A
Okay
and
secondly,
as
chair
I've,
I've
heard
several
comments
about
draft
adoption,
especially
for
the
priv
ppm
draft,
so
I'd
appreciate,
hearing
from
star
authors
and
other
people.
If
people
could
comment
on
whether
they
think
the
priv
ppm
draft
is
ready
for
working
group
adoption
thanks.
F: Alex, you mentioned the differences in trust assumptions. Do you need the randomness server and the aggregation server to not collude?

O: Yeah, so you require both of those entities to not collude. Okay.

O: Right, it's comparable, but because they don't have to collaborate during the aggregation phase, it's easier to split out the functionality. One of the hopes with OPRF-based functionality is that we would have entities running OPRFs as a service, and at that point it's easier, from a practical perspective, to argue that an application server and an entity that just runs an OPRF as a service are not colluding.
H: I want to make a comment that I'm not sure I'm buying that. There are two different ways to view non-collusion. For me, if there is a non-collusion assumption, there is a non-collusion assumption in either protocol, so I would challenge the statement that these are different trust models.
A
Thanks
mariana,
it
seems
like
the
something
strange
happened
with
the
queue
a
bunch
of
people
were
in
the
queue
and
then
got
out
and
back
in.
A
Eric
I
saw
you
at
the
top
of
the
cube
before.
If
you
want
to.
D: Well, I want to clarify, to make sure I understand your question, because you asked whether we thought the priv-ppm document was ready for acceptance, but then people were still talking about STAR. So do you want to talk about both, or what? Just to be clear.
A: It sounds to me, as chair, like the authors of the priv-ppm draft largely feel that it's ready for adoption, although I haven't heard everybody comment on that. I wanted to also ask the STAR draft authors, and everybody else in the working group, what they thought of adopting the priv-ppm draft, because, of course, we adopt by rough consensus.
D: So, since I am here, I will say that the priv-ppm draft is ready for adoption. I think that people were asking in the chat whether or not these were mutually exclusive. They're not; they're complementary. Each is good for some tasks. I don't think we have to choose: if we adopt priv-ppm, we can still adopt STAR later.

D: It's really clear that they're not subject to each other entirely, so I think they don't preclude each other.
J: Sorry, I've lost my train of thought while trying to describe it, so I will pass.

L: I have to wait for the wonderful Meetecho delay. So I sort of came into this thinking that the group would be adopting something; I think we'd be foolish not to do something that fits within our charter.
L: I don't see any alternatives for the sort of things that are being achieved by the Prio work, and I'd like to see a draft for that. I'm a little less clear on the practical aspects of the Poplar stuff, but I'm happy to take the PPM draft where it is right now. I'm not quite sure about the applicability of the STAR stuff just yet; there's a bunch of usage constraints there that need to be better understood.

L: I think that's also true to some extent for the Prio work as well. There are a number of questions that came up through the presentations which, I have to say, left me less confident rather than more confident at the end of them, which is not usually how you want these things to go. But overall I think we should be taking something on at some point; let's do it through a call for adoption on the list after this.
P: So, I guess it was remarked that STAR and priv, or Prio, I'm not sure what we're calling it, are compatible, and we can potentially do both. But looking at the charter, the charter says that we will deliver one or more protocols which can accommodate multiple PPM algorithms. So I wasn't really sure: if we did both STAR and Prio, would those be different protocols, or would those be different algorithms?
D: Do you want me to jump in? I think they'd be different protocols. That was the question that was being asked earlier, I think, about whether or not you could cram, you know... I mean, the priv framework was designed to accommodate multiple, as Patton was saying, VDAFs, which is the specific term it's using for algorithms, but I don't think it's very practical to cram STAR in as a VDAF.
K: I wanted to note the distinction between PPM, the framework that drives the VDAF execution, and the VDAF specification itself, which is happening in the CFRG, wherein the Poplar- and Prio-specific bits are being standardized. The question is whether or not PPM, as a wrapper around VDAFs, is an appropriate, you know, a good place to start. All of the complexities that come with Poplar and whatnot, I think, or most of the complexities that come with Poplar, can be relatively isolated and constrained within the context of the VDAF draft.

K: That said, there are certain aspects of the VDAF execution that bubble up into PPM, like how the collector needs to drive a collection, and there are multiple iterations of that. But in general I wanted to make that split: this is the protocol engineering work, and the CFRG VDAF stuff is more the crypto-specific bits.
J: So, I remembered what my question was, sorry. This is a question about STAR. It was unclear to me what the specific collusion risk is between the randomness server and the aggregator in STAR, and I wondered if anybody from the STAR team, or Alex particularly, wants to try to summarize what you think the collusion risk is there.

O: Yeah, sure. Essentially, if the randomness server and the aggregation server collude, then the randomness server can reveal the OPRF secret key to the aggregation server, and that moves the online attack back to being a local offline dictionary attack against the measurement space.
Q: David Gennazi, Google. I have to admit I feel a bit more confused now than I did two hours ago, but I'm...

Q: Okay, let me get the mic. I have to admit that I feel more confused now than I was two hours ago, but I'm going to attribute that to it being Friday and to my not having slept enough this week. I just wanted to say that this is a great starting point for the working group, and so I hopefully support adopting the PPM document.

A: Okay, Mariana: we tried to close the queue, but what I'll ask is that everybody who's read the draft, could you please comment in Jabber whether you've read the PPM draft, so we have some idea of how many people have read it. And maybe, while that's going on, Mariana can chime in.
H
I
guess
I
just
wanted
to
make
the
point
that
we
actually
implemented
both
prior
and
popular
and
in
the
question
of
whether
they
they
are
really
fitting
together
in
a
single
framework.
I
would
say:
yes,
I
think
there
are
like
traders
between
communication
and
computation,
but
the
main
challenges
that
we
kind
of
see
kind
of
seen
across
all
of
the
previous
presentations
are
common
across
this
protocol.
So
I
would
strongly
say
that
they
do
fit
in
the
same.
A
Okay,
thanks
to
everybody
for
for
telling
us
that
that
you've
reviewed
the
draft,
it's
great
to
see
that
there's
been
been
so
much
review.
Unfortunately,
we
don't
have
time
to
run
a
live,
a
live
poll.
I
think
it's
too
late
for
a
hum.
We
are
officially
out
of
time,
but
but
we
will,
I
think,
run
a
call
for
adoption
on
the
mailing
list.
A
I
want
to
make
sure
to
thank
dkg
for
volunteering
ahead
of
time
to
to
take
to
take
notes.
That
was
incredibly
incredibly
helpful.
It
saved
us
a
bunch
of
time
and
made
time
for
all
of
this
great
conversation,
roman.
G: Roman here. I just wanted to jump in at the end, because I should have done it in the beginning. I wanted to thank you, Ben, and Sam for stepping up to serve as chairs, and to convey how exciting it is to have a working group. I also wanted to thank, as I did in SAAG, the proponents of the working group for really helping us get from a BoF to a working group by the next meeting, which often doesn't happen, and that all comes through hard work and preparation.

A: Yes, thanks again to Joe for being on site and making sure everything ran smoothly today. Okay, I think that this session is over. IETF 113 is over. Thanks, everybody, for coming, and see you on the list.