From YouTube: IETF108 PEARG 20200727 1100
Description: PEARG session at IETF 108, 2020/07/27 11:00
A: Okay, well, welcome everybody to the Privacy Enhancements and Assessments Research Group meeting for IETF 108, our first online meeting using the new Meetecho tool. I'm Sara Dickinson; also chairing today are Shivan Sahib and Chris Wood, as per usual. This session is subject to the Note Well, so please read and digest that. Seeing as we are using Meetecho for the first time, please do bear with us if you run into any technical hiccups along the way.

We have a lot of short presentations today, so our intention is to hold off taking questions until the end of each of the presentations. And even though it's a remote meeting and we have some identification, please still continue to identify yourself as you start speaking, just to help all the other participants.
As with other meetings, there is the Meetecho chat and the Jabber room, and we will still be taking questions there in case you have problems with your audio or you just prefer to do it that way. If you do, please prepend your comment with MIC if you actually want it brought into the queue. Shivan is going to be the Jabber scribe today; he's going to watch the queue and bring in any questions there that arise via the chat. Okay, so our agenda today; I hope I've covered most of the administrative stuff.
Okay, I'm not seeing any comments on the agenda, in which case I will switch over to the first presentation today. That is being given by Stephen Farrell, and it's on the topic of testing apps for COVID-19 tracing. So Stephen, these are your slides; just say "next slide" when you need me to progress them. And let me, right, turn my video off.
C: Great, thanks. I'll flick through these slides quickly; happy to take questions later. So this is work done by myself and a colleague, Doug Leith, and to be honest Doug did most of the smarter stuff here, not me. Next slide. So the background here is: we clearly have a pandemic in Ireland.
The app we'll talk about is the Irish government's; the Health Service Executive is essentially the public health authority. Generally they're perceived locally to have done a good job, and that kind of impacts what happens when you try to deploy these kinds of applications for COVID tracing. This all kind of started with a March 2020 paper, referenced here, that basically asserted that a mobile phone app could make this all better.
C
It
would
be
nice
if
that
was
true,
we're
not
sure
it's
true,
nor
what
the
security
consequences
are.
So
basically,
we
wanted
to
set
up
a
little
project
to
examine
that
and
our
project
is
basically
taking
the
approach
of
not
espousing
any
particular
method
for
tracing,
but
rather
just
to
be
testing.
What's
been
done.
So,
although
we're
talking
today
about
the
google
apple
api,
we
started
out
by
looking
at
some
of
the
singapore
work
that
predated
that,
Worth noting, maybe, is that as well as the March 2020 paper, way back in DTN times, about 2011, there was a FluPhone proposal that came out of Jon Crowcroft's group. So you know, the idea of using phones and Bluetooth for this kind of thing is not that new, although this is obviously the first time we've tried it. Next slide.
C
So
the
next
slide
is
about
some
caveats
so
yeah.
This
is
a
fast
moving
area
and
google
and
apple
have
released
some
some
additional
snippets
of
the
way
private
is
described
at
least
of
code.
I
haven't
really
looked
at
them
yet,
but
a
lot
of
the
things
a
lot
of
what's
going
on
here
is
changing
rapidly
in
a
matter
of
weeks,
so
again
take
the
results
and
the
slides
and
and
so
on
here,
as
a
snapshot
and
again,
we
have
to
be
understanding
about
lots
of
lots
of
decisions.
There are two parts here. One part is the app code, which is developed by a public health authority (for example the Irish Health Service Executive), and the second half is the exposure notification API implementation, which is from Google or Apple on their development platforms. Basically, the app handles the interaction between the handset and the public health authority servers in a particular region, and the API implementation, somewhere within the operating system, handles managing Bluetooth beacons.
C
So
how
does
it
work?
The
handsets
basically
generate
a
new
symmetric,
temporary
exposure
key
every
day.
They,
you
know
every
250
milliseconds.
They
send
out
a
bluetooth
beacon
which
has
a
rolling
proximity
identifier
value,
that's
essentially
a
value
derived
from
the
daily
key
that
changes
every
ten
minutes
and
the
the
crypto
derivation's
there.
For
that
part,
looks
fine.
The
beacons
also
include
this
rolling
value
that
changes
every
10
minutes
as
the
bluetooth
mac
address
changes.
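For context, the derivation being described can be sketched as follows. This is simplified from the published Google/Apple Exposure Notification cryptography specification, so treat the exact constants and layout as illustrative rather than normative.

```python
# Sketch of the daily-key / rolling-identifier derivation (simplified
# from the published Exposure Notification crypto spec; illustrative).
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def new_daily_tek() -> bytes:
    # a fresh 16-byte Temporary Exposure Key is generated each day
    return os.urandom(16)

def rpi_for_interval(tek: bytes, interval: int) -> bytes:
    # RPIK = HKDF(TEK, info="EN-RPIK"); RPI = AES-128(RPIK, padded interval)
    rpik = HKDF(algorithm=hashes.SHA256(), length=16, salt=None,
                info=b"EN-RPIK").derive(tek)
    padded = b"EN-RPI" + b"\x00" * 6 + interval.to_bytes(4, "little")
    enc = Cipher(algorithms.AES(rpik), modes.ECB()).encryptor()
    return enc.update(padded) + enc.finalize()

tek = new_daily_tek()
# the interval number ticks every 10 minutes, so the RPI rotates with it
print(rpi_for_interval(tek, interval=2_660_000).hex())
```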
The beacon also includes an encrypted transmit power, which is helpful for trying to figure out approximate distance, and although that's just using counter mode rather than authenticated encryption, that seems okay. If it were the case that any random app on the device could spot the outbound beacon value, manipulate it, and send another beacon with different transmit powers, that might be a problem; but apparently that's not possible, at least on Android.
The handsets listen: although beacons are sent four times a second, the handsets only listen for beacons about four seconds out of every four minutes, and I presume that's battery saving. Then they record the beacon value, which includes the RPI and encrypted transmit power, plus the received signal strength indicator (RSSI) that the inbound beacons had. The idea, basically, is everybody's running these apps.
C
If
a
person
tests
positive,
then
their
local
health
authority,
the
medics
who
tell
them
that
they're
tested
positive
will
give
them
some
kind
of
authorization
code
which
allows
them
to
upload
the
the
temporary
exposure
keys,
they've
used
for
the
last
two
weeks
or
so
to
the
public
health
authority,
who
then
publish
those
every
other
handset,
so
bob
in
this
case
downloads
them
every
few
hours
or
so,
and
they
can
compare
against
the
stored
beacons
and
determine
if
there
was
a
proximity
event.
So
next
slide
now.
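That matching step can be pictured with a small follow-on sketch that reuses rpi_for_interval from the earlier snippet; the 144 ten-minute intervals per day come from the spec, while the data shapes here are invented for illustration.

```python
# Hypothetical sketch of client-side matching of published TEKs against
# locally stored beacon RPIs (reuses rpi_for_interval from above).
def rpis_for_day(tek: bytes, day_start_interval: int) -> set:
    # a TEK covers 144 ten-minute intervals, i.e. one day
    return {rpi_for_interval(tek, day_start_interval + i) for i in range(144)}

def had_proximity_event(published, stored_rpis: set) -> bool:
    # published: iterable of (tek, day_start_interval) from the authority
    return any(rpis_for_day(tek, start) & stored_rpis
               for tek, start in published)
```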
The stated goal of these apps, at least in Ireland, is essentially very similar to the advice that the medics give about when you might have been in contact, which seems to be used by the manual contact tracers. So the stated goal of all of these apps is to detect whether two handsets were within two meters for more than 15 minutes, based essentially on looking at the signal strength and matching the RPIs and TEKs and so on.
I don't believe that can be reliably done, as we'll see, but the details of how to do it we'll skip through quickly. Nonetheless, there's essentially a mix of the transmit power and RSSI to derive an attenuation figure. The API implementation doesn't directly say whether there was a proximity event or not; what it does is give back attenuation duration values, which are described in that complicated bullet there. The thing to note here is that the health service, or health authority, app sets some thresholds and gives those to the API.
The API does the matching of the TEKs and RPIs, and then, if there is a match, returns attenuation duration values: essentially, in three buckets of nearer, in between, or further away, how the measurements compare to the thresholds given. Then it's up to the app code, not the API implementation, to decide if the user should be notified and advised to isolate and get tested, and so on. So, next slide.
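A rough picture of that threshold-and-bucket flow, with invented numbers and an invented app-side rule (the real API returns the buckets; everything else here is an assumption for illustration):

```python
# Illustrative only: derive per-beacon attenuation, bucket durations by
# two app-supplied thresholds, and apply an app-chosen notification rule.
def attenuation_db(tx_power_dbm: int, rssi_dbm: int) -> int:
    return tx_power_dbm - rssi_dbm   # larger value suggests further away

def duration_buckets(attenuations, low_db, high_db, secs_per_sample=60):
    near = mid = far = 0
    for a in attenuations:
        if a < low_db:
            near += secs_per_sample
        elif a < high_db:
            mid += secs_per_sample
        else:
            far += secs_per_sample
    return near, mid, far

near, mid, far = duration_buckets([48, 52, 55, 60, 71], low_db=50, high_db=64)
notify = (near + 0.5 * mid) >= 15 * 60   # weighting is the app's choice
```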
There are some governance issues with this that are worth noting. Essentially, Google and Apple are in total control here. There are some reasonable justifications for that, because you know we don't want 200 different schemes, and because of the realities of the mobile ecosystem you absolutely need the people who control that ecosystem involved, and so on.
However, given that Google and Apple are in control, one of the things that Google in particular have to do is improve calibration across the tens of thousands of handset types. What that means is that, because we have the thresholds set by the health authorities and the matching done by the API implementation, as Google ship new updates to their API implementation, it affects the results, as to whether you get notified or not, in a way that's outside the control of the health authorities, which kind of seems like a weird design. So revisiting that governance in some way (and I have no idea how you actually improve it) seems like it might be worthwhile. Next slide.
Lastly on the background, before we come to some sort of results: for the actual system itself, you have to think about how you would measure this. Mostly in the media you see numbers of downloads, or this fake claim that you need 60% uptake, which is not true.
The actual questions that would be interesting to ask are: how many people who wouldn't otherwise have been found by manual contact tracers get notified? How much quicker are those that would be found notified because of the apps? How many people turn out to test positive versus a similar sample of the population at that time? It's through those kinds of metrics that we might be able to figure out whether this works.
One of the things we discovered early on, looking at the Google system and also the Singapore system and the proposed NHS system that was being deployed, was that with a lot of these things dealing with beacons, you have replay attacks; they're fairly obvious.
We looked at statistics for how many people tended to be in a hospital emergency department and for how long, and basically concluded that if you mounted this replay attack in that context, essentially by picking up the beacons at a place like a COVID testing station, where you expect to see some positive ones, and spreading those beacons in an emergency department, you had an amplification factor of four. And four is very conservative; it could be much higher, and those could be healthcare workers. So the attack is obvious and hasn't been mitigated at all.

As far as I know it hasn't yet happened, luckily; I can't imagine that will continue for too long. Next slide.
So measuring deployments is the thing we're also starting to do. Basically, because these sets of temporary exposure keys are public, we can download and count those, and so we're currently doing that for a bunch of countries.
It might allow us to say that in some cases they seem not to work. Even so far, after a few weeks, it does seem that there are some mismatches between the number of cases being declared positive for certain countries, versus the percentage of the population who have installed these apps, versus the number of TEKs that are being uploaded. So there do seem to be some discrepancies, but it's too early to say; maybe after another month or so we might be able to come up with some interesting results based on this.
This is the next slide. So, like I said, the measurements themselves can tell us if it works. One question to ask is: does this proximity stuff work, in fact? So we did a bunch of tests: we had six different handset types, so we did pairwise tests, giving a matrix of 36.
We could only fill in 33 of the cells, because we didn't have two handsets for every type, and we tested at about one meter apart for more than 30 minutes, which, you would imagine, should never produce a false negative. But it can do. We did the initial tests in early June, and at the bottom there you can see that you can get a bunch of false negatives.
I'll explain that a little bit more in a second. In mid June, Google started to look at the calibration issue, which is the cause of these false negatives: essentially the different device types, when you look at them pairwise, are not calibrated the same, and you get weird answers.
So what we did in our early June and later tests was apply the thresholds. You remember I mentioned that the apps give the API two thresholds and get back three buckets; we applied that to these tests, and you can see that in early June the number of false negatives was outrageous.
In the later tests, if you look at the middle column there with the zero added noise, it's getting better, for sure. And if you subtracted, that is, had stronger signal or less attenuation by 10 decibels, it would look very good. However, if you add 10 decibels of noise to this pretty ideal case of one meter apart, free from obstruction (I'll talk about noise in a second), then we still get a lot of false negatives, even at one meter apart. So, next slide.
So how might we get some noise? One way, which I just covered in this particular tech report that was released, is just due to orientation: because, I guess, handsets are packaged in entirely different ways, the transmit powers are different.
We can get noise basically purely due to orientation, and I don't know how to model that correctly across all the possible handset types; and there are other types of noise you also have to model. Even with this slide it's hard to see: you can see the diagram on the left there (I can use my mouse, but I guess you can't see it), but there's a circular diagram showing different angles, and then to the left of that, yeah.
Thank you. So that shows you the average kind of attenuation values you get at the different angles; you can see there's about 20 decibels of difference. On the right you get to enjoy a cat video, if you like: that's the environment where I did all these pairwise one-meter tests. Just to see how orientation affected it, what I did was take a handset and put it on a turntable, a record player, and let it rotate at 33 rpm.
The graph there shows you the raw RSSI values. The left-hand side of the graph, which is mostly red, shows you the stationary five minutes; then, on the green side, you have the thing rotating, and what you can see is that you get about 10 decibels of difference in some cases, but it spreads out.
So that's kind of artificial. We also did a bunch of tests walking, cycling, and sitting on a park bench, which was interesting: if you have phones in your back pocket and you sit on a park bench, the signal essentially goes to zero. But two of them are really more noteworthy, because these are scenarios where you would imagine these apps, if they worked, would actually help with real contact tracing, and that's on a bus or on a tram.
Now, unfortunately, when you add the kind of metallic tube surroundings, that affects the BLE distance estimation kind of badly. We documented that for the bus; the two diagrams here for the tram show you, on the left-hand side, the kind of layout of the seating.
We had people sitting with phones in hand, trying to do something with the phone as people normally would on public transport, and we had a bunch of people do that for a quarter of an hour; then we shifted positions, with the same people holding the same handsets, and did that a few times. Overall we managed to derive a bunch of values for whether you would or would not be notified in those scenarios, and the false positives.
The other question we looked at, then, is what traffic is being sent, though only on Android; we don't have setups for exploring Apple at this point. So we looked at the Android implementation of the API provided by Google. It's part of Google Play Services, so if you disable Google Play Services the apps just won't work. So we managed to man-in-the-middle them, which is a horrible thing to do, but we did it, and captured the traffic traces from the apps and from Google Play Services. Overall the apps behave pretty well, but Google Play Services less so. The report is linked there. Next slide.
So we did about seven or eight apps. To take the Irish one as an example, because it's closer to home: in all the tests we did, we just did the onboarding and downloading-TEKs steps; we didn't ever try to pretend to be infected and upload keys, because that could affect something. The Irish app, I mean, is pretty clean, except there's a use of an Authorization HTTP header when you're downloading these TEKs, which is pretty much unnecessary, and nobody else does that.
But it's kind of, you know, standard mobile app development practice; I think that's why they've done it. And they have a data protection impact assessment that says they don't store any of that kind of information. So it's not great for the protocol, but in practice it's probably not real damage.
One thing they do, in the Irish case, is allow uploading of metrics, which is kind of a mix of DevOps data and medical information, including whether, or how many times, the user has been notified by the app that they may have been in proximity. We kind of recommend keeping those in separate security contexts, because they're kind of different. And then there were some paths not taken in the code, with calls to Google Firebase.
So basically, if you look at all these apps, generalizing across Europe: Germany and Switzerland are pretty good, and Ireland would be comparable, and some others had some less desirable features or weren't open source or whatever, but none of them are really, really terrible. However, on the next slide, we were interested in what Google Play Services is doing, so we turned off everything possible, including usage and diagnostics: if you click along enough, there's a way of turning off usage and diagnostics in Google Play Services.
We turned off every other Google thing, went to the Google dashboard to make sure there was nothing showing there, and basically tried to find out: what's the very minimal configuration of a handset that can still run these Google/Apple exposure notification apps? Even in that configuration, about every six hours Google Play Services calls home with essentially all of the long-lived, unchangeable identifiers.
You could imagine them all bound together, so that even if you give your handset to your child and change whatever you can change, and do factory resets, well, some of these identifiers will live on and provide a chain of connectivity. And then, in addition, every 20 minutes there's another call to an API that is linked to the check-in.
There's lots of additional terminology in those messages (they're binary; we don't understand them all), and what's notable here is that there is no published data protection impact assessment for the Google Play Services implementation of the exposure notification API, nor for Google Play Services itself. All of the apps have done that, because they're for populations and so they're governments. So that seems like a pretty bad mismatch to us, and Google Play Services, of course, is closed source. And yeah.
Sorry, this is the last slide. Yeah, these apps have been deployed; the replay attack exists; I don't think the stated goal of the Bluetooth proximity is really the correct one. If you care about Google tracking you and you want to run these apps, you have a new problem, I think. And we don't have information on Apple. You know, things could be improved with additional documentation, maybe some kind of quiet mode, and perhaps revisiting some of the governance kinds of issues.
The next slides are just some resources. The only notable thing is that we have some code that we haven't quite released, because it would make mounting this replay attack much easier, but we're happy to share it with anybody who's a researcher. And that's it.
A: Great, thank you very much indeed, Steve. I think we just have time for one question, so I'm going to ask Bruno to accept your audio and your video, if you want to go ahead.
E: Did you manage to file a bug with Play Services? Do you want me to forward this to them?
C: So, before we published this last step, we shared it with some contacts in Google and in some of the health authorities. We didn't formally file a bug. I'm not, yeah, I don't know: is it a bug that they're spying on us all the time, deliberately, and have been for years?
I mean, that's the kind of question you'd have to wonder about. Now, that's a pejorative description of it, but I'm not sure that Google would consider it actually a bug, because I think they consider that they just added the exposure notification to Play Services, and Play Services already does that for what they consider reasonable ecosystem reasons. So I suspect they wouldn't agree it's a bug, but I think it's not a good feature. But okay.
F: Yep, so I'm going to be talking about Trust Token, which is a new API we're working on based on Privacy Pass and a crypto construction called PMB tokens, which is a token construction that allows attaching private metadata. Next slide; next slide, that's just an outline. So the problem we're trying to solve is protecting against things like denial-of-service attacks, bots and spam, in a way that is not painful for legitimate users, where they have to solve some challenge or do something every single time they want to perform an action, but also without relying on a lot of these cross-site tracking and fingerprinting vectors that are currently used in the industry. Next slide.
The existing way of doing this is, for the most part: you visit a website, and the website decides that you need to prove some sort of trust, some sort of proof that you're a legitimate user. Smaller websites use some sort of trust provider, either a CAPTCHA or another form of bot-guard protection.
That provider sends you a request to do some amount of work: some proof of work, some challenge. You solve it, you return it, and then you get a verification saying that, yes, you have successfully proved, to some extent, that you're a real person, either via solving that challenge or via any state the CAPTCHA provider might have on you from before. This works great the first time, but if you have to keep on doing these challenges every time you visit the site, it adds a lot of strain on the user. Next slide.
So a lot of these trust providers, after you complete a challenge, actually store some third-party state, either in your cookies or local storage, that they can use next time to see that, oh, you've already passed a challenge, you've already proven your identity, and they can use that to either give you a simpler challenge or just give you a token right away that says you're good to go. You don't have to do the complicated figuring out of whether this image is a hot dog or a cat or a cow.
You really just need a single bit of information that carries whether or not you are trusted. Next slide. But you also want this single bit of information to be something that the trust provider knows it generated and that wasn't forged or copied or duplicated. Just doing a raw signature would give that part of the requirements, but it does mean that the issuer, the trust provider, can now track you between when you were first there and when you redeem.
Most of these properties we get through a primitive called Privacy Pass, which is part of another working group effort in the IETF, and which provides basically everything except the private metadata, which we'll get into why this is useful later. Privacy Pass by itself pretty much solves most of this problem.
The problem turns out to be, next slide, that we don't only want to be providing "yes, this user is trustworthy", because if we just provide that, it's a very easy signal for attackers: I go attempt my new attack against this detection algorithm, this CAPTCHA, and see if I get a "yes" signal; if so, that's a very immediate response, because if I got a token, that means I've bypassed this identity provider. And in the current world, the third-party state and the challenge
verification are actually more of a continuous signal between "I am trusted" and "I am untrustworthy". So currently most trust providers don't immediately provide this feedback of "yes, you are trustworthy, you've passed all my checks", and we want to maintain this somewhat-trustworthy-or-not-trustworthy signal and not have it be a single state. Otherwise, a month down the line, people will have bypassed all the new trust systems that use just the single bit of information. So there's a new property we really want.
One way of implementing this: Privacy Pass uses a zero-knowledge proof proving that the issuer signed this thing with this key, and the signature is formed in such a way that the client can blind it, so that in the future the issuer knows it was signed but can't just use a lookup table from the signature that it generated to the signature of the final token.
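The blinding idea can be shown with a toy sketch. Note that the real protocol is an elliptic-curve VOPRF with a DLEQ proof; this version uses exponentiation mod a small Mersenne prime purely so it runs, and is in no way secure.

```python
# Toy blind-issuance sketch in the spirit of Privacy Pass (not secure,
# not the real curve-based VOPRF; parameters chosen only to be runnable).
import math
import secrets

P = 2**127 - 1            # a Mersenne prime, toy-sized
N = P - 1                 # order of the multiplicative group mod P

def rand_exponent() -> int:
    # choose an exponent invertible mod N so the blinding can be undone
    while True:
        r = secrets.randbelow(N - 2) + 2
        if math.gcd(r, N) == 1:
            return r

issuer_key = rand_exponent()               # issuer's long-term secret

token = secrets.randbelow(P - 2) + 2       # client's random token element
r = rand_exponent()
blinded = pow(token, r, P)                 # issuer never sees `token`
signed_blinded = pow(blinded, issuer_key, P)
signature = pow(signed_blinded, pow(r, -1, N), P)   # client unblinds

# At redemption the issuer can recompute token**key, but cannot link the
# (token, signature) pair back to the blinded value it saw at issuance.
assert signature == pow(token, issuer_key, P)
```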
The second thing we attempted, next slide, is to use this new PMB tokens construction, which is a slightly more complicated construction that uses the Privacy Pass structure, where you have one key that you sign a token with and prove that you signed with that key, but you also attach another signature that is signed with one of two keys, along with an OR proof proving that you signed with one of those two keys.
So it's basically the previous attempt with this extra key signature from Privacy Pass. The thing this protects against is: now we can verify whether this token is valid at all, and as long as the issuer verifies that first, and only then figures out whether the token is a "true" token or a "false" token, it's able to prevent the attacks that require altering the token slightly to compare token values.
The actual process is: we check one of the signatures, and only then check the other signature. If you leak the fact that the second signature is invalid, then the issuer leaks some information about token comparison. Next slide.
The protocol we're using is this three-step zero-knowledge proof from PMB tokens, using an optimization called batching, which lets you combine a lot of different DLEQ proofs together, so that an issuer can issue a bunch of tokens in one batch instead of having to generate a new zero-knowledge proof for each and every token, since clients will usually want to ask for five to ten tokens.
Another thing we have in the architecture for trust tokens: there are sites that have lots of embedded resources, lots of things that want some idea of whether you're trustworthy or not, but they're all in the same top-level context. For example, one site might have, at the bottom, your comment board.
This provides a nice optimization, though at the loss that, within one first party, you have to trust that multiple other entities on the page are using the same redemption record.
There's also a slight privacy issue here, in that every third party on the site is getting the same redemption record, so you can correlate across those. But you already have these sorts of problems, since the first party can already embed any data it wants into the third-party requests on the page, as long as we're not allowing this set of third-party requests to be correlated back to the issuance when you originally got these tokens. Next page.
Another problem is that trust tokens rely on the set of keys being used to act as anonymity sets. If an issuer were allowed to sign with 50 different keys, that means that for 50 different users it could sign with a different key, and this would mean you have 50 different anonymity sets: when I'm issuing, I choose one of those sets and assign each user to a bucket, and that gives more metadata linking your issuance and redemption.
One way to handle this, and what we're currently doing, is having a proxy that centrally fetches the key commitments and presents them to all the clients, so that we at least know all clients have the same set of keys. This does place some amount of trust in the central proxy, and we have to make sure the proxy doesn't have any information about the clients; otherwise the proxy itself could be doing this division of the anonymity sets.
Some extensions and alternatives we've been thinking of, which are also part of Privacy Pass, include some concept of an append-only log, similar to CT, so people can audit the key commitments that an issuer is providing and make sure these keys aren't changing too frequently. Next slide. So trust tokens as a whole is very close to Privacy Pass.
These are the three main changes we're hoping to bring in line, either changing trust tokens to be more in line with Privacy Pass or in the other direction. We're working with the authors of the PMB tokens primitive to try bringing it to the IETF, possibly to the CFRG, and having Privacy Pass be able to support multiple crypto primitives. We have a bit of a concept of a redemption record, which is a delegated redemption in Privacy Pass terms, and we're wondering whether to make that a more general use case there; and key management is an open question currently in Privacy Pass. Next slide. So for the next steps: mostly, Privacy Pass is the main actual standard that all of this is based on.
Our
first
working
group
session
is
on
friday
and
hopefully
we'll
get
some
good,
docs
and
standards
out
of
that,
we're
going
to
be
experimenting
a
little
while
to
see
the
ergonomics
of
trust
tokens
and
whether
it's
actually
useful
replacement
and
there's
on
the
web
api
side.
Some
standardization
efforts
to
try
to
come
up
with
a
standard
for
how
you
would
interact
with
these
protocols
on
the
web
next
slide
and
those
are
some
links.
Any
questions.
A: There's a question from Chris. So Chris is asking: is Google's KT (key transparency) system a candidate for the registry?
F: I think it would make a good sort of registry for this sort of system. I think we'd probably want to see whether some sort of standardized version of that would work, whether that's having it be part of the Privacy Pass working group, or whether this is a more general problem that the IETF might want to pursue separately.
A: Okay, so we're going to move on to our next presentation, which is by Nate Mathews, on deanonymizing internet traffic with website fingerprinting. Nate, you need to also ask to share your audio as well as your video, because they're separate media streams, so I'm going to accept your audio and your video as well.
G: Great. All right, well, hello everyone. My name is Nate Mathews and I'm a current PhD student working under Dr. Matthew Wright in the Global Cybersecurity Institute at the Rochester Institute of Technology. Our lab group primarily works on adversarial machine learning and its implications for anonymity systems; in particular, we've been working in the website fingerprinting domain for several years now. In this presentation I'll provide a background introduction to website fingerprinting, and then later I'll discuss some of the attack and defense projects that our lab has been working on.
Unfortunately, this information is all that's needed to perform a website fingerprinting classification attack. Since this information is relatively unique for each website, an attacker can use it to train a classifier to predict what website a user has visited and ultimately learn the user's browsing behavior.
In order to perform this attack, the attacker simply needs to be in a position to eavesdrop somewhere on the link between the client and the entrance to the Tor network, and to have access to some modest computing power.
Such an eavesdropper could be represented by your internet service provider, for example, or a compromised router on the network path, so the number of people that can perform this attack is quite large. Next slide. Website fingerprinting experiments are typically performed in two scenarios: closed-world and open-world settings.
These scenarios describe what restrictions we have put on the experiment. In closed-world experiments, the classification model is only given traffic from a small set of monitored websites. This experiment type does not accurately represent real-world performance, but it's often used to benchmark models and techniques against one another.
The attack process in these cases would first collect a small dataset of representative samples for the websites you want to monitor, then develop a set of robust features to process the traffic into, before finally training your machine learning algorithm and predicting on unknown samples. In general, these works show that the features used to train the classifier are more important than the particular classification algorithm used. Next slide.
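That pipeline is easy to caricature in a few lines; the features below are simplistic placeholders rather than the engineered feature sets the literature uses.

```python
# Minimal sketch of the classic fingerprinting pipeline: hand-crafted
# features from packet-direction metadata, then an off-the-shelf model.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def extract_features(trace):
    # trace: list of +1 (outgoing) / -1 (incoming) packet directions
    t = np.asarray(trace)
    changes = int(np.abs(np.diff(t)).sum() // 2)       # direction flips
    return [len(t), int((t > 0).sum()), int((t < 0).sum()),
            float(t.mean()), changes]

traces = [[1, -1, -1, 1] * 25, [1, 1, -1] * 40]        # toy "dataset"
labels = [0, 1]                                        # site identifiers
X = np.array([extract_features(t) for t in traces])
clf = RandomForestClassifier(n_estimators=100).fit(X, labels)
print(clf.predict([extract_features([1, -1, -1, 1] * 25)]))   # -> [0]
```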
More recently, researchers have begun to focus on developing attacks that utilize neural networks, due to the high classification performance these techniques demonstrate in the image domain. In particular, the state-of-the-art fingerprinting attacks against Tor use deep convolutional neural networks to act as the classifier for the attack.
These models learn abstract features automatically from the raw traffic metadata and have been shown to achieve superior performance. However, these techniques generally introduce high data requirements, as more trace examples are required to train the model to extract meaningful features. The highest-performing website fingerprinting attacks to date are Deep Fingerprinting and Var-CNN.
So here's a quick graph comparing the performance of fingerprinting attacks on closed worlds. The first three items are recent deep-learning-style attacks, and the following three are prior machine learning attacks defined in earlier literature.
So, as you can see, each of these attacks is able to perform quite well in the closed-world context, achieving up to near 99% accuracy on a monitored set of about 100 websites, and as such represents a credible threat to client privacy. Next slide. More recently, website fingerprinting attack projects have focused on addressing elements of website fingerprinting that may make real-world application difficult.
One chief goal is to continue to push the state-of-the-art performance, particularly in the open-world setting. In the real world, the rate at which a monitored website is visited is likely very low in comparison to the probability that any other background site is looked at; consequently, a model with even a small classification error will produce an enormous number of false positives when applied in the real world.
We call this the base-rate fallacy. To address this issue, the attacker's model must be tuned to achieve a very high precision to minimize the false positive rate, but must avoid lowering the recall, so that the model is still effective.
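The base-rate point is worth making concrete with some made-up numbers:

```python
# Back-of-envelope base-rate fallacy: a small false-positive rate still
# swamps true positives when monitored sites are rarely visited.
base_rate = 0.001        # fraction of visits that hit a monitored site
tpr = 0.99               # recall of the classifier
fpr = 0.005              # false-positive rate

true_pos = base_rate * tpr
false_pos = (1 - base_rate) * fpr
precision = true_pos / (true_pos + false_pos)
print(f"precision = {precision:.2%}")    # about 17%: most alarms are false
```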
One additional issue affecting the current state of the art is the restriction on the type of web pages used to represent a website. Prior experiments have indicated that model efficacy is significantly reduced when many pages are used to represent a site instead of just a single index page. And the last goal for new fingerprinting projects has been to try to lower the number of samples required to train a classifier.
So, for example, the time required to capture, say, a thousand samples for each site in a 100-site dataset could span up to several months when only a few machines are used, and this would be a huge barrier to actually performing a real-world fingerprinting attack, because you need fresh samples. So, to solve one of these problems, my group recently proposed a technique to drastically lower the data requirements.
This attack is best described as a multi-stage attack that leverages both fresh and old data samples to train a robust model. Without diving too deeply into the network details, I'll go over each stage of the system. In the first stage of the attack, the adversary trains a triplet network to reduce traffic samples to abstract features; to perform this training using triplets, a large dataset of old data is required.
Then, in the next training step, we use our trained-up triplet feature extractor to convert a small dataset of fresh samples to features, and then train a machine learning classifier, which is just a basic k-NN distance classifier.
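In code, the second stage amounts to something like the sketch below, where the trained triplet network is stood in for by a fixed random projection; the real extractor would be a learned neural network, so treat everything here as illustrative.

```python
# Sketch of stage two: embed traces with a pretrained feature extractor,
# then fit a plain k-NN on a handful of fresh samples per site.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def embed(trace):
    # stand-in for the trained triplet-network embedding
    t = np.asarray(trace, dtype=float)
    rng = np.random.default_rng(0)             # fixed projection
    return t @ rng.standard_normal((len(t), 8))

fresh_traces = [[1, -1] * 50, [1, 1, -1, -1] * 25]   # few shots per site
fresh_labels = [0, 1]
X = np.array([embed(t) for t in fresh_traces])
knn = KNeighborsClassifier(n_neighbors=1).fit(X, fresh_labels)
print(knn.predict([embed([1, -1] * 50)]))            # -> [0]
```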
This technique allows us to reduce the number of fresh samples required from several hundred to just 20 instances per monitored site, while still achieving competitive performance when compared to other attacks we reviewed previously. Next slide.
Fingerprinting defense developers have two fundamental mechanisms at their disposal that can be used to modify traffic and confuse the classifier. The most common mechanism is adding fake dummy packets that are indistinguishable from real packets. This of course obfuscates the traffic patterns, but also adds additional bandwidth overhead that can congest the Tor network.
The second mechanism is to add delays to real packets. These delays adjust packet ordering and confuse patterns, timing in particular. For both of these mechanisms, their effects on overhead can impact the performance and behavior of Tor.
Then a very interesting strategy is to instead try to create traffic pattern collisions. This has the benefit of both having a very low overhead and providing some provable guarantees about the security of the defense, since if you can create traffic collisions, it's impossible to discern which real website the traffic was generated from.
So our lab has recently been working on a new defense idea based on adversarial patches. An adversarial patch is a type of specially crafted input that can induce misclassifications in a model. These types of attacks have shown strong performance in the image recognition domain, and because website fingerprinting predominantly uses deep learning models for its classification, we may instead use these patches as a defense to confuse the attacker's model. Okay, next slide.
So in our ongoing work, we started by defining a new style of representing traffic sequences: here we represent traffic as a sequence of runs of consecutive packets, known as bursts.
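The burst representation is simple to compute; for instance:

```python
# Collapse a +1/-1 packet-direction sequence into (direction, length)
# bursts, i.e. runs of consecutive same-direction packets.
from itertools import groupby

def to_bursts(directions):
    return [(d, sum(1 for _ in run)) for d, run in groupby(directions)]

print(to_bursts([1, 1, 1, -1, -1, 1, -1, -1, -1, -1]))
# -> [(1, 3), (-1, 2), (1, 1), (-1, 4)]
```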
So with all that said, there are a number of general questions that still need to be addressed in defense design in the future.
The overhead of a defense must, of course, be weighed against its effectiveness. As a consequence, a just-enough defense may often be the most optimal solution for a large-scale deployment, where we can't have, say, twice the bandwidth overhead or 30 seconds of added latency on every website visit.
Okay, next slide, and that's the end of my talk. So thanks for listening. If we have time left, I can take some questions.
A: If not, please feel free to send your questions into the Jabber queue during the rest of the session. Or, Nate, can people send you questions directly?
H: No, it's fine; I can't do the video, if that's not a problem for anyone. So thank you, everyone, for your attention. This is a presentation of a draft; the first iteration of this text was published at some point last year.
It's about the application of randomized response mechanisms to round-trip time measurements in QUIC. Next slide, please.
The origin of this draft was a presentation in PEARG at IETF 104 on differential privacy. In that presentation, Christoffer Långström and I brought up randomized response mechanisms, and afterwards Shivan raised the idea that randomized response mechanisms might be applicable to round-trip time measurements in QUIC, especially after the controversies around the latency spin bit.
Differential privacy is meant to preserve the utility of data for the user of the data, which, in the case of round-trip time measurements in QUIC, like the spin bit, is typically the observer of the spin bit, not the client or the server, while at the same time protecting the unique identity of the contributors of the data, who in a privacy-oriented system would typically be private individuals.
Differential privacy uses statistical mechanisms to achieve this goal: through statistics it can accomplish mathematical guarantees of simultaneous data usability and data obfuscation.
So, just to summarize the conclusions of the investigation of using randomized responses for the latency spin bit: it does not remedy the privacy concerns that were actually raised in the QUIC working group with respect to the spin bit, because it turns out the privacy concerns that were raised with the spin bit in QUIC were actually related to the utility as such.
That said, randomized response mechanisms could also provide some form of larger degree of client autonomy over determining when the round-trip time measurement is useful and when it's not, but that would require a much more precise specification of the latency spin bit, and it's also not entirely clear that it's worth the effort, that is, that the additional privacy guarantees are sufficient to justify that kind of effort. Next slide, please. The background of randomized response mechanisms is in the 1970s.
It was originally created to ensure a higher level of privacy for individuals who participate in surveys. Originally the idea was that an individual survey taker could be guaranteed that the survey giver would not know which question was being answered, and that this would make the survey taker more comfortable answering questions.
This could be sensitive health information, for instance: you could either ask "do you have diabetes?" or ask "do you not have diabetes?", and since the survey giver was not meant to know which of these questions was answered, the survey taker would perceive a higher degree of privacy. What the survey giver does have to know is the probability that either of these questions was answered.
An alternative way of thinking about this is that a survey taker might lie in response to a single binary-outcome question with some probability. Next slide, please. And so I've tried to illustrate that here. Binary outcome, of course, means that there are two possible outcomes to a question.
If you have no randomized response mechanism, then the question will lead either to a yes or a no answer, or true or false, or however you want to characterize those. If you're using randomized response mechanisms, then typically having two outcomes for a single question, where the survey taker can give either of two choices, gives four possible outcomes: either they truthfully respond to the question, or they falsely respond to the question. Next slide, please.
Fundamentally, the problem you're investigating with randomized response mechanisms is: how can you estimate the true distribution of yes and no answers, knowing only the total numbers of yes and no responses, plus the assumed proportion of truth-sayers versus false-sayers? This has been a quite active mathematical field since the 1970s, working out different ways of solving this.
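For the two-answer case the classic estimator is short enough to show; this sketch assumes each respondent tells the truth with a known probability p:

```python
# Classic randomized-response estimator for a binary question.
import random

def randomized_answer(truth: bool, p: float) -> bool:
    return truth if random.random() < p else not truth

def estimate_true_yes(observed_yes_rate: float, p: float) -> float:
    # E[observed] = p*pi + (1-p)*(1-pi)  =>  pi = (observed - (1-p)) / (2p-1)
    return (observed_yes_rate - (1 - p)) / (2 * p - 1)

random.seed(1)
p, true_pi, n = 0.8, 0.3, 100_000
answers = [randomized_answer(random.random() < true_pi, p) for _ in range(n)]
print(estimate_true_yes(sum(answers) / n, p))   # close to 0.3
```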
It can in general be solved also when you have more than two possible answers to a question, but it's more complicated, and I will anyway not go into most of the maths here. Next slide, please.
So the reason this came up in relation to the spin bit and QUIC is that the latency spin bit, being a bit, actually has two states: zero and one.
These two states could be construed as some form of binary outcome, and so the idea was to apply randomized response mechanisms to the state of the latency spin bit to increase privacy. What we ended up doing, mostly when I was discussing this with Shivan, was trying to practically figure out when randomized response mechanisms would, in this case, be activated to enable any form of increased privacy. It turns out, in the end, that activating a randomized response mechanism in some sensible way that gave a model that could be used at least somehow required ten additional spin bit assumptions beyond those that are already in the QUIC draft, in section 17.3 currently.
We also needed to specify a model based on those assumptions. We included in the draft a round-trip explanation in section 7.1, just to have a kind of word description of how measurements and randomization would actually occur, and then we constructed truth tables for the spin bit values.
That's there if anyone wants to try to simulate this. So section 7.1 describes one and a half round trips in words, and one of the goals of designing the randomized response mechanism was actually finding a way of putting the spin bit in a loop, so kind of indefinitely stuck on a single value. The reason for why you would want the spin bit to eventually end up in a loop, meaning that it no longer has any possibility of flipping from zero to one or from one to zero,
is that you can then guarantee that a certain percentage of the transmitted spin bits are not useful for the purpose of latency measurements. This was a design goal, since the QUIC draft in fact assumes that a certain number of connections will not be measurable, even if, with randomized response mechanisms, you don't get exactly what is asked for in the QUIC draft.
Next slide, please. The interesting bit about working with loops and randomized response mechanisms, and having some probability of lying or truth-saying, is that you get a bunch of parameters; we use three, where P is the probability that a server lies, which means that it sends a value of the bit that it's not supposed to send.
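A toy simulation of that lying parameter, with the honest spin behaviour reduced to "flip once per round trip" and the draft's other parameters (Q and R) omitted for brevity; these simplifications are assumptions, not the draft's exact model:

```python
# Toy spin-bit simulation: the sender "lies" with probability p_lie,
# corrupting some of the edges an on-path observer would measure.
import random

def spin_sequence(rounds: int, p_lie: float):
    bit, out = 0, []
    for _ in range(rounds):
        bit ^= 1                                  # honest edge each RTT
        out.append(bit ^ (random.random() < p_lie))
    return out

random.seed(0)
seq = spin_sequence(20, p_lie=0.2)
edges = sum(a != b for a, b in zip(seq, seq[1:]))
print(seq, f"-> {edges} observable edges out of 19")
```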
In the considerations for internet protocols, in section 7.3, there are some values for P, Q and R suggested that could facilitate the restriction of useful spin bit measurements as, not precisely mandated (that might be the wrong term), but at least sort of going towards those goals in section 17.3.1 of the QUIC internet-draft, of protecting privacy by rendering measurements useless.
So, just to reiterate: the conclusion is that the incorporation of such a mechanism in QUIC as such is probably not worthwhile, for the previously stated reasons, but it's in a way interesting to look at how some of the trendier privacy protection mechanisms out there in academia could be applied to internet protocols.
A: Okay, it doesn't look like we have any specific questions today on this, but, as you mentioned, we would very much like to see additional reviews of this on the list, and the intention is that we will do a call for adoption on this document on the PEARG list, because we do think it has some value as an example of applying this methodology to a technique. So we do think it's useful to the research group.
I don't see any questions coming through, so in that case, Amelia, I will thank you again for your presentation. Thank you very much. Thank you. We will move on to the second part of the presentation, which is the updates on the drafts.
I: It even has a neat little sound graph. Cool, okay, so I'll be really quick. Sorry, my headset crapped out, so you may see my fancy mic in the shot. Next slide, please. So we're presenting, I guess again, a research-group-adopted draft that's been in existence since IETF 91, which is November 2014, a very, very long time ago. As you can imagine, it's changed quite a bit.
It's fun to just troll through the diff, but anyway: we got a lot of really good feedback and pulled it from a research group last call. It was clear there needed to be, so I will talk slower and enunciate, it was clear there needed to be some additional work, and I really want to thank Amelia. There are a lot of authors on this draft, and they've sort of come in here and there and really helped move it forward, and Amelia has done a wonderful job, which is why you've seen so much
of the list traffic. We recently submitted, right before the draft deadline, an -04 RG draft that deals with so much of the feedback we got during research group last call: scaled back the definition of censorship (I was totally convinced that it could be smaller and still do what we wanted it to do), pointed out that that's a liability here, and, at Eliot Lear's urging, added a chunk about domain seizure, which does happen. So the goal going forward is trying to resolve the remaining issues before a final research group last call. Next slide.
This is sort of the current structure of the draft. It basically breaks down into three things: prescription, identification, and interference, that is, what to block, how to spot it, and the actual execution of the blocking. It's in a network-layer structure; you can see it if you just look at the top-level sections: introduction, terminology, technical prescription, technical identification, technical interference, non-technical interference. You can see there's some redundancy there that has been exposed through the editing, which I think we're going to clean up. But next slide: we'll talk about the actual bigger, meatier issues.
Next slide. Thank you. All right, so I've embedded the GitHub repo issue numbers, and the ones that are still open, here. There's an open one: Eliot and Vittorio have some questions about whether we can get by with a more restrictive definition of censorship. Specifically, can it be cabined to state actors, rather than enterprises or merely people in positions of power in a network sense? That's a pretty fundamental discussion there, and I want to talk about it on the list. I have a write-up, I think, that I'll send along.
I know Carsten Bormann is a fan of not saying "censorship techniques" but "techniques employed for censorship"; it's something I'm still thinking about. Chris Wood, even, in, I think, an offline communication (sorry, Chris, to expose offline communication,
I hope you don't mind) suggested that we might want to talk about "network censorship techniques"; it sort of would crispen it up a bit, and that's probably not a too-hard edit to make. Chelsea Komlo had a really good point to, you know, make it clear with our glorious TLS 1.3, to some extent, but really the increasing security of TLS: when does TLS help or hurt certain kinds of censorship techniques? That is a really good edit, adding sort of a cost to the censor in each section.
It sort of dovetails with this notion that Chelsea also brought up of use in censorship and censorship-circumvention areas, where they want to actually talk about censor maturity. Ekr, in his blinding wisdom, has a ton of actually kind of hard things to work through that are just going to take a little bit of time; you can click on each one of those, and they're each interesting on their own, but not interesting enough to spend time talking about them right now.
We still have some open issues from Stefan. There was something I wanted to postpone to the next version, which is actually integrating a really good systematization-of-knowledge review article (I don't know why the CS people call them SoKs; it's such a dumb name), and anyway I may be in a position to actually incorporate this gorgeous review article about censorship techniques sooner than I thought, given that we popped out of RG LC. Next slide. That may be it.
A: Excellent, thank you, Joe. Do we have any questions for today on this draft? Joe, could you just clarify: I know you did an update recently where you addressed a number of issues.
I: Oh no, no, I need to do some work before we can get to another last call, and by work I mean there's some that's just, like, desk work, you know, figuring out what the damn thing should say, and the other is actually that we need to talk as a research group about some of these more fundamental things.
Like, you know, the restrictive definition, because for many, many years just talking about censorship has not been a problem. For some reason Eliot and Vittorio feel strongly about it, and I need to either be in a place where we're comfortable with it, or where we can clearly say "you guys are in the rough", or something like that. I don't know if that's how it works in the IRTF, sorry.
A: Yeah, that would be the initial goal, at the very least. I see some questions in Jabber; I don't believe any of them are for the mic, though. Please quickly correct me if I'm wrong there. Otherwise, I think.
H: Yeah, well, so I think it was Carsten Bormann (if I completely mangled the pronunciation of that name, I'm very sorry) who suggested changing "censorship technique" into "techniques employed for censorship". I thought his justification for that change was quite eloquent, and I would be strongly in favor of making this change. I'm just wondering, since it's an open issue now, if there is anyone who has strong reservations against that change.
I: So people should line up and talk about it, by all means. There are like three places in the document where we actually say this, so it's not a huge change; the trick is that it's in the title. And so if we're going to combine "network censorship techniques" and "techniques employed for censorship" (this is kind of more trivial than I think, but whatever; I don't want to trivialize Carsten's comment), it may end up meaning "network techniques employed for censorship" or "techniques employed for network censorship".
Whatever. Like, I think the only sort of even minor concern I would have is that it's going to make the title long. But who cares, right? You know, I'm at the point where I'm not caring that much about titles.
H: Sorry, yeah, so the problem I would have with something like "network censorship techniques" is that I have previously been in standards organizations where the meaning of the word "network" was discussed for extended periods of time, and I think maybe this research group is not served by having such, you know, philosophical meanderings into what actually constitutes a network. So "techniques employed for censorship", to me, seemed like a good way of avoiding the ambiguity of the word "network".
A: Okay, thanks, Amelia. And to be clear, even though you have issues in GitHub, we prefer discussion directly on the mailing list. I believe, Joe, is that correct?
I: There's a question from Peter Cope. I'm sorry, I'll be really quick. Yeah, he's asking whether the censorship draft is trying to establish, or not, DNS as a content control plane.
No. If that was a serious question, I'm sorry to take it not so seriously, but absolutely not: we're trying to be descriptive, as reference material. If you see places where we're not doing a good job of that, please send us a PR.
J: Okay, so just to recap, in terms of what this is all about: in the last session we introduced the problem of how to make log analysis a little more deterministic in terms of identifying personal information. So, just to give some background in terms of the motivation of this document:
Logging has contributed to a lot of security incidents and data leakage, so one of the questions asked very often is: is there a way for us to be more deterministic in terms of identifying what is sensitive in a log, what is sensitive data in a log? This is what motivated us to look at this as a problem statement.
So obviously, some of the things that we found out as part of this document's ideation are that the log itself is very subjective in terms of the format, the source, and the content, and also that there are a lot of existing ways, in many products and many solutions, where there are already existing solutions that provide a way to identify logs. Most of these, obviously, are vendor-specific, and there were no specific guidelines in terms of how to operationalize it and possibly make it interoperable.
J
So this is what we want to summarize as the document goal. Obviously, what we want to achieve with this document is to provide a reference model. As we know, logs have different purposes based on who is looking at the log: as a developer, obviously, for troubleshooting; as an operator, maybe for monitoring and performance measurement; and as a security analyst, more for incident response. So there is always somebody looking at the log, and in a different way.
J
So what we were looking at is to see if we can provide a reference model in which logs can be tagged and classified, so that, based on who the consumer of the log is, there could be differential views of the log for different actors in the system. That's what we want to achieve as part of the privacy control actions on the logs. So, next slide please.
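To make that idea concrete, here is a minimal sketch of such differential views, assuming a hypothetical tag vocabulary and role-to-tag policy; none of these names come from the draft itself.

```python
# Hypothetical sketch: differential views of one tagged log record.
# The tags, roles and policy table are illustrative assumptions only.

RECORD = {
    "ts": "2020-07-27T11:00:00Z",
    "src_ip": "192.0.2.10",
    "user": "alice@example.com",
    "status": "login-failed",
}

# Field -> privacy tag (untagged fields are treated as non-sensitive).
TAGS = {"src_ip": "pii", "user": "pii"}

# Which tags each consumer role may see in the clear.
POLICY = {
    "developer": set(),           # troubleshooting: no PII needed
    "operator": set(),            # monitoring: no PII needed
    "security-analyst": {"pii"},  # incident response: PII visible
}

def view(record: dict, role: str) -> dict:
    """Return the view of `record` appropriate for `role`,
    masking every field whose tag the role may not see."""
    allowed = POLICY[role]
    return {
        k: v if TAGS.get(k) is None or TAGS[k] in allowed else "[REDACTED]"
        for k, v in record.items()
    }

print(view(RECORD, "developer"))        # src_ip and user masked
print(view(RECORD, "security-analyst"))  # full record
```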
J
So this is just a quick summary of what changed since the last session. Obviously, thanks for the comments at the mic as well as over email; we have tried to address some of them here. We have also tried to define a privacy schema, in terms of what could be the approaches or proposals for achieving a somewhat less ambiguous way of identifying sensitive information in the logs, and also some considerations in terms of what could be the access control model and how those policies could be defined.
J
So that's what the update is. Next slide, please. So this is again an illustration, an example of one way in which we can make this log data more identifiable.
J
So, I mean, as we all know, there is various log data: user information, user data that's included in the logs. So one approach could be to define a namespace, or a kind of identifier registry, a PII registry if I can call it that, which would at least provide some structure in terms of how these key-value pairs could be represented in a log.
J
This would make parsing of the logs, whether at an intermediary, at the consumer, or at the enforcement level, a lot more deterministic, because there is already a defined namespace. This is just one example of how this can be achieved; it is something that we are trying out as an implementation as well. So, next slide please.
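As a rough illustration of why a defined namespace makes parsing deterministic, assuming a hypothetical `pii.` prefix (the draft does not define a concrete registry):

```python
# Hypothetical sketch: a "pii." namespace prefix on key-value pairs, so
# any intermediary or consumer can locate sensitive fields without
# heuristics. The prefix and field names are assumptions for
# illustration only.
import re

LINE = "ts=2020-07-27T11:00:00Z status=ok pii.email=alice@example.com pii.src_ip=192.0.2.10"

KV = re.compile(r"(\S+)=(\S+)")

def parse(line: str):
    pairs = dict(KV.findall(line))
    # Deterministic rule: sensitive fields are exactly the namespaced ones.
    sensitive = {k: v for k, v in pairs.items() if k.startswith("pii.")}
    return pairs, sensitive

pairs, sensitive = parse(LINE)
print(sensitive)  # {'pii.email': ..., 'pii.src_ip': ...}
```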
J
So these are again some more thoughts in terms of how we can do the identification, again all related to privacy data. While this whole notion can be extended to anything that is deemed sensitive in a log, the focus for us is definitely to look at it from a log perspective. There are different types of log formats, some custom, some standards: syslog, CLF, CEF.
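For concreteness, roughly the same event in two of those formats; all values here are made up for illustration.

```python
# The same (made-up) event, once as RFC 5424 syslog with structured
# data and once as a CEF line with extension fields.
syslog_line = (
    '<165>1 2020-07-27T11:00:00Z host01 app 1234 ID47 '
    '[auth@32473 user="alice" srcip="192.0.2.10"] Login failed'
)
cef_line = (
    "CEF:0|ExampleVendor|ExampleProduct|1.0|100|Login failed|5|"
    "suser=alice src=192.0.2.10"
)
print(syslog_line)
print(cef_line)
```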
J
So this is one extension, an extra field or metadata, as we can call it, to the log data that would say whether a particular field is sensitive, or critical, or confidential. Again, at the moment these are all proposals, and we are just trying to experiment with these models. Tagging at the field level could be very useful, especially when there is a large amount of data in a log and you want to have differential protection actions for each field in the log.
J
So field-level tagging could be very useful for that. Another way: with a lot of systems, we are seeing that they would rather be more interested in whether the whole log is sensitive or not. At that level there are a lot of proposals, in terms of just appending one extra field to the end of the log that says whether that log is sensitive or not.
J
So that is where the systems would categorize, or bucketize, those logs into sensitive and non-sensitive, and then have their own additional processing rules that could be applied to those sensitive logs. So that's at the log level.
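A small sketch of that coarser, per-log marking, again with an assumed marker name rather than anything defined in the draft:

```python
# Hypothetical sketch: one trailing marker for the whole line, and a
# consumer that bucketizes on it. "log.sensitive" is an assumed name.

logs = [
    "ts=2020-07-27T11:00:00Z status=ok log.sensitive=false",
    "ts=2020-07-27T11:00:01Z user=alice@example.com log.sensitive=true",
]

buckets = {"sensitive": [], "non-sensitive": []}
for line in logs:
    key = "sensitive" if line.endswith("log.sensitive=true") else "non-sensitive"
    buckets[key].append(line)

# Downstream systems can now apply their own extra processing rules
# (retention, masking, restricted access) to buckets["sensitive"].
print(len(buckets["sensitive"]), "sensitive line(s)")
```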
So, next one please. Okay, so now, while this tagging at source would definitely help identify what is really important in a log, or what is very critical data in the log,
J
there is also a need to provide a way for operators to define or specify what action could be performed on that particular field, or at that sensitivity level of the log. So again, here's one illustration that we are trying out, based on the field descriptor: we can include some metadata where every field can be tagged with a particular sensitivity level and an associated action, to say what a log enforcer is required to do based on the actions specified at the field level.
J
So again, this is another proposal that is currently being experimented with, to see if it can be used to execute something like per-field anonymization or per-field policy actions for the log.
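One way such a field descriptor and a log enforcer could fit together, sketched with assumed level and action names that are not from the draft:

```python
# Hypothetical sketch: a per-field descriptor carrying a sensitivity
# level and an action, plus an enforcer that applies the action.
import hashlib

DESCRIPTOR = {
    "user":   {"level": "confidential", "action": "hash"},
    "src_ip": {"level": "sensitive",    "action": "redact"},
    "status": {"level": "public",       "action": "keep"},
}

def enforce(record: dict) -> dict:
    """Apply the per-field action before the record leaves the enforcer."""
    out = {}
    for field, value in record.items():
        action = DESCRIPTOR.get(field, {}).get("action", "keep")
        if action == "redact":
            out[field] = "[REDACTED]"
        elif action == "hash":
            # Pseudonymize: a stable token that still allows correlation.
            out[field] = hashlib.sha256(value.encode()).hexdigest()[:12]
        else:
            out[field] = value
    return out

print(enforce({"user": "alice", "src_ip": "192.0.2.10", "status": "ok"}))
```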
J
So, next one please. Okay, so obviously, thanks for the lot of comments, and the email comments from Joe and everyone; I think they helped. We realized that it's not just about tagging something at the source: there are a lot of other challenges, in terms of the transformations of the log as it gets propagated through different hops in the system's log pipeline. There is always a change in the way the log could be formatted, or a change in the format of the log.
J
So one of the problems is: how do we look at privacy preservation, so that it can be kept as the log gets transformed, extracted and loaded into different systems? Another area that's still not investigated is change of policy: basically, if there is a change in the marking or the privacy specification of certain fields or certain data elements, how do we enforce that change in the system? And the final one is around out-of-band mechanisms for existing systems.
J
This is largely about legacy systems, where a lot of log generation is already happening, and that's the bulk of it as we see today. So how do we insert something like a notification mechanism, so that we can inform what is private and what is not private, or what is sensitive and not sensitive? That's an out-of-band mechanism work item that we can look at. So, final one please. Okay.
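For legacy sources that cannot be modified, the out-of-band idea might look something like this sketch, where a separate policy document describes which fields of an untagged log are private; the schema and names here are assumptions for illustration.

```python
# Hypothetical sketch: an out-of-band policy annotating an unmodifiable
# legacy log format, so a downstream processor knows which positions
# are private. The policy schema is an assumption, not from the draft.
import json

POLICY_JSON = """
{
  "format": "legacy-access-log",
  "fields": ["client_ip", "timestamp", "method", "path", "status"],
  "private": ["client_ip"]
}
"""

policy = json.loads(POLICY_JSON)

def mask_legacy(line: str) -> str:
    parts = line.split(" ")
    for i, name in enumerate(policy["fields"]):
        if name in policy["private"]:
            parts[i] = "x.x.x.x"  # generic mask for the private field
    return " ".join(parts)

print(mask_legacy("192.0.2.10 [27/Jul/2020:11:00:00] GET /index.html 200"))
```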
J
So again, these are the open questions, in terms of whether we should continue this work in PEARG and whether there's a need to change the scope. That's everything for now. So anyway, any questions?
A
Thank you, Sandeep, that's great. Do we have any questions on this? Okay, we have Stephen Farrell; let me enable the audio for you. Stephen.
C
J
There's one question, obviously: why is the example from a really old RFC? Yeah, absolutely, thanks for that input. We were trying out some of the systems, and that was one of the log sources that we had, so we tried to do it with that. I'll ensure that we move to a later and better version.
H
A
G
A
Okay, and now to our last presentation: Gurshabad Grover is going to whisk us through an update, in the last few minutes, on the draft for performing safe measurements on the internet. Please go ahead, Gurshabad.
K
Hi, so yes, very quickly: this is Guidelines for Performing Safe Measurement on the Internet, which is a draft mostly written by Iain Learmonth, and I'll be joining as a co-author in the next update. This presentation is just to run you through the open issues and to welcome any inputs. Next slide, please. So yes, this is an...
K
The research is considered safe if, and only if, users are protected from, or unlikely to experience, danger, risk or injury, now or in the future, due to the research. Next slide, please. So currently we have a few sections on consent and other safety considerations, mostly relating to traffic; in the interest of time, I'll sort of focus on the issues. Next slide,
K
please. Yes, so I'll start with the last one, which is "safety, not ethics". This is a scoping issue, perhaps, and some of the comments that came in earlier sort of tried to address the scope. This draft is clearly not meant to supplant any institutional ethics process; it's just documenting a subset of that, which is safety.
K
The aim for the next version is to add recent research, for instance from the workshop on ethics in networked systems research. Then, there's currently no statement on future threats to safety that might arise: that's something like encrypted payloads that are being processed now but could be broken or decrypted in the future, or could be fed into some machine learning models that are not operated today. So, things like those.
K
So yes, these are on GitHub as well, but any additional comments, on the mailing list or directly, are very welcome. I'll stop for now.
A
Thank you very much. We did have one person in the queue who's just popped out again. Though, we have... okay.
F
A
K
Right, okay. And I think Iain is also in the chat and the group, so if you feel like adding anything important that I missed, please go ahead.
C
A
Just that we're having an additional author coming on board for the next update, so we expect to see a reasonable amount of changes.
G
A
Thank you, everybody, for joining today's session. Thank you to all our presenters today, and thank you to the gods of Meetecho for smiling on us: we managed to make it through the session. We look forward to seeing everybody again, hopefully at our 109 meeting. Thank you very much, everyone. Goodbye.