From YouTube: IETF105-PEARG-20190724-1330
Description
PEARG meeting session at IETF105
2019/07/24 1330
https://datatracker.ietf.org/meeting/105/proceedings/
A: All right, hello everyone, welcome to PEARG, the Privacy Enhancements and Assessments Research Group. My name is Chris Wood; I'm here with Shivan Sahib, and Sara is with us remotely via Meetecho. Just quickly moving forward: this is the Note Well. You've probably seen it lots of times by now, but if you haven't, please take a moment to familiarize yourself with it, or you can read it online. Shivan is passing around the blue sheets; please make sure you sign them. We have a Jabber scribe, correct? Yes — Ben, thank you.

A: This is the agenda we have today: we have eighty minutes of, sort of, researchy presentations to start, followed by some proposed individual drafts for the research group to close things out. Does anyone have any last-minute comments or revisions they would like to make before we get started?
C: Okay, take two. My name is Pete Snyder; I come from Brave Software, where I do privacy research, and I'm also co-chair of PING, the Privacy Interest Group in the W3C. This talk is some lessons taken away from the intersection of those activities. As a brief overview of what I'll be talking about: first, how standards impact the privacy work that Brave does as a privacy-oriented vendor.
C: There are other things that I think warrant further discussion, but there isn't time in this talk. So, first: how do standards impede or impact our work as a privacy-oriented implementer? In order to protect privacy on the web for our users, Brave makes a bunch of modifications to the browsing environment. We block state storage in different places; we integrate Tor, and a lot of users use Tor out of the box; and we block a whole bunch of resources from being loaded
C
The
first
place
from
a
variety
different
places
like
easy
list
and
in-house
generated
lists,
and
things
like
that.
The
most
relevant
to
this
conversation
is
we
modified
the
browser
environment,
and
so
we
end
up
violating
standards
in
a
large
number
of
places,
because
implementing
those
standards
correctly
would
harm
our
users,
privacy
just
to
give
kind
of
a
very
specific
example.
Here's
a
recent
paper
that
came
out
that
gives
a
an
overview
of
a
bunch
of
fingerprinting
methods
that
are
on
the
web.
C
The
relevant
irrelevant
to
this
conversation
is
not
only
do
they
do
a
nice
work,
nice
job
of
surveying
existing
work
on
how
people
are
actually
doing
think
of
fingerprinted
online,
but
they
also
give
the
impact
of
each
of
those
fingerprinting
methods
and
and
kind
of
a
benefit
how
much
identifying
power
they
give
in.
So
the
one
I
want
to
talk
about
here
is
on
the
right
arm,
on
the
right
hand,
side
audio
context,
which
is
a
way
of
identifying
the
user
based
on
particular
it's
about
how
the
browser
will
do
audio
synthesis.
C
We
know
a
lotta
bunch
of
things
that
the
browser
allows
you
to
do
by
default.
These
are
things
like
asking
querying
the
hardware
for
its
capabilities
or
different
low-level
things,
and
so
what
brave
does
is
we
just
say?
No,
when
it's
a
third
party
context,
we
just
say
you
can't
do
that
unless
you
user,
Ops
and
explicitly
says,
violates
a
standard,
but
to
do
so
otherwise,
with
the
harm
art
would
allow
our
users
privacy
to
be
hard.
We
do
this
in
a
whole
bunch
of
other
places
as
well.
C
This
is
it
not
even
nearly
a
complete
list,
but
I
mention
these
just
as
a
partial
enumeration
of
how
extensive
the
problem
is.
So
how
did
we
wind
up
in
this
situation,
or
why
is
this
problem
so
endemic
across
the
browser
platform?
I
only
have
three
different
examples
of
these
kind
of
anti
patterns.
We
see
in
standards
that
lead
to
this
outcome.
This
is
not
anywhere
near
an
extensive
list
or
a
comprehensive
list,
but
I
need
it
as
motivation,
then
demonstration
so
also
I
don't
mean
this
to
be
simple.
C
So,
for
example,
many
w3c
documents
have
a
privacy
consideration
section
this.
These
are
things
saying
like
implementers
should
be
concerned
about
these
privacy
concerns
that
when
they
implement
this,
but
they
are
not
normative,
they
are
not
mandatory.
They
are
just
a
list
of
concerns,
but
the
rest
of
the
document,
which
is
almost
without
fail,
much
more
much
longer
and
much
more
specific.
C
So,
specifically,
the
functionally
that
must
be
implemented,
and
the
result
is
that
everybody
agrees
on
everybody
implements
the
harmful
part
and
people
are
extremely
unsure
what
to
do
about
the
non
harm
or
about
the
mitigating
part,
and,
as
a
result,
nobody
can
people.
Vendors
hands
are
tied
when
they
want
to
try
to
do
the
mitigations,
because
websites
assume
the
standard,
the
well-defined
behavior.
C
So
just
as
a
motivating
example,
this
one
you
may
be
familiar
with
referred
policies,
a
is
a
standard
that
says,
under
certain
conditions,
notify
the
website
that
you're
visiting
now
of
where
you
just
came
from
this
has
extremely
specific,
very
obvious.
Privacy
harm
our
privacy
implications.
There's
a
very
short
piece
of
text
saying
we
know
that
there's
privacy
problems
here,
but
in
the
mitigation
section
it
just
says,
vendors
can
do
whatever
they
want,
there's
no
specific
specificity.
It
just
says
vendors
may
violate
this
at
any
time.
As
a
result,
many
websites
rely
on.
C
This
is
very
unusual
to
website
operators
and
that's
result.
Many
many
websites
just
assume
the
refer
policy
will
be
standardized
as
it's
described
in
the
rest
of
the
document
and
it
now,
if
you
in,
if
you're,
in
brace
position
or
somebody
like
Gray's
position,
where
you
want
to
protect
the
users,
privacy
and
you
remove
the
refer
header
or
you
do
different
things-
to
try
to
make
the
referer
header
less
privacy
harming
you
break
a
whole
bunch
of
websites.
C: I have a long list of things we could probably talk about offline, but the first high-order bit would be: don't do this. If you had to do it, I would say: don't expect it in third-party contexts, and a referrer should only be sent on a user gesture — things along those lines, I think.
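The kind of trimming just suggested can be sketched in a few lines. This is a hypothetical policy for illustration — it is not Brave's actual algorithm, and the function name and rules are my own: full path for same-origin navigations, origin only cross-origin, nothing on an HTTPS-to-HTTP downgrade.

```python
from urllib.parse import urlsplit

def trimmed_referrer(referrer_url, target_url):
    """Return the Referer value to send for a navigation, or None.

    Illustrative policy: same-origin requests keep the path (query and
    fragment dropped); cross-origin requests get origin only; an
    https -> http downgrade sends nothing at all.
    """
    ref, tgt = urlsplit(referrer_url), urlsplit(target_url)
    if ref.scheme == "https" and tgt.scheme == "http":
        return None  # never leak an HTTPS referrer to plain HTTP
    origin = f"{ref.scheme}://{ref.hostname}"
    if ref.port:
        origin += f":{ref.port}"
    if (ref.scheme, ref.hostname, ref.port) == (tgt.scheme, tgt.hostname, tgt.port):
        return origin + ref.path  # same-origin: keep the path
    return origin + "/"  # cross-origin: origin only
```

Even a reduction this simple breaks sites that parse the full referrer URL, which is exactly the compatibility problem being described.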
C: That may distill down to a distinction without a difference: by saying vendors can do anything, everybody is "following the specification" — Chrome is following the specification, and everyone else follows its lead. The standard is no longer the guiding principle. We can take that offline. Okay, example two: functionality that's useful only in a very, very small set of situations, but is made universally available to all websites. The problem here is that you have something that's very useful to a very small number of people.
C
Maybe
that's
doing
things
like
audio
synthesis,
but
it's
made
without
there's.
No
there's
no
permission
in
place.
There's
no
gating
the
functionality.
You
know
anything
like
this.
It's
all
of
a
sudden.
It's
not
being
used
just
for
this
very
narrow
use
case,
it's
being
used
for
finger
printing
or
it's
at
least
available
for
fingerprinting
it's
available
for
these
kind
of
privacy,
harm
uses
and
so,
as
a
result,
becomes
extremely
difficult
to
pull
this
off
the
web
without
breaking
all
bunch
of
websites.
That
expect
it
to
be
in
place
an
example
here.
C: The canvas element allows you not just to write things to the canvas, but also to pull things out. The use case for this is not zero, but it's obviously not the common case when you're writing things to a canvas. And what do we see? The common use of this is in libraries like fingerprintjs2, where, if you dig into the code (lower right-hand corner of the slide), you can see it being used to generate unique identifiers for users.
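Why canvas readback identifies users can be sketched abstractly. In a real fingerprinting script the pixel bytes come from drawing fixed text and shapes and reading them back (e.g. via `toDataURL`); here I simulate two machines' readback as byte strings — the data values are invented for illustration. Tiny device-specific rendering differences yield a digest that differs across machines but stays stable per user.

```python
import hashlib

def canvas_fingerprint(pixel_bytes):
    """Collapse canvas-readback bytes into a short, stable identifier."""
    return hashlib.sha256(pixel_bytes).hexdigest()[:16]

# Two machines rendering the same scene with slightly different
# anti-aliasing: one color channel differs by a single unit, yet the
# resulting identifiers are completely different.
machine_a = bytes([120, 84, 200, 255] * 100)
machine_b = bytes([121, 84, 200, 255] * 100)
assert canvas_fingerprint(machine_a) != canvas_fingerprint(machine_b)
assert canvas_fingerprint(machine_a) == canvas_fingerprint(machine_a)
```

The hash is deterministic, so the identifier survives across visits without any stored state — which is what makes readback, rather than drawing, the privacy-relevant operation.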
C
Now,
if
you
pull
out,
if
you
think
you're
somebody
like
brave
or
you're
somebody
who's
doing
privacy
protections,
you
think
great,
we'll
just
pull
out
this
functionality
on
all
of
us
and
you
broken
whole
bunch
of
useful
use
cases
as
well,
where
the
the
goal
in
the
first
place
should
be
just
to
say,
figure
out
a
way
of
getting
permission
to
the
sorry,
gaining
access
or
restricting
access
to
these
kinds
of
positive
use.
Cases
in
the
first
place.
C
So
again,
I
have
a
long
range
of
suggestions,
but
the
the
first
approximation
is
to
say
user
gesture,
say
permission
to
say
some
other
way
of
the
user.
Signaling
I'd
like
to
do
a
thing
I'd
like
to
have
you
back
canvas
likely.
If
so
reading,
back
cameras
in
and
of
itself
is
often
not
useful.
You
need
to
do
something
with
that
value
for
it
to
be
useful,
save
it
to
disk
stay
with
the
storage
that,
and
so
those
are
things
that
other
browsers,
already
gate.
C: Third and final is something we see over and over again in standards conversations: "websites can already do bad things — what is the marginal harm of allowing them to do one additional bad thing?" I think this is unhelpful. For example, if a standard is proposing a new way of allowing web applications to communicate with remote servers, the common refrain is: you can already do that with an image tag, so what's the harm of allowing this?
C
The
relevant
part
here
is
I,
won't
walk
through
this,
because
I
assume
people
are
familiar
with
since
it
lives
here,
but
there's
a
large
number
of
there's
a
large
amount
of
papers
being
written
and
research
that
exists,
showing
that
the
exact
kinds
of
values
that
can
be
requested
through
client
hints
can
nearly
identify
a
non-trivial
number
of
users.
As
a
result,
the
response
is
often
well.
You
can
already
identify
users
anyway.
What's
the
harm
of
already
doing
this
cookies
exist,
don't
they
etc,
etc.
C
I
think
this
is
the
way
of
just
doubling
down
on
the
problem,
to
put
put
it
differently
when
you're
digging
and
stop
digging
or
when
you're
in
a
hole,
stop
digging
figure
out.
Freeze
the
problem
start
thinking
our
ways
to
mitigate
the
problem
as
is,
and
don't
and
don't
entrench
the
problem
indefinitely
again,
I
can
I'm
happy
to
say
more
specifics.
If
this
is
a
topic
of
Congress
of
interest,.
C
One
thing
that
we
see
over
and
over
again
is
is
a
way
of
kind
of
pitching
the
problem
forward
to
say
we
know
this
part
of
this.
This
new
standard
introduces
a
privacy
harm,
but
we're
working
on
the
standard
that's
coming
down
at
the
pipeline
in
the
future.
That
will
fix
that
problem.
I
think
this
is
extremely
harmful.
C
Not
only
does
it
type
the
future
authors
hands
and
what
now
they
have
to
do
to
address
the
current
problem
being
introduced,
but
it
gives
you
know
it
makes
it
extremely
difficult
to
evaluate
the
privacy
harm
by
the
standard
you're
considering
right
now,
which
is
to
say
something,
may
change
in
the
future.
That
is
not
a
basis
to
judge
something
that's
gonna
be
introduced
today.
C
Second
point
I
think
is
worth
keeping
in
mind
when
evaluating
standards
is
that
the
idea
of
formalizing
bad
practice
has
at
least
some
some
appeal
in
that?
You
can
say
well
if
we
can
get
all
the
bad
uses
to
use
this
new
API
instead
of
the
old
API,
then
we
can
reason
about
that.
New
API
use
in
some
semantically
valuable
way.
C
I
think
this
is
not
useful,
because
what
ends
up
happening
is
that
actors
use
both
api's
instead
of
just
a
new
one,
and
the
last
is
this
kind
of
what
I
think
is
this
kind
of
like
judo
move
where
people
say
well
site
authors
use
this
people
like
sites
and
so
users,
indirectly
wanna.
You
want
this
to
exist.
I
think
this
is
totally
not
helpful.
C
It
is
important
to
consider
the
site
site.
Authors
needs
the
first
and
foremost,
is
the
person
using
operating
the
software
and
to
recognize
those
interest
at
verge
frequently
and
to
consider
the
harm
to
the
user,
not
to
nebulously
users
in
general,
so
some
last
takeaway,
some
last
things
I
hope
to
keep
in
your
mind,
is
that
the
amount
of
standards
getting
pushed
through
is
just
extremely
difficult
to
be
able
to
reason
about
privacy,
wise
and
so
to
a
first
approximation.
C
The
best
thing
we
can
do
for
privacy
is
just
just
slow
the
roll
a
little
bit
and
to
give
things
a
little
bit
time
to
percolate
into
to
percolate
through
the
review
processes.
Second
think
about
complexity
in
term
of
itself
as
a
privacy
harm.
It's
not
adding
anything
new
to
the
platform
brings
some
risk
and
also
brings
some
reward,
but
to
not
think
of
it
as
there's
no
privacy
harm
that
I've
identified
it.
So
it's
fine
to
add.
C: A lot of the work is standardizing things that have already shipped, and at that point it's nearly impossible to pull them back in. So figuring out some way to reason about things before they get out the door, or at least earlier in the process, is probably a place worth digging into. Okay — I'm happy to discuss any of this further, but I just want to say thank you very much for your time. I'm Pete Snyder, privacy researcher at Brave, and I'm here to try to help do better on privacy in standards. Thanks.
A: So I have one. You mentioned many examples in which standards — or specifications — could at least potentially be improved to benefit the privacy of those implementing them, or of the users using the things that are implemented. What sort of concrete or tangible steps can either the IRTF or the IETF take to move in that direction?
C: Sure — next steps. In the W3C we have the idea of horizontal review boards, or horizontal review groups: at different iterations in a standard's lifetime, from conception to recommendation, groups like PING — the privacy group — and other groups, like accessibility, are expected to give input, or at least have the option of doing so, at those points. I think formalizing that process and making it stronger would be an extremely useful way of letting interested and concerned actors get involved earlier in the process.
E: Most of the examples you mentioned are taken from the world of the W3C, which I'm not familiar with. Besides Client Hints, do you have in mind IETF protocols with the same sort of problem? Because in theory, in the IETF, people should write security considerations, including privacy — we have RFC 6973 — so in theory the problem should not happen in the IETF; but of course it does. Do you have specific examples in mind from your experience?
C: I don't have any examples from the IETF world specifically that I would feel very confident talking about, beyond an outside view. But I would say that W3C standards also have these privacy and security considerations sections, and, as a concerned vendor, I think those are step one of what needs to be a ten-step road.
F: Lemon here — thanks for presenting, this is really helpful. One of the things I've noticed people do to try to mitigate risks of this sort is to have essentially a list of things that a particular site is allowed to do. This really wouldn't apply to IETF standards, but it sort of seems like it applies to the stuff you're actually talking about here.
F: So, basically, a set of entitlements — does that make sense to you as an approach? Because the problem with slowing down the advancement of progress, so to speak, is that it's very difficult to do: there's always somebody who wants the new feature, and what do you say to them? The new feature is probably totally privacy-violating, and there really isn't a mechanism for preventing it from being exposed to the user when the user doesn't want it to be.
C: I completely agree that figuring out what sites should be allowed to do, at any given point, is extremely difficult. I slightly disagree that there aren't ways for vendors to enforce those choices. A couple of examples came up before; others might be policies determined by the browser offline, or shipped with the browser, given some known set of safe sites.
C
Permissions
are
knocked
around,
but
but
not
useless
way
of
going
about
this
user
gestures,
where
the
frame
is
where
the
code
came
from,
which
is
not
a
variable
in
any
standard
currently,
but
not
what
frame
is
it
executing
in,
but
what
who
delivered
it,
etc,
etc,
etc.
So
I
think
I
think
I
think
is
more
ivan
either
for
hope
than
well.
G: Giri Mandyam, Qualcomm. Calling back to six years ago, when I was chairing the Geolocation working group in the W3C: we had a discussion, related to PING advice on the topic, about whether a web page — or a web service provider — could declare to the user what their intentions were with respect to any information the user would provide, and we couldn't figure out a way to actually do that without it being abused by rogue parties.
G
I'm
wondering
now,
though,
you
know
when
you
look
at
the
only
innovations
such
as
certificate
transparency
and
we're
getting
better
and
better
authentication
into
the
browser
all
the
time
respective
websites,
whether
whether
browser
based
policies
with
respect
to
individual
websites,
could
actually
take
the
place
of
having
to
specifically
advise
the
user
from
the
service
service
itself.
So
I
was
wondering
what
your
thoughts
are
on
this
well.
C
I
mean
I,
think
geolocation
in
the
browser
is
actually
a
positive
example.
I
mean
that
is
a
of
the
many
ways
that
user
privacy
is
harmed.
That
is
not
often
one
of
them
because
it
very
explicitly
says
the
users
understand
what
that
means,
and
they
have
they
opt
in
and
I.
Think
of
the
hundred
things
that
makes
me
concerned
that
that's
not
one
of
them
I
think
for
that
reason,
I
think
the
idea
I'm
not
sure,
there's
any
solution
to
the
concern
of
the
website.
Saying
I'm
only
gonna
do
this
with
that
information.
H: Okay, hi everyone. I'm Sandra, a PhD student at EPFL, and today I'm going to present some work done by myself and my co-authors on traffic analysis of encrypted DNS. I'm going to start by jumping straight to the conclusion: we did a number of experiments on traffic analysis of DNS-over-HTTPS traffic, and we found that monitoring and censorship are still feasible even in the presence of encryption, and that currently proposed EDNS-based countermeasures against traffic analysis are not sufficient to prevent such attacks.
H: In this talk I'm going to describe some of these experiments in detail. When a client connects to a destination host, this is generally preceded by a DNS lookup. As we all know, there are measures in place to encrypt the connections between the client and the destination host, but DNS lookups have so far been sent in the clear, which makes them susceptible to eavesdropping and tampering.
H: For example, an adversary on the path between the client and the recursive resolver can get some idea of the browsing history of the user, which is a privacy concern. There is also censorship based on DNS. Various measures have been proposed before to improve DNS security.
H: You have measures such as DNSSEC, which provides authentication but not confidentiality, and measures such as DNSCrypt, which allows encryption but did not see much widespread adoption. Over the last couple of years, the protocols DNS over TLS and DNS over HTTPS have been gaining traction.
H: The scenario we are looking at is an observer monitoring the connection between a client and a recursive resolver. The user visits a page, the observer extracts some features from the DNS-over-HTTPS (DoH) traffic, and the observer tries to guess which web page is being visited by the user.
H: We consider two adversary goals here, monitoring and censorship, and I'm going to speak about each of them. As I mentioned, the goal of a monitoring adversary is to look at the DoH traffic, extract some features, and try to identify the web page visited by the user. For this we built a classifier based on size and directionality features of the DoH traffic. I won't go into the details of the classifier — they're available in our paper — but we conducted two experiments.
H: In the first experiment, we considered a case where the adversary knows the entire set of pages that a user visits, and the goal of the adversary is to identify which particular web page was visited. In this experiment we considered a set of 1,500 web pages, so a random classifier would guess the right page with probability one in 1,500. What we see is that our classifier gets about 90% precision and recall, where precision is a measure of the correctness of the results
H
That
word
is
turned
by
the
classifier
and
recall,
says
how
many
relevant
results
were
returned.
So
when
the
classifier
has
a
high
precision
and
recall
score,
this
means
that
not
only
did
the
classifier
identify
a
large
number
of
web
pages,
it
did
so
correctly
in
the
second
experiment.
We
consider
a
bit
more
realistic
scenario
where
an
adversary
does
not
know
all
the
set
of
web
pages
that
are
visited
by
the
user.
H
Rather,
the
adversary
has
has
is
interested
in
a
subset
of
the
webpages
called
the
monitored
set,
and
the
goal
of
the
adversity
is
to
determine
whether
the
user
visited
a
page
in
this
monitored
setting.
So
for
this
experiment,
we
looked
at
a
set
of
5,000
web
pages,
where
1%
of
the
web
pages
were
in
the
monitored
set.
This
is
generally
a
harder
classification
problem
in
the
area
of
website
fingerprinting,
and
we
see
that
we
get
a
lower
precision
recall
score
of
about
70%,
but
this
is
still
much
higher
than
a
random
case.
H: The second goal we considered was censorship, and we did a preliminary analysis of a censoring adversary. The goal of the censoring adversary is slightly different: the idea is to identify a web page as fast as possible and then try to block the connection. This means that the entire DoH trace will not be available to the adversary; the adversary has to work with partial traces.
H
So
what
we
found
out
in
our
analysis
is
that
generally,
the
fourth
TLS
record,
usually
curls
ons
to
the
first
doe
query
in
our
traces
and
the
size
of
the
TLS
records,
also
has
connection
to
the
length
of
the
domain
name.
So
this
means
that
one
strategy
that
an
adversary
could
follow
would
be
to
block
on
the
first
query,
this
means
that
the
user
will
not
be
able
to
load
the
page.
H
The
disadvantage
of
this
method
is
that
it
could
result
in
high
collateral
damage,
because
other
pages
with
the
same
domain
length
could
also
be
blocked
as
a
result
of
the
strategy.
Another
thing
that
we
found
out
was
that
by
the
fifteenth
record,
or
so,
which
corresponds
to
approximately
15%
of
a
trace
length,
most
of
the
traces
in
our
set
were
distinguishable,
so
the
adversary
could
follow
a
strategy
where
they
try
to
block
after
having
a
high
confidence
that
this
is
the
trace
that
they
want
to
block.
H: We also did a number of experiments on the robustness of the attack — by which I mean: when different aspects of the experimental scenario change, how does the adversary keep good classifier performance? For example, DNS traces can vary over time, so how often does the adversary have to retrain the classifier? We also wanted to see the effect of client location on the classifier, as well as changes in infrastructure. By infrastructure I mean we changed the resolver: we experimented with Cloudflare's and Google's resolvers.
H
We
looked
at
cloud
flat,
standalone,
doe
client
as
well
as
Firefox's
in
Bill
client.
And
finally,
we
did
some
analysis
of
desktop
versus
Raspberry
Pi
environments
and
the
main
takeaway
is
that
for
best
performance.
Ideally,
you
would
train
a
classifier
that
is
tailor
to
that
particular
scenario.
If
you
use
a
classifier
trained
on
one
set
of
parameters,
the
classification
performance
does
drop
when
you
test
on
another
set,
but
it
does
not
stop
the
attack.
H
So
what
we
saw
from
my
initial
set
of
experiments
are
that
monitoring
and
censorship
are
feasible,
even
when
DNS
traffic
is
encrypted.
So
we
looked
at
countermeasures
to
prevent
traffic
analysis
attacks.
So
the
first
thing
we
looked
at
was
edn
airspace
countermeasures
where
a
DNS
is
extension
mechanisms
for
DNS,
so
one
of
the
options
in
a
DNS
is
a
padding
option
which
allows
you
to
add
some
padding
to
DNS
queries
and
responses,
and
the
idea
behind
this
is
that
you
remove
the
size
information
that
is
available
for
the
classifier
to
distinguish
web
pages.
H
So
the
first
thing
that
we
did
was
we
implemented
padding
of
DNS
queries,
so
we
used
cloud
flash
stand-alone,
doe
client
for
this,
and
we
implemented
a
recommended
padding
strategy.
So
the
RFC
there
has
padding
strategies
and
the
recommended
one
is
to
pad
queries
to
multiples
of
128
bytes,
which
is
what
we
implemented
on
cloud
flask
line.
H: We had also contacted Cloudflare with the initial set of results, and they implemented padding of responses on their resolver. However, they padded their responses to multiples of 128 bytes, whereas the recommended strategy is to pad them to multiples of 468 bytes, so we decided to compare both strategies as well.
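The block-padding arithmetic behind these strategies is simple enough to sketch. The example message sizes below are invented for illustration; the block sizes (128 bytes for queries, 128 vs. 468 bytes for responses) are the ones discussed in the talk.

```python
import math

def padded_length(msg_len, block):
    """DNS message size after EDNS(0) padding to a multiple of `block`.

    Simplification: the few bytes of the padding option's own header
    are ignored; only the rounding-up behavior is shown.
    """
    return block * math.ceil(msg_len / block)

# Query padding (multiples of 128 bytes) hides the domain-name
# length inside a 128-byte bucket:
assert padded_length(47, 128) == 128
assert padded_length(130, 128) == 256
# Response padding at the two block sizes being compared:
assert padded_length(500, 128) == 512   # 128-byte blocks
assert padded_length(500, 468) == 936   # recommended 468-byte blocks
```

Larger blocks put more distinct message sizes into the same bucket, which is why the 468-byte response strategy resists the classifier better than 128-byte blocks — at the cost of more overhead.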
H: So the experiments we ran were the two EDNS-based measures I just described. We also looked at a case we called constant padding: a simulated scenario where we wanted to see, if we had perfect padding —
H
That
is,
if
all
the
TLS
records
work
to
the
same
size,
how
the
classifier
would
perform.
So
we
basically
set
all
the
sizes
to
the
maximum
possible
value
that
we
saw
in
our
trace
and
apply
the
classifier
to
this
case.
And
finally,
CloudFlare
has
a
dns
over
tall
service
where
DNS
queries
and
responses
are
sent
over
the
Tor
network.
H
So
we
decided
to
experiment
with
this
service
as
well
to
see
how
anonymous
communication
acts
as
a
defense,
so
this
table
outlines
the
performance
of
the
classifier,
just
note
that
the
values
are
as
decimals,
not
as
percentages
here
for
for
comparison
without
any
counter
measure
the
classify
attains
about
90%,
precision
and
recall.
We
see
that
with
a
DNS
with
the
current
CloudFlare
strategy,
precision
and
recall
drops
to
only
about
70%
with
the
recommended
padding
strategy
it
drops
to
about
45%.
H
Both
these
values
are
much
higher
than
a
random
case,
which
shows
that
a
DNS
based
measures
and
do
not
eliminate
traffic
analysis.
If
you
look
at
constant
padding
it's
about
7%,
so
there
is
a
major
drop
in
the
performance
and
DNS
over
tor
achieves
the
best
results
with
about
3.5%
precision
recall
we
also
looked
at
the
overhead
in
terms
of
amount
of
additional
traffic
that
is
added
by
these
counter
measures.
H
So
we
did
a
very
short
experiment
where
we
took
50
web
pages
and
about
6
samples
per
web
page
and
applied
each
of
the
counter
measures
and
looked
at
the
total
volume
that
is
sent
and
received.
Bytes
of
the
TLS
records.
Just
note
that
the
y-axis
is
in
log
scale
here
we
see
that
the
e
DNS
face
measures
as
expected,
do
not
add
much
overhead.
Our
constant
padding
adds
a
lot
of
overhead
because
we
are
padding
everything
to
the
maximum
size
and
DNS
over
tor
is
somewhere
in
between.
H
So
we
see
that
tor
is
a
effective
defense
for
the
traffic
analysis,
attacks
and
the
reason
so
that
interview
the
data
sent
in
fixed
cell
sizes,
which
reduces
the
variability
of
sizes
and
of
exercise
related
features
of
the
classifier.
Another
thing
is
that
there's
repackage
ization,
and
by
that
we
mean
the
data,
can
be
merged
or
bundled
together
in
tall,
and
this
affects
the
directionality
features
we
look
at
which
records
have
been
sent
and
received,
and
you,
when
thought
does
this,
this
affects
the
directionality
features
used
by
the
classifier.
H
One
thing
that
we
are
not
been
able
to
explain
the
clusters
that
we
saw
in
the
confusion
graph,
so
this
graph
shows
web
pages
that
have
been
mislabeled
as
one
another,
and
what
we
saw
is
that
web
pages
generally
tend
to
be
clustered
where
web
pages
within
the
same
cluster
and
to
be
misclassified
as
one
another.
This
means
that
the
anonymity
set
for
a
particular
web
page
is
not
the
entire
set
of
web
pages
in
the
test,
but
rather
only
the
web
pages
within
a
particular
cluster.
H: As for future work: first, our experiments looked at the case where a user visits one page after another, which is not exactly a realistic user scenario. So we are considering the case where a user has multiple tabs open, which results in some interleaving of the DoH traffic; our initial results show the classifier gets about 40% precision/recall with two tabs. Another thing is that we currently consider the case where there is no caching of DNS records.
H
So
we
want
to
study
the
impact
of
caching
as
well
on
the
classifier.
We
also
started
doing
a
comparison
with
DNS,
so
a
TLS
traffic,
and
we
looked
at
the
padded
DNS,
so
a
TLS
traffic,
and
we
see
that
it
is
much
more
resistant
to
the
classify
it's
about
28%
as
compared
to
dough.
So
we
have
started
doing
or
feature
analysis
to
see
why
this
is
the
case.
H
And
finally,
we
want
to
see
if
we
can
have
counter
measures
which
include
both
padding
and
rhe
packetization,
but
without
tors
overheads
and
latency
and
volume
caused
by
headers.
So
this
is
basically
the
summary
that
currently
proposed.
Ddns
measures
might
not
be
enough
to
prevent
the
traffic
analysis,
and
these
are
some
links
to
our
paper.
Thanks.
K: I wanted to encourage this kind of work, so I'm really happy that you've done it, and it's great to see the results. The way we arrived at the recommended padding policy was basically to look at individual DNS queries and responses, and I like what you've done here, which is to look at them in combination. So I have a couple of questions — maybe you can give me your intuition. I think what you're saying with the constant padding arrangement is that the…
H: Yes — there are a couple of things. It also depends on the kind of features we used. One of the things we looked at: if you have a trace, we looked at whether each TLS record was going from client to resolver or from resolver to client, so you have a sequence of directions. Even if you remove the sizes, you still have this directionality as a feature.
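The directionality feature just described can be sketched as follows. The trace below is invented for illustration; the point is that even perfectly padded sizes leave the query/response ordering intact as a signal.

```python
def direction_sequence(records):
    """Reduce a DoH trace to directionality-only features.

    records: list of (size, direction) pairs, where direction is
    'out' (client -> resolver) or 'in' (resolver -> client).
    With sizes padded away, this +1/-1 sequence is what remains
    for the classifier.
    """
    return [1 if direction == "out" else -1 for _, direction in records]

# Two queries back-to-back before a response show up in the pattern
# regardless of how the record sizes were padded:
trace = [(128, "out"), (468, "in"), (128, "out"), (128, "out"), (936, "in")]
assert direction_sequence(trace) == [1, -1, 1, 1, -1]
```

This is why Tor's repacketization matters as a defense: merging records changes the direction sequence itself, not just the sizes.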
M: (inaudible), very cool work, I really appreciate it. One of my questions has already been discussed, but the other one: our hypothesis was that if you mix DoH traffic with normal web traffic, that would obscure the signature of the DNS traffic a lot more. Is that something you considered studying as well?

H: Yes.
H: I think it might change the results, especially depending on how the classifier has been trained. If it's trained on traces where individual queries are in individual records, then a mix of traffic of course changes the pattern of the traffic, so I think it would affect the performance of the classifier. Yes — thank you.
Q: Nice work again. Did you consider emulating the behavior of a national security agency, or an intelligence agency — that is, you have a set of websites that you know they don't want you to go to, you try to fingerprint them with whatever your method is, taking domain names from various lists or entire DNS zones, and you see whether, in an extreme case like that, you could detect users going to those allegedly forbidden web sites?
H: We did an initial feature analysis, and we did consider inter-packet arrival times. At the time we didn't see much gain — I mean, timing did increase the precision and recall, but not by much — and we generally found that using timing as a feature can be complicated, because it depends on the position of the adversary: whether the adversary is located on a router, or whether you're doing the measurements right on the client or not.
A
It was presented at the ANRW workshop earlier this week. So, just to get right into it: as you may know, and as was just described, there is a recent shift in focus on privacy in the IETF and IRTF in general. In particular, we're trying to protect what resources, what applications, or what services particular clients are using; and the converse, the opposite side of that, is that we're trying to protect who is accessing these particular resources and using these services and connections.
A
Clearly the former is easier than the latter; or rather, the former is harder than the latter, because we currently have very distinguishing identifiers, like IP addresses and other things that are in client software. So that's what I'm going to focus on, and the IETF has been doing a lot of work to push in that direction, in particular rolling out DoH, DoT, and encrypted SNI.
A
It can observe all packets between the client and the server, and the goal is pretty simple: it just wants to learn some information about that particular connection, be it what resource was actually requested, or perhaps some metadata about that resource (referring back to the paper, say, what HTTP method was used or sent over this particular connection), and optionally,
A
they may also want to link this back to a particular client. I should also note that in the real world this is generally assumed to be studied in what we call the open world model, where you train your classifier, or whatever it is you're doing, and do your initial preliminary assessment on a fixed set of connections; but what the actual clients do in the wild is sort of unpredictable, and so you're not constrained to
A
what you trained your initial experiments on. The closed world model, which is generally considered to be much easier in this particular problem space, and which is what we actually studied for this particular work, of course gives better results for classification and identification; but ultimately the goal was to focus on the open world. And as was presented earlier, there are lots of different features available when you want to do this sort of identification. In particular, you can look at network addresses,
A
you can look at packet timing and sizes and all these things, or even the cleartext information that was previously available, or is currently available, depending on what client software you're running. In this work, though, we're assuming that the things that should obviously be encrypted, like DNS and SNI, are encrypted. We're assuming that the adversary is not looking at packet timing and sizes, but is strictly looking at network addresses to try and figure out what particular website or what service a particular connection is trying to access.
A
This is to kind of support the claim that it becomes increasingly difficult as you take away the cleartext information on the wire. So in the current state, assuming you're running perhaps some older software and resolving, or opening up a particular HTTPS connection: I have a client who sends a cleartext query to a DNS resolver and gets back an answer in cleartext. The adversary in the middle obviously knows where they're going, because they see both the query and the address; everything's fine, at least from the adversary's perspective. And then the client opens up a TCP connection and a TLS connection,
A
everything again is exchanged in the clear, and ultimately the resource that they're after is encrypted. But depending on the adversary model, they've already learned exactly what application the client is after, so it could potentially be game over at that particular point, especially if the goal is to censor based on that signal. But if you add DoH, DoT, and ESNI into the mix, then lots more things are encrypted.
A
For a particular page, you do the resolution for all of those names, the top-level domain and all the subresource URLs using CDNs, collect all this data, and then try to see how unique these IP addresses are; fairly straightforward for this particular experiment. And if you look at the anonymity set that results from that particular experiment, the data kind of suggests that basically there are many
A
of what we call unique IP addresses: those that fall into an anonymity set of size one or two. You may think this is perhaps uncommon as we move towards a world in which the big cloud providers host a lot of applications, but still there is a lot of legacy infrastructure out there:
A
there are older servers that run from behind a couch, or have a unique IP address, such that simply looking at the address can reveal exactly where you're going. So that's not great. But again, take this with a grain of salt: this particular data is collected from closed world environments, so it would become harder to identify this set in the open world. As we were saying earlier, if you want to identify, for example, what actual page a client is attempting to access,
A
perhaps the logical thing would be to look at not just a single connection that's initiated when loading that page, but rather the set of connections that are initiated when loading that page and all of its sub-resources. So as the on-path adversary you see things like the DNS query patterns, you see the TLS and TCP connection patterns, and the set of these things should, in theory, be sufficient, or at least more unique than each one on its own.
A
As an example, if you loaded nytimes.com in Safari, you would see many, many TLS connections kicked off, and many, many DNS requests sent over DoH or DoT, and the union of that set is what we're using as the fingerprint for a particular page load. So the privacy of a page load then considers the same exact adversary; it just assumes that the adversary is able to bucketize or group
A
these connections from a client into a single event, and then use that to make a determination as to how unique a particular connection is, and perhaps use that uniqueness to associate it with known top-level domains. So the same features are available; we're just expanding the scope a little bit here. And that pretty much describes or shows what I was describing:
A
you go from a single connection to multiple connections, and you look at sums instead of individuals; very straightforward, and otherwise the same thing. So the experiments that were discussed in the paper didn't focus on the very large IP address set or domain set that was used for the single-connection IP address anonymity experiment, but rather just the top one million. We loaded them using a crawler and then computed some basic statistics, for example: how many unique URLs are referenced upon loading each individual page?
A
How many different domains does that kick off underneath the hood, to see how many connections you're making? And the results of doing that, looking at the number of unique IP addresses, or the number of unique page load fingerprints that came out of it, are basically here, so you can see:
A
if you look at the x axis, the anonymity set size significantly shifts to the left, basically suggesting that by looking at the sum of these connections and grouping them into individual page loads, the uniqueness goes up, which matches our intuition. Again, closed world versus open world, so this could still be improved.
A
So the conclusions we kind of draw from this very, very preliminary research, which is still ongoing, are that clearly we need some sort of encryption of the obviously cleartext protocols to get some notion of connection privacy, and we need some notion of connection privacy to get some notion of page load privacy; and I think perhaps that's the ultimate goal of a lot of these things. I mean, encrypted SNI and DoH and
A
DoT are great in that they're focusing specifically on connection privacy, but perhaps there's more that could be done to get towards the larger, bigger picture that we're trying to protect. There are a lot of related issues here, and things that were potentially not considered in this experiment. In particular, if you have a client that's doing Happy Eyeballs to race connections across address families, or even across interfaces, that might be worse in some ways, because you're simply giving the network more information about where you're trying to go.
A
So it is great for performance, as clearly demonstrated by all the clients that are implementing it and the benefits that it brings; however, from this particular perspective it might make things a little bit easier for the adversary, which is not necessarily great. On the plus side, the things that we're doing in HTTPbis to coalesce connections with secondary certificates are great, because that potentially shoves more requests along a single connection, effectively removing information from page load fingerprints that would have otherwise
A
spun up new connections and perhaps added to the amount of uniqueness that exists for a particular page fingerprint. Consolidation within a single CDN also helps, because you have single connections: basically, clients are tethering to the CDN and then sending all their connections and all their requests over it.
A
You can no longer just identify a particular service by a specific IP address. I mean, you could do fancy things by potentially trying to identify the ASN to which an IP address belongs, associating the IP addresses of particular connections with ASNs, and then looking at the union, the set of ASNs that result from a particular page load, and using that to identify it; but that has not been done yet.
A
It's also worth looking at how bad traffic analysis is for non-Tor connections. Of course, much of this research is done on Tor, but it applies equally well, if not more, to non-Tor connections, which don't do things like fixed cell padding to make traffic analysis just a little bit harder. So I think the next steps for this are to really encourage people to take this problem a bit more seriously, if they're not already, and that means asking people to do more research in this area.
A
I mean, Nikita and others are obviously doing it; we should encourage them to keep going. I think documenting the known research that's been done in the area is also quite useful. The reason for having a single reference is that we can use it to assess countermeasures, or to sanity check, to make sure you don't just relearn something that has
A
actually, indeed, been done. And then perhaps the IRTF or the IETF can work with the people who are actively working on these problems to develop mitigations, and see what's effective from a cost and performance perspective. So, for example, in the previous presentation we saw good results from sending DNS over Tor, but there's a performance hit there. Finding the right trade-off is difficult, and so perhaps that's something we should implore the IETF and IRTF to be working on.
R
Hey Chris, just a quick thing: can you go back to the second graph that you showed us, with the buckets? From this it looks like an anonymity set of two is two orders of magnitude smaller than an anonymity set of one. So if we sum up everything, that means that something like 95% of web sites can be identified? Yes.
A
No, no; just that DNS-based load balancers might give you different IP addresses if you try to resolve the same name over and over again. So a simple, naive approach of trying to map a name to an IP address and looking at the first result that comes back might not always work, because the answer will change.
M
All right, my name is Roland van Rijswijk-Deij; I work for NLnet Labs. This is joint work with my master's student and with colleagues from SURFnet and the Technical University of Eindhoven. Right, so it feels like we're repeating ourselves here, because I think somebody had a similar slide, and the person before me had a similar slide: the IETF is focusing on protecting people's privacy, which is great. We have things like DPRIVE; we have DNS over TLS.
M
But if you look at the domain name system, there is a sort of obvious elephant in the room, which is the operator of the resolver: even if you protect all the traffic in flight, they still have access to your traffic. And they might actually have legitimate reasons for looking at it. If you run an enterprise network, you may want to use DNS to detect indicators of compromise, people that are infected with malware on your network, or you may want to be able to monitor for threats in large user bases.
M
For example, Quad9 does that: they have a very strict privacy policy, but they still inspect traffic in order to detect malicious behavior in the larger population that uses their resolver. So it is probably too easy to say they shouldn't do this. What we wondered is: can we find a better way of doing that, one that provides some privacy guarantees for users while we're inspecting traffic on the resolver itself?
M
So what we did is: my master's student developed a potential solution for this, which uses something called Bloom filters, which some of you may be familiar with, but I'll explain them in the next couple of slides. More importantly, we have a working prototype, which is open source; the URL is on one of the last slides if you want to have a play with it. It's very much a prototype, but it gives you some idea of what it does, and we tested this in production at SURFnet.
M
That's the national research and education network in the Netherlands, on resolvers that have a client base of roughly 200 to 250 thousand users. Right, so first up, Bloom filters. Bloom filters were developed in the 1970s; basically, they are a method to speed up database lookups, and they are a highly efficient mechanism: insertion and lookup are roughly order one. Basically, you can think of a Bloom filter as a probabilistic set.
M
An element may or may not have been inserted into the filter, and if you query the filter, then: if the filter says it's not in there, it is guaranteed not to be a member of the set; if it says yes, this is a member of the filter, then there is a small probability that this is a false positive, but it's highly likely that that particular element is in the set.
M
So how does this work? You take, for example, a domain name; you run that through a set of hash functions and you get some output; this is actually the SHA-256 hash of that particular domain name. You use parts of the hash, or the outputs of multiple hash functions, as indices into a bit array, and then you flip the bits at those indices to one. So on the left-hand side you see insertions: we insert example.com and example.org, as an example.
M
Now the idea that we had was that what you could do is take all of the queries from your clients and insert these into a Bloom filter. And this is actually a methodology that's already used to find things like newly observed domains; for example, I think PowerDNS has an implementation that does that. But what we wanted to do was use this information to check
M
if a name was queried for, while not being interested in whom that name was queried by, or exactly when. What we want to do with this is perform network-level threat monitoring. So we want to say: we have a domain name that we know is malicious; we want to be able to say, within a certain time frame, did anybody ever query that name in our network? And Bloom filters give you some nice properties for that, because they are non-enumerable: as soon as I've inserted
M
something, I have no clue what I inserted anymore, because it's turned into a few random bits that I flip in a filter. If I mix queries from lots of users into a single filter, it becomes really hard to distinguish queries from individuals. And what is also interesting is that, due to the mathematical properties of Bloom filters, we can actually take multiple filters and combine them into a single filter, which has a higher false positive probability but contains more data, and thus anonymizes even more.
M
So what the prototype that my student developed does is: it has sort of an auto-tuning mode. You can run it against your resolver for, say, a few days, to see what your query pattern looks like, and then it will suggest Bloom filter parameters that you can set in order to get a certain false positive probability.
M
So we, for example, wanted to do hourly filters in our experiment, and we wanted to be able to aggregate these into a single day; that is, when we do aggregation we want to combine 24 hourly filters into a single filter for a day, and we would like to have a false positive probability of one in 100,000 for the daily filters. And what the graph shows you is that it's important to do the tuning, because the number of distinct query names that you see in a day on a resolver is quite different from what you see on an hourly basis.
M
So what do we put in there? What we wanted to be able to do: SURFnet is a large research network; it has many universities connected, and it has schools connected to it too. We wanted to be able to distinguish queries from different institutions, but not from different users. So the things that we insert in the filter are on this slide: if we have, say, evildomain.com, then we will insert all of the labels, but also "organisation-A@evildomain.com" for the specific network that the query came from. This means that if we get an indicator of compromise and we want to work out whether that indicator of compromise was ever seen on the network, we can hold it against the Bloom filter, which tells us if it was seen, and then we can sort of enumerate over all of the institutions that are in there and figure out which institution sent us that query.
M
First, a little bit about the predicted versus the actual false positive rate for the filters. We ran auto-tuning for a week before we did our experiment, and we chose as filter parameters that we would set 10 indices for every query that we get, and the filter size was 491 megabits, which is roughly 59 megabytes in memory; so it's actually quite reasonable
M
if you compare it to the resolver cache on that machine, which was about two gigs. So our goal was to keep the daily false positive rate below one in a thousand, and of course we had to estimate the number of elements that we would insert, a little bit of hand-waving; but after you've used a filter, you can calculate the actual false positive probability. The formula is given there; it's explained in the paper, and forgive me, I can't remember what s is. But the graphs show you the result.
M
The black line is ten to the minus three, so that's one in a thousand. The graphs are a bit confusing: anything above the red line means we had a lower false positive probability than we actually set. So the takeaway from this is that we actually had a very good false positive rate on the Bloom filters, and the same goes for the hourly filters, because we used the same parameters for the hourly and the daily filters; otherwise we can't aggregate them.
M
Now, one of the things that we tested this with is the National Detection Network, and this is something that is managed by our government, the Dutch National Cyber Security Centre. What they have is a system that runs MISP, the malware information sharing platform, I think the acronym is, and what they put in there are high-value indicators of compromise, for example indicators of compromise from the intelligence services. Now, a condition for participating in this National Detection Network
M
is that you don't just take data from it, but you also put data back into it. So say you get an indicator that there is some malware active; what they want to know is, how does this affect your community? Because their goal is to figure out how society is impacted by these threats. And of course SURFnet wanted to participate in that, but they didn't want to monitor all of their individual users;
M
they didn't want to throw away the privacy of all users in order to participate in something like this. So this particular solution that we implemented was very interesting for them, because it allowed them to take the indicators of compromise, hold them against the Bloom filters, and figure out if there are hits. They could report those back to the National Detection Network, but they also got some indication of what threats there were in the network. The graph shows you the number of threats that occurred on a daily basis.
M
So it's not a huge number of threats, on the order of 40 to 50 unique threats per day. But the interesting thing about this is that SURFnet's privacy policy prevented them from monitoring individual queries, so they couldn't do this before; and now that they had the Bloom filter solution, they could do these lookups and actually work out whether these threats were occurring in the network. And we found an actual compromise, which was a WannaCry-infected machine.
M
So, rather than doing blanket surveillance, we can look for a specific threat and then see who is infected and chase up that machine. This is much less invasive for users than just monitoring all of their traffic and telling them, no, we're doing this for your own good. In this case we can make a balanced decision whether or not to monitor for something. Some other benefits: no personal data is stored, of course, because we don't retain any IP addresses.
M
This is aggregated at the individual institution level, but it means that you can retain this data much longer, which allows you to do historical lookups, which we think is very interesting. Think back to, for example, the WannaCry case, where you could recognize that this threat was present on your network because it would do certain DNS queries: if we had had this running at the time, before that threat existed, then as soon as the malware researchers discovered that particular query, we could have gone back in time and worked it out.
M
Another thing, and this is something I as a researcher find very interesting, is that you could share this data with third parties without disclosing PII to them, and then they could, for example, take filters that are collected in different networks and do co-occurrence analysis of queries: if I found this query that I think is malicious, which other networks did I see it in, and was that roughly at the same time?
M
And the final thing is that, as a nice side effect of how Bloom filters work, you can do cardinality estimates of the number of distinct queries that are in there; you can estimate that because it's related to how HyperLogLog works, if you're familiar with that. Right, so the prototype code has been released as open source; the URL is on the slide. SURFnet, where we trialled this, is planning to take this into production, because they want to use it for their CERT team; and at NLnet Labs
M
we're creating the tools to integrate this into our open source products, in particular our resolver product Unbound. The goal of that is, again, to release this as open source software and make it really easy to deploy, so that if you want to do this kind of network monitoring, you at least now have a more privacy-friendly solution available than just blanket recording of every query that all of your users make. And I hope it's a little bit of proof that security and privacy can go hand in hand. With that, this is the paper.
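The cardinality estimate mentioned in the talk, recovering the number of distinct inserted names from how full the filter is, is commonly computed as n ≈ -(m/k) · ln(1 - X/m), where X is the number of set bits. A small self-contained sketch with made-up parameters (deliberately tiny so it runs quickly); this is the standard estimator, not necessarily the prototype's exact code:

```python
import hashlib
import math

M, K = 1 << 14, 5  # small illustrative filter: 16384 bits, 5 indices

def indices(name):
    d = hashlib.sha256(name.encode()).digest()
    return [int.from_bytes(d[4 * i:4 * i + 4], "big") % M for i in range(K)]

bits = bytearray(M)
for i in range(300):  # insert 300 distinct names
    for idx in indices(f"name-{i}.example"):
        bits[idx] = 1

X = sum(bits)  # number of set bits
estimate = -(M / K) * math.log(1 - X / M)
print(round(estimate))  # expected to land close to 300
```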
M
So, for the National Detection Network: SURFnet is certainly not sharing the whole Bloom filter with the NCSC. What they are sharing with the NCSC is the detections that they do themselves. So say I have a threat and I detect it; we report back that we saw this threat in universities, or in, I don't know, schools for vocational education. But of course it's not the Bloom filter they're sharing; what they're sharing is whether or not a threat was detected.
S
Don't you have to take the coincidence of the Bloom filters to find out if the one filter had the vulnerabilities, the attacks, that were represented in the other? I'm not understanding, sorry. Okay, I can't hear you; can you step closer to the mic? Oh, I was just saying: isn't it the coincidence of the Bloom filters that shows you there's evidence of those attacks or vulnerabilities in one set? Yeah, but so who's the one that's finding the intersections of the Bloom filters? Because that's who you're trusting.
M
The idea is that the network operator, in this case SURFnet, who participates in the National Detection Network, does the detections on their own Bloom filters, which they record on their own resolvers, and they just report infections back to the NCSC. They don't send the Bloom filters anywhere.
V
Hi, STP from the NCSC in the UK. Thank you for your presentation; I found it really interesting, and I think it's a good piece of research that needs to happen in this space. And, like you said, it's nice to see that security and privacy can go together. I just had a question on your previous slide, I think.
V
Oh, maybe it was a couple back, sorry. Yeah, about sharing the Bloom filter with third parties: in this case I would just be concerned about the potential leg-up that you would give a threat actor. In knowing that their domain or whatever was noticed, they could just change their techniques. Threat intelligence is shared under the TLP protocol at amber, and, just to say, when you share it with researchers, I'd worry about giving it to any academic that, you know, fancies it.
M
Okay, so of course, when I say sharing with third parties, it should have a little asterisk that says "under certain conditions", right? I am an academic researcher, so I don't worry so much about sharing this with academic researchers; I mean, trust me. But you do have a point: you want to have certain safeguards in place before you share this kind of information.
M
I guess it also depends on the network that you collect this information in. SURFnet is a research network, so one of its goals is also to do research on the network itself. If you share that with researchers that are within, say, their constituency, then there could be good conditions for doing that. Actually, SURFnet has a data sharing policy that lists conditions under which this kind of data can be shared with researchers.
M
This was sort of easy to understand, and her intuition was that, for example, this would not be subject to the GDPR, because of the nature of the data that gets put in; and that meant that, under certain conditions, we could share the information, even though there is some privacy risk: you can just try sending whatever question you have to the filter and figure out if something's in there. Does that sort of answer your question? Sort of. Okay, any other questions?
W
So, essentially, we work on a set of three documents. These three documents used to be a single large document, and based on the advice of a number of groups we split that document into three. The first document is "Numeric IDs: A History", in which what we try to do is cover the timeline of some old numeric identifiers; this document targets PEARG. The second is about proposing algorithms to generate these numeric identifiers.
W
So, the first document, the numeric IDs history: essentially, we cover some sample numeric identifiers, IPv6 interface identifiers, IPv4 and IPv6 fragment identifiers, and so on, and what we tried to do with this work is essentially to illustrate how we faced the same problem over and over again; sometimes for the same identifiers in different protocols, like fragment identifiers in IPv4 and IPv6, and sometimes the same underlying problem but for different protocols.
W
This document, again, targets PEARG. Next slide, please. The second document is a little bit more complex: essentially, what we try to do is categorize numeric identifiers based on their interoperability requirements and their failure modes, interoperability requirements that, you know, can become quite different based on the protocol.
B
And I think the third draft is under consideration for AD sponsorship by Ben Kaduk. We spoke to the security AD and also to the author of the drafts, and it seems like the first two drafts, history and generation, are in scope for, and could benefit from, coming to PEARG, and we were thinking of adopting them. So, do people have opinions?
X
I'm David Oliver with the Guardian Project, and I'll talk about an Internet-Draft we've submitted recently on enabling network traffic obfuscation via Pluggable Transports. So, first of all, what are Pluggable Transports? A mechanism for enabling the rapid development and deployment of network obfuscation techniques used to circumvent surveillance and censorship. The deep details about this work, which has been going on for some time now, are available at this URL.
X
The generalized architecture here is that there is a server exposing a public proxy that accepts connections from Pluggable Transport clients. The client transforms traffic before it hits the public Internet; the PT server reverses the transformation and then passes the traffic on to the server app. There's also an optional lightweight protocol to facilitate communicating connection metadata, when you're migrating between one connection type and another, for example.
X
The draft that we've put together is based on the Pluggable Transports 2.1 specification, which is the work of a fairly large and diverse community of people, and it has two subsets: one is what we call the transport API interface, and the other the dispatcher interface. The first is focused around an in-process, language-specific API
X
That's integrated directly into the client app on the client side and into the server app on the other side, and communication happens within the app; the way the app on both sides sees the pluggable transport is like a socket. The dispatcher API, by contrast, is to be used between processes: there's another process on each side that handles the work of obscuring and un-obscuring, so that the actual application doesn't have to deal with that aspect of the problem.
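A minimal sketch of the transport-API style of integration: the app links the transport in-process and sees it as a socket. The class and method names below are hypothetical stand-ins, not the PT 2.1 API, and the XOR transform is a toy placeholder for a real obfuscation technique.

```python
import itertools
import socket

KEY = b"illustrative-only"  # placeholder, not a real obfuscation key

def transform(data: bytes) -> bytes:
    # Toy reversible transform; applying it twice restores the input.
    return bytes(b ^ k for b, k in zip(data, itertools.cycle(KEY)))

class PTSocket:
    """Hypothetical in-process transport: looks like a socket to the app,
    but obfuscates on send and de-obfuscates on receive."""
    def __init__(self, sock: socket.socket):
        self._sock = sock

    def sendall(self, data: bytes) -> None:
        self._sock.sendall(transform(data))

    def recv(self, bufsize: int) -> bytes:
        return transform(self._sock.recv(bufsize))

# The app writes plaintext; only obfuscated bytes cross the "wire".
a, b = socket.socketpair()
PTSocket(a).sendall(b"hello")
raw = b.recv(1024)                 # what an on-path observer would see
assert raw != b"hello"
assert transform(raw) == b"hello"  # the peer's transport reverses it
```

The point of the socket-like surface is that the application code is unchanged: it calls the usual send and receive operations, and the transform happens inside the wrapper.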
X
The dispatcher here can be configured with different types of proxies, so you can have live at any moment different kinds of transports available that are connected to different kinds of proxies and can respond to different kinds of traffic. So this is sort of that architecture described in graphic form.
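The dispatcher pattern can be sketched as a separate forwarder that the app talks plain TCP to, with the transform applied in between. In a real deployment the dispatcher is its own process and typically exposes a proxy interface such as SOCKS; here threads, a loopback "wire", and the same toy self-inverse XOR transform stand in, and all names are invented for the example.

```python
import itertools
import socket
import threading

KEY = b"illustrative-only"  # toy placeholder, not a real key

def transform(data: bytes) -> bytes:
    # Self-inverse toy transform; a real transport would be stateful
    # and cryptographically sound.
    return bytes(b ^ k for b, k in zip(data, itertools.cycle(KEY)))

def pump(src: socket.socket, dst: socket.socket) -> None:
    """Copy one direction of a connection, transforming each chunk."""
    while chunk := src.recv(4096):
        dst.sendall(transform(chunk))

def dispatcher(listener: socket.socket, upstream) -> None:
    """Accept one app connection and bridge it over the transformed link."""
    conn, _ = listener.accept()
    link = socket.create_connection(upstream)
    threading.Thread(target=pump, args=(link, conn), daemon=True).start()
    pump(conn, link)

def listen_local() -> socket.socket:
    s = socket.socket()
    s.bind(("127.0.0.1", 0))
    s.listen(1)
    return s

# app -> client dispatcher -> (obfuscated wire) -> server dispatcher -> server app
server_app, srv_disp, cli_disp = listen_local(), listen_local(), listen_local()
threading.Thread(target=dispatcher,
                 args=(srv_disp, server_app.getsockname()), daemon=True).start()
threading.Thread(target=dispatcher,
                 args=(cli_disp, srv_disp.getsockname()), daemon=True).start()

app = socket.create_connection(cli_disp.getsockname())
app.sendall(b"hello")               # the app never sees the transform
peer, _ = server_app.accept()
assert peer.recv(1024) == b"hello"  # arrives de-obfuscated at the server app
```

Neither the client app nor the server app contains any obfuscation logic; each just talks plain TCP to its local dispatcher, which is the separation of concerns the dispatcher interface is meant to provide.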
X
The Internet-Draft also talks about what we learned by looking at the older specification, which was just the dispatcher API, the issues related to linking across languages, and also this idea of adapting back and forth between types of technique. And I think that's it. Questions?
X
Yeah, I guess I can give maybe my own personal response to this, because I'm not sure overall myself, but there's been a lot of work going on in this area, and in my mind it occurs in user space out of necessity, because we want these techniques to be adopted in a very rapid way. So it's unclear to me, or it needs to be debated, what sort of final position this has in some standardization work.
X
Should some of this stuff take place lower in the network stack or not? And if so, why, and if not, why, etcetera. But we do hope: this work took place for a long period of time only within sort of one small set of the community, and now we have a larger community that finds this interesting. So there seems to be a need for some bridge between some long-term standard and the current lack of unity on the topic. Thanks.
Z
Just a real quick overview, ten seconds. It orders the discussion about censorship around three things: prescription, that is, what do you want to block; identification, how do you technically identify those things that you'd like to block; and then interference, the actual performance of the blocking. There's a ton of small and medium issues that a bunch of people have identified in the reviews that are in our issue tracker, and we're gonna be going over those in the next couple of months and incorporating all that good feedback.
Z
And since this is possibly a research group draft, we're gonna actually try and do some outreach to areas that haven't gotten much review of this, so routing and DNS in the IETF, just to excerpt the chunk of it that's there and see if we get additional feedback, although some people, like Stéphane a long time ago, reviewed an earlier draft of this. The bigger issue that has been talked about on the list is this thing about mitigations.
Z
Some people feel that it would be a better draft if it actually, in addition to describing censorship techniques, included discussions of mitigations that may or may not be relevant for each technique. I was initially pretty reluctant to do that, just because it felt like it would blow up the draft, but I should say I can go either way on this, in the sense that I think pithy statements under each technique that describe some types of mitigations may work.
Z
I've forgotten his first name, I should know: Vittorio, Vittorio Bertola, said specifically that censorship may be too negative of a framing. I come from a digital rights background, where we talk about when someone's blocking something you want to get access to, that's censored, no matter what it is; but I'm happy to say "blocking techniques". Do you want to talk now? I'm almost done. There's a chunk about non-technical forms of prescription and interference, so ways of finding stuff and blocking it, like self-censorship or legal mechanisms and stuff like that. I'm not wedded to that stuff.
Z
It just felt like it would make the document sort of complete, and it's sort of the rubber-hose thing that might come into play in certain kinds of situations, and it's nice to have some rough description of what that rubber hose might look like. There's the bigger issue that this stuff moves really quickly; it may need to be a regularly updated draft. I know that that's a different discussion in other places; some people don't care about that, some people feel very strongly about that.
Z
It may be something we want to wait on for the living-standard discussion. What's the research content? You can think of it as a review article, a systematization of knowledge. There are some gaps in certain areas that we're working to fix, and then there's one other thing I wanted to say that's escaping my mind. Oh no.
D
Sorry for scrolling. So, first of all, okay, you should call it SoK-1 rather than RFC-something. Cool. More seriously, on the question of whether it should be, you know, an updated draft, I guess that's a question for you: how often do things change? If things change on the scale of every two years, then just, you know, publish this one and another one in a couple of years. If they change every two months, then probably that's a problem. So I think it's really a question for you rather than me.
L
Mallory Knodel, Article 19. Since we're solving all the bigger issues on the slide right now: for the first one, and maybe this is a question for you, Siobhan, it seems like in the charter you are open to talking about threats, not just mitigations, and I do think that censorship and privacy are two sides of the same coin. So, I don't know, maybe we can talk about that more, because I feel like it would fit within the charter; I was just rereading it before I came up to the mic, and I agree.
AA
Wendy Seltzer. Thanks for this work; I think the non-technical forms section is a valuable addition, because we're always thinking about where something is going to get routed if the technical means aren't available, and thinking about that sort of broad piece of the threat model is helpful for thinking about how effective anti-censorship mechanisms are. Thank you. Stefan?
E
Stéphane Bortzmeyer. Regarding the first point, yeah, the fact that the draft is still not published as an RFC means it's complicated to keep this sort of thing up to date. And I would vote against including mitigations, because a big problem is not only that they evolve very fast, but also that they can have consequences: bad advice can be really harmful for people, because it can, for instance, show that they tried to work around censorship, things like that. So it's much more touchy, so I suggest we stay with description only.
P
If you're still looking for inputs: absolutely, for the first one. I'm not sure how much beyond RFC 6973 we will be able to contribute, even if we go down the path of including mitigations, so, I mean, personally I'm fine with the contents as they are. On the second one, I don't think there's a need to change it. Okay, yeah, but maybe the title should reflect the scope, which is that this is not on online censorship in general; it's just Internet traffic and website censorship. So maybe just fix that.
A
Just to be clear on the question of how frequently the document will be updated: as you said, it does, you know, kind of bleed into the living-document issue, so we'll discuss with Colin what is, you know, a good strategy for dealing with that, should that option happen, but for now the normal thing will happen, I guess.
A
Yes, that's good. People, please read the drafts if you're interested, and, yeah, we'll ask on the list whether or not people are interested in adopting, and go from there. At this point, perhaps given the small number of people that have actually read it, doing a hum here is not the best thing, so we'll just take that one to the list. And with that, is there anything else? We can end a few minutes early. Thanks, everyone. Thanks.