From YouTube: IETF110-MAPRG-20210308-1430
Description: MAPRG meeting session at IETF 110, 2021/03/08 14:30
https://datatracker.ietf.org/meeting/110/proceedings/
A: There you go, all right, looks like we're all set. Thanks so much for testing that stuff out, and we're ready to get started. So I'm going to go back to sharing the intro.

A: Good morning or good afternoon or good evening, everyone. This is the Measurement and Analysis for Protocols Research Group meeting, online for IETF 110. Mirja, my co-chair, and I are both on here; Mirja is going to help with watching the Jabber channel and help keep me on track if I forget anything. You can reach us at maprg-chairs at ietf.org.
A: Here's the IRTF Note Well. It's essentially the same as the IETF Note Well, but be aware that speaking about your work here potentially has consequences having to do with intellectual property, so familiarize yourself with that. Here's the IRTF privacy policy and code of conduct: behave yourselves, as you always seem to, but do look at that.
A: If you need to know anything about the details, here's a slide about the goals of the IRTF, for those of you new to it, and I think maybe we have a couple of speakers that haven't visited us before. The IRTF conducts research; it's not bound by the same rules as the IETF, but we largely focus on research that has to do with IETF things, and we happen to be co-located, or coincidentally scheduled, with the IETF administrivia. You can find our charter online.

A: I usually search for maprg on Google, and it comes up there. We have a mailing list if you want to stick with us in the future. Today's slides are not at meeting 105, they're at meeting 110, so I'll fix that in the slides. The Meetecho link you've already got, and we're also on Jabber, so feel free to type things in the Jabber session, which should be on the left of your Meetecho screen.
A: If you would rather do that than speak out loud. Our agenda for today, once we get past these few minutes of intro, is: we'll switch to Paul Hoffman for about 10 minutes, talking about collecting typical domain names for web servers, and if you look at the agenda link on the website, the document, or the report, that Paul wrote, I think for ICANN, is listed there, the link to it. Then we're going to bring Viet up to present work that was presented at an academic conference.

A: The link to the full paper is also there, for a 15-minute research presentation, again about the DNS and TLS, and some measurement results. Then we'll switch to Phong for assessing the privacy benefits of domain name encryption; that is also, or is going to be, a published research paper. Maybe I've got it the other way around: one is upcoming and one has already been published.

A: The link to the paper is there. We'll spend a few minutes for a quick update from Robin Marx about qlog, and then we'll close out with Pete Heist in the last roughly 10 minutes. And if we get any time back along the way, we'll have, you know, more time for questions, but we're going to try to fit the questions and comments in between those, or in the time itself.

A: So that's the session for today. Any opening comments or questions or issues? All right. So I want...
B: ...to quickly thank Spencer and Oliver for note-taking. I found two note takers, so... oh yeah.
A: Awesome, awesome. Thank you so much. Okay, so we've got Paul up first. So, Paul, if you get in the queue to share your screen, I can approve it.
C: Oh wait, there's one, that's... yeah, application window.

C: I guess we always have to start an IETF meeting with hardware issues.

A: Would you like me to stop, you know...

A: I had to restart Chrome, let's see.
C: So the motivation for this work was that in the DNS, a lot of people, when they're doing research on, like, authoritative servers and such, want to collect data from a lot of authoritative servers (I'm going down and up again, but that happens), and typically what people do is use one of the lists of the most popular, you know, servers out there, like the Alexa list or the Tranco list and such like that. But those aren't typical, and the request from DPRIVE was...

C: Actually, we were trying to measure how long it takes to set up a TLS connection to a typical web server. It turns out we didn't end up using the research, but it was fun to do anyway.
C: So the kinds of lists that people use, like I say, are the most popular websites, and if you don't know about Tranco, you should definitely look at the old MAPRG slides on that. The other one that people use is to extract all of the gTLDs, the global, I'm sorry, generic top-level domains; those are available.

C: Those lists of domain names, however: a bazillion of those are just parked, so they're not really typical either. Or they get a dump from a passive DNS collection system, but that's not really typical, because those are often very local. That is, you know, for a recursive resolver in a particular region, it's going to be much more regional. Next slide.
C: So what I also just realized is that, wow, Wikipedia has links to a bazillion small websites. You know, if you're reading a Wikipedia page, you will see things like personal websites of academics, and lists of all of the elementary schools in a particular town in a country you've never visited, and such like that. And also, the nice thing is that Wikipedia is not just in English; it's in pretty much every spoken language in the world. So this is...

C: These are examples of the kinds of things that, as I was looking, I was like: oh, this is great. These are much more typical than a list of the largest websites around. You get governments of small cities, obscure support teams, things like that. So the question was: well, great, without scraping all of Wikipedia, could I get a sample of these? Next slide.
C: It turns out that they actually keep a database, twice a month, as backups: a public database. They want people backing up Wikipedia, basically to mirror it, not so that you're serving it, but in case they go down and such like that. And instead of them just having all of the pages as one giant hunk, they actually have the pages in some subsets, and one of them is the list of all the external links. So that was perfect.
C: I grabbed that for the research I did here; I did that at the beginning of this year. They update it, it seems, twice a month: on the first of the month and then sometime towards the middle. So: grab the database, extract all the external links (it's actually a database, a MySQL database), clean up the lists, because Wikipedia, being publicly edited, has lots of broken links, links that just, in fact, don't make sense. And that's okay, you know, not dunking on Wikipedia at all.

C: It's a lovely resource. I picked out all the ones that, in this case, were for HTTP or HTTPS, which turned out to be the vast majority of all of the links. Can you scroll down, or keep it as a full screen?
C: By the way... so, and then, because what I was looking at is just web servers, I didn't care about the full URL. I stripped off the scheme and everything after the domain name, which means I didn't know which ones had been listed as HTTP or HTTPS.
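The URL-to-domain reduction Paul describes could be sketched roughly like this (an illustrative Python sketch, not the actual code from his unpublicized repo; the function name and sample links are invented):

```python
from urllib.parse import urlsplit

def extract_domains(urls):
    """Keep only http/https links and reduce each URL to its bare host
    name: the scheme and everything after the domain name are dropped,
    so the result no longer records whether a link was HTTP or HTTPS."""
    domains = set()
    for url in urls:
        parts = urlsplit(url)
        if parts.scheme in ("http", "https") and parts.hostname:
            domains.add(parts.hostname.lower())
    return domains

links = [
    "https://example.com/wiki/page",
    "http://example.com/other",        # same host, different scheme
    "ftp://archive.example.org/file",  # non-HTTP link, dropped
    "https://city.example.net/schools",
]
print(sorted(extract_domains(links)))  # → ['city.example.net', 'example.com']
```

Collecting into a set also takes care of the deduplication step he mentions next.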
C: But that's okay, because one of the things we wanted to see is: if you're hitting a web server that actually doesn't have HTTPS, how long will it take for you to find out, and such like that. So what this gave me was a list of all the domain names for all the links, and then I had to cull the list, because of course there was a lot of duplication; a million things point back to sites like Google, or, for example, for a large university there might be...
C: So after I did all of that, and again I did this for January 1st: there were about 750 databases, and I got about 7.35 million unique domain names from the data set. So that was lovely, and way more than I wanted, so I took a random sample of a hundred thousand. The good news is: if you want millions of unique domain names to sample from, this works just great.

C: So I wanted to test a hundred thousand hosts, but many of the names in there are dead, so I actually started with over a hundred thousand and resolved them all.
C: So I found out which ones actually currently have an IPv4 address. I think I started with 150,000 and ended up with about 110,000, and then I did the actual analysis that I was doing, which was timing. But I also wanted to look, just because everyone in the IETF asks, at how we are doing with IPv6, and, you know, for those of us in the DNS world, at how many were DNSSEC signed. So I did that analysis as I was doing that; it's not important to this presentation.
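The oversample-then-resolve step could look roughly like this (a sketch with an injected toy resolver instead of real DNS, since the actual code isn't shown; the names and the 1.5x oversampling factor are illustrative assumptions):

```python
import random

def live_sample(candidates, resolve, target=100_000, seed=42):
    """Oversample candidate names, keep only those that resolve to an
    IPv4 address, and trim to the target size.

    `resolve` is any callable returning a list of A records (empty, or
    raising OSError, for dead names); in practice it would wrap a real
    stub resolver such as socket.getaddrinfo or dnspython.
    """
    rng = random.Random(seed)
    pool = rng.sample(candidates, min(len(candidates), int(target * 1.5)))
    alive = []
    for name in pool:
        try:
            if resolve(name):  # keep names that still have an A record
                alive.append(name)
        except OSError:
            continue
    return alive[:target]

# Toy resolver: pretend names ending in ".dead" have no A record.
fake_resolve = lambda name: [] if name.endswith(".dead") else ["192.0.2.1"]
names = [f"host{i}.example" for i in range(8)] + [f"gone{i}.dead" for i in range(4)]
print(live_sample(names, fake_resolve, target=6))
```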
C: We have our own series at ICANN called the OCTO series, which is the Office of the CTO, which is what I'm in, so there's an OCTO number for that. And I don't actually give the code in there because, as someone pointed out, maybe Wikipedia doesn't want everyone pulling down the databases all the time and such like that. But if you're interested in doing this work, please get in touch with me.

C: I do have a GitHub repo, which I just haven't widely publicized, for the steps I took, which is a bunch of, you know, vaguely commented Python code, but it actually is reproducible. So that's it.
C: I guess, for those of you who care about the DNS and IPv6: about 17 percent of those domain names that had a v4 address also had a v6 address. And for those of you who care about DNSSEC: approximately four percent of those domain names were DNSSEC signed. So, great. And I guess we're not doing questions, right? We're just moving on.
A: Yep. Excuse me for shutting the slide down, but I can't... no.

A: So, for participants: put yourself in the queue if you're interested in speaking, or submit something to Jabber, and we can relay it here. So we've got Alexander in the queue. Thank you.
E: Thanks, Paul. Alex Mayrhofer here. I was wondering: did you also consider Certificate Transparency as a source of domain names? Because I found that's very interesting as well.
A: Cool, I like that. It's always good to know about another data source that's differently biased than all the other ones that we use. I wonder, also, if it'd be interesting to see, after a time, how many stale names are in Wikipedia; they might be able to use that to drive some of their updates.
C: They actually don't do automatic updates, which I think is fine. I mean, they're really reliant on volunteers, and I think that's really good. And the other thing to note here is that this is not meant to be typical of the web data; it's more typical of, for example: all of us, probably, in the last month, have ordered out from a small local restaurant that had a website that is not in Wikipedia. There are probably millions of those. So...
F: Yeah, sure, so this is Oliver. I have a question. So basically, initially you started with the assumption that the top lists out there are not really representative. Did you also, basically, compare the data set that you gathered from Wikipedia?
C: Okay, yep. And so, if you look at the report, you'll see that. So, very briefly: IPv6 adoption was almost identical between the two.
C: And the DNSSEC adoption was actually... so, I remember I said it was four percent for this list; it's about two percent for the Tranco list. So that, to me, is a huge difference, one that shows that, in fact, well-managed popular websites are less likely to use DNSSEC than typical ones.
A: And Viet, you've got a 15-minute slot in this short one-hour meeting.

A: And I'll let you know when there's two minutes left or something.
G: Oh, okay, perfect. So hello, everyone. My name is Viet, and I've been honored to present our paper, called "Measuring DNS over TLS from the Edge," here at today's MAPRG meeting. So, as was mentioned, this paper, which was co-authored by Irina and Vaibhav, who have also joined the session, as I can see, will also be presented at PAM later this month, and you can already find a link to the PDF, and more details on the paper than are covered in this talk, on the GitHub repository linked here.
G: ...DoT, for short. So, during the talk, we will see that the adoption of DoT is still rather low, and that we have higher failure rates and response times for DoT, as one may expect, compared to traditional DNS. But we'll get back to these results at the end of the talk, so we will move on. As most of you likely also know, DoT was standardized in May 2016, so roughly five years ago, and its main feature is that it provides confidentiality by securing the DNS traffic within a TLS session between the DNS client and the resolver.
G: Now, previous measurement studies have already looked at different aspects of DoT and have measured DoT from different networks and vantage points; however, not from home networks, which is what we were interested in. And so we were wondering: how does DoT look from home networks? In order to find this out, we split our measurement study into two parts, and part one, in this case, is the adoption. Before looking into home networks in particular, we reproduced the first study that we just referenced, in which the authors scanned...
G: This was at roughly 0.15 percent. So we repeated this measurement approach and found that the number of open DoT resolvers has actually increased by roughly 23 percent over the course of nine months, and that the support for the new TLS 1.3 has also substantially increased among these open resolvers. Whereas, on the flip side, the support for older versions like TLS 1.0 or TLS 1.1 has been increasingly discontinued.
G: So, overall, we found some decent progress regarding the support of DoT and newer TLS versions, although overall the adoption is still somewhat low. Then, the second part of the methodology concerns the reliability and response times. We used the RIPE Atlas platform for the measurements, and this platform has been able to run DoT measurements since early 2018.
G: For these, it was not really clear, or we weren't really sure, whether they support DoT or not, so this ties back into the question of DoT adoption. And for those, we found that only 13 probes, of the more than 3,000 probes that we initially used, received a DoT response successfully from their local resolvers, which is only roughly 0.4 percent.

G: Nevertheless, overall we collected around 90 million DNS measurements in total, which we will have a look at in the upcoming slides.
G: The failure rate, in this case, is simply the number-of-failures column divided by the total-number-of-measurements column. And obviously, then, the question occurs: what exactly is a failure? A failure, in this case, is when a probe is not able to send the DNS request to the resolver or, on the other hand, when the response was not received by the probe.
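That definition amounts to a one-liner; a minimal sketch (illustrative, not the actual RIPE Atlas analysis code):

```python
def failure_rate(measurements):
    """Failures divided by total measurements, where each measurement is
    True if the probe received a DNS response and False if the request
    could not be sent or the response never arrived."""
    failures = sum(1 for ok in measurements if not ok)
    return failures / len(measurements)

# 3 failures out of 10 measurements -> 30% failure rate
print(failure_rate([True] * 7 + [False] * 3))  # → 0.3
```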
G
So
when
comparing
the
failure
rates
for
both
protocols
in
this
case,
especially
for
the
resolvers
that
use
both
or
that
offer
and
support
both
protocols,
you
can
see
that
dlt
has
higher
failure
rates.
So
some
inflation
regarding
the
failure
rates,
which
range
from
just
about
half
a
percentage
point
for
google
and
cloudflare
over
to
a
few
percentage
points
with
what
nine
and
clean
browsing
and
then.
Lastly,
with
more
than
30
percentage
points
for
the
local
resolvers.
G
Here,
you
can
also
see
that
uncensoredness
has
an
inflation
of
more
than
95
percentage
points,
which
indicates
that
there
have
been
issues
closer
on
the
server
side
for
dot,
whereas
for
the
other
dot
measurements.
We
see
that
the
issues
are
closer
to
the
probes
and
along
the
paths
indicating
that
the
dot
traffic
is
simply
dropped
by
some
metal
boxes,
which
also
explains
the
high
number
of
timeouts
that
we
see
for
our
measurements.
G
Okay.
Moving
on,
we
also
take
the
previous
table
that
we've
seen
and
split
this
by
region
by
the
location
of
the
probe
to
be
more
specific-
and
this
is
what
it
looks
like
so
this
matrix
shows
the
median
failure
rate
for
all
the
probes
on
a
continent
for
a
specific
resolver,
in
this
case,
with
traditional
dns
on
top
in
the
top
matrix
and
dns
over
tls
on
the
bottom
one
so
for
dot
in
particular.
G
So
the
bottom
part
of
the
figure
we
see
that
the
failure
rates
are
somewhat
varying
regarding
the
different
continents
and
resolvers.
So
some
cells
have
lower
than
one
percent
of
a
failure
rate,
whereas
some
other
cells
have
close
to
or
even
more
than
10,
and
we
also
see
that
the
majority
of
the
cells
which
have
higher
failure
rates
are
probes
or
belong
to
probes
located
in
africa
and
south
america.
So
it
seems
like
there
are
more
issues
with
these
on
these.
G
As
for
the
local
resolvers
on
the
bottom
figure,
you
can
see
that
we
only
have
results
for
probes
in
europe
and
north
america,
as
I
mentioned
before,
we
only
saw
local
resolvers
for
of
13
probes
to
return
responses
at
all,
and
these
kind
of
were
both
all
of
these
13
probes
were
located
in
europe
and
north
america
respectively,
with
both
of
the
failure
rates
being
higher
by
quite
a
bit.
G
So
we
have
here
failure
rates
of
more
than
30
to
40
percent
compared
to
most
of
the
other
cells,
which
are
mostly
single
digit
percentages
for
dot
all
right
moving
on
to
the
response
times
so
with
rapidless.
This
is
a
bit
more
tricky,
so
some
background,
explanation
of
that
for
traditional
dns.
Obviously,
you
simply
send
out
the
request
over
udp
and
then
the
resolver
will
look
up
the
requested
domain,
either
in
its
cache
or
recursively,
and
then
send
you
back
the
response
now
for
dot.
G
You
obviously
have
the
pcp
and
tls
handshakes
before
to
establish
the
connection
session
to
secure
the
traffic
and
typically
a
client
would
be
reusing.
This
session,
in
order
to
get
or
minimize
the
overhead
of
these
handshakes
now
for
ripe
atlas.
This
is
a
bit
difficult
and
different,
since
the
measurements
are
designed
to
be
independent
from
each
other,
which
means
that
these
connections
and
sessions
are
not
kept
alive
in
between
the
measurements,
meaning
that
the
response
times
that
we
measure
here
and
present
here
always
include
the
full
handshake
of
the
tcp
and
tila
sanchex.
G
As
a
result,
this
will
roughly
resemble
upper
bounds
for
the
dot
lookups
as
subsequent
domain
lookups
would
not
have
to
do
the
handshakes
and
be
or
have
a
low
response
stamps
all
right.
So
now
also
the
part
of
the
remain
lookup
may
vary
a
bit.
So
in
order
to
partially
avoid
this,
we
focus
on
the
fifth
percentiles
of
each
probe
and
resolver
tuple,
and,
given
that
we
repeat
the
measurements
over
a
period
of
a
week
and
take
the
response
times
towards
the
lower
end
of
the
distribution
with
the
fifth
percentiles.
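The per-tuple fifth-percentile computation might be sketched like this (an illustrative nearest-rank implementation; the paper's exact percentile method may differ, and the sample data is invented):

```python
import math
from collections import defaultdict

def fifth_percentiles(samples):
    """Per (probe, resolver) tuple, take the 5th percentile of its
    response times over the measurement period, i.e. a value towards the
    fast end of the distribution, damping slow outliers and variance in
    the recursive-lookup part of each query."""
    by_tuple = defaultdict(list)
    for probe_id, resolver, rtt_ms in samples:
        by_tuple[(probe_id, resolver)].append(rtt_ms)
    out = {}
    for key, rtts in by_tuple.items():
        rtts.sort()
        rank = max(1, math.ceil(0.05 * len(rtts)))  # nearest-rank method
        out[key] = rtts[rank - 1]
    return out

# 20 synthetic RTTs (ms) for one probe/resolver pair; the 5th percentile
# under the nearest-rank method is the fastest of the 20 samples.
samples = [(1, "resolver.example", t) for t in range(100, 0, -5)]
print(fifth_percentiles(samples))  # → {(1, 'resolver.example'): 5}
```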
G: We can see that the medians are around 10 to 30 milliseconds for most of the resolvers, whereas for DoT the medians are roughly 130 to 150 milliseconds for the faster resolvers, going up to more than 200 milliseconds for some of the slower resolvers. And in this case, UncensoredDNS is an outlier again, at one second for the median.
G
So
when
comparing
both
of
these,
we
can
see
that
the
response
times
for
dot
are
higher
by
more
than
100
milliseconds
when
compared
to
the
traditional
dns,
mostly
due
to
the
connection
and
session
establishment,
and
before
we
saw
that
local
resolvers
have
more
failures
than
public
ones
for
dot.
But
in
this
case
the
response
times
are
actually
somewhat
comparable
for
these
two
types
of
resolvers.
G
Now.
Lastly,
again
we
split
this
by
original
aspects.
I
don't
want
to
go
into
too
much
detail
here,
so
just
briefly
going
over
it.
This
is
how
it
looks,
and
we
see
that
before
the
failure
rates
were
somewhat
varying,
but
also
the
response
times
for
dot
dot
is
highly
varying
regarding
the
cells
on
the
bottom,
which
you
can
see.
G
So
just
to
sum
it
up.
In
conclusion,
we
saw
that
the
t
adoption
is
still
somewhat
low,
with
less
than
half
of
a
percent
for
both
the
ipv4
address
space
and
local
resolvers
of
the
ripe
atlas
probes.
G
And
lastly,
regarding
the
response
times,
we
saw
that
the
unt
is
slower
by
more
than
100
milliseconds,
largely
due
to
the
connection
and
session
overhead,
as
I
just
mentioned,
although
the
response
times
for
public
and
local
resolvers
are
somewhat
comparable
and
yeah.
Finally,
as
mentioned
all
the
measurement
related
material
is
online
on
our
github
repository
and
also
the
paper
with
more
details,
and
if
you
have
any
questions,
you
can
feel
free
to
send
us
an
email,
and
I'm
also
happy
to
take
questions
right
now.
If
there
are
any.
B: Yes, there was one comment in the chat about DoH compared to DoT. Are you planning to look at this as well?
G: However, there are some papers that have already looked at that, so this is probably not something that we would be looking into, although there have been, as I said, some studies that already did that.
A: All right, thanks, Viet. In the interest of time we're going to switch on. Thanks for bringing work that we scooped PAM on; you saw the results here before they're in PAM, which is one of the things that Mirja and I try to do in this group: bring you the measurement result before it's published. So thanks a lot. Okay, thanks so much. So we have Phong up next, and just to prove that his share should work...

A: And I'll start a timer, and I'll let you know if it's looking like you're having trouble when there are about two minutes left, okay?
D: So, for example, this domain name: just based on the domain name, you can actually tell a user's, say, online shopping activities, health conditions, religions, gender identities, or even sexual habits. So where is the domain name actually exposed on the wire? This slide shows you two common places where domain information can be monitored by an on-path attacker, and these are packets captured when visiting example.com.
D
So
the
first
place
where
domain
example.com
exposed
is
through
dns
queries
and
responses,
and
after
getting
back
the
ip
address
of
example.com,
the
client
initiates
connection
to
portfolio
3
starter
2s
handshake,
and
this
is
the
second
place
where
domain
name
information
is
exposed,
and
this
is
because,
in
previous
2s
version,
until
1.2,
the
2s
handset
take
place
before
the
actual
encryption
happens.
So
it
exposed
the
domain
name.
Information
on
an
extension
called
server
name
indication.
D: The methodologies that we used to conduct our measurement; then I will show the analysis of the privacy benefit provided by domain name encryption, based on domain co-hosting and the dynamics of domain-to-IP mappings. And finally, I will discuss potential approaches for domain owners and hosting providers to help increase the privacy benefit of domain name encryption.
D
So
to
address
the
previous
security
and
privacy
problems
of
plaintext
domain
being
exposed
on
the
y
domain.
Encryption
has
been
suggested
in
several
proposals
with
dot
com.
First
and
here,
dns
queries
and
responses
will
be
transmitted
over
a
2s
channel
and
with
doh
dns
resolutions.
You
perform
over
https
and,
of
course,
they're
going
to
be
through
port
443
instead
of
853
and
therefore
inheriting
all
of
the
security
and
benefits
of
https
protocol
and
finally,
starting
with
2s
version
1.3.
D
The
server
name
indications,
extensions
in
the
client,
hello
message
during
the
2s
handshake
can
be
encrypted,
and
with
this
proposal
the
domain
name,
information
on
both
dns
and
ts,
handshake
traffic
can
can
be
secured.
So
in
this
new
setting,
a
user
and
the
dns
resolver
first
established
a
channel,
and
this
is
an
encrypted
channel.
D
It
can
be
over
https
or
over
2s
and
after
that,
all
of
dns
queries
and
respond
as
sent
over
this
channel
and
similarly
during
the
2s
handshake
process,
unlike
in
previous
cos
version,
sim
version
1.3
their
server
name,
indication
will
be
encrypted
too.
I
mean
this
is
an
optional
option.
It's
not
compensate
so
in
the
rfc
say
that
it's
it's
optional,
so
this
news
protocol
will
prevent
any
unpat
observer
from
seeing
plain
tech
domain
name
on
the
wire.
D: So here come the motivations of our study. Given that the destination IP addresses are still visible on path to an on-path observer, we are interested in quantifying the potential improvement to user privacy that a full deployment of domain name encryption could achieve in practice. And the extent to which this assurance can be met depends on two factors: first, whether all the domains are hosted on the same IP address or not, and second, the stability of the mapping between a given domain and its hosting IP address over time.
D
And
next
we
perform
an
active
dns
measurements
to
resolve
these
domains
into
ip
addresses.
And,
finally,
we
analyzed
all
of
the
domain
to
ib,
mappings
and
study
their
co-hosting
degree
and
the
dynamics
of
domain
to
ibm
mappings
over
a
period
of
two
months
and
since
many
domains
and
are
hosted
on
cdns,
which
is
you
know,
they
map
the
domain
to
ip
to
different
ip
at
different
location.
D
So
let's
go
back
to
the
attack
scenario
in
which
the
attacker
can
only
see
the
ip
address
of
the
destination
server.
So,
given
an
ip
address,
there
are
two
hosting
possibilities:
the
first
one
we
prefer
to
as
single
hosted
in
which
a
domain
is
exclusively
hosted
on
one
or
more
ip
addresses
that
do
not
host
any
other
server
on
the
order
domain.
D: This is because, from an adversary's point of view, you know, seeing connections to this IP address alone is enough to infer which domain is being visited. And the second possibility we refer to as multi-hosted, in which an IP address hosts more than one domain; in this case, example.com, say, is hosted on one or more IP addresses that also host many other domains.
D
So
analyzing,
our
data,
we
found
2.2
million
unit,
ip
addresses,
of
which
70
percent
holds
only
one
domain
names.
So
this
mean
visitor
of
domain
hosted
on
this
addresses
will
not
gain
any
privacy
benefit
by
domain
name
encryption
because
of
the
one-to-one
mapping
between
the
domains
and
ip
address
and
the
rest
30
of
ip
addresses
holds
more
than
one
domains.
D
However,
when
considering
the
number
of
unit
domains,
so
this
plot
is
showing
the
number
of
unit
ip
address,
but
when
we
look
at
the
number
of
unit
domains,
this
seventy
percent
of
ib
addresses
correspond
to
only
1.4
million
domain,
which
is
like
90
of
our
test
data
and
the
rest.
Thirty
percent
of
ib
address
hosts
six
point:
one
million
domains
with
its
more
than
eighty
percent
of
domains
of
in
our
study,
which
which
shows
some
co-hosting
degree
among
this
group
of
six
point,
one
6.1
millions
domains.
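The single-hosted vs. co-hosted classification could be sketched as follows (illustrative only; the study's actual pipeline isn't shown in the talk, and the function names and sample pairs are invented):

```python
from collections import defaultdict

def cohosting_degree(mappings):
    """Number of distinct domains observed on each IP address, from
    (domain, ip) resolution pairs."""
    per_ip = defaultdict(set)
    for domain, ip in mappings:
        per_ip[ip].add(domain)
    return {ip: len(doms) for ip, doms in per_ip.items()}

def single_hosted(mappings):
    """Domains whose every hosting IP serves no other domain; these gain
    no anonymity-set benefit from domain name encryption."""
    degree = cohosting_degree(mappings)
    per_domain = defaultdict(set)
    for domain, ip in mappings:
        per_domain[domain].add(ip)
    return {d for d, ips in per_domain.items()
            if all(degree[ip] == 1 for ip in ips)}

pairs = [
    ("solo.example", "203.0.113.1"),  # alone on its IP: single-hosted
    ("a.example", "203.0.113.2"),     # shares 203.0.113.2 with b.example
    ("b.example", "203.0.113.2"),
]
print(single_hosted(pairs))  # → {'solo.example'}
```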
D
So
next
we
analyze
the
hosting
degree
as
the
percentage
of
all
domains,
so
90
of
domains
are
single
hosted,
which
means
this
domain
will
will
not
get
any
privacy
benefit
from
domain
encryption
and
the
rest.
Eighty
percent,
sorry,
the
rest.
Eighty
percent,
eighty
percent
of
domains
are
co-hosted
with
more
than
one
domain
and,
however,
you
know
like
they're
around
40
percent.
Here
of
co-hosted
domain
host
with
less
than
10
domains
as
we
we
move
toward
the
right-hand
side,
the
hosting
degree
increase,
and
so
there's
the
privacy
benefit.
D
The
average
number
of
unit
ip
addresses
observed
for
each
provider
is
very
low,
with
half
of
the
hosting
would
have
I'm
hosting
all
domains
on
a
single
ip
address,
and
we
use
the
heuriken
electrics
bgb
toolkit
to
confirm
that
this
provider
is
actually
very
small
providers,
with
many
of
them
managing
less
than
10
000
ip
address
allocated
by
their
regional
internet
authorities
and
when
looking
at
the
popularities
of
domain
hosted
on
this
provider,
as
shows
in
the
last
column,
the
highest
rank.
One
is
at
386
positions
hosted
on
squarespace.
D
Why?
More
than
half
of
these
providers
host
domain
at
well
below
top
ten
thousands.
So
the
takeaway
here
is
that
small
providers
tend
to
host
a
large
number
of
less
popular
domains
on
a
small
number
of
ip,
and
then
we
investigate
the
top
co-hosting
degree,
the
co-hosting
degree
by
major
providers
that
dominate
the
largest
number
of
unit
ip
addresses.
D
Ib
address
that
we
observe
so
unlike
small
hosting
provider,
the
majority
provider,
the
major
provider
hosts
more
popular
domain
and,
as
you
can
see
in
the
last
column,
most
in
popular
domain
hosted
by
this
provider
are
well
within
the
top
ten
thousands
rank,
however,
in
contracts
to
the
small
providers,
the
co-hosting
degree
per
ip
address
offered
by
this
provider
is
is
quite
low,
except
for
cloudflare,
in
this
case,
with
the
co-hosting
degree
of
16
domains
per
ip
address
that
we
see
from
our
experiment
the
rest
of
the
providers
host,
let's
say
less
than
10
domain
per
ip
address.
A: Sorry, I just want to let you know: you've got two minutes.

D: Two minutes, okay. So the previous two slides show the two ends of a privacy spectrum: on one side, there are less popular domains that are co-hosted together and, you know, benefit from domain name encryption; but on the other side, there are more popular domains that are not co-hosted with any other domains, and these, you know, receive less privacy benefit from domain name encryption.
D
So
let
me
skip
this
line.
Okay.
So
let's
go
to
the
dynamic
of
domain
to
ib
mapping,
so
we
conduct
our
measurement
over
two-month
period
and
we
see
that
you
know
like
there's,
22.7
million
domain
to
ip
mappings
and
the
dynamics.
80
of
them
are
very
dynamic,
which
means
they
survive
for
only
four
days
and
only
13
of
them
are
stable.
D: So, in summary: you know, regardless of the trend of co-location on the web, domain name encryption cannot really provide a meaningful privacy benefit with the current degree of domain-to-domain co-hosting, and the IP address information is still visible to any on-path observer. And to that end, we make these two recommendations.
A: Thanks, thanks a lot, Phong. Okay, so you can reach Phong at that address. We've got a couple of people in the queue; let's take them. That'll be Andrew and Eric. Let's make it quick, because we only have an hour meeting here, and then we'll switch to Robin. So, Andrew, go ahead.
H: Yeah, hi, thanks. That was a very interesting presentation, with good, robust data. I just wanted to highlight, a couple of us made a point in the chat that, I think, yeah, you made a good case for the privacy benefits of co-location and so on, or of using CDNs.

H: Did you consider, though, the potential privacy risks, and also security risks, of increased centralization? Because potentially, if you're encouraging centralization to get privacy, you also sacrifice privacy with centralization. So I don't know if you've given that any thought at all. Thank you.
D: Yeah, actually, there is another paper that we wrote last year, submitted to the NDSS MADWeb workshop, about centralizing all of the DNS queries to just a single resolver; like, there has been a lot of privacy controversy happening, especially with Cloudflare and Firefox. To me, that is terrible too; like, we don't want to give all of our, you know, browsing history to just one single resolver, say Cloudflare.

D: So, to me, this is orthogonal to what I just presented here, but in terms of DNS queries, in the other paper we suggest that we should split them out, distributed across a number of resolvers, so that, you know, your browsing history is not known by just a single actor.
I: Thanks, well, that's my cue, I think. Eric Rescorla. There's a lot here; so, we actually did look at what you suggest in terms of spreading the queries across multiple resolvers. The problem is, then, every resolver gets a partial copy of your browsing history, and given how much redundancy there is in browsing, this distribution makes the problem worse, not better.

I: So if you have a fix for that, I'd be interested in hearing it. But the reason I got up, actually... that was documented by Martin Thomson and Jari Arkko, actually, and Ted Hardie, I believe.
I
The reason I got up, actually, was to ask: have you done any assessment of the relative impact of, you know, the actual size of the anonymity set? I mean, in some cases 10 is quite good, right? If it's, like, Facebook and, you know, Telegram, then actually concealing Telegram traffic in Facebook is quite valuable.
I
On the other hand, there's also, I think, a question about how much reducing the anonymity set helps with traffic-analysis attacks, because I know there's been a lot of work on that, and those typically don't work well when you have a very large number of domains, but quite well when you have a small number of domains.
D
Cool, yeah — thank you for the questions. So in terms of the sensitivity of a website, it's really debatable what we should consider sensitive and what not. You know, if you go to just an IP within the range of, say, the AS of Google, you could be using YouTube, Google Maps, Gmail, whatever — people don't really care. But in terms of the sensitivity of the anonymity set: sometimes a set can be a hundred, and if just one of the domains in there is sensitive, then that whole set can be really sensitive.
D
Actually, we have follow-up work on this where we do traffic analysis per website. We analyzed 200,000 websites — half of them popular and half of them sensitive websites — and just based on the IP address you can pinpoint more than 90 percent precisely. Even for the sensitive websites in the long tail, with really high k-anonymity, you can still pinpoint them with really high precision based just on the IP address.
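As an illustration of the k-anonymity notion being discussed, the following sketch (with made-up hosting data, not the study's dataset) computes the anonymity set per server IP and flags sets that contain a sensitive co-hosted domain:

```python
from collections import defaultdict

# Toy hosting data (hypothetical); in the study this comes from large-scale
# resolutions of roughly 200,000 popular and sensitive sites.
hosting = {
    "news.example":   "192.0.2.10",
    "mail.example":   "192.0.2.10",
    "clinic.example": "192.0.2.10",   # a sensitive co-hosted site
    "solo.example":   "198.51.100.7", # hosted alone: k = 1
}
SENSITIVE = {"clinic.example"}

# Invert the mapping: server IP -> set of co-hosted domains.
ip_to_domains = defaultdict(set)
for domain, ip in hosting.items():
    ip_to_domains[ip].add(domain)

def k_anonymity(ip: str) -> int:
    """Size of the anonymity set: how many domains share this server IP."""
    return len(ip_to_domains[ip])

def set_is_sensitive(ip: str) -> bool:
    """Per the discussion: one sensitive co-hosted domain can make the
    whole anonymity set sensitive."""
    return bool(ip_to_domains[ip] & SENSITIVE)

print(k_anonymity("192.0.2.10"), set_is_sensitive("192.0.2.10"))    # 3 True
print(k_anonymity("198.51.100.7"), set_is_sensitive("198.51.100.7"))  # 1 False
```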
A
Okay, thanks, guys — I'm sorry, I have to cut you off. We're doing this experiment with these hour-long meetings; write the chairs between the meetings if you think we should do something different. And I'm sorry I dropped Eric from the queue, just because we're going to run a couple minutes over so that we can keep the slots for the other two presenters. Thanks a lot, Phong. Robin, can you open your slides, or do you need me to do it?
J
All right, so let's talk qlog again. qlog stands for QUIC logging, an effort started about two years ago to make the new QUIC and H3 protocols a bit easier to debug and analyze. Typically, what you would do is take a network packet capture at some point and then use something like Wireshark to analyze it. That's of course still possible with QUIC, but more difficult, because QUIC encrypts almost its entire transport layer as well.
J
So to do this, you would have to store the entire packet capture, including the full payload, leading to some scalability and privacy issues — I'm sure some of you have experienced this with encrypted application-layer protocols in the past as well. There's a second, long-standing problem: not all aspects of the protocol are reflected on the wire. For example, on the transport side, things like congestion control typically aren't seen and so are difficult to analyze.
J
The core thing there is that we propose to do this in a standardized way: a single format, a single schema that all of the implementations follow, making it easier to create reusable tooling and also to share data between them. That's of course not rocket science — it's relatively simple, as you can see here. For now we are using JSON as a serialization format, and we just define, for example, how you log received packets.
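As a sketch, a qlog trace with one received-packet event might look like this when built and serialized from Python. The field names approximate the qlog drafts; exact schema details vary between draft versions, so treat this as illustrative rather than normative.

```python
import json

# A minimal qlog-style trace: events carry a timestamp, a namespaced event
# name, and event-specific data (field names roughly follow the qlog drafts).
trace = {
    "qlog_version": "0.3",
    "traces": [{
        "events": [{
            "time": 12.5,  # milliseconds since trace start
            "name": "transport:packet_received",
            "data": {
                "header": {"packet_type": "1RTT", "packet_number": 42},
                "raw": {"length": 1252},
            },
        }],
    }],
}

# Because every implementation emits the same schema, any JSON-aware tool
# can consume the output - that is what enables shared, reusable tooling.
serialized = json.dumps(trace)
event = json.loads(serialized)["traces"][0]["events"][0]
print(event["name"])  # transport:packet_received
```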
J
This approach of having a shared format and reusable tools has turned out quite interesting and powerful, leading to the majority of the QUIC implementations actually outputting this format. Most notably, Facebook is using this in production to help analyze their deployment of QUIC and HTTP/3 at scale.
J
Because of this relative success, we now have plans to adopt qlog in the QUIC working group. They have been separate drafts until now, but we're moving to adoption soon.
J
Our goal there is not just to flesh this out for QUIC and H3, but also to start looking at how we can bring this to different protocols and different use cases as well, because, as you can imagine, this type of approach can be very useful not just for QUIC and H3, but also, for example, for the things we've listed below here. One particular one I wanted to highlight is the bottom one, where we have a project in which we are trying to log both very high-level…
J
One example of that is Christian Huitema, who has already done something similar using public qlogs from the QUIC interoperability servers. They have their public servers, they expose their qlogs for debugging purposes, and he has already tried to measure, for example, what realistic packet-loss patterns are and what is causing them, using these public qlogs.
A
Thanks, Robin — can you wrap it up? Yeah.
J
This is the last slide — cool. So I would say: if you're in any way interested in this type of thing, join us in the QUIC working group or let us know on GitHub. Thank you.
A
Thanks — thanks, Robin, for bringing that update, accommodating our tight schedule, and pointing people to that tool. All right, so we're going to run about 10 minutes over here, and — oh, let's see, we've got Pete ready to go. We're about 10 minutes over; you've got half an hour until the next IETF meeting, if you want to catch one. So apologies for running late, Pete — I'm going to approve your screen share.
K
So why did we go to the trouble of gathering data on ECN at an ISP's border router? Well, there are a few questions we'd like to answer. For ECN engineering at endpoints, we'd like to know what proportion of flows are ECN-capable and how many clients are initiating ECN. For middleboxes, we'd like to know what proportion of paths appear to have RFC 3168 marking AQMs, so we know to what extent future experiments need to take that into account.
K
We'd also like to know about any unexpected uses of the ECN field, since the signaling methods and failure modes are crucial for effective congestion control. My cautionary note here is that we think of this study as informative rather than authoritative, given its size and fixed location. However, some of the results should still be useful and suggestive. So, some basic facts about the ISP where data was collected — these numbers give some sense of scale.
K
So how do we actually collect this data? We use Linux iptables and ipsets. Ipsets are essentially in-kernel hash tables, using a specified key of interest — like IP address, or IP and port — and mapping to a packet and byte count. The nice thing about them is that the quantity of data produced is relatively small and easy to process, and you also have fewer privacy concerns than with packet captures, as we're just counting packets meeting certain criteria.
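The ipset-style counting can be sketched as an in-memory analogue (illustrative only — the real counters live in the kernel and are fed by iptables rules, and the traffic below is made up):

```python
from collections import defaultdict

# Map the two-bit ECN codepoint to a set name, one counter set per codepoint.
ECN_NAMES = {0: "Not-ECT", 1: "ECT(1)", 2: "ECT(0)", 3: "CE"}

# set name -> source IP -> [packet count, byte count], mirroring what an
# ipset created with the `counters` option maintains per entry.
counters = defaultdict(lambda: defaultdict(lambda: [0, 0]))

def count_packet(src_ip: str, ecn_bits: int, length: int) -> None:
    """Bump the per-IP counters in the set matching this packet's ECN codepoint."""
    entry = counters[ECN_NAMES[ecn_bits]][src_ip]
    entry[0] += 1
    entry[1] += length

# Hypothetical traffic: two ECT(0) packets and one CE mark from one host.
count_packet("203.0.113.5", 2, 1500)
count_packet("203.0.113.5", 2, 1500)
count_packet("203.0.113.5", 3, 64)

print(counters["ECT(0)"]["203.0.113.5"])  # [2, 3000]
print(counters["CE"]["203.0.113.5"])      # [1, 64]
```

The output is just aggregate counts keyed by IP, which is why it is both small to store and less privacy-sensitive than a full packet capture.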
K
On
the
other
hand,
we
don't
know
the
benefit
of
deeper
packet,
inspection
or
flow
level
analysis.
So
there
are
some
questions.
We
can't
answer
like
exactly
what
applications
are
setting
the
bits
for
performance.
We
avoided
anything
that
could
cause
a
hash
table
lookup
per
packet,
as
some
lab
test
caution
does
against
doing
that
in
production.
So
that
leaves
us
without
much
detail
on
not
ect
packets,
which
we
can
mostly
live
without.
K
So
we
mainly
look
for
non-zero,
ect
0
in
both
directions
and
non-zero
ec
in
either
direction.
However,
we
did
see
some
anomalies
in
the
signaling
data
that
appear
to
come
from
the
os
fingerprinting
routines
of
port
scanners
to
help
explain
how
we
attempt
to
filter
these
out
with
the
ip
in
the
middle
here
ending
at
140.73.
K
We
have
ce
and
ece
marks,
with
a
ratio
close
to
one
to
one
which
doesn't
look
like
aqm
activity.
What
we
do
is
if
the
ece
the
ce
ratio
is
greater
than
two
to
one.
We
consider
it
possible
aqm
activity
if
it's
less
than
two
to
one.
We
subtract
the
ec
marks
from
the
ones
in
the
opposite
direction
and
if
there
are
any
left
over,
we
consider
it
possible.
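One way to sketch that filtering rule (illustrative; the variable names, and the reading of which count is the numerator, are interpretations of the talk rather than the study's actual tooling):

```python
def possible_aqm(ece_marks: int, ce_marks_opposite: int) -> bool:
    """Heuristic from the talk.

    An ECE:CE ratio near 1:1 looks like port-scanner fingerprinting probes
    rather than a marking AQM, since a classic ECN receiver echoes ECE on
    many ACKs and should therefore produce noticeably more ECE than CE.
    """
    if ece_marks > 2 * ce_marks_opposite:
        # Ratio above 2:1 - treat as possible AQM activity.
        return True
    # Below 2:1, cancel the echoes against the opposite-direction CE marks;
    # any leftover ECE still counts as possible AQM activity.
    return (ece_marks - ce_marks_opposite) > 0

print(possible_aqm(ece_marks=100, ce_marks_opposite=10))  # True: clear marking
print(possible_aqm(ece_marks=10, ce_marks_opposite=10))   # False: 1:1, scanner-like
print(possible_aqm(ece_marks=12, ce_marks_opposite=10))   # True: leftover echoes
```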
K
So what we did here is split the IPs into those that pass through a known AQM and those that do not. In the table, we see that 60% of ECN-negotiating user IPs behind a known AQM saw possible AQM activity, which roughly represents our detection rate after three weeks of observation. Why isn't our detection rate 100 percent?
K
That's
very
likely
because,
as
noted
before,
you
need
both
an
ac
uncapable
flow
and
congestion
to
see
it,
so
we
simply
might
miss
it,
especially
if
there
are
aqms
and
devices
that
don't
often
see
congestion
next,
that
10.3
value
in
amber
that's
our
rough
proxy
for
the
percentage
of
random
paths
that
may
have
an
aqm
deployed,
assuming
that
we've
missed
them
at
the
same
rate
as
the
known
pass.
We
could
arguably
scale
that
up
from
ten
percent
to
about
one
and
six,
but
to
be
more
conservative.
K
As for ECN endpoints set for non-TCP protocols — I'll go through this relatively quickly. Unlike TCP, the data for non-TCP is less clearly attributable to ECN, because we apparently observed at least some misuse of the field. For one, the proportion of marks from the WAN is higher than one would expect, even taking into account the ten-to-one ratio of traffic between the WAN and LAN. Also note that we saw one IP with clear misuse of the currently unallocated ECT(1), which accounted for some 97% of the ECT(1) usage in total.
K
So
what
are
the
possible
reasons
for
ecn
use
on
non-tcp
protocols
at
all?
There
could
be
tunnel
dcn
traffic,
but
since
there
are
different
methods
of
encapsulation
of
the
ecn
field
and
since
those
can
be
asymmetric,
it's
hard
to
say,
based
on
the
ports
we
saw
in
use
like
ipsec,
net
traversal
and
wireguard.
K
We
almost
certainly
saw
tunnel
traffic
but
tunnel
dc
in
traffic.
We
can't
say
definitively-
and
this
study
didn't
look
at
protocols
not
supported
by
contract,
which,
for
example,
includes
ipsec
esp
packets
using
ip
protocol
number
50..
There
was
some
discussion
on
the
list
that
we
could
be
seeing
quick
ecn
here.
That's
also
a
possibility
in
at
least
one
ip
despot
pair.
We
saw,
but
again
we
can't
say
for
sure,
without
deeper
packet
inspection.
K
We
think
there's
very
likely
to
be
misuse
of
the
ecn
feel
going
on
here
and
we
speculate
by
bit.
Torrent
traffic.
The
now
obsolete
rfc
1349
defines
a
value
for
minimized
monetary
cost
that
conflicts
with
ect
0
and
if
a
developer
also
forgets
to
shift
that
absolute
value
to
the
left
by
one
they
might
end
up
with
an
ect
one
as
well.
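The bit arithmetic behind that collision can be sketched as follows (constants taken from RFC 1349 and RFC 3168):

```python
# The ECN field (RFC 3168) occupies the two low-order bits of the former
# IPv4 TOS byte, which overlaps the old RFC 1349 TOS values.
ECT0, ECT1, CE = 0b10, 0b01, 0b11

# RFC 1349 "minimize monetary cost" is value 1 in the 4-bit TOS subfield,
# i.e. 0x02 once placed correctly within the byte.
MINIMIZE_MONETARY_COST = 0x02

def ecn_field(tos_byte: int) -> int:
    """Extract the two ECN bits from a TOS/traffic-class byte."""
    return tos_byte & 0b11

# Correct placement of the old TOS value collides with ECT(0)...
assert ecn_field(MINIMIZE_MONETARY_COST) == ECT0

# ...and forgetting the left shift by one (using the raw subfield value
# 0x01) lands on the then-unallocated ECT(1) instead.
assert ecn_field(MINIMIZE_MONETARY_COST >> 1) == ECT1
print("minimize-monetary-cost hits ECT(0); the unshifted value hits ECT(1)")
```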
A
Okay — I think I'd like to cut it there, just because we're running ten minutes over. I really appreciate you all fitting into our just-hour-long meeting. Mirja, any closing thought?
A
Go ahead — thanks for joining us. Thanks for joining, everyone; we had about 135 people, which is really nice. I would love to know what you think about the meeting length — we did these two one-hour-meeting experiments to try to bring you as much as we can as quickly as possible, but we can certainly do something else next time. Take care.