From YouTube: ANRW-MeasurementAndOptimisation
Description
Measurement and Optimisation meeting session at ANRW
A
Okay, we should start. Yeah, alright. We're about to have a measurement session now, with three very interesting papers, so I commend these papers to you. And I wanted to mention, for those who are still here for the whole week: we also have the Measurement and Analysis for Protocols Research Group, MAPRG, and they meet on Friday. So, people who are here giving talks, if you can, tune into MAPRG, and vice versa; that would be very good. Okay, so, to the first talk. By the way, the first talk is called "What can you learn from an IP?", and the paper is by Simran Patil and Nikita Borisov. Simran is a first-year graduate student at UIUC; Nikita Borisov is on the faculty there. And Simran mentioned that he would consider an internship, if people love this paper. Anyway, over to you; take it away.
B
If you think back to the bad old days of about two decades ago, we browsed the web using plaintext HTTP, so anybody who could observe your network would be able to see everything you did online: the URLs you were going to, the contents of the web pages, everything. We've come a long way since then. Today, a vast majority of all web traffic is encrypted using TLS.
B
So all of the HTTP messages that are sent by my browser are typically encrypted by HTTPS, and therefore not available to the adversary. But if you look at what a web browser does before it sends an HTTP request, it sends a couple of other messages. It performs a DNS query to figure out the IP address of the web server, and then it performs a TLS handshake to establish the secret keys that it uses to encrypt the rest of the connection.
B
And if you look at the transcript of this conversation, you will notice that the domain name that you're going to actually appears four different times in plaintext: in the DNS query, in the DNS response, in the Server Name Indication extension in the ClientHello, and in the server certificate as the common name. So an adversary can read it in all four places.
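To make the third of those leaks concrete, here is a minimal sketch, not from the paper, of how a passive observer could pull the server name out of a captured TLS ClientHello. The offsets follow the public TLS record and handshake layout, and it assumes the capture starts at the record boundary; error handling is omitted.

```python
import struct

def sni_from_client_hello(raw: bytes) -> str | None:
    """Extract the plaintext SNI hostname from a TLS ClientHello record."""
    if len(raw) < 5 or raw[0] != 0x16:       # 0x16 = TLS handshake record
        return None
    pos = 5                                   # skip the 5-byte record header
    if raw[pos] != 0x01:                      # 0x01 = ClientHello
        return None
    pos += 4                                  # handshake type + 3-byte length
    pos += 2 + 32                             # client version + random
    sid_len = raw[pos]; pos += 1 + sid_len    # session ID
    cs_len = struct.unpack_from("!H", raw, pos)[0]
    pos += 2 + cs_len                         # cipher suites
    comp_len = raw[pos]; pos += 1 + comp_len  # compression methods
    ext_total = struct.unpack_from("!H", raw, pos)[0]
    pos += 2
    end = pos + ext_total
    while pos + 4 <= end:
        ext_type, ext_len = struct.unpack_from("!HH", raw, pos)
        pos += 4
        if ext_type == 0x0000:                # server_name extension
            # list length (2) + entry type (1) + name length (2) + name
            name_len = struct.unpack_from("!H", raw, pos + 3)[0]
            return raw[pos + 5 : pos + 5 + name_len].decode("ascii", "replace")
        pos += ext_len
    return None
```

Encrypted SNI, discussed next, is precisely what removes this field from the observer's view.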
B
So, if you use TLS 1.3, that one domain name instance, the one in the server certificate, is gone. If you're using an encrypted DNS protocol, such as DNS over HTTPS or DNS over TLS, like we've heard about earlier today, the DNS requests and responses are encrypted. And finally, the very last plaintext mention of the domain name is in the Server Name Indication, and there's a working group draft trying to develop this encrypted SNI extension. So hopefully soon we can get to a world where all of the plaintext domain name mentions are gone.
B
The only thing that the adversary will see is what IP address you're connecting to. So first, I wanted to mention just very briefly why I think it's important that the domain name be private. A lot of times, when you go to Google or Amazon or Facebook, it's not so exciting what domain you're using. But a number of domain names can be quite indicative of various private attributes, such as your sexuality, your religion, your interests, your drug addiction, and so forth.
B
[...]
Then we extracted all the domain names that are used in these objects, and used the bulk DNS lookup tool ZDNS to get the mapping from these domain names to IP addresses, and then back from the IP addresses to domain names using reverse DNS. So, out of these ninety million objects, about 1.8 million domains were used.
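The pipeline being described is easy to sketch. The study used ZDNS for scale; the stand-in below uses only the Python standard library and a hypothetical domains.txt, one name per line, to build the forward and reverse mappings that the rest of the analysis needs.

```python
import socket
from collections import defaultdict

domain_to_ips: dict[str, set[str]] = defaultdict(set)
ip_to_domains: dict[str, set[str]] = defaultdict(set)

with open("domains.txt") as fh:              # hypothetical: one domain per line
    for domain in (line.strip() for line in fh):
        try:
            # Forward lookup: all IPv4 addresses returned for the domain.
            ips = {info[4][0]
                   for info in socket.getaddrinfo(domain, 443, socket.AF_INET)}
        except socket.gaierror:
            continue                          # a few percent of names fail
        for ip in ips:
            domain_to_ips[domain].add(ip)
            ip_to_domains[ip].add(domain)     # the "anonymity set" of this IP

# The PTR side (reverse DNS) would be socket.gethostbyaddr(ip)[0].
```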
B
We were able to look up almost all of them, something like 95 percent, and we got a total of about seven hundred and forty thousand distinct IPs, so many domains map to the same set of IPs. Now, the first question is: what can an adversary find out if they just use reverse DNS? Reverse DNS is obviously available: if you see an IP address, you can look it up and try to see whether you get a useful domain name back in the vast majority of the cases.
B
We have to think about what I call the anonymity set of each IP address, which is just the list of domain names that map to it: the in-degree, in some sense the in-neighborhood, of the IP on this graph. So IP 2 in this graph has three domain names that map to it, whereas IP 5 just has a single one. And this is the result of looking at the IP anonymity sets in our data set.
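Given such a mapping, the anonymity set of an IP is just its in-neighborhood in the bipartite domain-to-IP graph; a self-contained sketch, with a toy example mirroring the slide's IP 2 and IP 5:

```python
def fraction_unique(ip_to_domains: dict[str, set[str]]) -> float:
    """Fraction of IPs whose anonymity set contains exactly one domain."""
    unique = sum(1 for domains in ip_to_domains.values() if len(domains) == 1)
    return unique / len(ip_to_domains)

# IP 2 has three names mapping to it; IP 5 has a single one.
example = {"ip2": {"a.com", "b.com", "c.com"}, "ip5": {"d.com"}}
print(f"{fraction_unique(example):.0%} of IPs identify a single domain")
```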
B
Now, if our adversary is actually trying to figure out what websites you're visiting, which is maybe the higher-level task that's interesting, this doesn't tell the whole story. For example, there could be an IP address, like IP 5 here, that maps back to only a single domain, but that is used in a large number of websites. As an example, there's an IP address in our data set that only maps to a single domain name, doubleclick.net, but this domain name is actually referenced by over a hundred thousand sites in our sample.
B
Instead, we want to see: are there IP addresses that show up for a single website, where only a single site on our list references them? It turns out that 68 percent of the IPs that we see have this property: they only show up for a single website. 43 percent of the websites have at least one resource that has this uniqueness property, and in fact, for almost forty percent of the sites, their front page has this property: no other website in the top 1 million has the same IP as the front page of that site.
B
What about the other 60 percent of the sites? Well, when somebody loads a website, they go and make a number of connections to a number of servers. So this is a fraction of a trace of loading the web page of the Fairmont hotel, and you see that there are connections made to a bunch of different IP addresses, corresponding to a bunch of different domains. The adversary does not see the URLs, and does not see the domain names that are being connected to.
B
Assuming we get to this ESNI world, the adversary does see this list of IP addresses. The question is: how can this adversary map it back to a website? It turns out this could be a bit of a complex task, because every time you load a site, you might get a different set of domains; every time you look up a domain, you might get a different IP through DNS round robin; and there might be other complexities. So we don't precisely answer the question of how this mapping back can be done in our paper.
B
Instead, we use a simple proxy. We try to ask, for a given site: what are all the IPs that could be contacted, from all the domains that are in our set? So we consider this IP set corresponding to each site, and if two sites have different IP sets, then there is at least some chance that you could distinguish them based on their page-load fingerprints. And it turns out that almost every website has a distinct IP set. There are actually a few large clusters, but for the most part, every site is unique.
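This per-site fingerprint check is again a small computation over the crawl data; a sketch, where site_to_domains (which domains each front page can pull in) and domain_to_ips are hypothetical stand-ins for the study's data:

```python
from collections import Counter

def ip_set(site, site_to_domains, domain_to_ips):
    """All IPs a load of `site` could contact, per the crawl's DNS data."""
    return frozenset(ip
                     for d in site_to_domains[site]
                     for ip in domain_to_ips.get(d, ()))

def fraction_distinct(site_to_domains, domain_to_ips):
    """Fraction of sites whose IP set matches no other site's IP set."""
    fingerprints = Counter(ip_set(s, site_to_domains, domain_to_ips)
                           for s in site_to_domains)
    unique = sum(1 for n in fingerprints.values() if n == 1)
    return unique / len(site_to_domains)
```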
B
One question you might ask yourself is: why is this the case, when we all know that some large fraction of the web is now served by CDNs? Shouldn't there be a lot of convergence of sites and objects onto a single place? It turns out that CDNs generally assign different IP addresses to different domain names, not necessarily in a one-to-one fashion, but they make use of large IP address spaces.
B
So, for example, we did some quick analysis of CloudFlare. About 12 percent of our domains, around 200,000 domains in our data set, are CloudFlare domains, but these use 291 thousand different IP addresses, including for the 3 percent of all the web pages that have these unique front pages. So this is something about the way CloudFlare and other CDNs are configured right now: they are not providing any privacy in the IP addresses they return, from the point of view of trying to identify what the domain name is. But in theory, this could change.
B
[...]
On the other hand, within the context of web browsing, against an adversary who has a pretty good idea of the set of websites you might be visiting, even if this set is quite large, a million sites, you've got very limited privacy protection from DNS privacy. So, I've given you some really quick callouts of our main numerical results, and our final conclusion is that it may be possible to change this, but it will require a change in how the web hosting infrastructure, and especially content distribution networks, are structured. Thank you very much.
C
You talked about CDNs, right? So if I do a query in Germany for a website, and you do it wherever you are, we most likely get... The microphone is off? Sorry, okay, closer to the mic. Okay, sorry. So, you talked about CDNs, right? So if I do a query in Germany, I most likely get redirected to a different IP address than you do in the US, or wherever you are from, right?
B
So, yes. That means that you need to be able to collect these IP mappings from a vantage point that's close to the vantage point of the client. That said, it took me about two hours to look up the 1.8 million domain names using ZDNS, so I think it's not that hard. We do write in the paper that you need to be able to localize this.
B
One of the things I should say is that a common adversary that people think about in this context is your ISP. In fact, there have been cases where ISPs do traffic analysis of your web traffic to try to insert ads into your communications; there have been a number of places where that has happened.
D
Nikita, Dave Plonka. I thought the part of your work about the fingerprinting, about the multiple places people go, was super interesting. The comment I have is about going forward, about thinking what we can do in the future here. I see you only used these really small IP addresses that have three dots in them, the old ones. What we measure is that there are about two and a half billion of these small IP addresses, the legacy ones, used in the world today, but we see that many v6 addresses every day.
D
So I would love to see this study done on v6, because in v6 you have a lot of liberty, as a service provider and as a client, to change the prefix length at which you use the address, and also, of course, to change the address on every single thing you do. So me, as a v6 client, I could use a different address for every different site.
B
[...]
One thing they could do is actually normalizing the IP addresses that get returned from DNS, right? So when they return their DNS results, they could either return you a random address within their space, or they could just send everybody to 1.1.1.1, right? I mean, I'm told that that won't quite work, but something very close to it, an epsilon away from that, would work for their infrastructure.
F
Lars Eggert. This is really cool, and really sort of depressing at the same time; I probably had the same reaction at first. Is it really only the CDNs that can do something here, or would v6 make it better anyway? Is there something that the clients could do? Could you poison the lookups that you do for a website, could you throw some garbage in there? Is there anything more that we can do without needing to wait for the CDNs?
B
I think it's possible the client could use something like this. So you could, in theory, if you can detect, for example, once again, a CloudFlare IP address, you can actually do something that's akin to what's called domain fronting, and just pick a different IP address in their space; as long as you send the right SNI, and you can send an encrypted one, for CloudFlare, you can do that. So there are some opportunities to do something like that. As for the data, I'm happy to share it; I haven't put it online because it's about a 100-gigabyte database.
B
But if somebody wants to look at our CDN analysis: the reason we looked at CloudFlare is that they have a list of all IPs that they use. We'd love to do the same analysis for other CDNs, but the tools we found for detecting which CDN is being used have so far been not so good. So if anyone knows good ones, please let me know.
G
[...]
B
I think that's a good point. If you look at this page-load fingerprint, it is in some sense similar to the process in traffic analysis that we call website fingerprinting, where instead of looking at IP addresses, you look at traffic shapes; people are working in that area. There's a lot of research right now that actually looks at this question of noise, what kind of noise you can add, and what kind of ambient noise you get.

Do these things actually work? For example, trying to identify when you even start looking at a website versus, you know, coming in in the middle, and trying to partition things when browsing with multiple tabs. So yes, that's true: the exact extent to which this would work in the fingerprinting context will vary, and maybe some of the techniques that people use to try to improve website fingerprinting, to do partitioning or other kinds of analysis and inference, would apply here as well.
G
[...]
B
Adding noise to this as well doesn't really help, unless you have a really good noise model, in the sense that, you know, if you have some very sensitive site, and you're accessing an IP that only belongs to that site, what are the chances that you would have picked that through just random connections? Getting a good background noise model is hard, but I'm not saying it's impossible.
H
Daniel Kahn Gillmor from the ACLU. So, thanks for doing this work; it's really good to see the problem stated as clearly as I think you've stated it. There have been a few comments in the mic line here about what the clients could do differently and what the CDNs could do differently, and I'm wondering if what you've looked at has given you any insight also into what the individual website operators could do differently.
B
A great question. Obviously, the first step is hosting your first-party site somewhere that has collisions with other websites. But then, about the third-party thing: it's interesting, because in some cases adding more third parties might help, because you get more intersections. It's not something we've looked at specifically, but yeah, I think this is definitely interesting to think about from all three perspectives, like you said: from a CDN perspective, from a user perspective, and from a web server, a website, perspective.
H
Thanks, great.
A
[...]
I
So we'll go through this question in three distinct phases. First, we'll look at some of the issues around containerization, and we'll talk in particular about what containerization means for internet measurements; then we'll look at a proposed solution that we've developed, called MACE; and we'll also present an evaluation of our solution.
I
So first, I'd like to give just a brief run-through of what containers are. They're essentially a relatively new isolation mechanism that allows us to run applications in isolation using a single kernel. So rather than a VM, where you have to run an entirely different operating system for each application, you can run the same operating system, but use these recent developments in Linux and other OSes to achieve isolation between the processes of an application.
I
Namespaces allow us to isolate what an application can access, and cgroups allow us to allocate resources to those applications, so you can give a particular application a certain share of CPU cycles, or particular memory access, or a particular NUMA node affinity. Due to some of these advantages, containers are rapidly replacing VMs in cloud computing settings, and in high-performance computing as well. A lot of different software deployment settings are using containers now, because of these advantages and because of the utility of having convenient packages.
I
So, if we're doing internet measurement, why do we care about containers? The reasons are fairly similar to the reasons any application developer or deployer might care about containers: they allow us to streamline experiments, their deployment, their packaging, their standardization, so scripts, tools, even libraries.
I
So, the answer to the question we started with, can we measure from containers, is: sure we can. I mean, it actually has already been being done, in PlanetLab for example. What we want to look at in this study is why not: in other words, whenever you virtualize something, there are going to be overheads. If we virtualize something using containers, what are the overheads?
I
But the key issue that we identified, and want to bring to people's attention, is that when you add multiple containers on the same host, this latency overhead increases. We've measured it at up to three hundred microseconds, depending on traffic, and if you continue to add containers, you can continue to push this latency overhead higher. So if you're doing any measurement that deals with timing, so if you're trying to measure RTT or one-way delay, or do inferences based on timing, you have to deal with this in a container.
I
You have to deal with these non-constant latency overheads. We also observed that there's a lot of work going into trying to solve this problem of latency overhead in containerized network virtualization systems, such as Slim and FreeFlow, but these efforts for the most part don't really have any benefit for internet measurements. Slim in particular uses flow-based virtualization, and FreeFlow requires specialized RDMA hardware and drivers on both endpoints. So in the context of internet measurement, neither of these solutions is really very useful.
I
Just to give an illustration of this latency, this number of 300 microseconds: it isn't, you know, a whole lot, but at the speed of light it's a roughly 90-kilometer radius. So if you're off by 300 microseconds, you're off by something like 90 kilometers. I've drawn that circle around Montreal, and it reaches into the United States and several other cities and towns around in Canada. And this number is calculated from Amazon's latency statistics.
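The quoted radius is straight speed-of-light arithmetic; a quick check, using vacuum light speed as the talk's rough bound does:

```python
C_KM_PER_S = 299_792          # speed of light in vacuum, km/s
overhead_s = 300e-6           # 300 microseconds of unaccounted latency

# Treating the overhead as pure one-way propagation error turns a timing
# bias directly into a distance bias for delay-based geolocation.
radius_km = C_KM_PER_S * overhead_s
print(f"{radius_km:.0f} km of position uncertainty")   # ~90 km
```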
I
They calculated a particular rate at which they lose sales money per millisecond of latency, and if we translate that to our latency, we're talking about an amount of about 1.2 million dollars. So if you're trying to estimate how much this thing is going to cost me, how much I'm going to lose if my latency increases, that's the margin of error you're talking about.
I
Additionally, it's hard to isolate these kinds of latencies. If you observe a latency from a particular application, there's nobody you can ask right now: where is this latency coming from? Is it the physical network, is it the operating system, is it the application, is it the virtualization layers? So what we're addressing in this study is how to account for these latency overheads in a running container system; in other words, we want to do this in real time.
I
So, we imagine measurement containers running in the typical sort of bridged or overlay container network virtualization. As shown on the right, you have an interface in the container's network namespace, that's the eth0 at the top, and then you have layers of bridges and routing, so that packets eventually traverse an interface on the host node, that's the one at the bottom. And the goal of our solution is to measure that red dotted line.
I
So we want to know: how long does it take a packet to go from the container to the host, for ingress and for egress? We want to be able to do this per packet, and of course with high accuracy. Additionally, our goals in developing the solution were that we do not want to incur additional latency in the virtualization layers. Often, when you start probing things, and we'll talk more about this later, you tend to increase the latency, but we want to minimize that impact. And finally, we want a container-friendly interface.
I
So we want a container to be able to ask, what is my latency to the outside world, in real time, whenever it pleases. The method that we've developed to do this uses Linux kernel tracepoints, which are essentially hooks into particular lines in the kernel source code. You can insert your own probe functions into those lines, and really muck things up.
I
So between those two, we can actually access both the system call layer, which is where the applications in the container are, so that's the top of that dotted line, and then also the net device layer, which gives us access closer to the bottom. We have some ideas about how to go further down into the hardware. We originally implemented this solution using existing tracers such as ftrace, and this is, of course, the logical first step.
I
What we noticed is that these tracers are great for doing diagnostic work: if you have a problem, you spin up the tracer and you figure out what's going on. But if you want to run them as a monitoring solution, they're really terrible, because they have this huge perturbation on the system. And moreover, what happens if you do a trace is that you have an enormous amount of trace data that gets spit out, and you have to somehow store that, filter through it, and distill it into the actual latency results that you care about.
I
So, due to this, we developed an application-specific kernel module, specifically for monitoring these virtualization latencies, and we imagine that this is a module that you could run on container hosts that care about doing this kind of measurement. It reports directly to containers' namespaces, so this solves the other goal of how we can make a nice container-friendly interface. We'll go through the design of this module really quickly. Essentially, we have packets egressing and ingressing the virtualization system at the top, and these packets trigger the trace events.
I
The four trace events that we are actually probing are listed next: sys_enter and then net_dev_start_xmit for the egress path, and their counterparts for the ingress path. And what we do is filter these events: we only register events on particular interfaces that we're listening for, that we care about, and we only register events in particular namespaces that we care about, so we're already doing some data reduction at that level. These are raw per-event hooks, so we have to do correlation.
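MACE itself is an in-kernel module, but the hook being described is easy to demo from user space with eBPF. Here is a minimal sketch using the bcc toolkit (not the authors' code) that stamps every net_dev_start_xmit event with a kernel timestamp; MACE's per-interface and per-namespace filtering, and the ingress side, are omitted. It needs root and a kernel with the tracepoint enabled.

```python
from bcc import BPF

# Attach a probe to the same egress tracepoint MACE uses and print a
# nanosecond timestamp per transmitted packet.
prog = r"""
TRACEPOINT_PROBE(net, net_dev_start_xmit) {
    bpf_trace_printk("xmit len=%u at %llu ns\n",
                     args->len, bpf_ktime_get_ns());
    return 0;
}
"""

b = BPF(text=prog)
print("Tracing net:net_dev_start_xmit ... Ctrl-C to stop")
b.trace_print()   # streams the kernel trace pipe
```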
I
Each packet incurs a whole series of events as it passes through the virtualization layers, so we need to be able to correlate which events map to which packet, and we do this using hash tables, for both ingress and egress. And finally, we push latencies into ring buffers, and we maintain a ring buffer per network namespace, so that your container, when it's asking for latencies, is basically just doing a simple read of a ring buffer that's allocated for its network namespace. I mean, they're accessible via device special files.
I
So when you read from one of these device files, it basically checks which namespace you're reading from, and depending on that, it will give you results for your namespace, if your namespace is being actively monitored. And this is open source; the link is there, and we'll make these slides available later. So if you want to check out the source code, or if you want to help us, there are a bunch of pointers later; if you want to help us continue developing this, that would be excellent too.
I
So, just to give you a sense of the interface, I'll go through the steps involved in running this from a container, assuming the module is already loaded on the host. The first step is to basically echo a non-zero value into one of these sysfs control files. This basically tells the module: okay, this namespace, whoever is echoing this non-zero value, wants me to monitor their ingress and egress; so it then sets up the necessary ring buffer and everything to do it.
I
Next, you execute your experiment, so maybe we're just looking for the RTT to the nearest host, in this case google.com. And finally, you collect the latencies. This is as simple as reading from a device file, which basically exposes the current ring buffer that's allocated for that namespace. For now, we're reporting the latencies in this nice, mostly human-readable format; there's a timestamp, and so on.
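Scripted end to end, the three steps look roughly like the following; the control-file and device-file paths here are hypothetical placeholders, since the real names are defined by the module.

```python
import subprocess

NS_CTL = "/sys/mace/on"          # hypothetical sysfs control file
DEV    = "/dev/mace"             # hypothetical per-namespace device file

# Step 1: ask the module to monitor this container's network namespace.
with open(NS_CTL, "w") as f:
    f.write("1")

# Step 2: run the measurement whose timing we want to correct.
subprocess.run(["ping", "-c", "10", "google.com"], check=True)

# Step 3: read the per-packet virtualization latencies back out.
with open(DEV) as f:
    for line in f:
        print(line.rstrip())     # e.g. timestamp, direction, latency in ns
```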
I
So, how do you know these numbers are correct? You get these sort of cryptic-looking integers, and we say they're nanoseconds, but why would we ever trust that without some substantiation? So we have so far executed an evaluation; it's relatively primitive, but we feel it supports the accuracy of our tool. The issue with evaluation is that there's no direct method. In other words, we can't directly observe what we're trying to measure, because in order to do that, we would again be incurring additional perturbation, and the actual thing that we're measuring would be changing.
I
We developed a method which basically evaluates the accuracy of our latency measurements by evaluating our ability to take an RTT measurement from the container and compare it with an RTT measurement made from the host. What we want is to be able to use our latency measurements to subtract the latency overhead from inside the container, and to create, you know, containers that can take measurements as accurate as those made on the host natively. So, to do this, we first take the RTT from the container.
I
Then the latency is computed by MACE, and we subtract the two. We call this the corrected RTT, and then we compare this to a ground-truth RTT of the network, which we use hardware timestamping to obtain. The evaluation does this over a single link and minimizes the network effects; of course, we're interested in the virtualization overheads here, so we want the network to be as quiet as possible.
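In symbols, as I read this setup, the two quantities being compared are:

```latex
\mathrm{RTT_{corr}} = \mathrm{RTT_{container}} - (\ell_{\mathrm{egress}} + \ell_{\mathrm{ingress}}),
\qquad
\mathrm{bias} = \mathrm{RTT_{reported}} - \mathrm{RTT_{hw}}
```

where the ℓ terms are MACE's per-packet ingress and egress virtualization latencies, and RTT_hw is the hardware-timestamped ground truth.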
I
So, the main result is presented here. This is the RTT bias, the difference between the reported RTT from one of these sources and the actual RTT as measured in hardware. The blue trace on the top is the RTT that's reported from inside the container, the raw RTT without any modification; it's cut off here. The x-axis is the number of traffic-generating containers that are running co-located on the same host. So you can see, even with just 30 other containers running on the host, we've already incurred...
I
...you know, over 300 microseconds of additional latency on our measurements that are taken from inside the containers. To compare this with the native versions, we have the green and the gray traces; these are native references that are taken from the host. Of course, they're not without their own overheads, so there is a bias between them and the hardware RTT. And then we finally show the container-corrected result, which is the black trace, barely visible above the green and gray traces there.
I
That's the one that we described previously as being the RTT taken from the container with the latency subtracted. In other words, after we've accounted for these latencies using MACE, we're able to estimate the same RTT results that we would get from running the process natively, to within 20 microseconds in this case. And a notable secondary observation here is that all of these RTT measurement tools are impacted by the addition of extra traffic going through the virtualization layers.
I
So it's not just tools running from the container: whatever you're running, if your tool is measuring RTT and there is extra traffic, either going through containers on the host or through other processes, you'll notice that you'll get some bias in your resulting measurements. Now, because we're using hash tables, and I'll go through this pretty quickly, sometimes we get hash collisions, which cause us to lose particular packets as we're correlating them through.
I
So we report a coverage, which is the number of latency reports divided by the number of packets, as a percentage. This part is scaled a little oddly; when I first drew it, I then realized the bottom line here is actually 95 percent, so we're doing okay. We noticed that we can mostly solve this problem by using larger hash tables.
I
If your hash tables are allocated larger, you get fewer collisions, so there's a trade-off between how much memory you want to allocate for the MACE system and how much coverage you want, how many of the packets you actually need RTT reports on. To address one of our other goals, of course, minimizing perturbation: we measure perturbation as the difference between the instrumented RTT reported and the RTT without instrumentation.
I
So for this, we basically just discard the latency results from the measurement, but we want to see how inserting this tool affects the RTT results. And we compare ourselves with ftrace so far. So the blue and the green traces on this chart are ftrace running from the container and from the native context, respectively, and the black and the gray traces are the same perturbation calculation, but taken with our solution, MACE, running instead of ftrace.
I
What we notice is that as there's more traffic, the ftrace perturbation increases significantly, and this is actually an effect that all traffic on the host would see, even traffic that was not associated with the measurements, from running ftrace. So this again goes to the point of why we don't want to use these generic tracers for this kind of application. But we noticed that MACE scales decently; it scales better.
I
Some initial evaluation of the runtime of the different components of MACE is presented here. The bars are again cut off a little bit here, but the group of bars on the left represents the results from a measurement running in the container, and the group of bars on the right represents results from a measurement running from native space. Then, within that, the subgroup on the left has four bars: the blue, red, yellow and light blue bars.
I
Those are the system call tracepoint probes, so we notice that they get hit the most, and they're also the slowest-running functions, so really we want to focus our efforts there. The pink and purple bars are the functions responsible for managing the ingress and egress correlation hash tables, and the green bar is the function responsible for pushing events into the namespace ring buffers. So you notice that the system call tracepoints are the slowest, and this is because our current correlation scheme actually requires the system call tracepoints to do copies.
I
So, real quickly, our goals for the future of this: we want to add TCP and UDP support. Of course, we're just doing ICMP for now, but you can imagine that ICMP is a small fraction of what internet measurement cares about, and we want to give users things like scans, TCP ping, or UDP ping. We're also looking at using hardware timestamps.
I
These are available pretty commonly on a lot of current NICs, and if we can use hardware timestamps, then we can account for not only the virtualization latency, but all of the latency that's incurred when you're sending packets down to the hardware level. Then, better in-flight correlation: as I said earlier, the correlation that we're using right now requires these really expensive copies, so it means either sticking stuff on the skb, or finding another correlation scheme. And we also need to address the ease of application of the correlation.
I
What I mean by that is, you saw earlier we were reporting the ICMP sequence number, and the reason that we have to do that is because otherwise it's very, very hard to connect the latency result that you get, if you're doing per-packet latencies, to the actual packets that you were sending. We tried just using timestamps, and the accuracy was just too far off at that time scale. So we're thinking about using TCP sequence numbers, or some other application-layer correlation.
I
The other thing that's possible is that, since we're doing processing in kernel space, we could also be running threads that would be calculating aggregation functions over the latencies: we could be doing moving averages, we could be doing minimums, maximums, histograms, or other things. Again, these are just things to tack onto the implementation. We're also looking at applying this tool.
I
So we want to use it to see if we can improve measurement accuracy significantly, for things like geolocation, for estimation of costs for businesses that care about latency, and we also want to start working on developing virtual network telemetry systems. This is the big issue that I was talking about earlier: when you observe a latency in your network, especially in a virtualized one these days, it's very, very hard to know what's responsible.
I
So we want to use MACE and some other tools, and maybe integrate them into a large-scale virtual network telemetry system that would answer those questions, so you could ask, where is this latency coming from, and we could say: oh, it's the bridge network, or it's the overlay network. So that's another sort of destination for these results; those are the applications we know of, and if there are others, let me know, we're open to whatever. So, in summary, with MACE we presented some of these issues...
I
...we proposed our solution, and we showed the evaluation of our solution. So, thank you. This work is supported by the [inaudible], and I'd like to thank the anonymous reviewers for their feedback, and also the CloudLab team for maintaining this great public resource. I'll be happy to take any questions.
A
[...]
J
Hey, thank you so much for the introduction. This is joint work with people from several universities and companies, including Nokia Bell Labs and Telefónica. So, the rationale of this talk is, first, to try to summarize the current standardization efforts in the IETF Application-Layer Traffic Optimization (ALTO) working group to support multi-domain use cases; then, to provide the key requirements on network information exposure to support such multi-domain use cases; and finally, to provide information about new and novel mechanisms and extensions to improve the base protocol.
J
So, the ALTO protocol provides network information to applications, in order to try to improve network resource usage while maintaining application performance. Basically, the network information is exposed as abstract maps: network maps, cost maps, unified property maps, etc., in order to try to protect the information privacy and improve the scalability. The origins of the ALTO protocol are in the typical application where a host, a peer, is trying to connect with other peers for file-sharing-oriented communication; now, however, ALTO is being considered in other scenarios, such as data center networks, CDNs, etc.
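For readers who have not seen ALTO's maps: RFC 7285 models them as simple JSON resources. Schematically, written here as Python literals (an illustrative fragment, not one from this talk), a network map groups prefixes into PIDs, and a cost map scores PID pairs:

```python
# Illustrative ALTO resources, shaped per RFC 7285.
network_map = {
    "meta": {"vtag": {"resource-id": "my-network-map", "tag": "1266506139"}},
    "network-map": {
        "PID1": {"ipv4": ["192.0.2.0/24"]},
        "PID2": {"ipv4": ["198.51.100.0/25"]},
    },
}

cost_map = {
    "meta": {"cost-type": {"cost-mode": "numerical",
                           "cost-metric": "routingcost"}},
    "cost-map": {
        # routing cost from source PID to destination PID
        "PID1": {"PID1": 1, "PID2": 5},
        "PID2": {"PID1": 5, "PID2": 1},
    },
}
```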
J
The application can be a peer-to-peer application, or it can be a resource directory, you know, such as a BitTorrent tracker, for example. In the latter case, the peers are not able to request information directly from the ALTO server; instead, the resource directory requests information from the ALTO server on their behalf, and afterwards provides the information to the peers.
J
Currently, ALTO provides a generic architecture for exposing network information to applications. In this picture, we have a high-level overview of the ALTO mechanisms and extensions. Among the ALTO mechanisms, we have the information resources directory, which allows ALTO to provide information about the available information resources, and we have the information consistency mechanisms, to specify the dependencies between different information resources.
J
We have an information update model, to support controlled push and incremental updates of information resources. ALTO also introduced different abstraction models: network maps and cost maps, in order to allow network location grouping and costs between the groups; the path vector abstraction, to provide more detailed information about the routing, using abstract network elements; and unified property maps, to provide properties and capabilities in a uniform way.
J
So this is an overview of the current ALTO work. We have seven RFCs, for example the problem statement and requirements; RFC 7285 defines the base ALTO protocol; and there are three more RFCs related to server discovery, deployment considerations, and multi-cost maps. There are several internet drafts to extend the base ALTO protocol, and a set of individual drafts, some of them related to multi-domain scenarios.
J
Here I am including some potential future topics under discussion to be considered: some of them are related to extensions for new architectures, extending ALTO for new settings such as the edge cloud, or the integration of the information model and the control model; and there is also work to survey ongoing ALTO implementations in real use-case scenarios. So currently, multi-domain use cases are emerging and re-emerging with the development of new technologies such as SDN, NFV and 5G.
J
Examples of such use cases include multi-domain collaborative data science and multi-domain flexible inter-domain routing. In the past few IETF ALTO meetings, different individual drafts were shown, summarizing the experience of developing multi-domain applications using ALTO; here we have an incomplete list of them. Related to multi-domain collaborative data science, here we have a couple of examples of premier scientific experiments, for example the Large Hadron Collider and the Square Kilometre Array. Such experiments push scientific discovery boundaries and rely on workflows that coordinate geographically distributed resources.
J
Here we have the example of the movement of LHC data from the collider to distributed storage sites. This type of scientific workflow uses multiple domains, with the available resources, to link the different application objects, and the key requirement of this distributed workflow is the ability to access multiple resources across multiple domains; therefore, the applications need a view of the topology and the resources.
J
Another use case is multi-domain service function chaining: an end-to-end network service often requires traversing physical or virtual functions in a specific order. For example, a mobile operator may want to offer a customer the possibility of configuring network services with the required resources, and such resources are expected to be available across multiple domains, maybe with different technologies or different administrations.
J
As I mentioned, an end-to-end service is composed of a set of virtual network functions in a particular deployment, and a crucial problem is both to discover the best candidate domains and the best feasible paths to connect such domains. In order to implement efficient network management, we need to implement an inventory, in order to get a real-time representation of the available network and compute resources; once the inventory is built, it becomes the information base.
J
Here we have another potential multi-domain scenario, where the network provides fine-grained capabilities, from inter-domain setups to multi-domain settings, to provide flexible inter-domain routing. Traditional inter-domain routing protocols are limited, because usually they provide single-path routing, limiting the clients' path choices. Flexible multi-domain routing allows users to specify routing actions at the provider network, with more flexible matching conditions and action choices, enforced by the network.
J
Here we have a couple of requirements for ALTO to support these new multi-domain use cases. Modern use cases require information on the properties and capabilities of different network resources, including transport resources, processing resources, storage, or services. In such use cases, you need to aggregate multiple resources in multiple networks, so a unified representation of the capabilities of multiple resources is a key requirement.
J
Even when network information is exposed, existing representations tend to focus on single domains. So, in the given use cases, the related information queries target multiple networks to compute end-to-end information; as such, a structure that supports the topology aggregation of multiple networks into a single end-to-end model is another key requirement.
J
So, the multi-domain abstraction is used to solve application-layer optimization across multiple domains. Typically, these use cases can be modeled as an optimization problem with an objective function f, which basically takes two types of variables: x, representing the network parameters, and y, representing the application parameters.
J
Such an objective function also comes with two types of constraints: constraints on x, representing the networking constraints, and constraints on y, representing the application constraints. Without true knowledge of the network, the application or its controller cannot know how the network limits the potential values of x, for example bandwidth, delay, packet loss, etc. So the multi-domain networking abstraction shown here aims at providing such constraints, in order to solve this optimization problem.
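Written out, one plausible reading of the formulation being described is:

```latex
\begin{aligned}
\min_{x,\,y}\quad & f(x, y) \\
\text{s.t.}\quad  & x \in \mathcal{X} \quad \text{(network constraints: bandwidth, delay, loss, \ldots)} \\
                  & y \in \mathcal{Y} \quad \text{(application constraints)}
\end{aligned}
```

where the network constraint set is exactly the part that ALTO can expose to the application.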
J
What is the basic formulation? The application interacts with the network by asking it to carry traffic for a set of flows. In this example, we have two flows that will traverse different networks. Each flow has a set of properties, for example: the path, representing the sequence of network devices that the packets of the flow will traverse; the delay; the available bandwidth; etc. In a multi-domain setting, a flow property may involve the network properties of multiple components; for example, the end-to-end delay property in a multi-domain setting should be the sum of the delay property in each network.
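For the delay example, a flow crossing domains 1 through n composes additively:

```latex
\mathrm{delay}(\mathit{flow}) \;=\; \sum_{i=1}^{n} \mathrm{delay}_i(\mathit{flow})
```

where each term is the contribution of one domain along the flow's path.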
J
[...]
A
[...]
H
[...]
J
Yes. The idea was to try to cover the standardization efforts in the ALTO working group, to provide new ideas and candidate mechanisms to improve the base ALTO protocol, to give more experimental evaluation in multi-domain scenarios, and to try to bring that into the standardization that may take place, if this takes off.
A
Thank you; let's thank the speaker. That is a hard problem to tackle, so thank you for presenting it. I think Colin wants to make... oh, let's see if there's any one question; we can manage one. Otherwise, we're going to turn it back over to the IRTF chair, Colin, for his closing remarks. Oh, Philippa first? Okay, all right. So, thank you very much.