From YouTube: IETF113-IRTFOPEN-20220322-1330
Description
IRTFOPEN meeting session at IETF113
2022/03/22 1330
https://datatracker.ietf.org/meeting/113/proceedings/
B
All right, it's a little after half past one, or I guess half past two in Vienna. From what I can see, we have a reasonable set of people in the room and online, so we should get started. Welcome, everybody. This is the IRTF Open meeting. My name is Colin Perkins, I'm the IRTF chair, and as you'll see I'm remote for this meeting. Thank you to Brian Trammell, who is standing in in person.
B
Thank you for standing in, Brian. Let's get started. As usual, I'd like to begin with a reminder of the intellectual property rules.
B
This is an IRTF meeting, and by participating you agree to follow the IRTF's processes and policies around intellectual property. This means that if you're aware of any patents or patent applications relating to the work you're talking about in this meeting, then you need to disclose those; the details are in the documents linked from the slides.
B
If you're participating in person and you're not wearing one of the red "do not photograph" lanyards, then you consent to appear in such recordings, and if you speak at the microphone, or if you're participating remotely and you turn on your camera and microphone, then you also consent to appear in these recordings. A reminder that this meeting is being recorded and live streamed.
B
The information you provide is handled in accordance with the privacy policy listed. Also a reminder that, as a participant or an attendee in these meetings, whether in person or remote, and on the mailing lists as well as in the meetings, you agree to work respectfully with the other participants, and you agree to follow the code of conduct and the anti-harassment procedures and so on, which are listed on the slide.
B
If you have any questions about this, please do contact the ombudsteam; again, the contact details are listed there, and they will help you out.
B
There's a special icon in the Datatracker agenda which allows you to join using that tool, and you need to use Meetecho to join the microphone queue, since we're managing a uniform microphone queue for both the local and the online participants. If you're using the on-site version of Meetecho, keep your audio and video off.
B
For remote participants, please leave your audio and video off unless you're chairing, presenting, or asking a question during the session, and again there's a "raise hand" button you can use to join the queue if you have questions. If you're having issues with the AV equipment, the URL on the slide allows you to report those, or mention Meetecho in the chat.
B
My usual reminder that the IRTF is a research organization, not a standards development organization. In the IRTF we're here to do research, and we're here to provide a venue where the academic research community and the industry research community can interact with the engineers and the standards setters in the IETF.
B
But
this
is
primarily
a
discussion
forum
and
it's
and
it's
a
forum
to
make
connections
and
while
the
irtf,
because
it's
associated
with
the
ietf,
can
publish
informational
or
experiment
experimental
documents
as
rfcs,
the
primary
output
of
the
research
groups
is
expected
to
be
understanding
and
research
results
and
academic
papers,
primarily
rather
than
necessarily
rfcs.
B
The IRTF is structured as a number of research groups; the slide shows the groups which are currently active. Those highlighted in dark blue on the slide (the Crypto Forum group, the Path Aware Networking group, Information-Centric Networking, Measurement and Analysis of Protocols, Network Management, and Human Rights) are still to meet. Of those highlighted in light blue, the Quantum Internet group met in the previous session, the Global Access group earlier this morning, and the Privacy Enhancements group yesterday, and the others are not meeting this week.
B
A number of them also have interim meetings co-located this week.
B
So do please look out for those sessions later in the week, and do join those sessions. Also, for the research groups, I'd like to welcome Sofía Celi, who has recently joined as co-chair of the Human Rights Protocol Considerations group, which is meeting tomorrow morning, and who is replacing Avri Doria, who stepped down last year.
B
One of the things we run is the Applied Networking Research Prize, which we organize in conjunction with the Internet Society. The Applied Networking Research Prizes are awarded for the best recent results in applied networking: interesting new results that are potentially of future relevance to the standards community. They are awarded to try and recognize upcoming people who are likely to have an impact on the standards and technologies of the future.
B
The first talk will be by Sangeetha Abdu Jyothi, for her work on the resilience of the internet infrastructure to solar events, coronal mass ejections and so on, and the second talk will be by Bruce Spang, who will be talking about A/B testing and some of the fairness problems and biases that arise when conducting A/B tests in the presence of network congestion. We've got two really nice talks coming up, and recordings of these talks, the slides, and links to the papers are available on the website.
B
So please do stick around for those; we've got two really good talks coming.
B
In addition to the prizes, we also run the Applied Networking Research Workshop, which we organize in conjunction with ACM SIGCOMM.
B
This year it is expected to be a hybrid meeting, with the in-person component in Philadelphia. I believe the program chairs for the workshop this year are Taejoong Chung from Virginia Tech and Marwan Fayed from Cloudflare. The call for papers has just gone out recently; paper submissions are due at the end of April, and the link on the slide points to the call for papers and more details.
B
Please
do
consider
submitting
your
research
results
to
the
applied
networking
research
workshop.
This
is
a
forum
for
for
the
researchers,
vendors
operators
and
the
standards
community
to
come
together
and
talk
about
their
their
latest
applied.
Networking
research
results.
We've
had
a
bunch
of
really
good
papers
in
previous
years,
so
so,
please
do
consider
submitting
this
year.
B
And the final thing I wanted to mention before we go into the talks is that we are pleased to have been running a travel grant program sponsored by Netflix and Comcast, and it's good to get back to this.
B
It's
good
to
you
know,
have
have
people
back
in
in
person
to
at
least
a
limited
degree
at
this
meeting,
and
we're
very
pleased
to
have
provided
a
number
of
travel
grants
for
early
career
academics
and
phd
students
from
underrepresented
groups
to
attend
the
irtf
meetings
that
are
co-locating
with
the
ietf
this
time
going
forward.
B
Going forward, we do expect to offer travel grants for the next IETF meeting in July and for the Applied Networking Research Workshop, so keep an eye on the URL on the slide for details of those going up over the next few weeks.
B
And with that I am done. What we have coming up next are the Applied Networking Research Prize talks, the first being Sangeetha's, and then, in about 30 minutes or so, Bruce will be talking about unbiased experiments in congested networks.
B
There should be a little document icon just next to the hand icon in the top left.
B
Okay, so while that is loading: the first of the awards today goes to Sangeetha Abdu Jyothi for her work on the resilience of the internet infrastructure to solar superstorms. Sangeetha is an assistant professor at the University of California, Irvine, and an affiliated researcher at VMware Research; she leads the networking, systems, and AI lab at UC Irvine, and her research is on internet resilience, systems, and machine learning.
B
The paper underlying this talk, "Solar Superstorms: Planning for an Internet Apocalypse", was first presented at the ACM SIGCOMM conference last year. Sangeetha, over to you.
D
The revenue loss per hour for some of the largest companies is estimated to be tens of millions of dollars, and at the scale of countries it can be hundreds of millions per hour, or billions per day. And it's not just the economic impact: many other critical infrastructures, such as health care, depend on the internet and would be severely affected. Now, what if we had an internet outage lasting weeks or months and spanning large areas across the globe?
D
One
might
think
that
such
a
global
outage
is
never
going
to
happen,
because
the
internet
is
racing
and
distributed
systems,
but
unfortunately
this
is
a
worst
case
scenario
that
could
happen
in
our
lifetime.
D
Their impact on global infrastructure varies widely. Solar flares involve large amounts of emitted energy in the form of electromagnetic radiation, so they are essentially flashes of light that reach the earth in just eight minutes.
D
Now
coronal
mass
ejections
involve
emission
of
electrically
charged
solar
matter
and
the
accompanying
magnetic
field
into
the
space,
and
these
cmes
can
take
anywhere
from
13
hours
to
5
days
to
reach
the
earth
based
on
their
speed,
and
these
cmes
are
capable
of
causing
significant
damages
to
crystal
infrastructure.
So
cmes
are
the
focus
of
this
talk
and
we'll
discuss
their
impact
more
closely.
Soon.
D
Both solar flares and CMEs originate near temporary dark spots on the sun caused by concentrations of magnetic field flux, which are called sunspots, and when the number of sunspots on the surface of the sun increases, there is a higher probability of CMEs occurring.
D
The magnetized CMEs are highly directional, and when the earth is in the direct path of a CME, it interacts with the earth's magnetic field and induces large electric fields on the earth's surface through electromagnetic induction. This can cause geomagnetically induced currents (GIC) to flow through long cables that have ground connections located far apart on the earth's crust. This could include power transmission lines, oil pipelines, and internet cables that have an electrical connection spanning hundreds or thousands of kilometers.
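As a back-of-the-envelope illustration (my sketch, not part of the talk; the field strength and cable length are made-up illustrative values), the end-to-end voltage on a grounded conductor is roughly the geoelectric field integrated along the cable's length:

```python
# Hypothetical illustration: the end-to-end voltage on a grounded conductor
# is approximately the (uniform) geoelectric field times the cable length.
def induced_voltage(field_v_per_km: float, length_km: float) -> float:
    """Approximate induced voltage in volts for a uniform geoelectric field."""
    return field_v_per_km * length_km

# Illustrative numbers: a storm-time field of a couple of V/km over a
# transoceanic distance of several thousand km.
print(induced_voltage(2.0, 6000.0))  # 12000.0
```

Even a modest field, sustained over a transoceanic distance, yields kilovolts end to end, which is the scale of concern the talk describes.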
D
Due to the orientation of the earth's magnetic field, higher latitudes are at a significantly higher risk; it's for the same reason that auroras are common closer to the poles. And since this impact is caused by interactions with the earth's magnetic field, the induced currents can affect wide areas and are not restricted to the portion of the earth that is facing the sun.
D
GIC is only induced in cables with ground connections, in a conductor whose ground connections are located far apart on the earth's surface. Finally, the orientation of the conductor does not affect the risk associated with GIC: the north-south or east-west orientation of the cable does not influence the induced current.
D
Now, let's move on to the more important question: when will a large event that affects the earth happen next? Solar events are extremely hard to predict, just like earthquakes or any other natural disaster. Small-scale solar storms happen all the time; what we are more interested in are solar superstorms that can have a significant impact on our lives.
D
The 1859 event is popularly known as the Carrington event, and both these events triggered extensive power outages and caused significant damage to the communication network of that time, which was the telegraph network. Very recently, in 2012, a Carrington-scale event missed the earth by just a week. Now, estimates of the probability of occurrence of such extreme space weather events that directly impact the earth range from 1.6 percent to 12 percent per decade.
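To put those per-decade figures in context, one can compound them over a longer horizon. A minimal sketch (my illustration, not from the talk; it assumes the per-decade probability is constant and that decades are independent):

```python
# Chance of at least one extreme event over n decades, assuming a constant
# per-decade probability p and independence across decades.
def prob_at_least_one(p_per_decade: float, decades: int) -> float:
    return 1.0 - (1.0 - p_per_decade) ** decades

# Using the quoted 1.6%..12% per-decade range over a 50-year horizon:
print(round(prob_at_least_one(0.016, 5), 3))  # 0.077
print(round(prob_at_least_one(0.12, 5), 3))   # 0.472
```

Under these assumptions, even the low end of the range implies a non-negligible chance within a 50-year span, and the high end approaches a coin flip.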
D
But that's not all: the risk is not uniform across the years, because solar activity goes through cycles. Here we have the variation in the number of sunspots across the years, from around 1700 until recently, and we can see that solar activity waxes and wanes in cycles with a period of approximately 11 years.
D
So
during
solar
maxima,
there's
an
increase
in
the
number
of
sunspots
and
hence
an
increase
in
the
frequency
of
cmes
and
other
solar
elements.
And
it's
not
just
that.
The
solar
cyc
solar
activity
also
goes
through
a
longer
term
cycle
that
is
approximately
80
to
100
years,
so
the
peak
solar
activity
very
significantly
across
this
hundred
year
cycle
as
well.
D
This causes the frequency of solar events to vary by a factor of four across solar maxima in this hundred-year cycle, and if we zoom in, we can see that modern technological advancement coincided with a period of weak solar activity over the past three decades.
D
Now we are at the beginning of solar cycle 25, and the sun is expected to become more active in the near future, but due to the absence of extreme activity in the recent past, we have overlooked the impact of these events on the internet infrastructure.
D
Today we know that long-distance land and submarine cables carry signals in optical fibers, and the optical fiber itself is immune to induced currents, because it carries light and not electric current. But these cables also have repeaters at 50-to-150-kilometer intervals, which are connected in series and powered by a conductor that runs along the length of the cable, and this conductor is susceptible to damage from induced currents.
D
So
today
we
don't
have
any
good
failure
models
for
understanding
the
failure
characteristics
of
long
distance
cables,
but
the
expected
induced
voltages
along
the
length
of
the
cable
could
be
one
or
two
orders
of
magnitude
higher
than
what
the
power
system
associated
with
the
cable
today
can
is
capable
of
handling,
and
today's
cables
have
not
been
stressed
tested
on
the
induced
voltages.
D
So
long
distance
cables
are
in
short,
vulnerable
internet
routers
can
be
protected
from
direct
voltage
surges
using
voltage
suppressors,
so
most
of
our
localized
infrastructure
can
be
protected,
so
they
are
safe.
Now,
moving
on
to
wireless
infrastructure,
satellites
are
directly
exposed
to
solar
storms.
D
They
generally
have
radiation
shielding
for
protection
from
high
energy
electrons,
but
with
very
strong
storms
can
cause
electrons
to
penetrate
deeper
into
the
interior
regions
of
satellite
and
damage
its
electronic
confidence
and
solar
storms
can
also
cause
drag
on
satellites
which
can
lead
to
satellites
losing
their
orbit
and
re-entering
the
atmosphere
and
burning
up.
D
So
very
recently,
there
was
news
on
40
starling
satellites
that
were
lost
to
geomagnetic
storm
when
a
small
scale
storm
hit,
but
the
the
satellites
were
just
being
deployed
and
they
were
at
lower
altitudes
where
this
risk
of
drag
is
much
higher
and
that's
how
they
were
lost.
So
satellites
in
general
are
vulnerable
and-
and
this
is
well
known,
cell
towers
are
protected
from
direct
exposure
because
they're
on
the
on
the
ground
and
protected
by
the
atmosphere.
D
Similarly, personal devices such as laptops and mobile phones are also safe. So, to summarize: long-distance cables and satellites are the most vulnerable components of the internet. Other components could suffer from power outages, but they are not susceptible to direct damage.
D
Now, analyzing the impact. To understand the impact on internet infrastructure, I looked at a broad set of datasets covering various internet components. The submarine cable map consists of the submarine cables connecting the various continents, and the ITU cable map contains land cable information from across the globe, collected from regional entities.
D
Here we look at the distribution of submarine cable endpoints. We know that higher latitudes are more vulnerable to induced currents from solar events, particularly latitudes above the 40-degree threshold, so above 40 degrees north and below 40 degrees south, and we evaluate the distribution of infrastructure components in this region.
D
Here we have the probability density function across the latitudes, plotted on the x-axis. We look at the distribution of population and submarine cable endpoints, and we see that the population is more concentrated at lower latitudes, while submarine cable endpoints are concentrated at higher latitudes; especially between the US and Europe, we see a higher concentration of submarine cable endpoints in the more vulnerable regions.
D
Next we look at the distribution of various internet infrastructure components across the latitudes. Here on the x-axis we have the latitude threshold, and on the y-axis we have the concentration of an infrastructure component, internet routers, IXPs, or DNS root servers, above that threshold.
D
The most vulnerable region is above the 40-degree threshold, which is denoted by the dotted line here, and we can see that only about 16 percent of the population is in this region, but 35 to 45 percent of the internet infrastructure is in the vulnerable region. This holds for the other components as well; you can see more analysis in the paper.
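The latitude-threshold analysis described here can be sketched as follows (a toy example with made-up coordinates, not the paper's datasets):

```python
# Sketch of the vulnerable-region analysis: given component latitudes,
# compute the fraction in the region above 40 degrees north or below
# 40 degrees south, where induced-current risk is higher.
def fraction_vulnerable(latitudes, threshold=40.0):
    return sum(1 for lat in latitudes if abs(lat) >= threshold) / len(latitudes)

# Toy router latitudes (illustrative, not real measurement data):
routers = [51.5, 48.9, 37.8, 1.3, -33.9, 52.4, 40.7, 59.3, 19.4, 55.8]
print(fraction_vulnerable(routers))  # 0.6
```

Comparing the same statistic for a population distribution against an infrastructure distribution is what exposes the skew the talk reports (about 16 percent of population versus 35 to 45 percent of infrastructure above the threshold).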
D
Next, we look at cable length analysis. Here on the x-axis we have the cable length in kilometers, on a log scale; this is a comparison of the CDFs of the lengths of land cables versus submarine cables.
D
Here,
the
submarine
cable
data
set
is
complete.
It,
it
comprises
of
almost
all
existing
submarine
cables.
The
lan
cable
data
set
is
not
complete,
but
if
this
was
the
largest
publicly
available
data
set,
so
here
shorter
cables
don't
need
repeaters.
Only
cables
have
a
longer
than
150
kilometer
need
repeaters
and
need
that
conductor
along
the
length
of
the
cable
which
is
susceptible
to
the
damages.
D
We observe that more than 70 percent of the land cables don't need repeaters, and hence are not vulnerable, but only about 20 percent of the submarine cables are shorter than this threshold, which means that nearly eighty percent of the submarine cables need repeaters and hence are susceptible to damage from induced currents.
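The threshold comparison described above can be sketched like this (illustrative lengths, not the paper's data; the 150 km figure is the repeater spacing upper bound mentioned in the talk):

```python
# Sketch: compare the fraction of land vs submarine cables short enough
# to avoid repeaters (and hence the vulnerable powering conductor).
REPEATER_THRESHOLD_KM = 150.0

def fraction_without_repeaters(lengths_km):
    return sum(1 for l in lengths_km if l <= REPEATER_THRESHOLD_KM) / len(lengths_km)

# Toy length samples in km (illustrative only):
land = [12, 40, 90, 120, 145, 160, 300, 80, 60, 110]
submarine = [130, 500, 2200, 6500, 180, 9000, 400, 12000, 140, 3500]

print(fraction_without_repeaters(land))       # 0.8
print(fraction_without_repeaters(submarine))  # 0.2
```

With real datasets this is just a point read off each CDF at 150 km, which is how the talk's roughly 70 percent (land) versus 20 percent (submarine) figures are obtained.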
D
The US faces a higher risk of losing connectivity to Europe during a solar superstorm. On the east coast, most cables between the US and Europe are concentrated between the northeast of the US and the UK, which was most likely done for lower latency, and there are no connections from Florida to southern Europe, which is in the less vulnerable region.
D
And here we have the locations of public data centers across the globe; red shows higher density, followed by orange and then blue. We observe that data centers are concentrated in the vulnerable high latitudes, and this observation also holds for hyperscale providers such as Google, Facebook, Microsoft, and so on.
D
What about other infrastructure components? Here we have some results from an analysis of the DNS, the domain name system, and of autonomous-system connectivity. DNS root servers are highly distributed, and hence they will remain reachable even under very heavy network partitioning. However, location data on top-level domain servers and other authoritative servers was not available, so that analysis has not been done yet, but the root servers themselves are highly distributed and remain reachable when we analyze the autonomous systems.
D
So
next
I
discuss
some
of
the
open
questions
and
challenges
related
to
this
thread.
D
So
the
paper
my
work
primarily
looked
at
internet
partitioning,
based
on
submarine
cables
alone.
Understanding
the
impact
on
end-to-end
behavior
of
applications
remain
an
open
challenge.
This
needs
to
take
into
account
both
land
and
submarine
cables
and
also
better
failure
models
for
cable
failures.
So
currently
there
are
some
geophysicists
working
on
developing
failure
models
for
internet
cables.
D
Next, we need to devise solutions for improving the long-term resilience of the infrastructure. For example, this could involve adding new cables in locations that are less vulnerable.
D
So this gives us a short interval to prepare for an impact. How can we use this lead time effectively? We can develop plans that allow us to prepare: maybe we can cache data across data centers globally in an effective way. How can internet service providers react? Do we need to increase cache times in the DNS system? There are a lot of interesting questions on how we can use this lead time effectively, and, more importantly, we also need to rethink our models of failure analysis.
D
Can we look at alternative solutions that can provide temporary connectivity after an impact? Examples could be internet-connected balloons or high-altitude platform stations that are mobile, and ideally these solutions should rely on renewable power sources, such as solar energy, to guarantee connectivity even during power outages. So that's another interesting direction of research.
D
Finally,
internet
and
power
grids
are
both
designated
as
uniquely
critical,
because
all
other
critical
systems
rely
on
them.
They
are
also
interdependent
on
each
other
and
both
are
susceptible
to
failures
from
solar
superstorms.
Hence
we
need
to
study
the
joint
failure
characteristics
of
this
complex
in
the
dependent
system.
They
also
have
very
different
failure
characteristics.
For
example,
if
we
consider
the
united
states
there
are
three
regional
power
grids:
the
western
interconnection
eastern
in
the
connection
and
texas
interconnect.
D
Now, if the power grid in the east fails, it will not cause any significant effects or overload on the Western Interconnection. On the other hand, if all cables connecting to the east coast fail, there will be significant shifts in BGP paths and potential overload of the cables on the west coast. So the internet is more global compared to the power grids, and even regional failures can have significant consequences for the broader internet. Understanding this interdependency is an interesting question.
D
These are just a few open problems; there are many more. Now, to summarize.
D
So
space
weather,
particularly
coronal,
mass
ejections
from
the
sun,
pose
a
significant
risk
to
our
internet
infrastructure
and
modern
technological
advancement
coincided
with
the
period
of
weak
solar
activity.
So
the
internet
infrastructure
has
not
been
stress,
tested
under
strong
solar
events
and
the
complete
extent
of
this
threat
is
yet
to
be
estimated.
D
Internet infrastructure components are skewed towards the highly vulnerable regions in the higher latitudes, and based on preliminary analysis this can affect the overall resilience of the system, but a lot more work needs to be done in this context. Finally, we need to work towards better understanding and improving the resilience of the global internet infrastructure.
B
Okay, thank you, Sangeetha, a very interesting talk and a really interesting problem you're looking at here. Wes, do you have a question?
F
I do. So, first off, thank you for the excellent work. I'm a ham radio operator, and I'll pass this on to some friends who are interested in it, because we consider that type of stuff all the time; there are a number of ham radio operators in the IETF. I'm also a root server operator, and so one of the things that I'm interested in is, it's worth noting, first off, that for the root instances, or DNS instances, maybe their particular installation does not pose a problem in itself.
F
But if they become disconnected from the rest of the system, then they will fail to get updates, at which point DNS will stop validating after a while, as the signatures expire, regardless of the zone. Did you study islands, or did you find isolated islands of communication where they would be functionally cut off? I mean, the nice thing about the routing system
F
Is
that
you
know
if
a
major
cable
gets
taken
out,
but
maybe
there's
a
link
to
greenland
to
you
know
something
like
that
that
you
can
actually
still
get
traffic
through
it'll
become
congested,
but
did
you
were
able
to
identify
any
particular
areas
that
are
completely
shut
off,
where
you
know
they'd
be
disc,
completely
cut
off
from
communication
for
a
while.
D
Yeah,
that's
a
that's
a
very
important
question,
so
I
don't
have
a
complete
answer
yet
so
my
my
initial
paper
only
looked
at
primary
regions
of
that
will
be
affected,
but
not
the
end
to
end
disconnection
on
the
complete
graph.
So
currently
I
have
students
that
are
working
on
this
topic,
so
I
don't
have
a
complete
answer
yet,
but
it
could
be
possible
that
there
are
very
big
islands,
but
not
not
a
lot
of
tiny
islands.
D
It
seems
like
so
that,
based
on
the
preliminary
analysis,
it
most
of
asia
and
so
the
asia
to
europe
connection
will
probably
stay,
but
the
us
to
europe
might
be
more
affected,
so
complete
disconnection.
I
don't
know
at
this
point,
but
a
significant
reduction
in
capacity
between
the
bigger
land
masses
is
possible.
B
A
little
bit
of
chat
about
vulnerability
of
satellites
in
in
the
chat.
I
don't
know
if
you
saw
that.
D
Yeah, the question on the impact on satellites. Nicolas asks: there are regular solar storms and, apart from the Starlink accident with very-low-earth-orbit satellites, satellites are fine; do you have pointers to justify the assumptions on satellites? So that's a very important question. As I mentioned in the talk, these satellites have a protective covering, so typically, when they're deployed,
D
They
have
a
children
that
can
protect
them
from
solar
activity
for
their
lifespan,
which
is
five
to
ten
years
or
so
so
it
was
only
in
the
past
decade
or
two
that
the
number
of
satellites
grew
really
exponentially
and
we
have
not
had
a
very
large
storm
since
then,
so
the
starling
satellites
were
affected
because
after
they
were
installed
at
the
lower
altitude
than
their
final
orbit,
two
back
to
back
very
small
scale.
D
Stormed
storms
happened,
so
the
solar
storms
are
ranged,
thus
the
the
intensity
of
these
storms.
They
are,
they
range
from
g1
to
g5,
where
g1
being
the
weakest
and
g5
being
the
strongest.
So
the
storms
that
hit
the
starling
satellites
were
g1,
which
were
really
weak,
but
it
took
down
the
satellites
because
the
satellites
were
at
a
lower
altitude
than
their
final
destination
altitude.
D
Now,
if
a
g5,
which
is
the
most
intense
storm
happens,
we
don't
know
the
impact,
what
the
impact
would
be
at
even
the
current
orbital
altitudes.
So
the
radiation
shielding
can
offer
some
amount
of
protection,
but
it's
not
guaranteed
that
they'll
be
completely
protected
and
not
just
that
these
satellites
fall
out
of
orbit
very
often
and
they
have
mechanisms
to
they
orbit
and
to
in
order
to
do
that,
we
need
to.
D
We
need
to
have
connection
with
the
earth
ground
stations
which
can
detect
the
current
altitude
and
correct
the
satellite's
position,
and
the
solar
storm
can
extend
for
hours,
maybe
10
to
12
hours
in
the
case
of
varying,
then
storms.
So
if
we
lose
connectivity
to
a
satellite
for
such
extended
periods
and
they
fall
out
of
orbit,
it
is
possible
that
they
could
fault,
even
though
there
are
thrusters.
If
we
cannot
communicate
with
the
satellites
they
could
get
damaged.
D
So
before
the
current
international
space
station
was
installed.
U.S
had
something
called
skylab,
which
was
the
previous
space
station
and
that
was
destroyed
in
the
1970s
during
the
large
solar
storm
and
they
couldn't
connect
with
it
to
push
push
it
back
up
into
the
orbit.
So
with
very
large
storms
that
extend
for
hours,
it
is
possible
that
satellites
could
be
destroyed,
but
we
don't.
We
have
not
experienced
something
that
large
very
recently.
D
Yeah,
so
there
is
a
question
on:
is
the
impact
focused
on
high
latitude
or
sun
facing
side
or
are
induced
currents
the
same
globally?
So
when
it
comes
to
the
direct
impact
on
satellites
satellites
that
are
facing,
the
sun
are
at
a
much
higher
risk,
but
when
we
look
at
the
induced
currents,
these
are
caused
by
interaction
of
these
magnetic
particles
with
the
earth's
magnetic
field.
So
in
the
case
of
induced
currents,
it's
not
just
the
sun
facing
side.
D
The
dark
side
is
also
equally
vulnerable,
but
higher
latitudes
on
both
the
sun
facing
side
as
well
as
the
dark
side,
are
equally
vulnerable.
B
Okay, thank you. I had a question: since this is an IETF and IRTF meeting, is there anything we should be doing differently when we design protocols which would help with resilience to this type of event?
D
Yeah, so I think we need to rethink resilience at every layer of the stack. With DNS, as I said, we know that the root servers are very well distributed, but we don't know how the entire hierarchical tree would fare under a solar storm. Do we need to change our caching intervals, or change how we manage DNS records?
D
That's
not
clear,
and
then,
when
it
comes
to
protocols,
it
seems
like
bgp,
which
allows
only
like
a
single
path
that
might
be
too
restrictive
when
the
capacity
is
severely
limited.
D
So
that's
another
analysis
that
we,
what
we're
planning
to
do
like
how
how
would
bgp
affair
in
such
a
scenario,
but
within
an
ais
like
ospf
or
the
other
entrance
routing
protocols,
will
fare
very
well
because
they
are
decentralized
and
they
can
use
whatever
parts
are
available
in
a
decentralized
fashion,
so
they
seem
mostly
safe.
It's
the
in
their
domain
protocol
that
needs
more
investigation.
B
Yeah, okay, that makes sense. I guess there's also a whole bunch of coordination and management issues with large-scale cloud provider infrastructure networks and so on as well. Some interesting problems there, okay. Are there any other questions for Sangeetha, or should we move on to the other talk?
B
Okay, I guess there were no more questions, so in that case, thank you, Sangeetha, an excellent talk.
B
All right, next up is Bruce Spang. Bruce, are you there? Yes. So the second award today goes to Bruce Spang for his work showing that networking algorithm A/B tests can be biased because of network congestion.
B
Bruce
is
a
fifth
year
phd
candidate
at
stanford,
studying
internet
networking,
video
streaming
and
theoretical
computer
science
he's
also
been
a
graduate
research,
fellow
at
netflix
for
the
past
three
years,
working
on
video
algorithms
and
in
in
the
past,
he
studied
at
umass
amherst
and
also
worked
as
a
software
engineer
at
fastly.
B
The
paper
he'll
be
presenting
today
is
unbiased
experiments
in
congested
networks,
which
was
originally
presented
at
the
acm
internet
measurement
conference
last
year.
Bruce
you
should
be
able
to
press
the
share,
preloaded
slides
button,
great.
B
Share
some
stuff
slides,
not
the
screen,
just
listen
to
you.
Instead,
just
for
your
animation
itself,
it
works
better
if
you
can
do
it
with
the
slides,
but
so
we
can
scream.
B
Okay, ready when you are.
H
Great, well, thanks so much for the introduction. I guess, first off, I wanted to start by saying thank you: thank you so much for this award. Some of my all-time favorite research papers have received this award, and so I'm really honored that our work is also receiving it.
H
Our
work
is
about
running
experiments
in
congested
networks,
and
this
is
some
joint
work
with
veronica
hannan,
sravya
from
netflix
and
my
advisors,
nick
mckeon
and
ramesh
jahari
from
stanford.
H
So
we've
got
a
pretty
typical
way
that
we
do
this,
and
normally
we
come
up
with
some
intuition
about
why
the
new
album
is
better.
Maybe
we
run
some
lab
experiments
that
show
specific
instances
where
the
new
algorithm
is
better
and
then
finally
we'll
go
and
we'll
run
a
production,
a
b
test
so
like
a
randomized
experiment
on
actual
traffic,
where
we
try
out
our
new
algorithm
and
see
if
it
actually
works
well
in
practice.
H
So
this
is
a
pretty
standard
way
to
evaluate
algorithms
on
this
slide
are
a
couple
of
papers
from
the
past
decade
or
so
where
they
rely
on
an
a
b
test
to
argue
that
their
approach
is
better
than
the
state
of
the
art,
and
so
this
is
just
a
subset
of
the
the
papers
that
have
been
published
and
also
there's
lots
of
unpublished,
work.
So
the
team
that
we
work
with
pretty
closely
at
netflix.
H
So in an A/B test we have a whole bunch of users. There's normally a bunch of smiley faces on this slide, so imagine a bunch of smiley faces here. These smiley faces are going to be units, and these units are something like users, video sessions, or servers, and what we do is we randomly assign traffic to either treatment or control.
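The sticky random assignment just described is commonly implemented by hashing a stable unit identifier. A minimal sketch in Python (the hashing scheme and the function name are illustrative assumptions, not how Netflix allocates traffic):

```python
import hashlib

def assign_arm(unit_id: str, experiment: str, treatment_fraction: float = 0.5) -> str:
    """Deterministically map a unit (user, video session, server) to an arm.

    Hashing the unit id together with the experiment name makes the
    assignment sticky per unit, but independent across experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{unit_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return "treatment" if bucket < treatment_fraction else "control"
```

Salting the hash with the experiment name keeps one unit's assignments uncorrelated across concurrently running experiments.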
H
So this is a generalization. This statement that an algorithm improves performance is kind of making a statement about what the world would be like if we went and deployed the algorithm, based on the results of an A/B test, and typically an A/B test is a pretty small-scale experiment, maybe one percent or less of global traffic.
H
So this is a fine sort of generalization to make, but it's a generalization that requires some assumptions, and one of the big assumptions is that the outcome of one unit in the test doesn't depend on the other units in the test. So, for instance, what happens to me when treatment is applied to me doesn't depend on whether treatment is applied to you.
H
So let's imagine you've got some social network, and you've got a messaging application for this social network, and you want to test some new feature for this messaging application. If you randomize users to treatment and control, and this feature does something that increases usage for the treatment group, then maybe a user who's in the treatment group goes and messages all their friends who are in the control group. This will increase usage for both the treatment and control groups, and this will bias the outcome of the test.
H
There are a lot more examples of interference out there; we've got links to all sorts of papers in the presentation and in the paper.
H
It should become pretty clear that interference exists in congested networks. We've got treatment traffic and control traffic in an A/B test, and this traffic shares the internet: it shares networks in the internet, it shares queues in the networks' routers, and links in the network. And there are things that we know, from, say, congestion control research, that the treatment algorithm can do to affect the network, like increasing or decreasing the length of the queues in the network.
H
So interference clearly exists in congested networks, but this raises two questions. The first question is: does this matter? Maybe there's a little bit of interference, a little bit of bias, but it doesn't really change the outcome of the tests we run at all. Or maybe it does, and maybe it actually would significantly change the decisions we would make as a result of these tests.
H
So this experiment we ran in cooperation with Netflix, and it was an experiment involving bit rate capping. First, let me tell you what bit rate capping is. Back at the beginning of COVID-19, everyone started staying home, load on the internet went way up, and governments around the world worked with large video streaming services like Netflix to reduce load on the internet. The way they did this was by capping bit rates, and overall this reduced the load that Netflix was sending out by about 25 percent.
H
So what did this mean technically? Netflix videos are split up into segments, and you can think of each segment as a couple of seconds of video. Each segment is encoded at a number of different qualities: there's a very high quality one, which looks good but tends to be pretty large in terms of bytes, and then there's a range of qualities going down to some low quality, which looks less good but is much smaller and faster to transfer.
H
So an algorithm goes and picks a quality level at each segment, and then at the next segment it can pick a different quality level. For bit rate capping, what Netflix did was just stop serving the highest quality levels, so they just got rid of those, and so for the same seconds of video they would serve fewer bytes.
H
So let's imagine we've got some congested link, this link up here on the top left. We're thinking about a congested link here because the whole point of this bit rate capping was to reduce congestion in the first place. So let's imagine we took all the traffic to this link and capped this traffic.
H
So if we were to go and run an A/B test, one possibility is as follows. We've got this capped and control traffic sharing a link. The capped traffic could reduce the data that it's sending, the control traffic could be unaffected, and there could be some free space available on the link, and in this case the link would not be congested. So in an A/B test, the result that we would see is that the capped traffic uses less bandwidth than the control traffic.
H
But what we wouldn't see is the impact on congestion, because we would be comparing the capped traffic to the control traffic, and they're using the same link, which is not congested, and so we would see similar metrics for congestion: we'd see similar packet loss, similar queueing delay, and so on.
H
A second possibility, if we were to run an A/B test, is that the control traffic could be running some sort of congestion control algorithm, or some sort of adaptive bit-rate algorithm, and it could notice that there's some free space available on the link, start sending faster or higher quality data, and fill up the rest of the link.
H
In this situation, the link could stay congested, and in the result of an A/B test, what we would see is that the capped traffic is using less bandwidth. Directionally we've got the right answer, but here we would get the wrong ratio: we'd see that it's using about a third as much bandwidth instead of half as much bandwidth.
H
So now I'd like to tell you about an experiment we ran to measure congestion, to measure the actual effect of bit rate capping. In order to do that, I need to describe the experiment we ran, and in order to do that, let me introduce another way of thinking about the same thing that might make the experiment design a little bit cleaner.
H
So in this setting we can visualize the behavior as follows. On the x-axis here is the fraction of capped traffic; on the y-axis is a measurement of per-session throughput. We've got a bunch of different video sessions here that are sharing one link, and so we can look at the per-session throughput for those video sessions.
H
This line here is going to be the throughput for the capped sessions, and it doesn't depend on the fraction of traffic which is capped, because there's some upper limit that we send at. As we increase the fraction of traffic which is capped, there'll be more space available on the link, and so we'll continue sending at this upper level.
H
So what we're interested in measuring here is what happens if we were to go and deploy bit rate capping. That's the difference between "100 percent of traffic is capped", which is over here on the right-hand side, and "100 percent of traffic is running the control algorithm", which is over here on the left-hand side. The difference between these two points we call the total treatment effect, and this is the quantity that we're interested in measuring.
H
If we were to go run an A/B test, we would not necessarily measure the total treatment effect. Instead, what we would do is pick one point on the x-axis, where we allocate some fraction of traffic to treatment and control, and then we would compare the two points vertically along this line. So we could run this 50 percent A/B test and look at the difference between capped and control along this line, as you can see from the graph.
H
This is just one point, and it might be a biased estimate of the total treatment effect.
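This bias can be reproduced in a toy model. The sketch below (all numbers and the max-min fair-sharing model are made-up illustrative assumptions, not from the paper) compares a naive 50/50 A/B estimate on one shared link with the total treatment effect, i.e. the difference between an all-capped and an all-control world:

```python
def shares(n_capped, n_uncapped, capacity=100.0, cap=5.0, demand=20.0):
    """Per-session throughput under max-min fair sharing of one link.

    Capped sessions never take more than `cap`; uncapped sessions split
    whatever capacity remains, up to their demand.
    """
    n = n_capped + n_uncapped
    capped_rate = min(cap, capacity / n) if n_capped else 0.0
    leftover = capacity - n_capped * capped_rate
    uncapped_rate = min(demand, leftover / n_uncapped) if n_uncapped else 0.0
    return capped_rate, uncapped_rate

# Total treatment effect: compare an all-capped world with an all-control world.
capped_world, _ = shares(n_capped=10, n_uncapped=0)      # 5.0 per session
_, control_world = shares(n_capped=0, n_uncapped=10)     # 10.0 per session (congested)
total_effect = capped_world - control_world              # -5.0

# Naive 50/50 A/B estimate: both arms share the same, now less loaded, link.
ab_capped, ab_control = shares(n_capped=5, n_uncapped=5)  # 5.0 vs 15.0
ab_estimate = ab_capped - ab_control                      # -10.0
```

In this toy model the A/B test reports that capped sessions use a third as much bandwidth as control, when deploying capping everywhere would only halve per-session throughput, the same wrong-ratio effect described earlier.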
H
So the problem here is that we're just looking at one point, and if we start looking at multiple points, we can measure capping's effects and compare them to the bias of A/B tests. So let's say we did the following: let's say we could somehow run a five percent A/B test and a 95 percent A/B test simultaneously.
H
Then we would get four points on this graph, and we could compare this point on the far left over here, where five percent of traffic is capped, or 95 percent of traffic is not capped, to this point over here on the right, where 95 percent of traffic is capped. So this is sort of when we're switching from most traffic not capped to most traffic capped, and we can use this as an approximation of the total treatment effect.
H
So this is going to be our goal. One thing that tends to confuse people about this is why we are picking a five percent and a 95 percent A/B test here, instead of, say, a hundred percent and a zero percent, since the total treatment effect is the difference between over here on the left, zero percent, and over here on the right, 100 percent.
H
The reason we're doing this is that we're also really interested in measuring the bias of A/B tests, and if we were to pick this allocation of 100 and zero percent, we wouldn't have an A/B test to compare it to. So we thought this was a nice compromise between coming up with an approximation of the overall effect of deploying bit rate capping and still having A/B tests that we could compare against.
H
So this is going to be our goal. The real crux of measuring this, though, is going to be running these two simultaneous experiments, and the problem is that as soon as we start running this five percent test, it's going to be difficult to run a 95 percent test, because we're kind of allocating the same traffic.
H
So these two links were both congested and both pretty separate from each other, and so what we could do is run one A/B test here on these treatment servers and look at how it impacted the congestion on link one, and then run a separate test on the control servers and look at how it impacted the congestion on link two. And so this is what we did.
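The two comparisons this paired-link design allows can be sketched on hypothetical per-session records (the data layout and numbers are assumed for illustration; this is not the paper's actual pipeline):

```python
def mean(xs):
    return sum(xs) / len(xs)

def within_link_ab(sessions):
    """Classic A/B estimate on one link: capped minus control mean throughput."""
    capped = [s["tput"] for s in sessions if s["capped"]]
    control = [s["tput"] for s in sessions if not s["capped"]]
    return mean(capped) - mean(control)

def across_link_total_effect(link_mostly_capped, link_mostly_uncapped):
    """Approximate total treatment effect: overall throughput on the link
    where ~95% of traffic is capped, minus the link where only ~5% is."""
    return (mean([s["tput"] for s in link_mostly_capped])
            - mean([s["tput"] for s in link_mostly_uncapped]))
```

Within each link the A/B comparison can make capping look worse, even while the across-link comparison shows that mostly-capped traffic does better overall.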
H
What we saw was, in each A/B test, about a five percent decrease in throughput, so capping decreased throughput by about five percent. But overall, when we compared these two treatments, we saw an improvement of about 12 percent in throughput. So when we switched most traffic from not capped to capped, throughput went up by about 12 percent.
H
So this is, I think, surprising, and one thing that might make this a little bit clearer is if we look at the behavior of throughput as a time series. Over here on the left is a time series of throughput before the experiment.
H
We've got two links, and as you can see, the two links have pretty much identical throughput over the course of the entire day. During these peak hours, which we've shaded here, these are the hours of the evening, load increases; at some point congestion sets in, and when congestion sets in, throughput decreases pretty significantly, and then it goes back up to normal.
H
So during the experiment, we again have these two links, and this time the links are split between capped and uncapped traffic. Again we see that during off-peak hours the links are both pretty similar, but during peak hours we start to see a difference between these two links. These top two lines here are the first link, where 95 percent of traffic is capped, and the bottom two are the other link, where five percent of traffic is capped. Because we're doing capping here, there's less data that we're sending.
H
For the link to be congested, we're going to need higher load, and that higher load will happen later in the day. So when we cap, we delay the onset of congestion for the link which is capped, and we also make congestion end earlier, and so this means that, between the two links, the link which is mostly capped is going to have less congestion and higher throughput.
H
In all of these cases, we saw improvements in the total treatment effect; capping improved these metrics. But the A/B tests didn't even get the direction of the improvement right: they either reported that the metric would get worse or that the metric was not affected by the treatment. And the A/B test was extremely confident about this: you compute the 95 percent confidence intervals, and they're very confident that it's 5 to 15 percent worse in an A/B test when really it's 25 percent better.
H
So let me be a little bit more specific about these risks. Here's a pretty common development process you might use if you're using A/B tests; it's pretty similar to the process that the team we work with pretty closely at Netflix is using. Maybe you'll come up with some idea, you'll implement that idea, you'll run an A/B test for that idea and get a sense of how the idea performs, and then you'll iterate these steps one through three until you come up with something that works pretty well.
H
And then you'll deploy that idea. So the risk here is that when you run an A/B test, you don't get an accurate estimate of what happens when you deploy the idea, and this can have a couple of different consequences within the development process. It means you could give up too early on an idea that's good.
H
So those are the risks, but another thing we saw is that we can run experiments that help reduce and remove this bias. This per-link experiment we described is just one example, and there are other sorts of experiments you can use that are also less impacted by bias. So I'd like to tell you about two of these experiments. The first kind of experiment design is called an event study. In an event study, you have some event, and then you compare what happens after the event to what happens before the event.
H
Another good thing about this sort of experiment design is that it's pretty common; folks might already be doing this when they deploy things. So if you go and deploy a new feature, it's pretty common to look at the metrics before and after the deployment and see how the deployment impacted those metrics.
H
One way we can deal with this is with a switchback experiment. This is another sort of experiment design where, instead of just switching once, we switch back and forth between treatment and control. So maybe on this day we use capping, on the second day we use no capping, and then we switch back and forth.
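A switchback schedule and its estimator might be sketched like this (a toy illustration; the metric values and the day-level granularity are assumptions). Randomizing which days are treated, rather than strictly alternating, makes it unlikely that external events line up with the treated days:

```python
import random

def switchback_schedule(n_days, seed=0):
    """Randomly assign each day to capping or no capping."""
    rng = random.Random(seed)
    return [rng.choice(["capped", "uncapped"]) for _ in range(n_days)]

def switchback_estimate(schedule, daily_metric):
    """Estimate the total treatment effect as the mean metric on capped
    days minus the mean metric on uncapped days."""
    capped = [m for arm, m in zip(schedule, daily_metric) if arm == "capped"]
    uncapped = [m for arm, m in zip(schedule, daily_metric) if arm == "uncapped"]
    return sum(capped) / len(capped) - sum(uncapped) / len(uncapped)
```

Because every unit experiences both arms over time, this design trades the spatial interference of an A/B test for a sensitivity to carryover between periods.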
H
So again, this gives us an estimate of the total treatment effect, and it's going to be more robust to seasonality issues, because in order for other events to bias this estimate, those events need to line up with most of the switches throughout the experiment, and especially if you randomize the days on which you are using treatment or control, this can help avoid these issues.
H
So this is certainly a risk, but it's a risk that depends on your system, and if you're running experiments, I think you may be pretty familiar with the carryover, the sort of behavior of the way your system changes over time, and you can design this experiment accordingly.
H
And so I'd like to conclude and say there's a lot more that we can do here. I think what our work showed is that if you have an A/B test, and this A/B test uses a congested network, this A/B test has the possibility of being biased: there's this pathway for treatment and control to interfere with each other via the use of their shared links.
B
Okay, thank you, excellent talk.
I
Okay, thank you, Bruce. That was a great talk and good work, and hi, by the way; it's good to see you presenting at the IRTF. I will say that this is very, very illuminating, at a minimum. I think it's something that, as you say, everybody should take into account when doing these experiments, and it's great to have the data to back it up and to argue that simple, small A/B tests, which I've relied on for a long time, aren't necessarily good enough. But I was going to poke a little bit more at the thing that you just said, which is the dominance of the bias that you see there in particular.
I
One thing that occurs to me is, I don't know if you've looked into seeing if you made, for example, a client sticky for a particular A/B test. I don't know exactly, I haven't read the paper in detail, how exactly the choice was made for serving a particular type of content, but I guess where I'm coming from is that it depends on where the bottleneck really lies.
I
In that particular case, I imagine that if the user is in a bucket that is either control or treatment, then the user would basically, effectively, be at the far end of your scale, right? Have you looked into that? Were your experiments sticky to users?
H
Yeah, so this is a good point, that there are things you can do on the allocation side to kind of avoid some of these issues. The experiments we ran were sticky to users, although it's a little bit tricky because it's the users that are going over these two different paths. But the overall point you're making is true: if you can guarantee that the users are not going to share any resources with each other along the whole path, that avoids this kind of interference.
H
We didn't explore this too much, because we found it pretty hard to measure whether or not users were sharing links, and we didn't have a good sense of how often this was happening. But I guess, okay, if you were to allocate users instead of sessions, my gut instinct would be that this is an improvement over just allocating sessions in terms of interference.
I
Yeah, that's very helpful. I was asking this because, if you actually did the experiment, it could tell you something about where users end up sharing bottlenecks as well. If you actually try to do it in any of the ways that you are suggesting, those ought to make things better when the bottleneck is closer to the user versus not, and those experiments can actually shed some pretty interesting light on exactly where bottlenecks are on the internet.
J
Hi, Brian Trammell, speaking not as, I guess, the sort of solar-flare backup chair. So there was something that you said earlier that was, I think, kind of fundamental and really interesting, that I want to repeat back: if we use what we know about networks to design experiments about networks, we can get...
J
You know, we can get a lot smarter about this. And then I was looking at the graph that you had of the switchback experiment, and what that reminded me of, more than anything else, was a diagram of time division multiple access from sort of a physical-layer textbook that I looked at in college, right? So A/B tests are basically code-division partitioning of the space, and switchbacks are time-division.
H
Yeah, I'm happy to chat about it, definitely. I can share a little bit about other experiments that people run, and maybe this will be interesting. So one thing people do in social networks that I think is pretty cool is you allocate a user, and then all of that user's friends, to a particular treatment, and you sort of think about this graph structure and how you can allocate based on this graph structure in order to get optimal behavior. So maybe there's something cool we could do with that for networks.
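The graph-based allocation just described is often called cluster randomization: a whole cluster of connected users is assigned to the same arm, so that treated and control users share fewer edges. A generic sketch (an assumption of how this might look, not any specific production system; the clustering itself is taken as given):

```python
import random

def assign_by_cluster(user_to_cluster, treatment_fraction=0.5, seed=0):
    """Randomize at the cluster level, so all users in one cluster
    (e.g. a user plus their friends) receive the same arm."""
    rng = random.Random(seed)
    arms = {}
    for cluster in sorted(set(user_to_cluster.values())):
        arms[cluster] = "treatment" if rng.random() < treatment_fraction else "control"
    return {user: arms[c] for user, c in user_to_cluster.items()}
```

A network analogue might cluster sessions by shared bottleneck rather than by friendship, so that treatment and control traffic rarely meet on the same congested link.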
B
Yeah, try something.
E
So the fine details of the testing methodology can have a big effect, such as exactly how this A/B test may have been done, and whether the potential bias that you've just highlighted has been mitigated in any way.
B
Okay, apparently there may be problems in the room, but hopefully you can still hear us, if not see us.
B
You can still hear us? Okay, good, good. Okay, so I guess it's the same sort of question I asked earlier, which is: you know, obviously you're talking to the IETF and the IRTF, and you said there was a need for better experiment methodology.
B
What, if anything, should we be doing when we're designing protocols and evaluating protocols to make sure the evaluations and the protocol designs are right? Is there some sort of general guidance we should be providing, or is it "read this paper and think about these issues"?
H
I guess, overall, what I would say is: I think all the stuff we're doing today is good, and we know it's good because we build good systems with the stuff we do today. I think what this gives us is another tool to think about how the algorithms work and to try to measure the way these algorithms work. And so I guess, when designing new algorithms in the IETF...
H
I would think about the way the experiments we run to validate those algorithms can be biased, and then I would try to think about other ways of running experiments that might reduce that bias.
B
Okay, I guess not. So, in that case, thank you, Bruce. Thank you.
B
Okay, in that case, I guess you all get 20 minutes or so back. Thank you, everybody! Congratulations to Bruce and to Sangeeta; some really nice presentations there. I will look out for you in the rest of the week and perhaps in Philadelphia. Thank you, everybody.