From YouTube: IETF117-ANRW-20230724-2230
Description
ANRW meeting session at IETF117
2023/07/24 2230
https://datatracker.ietf.org/meeting/117/proceedings/
D
Okay, welcome to another session. It's now time for Alfredo Arouna to speak, and the title of the talk is "Lowering the Barriers to Working with Public RIR-Level Data."
D
Maybe speak closer to your microphone, or speak a little bit louder; that would be super helpful.

E
Okay, sorry, thank you. So yeah, I was saying that I will be presenting our work on lowering the barriers to working with public RIR-level data. This has been done with my supervisor, Ioana, from SimulaMet (Simula Metropolitan), and Mattijs from the University of Twente.
The goal of this work, of this paper, is basically to introduce our consolidated data set to the community, so that you can avoid going through all the challenges of working with RIR-level data.
Yeah, so for today I will start with a little background about the RIR system.
Then I will introduce the original data that we have collected from the RIRs, talk a little bit about some inconsistencies that we have seen in that data, and finish my presentation with our proposal, our consolidated data.

So: Internet resources, such as AS numbers or prefixes, are managed by several organizations, such as the Regional Internet Registries (RIRs), on behalf of IANA and ICANN. We have five Regional Internet Registries; they have, sorry, regional coverage, but they share basically the same core functions. For the purpose of this work, we decided to focus on two main functions. The first one is maintaining a directory service, including WHOIS; each RIR actually extracts part of that directory into publicly available WHOIS dumps, and also into delegation files, also called statistics files. Each RIR also provides reverse DNS for the delegations to its customers, and they also publish the reverse DNS zone files.
So we decided to use those data for a project, assuming that it should be easy. Let's look at the data. Here we have two examples of reverse DNS at the RIR level. At the RIR level we are expecting delegations to customers, so we are mostly expecting NS records, but you can see in the first example, on the top, that we also have IPv4 and IPv6 address records, which we did not expect to see at the RIR level.
The example on the bottom is a classless delegation, which is the usual way to delegate an allocation smaller than a /24, using CNAMEs. In this example we have two adjacent prefixes, 169.215.39.128/26 and 169.215.39.192/26, which have been delegated to two different name servers. Why was this data important for us at the beginning of the project? Because we were looking to track, for example, lame delegations at the RIR level, or to map a prefix to a name server.
Once we have the prefixes, we need to collect additional information about each prefix. This is where we go to the WHOIS data set. WHOIS provides general information about a resource, and in the publicly available WHOIS data each object is separated by an empty line. Here, on the left, you have one example from the ARIN WHOIS database, where they use the route attribute, which is not used by the other registries; the others use the inetnum attribute for IPv4 address ranges.
E
On
the
right
hand,
side
you
have
two
objects
from
latnik
and
they
use
the
inert
num,
which
is
common
across
other
history,
but
you
can
see
that
you
use
a
custom
notation
for
the
inlet
object.
So
if
you,
for
example,
have
your
script
running
and
you're
expecting
to
see
well
from,
for
example,
prefixes,
you
will
encode
a
lot
of
problems
trying
to
address
all
those
inconsistency
on
the
innate
num
how
the
inner
attribute
is
used
in
this
region.
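(To make the parsing problem concrete, here is a minimal Python sketch, not the authors' tooling: it splits a bulk WHOIS dump into blank-line-separated objects and papers over the route-versus-inetnum difference just described. The file layout and key handling are assumptions for illustration.)

    # Sketch: parse a bulk WHOIS dump where objects are separated by
    # blank lines; continuation-line subtleties are ignored.
    def parse_whois_dump(path):
        objects, current = [], {}
        with open(path, encoding="utf-8", errors="replace") as fh:
            for line in fh:
                line = line.rstrip("\n")
                if not line.strip():             # blank line ends an object
                    if current:
                        objects.append(current)
                        current = {}
                    continue
                if line.startswith(("%", "#")):  # comments / remarks
                    continue
                key, _, value = line.partition(":")
                current.setdefault(key.strip(), []).append(value.strip())
        if current:
            objects.append(current)
        return objects

    def prefix_of(obj):
        # Normalize the per-RIR difference: a 'route' attribute holds a
        # CIDR prefix, while 'inetnum' usually holds an address range.
        if "route" in obj:
            return obj["route"][0]
        if "inetnum" in obj:
            return obj["inetnum"][0]   # e.g. "192.0.2.0 - 192.0.2.255"
        return None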
In addition, we've seen that the data that is publicly available in the WHOIS dumps gives you no access to historical data: it's just one day of data. Different RIRs use different URLs where they publish the publicly available WHOIS data set, and, as I will show you, there are inconsistent types of objects as well.
In this table you can see what I just showed you on the previous slide: ARIN is using the route attribute instead of inetnum for the IP prefix, and there is no netname; instead, ARIN uses a description attribute. In the LACNIC region there is no maintainer and no netname, for example. So we try to fill those missing attributes by relying on the other data sets. The reverse DNS part is similar to the WHOIS data: it's not possible to get access to historical data.
We've also seen unexpected resource records. According to RFC 1035, a zone file should at least have an SOA record in addition to NS records, but we've seen in the data published by the RIRs that most of them do not provide the SOA record according to the specification.
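(A quick way to spot-check this against the live DNS, as opposed to the published zone files, is a small dnspython sketch; the zone name below is just an arbitrary RIR-level reverse zone, and real code would also handle timeouts.)

    import dns.resolver  # pip install dnspython

    def has_soa(zone: str) -> bool:
        # RFC 1035 expects an SOA at every zone apex; NoAnswer means the
        # name exists but no SOA came back.
        try:
            dns.resolver.resolve(zone, "SOA")
            return True
        except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN):
            return False

    print(has_soa("112.in-addr.arpa"))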
We try to address those shortcomings with our consolidated data. So how do we proceed? We propose our consolidated data in a common format which is interoperable and optimized. We organize the data by year, so it's possible to access individual snapshots and to do longitudinal analysis, and the data is also designed to support large-scale analysis. We based our work on established best practices, and we created what we call an identifier.
We have a start and an end address that we use as a key for each record that we have from WHOIS. We rely on the delegation file to complement the missing information from the WHOIS. For the reverse zones, we convert the domain to a prefix, for both classless and classful delegations, and we apply the same identifier idea: a start and an end address that we use to easily identify each object.
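(A sketch of what those two normalizations could look like in Python; this illustrates the idea rather than reproducing the paper's code, and the classless naming shown is one common RFC 2317 style.)

    import ipaddress

    def range_key(value: str):
        # Start/end addresses as integers give one sortable key for both
        # CIDR prefixes ("192.0.2.0/24") and inetnum-style ranges.
        if "/" in value:
            net = ipaddress.ip_network(value, strict=False)
            return int(net.network_address), int(net.broadcast_address)
        lo, _, hi = value.partition("-")
        return (int(ipaddress.ip_address(lo.strip())),
                int(ipaddress.ip_address(hi.strip())))

    def reverse_zone_to_prefix(name: str) -> str:
        # "2.0.192.in-addr.arpa"        -> "192.0.2.0/24"   (classful)
        # "128/26.2.0.192.in-addr.arpa" -> "192.0.2.128/26" (classless)
        labels = name.removesuffix(".in-addr.arpa").split(".")  # Py 3.9+
        plen = 8 * len(labels)
        if "/" in labels[0]:                 # RFC 2317-style first label
            octet, slash = labels[0].split("/")
            labels[0], plen = octet, int(slash)
        octets = list(reversed(labels)) + ["0"] * (4 - len(labels))
        return f"{'.'.join(octets)}/{plen}"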
So here we have one example of the WHOIS and reverse DNS parts of the consolidated data. The orange color shows the key that we introduce in the data: the start address and the end address. The green shows the data that was missing; in this case, for example, the status and the country, which we were able to complement from the delegation file. For the reverse DNS, we add a flag to show whether the data came from a classless or a classful delegation.
This makes it easier, for example, to compare classless versus classful delegations at the RIR level.
Yeah, so to summarize a little bit: what we basically did is take the publicly available WHOIS and reverse DNS data and try to address some of the challenges. We add what we call an identifier, the start address and the end address, so that we can do easier analysis using these two delimiters.
We provide the data in a longitudinal manner: we started collecting data in November last year, and the data is publicly available. The data is compatible with data-engineering tools; on the website we provide more information in a data dictionary, and we also propose a basic Python notebook that you can customize for your own needs. So yeah, thanks for your attention.

D
There's already one question.

F
It was said that APNIC, for example, doesn't have MNT objects, which I do actually see, for example, in the APNIC entry for Quad One; and I actually did a whois on the v4 prefix there, and that didn't have a route entry, but a net range and the CIDR range.
E
We use the publicly available data, so maybe when you run whois from your client, you are going to some other, I don't know which, more enriched data from the registry. But this table is basically based on the publicly available data, the one that is extracted and published on the registry website. So maybe this is where the missing information comes from. Yeah.
H
Mark Kosters, ARIN; so I'm one of the Regional Registries here. One thing, just to clarify: you're using IRR data to do your WHOIS work, as opposed to actually looking at WHOIS on port 43. So it's slightly different. I understand the confusion between the two; it is what it is, and it's been this way for years. So hopefully that helps to clarify things like the question that you had here earlier. Thank you.
D
Thank you. All right, since we don't have any other questions, let's thank the speaker again.
G
And his slides are amazing; this is why we were just building, you know, interest here.
I
Hello, I'm Andrew Kaizer from Verisign, and today I will be briefly detailing our lightning paper, "A Call for Collaboration: DNS Integrations."
One of the ways the deployment of the global DNS has become more diversified is through the integration of DNS domain names into new application environments: telnet, FTP, email services and then, of course, later, web browsing. In the past few years, we have also observed that blockchain and decentralized applications have emerged as a new use case for DNS domain names, which can lead to new application integrations beyond the traditional use cases such as email and web.
The way these interactions work is via a DNS integration. A DNS integration is a method that makes an association between a DNS domain name and a resource in an application environment. Today's integrations can be categorized into two broad types, based on how the association is created, utilized and maintained: DNS-based and server-based. A DNS-based integration primarily uses DNS records, while a server-based integration primarily manages the integration via a server. We will touch upon examples of both of these to show how they are used in both pre-existing and novel applications today.
Finally, we will discuss some challenges that these integrations face, such as accounting for the domain name lifecycle, and why these challenges should be addressed. We will also suggest principles for a responsible integration between the global DNS and new application environments, in the hopes of starting a conversation that can continue at a future IETF BoF and culminate in a set of best practices for different types of DNS integrations, so that current and future applications will have a clearer path towards safely and securely integrating with the global DNS namespace.
Now, on this slide we see a graphical example of some of the relations I just mentioned: first you register a DNS domain name in the global DNS, and then you relate it to an application; and one of the questions we always ask is, could this pattern repeat itself for new use cases? Before describing some of these use cases and the integrations they use, we want to highlight that many of the new applications are not just from the blockchain and decentralized application community.
There are in fact many, many discussions happening throughout a much broader set of communities. Our slide here shows a very partial list that includes IETF participants, IRTF participants, ICANN, W3C, the CA/Browser Forum, blockchain communities, and even private-sector entities, all engaged in discussions about DNS integrations.
To begin with, a DNS-based integration primarily makes this association between a DNS domain name and another resource using DNS records. This is the type of integration that most of us in the room are probably familiar with, because it includes the most common DNS use cases, such as using an A record to relate a DNS domain name to a web host, or using MX records for email services. These are the kind of integrations that you use on a daily basis whenever you open a web browser or use your mail client.
Newer examples are coming from the decentralized application community, including through the use of W3C Decentralized Identifiers (DIDs): for example, what Bluesky is doing to link a DNS domain name to a W3C DID through a TXT record for their platform. Another example is the proposed W3C DID method did:dns, which stores a DID directly in the DNS as a URI record.
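(As a concrete illustration of such a TXT-record link: a dnspython sketch that looks for a DID under the "_atproto" label Bluesky uses; treat the exact label and the "did=..." value shape as assumptions here rather than a normative description.)

    import dns.resolver  # pip install dnspython

    def did_for_handle(name: str):
        # Fetch TXT records at _atproto.<name> and return the first
        # value of the form "did=did:...", if any.
        try:
            answers = dns.resolver.resolve(f"_atproto.{name}", "TXT")
        except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
            return None
        for rr in answers:
            text = b"".join(rr.strings).decode()
            if text.startswith("did="):
                return text[4:]
        return None

    print(did_for_handle("bsky.app"))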
If we dig a little deeper, there are also DNS-based integrations that can be used to initially prove control of a DNS domain name, while the rest of the integration occurs somewhere else; let's look at a couple of examples to see what we mean by this. The classic example is using your DNS zone to prove control of a domain name in order to be granted a web certificate.
A newer example of this comes from the blockchain namespace communities, such as the Ethereum Name Service: in the case of those domains, DNSSEC data and TXT records stored in the DNS are used to prove that a given DNS domain name should be imported and integrated into their namespaces. Now, DNS is of course used to prove this initial integration, but once an integration is made, subsequent interactions will occur in that namespace's ecosystem instead of in the DNS.
You might wonder how a server-based integration differs, and the primary reason is that knowledge of a server-based integration may not be gleaned from DNS zone data alone. For example, you may need to interact with an application that tells you that a given DNS domain name supports their application in some capacity, and you have to go to their server or some other endpoint to fetch data. This can provide flexibility, especially in cases where storing such data in the DNS may not be feasible or desirable.
This tells us something interesting about pre-existing integrations: they have methods to use both DNS-based and server-based approaches, such as a certificate being granted using either a DNS challenge or an HTTP challenge. This kind of flexibility indicates that, as we consider this topic moving forward, we will also need to consider multiple types of integration to support different types of applications.
I would also like to note that these are broad categories, in that not all integrations are going to fit neatly into a DNS-based or a server-based bin. What's important for our conversation today is observing that there are many different approaches used by both pre-existing and newer applications to integrate with DNS domain names. So it is likely that we will need to develop best practices for different flavors of integrations moving forward, to ensure that different applications targeting different use cases can choose an integration that best fits their operational profile and objectives.
Now, with all these integrations in mind, we did want to discuss some concerns, such as interoperability and support, but today I want to highlight the synchronization concern; you can check our lightning paper for a discussion of the other topics. Synchronization between a DNS domain name and other namespaces and applications is not guaranteed once the integration is performed.
For example, the DNS domain name may be imported, but there may be no clear process, mechanism or guidance to update the integration when the DNS domain name expires or is transferred, the zone changes, or the content on the server changes. To illustrate why this is concerning, consider the following example scenario: first, a registrant uses a DNS domain name and a DNS integration to integrate that name into some application.
Second, the DNS domain name expires, but because the DNS integration is no longer synchronized, the now ex-registrant will still be perceived as controlling the DNS domain name in the integrated application. Then, if the DNS domain name is re-registered, two separate parties will be perceived as controlling the same DNS domain name, depending on the application context.
The second concern is the domain lifecycle: does the DNS integration account for the DNS domain lifecycle, to avoid the synchronization concerns we just mentioned? Additionally, is an integration aligned with the best practices and policies of the DNS community? For example, if you support DNSSEC-based methods in your integration, do you support the required and recommended algorithms from the DNSSEC RFCs? And, of course, does the integration expand utility without impacting the ability of the DNS domain name to be used for other purposes, including the pre-existing uses it was possibly being used for?
J
Hello, Peter Thomassen. You gave an example, on slide eight perhaps, of where the problem lies, and the example you gave is essentially: when I let my domain expire, I have a problem. Now, is that the main issue we're solving? Because it seems to me that that's maybe not best addressed with integration concepts, but rather by not letting the domain expire, right? So I wondered what the problem is that we're solving, because that doesn't seem to be it, right?
I
It depends on from which perspective you're looking at the synchronization issue. Let's say, for example, that you might have let the domain name expire, and you still continue to use the domain name in that integration, because it just happens to work.
K
Jim Reid. Interesting ideas here, but I think the problem I've got is trying to figure out where this kind of discussion and collaboration could take place. You've given a whole shopping list of things that could be looked at in the future; some look interesting, some maybe not so interesting, but there's a whole bunch of organizations and institutions that could be involved in this. We've got the IETF, we've got ICANN, we've got various other industry forums and so forth going on. So where would you see this kind of collaboration and cooperation discussion taking place?
I
Yeah, an excellent question. Our first step really is to try and have a BoF, to try and get more insight from the various communities involved and to see who would be interested in tackling this question. Because you're right that some of these topics seem to be better suited for the IETF, and some seem to be better suited for more ICANN-level or W3C discussions; it really sort of depends on who we can get into the room to discuss these topics and decide, at that BoF for example, what direction we can take. Okay?
K
Well, I've got two points to make and a question about that. Just to be a little bit picky here: if we're talking about a BoF, that has a specific meaning in the IETF context, and I think you probably don't want to have one of those kinds of BoFs, because those BoFs are supposed to lead to a working group being formed; but certainly having some place where these people could come together for a group hug would be a good idea.
I think one of the challenges you would have in trying to make that happen is finding a forum or a venue for it, and I think some of these organizations are likely to be very protective: you're doing this little part of the problem space here, don't bother us with things that are going on elsewhere. And I think that'll be a challenge: to get these people to think they could come together and work in a collective manner to look at these problems.
C
Hi, Daniel Kahn Gillmor. So, thanks for bringing this up here. I think you've outlined a really broad class of problems, and I think it can be challenging to get people to collaborate when, you know, my use case might be something completely different from someone else's use case, and the only thing that we share is that we have some kind of integration with the DNS, right? I mean, I see this with the Encrypted Client Hello fronting-server DNS updates, for example.
How do you imagine getting people who work across such widely different scopes to actively collaborate on this? And secondly, tied to this, on the synchronization problem: one of the things that I think we see happening with the DNS is that the DNS is used as a leverage point to create things that then have a different actual time scale than the DNS records themselves. So, like, if I use ACME to get a certificate, the validity window of that certificate is not bound to the validity period in the DNS. So how do, like...?
I
So, to take that last question first: hopefully it's not hopeless. Part of the motivation here, to try and broaden this collaboration, is to bring sort of diverse communities together, especially ones such as this community, which has a much longer history of operational understanding, to help us understand what maybe some of these new integrations might be able to do. And it might be the case that the scope, as I think Jim was also alluding to, might be too much for any one sort of venue.
C
Okay. One place that you might want to look to for inspiration is the UTA working group, the Using TLS in Applications working group. It's a little bit more focused than the possible places you can integrate DNS, but take a look at that and see how they've dealt with TLS across a range of different options.
D
Our next speaker is Johannes Naab, and the title of his talk is "Gotta Query 'Em All, Again? Repeatable Name Resolution with Full Dependency Provenance."
L
All right, I'm going to talk about name resolution, the stuff that DNS is all about. Let's start with an example to get us all back up to speed. So, for example, if you're resolving tum.de: we start with the root zone. I'm going to talk about authoritative name servers, so I will start with the root zone, where we have some name server names.
We have some glue records to start with. For simplicity reasons, in this figure (we're going to fill it in later) we omit all the IP addresses; we simply assume that they are somewhere in the zone, meaning in the root, and for the authoritative-server FQDNs, meaning the NS record names, we shorten them and simply point to whichever zone they are going to be answered in.
In the next step, we can simply ask one of the servers for which we also got the glue record. For tum.de, we get a delegation back with three name servers, and for one of those name servers, because it's a sibling domain in the .de zone, we luckily also get a glue record back; then we can simply ask that one and get our answer back.
So during that resolution we more or less relied on glue records. We have a happy resolution path, but we found a lot of stuff in the DNS, all those zones in gray, where we don't know the name servers, we don't know how we get there or what they could influence. So if we're going to resolve them all, we're going to end up with a figure that's a bit more crowded and a bit more complicated.
So what's the motivation? We want to find and resolve everything because of the dependencies: we want to build the entire dependency tree, so that we can figure out what can influence the name resolution. We want to identify broken delegations, for some definition of broken (we previously called them lame): authoritative name servers that do not exist or are not reachable. So if I try to resolve a name server name, I might get an NXDOMAIN back in the DNS.
That could be in the root zone, if it's simply junk, or in some TLD zone, where I might be able to add my own records. There are authoritative name servers that do not answer (timeouts, ICMP errors), which could also be performance problems, and authoritative servers that don't give any useful answers back, meaning various DNS error indications, non-authoritative answers, or recursors that ended up in the NS set.
In addition to simply getting the dependency tree, we also want to query them all, and we want to query and compare the multiple data copies. What are the multiple data copies in the DNS? We have NS records in the referral itself, but we also have NS records in the origin zone: do they match, and what can we learn? Additionally, we have glue records, and we also have the address records, hopefully in authoritative data. And, last but not least, for each domain we hopefully have multiple authoritative servers: are they all synchronized?
Do they all have the same data, or do we have some configuration drift? Why is it important, or why do we want to investigate that? Number one is the security aspect: there could be hidden dependencies, or broken dependencies, that could influence the resolution. And there is also the performance impact: if we have records or name servers that do not work, then a resolver is going to spend time there and slow down the resolution for the user. That goes for our goals; our research questions are that we want to study the DNS dependency graph.
We want to find potential inconsistencies in the configurations and try to evaluate the impact. But, as I hinted, there is a problem: if we want to do that, regular resolvers do not expose that data, and they do not even internally, necessarily, gather all that data, because they might rely on glue records. Their primary task, their primary goal, is to get an answer to the user as soon as possible; that's what gets benchmarked. So we did the only thing we could think of doing.
We built our own resolver, foolishly, in the attempt, because how hard could that thing be? Our implementation is more or less guided by what we want to achieve. We want to discover all reasonable resolution paths; so, if there's a hidden primary server, we are not going to brute-force the entire IP space to find that server. We want to query all data copies, as far as reasonable.
We want to capture all those queries and save them, so that we can later on provide provenance: why did we get that answer, why do we have an additional answer, why didn't we get an answer? We want to be deterministic and repeatable, and we also want to be fair and efficient: we don't want to overburden authoritative name servers, especially if we query all data copies. So we want to be a good net citizen.
On the implementation of such a resolver, only a very rough overview here, because there are more details in the paper and it gets a bit tricky in what we need to consider. We structured the resolution model around building the zone tree as it is observable in the wild, as it is observable on the Internet. We find our authoritative server candidates: glue records, root hints, and name-server names resolved within the resolver itself.
We also have to consider the case of a name server that is authoritative for the parent as well: then we do not get a referral back; if we ask for a delegation, or try to figure out the delegation, we simply get an authoritative answer back with a NOERROR code. And for all the server candidates, we query the SOA record and the NS record; the NS record simply because it could lead to additional information that we can uncover.
The SOA record should exist; it might not exist, for, well, interesting configurations, and it might give some hints on whether or not the servers are properly synchronized, even if we don't see diverging data. And we consider a name server usable if any of those two queries provides us an authoritative answer.
So, is it resolution all the way down? Let's get back to our figure. If we squint hard enough (I mean, it's already oriented that way), we're going to find some zones that seem to depend on each other: for example, a set of zones under .eu, .de and .in whose name server records point into each other's zones, including their own.
If we go back a bit to graph theory, we're going to find that this looks a lot like a strongly connected component, meaning that from each node in this group I can walk to each other node and come back to the origin, so I can influence myself. For the DNS, the impact is that any name server in such a group can influence the resolution of every zone in the group.
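(The graph-theory step is easy to picture in code: a small networkx sketch over a toy zone-dependency graph; the zones and edges here are invented for illustration.)

    import networkx as nx

    # Edge u -> v means "resolving zone u requires name servers whose
    # names live in zone v".
    deps = nx.DiGraph([
        ("example.de", "example.eu"),
        ("example.eu", "example.in"),
        ("example.in", "example.de"),   # a cycle: mutual dependency
        ("other.de", "example.de"),
    ])

    # Every strongly connected component with more than one node is a
    # group of zones that can only be resolved (and analyzed) together.
    for scc in nx.strongly_connected_components(deps):
        if len(scc) > 1:
            print("interdependent zones:", sorted(scc))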
If we fill that in for everything, we end up with our more or less structured dependency tree, plus the additional completed graph. We complete the resolution process based on postponing our queries until we have figured out the strongly connected components via an online graph search; the details of how we figure out which name servers get queried are in the paper, and a bit dense.
In order to figure out those strongly connected components, we need our zones: we need to detect what might be a zone, what we can externally observe as a zone. So, for all the labels within a name, we need to figure out: is that a zone, is there a delegation there, or is that simply a subdomain inside another zone?
QNAME minimization provides a framework for that: for each potential delegation we query whether there is a delegation for that specific name, and not only for the complete name, and compare that to the answers we receive. We use NS queries, since with A queries we have the problem that, if the parent is also authoritative for the child, we are not going to discover the delegation even if it exists; NS queries are also what the original QNAME-minimization proposal used.
If you have a www label, it's most likely not going to be a separate zone. So, initially, for all those single labels below the second-level (or effective second-level) domain, we ignore the question for now and only do the full zone-cut discovery when NS queries, or other records, give us an answer back that indicates a delegation.
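(A minimal sketch of one zone-cut probe with dnspython, under the assumptions just described; a real implementation also needs timeout handling, TCP fallback and NXDOMAIN logic.)

    import dns.message, dns.query, dns.rcode, dns.rdatatype

    def looks_like_zone_cut(name: str, parent_server_ip: str) -> bool:
        # Ask the parent-side server an NS query for the candidate name.
        # A referral carries the NS RRset in the authority section; a
        # parent that is also authoritative for the child answers the
        # NS RRset directly, with NOERROR, in the answer section.
        q = dns.message.make_query(name, dns.rdatatype.NS)
        resp = dns.query.udp(q, parent_server_ip, timeout=3)
        if resp.rcode() != dns.rcode.NOERROR:
            return False
        return any(rr.rdtype == dns.rdatatype.NS
                   for rr in resp.answer + resp.authority)

    # Walking label by label, QNAME-minimization style (IPs hypothetical):
    # looks_like_zone_cut("de.", root_ip); looks_like_zone_cut("tum.de.", de_ip)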
If we query all name servers, that might not be completely viable: for example, the .com zone has 26 authoritative server IP addresses (there might be more servers behind them, there might be fewer; I'm missing information there), and it's a very large zone.
So if we resolve a lot of names, we're going to hit the Verisign name servers a lot, and if we apply rate limits on our own, that's going to be the bottleneck for our resolution; and I would assume that Verisign, as an operator, would probably prefer not to answer the same questions 26 times.
If we can avoid it, what's the solution here? We extend our assumptions: we assume that TLD servers are somewhat synchronized, consistent and properly managed, and simply query a consistent, deterministic subset of the name servers, so that we don't have to query all 26 but pick 3, based on the name that we are currently asking about and on the IP addresses of the candidates that we can ask.
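(One way such a deterministic subset could be chosen, sketched in Python; the hash-based ranking is an assumption for illustration, not necessarily the paper's exact scheme.)

    import hashlib

    def pick_servers(qname: str, server_ips: list[str], k: int = 3):
        # Rank candidates by a hash of (qname, server IP): the choice is
        # repeatable across runs for the same name, yet spreads load
        # over the full server set across many names.
        ranked = sorted(
            server_ips,
            key=lambda ip: hashlib.sha256(f"{qname}|{ip}".encode()).digest(),
        )
        return ranked[:k]

    # pick_servers("example.com.", com_server_ips) always yields the same
    # 3 of the 26 .com addresses for this name; if their answers disagree,
    # fall back to querying all of them.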
And if we find any discrepancies — if our consistency assumption doesn't hold and we observe answers that don't match — we're going to query them all. Additional optimizations: if we have zone files, for example from the Centralized Zone Data Service, we can use the delegations that we find in those zones directly and skip querying the .com name servers completely.
For testing: I mean, if you implement a resolver, there are going to be a lot of bugs, trial and error, and a lot of gray hair. Re-running against the Internet is not a viable option, because, number one, it burns the authoritative servers, and we don't want to be a bad net citizen; and even if we would do it, the results are not one-to-one comparable.
If there are changes between runs, we don't know if it's code changes, that is, bugs in our code, or simply DNS changes. So we have a procedure where we capture the previously recorded data and run against it in simulation.
We simulate the name servers that we've seen, in a Linux network namespace, and can then record queries: unknown queries, meaning queries that we did not observe in the original data but that happen in the simulation, and also queries that we know from the original data but that we skipped in the simulation, indicating bugs due to timeout handling or unresponsive name servers. That's for comparing the results across multiple runs.
Running it against yourself is a bit more complicated, with more details in the paper. So let's conclude: we have a resolver that can discover the entire dependency tree; it provides a repeatable and deterministic resolution process, independent of caching, ordering, etc.; we are saving all reasonable resolution paths, including all the authoritative servers that we can ask, for later analysis; and we have a process to test it.
We have a sample data set on tcpresolve.github.io, with an outdated Alexa list, which is no longer up to date, and the Majestic Million list, for reasonable records including subdomains, plus a few name servers and a few domains that might be interesting. That's only a sample data set; we have access to more data, but it still needs to be analyzed: the impact and the misconfigurations need to be evaluated. And we are especially interested in whether there are new, interesting questions that could be answered by such data sets.
C
So, I'm trying to understand the relationship between this and QNAME minimization, because one of the problems that we found with QNAME minimization, if I remember correctly, is that you could get different answers if you were sending the full query as opposed to the suffix.
L
That can happen, especially if there are multiple levels, like a child of a child: if I ask for the child directly, I get a delegation, but if I ask for the child's child's name, the grandchild name, and the server still has the zone configured, it's going to answer directly, right? So there are some differences that could happen there. Not sure if that's the question exactly.
C
The other thing that I think would be very useful, and I don't know if you've produced this or not, would be something that a domain administrator, someone responsible for a given DNS record, could run, that would do all of these queries, map everything, and give you the kind of diagram that you showed: you know, here's the range of answers that I got, and here's the path that I got.
E
I was going to ask: normally, what uses have you put it to? It looks like you're making the data available.
L
You mean what we've already done with the data, if I understand the question correctly? The application so far has been, more or less, the engineering challenge. I looked a bit into the data; there are a few scary things, like finding NXDOMAINs for name server records that point into top-level domains, where I have not yet figured out whether they are actually registerable or not. That's the scary part; other than that, there is data, but I did not completely evaluate it yet. Initially it was the engineering challenge that was the interesting part.
M
Hello, am I audible? Yes? All right, good. Hello everyone, my name is Christian, and I would like to present today my work on enabling multi-hop ISP-hypergiant collaboration. So let's start by looking at the Internet. Nowadays we see that more than 80 percent of all the traffic is coming from hypergiants, namely Google, Netflix, Meta and others. Now, who are they sending this traffic to? Well, usually those are the ISPs.
Think AT&T, Airtel, EarthLink. In order to do so, the hypergiants tend to interconnect as much as possible: they tend to connect to as many networks as they see fit. Nowadays, large hypergiants peer with more than 10,000 different networks.
To deliver the traffic, the hypergiant needs to select the optimal server, and this problem is not trivial, because there are a lot of things changing on the Internet all the time. However, previous work designed a system that actually helps the hypergiant select the best server, but just for those ISPs that are directly connected to the hypergiant. And here comes the question: how about the networks that do not actually peer with this hypergiant?
Since the largest hypergiant peers with up to 15,000 networks, it means that there are more than 40,000 small networks out there that really don't peer with a hypergiant. During our collaboration with a large European transit provider, we actually saw that a really large number of these small ISPs do not peer with the majority of the hypergiants; they actually rely on their transit provider.
Let's have a quick look. Here, on the left side, we have an actual hypergiant that would like to send some traffic to a small European ISP, and the first thing it does is send the traffic to the transit ISP; as we can see, the traffic is split across two different locations.
What happens in the small European ISP is that some of the traffic actually needs to be rerouted from one location to another in order to reach the end clients, and in this situation we have this percentage of traffic that went to one location but needed to be in another location. We went further with the investigation: we looked at all the routers and at their capacity, and there are no congestions there, actually no problem anywhere.
The only reason this is happening is the improper choice of server by the hypergiant. Now, if we look at an entire week, we want to see what's happening over the entire week with the traffic coming from the hypergiant.
In the total traffic coming from this hypergiant, we see a typical diurnal pattern, very typical for a residential ISP, with high peaks in the evening. When we look at the non-optimized traffic, we see that it kind of follows the same pattern, but what's most important for us is that there is a large amount of it: the non-optimized traffic is very high during the peak times.
So it's almost 30 percent here. Now, we saw this behavior in more than 20 European ISPs during our collaboration with the large transit provider, and we asked ourselves: is there a possibility to help the hypergiants improve the server selection for non-directly-connected ISPs? Can we actually reduce this 18 percent, or maybe completely remove it, if possible? And in fact the answer is yes, we can do that, by enabling ISP-to-hypergiant collaboration.
The idea is that the ISP needs to send some additional information to the hypergiant in order to improve the server selection. This sort of collaboration can go multiple ways. For example, you can have a multi-hop collaboration, where the ISP collaborates directly with the hypergiant and none of the transit networks in between is involved, even when there are multiple of them. Another sort of collaboration can be a one-plus-hop collaboration, where there is a chain of collaboration between all the neighbors, starting from the ISP and ending at the hypergiant.
Multiple other kinds of collaboration are possible, and we discuss them in our paper, so if you are interested, please go ahead and read it; for this presentation we'll focus on the multi-hop collaboration. The idea here is that the ISP sends a set of key-value pairs to the hypergiant, where the key is an IP prefix of the ISP and the value is a list of similar IP prefixes.
Now, once this is set, the question that comes up is: how does the ISP select these prefixes, which prefixes should it choose? There are three different possibilities. You can go either with BGP and all prefixes, which is actually not efficient, since the hypergiant already knows them and takes them into account.
The second one is the ISP's DNS-resolver working prefixes: the idea here is that the ISP internally uses a small set of fine-grained prefixes, especially for the DNS resolvers of the clients; in the following slides we will call these specific working prefixes the DNS default. A third option is complete disaggregation, like /22 disaggregation, where we disaggregate all the prefixes of the ISP down to /24s.
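(A sketch of what the third option amounts to, using Python's ipaddress module; the prefixes and the key-value shape are invented for illustration.)

    import ipaddress

    def disaggregate(prefixes, new_prefix=24):
        # Break every announced prefix into /24s (inputs are assumed to
        # be /24 or shorter, otherwise subnets() raises ValueError).
        out = []
        for p in prefixes:
            net = ipaddress.ip_network(p)
            out.extend(str(s) for s in net.subnets(new_prefix=new_prefix))
        return out

    # Hypothetical key-value pairs handed to the hypergiant: the key is a
    # resolver ("DNS default") prefix, the value a list of client prefixes
    # that should be treated like it.
    hints = {"192.0.2.0/24": disaggregate(["198.51.100.0/22"])}
    print(hints)  # four /24s behind the resolver prefix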
The thing here is that both the DNS resolver and the DNS server should have ECS (EDNS Client Subnet) enabled. Now, going back to our traffic and its unoptimized part: we ran a retrospective simulation on this real traffic and, as we can see, with the DNS-default simulation we managed to reduce the amount of non-optimized traffic from an average of 18 percent down to 1.3 percent. And if we use the /24 prefixes, then we end up with fully optimal traffic.
As you can see, three of the hypergiants here are marked with a star. The point is that these three hypergiants connect in only one location with the transit ISP; therefore it doesn't really matter which server they choose, as there is only one possible way to send the data to the ISP, so further optimization can be done only by the transit ISP itself, if it wishes to make changes inside its internal routing.
The next column shows the amount of traffic coming from each hypergiant, and the following one the amount of non-optimized traffic. As you can see, overall about 14 percent of the traffic coming into the ISP has the potential to be improved. In the last column we have the amount of non-optimized traffic per own traffic share, and we see large discrepancies between hypergiants.
Looking over time at all the traffic coming into the ISP, we see the average of 14 percent, and we see again the same diurnal pattern in the non-optimized traffic. Once we ran our retrospective simulation with the DNS default, we managed to reduce the amount of unoptimized traffic down to four percent.
What's more important: during peak times, when resources are really scarce for small ISPs, we managed to reduce it even more, in this case, as you can see, from 30 down to 10 percent. If we run the simulation with the /24s, we again end up with optimal traffic. Coming to the conclusion: during our research, we showed that it's possible to improve server selection even if there is no direct connection between the hypergiant and the ISPs.
We also showed, using real ISP data, that our system can actually reduce the non-optimized traffic down to 10 percent without any additional implementation or improvement to the DNS, meaning without implementing and adding ECS to the DNS. Also, our results show that there are discrepancies between the traffic coming from different hypergiants: for some of them, up to 46 percent of the traffic comes in the non-optimal direction. And we argue that there are more than 40,000 different networks out there that can potentially benefit from this system.
D
So, thanks for the presentation. Can you comment a little bit more on how likely it is that a hypergiant will have alternative paths that BGP can expose? Or, like, how do you really change the path that is being used?
M
Our point is that, unfortunately, the hypergiant does not have enough information when the request comes in to select the proper server, and by accident some distant server in a different place may be selected for that client; then a different route will be used. So it will be the best route between the server and the client, but the wrong server, or not the best server, was chosen.
M
So, I don't know; they are not in the same AS. During our collaboration with this transit ISP, we know for sure that there is no presence of the hypergiant inside the transit ISP; therefore, once the hypergiant wants to send traffic from its servers, from inside the actual hypergiant, it has to send it via the transit ISP to end up in the ISP, right? So.
D
All right, since we don't have more questions, let's move to the next and last talk for the day. It's from Alex Huang Feng, and the talk is entitled "Daisy: Practical Anomaly Detection in Large BGP/MPLS and BGP/SRv6 VPN Networks."
N
So, I will first present the scope of our project: what do we consider an anomaly in our BGP/MPLS and BGP/SRv6 VPN networks? I will then go through the different Daisy architecture components we've been working on, and the IETF contributions we are working on, to have not only an open-source solution but also a standard IETF solution; and, at the end, the results and the ongoing work of this project.
So, a VPN provides connectivity for a customer between two or more sites, and in this project we define an anomaly as an event that occurs in the network, impacts the customer traffic, and therefore makes the customer unhappy. This event can be provider-inflicted, due to an incident inside our network: a fiber cut, an interface not working properly.
It can also be provider self-inflicted, when there is a maintenance window, the operator is pushing a new upgrade, and in this upgrade there is a bug. But it can also be customer-inflicted, when a wrong configuration is pushed to the customer-edge routers, which are managed by the customers themselves, and they lead their own traffic into a black hole. So why is this important for ISPs? Because, at the end, if you manage your outages badly and they last, you end up in the news.
We all know that issues happen in every network, but what matters for an ISP is how you manage them. These service interruptions of course make you look bad, but they also cost you a lot of money, and that is why in this project we focus on how to detect these anomalies at an early stage, and how to provide the necessary information for network operators to analyze the data, find the root cause and, of course, fix the issue in the end.
So this is a project financed by Swisscom, and we do research, but also open-source development, throughout the whole chain: from getting the data out of the network to getting visibility into what is happening in the network. We do research, but we also standardize the telemetry protocols at the IETF and implement different publishers and collectors to get this information. We propose new network measurements that could be interesting for anomaly detection and, in the end, the final goal of this project is having a scalable solution for, of course, anomaly detection.
Let me present the different components we are working on in Daisy. First, we will see that we need to know the behavior of the customers to make this work. I will go through the different standards we are using to get the different dimensions from the network, how we post-process them and, at the end, based on that data, how we detect anomalies and, once we have detected an anomaly, how we report it to the NOC so that they can fix the issue.
For example, there are customers that are super predictable, with flat traffic curves or repeated day/night cycles, but there are also other customers for which, for example, regular packet drops are normal, and therefore we cannot use this drop metric to detect anomalies for them. On the other hand, they are managing around 10,000 to 11,000 VPN customers; we cannot write one recipe per customer, and therefore we need to group the customers into customer profiles, so that we can base our anomaly-detection recipes on these profiles.
So we are getting different dimensions from the network. First, for the data plane, we are using IPFIX to get traffic counters and packet drops from the network. In the control plane, to get the BGP topology and the BGP state, we are using BMP, capturing BGP events such as updates, withdraws and peer downs. And in the management plane, it's still a bit of a work in progress at the IETF, but we have already deployed YANG Push, using the UDP-based transport, to get the interface state changes.
Of course, once the collector has received all this information, we need to correlate it to the customer, so that we know which customer we are impacting: we correlate IPFIX to the BGP path, so that we know which counters belong to which customers, and we do the same for the input interfaces from YANG Push, so we know, when an interface goes down, which customers we are impacting.
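(The correlation step boils down to a longest-prefix match from flow addresses to BGP-learned customer routes; a toy Python sketch follows, with an invented routing table. In production this join is done by pmacct, mentioned next.)

    import ipaddress

    routes = {
        ipaddress.ip_network("10.1.0.0/16"): "customer-A",
        ipaddress.ip_network("10.1.2.0/24"): "customer-B",
    }

    def customer_for(addr: str):
        # Longest-prefix match: the most specific covering route wins.
        ip = ipaddress.ip_address(addr)
        best = None
        for net, cust in routes.items():
            if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
                best = (net, cust)
        return best[1] if best else None

    print(customer_for("10.1.2.7"))  # -> customer-B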
We are doing all this with the open-source solution pmacct, developed by Paolo Lucente, and this allows us to rely not on inventories but on what is actually happening in the network.
So, once we have the data correlated to the customer identifier, we base our anomaly detection on the customer profile: for each customer profile we apply a set of strategies, which are a way to capture the service health.
These strategies are organized as a set of pipelines, which are sequences of conditional checks, and the checks are the actual algorithms on the data that detect whether there is something wrong in that data. A simple check, for example, would be: for that customer, at that time, check whether the traffic shows a big difference from the last week; if there is a big difference, we raise an alarm. Of course, all of this is configurable.
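(For illustration, one such check might look like the Python sketch below; the threshold logic is an assumption, not Daisy's exact recipe.)

    def weekly_deviation_check(now_bps: float, last_week_bps: float,
                               ratio: float = 0.5) -> bool:
        # Alarm when traffic deviates from the same time last week by
        # more than a configurable ratio.
        if last_week_bps == 0:
            return now_bps > 0
        return abs(now_bps - last_week_bps) / last_week_bps > ratio

    # A pipeline is then just a sequence of such conditional checks
    # applied per customer profile; any failing check raises a ticket.
    pipeline = [weekly_deviation_check]
    alarm = any(check(120e6, 400e6) for check in pipeline)  # -> True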
And, of course, once we've detected an anomaly, we need to report it to the NOC operators so that they can fix the issue. For that, we are interfacing with Swisscom through a Kafka topic, so that we send a ticket to the NOC and they can get the data. Within this message we also give them the raw data on which we executed the algorithms, plus the details and the parameters of the different checks we executed, so that they have the full view of why we raised this alarm.
Since we are in a big-data setting, we cannot save all the data; when there is an incident, we save the data in permanent storage, to play around with what-if scenarios and to experiment with new strategies and new checks, so that we can keep improving the anomaly detection and the accuracy of our platform. As I said at the beginning, we are also contributing a lot within the IETF, to have not only an open-source solution but also a standard solution, with different RFCs and drafts.
For example, we have proposed a UDP-based transport for YANG Push, to allow streaming large amounts of data from the router, directly from the line card, without stressing the route processor. Also, at the IETF we have seen that there are new technologies, such as segment routing over IPv6 (SRv6), that are starting to be deployed.
We
are
proposing
also
extensions
to
ipfix
so
that
we
can
monitor
these
new
technologies
through
the
same
system,
and
we
are
also
proposing
new
metrics
in
that,
in
our
case,
is
the
on-pass
delay,
which
is
the
delay
between
the
encapsulating
note
and
the
different
nodes
along
the
path,
and
we
are
exporting
these
delays
using
also
ip6,
so
that
we
can
already
have
the
aggregation
from
the
node
foreign.
We also have other contributions; I will not go into the details of each of them, but please reach out if you are interested. Basically, we are extending the YANG Push header so that we can monitor not only the data but the whole YANG Push pipeline, and, in a second instance, we are also extending IOAM direct export so that we can compute, in passport mode, the on-path delay that I presented earlier.
So, what's the status of this project right now? It has been developed in Python, as a proof of concept, and it has been deployed in production for a subset of customers of the Swisscom VPN network. So far we have detected six outages: three in real time and three in replay mode.
Currently we are continuing to study whether there could be new dimensions that could be interesting for detecting these anomalies, and, of course, at each IETF we are discussing with the different vendors whether our different drafts could actually be implemented in the future. We also plan to study whether the same framework could be used to detect anomalies in Internet services, since, in the end, for ISPs that is just another service, monitored in a similar way by the same systems.
So, in a way, in our system this could be another profile, with different recipes dedicated to Internet services. And, of course, we are present at each IETF, at each hackathon, but also in the different working groups, to continue progressing with the standardization.
So that's it for me. If you have any questions, or you are interested in any network telemetry topic, we are a bigger group than the authors of this paper; please ping us, reach out. We are here the whole week at the IETF, and at each IETF we are present, working with Swisscom but also with different vendors, such as Cisco and Huawei.
D
If there are no questions from the audience, I can chime in. So, from all the data that you have seen so far, and all the different incidents: can you comment on the more common, or the more disruptive, ones? What's the worst, or the most common, one?
N
From the different incidents we have seen, there is no common incident, because if it were common, we would just have fixed the issue and not see it again, right? So, no: from all the incidents we have seen so far, each incident is a new one, and we are learning new things from each incident, implementing new checks and new strategies to see if we can improve.
D
Sure; I meant more like, what was the most influential telemetry metric that you have, the one that catches the most anomalies?
B
Okay, so it looks like we're ending the workshop right on time. I would like to thank everyone for joining us today and for asking those interesting questions; please always feel free to reach out to the speakers if we had to cut your questions due to time constraints. And I would like to also thank our speakers once again for those insightful research talks. Do you have anything to add?