From YouTube: Foundations of Data Availability Sampling
Description
A Data Availability Sampling (DAS) research paper by Mathias Hall-Andersen, Mark Simkin, Benedikt Wagner.
The lecture was given by Benedikt Wagner on April 19th, 2023.
B
So yeah, thanks for introducing me. I'm happy to speak here about this project on data availability sampling. Over the last couple of months we tried to understand: what is this data availability sampling, and how can we formalize it? So let's set the stage: what is the setting that we are looking at, and why do we care about it?
B
So let's say we have some peer-to-peer network, and we have a party, let's call her Alice. She holds some data, for example a block that should be posted in this peer-to-peer network, because this network runs a blockchain. This block has some associated header information, and all the parties in this network download the entire block along with the header. Okay, but that's not the end of the story.
B
We also have parties that we call clients, and these clients are restricted in terms of resources, so they are not able to download the entire block. Instead, they only download the small header, but they still participate in this protocol: they want to verify that the block is correct in some sense.
B
We don't care about any privacy here, so we typically don't model the clients as malicious. Okay, so that's the setting. Now, of course, you could ask the question: what does it mean formally that data is available? This is a very vague term, and it is not really clear what it means. And even if we have a formal definition of that, how can you construct a protocol? How can you implement this efficiently?
B
B
So Alice has the data, and let's say this data is K chunks of equal length. Now Alice wants to send it to the network, but before she does that, she commits to this data using a Merkle tree, which gives her a Merkle root, and that will be our header. So the clients download this Merkle root, and the data is sent to the network. Okay. Now, to check that the data is available, the clients will query the network.
B
They will sample an index at random, for example this index i, and then they get back from the network the i-th data chunk and an authentication path that convinces them that this data is consistent with the root. Here we already see why we need this Merkle root: we need it to make sure that both clients get consistent responses. This is an important aspect.
B
They need to make a lot of queries, though: essentially as many queries as they would need to download the entire data. So this is really not a good idea. It is a first try, but it's not really a good scheme. Okay, so let's think about where this problem comes from.
B
Well, it comes from the fact that we send the data to the network without adding any redundancy. That way, if we take the data that is stored in the network and delete just one symbol, then the original data is not available anymore. So what we will have to do is add redundancy. Okay, so let's give it a second try.
B
So let's add redundancy using a code. Again, say Alice has her K chunks of data. Now she treats them as the coefficients of a polynomial of degree K minus 1 over a finite field, and then she sends to the network not the data itself, but instead evaluations of this polynomial, say 2K of them. In other words, what she does is encode the data using a Reed-Solomon code. Okay, and as we saw before, we need some form of commitment to make sure the clients get consistent responses.
B
Here we deal with polynomials, so it's a natural idea to use a polynomial commitment scheme such as the KZG commitment. The clients will get the KZG commitment, which is just one group element, and then, along with the Reed-Solomon encoding, Alice stores all the opening proofs of the KZG commitment in the network. Then, when clients query random positions, they get back the evaluation along with the opening proof, and if this all verifies for enough queries, then they are happy. Okay.
B
So this is a good scheme, at least intuitively, but it's not really clear that this is the case from a formal perspective. So we ask the question: is this secure? To answer this question, we first need a security definition. I just sketched an adversary, but it's not really clear what this adversary is allowed to do. That's why we need a formal definition, and once we have a formal definition, we may use it to quantify how much better this scheme is compared to the first scheme.
B
So how much do we gain from using these codes? Okay, so that's the setting and two examples. I think we now have a feeling for what data availability sampling should be, intuitively. Now let's look at our results, and I want to split the rest of the talk, where I present the results, into three parts. First I will show you our formal definition of data availability sampling as a cryptographic primitive, and then we will look at the constructions, and for that we developed a framework.
B
I will explain that framework first, and then we will look at specific constructions. Okay, so let's look at the definition now: how can we define this as a cryptographic primitive? The first thing you should do when you define a cryptographic primitive is think about the parties that run this protocol and the algorithms that they run. So you specify the parties and the algorithms. Okay, we saw this before: we have our proposer.
B
The proposer takes some arbitrary data and encodes it using some algorithm. This outputs a commitment, sort of the header that the clients will download, and also an encoding, which is what is stored in the network. And here you see one aspect of our definition: we don't model this network as a set of many parties. Instead, we say there is this encoding, it is somehow stored within the network, and the clients can query it. Okay, so what do the clients look like?
B
Okay, so that's what we have seen so far. I think nothing really special has happened yet, but now the question is really: what is data availability? How can we say that data is available? We formalize this via an extractor. This is an algorithm that takes as input some set of transcripts and also the commitment, and it extracts the data. From now on, we will always say the data is available.
B
If this algorithm can extract it, then the data is available; so availability means the extractor can extract it. Okay, and this is not just some concept that we have in the definition: you can think of it as being run by any node that collects enough transcripts. So whenever these clients finish their interaction, they can just send their transcripts to everyone they know, or everyone they want to. And then, if you collect enough transcripts, you run this extractor and get back the data, so the data is still available.
B
Okay, so that's the syntax. Now let's think about the properties that we want: what kind of properties should these algorithms satisfy? The first thing you always want is some form of correctness or completeness definition, which just says: if everyone is honest and behaves as expected, following the protocol, then what you get is what you expect. In our case, this would be as follows.
B
If I take the data, encode it, and run these clients, then all clients should accept, because the data is available, and also what I extract in the end should be the same as what I put in at the beginning. So this is just completeness. But of course we cannot hope that everyone is honest, so we need some security notions, and I already mentioned that we assume the proposer and the network are dishonest: they want to convince these clients, or verifiers, that some statement is true.
B
Whenever you have something like this, it's kind of a proof system. You want some form of soundness, telling you that if these clients accept, then the statement is really true, and in our case this means the data is available. So what we want is: in this setting, where the encoding and the commitment are controlled by the adversary, if the clients accept, then the data is available.
B
Okay, so what does it mean that data is available? Well, we can extract it. But now, if we look at this picture, we see that the encoding is controlled by the adversary, and the commitment as well, so there is no original data. Okay, so that's the first challenge, and let me just explain what it means that the adversary controls this encoding.
B
Okay, I want to emphasize two things here. The first is this word "enough". Typically in a proof system you would assume that if one verifier accepts, then the statement should be true, so the data should be available. But we cannot really guarantee this for any reasonable system. So what we require is that enough clients accept, and let me explain this. Say you have an adversary, and this adversary targets the first client, in the sense that it takes some data and encodes it honestly.
B
It responds honestly to the first client, and then, by completeness, this client should accept. Now, for all the other clients, the adversary just says: I give you garbage, or random things, or nothing, as responses. Then, of course, every piece of information about the data that can be used in extraction comes from the first transcript. So if the first client makes only a few queries, then you cannot hope to extract. Okay.
C
Excuse me, yes, is it okay to ask questions now, or should we defer them?
C
I have a kind of notational question. You mentioned in the beginning that some data is stored in the network, but I can't fully understand the difference between the network and the proposer. When you say that the adversary can answer queries differently, this actually means that nothing is really stored in the network: the adversary just has some storage and answers arbitrarily, or by some algorithm. So what...
B
So, good question. In the honest case, we have this party that comes up with the data, or somehow has this original data, encodes it, and stores it in the network. In the dishonest case, here in the soundness notion, we just don't want to trust the network, because the network is, in the end, the thing that wants to convince us that the data is available. So we should model it as the adversary, and we will see in the constructions that answering inconsistently is not really possible.
B
Yeah, so that's a very strong notion. Okay, I hope this somehow answers the question.
B
Yeah, so where was I? Okay, yeah. The first thing is that we have to assume there are enough clients, just because of this adaptivity, and how many clients we need is a parameter of the scheme; we will come back to that later. And then I say we extract something, but I don't say anything about what we extract. So you could just output some default data and satisfy this notion, which tells us that this notion alone is not enough.
B
So again, as I said before, these clients take their transcripts and send them to some other parties, and whenever you receive enough transcripts, you can extract the original data. Now let's say you have two nodes that do this: you run this extraction on two nodes, for the same commitment, with two sets of transcripts. And now, of course, what you want is that they extract the same data.
B
You cannot say that they extract some specific original data, because the adversary controls the way this commitment is created, so there is no original data. But what you want is that they really extract the same data, and this is consistency. Now, if we look at this picture, this is still a very specific setting: there are two disjoint sets of clients, and you try to extract from their transcripts.
B
What we really want is a more powerful notion where we don't care where the transcripts come from, because if I try to extract, I don't really know who made up these transcripts. So in our consistency notion, we give the adversary the power to come up with these sets of transcripts. The adversary comes up with one commitment but two sets of transcripts, which do not have to be disjoint, and then I try to extract from both. I get two candidate data, and my requirement is that they should be the same.
B
So the commitment somehow binds you to the data. Of course, I can only require this if the extractions do not fail. And now we also see that outputting some default data is not a good scheme: it will not satisfy this notion, because the adversary can just behave honestly in the first set, for some random data that it encodes, and force the second extraction to output the default data, and then this will not work. So consistency tells us something about what we extract.
B
We have a soundness notion telling us: if enough clients accept, we extract something, so some data is available. And then we have a consistency notion which says something about what we extract, namely that we never extract two different things. So if you get a commitment, and some other party gets the same commitment, you are sure that you both get the same data. So that's our basic definition, and then we thought about extensions.
B
So how can we extend this definition with more functionality that may be useful in practice? The first extension we looked at is repairability, and here the idea is: what if your encoding is broken and you want to repair it, to go back to a stable state where you have something that is close to the original encoding and works as expected, but you don't want to change the commitment?
B
You don't want to redistribute all the commitments; you sort of want to do this repairing transparently, so you need some way to repair the encoding such that it still works with the old commitment. And then we have a second extension, which is local accessibility. Here the idea is: what if you are not interested in the entire data, but instead only in parts of it, for example one symbol? Then, of course, you could just wait until you have enough transcripts and reconstruct the entire data, but that sounds wasteful.
B
So we define syntax and also security notions for these two extensions, but I don't want to go into detail here, because I want to have more time for the constructions. So yeah, that's it for the definition side. Maybe now is a good point if anyone has another question about the definition.
E
Can you go back one slide? It's just one question about what you mean by repairability, because this way of modeling the network is a bit strange; just to make sure I understand it correctly. Does this repairability, or repairing, mean that some of the queries are not answered at all, or what is the setting?
B
When I say the encoding is broken, what I mean by that is just that I see that some clients don't accept; then I know something is wrong. And that means, okay, you're right, there's no explicit encoding anymore, because now I'm in an adversarial setting, but I see something is wrong, and then I want to go back to a state where everything is fine, without changing the commitment.
E
Yeah, it's just that I would probably really like to see the definition, maybe, but yeah.
B
Sure, yeah. It took us some time to define it, so you're right, this is a bit tricky. Okay, so any other questions about the definition? Okay, good. So now let's look at constructions. We have this definitional framework now, and we looked at the existing constructions and also thought about new ones, and what we saw is that they all follow sort of a similar framework.
B
So we abstracted this and came up with this construction framework, and it allows us to separate sort of the combinatorial parts of the constructions from the cryptographic parts. Because all of the constructions follow this framework, I want to explain the framework first. You can summarize it as data availability sampling from erasure codes.
B
So we want to construct a data availability sampling scheme as I defined it a few minutes ago. For that we will use three components. The first is a so-called erasure code, and then we have a commitment for this. This is a new cryptographic primitive we define, an erasure code commitment; you can think of it as a generalization of a polynomial commitment scheme. And then we have a third component, a so-called index sampler.
B
So this is our framework. I want to explain how it works, but we have already seen one example of this framework at the beginning, when we looked at the Reed-Solomon-code-based construction. There, the erasure code was the Reed-Solomon code, the erasure code commitment for it was the KZG commitment, and the clients sampled their indices uniformly with replacement.
B
So this is one example, and now let me give more details. The first thing we want to look at is erasure codes, and for that I need to recall some coding theory, just to get everyone on the same page. So what is a code? A code is just a mapping, taking some element from Gamma^K and mapping it to Lambda^N. So how does this look?
B
You take some message, which is K symbols over Gamma, and you map it to a codeword, which is N symbols over Lambda. So you add some redundancy here, and you can think of it as follows: you have this small set Gamma^K, and you injectively map it into a large space Lambda^N, and this is your code. I should say that I'm typically referring to the code as the mapping, but at some points I will also refer to the code as the image of the mapping.
B
Okay, so what is an example of a code? One very important class is the class of linear codes, where your alphabet is just a finite field and you map a vector x to its codeword by applying some matrix to it: you take your vector x, multiply it with a matrix, and you get your codeword. An alternative way to look at a linear code is via the so-called parity check matrix. This is another matrix that helps you identify codewords.
B
So a vector is in the code if its product with this matrix is zero. Think of it like this: you have this long codeword, you multiply it with the matrix, and if it's in the code, then this gives you zero. So you can check membership in this code by a linear check, and we will use this later.
B
Okay, so one example of such a linear code is the Reed-Solomon code. Now, what are erasure codes? Erasure codes are again just any code, but with a specific way of looking at it: we look at it from the perspective of erasures. What we want is to tolerate erasures. What does that mean? Well, we have our message.
B
And, of course, if you have the case where the alphabets are the same, then the best you can really hope for is reception efficiency K: you can reconstruct whenever you have K arbitrary symbols. You cannot hope to reconstruct from K minus one, just based on information theory. One example of such an ideal erasure code is the Reed-Solomon code: whenever I have K points of my degree K minus 1 polynomial, I can interpolate this polynomial. Okay, so now we know what an erasure code is.
B
Now let's look at this new commitment scheme that we define, the erasure code commitment scheme. As I said, this is a generalization of a polynomial commitment scheme: you can think of a polynomial commitment scheme as an erasure code commitment scheme for the Reed-Solomon code. So how does it look? Again, we have an arbitrary code, and now let's say Alice holds a codeword c, which is C(x) for some message x.
B
So this could be the evaluations of a polynomial. Now Alice wants to commit to this codeword, and Bob wants to query positions of this codeword. Bob will query a position and get back the symbol of the codeword along with an opening proof tau_i, and then, of course, you can verify it. This is just as you would expect, as in a polynomial commitment scheme: you commit to a polynomial, and then you get back evaluations along with opening proofs. Okay.
B
So for this kind of commitment we of course need some security notions. For a commitment scheme you typically consider hiding and binding. We are not caring about privacy, so we only consider binding notions here, and there are two that we consider. When we came up with them, on the one hand, we wanted to be consistent with the notions that are already there for polynomial commitment schemes.
B
On the other hand, the notions should be strong enough to suffice for constructing data availability sampling. Okay, and these are the two notions we came up with. The first is position binding. You also have that in polynomial commitment schemes: the adversary outputs a commitment, and then it tries to open this commitment at one position i to two different symbols. It looks something like this: the adversary tries to convince you that there is this codeword that it committed to, but at one position it opens it to two different things, and this should not be possible. The adversary wins when all openings are valid and the two symbols are different. So this is position binding, and it is something you can easily achieve just using a Merkle tree. So this alone cannot be what we want.
C
Sorry, a question: why did you decide to commit to the codewords? Because, well, I think in the very first slides, when you sketched the first solution, you seemed to commit to the data rather than to the codewords. At which point did you decide to... okay, yeah.
B
So you can also phrase this as committing to the data but opening codeword positions: you input the data. This is sort of equivalent, I would say, because the code is injective. When you commit to the codeword, and it really is a codeword, then it implicitly commits you to the data. What we will do, and we will see this in a few slides, is take the data and input it to the commitment. But later, what is important...
C
It's just that when you say we now work with codewords, and we decided to commit to codewords, this looks a little bit restrictive, because in general, of course, you don't have to encode with a linear code; you could answer queries as just functions. Okay.
C
As long as you can recover the data from the function outputs, you should be able to define... okay.
B
But we will see that you can write most of the constructions in this way, so I claim this is not really a loss of generality, especially for the constructions that we consider. If you have other constructions, sure, you can do it in a different way. But this helps us avoid redoing certain proof steps for all the constructions, so yeah.
A
To clarify, the goal here is not to come up with a definition that covers all possible imaginable constructions. Rather, it is to come up with an abstraction that covers a large part of the constructions, and then you show that if you have such a commitment scheme, you automatically get a data availability sampling scheme. Now you just need to focus on building the specific commitment scheme rather than building the more complex primitive. It is not meant to cover all imaginable constructions of data availability sampling.
B
Yes, but it does cover a lot; that's why we look at this. Okay, so yeah, good question. So now we have this position binding, but as I already said, you can easily satisfy it: just take your data, compute the codeword, and compute a Merkle tree for it. So this cannot be the end of the story. Instead, we need a notion that relates to the code.
B
So this is code binding, and here, again, Alice outputs a commitment, same as before, but now what she outputs is just a set of openings, a subset of this codeword, and Alice wins if these are not consistent with the code: there is no codeword that is consistent with these openings. Okay, what does that mean? It means that whenever Alice outputs openings...
B
Then, if this scheme is code binding, we know that all the openings are consistent with the code. Okay. So, for example, if you have the Reed-Solomon code and you're talking about polynomials, then it cannot happen that Alice outputs K plus 1 points that do not lie on a degree K minus 1 polynomial. And this is something that the original KZG polynomial commitment scheme paper actually does not have.
B
You don't have such a notion there, so this is a notion that will help us, and we will see that these two notions together are sufficient to construct data availability sampling. Okay. So now we know what an erasure code is, and we know what an erasure code commitment scheme for some code is. I don't want to go into the details of the index sampler; this is a very interesting combinatorial object, and we studied it for a while.
B
But let me instead just show you how we construct data availability sampling from these components. We start with the data, and the first thing we do is use our erasure code to compute the codeword, and this will be the first part of the encoding that we store in the network. Now, of course, as we saw at the beginning, we needed the Merkle tree to ensure consistency, and here we use our erasure code commitment scheme for that.
B
So our erasure code commitment scheme will give us openings for these codeword symbols, and we group together each codeword symbol with its corresponding opening. So one symbol of our encoding for the data availability sampling scheme is a tuple which contains the codeword symbol and its opening. Okay, so this is the encoding. Let's now look at the clients. The clients get the commitment from the erasure code commitment scheme, and now they should query this encoding to verify that the data is available. So what do they do?
B
They first run this index sampler, which, as we now know, is just an algorithm that outputs Q positions, and then they query the encoding at the positions that the sampler outputs. So the index sampler is just specifying how we sample the positions that we query. Then I verify all the openings: from every query I get a codeword symbol and an opening, and I can verify it with respect to the commitment. If all the checks pass, then I accept and think the data is available.
B
If some check fails, then I reject. Okay, so this is the construction, and when we analyze it, we of course want to use the properties of the erasure code commitment scheme and of the code. When we analyze this, we have to think about three properties: completeness, soundness, and consistency.
B
So here is a summary. For completeness and soundness, we just use the combinatorial properties of these objects; we do not use any computational assumption or anything like that. Completeness and soundness follow from the reception efficiency and from a measure that we define for the index sampler, which we call the quality. If we combine them, this tells you how many clients need to accept such that you can reconstruct the data. So remember what I told you in the definition of soundness.
B
This intuitively makes sense: if you need to collect a lot of symbols of this codeword to be able to reconstruct the data, then you also need more queries from these clients, or more clients in general, to reconstruct the data. Okay. The third notion is consistency, and this one is computational, so we show it based on computational assumptions: we show it based on the position binding and code binding of the erasure code commitment scheme. The intuition is as follows.
B
Let's say you have two clients and they query the same position. Then, by position binding of the erasure code commitment scheme, they get consistent responses. And then, if you look at all the responses together, by code binding you know that they are consistent with some codeword, and that means that whatever subset of them you take, as long as it's large enough, it will give you the same message, so it gives you the same data, and you have consistency. So this is just a high-level sketch of consistency.
B
Okay, so that's our framework. It tells us how to take an erasure code and an erasure code commitment scheme and construct data availability sampling. So, as Mark said, we can now focus just on constructing erasure code commitments for erasure codes, and this is the third part.
B
Let's now look at specific constructions. Okay, so here's an overview of the constructions that we looked at in this project. The first one, which we have already seen a few times now, is to take the Reed-Solomon code and any polynomial commitment scheme, for example the KZG commitment scheme. By the construction that I just showed you, we can use this to build a data availability sampling scheme, and this already sort of proves that this construction is secure, just by instantiating our framework.
B
Okay, then we have a very generic construction where you just take an arbitrary code, commit to the codeword using a vector commitment, and then, to get code binding, you use a SNARK on top of it. So this is generic, but it's also very inefficient, because the SNARK is computationally expensive, for example if you use a Merkle tree as the vector commitment. I look at this more as an educational example of the general strategy you want to follow when you construct these commitments.
B
Okay, and then there's the Ethereum construction. This is the construction that is currently used by Ethereum, and we can actually parse it and view it as an instantiation of our framework. For that we look at the so-called tensor code and do some row-wise commitment; we will see that in a minute. Okay, and then, motivated by the fact that these constructions sort of rely on a trusted setup, at least the Ethereum one and the Reed-Solomon one, we asked the question: can you construct this without a trusted setup, and ideally from minimal assumptions, such as just using hash functions? We came up with two constructions, one based on hash functions and one based on homomorphic hash functions, and I will show you one of them later.
B
So we take every row and apply the code to it, and then we get this rectangle. The second thing we do is extend it along the columns: we apply the code to every column. You can write this very concisely in this form: you take your data, and encoding row-wise means multiplying by G transposed, while encoding column-wise means multiplying by G, where G is the generator matrix that we talked about before.
B
If you look at this code, and say we use a Reed-Solomon code as the base code, then every row and every column is now the evaluations of a polynomial; every row and every column should be a codeword. That's our code. Before I show you how to commit to such a codeword using the erasure code commitment scheme, let's first think about the properties of this code, and in particular about its reception efficiency: how many symbols do we need to be able to reconstruct the data?
B
And why do we care about this? Again, this affects the soundness and completeness parameters that we get. Okay, and you can sort of see that the worst case is something like this: we have all the highlighted symbols of this codeword, but you are not able to interpolate any of these rows or columns, and so it is about three quarters of the data that you need to be sure that you can reconstruct.
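A small simulation (my own sketch, assuming a rate-1/2 base code for illustration) shows why such a worst case exists: if a (k+1) × (k+1) block of the n × n encoding is withheld, no row or column through that block retains the k symbols needed for interpolation, so iterative row/column decoding stalls even though most symbols are present:

```python
import numpy as np

k = 4
n = 2 * k                              # rate-1/2 base code: any k of n symbols suffice
avail = np.ones((n, n), dtype=bool)
avail[:k + 1, :k + 1] = False          # adversary withholds a (k+1) x (k+1) corner

def peel(avail):
    # iterative decoding: any row or column with at least k known symbols
    # can be fully interpolated; repeat until no more progress
    a = avail.copy()
    progress = True
    while progress:
        progress = False
        for i in range(n):
            for line in (a[i], a[:, i]):
                if not line.all() and line.sum() >= k:
                    line[:] = True     # interpolate the whole row/column
                    progress = True
    return a

stuck = peel(avail)
frac = avail.mean()                    # ~0.61 here; tends to 3/4 as k grows
assert not stuck.all()                 # decoding cannot complete
```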
B
This is the reception efficiency. Now let's look at the commitment. So I said every row is in the base code that we use, and now let's assume we have an erasure code commitment scheme for this base code.
B
So if the base code is the Reed-Solomon code, let's assume this is the KZG commitment scheme. Now, because every row is already a codeword, why not just commit to every row? That will give us a set of n commitments, and the first idea we could have is to just group them together; that is our commitment. Now we should think about the two binding notions: position binding and code binding. To understand that, let's first see what an opening looks like.
B
Let's say I want to open this symbol of the codeword. It is, of course, associated to some row commitment, and what I do is just open it with respect to this commitment. That is how you open it. Now let's think about position binding and code binding. Well, position binding is easy, because every position, as you see here, is associated to a row commitment.
B
Okay, now let's think about code binding. For code binding, I want to be sure that whenever the adversary opens a set of positions, this is consistent with the code. What does that mean? It means that every row and every column is consistent with the base code. So whatever the adversary opens, I know that every row is in the code and every column is also in the code. Okay, what if a row is not in the code? Well, then, because I commit to rows,
B
I break the code binding of the underlying construction. Okay, so this half of code binding is clear, because I commit to rows. The challenge is: what if some column is not in the code? In particular, if you just do it like this, with no additional change, the adversary can form all these rows independently. There is nothing that ties these rows together, and therefore this cannot be code binding yet. But we can make it code binding with one additional check.
B
What we will do is check that a column is in the code by checking it over the commitments, and for that we will use the fact that the commitments, for example KZG commitments, are homomorphic. So we will do a homomorphic check that verifies that all these commitments together are in the code, so that all the columns are also in the code. Here we will use the fact that checking membership in a code is a linear condition, so we will use the parity check matrix. Okay, and then with some care,
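The idea can be sketched with a toy discrete-log-style homomorphic commitment com(m) = g^m mod q (my own illustration; KZG commitments are homomorphic in the same sense): a parity-check row applied to the committed values turns into a product of commitment powers, so a verifier can test code membership without seeing the values.

```python
from math import prod

q, g = 23, 2          # g has order 11 mod 23, so exponents live in GF(11)
F = 11

def com(m):
    # toy homomorphic commitment: com(a) * com(b) mod q == com(a + b)
    return pow(g, m % F, q)

# RS[4, 2] over GF(11): codewords are (f(1), f(2), f(3), f(4)) with deg f < 2.
# Membership is linear: the second differences vanish, which gives the
# parity-check rows below (entries reduced mod 11).
H = [[1, F - 2, 1, 0], [0, 1, F - 2, 1]]

def check(coms):
    # parity check lifted to commitments:
    # prod_j com_j^{h_j} == com(0) == 1 for every parity-check row
    return all(prod(pow(c, h, q) for c, h in zip(coms, row)) % q == 1
               for row in H)

word = [(3 + 5 * x) % F for x in (1, 2, 3, 4)]    # a codeword: f(x) = 3 + 5x
assert check([com(c) for c in word])

bad = list(word); bad[0] = (bad[0] + 1) % F       # corrupt one symbol
assert not check([com(c) for c in bad])
```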
B
you can analyze this, and you can show that it is actually code binding. Okay, but what I want to show you next is not this proof in detail; I think I sketched parts of it. Now I want to show you our new construction, and again the motivation was that here you rely on a trusted setup and expensive public key operations. So the question was really: can we do it just from hash functions?
B
D
Question: yeah, so I mean, you could obviously use FRI as a polynomial commitment, but that would be inefficient, computing all of these openings. So what we are looking at is whether we can avoid that and just do the proximity test with FRI.
E
A
B
Yes, that is what we are currently working on. Okay, so let me show you this construction. It is for so-called interleaved codes, and we only use hash functions for it. So what is an interleaved code? Again, I want to start with the code, and then I want to show you how to commit to a codeword. As for the tensor construction, we start with our data arranged in a k-by-k matrix, and, as for the tensor construction, we take some base code and extend all the rows using this base code.
B
But now this is where the similarity stops. We won't extend it along the columns. Instead, we group together the columns as symbols. So now our new code will have n symbols, and each symbol of the codeword is an entire column of this matrix. Again, you can write this in matrix form, and that also means,
B
if we look at this code, that the reception efficiency is the same as for the underlying code, but with the caveat that the symbols are now larger. So if you get K symbols of this new codeword, you have these symbols in every row, and you can reconstruct every row.
B
But again, the symbols are larger, and that means whenever a client queries a symbol, it will get an entire column. Okay. So now the first thing, when we look at this and want to construct an erasure code commitment from it, is that we should ensure it is position binding. And here, let's say we do a very naive thing: we just commit to all of the symbols using a hash function. Then position binding just follows from the collision resistance of this hash function.
B
Okay, of course, I cannot check that for every row, because I don't have the entire rows. But what I can do is sort of compress these rows into one row using a random linear combination. So I sample some random vector r, and I take the first row times r1, the second row times r2, and so on, sum that up, and I get one compressed row w. And of course, if all of these rows are in the code, then w is also in the code.
B
This is just because the code is linear, and the hope is that if one of the rows is not in the code, then w will also not be in the code. That is the hope. So how can we implement this? Well, we can just take a hash function modeled as a random oracle, apply it to the hash values, and get our r.
B
Then we take w to be r times C, so that is just combining the rows. Okay, and now our commitment would be the hash values and the w. The first thing you check when you get a commitment is that w is in the code, to sort of implicitly check that all the rows are in the code. Okay. So how do you verify an opening then? One aspect of this construction is that there is no explicit opening proof; instead, I just give you the symbol.
B
A symbol, to recall, is a column in this matrix. So I give you this column ci. You check that w is in the code (you always check that); for position binding, you check that the hash is consistent; and then you also check that, along this column, the linear combination was formed correctly: you check that r times ci is wi. That is how you check it. And now let's think about whether this is actually code binding. So the intuition is that,
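The commit/open/verify flow just described can be sketched end to end. This is my own toy instantiation (a [4, 2] Reed-Solomon-style base code over GF(251), with SHA-256 both as the column hash and as the random oracle that derives r), not the paper's parameters:

```python
import hashlib

p = 251                     # toy prime field (all symbols fit in one byte)
k, n = 2, 4                 # base code: [n, k], evaluation points 1..n

def encode_row(msg):
    # Reed-Solomon-style encoding: evaluate the degree-<k polynomial
    # with coefficients `msg` at the points 1..n
    return [sum(m * pow(x, j, p) for j, m in enumerate(msg)) % p
            for x in range(1, n + 1)]

def in_code(word):
    # k = 2: codewords are evaluations of a line, so successive
    # differences are constant mod p
    d = (word[1] - word[0]) % p
    return all((word[j + 1] - word[j]) % p == d for j in range(len(word) - 1))

def commit(data):
    # data: k messages of length k -> C is the k x n matrix of row codewords
    C = [encode_row(row) for row in data]
    cols = [bytes(C[i][j] for i in range(k)) for j in range(n)]
    hashes = [hashlib.sha256(c).digest() for c in cols]        # one hash per symbol
    # random oracle: derive the combining vector r from the column hashes
    seed = hashlib.sha256(b"".join(hashes)).digest()
    r = [seed[i] % p for i in range(k)]
    w = [sum(r[i] * C[i][j] for i in range(k)) % p for j in range(n)]
    return (hashes, w, r), cols

def verify(com, j, col):
    hashes, w, r = com
    return (in_code(w)                                         # w is a codeword
            and hashlib.sha256(col).digest() == hashes[j]      # position binding
            and sum(ri * c for ri, c in zip(r, col)) % p == w[j])  # r . col == w_j

com, cols = commit([[1, 2], [3, 4]])
assert all(verify(com, j, cols[j]) for j in range(n))
```

Note that an opening is just the column itself; the compressed row w and the per-column hashes inside the commitment do all the checking.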
G
B
This adversary breaks code binding: it outputs a commitment and some openings. Okay, and now this adversary breaks code binding if all these openings together are inconsistent with the code, so there is no codeword consistent with these openings. And now we want to rule that out. Okay, how can we approach this? The first thing we can do is say: if we model this hash function again as a random oracle, then these hash values define their preimage, and we can in fact extract it.
B
So this is a preimage C*, which is the preimage of all these hash values, and now, by collision resistance of the hash function, the ci's have to be consistent with C*. That means, in particular, that C* has to be not in the code, because that is the winning condition of code binding: whatever is consistent with the ci's is not in the code. Okay, but also all the openings need to verify, so that tells us that w has to be in the code.
B
Okay, and now the bad event that we have to rule out during the security proof is that these openings are consistent with C*, and they are also, via this r, consistent with w, but C* is not in the code and w is in the code. And now, if we just look at this, then, believe me, you can bound this, but there is a huge problem, and this is the term here: we need to use a union bound.
B
So why is that? Well, we don't get to choose the set I where Alice opens the commitment. Alice can just choose the set I in an arbitrary way, and we don't know anything about it, except that it is large enough. That means we have to rule out this event for every set I, and the only thing we can do here is use a union bound. But there is an exponential number of such sets, and that means the union bound will kill us here.
B
So this is not a construction that we can prove secure. Now, how can we change it? The problem really is that Alice chooses this set I, and an idea that you could have is: what if, let's say, Bob chooses the set I, or somehow the set I is chosen at random and Alice cannot choose it? This is how we fix the problem. Again, we have the same hashes for position binding, and we compute this random linear combination.
B
This is everything as we had before, and now our fix is that, in addition to that, we sample some subset J of all the positions. This can be a rather small subset, of size roughly the security parameter or something like that, and then, within the commitment, we force Alice to open the commitment at the indices in this subset. So now, if we look at the security experiment that we had before, Alice no longer chooses the set; instead, the set is chosen at random, and so we don't have to do this union bound. That is roughly the intuition of this construction.
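One way such a fix can be realized non-interactively, Fiat-Shamir style, is to derive the subset J from the commitment itself, so the committer cannot influence which positions must be opened. A minimal sketch, assuming SHA-256 models the random oracle:

```python
import hashlib

def sample_subset(commitment: bytes, n: int, t: int):
    # derive a pseudorandom subset J of {0, ..., n-1} of size t from the
    # commitment (random-oracle style): the committer fixes the commitment
    # first, so it cannot bias which positions get opened
    J, ctr = set(), 0
    while len(J) < t:
        d = hashlib.sha256(commitment + ctr.to_bytes(4, "big")).digest()
        J.add(int.from_bytes(d[:4], "big") % n)
        ctr += 1
    return sorted(J)

J = sample_subset(b"example-commitment", n=64, t=8)
assert len(J) == 8 and all(0 <= j < 64 for j in J)
```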
Okay. So that is our hash-based construction, and before I wrap up, I would like to compare these constructions and tell you the trade-offs that we have here. We have seen this Reed-Solomon plus KZG construction, and here the real advantage is that the commitment is constant size: it is just one group element, independent of the data size, and the same holds true for the communication complexity per query.
B
Now, here it scales with the square root of the data size, but we have an additional nice feature, which is that you can do the reconstruction in a local way, in the sense that you can take one row, reconstruct this row, and not care about the rest of the encoding. And now, if we compare it to our new hash-based construction, the clear advantages of this construction are that you don't need a trusted setup, we have sort of minimal assumptions, and you can implement this efficiently because you don't need public key operations.
B
You can implement this over small fields, which gives you computational efficiency. Of course, at the same time, you have a larger commitment size and larger communication per query, but there is one final advantage that I want to mention, and this is the total communication needed to reconstruct the data. What we can do is look at how many samples we need to make in order to be sure, with a certain probability, that we can reconstruct the data, and then multiply that by the communication per query.
B
So you get the total communication complexity, over all clients, that is required to reconstruct the data. We computed this, and it turns out that for this construction it is smaller than for the other ones. The main reason is that, for example, if you compare it to the tensor scheme, there you need three quarters of the symbols, so you need to make more queries to reconstruct. Okay, so that was some comparison. Let me now summarize what we have seen and what the next steps are for us.
B
E
B
Okay, so I see there is a question in the chat: are you working in an asynchronous or synchronous setting? Okay, that is a tricky question. So we don't define the communication between these parties. The only thing we define is the clients, as oracle machines that get oracle access to the encoding. So what we assume is that you make a query and you instantly get back the response.
B
So you query position i, and you get back the i-th symbol of the encoding, or nothing, or something that the adversary chooses, but there is no delay. So, in that sense, you should think of it more as a synchronous setting.
G
B
F
G
Of saying that you don't know when the clients are going to respond: you know that you will get a response, assuming they are online and available, but whether you get a response in a minute from now or ten minutes from now would be something you would need to model actively. Well, for partial synchrony you would need to consider what the actual time bounds are, and for full asynchrony you would need to say we can't model time at all.
G
A
F
A
G
That is not to say that... I mean, I guess you would probably need to set that timeout to be very high if you are doing it like that, because otherwise you have to start considering the setting where sometimes they are not going to be able to respond within 10 seconds, but they will respond within 20. So what do we do then? So it probably needs to have a very generous margin.
F
Yeah, but yes, that affects liveness. So, if you also want to consider whether this can actually work: yes.
A
Maybe a short comment on what Francesco wrote in the chat. That is a question of how you model it. You could say that completeness says: under good network conditions, with honest participants and an honest encoding, everything works well, in which case it would just be an additional assumption in your completeness, and then the data not being delivered would be part of the soundness. But you could also model it as saying you have completeness in an asynchronous network, and then you would have an extra definition.
A
G
D
A
After the paper was done... maybe one small comment regarding the fact that we abstracted away the network. In the talk, Benedikt was talking about the index sampler, and one thing that we did try to model, at least a little bit, is the fact that it is
D
A
uniformly random queries: in practice, you would do the encoding and split it across a bunch of machines. Now, if I do the uniform sampling, so I am one client and I collect 50 samples, I might potentially need to contact 50 different machines to get my samples. So that is why we have the abstraction of an index sampler, where we say:
A
maybe you don't want to have 50 uniformly random indices, but instead one random index and then 50 consecutive samples, in which case you would probably need to contact much, much fewer servers. So we will provide benchmarks and comparisons.
B
Yeah, as far as I understand, this is also what is done in the Ethereum construction, right? You group together symbols, and you don't query one symbol alone. You always query... I don't recall, I think it was 16 symbols or something like that.
D
B
D
A
And the other thing, regarding what Gotti asked in the beginning about repairability: maybe it is also worth mentioning that informally saying "oh, I want it to be repairable" is very easy, but actually writing down any sensible definition turns out to be extremely hard, because you have an adversary, and you want a repairability notion that says something is correct, but you didn't start with the correct object. So it is very tricky. We have one in the paper, formally written down,
A
which does seem to model something that looks sensible to us, but even that definition may still be a bit too weak. So, for example, if you want to distinguish the Ethereum construction from a standard Reed-Solomon code plus KZG construction, this definition does not necessarily highlight some practical differences that you would have. But making a stronger definition that does highlight them is difficult, because, I mean, you want to make a definition
A
that is expressive but also understandable, because a definition can be very expressive yet useless to use, or the other way around. So we tried to find the middle ground, and there is still maybe some work to do to find an even stronger, but still useful, definition of repairability. That is a very tricky topic, actually; it seems very innocuous, but it is very hard to define.
F
Yep, thank you. Yeah, awesome, great work. I will have to run, so yeah, thanks, and yeah. Bye bye. Great, nice.