From YouTube: Compute Over Data Working Group 4th Session
Description
Joel from the Ceramic team shares a walkthrough of the purpose and structure behind data streams and their upcoming ComposeDB project. Luke from the Bacalhau team takes us through the Bacalhau architecture, roadmap, and a live demo.
Ceramic: https://ceramic.network/
Bacalhau: https://bacalhau.org/
A
All right, all right, hello everyone, welcome. This is our fourth meeting of the Compute Over Data working group. Very glad to have everyone who is joining in live, and we're very fortunate today to have the Ceramic team, who's going to give us a rundown of their tech stack, and then also the Bacalhau project after them. So, Joel, we'll let you lead it off first. Thank you for joining, and we're looking forward to whatever content you'd like to share.
B
Thank you. Hi everyone, I'm Joel, technical co-founder of Ceramic. With me today I have Spencer, and maybe Sergey if he joins later, so we have a lot of people who can answer questions today if we get to them. Yeah, so I kind of wanted to jump right in: I'm going to share an intro, high-level overview of Ceramic, then dive into some of the things that I think might be a little bit more relevant to this group.
B
Perfect, okay. So, an intro to Ceramic: we see Ceramic as kind of an open graph for web3 data. You can think of it as this open knowledge graph of the internet. It's alive, meaning that anyone can update it and add things to it, and it's relational.

B
You have relationships between different things, and from this you can get this emergent web of trust, and we can start to make sense of the world, because now the internet is not only something that's presented to us by web pages; we can actually inspect the underlying contributions and how people interacted.

B
And so some of the thinking that became important when building Ceramic is that we wanted applications to be able to share data among themselves, so across applications and across organizations, and we want people to be able to optimize their workflows for their applications.
B
We
want
data
in
the
network
to
be
composable,
meaning
that
if
there
are
two
different
applications
out
there,
and
I
see
some
useful
data
in
both
of
them,
I
want
to
be
able
to
take
those
put
them
into
my
application
and
just
run
with
it
kind
of
use
that
and
when
I
make
updates
in
my
application,
those
should
propagate
to
the
other
application.
So
the
data
kind
of
is
composable
and
that's
like
interoperable
and
kind
of
there,
and
it's
not
locked
into
like
any
particular
app,
and
so
with
this.
B
We want audit trails that we can audit after the fact, and if we have this, we can actually let authors of the different parts of this decentralized knowledge graph build reputation. And so, why don't we just use the blockchain to do this? Because it has all these properties, right?
B
It means that we can have public goods funding, we can have NFTs. But the main limitation is that blockchains are limited in throughput by the block producer, basically the node that is producing the current block, the next block, and so on, and that can only be so big. You can't really build a Twitter or a Facebook on a strong consistency model where everything goes through one node.
B
So
different
people
have
approached
different
or
have
different
approaches
to
scaling
blockchains
for
data.
Some
do
like
the
bigger
block
approach,
so,
like
solana,
celestia
are
weave.
They
have
different
ways
of
making
sure
like
convincing
themselves
like
hey
this
big
block
producer
is
fine,
because
we
have
this
kind
of
validation
methods
for
it
and
so,
but
you're
still
limited
by
the
throughput
of
the
block
reducer.
B
Then
you
have
like
the
sort
of
proof
of
storage
approach
which
falcon
uses
with
cia
and
storjay
also
uses,
and
here
you
can
have
big
blobs
of
data
you
publish
and
then
you
have
some
kind
of
agreement
between
two
different
nodes
or
a
set
of
nodes
that
hey
this
data
will
be
stored
and
be
available,
but
you're
still
kind
of
limited
by
the
throughput
of
the
block,
because
you
can,
if
you
have
like
100
million
users,
they
can't
all
make
one
transaction
each
to
the
to
like
hey
here's.
B
So what we want is something that's decentralized and eventually consistent. We want to have parallel data production, and we think we can achieve this through audit trails that are independently verifiable. We still want properties like data composability: data shouldn't be locked to the different places where it lives. This would generally only work for non-financial data, of course, because for financial data you want strong consistency. And so imagine, if you have a set of event streams that are verifiable, you could have something that looks like this.
B
You
have.
The
different
colors
here
represent
like
different
subsets
of
the
network,
and
so
your
database
might
care
about,
like
some
particular
subset
here
or
basically
like
the
network
in
this
way,
so
you
have
like
the
different
subsets
of
shards,
but
someone
building
a
different
application
might
want
to
like
have
a
different
topology
of
like
how
they
build
their
indexes,
so
same
data,
but
like
different
configuration,
and
you
can
actually
achieve
this
if
you
have
independently
verifiable
event
streams.
B
But
if
you
have
something
like
hierarchy,
critical
systems,
you
still
kind
of
like
produce
the
falcon
users,
you
still
produce
like
blocks
and
they,
like
the
data,
is
kind
of
in
a
particular
shard,
and
then
you
would
need
to
move
it.
But
here
you
can
actually
produce
the
data
independently
and
then
you
can
build
kind
of
like
specific
consensus
over
different
places.
So
that's
why
in
ceramic
we
have
separated
two
layers
essentially
or
three
layers.
If
you
think
about
the
kind
of
user
experience
so
the
base
layer,
we
have
an
event
streaming
protocol.
B
On
top
of
that,
we
have
database
protocol,
and
on
top
of
that,
we
kind
of
have
this
nice
graphql
apis,
and
this
is
the
kind
of
system
we're
building
for
this
is
called
compost
db,
so
I'll
kind
of
quickly
go
into
composite
b.
I'm
not
gonna.
I'm
kind
of
gonna
rush
through
this
because
I
think
most
relevant,
for
this
call
is
the
event
streaming
layer.
B
So a data model just describes what data looks like, and any user in the network can write data to a model. Models are created using the GraphQL schema definition language, and of course you use GraphQL to query over them, and these data models are discoverable and composable. So I can take two existing data models, pull them into my application, and build something new. Roughly, this is what it would look like to define a model. This is a little bit outdated, so don't use this in your code.
B
But
here
you
have
a
model
for
a
proposal
to
imagine
like
a
dog
proposal.
It
has
an
author,
it
has
text,
and
so
this
author
here
is
like
given
proven
crypto
graphically
by
this
document,
account
kind
of
tag,
and
then
the
text
is
some
string
that
the
user
puts.
Then
you
have
a
comment
and
same
thing
here.
B
There's
an
author
there's
text,
but
we
also
have
a
proposal
id
which
references
kind
of
like
the
id
of
a
particular
instance
of
a
proposal,
and
then
you
can
kind
of
see
that
we
have
a
comments
relation
here
in
the
proposal
model,
so
you're
going
to
have
this
kind
of
relationship
that
allows
you
to
query
hey
pro.
This
proposal
give
me
all
of
the
models,
and
so
you
can
kind
of
imagine
how
this
generalizes
to
a
graph
of
of
data
models
anyway.
So
that's
the
database
system
we're
building
on
top
of
ceramic.
B
If
you're
curious
about
that,
please
reach
out,
but
that's
not
what
this
is
about
so
event
streaming
here.
I
want
to
like
slow
down
a
little
bit
and
allow
us
to
dig
a
little
bit
deeper.
If
you
at
any
point,
have
any
questions,
please
feel
free
to
interrupt
me
and
we'll
go
through
that.
B
So,
first
of
all
event,
streams
in
ceramic
are
completely
independently
verifiable,
meaning
that
I
can
synchronize
one
event
stream
in
the
network,
verify
it
and
be
sure
of
its
integrity
without
having
to
verify
any
of
the
other
data
in
the
network.
B
Excuse me. Then, we use decentralized identifiers, DIDs, for authentication, and we've built a system that allows us to use essentially any blockchain wallet, in principle, to write into Ceramic, and I'll go into a little bit more of what that looks like. We have a peer-to-peer network for synchronizing events, or synchronizing the event streams in general, and we sort of have a prototype for how this works right now. In the future we want to build something that's a little bit more scalable.
B
So
that
looks
like
some
sort
of
like
custom
epp
protocol
and
I'm
happy
to
like
dive
into
that.
If
anyone
has
questions
about
that,
and
then
all
data
is
sold
bound,
meaning
that
data
is
authored
by
a
particular
account
and
there's
no
way
to
like
transfer
the
data
like
transfer
ownership,
essentially
so
like
the
data
is,
is
with
you
and
you
can't
like
trade,
the
data
in
the
sense
of
like
an
nft.
If
you
want
to
trade,
something
use
the
blockchain
and
use
nfts
for
particular
things
or
tokens.
B
So
an
event
stream
looks
roughly
like
this.
This
kind
of,
like
the
high
level
view
and,
of
course
like
if
you
dig
into
particulars
here,
there's
more
nuance,
but
essentially
like
this,
so
you
have
a
different
events
or
the
event
stream
is
made
of
events.
B
And
first
you
have
a
genesis
event
that
essentially
creates
this
stream
and
we
have
something
called
stream
id.
The
stream
ids
is
like
an
fire
for
the
stream
and
that's
based
on
the
cid
of
the
genesis
event.
B
Then
there
are
two
types
of
events
in.
In
addition
to
the
genesis
event,
one
is
an
anchor
event
and
one
is
a
signed
event.
So
an
anchor
event
is
essentially
a
timestamp
that
utilizes
a
blockchain
in
this
case
ethereum,
to
provide
a
cryptographically
like
verifiable,
trustless
timestamp
that
hey
this
previous
event
was
published,
did
exist,
at
least
at
this
point
in
time,
and
so
we
have
a
system
for
like
how
we
reference
that
and
how
we
batch
that
into
merkle
tree.
B
So
we
can
actually
like
anchor
not
only
one
stream
with
one
transaction,
which
would
be
like
very
unscalable,
but
actually
make
one
on-chain
transaction
and
anchor
a
large
amount
of
events.
At
the
same
time,
then,
we
have
signed
events
and
the
signed
events
includes
a
signature,
basically
proving
that
hey,
I'm,
making
an
update
or
I'm
I'm
adding
an
event
to
the
stream,
and
I'm
actually
like
authorized
to
do
so.
So
the
genesis
event
specifies
the
account
the
id
that's
allowed
to
make
updates
to
this
event.
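The Merkle batching described here can be sketched like this: many event hashes are reduced to a single root, so one on-chain transaction can anchor them all, and any single event stays verifiable with a short proof. This is a generic sketch in Python, not Ceramic's actual anchoring code; the hash function and the odd-level padding rule are illustrative assumptions.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Reduce a list of leaf hashes to one root by pairwise hashing."""
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves, index):
    """Collect the sibling hashes needed to recompute the root for one leaf."""
    proof, level, i = [], list(leaves), index
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        sib = i ^ 1
        proof.append((level[sib], sib < i))  # (sibling hash, sibling-is-left)
        level = [h(level[j] + level[j + 1]) for j in range(0, len(level), 2)]
        i //= 2
    return proof

def verify_proof(leaf_hash, proof, root):
    acc = leaf_hash
    for sibling, is_left in proof:
        acc = h(sibling + acc) if is_left else h(acc + sibling)
    return acc == root

# A thousand event payloads; only `root` would go on chain.
events = [f"event-{i}".encode() for i in range(1000)]
root = merkle_root([h(e) for e in events])
```

With this shape, one Ethereum transaction carrying `root` timestamps all thousand events, and each stream only needs the proof path for its own event.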
B
This whole event log fits very neatly into the IPLD data model. And then you can just add more signed events, you can add new timestamp anchor events, and so on. You can keep growing this event log, and it's all hash-linked together from genesis to the latest event.

B
Right now in Ceramic we support only one canonical branch of history. In the future we want to allow the log to diverge and converge, because you might have nodes that are not completely in sync and we don't want to lose data. So that's the data structure at a high level, and now, diving into how the authorization and authentication works.
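A minimal sketch of the hash-linked event log described above, assuming a toy `cid` function (a hash of the JSON-serialized event) in place of real IPLD CIDs, and with signatures elided:

```python
import hashlib
import json

def cid(obj) -> str:
    """Stand-in for an IPLD CID: hash of the canonically serialized object."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()

def genesis(controller: str, data) -> dict:
    """The genesis event names the account allowed to update the stream."""
    return {"type": "genesis", "controller": controller, "data": data}

def signed_event(prev_cid: str, data) -> dict:
    # A real signed event would also carry a signature by the controller's key.
    return {"type": "signed", "prev": prev_cid, "data": data}

def verify_log(events) -> bool:
    """Check that each event links to the CID of the event before it."""
    for earlier, later in zip(events, events[1:]):
        if later["prev"] != cid(earlier):
            return False
    return True

g = genesis("did:pkh:eip155:1:0xabc", {"title": "hello"})
stream_id = cid(g)  # the stream ID is derived from the genesis event's CID
e1 = signed_event(cid(g), {"title": "hello, world"})
e2 = signed_event(cid(e1), {"title": "hello again"})
```

Because every event embeds the hash of its predecessor, tampering with any event in the middle breaks every link after it, which is what makes a single stream independently verifiable.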
B
Generally, if you think about the distribution of public keys in the wild right now, you have essentially Ethereum wallets and Solana wallets, which have really big market penetration in just providing public key infrastructure, and we want to leverage that, right? We don't want to have to build a new wallet for all these different kinds of things.
B
So
in
this
case
you
have
a
wallet
that
holds
a
public
private
key
pair,
say
an
ethereum
address
from
an
ethereum
mattress.
We
can
create
a
d80
pth,
I'm
using
ethereum
here
as
just
an
example.
This
would
work
for
like
any
any
sort
of
blockchain
wallets
that
has
like
a
public
private
keeper.
B
Then
pkh
stands
for
public
key
hash.
So
if
you
know
an
ethereum
address,
it's
basically
a
hash
of
a
public
key
and
the
pkh.
Basically,
access
ethereum
address,
plus
which
network
is
on,
and
now
it's
a
d80
and
then
there's
a
resource
in
ceramic.
In
the
case
of
ceramic,
it's
an
inventory,
and
so
in
the
genesis
event
it
specified
that
hey
this
particular
did.
Pkh
controls
this
resource.
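For illustration, the did:pkh construction is just string assembly over a CAIP-2 chain identifier plus an on-chain address, following the did:pkh method specification:

```python
def did_pkh(namespace: str, chain_ref: str, address: str) -> str:
    """Build a did:pkh identifier: did:pkh:<caip-2 chain id>:<address>."""
    return f"did:pkh:{namespace}:{chain_ref}:{address}"

# Ethereum mainnet is CAIP-2 chain id "eip155:1"; the address is an example.
did = did_pkh("eip155", "1", "0xb9c5714089478a327f09197987f16f9e5d936e8a")
```

The same scheme covers other chains by swapping the namespace and chain reference, which is what lets any blockchain wallet act as a Ceramic identity.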
B
B
But
if
you
think
about,
like
the
user
experience,
you
don't
generally
want
to
have
to
sign
the
message
in
meta
mask
every
time:
you're
making
an
update,
especially
if
it's
like
a
social
media
app.
You
just
want
to
make
a
comment
or
like
as
a
dow
proposal
like
you,
you
don't
want
to
like
always
have
to
pop
up
a
meta
mask
pop-up
and
so
to
mitigate
that
we're
using
something
called
object,
capabilities.
B
So
we
basically
leverage
sign
in
with
ethereum
for
ethereum,
there's
a
similar
standard
being
built
for
solana
and
can
be
extended
to
like
different
blockchains
as
well.
B
Basically,
you
generate
the
session
key,
which
is
the
id
key
in
the
browser.
Then
you
have
a
message
or
you
generate
a
message
that
includes
the
public
key
of
the
session
key
and
includes
the
shadow
identifier
of
the
resource,
a
stream
id
for
ceramic,
and
then
it
signs
over
that.
So
the
wall
is
basically
signs
giving
the
session
key
access
to
write
on
behalf
of
the
did
pkh
to
this
resource.
B
Now
the
the
application
has
the
session
key
and
it
kind
of
packages.
The
signature
and
the
message
that
was
signed
from
the
wallet
into
a
an
iple
format
called
cacao
and
every
time
it
now,
the
application
wants
to
update
this
resource.
It
basically
makes
jws
signatures
store
that
as
jose
and
includes
a
reference
to
the
basically
includes
the
cid
as
part
of
like
the
signature
payload
of
the
cacao
as
an
invocation.
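The session-key flow can be sketched roughly as follows. This is not Ceramic's implementation: HMACs stand in for real wallet and JWS signatures (so verification here uses the secret keys directly, which a real asymmetric scheme would not), and the resource identifier is a placeholder.

```python
import hashlib
import hmac
import json

def sign(key: bytes, msg: bytes) -> str:
    """HMAC stands in for a real wallet/JWS signature in this sketch."""
    return hmac.new(key, msg, hashlib.sha256).hexdigest()

# 1. The app generates an ephemeral session key in the browser.
session_key = b"ephemeral-session-secret"
session_pub = hashlib.sha256(session_key).hexdigest()  # toy "public key"

# 2. The wallet signs a capability: this session key may write to this resource.
wallet_key = b"wallet-private-key"
capability = {
    "aud": session_pub,
    "resource": "ceramic://<stream-id>",  # placeholder resource identifier
    "issuer": "did:pkh:eip155:1:0xabc",
}
cap_payload = json.dumps(capability, sort_keys=True).encode()
cacao = {"payload": capability, "signature": sign(wallet_key, cap_payload)}

# 3. Later updates are signed with the session key and reference the capability,
#    so no wallet pop-up is needed per update.
def make_update(data: dict) -> dict:
    body = json.dumps({"data": data, "cap": cacao["payload"]}, sort_keys=True).encode()
    return {"data": data, "cap": cacao, "sig": sign(session_key, body)}

def verify_update(update: dict) -> bool:
    cap = update["cap"]
    payload = json.dumps(cap["payload"], sort_keys=True).encode()
    if cap["signature"] != sign(wallet_key, payload):  # wallet authorized the key
        return False
    body = json.dumps({"data": update["data"], "cap": cap["payload"]},
                      sort_keys=True).encode()
    return update["sig"] == sign(session_key, body)    # session key signed this
```

The point of the two-level structure is that the wallet signs once per session, while every subsequent update only needs the cheap in-browser session key.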
B
You
can,
of
course,
also
you
know,
create
an
event
streaming
in
ceramic
that
only
relies
on
the
dad
key
and
the
resource
is
actually
controlled
by
directly
at
the
id
key.
It
kind
of
depends
on
like
what
sort
of
application
you
want
to
deploy
so
like.
B
If
you
have
a
server
side
application,
you
don't
really
need
like
a
wallet
and
all
that
infrastructure
for
users,
so
you
can
just
like
use
a
public
card
key
pair
and
just
use
the
adk
method,
so
there
it
depends
a
little
bit
on
how
you
would
approach
things,
but
for
users
this
is
really
neat
cool,
yeah
and
there's
also
work
happening
to
kind
of
align,
the
cacao
format
and
the
cacao
support
for
like
signing
with
ethereum
and
potentially
like
other
on-chain
capabilities,
with
the
work
done
by
ucan,
so
kind
of
like
harmonizing
those
standards,
because
they
kind
of
achieve
somewhat
different
goals,
but
they
they
have
or
they
have
similar
goals,
but
they
they
do
different
things
right
now,
or
they
have
different
capabilities.
B
No
pun
intended
right,
now,
cool
and
then
so.
What
are
kind
of
the
the
use
cases
for
event
streams,
so
one
that
I
already
talked
about
because
we
are
building
a
database
on
top
of
this
is,
of
course
you
can
use
event,
streams
for
databases
and
what's
neat
about
this-
is
that
you
don't
need
to
build
like
a
mapping
like
one
to
one
like
one
so
event,
type
of
event
stream
to
a
database.
You
can
actually
have
event
streams
and
then
build
different
sorts
of
databases
on
top
of
it.
B
Someone might want to build, say, a local-first database that doesn't really care about global state, or you might want something that works in both cases, both local and global, optimized for both. The event streams are agnostic as to what sort of indexes and what sort of databases you build on top, and since we leverage event streams, we can have different consistency models as well.

B
For example, I might plug into some event streams and build my own database locally on my machine, and other people do the same. Or I might want to build a database that actually leverages a blockchain to ingest events from event streams, build consensus on what we observed in those streams, and build an index, so that we have consensus over that index.
B
I
think
what
is
probably
more
interesting
for
this
group
is
like
using
event
streams
for
compute,
so
we
can
think
of
like
an
event.
You
must
like
sort
of
like
a
formal
glue
between
different
compute
stages
in
a
data
pipeline,
and
so
imagine
that
you
have
like
someone
produces
some
data
and
puts
outputs
that
into
an
event
stream
and
signs
it
and
now,
okay.
I
trust
this
author
of
the
data
I'm
going
to
actually
plug
into
that
run.
Some
computation
on
top
of
that
or
the
set
of
event
streams.
B
I'll
put
my
my
result
of
my
computation
into
a
new
event
stream
and
then
maybe
like
do
an
extra
step
after
that
and
do
the
same
sort
of
thing
or
you
might
actually
want
to
have
these
sort
of
open-ended
pipelines
where
I
build
a
pipeline
based
on
like
some
events
and
then
some
computation,
some
events
and
I
achieve
a
particular
goal.
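A toy pipeline in the spirit described above: each stage consumes one stream and emits its results into a new stream for the next stage. The DIDs and event shapes are made up for illustration, and signing is elided:

```python
def emit(stream, author, payload):
    """Append an event to a stream (signature elided in this sketch)."""
    stream.append({"author": author, "payload": payload})

# Stage 1: a producer publishes raw data into an event stream.
raw = []
for n in [3, 1, 4, 1, 5]:
    emit(raw, "did:example:producer", {"value": n})

# Stage 2: a compute node I trust consumes that stream, transforms each
# event, and writes its results into a new stream for downstream stages.
def square_stage(inputs, outputs):
    for e in inputs:
        emit(outputs, "did:example:compute", {"value": e["payload"]["value"] ** 2})

squared = []
square_stage(raw, squared)

# Stage 3: another consumer aggregates over the intermediate stream.
total = sum(e["payload"]["value"] for e in squared)
```

Because each stage's output is itself a stream with an author, downstream consumers can decide per-stage whose results they trust, which is the "formal glue" idea.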
B
Maybe
like
you
want
to
have
a
system
where
you
have
like
five
different
nodes
or
five
different
organizations
that
produce
they
run
the
same
computation
and
produce
the
output.
And
if
it's
the
same,
you
can
kind
of
trust
it
or
you
actually
have
a
consistent
system
around
it.
But
you
can
kind
of
like
mix
and
match,
which
is
is
kind
of
cool.
B
And
so,
if
you
were
to
build
this
sort
of
like
system
on
top
of
ceramic
event,
streams
ceramic
wouldn't
really
care
about
like
how
you
define
compute
jobs
and
how
they're
executed.
B
I
know
there's
like
because
the
only
thing
ceramic
would
care
about
is:
is
you
produce
an
event
stream
that
the
signatures
are
correctly
produced
and
it's
up
to
you
to
enter
how
you
interpret
those
events
and
yeah?
I
know
that
there's
like
a
bunch
of
different
interesting
projects
in
like
the
ipfs
file
coin
space,
where
I
think
there's
like
this
project
starting
to
build
the
ipli.
B
I
think
it
was
called
like
the
linked
invocation.
I
don't
know
if
there's
anything
public
about
that,
but
that
was
talked
about
during
the
ipfs
thing.
B
I
know
block
science
is
building
this
cat
thing,
which
is
like
content,
addressable
transformers
and
there's
probably
like
a
bunch
of
different,
much
more
different
examples
that
I'm
not
aware
of,
and
it
would
be
cool
if
we
can
kind
of
plug
in
plug
this
thing
into
these
event,
streams
and
kind
of
like
sort
of
mix
and
match
these
sort
of
different
approaches
to
computation
and
yeah.
So
my
question
to
this
group
is:
is
there
like
an
appetite
to
standardize
around
some
of
these
things?
B
Like
do
we
want
to
standardize
how
we
do
an
event,
log
data
structure
and
how
we
verify
signatures,
how
we
do
the
id
authentication
authorization
and
how
we
do
like
timestamps
through
anchors,
I
think,
maybe
like
we
want
to
standardize
around
how
we
synchronize
these
event
streams
in
a
peer-to-peer
manner,
but
I
think,
like
only
standardizing
around
like
you
know,
a
data
structure
would
be
really
interesting,
because
then
we
can
at
least
have
a
standardized
way
of
validating
things
across
across
the
ecosystem.
B
So
yeah,
that's
that's
the
question
I
want
to
pose
to
to
this
community
and-
and
you
know,
if
there's
interest
we'd
be
happy
to
like-
maybe
make
a
first
proposal
of
like
what
this
sort
of
thing
could
look
like.
If
this
is
not
interesting,
then
you
know
yeah,
that's
essentially
it
from
from
my
end
happy
to
talk
about
this
or
answer
any
questions.
A
Well, Joel, for what it's worth, I know the Fission folks are not here today, but they're very actively pushing the DID standard forward as well, so I think other folks in the community will also have strong opinions. I'll take a couple of notes on the standards component that you raised, because I think that's a great idea. I love the idea of you guys building it out: you can be the first, and if other folks want to collaborate on that, all the better.
A
If
not
at
least
we
have
a
first
implementation,
and
I
thought
it
was
really
interesting
how
you
were
sharing
to
go
back
to
the
notes
here,
the
the
different
permutations
of
event,
streams,
and
do
you
have
any
indication
about
like
the
data
sizes
that
would
be
ideal
for
event
streams?
Are
they
limited
to
one
node
processing
the
data
streams,
or
is
it
even
large?
Big
data
sets
as
well.
B
Yeah, so currently, the way Ceramic is designed, the Ceramic network doesn't really care about how the event streams are replicated. It's up to your node to decide, "hey, I want to replicate this particular event stream."

B
So if you want to build a really large event stream that has huge chunks of data, that's fine. Or maybe you just want to build an event stream that has references to CIDs that you could retrieve off Filecoin, and you have the integrity stored in Ceramic; not the data integrity, that's given by the CID, but the integrity of the process, of how different pieces of data were processed and attested to over time.
C
Yeah, I'd like to plus-one creating the general standards, and yeah, we don't have everyone here, but to the extent that you can start getting those documents onto our pages for people to discuss, I think that would be amazing, and Wes is exactly right: be the first person to propose something.

C
This is a lowercase-s standard, I want to make that clear: it's just up to us to figure out how to produce it, and anyone who wants to use it can, and anyone who doesn't, doesn't have to. But the extent to which we can have things go across platforms is, I think, incredibly powerful, and certainly something that I think a lot of people would be excited about.
A
Very good. All right, Joel, thank you so much for the overview, appreciate that, that was tremendous. So, moving on to the second half of the session: David, I can hand it over to you, if you'd like to give us a brief intro of Bacalhau.
D
Okay, great, here we go. Cool, so hi folks, I will share my screen. The first challenge is sharing the correct screen, because sometimes I share the wrong screen and then I do the whole presentation and someone's just looking at my inbox. So hopefully you can see some slides. Can everyone see some slides?
D
Okay, great, cool. So hi everyone, I'm Luke. I'm here with Kai; we are the tech co-leads on the Bacalhau project, and we also work with David and Wes on this project too. Just by way of setting some context: we are working on a compute-over-data project, but we also have this kind of dual mandate, so we have two jobs.
D
The
the
teams
in
this
working
group
to
be
more
successful
by
building
building
blocks,
for
you
finding
ways
to
collaborate
and
build
common
libraries
or
common
standards
and
by
splitting
out
working
code
from
whatever
we're
building
that
can
be
shared
in
the
community,
and
I
wanted
to
share
a
little
bit
of
sort
of
history
here.
So
dave-
and
I
have
some
history
together.
D
We were both involved in the Kubernetes community, in this working group called SIG Cluster Lifecycle, and we built a reference implementation, which was kubeadm, for installing Kubernetes. But we also made sure that kubeadm could be vendored into other projects and was providing building blocks for other projects, and we had great success with this, because within a few weeks and months of kubeadm going out there, we had Kubespray, kops, minikube, kind...
D
Lots
of
the
other
kubernetes
installer
projects
started
adopting
the
building
blocks
that
that
we
put
out
there
and
therefore
they
were
able
to
spend
more
time
on
their
differentiated
features
and
became
more
successful
as
as
installer
projects.
So
that's
that's
basically
we're
kind
of
we're
here
to
do
that
again
in
the
in
the
cod
in
the
computer
over
data
web
3
space,
so
that's
kind
of
what
we're
hoping
to
achieve.
D
So
with
that.
I
want
to
talk
a
little
bit
about
what
we've
got
now
and
as
you
look
at
this,
if,
if
you're
working
on
a
project
in
this
space,
if
you're
here
in
the
room
now
or
if
you're
watching
the
recording
later
have
a
think
about
whether
there's
anything
in
this
that's
useful
to
you
and
if
there
is
then
please
ask
us
to
split
it
out
and
we'll
split
it
out
into
a
reusable
component
that
you
can
that
you
can
use
so
that
you
have
to
maintain
less
code.
D
So
we
have
this
peer-to-peer
compute
network
that
interacts
primarily
with
ipfs
supports
users
showing
up
using
a
cli
to
submit
jobs
that
can
either
be
docker
jobs,
so
they
can
bring
their
own
docker
image
or
they
can
just
bring
a
python
script
and
we'll
run
python
inside
wasm
for
them.
The
reason
for
that
is
that
we
are
going
down
a
path
of
determinism
which
is
easier
to
achieve
in
wasm
and
I'll
talk
about
that
more
in
in
a
minute.
D
The
determinism
is
in
order
to
enable
verifiability
which
enables
us
to
start
to
have
some
confidence
in
the
results
we
support
concurrency.
So
you
can
say
how
many
times
you
want
to
run
your
job,
and
so
you
can
say,
concurrency
equals
three
and
then
three
nodes
in
the
network.
We'll
pick
it
up.
You
can
also
just
recently
we
added
support
for
sharding,
so
you
can
say
I've
got
a
job
that
has
ten
thousand
files
in
it.
I
need
to
resize
ten
thousand
images.
D
I
want
to
do
that
in
a
batch
size
of
a
hundred,
and
you
just
set
you
just
mentioned
a
glob
pattern.
Dot
jpeg
batch
size
equals
100
and
your
command,
and
then
it
will
get
run
across
10
nodes
in
parallel
and
the
results
get
reassembled
automatically
we
support
reading
from
ipfs.
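Conceptually, sharding selects the inputs that match a glob pattern and splits them into fixed-size batches that different nodes can process in parallel. A sketch of just that partitioning step (this is the idea, not Bacalhau's actual scheduler):

```python
import fnmatch

def shard(files, pattern, batch_size):
    """Select inputs matching a glob pattern and split them into batches,
    each of which could be dispatched to a different node."""
    matched = sorted(f for f in files if fnmatch.fnmatch(f, pattern))
    return [matched[i:i + batch_size] for i in range(0, len(matched), batch_size)]

# Ten thousand images plus an unrelated file, as in the talk's example.
files = [f"img{i:05d}.jpeg" for i in range(10_000)] + ["notes.txt"]
batches = shard(files, "*.jpeg", 100)  # 100 batches of 100 images each
```

With 10,000 matching files and a batch size of 100, there are 100 independent units of work, so the wall-clock time drops roughly with the number of nodes that pick batches up.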
D
Obviously, we also support reading from HTTP, because there's an awful lot of data sets available over HTTP on the internet, so this can also act as a bridge, a way for people to get data into IPFS by reading it from HTTP, processing it, and writing the results to IPFS.

D
Right now we also write to IPFS, and we've got this thing scaling up to 750 nodes, with lots of performance work ongoing to make things fast at 750, 1k, 10k. We're working on scaling it up to tens of thousands of nodes operating on petabytes of data. So that's kind of where we are today.
D
I
can
show
you
a
quick
demo
if
you
please
pray
to
the
demo
gods.
We
just
got
this
working
a
few
hours
ago.
So
what
we
have
here
is
I'm
going
to
do
a
little
bit
of
work
locally.
So
imagine
I
am
working
on
processing
video
data
and
I've
got
some
video
files
on
my
local
machine
and
I've
also
got
a
great
deal
more
video
data
on
ipfs.
D
That
is
more
data
than
I
can
fit
on
my
local
machine,
and
so
here
we
have
the
local
data,
and
here
we
have
kind
of
you
have
to
use
your
imagination
a
little
bit,
but
imagine
that
there
was
tons
and
tons
and
tons
of
data
in
in
this
ipfsc
id.
That
was
more
than
I
could
than
I
could
fit
on
on
my
physical
machine.
D
So
I'm
going
to
look
at
the
video,
it's
just
some
nice
footage
of
looking
at
a
gothic
building
and
then
there's
some
other
footage
that
we're
gonna
that
we're
gonna
use
and
what
we
wanna
do
is
we
want
to
apply
funky,
matrix
style
overlay
on
the
top
of
this
video,
so
I'm
gonna
do
that
locally
by
running
a
docker
command,
and
this
is
gonna
use
ffmpeg
inside
a
docker
container,
and
this
is
just
operating
on
this
literally
this
local
file
that
I
have
on
my
computer
and
it's
it's
processing
it
and
outputting
the
processed
video
data
in
in
my
outputs
directory
that
I
have
here,
I
can
show
you
exactly
what
that
job
was.
D
If
I
scroll
up
enough,
so
we
did
a
docker
run,
we
mounted
the
inputs
directory
and
the
outputs
directory,
and
we
ran
this
video
resize
example
container
image
and
we
just
ran
a
shell
script
inside
it
and
pointed
it
to
the
inputs
directory
in
the
outputs
directory.
So
that's
how
you
run
things
in
docker,
normally,
and
so
it's
finished,
and
so
we
can
see
what
this
looks
like
and
there
we
have
it
sort
of
funky,
matrix
style
text,
that's
being
overlaid
over
the
image
over
the
video.
D
Now
we're
probably
not
gonna
win
any
awards
for
for
our
our
video
work,
but
this
will
kind
of
serve
to
to
demonstrate
the
point.
I
hope
so
I'm
now
going
to
export
this
cid
now
remember
this
cid
points
to
this
data
which,
for
the
sake
of
argument,
imagine
this
is
more
data
than
you
could
process
on
your
local
machine.
So
I'm
going
to
I'm
going
to
show
you
two
things.
D
The first thing is, I'm going to show you just running that same container image on the Bacalhau network, and I'm going to time it so you can see how long it takes. So instead of docker run, we're doing bacalhau docker run: we've made the docker run command in Bacalhau very similar to the one you get from Docker, so that people are familiar with it.

D
Oh yeah, just quickly, I want to show you what's actually happening on the nodes. This is three nodes on our production network.
D
One
of
those
nodes
has
picked
up
that
job
and
is
running
it
and
it's
finished
so
in
31
seconds
it
has
processed
those
those
three
files
so
yeah
just
to
show
you
the
command
in
a
bit
more
detail.
D
We're
re
we're
mounting
the
cid
as
an
input,
volume
and
back
liao
will
handle
reading
the
data
in
from
ipfs,
and
then
we
can
specify
you
get
c
two
cpus
for
gigabyte
memory,
and
then
we
can
say
please
wait
until
the
job
finishes,
which
is
why
it
actually
waited
this
command
blocked
until
the
command
finished.
And
then
you
can
run
bakulyao
get
on
the
job
id
and
we
will
see
that
we
can
see
the
results
of
that.
D
Excuse
me,
so
we
can
see
now
that
we've
got
processed
video
footage
from
the
baccala
network.
Okay,
so
that's
pretty
cool
and
you
can
also
see
if
there
was
any
errors
or
any
messages
on
standard
out.
Those
also
get
written
out
to
these
files
here.
D
So
I'm
going
to
clean
up
from
that
run
and
then
I'm
also
going
to
show
you
what
you
can
do
when
you
use
sharding
and
so
sharding
is
about
parallelizing
the
work
across
multiple
machines
on
the
network,
and
so
what
we're
doing
here-
and
actually
I
need
to
show
you
this
quite
quickly,
because
it's
so
fast-
is
that
the
work
got
spread
out
across
multiple
of
the
nodes
on
the
network.
You
can
see
this
middle
node
picked
up,
one
of
the
jobs
and
the
third
node
here
picked
up
two
of
the
jobs.
D
It's
a
bit
random
at
the
moment
which
nodes
pick
up,
which
jobs
but
and
you
can
see
we
did
the
work
10
seconds
faster
than
we
did
when
we
just
ran
it
one
after
the
other.
So
that's
pretty
cool.
We
can
slash
the
run
time
of
our
work
by
parallelizing
it
across
the
network
and
now,
if
we
do
baklyow
get
on
on
that
job
id,
then
you
can
see
this
time
we
had
three
shards.
D
We've
got
shard
zero
and
shard
two
and
shard
one
showing
up,
and
so
you
get
separate
exit
codes
and
standard
out
and
standard
error
from
all
of
them.
But
then
inside
volumes
you
get
the
same
data
reassembled
in
in
the
output
volume
directly,
so
yeah,
that's
the
demo,
that's
kind
of
what
we
have
now
yeah
I'll.
If
you've
got
questions,
please
hold
on
to
them.
D
I'll,
take
take
questions
at
the
end,
so
I'll,
just
kind
of
blast
through
our
road
map
and
talk
about
what
we're
gonna
do
next
and
and
like
I
said
so
as
you're
looking
through
the
road
map.
If
there
are
things
here
that
are
interesting,
that
you'd
like
us
to
split
out
into
separate
projects,
then
please
come
and
tell
us
and
because
we
we
want
to
do
that
to
make
the
whole
community
successful.
D
So
the
current
roadmap
looks
like
this.
I've
got
a
few
slides
here
for
each
year
half
so
I've
got
till
the
end
of
this
year,
first
half
of
next
year,
first
second
half
of
next
year
and
then
the
first
half
of
2024
and
notice
that
in
our
roadmap
we
have
these
explicit
goals.
D
Like
I
said
for
making
the
compute
over
data
working
group
successful,
so
we
want
to
find
partners
who
can
use
our
code
or
integrate
with
us
in
a
meaningful
way,
and
we
don't
know
how
much
engineering
effort
each
of
those
integration
efforts
will
take.
But
we
are
resourced
to
do
that.
D
We
are
also
working
on
making
users
successful
so
you'll
see
that
throughout
the
roadmap
is
that
we
have
targets
on
that,
and
then
you
can
also
see
the
kind
of
feature
work
and
that
we're
adding
so
there's
a
big
focus
at
the
moment
on
performance,
getting
performance
down
to
seconds
of
job
execution
in
thousand
node
networks.
We're
not
far
from
that
now.
D
Like I mentioned earlier with the deterministic execution mode in WASM, you will be able to start assuming that if multiple nodes came to the same result, then either they all did the work correctly or they're all lying. So being able to hash the outputs of deterministic jobs is step one in our approach to addressing byzantine fault tolerance. We're also looking to make users successful, and after that we are adding support for DAG execution.
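Step one of that byzantine-fault-tolerance approach, comparing output hashes of a deterministic job across nodes, can be sketched like this. The "nodes" are just local files standing in for real network participants, an assumption made purely for illustration:

```shell
#!/bin/sh
# Sketch of "hash the outputs of deterministic jobs": if every node's
# output hashes to the same digest, then (in Luke's framing) either all
# nodes did the work correctly or all are lying. Nodes are simulated
# here as local files.
set -eu
work=$(mktemp -d)
echo "deterministic result" > "$work/node-a.out"
echo "deterministic result" > "$work/node-b.out"
echo "deterministic result" > "$work/node-c.out"
# One digest per node; consensus means exactly one distinct digest.
digests=$(sha256sum "$work"/node-*.out | awk '{print $1}' | sort -u)
count=$(printf '%s\n' "$digests" | wc -l)
if [ "$count" -eq 1 ]; then
  verdict=consensus
else
  verdict=disagreement
fi
echo "$verdict"
```

Replacing one file's contents would flip the verdict to disagreement, which is the signal a verifier would act on.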
D
Ideally, we don't want to reinvent the wheel, so if we can integrate with something like Airflow, for example, then we might do that, but that's all TBD; we haven't designed it yet. And then, of course, as we go through this, we're interested in finding ways to factor out what we've done and use it to help other participants in the working group. Then we are upping our game on byzantine fault tolerance towards the end of the year, and we're looking at making more users successful as well.
D
Of
course,
in
january
next
year
we
might
tackle
non-determinism,
which
is
how
do
you
verify
so
we,
you
can
currently
run
non-deterministic
jobs
on
the
network,
but
obviously
you
can't
it's
hard,
it's
harder
to
verify
non-deterministic
jobs
than
it
is
deterministic
jobs.
So
we
we'll
we
built
a
prototype
earlier
on
in
the
project
of
how
you
could
verify
non-deterministic
jobs.
I
won't
dive
too
much
into
that
now.
D
But
if
you
have
questions,
please
ask
then
more
focus
on
work
on
working
groups,
successes
and
user
successes
and
then
we're
going
to
probably
around
march
next
year,
pick
up
prototyping
and
verification
protocol
around
battle
yell.
D
So
the
let
me
just
pull
up
some
notes
here.
The
the
verification
protocol
will
help
us
to
move
towards
being
able
to
build
systems
like
this
that
eventually
interact
with
incentives
and
so
the
verification
protocol.
For
example,
we
have
a
draft
of
right
now
that
we,
for
example,
can
have
nodes.
D
Punish
slash
those
nodes
in
some
way
and
then
that
prototyping
of
that
protocol,
which
we
have
a
draft
of
in
the
wiki,
will
allow
us
to
start
kind
of
testing
those
ideas
and
that
will
lead
towards
being
able
to
connect
the
network
to
a
smart
contract.
D
So
we
have
this
large
piece
of
work
in
april
through
june
next
year,
around
integrating
with
a
smart
contract,
and
one
of
the
ideas
is
that
our
entire
transport
and
scheduler
layer,
which
currently
is
just
a
custom
implementation
on
top
of
gossip
sub
on
live
p2p,
would
become
replaced
with,
for
example,
an
fvm
contract
and
so
that
fvm
contract
would
then
be
responsible
for
implementing
the
verification
protocol
and
then
around
that
time,
we're
looking
at
so
just
after.
We
start
work
on
the
smart
contract.
D
Some
notes
on
this
as
well.
Sorry,
I
should
have
had
these
already,
but
basically.
D
Let
me
see
sorry
the
yeah
here
it
is
so
so
the
formal
verification
we're
looking
at
tools
like
glow,
daphne
and
y3
as
ways
to
encode
the
behavior
of
the
verification
protocol
in
a
in
a
formal
system
that
you
we
that
we
can
mathematically,
prove
things
about,
and
the
goal
of
that
effort
is
to
make
it
it's
basically
to
eliminate
bugs
in
the
protocol
like
earlier
than
finding
them
because
we
get
hacked,
and
so
that
will
be
quite
a
big
effort,
but
we
believe
it
will
be
worthwhile
and
we
think
it's
it's
very
interesting.
D
I
think
there's.
So.
I've
worked
previously
on
formal
verification
of
some
network
algorithms,
where
we
actually
found
novel
attacks
against
established
protocols
by
by
using
formal
verification.
So
I
think
that's
an
area
that
is
super
interesting
in
the
space
honestly
so
excited
to
to
potentially
spin
up
an
effort
around
that,
and
then
I
see
basically
the
the
big
challenge
that
I
see
with
moving
to
a
smart
contract.
Implementation
of
the
of
the
transport
and
the
scheduler
layer
in
the
system
is
efficiency.
D
So
once
we
have
all
of
the
kind
of
scheduling
and
metadata
around
this
job
execution
happening
on
chain,
then
we
expect
that
will
initially
dramatically
reduce
the
performance
of
the
system,
and
so
we've
got
like
a
whole
half
a
year
dedicated
to
getting
that
back
up
to
scratch.
In
parallel
with
formal
verification,
all
the
while
making
users
successful
and
helping
split
out
work
that
we're
doing
and
then
come
october.
We
plan
to
build
a
plug-in
system.
D
We're
also,
then,
going
to
look
at
a
reputation
system
which
is
kind
of
using
the
outputs
from
the
smart
contract
and
the
verification
protocol
to
make
a
basically
a
published
public
dashboard
of
which
compute
nodes
are
are
reputable
and
which
ones
are
not
based
on
which
ones
have
been
deemed
to
to
be
publishing
good
results
and
which
ones
are
not,
and
then
for
the
following:
half
of
2024,
looking
at
incentive
models,
developer
experience
and
then
we've
also
heard
a
lot
of
demand
for
secrecy.
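The reputation dashboard Luke describes, scoring nodes by how often their published results matched the verified outcome, can be sketched as a tally over a toy result log. The "node job digest" log format below is invented purely for illustration:

```shell
#!/bin/sh
# Toy reputation tally: given a log of "node job digest" lines, score
# each node by how often its published digest matched the majority
# digest for that job. This stands in for outputs from a verification
# protocol; the format is hypothetical.
set -eu
log=$(mktemp)
cat > "$log" <<'EOF'
nodeA job1 aaa
nodeB job1 aaa
nodeC job1 bbb
nodeA job2 ccc
nodeB job2 ccc
nodeC job2 ccc
EOF
scores=$(awk '
  { votes[$2 " " $3]++; line[NR] = $0 }
  END {
    # find the majority digest per job
    for (k in votes) {
      split(k, p, " ")
      if (votes[k] > best[p[1]]) { best[p[1]] = votes[k]; win[p[1]] = p[2] }
    }
    # per-node agreement score: matches / total submissions
    for (i = 1; i <= NR; i++) {
      split(line[i], f, " ")
      total[f[1]]++
      if (win[f[2]] == f[3]) good[f[1]]++
    }
    for (n in total) printf "%s %d/%d\n", n, good[n], total[n]
  }' "$log" | sort)
echo "$scores"
```

Here nodeC disagrees with the majority on job1, so its score drops; a dashboard would surface exactly that kind of ratio per node.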
D
So we're really interested in talking to other working group participants who are already addressing secrecy, and potentially partnering with them, because frankly we see that as a very difficult problem to solve. Then there's more user success: scaling up the number of users, and also the number of working group partners building on the foundations that we're laying, hopefully.
D
That's dealing with an arbitrary API request, because the profile of that execution is going to be different depending on what API requests come in, so there's a whole interesting area of work there. But I'll stop to make sure that we have time for Q&A and discussion here. I'd love to hear from folks on the call, or in the recording afterwards.
D
Is there anything on the roadmap that people would like to collaborate on? We'd love to work with other teams. And is there anything you think we should work on that we don't have on our roadmap today? Because we're here to serve the working group.
D
I'd love to hear answers to those questions, or any other questions that people have.
A
Well, I'll give everybody else a minute to chime in if they have questions. I will also add, Luke, for these questions and also the ones that Joel raised: we are going to be launching a Discourse server soon for the community. At the very least, even for folks that can't join live, we'll mark these topics as our first issues for discussion purposes and try to funnel folks there as well. But I'll just pause in case anybody else has questions.
A
Okay, all right. Well, we'll have the content up on YouTube here shortly so other folks can view it; a lot of folks weren't able to join the call live today. But Luke, Joel, I'll just pause if you guys have anything else.
D
Yeah, if Joel is still on the call, I wanted to say I really enjoyed your presentation, thank you. As you can see from what we're working on, the identity standard stuff that you mentioned is not directly relevant to our goals, but it sounds like a very useful thing to have. Maybe you have some thoughts on whether there are intersections between our work that could be helpful.
B
Yeah, I mean, the stuff you presented on your roadmap is very wide, with a lot of open problems, so I can't really speak to exactly how it would fit in, but we basically use the DID standard.
B
So
I
think
like
if
you
ever
have
like
a
thing
that
needs
more
than
just
like.
Okay,
I
have
a
computer
job
that
I
run
and
there's
a
signature
over
it
and
I
put
it
on
the
blockchain.
Then
that
might
be
interesting
or
yeah.
I
guess
like
so
I
guess
the
question
back
to
you
is
like,
as
you
imagine,
back
layout
right
now.
Is
it
like
a
fully
kind
of
just
like
incentivized
compute
network,
or
do
you
imagine
like,
as
people
build
different
like
compute
pipelines?
B
On
top
of
it,
you
actually
have
some
people
that
run
like
a
centralized
services
puts
inputs
output
in
there
and
then
maybe
there's
part
of
the
pipeline.
That's
like
more
distributed
and
more
verified.
B
D
I think if we're able to help develop the building blocks for, say, 10 different projects to be successful, then that will be deemed a success. At the same time, we're also going to be guided by what our users are asking for, and what the storage providers we want to integrate with are asking for, and things like that. So not a direct answer, I'm afraid.
B
No,
that
makes
sense
and,
like
I'm
interested
to
see,
like
you
know,
from
from
a
very,
very
abstract
perspective
like
how
can
we,
because
different
people
are
building
different
story
like
you're
building
one
computer
project?
There
are
other
people
building
like
different
sorts
of
compute
projects.
I
think
the
the
thing
that
block
science
is
doing
with
like
having
just
what
they
call
cat
a
content
adjustable
transformers.
B
And
like
the
ability
to
take
these
different
sorts
of
compute
systems
and
like
plug
them
together,
because
they
might
be
good
at
different
things,
yeah
and
that's
why
I
think
it's
like,
like
wondering:
can
we
standardize
around
some
way
where
we
like?
This
is
how
we
shuffle
data
between
the
systems.
D
That
would
be
very
interesting.
I
think
that's
a
thread
that
we
should
pull
on
and
work
on
together
and
also
on
the
did
thing.
I
think
it
might
be
interesting
to
see
how
whether
identity
for
both
users,
submitting
the
jobs
and
also
for
the
compute
providers,
is
useful
in
that
in
that
world.
Sorry
go
ahead,
kai.
E
Well, I was just going to say, when we get to the DAG story, that's an opportunity for Bacalhau to be one stage of a DAG. So when we start to think about what a DAG is, it's about bringing various systems together, because if I, say, consume an input from IPFS and write the output to IPFS, and then I tell another system, "hey, your input is here, I just wrote it,"
E
Then
we
start
to
have
like
is
in
the
the
the
pipeline
between
these
various
systems
can
be
the
storage
of
the
output
and
then
reading
from
that
input
and
there's
various
ways
that
we
can
integrate
like
that
and
yeah.
I
think
it's
a
it's
a
compelling
picture
to
say:
there's
lots
of
different
compute
projects
out
there.
Each
would
be
good
at
a
various
stage
of
the
pipeline
is
a
very
strong
statement.
I
like
it.
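Kai's "your input is here, I just wrote it" hand-off can be sketched with a local directory standing in for IPFS, and sha256 digests standing in for real CIDs. Everything below is a simulation under those assumptions, not an actual IPFS or Bacalhau integration:

```shell
#!/bin/sh
# Sketch of chaining two compute stages through content-addressed
# storage: a local directory plays the role of IPFS, and sha256
# digests play the role of CIDs. The two stages communicate only
# through the store, never directly.
set -eu
store=$(mktemp -d)

put() {  # store a file, print its content address
  addr=$(sha256sum "$1" | awk '{print $1}')
  cp "$1" "$store/$addr"
  echo "$addr"
}
get() {  # fetch a file by content address
  cat "$store/$1"
}

# Stage 1: one compute system writes its output to the store.
tmp=$(mktemp)
echo "hello from stage one" > "$tmp"
cid=$(put "$tmp")

# Stage 2: a different system is handed only the address, reads the
# data, transforms it, and writes its own output back to the store.
tmp2=$(mktemp)
get "$cid" | tr 'a-z' 'A-Z' > "$tmp2"
cid2=$(put "$tmp2")
get "$cid2"
```

Because each stage only ever sees an address, any system that can read and write the shared store can slot into the pipeline, which is the standardization point raised above.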
B
So I don't view it as if we're building this as an identity piece in the event log. You can use it for identity, but that's not what it is at its core.
C
Okay, I know we are out of time, and before everyone departs I just wanted to say we will be circulating... please do mark it down on your calendar. I'm sorry.
C
Sorry to cut off the conversation when we're just about out of time, but I do want to cover this one item: November 2nd and 3rd in Lisbon is going to be the Compute Over Data Summit. I will be sending something around; we're trying to build the schedule right now. It is supposed to be collaborative, and I would love to have lots of people talk over the course of the day.
C
Talk
about
your
projects,
talk
about
things
that
are
interesting
to
you
and
then
work
on
some
tracks,
hallway
tracks
for
or
not
hallway,
but
unconference
or
other
tracks,
where
we
can
collaborate
on
particular
items
like
specs,
and
things
like
that.
C
So
please
keep
keep
your
eye
out
for
that
and
do
jump
in
the
compute
over
data
working
group
slack
to
continue
the
conversations
I
just
don't
want
any.
I
mean
we
can
continue
to
talk
now
to
to
be
clear,
but
I
just
didn't
want
to
miss
out
on
the
opportunity
for
everyone
to
collaborate.