From YouTube: Compute-over-Data Networks: Discovery and Composability
Description
Push-based approach with Aqua
Dmitry Kurinsky, Fluence Labs CTO
Compute Over Data Summit, Nov 2, 2022
Original recording: https://youtu.be/WqquUQDgHj0
My name is Dmitry. I'm CTO and co-founder of Fluence Labs, and I just want to share some raw thoughts about compute-over-data networks, discovery, and composability, and how Fluence can help: what our approach to this topic is. I decided to go straight from the fundamentals. I will speak about compute-over-data networks: what it means to have a compute-over-data network and why we need one, what discoverability and composability mean, and what compute-over-data networks, discovery, and composability mean together. And finally, when we have some understanding of the problem, I will tell what we at Fluence do to help with it.
So, first of all, data and compute. Data is things known or assumed as facts, making the basis of reasoning or calculation. To compute, in my opinion, means to shuffle data in a certain way for a purpose. So data is usually needed in a different form than the one it appears in: it appears in some raw form, but is needed in another. And what does this other form mean? It means aggregated, retrieved, analyzed, filtered, augmented, indexed, searched, and so on.
So we need data in many, many steps and forms, and compute actually accompanies data all the way; the same goes for data. If we want to capture a fact, it actually means that we can yield this fact without providing anything before it: we have some way, represented as an arrow here, to get some data. If we transform data, it means that we provide data, we have an arrow, some compute, and we get other data.
If we have an intermediary state that is held between different invocations of compute, then we have this kind of arrow, like a state monad, where we have some internal state and an input, and we produce a new state and an output. This state change may help with augmenting data; it may mean storing data and providing nothing as the result, and so on. And finally, we need to finalize the life cycle of the data.
We need to consume it, for example to show it, and we have some kind of sink for the data. It means that you provide the data as an argument and get nothing inside the system out of it, but you have some effects: for example, something shown on the display. So there's no way to compute without data; compute without data makes no sense, absolutely no sense. You cannot see anything. If compute without data seems to make sense, it means that there is data involved after all.
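The arrow shapes described above can be sketched as ordinary function signatures; a minimal Python illustration (the names and types are mine, not from the talk):

```python
from typing import Callable, Optional, Tuple

Data = bytes  # stand-in for any piece of data

# Capture: yield a fact from nothing, e.g. a sensor reading.
capture: Callable[[], Data] = lambda: b"fact"

# Transform: data in, data out, e.g. filtering or aggregation.
def transform(d: Data) -> Data:
    return d.upper()

# Stateful compute: (state, input) -> (new state, output),
# the "state monad" shape: may store data and emit nothing.
def stateful(state: int, d: Data) -> Tuple[int, Optional[Data]]:
    return state + len(d), None

# Consume (sink): data in, nothing back, only an external effect.
def consume(d: Data) -> None:
    print(d.decode())
```

The four shapes cover the whole life cycle: capture, transform, hold state, consume.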
Actually, it was consumed, or exported, or something happened. And data without compute is a lost fact: it happened, but it isn't even stored, so it's just lost. Some more arrows and letters: we have some predefined computes. Providing the data and getting a CID is also a compute job, and it's not always trivial to compute a CID, and it's not always easy to understand that this file corresponds to this CID and not to something else. And sometimes we offload this compute, even this compute, from the client to the server: for example, if we stream data, or if the data is too big and we just upload it and then get back a CID. Then we have this arrow inversed, and that's not quite compute.
If we go from CID to data, it means much more than compute, because we need to discover it, and a lot more things are involved here. And we can think about user-defined compute, which is an arrow without data yet, but ready to get it. Luckily, we can represent it as data as well: a WebAssembly binary, or a Docker image, or any code.
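As a toy illustration of "getting a CID is also a compute job", here is a simplified content-addressing sketch in Python. Real IPFS CIDs additionally involve chunking, multihash prefixes, and base encoding, so this is an analogy, not the actual algorithm:

```python
import hashlib
from typing import Dict, Optional

def content_address(data: bytes) -> str:
    # The address is a function of the bytes alone.
    return hashlib.sha256(data).hexdigest()

store: Dict[str, bytes] = {}

def upload(data: bytes) -> str:
    # The "inversed arrow": send the data, get the address back.
    cid = content_address(data)
    store[cid] = data
    return cid

def fetch(cid: str) -> Optional[bytes]:
    # Going from CID to data needs discovery; here it is just a dict lookup.
    return store.get(cid)
```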
Both cases make sense, and fundamentally it makes sense because it's just more efficient: the closer the compute is to the data, probably the more efficient the computation is, and it brings more privacy, more security, and overall it's fine. Then a few words about the network. It's a concept which is orthogonal to compute and data, and I wish we could just skip it,
so that we would just have to use physics. But if these resources have something important in common, if they share some attribute, then we can organize a network out of them.
So all of this matters and means that we have a lot, a lot of networks. We have room for a lot of networks, which could be very big, like protocols, but which also could be very small: for example, all the providers of a particular file, for a given CID on the IPFS network, effectively also form a kind of network by this attribute. And basically, a network means assurance of connectivity and/or agreements over data within a set of peers.
Now let's speak about compute-over-data networks, also with arrows and letters. If you think about IPFS as a network as a whole, then every IPFS deployment constitutes a compute-over-data network, and compute here is, at least, getting a CID or getting from a CID to files. The notation could be something like: we can yield a function from CID to maybe-data, if it exists,
if we can find it: maybe data. We can think about every IPFS node as a single-peer cache network, or any subset of these nodes as a cache network, so that we provide a peer ID and get this function. That works with very different latency, because we look up locally and use IPFS as a hot cache
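The "IPFS as a hot cache" idea can be sketched as a lookup that tries the local cache before falling back to network-wide content discovery; a hypothetical sketch, with the slow path stubbed out:

```python
from typing import Dict, Optional

class CachePeer:
    """A peer used as a hot cache in front of network-wide discovery."""

    def __init__(self, pinned: Dict[str, bytes]):
        self.pinned = pinned  # content already present on this peer

    def get(self, cid: str) -> Optional[bytes]:
        # Local lookup first: very different latency than a DHT walk.
        if cid in self.pinned:
            return self.pinned[cid]
        return self._discover_on_network(cid)

    def _discover_on_network(self, cid: str) -> Optional[bytes]:
        # Slow path: network content discovery (stubbed out here).
        return None
```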
instead of using the network for content discovery. It's very different for compute: we should have many compute-over-data networks, for different geographies (for example, if you want them locally), for types of compute, or for different ways to compose this compute. But fundamentally it means that somehow we can get access to the peers that can take our compute in some form of abstract arrow: WebAssembly, for example, or Docker, or whatever.
And after that, we will have the ability to execute these computations on data, so the notation becomes longer and longer. And also we can think that a single device, even one device, or a small set of user devices that belong to one user, for example, also constitutes a single-device compute-over-data network that, for example, provides a way to capture effects from, I don't know, measurements, or click streams, or something like that.
Then comes discovery. Discovery means that we need to get some capabilities out of thin air. Previously we had this "from nothing we get some capability", and that's discovery. And we have a lot of different domains that we want to discover for and in. For example, different types of data sources: we want to discover data sources, be it devices, access to a chain,
access to sensors, to external events, to randomness. These are different data sources, and we need to find a way to get this data. Or a type of data store: cold storage like Filecoin, hot storage, a blockchain as storage, whatever. Or a particular data set: we need to find this particular data set. Or a kind of compute capacity: we need GPU, we need whatever, we need a ZK kind of compute so that we get certain proofs. And does this compute fit the certain data that we want to provide?
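One way to picture discovery across all these domains is matching required attributes against what peers advertise; a toy matcher, with made-up attribute names:

```python
from typing import Dict, FrozenSet, List

# Each peer advertises a set of capability attributes (illustrative names).
peers: Dict[str, FrozenSet[str]] = {
    "peer-a": frozenset({"gpu", "region:eu"}),
    "peer-b": frozenset({"zk", "region:us"}),
    "peer-c": frozenset({"gpu", "zk", "region:eu"}),
}

def discover(required: FrozenSet[str]) -> List[str]:
    # A peer qualifies when it satisfies every required attribute.
    return sorted(p for p, attrs in peers.items() if required <= attrs)
```

The same shape works whether the attribute is "has this data set", "offers GPU", or "can produce ZK proofs"; only the attribute vocabulary changes.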
Is it possible to connect a data source to compute capacity? Do we need to prepare the data before we do compute? And for the efficiency of everything above, we need to employ very different discovery algorithms. So there must be some solution on top of that, to think about all those problems in some layer that just simplifies them, or at least lets us speak about them: to have a common language to speak about all these problems while they seem different. And in Web 2 we have a solution.
So when we just do Web 2, we have a lot of easy ways. The easiest way is to point to the local capabilities: localhost, the file system, the local CPU, or something that you have in the web browser; just call a function in JavaScript or write a variable. That's the access to data and to compute, very simple. Or, for the remote capabilities,
usually they just have an endpoint, something.xyz, some host, Infura, I don't know, that goes through DNS for discovery, and we have this discovery layer hidden in a fundamentally centralized way. But if we get to the endpoint that fulfills our needs, we're just happy, so that's absolutely fine. Web 2 is very optimized for that, and we may discover compute-over-data networks the same way, with gateways, and many people are happy doing just that with Infura or something like that.
So let's switch to composability: what to compose and why. What we want is to utilize the same network or a sub-network (it could be a small network), or the same compute capabilities, or the same data, wherever they are, as building blocks for something more complex, to deliver value to the end user.
One example is data processing pipelines. You have some device, something that produces the facts that we want to capture and analyze. It could be a real device like a phone, but it could be access to a chain; it could be a source of time, of randomness, other triggers.
Then we have the edge, like a relay node in libp2p, for example, or the closest cache, where we could do some filtering and pre-processing. And it means that we want to do it very close to the device, just for optimization: because we might get an insight very fast, or we can filter out unnecessary data very fast, or we can do a small aggregate and hold back further processing until we have collected some data, so it's batching. After that we could have some real-time processing close to the edge.
Probably we want to be in the same region; we want good latency to get some real-time insights, or to store some data in read models, to have some databases at the application level. And finally, we store it, probably into Filecoin, into the cloud: we accumulate a lot of data and we batch-process it, because we have a lot of data available, and this is a very, very different domain.
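The device, edge, fog, cloud pipeline can be sketched as staged functions; in reality each arrow would cross the network, but the shape of the flow is the same (a simplified sketch, with invented stage names):

```python
from typing import Iterable, List

def device() -> List[int]:
    # Source of facts: a click stream, sensor readings, chain events.
    return [3, -1, 7, 0, 5]

def edge_filter(events: Iterable[int]) -> List[int]:
    # Close to the device: drop unnecessary data as early as possible.
    return [e for e in events if e > 0]

def fog_aggregate(events: List[int]) -> int:
    # Same region, good latency: a real-time aggregate.
    return sum(events)

archive: List[int] = []

def cloud_store(batch: int) -> List[int]:
    # Cold storage: accumulate batches for later batch processing.
    archive.append(batch)
    return archive

result = cloud_store(fog_aggregate(edge_filter(device())))
```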
So the question is: if you want this, how do you express this pipeline? What is the way to speak about this pipeline, to approach this pipeline, how to code it? Especially given that we have a lot of discovery processes here and different protocols, because probably the edge nodes are a sub-network that is capable of doing one thing, while the cloud is a very different set of peers: different hardware, different capabilities, different connectivity, different latency.
And where should this data pipeline be expressed? Say, for example, we have just these four levels: who owns this pipeline from one level to another, who writes this code, who deploys it, and where to deploy it? Is there any place to deploy it? And, yeah, along the way we have the user's data; probably we get some insights, we make some decisions,
we should transfer some permissions. So there are a lot of questions about security that were ignored, or exploited, in the Web 2 world, but we want to do better. So, for compute-over-data networks, discovery, and composability, if we take all the words together: what we want is to utilize many networks with different kinds of resources and attributes. We want to point to these resources and have a single flow of processing; we want to take resources by geography, by latency, by different ways to establish security:
something easy, some consensus, some trust; different kinds of access to data: it could be real-time, it could be a chain, whatever. We have different compute capabilities and want to point to them. And we have a very limited number of peers actually connected to the end user, to the device, and that's the question: is it connected or not?
So a coordinator to organize, to orchestrate, this workflow seems very inefficient, because if we think about just these four layers (there could be much more, there could be less), then there is no good dedicated place or location for it, either a geographical location or a protocol-wise location. What network should it be located in? Or should it be a Web 2-like endpoint? There's just no efficient place for it.
In Web 2 it's not the case, because you can have everything in one place: you can have one endpoint that actually leads to the workflow being executed inside the cloud. But in Web3, probably not. And also, if you have a coordinator, it's going to be too powerful in terms of its ability to censor the facts, or to change them, or to move them to some other storage that the user hasn't authorized to work with this data, and so on.
All of this could be managed inside a single network. So if we say that compute over data is actually about creating just one network that is a one-stop shop that does everything, then probably it would be enough to have some protocol inside this network to discover the right place to make the call, and so on. But in real life, even for Web 2, one network is not enough: even in Web 2 we have different solutions for the edge, and different regions, and different stores, and so on.
So probably what we want to have, finally, is a flow of service composition that is permissionless, ready to be composed and integrated with different protocols, and we want this workflow to look somewhat like this.
So we have certain steps and we can describe the workflow, be it some kind of controller for a user interface, or really data processing, or anything like that. We can abstract it into something that consists of very simple steps. For example: I want to get data; then I want to discover the edge peers which are ready to serve me (ideally, they should be prepared);
then I run compute on the edge; I wait for the proper results, given the consensus or security capabilities of these edge peers, and proofs. After that, I discover some kind of fog, some intermediary computation for the real time; I move the control flow there, run compute on the fog, wait for proper results and proofs. And after that, I discover storage. Why should I discover it? Because it could depend on the data that I got from the fog compute.
For example, I could determine the right shard, which points to a sub-network, a small network, only during this computation. Probably I cannot have everything in place from the very beginning, but I can discover every next step. So it goes like this: discover, run, wait; discover, run, wait; discover and write; and finally done. But if these peers are not prepared to serve me, if they're not warmed up, then I want to have a pipeline like this: I try to discover the edge; if I cannot, I try latency-based discovery to find the closest peers and deposit the code on them, so as to prepare the edge nodes.
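The discover, run, wait loop with warm-up failover might be sketched like this; `discover_prepared`, `latency_based_discovery`, and `deposit_code` are hypothetical names standing in for the mechanisms described above:

```python
from typing import Dict, List, Optional

def discover_prepared(layer: str, warm: Dict[str, List[str]]) -> Optional[List[str]]:
    # Peers of this layer that are already warmed up, if any.
    return warm.get(layer)

def latency_based_discovery(layer: str) -> List[str]:
    # Fallback: find the closest peers by latency (stubbed out).
    return [f"{layer}-peer-1", f"{layer}-peer-2"]

def deposit_code(found: List[str]) -> List[str]:
    # Push our code onto the peers so the next step is prepared for us.
    return found

def step(layer: str, warm: Dict[str, List[str]]) -> List[str]:
    peers = discover_prepared(layer, warm)
    if peers is None:
        peers = deposit_code(latency_based_discovery(layer))
    # ...run compute on `peers`, then wait for results and proofs.
    return peers
```

Each step decides about the next one and pushes its requirements forward, rather than relying on everything being in place from the very beginning.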
The same for the fog: I want to find the capable peers, and if they are not ready, if they are not in cache, deploy the content on them. So on every step I do some computation, I make a decision about the next step, and I push my requirements to have the next step prepared for me.
So, ideally, the pipeline should look like this, and it should not depend on any particular network. So that's the background for the Fluence approach. With all of this in mind, we created a set of tools that makes it possible, like, today. First of all, we have Aqua. Aqua is the way to express these workflows, which are decomposed into these discovery,
compute, discovery, compute steps, with maybe some provisioning (deploy something or not, fail over or not, and so on), and you can compose distributed workflows, which are suitable for open networks, from these basic steps. It is push-based, so the next peers are resolved on the fly, and you can take the actual real-time connectivity into account so that you can optimize the way you go. You don't say "I want";
you say "I'm going here, because it's meaningful for me". And a push-based approach means that, while still having an open network, you involve the minimal number of peers possible. And the minimal number of peers depends heavily on the discovery mechanics: if you have a log n lookup, then you touch log n peers; if you have several logarithmic discovery processes, like in Kademlia, then you have k log n or something like that.
But if you can reduce the scope of the lookup, for example by not using the whole Kademlia layer or the whole network (pretending that all the peers are the same), but instead, using some attribute, selecting a subset of the big network and organizing some kind of Kademlia on top of it, then it's much, much more performant. And if you have a hot cache, it's even better.
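The effect of reducing the lookup scope is easy to see with numbers: a Kademlia-style lookup contacts on the order of log2(n) peers, so shrinking the searched set shrinks the hop count. A back-of-the-envelope sketch:

```python
import math

def lookup_hops(network_size: int) -> int:
    # A Kademlia-style lookup contacts on the order of log2(n) peers.
    return math.ceil(math.log2(network_size))

# Searching the whole network vs. a Kademlia built over an
# attribute-selected subset of it:
whole_network = lookup_hops(1_000_000)   # about 20 hops
attribute_subset = lookup_hops(1_000)    # about 10 hops
```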
But fundamentally, it depends on how exactly you write the code, and you have full control. It works on top of libp2p, but it doesn't require you to recompile, rebuild, or redeploy the network or the networks. So if some protocol is ready to serve Aqua, then it's just ready, and you can use it for composition without any additional, like, communication with the team, I don't know, protocol forking, and so on. And you can have different workflows.
So it's open for integration with all the other protocols. And as the result of execution, as a side effect, it provides an audit log showing that only what you expected to happen has happened, and you can use this log; we are going to use this log for the incentives layer.
With Aqua, you can express these smaller networks on top of the compute-over-data network protocols. A sub-network is a network formed around a certain resource and/or attribute: for example, data for a user; or peers of a protocol close to region G (it's an attribute, or a set of attributes); or peers capable of accessing some data, like blockchain B: peers which run Fluence and at the same time run, I don't know, Filecoin, for example, so that you can access the new blocks. And with Aqua you can do sub-network formation, using just Aqua libraries to interconnect, sync, bring code to peers, deploy, provision, and so on.
Because all these network concepts of peering, connectivity, and agreement over data could also be decomposed into single workflows. And if you can have a set of workflows to run on the network, without the end user interacting with these workflows, but instead with the sub-network, then effectively you have the sub-network, and you can have sub-networks discover each other and collaborate as well, and the composition of sub-networks, which means pushing data and control flow to the next and next sub-networks, meaning peers of these sub-networks, because there is nothing magical:
it's just a newline character; you get from one line of the code to another, and it could mean moving compute and data to the next peer. And finally, Aqua is part of Fluence, and you can think about Fluence as the workflow tech for compute-over-data networks, discovery, and composability; yeah, CoD and DnC. So Fluence accompanies protocols to help them organize push-based and purpose-driven workflows.
So we are open for different protocols, as a sidecar or as a libp2p protocol. With Aqua, the language, you can describe the workflow, or provisioning, and so on. And we have Marine: Marine is a WebAssembly runtime to run compute, or, quite likely, to lift another API or protocol running on the same peer to Aqua.