From YouTube: CODwg Live Jam Session 03
Description
Join @DeveloperAlly & @silentspring30 to jam with Simon Worthington (@51M0NW) from Bacalhau / Expanso / Register Dynamics and Luke Marsden from Lilypad on federated learning, compute use cases and engineering challenges
A
About yourself, I know we've worked together before. Simon was actually instrumental in building out Lilypad version zero which, for those of you who might not know, was a bridge between the Filecoin Virtual Machine and the Bacalhau public network. So it allowed you to call Bacalhau compute jobs from your FVM smart contracts, and you can still do that. He was instrumental in helping me build that bridge, and I think I learned a lot about Ethereum at the time. But what have you been up to otherwise?
C
Besides that? I was laughing so hard at your joke. So, what we've been working on since then: we've been taking Bacalhau on from the 1.0 release that we did in May, and starting to add lots of new features that lots of people have been asking for, which I could talk about a bit more if you like. Some of them are things like custom executors, so you don't have to just build a Docker image.
C
You don't have to use Docker to run a job; you can actually run a job that's continuous and streaming and lives for a very long time. So yeah, we're working on lots of those features, bringing Bacalhau out to more people, all that good stuff, basically.
A
Awesome. I think one of the reasons we asked you to join us here today as well is, I know you're leading that engineering effort on Bacalhau now. One of the things that was super interesting to me, and to Kartika as well, was your talk from the Compute Over Data Summit, where you released Bacalhau version 1.0 and talked all about federated learning. I think that was something you really enjoyed as well, Kartika. Could you...?
D
Sure, yeah, it was super interesting. So Simon will talk through a couple of the latest developments, but I think later on we'll talk a bit about a case study. What I love about these federated learning blocks is what they could mean for medical data, and how we can really be able to answer some of the questions.
D
How can I basically ensure that my data is safe and not being shared around, while, in terms of sharing, still being able to give access to people I might not trust that much? That's one area I thought was really interesting. And then consider that, I think, 68% of enterprise data is not really being used, or access to it can't be given.
D
That's one of the things I'd quite like to explore a bit more. And then in terms of data governance as well: almost 100% of data is governed, under GDPR and all of these other regulations you mentioned in your talk. So yeah, it'd be super interesting to learn a little bit more about that. And we chatted a bit about the medical case study, in terms of breast cancer, or medical data generally, and how we could set this up.
C
Yeah, sure, happy to talk a little bit about Bacalhau. Bacalhau is a compute over data platform. It's open source; it's on GitHub, under the bacalhau-project organization. What it aims to do is be a kind of commercial off-the-shelf, commodity platform for building compute over data networks of all sorts of different types.
C
We have a kind of global public network which you can join, and anyone can submit jobs to that network and have them run against volunteer compute nodes out there in the world.
C
So that's the public network. But maybe more importantly, what you can do is download the software and run your own compute network, with your own hardware and your own environment. What that allows you to do is take a general Bacalhau architecture and apply it in lots of different interesting ways, and computing over federated data is exactly one of those ways.
C
Since the 1.0 release we've been working on a couple of things. I talked about the executors; I'll say a bit more about them when we come to the actual compute over federated data bit. But we've also been working on long-running jobs: allowing a job to exist for a long time, you know, weeks, months, and continually receive new input, because lots of processes are not "run it once and then forget about it".
C
They're more like: it's going to receive new input over time, and you need to react to that new input with some sort of operation. As an example...
C
A key one might be: a user presses a button and says something into a microphone, and then you need to do speech-to-text and then apply it to a large language model. But you need to do that every single time the user presses that microphone button. So it's less about a one-off job; it's more about a job that will get run multiple times under the same context, and often in quite constrained environments.
C
One of the places we think a lot of this stuff is going to get applied is actually on home devices, so on your smart fridge or your smart speaker, and it's also about bringing compute over data jobs to those places as well; they're reasonably constrained.
C
I have also talked about running jobs on satellites in the past, which I actually think is something that is happening. I think we have a partner who's going to run some Bacalhau nodes on their satellites, but I can't remember what I shouldn't say, so I'll stop there.
A
So yeah, I could definitely see Bacalhau... I mean, it can even be run on IoT devices as well, which is another really cool feature about it. Yeah.
C
So you're going to have this big spike and then nothing; how does the system handle that? We've been putting in persistent storage and load-spike handling tech, to make sure that if you submit 100,000 jobs, you get 100,000 results, and they don't just disappear into the bin or anything like that.
C
So yeah, increasing the quality of life, I suppose, has been a lot of what we're working on, and also experimenting with new ways to submit jobs. At the moment Bacalhau is very much designed around the public network, so it was: here's a job, who's going to run it? You might give it to one person to run and they'll come back, or maybe you need three people to run it.
C
They then come back. But what we also see, especially in these edge scenarios, is: I've got a whole fleet of, you know, satellites, smart fridges, whatever, and I want to run it on all of them. So one of the things we've been doing is making it possible to target an entire network, or an entire subset of a network. It's not just running the job, but actually running it everywhere that it needs to run, which brings a different dimension to what's possible as well.
C
I know, that's exactly the sort of thing. Maybe you'd have some LoRaWAN sensors connected to it, or maybe they would be the compute, but it's absolutely designed to work over those sorts of very low-bandwidth, interesting connections.
B
Gotta love your QA testers.
A
I've been working more on Lilypad, so I haven't had as much to do with Bacalhau as I used to, so this is really great for me as well, learning all the new things happening. Is the public network still running, that's my question, and how robust is it these days?
C
Yes, it's definitely still running. We've still got a bunch of compute nodes that are run by the Bacalhau team, and we've also still got a bunch of volunteer nodes being contributed by people, some of whom we don't know, but that's the nature of having a network like this.
C
The public network is deliberately limited, because we don't know who's going to be submitting jobs to it; it's not just public compute, it's also public users. So there are lots of things you can't do on the public network that you can do in a private setting: jobs can't access the internet, and you can't have a job that lasts longer than about an hour.
C
You can't use too many resources at once, and you can't access local file systems or anything like that. It's really intended as a way for people to check it out, as a demo, ultimately, and see how it works. We're happy to also support people running jobs for the public good on there, absolutely no problem, but it's intentionally limited for security reasons, and to protect the people who are providing their compute.
C
So where people have been coming up against those limits, we've been saying to them: well, it's also really easy to run a private network, and here's how you do it. I'm trying to make it as easy as possible to set up one of those things.
A
That's awesome. So I imagine there's a pretty good how-to guide in the docs now on how to set up your own private network?
C
Yeah, definitely keen for people to do that, and for us to make it even easier. We'd really like to get to a place where it's kind of one-click setup, or maybe not one click, but one click per compute node, so you just run it on the machines and they all automatically connect together. That's the end goal for how easy it should be, I think.
D
So I'm going to jump into the case study a little bit, just to bring the technology a little bit more to life, and also for the non-technical folks I thought we'd go through the scenario. Let's say I want a private network, and I add all my medical data, as I said earlier.
D
My
dad
is
an
oncologist,
and
wouldn't
it
be
amazing
that
you
know
all
the
mammograms
and
like
all
this,
other
data
would
be
he
could
actually
share
like
first
within
you
know,
universities,
but
then,
and
then
you
know,
people
can
actually
contribute
to
that
as
well.
D
People could add to those data sets, and then eventually there could be federated machine learning, and some jobs could be run. The nice thing would be that people could actually get royalties at the end for contributing the data. And one thing we would like to ensure is that nobody can tamper with that data, or de-anonymize it. With GDPR being around now...
D
People
are
much
more
aware
and
have
to
be
quite
careful
in
terms
of
you
know
how
data
is
being
shared,
so
yeah,
so
that
would
be
sort
of
like
the
the
overall
scenario.
Yeah.
D
In
terms
of
the
you
know,
setting
up
like
a
private
Network,
how
would
you
know
how
would
I
go
about
that.
C
So
it
should
be
super
simple
and
and
Ally
link
to
the
the
docs
on
the
on
the
comments.
C
But
broadly,
all
you
need
to
do
is
download
the
back
of
your
software
and
it
has
a
you,
have
a
command
in
that
software
when
you
do
that,
it
will
spin
up
a
compute
node
and
what
you
ideally
want
is
to
make
sure
those
compute
nodes
are
next
to
the
databases
that
you
want
people
to
be
able
to
access
or
have
access
to
that
data.
So
there's
lots
of.
C
The requester node is like a trusted agent of the user, but it's also a trusted agent of the network. So the requester node, as well as responding to user requests, is kind of responsible for moderating them as well.
C
You
sounded
that
request,
node
configure
it
appropriately,
and
then
you
connect
into
it
all
the
different
compute
nodes
that
you
want
attached
to
all
the
different
databases
that
you
want
to
make
available
and
then
that's
It.
Ultimately
like
you,
only
need
to
do
those
things
and
then
you've
got
a
fully
functional
back
of
your
network,
which
can
which
can
answer
these
sorts
of
requests.
C
It's
not
like
there
are
various
like
flags
for
controlling
the
privacy
of
stuff,
but
ultimately
all
you
need
to
do
is
specify
that
the
things
that
you're
operating
on
a
private
so
you're
not
connecting
to
any
kind
of
other
background,
Network
you're
not
connecting
to
any
other
ipfs
network,
nothing
that
you're
doing
is
gonna,
leave
the
confines
of
what
what
you've
configured
and
then
you
can
configure
the
like
moderation
options.
C
So
by
default
the
request
will
send,
or
by
default
the
request
will
just
allow
all
jobs
to
to
be
used
so
a
way
in
which
we
would
control.
That
is
by
controlling
the
users.
Who
can
access
that
requester
node
like
via,
like
AWS
credentials
or
gcloud
credentials,.
C
Or
running
in
your
data
center,
like
your
kind
of
VPN
credentials,
that's
quite
a
broad
brush
thing,
so
it's
obviously
like
you
know
you
let
one
person
in
and
they
can
access
everything
the
next
person
can,
but
actually
sometimes
what
you
want
is,
but
different
people
have
different
levels
of
access
or
you
want
to
be
able
to
know
all
you
want
to
know
for
audit
purposes
like
who
who's
doing.
What
so
Bachelor
has
the
concept
of
a
client
identifier.
C
So
every
user
has
a
unique
client
identifier
associated
with
their
machine,
and
you
can
collect
those
up
and
you
know,
assign
them
to
specific
people
and
then
say
to
the
requester.
Node
actually
only
accept
jobs
from
these
client
identifiers
or
only
accept
jobs
from
these
real
world
identities,
or
you
can
go
a
bit
even
deeper
than
that
and
say.
Actually,
this
person
should
only
be
able
to
access
this.
This
person
shouldn't
only
be
able
to
access
that
blah
blah
blah
and
you
can
get
as
specific
as
you
want
ultimately
like.
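To make that concrete, here is a purely illustrative sketch of the per-client, per-dataset policy idea Simon describes. Nothing here is the real Bacalhau API; the identifier strings, the `POLICY` table, and the `authorize` function are all made up to show the shape of the idea:

```python
# Illustrative access policy keyed by client identifier.
# These names do not come from the real Bacalhau API; this just models
# "only accept jobs from these identifiers, and only for the datasets
# each identifier is allowed to touch".

# Hypothetical mapping: client identifier -> datasets that client may query.
POLICY = {
    "client-alice": {"mammograms-anonymized", "public-census"},
    "client-bob": {"public-census"},
}

def authorize(client_id: str, dataset: str) -> bool:
    """Reject unknown clients outright, then check per-dataset access."""
    allowed = POLICY.get(client_id)
    return allowed is not None and dataset in allowed

print(authorize("client-alice", "mammograms-anonymized"))  # True
print(authorize("client-bob", "mammograms-anonymized"))    # False
print(authorize("client-mallory", "public-census"))        # False
```

An identifier-keyed allowlist like this gives you both gating and an audit trail, since every accepted job is tagged with the identifier it came from.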
C
There is a lot of control over how that requester node will accept jobs. You can also get into a middle ground, so you can say: these users shouldn't have access, these users should have global access, but, more realistically, most users can access all the things that we think are not particularly controversial. If a user wants to access this data set, well, it's mainly public anyway, so they can just access it. But if users want to access that data set, it's sensitive.
C
Maybe it's pseudo-anonymized, but it's still got quite high-value data in there, so their jobs will need to be moderated by a human, to check that what they're doing is valuable. One of the slides that I presented during the CoD Summit was on exactly this; let me see if I can find it... not that one, this one. So, these are the various ways in which people talk about what is acceptable, and I guess the key element...
C
The key elements that mean you need moderation are that what people are going to do with the data is as important as who they are, and as important as what the data is. You're obviously happy to share data to do with mammograms for the purposes of curing cancer. But if someone came along and said, actually, I'll only use this data to increase the profits of my pharmaceutical company, maybe you would say: no, I don't support that goal, that's not in line with
why I'm sharing this data in the first place, so I'm not happy with that. Understanding the context around what the job is going to do is part of that moderation job. So yeah, you can ultimately configure, with a high level of control, which jobs should be run, and also pass a job to a human moderator to say: you need to check this, to see whether it's going to do something inappropriate or not. The thing that's interesting about that, at the moment, is...
C
Moderating those jobs is quite hard. If someone says, "I'm going to run a random Docker image over your sensitive data", I would be like: well, what's it going to do? How do I know what's in the Docker image? Sure, you can inspect a Docker image and check it out, but actually that's very difficult to do, and it's a technical job: you need to be a technology expert to know how to do that and to really assess it.
C
This is one of the reasons we want custom executors, so that you could just submit, say, a Python script to run against the data. That can be significantly less complicated, significantly less complex, than a Docker image, which makes it a lot easier to moderate. Or you can go even higher level and just submit an SQL query to run against a database, which is quite easy to moderate compared to, say, a blob of WebAssembly, which is very difficult.
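As a sketch of that "ease of moderation" ladder (the levels and scores below are my own illustration of the ordering Simon describes, not anything Bacalhau ships):

```python
# Illustrative only: rank job types by how hard a human moderator finds
# them to review, per the discussion above. Lower score = easier to vet.
MODERATION_DIFFICULTY = {
    "sql": 1,     # a query is short and declarative: easiest to read
    "python": 2,  # a script is inspectable source, still reviewable
    "docker": 3,  # an opaque image: you must unpack and inspect it
    "wasm": 4,    # a compiled blob: hardest to audit by hand
}

def easier_to_moderate(a: str, b: str) -> bool:
    """True if job type `a` is easier for a human to vet than `b`."""
    return MODERATION_DIFFICULTY[a] < MODERATION_DIFFICULTY[b]

print(easier_to_moderate("sql", "docker"))   # True
print(easier_to_moderate("wasm", "python"))  # False
```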
D
How can you really evaluate what's going to happen and what they're actually running? But yeah, perfect, this sounds amazing, and I'm glad it's coming. I would love to learn a little bit more, and to help people actually use it, make it accessible, and have a great interface around it.
C
Definitely. I mean, that'd be a great hackathon topic. We've kind of always imagined that there would be automated tools that can help with that moderation. You know, can you apply some AI to a piece of code, or to a whole job, and get it to generate some level of risk, or to try to describe what it's doing in a way that's helpful for a human moderator? We've never really tried it, so it would be great to see people doing something like that.
A
It's a big field, I think, with a lot going on there.
A
I don't know if anyone's heard of Bluesky, the social media app, which is trying to create this version of Twitter, or X, or whatever it is now.
A
Bluesky is creating this version, kind of like Mastodon as well, where you'll be able to bring your own servers, and then you, as a community, decide what sort of moderation rules you want around that. It's only loosely parallel to what we're doing here, but I think content moderation is one of those challenges that has been a challenge for many years, and it's still really unsolved.
A
How do we do it well? There are lots of different ways to do it, and I definitely think AI is going to be able to help, though. Yeah.
C
Yeah, it's definitely really hard, and I think you're absolutely right that the key is to understand, in your community, or I guess in this context your topic area, what the risk factors are. Because right at the bottom of that list is safe outputs, which is exactly the same problem as content moderation, right? You've got some data coming out.
C
You need to moderate it as automatically as possible, and that's a lot easier if you know that, say, we're working over patient data and the big risk is people's details being leaked. It's actually a lot easier to solve that problem than to solve a general problem like "is this image offensive?". So yeah.
A
Yeah,
exactly
exactly
right
and
I
think
there's
kind
of
a
parallel
here,
or
at
least
a
a
random
segue
that
I'm
going
to
take
into
this.
This
is
also
the
case
with
how
we're
building
out
you
know
kind
of
this
incentivization
layer,
which
is
Lily
Pad,
which
you
know
is
basically
taking
the
vapia
public
network
and
turning
that
into
an
incentivized
version
of
that
or
a
like
three
version
of
that.
Like
so
much
like
kind
of
file
coin
is
to
ipfs.
A
You
could
say
little
iPad
is
to
to
backly
out
a
lot
of
extra
code
going
into
this,
but
the
basis
of
it
was
you
know
the
Buffy
okay
code.
A
Not all compute jobs are deterministic, so at some point we are going to have to figure out how to run jobs that are, to a certain degree, non-deterministic on the network, and still have the ability for the game theory to catch when the results are wrong. So, really interesting research problems anyway.
C
Yeah, that's really interesting. I think there is an interesting pedigree here from people like Koii, who have sort of taken the problem and punted it, but in quite an interesting way, which was to say: actually, you're the one running the job, you're the one running the community, so it's up to you to decide how best to determine whether or not the job's been done correctly. Because if you're screen-scraping something, well, it can change between results, right? So it's kind of statistical...
C
Is one result better than another? It just depends on the workload, which is quite hard to handle generally, but lots of engineering research there, for sure.
A
Yeah, definitely. Levi Rybalov, who's our researcher on this, knows way more than me; I'm just regurgitating some of the things I've heard from smarter people than me, basically. Actually, I know Koii has worked with them.
B
Exactly
exactly
really
afraid
there.
D
Another
question
was
I
think
you
we
were
looking
at
that
in
terms
of
you
know,
saving
costs
and
sort
of
egress
costs
in
terms
of
bakayo,
and
you
know
you
had
like
an
interesting
calculation.
I
was
just
wanted
to
highlight
that
a
little
bit
again.
C
Yeah
yeah
so
I
mean
this
is
this
is
a
a
slide?
That's
come
out
for
Lori,
which
is
really
like,
demonstrating
the
sort
of
the
reason.
Part
of
the
reason
why
we
like
doing
computer
data
in
place
is
valuable
because
I
mean,
for
example,
this
is
the
egress
cost.
C
There's a similar ingress cost, which I actually think is more than this as well, but ultimately the act of moving data around costs you money. Getting data in and out of these big clouds, like AWS and Azure and Google Cloud, is costing people thousands of dollars.
C
That's for 50 terabytes. I don't know whether people still think that sounds like a lot or not; when I had my first PC, that was an unfathomably large amount of data, but I think you don't have to go to very large scale any more before that becomes a commonplace amount of data. I mean, the amount of data that Netflix has, or the amount of data that Amazon has, is going to outstrip that by orders of magnitude, several in fact.
C
If you imagine the number next to AWS, like 4,300, and just multiply it by 100 or a thousand... and in fact I think the costs aren't linear either, for these clouds.
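As a back-of-the-envelope version of that slide (the $0.09/GB rate is my assumption, roughly a commonly quoted list price for internet egress from the big clouds, not a figure from the talk):

```python
# Back-of-the-envelope cloud egress cost, assuming a flat $0.09 per GB.
# Real cloud pricing is tiered and changes over time; this is illustrative.
RATE_PER_GB = 0.09
tb = 50
cost = tb * 1000 * RATE_PER_GB  # 50 TB -> 50,000 GB

print(f"50 TB egress at ${RATE_PER_GB}/GB: about ${cost:,.0f}")
# Netflix/Amazon-scale data is orders of magnitude bigger:
print(f"1,000x that volume: about ${cost * 1000:,.0f}")
```

That lands in the same ballpark as the slide's 4,300 figure, and multiplying by a hundred or a thousand quickly reaches the "prohibitively expensive" territory Simon mentions next.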
C
That's when you start to see something that is just prohibitively expensive, and definitely the reckoning from most compute over data platforms, I think, and certainly from Bacalhau in general, is that this is not affordable or sustainable, ultimately. Another one of my favourite stats is data growth: the growth of data volume is outstripping the growth of network bandwidth by a factor of something like 43. That just means it's not going to be very long before we physically cannot move all the data we have into a central place.
D
And also the time required, right? I think one super interesting area would be to analyze the sort of carbon-footprint reduction you would actually get because of that. So that would maybe be another hackathon, or...
D
Just
a
research
study
in
terms
of
you
know,
like
Net
Zero,
there's
a
lot
of
criticism
around
that
as
well.
C
Yeah, I mean, doing less, ultimately, is the answer, and moving data is one of those easy things to do less of, if you have the ability to query it in place. And I think you're absolutely right that a lot of people don't appreciate how slow data transfer still is. I was reading a news article the other day, I can't remember where, in which they raced a 50-terabyte data transfer between two European cities.
C
One transfer was over the internet and the other was on a pigeon, and the pigeon got to the other data center about 20 times faster than the internet transfer. I think the internet transfer was only at four percent when the pigeon arrived. It's just: yeah, this thing is really slow; a pigeon can do it faster. For those sorts of volumes, that's exactly it.
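For intuition about why the pigeon wins, a quick sanity check, assuming a sustained 1 Gbit/s link (my number; real long-haul transfers often sustain far less, which only widens the gap):

```python
# How long does 50 TB take over a sustained 1 Gbit/s link?
# The line rate is an assumption for illustration, not from the anecdote.
terabytes = 50
bits = terabytes * 1e12 * 8  # 50 TB expressed in bits
seconds = bits / 1e9         # at 1 Gbit/s
days = seconds / 86400

print(f"{days:.1f} days at a full, sustained gigabit")  # ~4.6 days
```

A bird carrying drives between two nearby cities needs hours, not days, so "only four percent done when the pigeon arrived" is entirely plausible at lower, more realistic sustained bandwidths.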
A
But yeah, that's great. I think that's one of the reasons Filecoin was born as well, if I'm allowed to sneak that in, since we are on the Filecoin channel: the point is to decentralize all this. Also, an interesting thing to think about when we're talking about hardware is that even the big providers are mostly concentrated in North America or Europe, a couple of places in the world.
A
Really
it's
not
a
very
distributed
geography
of
data
centers
even
owned
by
these
big
massive
Cloud
providers.
So
you
know
you're
also
kind
of
disadvantaging
people
in
general
in
a
more
Global
World
and
that
that
does
that
does
lead
to
issues.
We've
seen
that
the
internet
tried
to
you
know,
did
a
lot
to
change
that
as
well,
like
people
can
get
an
education
with
an
internet,
but
even
even
now,
like
that's
not
everywhere
so
yeah
I,
don't
know
what
I'm
on
about,
but.
D
Just
to
sort
of
you
know
get
back
to
the
case
study
so
the
last
part.
Let's
let's
say:
okay,
you
know
I
set
up
my
private
Network
and
then
I
can
do
my
run
mine.
You
know
machine
Federate,
machine
learning
on
there
as
well,
and
you
know
get
all
these
amazing
results,
but
so
in
terms
of
Lily,
Pat
and
being
able
to
then
get
royalties,
can
you
sort
of
talk
a
little
bit
about
around
that?
How
would
work
for
the
for
end
users.
A
That's a really interesting point, and I think it's one that AI is bringing up a lot these days. One of the things that's coming to the surface because of AI is: how do we attest to, one, the originality of potentially any creative endeavour these days, blogs, code, whatever it is? Was it actually written by a human? Was it actually done by a human?
A
Is it actually your prime minister saying that? Who knows; these deep fakes are going to be literally everywhere. So that's one problem, and the other problem is...
We've trained up all these data sets as well, and this is what you're mentioning here, Kartika: we've trained off all these data sets and we don't really know where they came from, because a lot of this is a black box, a closed black box of information.
A
These
days
I
mean
we
can
have
a
guess
where
these
data
sets
came
from,
but
but
anyway,
you
know,
but
the
original
artists
or
the
original
creators
aren't
getting
any
attributions
here.
So
how
do
we
manage
to
do
that
and
look
I?
You
know,
I
I
do
work
in
the
web,
3
space
and
I
think
you
know
this
is
exactly
what
blockchains
are
for.
These
are
the
kind
of
things
that
blockchain
can
help
with.
A
One
of
them
is
verification
and
attestation,
one
of
the
another
ones,
a
Providence
record
and
the
other
one
is
our
payments
layer
and
we've
kind
of
done
something
similar
to
that
with
a
project
called
waterloo.ai
which
trains
a
an
artist,
a
transa
model
on
an
artist's
work,
so
fine
tunes
really
to
an
artist's
work,
so
this
artist
would
upload,
say
50
images
or
so,
and
then
we
run
this
machine
learning
fine-tuning
model
on
it
to
fine-tune
to
that
particular
artist's
style.
A
So
it
works
best
for
artists
that
have
a
unique
style
if
they
have
stuff
everywhere,
it's
a
bit
harder
to
train
on
clearly
and
then
a
user
can
come
along
and
pay
a
certain
amount
to
have
an
image
created
out
of
a
prompt
they
put
in.
Like
you
know,
my
favorite
one
is
a
rainbow
unicorn
in
space.
This
is
the
one
I
use
all
the
time,
which
is
why
you'll
see
unicorns
everywhere,
but
and
then
in
the
background,
there's
smart
contracts
here
that
automatically
pay
out
to
that
artist.
A
So
the
idea
is
that
that
artist
deserves
some
attribution
for
the
original.
You
know
data
that
they
provided
to
this
AI
model.
So
that's
one
way:
blockchain
could
help
with
that
and
that's
just
a
proof
of
concept
project
like
I,
really
hope,
like
people
extend
and
expand
on
on
these
kind
of
ideas,
for
where
blockchain
can
kind
of
help
with
offset
some
of
the
maybe
more
problematic
areas
of
AI
and
I.
I.
Definitely
think
verification
or
provenance
and
attribution
are
some
of
those
so
and
yeah
under
the
hood.
D
So
Theory,
then
you
know
you
could
have
like
a
group
of
let's
say,
and
in
this
instance
it's
just
it's
this
one
artist
who
gets
the
the
royalties
back.
But
let's
say
they're,
like
hundreds
of
people
who
you
know
part
of
this,
maybe
a
Data,
Trust
or
something
else,
and
then
by
giving.
D
You
know,
like
all
this
data,
rather
than
you
know,
like
currently
people
sort
of
give
data
for
free,
like
even
not
knowing
right,
but
it
would
be
much
more
conscious,
like
a
conscious
way
of
contributing
data
to
like
a
sort
of
controlled
or
a
good
cause
or
but
then
you
get
like
your
little
royalties
back
yeah.
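In the simplest case, the royalty mechanics Kartika describes are a pro-rata split, which on Lilypad would live in a smart contract rather than a script. A minimal Python sketch of the split itself, with made-up contributor names and share counts:

```python
# Illustrative pro-rata royalty split for a data trust: each contributor
# is paid in proportion to how much data they contributed. In practice
# this logic would be enforced by a smart contract, not a Python script.
def split_royalties(pool: float, contributions: dict[str, int]) -> dict[str, float]:
    total = sum(contributions.values())
    return {who: pool * n / total for who, n in contributions.items()}

# Hypothetical: 100 tokens earned by a model trained on three people's scans.
payouts = split_royalties(100.0, {"alice": 500, "bob": 300, "carol": 200})
print(payouts)  # {'alice': 50.0, 'bob': 30.0, 'carol': 20.0}
```

The same shape scales to hundreds of contributors, and the whole pot is always paid out, which is the property a data trust would want to verify on-chain.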
A
Yeah, 100%. I think that's something that's happening in the traditional tech space too; people are going: okay, we want to have these public-good data sets that people can train AI models on. Clearly AI is going to make a massive difference in the world. There are problems it's going to solve that we wouldn't be able to solve on our own, complex problems, and it's going to speed up the time to solve some of them; we were talking about the medical side before, and that's...
A
Imagine if, next to those data sets, you had something that could process that data, like Bacalhau, like Lilypad, something that can access and use that data in that same data house. You could run that as part of your open source model, or you could run these Python scripts, or whatever it is.
You know, your own scripts, clearly better than mine, although you've got JavaScript now, so maybe try out some new things on top of this data. And this is what the whole open source movement was about, wasn't it: speeding up the process of coming up with solutions. A really interesting article that I actually read around this...
A
It was this leaked Google document about how Google has no moat, and neither does OpenAI. It was talking all about how open source is basically eating their lunch: they can do it quicker, they can do it better, and they can do it more targeted, these open source developers, with access to data and GPUs. So let's go give it to them, and let's see what we get out of it. That's my little spiel right there.
C
I think so. I guess the way in which it comes back to the federated stuff is that it's one thing to have an open data set that is built in an open source way, or built by a community in an open way. In the medical case, though...
C
Obviously, it's more challenging to do that. Maybe you could have some medical data sets that were anonymized, but it's really hard to know whether or not that will be good enough before you release stuff into the open. Because even if you have anonymized data, it only takes something like three data sets which contain the same individuals, with some very specific properties, before you can take...
C
Three data sets that are each anonymized can, combined, actually uniquely identify an individual in those data sets. So if you're a data custodian for some of this sensitive data, the bar for being able to release it publicly is very high. I don't know whether you've seen the news in the UK about a massive data breach.
C
It happened with all the police data: basically, they accidentally released the data of all currently serving police officers in the United Kingdom, including Northern Ireland, where police officers are routinely in danger as part of their duty. It's that sort of story that takes people who are in the difficult position of being responsible for sensitive data and pushes them back towards being very data-fearing, which is the term people use for that "oh no" reaction.
C
C
Think
people
had
all
the
data
in
Excel
and
didn't
realize
that
if
you
delete
the
tab,
but
you
still
have
a
pivot
table
that
references,
the
data
and
the
data
is
still
in
the
file,
so
they
release
the
Excel
with
the
pivot
table,
and
then
they
didn't
realize
that
all
the
data
was
still
there.
That's
how
it
happens
so.
C
So the term that people seem to be using to talk about this sort of stuff is "compute islands", and the idea, basically, is topic-specific compute, or gating access. In the use case that you talked about, your father, who's an epidemiologist, has done some data collection. That was an expensive process, and he would like to make that data more available for use, because ultimately you're furthering science, so making the data as available as possible is a positive thing. But he also doesn't want to be involved with moderating requests, or spinning up Bacalhau nodes, or doing any of the tech, because that's not the scientist's job, right? They should do what they do best.

And so the idea here is that people will contribute their data sets, or contribute their compute hours, but not in a way that actually removes the data from their control. It's still within their domain, or within that kind of enterprise boundary, but they put their trust in an external moderator, who is well chosen, to allow or deny access based on their policy. So your father might say: I only want this data to be used for medical science; I don't want it to be used to develop private drugs, and I don't want it used to train AI models. That's my ethical stance. And it'd be up to the moderator to apply that policy. Because you've got that whole end-to-end thing there, if you introduce a remuneration element, where the user on the right is paying to access that data, that can correctly flow back into the coffers of the university to fund more research.
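A minimal sketch of the gating idea just described. The policy fields, purpose tags, and types below are hypothetical, invented for this example, and are not part of Bacalhau's actual API: the custodian declares allowed and denied purposes, and an external moderator approves or rejects each compute request against that policy before anything runs.

```python
from dataclasses import dataclass

@dataclass
class Policy:
    """Hypothetical policy a data custodian attaches to their data set."""
    allowed_purposes: set   # e.g. {"medical-research"}
    denied_purposes: set    # explicit ethical exclusions

@dataclass
class ComputeRequest:
    requester: str
    purpose: str            # what the job claims to be for
    job_spec: str           # e.g. a container image reference

def moderate(req: ComputeRequest, policy: Policy) -> bool:
    """Moderator's decision: an explicit denial always wins over an allowance."""
    if req.purpose in policy.denied_purposes:
        return False
    return req.purpose in policy.allowed_purposes

# The epidemiologist's stance from the conversation, expressed as a policy.
policy = Policy(
    allowed_purposes={"medical-research"},
    denied_purposes={"private-drug-development", "ai-model-training"},
)

print(moderate(ComputeRequest("uni-lab", "medical-research", "img:analysis"), policy))      # True
print(moderate(ComputeRequest("pharma", "private-drug-development", "img:screen"), policy))  # False
```

The point of the design is that the custodian only has to state the policy once; moderation, node operation, and payment flow are someone else's job.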
C
Right, so, you used the words "data trust"; that's absolutely a big thing at the moment. Data trusts are very much a social or legal construct; they're not on a very strong technical footing, and mostly they involve a bunch of companies or organizations that don't have a huge desire to share data with each other, other than under specific circumstances, ultimately giving their data up. Some of the compute over data technology is able to say what queries they have run, and you can look through it and say: oh, actually, that wasn't really what I had in mind, I'm going to end this relationship. Whereas if you hand over your data, obviously you no longer have visibility of what people are doing with it. So yeah, I definitely think there is a way to bring more of that private data, in a safe way, to good use: training models that have public benefit, or doing other science with public benefit.
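One way to picture that visibility is an append-only query log that the custodian can review after the fact. This is an illustrative sketch only, assuming invented names throughout, and not a description of any particular compute-over-data implementation.

```python
# Illustrative sketch: every job run against the data set is recorded,
# so the custodian can audit usage and end the relationship afterwards.
audit_log = []

def run_query(requester, query):
    """Record the request; the query itself would execute inside the
    custodian's own boundary, so the raw data never leaves."""
    audit_log.append({"requester": requester, "query": query})

run_query("uni-lab", "SELECT region, AVG(age) FROM cohort GROUP BY region")
run_query("uni-lab", "SELECT * FROM cohort")   # raw-row export attempt

# The custodian reviews the log and spots a query outside the spirit of the deal.
suspicious = [e for e in audit_log if "SELECT *" in e["query"]]
for entry in suspicious:
    print("ending relationship with", entry["requester"])
```

Contrast this with handing the data over outright: once the file leaves your boundary, there is no log to review and nothing left to revoke.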
D
Absolutely. When you think about where the data comes from and how AI models have been trained so far, and the biases in it, it's crazy, right? So if we could enhance that and have these really reviewed data sets, with a bias checker or whatever, you'd know what you're actually looking at. And I completely understand: if you want to train your model, you just need to get started, so you go the easiest route. But at least you'd know what the holes are, where you need to add to it. And there will be different pockets, that's how I envision it: different pockets of people, or different groups of people, who can then supplement it and support each other within that, but still get a monetary reward at the end. I think that would be sort of an ideal world, yeah.
C
Definitely, and I do think there's something really powerful in people being able to be part of it without having to do it all themselves. In this case, the scientists or the clinicians are able to contribute stuff and see some of that reward without having to run the whole system or build the technology from scratch. I think that's what compute over data in general almost has a responsibility to do; part of the vision of the whole movement is bringing more of this tech, which is not new, let's be clear, they were doing this in the 60s, to more people, through all the stuff that we've learned over the past decades. And, exactly like you say, building it in a way that allows people to take part no matter where they are, and be compensated for that.
B
I think that's a good place to leave off here, unless you have any further comments, Cardika, or...