From YouTube: GPUs on Filecoin: An Update on Mining
Description
An update on Filecoin mining using GPUs from Volker Mische and CryptoComputeLab.
But before I get into the details: as this is an event celebrating one year of Filecoin, I'd also like to show you the progress. I've run the proof of replication on some hardware and compared the same hardware running a version from a year ago against the version we have today. As you can see, with the version from a year ago, sealing a 32 GiB sector took about three hours, and now we are down to about two and a half hours, which is a great improvement.
All the details about this graph will come later; this is just to get you excited. When I talk about Filecoin in this talk, I mean the proofs, because Filecoin means many different things to different people: sometimes people refer to it as the protocol, or as the cryptocurrency, or sometimes as an implementation like Lotus.
As this talk is about GPUs, I'd also quickly like to talk about what GPUs are good for. If you have huge amounts of data and some operation that you can apply to the data in parallel, crunching through it basically in one step, that's a good fit for GPUs. What they are not good at is when you have interdependencies; then you will probably just go back to the CPU. Filecoin actually has both of those things.
Sadly, I don't have the time to talk about the proof of spacetime. The proof of replication, also known as sealing, has four phases, grouped into two big ones. We make the distinction between the pre-commit, which I would call the preparation, and the commit, which is about the actual proofs. As you can see, those phases split into pieces again, and the important thing here is that we have a CPU-heavy phase and a GPU-heavy phase.
So what we do here, in the pre-commit phase 1, is take the original data and encode it in a unique way. The important bit here is the uniqueness: if you run the process again on the same data, you will get another replica copy that is again unique. This is part of the whole design, because we want people to store unique copies of the data. This step is not parallelizable, by design, so it runs on the CPU and takes quite a lot of time.
What it does in the background is mostly reading a lot of data from disk, doing some SHA hashing, and writing those hashes out again. The reason it's not parallelizable is that the hashing often depends on previous hashes in order to compute the new hashes, so you have those interdependencies. In this step there is also some basic tree building, which isn't really performance critical, but tree building is core to the proof of replication.
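To make those dependencies concrete, here is a minimal sketch of the idea, assuming a simplified labelling rule (this is my own illustration, not the actual filecoin-proofs code): each label is hashed from a label produced earlier, so the labels have to be computed one after another.

```rust
use sha2::{Digest, Sha256};

/// Minimal sketch: label every node of one layer sequentially.
/// Each label depends on the previous label, so the loop cannot be
/// split across threads (hypothetical simplification of the labelling).
fn label_layer(seed: &[u8], num_nodes: usize) -> Vec<[u8; 32]> {
    let mut labels: Vec<[u8; 32]> = Vec::with_capacity(num_nodes);
    let mut prev: [u8; 32] = Sha256::digest(seed).into();
    for node in 0..num_nodes {
        let mut hasher = Sha256::new();
        hasher.update(prev); // dependency on the previously computed hash
        hasher.update((node as u64).to_le_bytes());
        prev = hasher.finalize().into();
        labels.push(prev);
    }
    labels
}
```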
Therefore, I want to briefly talk about what Merkle tree building means. So what is a Merkle tree? Here you can see an example. At the bottom you have your data that you want to build your Merkle tree on. Each node has a certain number of children, in this case two children; we call this the arity of the tree. In Filecoin we also have trees which have, for example, eight children, but in this case it's only two children, so you take two pieces and hash them together.
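As a rough illustration of that idea (my own sketch in Rust, not the tree code used in the proofs), building a binary Merkle tree just means hashing pairs of children, layer by layer, until a single root remains:

```rust
use sha2::{Digest, Sha256};

/// Sketch of building a binary (arity-2) Merkle tree.
/// `leaves` is assumed to have a power-of-two length; returns the root hash.
fn merkle_root(leaves: &[[u8; 32]]) -> [u8; 32] {
    let mut layer: Vec<[u8; 32]> = leaves.to_vec();
    while layer.len() > 1 {
        layer = layer
            .chunks(2)
            .map(|pair| {
                // hash the two children together to form the parent node
                let mut h = Sha256::new();
                h.update(pair[0]);
                h.update(pair[1]);
                h.finalize().into()
            })
            .collect();
    }
    layer[0]
}
```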
So why is this interesting, and why would you do this? There's something called a Merkle inclusion proof. If you want to prove that certain bytes are part of your data, you would normally have to transmit the full file and look into it, but that's not really efficient. What you want to do instead is transmit only a proof, a Merkle proof, and in the light gray boxes you can see what you would need to transmit.
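To give a feel for why those few gray boxes are enough (again just an illustrative sketch, not the proofs code), verifying an inclusion proof only needs the leaf, one sibling hash per tree level, and the root:

```rust
use sha2::{Digest, Sha256};

/// Sketch of verifying a binary Merkle inclusion proof.
/// `siblings` holds one sibling hash per tree level; `index` is the leaf
/// position, whose bits tell us whether the sibling sits on the left or right.
fn verify_inclusion(
    leaf: [u8; 32],
    mut index: usize,
    siblings: &[[u8; 32]],
    root: [u8; 32],
) -> bool {
    let mut current = leaf;
    for sibling in siblings {
        let mut h = Sha256::new();
        if index & 1 == 0 {
            h.update(current);
            h.update(sibling);
        } else {
            h.update(sibling);
            h.update(current);
        }
        current = h.finalize().into();
        index >>= 1;
    }
    current == root
}
```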
Let's get to the next phase, the pre-commit phase 2, which is about Poseidon hashing. Again we do some Merkle tree creation, this time several trees. I won't go into the details, but the important bit is that we use Poseidon hashing for this. Poseidon hashing is an algorithm that works on finite field elements. What are those? Well, they are a cryptographic primitive.
That's all you need to know for now; I will get back to this later. This cryptographic primitive is used in SNARKs, which I will also get to quickly later, and there it is just better, or more efficient, than working with other hashes. And here the tree building is highly parallelizable.
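Since finite field elements will keep coming up, here is a very rough sketch of the idea, using a tiny toy prime rather than the roughly 255-bit prime field the proofs actually use: the elements are integers modulo a prime, and addition and multiplication always wrap around so the result stays inside the field.

```rust
/// Toy finite field: integers modulo a small prime (illustration only;
/// the field used in the proofs is a ~255-bit prime field).
const P: u64 = 101;

#[derive(Clone, Copy, Debug, PartialEq)]
struct Fp(u64);

impl Fp {
    fn new(v: u64) -> Self {
        Fp(v % P)
    }
    fn add(self, other: Fp) -> Fp {
        Fp((self.0 + other.0) % P)
    }
    fn mul(self, other: Fp) -> Fp {
        Fp((self.0 * other.0) % P)
    }
}

fn main() {
    // 70 + 60 = 130, which wraps around to 29 modulo 101
    assert_eq!(Fp::new(70).add(Fp::new(60)), Fp::new(29));
    // 50 * 3 = 150, which wraps around to 49 modulo 101
    assert_eq!(Fp::new(50).mul(Fp::new(3)), Fp::new(49));
}
```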
Then something happens in kind of a loop. Think about the previous example with the Merkle tree: at the bottom you have lots of data, and let's say it's really a huge amount of data, bigger than the memory of your GPU. So you take, say, a third of the data, batch it up, put that batch on the GPU, and hash it there.
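A minimal sketch of that loop, where `hash_on_gpu` is a hypothetical stand-in for the real GPU kernel launch: the CPU slices the leaves into batches that fit into GPU memory and feeds them to the device one after another.

```rust
/// Hypothetical stand-in for launching the hashing kernel on the GPU.
fn hash_on_gpu(batch: &[[u8; 32]]) -> Vec<[u8; 32]> {
    // In the real code this would copy the batch to the device,
    // run the Poseidon kernel and copy the results back.
    batch.to_vec()
}

/// Sketch: split the leaves into batches that fit into GPU memory
/// and process them one after another.
fn build_tree_layer(leaves: &[[u8; 32]], gpu_batch_size: usize) -> Vec<[u8; 32]> {
    let mut out = Vec::with_capacity(leaves.len());
    for batch in leaves.chunks(gpu_batch_size) {
        out.extend(hash_on_gpu(batch)); // only this part runs on the GPU
    }
    out
}
```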
So in this step, where we do the actual tree building, the tree logic and the batching are orchestrated on the CPU, and only the actual hashing is done on the GPU, so only a small part.

Let's get to the next stage, the commit phase 1. This is about Merkle inclusion proofs.
We create those in order to make sure that the replica data we created contains the same data as the original input. This is a really fast process; on a beefy machine it takes less than a second. So we just do it on the CPU, as there's no point in putting something on the GPU if it's that fast.
Finally, we have the commit phase 2, which is about SNARKs. First I would like to talk about why we need another phase, given that we already have proofs that the replica is the same as the original data. The reason is that those Merkle proofs are quite large, and it wouldn't be feasible to put them on chain.
I will go into some bits of it, but not all the details; that would be a separate talk. What you do is build something called a circuit. It's still software, but I would say it's basically the physical representation of a system that does some magic things. Those circuits are basically polynomials, as you know them from school or university, and you have lots and lots of them, really a lot. All the operations that you do in those SNARKs happen in finite fields.
You need finite fields because it's also working with elliptic curves and so on. Now we get back to what I said earlier about Poseidon hashing: Poseidon hashing is native to finite fields, so it's a good fit for those SNARKs. So what is now actually running on the GPU?
What we want to do in this system is evaluate those polynomials at random points, which basically means we want to get out the results. You would have a formula like the one you can see at the bottom, of the form p(x) = a₀ + a₁x + a₂x² + … + aₙxⁿ. You could do it with pen and paper: you put in some value for the x, you have some values for the a's, you do the calculation and get some result back. This is what evaluation means, and we want to do it efficiently on the GPU.
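As a tiny worked example of what evaluation means here (my own sketch with plain integers instead of finite field elements), you can fold the coefficients together one term at a time, for instance with Horner's rule:

```rust
/// Sketch: evaluate p(x) = a0 + a1*x + a2*x^2 + ... using Horner's rule.
/// Plain u64 arithmetic here; the real code works on finite field elements.
fn evaluate(coeffs: &[u64], x: u64) -> u64 {
    coeffs.iter().rev().fold(0, |acc, &a| acc * x + a)
}

fn main() {
    // p(x) = 3 + 0*x + 2*x^2, evaluated at x = 5: 3 + 0 + 2*25 = 53
    assert_eq!(evaluate(&[3, 0, 2], 5), 53);
}
```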
First, what we do is something called an inverse fast Fourier transform. The problem is that in the circuit those polynomials are represented in a certain way, but to do the actual calculation we need to represent them differently. We need to transform them into the coefficient representation, which is the one you can see at the bottom and the one most people are used to, or think about, when they talk about polynomials.
We do this transformation first, and it is again an operation that is parallelizable, because you use a divide-and-conquer algorithm: you recursively step through the data, and in the end you get the coefficients back, which are those a's in the polynomial (see the sketch below). The other thing is: if you then actually want to evaluate the polynomial, you use a function called MultiExp. What it does is operate on many of those elements at the same time.
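To show the divide-and-conquer structure of such a transform, here is a minimal sketch of a radix-2 transform over a toy prime field (my own illustration; the real code works over the large field, runs on the GPU, and the inverse direction has the same shape, just with the inverse root and a final scaling by 1/n). The two recursive calls on the even and odd halves are independent of each other, which is exactly what makes this parallelizable.

```rust
/// Toy prime field and radix-2 number-theoretic transform (NTT), the finite
/// field analogue of the FFT. Illustration only.
const P: u64 = 17; // tiny prime; 17 - 1 = 16, so roots of unity of order up to 16 exist

/// Evaluate the polynomial given by `coeffs` at the powers of `root`,
/// where `root` is a primitive n-th root of unity and n = coeffs.len().
fn ntt(coeffs: &[u64], root: u64) -> Vec<u64> {
    let n = coeffs.len();
    if n == 1 {
        return coeffs.to_vec();
    }
    // Split into even and odd coefficients: two independent sub-problems,
    // which could be handed to different threads (or GPU work groups).
    let even: Vec<u64> = coeffs.iter().copied().step_by(2).collect();
    let odd: Vec<u64> = coeffs.iter().copied().skip(1).step_by(2).collect();
    let even_out = ntt(&even, root * root % P);
    let odd_out = ntt(&odd, root * root % P);
    // Combine the two halves (the "butterfly" step).
    let mut out = vec![0u64; n];
    let mut w = 1u64;
    for i in 0..n / 2 {
        let t = w * odd_out[i] % P;
        out[i] = (even_out[i] + t) % P;
        out[i + n / 2] = (even_out[i] + P - t) % P;
        w = w * root % P;
    }
    out
}

fn main() {
    // p(x) = 1 + 2x + 3x^2 + 4x^3, evaluated at the powers of a 4th root of unity (13 mod 17).
    let evals = ntt(&[1, 2, 3, 4], 13);
    assert_eq!(evals[0], 10); // p(1) = 1 + 2 + 3 + 4 = 10
}
```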
Here I've put up, roughly, the function call: you put in the coefficients and you put in the x's. Those are then really concrete values and not variables anymore, and then you do this calculation in one big step on the GPU and you get the result back.
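Conceptually, that call looks something like the sketch below. This is my own simplification using plain modular arithmetic; in the actual proofs the bases are elliptic curve group elements and the implementation uses much smarter batching than this loop. The point is that every term is independent, so they can all be computed in parallel on the GPU.

```rust
const P: u64 = 1_000_000_007; // toy modulus; the real code works in an elliptic curve group

/// Square-and-multiply modular exponentiation.
fn pow_mod(base: u64, mut exp: u64) -> u64 {
    let p = P as u128;
    let mut result = 1u128;
    let mut b = base as u128 % p;
    while exp > 0 {
        if exp & 1 == 1 {
            result = result * b % p;
        }
        b = b * b % p;
        exp >>= 1;
    }
    result as u64
}

/// Sketch of a multiexp: combine many (base, exponent) pairs into one result.
/// Every term is independent, so the terms can be computed in parallel.
fn multiexp(bases: &[u64], exponents: &[u64]) -> u64 {
    bases
        .iter()
        .zip(exponents)
        .map(|(&g, &e)| pow_mod(g, e))
        .fold(1, |acc, term| (acc as u128 * term as u128 % P as u128) as u64)
}
```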
So let's look at what that means and what performance improvement it gives you. This is again run on the same machine as in the first slide, and now you hopefully have a better understanding of what those bars mean. Let's look at the details: at the top we have the CPU-only runtime of a proof of replication of a 32 GiB sector, and at the bottom we have the runtime with the GPUs enabled on this machine; we even had two GPUs.
The next phase is the pre-commit 2, which is about Poseidon hashing, where we use the GPU, as you remember. We can clearly see here that on the CPU it takes something like almost an hour, around 45 minutes, but if you use the power of GPUs it's a matter of 10 to 12 minutes. The next phase is the commit 1, which is the Merkle inclusion proofs.
You can't actually see it on this diagram; as I mentioned, it takes basically less than a second, so it doesn't show up in the diagram, and in both cases it's just this fast. Then finally we have the commit phase 2, which is the SNARK magic, and again we can see the difference in the diagram.
And now, finally, I'd like to talk about what we use underneath. The proving system I was talking about is programmed in the programming language Rust, and of course we use libraries for certain things; I'd like to get into some of them, the important ones. We interact with OpenCL, and there is a library called opencl3 which is well maintained, and it actually replaces a library that is no longer maintained, called ocl.
We've used ocl in the past, but with newer versions of the proofs we use opencl3, and, as you've probably seen, the transition was quite seamless; you probably haven't had any problems with it. What I'm especially proud of is that Protocol Labs helped in the early stages of getting opencl3 polished, because it was still quite young. But we bet on it, and it was a good decision to spend some time improving it, making it work for our use case and also making it work for other people.
So now it's really a general purpose library, not only for Filecoin; it can be used by anyone, and that's also its intention. On the CUDA side things don't look that happy: the RustaCUDA library is only partially maintained, and currently we even need to use a fork of it. But that should be short-lived; it's just a single function and I hope to get it upstreamed, because I really don't want to maintain a fork of it.
Then, of course, with the most recent version of Lotus and the proofs, what you can do is switch between OpenCL and CUDA. So we have code for OpenCL and CUDA, and of course you don't want to code everything twice, so we've built an abstraction library to make this easier. This is again not Filecoin specific but general purpose.
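As a rough idea of what such an abstraction can look like (my own sketch, not the library's actual API): a common trait for "a device that can run our kernels", with one implementation per backend, so the proving code itself never has to mention OpenCL or CUDA directly.

```rust
/// Sketch of a backend abstraction: the proving code only talks to this trait.
trait GpuBackend {
    fn name(&self) -> String;
    /// Run a hashing/multiexp kernel on the device (hypothetical signature).
    fn run_kernel(&self, input: &[u8]) -> Vec<u8>;
}

struct OpenClDevice;
struct CudaDevice;

impl GpuBackend for OpenClDevice {
    fn name(&self) -> String {
        "OpenCL device".into()
    }
    fn run_kernel(&self, input: &[u8]) -> Vec<u8> {
        // would enqueue an OpenCL kernel here
        input.to_vec()
    }
}

impl GpuBackend for CudaDevice {
    fn name(&self) -> String {
        "CUDA device".into()
    }
    fn run_kernel(&self, input: &[u8]) -> Vec<u8> {
        // would launch a CUDA kernel here
        input.to_vec()
    }
}

/// The caller picks a backend at runtime; the rest of the code is written once.
fn seal_with(backend: &dyn GpuBackend, data: &[u8]) -> Vec<u8> {
    println!("running on {}", backend.name());
    backend.run_kernel(data)
}
```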