From YouTube: ESPA Module 4A - Engineering Upkeep
Description
Raymond Zhang (VP of Engineering at PiKNiK) discusses engineering cluster upkeep and scripting at the Enterprise Storage Provider Accelerator (ESPA) bootcamp week that took place in February 2022.
Data is growing at an incredible speed and much of this data is archived and/or simply lost by enterprises. Our program will accelerate net-new Enterprise Storage Providers into the ecosystem, using a Web3 protocol with an impressive incentive model called Filecoin.
Learn more at:
Sign-up: https://web3espa.io
Landing Page: https://m.fil.org/espa-bootcamp
Follow ESPA:
Twitter: https://twitter.com/web3ESPA
LinkedIn: https://www.linkedin.com/company/web3espa/
Managing a small Filecoin operation is easy. Well, it's easier than enterprise-level storage. At the enterprise level, all the processes and configurations that you had as a small operation won't transfer as you scale up past a certain threshold. There is a miner in the community that we know of that didn't configure their setup properly to provide actual storage, and the result was that once they grew past a certain point, the whole operation broke. It took 10 days to fix, and they lost all the block rewards they had earned through the penalties for failing the data audit.
So as you get bigger, there are more audits and more chances of getting penalized, because if you fail them, you're just going to get slashed. WindowPoSt is the data audit that you're subjected to every 24 hours. Failing this audit means you'll be financially penalized and lose the opportunity to win block rewards, which means earning more money. So it's really important to make sure that your systems are healthy and that you're able to perform this audit within the time window you're allotted. Some things to monitor are resource utilization, temperature, and wallet balances.
The audit process is very resource intensive: it utilizes RAM, CPU, GPU, disk space, and I/O. If your resources are being overloaded, your system will crash. The high workload also results in higher temperatures, so you've got to make sure your system is running within a reasonable temperature range. Fans can fail, which will cause your temperatures to spike, and this can damage your hardware, so it's important to monitor that. And finally, watch wallet balances, because the audit requires a message to be submitted to the blockchain, to the network.
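As a minimal sketch of that kind of monitoring, the following Python script polls CPU, RAM, disk, and temperature readings with the psutil library and checks a wallet balance by shelling out to the lotus CLI (assuming lotus is on the PATH; the wallet address and alert thresholds here are hypothetical):

```python
import subprocess
import psutil  # pip install psutil

WALLET = "f1..."            # placeholder wallet address
MIN_BALANCE_FIL = 5.0       # hypothetical alert threshold
MAX_TEMP_C = 80.0           # hypothetical alert threshold

def check_resources():
    """Report utilization; WindowPoSt needs headroom on all of these."""
    print(f"CPU: {psutil.cpu_percent(interval=1):.0f}%")
    print(f"RAM: {psutil.virtual_memory().percent:.0f}%")
    print(f"Disk: {psutil.disk_usage('/').percent:.0f}%")

def check_temperatures():
    """Flag any sensor running hot (e.g. after a fan failure)."""
    for name, readings in psutil.sensors_temperatures().items():
        for r in readings:
            if r.current and r.current > MAX_TEMP_C:
                print(f"ALERT: {name}/{r.label or '?'} at {r.current:.0f}C")

def check_wallet():
    """WindowPoSt messages fail if the wallet can't pay gas."""
    out = subprocess.run(["lotus", "wallet", "balance", WALLET],
                         capture_output=True, text=True, check=True)
    balance = float(out.stdout.split()[0])  # output looks like "12.345 FIL"
    if balance < MIN_BALANCE_FIL:
        print(f"ALERT: wallet balance low: {balance} FIL")

if __name__ == "__main__":
    check_resources()
    check_temperatures()
    check_wallet()
```

In practice you'd run something like this from cron or a metrics agent and wire the alerts into your paging system rather than printing them.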
Some things to optimize: data access, because the audit process is reading data from all your storage media. It has to read the data, so the faster you're able to read the data, the higher your chances that you can finish the audit within the 30-minute window you're allotted.

As an enterprise-level storage provider, you have hundreds, if not thousands, of machines that you need to manage and make sure they're all working on tasks in coordination. With all these different machines, you have compute resources that you want to dynamically allocate to certain tasks; you don't want to waste resources by having them sit and do nothing. The way you can do this is through automation and orchestration tools that will tell machines that aren't working on something to work on something else, as in the sketch below.
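To make the idea concrete, here is a minimal, self-contained sketch of that kind of orchestration loop: a dispatcher keeps a backlog of tasks and hands one to any worker that reports itself idle. Real deployments would use an orchestration tool (or the scheduler built into the mining software) rather than this toy loop; every name here is illustrative:

```python
from collections import deque

# Backlog of work (in practice: sealing jobs, data transfers, proofs, ...)
backlog = deque(f"seal-sector-{i}" for i in range(10))

# Worker fleet and what each machine is currently doing (None = idle)
workers = {"worker-01": None, "worker-02": None, "worker-03": None}

def report_done(worker: str) -> None:
    """Called when a machine finishes its task and becomes idle."""
    print(f"{worker} finished {workers[worker]}")
    workers[worker] = None

def dispatch() -> None:
    """Assign backlog tasks to idle machines so nothing sits unused."""
    for worker, task in workers.items():
        if task is None and backlog:
            workers[worker] = backlog.popleft()
            print(f"{worker} -> {workers[worker]}")

if __name__ == "__main__":
    dispatch()                 # all three workers pick up a task
    report_done("worker-02")   # one frees up...
    dispatch()                 # ...and is immediately reassigned
```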
A
You
also
need
to
perform
maintenance
on
all
these
systems.
So
your
storage
system,
your
miner,
your
worker
nodes,
the
most
important
thing
to
call
out
here
is
the
software
updates
and
the
security
updates.
If
you're
unable
to
update
your
system
in
a
timely
manner,
your
systems
are
vulnerable
to
attack
during
that
time
period.
So
how
do
you
push
security
updates
to
thousands
of
machines?
There's
no
way
to
do
that,
except
with
automation,.
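As one hedged illustration, this Python sketch pushes a package upgrade to a fleet over SSH in parallel. It assumes key-based SSH access and apt-based hosts, and the hostnames are placeholders; at real scale you'd likely reach for a configuration-management tool such as Ansible instead:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Placeholder inventory; a real fleet would load thousands of hosts from a file
HOSTS = [f"worker-{i:04d}.example.internal" for i in range(1, 6)]
UPDATE_CMD = "sudo apt-get update && sudo apt-get -y upgrade"

def update_host(host: str) -> tuple[str, bool]:
    """Run the update command on one host over SSH."""
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", host, UPDATE_CMD],
        capture_output=True, text=True)
    return host, result.returncode == 0

if __name__ == "__main__":
    # Patch many machines concurrently instead of one at a time
    with ThreadPoolExecutor(max_workers=50) as pool:
        for host, ok in pool.map(update_host, HOSTS):
            print(f"{host}: {'updated' if ok else 'FAILED'}")
```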
If your chain node falls out of sync, you won't be able to tell the network that you're going to store a piece of data, so your whole operation comes to a standstill until you get back in sync with the blockchain network. In order to keep 100% uptime, you need to have multiple nodes syncing to the blockchain, so that in case one node fails, you can automatically fail over to the next node that is in sync with the blockchain and never have downtime.
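A minimal failover check might compare chain heights across your nodes over the Lotus JSON-RPC API and route traffic to whichever node is furthest along. This sketch assumes each node exposes the standard /rpc/v0 endpoint; depending on how the node's API permissions are configured you may also need to attach a JWT auth token, and the endpoints here are placeholders:

```python
import json
import urllib.request

# Placeholder endpoints for the redundant chain nodes
NODES = ["http://10.0.0.1:1234/rpc/v0", "http://10.0.0.2:1234/rpc/v0"]

def chain_height(endpoint: str) -> int:
    """Ask one Lotus node for its current chain head height."""
    payload = json.dumps({"jsonrpc": "2.0", "id": 1,
                          "method": "Filecoin.ChainHead", "params": []})
    req = urllib.request.Request(endpoint, data=payload.encode(),
                                 headers={"Content-Type": "application/json"})
    # Add an Authorization header here if your node requires a token
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.load(resp)["result"]["Height"]

def pick_node() -> str:
    """Fail over to whichever node is reachable and furthest in sync."""
    best, best_height = None, -1
    for node in NODES:
        try:
            height = chain_height(node)
        except Exception as err:        # unreachable or stuck node
            print(f"{node}: unreachable ({err})")
            continue
        if height > best_height:
            best, best_height = node, height
    return best

if __name__ == "__main__":
    print("active node:", pick_node())
```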
A
As
a
source
provider
you're
going
to
be
importing
tons
of
data
petabytes,
if
not
zetabytes,
and
the
only
real
way
to
do
this
to
manage
this
much
data
at
scale
is
to
have
some
type
of
automated
system
to
do
this.
The
file
coin
protocol
only
limits
64,
gibb
sectors
at
the
maximum,
but
most
commonly
32gb
is
used.
A
So
if
you
have
files
that
are
larger
than
this
size,
the
only
way
to
store
it
on
the
falcon
network
is
to
cut
them
up
into
smaller
pieces
so
that
you
can
put
them
into
the
sector
for
storing
data
on
the
network.
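A bare-bones chunker for that step might look like the following. Note that the usable payload of a sector is actually slightly less than the raw sector size because of padding, so the piece size here is an illustrative constant, not the exact protocol limit:

```python
import os

PIECE_SIZE = 32 * 1024**3   # illustrative; usable payload is a bit less
BUF_SIZE = 16 * 1024**2     # stream in 16 MiB buffers to bound memory use

def split_file(path: str) -> list[str]:
    """Cut a large file into sector-sized pieces for onboarding."""
    pieces, index = [], 0
    with open(path, "rb") as src:
        while True:
            written = 0
            piece_path = f"{path}.piece{index:04d}"
            with open(piece_path, "wb") as dst:
                while written < PIECE_SIZE:
                    buf = src.read(min(BUF_SIZE, PIECE_SIZE - written))
                    if not buf:
                        break
                    dst.write(buf)
                    written += len(buf)
            if written == 0:            # ran out of input: drop empty piece
                os.remove(piece_path)
                break
            pieces.append(piece_path)
            index += 1
    return pieces

if __name__ == "__main__":
    for p in split_file("big-dataset.tar"):  # placeholder input file
        print(p, os.path.getsize(p))
```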
To store data on the network, you also need to put up collateral. This incentivizes you to be a good actor on the network, so you don't just shut down once you have the money, once you've been paid by the customer and have their data. Managing collateral is important. As you store data, your collateral is returned to you over a vesting schedule, so it's slowly returned to you. The same goes for the block rewards that you earn: you don't get all the rewards immediately; those are also on a vesting schedule.
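As a toy model of such a schedule, the function below computes how much of a reward is spendable a given number of days later, assuming a portion unlocks immediately and the remainder vests linearly. The 25% / 180-day split mirrors what Filecoin adopted for block rewards in FIP-0004, but treat the numbers here as illustrative:

```python
IMMEDIATE_FRACTION = 0.25   # portion unlocked right away (per FIP-0004)
VESTING_DAYS = 180          # remainder vests linearly over this period

def available(reward_fil: float, days_elapsed: int) -> float:
    """FIL from one reward that is spendable `days_elapsed` days later."""
    immediate = reward_fil * IMMEDIATE_FRACTION
    vesting = reward_fil - immediate
    vested = vesting * min(days_elapsed, VESTING_DAYS) / VESTING_DAYS
    return immediate + vested

if __name__ == "__main__":
    for day in (0, 45, 90, 180):
        print(f"day {day:3d}: {available(10.0, day):.2f} of 10.00 FIL")
```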
So you need to be able to maintain a natural flow where FIL is coming in through block rewards and returned collateral, and then you're utilizing that again to store more data. For data storage, you need to have your sealing cluster. These are the systems that encode the data being stored so that it can be audited. So, based on how fast you can seal, you need to feed your systems data at a rate they can handle.
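That feed rate is simple arithmetic: sealing throughput times sector size gives the most data you can usefully ingest per day. A quick back-of-the-envelope helper, with hypothetical inputs:

```python
SECTOR_GIB = 32             # common sector size on mainnet

def max_ingest_tib_per_day(sectors_sealed_per_day: float) -> float:
    """Ingesting faster than you can seal just builds a backlog."""
    return sectors_sealed_per_day * SECTOR_GIB / 1024

if __name__ == "__main__":
    # e.g. a cluster sealing 48 sectors/day can absorb ~1.5 TiB/day
    print(f"{max_ingest_tib_per_day(48):.2f} TiB/day")
```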
Data preservation is the core function of a storage provider. Data degrades over time; that's inevitable. Hard drives will also fail, and data gets corrupted when bits get flipped, for whatever reason; that just happens. So it's really important that you have data redundancy to continue passing WindowPoSts even when the inevitable happens. And let's say your whole data center breaks down, or you lose power: it's important to have a full data backup somewhere for disaster recovery, so that you're able to restore the data and continue doing these audits.
A
However,
full
data
backup
is
really
expensive
at
the
enterprise
scale.
If
you
have
petabytes
or
zettabytes
of
data
you're
not
going
to
have
another
set
of
bite
or
petabyte
of
data
backed
up
somewhere
else,
that's
double
the
amount
of
that's
double
the
cost.
So
you
so.
In
order
to
tackle
this,
you
need
to
ensure
data
durability
through
the
compute
of
parity
bits.
So
these
are
bits
that
you
can
store
throughout
different
locations
and
you're
able
to
rebuild
the
original
data
from
these
bits.
But
you
also
need
to
be
proactive
about
this.
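The simplest version of the idea is XOR parity, sketched below: store one parity block alongside the data blocks, and any single lost block can be rebuilt from the survivors. Production systems generalize this with Reed-Solomon erasure coding to tolerate multiple losses; this is just an illustration of the principle:

```python
def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together byte by byte."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

if __name__ == "__main__":
    data = [b"AAAA", b"BBBB", b"CCCC"]     # three data blocks
    parity = xor_blocks(data)              # stored at another location

    # Disk holding block 1 dies; rebuild it from the rest plus parity
    rebuilt = xor_blocks([data[0], data[2], parity])
    assert rebuilt == data[1]
    print("rebuilt:", rebuilt)
```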
Data availability and preservation is vital. Like I mentioned, if you don't have the data for the audit, if you fail the audit, you're going to be penalized until you restore it, or until you go bankrupt. And finally, utilize resources efficiently by automating based on your needs. Your systems can only handle so much workload, and you also don't have unlimited FIL to post as collateral, so you need to figure out the optimal setup where your systems run smoothly and continuously.