From YouTube: Erasure Code at Scale - Thomas William Byrne
Description
Cephalocon APAC 2018
March 22-23, 2018 - Beijing, China
Thomas William Byrne, Science and Technology Facilities Council Linux System Admin
So this is where I'm based: the Rutherford Appleton Laboratory in Oxfordshire, in the United Kingdom. There's a bunch of relatively large experiments on site. They're mainly particle physics experiments, but they have a whole host of users; biologists and chemists come and use them to study other things.
The LHC experiments at CERN produce a fairly ridiculous amount of data; I think this year they're meant to produce around 50 petabytes, and all of that data needs to be stored and analysed somewhere. The WLCG, the Worldwide LHC Computing Grid, was set up with the aim of doing exactly that. It's a collection of sites around the world involved in the storage and analysis of that data, and we're one of the biggest sites involved in the project.
We have over 30 petabytes of disk storage, plus tape storage and analysis machines. Currently around 10 petabytes of our storage is Ceph; that's what I'm working on, and as we go on more of that storage will become Ceph, so we'll be ramping that up. Also on the map there I have the Institute of High Energy Physics, which is just over the other side of Beijing, and they're also involved in the collaboration.
So what do you actually need to provide if you want to provide grid storage? It's pretty simple for the most part. Traditionally the storage was all hierarchical, filesystem-like storage, but there's no real need for that anymore. The experiments have moved away from using that sort of thing; for the most part they're just using the storage as an object store.
So one of the things we were looking at when we were replacing our storage was whether we could use an object store, and that was one of the reasons we started looking at Ceph. This is very much high-throughput computing; we're not talking about high-performance computing here. The problems are for the most part embarrassingly parallel and broken down into very small parts, so we're not particularly worried about single-stream performance, and we're not particularly worried about latency. There's no user interaction going on here.
All of this is jobs that have been scheduled. And then lastly, there are a few non-mainstream protocols that you need to support to actually work on the grid. Specifically those are GridFTP, which is used for external transfers, shuffling data between sites, and XRootD, which is used internally; the jobs fetch data off the storage with it.
So back around the release of Firefly we started looking into using Ceph for this role, replacing our disk-only storage. Right from the get-go we had an incredibly tight limit: the cost per terabyte had to be low and we weren't going to be able to push that. So there were going to be a couple of caveats to using Ceph.
Mainly, we were not going to be using replication, so we were looking at erasure coding from the beginning. We were always going to have large storage nodes; we weren't going to have a nicely specced Ceph cluster built for lots of IOPS. We were going to be looking at 30-plus drives in each storage node, and we weren't going to have nice things like SSDs for journals; everything was going to be co-located on the data disks.
So the way we ended up supporting the grid protocols we needed was by writing our own plugins directly on top of librados, or rather on top of libradosstriper. We had some help from CERN doing this; they were interested in using it for a different reason, so we were able to collaborate and make sure we got what we needed. We did try to get the experiments to use S3, but there was definitely limited success there.
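To give a feel for the layer those plugins sit on, here is a minimal sketch of writing and reading an object through the Python rados bindings; the pool and object names are hypothetical, and the real gateways are GridFTP/XRootD plugins against librados/libradosstriper rather than this script.

```python
# Minimal sketch of talking to a Ceph pool via the Python rados bindings.
# 'grid-data' and 'example-object' are placeholder names; the production
# gateways are GridFTP/XRootD plugins built on librados/libradosstriper.
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')  # cluster config and keyring come from ceph.conf
cluster.connect()
try:
    ioctx = cluster.open_ioctx('grid-data')            # open an I/O context on the data pool
    ioctx.write_full('example-object', b'payload')     # write a whole object in one call
    print(ioctx.read('example-object'))                # read it back
    ioctx.close()
finally:
    cluster.shutdown()
```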
That's always the goal with this: when we have new people coming on and using it, we try to push them towards industry-supported stuff and get them using RADOS Gateway. It makes our lives easier and it means we have fewer custom solutions in the end. Oh, and at some point during the whole process we came up with an acronym, so the cluster is called Echo.
So this is a quick diagram of how we actually get data in and out of our Ceph cluster. Starting on the right-hand side, that was basically all we had: our cluster, and then a couple of external gateway machines, which are just big machines with lots of networking that run all of the protocol servers we need. Data coming in from and going out to external sites goes through them, as did data going to the worker nodes.
That wasn't ideal, and it was quickly identified as a generally bad idea, so we wrapped up the XRootD server with the plugin and the various things it needed (the ceph.conf and the correct keyring) into a container, and now that runs on all the worker nodes. So when a job requests files on one of the worker nodes, it thinks it's speaking to the external gateways, but there's a little bit of host redirection going on.
Cool, so this was the first lump of storage we bought for it. There's not a great deal to say here; it's fairly standard. It started its life on Jewel, and we quickly upgraded to async messenger, sorry, to Kraken, because we had a lot of problems; I'll talk about them in more detail over the next few slides. It was upgraded to Luminous recently. There's nothing particularly interesting about this hardware.
So we had some decisions to make when we were working out how we were actually going to do the erasure coding, and this was back in 2014-2015; there wasn't a lot of information out there about what people were doing with large-scale erasure coding. So we spent a lot of time messing around with this, getting things to break, and then finally getting things working.
We ended up with eight data stripes and three parity stripes (8+3), which gave us an overhead we could afford, in the sense that we could buy enough storage to meet the pledges we needed to meet, and it gave us decent security. We were starting to see issues with RAID 6 on our existing system; we were losing data because things were breaking during rebuilds, so being protected against three failures was a lot better.
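For the arithmetic behind that choice, here is a quick sketch; the only inputs are the k and m values of the 8+3 profile described above.

```python
# Space overhead and fault tolerance for an erasure-coded pool with
# k data chunks and m parity chunks, using the 8+3 profile described above.
k, m = 8, 3

raw_per_usable = (k + m) / k                 # raw bytes stored per byte of usable data
overhead_pct = (raw_per_usable - 1) * 100

print(f"raw:usable ratio = {raw_per_usable:.3f}")  # 1.375, versus 3.0 for 3x replication
print(f"space overhead   = {overhead_pct:.1f}%")   # 37.5%
print(f"survives loss of up to {m} shards per placement group")
```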
Initially we were interested in going for a higher number of data stripes. Obviously, the more data stripes you have, the smaller the overhead is for the same amount of data security, so that was something we were interested in. But what we found was that it was very hard to keep a cluster stable when you've got placement groups that contain 19 OSDs each; things didn't work very well. This was again pre async messenger, so this was on Jewel.
Things have changed since then. The rest of these settings are very much the defaults. We were already taking a fairly large gamble by using erasure coding at the time; there weren't a lot of people using it, and it seemed sensible to stick with the things most people would be using, so that we hopefully weren't the first people to run into problems when things went wrong.
Cool, so a little bit about what our pools are actually made up of. Our largest data pool is just over 4 petabytes now, with 2048 placement groups. So we've got a little bit over 2 terabytes of data per placement group, which is fairly large compared to what a lot of people are doing.
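As a rough sketch of that per-placement-group arithmetic, using the pool size and PG count quoted above and the shard count implied by the 8+3 profile:

```python
# Rough data-per-PG arithmetic for the pool described above.
pool_bytes = 4 * 10**15   # just over 4 PB
pg_num = 2048             # placement groups in the pool
k, m = 8, 3               # 8+3 erasure-code profile

data_per_pg_tb = pool_bytes / pg_num / 10**12
shards_per_pg = k + m     # each EC placement group spans k+m OSDs

print(f"~{data_per_pg_tb:.0f} TB of data per placement group")  # ~2 TB
print(f"each placement group spans {shards_per_pg} OSDs")
```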
It does make things like deep scrubs take well over an hour, which can be a bit annoying sometimes, and this is something we are looking at; we will be increasing this number, and we're going to be aiming for around one terabyte per placement group, which I think is a reasonable target. One of the things that bit us early on is that if you've got very large EC placement groups with lots of OSDs in them, the recommendations for how many placement groups you should have in your cluster
don't necessarily work particularly well: you end up with far too many placement groups, or you end up with each OSD being part of far too many placement groups, which can lead to serious performance issues. In terms of having so much data in each placement group, we're not seeing any issues; when everything is working well we're getting the throughput we need, and in that respect it's working fine.
What we were seeing issues with in the early days was actually just the amount of communication involved in a large placement group like this, when you've got 11 OSDs needing to communicate with each other in order to peer that placement group. What we ended up with was a situation where peering would stop OSD heartbeats from getting through, and so OSDs would end up knocking other OSDs offline; they would be marked down.
So we spent quite a lot of time tuning, trying to figure out how we could stop that happening. This is what worked for us: essentially making the OSDs slightly more resistant to being marked down and out so early, and that made it a lot more stable. I'll say again, this was all in the early days, on Jewel, so this was pre async messenger; things have got a lot better with async messenger.
So I'm not going to dwell on this too long; there's not a lot of interesting information here. Our CRUSH setup we've kept very simple because of the nature of what we're doing: we're not worried about trying to have as much availability as possible, and so it was deemed acceptable that we would be in danger of losing availability if we lost power to a rack. That sort of thing doesn't happen often; we've got dual power supplies in all of our racks and our networking is redundant for the most part.
So I mentioned earlier that we use libradosstriper throughout for the plugins behind our external protocol servers. When objects come in from the experiments they come in at a range of sizes, so it makes sense to stripe them down to a manageable size of object on the disks, and libradosstriper does that. As I think was mentioned earlier, the default stripe size for libradosstriper is somewhere around 4 megabytes.
That number makes sense when you're talking about replicated pools: you end up with 4-megabyte objects on the disks, which is a completely reasonable size. When you start talking about large numbers of data stripes in erasure-coded pools, you suddenly end up with very small objects on the disks; in this case we would end up with 512-kilobyte chunks.
If we were looking at higher k numbers, which we were, it would be even smaller objects. It suddenly means that your actual performance is incredibly dependent on how many IOPS you can do, whereas for the use case I've described we're mainly worried about throughput. So we did a lot of investigation into what you should actually do: what's a good stripe size, and what size of object on disk should you actually aim for?
In our case that meant going with libradosstriper striping things into 64-megabyte stripes, which means at the end of the day you've got 8-megabyte chunks on each disk, and that works for us. It was the least bad option and it works fairly well.
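A quick sketch of that chunk-size arithmetic, using the stripe sizes discussed above and k from the 8+3 profile:

```python
# On-disk chunk size for a striped object written into a k+m erasure-coded pool:
# each rados object produced by the striper gets split into k data chunks.
def chunk_size_mb(stripe_size_mb: float, k: int) -> float:
    return stripe_size_mb / k

k = 8
print(chunk_size_mb(4, k))    # 0.5 -> the default ~4 MB stripes give 512 KB chunks on disk
print(chunk_size_mb(64, k))   # 8.0 -> 64 MB stripes give the 8 MB chunks we aim for
```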
And the bottom point there is basically what I've just said: if you then lose a placement group, you've suddenly lost a little bit of every single object in your cluster. So I've described a lot of the decisions that went into actually building it and a lot of the considerations we had to think about, and now I'm going to talk a little bit about what it's actually like living with it and how it works for us. I guess the first easy thing to talk about is: does it actually work? And yes, it does work.
It will be interesting to see how far it can go. All the performance problems we have had have been entirely, or mostly, due to the external gateways: things like badly thought out buffer sizes leading to excessive memory usage, and port exhaustion due to misconfiguration, that sort of stuff. So that's been interesting, and the gateways on the worker nodes have been really, really good.
Okay, so the other sort of performance you care about when you're running a cluster is what the backfilling is like. Again, in the Jewel days we were seeing issues with backfill causing high load and impacting client IO, and we spent a lot of time trying to tune that down. This was some of the tuning we did, and we're in a much happier place now.
Backfilling goes on as part of the normal operation of running it; most weeks we'll be doing something, and it has no effect on client IO from what we can see. We'll happily be doing 20 or 30 gigabytes of backfill traffic alongside 10 or 15 gigabytes of client traffic with no issues. The bottleneck we do see when we're actually adding nodes is the networking: the cluster network of the node we're adding will be saturated while there's enough backfill to do.
This is a taste of one of the operational issues we had in the summer. When we built Echo we started out with 30 of the nodes, and then added the remaining 30 as sort of an experiment: can we actually do this, does this actually work for us?
What we ran into was this bug, that tracker there. A quick rundown: when an erasure-coded placement group is backfilling, the primary will be sending out requests for all of the shards from all of the other OSDs. If one of those OSDs can't read its shard, because there's a pending sector or something's gone wrong on the disk, it will reply and say: I can't read that shard. And the primary, instead of recognising that that's okay and that
it can still reconstruct the object, will crash, which is really unfortunate. Obviously it will then be restarted and crash again, and then when systemd stops restarting it, the second OSD in the set will become the primary, carry on the backfilling, and then suffer the same fate. So this was interesting; it hit us fairly hard. We ended up with multiple placement groups down over the course of an afternoon and no clear idea of what was going on.
While we were trying to figure out what was going on, we ended up doing some fairly drastic things to try and recover the placement groups, and we ended up actually losing one, which was pretty unfortunate. Once we got to the bottom of it, we managed to start removing any OSD that had even a single pending sector, and that turned into a bit of an operational nightmare.
It's been a fairly tough six months trying to deal with running a cluster and doing cluster operations, most of which involve backfilling, when you're in a state where any backfilling has the potential of bringing down a placement group and reducing your data availability. As of a few weeks ago this is actually fixed, which is nice, so when I get back I will be upgrading as soon as I possibly can. So this was an interesting problem we had. Continuing on the theme of problems:
this is the other main operational issue that we get when running the cluster. I think the thing that takes up most of our day-to-day time in terms of mundane things is inconsistent placement groups. An inconsistent placement group is when deep scrubbing happens and the shards of an object don't agree with each other. We get a fair amount of these, and most of them, in fact all of them without fail, have been due to there being a pending sector on a disk.
So a disk has a sector that's become unreadable, and when the OSD tries to read the shard, it can't. I'd say probably nine times out of ten, maybe even more than that, this doesn't result in us taking that disk out. Healthy disks do develop pending sectors; they are designed to work around them, and the firmware will work around it.
It will remap the sector, so most of the time we will not be taking the disk out; we'll just be doing a PG repair, which will rewrite the broken shard and then rescrub the placement group, and then it will come back healthy. In this respect I have a slight complaint: the HEALTH_ERR for what is a harmless problem on these types of pools seems like a lot of overkill.
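For reference, a minimal sketch of that repair loop driven from Python; it assumes the standard ceph and rados CLI tools are installed, and 'grid-data' is a placeholder pool name.

```python
# Sketch of the inconsistent-PG repair workflow described above, assuming the
# standard ceph/rados CLI tools are on the path; 'grid-data' is a placeholder pool.
import json
import subprocess

def sh(*cmd):
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

# Placement groups flagged inconsistent by deep scrub (JSON list of PG IDs).
inconsistent = json.loads(sh("rados", "list-inconsistent-pg", "grid-data"))

for pgid in inconsistent:
    print(f"repairing {pgid}")
    sh("ceph", "pg", "repair", pgid)   # rewrite the bad shard, then the PG gets rescrubbed
```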
It makes monitoring things like ceph health awkward. If you're doing call-outs, the health of the cluster is something it makes a lot of sense to call out on, but if you're suddenly in this state for over ten hours a week it's a bit of a nightmare. That's something that's interesting. We've actually kind of got around that: we use scrub scheduling in the opposite way to the last speaker, and we schedule it so we only scrub during daytime hours.
It's been a large part of our operational and development work, figuring out how we deal with these. We've been coming up with short-term solutions of pulling disks out as soon as they get pending sectors and then essentially remapping them: writing to the whole disk, reading from the whole disk, deeming it healthy, and putting it back in. This is a lot of work for what are single sectors on a disk going bad, which can be expected of any disk over its lifetime.
So having to move terabytes of data around for single pending sectors does seem a little bit silly, and this is one of the areas where we didn't expect to be putting work in with Ceph. It's interesting that we are, and it will be interesting to see where we go with that in the future, whether we get better at dealing with them or whether Ceph gets better at dealing with them.
This works okay for us. We do a lot of this as part of Echo's yearly operations: there will be nodes going in and nodes coming out, new generations every year and old generations being retired every year. So what we've ended up doing is using ceph-deploy for the normal work of getting OSDs onto disks, and then when we're making changes, when we're doing reweights, all of that is just manual CRUSH map edits.
With the normal commands it's trickier to roll things back, and it's not as quick; with manual edits you can just push the old CRUSH map back in. Being able to do manual CRUSH map edits and then apply them as a single step change, with just one recalculation, so you can say "ah, six percent of objects misplaced, that's what we expected", seems to be a much cleaner way to do it. It would be cool if there was a tool to do this.
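As a sketch of the manual workflow being described, using the standard ceph and crushtool commands; the file names are placeholders.

```python
# Sketch of the offline CRUSH map edit workflow described above, using the
# standard ceph and crushtool CLIs; file names are placeholders.
import subprocess

def sh(*cmd):
    subprocess.run(cmd, check=True)

sh("ceph", "osd", "getcrushmap", "-o", "crushmap.bin")       # dump the live CRUSH map
sh("crushtool", "-d", "crushmap.bin", "-o", "crushmap.txt")  # decompile to editable text

# ... batch up weight/bucket edits in crushmap.txt offline ...

sh("crushtool", "-c", "crushmap.txt", "-o", "crushmap.new")  # recompile the edited map
sh("ceph", "osd", "setcrushmap", "-i", "crushmap.new")       # push it back as one step change
# Keeping crushmap.bin around means rolling back is just another setcrushmap.
```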
I think this is something that would improve usability, certainly for us, and I'm sure other people would enjoy it too: being able to batch up a bunch of changes on an offline copy of the CRUSH map and then push them out in one go. You could make it as user-friendly as you wanted, and you could do all sorts of fun stuff with actually analysing what's happening and what data you're moving around.
Reweight-by-utilization is perfectly effective at keeping the OSDs that are nearly full moving back into the middle of the pack, but you never really lose the long tail: you've always got OSDs that only have one or two placement groups with data in them. It's certainly something that I think could be improved, and it is one of the things that has been improved; we've been looking at the balancer module, which is new with Luminous, and it does seem to do some things a lot better.
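For reference, turning the balancer module on looks roughly like this on a Luminous or later cluster; the commands are from the standard ceph CLI, and the mode choice here is only illustrative.

```python
# Rough sketch of enabling the Luminous balancer module via the standard ceph CLI.
# The mode is illustrative: upmap needs Luminous-or-newer clients, crush-compat does not.
import subprocess

def sh(*cmd):
    subprocess.run(cmd, check=True)

sh("ceph", "mgr", "module", "enable", "balancer")  # make sure the mgr module is loaded
sh("ceph", "balancer", "mode", "crush-compat")     # or "upmap" if every client is new enough
sh("ceph", "balancer", "on")                       # start balancing PG placement automatically
```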
It's going to be really exciting to see how Ceph handles us putting a lot more hardware into it. Echo is currently around 10 petabytes; it's going to be around 30 petabytes in a year and a bit's time, and so it's going to be really interesting to see how the performance goes
and improves with that. There are also some things that I'm not so excited about. I don't really see the issues we're having with disks doing anything but scaling up with the number of disks, especially as we start having aging generations of hardware in there; you've got five-year-old hardware and brand-new hardware, and how do you deal with that? So that will be fun. And then finally, to sum it all up: erasure-coded Ceph on large storage nodes.
It's definitely working for us. It was a bit of a gamble when we started; there were some questions where we weren't quite sure which way they were going to go, but we've got a lot more confidence now, and it's clear that Ceph as a community is getting behind erasure coding, with RBD on erasure coding and CephFS on erasure coding.