From YouTube: Ceph Days NYC 2023: 100 Years of Sports on Ceph
Description
Presented by: Frank Yang
Working together with a major American sports league, we built a multi-site 40 PB active archive housing over 100 years of game video and audio assets by using Ceph as the foundational storage technology. Along the way, we learned many lessons about architecting, deploying, and operationalizing Ceph from the vantage point of a large, modern, and rapidly growing media company. We would like to share our experience and learnings with the community to help others traveling a similar road.
https://ceph.io/en/community/events/2023/ceph-days-nyc/
A
So my name is Frank Yang, and that's the title of the talk today. By the way, we didn't get permission to use the league's logo and name, but they are here in the audience, so I'd like to take the time to thank our partner. They had the vision to start this project, and we are very fortunate and appreciative to have been invited on the journey.
A
Right, with that: the starting point of this project, 100 years of sports, is a very large set of data, and it's mostly video data. The motivation comes from where this data sits today.
A
The data is stuck on tape today, and it's a large and growing set of media data. Now, this isn't just regular data. It's irreplaceable data, not test data that we can regenerate; it's of historical and cultural importance. So a lot of the motivation is around preservation, making sure the data is available not just now but for future generations. These are the crown jewels of the league, and so, in addition to preserving the data, we want to be able to actually do something with it.
A
So it's not just about preserving the data, but being able to compute on it, to run analytics on it, and to monetize it. Okay, so what kind of infrastructure, what kind of storage do we need? This is where the motivation is to open up this infrastructure, not just for today but for the future. The size of the data is growing, and the types of data being put in are changing as well.
A
These days, not only do we have more cameras and more angles, we have higher frame rates, and we have data now that isn't video: the metrics, not just from the game itself but from the audience and from other use cases in the stadium. So the requirements for putting this infrastructure together are that not only do we have to have robust data, we want fast access to that data.
A
The data needs to be not only durable and available, but accessible from multiple sites, anywhere, anytime, at the fastest speed we can make possible. We want to be free from any vendor lock-in, hence Ceph and the other open source projects you'll see later that are used in this project. And, fundamentally, the cost structure of this needs to be better than simply dumping the data into the public cloud.
A
Okay, so those are the criteria we set forth as we started this project. So why Ceph? Open source is definitely one of the major motivations, as I mentioned earlier, but among the other open source options out there, Ceph is the only one, to us anyway, that's viable, because it's proven. It's got a large community, which is why we're all here, and it's actively being developed; there's stuff coming down the pipe.
A
It's not a static open source project, and it's free from any particular choice of hardware vendor; the Ceph implementations out there are using not just different vendors but different types of hardware, so that's very important to us as well. And at the end of the day, it's all about control of the data. This data is staying within the league; they want full control over it, and they want it available to do what they need with it.
A
And so that's why we went down the path with Ceph.
A
So what did we set out to build? This is a multi-year, multi-phase project. In the first phase we are putting together 40 petabytes of what we call an active archive. The bars we set for ourselves in this first phase are, one, as I mentioned earlier, the cost structure of it. This is all on-prem.
A
You know, in colo. So when you add together all the hardware, the folks involved, all the software requirements and so forth, it needs to have a lower cost structure than being in the public cloud. We need high durability; this is using Ceph as an object store, with S3-type access.
A
So we want to be comparable to cloud-durability type numbers. Highly available, like I said: access anywhere, survive a site failure, survive hardware failures, survive software failures, so there's a lot of high availability involved in this. And we want to be able to scale to hundreds of petabytes. We may start with 40, but like I said, the data keeps growing, so we want this thing to be able to scale to hundreds of petabytes and perhaps beyond. The next one is very important.
A
It's about operational efficiency. The folks in the audience here, or a lot of them, are all experts in Ceph, but we don't want the entire IT team to have to be Ceph experts. We want to make this operationally efficient. We need your usual IT folks to be able to manage a cluster of this size, and it's not just the storage itself but all the things around it.
A
So it's a turnkey, easy-button for software-defined storage, essentially, including all the other pieces of what it takes to be able to access the data, and we want the ability to compute. Like I said, it's not just the storage; it's the compute associated with it, the networking, the access, the user controls, the certificate controls, and all the other aspects of making this possible.
A
So this is what we ended up with. It took a lot of planning, but this is what's built today. There are two sites, not surprisingly, one on the west coast and one on the east coast of the United States, and we have a media manager whose job in this first phase is to take the data from the tape archive and put it into these two sites as copies. So the two sites basically hold identical copies of the data.
A
There's an extra copy that goes into the public cloud, but most of the active computing is done directly on-prem, within these sites. You'll notice that, in addition to the production cluster, which is the eight racks per site holding the production data, we have, very importantly, these sandboxes, and I can't stress enough the importance of having them. The sandboxes are essentially mini replicas of the production cluster: they have the identical hardware.
A
They have the identical setup; they just have less of it. Same versions running, same setup, and that's where the staging happens before we push anything into production: upgrades, loading new software, configuration changes all happen on the sandbox first. But it's also where we can run experiments, so before we decide what we want to put out there, the performance tuning is all done on the sandboxes first. It's also the canary in the coal mine.
A
In case something does happen, and it has happened; again, I can't stress the importance enough. Any bugs, any usability issues, any user errors are all detected first on the sandbox before they, hopefully, ever show up in the production environment. The networking within each site is all 100G networking; we want to make sure that networking is not the bottleneck. There are also 200-gig links between the two sites over a private network.
A
This entire lower half of the diagram is all private network, where we make sure the bandwidth is not the bottleneck.
A
Okay, so what's in these racks? The OSDs are in JBODs. The JBODs are zoned into two halves, and each half basically has an OSD server managing 53 OSDs. Then we also have, within the clusters, the compute nodes, which today mostly serve the purpose of being the RADOS gateways and the load balancers.
A
How the racks get filled is all planned out: in the first phase the racks are only partially populated, but we've already mapped out how many racks and where things will be placed as we upgrade. In fact, this year we're in the process of doubling the size, from 40 petabytes to 2x that. And just some interesting facts: when these racks are fully populated, each rack draws about 20 kilowatts, so that's about 40 refrigerators' worth of power being consumed
A
on each rack; that's the density. And then there's the weight of the racks: add up not just the servers but all the physical drives, and these are rotational drives, for cost reasons.
A
And, you know, it's an archive, so we use rotational drives. If you add up the entire weight of one rack, it's over a ton; it's about the size of a small car, and we have many of these. So, just some interesting facts there.
A
The current state: it's live, it's working right now, and we are actually putting data into it. The total storage between the two sites is about 36 petabytes, so about 18 per site. Right now we have 44 OSD servers. The number of JBODs is 22, with 100-plus drives in each one, so in total we have over 2,000 drives spread across the two sites.
A
We've got 16 compute nodes, like I said, doing mostly RADOS gateways and load balancers, and about 20 terabits per second of networking capacity combined between these sites. Every node has 400 gigabits per second of links out of the server, for redundancy reasons and also for bandwidth reasons. And, as I mentioned earlier, there's 200 gigabits per second between the two sites.
A
So remember the criteria we set for ourselves. On the overall economics, we looked at a five-year TCO, and I think we hit the target. The hardware is definitely an up-front cost, but if you look at the five-year amortization of the hardware compared to a typical $250k-per-petabyte type of cloud cost,
A
we come out ahead. A lot of that is that we don't have the overhead of a lot of the software licenses that would otherwise be required, whether you build this on-prem or use the cloud. But, of course, disclaimer: your mileage may vary. Our partner has pretty good buying power, so depending on your discounts and your buying power, your mileage may differ.
B
C
C
A
So this is comparing against the public cloud costs, if you were to just search for the costs on the internet. Included in here are the hardware costs, some software cost overheads, and the number of people required to operate it, amortized over five years.
A
All right, so what were the pain points? How did we build it, what did we go through, where were the challenges? I mean, Ceph is not easy, but that wasn't the main problem. A lot of this is the careful planning up front. Like I said, this is not a test environment; this is not a cluster we're going to put together and tear down.
A
This has to last for years and longer, so a lot of it is careful planning up front: planning out the rack elevations, selecting the hardware, both for the initial phase and for the future expansions, and you have to take cost into consideration. There's plenty of redundancy and performance planning built into the hardware, but we don't have an infinite budget to do this either, and the same goes for the networking aspect of it,
A
which I talked about. At the end of it, it's just good old systems engineering; it's not just the Ceph aspect of it. Earlier we heard about how to make things simple, to make them more consumable for the users. Bringing up Ceph is one thing.
A
Ceph has tons of knobs that you can tune depending on how you're using it, and we have to distill that down to something that's easier for an operator who's not a Ceph expert to consume and use. And then there are other things that are needed to make this into a service, an active-archive service that can be consumed. So it takes a lot of things other than Ceph to make this possible.
A
So here's just a sample of all the other open source projects that are used to make this possible. We've got things that interact at the hardware level, talking directly either to the Linux environment or to the hardware, orchestrating or pulling data out. We've got things on the platform side for orchestration across the different servers, so that they're managed like clusters. And we've got things that are user-facing, APIs and GUIs written to abstract things away and make them simpler.
A
To give an example: we have data that's collected from the OSD nodes, say. We have agents running on there collecting the data, and it's sent back to the microservices running on the back end over Kafka. So we use Kafka as a reliable bus to talk, at least in one direction, getting the metrics back to the back end. At the back end, we're storing state.
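As a rough illustration of that agent-to-backend path, here is a minimal sketch of an agent publishing one metric sample onto a Kafka topic. It assumes the segmentio/kafka-go client; the broker address, topic name, and message schema are invented for the example and are not from the talk.

```go
// Hypothetical sketch: an agent on an OSD node publishing a metric sample
// onto a Kafka topic, assuming the segmentio/kafka-go client.
package main

import (
	"context"
	"encoding/json"
	"log"
	"os"
	"time"

	"github.com/segmentio/kafka-go"
)

// MetricSample is an assumed message shape; the real schema is not described in the talk.
type MetricSample struct {
	Host      string    `json:"host"`
	Name      string    `json:"name"`
	Value     float64   `json:"value"`
	Timestamp time.Time `json:"timestamp"`
}

func main() {
	// Broker address and topic name are placeholders.
	w := &kafka.Writer{
		Addr:     kafka.TCP("kafka.backend.internal:9092"),
		Topic:    "osd-node-metrics",
		Balancer: &kafka.Hash{},
	}
	defer w.Close()

	host, _ := os.Hostname()
	sample := MetricSample{Host: host, Name: "osd_bytes_used", Value: 123456789, Timestamp: time.Now()}

	payload, err := json.Marshal(sample)
	if err != nil {
		log.Fatal(err)
	}

	// Key by host so samples from one node stay ordered on one partition.
	if err := w.WriteMessages(context.Background(),
		kafka.Message{Key: []byte(host), Value: payload},
	); err != nil {
		log.Fatalf("write to kafka: %v", err)
	}
}
```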
A
These are the states: some of the state for Ceph, some of the desired state for all the applications, their versions, and the configurations for the networking, etc. The states are stored in Postgres; that's what Postgres is there for.
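A minimal sketch of what storing a desired-state record might look like, assuming Go's standard database/sql with the lib/pq driver; the table, columns, and values are illustrative only, not their actual schema.

```go
// Hypothetical sketch: recording a desired-state record in Postgres.
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq" // Postgres driver
)

func main() {
	// Connection string is a placeholder.
	db, err := sql.Open("postgres", "postgres://ops:secret@state-db.internal/archive?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Upsert the desired version of one application on one node (illustrative values).
	_, err = db.Exec(`
		INSERT INTO desired_state (node, application, version)
		VALUES ($1, $2, $3)
		ON CONFLICT (node, application)
		DO UPDATE SET version = EXCLUDED.version`,
		"osd-host-01", "radosgw", "17.2.6")
	if err != nil {
		log.Fatal(err)
	}
}
```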
A
We have metrics that also come off this bus and get built into a time series, and the time series are stored in Prometheus. We're also using Prometheus to generate alarms. I guess we could have people staring at dashboards all day long, which we do provide, but we also want any failures or threshold crossings, any events of interest, to generate automatic emails and Slack messages to the operators.
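A minimal sketch of the metrics side, assuming the prometheus/client_golang library: a backend service exposes a gauge on /metrics for Prometheus to scrape, and the alerting (email, Slack) would then be defined on top of that time series. The metric name, labels, and port are made up for the example.

```go
// Hypothetical sketch: exposing a gauge for Prometheus to scrape and alert on.
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var osdUp = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "archive_osd_up",
		Help: "1 if the OSD is reported up by the agent, 0 otherwise.",
	},
	[]string{"site", "host", "osd"},
)

func main() {
	prometheus.MustRegister(osdUp)

	// Values would normally come from the Kafka consumer; hard-coded here.
	osdUp.WithLabelValues("east", "osd-host-01", "osd.42").Set(1)

	// Prometheus scrapes this endpoint; threshold-crossing alerts are
	// configured in Prometheus itself.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9402", nil))
}
```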
A
For some of the states and metrics, we need fast access, because a lot of the reactions and remediations are built in. If something happens and it's something the software can remediate, we're not waiting for an operator to come in and click buttons. So for things that require fast action, the data needs to be available quickly.
A
Some of that data is cached in Redis, because the microservices are running across different containers, and storing it in one process's memory isn't sufficient when multiple containers need to access the data. That's what the Redis caching is for. So that just gives a sample of all the things involved to make this possible, and there's logic coordinating all of it.
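A minimal sketch of that shared cache, assuming the go-redis client: one container writes a piece of recently discovered state, and any other container can read it without re-querying the cluster. The address, key, and value are illustrative.

```go
// Hypothetical sketch: caching frequently needed state in Redis so any
// microservice container can read it quickly.
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "redis.backend.internal:6379"})

	// One container writes the latest discovered state with a short TTL...
	if err := rdb.Set(ctx, "cluster:east:health", "HEALTH_OK", 30*time.Second).Err(); err != nil {
		log.Fatal(err)
	}

	// ...and another container reads it without touching Ceph again.
	health, err := rdb.Get(ctx, "cluster:east:health").Result()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("cached health:", health)
}
```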
A
This is not just scripting, not just some guy running scripts by hand. This is logic that's all built in; it's programmatic, and we can do a lot more when we're using programmatic code, like Go, to tie all these open source pieces together. This is where all the determinations happen: comparing the discovered state versus the desired state. If they don't match, is this something to remediate immediately, or is it something I need to alarm on and have a person come in and get involved?
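A minimal sketch of that desired-versus-discovered comparison in Go; the types, field names, and remediation hooks are invented for illustration and are not the actual implementation described in the talk.

```go
// Hypothetical sketch: compare desired vs. discovered state, remediate what
// the software can fix automatically, alarm for anything needing a person.
package main

import "log"

type State struct {
	Version string
	Running bool
}

func reconcile(name string, desired, discovered State) {
	switch {
	case desired == discovered:
		return // nothing to do
	case desired.Running && !discovered.Running:
		log.Printf("%s: service down, restarting automatically", name)
		// restartService(name) would go here (hypothetical helper)
	case desired.Version != discovered.Version:
		log.Printf("%s: version drift (%s -> %s), alarming operator",
			name, discovered.Version, desired.Version)
		// sendSlackAlert(...) / sendEmail(...) would go here (hypothetical helpers)
	}
}

func main() {
	reconcile("radosgw@osd-host-01",
		State{Version: "17.2.6", Running: true},
		State{Version: "17.2.5", Running: true})
}
```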
A
It's the logic for things like checking the RADOS gateways; putting in the certificates, managing the certificates, verifying the certificates; user access control, dishing out the credentials for who has access and who doesn't; and then also a different layer of access, basically RBAC for the infrastructure, for the storage, for the S3, etc.
A
And we've got agent collectors running on all the nodes that, like I said earlier, take all the data and feed it back into the back end.
A
To give another, real-life example of how this came into play: not that long ago, in the sandbox — well, because it's a sandbox, we tend to do a lot of experiments in there. We deploy, we tear down, we experiment, we create failure scenarios just to test recovery, so there's usually stuff left over in there.
A
We try to clean it up as much as possible, but we didn't realize it at the time. Part of the orchestration is done with ceph-ansible, and we were trying to remove a node, using ceph-ansible to purge the node from Ceph. ceph-ansible does what it thinks is the right thing: it wants to purge the OSDs belonging to that Ceph node. But it doesn't query Ceph.
A
It goes to /usr/lib — oh sorry, /var/lib/ceph/osd — and just looks for the files in there and assumes: hey, there are all your OSDs, let's go purge them. Well, it turns out
A
some of those files were stale leftovers for OSDs that actually lived on other nodes. Perfect example, great use case for the sandbox. Once we realized that, it was very easy for us to go change the logic; this is where the power of using a language like Go comes in. You can compile quickly, you can change quickly, we can push updates out quickly. So in a matter of a day or so, we could go out there and discover, through our agents, all the files in there, and we can query Ceph.
A
We can compare and make decisions about which files in there are real and which are no longer valid, stale files that need to be removed. And we added checks: okay, if you're going to purge a node, here are the OSDs you think you're going to remove; oh, and by the way, before you remove anything, let's run software to compare against what's actually in Ceph before you go and execute.
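A minimal sketch of that kind of pre-purge check, assuming the standard /var/lib/ceph/osd directory layout and the `ceph osd find` CLI command; the surrounding program structure is invented for illustration rather than taken from their actual tooling.

```go
// Hypothetical sketch: for every OSD directory found locally, ask the cluster
// which host that OSD really belongs to, and flag anything that does not
// belong to this node as a stale leftover rather than something to purge.
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"os"
	"os/exec"
	"strings"
)

type osdFind struct {
	Host string `json:"host"`
}

func main() {
	hostname, _ := os.Hostname()

	// Directories such as /var/lib/ceph/osd/ceph-12 are what ceph-ansible
	// uses to decide which OSDs to purge.
	entries, err := os.ReadDir("/var/lib/ceph/osd")
	if err != nil {
		log.Fatal(err)
	}

	for _, e := range entries {
		id := strings.TrimPrefix(e.Name(), "ceph-")

		// Ask the cluster itself about this OSD before trusting the directory.
		out, err := exec.Command("ceph", "osd", "find", id, "-f", "json").Output()
		if err != nil {
			fmt.Printf("%s: not known to the cluster, stale directory\n", e.Name())
			continue
		}
		var info osdFind
		if err := json.Unmarshal(out, &info); err != nil {
			log.Fatal(err)
		}
		if info.Host != hostname {
			fmt.Printf("%s: belongs to %s, do NOT purge from this node\n", e.Name(), info.Host)
			continue
		}
		fmt.Printf("%s: confirmed on this node, safe to include in the purge\n", e.Name())
	}
}
```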
A
So these are the kinds of things where the logic ties together not just the Ceph aspect of it but all the other peripherals and all the other applications. That's the power of what this logic can do, and it's actually necessary if you want to operate reliably, long-term, and easily in this type of large, multi-cluster, multi-site environment.
A
Okay. Making it robust, making it reliable, gets to be more difficult, but you've got to persevere through it, and once you get past that, Ceph is a great platform. It's great not just for storing a media archive but for general-purpose storage. The archive I talked about is not only media; they use it for VM backups, they use it for any storage, and so it's a very worthwhile investment.
A
Systems engineering, like I said: all the work in terms of hardware, networking, and the other software and applications; how are you going to use it, how are you going to secure it. Those are all important things, and again they need to be planned out not just for day zero but for the long term, for where you think the clusters are heading. And then the automation, the smarts. I think at any given point in time
A
we all become experts in some area of Ceph or something else, because of a bug we're debugging or some feature we're writing, and then we move on to something else and we forget how smart we were two years ago. We need to embed those smarts not just in documentation but into the software itself. Hopefully a lot of that is embedded in Ceph itself,
A
and Ceph versions do progress, but for the stuff that's not inside, the smarts that are required for the interaction between Ceph and either the infrastructure itself or the applications,
A
those smarts need to be embedded in software. That's how, over generations, as people come and go and we move on to newer and better things, the automation remains smart and is able to do a lot of things on its own, without having to go back to documentation or call the original guy that wrote the feature, and things like that. And then, live and die
A
by the QA and the sandboxes; I can't stress the importance of that enough. Performance tuning, staging, debugging, troubleshooting: it's been a lifesaver for us multiple times, many times over. And, you know, the economics: having this environment is all-you-can-eat. Once we have this infrastructure running, especially with the sandboxes, any experiments we want to try, anything we want to do, it's there for us. It's an all-you-can-eat type of scenario, so that's great.
A
So I'll close there. Again, thank you to our partners for making this possible, and I'll take questions.
D
The tape-to-Ceph thing is interesting: what was the business justification, or the reason, to move things that you said are the crown jewels off of a medium like tape and onto Ceph?
A
Well, tape — those of you who have dealt with tape, I mean, it's an aging industry. The hardware is harder and harder to come by, the upgrade cycles are tough, and the whole upgrade process is crusty at best. So the motivation there is: if you leave it on tape, how much life does it have?
A
So that's one of the major justifications: if you don't do it, there's actually a non-zero chance that the data may not be there anymore years from now. Then it's just the economics of comparing tape to rotating drives, and that's how it was justified.
E
About how you did the performance tuning for the cluster: did you end up with something like an 8+3 erasure coding?
A
Yes, thank you, I forgot to mention that. In the production environment the erasure coding is 8+4, so 50% overhead. It was chosen based on the number of racks that were available and the constraints between the racks available at one site versus the other; we could add more racks at one of the sites, but given the limitations, and wanting to keep the two sites identical, that's how we chose it.
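For reference, the arithmetic behind that 50% figure for an 8+4 profile (k = 8 data chunks, m = 4 coding chunks), shown as a generic worked example rather than their exact capacity numbers:

$$
\text{space overhead} = \frac{m}{k} = \frac{4}{8} = 50\%,
\qquad
\text{raw capacity required} = \text{usable} \times \frac{k+m}{k} = \text{usable} \times 1.5,
$$

and a pool with this profile stays readable as long as no more than m = 4 of the 12 chunks of an object are lost.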
A
And then, when we do the performance tuning, a lot of that, besides tapping into the community for what's possible, is empirical. We have nice automations for doing that: being able to try the number of RADOS Gateway daemons, the number of daemons per node, the number of load balancers we put in front of them, and how those load balancers are distributed. We ended up with HAProxy as the load balancer, load-balancing three to four RADOS gateways per server, on the same server, and that is a building block that gets repeated
A
N times, as many as we need for the type of access. In the beginning we're basically just taking data from tape and throwing it into Ceph, so we try to have many of these; later on they may get reduced.
B
How will you maintain and check consistency between the different sites and also the public cloud as you grow, in terms of the number of objects stored and the amount of data stored, especially after you've had some kind of outage anywhere in any of the three? The consistency of the data between the three copies, basically.
A
Yeah, well, that's a good question. At the infrastructure level right now it's difficult, and we're not quite doing it there, in terms of querying Ceph itself. The media manager that you're seeing up on top there handles how the data is replicated, which site it sits in, and whether the copies are consistent; that's actually being done at that higher level right now.
B
A
That's right, so we're not. We actually did play with the Ceph replication at one point, doing it across the two sites, and maybe in something coming later we will have it, but right now Ceph doesn't have the ability to go replicate itself into a public cloud, so that higher-level mechanism is needed anyway, and it just became the de facto approach.
F
So this is not so much a question; I think this is an excellent project. I'd like to pose a question to the rest of the team here. I know you say that tape is an aging technology, but I want to let you know that tape — because we have a tape business, I can tell you that right now that business is growing like crazy, because of the hyperscalers; there's just too much data.
F
In order to prevent climate change, you need to find a way to sustain it; again, your project is great. So what we're thinking about is whether we can come up with something, because the problem with tape is that the interface is hard to use. Think about it: you get some kids out of college and ask them about tape; like myself, they have no idea what rewind means. There's a forward and a rewind; what is the rewind?
F
Remember that? So we're thinking about whether we can put an object storage interface in front of tape, to make tape more consumable. The problem is not so much that tape technology is aging; it's that tape is hard to use and hard to manage. If we were to put an object interface in front of tape, could we solve that problem?
A
I suppose, to the question: that's a valid approach, yeah. And like I said, it's a question that's not just for me; it's for the team here. No, I think the economics of it, and maybe the environmental aspect of it, definitely make sense.
A
But there are other considerations to keep in mind as well. It is also about the access: the fast access, and also random access to any part of the data. Sometimes we're accessing even parts of a file.
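As an illustration of that kind of partial-object access, here is a minimal sketch of an S3 ranged GET against a RADOS Gateway endpoint, assuming the AWS SDK for Go v1 (which works against Ceph RGW's S3 API); the endpoint, bucket, and key are placeholders, not from the talk.

```go
// Hypothetical sketch: fetch only a byte range of a large media object
// through the S3 API exposed by RGW.
package main

import (
	"fmt"
	"io"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	// Endpoint points at the RGW / HAProxy front end; credentials come from
	// the usual AWS environment or shared config.
	sess := session.Must(session.NewSession(&aws.Config{
		Region:           aws.String("us-east-1"),
		Endpoint:         aws.String("https://rgw.east.internal"),
		S3ForcePathStyle: aws.Bool(true),
	}))
	client := s3.New(sess)

	// Fetch only the first megabyte of the object.
	out, err := client.GetObject(&s3.GetObjectInput{
		Bucket: aws.String("game-video"),
		Key:    aws.String("1956/world-series/game5.mxf"),
		Range:  aws.String("bytes=0-1048575"),
	})
	if err != nil {
		log.Fatal(err)
	}
	defer out.Body.Close()

	n, _ := io.Copy(io.Discard, out.Body)
	fmt.Printf("read %d bytes\n", n)
}
```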
F
E
A
Well, yeah, definitely. Right now, just putting the entire petabytes of data into Ceph, compared to robot arms going in there and fetching a tape,
A
that's already an improvement, a huge improvement. With tape there are caches — there are servers and rotational drives there caching it — but not at this size; now we have the entire content.
A
Well, some of that is also for the future. The purpose of those SSDs, part of it, is so that the metadata can be stored there, but we do have expansion slots available on those OSD servers, so there is talk of it. That's again the beauty of Ceph: we can have massive petabytes of data on rotational drives, but we can also have small pools alongside.
A
B
A
Well, some of the tests we do for reliability are to protect against failures, so we do try to mimic things like a hardware failure or a drive failure. And the project has been around for a couple of years now, so we have actually had real hardware failures and real drive failures in the production environment that we had to go resolve. So I don't have the numbers in terms of the actual
A
number of nines; we would have to tally up all the run times and the number of times we encountered issues. But, you know, we haven't lost data in the production environment.