From YouTube: How we operate Ceph at scale
Description
Event: https://ceph.io/en/community/events/2022/ceph-virtual/
Presented by: Matt Vandermeulen
How we operate Ceph at scale
As clusters grow in both size and quantity, operator effort should not grow at the same pace. In this talk, Matt Vandermeulen will discuss strategies and challenges for operating clusters of varying sizes in a rapidly growing environment for both RBD and object storage workloads based on DigitalOcean's experiences
Hi everyone, I'm Matt from the storage systems team at DigitalOcean. This is "How we operate Ceph at scale." There's a lot of content in these slides, so I'm basically going to be speedrunning it.
A quick run through of the agenda: I'll talk a little bit about what DO is and who we are on the storage systems team, then move on to our use of Ceph at DO, how we approach our automation and what we use it for, which leads into operating clusters. I'll finish off with a little bit of reflection, not just on Ceph but on our approaches as well, and we'll wrap up with, hopefully, some time for a hiring plug and some Q&A.
So what is DigitalOcean? We are a cloud provider founded in 2012, based on the core concept of simplicity. We started with the Droplet, a five-dollar SSD-backed virtual machine, which was very attractive in 2012. In 2016 we introduced our second product, Volumes: Ceph-backed, detachable Droplet storage. Since then, our product portfolio has grown significantly, including Spaces in 2017, which is our Ceph-backed, S3-compatible object storage offering, along with DBaaS, DOKS, App Platform, LBaaS and more, most recently serverless Functions and managed hosting with Cloudways. We have data centers in eight different regions, some with multiple metro choices such as SFO2 and 3, which give our customers more than a dozen choices for placing their resources. We had an IPO in 2021, and now we get to join in on the stocks.
Storage Systems is a small team of six engineers with a number of goals. The scale of the team and the scale of our deployment should not be tied together: there should never be a ratio of engineers to clusters when considering team size. Just because we add another N clusters over a year doesn't mean we can hire new engineers to dedicate to those clusters. A huge help with that is that we try to automate everything we possibly can, as idempotently as possible. The end goal, or the dream as it were, is that we'd never have to SSH into a node for day-to-day operations. At the same time, we're not going to let perfect be the enemy of progress; there's a lot of room for hacky, one-off bash scripts.
So let's talk about Ceph at DO. Ceph use at DO is growing rapidly. It's used for block and object storage, which powers both the Volumes and Spaces products at DO across many clusters, and other teams make heavy use of Volumes and Spaces for many of DO's other product offerings.
Some quick stats about DO; these are the numbers I'm allowed to share, and I can't go any further. We have 46 clusters in total: 38 are production, running Nautilus, and eight are staging, some of which are on Pacific. There's more than 140 petabytes of raw storage in Ceph, and our biggest clusters are over nine petabytes. This does not include the Droplet backup, snapshot, or bring-your-own-image storage, which is pretty staggering in itself. There are more than 23,000 OSDs in our fleet, across roughly 1,200 servers.
I want to give a quick mention to as many parts of the automation as I can, including outside of our team, so here's a quick whirlwind of what's outlined here. There's a long process of qualification and procurement that happens before we decide to deploy a set of equipment for a new generation of clusters. A lot of the stress testing is automated, though it's mostly through scripts and tooling that is run directly on the nodes, and we do the same sort of thing when qualifying new drive models as well. Our data center ops do the rack and stack.
They make the cabling beautiful, power on all the equipment, and hand it over to our network and hardware engineering teams. The networking team will then configure switches as appropriate; there's additional complexity here for the public-facing load balancers on Spaces clusters. The hardware engineering team takes the server nodes through a provisioning workflow, and they're left with a base OS with up-to-date firmware on all components.
Storage Systems then runs our portion of the provisioning workflow and populates a Ceph cluster from scratch. I mentioned a bunch of automation that we use, and some of the tools are listed here. Chef is used for all the core OS stuff and general config management. We use Ansible for things that are Ceph-specific, such as deploying a cluster from scratch or augmenting a cluster with more nodes or drives. AWX is an open-source, self-hosted solution for running Ansible playbooks; with it we can share failure modes with the team, and we have a detailed history of runs.
We still write one-off bash scripts from time to time where automation doesn't make sense. This is often due to something that's a one-off as we understand it at the time, but we still fully document those with context and tickets, because sometimes that stuff gets moved into a playbook. Again: don't let perfect be the enemy of immediate progress. Something to note here: we don't use cephadm at all.
It didn't exist when we started, and we haven't been convinced that we'd gain much benefit from it today, because of some of our requirements for secrets management and other DO ecosystem ties; we couldn't use it as a pure off-the-shelf solution anyway. We've had to support Luminous, Nautilus and Pacific, and upstream automation has changed between these. We require fine-grained control over the cluster layout and behavior for specific needs. We are still keeping tabs on the options available, for example Rook, and we may evaluate them in the future. So the bulk of our automation lies in our Ansible deployments.
Ultimately, this is what allows us to operate Ceph at scale. It's not particularly new or innovative, but it is cool to see a bunch of YAML turn a bunch of metal into user-consumable storage. As mentioned previously, new cluster deployments and augments are done through these playbooks. Augments come in two forms, either adding disks or adding hosts, and we handle both of them in a similar way, by expecting an increased disk count on each node.
Node reboots will safely reboot any of the nodes that we need to, either on demand when we want to, or when we just need to for kernel updates and the like. We set up our nodes and centralized config using the Ansible playbooks and reconfigure them at will. Ceph upgrades are also done through these playbooks, and this is pretty easy because we use containerized deployments. We've been running Ceph for a very long time, and because of that we've had some Filestore OSDs that we wanted to move to BlueStore.
This can be done safely with our playbooks, either by draining an OSD and recreating it, or dangerously, by destroying and recreating it in place. OSD restarts can be done one at a time or a host at a time, and the playbook will just wait for recovery to complete before moving on. Since we've been running these clusters for quite a while, eventually they get old and we need to shut them down.
This kind of teardown is largely handled by the automation, but it does not include getting the data off the cluster. Some of the utilities and goodies that we use to make all the other stuff work include roles like ceph_wait_healthy, which is possibly the most widely used role in our repo at this point. It ensures that the cluster is in an expected state before continuing. Determining health is super simple for a script and super boring for a person, so we let the script determine safety as appropriate.
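As a rough illustration of what such a wait-for-health check can look like (a minimal sketch, not our actual Ansible role; the timeout and interval are made up, and jq is assumed to be available):

```bash
#!/usr/bin/env bash
# Minimal sketch of a "wait until the cluster is healthy" helper.
# Polls `ceph health` and only returns once the cluster reports HEALTH_OK,
# or gives up after a timeout.
set -euo pipefail

timeout_secs=3600   # illustrative timeout
interval=30
elapsed=0

while true; do
    status=$(ceph health --format json | jq -r '.status')
    if [[ "$status" == "HEALTH_OK" ]]; then
        echo "cluster is healthy"
        exit 0
    fi
    if (( elapsed >= timeout_secs )); then
        echo "gave up waiting for HEALTH_OK (last status: $status)" >&2
        exit 1
    fi
    sleep "$interval"
    (( elapsed += interval ))
done
```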
There are node-maintenance-up and node-maintenance-down roles that safely pull any type of node out of service and bring it back in. They grab global maintenance locks, using RADOS locks, before progressing, which is just a primitive concurrency-control mechanism that leverages the target cluster. This allows us to make sure that no two operators are going to try to run different playbooks on the same cluster at the same time.
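For illustration, here is one way a primitive lock like that could be taken on the target cluster with the rados CLI (a hedged sketch, not our actual implementation; the pool, object and lock names are made up, and the exact lock subcommands should be checked against your rados version):

```bash
# Sketch: take an advisory lock on an object in the cluster being operated on,
# so two operators running playbooks at the same time would collide.
POOL=ops                       # hypothetical coordination pool
OBJ=maintenance-lock           # hypothetical object name
LOCK=playbook-run              # hypothetical lock name

# Acquire an exclusive, auto-expiring lock; this fails if someone else holds it.
rados -p "$POOL" lock get "$OBJ" "$LOCK" --lock-type exclusive --lock-duration 3600

# If acquisition fails, see who holds it.
rados -p "$POOL" lock info "$OBJ" "$LOCK"
```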
We also have Slack utilities that let interested parties know when things are happening as they happen, and there are of course ties into our secrets storage in order to push and create new keyrings for deployments as necessary.
A quick note about the automation: it of course thrives on consistency. Snowflakes are inherently inconsistent, and unfortunately some winters bring more snow than others. When doing operations, try to think about how a change in one cluster today might affect your assumptions tomorrow across the fleet.
There are some ways your cluster might end up being different from others, or even nodes within a cluster might differ. Hardware configs are the easiest deviation, especially as drives go end-of-life and you mix in the next generation. Centralized config can change between clusters, and you might also forget about that one single OSD that was given a specific config option, which might just cause confusion down the road. There might be a long-running script in the background that you completely forgot about, and now both you and the balancer are totally confused.
We've definitely had that happen; I highly recommend melting your snowflakes so they're all kind of part of the same puddle. Glossing over the fact that we build our own Ceph packages, I want to move on to deploying the cluster. Ideally this is as simple as firing off a playbook and just waiting until it succeeds or fails. Realistically it is generally that easy, but there's a ton of work that goes into these playbooks, and some of it is worth calling out explicitly.
First off, we make sure that Chef converges on a host; that's just kind of our base entry point, and then we start getting into the meat, which I'm just going to skim through here. There are some safety checks along the way, such as ensuring that all the drives on a host have the same size (a small sketch of that kind of check follows below). We then create systemd units for the daemons, configure manager modules, and more throughout this process.
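As an illustration of that kind of safety check (a minimal sketch; identifying data drives by a simple sd* pattern is an assumption, not how our inventory actually models drives):

```bash
#!/usr/bin/env bash
# Sketch: fail if the data drives on this host do not all report the same size.
set -euo pipefail

# Assumption: data drives show up as whole sd* disks; adjust the filter for real hardware.
sizes=$(lsblk -dn -o NAME,SIZE | awk '$1 ~ /^sd/ {print $2}' | sort -u)

if [[ $(wc -l <<< "$sizes") -ne 1 ]]; then
    echo "drive sizes differ on this host:" >&2
    echo "$sizes" >&2
    exit 1
fi
echo "all data drives report size: $sizes"
```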
The next pieces assume that's been done, and we do the standard dance of creating a monmap, deploying the mons and so forth, that sort of thing. Then we want to create our OSDs and pre-populate the CRUSH tree ahead of time (a sketch of what that boils down to follows below). This is super simple for us, because we have an Ansible inventory.
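To make "pre-populating the CRUSH tree" concrete, here is roughly what it boils down to in plain Ceph commands (a hedged sketch; the rack and host names are made up, and our playbooks drive this from the inventory rather than from a shell script):

```bash
# Sketch: build out the CRUSH hierarchy before any OSDs exist, so OSD creation
# can later run in parallel and each OSD lands in the right place.
ceph osd crush add-bucket rack-a1 rack
ceph osd crush move rack-a1 root=default

ceph osd crush add-bucket storage-node-01 host
ceph osd crush move storage-node-01 rack=rack-a1
```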
These Ansible inventories are generated for us by some other DO ecosystem tooling, and they carry specific attributes about placement, such as the rack, the number of disks, whether it's an index or a data node, that sort of thing. Next up, we deploy the OSD containers and start them across the entire cluster. This is also very simple, because we just enumerate the disks on the host and we have a tool that wraps ceph-volume lvm (a rough sketch of that enumeration follows below); and because the CRUSH tree was pre-populated, this can be done in parallel across the cluster, which makes it very quick for us.
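A rough idea of what "enumerate the disks and wrap ceph-volume" means in practice (a minimal sketch assuming bare sd* data disks and non-containerized OSD creation; our real tool handles containers, index versus data layouts, and error handling):

```bash
#!/usr/bin/env bash
# Sketch: create one OSD per unused data disk on this host.
set -eu

for disk in /dev/sd{b..z}; do
    [[ -b "$disk" ]] || continue            # skip device names that don't exist
    # Assumption: a disk with no partitions is fair game for an OSD.
    if ! lsblk -n "$disk" | grep part >/dev/null; then
        ceph-volume lvm create --data "$disk"
    fi
done
```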
Finally, we do a quick verification that all the OSDs we expected to create were created and started. We also check the cluster health at this point and verify that the cluster is healthy and as bored as it ever will be. In the future, we'd like to fire off this playbook automatically after generating the inventory from completed tickets during our handoff.
That would be an example of promoting an automation to a service, though it's effectively just orchestrating playbook launches. One of the biggest post-deployment operations we have is a capacity augment, and this is where block and object vary slightly: deploying the OSDs and the containers is the same, but giving them PGs is different. On the block side, we use a tool of ours, which is open source, to slowly upweight OSDs over time. This is mostly done to mitigate peering latency, which I'll talk about a little bit more in a moment.
The object side uses pgremapper, which is also open source, and it cancels backfill via upmaps. We then slowly undo those upmaps in a loop, which brings the PGs back to the new OSDs. This is done because traditionally object ran on hardware where flapping OSDs were not uncommon; the recovery wait from those flaps would get put off by ongoing backfill and eventually turn into backfill wait, and this just kind of snowballed into never-ending tears.
We mostly use pgremapper on object now to maximize backfill concurrency and minimize degradation. It's possible now that we could make use of the upmap balancer for both products: after we cancel backfill, the balancer would then start opportunistically removing the upmaps that aren't needed, and it can be turned off if necessary.
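To give a feel for the mechanism, this is the kind of loop that "slowly undoing the upmaps" implies, expressed directly against the Ceph CLI (a hedged sketch; pgremapper does this for us with proper pacing and safety checks, and the batch size here is made up):

```bash
#!/usr/bin/env bash
# Sketch: remove pg-upmap exceptions a few at a time, waiting for the resulting
# backfill to drain before removing the next batch.
set -eu

BATCH=10   # illustrative batch size

while true; do
    # PGs that currently have an upmap exception pinning them to their old OSDs.
    pgs=$(ceph osd dump --format json | jq -r '.pg_upmap_items[]?.pgid' | head -n "$BATCH")
    if [[ -z "$pgs" ]]; then
        break
    fi

    for pg in $pgs; do
        ceph osd rm-pg-upmap-items "$pg"
    done

    # Wait until no PGs are backfilling before the next batch.
    while ceph pg stat | grep backfill >/dev/null; do
        sleep 30
    done
done
```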
So now that we've got a cluster released to the world and it's no longer bored, we want to keep this thing up to date, handle failure modes, and do all sorts of maintenance. Some planned operations, such as cluster augments and capacity management, require this, as discussed. OSD restarts happen often, either due to Ceph updates, disk failures, or simple flapping.
Then there are the cases of a slow OSD, node reboots to keep a kernel up to date, nodes failing due to bad RAM, anything in the network stack, solar flares, you name it: all of these things will cause PGs to start peering, and during peering, no I/O can happen on those PGs. While peering is very, very quick on our block clusters, it's never going to be faster than our P99 read, and this can cause some cascading issues for our most latency-sensitive customers.
This is less important on the object clusters, because there's the HTTP overhead, and that latency is usually longer than PG peering. To give a bit of an idea, this is a P99 read latency graph, and those spikes there, you might be able to tell, are where OSD restarts happened. It's important to note that we measure this from inside the cluster against a real RBD image, which means we don't have all the overhead of the network between a droplet and the cluster, and we specifically measure I/O latency with this tool.
So what can we do about this peering latency? We can reduce the paxos propose interval from its default of two seconds to a quarter second, as Sage suggested on the mailing list.
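For context, that tunable is the monitors' paxos_propose_interval; with centralized config the change looks something like this (a sketch, with the quarter-second value from that suggestion; as noted next, this alone did not help us):

```bash
# Reduce how long the monitors batch map updates before committing them, so OSD
# map changes (and therefore peering decisions) propagate sooner.
ceph config set mon paxos_propose_interval 0.25

# Check the current value.
ceph config get mon paxos_propose_interval
```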
However, we actually observed that this made things a little bit worse for us, so we tried another approach: starting all the OSDs without letting their PGs actually peer, by setting noup. We can then check the admin socket on the OSDs for their current status and wait for them to just hang out in the preboot state. Once all the OSDs on a host are at preboot, we unset noup, allowing PGs to begin peering, which reduces the OSD map updates. Now we get to deal with the recovery overhead on the cluster for a bit, but for us that's still better than the peering. We know that we'll never be able to eliminate peering latency in an immediately consistent storage system.
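Put together as commands, that restart trick looks roughly like this (a minimal sketch; the systemd unit naming is an assumption, jq is assumed to be installed, and our playbooks do the equivalent with proper batching and checks):

```bash
#!/usr/bin/env bash
# Sketch: restart all OSDs on a host while holding them in "preboot" with the
# noup flag, then release them all at once so peering happens in one burst.
set -eu

ceph osd set noup

# Restart every OSD unit on this host (unit naming is an assumption).
systemctl restart 'ceph-osd@*'

# Wait until every local OSD reports the "preboot" state on its admin socket.
for sock in /var/run/ceph/ceph-osd.*.asok; do
    id=$(basename "$sock" | sed 's/ceph-osd\.\(.*\)\.asok/\1/')
    until [[ $(ceph daemon "osd.$id" status | jq -r '.state') == "preboot" ]]; do
        sleep 2
    done
done

ceph osd unset noup   # let all the PGs peer at once
```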
However, if we can make some progress in minimizing it, it really helps those applications that are the most sensitive to latency. So look at this graph; I zoomed in way further on this one to reduce the noise and show the difference when combining both the noup trick and the paxos propose interval that Sage recommended: we see a great improvement to latency during peering. So that's block; what about tricks on object? We have clusters with billions and billions of objects.
I can't share the numbers, but it's nuts, and the RGW index layer doesn't handle it very well when we have buckets way beyond the hundred-thousand-objects-per-shard rule of thumb, or when there are so many shards that they heavily impact list performance. This was a huge problem before we could do any kind of dynamic resharding, back in the Luminous days, but even after resharding, what about cleaning up the old shards?
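On Nautilus and later, leftover bucket index shards from old reshards can be inspected and removed with radosgw-admin (a hedged sketch; review the list carefully before removing anything on a real cluster):

```bash
# List bucket index instances left behind by earlier reshard operations.
radosgw-admin reshard stale-instances list

# Remove those stale instances, and with them their old index shards.
radosgw-admin reshard stale-instances rm
```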
The index lives in RocksDB, which is a log-structured merge tree and is append-only. That means every new entry in the database is of course a new write, but it also means that deletes are new writes, which we call tombstones, and a process to remove the deleted data is needed: that's RocksDB compaction. It must either read the full database, or ranges of it, to compact, and that's a lot of time spent in RocksDB code that isn't spent serving customer traffic.
It's important to note here that we aren't pointing the finger at either RGW's interaction with RocksDB or at RocksDB itself. However, we have observed a rough BlueFS interaction with RocksDB when iterating over large amounts of tombstones. This is largely improved today, because the Ceph developers and the community are awesome. The scale we're at is not what the RGW defaults are tailored for, though it has proven quite capable of handling our scale when tuned.
So we needed to dig into the index compactions to figure out what was up. In RocksDB there were files which won't compact until the level is full; this meant there was no upper bound on the tombstone lifetime, leading to slow iteration. We explored a lot, a lot of RocksDB options looking for silver bullets, and this effort started back when we were on Luminous. I'll stress that I've condensed many months of effort that much of our team put in into a single bullet.
On this slide there was a ton of discovery and testing throughout. With Nautilus, RocksDB was upgraded and we had more options to explore. So, our silver bullet: we discovered that newer RocksDB gave us access to TTL compaction. When stale data reaches a certain age, compaction is triggered on the file within RocksDB. This means that the first TTL compaction run on an OSD that hadn't had a full compaction in a long time took quite some time, but for us it was uneventful.
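To make that concrete: RocksDB exposes this as a ttl value in its options string, and BlueStore's RocksDB options can be adjusted through Ceph config. The snippet below is only a sketch of the idea with an illustrative TTL; overriding bluestore_rocksdb_options replaces the built-in defaults, so it needs to be merged with your existing options and tested, not copy-pasted. Manual full compaction, by contrast, is a standard command.

```bash
# Trigger a manual, full RocksDB compaction on one OSD (standard Ceph command).
ceph tell osd.123 compact

# Sketch of the TTL idea: append a RocksDB ttl (in seconds) to the BlueStore
# RocksDB options so files holding data older than the TTL get compacted
# automatically. WARNING: this setting replaces the default option string;
# keep your existing options and test on staging first. Value is illustrative.
ceph config set osd bluestore_rocksdb_options "<existing default options>,ttl=43200"
```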
During this time we disabled GC and LC, which is helpful here because they're index-heavy workloads. The load on our index nodes is consistently higher than it was previously, and for us this trade-off is absolutely worth it; the higher load isn't worrying for us at all, there's plenty of headroom, and the nodes are performing fine. The index utilization, in a capacity sense, has dropped a staggering amount, with some OSDs freeing up double-digit percentages of their used capacity.
We have since disabled periodic compaction in favor of TTL compaction, and this has been our biggest silver bullet for index stability. We have no reason to believe that this will backfire on us; hopefully next year we won't be here telling you that we were wrong, but these slides are a couple of months old at this point, so I think we're good. Hopefully.
Finally, we want to make sure that all the clusters are doing what we expect. First, similar to cephadm, the manager's Prometheus module was not available when we started, so we wrote the open-source ceph_exporter. That exporter is written in Go and doesn't rely on the manager, which has had some scaling issues in the past, but otherwise it accomplishes the same thing: keeping an eye on the fill rates and projections for capacity. This is especially important today, as supply chain issues make lead times absolutely terrifying.
This is different from having a finger on the pulse of what capacity is at today; we want to understand weekly, monthly and even yearly trends. We also wrote a storage exporter tool, which runs on each host and talks to all the admin sockets for a ton of extra insights. It can also check hardware on a host, such as reading SMART info: reallocated sectors, power-on hours, and so on. These are useful metrics to help identify whether a drive is headed towards failure.
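Those SMART attributes are the sort of thing you can pull per drive with smartctl, which is roughly what that host-level collection boils down to (a sketch; a real exporter reads these programmatically rather than scraping text, and the device path is hypothetical):

```bash
# Dump SMART data for one drive; reallocated sectors, power-on hours and wear
# indicators are the kinds of counters worth exporting and trending over time.
smartctl -a /dev/sda | grep -Ei 'reallocated_sector|power_on_hours|wearout|percentage used'
```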
You'll start to get an idea of how much write traffic a drive takes over its lifetime and how long that lifetime might be. We also monitor network reachability to every other host using fping, with an expected MTU and the don't-fragment flag. If this is ever flaky, we might have failures on a single link in a cluster, which in a distributed system can cause an entire world of confusion. Network monitoring for these sorts of things can be tricky, but identifying a single bad network link quickly is worth it.
We also have a tool that measures cluster latency, which I mentioned earlier; it's a very useful observability tool for getting a client-side perspective of what's going on. And as with all things monitoring, you should only alert on things that you can take action on.
An informative alert is not actionable; that's what our graphs are for. Something we're still working on is using Prometheus and Alertmanager's inhibit rules so that, in the example of network problems, only the network alert would fire instead of a slew of other alerts that fire because of that core network problem.
So, closing up, let's take a quick look at what we'd do differently in hindsight. We spent a lot of time with a division between the block and object teams; they used to be separate pillars under storage. This meant that we treated our clusters very differently: automation and configuration largely diverged, creating a lot of confusion and duplicate work throughout the life cycle of the clusters. There were multiple sources of truth for the different things Chef and Ansible can get their information from; if that were minimized, it would reduce a lot of confusion.
The use of the centralized config, instead of scattered ceph.conf files, for example, has been great. We have a lot of automation, and some of it gets pretty complicated; some of that might be better off as services. Just like determining whether something is worth automating, finding that line of when to promote automation to a service, and finding the right time (or finding the time at all) to do it, is challenging.
Melt all your snowflakes: a unique cluster is going to become a problem somehow, someday, and if all the snowflakes melt together, they become part of the same digital ocean. So thank you, that about wraps it up. A quick hiring plug: check out our careers page. And if we have some time here, I can try to take some questions.
We've got some questions coming in from chat. So, is the storage exporter open source? Unfortunately, it is not. It is something that we've kind of talked about back and forth, but there hasn't been an effort to actually look at open-sourcing it yet; I think we'd have to look at what it includes before we can look at that. Do we balance primary PGs? Balanced in what way?
This was, I think, originally because we ran on several generations of hardware. Way back in the day, we envisioned object as being used for a very different use case than what turned out to be web assets; we were expecting large objects at the time, so we geared those deployments for large objects, whereas all of the block deployments were deployed expecting all-block workloads.
So our newer-generation clusters are much, much more tuned to the widely varied workloads on object. As for balancing primary PGs evenly over all the OSDs: yes, we do use the upmap balancer today on the clusters. pgremapper has been useful for kind of circumventing that for other maintenance operations, like if we need to drain an OSD or just cancel ongoing remapped backfill for any reason.
Okay, well, thank you, Matt, for taking the time to present.