From YouTube: Ceph Science Working Group 2023-01-31
Description
Join us for Ceph Science Working Group meetings. We alternate the third and last Tuesday of each month at 14:00 UTC: https://ceph.io/en/community/meetups
Ceph website: https://ceph.io
Ceph blog: https://ceph.io/en/news/blog/
Contribute to Ceph: https://ceph.io/en/developers/contrib...
What is Ceph: https://ceph.io/en/discover/
A
You know, big clusters, small clusters, cloud research, whatever: we get together in a chat for an hour or so about anything, Ceph issues, upgrades, whatever. These meetings I record and post to the Ceph channel, usually within a few days or so. I'm not a presenter, I just organize them, and hopefully we end up having a good chat that I try to prod along a little, I suppose. So, with that, anybody here who hasn't chatted before or joined one of these want to say hi, or say what science you do?
B
I just started talking, so, you know... I think I sat through one of these before and watched in the background but didn't talk, but I'll talk this time. Garen Atterbury; I work with the University of Nebraska-Lincoln, well, the University of Nebraska system. Our research computing group is called the Holland Computing Center. We have a few Ceph clusters of various shapes and sizes.

B
The biggest one: we have a Tier 2 site for the CMS project, which is part of CERN's Large Hadron Collider. That used to be a large Hadoop file system where we just used HDFS, but we migrated to Ceph, as did most of the other US sites for this project.
B
Yeah, we went with CephFS for this particular cluster. The majority of the data on the 17-raw-petabyte one is in this CephFS file system. In the past, when it was HDFS, we only ran the HDFS file system component; we didn't actually do MapReduce and the Hadoop things on top of that. It's honestly been a pleasant experience: the file system comes along happy as can be. I was planning on upgrading to 17 at some point, I just haven't gotten there.

B
We have a few other Ceph file systems as well, one that's roughly a petabyte that was, well, it was intended to be mostly an object store to back a Nextcloud instance.

B
Largely it's unused, or underutilized I should say; not a lot of activity on that project. And then we also use Ceph via Rook. We have new Kubernetes clusters that we've spun up for various reasons, so one has a bunch of pure NVMe nodes running Ceph; it's actually triply replicated. I know a lot of people do double because of the reliability of NVMes.
B
In theory, anyway. But that's another one we have, and then, because we've had success with Ceph, I guess we're planning on building a new, roughly five-to-six-petabyte Ceph cluster that will largely be general-purpose storage for our campus and our researchers. The intent, well, my intent, is that it's the start of a storage platform. We have a chance to get some brand new hardware, use that, and utilize it in various ways.

B
The majority will be CephFS again, but we also have people wanting to do some block device things, which would be backed by it as well, as well as likely the Nextcloud, so the object store side. So yeah, I guess I'm here just because I thought this was interesting. We're a scientific research computing center, all the stuff we do is research oriented, and our success with Ceph in the past few years has led to that becoming the default path that we're trying to take going forward.
A
I've got a question. So for your big 17-petabyte cluster, do you process directly off of that for the physics data for CMS?
B
Yeah, yes and no. The CMS workflows are mixed. The majority of the access to it is largely bulk data storage. In the past, when we had HDFS, we actually had a mix: the data nodes were also the worker nodes of the cluster, so big servers with good CPUs and then disks in them too. As we moved to Ceph, the servers with OSDs are separate; they don't run any of the CMS workflows.

B
There are cases where that happens, and we do see heavy usage at times that pretty much just maxes out whatever our capabilities are, but a lot of it is remote transfer in and out over the WAN, to and from other sites, just because of the nature of how that project works. The working set size is fairly small, hundreds of terabytes or less; the majority of it is data that just happens to be staged around various sites across the planet for replication and redundancy, and sites remotely access that and stream reads from our file system when workflows demand data that we happen to have. I think that answers your question, maybe.
A
Yeah, I think so. I know at UW Madison here there's the physics group... I think we're a Tier 2 site somewhere on campus as well, but I think they do a lot of processing in place on their cluster.

B
Here too; actually, I know the people that work on it fairly well. They're still running HDFS and plan to continue with HDFS for a while.
C
Yeah, great, thanks, Garen, and thanks everyone. I was invited to this meeting by Thomas Bennett, who was also in the same institution. I'm from the South African Radio Astronomy Observatory; we're based in Cape Town in South Africa, and we have Ceph clusters in operation there. There are a few of them, but we're using Ceph to provide object storage for the data products of the MeerKAT radio telescope.

C
As far as the detailed technical details go, I'm not going to try and pretend I know exactly how it's set up. Thomas Bennett has all of those details and he's on the call; I'm not sure if you know him. This is part of a transitioning process: Thomas has now moved out of SARAO and is moving into a private consultancy, and hopefully he'll still be supporting our Ceph cluster.

C
So I'm here for continuity and, of course, to maybe learn a thing or two as we go along. Thanks for hearing me out.
F
Anybody else got a "hello, here's my cluster" type of thing they want to do? What do you think?

G
Okay, I think I could say a word or two. I'm Pieter from CSC; you've most likely heard about our clusters already, but just to say hello: I haven't been in this meeting before. I'm basically trying to make sure that we and the other guys have the resources and capability to run our machines.
D
I can continue with the CSC side. I don't remember if I told you a year ago, or within the year, about our plans to make supercomputer-compatible S3 authentication. At that time we were planning to use token-based authentication: there would be a secret and a key, and then a token, for every user, and the token would expire. Then we could put S3 credentials on the supercomputer side, on those batch queue systems, without worrying that we were leaking too many credentials over time; so, temporary keys for certain usage on the supercomputer side. That was really promising at the start, but it fell apart because of client-side tools: their handling of S3 credentials together with the token failed.
D
Some of the tools understand the token principle, but the majority of the tools our customers were using were failing. So now we've been developing a method to auto-expire keys on S3. If you're familiar with the Ceph API you can expire keys quite easily, but with S3 they last forever: if you have a key and a secret, people tend to put them places and forget that they have, for example, leaked the keys.
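A minimal sketch of what that kind of auto-expiry can look like with plain radosgw-admin, assuming admin access on an RGW node; the user ID "alice" and the handling around the batch job are illustrative, not CSC's actual tooling:

    # Mint a short-lived S3 key pair for a batch job; we choose the key material
    # ourselves so we don't have to parse it back out of the user-info JSON.
    ACCESS_KEY=$(openssl rand -hex 10)
    SECRET_KEY=$(openssl rand -base64 30)
    radosgw-admin key create --uid=alice --key-type=s3 \
        --access-key="$ACCESS_KEY" --secret="$SECRET_KEY"

    # ... hand the pair to the job on the supercomputer side and wait for it to finish ...

    # Revoke the pair afterwards so a leaked copy is useless.
    radosgw-admin key rm --uid=alice --key-type=s3 --access-key="$ACCESS_KEY"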
A
I think that'd be an interesting talk or whatever.

E
So, if I could just ask: you were saying that the tokens weren't working for standard tools. Which standard tools were you trying that weren't supporting tokens?
D
We were trying multiple tools: everything from rclone to s3cmd to Cyberduck, things like that.

D
Most of them worked just fine, but not widely enough across the different tools. So that was a good intention, and really the way it should be done; I still think that should be the way. And we found some bugs in the RADOS Gateway internal code already with that, and we have made progress on fixing those token problems on the back end, on the radosgw side.
D
And the size of the cluster is 40 petabytes. We have eight racks of machines, and when you are adding a node it starts failing: the timeouts get way too long when you are adding a node with the orchestrator.
H
Do you remember what I think the limiting factor was, regarding cephadm, like the automatic check or something? We tried to extend it, but after extending it, it would do something else funky.
B
I was asking because I'm at just over a thousand OSDs on Pacific with the orchestrator, after we'd gone through various other bugs with the orchestrator to get to that point, mostly the ones with nodes that had too many OSDs in them; I was happy to see that one get fixed. So yeah, I was just wondering how much further I have to go before I have to start worrying.
D
But, for example, with Quincy, if you're running RADOS gateways, there is a quality-of-service component in Quincy, and from my light testing I really like it, because I can suppress some of it.
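For reference, the Quincy rate-limiting feature being praised here is driven from radosgw-admin; a rough sketch, with the user ID and limit values made up for illustration:

    # Cap a single RGW user's request rate (Quincy's per-user/per-bucket QoS).
    radosgw-admin ratelimit set --ratelimit-scope=user --uid=heavy-user \
        --max-read-ops=1024 --max-write-ops=256
    radosgw-admin ratelimit enable --ratelimit-scope=user --uid=heavy-user
    radosgw-admin ratelimit get --ratelimit-scope=user --uid=heavy-user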
E
Yes, I know that Luca from the Pawsey Supercomputing Centre gave us some information on their experiences with Quincy at scale, and I see there is actually a link here in one of the blogs: it's called "Quincy at scale", testing with the Pawsey Supercomputing Centre. I see they've got 4,320 OSDs, 69 petabytes, which they deployed with Quincy. I don't know what they were using before, but I can just copy and paste it into the chat.
A
All right, I guess we'll just hit some things on the topics list: upgrades that people want to talk about that have gone really good, really bad, or somewhere in the middle.

A
My contribution to that is, I have a cluster that is on a mix of CentOS 7 and 8, and one of my team members is working on that conversion so that we can actually upgrade past Octopus; Pacific would be really nice one of these days. He figured out how to do an in-place conversion,
A
Where
you
know,
if
we
Kickstart
using
in
place
the
only
destroy
the
OS
drives,
don't
touch
any
of
the
osds
and
then
when
it
comes
up
it
just
it
has
a
process
that
you
figured
out
to
bring
the
osds
up,
discover
them
with
stuff
volume,
bring
them
up,
and
you
know
it
takes
60
Minutes
of
host
or
something
I
think
I
saw
him
if
he
joined
the
call.
So.
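The reactivation step being described maps onto ceph-volume roughly like this; a sketch only, assuming a non-containerized deployment where the Ceph packages, /etc/ceph/ceph.conf and the bootstrap-osd keyring have already been restored after the reinstall:

    ceph-volume lvm list                      # discover the untouched OSD LVs and their metadata
    ceph-volume lvm activate --all            # recreate tmpfs mounts/systemd units and start every OSD
    ceph osd tree | grep "$(hostname -s)"     # confirm the host's OSDs rejoin the cluster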
A
So yeah, I hope that answers their question. Doing a full node drain would have been very painful with that many; we definitely skip that part and just do the in-place conversion, and it's going to save us tons of time at the end of the day.
I
I'll just do a quick recap: my name's Bruno Canning, I'm working at the Wellcome Sanger Institute, south of Cambridge here in the United Kingdom. We are one of the largest genome sequencing and genomics research facilities in the world. We're running a 51-node Ceph cluster with 60 drives per storage node; it was procured about five, six years ago now.

I
So it's a little long in the tooth: six-terabyte drives for each OSD. The purpose of the cluster is mainly two-fold: to serve as a back end for OpenStack, so our users can create VMs with bespoke analysis workflows, and they can do so, obviously, as root; and we also operate a RADOS Gateway service, which is essentially just a data drop for users.
I
We have other data storage facilities on site, like iRODS, which gives us metadata annotation of our data as well, and we operate compute farms too. So, our Ceph cluster: it was running Bionic, Ubuntu 18.04, and we upgraded to Focal, which is 20.04. We had to go to Focal because we're on Octopus and we wanted, I think we're required, we can't go straight to Quincy; I think there's a change in the OSD code such that we have to go via Pacific first. And so, our upgrade:
I
We started with our monitor hosts, one at a time, because we wanted to be very careful with these and observe them. The entire cluster is three-way replicated, and we have four failure domains, so it spans four racks.
I
The other thing is degradation in data redundancy. One thing I did notice, which is perhaps related to the Ceph orchestrator problem observed by you guys at CSC, is that when it's time to bring the OSDs back online, if you set noup, Ceph will then patiently bring all the OSDs up, but they won't actually start peering. You can bring up, you know, 14 times 16 OSDs and they're all ready to go, and then, in one command,
I
You
are
you
the
no
upper
flag
and
you
don't
get.
You
obviously
get
a
huge
amount
of
peering
happening,
but
you
don't
get
the
state
where
there's
a
long
delay
between
the
first
OSD
coming
online
of
the
upgraded
hosts
and
the
last
one,
because
this
will
this
will
cause
backfilling
and
then,
as
more
rsds,
come
online
Seth
realizes
that?
Oh,
some
of
the
backfilling
it's
already
embarked
on
is
is
actually
not
not
required.
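What that pattern looks like on the command line, as a sketch (a package-based host here; an orchestrator-managed cluster would start the daemons its own way):

    ceph osd set noup                  # OSDs may boot but won't be marked up yet
    systemctl start ceph-osd.target    # start every OSD on the reinstalled/upgraded host
    # ... wait until all of the host's OSDs have registered ...
    ceph osd unset noup                # now they all come up and peer in one go
    ceph -s                            # watch peering settle without spurious backfill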
I
So, smooth upgrades on the whole. I think it took us, we weren't active on it every day, but we did it in about 16 days, and that includes weekends when we weren't working.
I
At least, I was a little horrified when one of my colleagues suggested that I could take an entire failure domain off, but he said, well, look: we had a power failure in the data centre some time ago and the cluster just kept on running, kept on serving I/O, and then, when the power was restored to that failure domain, it catches up, backfills, and it's HEALTH_OK again. So, you know, it is scary to take that many nodes off in one go.
B
You
have
a
question
since
talking
about
Crush
map
back
when
I
was
first
starting
with
Seth,
and
we
had
a
mix
of
node
sizes.
Some
were
small
40
terabytes,
some
were
800
terabytes
and
it
turned
out.
I
had
not
not
enough
of
each
type
to
do
what
I
was
expecting
to
do
and
I
had
all
sorts
of
balancing
problems
that
went
through
all
of
that
all
resolved
after
having
a
sufficient
quantity
to
meet
the
you
know
erase
your
coding
profile.
We
had
set,
of
course,
and
everything
seems
fine.
B
So
my
generic
question
is
whether
there
have
been
any
developments
there
or
are
there
any
cases
with
some
of
you
that
have
larger
clusters
and
perhaps
a
more
mixed
environment
where
there
are
still
challenges
as
far
as
keeping
this
utilization
balanced
across
the
large
things,
or
is
that
largely
solved?
And
you
know
happy
easy
times.
A
I
think,
even
with
our
art
festival,
you
know
we
had
six
eight
ten,
twelve,
whatever
14
terabyte
drives
on
at
once.
You
know
24
each
and
ever
since
you
know
they
started
putting
the
effort
into
the
manager
and
the
balancer
and
up
map
back
in
the
day
yeah.
It
was
a
pain,
keeping
things
balanced
across
a
cluster.
But
now
it's
pretty,
in
my
opinion,
it's
pretty
trivial.
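The built-in machinery being referred to is just a couple of commands; a minimal sketch (upmap mode additionally requires clients to be at least Luminous-capable):

    ceph osd set-require-min-compat-client luminous   # upmap needs Luminous+ clients
    ceph balancer mode upmap
    ceph balancer on
    ceph balancer status
    ceph osd df tree    # check that per-OSD utilization converges over time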
D
If an OSD fills past seventy-odd percent it's too much, and there's small data on some OSDs that's really hard to move out properly, because in a way it's in the right place, but it makes some of the operations really, really slow.
J
There's a third-party one; I think I've got the link to it, actually, I'll put that in the chat. I've seen it on the mailing list a few times, recommended by various people.

J
Yeah, I'm a bit foggy on the details between the different balancers right now, but I seem to remember that one of them may have been better for more complicated CRUSH maps, for example.
B
One
you
listed
is
actually
the
one
that
I
used
and
it
was
recommended
to
from
people
at
CERN
on
some
of
their
early
days
with
dealing
with
balancing
issues.
But
when
I
had
16
nodes
that
had
40
terabytes
and
three
nodes
that
had
you
know
upwards
of
800
terabytes,
you
know
I
had
to
use
this
tool
in
order
to
get
anything
resembling
equal
space
utilization
and
it
it
got
out
of
hand
quickly.
I
use
this
Tool
too
many
times.
B
Essentially
in
my
I
had
you
know
revamped
smile
log,
and
it
was
like
wait.
How
do
I
undo?
All
of
this
now
took
a
while
to
actually
learn
how
all
the
things
were
pieced
together,
but
it
certainly
works,
and
you
know,
has
its
use
cases.
But
ever
since
we
had
we,
we
have
enough
nodes.
Now
that
are
eight
plus
three
Erasure
encoding.
You
know
that
the
balancer
can
figure
out
and
do
the
right
thing
with
the
ability
to
put
data
in
the
right
places.
E
Okay, yeah, because I had some issues in Luminous at one point where, yeah, there was a bug, I can't remember which, but it basically gets very confused.
B
Another
question
so,
when
I
introduced
the
Seth
clusters,
we
had
I
neglected
to
mention
the
really
old
original
ceph
cluster,
which
backs
a
openstack
instance,
which
is
still
running
Joule
in
the
case
of
you
know,
don't
touch
it,
don't
talk
about
it.
It
has
anyone
gone
the
whole
Jewel
all
the
way
up
to
latest
things
with
some
of
their
clusters
were
there
any
major
gotchas
between
versions
that
I
should
just
it's
not
worth
trying
her
plan
is
to
replace
it.
You
know
new
Greenfield
solution,
but
I'm
just
curious.
A
I'd say do it. My cluster, my big one, started on Hammer, I think, and it runs Octopus now; I've hit every version. I prefer to just hit each one. I know you can skip some versions in some cases, but if you just read the release notes and do all the steps, you know, I've never had a problem coming all the way from Hammer up to Octopus.
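As a hedged outline of the per-release loop being described (the exact steps differ by version, so each release's notes still take precedence):

    ceph osd set noout                    # avoid rebalancing while daemons restart
    # upgrade packages and restart mons first, then OSD hosts, then MDS/RGW ...
    ceph versions                         # confirm every daemon reports the new release
    ceph osd require-osd-release octopus  # finalize with the release just installed
    ceph osd unset noout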
B
All
right,
you
know,
I
had
gone
back
and
looked
at
some
of
the
you
know,
older
release,
notes
But,
as
time
has
gone
on,
finding
the
finding
documentation
that
is
correct
or
looks
correct
and
official
for
the
old
versions
has
become
more
challenging.
So
it's
like
this
really
applicable
still.
Is
it
a
is
something
yeah.
I
Yeah, I was going to say, if it's been in operation for that long, you've obviously not encountered... yeah. The people to speak to about that would be the Rutherford Appleton Laboratory in Oxfordshire.
I
I know those people; I used to work there some time ago, and I think we had some issues with the first deployment, so we tore our cluster down and built it completely from scratch. I think that was the Jewel release, and they certainly got as far as Luminous. I'm a bit out of touch with them, but I should imagine, I mean, they're running it as a production system, so I'd imagine if they're not on the latest they'll only be one main release behind. Tom Byrne is, I think, the principal operator there; he'll certainly know more.
B
I'm going to ask another question if nobody's going to stop me. We've done the orchestrator on RHEL 8, well, actually Alma 8, systems, and we're going to deploy a new one here, as I mentioned at the beginning. I just quickly glanced and it looks like RHEL 9 packages and all of that exist. Are people using Ceph and the orchestrator on RHEL 9 and it's all happy, or is there still a little work to go there?
D
Well,
we
haven't
upgraded
the
nine
version
of
a
railover
events
yet,
but
there
is
a
one
big
thing
which
is
difference
between
teaming
and
bonding.
So
in
case
you
are
using
a
teaming
on
a
Centos
Rocky.
Whatever
eight
person,
the
nine
version
is
in
combat
with
income
of
it
all
with
that,
so
so
the
now
the
sit
they
are
shifting
back
to
bonding
bonding
will
break
your
teaming.
If
you
have
done
teaming
on
on
a.
A
I see somebody put some stuff in about bugs, if anybody has hit that. But if you want to talk us through, let's say, some kernel client issues...
K
Yeah, okay, I just wanted to put these bugs in because they appear to be slightly annoying for our use case on HPC, because we use CephFS for most of the data, including home directories and stuff, and so basically there's this performance degradation. There is already a bug report over there.

K
It's not clear whether it's a kernel bug or a CephFS one; the tendency is that maybe it's more the kernel, because nothing much was changing on the CephFS side. It seems that it creates much too much load in the system, so I'm not quite sure what happens in the client or whatever, but effectively it works two times slower, right, in the benchmark, and still it works.
K
Okay, the stress on the client node is much higher when they're coming to work on it. So I'm not sure if somebody's pushing for that, but even the nine release experiences this bug; Red Hat has, let's say, an older kernel, which does work quite well.
A
The
other
one
is
about
yeah,
so
if
you're,
seeing
like
the
2x
slower
for
reading
and
writing
from
suffer
fast
like
at
the
same
time,
you're
seeing
high
like
CPU
utilization
and
like
CIS
time
or
something,
you
said,.
K
At
random
times
those
standby
viewers
brush
yeah
and
example,
last
week
this
enabled
we
had
a
lot
of
streaming
was
not
working
actually,
so
we
when
we
removed
when
we
disabled,
active,
replay
everything
started
to
work
again,
although
it
took
an
hour
recover
and
it's
looks
stable
this
way,
that
might
be
some
bugs
in
the
latest
release
it's
pretty
new
technology
anyway.
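For reference, disabling standby-replay as described is a single file system flag; "cephfs" is a placeholder name here:

    ceph fs set cephfs allow_standby_replay false   # standby-replay MDS daemons drop back to plain standbys
    ceph fs status cephfs                            # confirm the MDS states afterwards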
K
The primary wasn't crashing, but the last time it happened I was away, so I'm not sure exactly what happened. But we got this stable only after disabling it.
A
All right, well, I guess they're just asking about any experience or observations of write amplification on NVMe clusters. I don't know if they're trying to talk and are muted, or we can't hear them or something right now. But anybody?
I
I, um... we have been stung on Octopus, with large deletion campaigns by our users, which create a very large garbage collection to-do list, and when the garbage collector in RADOS Gateway starts operating, although the OSD is running just fine, it gets marked down by its peers, and then, I think, as soon as one of our OSDs goes down we're actually rebalancing, backfilling, sorry. Then the OSD comes back and says, oh yeah, no problem, I'm still here. But after a while what can happen is you get an avalanche failure: eventually the OSD will be marked down properly, and you get an avalanche failure where the next OSD becomes a problem, and then the next one.
I
And perhaps... well, the solution from our vendor was to get NVMe storage devices and move the RADOS Gateway pools over to that.
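One hedged way to implement that vendor suggestion is a device-class CRUSH rule for the latency-sensitive RGW pools; the rule name here is made up and the pool names are the stock defaults, so adjust for the actual zone:

    ceph osd crush rule create-replicated rgw-on-nvme default host nvme
    ceph osd pool set default.rgw.buckets.index crush_rule rgw-on-nvme
    ceph osd pool set default.rgw.log crush_rule rgw-on-nvme
    # data then backfills onto the NVMe OSDs while the gateways stay online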
F
Anybody else have any topics they want to throw out there?
B
I will at least admit to a failure of monitoring where, after a power outage, many of our large nodes, which are Western Digital external JBODs with 60 disks, some of them came up before the hosts came up and some of them came up after the hosts came up, and therefore some set of hosts had no disks. That of course happened on a weekend, with me, like, not paying attention, and Ceph happily tried to correct and rebalance and solve itself for about three days before somebody noticed and said, what's going on here? But through it all, data was accessible, because the nodes that died were separated out and few enough that it didn't impact data availability.
A
All right, if nobody has anything else for today, I've just got one quick thing left, and that is: hopefully everybody saw that Cephalocon 2023 in Amsterdam is happening in April. I am planning on submitting an in-person birds-of-a-feather session for one of these things. Hopefully a bunch of you will make it.
A
If
you
see
the
emails
we'll
have
one,
if
not
just
assume
we'll,
do
it
at
supplicon
yeah.
If
you
want
to
be
on
the
private
reminder
list
for
this
I
take
the
sign-in
emails
and
do
that
because
things
are
easily
pissed
on
the
Southwest
sometime.
A
Yeah, exactly; it's only a couple of weeks after when the next one would be anyway. So just wait for that, and we can have a nice birds-of-a-feather session and then continue it with some beers and food afterwards or something.

F
Thank you, yeah. That's everything from me; hopefully see some of you at Cephalocon.