From YouTube: Ceph Science Working Group 2020-07-23
A
All right, well, welcome to the July Ceph science / big cluster / whatever user group meeting. I'll just give my short little spiel in case there are people here that haven't joined before: this is just a very informal virtual chat that we have every two months for people in the related fields of science and research, or really just big clusters in general. There's a pad link in the chat, and also in the list email. We kind of keep live notes and topics there, so if you have anything you want to talk about, feel free to add it to the topic list. Otherwise, just bring it up at some point. I kind of just go through the topic list and try to keep the conversation going; I'm more of a moderator. There isn't a presentation or anything.
C
We had a partial outage in our main cluster, upgrading from 16.04 to 18.04, and there was, I think, a race condition in the ifupdown scripts in 18.04 that meant sometimes our bonded interfaces with the VLANs on them would come up all right, but without a default route. And what that means is that when the server reboots, all these OSDs can talk to the cluster just fine, which means the cluster remains happy. But the clients then can't talk to an OSD: they can talk to the monitor, which gives them the cluster map, and then they try and talk to a particular OSD and suddenly just sort of can't perform any I/O, which caused a certain amount of head scratching until we realized what the problem was. We fixed it by using a later version of ifupdown, which brings up the default gateway reliably, and we've also added an override to our systemd service files for the OSDs, which means they won't start up if they can't ping all of our bits of essential internal infrastructure, which will mean in future...
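For reference, here is a minimal sketch of the kind of systemd override being described: a drop-in for ceph-osd@.service whose ExecStartPre ping checks abort the OSD start if essential hosts are unreachable. The hostnames and file name are placeholder assumptions, not the speaker's actual setup.

```python
#!/usr/bin/env python3
"""Write a systemd drop-in that stops ceph-osd units from starting when
essential internal hosts don't answer a ping (run as root)."""
from pathlib import Path
import subprocess

# Hosts the OSD must be able to reach before starting (hypothetical names).
REQUIRED_HOSTS = ["mon1.internal", "mon2.internal", "gw.internal"]

dropin_dir = Path("/etc/systemd/system/ceph-osd@.service.d")
dropin_dir.mkdir(parents=True, exist_ok=True)

# One ExecStartPre= per host; any failing ping fails the unit start.
lines = ["[Service]"]
for host in REQUIRED_HOSTS:
    lines.append(f"ExecStartPre=/usr/bin/ping -c 1 -W 2 {host}")

(dropin_dir / "network-check.conf").write_text("\n".join(lines) + "\n")

# Reload systemd so the drop-in applies on the next OSD (re)start.
subprocess.run(["systemctl", "daemon-reload"], check=True)
```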
A
That's an interesting one. I had something similar a long time ago, with the ifup networking stack on CentOS, probably four years ago: CentOS wasn't bringing up the network device before the ifup scripts were running, and so the server would come up with no networking. It sounds similar, but...
C
We've occasionally seen race conditions before, but normally the failure mode is that the bond's interface doesn't come up at all. I've not previously seen it manage to come up like this.
C
Yeah, I mean, it affected maybe three or four, but we have 51, so yeah, we spotted it fairly quickly. I guess our systems were unhappy for about half an hour while we scratched our heads over whatever the problem was. So yeah, it was a relatively short partial outage, but a bit distressing.
D
I guess I'm stepping up to the confessional here. You know what that feels like, doesn't it? Yeah. So we had a 13-petabyte production cluster that was reaching full capacity, and we needed to introduce new hardware, and I was taking my sweet time about getting the new hardware, and for various reasons we had...
D
There was a bug in the balancer at some point where it wasn't rebalancing. It was basically coming back telling me that the cluster was perfectly balanced, or it was telling me it couldn't find solutions; yeah, the balancer would run away and basically chew CPU until we stopped it. So there were...
D
There were some issues with the balancer. So this is in 12.2, I think; 12.2.12 is what we're on, yeah. And so we were busy trying to figure out what was going on with the balancer to sort that out; then our cluster was getting full and we suddenly had a whole bunch of nearfull OSDs. So now it was crisis time; we knew we had to sort this out.
D
The other thing we were trying to figure out was how to add new hardware into the cluster, and so we spent a fair bit of time just trying to figure out what the best way forward was, because, yeah, we had tried gentle reweights and it just wasn't really doing what we wanted. I think, at the end of the day, the other problem is that we have particularly large PGs. We have a low PG count, and so we ended up with...
D
We have eight-terabyte hard drives in our cluster, and our PGs are 100 gigs. So when the balancer wanted to rebalance stuff, it was shifting a hundred gigs around at a time. So you didn't want to do that in some kind of ad hoc way; you wanted to make sure that when it moved stuff around, it wasn't just moving stuff around for the sake of moving stuff around. So yeah.
D
So we ended up stopping writes to the cluster for three days while we added in the new hardware. The way we did that was, we used Dan van der Ster's upmap-remapped trick. So we basically added all the new hardware with norebalance set on the cluster, and we then basically moved all the PGs back to where they originally were; and then the whole idea was to kick in the balancer, and the balancer was supposed to rebalance the cluster.
D
However, the problem was, and this is probably our own fault once again, let's say we're kind of stepping up to the confessional here: we had nearfull OSDs, and because we have particularly large PGs, what was happening was that the balancer was wanting to move data from nearfull OSDs to other nearfull OSDs, so it was kind of doing a non-optimal move.
D
Obviously, in terms of the CRUSH map behavior it was optimal, but in terms of us trying to, you know, move the data across to the new hardware, it wasn't optimal. So what we ended up doing was: I ended up creating a few scripts that would use the upmap tool and basically calculate a whole bunch of upmaps.
D
We then filtered those, according to the ones that were going to drain the really full, nearfull OSDs and put the data onto the new hardware. So we basically had a prioritized balancer, and this prioritized balancer then ran for like a week or two, just to drain out those really full OSDs. We then kicked in the Ceph balancer, and yeah, the Ceph balancer now runs and is happily rebalancing our cluster.
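A minimal sketch of the idea behind that workflow (the real tool is Dan van der Ster's upmap-remapped script; this is not it): every PG left remapped after the new hardware goes in under norebalance gets pinned back to its current acting set with pg-upmap-items, so no data moves until the balancer later removes the pins. The JSON shape of `ceph pg ls remapped` varies a bit by release, so treat this as illustrative.

```python
#!/usr/bin/env python3
"""Pin remapped PGs back to their acting set so adding hardware moves no
data; the balancer then undoes the pins gradually (illustrative sketch)."""
import json
import subprocess

def ceph_json(*args):
    out = subprocess.run(["ceph", *args, "--format", "json"],
                         check=True, capture_output=True, text=True)
    return json.loads(out.stdout)

pgs = ceph_json("pg", "ls", "remapped")
# Nautilus wraps the list in {"pg_stats": [...]}; older releases return a list.
stats = pgs["pg_stats"] if isinstance(pgs, dict) else pgs

for pg in stats:
    # Pair up OSDs that differ between "up" (where CRUSH wants the PG now)
    # and "acting" (where the data currently is), and map them back.
    pairs = [(u, a) for u, a in zip(pg["up"], pg["acting"]) if u != a]
    if not pairs:
        continue
    cmd = ["ceph", "osd", "pg-upmap-items", str(pg["pgid"])]
    for up_osd, acting_osd in pairs:
        cmd += [str(up_osd), str(acting_osd)]
    subprocess.run(cmd, check=True)
```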
D
So we went from a state where we were flying pretty close to the sun, in terms of what, you know, should be allowed in terms of nearfull OSDs, but then, as I say, with a couple of scripts we were able to carefully figure out, just for that little crisis period of a week or two, how to copy the stuff off: prioritize the upmaps that would actually take data off the full OSDs and put them onto the two new racks we had added. And that's kind of got us through the woods, and yeah, basically now we're happy again. So that was, yeah, I think, a lesson in how not to wait until the end to add new hardware. You know, when you want to add new hardware, add it when the cluster is not too full. But if there are problems, there are ways of solving this. Ah, you've got a raised hand there, yeah.
F
All right, we're in the middle of a little bit of pain, actually. We've got a Nautilus cluster and we're using erasure coding, and due to COVID we couldn't get in to add more OSDs until last week, and we got to a state where we had...
F
Maybe a third of our OSDs were nearfull; a little panic. So we suddenly managed to add, you know, 10 more servers to the original 10 and tried to add another 100 OSDs. Well, it all looked good, and then suddenly we realized that 40 of the PGs were locked in the activating state. So we couldn't actually get access to our data at all, and the only way forward then was to reweight...
F
We had to weight the OSDs down from their size: these were 16-terabyte drives we were adding to an eight-terabyte cluster, so we actually weighted them down to a terabyte each, and then all the OSDs would activate. We're now slowly, slowly creeping forward, increasing the weight as things start to balance out. It's a horrible situation to be in, but it's slowly getting there now. So I completely empathize with what you're describing.
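A hedged sketch of that workaround: bring the big drives in at a tiny CRUSH weight so their PGs can activate, then ramp the weight up in steps as the cluster balances. The OSD ids, step size, and fixed sleep are placeholders; a real script would watch `ceph status` rather than sleeping blindly.

```python
#!/usr/bin/env python3
"""Ramp new large OSDs from a CRUSH weight of ~1 up to their real size in
steps, waiting for recovery between steps (illustrative sketch)."""
import subprocess
import time

NEW_OSDS = [120, 121, 122]   # hypothetical ids of the new 16 TB OSDs
TARGET_WEIGHT = 14.5         # CRUSH weight of a 16 TB drive, in TiB
STEP = 1.0                   # weight increase per pass

weight = 1.0                 # start around one terabyte, as described above
while True:
    for osd in NEW_OSDS:
        subprocess.run(["ceph", "osd", "crush", "reweight",
                        f"osd.{osd}", str(weight)], check=True)
    if weight >= TARGET_WEIGHT:
        break
    time.sleep(3600)         # crude stand-in for polling recovery state
    weight = min(weight + STEP, TARGET_WEIGHT)
```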
F
How full were you? We were about 90 percent already, 95 percent on the worst ones.
D
Yes, we were hitting 85 percent full, yeah. And the problem was as well that we had, you know, 10 or 20 nearfull OSDs, but then we had the rest of them sitting at between 70 and 79 percent, so the whole cluster was full. And yeah, you know, that's not the situation you want to find yourself in. But I'm questioning why...
D
Why were the OSDs not activating, yeah, the new ones?
F
It was like, where am I going to put all these? And it just got stuck in limbo, I don't know. But the only solution forward we found was actually changing the CRUSH weight of the OSDs down from 14 and a half to one, and then slowly starting to ramp it up. Nothing else worked.
E
That one I'm waiting for, but I'll... I stuck it in the "upgrading to Octopus" section, which is just kind of where it landed, but maybe people have other bugs or gotchas they want to talk about before we move on.
A
Kind of going back to Thomas's thing: has anybody increased their nearfull ratios from the defaults, since drives are so big now? Was it like 85 percent or something?
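For reference, the default nearfull ratio is indeed 0.85, and on recent releases it lives in the OSDMap and is raised like this; 0.90 is only an example value, not a recommendation.

```python
#!/usr/bin/env python3
"""Raise the cluster-wide nearfull warning threshold (example value)."""
import subprocess

subprocess.run(["ceph", "osd", "set-nearfull-ratio", "0.90"], check=True)

# The harder limits can be moved the same way, with much more care:
#   ceph osd set-backfillfull-ratio 0.92
#   ceph osd set-full-ratio 0.95

# `ceph osd dump` prints the current full/backfillfull/nearfull ratios.
subprocess.run(["ceph", "osd", "dump"], check=True)
```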
E
How heterogeneous are your drive sizes?
E
That's an interesting thought, though, Kevin, yeah. I would have to think about that. We're actually getting ready to add storage; we actually have it just sitting on pallets in our data center waiting to be racked and everything. But I think I'm a little worried, because the person who spec'd it out was really going for a price/performance ratio, and so we ended up with machines with 60 hard drives.
A
Yeah, I was just saying that those are some pretty big nodes. I wouldn't be comfortable with those in my cluster. It's a lot, but it depends on the cluster; I prefer to go smaller and...
C
So all of our Ceph nodes are that size; they're all 60-disk JBODs, but we've got quite a lot of them. I think we've got 51 in our production cluster, so if we lose one, okay, there's still a fair proportion of the cluster left; it's not a big deal. But they're only 6-terabyte drives each, because largely that was driven by cost considerations, and I guess more, smaller spindles might give us more IOPS; I'm not sure.
C
Well, I think it's perhaps less about the raw capacity of any given node and more about what proportion of your cluster is in a particular node: the more nodes you have, the less it matters, proportionally, to lose one of them.
C
I mean, our thinking was perhaps more, you know, how many drives can we fit in one box and still expect to push them all fairly hard, given the network and CPU and PCIe capacity of those boxes, and we have 4U Supermicro servers that seem reasonably good in terms of, you know, having enough performance to drive 60 spinning-rust disks.
E
Maybe not. All right, it looks like Ann is talking, but she might be muted.
C
Yeah, I had that problem, which is why I'm phoning in, because I've found twice now that when I've used the computer audio I can hear everyone but not speak, or at least not be heard.
A
All right, well, she's rejoining. I also forwarded a thing from Gabriel at the Diamond Light Source. He was going to be giving a demo of his Ceph setup. Gabriel, if you're on, if you want to say anything about that; otherwise I think most of the people here probably saw that email.
B
Yeah, so if you've seen the email, it's just going to be a demonstration of using Ceph with some in-house technology, such as Savu, and then, like, general use of S3 with it, and then just using RADOS to show performance and scaling when using it.
A
So, how's life with Octopus? Has everybody got stuff they want to speak to with that? Migration problems, good, bad, CentOS issues? I think this is where... was it Liam, did you put that bug in there?
E
Right, so we upgraded probably about five or six weeks ago, something like that, after they released the point release that fixed that corruption issue they had in 15.2.2. So we upgraded to 15.2.3 from Nautilus, and that all went pretty smoothly. The main thing that I noticed immediately, looking at the performance graphs and all the diagnostic information about the machines, is that the I/O wait had gone way down.
E
I noticed on the OSDs it was at, you know, four, and it went down to about one; it was about 25 percent of the I/O wait it had been on these OSD nodes before. So that was interesting, and everything seemed fine for about a week, and then we started noticing that a lot of users were having problems with their space accounting being incorrect on Octopus.
E
So, you know, we would have users, and it would say that they were using about 50 times as much space as they were actually using, and this actually created a real problem, because we quota users on our cluster. So basically they would hit their quota prematurely, because the accounting was wrong, and not be able to write any data.
E
So we did a little bit of, you know, testing: was there anything that we were missing? And what we ended up doing is just rolling back to Nautilus, just for the RADOS Gateway nodes. So we left all the mons and the OSDs running Octopus right now, and they have been for the last couple of weeks, but the RADOS Gateway nodes are back down on Nautilus, and then we were able to resync the stats on each individual user to correct their space accounting.
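The per-user resync they describe corresponds to `radosgw-admin user stats --sync-stats`; here is a small sketch that loops it over every user, assuming `radosgw-admin user list` returns a JSON array of uids.

```python
#!/usr/bin/env python3
"""Recompute RGW space accounting for every user (sketch)."""
import json
import subprocess

users = json.loads(subprocess.run(
    ["radosgw-admin", "user", "list"],
    check=True, capture_output=True, text=True).stdout)

for uid in users:
    # --sync-stats recalculates the user's usage from the bucket stats.
    subprocess.run(["radosgw-admin", "user", "stats",
                    "--uid", uid, "--sync-stats"], check=True)
```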
E
So I put in a bug for this. It's marked as high, and it's been assigned to somebody, so maybe it'll get worked on. I couldn't see any evidence from any of the mailing lists or anywhere else that this was a widespread problem.
E
They have this one, and they have another RADOS Gateway bug; it seems like all the bugs that I hit were in RADOS Gateway. So maybe, if you don't use RADOS Gateway too much, this won't be a problem for you. But there was basically this one message that was being logged at the wrong level, and we ship all of our logs off our machines to log aggregation, and it was logging about...
E
It was logging the same exact message, about some fencepost error or something in the RADOS Gateway code, 174,000 times a minute, and it was actually creating performance problems for the cluster just to keep up with doing so much logging. So we had to temporarily change the logging levels, then, you know, go back to what we were running on Nautilus and set the logging back the way it was, and that fixed things.
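A sketch of that kind of mitigation: turning the RGW debug logging down cluster-wide through the central config store (Mimic and later). The `1/5` level is an example, not necessarily the value they used.

```python
#!/usr/bin/env python3
"""Persist a quieter debug_rgw level for all RGW daemons (example value)."""
import subprocess

subprocess.run(["ceph", "config", "set", "client.rgw", "debug_rgw", "1/5"],
               check=True)

# For an immediate change on a running gateway, the admin socket also works:
#   ceph daemon /var/run/ceph/ceph-client.rgw.<name>.asok \
#       config set debug_rgw 1/5
```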
E
We use ceph-deploy, actually, so we just used that to redeploy the packages and insert the nodes back into the cluster. The first time took a little bit of trial and error, but after that it only took about five minutes.
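A hedged sketch of that style of rollback with ceph-deploy; the hostnames are placeholders, and depending on the distro, downgrading packages in place can need extra package-manager steps beyond this.

```python
#!/usr/bin/env python3
"""Reinstall Nautilus packages on the gateway nodes and restart RGW
(illustrative sketch; hostnames are placeholders)."""
import subprocess

RGW_NODES = ["rgw1", "rgw2"]  # hypothetical gateway hosts

for node in RGW_NODES:
    # Put the Nautilus packages back on the node...
    subprocess.run(["ceph-deploy", "install", "--release", "nautilus", node],
                   check=True)
    # ...then restart its gateway so it rejoins on the older version.
    subprocess.run(["ssh", node, "sudo", "systemctl", "restart",
                    "ceph-radosgw.target"], check=True)
```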
E
Octopus, yeah. I mean, I think the performance definitely does seem a little better; even our CephFS cluster seems better. And I guess the problem with that is, since the throughput of the cluster has actually been increased, we have enough clients that they're still kind of just saturating...
E
...you know, the extra performance that's now available, and it's caused some other issues for us. For the first time ever (we've had this cluster for seven, eight years), we're now, just after upgrading, getting messages about deep scrubs timing out. It basically seems like not all of our PGs can deep scrub themselves once a week, and that just started with Octopus; I've never seen that before.
E
I don't really want to just give up and say, well, you don't have to scrub as often or something, so I haven't quite decided what to do about that yet. I've just been dealing with it for now, but I haven't found a permanent solution to deep scrubs, for example, timing out. But I suspect that it's actually related to the performance characteristics of Octopus, because basically, if you have more cluster activity going on, if you're...
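If you did decide to relax the schedule rather than chase the root cause, these are the knobs involved; the speaker explicitly doesn't want to do this, and the values below are examples only.

```python
#!/usr/bin/env python3
"""Stretch the deep-scrub interval and keep the health warning consistent
with it (example values only)."""
import subprocess

# Allow two weeks instead of the default one week between deep scrubs.
subprocess.run(["ceph", "config", "set", "osd",
                "osd_deep_scrub_interval", str(14 * 24 * 3600)], check=True)

# The "pgs not deep-scrubbed in time" warning fires once a PG exceeds
# osd_deep_scrub_interval * (1 + mon_warn_pg_not_deep_scrubbed_ratio).
subprocess.run(["ceph", "config", "set", "mon",
                "mon_warn_pg_not_deep_scrubbed_ratio", "0.75"], check=True)
```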
F
Liam, just wondering: you're using CephFS on your cluster. What clients are you using, the kernel client? And what kernel version are you using?
E
For people... I mean, the other thing for people to consider around an Octopus upgrade is that there are changes to the requirements for the OS that you can run. So I actually need to upgrade all of my mons to EL8 now; they're all on EL7...
E
...currently, because I had to disable some of the plugins that have Python 3 requirements that it can't satisfy; they just haven't built the packages for EL7. Not all of the manager modules work, like Prometheus was one, and we had to disable the RESTful API manager module, and disable...
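The module juggling described looks roughly like this, with `restful` and `prometheus` being the two mentioned.

```python
#!/usr/bin/env python3
"""List manager modules, then disable the ones whose Python 3 dependencies
aren't packaged for this OS (sketch)."""
import subprocess

# Show which modules are enabled and what the mgr reports as available.
subprocess.run(["ceph", "mgr", "module", "ls"], check=True)

for module in ["restful", "prometheus"]:
    subprocess.run(["ceph", "mgr", "module", "disable", module], check=True)
```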
F
We'll hit that at some point, but it's stable, so we'll leave it as it is.
E
Cool. I'm happy to be people's guinea pig; if more people in a year or two are having an interest in upgrading, eventually, hopefully, I'll have, you know, taken a lot of the pain, and they'll have fixed some of the issues. So I think I'm just a little surprised that it seems like...
E
...the Ceph developer community seems to focus a lot on new features, and it takes many, many point releases to really get something stable that you'd actually want to use on a large production cluster. Like, for example, in Octopus, right, they put a lot of effort into cephadm, and it means that there are, you know, more fundamental issues with some of the existing pieces, like, for example, what I hit with RGW.
C
It's interesting what you say about finding lots of trouble in RGW. We're not... we're nowhere near Octopus on our RADOS Gateways currently; we've had a rash of multipart uploads getting mangled by dynamic sharding, which then involves a lot of unpicking of the data structures around buckets. But it does seem, I don't know, that either the RADOS Gateway is something that not so many people are using, so it gets less love, or it's just a bit more complicated than some of the other bits.
E
I mean, I think maybe one explanation is, you know, I think the RGW project has been lucky that a lot of the people who were there from the very beginning are still there, like Yehuda, who has been, you know, kind of one of the principal developers; he worked for Inktank and still works for Red Hat. But it seems like a lot of the folks that get assigned maybe the day-to-day sort of feature and bug-fix tickets in RGW are kind of newer people in the project.
E
It's just something I've noticed, so maybe there's not as much engineering maturity in RGW as there used to be. I mean, they just keep adding features too, right? They spend a lot of time on, you know, versioning, multi-site syncing, other stuff, and maybe it's just becoming more complicated, and you have to put the complexity somewhere, and it's maybe becoming a little bit less reliable over time. That's kind of my impression.
E
I mean, it's been pretty reliable; I don't want to, you know, poo-poo it too much. Software is hard. But I would like it to be a little bit more reliable this far into releases.
D
Okay, yeah, because one of the guys I work with was playing around with Octopus, and we had some interesting issues with the RADOS Gateway, in fact, as it turns out, and the orchestrator. So I was just wondering if you had some experiences; but yeah, I think it's easy to play around there and see what...
A
Yeah, with me, I guess... I was going to ask on the call if anyone has managed to deploy RGWs with cephadm, because I can deploy OSDs and mons and everything perfectly, but as soon as I try to do any RADOS Gateway admin commands, like creating the realm and things, they just hang. It seems to create the .rgw.root pool, but that's it, and all the other radosgw-admin commands just hang there indefinitely.
A
Yeah, completely new cluster; we were testing it out, and as I said, getting the mons deployed and getting OSDs up and running, that's all perfect. But I don't know if there's some sort of weird bootstrapping thing where you actually need an instance of a RADOS Gateway running before the radosgw-admin commands will work. I traced it a bit, but we didn't have enough time to really go and delve into what it was blocking on.
A
But, you know, if you're sort of following the instructions as listed, which is, you know, you do things like creating realms before you actually deploy RGW instances, those ones just seem to hang.
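For context, the documented Octopus-era order being followed is roughly this: create the realm, zonegroup, and zone first, then ask cephadm to deploy the gateways. Names and placement are placeholders; on the cluster described above, it was these radosgw-admin steps that hung.

```python
#!/usr/bin/env python3
"""Octopus-era RGW bootstrap order under cephadm (names are placeholders)."""
import subprocess

def run(*cmd):
    subprocess.run(cmd, check=True)

run("radosgw-admin", "realm", "create", "--rgw-realm=myrealm", "--default")
run("radosgw-admin", "zonegroup", "create", "--rgw-zonegroup=default",
    "--master", "--default")
run("radosgw-admin", "zone", "create", "--rgw-zonegroup=default",
    "--rgw-zone=myzone", "--master", "--default")
run("radosgw-admin", "period", "update", "--commit")

# Octopus cephadm takes the realm and zone as positional arguments.
run("ceph", "orch", "apply", "rgw", "myrealm", "myzone", "--placement=3")
```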
A
I think you could, yeah, and I think that's one of the sort of potential issues: if you end up with, you know, partially containerized services, it's probably fine, but it's not ideal. And I guess, yeah, I'm just interested in whether I was holding something wrong, or whether anyone else has encountered it.
A
Yeah, all our production stuff is separate, but this is one of the things we want to test out, because for us, most of our environment is very containerized, so having containerized deployments would fit in very well with our current infrastructure.
E
Are you able to speak about how the disk mapping works if you're using containerized deployments there? Like, for example, one of the things that just scares me about thinking about doing it is that we use...
E
...we have, you know, SAS JBODs, right, that are using multipath. So each of our drives has two addresses, and then they get put together into a single multipath device on the host, and that's what we use when we're trying to associate a drive and an OSD together. So I'm just wondering, like, if you're...
E
Do you have a layout that looks anything like that, if you were going to use cephadm and containerize your OSDs?
A
I've tried both "consume all available devices" and choosing individual devices manually, and the all-available-devices mode seems to do a pretty good job of choosing the right things. But I think we would look into a spec file dictating, you know, exactly what type of devices we wanted to use, in that the all-available-devices mode certainly went wrong once, when I didn't pay much attention to what it marked as being available for use. But yeah, I haven't delved further than the sort of bare bones of using that; I think that's one of the things we want to try next, when we fiddle around with it.
A
No. So what we do: we have an NVMe pool which we use for indexing for the RGW, so we put the metadata pools on that; but in terms of the OSDs and their associated WALs and things like that, those are all on the spinning disks.
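A sketch of that layout: a replicated CRUSH rule restricted to the nvme device class, with the latency-sensitive RGW pools pointed at it while the data stays on spinners. Pool and rule names are illustrative.

```python
#!/usr/bin/env python3
"""Steer RGW index/metadata pools onto NVMe OSDs via a device-class rule
(sketch; pool and rule names are illustrative)."""
import subprocess

def run(*cmd):
    subprocess.run(cmd, check=True)

# Replicated rule that only picks OSDs whose device class is "nvme".
run("ceph", "osd", "crush", "rule", "create-replicated",
    "nvme-only", "default", "host", "nvme")

for pool in ["default.rgw.buckets.index", "default.rgw.meta"]:
    run("ceph", "osd", "pool", "set", pool, "crush_rule", "nvme-only")
```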
E
Because that was kind of my question. We have separate devices that do what they used to call the journaling, which is now the RocksDB and stuff, but I was just wondering, how smart is cephadm, really? Like, if I have, you know, 60 drives in a chassis and I have 10 solid-state disks, is it going to correctly kind of figure out that I want to, you know, use 10 spinning disks with one of these SSDs for journaling, or... I mean...
A
That service specification YAML file that you can make looks pretty all-encompassing in terms of what you can filter on: certainly things like models and vendors and paths and sizes, things like that, you know, rotational or not, and you can put certain placement patterns on. So certainly, if they're easily distinguishable into certain groups, I think you could probably do it quite easily with that.
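A sketch of such a spec, filtering on the rotational flag so the spinners become data devices and the solid-state devices carry the DB/WAL; the service id and host pattern are placeholders.

```python
#!/usr/bin/env python3
"""Apply a cephadm OSD drive-group spec that pairs rotational data devices
with non-rotational DB devices (sketch; names are placeholders)."""
import subprocess
import tempfile

SPEC = """\
service_type: osd
service_id: spinners_with_ssd_db
placement:
  host_pattern: 'osd-*'
data_devices:
  rotational: 1
db_devices:
  rotational: 0
"""

with tempfile.NamedTemporaryFile("w", suffix=".yaml", delete=False) as f:
    f.write(SPEC)
    spec_path = f.name

subprocess.run(["ceph", "orch", "apply", "osd", "-i", spec_path], check=True)
```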
A
One thing I thought was worth mentioning, if anybody didn't see it: Dan from CERN did a tech talk in June about their bug of the year, with the LZ4 compression bit-flipping they were seeing. It's pretty interesting, and it's fairly short, only like a half hour or something; there's a YouTube link in the pad. Feel free to, you know, turn that on in the background and listen to it; it's a very interesting problem.
A
All right, we're coming up against the hour mark anyway. It sounded like a good chat today; the next one will be in September.
A
Usually it's the fourth Wednesday, maybe the fifth Wednesday of September; I probably have a conflict on the fourth Wednesday, but we'll see what happens, unless somebody else wants to moderate in place of me.
A
Oh, okay, that works for Matthew too, so maybe that's a good thing. I know Dan from CERN was also away this week; that's why he hasn't joined us. I don't know if anybody else from CERN actually ended up joining or not.
A
Anyways, I'll send out those emails a week or two ahead of time, as usual. If anybody is new on the call: I do send the emails out to the ceph-users list, but it's easy to miss stuff on there, so I also have a private list of the attendees. If you want to get those emails, go put your name and email address in the pad and I'll add you to the private email list. Hopefully that calendar invite actually worked this time.
A
Other than that, thanks for joining, and we'll talk to you all in two months. Hopefully there will be no terrible outages to hear about next time.