From YouTube: 2019-10-23 :: Ceph Science User Group Meeting
B
What was our pleasure? I don't know if we need to go through it in real detail. I just wanted to quickly say that there are videos posted there now, if anyone wants to go through them. We're trying to work with Mike, or rather Mike is going to put them onto the channel on YouTube, I guess, but we have a problem on our side.
A
One thing that I was interested in in the topics there was how Tom said they were going to add a thousand OSDs. I sent him an email and they haven't done that yet; I was curious to see how that ended up for him. So maybe next time he'll be able to give us an update on that.
B
Yeah, I mean, I've spoken with them before about this. You have a question about a single large cluster versus multiple; last time I asked them about this. Basically, there's the higher-level software like XRootD that they use to expose their Ceph pool, and it gets complicated if they need to expose multiple Ceph clusters.
C
We've not seen it as a problem as yet. I don't know whether it's just that our monitors are reasonably nippy and their monitor storage is solid-state drives or something, but in terms of map updates, when we lose an OSD or one comes back, I've not noticed that being a significant impact.
B
Mixed-use-case clusters, and that's for practical reasons: with the block storage stuff, you can't always upgrade the clients. Although we haven't needed any tunables changes recently, in years past we were having to upgrade tunables with new versions of Ceph, and the running VMs were holding us back, so we separated for that reason as well.
C
And we now have an HAProxy in front of our RADOS gateways, which we're currently mostly using to stop one particular user or set of users from dominating the service. But I guess we could also use that to tune the level of S3 availability so that it didn't overwhelm block storage as well. That's not something that's been necessary for us yet, though.
B
Yeah, we have some users that do infrequent big jobs; they kind of hammer the S3 infrequently, and yeah, I don't think the block storage users would be happy during those moments.
B
Does anyone have anything to say about Nautilus? We just upgraded one cluster to Nautilus last week, so if someone's curious about that, we could say maybe a couple of things.
B
Same here, Mimic to Nautilus, and it's... it's a bit rattly.
B
Some small bugs: I think any client which is not Nautilus can't run `ceph status`, for example; you get this bug. And we found we have one issue that hits something like 10 percent of the time that we stop an OSD in Nautilus.
A
I haven't run into that OSD restarting bug yet, but I haven't been bouncing many OSDs in the clusters that I upgraded, so maybe I'll try that in my test cluster and see if I get it as well.
C
So we haven't done that, but that's what we are planning on doing, because our main clusters are all running Luminous, and Canonical are providing Luminous and Nautilus. So we're expecting to try upgrading from Luminous to Nautilus directly.
B
We also will do that similarly, from Luminous to Nautilus on one of our CephFS clusters, but we're holding back at the moment, because I think there's a known MDS bug in Nautilus 14.2.4.
B
There was a thread on ceph-users a couple of weeks ago, and it seems that the MDS in 14.2.4 has a bad backport, a bad fix, a bad patch in it that was incompletely backported from the master branch. So yeah, we're waiting until 14.2.5 or something like that before we upgrade any CephFS cluster.
B
Yeah, I'm just trying to find it now. I'm pretty sure it was 14.2.4, because I've been waiting for that fix, but I'm not finding it immediately. I'll come back to that later if I find it.
D
Okay, so our company has several clusters. We have one big cluster; it now has more than four thousand OSDs running and five RADOS gateways. This is an object storage cluster for satellite data from the European Space Agency, and this data is served through our cloud to clients. We have several smaller clusters, about 1000 OSDs, for our back-office data, and now we are deploying three new clusters for a meteo company to process weather data as well. So we are running mostly Luminous right now.
D
And we are preparing to upgrade this biggest cluster, because there are some issues right now, and we hope that these issues will be gone with the new version.
A
Gotcha. What issues are you referring to?
B
Yeah, I'm not sure which conditions trigger the bug. It says something to do with client snap caps, so maybe it's only triggered if you have snapshots enabled; maybe you don't? I don't know if that's true, but I'm waiting for 14.2.5. 14.2.3 and 14.2.4 are identical releases; there was just some ceph-volume fix, if I remember correctly.
B
Are you using replication between S3 clusters, or zones, or regions, or whatever they're called?
F
I could see, if we can get more people leveraging S3 and if, you know, sites came online in the future, that would maybe be a better way to move forward. But our sort of early goal was to try to provide both protocols at the three institutions, and so the idea was we wanted to enable CephFS and have a stretched cluster that was all one namespace for that too.
B
This is actually something different; this is just more S3 capacity in a different building. We have some bigger use cases on S3, so yeah, it's just a new cluster. But some people want disaster recovery: they want to be able to write to a bucket in our main data center, and then magic happens and it's replicated to a second data center.
B
But we don't want that for every bucket. We want that for just some buckets, and to be able to say, okay, this bucket gets replicated without the user knowing, and then other buckets we want to have only in one region.
A
Is anybody facing any big problems with Ceph right now? Bottlenecks?
B
So I have a kind of crazy idea and a crazy question; I wonder if anyone has thought about this before. On some of our servers we have lots of disks and little RAM, like 30 disks and 64 gigs of RAM, and we have already done everything possible to limit what the OSDs use with the osd_memory_target option. But I was thinking of trying this zram compressed swap, compressed RAM, on these servers to see if it helps in any way, and I've already configured the zram swap on one server. Basically, you take half of the RAM, you create zram block devices, which are compressed block devices in RAM, and then you enable the Linux swap to swap out inactive pages to that compressed RAM, and maybe, maybe by this...
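A minimal sketch of the zram setup being described, assuming root on a Linux node with the zram module available; the device name, the lz4 compressor, and the swap priority are illustrative choices, not from the discussion:

```python
#!/usr/bin/env python3
# Sketch: carve half of physical RAM into a compressed zram swap device,
# so inactive OSD pages get compressed instead of hitting disk swap.
# Assumes modprobe creates /dev/zram0 and that lz4 is available.
import subprocess

def total_ram_bytes() -> int:
    """Read MemTotal (reported in kB) from /proc/meminfo."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemTotal:"):
                return int(line.split()[1]) * 1024
    raise RuntimeError("MemTotal not found in /proc/meminfo")

subprocess.run(["modprobe", "zram"], check=True)
# The compressor must be chosen before the device is sized.
with open("/sys/block/zram0/comp_algorithm", "w") as f:
    f.write("lz4")
with open("/sys/block/zram0/disksize", "w") as f:
    f.write(str(total_ram_bytes() // 2))  # "take half of the RAM"
subprocess.run(["mkswap", "/dev/zram0"], check=True)
# High priority so the kernel prefers zram over any disk-backed swap.
subprocess.run(["swapon", "--priority", "100", "/dev/zram0"], check=True)
```

Whether the compression win outweighs the extra CPU on a box already running 30 OSDs is exactly the open question here.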
C
No, I guess we found that these things have worked for us, in the sense that we're using a fair chunk of RAM on them, but we've not had, you know, hitting-swap issues. So from that point of view, we've just continued to buy more machines at the same spec, yeah.
B
Yeah, but they'll give you a better price with half the RAM. What kind of device are you going to put the block.db on? What do you use?
B
So what kind of NVMe card, and how?
C
No, so we have 60 OSDs in the box, and each of them gets one partition, which will be on one or the other of the cards. Where we've had an NVMe device fail, we've just had to rebuild those 30 OSDs, but we've got enough redundancy in our cluster that we can just do that, and at the risk of tempting fate, we've only had one NVMe card go south so far. And we do monitor their wear.
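Spelled out, the blast radius being described looks like this; the 60 OSDs and two cards are from the discussion, the division is just made explicit:

```python
# One block.db partition per OSD, spread across the NVMe cards, so losing
# a card means rebuilding every OSD whose DB lived on it.
osds_per_box = 60
nvme_cards = 2
osds_rebuilt_per_card_failure = osds_per_box // nvme_cards
print(f"one dead NVMe card => rebuild {osds_rebuilt_per_card_failure} OSDs")
```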
C
No, I mean, I think at the moment the most worn of them has still got like 95 percent of its wear life left, and that's after a couple of years of use. So I expect we'll probably end up replacing them before they get anywhere near the end of their rated wear life.
F
I can give a data point on the NVMe wear too; we've got kind of a similar situation. We have four NVMes in a node and 60 disks, so 15 OSD DB devices per NVMe, and even before that, FileStore. They started with FileStore, you know, four years ago, and our oldest, four-year-old NVMes are down to about 60 percent of their available life. So yeah, I think the lifetime should be plenty, at least on any enterprise-class NVMe device.
F
Those are small NVMes too; the first ones we bought were 400 gigabytes apiece, you know, and over time they've become larger for the same price, just naturally. Of course, now they're too small for BlueStore, because you need some...
F
Well, that's good! We sized our newest purchase this year... we kind of went by the discussion of, you know, one of the RocksDB levels is 30 gigabytes, but leave space for compaction, so assume 60 gigabytes. So we sized for that, but we have some space even above that, just because of the sizes they come in.
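A worked version of that sizing rule; the 30-gigabyte level figure is from the discussion, and doubling it for compaction headroom is the community rule of thumb rather than an official number:

```python
# RocksDB only really benefits from a block.db partition big enough to hold
# a whole level, and compaction can briefly need a second copy of that level.
level_target_gb = 30          # "one of the levels is 30 gigabytes"
compaction_headroom = 2       # "leave space for compaction"
min_block_db_gb = level_target_gb * compaction_headroom
print(f"size block.db partitions to at least {min_block_db_gb} GB")  # -> 60
```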
D
We configure the RocksDB options to fit our partition of the NVMe device, and then we have a recurring job in our CI/CD which compacts the device offline until it fits in the NVMe partition. And you have to have enough copies to be sure that when another drive fails, you still have your data.
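A minimal sketch of what one iteration of such an offline compaction job might look like; `ceph-kvstore-tool bluestore-kv <path> compact` is the standard offline compaction command for a BlueStore OSD, but the systemd unit name and OSD path below are assumptions about their deployment:

```python
#!/usr/bin/env python3
# Sketch: offline-compact one OSD's RocksDB so it fits back inside its
# block.db partition. The OSD must be stopped while the tool runs.
import subprocess
import sys

osd_id = sys.argv[1]                           # e.g. "12"
osd_path = f"/var/lib/ceph/osd/ceph-{osd_id}"  # assumed default layout

subprocess.run(["systemctl", "stop", f"ceph-osd@{osd_id}"], check=True)
try:
    subprocess.run(
        ["ceph-kvstore-tool", "bluestore-kv", osd_path, "compact"],
        check=True,
    )
finally:
    # Bring the OSD back even if compaction failed.
    subprocess.run(["systemctl", "start", f"ceph-osd@{osd_id}"], check=True)
```

In a CI/CD job like the one described, this would run per OSD, presumably gated on cluster health so only one failure domain is down at a time.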
B
We're doing that: we have an HPC cluster running jobs, and then there are four OSDs on each of those nodes; maybe 300 nodes like that.
B
The users really hammer the CPUs on those machines, but it seems okay anyway; nobody complains about this being slow. This is CephFS, by the way, and it's fast with the kernel mount. I mean, the bigger issue is that we have two different teams in our department, one for storage and one for HPC, like the main Slurm support. So getting the operations working between our two groups is the bigger problem, because they want to reboot.
B
It'll be fine. If you have very small clusters, then there can be some kind of memory issue where, if the client and the OSD are on the same machine, there's some weird internal race condition that can happen. But if you have large clusters, then the probability that you're reading from the OSD that you're sitting on is usually quite low. So it's never happened to us.
A
I'm thinking of trying a little bit of that, like putting a condor startd on some of my Ceph nodes, because there's a bunch of CPU that's idle a lot of the time.
A
Okay, I suppose we can wrap it up. As for the next one, it'll probably be in January; we're doing them every other month, and every other month ends up on Christmas Day.