From YouTube: Ceph Science Working Group 2020-09-23
A: So, my quick little 30-second spiel: we're just a group of research sysadmins, or big-cluster sysadmins, who use Ceph, getting together every once in a while to talk about anything related to it. Feel free to contribute to the conversation; I'm not presenting, just trying to keep things moving along. So I guess to start us out, we had a few general topics. Does anybody have recent outages they want to own up to, whether their own fault or something else's?
A: I've got one: we had a total power outage for four hours, and our UPS only lasts about 20 minutes. We don't have generator backups, so the entire data center went down. Overall, two out of my three clusters came up without problems. The third one, the bigger one at almost 10 petabytes, had issues: two hosts didn't come up right away and needed a little kick during the boot process once I got access to it all again. The other stuff was just corrupt journals here and there, so I was rebuilding probably 10 journals or so; they seemed to be located mostly on just one or two hosts. Overall a big outage, but first off not my fault, which is always nice, and no lost data.
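(A minimal sketch of the journal-rebuild step described above, assuming systemd-managed FileStore OSDs; the helper and the choice to simply recreate an unreadable journal are illustrative, not the speaker's exact procedure.)

    import subprocess

    def rebuild_filestore_journal(osd_id: int) -> None:
        # Stop the OSD before touching its journal.
        subprocess.run(["systemctl", "stop", f"ceph-osd@{osd_id}"], check=True)
        # A corrupt journal usually cannot be flushed; tolerate the failure
        # and recreate the journal in place instead.
        flush = subprocess.run(["ceph-osd", "-i", str(osd_id), "--flush-journal"])
        if flush.returncode != 0:
            subprocess.run(["ceph-osd", "-i", str(osd_id), "--mkjournal"], check=True)
        subprocess.run(["systemctl", "start", f"ceph-osd@{osd_id}"], check=True)

    rebuild_filestore_journal(42)  # example OSD id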
A: It took a little while to figure everything out after everything came up, and to do a little rebalancing, but it went pretty well overall. Does anybody else have any outages, or any kind of bugs and stuff people have hit?
C: At least bug-wise, I added something to the list there in the pad, if people are running 14.2.11, actually, or anything recent on Octopus as well.
A: One thing I noticed, which I haven't reported, and I don't really know yet if it's a bug or just something in my environment: my FileStore OSDs seem to not be cleaning themselves up after PGs migrate around, so I get artificially high disk usage on some of them. I've looked into it a bit. I haven't found anything in the tracker that seems relevant to it yet, so I might end up having to get some good debug logs and post something about this one. In the end I tend to just give up on it, because most of these are old OSDs that were built with ceph-disk, so I end up just rebuilding them with ceph-volume and going on with my life. It's one or two here or there, so it's just a bit of an annoyance. But I've got an upgrade to 14.2.11 coming up soon; I'll see if it still happens on that, and if it does, I was going to report it.
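(A hedged sketch of how one might look for that leftover PG data, assuming a default FileStore path and Nautilus-style JSON from ceph pg ls-by-osd; the helper is hypothetical, and anything it reports should be verified by hand before deleting.)

    import json
    import os
    import subprocess

    def stray_pg_dirs(osd_id: int) -> list:
        # PGs the cluster currently maps to this OSD.
        out = subprocess.run(
            ["ceph", "pg", "ls-by-osd", f"osd.{osd_id}", "-f", "json"],
            check=True, capture_output=True, text=True).stdout
        mapped = {p["pgid"] for p in json.loads(out)["pg_stats"]}
        # PG directories actually present in the FileStore data dir.
        current = f"/var/lib/ceph/osd/ceph-{osd_id}/current"
        on_disk = {d[: -len("_head")] for d in os.listdir(current)
                   if d.endswith("_head")}
        # Directories with no matching mapped PG are cleanup candidates.
        return sorted(on_disk - mapped)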
A: All right, the next thing on the list was just, in general: how's life with Octopus? Has anybody actually moved large clusters? Positive or negative experiences doing it?
B: Not really Octopus, but, how to describe it: an issue with the kernel client and capabilities. When you're adding new pools to the file system, for example, the changes are not propagated to the current clients. We'll probably try to investigate this offline. On some clients it's picked up properly; on some others you have to remount the file system, which is kind of not very good, let's say.
A: Which kernel is that?

B: It's 5.7, or thereabouts.
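(A sketch of the workaround being described, assuming caps managed with ceph auth; the client name is a placeholder. The "tag cephfs" OSD cap covers all current and future data pools of the file system, which avoids per-pool cap edits.)

    import subprocess

    def refresh_cephfs_caps(client: str, fs_name: str) -> None:
        # Re-issue caps using the pool application tag so they apply to
        # every data pool of the file system, including newly added ones.
        subprocess.run([
            "ceph", "auth", "caps", f"client.{client}",
            "mon", "allow r",
            "mds", "allow rw",
            "osd", f"allow rw tag cephfs data={fs_name}",
        ], check=True)

    # Affected kernel clients may still need a remount to pick this up:
    #   umount /mnt/cephfs && mount -t ceph :/ /mnt/cephfs -o name=<client>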
C: Nautilus and the gateway? Oh yeah, I mean, just because tomorrow we're going to upgrade s3.cern.ch from Luminous to Nautilus. We've tested everything and it should be okay. We already run our second S3 cluster on Nautilus, so I know that it's fine, but I'm always worried about the RADOS Gateway, all of the region settings. I'm always worried that the region settings from one version don't work in the next version, but it's hard to test that.
D: We should count that as a good experience. I mean, we don't have massively heavy usage of our S3 endpoint, and I'm not aware of anybody making use of the region features or doing anything complicated with it. But there are certainly a lot of internal use cases where people are dumping, you know, log files and things, and then we have a growing number of genuine users that are trying to sync data across from their experiments, normally on campus, with various random things that they download off the internet because it works with Amazon. We ask them what they're doing, and it's "oh, I saw this and I downloaded it and it works fine," and you're like, okay. Well, there are several of them, and none of them had problems when we did it. So it's not exhaustive, but okay, thanks; sorry for the interruption.
D: While we're on the topic of, I guess, S3 upgrades: how well integrated is it with your OpenStack cluster? If you've got users on OpenStack, are there various things that they're using to basically automatically access S3 or Swift?
D: I guess the integration with Swift is better; that's certainly a hot topic for us at the moment, and people are actively adding the features to the cloud, because there's demand from the users. They've got projects within OpenStack and they want to be able to create containers automatically that would be in the S3 cluster. I don't know whether that... have I jumped ahead of the schedule with that particular question?
C: So the problem in the past was... okay, so yeah, we want the same thing. We have OpenStack, and we want users to be able to click on the object store button in OpenStack and manage their EC2 credentials, and also to be able to use Swift. Although they don't really want to use Swift; they want to use S3. So they want to be able to create EC2 credentials and delete them and manage their quotas, and things like that.
C: There's an option in RADOS Gateway to let Keystone do the S3 authentication, but the problem is that this adds an authentication hop to the RADOS Gateway for every single IO. So it slows things down a lot, and it hammers your Keystone.
C: So what we did was: Jose wrote some machinery that basically synchronizes all of the EC2 credentials from Keystone into our RADOS Gateway. It writes them as local users with local S3 keys. So then we do the authentication with the local plugin in the RADOS Gateway, and that makes it fast. But my understanding is that there's a proper way to do this now; I don't know how that works. I think there's a better way to do that kind of synchronization.
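(A toy sketch of that synchronization idea, not the actual tooling: each Keystone EC2 credential is written into the gateway as a local user with a matching S3 key, so the local auth plugin can verify requests without calling Keystone. The function and its inputs are placeholders.)

    import subprocess

    def mirror_ec2_credential(uid: str, access_key: str, secret_key: str) -> None:
        # Ensure a local RGW user exists for this Keystone user/project.
        subprocess.run(
            ["radosgw-admin", "user", "create",
             "--uid", uid, "--display-name", uid],
            capture_output=True)  # tolerate "user already exists"
        # Mirror the EC2 credential as a local S3 key on that user.
        subprocess.run(
            ["radosgw-admin", "key", "create",
             "--uid", uid, "--key-type", "s3",
             "--access-key", access_key, "--secret-key", secret_key],
            check=True)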
D: Yeah, I was just kind of looking through the various clusters that people have. I guess most people are using RBD for OpenStack, but I thought I saw somebody else has an S3 cluster as well for that. Oh yeah, S3 and RBD storage: that was Matthew Vernon. I don't know if he's on; he's not attached to this meeting, but he did put his information in the Ceph pad, or sorry, the pad.ceph.com thing.
D: Okay, well, we'll keep looking at that. Thank you though, Dan; that is useful, because we would probably have run into that problem, and now we can at least ask you what you've done, or for the details. So we can stop running into the S3 thing, because we're in the same situation: everybody wants to use S3 because that's what they've heard of, and even when you explain that there's better integration with Swift and it's approximately the same in terms of what it does logically, they're like...
C: Right, yeah, we have a lot of S3 users like that. I mean, our GitLab has like 70 or 80 terabytes on our S3 cluster, in two buckets, like 50 million objects.
D: And just on large numbers of files in buckets, because we've had that a couple of times: somebody's done something silly and they've written millions and millions of files into a bucket, and when they get really big they seem to be really difficult to manage. We run into them fairly often.
D: Unfortunately, most of the time when that happens it's because somebody's made a mistake, and you can often just delete the bucket. But I kind of worry that there might be a case where there's a lot of useful information in there and then somebody also writes something stupid. The bucket seems to start performing erratically, and then it takes a real effort to actually clean up.
D: Because you're having to do lists and things to find the objects that shouldn't be in there and to delete them. I don't know if anybody's had similar experiences with that, or, I mean, is that possibly something to do with the design of the cluster? Whether we need to put the shards on faster disks and things for it to be more performant.
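(For illustration, a cleanup sketch with boto3 that finds keys under a bad prefix and deletes them in batches of 1000, the DeleteObjects maximum; the endpoint, bucket, and prefix are placeholders, and a dry run is advisable before deleting anything.)

    import boto3

    s3 = boto3.client("s3", endpoint_url="https://s3.example.org")
    bucket, prefix = "shared-bucket", "mistake/"

    # Page through the listing and delete unwanted keys 1000 at a time.
    paginator = s3.get_paginator("list_objects_v2")
    batch = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            batch.append({"Key": obj["Key"]})
            if len(batch) == 1000:
                s3.delete_objects(Bucket=bucket, Delete={"Objects": batch})
                batch = []
    if batch:
        s3.delete_objects(Bucket=bucket, Delete={"Objects": batch})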
C: So, we have automatic resharding disabled, and one month ago we decided to reshard the GitLab registry bucket, which had 32 million objects in it. The users use rclone to synchronize that bucket onto another S3 just as a backup, and they found that after we resharded from 32 to 512 index shards, their backup process doesn't complete anymore.
C: They have a one-hour time limit, and they benchmarked the list operations: the lists are like ten times slower after we resharded. So I was reading through the Nautilus history, and I found that someone implemented some kind of optimization for highly sharded bucket indexes like that. Because the problem is that when you have so many shards and you do a list operation, the RADOS Gateway asks all of the shards for their first 1000 entries, because the list operation is paginated, 1000 entries at a time. So if you have hundreds of shards, you end up with hundreds of thousands of entries to sift through and then sort. But in Nautilus there's some kind of optimization for this, so that it doesn't send so much traffic around the network. I haven't tested it yet, but I'm hoping that it's better.
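(A small benchmark sketch of the listing pattern under discussion: each 1000-entry page below is one paginated request, and on a heavily sharded index the gateway may have to gather and sort candidates from every shard to answer it. The endpoint and bucket names are placeholders.)

    import time

    import boto3

    s3 = boto3.client("s3", endpoint_url="https://s3.example.org")

    start, count = time.time(), 0
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket="gitlab-registry",
                                   PaginationConfig={"PageSize": 1000}):
        count += len(page.get("Contents", []))
    print(f"listed {count} objects in {time.time() - start:.1f}s")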
C: So is that also automatic, or is there a flag that we need to set?
D: I'll have to check with Tom. It's interesting that we're not the only people to experience that, so it's not just related to the hardware or whatever, and there may well even be a fix for it, or an improvement at least.
A: So I was wondering how well the balancer is working for people, in upmap mode.
B: The same goes for the new feature, the resizing of placement groups (the PG autoscaler): it was also too aggressive. It's nice if you have small pools, so that you don't allocate too many placement groups at once, but when you have large pools it's typically better to do it manually and disable the autoscaler.
C: I just put it in there, yeah. I think the key setting is upmap_max_deviation, because the default is plus or minus five: you can still have OSDs that are 10 PGs away from each other, and the balancer says it's optimized. That's the default setting, so check that you have that one. Look back and see; you can do a ceph config dump to see what you have.
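(Following that advice, a quick sketch for checking and, if desired, tightening the setting; the value 1 is an example, and the option path applies to the Nautilus-era mgr balancer module.)

    import subprocess

    # Show the current value (the default in this era was +/- 5 PGs)...
    subprocess.run(["ceph", "config", "get", "mgr",
                    "mgr/balancer/upmap_max_deviation"], check=True)
    # ...and tighten it so OSDs are kept within 1 PG of the mean.
    subprocess.run(["ceph", "config", "set", "mgr",
                    "mgr/balancer/upmap_max_deviation", "1"], check=True)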
A: So, one other thing I had a question about: is anybody using ceph-csi? Not necessarily in relation to, you know, big clusters, but in general, with Kubernetes, integrating stuff with it?
C: They use it, and the only thing is that it was tricky: we had to tell them to set different options, like fuse options for the Ceph clients, for CephFS. It was tricky for them to set special options like fuse_big_writes, and the auto-reconnection settings that we like to customize.
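(A sketch of pushing those client options through the central config store instead of per-pod ceph.conf files; the option names are the ones mentioned, but whether a given ceph-csi version's fuse mounter honors them is an assumption to verify.)

    import subprocess

    # Larger fuse writes plus automatic reconnection of stale client
    # sessions, as discussed above.
    for opt, val in [("fuse_big_writes", "true"),
                     ("client_reconnect_stale", "true")]:
        subprocess.run(["ceph", "config", "set", "client", opt, val],
                       check=True)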
A: All right, if there's nothing else, thanks for joining. A little smaller group today, but I saw, and I remember, a few emails about people being out. Makes sense at the end of the summer; might as well take advantage of it.