From YouTube: Ceph Science Working Group 2022-01-26
A
All right, I think we'll just get started here. I'm Kevin; I'll give my quick 30-second spiel in case we have any new joiners who haven't been with us before. This is basically an open discussion, a kind of birds-of-a-feather-style meeting, held once every two months, of people who work in scientific computing or on big clusters, really just to chat about topics, bugs, whatever. There's no set presenter; I just try to keep the conversation moving through the topic list on the Ceph pad. There's a link to that in the chat of the meeting.

If you haven't done so yet, feel free to sign in on the pad if you want; if you don't feel comfortable doing so, you don't have to. It's just there for future reference, for tracking who joins us, and in case anybody wants to contact somebody with questions later. Also keep in mind that these meetings are recorded and posted to the Ceph YouTube channel a week or so after we hold them. That said, sorry about the late, one-day notice on this one.

It slipped my mind last week to send out the emails, and yesterday morning I realized: oh crap, I didn't do that yet. Even so, we still got a decent number of people to join.

So with all that said, I generally just start out with this: if there's anybody who hasn't joined us before and wants to speak up, please share some basic info about yourself, your institution, whether you do research or cloud or whatever, and your Ceph setup. Is there anybody here?
C
So we are providing a lot of data management services, storage services, and also other cloud services for researchers and scientific use cases, basically. We started using Ceph, I don't know, six or seven years ago, first for our OpenStack cloud, but then it grew into other use cases like S3 and CephFS. Currently we have five clusters with different servers and different storage types, like SSDs but also cold storage.

In total that's around, I don't know, seven petabytes or something like that, and we are currently in the process of buying new hardware, which will be around 18 petabytes of cold storage and 1.4 petabytes of NVMe storage.

So basically we are very happy with Ceph. Of course we also had some issues, some due to Ceph, but most of the issues were due to normal data center operations, like losing network for the whole cluster and things like that, which probably other people here have experienced too. On the whole we are quite satisfied with Ceph, and we are investing more resources and money into it, building bigger clusters and bigger infrastructures.
A
I don't have anything right off the bat here. Thanks for joining us, I'm happy to have you, and I think you'll find some interesting things from other people. People have also brought up very interesting problems they've noticed in their clusters that have helped others, so hopefully you get some value out of this. It sounds like you're pretty deep into Ceph, with the five clusters and the 18 or 19 petabytes' worth coming in a couple of months here. Very interesting.
C
Yep, yeah. I mean, we probably have the same problem as everybody else here: we need to provide a lot of storage very cheaply, and our budgets are somehow not increasing, but the demands are increasing very much, mostly due to all these deep learning things and stuff like that. So, yeah, it all needs to be put somewhere cheaply.
A
All right, anybody else new want to share for a minute or so?
B
...neuroscience and other fields, and we have a couple of HPC clusters, and Ceph provides the large-scale storage for them. Each of the clusters has about 30 petabytes of storage, and probably 99% of the use is CephFS; that's what most of the researchers are used to, file-based access.

We started very, very early with Ceph, back in the Hammer days. The very first cluster we put up was on Hammer, when CephFS wasn't even officially stable yet, but it worked well for us, better than some of the alternatives, and I think that's the cluster that grew into this 30-petabyte cluster we have now. We've survived lots of interesting things: we moved to a different building without bringing Ceph down.

We did that by moving servers slowly. We've had servers flooded, because we're in Manhattan, downtown, and at one point we were sharing the building with other people, with other people's plumbing above us, things like that, and we survived that as well.

So now I think most of our difficulties come from being large: we have over a thousand clients, and (I think I put an item on the list there) usually our problems are metadata: lots of small files, or lots of random access to files, and things like that.
A
Cool, thanks for joining. I know I've heard of Flatiron before.

Anybody else want to go ahead and say hi who hasn't been here before?
A
All right, guess not, so I'm just going to start running through the topic list; if you're on the pad, feel free to add anything to it. There are just a few usual topics that I go through, like: has anybody done upgrades in the last couple of months, major ones, or maybe even a minor one, that went particularly badly for some reason, or that went well?
F
Maybe I can just comment on something from a while back; I didn't write it on the pad yet. I mean, it was discussed before, and it was not actually clear what was going on: when restarting OSDs on, let's say, 1600 OSDs, there were typically the odd five that didn't want to start and had to be reinitialized.
It was not really clear why it happened, but the time it happened to me, write cache was enabled on the HDDs, and it might be related to that. So the bug is very, very confusing: hard to figure out how to trigger, and impossible to debug.
A
Yeah, that's interesting. I'm kind of wondering whether that's something in Ceph BlueStore that's corrupting RocksDB, or whether it could be a controller or even a hard drive, firmware-wise, that this comes from.
D
Yeah, it would be helpful to know which version the cluster is at.
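(For reference, a quick way to check this on a running cluster; the command has been available for several releases:)

    ceph versions    # per-daemon-type breakdown of the versions actually running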
D
Exactly. That reminds me of another issue we had on our biggest cluster, in the middle of December.
We got alarms from our monitoring system that the temperature was rising in the data hall. After we confirmed that there were problems with the air conditioning, and the data center provider told us they would not fix it in a short time, we decided to power off the cluster. There are about 6,500 OSDs in this one cluster, and we powered off and powered back on without problems. We're currently still running Nautilus, so it looks pretty stable to reboot the cluster.
A
Just curious: when you turned the cluster back on, how long did it take for it to sort itself out and have all the OSDs available?

I assume you set all the no-whatever flags beforehand to minimize any churn.
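(For anyone planning the same kind of maintenance, a minimal sketch of the flag dance around a planned full-cluster power-off; exactly which flags a site sets varies by taste and version:)

    # quiesce the cluster before shutdown
    ceph osd set noout
    ceph osd set norecover
    ceph osd set norebalance
    ceph osd set nobackfill
    ceph osd set nodown
    ceph osd set pause
    # ...power off OSD nodes, then monitors; power on in reverse order...
    # once everything is back up and in, clear the flags again
    for flag in pause nodown nobackfill norebalance norecover noout; do
        ceph osd unset $flag
    done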
A
I want to clarify one thing on that write cache: that's the hard drive's own write cache, not the controller's, even if you're using the drives in RAID mode or behind a SAS controller. It's not that cache, right?
E
Because, okay, I was particularly interested because, with more recent hardware deliveries at CERN, we had also experimented a bit with the drive cache. We actually changed our setting to write-through, and we ran some tests of restarting OSDs and even rebooting boxes and whole racks, but we had no problems whatsoever.
F
The interesting thing is that those drives, or let's say the OSD processes, were long-lived; they never crashed or anything like that, and then I restarted them after, let's say, two months of running. Not sure if this is related at all, probably not, but...
Well, in any case, when we do restarts again we'll recheck what's happening, and maybe report the experience on the mailing list, whether this reappears or not. Because, also, after disabling the write cache on the drives, the system actually works much, much faster for us.
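(For anyone who wants to try the same, a minimal sketch of inspecting and disabling the volatile on-drive write cache; device names are placeholders, and the setting may need to be reapplied at boot, e.g. via a udev rule:)

    # SATA drives: query, then disable, the on-drive write cache
    hdparm -W  /dev/sdX
    hdparm -W0 /dev/sdX
    # SAS drives: clear the WCE (write cache enable) bit in the caching mode page
    sdparm --get=WCE /dev/sdX
    sdparm --clear=WCE --save /dev/sdX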
B
You know, we had the same experience too: disabling write caching actually improved overall performance quite a bit. That, coupled with the fact that I wouldn't really trust drive firmware nowadays, because...
F
I'm not completely sure, I'm not going to swear to it. We did upgrade the kernel to 5.15 on the Ceph servers at the same time, from 5.11, which might be another factor, although it would be very strange for a kernel upgrade to do something like that. But after both of these, disabling the write cache and upgrading the kernel, the throughput went up by a factor of three or four in certain cases.
A
Another interesting thing I saw in the latest version of Nautilus (maybe I missed this before, and I assume it's in Pacific and whatnot as well) is a new erasure-code tool for manual object recovery from damaged PGs. A nice disaster-scenario tool, if anybody hasn't seen it yet; hopefully nobody ever has to use it, really.
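(The tool in question is ceph-erasure-code-tool. As a rough, illustrative sketch of the disaster workflow it enables, you pull surviving shards out of stopped OSDs with ceph-objectstore-tool and decode them back into the original object; the data path, PG id, object name, and decode arguments below are placeholders, so check each tool's help output before relying on this:)

    # on each surviving OSD host, extract this object's shard from the stopped OSD
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --pgid 7.1as0 '<object>' get-bytes > shard.0
    # ...repeat until at least k shards have been collected...
    # then reassemble the original object from the shards
    ceph-erasure-code-tool decode <profile> <stripe-unit> <want-to-read> <file-base>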
A
So somebody, looks like, wants to talk about multi-site: connecting several clusters with one realm, in the RADOS gateway sense, I assume.
D
Yeah, it's a bit of a horrible topic for us. I tested it back in Kraken, probably, or in Jewel, and there was a mess in the code back then. We're going back to this right now, and it looks better. The classic replication of objects probably works well, but if you try to do more non-classical moves, there are problems, or bugs that nobody has hit yet.
So I will probably skip to the next topic of mine. I created a poll, because for OpenStack clusters we're wondering how people in the community are using them: in a multi-tenancy environment, or single tenancy.
I would be thankful if you'd write a plus-one next to your configuration, because we are thinking about some changes to tenancy management in the RADOS gateways that are connected with Keystone, and we're wondering whether people are using multi-tenancy at all, or whether everyone has decided to use single tenancy. Because, as you know, lots of software supports the S3 protocol, but that software generally has no concept of multi-tenancy, so when you have multi-tenancy enabled in the cluster, there are problems with classic tools like goofys and s3fs.
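(For context, the Keystone-backed multi-tenancy under discussion hangs off a handful of RGW options; a minimal sketch, with the endpoint and role names as placeholders:)

    ceph config set client.rgw rgw_keystone_url https://keystone.example.com:5000
    ceph config set client.rgw rgw_keystone_accepted_roles "member,admin"
    # map each Keystone project into its own RGW tenant namespace
    ceph config set client.rgw rgw_keystone_implicit_tenants true

With implicit tenants enabled, buckets live in per-project namespaces and have to be addressed in the tenant:bucket form, which is exactly what generic S3 tools that assume a flat bucket namespace tend to trip over.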
A
If I understand correctly, you're saying it's the gateway itself: it's not really built to use the Keystone multi-tenancy all that well, or...?
D
It's about having granular permissions for buckets within the same tenant. When you have a multi-tenancy cluster, the RADOS gateway creates users in a metadata namespace keyed only on the project ID, not on the user ID, and we are thinking about changing this part of the code. I also saw some related work; I will pass it along later.
A
I think it's probably worth, like you said, putting it in a tracker and getting some more community input on how people could see it breaking backwards compatibility, since people probably do depend on the current behavior.
Whoever put in the next one, on professional support: do you want to speak up and give more details about that?
G
Sure. I seem to be having a little trouble with my camera right now; sorry about that. All right.

So we're just trying to figure out, you know, the best way to build in flexibility going forward and kind of cover our bases for some kind of failure scenario. We'll do our best to avoid failures, obviously, and it seems like Ceph is pretty good in a lot of cases at taking care of things itself, but there are also situations that do come up. So there would really be two use cases for support.
One would be, you know, trying to get some advice, in terms of seeing further down the road, or further into the Ceph internals, on best practices and how to avoid getting into a bad situation; and then the second would be, if you do start getting into a bad situation, how do you get out of it, right? So hopefully we won't get there, but you know, it's good to have some insurance policy; I guess that's kind of what this is. So I was just curious if folks had any experiences with any sort of outside professional support, what that was like, and if they thought it was worth whatever the cost was.
H
I guess I could speak up. I can't... oh, sorry. This is Graham from Minnesota Supercomputing. Let me find a camera so I can be a real person.
We have had some interaction with SoftIron, though I can't really tell you a whole lot that's useful, because it was a few years ago and we ended up ultimately not using them. Not that that was their fault; I think it was just that the project was kind of maybe misconceived at our end.
I think, at the time, we had been using Ceph for a long time, mostly as an object store, building an open-source cluster from our own hardware since, I don't know, 2015 or so, and I think it was seen by some management at the time as a way out: using SoftIron would be a way to, as you say, get out of the business of having to support and know it ourselves.
And we did actually get a small SoftIron cluster to test. Ultimately we didn't go with it, because we had developed a certain amount of knowledge about Ceph and had a reasonable amount of confidence that we could maintain it ourselves; and our main storage clusters, you know, we just grow them over time, we don't do rip-and-replace. And I think the logical fallacy was...
...the idea that you could actually bring in SoftIron hardware, add it into an existing cluster, and have them support it. Clearly that's kind of an insane expectation; it would have to be its own system. So it didn't make sense for us to rip and replace, you know, three or four petabytes with new hardware to go in a new direction, and that's kind of how we ended up not using it. In the end, I would say, at the time the hardware seemed cool; it was pretty new at the time.
This was about 2017, so maybe there were still some teething issues with some of the drive-replacement routines and the like. They seemed to have a lot of smart people on staff in engineering, so I felt like they were a pretty good outfit; it just didn't seem a good fit for us, ultimately. So I'm sure that's not really a whole lot of help to you, but...
G
I definitely appreciate hearing your perspective on that. You know, we're kind of in a similar boat: we already have about six petabytes raw, so rip-and-replace is not a great option for us either.
I've been trying to get some hard numbers, and I don't have them quite yet, but it does look like most of the support organizations have, at a minimum, some sort of raw-size-based pricing, and then a number of them also have count-based pricing on top of that (number of OSDs, for example, is a common one), with different price ranges. And then a lot of them, of course, want to sell you their own hardware stack.
H
Yeah, at the time SoftIron made a big deal of their ARM-based architecture, and it was definitely cool hardware, but ultimately I don't think their arguments about power savings added up for us. I mean, I think all the power is in spinning hard drives, and the power consumed by the CPU is in the noise. It seemed like the same overall power consumption per petabyte as our HPE Apollo storage nodes. So...
A
All right, thanks. Well, Jill, I'll share, since I'm assuming you probably have a similar data flow to me: you're with the National Solar Observatory, and I work on earth-observing satellite data processing, so I'm assuming you have a fairly steady stream of solar observatory data to ingest, similar to how we have a stream of always-incoming satellite data. The method I took for ensuring uptime, given, you know, the issues a massive cluster can have, was to split things up: well, I built a buffer cluster, basically.
Basically, all the data we ingest from the satellites goes directly onto, for us, about a half-petabyte cluster, and all the initial processing happens off of there; then, after 20 or 30 days, that data gets migrated to the, you know, 10-11 petabyte archive cluster. The idea is that if we have a problem with the archive cluster, we have this 20-or-30-day buffer.
That buffer lives on our smaller cluster, what we call our production cluster, so we're not affecting our contractual obligations for delivering satellite data or anything like that; we're still running even if our big cluster is down. And if the little cluster has a problem, it's easy enough to flip the ingest and processing streams over to work off of the big cluster.
It sounds like a good idea, and it's worked fantastically for us. We had a power event a couple of months ago that actually killed a bunch of 10-gig switches, and the big cluster was down for about a week, but the small cluster kept everything running: we were still ingesting and delivering our products to NASA and everything. Having that small cluster as the buffer there really saved us. Nice.
G
Yeah, some of the science calibration on some of the instruments has been completed, but it's going to start with, I think, four observing instruments, and then there's a fifth that will come online maybe a year or so from now, and there's potential for other instruments to be developed and deployed. So it will be a moving target over time, and since they haven't really been taking observations...
...quite yet, we're not sure what any of our patterns will look like, in terms of ingest from the summit or users searching and downloading data. But it does sound like having that small buffer cluster, as you're describing it, makes a lot of sense just from a structural perspective.
A
Yeah, I think it's worth it, and like I said, it sounds like you'll probably end up with a similar data flow to us.
E
Yeah, that one is actually me. This is an upgrade that we did this week at CERN, on our main CephFS cluster.

What we typically have there is four active MDSs and four standbys, and what we were doing in previous upgrades was just to follow the standard procedure of reducing the number of active MDSs down to one, restarting it on the new version, and then activating the other MDSs again.
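(The standard procedure referred to here looks roughly like this, sketched for a file system named cephfs with four active ranks; the daemon name is a placeholder:)

    ceph fs set cephfs max_mds 1        # collapse to a single active MDS rank
    # wait until only rank 0 remains active, then restart it on the new version
    systemctl restart ceph-mds@mds1
    # upgrade and restart the remaining (standby) MDS daemons, then
    ceph fs set cephfs max_mds 4        # restore the original rank count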
This time we stopped the four active ones in one go, with the standbys there to cover, and I think that overall the file systems were down for roughly four minutes, which is pretty good compared with what has happened in previous updates, because reducing the number of active MDSs typically takes a long time in our case, and then you start seeing slow ops coming in. So it was pretty good to be able to do this in one shot, stopping all the actives at once.
We think that if we lowered the memory limit for the MDSs, we could make the downtime even shorter, but it was quite successful. At the end of the day, it was the first time we were doing the upgrade this way; we had checked with some CephFS experts first, and tested on some test clusters before doing it in prod, and it went well. We will look into whether it's possible to turn this into a piece of documentation.
B
One question about that cluster, since we're also looking at this at some point; we're running mostly Nautilus. Is that on Octopus or Pacific?
E
Lately we see increased use from Kubernetes clusters, because it's very practical to consume CephFS via persistent volume claims, and I would say we're living at around 80 million inodes and 10,000 MDS sessions, more or less; those are the numbers we see on a normal day.
I can tell you right now that we are using multiple active MDSs for this very reason, and we're also pinning specific directories to specific MDS ranks to share the load across them. We know that it's not that widespread to have multiple active MDSs running at the same time, but this has been quite successful for us so far.
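(For reference, the directory-to-rank pinning mentioned here is done with an extended attribute on a CephFS directory; paths and rank numbers are placeholders:)

    # pin one subtree to MDS rank 0 and another to rank 1
    setfattr -n ceph.dir.pin -v 0 /cephfs/volumes/project-a
    setfattr -n ceph.dir.pin -v 1 /cephfs/volumes/project-b
    # a value of -1 removes the pin and returns the subtree to the default balancer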
E
Not yet; that is something we would like to do in the future. We are looking into it, but it has not touched production yet, and not on this cluster. We have another cluster running Pacific, which will be the first one on which we will enable snapshots.
B
So there are some active issues inside the MDS with snapshots enabled for kind-of-large caches. As soon as you have millions of inodes in the MDS cache, there are loops inside the MDS that iterate over every inode in the cache, and these get hit once snapshots are involved. We've been talking to the devs about this, and they've been looking into it for a month and a half now, maybe two months, but so far there is no solution.
When we ran into this we removed all the snapshots, but I think at that point the damage was done and the problems stayed. Take any kind of directory removal: before we created snapshots, removing a directory took, for example, 20 milliseconds.
F
Well, we have snapshots as well, but we have very few of them, and what we noticed is that if you have too many, each snapshot increases the latency because of the multiple lookups per snapshot. That's why I removed them, but I didn't actually notice a real performance slowdown.
B
I think there's a separate issue in the kernel client also relating to that; there's a tracker for that too. Certain operations trigger a busy loop inside the kernel workers, the ceph worker threads, so you see in top one of the kernel threads maxed out and things not progressing much.
Yeah, it's been outstanding for a long time, and right now I'm kind of rolling out workarounds: potentially modifying the client on our side, modifying the clients so that they don't do these operations, e.g. moving files out of the way into a holding area so that the operation is effectively instant.
On the MDS side, rolling out a fix would take longer. Actually, well, something could probably be fixed on the MDS side, but I kind of don't know enough about the MDS to start changing code there, and the client is more approachable. So again: yes, definitely.
A
Well, with Nautilus at end of life, any fix for this from the developers will probably take an upgrade to at least Octopus to get.
B
Probably. I mean, I would probably patch Nautilus for it, but the loops in question that are problematic look pretty much the same in Octopus; okay, the code has been reordered, but it looks pretty similar, so I don't think it's fixed there. Gotcha. We're looking at upgrading to Octopus, but we've been kind of consumed with this problem for the past month, so that put a hold on it. But I'm happy to hear from CERN that the upgrade went well; that gives us a plus toward doing it.
Does anybody run CephFS on Pacific successfully, or have any horror stories upgrading CephFS to Pacific?
That's what I wrote, and we kind of talked about it a little bit already. Besides the snapshots, the other issue is that we have jobs, especially machine learning, that access files in a very random fashion from a very large number of nodes, and we've had difficulties with the MDS keeping up; it really causes overall slowdowns in the system, since the MDS can only do a certain number of ops per second, perhaps tens of thousands, maybe 10,000, maybe somewhere in the thousands.
A
This might just be bad code.
I mean, if they're doing really random access, the people using these GPUs could just be opening and closing the file for every read they do, instead of holding it open, and causing too much load. If you're opening and closing files a thousand times every five seconds just to read a couple of kilobytes, that's just bad code on whoever wrote it, anyway.
B
Right, some of it is bad code, and some of it you can improve somewhere, right? But very often it's like a million files, and they pick a random one, read it, feed the GPU, then pick another random one from the million. The algorithm, something like stochastic gradient descent, is based on the idea of randomness, so doing this is actually the goal.
B
And
I,
and
probably
they
should
use
some
kind
of
database
or
something
other
than
that
we,
the
other
or
local
storage.
Sometimes
it
fits
in
local
search
and
then
we
usually
tell
them
that
why
don't
you
just
copy
them
to
a
local,
nvme
or
other
gpu
nodes?
We
keep
some
nvme
and
just
change
the
data
there
and
then
feed
the
feedback
in
that
way.
But
if
it's
bigger
than
the
under
me
yeah
and
then
that
that's
a
tough
one,
yeah
it'll
be
tough.
B
If
it's
a
very
random
reads:
yeah,
it's
very
random:
it's
not
it's
not
that
small
data
set.
Sometimes
it's
in
hundreds
of
gigabytes
in
total.
F
I'd be interested to know more about this, maybe offline. But, let's say, we also have typical HPC use cases where people are processing something like 10 million files in a relatively short time, and we didn't notice these slow times, those, let's say, complaints about the MDS. Well, for a particular user there might be a bit of a slowdown when doing ls or finding files and stuff like that.
A
I was going to say something like that too: if all this GPU work is on a certain data set, you could always pin that to a certain MDS, in the hope that it doesn't drag down the MDSs for everyone else.
B
I mean, the other approach that we've tried with some of these use cases was to mount an RBD read-only: set up the data set in a file system on an RBD image and mount that read-only, in which case there are no MDS operations at all. To some extent that worked, sometimes.
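(A minimal sketch of that pattern; pool, image, and mount point names are placeholders:)

    # one-time setup: build the data set into a filesystem on an RBD image
    rbd create gpu-data/trainset --size 500G
    rbd map gpu-data/trainset    # then mkfs, copy the data in, unmount, unmap
    # on each GPU node: map and mount the image read-only
    rbd map gpu-data/trainset --read-only
    mount -o ro,noload /dev/rbd0 /mnt/trainset   # noload skips ext4 journal replay

Many clients can share the same image this way precisely because nobody can write to it.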
A
Okay, yeah. Somebody wants to talk about, looks like, difficulties understanding RADOS gateway space usage on small objects.
So,
for
example,
instead
of
two
terabytes
images
of
100
kilobytes,
so
several
of
them
several
million
took
something
like
five
times
more,
which
was
pretty
pretty
strange
to
me.
But
anyway
this
might
be
a
minor
issue
and
come
follow.
It
probably
follow
it
up
later,
but
ssfs
has
a
lot
of
settings
for
striping
for
object,
size
for
for
minimum
object,
sizes
and
so
on,
but
I
couldn't
find
anything
relevant
for
for
the
gateway
or
the
object
store.
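(One plausible explanation for a roughly 5x blow-up like this, offered as an assumption since the pool layout wasn't stated, is BlueStore's minimum allocation unit combined with erasure coding: before Pacific, bluestore_min_alloc_size_hdd defaulted to 64 KiB, and every EC chunk gets rounded up to it. A worked example for a 100 KiB object on a hypothetical 8+3 EC pool:)

    per-chunk data   = 100 KiB / 8 data chunks        = 12.5 KiB
    stored per chunk = rounded up to 64 KiB min_alloc = 64 KiB
    actual on disk   = 11 chunks x 64 KiB             = 704 KiB
    expected on disk = 100 KiB x 11/8                 = 137.5 KiB
    amplification    = 704 / 137.5                    = ~5.1x

(Pacific lowers the HDD default to 4 KiB, which largely removes this effect for newly created OSDs.)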
D
Yeah, I'll keep it general here; if you want more information you can catch me on IRC. But generally there are, if I remember correctly, three main options relevant to that: one is about striping at the RADOS level, and allocation, and another important one is...
I have some drawings about it from when I was researching this, but I'd need to find them.
F
One other thing is a multi-site setup with clusters where one is on IPv4 and the other on IPv6. It's very difficult then to use them transparently, especially because both are used for, let's say, kernel mounting, and if you enable dual stack it doesn't work as expected.
A
All right, well, we're a bit past the hour here, so I suppose we should probably start wrapping up. A couple of things I wanted to mention quickly: it looks like Cephalocon is still going ahead in April.
Who knows what travel restrictions will be like from Europe to the US at that point; hopefully some of you can make it, and we'll see what happens. I did get a birds-of-a-feather session for Ceph in scientific computing and large clusters accepted, so we're going to do one of those on site. Hopefully people can attend.
The other thing: there was some interest in possibly rescheduling this to, like, a Monday or Tuesday at the same time. I threw a Doodle poll out on the pad if anybody wants to go fill that out. This time slot conflicts with the Ceph weekly leadership meeting, so there's some reason to maybe switch to a different day.
A
If
you
have
any
opinions
on
that
feel
free
to,
like
I
said,
fill
out,
the
doodle
poll
submit
it.
Otherwise
I
might
just
go
ahead
and
switch
it
to
tuesday
and
for
the
next
time
and
see
how
the
turnout
goes.
A
A
A
All right, yeah, we'll figure it out as we get closer. Anybody else have any closing thoughts, or anything quick we want to discuss or bring up?
All right then. Well, thanks for joining; we'll maybe see some of you at Cephalocon, maybe not. I'll let you know via the list about the next meeting.