From YouTube: Ceph Developer Monthly 2021-05-05
A: All right, looks like we've got a quorum; why don't we get started. So welcome to CDM for May. Today we've got two topics on the agenda. The first one is optimizing the OSDMap for client use. Kefu, do you want to take it away?
B: Yeah, let me start with the motivation. I explained it briefly over the mailing list, but I'll repeat it in this meeting.

B: So the problem is that, in a very typical use case on a fast cluster, for example an all-flash Ceph cluster, a rebalance or recovery completes very quickly, and the rebalance leaves the cluster with a lot of OSD maps. But some clients didn't get a chance to pick up these updated maps before that, so they're far behind on these OSD map updates.
B: So there are a couple of alternatives here.

B: And the second option, which is proposed by me, is to use finer-grained locks in the monitor, in hopes of having more parallelism in the monitor when it's serving, for example, the OSD map.

B: I think that's one use case; probably there are more, because we did this before, to update the mapping of PGs to OSDs. But Greg sent an email to explain his concern: inherently, the monitor needs to access the single RocksDB data structure both for writing, to update RocksDB, and for reading from it in the pre-processing stage. So to use a finer-grained lock is very scary.
C: So, do you mind if I ask a couple of questions? We're specifically concerned about the scalability of the monitor serving OSD maps to clients, right?

B: Yes.
C: Then this is not about locking; it's just a read. Greg's worried about trying to parallelize updates, which is not relevant here, and that also means that finer-grained locking is irrelevant here.

C: Secondly, we don't get faster with more monitors. So I disagree with your characterization that increasing throughput at the monitor is the more general solution. The more general solution is to reduce the work the monitor has to do in the first place, and in this case I think this is even easier than the original proposal makes it sound.
C: The solutions here are to modify the client to notice when it doesn't need the intervening maps and make a point of not asking for them, and secondly, to change the OSD map request pipeline on the monitor to be more parallel. That should be straightforward; I don't think there's any way that it could cause problems. The only exception would be that the monitor needs some kind of widely available, internally published notion of the most recent map, but even that can be allowed to be somewhat out of date.

C: Won't work? No, no! No, we can read them out of RocksDB; we don't have to hold them in memory. It's just that if a client requests, quote, "the most recent", unquote, map, we need to know what that is. That's the only piece of shared mutable state that this theoretical parallel OSD map serving process needs to worry about, and even that's allowed to be out of date, so it doesn't even need to be perfect.
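(A minimal sketch of the shared-state shape being described here, assuming nothing about the actual ceph-mon internals: the update path publishes the latest committed epoch through an atomic, any number of serving threads read it without coordinating with the writer, and staleness is tolerated by design. The types are stand-ins, not real monitor code.)

    #include <atomic>
    #include <cstdint>
    #include <map>
    #include <mutex>
    #include <string>

    using epoch_t = uint32_t;

    class MapStore {
      std::atomic<epoch_t> latest_committed{0};  // published by the update path
      std::map<epoch_t, std::string> db;         // stand-in for RocksDB
      std::mutex db_mutex;                       // stand-in for the kv store's own locking

    public:
      // Update path: commit the new map first, then publish its epoch.
      void commit(epoch_t e, std::string encoded_map) {
        {
          std::lock_guard<std::mutex> l(db_mutex);
          db[e] = std::move(encoded_map);
        }
        latest_committed.store(e, std::memory_order_release);
      }

      // Serving path: many threads may run this concurrently.  A slightly
      // stale epoch is fine; the client will catch up on its next request.
      std::string get_latest() {
        epoch_t e = latest_committed.load(std::memory_order_acquire);
        std::lock_guard<std::mutex> l(db_mutex);
        return db[e];
      }
    };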
A: And which part of the OSD map serving is expensive? Is it encoding the map, or is it fetching from RocksDB? We have the in-memory cache already.

C: I presumed that it was actually dominating the primary monitor box, so the goal here isn't so much to reduce the cost; it's to get it the hell out of the monitor update bottleneck.
D: Right. I think we've seen clusters where it actually is just that the map's big, and it takes a long time to encode and decode from the various bufferlists we have it in.

C: The xinfo, I believe, is always uninteresting to clients, if I recall correctly. Yeah, I think there's some other stuff too. That's true. So that's four distinct things we can do to solve the problem, but I think the biggest, most important one is that clients that don't need intervening maps shouldn't ask for them.
D: That's a thing we can do, but it doesn't help us with existing clients in the field, and I'm actually not sure it's the most important one in these degenerate cases, because a lot of the time when this is a problem, it's because there's a lot of cluster change happening and the clients are trying to get writes through.

D: Yes, yes, yes. But we can look at trying to parallelize answering OSD map requests, but...
A: What about: when does the client request the map from the monitor, versus getting the map automatically from the OSDs when it tries to talk to them?

A: I guess I thought it was only the case that it asked the monitor for the map if it wasn't currently doing I/O to OSDs, or was very out of date. Does anybody remember exactly?
C: For the client to know it's out of date in the first place, someone has to have told it. So in the case that it's an OSD that told it, the OSD is supposed to give it the maps it's missing at the same time. So we would need to know under what conditions this is happening to the client. Is it that the client sent a request to the monitor?

C: There are different ways to ask the monitor for a map; the one you're describing is only one of the behaviors that the monitor has. So it could be that the client remembers the map it last had, and every time it wakes up it always requests all the maps back to that map, whether it needs them or not.
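(A hypothetical sketch of the client-side fix being discussed; the threshold and names are illustrative, not existing Objecter/MonClient code. On wakeup, a client that is far enough behind asks for one full map at the latest epoch instead of the whole run of incrementals.)

    #include <cstdint>

    using epoch_t = uint32_t;

    // Assumed threshold; a real client would make this configurable.
    constexpr epoch_t kMaxIncrementals = 100;

    enum class MapRequest { Full, Incrementals };

    MapRequest choose_request(epoch_t have, epoch_t cluster_latest) {
      if (have == 0 || cluster_latest < have)
        return MapRequest::Full;            // no map yet, or inconsistent state
      if (cluster_latest - have > kMaxIncrementals)
        return MapRequest::Full;            // too far behind: skip ahead
      return MapRequest::Incrementals;      // normal cheap catch-up
    }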
B: You said it was Simon? Yes, he was actually the author of the PR I mentioned, in fact.
B: Okay, so let me summarize the discussion a little bit. The first thing we might need to do is to understand the behavior of the client: why it would need to ask for the incremental maps, even though we offer a protocol in which the client is able to ask for the full map when it is far behind the cluster's latest epoch.
B: Okay, I see you updating the Etherpad. So if this is the existing behavior of the kcephfs client, we need to fix it so that the client can ask for the latest map. That's very likely what is proposed in the pull request, but it's trying to add a setting to do this.
B: Okay, that's why I was wondering: why would he go through all the trouble of updating the monitor side instead of addressing the problem from the client? Funny, because he also updated the client to get his problem fixed. Anyway, I will talk to him offline to find out more details, and I'll update you guys over the mailing list.
E: Yeah, before we move to the next topic: I was just thinking the idea of the lighter-weight OSD map is also not bad, generally. We should probably explore whether that's needed or not, and how far we can get with that.
A: It depends on what it's using today, whether it's decoding those sections like the xinfo or it isn't.
D: I think there are some things it needs, but if the client doesn't need anything out of the extended OSD attributes, we could set the extended OSD attributes version to zero or one, or whatever is the minimal size, while maintaining the rest of the map at its current state.
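(A rough sketch of what that could look like; the real logic would live in OSDMap's encoder and these types are stand-ins. When encoding for a client, emit a structurally valid but empty extended-attributes section, so decoders don't break and the bytes cost almost nothing on the wire.)

    #include <cstdint>
    #include <cstring>
    #include <vector>

    struct XInfo { /* per-OSD extended attributes: down_stamp, laggy hints, ... */ };

    struct Encoder { std::vector<uint8_t> buf; };

    void encode_u32(Encoder& enc, uint32_t v) {
      uint8_t tmp[4];
      std::memcpy(tmp, &v, 4);
      enc.buf.insert(enc.buf.end(), tmp, tmp + 4);
    }

    void encode_xinfo(Encoder& enc, const std::vector<XInfo>& xinfo,
                      bool for_client) {
      if (for_client) {
        encode_u32(enc, 0);  // empty vector: field present, contents dropped
        return;
      }
      encode_u32(enc, static_cast<uint32_t>(xinfo.size()));
      // ... encode each entry for OSD-to-OSD traffic (elided) ...
    }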
B: I think we can just audit the behavior of the other clients; I think that's what kind of triggered this. We can then resume the work to understand what it is interested in, and if it's only interested in the first part, I think we can just skip the second part when serving it. But we cannot differentiate the client by looking at its entity, because, as Ilya mentioned...
B: For example, the ceph CLI uses a client, but it wants the full version of the OSD map. That might be an exception, but we have no way to tell whether it's the CLI or a kcephfs client.
C: ...our ability to correct this in the short term. But as far as I know, this is only a modestly sized problem, and if we need to do something drastic in the short term we can do that too. But we should definitely address creating a cut-down version of the OSD map with only the things the client cares about. Also, splitting the OSD map into a version encoded for OSDs and a version encoded for clients will actually make maintenance easier.
C: So I think we should do essentially all of the things discussed here, just perhaps not at the same time.
A: Yeah, I wouldn't say everything's highest priority, but we've seen one real problem, which is solved by the latest client change, and the further optimizations can improve the scalability of the monitors.
B: I'm trying to figure out the exact steps we should follow if we want to implement the client version of the OSD map. Before that, I want to have your opinions on these exact steps. If we want to go this way... I'm updating the Etherpad with the steps at the very end.
D: I think what you've outlined is a good approach. It's just that the details will depend on exactly what things are going to be in both the client and the server-side versions of the map. So identifying that very carefully, and then concluding "yes, the solution works" or "no, there's something a little trickier that we need to deal with", is, I think, the way to go.
C: Sorry, "versions" was the wrong word here; I meant different encoding types, yeah.
C: The monitors don't always have a cache, right? There's no reason to relate them; we would just cache two different copies. If we add that as a requirement, it creates dependencies between fields updated in the client OSD map and the ones updated in the OSD server's version. I don't think it's worth it.
C: That might actually not be true. It's possible that we might introduce a field that's interesting for both clients and servers, where the client is new enough to get the field but the server isn't, because we're in an upgrade situation. So it'll often be true, but I don't think it's strictly true.
B: Thank you, guys. I think that's it.
A: All right, so the next topic is... unless there's more on this one.
D: I was just going to make a note that Sam brought up a thing about using LRUs for answering OSD map subscription requests, and I'm actually not sure if that's a problem or not, but I would want to look.
A: In any case, it sounds like that's a lower-impact change than the other pieces that we discussed.
A: All right, so let's move on to the second topic, then. This is about adding the ability to blocklist entities by a label or by generation number, or some mechanism for blocklisting things across a whole site, but still being able to bring the site back.
I: Yeah, sure. So we sort of discussed this in the December CDM, so I'll start with the motivation here, which is covered in more detail in the mail sent to the mailing list.

I: So this is basically for disaster recovery, where there are two Kubernetes clusters, or in other words two client clusters, Ceph client clusters in that sense, that share a common Ceph storage endpoint, and applications that have RBD images or CephFS subvolumes mounted on the client clusters.
I: The intention here is that the recovery point objective is zero, that there is no data loss, because they're writing to the same Ceph cluster. So as long as you cut out one client cluster and then move the workload to the other client cluster, it should theoretically be able to continue from where it left off.

I: So that's where the requirement kind of comes in from. So initially we were looking at the existing blocklist features and whether we can reuse them.
I: In this context there are a few gotchas. We've been discussing this with Jason and Patrick; blocklisting did not fit, and plus we actually want a whole set of clients cut out and fenced from the cluster, not necessarily a single one.

I: So the volumes could still be mounted, or subvolumes could still be mounted, and I/O could still be going on, so we needed something broader to fence off an entire cluster from access to the Ceph cluster. Okay, so we were discussing that, and Jason came up with an idea, saying: okay, why don't we just...
I: ...knock out the credentials for the CephX identity that's being used to mount images from client cluster A, and that way we'll take care of everything: that cluster will be completely fenced until that particular CephX identity is reinstated or its capabilities are restored.

I: So there was a problem here. We did try that, and it basically works, but the problem is that there could be outstanding tickets which are still valid, and hence they also need to be invalidated or revoked.
I: And then Patrick also added a few other cases where there was some stuff... I don't know these cases very well; he's added the trackers in there. But the ceph-mgr and CephFS interactions needed certain glob-based fencing of certain identities. I'm not sure about the use case, so I don't want to talk about that.
I: I'd probably move it up. So, based on that, I wrote up this particular feature request, which is around blocklisting a CephX entity: either by a CephX entity name, or by a label, and all tickets lower than a particular generation number; that's for the third requirement, which I'm not going to talk about.
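(None of this exists in Ceph as described; purely as a hypothetical sketch, the requested feature boils down to a map from CephX entity name or label to a generation cutoff, against which every presented ticket is checked.)

    #include <cstdint>
    #include <string>
    #include <unordered_map>

    struct IdentityBlocklist {
      // "client.cluster-a" or a label -> minimum still-valid ticket generation.
      std::unordered_map<std::string, uint64_t> min_valid_generation;

      // Fence an identity: every ticket issued at or before current_gen dies.
      void fence(const std::string& who, uint64_t current_gen) {
        min_valid_generation[who] = current_gen + 1;
      }

      // Unfence by erasing the entry, which brings the site back.
      void unfence(const std::string& who) { min_valid_generation.erase(who); }

      bool is_fenced(const std::string& who, uint64_t ticket_gen) const {
        auto it = min_valid_generation.find(who);
        return it != min_valid_generation.end() && ticket_gen < it->second;
      }
    };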
I: So I just wanted to resync on this and kind of understand... The last time we discussed this there was no problem around feasibility, but then again we can discuss this with everybody, so I just wanted to bring it up again. So the first one is the ability to blocklist an entity, as in the tracker.
D: Sorry, I mean, we're talking about a geographically distributed set of stuff here. Kubernetes clusters one and two are going to be at different places, sending routed traffic to a Ceph cluster in a third place, or maybe one of those places. So it's going to be going through routers that can just drop the traffic.
C: No, no, no, hang on. That's not what that kind of blocklist is for. Okay, so we have two clients, for simplicity purposes: the "from" and the "to", right? The sequence of operations is: blocklist "from", swap to "to", start I/O on "to". What this guarantees is that, when the first I/O arrives on "to", anything that had been issued from "from" is either committed or not, and after the very first read that site two does, it won't see any further changes.
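(A minimal sketch of that fencing sequence using librados; blocklist_add is spelled blacklist_add on older releases, and the address string and setup here are illustrative, not a definitive recipe.)

    #include <rados/librados.hpp>
    #include <string>

    // Fence the old site, then (and only then) let the new site start I/O.
    int failover(const std::string& from_client_addr) {
      librados::Rados cluster;
      int r = cluster.init2("client.admin", "ceph", 0);
      if (r < 0) return r;
      cluster.conf_read_file(nullptr);
      if ((r = cluster.connect()) < 0) return r;

      // 1. Blocklist the 'from' client's address (0 = default expiry).
      if ((r = cluster.blocklist_add(from_client_addr, 0)) < 0) return r;

      // 2. Now the 'to' site may start I/O: its very first read cannot
      //    race with late writes from the fenced client.
      cluster.shutdown();
      return 0;
    }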
D: Yes, it's solvable, but it's a lot of work, and for a very specific use case. And maybe I'm wrong, and if we go away and look at it then the answer will be no. But, Sam, you're concerned about the cutoff point: I understand the theoretical concern you're raising, but that's just not a problem. It is, like, super not a problem: the queues aren't that long at the places that we would be blocking them from, and if we can just...
I: Only now our specification is changing to blocking a node in the Kubernetes cluster, fencing a node off, because that's what's being developed now. And that still has a mapping that we need to do, between the node identity that Kubernetes thinks of as the node and the actual addresses that the node is using to map volumes and subvolumes.
D: If someone writes this and submits patches for it, I'll look at them carefully, and if they seem maintainable then that's fine, but I don't think anyone's going to do it quickly.
I: So I have absolutely no idea about the complexity that's involved here, let me be honest. So yeah, we never got around to how difficult this is to do. If you're going to say that it is tough, then... are there other alternatives that you can think of here that can make this happen?

C: For that, I'm with Greg. This is, I don't know, a person-year, a person-year and a half, before...
A: Only blocklisting the entire cluster is much, much simpler than trying to do a blocklist and unblocklist of much finer entities and handling the manager failover kind of things.
I: Right, so among the three, the first one is what the Kubernetes side of things needs. But what you're saying is, even that...
E: So I just want to ask you a basic question. We already have the ability to blocklist clients right now. What if we have something in the middle? Let's say we don't have this ideal feature of blocklisting an entity or whatever; do we have something in the middle that takes care of this one-client-versus-multiple-clients case? For RADOS it is just going to be issuing multiple blocklists, for N clients versus one client.
I: So one of the prerequisites there is to know all the clients. And it's kind of funny, because the Kube cluster that you're actually trying to fence can still continue doing what it does. It could at least go and mount a volume on another node in its cluster, just because it thinks it's up and alive, whereas a higher-level management layer has actually taken the decision to move the workloads to the other cluster.
I: So, if I'm not wrong, blocklisting is for active connections, right? I mean, it's not for future connections. So it could just move from node one to node two, just because node one in that cluster is inactive, whereas, like I said, from the higher-level orchestration entity's view the entire cluster is inactive and the workloads have actually been moved off to an entirely different cluster.
E: Yeah, so basically the knowledge of the clients is not... you know, it can change.
A: Wouldn't there be other kinds of resources in the cluster that you'd want to prevent the applications from using?
I: Okay, so there are two layers here. One layer is a global traffic manager, which actually routes application traffic. So there is intra-cluster application traffic, where applications can talk to each other; but then, for applications that need disaster recovery...
I: The intention is to actually have a global traffic manager which routes traffic to these various applications, but that will be flipped: instead of going to cluster A it flips over to B, and the applications would be moved as a set. So, in Kubernetes terms, a stateless application that uses another stateful application, which basically means it has mounted storage: they will be restarted on cluster B, as I said, and the global traffic manager, the GTM, would be rerouted to cluster B.
C: It's the same problem for all of them. So fundamentally we're talking about block-level requests to RBD, right, or file-level requests to CephFS; it doesn't really matter, either way we're dealing with those via Ceph. However, if there were a more generalized S3-style available service that an application were using for its persistence, you'd have the same problem: you need to stop the previous, defunct version of the clients from updating that shared resource.
I: Multi-cluster Kubernetes workloads are not something that's prime time yet. The Kubernetes multi-cluster SIG is not necessarily...
C: So that raises a question: how does this work within a Kubernetes cluster? If you're migrating a stateful application from one part of a cluster to another part of the same cluster, why isn't this a problem then?
I: One of them is that it just waits five minutes for a node, which is the configured default timeout before the node is deemed dead, and an additional six minutes for the volume to detach.

I: But there was no mechanism in Kubernetes to actually inform storage that a particular node is dead and that it needs to be fenced.
I: So this has been a problem, and the spec for this is being written as we speak; I think it was rolled out a week and a half back. With it, the control plane will be notified that a particular node is dead, and that will subsequently notify the storage orchestrator, which will... yes.
D: It is possible to implement client applications and pods that handle this correctly, whereas it is not currently possible to implement Kubernetes multi-cluster setups that handle this correctly; I think that is the answer. How do you do the former? Yeah, a client can do things in its pods, like RBD volumes being fenced using the existing RBD fencing mechanisms.
D: What I would feel more comfortable with is that we could block, like, IP ranges, with some kind of epoch that... I...
C: ...better ones. But I think we have two different problems here. One is that we don't know the instance list, for whatever reason. I think we need to solve that problem via whatever means; it doesn't seem conceptually hard, and we can use the current unscalable commands that already exist to just do it, orthogonally.
C: Right, but that's not the immediate problem. The immediate problem is that this feature doesn't work in Kubernetes. It's not that it's not scalable; it's that it's not possible.
I: Initially we were thinking about the IPs, you know, just getting all the node IPs and things like that, but I'm not so sure. Within a Kubernetes cluster, nodes and pods having unique IPs is guaranteed, but across Kube clusters I'm not so sure it's guaranteed. So now the point is, if you blocklist a range of IPs and the other cluster uses those... I don't know whether that will work.
D: Okay, well, having told you that the earliest you could do any version of these things that you talked about is in a year: maybe we could do IP-range blocklisting much more quickly, if you're convinced it was useful. And I mean, you know, however much range is needed, like a slash-16 or whatever; we would probably...
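(At the time of this discussion, range blocklisting was a proposal, so the following is an illustration rather than existing Ceph code: the core of the idea is just matching a client address against a blocklisted CIDR range, IPv4 only here for brevity.)

    #include <arpa/inet.h>
    #include <cstdint>
    #include <string>

    // True if addr falls inside network/prefix_len,
    // e.g. ("10.1.2.3", "10.1.0.0", 16) -> true.
    bool in_blocklisted_range(const std::string& addr,
                              const std::string& network, int prefix_len) {
      in_addr a{}, n{};
      if (inet_pton(AF_INET, addr.c_str(), &a) != 1) return false;
      if (inet_pton(AF_INET, network.c_str(), &n) != 1) return false;
      uint32_t mask =
          prefix_len <= 0 ? 0 : htonl(~uint32_t{0} << (32 - prefix_len));
      return (a.s_addr & mask) == (n.s_addr & mask);
    }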
I: Yeah, the case of two separate Ceph clusters is a different one, where we don't need to fence off in this fashion, because, for example, both RBD mirroring and CephFS mirroring have a notion of primary and secondary.
I: The only use case that I know of is an external Ceph cluster shared by multiple OpenShift clusters... sorry, no: the only use case that I know of is an external Ceph cluster shared by an OpenShift cluster. This whole multiple-clusters thing is something we're attempting to do, so I don't have an answer to your question; that's what I'm saying.
I: Okay, so this CephX identity feature is going to take a long time. We probably need to... as I said, we probably need to figure out...
A: Or you can start with the basics of trying to use the existing blocklist commands, even trying to get the list of instances.
I: Okay, so it is the host that mounts. We're verifying one or two things; a couple of days back we had another meeting, and we're verifying one or two things around that, but it's the host that mounts, and for the pods it's just bind-mounted into the pods.
I: So we're not so worried about pod IPs yet, but we need to confirm that.
D: Yeah, so a lot of the RBD tooling that you're probably referring to is set up to do specific instances, because it was designed for OpenStack, where things were mounting as part of a QEMU/KVM process. But you can also just blocklist an IP, and it's stuck blocklisted. I think it might be a default 24-hour timeout or whatever, but you know, either you can refresh it, or it's stuck blocklisted until you unblocklist it.
D: So if you know the number of hosts and their IPs, that works. I don't know to what scale, but a large enough scale for many of the Kubernetes clusters that I've heard of being deployed.
I: So I'll keep that in mind. I mean, this is a little down the line; I'm just...
E: Yeah, I think the main thing to remember is: what is the scale we are looking at, and what is the timeline, right? We don't need to come up with the perfect solution in six months; we can come up with a perfect solution in, like, you know, two years. But the point is: what is the minimum viable product you're looking for, and in what time frame?
E: If you can get those straight to us, then we'll probably be able to guide better. At this point we're like: okay, this is a possible solution and this is the best solution, but we can pick something in the middle.
A: Thanks for bringing it up. Any other topics folks wanted to discuss?
B: Over the CDM... oh, you know, regarding collecting the slow requests in the manager. In the last discussion we mentioned that sending all the cluster log through could hurt the performance of the monitor; in the case when the cluster is suffering from a performance issue, we don't need to burden it with more load from processing the cluster log.
B: So last time our solution was to use the health protocol to collect a summary of the health reports, let the manager aggregate the summary, and then enable the manager to collect the details from, for example, the OSDs and the MDS daemons for the exact slow requests, so that the ceph CLI could collect all of them, or the dashboard could ask the manager for more details on demand.
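(A sketch of that split, with illustrative types rather than the actual ceph-mgr interfaces: daemons push only a small summary over the health channel, the manager keeps per-daemon counts, and the expensive per-op details are fetched only when the CLI or dashboard asks.)

    #include <algorithm>
    #include <cstdint>
    #include <map>
    #include <string>
    #include <vector>

    struct SlowRequestSummary {
      uint32_t count = 0;       // how many ops are currently slow
      double oldest_age = 0.0;  // age in seconds of the oldest one
    };

    class SlowRequestAggregator {
      std::map<std::string, SlowRequestSummary> by_daemon;  // "osd.3" -> summary

    public:
      // Called for each daemon's periodic health report.
      void on_health_report(const std::string& daemon,
                            const SlowRequestSummary& s) {
        by_daemon[daemon] = s;
      }

      // Cheap cluster-wide view, suitable for health output.
      SlowRequestSummary total() const {
        SlowRequestSummary t;
        for (const auto& [d, s] : by_daemon) {
          t.count += s.count;
          t.oldest_age = std::max(t.oldest_age, s.oldest_age);
        }
        return t;
      }

      // Which daemons to query for full per-op details, on demand only.
      std::vector<std::string> daemons_to_query() const {
        std::vector<std::string> out;
        for (const auto& [d, s] : by_daemon)
          if (s.count > 0) out.push_back(d);
        return out;
      }
    };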
B: But when looking into this problem in depth, I realized that there are some problems we need to solve. The first one is: what if the active manager instance fails? Because before, we were using RocksDB for persisting this cluster log, but the manager does not have this facility at the moment.
A: But yeah, I think the basic point is this: we don't need all this information for debugging, and so we're trying to summarize it, and I think part of that also means we don't even need it to persist, necessarily. It's helpful for understanding what went wrong in the moment.
A: But if the manager happens to go down, I don't think we need to worry about trying to collect this information again, because if the requests are still happening then they'll still be sent again, and if they're not, then there's not much more diagnosis that we can do at that point.
E: A few data points, you know, in different places; that should be a good starting point. Even at this point we log so many slow requests, but we're probably not looking at everything; we're probably looking at when it was initiated and how long it lasted, right? Yeah.
A: That reminds me: I actually had an idea when we were investigating a recent issue, where we were able to tell what was blocking the slow requests based on the dequeue latencies.
A: To make it easier to diagnose what ended up blocking things... right now it's very difficult to tell from the existing information what is causing a slow request. If we had a sort of cache of what the recent ops were for each shard, then when the dequeue latency has a spike, we could record that in the cluster log, or record it somewhere, at least in the OSD log, and find out what the most recently processed ops were, since they were probably the culprits of that large latency.
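(A sketch of that idea, as an illustration rather than existing OSD code: each shard keeps a small ring of recently dequeued ops, and on a dequeue-latency spike the ring is dumped to the log, since those ops are the likely culprits.)

    #include <array>
    #include <chrono>
    #include <cstddef>
    #include <iostream>
    #include <string>

    class RecentOpRing {
      static constexpr std::size_t N = 32;  // assumed ring size
      std::array<std::string, N> ops;       // compact op descriptions
      std::size_t next = 0;

    public:
      // Record every op as it is dequeued on this shard.
      void record(std::string op_desc) { ops[next++ % N] = std::move(op_desc); }

      // On a latency spike, log what this shard processed most recently.
      void maybe_dump(std::chrono::milliseconds dq_latency,
                      std::chrono::milliseconds threshold) const {
        if (dq_latency < threshold) return;
        std::cerr << "dequeue latency spike (" << dq_latency.count() << " ms);"
                  << " most recently processed ops:\n";
        for (const auto& op : ops)
          if (!op.empty()) std::cerr << "  " << op << "\n";
      }
    };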
E: Yeah, I don't think we want to keep all of them. We can keep a count of how many of them there are, right? Like, just the count of how many slow requests over time we have seen, if it's like...