From YouTube: Ceph Developer Monthly 2021-11-30
A: So, I don't know... Omri and Casey, do you want to kick it off?
B: So let's talk about our current work in tracing. We have created a tracer class using the OpenTelemetry client SDK; we moved from OpenTracing to OpenTelemetry.
B: We are able to create a new trace, to add child spans to an existing trace, and we can also enable the tracing at runtime; we want it to be disabled by default. We can also serialize and deserialize trace info in order to unify spans into a single trace.
B: Here is an example in the Jaeger UI. We can see eleven traces here, where each trace is an operation in the RGW, and the trace name consists of the operation name and some unique transaction id.
B: Let's talk about the actual traces that we already have in the RGW: every user request in RGW is being traced.
B: We currently have basic spans in each trace that represents a user request. We have three spans, including one for verifying permissions and one for the execute method of the operation, and we have tags on this trace that can be used to identify it, for example a bucket name or an object id.
B: And the second kind of trace that we have is a trace for the multipart upload operation, which is a trace that unifies several operations, I mean the subsequent operations, into a single trace.
B: We support it with serialization of the trace info; we deserialize it and build an object called a span context.
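As a minimal sketch of that idea (the field names and string encoding here are invented for illustration, not Ceph's actual wire format): the span context carries just enough identifiers to let a later operation re-attach its spans to the same trace.

```python
from dataclasses import dataclass

@dataclass
class SpanContext:
    """Minimal span context: enough to re-attach spans to a trace."""
    trace_id: str   # identifies the whole trace
    span_id: str    # identifies the parent span
    flags: int      # e.g. a sampled bit

def serialize(ctx: SpanContext) -> str:
    """Pack the context into one string, suitable for storing
    alongside an object (e.g. the multipart-upload meta object)."""
    return f"{ctx.trace_id}:{ctx.span_id}:{ctx.flags:02x}"

def deserialize(blob: str) -> SpanContext:
    """Rebuild a SpanContext from the stored string."""
    trace_id, span_id, flags = blob.split(":")
    return SpanContext(trace_id, span_id, int(flags, 16))

ctx = SpanContext("4bf92f3577b34da6", "00f067aa0ba902b7", 1)
assert deserialize(serialize(ctx)) == ctx  # round trip preserves the context
```

The point of the round trip is that the deserialized object is equivalent to the original, so spans created later end up under the same trace id.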
B: We want to compare several approaches. We want to compare tracing compiled out versus tracing compiled in but disabled, which is the default state, and we also want to compare tracing disabled versus tracing enabled, which is when most of the runtime cost happens.
B: On the tracing side, we also want to deploy the Jaeger components along with Ceph. We still need to integrate with cephadm to make it deploy the Jaeger containers if tracing is wanted; I mean, it's not supposed to deploy them always.
B: Only if you pass some flag to it. Jaeger also supports Rook deployment, and they have an operator, and we decided with the Rook maintainers to document the process of installing the Jaeger operator and deploying the Jaeger components, rather than adding code to the Rook operator to make it install Jaeger by itself. They have facilities to configure the Jaeger components and to configure the Ceph cluster to communicate with the Jaeger pods, so adding code to Rook is currently unnecessary.
B: We can benefit from it in multisite tracing, since that is very hard to debug, and we also want to make an integrated trace for the RGW and the OSD, like an end-to-end trace from the start of the operation until the OSD operations.
C: At the start you covered serializing and deserializing the traces. I didn't get that; can you explain a bit about it?
C: That was my second question as well, about how multipart uploading works. I'm not very familiar with multipart upload, so how do you make use of them there?
B: The span context is serialized into the meta object when we create the upload object for the actual upload, and when we want to upload a part of it, we read the upload object and, from it, the meta object.
B: And we take the span id, trace id and flags and convert them into a span context, which is used in the add_span method of the tracer class.
C: Okay, so it's like deconstructing the trace and then using that context to continue the trace.
D: Whenever you read it from RADOS, you get the serialized span. If you find it there, you deserialize it and you continue your trace from there.
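A toy sketch of that flow, following the names used in the discussion (the `Tracer.add_span` signature here is hypothetical, not Ceph's real API): each subsequent operation reads the stored parent context back and creates its spans under the same trace id, so all the multipart parts land in one trace.

```python
import itertools

class Span:
    def __init__(self, trace_id, span_id, parent_id, name):
        self.trace_id, self.span_id = trace_id, span_id
        self.parent_id, self.name = parent_id, name

class Tracer:
    """Toy tracer: add_span() attaches a child span to a parent context."""
    _ids = itertools.count(1)

    def add_span(self, name, parent_ctx):
        # The child inherits the trace id, so the UI shows every
        # multipart part under the same trace.
        return Span(parent_ctx["trace_id"], next(self._ids),
                    parent_ctx["span_id"], name)

tracer = Tracer()
root_ctx = {"trace_id": "abc123", "span_id": 0}   # stored with the meta object
part1 = tracer.add_span("put_part_1", root_ctx)    # read back and deserialized
part2 = tracer.add_span("put_part_2", root_ctx)
assert part1.trace_id == part2.trace_id == "abc123"
assert part1.parent_id == part2.parent_id == 0
```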
C: Okay. In OpenTracing we had extract and inject methods that stored the context and then used that context to continue the span generation at the other end. Is it similar to that?
B: The extract function is RGW-related; it's not related to the common tracer.
D: If you want to have this unified tracing across RGW and OSD, then we would have to serialize the information into somewhere in RADOS and send it to the OSD, and when the OSD gets the RADOS command from the RGW, it would look for this extra piece of information in the payload, and if it finds it there, it would reconstruct the span context.
C: Okay, yeah, that clarifies my doubt. I think even in OpenTracing we had these two methods, which were similar to the deserializing and serializing that we are doing here.
D: OpenTracing had the same functionality as well. I don't think we used it, though, in OpenTracing.
C: Yeah, we don't use it currently, and I am not very familiar with OpenTelemetry, so it's good to know they have that feature. And the multipart traces: has that been worked out, or is it not yet merged and still work in progress?
C: You are using the deserialization and serialization in the multipart tracing, if I understood correctly.
C: Okay, so does it also cross system boundaries, like RGW and OSD?
B: It's supposed to write the span context into a bufferlist. I'm not sure if the OSD uses the same bufferlist, I mean not the same one, but the same data structure to serialize objects.
D: It's the common tracer. We already made this change before the migration to OpenTelemetry, but it's common code, so the new OpenTelemetry code is already running in the OSD with the existing stuff. Regardless of serialization and those things, OpenTelemetry is already in the OSD.
C: And have you tested it on multiple nodes, or is the dev cluster limited to one personal work environment right now? I'm asking from a deployment perspective, because I have not tested on multi-node till now.
D: We didn't; we were just testing with vstart. I think Omri is working on cephadm integration, and once we have that, I guess we can test it manually on multiple nodes. Even now you just need to go and set up things manually, but once we have cephadm integration you can set up the agents and collectors in the back end on whatever setup you have. And this is also true for Rook; for Rook it will again be, well, not exactly manual, but similar.
C: Any difficulty you faced while working on the traces? Because from the past I remember there were cases where having two parent traces was leading to a deadlock condition, and I don't remember the details, but we stuck to having only one parent in the whole scope for an individual trace. Were you able to produce multiple traces in the same scope?
C: Maybe, yeah. I just wanted to ask whether you were able to have multiple traces while dealing with the same scope.
D: They start with the transaction start and they end with the transaction end, and we have the complete multipart trace, which starts with a multipart init, then goes through all the puts of the multipart, and then ends with the multipart complete, and we have them running in parallel. So at the same time you would see a trace for a put, a transactional trace for the put, but this same put...
D: ...operation is also part of a much longer trace, which describes the multipart from the init through all the different puts until the complete. And, correct me if I'm wrong, but I don't think we ran into issues with this, right?
C: Cool, yep. That's all I can think of off the top of my mind. Anybody else have anything?
D: As I said, Omri started to work on the cephadm integration.
D: Yeah, and for Rook there won't be any actual code going in; it will be just documentation inside the Rook manual for deployment with tracing.
C: Just to mention: have you guys thought about the design of the cephadm-based deployment?
C: Last time I was working a bit on it, I was thinking of having the different daemons deployed as sidecars, similarly to how we have the systemd units for other Ceph components. Is there anything specific that you have thought about, or is it still a plan for the future?
B: Yeah, the deployment via cephadm... I mean, cephadm has some options to deploy Prometheus, which is similar to what tracing needs for Jaeger. In cephadm they...
B: They have some services that live outside of the core, like...
B: ...external services, and we will deploy it in the same way, like Prometheus. I mean, it's supposed to be a service that the manager can identify and know whether it is running or not.
F: Hey Andre, I have a question for you: have you thought about making some of these measurements available to the users, for example via the Server-Timing HTTP header? I'm not sure if you are familiar with that, but you can add some headers.
F: So, just so you know, I'm currently exploring that, and that's where I think it's pretty useful, mainly for debugging and tracing purposes. It should probably be disabled the rest of the time, but at least when you want to debug or get some profiling or tracing, it might help.
D: I'm not sure that we need to do that. I mean, this is probably something you can do from the Elasticsearch back end; this is what Jaeger is using, they're using Elasticsearch. So if you write the correct queries there, you should be able to extract the data.
C: I think we would just need to embed the link for the query, and the Jaeger UI generally takes care of these, so we can display some of these metrics consistently in the dashboard as well.
G: I have a question, please: does it make sense to anonymize some of these metrics and have them sent through our telemetry module as well?
D: The main metrics that could be extracted from the traces are the latencies, the time it takes to perform operations; those are the metrics-like information. The other information in there is more debugging information: the tags and the logs are more like getting a very nice long log record that can hop across multiple daemons and across time and things like that. So those are the two functions, but from a latency-calculation perspective I think this could be very valuable.
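As a sketch of that metric extraction (the span fields and operation names here are invented for illustration): an operation's latency is simply its span's end time minus its start time, which can then be aggregated across traces.

```python
# Each span carries start/end timestamps; latency per operation is
# end - start, aggregated here as a mean per operation name.
spans = [
    {"op": "put_obj", "start": 0.00, "end": 0.12},
    {"op": "put_obj", "start": 1.00, "end": 1.08},
    {"op": "get_obj", "start": 2.00, "end": 2.02},
]

def mean_latency(spans, op):
    durations = [s["end"] - s["start"] for s in spans if s["op"] == op]
    return sum(durations) / len(durations)

assert round(mean_latency(spans, "put_obj"), 3) == 0.1
assert round(mean_latency(spans, "get_obj"), 3) == 0.02
```

In practice, as noted below, this kind of query could be run against the Elasticsearch back end that Jaeger stores its spans in.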
D: It's also related to what Ernesto said, so we probably should think about that. It's all in the Elasticsearch back end, so there is probably a way to fetch it from there and store it in the places where they show graphs and so on.
C: So all the details and timestamps and everything would be intact, and we can have them easily portable. So yeah, I really like that; it should come in handy.
H: Thanks, Josh. So there is a pull request that adds the ability to store the SSH key password within the config-key store, and I'm slightly worried that this might not be a good idea.
H: I'm not feeling so good about that possibility, honestly. What can we do about it? I mean, we could encrypt the mon database; that's one idea, with basically a static key. Does it really solve the issue? I don't know. We could try to hook up Vault into the manager. Would that be a good idea? I don't know. We could try to enable the SSH agent in the manager containers.
H: Would that be a good idea? I don't know. I'm looking for input from our group here.
H: So cephadm uses SSH, and right now there is a requirement that the SSH key that is used to establish SSH connections to remote hosts does not have a password. And this particular pull request adds the ability to have password-protected SSH keys, which in itself is a good idea.
A: There is a parallel in where we store the encryption keys for the OSDs: they're also stored in the monitors when using at-rest encryption, and I think we're kind of assuming that if users are concerned about the security of those keys, they're using full disk encryption on the monitors; and some others don't have control over their own block device or anything like that.
H: Then I would be good, right? Can we verify that there is at-rest encryption for the mon database?
H: I mean, if I remember correctly, you were pretty much okay with just storing things in the config database, right?
J: All right, yeah. I mean, the monitor is the root of trust, and so it kind of is what it is; I think at some level the monitor has to have that key. I think the question is making sure that the monitor stores it in a way that doesn't introduce the opportunity for users to accidentally expose it.
J: So I wouldn't, and I'm not the security expert here, but I wouldn't worry about things like encrypting the monitor database, because you have to decrypt it, and the key for that is going to be readable by the root user, and the monitor database is also owned by the root user and not world-readable, so whatever, it's all the same.
J: On the other hand, if you do ceph config-key dump, then it'll dump out a whole bunch of stuff that includes secrets. So we want to make sure that people don't accidentally expose that information, and understand that config-key is potentially sensitive. There is sort of a weird kludge that we added a couple of years ago where, if you have a monitor capability and you do allow r, meaning allow read access, that used to include reading everything in config-key. We changed that, so you can't read config-key...
J: ...unless you have the star permission; you have to have allow *, which only the admin user has. All the other daemons normally only have allow r and then a specific profile that has specific sub-commands or whatever, so maybe you can read specific subsets of it. And so only client.admin generally has access to read all of the keys, and that sort of closes it off.
J: But I think the danger is that config-key doesn't sound sensitive, and so users might not realize that it may contain sensitive information, in the same way that in Kubernetes there's something called a secret store.
F: I remember we had a similar discussion with Neha, you probably remember this, about the logging of the ceph config-key set commands, right? Those were basically printed in the log, so we were also revealing the secrets.
J: One option might be to just create an arbitrarily different namespace, called the secret store, that works exactly like config-key and might even have the same set of permissions, but it's just not called config-key, it's called secrets. You could do a secrets dump and it would dump all the secrets. And then we adjust things to store them under that. It's just an identical, equivalent, but separate namespace.
K: No, I think that was basically a security issue: we were dumping sensitive information in the logs, so we had to remove a whole bunch of things, especially, you know, values of keys. That's where the problem was; we are not dumping values. Yeah, go ahead.
J: As a practical matter, the get and set commands can either take the value as an argument in the command or as a standard-input type payload, and so I think the practical step that we need to make sure of is that whenever we're setting a secret, we pass the secret via standard input, so it doesn't get logged as an argument.
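As a sketch of the difference (the tool name and key name below are illustrative, not Ceph's actual CLI): a secret passed in argv shows up in process listings and in any command logging, while one fed through standard input never appears there.

```python
secret = "hunter2"

# Risky: the secret appears in argv, so it shows up in process
# listings and in any logged command line.
risky = ["mytool", "secret", "set", "ssh_key_pw", secret]

# Safer: argv carries no secret; the value travels on standard input,
# e.g. subprocess.run(safe, input=secret.encode()).
safe = ["mytool", "secret", "set", "ssh_key_pw"]

assert secret in risky       # visible in the logged command
assert secret not in safe    # never hits argv or the logs
```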
M: We could not dump the content of the key and just allow accessing the content of the key: if we get the keys with config dump, then you see a bunch of stars for those sensitive values, but if you get the key in particular, then you get the value, if you're afraid of just dumping everything and exposing sensitive data.
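A sketch of that masking behavior (the key patterns and store layout here are hypothetical): a bulk dump redacts values whose keys look sensitive, while a targeted get of a specific key still returns the real value.

```python
SENSITIVE_MARKERS = ("password", "secret", "key_pw")  # illustrative patterns

def dump(store: dict) -> dict:
    """Bulk dump: mask values of sensitive-looking keys with stars."""
    return {k: ("****" if any(m in k for m in SENSITIVE_MARKERS) else v)
            for k, v in store.items()}

def get(store: dict, key: str) -> str:
    """Targeted get: a deliberate access still returns the real value."""
    return store[key]

store = {"mgr/cephadm/ssh_key_pw": "hunter2",
         "mgr/dashboard/url": "http://x"}
assert dump(store)["mgr/cephadm/ssh_key_pw"] == "****"
assert dump(store)["mgr/dashboard/url"] == "http://x"
assert get(store, "mgr/cephadm/ssh_key_pw") == "hunter2"
```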
I: Instead of getting rid of it, implement another option, like a sanitized config dump: another flag or another parameter in that command, so that there's a dump option but then there's also a filtered dump option. Just an idea.
J: There's also the set command passing the value as an argument as opposed to an input. I guess that's the CLI... well, it shows up in all the logs, because when a manager module, for example, issues a command to set a secret, it triggers a command that shows up in the logs.
K: But yeah, that's a very stopgap solution. It's very easy for somebody in the future to introduce something new and log it somewhere. I mean, currently the code has explicit comments about why this has been removed, but, you know, we cannot rely just on that. So having something more foolproof eventually is probably not a bad idea.
J: The current config store has somewhat of a structure that's carved up hierarchically. So, for example, all the manager module stuff is sort of private within the manager/module-name namespace or whatever, and so if we carved off another namespace with a secret prefix, then it would complicate that hierarchy.
H: We are typically not storing the root key, the SSH key password; so we are storing the SSH key, but we're not storing the SSH key password, and I'm worried that someone is using his private key for it and then accidentally exposing his private key plus its password.
H: It's deceiving: if you're using a password for that key, then you would expect the password to be a bit more secure than the key itself, which is not the case.
H: I mean, I'm going to ask him, yeah, if he's aware that if he stores it there, then it's not a bit more secure than having an SSH key without a password.
J: And if we do merge it, then the documentation that explains that you can do this should be very clear that they're stored in the same place: the key and the password that goes with it.
A: The next one is around detecting inactive or out-of-date communications with monitors. Ernesto, do you want to bring this one up, or maybe Alfonso?
F: Yeah, this is going to be presented by Pere, if you want to go with that, but, I mean, just for context: I think this has been discussed previously. Basically, we got reports from situations where at least two of the monitors of, you know, a three-monitor cluster go down or become unresponsive or whatever, and basically, from the manager, everything will feel like the cluster is going on.
F: It's running, right? Because, and Pere will probably explain this in more detail, we will still receive reports, monitor reports and signals, that are falsely indicating that the cluster is still healthy. So Pere, if you want to go with that, okay.
N: So that's the problem that we try to fix, and yeah. I would think that the monitor should recover from this, obviously, but we went for a quick fix on this, which basically is: when we receive a log message, since we are still getting messages from the monitor, we get the last log message and we store the timestamp from this log.
N: If this timestamp is too far away, we notify the modules and we just stop the dashboard and the Prometheus module, basically. But I think there should be another fix for this, like, for example, a heartbeat from the manager to the monitor, or pinging from the manager to the monitor; maybe also the monitor shouldn't be sending messages to the manager, that's another thing. But yeah, overall I think the monitor should recover, right? And I don't have that much knowledge about the monitor, to be honest.
F: Yeah, I guess so. Eventually, this is quite a corner case, but the thing is that if it happens, all the data that you will get, at least in Prometheus and Grafana, will be completely misleading, because you will see the cluster in one status rather than in error, and probably lots of stats will be misleading as well. So yeah, the log message that Pere uses here is a kind of heartbeat, but the question is whether we can actually send a real heartbeat, or actually detect whether there has been one.
A: All right, I was asking about that because there are at least two components to this. One is not being able to see in the dashboard, when your monitors are out of quorum, ...
A: ...what the status of the monitors is. And the second piece, it sounds like, is that you're still getting updates in Prometheus and Grafana, and therefore in the dashboard, from the monitors that are out of quorum, but they're sending you out-of-date information.
F: You're detecting that by a notify event, yeah. The thing is that the mon map and everything regarding the monitor in the manager is cached, so basically we have a, you know, outdated picture of the cluster. So, probably, I'm not sure whether to stop sending messages or maybe to have a TTL in the maps, so we can detect whether they are outdated or something. There are multiple ways of doing this, but yeah.
F: The thing is that we didn't want to break the existing modules either, because, I mean, I don't think the modules are prepared for this situation, so breaking the whole manager API in case this situation happens might cause a lot of problems, and we don't want to do that. So basically we just wanted to deal with the dashboard and monitor situation, because, well, I don't want to imagine what might happen otherwise.
K: So, just so that I'm understanding this correctly, from the end user's perspective: if we just stop the dashboard module, what do they see? Do they see the last status? From their perspective, what changes?
K: And when the monitors do form quorum again, let us say, it just automatically checks the last time the manager got a message from the mon, and it will start the dashboard again; that's how it works.
J: So I think the simple workaround would just be to have some thread or loop in the dashboard that waits five seconds and then issues a status, then waits five seconds and does it again, and if you ever find that the most recent result you got is more than so many seconds old, then it's stale.
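A sketch of that watchdog (the class name and the 30-second cutoff are invented for illustration): record the timestamp of the last message received from a monitor, and treat the cached cluster view as stale once it is older than the cutoff.

```python
import time

STALE_AFTER = 30.0  # seconds of mon silence before we distrust the view

class MonWatchdog:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._last_seen = clock()

    def on_mon_message(self):
        """Called whenever a message/log entry arrives from a monitor."""
        self._last_seen = self._clock()

    def is_stale(self) -> bool:
        """True if the cached cluster view should no longer be trusted."""
        return self._clock() - self._last_seen > STALE_AFTER

# Simulate with a fake clock instead of sleeping.
now = [0.0]
wd = MonWatchdog(clock=lambda: now[0])
assert not wd.is_stale()
now[0] = 31.0            # 31s of silence from the mons
assert wd.is_stale()     # e.g. pause dashboard and Prometheus exports
wd.on_mon_message()      # message received again (quorum restored)
assert not wd.is_stale()
```

Using a monotonic clock avoids false staleness when the wall clock jumps.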
A: These days... like, in the past we had to go to the admin socket to figure out the quorum status if the mons weren't in quorum, right?
K: I guess, yeah, if you want to conclude something out of this: the motivation with which this PR has been opened is absolutely fair, and I think we should do something about it. Maybe give us some time to take a look at the manager beacon mechanism to see if there is something we can use there. If not, we can come up with something more minimal like this, what you have, but I'm just not sure how reliable this solution is going to be.
A: Okay, the next topic is around Docker for a development environment.
N: So some of you might know that we have been developing a new development environment. It is basically a Python script that runs a Docker container, and inside this Docker container we use cephadm to deploy other containers. Well, we've had some problems here. The first deployment issue was fixed, so that was nice; I think I still have to talk with Sebastian about it.
N: The problem here is that on my computer it works just fine, but some teammates of mine have been having a lot of trouble with this, because of cgroups v2 and also because we are running this first container with privileges. So yeah, we are having problems with that, and the solution that we think will work is making the container run unprivileged.
N: Then we can maybe use Podman for this, but I'm not sure; I haven't looked at it that much. Also, we wanted to know if it was maybe possible to add loopback devices to the ceph-volume inventory, so they are available there and, basically, in the end, we can run the dashboard CI tests with loopback devices.
N: So yeah, this is basically the current status of the Docker environment. We are still having some trouble with it; on my computer it works, but I would suggest, if some of you want to test it, be careful with it, because I think Pere also crashed his computer with it. Apart from this, I wanted to show you a demo, guys. Okay, so here, this is the folder where we store the Docker development environment.
N: So basically you can check the cluster running with this list command, and, as you can see, there are two kinds of containers: the seed one is basically the one that holds cephadm and all the other containers, and the others are just SSH servers, so we can use them as hosts. And if we run this, we enter the first container, this one, and here we can see that we have cephadm.
A: Nice, it looks quite easy to use. Is the idea that it uses an existing container image? And for dashboard development purposes, how do you make your local changes to the container images?
F: Yeah, this helps with the Python development, basically. We haven't thought yet about the rest; I mean, I guess you would first need to build the container, but there might be a way to mount also other parts, not only the Python ones. These mounts work... I guess we should extend cephadm to mount these other directories as well, but eventually this might also work with the rest of the services and daemons.
J: I think this looks pretty useful; it makes a lot of sense. I think the problem before was just that when the original pull request merged, it broke a bunch of stuff, so we need to make sure we don't break things this time. But you mentioned that there were specific things that you're still blocked on; that was, I guess, loopback devices?
N: In the end, I think you made a pull request adding the no-tmpfs flag to lvm activate, so with that it started working, and I also had to change the storage driver for Docker for it to work; I think that was part of the problem, I don't remember exactly. But my main concern here is that if you run the Docker container with privileges, there might be a lot of unexpected behavior.
F: Yeah, the challenge here is running both a container inside a container and also systemd inside the container; the mixture of these two things is kind of a challenge. I saw some presentations from Red Hat showing that with Podman they are now able to do this also in rootless mode, because, yeah, in the past it also required privileged mode, but it seems like it's working rootless now.
J: I guess it feels to me like this is a road that is going to end at some point: the further down this road you get, the more of these annoying issues you're going to run into. I mean, the overall goal is to have an environment that is like a real cluster and is easy to work on and whatever, and then you can get...
J: Yeah, I mean, there is a cstart command. I don't think anybody really uses it, but it basically just runs cephadm, and it can, and probably should, be modified to have that magic argument that passes your source code directory into the container, so you can make running modifications. That was the intention: to essentially replace, or have an alternative to, vstart that just uses a real cephadm cluster running on your local machine.
J: But that seems like challenge number one. I'd be curious... obviously the multiple-host part is going to be a challenge, although you could probably just start up some VMs on your local machine and then add those to your cephadm cluster too; that might be an option.
J: I think the other half of this is having things that look like real disks and behave like disks but aren't real disks, and I'm not sure that loopback devices are necessarily the best way to do it. They're sort of ignored by ceph-volume for confusing reasons that I don't know exactly, so I would be hesitant to just allow them without understanding why they were excluded in the first place.
J: But the one thing I'll mention is that an alternative to that is the NVMe loopback support in recent Linux kernels: it lets you create an NVMe device, like /dev/nvme0 or whatever it is, backed by a file.
J: And those actually do behave exactly like a normal block device. The only way you can tell that they're not real is that the vendor, instead of being, you know, Seagate or whatever, is Linux; the vendor string looks different, but they look and behave like real devices. So that's what we started using for teuthology, so that we can test all the orchestration features around creating OSDs.
J: They look like real NVMe devices; the catch is that you need a pretty recent kernel, so, for example, Ubuntu 20.04 is not sufficient.
J: You need to use the hardware enablement kernel, which is a little bit newer than that but still supported or whatever, and they're slightly annoying to set up. The tooling... there's some Python tool that somebody wrote that's not available on Ubuntu, and so the teuthology task that sets these up just fiddles with sysfs to do it manually, which is pretty tedious but can be scripted, and has been scripted. I can point you to the teuthology task.
A: I think on the teuthology utility side we're close to being able to run tests: there are scripts that have been written to set up all those teuthology services you need to run on your local machine. So I think the only missing piece there is adding support for adding worker nodes, like a VM or something, to be able to actually run the tests against your local container builds.
A: So we're getting closer to being able to have that all-singing, all-dancing local deployment, where we can actually test, with all the kinds of tests we run, on your local machine.
A: All right, so the last topic is around the autoscaler and pg_num_max. The general idea is around how to work with metadata pools with the scale-down mode. Do you want to talk about this?
O: Sure. I guess the motivation behind this PR is that, if you're familiar with how the autoscaler works, it scales up the PGs on a pool based on the capacity ratio: the higher the capacity ratio, the more it will increase the PG number. But with scale-down...
O: What we do differently is that we take into account the capacity ratios of all the pools and see if there is one pool that is using more than the other pools, and we kind of give... okay, sorry. Initially we maximize the number of PGs; we call it the PG complement, and that's calculated by multiplying the OSD count, the number of OSDs, by mon_target_pg_per_osd, which is normally 100.
O
So it starts by giving a lot of PGs to all the pools, and then, if one pool is using more PGs than the other pools, that pool gets more PGs, and whatever is left you distribute to all the other pools that are not using as much.
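The starting point described above can be sketched roughly like this — a deliberate simplification for illustration; the variable names and the even split of the remainder are assumptions, not the actual mgr code:

```python
# Rough sketch of the scale-down starting point: the total PG budget
# (the "PG complement") is osd_count * mon_target_pg_per_osd, pools
# using more than an even share get PGs proportional to their usage,
# and the remainder is spread across the lighter pools.

def pg_budget(osd_count, mon_target_pg_per_osd=100):
    return osd_count * mon_target_pg_per_osd

def distribute(pools, osd_count):
    """pools: {name: capacity_ratio}; returns {name: pg_share} (pre-rounding)."""
    total = pg_budget(osd_count)
    even = total / len(pools)
    shares = {}
    light_pools = []
    for name, ratio in pools.items():
        want = ratio * total
        if want > even:           # a pool using more than an even share
            shares[name] = want   # gets PGs proportional to its usage
        else:
            light_pools.append(name)
    remaining = total - sum(shares.values())
    for name in light_pools:      # the rest split what is left evenly
        shares[name] = remaining / len(light_pools)
    return shares

# One busy data pool and two nearly empty default pools on 10 OSDs:
shares = distribute({"data": 0.6, ".mgr": 0.0, "meta": 0.0}, osd_count=10)
```

Note how the empty default pools each end up with a large share of the budget here, which is exactly the over-scaling problem described next.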
O
But the problem is that, with the scale-down mode, when you have a default pool such as device_health_metrics or the .mgr pool, you will scale that pool up by a lot because of how the scale-down algorithm works. Then, if you wait a bit and deploy all the other services,
O
the first pool you created, which has the most PGs, will start scaling down, because the autoscaler detects there are all these other pools and needs to adjust to give the correct number of PGs to each pool. That can take a while, and that can be a problem.
O
It can take an hour or more, depending on how many PGs it needs to scale down by. So I think this PR tackles the issue of having too many PGs on one pool.
O
I guess a pg_num_max value would make it so that the pool respects the boundary that is set — it won't go above a certain amount, so it doesn't cause the issue of the time it takes to scale down. Because scaling down PGs is much slower than scaling up: scaling down you do one at a time, but scaling up you can do all at once.
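Applying such a cap amounts to clamping the autoscaler's computed target. A hypothetical sketch — the rounding-to-a-power-of-two detail mirrors the autoscaler's usual preference, but the function names here are illustrative, not the actual mgr code:

```python
# Sketch: bound the autoscaler's raw target with a per-pool pg_num_max,
# after snapping to the nearest power of two (the autoscaler's usual
# preference for pg_num values).

def nearest_power_of_two(n):
    """Power of two closest to n (at least 1); ties round down."""
    n = max(1, int(n))
    lower = 1 << (n.bit_length() - 1)
    upper = lower << 1
    return lower if (n - lower) <= (upper - n) else upper

def target_pg_num(raw_target, pg_num_max=None, pg_num_min=1):
    target = nearest_power_of_two(raw_target)
    if pg_num_max is not None:
        target = min(target, pg_num_max)   # the cap discussed above
    return max(target, pg_num_min)

# A small pool whose raw share works out to ~600 PGs, capped at 32:
print(target_pg_num(600, pg_num_max=32))  # -> 32
```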
O
And if we look at the pull request, there's a discussion about it with Sage and Josh, and I kind of agree on the path we could take: having a small flag for any metadata pools, so that in the scale-down calculation we can prioritize that specific pool first, so that it behaves like scale-up mode — it would only scale based on its capacity ratio.
O
And then you subtract that from the PG complement — the total PGs that we would distribute among the other pools. So I can see how it fits into the scale-down calculation, and I think I agree with Josh's latest comment on the PR about how we'd do it.
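The proposed two-step calculation — size the flagged metadata pools by capacity ratio alone, take that off the top of the budget, then distribute the remainder among the remaining (bulk) pools — could be sketched as follows; the pool names, the small floor for empty pools, and the even split are assumptions for illustration:

```python
# Sketch of the "flag the metadata pools" idea: flagged pools scale only
# by their own capacity ratio (scale-up-like behavior), their PGs are
# subtracted from the budget, and the rest is shared by the bulk pools.

def distribute_with_flags(pools, flagged, budget):
    """pools: {name: capacity_ratio}; flagged: set of metadata pool names."""
    shares = {}
    for name in flagged:
        # Capacity-ratio-only sizing, with a small floor so empty
        # metadata pools still get a handful of PGs rather than zero.
        shares[name] = max(pools[name] * budget, 4)
    remaining = budget - sum(shares.values())
    bulk = [n for n in pools if n not in flagged]
    for name in bulk:
        shares[name] = remaining / len(bulk)   # simplified even split
    return shares

shares = distribute_with_flags(
    {"cephfs.data": 0.5, "cephfs.meta": 0.01, ".mgr": 0.0},
    flagged={"cephfs.meta", ".mgr"},
    budget=1000,
)
```

Compared to the plain scale-down sketch, the flagged pools now start small and never trigger the slow post-deployment scale-down.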
K
So, Junior, for folks who haven't followed the comments, can you summarize the idea? Is it to have different modes — for smaller pools, or pools that we know are not going to consume much capacity, we just use the scale-up mode, and for everything else we use scale-down?
O
Yes, yes — but.
K
I mean, how hard is making that classification? That small flag probably only applies to something like the default pool, right — the .mgr pool?

O
Yes — but we still have to handle all the other services that have metadata pools; we need to somehow incorporate the flag into pool creation, I think, in order for the autoscaler to detect if —
K
You're trying to say, like, when you use a service — let's say an RGW-based cluster — if there is a small RGW pool, then RGW's pool creation would use this small flag and use scale-up mode, and the data pools would scale down? Yes. And in general, these classifications make me a little nervous, because it's almost using two different modes, right?
O
So, in my thoughts, the profile of the autoscaler is kind of global. I didn't want to have some pools that are scaled up and some pools that are scaled down; it should be all the same — all scale-down — but with this small flag, it kind of adds to —
O
It only applies to scale-down mode. The small flag would only help prioritize how you distribute the PGs — the calculation would prioritize the pools that are small, and those would only scale up based on the capacity ratio, but it wouldn't, you know —
O
If you run `ceph osd pool autoscale-status`, it wouldn't say that that pool is in scale-up mode; it would remain in scale-down mode, and all the other pools stay in scale-down mode too. It's just that the small flag helps prioritize the calculation, so that we don't have the problem of scaling up too much under scale-down mode.
K
So why can't we achieve this with pg_num_max? What you're describing as a small pool — if we have an upper bound on how many PGs that particular pool can consume, then that's almost doing the same thing, right? So you might want to —
A
The reason we brought up the idea of the flag, instead of just the max value, was that there are a bunch of other settings that we often apply to metadata pools. We could make this more general than just the pg_num — applying things like the recovery priority that we set by default; the different tools that create RGW or CephFS pools today set a few different values on the metadata pools.
A
We could make those kinds of settings a common set implied by the small flag.
A
The other piece to consider — and it's detailed a little bit in the etherpad — is the upgrade path. I think the idea would be that when you upgrade, you keep all the pools in the current mode, the scale-up mode, and then document that when you're turning on the scale-down mode you would want to set this flag on your metadata pools. We wouldn't have a good way to do that automatically for existing pools.
A
Unfortunately,
then,
but
if
for
new
clusters,
they
could
use
scale
down
out
of
the
box
and
set
the
flag
on
metadata
pools
that
are
being
created
for
the
places
that
we
can
change.
Creation.
J
A couple of things. I wonder, instead of calling it a small flag, whether it should be a scale-up flag — so the flag is named for what it actually controls. Well —
A
If
we're
we're
talking
about
it
being
more
than
just
like
product,
auto
scaling,
but
also
for
like
recovery
priority
or
like
other
things
related
to
metadata,
maybe
we're.
J
Yeah,
okay,
that
was
one
thought
the
other
one
was.
Maybe
it
might
make
sense
to
flip
it
this
around,
and
so,
instead
of
having
to
like
the
special
pools,
be
marked
special,
have
the
the
bulk
data
pools
that
we
actually
want
the
scaled
down,
behavior
mark
them
like
bulk.
A
I guess from an upgrade perspective, if you don't have a bulk setting and we try to apply this profile — prioritizing recovery and the other pieces — we probably wouldn't want to do that to a bulk data pool.
J
When is this scale-down mode intended to be the default? Is it already the default in Pacific, or did we revert that? I can't remember.
K
Yeah, maybe we should take a call on this before Quincy.
J
The worst-case scenario if a pool scales up instead of scaling down is that some data moves as you fill it up — not a big deal compared to these very small pools generating a bajillion PGs and then having to very slowly collapse down on themselves. That Pawsey cluster has been working for days on taking one of the early pools that has a bazillion PGs and slowly ratcheting it down, now that some other file systems have been created — and it's a totally empty pool.
J
And things like RBD users creating their own pools — I think there is an `rbd pool init` command or something like that. I think nobody uses it, but it exists, and maybe we should just be better about documenting that.
A
We
can
also
maybe
piggyback
on
some
of
that
application
enable
like
for
for
rbd,
there's
only
one
type
of
pool
that
they
you
believe,
it's
always
about
data
pool.
K
Talking about that pool creation path: the last time I looked, we had added this flag — "mostly omap" or something — for RGW, which set the bias property for metadata pools, but that wasn't being exercised. So I don't know where the pool creation was happening — was it a different code path? That is also another problem: if we have the implementation, but we're not using that code and use a different code path instead, then there's no point.
A
RGW does its own pool creation, and probably cephadm does at this point too, at least — so we'd have to change each of those places to use these new flags.
A
I
guess
the
hope
would
be
that
if
we
have
this
new
flag
changing
because
that's
the
main
distinction
we
make
is
like
between
these
data
pools
and
large
pools,
we
plan
wouldn't
have
to
do
that
kind
of
change
to
everywhere.
We
create
pools
in
the
future
because
we'd
be
able
to
like
add
new
things
based
on
the
existing
flag.
J
I think that's it, yeah. It's been a real problem on Pawsey — though it's mostly because of that other manager bug where it was making pg_num take really big jumps without bumping pgp_num. I have an open pull request to address that, and I'm deploying it today. But yeah, there are a lot of commands, and having to rush to set flags or whatever is just kind of annoying, right.
A
Yes,
maybe
we
could
be
thinking
about
adding
more
of
those
kinds
of
things
to
those
basic
pool,
create
commander's
options.
Then.
K
I guess, what's the conclusion for Quincy? I mean, do we go with the bulk flag, but with the scale-up mode by default, or —
O
I'm
okay
with
the
the
dash
dashboard
and
we
can
keep
it
scaled
up,
and
so,
instead
of
like
having
the
profile
to
be
like
this
guy,
let's
go
down
his
skill
up
and
then
dash
dashboard
would
just
do
the
same
kind
of
like
calculation
for
only
the
bulk
pools.
I
think
I
don't
know
if
it's
too
much,
for
I
mean
yeah.
A
We
can
test
it
out
on
the
lrc
and-
and
you
do
have
a
larger
scale
test
before
you
can
see
there
as
well.