From YouTube: Ceph Developer Summit Quincy: Orchestrator Follow-up
A: Hi everyone, and yeah, welcome to the CDS follow-up. As far as I can see, we are supposed to kind of build a roadmap for Quincy, if I'm not mistaken. What I did is I used the orchestrator Quincy etherpad to write down what things were on the list at the CDS, plus I scrubbed through the tracker and through Trello, and I hoped to come up with a list of kind of high-level things that we need to do. I mean, I didn't list bugs, because that would be overwhelming and it wouldn't be sensible, and I also didn't include things that are minor additions to existing features, because that would also kind of explode.
A: I'm looking at the etherpad. Okay, I've copied the Trello. Or do we want to use the Trello? What did you do for the etherpad?
A: Yeah, I mean, managed client keyring is already in progress, right? Yeah. If we are looking...
A: If you're looking at the cephadm column here, it's already at the top: managed client keyring is already in progress. Host drain and remove is kind of done already, right? Yeah, the documentation. Yeah, and clarify host maintenance mode.
A: It's just a daemon stop; it stops the daemons safely for one host, so it doesn't have anything to do with the scheduler. It just stops the daemons, and it's pretty okay.
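For reference, the host maintenance and drain commands being discussed already exist in cephadm in roughly this shape (a sketch; the host name is a placeholder):

    # put a host into maintenance mode, stopping its daemons safely
    ceph orch host maintenance enter host1
    ceph orch host maintenance exit host1
    # drain a host, i.e. schedule removal of all daemons placed on it
    ceph orch host drain host1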
B
My
recollection
was
that
that
helper,
that
I
renamed
recently
that's
all
the
schedulable
hosts
excludes
the
host
maintenance,
maintenance,
hosts
and.
A: It's a gap that we have compared to ceph-ansible. I would also keep it there, wherever it is, right?
B: Yeah, I had it there just because it seems like it should come after auto-tuning OSD memory, which is more of a functional cephadm agent thing. It's only going to be performance at scale, which...
B: Yeah, at the end, yeah. That might be redundant. I mean, if we're thinking of the same thing, I can pull that off. Well, I mean, if the agent gets us the reconciliation speed that we want, then we can just call that one partly done. I use this list also just so that, when we finally merge and we're looking at the end of the release, we're trying to build this high-level list of the things that we delivered.
A
I
mean
I
would
my
idea
would
be
really
to
focus
on
improving
the
architecture
and
making
sure
we
can
update
the
dashboard
pretty
fast,
even
on
really
large
clusters,
instead
of
maybe
sparing
one
or
two
ssh
connections
or
yeah,
make
cutting
the
the
the
time
to
carry
a
host
by
half,
because
it's
not
going
to
it.
It's
increasing
the
the
the
static
portion,
it
it's
just
a
yeah,
it's
the
constant
part,
you're,
reducing
the
constant
part,
but
you're
not
really
having
a
big
impact
on
larger
clusters.
A
That's
my
problem
was:
was
adjusting
improving
the
loop
okay,
smb
number
kinda
important
right.
B
They
were
missing
that
one's
blocked
because
we
don't
have
the
the
high
level
config
option
for
the
mds
yet
so
we
have
to
wait
for
that.
First.
B: Yeah, the thing is, I started a pull request for this and I just kept running into annoying technical issue after annoying technical issue. It's just really complicated, because the manager has so many ports and they can come and go depending on whether you turn on a module; then suddenly the haproxy configuration has to change, there are port conflicts, daemons that weren't conflicting before might conflict, whatever. It was getting really complicated and I... whatever.
B: It didn't seem as high priority as, like, NFS.
A: Refactoring cephadm into a proper Python package, I think that's kind of important, and I would really like to have it, because there is so much duplicated code within the cephadm binary, without really the ability to refactor at a large scale, without being able to have proper modules.
C
This,
yes,
I
think
that,
maybe
I
I
don't
know
I
maybe
it
is
related
with
the
possibility
to
have
some
kind
of
fdm
agent
in
each
of
the
host.
C: Yes, but what I mean is, if we are going to install the cephadm agent on each node, maybe we could use the same installation mechanism for this agent in order to also install cephadm, and to convert cephadm into a complete Python package and divide the script into modules to make it clearer. Maybe we can use the same approach and, at the same time, convert everything to a proper Python package.
B: Then yeah, yeah, yeah. The only thing to watch out for is that we are using the remoto bit, where it runs Python code remotely, for a couple of cases, as opposed to...
A: That's Crimson, but I had that feature request even before thinking about Crimson, more than half a year ago already, so it's really something users are interested in: having at least the ability to CPU-pin the OSDs.
B
So
we
right
now
we
do
it
through
the
config
options.
That's
it's
sort
of
independent
of
that
video,
but
there
are
certain
limitations
to
what
I
think
yeah.
My
question
would
be
what
the
what
the
requirement
there
is,
but
we
already
have.
We
already
have
it
some
degree
like
if
you
set
up
for,
if
you
set
an
option
on
a
single
osd,
you
can
tell
it
what
numa
node
to
use
okay.
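The option that exists today for this, as far as I know, is osd_numa_node; a minimal sketch, with the OSD id and node number as placeholders:

    # pin a single OSD to NUMA node 0 via its config option
    ceph config set osd.3 osd_numa_node 0
    # restart the daemon so the pinning takes effect
    ceph orch daemon restart osd.3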
A: But if we have a workaround for that, it's not such a high priority to me. It would be great to have it, right, but only if you have the time to do that.
A: Progress items: what kind?
A: Yeah, the progress module items.
B: Put it up here... no, like, my first priority, I think, is just making sure that we have the smoke test, that separate sub-suite or whatever, that covers as much as possible with short-running tests, so virtually no actual client workload, but just deploying it, making sure it works and then removing it, and expanding the coverage that way. And then, it seems like, separately...
B
We
want
all
of
our
stress
tests
to
be
migrated
to
use
that
video
where
possible,
and
then
I
get,
I
think,
in
a
few
cases,
they're
like
they're,
actual,
like
step
fading
stress
tests
that
we
want,
for
example
like
the
nfsha,
for
example.
We
should
probably
create
a
thrashing
truss
test.
That's
like
destroying
ganesha
nfs
servers
and
running
a
workload
and
making
sure
that
clients
are
behaving
and.
A
What
about
an
a
manager,
thresher
yeah?
Are
we
really
sure
that
everything
in
safe
adm
is
item
put
in.
A: So, bind to specific IPs for monitoring services. That's kind of important.
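One possible shape for this, purely as a sketch: service specs already have a networks field, but whether the monitoring daemons actually honor it for binding was exactly the gap being discussed, so treat this as hypothetical. The network and count are placeholders:

    cat > prometheus.yaml <<EOF
    service_type: prometheus
    placement:
      count: 1
    networks:
    - 10.0.0.0/24
    EOF
    ceph orch apply -i prometheus.yaml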
A
Mean
so
if,
if
you're
deploying
an
ntw,
you
do
not
have
any
dashboard
integration,
I
think
for
dashboard.
We
need
to.
I
think
we
need
two
things
right,
configure
the
dashboard
to
know
where
it
needs
a
doctor
and
set
up
a
an
admin
user
in
order
to
make
the
dashboard
able
to
actually
do
stuff.
B
Yeah,
I
would
put
that
under
the
dashboard
trello,
probably.
C: Yes, because we talked about that. This process is probably something that should be executed interactively, okay, in order to see what is happening at the moment you are creating it; it's part of the architecture topology. So maybe a wizard or something like that in the dashboard is going to help.
A
But
we
really
need
a
plan
for
that,
because
it's
a
bit
awkward.
You
know
you're
deploying
a
writer's
gateway
with
safety
amp,
and
then
you
need
to
configure
the
dashboard
to
actually
manage
the
writer's
gateway
that
you
just
deployed.
B: I think we have a rough plan. I just need to sit down with Yehuda and Ernesto and make sure that it's the right path forward. Okay.
B: But I would be inclined to leave this card off for now; it seems like the cephadm part is really just that, if the dashboard is going to do realm and zone management, then it should call orch apply and actually create them when you create them in the dashboard. But I'm not sure if it does that yet.
B: Well, yeah, yeah. I think it could get away with doing both, because it could show all of the realms and zones based on the realm and zone config, and if you create one, it'll also deploy it for you; but if it already exists, then you'll just see it. You won't be able to delete it, that won't work properly; I guess it won't shut down the daemons for you, but that doesn't happen on this one.
B: Yeah, yeah. So maybe, yeah, probably fixing the monitoring IPs can happen before SMB, because that's easy.
B: I mean, it seems like an easy thing to do would be to not persist them, and then have them disappear if the manager restarts. Because most of the time the manager doesn't restart, and these are sort of cosmetic anyway, just to see that I'm creating, you know, four NFS servers or whatever it is, or that I'm deploying something.
B: Anyway, okay, we can worry about that when we actually do it, I think.
C: What about the possibility of purging the cluster?
B: Because, yeah, cephadm can't purge itself. I mean, I wonder if we want... and this is the doc that you're starting internally, Sebastian, but like, if we have an auxiliary Ansible playbook, like, maybe...
C
It's
not
possible
to
execute
the
the
fatm
rmrm
cluster
in
each
of
the
host.
In
fact,
we
are
using
a
file
was
file
that
is
doing
that.
Okay,
so
I
think
that,
for
example,
if
we
have
a
cluster
and,
for
example,
in
the
theft
cell,
we
are
moving
the
monitor
and
the
manager
to
the
to
the
host
where
we
are
running
the
theft
cell.
Okay,
and
after
that,
we
are
using
an
rmrm
cluster
in
each
of
the
host.
We
are
going
to
remove
everything.
C: Okay, and the only thing that we are going to need to clean are the OSDs. Okay, I know this is infrastructure, okay, but I think that is easy, because we have the ceph-volume command. Okay, and after that we can even remove the files that we have in /var/lib/ceph and the /var/log/ceph configuration. So I think that it's perfectly possible to remove everything until you have only one host, and at the moment that you have only one host, you...
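Roughly, the per-host pieces being described here already exist; a sketch, with the FSID and device as placeholders:

    # on each host: remove all daemons and data belonging to this cluster
    cephadm rm-cluster --force --fsid 00000000-0000-0000-0000-000000000000
    # clean up the OSD devices that belonged to it (run via cephadm shell,
    # or with ceph-volume installed on the host)
    ceph-volume lvm zap --destroy /dev/sdb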
B: I think the one thing that we could add would be a command that looks at the ceph-volume inventory and zaps every OSD that matches the cluster you're trying to destroy.
A: So the problem with orchestrating the cluster removal from within the cluster is that you are being optimistic, right? If you don't have a mon quorum, you can't execute anything.
C
You
can
do
that
because
you
have
information
in
the
organizator
about
what
is
the
composition
of
the
cluster,
but
we
can
do
also
from
the
from
the
third
cell
and
sorry
from
from
there
from
the
horse
directory
with
the
theft
in
the
embinary.
But
if
you
want
to
do
that
from
the
inside
orchestrator,
I
think
that
even
that
is
possible.
B: Yeah, I think we're better off having a cephadm rm-cluster with... it seems like we need two things. We need a cephadm zap-all, or something like it, that you pass an FSID, and then have rm-cluster take an optional flag that includes that zap-all for the same FSID, so you can do it in one command instead of two. And then, when you tear down the cluster, you just do it as one line of bash.
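A sketch of what that proposal could look like; note that the zap-all subcommand and the combined flag are hypothetical here, only rm-cluster itself existed at the time, and the FSID is a placeholder:

    # hypothetical: zap every device that belongs to this FSID
    cephadm zap-all --fsid 00000000-0000-0000-0000-000000000000
    # hypothetical combined form, run once per host
    cephadm rm-cluster --force --zap-osds --fsid 00000000-0000-0000-0000-000000000000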
B: Well, I can add the cephadm rm-cluster item here.
B: But we should just make sure that the other one, or whatever, also has that property.
A: It actually works already, and I've added that into the etherpad, but I agree that we need to have some kind of... it's too flexible, right? On the one hand, it's super flexible to have those config options overridden by the spec, and on the other hand we have too many competing ways to do that right now.
B: We'd have to audit the code to make sure that the... there's a pull request that I'm testing right now that upgrades the monitoring containers, and I'm not sure that it's...
A: Yeah, that would just be my preferred option: to have one way to update all the different default container images, and not have to change that config option for the Ceph base image, set this config option for the monitoring images, and set the respective property for the haproxy stuff; have one workflow that updates all the container images.
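For context, the separate per-image options that exist today look roughly like this; the option names are from memory and the image tags are just examples:

    # the main Ceph image is normally changed via the upgrade workflow
    ceph orch upgrade start --image quay.io/ceph/ceph:v17
    # the monitoring stack images are separate mgr/cephadm options
    ceph config set mgr mgr/cephadm/container_image_prometheus quay.io/prometheus/prometheus:v2.33.4
    ceph config set mgr mgr/cephadm/container_image_grafana quay.io/ceph/ceph-grafana:8.3.5
    ceph config set mgr mgr/cephadm/container_image_alertmanager quay.io/prometheus/alertmanager:v0.23.0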
A: No, it doesn't really work, right? If you have a custom container image for the SNMP gateway, then you need a specific image for that service, and if you have...
B: Kind of. I mean, the upgrade is pretty aggressive for the monitoring stuff; we just redeploy it if it's different. There's basically no pause for the monitoring upgrades, and so we don't need that, because there's no pause.
C: I think that having different versions for each daemon opens the door to unexpected behaviors and problems that will be difficult to resolve, okay. I think that, well, if we decide to deploy a service, Grafana, monitoring, monitors, managers, whatever, we need to use the same image for all the daemons of one service.
B: Yeah, I guess I would be inclined to wait until we have an actual use case, somebody who needs multiple different custom images, before we worry about it, because it's work to support that, whereas what we have now still lets you customize it. It just assumes that all your Grafanas and so on in the cluster are the same, which I think is...
B: But what there was, I wanted to check on it. It's in pretty good shape, I think. I think the main problem is just that it's a little bit delicate: if you start scheduling services that conflict on ports, then things get confused.
B
If
I
think
this
might
be
a
problem
with
ingress
in
general,
if
you
like,
if
you
delete
the
backend
service
and
the
front-end
service
like
it
might
throw
an
assertion
or
in
some
cases
I
noticed
that
in
like
the
prepare,
create
and
generate
config
functions,
if
those
through
assertions
they
get
swallowed
and
aren't
shown
anywhere,
which
made
the
booking
interesting.
Those
are
all
these
are
all
sort
of
unrelated
to
nfs,
typically,
but
we
should
clean
that
up
at
some
point.
B
Yeah,
I
mean
one
of
the
things
that
I'm
wondering
about
the
well
the
biggest
thing.
Actually,
that
has
to
happen
with
the
nfs
thing
is
just
testing
like
it.
It
all
fits
together.
The
way
that
it's,
I
think
it's
supposed
to
fit
together,
but
I
don't
need
to
actually
scale
the
demons
and
make
sure
that
I'm
just
guessing
at
the
aja
proxy
configuration
and
we
need
to
actually
test
it.
B: There's a whole set of timeouts in there, and I'm not sure which values make sense in the haproxy config. And then the other thing that this has been...
B: I think the thing that most worries me is that, by default, the NFS service is on port 2049, the NFS port. But if you're going to use ingress, then that's the one that should be on 2049, and the backend NFS port should probably be something different, in case they get scheduled on the same hosts as 2049, yeah. And so I'm wondering if the NFS service should default to a different port, with the expectation that you'll then apply HA on top, or not. I don't really know.
A: Then I think we're good, and then, if we have a dedicated section about... yeah, okay, having two sections, right: one on how it's supposed to look, but if you really want to have NFS Ganesha without ingress, then you need to set the port for NFS Ganesha to the default port, 2049, yeah.
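A sketch of the kind of two-spec split being discussed, assuming the ingress frontend takes 2049 and the Ganesha daemons move to a non-default backend port; the service ids, ports, and virtual IP are placeholders:

    cat > nfs-ha.yaml <<EOF
    service_type: nfs
    service_id: mynfs
    placement:
      count: 2
    spec:
      port: 12049            # backend port, so ingress can own 2049
    ---
    service_type: ingress
    service_id: nfs.mynfs
    placement:
      count: 2
    spec:
      backend_service: nfs.mynfs
      frontend_port: 2049    # the standard NFS port clients connect to
      monitor_port: 9049
      virtual_ip: 10.0.0.100/24
    EOF
    ceph orch apply -i nfs-ha.yaml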
B: Oh, I guess this is sort of related; it came up in the review. There's still this osdspec_affinity thing, yeah. I was always confused by this before, I never fully understood it, but now that we have the service_name property on all the daemons, is that obsolete now?
A: By the way, if you do that, then we have a bug, because we then have the same daemon name on two different hosts. We need to periodically, in our reconciliation loop, look over all the daemons, find duplicates, and delete all daemons that are...