From YouTube: Ceph Orchestrator Meeting 2021-05-25
B
It's a total disaster. So the problem is that the ingress service in particular, but probably other things too, needs to be able to look up IP addresses to generate configuration files, and they can't do that with podman. They can't do that if the /etc/hosts file is modified.

B
It feels like we stumbled upon, like, the working combination: resolve_ip will work with podman if we pass --no-hosts, which is what our original fix for the other problem did, and it also works in general with Docker.

B
It'll return the actual IP address. But if we don't pass --no-hosts to podman, if we just run podman with no special arguments, it'll return the loopback address, and if we pass --add-host it also returns the loopback address. And if that happens, then we're kind of screwed, because basically we can't rely on DNS in general.
B
We can't rely on DNS or /etc/hosts in general, because it just won't work, because if you happen to be looking up the IP for the host that the manager is running on, then you'll just get the wrong address, so that sucks. I went back to verify the original dashboard issue, which is getfqdn, socket.getfqdn. getfqdn is the original problem, and the problem there was that it was returning the <fsid>-based container name, whatever it was.

B
And that was screwing up the dashboard: it tries to look up what its address is. This actually goes through some helper function, I can't remember which, but the host address that it advertises, when you ask the manager for its endpoints, like how you reach the dashboard or whatever, is based on doing that getfqdn call, and it was returning the container name, which meant that the advertised address wouldn't actually work to reach the thing.
B
That was the original problem that we were trying to fix, and we worked around it by doing --no-hosts on podman, which didn't create this entry, and so getfqdn worked as it was intended to work.
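The getfqdn behavior being described can be reproduced with nothing but the standard library; the sketch below is only an illustration of the mechanism, not cephadm code, and the container name in the comment is made up.

    import socket

    # socket.getfqdn() with no argument resolves the local hostname. If
    # /etc/hosts inside the container maps the loopback address to the
    # container name (say, "ceph-<fsid>-mgr-x"), getfqdn() can come back
    # with that container name instead of the host's real FQDN, and the
    # dashboard then advertises an address nobody else can reach.
    print(socket.getfqdn())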
B
I mean, I guess the long and the short of this is that what is in master right now works by accident. Basically, both of these problems are avoided: the dashboard issue is fine because we worked around it by passing --no-hosts, and resolve_ip happens to work with podman --no-hosts. So what's currently in master is fine, except that, if you're using podman, you can't rely on /etc/hosts.
D
Which sucks. So, in order to isolate the issue: I think, at least from the dashboard side, you can set the server address, and that basically overrides this, because this was mostly for guessing the interface and, well, kind of inferring the IP address. But if you replace that with the config option, the IP address, or server_addr, sorry, then you can skip this call. So if that works, it may be a workaround for this.
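A rough sketch of that workaround, assuming a manager-module-style get_module_option helper; the option name follows the dashboard's server_addr setting, but the fallback logic here is only illustrative:

    import socket

    def advertised_addr(get_module_option):
        # Prefer an explicitly configured address over guessing via DNS.
        # get_module_option is assumed to behave like the mgr module helper,
        # returning an empty/unset value when the option was never configured.
        configured = get_module_option('server_addr')
        if configured and configured not in ('::', '0.0.0.0'):
            return configured
        # Fallback: the old behaviour, which is what breaks when /etc/hosts
        # inside the container maps the hostname to a container name.
        return socket.getfqdn()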
B
I mean, alternatively, instead of doing this, we could do resolve_ip(get_hostname()), right, and that also works because of the above.
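Roughly, that suggestion amounts to the following standard-library lookup (resolve_ip and get_hostname being the cephadm helpers under discussion; this is just the plain-Python equivalent):

    import socket

    # Resolve the short hostname instead of trusting whatever FQDN the
    # container's /etc/hosts rewrote; with --no-hosts this goes to DNS.
    print(socket.gethostbyname(socket.gethostname()))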
B
I agree, which was why I wrote this pull request that basically tries to populate the IP field of hosts when you add them, when we know that resolve_ip will actually work, and then use that.
A
When will we actually be doing it? On the bootstrap host?
B
Oh, but also the bootstrap code is running on the host and not inside a container, and so it could also do whatever; it can do whatever we want it to do. I mean, I think it's a bit questionable whether you should rely on DNS for this anyway. I really think it's better practice for your host to have an IP that you specify and that you use, because then you don't have this external dependency on DNS, where if DNS changes, suddenly your cluster breaks.
B
If it goes down, your cluster breaks; all that stuff seems bad. And so with this pull request we basically force people to do that: if they didn't provide the IP for the host when they added it, it'll just look it up for you and add the host with that IP address. And then, when the configs are generated, it passes that address to resolve_ip, which is usually just an IP, so it'll just resolve the IP to get the IP, and that's mostly just for backwards compatibility.
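A minimal sketch of that idea; the function and inventory names here are hypothetical, not the pull request's actual code, but it shows the two halves being described: look the address up once at add time, refuse loopback, and let resolve_ip degrade to a no-op when it is handed an IP.

    import ipaddress
    import socket
    from typing import Dict, Optional

    def add_host(name: str, addr: Optional[str], inventory: Dict[str, str]) -> None:
        # If the caller didn't give an IP, look it up now, while the name can
        # still be resolved sensibly, and refuse loopback results outright.
        if addr is None:
            addr = socket.getaddrinfo(name, None)[0][4][0]
        ip = ipaddress.ip_address(addr)
        if ip.is_loopback:
            raise ValueError(f'{name} resolved to loopback ({ip}); pass an explicit IP')
        inventory[name] = str(ip)

    def resolve_ip(addr_or_name: str) -> str:
        # Backwards-compatibility path: if the stored value is already an IP,
        # "resolving" it is a no-op; only old entries still hit DNS.
        try:
            return str(ipaddress.ip_address(addr_or_name))
        except ValueError:
            return socket.getaddrinfo(addr_or_name, None)[0][4][0]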
B
We could, yeah, I just didn't do it in that pull request. And you have to be careful, because again you have this problem where, if you try to look up the host that the manager happens to be running on, you don't get the right...
B
You don't get the right answer. So it would basically have to populate all the fields that don't resolve to a loopback address, and then the next time the manager fails over to another host it'll do the same set of checks, and then hopefully it'll be able to resolve the one that was missed last time. I guess, I don't know, whatever; it's kind of a messy upgrade, but that seems like the most reasonable path forward.
E
I think, well, if we try to cover all the different use cases, probably we are not going to achieve that. And I think that maybe we are trying to solve something that, in the end, is not part of cephadm or even the Ceph cluster.
E
That is, resolving domain names. My view is that this is a resource that should be provided by the end user, in the same way that we are requesting other requirements, for example Python 3 or LVM. We need to require that you have some kind of name resolution in your network that we can use.
E
I think that the simplest approach is to use getaddrinfo from the socket library to get the information that we need to work, working only with the name of the host. I think that this is the right approach. If we try to cover all the different cases, probably we are not going to achieve that, because we have a lot of variability. So I think that is a problem that we need to fix.
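A small sketch of that getaddrinfo-based approach, preferring a non-loopback result; the helper name is illustrative, not an existing cephadm function:

    import socket

    def host_addr(hostname: str) -> str:
        # Resolve via getaddrinfo and prefer a non-loopback address, since a
        # loopback answer (e.g. from a rewritten /etc/hosts) is useless for
        # generating configs that other hosts must be able to reach.
        infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
        addrs = [info[4][0] for info in infos]
        for addr in addrs:
            if not addr.startswith('127.') and addr != '::1':
                return addr
        # Nothing usable: fall back to the first answer; callers may prefer
        # to treat this as an error instead.
        return addrs[0]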
E
That is the environment where you are going to have your cluster working, with all the possibilities, and if you do not have that, then it's not going to be possible, for example, to use the ingress service, because we need the name resolution. I don't know; that's what I think.
B
I don't think it's a feature to rely on DNS, because only two things can happen, right: it will go down and then things will break, or you'll change your DNS and you want all the IPs to update. I don't think that's a common scenario, and even if it were, I think it's actually better just to tell the cluster that the IP for this host has changed. I don't think that's unreasonable. So, I don't...
B
I don't think people would expect to change the DNS and the IP on a host and have all the software that's installed and configured and running on that host magically continue to work, especially a complicated system like Ceph or Kubernetes or anything like that; that's totally non-trivial, it's just not expected. I don't think we should worry about that, and if that's the case, then there's no upside to relying on external DNS. There's no value in not using IPs in place of hostnames.
B
So the best practice is: when you add a host to the cluster, you specify what IP you want the cluster to use for that host, and we use that wherever possible, and all these calls to resolve_ip in the code will either be removed or will just be resolving something that is already an IP, so they'll only be there for backward compatibility or the upgrade. Okay.
D
Okay, no, I was just remembering that when we were discussing HA for the dashboard, one of the topics that arose there was the possibility of using service discovery, DNS-based service discovery, and I was exploring that, maybe just trying a tiny DNS server inside the manager, and I was playing with the dnspython library, which is pretty simple. I'm not sure if we want to unbox that thing, but at least, yes, to be aware that there is that possibility, just to get some kind of service discovery.
A
And that's orthogonal; that's orthogonal because the problem we are observing here is kind of the view of the cluster itself by cephadm, and a DNS server within the manager would be for the internal view of services within the cluster.
A
They are two different things, not completely different, but they are orthogonal to each other. We could do a possible internal DNS independently of how cephadm knows about the cluster.

A
That's independent of the current problem here.
A
The problem that we have right now is that cephadm itself needs to know which IP each host has, and we're doing it by kind of magically getting the IP address of hosts.
A
That's in order for cephadm to work. But the local DNS server in the manager, I don't know if that's a good idea or not; we can discuss it later. It would only solve the problem for services within the cluster, but not for cephadm itself.
A
So what I see as problematic is that the existing way to deal with hostname resolution is changing, and people don't like changes and are surprised by them. Especially if it's kind of...
A
Your pull request kind of makes it sane, right, but it's surprising to users.
B
I think it feels like it should just be accompanied by a set of changes to the documentation that basically remove any reference to that, or make it explicit that when you add a host to cephadm, you have to tell cephadm what the IP address is for that host, at the time you add it. Well, if you don't provide that IP, we can use DNS to try to figure it out for you; but beyond that, no, you can't use /etc/hosts.
B
Yeah, that seems like the way to do it, and the same thing with that. And then the dashboard code that returns the address to use: instead of doing the socket.getfqdn, it should just look at which host it's on and then use the IP that we have for that host, right? It shouldn't do what it's doing now.
D
Yeah, yeah, that part of the code, I will take a look, just to see, because I think that part has undergone some refactorings. So probably we fixed this; we added this just for fixing some previous issues.
B
Yeah, but in any case, if we make this change, like using the real IP address for the host or whatever, then we don't need the other one, the --add-host thing with podman; yeah, we don't need that anymore.
B
Yeah, there are a couple of issues there, because the config files that contain that IP aren't listed as dependencies, and so we won't automatically reconfigure all the appropriate services if the IP then changes, exactly.
B
Anyway, so I think the action item basically is: close this first /etc/hosts one, take this other one and include the doc updates, and make the CLI command tell you the IP address it used, if it looked it up.
B
Okay, and I'll do another git grep on resolve_ip, just to make sure I've caught all of them; I mean, basically, resolve_ip we shouldn't be using, I think. So the last missing piece, which I'm not sure is a blocker but needs to follow at some point, is the upgrade path, where we need to clean all this up for users' existing clusters that don't have IPs. We need to fill them in when we can.
A
Facts won't magically resolve the problem of having multiple IPs. If we find a way for that, I would be more inclined to make use of get-facts instead of forcing users to provide the IP address, but I don't think we are there yet.
B
Let's probably also revert that, so that we're not passing those arguments with podman, and then don't replace, don't resolve the IP if it's a loopback; and then during the upgrade process the manager is going to move between hosts, and so we'll be able to look up that host from the other machine at some point.
B
Okay. In the meantime, then, I would like to merge the NFS pull request. Yes, that one.
A
Okay, NFS RGW, second topic. It turns out that we have a config flag in the Ganesha template that's corrupting NFS RGW.
B
Can we just remove Dir_Chunk = 0, and will that magically fix everything? Or what is the purpose of this? The email thread should be on the list, basically.
A
Okay, so where's the template?
B
But there's only one client instance there, there's only one client, and so the RGW block, which is global, has the RGW user and cephx key or whatever for that, whereas for CephFS every single export block has its own Ceph client.
B
So, given that, I think the RGW block makes sense. I think the current behavior of how cephadm is configuring it, where we create that cephx user for the RGW client at the time we create the server, I think that all makes perfect sense. I don't think there's anything really to change there, unless we want to change the high-level behavior, but I don't think there's any reason to: having a shared client for all the bucket exports makes perfect sense.
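For orientation, the shape being discussed looks roughly like the sketch below: one global RGW block shared by all RGW exports, and a per-export FSAL block for CephFS. The template and option values are illustrative only, rendered here with Jinja2 in the same spirit as the Jinja template mentioned later in the discussion; this is not cephadm's actual ganesha.conf template.

    from jinja2 import Template

    # Illustrative ganesha.conf layout: Dir_Chunk and the RGW block are the
    # pieces discussed in this meeting; everything else is placeholder.
    GANESHA_CONF = Template("""
    MDCACHE {
        Dir_Chunk = 0;             # wanted for CephFS, questionable for RGW
    }

    RGW {                          # one global block, one client for all RGW exports
        name = "client.{{ rgw_user }}";
    }

    {% for e in cephfs_exports %}
    EXPORT {
        Export_ID = {{ e.id }};
        Path = "{{ e.path }}";
        FSAL {                     # every CephFS export gets its own client
            Name = "CEPH";
            User_Id = "{{ e.user }}";
        }
    }
    {% endfor %}
    """)

    print(GANESHA_CONF.render(
        rgw_user="nfs.mycluster",
        cephfs_exports=[{"id": 1, "path": "/", "user": "nfs.mycluster.1"}],
    ))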
B
And it means... I mean, the other ones are... so, anyway, I think there were sort of two threads. The question of whether it's the right way to set up the users and configure everything: I think what we have now is fine, so there's nothing really to change there.
B
The bigger question is: is it in fact appropriate to use a single Ganesha config, a single Ganesha daemon, to export both CephFS and RGW? That is something we decided a while ago, and we went and removed that option when you create a cluster, the type of NFS cluster, whether it was an RGW one or a CephFS one. But it turns out this Dir_Chunk...
A
The Attr_Expiration_Time is in an EXPORT_DEFAULTS block, which is not set by the ganesha.conf, so yeah, that's fine then, that's fine. It can be put into a common config. Okay, that's independent; that's just this one.
B
Dir_Chunk, and the other one that was mentioned is Only_Numeric_Owners. I don't know what that is. Let's see, Varsha sent me... Only_Numeric_Owners... let me grep in here... numeric owners. That is not an export option; that's in the global block.
B
Yeah, that's the question. The other one that caught my eye was in one of the sample configs for RGW: there was an option called Graceless.
B
Which, I think, basically skips the grace period, which is not okay for CephFS, but for RGW it's probably fine, because there's no state: RGW is stateless, RGW exports are stateless. It's not the end of the world; it just means that if there's a server failover, clients will have to wait a little bit for the grace period to expire before they can resume.
A
And it's different from what we expected or what we wanted to do. Mike, any thoughts?
B
My sense so far is that, for all of the options that have come up, if you pick the common set that will work with both, it maybe isn't exactly what you would choose for RGW, but it's not fatal. For example, not setting Graceless = true: I think that's fine for RGW, but it may not be what you would set if it was a dedicated RGW one.
B
The one I'm more worried about is that Dir_Chunk thing, because I'm not sure if the reason that happened was that it needs to be a pass-through into the FSAL back end for every I/O or every metadata operation, because we basically want to disable the caching for this. I'm not sure if it's a correctness thing or not.
A
And that's CACHEINODE, yeah; that was the old name for MDCACHE, and it's got Dir_Chunk in it.
B
Because it also means that the NFS servers have a type.

B
Right, it means that there are CephFS ones and RGW ones, and you have to make sure that the exports are only added to the right type. But when you create the cluster, you have to decide which type it's going to be. And when you're looking at it in the orch or whatever, should there be nfs-fs and nfs-rgw service types instead of nfs? All this stuff gets gross, with these assumptions that were...
B
Yeah, all the ones that are currently deployed are of the CephFS flavor, because they don't support the RGW exports right now in cephadm, right?
A
No, we do, but we have the RGW block.
A
It's there, but it's not working right: you get inconsistent readdir results.
A
No, actually there are two different templates; well, one Jinja template, but two different flavors, one for CephFS and one for RGW, and the RGW flavor does not contain the Dir_Chunk = 0, and the CephFS flavor does not contain the RGW block. So indeed, DeepSea also has two different types of Ganesha configs.
B
We are realizing that cephadm's decision to have a single Ganesha instance that exports both CephFS and RGW is problematic, because for CephFS this Dir_Chunk option has to be zero, or should be zero; I'm not sure exactly if it's a has-to or a should-be.
G
I'm a little unclear on that myself, honestly. I kind of followed along; somebody, like...
G
His name... oh, Dan, Dan Gryniewicz, you probably know him; he's the one that recommended that originally. So Ganesha has some caching that it does for directory entries.
G
I
think
that
that
was
supposed
to
reduce
the
amount
of
caching
we'd
get
on
directory
entries,
so
that
you
know,
because
you
know
lifts
ffs-
is
doing
that
for
us
at
the
times.
We
want
to
do
it,
so
we
you
know,
we
don't
really
need
to
have
ganesha,
cashing
them
for
us
as
well
a
little
bit
more
processing
to
have
to
come.
Do
the
conversion
every
time
from
whatever
lips
ffs
has
to
whatever
finisher
wants.
But
it's
not
that
big
a
deal.
I
don't
think
so,
but
but
rgw
doesn't
like
that.
I
take
it.
G
But again, you know, the way these tunables work in Ganesha, it doesn't seem like you're turning off caching when you set that; what it says is that the size of the directory chunks that it's caching, I think, is zero.
A
Or Matt; maybe Frank or Daniel was engaged in this conversation.
B
Well, I guess it's a question of: do we try to go down this road of maintaining a single type of Ganesha that can export both? That means that we have to have a common set of global options that will work well for both, and how willing are we to make changes, even to Ganesha, to make that possible?
B
I guess if we do that, I would like to make it so that we can remove that restriction in the future. Yes; so, for example, instead of re-adding that type as a positional argument for the nfs cluster create, maybe make it a flag, like a flavor, that at the moment is required but in the future will not be required: both a flavor flag and a field that right now is required in the service spec but in the future will be optional and/or ignored.
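A rough sketch of what that could look like on the spec side; the class and field names here are illustrative, not the actual cephadm service spec:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class NFSServiceSpec:
        # 'flavor' is required today, but modelled as optional so the
        # restriction can be dropped (or the field ignored) later, once a
        # single Ganesha daemon can serve both back ends.
        service_id: str
        flavor: Optional[str] = None   # e.g. 'cephfs' or 'rgw'

        def validate(self, flavor_required: bool = True) -> None:
            if flavor_required and self.flavor not in ('cephfs', 'rgw'):
                raise ValueError('flavor must be "cephfs" or "rgw" (for now)')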
B
I guess there was that one unrelated issue, the question of what happens to the ingress service if you delete the underlying RGW service. Right now it'll just throw an exception when it tries to do the config, but probably it should...
B
I'm hesitant to have it delete itself, because what if there's, like...
A
Yeah, I wouldn't do anything. I wouldn't do anything; if we at some point decide to...
A
...start implementing migration, automatic migration for hosts that are offline, yeah, then at that point we start to do some kind of self-healing.
G
Okay; oh, I take it back, that option is still there in mainline; I missed the caps, so, git grep...