From YouTube: Ceph Orchestrator Meeting 2022-01-18
A
Looking at the agenda: we have Quincy, and we have a potential race.
A
Okay, so first of all: the Quincy release is not completely imminent, but we have to wrap up Quincy, and we discussed that last week. So there isn't too much that we really have to take care of at this point.
A
It's mostly that we should be aware that by now we have a dedicated Quincy branch, so we have to make sure we are properly backporting to Quincy at this point, if there is something worth backporting — which I think every bug fix and every feature at this point is going to be.
A
And as always, my idea is to propose that we do batch backports again. I did a bunch of individual backports to Pacific today and yesterday — or I think it was last week already — and my experience, or my take on it, is that we should avoid individual backports by all means, because it's creating a ton of extra work. Any thoughts? Maybe you, Mike?
A
Yeah, and actually the biggest pain point I discovered is that we are going to merge the backports in a different order than they are in the master branch, which means that at this point there is no way to properly sync up master and Pacific anymore, because it's going to create a ton of extra conflicts. So I would continue with batch backports. Anyone against batch backports for cephadm at this point?
B
Yeah, I think there are two things left open, and then you can start working on the backports — those are going to have to get in first. So can we just backport to Quincy at the same time? Yes?
A
Though I wouldn't create an extra backport to Quincy, because Quincy already contains the agent. So I...
B
I just mean for the two changes that are left. Yes, yeah.
A
But
pacific
is
going
to
be
a
bit
more
more
work
because
you
have
to
resolve
non-trivial,
merge
conflicts.
B
Yeah, there'll be a lot again once those two changes are in. It'll probably take a few days to get that working.
B
I'll find the names. There's one, I know, for using its own cache object, and what was the other one? And then one already got merged in, actually.
A
Okay, the next topic I have on my list is a race in the serve loop against other things happening in different threads that change the data structures. Mike, I guess you have investigated this a bit, right?
C
I casually looked at it. I can't find a root cause. One of the things I've speculated, when I've seen this, is that people are doing some of the synchronous commands while the serve loop is blocked doing something else — so, for example, a ceph orch daemon add or something like that runs async to the serve loop being stuck somewhere else, which then has the result of adding things to the host cache.
C
So maybe something ends up in there that should not be there, such as a daemon that's in a strange state. But we'd have to be able to reproduce that — there's a very narrow timing window in there.
A
So
yeah,
as
far
as
I
know,
I
have
created
at
least
two
pull
requests
trying
to
fix
individual
races.
A
As I said, the host add command was racing with the service addition, so we created... First, I can actually share my screen.
A
Can you see my screen? Yeah, okay. So one race I found a year ago was this:
A
We
have
checked
the
host
demons
before
applying
services
on
a
given
host
that
so
that
was
a
workaround
for
this
particular
thing.
The
next
one
was
a
pull
request.
A
There
is
a
possibility
that
we
first
deployed
a
new
nfs,
ganesha
demons
and
then
in
the
next
iteration,
we
deleted
the
old
nfs
ganesha
daemons,
which
didn't
work,
because
we
now
ended
up
with
with
a
port
conflict
between
the
old
servers
and
the
new
servers.
So
in
general,
everything
worked
because
we
first
deleted
the
old
straight
humans
and
then
created
the
new
demons.
But
if
it,
if
we
are
racing
with
a
surf
loop,
we
are
creating
the
new
nfs
ganesha
daemons
before
deleting
the
old
ganesha
demons.
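
[Editor's note: to make the ordering constraint concrete, a minimal Python sketch of the two orderings being described. The function names are hypothetical, not cephadm's actual API; both daemon generations bind the same port, so removal must come first.]

    def redeploy_nfs(old_daemons, new_daemons, remove_daemon, deploy_daemon):
        # Safe ordering: the old daemons release the port before the new
        # ones try to bind it.
        for name in old_daemons:
            remove_daemon(name)
        for name in new_daemons:
            deploy_daemon(name)

    def redeploy_nfs_racy(old_daemons, new_daemons, remove_daemon, deploy_daemon):
        # The interleaving the serve loop can produce: the new daemons are
        # deployed first and fail to bind the port the old ones still hold.
        for name in new_daemons:
            deploy_daemon(name)  # port conflict happens here
        for name in old_daemons:
            remove_daemon(name)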
A
Yeah, I fixed this particular race by manually deleting the old daemons first.
A
No, I had... that's a different thread, yeah.
A
Oh,
it
might
end
up
in
a
different
thread
exactly
so
we
we
ended
up
doing
the
migration
in
between
those
lines
here
before
deleting
the
old
demons
and
and
creating
the
old
demands,
and
if
we
do
that,
we're
creating
the
new
nfs
demons
before
deleting
the
old
demons
ending
up
in
a
port
conflict.
I
I
did
it.
I
I
resolved
this
particular
race.
B
I don't know what we could do for the migration. I think, if we really wanted to make sure we never have any issues, we'd almost have to do it as part of the serve loop rather than having it be its own thread. It'd be slower, but if you're really worried about it, we could do that for the synchronous stuff — if we really wanted to never have any race conditions.
B
But
again
it
sounds
like
something
kind
of
annoying
to
do
and
I'm
not
sure
how
often
these
things
are
going
to
come
up.
That's
worth
doing.
B
Yeah,
so
the
agent
does
there's
only
one
data
structure,
it's
really
updating
and
that's
just
the
demon
cache,
but
we
get
around
that
by
having
that
like
host
metadata
up
to
date
check,
and
so
we
never
apply
services
if
that
check
doesn't
pass.
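
[Editor's note: a minimal sketch of the gate being described here, with illustrative names rather than cephadm's real ones — service application is simply skipped for any host whose metadata is stale.]

    class HostCache:
        def __init__(self):
            self._up_to_date = {}  # hostname -> bool

        def host_metadata_up_to_date(self, host):
            return self._up_to_date.get(host, False)

    def apply_service_on_host(cache, host, apply_fn):
        # Never place daemons based on stale state; wait for the agent to
        # deliver fresh metadata and retry on the next serve-loop pass.
        if not cache.host_metadata_up_to_date(host):
            return False
        apply_fn(host)
        return True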
A
Can we do that also for the other operations, like from the CLI?
B
I
mean,
theoretically,
we
could,
if
somebody
applied
a
spec
mark,
all
the
hosts
is
not
up
to
date,
anymore,
required
getting
metadata
again
before
we
apply
anything
that
would
guarantee
refreshing
demons
essentially
happens
before
applying
specs,
and
you
do
the
same
thing
with
scheduled
actions
as
well.
We
mark
that
particular
host.
That's
not
updated.
We
really
wanted
to
again
that
would
just
basically
enforce
refreshing
demons
always
before
doing
anything
else.
On
those.
B
Yeah, it can be done per host. The way it works right now is it gets set to false whenever we create or remove a daemon on a host — then that particular host's metadata is marked out of date. It was really only intended for fixing applying specs, where you don't want to double-place daemons. So it gets marked false.
B
Whenever
we
remove
a
daemon
and
then
we
update
the
counter
value,
and
then
we
don't
mark
it
back
up
to
date
until
we
get
new
metadata
with
the
right
counter
value.
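
[Editor's note: putting the flag and the counter together — a hedged sketch of the mechanism as described, with assumed names. Any daemon create/remove marks the host stale and bumps the counter; only a metadata report carrying the current counter flips it back.]

    class HostMetadataState:
        def __init__(self):
            self.up_to_date = False
            self.counter = 0

        def on_daemon_created_or_removed(self):
            # Any change invalidates what we know about the host.
            self.up_to_date = False
            self.counter += 1

        def on_metadata_report(self, reported_counter):
            # A report generated before the last change carries a stale
            # counter and must not mark the host up to date again.
            if reported_counter == self.counter:
                self.up_to_date = True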
C
Seems like it's worth a try to me. Yeah, I mean, one of the other things I've seen is: if you kind of pound the scheduled actions over and over — say a redeploy or restart, and you queue a whole bunch up — you can confuse the scheduled actions in a way that the serve loop is broken as well, and the fix for something like that is to drop the host cache. So putting a lock around the host cache sounds like the right thing to me.
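
[Editor's note: what "putting a lock around the host cache" could look like in its simplest form — a sketch under the assumption that every reader and writer goes through the same lock, so the serve loop and synchronous CLI handlers cannot interleave mid-update. Naming is purely illustrative.]

    import threading

    class LockedHostCache:
        def __init__(self):
            self._lock = threading.Lock()
            self._daemons = {}  # hostname -> list of daemon names

        def add_daemon(self, host, daemon):
            with self._lock:
                self._daemons.setdefault(host, []).append(daemon)

        def remove_daemon(self, host, daemon):
            with self._lock:
                self._daemons.setdefault(host, []).remove(daemon)

        def daemons_on(self, host):
            with self._lock:
                # Hand back a copy so callers never iterate a list that
                # another thread is mutating.
                return list(self._daemons.get(host, ()))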
B
Hey
man,
I
kind
of
like
want
to
see
the
actual
effects.
You
know
I'm
kind
of
confused
how
this
works
with
scheduling
actions,
housing
issues-
that
was
one
I
didn't
think
would
be
a
problem
just
because,
like
you
just
schedule
them,
then
at
the
end
the
server
was
one
responsible
for
actually
doing
anything.
D
Another
place,
maybe
what,
if
with
another
start
any
time
on
in
a
host,
if
we
see
that
there
is
a
diamond
of
the
same
type
for
the
same
service,.
D
So at that moment, if we check that we have another daemon from a previous version on the same host for the same service, I think we can stop and simply not launch the new daemon.
D
Is
in
the
moment
that
we
are
going
to
start
the
diamond
when
we
are
going
to
in
the
moment
that
you
have
make
all
the
checks?
Okay
in
the
in
the
same
moment
that
you
want
to
start.
The
new
diamond
is
the
moment
that
we
need
to
check
if
there
is
another
diamond
of
a
previous
version
or
the
same
service
running,
and
if
this
happen,
but
we
can
abort
the
staff.
D
Probably
we
will
need
more
information
about
the
diamond
okay,
that
is
running
about
versions
or
get
this
this
information
directly
from
from
the
diamond.
I
don't
know
how
to
do
that,
but
in
this
way
we
avoid
to
really
in
any
kind
of
extra
data
structure
in
the
gazette
or
in
the
in
any
other
place.
So
it's
in
the
moment
that
you
are
going
to
start
the
diamond
in
the
host
is
the
moment
that
you
are
doing
the
check.
D
What is the service the container belongs to, and what is the version of the daemon that is running — checked just at the moment you are going to start them? I think that is the more difficult part. Probably we will need to add more labels to the containers.
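
[Editor's note: a sketch of the start-time check being proposed, with hypothetical types and helpers — the real information would presumably come from container labels, as just mentioned.]

    from dataclasses import dataclass

    @dataclass
    class RunningDaemon:
        service: str   # e.g. taken from a container label
        version: str

    def can_start_daemon(running_daemons, service, version):
        # Abort the start if a daemon of the same service -- e.g. from a
        # previous version -- is still running on this host.
        for d in running_daemons:
            if d.service == service and d.version != version:
                return False
        return True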
A
Yep,
the
the
problem
is
that
we
are
that
that
we
can't
do
that
right
because
it
would
be
too
slow,
and
this
is
why
we
are
adding
the
the
the
agent
in
order
to
push
information
to
the
miniature
module
asynchronously.
A
But
now,
as
a
side
effect,
we
are
having
to
deal
with
income
with
both
possible
inconsistencies
between
the
data
we
are
having
and
and
in
the
real
world,
and
I
think
the
the
way
adam
you
fixed
it
in
the
in
the
800,
with
the
metadata
up
to
date.
Flag
is
a
good
idea
that
you
should
leverage
more
often
also
in
the
in
the
non-agent
cases,
because
I
think
it's
a
an
easy
way
to
avoid
this
particular
problem.
A
Given
that
the
self-monitor
key
value
store
on
the
keystore
doesn't
have
resource
ids,
I
mean
if
we
could
get
a
resource
version
of
a
particular
time,
then
we
could
just
get
a
consistent
snapshot
right,
but
we
are
not
going
to
have
it.
We
are.
We
have
to
deal
with
the
the
fact
that
new
updates
to
our
data
structures
are
overwriting
existing
values
and-
and
we
have
to
be
aware
that
this
is
racing.
C
Like that RBD issue I found, where our set-volume call hangs: if we acquire the lock there and the thread never comes back, it has the potential to hang everything up too.
B
And
this
came
back
to,
I
know
we
just
on
wednesday.
In
this
end
up
we
had
some
ideas,
I
think
we're
talking
about
like
timeouts
and
like
watchdog
threads
or
something
we
cannot.
A
Okay,
the
problem
is
that,
if
you're
killing
a
threat,
you're
not
cleaning
up
blocks,
you're,
not
cleaning
up
some
airflows,
which
means
as
soon
as
you're,
killing
a
threat.
Your
application
is
prone
to
deadlocks.
B
We discussed whether we should put in a timeout system, but we never got anywhere with the discussion, I don't think, because it's sort of tricky to come up with good timeouts. You can't make them really small, because certain commands are slow — a deploy command that also has to pull an image is going to take a while. But if you make it too long, then you wait for 20 minutes, then it runs the serve loop one more time, then it gets stuck again.
B
So
I
mean
you
could
raise
a
health
warning
at
that
point
at
least,
but
it's
so
slow,
still,
no
there's
no
inclusions
or
anything.
B
Yeah
it
would
take
a
while
to
get
to
there,
but
I
mean
it
could
be:
maybe
a
launch,
a
long-term
solution,
I'm
going
to
start
short
term.
It
still
is
probably
worth
having
some
time
out
like
a
really
long
one,
maybe
and
just
raise
the
health
corner.
So
people
know,
instead
of
just
the
circle
of
hanging
forever
in
the
long
term,
maybe
go
with
that.
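
[Editor's note: a hedged sketch of that short-term idea — a long per-command timeout that raises a health warning instead of letting the serve loop hang forever. The names, including the health code, are made up for illustration; only concurrent.futures is real. The stuck worker thread is deliberately not killed, for the deadlock reason given above.]

    import concurrent.futures

    # Deliberately generous: deploy commands may have to pull an image.
    COMMAND_TIMEOUT = 15 * 60

    _pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

    def raise_health_warning(code, detail):
        # Stand-in for the mgr's health reporting.
        print(f"HEALTH_WARN {code}: {detail}")

    def run_with_timeout(fn, *args):
        future = _pool.submit(fn, *args)
        try:
            return future.result(timeout=COMMAND_TIMEOUT)
        except concurrent.futures.TimeoutError:
            # The worker cannot be killed safely (it may hold locks),
            # so at least surface the stall to the operator.
            raise_health_warning("CEPHADM_STUCK_COMMAND",
                                 f"{getattr(fn, '__name__', 'command')} timed out")
            raise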
A
Properly terminating SSH connections... except that creating actual Linux processes for every established SSH connection is going to be super resource-intensive.
A
Okay, I have one more thing, and it's announcing that I'm going to leave Red Hat by the end of January, so this is going to be my second-to-last orchestrator weekly.
A
We pull orders of magnitude more images than the Docker registry allows us to, so we have to have a proxy registry for docker.io, and upstream in the Sepia lab we're doing that — David set it up.
B
Yeah,
I
wasn't
sure
where
they
were.
If
because
I
was
just
asking
because
the
openstack
team
is
still
looking
for
those
containers
like
where
they
pulled
them
from,
because
they
still
have
been
pulling
the
docker
ones
and
their
ci
is
breaking
with
the.
A
No, it's internal to the Sepia lab, unfortunately.