From YouTube: 2018-APR-04 :: Ceph Developer Monthly
Description
Monthly developer meeting for the coordination of Ceph project development.
http://tracker.ceph.com/projects/ceph/wiki/Planning
B: Let's go. So, welcome back to the Ceph Developer Monthly. This is the meeting for April 2018. We have some topics to discuss today. We're going to start with some usability discussions that Dan, Sage, and the rest of the guys started during Cephalocon in Beijing. We're short on time, guys. Okay.
C: Yeah, so indeed — Danny Al-Gaaf from Deutsche Telekom and I started chatting at Cephalocon and ended up having some somewhat useful discussions about small sort-of paper cuts that could help usability a bit, and we noted these down; I'll go through them. The link is there in the chat. These are in no particular order, and some of them are probably bad ideas and maybe some are good ideas, but we can discuss. Okay — the first one was, basically, `ceph auth edit`.
C: So the problem here is that with the `ceph auth caps` sub-command it's very simple to make typos and break client keyrings. So we thought of a `ceph auth edit` command that would just pop up your editor interactively — pop up vim, let you edit the caps and then save — and it would write the caps. Very simple.
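As a rough illustration of the plumbing such a hypothetical `ceph auth edit` would need: only the `ceph auth caps <entity> <daemon> <capstr> ...` argument shape below reflects the real CLI; the edit/parse flow around it is a sketch.

```python
def build_caps_argv(entity, caps):
    """Render a `ceph auth caps` invocation from an edited caps dict.

    caps maps daemon type to a capability string, e.g.
    {"mon": "allow r", "osd": "allow rw pool=foo"}.
    """
    argv = ["ceph", "auth", "caps", entity]
    for daemon, capstr in sorted(caps.items()):
        argv += [daemon, capstr]
    return argv

# The proposed `ceph auth edit` would dump the current caps to a temp
# file, launch $EDITOR on it, parse the edited result into a dict, and
# then run the argv above -- so a syntax error aborts the edit instead
# of silently writing a broken keyring.
```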
A
It
seems
like
it's
an
improvement
over
like
dumping
them
and
cut
and
pasting
them
under
the
command
line,
and
then
editing
him
put
him
back
in,
but
I
think
that
the
bigger
goal
would
be
that
we
don't
normally
have
to
edit
the
caps
at
all
that
most
most
the
time
you'd
be
like.
So
it's
like
there's
a
new
set,
that's
deaf
FS
command
for
grant,
making
or
what
it
actually
is
or
it
just
does
it
all
for
you,
so
I,
don't
I,
don't
want
to
lose
track
of
like
that
larger
goal.
C: As I'm speaking, this sort of goes with the next one — well, the next-plus-two. Some interactivity in caps editing — just general interactivity in caps editing — would be helpful, I think. Like, maybe when you run `ceph auth caps` it could say: oh hey, this is your diff, this is what you're changing — you're adding this, or you're removing the `r` permission from the mon. Do you really want to do that?
A: This isn't adding functionality at the CLI, right? This is just making it easy to edit — it's like pasting the old value onto your command line so you can edit it before you run it. At the end of the day you're still going to edit the caps; it just does it in the same command, and you could have typed it yourself.
C: One thing — because I'd hate, in any kind of weird crisis situation — you know, I'm home on the weekend and there's some issue — I'd hate to have to use Microsoft Remote Desktop to get inside the CERN firewall and spin up a web browser just to change something, right? That's why I like the CLI being able to do everything.
C: The next one was that the Ceph CLI could use a normal and an expert mode. I know Danny had some funny video-game descriptions, like a "death mode" or something for the very advanced options, but basically you would hide some of the sub-commands of the Ceph CLI unless you enable an expert mode. This is taking inspiration from, well, lots of things.
C
Like
filer
appliances
do
this-
they
don't
enable
that
they
have
a
nav.
You
have
to
enable
admin
mode
on
a
net
app,
for
example,
if
you
want
to
see
some
of
the,
if
you
want
to
play
with
the
waffle
layout
or
whatever
I
think
we
already,
we
already
have
advanced
and
we
have
different
like
difficulty
levels
for
the
config
options.
I
noticed
I,
don't
know
where
those
are
exposed,
if
anywhere,
but
we
could
add
something
like
that
to
the
CLI
sub
commands
as
well.
E: You know, the stuff that's really about using the cluster rather than tweaking it — and maybe, I'm skipping ahead slightly in the etherpad, but those would also be the types of commands that would get the friendlier interactive-type behavior. Yeah, but I think it's going to be harder to divide these things into normal and expert than it was for the config options, because with the config options almost all of them are in principle optional — you never really need any of them. Whereas with the command-line stuff you've got real functionality that you're sort of turning off if you hide something in expert mode. Like, I don't know, creating custom EC profiles is probably expert, but at the same time it's a first-class feature we probably want to be friendly. So I think it's going to be really hard to draw that line.
C: Well, one example: the interactive prompts. The CLI could be interactive instead of just taking arguments — as an operator it's very difficult to know what the different options are, and the documentation is not always clear. Like, you're creating a pool: where does the erasure bit go? Where does the crush ruleset go? Placing these things in the right order is always tricky. So: interactive pool create, interactive...
C
What
were
the
other?
What
were
the
other
examples
that
we
thought
of
is
so
that
they
actually
the
cool
example
that
we
thought
it
was
delete,
because
so,
instead
of
saying
like
yes,
I,
really
really
mean
it
and
having
to
like
inject
to
the
Mons
that
yeah
you
unable
delete
mode,
you
would
just
try
to
delete
a
pool,
and
it
would
say
hey.
This
pool
has
got
this
much
data
in
it.
You're
gonna
remove
all
that
data
and,
by
the
way,
here's
some
ongoing
I/o
and
in
this
pool,
are
you
sure?
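A minimal sketch of what that confirmation logic might look like (the function and field names are illustrative, not an existing Ceph API); keeping the decision separate from the prompt makes it easy to unit-test:

```python
def confirm_pool_delete(pool_stats, answer):
    """Decide whether to proceed, given pool stats and the operator's answer.

    pool_stats: e.g. {"name": "rbd", "bytes_used": 1 << 40, "client_io_ops": 5}
    Returns (should_delete, prompt_text).
    """
    warnings = ["Pool '%s' holds %.1f TiB of data."
                % (pool_stats["name"], pool_stats["bytes_used"] / 2**40)]
    if pool_stats.get("client_io_ops", 0) > 0:
        # Ongoing client I/O is the strongest hint the pool is still in use.
        warnings.append("There is ongoing client I/O (%d op/s)."
                        % pool_stats["client_io_ops"])
    prompt = " ".join(warnings) + " Are you sure?"
    return answer.strip().lower() in ("y", "yes"), prompt
```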
E: We should get more into that, right? Because it's true that adding interactive stuff by reaching into the CLI client is straightforward. But if you get into things like prompting people for, I don't know, pool creation, and having logic for informing them about, I don't know, the number of PGs, or what the effective consumption of a pool will be if you pick a particular erasure code profile — you're very rapidly getting into a position where what lives on the client side ends up looking a lot like a manager module, right? So I think if we were going down this path it would probably be better to find a way to have more of that on the server side — it should be exactly the same logic that's also going to be presented in the dashboard, when it comes to the sort of heuristics we're presenting to people.
E: That's the right place to put this kind of stuff. I mean, if we had a lot of commands like this, we could even consider whether the interface between the CLI client and the manager should, you know, be a streaming thing rather than call-and-response — if we really are doing a lot of interaction stuff. But the more stuff we had here, the more it makes me wish we were spending that effort on the dashboard rather than...
F: So, there is this other software-defined storage project that recently redid their CLI, and has a manager — and the way they redid their CLI was by making calls out to their manager server, which serves as the backend for their dashboard. That makes me cry, but probably just because I'm a C++ junkie and I hate web APIs, I think.
C: Well — pool create — I just realized pool create is going to be difficult, because your heuristic... the trivial heuristic would have a different outcome for the first pool you create versus the tenth pool you create. It has to somehow know how many pools you're going to create, yeah.
E: Pool creation — I do have my not-adequately-publicized work-in-progress branch: that's the "poolset" concept. It does the high-level pool create, and does that by letting the user say what fraction of their cluster they intend this pool to be, to get around this sort of first-pool-versus-tenth-pool thing. I'm kind of running behind on that, but yeah, that's kind of the interface we should be exposing to users as a friendlier thing — and, more generally, for when we've got things like PG nums and so on.
C: Maybe I'll look through the CLI from the user point of view again and see if there are other killer use cases we could have for an interactive mode — because it's true that anything involving PGs... there's no better answer. I think removal could...
C: The use case for this is that something has happened to your cluster and you're in a degraded or misplaced state, and you know it's going to take several days or weeks to fully rebalance the data — but for reasons, any reasons — maybe you're going on holiday — you just want HEALTH_OK on that cluster again. So what a `ceph osd crush freeze` would do is—
C
Is
it
would
use
something
like
I
guess
it
could
use
up
map
to
just
take
like
a
snapshot
of
the
current
upset
of
of
OSDs
for
all
P
G's,
and
it
would
let
the
ongoing
backfilling
PGS
to
finish
backfilling
and
they
would
go
to
activist
clean,
but
all
of
the
other
PG
s
that
were
in
backfill
weight
or
recovery
weight
will
not
recovery
weight,
but
backfill
weight
would
just
go
active,
clean
like
immediately
and
then
yeah.
So
that's
the
that's
the
use
case.
That's
the
feature,
that's
the
use
case.
What
do
you
think.
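To make the upmap idea concrete: `ceph osd pg-upmap-items` (Luminous and later) accepts per-PG pairs of (from-osd, to-osd) overrides. A "freeze" could compute, for every PG whose CRUSH-desired `up` set differs from where the data currently sits (`acting`), the pairs that pin placement to the acting set. The orchestration around that command is a sketch, not an existing Ceph feature:

```python
def freeze_upmap_items(pg_map):
    """For every PG where 'up' (CRUSH's desired placement) differs from
    'acting' (where the data actually is), emit the pg-upmap-items pairs
    that pin placement to the acting set, so backfill_wait PGs would go
    active+clean without any data movement.

    pg_map: {pgid: {"up": [osd, ...], "acting": [osd, ...]}}
    Returns {pgid: [(from_osd, to_osd), ...]} -- the arguments one would
    pass to `ceph osd pg-upmap-items <pgid> <from> <to> ...`.
    """
    items = {}
    for pgid, st in pg_map.items():
        pairs = [(u, a) for u, a in zip(st["up"], st["acting"]) if u != a]
        if pairs:
            items[pgid] = pairs
    return items
```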
A
This
is
a
totally
different
answer
than
I
gave
you
when
you're
in
Beijing,
but
having
thought
about
it
again,
the
second
time
it
feels
like
this
is
slightly
the
wrong
layer.
What
you
actually
want
is
a
like:
a
data
migration
arrive,
yeah
rebalance
of
data
migration
pause,
so
you
want
anything.
If
there's
any
degraded
pg,
you
wanted
to
continue
to
recover
to
become
not
degraded,
but
anything,
that's
misplaced!
You
don't
care
you're,
just
gonna,
leave
it
where
it
is,
and
that's
gonna
be
fine,
and
so
it
kind
of
feels
like
from
your
users
perspective.
A
A
A
A
And
I
think
it
would
I
think
it
would
operate
at
a
slightly
different
level.
It
would
be
like
a
flag
in
the
OSD
map.
That
indicates
that
we
don't.
Basically
we
shouldn't
try
to
move
Fiji's
that
are
misplaced,
that's
basically
it
and
that,
because
the
flag
is
fat,
then
the
PGS
would
go
clean,
they
would
still
say
misplaced,
but
the
health
okay,
when
when
it
would
be
out
okay,
I
guess
that
makes
sense.
A: Remapped-and-clean is actually what we want, and in fact I thought we made that change for Luminous — a PG can be clean and remapped at the same time, I believe — which means those OSD maps can trim and we can go. But there's probably still a health warning that I can't remember.
A: I think the key thing I struggle with is that if you set this flag, you're basically telling the system to pause data migration to where it thinks it should go. And if your end goal is to get to HEALTH_OK, my first impulse would be: if you have that flag set — let's call it norebalance — then I would want a health warning reminding you that you have that flag set.
A: So the workflow for you is to set the flag and then mute that one warning for 48 hours while you go away for the weekend. You'd still get all the other warnings — say, for the recoveries that are completing — and once it's done, the only warning that remains is that the norebalance flag is set, but it's muted for 48 hours. So... there's already a norebalance flag? Yeah.
C: So with this thing we would be able to set the tunables and then freeze the placement, but then, super gradually — maybe over a period of one year — phase out the old placement and phase in the new placement. You see what I mean? So it's not that I would want to freeze it and then just leave it...
F: Yeah, so an alternative — because we've talked about that mapping change before — in the past what I've advocated for is that we sort of maintain both of them and we just do it by the hash. So we increment one PG at a time, saying all the PGs prior to this point map the new way, and all the rest still map the old way. Yeah, I don't know.
F: I mean, all of our other throttles and controls — which aren't there yet, but are getting much better — should be dealing with the impact of doing that migration. So that makes me think that what we really ought to be worried about is the reporting — or not really "worried about", but we need to think about those as separate issues.
F: Yeah, my particular concern with the upmap thing is that if it puts the cluster into HEALTH_OK while there's a dramatic imbalance, then none of our other measures — for instance the data-fullness metrics and stuff — none of those are going to work correctly, and that seems incredibly dangerous.
C
Well,
you
would
have
I
mean
I
mean,
so
this
is
then
skipping
to
the
next
point.
Right
I
mean
you
would
have
up
knops,
you
would
have
up
map
labels,
so
you
would
have
up
maps
that
came
from
the
freezer
and
yet
up
maps
that
came
from
the
balancer
and
like
the
balancer
would
keep
working
and
the
balancer
would
keep
actually.
Maybe
the
balancer
would
gradually
remove
these.
These
frees
up
maps
or
something
like
that.
You
know
you
would
have
probably
that's
what
it
would
do.
C: The next thing is about improvements to health — to HEALTH_OK. So the very first one was actually a request from the British guy from STFC who has the 15-petabyte physics data cluster. They have some kind of hardware issue and they have inconsistent PGs all the time, and basically they said it's a bit scary that one inconsistent object leads to HEALTH_ERR — and HEALTH_ERR, in most people's monitoring systems, leads to pagers going off.
E
People
run
their
systems
in
a
way
that,
like
they're,
managing
at
a
different
level,
you
know
the
the
life
cycle
of
they're
known
to
be
unreliable.
Hardware,
I
wonder
if,
for
those
people,
it
would
be
sufficient
just
to
be
able
to
meet
the
whole
alert
rather
than
setting
a
different
threshold
in
it.
A
Maybe
so
that
the
related
part
of
this
I
think
is
the
mutant
which
I
guess
is
that
what
that
pull?
The
pull
request
was
about.
There's
a
whole
discussion
about
this,
that
maybe
we
should
talk
about
any.
We
shouldn't
guess
that
that
doesn't
quite
cover
this
case,
because
you'll
have
one
inconsistency,
it
goes
away
and
then
you
have
and
then
you
will
go
away
and
then
all
the
new
inconsistency
and
a
new
alert
will
come
back
and
commute
that
one
to
you.
A
The
pull
request
is
like
very
specific
for
something
for
muting
yeah.
The
pull
request
isn't
really
sufficient
it,
but
it
triggered
this
whole
conversation
about
how
you
would
make
sort
of
a
robust
commute,
I
think
like
because
you
can't
just
mute,
set
those
two
down
when
there's,
because
you
might
only
have
one
LC
down
and
then
like
five
more
fail,
and
but
you
want
to
know
about
the
new
those
new
failures,
so
it
has
to
be
like
muting,
individual
and
Julie
identified.
A: Each detail item could have a scalar value or a unique identifier, and then you can mute either the entire warning in perpetuity — which would be a very dangerous thing — or you mute the individual identified detail items, so if a new inconsistent PG pops up it isn't muted yet; or you can mute against a threshold on the scalar value, either on individual details or on a sum of them, or something like that. That makes sense. It seems like we could come up with some relatively constrained set of ways that you mute that would capture most cases.
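Those three mute shapes (whole check, individual identified item, scalar threshold) are constrained enough to sketch directly; the record layout below is illustrative, not an existing Ceph structure:

```python
def is_muted(alert, mutes):
    """Check one health-detail item against a list of mute records.

    alert: {"check": "PG_DAMAGED", "id": "pg 1.7", "value": 1}
    Each mute record silences a whole check ("whole", dangerous),
    one specific detail id ("id"), or anything at-or-below a
    scalar threshold ("threshold").
    """
    for m in mutes:
        if m["check"] != alert["check"]:
            continue
        if m["kind"] == "whole":
            return True                 # muted in perpetuity
        if m["kind"] == "id" and m["id"] == alert.get("id"):
            return True                 # a *new* id still alerts
        if m["kind"] == "threshold" and alert.get("value", 0) <= m["max"]:
            return True                 # crossing the threshold re-fires
    return False
```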
E: I mean, I'm thinking of this in terms of the JSON structure. You would have the existing health field with the list of things which aren't muted, and then a separate list — exactly the same syntax — of alarms that currently are muted. That way, if a tool consuming it was unaware of muting, it would still see only the unmuted ones. But you could do it the other way and put a boolean flag next to each one, right.
E: I think the threshold is a fair bit harder to define, because some types of health alert won't have a threshold, and anything that's providing a user interface to this stuff needs to know what prompts it can present for each health alert. So the "mute once" / "mute always" prompt is quite uniform and the same for everything, but a prompt that says something like "mute unless the number of degraded PGs goes above such-and-such" is maybe harder to expose.
A
Yeah
I
would
expect
that
button
when
you
mute
when
the
warning
of
2.5
percent
degraded.
When
you
mute
it,
it
would
just
look
at
the
structure
they
alert
and
it
wouldn't
be
identified
by
unique
identifiers,
but
it
would
be
identified
by
a
threshold,
and
so
the
mute
would
just
automatically
be
based
on
that
threshold.
So
the
interface
wouldn't
be
any
different.
But
implicitly,
if
you
needed
something
that
was
2.5%
and
it
jumped
to
2.6
that
wouldn't
apply
anymore.
C: The next thing was about ceph-volume. This is just general user feedback: ceph-volume is super cool and super powerful, definitely the way to go, super reliable. But we really liked — I think, I hope other users would agree with me — we really like the simplicity of the output from something like an `fdisk` listing: it very clearly shows you what the SCSI devices on your system are and what the...
C: And then we thought of something cool — I think, Sage, you thought of this — something useful that could also be used by the orchestration tools would be a batch prepare. So you'd have, like, `ceph-volume lvm batch`: prepare me this server that has 24 spinning disks and 4 SSDs, and just do the right thing.
A: ...just go do it. But then yesterday — I think we were in the container call — we were having a conversation about how the interplay between Rook provisioning devices and the manager (or whatever) should work. And it feels, in the context of that discussion, that the manager/dashboard/whatever is going to ask for an inventory of the hosts and all the devices on them, and then it's going to go instantiate a bunch of individual OSDs on disks.
A
It
feels
like
it
makes
more
sense
for
it
to
live
in
the
manager
in
that
case
and
then
call
through
whatever
the
generic
interfaces
for
I'm
talking
to
the
provisioning
layer,
because
that
I
think
the
thing
to
keep
in
mind
is
that
well
simple,
to
think
about
the
case
of
an
empty
server
and
a
bunch
of
blank
discs
and
doing
the
right
thing.
What
happens
three
months
down
the
line?
C: It's doable. So when you say the manager would do that — how would the manager know about an empty OSD server? Do you have, like, a Ceph inventory agent that's telling the manager: here are all the servers, and here's what's not used on all of them — and then you can click, click, click: OSD, OSD...
A: Yes, or no, yeah. All right — so the nice thing about putting it into the manager is that you can access that logic in multiple ways. The dashboard could call out to it: the dashboard could present a GUI that lets you click, or maybe you click a button that says "just do the right thing" and it does it. But you can also wire up CLI commands that trigger the same code in the manager — it would be like `ceph` something `volume` something whatever. But yeah, it's more complicated, because you suddenly have this...
A: ...this added step of the manager calling back out to the provisioning layer — whether that's raw SSH or ceph-ansible or something — so it can do things.
E: I kind of had been defaulting to the view of the disk-choice stuff living in the manager, but there are a couple of cases where that doesn't make sense. The main one is if you're using something like Rook and you're configuring it to say "use all the disks on all the hosts with a certain label", orchestrated such that when hosts are added it will automatically slurp them up.
E
If
we
wanted
the
managers
to
be
making
the
disk
usage
decisions,
then
it
would
have
to
somehow
be
in
the
loop
in
that
process,
which
is
not
impossible,
but
it
makes
here
to
face
a
lot
more
complex
and
it's
actually
a
lot
easier
in
terms
of
the
interface
between
the
manager
and
the
various
orchestrators.
If
we
just
have
a
way
of
deferring
that
to
the
orchestrator
and
saying
you
know
whether
you
are
and
some
on,
BBC
or
rook
go,
do
the
drags
on
this
host.
E
But
they
have
to
be
able
to
feedback
to
the
manager
and
say
like
roughly
what
they
would
do.
If
you
ask
them
to
do
something
so
that
we
can
still
have
the
workflow,
where
we
warn
the
user,
you
know
which
disks
are
really
going
to
go
format
and
if
they
click
them,
but
in
any
of
these
cases
it
only
makes
sense
to
have
it
in
set
volume.
If
the
orchestrators,
whichever
looks
really
amazing,
is
then
gonna
use
safe
volume,
which
is
realistic.
D: So that's a point where we wanted this to be above the ceph-volume layer: we need the extra information about what the use cases for this cluster are. Like, do you need metadata pools that are faster — for RGW, or for CephFS? So, do you have SSDs that you want to use as OSDs in their own right, or are you purely going to use them for caching or for journaling?
H
Can
you
set
up
policies
now,
where
you
can
utilize
capacity
on
other
SSDs,
that
you
have
to
recreate
those
OS
T's
without
having
to
actually
replace
the
hardware,
and
do
you
want
to
do
that?
There's
just
there's
some
things
that
you
might
be
able
to
start
doing.
If
you
have
a
single
OSD,
that's
just
managing
pools
of
disks
right,
I,.
F: So our eventual changes might enable stuff like this, Mark, but I think that'll be a discussion when we have those changes — not something we should be worrying about right now. I think that if we did it now, it would just mean that all our orchestration tools are running many years ahead of what anything else in the system can actually take advantage of, and that would slow down other feature work.
A: That was originally driven by things like our RDMA connection counts, or whatever, and multiplexing. But I think a more likely endpoint is that we'll actually want to keep these things separate, so that we have TCP connections terminating on the same core that's handling the I/O for the particular device. So we might actually go in the other direction — we want to keep them separate, and whether it's in the same process or not is sort of irrelevant at that point.
C: I mean, I think the verbosity thing — probably you guys agree. But the second half is, you know, the use case. It's just the use case that we need: we need to do the right thing — maybe give me two or three options for the different use cases for my cluster — but we need to do the right thing at a whole-server level. I think that's the point.
F
You
share
that
my
concern
is
that
I
don't
know
if
we
can
do
this
on
a
generic
basis,
but
we
can
maybe
start
building
up
like
databases
for
certain
kinds
of
hardware
deployments
or
something
yeah.
C
Sure,
right
now,
I
just
think
I
mean
I,
think
that
this
is
I.
Don't
think
it's
changed
since
this
we
might
I'll
reap
which,
like
tomorrow,
to
make
sure
that
it's
the
latest
latest
and
greatest
probably
we
were
missing.
This
might
be
missing.
Some
like
blue
store
here
and
there,
but
but
otherwise
this
is
it
I
just
put
it
up
in
the
chat.
Okay,
so
I
can
move
on.
C: Exactly. I also skipped the PG hard-limit one, because that's basically a debate that Kefu and I were having. I think too many people have been burned by this PG overdose protection that we should just loosen its default. I'll probably just send a pull request and then let the debate happen there.
C: Let me do them in reverse order, okay — because the next one happens to be something that was already discussed and decided to be a bad idea. CephFS: for backing up CephFS, it would be really cool if you could easily see the list of files that changed between two snapshots. NetApp has this SnapDiff functionality that does exactly this.
C
That
feeds
into
into
like
TSM,
for
example,
I,
guess
all
backup
systems
have
this
basic
functionality,
I
guess
I
say
the
list
of
files,
because
then
the
assumption
of
the
backups
will
then
lead
those
files
in
their
entirety.
It
would
be
even
better
if
you
could
export
the
diff
like
we
do
with
block
devices
and
only
get
like
the
parts
of
the
files
that
change
but
I
assume
that
that's
more
calm
and
god's.
F: I just don't remember if you can get rstats on snapshotted directories — whether they're usable that way. Yeah, so — I think you can just look at the rstats on a snapshot directory. It won't tell you about individual files, but it lets you identify whether anything under a directory has changed across snapshots. So you can build a very fast traversal. Is there...
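The fast traversal idea can be sketched as a tree walk that prunes any subtree whose recursive change time is identical in both snapshots. On CephFS the recursive stats are exposed as virtual xattrs like `ceph.dir.rctime`; the sketch below abstracts that behind callables so the pruning logic stands on its own:

```python
def changed_dirs(root, list_subdirs, rctime):
    """Return directories whose contents changed between two snapshots,
    pruning subtrees whose recursive ctime is unchanged.

    list_subdirs(snap, path) -> child directory paths
    rctime(snap, path)       -> recursive change time (on CephFS this
                               would read the ceph.dir.rctime xattr
                               under each snapshot's .snap directory)
    """
    changed, stack = [], [root]
    while stack:
        d = stack.pop()
        if rctime("old", d) == rctime("new", d):
            continue              # nothing anywhere below here changed
        changed.append(d)
        stack.extend(list_subdirs("new", d))
    return changed
```

The rstats only narrow the search to directories; the backup tool would still have to stat files inside each changed directory, since per-file diffs aren't available.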
F: Yeah, no — it's all scoped to the directories; looking at file diffs is not supported in the MDS right now. One thing that has come up — which I was about to talk about — was an easy diff between individual RADOS objects. I don't know if you want to talk about that, Josh? Yeah.
D
I
was
talking
about
this
little
do
with
the
Jang
and
Beijing
and
use
it
somewhere,
pinyin
to
you,
Gregg
and
sage,
but
having
the
traversal
tree
still
I
thought
I,
imagined
em
how
that
I
really
does
it
by
giving
it
an
index
of
which
objects
have
changed
between
snapshots
with
the
object
map
and
something
similar
might
be
possible
with
stuff
efest
directories.
Yes,
at
least
at
the
radius
level,
you
can
get
going
get
the
diff
for
individual
readers
objects
so
that
that
can
become
more
efficient.
The.
C: The next point was that one of the common use cases for the MDS balancer is that you want to balance a particular subtree — so I thought of this as a path-prefix balancer. Like, everything in /home I want to shard across all the active MDSes. But apparently, Sage, you had thought about this in a different discussion, right?
A
Yeah
I
remember
talking
about
it.
I
can't
remember
if
we
decided
that
certain
solutions
did
or
didn't
work,
but
this
feels
like
something
that's
analogous
to
the
painting
where
you
just
also
have
a
little
thing
that
you
tag
the
directory
with
it
says
like
char
doll,
children
and
so
as
they
get
instantiated
in
the
in
the
cache.
We
just
ship
them
off
to
a
random
I.
Think
if
we
have
that
hint
then
I
think
it
it
wouldn't
be
that
challenging
to
implement.
But
I
can't
remember
if
there
were
other
issues
or
not.
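The existing `ceph.dir.pin` xattr pins a whole subtree to one MDS rank; the "shard all children" tag discussed here is hypothetical. A sketch of how the rank choice might compose with the existing pin semantics (a stable hash keeps each child on a consistent rank across cache re-instantiations):

```python
import zlib

def rank_for_child(parent_policy, child_name, num_active_mds):
    """Pick an MDS rank for a newly instantiated child directory.

    parent_policy: an int (the existing ceph.dir.pin behaviour: pin the
    subtree to that rank), the hypothetical "shard-children" tag (spread
    children across all active ranks), or None (default authority).
    """
    if parent_policy == "shard-children":
        # Stable hash: the same child always lands on the same rank.
        return zlib.crc32(child_name.encode()) % num_active_mds
    if isinstance(parent_policy, int):
        return parent_policy          # classic static pin
    return 0                          # default: rank 0 keeps authority
```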
C: Now, with these open-source Dropbox-like systems you can already use CephFS or Ceph S3 as the backend storage, but they're used as the data store — they're not intended for the users to also access the files directly, bypassing that gateway. So that's what's unique about what we do with so-called CERNBox, yeah.
C: Yeah, okay, going up to RGW. So there was — I think it was Robin? Robin had the idea, with Danny, that they should be spinning down cold disks, turning off entire servers. With things like OpenStack Nova we know how to power physical servers on and off via Ironic, so they have all the credentials to talk to the IPMI interface, yeah. So you could, for example, have...
C: Then the idea of a tape backend is also cool. Maybe it's also two steps ahead, but at CERN we're writing an open-source tape management system to replace our old open-source data management system, so I had the idea that maybe we can somehow work together on that part. Again — yeah, okay.
A
It
just
just
really
pretty
sweet
briefly.
The
conversation
here
this
actually
just
came
up
on
the
list.
I
think
yesterday
about
doing
tearing
in
Ratos
or
doing
instead
of
SNR
GW
I
mentioned
there
and
I
don't
say
to
get
nice
I
still
like
the
idea
of
doing
it
in
the
set
of
s
energy
layers,
because
you
got
better
understanding
of
what
you
should
tear
and
when
I
think
would
be
awesome
to
implement
the
nm
API
or
something
at
camera
with
the
HSM
api's
are
called
DM
API.
C: Basically, we thought of embedding mon code into every OSD, so that you have a subset of the OSDs acting as mons — like three or five of them — and you distribute those active mons by selecting them with the crush map. So you get your mons running, for free, across the failure domains in the cluster.
F: ...stuff, but also because it means that as nodes fail we could reproduce them without having to do that manually. But right now the monitors are at well-known IPs, and that's part of their functionality: clients know who to talk to to connect to the cluster, and OSDs turning on know who to talk to to connect to the cluster. Replacing that would be a major, major undertaking.
A: Thanks for taking the time. I think we should probably come back to this and sort of scrape actionable pieces out of it as we're doing our planning for Mimic, because I think there are a lot of good ideas here — and some for Nautilus. I'm sorry, Louis! Yes, that next one, okay.
F: Okay, yeah — thanks so much, Dan. We didn't talk about a hierarchy of managers, right? I think John has probably discussed this before, though. So do you just want to, real quick...
E
Well,
I
didn't
I,
didn't
write
any
of
this,
and
but
I
can
just
give
my
opinion
about
it.
I
I
think
that
we
could
have
like
a
little
baby
manager
that
can
exist
in
a
hierarchy.
That
just
knows
how
to
do
certain
stats
stuff.
But
if
we
got
to
the
point
where
we
were
thinking
about
like
distributing
queries
or
anything
across
them,
then
I
think
we
be
getting
into
the
territory
of
stuff
that
Prometheus
should
be
doing.
D: Probably the more major one is trying not to have an unbounded amount of memory used by PG logs during backfill and recovery. Currently we don't trim the logs during backfill at all until the recovery is completed, and therefore they can build up more and more dirty objects as more writes come in while there's activity in the cluster. This is all to avoid needing to restart backfill, or to move from recovery to a new backfill, in the case of a failure during the process.
A: It's still going to be a huge pain, because during peering you have these missing sets that you're passing around, which are like a summarized, condensed, parsed version of what has happened — just the stuff that's changed and at what versions. You still need to hold all of that in memory, and it's going to get super big. I wonder if what we actually want is — well, short term, it seems like just forcing a trim and going to backfill; that's like a no-brainer.
A: On the limit — if you have, say, a hundred thousand objects and a million entries in your log, you're just optimizing the wrong dimension, right? Yeah. So clearly — I mean, the bloom filter to me sounds like it might be the right balance, where we can have some sort of — not probabilistic, but a helpful — hint, so we can prune out some of the work of doing backfill. Yeah, I think that sounds like a—
D: Good idea, and it's relatively cheap to build on the fly, and to maintain on disk, I think. Another piece we will probably look at sooner rather than later is optimizing how much data we bring over for large objects — for example, the large multipart objects that RGW creates. There's already some work going on from some folks on partial recovery of regular data from an object; we might want to look at doing that for omap objects as well.
A
D
A
A
D
But in any case, I think that partial recovery is a much bigger piece of work in general. So the short-term stuff, like you said: just a hard limit, or a trim threshold beyond which we start trimming again, and maybe the bloom filters. Yeah, you constrain things.
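The "helpful hint" idea discussed here can be sketched as a small bloom filter over the objects dirtied during recovery: a lookup answers either "possibly dirty" or "definitely clean", so definitely-clean objects can be skipped during backfill, and a false positive only costs redundant work. This is an illustrative sketch only, not the Ceph implementation; the class name and sizes are invented.

```python
import hashlib

class DirtyObjectHint:
    """Tiny bloom filter: answers 'possibly dirty' or 'definitely clean'."""

    def __init__(self, nbits=8192, nhashes=3):
        self.nbits = nbits
        self.nhashes = nhashes
        self.bits = bytearray(nbits // 8)

    def _positions(self, name):
        # Derive nhashes bit positions from a single digest.
        digest = hashlib.sha256(name.encode()).digest()
        for i in range(self.nhashes):
            chunk = digest[4 * i:4 * i + 4]
            yield int.from_bytes(chunk, "big") % self.nbits

    def mark_dirty(self, name):
        for pos in self._positions(name):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def possibly_dirty(self, name):
        # False means definitely clean: backfill can skip this object.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(name))
```

The structure is fixed-size regardless of how many objects are dirtied, which is exactly the bounded-memory property being discussed, at the cost of a tunable false-positive rate.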
M
M
I mean, PG-log-based recovery versus traversing all our data on disk. Basically, we still need to take care of two completely different kinds of devices: spinning disks, which are large but offer low throughput, and on the contrary SSDs, which tend to be small but extremely fast. And also, here, PMEM or something like that.
M
D
I think, and this is kind of going into the next talk a little bit, an abstraction for the on-disk format of the PG log certainly makes sense. I think that's what we've discussed a little bit on the mailing list: keeping the existing abstraction using OMAP for the hard disk case, but since that's quite CPU-intensive, for the SSD case maybe trying to add a different on-disk format.
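The kind of abstraction being described, one interface with interchangeable on-disk strategies, can be sketched as follows: an OMAP-style keyed store for the HDD case, and a hypothetical flat append-style layout for the SSD case. Class and method names are invented for illustration; the real PG log is C++ inside the OSD.

```python
from abc import ABC, abstractmethod

class PGLogStore(ABC):
    """Hypothetical interface for persisting PG log entries."""

    @abstractmethod
    def append(self, version, entry): ...

    @abstractmethod
    def read_all(self): ...

class OmapPGLogStore(PGLogStore):
    # HDD-style: one keyed record per entry. Point updates are
    # cheap, but key encoding/decoding costs CPU.
    def __init__(self):
        self.kv = {}

    def append(self, version, entry):
        self.kv[b"log.%016x" % version] = entry

    def read_all(self):
        return [self.kv[k] for k in sorted(self.kv)]

class FlatPGLogStore(PGLogStore):
    # SSD-style alternative: entries appended to a single buffer,
    # trading rewrite cost for much less per-entry CPU work.
    def __init__(self):
        self.buf = []

    def append(self, version, entry):
        self.buf.append((version, entry))

    def read_all(self):
        return [e for _, e in sorted(self.buf)]
```

Callers see only the `PGLogStore` interface, so the backend can be chosen per device class without touching recovery code.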
D
I think it's much more difficult to make a higher-level abstraction for recovery and backfill together, because they're kind of entirely separate, or mostly separate, code paths at this point. It would be quite painful to try to combine them in a way that made sense and still kept things stable.
H
D
D
H
D
H
D
D
A
H
A
We talked about this a bunch; I don't know, I can't decide if it's worth the complexity. Like, during recovery we would steal memory from, let's say, RocksDB in order to keep a longer log to do log-based recovery. I don't know; for now at least I lean towards just keeping it simple, and if we did something like a bloom filter to accelerate RocksDB, or whatever, that might cover it partly from the other end.
A
I just wonder about changing it to a deque, actually. Right now it's a linked list of PG log entries coming out of a pool; changing it into a deque, which would save you, maybe it doesn't matter actually, just a couple of pointers anyway, might be a bit faster, yeah.
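The two short-term ideas in this part of the discussion, a hard cap on in-memory log entries and a deque instead of a linked list, combine naturally: a bounded deque discards the oldest entry on append, so the log can never grow without limit. A purely illustrative Python sketch (the real structure is C++; the cap value is made up):

```python
from collections import deque

class BoundedPGLog:
    """Keep at most max_entries log entries in memory,
    trimming the oldest on overflow (illustrative only)."""

    def __init__(self, max_entries=3000):
        self.entries = deque(maxlen=max_entries)

    def append(self, version, op):
        # deque(maxlen=...) silently drops the oldest entry
        # once the cap is reached: an implicit trim.
        self.entries.append((version, op))

    @property
    def tail_version(self):
        # Oldest version still held; log-based recovery is only
        # possible for peers at or beyond this version. Peers
        # further behind must fall back to backfill.
        return self.entries[0][0] if self.entries else None
```

The trade-off is visible in `tail_version`: a shorter bound means falling back to backfill sooner, which is exactly the balance being debated.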
A
G
An update? Sure, I can give you a quick update. To be honest, I don't know if the dashboard was actually a topic of a CDM call before, so I don't know where to start, but maybe I'll just give you a quick update on what has happened since we actually merged it into the master tree. The merge happened on the 6th of March, almost a month ago, and well.
G
The dashboard basically is a full replacement for the dashboard that we shipped with Luminous, plus the additional stuff that was added to the original dashboard and that we ported over to dashboard v2 before we basically replaced dashboard v1. In particular, these are things like having RGW details listed; you have a list of the mons.
G
But that's something that is in progress. So to speak, the documentation about the dashboard has a more detailed feature list, where I also try to summarize in more detail what kind of information is available on each page. But so far this basically is just converting the old read-only dashboard into the new architecture, with lots of UI improvements and some changes on the back end that don't really have user visibility.
G
Since the original milestone merge, we've added support in the back end for managing asynchronous tasks. That's something that we need when we get into managing RBDs or pools, for example, where you basically trigger a job through the UI, but the cluster will need some time in order to fulfill that request, and you can't have the browser wait until the request comes back. So we have a task queue implementation where you can see what kinds of jobs are currently being processed; the back-end part is already merged.
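The pattern described, where the request handler enqueues a job and returns immediately while a worker executes it and the UI polls for status, can be sketched in a few lines. This is not the actual dashboard code; names and states are invented for illustration.

```python
import threading
import queue
import uuid

class TaskQueue:
    def __init__(self):
        self.tasks = {}                      # task id -> state
        self._q = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, fn, *args):
        """Called by the request handler: returns at once with a task id,
        so the browser never blocks on the cluster."""
        task_id = str(uuid.uuid4())
        self.tasks[task_id] = "queued"
        self._q.put((task_id, fn, args))
        return task_id

    def _run(self):
        # Worker thread: executes jobs one at a time.
        while True:
            task_id, fn, args = self._q.get()
            self.tasks[task_id] = "running"
            try:
                fn(*args)
                self.tasks[task_id] = "done"
            except Exception:
                self.tasks[task_id] = "failed"

    def status(self, task_id):
        """Polled by the UI to render the list of running jobs."""
        return self.tasks[task_id]
```

A real implementation would also record timestamps and results per task so finished jobs remain visible in the UI.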
G
G
Then you get the JSON output, and you can enter parameters and values through the web browser and submit them as if you were a REST client, which makes development much easier. It also gives you kind of an automated overview and documentation of all the various REST API calls that are available, since they are browsable. That merge still needs to be completed for some of the API calls. Right now we are building up a lot of infrastructure and adding lots of modules and components, especially in the UI, that are needed to fulfill.
G
The real management: being able to enter values, make changes, creating modal dialogues. There are so many small UI components that we're currently building up, which pave the way towards the more user-visible features that you can use to make changes to your cluster. So, the task manager UI I just mentioned; other things that are currently in the pipeline and nearing completion are RBD management, so block devices: creating them, deleting them, snapshotting them.
G
RADOS Gateway management is close to being merged: being able to list all the users, their keys and all their buckets, and creating users and new keys. This is almost done. Ceph pool management is already merged from a back-end perspective, so the pool controllers; their web view is also in progress. This will give you a way to create pools through the dashboard, but since the main developer working on the UI part is currently on sick leave, I'm not sure if this is going to meet the Mimic deadline. We'll do our best.
G
Also, the newly merged features relate to making config changes and being able to review them. If I remember correctly, you recently added some changes where you can basically snapshot the configuration of a cluster and see a diff of what parameters have changed. These things are also going to be manageable through the UI going forward; Tatjana is working on that. And, well, the UI will be capable of being translated into multiple languages, so the infrastructure for that is basically in place in the codebase.
G
We are now evaluating how to facilitate, or how to leverage, the community for adding translations, basically two web-based platforms that we could look into: one is POEditor.com, the other one is Transifex. Both claim they have free offerings for open source projects. So that's something that we would like to tap into in order to set up a web-based platform where users can contribute translated strings to us for whatever other languages they require.
G
But for the initial start we will try to come up with a set of translations for the languages that our development team speaks, and, well, just use regular commits, basically, to submit those translations as a foundation going forward. That's a really rough overview. I gave a presentation about the current state of the dashboard at Cephalocon last week; the slides are available, and if you want to see them, I can send them to you. I hope that the recording will also be available through YouTube.
G
At some point, I guess; Leonardo is working with the organizers on making that happen. I also created a screencast two or three weeks ago that just goes through the features, if you want to take a look at the UI and you don't want to set it up and get it running yourself. But yeah, that's basically it. I'm open to any questions or feedback or comments at this point.
A
E
G
G
G
E
So I'm not sure how many of the relevant people we have on the call today, but it's pretty much just making it run anywhere. The idea of this is to have a common interface between ceph-mgr, primarily the dashboard, and the various different back ends that people use for remoting out to their hosts, creating OSDs on them, and starting and stopping services, that kind of thing. The three that have been explicitly discussed so far are ceph-ansible, DeepSea and Rook.
E
So the idea is to find a useful common set of functionality. We spent a couple of hours talking about this in Beijing, and what it sort of boiled down to was five basic operations. One is discovering inventory, the hosts and disks; not a full hardware inventory, just enough to show the user where they can do things like create OSDs and place services. Second is creating stateful services, MONs and OSDs, where the user can say specifically which disks they want to use.
E
Third is stateless services, MDSes, RGWs and so on, where the user in general doesn't say where they want to run it, but they might say which host labels are suitable for running it. So we want to bring forward the concept of labeling hosts; lots of underlying orchestrators have that, and use it as a way for the user to give us policies about what should run where. The fourth one is querying the status of an existing service.
E
The fifth is upgrades, where Ceph itself would be able to say: here is the version I would like to be running. And so there's a little bit of a chicken-and-egg thing there. The idea is that the orchestrator is always something that's running externally to the manager, so it's the thing that drives updates, but the user would still like access to that through the easier interface.
E
So there's a call for the orchestrator to publish a list of versions that can be installed, and then a call for the ceph-mgr to request itself to be upgraded to a particular version. The analogy is that the manager sort of gets put to sleep, and when it wakes up it'll be a new version. The details of that might be kind of interesting to implement, but that's where we are at the moment. There's a little bit more detail in the orchestrator.py interface file, but that's it at a high level.
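The five operations enumerated above map naturally onto an abstract interface that each back end would implement. This is a loose sketch of that shape with invented method names; the orchestrator.py file mentioned in the discussion is the authoritative version.

```python
from abc import ABC, abstractmethod

class Orchestrator(ABC):
    """Sketch of a common ceph-mgr-facing orchestrator interface
    (names are illustrative, not the real module's API)."""

    @abstractmethod
    def get_inventory(self):
        """1. Hosts and their usable disks; not a full hardware
        inventory, just enough to place OSDs and services."""

    @abstractmethod
    def create_stateful_service(self, kind, host, disk):
        """2. e.g. an OSD on a specific disk the user chose."""

    @abstractmethod
    def create_stateless_service(self, kind, host_labels):
        """3. e.g. an RGW; placement constrained only by labels."""

    @abstractmethod
    def get_service_status(self, kind):
        """4. Status of an existing service."""

    @abstractmethod
    def available_versions(self):
        """5a. Versions the orchestrator could install."""

    @abstractmethod
    def upgrade(self, version):
        """5b. Ceph requests its own upgrade; the mgr may be
        restarted under us as part of this."""
```

Each back end (an Ansible-based one, DeepSea, Rook, ...) would subclass this, and the dashboard would talk only to the base interface.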
E
E
A
E
So my thinking on this is that this add-stateful-service operation that lets you target a specific disk should be sufficient for building whatever we want, if we want to build intelligence up at the ceph-mgr layer, because it can see all the disks and then do things to them individually. But from talking to the people who've been involved in the various orchestrator projects, I think there is sort of a feeling that people would like to keep some of the stuff.
E
That's implemented at that layer. And it also, I think, saves us a little bit from having to have such a rich interface, if we can say, for example, that if the orchestrator wants to do the disk or shelf selection, if it wants to query some really nitty-gritty low-level piece of hardware information to make its decision, or look at tags on disks because they're building an appliance or something like that, then we can say: okay, fine!
A
E
E
So, in order to present that in a user interface, it might mean that the dashboard needs a little bit of orchestrator knowledge, to know what the valid extended options are for each orchestrator. But yeah, I'm sort of, I guess, sidestepping trying to define things like that, by saying that, you know, they should just be extended options during creation. I'm not sure we can see far enough ahead to define that stuff.
A
G
Also, I think the idea of having an easy way to upgrade the cluster from one version to the next is one of the steps that should follow at a later point in time, because, really, depending on what kind of orchestration tool you used, it may not even know about a next version or how to access it, because that depends on the package management and the repositories they are subscribed to. So having a generic interface.
G
E
If you're inside, you know, Red Hat Satellite or whatever the equivalent is in other distributions, then that becomes much less possible. But I would sort of be inclined to send that back up the chain as a challenge to, you know, the people who build that type of infrastructure for the various distros, to say: look, if I can't programmatically discover that there's an update available for my system, then maybe you should give me an interface for that. And that might take some time. Oh.
G
E
N
E
F
Sure, sorry, I wasn't prepared. This was actually something that we discussed in our Cephalocon QA session, which was that it's difficult to run tests that require large numbers of machines or long running times in the lab right now, just because it goes into the queue of things that are trying to lock machines, and if you try to lock more than one, two or three machines, you never win.
F
D
Yes, Sage, you remember? Yeah, I guess so. One idea was larger-scale performance testing; he has been working on a small-scale performance test suite that's in its infancy now. We don't yet have a way to view the results or visualize them across runs, but at some point that's something we want to have, along with getting that kind of data on a larger-scale cluster that's been filled up over several days.
D
F
K
F
F
F
What we've identified is that we don't have any good databasing or analysis solutions, and that's something that teuthology is going to need to grow. But we haven't really found, and no one's really aware of, any other tools that do what we need teuthology for: we still want the way teuthology controls daemons, the way that it can lock machines to do installs and stuff. It just doesn't seem like we gain anything by going elsewhere.
A
Yeah, I think specifically this isn't aging tests, right; this is not a long-running cluster, because we don't have that. We can't afford to have a cluster dedicated to storing data over long periods of time, so this is always going to be constrained to datasets that we can generate, and tests that we can run, over a short period. It feels to me like the key thing is whether it's something that we can categorize as pass/fail, right.
A
So if what we're testing is, like, how well recovery performs, how well performance isolation between clients and recovery is handled, and what the sort of qualitative behavior of the cluster under stressful conditions is, those are things that are very hard to define programmatically; you have to set certain thresholds, and I know that the performance test suite has made a lot of progress there. But I think we should, initially at least, shy away from those types of tests and focus just on things where we can validate larger-scale correctness. Yeah, I agree.
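For the metrics that do survive as pass/fail, the "certain thresholds" mentioned above can be as simple as a gate against a stored per-metric baseline with a generous tolerance, so runs fail loudly instead of needing someone to eyeball every graph. A sketch only; the tolerance and metric names are made up.

```python
def check_against_baseline(results, baselines, tolerance=0.25):
    """Return the metrics that regressed by more than `tolerance`
    (as a fraction) versus their stored baseline. Higher values
    are assumed better (e.g. IOPS, MB/s)."""
    failures = []
    for metric, value in results.items():
        base = baselines.get(metric)
        if base is None:
            continue  # no baseline recorded yet: informational only
        if value < base * (1.0 - tolerance):
            failures.append(metric)
    return failures
```

A wide tolerance like 25% matches the concern voiced here: in a noisy shared lab, only gross regressions can be flagged reliably.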
H
You have to be really careful about how you set up your lab, right. I mean, right now teuthology is in this big, huge, noisy environment where lots and lots of stuff is going on, and it wasn't really designed to run performance tests; none of it was. It was, you know, designed to do functional tests, which are very different. I think that if you want this to actually work well, you need to really think about reducing noise and reducing external influences on things, which is not really just a teuthology problem.
D
F
Yeah, and I mean, some of the performance stuff can sort of be correctness, but a lot of it is just like: okay, we need to make sure that if you have a thousand-shard RGW bucket with a hundred thousand entries per shard, then it actually functions when you do that, because we really don't have any way of making sure that that doesn't break.
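A scale test like the one described boils down to two checks: keys actually spread over the bucket index shards, and a full listing merged from all shards loses nothing. A toy model of the sharding arithmetic (the real RGW shard hash differs; this is only the shape of the test):

```python
import zlib

def shard_of(key, num_shards):
    # Toy stand-in for RGW's bucket-index shard hash.
    return zlib.crc32(key.encode()) % num_shards

def build_index(keys, num_shards):
    """Distribute object keys over per-shard indexes."""
    shards = [set() for _ in range(num_shards)]
    for k in keys:
        shards[shard_of(k, num_shards)].add(k)
    return shards

def list_bucket(shards):
    """A full listing must merge every shard and lose nothing."""
    out = []
    for s in shards:
        out.extend(s)
    return sorted(out)
```

In a real suite the same assertions would run against an actual RGW endpoint after writing the objects, which is exactly the part that needs many machines and time.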
L
Can we get something like a snapshot functionality, right? So you actually fill the cluster up to 60% or 70%, and then you snapshot it at that moment, and from there on you just do different tests and then roll back to those snapshots. I mean, we didn't really come up with a good solution there; we thought of actually LVM-snapshotting each OSD. We just do that, it's a cheap thing, and then you pick the snapshot at the point in time you're thinking about, and then, but yeah.
L
So what I mean is, we can actually barricade a smaller cluster, a small-to-medium one, to carry out these kinds of fills, and then, I mean, if we had a way to take a snapshot of them, if we can come up with something like that, you can always roll back and then rerun a different config or suite on the same cluster, and then we can see whether it really breaks, or what is causing this condition.
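The LVM idea sketched here, snapshot each OSD's logical volume at the fill point, run a test, then merge the snapshot back into the origin, corresponds to a small number of `lvcreate`/`lvconvert` invocations per OSD. Below only the command lines are constructed (volume-group and LV names are hypothetical); actually running them requires root and real logical volumes, and the merge takes effect when the origin is next activated.

```python
def snapshot_cmds(vg, lv, snap_name, size="10G"):
    """Command to take a snapshot of one OSD's logical volume."""
    return ["lvcreate", "--snapshot", "--size", size,
            "--name", snap_name, "%s/%s" % (vg, lv)]

def rollback_cmds(vg, snap_name):
    """Merging the snapshot rolls the origin LV back to its
    state at snapshot time."""
    return ["lvconvert", "--merge", "%s/%s" % (vg, snap_name)]
```

A test harness would stop the OSDs, run the rollback command per volume, and restart them, giving a repeatable 60-70%-full starting state without re-filling the cluster.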
N
So I have a couple of things that I want to mention here. We are trying to address two problems: one is correctness, and one is scaling and how the cluster behaves over time. The framework that we build for one can be used for the other, I feel; it's just the resources that we allocate for each of them that will be different. So, to address the first issue.
N
What we have currently, and what we started out doing, is in my opinion the right approach, because we have now come up with a basic suite where we have a fixed set of workloads, and we are just running this suite on the master branch and only on smithi machines. So it's a very narrow bandwidth.
N
That's what we are going to focus on now, and this will allow us, in maybe a month's time or so, to at least figure out reasonable baselines for the correctness aspect. But yes, for the scale and performance side and the other things, we need to figure out, first, the resources and how we want to deploy those clusters, and that definitely needs the database aspect and a good way to store the results, I think.
H
H
What is the OSD doing when we run this test, and has it changed? Not "has the performance improved or gone down", because there are too many things to account for there, but if we can say we're spending a lot of time in this function, or a lot of time in this other function, and it has changed for some reason, I think that's much more useful to us than saying, you know, the IOPS varied by 10%.
L
Yeah, I mean, you've got a point there, Mark. So I would not count this as a performance test. When we are doing something at 60% or 70% of the load, mostly I would actually want to measure correctness, that it is durable, kind of thing, for me, right; that it will not break anything, more than performance. I would be more cautious about: okay, whether we can really fill it up to 90% and it will still serve my I/Os, or whether something happens at that point in time when it really gets busy doing some background work.
L
F
There's a lot of stuff that sort of ties into performance but that we can measure much more directly as pass/fail criteria. For instance, this would let us look more realistically at what memory usage actually happens: can we just look at a top-level RSS for each OSD in these scenarios and see if it changes over time? And if we don't understand it, then we know that we have a problem, much larger or smaller than we think it is.
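Top-level RSS per OSD is easy to sample on Linux: read `/proc/<pid>/status` and parse its `VmRSS` line, then compare samples over the course of the run. A sketch; the parsing half is exercised below against a canned string, since the live path needs real daemon PIDs.

```python
def parse_vmrss_kb(status_text):
    """Extract VmRSS (resident set size, in kB) from the contents
    of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line looks like: "VmRSS:    123456 kB"
            return int(line.split()[1])
    return None  # kernel threads have no VmRSS line

def rss_of(pid):
    """On a live system: sample one daemon's current RSS in kB."""
    with open("/proc/%d/status" % pid) as f:
        return parse_vmrss_kb(f.read())
```

A watcher that records this per OSD every few minutes gives exactly the "does it change over time, and do we understand why" signal being asked for.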
L
A
F
Yeah, we should make sure that we have enough. Sure, we can do one run and shut it down for however long that one takes, if we can actually use it. But I just want to make sure that it's got enough stuff in it that we actually run it reliably on some sort of regular basis, and once we're going to do that, we probably don't need to use up all 190 machines or whatever on one test.
F
F
A
Cool, alright, I think that's it on the agenda. Oh no, there's one last item, I don't know if that's still relevant: there's another thing that Greg brought up, and I think we talked about this a little bit on the mailing list. I guess it's sort of a couple of related issues.
A
We currently don't have two things. One: we don't have a check, after we build the packages for a release, that they've been successfully published to the repo, and that the repos are valid and usable, that you're not getting an error and that, you know, a dependency isn't missing again. That's one. And the other thing is that we don't have a way currently to say, for this random test branch that maybe is the release candidate or something: build all the packages.
A
So I can run a test suite that just installs them, probably on VMs or whatever machines, with the goal just to make sure that they work; to run basic upgrade tests that make sure that the package dependencies aren't broken, or whatever it is. We used to kind of do that with the ceph-deploy suite, as it would iterate over all the distros it found in the supported directory.
A
But the ceph-deploy testing stuff has gone stale, and ceph-ansible is probably a better choice for that. At some level it feels like we need sort of those two things: being able to build all targets first and just make sure they all build, in sort of the Shaman test environment, and that we're solving it.
A
I don't really know how the package repo publishing process works, if there's a staging step; all I know is that they're really big and take a long time to copy. But it seems like, in an ideal world, you would do all the test builds, you run all the tests, the testing process like we normally do, and then you get to the point where you build the actual release.
D
A
We do hit bugs in package dependencies during upgrade; like, pretty much every time we shuffle the packages around we break it, and the upgrade suite catches that. But we only do that on the distros that we test on, CentOS and Ubuntu. So if the package names or versions are different on Debian, then you wouldn't have noticed. I'm not sure if we've hit one of those yet; it's certainly a possibility. Yeah, yeah.
F
And I think it's usually not that the packages aren't there, but that they're not signed or something, some stuff got missed. So just a basic install would catch all of that. But yeah, it needs to be a thing that can happen without being another giant set of work for the people walking down the release checklist, yeah.
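A publish check of the kind discussed doesn't need to be elaborate: fetch the repo's package index and confirm the expected package names are present at the released version before announcing. The parsing half can be sketched (and tested) against a Debian-style `Packages` index; fetching, and verifying that the packages are actually signed, are left out here.

```python
def missing_packages(packages_index, expected, version):
    """Given the text of a Debian-style Packages file, return the
    expected package names that are absent or at the wrong version."""
    found = {}
    name = None
    for line in packages_index.splitlines():
        if line.startswith("Package:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("Version:") and name:
            found[name] = line.split(":", 1)[1].strip()
    return sorted(p for p in expected if found.get(p) != version)
```

Run against each published distro's index right after the repos sync, this would catch both the missing-package and wrong-version cases without anyone manually walking the checklist.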