From YouTube: Ceph Performance Meeting 2019-01-03
C: I realized about 45 minutes ago that I had about three or four weeks' worth of PRs to get through, and there was absolutely no way I was going to get done by the time we started the meeting, so I didn't even bother trying. Next week I will have that updated. For this week: Sage, is there anything you wanted to mention off the top of your head?
B: No — well, yeah, there was an internal thread about NUMA pinning that I resurrected this morning; I should just move it over to the development list today. The basic idea is to first add the NUMA infrastructure: add helpers so that, for a given block device, we can determine which NUMA node it's attached to. Then, for the ObjectStore back end, look at any non-rotational devices that we're using, look at the NUMA node for all of them, and if they all match, report that as the back end's NUMA node.
B: If they don't match, then don't do anything, and if they are rotational, don't report anything. Report all of that up through the OSD and include it in the OSD metadata that's reported to the monitor. That's the first part.
The second part would be to do the same thing for the NIC: look at the public IP and the back-end cluster IP that we're bound to, look at the NUMA nodes for those, and if they match, report that as the network's NUMA node.
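A minimal sketch of the detection logic being described here, assuming Linux exposes the device's NUMA node via sysfs; the helper names are illustrative and this is not the actual Ceph implementation:

```cpp
// Sketch: read the NUMA node for a block device or NIC from sysfs, and only
// report a backend NUMA node when all (non-rotational) devices agree.
// Note: the exact sysfs path can vary by device type (e.g. NVMe controllers
// may require following an extra symlink).
#include <fstream>
#include <optional>
#include <string>
#include <vector>

// Read an integer like "0" or "-1" from a sysfs attribute; nullopt on failure.
static std::optional<int> read_sysfs_int(const std::string& path) {
  std::ifstream f(path);
  int v;
  if (f >> v) return v;
  return std::nullopt;
}

// NUMA node of a block device, e.g. "nvme0n1"; -1 or missing means unknown.
std::optional<int> block_device_numa_node(const std::string& dev) {
  return read_sysfs_int("/sys/block/" + dev + "/device/numa_node");
}

// NUMA node of a network interface, e.g. "eth0".
std::optional<int> nic_numa_node(const std::string& iface) {
  return read_sysfs_int("/sys/class/net/" + iface + "/device/numa_node");
}

// Report a single NUMA node only if every device agrees; otherwise stay silent.
std::optional<int> common_numa_node(const std::vector<std::string>& devices) {
  std::optional<int> node;
  for (const auto& dev : devices) {
    auto n = block_device_numa_node(dev);
    if (!n || *n < 0) return std::nullopt;        // unknown: report nothing
    if (node && *node != *n) return std::nullopt; // mismatch: report nothing
    node = n;
  }
  return node;
}
```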
B: Sorry — that was the third thing, okay. The OSD would just report the NUMA nodes for the NIC and for the storage, and, if we are pinning the OSD, how it's pinned; if the NUMA nodes don't match, then we wouldn't pin; and then, of course, I guess the fifth thing would be: if they do match, automatically pin.
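A hypothetical sketch of that "auto-pin" step using libnuma (link with -lnuma); this is not the actual Ceph code, just the policy as described — pin only when both NUMA nodes are known and agree:

```cpp
#include <numa.h>
#include <optional>

// Restrict the OSD process to the common NUMA node of its storage and NIC.
bool maybe_pin_to_numa_node(std::optional<int> storage_node,
                            std::optional<int> network_node) {
  if (numa_available() < 0) return false;           // no NUMA support in kernel
  if (!storage_node || !network_node) return false; // unknown: don't pin
  if (*storage_node != *network_node) return false; // mismatch: don't pin
  const int node = *storage_node;
  numa_run_on_node(node);    // bind CPU scheduling to that node
  numa_set_preferred(node);  // prefer memory allocations from that node
  return true;
}
```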
B: And there'd be some host NUMA setting or something to control whether you do that. Basically, the first part would just be to have some visibility, so that if it's clear that the storage is on one NUMA node and it's clear that the network is on one, then we report that so the admin can see it. And if it's unknown — maybe on this host we just don't know what the NIC is, or maybe the storage is ambiguous —
then we could, you know, report negative one or a dash or something, or just leave it blank, whatever. But if we know that it's on one node, then an admin can look at this thing and ask, well, how come my storage isn't reporting a NUMA node? And if they go look and debug the system, they'll notice that their devices span two — say they're using an SSD device and an NVMe journal, so two NVMe devices, but they're on different NUMA nodes.
B: Maybe that's why — whatever, who knows. Or the same thing on the NIC: maybe the NIC isn't reporting a NUMA node, and that's because the front-end and back-end networks are different. Or, more importantly, if they are reported but they don't match, then you know you're either not using the right NIC, or you don't have a NIC attached to the same socket as your NVMes. That would be a problem.
B: Yeah, I think just a chart like this is going to go a long way, because you can just look and see whether they match or don't match. Whether we want to go so far as to raise health alerts if they don't match, I think is a little less clear, because there are going to be a lot of deployed clusters that are totally fine — it's just that the hardware isn't ideal and there's nothing to be done about it, so raising a health warning doesn't really help.
B: So that's why I think a numa-status command — just a command that you can run that tells you what the NUMA status is. Maybe we can surface it on the dashboard at some point: highlight in red the ones that are problematic, in orange the ones that are not ideal, and in green the ones that are fabulous.
B: A system where somebody did split the front and back-end networks is also going to flag stuff, because that just turns out to be the case. I guess the sixth thing would be to rewrite this stupid, out-of-date document that talks about front/back networks, because I think that's just bad advice — maybe it made sense, you know, five years ago, but it's just bad advice.
C: One thing that we haven't really done a good job of over the years, I think, is really focusing on hardware topology, and part of this is because it's impossible — or really hard — to get the vendors to think critically about how the hardware inside the node is really laid out; at least that's what I always found myself. But as we move forward, we probably need to do a better job of getting people thinking about this.
B: There's a command — I was googling for this — called lstopo, which gives you a visualization of what the hardware looks like: where the PCI buses are and all of this stuff. I'm playing with it now and getting pictures out of a GUI — I thought it was a command-line application, but maybe it's actually an X application or something — but it tells you how the L1/L2/L3 caches are laid out, how the NUMA nodes are set up, and so on.
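lstopo comes from the hwloc project, and the same topology data can be pulled programmatically; a small sketch, assuming hwloc 2.x is installed (link with -lhwloc):

```cpp
#include <hwloc.h>
#include <cstdio>

int main() {
  hwloc_topology_t topo;
  hwloc_topology_init(&topo);
  hwloc_topology_load(&topo);

  // Count NUMA nodes, L3 caches, and cores (L3CACHE constant is hwloc >= 2.0).
  int nodes = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_NUMANODE);
  int l3    = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_L3CACHE);
  int cores = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_CORE);
  std::printf("NUMA nodes: %d, L3 caches: %d, cores: %d\n", nodes, l3, cores);

  hwloc_topology_destroy(topo);
  return 0;
}
```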
C: Probably not — well, maybe this time, I don't know — but at some point I still very much would like to talk about what an OSD really means, and what it means in terms of hardware. As we're moving into a world where you have different NUMA nodes and lots of cores and lots of hardware in a box, what is an OSD, and why?
C: Why should it be one device, or should it not be one device — should it be a conglomerate of close-by hardware? I think I am falling into the mindset of thinking that an OSD should be a grouping of close-by hardware — say, whatever is within a NUMA node — and that it then figures out how to divvy up that hardware. But that might very well be wrong.
C: That's just my current thinking, I guess, but maybe this can lead us into the update that you have, Radek, because you were just recently talking about the other path of having an OSD be very simple and potentially single-threaded. So, do you want to talk about what you guys are seeing?
A: I just wanted — maybe not even to talk, just to ask for some input to the discussion we have around sharding in the system. Two approaches were identified. One is to make a very simple OSD: explicitly pin one OSD to one core, consuming one core, having one thread, with everything asynchronous inside. The other approach would be to make the OSD more beefy: able to span multiple cores and do the sharding internally. Both approaches have their own advantages and disadvantages.
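For the first approach, a minimal Seastar-style sketch of "one shard per core, everything asynchronous" may help; the class name OsdShard is purely illustrative and not an actual Crimson class:

```cpp
#include <seastar/core/app-template.hh>
#include <seastar/core/sharded.hh>
#include <seastar/core/future.hh>
#include <seastar/core/smp.hh>
#include <iostream>

class OsdShard {
public:
  // Each shard handles only its own devices and connections; no locks needed.
  seastar::future<> serve() {
    std::cout << "OSD shard " << seastar::this_shard_id() << " running\n";
    return seastar::make_ready_future<>();
  }
  seastar::future<> stop() { return seastar::make_ready_future<>(); }
};

int main(int argc, char** argv) {
  seastar::app_template app;
  seastar::sharded<OsdShard> osd;   // one instance per reactor thread / core
  return app.run(argc, argv, [&osd] {
    return osd.start()              // construct a shard on every core
      .then([&osd] {
        return osd.invoke_on_all([](OsdShard& s) { return s.serve(); });
      })
      .finally([&osd] { return osd.stop(); });
  });
}
```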
A: We had a discussion, and the conclusion is that we don't know how to judge appropriately between those approaches, and I would especially love to ask about the inflation of the OSD map. The biggest identified disadvantage of having a simple OSD spanning only one core is that we would need a lot of OSDs in a cluster to saturate powerful devices like NVMes.
B: I don't know, because the thing is, it's not so much that what we want to pin per NUMA node is a core — I guess, actually, we want literally one core, because at least for the network we want to make sure the TCP connection terminates at that core. If we don't, then we're sharing the network stack across cores and we're sort of losing the whole point of all this. So I'm wondering if, even if we had the
flexibility — we could have an OSD for the entire host, or the entire NUMA node, or a single core; if we could do whatever we wanted — I think we would still possibly want to squash those down really small, because the minute we're sharing network connections across cores, suddenly we have all of the cross-core overhead.
A: One quick question before going further, if I may: aren't we over-focused on kernel networking and kernel things? I guess that the main reason for bringing Seastar in is to just bypass the kernel. Seastar is basically a userspace scheduler; it offers the DPDK-based network stack, and I bet that on powerful deployments we will be going to use SPDK as well.
B: In that case, each instance of the OSD — or each instance of the network endpoint, however we carve it up — is going to be a virtual network function, right? Yeah — which means that the hardware is doing DMA into memory for us and we're using DPDK, so there's no kernel involved. It should scale very well.
B: I think you can have one for every... yes. I mean, they're designed for virtual machines — you can have, like, a thousand of them or something. The idea was to have the hardware passed through directly into a VM, so you get high-performance networking inside the VM; that's, I think, what it was built for, but here we would be making use of it.
B: My understanding is that this hardware capability was developed to support things like virtualization, where you wanted a high-performance VM that was pinned to a CPU with no scheduling overhead and had direct access to the hardware. Now it's just important for things like virtual network functions and whatever.
A: I can't say too much about SPDK or DPDK, but I have had some experience with a third technology that is, I guess, made to be combined with both of the previous ones — I mean QAT. It's for offloading compression and for offloading encryption, and sharing the single device between multiple cores, multiple programs, multiple applications was well designed from the very beginning.
C: One of the things that I'm a little worried about is that it feels like, every time we hope or rely on the kernel or some other piece of software to do something well, we end up regretting the decision in the future — something doesn't work right, and it's not easy to fix, and certainly not easy for us to fix.
C: Maybe you can do really fast lookups inside that node, and it gives you more flexibility about data placement locally. Maybe the problem that CRUSH solves is much more relevant when you're spanning lots of nodes in a big cluster than when you're placing data within some very local NUMA node. Is that worth thinking about? Is that something people are interested in?
B: I think there are two things that work in our favor. One, there's this new NVMe namespace stuff that lets you actually carve up the device — although about the SPDK case I'm not so certain; we should double-check whether you can carve out multiple devices consuming the same SPDK device. With the kernel NVMe driver, I'm not sure — I don't think so.
B: The thing to remember there is that we're talking about intersecting reality in, like, a year, so it's really a question of what the roadmap direction is for the hardware manufacturers: what is the plan — is this a thing that is going to be normal, or is it going to be a weird special case? I think that's the question.
E: They're happy with it as it is; they're not trying to push it forward — it's not that exciting to them. I mean, in the conversations I remember having, they're not trying to push those features forward at all, and it's not going anywhere, and I don't think it's something we want. I think the best case is that if we did this, then each OSD would basically have one of the flash channels, but once you do that it's all super slow — one of the internal NAND channels behind the flash controller, yeah.
D: It feels like — I'm not sure — it feels like, even for machines that are more complex than that, with more devices and more cores, you want to run in that configuration, even if you also want to be able to slim down to some more isolated, dispersed units if someone starts building that.
C: You know, CRUSH gives you pseudo-random distribution, but you might not always want that, right? Potentially in the future you might want some kind of ability to say: this device is in the middle of some local maintenance, or it's busy — I have NVDIMMs, I can make really quick choices about where to put something. Maybe you want to do that internally.
B: I guess what I'm getting at is: I wonder if this is actually going to work in more situations than we think — it might actually work pretty well. I mean, ultimately, if we have a design that lets us use as many cores, or as few cores, as we want — and, you know, either lump everything into one mega-OSD daemon for the whole host, or have one per core, per device, or per slice of a device —
B: I wonder if what we should focus on as sort of phase one is implementing the simple case: it's not going to be completely general, but it's probably going to cover, you know, 70% of these high-performance cases and degrade somewhat gracefully. Then get that working — that will sort of give us the upper bound, because it's the easiest case: it's all in one core, there's no whatever — and then start adding all the complexity.
A: If you have one Seastar thread in an OSD, it's still complex, but quite acceptable. If you have an even slightly beefier OSD — I mean multiple Seastar threads, but not sharing any resources, perfectly sharded, without changing shards in the middle of request processing — you can live with that. But in a situation where you have multiple Seastar threads exchanging data between them, well, that's where the pain starts.
A: Sharing data in Seastar should be done using message passing, but there is absolutely no language or toolset mechanism to prevent you from accidental sharing. And even sharing isn't the whole thing you need to take care of — even allocating memory on the proper core is another pain. Sure, Seastar has a foreign pointer, but it's like a unique pointer — it's not for sharing, yeah.
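A sketch of the message-passing pattern being discussed, assuming Seastar's smp::submit_to and foreign_ptr: instead of sharing a structure between cores, ship the work to the core that owns it, and wrap a pointer in foreign_ptr when it must travel to another shard so its destruction is routed back to the owner. The PgState type is illustrative, not a Crimson class:

```cpp
#include <seastar/core/smp.hh>
#include <seastar/core/sharded.hh>     // foreign_ptr / make_foreign
#include <seastar/core/shared_ptr.hh>  // lw_shared_ptr
#include <seastar/core/future.hh>

struct PgState { int ops_in_flight = 0; };   // owned by exactly one shard

// Ask the owning shard to do the mutation; no locks, no cross-core sharing.
seastar::future<> bump_ops(unsigned owner_shard) {
  return seastar::smp::submit_to(owner_shard, [] {
    // Runs on owner_shard; safe to touch that shard's local PgState here.
  });
}

// Carry a shard-local object across cores without transferring ownership.
seastar::future<> send_elsewhere(unsigned target_shard,
                                 seastar::lw_shared_ptr<PgState> st) {
  auto fp = seastar::make_foreign(std::move(st));  // wrap before crossing cores
  return seastar::smp::submit_to(target_shard,
      [fp = std::move(fp)]() mutable {
        // Read-only use here; when fp is destroyed, the refcount drop is
        // submitted back to the shard that created the object.
      });
}
```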
G: On pointers — Seastar also has two versions of shared pointer: a lightweight shared pointer, which is a reference count on a single core, and the non-lightweight shared pointer. So there is actually a way to share data across cores that's supported by the framework, and in the cases where we need it, it's generally read-only data. If you actually want to do a modification from multiple places, you almost certainly want to just send a message to the core that owns it — immutability.
E: Radek, if you were there — because I think you've been working on the assumption that you have multiple cores per OSD in a Seastar process — were there specific problems you already ran into that prompted this question, or is it just that it would be really cool if we didn't have to do any of that work, because it would be so much simpler? I'm just really, really scared of tying ourselves into that.
A: I guess, well, from the discussion I can find on the devel list, it appears to be connected somehow to the crossbar we would need to have. But I'm not sure that's entirely correct: in the case of a very simple OSD having only one thread, we would not have to implement the crossbar in the OSD, because the crossbar would basically be fulfilled by the current CRUSH infrastructure. Wait — where did this come up?
D: That has always made sense, but I think you're on the right path when you're talking about having a particular OSD — or at least a PG — be a kind of logical task, yeah, single-threaded; but then you have the possibility of having an appropriate number of those allocated on any particular NUMA node, or with affinity to it, and you have the fixed affinity of the PG in CRUSH, but Seastar provides...
B: What concerns me — on the network side I'm not concerned, because you have these VFs that already exist, and you can rearrange your networks and it's all dynamic, so who cares, right? The network side is easy. It's the storage side. And I think the question is, at the end of the day, once we're sort of done with this whole process:
how much is it going to be? Is it going to be one modern CPU core to one modern NVMe, or is it going to be like four cores to four NVMes, or a quarter of a core to one NVMe? What are the bounds going to be? Because currently we spend like eight cores keeping one NVMe busy, but we're paying all this complexity — the code is horribly inefficient. If we actually could do it efficiently, where's
the end point going to be? I can't really tell, because I can see it going both ways. I can see us wanting to push a million IOPS to an SSD, and you kind of want to be able to spend different amounts of CPU depending on whether we're doing erasure coding or not, whatever. On the other hand, I also see that the vendors now want to build these boxes that have, like, 64 twelve-terabyte NVMe sticks in one box.
C: For that case, the answer is easy: cores aren't getting faster, right? CPUs are getting faster in the sense that you're getting more cores, but the cores themselves are not getting faster, while an NVMe drive is a conglomeration of lots of channels with lots of parallelism. So unless you can do what Greg was talking about and start isolating the specific channels of the NVMe...
B: I think that's the question we need to talk to WD and Intel about and find out what the hardware roadmap is — what do they expect is going to happen here. Because I think there is something analogous to this for these NVMe drives — I don't know if it's the namespaces or what — where, if you have 64 channels, each namespace either gets 16 dedicated channels or they get multiplexed in the controller. I don't know if I care about that, or maybe I should care.
C: So Sage, there's one other thing I want to ask you about. Back when I was working on pet store, I had proposed kind of this very simplified OSD model, and you were really against it then. Is it just the ability to do this kind of virtual-function thing — or whatever they call it — where you have direct access to the hardware, that has swayed you in the other direction, or is there something else?
C: It was very, very abstract — get rid of everything and just make a single-threaded, single-core OSD that basically runs an event model. But that was kind of with the assumption that maybe you're just running on a typical file system or something — you don't carve up the device directly using all these new userspace technologies. So that's why I'm wondering what has pushed you — what's your thinking, yeah.
E: You need to email some people then, because we've asked that, and the answer was no, not really — there's an interface that looks like it might do that, but the internal implementation... it's a lie. At least that's what I remember from when we asked about the channels in the past.
E: The other thing, though, is — it's good to question your assumptions, but we have, or at least I've been in, conversations that did touch on this sort of stuff in the past, and I'm really, really nervous about specifically requiring the performance of all of the pieces to align.
E: I think we're going to be in use cases where that's just not true, and this is going to be our only path, so I would like to see very explicitly what we've done so far that makes us think this loss of generality is worth shedding the programming complexity. Because it is simpler, but it makes the primitives a little more complicated — and from what I've seen, and I mean I'm an outside observer...
E: Not because we can't saturate a hard drive really, really easily, but because we're trying to shove more and more hard drives behind a single CPU socket. And it's also not just about the amount of CPU used: if we do one OSD is one core, that means we have, you know, 16 or 32 or 64 OSDs per box, guaranteed — we can't reduce that number at all.
E: It means that we have that many copies of all the live OSD maps in the system, and they're larger, and that's more memory used in what might be a low-memory environment.
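As a purely illustrative back-of-envelope (all three numbers below are assumptions, not measurements), the map memory cost scales multiplicatively with the per-host OSD count:

```latex
\text{map memory per host} \;\approx\; N_{\mathrm{OSDs}} \times N_{\mathrm{cached\ epochs}} \times S_{\mathrm{map}}
\;=\; 64 \times 50 \times 1\,\mathrm{MB} \;\approx\; 3.2\,\mathrm{GB}.
```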
B: Regardless of what the top-line hero number is, and what the right balance is for that with the fastest hardware in every category, the reality is that the hardware our users are going to use is going to vary wildly, right? So the question in my mind is — I'm wondering if we should think about the most extreme case being these machines that are just packed with NVMes...
E: The real part of it I do agree with, yes: we use a lot of memory right now, and the increases we've had in memory use are causing trouble, not just for homebrew lab users but for some of our commercial users.
D: Back in the day when memory was really expensive, NFS got wins by using UDP, because it avoided head-of-line blocking and it avoided all the overhead of early TCP stacks. I don't know how long it will take us to get back there — head-of-line blocking is going to be significant — but hardwiring the decision that we're going to have one network stack per OSD seems odd, given the way CRUSH works.
B: I'm worried about, you know, five years from now, when you have a 256-core system and suddenly 256 OSDs on one box — how big do your maps get?
D: I don't know if Emerson is on the call, but back when we talked about this, we could conceive of this problem — we couldn't quite see the same hardware in this specific way — but we assumed that it could be solved by sharding or fragmenting the maps and isolating the relevant portions of the maps to subdomains.
G: Sure. The thing that I'm unclear about is that, in terms of pure complexity, the crossbar seems like less of an issue than stuff like PG split/merge, or even a shared OSD cache. The crossbar is just: we allocate memory on one shard at the beginning of a request, and then at the end, when we finish the request, we have to send a message to deallocate it. Deallocating memory by sending a message isn't the bad part — it's all the other stuff that I'd worry more about.
B: Yeah, no, you're totally right — split and merge are a huge headache and more of a concern. But I think it's not so much about the allocation and deallocation, at least with my limited understanding; it's that the messenger reads the message off the network, probably does a CRC check, and then hands it off to another core, so it's no longer in the L1 cache when that other core does any additional processing or sends it back out over the network.
B: Does it tie our hands in other cases, and whatever. Also, those folks probably know something about the hardware virtualization capabilities — or whatever the right term is — of the SSDs. I can reach out to Shara separately and see what Intel's take on that is. It feels like that's sort of the next conversation to have.
F: Yes — I'll leave it up to you guys. I'd like to invite you guys to the next Crimson core meeting, and in the mail I'll summarize some of the problems we discussed this week: we got into the complexity introduced by the crossbar and the messenger, and some of the questions and assumptions regarding how to shard the underlying storage and how we make use of it with multiple OSDs, I think.
C: I'd also like to have Alfredo talk about what it means in terms of the complexity of having to divvy up hardware for different OSDs — or whatever you want to call them — with static allocation, saying this portion of an LVM device gets given to this small OSD and that other one to another, versus the model where you hand the resources associated with a NUMA node to this OSD conglomerate thing and it figures it out.
B: I think the good news is that all of this is focused on pure NVMe devices, and 95% of the complexity in the stuff that Alfredo has been working on is all the other crap, where you're carving a device up into 12 pieces and sharing it among a bunch of hard drives and whatever. I think a lot of that becomes less of an issue.
B: As for the Optane stuff, I'm hoping that either you're just going to have Optane devices or not have Optane devices, and you're not going to have, like, an NVMe plus a piece of an Optane; and in the case where we're doing persistent memory, it's going to be an actual DIMM form factor, so it'll be a little bit easier to manage. I'm hoping — but I don't know; who knows, there are a lot of questions.
B: But we might not necessarily want to do a hybrid OSD, right? For hard disks, the hybrid OSD really made sense because the flash was so much faster than the HDD, and it was so much easier to get good performance out of the hard disk with a little bit of flash. I'm not sure the same holds true with quote-unquote "slow" NVMe versus Optane — but maybe, I don't know.
A: There's also the possibility of employing the Seastar I/O scheduler — a userspace I/O scheduler. You can do a lot more with scheduling I/O, prioritizing particular kinds of I/O operations. That's a very good advantage for the beefy design.
F: Specifically, it allows us to share the device by assigning different tasks a share. For example, if one task should have a bigger share, we can assign it a number like 1217; if another part should have a smaller share, we can assign a number like two. It's just a nice way.
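A toy sketch of what those proportional shares mean — this is not Seastar's actual I/O priority API, just an illustration of the weighting: a class with shares=1217 gets dispatched roughly 600x as often as one with shares=2.

```cpp
#include <cstdio>
#include <vector>

struct IoClass { const char* name; unsigned shares; double vtime = 0; };

// Pick the class with the smallest virtual time, then advance it by 1/shares,
// so dispatch frequency ends up proportional to the configured share.
IoClass& next_to_dispatch(std::vector<IoClass>& classes) {
  IoClass* best = &classes.front();
  for (auto& c : classes)
    if (c.vtime < best->vtime) best = &c;
  best->vtime += 1.0 / best->shares;
  return *best;
}

int main() {
  std::vector<IoClass> classes = {{"client-io", 1217}, {"scrub", 2}};
  unsigned counts[2] = {0, 0};
  for (int i = 0; i < 100000; ++i) {
    IoClass& c = next_to_dispatch(classes);
    ++counts[&c == &classes[0] ? 0 : 1];
  }
  std::printf("client-io: %u dispatches, scrub: %u dispatches\n",
              counts[0], counts[1]);
  return 0;
}
```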