From YouTube: Ceph Performance Meeting 2022-09-15
Description
Join us weekly for the Ceph Performance meeting: https://ceph.io/en/community/meetups
Ceph website: https://ceph.io
Ceph blog: https://ceph.io/en/news/blog/
Contribute to Ceph: https://ceph.io/en/developers/contribute/
What is Ceph: https://ceph.io/en/discover/
A: So I don't have anything prepared for this morning, but Adam and I have been spending a lot of time this week talking about how shared blobs work in BlueStore, so I imagine we'll probably discuss that quite a bit today.

A: Maybe before we get into that, though, I'll open it up for other people: are there any topics that people would like to bring up today? Sorry.

A: All right! Well then, all of this comes out of the RBD mirroring performance issue that was seen, which we've been talking about quite a bit.
A: The gist of it is that the work I had been doing to see if I could improve the performance of iterating over extents kind of worked. Casey, your idea regarding the move constructor was good; it did work in the end, or at least it somewhat worked, at the very least enough that I was able to run normal benchmarks fairly fast. In fact, for normal BlueStore benchmarking it looked like...

A: Maybe we even got something like a five percent CPU usage reduction with at least similar performance levels, so that's a win, but I ended up with really irritating, strange issues with reference counting in other parts of the code that Adam and I sat down and tried to debug and theorize about, and it turns out that it didn't look like this was really doing a whole lot for the issue that we were hitting with snapshots, so that kind of got put on hold and maybe abandoned; we'll see.
A: But we ended up looking at the ref map inside SharedBlob, and Adam, I'll let you talk about everything that you've discovered. I'll just say I did go through and try to replace that; I did actually successfully go through and replace it with a flat map, got all the iterator validation out, and that runs fine now, seemingly. The downside is that the benefit is pretty small. I've got the numbers in a spreadsheet.

A: You can kind of tell that it's helping a little bit: the yellow line, or the yellow peaks, are the ones using a flat map for the ref map versus the original std::map. That std::map exists for every single shared blob, and we have a lot of them per object in this kind of workload, and that ref map is basically just storing, like, eight bytes of data; it's ridiculous.
A
It's
tiny,
there's
lots
and
lots
of
those
maps,
but
that's
again
starting
to
get
into
the
stuff
that
adam
discovered
so
adam.
Do
you
want
to
talk
about
the
things
you
were
working
on
and
what
you
saw.
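(A minimal sketch of the kind of swap being discussed here, not the actual Ceph patch; the per-shared-blob ref map is an ordered offset-to-refcount container, and for the handful of entries it usually holds, a sorted-vector map such as boost::container::flat_map can stand in for std::map with the same interface. All names below are illustrative.)

    #include <boost/container/flat_map.hpp>
    #include <cstdint>
    #include <map>

    // Per-shared-blob ref map: disk offset -> reference count. A std::map
    // allocates one tree node per entry; for the few entries these maps
    // typically hold, a flat (sorted-vector) map keeps everything in one
    // small contiguous buffer while exposing the same lookup interface.
    using ref_map_old_t = std::map<uint64_t, uint32_t>;                    // current shape
    using ref_map_new_t = boost::container::flat_map<uint64_t, uint32_t>;  // the swap discussed

    inline uint32_t lookup(const ref_map_new_t& m, uint64_t offset) {
      auto it = m.find(offset);
      return it == m.end() ? 0 : it->second;
    }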
B: Yeah, sure. I'm just a bit out of focus with the browser; I tried to find my PR, maybe.

B: It revolves around earlier work that I did about just printing onode metadata internals, because I knew what the structure of the data is, but how it's actually being used was unknown to me. So I made an example, a tool that allows me to print the exact metadata as it encodes objects.
B: We can clearly observe what happens when we modify an object that is shared, meaning we have some, let's say, already shared object. A shared object here means that we already did a snapshot of it, so the blobs that are in the object are already shared. So when we write to a head object that has a snapshot...

B: If you think about it as a one-time operation, it makes perfect sense; that's exactly what you should do. But the test we were tackling was that we had an object that we modified and had already made many snapshots of: we made one snapshot, wrote some data, made another snapshot, and so on. The effect is that we have new blobs that basically cover only the data modified in a specific period between snapshots, just a sequence of those shared blobs.
B: That still is not so bad, but then we delete. When we are still required to keep all the snapshots, then we have to somehow maintain the data, and that might be the proper way to do it. But in the test we delete the snapshots, and that looks like a reasonable real-life case.

B: I will share my results from a simulator. Basically, the effect is that we can get rid of those additional blobs and keep the number of shared blobs at something like five when you have five copies of an object, for a reasonable intensity of random writes.
B: Basically, you might not know, but what we currently have as a shared blob entity in the X column in RocksDB does not track the blob at all. The shared blob is basically only a tracker of how many times a specific region of the disk was used by a shared blob. And in addition to that, each object that's referencing the shared blob has its own entire encoding of the blob.
B
That's
it
and
for
right
cases,
when
you
modify
object,
you
modify
your
local
blob
and
then
you
notify
shared
blob
that
some
references
might
have
been
unused
and,
of
course,
if
ref
counter
goes
down,
the
else
case
is
not
important
at
that
moment,
so
we
are
trying
to
maybe
simplify
the
case
here.
By
attempting
I
don't
know,
maybe
I
mean
various
ideas
I
had
today,
one
that
is
either
stupid
or
ingenious
to
let
allocator
disc,
allocate
or
actively
count
references.
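(To make the tracking role concrete, here is a minimal sketch of an offset-range reference tracker of the kind being described. It is written from scratch for illustration, not taken from the actual BlueStore code, and the names and the 4 KiB allocation unit are assumptions: get() bumps the count for a disk region, put() drops it and reports regions whose count reached zero so a caller could release them to the allocator.)

    #include <cstdint>
    #include <map>
    #include <utility>
    #include <vector>

    // Illustrative shared-region tracker: disk offset -> reference count,
    // kept at allocation-unit granularity for simplicity.
    struct shared_ref_tracker {
      static constexpr uint64_t au = 4096;          // assumed allocation unit
      std::map<uint64_t, uint32_t> refs;            // offset -> refcount

      void get(uint64_t offset, uint64_t length) {  // a clone/snapshot adds a ref
        for (uint64_t o = offset; o < offset + length; o += au)
          ++refs[o];
      }

      // Drop a ref; append regions that became unreferenced (and could be
      // returned to the allocator) to 'released'.
      void put(uint64_t offset, uint64_t length,
               std::vector<std::pair<uint64_t, uint64_t>>& released) {
        for (uint64_t o = offset; o < offset + length; o += au) {
          auto it = refs.find(o);
          if (it != refs.end() && --it->second == 0) {
            refs.erase(it);
            released.emplace_back(o, au);
          }
        }
      }
    };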
B: So that would be the case where an object is no longer referencing part of a shared blob: it will just release the data and be done with it, and the allocator will behave appropriately. So yep, that's it, and now let me find and copy the results of the simulator showing how the number of shared blobs changes.
A: Well, Adam's doing that, Igor.

A: Some of the things we've also talked about are whether or not we could, well, I suppose, if we removed compression from the blob, whether that would allow us to not keep track of the byte granularity anymore, and whether we could do it at the min_alloc size instead. You probably know this code better than anyone, I think; what are your thoughts on all of this?
E: I don't remember exactly which cases rely on it, because it worked, and so things started getting built; a couple of things were built on top of it.

E: What I know of is the fscrypt integration in CephFS, and I think that's been adjusted to be less fussy now, because we discovered that some OSD configurations behave differently than we thought, but in particular the min_alloc size is a user-configurable thing, and it has in the past been, like, 64 kilobytes, and it definitely needs to be finer granularity than that.
B: Those are links to a simple simulator of how many blobs we will use if we allow ourselves to integrate newly written local blobs into some shared blobs that we already own in an object. That, of course, relates to the process of making the extent map dup, so the snapshot moment.
B: The sequence is... the name says how many, like 16 AUs; it's how many allocation units are allowed per blob. And the sequence, you can read it like this: there is a sequence of iterations, and first there is a part that simulates writes, and here we see a wave sign that marks where we write a new element; then there is a snapshot, and when the snapshot happens, all the newly written data... the spaces mean empty holes.
B: So when we do the snapshots here, snapshot A... the later letters say what the shape of our shared blob is and which segments it represents, and we can see in the dump of blobs that each region of the shared blob now has two references, and it goes on and on, and when we try to... we would need to go some way down.

A: Adam, is it worth sharing your screen, or...?
B: Okay, and for other people, that's the first link of the four I shared. The simulator sequence is like this: it goes in 50 iterations. Each iteration first tries to write some random data to the object, basically 16 times the allocation unit in size; that's all that's interesting, and in between there is a...
B: ...there's a write phase and a snapshot phase. You can see that the places where we put some data in the first write are now converted to a shared blob in both objects: object H, which is the head, and object 1, which is snapshot number one, and there is a dump of the content of the shared blobs.
B: When reuse is not in effect... that's the same simulation, but when reuse was not enforced. You can see in lines 18 to 20 that a new blob appeared, so in the case where previously we could reuse some space of the same blob, now we had to create an entirely new blob, and that's how we do it now; that's the current way.
A: Agreed, Adam, I don't think we can guess at it; there are too many moving parts here. I mean, the fact that we'll have more extents in the ref map per shared blob means that the characteristics are going to change dramatically compared to what they are now.
B: We even end up with fewer blobs than with a smaller blob size. That's actually a predictable outcome, since if there is a larger blob size, there is more ability to find some blob that will still be matching. Oh, I forgot to say one limitation: the algorithm only tries to reuse blobs that were already used in the encoding of the current object, so if there are some blobs that are not part of the actually-used set, they are skipped in the attempts to find space in those blobs. Yep, guys.
A: So Adam, would it be a good time to talk about the profiling results that we looked at? Is that useful?

A: Sorry this is taking so long; this is not working very well.

A: Oh okay, I'm going to give up on this for now; it has permission problems as well, apparently. So anyway, I guess you can look at it in the chat window there. Okay, so...
A: There's an improvement, a very slight improvement in places, but there's a lot of other stuff going on in here too: definitely management of intrusive pointer references, there's some encoding and decoding in here, creation of OldExtents. Alt range is taking up some time, opening shared blobs; there's decode, blob decode. The gist of it is there's a lot going on in here; I mean, it's not just one thing, it's just the sheer quantity of extents and shared blobs and other data structures that are involved when we have a lot of fragmentation across the snapshots.
A: There's just a lot going on. So if we need the byte-level granularity, I don't know; I guess if there are ways that we can reduce the quantity of data structures that we're dealing with, that would probably help us. Short of that, it's going to just take a lot of optimization in a lot of different areas, a lot of really small gains that maybe add up to something, I don't know, but that's what it looks like to me now.
B: I think we should simplify the data structures we have in BlueStore now, specifically because some of the cases we use them in don't really make much sense. For example, we correlate the buffer cache with the shared blob; that gives us the ability, if we read one head object, to then read a snapshot object.

B: The idea is also, as Mark already looked at recently, that the blob ref map type is a full map that basically, in most cases, contains only four elements, so again, simplifying that would be a huge benefit, and so on.
A: If I remember right, in SharedBlob the buffer space, was that a pointer to the BufferSpace that contains the map of objects, or the cache?
B: It keeps objects, it keeps intrusive pointers to buffers, yeah, and then there's the relationship to the buffer space cache, which is at the collection level, I guess, so the cache can follow all the data.
B: If we attempt to do that, we could separate shared blobs, or possibly-shared blobs, from blobs that are basically regular blobs, because right now the only way to cache data... I mean, we cache data in a regular blob by creating a view of it as a shared blob, and then we add it to the cache, yeah.
B: The place where we can see that we now assume it's always present is the read cache, which iterates over extents in objects, and for all that are present we just assume that the shared blob does exist, with just the reference, and there are no ifs.
A: I wonder, in this pathological case that we're looking at right now, if we'd really end up with very many fewer blobs. At that point, though, maybe we'd have some for a while, right? We wouldn't need it for any regular blobs, but eventually, like, everything becomes a real shared blob, right?
B: I'm thinking that maybe we should be extra careful not to optimize for some pathological case, like basically the one that we're analyzing now with Paul's testing. Well, not over-simplify for such a case, I mean.
B: I guess I was trying to deduce from the code and from old PRs, thanks Mark, by the way, and maybe some documentation if I found it, what the actual goal of the current architecture of shared blobs was, because I definitely don't want to cut down some functionality that isn't implemented yet but that we should implement one day, and on the other hand, I would love to cut down anything that we will not practically use and is just a leftover burden from a past era.
A: It kind of... during the transition from NewStore to BlueStore, we changed a lot of stuff really fast. I don't really remember what our thought process was.
F: Something like that, I mean, well, maybe.
A: I don't... Adam, I don't know what you'd think; do you think that the behavior with, like, CRCs that you saw would be classified as a bug or as a design detail?
B: I found out recently that when we encode CRCs for blobs that have, like, a 32-kilobyte hole in front and then four kilobytes of actual data, then when we encode the CRCs, we basically dump into metadata 8 entries for that 32 kilobytes and then one entry for the actual data, and that was a very severe metadata consumption spot for that Paul's test, when we suddenly get a lot of shared blobs containing just one allocation-unit modification.
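(A small worked example of the overhead being described, with assumed values: a 4 KiB checksum chunk and a blob spanning a 32 KiB hole plus 4 KiB of real data. The numbers are for illustration only.)

    #include <cstdint>
    #include <iostream>

    int main() {
      constexpr uint64_t csum_chunk = 4 * 1024;     // assumed csum granularity
      constexpr uint64_t hole       = 32 * 1024;    // unwritten space in front of the data
      constexpr uint64_t data       = 4 * 1024;     // actual modified data
      constexpr uint64_t blob_span  = hole + data;  // checksummed length of the blob
      std::cout << "csum entries stored: " << blob_span / csum_chunk            // 9
                << " (only " << data / csum_chunk << " covers real data)\n";    // 1
      return 0;
    }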
A: It seems to me like that's the kind of thing that we should probably try to fix in the current code, but maybe some of these big ideas regarding drastically changing things, maybe that would be better to do more isolated, especially for changing the on-disk format, I guess. I don't...
B: Yes and no; I was thinking about just a software modification, I mean without an on-disk format change, that would basically move the front of the blob to a different position. But I was pretty scared that that's never going to happen in the current code.
B: Well, I don't think so. If I can cut a blob and move the position, then I can also extend a blob by moving its starting position, so that seems to still hold. I mean, the only thing that I agree should be preserved is that I shouldn't chop my blob up; like, let's assume I have three allocation units' worth of modifications in one blob, spread over a bit. Then I should not basically create three different blobs, one per allocation unit, because that would be inefficient.
G: A suggestion: did we consider the option of not using snaps for RBD mirroring? I mean, doing the mirroring by creating a snap every 15 minutes and deleting them is an extremely expensive operation. I mean...
G: So you could do a very simple thing; like, I'm not saying it's the best solution, just something from, like, five minutes of thinking: while writing, you could duplicate the write. Everything that you want to do could be written to some kind of cyclic log buffer, and then in the background you could push that thing.
G: Like, yeah, a journal. So every write you want to do to the primary you do normally, and then you also send the same write to the journal, and if we want to be nice, we could even make a special command for the OSD, so the OSD would split the write into the journal, and then the journal will have a checkpoint, and you have some background process on the OSD itself sending them to the remote. So you don't need the client, the RBD, to do the mirroring, because why should you read it?
G: And so what you have at the moment is that the RBD client is reading from the OSD and then sending it to the other one. So if the OSD knows that it has to do the snap, it could use this journal and then push everything to the remote when it has the time, and it has a checkpoint, and it can have a point in time. I mean, that's just something which, I know, it's like...
G: Probably... I didn't do any smart calculation; there are, like, gazillions of holes in the design, but if we spent two weeks designing it, I'm sure you could find some different solution. Sometimes you do solutions because they were straightforward to design and they worked, but over time you realize it wasn't the best solution; at the time it was good to do it because it gave you a chance to do things quickly, but eventually you go back and say, you know what, let's refactor this thing and do it differently.
G: So you can send the write to the OSD with a bit saying this thing should also be mirrored somewhere else, and then the OSD would write to one place, keep something in a journal, and there's going to be a background process on the OSD pushing things from the journal to the remote, whatever.
G: I didn't hear you, Igor, so I didn't get what you're saying; your microphone...
F: I mean, from your words it sounds like there is some similarity to the PG log stuff.
A: I will say that, in general, we've gotten feedback about snapshots being slow and problematic beyond RBD mirror; there is some general sense... this isn't in isolation. I think people are suffering when they end up with a lot of shared blobs.
G: I don't know, some extra space, some... it's got to be more efficient, but it's going to hit the same roadblock; you're just going to get to the roadblock later. So it's not going to take you two hours to get there, or 12 hours; it's going to be 24 hours. But if you keep running it, and RBD mirror is just running continuously, we've seen that after 12 hours this thing is bad. So let's assume that Adam and Igor are going to make things much, much better.
A: ...cannot do this; we hit a saturation point. Basically, once you hit, like, the same number of shared blobs as extents. So say you've got a four-megabyte object and you do this pattern that we see with random writes: eventually you end up with almost 1024 shared blobs with one extent in them each, essentially. Not quite, it's close, but you know, it's roughly that; that's where we hit saturation, it looks like.
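(That saturation figure follows from simple arithmetic, assuming a 4 MiB RBD object and a 4 KiB allocation unit; the snippet below is just that calculation spelled out.)

    #include <cstdint>
    #include <iostream>

    int main() {
      constexpr uint64_t object_size = 4ull * 1024 * 1024;  // 4 MiB RBD object
      constexpr uint64_t min_alloc   = 4 * 1024;            // assumed 4 KiB allocation unit
      // Worst case after long-running random writes across snapshots:
      // one single-extent shared blob per allocation unit of the object.
      std::cout << object_size / min_alloc << " shared blobs per object\n";  // 1024
      return 0;
    }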
A: I think it's bounded; I don't think that we continue to just use more and more CPU forever. Once we get to the point where we have as many shared blobs as we can possibly have, or close to it, that's when we end up basically, you know, hitting the limit, but it causes a lot of disruption for sure: CPU usage, a lot of work during dup.
A: And, off the top of my head, I had the opposite thought earlier today: I was thinking, well, if we're going to end up at the maximum level anyway, what if we just optimized for the idea that you always have lots and lots of shared blobs?

A: Well, I was even thinking, if you just assume that you have basically a shared blob for every extent, can you make certain things simpler at that point?
G: It's just going to mean that it's like... if you decide that all the cars can never go faster than 10 miles per hour, then you're not going to see so many... you are going to be slow, yeah, but you're not going to be surprised by "oh, today I had bad traffic", because you're going to have bad traffic every day.
A: And yeah, I mean, I'm admitting I'm being kind of ridiculous here, but my thought process was, well, if you knew that this was the case, then could you make the stuff sitting inside the shared blob and the ref map maybe not so heavy anymore, if this was just kind of the way it is? But it's probably, like you said, Gabby, maybe just being a little ridiculous, I don't know.
G: And Adam, do you think the change that you guys are suggesting is going to eliminate this problem, or is it going to just give us more breathing room, but eventually you're going to hit the same problem if we keep doing the same procedure? So RBD mirror is issuing a new snap every 15 minutes and is continuously doing random writes; are you going to eventually get the same problem, or is it just not going to happen, because you have a solution which guarantees that this thing cannot happen?
G: Okay, so you're going to get a 4x improvement, which is very, very significant, but is it enough for our case? Because Paul was showing me a 24x slowdown, so now we're going to be at a 6x slowdown?
G: I mean, it's a huge improvement; I'm not saying don't do it, I'd say definitely do that. But what about the 6x slowdown, is that acceptable? And again, if you got a 4x improvement, then nothing should stop you from doing that, you must do that, but why not a real solution with the journaling, which in Paul's case should probably be only 2x slower?
A
Adam
I
want
I
wanted
to
ask
you
before
we
get
into
the
performance
specifically.
Does
your
change
actually
reduce
the
upper
bound
on
the
number
of
shared
blobs
that
you
would
end
up
with
per
per
oh
node.
B: Theoretically, yes, it does, because in the reuse case there should be at most as many shared blobs per default-sized shared blob region as there are copies of the object, no more, because if you have more, then you should be able to find space in a shared blob that you can fit your new data into.
B: The upper boundary would be the number of shared blobs per object, meaning four megs divided by 64k, times the number of snapshots we have live, but that's the upper bound, not the expected value.
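(Spelling that bound out with the numbers from the discussion, 4 MiB RBD objects and 64 KiB shared-blob regions; the snapshot count is just an example value.)

    #include <cstdint>
    #include <iostream>

    int main() {
      constexpr uint64_t object_size    = 4ull * 1024 * 1024;  // 4 MiB
      constexpr uint64_t region_size    = 64 * 1024;           // 64 KiB shared-blob region
      constexpr uint64_t live_snapshots = 5;                   // example value
      uint64_t regions = object_size / region_size;            // 64
      std::cout << "upper bound: " << regions * live_snapshots
                << " shared blobs per object\n";               // 320 for 5 live snapshots
      return 0;
    }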
A: So I think the upper bound is technically still the same if you had unlimited... like, as you're approaching an infinite number of...
A: So Gabby, getting into what you were saying with performance, I guess this change is valuable inversely to the number of snapshots that you're keeping.
A
So
the
hope
is
that
if
you
only
have
like
one
or
two
snapshots,
you
can
have
significantly
fewer
shared
blogs
than
you
can
now,
but
as
the
number
of
snapshots
increases
and
approaches
infinity,
the
closer
you
get
to
having
the
maximum
theoretical
limit
of
of
shared
blobs
per
object
that
you
you
did
previously
so
right
now,
after
like
12
hours
right,
we
we
hit
close
to
10
24
sharp
blobs
per
object.
It's
not.
A: But it's pretty close; it's around a thousand, whereas with Adam's change you would have a lower upper boundary at a low number of snapshots, but as the number of snapshots increases, you get closer to that, like, 1024-limit upper boundary that we have now.
B: So in our current situation, we had writes in different places and we get more and more shared blobs throughout the span of the entire object, but with the change that is trying to reuse space in an already-used shared blob, once all the variants are the same, you would just have one shared blob that was snapshotted six times with nothing changed in it; then it should be basically the same, it should revert back to the original variant, it will no longer be fragmented.
A: And right now it's like we don't even have a soft limit, right? As we go, we just fragment up to the maximum, whereas with yours, if I understand correctly, you would have a soft limit below that hard limit that you wouldn't go beyond, but it would increase as the number of snapshots increases.
A: It will make the get and put calls inside the ref map more expensive, because we do... well, for the current implementation with a map, actually it'd only be the lower_bound that would be slower, because insert should basically be the same cost.
B: I mean, historically I could imagine how it worked, because, remember, we started from blobs being, like, 512k, and if you think in those sizes, then having one shared blob that encompasses an entire region really makes sense: you trade so many allocation units for one region of the object that says "this part is shared and it's in multiple objects", and we don't want to iterate over so many elements.
B: That was me answering your question of why we do have shared blobs for that usage.
G: Sure, probably, if 1024 shared blobs per object is a reasonable thing to have. If it's reasonable, then definitely go for shared extents; you gain nothing from the blobs. If it's a very crazy synthetic benchmark, then maybe shared blobs make sense; I don't know that they do, but it depends how many shared blobs you have per object: if it's 1024, then it's definitely a bad idea; if it's two, three, four, then yeah, sure, it makes sense.
B: Nice, but we cannot do that currently, since we do not know who the owner of the other references is. You have your object, and you see that you have a lot of shared blobs in your metadata, but you have no idea who else keeps the other references, so if you do anything with your object, you will still have no way to optimize the others.
G
If
I'm
going
to
add
my
memory
table
with
active,
let's
put
it
just
for
the
once
I'm
doing
fermentation,
then
you
can
always
check
yourself
in
this
table.
If
you're,
not
there,
then
do
whatever
you
want.
If
you
are
there,
then
you
need
something
and
before
you
do
anything
else,
you
must
put
yourself
to
kept
to
to
allocate
an
entry
in
the
hash
table
and
then
you
don't
need
persistency
or
anything.
G: I'm saying, if you wish to do something on a shared extent, any modification, you need to grab a lock, and the lock could be held in a global hash table. You query the hash table, and if it's there, then you cannot take it, or maybe you try to grab the lock if the lock is free; if it's not there, you're going to create the entry and have it locked, and then anybody who wants to do anything which requires modification of a shared extent has to go through this.
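(A minimal sketch of the in-memory lock registry Gabi is describing, written for illustration only; the key type, names and locking policy are all assumptions, not an existing Ceph structure.)

    #include <cstdint>
    #include <mutex>
    #include <unordered_set>

    // Global registry of shared extents currently being worked on
    // (e.g. defragmented). Nothing is persisted: an entry exists only
    // while someone holds the "lock" on that extent.
    class shared_extent_locks {
      std::mutex m;
      std::unordered_set<uint64_t> busy;   // key: shared blob / extent id (assumed)
    public:
      // Returns true if the caller now owns the extent, false if someone else does.
      bool try_lock(uint64_t id) {
        std::lock_guard<std::mutex> g(m);
        return busy.insert(id).second;
      }
      void unlock(uint64_t id) {
        std::lock_guard<std::mutex> g(m);
        busy.erase(id);
      }
    };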
G: Now, it's not necessarily a very good idea; it requires that all the active shared extents are held in a memory table, but then this thing is doable.
B: Well, my approach to even trying to be able to do such defragmentation was to somehow assign a shared blob namespace to a single object. I mean, when you create an object, you get a shared blob namespace, and then all the shared blobs that you create for this object and for all of the snapshots of this object will belong to the same shared blob namespace.
B
That
way,
you
will
have
a
limited
set
of
possible
actors
that
do
play
a
role
in
your
shared
blobs,
and
even
if
that
was
like
a
hundred
objects,
you
could
still
know
that
they
shared
the
same
id
namespace
id
with
you
that
that
way
you
could
optimize.
But
that's
that
was
only
me
thinking
not
even
being
close
to
try
to
implement
it.
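(A sketch of how the per-object namespace idea might look at the key level, purely illustrative; the field names and key layout are assumptions, not the current on-disk format.)

    #include <cstdint>
    #include <tuple>

    // Idea: every head object gets a shared-blob namespace id at creation,
    // and every shared blob created for that object or any of its snapshots
    // is keyed under that namespace. All possible co-owners of a shared blob
    // then share one namespace id, which bounds who has to be considered
    // when trying to defragment or drop references.
    struct shared_blob_key {
      uint64_t namespace_id;   // assigned per head object (assumption)
      uint64_t blob_id;        // unique within the namespace
      bool operator<(const shared_blob_key& o) const {
        return std::tie(namespace_id, blob_id) < std::tie(o.namespace_id, o.blob_id);
      }
    };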
A: I have to say, as we talk about this, I really hate how, when we take a snapshot, it's like we have all these references to the thing that is, like, the real object, right? I almost want that snapshot to be an immutable thing, and what's live maybe can reference some of it and maybe doesn't, you know, like if it's changed, but I want to be able to make the decision about...
A
Like
that,
we
keep
up
all
of
these
references
to
like
little
blobs
of
things,
and
we
have
to
have
them
grow
over
time,
because
we
can
keep
making
modifications
to
the
new
thing,
like
the
new
thing
then
makes
it
so
that
now
we
have
like
everything
divvied
up
in
weird
ways,
because
we
have
this
old
snapshot
that
has
all
you
know
different
data,
it's
like
it
feels
like.
A
Instead,
we
should
have
like
this
solid
base
that
then,
when
we
create
like
when
we
modify
an
object
or
change
something
new,
then
you
know
we
maybe
have
some
old
portion
of
the
data
that
we
can
reference,
but
maybe
we
don't
maybe
we
actually
create
a
whole
new
copy
of
it
and
instead,
rather
than
you
know,
trying
to
pick
at
little
bits
of
the
the
the
old
one.
Does
that
does
that
make
sense?
B: You still have to remember what you had, and the new ones need to know what's modified. But you don't want to have a head object being an update of some frozen object; you want a head object to be actually real and just, like, make a backward difference for the other object. But can it be done? Like, you set some omap data for your head object; you might think, okay, I set new omap data for the head object, but I have to note what the previous object had set.
A: Okay, so, Adam, here's the problem I'm thinking of: say I've got a four-megabyte object, and once I get to the point where I've got a hundred shared blobs for this thing, I don't want it to care anymore about referencing the old thing; I just want to make a new copy that, then, is, you know, one big extent.
B: Sacrificing space for performance and readability, okay.
A: I'm confused, Adam: when you have one shared blob and you do a snapshot, what...
B: This joins two things. One is the shared blob tracking that we use to count how many times we use specific disk offsets in a shared blob.
B: That's one thing, and the other is that we use SharedBlob inside the Blob implementation just for data keeping, and I only want to talk about the shared blob that is tracking disk usage, that part. The type is called bluestore ref map t, or something like that, yeah. And my thinking is: I will have only one such shared blob per object; when I allocate, I put it in, I mean, maybe only when I do the first snapshot... let's assume it's not.
B: When I snapshot, I just add one to what I already have in my current head, and when I delete some object... of course the snapshot inherits that shared reference to that tracker, and when the object is deleted, it just removes itself from the tracker, and then it works the same: if the tracker goes to zero for some region, it means the allocation unit is to be released by the transaction that did that.
D: I mean, internally this shared tracker would be pretty similar to the current shared blob, so it should keep a sort of mapping between offset and reference count, and it should...
B: I agree that possibly having... actually, I deliberately started talking about one to make it simpler, but I think one will not work. It should be somehow segmented into various offsets of the object, but it has to be fixed, so all the clones will know what to access when they reference or dereference disk usage.
A: It was like this fixed thing, with fixed-size regions that were compressed, and then you just had, you know, the ability to use it, so like, say, 64k or 128k or whatever, and then when you took a snapshot, the new version of your object could or could not choose to use one of those depending on whether or not it's changed, but it's not like it is now, where it divvies things up ever smaller; it's just a fixed size.
A
No
sorry
just
forget
the
compression
part
you
just
have
fixed
size
regions
of
the
original
data.
That's
like
yours,
your
shared
blog,
but
it's
when
you,
when
you
make
a
snapshot
for
what
you
now
have
as
your
your
current
live
object
or
whatever
you
want
to
call
it.
I
guess
it
can
or
cannot
use
those.
But
it's
not
like
you
divvy
them
up
smaller.
B: Yes, because all the usages I've seen and all the unit tests I've seen always have the same... the destination offset is the same as the source, and that makes perfect sense for all snapshots and also for cloning regular objects. To move the offset you would actually have to do some... I don't even know what you can achieve if you clone an object but move the offset.
A: Sounds good; thank you guys for the discussion. Hopefully we make progress here.