From YouTube: CDS Jewel -- Sloppy Reads
A
B
Yeah, there has been some interest lately in, well, as always, reducing latency on reads. Beyond the things we talked about before, there are some architectural choices that make it difficult to guarantee latency. For example, if we can't be sure that we have seen every write that has been committed to the client, we have to block reads until OSDs come up that let us do that.
B
That would be the period where you're stuck in peering, or inactive, or down, and that's sort of a bummer. That's just a consequence of the consistency model we have. There are use cases where it might be valid not to do that. For example, if you have a use case where every new object is created with a unique name and is never recreated and never modified, then if you find anything with that name, you can be sure that it's the right thing. There's no possibility...
B
You can be sure that you haven't got a bad copy. So in a situation like that, it might be worth it to allow the client to read from a replica, or from an OSD that happens to have a copy of the PG but is not currently active for that PG. If the replica returns, you know, ENOENT, "I don't have a copy," then the client can simply retry later on a different OSD.
B
But if it gets a success and was actually able to read the object, then it knows it's good to go. So I guess what the discussion should be about is: one, do any of you have use cases where this would be useful? Two, how do we want to make the interface, hypothetically, for something like this work? I think it would be prohibited on, say, an RBD pool, because there's no way to do that correctly. I can't think of a valid reason to perform this sort of read on an RBD pool.
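The read path described above (ask any OSD that might hold a copy, move on when it answers "no copy," accept the first success) can be sketched roughly as follows. This is only an illustration under the stated assumption that objects have unique names and are never modified or recreated. `FakeOSD` and `sloppy_read` are hypothetical names, not Ceph or librados APIs.

```python
import errno
from typing import Dict, Iterable, Optional


class FakeOSD:
    """Hypothetical stand-in for an OSD holding some objects of a PG."""

    def __init__(self, name: str, objects: Dict[str, bytes]):
        self.name = name
        self.objects = objects

    def sloppy_read(self, oid: str) -> bytes:
        # Serve whatever is stored locally, without checking PG state.
        if oid not in self.objects:
            raise OSError(errno.ENOENT, f"{self.name}: no copy of {oid}")
        return self.objects[oid]


def sloppy_read(osds: Iterable[FakeOSD], oid: str) -> Optional[bytes]:
    """Try each candidate OSD in turn and accept the first successful read.

    Only safe when any copy found is necessarily the right one, i.e. the
    object is written once under a unique name and never changed.
    """
    for osd in osds:
        try:
            return osd.sloppy_read(oid)
        except OSError as e:
            if e.errno != errno.ENOENT:
                raise
            # This OSD has no copy; fall through and try the next one.
    return None  # no OSD had a copy; the caller may retry later
```

A `None` result here only means "not found on the OSDs tried," which is why the client retries rather than treating it as a definitive answer.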
B
B
C
B
D
Right, if you want to enable this optimization for existing pools, you'd like to be able to do that instead of having to create a new pool type. Or if you don't know about this yet and then want to use it later, it would be nice not to have to copy all your data over to a new pool to do that.
D
B
B
D
B
B
D
C
D
C
C
B
B
The read request is for that snapshot, but the head object would be present. That would imply... I'm trying to decide whether it's possible for you to be missing an I/O that changed the state of the object before the snap, because if you're reading from head, necessarily that object on that OSD hasn't seen that snapshot yet. So you don't actually know for sure that there wasn't another I/O that happened between the object you're looking at and the snapshot that was requested.
B
B
That's the problem. That's why we can't tell. If you were halfway through flushing the writes and then things changed, the old replica would have an object at whatever version and would then receive a read at a particular snapshot that was in the future of its current snap context on that object. It wouldn't be able to tell the difference between that case and the case where it had fully flushed out all the writes.
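The ambiguity described here can be made concrete with a toy model. The sketch below is an assumption-laden illustration, not Ceph's actual snapshot machinery: the event tuples, `ReplicaState`, and both helper functions are hypothetical. It shows two histories that leave a stale replica in exactly the same local state while the correct answer for the requested snapshot differs.

```python
from dataclasses import dataclass
from typing import List, Tuple

Event = Tuple[str, int]  # ("write", version) or ("snap", snap_id)


@dataclass(frozen=True)
class ReplicaState:
    version: int   # last write applied to the head object
    snap_seq: int  # newest snapshot this replica has seen for the object


def apply_events(events: List[Event]) -> ReplicaState:
    """Replay the events a replica has received and return its local state."""
    version, snap_seq = 0, 0
    for kind, value in events:
        if kind == "write":
            version = value
        elif kind == "snap":
            snap_seq = value
    return ReplicaState(version, snap_seq)


def version_at_snap(history: List[Event], snap_id: int) -> int:
    """Return the object version that snapshot `snap_id` should expose."""
    version = 0
    for kind, value in history:
        if kind == "write":
            version = value
        elif kind == "snap" and value == snap_id:
            return version
    raise KeyError(snap_id)


# Case 1: the replica has every write that preceded snapshot 2.
full_flushed = [("write", 1), ("snap", 1), ("write", 2), ("snap", 2)]
# Case 2: write 3 landed before snapshot 2, but the replica never saw it.
full_missed = [("write", 1), ("snap", 1), ("write", 2), ("write", 3), ("snap", 2)]

# In both cases the stale replica received only the first three events.
state_flushed = apply_events(full_flushed[:3])
state_missed = apply_events(full_missed[:3])
```

Here `state_flushed == state_missed`, yet `version_at_snap` gives 2 for the first history and 3 for the second, so the replica has no local way to know whether serving its head object for snapshot 2 is correct.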
D
In general, you also don't want to seal them every time, because you potentially do have people taking snapshots and also continuing to write to the image. You could say you're sealing this particular snapshot, perhaps as part of taking that snapshot. That's also not great, because then you're basically making the snapshot O(n) in the size of the image, instead of...
D
D
D
B
E
So I have a general question. This would work only for replicas, right? When you're replicating, I mean, not for erasure coding.
B
This could be extended to work with erasure coding. There's no inherent reason why you couldn't do this with an erasure-coded pool. What you'd do instead is, of the OSDs you know about, you would pick some subset such that it covers enough of the shards that you can do a reconstruction. Then you'd send your read messages, assuming you got back versions that were consistent across them, and you can get back a version number from the read.
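The client-side steps just described (pick a subset of known OSDs that covers enough shards, read from them, and check that the returned versions agree) might look something like the sketch below. It assumes a code where any `k` distinct shards are enough to reconstruct, and it leaves out the reconstruction itself, which would need the erasure-code plugin. The function names and the reply shape are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def pick_covering_subset(known: Dict[str, int], k: int) -> Optional[List[str]]:
    """Pick OSDs that together hold k distinct shards.

    `known` maps an OSD name to the shard index it holds. Returns None when
    the known OSDs do not cover k distinct shards.
    """
    chosen: Dict[int, str] = {}
    for osd, shard in known.items():
        chosen.setdefault(shard, osd)  # keep the first OSD seen per shard
        if len(chosen) == k:
            return list(chosen.values())
    return None


def versions_consistent(replies: Dict[str, Tuple[int, bytes]]) -> bool:
    """True when every shard reply carries the same object version.

    `replies` maps an OSD name to (version, shard_data).
    """
    return len({version for version, _ in replies.values()}) == 1
```

If the versions disagree, the client cannot safely reconstruct from that set of shards and would have to pick a different subset or fall back to a normal read.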
B
The tricky part, and why we don't support replica reads on erasure-coded pools yet, is that we have a plug-in system for the erasure coding libraries. The monitors and the OSDs have the plugins locally, but clients don't necessarily. So it would be a choice to make that sort of thing available to the client, and we'd have to make some choices about how that would be configured. It might make sense to have RGW do it. It might not make sense to have librbd do it, if that makes sense.
E
Yes, yep, thank you. And I have another, kind of related, one. Would it make sense to create a more generic consistency model? I mean, to support multiple consistency levels, not just a few, and in a generic way. So you could say: okay, I want to have read-my-writes, or eventual consistency, or strong consistency, or monotonic reads, and things like that, instead of just...
B
So usually the closest people get to that is allowing you to specify the number, or what percentage, of the quorum group you need to write to. That doesn't generalize well, and it doesn't actually give you the properties that Cassandra, for example, claims it does. If you lose enough OSDs in a situation like that, you could still get an inconsistent read even though you've read from all of the quorum group. So I guess the short answer is I have yet to see a generic system that has that much expressiveness.
B
The idea with this one is that we can get it relatively simply. All we really need to do is disable some checks and change the semantics of it in RADOS. That's not so bad. The next step would be to create an actual AP pool type, which does something more like what Cassandra does. That would also be an exciting project, but much more work. You'd have to create an entirely new PG type.
B
You wouldn't be able to reuse our recovery mechanisms, or rather the existing recovery mechanisms, because you wouldn't have a log. You wouldn't be able to reuse the existing peering mechanisms because they wouldn't be applicable, and they'd be doing things that you didn't want them to do anyway.
B
E
I mean, there's work that has been done by Microsoft Research where they have this general framework where you're able to specify, on a client session, the consistency level that you want to observe, and then the system provides that for you. But I understand your point. I guess what you're saying is that there's a lot of the consistency...
B
Also, well, for instance,
F
With our pools, it doesn't make sense for a replica to choose weaker write semantics. It's not possible, and it's not so much because of that; it's because we have a read-after-write guarantee, a strongly consistent read-after-write guarantee. So it doesn't make sense for a client to do a write with anything less than that consistency, because they can't be sure that other clients haven't requested...
E
C
C
D
B
We don't need special support for that. All we need to be able to do is support an object class with a guard on your next set of... Yeah, it might make sense to add support for it simply so that there's a canonical way of doing it, and we could return a nice error if you try to perform an overwrite. But aside from that...
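The guard mentioned here, where a write to an already-sealed object is refused with a clear error, could behave roughly like the in-memory model below. This is a sketch of the semantics only. `SealedStore` is a hypothetical name, and the choice of `EEXIST` as the "nice error" is an assumption, not something stated in the discussion or taken from an existing Ceph object class.

```python
import errno
from typing import Dict


class SealedStore:
    """Hypothetical in-memory model of a write-once ("sealed") object guard."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def create(self, oid: str, data: bytes) -> None:
        # The guard: refuse any write to a name that already exists.
        if oid in self._objects:
            raise OSError(errno.EEXIST, f"{oid} is sealed; overwrite refused")
        self._objects[oid] = bytes(data)

    def read(self, oid: str) -> bytes:
        if oid not in self._objects:
            raise OSError(errno.ENOENT, f"no such object: {oid}")
        return self._objects[oid]
```

With a guard like this in place, any copy of an object that a sloppy read finds is the only version that ever existed, which is the property the relaxed read relies on.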
D
B
Well, they don't really need to do the sealing; it's just that you don't get useful results back in that case. OK, so the easy one is where you create an object, fully seal it, and then never modify it. If you're actively modifying an object, the only way this kind of read makes sense is if you genuinely don't care what version you get back. That would be odd. I can't offhand think of a reason to do that, but that's probably a lack of imagination on my part.