From YouTube: Ceph Code Walkthrough 2020-07-14: Dmclock
A: Good morning, everybody, what's up. Ready, set, go. Good crowd here. So welcome to the code walkthrough for dmclock, an algorithm for distributed quality of service. Eric, if you want to take it away.
B: Sure. So this was an effort by me a few years ago to try and improve the fairness within Ceph and the OSDs, and it was the focus of my work for quite a while, but I actually probably haven't touched the code in a couple of years at this point. So this morning I've been going through reviewing it, trying to remember how things were organized and why we did certain things certain ways. More recently, other people have taken on dmclock: within RGW I think both Casey Bodley and Abhishek, who I think is also on this call, have done quite a bit of work with dmclock, and Sam Just has reworked quite a bit of it on the core side as well. So they, especially Sam, might have stronger knowledge at this point of how things are organized, but I'm happy to go through and share things. Anyway, let's go from there.
B: All right, are people seeing my shared screen?
B: So everything kind of comes back to here. Now, over time we did find areas where things weren't working as we would expect, so we've made a few deviations from the algorithm as described here, but for the most part it follows it pretty religiously.
B: The second thing to realize is that dmclock was written entirely as a library that could be incorporated in a variety of contexts. The library itself is simply some header files, which are heavily templatized, and some other code which actually does get compiled, but because of the templates you can use it in a variety of circumstances. It's not written for Ceph specifically; in fact it has its own GitHub repository.
B
It's
a
sub
module
of
ceph
and
other
people
could
actually
use
it
for
entirely
different
projects
all
together,
so
the
paper
is
available
online
and
it
and
for
those
of
you
looking
at
the
code,
it
might
help
explain
certain
things.
B
If
certain
questions
do
arise,
let
me
switch
to
a
different
window
now,
all
right
now
the
paper
describes
the
m
clock
algorithm
and
then
a
variation
of
it
called
the
dm
clock
algorithm
which
stands
for
distributed,
distributed
m
clock
and
the
idea
is
that
you
have
multiple
servers
that
can
handle
various
requests
and
multiple
clients
as
well.
So
this
is
kind
of
looking
at
distributed
clock
and
for
any
given
request
a
client
might
direct.
It
need
one
of
a
number
of
servers.
B
Now
this
is
very
nicely
in
the
saf
world,
because
the
servers
are
basically
osds
and
you
can
imagine
clients,
you
know
figure
out
which
osd
gets
a
given
request
and
sending
it
to
the
appropriate
osd,
but
we
want
to
ensure
fairness
among
all
clients,
no
matter
which
server
that
which
server
or
servers
they're
they're
sending
requests
to,
and
so
one
of
the
nice
features
of
the
dm
clock.
B
Algorithm
is
that
the
servers
are
not
sharing
information
among
themselves
about
the
clients,
the
servers
are
simply
getting
requests
servicing
the
requests
and
the
responses
back
to
the
clients,
it's
the
job
of
the
clients
to
keep
track,
of
which
servers
have
successfully
handled
requests
for
them
and
keep
track
of
those
internally,
and
that
way,
every
request
to
a
new
server
can
can
give
like
a
a
little
bit
of
knowledge,
a
little
bit
of
history
of
what
kind
of
service
that
client
has
received
from
other
servers
and
that
allows
the
server
getting
this
request
to
weight
things
appropriately.
B: So if this client has been getting a lot of requests serviced by server one, and the next request goes to server two, server two will say: okay, your needs have been met in the recent history because of server one, and therefore your request might not be rated as highly as some of the other clients'. So this distributed aspect happens strictly on the client side: with every request the clients simply report back the level of service they've achieved recently. That's kind of an interesting thing about how dmclock is organized.
B: The other thing is that the term client can be used relatively loosely. The typical environment is that we actually have different clients, people wanting to use Ceph for various purposes, and each of them gets to send some requests to the server.
B
But
when
this
was
originally
used
within
ceph,
clients
was
used
far
more
broadly.
In
that
we
wanted
to
be
able
to
adjust.
B
How
servers
were
handling
requests
from
actual
clients,
but
also
handling
background
processes,
and
I
remember
what
the
names
of
the
background
processes
but
different
kinds
of
cleanup
processes
and
whatnot
that
have
to
occur
in
order
for
a
sephluster
to
run
smoothly,
and
so
we
have
we,
we
had
one.
One
client
was
specifically
four
clients
themselves,
and
then
we
had
different
other
clients
used
for
different
background
processes,
so
we
could
give
those
lower
weight
or
reserve
fewer
slots
for
them.
We
want
those
background
processes
to
run
and
accomplish
their
goals.
B
We
want
to
balance
those
against
requests
from
actual
clients,
and
so
there
literally
was
just
one
of
these
clients
was
literally
for
clients
all
the
other
clients,
all
the
ceph
clients,
and
then
other
clients
were
for
specific
background
processes,
and
this
is
one
area
of
code.
That's
changed
quite
a
bit
since
I
last
touched.
I
was
just
looking
at
the
code.
B
Realizing
I
didn't
recognize
it,
and
sam
just
has
completely
rewritten
all
of
this
code
here,
we'll
take
a
quick
glance
at
it
during
the
code
walkthrough,
but
he's
adjusted
things
quite
a
bit
from
where
I
had
last
left
things.
B: Okay, I don't know... people can ask questions whenever it's most convenient for them. Maybe I'll just ask: are there any questions at this point?
B: Okay, let me see if I can help organize things here a little further. The server has this idea of clients. Each client has a queue of requests that have arrived for the client; again, we're using the term client very loosely, but if you just want to understand the algorithm itself, it probably makes sense to think of clients as true clients here. So you'll see that we have n different clients here, and each client has a queue of requests.
B: The first request in each queue is here on the left and is highlighted in red. Now, each request has a tag that goes with it, and the tag includes its reservation, its limit and its weight. Actually, we assign these by client, as opposed to every single request, so we don't keep repeating the information each time, and that helps determine which request is handled next by a given server.
B
So
one
server
has
has
all
these
clients
has
these
requests
lined
up,
and
it's
looking
only
at
the
first
request
for
each
client
in
order
to
determine
which
thing
which
requests
the
service
next
so
for
any
given
client,
the
requests
are
handled
sequentially
all
right.
The
only
question
is
which
client
gets
this
request
handled
next,
and
so
it's
looking
at
the
first
item
in
each
of
those
requests.
B
So
at
this
point
I
don't
know
if
this
is
as
organized
as
I
could
make
it.
Let
me
actually
start
looking
at
some
code
here.
B
Hopefully,
so
let
me
share
a
different
window,
so,
as
you
can
see
the
dm
clocks
source
code
sub
module
within
ceph
here's,
the
top
of
the
seth
tree,
then
we
go
to
source
and
we
have
dm
clock
and
then
we
have
its
own
source
directory
from
there
right
and
if
we
look
in
this
directory,
actually
let
me
pop
up
one
level.
First,
at
the
dm
clock
level,
we
have
some
files
for
making
things
and
whatnot.
B
We
also
have
a
source
directory
which
has
the
source
code
for
dm
clock
itself.
We
have
support,
which
has
which
persistently
specifically
has
one
data
structure
which
is
key
to
how
the
server
side
is
implemented.
It's
called
an
indirect
intrusive
heap,
so
we'll
take
a
look
at
that.
B
It,
of
course,
has
some
testing
code
and
then,
when
I
was
developing
it,
I
wanted
to
make
sure
it
was
running
as
I
was
expecting.
So
it's
actually
a
little
bit
of
a
simulator
in
here
as
well
to
actually
simulate
a
bunch
of
servers
and
clients
and
using
the
algorithm
itself
to
see
how
the
requests
are
serviced
and
so
forth.
B
Then
there
is
a
header
file
for
some
records,
some
structures
and
it's
actually
relatively
small,
but
these
are
structures
that
are
shared
between
both
the
client
and
the
server
side,
so
only
to
be
defined
once
there's
also
some
utility
code
as
well,
that
can
be
used
by
either
the
client
or
the
server
and
just
take
a
quick
look
at
the
rex
file.
B
It's
actually
not
that
big,
that's
the
entire
size
of
the
file
here
and
basically
defines
a
counter
type,
a
phase
type,
whether
a
given
request
is
handled
during
the
reservation
phase
or
the
priority
phase
and
the
request
params.
These
are
params
that
come
in
with
each
request
that
contain
these
values
called
row
and
delta.
Again,
these
were
not
again.
These
are
the
values
that
each
client
sends
in
with
each
request,
which
describe
the
level
of
service.
B
That
client
has
received
from
other
servers
and
that
allows
a
given
server
with
a
given
request
to
appropriately
prioritize
this
request.
So
if
it's
been
getting
a
lot
of
service
from
other
servers,
it's
not
going
to
get
us.
The
request
is
going
to
be
lower
than
the
priority
as
compared
to
other
requests
from
other
clients.
B
Okay,
so
that's
really
all
that
that
contains,
and
actually
the
the
numbers
or
the
values
that
are
contained
within
these
are
delta
and
rho
delta
is
how
many
replies
client
has
received
since
its
most
recent
request.
A
B
So
it's
just
simply
saying
you
know:
I've
received
five
reply,
since
my
most
recent
request
because
requests
go
out
and
then
there's
a
delay
before
the
replies
come
back
in
and
then
row
is
how
many
of
those
requests
were
handled
during
the
reservation
phase.
Again,
there
are
two
phases
that
dm
clock
runs
at
it
for
reservation,
means
kind
of
the
minimal
level
of
quality
of
service
that
we're
trying
to
give
to
a
given
client,
and
so
we
first
try
and
achieve
the
minimum.
B
The
reservation
for
every
client
once
every
client
has
achieved
its
reservation.
We
then
switch
to
our
priority
phase,
where
we
use
a
weight
value
to
determine
which
request
is
serviced
next,
right
and
so
row
simply
counts
of
those
recent
requests.
B
How
many
were
handled
during
the
reservation
phase
and
then
simply
by
subtraction,
you
can
determine
how
many
were
handled
by
the
priority
phase,
so
those
is
simply
two
extra
integers
that
are
sent
in
with
each
request
that
determine
that
that
describe
the
level
of
service
a
given
client
has
received
from
other
servers.
Okay,
now
the
client
code
is
a
lot
simpler.
So
why
don't
we
start
there.
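To make that bookkeeping concrete, here is a minimal sketch of what those piggy-backed parameters amount to. This is illustrative only, not the library's exact definitions (those live in the dmclock headers):

```cpp
#include <cstdint>

// Illustrative sketch of the per-request parameters discussed above.
struct ReqParams {
  // replies this client has received (from any server) since it sent its
  // previous request
  uint32_t delta = 0;
  // of those replies, how many were serviced during the reservation phase;
  // the priority-phase count is recovered by subtraction: delta - rho
  uint32_t rho = 0;
};

// Which phase a given request was ultimately serviced in.
enum class PhaseType { reservation, priority };
```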
B: For people who want to use dmclock, keeping track of rho and delta is a little bit of a chore in itself, so there's actually a data structure designed to do that, and it's called a tracker. Two trackers are provided in this source file.
B
One
is
called
or
ridge
tracker,
which
is
the
original
tracker,
as
described
in
the
dm
clock
paper
implemented
as
closely
to
that,
as
we
could
figure
out
at
the
time
all
right,
and
then
there
was
an
alternative
that
was
developed
as
well
called
the
borrowing
tracker,
okay
and-
and
I'm
not
actually
sure
I'll-
have
to
look
at
the
code
to
see.
B: There were open-source contributors, I should say, in Asia who were really interested in dmclock and provided a bunch of feedback, doing analyses on how it was operating and so forth, and there were long discussions about how the original tracker was intended to work. I believe the borrowing tracker is no longer used, because some tweaks were made to the original tracker, maybe as the original paper wanted them to be handled.
B
But
there
was
some
ambiguity
in
interpreting
the
paper,
but
I
think
the
original
tracker
is
the
only
one
that's
being
used
at
this
particular
point
in
time
and,
as
you
can
see
it,
it
has
a
function
here
called
prepare
request
that
that
returns,
the
request
parameters
with
the
delta
in
a
row
and
then
calculates
the
next
delta
and
rows
as
well.
And
then,
when
a
response
comes
in,
I
remember
exactly
how
this
works
here.
B: Okay, coming down below here, we have this thing called the service tracker, which takes one of those two trackers as the template parameter T, and by default it uses the original tracker. Then we need some way to identify each of the different servers, and we don't want to bake this in, because we want dmclock to be as flexible a library as possible.
B
So,
however,
we
track
servers,
it
could
be
strings,
it
could
be
integers,
it
could
be
what
that
is
the
first
parameter
to
the
template
all
right.
It
keeps
track
of
a
variety
of
things
here,
including
all
the
various
servers
and
there
and
there's
different
requests
going
out
to
them
and
and
then
basically,
as
I
can
find
it
here,
when
we
have
a
request
ready,
we
call
this
get
request.
B: So those can be sent back in, and then, when the reply comes back, the client will call this track-response function, specifying the server ID, which phase the request was handled in and what the cost of the request was, and it then uses that to update the information about that particular server and so forth, so the deltas and rhos can be tracked appropriately.
B
So
for
the
client
there's
just
a
little
extra
effort
involved
in
calling
one
function
when
a
request
is
ready
to
be
sent
to
the
server
and
then
another
function
to
call
when
the
response
comes
back
from
the
server
now
at
some
point,
we
introduced
this
idea
of
cost
into
this,
and
that
is
because
not
all
requests
have
the
same
cost.
B
So
some
read,
requests
might
ask
for
1k,
some
might
ask
for
16k
and
those
might
have
represent
different
costs
to
the
server
write.
Requests
might
be
more
costly
to
the
server
than
read
requests
and
then
there
may
be
thing
requests
that
are
actually
composed
of
multiple
sub
operations
and
so
forth,
and
those
requests
might
be
more
expensive
than
other
requests.
B
We
send
that
back
into
this
track
response
function,
so
that
the
deltas
and
rows
can
be
updated
appropriately,
saying
that
we
essentially
got
five
requests
handled
with
this
one
request,
because
it
was
such
an
expensive
request
all
right,
so
it
gets
a
little
bit
complicated
here,
but
for
the
client
you
just
simply
create
a
service
tracker,
and
then
you
call
these
two
functions
with
each
request
and
each
reply
respectively
to
track
things
on
the
client
side.
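As a rough sketch of the client-side flow just described, here is a toy tracker with the two calls in question. The names and the single aggregate counter are simplifications; the real service tracker is templated on the server-id type and a tracker policy and keeps its counts per server, so each server only hears about service it did not itself provide.

```cpp
#include <cstdint>

enum class PhaseType { reservation, priority };
struct ReqParams { uint32_t delta = 0; uint32_t rho = 0; };

// Toy stand-in for the client-side tracker described above (illustrative only).
class ToyServiceTracker {
  uint32_t delta_ = 0; // replies received since the last request was prepared
  uint32_t rho_ = 0;   // of those, replies serviced in the reservation phase
public:
  // Call just before sending a request: report the service received since the
  // previous request, then reset the counters.
  ReqParams prepare_request() {
    ReqParams p{delta_, rho_};
    delta_ = 0;
    rho_ = 0;
    return p;
  }

  // Call when a reply comes back. The cost lets one expensive request count
  // as several ordinary ones, as discussed above.
  void track_response(PhaseType phase, uint32_t cost = 1) {
    delta_ += cost;
    if (phase == PhaseType::reservation) {
      rho_ += cost;
    }
  }
};
```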
B
All
right,
this
is
definitely
the
more
complicated
aspect
of
the
algorithm
here.
B
And
it's
all
in
a
header
file,
because
templates
are
being
used
to
create
as
much
flexibility
as
possible
for
those
who
want
to
use
this
library,
yes
I'll,
just
kind
of
come
down.
Look
for
the
interesting
things
here,
so
for
every
client.
We
have
to
keep
track
of
certain
information
about
the
client.
B
What
its
reservation
is
again
that
represents
the
minimum
amount
of
service
we're
trying
to
give
to
that
client,
the
weight
and
the
and
the
weight
is
used
once
the
minimum
service
is
achieved.
We
then
use
the
weight
to
determine
which
client
gets
its
request
next
handled
and
then
there's
also
a
limit.
B
We
can
put
a
cap
on
the
level
of
service,
we
give
a
specific
client
as
well,
and
so
that
again
only
comes
during
the
the
weight
phase,
the
proportional
phase
of
this,
and
that
only
clients
who
have
not
reached
their
limit
are
considered
when
we
choose
the
next
request
via
the
weight
or
the
proportion.
B
Unfortunately,
I
think
the
code
uses
both
way
and
proportion
and
they're
essentially
equivalent,
but
unfortunately
different
parts
of
the
code
use
different
terms.
That's
could
be
corrected
here
all
right.
B
We
also
keep
track
of
the
inversion
of
all
of
those,
so
one
over
the
reservation
is
the
reservation
in
inverse
one
over
the
weight
is
the
weighted,
verse
and
so
forth.
We
also
allow
a
zero
to
be
given
for
the
reservation,
the
weight
or
the
limit,
which
means
that
we
don't
care
all
right.
So
if
we
give
a
zero
in
for
the
reservation,
that
means
there
is
no
reservation.
B
There
is
no
minimum
level
of
service,
and
only
when
the
reservations
for
all
of
our
clients
have
been
met,
will
we
then
have
our
weight
will
be
used
to
determine
if
we
get
some
service
as
well
weight
if
the
weight
is
zero?
That
means
that
we're
only
going
to
handle
things
based
on
the
reservation
and
if
the
limit
is
zero,
that
means
there's
no
limit
for
this
particular
client.
There's
no
cap
on
it,
so
each
of
those
could
be
zero
and
because
they
can
be
zero.
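A minimal sketch of the per-client QoS record being described, with zero meaning "unspecified" as just explained. The field names are illustrative; the library's actual struct may differ in detail:

```cpp
// Illustrative per-client QoS settings; 0.0 means "no constraint" for a field.
struct ClientInfo {
  double reservation; // minimum service rate we try to guarantee
  double weight;      // proportional share used once reservations are met
  double limit;       // cap on service rate, enforced in the ready phase

  // The inverses are kept too, since the tag formulas divide by these values;
  // a zero ("don't care") must not be divided by.
  double reservation_inv;
  double weight_inv;
  double limit_inv;

  ClientInfo(double r, double w, double l)
    : reservation(r), weight(w), limit(l),
      reservation_inv(r > 0.0 ? 1.0 / r : 0.0),
      weight_inv(w > 0.0 ? 1.0 / w : 0.0),
      limit_inv(l > 0.0 ? 1.0 / l : 0.0) {}
};
```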
B: The request tag, as you can see, contains the reservation, the proportion and the limit, and also the delta and rho that came in with the request, the cost as it was calculated, and whether we have not yet hit the limit. If we've hit the limit, then this is not "ready". Again, this is used during the weight phase of dmclock.
B: Okay, there's code here to calculate the request tags.
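For reference, the tag updates in the dmClock paper are, to my reading, roughly the following, where $\rho_i$ and $\delta_i$ are the counters piggy-backed on the $i$-th request, $r$, $w$ and $l$ are the client's reservation, weight and limit, and $t$ is the arrival time (worth checking against the code, since the implementation deviates from the paper in places):

$$R_i = \max\left(R_{i-1} + \frac{\rho_i}{r},\, t\right), \qquad P_i = \max\left(P_{i-1} + \frac{\delta_i}{w},\, t\right), \qquad L_i = \max\left(L_{i-1} + \frac{\delta_i}{l},\, t\right)$$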
B
All
right,
and
then
we
start
getting
to
the
meat
of
the
algorithm
here
with
this
templatized
class
called
priority.
Q
base
all
right,
and
there
are
a
number
of
template
parameters
that
we
can
use
to
tune
this
to
for
various
purposes.
Okay,
so
c
is
the
identifier
for
clients.
Again,
we
don't
want
to
specify
that
a
priori,
so
this
could
be
a
string.
It
could
be
an
integer,
it
could
be
any.
It
could
be
a
structure
that
identifies
the
various
clients
we're
using
our
determines.
B
What
the
structure
is
that
the
requests
are
comprised
of
that
we
have
to
keep
track
of
in
our.
Our
queue
somehow
is
delayed
was
an
optimization,
I
believe,
added
by
casey
bodley,
which
is
that
when
do
we
calculate
the
tag
for
a
given
request
now,
originally
every
tag
was
actually
I'm
not
sure
it
was
casey.
B
It
might
have
been
one
of
the
contributors
in
asia
who
contributed
this,
but
basically
originally
the
tag
was
calculated
when
the
request
came
in
it
would
use
the
most
recent
row
and
delta
for
that
client
that
it
knew
of
at
that
time.
B
This
optimization,
where
we
delay
the
tab
calculation,
actually
makes
a
dm
clock
work
better
better.
Rather
in
that
we
calculate
the
tag
once
a
request
achieves
the
head
of
the
queue
for
that
particular
client,
and
that
allows
us
to
use
the
most
recent
row
in
the
most
recent
delta
to
calculate
the
tag
allowing
it
to
use
the
most
recent
information
which
allows
the
the
algorithm
to
be
as
fair
as
possible.
B
Then
we
have
the
b
parameter,
there's
a
heat,
that's
being
used
to
actually
multiple
heaps.
There's
three
heaps
being
used
to
track
all
the
different
clients
and
with
the
heat
data
structure
you
can
use
a
fan
out
factor
by
default
is
typically
two.
B
We
had
one
student,
a
graduate
student
who
spent
a
summer
looking
to
see
whether
we
could
optimize
the
fan
out
for
the
heat,
and
I
believe
that
he
ultimately
determined
that
three
might
have
been
slightly
better
than
two,
but
it
was
two
or
three
you
can
specify
that
here.
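For anyone unfamiliar with the fan-out parameter: in a d-ary heap stored in an array, the children of node i sit at positions d*i+1 through d*i+d, so a fan-out of 2 is an ordinary binary heap and 3 gives a shallower tree at the price of more comparisons per level. A tiny illustration of the index arithmetic:

```cpp
#include <cstddef>

// Index arithmetic for an array-backed d-ary heap (illustrative).
constexpr std::size_t parent(std::size_t i, std::size_t d) {
  return (i - 1) / d;   // not meaningful for the root (i == 0)
}
constexpr std::size_t first_child(std::size_t i, std::size_t d) {
  return d * i + 1;     // children occupy d*i+1 .. d*i+d
}
```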
B
I
forget
whether
this
is
a
base
class
and
I
believe
the
the
subclasses
have
the
default
for
that
it's
either
two
or
three
and
and
finally
there's
this
other
boolean
parameter,
which
determines
whether
the
client
information,
that
is
the
reservation
for
the
client,
the
the
limit
for
the
client
and
the
wait
for
the
client
are
determined
dynamically.
B
So
if
the
clients
can
possibly
change
those
values,
a
client
can
request.
Maybe
a
higher
level
of
reservation
that
there's
a
higher
minimum
level
of
service
that
could
be
changing
all
the
time
and
so
either
we
once
a
client
is
introduced
to
this
to
the
queue.
We
simply
leave
those
those
values
static
for
that
client,
or
do
we
occasionally
re-query
the
client
and
say
what
is
your
current
reservation?
What
is
your
current
limit?
What
is
your
current
weight
and
update
our
internal
data
structures
based
on
that?
B
Of
course,
we
have
to
keep
on
querying
the
client
that
requires
a
lot
more.
That
requires
some
additional
work
and
and
slows
things
down
a
bit.
So
we
only
want
to
do
that
if
necessary,
so
it
really
depends
on
whether
this
is
being
used
in
a
more
of
a
static
or
dynamic
situation,
where
clients
could
possibly
alter
those
values,
and
that's
why
this
is
a
boolean
here.
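To pull those template parameters together, here is a hypothetical illustration of how such a queue might be declared and instantiated. The parameter names and their order here are made up for the example; the library's actual signature differs.

```cpp
#include <string>

// Hypothetical only: a queue templated along the lines described above.
template <typename ClientId,      // however the user identifies clients
          typename Request,       // the user's request type
          bool DelayTagCalc,      // compute tags at head-of-queue, not on arrival
          unsigned HeapFanout,    // d-ary heap fan-out, typically 2 or 3
          bool DynamicClientInfo> // re-query reservation/weight/limit over time
class PriorityQueue { /* ... */ };

// One possible instantiation for an OSD-like user of the library:
struct OsdOp { int opcode; };
using OsdQueue = PriorityQueue<std::string, OsdOp,
                               true,   // delayed tag calculation
                               2,      // binary heap fan-out
                               false>; // static per-client info
```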
B: That's going to be in that support subdirectory of dmclock, but we actually have to keep track of three different heaps, one for the reservations... The heap data structure, for those of you who aren't familiar with it, is often used in priority queues for a couple of reasons.
B: Most operations have a log n... what's the word I'm looking for... all operations are order of log n, which means they're relatively inexpensive, and it doesn't have to keep all the elements within it sorted from first to last. For a priority queue you simply need to know which element has the highest priority at any given moment, and so it simplifies things by only having to track what the highest-priority item is within the heap.
B: We also need to know which clients are still within their limit: if a client has gotten a lot of their requests serviced, they may be outside their limit, and they may not get any attention until we get back below their limit. And then, once we've satisfied all the reservations, we need to go by weight, and the ready heap is a heap that contains only those clients who have not exceeded their limit, ordered by weight as opposed to by reservation.
B
So
reservation
heap
orders
things
by
reservation,
limit
by
limit
and
the
ready
or
is
has
two
two
factors
in
in
things.
One
things
that
have
are
at
their
limit
are
at
the
end
of
the
heat.
B: All right, so hopefully you're back at the picture of the server view. The elements in the heap are the clients, so client one is one element in each of these heaps, client two is one element, client three is an element, but they're ordered based on the tags for the first request in each of those queues, and so the request that has the lowest reservation tag will be at the top of the reservation heap.
B: The heap doesn't contain the client itself, because the same client can't appear in three heaps; there's simply a record in each of those three heaps which refers back to each of these clients, and that's why it's an indirect intrusive heap: there's one level of indirection, so that each heap can contain a bunch of those indirect structures, all of which refer back to the actual clients.
B: There is some conditional compilation here. Originally we thought there might be a fourth heap, one that just keeps track of the proportion or weight, but it turns out that is entirely subsumed by the ready heap, and so I believe the use-prop-heap flag is always false, and therefore we don't currently incorporate the fourth heap into things. One other important detail that I missed, coming back up here.
B
Please
give
it
to
me
and
that's
a
pull
based
system.
You
could
also
imagine
a
push
based
system
as
well,
and
so
those
are
the
two
subclasses
of
priority
queue
base
a
pull
hue
and
a
push
cue.
Now
it
turns
out
that
I
believe
that
the
two
uses
of
of
dm
clock
within
ceph
one
is
for
the
osd's
the
other
is
for
rgw.
B: We have this enum class called next request type, and it could be... well, first of all, I believe there's a boolean somewhere in here that determines whether or not we enforce the limit.
B
Should
there
just
not
be
a
request
there,
because
all
the
clients
have
achieved
their
lit,
have
already
passed
a
limit
until
some
more
time
goes
by
and
then
we
can
get
service,
more
requests
from
them
or
the
osd
is
ready.
Why
not
give
it
something
all
right
and
so
somewhere?
I
believe
there
is
a
variable.
Let
me
see
if
I
can
find
it
here
quickly.
B: We can either have it enforced or not. It's not popping out to me right now, but realize there are two different ways of using it: either enforcing the limits, or, when push comes to shove and the OSD is ready for another request, giving it something even if every client is at its limit already. When the limit is being enforced, this next-request type comes into play. So when the OSD says "I would like your next request", based on the priorities, it can either return a request; or it can say there is no request in the system, we simply have none, sorry; or it can say all the clients have reached their limit. Remember, the limit is something like five requests per second or something along those lines, so if we've already given this client five requests in the last second, it has already reached its limit and it's supposed to be capped, and we could say that, you know, in two seconds the next request can be serviced.
B: The next enum class is heap ID. This specifies whether we service a request during the reservation phase or the weight or proportional phase, and we're using "ready" for that. So now we actually have three different terms for the other phase: weight, proportional, or ready, since it's based on the ready queue here.
B
So
there
is
this
data
structure
that,
when
a
request
is
requested
by
the
server
to
the
osd
to
operate
on
it,
it
says
which
type
of
request
it
is
returning
future
or
none.
And
then
we
have
a
union
here
that
either
contains
the
heap
id,
which
is
either
reservation
or
ready,
meaning
that
we
are
returning
a
request
and
here's
which
phase
it
was
returned.
B
It
was
it's
coming
from
or
if
it's
future
returns
a
time
instead
saying
here's,
when
the
next
request
can
be
serviced
all
right,
so
it's
one
or
the
other
of
those,
which
is
why
we
have
a.
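A sketch of the result type being described, with a discriminant plus a union. The names here are illustrative rather than the exact ones in the header:

```cpp
// Illustrative "what should the caller do next" result, as described above.
enum class NextReqType { returning, future, none };
enum class HeapId { reservation, ready };
using Time = double; // stand-in for the library's time type (seconds)

struct NextReq {
  NextReqType type;
  union {
    HeapId heap_id;    // valid when type == returning: which phase it came from
    Time   when_ready; // valid when type == future: earliest time to try again
  };
};
```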
B: All right, the next big thing here is this client-compare function. Again, it's templatized, and it's basically the function that's used to sort each of our three heaps, and the way things are sorted is based on the template parameters that come in when this structure is instantiated.

B: So a template parameter determines which of the tag fields, the reservation, the weight or the limit, is being used here; another determines whether we care about the value of ready (again, being marked ready means a client is still within its limit, it hasn't hit the cap yet); and a boolean determines whether or not we use the delta property in these calculations. Based on those, it looks complicated, but it's basically asking: given these two client records, which one should come first? Does client one come before client two or not?
B: If client one should come before client two, then it returns true; if either client two should come before client one, or they're the same, then it returns false. That is what's being used by the heap data structure in order to move the next item to the top of the heap, by doing a sequence of comparisons and moving a client record up to the top of the heap.
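The shape of such a comparator can be sketched like this: a single templated functor, parameterized on which tag field to order by and whether the ready flag takes precedence, reused for each heap. This is a toy illustration, not the library's code:

```cpp
// Toy comparator illustrating the idea described above.
struct Tag {
  double reservation;
  double proportion;
  double limit;
  bool ready; // still under its limit
};

template <double Tag::*Field, bool HonorReady>
struct TagCompare {
  // true when a should come before b, i.e. a is more urgent
  bool operator()(const Tag& a, const Tag& b) const {
    if (HonorReady && a.ready != b.ready) {
      return a.ready;           // ready clients sort ahead of capped ones
    }
    return a.*Field < b.*Field; // otherwise the smaller tag wins
  }
};

// e.g. the reservation heap orders purely by reservation tag...
using ResvCompare  = TagCompare<&Tag::reservation, false>;
// ...while the ready heap orders by proportion tag but pushes at-limit
// clients toward the back.
using ReadyCompare = TagCompare<&Tag::proportion, true>;
```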
B: And even though it looks complicated, because these are template parameters and whatnot, it probably compiles down to some relatively simple code based on what those template parameters are. And so now we can see that we have each of those three heaps; again, we're not using the prop heap, that is compiled out, and we supply the comparison function for each of them. So, for example, the reservation heap here uses the reservation tag as the main thing for ordering things.
B
It
ignores
the
ready
option
and
it
does
not
incorporate
delta
all
right,
whereas
the
ready
heap
okay,
where
things
are
handled
by
weight
and
things
that
have
not
achieved
their
cap,
used
to
use
the
proportion
field.
The.
A
B
It
makes
it
out
that
client
more
important
in
the
heap
and
we
are
going
to
incorporate
delta
in
the
ordering
all
right.
So
it
consolidates
all
that
code
nicely
there.
Okay.
B: Okay, so here's the code for adding a request; it kind of gets complicated. It's called do-add-request. I believe there are a number of add-request interface functions, all of which get consolidated down to the do-add-request, which actually does the work of adding a request to the queue.
B: Okay, so it first looks to see if the reservation heap is empty. If the reservation heap is empty, then all the heaps are empty; it means there are no active clients currently, and it returns none in that specific case.
B: And then it looks to see whether there is a request.
B: It pulls up the top request and sees whether its reservation tag is less than or equal to now. If it is, the work is done: it knows which request needs to be handled. Now, if there is nothing there, if we've met all of our minimum reservations, it has to go on to the next phase here.
B: Let's see here: it first adjusts the limits heap, and then, amongst those things that are ready, it looks to see what the top element is.
B: It makes sure that it is ready, that the ready flag is true, and that the proportion is less than the maximum tag; I'm drawing a blank as to why that bit of logic is there. And if it finds something which achieves that, then it returns that as the next request during the proportional phase, the ready phase.
B: Now, once we get here, that means we had nothing we could schedule via reservation and nothing we could schedule via weight or proportion, and so now the question is what we do when we've reached our limit. Do we ignore the limit and simply return the next item nonetheless, or do we enforce the limit? That's what this code here is doing. And if we're allowed to break the limit and return something, I guess we do it by weight.
B: Okay, so we have a way of breaking the limit if we don't want to enforce it, and then there is some bookkeeping which is done down below here. I guess this is the case where, once we get to this point, nothing is scheduled, so it makes sure that when we rerun, the next reservation item or next limited item comes up.
Okay,
so
if
we
didn't
find
anything
that
we
could
return,
we
end
up
having
to
either
return
a
future
that
says
that
wait
until
this
time
or
something
will
be
ready
or
if
all
the,
if
there's
nothing,
that's
me
ready
soon.
We
return
none,
okay,
and
so
that's
all
the
logic
there.
There
is
some
cleanup
that
needs
to
be
done
when
clients
might
disappear.
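Putting that flow together, here is a highly simplified sketch of the selection order just walked through. The real function operates over the three heaps and tracks more state; the names and types here are illustrative.

```cpp
// Illustrative decision flow for picking the next request (simplified).
enum class NextKind { none, reservation, ready, future };
using Time = double;

struct SchedulerView {
  bool have_clients;      // any active clients with queued requests?
  Time top_reservation;   // smallest reservation tag among head requests
  bool have_ready;        // any under-limit client with a queued request?
  Time next_limit_expiry; // when the earliest at-limit client becomes ready
};

struct Decision { NextKind kind; Time when = 0.0; };

Decision next_request(const SchedulerView& v, Time now, bool enforce_limit) {
  if (!v.have_clients) {
    return {NextKind::none};        // nothing queued at all
  }
  if (v.top_reservation <= now) {
    return {NextKind::reservation}; // a reservation is due: serve it first
  }
  if (v.have_ready) {
    return {NextKind::ready};       // otherwise serve by weight among clients
  }                                 // still under their limit
  if (!enforce_limit) {
    return {NextKind::ready};       // allowed to break the limit: serve anyway
  }
  return {NextKind::future, v.next_limit_expiry}; // tell the caller when to retry
}
```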
B: Okay, I see that we're now almost 15 minutes in, so maybe I'll take a quick peek at the indirect intrusive heap, since that handles a lot of the key work. Well, there's two ways to go here.
B: Let's see... Sam Just basically rewrote all this code, and it did need to be rewritten, back in September of 2019, so this code is actually new to me; I saw it for the first time today.
B: I'll show you that in this particular implementation there are essentially three clients. One client is the actual client requests coming in from a variety of clients, all the clients that are sending requests to the cluster.
B: Their requests are all treated as a single client from the viewpoint of dmclock here, so we have one set of scheduling parameters for all clients.
B: We have another one for background recovery, so background recovery is considered to be the second client in this particular system and it has its requests, and then the third one is something called background best-effort; I haven't looked at the code to see what gets scheduled under this particular client.
B: Let's go into the support subdirectory. We have some support classes and some tests for those support classes, and the key one here is this thing called the indirect intrusive heap, which again we instantiate three times on the server: once to order things based on reservation, once on... excuse me, once on limit, and once on whether they're ready, which combines both limit and weight. So I'll take a very quick look at that.
B: It's been a long time since I looked at this code, but the reason why it's intrusive... maybe I have to go back to the picture.
B: When a request is serviced, that's going to need us to re-evaluate, and re-bubble up to the top of each of our three heaps, which request will come next. It will affect everything, and so that means client one now might appear somewhere else; it might not be at the top of any of those heaps, and so we have to recalculate those, and so the client records have to know where in each of the three heaps they exist.
B: So each of these client records on the server side has to have a number, an index, which says: in the reservation heap we're at position three, in the ready heap we're at position five, and in the limit heap we're at position two. So when we then service this request, we can go in and adjust that client in each of the three heaps, and that's why it's an indirect intrusive heap.
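A minimal sketch of the "intrusive" bookkeeping being described: each client record carries its current position in every heap, so when its tags change, each heap can fix up just that one element rather than searching for it. Field names are illustrative:

```cpp
#include <cstddef>

// Illustrative client record with the heap positions maintained by the heaps
// themselves (the "intrusive" data mentioned above).
struct ClientRec {
  double reservation_tag;
  double proportion_tag;
  double limit_tag;

  std::size_t resv_heap_pos;  // index of this record in the reservation heap
  std::size_t ready_heap_pos; // index in the ready heap
  std::size_t limit_heap_pos; // index in the limit heap
};
```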
B: We have to have some intrusive data that we manage in each of the client records. So I don't think we have enough time, and I don't know if my memory is clear enough, since it's been so long, to go through all of this, but that's basically what's happening here. You'll see there are various operations to manage the heap. Here's our public interface: whether it's empty or not, how many elements are in the heap, pushing items, removing items, finding items and so forth.
A: Eric, maybe you could speak a little bit about the performance, potential areas where you could see performance improvements, or areas of the algorithm that are more expensive.
B: Yeah, if you go back to the server code... well, when you choose a set of data structures to use, you try to optimize them for the operations that occur most often; nonetheless, some other operations that need to be done might have to be sacrificed somewhat.
B
There's
even
a
comment
here
when
you
need
to
remove
requests
by
filtering.
I
believe
that
the
cues
that
the
osd
have
say
you
remove
all
there
are
some
functions
functionality
there,
which
they
remove
all
the
requests
that
that
this
filter
returns.
True
for
which
means
that
we
have
to
then
implement
this
in
the
library
as
well,
and
so
I
believe,
that's
a
relatively
expensive
operation
and
there's
even
a
little
comment
here.
B
That
says,
because
this
is
a
a
deck,
this
can
be
relatively
expensive
as
we
have
to
go
through
and
probably
touch
a
bunch
of
other
stuff
in
there.
B
This
code
has
been
looked
at
and
sifted
through
and
by
a
number
of
people,
and
actually
the
open
source
contributors
in
in
asia
associated
with
various
telecoms
have
really
done
a
lot
of
work
in
optimizing
things,
which
is,
for
example,
why
the
tag
calculation
is
delayed
because
they
found
that
the
the
algorithm
works
better
when
we
use
the
most
recent
row
and
deltas
to
calculate
those
tags.
B
So
I
I'd
say
at
this
point:
nothing
pops
out
at
me
again,
I'm
I
probably
haven't
looked
at
this
code
at
least
a
year
and
a
half
to
two
years
very
closely,
but
dm
clock
received
a
lot
of
attention
there
for
a
period
of
time,
a
lot
of
discussions
going
on
as
far
as
whether
we
were
implementing
it
correctly,
based
on
the
paper
whether
the
paper
might
have
been
vague
at
some
point
or
paper
might
have
been
incorrect
in
some
points
or
or
missed
a
detail,
and
so
I
think
we're
we're
close
to
there.
B
But
I'm
sure
other
people
who
went
through
this
very
with
a
fine
tooth
comb
probably
find
things
that
could
nonetheless
be
improved.
But
I
don't
know
of
anything
at
this
particular
time,
I
guess
to
say
those
who
are
interested
in
working
on
this
again.
There's
you
know,
at
least
within
the
cef
community.
B
There
are
four
people
that
I
know
that
have
been
working
on
it.
Sam
has
done
a
lot
of
work
recently
on
it.
Casey
and
avishak
have
done
a
lot
of
work
on
the
rgw
side
and
of
course
I
did
some
of
the
original
work
as
well,
and
I'll
only
speak
for
myself,
but
I
would
certainly,
if
you
have
any
specific
questions,
it
might
take
me
a
while
to
answer
them
because
it's
been
a
while,
but
I
am
happy
to
answer
questions
on
it
and
I
suspect
the
other
three
wouldn't
mind
either.