Description
From the 2017 OpenZFS Developer Summit:
http://www.open-zfs.org/wiki/OpenZFS_Developer_Summit_2017
Cool, so yeah, I'm Prakash and I'm going to talk about my ZIL performance improvements. Here's a brief overview: what I plan to discuss is broken up into roughly three parts. First, I'll give some background and discuss what the ZIL is, how it's used, and how it works. Then I'll get into the problem I set out to fix, how I fixed it, and some details on how I did that. And lastly, I'll show off some graphs and results of my work. So with that out of the way, let's get started.
The ZIL is almost always used: whenever any of these logged operations occur, they're inserted into the ZIL's in-memory list of operations. These operations are often called itxs, or intent log transactions. The itxs tracked by the in-memory ZIL are then written to disk when zil_commit is called. The caveat to all of this being that none of it occurs if the dataset is configured with sync=disabled.
If sync=disabled, itxs aren't tracked in memory, nor are they written to disk. Since I often see the ZIL and SLOG terms used incorrectly, I wanted to briefly address this. SLOG stands for separate log device. The ZIL and SLOG are different in that the ZIL is a mechanism for issuing writes to disk, and the SLOG may be the disk that those writes are issued to. With that said, an SLOG is not necessary.
By default, ZIL writes will go to the main pool's disks, but an SLOG can be used to try and improve the latency of ZIL writes if the main pool's vdevs are deemed too slow. So why exactly does the ZIL exist in the first place? Well, writes in ZFS are write-back. What that means is that data is first modified and stored in memory, in the DMU layer, and then later, at some point, the data is written to disk via spa_sync. The problem is that spa_sync can take tens of seconds or more to write out this data, and it's unacceptable for all sync writes to take tens of seconds to complete. Further, writes in ZFS often cause more writes to occur; for example, a single file write modifying a single block of user data will then cause indirect blocks to also be modified and written. The ZIL allows this write amplification effect to be mitigated.
Essentially, the ZIL exists as a performance optimization: it provides synchronous semantics to applications faster than what could be achieved with spa_sync alone. While correctness could be achieved without the ZIL, performance would be unreasonably bad, which makes it a necessity. Before I jump into the next section, I wanted to quickly go over the on-disk format of the ZIL. Each ZFS dataset maintains its own unique ZIL on disk.
Each of these ZILs is a singly linked list of ZIL blocks or, as they're also called, lwbs, which stands for log write block. As one can see here, the uberblock has a pointer to the MOS, which then has pointers to each dataset in the pool, and each of these datasets has a pointer to its own ZIL header. Each ZIL header then points to an lwb, and that block points to the next block in the list.
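To make that chain concrete, here is a minimal C sketch of the relationship just described; the struct names and fields are hypothetical stand-ins, not the actual on-disk definitions from the ZFS headers:

```c
/*
 * Hypothetical, simplified picture of the on-disk chain described above.
 * The dataset's ZIL header holds a block pointer to the first log write
 * block (lwb), and each lwb in turn points at the next one, forming a
 * singly linked list per dataset.
 */
struct blkptr;                        /* stand-in for a ZFS block pointer */

struct zil_header_sketch {
    struct blkptr *first_lwb;         /* head of this dataset's log chain */
};

struct lwb_sketch {
    /* ... packed log records (itxs) fill most of the block ... */
    struct blkptr *next_lwb;          /* pointer to the next ZIL block    */
};
```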
Now let's get into how the ZIL is used. The ZIL is used by the ZFS POSIX Layer, or ZPL for short. The ZPL interacts with the ZIL in two phases: first it uses zil_itx_assign, which causes the ZIL to log the fact that an operation is occurring, and then it uses zil_commit, which tells the ZIL to write those log records out to disk.
Let's look at zfs_write as an example of this. zfs_write will call zfs_log_write. zfs_log_write will then call zil_itx_create, which creates the itx structure in RAM, and then zil_itx_assign, which inserts the itx into the ZIL's in-memory state. Finally, if this is a sync write, zfs_write will then call zil_commit, which causes the itx to get written to disk.
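As a rough sketch of that flow (hypothetical, simplified C; the real entry points named in the talk are zfs_write, zfs_log_write, zil_itx_create, zil_itx_assign, and zil_commit, and the helpers below are only stand-ins for them):

```c
#include <stddef.h>

struct zilog;                                    /* per-dataset ZIL state */
struct itx;                                      /* one log transaction   */

struct itx *itx_create(int txtype, size_t size);      /* create in RAM    */
void itx_assign(struct zilog *zl, struct itx *itx);   /* track in the ZIL */
void commit(struct zilog *zl, unsigned long object);  /* force to disk    */

void example_write(struct zilog *zl, unsigned long object,
    size_t len, int sync)
{
    /* 1. The DMU's in-memory copy of the file data is updated first.     */

    /* 2. The operation is recorded in the ZIL's in-memory itx list.      */
    struct itx *itx = itx_create(/* TX_WRITE */ 0, len);
    itx_assign(zl, itx);

    /* 3. Only a sync write pays for zil_commit(); an async write simply
     *    becomes durable later, when spa_sync() writes out the txg.      */
    if (sync)
        commit(zl, object);
}
```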
Now let's look at zfs_fsync as another example. fsync doesn't create any new modifications; instead, it simply ensures any previous operations are written to disk before it returns. Thus, zfs_fsync doesn't call zil_itx_create, nor does it call zil_itx_assign. Instead, it only calls zil_commit, and calling zil_commit ensures all previous operations are written to disk before fsync returns.
The parameters of zil_commit are such that the caller passes in enough information to uniquely identify an object whose data is to be committed, and the contract zil_commit maintains with the caller is that all operations relevant to the specified object will be persistent on disk by the time zil_commit returns. By relevant, I mean all operations that would modify that object, and by persistent I mean the operations are written to disk and the disks used for those writes are flushed; further, we must issue the disk flush after the writes complete.
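In other words, the interface is roughly this shape (paraphrased as a sketch; treat the prototype as approximate rather than a copy of the real header):

```c
#include <stdint.h>

typedef struct zilog zilog_t;   /* per-dataset in-memory ZIL state */

/*
 * The caller identifies the object it cares about; by the time the call
 * returns, every logged operation that could modify that object has been
 * written to disk AND the disks those writes went to have been flushed
 * (the flush is issued only after the writes complete).
 */
void zil_commit(zilog_t *zilog, uint64_t oid);
```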
Lastly, the interface of zil_commit doesn't allow the caller to specify which operations it cares about. Thus, zil_commit must write all operations for a given object, even if the caller only cares about a subset of those operations. For example, if there are multiple threads writing to the same file but at different offsets, all offsets must be written to disk before zil_commit returns, even if the calling thread only cares about one of those offsets.
So how does the ZIL accomplish this? Well, as I alluded to previously, the ZIL maintains an in-memory list of itxs that have occurred but have not yet been written to disk. This list is maintained via the itxg structure in the ZIL, and each itxg structure contains the following: a single list of all sync operations that have occurred for all objects in the dataset, plus a per-object list of async operations for each object modified.
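A hypothetical, simplified version of that bookkeeping might look like the following; the real structures live in the ZIL implementation headers and differ in detail:

```c
#include <stddef.h>

/*
 * Simplified view of the per-dataset itx bookkeeping described above:
 * one ordered list of sync itxs covering every object in the dataset,
 * plus a per-object list of async itxs.
 */
struct itx_node {
    struct itx_node *next;
    /* ... the logged operation: object, offset, length, data ...       */
};

struct async_itx_list {
    unsigned long    object;        /* object these async itxs modify   */
    struct itx_node *head;          /* async itxs for that object       */
};

struct itxg_sketch {
    struct itx_node       *sync_list;   /* sync itxs, all objects       */
    struct async_itx_list *async;       /* one entry per modified object*/
    size_t                 async_count;
};
```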
Here's what this might look like. In this example, the itxg sync list has two itxs in it, each of which can map to an operation for any object in the dataset, and then there's a list of async operations that occurred for object A and a list of async operations that occurred for object B. So when zil_commit is called, how do these itxs get written out to disk?
Additionally, it's worth pointing out that the point of the commit list is so that we have a list of itxs to write out that will not be modified by any concurrent ZPL activity. As new ZPL operations occur, the sync list may change (for example, operations may be added to it), but the commit list will remain the same.
Now that we have a list of itxs to be written out, it's time to actually issue them to disk. We do this by iterating over all of the itxs in the commit list. For each itx, we attempt to copy it into the currently open ZIL block; if there is insufficient space in the block, then we allocate a new block and issue the old one to disk. Lastly, after all itxs are copied into lwbs, we issue the last open block to disk, allocating the next open block in the process.
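A sketch of that walk, using hypothetical helper names (these are illustrative stand-ins, not the real OpenZFS functions):

```c
/*
 * Commit-list walk described above.  Each itx is copied into the
 * currently open lwb; when one doesn't fit, a new lwb is allocated and
 * the full one is issued to disk.  The last, possibly partially filled,
 * lwb is issued at the end of the walk.
 */
struct itx_node;
struct lwb;

int  lwb_has_room(struct lwb *lwb, struct itx_node *itx);
struct lwb *lwb_alloc(void);
void lwb_issue(struct lwb *lwb, struct lwb *next);    /* write + link    */
void lwb_copy_itx(struct lwb *lwb, struct itx_node *itx);
struct itx_node *itx_next(struct itx_node *itx);

void commit_list_walk(struct itx_node *commit_list, struct lwb *open_lwb)
{
    struct lwb *cur = open_lwb;

    for (struct itx_node *i = commit_list; i != NULL; i = itx_next(i)) {
        if (!lwb_has_room(cur, i)) {
            struct lwb *next = lwb_alloc();  /* allocate the next block  */
            lwb_issue(cur, next);            /* issue the full one, and  */
            cur = next;                      /*   link it to its successor */
        }
        lwb_copy_itx(cur, i);                /* append this itx's record */
    }
    lwb_issue(cur, lwb_alloc());             /* issue the last open lwb  */
}
```

Note the unconditional issue of the last open block at the end of the walk: that is the pre-change behavior, and the timeout discussed later in the talk changes exactly that step.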
So here's what I mean. Given the commit list from before, and lwb 1 as the currently open ZIL block, we select the first itx in the list, so here we select itx S1, and copy it into the block's buffer. This block will remain open, as denoted by the dotted line, since it may also be used for the next itx in the list.
So now we move on to the next itx, itx S2. This one doesn't quite fit in the currently open ZIL block; as you can see, it doesn't fit in lwb 1 here, so we must allocate a new block and issue the current one to disk. Here, lwb 1 now has a solid line to indicate it's been issued to disk, and lwb 2 has a dashed line to indicate it's the new open ZIL block. Further, lwb 1 maintains a pointer to lwb 2 on disk, and that's the singly linked list relationship that I mentioned earlier.
So we issue lwb 2 to disk, allocating the next open block in the process. Here lwbs 1 and 2 have been issued to disk, as we can see since they have solid lines, and lwb 3 has a dashed line to indicate it's now the new open ZIL block; this block will be used when writing out the next batch of itxs. Now, after we've issued all ZIL blocks to disk, we must wait for them to complete.
So now let's dive into the problem with all of this. The main issues can be summarized in the following points: first, itxs are grouped and written in batches, where the commit list constitutes a batch and the batch size is proportional to the sync workload on the entire system; next, threads waiting for zil_commit to complete are only notified when all ZIL blocks in a given batch complete.
Here's an example of what this ends up looking like. This is a timeline of the disk activity of an example pool. What can be seen here is blocks A through E, that first green batch: blocks A through E are all written in the first batch, but the disk activity is slightly uneven. While disks 2, 3, and 4 only receive a single ZIL block to write, disk 1 receives two blocks, as you can see.
B
Thus
disks
2,
3,
&
4,
complete
their
rights
and
then
remain
idle,
while
disk
1
finishes
its
work.
This
idle
time
is
due
to
the
fact
that
only
a
single
batch
can
be
processed
at
a
time
which
leads
to
inefficient
usage
of
the
storage,
for
example,
blocks
F,
G
and
H,
as
shown
in
the
yellow
batch.
Just
after
that,
first
green
one
blocks,
F,
G
and
H
could
have
been
issued
to
disks
2
3,
&
4
filling
this
idle
time,
but
the
batching
mechanism
prevents
us.
B
C
B
[...] it would also have to wait for block E to be written as well, which is also written by disk 1. This unnecessarily increases the latency of zil_commit, in this case potentially doubling it. Yet the solution is somewhat obvious: let's just remove this concept of batches. Rather than waiting for the current batch to complete, we should issue new ZIL blocks to disk immediately, as soon as they can be written out.
Further, rather than waiting for a batch to complete before notifying threads, these threads should be notified immediately when their data is safe on disk. If we did that, then we could go from this diagram, which I showed earlier, to this one, where all disks in the pool are saturated and threads are notified as soon as each individual block completes. As this diagram illustrates, we'd be able to service the same number of ZIL blocks in nearly half the time, potentially doubling our IOPS, and this is without changing a single thing about the workload or the underlying storage characteristics.
The bulk of the changes revolve around three things: changing how ZIL blocks are issued to disk, changing when the flush commands are sent, and changing how we notify waiting threads. Previously, this was a sequential three-step process. Step one would consist of creating the ZIL blocks, issuing them to disk, and then waiting for the I/O for all of the blocks to complete. Next, after the blocks completed, step two would consist of issuing the flush to each vdev and then waiting for those flushes to complete.
And finally, after all those flushes completed, the ZIL CV would be signaled to notify any waiting threads. All threads that called zil_commit would be waiting on this CV, so this was the mechanism to let them know that their data was safe on disk. All three of these steps together constituted a single batch; after one batch completed, another would start, and then another.
Now the process is entirely different and heavily leverages the ZIO infrastructure. Instead of a single root ZIO for an entire batch of blocks, each block now has its own unique root ZIO. Each root will eventually have two children: a write ZIO containing the itx data to be written, and a flush ZIO that is issued after the write completes.
Since these are each child ZIOs, the root cannot complete until both the write and the flush complete. This is enforced by the pre-existing ZIO parent-child semantics, so I didn't have to change any of that; I just got it for free by using ZIOs. Further, the root ZIO of the previous block will also be a child of the next block's root; for example, here lwb 1 is a child of lwb 2.
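Putting those dependencies together, each lwb's ZIO tree looks roughly like this (a sketch with hypothetical helper names; the real code builds the same shape out of the existing ZIO primitives):

```c
/*
 * Per-lwb ZIO tree described above:
 *
 *                  lwb N root zio
 *                 /       |       \
 *        write zio    flush zio    lwb N-1 root zio
 *       (lwb buffer)  (child, issued  (child, so lwb N cannot complete
 *                      after the write) before every earlier lwb has)
 */
struct zio;
struct lwb { void *buf; unsigned long size; struct zio *root_zio; };

struct zio *make_root_zio(struct lwb *lwb);
struct zio *make_write_zio(void *buf, unsigned long size);
void add_child(struct zio *parent, struct zio *child);
void issue_flush(struct zio *root);
void on_write_done(struct zio *write, void (*cb)(struct zio *), struct zio *arg);

void lwb_build_zio_tree(struct lwb *lwb, struct lwb *prev_lwb)
{
    struct zio *root  = make_root_zio(lwb);
    struct zio *write = make_write_zio(lwb->buf, lwb->size);

    add_child(root, write);                  /* root waits on the write  */
    on_write_done(write, issue_flush, root); /* then the flush, also a
                                                child of the same root   */
    if (prev_lwb != NULL)
        add_child(root, prev_lwb->root_zio); /* ordering across lwbs     */

    lwb->root_zio = root;
}
```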
We now have a CV per thread. Then, when the root ZIO for any ZIL block completes, each CV in that block's list is signaled, notifying the waiting threads that their data is safe on disk. Let's walk through an example of what this looks like. First, the lwb structure and the root ZIO are created. Initially the list of CVs will be empty, but as itxs are copied into the block's buffer, it will begin to accumulate a list of threads waiting on it.
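A minimal userland sketch of that waiter mechanism, assuming pthreads in place of the kernel's condition variables (names here are hypothetical, and the real code wraps each waiter in its own small structure):

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* One entry per thread blocked in zil_commit(), hung off "its" lwb. */
struct commit_waiter {
    pthread_mutex_t       lock;
    pthread_cond_t        cv;
    bool                  done;     /* set once data + flush are on disk */
    struct commit_waiter *next;
};

/* Called from the lwb's root-zio "done" callback: wake every waiter. */
static void lwb_signal_waiters(struct commit_waiter *list)
{
    for (struct commit_waiter *w = list; w != NULL; w = w->next) {
        pthread_mutex_lock(&w->lock);
        w->done = true;
        pthread_cond_signal(&w->cv);
        pthread_mutex_unlock(&w->lock);
    }
}
```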
This same process will occur for the next block, and the next. At this point, though, it's important to note that the sequence of events doesn't have to occur in this specific order. For example, it's possible for block two to issue its write before the write for block one completes. In this case, the write for block one would have been issued, and then the write for block two, even though block one's write was still being serviced by the disk. Previously, this would have been prevented by the batching mechanism.
The same goes for the write for block three: we can also issue block three to disk before two and one complete. It's even possible for the writes of blocks two and three to complete before the write of block one. If this happens, though, the roots of blocks two and three will still be blocked, waiting for block one to complete. Thus, even if the flushes were issued and completed for blocks two and three, their CVs would not get signaled.
Yet as soon as the write for block one completes, its flush would be issued, and once block one's flush completes, then all three blocks would complete simultaneously and the CVs for all of these blocks would be notified. So while before this process was very sequential, now it's completely driven by the incoming sync workload and the disk completion events.
Before jumping into the performance results, I wanted to quickly talk about how we determine when to issue a ZIL block to disk. If you remember from earlier in the talk, we build up these blocks by iterating over the commit list and copying each itx into one of the lwb buffers. Previously, once we reached the end of the commit list, in essence the end of a batch, we would issue the last lwb to disk.
Now that we're batchless, we don't necessarily want to do that. If we reach the end of the commit list but there's still buffer space available in the current block (for example, we could have a 128K block with only a single 8K write in it), then we actually want to delay issuing that block, just in case new itxs are generated that would still fit in that ZIL block. This way, we can write out more itxs using fewer I/Os.
The problem is, we have no way to predict the future here; we don't know for sure whether more itxs will be generated. Thus, if we wait for future itxs but none are generated, we're adding additional latency to the current lwb for no benefit. But if we don't wait at all and additional itxs are generated, we could end up using more I/Os than we need to, and potentially degrade performance by saturating the disks.
If it's filled within that 250 microseconds, it'll be issued to disk immediately. So, like here, lwb 2 might still have some space in it, so it'll probably wait here; it'll wait for the timeout. If it's not filled, it'll time out after 250 microseconds and be issued to disk partially filled.
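The decision being described is roughly the following (a sketch with hypothetical helpers; the 250 microsecond figure is the timeout quoted in the talk, and the shipped code may compute its timeout differently):

```c
/*
 * Issue-or-wait decision described above.  A full lwb is issued
 * immediately; a partially filled one waits up to the timeout for more
 * itxs and is issued partially filled if the timer fires first.
 */
#define LWB_FILL_TIMEOUT_US 250       /* the value quoted in the talk */

struct lwb;
int  lwb_is_full(struct lwb *lwb);
void lwb_issue_now(struct lwb *lwb);
/* Returns nonzero if the block filled (and was issued) before the timeout. */
int  lwb_wait_for_fill(struct lwb *lwb, unsigned timeout_us);

void lwb_maybe_issue(struct lwb *lwb)
{
    if (lwb_is_full(lwb)) {
        lwb_issue_now(lwb);           /* filled: no reason to wait        */
        return;
    }
    if (!lwb_wait_for_fill(lwb, LWB_FILL_TIMEOUT_US))
        lwb_issue_now(lwb);           /* timed out: issue partially full  */
}
```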
Finally, let's go over the results of the performance tests that were used to verify this. I used two different fio workloads. For the first workload, each fio thread was submitting sync writes as fast as it could, and I measured the total number of IOPS achieved across all fio threads, varying the number of fio threads from 2 to 1024. This graph shows the percentage difference in IOPS between illumos with my changes and illumos without my changes.
The dashed line at the bottom is just a visual aid to highlight where a 0% difference would be, so anything above that line is an improvement, and on average I measured an 83 percent increase in IOPS with my changes. The dotted line in the middle is another visual aid, simply to show where exactly 83% improvement is in relation to the actual measurements taken. Additionally, the zpool used for this graph consisted of traditional spinning drives.
I also ran the same workload on a zpool consisting of eight SSDs. When running on SSDs the improvement isn't as dramatic, but I was still able to measure a 48 percent improvement on average. Again, the same visual aids are here: the dashed line at the bottom, where anything above it is an improvement, and the dotted line, kind of in the middle-ish, is the average, showing the forty-eight percent.
The second workload tested was again using fio, but this time each fio thread would attempt to issue a maximum of 64 sync writes per second. Thus, the number of IOPS was constant with and without my changes, but the latency of each sync write was still improved. Since now we're measuring latency rather than IOPS, any value below the dashed line is an improvement, the dashed line still showing a 0% change. When running this test on my 8-spinning-disk pool, I measured the latency of each sync write to decrease by an average of 27%.
Also worth noting, the IOPS began to diverge at thread counts greater than 64, where my new code started doing more IOPS than the old code, so I removed those data points to keep the comparison fair; there was still improvement, but I didn't think it was fair to show that. And lastly, I ran that same workload, with a constant number of fio sync writes per second, on my 8-SSD system.
I ran testing, but I didn't... oh, the question is: did I run any performance testing with a mixed workload of sync and async writes, or any other mixture of stuff? I ran testing, but I didn't measure the performance difference of that, simply because I wasn't really sure how to get a baseline and how to compare; I really wasn't sure what was important to test, because the test matrix kind of expands exponentially with all of these different variables. So I ran a test for like 24 hours and just let it do whatever the hell.
I just touched how it's issued, trying to remove the batching and the wake-ups and that sort of thing, so I would expect the SLOG case to also have improvements. But, as I showed with my SSD testing, the improvement isn't as great there, and there are still some more things we could improve on to make it better for really low-latency drives. That was kind of the issue I saw with the SSDs: for really low-latency drives the current algorithm is good enough.
I think it will improve with the number of drives, based on the theory, and I did run some tests on Linux with a bunch of drives, because Brian Behlendorf gave me a box to test with, and I saw the same improvements with lots of drives, the same percentage improvement. The graphs were mostly the same, so yeah.
So every time we reached the end of the commit list, it would issue that lwb to disk immediately, and what I saw was that, because of the changes I've made to how the commit list is built up and how there's no batching, the commit list usually had like one or two itxs, or a very small number of itxs, each time it was traversed. So we would get two itxs into an lwb and then we would issue it to disk immediately.
B
So
what
happened
is
I
was
saying
instead
of
using
large
128k
LW
B's
using
the
maximum
size.
I
would
see
a
lot
of
like
16
klw
B's
filled
with
much
smaller
IT
x's.
So
it's
hard
for
me
to
say
how
much
it
helped,
because
I
didn't
try-
and
you
know
do
before
and
after
with
and
without
that
delay
with
my
final
code.
But
early
on
performance
was
was
terrible
until
I
added
that
delay.
B
A
B
You're remembering the correct thing, but I don't know exactly how to present what we found out. Even with the delay, if you don't have enough threads to fill an lwb block... if you have a 128K block, and in my testing I was doing 8K writes, then fifteen 8K writes will fit in 128K. So when I was running with 16 threads, 15 of those 8K writes would wind up in one ZIL block, the 128K one, and then there's one straggler thread.
That straggler, because of the metadata accounting in the block, would wind up in the next one. So instead of having all 16 fit in a single block, we would have to use two blocks. So there was kind of a performance cliff or spike there, and that could be an artifact of my testing, or whatever you want to call it.
So the question is: did I do any performance testing about the impact of flushes, did I try to test with flushing disabled, and could we batch the flushes instead of issuing a flush for each lwb? Is that a good summary? I didn't do any rigorous performance testing with flushing disabled; I just used the default config. I think it would be useful to do that. The reason I didn't is because I think performance can only get better if we turn it off, so I wanted to see what the default is like.
Is it good enough? Do I need to do something more complicated? And since I saw a bunch of improvement with the defaults, flushing for each lwb, I kept it as-is. If we did the testing and found out that this is doing too many flushes and degrading performance, there's no reason we couldn't change it to batch them up, but then that kind of gets us back into the batching mechanism, so we just need to be a little bit careful about how we implement that.
So the question is: what was the motivation for the original batching mechanism, and was it maybe because it helped order the itxs and things? I definitely can't comment on the original motivation, because that was way before my time; maybe Mark or Matt or George or somebody else can comment on the motivation there. It's definitely a little simpler.
Well, I mean, I think this mechanism is pretty simple too, just because it falls out of the ZIO stuff really nicely. As for the ordering: because it's batched, you don't have to worry about the order of wake-ups there; everything just wakes up at the same time, so that is solved automatically.
But, I mean, from my perspective it sort of felt like that was geared more towards throughput, maximizing throughput, because you issue a bunch of itxs, assuming the batch size is really big, and then you wait for them all to complete. So I don't know. I know Oracle's done some work on this, and I think their implementation is completely different, but you guys tried to address this... do you want to...?
So take a single-threaded case. Oh, the question is: with a small number of synchronous writers and an SLOG, can the delay negatively impact performance? Is that a good summary? So I think it can; for a single-threaded case you're basically always going to hit the delay. The way I rationalize that is:
hopefully single-threaded cases aren't the norm. And with a small number of threads writing, with the previous code... like, I saw improvements with two threads. I didn't see improvement with one thread, because I would always hit the delay, but with two threads the old code also wasn't good, because it would almost always use at least two blocks: the first thread would come in and do a zil_commit, it would consume one ZIL block, and once that writer finished it would issue the ZIL block.
Yeah, so the concern is more about small numbers of sync writers; sync writers are important, and we don't want to delay them any more than we need to. We could set it to zero, or we could make the delay optional. We did make it a percentage, so, like, if it's still 5% you could end up adding that latency for no good reason, but yeah.
Yeah, right, the question, or I guess the concern, is that maybe these changes increase the space consumed by ZIL writes. I didn't change any of that, so in theory it should be the same, if not improved; but let's take some measurements, and if you see differently, then let's fix it.
That was another consideration about the 5% timeout: without it, in my testing, I was always seeing these lwb blocks not utilized very well, because new itxs would come in and have to get a new ZIL block. So I saw the space efficiency... the performance issue is kind of a variation on the space utilization: we want to make sure, as best as possible, that we issue these lwbs as soon as possible, and that we also utilize them, so we don't waste space on these small, you know, expensive devices. So, yeah.