From YouTube: ZFS performance on Windows by Imtiaz Mohammad
Description
From the 2021 OpenZFS Developer Summit
slides: https://docs.google.com/presentation/d/1vcKWOCgw5G3YLiNSolWXg26eK4w7a_iG
Details: https://openzfs.org/wiki/OpenZFS_Developer_Summit_2021
Awesome, thanks guys, welcome. This is Imtiaz; I work with DataCore. I've been managing a team in Bangalore, and they've been working on ZFS performance on Windows over the last 18 months or so, so I'm going to give a quick snapshot of what we have done there.

All right, a bit about DataCore: we've been doing software-defined storage for over two decades now. Among other things, we have Windows Server based solutions for block storage, so that is what triggered our interest in ZFSin, specifically the zvols, around Q4 of 2019.
Now, we've been working with Jorgen since then, and we have received great support from him. Along with him, we managed to stabilize ZFSin, which is the Windows port of ZFS. Of course, we made zdb functional, and we expanded the zvol size: there was a limitation in the data types which didn't allow us to create zvols of more than, I think, two terabytes or so, and with that change we could go up to seven exabytes. Then we confirmed the integrity of the data that we are writing to ZFS using tools that are homegrown at DataCore. That laid a good foundation to pursue the performance experiments that I am going to talk about in this talk. Alongside, we also managed to introduce perfmon counters.
Perfmon, as you might be aware, is a very popular tool in the Windows ecosystem. We used the WPP framework for doing the tracing, so I'm going to talk about that a little bit as well. And then at the bottom you see the repo; that is where you can get a copy of the ZFSin that we have been using, if you are interested in checking it out. All right, so this has been the focus for us for the last 18 months or so:
Measuring the performance of zvols, especially when deduplication, compression, and encryption are turned on; identifying the bottlenecks; and, of course, fixing the bottlenecks.

So how did we measure the performance? We used this Dell EMC server, which had 128 GB of RAM, 16 cores, and four SSDs of 370 GB each, and the repos that you see there.
When I show you the slide where we compare performance with OpenZFS on Linux, that's the repo that we used, and on Windows, of course, we used the repo that I just talked about. All right, so this is how we configured our pool and zvol. We used up the four SSDs, and on the zvol side we chose to have only metadata cached, we turned on dedup, we used LZ4 for compression, we used sync=always, we used a volblocksize of 128K, and we turned on encryption, using the AES-256-GCM algorithm. The size of the zvol happens to be 500 GB.
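As a rough sketch, a pool and zvol with the properties described above could be created along these lines (the pool name, device names, and key handling are illustrative assumptions, not taken from the talk):

```shell
# Hypothetical recreation of the test pool/zvol; pool and device names are made up.
zpool create perfpool PHYSICALDRIVE1 PHYSICALDRIVE2 PHYSICALDRIVE3 PHYSICALDRIVE4

# 500 GB zvol: metadata-only caching, dedup on, LZ4 compression, sync=always,
# 128K volblocksize, AES-256-GCM encryption.
zfs create -V 500G \
    -o volblocksize=128K \
    -o primarycache=metadata \
    -o dedup=on \
    -o compression=lz4 \
    -o sync=always \
    -o encryption=aes-256-gcm \
    -o keyformat=passphrase \
    perfpool/vol1
```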
And then we used DiskSpd. It's a nice tool which works for Linux as well as Windows; again, the links at the bottom reveal more information. It has a bunch of options: you can specify the block size of the workload, the duration of the test, whether you want to disable write caching, whether you want to track latency or not, the outstanding IOs per thread, the number of threads, whether you want to do random IO or sequential IO, the percentage of writes versus reads, the warm-up time, whether you want random content in every write, and, of course, the target. Typically a very standard set of parameters that you would expect to see in any IO tool.
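For example, a DiskSpd run exercising the options just listed might look like this (the target and the specific values are illustrative assumptions, not the exact test command from the slides):

```shell
# Hypothetical DiskSpd invocation: 128K blocks, 60 s duration, 10 s warm-up,
# caching disabled (-Sh), latency tracking (-L), 8 outstanding IOs per thread,
# 4 threads, random IO (-r), 100% writes, random write content from a 1 MB buffer.
diskspd.exe -b128K -d60 -W10 -Sh -L -o8 -t4 -r -w100 -Z1M \\.\PhysicalDrive5
```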
All right, now, this is how the performance of ZFSin stood a year ago. The first row is deduplication plus compression turned on on the zvol. It was okay; you can see 128K sequential writes were giving us around 400 MB/s. But the problems started when we added encryption: you can see that it pretty much became unusable, and that is what triggered the next set of experiments that we did.
So how did we identify the bottlenecks? Well, one tool we used quite a bit was DTrace. It is something that is well known in the Solaris and Linux communities, and it has been available on Windows for a while; the links at the bottom can give you more information about the tool. Using it, you can find out where it is that you're spending most of your time, and you can use it for tracing kernel code as well, so it gave us quite a bit of insight into what was happening. And then, of course, zpool iostat and arcstat: those have been very well-used tools for measuring performance or doing performance analytics. The only challenge there is that it may not be easy to do the charting or graphing, and that led us to integrate
the output of zpool iostat and arcstat into perfmon. You can see those three highlighted counters there: OpenZFS cache, OpenZFS dedup, and zpool. Those are the three counters that we added to perfmon, and that made it a lot easier to capture the output in the form of the graphs that I'm going to show in a while. Again, if you want to read up a little bit about what zpool iostat and arcstat do, there are a couple of links at the bottom.
So this is how you can choose the vdevs; they are qualified with the full name, and of course you can choose the pools themselves. If you have multiple pools, you can see all of them here and choose whichever you want, and this is how the output looks in tabular format.
We have neatly prefixed the counters with ARC, L2ARC, and slog, as you can see here, and then vdev. So again you can see the active async reads and writes, the pending async writes and reads, the wait count, the wait time, and so on and so forth, all of them neatly organized. If you want to get the counters at the pool level, this is what you could do, and this is the charting that I was talking about.
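Once counters are registered with perfmon, the standard Windows tooling can also log them from the command line; a hypothetical capture with the built-in typeperf utility (the counter-set and instance names here are guesses based on the slide, not verified paths):

```shell
# Sample all pool-level counters every 5 s, 60 samples, to CSV for later charting.
typeperf "\OpenZFS zpool(perfpool)\*" -si 5 -sc 60 -f CSV -o zpool_counters.csv
```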
So this really helped us quite a bit, along with the DTrace tool that I talked about. And then, given that we were just ramping up on ZFS, we also wanted to know how the code paths are hit and which code paths we need to really monitor. How do we learn more about the code? One great way is tracing, but we didn't want the tracing itself to hurt the performance, and that is where we used WPP tracing.
It's a very efficient, very lightweight way of doing tracing in Windows. We modified the ZFSin installer executable; you can see the options here that we added under the trace command: -l 0x4 basically says I want to trace anything that is at level four or below, -s 250 gives the size of the trace file that we want to collect, in MB, and of course -p is the path. So basically we are saying, hey,
I want to turn on a session where I can log the traces at level four or below; it's circular tracing of size 250 MB, and here is where the file resides. When you're done, you can just delete the session using the -d option.
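Put together, the session lifecycle described above might look like this (the flags are those mentioned in the talk; the installer name and trace path are assumptions from the transcript):

```shell
# Start a circular WPP trace session: level <= 4, 250 MB file, given path.
zfsinstaller.exe trace -l 0x4 -s 250 -p C:\traces\zfsin.etl
# ...run the workload to reproduce the behaviour of interest...
# Tear the session down when finished.
zfsinstaller.exe trace -d
```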
We also added a TraceEvent function; you can see there are a few flags here, and also an example. This is, again, Windows-specific code.
You can specify the level here, and then the string that you want to print, with the parameters, just like you would in a printf statement. Alongside, we also instrumented the dprintf function, because there are a lot of dprintfs already in the code, and they map to a default level of four.
So that means that, using this command here, "zfsinstaller trace -l 0x4", we could see all the traces that were coming from dprintf, as well as from any new places that we have added. Again, if you want to explore WPP a little further, there are a couple of links at the bottom that you could pursue. All right, so WPP actually doesn't write the entire string, or whatever you specify in the TraceEvent or dprintf, to the trace file; that is part of what keeps it so lightweight.
The first thing we tried was to use Intel's ISA-L Crypto library, specifically the AES-256-GCM algorithm from it. It's an open-source solution; the link at the bottom talks about it. That gave us a significant performance improvement: it actually leverages the processor advancements, the AVX2 instructions. Along with that, we made a small change in the Storport area. Basically, we tell Storport, hey,
A
If
you
have
rights
of
128k
or
more,
you
could
give
them
in
one
shot
so,
rather
than
chopping
it
into
64k,
we
are
capable
of
handling
128k
rights
at
a
time,
because
that's
what
the
wall
block
size
we
use
underneath.
A
So
it
kind
of
aligns
well,
so
that
change
also
helped
us
a
little
bit
and then we did borrow a couple of changes from upstream. Just to remind you, we started working on this a couple of years ago and we didn't have OpenZFS 2.0 back then, so we had to borrow some nice changes that we learned about from upstream into the ZFSin repo. A couple of things that helped here were the metaslab unload delay
Earlier the default was eight, which means that if you have not seen any activity on a particular metaslab in the last eight transaction groups, you flush that data structure to disk, and that was causing a lot of disk activity. So we bumped it up to 2048, which keeps the metaslabs in memory longer,
so we don't do a lot of IO; that really resulted in good performance improvements there. And then we again borrowed the dirty-data sync percentage from upstream. Earlier the threshold was 64 MB, which means that every time you have 64 MB of dirty data, you flush it to disk. We bumped it up to 20% of 4 GB, which is a configurable number, so that gives us more leeway to
collect more data in a transaction group before writing it to disk. So those are the bunch of changes that we made in ZFSin, and this is what the performance gains look like.
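On Linux, the two tunables just mentioned are exposed as OpenZFS module parameters; a rough sketch of the equivalent settings there (ZFSin exposes its tunables through Windows-specific mechanisms, so these paths are illustrative, not the change the team actually made):

```shell
# Keep metaslabs loaded for 2048 txgs of inactivity instead of the old default of 8.
echo 2048 > /sys/module/zfs/parameters/metaslab_unload_delay
# Start syncing a txg once dirty data reaches 20% of zfs_dirty_data_max (4 GB by default).
echo 20 > /sys/module/zfs/parameters/zfs_dirty_data_sync_percent
```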
If you look at the top three rows, we made significant gains there. In the bottom three, although the percentage gains are there, the base is so small that the percentage gain is not really very helpful as far as practical usage of the driver is concerned, so there's still some room to improve here. The top three are essentially larger block sizes (128K blocks is the size of the data), whereas the bottom three are smaller writes and reads, 4K and 8K.
Now, how does it compare with OpenZFS 2.0 on Linux? That experiment was done, and you can see that the top three rows are still okay-ish, not bad, but again the bottom three is where we still think OpenZFS is doing much better on Linux. Definitely there are some bottlenecks in ZFSin that need to be removed, and it's possible that when we port OpenZFS 2.0 to Windows, those bottlenecks will still exist, so that's something that we continue to look into. All right, so I talked about Jorgen; he's been fantastic in extending support.
Now, we've already migrated the changes that I talked about, the perfmon counters, the WPP tracing, the Storport change, all of them, to a Windows branch of that repo, openzfsonwindows/openzfs. The idea there is that it's kind of a staging area: Jorgen takes all these changes, reviews them, requests any changes that are required, and then bunches them up and upstreams them to 2.0 whenever the time is right. That model has helped us move at pace.
So that's the model that we are following right now. Going forward, we want to look at the 2.0 codebase for Windows, basically try to stabilize it and then work on the performance improvements, just like we did for the ZFSin codebase. All right, so those were the prepared notes; I would be happy to take any questions at this point.
Yes, we did try a lot of combinations, but I don't have the data here, and of course we have limited time to talk about it as well. We did run a lot of other tools too; this was not the only four-corners test that we did. We ran this using several other benchmarks, HammerDB among them, a lot of stuff. But is there a specific question around it?
[Audience] I was just curious, because I know that dedup has its own bottlenecks associated with it, and if you got markedly different results without dedup enabled, it might point to different priorities for bottleneck resolving.
Yeah, the latest test that we did was without dedup and without compression, just 128K writes, and we still see some issues in ZFSin compared to OpenZFS 2.0, especially when you're running it on NVMes.