From YouTube: Hardware Acceleration with QAT by Weigang Li
Description
From the OpenZFS Developer Summit 2018
Slides: https://drive.google.com/file/d/0B_J4mRfoVJQRV3ZOd1ZMWkphcV9OYXdWT0FBblVHbVZpSmZj/view?usp=sharing
A
Our next presenter is Weigang Li from Intel, and I believe you flew directly from Shanghai, right? Yeah. So is it your first OpenZFS conference? Yes, yes. Well, welcome to OpenZFS. So Weigang is going to talk to us about Intel's QAT technology and how it can be used in ZFS to improve the performance of some of the ZFS tasks. So please welcome Weigang.
B
This is my agenda. First I will talk about the motivation, why we do this project, and then I will show you the current project status and the overview. I will also talk about details of the acceleration of compression, encryption and checksum with QAT, and how to configure and use QAT in ZFS, and then the project challenges and potential future work. At the end we have a Q&A session.
B
So then we thought that some CPU-intensive workloads, for example compression, encryption and checksum, could be offloaded to dedicated hardware. There are some benefits to doing that: first, to get better performance; second, to free up CPU resources; and third, which is really important, because ZFS is a very low-level software layer, if we enable the hardware acceleration in ZFS, ZFS hides the hardware details as well, so all our customers' applications will not be...
B
They are all programmable hardware. It can provide pretty good flexibility, and it is mainly used to accelerate some flexible workloads. And an ASIC is fixed hardware acceleration. It is mainly used to accelerate some standard workloads, for example compression, encryption or checksum. The algorithms are stable, so they will not be changed very frequently, so an ASIC is the best choice to accelerate such kinds of workloads, because the ASIC cannot change the program; I mean, it is a fixed hardware acceleration.
B
So, QuickAssist Technology: QAT is a kind of ASIC acceleration. It is mainly for some compute-intensive workloads, including bulk cryptography, public key exchange and data compression. It has three form factors, including a chipset: it has been integrated into our latest Xeon platform in the Lewisburg chipset.
B
It can also work with the zlib library and with Apache, and Hadoop also supports the QAT codec, which can use QAT for data compression acceleration. So they are all in user space. In kernel space, we are also trying to integrate QAT into the Linux kernel crypto framework, so all the kernel modules can make use of QAT through the LKCF API. ZFS is another good example. We are working to enable QAT support in ZFS in kernel mode, so I will cover more details later.
B
And we started this project early last year. As for our contribution in the ZFS on Linux community, our first patch was for gzip compression offloading with QAT, which has been merged in the ZFS 0.7.0 release. After that, we enabled QAT support in the ZFS DKMS RPM package. With this change, our customers can easily install ZFS with QAT support. And after that, Tom Caputi, who implemented the... two patches.
C
B
As you know, the ZIO pipeline does the compression, encryption and checksum. In each stage we check whether the QAT hardware has been installed and the QAT driver has been initialized, and if the buffer size is good for offloading, we hand over the compression, encryption or checksum call to the QAT API, so the QAT hardware can be used to accelerate the workloads. The ZIO pipeline creates a pool of threads, and multiple threads are very good for keeping the hardware accelerator fully utilized.
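The per-stage check described here can be sketched as follows. This is a minimal illustration, not the ZFS code: `QatState`, `use_qat` and `pick_backend` are hypothetical names, and the 4 KB to 128 KB window is an assumption based on the buffer limits the speaker gives when discussing the challenges.

```python
from dataclasses import dataclass

# Assumed size window (4 KB to 128 KB), based on the limits mentioned in the talk.
QAT_MIN_BUF_SIZE = 4 * 1024
QAT_MAX_BUF_SIZE = 128 * 1024


@dataclass
class QatState:
    """Hypothetical snapshot of accelerator availability."""
    hardware_present: bool
    driver_initialized: bool


def use_qat(state: QatState, buf_size: int) -> bool:
    """Offload only if the device is usable and the buffer size is suitable."""
    return (
        state.hardware_present
        and state.driver_initialized
        and QAT_MIN_BUF_SIZE <= buf_size <= QAT_MAX_BUF_SIZE
    )


def pick_backend(state: QatState, buf_size: int) -> str:
    """Return which path a pipeline stage would take for this buffer."""
    return "qat" if use_qat(state, buf_size) else "software"


if __name__ == "__main__":
    ready = QatState(hardware_present=True, driver_initialized=True)
    for size in (512, 16 * 1024, 1024 * 1024):
        print(size, pick_backend(ready, size))
```

The point of the sketch is that the software path stays the default, and the accelerator is chosen only when every precondition holds.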
B
This is what we call the look-aside working model. The ZFS thread, the ZIO thread, calls the QAT API and blocks waiting for the job completion. The QAT API sends a request to the QAT hardware, and the QAT hardware starts to process the job. During the processing, the QAT hardware may DMA in the data for the input buffer and DMA out the result to the destination buffer. After the job is done, the hardware interrupts the QAT driver, and the QAT driver wakes up the ZIO thread, so the job is completed.
B
OK, let's look at the compression. When we talk about compression, normally we need to think about how to balance the compression performance and its cost. There are several factors to measure compression algorithms. One important factor is the compression ratio. A better compression ratio means that we can save more disk utilization, or network bandwidth if the compressed data is transferred over the network. Another factor is the throughput, the speed of the compression algorithm. But compression is not free: to get better performance...
B
Let's take a quick look at some benchmarks. We did a simple benchmark which just compresses a big file with 128 KB block size, and we measured different compression algorithms. I found that LZ4 is a faster algorithm, but the compression ratio is not good. Gzip can provide a pretty good compression ratio, but it takes a very long time to compress the file. And Zstandard is a new algorithm. It has a better compression ratio than gzip and is faster than gzip.
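A benchmark of this shape can be sketched in a few lines. LZ4 and Zstandard are not in the Python standard library, so this sketch swaps in zlib (DEFLATE, the algorithm behind gzip) at different levels to show the same ratio-versus-time trade-off. The sample data and the `make_sample` and `measure` helpers are illustrative assumptions, not the speaker's benchmark.

```python
import time
import zlib

BLOCK_SIZE = 128 * 1024  # block size mentioned in the talk


def make_sample(size: int) -> bytes:
    """Build compressible, text-like sample data (illustrative only)."""
    line = b"zfs compression benchmark sample line with some repeated words\n"
    return (line * (size // len(line) + 1))[:size]


def measure(data: bytes, level: int) -> dict:
    """Compress data block by block; report compression ratio and elapsed time."""
    start = time.perf_counter()
    out_len = 0
    for off in range(0, len(data), BLOCK_SIZE):
        out_len += len(zlib.compress(data[off:off + BLOCK_SIZE], level))
    elapsed = time.perf_counter() - start
    return {"level": level, "ratio": len(data) / out_len, "seconds": elapsed}


if __name__ == "__main__":
    sample = make_sample(8 * BLOCK_SIZE)
    for level in (1, 6, 9):
        r = measure(sample, level)
        print(f"level {r['level']}: ratio {r['ratio']:.1f}, "
              f"{r['seconds'] * 1000:.2f} ms")
```

Real results depend heavily on the input data, so numbers from synthetic samples like this one should not be compared with the figures in the talk.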
B
Another important thing is that even during the compression time, the CPU is idle most of the time, because all the work is done in the hardware, so the CPU is just waiting for the QAT to complete the work. OK, that means with a hardware accelerator we can provide a pretty good balance between getting good compression performance and reasonable cost.
B
ZFS compression is a block compression. For each block, for example 128 KB, it will call the zio_compress_data function. In this function, it dispatches to the selected compression algorithm. If the gzip compression algorithm is selected and if the QAT device is installed in the system, the gzip offloading will always happen automatically. And the output of the QAT hardware is gzip format compatible. That means the data compressed by the QAT can be decompressed by the gzip software if the system has no QAT installed.
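The format-compatibility point can be illustrated with two different encoders that emit the same stream format. In this sketch, `compress_accelerated` is only a stand-in for a hardware encoder (it is plain zlib with different encoder settings, an assumption made for illustration); the point is that one software decompressor reads both outputs.

```python
import zlib


def compress_software(block: bytes) -> bytes:
    """Reference software path: zlib at its usual default level."""
    return zlib.compress(block, 6)


def compress_accelerated(block: bytes) -> bytes:
    """Stand-in for a hardware encoder: same format, different encoder choices."""
    enc = zlib.compressobj(level=1, strategy=zlib.Z_HUFFMAN_ONLY)
    return enc.compress(block) + enc.flush()


def decompress(stream: bytes) -> bytes:
    """Single software decompressor used for both kinds of stream."""
    return zlib.decompress(stream)


if __name__ == "__main__":
    block = b"the same block of data " * 1000
    a = compress_software(block)
    b = compress_accelerated(block)
    # The encoded bytes differ, but both decode to the original block.
    print(len(a), len(b), decompress(a) == decompress(b) == block)
```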
B
And after we offloaded the gzip compression to the QAT hardware, we ran the benchmark and measured the throughput, CPU utilization and compression ratio. For the benchmark, we used the FIO test to write a bunch of files to the pool. The pool contains three NVMe disks, and we compared the different compression algorithms. The first one is no compression; the second one is LZ4 software compression.
B
Well, LZ4 and gzip consume a lot of CPU cycles to complete the work. LZ4 takes more than 60% CPU utilization and gzip takes more than 8%. This is the utilization across all the CPU cores in our system; our benchmark system contains about 88 CPU cores. So it is a big CPU consumption for software compression. And from the compression ratio point of view, QAT can provide a similar ratio to gzip software, which is much higher than LZ4. OK, yeah.
B
Compared with compression, ZFS encryption is a little bit complex. We will not go into much detail here. In short, there is a file encryption key, which is derived from the master key and is used to encrypt the data in one file with the AES-CCM and AES-GCM algorithms. Currently, for ZFS, QAT only supports AES-GCM. Yeah.
B
So after we offloaded the AES-GCM to QAT, we have this benchmark result. We ran a similar test as for compression: we used the FIO test to write the files to the encrypted dataset and measured performance with different numbers of CPU cores. We have 2 cores, 4 cores, 8, 16 and 32 cores online, and we offline the other cores. And the first one is no encryption.
B
The second one is the AES-GCM software implementation, and the blue one is AES-GCM with QAT. We can see that overall QAT provides better throughput than the original software implementation. It is very close to the no-encryption case. But please note that if we have fewer cores, for example in the 2-core or 4-core cases, and we want to get the best performance from QAT, we need to increase the number of threads in the ZIO pipeline, because the ZIO threads are not the...
C
B
And the checksum. ZFS has the default Fletcher checksum algorithm. It runs very fast, and the CPU is good enough to handle this checksum. SHA-256 is a strong checksum, but it is very slow. So if we enable the SHA-256 checksum and run the test, just writing some files to the zpool, we can see that perf top shows the SHA-256 transform is a hot spot. So we want to remove this hot spot by offloading it to QAT, and here we compare the original SHA-256 software implementation with QAT.
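To make the contrast between the two checksum families concrete, here is a small sketch. The `fletcher4` function follows the usual Fletcher-4 construction (four running sums over 32-bit words, kept in 64-bit accumulators); the little-endian word order is an assumption made for this example. SHA-256 comes from Python's `hashlib`. The sketch shows how little work Fletcher does per word; it is not a speed benchmark, since the pure-Python loop is slower than the C-backed hash.

```python
import hashlib
import struct

MASK64 = (1 << 64) - 1


def fletcher4(data: bytes) -> tuple:
    """Fletcher-4 style checksum: four chained sums over 32-bit words."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    a = b = c = d = 0
    for (word,) in struct.iter_unpack("<I", data):
        a = (a + word) & MASK64  # sum of words
        b = (b + a) & MASK64     # sum of running sums
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return a, b, c, d


def sha256_hex(data: bytes) -> str:
    """Cryptographic checksum of the same data."""
    return hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    block = struct.pack("<2I", 1, 2)
    print(fletcher4(block))
    print(sha256_hex(block))
```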
B
So, at build time we have two ways. The first is from the RPM package. If the QAT hardware has been installed and the QAT driver has been installed, we can specify the ICP_ROOT system environment variable pointing to the QAT driver, and then just install the RPM and install the ZFS module.
B
And from the source code, we have added a new parameter in configure, which is --with-qat, pointing to the QAT driver location, and after that, just make the source code and install ZFS. After the ZFS kernel module is installed, we can see that it depends on the QAT API kernel module, and there are also three new parameters added for QAT.
B
So at runtime, if we specify the compression algorithm as gzip, or encryption with AES-GCM, or checksum with SHA-256, QAT will be automatically used for acceleration. And there are some QAT counters added in procfs, so we can monitor the QAT status easily. For example, how many bytes have been sent to QAT, how many requests have been sent to the QAT device, and how many bytes have been generated by the QAT device.
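Counters like these are typically exposed as plain-text rows that are easy to parse. The sketch below assumes a kstat-style layout (two header lines, then `name type value` rows). The sample text, counter names and values are made up for illustration, and the path in the comment is an assumption that may differ between releases.

```python
# Assumed location of the counters on a live system (not verified here):
#   /proc/spl/kstat/zfs/qat
# The text below is an illustrative sample, not real output.
SAMPLE = """\
0 1 0x01 3 816 1000 2000
name                            type data
comp_requests                   4    1200
comp_total_in_bytes             4    157286400
comp_total_out_bytes            4    52428800
"""


def parse_kstat(text: str) -> dict:
    """Parse 'name type value' rows, skipping the two header lines."""
    counters = {}
    for line in text.splitlines()[2:]:
        parts = line.split()
        if len(parts) == 3 and parts[2].isdigit():
            counters[parts[0]] = int(parts[2])
    return counters


def compression_ratio(counters: dict) -> float:
    """Bytes sent to the device divided by bytes it produced."""
    return counters["comp_total_in_bytes"] / counters["comp_total_out_bytes"]


if __name__ == "__main__":
    stats = parse_kstat(SAMPLE)
    print(stats["comp_requests"], compression_ratio(stats))
```

On a real system the same parser could be fed the contents of the counter file instead of `SAMPLE`.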
B
Yeah, there are some challenges when we implement the code. First is the memory allocation. The QAT hardware requires some intermediate buffer, which is pre-allocated in DRAM, and the size depends on the record size. So for a big record size, for example one megabyte, it is a big contiguous memory requirement. So for now we just limit the buffer size to between 4 kilobytes and 128 kilobytes to get the best performance, but it is just a software decision. It's not a hardware restriction.
B
Yeah, buffer overflow. Sometimes when the data is not compressible, it might be expanded. So if the destination buffer or the pre-allocated intermediate buffer is not big enough to contain the output from the hardware, the QAT hardware will report an overflow error. To resolve this problem, we add an additional buffer as part of the output. This way the hardware compression can always succeed, but from the ZFS layer we can detect if the data has been expanded, and then we just discard the output.
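The expand-then-discard handling can be sketched in software terms. Here zlib stands in for the hardware compressor, `worst_case_size` uses zlib's documented `compressBound` formula as the assumed headroom, and `compress_block` is a hypothetical helper; none of this is the actual QAT or ZFS code.

```python
import os
import zlib


def worst_case_size(src_len: int) -> int:
    """Upper bound on zlib output size (the compressBound formula)."""
    return src_len + (src_len >> 12) + (src_len >> 14) + (src_len >> 25) + 13


def compress_block(src: bytes) -> tuple:
    """Compress with headroom; keep the result only if it actually saves space."""
    capacity = worst_case_size(len(src))  # destination plus the extra buffer
    out = zlib.compress(src, 6)
    if len(out) > capacity:
        raise OverflowError("output exceeded the destination buffer")
    if len(out) >= len(src):
        # The data expanded or did not shrink: discard the compressed output.
        return "raw", src
    return "compressed", out


if __name__ == "__main__":
    for name, block in (("zeros", bytes(128 * 1024)),
                        ("random", os.urandom(128 * 1024))):
        kind, data = compress_block(block)
        print(name, kind, len(data))
```

With the extra headroom the compressor never fails on incompressible input, and the caller decides afterwards whether the result is worth keeping.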
B
Here is some potential future work. Currently, as you can see, in the ZIO thread, when the ZIO thread calls the QAT API, it just waits for the completion. So the thread cannot be swapped out to do other work. This is a little bit like a synchronous call to the hardware acceleration. So if we can change it to an asynchronous call, it will improve the performance. And another idea is that we might think of chaining the compression and checksum in one request.
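The difference between the two calling styles can be sketched with a simulated device. `fake_offload` is a stand-in that just sleeps, and a thread pool plays the role of the accelerator's request queue; these are assumptions for illustration, not the QAT API.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_offload(block: bytes, delay: float = 0.01) -> int:
    """Stand-in for a hardware request: waits, then returns a result."""
    time.sleep(delay)
    return len(block)


def run_synchronous(blocks: list) -> list:
    """Each call blocks the caller until the 'device' finishes."""
    return [fake_offload(b) for b in blocks]


def run_asynchronous(blocks: list, device: ThreadPoolExecutor) -> list:
    """Submit every request first, then collect the results."""
    futures = [device.submit(fake_offload, b) for b in blocks]
    return [f.result() for f in futures]


if __name__ == "__main__":
    blocks = [b"x" * n for n in (1, 2, 3, 4)]
    with ThreadPoolExecutor(max_workers=4) as device:
        t0 = time.perf_counter()
        run_synchronous(blocks)
        t1 = time.perf_counter()
        run_asynchronous(blocks, device)
        t2 = time.perf_counter()
    print(f"sync {t1 - t0:.3f}s, async {t2 - t1:.3f}s")
```

In the asynchronous version the caller is free to queue more work while earlier requests are in flight, which is the effect the speaker hopes to gain.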
B
Sorry, so one request can do both compression and checksum. This way we can save some offloading cost. And QAT also has another feature called Key Protection Technology, which can be used to protect keys in the hardware. So we are thinking of using this feature to protect the master key in ZFS encryption, so the master key will not be exposed outside the hardware, to avoid memory snooping attacks.
B
D
B
E
B
C
B
So there is some latency introduced with the offloading method, but for compression, especially with a big block size, for example 128 kilobytes, even adding this offloading cost it still has better latency than the software, because the software is really slow. But if you have a very small block size, less than 4 kilobytes, for example 512 bytes, it may have worse latency compared with software. Yeah.