Description
TiKV is a strongly consistent key-value database built upon the Raft algorithm. It stores data in basic units called Regions. Multiple replicas of a Region form a Raft group. When a read hotspot appears in a Region, the Region leader can become a read bottleneck for the entire system. In this situation, enabling the Follower Read feature can significantly reduce the load on the leader and improve the read throughput of the whole system by balancing the load among multiple followers. In this talk, we will walk you through the TiKV architecture, why we introduced Follower Read, and how we implemented it.
Presenter:
Minghua Tang, Infrastructure Engineer @PingCAP
A
Okay, let's get started. I'd like to thank everyone who is joining us today. Welcome to today's CNCF webinar, "How We Doubled System Read Throughput with Only 26 Lines of Code." I'm Jerry Fallon, and I'll be moderating today's webinar. We would like to welcome our presenter today, Minghua Tang, Infrastructure Engineer at PingCAP. Just a few housekeeping items before we get started. During the webinar, you are not able to talk as an attendee. There is a Q&A box at the bottom of your screen.
A
Please feel free to drop your questions in there, and we'll get to as many as we can at the end. This is an official webinar of the CNCF and, as such, is subject to the CNCF Code of Conduct. Please do not add anything to the chat or questions that would be in violation of the Code of Conduct. Please be respectful of your fellow participants and presenters. Also note that today's recording and slides will be posted later today on the CNCF webinar page at cncf.io/webinars.
B
Okay, thanks, Jerry. Hello, everyone. Thank you all for joining our webinar today. It's midnight now, but still, good morning, good afternoon, and good evening to all of you around the world. It's a great honor for me to talk about TiKV and share how we improved the system's read throughput. Today, I'm going to focus on the Follower Read feature, which helps us improve the read throughput of the whole system in some cases and opens up many other possibilities.
B
As you can see from the title, this feature was implemented in a very simple way, with only 26 lines of code. So are you curious about how and why we made it? Okay, let's get started, and I hope you will enjoy this webinar today.
B
I focus on TiKV's stability and on building its ecosystem tools, and I'm passionate about databases, especially the storage layer. So if you share this interest with me, I'd like to communicate with you about this field. Okay, let's look at the agenda for this webinar.
B
This webinar is going to focus on Follower Read, but before we dive deep into the technical details of Follower Read, I'd like to introduce TiKV briefly in case you are not familiar with it. Then I will talk about the general use cases of Follower Read before the Q&A. As mentioned, type your questions in the Q&A box during this webinar, and I will answer them after I finish the presentation. Okay, so what is TiKV?
B
"Ti" is short for titanium, a chemical element known for its corrosion resistance and widely used in high-end technologies. "KV" is easy to understand: it stands for key-value database. So, from its name, TiKV is a key-value database with strong stability.
B
Besides that, TiKV is a distributed, transactional key-value database, which means it supports transactions, and, of course, it's open source.
B
It was actually created by PingCAP as the underlying storage engine for TiDB. Its design is inspired by Google Spanner and HBase, but the design is simpler and more practical.
B
So what does TiKV offer as a key-value store? TiKV provides a general key-value-oriented interface.
B
It's achieved by supporting geo-replication. The data you store in TiKV has multiple replicas. In a three-replica cluster like this one, the service will not be affected if one replica is down, and the service will maintain strong consistency.
B
I will talk about strong consistency later. The replicas can be scheduled to different physical locations, so the cluster can survive machine failures or even data center failures.
B
Transactions can be pretty useful when you want to operate on multiple key-value pairs and want the operations to happen all at once or not at all.
B
Okay, this is the overall architecture of a single TiKV node. As you can see, it runs like a normal application. Clients connect to the node using the gRPC protocol.
B
The fundamental part is a local storage engine, namely RocksDB. It's not distributed; it does not have any knowledge about replicas or shards. On top of the storage engine, there is a consensus module based on Raft. It handles replication through consensus, with strong consistency.
B
From the cluster's point of view, the data is cut into multiple shards, which we call Regions. The shards are distributed across multiple TiKV nodes. We also have the Placement Driver (PD). It acts as the brain of the cluster and is responsible for managing and scheduling the Regions.
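Editor's sketch: the idea of cutting the key space into Regions can be illustrated with a range lookup. This is a minimal sketch that assumes each Region owns a contiguous key range starting at its start key; the table contents and the `locate_region` helper are hypothetical and are not TiKV's or PD's actual API.

```python
import bisect

# Hypothetical routing table: Region i covers keys from REGION_START_KEYS[i]
# (inclusive) up to the next start key (exclusive). Values are illustrative.
REGION_START_KEYS = [b"", b"g", b"n", b"t"]
REGION_IDS = [1, 2, 3, 4]


def locate_region(key: bytes) -> int:
    """Return the id of the Region whose key range contains `key`."""
    # bisect_right finds the first start key greater than `key`;
    # the Region just before that position owns the key.
    idx = bisect.bisect_right(REGION_START_KEYS, key) - 1
    return REGION_IDS[idx]
```

In a real cluster the routing information comes from the Placement Driver and changes as Regions are split and scheduled; the static table here only shows the lookup idea.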
B
It also allocates the timestamps for transactions. So that's the basic architecture of TiKV.
B
Follower Read is built upon the Raft consensus algorithm, so now we can take a quick look at Raft. As you can see, the consensus algorithm is the most important algorithm in distributed systems.
B
The only thing followers do is listen to the leader. If they don't get any message at all from a leader for a while, they will conclude that the leader may be down and will become a candidate, at which point they try to win an election to become the leader. If they win the election and become the leader, they stay leader until they fail and step down, going back to being a follower again.
B
Once we have a leader, if a client wants the state machine to execute a command, it sends the message to the leader. The leader adds that command to its log and sends it out to the followers, asking them to append that log entry to their logs. Once the leader gets responses from a majority of the nodes, including itself, that log entry is considered committed, and then it will be sent to the state machine to execute the command.
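Editor's sketch: the majority rule described above can be written down in a few lines. This is a simplified illustration, not TiKV's Raft implementation; it ignores terms, elections, and retries, and the `Leader` class and its method names are hypothetical.

```python
class Leader:
    """Toy Raft leader that tracks which entries a majority has stored."""

    def __init__(self, cluster_size: int):
        self.cluster_size = cluster_size
        self.log = []          # commands; entry with index i is log[i - 1]
        self.acks = {}         # log index -> set of node ids holding the entry
        self.commit_index = 0  # highest committed index (0 means none)

    def propose(self, command: str) -> int:
        """Append a client command to the leader's own log."""
        self.log.append(command)
        index = len(self.log)
        self.acks[index] = {"leader"}  # the leader counts toward the majority
        return index

    def on_append_ack(self, index: int, follower_id: str) -> int:
        """Record that a follower appended the entry; advance commit_index."""
        self.acks[index].add(follower_id)
        majority = self.cluster_size // 2 + 1
        # Entries commit in order, so only move forward one index at a time.
        while (self.commit_index < len(self.log)
               and len(self.acks[self.commit_index + 1]) >= majority):
            self.commit_index += 1
        return self.commit_index
```

With three replicas, one follower acknowledgement plus the leader's own copy is enough to commit an entry, which is why the cluster keeps working when one replica is down.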
B
It will then reply to the client. One thing you must remember is that the applied index on the leader and the applied index on a follower are not necessarily equal, and the applied index and the committed index are also not necessarily equal.
B
Now, the normal reads on the leader. Read operations do not need to replicate anything to other nodes, and, as mentioned before, the client gets a reply only after the leader executes the command.
B
So now, let's talk about how Follower Read was implemented. As the name indicates, the idea of Follower Read is reading from the state machine on followers. But how can we ensure that we get the latest state from the followers?
B
A key operation is called ReadIndex. From the previous introduction, it's easy to see that the committed index indicates the last command that will eventually be executed.
B
So there are three steps in Follower Read: get the read index from the leader, wait for the local state machine to execute the commands up to that committed index, and then read from the local state machine. But we have an exception here. As I mentioned a little earlier, in TiKV we implement pipelined Raft: the stages are executed in a pipelined way.
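Editor's sketch: the three steps above can be simulated in memory. This is a minimal sketch under strong assumptions: replication is modelled by appending to the follower's log by hand, "waiting" is replaced by applying entries synchronously, and the `Replica` class and `follower_read` function are hypothetical names rather than TiKV code.

```python
class Replica:
    """Toy replica with a log, a committed index, and an applied index."""

    def __init__(self):
        self.log = []             # (key, value) entries; index i is log[i - 1]
        self.committed_index = 0  # highest index known to be committed
        self.applied_index = 0    # highest index executed by the state machine
        self.state = {}           # the local state machine

    def append_committed(self, key, value):
        """Append an entry and mark it committed (leader side of the toy)."""
        self.log.append((key, value))
        self.committed_index = len(self.log)

    def apply_up_to(self, index):
        """Execute log entries until applied_index reaches index, if present."""
        while self.applied_index < min(index, len(self.log)):
            key, value = self.log[self.applied_index]
            self.state[key] = value
            self.applied_index += 1


def follower_read(leader, follower, key):
    # Step 1: ask the leader for its current committed index (ReadIndex).
    read_index = leader.committed_index
    # Step 2: wait until the follower has applied up to that index.
    follower.apply_up_to(read_index)
    if follower.applied_index < read_index:
        raise RuntimeError("follower has not received entries up to read index")
    # Step 3: read from the follower's local state machine.
    return follower.state.get(key)
```

The follower still contacts the leader once per read, but only to fetch an index; the data itself is served locally, which is where the load on the leader is reduced.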
B
That means committing a log and applying a log happen asynchronously, so a follower may execute commands faster than the leader.
B
So here, from the timeline, the follower returns a newer value than the leader. That breaks linearizability, because we have already read the value we just wrote to the system, but then we cannot get it from the system again. But what about snapshot isolation?
B
So, let's briefly talk about transactions in TiKV. Transactions in TiKV are implemented with two-phase commit (2PC), similar to Google's Percolator. Every key has three columns: lock, write, and default. Here we can ignore the default column. A lock represents that a transaction wants to write something to this key, and a write represents that a transaction was committed before. When you create a transaction, it gets a timestamp from PD, which we call start_ts. If you try to read from TiKV, it checks whether there is a lock. If there is no lock, or the timestamp of the lock is bigger than start_ts, it gets the data from the latest write record whose timestamp is smaller than start_ts.
B
If you want to write something, the data will first be buffered in the client. When you try to commit the transaction, it will choose one key as the primary key and check whether any other transaction has acquired the lock on any key you want to write. It will also check whether any transaction with a bigger timestamp has been committed, by checking the write column.
B
So if there are no conflicts, it will write a lock with start_ts to this key. This stage is called prewrite.
B
After it acquires the locks on all of the keys, it will get a new timestamp as commit_ts. Then, if the lock still exists, it will delete the lock and write a new write record atomically. Once the primary key is committed, the transaction is considered committed.
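Editor's sketch: the lock and write columns and the prewrite, commit, and read rules described above can be modelled for a single node. This is a simplified Percolator-style illustration, not TiKV's transaction code; it assumes unique timestamps, leaves out the primary key and lock resolution, and the `Store` class and `ConflictError` are hypothetical names.

```python
class ConflictError(Exception):
    """Raised when a lock or a newer committed write blocks the operation."""


class Store:
    def __init__(self):
        self.lock = {}     # key -> start_ts of the transaction holding the lock
        self.write = {}    # key -> list of (commit_ts, start_ts) records
        self.default = {}  # (key, start_ts) -> value

    def prewrite(self, key, value, start_ts):
        # Conflict if another transaction already holds the lock.
        if key in self.lock:
            raise ConflictError(f"{key!r} is locked")
        # Conflict if a transaction with a bigger timestamp already committed.
        if any(commit_ts > start_ts for commit_ts, _ in self.write.get(key, [])):
            raise ConflictError(f"{key!r} has a newer committed write")
        self.lock[key] = start_ts
        self.default[(key, start_ts)] = value

    def commit(self, key, start_ts, commit_ts):
        # The lock must still exist; then remove it and add the write record.
        if self.lock.get(key) != start_ts:
            raise ConflictError(f"lock on {key!r} is gone")
        del self.lock[key]
        self.write.setdefault(key, []).append((commit_ts, start_ts))

    def get(self, key, start_ts):
        lock_ts = self.lock.get(key)
        # A lock from an earlier transaction blocks the read.
        if lock_ts is not None and lock_ts <= start_ts:
            raise ConflictError(f"{key!r} is locked by an earlier transaction")
        visible = [(c, s) for c, s in self.write.get(key, []) if c < start_ts]
        if not visible:
            return None
        _, data_ts = max(visible)  # latest write record before start_ts
        return self.default[(key, data_ts)]
```

A reader with a given start_ts only sees write records with a smaller commit timestamp, so a transaction that has not finished committing stays invisible to it.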
B
That's why this does not hurt snapshot isolation: because we have not completed the prewrite yet, the start_ts of any other transaction must be smaller than the commit_ts of our transaction.
B
The first thing I want to clarify is that, generally, Follower Read is not helpful for performance, because we have the Placement Driver, and it will balance the leaders across the TiKV nodes.
B
A learner works in a similar way to a follower, except that it does not participate in elections. So we place the learners on TiFlash to perform analytical processing.
B
Another use case is when you deploy the cluster across multiple data centers: you can perform reads in the nearest data center, reducing the use of bandwidth across data centers. For example, CockroachDB has used follower reads to reduce multi-region latency.
B
So I think that is everything I'd like to cover for now. If you'd like more information about Follower Read or TiKV, you can contact me by email, go to our website, GitHub, and Twitter, or talk with us on the Slack channel. We're happy to hear your questions or feedback.
A
It doesn't look like anyone has any questions, so, that being said, I think we're going to wrap up this webinar. I want to thank Minghua again for a great presentation today. As I said earlier, today's webinar will be available on the CNCF website, along with the slides. Thank you again, everyone, for joining today's webinar. Have a great day.
B
Yeah, providing weak consistency is useful for some users, and we can provide some way to reduce the consistency level, so that users get some weaker consistency.
B
Yeah, we can. Actually, we have a label for every TiKV node. We can set a label for every TiKV node, and we can mark on the request which label it wants to read from.
A
All right, I think we're going to wrap it up here. Thank you again for a wonderful presentation today. As said before, today's webinar will be available on the website, along with the slides. Thank you again, Minghua, for your presentation, and to all the attendees, have a great rest of your day.