Description
Vector search engines have recently been gaining popularity. In this session, we will explain what they can achieve, peek under the hood to see what is inside, and show why Rust is a good fit for the job.
---
Arnaud Gourlay is a software engineer at Qdrant, where he contributes to the Qdrant vector search engine. A long-time OSS user and maintainer, he enjoys programming in Rust and sharing his knowledge.
So first, who am I? Now you know: I'm Arno, a software engineer and a full-time contributor on the Qdrant project. My two favorite things are open source software and Rust, so I'm very grateful to be able to work on the Qdrant project. I myself maintain a bunch of open source projects; you can find those on my GitHub profile. I also have a blog where I write about my projects and about Rust, and I particularly enjoy writing about the performance aspects of developing in Rust. Cool.
So, the agenda. Of course we're going to talk about Qdrant, but this won't be a sales pitch. I will give you a general introduction to vector search engines, so that you can transfer what you learn today to other products. We'll also go over a couple of use cases for the technology, because good technology without good use cases is not really interesting.
Okay. So what is Qdrant? Now you know: it's an open source vector search engine fully written in Rust, and we do embrace the Rust ecosystem, that is, we have a lot of dependencies. We do suffer from time to time from long build times, but that comes with the territory. Concretely, you interact with Qdrant via an HTTP JSON API, or gRPC if you prefer, and we have three official clients for that: Python, Rust and Go. It is also possible to deploy Qdrant in a distributed setup, which is nice for scaling.
To understand what Qdrant does, it's good to have a look at the evolution of search over the past years. Traditionally, search has been about having an exact match on a text keyword; if not exact, then maybe relying on a list of synonyms, or on a fuzzy match with something like an edit distance to get a better match. However, if we look at the internet, we have a growing amount of unstructured data. We have collections of text files, images, videos, audio files, and we have trouble making sense of all this data.
Also, if you use the internet, I'm sure you've noticed that all websites now do recommendations. You go on e-commerce websites, they offer recommendations for products. You want to watch a video? Here you go, you should watch this video, you should listen to this podcast, and so on. This changing trend can be described under the umbrella of similarity search. This is what we're going to have a look at.
The easiest example of similarity search is the Google Lens project, which performs reverse image search. Concretely, you take a photo, you upload it to the Google Lens service, and it will first find pictures that visually match your photo, but it will also give you additional metadata. In that case, I take a picture and Google tells me: this picture is this flower. The first time I used this service I was really blown away, and I was wondering how it works. I don't know if you've thought about it yourself: how would you implement something like that? Would you maybe do a pixel-by-pixel comparison, which sounds a bit difficult, or maybe rely on some complicated edge-detection computer vision library? I'm not sure this would be as general as what vector search proposes.
On the left-hand side we have classical lexical search with BM25, which relies on keywords, and on the right-hand side we have the new neural search approach. We can see that for the classic approach, the first result is about capital punishment as it has existed in the United States since some date, I guess, and the second and third results are not correct either. It seems that the lexical search has removed the stopwords in the query and has really focused on a couple of keywords. The problem here is that keywords, I mean words, have different meanings depending on the context, and this really trips up the lexical search approach. Whereas on the right-hand side we have the neural search, and here we see the top hit is what we want: Washington DC. The neural search has been able to really understand the query in its entirety, according to the context, so neural search is pretty good at answering long queries.
Similarity search is done by using so-called similarity models, and those models are trained via machine learning. There are a lot of models online that you can take off the shelf, on websites like Hugging Face, but you have to be careful: a specific model will not necessarily fit your domain. When you pick a model, you have to make sure that it has been trained on data that is really close to what you're doing. The alternative is otherwise to build the model yourself, and if the model is large, that can get pretty expensive. For the remainder of this presentation, I will assume that we have a working model; otherwise, those are more machine learning concerns.
What is a model, anyway? Concretely, your model takes a complex object, let's say a picture or an MP3 file, and it reduces it into a vector. In ML jargon, those vectors are called embeddings. And now the secret sauce of similarity models, maybe the most important information to remember from this talk: similar inputs produce vectors that are close to each other in space. Here we have an example.
We have a query and a document that we both encode using a neural encoder. We then have a query embedding and a document embedding. Those are two vectors, and we're able to compare those vectors by relying on the cosine similarity metric. Cosine similarity relies on the angle formed between the two vectors to compare them. That's the magic. Here is the first, let's say, checkpoint, a little executive summary about vector search engines.
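As a minimal sketch of that comparison (my own illustration, not Qdrant code), cosine similarity between two embeddings can be computed like this:

```rust
/// Cosine similarity: dot(a, b) / (|a| * |b|).
/// 1.0 means the vectors point in the same direction, 0.0 means orthogonal.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|y| y * y).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

fn main() {
    // Same direction: similarity 1.
    println!("{}", cosine_similarity(&[1.0, 0.0], &[2.0, 0.0]));
    // Orthogonal: similarity 0.
    println!("{}", cosine_similarity(&[1.0, 0.0], &[0.0, 1.0]));
}
```

Only the angle between the vectors matters here, not their magnitudes, which is why scaling an embedding does not change its cosine score.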
Now we know that vector search engines enable similarity search use cases like semantic search for text, similar images, and recommendation systems. They mostly have two roles. They perform storage, that is, persisting vector embeddings durably. And they serve the search use case, that is, finding the vectors that are the most similar to an input vector. Obviously, the whole game here is to do those things very efficiently.
Let's make this very concrete with an example that is taken from a blog post. The goal is not to look at every line, because we don't need all of it; we're just going to understand what's going on. We're going to use the Python driver here, because this is what most data scientists are using. I picked this first snippet because I wanted you to really understand that it's really that easy to load a model in Python using the sentence-transformers package: just one line, model = SentenceTransformer(name), and now you have a model to work with. The next step is to create a Qdrant collection. The collection is the abstraction that holds your vectors; that's what you're going to work with. On top of doing the networking configuration, we create the collection by giving it a name, and then the most important part is the vector configuration: we have to inform Qdrant of the dimension of our vectors.
Here the size is 384, and that's absolutely not a random number. This number comes from the model: the model produces vectors that have this size, so it must be an exact match. Then we also tell Qdrant which distance metric we're going to use. Previously we've seen cosine similarity; here we use the dot product, so it's possible to pick whichever distance we want.
Once we've created our collection, we can ingest data into it, and the interesting bit is to perform a search. Here we have a search term and we want to search our collection with that term. We cannot just send that search term, "Earth observation", to Qdrant: Qdrant doesn't understand strings, it understands embeddings. This means you have to encode the search term as well. Then you can send it via the search function on the driver.
Something specific about Qdrant: we are not dealing only with vectors. So that you can have more interesting use cases, you can attach additional data to your vectors. On the right-hand side we have an example where we insert three points into a collection. A point is something that has an ID, a vector and a payload, and the payload is just a JSON object: additional data you can attach, which can then be used for filtering.
So if you think about it, not only can you ask the question "please find a restaurant that is similar to this restaurant", you can also say "and please make sure that it is within a radius of around 500 meters from my place". This is pretty neat. We support different types of payload; here I mentioned geo coordinates, but we have of course text, numbers and so on. All right, let's dig a bit deeper now: how would you implement vector search yourself?
I always ask myself this when I use a tool. One way you could do it is simply to use a SQL database. You create a table; for instance, if we have vectors with four dimensions, a table with just four columns w, x, y, z, and you just insert all your vectors in there, and you're done. Now we handle a search: what do we do? We have an input vector coming in. We need to compute the similarity between our vector and each vector in our storage. So we go through the table, and on our way we make sure to remember which of those vectors has the highest similarity. Then we have it, and we can return it to the user. I guess you understood: we just performed a full table scan.
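That full-table-scan approach can be sketched in a few lines (my own illustration, not Qdrant's storage layer): score every stored vector against the query and keep the best one.

```rust
/// Dot-product similarity between two vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Brute-force nearest-neighbour search: a "full table scan" over the
/// storage, remembering the vector with the highest similarity so far.
fn nearest(query: &[f32], storage: &[Vec<f32>]) -> Option<(usize, f32)> {
    storage
        .iter()
        .enumerate()
        .map(|(id, v)| (id, dot(query, v)))
        .max_by(|(_, s1), (_, s2)| s1.total_cmp(s2))
}

fn main() {
    let storage = vec![vec![1.0, 0.0], vec![0.0, 1.0], vec![0.7, 0.7]];
    let query = [0.6, 0.8];
    // Prints the id of the most similar vector and its score.
    println!("{:?}", nearest(&query, &storage));
}
```

This is O(n) per query in the number of stored vectors, which is exactly why it stops being viable as the collection grows.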
This kind of approach can work fine if you have a small data set, I would say between 10,000 and 100,000 vectors, depending on the dimension of the vectors, but quickly you're going to run into problems. Generally, if you are in the database business and you are running into full table scans, you think: I know, I need an index. We need an index here as well, but we need a special index.
So, a little reminder here: an index is really a structure that is optimized for reads and that we build ahead of time to support our queries. Unfortunately, you cannot just use traditional indexes here, like slapping on a posting list or a B-tree. Fortunately, there are already specialized indexes for vectors; that's not something new. We have the entire geospatial world there: you have indexes like KD-trees, which enable you to do geo-coordinate queries.
Once you have those indexes, you're able to perform what is called a k-nearest-neighbours search, that is, a search that finds the nearest neighbours of a point. However, the problem with those space-partitioning techniques is that they are too expensive in high dimensions.
This is known as the curse of dimensionality in the literature. Basically, all the tools we have regarding algorithms and data structures start to break apart as the number of dimensions increases, but we need to handle very large dimensions. So how are we going to go about it? As usual, one way to fix your problem is to relax the constraints, and this is what we're going to do with something called ANN search. ANN stands for approximate nearest neighbours, and here the constraint we relax is that we're not going to ask for the exact, precise result; we're going to say: I'm okay if this is an approximate result. So the trade-off you're making here is really precision versus speed. Here I have a screenshot that shows various algorithms, because this is really an active area of academic research.
A lot of people are working on this, and you can see the precision on the bottom: if you can read it, you see one half, so you have 50 percent, and it goes to 90 percent, 99, 99.9. So you see, we're talking about approximate results, but we are really well over 99 percent. It's highly dependent on your data set, but this is the kind of precision we're talking about. There are many good algorithms there. Globally, there are three categories: you have the tree-based ones, the hashes, and the graphs. For Qdrant, we decided to go with a graph-based algorithm because it's more flexible, and the one we picked has pretty good performance.
HNSW stands for Hierarchical Navigable Small World. Let me try to explain it properly.
The trick here is that we separate the links according to their length scale into different layers. On top we have the long links, in the middle the middle-length links, and at the bottom we have the short links. This way, we're able to use our graph as a skip list. If you remember skip lists, you jump to a denser representation as you go down. If you want a metaphor: let's say you want to travel to the other side of the planet. First you're going to take a plane, that's your layer 2. Then maybe you're going to take a train, that's your layer 1, and then maybe you're going to take a taxi, that's your layer 0. That's how to think about it. Then you go through this graph performing a greedy search: you take your query and you find the point that is the most similar to your query on layer 2. Once you have the most similar point, you go down one layer and repeat this until you reach layer 0, where hopefully you find the best result. This process is approximate because we encode only at most M friends, that is, the M nearest neighbours per vector, and also because at each layer we don't perform an exhaustive search: we only look at ef_search nodes per layer.
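To make the greedy part concrete, here is a toy single-layer version (my own sketch with made-up data, not Qdrant's implementation): start from an entry point and keep moving to whichever neighbour is more similar to the query, stopping at a local optimum.

```rust
/// Dot-product similarity.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Greedy search on one HNSW-style layer: `graph[i]` lists the
/// neighbours of node i; hill-climb towards the node most similar
/// to the query until no neighbour improves on the current node.
fn greedy_search(graph: &[Vec<usize>], vectors: &[Vec<f32>], entry: usize, query: &[f32]) -> usize {
    let mut current = entry;
    let mut best = dot(query, &vectors[current]);
    loop {
        // Among the neighbours, find the one that improves similarity most.
        let improved = graph[current]
            .iter()
            .map(|&n| (n, dot(query, &vectors[n])))
            .filter(|&(_, s)| s > best)
            .max_by(|(_, a), (_, b)| a.total_cmp(b));
        match improved {
            Some((n, s)) => {
                current = n;
                best = s;
            }
            None => return current, // local optimum: greedy search stops here
        }
    }
}

fn main() {
    // A tiny chain graph: 0 - 1 - 2, with vectors of increasing similarity.
    let graph = vec![vec![1], vec![0, 2], vec![1]];
    let vectors = vec![vec![0.1, 0.0], vec![0.5, 0.0], vec![0.9, 0.0]];
    println!("{}", greedy_search(&graph, &vectors, 0, &[1.0, 0.0])); // walks 0 -> 1 -> 2
}
```

In real HNSW this greedy descent runs on every layer, using the result of one layer as the entry point of the next, and the "local optimum" stopping rule is one of the sources of approximation.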
There is a reference implementation of HNSW, but we could not use it, for two reasons. First, the reference implementation is in C++, and second, we need to support payload filters. Remember, we do not only look for the closest points; we also want to make sure that they match specific criteria. This enables many more interesting use cases. If you think about it, there are two approaches to do filtering on HNSW. You could do what is called post-filtering: first you go through your index, you find all the similar points, and then you filter that result set. This has a problem, because it forces you to over-fetch from your index and then hope that you'll have enough points left after applying the filters. The other approach is pre-filtering: first you go through some kind of index to find all the points that match a specific filter, and once you have this result set, you check which ones are the closest to your input query.
In that case, though, you do not even use your HNSW index, which is a bit sub-optimal. Let's say those two approaches were not good enough for us, so we have a third approach, where we actually enrich our graph with additional links based on the payload of the indexed points. This approach enables us to perform the search in a single pass, call it single-stage search, and that's really one of the key points that makes Qdrant fast.
If you are interested in the details of this approach, I can recommend having a look at this blog article. All right, let's go back to the distance metrics. By now you should understand that computing the distance is a very, very common operation at runtime: we do it a lot during indexing, when building those layers, and also at search time, when going through the graph.
We have several similarity metrics available; which one to use kind of depends on your encoder, on your model. Internally, we've decided to use a trait to abstract away the concrete metric.
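The shape of that abstraction looks roughly like this (an illustrative sketch with hypothetical names, not Qdrant's actual trait):

```rust
/// Hide the concrete distance metric behind a trait, so the index and
/// search code don't care which metric a collection was created with.
trait Metric {
    fn similarity(a: &[f32], b: &[f32]) -> f32;
}

struct DotProduct;
impl Metric for DotProduct {
    fn similarity(a: &[f32], b: &[f32]) -> f32 {
        a.iter().zip(b).map(|(x, y)| x * y).sum()
    }
}

struct Cosine;
impl Metric for Cosine {
    fn similarity(a: &[f32], b: &[f32]) -> f32 {
        let dot = DotProduct::similarity(a, b);
        let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
        let nb = b.iter().map(|y| y * y).sum::<f32>().sqrt();
        dot / (na * nb)
    }
}

/// Generic code can then score candidates with any metric.
fn score<M: Metric>(query: &[f32], candidate: &[f32]) -> f32 {
    M::similarity(query, candidate)
}

fn main() {
    let (a, b) = ([1.0_f32, 2.0], [3.0_f32, 4.0]);
    println!("{}", score::<DotProduct>(&a, &b)); // 11
}
```

Because the metric is a type parameter, the choice can be monomorphized away, so the abstraction costs nothing in the hot loop.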
So, the dot product, a quick reminder from high school maybe: when you have two vectors and you want to compute a dot product, you can do it by summing the pairwise multiplications. That's the simple definition, and here we are able to do it using the nice iterators from Rust. First we take the first vector, we zip it with the second vector, and as we go through it we perform the multiplications, and at the end we reduce it with a sum.
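That iterator chain, written out (a sketch of the idea rather than Qdrant's exact source):

```rust
/// Dot product as described: zip the two vectors, multiply pairwise,
/// and reduce with a sum.
fn dot_iter(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b.iter()).map(|(x, y)| x * y).sum()
}

fn main() {
    println!("{}", dot_iter(&[1.0, 2.0, 3.0], &[4.0, 5.0, 6.0])); // 32
}
```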
Let's say we ship this and we start doing a lot of search or indexing on it, and let's have a look at how the system responds. You're doing a lot of searches, you want to see what's going on on your CPU, so you start profiling it, and then you find out that 70% of your CPU time is spent in the sum function of the iterator.
As you can see, we can go through the stack from bottom to top. We start at the bottom with our Tokio runtime, then we go up through our vector storage abstraction until we reach something called the HNSW index graph layers. That should ring a bell if you remember what we talked about. And then this is the point where we try to find the points that are the most similar to each other, by calling the dot similarity function, which is basically our iterator implementation.
So you see this, and you wonder: is it slow? Well, to be honest, you don't know if it's slow. To know if it's slow, you would need a different implementation to compare it with, right? For now, you just know that this implementation is CPU-bound and that it is the bottleneck, and this means that if you remove the bottleneck, you most likely have huge gains. Of course, this is what we did, and to do so we used SIMD. SIMD stands for Single Instruction, Multiple Data.
For x86 you have the old SSE family, or AVX, with the latest one being AVX-512; for ARM you have NEON. Those instruction sets all have different widths. The more modern instructions have larger widths, which tell you how much data you can stick into them, and as those instruction sets get better, they gain new functions to combine results.
There are two ways to use SIMD: either you hand-code the intrinsics yourself, or you rely on auto-vectorization from the compiler. To be honest, if you really want to make sure you're using SIMD, you have to hand-code it; I would never rely on auto-vectorization.
Broadly speaking, there are two classes of SIMD instructions. You have the horizontal ones, which take a single vector and apply an operation to it in order to reduce it; in that example, a horizontal add, we add all the elements. Or you have the vertical operations, which take two vectors and apply the operation in a pairwise fashion, and at the end you get a vector; in this example, a vertical add.
Different CPUs support different instruction sets, and there are two ways to solve this problem. You could do it statically, that is, produce different binaries for each microarchitecture. This is doable, but it will put a lot of stress on your release pipeline. We went with a different approach: we detect the features supported by the CPU dynamically, at runtime. You can see we have those configuration blocks; the first one is for x86 AVX, the other one for x86 SSE.
This is what SIMD code looks like. First, it's pretty unsafe, even internally: to make it work, you have to rely on raw pointers to load the data. In this case this is AVX2, I believe, so we use 256-bit registers. The trick to remember here is that we have a 256-bit register and we're working with f32 floats, so we're able to fit eight floats into the register, and the magic is that we're going to process the dot product eight floats at a time. The for loop here goes through the two arrays and performs the multiplication and addition with the FMA (fused multiply-add) instruction. I won't go through the details, because the code can be a bit tedious, but at the end we do a sum: in that 256-bit register we have eight floats that we somehow need to reduce, so there are many casts, and out comes an f32.
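Put together with the runtime feature detection mentioned earlier, a simplified version of this could look as follows (my own sketch, not the production implementation; it assumes an x86_64 target and falls back to the scalar version elsewhere):

```rust
#[cfg(target_arch = "x86_64")]
mod simd {
    use std::arch::x86_64::*;

    /// AVX2/FMA dot product: eight f32 lanes per 256-bit register.
    /// Safety: the caller must ensure the CPU supports AVX2 and FMA.
    #[target_feature(enable = "avx2", enable = "fma")]
    pub unsafe fn dot_avx2(a: &[f32], b: &[f32]) -> f32 {
        assert_eq!(a.len(), b.len());
        let mut acc = _mm256_setzero_ps();
        let chunks = a.len() / 8;
        for i in 0..chunks {
            // Load 8 floats from each slice via raw pointers.
            let va = _mm256_loadu_ps(a.as_ptr().add(i * 8));
            let vb = _mm256_loadu_ps(b.as_ptr().add(i * 8));
            // acc += va * vb, eight lanes at a time (fused multiply-add).
            acc = _mm256_fmadd_ps(va, vb, acc);
        }
        // Reduce the 8 accumulator lanes to a single f32.
        let mut lanes = [0.0_f32; 8];
        _mm256_storeu_ps(lanes.as_mut_ptr(), acc);
        let mut sum: f32 = lanes.iter().sum();
        // Handle the tail that didn't fill a full register.
        for i in chunks * 8..a.len() {
            sum += a[i] * b[i];
        }
        sum
    }
}

/// Plain iterator version, used as the portable fallback.
fn dot_scalar(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Runtime dispatch: use SIMD only when the CPU actually has it.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
            return unsafe { simd::dot_avx2(a, b) };
        }
    }
    dot_scalar(a, b)
}

fn main() {
    let a: Vec<f32> = (0..20).map(|i| i as f32).collect();
    let b: Vec<f32> = a.iter().map(|x| x * 2.0).collect();
    assert_eq!(dot(&a, &b), dot_scalar(&a, &b));
    println!("{}", dot(&a, &b)); // 4940
}
```

Note that the tail loop matters: real vector dimensions are rarely a multiple of eight.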
Oh, by the way, this is a simplified version. The real version is called dot_avx; you can find it in our code base. It doesn't fit on a slide, but if you want to go fast, sometimes you have to get your hands dirty.
Is it any good? Well, here we have a Criterion benchmark that compares our iterator-based dot product function with the SIMD one. It's important to be precise here: the data set uses vectors with 1000 dimensions, so that's pretty large, and we see a 10x improvement. As we are processing eight floats at a time, we could already expect 8x, and we get a little bit more because we are also making the addition a bit faster. When you see benchmarks like that, you're always very happy, but the question is always: how does this perform in the end-to-end system?
Now we're going to see how our SIMD impacts indexing. For the indexing benchmark, I took 100,000 vectors of dimension 500 and indexed them on my laptop. The absolute numbers here are not very interesting, it's done on my laptop and I'm pretty sure other processes were running at the same time, but we're building this HNSW index with our dot products and we can compare the two results. We can see that the iterator-based dot product takes about 300 seconds, and when using the SIMD version we are under 100 seconds.
So it's a really great gain; we basically changed around 150 lines of code. You would really miss out by not using SIMD for this use case. And interestingly, before using SIMD we were using the OpenBLAS project, which is kind of the reference in terms of speed for linear algebra operations. It's written in C and Fortran, and thanks to the work of the team in optimizing our SIMD operations, we now have similar performance to that project, which is really a great achievement.
A bit about the architecture of the project. As I said, we have a collection that holds vectors. Internally, a collection is composed of several segments, and those segments are created on demand as you add more points. Each segment is independent in terms of storage. Inside a segment we have the vector storage and the payload storage, handled independently; both currently sit within RocksDB, and we use the Rust RocksDB bindings there. As for the HNSW index, we save it in a binary file that we handle ourselves. Those segments have a life cycle: they can be full, and then you need to create a new one, and you still need to be able to delete things in there. So we have a couple of optimizers running in the background that perform batch deletions, go through the tombstones, sometimes merge two segments together, and so on.
Concurrent programming. I think by now you understand that Qdrant is inherently stateful. We have to juggle segments, indexes, different storages, and it's quite complex; at the same time, we need to manage concurrent access to this data. In Rust you have, broadly speaking, two fashions of doing concurrent programming. Either you share data between threads via channels, which is a bit akin to the actor model, where the mutable state is enclosed within threads, or you do thread synchronization with concurrency primitives such as Mutex and RwLock.
We use both flavors depending on the use case, but we have had a lot more, let's say, fun with the mutexes and RwLocks, so this is where I'm going to go a bit deeper. Before talking about the RwLock: a Mutex protects a piece of data and enforces a single thread accessing it at a time, so it's very strict. If you want something less strict, you have the RwLock, which gives you several readers at a time, or one writer.
The beautiful thing about those APIs in Rust is that proper usage is enforced by the type system. On the right-hand side is an example that I shamelessly took from the standard library: you can see there is no unlock method when you acquire a lock. There's no unlock to do, and this is a bit difficult to understand for people coming to Rust. What's happening here is that when you acquire a lock, if you do lock.read() you get what's called a read guard, and if you do lock.write() you get a write guard. As long as you hold this guard, you are holding the lock: either you drop the guard explicitly, or you wait for it to go out of scope, to release your lock.
The nice thing about this API as well is that when you have a read guard, you can access the data, because the read guard implements Deref, and in the same way, when you have a write guard, you can access the data and get an exclusive reference out of it, because it implements DerefMut. Those APIs are really nice. We use a lot of locks in our code base, but mostly locks from parking_lot, which is another crate with more concurrency primitives; they are generally faster and lighter.
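The guard behaviour described here can be seen in a tiny example using the std RwLock (parking_lot's API is similar but its guards don't return a Result):

```rust
use std::sync::RwLock;

// Guards, not unlock(): holding the guard is holding the lock.
fn main() {
    let lock = RwLock::new(vec![1, 2, 3]);

    {
        // read() returns a RwLockReadGuard; Deref gives shared access.
        let r = lock.read().unwrap();
        println!("{}", r.len()); // 3
    } // guard dropped here: the read lock is released

    {
        // write() returns a RwLockWriteGuard; DerefMut gives exclusive access.
        let mut w = lock.write().unwrap();
        w.push(4);
    } // write lock released when the guard goes out of scope

    assert_eq!(lock.read().unwrap().len(), 4);
}
```

There is no way to touch the protected data without going through a guard, which is how the type system enforces correct locking.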
Deadlocks can happen, though. One way is if you perform a double read lock on the same thread: locks in Rust are not re-entrant, so you can run into this issue. And I've been talking a lot about locks, but you can also get deadlocks with channels: if you start playing around with bounded channels and cycles, that is very dangerous as well.
Those deadlock issues are not caught by rustc, and this sometimes surprises people. They expect that all concurrency issues are magically taken care of by the Rust compiler, but this is not the case. This means that avoiding those issues, not just fixing them, requires discipline from yourself.
Luckily, there are some tips I can give you to help, because we did have, I would say, a dozen deadlocks, and they are absolutely not fun to track down. The first one is to use a runtime deadlock detector. This is a feature from parking_lot; it's a build feature, so you need to enable it in your Cargo.toml file.
We do it only for CI builds, so our CI builds will check for deadlocks, and it saved us one time: we were able to find a deadlock in our cluster state handling. When the deadlock happens, you get a really nice stack trace, which is really useful. In my opinion, the deadlock detection is such a killer feature that it alone justifies using parking_lot.
If you want to go further than this and try some bleeding-edge technology, you can try a static deadlock detector. There is the lockbud project from BurtonQin, which analyzes your code and runs simulations to find potential issues in the interactions within the code. We discovered this project because the author just came to our GitHub issue tracker and said: you potentially have a double read lock in your proxy segment. That was very nice of him, and we fixed it right away.
Let's go distributed. You can really go a long way with Qdrant on a single node: I myself went to more than 30 million vectors on a single beefy node, and it was blazing fast. However, that's maybe not the architecture you want to go with in production. First, you want to stay available: if you have only a single machine and it goes down, you're going to have a bad time.
Also, when you scale, you want to scale on commodity machines; scaling by hosting only really beefy machines gets very expensive. The way we did it in Qdrant is a classical shard/replica architecture. When you create your collection, you say how many shards you want and how many replicas. It's pretty straightforward from a user perspective.
We do not have leader shards for writes. What we do is send your request to any node, and it will figure out where the write should be executed. Any of the replicas can handle a write, and the write is then propagated to the other replicas. By default, we do not have transactional operations on the vectors, so if you need to support concurrent writes on the same points, that's something you have to parameterize yourself: by passing a little extra option, we will know that we need to serialize those writes.
To make this work, we had to synchronize some kind of cluster state and have a distributed collection topology, and to do so we use the Raft consensus algorithm. A quick 101 on Raft: the Raft consensus algorithm enables your nodes to agree on a sequence of operations. This is called your log, and your log has entries, and you want this log to be replicated everywhere. Everyone needs to agree on what's in the log; it's critical. Once this log is replicated, you can execute it. This is the classic scheme of state machine replication.
If you have the same state machine everywhere and they run on the same data, you should get identical results. In Raft, there is a single leader that is elected, and the role of the leader is to receive the proposals for new entries and send them to the followers. Once a follower has replicated an entry, it acknowledges it and the leader tracks it, and when a majority of the followers has replicated an entry, the leader will commit it. Saying "I'm committing this" means: followers, please process this entry.
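The majority rule can be sketched in a few lines (my own illustration of the Raft commit rule, not Qdrant's consensus code):

```rust
/// Raft commit rule sketch: an entry is committed once it is replicated
/// on a majority of the nodes. `match_index[i]` is the highest log index
/// known to be replicated on node i (leader included).
fn commit_index(match_index: &mut Vec<u64>) -> u64 {
    // Sort descending; the value at position (majority - 1) is an index
    // replicated on at least a majority of the nodes.
    match_index.sort_unstable_by(|a, b| b.cmp(a));
    let majority = match_index.len() / 2 + 1;
    match_index[majority - 1]
}

fn main() {
    // 5 nodes; the highest replicated index on each one.
    let mut m = vec![7, 5, 5, 4, 2];
    println!("{}", commit_index(&mut m)); // 5: replicated on 3 of the 5 nodes
}
```

This also shows why slow followers don't block progress: only a majority needs to have an entry before the leader can commit it.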
Now, a common misunderstanding with Raft: people seem to think that this is some kind of atomic, global, distributed consensus, but you will basically have nodes lagging behind.
The library we use was the most mature one we found at the time. It's definitely not easy to integrate into a code base: there is a complicated dance to process the messages, plus you have to implement your own persistence layer for the log, and you have to be really careful there. We had some questions, but luckily the maintainers are very nice and very responsive, so kudos to them for going along with us on this kind of adventure.
We went through a couple of indexing schemes to support approximate nearest neighbours, and the specific implementation of HNSW. Now you know you can apply SIMD to your bottlenecks; if it's possible, you should do it. Here we're doing a lot of f32 float business, but even if you write backend applications, you can parse JSON with simd-json, so give it a try. Modern hardware is really fantastic. Fearless concurrency in Rust? Sure, except for deadlocks, and it is the responsibility of the developer to make sure those do not happen.
Lastly, distributed systems are hard. That's not very new, but I had to put it in. And basically this whole presentation is, I guess, another data point in favor of Rust for building concurrent, highly distributed systems.