From YouTube: Memory optimization: binding table evolution in NebulaGraph. NebulaGraph community meeting
Description
In this community meeting, Dr. Xuntao CHENG shared his work on refactoring the NebulaGraph BindingTable.
B: Okay, we have... yeah, we have Xuntao joining, and Alexander. Okay, hello everyone, welcome to another community meeting for NebulaGraph.
B: So before that, we can have a quick round of introductions, like a roundtable. I'm not sure if you can connect, whether you can do audio or video. Alexander, welcome.
B: Yeah, sure, I will start with myself. This is Wey; I'm from the NebulaGraph team, and the hat that I'm wearing is called the developer advocate, so most people in the community will see me more often than the other folks. Feel free to ping me on Slack or GitHub to try anything. And maybe, Alexander, you can help introduce yourself too.
B: Okay, cool. Yeah, so at the end we will have a sync discussion, so we can have more discussion, or questions and answers, then. Thanks again for joining us. Then I see John is listed in the attendee list; you can introduce yourself.
E: Hi everybody, I'm very happy to be here, so welcome, everybody.
B: Okay, thank you. And finally, our presenter today, Xuntao: could you quickly introduce yourself? Okay.
F: Yeah, hello everyone. My name is Xuntao, an engineer from NebulaGraph. Before I joined NebulaGraph I was a kernel engineer at Alibaba Cloud, where I worked mostly on relational databases, OLTP databases. I'm quite new to graph database kernels, but I find it very interesting.
B: Yeah, cool, thank you. Welcome to the NebulaGraph community meeting today.
B: Yeah, thank you. So then I will quickly go through our latest news in the project. We have this meeting every four weeks, so this is our place to have sync discussions and some topic sharing. So anyone who would like to bring any topic, or wants to share your stories on how you create something on top of NebulaGraph...
B: Feel free to let us know and share here. Yes, so we actually skipped one round of the meeting a month ago, because it was a long public holiday in China; sorry for that. In the last two months we have had a bunch of new PRs merged, in both our graph database core and the surrounding projects, so I just picked some of them that are worth mentioning.
B: We have a bunch of bug fixes and improvements, and these are some notable ones. I will quickly go through them, but feel free to check out the release notes.
B: Actually, we just released 3.3.0 today, this afternoon in Asia time, and a few weeks ago we had a minor update in 3.2 as well. So first, we introduced a vertex filter in the GetNeighbors processor; GetNeighbors is one of the processors involved when you are doing GO FROM. Then there is a configuration item that you can optionally set, if you know what you are doing, to switch off the operating system page cache for RocksDB; we added it as a configurable switch.
This one was highlighted because it's contributed by a new contributor; it's an optimization on shortest path. And this is the start of a long-awaited feature: we now support conditional filtering on properties when you're doing subgraph queries, together with find path.
B: Previously, we didn't support vertex property filtering in FIND PATH, and now both vertex and edge filtering are supported, together with GET SUBGRAPH and FIND PATH. Then there is a brand-new processor called GetDstBySrc, which we introduced to push the GO query even further from the performance perspective. And this one is big: previously, in 3.0 or 3.1, we introduced the bare vertex, that is, vertices that have no tags. But after two minor versions, it turned out...
B: We heard the voice from the community that it's not that useful, but it still brings more complexity, and the gains are less than the costs, so we decided to switch this off by default. If, in some rare cases, some user relies on this feature, the capability is still kept, so you can use it; but from 3.3.0 it's switched off by default.
B: Oh, this one is worth mentioning, because previously we put a bunch of different experimental-only features behind only one switch. That means, if you want to use one of them, you have to switch on all of them. For example, if you want to use the data balance feature and you set the experimental flag to true, that will force you to also use another feature called TOSS, Transaction On Storage Side, which isn't intended in most cases.
B: We optimized the PROFILE result, so it's more readable, and we added another function; this one was contributed by me. You can now parse a JSON string into a map, because in some cases you have to put, you know, a giant JSON string as one property.
B: You can leverage this function to make your life easier. From the other perspective, the surrounding projects: one of the big things is that we support a session pool now in all the officially supported clients. It's already released; we actually released them today in Go, Java, C++ and Python. And another thing is a Python ORM which was donated to our community; it's called nebula-carina,
B: if I pronounce it correctly. And another thing: we have a new project donated to the nebula-contrib organization. It's called a real-time exchange, so it has a similar name to our Nebula Exchange, which was based on Spark; this one is based on Flink CDC. For now it's only connected with MySQL, so you don't have to, you know, set your own watcher on the MySQL binlog and wire it to Kafka or Flink and then connect it to NebulaGraph.
B: From now on you can leverage this new project, so everything is connected out of the box; you just ensure your schema is not changed on the MySQL side, and then you can, you know, do it in a real-time fashion. If you're interested, just check out this project; it's contributed by a college student, as far as I know, a very cool project. For other things... oh yeah, this one was a PR merged by a new contributor who introduced a quite sweet cast function.
B: So you don't have to know all the types of a given result; you just cast it, so everything will be serialized in the expected way, which is much easier for developers. And we have a new release of the Kubernetes Operator. With this new release we have the controller image with ARM architecture support, and we also support multiple data paths.
B: Previously, we assumed every pod has only one data volume attached, but now you can bring more of them. Another one, not yet merged to master but in a separate branch: a university student contributed this work to enable you to, you know, run a query whose result can be consumed by the NebulaGraph algorithm tooling. Previously we bypassed graphd and did the scanning by connecting directly to the storage layer, and in most of our cases...
B: ...this is the best way. But in some cases you still want to run a graph algorithm on top of a small set of data, where you want to rely on the flexibility of an nGQL query. So this is doable in this branch now, but not yet merged; if you're interested, just watch this work. And this is related: NebulaGraph Analytics is not open source, it's our enterprise-only offering, and in this product we have three or four more algorithms introduced.
B: One final thing: our NebulaGraph Studio has a new release today, 3.5, with a bunch of new changes. Maybe I can do a demo in the next meeting, four weeks later. Two of them need to be highlighted now. One is that we have a, you know, drag-and-drop GUI interface to create schemas, so it's quite fancy.
B: Another thing is that we have a starter page where you can easily have some bootstrapped example datasets injected with one click, so it's something new. So that's all from my side for the project updates; maybe I will give the screen share to Dr. Xuntao. Okay, I'm sharing.
F: It seems like I cannot share my screen. Okay.
F: Yeah, I can see it. Okay, okay, okay, let's begin. So hello everyone; today I'm going to talk about our ongoing project on optimizing the main-memory data structures for NebulaGraph. Okay, so I'm going to talk about mainly two parts.
F: The first one is on the in-memory data structures in NebulaGraph, including our current implementations, the current problems that we are addressing, the techniques that we are applying, and some preliminary performance results. Then I'm going to talk about some future work on our binding table, which is a major player in the in-memory data structures, and other projects in parallel that we are working on currently. So the first slides concern the binding table. The binding table actually consumes the majority of the memory space
F: if you run a graph query within NebulaGraph, so it's a main target of our optimizations in our daily job. So what is a binding table? In this figure I've given an example of a binding table with respect to a local execution plan. Imagine that you are trying to execute a MATCH query, which is to find the players that are connected to Tim Duncan via an edge of type "like". So basically, this query is asking: who are the players that Tim Duncan likes?
F: To answer this MATCH query, we have an execution plan consisting of several operators. Here the first one, the bottom one, is the index scan, which is to scan for the vertex with the name Tim Duncan, and from that vertex we need to traverse the graph to find all the vertices that are linked to the Tim Duncan vertex via the...
F: Okay, so between each adjacent pair of operators, the binding table is there for them to exchange data. For example, when the index scan operator loads data from the storage and does its scan of the data via the index on the player's name, it will produce a binding table consisting of the vertex and its outbound edges. From there, we walk the graph and add even more data into that binding table, and at this stage the binding table will consist of the vertices that are linked to Tim Duncan
F: via this "like" edge. And finally, we append all the vertices, in addition to their properties, still in the form of a binding table, where we do the final projection. So the binding table is basically like a working desk onto which we continuously add some data, or from which we remove some data, to proceed with our query processing, and so it's very important to both the memory consumption of query execution and the overall performance.
F: So, that was an example of the binding table in a local execution plan. As you may know, NebulaGraph is actually a distributed database and we separate computation from storage, so in this example I show the full functionality of the binding table in a real
F: distributed setup, concretely in the community version. Basically we're talking about two services here: one is storaged, which is the storage layer, and the other is graphd, which is the computation agent. In the previous query plan we need to do a scan of the storage to find the vertex of Tim Duncan and his outbound edges, so this operation is pushed down from the computation layer to the storaged service, shown here in green.
F: So this is the operator that is pushed down from graphd. This green operator is going to generate a binding table, consisting of the vertices and their outbound edges, in the main memory of storaged. The rest of the processing will happen inside graphd, which is another service.
F: In this execution flow there are several overheads associated with the binding table. The first one: the green operator is going to write a large amount of data into the binding table. Although we are talking about a very simple query here, in real-world applications this binding table could be very huge, and there is an overhead associated with serializing the binding table into network packets, plus the overhead of transferring it over the real network, and the overhead of deserializing it in the main memory of the graphd service.
F: And, of course, there are overheads caused by reading this binding table as input to each of the operators, as well as writing into the binding table to materialize results, and so on. So there are several overheads associated with the binding table, and the binding table consumes not only memory space; it also consumes our network capacity.
F: So this is the main target of the in-memory data structure optimizations we are talking about today. In our current implementation there are several issues associated with the binding tables. Firstly, in our current community version we are using the STL libraries to implement this data structure, which is basically a std::deque consisting of Row records. Row is a class that we define to store the data of a record consisting of multiple columns, and each column is a Value within this record.
F: At the outer side it's basically a std::deque, so we are relying on the C++ STL data structures here for the implementation of the binding table. And within this Row record there is actually a big memory consumption to store the data as well as the type, because we allow users to use different data types for a single property. For example, for the age property of a player, you could specify the age using float numbers, or integer numbers, or even strings; it's all accepted.
F: This results in the fact that the values of the same property actually contain data with different types, so we need to store the data as well as the type together inside the Value data structure, and this Value is often nested: a Value could be a built-in type like integer or float, and it can also be a container type like List, Map, Vertex or Edge, and so on. So this Value could be very huge inside the main memory.
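The layout described above, a std::deque of rows whose cells are tagged values, and a single column that may mix types across rows, can be sketched roughly as follows. The names here are illustrative assumptions, not NebulaGraph's real classes, and the tag plus separate payload fields stand in for the actual tagged union.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <string>
#include <vector>

// Illustrative sketch only: a binding table as a std::deque of rows,
// where each cell is a tagged value and one column may mix types
// (e.g. "age" stored as an integer in one row, a string in another).
struct Value {
    enum class Type { kNull, kInt, kFloat, kString } type = Type::kNull;
    int64_t     i = 0;    // a real tagged union would overlap these fields;
    double      f = 0.0;  // they are kept separate here only for brevity
    std::string s;
    static Value Int(int64_t v)            { Value x; x.type = Type::kInt;    x.i = v; return x; }
    static Value Str(const std::string& v) { Value x; x.type = Type::kString; x.s = v; return x; }
};

struct Row { std::vector<Value> columns; };
using BindingTable = std::deque<Row>;  // exchanged between plan operators

// Build a two-row table whose "age" column mixes an int and a string,
// the flexibility the talk says must be preserved.
inline BindingTable makeMixedAgeTable() {
    BindingTable t;
    t.push_back(Row{{Value::Str("Tim Duncan"),  Value::Int(42)}});
    t.push_back(Row{{Value::Str("Tony Parker"), Value::Str("forty")}});
    return t;
}
```

A real tagged union would overlap the payloads and pay alignment padding for the tag; this sketch sidesteps that only to stay short.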
F: So this is our current implementation; I'm going to talk about the detailed implementation later. So there are some problems to address. Firstly, we need to continue to allow users to have various data types for the same property. This is a necessary functionality; we cannot sacrifice it. This is something that conventional OLTP databases do not allow, but we allow it in graph databases. And there are big serialization and deserialization overheads for the binding table among the distributed nodes,
F: although it's a main-memory operation. Also, our current implementation of the binding table does not support batched or vectorized processing, so it limits what we can apply in the query engine to improve performance, and we need to refactor this in-memory data structure so that we can do more inside the query engine to improve performance. And we also need to increase end-to-end performance and quality of service. A main factor that is limiting our quality of service is that the binding table consumes too much memory, and our query
F: ...engine can run into OOM and crash, because the query takes too much memory, and that will hurt the general availability of the graph services. So we want to address that; we want to improve our overall performance as well as our quality of service. So these are some basic problems that we need to address at the level of the binding table.
F: Okay, so these are our optimization goals. We want to support multiple types; by types we mean it can consist of built-in types, like integer and float, of course, and there are also some graph types, like Vertex, Edge, Map, List and so on.
F: So basically these graph types are containers of the built-in types, and they are used quite frequently during the processing of a query, so it's very important to support them, and to support them in an efficient way. And we want to avoid rebuilding the binding tables in main memory every time after we transfer one from storaged to graphd; it's very costly, so we want to reduce the network cost. But we also want to make the binding table durable during execution.
F: This feature is desirable because, quite often, a graph query can be very huge; it may take a lot of main memory to finish, but each machine has only a limited memory capacity. So we need the capability to finish a query even if we do not have sufficient memory capacity. We can do this by using a bounded working memory and using disk space to fill in the gap.
F: But to achieve that, we need to materialize some binding tables from main memory to the disk during execution, when they are not immediately needed by the computation, and we need to load a binding table back into memory when it is needed. So, by using the bounded working memory, backed by a huge durable storage, we can make query processing very resilient, capable of processing arbitrarily large queries. And...
F: Maybe the producer... okay, okay, sure. So, previously... yes. So basically, this is a summary of our optimization goals. We want to support the various types and reduce the memory consumption, and by types we mean the built-in types, of course, and, most importantly, the graph types, which are the data structures that store the node IDs as well as all their properties.
B: It's still doing the optimization, the optimization on the network thing; just a second.
F: Okay, okay, okay, that's cool, yeah. Sorry, I'm using my laptop directly. Okay, let's continue. Okay, right, yeah. So...
B: "This is a summary of our optimization..." You're not sharing yet. Oh, now sharing.
F: Okay, okay, let's go, let's continue. So, sorry for the voice issue. So basically, we want to reduce the serialization and deserialization cost of the binding table, because we not only need that to reduce the data transfer overhead within a distributed setup,
F
We
also
need
that
to
support
the
the
capability
to
process
arbitrary,
large
queries,
because
in
which
we,
we
probably
need
to
materialize
a
banding
table
during
execution,
so
that
we
can
use
the
the
huge
storage
capacity
to
form
a
waterfall,
very,
very
large,
waterming
memory
for
us
so
that
we
can
process
a
query.
Who's
who's
working
set
is
larger
than
the
physical
drams.
As
so
so
we
we,
we
need
a
Banning
table
to
be
easily
durable
during
execution
so
that
we
can
achieve
that.
F: So this is one of our major optimization goals. And we also need to refactor the binding table so that we can achieve batched processing, vectorized processing and so on. These types of optimizations need to access a batch of sequentially stored data within the main memory, and our current implementation cannot deliver that, so we need to change the data formats. And, of course, we need to reduce the memory footprint.
F: Okay, so this is our current implementation of the Value data type. It's basically a union of a set of values of different data types. Here we have, for example, 8-byte integers, 32-bit integers, float, double and so on. We also have the container data types, like List, Map, Vertex and Edge. For example, a Vertex type contains a node ID and a very long list of properties, and each property is actually a map entry:
F
It's
a
map
is
a
pair
of
a
string
and
no
value.
So
we
can
see
that
this.
This
data
structure
has
several
problems
of
the
first
way
to
use
out
of
place.
Memories,
for
example,
here
the
list
and
map
node,
Edge
and
so
on.
They
are
all
pointers
to
other
places
in
naming
memory,
so
we
are
using
that
alt
Place
Auto
Place
memory,
so
there
are
two
issues
associated
with
with
this.
F: They are not stored sequentially, because we are using these kinds of structures, and the List, Map, Vertex, Edge and so on are all nested data types that are quite complex in their memory usage. And if you are familiar with C++, you know that the actual size used by a union is bounded by its largest member.
F: So, for example, although the smallest member of this union is a bool type, which only occupies one byte, its actual size is bounded by the size of the largest member, like double or all these pointers, which are all eight bytes, I think, because they are considerably bigger. So this union is actually very large, although in many cases it only stores very short data, and there is a lot of padding within this data structure.
F: So this is the first issue. The second: we need to store the type for each data element, and currently we are using a 64-bit integer, an 8-byte integer, for this type enum, and it actually consumes a lot of memory space, because we only have a little more than 10 types, but we are using a very large space for the type information. And we cannot simply reduce the size of this data structure, because we are storing the data as well as the type in an array-of-structures way.
F: So it's basically a very large array containing a lot of structures, and within each structure there is a datum and its type. If we reduce the type, for example, to one character, then, because we are storing it next to its data in the same structure, the C++ compiler will add some padding after this type enum, and it is still going to occupy a full eight bytes in the main memory. So we basically have to use 64 bits of memory if we continue to use this array-of-structures layout for our binding table, and this is very costly in memory space.
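The two sizing effects just described, a union as large as its largest member, and a small type tag padded out to the union's alignment, can be checked directly. This is a generic illustration for a common 64-bit platform, not NebulaGraph's actual Value definition.

```cpp
#include <cassert>
#include <cstdint>

// Generic illustration: a union's size follows its largest member, and a
// type tag stored next to it in an array-of-structures layout gets padded
// to the union's alignment, so shrinking the tag saves nothing per value.
union Payload {
    bool    b;  // 1 byte, the smallest member
    int64_t i;  // 8 bytes
    double  d;  // 8 bytes
    void*   p;  // 8 bytes on a common 64-bit platform
};

struct ValueWithWideTag   { Payload data; int64_t type; };  // 8-byte tag
struct ValueWithNarrowTag { Payload data; uint8_t type; };  // 1-byte tag

static_assert(sizeof(Payload) == 8, "union size follows largest member");
// Shrinking the tag from 8 bytes to 1 byte buys nothing here: alignment
// padding brings both structs to the same 16 bytes per value.
static_assert(sizeof(ValueWithWideTag)  == 16, "8B data + 8B tag");
static_assert(sizeof(ValueWithNarrowTag) == 16, "8B data + 1B tag + 7B pad");
```

This is why the talk argues a per-value tag cannot simply be shrunk inside an array-of-structures layout; separating the type information from the data is what actually removes the padding.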
So this is our current implementation, and we want to improve it. In this slide I summarize some of the basic techniques that we are using to optimize the row store, the space that is used by each record.
F: We are introducing the concept of a variable-length record, which is a record that is able to use memory of variable length, so it's quite adaptable, and it's going to store every Nebula Value in place, so there will be no pointers pointing to other spaces in the main memory. We are removing all these pointers and storing all the data sequentially, all together, in the main memory. By removing these pointers we are no longer using out-of-place memory, which is going to support faster allocation and deallocation.
F: It's also going to make serialization and deserialization much faster, and we are going to sequentially store all the data for all the records in main-memory chunks. So these are some of our basic optimizations. And we are also introducing memory management; currently in NebulaGraph we do not have a memory management module.
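A minimal sketch of the in-place, variable-length record idea: every value's bytes are appended into one contiguous buffer with a small tag-and-length prefix, so serializing the record is a single contiguous copy instead of a pointer chase. The names and the exact encoding are assumptions for illustration, not the real record format.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string_view>
#include <vector>

// Illustrative variable-length record: values live in place, back to back,
// each prefixed by a 1-byte type tag and a 4-byte payload length.
class VarLenRecord {
public:
    enum : uint8_t { kInt = 1, kStr = 2 };

    void appendInt(int64_t v)          { put(kInt, &v, sizeof v); }
    void appendStr(std::string_view s) { put(kStr, s.data(), static_cast<uint32_t>(s.size())); }

    // Serialization is trivially the whole buffer: no traversal, no pointers.
    const std::vector<uint8_t>& bytes() const { return buf_; }

private:
    void put(uint8_t tag, const void* p, uint32_t n) {
        buf_.push_back(tag);
        const uint8_t* len = reinterpret_cast<const uint8_t*>(&n);
        buf_.insert(buf_.end(), len, len + sizeof n);   // length prefix
        const uint8_t* data = static_cast<const uint8_t*>(p);
        buf_.insert(buf_.end(), data, data + n);        // payload, in place
    }
    std::vector<uint8_t> buf_;
};
```

Because nothing points out of the buffer, freeing or shipping the record over the network touches one allocation instead of one per nested value.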
F: We are relying on the PMR facilities of C++, but for the time being we are introducing this module ourselves. With memory management we can avoid the allocation and deallocation of very small memory pieces that are scattered randomly in the main memory; we can reduce the memory fragmentation, and we can also improve the memory allocation performance for small memory pieces. In the query engine there are actually a lot of places where we frequently use small memory pieces.
F: So this can be improved significantly by the memory management, and we want to return all the resources back to the memory arena at the end of a very big query, instantaneously, so that the memory resources can be released and used by other queries.
Currently, in our current implementation, we have this problem that we release memory at a very slow pace, because for each data element we are using a lot of out-of-place memory pointers, and all of these pointers are basically pointing to random places within the memory.
F: So it's very slow to release all of them. By removing that, and releasing all the memory of the memory chunks, all together, back to the memory arena, we can achieve instantaneous memory release, which helps us reduce the memory pressure after we have processed a very big query. This function is actually very necessary for us to reduce the risk of out-of-memory crashes. And we want to support the processing of arbitrarily large queries with a bounded memory capacity.
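The arena behavior described here, no piecemeal frees during execution and one instantaneous release at the end, is what C++17's `std::pmr::monotonic_buffer_resource` provides out of the box. The sketch below uses a counting upstream resource to make that observable; it is a generic illustration, not NebulaGraph's memory manager.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <string>
#include <utility>
#include <vector>

// Counts how often the upstream allocator is hit, to make the arena's
// "free everything at once" behavior observable.
struct CountingResource : std::pmr::memory_resource {
    std::size_t allocs = 0, deallocs = 0;
    void* do_allocate(std::size_t n, std::size_t a) override {
        ++allocs;
        return std::pmr::new_delete_resource()->allocate(n, a);
    }
    void do_deallocate(void* p, std::size_t n, std::size_t a) override {
        ++deallocs;
        std::pmr::new_delete_resource()->deallocate(p, n, a);
    }
    bool do_is_equal(const std::pmr::memory_resource& o) const noexcept override {
        return this == &o;
    }
};

// Builds many small heap strings inside an arena; returns {deallocations
// seen while the "query" ran, deallocations after the arena was destroyed}.
inline std::pair<std::size_t, std::size_t> arenaDemo() {
    CountingResource upstream;
    std::size_t duringQuery = 0;
    {
        std::pmr::monotonic_buffer_resource arena(&upstream);
        std::pmr::vector<std::pmr::string> values(&arena);
        for (int i = 0; i < 1000; ++i)  // long enough to defeat SSO
            values.emplace_back(("property-value-" + std::to_string(i)).c_str());
        duringQuery = upstream.deallocs;  // monotonic arena never frees piecemeal
    }  // ~arena: all chunks go back to the upstream resource at once
    return {duringQuery, upstream.deallocs};
}
```

In the refactored design, releasing a whole chunk list back to the arena plays the same role: one bulk release at query end instead of thousands of pointer-chasing frees.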
F: This can also be done once we have made the binding table easily durable. Okay. So all of these optimizations are what we are doing on the current row-wise layout of the binding table. By row-wise we mean we are storing the records one by one, and within each record we store all their properties all together. And we are also going to...
G: So, can I ask a quick question while he's coming back online? Yep. So, what's the story on the backup?
B: Backup? Yeah, you mean the backup and restore? Yeah.
B: Yeah, now the backup and restore supports object storage and local storage, but I'm not sure if we support HDFS; S3 is supported. So basically, there is an agent there to help you manipulate the underlying SST files and then either place them into its local file system or push them to the object storage, S3.
B: We can see your screen now. Can you... I was talking; now we cannot hear you.
B: Oh, he will try to reboot the laptop to finish, but before that we can continue our conversation, our discussion. Posh, okay, sorry, so your question was that something is wrong with your backup and restore process?
G: So, on my standalone cluster I tried to backup and restore, and it did not work. Then I reached out in the community forums, and I saw a response saying that currently backup-restore is broken, telling us to wait for a couple of releases, or at least the next release, so I'm trying to check if it has been restored.
B: I don't have this context. Are you doing it on top of Kubernetes, or a bare-metal deployment? On top of Kubernetes? Oh yeah, that part isn't ready yet; that's a known issue, because, you know, the current implementation of the agent assumes it's running on a bare-metal operating system together with the services, the meta and storage processes, and when the agent is handling this restore request, the things that, you know, the control chain cannot be...
B: ...you know, implemented in the current implementation. But that is on our roadmap, and we are actually working on it, so we aim to, you know, make it work in the next season, so in two or three months. Okay, yeah. But before that, you can even do the backup, as I recall, but the restore has issues.
B: Before the official support, where, you know, everything is polished for the containerized deployment of backup and restore, maybe you have to handle it from the underlying snapshot, because backup-restore is actually a higher abstraction based on the snapshot; it helps us do a lot of, you know, dirty work: copy and paste, file movements, things like that.
B: So, in theory, it's still doable: if we do the snapshot thing, and restore from the snapshot on our own, it's still doable, I think. But we have to, yeah, we have to manipulate things inside the corresponding volume: you have to mount it somewhere, or you have to ensure the pod is runnable and you can access that file system, and you ensure the snapshot binary is there, and you follow the snapshot procedure in the documentation. So, ideally, you can do that.
B: Welcome back; we can hear you now.
F: ...the computer, and I'm not sure what's going on today, sorry. So, yeah, yeah, can you see my slide? Yes.
F: Clear? Okay, okay, let's, let's continue. So, basically, we are building a very large linked list linking a set of chunks. For each column there will be a different number of column chunks, and we support both a dense format and a sparse format at the same time. For our sparse format,
F: if an attribute of a particular record has a null value in a column, we are going to skip this record within that column, so that we can reduce the overall memory consumption. But for the dense layout, we are storing all the attributes of all the records in all the columns, regardless of whether it's a null value or it has some non-null value.
F: So we support those formats, and within a column table there is a linked list of column chunks, as shown in the previous slide, and within each column chunk we are using a bitmap called the row-occupancy bitmap. This bitmap is going to mark, for all the records whose attributes fall within this column chunk, whether each record has a null value or a non-null value in this column chunk. So, if some bit is zero, it means this record does not have a real value, and we actually skip its physical storage;
F: but it still exists logically in the data storage. And if some bit is one, true, it means we have a physical, real value for that attribute of that record, and that value is stored within this column chunk. So, by using this row-occupancy bitmap, we can deliver the same functionality in the sparse data layout as in the dense data layout.
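The null-skipping scheme can be sketched as follows: a bitmap records which logical rows have a physical value, only those values are stored, and a lookup ranks the bitmap to find the physical slot. This is an illustrative toy over int64 values, not the actual chunk code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Toy sparse column chunk: a row-occupancy bitmap marks which logical rows
// have a physical value; null rows exist logically but occupy no storage.
class SparseColumnChunk {
public:
    void append(std::optional<int64_t> v) {
        if (rows_ % 64 == 0) words_.push_back(0);
        if (v) {
            words_[rows_ / 64] |= uint64_t{1} << (rows_ % 64);
            values_.push_back(*v);           // only non-null values stored
        }
        ++rows_;
    }
    std::optional<int64_t> get(std::size_t row) const {
        if (row >= rows_ || !test(row)) return std::nullopt;
        std::size_t physical = 0;            // rank: set bits before `row`
        for (std::size_t i = 0; i < row; ++i) physical += test(i);
        return values_[physical];
    }
    std::size_t logicalRows() const  { return rows_; }
    std::size_t physicalRows() const { return values_.size(); }

private:
    bool test(std::size_t r) const { return (words_[r / 64] >> (r % 64)) & 1u; }
    std::vector<uint64_t> words_;
    std::vector<int64_t>  values_;
    std::size_t rows_ = 0;
};
```

A production version would compute the rank with popcount over whole 64-bit words rather than the linear bit scan used here.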
F: In the dense data layout we don't need this bitmap, because we are basically storing all the attributes of all the records in all the chunks. And because we are allowing users to use various data types, and we want to store the data types next to their data, we have introduced a data structure called the type vector. A single type vector has a fixed length, which is about 32 bytes or 64 bytes.
F: We fix its size so that we can easily manipulate it using SIMD intrinsics, like AVX-512 or AVX2 256-bit and so on; we can use SIMD intrinsics to accelerate the operations within a single type vector. And within a type vector we record the memory offsets for the attributes of the same data type within the column chunks. For example, if we have inserted 100 attributes of the same int32 data type, we only need to mark its starting offset and its end offset.
F: We only need to mark the data type once, which is maybe only four bits, for example, for the int32 data type, and we only need to mark its starting offset and its end offset. So, by marking this, we know that all the data between these two offsets has the data type of some code, which is, say, int32,
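The run-length idea behind the type vector, one (type, start offset, end offset) entry covering a whole run of same-typed values instead of one tag per value, can be sketched like this. The encoding is illustrative and does not reproduce the fixed 32/64-byte SIMD-friendly layout from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy run-length type index: each run says "bytes [begin, end) in the chunk
// hold values of this type". Appending a same-typed value extends the last
// run, so 100 int32 values cost one entry instead of 100 per-value tags.
struct TypeRun { uint8_t type; uint32_t begin; uint32_t end; };

class TypeVector {
public:
    void append(uint8_t type, uint32_t bytes) {
        if (!runs_.empty() && runs_.back().type == type) {
            runs_.back().end += bytes;       // extend the current run
            return;
        }
        uint32_t start = runs_.empty() ? 0 : runs_.back().end;
        runs_.push_back({type, start, start + bytes});
    }
    uint8_t typeAt(uint32_t offset) const {  // type of the byte at `offset`
        for (const TypeRun& r : runs_)
            if (offset >= r.begin && offset < r.end) return r.type;
        return 0;                            // 0 = unknown / out of range
    }
    std::size_t runCount() const { return runs_.size(); }

private:
    std::vector<TypeRun> runs_;
};
```

Fixing each on-disk type-vector entry to a known width, as the talk describes, is what then allows scanning runs with SIMD comparisons.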
F: ...stored in a sequential way. So, within a data chunk, because we are allowing different data types, an attribute does not have a fixed size; some attributes may be larger than the rest. This is trouble if we are inserting data into a data chunk, because we do not know how much memory space a record will end up using until after we have inserted it.
F: So we borrowed this idea from Postgres, which is to insert the data from the head to the tail, and insert the type vectors from the tail to the head. By doing this we can always append data to the existing attributes at the end, because this offset is marked in the chunk header; so we can always append data to the attributes, and we can always append type vectors after the existing type vectors, because we marked that offset in the chunk header as well.
F
So when these two offsets meet in the middle of the chunk, we know that the chunk is full and we need to allocate a new chunk for incoming data. This is the basic data structure for the column chunk, where we insert the attributes from the head to the tail and the type vectors from the tail to the head, and this format is what we learned from Postgres.
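The PostgreSQL-style layout described above can be sketched in a few lines. This is a deliberately simplified model (the real chunk header holds more than two offsets): attribute data grows down from the head, type vectors grow up from the tail, and the chunk is full when the two offsets would cross.

```python
# Sketch of the head/tail page layout: data grows from the head, type
# vectors from the tail; both offsets live in the chunk header, and the
# chunk is full when they would meet.
class ColumnChunk:
    def __init__(self, size):
        self.size = size
        self.data_off = 0        # head offset: next byte for attribute data
        self.type_off = size     # tail offset: type vectors grow downward

    def append(self, data_len, type_vec_len):
        """Try to append a value + its type-vector bytes; False if full."""
        if self.data_off + data_len > self.type_off - type_vec_len:
            return False         # offsets would cross: chunk is full
        self.data_off += data_len
        self.type_off -= type_vec_len
        return True

chunk = ColumnChunk(size=64)
print(chunk.append(16, 8))   # -> True  (head at 16, tail at 56)
print(chunk.append(32, 8))   # -> True  (head at 48, tail at 48)
print(chunk.append(4, 4))    # -> False (would cross; allocate a new chunk)
```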
F
It's actually quite useful to support the storage of data with various lengths, and if we are allowing users to use data with different types, we are basically allowing them to use data with different lengths. So we use this setup, and we are also going to provide an index for this in-memory binding table to accelerate point lookups and range lookups.
F
In this slide, I show an example of our ongoing work on providing a hash table index for the binding table. Within this index, we are using several techniques to reduce the memory footprint — the memory consumption of the hash table itself, as well as the memory used to store all the hash keys, because a hash key could be very long, and these long hash keys are going to be a problem if we are storing a lot of entries in the hash table.
F
So we use this fingerprint technique to hash a key, as a string, into a fingerprint, and we use this fingerprint as the key for the hash table, in which we use a trie-based implementation, which is very memory-efficient for a hash table. After the hash table, this key links us to a payload, and that payload is stored in the column chunks of the binding table. By using this set of data structures, we can achieve very fast point lookups.
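The fingerprint idea can be sketched as below. The hash function choice (BLAKE2b here) and the payload-location format are assumptions for illustration, not NebulaGraph's actual choices: a long string key is folded into a fixed 64-bit value, and the index stores only fingerprints and small payload locations instead of every full key string.

```python
# Sketch of the fingerprint technique: fold a key of any length into a
# fixed-width value, then index fingerprint -> payload location instead
# of keeping long key strings in memory.
import hashlib

def fingerprint(key: str) -> int:
    """Fold a string key into a fixed 64-bit fingerprint."""
    digest = hashlib.blake2b(key.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")

# Index maps 8-byte fingerprints to (chunk_id, offset) payload locations
# inside the binding table's column chunks.
index = {}
long_key = "player" * 50 + ":42"       # a long key, stored as 8 bytes
index[fingerprint(long_key)] = (3, 128)

print(index[fingerprint(long_key)])  # -> (3, 128)
```

One caveat worth noting: two different keys can share a fingerprint, so a real index has to verify the match, for example against the full key kept with the payload.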
E
Anyway, this is what I suggest: we have probably finished the conversation with Posh about the BR part, and we need to work with Xuntao to make a recording of his presentation and re-edit the session so that it can be shared afterwards.
B
Guys, Xuntao said it's almost finished, so maybe this topic can come to an end now. Do you have any questions for Xuntao, so I can, you know, ping him?
B
Posh actually asked where we are using the binding table. So maybe you can give a brief overview.
F
Yeah, I actually have a slide showing that. Let me try to — you can hear me, right?
F
Yeah, I have a slide showing where we are using the binding tables.
Here on the distributed side, we are using the binding table to gather the data that we loaded from the storage, and we transfer this data back to graphd. Between all the operators, and also within each operator, they consume the data entries from the binding table and produce their results back into the binding table. So it's basically a data container that we use throughout the entire query processing procedure, yeah.
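That "data container threaded through the pipeline" role can be sketched as follows. The operator names and signatures are illustrative, not the real NebulaGraph execution API: each operator reads its input rows from a binding table and writes its output rows back, so the same structure carries data across the whole plan.

```python
# Sketch of the binding table's role in the query pipeline: every operator
# consumes entries from a binding table and produces a new one.
def scan(storage_rows):
    """Leaf operator: load rows from storage into a fresh binding table."""
    return list(storage_rows)

def filter_op(table, pred):
    """Consume entries from the input table, produce a filtered one."""
    return [row for row in table if pred(row)]

def project(table, cols):
    """Keep only the requested columns in each entry."""
    return [{c: row[c] for c in cols} for row in table]

# graphd chains the operators; each hop is binding-table in, binding-table out.
table = scan([{"name": "a", "age": 3}, {"name": "b", "age": 7}])
table = filter_op(table, lambda r: r["age"] > 5)
table = project(table, ["name"])
print(table)  # -> [{'name': 'b'}]
```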
B
Okay, so Xuntao, I have a question regarding what you're trying to refactor. One of the points you highlighted was avoiding the rebuild. On this point, do you mean that you removed the pointers from the table object? Is that what, you know, saves us from the rebuild — because everything is referenced with an offset instead of a pointer?
B
Yeah, so we can, you know, pass this data as-is. We don't have to, you know, rebuild everything behind the pointers each time it's passed around, as we would otherwise have to. Okay, understood.
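The offset-versus-pointer point can be made concrete with a small sketch (illustrative only; the real layout is the type-vector/column-chunk format described earlier): a structure that refers to its values by offsets into one buffer can be shipped between processes as raw bytes and used directly, whereas pointer-based structures must be rebuilt on the receiving side because pointers are only valid in the sender's address space.

```python
# Sketch: offsets into a single buffer survive serialization unchanged,
# so the receiver can use the data as-is, with no pointer rebuild.
import struct

def build(values):
    """Serialize int32 values plus an offset table into one buffer."""
    data = b"".join(struct.pack("<i", v) for v in values)
    offsets = [i * 4 for i in range(len(values))]
    return data, offsets

def read_at(data, offset):
    """A reader on any machine can use the offset as-is, no rebuild."""
    return struct.unpack_from("<i", data, offset)[0]

data, offsets = build([10, 20, 30])
# `data` can be shipped between storaged and graphd unchanged:
print(read_at(data, offsets[2]))  # -> 30
```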
B
Thank you, thank you.
B
I should have run a rehearsal before we did this — I'm sorry for this, it's my fault. Thank you so much, Dr. Xuntao. It's quite, you know, hardcore stuff, but I think it's quite insightful for us, and for the people joining our community later, to understand more of NebulaGraph and how it works underneath.
B
Thank you so much. And it's okay, it's okay! Xuntao said sorry about this in the chat — it's not your fault!
B
Okay! Maybe we come to — I will share my screen — we come to the final part, the sync discussion. We can continue, Posh, if you have any more to discuss, and Alexander as well; we can do that now, and afterwards we can close this, yeah.
G
You know, I'm going to make another try on the backup and restore and get back to you with my latest experience. But one thing I would like to mention for sure is that we are using the Nebula Helm charts for deploying the cluster on Kubernetes. So the cluster — when I say... can you hear me? Yes? Okay. So the issue with the cluster is: if I wanted to attach additional volumes or drives to the pod — is there any way I can attach them?
B
No, no — you know, we're doing it Kubernetes-fashion, so—
B
Although underneath it's the provisioned volume, we don't suggest users actually manipulate things at the volume level, because the pods of the storage service are stateful and they are bound to specific volumes. So we cannot, you know, take over manually — it would break things, or introduce conflicts between our manipulation and the operator, because the operator, in a way, you can treat as another administrator.
B
You know, it manipulates the cluster, and you only talk to it through the CRD, the YAML file. So I don't recommend it. Previously we had a conversation about you wanting to leverage something like what we do with NFS in a non-cloud deployment, but in Kubernetes this isn't the recommended way of doing that.
G
So — I want to understand this more. Let's say, for example, the PVC that you create for the storage: do you use single read-write, or read-many/write-many? What kind of configuration do you have?
B
It doesn't assume anything about, you know, how the storage is laid out underneath. We actually suggest users not use RAID; instead, just use a PV bound by the cloud provider, or even, in certain cases, you can use local storage, because you can optionally, you know, enable replicas at the graph-space level. So that's still acceptable, but I'd just recommend using a fast backend PV provider.
G
Cool, and that's what we have done: we attached premium SSD disks — P-series storage, SSD storage. But the moment we attach an SSD disk, it is not read-many-write-many; it is single read-write only, not read-many.
G
Right, right — so that means one pod can read and one pod can write to that SSD storage. Instead, if we use — let's say, because we are all on Azure, we are not using Amazon, AWS, at all — if I use an Azure NFS drive or Azure Files drive, it is write-many read-many, so I can attach one drive and have as many pods as needed write to it.
G
But the thing is, I'm worried about the performance of Nebula using such file storage as its disks versus SSD.
B
Yes — I don't recommend enabling the many-read mode or any hack like this. The reason is — well, actually we can even do that, because, you know, the operator itself is open source; you can customize it, and you can even customize all the resources in your Kubernetes cluster. By default it is single read-write.
B
Multiple readers — I don't recall exactly, but you can do that. I don't recommend it, though, because you cannot guarantee the performance. For example, you know, although we are just using a cloud PV provided by Azure or AWS, underneath there could be some kind of affinity — maybe a certain volume stays together, closer to your pod, either in terms of NUMA-node awareness or, you know, network-switch closeness.
B
So when one disk maps to a given pod or a given node, it's more guaranteed, from my perspective, and I don't think it's worth taking the risk here, because I think a database is quite critical — a mission-critical situation. So it's not worth it to hack such multiple-read approaches. Another reason is, you know, that in this whole operator concept, the disks are bound, logically, one per pod instance.
B
So if you, you know, enable some sort of multiple read, you have to break this concept, so you have to change a lot of things — I don't think it's even worth it. But anyway, if we know what we are doing, we can try it, and it's actually quite easy to enable, you know, the PV in multiple-read mode.
G
Because probably the simplest method here could be — if we just think about it, right: let's say, for example, the Helm charts allowed an optional additional drive to be mounted on the storage pods. Let's think about it, right? If I can mount an additional drive on storage, then all I have to do is run the backup and copy it manually, for now, from the local storage to the additional volume — I can move the data, right?
B
I don't think that's quite the case, but generally you can somehow do it, because, yeah, the files are there — but there is still some running context there. You mean there could be some corruption if you just copy on the fly — but that's more doable. Actually, previously I also looked into how we set this multiple-reader policy in the operator code, and I actually found where we can change it, and I—
B
We can enable, you know, the snapshot thing — maybe we don't just directly do the copy-pasting, but we can leverage the NebulaGraph snapshot. We can, like, mount another, you know, tooling pod, with some other handy tools inside this pod, enable actual multiple read of those volumes, and mount them together to that one pod. So we can easily perform, you know, a more hacky snapshot thing. Yeah, I think that's totally doable. Yes.
B
If it's considered to be, you know, a more general use case, we could even try to expose this multiple-read policy in the upstream operator repository — by default it's single read, but we can try that, yeah. Okay.
G
Yeah, yeah — I thought that, actually; previously you had it that way, and then you required two separate things, and that is very expensive — having two disks when the volume is very low — and I'm not sure I really understand, you know, why you guys made it separate.
G
Okay — and one more thing. When we write to Nebula storage, we should really use Nebula Exchange, but currently we are using INSERT operations to insert data into NebulaGraph, and what I'm seeing is that the number of SST files on the storage is huge.
B
Oh yeah, that's the LSM tree — you know, the way writes are implemented. You will have something like a 3x-or-more multiplier when you're writing initially — that's just the nature of the LSM tree — but after compaction it will shrink to the expected, you know, multiple of the storage.
G
I tried — I tried to compact. Even then, I see a huge number of SST files. What is the math here? How many SST files am I supposed to see?
B
Xuntao, do you have a quick idea on this? As I recall, we have some configurable settings for the final numbers, so I can check and come back to you later. Actually, we talked about this configuration in our discussions last week — regarding the file size and the file numbers. So it's configurable, actually, but I don't have an idea of how many we should expect by default — I don't know right now — but we can talk offline later, yeah, yeah.
G
Okay, okay — that would be another wonderful thing that could help us.
G
Yeah — node2vec. When I say node2vec, it is — let me put it this way: we have our graph; so far in our POC we are using it for a knowledge graph, a medical knowledge graph, right? So when we do our graph analytics — node2vec, using your implementation — yeah, it runs out of space, because you are using the GraphX library, the Spark GraphX library, yeah, and that is very, very... I don't know how much.
B
Yes. The advantage of GraphX is that almost every data engineering team has competence in this infrastructure and in, you know, this data stack, but it's not perfect — it does not have a good reputation for, you know, resource and memory utilization, and that's why we provide an alternative in our Enterprise offering, Nebula Analytics. But you can request a trial of Analytics to see, you know, how the memory utilization compares to the GraphX-based algorithms.
B
Yes, it's basically the vanilla GraphX implementation that—
B
—yeah, because it's usable, so a lot of community users can, you know, benefit from it. But when it comes to, you know, resource utilization requirements, it's better to go with Nebula Analytics. Yeah.
G
The next meeting is four weeks later, okay — so I'll come back by that time and get back to you, sure. Thank you, okay! Thank you so much, thanks.
D
I have a question, also about graph algorithms. We are quite new to NebulaGraph; I'm familiar with TigerGraph and Neo4j, and I have plans to create something with a Turing-complete language or MapReduce inside the database, to be able to run algorithms not, as—
D
—as the previous speaker said, on separate infrastructure with GraphX, outside of the database.
D
Yes, I hear that — I know that there is a graph analytics product, Nebula Analytics, but one of the main points for analytics is creating your own graph algorithms. Sometimes you need to modify some standard algorithms; in many cases you need to tune them to work with a particular business case.
D
So how do you solve this now, or how do you plan to solve it in the future? I'm just trying to understand the path from the business task to a running algorithm on the graph database, using the graph features, etc. I want to understand this question.
B
You know, a native way in graphd — for now we don't have that kind of offering. But Analytics itself — because, you know, we have a separation in the design between the computation, the graph layer, and the storage layer — that actually brings the benefit that you can treat Nebula Analytics, in some way, as our native analytical platform. But as I understand it, you're expecting, you know, to do some more procedures within the query you write.
B
Instead of calling an analytical platform, you want the flexibility to just write, like, Neo4j APOC — you know, to do some more complex analytical things in NebulaGraph itself, instead of on another platform. Is that your main concern?
D
For example, there are some standard algorithms, like PageRank or centrality, and you need to modify one — and in some cases, not only the parameters of these algorithms that are exposed, but some logic — you change the logic of the algorithm. Yeah, yeah.
B
Now, actually, if GraphX can fit your requirements, you know, from the resource utilization perspective, then you can actually use our nebula-algorithm package. You can call it from the command line with spark-submit, but in that case you lose—
B
—the capability to, you know, modify or change how the PageRank behaves. But if you are coding it in Spark, you can do the Scala thing to, you know, leverage it with more modifications; you can even fork the PageRank and change some of how it iterates — that's doable. And if you have a Scala background — which I don't — that will be easier, right? Even with a Python background, you can use PySpark; that's still doable.
B
On the other hand, one of our community users is maintaining a fork of NebulaGraph in which they introduce something similar to Neo4j APOC, and they will upstream it to our community in the upcoming months. I'm not sure if that can fit your requirement, but the con of that implementation is that it's not in a MapReduce fashion: if you want to leverage a lot of resources, you have to have one single huge graphd to leverage it. I'm not sure if that will potentially—
D
Yes — what one would expect from the database is to work with the full graph, with any number of nodes, vertices, etc., and to build the steps using all possible means. You see, the implementation of the algorithms in GraphX is quite interesting, because you need to use the Spark infrastructure — Spark map; you need to map the data, etc.
D
Even a simple algorithm can take a thousand lines of code, and you typically end up solving infrastructure questions — like copying data from here to there, etc. — rather than focusing on the algorithm.
B
It has a lot of footprint and additional layers, and it's not self-contained — I got it, yeah.
B
Do we have any, you know, inside ideas on how we would like to, you know, bring a more smoothly integrated way to, you know, call any algorithms inside NebulaGraph? Do we have such plans for now? Can you hear—
F
—me? Yes? Okay, yeah — we have talked about that. We are planning to provide a set of APIs to expose our kernel functions to an algorithm layer, and for whatever graph algorithms you want to run, you can call these APIs for how to load the data, how to write it, how to manage it, and all kinds of operations, and the algorithm can utilize NebulaGraph to do its own job.
F
So this is our plan. We'd like to use this design for generic graph algorithms, and we also want to extend it to graph neural network tasks.
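The plan described above can be sketched in a purely hypothetical form — no such NebulaGraph API existed at the time of the talk, and every name below is invented for illustration. The idea is that the kernel exposes load/manage calls, and a user-supplied algorithm is written only against those calls:

```python
# Hypothetical sketch of the planned API layer: kernel functions to read
# graph data are exposed, and a custom algorithm builds only on them.
class GraphAPI:
    """Stand-in for the kernel-function APIs described in the talk."""
    def __init__(self, edges):
        self._edges = edges            # directed (src, dst) pairs
    def neighbors(self, v):
        return [d for (s, d) in self._edges if s == v]
    def vertices(self):
        return sorted({x for e in self._edges for x in e})

def degree_centrality(api: GraphAPI):
    """A user-supplied algorithm written only against the exposed API."""
    return {v: len(api.neighbors(v)) for v in api.vertices()}

api = GraphAPI([(1, 2), (1, 3), (2, 3)])
print(degree_centrality(api))  # -> {1: 2, 2: 1, 3: 0}
```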
D
Yeah, I can bring some examples and ideas. For example, we have Bitcoin: the first task is to load all transactions into the graph database, and the second task — for example, for anti-money laundering — is to find a cycle between nodes: for example, the first node transfers money to a second node, etc., and there can be, say, ten or more nodes in this loop, and we need to find them.
B
Yeah, I got it — I got you. Because, you know, I'm a GCP fan and I watched how BigQuery evolved, and I noticed a couple of months ago they announced, you know, more capabilities — they can do a lot of, you know, machine-learning things just in SQL; it's fantastic. We will bring this idea to the team — and Xuntao is listening. And another thing: you're mentioning that you want to do the node-and-loop thing — are you saying you are doing something like air—?
D
No, it's like money transfer: one person transfers money to a second, the second to a third, the third to a fifth, etc., and this loop can take multiple steps, but in the end the money comes back to the first person.
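The anti-money-laundering pattern just described — a chain of transfers that returns money to the originator — can be sketched with a plain depth-first search. In a graph database this would instead be a variable-length path query; the function below is only an illustration of the task, with an assumed hop limit.

```python
# Sketch: find one transfer loop that starts and ends at `start`.
def find_cycle_from(transfers, start, max_hops=10):
    """Return one money-transfer loop through `start`, or None."""
    adj = {}
    for src, dst in transfers:
        adj.setdefault(src, []).append(dst)
    stack = [(start, [start])]             # DFS over simple paths
    while stack:
        node, path = stack.pop()
        for nxt in adj.get(node, []):
            if nxt == start and len(path) > 1:
                return path + [start]      # money came back: a loop
            if nxt not in path and len(path) < max_hops:
                stack.append((nxt, path + [nxt]))
    return None

transfers = [("A", "B"), ("B", "C"), ("C", "E"), ("E", "A"), ("C", "D")]
print(find_cycle_from(transfers, "A"))  # -> ['A', 'B', 'C', 'E', 'A']
```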
D
Yes, it's about transactions, and the task is anti-money laundering — in the financial sector, in fintech.
B
Oh yeah — actually, Xuntao, can you still hear us? We are actually bringing ACID transactions to NebulaGraph later, and Xuntao is one of the guys driving this design. He even gave a talk on how we want to design this at our user conference several weeks ago.
D
You mean transactions in the classical sense — you mean confirmation of the transaction, I believe.
D
Okay — my question was about an analytics task, where you need to find the loop between these transactions.
B
Maybe I don't have, you know, the needed domain knowledge on this, but maybe you can send this question to us in the GitHub discussions or issues and, you know, share your stories on it, so we can learn from this particular question or requirement. Okay, okay, yeah, yeah.
B
Do you have any other questions? No? Okay — and feel free to, you know, reach me on Slack as well. Oh, Xuntao is actually typing the answer on transactions; I'm reading it: for transactions, we're working on fully ACID-compliant transaction support, and it's going to appear in the next generation, the next-gen, of NebulaGraph, probably in 2023, yeah.
B
And that's for the transactions. For the analytics questions and requirements, you can, you know, share them in the discussions or in Slack, and you can reach me in Slack as well when needed, yeah.
B
And — okay, so I think we can call it a day. It's a quite long meeting today, and thank you, everyone. See you next month!