From YouTube: Memory optimization: binding table evolution in NebulaGraph. NebulaGraph community meeting
Description
In this community meeting, Dr. Xuntao CHENG shared his work on refactoring the NebulaGraph BindingTable.
B: Okay, we have... yeah, we have Xuntao joining, and Alexander. Okay, hello everyone, welcome to another community meeting for NebulaGraph.
B: So before that, we can have a quick round of introductions, like a roundtable. I'm not sure if you can connect, whether you can do audio or video. Alexander, welcome.
B: Yeah, sure, I will start with myself. This is Wey; I'm from the NebulaGraph team, and the hat that I'm wearing is called the developer advocate, so most people in the community will see me more often than the other folks. Feel free to ping me on Slack or GitHub to try anything. And maybe, Alexander, you can help introduce yourself too.
B: Okay, cool. Yeah, so at the end we will have a sync discussion, so we can have more discussion, or questions and answers, then. Thanks again for joining us. Then I see John is listed in the attendee list; you can introduce yourself.
E: Hi everybody, I'm very happy to be here, so welcome, everybody.
B: Okay, thank you. And finally, our presenter today, Xuntao: could you quickly introduce yourself? Okay.
F: Yeah, hello everyone. My name is Xuntao, an engineer from NebulaGraph. Before I joined NebulaGraph I was a kernel engineer at Alibaba Cloud, where I worked mostly on relational databases, OLTP databases. I'm quite new to graph database kernels, but I find it very interesting.
B: Yeah, cool, thank you. Welcome to the NebulaGraph community meeting today.
B: Yeah, thank you. So then I will quickly go through our latest news in the project. We have this meeting every four weeks, so this is our place to have sync discussions and some topic sharing. So anyone who would like to bring any topic, or wants to share your stories on how you create something on top of NebulaGraph...
B: Feel free to let us know and share here. Yes, so we actually skipped one round of the meeting a month ago, because it was a long public holiday in China; sorry for that. In the last two months we have had a bunch of new PRs merged, in both our graph database core and the surrounding projects, so I just picked some of them that are worth mentioning.
B: We have a bunch of bug fixes and improvements, and these are some notable ones. I will quickly go through them, but feel free to check out the release notes.
B: Actually, we just released 3.3.0 today, this afternoon in Asia time, and a few weeks ago we had a minor update in 3.2 as well. So first, we introduced a vertex filter in the GetNeighbors processor; GetNeighbors is one of the processors involved when you are doing GO FROM. Then there is a configuration item that you can optionally set, if you know what you are doing, to switch off the operating system page cache for RocksDB; we added it as a configurable switch.
This one was highlighted because it's contributed by a new contributor; it's an optimization on shortest path. And this is the start of a long-awaited feature: we now support conditional filtering on properties when you're doing subgraph queries, together with find path.
B: Previously, we didn't support vertex property filtering in FIND PATH, and now both vertex and edge filtering are supported, together with GET SUBGRAPH and FIND PATH. Then there is a brand-new processor called GetDstBySrc, which we introduced to push the GO query even further from the performance perspective. And this one is big: previously, in 3.0 or 3.1, we introduced the bare vertex, that is, vertices that have no tags. But after two minor versions, it turned out...
B: We heard the voice from the community that it's not that useful, but it still brings more complexity, and the gains are less than the costs, so we decided to switch this off by default. If, in some rare cases, some user relies on this feature, the capability is still kept, so you can use it; but from 3.3.0 it's switched off by default.
B: Oh, this one is worth mentioning, because previously we put a bunch of different experimental-only features behind only one switch. That means, if you want to use one of them, you have to switch on all of them. For example, if you want to use the data balance feature and you set the experimental flag to true, that will force you to also use another feature called TOSS, Transaction On Storage Side, which isn't intended in most cases.
B: We optimized the PROFILE result, so it's more readable, and we added another function; this one was contributed by me. You can now parse a JSON string into a map, because in some cases you have to put, you know, a giant JSON string as one property.
B: You can leverage this function to make your life easier. From the other perspective, the surrounding projects: one of the big things is that we support a session pool now in all the officially supported clients. It's already released; we actually released them today in Go, Java, C++ and Python. And another thing is a Python ORM which was donated to our community; it's called nebula-carina,
B: if I pronounce it correctly. And another thing: we have a new project donated to the nebula-contrib organization. It's called a real-time exchange, so it has a similar name to our Nebula Exchange, which was based on Spark; this one is based on Flink CDC. For now it's only connected with MySQL, so you don't have to, you know, set your own watcher on the MySQL binlog and wire it to Kafka or Flink and then connect it to NebulaGraph.
B: From now on you can leverage this new project, so everything is connected out of the box; you just ensure your schema is not changed on the MySQL side, and then you can, you know, do it in a real-time fashion. If you're interested, just check out this project; it's contributed by a college student, as far as I know, a very cool project. For other things... oh yeah, this one was a PR merged by a new contributor who introduced a quite sweet cast function.
B: So you don't have to know all the types of a given result; you just cast it, so everything will be serialized in the expected way, which is much easier for developers. And we have a new release of the Kubernetes Operator. With this new release we have the controller image with ARM architecture support, and we also support multiple data paths.
B: Previously, we assumed every pod has only one data volume attached, but now you can bring more of them. Another one, not yet merged to master but in a separate branch: a university student contributed this work to enable you to, you know, run a query whose result can be consumed by the NebulaGraph algorithm tooling. Previously we bypassed graphd and did the scanning by connecting directly to the storage layer, and in most of our cases...
B: ...this is the best way. But in some cases you still want to run a graph algorithm on top of a small set of data, where you want to rely on the flexibility of an nGQL query. So this is doable in this branch now, but not yet merged; if you're interested, just watch this work. And this is related: NebulaGraph Analytics is not open source, it's our enterprise-only offering, and in this product we have three or four more algorithms introduced.
B: One final thing: our NebulaGraph Studio has a new release today, 3.5, with a bunch of new changes. Maybe I can do a demo in the next meeting, four weeks later. Two of them need to be highlighted now. One is that we have a, you know, drag-and-drop GUI interface to create schemas, so it's quite fancy.
B: Another thing is that we have a starter page where you can easily have some bootstrapped example datasets injected with one click, so it's something new. So that's all from my side for the project updates; maybe I will give the screen share to Dr. Xuntao. Okay, I'm sharing.
F: It seems like I cannot share my screen. Okay.
F: Yeah, I can see it. Okay, okay, okay, let's begin. So hello everyone; today I'm going to talk about our ongoing project on optimizing the main-memory data structures for NebulaGraph. Okay, so I'm going to talk about mainly two parts.
F: The first one is on the in-memory data structures in NebulaGraph, including our current implementations, the current problems that we are addressing, the techniques that we are applying, and some preliminary performance results. Then I'm going to talk about some future work on our binding table, which is a major player in the in-memory data structures, and other projects in parallel that we are working on currently. So the first slides concern the binding table. The binding table actually consumes the majority of the memory space
F: if you run a graph query within NebulaGraph, so it's a main target of our optimizations in our daily job. So what is a binding table? In this figure I've given an example of a binding table with respect to a local execution plan. Imagine that you are trying to execute a MATCH query, which is to find the players that are connected to Tim Duncan via an edge of type "like". So basically, this query is asking: who are the players that Tim Duncan likes?
F: To answer this MATCH query, we have an execution plan consisting of several operators. Here the first one, the bottom one, is the index scan, which is to scan for the vertex with the name Tim Duncan, and from that vertex we need to traverse the graph to find all the vertices that are linked to the Tim Duncan vertex via the...
F: Okay, so between each adjacent pair of operators, the binding table is there for them to exchange data. For example, when the index scan operator loads data from the storage and does its scan of the data via the index on the player's name, it will produce a binding table consisting of the vertex and its outbound edges. From there, we walk the graph and add even more data into that binding table, and at this stage the binding table will consist of the vertices that are linked to Tim Duncan
F: via this "like" edge. And finally, we append all the vertices, in addition to their properties, still in the form of a binding table, where we do the final projection. So the binding table is basically like a working desk onto which we continuously add some data, or from which we remove some data, to proceed with our query processing, and so it's very important to both the memory consumption of query execution and the overall performance.
F: So, that was an example of the binding table in a local execution plan. As you may know, NebulaGraph is actually a distributed database and we separate computation from storage, so in this example I show the full functionality of the binding table in a real
F: distributed setup, concretely in the community version. Basically we're talking about two services here: one is storaged, which is the storage layer, and the other is graphd, which is the computation agent. In the previous query plan we need to do a scan of the storage to find the vertex of Tim Duncan and his outbound edges, so this operation is pushed down from the computation layer to the storaged service, shown here in green.
F: So this is the operator that is pushed down from graphd. This green operator is going to generate a binding table, consisting of the vertices and their outbound edges, in the main memory of storaged. The rest of the processing will happen inside graphd, which is another service.
F: In this execution flow there are several overheads associated with the binding table. The first one: the green operator is going to write a large amount of data into the binding table. Although we are talking about a very simple query here, in real-world applications this binding table could be very huge, and there is an overhead associated with serializing the binding table into network packets, plus the overhead of transferring it over the real network, and the overhead of deserializing it in the main memory of the graphd service.
F: And, of course, there are overheads caused by reading this binding table as input to each of the operators, as well as writing into the binding table to materialize results, and so on. So there are several overheads associated with the binding table, and the binding table consumes not only memory space; it also consumes our network capacity.
F: So this is the main target of the in-memory data structure optimizations we are talking about today. In our current implementation there are several issues associated with the binding tables. Firstly, in our current community version we are using the STL libraries to implement this data structure, which is basically a std::deque consisting of Row records. Row is a class that we define to store the data of a record consisting of multiple columns, and each column is a Value within this record.
F: At the outer side it's basically a std::deque, so we are relying on the C++ STL data structures here for the implementation of the binding table. And within this Row record there is actually a big memory consumption to store the data as well as the type, because we allow users to use different data types for a single property. For example, for the age property of a player, you could specify the age using float numbers, or integer numbers, or even strings; it's all accepted.
F: This results in the fact that the values of the same property actually contain data with different types, so we need to store the data as well as the type together inside the Value data structure, and this Value is often nested: a Value could be a built-in type like integer or float, and it can also be a container type like List, Map, Vertex or Edge, and so on. So this Value could be very huge inside the main memory.
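The layout described above, a std::deque of rows whose cells are tagged values, and a single column that may mix types across rows, can be sketched roughly as follows. The names here are illustrative assumptions, not NebulaGraph's real classes, and the tag plus separate payload fields stand in for the actual tagged union.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <string>
#include <vector>

// Illustrative sketch only: a binding table as a std::deque of rows,
// where each cell is a tagged value and one column may mix types
// (e.g. "age" stored as an integer in one row, a string in another).
struct Value {
    enum class Type { kNull, kInt, kFloat, kString } type = Type::kNull;
    int64_t     i = 0;    // a real tagged union would overlap these fields;
    double      f = 0.0;  // they are kept separate here only for brevity
    std::string s;
    static Value Int(int64_t v)            { Value x; x.type = Type::kInt;    x.i = v; return x; }
    static Value Str(const std::string& v) { Value x; x.type = Type::kString; x.s = v; return x; }
};

struct Row { std::vector<Value> columns; };
using BindingTable = std::deque<Row>;  // exchanged between plan operators

// Build a two-row table whose "age" column mixes an int and a string,
// the flexibility the talk says must be preserved.
inline BindingTable makeMixedAgeTable() {
    BindingTable t;
    t.push_back(Row{{Value::Str("Tim Duncan"),  Value::Int(42)}});
    t.push_back(Row{{Value::Str("Tony Parker"), Value::Str("forty")}});
    return t;
}
```

A real tagged union would overlap the payloads and pay alignment padding for the tag; this sketch sidesteps that only to stay short.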
F: So this is our current implementation; I'm going to talk about the detailed implementation later. So there are some problems to address. Firstly, we need to continue to allow users to have various data types for the same property. This is a necessary functionality; we cannot sacrifice it. This is something that conventional OLTP databases do not allow, but we allow it in graph databases. And there are big serialization and deserialization overheads for the binding table among the distributed nodes,
F: although it's a main-memory operation. Also, our current implementation of the binding table does not support batched or vectorized processing, so it limits what we can apply in the query engine to improve performance, and we need to refactor this in-memory data structure so that we can do more inside the query engine to improve performance. And we also need to increase end-to-end performance and quality of service. A main factor that is limiting our quality of service is that the binding table consumes too much memory, and our query
F: ...engine can run into OOM and crash, because the query takes too much memory, and that will hurt the general availability of the graph services. So we want to address that; we want to improve our overall performance as well as our quality of service. So these are some basic problems that we need to address at the level of the binding table.
F: Okay, so these are our optimization goals. We want to support multiple types; by types we mean it can consist of built-in types, like integer and float, of course, and there are also some graph types, like Vertex, Edge, Map, List and so on.
F: So basically these graph types are containers of the built-in types, and they are used quite frequently during the processing of a query, so it's very important to support them, and to support them in an efficient way. And we want to avoid rebuilding the binding tables in main memory every time after we transfer one from storaged to graphd; it's very costly, so we want to reduce the network cost. But we also want to make the binding table durable during execution.
F: This feature is desirable because, quite often, a graph query can be very huge; it may take a lot of main memory to finish, but each machine has only a limited memory capacity. So we need the capability to finish a query even if we do not have sufficient memory capacity. We can do this by using a bounded working memory and using disk space to fill in the gap.
F: But to achieve that, we need to materialize some binding tables from main memory to the disk during execution, when they are not immediately needed by the computation, and we need to load a binding table back into memory when it is needed. So, by using the bounded working memory, backed by a huge durable storage, we can make query processing very resilient, capable of processing arbitrarily large queries. And...
F: Maybe the producer... okay, okay, sure. So, previously... yes. So basically, this is a summary of our optimization goals. We want to support the various types and reduce the memory consumption, and by types we mean the built-in types, of course, and, most importantly, the graph types, which are the data structures that store the node IDs as well as all their properties.
B: It's still doing the optimization, the optimization on the network thing; just a second.
F: Okay, okay, okay, that's cool, yeah. Sorry, I'm using my laptop directly. Okay, let's continue. Okay, right, yeah. So...
B: "This is a summary of our optimization..." You're not sharing yet. Oh, now sharing.
F: Okay, okay, let's go, let's continue. So, sorry for the voice issue. So basically, we want to reduce the serialization and deserialization cost of the binding table, because we not only need that to reduce the data transfer overhead within a distributed setup,
F
We
also
need
that
to
support
the
the
capability
to
process
arbitrary,
large
queries,
because
in
which
we,
we
probably
need
to
materialize
a
banding
table
during
execution,
so
that
we
can
use
the
the
huge
storage
capacity
to
form
a
waterfall,
very,
very
large,
waterming
memory
for
us
so
that
we
can
process
a
query.
Who's
who's
working
set
is
larger
than
the
physical
drams.
As
so
so
we
we,
we
need
a
Banning
table
to
be
easily
durable
during
execution
so
that
we
can
achieve
that.
F: So this is one of our major optimization goals. And we also need to refactor the binding table so that we can achieve batched processing, vectorized processing and so on. These types of optimizations need to access a batch of sequentially stored data within the main memory, and our current implementation cannot deliver that, so we need to change the data formats. And, of course, we need to reduce the memory footprint.
F: Okay, so this is our current implementation of the Value data type. It's basically a union of a set of values of different data types. Here we have, for example, 8-byte integers, 32-bit integers, float, double and so on. We also have the container data types, like List, Map, Vertex and Edge. For example, a Vertex type contains a node ID and a very long list of properties, and each property is actually a map entry:
F
It's
a
map
is
a
pair
of
a
string
and
no
value.
So
we
can
see
that
this.
This
data
structure
has
several
problems
of
the
first
way
to
use
out
of
place.
Memories,
for
example,
here
the
list
and
map
node,
Edge
and
so
on.
They
are
all
pointers
to
other
places
in
naming
memory,
so
we
are
using
that
alt
Place
Auto
Place
memory,
so
there
are
two
issues
associated
with
with
this.
F: They are not stored sequentially, because we are using these kinds of structures, and the List, Map, Vertex, Edge and so on are all nested data types that are quite complex in their memory usage. And if you are familiar with C++, you know that the actual size used by a union is bounded by its largest member.
F: So, for example, although the smallest member of this union is a bool type, which only occupies one byte, its actual size is bounded by the size of the largest member, like double or all these pointers, which are all eight bytes, I think, because they are considerably bigger. So this union is actually very large, although in many cases it only stores very short data, and there is a lot of padding within this data structure.
F: So this is the first issue. The second: we need to store the type for each data element, and currently we are using a 64-bit integer, an 8-byte integer, for this type enum, and it actually consumes a lot of memory space, because we only have a little more than 10 types, but we are using a very large space for the type information. And we cannot simply reduce the size of this data structure, because we are storing the data as well as the type in an array-of-structures way.
F: So it's basically a very large array containing a lot of structures, and within each structure there is a datum and its type. If we reduce the type, for example, to one character, then, because we are storing it next to its data in the same structure, the C++ compiler will add some padding after this type enum, and it is still going to occupy a full eight bytes in the main memory. So we basically have to use 64 bits of memory if we continue to use this array-of-structures layout for our binding table, and this is very costly in memory space.
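The two sizing effects just described, a union as large as its largest member, and a small type tag padded out to the union's alignment, can be checked directly. This is a generic illustration for a common 64-bit platform, not NebulaGraph's actual Value definition.

```cpp
#include <cassert>
#include <cstdint>

// Generic illustration: a union's size follows its largest member, and a
// type tag stored next to it in an array-of-structures layout gets padded
// to the union's alignment, so shrinking the tag saves nothing per value.
union Payload {
    bool    b;  // 1 byte, the smallest member
    int64_t i;  // 8 bytes
    double  d;  // 8 bytes
    void*   p;  // 8 bytes on a common 64-bit platform
};

struct ValueWithWideTag   { Payload data; int64_t type; };  // 8-byte tag
struct ValueWithNarrowTag { Payload data; uint8_t type; };  // 1-byte tag

static_assert(sizeof(Payload) == 8, "union size follows largest member");
// Shrinking the tag from 8 bytes to 1 byte buys nothing here: alignment
// padding brings both structs to the same 16 bytes per value.
static_assert(sizeof(ValueWithWideTag)  == 16, "8B data + 8B tag");
static_assert(sizeof(ValueWithNarrowTag) == 16, "8B data + 1B tag + 7B pad");
```

This is why the talk argues a per-value tag cannot simply be shrunk inside an array-of-structures layout; separating the type information from the data is what actually removes the padding.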
So this is our current implementation, and we want to improve it. In this slide I summarize some of the basic techniques that we are using to optimize the row store, the space that is used by each record.
F: We are introducing the concept of a variable-length record, which is a record that is able to use memory of variable length, so it's quite adaptable, and it's going to store every Nebula Value in place, so there will be no pointers pointing to other spaces in the main memory. We are removing all these pointers and storing all the data sequentially, all together, in the main memory. By removing these pointers we are no longer using out-of-place memory, which is going to support faster allocation and deallocation.
F: It's also going to make serialization and deserialization much faster, and we are going to sequentially store all the data for all the records in main-memory chunks. So these are some of our basic optimizations. And we are also introducing memory management; currently in NebulaGraph we do not have a memory management module.
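A minimal sketch of the in-place, variable-length record idea: every value's bytes are appended into one contiguous buffer with a small tag-and-length prefix, so serializing the record is a single contiguous copy instead of a pointer chase. The names and the exact encoding are assumptions for illustration, not the real record format.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string_view>
#include <vector>

// Illustrative variable-length record: values live in place, back to back,
// each prefixed by a 1-byte type tag and a 4-byte payload length.
class VarLenRecord {
public:
    enum : uint8_t { kInt = 1, kStr = 2 };

    void appendInt(int64_t v)          { put(kInt, &v, sizeof v); }
    void appendStr(std::string_view s) { put(kStr, s.data(), static_cast<uint32_t>(s.size())); }

    // Serialization is trivially the whole buffer: no traversal, no pointers.
    const std::vector<uint8_t>& bytes() const { return buf_; }

private:
    void put(uint8_t tag, const void* p, uint32_t n) {
        buf_.push_back(tag);
        const uint8_t* len = reinterpret_cast<const uint8_t*>(&n);
        buf_.insert(buf_.end(), len, len + sizeof n);   // length prefix
        const uint8_t* data = static_cast<const uint8_t*>(p);
        buf_.insert(buf_.end(), data, data + n);        // payload, in place
    }
    std::vector<uint8_t> buf_;
};
```

Because nothing points out of the buffer, freeing or shipping the record over the network touches one allocation instead of one per nested value.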
F: We are relying on the PMR facilities of C++, but for the time being we are introducing this module ourselves. With memory management we can avoid the allocation and deallocation of very small memory pieces that are scattered randomly in the main memory; we can reduce the memory fragmentation, and we can also improve the memory allocation performance for small memory pieces. In the query engine there are actually a lot of places where we frequently use small memory pieces.
F: So this can be improved significantly by the memory management, and we want to return all the resources back to the memory arena at the end of a very big query, instantaneously, so that the memory resources can be released and used by other queries.
Currently, in our current implementation, we have this problem that we release memory at a very slow pace, because for each data element we are using a lot of out-of-place memory pointers, and all of these pointers are basically pointing to random places within the memory.
F: So it's very slow to release all of them. By removing that, and releasing all the memory of the memory chunks, all together, back to the memory arena, we can achieve instantaneous memory release, which helps us reduce the memory pressure after we have processed a very big query. This function is actually very necessary for us to reduce the risk of out-of-memory crashes. And we want to support the processing of arbitrarily large queries with a bounded memory capacity.
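The arena behavior described here, no piecemeal frees during execution and one instantaneous release at the end, is what C++17's `std::pmr::monotonic_buffer_resource` provides out of the box. The sketch below uses a counting upstream resource to make that observable; it is a generic illustration, not NebulaGraph's memory manager.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <string>
#include <utility>
#include <vector>

// Counts how often the upstream allocator is hit, to make the arena's
// "free everything at once" behavior observable.
struct CountingResource : std::pmr::memory_resource {
    std::size_t allocs = 0, deallocs = 0;
    void* do_allocate(std::size_t n, std::size_t a) override {
        ++allocs;
        return std::pmr::new_delete_resource()->allocate(n, a);
    }
    void do_deallocate(void* p, std::size_t n, std::size_t a) override {
        ++deallocs;
        std::pmr::new_delete_resource()->deallocate(p, n, a);
    }
    bool do_is_equal(const std::pmr::memory_resource& o) const noexcept override {
        return this == &o;
    }
};

// Builds many small heap strings inside an arena; returns {deallocations
// seen while the "query" ran, deallocations after the arena was destroyed}.
inline std::pair<std::size_t, std::size_t> arenaDemo() {
    CountingResource upstream;
    std::size_t duringQuery = 0;
    {
        std::pmr::monotonic_buffer_resource arena(&upstream);
        std::pmr::vector<std::pmr::string> values(&arena);
        for (int i = 0; i < 1000; ++i)  // long enough to defeat SSO
            values.emplace_back(("property-value-" + std::to_string(i)).c_str());
        duringQuery = upstream.deallocs;  // monotonic arena never frees piecemeal
    }  // ~arena: all chunks go back to the upstream resource at once
    return {duringQuery, upstream.deallocs};
}
```

In the refactored design, releasing a whole chunk list back to the arena plays the same role: one bulk release at query end instead of thousands of pointer-chasing frees.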
F: This can also be done once we have made the binding table easily durable. Okay. So all of these optimizations are what we are doing on the current row-wise layout of the binding table. By row-wise we mean we are storing the records one by one, and within each record we store all their properties all together. And we are also going to...
G: So, can I ask a quick question while he's coming back online? Yep. So, what's the story on the backup?
B: Backup? Yeah, you mean the backup and restore? Yeah.
B: Yeah, now the backup and restore supports object storage and local storage, but I'm not sure if we support HDFS; S3 is supported. So basically, there is an agent there to help you manipulate the underlying SST files and then either place them into its local file system or push them to the object storage, S3.
B: We can see your screen now. Can you... I was talking; now we cannot hear you.
B: Oh, he will try to reboot the laptop to finish, but before that we can continue our conversation, our discussion. Posh, okay, sorry, so your question was that something is wrong with your backup and restore process?
G: So, on my standalone cluster I tried to backup and restore, and it did not work. Then I reached out in the community forums, and I saw a response saying that currently backup-restore is broken, telling us to wait for a couple of releases, or at least the next release, so I'm trying to check if it has been restored.
B: I don't have this context. Are you doing it on top of Kubernetes, or a bare-metal deployment? On top of Kubernetes? Oh yeah, that part isn't ready yet; that's a known issue, because, you know, the current implementation of the agent assumes it's running on a bare-metal operating system together with the services, the meta and storage processes, and when the agent is handling this restore request, the things that, you know, the control chain cannot be...
B: ...you know, implemented in the current implementation. But that is on our roadmap, and we are actually working on it, so we aim to, you know, make it work in the next season, so in two or three months. Okay, yeah. But before that, you can even do the backup, as I recall, but the restore has issues.
B: Before the official support, where, you know, everything is polished for the containerized deployment of backup and restore, maybe you have to handle it from the underlying snapshot, because backup-restore is actually a higher abstraction based on the snapshot; it helps us do a lot of, you know, dirty work: copy and paste, file movements, things like that.
B: So, in theory, it's still doable: if we do the snapshot thing, and restore from the snapshot on our own, it's still doable, I think. But we have to, yeah, we have to manipulate things inside the corresponding volume: you have to mount it somewhere, or you have to ensure the pod is runnable and you can access that file system, and you ensure the snapshot binary is there, and you follow the snapshot procedure in the documentation. So, ideally, you can do that.
B: Welcome back; we can hear you now.
F: ...the computer, and I'm not sure what's going on today, sorry. So, yeah, yeah, can you see my slide? Yes.
F: Clear? Okay, okay, let's, let's continue. So, basically, we are building a very large linked list linking a set of chunks. For each column there will be a different number of column chunks, and we support both a dense format and a sparse format at the same time. For our sparse format,
F: if an attribute of a particular record has a null value in a column, we are going to skip this record within that column, so that we can reduce the overall memory consumption. But for the dense layout, we are storing all the attributes of all the records in all the columns, regardless of whether it's a null value or it has some non-null value.
F: So we support those formats, and within a column table there is a linked list of column chunks, as shown in the previous slide, and within each column chunk we are using a bitmap called the row-occupancy bitmap. This bitmap is going to mark, for all the records whose attributes fall within this column chunk, whether each record has a null value or a non-null value in this column chunk. So, if some bit is zero, it means this record does not have a real value, and we actually skip its physical storage;
F: but it still exists logically in the data storage. And if some bit is one, true, it means we have a physical, real value for that attribute of that record, and that value is stored within this column chunk. So, by using this row-occupancy bitmap, we can deliver the same functionality in the sparse data layout as in the dense data layout.
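The null-skipping scheme can be sketched as follows: a bitmap records which logical rows have a physical value, only those values are stored, and a lookup ranks the bitmap to find the physical slot. This is an illustrative toy over int64 values, not the actual chunk code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Toy sparse column chunk: a row-occupancy bitmap marks which logical rows
// have a physical value; null rows exist logically but occupy no storage.
class SparseColumnChunk {
public:
    void append(std::optional<int64_t> v) {
        if (rows_ % 64 == 0) words_.push_back(0);
        if (v) {
            words_[rows_ / 64] |= uint64_t{1} << (rows_ % 64);
            values_.push_back(*v);           // only non-null values stored
        }
        ++rows_;
    }
    std::optional<int64_t> get(std::size_t row) const {
        if (row >= rows_ || !test(row)) return std::nullopt;
        std::size_t physical = 0;            // rank: set bits before `row`
        for (std::size_t i = 0; i < row; ++i) physical += test(i);
        return values_[physical];
    }
    std::size_t logicalRows() const  { return rows_; }
    std::size_t physicalRows() const { return values_.size(); }

private:
    bool test(std::size_t r) const { return (words_[r / 64] >> (r % 64)) & 1u; }
    std::vector<uint64_t> words_;
    std::vector<int64_t>  values_;
    std::size_t rows_ = 0;
};
```

A production version would compute the rank with popcount over whole 64-bit words rather than the linear bit scan used here.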
F: In the dense data layout we don't need this bitmap, because we are basically storing all the attributes of all the records in all the chunks. And because we are allowing users to use various data types, and we want to store the data types next to their data, we have introduced a data structure called the type vector. A single type vector has a fixed length, which is about 32 bytes or 64 bytes.
F: We fix its size so that we can easily manipulate it using SIMD intrinsics, like AVX-512 or AVX2 256-bit and so on; we can use SIMD intrinsics to accelerate the operations within a single type vector. And within a type vector we record the memory offsets for the attributes of the same data type within the column chunks. For example, if we have inserted 100 attributes of the same int32 data type, we only need to mark its starting offset and its end offset.
F: We only need to mark the data type once, which is maybe only four bits, for example, for the int32 data type, and we only need to mark its starting offset and its end offset. So, by marking this, we know that all the data between these two offsets has the data type of some code, which is, say, int32,
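The run-length idea behind the type vector, one (type, start offset, end offset) entry covering a whole run of same-typed values instead of one tag per value, can be sketched like this. The encoding is illustrative and does not reproduce the fixed 32/64-byte SIMD-friendly layout from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy run-length type index: each run says "bytes [begin, end) in the chunk
// hold values of this type". Appending a same-typed value extends the last
// run, so 100 int32 values cost one entry instead of 100 per-value tags.
struct TypeRun { uint8_t type; uint32_t begin; uint32_t end; };

class TypeVector {
public:
    void append(uint8_t type, uint32_t bytes) {
        if (!runs_.empty() && runs_.back().type == type) {
            runs_.back().end += bytes;       // extend the current run
            return;
        }
        uint32_t start = runs_.empty() ? 0 : runs_.back().end;
        runs_.push_back({type, start, start + bytes});
    }
    uint8_t typeAt(uint32_t offset) const {  // type of the byte at `offset`
        for (const TypeRun& r : runs_)
            if (offset >= r.begin && offset < r.end) return r.type;
        return 0;                            // 0 = unknown / out of range
    }
    std::size_t runCount() const { return runs_.size(); }

private:
    std::vector<TypeRun> runs_;
};
```

Fixing each on-disk type-vector entry to a known width, as the talk describes, is what then allows scanning runs with SIMD comparisons.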
F: ...stored in a sequential way. So, within a data chunk, because we are allowing different data types, an attribute does not have a fixed size; some attributes may be larger than the rest. This is trouble if we are inserting data into a data chunk, because we do not know how much memory space a record will end up using until after we have inserted it.
F: So we borrowed this idea from Postgres, which is to insert the data from the head to the tail, and insert the type vectors from the tail to the head. By doing this we can always append data to the existing attributes at the end, because this offset is marked in the chunk header; so we can always append data to the attributes, and we can always append type vectors after the existing type vectors, because we marked that offset in the chunk header as well.
F
So when these two offsets meet in the middle of the chunk, we know that the chunk is full and we need to allocate a new chunk for incoming data. This is the basic data structure for the column chunk, where we insert the attributes from the head to the tail and the type vectors from the tail to the head, and this format is what we learned from Postgres.
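The PostgreSQL-style layout described above can be sketched in a few lines. This is a deliberately simplified model (the real chunk header holds more than two offsets): attribute data grows down from the head, type vectors grow up from the tail, and the chunk is full when the two offsets would cross.

```python
# Sketch of the head/tail page layout: data grows from the head, type
# vectors from the tail; both offsets live in the chunk header, and the
# chunk is full when they would meet.
class ColumnChunk:
    def __init__(self, size):
        self.size = size
        self.data_off = 0        # head offset: next byte for attribute data
        self.type_off = size     # tail offset: type vectors grow downward

    def append(self, data_len, type_vec_len):
        """Try to append a value + its type-vector bytes; False if full."""
        if self.data_off + data_len > self.type_off - type_vec_len:
            return False         # offsets would cross: chunk is full
        self.data_off += data_len
        self.type_off -= type_vec_len
        return True

chunk = ColumnChunk(size=64)
print(chunk.append(16, 8))   # -> True  (head at 16, tail at 56)
print(chunk.append(32, 8))   # -> True  (head at 48, tail at 48)
print(chunk.append(4, 4))    # -> False (would cross; allocate a new chunk)
```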
F
It's actually quite useful to support the storage of data with various lengths, and if we are allowing users to use data with different types, we are basically allowing them to use data with different lengths. So we use this setup, and we are also going to provide an index for this in-memory binding table to accelerate point lookups and range lookups.
F
In this slide, I show an example of our ongoing work on providing a hash table index for the binding table. Within this index, we are using several techniques to reduce the memory footprint — the memory consumption of the hash table itself, as well as the memory used to store all the hash keys, because a hash key could be very long, and these long hash keys are going to be a problem if we are storing a lot of entries in the hash table.
F
So we use this fingerprint technique to hash a key, as a string, into a fingerprint, and we use this fingerprint as the key for the hash table, in which we use a trie-based implementation, which is very memory-efficient for a hash table. After the hash table, this key links us to a payload, and that payload is stored in the column chunks of the binding table. By using this set of data structures, we can achieve very fast point lookups.
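The fingerprint idea can be sketched as below. The hash function choice (BLAKE2b here) and the payload-location format are assumptions for illustration, not NebulaGraph's actual choices: a long string key is folded into a fixed 64-bit value, and the index stores only fingerprints and small payload locations instead of every full key string.

```python
# Sketch of the fingerprint technique: fold a key of any length into a
# fixed-width value, then index fingerprint -> payload location instead
# of keeping long key strings in memory.
import hashlib

def fingerprint(key: str) -> int:
    """Fold a string key into a fixed 64-bit fingerprint."""
    digest = hashlib.blake2b(key.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")

# Index maps 8-byte fingerprints to (chunk_id, offset) payload locations
# inside the binding table's column chunks.
index = {}
long_key = "player" * 50 + ":42"       # a long key, stored as 8 bytes
index[fingerprint(long_key)] = (3, 128)

print(index[fingerprint(long_key)])  # -> (3, 128)
```

One caveat worth noting: two different keys can share a fingerprint, so a real index has to verify the match, for example against the full key kept with the payload.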
E
Anyway, this is what I suggest: we have probably finished the conversation with Posh about the BR part, and we need to work with Xuntao to make a recording of his presentation and re-edit the session so that it can be shared afterwards.
B
Guys, Xuntao said it's almost finished, so maybe this topic can come to an end now. Do you have any questions for Xuntao, so I can, you know, ping him?
B
Posh actually asked where we are using the binding table. So maybe you can give a brief overview.
F
Yeah, I actually have a slide showing that. Let me try to — you can hear me, right?
F
Yeah, I have a slide showing where we are using the binding tables.
Here on the distributed side, we are using the binding table to gather the data that we loaded from the storage, and we transfer this data back to graphd. Between all the operators, and also within each operator, they consume the data entries from the binding table and produce their results back into the binding table. So it's basically a data container that we use throughout the entire query processing procedure, yeah.
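That "data container threaded through the pipeline" role can be sketched as follows. The operator names and signatures are illustrative, not the real NebulaGraph execution API: each operator reads its input rows from a binding table and writes its output rows back, so the same structure carries data across the whole plan.

```python
# Sketch of the binding table's role in the query pipeline: every operator
# consumes entries from a binding table and produces a new one.
def scan(storage_rows):
    """Leaf operator: load rows from storage into a fresh binding table."""
    return list(storage_rows)

def filter_op(table, pred):
    """Consume entries from the input table, produce a filtered one."""
    return [row for row in table if pred(row)]

def project(table, cols):
    """Keep only the requested columns in each entry."""
    return [{c: row[c] for c in cols} for row in table]

# graphd chains the operators; each hop is binding-table in, binding-table out.
table = scan([{"name": "a", "age": 3}, {"name": "b", "age": 7}])
table = filter_op(table, lambda r: r["age"] > 5)
table = project(table, ["name"])
print(table)  # -> [{'name': 'b'}]
```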
B
Okay, so Xuntao, I have a question regarding what you're trying to refactor. One of the points you highlighted was avoiding the rebuild. On this point, do you mean that you removed the pointers from the table object? Is that what, you know, saves us from the rebuild — because everything is referenced with an offset instead of a pointer?
B
Yeah, so we can, you know, pass this data as-is. We don't have to, you know, rebuild everything behind the pointers each time it's passed around, as we would otherwise have to. Okay, understood.
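The offset-versus-pointer point can be made concrete with a small sketch (illustrative only; the real layout is the type-vector/column-chunk format described earlier): a structure that refers to its values by offsets into one buffer can be shipped between processes as raw bytes and used directly, whereas pointer-based structures must be rebuilt on the receiving side because pointers are only valid in the sender's address space.

```python
# Sketch: offsets into a single buffer survive serialization unchanged,
# so the receiver can use the data as-is, with no pointer rebuild.
import struct

def build(values):
    """Serialize int32 values plus an offset table into one buffer."""
    data = b"".join(struct.pack("<i", v) for v in values)
    offsets = [i * 4 for i in range(len(values))]
    return data, offsets

def read_at(data, offset):
    """A reader on any machine can use the offset as-is, no rebuild."""
    return struct.unpack_from("<i", data, offset)[0]

data, offsets = build([10, 20, 30])
# `data` can be shipped between storaged and graphd unchanged:
print(read_at(data, offsets[2]))  # -> 30
```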
B
Thank you, thank you.
B
I should have run a rehearsal before we did this — I'm sorry for this, it's my fault. Thank you so much, Dr. Xuntao. It's quite, you know, hardcore stuff, but I think it's quite insightful for us, and for the people joining our community later, to understand more of NebulaGraph and how it works underneath.
B
Thank you so much. And it's okay, it's okay! Xuntao said sorry about this in the chat — it's not your fault!
B
Okay! Maybe we come to — I will share my screen — we come to the final part, the sync discussion. We can continue, Posh, if you have any more to discuss, and Alexander as well; we can do that now, and afterwards we can close this, yeah.
G
You know, I'm going to make another try on the backup and restore and get back to you with my latest experience. But one thing I would like to mention for sure is that we are using the Nebula Helm charts for deploying the cluster on Kubernetes. So the cluster — when I say... can you hear me? Yes? Okay. So the issue with the cluster is: if I wanted to attach additional volumes or drives to the pod — is there any way I can attach them?
B
No, no — you know, we're doing it Kubernetes-fashion, so—
B
Although underneath it's the provisioned volume, we don't suggest users actually manipulate things at the volume level, because the pods of the storage service are stateful and they are bound to specific volumes. So we cannot, you know, take over manually — it would break things, or introduce conflicts between our manipulation and the operator, because the operator, in a way, you can treat as another administrator.
B
You know, it manipulates the cluster, and you only talk to it through the CRD, the YAML file. So I don't recommend it. Previously we had a conversation about you wanting to leverage something like what we do with NFS in a non-cloud deployment, but in Kubernetes this isn't the recommended way of doing that.
G
So — I want to understand this more. Let's say, for example, the PVC that you create for the storage: do you use single read-write, or read-many/write-many? What kind of configuration do you have?
B
It doesn't assume anything about, you know, how the storage is laid out underneath. We actually suggest users not use RAID; instead, just use a PV bound by the cloud provider, or even, in certain cases, you can use local storage, because you can optionally, you know, enable replicas at the graph-space level. So that's still acceptable, but I'd just recommend using a fast backend PV provider.
G
Cool, and that's what we have done: we attached premium SSD disks — P-series storage, SSD storage. But the moment we attach an SSD disk, it is not read-many-write-many; it is single read-write only, not read-many.
G
Right, right — so that means one pod can read and one pod can write to that SSD storage. Instead, if we use — let's say, because we are all on Azure, we are not using Amazon, AWS, at all — if I use an Azure NFS drive or Azure Files drive, it is write-many read-many, so I can attach one drive and have as many pods as needed write to it.
G
But the thing is, I'm worried about the performance of Nebula using such file storage as its disks versus SSD.
B
Yes — I don't recommend enabling the many-read mode or any hack like this. The reason is — well, actually we can even do that, because, you know, the operator itself is open source; you can customize it, and you can even customize all the resources in your Kubernetes cluster. By default it is single read-write.
B
Multiple readers — I don't recall exactly, but you can do that. I don't recommend it, though, because you cannot guarantee the performance. For example, you know, although we are just using a cloud PV provided by Azure or AWS, underneath there could be some kind of affinity — maybe a certain volume stays together, closer to your pod, either in terms of NUMA-node awareness or, you know, network-switch closeness.
B
So when one disk maps to a given pod or a given node, it's more guaranteed, from my perspective, and I don't think it's worth taking the risk here, because I think a database is quite critical — a mission-critical situation. So it's not worth it to hack such multiple-read approaches. Another reason is, you know, that in this whole operator concept, the disks are bound, logically, one per pod instance.
B
So if you, you know, enable some sort of multiple read, you have to break this concept, so you have to change a lot of things — I don't think it's even worth it. But anyway, if we know what we are doing, we can try it, and it's actually quite easy to enable, you know, the PV in multiple-read mode.
G
Because probably the simplest method here could be — if we just think about it, right: let's say, for example, the Helm charts allowed an optional additional drive to be mounted on the storage pods. Let's think about it, right? If I can mount an additional drive on storage, then all I have to do is run the backup and copy it manually, for now, from the local storage to the additional volume — I can move the data, right?
B
I don't think that's quite the case, but generally you can somehow do it, because, yeah, the files are there — but there is still some running context there. You mean there could be some corruption if you just copy on the fly — but that's more doable. Actually, previously I also looked into how we set this multiple-reader policy in the operator code, and I actually found where we can change it, and I—
B
We can enable, you know, the snapshot thing — maybe we don't just directly do the copy-pasting, but we can leverage the NebulaGraph snapshot. We can, like, mount another, you know, tooling pod, with some other handy tools inside this pod, enable actual multiple read of those volumes, and mount them together to that one pod. So we can easily perform, you know, a more hacky snapshot thing. Yeah, I think that's totally doable. Yes.
B
If it's considered to be, you know, a more general use case, we could even try to expose this multiple-read policy in the upstream operator repository — by default it's single read, but we can try that, yeah. Okay.
G
Yeah, yeah — I thought that, actually; previously you had it that way, and then you required two separate things, and that is very expensive — having two disks when the volume is very low — and I'm not sure I really understand, you know, why you guys made it separate.
G
Okay — and one more thing. When we write to Nebula storage, we should really use Nebula Exchange, but currently we are using INSERT operations to insert data into NebulaGraph, and what I'm seeing is that the number of SST files on the storage is huge.
B
Oh yeah, that's the LSM tree — you know, the way writes are implemented. You will have something like a 3x-or-more multiplier when you're writing initially — that's just the nature of the LSM tree — but after compaction it will shrink to the expected, you know, multiple of the storage.
G
I tried — I tried to compact. Even then, I see a huge number of SST files. What is the math here? How many SST files am I supposed to see?
B
Xuntao, do you have a quick idea on this? As I recall, we have some configurable settings for the final numbers, so I can check and come back to you later. Actually, we talked about this configuration in our discussions last week — regarding the file size and the file numbers. So it's configurable, actually, but I don't have an idea of how many we should expect by default — I don't know right now — but we can talk offline later, yeah, yeah.
G
Okay, okay — that would be another wonderful thing that could help us.
G
Yeah — node2vec. When I say node2vec, it is — let me put it this way: we have our graph; so far in our POC we are using it for a knowledge graph, a medical knowledge graph, right? So when we do our graph analytics — node2vec, using your implementation — yeah, it runs out of space, because you are using the GraphX library, the Spark GraphX library, yeah, and that is very, very... I don't know how much.
B
Yes. The advantage of GraphX is that almost every data engineering team has competence in this infrastructure and in, you know, this data stack, but it's not perfect — it does not have a good reputation for, you know, resource and memory utilization, and that's why we provide an alternative in our Enterprise offering, Nebula Analytics. But you can request a trial of Analytics to see, you know, how the memory utilization compares to the GraphX-based algorithms.
B
Yes, it's basically the vanilla GraphX implementation that—
B
—yeah, because it's usable, so a lot of community users can, you know, benefit from it. But when it comes to, you know, resource utilization requirements, it's better to go with Nebula Analytics. Yeah.
G
The next meeting is four weeks later, okay — so I'll come back by that time and get back to you, sure. Thank you, okay! Thank you so much, thanks.
D
I have a question, also about graph algorithms. We are quite new to NebulaGraph; I'm familiar with TigerGraph and Neo4j, and I have plans to create something with a Turing-complete language or MapReduce inside the database, to be able to run algorithms not, as—
D
—as the previous speaker said, on separate infrastructure with GraphX, outside of the database.
D
Yes, I hear that — I know that there is a graph analytics product, Nebula Analytics, but one of the main points for analytics is creating your own graph algorithms. Sometimes you need to modify some standard algorithms; in many cases you need to tune them to work with a particular business case.
D
So how do you solve this now, or how do you plan to solve it in the future? I'm just trying to understand the path from the business task to a running algorithm on the graph database, using the graph features, etc. I want to understand this question.
B
You know, a native way in graphd — for now we don't have that kind of offering. But Analytics itself — because, you know, we have a separation in the design between the computation, the graph layer, and the storage layer — that actually brings the benefit that you can treat Nebula Analytics, in some way, as our native analytical platform. But as I understand it, you're expecting, you know, to do some more procedures within the query you write.
B
Instead of calling an analytical platform, you want the flexibility to just write, like, Neo4j APOC — you know, to do some more complex analytical things in NebulaGraph itself, instead of on another platform. Is that your main concern?
D
For example, there are some standard algorithms, like PageRank or centrality, and you need to modify one — and in some cases, not only the parameters of these algorithms that are exposed, but some logic — you change the logic of the algorithm. Yeah, yeah.
B
Now, actually, if GraphX can fit your requirements, you know, from the resource utilization perspective, then you can actually use our nebula-algorithm package. You can call it from the command line with spark-submit, but in that case you lose—
B
—the capability to, you know, modify or change how the PageRank behaves. But if you are coding it in Spark, you can do the Scala thing to, you know, leverage it with more modifications; you can even fork the PageRank and change some of how it iterates — that's doable. And if you have a Scala background — which I don't — that will be easier, right? Even with a Python background, you can use PySpark; that's still doable.
B
On the other hand, one of our community users is maintaining a fork of NebulaGraph in which they introduce something similar to Neo4j APOC, and they will upstream it to our community in the upcoming months. I'm not sure if that can fit your requirement, but the con of that implementation is that it's not in a MapReduce fashion: if you want to leverage a lot of resources, you have to have one single huge graphd to leverage it. I'm not sure if that will potentially—
D
Yes — what one would expect from the database is to work with the full graph, with any number of nodes, vertices, etc., and to build the steps using all possible means. You see, the implementation of the algorithms in GraphX is quite interesting, because you need to use the Spark infrastructure — Spark map; you need to map the data, etc.
D
Even a simple algorithm can take a thousand lines of code, and you typically end up solving infrastructure questions — like copying data from here to there, etc. — rather than focusing on the algorithm.
B
It has a lot of footprint and additional layers, and it's not self-contained — I got it, yeah.
B
Do we have any, you know, inside ideas on how we would like to, you know, bring a more smoothly integrated way to, you know, call any algorithms inside NebulaGraph? Do we have such plans for now? Can you hear—
F
—me? Yes? Okay, yeah — we have talked about that. We are planning to provide a set of APIs to expose our kernel functions to an algorithm layer, and for whatever graph algorithms you want to run, you can call these APIs for how to load the data, how to write it, how to manage it, and all kinds of operations, and the algorithm can utilize NebulaGraph to do its own job.
F
So this is our plan. We'd like to use this design for generic graph algorithms, and we also want to extend it to graph neural network tasks.
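The plan described above can be sketched in a purely hypothetical form — no such NebulaGraph API existed at the time of the talk, and every name below is invented for illustration. The idea is that the kernel exposes load/manage calls, and a user-supplied algorithm is written only against those calls:

```python
# Hypothetical sketch of the planned API layer: kernel functions to read
# graph data are exposed, and a custom algorithm builds only on them.
class GraphAPI:
    """Stand-in for the kernel-function APIs described in the talk."""
    def __init__(self, edges):
        self._edges = edges            # directed (src, dst) pairs
    def neighbors(self, v):
        return [d for (s, d) in self._edges if s == v]
    def vertices(self):
        return sorted({x for e in self._edges for x in e})

def degree_centrality(api: GraphAPI):
    """A user-supplied algorithm written only against the exposed API."""
    return {v: len(api.neighbors(v)) for v in api.vertices()}

api = GraphAPI([(1, 2), (1, 3), (2, 3)])
print(degree_centrality(api))  # -> {1: 2, 2: 1, 3: 0}
```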
D
Yeah, I can bring some examples and ideas. For example, we have Bitcoin: the first task is to load all transactions into the graph database, and the second task — for example, for anti-money laundering — is to find a cycle between nodes: for example, the first node transfers money to a second node, etc., and there can be, say, ten or more nodes in this loop, and we need to find them.
B
Yeah, I got it — I got you. Because, you know, I'm a GCP fan and I watched how BigQuery evolved, and I noticed a couple of months ago they announced, you know, more capabilities — they can do a lot of, you know, machine-learning things just in SQL; it's fantastic. We will bring this idea to the team — and Xuntao is listening. And another thing: you're mentioning that you want to do the node-and-loop thing — are you saying you are doing something like air—?
D
No, it's like money transfer: one person transfers money to a second, the second to a third, the third to a fifth, etc., and this loop can take multiple steps, but in the end the money comes back to the first person.
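The anti-money-laundering pattern just described — a chain of transfers that returns money to the originator — can be sketched with a plain depth-first search. In a graph database this would instead be a variable-length path query; the function below is only an illustration of the task, with an assumed hop limit.

```python
# Sketch: find one transfer loop that starts and ends at `start`.
def find_cycle_from(transfers, start, max_hops=10):
    """Return one money-transfer loop through `start`, or None."""
    adj = {}
    for src, dst in transfers:
        adj.setdefault(src, []).append(dst)
    stack = [(start, [start])]             # DFS over simple paths
    while stack:
        node, path = stack.pop()
        for nxt in adj.get(node, []):
            if nxt == start and len(path) > 1:
                return path + [start]      # money came back: a loop
            if nxt not in path and len(path) < max_hops:
                stack.append((nxt, path + [nxt]))
    return None

transfers = [("A", "B"), ("B", "C"), ("C", "E"), ("E", "A"), ("C", "D")]
print(find_cycle_from(transfers, "A"))  # -> ['A', 'B', 'C', 'E', 'A']
```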
D
Yes, it's about transactions, and the task is anti-money laundering — in the financial sector, in fintech.
B
Oh yeah — actually, Xuntao, can you still hear us? We are actually bringing ACID transactions to NebulaGraph later, and Xuntao is one of the guys driving this design. He even gave a talk on how we want to design this at our user conference several weeks ago.
D
You mean transactions in the classical sense — you mean confirmation of the transaction, I believe.
D
Okay — my question was about an analytics task, where you need to find the loop between these transactions.
B
Maybe I don't have, you know, the needed domain knowledge on this, but maybe you can send this question to us in the GitHub discussions or issues and, you know, share your stories on it, so we can learn from this particular question or requirement. Okay, okay, yeah, yeah.
B
Do you have any other questions? No? Okay — and feel free to, you know, reach me on Slack as well. Oh, Xuntao is actually typing the answer on transactions; I'm reading it: for transactions, we're working on fully ACID-compliant transaction support, and it's going to appear in the next generation, the next-gen, of NebulaGraph, probably in 2023, yeah.
B
And that's for the transactions. For the analytics questions and requirements, you can, you know, share them in the discussions or in Slack, and you can reach me in Slack as well when needed, yeah.
B
And — okay, so I think we can call it a day. It's a quite long meeting today, and thank you, everyone. See you next month!