From YouTube: Geo-distributed Metadata Management System
So, let's first take a look at the current cloud-native metadata management system, etcd, which is the most popular solution. First, we will discuss the impact of Sky Computing on distributed systems. Then we will show how we deploy the etcd service in the multi-cluster scenario and analyze the disadvantages of these deployments.
Message passing between servers is not a big deal: in most cases it takes less than one millisecond to send a message, unless the throughput is very high. As a result, the network is usually not the bottleneck of the whole system. But the situation changes as Sky Computing arrives: more and more cross-cloud communication appears, and it is usually unavoidable. In that case, sending a message can take up to hundreds of milliseconds, and the bandwidth is limited.
However, the performance of the etcd service itself drops quite a lot, to an almost unacceptable level. So what's wrong? The answer is the server-side internal message passing. Usually these messages are triggered by the underlying consensus protocol, Raft, which is used for data consistency. Now it's a good chance to take a look at the Raft protocol.
In Raft, a request travels over two channels: from the client to the leader, and from the leader to its followers for replication. When both channels are cross-cluster, the request suffers the high latency twice. Now we know etcd cannot work well in multi-cluster or geo-distributed scenarios.
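To make that cost concrete, here is a back-of-the-envelope model (our own sketch, not anything from the talk) of a single etcd write when the client, the Raft leader, and the followers are separated by cluster boundaries:

```rust
/// Rough latency model for one etcd write backed by Raft. All values are
/// round-trip times (RTT) in milliseconds; the numbers below are the
/// cross-cloud magnitudes mentioned above, not measurements.
fn etcd_write_latency(client_to_leader_rtt: u64, leader_to_quorum_rtt: u64) -> u64 {
    // The client's request-reply with the leader is one round trip; before
    // replying, the leader must replicate the entry to a quorum of followers
    // and collect their acks, which is a second full round trip.
    client_to_leader_rtt + leader_to_quorum_rtt
}

fn main() {
    // Inside one cluster both hops are around a millisecond, so Raft is cheap.
    println!("same cluster:  ~{} ms", etcd_write_latency(1, 1));
    // Across clusters both channels pay the wide-area RTT, so the request
    // suffers the high latency twice.
    println!("cross cluster: ~{} ms", etcd_write_latency(100, 100));
}
```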
Can we find a way to work around this? Xline is our answer to this question. I will first show the architecture of Xline and then analyze the underlying CURP protocol. To work with the new protocol, we also need to consider the client-side SDK and its compatibility.
If we want to achieve better performance, we need to move part of the consensus protocol to the client side. We will show that later. The green part in the graph is the main innovation of Xline: the consensus protocol, which is called CURP. CURP is a geo-distribution-friendly protocol. But how? Let's discuss it in a top-down way.
Let's take a KV store as an example. If two requests change the values of different keys, the order doesn't matter: we will always get the same final result. So exchanging the order should be allowed in this case. But if the two requests conflict with each other, changing the same key, the order does matter. For the former scenario, the ordering requirement can be skipped. The next question is: how do we know whether a request can be reordered or not?
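A minimal, self-contained illustration of this commutativity argument (our own sketch, not code from the talk): applying two puts to different keys in either order yields the same store, while two puts to the same key do not.

```rust
use std::collections::HashMap;

/// Apply a sequence of (key, value) puts to an empty KV store.
fn apply(puts: &[(&str, &str)]) -> HashMap<String, String> {
    let mut store = HashMap::new();
    for (k, v) in puts {
        store.insert(k.to_string(), v.to_string());
    }
    store
}

fn main() {
    // Different keys: the two orders commute and the final state is identical.
    assert_eq!(
        apply(&[("x", "1"), ("y", "2")]),
        apply(&[("y", "2"), ("x", "1")]),
    );
    // Same key: the requests conflict and the final state depends on order.
    assert_ne!(
        apply(&[("x", "1"), ("x", "2")]),
        apply(&[("x", "2"), ("x", "1")]),
    );
    println!("reordering is safe only for non-conflicting requests");
}
```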
The answer is a speculative execution pool. Every server checks whether an incoming request conflicts with the requests already in the pool. If it does, the server replies to the client, telling it there is a conflict; otherwise, the request is inserted into the pool. When we hit a conflict, the backend protocol helps: the request will also be sent to the backend protocol, which broadcasts the request to all servers and decides the global ordering. The backend protocol can be any consensus protocol, such as Raft or Multi-Paxos.
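A drastically simplified sketch of that server-side pool (types and names are ours, not Xline's internals): check an incoming key against the pending requests, admit it when there is no conflict, and report a conflict otherwise.

```rust
/// A server's speculative execution pool, reduced to its essence: requests
/// wait here until the backend protocol commits them, and a new request is
/// accepted on the fast path only if it touches no key already in the pool.
struct SpecPool {
    pending: Vec<String>, // keys of speculatively executed, uncommitted requests
}

enum Reply {
    /// No conflict: the request joined the pool and the fast path may succeed.
    Accepted,
    /// Conflict: the client must wait for the backend protocol to decide a
    /// global order for this request.
    Conflict,
}

impl SpecPool {
    fn handle(&mut self, key: &str) -> Reply {
        if self.pending.iter().any(|k| k.as_str() == key) {
            Reply::Conflict
        } else {
            self.pending.push(key.to_string());
            Reply::Accepted
        }
    }
}

fn main() {
    let mut pool = SpecPool { pending: Vec::new() };
    assert!(matches!(pool.handle("x"), Reply::Accepted));
    assert!(matches!(pool.handle("x"), Reply::Conflict)); // same key conflicts
    assert!(matches!(pool.handle("y"), Reply::Accepted)); // different key is fine
}
```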
Here, for simplicity, we choose Raft. We can see the backend protocol is a second barrier, making sure every request eventually has its own position in the global ordering. To summarize the CURP protocol: it tries the request optimistically first, and if that fails, it lets another consensus protocol handle the rest.
Here the consensus protocol is Raft. From the above description, we can tell that the CURP protocol can complete consensus in one round trip if there are no conflicting requests; even in the worst case, two round trips are enough. As a practical user of CURP, Xline should provide additional information to tell whether two requests conflict with each other or not.
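As a sketch of what that additional information can look like (the names below are ours, not Xline's published interface): each command declares the keys it touches, and two commands conflict exactly when their key sets intersect.

```rust
/// Hypothetical shape of user-supplied conflict information for CURP.
trait ConflictCheck {
    /// Keys this command reads or writes.
    fn keys(&self) -> Vec<String>;

    /// Two commands conflict exactly when their key sets intersect.
    fn is_conflict(&self, other: &Self) -> bool {
        let mine = self.keys();
        other.keys().iter().any(|k| mine.contains(k))
    }
}

struct Put {
    key: String,
}

impl ConflictCheck for Put {
    fn keys(&self) -> Vec<String> {
        vec![self.key.clone()]
    }
}

fn main() {
    let a = Put { key: "x".into() };
    let b = Put { key: "x".into() };
    let c = Put { key: "y".into() };
    assert!(a.is_conflict(&b)); // same key: must go through the backend protocol
    assert!(!a.is_conflict(&c)); // disjoint keys: eligible for the fast path
}
```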
If users are already using the etcd service, they can switch to Xline easily, because Xline provides etcd-compatible APIs; they are listed in the table. We have implemented the KV, Lease, and Watch APIs, and the others are in progress.
The completed parts are the most useful ones, so if you want to have a taste, don't hesitate. We also provide two versions of the client SDK. If users don't want to change code, the original etcd SDK works well with Xline. The only drawback of this option is that the performance is the same as etcd's, because two round trips of message passing happen.
If you are seeking the best performance of Xline, you should use the Xline SDK, whose programming interface is very similar to etcd's.
With limited code changes, your application should work with Xline. In the Xline SDK, the client side is also a part of the CURP protocol: it sends a request to all the known servers, so one round trip is enough for lucky (conflict-free) requests.
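A condensed sketch of that client role (our own types; CURP's real rules are richer): broadcast the proposal, then count conflict-free acks against a superquorum threshold. The threshold used here is the bound commonly cited for CURP, stated as an assumption rather than Xline's exact configuration.

```rust
/// One server's fast-path answer to a broadcast proposal.
#[derive(Clone, Copy, PartialEq)]
enum Ack {
    NoConflict,
    Conflict,
}

/// The CURP client broadcasts a proposal to all known servers. If a
/// "superquorum" accepted it speculatively, one round trip settles the
/// request; otherwise the client falls back to the slow path and waits for
/// the backend consensus protocol to order it.
fn fast_path_succeeds(replies: &[Ack]) -> bool {
    let n = replies.len();
    // Assumed threshold: with n = 2f + 1 servers, the CURP fast path is
    // commonly described as needing f + ceil(f / 2) + 1 conflict-free acks.
    let f = (n - 1) / 2;
    let superquorum = f + (f + 1) / 2 + 1;
    replies.iter().filter(|&&a| a == Ack::NoConflict).count() >= superquorum
}

fn main() {
    // Five servers tolerate f = 2 failures; the fast path then needs
    // 2 + 1 + 1 = 4 conflict-free acks.
    let all_clean = [Ack::NoConflict; 5];
    let two_conflicts = [
        Ack::NoConflict, Ack::NoConflict, Ack::NoConflict,
        Ack::Conflict, Ack::Conflict,
    ];
    assert!(fast_path_succeeds(&all_clean));
    assert!(!fast_path_succeeds(&two_conflicts));
}
```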
Although we know Xline should theoretically outperform etcd in the multi-cluster scenario, we have to demonstrate it with tests and benchmarks.
Here's how we built the benchmark environment. We simulated the multi-cluster network latency, making the inter-server communication latency 50 milliseconds. At the same time, the client should not be in the same cluster as the servers, so the communication latency between the client and the servers is 50 milliseconds or 75 milliseconds.
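Under these settings, a back-of-the-envelope expectation (our arithmetic, not the talk's measurements) is:

```rust
fn main() {
    let inter_server_rtt = 50; // ms, simulated between server clusters
    for client_rtt in [50, 75] {
        // etcd: the client's round trip to the leader, plus the leader's
        // Raft replication round trip to a follower quorum before it replies.
        let etcd = client_rtt + inter_server_rtt;
        // Xline fast path: a conflict-free request finishes within the single
        // broadcast round trip between the client and the servers.
        let xline = client_rtt;
        println!("client RTT {client_rtt} ms: etcd ~{etcd} ms, Xline ~{xline} ms");
    }
}
```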
In the regular case, clients randomly pick a key from a 100,000-key space. As a result, there's almost no key conflict in this scenario. Xline doubles the throughput, as expected, because the latency is halved. In the real world, it's almost impossible to hit the worst case. Now we have demonstrated that Xline is geo-distribution friendly.
Here is our roadmap. Last year we implemented the major etcd APIs for Xline. We plan to add more features this year, such as persistent storage support, snapshot support, and cluster membership change support. Some minor features are not listed here, but one thing is clear: in 2023, our main task is to make Xline feature-complete. Next year, we want to make the system more robust, so we plan to bring in chaos engineering to help validate the system.