Description
Cephalocon APAC 2018
March 22-23, 2018 - Beijing, China
Yingxin Cheng, Intel Software Engineer
Hi everyone, my name is Yingxin. I'm a software engineer from Intel, constantly working on the performance analysis of Ceph. So today's topic is about identifying performance bottlenecks in a Ceph cluster. The motivation is, of course, that the bottleneck is a very important part of performance analysis, and it is a very difficult task to identify bottlenecks in a very complicated system like Ceph. I want to share a way to do it intuitively and very fast.
I also want to share the important use cases of distributed tracing. So, let's start. You can also reach me personally through this email, and we also have a demo outside at the Intel booth where you can meet us. I want to reason about the entire idea with three fundamental questions: why does performance matter, what is performance, and how to improve it? They will help us understand the ideas about how to collect better performance data, how to represent performance from the data collected, and how to do the following analysis and improvements.
So first, why does performance matter? Of course, we don't want users to feel the system is very laggy and unresponsive. It means that we need to monitor all the activities inside Ceph, and not only track all the latencies in the user requests: we should not overlook any of the costs that are directly related to the request-response process, because any of them could be the bottleneck and the cause of a bad user experience.
So, using existing tools, I think it is very tedious and time-consuming to track all the latencies from every corner of the Ceph components. I think the better way is not to collect the latencies manually, but to use distributed tracing to reveal the whole history of a request, from the request being sent to the response, then to find the critical path from the request being sent to the response, and to collect the costs on that critical path. We built up a prototype to do this. This is an example of a RADOS write request across one client and three OSDs in the cluster.
It is built based on the pure happens-before relationships between events, and the costs between these events are either in-thread execution, cross-thread, or even cross-host relationships: the costs of in-thread executions and the costs between threads and processes. We can find that, from the responding point, there is one way back to the request being sent, because by definition the critical path is the longest execution path. And because the events on it are consecutive, their costs do not overlap, which brings us a very big advantage in showing the performance of concurrent requests.
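A minimal sketch of how such a critical path could be extracted, assuming the trace has already been reduced to a directed acyclic graph of events with happens-before edges and costs (the event names, edges, and costs here are made-up illustrations, not the prototype's actual data):

```python
from collections import defaultdict

def critical_path(edges, source, sink):
    """Longest-cost path from the request being sent (source)
    to the responding point (sink).
    edges: list of (from_event, to_event, cost)."""
    graph = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for u, v, c in edges:
        graph[u].append((v, c))
        indeg[v] += 1
        nodes.update((u, v))
    # Topological order via Kahn's algorithm.
    order = []
    queue = [n for n in nodes if indeg[n] == 0]
    while queue:
        u = queue.pop()
        order.append(u)
        for v, _ in graph[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    # Longest-path relaxation in topological order.
    best = {n: float("-inf") for n in nodes}
    best[source] = 0
    prev = {}
    for u in order:
        for v, c in graph[u]:
            if best[u] + c > best[v]:
                best[v] = best[u] + c
                prev[v] = u
    # Walk back from the responding point to the request being sent.
    path, n = [sink], sink
    while n != source:
        n = prev[n]
        path.append(n)
    return list(reversed(path)), best[sink]

# Hypothetical happens-before edges of one traced write request.
edges = [("sent", "queued", 2), ("queued", "dispatched", 5),
         ("sent", "replicated", 4), ("replicated", "dispatched", 1),
         ("dispatched", "responded", 3)]
path, cost = critical_path(edges, "sent", "responded")
```

Because the relaxation follows a topological order, the returned path is the longest execution path, whose consecutive costs fully account for the request's latency.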
In order to explain this, I need to first explain what performance actually is. Performance is basically a two-dimensional concept: latency and throughput. It is very easy to understand if we want to move a group of people: if we use planes, it is very fast, so the latency is good, but a plane can only take a limited number of people, so its throughput is not very good.
On the contrary, if we use trains, a train can take a huge amount of people, so the throughput is good, but the latency is not, because a train is relatively slow. We are usually accustomed to latency-only analysis, like Ceph perf counters or collecting metrics of latencies, and we are usually accustomed to measuring costs individually, using profilers and tracers, or using Ceph's blkin, which provides distributed tracing.
With distributed tracing, we can draw two lines to represent the throughput that goes into a function and the throughput that goes out of the function. The steeper the line is, the better the throughput, and the distance between these two lines represents the latency of that function. So if we improve this function to have better throughput, we will see that the line of the output becomes steeper, and if we improve the same function to have better latency, you can see the result.
B
The
distance
between
two
lines
become
reduced
and
in
the
ideal
situation,
if
it
has
better
latency
and
baddest
report,
we
can
see
two
lines
close
to
each
other
and
we
come
with
one
line
and
so
how
to
represent
the
performance
of
concurrent
requests.
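The two-line view described above can be sketched with made-up timestamps; the slope of each cumulative line is the throughput, and the horizontal gap between the lines is the per-request latency (the numbers below are purely illustrative):

```python
# Cumulative view of requests entering and leaving a function.
# The i-th request enters at enter[i] and leaves at leave[i].
enter = [0, 1, 2, 3, 4]   # times requests go into the function
leave = [2, 3, 4, 5, 6]   # times the same requests come out

# Steeper line = more requests per unit time = better throughput.
throughput = len(leave) / (leave[-1] - enter[0])

# Horizontal distance between the two lines = per-request latency.
latencies = [l - e for e, l in zip(enter, leave)]
```

Improving throughput steepens the output line; improving latency shrinks the gap; in the ideal case the two lines coincide.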
Now we have one critical path per request, with consecutive costs. Then we can stack the critical paths together and aggregate the costs by their logical steps, so we can represent the latency and the throughput of individual steps. And because the steps are consecutive, the output of the previous step equals the input of the following step, and so we get this representation.
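A hedged sketch of this stacking and aggregation, using pandas (the talk's own front-end stack); the step names, request IDs, and cost values are hypothetical stand-ins for real trace data:

```python
import pandas as pd

# Each row is one cost on one request's critical path.
paths = pd.DataFrame({
    "request": [1, 1, 2, 2, 3, 3],
    "step":    ["queue", "write", "queue", "write", "queue", "write"],
    "cost":    [0.5, 2.0, 0.7, 1.8, 0.6, 2.2],
})

# Stack the critical paths and aggregate costs by logical step.
per_step = paths.groupby("step")["cost"].agg(["median", "sum"])
```

Per-step medians expose the latency of each step, while the summed costs across stacked requests reflect where the concurrent requests spend their time.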
For example, if we use fio to generate write inputs and set the iodepth equal to 32, we can see clearly from the second graph that fio will wait until the previous requests are finished, and only then send the next group of 32 concurrent requests. Next, how to understand the bottleneck in that graph, and what a bottleneck actually is? I think it is also a two-dimensional concept: looking at the latencies of different steps alone is not enough.
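The workload described above could be expressed as an fio job file along these lines; the client, pool, and image names are hypothetical, and only the write workload and `iodepth=32` come from the talk:

```ini
; Hypothetical fio job: 32 concurrent writes against an RBD image.
[rbd-write]
ioengine=rbd
clientname=admin
pool=rbd
rbdname=test-image
rw=write
bs=4m
iodepth=32
```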
The worst case is that the lowest-performance part will cause the other requests to wait: everything before the slowest-throughput point has to wait for it, which causes wait latencies in those requests, and in most cases it becomes the bottleneck of the entire system. We can very clearly see this happening in that representation of performance.
Okay, now the bottleneck can be identified relatively easily, and the next question is how to improve the performance. The very short answer is to identify the root causes of the bottleneck, then resolve them and improve the performance. But the reality is much longer, because there are many kinds of factors that can impact the performance, in three categories. First, physically: the cluster configuration, how we deploy the cluster, and which hardware we use in the cluster will all affect the performance.
Secondly, logically: the parameters we choose in functions, the different algorithms we choose, and the whole architecture of the entire software can have an impact on the performance. And thirdly, the workload, whether it comes from inside the system or from outside Ceph, will also impact the performance. So there are almost infinite combinations of these factors, and I think it is bad to do optimization blindly. Instead, once we have identified the bottleneck and also identified the related costs of that bottleneck, we can relate each cost to these factors.
That means using control variables, or similar methods, to see the impact of different factors, to find what to do to improve the performance, and then to verify these solutions to see if what we did is actually better. And incremental analysis means there needs to be an interactive front-end to do the data-driven analysis.
Okay, so that's the entire idea about how to identify bottlenecks in the Ceph system and to do the following optimizations: we leverage distributed tracing to collect the critical paths of the user requests, then we developed a visualization technique to represent the performance straightforwardly, and we developed an interactive front-end to do the incremental analysis. And here is the example: we have a prototype to track the RBD image writes inside the cluster, and there are three types of requests across the layers.
There are image write requests, which internally represent the writes of logical chunks in that image; these requests trigger object requests to write the data into the objects of that image; and then, at the RADOS level, the object requests trigger object write operations to persist the data across the entire cluster. We deployed a three-VM environment, used fio to generate the write inputs, collected the traces during the experiment, and then moved into the interactive analysis front-end.
We load the tracing results into a variable called data, and then parse that data into the three types of requests. Of course, there are many other requests in the existing code, but we think these three are the most important. This interactive front-end is built based on pandas and Jupyter; Jupyter is a very famous open-source web application for doing data-driven analysis.
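A minimal sketch of that first step in pandas; the tracing file is replaced here by an inline frame, and the request-type labels and costs are hypothetical stand-ins for the three request types described in the talk:

```python
import pandas as pd

# Stand-in for loading the collected tracing results,
# e.g. data = pd.read_json("traces.json") in a real session.
data = pd.DataFrame({
    "request_type": ["image_write", "object_request", "object_write",
                     "image_write", "object_request"],
    "cost": [3.1, 1.2, 0.8, 2.9, 1.0],
})

# Parse the data into the three types of requests.
by_type = {t: g for t, g in data.groupby("request_type")}
image_writes = by_type["image_write"]
```

From here each request type can be analyzed separately in the notebook.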
We can see very clearly that the bottleneck is at the step from when the image context issues the write operation until the operation is completed, and we can choose to highlight this step in the graph with the purple color.
So we can see all the costs related to this step. This step, the write operation, in fact triggers the object requests in the next layer, and we can also directly find that the bottleneck there is at the object map operations and the AIO operations; these two operations actually trigger the object write operations at the RADOS level. There we also identified the bottleneck as being in three steps, which represent the enqueue operations in the OSD queues and the actual disk writes in the object store.
So actually the whole bottleneck in this environment is related to the queue operations and the writes in the object store, and then we drill down with different analyses.
First, we filter out, for example, the OSD queue operations by their step name, and then we check whether these costs are related to the physical location. So we draw the distribution of costs by the host where they happen, and we found that the median numbers are similar, so the cost is not much impacted by the host.
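A hedged sketch of this filter-and-compare step in pandas; the step names, host names, and costs are made up for illustration:

```python
import pandas as pd

# Stand-in for the aggregated per-cost trace records.
costs = pd.DataFrame({
    "step": ["osd_enqueue", "disk_write", "osd_enqueue", "osd_enqueue"],
    "host": ["osd-1", "osd-1", "osd-2", "osd-3"],
    "cost": [1.1, 0.4, 1.0, 1.2],
})

# Filter the OSD enqueue operations by their step name.
enqueue = costs[costs["step"] == "osd_enqueue"]

# Compare the cost distribution by the host where each cost happens.
# Similar medians across hosts suggest the cost is not tied to a
# particular physical location.
median_by_host = enqueue.groupby("host")["cost"].median()
```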
And then we tried to find out if it is related to the logical location, that is, where the costs happen in the ordering of the logical workflow, and we found that the first occurrence of the enqueue operation in the OSD has a much larger median number. It means that the enqueue operation in the primary OSD does not have good performance. By using this interactive analysis, we then started to find out the root causes of why this enqueue operation becomes so slow, and we found three related configurations; the first is the number of placement groups of the pool.
And the interactive analysis can actually do more than that: for example, to represent all kinds of distributions, to find out the longest or the most complex requests and visualize them, or to represent the message heat map between the hosts of the entire cluster. And at a higher level, because we attach the runtime contexts to these requests, we can do more.
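A minimal sketch of building such a host-to-host message heat map with pandas; the host names and messages are made up:

```python
import pandas as pd

# Stand-in for traced messages, one row per message sent between hosts.
msgs = pd.DataFrame({
    "src": ["client", "client", "osd-1", "osd-1"],
    "dst": ["osd-1", "osd-2", "osd-2", "osd-3"],
})

# Count messages per (sender, receiver) pair and pivot into a matrix;
# each cell is one square of the heat map.
heat = msgs.groupby(["src", "dst"]).size().unstack(fill_value=0)
```

In a notebook this matrix can be rendered directly as a heat map to spot imbalanced communication between hosts.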
We can run analyses to see if the writes are balanced across the entire cluster, or, by comparing the behavior of the image write requests in RBD with the object requests in RBD, find out if the RBD cache works well. And if we combine it with resource monitoring tools, we can find out the specific logic that consumes excessive resources.