Calyptia Fluentcon Europe 2022, 19 May 2022

From YouTube: Lightning Talk: Data Flow Control in Cluster Logging Pipeline - Pranjal Gupta & Eran Raichstein, IBM

Description

Logging pipelines are crucial in ensuring container logs are reliably collected and routed to persistent storage. Logs generated by workloads (container processes) are written to files by Container Monitor processes (e.g. Conmon). In production environments, as Fluentd deals with a massive volume of logs, the log generation rate often exceeds the rate of log collection, which causes log loss. There is a need to prioritise application logs so that administrators can collect logs from high priority workloads in a controlled manner. In this talk, we introduce a new feature in the in_tail input plugin, which uses group rules to rate limit log collection. We share exciting insights from our systematic study about log loss on Fluentd plugins using our open-source benchmarking framework. We also present a Log Flow Control framework that allows users to define and enforce log rate limit policies to control log loss predictably.

A

Hi everyone, sorry for the small delay; we had some technical problems. Before we start, I want to thank the Fluent community for hosting us here. I'm Eran Raichstein, and this is Pranjal Gupta. We're from IBM Research, and we work very closely with the Red Hat team, so IBM Research and Red Hat are working together on this.

A

The area we focus on is observability, specifically the observability stack of OpenShift; Pranjal will elaborate on that. What we are trying to promote is treating logs as equal citizens alongside the other resources in clouds, in Kubernetes, and in OpenShift. CPU and memory are managed and controlled resources in such distributed environments, and we want logs to be treated the same way so that we can control them.

A

We want to control the amount of logs being generated and the amount being collected by Fluentd and the logging stack, so that logging is managed and is not treated as a free resource in the system. I think Eduardo touched on this at the beginning, when he said that there are too many logs in the world and everyone is sending logs out. That is exactly the situation we are trying to handle here: making sure everything is under control.

A

I will hand the stage over to Pranjal to give the full talk and elaborate. Thank you.

B

Thanks, Eran, for the introduction. Hi everybody. First, I would like to introduce OpenShift. OpenShift is Red Hat's flagship platform-as-a-service product, built on top of Kubernetes, and it lets you deploy and manage your containers more easily than in a plain Kubernetes environment.

B

OpenShift Logging is the subsystem for configuring logging in your OpenShift cluster. It provides high-level semantics, in the form of an API, so that customers can configure their logging architecture.

B

This is an example of how you can control your cluster logging through a custom resource definition. This simple and intuitive API generates the complex Fluentd configuration for you, and on top of that it adds normalization, metrics, and buffering to your cluster logging.

B

This slide is a very high-level view of our logging pipeline. Logs from containers, in the form of the stdout and stderr streams, are written to log files on disk by the container runtime interface. Logs from these files, which are stored in /var/log/containers, are read by Fluentd, normalized, and then sent to persistent storage such as Elasticsearch, Loki, or Fluent forward.

B

However, this seemingly simple architecture has many bottlenecks, of which we are only talking about those we can control from the log collection side.

B

Consider situations where you don't have control over the amount of logs being collected: a lot of logs are being generated by each application, you want to troubleshoot and debug, and you don't know what is happening. In those situations you need a very good picture of the state of the cluster.

B

In those situations, when you have a CPU and memory resource crunch, there can be a buffer overflow, due to which logs are not flushed regularly to your endpoint. This causes back pressure on the connected components, and you start to miss logs. We have done a study on these two bottlenecks and come up with a feature in the in_tail plugin, which is one of the most widely used plugins in Fluentd, to control what amount of logs is being sent, and to know how much log is being lost and what the sources of those logs are.

B

Now we formally define our two problem areas. The first is log loss: the difference between what was generated by the workload applications and what was collected.

B

When Fluentd misses a log rotation, you start to lose logs, and the loss can be estimated as the number of missed rotations multiplied by the size of each log file. The second one is data clogging. When you have very little memory or CPU available, Fluentd's output buffer starts to overflow, and you tend to lose logs.

B

Fluentd starts to push back and slows down its reading so that it can send what it has already processed. This is data clogging. The two are interrelated: data clogging can cause log loss.
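The two ways of accounting for log loss described above can be written down as a small sketch. The function names and the example numbers are hypothetical and are not taken from the benchmarking tool.

```python
def log_loss(generated_bytes, collected_bytes):
    """Log loss: what the workloads generated minus what was collected."""
    return max(generated_bytes - collected_bytes, 0)


def loss_from_missed_rotations(missed_rotations, file_size_bytes):
    """Estimate loss as the number of missed rotations times the log file size."""
    return missed_rotations * file_size_bytes


# Hypothetical numbers: 10 MiB log files, 3 rotations missed by the collector.
FILE_SIZE = 10 * 1024 * 1024
estimated_loss = loss_from_missed_rotations(3, FILE_SIZE)
measured_loss = log_loss(generated_bytes=5 * FILE_SIZE, collected_bytes=2 * FILE_SIZE)
```

In this made-up case both views agree: three missed files of 10 MiB each account for the 30 MiB gap between generated and collected logs.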

B

Given these scenarios in our architecture, here is our motivation. In worst-case scenarios, when you want to debug and troubleshoot what is happening in a cluster, you want to prioritize log collection at the input level so that you don't miss important logs. And as part of the aggregation process, you want to make sure that crucial resources, like network bandwidth and persistent storage, are not saturated, given the resource constraints.

B

As part of our research, we have developed an open-source benchmarking tool that lets you generate and measure log stress conditions. We performed our experiments using this tool, which gave us reproducibility in our experiments.

B

One key feature of this tool is that it lets you configure the log rotation pace in the cluster, so you can control the rotation pace and check how much log loss is occurring in your cluster.

B

Before moving on to the observations, I will give you an overview of our experimental setup. In the general scenario, you have two groups of containers. One is the very important group, which you don't want to miss logs from, and the other is the less important containers, which are chatty and noisy; it is okay if you lose some logs from those. The objective is to preserve logs from the very important containers so that you can troubleshoot, which is what matters to you as a developer or an SRE trying to get back to a stable state. Our approach is to accept losing some logs from the less important containers in order to preserve more of what is important to us. As a baseline, we use an existing open-source plugin called the throttle plugin.

B

It lets you control the rate of log flow in your pipeline, and if the rate of incoming logs exceeds the configured limit, it starts dropping logs.
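The behaviour just described, passing records until a rate limit is reached and dropping the rest, can be sketched with a fixed-window counter. This is only an illustration of the idea, not the throttle plugin's actual algorithm; the class name, its parameters, and the group name are hypothetical.

```python
class GroupThrottle:
    """Drop records once a group exceeds `limit` records per time window."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self._window_start = {}
        self._count = {}

    def allow(self, group, now):
        """Return True if the record may pass, False if it is dropped."""
        start = self._window_start.get(group)
        # Start a fresh window on first use or when the previous one expired.
        if start is None or now - start >= self.window:
            self._window_start[group] = now
            self._count[group] = 0
        if self._count[group] >= self.limit:
            return False
        self._count[group] += 1
        return True


throttle = GroupThrottle(limit=2, window_seconds=1.0)
decisions = [throttle.allow("less-important", t) for t in (0.0, 0.1, 0.2, 1.5)]
```

With a limit of two records per second, the third record inside the first second is dropped and the record arriving in the next window passes again.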

B

In all these experiments we have two graphs. One is without throttle, which is the normal situation, and one is with throttle applied to the less important containers. Upon applying throttle you start to lose logs, but you also get some benefits, so there is a trade-off in what you choose.

B

In this case, as you can see, when there is no throttle, the rate of collection from each group of containers is pretty much the same. But when you apply throttle to the less important containers, the rate of collection for the important containers, the blue graph, increases, and the green graph stays controlled at the rate you set in your configuration.

B

This is what we want. In exceptional situations, where you don't have any control over which logs are being collected or which are being lost, you are preserving more from the important containers, and you are doing the best you can in this situation.

B

In a way, you are increasing Fluentd's capacity to collect more logs while at the same time dropping proactively, so that you stay current with what is happening in your system.

B

This means that if you control Fluentd's CPU usage at any point, whether it is input, filter, or output, you can collect more.

B

The second observation is more related to the implementation of the in_tail plugin in Fluentd. This is an experiment we did to test the impact of the output's buffer size on the reading behavior of in_tail.

B

We varied the buffer size, whether for file, Elasticsearch, or any other common output plugin you use in your cluster logging. With a large buffer size, we saw that the peaks are different. By peak I mean the number of lines, or the instantaneous rate of lines, read from each file; different peaks denote different workloads, and each peak shows how many lines are read from that file or workload.

B

When you have a large buffer size, say 1 GB, the amount of logs read from each file is different, and when you have a smaller buffer size, you see that the peaks are of equal size. That means that, irrespective of the generation rate of your logs, you are reading an equal number of lines from each file.

B

Why is this important? Because in worst-case scenarios, when you don't know what to do, you need a good amount of logs so that you know what is actually happening in your system. You need a good, clear snapshot of your entire system, so you need some information from all of your pods.

B

If one of the workloads goes haywire and starts generating thousands of log lines per second, you no longer have a good snapshot of the other pods from which to debug what is happening. So, in a way, you need some form of fairness in your reading so that you have a good snapshot of your system.
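To make the fairness argument concrete, here is a toy comparison of two reading strategies over the same backlog. Both functions are simplified models introduced only for illustration; they are not how in_tail schedules its reads, and the workload names and numbers are made up.

```python
def read_proportional(backlog, budget):
    """Read lines in proportion to each file's backlog (busy files dominate)."""
    total = sum(backlog.values())
    if total == 0:
        return {name: 0 for name in backlog}
    return {name: budget * pending // total for name, pending in backlog.items()}


def read_equal(backlog, lines_per_file):
    """Read at most the same number of lines from every file."""
    return {name: min(pending, lines_per_file) for name, pending in backlog.items()}


# One workload has gone haywire; the other two are quiet.
backlog = {"noisy": 9800, "quiet-a": 100, "quiet-b": 100}
proportional = read_proportional(backlog, budget=300)
equal = read_equal(backlog, lines_per_file=100)
```

With the same 300-line budget, proportional reading leaves only 3 lines for each quiet pod, while equal reading gives every pod 100 lines, which is the kind of snapshot you need when debugging.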

B

Based on these observations, if we can control the rate of flow as early as possible in the pipeline, we can save CPU cycles and increase our collection, and we can ensure some form of fairness so that we have a good way of debugging our system.

B

So here comes our feature, which is called group-based throttling in in_tail. You can define user groups in the in_tail plugin, and you can define rules for assigning each workload, that is, each file, to a group. Then you can rate limit the logs being collected from those files. This feature ensures that the files in your groups are read equally and that rate limiting is done at the time of reading.

B

This ensures that you save CPU cycles, because the limiting happens as early as possible in the pipeline.

B

This is an example of the generic, default use case for Kubernetes. By default, it extracts information from the file path: the generic workload path in /var/log/containers follows a specific pattern where the first keyword is the pod name, followed by the namespace, the container name, and the Docker ID.

B

You can specify your parameters in the form of regexes. This rule says: match all containers in the namespaces space1, space2, or space3 whose pod name starts with app followed by anything, and rate limit them to 200 lines every 30 seconds. The total line limit of 200 applies per group.

B

So 200 divided by the total number of files, that is, workloads, in that group is the number of lines read from each file.
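The rule just described can be modelled in a few lines. This is a sketch of the logic only, not the plugin's code: the path layout, the dictionary keys, full-string regex matching, and integer division are all assumptions made for the example.

```python
import re

# Assumed layout: /var/log/containers/<pod>_<namespace>_<container>-<id>.log
PATH_RE = re.compile(
    r"^/var/log/containers/"
    r"(?P<podname>[^_]+)_(?P<namespace>[^_]+)_"
    r"(?P<container>.+)-(?P<docker_id>[a-z0-9]{64})\.log$"
)

# Hypothetical representation of the rule from the talk.
RULE = {
    "match": {"namespace": r"space1|space2|space3", "podname": r"app.*"},
    "limit": 200,       # lines per group
    "rate_period": 30,  # seconds
}


def matches(path, rule):
    """True if every key in the rule's match hash fits the path's captures."""
    m = PATH_RE.match(path)
    if m is None:
        return False
    return all(re.fullmatch(rx, m.group(key)) for key, rx in rule["match"].items())


def per_file_limit(paths, rule):
    """Group limit divided by the number of files that fall into the group."""
    members = [p for p in paths if matches(p, rule)]
    return rule["limit"] // len(members) if members else 0


CONTAINER_ID = "a" * 64  # placeholder 64-character ID
paths = [f"/var/log/containers/app-{i}_space1_web-{CONTAINER_ID}.log" for i in range(4)]
paths.append(f"/var/log/containers/db-0_space1_pg-{CONTAINER_ID}.log")      # pod name does not match
paths.append(f"/var/log/containers/app-9_other_web-{CONTAINER_ID}.log")     # namespace does not match
limit_each = per_file_limit(paths, RULE)
```

Four files match the rule here, so each one would be read at up to 50 lines per 30-second period.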

B

However, the in_tail plugin is not only used for Kubernetes, so we have made the grouping pattern generic so that you can use it for other files as well. You can define named captures in your group pattern and then specify the matching keys, as a hash table, in the match parameter of the rule directive.
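The generic form can be illustrated with a path layout that has nothing to do with Kubernetes. Everything below is hypothetical: the directory layout, the capture names, the rule limits, and the first-match-wins ordering are assumptions for the sketch, not documented plugin behaviour.

```python
import re


def find_limit(path, pattern, rules):
    """Return the limit of the first rule whose match hash fits the named captures."""
    m = pattern.match(path)
    if m is None:
        return None
    captures = m.groupdict()
    for rule in rules:
        if all(re.fullmatch(rx, captures.get(key) or "") for key, rx in rule["match"].items()):
            return rule["limit"]
    return None


# Hypothetical non-Kubernetes layout: /srv/logs/<service>/<env>.log
PATTERN = re.compile(r"^/srv/logs/(?P<service>[^/]+)/(?P<env>[^/]+)\.log$")
RULES = [
    {"match": {"service": r"billing|auth", "env": r"prod"}, "limit": 1000},
    {"match": {"env": r"dev|test"}, "limit": 100},
]

prod_auth = find_limit("/srv/logs/auth/prod.log", PATTERN, RULES)
dev_search = find_limit("/srv/logs/search/dev.log", PATTERN, RULES)
unmatched = find_limit("/srv/logs/search/prod.log", PATTERN, RULES)
```

The keys of each match hash refer to the named captures of the pattern, so the same matching code works for any file naming scheme you can describe with a regex.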

B

This lets you customize and generalize your grouping rules so that you can use them anywhere you want. This feature will be available in the 1.15 release of Fluentd, which I think will be released at the end of May. You can have a look at the PR; it has not been merged yet, but feel free to look at it and review it if you want. Now, coming to what we are doing as part of Red Hat and IBM Research.

B

Just a reminder: this is the API we are using to configure our operator. What we are now doing is defining policies through which we can control different components of our cluster logging pipeline. For example, we can simply limit the rate of logs being sent to Kafka to one gigabit per second to avoid saturating the network link because, as I said, persistent storage and network bandwidth are very crucial.

B

Similarly, you can control certain pods from a namespace with certain labels. You can define a per-container limit, which is a rate limit per file, or a rate limit for the entire group, or you can simply ignore certain pods, which means you don't even collect from them. In this way you are again saving very crucial CPU resources and concentrating on what is important.

B

So how does this API look when we apply these policies? In the red box you can see the limit reference: we drop logs if the incoming log rate is more than 50 lines per second. You apply this rate limit to an application input, which is where we have defined the custom group: collect all lines from pods that are named "less important" and are in the namespace "log stress".
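The policy on the slide can be modelled as data plus a small evaluation function. This is a sketch of the idea only, not the operator's API: the class, field names, and the spelling of the namespace and pod names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Policy:
    namespace: str
    pod_name: str
    max_records_per_second: Optional[int] = None  # None: no limit
    ignore: bool = False                          # True: do not collect at all


def select_policy(policies, namespace, pod_name):
    """Return the first policy whose selector matches the pod, or None."""
    for policy in policies:
        if policy.namespace == namespace and policy.pod_name == pod_name:
            return policy
    return None


def records_kept(policy, incoming_per_second):
    """Drop policy: records above the limit are dropped, the rest are kept."""
    if policy is None:
        return incoming_per_second
    if policy.ignore:
        return 0
    if policy.max_records_per_second is None:
        return incoming_per_second
    return min(incoming_per_second, policy.max_records_per_second)


policies = [
    Policy("log-stress", "less-important", max_records_per_second=50),
    Policy("log-stress", "debug-sidecar", ignore=True),
]
limited = records_kept(select_policy(policies, "log-stress", "less-important"), 400)
ignored = records_kept(select_policy(policies, "log-stress", "debug-sidecar"), 400)
untouched = records_kept(select_policy(policies, "prod", "very-important"), 400)
```

A pod covered by the 50-lines-per-second policy keeps 50 of 400 incoming records, an ignored pod keeps none, and a pod without a policy is collected in full.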

B

In this way you can control different aspects of your cluster logging pipeline, whether it is the input, the output, or the filter components of your pipeline.

B

To summarize, in this talk we identified the different bottlenecks in our cluster logging pipeline, and we showed you the benchmark tool we have developed for generating and measuring stress conditions.

B

Through our experiments, we saw how to increase collection using the throttle plugin and what the impact of the output's buffer size is. Finally, we have come up with a new feature in in_tail that lets you control log loss and add throttling at the input level. As part of our work with Red Hat, we are working on policy-based log flow control so that you can control different aspects of your pipeline, including Fluentd and Elasticsearch.

B

This is our team of five members. If you have any questions, please feel free to reach out to us at this email.

B

Thank you. If you have any questions, we are here to answer.

C

My question is: is it normal to think of dropping logs instead of increasing the capacity of the aggregator? I have never been in a situation where I wanted to drop logs; I want to improve capacity to avoid dropping them.

A

In Kubernetes, the CPU, memory, and other resources of the logging stack are limited, just as for any other set of applications, so we don't want to scale it to infinity. When there are applications that emit a lot of logs, really a lot of logs, it does make sense to put some threshold or limit on the amount of logs and on the amount of resources that the logging stack itself takes from the system, because otherwise it will start to affect the other applications you have on the cluster.

A

This is why it sometimes does make sense, when there are really a lot of logs, to accept some log loss, and you want that log loss to fall on containers that are not the most important ones in the system. You want to balance the effect, and this is exactly what we're doing.

C

Okay, so basically it is about deciding which logs to drop, because you can also put limits on the containers at the Kubernetes level.

C

But then you cannot decide which logs to drop.

B

Any other questions?

B

Thank you. Thank you very much.