From YouTube: An Introduction to Fluent Bit
Description
Here you will learn
1) Why logs are important
2) The challenges of collecting and consuming logs
3) How Fluent Bit works - and solves those challenges
4) How to configure Fluent Bit
In this video we're going to talk about Fluent Bit, which is a logs and metrics processor tool. As you know, all applications need logging, and the main use case for logging is data analysis. Something breaks in the application: you check the logs to see what caused the error. Or you're trying to reproduce a bug, and by looking at the application logs you can understand what happened. Or you simply want to have an overview of what your application is doing.
Logs can come from different places: they are produced by applications, but also by server processes and so on. So you have different sources of logs, and Fluent Bit is actually a general-purpose log processor, meaning it can read and process logs from all these different sources. Note that, in addition to collecting logs, Fluent Bit also has metrics collection capabilities; for embedded Linux systems, for example, it can gather metrics on CPU, memory, storage, etc. And because it's general purpose, Fluent Bit can be deployed in any environment: bare metal servers, virtual machines, embedded devices, and containers.
However, Fluent Bit is used the most for processing logs in Kubernetes clusters. Now, the challenge of logging in complex environments like Kubernetes is that you have many different applications, which produce logs in different formats. Each application runs in containers, which run in pods, which in turn run on Kubernetes nodes. So, in addition to the log message and the application name itself, we have all this additional information about where the log is coming from. If you have five replicas of the same application, you want to know which pod replica on which node produced a given log.
This means the challenge is to collect this data from different sources and then process it: parse all the values and identify where they are coming from, as well as what the actual log contents are, and parse them into key-value pairs so that they can eventually be stored in Elasticsearch or Kafka, where we can finally see the logs and do data analysis on them. So, as you see, the log processor has a very important but also challenging job. Now, processing the data of course needs resources.
The log processor needs enough memory, storage, and CPU resources to collect the logs, then parse and filter them. And this should all happen as a background task, right? It shouldn't interfere with your main application's performance, because then we would have compromised the speed and performance of our application for a proper logging mechanism. And, of course, the resource requirements increase when you have applications with high throughput, meaning they produce large amounts of logs.
So, as you see, the log processor not only needs to collect and process logs, it needs to do it in a performant and resource-efficient way. So we need a lightweight, high-performance log processor, and one of the most popular ones today happens to be Fluent Bit. So how does Fluent Bit work? Fluent Bit uses input plugins to read the logs from the data sources. For example, if you need to read log files, you need a plugin that reads from log files. If you're going to receive messages over TCP, you need an input plugin that listens for messages over TCP. And, as mentioned at the beginning, Fluent Bit supports many different input sources. Fluent Bit also has input plugins for metrics data collection: for example, it supports StatsD and collectd input plugins, but it also supports collecting metrics on the host system, such as CPU, memory, and disk.
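As a sketch of what this looks like, here is a minimal classic-format configuration with input plugins tailing container log files, listening on TCP, and collecting host metrics (the paths, tags, and port are placeholders, not values from the video):

```ini
[SERVICE]
    Flush     1

# Tail container log files and tag the records
[INPUT]
    Name      tail
    Path      /var/log/containers/*.log
    Tag       kube.*

# Listen for messages over TCP
[INPUT]
    Name      tcp
    Listen    0.0.0.0
    Port      5170
    Tag       tcp.events

# Built-in host metrics inputs
[INPUT]
    Name      cpu
    Tag       metrics.cpu

[INPUT]
    Name      mem
    Tag       metrics.mem
```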
Once logs are collected and read, Fluent Bit will process them, and, of course, depending on the log format, we would need to parse them differently. For that, Fluent Bit has different filters and parsers. Filters can be used to change the log record, or even to add some additional metadata to it, like the pod ID or the namespace the log is coming from, and so on. You can also use filters to drop or ignore some records, which makes the filtering even more flexible.
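As an illustration, a filter chain for the Kubernetes case described here might look like the following sketch: the `kubernetes` filter enriches records with pod metadata, and the `grep` filter drops records matching a pattern (the tag and regex are hypothetical):

```ini
# Enrich records with pod name, namespace, labels, etc.
[FILTER]
    Name       kubernetes
    Match      kube.*
    Merge_Log  On

# Drop noisy records, e.g. health-check requests
[FILTER]
    Name       grep
    Match      kube.*
    Exclude    log healthz
```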
In Fluent Bit, you can also use custom Lua scripts as filters to modify and process the records.
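A Lua filter receives each record and returns a code telling Fluent Bit whether the record was dropped, kept, or modified. A minimal sketch, with field names made up for illustration:

```lua
-- filters.lua: drop debug-level records, enrich the rest
function modify_record(tag, timestamp, record)
    if record["level"] == "debug" then
        return -1, timestamp, record   -- -1: drop this record
    end
    record["source_tag"] = tag
    return 2, timestamp, record        -- 2: record modified, original timestamp kept
end
```

The script would then be wired in with a `[FILTER]` section using `Name lua`, `script filters.lua`, and `call modify_record`.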
In addition to all of this, one unique advanced feature that Fluent Bit has is SQL stream processing. This allows users to write SQL queries on the logs or metrics to do aggregations, calculations, and even time series predictions.
This is super useful if you need to calculate an average, max, or min before sending the data to the storage, or count the number of times a message appears, or aggregate data to reduce data costs. The best part about the SQL stream processing is that no database and no indices are required.
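For example, an aggregation like the average mentioned above can be expressed as a stream processor query over a time window. This sketch assumes records tagged `cpu.*` carrying a `cpu_p` field:

```sql
-- Average CPU usage over tumbling 5-second windows,
-- emitted as new records tagged 'metrics.avg'
CREATE STREAM avg_cpu
    WITH (tag='metrics.avg')
    AS SELECT AVG(cpu_p) AS avg_cpu_usage
       FROM TAG:'cpu.*'
       WINDOW TUMBLING (5 SECOND);
```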
A
Everything
runs
on
the
same
lightweight
high
performance
process,
so
you
still
keep
that
high
performance
and
resource
efficiency
of
fluent
bit
after
the
logs
are
processed
fluent
beat
will
send
them
to
a
storage
like
elasticsearch
or
splunk,
where
you
can
then
see
the
logs
in
a
nice
visualized
format
again
fluent
bits
supports
many
different
storage
backends
and
to
send
the
logs
to
the
storage.
Backhands
fluidbit
uses
output
plugins.
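An output section routes matching records to a backend. As a sketch, sending everything tagged `kube.*` to Elasticsearch might look like this (host and index are placeholders):

```ini
[OUTPUT]
    Name    es
    Match   kube.*
    Host    elasticsearch.logging.svc
    Port    9200
    Index   app-logs
```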
So basically, an input plugin knows how to transform data of a specific format into something Fluent Bit can read and process; for example, the TCP input plugin knows how to parse TCP data into Fluent Bit's internal format. An output plugin, in turn, knows how to transform the Fluent Bit data into what the output target understands.
A
Now,
how
does
fluent
beat
actually
run
in
a
kubernetes
cluster
fluent
bit
gets
deployed
as
a
daemon
set,
which
means
it
will
run
on
every
kubernetes
node.
So
when
a
new
node
gets
added
to
the
cluster,
a
fluent
bit
pod
will
start
there
immediately
so
on.
Each
node
fluent
bit
will
gather
logs
from
all
the
containers
on
that
node.
In
addition,
it
will
gather
metadata
for
those
logs
like
pod,
ip
container,
ip
name
space
and
so
on
from
the
kubernetes
api.
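In practice Fluent Bit is usually installed with its official Helm chart, but a stripped-down DaemonSet manifest illustrating the idea might look like this (names and image tag are illustrative):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      containers:
        - name: fluent-bit
          image: fluent/fluent-bit:2.2.0
          volumeMounts:
            # Mount the node's log directory so the tail input can read container logs
            - name: varlog
              mountPath: /var/log
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
```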
A cool feature of Fluent Bit is that we can suggest which parsers should be used for a pod, using annotations in the Kubernetes configuration files.
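The suggestion is made with the `fluentbit.io/parser` annotation on the pod. For example, a pod producing Apache-style access logs could hint at the built-in `apache` parser (the pod name and image below are made up):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-frontend
  annotations:
    # Tell Fluent Bit's kubernetes filter to apply the 'apache' parser
    fluentbit.io/parser: apache
spec:
  containers:
    - name: web
      image: httpd:2.4
```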
Some other advantages of Fluent Bit are that it has a pluggable architecture. As a log collector, it doesn't try to replace data sources like systemd or journald; instead, the goal is to integrate with the different data sources, and to do that, Fluent Bit needs to be able to talk TCP, read logs from a file system, talk to the systemd API, etc.
It also has built-in security, because when you are sending logs from the cluster out to the storage backends, you are talking to third-party services outside your cluster. So, of course, you don't want your logs to be sent in plain text; you want to use HTTPS or TLS for that connection.
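Most output plugins support TLS directly through a pair of properties. As a sketch (the hostname is a placeholder):

```ini
[OUTPUT]
    Name        es
    Match       *
    Host        logs.example.com
    Port        9200
    tls         On
    tls.verify  On
```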
and
it
has
a
simple
architecture
which
makes
it
easy
to
scale
fluent
bit
on.
A
Hundreds
of
servers
because,
as
I
mentioned
fluent
beat,
will
run
on
each
node
in
the
cluster
now
fluent
bit
works
in
a
very
similar
way
as
fluentd,
which
is
another
log
processor
from
the
same
company.
So if you know Fluentd, you may be asking: what is the difference between these two? If they work the same way, which one should I use in which case? Well, Fluent Bit is designed to run at high scale with low resource usage, and it's actually the preferred solution for containerized environments. Fluent Bit follows a similar philosophy to Fluentd, as a log processor but also as a metrics processor. In fact, Fluent Bit is a CNCF sub-project under the umbrella of Fluentd, and they're both vendor neutral.
A
So
they
can
run
on
any
environment
regardless
the
platform
and
also
interesting
to
know
that
there
are
even
use
cases
where
you
can
use
both
fluent
beat
and
fluency
together
to
create
a
very
efficient
and
high
performance
log
processing
architecture
for
your
environment.
If
you're
interested
in
learning
more
about
fluent
beat,
I
recommend
checking
out
the
online
resources
and
documentation
of
fluentbit.