From YouTube: [Webinar] Fluent Bit Operations & Best Practices
Description
Fluent Bit is a powerful tool that is more than capable of scaling to meet the needs of your mission-critical data and enterprise growth.
In this session, we explore:
1. Agent and aggregator patterns to maximize performance
2. Tuning Fluent Bit for cloud-native environments
3. Best practices in operating Fluent Bit in production
4. Key lessons learned from supporting Fluent Bit on 100k+ servers
A: Okay, so a brief introduction, since most of you will not know me. I'm currently the support leader at Calyptia. We are, as Austin said, the creators and core maintainers of Fluent Bit and the Fluent ecosystem. I've been working in engineering and support for a bit over 21 years, and I joined Calyptia a bit over a year ago. I've been having a lot of fun dealing with issues and use cases from some Fortune 10 customers that have deployments of over 100k servers or nodes. It's been a pretty nice trip so far. On the open source side, I've been working on documentation and issue triage in the GitHub repo, and I'm also active on Slack along with Pat, Anurag, and some other folks; from time to time I also go to Stack Overflow. That's not a huge community in our case, but there are still some interesting questions there, so if you have any further questions you can reach out there directly. My handle is lecaros in those places.
Okay, so what are we going to talk about today? First of all, common architecture patterns for deploying Fluent Bit, and how to monitor it once you are in production; a little bit on tuning Fluent Bit for maximum performance and to avoid known issues; then troubleshooting in production, with some tips and tricks from supporting large deployments; and a couple of best practices for deploying in production that we have learned by supporting the large deployments of our users and customers here at Calyptia.
So before diving deep into that, a little introduction to Fluent Bit for those who may not know it that well. Fluent Bit started as a log collector a long time ago, but we have since added metrics and traces, so it is now a full telemetry agent, and fully open source.
It gives users very high performance, building on the very good performance that Fluentd had in the past, and still has: millions of messages per second delivered from point A to point B. But it is not just simple moving of data from A to B, right?
We also allow users to filter, to further process the log records, to enrich them, or even to drop data that you don't need in your data backend, data that is not relevant for your troubleshooting or auditing needs. That's something that is very useful too.
So far, Fluent Bit has been deployed over 8 billion times from Docker Hub alone, and that is not counting cases where a user or a company downloads it once and then pulls it from an internal repository. So this is a very exciting thing to share with you.
We know that there is a huge community behind this, and we appreciate all the feedback you provide by opening tickets and, of course, opening PRs on the public GitHub. A little more on those 8 billion: we have seen how Fluent Bit adoption has increased since March 2022, and we should already be close to 9 billion downloads. You can also see that the adopters include big cloud providers and observability providers as well.
So how does Fluent Bit work? In general terms, you have sources from which you grab your data: your logs, your metrics, your traces. On the other end you have the sinks, the places where you move all your data. In between you can transform this data, as I said before, and you can do this from any number of sources to any number of destinations or sinks. Along the way, your data is kept safe by mechanisms that deal with backpressure or with your service being unavailable.
So you can have different configurations to keep your data when that's needed. When you deploy to Kubernetes, a single DaemonSet deployment can grab all the logs, metrics, and traces being created on each particular host, and then you can start working on that data. We have support for Lua and Wasm for even more complex processing.
So, as I was saying, this could be one engine to rule them all. Among the plugins we have are generic TCP and HTTP inputs and outputs that you can easily connect to legacy sources of information that only know how to speak TCP or HTTP, or to new destinations that you may need to move your data to. You also have vendor- or technology-specific plugins, like Splunk, OpenSearch, Elastic, Azure Log Analytics, Datadog, Google Chronicle, etc. You have the flexibility to send to any of those with specialized plugins, and of course we work with common data structures if you need to send, for example, in a particular format such as GELF.
Karen, will you please... okay. So we're going to talk now about the common architecture patterns in which Fluent Bit is deployed. We have seen these through our experience with users and customers, and we're going to cover the two most common patterns.
The most widely adopted is as a collection agent, where you have one agent on each node, and this agent collects the information from all your applications, from the operating system, and from whatever other source you may have there. If you also need information about your nodes, like what node exporter provides, Fluent Bit can collect that too and then send it out to whatever sink needs that information. This agent is the equivalent of a sidecar container on Kubernetes, and there are advantages and disadvantages to using this pattern or the other. In this case you can have all the processing done on the node and then send the information to your sinks already processed, already enriched, or already filtered.
So this is a pattern where the processing happens entirely in the agent, and it comes with the same advantages and disadvantages as a sidecar: it consumes resources from your pod, so if it is consuming a lot of resources, you will notice it in the pod's performance.
These sidecars can also race against the application pod: if one starts before the other, it might not read all the logs, etc. These are things to keep in mind; usually it will work fine, but remember them when you start seeing unexpected behaviors. The other pattern is using Fluent Bit as an aggregator.
In this pattern you again have one agent per node, but the processing done on the node is minimal. This is a kind of centralized processing where the Fluent Bit agents act only as forwarders of the data: they collect and immediately, or with very little processing, send the data to your aggregator, and the aggregator can be one or more instances of Fluent Bit or even Fluentd.
By moving the processing into the aggregator, you get maximum throughput at the origin of your data, and of course the resource utilization there will be really low; Fluent Bit is already lightweight compared to Fluentd.
With this pattern you maximize the throughput from the origin to your aggregator, and the aggregator can hold all the configuration you require to preserve your data, do further processing, etc. The actual processing power in the aggregator can then scale independently.
If you need more power, you don't have to make any modifications at the source; you grow only on the aggregator side, as desired. This will, of course, require more power there. It is a balance: if you have dealt with performance issues or tuning scenarios in the past, you know it is a game of trade-offs, where you weigh what you're improving against what you can lose because of that improvement.
It is the same here: the resources are limited, so we have to balance all these variables. The single point of configuration also allows you to make modifications as needed without having to restart or change the configuration at the origin of the data. And again, Fluent Bit is a vendor-agnostic, vendor-neutral agent.
If you decide to change your sink, to add a new one, or even to keep sending to different endpoints at the same time, say you send the debug-level log records to cold storage because you don't need them immediately but may need them later, while sending other live information to some other service, you send each destination only the information it requires, and you change things only in the aggregator component of your architecture.
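As a minimal sketch of that wiring (the hostname and paths here are placeholders), the node agents can ship with the lightweight forward protocol and the aggregator can listen for it:

```
# Node agent: collect locally, forward with minimal processing
[INPUT]
    Name  tail
    Path  /var/log/containers/*.log
    Tag   node.*

[OUTPUT]
    Name   forward
    Match  *
    Host   aggregator.example.internal   # placeholder aggregator address
    Port   24224

# Aggregator: receive, enrich and filter centrally, then fan out to sinks
[INPUT]
    Name    forward
    Listen  0.0.0.0
    Port    24224
```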
B: Yeah, I was just about to send a quick response; someone was asking about extensions, but I'll respond directly in the chat.
A: Okay, I'm going to move on then to monitoring Fluent Bit in production. Fluent Bit allows us to send logs, metrics, and traces so we can get an overview and observe our systems, but we also need to observe the observer, to monitor Fluent Bit itself, and for that we expose some metrics, such as uptime.
We expose metrics per plugin instance: the records processed, the memory in use, the bytes processed. We publish both counts, so you will see the number of records processed and the number of bytes those records represent.
We also provide metrics about the filters, because filters can add or remove records, again reporting how much data was processed in terms of bytes and in terms of records. And we have metrics about the storage, which will become important as we progress; we'll see what the storage has to do with this. Those metrics used to live on a separate HTTP endpoint; now, with API version 2, all the metrics are on the same endpoint, so you can consume them from one endpoint only.
We estimate that over 90 percent of users use Prometheus, so we publish our metrics in both JSON and Prometheus formats; you can use either, depending on what you want to do with them. For Prometheus, for example, you can scrape the data from a single host or a single pod with a really basic scrape configuration: you query the metrics path on the server, since Fluent Bit exposes the metrics through a built-in HTTP server on its default port, 2020.
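A minimal sketch of that setup, assuming Fluent Bit 2.x and the endpoint paths from the docs:

```
# fluent-bit.conf: enable the built-in HTTP server that serves the metrics
[SERVICE]
    HTTP_Server  On
    HTTP_Listen  0.0.0.0
    HTTP_Port    2020

# Prometheus-format metrics are then served at:
#   http://<host>:2020/api/v2/metrics/prometheus
# and JSON at:
#   http://<host>:2020/api/v2/metrics
```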
For this you have to do some configuration on both sides: in the Fluent Bit DaemonSet, the annotations that you can see on the screen, and of course you also have to configure your Prometheus deployment on that same cluster, providing the permissions, the relabeling you may need, etc. This is a well-known pattern for gathering information from Fluent Bit, and there are a couple of blog posts on the internet if you are interested in configuring this.
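The exact annotations from the slide aren't captured in this transcript, but a typical sketch, assuming the common prometheus.io annotation convention and Fluent Bit's default port, looks like:

```yaml
# Pod template metadata in the Fluent Bit DaemonSet (illustrative values)
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "2020"
    prometheus.io/path: "/api/v2/metrics/prometheus"
```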
Also, since Fluent Bit speaks a lot of languages, so to speak, it can write to your Prometheus instance as well: you can use the prometheus_remote_write output plugin to send metrics from your Fluent Bit instances to your server and, as shown in this example, add your own labels to the data you are sending.
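A sketch of that, with a placeholder endpoint and label:

```
# Collect Fluent Bit's own internal metrics and push them via remote write
[INPUT]
    Name             fluentbit_metrics
    Tag              internal_metrics
    Scrape_Interval  2

[OUTPUT]
    Name       prometheus_remote_write
    Match      internal_metrics
    Host       prometheus.example.com   # placeholder Prometheus host
    Port       9090
    Uri        /api/v1/write
    Add_label  app fluent-bit           # custom label added to every series
```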
A: Okay, so with these metrics in place, once we have started to watch our Fluent Bit metrics, there are some patterns we can look for to see whether Fluent Bit is having any issues, before we get those unexpected calls from our users. One of the important patterns to look out for is backpressure.
So what is backpressure? Basically, we can define it as the scenario where your ingestion capacity is bigger than the throughput you can achieve on the output side. Let's say you are able to ingest 50k records per second, but your data endpoint is only capable of receiving 30k per second, or your endpoint is down altogether. So what happens here, and how can we detect it?
What happens in Fluent Bit is that as it starts retrying to send the data to your endpoint, which is unavailable or cannot cope with the throughput you are trying to send, Fluent Bit's buffering mechanism starts consuming more resources, because it has to deal with the backpressure.
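One common guard for this, sketched here with an illustrative value, is to cap an input's in-memory buffer so that backpressure pauses ingestion instead of growing memory without bound:

```
[INPUT]
    Name           tail
    Path           /var/log/app/*.log
    Mem_Buf_Limit  50MB   # pause this input when its buffered chunks reach this size
```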
A
So,
basically,
you
could
see
if
you
are
consuming
your
data
from
the
the
metrics,
something
like
this.
The
the
first
the
first
chart
is
for
input
records,
so
you
can
really
see
at
about
6K
per
second
there
in
the
input
and
it
was
similar
on
the
output,
but
then
in
the
in
the
third
one
you
can
see
the
difference
between
both
right.
That difference is what is not reaching the destination we are sending this information to. We can deal with this using the filesystem storage mechanism, which is the most recommended one; more about that later. Once you see backpressure in your monitoring charts, you should ask a few questions to decide whether you actually need to deal with it and how. It could be that you have peaks of data ingestion, or that the endpoint is down; also check how the resources look on the node or server where Fluent Bit is running, since it may be having a different kind of issue that prevents it from sending data to your endpoint or sink. Another thing to look out for is retries.
For retries we also have metrics: in the Fluent Bit metrics you get the retries per output plugin. And what is a retry? Basically, when you are sending data to your endpoint, Fluent Bit tries to flush a chunk of data to it, and there can be three answers to this operation. The first is OK, meaning: yes, I took this chunk of data, I sent it over, and the server replied with a 200 or a 201, whatever the server responds.
The second is an error: it means I sent something that was wrong from my side and the service rejected it because there was something wrong with the data. It could be a format problem, an index issue in your data backend, or any other reason that makes this chunk non-retryable. So it's an error, and Fluent Bit will not retry it. And the third answer could be:
yes, there was an error while sending the information, but you can retry. This is, for example, when the endpoint is down, or when it returns a 500, or even when the server responds not with a 500 but with a 4xx saying: hey, I'm receiving too many connections right now, too much data coming my way, so I cannot serve you at the moment. So those are the three answers, and in the third case, when the chunk can be retried,
Fluent Bit will schedule a retry task: internally, the Fluent Bit engine asks the scheduler to schedule a retry task, and the chunk is retried in the future. And the other metric to watch is dropped records.
Dropped records are counted when there are records that are no longer retryable. This can happen because of those 4xx errors, where improper data was sent to the endpoint, or when the retries were exhausted. You can tell Fluent Bit: if the endpoint replies with an error that is retryable, then retry three times, five times, retry forever, or don't retry at all.
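That behavior is configured per output; a sketch with a placeholder destination (the Retry_Limit values come from the docs: a number, False for unlimited retries, or no_retries):

```
[OUTPUT]
    Name         es
    Match        app.*
    Host         elastic.example.com   # placeholder endpoint
    Retry_Limit  5                     # give up on a chunk after five retries
```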
In either of those cases, a retryable error that has exhausted its retries or a non-retryable error, we count the dropped records, and then you have to start looking into the Fluent Bit logs to see what is going on.
So these are the two common patterns we have observed, and the reasons for them are multiple: as mentioned before, the endpoint is not available, or we made some mistake in the formatting, or even in the index creation in our data endpoint. I'm seeing more questions coming in, yes?
B: There was a quick question on what logging is expected for retries, drops, etc., and then there was a follow-up to a question asked earlier by John.
I believe we actually just covered some of it, but the question is: has the v2 metrics Prometheus endpoint been tested against Telegraf? John said that he recently tried using it but got an error about duplicate help statements.
A: Okay, no, I don't have an answer for that one. I think it's a good idea to reach out on GitHub or on Slack and we can take a look into it; that's interesting. I haven't used it with Telegraf.
B: Yeah, we'll follow up with you afterwards and see if we can help with that error.
A: Okay, now a little bit on tuning Fluent Bit for cloud-native environments.
Before going into that, a brief overview of how Fluent Bit works. We have multiple sources we can grab data from; in the middle is Fluent Bit's processing; and then we send to the different endpoints. To do this, instead of keeping the data as JSON, the most common format out there,
instead of carrying JSON through the processing pipeline, Fluent Bit uses MessagePack, which saves space and is used throughout the whole pipeline. So Fluent Bit has the ingestion side, the input plugins; then the filtering; then a buffer; and then it routes the data to the different endpoints, and in the middle it uses MessagePack to store the data. As you can see, there are savings.
These are simple examples of how MessagePack uses less space than JSON. This is what we use for data serialization inside Fluent Bit. Once we have the data converted to this format, we have our chunks, which are also chunk files if you use filesystem storage, but the base structure for this is the chunk.
A chunk is a group of events that belong to the same tag. Fluent Bit does its processing, converts the events to MessagePack, and puts them into the chunk, and if there is another filter in the pipeline it repeats the operation: it unpacks the MessagePack, does whatever it needs to do with the event, and packs it back into MessagePack.
So the structure Fluent Bit uses is this chunk, with a tag, with the records, and with metadata inside it as well.
When Fluent Bit works with chunks, they are two megabytes each. This is a soft limit: let's say that of those two megs only 15 kilobytes are free, but your next event is 20 kilobytes; it will most probably still be saved to the same chunk, so the chunk goes a bit over the limit, but the soft limit is two megabytes.
A
This
is
something
hard-coded
on
fluid,
not
a
configuration
parameter
yet,
and
film
bit
will
process
this
data
and
we'll
append
to
shank
based
on
this
buffer
chunk
size.
This
is
configuration
setting
that
the
users
may
change.
We
have
basically
to
two
parameters.
There
they've
offered
chunk
size
and
also
the
max
size,
but
these
are
the
units
that
will
will
tell
fluid
how
much
data
at
a
time
it
will
process
and
all
this
processing
is
done
in
memory.
Even if you have filesystem-backed storage, all the processing happens in memory. Fluent Bit has the concept of chunks being "up" and "down": while Fluent Bit is working with a chunk, the chunk is up, meaning it is in memory, not just on disk (in case you are using filesystem storage), and once Fluent Bit is done with it, it puts the chunk down.
That may not be immediate, because Fluent Bit passes the instruction to the kernel and the kernel may decide to do it later, but this is how Fluent Bit works and how it treats these chunk structures.
B: And Lecaros, there's a quick question: if messages are over two megabytes, are there any issues with the chunking process?
A: Okay, yeah, good question. What will happen is that you will have chunks over that size; Fluent Bit will not cut the event. The only case where the log event could be left out is if you use Skip_Long_Lines; that's a parameter you can use for that. But if you are not skipping long lines and you have a message over two megs, the chunk will grow enough to accommodate that log event.
And yes, as Ryan says there in the question, it is an anti-pattern, and you should troubleshoot to find out why you are getting such big log events. We have seen funny stuff there; we have seen entire PDF files written to the log file. So yeah, it's something you need to look into.
B: I'll grab that one and answer it in the chat.
B: There's another one from Francisco, though: what is the point of using a different serialization mechanism inside the Fluent Bit pipeline, since it will be deserialized again at the output? Is it to reduce Fluent Bit's memory usage?
A: Yeah, exactly. As we mentioned before, Fluent Bit was born out of the experience with Fluentd. Fluentd was written in Ruby, with a bigger footprint on the system, while Fluent Bit is a lightweight telemetry agent. So yes, that's one of the reasons we use MessagePack: to keep the data compact. Thanks for that question, Francisco.
Okay, so some of the settings Fluent Bit exposes to deal with this. The aforementioned buffer chunk size means Fluent Bit will process that amount of data each time, for each file; thinking of the tail input plugin, these configuration parameters are per file. So per file, Fluent Bit processes the amount of data defined by Buffer_Chunk_Size and Buffer_Max_Size, and it pauses ingestion if this limit is reached and you are not using filesystem storage. Then you have the storage limits, available on both the output side and the input side, though they have proven more useful on the output.
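A sketch of those per-file read buffers on the tail input (the sizes are illustrative):

```
[INPUT]
    Name               tail
    Path               /var/log/containers/*.log
    Buffer_Chunk_Size  32k    # initial read buffer per monitored file
    Buffer_Max_Size    256k   # the buffer may grow up to this before the file is paused
    Skip_Long_Lines    On     # skip lines that exceed Buffer_Max_Size
```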
On the input side, you can improve performance by enabling threaded input plugins. Before this, Fluent Bit ran the inputs in the main thread of the application, so if you had parsers there, or other operations, or a lot of input instances, you would notice that ingestion was not as performant. Now you can mark an input plugin as threaded, and Fluent Bit creates a separate thread, a coroutine, to process that input instance. And recently we have introduced processors, which let you attach more filters on the input side: not only, for example, a parser that you can define on the tail plugin, but any other filter you may need, and those run on the input's thread, not on the main thread.
On the output side, you have workers. Fluent Bit outputs used to run in a separate thread, but still just one thread; now, with the workers configuration parameter, you can tell Fluent Bit to open, say, eight different workers for the same output plugin, and each runs on its own thread. You also have processors there; I think Anurag talked about this in the previous webinar.
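A sketch of both knobs together (values and the destination are illustrative):

```
[INPUT]
    Name      tail
    Path      /var/log/app/*.log
    Threaded  true        # run this input in its own thread

[OUTPUT]
    Name     http
    Match    *
    Host     logs.example.com   # placeholder endpoint
    Workers  8                  # eight dedicated flush threads for this output
```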
This can help if you were doing rewrite_tag, or sending from one pipeline to another for further processing before sending to your data endpoint. Now you can do that at the end of your pipeline: do the common processing from left to right, and only once you are at the output do the further, output-specific processing.
Let's say you only want to send the info-level log data to one output: you can filter the rest out and send only what you require. And on the output side there is the storage limit, which is for dealing with backpressure: how much disk will you use for it, will you allow losing records, etc. It lets you say how much of your system's storage you want to use.
A
We
know
that
some
users
even
will
not
use
any
disk
at
all
and
we'll
only
want
to
have
something
in
memory
right.
So
so
you
can.
You
can
manage
this
settings
in
the
output
side
to
to
limit
the
storage
usage.
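As a sketch, the cap is set per output (the value is illustrative):

```
[OUTPUT]
    Name                      http
    Match                     *
    Host                      logs.example.com   # placeholder endpoint
    storage.total_limit_size  500M   # beyond this, the oldest buffered chunks are discarded
```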
A: Okay, good question, yeah. Actually, the sequence of the log events is not guaranteed when, for example, you have retries: if a new chunk with new log events succeeds before the retry of a previous one, they will not arrive in order. So in general we do not guarantee the order there, unfortunately. Thanks for that question.
Okay, and also related to this compacting of data in Fluent Bit: if you have multiple filters running in your pipelines, performance can degrade, because Fluent Bit will constantly be unpacking and repacking the records in MessagePack for each filter. What we propose for this is to use Lua: instead of having one filter after the other, you have a single Lua script that does all the transformations your pipeline requires. We worked on this in conjunction with one of our partners.
They did a lot of testing on this and have impressive results showing it, and we did some testing as well. So this is the recommendation we can make now: if you have multiple filters, it is better to use one Lua script to do all the required transformations.
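A hedged sketch of that consolidation (the script name and the field logic are hypothetical):

```
[FILTER]
    Name    lua
    Match   app.*
    Script  transform.lua
    Call    process
```

```lua
-- transform.lua (hypothetical): all per-record changes in a single pass
function process(tag, timestamp, record)
    record["env"] = "production"   -- enrichment formerly done by one filter
    record["debug_blob"] = nil     -- field removal formerly done by another
    -- return code 1 = record was modified; the timestamp is kept as-is
    return 1, timestamp, record
end
```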
If you missed the previous webinar, Thiago, our Lua expert, covered this, and he's also around on Slack and GitHub if you have further questions about the use of Lua.
We also have multiline support, and there are basically two ways of using it. One is the built-in multiline parsers: we have parsers for Python, Go, and Java, and you can use them right in the tail input plugin. In the other scenario, you can use your own custom multiline parsers. There you define your multiline parser with regular expressions: you first define the start_state, which is required (that exact state name must be used), describing how the first line of a multiline message looks in the log you are gathering; then a rule for the next line, and so on. These are regular expressions.
So you don't write a rule for every literal line of the multiline event, but you do have to define states whose expressions match its body. This being open source, you can look in the code at how the different built-in multiline parsers are defined, and of course we are there through the community if you need help with it. As for long lines, we touched on those with the previous question.
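A minimal sketch of a custom multiline parser (the regexes are illustrative, for a log whose entries start with a date and continue with indented lines):

```
# In the parsers file loaded via the parsers_file setting
[MULTILINE_PARSER]
    name          multiline-app
    type          regex
    flush_timeout 1000
    # rule format: state name, regex, next state
    rule  "start_state"  "/^\d{4}-\d{2}-\d{2} /"  "cont"
    rule  "cont"         "/^\s+/"                 "cont"

# In the main configuration
[INPUT]
    Name              tail
    Path              /var/log/app/*.log
    multiline.parser  multiline-app
```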
You can skip long lines if, for some reason, there is no end-of-line character in your log record: you can skip a record that is too long. And we have also seen one issue, the clubbing issue, where records that are not supposed to be multiline grow immensely; we have detected log records over 5 megs, and that was because the regular expression was not correctly defined.
We used to recommend rubular.com for testing your regular expressions, and we have even seen some users, and part of the team, asking an AI chat whether a regular expression is good, or whether it is optimal. So that is something to look into as well.
Okay, a little on troubleshooting in production. These are common things that we see both in the community and with some of our customers. So let's say your data is not flowing right.
Our recommendation is to start simple and then escalate the tooling until you find the root cause of why the data is not flowing. If you start with the Fluent Bit log file, you should get very good clues about what is going on. Pay attention to errors, of course, but also look into the warning messages.
Fluent Bit will tell you, as a warning, that it could not deliver a chunk of data, because that situation is retryable. It could be that your endpoint is down but Fluent Bit is not yet able to detect that pattern and act on it. So look into the warnings, see what they are saying, and see whether there is something you can do to remediate them. And if nothing there is clear, then we recommend increasing the log level.
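A sketch of that (the level values are from the docs: error, warn, info, debug, trace; trace requires a build with trace support enabled):

```
[SERVICE]
    Log_Level  debug   # raise verbosity while troubleshooting; revert afterwards
```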
B: Great. Is there anything like Fluentd's secondary, so logs aren't lost?
A: Sorry, I cannot answer that one right now; we can probably follow up later.
A: Okay, let's say we are dealing with corrupted or rejected messages. Again, increasing the debug level helps, because we get some extra information there from Splunk, Elastic, or OpenSearch.
You can enable the Trace_Error and Trace_Output options, so you will see the request and the response from that service, and there you can find whether the cause is mapping or indexing issues in your data backend. And there is the always useful standard output: Fluent Bit has an output plugin called stdout that prints the messages to your standard output, so you can see what it is actually trying to send over.
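A sketch of both debugging aids (the es destination is a placeholder; the trace options are documented on the es and opensearch outputs):

```
# Tee the pipeline to the terminal to inspect exactly what would be sent
[OUTPUT]
    Name   stdout
    Match  *

# Dump the raw request/response exchanged with the backend
[OUTPUT]
    Name          es
    Match         app.*
    Host          elastic.example.com   # placeholder
    Trace_Error   On
    Trace_Output  On
```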
Another common case we have seen is that the formatting seems incorrect. For this, we recommend first checking the source: check that you didn't change anything in the architecture and that there were no changes in the application. And of course, use the stdout output plugin and look for changes. On Kubernetes, for example, we have lately seen an increase in cases where the runtime format changed.
So it looks like Fluent Bit is adding extra data to the log record, but actually the runtime changed, so the format of the logs changed. And if your log events are multiline, take a look at whether you are using a built-in parser or a custom one; if it is a custom one, validate your regular expressions. Those would be the recommendations for these cases.
Now, best practices in production. This is something we have learned together with our users, and the huge deployments of our customers, over 100k servers, have shown us a couple of things. If you cannot afford data loss, use filesystem storage.
If your backend is not available, or Fluent Bit for any reason cannot send the data, you will have a space, literally disk space, to retry from later. So use a bit of your disk to keep those records until Fluent Bit can retry, or retries successfully. When there is an error from the endpoint, it can be retryable or not, and that determines whether Fluent Bit keeps those chunks in your storage
while it waits for the retry. If the data finally cannot be sent, Fluent Bit counts it and says: hey, I had to drop this, I could not send it; for further information you check the log files. It deletes the records that are not retryable, or that were retried up to the configured limit. And then the retry logic enters the game.
This retry logic, by default, retries only once, but you can tell it to retry without limits, or a fixed number of times greater than one. These retries are managed by the scheduler, not directly by the Fluent Bit engine.
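Putting the pieces together, a sketch of durable buffering (the paths, sizes, and endpoint are illustrative):

```
[SERVICE]
    storage.path               /var/lib/fluent-bit/buffer   # chunks survive restarts here
    storage.sync               normal
    storage.backlog.mem_limit  16M    # memory used to replay leftover chunks at startup

[INPUT]
    Name          tail
    Path          /var/log/app/*.log
    storage.type  filesystem          # buffer this input's chunks on disk

[OUTPUT]
    Name                      http
    Match                     *
    Host                      logs.example.com   # placeholder
    Retry_Limit               False              # False = keep retrying indefinitely
    storage.total_limit_size  1G                 # cap the disk used for this output
```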
And on the ingestion side: start simple, then specialize. If you deploy Fluent Bit as a DaemonSet, it will grab all your logs, and then you have to start learning from them: know your logs, know how the different applications sending data through your pipelines shape their data, whether it is structured, and what regular expression will actually parse the data that comes in your log events. And again: check your regular expressions.
We have noticed that regular expressions can have a great impact; whether they are greedy or not affects the processing. We have made changes that help deal with this by moving the processing off the main thread, but you should always, always check your regular expressions.
B: Great, thanks, Lecaros. Feel free to use this QR code; I'll also drop some links in the chat in case you aren't able to access it, so don't worry. We'll leave this up for a second for anyone who wants to join us on Slack; it's a great place to ask questions.
B: It's a great resource for reaching Lecaros, Pat, Anurag, or anyone else on our team; we're all in that chat and respond relatively quickly to any and all troubleshooting issues. So it's a great place to be. Let's jump to the next slide.
B: Big announcement for our Fluent Bit summer series: we're excited to announce a full half-day training covering a lot of the content from this three-part webinar series. So if you've enjoyed this content and want to get hands-on, we're doing essentially a half-day lab with Eduardo on our team, and probably Jose and Anurag as well on our side, going through an intro to Fluent Bit and Fluent Bit advanced; we're going to be covering a lot.
B: Some processing and things of that nature in that session, and then operations and monitoring. We're really excited to offer this training and to get you some hands-on experience with all of the content we've covered in our webinars so far. If you're interested, there's another QR code for you to sign up right there; we'll include it in our follow-up from today's webinar as well, and in the links I'm about to throw in the chat.
B: Great. Without further ado, let's move into Q&A. We had a lot of great questions during the conversation today, so we really appreciate that. If you have any last-minute questions or anything you're curious about, feel free to throw it in the chat now, or into the Q&A, and we'll cover it here.
B: There is one that just came in. Jose, if you wouldn't mind: when there are logs in queue on startup, does Fluent Bit handle those the same way that it would handle logs received post-startup?
A: Okay, so yeah, Fluent Bit does basically the same: it will process the files that already exist there. We call them static files, as opposed to live files, which are the ones that we tail. But there are some tuning options there.
It will basically try to ingest them as fast as possible, but if you see that it is using a lot of resources on the node where it is deployed, you can tweak that a little: you can restrict the amount of data Fluent Bit processes from those static files on each iteration of the event loop.
B: And one last question here: is there a way to process different log formats when they're all coming from the same source?
A: Yeah, that's another thing we have seen with our users. Yes: in the parser, or in the multiline.parser parameter, you can say, hey, here I could have Go, Python, or Java format. You just put those parsers there, separated by commas, and Fluent Bit will try the different formats and apply the corresponding parser.
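A sketch of that, using the built-in multiline parsers:

```
[INPUT]
    Name              tail
    Path              /var/log/app/*.log
    multiline.parser  go, python, java   # Fluent Bit applies whichever parser matches
```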
B: Awesome, thank you so much. Lecaros, if you wouldn't mind jumping to the last slide that we have here today. Thank you all so much for coming to the three-part webinar series; I noticed a lot of you joined for all three webinars.
B: We're really grateful to have you, and excited to produce some more content with the Fluent Bit summer series, so we look forward to seeing a lot of you at the half-day training session as well. If you're interested, curious, or have questions, feel free to reach out to us at hello@calyptia.com. With that, I'll wish you a good afternoon, a good rest of your morning, a good rest of your day, and we'll see you soon.