From YouTube: [Webinar] Fluent Bit Advanced Processing
Description
Fluent Bit has a number of features to help users enrich, redact, remove, or reduce logs as they are sending data through Fluent Bit.
In this webinar we will go through the following:
1. Introduction to Processing with Fluent Bit
2. Advanced Lua and Stream Processing
3. Processing in the Real World - Best Practices for Redaction, Reduction, Enrichment & Tagging
A: So everyone, thanks so much for joining us again. This is going to be Fluent Bit Advanced Processing; it's the second in our summer series. Just as an intro to myself: I'm one of the co-founders here at Calyptia and a maintainer of Fluent Bit, with about 10 years in the Fluent ecosystem.
I've gone through Fluentd, from when that started, all the way to Fluent Bit, and I'm fairly active in the community, on Slack, and in other channels.
A: So if you're posting things in the discussions and on GitHub, I'm very eager to answer any of those questions. I'm also joined by Thiago, who is also a Fluent Bit contributor and a software developer with 15-plus years of experience, who has, of course, worked on many open source projects. One example: he's the creator of the Neovim project. So with that, let's go ahead and get started here.
A: The way we're going to do this is: one, for those who might not necessarily know much about Fluent Bit, we're going to talk a little bit about that. Then we're going to head into Fluent Bit processing, talk a little bit about advanced processing with Fluent Bit, and then, of course, walk through some examples, so we can showcase this working in real time. How does it work? How can you start to make use of it pretty much today?
A: So let's go ahead and start with: what is Fluent Bit? We won't make any assumptions about whether folks know what Fluent Bit is. It really started with logs: collecting all the logs you can from almost any source, such as tailing files or TCP, and we've since added metrics and traces, really combining that entire set of observability data into a single, all-in-one agent. It's extremely high performing; it's written in C.
A: That kind of kernel-level type of code lets it collect at very, very high scale while maintaining a very lightweight profile. And, of course, it's more than just a simple pipe: not just A to B, but A to B with changes, with processing, which we're going to dive into today. Now, don't worry, you're not the only ones using this. We've been lucky to see a lot of folks adopting it, whether it's major cloud providers, observability providers, retail, you name it; there's most likely Fluent Bit embedded somewhere across the organization or enterprise. We've seen about eight billion plus Docker downloads, which is steadily increasing day over day.
A: One thing that we don't talk too much about, and that is super interesting, is how Fluent Bit gets used from an architectural perspective. We talked about it collecting a lot of these logs, metrics, and traces, but on the collection side, what does this actually look like? How does this get deployed? Really, it is something that can be deployed at the node level. So if you have a server, a laptop, or a containerized environment, we can deploy Fluent Bit at that node level.
A: It collects logs, metrics, or traces from applications that run side by side or next to it, and then routes them to multiple (or any) end destinations, sending data to Splunk, Elastic, Datadog. It's a very specific and very well-used pattern: for folks who are leveraging Fluent Bit across hundreds of thousands of servers, this will be a very, very common deployment, one of the more widely adopted ones. Now, another way Fluent Bit is adopted is as an aggregator.
A: The way we think about this is that you're not just collecting data streams or local information; you're potentially processing them, doing some sort of transformation. You are building heavier machines to handle high traffic flow. So, instead of just being used as an agent, you can also leverage it as what we call an aggregator, and the nice thing about this is that Fluent Bit can send data to itself.
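As a minimal sketch of that agent-to-aggregator pattern (the hostname here is a placeholder assumption, not from the webinar), the `forward` output and input plugins connect two Fluent Bit instances:

```ini
# Agent side: forward everything to the aggregator
[OUTPUT]
    Name   forward
    Match  *
    Host   aggregator.internal   # placeholder hostname
    Port   24224

# Aggregator side: accept forwarded records
[INPUT]
    Name   forward
    Listen 0.0.0.0
    Port   24224
```

Port 24224 is the conventional Fluent forward-protocol port.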
A: Now, how does processing in Fluent Bit work? It doesn't matter if it's an agent or an aggregator. If we expand Fluent Bit and think about the workflow of how it works: it's got an input, it's got some parsers, it's got filters, it's got some buffering to make sure that data is reliable and sent to the correct place, and then a router that takes that data and sends it to multiple outputs.
A: So again, that's Splunk, Elastic, Datadog, OpenSearch, whatever it may be. Within that whole data pipeline in Fluent Bit, there are multiple layers where you can start to do some of this processing: you can do it at the input side, you can do it at the parser side, and you can do it at the output side. Now again, with Fluent Bit,
the idea here is that we're connecting all these sources and destinations, really serving as that data backbone for your enterprise: you're connecting all of your data destinations, your databases, your data backends, while also collecting that data. So, as those tools are getting used, how do you make them useful? You plug Fluent Bit into them. You send them data as close to real time as possible, making sure that we can connect as generically as possible with the systems you might already have in place.
A: So things like TCP and HTTP; if you have a specific vendor or technology, making sure that we work with it; common data structures; if you're doing compression, gzip; if you're using things like Apache Arrow or Parquet or other very well-known formats, working with those. And last but not least, the data types: as we mentioned, logs, metrics, and traces, working with common things like Prometheus and OpenTelemetry. These are places where Fluent Bit excels, because it has that really open, broad ecosystem that's built on top of this high-velocity open source project.
A: Today, we typically see a lot of complicated stacks with Kafka, Spark, and Flink; you're using things like Kafka Streams. And really, if we expand that out to look at what folks are doing, it can be simple things like: hey, I want to add a field name, I want to do a small calculation, I want to do a redaction; and for that we're leveraging really heavy Java processes.
A: It is something you have to train on, something you have to learn, and it's always post-collection, right? You're collecting the data, sending it to these systems, and doing a lot of this processing there today. What we've found is that folks are building out these tech stacks with Kafka, Spark, Flink, KSQL, Apache NiFi,
when often the question is just: how can we redact? Being able to do that at the edge makes Fluent Bit much, much more appealing. As a quick question for folks here, if you're able to put it in the chat, we would love to hear: are you using Kafka and Spark for a lot of use cases today? Are you using them for some of this simple data processing? We're always interested in making sure that we can showcase what we're seeing with these tech stacks alongside Fluent Bit as well.
A: Now, what are the use cases, and why would you use Fluent Bit for processing? The goal here is not that you can get rid of, or dump, any of those highly complex processing stacks; no way. We will keep those for the use cases they're really great at. It's for the simple things, the things that were overkill for our systems and overkill for our operations folks, developers, and practitioners; we don't necessarily need all that. What does this mean? Things like schematizing or formatting logs.
A: If you want to parse, or you want to format into Avro, for example, you don't necessarily need to write giant SQL queries to do that. If you need to remove sensitive information as it's streaming, you most likely want to do it at the collection layer, because that way it's never stored in a way where that sensitive information could be leaked. All the greatness of those complex data tools is: hey, let's replicate, let's make sure this is highly available across 50 or 100 servers.
A: Well, actually, we don't want sensitive data to be blasted across all of those things. Next, excluding noisy logs: if we have a ton of logs that are non-critical, things like debug, trace, and info that we don't necessarily need in production, let's remove them. Why pay those egress charges? Why pay all of those data transfer costs, if we can exclude data that's not useful to you and that's clogging up the systems? And, last but not least, context at the agent and aggregator level.
A: You are at a piece of the architecture where you can get really informative context, like the hostname, what's happening at the Kubernetes cluster level, what's happening within this AWS environment, even GeoIP. These are great places where you can plug in and add context without necessarily having to do that in the large data stack. There's a really great question in the chat about Kafka persistent storage: can Fluent Bit do that? Is it possible to replace Kafka with Fluent Bit? I'll say what I said before, too: the idea here
A
Is
you
don't
do
a
full
replacement
of
these
complex
data
stack
with
something
like
flip?
It
does
flip.
It
have
persistent
storage.
Yes,
can
some
use
cases
of
Kafka
be
replaced,
but
just
doing
everything
age
inside?
Yes,
if
you're
doing
Simple,
processing,
simple
storage,
simple
buffering,
simple
retries,
all
of
that's
included
in
in
for
a
bit
from
an
agent
aggregator
side,
you
might
not
need
Kafka
just
as
a
as
a
message,
queue
or
or
something
like
servicing
in
between.
A: So those are great cases where, if you analyze what you're doing at each piece of the data stack, you can start to leverage Fluent Bit in a broader capacity, because it's written, again, at that kernel level, in C: super high performance, lightweight, and it gives you a lot of flexibility in what you might need to accomplish.
A: If we look at Fluent Bit and what many users are leveraging it for, it's log files. Log files, at the most basic level, are just a giant stream of text, right? It's the text that gets output; a developer, maybe, has written something like console.log("hello world"). As you are grabbing all of these logs and all of this stuff coming in via these streams of data, it's important that we contextualize it for the operator or the practitioner that's actually looking at it; for example, with a parser.
A: There are very well-known formats for these logs. So in this case, I have an Apache log; it's an access log; it has an HTTP method, it has an IP, it has a date-time. I have MySQL logs that have databases and database engines, and then JSON, which is already in a key-value format. Parsers, again, are the most basic level of processing. They allow you to extract these key-values automatically as you're collecting the data, and many times these are available out of the box.
A: With a lot of the parsers that you might be using within a microservices environment, within a node, an agent, or an aggregator, you're going to have a lot of these built-in parsers out of the box: Apache, nginx, Kubernetes, Envoy, JSON, Docker, CRI-O, Istio. A lot of these come out of the box, and if they're not there, the great part about this Fluent community, having been around for as many years as we have, is that there's a ton of parsers available throughout GitHub and through the Slack channel.
A: So if you don't have something out of the box, don't worry, we can define custom parsers as well. These are parsers that live in the parsers file referenced from the service section. You can use regular expressions to do the extraction, and the great part is, say, for example, you are reading data that was not generated at collection time: you may want to use the original timestamp of the data itself, and these parsers allow you not just to extract and create key-values but to say: hey, the time of the original record is what I want to send to my backend.
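A hedged sketch of such a custom parser (the field names and regex are illustrative assumptions, not from the webinar); `Time_Key` and `Time_Format` are what let Fluent Bit keep the record's original timestamp:

```ini
[PARSER]
    Name        my_app
    Format      regex
    Regex       ^(?<time>\S+) (?<level>[A-Z]+) (?<message>.*)$
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S
```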
A: So a custom parser allows you to do that. Now, a fun one that we get asked about all the time is: hey, what about multi-line messages? My processing is not just a key-value pair; I have these messages, as you can see here, where this one lives across six different lines, and if I just send six separate lines to my backend, it's really hard for me to understand what happened.
A: I might miss which line a stack trace occurred in, so I have to do some processing to say that this is a multi-line message and it should be captured as a single record. Now, we have a blog post that goes into a lot of detail about multi-line messages, but at a high level, from a processing standpoint, we've also made sure that Fluent Bit can do a lot of these multi-line things out of the box. So you have multi-line parsers.
A: You have a lot of functionality that's enabled for Kubernetes, with both Docker and CRI runtimes, and then you can also tack on to that: hey, this is a Golang stack trace, a Ruby stack trace, a Java stack trace, or a Python stack trace. We've done our best to define patterns that we think address a broad range of these languages.
A: Of course, if we don't have something out of the box and you're doing something really custom, you can also define these multi-line parsers yourself: essentially, this is the pattern that starts a record, and here's the continuation of it. These are again great ways to do the type of processing you might need, even from a simple log capture where you might not have expected to need processing at all.
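A sketch of a custom multiline parser along those lines (the start and continuation regexes are illustrative assumptions); the first rule names the pattern that starts a record, the second matches continuation lines:

```ini
[MULTILINE_PARSER]
    name          my_multiline
    type          regex
    flush_timeout 1000
    # rule  <state name>   <regex>                 <next state>
    rule    "start_state"  "/^\d{4}-\d{2}-\d{2}/"  "cont"
    rule    "cont"         "/^\s+/"                "cont"
```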
A: Actually, by doing this processing, you could streamline your operations: how you're searching that data and how you're debugging and troubleshooting. Now, the more familiar place for processing, for those who are familiar with Fluent Bit, is what are called Fluent Bit filters. This is a little bit of a misnomer; it's something adopted from the Fluentd side. When we think of a filter, we think of reducing, or having data run through a strainer, to use a water-based analogy, but these filters can actually do way more than just reduce or remove things. You can add, you can modify, you can do conditional statements, you can nest objects. You can do lookups against a large list of IPs or a large list of domain names. You can do GeoIP lookups; you can even talk to APIs. Kubernetes is a great example: the Kubernetes filter talks to Kubernetes and says, hey, who am I?
A: What's my context? What namespace do I belong to? Pod, container, all these things that are important in the context of that application, it can add into the log message. Now, here's a really simple example of a basic filter that does the following: okay, let's remove some of this stuff. What we're doing here is excluding a certain subset of logs that match a pattern; in this case, if anything matches the pattern "my app", we're going to exclude it from what we're capturing.
B: Hi everyone. Yeah, I'm going to be talking mostly about Lua, so thanks for the introduction there.
B: Some of the highlights: Lua has a very readable syntax, similar to Python. It's a language that's very simple, with a very limited syntax, and it's very easy to read and understand the code. It's also very lightweight.
B
What
is
I
think
might
be
the
light
most
efficient,
most
memory,
efficient,
scripting
language
that
you
can
embed
in
a
program.
So,
given
that
the
fluidability
is
targeting
the
high
performance
usage,
efficient
uses,
it
kind
of
makes
sense
to
use
Google
as
a
embedded
language.
B: Even though Lua is small, you can actually do a lot with just the built-in libraries. The standard library has a very simple but efficient pattern-matching language; it's similar to regular expressions, a bit more limited in what you can do, but it still allows you to do a lot.
B: For example, a lot of people who come to Lua don't like it because it lacks many common functions. A very common function is to strip whitespace from a string; Lua does not have a function that does something like that. But with the pattern-matching library, once you start to get familiar with it, you can always find a one-liner that does those things. So yeah, it's a very small tool that still allows you to build with it. And finally, Lua is very widely used in the industry.
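The whitespace-stripping case mentioned above is a classic example; a one-liner with Lua's pattern-matching library might look like:

```lua
-- trim: strip leading/trailing whitespace with a single pattern.
-- "%s*" matches whitespace at the edges; "(.-)" lazily captures the middle.
local function trim(s)
  return (s:gsub("^%s*(.-)%s*$", "%1"))
end

print(trim("   hello world   "))  -- "hello world"
```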
B
It's
very
heavily
used
as
a
gaming
speaking,
language
I
think
it's
probably
the
most
used
scripting
language
for
games
just
because
of
its
performance,
but
also
because
it's
easy
to
embed
into
other
other
products.
So
a
few
examples
are
Roblox
World
of
Warcraft
for
the
older
gamers
there's
also
adult
Photoshop
lightable,
which
is
an
image
processing
favorable
for
Adobe,
Photoshop
and
yeah.
B: Here is a very simple example of a filter. This is a Fluent Bit filter configuration that defines a script, an external file called append_tag.lua, and it specifies which function in that script should be called for each record; on the right you can see the script itself.
B
It's
just
defining
a
doable
function
in
Google.
That
takes
the
tag.
The
record
tag
assigns
to
the
record
as
a
field.
B: This is one of the simplest examples I can think of; to learn more about the Lua API you can go to the documentation. But to summarize what it's doing here: it returns 1, which for Fluent Bit means it modified the record, so Fluent Bit should take the modified record and replace the original; if you return zero, the record is kept as-is.
B: So in this case, it's showing the classic Fluent Bit configuration that uses the filter block with the script in a separate file, but there is a second way; the next slide shows how. Can you advance the slide?
B: This one shows how you can also put the script inline in the Fluent Bit configuration. One of the limitations of this example is that the script must be on a single line, so you must reformat it. For very simple processing this can work, but it might not be the best option for everyone. This example is exactly the same as the previous one, simply reformatted onto a single line, and instead of the `script` directive we specify the `code` directive.
B
With
us,
so
it
flat
bit
since
I
think
version
2.0
started
supporting
emo
as
a
configuration
format.
B
The
same
as
the
previous
one,
it's
the
code
embedded
into
the
into
the
configuration,
but
we're
using
the
block
syntax
from
yemo
to
do
embed
the
multi-line
lower
theater
into
the
configuration
so
yeah
it's
the
same,
but
yeah
this.
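A sketch of that YAML form (the surrounding pipeline structure is an illustrative assumption); the `|` block scalar is what preserves the multi-line Lua code:

```yaml
pipeline:
  filters:
    - name: lua
      match: '*'
      call: append_tag
      code: |
        function append_tag(tag, timestamp, record)
            record["tag"] = tag
            return 1, timestamp, record
        end
```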
This next example shows processors. What are processors?
B: Processors are a new feature of Fluent Bit. At first they might seem very similar to filters, but the difference is that a processor, whether it's Lua code or a filter (it doesn't support only Lua; any Fluent Bit filter works), is attached directly to an input or an output plugin. Normally, when you specify a filter, you specify a match pattern, and that filter is only applied to records that have a tag matching that pattern.
B: When you define a processor for an input or output, it's always going to be part of that input or output, so there is no match specifier.
B: It might require a little bit of knowledge about how Fluent Bit works for this to make sense, but
processors are something that's only available in the YAML format. As you can see, these are YAML configurations; you cannot do this with the classic configuration format.
A: Actually, real quick, I see we're getting a lot of good questions, both in the chat and in the open Q&A; I think this is a good place to answer a few of them. One from Francois, a really great question: would a Lua filter used to nest data in a payload be as lightweight and efficient as using the nest filter? This is a great question about how Lua processing compares to Fluent Bit pipeline processing, and really what Thiago is showing here on this slide is two factors. One is on the performance side with Lua.
A: You get a lot more versatility, and you only pay a one-time penalty of converting between msgpack, the internal format Fluent Bit uses, and the Lua representation. Within Fluent Bit you can chain multiple filters one after the other, but you pay a performance penalty every single time a filter runs. With Lua, you pay that penalty just once; with chained filters, you pay it every single time you need to do some sort of logic.
A: So, in essence, if you're doing a lot of transformation, a lot of processing, Lua is going to be way more efficient than using multiple Fluent Bit filters. In the end, it's all extremely lightweight and high performing, but Lua helps if you're trying to fine-tune your use case, even down to threading. There's another great question someone has: hey, do we need to be an expert in the C language? No; do you need to be an expert in C to use Linux?
A
No,
it's
it's
the
language
that
the
tool
is
written
in
the
debugging
and
the
operations
are
are
very
much
separate
than
the
the
language
it's
written
in,
so
that
that
makes
it
easy
to
use
consume.
Build
this
really
high
performing
data
pipeline
in
and
then
being
able
to
run
operations.
I'll
put
a
plug
for
our
next
webinar,
which
will
also
be
in
another
slide,
which
is
around
the
operations
monitoring
performance
of
fluidpit.
A: So, how you can look at this, and how we've seen some of our users, who have hundreds of thousands of servers, do this type of monitoring, management, and debugging. I think one more question was more general, but it's great to answer here: is there a reason for someone to use Fluentd versus Fluent Bit? It's a great long-standing question: there are two projects here; which one should I get started with? Well, with Fluent Bit we've invested a ton of work to bring up that parity.
B: Okay, so one of the problems with using too much Lua is that you end up copying and pasting a lot, rewriting things. One of the ways you can work around that is by installing Lua modules. Lua is a full programming language; it has its own package manager that can install third-party libraries, and Fluent Bit filters can make use of that. So, in theory, you can write reusable code as well.
B: So this is something I'm going to talk about now. Lua has a lot of ready-to-use packages that you can install. Something that is common in the Lua ecosystem is that the libraries, the packages, tend to be C bindings; they tend to bind directly to native libraries, so they are very high performance. This is one example, actually one of the simplest examples, of using Lua modules.
B: In this case, in the first code block, you see how you can set this up on your machine, or in your Dockerfile, wherever you're preparing Fluent Bit to run. In this Ubuntu or Debian example, you install the libssl development headers and LuaRocks, which is the package manager, and then you use LuaRocks to install luaossl, which is the Lua OpenSSL binding; on the left you can see how to invoke the package.
B: To load a Lua package, you just call require with the name of the package as it's registered with the system. This is, as I said, one of the simplest examples. It also shows some interesting Lua features that I have not talked about but that I think are interesting to see: Lua has closures and functions as objects, so you can see there's a digest factory, and it returns a function that simply computes the digest for the specified algorithm.
B: So you create md5, sha1, and sha256 digest functions, and then, in the filter, we simply add one field for each of these cryptographic hashes, computing the hash of the log field. We might link this later.
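A hedged sketch of that digest-factory filter (it assumes the `luaossl` package's `openssl.digest` module; the field names are illustrative):

```lua
local digest = require("openssl.digest")

-- Closure/factory: returns a function computing a hex digest
-- for one fixed algorithm
local function digest_factory(algo)
  return function(data)
    local raw = digest.new(algo):final(data)
    return (raw:gsub(".", function(c)
      return string.format("%02x", string.byte(c))
    end))
  end
end

local md5    = digest_factory("md5")
local sha1   = digest_factory("sha1")
local sha256 = digest_factory("sha256")

-- Filter callback: add one field per cryptographic hash of "log"
function add_hashes(tag, timestamp, record)
  local log = record["log"] or ""
  record["log_md5"]    = md5(log)
  record["log_sha1"]   = sha1(log)
  record["log_sha256"] = sha256(log)
  return 1, timestamp, record
end
```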
This was taken from the fluent-bit samples repository, but there are much more advanced samples there; there's an example of XML processing.
B: You can do very efficient XML parsing when you need it, and there are many more things; you can also import regular Lua libraries, like LuaSocket, which is a networking library. So if you ever need to make network calls from Lua, you can use something like LuaSocket. There is really no limit to what you can do here.
B: One of the Calyptia team has developed a playground, a Fluent Bit Lua playground, which allows you to directly test your Lua code in a simulated environment. On the left you see a screenshot, and you can see the link at the top right. Just to clarify, this is a web application running directly in the browser; there is no server. It's using an in-browser Lua implementation to emulate a Fluent Bit-like environment for testing.
B: As you type, as you make changes to either the inputs or the Lua filter, it's going to automatically update the output, and it's actually a quite simple way to test your snippets and learn a bit about Lua.
B: So now I'm going to talk about stream processing, which is another feature; it's not related to Lua. Basically, the simplest way to see it is Fluent Bit plus SQL: it uses a very SQL-like language.
B: If you look at this diagram, the basic thing to understand is that the stream processor operates when records reach the storage: after the storage, records reach the stream processor, and then you write SQL-like queries to filter or modify records, which can then be ingested back into the pipeline before going to an output.
B: This is a very simple example. Basically, every input in Fluent Bit is associated with a stream; usually this stream is named after the tag that is assigned either by the input or by the user in the configuration. In this case, we are selecting every field from a stream called apache, and in the second example we're just selecting the code field, but we rename it to http_status, from every record that matches the tag apache.*, just as you would in an SQL database.
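Those two queries can be sketched in the stream processor's SQL-like syntax (the stream, tag, and field names follow the slide as described):

```sql
-- Select every field from the stream named "apache"
SELECT * FROM STREAM:apache;

-- Select one field, renamed, from records whose tag matches apache.*
SELECT code AS http_status FROM TAG:'apache.*';
```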
B
In
this,
when
you
see
this
only
the
select
statement,
this
is
actually
just
sending
that
that
selective
data
to
the
standard
output-
it's
not
actually
creating
another
slip.
So
it's
a
good
way
to
debug
SQL
like
previous
software,
is
you
write
the
select
statement
and
you
see
standard
output,
but
usually
you
want
to
do
something
like
create
another
stream
from
the
SQL
query.
So
this
is.
B
These
examples
are
showing
that
we're
creating
a
new
stream
called
hello
from
every
field
of
every
record
of
the
stream,
and
this
is
something
that
goes
back
into
the
beginning
of
the
pipeline.
So
it
goes
through
of
all
the
the
filters
that
match
it.
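A sketch of that stream creation (the names follow the slide); the `tag` property controls how the re-ingested records are matched downstream:

```sql
CREATE STREAM hello
  WITH (tag='hello')
  AS SELECT * FROM STREAM:apache;
```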
B: So it's something that you can combine with other filters, but you have to keep in mind that you need to properly set the match tags for this. It's basically the same example as before, only now it's creating a new stream.
B: If you just do the select statement, it doesn't create a new stream; it just sends the result to standard output, so it is a good way to debug and develop: you're going to see the results on standard output. But usually you want to combine that with create, to create another stream, and that goes back to just after the input.
A: Yeah, some use cases that I've seen folks talk about in the Fluent Slack channel that I think are awesome, and I'd love to chat with more folks who are looking to do stream processing: you'll group a bunch of messages together and send the outcome of the grouped messages as a new stream. Maybe doing some analysis about how many HTTP 200s, 404s, and 500s you have within a certain window, or doing alerting, so things like:
A: If you see an error, then have a new stream connected, and have that stream only send data to Slack or to an alerting mechanism. So there are a lot of really fun things you can do with stream processing, and stream processing is something that we're always trying to build on top of, so we're really eager to hear more about those use cases.
B
More
complex
example:
it's
basically
a
grouping
records.
So
when
you
do
a
group
select
statement
in
a
database,
that's
just
grouping
the
existing
data
in
the
case
of
fluent
bits,
it's
operating
on
a
stream
of
data
on
a
continuous
stream
of
data,
so
we
don't
know
how
many
records
there
will
be.
There
can
be
not
many
right
so
this
in
this
case,
the
group
The
aggregation
in
stream
processing
is
by
using
a
Time
window.
B
So
in
this
case,
imagine
that
there
is
a
cities,
a
bunch
of
records
that
represents
cities
and
this
they
have
the
concrete
in
each
record.
So
in
this
case
we're
grouping
the
records
by
by
country,
we're
selecting
the
country
and
showing
the
count
of
cities
that
are
in
that
in
that
group
in
the
existing
that
area.
But
this
is
using
a
Time
window
of
five
seconds.
So
so,
if
all
the
records
come
all
at
once
in
a
single
file,
it's
going
to
group
all
that
correctly.
B
But
if
it's
coming
from
a
network
or
something
like
that,
it's
going
to
this
group
is
going
to
happen
in
in
five
second
intervals.
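The city-counting aggregation described above might be sketched as (the stream and field names follow the slide):

```sql
-- Count cities per country over 5-second tumbling windows
SELECT country, COUNT(*)
  FROM STREAM:cities
  WINDOW TUMBLING (5 SECOND)
  GROUP BY country;
```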
B: Actually, I think stream processing is one of the ways you can complement Lua. Lua cannot do what stream processing does, because Lua filters cannot ingest records back into the pipeline the way stream processing can. So you can combine stream processing with Lua.
A: Awesome. So yeah, happy to talk through some more of the questions that are coming through the chat. "Is it fair to think of stream processing as an aggregation engine?" We would love to hear a little bit more.
Maybe I could throw out a use case about what I mean by aggregation, which is: hey, I'm sending all this data, maybe a thousand records per second, but I only care about the high-level notes about those records, the metadata about those records.
A
So
maybe
how
many
unique
IPS
there
are,
how
many
total
host
names
there
are
and
maybe
what
the
content,
if
content
has
X
or
content
has
Y
in
it.
Stream
processing
is
a
great
way
to
condense
and
do
the
logic
across
those
thousand
events
per
second,
so
maybe
you're
getting
syslog
Network
traffic.
You
can
build
that
stream
processing
on
on
top
of
with
it.
So
that's
a
great
way
to
to
Really
drive
a
lot
of
these.
A
These
things
that
you
potentially
need
something
really
heavy
weight
on
the
other
end
and
do
that
within
the
the
agent
itself.
Another
great
question:
can
you
accidentally
create
back
pressure,
or
is
it
run
in
separate
thread,
very
good
question?
So
just
some
background.
What
is
back
pressure
right?
This
is
fluid
bid.
It
has
a
an
interesting
problem
where
it
is
so
fast
high
pressure
that
sometimes
back
ends
can
can
sort
of
say.
A
That's
what
the
the
back
pressure
is
now
with
stream
processing
you're,
essentially
taking
those
records
and
and
doing
some
computation
on
it
and
then
re-sending
it
to
the
fluidbit
pipeline
to
to
do
more
processing
or
send
it
again,
and
in
that
case
yeah,
if
you're
doing
a
lot
of
processing
on
all
these
records
and
you're
slowing
down,
because
the
back
end
can't
handle
the
the
traffic
and
you're
buffering
you're,
essentially
adding
more
input
into
that
that
pipeline.
A
So
the
the
processing,
the
filtering
all
that
can
run
in
separate
threads,
but
from
a
back
pressure
side,
it's
it
is
there.
It
will
increase
that.
So,
let's
get
into
some
of
the
the
demos
here.
A
So
this
is
processing
in
the
real
world.
I
have
three
use
cases
that
we
tend
to
see
in
the
slack
Channel
or
hear
from
the
community.
All
the
time
we've
got
some
sensitive
information,
I've
got
noisy
logs,
I
need
to
add
contacts
to
these
logs.
How
can
I
go
and
and
enable
any
of
these
scenarios
now
the
great
thing
with
Lua
is,
you
know
you
can,
like
Thiago
said,
get
this
really
great
Insight
of
what's
Happening
internally
of
what's
happening
in
the
record,
use
that
conditionally
to
to
build
some
of
that.
A
I have a VM that's just pumping out an input stream of "hello world", and then I have a configuration that I'm running here on the side. So the first one I'm going to take a look at is: how do we do redaction, or how do we do enrichment of those logs with, say, a hostname?
A
So what I'm going to do is modify the configuration here, and in this configuration, within the Lua function, I'm going to call a function to say: hey, go and get the hostname and add that hostname to my record. So for something I can only contextually get at the agent side, how do I go and add that into each and every single record, so that when someone searches, they know where this came from?
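The Lua function from the demo isn't reproduced in the recording, but a minimal sketch looks like the following. Reading `/etc/hostname` is one assumption (the demo may resolve the hostname differently), and the file and function names are illustrative:

```lua
-- append_hostname.lua
-- Resolve the hostname once at load time, outside the per-record callback.
local hostname = "unknown"
local f = io.open("/etc/hostname", "r")
if f then
    hostname = f:read("*l")
    f:close()
end

-- Fluent Bit calls this for every record routed through the lua filter.
function append_hostname(tag, timestamp, record)
    record["hostname"] = hostname
    -- return code 2: keep the modified record, keep the original timestamp
    return 2, timestamp, record
end
```

Wired into the pipeline with a `lua` filter:

```ini
[FILTER]
    Name    lua
    Match   *
    script  append_hostname.lua
    call    append_hostname
```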
A
So if I go ahead and save that, it'll repopulate this agent, and then this agent is going to say: great, I'm now adding this hostname to each and every single one of my records. So that same "hello world" now has the hostname as part of it. Now, if we look at the sensitive information, that could be something like a credit card, it could be something like a...
A
It could be something like a social security number. There's always a large amount of things where we say: hey, we don't want to send that outside of what we're doing. So let me grab this configuration here, and what we'll do is replace this. In this configuration you can see I'm outputting this log saying: here's my credit card number. It somehow made its way into the log, and I write a Lua function, a very simple function, to say: if I find some numbers, go ahead and redact them. So let's look at it.
A
First without the specific filter, and then we'll enable that filter and see what happens. So here we go, looks like it got loaded up and it's restarting, and great, here's my credit card number for everyone to go and see. Let's go ahead and add some redaction so that doesn't happen anymore. So go ahead and do this.
A
And perfect, great. Now I'm redacting all of those credit card numbers that are coming in, and it's immediately there and not being shown. So everyone who does see these logs, they're just going to see those stars.
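The redaction function itself isn't shown on screen here, but a minimal sketch, assuming the message lives in a field named `log` and that card numbers appear as four groups of four digits, could look like this (names and the masking pattern are illustrative):

```lua
-- redact_cc.lua
function redact_cc(tag, timestamp, record)
    local msg = record["log"]
    if type(msg) == "string" then
        -- mask 16-digit sequences, optionally separated by spaces or dashes
        record["log"] = msg:gsub(
            "%d%d%d%d[%s%-]?%d%d%d%d[%s%-]?%d%d%d%d[%s%-]?%d%d%d%d",
            "****-****-****-****")
    end
    -- return code 2: keep the modified record, keep the original timestamp
    return 2, timestamp, record
end
```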
A
Now, let's go ahead and look at a more fun use case, something that has a lot of logs. This is probably one of the more discussed use cases. So here I am outputting two streams: I have one that's saying "debug world", and then I have "error world", and I have a Lua filter, which I'll go ahead and comment out real quick. And what this is going to output is debug and error, and those are both going to start coming in at a pace of about one per second, right?
A
Maybe my debug, in real life, would come in at a hundred debugs per second or so, and essentially I don't really want those debugs, right? They're clogging up my system. They cost the same as an error log, they're not as useful, and I'm not really taking a look at them. So instead of capturing them, what I'll do is drop them. So with Lua, as Thiago mentioned, you can say: hey, if this content contains this specific parameter, we want to get rid of it.
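A minimal sketch of that drop logic, assuming the message is in a field named `log` (a real setup might check a parsed `level` field instead); the names here are illustrative:

```lua
-- drop_debug.lua
function drop_debug(tag, timestamp, record)
    local msg = record["log"] or ""
    -- plain-text substring match on "debug"
    if string.find(msg, "debug", 1, true) then
        -- return code -1: Fluent Bit drops the record entirely
        return -1, timestamp, record
    end
    -- return code 0: keep the record unmodified
    return 0, timestamp, record
end
```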
A
So in this case, I'm going to go ahead and get rid of debug, and all I'm going to see are the errors, which at this stage come in at one per second. So no more debug showing up. If someone accidentally adds a debug, someone accidentally adds an info, that stuff is effectively not being shown at all; it's being skipped, so we're essentially skipping that. Now, another fun thing, which we'll cover in the next webinar too, is metrics and monitoring.
A
You can actually see, from a Lua perspective, how many of these messages are getting dropped. So you can see how many I filtered out, how many I actually captured, what my savings are, what my output looks like. All of that is available within this whole system. So those are three examples. I have some gists that I'll paste here within that YouTube description, so you can copy and try it out.
A
These samples as well. And then, for this system, I was actually using our Calyptia Core fleet management. So if you're interested in trying that out, be sure to sign up. This is how, if you have maybe 100 servers, you'd put that same Lua rule across the whole 100 of them. We're looking for folks who would want to try that out. So awesome, with that, let me go ahead and share the slides again.
A
And yeah, that really kind of concludes most of the webinar. I will make sure that we answer all of the questions that came in. Oh, there's another good question about how many stream tasks are okay to run at the same time on one instance of Fluent Bit, and the maximum number of seconds recommended. Oh, these are fantastic questions. So actually there's another, a third one; I'll answer the first two here. So: how many stream tasks are okay to run?
A
At the same time, really, you could think of what the boundaries of stream processing are: all of it is happening in memory, and it's happening across that entire data set that you put in there.
A
Because if you have really high-velocity streams of data, then you've got to have the memory in order to perform some of this processing, just as a simple factor of: hey, we take that, put it in memory, and do a computation on top of it. From a number-of-tasks perspective...
C
Yeah, I don't know. I actually have a couple queued up that I think we skipped through during the presentation. So one of them was: Kafka has persistent storage, and can Fluent Bit do that? That was asked kind of at the beginning.
A
Yes, I answered a little bit of this in the beginning. Really, Kafka is a great data pipeline tool. In some cases you can fully replace that with Fluent Bit: we do support data persistence, we do support buffering, we do support routing to other destinations. It really depends on what you're using Kafka for. We work really well with Kafka, and we work really well for certain use cases that are a little overkill for Kafka. So you're always welcome to come chat with... okay.
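On the persistence point, a minimal sketch of Fluent Bit's filesystem buffering, which is what lets it hold data across restarts; the values here are illustrative, not a recommendation:

```ini
[SERVICE]
    # persist chunks to disk so data survives a restart
    storage.path              /var/log/flb-storage/
    storage.sync              normal
    # cap how much backlogged data is loaded back into memory on startup
    storage.backlog.mem_limit 50M

[INPUT]
    Name          tail
    Path          /var/log/app/*.log
    storage.type  filesystem
```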
A
Can you still hear me now? Yep, there we go. All right, sorry about that, my headphones died. So yeah, you know, happy to chat with you; we'd love to learn more and see where we can help replace Kafka there, or where it's enriched or augmented. Great.
C

A
Really, really good question. With grok and multiline grok parsers: within the open source version, you'll have to find some library; there are some little libraries that do that. Within the Calyptia versions and Calyptia processing, we do have stuff that does some of that parsing, with LPEG out of the box as well, but yeah.
C
And then another one was: can we write filters with Golang as well? I think we kind of touched on this, but yeah.
A
To add on top of that: in addition to Lua, we support WebAssembly, or what's called Wasm. It's a brand new addition to the Fluent Bit filtering and input side. So say you like Python, you like Rust, you like Go: you can write these inputs and filters in any language you want and compile it down to WebAssembly. So you can see this is getting used in a lot of...
A
We see it in a lot of games, as you can see, and it's super high performance, with super good resource utilization. So you can use that; it'll compile down to the fastest version of it, and then you can leverage it within the pipeline. It's still a brand new ecosystem that's evolving; a lot of good stuff is coming in with WebAssembly, so definitely one to keep plugged into. But yeah, if you want to try out the Go functionality there with filters, I would love to have that from a community side here.
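For reference, wiring a compiled module into the pipeline uses the `wasm` filter; the path and exported function name below are made-up placeholders:

```ini
[FILTER]
    Name           wasm
    Match          *
    # module compiled from Go/Rust/etc. to WebAssembly
    WASM_Path      /etc/fluent-bit/go_filter.wasm
    Function_Name  go_filter
```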
C
Awesome. And then, if you want to jump to the next slide really quickly: for everybody that's still on the call with us, we have a brand new community survey that we just launched, and we're super excited about all the feedback that we've gotten so far. But if you haven't had the chance to take it yet, please scan the QR code and give us a couple of minutes of your time, so we can help the community and provide direction for all the contributors. That would be great.
C
So this is a QR code for the next webinar in the session. If you like the content, you're interested in operations and best practices, or you just like listening to me talk, you can jump on to the next webinar and join us for the last webinar of this three-part series.
C
So, sticking back with Q&A, there's another, I think one final question to get to here, and it was: why use Lua scripts instead of adding the hostname by simply setting it in my Linux Fluent Bit configuration?
B
C
Awesome. Yep, just wanted to make sure we got to all the questions. Thank you, Thiago. Thank you, Anurag, and thank you to everybody who stayed on with us for the content today. Super excited to have you all here; hope to see you in the next session, as Anurag kind of plugged a couple of times through the webinar.