From YouTube: Webinar: Fluent Bit v1.5
Description
Fluent Bit, a CNCF sub-project under the umbrella of Fluentd, has reached version v1.5. Come and join the Fluent Bit community in this webinar, where you will learn about logging for Kubernetes. In addition, we will dive into the exciting new features in this major release, which include performance improvements and new connectors for Google Stackdriver, Amazon CloudWatch, LogDNA, New Relic, and PostgreSQL.
Presenters:
Eduardo Silva, Principal Engineer @Treasure Data
Masoud Koleini, Staff Research Software Engineer @Arm
Wesley Pettit, Software Developer Engineer @AWS
A: Okay, let's go ahead and get started. I'd like to thank everyone who's joining us today. Today's webinar is Fluent Bit version 1.5, and I'm Julius Rosenthal; I'll be moderating today's webinar. We'd like to thank our presenters today: Eduardo Silva, Principal Engineer at Arm Treasure Data; Masoud Koleini, Staff Research Software Engineer at Arm; and Wesley Pettit, Software Developer Engineer at AWS. A few housekeeping items before we get started: during the webinar you are not able to talk as an attendee. There is a Q&A box at the bottom of your screen.
B: Thank you, and welcome, everybody. My name is Eduardo Silva and I'm one of the maintainers of the Fluent projects. Today, together with Wesley and Masoud, who are maintainers of different subsystems of the project, we will share some news about this new release, but also an introduction for newcomers, for people who are just learning about this ecosystem of logging.
B: So this webinar will be a little bit more than just the release; we are also going to share more knowledge about the different components. Fluent Bit, in general, aims to solve all the problems that arise when we want to achieve data analysis. If we want to achieve data analysis, we have to collect all the data from hardware and software and deliver it to a central place, and this gets harder when the data starts to scale up, because we hit performance penalties. The data challenge is that data comes in different formats from different entry points, like TCP, UDP, the file system, or journald. So how do we accomplish this? And if you think about distributed environments such as Kubernetes, every pod, for example, has one application, maybe one more container, and every single container is generating its own logging information.
B: So if you are in this cloud native area, how do you solve this data analysis problem, concentrating all the information together? That is a major challenge that we have. In addition, besides centralization, there is dealing with different data formats: as you can see, for example, Apache logs come in an unstructured format, the same for MySQL, then JSON maps, and so on.
B: So, besides just the message itself, we also want to add some context to it, to say: hey, this is coming from this host, this IP address, and so on. And this is where Fluent Bit comes in, a tool that aims to solve all of these data collection and data processing problems.
B: I have to say that Fluent Bit is a CNCF sub-project under the umbrella of Fluentd, one of the graduated projects of the CNCF. Fluent Bit started in 2015; at the beginning it was created for lightweight environments, for embedded Linux, and it quickly evolved toward the cloud native space. With this, we always knew we needed something that was very lightweight, really fast, and very efficient from a memory and CPU perspective.
B: That's why it was written in C, which is very optimized. If you are doing nothing with Fluent Bit, like an idle instance, it will barely use 600 kilobytes, and with its pluggable architecture we provide more than 60 plugins to deal with different sources and formats of data, and also the ability to ship this data out to different destinations.
B: And, of course, we provide built-in security with TLS and networking I/O. Fluent Bit can be considered a data pipeline where, at the entry point on the left, we have all the inputs of data; then we have all the parsing, filtering, and buffering capabilities; and finally the last step, which is the ability to route this data to different destinations. A destination can be a database, a cloud service, or anywhere you aim to store your data.
B: If we think about the Kubernetes use case, which is a more complex scenario, you can picture that in your node you have pods, and the pods are writing all their logging information to the file system. So how do you solve this with Fluent Bit? Basically, one common approach is to deploy Fluent Bit as a DaemonSet for this kind of generic scenario, where a DaemonSet is just a pod that runs on every node.
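The DaemonSet deployment described here can be sketched roughly as follows; the image tag, namespace, and mount paths are illustrative assumptions, not the official manifest:

```yaml
# Hedged sketch of a Fluent Bit DaemonSet: one pod per node, reading
# the container logs the runtime writes under /var/log on the host.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      containers:
      - name: fluent-bit
        image: fluent/fluent-bit:1.5   # illustrative tag
        volumeMounts:
        - name: varlog
          mountPath: /var/log
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
```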
B
So
here's
where
we
talk
about
now
about
the
journey
of
fluent
bet.
You
know
flow
bit
was
started
about
five
years
ago
and
we're
reaching
our
version.
1.5
and
there's
a
couple
of
improvements
on
how
do
we
deal
with
networking
sessions
a
when
you
deploy
a
tool
that
leads
and
talk
to
different
network
services?
B: We face many issues: you can have network outages and unresponsive services, and the only thing we can do from a client perspective is provide settings that define specific behaviors, to work around those situations or perform certain actions when they happen. So in this version we are introducing, first, connect timeouts, meaning: I'm trying to connect to this endpoint, but the endpoint is unresponsive; how long should we wait? There is also the ability to define a custom source address, so that for big servers that have a ton of network cards you can say: hey, please connect using this specific network interface. And there is the ability to reuse TCP connections and TLS sessions, which is usually called keepalive. Keepalive is the concept of keeping the socket connection open after one successful delivery of data has been done, and we also support a keepalive idle timeout.
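In a configuration file, the networking settings described here might look roughly like this for an output section; the option names follow the talk's description of the 1.5 networking layer, and the host value is a made-up placeholder:

```
# Hedged sketch: 1.5 networking options applied to an output.
[OUTPUT]
    Name                        http
    Match                       *
    Host                        collector.example.com   # placeholder
    Port                        443
    TLS                         On
    net.connect_timeout         10        # seconds to wait on an unresponsive endpoint
    net.source_address          10.0.0.5  # bind to a specific local interface
    net.keepalive               on        # reuse TCP/TLS sessions across deliveries
    net.keepalive_idle_timeout  30        # close the connection after 30 s idle
```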
B
One
of
the
the
features
that
people
was
very
interesting
in
and
also
surprisingly
for
us
for
the
maintainer
has
been
that
people's
asking
a
lot
of
support
for
windows
environments
well
fluent
since
last
month
we
have
integrated
a
windows,
support
at
different
phases
and
on
this
version
right
now
we
provide
a
full
windows
service
support.
So
you
can
manage
flumebeat
as
a
windows
service.
B: The plugin that collects Windows event logs, which is the logging mechanism of the Windows engine, is now able to fully encode all those messages as UTF-8, and we also have full support for Kubernetes on Windows. I know that for many people this is a surprise, and for us it is too, but yes: you can just run Fluent Bit on Windows and manage all your Windows pod logging data with Fluent Bit without any problem.
B: In addition, the storage metrics, and the whole metrics story, have been extended. We used to have metrics for the pipeline, like how much data is being generated by an input plugin, how many bytes, and how many retries we are facing on the output side. But we got many use cases that said: hey, we want to know how the internal buffering mechanism is doing.
B
How
is
the
data
flowing?
I
need
to
shoot
down
through
a
bit,
but
I
need
to
know
if
there's
any
data
in
the
queue
being
processed
okay,
so
this
is
where
introduced
the
the
new
storage
metrics
endpoint,
which
basically
just
pushed
out
a
json
adjacent
map
which
have
different
information
like
a
house
storage
layer
if
it
buffers
in
memory
how
many
the
buffers
in
the
file
system.
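Enabling and querying the storage metrics described here might look roughly like the following; the endpoint path and option names reflect my understanding of the built-in HTTP server and may differ by version:

```
# Hedged sketch: expose the built-in HTTP server and storage metrics.
[SERVICE]
    HTTP_Server      On
    HTTP_Listen      0.0.0.0
    HTTP_Port        2020
    storage.metrics  On

# Then, from a shell on the same host:
#   curl -s http://127.0.0.1:2020/api/v1/storage
# which returns a JSON map describing chunks buffered in memory
# and chunks buffered in the file system.
```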
B: Another big piece of news in this release is the new enterprise connectors. I want to reinforce that by "enterprise connectors" we mean connectors that are contributed by, or built jointly with, the companies behind specific services, for those companies but also for their own customers. So in this new version we're officially launching Amazon support, including support for the Amazon Elasticsearch Service: for the Elasticsearch database hosted by Amazon, we have full support.
B: Wesley will talk more about it, and the same goes for the Amazon CloudWatch service. We also have two new connectors for the logging services provided by LogDNA and New Relic, and actually we are very pleased to have been working with Amazon, LogDNA, New Relic, Google, Sumo Logic, and other companies during this period. Google has been working really hard on improving our Google Stackdriver connector: we built an initial Stackdriver connector that has been used by many customers, but Google also started contributing back a few months ago, so now the Stackdriver connector is working toward feature parity with the Fluentd connector for Stackdriver.
B: Maybe a node was re-created and it's deploying Fluent Bit again, so I always say that these are cumulative stats, not unique stats, but in general we can see the trend that adoption of the project is growing a lot in the enterprise, and we can say that most of the biggest companies are using Fluent Bit right now. I think that, as a team, we are pretty proud of it. There are logistics companies, like, for example, Lyft, and we have the cloud providers.
C: So last year I launched AWS for Fluent Bit, which had a set of plugins in Golang for Amazon: there's Amazon CloudWatch Logs, Kinesis Firehose, and Kinesis Data Streams.
C: There was no AWS SDK in C that I could use for Fluent Bit, and that's why I was unable to contribute to the core of Fluent Bit. However, I've now fixed that. I took a long vacation in the winter, and I spent most of it actually building a custom, sort of low-level AWS SDK inside Fluent Bit that works with its built-in HTTP client and concurrency features.
C: This slide was showing a screenshot from part of the final pull request, which ended up being over six thousand lines of code. It was quite big, but I'm very proud that it's finally done, and it's now launched in 1.5. With this library, we're able to make requests to AWS inside core Fluent Bit, and the first thing we did with that is enable Amazon Elasticsearch Service support.
C
So
there
was
already,
of
course,
a
fluid
bit
plug-in
for
the
elasticsearch
project,
but
it
did
not
support
the
amazon
hosted
version
of
elasticsearch
because
that
version
uses
aws
authentication.
So,
with
these
new
with
this
new
library,
you
can
enable
aws
authentication.
So
there
are
a
couple
new
fields.
C
So
one
key
thing
to
point
out:
first
of
all,
the
host,
when
you
use
amazon
elasticsearch,
you
do
not
include
the
transport
protocol,
don't
include
the
https,
that's
one
thing
that
got
me
confused
when
I
first
started
testing
it
out,
the
port
is
almost
always
port
443
and
then
you
have
to
add
these
fields.
It
was
off
turn
that
on
and
then
you
have
to
add
your
idios
region.
You
also
want
to
enable
tls,
of
course,
then
you
optionally
also
have
this
parameter
called
abus
rollarn,
which
lets
you
specify
an
iterous.
C
I
am
role
that
can
be
assumed
to
make
calls
to
elasticsearch
so
that
will
use
sts,
assume
role
if
you're
familiar
with
aws.
That
should
all
make
sense.
If
not,
we
have
some
documentation
on
this
that
you
can
go
read
afterwards
to
understand,
then
also
so.
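Putting those fields together, an output section for the es plugin pointing at an Amazon Elasticsearch Service domain might look roughly like this; the host endpoint and role ARN are made-up examples, and the option names follow the talk's description:

```
# Hedged sketch: es output with AWS authentication enabled.
[OUTPUT]
    Name          es
    Match         *
    Host          my-domain.us-east-1.es.amazonaws.com   # no https:// prefix
    Port          443
    TLS           On
    AWS_Auth      On
    AWS_Region    us-east-1
    AWS_Role_ARN  arn:aws:iam::123456789012:role/fluent-bit-es   # optional
```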
C: Also, as I said, last year we launched CloudWatch Logs support in Fluent Bit through an external Golang plugin. This year, with 1.5, as I mentioned, I added CloudWatch Logs support natively in Fluent Bit, in C, so the plugin name is cloudwatch_logs.
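A minimal configuration for the new core plugin might look roughly like this; the group and stream names are illustrative, and the option names reflect my understanding of the cloudwatch_logs plugin:

```
# Hedged sketch: the native cloudwatch_logs output in 1.5.
[OUTPUT]
    Name               cloudwatch_logs
    Match              *
    region             us-east-1
    log_group_name     fluent-bit-logs
    log_stream_prefix  from-fluent-bit-
    auto_create_group  On
```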
C: What I'm quite excited about is the performance improvement that we've seen from switching from Go to C. Here are some benchmarks that I ran; this is tailing log files and then sending them to CloudWatch.
C: The CPU usage is a good bit less for the new core plugin in C, but also, I think more importantly, the max throughput that the core plugin can achieve is significantly higher. That is something I'm still working on quantifying.
C: I don't actually have hard numbers on it yet; I just have anecdotal evidence that it's definitely a lot higher. Eduardo and I are speaking again at the virtual KubeCon, which is happening in August, and hopefully by then, if I have time, I will find a way to actually measure and quantify the performance and throughput difference between the two plugins. The existing Golang plugin had sufficient performance for essentially all the customers we had, but it's still nice to have higher performance for any new, higher-throughput use cases that might come up. And this slide is showing the memory usage. Memory is really where you see a much larger difference because, of course, Golang is a garbage-collected language, and the AWS SDK for Go is also not a particularly efficient library in general. So you can see a three to four x difference in memory usage between the old plugin and the new plugin, where the new plugin will use about 25% of the memory.
C: You can see here in this graph that even at 20,000 log lines per second, that's tailing 10 log files at 2,000 log lines per second each, we're only using about 40 megabytes of memory, which is really impressive. For comparison: before I switched to Fluent Bit, I was working with Fluentd, and Fluentd compared to the Fluent Bit Golang plugins was about a 5 to 7x performance difference.
C: So if you add an extra 3x improvement in memory with the C plugin, that's something like a 20 to 30 times improvement over Fluentd, which is just really insane, and we're very proud of that. The long-term plan, which I hope to work on, is rewriting all of the original Golang plugins in C and contributing them to core Fluent Bit; then we can deprecate the Go plugins. Maybe what we can do, once we have the C plugins at full parity with the Go plugins, is remove the Go plugins and alias their names in Fluent Bit core, so that you basically don't have to migrate your configuration; it just works with the C plugin, starting in some release of Fluent Bit. The timeline on that is very uncertain, and I should be very clear that this does not represent any sort of hard commitment from myself or from AWS as to exactly what we will do, but it's definitely something that I'm thinking about.
C: So what am I working on now? Right now I'm actively working on supporting outputting logs to Amazon S3. There is a GitHub issue on the Fluent Bit core repo; if you have thoughts or ideas on how that should work, I've already posted some information there about building a prototype, how the plugin will work, and some ideas for the options it might have. So go check that out if you're interested in sending logs to S3.
C: Here's possibly what the config might look like for S3. The plan is to let you set a file size that you want in S3. So you can decide you want 250-megabyte files in S3, and it will upload logs to the file until it reaches that size, then truncate the file and move on to a new one.
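Since the plugin had not shipped at the time of the talk, the following is a purely speculative sketch of what such a configuration could look like; every option name here is an assumption based only on the behavior the speaker describes:

```
# Speculative sketch of the in-progress S3 output (pre-1.6).
[OUTPUT]
    Name             s3
    Match            *
    bucket           my-log-bucket   # assumption
    region           us-east-1
    total_file_size  250M            # target size of each object in S3
```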
C: It can concatenate the parts together with the S3 API and create a file that way. You'll get nice, large files in S3 created over time, but Fluent Bit will not be buffering very much data locally, which means you're not at much risk of losing data if Fluent Bit quits, or if the instance you're running Fluent Bit on goes down, or anything like that, because it's sending the data off to S3 basically as quickly as it gets it; it's streaming it, basically. That's the goal.
C: I haven't actually gotten it working yet, so we'll see, but I think by the time we do the talk at KubeCon next month I should hopefully have the plugin mostly working. The goal is to launch it in September, in Fluent Bit 1.6.
C: Finally, if you're an AWS user and you want to get help with using Fluent Bit, here are your options. Sometimes people find my email and email me, or message me on Twitter. To be honest, that's not the best way to get in touch, because I don't necessarily reply to those very quickly. You can open an issue, or comment on an issue, in the Fluent Bit core repo and mention my GitHub name, which is PettitWesley.
C: I watch that, and I will respond. The best way, though, is actually to go to the aws-for-fluent-bit repo, which is our distribution of Fluent Bit.
C: The reason is that there are a couple of other folks who have been training to understand Fluent Bit, and we all watch that repo and answer questions there. I prefer that because it's nicer for me: I can distribute the load of answering questions across multiple people. Lately we've definitely felt that Fluent Bit is really becoming popular, because the number of questions we've been getting has increased significantly.
C: Finally, if you're interested in becoming a Fluent Bit contributor yourself, I wrote a contributing guide. It's not a style guide, but a guide for beginners on how to understand the code and how to write the code. It is definitely a beginner tutorial; it's not going to give you enough to write some of the more complicated stuff that Eduardo and I have written, but it should give you enough to start writing a bit of code.
D: Hello, everyone. This is Masoud Koleini, and I'm the maintainer of the stream processor for Fluent Bit.
D: What do we need from a stream processor; what are the goals? In general, we want fast and lightweight data processing, we want there to be no tables, we want there to be no indexing, and it has to have an easy-to-use programming model. You can think of Fluent Bit as a kind of data collector, buffer, and distributor, and then ask: okay, so what can Fluent Bit do here, and how does a stream processor work in general? A stream processor can receive events or records from the hardware and software attached to it, apply real-time data analysis, and then send the results out to a data collector or event collector.
D
So
you
can
do
all
the
processing
that
you
want
all
the
computations
that
you
want
on
the
edge
before
sending
huge
number
of
data
out
to
the
cloud
and
do
cloud
site
log
or
data
analysis
for
for
the
data
that
you
are
receiving
from
different
hardware
and
software.
D: What are the reasons? In general, what stream processing on the edge can do for you is offload computation from servers to data collectors. Just assume that you have thousands of data collectors, and many of them may have free resources, like CPU and memory, available. If you can offload even a small piece of the calculations to your devices, you're actually paying a lot less for cloud computation, you're paying a lot less for the traffic flowing between edge devices and the cloud, and you will actually be very fast.
D: So you can decide to send only the data that you want to the cloud. Like many other stream processors, Fluent Bit also uses a declarative, SQL-like language to express the computations. If you are familiar with SQL, you know that it is a high-level, declarative language in which you can easily specify many kinds of computations on records and data, and it is easy to understand and easy to write.
D: This is the syntax, a simplified syntax; we will see it in the demo later. You write CREATE STREAM, the stream name, AS, and then you can have a SELECT statement, similar to regular SQL.
D: In SQL you select from a table; here you select from a particular input plugin or from another stream. Then, as you might know, when we do computations on streams we need a window over which to apply the computations. We can have a window where we collect all the events from, for example, the last 30 seconds, we apply the computations, and then we throw everything away and wait for the window to fill up again. We call this a tumbling window.
D: We can have another kind of window, which some of you may know as a moving or sliding window; here this window is called "hopping". We can say that there is a hopping window of size 30 seconds which advances by, for example, one second. So every one second we throw away the oldest data, from the first one second, and put in the new data from the most recent one second. This is called a hopping window.
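The two window types can be illustrated with a small, self-contained Python sketch; this is an analogy to the stream processor's behavior, not Fluent Bit code, and it counts windows in samples rather than seconds for simplicity:

```python
from collections import deque

def tumbling_averages(samples, size):
    """Average each consecutive, non-overlapping window of `size` samples."""
    out = []
    window = []
    for s in samples:
        window.append(s)
        if len(window) == size:
            out.append(sum(window) / size)
            window = []  # tumbling: discard everything and start over
    return out

def hopping_averages(samples, size, hop=1):
    """Average a sliding window of `size` samples, advancing by `hop`."""
    out = []
    window = deque(maxlen=size)  # oldest samples fall out as new ones arrive
    for i, s in enumerate(samples):
        window.append(s)
        if len(window) == size and (i - size + 1) % hop == 0:
            out.append(sum(window) / size)
    return out

data = [1, 2, 3, 4, 5, 6]
print(tumbling_averages(data, 3))  # windows [1,2,3] and [4,5,6]
print(hopping_averages(data, 3))   # windows [1,2,3], [2,3,4], [3,4,5], [4,5,6]
```

The tumbling version emits one result per full window and forgets everything; the hopping version re-emits on every advance, keeping most of the previous window.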
D: You can apply a forecasting query to your stream. You can say: okay, we are getting memory metrics from the mem input plugin; give me the forecast of memory usage for the next 100 seconds. The timeseries functions can do that, for example TIMESERIES_FORECAST. TIMESERIES_FORECAST_R is the reverse of that: you say, tell me when, in the future, my memory usage will pass a certain amount, so I know whether I will stay in the safe range of memory usage.
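The talk doesn't specify the exact algorithm behind TIMESERIES_FORECAST, so as an illustration of the general idea, here is a simple linear least-squares extrapolation in Python; the sample numbers are made up:

```python
def linear_forecast(times, values, horizon):
    """Fit y = a + b*t by least squares and extrapolate `horizon` seconds
    past the last sample. A simple stand-in for the kind of prediction
    a timeseries forecast function performs; not Fluent Bit's algorithm."""
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(values) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, values))
    var = sum((t - mean_t) ** 2 for t in times)
    b = cov / var            # slope: growth per second
    a = mean_y - b * mean_t  # intercept
    return a + b * (times[-1] + horizon)

# Memory usage growing ~0.01 GB/s; forecast 100 s ahead.
samples_t = [0, 1, 2, 3, 4]
samples_y = [3.40, 3.41, 3.42, 3.43, 3.44]
print(round(linear_forecast(samples_t, samples_y, 100), 2))  # → 4.44
```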
D: You can apply those computations on nested JSON, using what you see at the bottom of the slide: key, square brackets, 'sub_q1', square brackets, 'sub_q2', if you have a nested JSON of depth 2. As a simple example, you can write a stream processing rule, or task, like this: we say, create a stream with the name 'results', with tag 'results' (similar to the tag that you define for input plugins, you can tag your stream), AS, and now you have your SELECT statement: select the average of cpu from the cpu stream, which is actually your cpu input plugin, with a tumbling window of size 60 seconds. Simple: you wrote a fully functioning computation, taking the average of a specific field of a record, using just one line of SQL.
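Written out, the one-line rule described above might look like this; the WITH tag clause and window syntax follow the talk's description of the stream processor's SQL dialect:

```sql
-- Hedged sketch: average the cpu field over a 60-second tumbling window.
CREATE STREAM results WITH (tag='results')
    AS SELECT AVG(cpu) FROM STREAM:cpu WINDOW TUMBLING (60 SECOND);
```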
D: I'm hoping that everyone can see my screen here; I'm running Fluent Bit. We need a configuration file; this is the configuration file for the demo, and I'm hoping that you are familiar with writing configuration files for Fluent Bit.
D
Here
we
say
that
we
have
the
service
section
which
says:
okay,
so
flash
flash
output
every
one
second
log
level
is
info.
Http
server
on
plugin
files.
Is
this:
you
need
to
add
the
path
of
your
stream
file,
so
streams
file
is
actually
the
name
of
the
file
that
you
are.
Writing
your
stream
processing
tasks
inside.
D
That's
it!
You
don't
need
to
do
anything
else
here
in
the
config
file
you
define
here,
we
have
an
input,
plug-in
memory
that
we
are
reading.
Data
from
this
input
plug-in
alias
memory
tag
memory
which
will
use
that
in
the
stream
processor
and
what
I'm
doing
here
is
I'm
sending
the
result
of
forecast.
Let's
say
here,
you
say
that
I
have
match
forecasted
star.
A: Okay, can you increase your font size, please?
D: Sure. Okay, so I hope that now you can at least see what I was describing, and I will go to the next step. We have pointed at the streams file; let's see what is inside the streams file.
D: Okay, this is the way you write stream processor tasks. You start with the section [STREAM_TASK], the name of the task, 'forecast', and you define an Exec, which actually defines the SQL processing statement, similar to what I mentioned in the slides. You start with CREATE STREAM forecast, which is the name of the stream.
D: Then AS, and here is your SELECT statement. You can say: give me, as the result, the average of the memory used, and the timeseries forecast, where the inputs are the record time and the memory used, for the next 100 seconds: tell me how much memory I will use in the next 100 seconds, and you name it AS forecast. That means we rename the output field to 'forecast'. Then FROM the memory stream, STREAM:memory, which is actually our mem input plugin, use the hopping (sliding) window of size 15 seconds and advance it by one second. I've added another stream processing task here, just for the sake of the demo: the forecast value may not be very smooth; it may go up and down a lot. So what I'm doing here is creating another task, called forecast average, and I say: create a stream, forecast_average, as select the average of forecast.
D: What the stream processor does is read from the forecast stream, and the nice thing here is that every task in the stream processor can be seen as an input, so we can pipe, or cascade, many stream processing tasks together. So I'm reading from the memory forecast and taking the average of the forecast value; again I define a hopping window of size 15 seconds, advancing by one second. Okay, nice.
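Putting the two tasks together, the streams file being shown might look roughly like this; the task and stream names follow the spoken walkthrough, while the exact function signatures and window clause are assumptions reconstructed from the description:

```
# Hedged reconstruction of the demo's streams file.
[STREAM_TASK]
    Name  forecast
    Exec  CREATE STREAM forecast AS SELECT AVG(Mem.used), TIMESERIES_FORECAST(RECORD_TIME(), Mem.used, 100) AS forecast FROM STREAM:memory WINDOW HOPPING (15 SECOND, ADVANCE BY 1 SECOND);

[STREAM_TASK]
    Name  forecast_average
    Exec  CREATE STREAM forecast_average AS SELECT AVG(forecast) FROM STREAM:forecast WINDOW HOPPING (15 SECOND, ADVANCE BY 1 SECOND);
```

The second task reads from the first, which is the cascading behavior described above: every task's output stream can serve as another task's input.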
D: For the first 15 seconds, because the window of the stream processor hasn't filled up yet, you can see just the output of the mem plugin, and around now you should be able to see the forecast value. Yes, here it is; I'll highlight it. So it says that the memory usage is 3.4 gigabytes and the forecast is 3.4 gigabytes.
D: So it is saying: I don't see any reason that in the future, in the next 100 seconds, the memory usage will change, because we are actually not running a huge number of processes on the local machine.
D: What I'm also showing here, and I hope you can see my screen and this graph, is a graphing dashboard that is connected to InfluxDB, and it is showing the average memory and the memory usage. You can see just one line, because they're overlapping: the memory usage and the forecast are almost the same. The yellow line is the average memory usage, which is 3.4 gigabytes, and the forecast is almost the same.
D: And now you're probably seeing a green line going up, where the green line is showing the forecast of memory usage at each point in time. The yellow line says: look, the memory that is currently used is 3.4 gigabytes, but in the next 100 seconds we will use 3.7. And look here: I defined a line that is called 'alert'.
D: That means that whenever the forecast hits four gigabytes, send an alert: create an alert, send an email to the admin or the operator, or make a phone call, or whatever, before we actually get to that point, because it means something is going wrong with the machine. So we can see that the green line is the forecast that our stream processor is calculating, and the yellow line is the actual memory usage at the current time; and now the green line has passed the alert threshold, and you can see that it turns red.
D: Okay, let me go back to the slides. I think we are almost done with the demo, and almost done with the stream processing presentation. Possibly the final takeaway is that it is best not to see Fluent Bit only as a data collector, buffer, and router.
D: You can see Fluent Bit in general, and stream processing specifically, as a way to do a lot more computation on the data that is coming through Fluent Bit, without sending it out to the cloud. You can do lots of things with Fluent Bit: you can do machine learning on the edge, you can do many different calculations, and this can do a lot for you in your projects. Thank you.
D: Yes, so the hopping window works like this: it creates a window of size 15 seconds, which means it buffers all the data that comes in for 15 seconds. Then, every second, it throws away the oldest data, from the first one second, and puts into the window (it's like a queue) the new data that was gathered during the last one second. That means that every one second it drops the oldest one second of data and adds the newest one second of data to the window; so it still keeps some of the data that arrived within the last 15 seconds: it keeps the data from the last 14 seconds and then adds the next one second of data.
B: Okay, I'm going to elaborate on that. This is the story as it was told by the creator of Fluentd; it's part of the story of the company. When they created Treasure Data, at the beginning it was Hadoop as a service; that's how it started, and they needed a tool to ingest data into this new cloud platform. This new tool needed to be fluent enough to understand different formats, and fluent enough to be connected with different languages, and that's why the ecosystem was called Fluent. The project was then named Fluentd, as a daemon; you know, most Unix services end with 'd'. That's where the Fluent name comes from, and, well, Fluent Bit was created in 2015.
A: Great. Ali has a question: can you explain a bit about the forecasting part and how the computation is done?
D: [inaudible]
A: Okay, oh, here we go: can plugins written in Rust be integrated with Fluent Bit?
B: At the moment, the only integrations that we have for writing plugins are, of course, the C language (Fluent Bit is written in C), and you can write output plugins, in addition to C, in Golang, or you can write filters in Lua. We don't have bindings for Rust, and I know that AWS was investigating whether or not it was worth investing in integrating Rust connectors.
B
But
one
of
the
major
I
would
say,
integration
problems
was
that,
for
example,
if
you
want
to
write
an
output
connector,
you
have
to
reimplement
or
reuse
external
components
to
replace
internal
things
of
fluent
bit
like,
for
example,
network
io,
http
client,
because
it's
like
influence,
everything
is
done.
B
So
if
you
write
an
extension
for
fluid
beating
gold
and
you
face
pretty
much
the
same
issue,
you
are
relying
on
your
english
apis
right.
So
you
get
advantages.
You
write
that
thing
in
a
natural
level
language,
but
you
lose
a
bit
of
optimization.
You
lose
performance
because
you
are
not
using
the
internal
code
that
is
designed
to
work
with
a
special
concurrency,
a
mechanism
a
I.
I
know
that
also
there's
other
projects,
for
example
in
the
ecosystem,
on
the
cnc
ecosystem,
that
they
are
investigating
the
same
thing.
B
This
is
not
just
about
rust,
but
other
languages,
for
example
amboy,
who
is
a
boy
proxy?
They
are
evaluating
a.
How
can
we
integrate
plugins
in
rust
for
filters
but
they're
pretty
much
in
the?
If
you
can
see
the
open
discussion
on
github
they're,
pretty
much
evaluating
the
same
thing?
Hey,
we
will
ended
up
writing
a
lot
of
components
just
to
be
because
it's
rust
right.
C: Can I add a little bit to that? So, yeah, at AWS we investigated this idea of working with Fluent Bit and Rust. We of course started with Go and then we moved to C, but we also investigated Rust. You can actually use the Golang interface to write code in Rust, but, as Eduardo said, you don't actually gain a lot, because you can't work with the concurrency features of Fluent Bit. In a way, Fluent Bit kind of has its own runtime, almost, and since in Golang and Rust you can't really work with that, you don't gain very much.
C
There
are
other
ways
that
you
could
also
like
integrate
rust,
because
you
can
compile
rust
and
see
together,
which
we
experimented
with,
but
you
end
up
having
to
use
a
lot
of
c
objects
in
the
rust
code
and
we
kind
of
ultimately
decided
it.
It
didn't
really
have
any
strong
benefits,
it's
the
sort
of
thing
like
if
you
really
love
rust
and
you
just
want
to
write
in
rust.
C
As
eduardo
said
at
the
moment,
no
okay,
I'd
say
like
especially
for
us
as
as
maintainers.
Since
you
know,
eduardo
and
and
massoud
are
both
I'd,
say,
expert
c
programmers,
I'm
maybe
more
intermediate
c
programmers
c
programmer.
It's
it's
easier
for
us
as
maintainers
to
maintain
the
project.
If
it's
only
in
one
language,
you
know
that
that's
also
one
of
the
issues.
B: Yeah, and one of the questions that we always get is whether we can replace some components with C++, which happens to be pretty much the same question as for other languages, and in the end the answer we give is pretty much the same as the one we gave for C++, right? So this is not about which language is better, more secure, or weaker.
B: It's a matter of performance, and I think that, to be honest, Fluent Bit is really secure from all angles. We're not saying this is something that we will never do; I'm saying that, at the moment, as was said, there's not much value added by it, but at some point I think it would be beneficial to extend it through different mechanisms, so people can extend it in different ways.
B: As of now, Lua has been enough, and its performance penalty is pretty low. On the output side, most of the connectors rely on an HTTP backend, so writing an HTTP output plugin for Fluent Bit is pretty straightforward, because we provide all the APIs for networking, failure handling, the HTTP client, and everything that is needed to achieve that. That's why, for example, many companies have implemented their own connectors, sometimes without any external help; for example, the case of Datadog.
B: Originally, the inputs were memory metrics, CPU metrics, and mostly log messages, things that happen inside the device, and on the output side it was the Treasure Data output over HTTP; pretty simple. But the traction was mostly seen on the cloud side, so we said: hey, we're going to focus on the cloud. When we say we're going to focus on the cloud, though, it doesn't mean that we're going to build a big beast that is really, really heavy. So I would say Fluent Bit is now cloud-focused, because I would say 90% of our users are Kubernetes users and Docker users, but we always keep in mind this mindset of low CPU and memory usage. For us, it doesn't matter where Fluent Bit runs; it needs to be optimal. So it can run on a very small Arm CPU, or it can run on any big kind of server on any cloud provider.
B
So
yes,
originally
created
for
embedded
devices
for
embedded
linux
devices,
and
but
now
it's
a
is
with
a
focus
in
the
cloud,
but
without
losing
performance
and
memory,
and
and
that's
one
of
the
strong
reasons.
Also
of
the
language
because
managing
our
own
memory,
we
know
how
to
optimize.
We
know
how
to
use
the
memory
and
avoid
any
kind
of
issues
or
performance
issues
that
are
mostly
generated
by
garbage
collectors.
A: Okay, great. Well, thanks, Eduardo, Masoud, and Wesley, for a great presentation. I'd also like to thank the attendees for joining. As a reminder, the recording and slides will be up later today on the CNCF webinar page. We look forward to seeing you again for another CNCF webinar. Thanks, everyone; have a great day.