Description
The session will focus on the design choices and tradeoffs we made to reach the scale of ingesting 1 million events/s into an open source observability tool (SigNoz).

Key points to be covered:
- Horizontally scaling the OpenTelemetry collector in Kubernetes
- Designing a hierarchy of OpenTelemetry collectors (cluster level / node level / sidecar level)
- Extracting metrics from traces using the spanmetrics processor
- Getting application metrics and infrastructure metrics in the same UI
- Why a distributed columnar database is efficient for storing telemetry data
So let's get started. Here's a brief outline of what I'll be discussing. First, I'll give you a flavor of what observability is and why it's important. Then I'll get into more detail on OpenTelemetry, which is the instrumentation library we use at SigNoz to send data to our backend and then visualize it. Then we'll take you through the design and architecture of SigNoz, so that you get a flavor of how the system is built, what challenges might come up in ingesting a huge number of data points, and how you overcome them. In the process we'll also discuss how you scale out the OpenTelemetry collector, which is the main ingestion component through which you ingest data, and then some more points about why columnar databases are interesting for observability data.
So, first of all, let's talk about what observability is. To define it: observability is a measure of how well the internal states of a system can be inferred from knowledge of its external outputs. For example, think of trying to understand how a person is doing by checking their temperature. The temperature of a person is an indication that something is wrong with them, a reflection of what's going wrong, and we use a tool like a thermometer to measure that temperature, which a doctor then uses to infer the cause. Observability for computer systems and software systems is very similar: high latency is a symptom, but the underlying cause may be that some database query is taking more time, or some external API calls are taking more time. So that's broadly what observability is.
Now, why is it becoming more and more important to know this? One of the things which has happened recently, maybe in the last two or three years, is that more and more people are shifting to microservices. Earlier, codebases were primarily monolithic, a single big codebase running, and in that case you can easily print out the logs and try to understand what's going wrong. But if you have a microservices architecture, you don't have a single system on which you can print logs. There are different components of the system, each behaving independently, and to figure out what's wrong with the whole system, you're not sure which part to measure. As this graphic shows, in the monolith you can just put a thermometer on it and get a sense of what's wrong, while here there are different subsystems and you're not sure where to put the thermometer. Hence we believe that as more and more systems move to microservices, observability systems will become more and more important, and that's one of the reasons we started working on SigNoz.
So, if you think about it, there are primarily three pillars of observability. One is metrics, which are aggregates of time series data points that show how your systems are performing. For example, if you want to see latencies, it's a time series graph of how the different latencies evolve, and you get a sense that in this window some latency has gone up or come down. These are the different metrics I was talking about: for example, this graph shows application metrics such as p99 latency, and you can see different time series which tell you the state of the application. Then you can go to traces, which show you how a particular request spends its time in different parts of the microservices. For example, this is a trace: it shows you the complete trace of how a particular request made on this API went through the different services, here going from the dispatch endpoint to the customer endpoint, then the route endpoint, and then maybe other microservices.
Now let's discuss OpenTelemetry. OpenTelemetry is a fairly recent project. It's a collection of tools, APIs and SDKs which help you generate, collect and export telemetry data. We talked about the different types of telemetry data, which are metrics, logs and traces; OpenTelemetry is the instrumentation layer that helps you instrument your applications with these. You can then send the data to the OpenTelemetry collector, export it, and use a backend like SigNoz to store it, visualize it and do more analysis on top of it. OpenTelemetry is an open standard; companies like Microsoft, AWS, etc. are contributing to it, and it's a fast-maturing project, so do have a look at it.
If you haven't had a chance to check it out: at SigNoz we use OpenTelemetry underneath for our instrumentation, so you can instrument your applications with OpenTelemetry and start sending data to SigNoz, and then SigNoz will help you visualize and understand what's going wrong with your systems.
Some of the key features of OpenTelemetry: it has all three key pillars of observability we talked about. It has traces, it has metrics, it has logs. You can generate this telemetry data and also collect it from your services. So, for example, if you're running a web server, you can instrument your web server to send traces, metrics and logs and then visualize them in backend systems.
It also provides auto-instrumentation: it has lots of auto-instrumentation libraries for frameworks like Spring, .NET Core and Express. So you can just inject these libraries into your codebase and then start seeing data about it, as in the sketch below.
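As a hedged illustration of what that injection can look like on Kubernetes, here is a minimal Deployment sketch that attaches the OpenTelemetry Java agent to a containerized Spring app. The image name, agent path and collector address are assumptions for illustration, not details from the talk:

```yaml
# Hedged sketch: OpenTelemetry Java agent auto-instrumentation in Kubernetes.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-app
spec:
  replicas: 1
  selector:
    matchLabels: { app: spring-app }
  template:
    metadata:
      labels: { app: spring-app }
    spec:
      containers:
        - name: spring-app
          image: myorg/spring-app:latest        # assumed application image
          env:
            # Attach the agent shipped inside the image; no code changes needed.
            - name: JAVA_TOOL_OPTIONS
              value: "-javaagent:/otel/opentelemetry-javaagent.jar"
            # Standard OpenTelemetry SDK environment variables.
            - name: OTEL_SERVICE_NAME
              value: "spring-app"
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: "http://otel-collector:4317"  # assumed collector Service
```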
And it's open source and vendor neutral. It is supported by lots of industry leaders, as I mentioned, like Microsoft, AWS, etc. It's an effort from the community to make the instrumentation part of observability an open standard, so that everybody can rely on it and then use different tools to visualize the data. So that's where OpenTelemetry fits in.
So let's talk about the OpenTelemetry collector, which is one of the key components of OpenTelemetry. Let me share a quick diagram; maybe that will be more helpful. This is a good diagram which explains how OpenTelemetry works. OpenTelemetry has something called the collector, to which you can start sending data from your applications, and from which the data can go on to multiple backends. Now, the OpenTelemetry collector has three key pieces.
It has receivers, which help you receive telemetry from different places, such as your applications, or, for example, Prometheus metrics. Then you can process the data: for example, you can take the metrics received from your applications and enrich them, say with node metadata, or you can remove sensitive data; you can create a processor that strips any sensitive fields from the data you receive. And then it has exporters: it can export to different databases and different backend systems, so that you can visualize the data later. A typical config wiring these three pieces together is sketched below.
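For concreteness, here is a minimal collector config sketch along those lines: an OTLP receiver plus Prometheus as a second source, an attributes processor that drops a sensitive field, and an exporter toward ClickHouse. The scrape target, field name and endpoints are placeholders, and the SigNoz distribution ships its own collector config:

```yaml
# Hedged sketch of a collector pipeline: receivers -> processors -> exporters.
receivers:
  otlp:
    protocols:
      grpc:
      http:
  prometheus:
    config:
      scrape_configs:
        - job_name: app-metrics
          static_configs:
            - targets: ["app:9090"]   # assumed Prometheus endpoint
processors:
  batch: {}
  attributes:
    actions:
      - key: user.email               # example of stripping a sensitive field
        action: delete
exporters:
  clickhouse:                         # contrib ClickHouse exporter (assumed)
    endpoint: tcp://clickhouse:9000
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [attributes, batch]
      exporters: [clickhouse]
    metrics:
      receivers: [otlp, prometheus]
      processors: [batch]
      exporters: [clickhouse]
```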
So that's what the OpenTelemetry collector is, and we'll come back to it later when we discuss why it matters for scaling. But let me first give you a quick intro to SigNoz, because we use OpenTelemetry heavily. SigNoz is an open source observability platform.
As of now, we're primarily focusing on the application performance monitoring piece, so we are an open source APM. Currently we have metrics and traces and, as I showed you in the quick demo, you can go from metrics to traces very quickly. We are soon adding logs as well.
We have very powerful trace filtering functionality; you can aggregate things very well. And the core principle we have is to make it as easy as possible. If you look at SaaS tools like New Relic and Datadog, they let you set things up very quickly, but most of the current open source systems are not as easy to set up. So one of the things we want to do with SigNoz is make setup as easy as possible.
In just a few clicks, you should be able to start seeing your application metrics in SigNoz. We use ClickHouse as the datastore; it's an OLAP, columnar database, and I'll also talk later about why a columnar database is important for observability data. And, being open source, you don't need to send data to any outside servers.
You can run SigNoz in your own cloud. So that's broadly what we do at SigNoz: you can set it up very easily, get it installed, and run it in your own cloud. Now let's take a quick look at how we have designed SigNoz, which will lead us to the potential issues you may face when ingesting at high volume.
We use OpenTelemetry libraries to instrument applications. So you instrument your applications here by adding the OpenTelemetry libraries; the data goes to the OpenTelemetry collector, which receives all these signals and then writes to the ClickHouse store, which is our main datastore. The dashboards which you see call the Go backend, a query service which talks to ClickHouse underneath. So that's broadly the architecture.
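To make the moving parts concrete, a docker-compose style sketch of that architecture might look as follows. This is illustrative only; SigNoz ships its own compose files, and the image names and ports here are assumptions:

```yaml
# Illustrative sketch of the SigNoz architecture, not the project's manifests.
version: "3"
services:
  clickhouse:
    image: clickhouse/clickhouse-server   # main datastore
  otel-collector:
    image: signoz/otel-collector          # assumed image name
    ports:
      - "4317:4317"                       # OTLP gRPC ingest from apps
    depends_on: [clickhouse]
  query-service:
    image: signoz/query-service           # Go backend the dashboards call
    depends_on: [clickhouse]
  frontend:
    image: signoz/frontend                # the UI
    depends_on: [query-service]
```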
Now let's come to the topic of what issues you might face if you try to send a huge amount of data through SigNoz and OpenTelemetry. This came from a real example: we were working with one of our users, who was sending a huge amount of data to us, and we wanted to test what scale SigNoz can handle. That's when we started working on this, trying to really test the limits of what OpenTelemetry with SigNoz can handle, and what we found was really surprising; that's what I'm really excited to share with you. But to give you a quick sense, let's take a quick pause and try to see where the issues could be.
We just saw that SigNoz has this architecture: there are different applications from which you start sending data, say this is app1 and this is app2, to the OpenTelemetry collector which comes with SigNoz, and then it sends the data on to ClickHouse, which is our datastore. Now, at the kind of event scale we're talking about, one million events per second, there could be a couple of potential bottlenecks.
One bottleneck could be here: if your apps are sending a huge amount of data to the OTel collector, there could be back pressure, and with lots of data being sent to the collector, is data getting dropped there? And the second piece is this: if you send a large amount of data from the OTel collector to ClickHouse, how much is it able to handle? These were basically the two questions we wanted to test, to see where we end up and how much scale this setup is able to handle.
A
And
like
couple
of
quick
points
about
open,
telemetry,
so
open
telemetry,
the
collector
which
we
have
is
a
statelet
selector.
So
it
can
be
scaled
by
increasing
the
number
of
replicas.
So
if
you're
using,
for
example,
kubernetes,
you
can
add
a
lot
of
number
of
replicas
to
handle
the
scale
right.
So
so
that's
like
one
of
the
good
features
of
open
telemetry
that
you
can
scale
it
horizontally
and,
just
like
add
multiple
number
of
replicas
to
it
and
and
yeah.
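A minimal sketch of that idea, assuming the stock contrib image and a config mounted separately (both assumptions, not details from the talk):

```yaml
# Scale the stateless collector by bumping replicas (or attaching an HPA).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
spec:
  replicas: 8                # raise this to absorb more ingest load
  selector:
    matchLabels: { app: otel-collector }
  template:
    metadata:
      labels: { app: otel-collector }
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:latest
          ports:
            - containerPort: 4317   # OTLP gRPC
```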
Now let's discuss the different hierarchies in which we can run the OpenTelemetry collector. The collector can run in primarily three different modes: one is sidecar mode; the second is that it can run on each node, which is the VM; and the third is that it can run once per cluster. These are the primary modes in which the OpenTelemetry collector can run.
Sidecar mode is nothing but running an OpenTelemetry collector alongside each application. Suppose you have application one, and by this I mean something like a Spring Boot application, and you want to collect metrics from it: you run a collector in sidecar mode, and it will collect all the telemetry and forward it onward. The way this works is that each application has its own sidecar, and then the sidecars may forward to a bigger central OTel collector, or send directly to the backend. The issue with this approach is that there are lots of OTel collectors everywhere, so it's difficult to manage, and because each collector needs certain resources to run, it takes a lot of resources from the system. But the advantage is that the blast radius is very small: if one of the OpenTelemetry collectors goes down, only that collector goes down. A sidecar pod spec is sketched below.
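A hedged Kubernetes sketch of sidecar mode; the application image and resource numbers are placeholders:

```yaml
# Sidecar mode: every pod carries its own collector container.
apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecar
spec:
  containers:
    - name: app
      image: myorg/app:latest               # assumed application image
      env:
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: "http://localhost:4317"    # sidecar shares the pod network
    - name: otel-collector
      image: otel/opentelemetry-collector-contrib:latest
      resources:
        limits: { cpu: 200m, memory: 256Mi } # the per-pod overhead of this mode
```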
The other approach is to run one collector per node. So, rather than running one alongside each application, you don't run it there; you run a single collector on each node, which is the VM, and start sending metrics from there. The advantage here is that you also get all the VM metrics, like memory usage, CPU usage, etc., which you couldn't get in the sidecar, because a sidecar has no sense of what the VM is, and the node collector can forward those metrics as well. Again, this gives a somewhat lower number of collectors, so fewer points of failure, but still more than if you ran a single central OTel collector. In Kubernetes this mode is typically a DaemonSet, sketched below.
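A minimal, assumed DaemonSet sketch for node-level mode; the image and env wiring are illustrations, not details from the talk:

```yaml
# Node-level mode: one collector per VM via a DaemonSet, which can also
# gather host metrics (e.g. via the contrib hostmetrics receiver).
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-collector-agent
spec:
  selector:
    matchLabels: { app: otel-collector-agent }
  template:
    metadata:
      labels: { app: otel-collector-agent }
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:latest
          env:
            - name: NODE_NAME             # tag telemetry with its node of origin
              valueFrom:
                fieldRef: { fieldPath: spec.nodeName }
```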
The third approach is to just run a single OTel collector: rather than each application sending to its own sidecar, they all send directly to it. No OTel collectors at the node level either; you just instrument all your applications and they keep sending to the main collector for the cluster. So there can be different nodes, say VM1 and VM2, and the applications in each VM send their data to the central OTel collector, which sits on the cluster network.
So these are the different ways in which the OpenTelemetry collector can be deployed and scaled. You can run it as a sidecar with each application; you can run it per node, with the advantage that you also get the infra metrics; or you can run it per cluster.
This slide basically covers how these different modes work. Now let's talk about what we worked on: we tried to test ingestion in a Docker Swarm environment. This was just a test; ideally we could have done it on a Kubernetes cluster, but this was a test of how much load an OpenTelemetry collector setup can handle and what resources it takes. We started generating data at around one million events per second. We had a single Docker Swarm setup, not a Kubernetes setup, on a single virtual machine with 96 vCPUs, and we ran the OTel collector as a 16-replica set, so 16 collector replicas, with a single ClickHouse instance on the VM.
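The rough shape of that setup, as a hedged Docker Swarm stack sketch; the exact images and configs used in the test are not given in the talk, so treat these as placeholders:

```yaml
# Approximate benchmark stack: 16 collector replicas, 1 ClickHouse instance.
version: "3.8"
services:
  clickhouse:
    image: clickhouse/clickhouse-server
    deploy:
      replicas: 1          # the single ClickHouse instance on the VM
  otel-collector:
    image: signoz/otel-collector   # assumed image
    deploy:
      replicas: 16         # the 16-replica collector set from the test
```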
What we found was that this setup was able to handle one million events per second, which is quite huge and which really surprised us. And if you look at it broadly, at this scale the resource utilization is very cost effective: the collectors consumed around 42 vCPUs, and you can get a 96 vCPU machine in AWS very easily, and with that you can ingest around one million events per second.
So broadly, to send around one million events per second you might expect to have to run around 50 replicas, while in our experience we found that 16 OTel collectors were able to handle that scale; that's around 50,000 to 60,000 events per second per collector. That also depends on the number of events you're sending, and maybe also on the type of the events, but it scaled decently.
We were able to handle around one million events per second with just 16 replicas in the Docker Swarm setup, using around 42 vCPUs. This was really surprising for us: a single OpenTelemetry collector installation with a single ClickHouse instance worked for handling one million events per second. This gives you a lot of confidence in how much OpenTelemetry can scale, and in what a good database ClickHouse is for this. There are still possible improvements which we can make.
As I mentioned, between the OpenTelemetry collector and ClickHouse we could have some sort of queuing: ClickHouse has something called the Kafka engine, and there is also the concept of Buffer engines. A related collector-side option is sketched below.
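The talk names ClickHouse's Kafka and Buffer engines; a related approach, swapped in here for illustration, puts a Kafka hop between two collector tiers using the contrib Kafka exporter/receiver. A hedged sketch, with placeholder broker and topic names; SigNoz does not ship this by default:

```yaml
# --- Tier 1: ingest collectors publish to Kafka ---
exporters:
  kafka:
    brokers: ["kafka:9092"]
    topic: otlp_spans

# --- Tier 2: drain collectors consume from Kafka and write to ClickHouse ---
receivers:
  kafka:
    brokers: ["kafka:9092"]
    topic: otlp_spans
```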
So maybe we can improve things by adding such a queue, but at the one-million-events-per-second scale we are not seeing any buffering issues. If you want better performance from ClickHouse, ClickHouse is both horizontally and vertically scalable, so you can scale it out horizontally, and you can also set up data replication to add redundancy.
That matters if you want to run this in production. And the third piece is that you can also do a distributed setup of ClickHouse: rather than having a single machine running ClickHouse, you can set it up distributed, so that the same query is spread across different instances, and then you get data from the different instances and merge it back. Those are some of the possible improvements which we think could be made.
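As a rough sketch of what such a setup involves: a distributed ClickHouse deployment is defined by declaring a cluster of shards and replicas in the server config, which Distributed-engine tables then fan queries out over. ClickHouse configs are traditionally XML, but newer releases also accept YAML; the cluster and host names below are placeholders, and the exact YAML shape should be checked against the ClickHouse docs:

```yaml
# Very rough sketch of declaring a two-shard cluster in the server config.
remote_servers:
  signoz_cluster:         # placeholder cluster name
    shard:
      - replica:
          host: clickhouse-1
          port: 9000
      - replica:
          host: clickhouse-2
          port: 9000
```

A Distributed-engine table over this cluster would then spread each query across both shards and merge the results, matching the behavior described above.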
But even as a test, this result was very interesting to us: the setup was able to handle one million events per second without any issues.
And one last point: a question many people had was, why did you choose a columnar database? A columnar database makes natural sense for observability because you can add as many rows and attributes as you want, and no prior schema is needed. Basically, if you want to provide good analytical capabilities, as we do in SigNoz, aggregation queries are very important, and those work very well on a columnar database. So, just to give you a sense of why this is important, let me share a quick example.
Generally in observability, and especially in the application monitoring space, the type of data we deal with looks like this: you're trying to trace a request across different systems, so you have, for example, a GET on some resource string, the time taken by the operation, and a status code.
Say the status code is 200, and then you can add different attributes. Now, each request is a row like this, and if you're storing it in a columnar database format, you can just run aggregate queries like: tell me the average time taken by requests which have HTTP status code equal to 200. All you have to do is find the requests whose HTTP status code matches and then take the average of their durations. So, if you're storing these as columns, you don't need to look into the other columns at all: you just ask which rows have HTTP code equal to 200, take the time-taken values for those rows, and average them.
This is just a basic example of why column-based storage makes more sense, especially if you're doing lots of aggregations, and this is where we think the real power of observability systems will come from: if you allow users to slice and dice the data in different ways, columnar databases seem very well suited to that. And just to point out: if all the values a query requires sit in the same locality, that also increases the speed quite a bit. There's a blog by Honeycomb on why they chose a columnar database as their main datastore; do have a look at it, it's very interesting and makes some great points.
So that's broadly what I was trying to convey: with OpenTelemetry and the ClickHouse datastore which we use in SigNoz, we have been able to scale to one million events per second, which is a huge scale and good enough for many enterprises, and the resource utilization is not very high. Of course, there are many improvements we can still work on, but this shows very encouraging signs for how these systems can be scaled and what the next generation of observability tools could look like. So that's probably where we are.
If you are interested in the project, or in the observability domain generally, check out our repo, and do also check out the Slack community and hang out there; we have around 800+ members. If you're interested in working on this, or working in the domain, we are always looking for good people to join our team, so do give me a ping. Or, if you're interested in contributing to the project, just have a look at the open issues and feel free to say hi. Cool.