Description
Weekly demo issue - https://gitlab.com/gitlab-org/incubation-engineering/apm/apm/-/issues/14
Hello, Joe Shaw here, full stack engineer in the Incubation Engineering department. My focus is application performance management, monitoring and observability, as a single-engineer group, so that we can bring an observability platform into GitLab. This is my weekly update video; here's the issue I've created to cover it, and I'll link to the recording soon. You can subscribe to the weekly issue links from there.
So if we have a look at that: previously we evaluated ClickHouse to see how it would work for metrics and time series data, against things like TimescaleDB. It performed very well and I was happy with that evaluation, so we're going ahead with using ClickHouse, and I've been refining a schema design for it. In the existing ClickHouse benchmarks, the schema used for the metrics is very restrictive.
There'd be a table like this for every single measurement, so it might be disk IO, it might be some kernel-specific resource, it might be pod memory utilization, for example, and then it has a tags table that is very fixed, with very specific tags for this benchmark. But in reality you can't have a fixed tags table, because you know your users are going to have arbitrary tags.
Again, this performs very well on the queries, as we'll see, but it is not flexible enough for our use case.
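To make that concrete, here is a minimal sketch (in Python, using the clickhouse-driver client) of what that restrictive layout looks like: one table per measurement plus a fixed tags table. The table and column names are my assumptions for illustration, not the benchmark's exact DDL.

```python
from clickhouse_driver import Client  # assumed client library for the sketch

client = Client("localhost")
client.execute("CREATE DATABASE IF NOT EXISTS benchmark")

# One dedicated table per measurement (cpu shown here; diskio, mem, ... would
# each need their own table with their own fixed set of value columns).
client.execute("""
    CREATE TABLE IF NOT EXISTS benchmark.cpu (
        created_at   DateTime,
        tags_id      UInt32,
        usage_user   Float64,
        usage_system Float64
    )
    ENGINE = MergeTree
    ORDER BY (tags_id, created_at)
""")

# A fixed tags table: one column per known tag, so arbitrary user-supplied
# tags cannot be represented without a schema change.
client.execute("""
    CREATE TABLE IF NOT EXISTS benchmark.tags (
        id         UInt32,
        hostname   String,
        region     String,
        datacenter String
    )
    ENGINE = MergeTree
    ORDER BY id
""")
```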
So, moving on from that, we start with a naive schema where we basically just denormalize this and flatten it down, so it has the timestamp, a host, a measurement, a field, a value and an array of tags. Every one of those fields then becomes an individual record. It's a naive approach, so we get a baseline of how badly we can do without putting any effort in, and then we refine that to create what we think is a better solution.
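A minimal sketch of that naive, fully denormalized single table, assuming the column set just described (names and types are illustrative):

```python
from clickhouse_driver import Client  # assumed client library for the sketch

client = Client("localhost")

# One row per (measurement, field) sample; tags carried as a plain array.
client.execute("""
    CREATE TABLE IF NOT EXISTS metrics_naive (
        timestamp   DateTime64(9),
        host        String,
        measurement String,
        field       String,
        value       Float64,
        tags        Array(String)
    )
    ENGINE = MergeTree
    ORDER BY (measurement, host, timestamp)
""")
```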
I benchmarked that as well, and then, looking at the actual refined table: the process we went through was to start looking at codecs we could use for better compression. The timestamp is better with a DoubleDelta codec, which is really good for sequences like timestamps, ideally ones with a fixed gap between them, which might not always be the case, but it compresses these very well. We use LowCardinality strings for a lot of the fields.
LowCardinality uses a different data structure in ClickHouse for strings, or any compatible type that doesn't have a large set of distinct values, so it's quite ideal for things like the measurement name and maybe certain tags. In reality it probably wouldn't work for hosts, because you'd have lots and lots of differently named hosts in a real data set.
For example, if you've got clusters in Google Cloud, they're going to have loads of unique hosts, so it might not be appropriate, but for this benchmark we're leaving host as LowCardinality. Then for the actual value, we're changing that to the Gorilla codec, which was designed as part of a research paper from Facebook, where they were building a time series database; the Gorilla codec XORs adjacent binary values.
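Roughly, those codec choices could be expressed like this (a sketch under the assumptions above; the column names are mine, not the final schema):

```python
from clickhouse_driver import Client  # assumed client library for the sketch

client = Client("localhost")
client.execute("""
    CREATE TABLE IF NOT EXISTS metrics_codecs (
        timestamp   DateTime64(9) CODEC(DoubleDelta, LZ4),  -- good for near-regular time sequences
        host        LowCardinality(String),                 -- assumes a small set of distinct hosts
        measurement LowCardinality(String),
        field       LowCardinality(String),
        value       Float64 CODEC(Gorilla, LZ4)             -- XOR of adjacent binary values
    )
    ENGINE = MergeTree
    ORDER BY (measurement, host, timestamp)
""")
```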
Then we put in partitioning, partitioning by day, so the actual chunks being written to disk get partitioned, and then, when ClickHouse is looking at queries, it can load and search only the indexes relevant to the particular partitions. So that's quite useful.
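Adding the day-based partitioning to the previous sketch would look something like this (again an assumption of the exact expression used):

```python
from clickhouse_driver import Client  # assumed client library for the sketch

client = Client("localhost")

# Same columns as before, now partitioned by day so ClickHouse only needs to
# load and search the index parts of partitions a query actually touches.
client.execute("""
    CREATE TABLE IF NOT EXISTS metrics_partitioned (
        timestamp   DateTime64(9) CODEC(DoubleDelta, LZ4),
        host        LowCardinality(String),
        measurement LowCardinality(String),
        field       LowCardinality(String),
        value       Float64 CODEC(Gorilla, LZ4)
    )
    ENGINE = MergeTree
    PARTITION BY toDate(timestamp)
    ORDER BY (measurement, host, timestamp)
""")

# The per-day parts can be inspected afterwards via the system tables:
parts = client.execute(
    "SELECT partition, sum(rows) FROM system.parts "
    "WHERE table = 'metrics_partitioned' AND active GROUP BY partition"
)
print(parts)
```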
We benchmarked each one of these changes and got gradual improvements as we went along, and then we actually moved the tags and fields into nested structures. So rather than the completely denormalized approach of the naive schema, we bring the fields back into the same table as the measurement, but with the flexibility that those fields are just key-value pairs. You can see the table structure we end up with there, and this is what we're benchmarking against. And, you'll be glad to hear if you've watched my previous videos over the last few weeks, this is the last benchmark I intend to do for a while.
So that's good. One note: we couldn't use the nested structure for the fields here, because I couldn't find a way of getting the codec to apply properly in that format.
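Putting the pieces together, the refined single-table layout reads roughly like the sketch below: tags as a nested key/value structure, with the field kept as plain columns so the value codec can still apply. This is my reading of the description, with assumed names, not the exact schema from the issue.

```python
from datetime import datetime

from clickhouse_driver import Client  # assumed client library for the sketch

client = Client("localhost")
client.execute("""
    CREATE TABLE IF NOT EXISTS metrics_refined (
        timestamp   DateTime64(9) CODEC(DoubleDelta, LZ4),
        host        LowCardinality(String),
        measurement LowCardinality(String),
        field       LowCardinality(String),
        value       Float64 CODEC(Gorilla, LZ4),
        tags        Nested(key String, value String)  -- arbitrary per-row key/value tags
    )
    ENGINE = MergeTree
    PARTITION BY toDate(timestamp)
    ORDER BY (measurement, host, timestamp)
""")

# Inserting a sample row: nested columns are written as parallel arrays.
client.execute(
    "INSERT INTO metrics_refined "
    "(timestamp, host, measurement, field, value, tags.key, tags.value) VALUES",
    [(datetime(2021, 10, 1, 12, 0, 0), "host-1", "mem", "used_bytes", 123456.0,
      ["namespace", "pod"], ["default", "web-1"])],
)
```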
So let's have a look at benchmarking this against TimescaleDB. We wanted to make sure we reran that as well, just to verify the previous results and compare them here; I didn't want to reuse the old benchmark numbers, just to make sure we got fresh runs. So we have the default ClickHouse design, which is the sort of optimal design I showed you that wouldn't really work in the real world.
Then there's our single-table naive solution, which we expect to perform quite badly, and the single-table refined solution, which is the one up here. I'm using the DevOps data set, so we're not only getting CPU data in there; I think there are about ten different data sources. So when it comes to having a single table, that table is where all of those data sources go, as opposed to them going into separate tables, which does put a burden on that table in terms of performance.
So here we go. On the metrics loading rate you can see that, while the refined one doesn't perform as well as ClickHouse itself, it performs slightly better than TimescaleDB, so that's good. Volume sizes are pretty small for the refined solution, even smaller than the original ClickHouse one in some cases, so that's great.
The single-table naive solution even beats Timescale in certain cases as well, as you can see here and here. There are a few outliers, but it never performs particularly badly, and this is the 95th percentile, so 95% of all the results in terms of latency are doing much better than this; this is almost the worst-case scenario. I'll quickly scroll through here.
There are only a few queries where the single-table refined one does relatively poorly in comparison, and in this one the TimescaleDB run didn't even return properly. As we get further down here, these are queries that are quite unusual, very high-stress queries, and not at all the sort of thing we would actually be using the data set for; they're just there to put it through its paces. So it's still doing quite well.
In this section again, for example, some of the GROUP BY queries do very well, and as you come back down to some of the final ones it again performs very well, so we can be happy with that. As it gets put into real use, we can then start analyzing the queries that are actually being run and see if we need to do things like materialized columns or views on top of the data set, but I think this is fine for the time being, as far as I can tell.
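If that query analysis later shows we need pre-aggregation, a materialized view over the refined table is one option; purely as an illustration (all names here are assumed):

```python
from clickhouse_driver import Client  # assumed client library for the sketch

client = Client("localhost")

# Roll the raw samples up into per-minute averages; readers would query this
# with avgMerge(avg_value) instead of scanning the raw table.
client.execute("""
    CREATE MATERIALIZED VIEW IF NOT EXISTS metrics_by_minute
    ENGINE = AggregatingMergeTree
    PARTITION BY toDate(minute)
    ORDER BY (measurement, host, minute)
    AS SELECT
        toStartOfMinute(timestamp) AS minute,
        measurement,
        host,
        avgState(value) AS avg_value
    FROM metrics_refined
    GROUP BY minute, measurement, host
""")
```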
So, moving on to the actual CPU and memory use: you can see that it has a similar profile to ClickHouse itself with the normal table setup, and as the data set gets larger the CPU usage is pretty good. Its memory usage over the runs is generally better than in all of the other cases as well, especially for the much larger run down here: it has a very stable memory profile, whereas TimescaleDB is really stretching up to well over 60 gig.
ClickHouse itself is also using quite a lot of memory there, so that's a really good result for the refined schema. I'm happy with that, so I'm going to go ahead and use the schema that we've got all the way up here.
The only other thing added to the issue here was a few links I found while I was designing this: there's an article about how to design time series schemas efficiently, some information about the codecs I'm using, and some nice feature slides. There was also an article about how ClickHouse recently incorporated and has become its own company, which is great to see; hopefully that means a lot more releases and new features from them, which is fantastic.