Description
Weekly demo issue for the APM Single Engineer Group - https://gitlab.com/gitlab-org/incubation-engineering/apm/apm/-/issues/33
Hello, Joe Shaw here, full-stack engineer in the Incubation Engineering department, looking at APM (application performance monitoring/management) and observability solutions. So, just as a brief catch-up from last time.
What we're looking at is how we integrate our existing metric storage solution, which uses ClickHouse and currently supports the Datadog agent, with GitLab. When you're setting up an agent, whether that's Datadog or potentially other agents, you can specify the GitLab project ID and, optionally, an environment ID within that project. We can then validate against those and store them in the ClickHouse database, and that's now a requirement once I've completed the merge request.
So, just as a brief reminder, this is the issue that we've got for project and environment integration.
Datadog sends an initial validate request with an API key. We use GitLab's version endpoint, as part of its REST API, to validate the key. Immediately afterwards, Datadog sends an initial intake request, which will have host tags in it, and we can see an example intake request here.
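A minimal sketch of that validate step, assuming a plain REST call with the token in the `PRIVATE-TOKEN` header (the gateway's actual implementation may differ):

```python
import urllib.request


def build_validate_request(gitlab_url: str, api_key: str) -> urllib.request.Request:
    """Build the request for GitLab's version endpoint.

    /api/v4/version requires authentication, so a 200 response is enough
    to prove the API key is valid; a 401 means it is not.
    """
    return urllib.request.Request(
        f"{gitlab_url}/api/v4/version",
        headers={"PRIVATE-TOKEN": api_key},
    )


# Sending it would then be roughly:
#   with urllib.request.urlopen(build_validate_request("https://gitlab.com", key)) as resp:
#       valid = resp.status == 200
```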
As part of this, the documentation has a lot of stuff in it, but the one thing we're interested in is these host tags: we can set up global values like the GitLab environment ID and GitLab project ID that relate to the API key specified in the header, and the agent then passes them through.
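As a sketch, those host tags in the agent's `datadog.yaml` might look like this; the tag names `gitlab_project_id` and `gitlab_environment_id` are illustrative assumptions, not confirmed from the demo:

```yaml
# Illustrative only: the actual tag names used by the gateway may differ.
tags:
  - gitlab_project_id:gitlab-org/incubation-engineering/apm/apm
  - gitlab_environment_id:production
```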
A
We
can
then
just
collapse
this
down,
there's
a
bit
of
documentation
in
here
about
how
we
then
use
the
gitlab
api
to
make
requests
to
get
a
project
and
validatex
id
and
optionally
check
the
name
against
that.
If
we're
using
the
name
instead
and
then
optionally
get
the
environment,
and
then
we
also
check
that
the
user
has
the
relevant
permissions
so
that
they
have
a
developer
level
permission
against
that
project,
and
then
we
will
accept
rights
against
the
series
endpoint
of
datadog
of
the
of
the
datadog
api
and
what
we
do.
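The developer-permission check can be sketched against the `permissions` block of a `GET /api/v4/projects/:id` response; the helper below is a hypothetical illustration, not the gateway's actual code:

```python
DEVELOPER_ACCESS = 30  # GitLab's integer access level for the Developer role


def has_developer_access(project: dict) -> bool:
    """Check whether the token's holder has at least Developer access.

    `project` is the JSON body of GET /api/v4/projects/:id, whose
    `permissions` key reports project- and group-level access.
    """
    perms = project.get("permissions") or {}
    project_level = (perms.get("project_access") or {}).get("access_level", 0)
    group_level = (perms.get("group_access") or {}).get("access_level", 0)
    return max(project_level, group_level) >= DEVELOPER_ACCESS
```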
If I have a look at the open merge request here, which is functionally complete (I'm just doing a bit more testing around it now), one of the key details is that this is where we've introduced Redis into the solution as a session store. We use Redis quite a lot in GitLab with the Rails component, so it seems like an obvious choice here, with a time to live of 60 minutes per session. What we store in Redis, via its API, is keyed on a hash of the API key and the host.
We actually prefix each of these keys with its length before concatenating them, so we don't get collisions, and we use a hash set in Redis to store the validated project and environment. That way we don't have to keep going back to GitLab constantly to revalidate those values; they stay valid for a certain amount of time, 60 minutes in this case, and we could make that configurable. It's also the case that with Datadog those host tags aren't sent through in subsequent metric storage requests.
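The length-prefixing trick can be sketched like this; the key layout and hash choice here are assumptions for illustration, not the merge request's exact scheme:

```python
import hashlib

SESSION_TTL_SECONDS = 60 * 60  # sessions live for 60 minutes


def session_key(api_key: str, host: str) -> str:
    """Derive a Redis key from the API key and host name.

    Each component is prefixed with its length before concatenation, so
    pairs like ("ab", "c") and ("a", "bc") can never collide.
    """
    raw = f"{len(api_key)}:{api_key}{len(host)}:{host}"
    return "apm:session:" + hashlib.sha256(raw.encode()).hexdigest()


# With a redis-py client, the validated IDs would then be cached roughly as:
#   r.hset(key, mapping={"project_id": 12345, "environment_id": 0})
#   r.expire(key, SESSION_TTL_SECONDS)
```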
So we need a way of storing this metadata keyed on the API key and host, and the assumption here is that, as long as a specific host is running an agent, it will be tagged to a specific GitLab project.
That's a limitation here, but I think we can live with it. If the user needs to change it, they just generate a new API key for another project, or change those tags and restart the agent, and a new session will be created, so that should be fairly straightforward. We've also had to make some changes to the database to add the project ID and environment ID in there as well. So I'll give you a brief demo of that.
Before I do that, in the APM project I've already got here, I'm going to create a project access token. This could instead be a user access token created for individual users, provided they have developer-level permissions, but here it's a project access token. I'm just going to give it the read_api permission and leave the role as Maintainer; it doesn't need any more complex role than that, and Developer would work as well.
So you can see we attach a role there, and we've got read_api. That minimal scope is all we need: we only need to fetch the project information from GitLab. So let's create that access token.
At the moment there's no GitLab URL specified, because in dev mode it uses a stub that we've created, which provides some pre-canned responses for projects and environments. That lets us test easily without having a GitLab instance set up, and it was very easy to write, since the REST API is very simple. So what we'll do is override that with the actual GitLab URL; we're going to use gitlab.com.
That could be any instance. Then, in the Datadog configuration (we've got a Datadog agent deployed in development here, which makes testing much easier), we're going to set the API key to the one we've just created; the other settings are already in place. What we're actually going to do now is go back and set the GitLab project ID; we're not going to bother with an environment ID, as that's optional.
This is set to a default value at the moment, for testing with that stub; it's a project from the examples in the REST docs that I've copied. These environment tags will take either the full project path or just the integer ID, and the integer ID is what gets stored in the database.
So I'll show you that. If we look at our APM project here, we can take its full path and put it in here, linking this deployment to this project, and that should be enough to get it up and running. Now, if we go in here, we'll see that Datadog and the gateway are going to restart.
Now, if we look at the gateway logs, we should start to see some new entries as everything reconfigures and sets itself up, and we're getting access requests here as the agent comes in. We've hit an intake URL, as you can see there if I stop the logs. So that's come in and been validated, which means it will have taken the API key and verified it against GitLab, and we can see that now.
If we go into the ClickHouse cluster, run a ClickHouse client, and use the APM database, I've got a pre-canned query here just to demonstrate this: get all the metrics where the project ID is this specific project ID. Again, if we go back to the APM project we've got here, you can see it matches the project ID there, which is the integer ID of the project in GitLab.
Back in the terminal, we're just grabbing whatever system CPU measurements we've got, capped at one record; I don't want to pull all the records out of the database. You can see we've got a new one in, and it's got the correct project ID. We didn't specify an environment ID, so that's just set to zero.
A
Just
because
nullable
handling
click
house
with
sorting
keys
is
a
bit
awkward.
So
just
assuming
zero
means
empty
with
my
host
and
the
newest
timestamp
there
and
you
can
see
based
on
my
system
clock,
but
that's
a
fairly
fairly
new
timestamp,
but
then
we
get
more
records
in.
I
could
link
that
to
five.
You
can
see
the
most
recent
cpu
records
coming
from
that
host.
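Put together, the pre-canned query might look roughly like the following; the table and column names are assumptions based on the demo, not the actual schema:

```sql
-- Hypothetical table and column names for illustration only.
SELECT project_id, environment_id, host, value, timestamp
FROM metrics
WHERE project_id = 12345             -- the integer GitLab project ID
  AND name = 'system.cpu.user'      -- a system CPU measurement
ORDER BY timestamp DESC
LIMIT 1;                            -- cap the output at one record
```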
So that's all tied together, and if I were to change the API key to an incorrect one, we wouldn't see any more records: if I changed it to a totally wrong API key, requests would start to fail, because the Redis session uses the hash based on that key and the host name. Likewise, if a new host is added with the API key, it has to go to GitLab and do the verification before it becomes a value in the Redis cache that we can reuse between requests. So, back to my issue.
So that's the demo. I just need to merge that after I've done a bit more testing; I do have a lot of unit and integration tests there, but I want to do some more manual testing, where I set up environments, put the wrong environment IDs in, and things like that, and see how it behaves. After that, I'm wanting to start investigating ClickHouse for storing logs, because I think it's quite important to get those underway as well as metrics.
My initial review is going to look at the cLoki implementation, where Grafana's Loki design has been built out with ClickHouse. I'm going to see what that looks like, because it could be quite useful: if it supports the native LogQL language from Grafana, it might be quite a nice backend to at least start with, off the shelf.