From YouTube: 2023-09-11 Analytics Section Meeting
A: I just wanted to quickly share something that came out of a discussion on an MR that Dennis did, relating to this configurator, because it potentially affects the infrastructure that we create quite a bit. So I think it's important that we raise awareness about this before we change it, and that everyone also understands what this would mean and can give feedback on what they think about it, or flag any doubts or potential problems that anyone can foresee. Let me share my screen just to visualize.
A: I'm not gonna reiterate this in detail, but I think most of you have probably heard beforehand that we're going to need to change a few things to support ClickHouse Cloud. We had this topic before: ClickHouse Cloud is going to be necessary for our production clusters on gitlab.com.
A: That already went through security review. The problem with ClickHouse Cloud and our current infrastructure is that, if you look at our current setup, ClickHouse normally requests things directly from Kafka. There's a Kafka engine table in ClickHouse, so ClickHouse has the capability, in theory, to directly access a stream of events, the lines that you can see in Kafka, and import them into ClickHouse.
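The self-hosted pattern just described might look something like this; table, column, and topic names here are purely illustrative, not our actual schema:

```sql
-- Hypothetical sketch: ClickHouse itself consumes from a Kafka topic
-- via a Kafka engine table.
CREATE TABLE events_queue
(
    app_id  String,
    payload String
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',
         kafka_topic_list  = 'events',
         kafka_group_name  = 'clickhouse-consumer',
         kafka_format      = 'JSONEachRow';

-- A materialized view drains the queue table into a regular table.
CREATE MATERIALIZED VIEW events_mv TO events AS
SELECT app_id, payload FROM events_queue;
```

The Kafka engine table acts as a consumer; the materialized view is what actually persists rows.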
A: In specific cases, ClickHouse Cloud just doesn't support things that the open-source ClickHouse, the one you can host yourself, supports, and the Kafka engine table is one of those. So we need to find another way to get the events into ClickHouse, and the way that ClickHouse actually recommends for getting events into ClickHouse Cloud, and that Dennis then started to work on, is using a tool called Vector.
A: Vector is created by Datadog. I think it was originally for processing logs, or sending logs somewhere, but I think it definitely also works for our use case. It changes the way the architecture works so that, instead of having this direct connection between Kafka and ClickHouse, we have Vector, which takes the things from Kafka and then puts them into ClickHouse Cloud. There's also the difference that ClickHouse is no longer requesting something from Kafka; rather, Vector is pushing something into ClickHouse.
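A minimal sketch of that Vector pipeline could look like the following; the broker address, topic, endpoint, and table names are placeholders, not our real configuration:

```toml
# Hypothetical Vector config: Kafka source -> ClickHouse sink.
[sources.kafka_events]
type = "kafka"
bootstrap_servers = "kafka:9092"
group_id = "vector-consumer"
topics = ["events"]
decoding.codec = "json"

[sinks.clickhouse_cloud]
type = "clickhouse"
inputs = ["kafka_events"]
endpoint = "https://example.clickhouse.cloud:8443"
database = "default"
table = "events"
auth.strategy = "basic"
auth.user = "default"
auth.password = "${CLICKHOUSE_PASSWORD}"
```

Here Vector is the consumer of the Kafka topic and writes into ClickHouse over its HTTP interface, which is what makes it work with ClickHouse Cloud.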
A: So it's actually just using the ClickHouse APIs to put things into it. And so Dennis created an MR to update our configurator to support those two different use cases, because essentially, when doing the switch, we would need to support two different infrastructures: this one, where we connect directly between ClickHouse and Kafka...
A
And
then
this
one,
where
we
connect
from
Kafka
to
back
or
vector,
gets
the
things
from
cover
and
then
puts
them
into
clickhouse
cloud,
and
this
creates
changes
in
the
configurator,
because
the
configurator
sets
up
the
right
tables
and
so
on,
and
so
a
configurator
needs
to
know
about
whether
we
want
which
kind
of
architecture
we
want.
And
so
the
discussion
came
up,
whether
we
shouldn't
just
support
Vector
everywhere.
A: So instead of having those two different architectures, we'd switch to a system where Vector is always running, and the only difference is whether ClickHouse is running in a separate cloud or within our own infrastructure. But the configurator doesn't need to care; it just gets a ClickHouse API URL.
A
And
also
the
rest
of
our
system
doesn't
really
need
to
care
that
much
about
where
clickhouse
is
actually
running.
If
we
have
Vector
running
everywhere,
the
database
set
up
the
tables
and
everything
would
be
the
same
and
in
essence
we
have
these
two
options.
We
can
either
say:
okay,
we
want
to
have
support
these
two
different
architectures,
one
with
Vector
one
without
vector,
and
then
we
have
to
deal
especially
in
the
configurator
and
maybe
also
in
other
areas
with
the
differences
or
we
say
we
don't
want
that.
A: We'd rather have one architecture, which is just this one, which would mean we have Vector running everywhere. That's one additional service that also needs to run, for example, in Docker Compose or in any other kind of stack. But since Vector would run everywhere, the advantage is that Vector actually has some built-in capabilities. You can do transformations of the events that go through it, and you can, for example, try to send things directly to specific databases based on the IDs they have in them.
A
So
in
theory,
we
could
investigate
using
the
app
ID
of
a
event
to
directly
put
it
into
a
specific
table
so
that
we
no
longer
have
to
have
one
time
table
where
everything
gets
taken
out
of
into
the
specific
tables.
All
these
things
we
could
start
to
investigate.
We
could
make
vector
and
active
component,
whereas
if
we
have
the
two
different
options,
I
think
we
would
keep
Vector
as
passive
as
possible,
just
so
that
the
the
two
different
environments
stay
stay
as
close
as
possible.
A: On the other hand, without Vector you have one less service running in those environments, so one less service that can potentially break or eat performance. Right now I don't think we really know how much Vector takes when it comes to resources, like CPU and memory. It's supposed to be lightweight, so I assume it's not that much, but still, it's one more service that runs within those environments, and one more service is one more thing that can potentially break and potentially create issues. But yeah.
B: It's also about maintenance work, making sure that it also works locally, right? Because keeping ClickHouse locally would also generate additional maintenance work, since we'd still need to make sure that the local ClickHouse is working, whereas with Vector we would be using the Cloud ClickHouse, right? Or not?
A: Not necessarily. That's a good point, but those are two different changes. When introducing Vector here, within the current Kubernetes clusters we have, we can still have ClickHouse run locally, just to keep it simpler from a setup perspective, and also to allow potential self-managed users who don't want to use ClickHouse Cloud to be able to host the whole system themselves.
A: So the question of whether ClickHouse runs within the system, like within the same cluster or within the same Docker Compose setup, or whether it runs separately on ClickHouse Cloud, is, I think, somewhat separate from whether we introduce Vector or not. When we introduce Vector everywhere, we gain the ability to use ClickHouse Cloud everywhere. So you could have a local setup within Docker Compose that connects to ClickHouse Cloud, for example, which I think would be another advantage of having Vector.
A: Because, again, you could then, for example, try out issues that might be specific to ClickHouse Cloud. But you can also just add Vector and, I guess for the normal Docker Compose setup, that's what we would do: add Vector here and still keep ClickHouse local. So not everyone needs access to it.
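The local Docker Compose variant just mentioned could be sketched roughly like this; service names, image tags, and file paths are illustrative assumptions:

```yaml
# Hypothetical compose sketch: Vector runs alongside a local ClickHouse,
# so the topology matches the Cloud setup while staying self-hosted.
services:
  clickhouse:
    image: clickhouse/clickhouse-server
  vector:
    image: timberio/vector:latest-alpine
    volumes:
      - ./vector.toml:/etc/vector/vector.toml:ro
    depends_on:
      - clickhouse
```

Swapping the ClickHouse endpoint in Vector's config would then be the only difference between the local and Cloud environments.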
B: Well, there can always be stuff that needs debugging, and sometimes it may be related to Vector, sometimes to ClickHouse Cloud, so you will need some tooling to be able to detect these bugs or introduce changes in case they are necessary.
B: Yeah, and just from my experience, I feel like keeping a separate system just for dev usually also ends up creating work that we need to do, like maintenance work.
D: On the topic of Vector supporting ClickHouse, local and Cloud: I'm assuming it also supports a bunch of other database types and event pipeline destinations that you could pass events through to, if you wanted to. Is that a correct assumption?
A: It is. They have the concept of sinks, which is where you put the events, and it's quite a list of things. I think it's important to note that, in theory, Snowplow would also support putting things elsewhere. So, for example, I think with Vector you can put the events into an S3 bucket; that's something that Snowplow would also do directly.
A: And Vector is definitely built around, and biased towards, observability data, so whatever you can see as a sink is more in that direction. I don't think they support any real data warehouse, something like Snowflake, but I know they support putting things directly into New Relic or into Datadog. While that is the case, I don't know how much of an advantage...
A: ...this really is, because it's not necessarily geared towards our use case, where I think something like Snowflake or other data warehouses would be a more natural replacement for ClickHouse than the things mentioned here.
D: Okay, that's fine. I was just thinking from a community contributions perspective: when we open this stack up and, you know, remove certain things from it to allow community contributions, where are the areas where we might see people wanting to submit contributions to add support for their own architectural stacks that we don't currently support? So it sounds like this covers that.
C: Yeah, my question is more about whether or not this is a long-term thing, because I think it's already sort of a preemptive move. Dennis mentioned something called ClickPipes, which kind of fills that gap, but it appears to be in beta and is very specific to ClickHouse Cloud. Are you envisioning Vector being sort of the long-term solution to this particular problem?
A: From all that we know right now, that would be my assumption, but it's really hard to say, because ClickHouse can move quite quickly in certain areas if they want to, but sometimes they also don't. Just to give an example: I think they added JSON support, for actually reaching into JSONs, about a year ago in some kind of beta version, and nothing has happened since then.
A: So it feels to me that the best thing we can do, especially when it comes to ClickHouse, is to rely on technology that's established right now and assume that's going to be the long-term solution. Then, if something better comes along, we can still evaluate it. But based on that experience, I'd say yes, I envision it for the longer term, however long that might be.
C: That's fine. I don't ask with any particular expectation or preference in mind, because if we're going to trust anyone to support data pipelines, then Datadog seems like a, you know, pretty solid choice. ClickPipes seemed interesting, but that was the last I heard about this before I went on PTO, so cool, thanks.
D: With that in mind, I'm fully supportive of us unifying our stack. The short- to medium-term benefits for our velocity, having a unified experience across the board, and being able to debug early, while we're in the early stages of analytics instrumentation and product analytics, make it worth it. I'd be more than open: let's unify it and make our lives that much easier for the short to medium term.
A: I think I agree. Do we have any other opinions, anyone who sees a problem if we were to go down that route?
D: I think the biggest problem would be performance, but we don't know. Well, we have no idea what the entire stack is going to do once it gets some proper amounts of data run through it, which is where, you know, the gitlab.com test comes in: to really put some genuine data through it in any considerable mass. So wait and see; if it becomes a problem, we're early enough to be able to pivot if we have to. But I mean, it's Datadog; I doubt we'll need to give it that much.
A: And one note regarding that: we don't know yet if it's possible, but if it were, what I think we should try to investigate in the long run is configuring Vector in a way that it actually sends data directly to the database specific to a certain app ID.
A: Yeah, so currently it works in a way that we always put everything into one queue, and then there are these materialized views which put it into the specific project databases. That's actually something I'm more worried about becoming a bottleneck.
A
If
we
have
loads
of
views
consuming
from
the
same
database
and
with
Vector,
we
have
another
opportunity
to
instead
of
doing
this
kind
of
shenanigans
just
directly
putting
things
into
the
right
database,
so
it
directly
kind
of
gets,
pushed
inside
of
a
app
ID,
something
something
database
which
would
then
decouple
databases
much
more
from
each
other
and
I
think
also
reduce
the
risk
that
is
kind
of
inherent
in
these
shared
architectures.
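If Vector's routing turned out to support this, the per-app-ID idea could be sketched with a `route` transform; the app IDs, database names, and the upstream source name are all hypothetical:

```toml
# Hypothetical sketch: route events by app_id to dedicated ClickHouse
# sinks, instead of one shared queue table plus materialized views.
[transforms.by_app]
type = "route"
inputs = ["kafka_events"]
route.app_one = '.app_id == "app_one"'
route.app_two = '.app_id == "app_two"'

# One sink (and so one database/table) per routed stream.
[sinks.app_one_events]
type = "clickhouse"
inputs = ["by_app.app_one"]
endpoint = "https://example.clickhouse.cloud:8443"
database = "app_one"
table = "events"
```

Each app's events would then land in their own database, so a noisy tenant no longer contends with everyone else in a shared table.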
A
It's
very
like
one
con
I,
don't
know
one
customer
coming
in
that
creates
millions
of
events
or
that
it
is
creates.
Loads
of
events
leads
to
slow
down
for
all
other
people,
and
I
mean
we're
always
going
to
have
that
risk.
As
long
as
we
have
one
kickhouse
database
with
multiple
customers
in
there,
but
if
they
don't
use,
one
share,
DB
I
think
that's
that's
at
least
somewhat
reduced.
If
they
are
all
kind
of
spread
across
databases
and
within
the
database,
we
actually
don't
have
anything,
that's
shared
across
them.
D: I assume going down that approach would also improve our vertical scaling... our horizontal scaling, sorry, with our ability to just spin up new ClickHouse databases if someone's being a particular pain and causing a particular influx. Or even, you know, going dedicated: helping them get set up with their own dedicated area, although they may use a similar architecture.
A: All right, if there's no more feedback, then that's it from my side as well. Thanks a lot everyone for attending and for sharing your thoughts. I think it's helpful when we discuss this with a bit of a larger audience and make sure that everyone's aligned with those changes before we make them.