From YouTube: SaaS Service Ping Automation Instance Level
B
…how we build, or try to build, a stable pipeline and replace the manual work, which was really tedious work on the product intelligence team's side. Now we will take ownership of that and have a stable runtime. The stable pipeline runs once per week. For more details, please refer to this presentation, and also to the current state from Sushma about the issues. I have finished everything when it comes to the known part, for me, to provide a stable pipeline.
A
Right, my only concern was, you know: we know that when we are trying to reconcile this data, there's no apples-to-apples match. So I was thinking, is it better to just go ahead and create some prep models and start using them in our fact tables, to see how the data looks, instead of refactoring the existing fact tables or the prep tables? That was my only question, because, just in case we wanted to revert any changes, it'll be easier if we already have the old models too.
B
Yes, I think so, because I put the links here. It's about the Redis metrics and instance flattened, and you can see these data are landed in prod legacy, instance flattened, and the source for this model is in the prep schema here. These data are flattened as key-value pairs for each metric, like here you have the metric name and the metric query, and this…
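The key-value flattening described here can be sketched as follows. This is a minimal illustration only; the payload shape and the function name are assumptions for the example, not the actual pipeline code.

```python
def flatten_metrics(payload, prefix=""):
    """Recursively flatten a nested Service Ping payload into
    (metric_path, value) rows, one row per metric."""
    rows = []
    for key, value in payload.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            rows.extend(flatten_metrics(value, path))
        else:
            rows.append((path, value))
    return rows

ping = {"counts": {"issues": 40, "merge_requests": 12}, "version": "14.3"}
print(flatten_metrics(ping))
# → [('counts.issues', 40), ('counts.merge_requests', 12), ('version', '14.3')]
```

Each leaf of the JSON becomes one row, which is the shape the prep model reads.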
A
Sorry, I was saying this is a different model than the SaaS usage ping namespace, right? Because currently what we have is the SaaS usage ping namespace, from which we are reading into that prep table, correct?
A
Yeah, it's not this one, but the source for this prep table is what I'm talking about.
A
Yeah, but that's okay. Maybe just go ahead with your presentation, and then we can get back to what the next steps are going to be, like what the impact is going to be on the prep models or on the fact tables.
B
I'm still recording the session. As I said already, you will find the overview of this presentation there, but I want to give a little bit different overview of what we've done here, more focused on the technical aspects and the follow-up work behind this implementation. I think this is especially beneficial for new starters and for people who are not so familiar with this: what the risk is, what our motivation and challenges were, how we set up the pipeline, the lifecycle of the data, how we handle errors, and also what the possible future improvements are.
B
This is a good meeting for the documentation of the product, and for any questions from you to me. What is a Service Ping? I just copied and pasted this from the handbook, and you can find all the details there. Generally, it's a process in GitLab that collects and sends a weekly payload to GitLab Inc. On the data side we called it Usage Ping before; now we call it Service Ping, so please stick with that, because this also creates confusion. For me, Service Ping is a background process.
B
Then, our motivation for changing the current setup: the setup was tough. As I mentioned, the product intelligence team generated this data manually, I think from someone's computer, once per week or so, and it took six to eight hours. I also noticed a couple of situations where it was more than one day on average, with potential extensions depending on many parameters. What was the main problem here? Querying the Postgres data directly: that database is built for application processing, not for data warehouse processing. So partitioning is on id rather than per day; there is no date partition.
B
There are no indexes on the columns we need, so it's hard to define the date ranges of the data, and for that reason it takes time. Currently, in Snowflake, it takes 15 to 17 minutes on average, which is great, I would say. We also wanted to establish an automated mechanism created by us, to take ownership to us and have everything in place, so we can manage, orchestrate, control, and enhance the current setup. That obligates us to establish proper checkpoints and error handling.
B
Our challenge, mainly, as I said: we get Postgres SQL syntax, and we need to execute the queries and get the data. So we translate everything into Snowflake syntax generically. We have an algorithm to do that, which is now in very good shape, I would say. So for us: we get SQL in Postgres syntax, translate it to Snowflake syntax, and get the data from Snowflake.
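As a rough illustration of that translation idea: the real algorithm covers many more cases, and the rewrite rules, table mapping, and qualified names below are invented for this example only.

```python
import re

# Hypothetical mapping of bare Postgres table names to fully qualified
# Snowflake objects; the real pipeline's mapping is much larger.
TABLE_MAP = {
    "users": "prep.gitlab_dotcom.users",
    "projects": "prep.gitlab_dotcom.projects",
}

def translate(pg_sql: str) -> str:
    """Sketch: rewrite a Postgres-syntax counter query into Snowflake syntax."""
    # NOW() is Postgres-flavoured; Snowflake uses CURRENT_TIMESTAMP.
    sql = re.sub(r"\bNOW\(\)", "CURRENT_TIMESTAMP", pg_sql, flags=re.IGNORECASE)
    # Qualify bare table names so the query resolves in the warehouse.
    for bare, qualified in TABLE_MAP.items():
        sql = re.sub(rf"\b{bare}\b", qualified, sql)
    return sql

print(translate("SELECT COUNT(*) FROM users WHERE created_at < NOW()"))
```

A regex pass like this only conveys the idea; handling subqueries and aggregate notation, as described later in the talk, needs a proper parser-like approach.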
B
That's the reason, and the magic of why we have everything we need in 17 minutes instead of six or eight hours. Of course, during this implementation we discovered what was missing and implemented it table by table: for example, a query uses five tables, and two tables, plus two columns from two existing tables, are missing. So we needed to wrap everything up and get everything into the Postgres replica in order to be able to execute the queries.
B
Data from gitlab.com is kind of first-class-citizen data, and we don't want to make any compromise with the data, so everything should be in a perfect state, without any error, when it comes to executing the queries. We now have 700 queries, which means 700 metrics, and nothing has failed so far, which is great. Probably in the future we can extend this set of data with more queries, and it will scale automatically, which is also a good thing.
B
No error handling mechanism was implemented before. Now we have one error table, and errors will be visible both in Airflow, as custom-made SQL that informs us if there's any error, and also in the Sisense trusted data dashboard, which I think we should rely on more and more in the future. It will also be very, very easy for us to make Service Ping data visible there. For that reason I introduced a tag called service ping, so we can easily follow, connect, execute, or ignore everything when it comes to Service Ping.
B
My lessons learned about the implementation: what I failed to recognize was the flow for error handling, so I created it from scratch. Generally, Matt, you gave me some hints, but there was still no implementation, so to put everything from zero into production, let's say, I had to go through a couple of iterations to figure it out.
B
Graphically speaking, the graphical representation of the automation has two major parts: SQL-based counters and Redis-based counters. You have everything in Lucidchart in detail, but the general story is this: everything starts from RESTful API calls; we transform the data and land it in the raw part of the Snowflake data warehouse. The same story goes for the Redis-based counters, except that for the Redis counters the data is kind of ready to eat, ready to serve for us. Everything is pre-computed there.
B
It's just a JSON file with key-value pairs, so we just store the data in the raw stage. When it comes to SQL-based counters, it's a little bit more complex story, and I'll describe it; we'll also focus in the next couple of slides on the ownership of the data and the process. One part will be under product intelligence ownership. I think that discussion is ongoing, but in the end the GitLab replica and also the RESTful API will be under their supervision.
B
So it's good for us, because we should ping only one team when it comes to any problem, troubleshooting, or whatever with the Postgres replica. When it comes to the data team's ownership, it starts when we pick up the data as JSON from the API, and from there everything is under our ownership. This diagram is good if you want to troubleshoot something, if you want to check something, or if you want to understand the process. Let's say you have an issue to extend the current functionality: this will help you a lot.
B
A detailed story about the transformation from Postgres syntax to Snowflake syntax has also been exposed here. What I did here, mainly: you have a happy flow for this algorithm, and I invented and handled special cases like scalar subvalues and regular subqueries. There is also a very strange notation, I would say, when it comes to Postgres aggregate functions: they have column, dot, column, dot, column, and it's not like you have only column, dot, column, as simple as that.
B
It's also a good thing if you want to extend, change, alter, or test this algorithm. The data lifecycle slide is a good graphical representation; I like this slide. Why? Because it gives me the full lifecycle of the data: how the data looks in raw Postgres SQL, how it looks when we transform it, how the metrics look as the result of an executed query, and, at the end, what Sushma already mentioned.
B
Why and how we implemented the algorithm to get the metrics from Snowflake rather than from PostgreSQL: the lifecycle, the main motivation, replacing the manual work. I already said that it's a very specific mechanism here, not a usual data warehousing job to push or extract the data; we have some strong logic on the data warehouse side. I would say this is not usual, but this is good, because we decreased the time for how and when we execute all the queries, and also when it comes to the tech stack.
B
Every time, we go through the loop and execute all these queries on the Snowflake database and get the data. If something fails, the data, with the full error and the corresponding explanation, will be landed in the instance SQL error table in the raw schema, and it will be alerted to us both, as I said, in Airflow and in Sisense. This is very, very important for us: we can act very quickly if something fails. This data runs once per week, which is not too frequent, so the SLO is not too tight, I would say.
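The loop described here can be sketched roughly like this. It is a sketch only: run_query, the row fields, and the error-table name are placeholders, not the actual DAG code.

```python
import datetime

def execute_metrics(queries, run_query):
    """Run each metric query; a failing metric is recorded as an error row
    instead of aborting the whole batch."""
    results, error_rows = {}, []
    for metric_name, sql in queries.items():
        try:
            results[metric_name] = run_query(sql)
        except Exception as exc:
            # One bad metric must not sink the other 699.
            error_rows.append({
                "metric": metric_name,
                "error": str(exc),
                "recorded_at": datetime.datetime.utcnow().isoformat(),
            })
    return results, error_rows
```

The error_rows would then be loaded into the error table in the raw schema and surfaced through the Airflow and dashboard alerting described above.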
B
So we have enough time to fix an issue and provide good data. It's good to mention one more time: this is first-class-citizen data, very important, crucial for the business, so that's the way we should deal with any kind of error, and, as I said, we have enough time to fix it. Maybe in the future this will be tighter, like around each hour or each day, whatever, but we should be ready for that.
B
Our future improvements: data reconciliation. Sushma gave us a good introduction for this, and this is actually our next job. Also, technically speaking, try to avoid storing JSON files directly in Snowflake, but this is a more general story. Why? Because we have the limitation in JSON handling of a maximum of 16 megabytes per column value. This can be a problem in the future, but I think, and Robert can correct me if I'm wrong, the data lake can somehow sort out this problem as part of the Postgres replica version 2 project. This issue is with me.
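The 16-megabyte-per-column-value limit mentioned here could be worked around, in the simplest case, by splitting a large payload before loading. A minimal sketch, assuming no single top-level key exceeds the limit on its own:

```python
import json

MAX_BYTES = 16 * 1024 * 1024  # roughly Snowflake's per-value limit for VARIANT

def split_payload(payload: dict, max_bytes: int = MAX_BYTES):
    """Break one large JSON document into several smaller documents, each
    under the size limit, by grouping top-level keys."""
    chunks, current = [], {}
    for key, value in payload.items():
        candidate = {**current, key: value}
        if current and len(json.dumps(candidate)) > max_bytes:
            chunks.append(current)      # current chunk is full; start a new one
            current = {key: value}
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be loaded as its own row, with the metric keys preserved across chunks.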
B
I believe we can speed these things up; 50 minutes is not too much, but still, I think there is some way to improve the performance. It's not a showstopper at the moment, but it's nice to have. Also, in the Postgres replica version 2 project we want to ensure consistent data, meaning that whether we run it now or after two months, we will get the same results. Right now we do not support delete statements from the Postgres replica.
B
That's a potential problem, Sushma, but we are aware of it, and I hope the difference will not be too much and we will be able to reconcile the data. Also, technically speaking, when it comes to best practices, we should stick with the SOLID principles at the Python and Airflow level: decouple the code, make it testable and executable, and things like that.
B
Also, I put on the last slide the full set of resources you can use to understand Service Ping, to go forward and change something; you can break the rules if you know the rules. There are also some README files, and the last two things are related to the Lucidchart diagrams and a couple of pages inside them. So, briefly, this is the story about the improvements. And yes, questions?
B
This is a very good question, and I like it, because you have two typical setups, and it comes down to how you should model and design your database, the Postgres database. Currently, what we have is a replica of the production database behind the gitlab.com product. That database is made for the application, and usually you don't have partitions per day, per week, or per year.
B
You have indexes, probably, and most of the time on ids, because you want to quickly retrieve what you need: issue number, request number, whatever. When it comes to our purpose, our purpose is to generate data for the data warehouse, and the data warehouse approach means partitioning per day, because usually when we query the data we don't care about a particular id; we care about dates, weeks, months, years, whatever.
B
Snowflake is using MPP, massively parallel processing; it's running on a cluster, so it can digest that very, very fast, rather than going for a full table scan on Postgres. So that's the main advantage. We translate everything and move everything into Snowflake, because Snowflake is designed for MPP, rather than doing that in Postgres.
C
That's great, Radovan, thank you. And, by the way, Miles, Sushma, and Chris, you are awesome: you're putting your questions in the agenda and keeping the notes up to date. Fantastic. Radovan asks: if possible, could we gather some real data, perhaps in coordination with the PI team? Let's get, like, the last…
B
Thank you, yeah, thanks. I already put something, not in an official way, just a screenshot of a test from the testing Airflow environment, but in the end we can make it official and put in benchmark tests. Alina can send hers to me to compare with what we have, just to see the obvious improvement and our win, and how we deal with that. The story behind that is actually this presentation and our mutual effort to get there. Of course, the work is not done, but still, I think it's also good to know.
B
The infra team creates a replica for us, and the PI team, from this replica, creates another replica for us, and we actually use this DB number three. You have one production replica, a second one, and a third one, which is for us. In that case we avoid any impact on the production database. Originally, Service Ping gathered the data from the production database, and it slowed it down radically, because of the full table scans I already mentioned.
B
This replica of a replica is something for us, and based on that we have jobs running once per day, if I'm not wrong, in Airflow. We can check it together later, but I think it's once per day, plus a full refresh once per week. It means you have data not older than a few hours, or one day maximum, if everything is right. We do have a problem with the Postgres replica in one table.
B
It's called notes; it's the biggest one, I think, and sometimes it fails, and you will see it when you're on triage. If notes has failed, it's a kind of recurring error. The good thing is, we already have an issue for that to fix things, with two or three candidate solutions for how to get there, and we should work on this this week or next week to have a permanent solution. Behind that story we have the architectural move to version 2.
B
Okay, we'll put the answers in later, Miles; thanks for your contribution, and welcome to the team. I mentioned Airflow and Sisense; here is actually how things go with this alerting. We execute queries on top of Snowflake. If something fails, the data will end up in the error SQL table, which we watch. We have a triage person; let's say I'm on Wednesday, and I look at Airflow. If something fails, I will be informed in Slack; I go there and see the logs. If something has happened with this table, or I find any data in it, I will see the error.
B
That's the one thing, let's say the old-school style: go to the logs and see the error. The second thing is the visual way of spotting the problem: we have the Sisense trusted data dashboard with tags, and based on the tag you will see that we have a problem. Why? Because we have data in the table, and that's not expected; it means we treat it as an error or as an exception.
E
And if something were to go wrong, would I, as an analytics engineer, find out from the triage person posting in Slack, "hey, we had this error"?
B
Yeah, and we also have a process to inform you if the SLO is breached, like you don't have the data, or this part is missing. This is very specific. Why? Because, as you said, you have, let's say, 700 metrics; maybe one metric fails and everything else is perfect, and we will inform you like: okay, we have a problem with this process, because the metric named "monthly number of CI builds" failed, because there is some problem, or the structure of the Postgres database has been changed, because there is also an impact on us behind that.
E
Great, thank you. The second question was just about this: since we're transitioning over to this new process, I'm wondering if we've audited the result of the new process against the result of the old process. I've seen migrations like this before where there can be unexpected discrepancies, so I was just wondering about that.
A
For that, if you could just place the link to the dashboard here, that would be great, Radovan. It's where we're actually trying to see what the metric values are, what the percentage difference between the old process and the new process is, and what the threshold is. We are yet to determine how much of a threshold value is acceptable, currently, but yeah, we do have a dashboard for that.
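The comparison that dashboard performs can be sketched as follows. The 2% default threshold and all names here are placeholders; as noted above, the acceptable threshold is still to be agreed.

```python
def reconcile(old_metrics, new_metrics, threshold_pct=2.0):
    """Compare each metric from the old manual process against the new
    automated one; return the metrics whose percentage difference exceeds
    the threshold (or that are missing entirely)."""
    flagged = {}
    for name, old_value in old_metrics.items():
        new_value = new_metrics.get(name)
        if new_value is None:
            flagged[name] = "missing in new process"
            continue
        if old_value == 0:
            if new_value != 0:
                flagged[name] = "was zero, now non-zero"
            continue
        diff_pct = abs(new_value - old_value) / old_value * 100
        if diff_pct > threshold_pct:
            flagged[name] = f"{diff_pct:.1f}% difference"
    return flagged
```

A large flagged difference would trigger the "go backward and check" step discussed below, while small differences fall within the agreed tolerance.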
E
Okay, cool. This might even be too early to ask, but generally, when you're migrating to a new process, it's a more reliable, more durable process, probably with improved logic. So should we expect that, if there are discrepancies, it's because the new process is more accurate, or better, or…?
B
It depends, I would say; we can't say too much in advance. Previously the data was created on the live database, the original Postgres database, by the product intelligence team, manually. With what we have now, there will be some difference for sure. Matt, you put it here; I think you remember, Chris and Sushma, that SQL that compared all the new data, and we also have that in Sisense to compare the difference. If the difference is, I don't know, 20 percent, probably we should go backward, check what's going on, and rerun that particular metric. And if it's one or two percent, we will see what is acceptable for us and what we can allow as a discrepancy. In a perfect world everything would be the same, but I think that's not possible in real life.
A
So I just had one question, Radovan. Currently we are thinking of probably trying to do a match against the SaaS usage ping namespace, which is the source table for the prep table currently for SaaS, where we have one row per metric, or rather one row per namespace per metric.
A
I think the idea is, probably, I mean, coming to this new world: I think the table that is going to be similar to that flattened table is this wk SaaS usage ping instance flattened, correct? Correct.
B
And yeah, I thought, Matt, from what you said, if I remember properly, there was some plan to do a union of the old plus the new source, and everything should be in the same structure, right, without changes to the downstream models, if I understand properly. That part is not 100% clear to me, so maybe you could help us here with your advice.
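The "union old plus new source" plan mentioned here amounts to normalizing both sources to one structure before combining them, so downstream models keep reading a single relation. A minimal sketch; the field names and source labels are illustrative, not the actual dbt model columns.

```python
def normalize(rows, source):
    """Map one source's rows onto the shared structure, tagging provenance."""
    return [{"metric_name": r["name"], "value": r["value"], "source": source}
            for r in rows]

def union_sources(old_rows, new_rows):
    """Union the manually generated data with the automated pipeline's data."""
    return normalize(old_rows, "manual") + normalize(new_rows, "automated")
```

In practice this would be a UNION ALL in a dbt model rather than Python, but the shape of the operation is the same: identical columns from both sources, plus a provenance column.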
A
Yeah, because currently what we are doing is: we have this prep table, right, the prep SaaS usage namespace, so that is basically reading from the source table, which…
A
…table, of course, and we just created it as a regular table. And one more question: we wanted to get rid of all the work tables and the legacy schema. So was there a reason why we created it as a work table in legacy? Because the idea was to just have everything in the prep schema, because that's…
B
…it's up to you and the rest of the team how you want to handle the data, how to see the data, and what the most efficient way is for you. For me, it's not a problem to change or redirect anything, because we have a good baseline, the JSON file format, and this is just, let's say, flattening the data.
A
…not to have any work tables in the legacy space. So that's the main reason we created the SaaS usage namespace in prep: instead of, you know, we just got rid of that work table. I remember that for sure, but yeah, that's something that can be taken up at a later point in time, because right now I think we just need to focus on how this data is going to flow into our prep models or, as you said, the unioning, yeah.
B
Yeah, because when I asked Betty what legacy means, he told me this is not legacy; I mean, it's not deprecated, it's just named legacy, so it's kind of still in use. That is how I understand it. And yes, I agree with you, and also maybe that issue for reconciliation will be a good place to put all the action points for me from the engineering side.
B
Yeah, because for now I put all the issues under that epic, so Marcus can easily see where we are with the progress, and also all of you. But still, maybe we need one more epic, like "Service Ping automation, reconciliation part", something like that, I don't know.
A
So my other questions were related to, you know, the refactoring of the fact and mart models, but I think first we need to analyze what we want to do with respect to the prep models, and how we want to bring this data in.
B
That's the first step, for sure, and you and Chris can help us a lot here, because my knowledge is fully unusable there; I didn't touch the mart or fact models. But I'm eager to learn more about that part.
B
The first thing is to connect and check the new source against the old source; then we should move on. But from what I understand, you shouldn't change the logic or the structure too much, unless you have some other, new requirements.
B
Exactly. You should analyze, because you have the current model based on the manually added data from the current Service Ping, and he told us the best way is probably to do a union with the new sources, including the Redis metrics and the SQL-based metrics. That is what I know, the understanding I took from this knowledge transfer session.
D
Does this new automated process include all the same history that we've got in the other model?
B
Sorry, can you repeat, Chris?

D
Yes, so I mean the current manual process: will the new automated process include the same time frames? Does it have all the same history?
A
Let's see, yeah, we'll see what the best approach is, because currently I'm still thinking about just creating a new table altogether in prep and then bringing in this data, or how best we want to pull it into our fact tables.
B
Yes, and the good thing is, you already have all the models in place in the dbt documentation, so pick up what you think is sufficient. You can also go back upstream and see the source of the data, so you can easily modify it for your case. First analyze and explore, right, and come back to me if you think something has been changed. But still, I think we have a good cornerstone to expand this topic.
C
I have a couple of thoughts on this. The first is, in very broad context: we have two major sources for service ping data, at different grains of…
C
Gitlab.com should be treated the same as any other: it's simply an instance of GitLab that's running in an environment. In this case we're hosting the environment, and it's a multi-tenant environment.
C
But, you know, just thinking through the data flows, those should be coming into the environment in sort of two big buckets. One is instance-level service ping information, which includes gitlab.com and self-managed. The other is namespace-level service ping: if you look at that data, it's going to have a namespace id as the lowest level of grain, and that's the other big bucket.
C
The last point I'll make is: let's not design out the ability to modify this in the future. I know we're not, but it's good just to remind ourselves of that. In a very short period of time, in a quarter or so, we're going to have another gitlab.com: we will likely have something deployed in EMEA or APAC to host all of the new users for that specific geographic region.
C
So we're going to end up with that customer data running in Snowflake; it's going to be another set of Postgres backup syncs, but we're also going to need a separate service ping process running for that region. So the concept of gitlab.com North America, gitlab.com EMEA, gitlab.com North America instance A, instance B: we need to have some capability to handle that, at least in our raw data coming in, and then further downstream in the dbt models.
A
Yeah, so are we saying that we just want to have, like, separate tables created? Is that what we are saying? Or do we just want to, kind of, because I think eventually, for all those areas that we are talking about, like EMEA, or, you know, the US, or all these areas, we would want to do the union, right?
B
Based on my experience, whether it's two or three, this should be considered very, very carefully, because you can end up with, I don't know, 100 servers, Rob: EMEA, APAC, per region, whatever, and in the US, I don't know, 20 servers, whatever. So it's up to us to design this in a good way, with some flags, columns, configs, whatever, in a way that's efficient. I'm speaking from the experience I mentioned: we supported the US states, one server per state, supporting 50 states.
C
Yep, exactly. And, you know, the other big difference between those two service ping sources, instance service ping versus namespace service ping, is going to be the data volumes. Namespace service ping is going to be one record, or one JSON package, for every customer in one of the GitLab-hosted instances.
C
I would say this is such an important part of our future, Sushma. The degree to which we can have very good rigor around the design process will be very valuable.
C
A couple of ideas in general: I know that it's taken quite a bit of investment for us to get to this point, and the legacy code is problematic, to say the least, to try to dig through and understand, including the data flows. So, to the degree that we can redesign a much cleaner processing stage post-raw, and then prep it into common, I think that would be fantastic. A general rule of thumb, I think, is one or two stages per schema.
C
There are some legacy challenges, especially when we try to join the service ping data to our subscription universe of paying customers.
A
Yeah, thank you so much. I think this is definitely a great start for this, and without your help it wouldn't have been possible. So thank you so much.
B
Thank you, everyone, for your contribution. I think it's very important now to follow up and get things fully done, and also to consider what Rob said about scaling almost infinitely with this in the future. Thank you, everyone. If you have any kind of question, ping me, and yeah, I'll do my best to help.
A
Yes, there is actually a dashboard for that as well, where you just pass in the name of the table you're looking for, and it tells you what reports are actually dependent on the table, or have been created off that table or view. I can just go ahead and find that and ping you, if you want.
A
All right, so thank you, guys. I think we are way over time, but there was a lot of content to cover. I'm going to post the recording in our RDE Fusion team channel. Thank you so much, thank you for your time. Bye-bye, thanks, everyone, bye.