From YouTube: May 27 2021: DataHub Community Meeting (Full)
Description
Full version of the DataHub Community Meeting on May 27th 2021
• Welcome - 00:00
• Project Updates by Shirshanka - 00:01
◦ 0.8.0 Release
◦ AWS Deployment Recipe by Dexter Lee (Acryl Data) - 09:48
• Demo: Product Analytics design sprint [Maggie Hays (SpotHero), Dexter Lee (Acryl Data)] - 12:32
• Use-Case: DataHub on GCP by Sharath Chandra (Confluent) - 30:16
• Deep Dive: No Code Metadata Engine by John Joyce (Acryl Data) - 48:35
• General Q&A and closing remarks - 01:12:13
A
Hello, hello, everyone. I think we have all our speakers and we have quite a few people on the call, so welcome to the May edition of the DataHub community meeting.

A
Cool, so let's see what we did this month. First off, quick project updates on the release. We have the AWS deployment recipe that Dexter will walk us through. Then we have a demo of the product analytics design sprint that Maggie led, and we have, for the first time, a pre-recorded video from Sharath at Confluent that will walk us through how he's deployed DataHub on GCP. He's actually here, but he's on a low-bandwidth internet connection, so he's going to be there for questions. And then finally, John is going to walk us through our big highlight, no-code metadata, and if we still have time we will go through questions. All right, so first, a big update on the metadata space itself.

A
Last week we ran Metadata Day 2021 in collaboration with LinkedIn, and you probably saw a lot of folks who joined the Slack channel as a result, because we used our Slack community as a way to have conversations about the event. A lot of great content; I highly suggest you watch the video. We got a great group of experts and addressed a lot of the burning questions around how to do data mesh right.

A
There were a few controversial statements, like, you know, we should be confronting the mess and not running away from it, but a lot of good stuff, so go take a look at it. What was really nice to see is that everyone was aligned on essentially getting all of the domains to publish metadata out and getting it all connected up into a single metadata graph, which is very aligned with how we think about the world at DataHub.

A
So this is great; go to YouTube and check it out, there are lots of good nuggets in there. All right, so project updates: 0.8 is coming. We opted not to cut the release before the long weekend, just because we didn't want people to upgrade, run into issues and, you know, not get support over the weekend. So we'll cut the release right after the long weekend. Enjoy your holiday if you are taking it; if not, wait and we will get it done right after.

A
The stats look very similar to the previous release: about five weeks, trending at about the same number of commits. One interesting update: we've got 13 new committers into the project, and this particular release will have 26 committers from 18 different companies. So that's great, a lot of diversity; this is exactly what we want. In terms of the biggest highlights:

A
Of course, it will include the product analytics feature as well as no-code metadata, and there are a bunch of other highlights that I'll quickly walk through. Before that, our Airflow lineage integration is now official: the Astronomer team has published our provider on the registry, so it's now official that Airflow supports DataHub as a lineage backend, and we're actually listed as a featured partner.

A
So this is great. I think we'll see a lot of people using Airflow connecting it up with DataHub for lineage, and this is going to be great. So really, thanks to all of you for getting us over the hump and for all of the support; we'll probably do a longer write-up about the integration in a future blog post.
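As a rough illustration of what wiring Airflow up to DataHub for lineage can look like, here is a minimal sketch using the DataHub Airflow provider's lineage entities on a task; it assumes the acryl-datahub Airflow integration is installed and that the lineage backend is configured in airflow.cfg, and the dataset names and platform below are placeholders rather than anything from the talk.

```python
# Minimal sketch, assuming the acryl-datahub Airflow integration is installed and the
# DataHub lineage backend is enabled in airflow.cfg; table names here are made up.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator  # Airflow 2.x import path
from datahub_provider.entities import Dataset  # helper for declaring lineage endpoints

with DAG(
    dag_id="datahub_lineage_example",
    start_date=datetime(2021, 5, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Inlets/outlets on the task are picked up by the lineage backend and emitted to
    # DataHub as task-level lineage between the two (hypothetical) BigQuery tables.
    transform = BashOperator(
        task_id="transform_orders",
        bash_command="echo 'run transformation'",
        inlets=[Dataset("bigquery", "raw.orders")],
        outlets=[Dataset("bigquery", "reporting.orders_daily")],
    )
```

The same provider also exposes an explicit emitter operator for cases where you would rather construct and send lineage yourself instead of relying on inlets and outlets.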
A
Okay, so product improvements: a couple of big improvements to search. We now support autocomplete across types, so as you start typing, you not only get recommendations for datasets but also charts, as well as other kinds of entities, depending on what the hits are. So it's pretty cool. It's table stakes for any nice search product, so we built it. Try it out; it's already live on the demo instance, so go take a look and play around with it.

A
Second one: it's almost like something we filled in that we didn't get to the first time. Pipelines now also include a visualization of the tasks that are part of the pipeline themselves; we actually organize the tasks within the pipeline based on a sorting of the dependencies between the tasks, and that's one of the things we added. A few more improvements, and this is thanks to the New York Times team, who have been playing around with the themes that are available in DataHub.

A
They added a few things, like making the logo a bit more friendly to customization, as well as a subtitle below DataHub. I'm actually very curious to hear what they are planning to put below DataHub, but this screenshot looks pretty good to me.

A
And coming up next is Business Glossary. It was one of the big requested items. The Saxo Bank and ThoughtWorks teams have been working really closely to build this with us, so it's great. These are screenshots from their internal production deployment, and next month they'll be talking about this in more detail at the town hall. It's actually in the code right now, but we're calling it incubating because we haven't yet published full-on documentation for how to use it. But this is roughly how it looks.

A
I'm really excited about this because it finally gives us the maturity that we've been looking for, for people to actually have curated glossaries that they can attach to schemas as well as datasets. All right, as usual, a lot of improvements and integrations on the systems side. Harshal has been doing a great job managing the community here, but a lot of people have made a lot of contributions. So thanks to all of you; I've listed out your GitHub IDs if you're here.

A
Thank you very much for all of the help. Big change: we've added transformers, so you can connect to a source and, as you're extracting metadata out, also transform it before it goes into DataHub. People are using it to add owners; people are using it to add tags to metadata as it's flowing through, and I think the sky is the limit, so we'll probably see a lot of new integrations there.
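As a loose illustration of the transformer idea, here is a minimal sketch of running an ingestion pipeline programmatically with a transformer that attaches a tag to every dataset it sees; the source type, connection details, and tag value are placeholders, and the exact transformer names should be verified against the ingestion docs for your version.

```python
# Minimal sketch, assuming acryl-datahub with the mysql plugin and a reachable DataHub
# instance; host, credentials, and tag values below are made up.
from datahub.ingestion.run.pipeline import Pipeline

pipeline = Pipeline.create(
    {
        "source": {
            "type": "mysql",
            "config": {
                "host_port": "localhost:3306",
                "database": "orders",
                "username": "reader",
                "password": "example-password",
            },
        },
        # Transformers rewrite metadata in flight, e.g. adding a global tag to every dataset.
        "transformers": [
            {"type": "simple_add_dataset_tags", "config": {"tag_urns": ["urn:li:tag:pii"]}}
        ],
        "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
    }
)
pipeline.run()
pipeline.raise_from_status()
```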
In terms of systems, we've had deeper integrations: improvements in the integrations with dbt, and Looker migrated out of incubating into supported production.

A
So Looker is now fully supported; we actually added support for views and a few other things, so thanks for that. Hive also got better. We now support the Databricks Hive as well as the HDInsight Hive. They're kind of odd in the sense that they don't use the Thrift binary protocol; they use the HTTP and HTTPS protocols for transport, doing JDBC over HTTP, and so it was a little bit of work, but we got it done.

A
Hive should now be fully supported in all shapes and forms, and there are a few other improvements that I've listed out. The one I was very happy to see was schema inference added for MongoDB. MongoDB, as you know, is a document store and there are no real schemas in it; you can put whatever you want. So Kevin added the schema inference capability, and you can now connect DataHub up to a MongoDB instance and it will not only get the collection names but also infer schemas.

A
All right, a few upcoming roadmap items that I haven't published on the public roadmap, but since the community asks about this a lot, I thought I would drop it into this talk. One is Neptune; we get asked about it a lot, so it's going to happen in June for sure. This is a graph database that is supported as a managed graph DB on AWS. Column-level lineage: we're going to get to it in either June, or by July at the latest.

A
All right, so now I'm going to hand it over to Dexter to walk us through some of the improvements that have happened on the Kubernetes side, as well as the AWS deploy capabilities and the recipes that we just published. Dexter, you want to take it over?
B
Yep, hello everyone, I'm Dexter from Acryl Data. In the last few months I've talked with quite a few of you, trying to help out with how to set up on Kubernetes. So the first thing: we finally moved out of contrib, so please check out our new home for all things Kubernetes and DataHub, datahub-kubernetes.

B
The aim has been to make deploying on Kubernetes easier, and during my talks with everybody here I figured out some of the pain points that we had. One of the pain points was that setting up the storage layer on Kubernetes was difficult. So what we did was add a quickstart configuration for setting up the storage layer, including MySQL, Neo4j, Kafka and Elasticsearch, so folks can easily create it using the Helm charts that we created.

B
By the end of next week we are planning to publish our Helm charts to helm.datahubproject.io, so please wait for the announcements there. We're also planning on adding guides on exposing the DataHub frontend, which was the hard part: setting up ingress to expose the DataHub frontend to external parties. This is very specific to different platforms: GCP has their own way, AWS has their own way, so we're planning on adding guides one by one for the widely used platforms.

B
I want to give kudos to Pedro, Shakti, Zack, Ricardo and everyone else who has contributed to improving the Helm charts. All right, moving on to the AWS side of things: I created a simple quickstart guide for deploying on AWS.

B
It starts by talking about how to easily create a Kubernetes cluster on EKS, then deploying DataHub and its dependencies using our Helm charts on that cluster. Third is the big part, exposing the DataHub frontend using the Application Load Balancer controller. Of course there are other ways of exposing the DataHub frontend, but I wanted to focus on the AWS-specific way in the guide.
A
All right, and I will hand it back to you. So the next thing on our agenda is the first big item: the DataHub analytics design sprint that Maggie led for us. Maggie, do you want to take over the screen share and drive from there?
C
Yeah, that'd be fine, sounds good. Give me one second. So hello, everybody, I'm Maggie Hays, I'm a senior PM of data services at SpotHero, based out of Chicago.

C
Earlier in April, I think it was April, time is weird right now, who knows, it was within the last month or so, I teamed up with the folks over at Acryl Data to run what's called a design sprint. So I'll walk you all through: what is that, what does that mean, what did we do, what was the point of it, and then we'll move into a live demo.

C
So can y'all see my screen? Looks good? All right. If you've never heard of a design sprint, it's something that came out of GV, or Google Ventures, and it's basically a framework to rapidly move through discovery, ideation, solution prototyping and testing, solving hard problems with technology in five days. Granted, we did it in three days; there are a bunch of truncated ways that you can do it, but the original one was a five-day sprint, and I'll walk you through this.

C
If y'all are interested in learning more about this, there's a ton of information online. This is the book, this is kind of the main source of record, I guess, of what a sprint framework looks like; you can find it on Amazon and all that. Also on YouTube there's a channel called AJ&Smart, where they have videos that break down every single session. They call it Design Sprint 2.0, so it gives you a refresh of it there. So there's ample context and resources for you online if you want to run similar things in your own companies.

C
The role that I played in this was really facilitator, moving the teams through a bunch of different steps of this process.

C
On the first day, we tackled identifying and understanding our problem at hand, so that we could ultimately build a strong prototype around it. We asserted that our problem was that the owners and admins of DataHub do not understand how users are interacting with the tool. So that's a big problem, right? There are a lot of technical approaches you could take to solving that.

C
So what we started doing was taking a step back and understanding how to contextualize that problem into the bigger picture of the DataHub strategy. We talked about how this fits into the long-term vision of DataHub, and we rallied around this vision that in 12 to 18 months data platform owners will want to deploy DataHub at their organization because it gives them superpowers. So right away, when we start talking about solving this problem, we want it in the context of "DataHub is going to provide an immense amount of value". How do owners understand their user activity so that DataHub can give them data superpowers?

C
And then we identified what question or questions we would be asking at the end of this process to understand if it was a success, and we rallied around: are we providing data platform owners with actionable insight? Usage analytics is not all alike; just because you have usage analytics doesn't mean it's meaningful. So we wanted to make sure that we would be able to ask concretely: do you now have actionable insights, so that you can move towards this future value of DataHub in the long run?
C
The next thing we did is we started to break down all of the potential pain points in developing or solving this problem within the current stack, and we reframed these into what's called a "how might we". Really, it's just a way to flip a problem on its head and turn it into an opportunity. So we talked about: how might we make the analytics infrastructure easy to manage, so it's not another service for operators to manage?

C
How might we give clear insights where there's poor data quality coverage but heavily used assets? That way we're trying to solve this without adding too much burden on the owners or operators of the platform, and also giving insight into where you're seeing a lot of activity and there's actually opportunity to enrich that metadata, to give folks more power there.

C
Another thing we did was talk to our experts within the DataHub community. We wanted to make sure we had a well-rounded understanding of this problem set and of how folks even thought about how product analytics would fit into their management of DataHub. Sample questions in these user interviews were things like: what are the top questions you'd like to be able to answer around user activity, and what decisions would that inform?

C
We then mapped out the user experience within DataHub so that we had a very concrete understanding of where this solution fit into that workflow. We talked about how you would install DataHub as a POC, have some step of ingesting metadata, share it with your users, gather feedback, maybe do some iteration cycles here, and from there move into feature development and improving metadata, to then feed it back into this flow. We really targeted this idea of: we are assuming that the POC exists, there is metadata, there are active users, we are gathering feedback, and we are making decisions about user activity to inform future development areas, places to improve metadata, and ways to drive adoption. So again, this really just helps us have a laser focus on where this problem fits into the vision of DataHub, the user lifecycle, etc.

C
Then we moved into sketching solutions. You can see that these came in a variety of different ways: some folks are writing with pencil and paper, some folks are whiteboarding or mocking things up with the UI. The idea is that we just start visualizing what this solution looks like. Then, day two: decide on a solution. Again, we're deciding on a solution to tackle this one big problem.

C
Once we had all the solutions up here, since we're doing this remotely, there's a bunch of little emojis or thumbs-up to show areas where we think they're good ideas, and it's really just rallying around how we are actually going to solve this.

C
So this is day two. By the time we moved into day three, we started moving towards our prototype, and I think here, Dexter...
B
So the first thing was to standardize the way usage events are produced in the React app; please check out the event schemas there. We standardized the page view events, search events, browse events and so on, where we put in enough information for us to understand where these usage events are coming from and what these events actually mean. Second was to utilize existing components of DataHub. As Maggie mentioned before, we don't want to make operators' lives even harder by adding even more components to deploy.

B
So we wanted to use whatever components we've already deployed to support an initial prototype of the analytics product. The third was: while we wanted to have this default way of using existing components, we also wanted everyone to be able to plug in their own architecture for consuming these usage events.

B
Usage events are actually posted to a Kafka stream, so anybody can plug in any consumer of choice for data collection and analytics. Operators can also wire third-party analytics tools, like Google Analytics and Mixpanel, into the React app; please check out the doc for more details on how to do that.
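As a rough sketch of "plug in any consumer of choice", the snippet below tails the usage-event topic with a plain Kafka consumer. The topic name matches the one mentioned later in the talk (DataHubUsageEvent_v1), but the broker address, group id, and the event fields printed at the end are assumptions to verify against your own deployment.

```python
# Minimal sketch using confluent-kafka; broker address and group id are placeholders.
import json

from confluent_kafka import Consumer

consumer = Consumer(
    {
        "bootstrap.servers": "localhost:9092",  # placeholder broker
        "group.id": "usage-analytics-sketch",
        "auto.offset.reset": "earliest",
    }
)
consumer.subscribe(["DataHubUsageEvent_v1"])  # usage-event topic named in the talk

try:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None or msg.error():
            continue
        event = json.loads(msg.value())
        # Field names like "type" are illustrative; inspect real events for the exact schema.
        print(event.get("type"), event)
finally:
    consumer.close()
```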
B
Unfortunately, for now you have to fork the repo, but we are going to work on making that configurable. All right, moving on, let's go to the end-to-end flow. Each component you see here is an existing component in our DataHub graph. In our React app we have the user marked here; as the user interacts with the React app, it sends the events through the track endpoint in the frontend. The frontend collects these events and posts them to a Kafka topic.

B
It will listen to DataHubUsageEvent_v1 and process these events as they come through. Note that these events are not hydrated. So what we do, for example when a user URN comes in and we want to know the details about this user, is go back to GMS: we call the remote DAO or local DAO to get the details about the entities, hydrate the entity features, and package it all into a single document, which we send over to the DataHub usage event data stream on Elasticsearch.

B
So Elasticsearch collects all these usage events. On the frontend, we created a new analytics controller which sends filter and aggregate queries to the Elasticsearch DataHub usage event data stream, where it tries to count and do a bunch of time-series analytics and things like that, to build some bare-bones charts and tables that power our analytics service, and that is fed back into our React app at the end.
A
Dexter, just one thing: maybe take a minute or two at most for the demo, I'm just looking at the timing.
B
Okay, you guys see the data? All right. What I did here was modify the consumer job a little bit so that it prints out the usage events as they come in, and we are in the usual DataHub app.

B
You can see the search event that came in: it says it queried with the query "sample", as well as search view events that talk about how many results were on the search page, and as we click around, each of the actions you take inside the entity page, the search page and the browse page will translate into a certain usage event that comes in. Now these usage events are all sent over; you can see that the Elasticsearch connector is sending the bulk request to our data stream there.

B
Once we go to the analytics beta: each of these components is configurable inside the code. In the DataHub frontend we have highlight cards, we have time-series charts, and then we have tables as well as stacked bar charts. So we created these four main visual cards that we want to support, and then we implemented all of them. You can see here, this is searches last week, and then the top search queries that come in; you can see "sample" was a top search, with five searches, as well as section views across different entity pages, so we have lineage, we have ownership, we have schema and so on, and also actions by entity type. You can see we have "update tags" here; I updated a few yesterday just to show you guys. And then we have top viewed datasets. Of course, we will be continuing to add more charts here, so it'd be great if we could get feedback. I also wanted to go over the charts that we see for our own demo instance.

B
You can see we have, amazingly, 421 weekly active users, which is crazy; thanks for using the demo. You can see the searches that are happening as well as the various search queries. So we can gather a lot of signals about what users are doing on this platform just by looking at these few charts.
C
Yeah, one thing I'll add here. Dexter, could you hide the terminal there, so you can see a full view of the dashboard? Perfect, thank you. What we're trying to do is find ways to contextualize not only activity, but also where there is opportunity to really leverage the power of DataHub.

C
So if we're thinking about the number of datasets: we have 92 datasets and half of them have owners assigned, and that's great. What that means is that we're halfway towards having fully documented datasets within DataHub. So it's not even just "what are people looking at", but "what are people looking at that's specific to the value that DataHub is driving". The other part, speaking from my perspective as a product manager managing this type of tool:

C
I want to understand how I decide where to invest my team's energy. Are people only looking at datasets? Are they looking at pipelines now, and maybe our pipelines aren't well documented? Or in the actions that they're taking, are they adding tags, are they changing owners, are they looking at ownership detail, lineage, etc.?

C
That way I can start to narrow down where to have my team and my stakeholders invest in having more robust and more meaningful metadata in there. The other thing that we're thinking about is looking at the top search queries to understand what people are even looking for in here.

C
Is it specific terms? And one thing we were talking about is, as the set of data platforms expands, do we have people coming in and searching for something like Salesforce data or Braze data, some of these other tools that maybe aren't in there? That can be a leading indicator of other ingestion mechanisms or pipelines that we need to pull in.

C
So I think we can also start to leverage this idea of finding the gap between what people are searching for and trying to do and where we're not actually meeting that demand. And like Dexter said, if you have ideas or questions about how to make this more impactful or meaningful, I will route you over to the Acryl team, but we're definitely excited to see where this heads.
A
Cool, thanks a lot Maggie and Dexter. It was a great experience, and I was talking to Young and, you know, Nick and Ben over on the LinkedIn side as well, and they've actually built a very complex and very extensive analytics capability on the product stream too, so at a future date we can get into that as well. That includes sessions and a lot of deeper analytics, so it's pretty cool what people are doing with it. All right, so coming back, we are running pretty late.

A
What we have today is a very interesting talk from Sharath. He's actually in Idaho, backpacking or something like that; he drove a thousand miles, but he was dedicated enough to pre-record a talk for all of you. So I'm going to play that right away, and he's on the meeting, so he'll be around to answer questions.
D
Hi folks, my name is Sharath and I'm here to talk to you about DataHub deployment on GCP. Before we dive into it, maybe a brief introduction about myself. I am a data engineer at Confluent; I've been one of the first data engineers at Confluent, and that means I helped set up the tech stack, the data stack, and helped build out some of the tools that we use within the data science and engineering teams. So let's talk about today's agenda.

D
Let's get into it. At Confluent, our data warehouse stack is basically a Google shop. We use BigQuery as our data warehouse, we use Cloud Composer, which is Airflow, as an orchestrator for high-volume jobs, and we use PySpark on Dataproc, which is again a managed cluster on GCP. Essentially, within BigQuery we have multiple layers; think of these as schemas, where landing is one layer.

D
When we talk about onboarding: right now, we have the data warehouse set up in a way that all of the lineage that exists within the data warehouse, which is BigQuery, will be seen in DataHub. But eventually we want to onboard various engineering teams who have Kafka streams as their input, who could have their own siloed databases. Another example: right now we have data sources where an engineering team produces data into a Kafka topic.

D
We use a connector to push the data into BigQuery, and then there are multiple layers where transformations happen, and you see a final table that could be real time, real time plus batch, or batch processing over real time. This lineage would help us identify, okay, this reporting table has this Kafka stream as a source, not just the internal data warehouse tables, but also the source stream that the data comes from. And the third part is visibility.

D
I think it's needless to say that data lineage increases the visibility of how the data flows within the system. It will also allow the engineering teams to use these emitters to emit additional information like owners, or additional lineage metadata; these emitters can then be used by folks who want to understand how the data is flowing.
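To make the "teams emit their own metadata" idea concrete, here is a minimal sketch (not from the talk) of emitting an ownership aspect for a table with DataHub's Python REST emitter. The GMS address, dataset name, and owner are placeholders, and class and helper names should be checked against the Python SDK version you run.

```python
# Minimal sketch, assuming the acryl-datahub Python package; names and URLs are placeholders.
from datahub.emitter.mce_builder import make_dataset_urn, make_user_urn
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    DatasetSnapshotClass,
    MetadataChangeEventClass,
    OwnerClass,
    OwnershipClass,
    OwnershipTypeClass,
)

emitter = DatahubRestEmitter("http://localhost:8080")  # placeholder GMS endpoint

# Attach a (hypothetical) owner to a (hypothetical) BigQuery table.
ownership = OwnershipClass(
    owners=[OwnerClass(owner=make_user_urn("jdoe"), type=OwnershipTypeClass.DATAOWNER)]
)
snapshot = DatasetSnapshotClass(
    urn=make_dataset_urn(platform="bigquery", name="reporting.orders_daily", env="PROD"),
    aspects=[ownership],
)
emitter.emit_mce(MetadataChangeEventClass(proposedSnapshot=snapshot))
```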
D
So why DataHub? We did look at a few options. I looked at Lyft's Amundsen. We also looked at one of the proprietary options: we use Metabase as a BI tool and we wanted to derive lineage using Metabase, but that had very heavy limitations. And, you know, with DataHub's architecture having Kafka in it, we really wanted to leverage that and make sure that we use Confluent's Kafka and DataHub's internal architecture to power this tool, which can help drive not only the metadata but also real-time changes and alerts that could happen over it.

D
One of the other projects that we're doing is to build streaming applications, and I think this is a good example of how streaming applications can blend in with one of these metadata tools to give us good information. Just to maybe not dive too deep, but something else that I think is important is that a lot of data warehouses always fall back on things like, okay...

D
So before we dive into the deployment itself, just a high-level overview. The first step is using Helm charts to deploy DataHub, and it really took us less than 30 minutes to do that. The second is that, using GCP, we deployed ingress services. Third, a simple cron job to load BigQuery's metadata into DataHub; this cron job is as-is because we know that the tables and columns don't change very often, so we have a weekly cron job that pushes this data. And then there's the custom DAG that helps us emit all of the lineage data into DataHub.

D
So let's get into the DataHub deployment, quickly going over the environment. As we said, we are a GCP shop at Confluent; the data warehouse and internal data team use all of the GCP services, so it only made sense for us to deploy GCP's managed Kubernetes service and use that to deploy DataHub.

D
So we have a Kubernetes service that runs DataHub right now, and our internal services, for example Cloud Composer, connect with this K8s service to emit any information.

D
One of the things that we discussed: the first step of this is basically, when you go to datahub-kubernetes you have the quickstart guide there, and we had to make very few, almost no, changes to that guide. Just some infrastructure changes, where we had to increase and decrease some of the preloaded values for CPU and memory. But apart from that, I think everything else that we used from the charts helped us deploy DataHub on GCP Kubernetes seamlessly.
D
Once it was deployed, there were two things that we wanted, that we were required, to do. One is to expose the UI, and the second is to run a GMS service ingress that would help us connect to this DataHub service to emit any data. GCP provides a good way to create ingress services.

D
If you look at this slide here, we have all of these DataHub services and prerequisites that are running, and all we have to do is click on the two services, the frontend and GMS, select them, and just create ingress services on them. When you create these ingress services, GCP automatically gives you an IP address that you can use to interact with them. In our case, the IP address from the ingress service created for the frontend is used to interact with the UI, and the second one is used to interact with Airflow, which is Cloud Composer.

D
So going back: we deployed our Kubernetes setup using Helm charts into GCP, and we created two ingress services, which again is just two clicks away. The next step is to run a simple, tiny cron job that pushes all of the BigQuery metadata into DataHub, and this cron job can be scheduled however often you feel your database is refreshed. The final step is, I think, how you operationalize your lineage data into DataHub: how do you make sure that your lineage data is captured in DataHub? I'm going to take some time and walk through some of these things that we set up, but before that, just a few components that are required for this. We figured that GCP's audit query log is a very good source for understanding the lineage of data.

D
That means every query that is run against BigQuery is captured in the audit log. So any query where a source table is a table within the information schema and the destination table is also a table within the information schema, any query that matches this condition, it is safe to say is a transformation, recorded in the audit log.

D
The audit log would capture one row where you would have the destination table as t1 and the source tables as s1, s2, s3, and when you think about DataHub, that's exactly what it is doing: it's trying to take your source tables and map them to your destination tables, and the BigQuery audit logs give you this out of the box.

D
You don't have to worry about which SQL script runs, or which SQL script is responsible for loading which of these tables. The steps you need to get this audit log into BigQuery are also pretty simple: you have the GCP logs in Cloud Logging, again a service in GCP, and a logs router can be set up to push these logs into BigQuery. They land as basically nested rows in BigQuery, and we use Cloud Composer, again Airflow, to transform this data and push it to DataHub with the emitters.
D
Thinking of the first one: on the left side you would see here that we have a logging setup, and the destination of this logging would basically be BigQuery. And there's this query on the right; I have links placed to the queries here, and I can share them during the meeting.

D
This query essentially breaks down the audit log into source and destination, just two columns, source and destination: for every destination, what are the different source tables? Once you have this information, what you're basically trying to do is use the emitter task to embed each of these source and destination tables, create a DAG out of it, and use this DAG to push the data into DataHub.

D
So the next step was basically to take all of this log data, create a sort of source-and-destination hierarchy, and then, using the template that is given by DataHub, create an emitter task; using that emitter task we create a DAG that is then executed or scheduled however frequently you want. So let's take a quick look at the DataHub emitters themselves.

D
When you think about the query, this is the query that we were talking about: what is the best way to extract the log in a way that we get sources and destinations? Once we have that data, all we are doing is creating upstream and downstream URNs, basically saying that for every destination table, let's build these upstream and downstream dependencies and push this entire task as a DataHub emitter operator. So, going back, that's exactly what we're doing here.
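As a rough sketch of that emitter-operator step (not Sharath's actual DAG), the snippet below builds an upstream-lineage MCE for one destination table and hands it to the DataHub emitter operator from the Airflow provider. The table names, Airflow connection ID, and schedule are placeholders, and module paths may vary across acryl-datahub versions.

```python
# Minimal sketch, assuming acryl-datahub with the Airflow provider installed and a
# "datahub_rest_default" Airflow connection pointing at GMS; all table names are made up.
from datetime import datetime

from airflow import DAG
import datahub.emitter.mce_builder as builder
from datahub_provider.operators.datahub import DatahubEmitterOperator

# One destination table and its source tables, e.g. as extracted from the audit-log query.
destination = "reporting.orders_daily"
sources = ["landing.orders", "landing.customers"]

lineage_mce = builder.make_lineage_mce(
    upstream_urns=[builder.make_dataset_urn("bigquery", s) for s in sources],
    downstream_urn=builder.make_dataset_urn("bigquery", destination),
)

with DAG(
    dag_id="bigquery_lineage_to_datahub",
    start_date=datetime(2021, 5, 1),
    schedule_interval="@weekly",
    catchup=False,
) as dag:
    emit_lineage = DatahubEmitterOperator(
        task_id="emit_lineage",
        datahub_conn_id="datahub_rest_default",
        mces=[lineage_mce],
    )
```

In practice you would build one such lineage MCE per destination table coming out of the audit-log query and emit them all from the same task.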
D
So this is the setup, just a quick overview of what we've done.

D
We first deploy DataHub on GCP using the Helm charts that are already provided in the quickstart guide. The next step is to create these two ingress services, which is a two-step process: just click, select the service, and create the ingress service. Next is to use the template that's already given as an example to load BigQuery metadata. And then we consume the audit logs and create emitter tasks that are then pushed into DataHub for the lineage.

D
Taking all of this into consideration, our current status is this: we have DataHub deployed in our sandbox. We don't have any of the fully managed services; for example, we don't have Cloud SQL or Confluent Kafka, and we're using all of the internal services that DataHub is packaged with. What we did do is enable OIDC authentication using Okta, and any user who wants to access DataHub now goes through Okta.

D
So there is that step of security as well, and the next part is metadata and lineage. This is pretty much automated, so we have metadata and lineage almost fully automated into our sandbox environment. And maybe this is a good opportunity for us to take a look at how this data looks, in its current form, in our sandbox environment. When we look at the data here, we have a lot of different tables, and for the purpose of this demo I've just selected one table, and this particular table has a hierarchy.

D
When we look at the hierarchy, we have a lot of downstream and upstream dependencies, and within these downstream and upstream dependencies we have nested upstream dependencies that also have subsequent dependencies; for example, this particular table can have more.

D
All of this rich dataset lineage can help when a new user is trying to understand where the data in a particular table flows: they need not worry about the code, need not worry about what the SQL does, and at a high level they can start looking at where these tables get populated and which tables they push data to and from. And the next steps, I think, for us: the next step is to productionize our current setup and make sure that we have the right versioning in our deployment.

D
We also want to make sure that we have our metadata and lineage productionized in a way that any time there's a new change, it's automatically pushed to DataHub. We want to start using managed services like Cloud SQL for MySQL, or even Confluent Kafka instead of the Kafka that's already installed as part of the charts. Also, we want to make sure that the metadata we emit is rich; for example, we want to add more column-level metadata.

D
We want to add owners to this. We also want to add processing information: right now we don't use inlets and outlets, so we're trying to see what's the best way to include the Airflow processing information in the edges. Then you have the complete view of which is the source table, which is the destination table, and which processing step actually pushed this data into BigQuery.
A
Thanks a lot. I wanted to make sure that we had time to get into the no-code metadata piece from John as well. Sharath is on the call, so if you have any questions about his deployment on GCP and how he's setting it up at Confluent, definitely ping him either here or on Slack. But thanks a lot, Sharath, for sending this over ahead of time and being able to attend even though you were remote. All right, with that, I will move things over to John.
E
Hey guys, can everyone see my screen? Yeah, okay, awesome. Thanks Shirshanka, and thanks Dexter and Sharath, those were really informative and great. Today I'm going to talk a little bit about a project the Acryl team has been working on in the past few weeks that we're calling no-code metadata.

E
I'm going to go through the problem we're looking to solve, the solution we came up with, and a little bit of deeper technical detail around that, and we'll fit a demo in there as well. So with that, let's get right into it. What's the problem we're trying to solve with no-code metadata? Well, the problem is that adding an entity to DataHub today is pretty hard. These are three PRs.

E
They each add two entities, and you can see the number of files that need to change just to add them to GMS, the backend layer we're going to focus on today: about 50 files. So we'll talk about two things. The first is just the sheer complexity of adding an entity today. It requires more than 25 files per entity in the complex case; in the average case it's about 25 files, and those files consist of models.

E
So these are the snapshots and aspects that we're all probably familiar with, search documents, relationship models, the rest.li resource values, URNs, and I'm sure I'm missing something here. Endpoints: we have entity and, optionally, aspect endpoint files, which are all separate. We have clients, which are just wrappers to actually talk to those rest.li resources once you've created the endpoints.

E
We have these things called data access object factories, so each entity has its own search, local, browse and graph DAO, which allow you to talk to the persistence layer, and as such we need each entity to have a configured factory to create these strongly typed DAOs. We have index filters, which are effectively just lambdas that take metadata change events and turn them into updates against the search index and the graph index.

E
This is the actual "onboarding a new entity" guide doc, visualized: you can see 27 steps listed here in seven different subcategories. It's just very difficult, and because of all of that complexity, it takes a long time to actually add entities, based on conversations with folks who have raised PRs to add entities.
E
We found that on average it would take one to two weeks for a new DataHub contributor to actually get up to speed with all the abstractions and add entities, and that's not even counting the back and forth that occurs on the PR after you've added 50 files, which we've seen span up to a month. So it's just too hard, right? That's the problem we're trying to solve, and that's where the no-code movement comes into play. We started to think about:

E
How can we make the process of adding an entity much simpler? Specifically, we started to rally around this goal that it should take no more than 15 minutes to add or extend a DataHub entity at the backend GMS layer. What we wanted to be in scope was the ability to read and write the new entity using a REST API, the ability to define searchable fields that are indexed in the search index, and the ability to define outward graph edges coming from that entity.

E
We wanted it to be declarative, which means you shouldn't have to write any Java code or imperative code at all, ideally, or at least we should minimize that requirement. The second thing is we wanted this to be extensible, because we are hopefully going to move towards a more declarative world; we'll have a DSL.

E
We wanted things to be extensible horizontally as new requirements pop up, and the third thing is we wanted it to be very usable: intuitive, hard to make mistakes with, well validated up front such that runtime bugs are hard to come by as a result of these changes. So now I'm going to go right into a demo of what adding an entity looks like in the new no-code world, after the work over the last few weeks, and then we'll go into how it all works under the hood.

E
I'm going to get out of the slides here and go over to this town hall demo doc I have here, and we're going to start with just modeling an entity. We're going to imagine we want to model an entity representing a service. We've talked about this before: a service is maybe like an online microservice that you want to represent in DataHub. The first thing we're going to do in the new world is actually model the aspects, and aspects are just...
E
We have the second annotation, which is pretty interesting, here on top of the name field, and that is the Searchable annotation. What this allows us to do is mark the name field as searchable and index that field based on these configurations. So we're saying this should be a field that can be partially matched, that's what we're saying here, and then we're saying we also want to support autocomplete queries against this field.

E
So if you're searching in the search box on the UI, you should be able to see autocomplete results based on the service name. The second aspect we're defining is just a set of properties. In this case we just have two simple properties, a description and an owner, and you can see again we have a Searchable annotation on description. In this case we don't provide any configurations, because we are okay with the default searchable configuration, which will simply make this a space-delimited search index field.

E
The second field has the next kind of annotation we want to talk about, the third annotation, which is the Relationship annotation. What this basically does is mark this field as representing a foreign-key relationship: an edge that extends out of the service entity and into a different entity.

E
In this case we don't put any bounds on what can be on the other side, but we do support the ability to add something like this, where you can say the entity type is corp user, which will then restrict that edge to only have a user on the other side. For this purpose, I think that makes sense here.

E
The second thing we'll need to do is just add the aspect union. This is exactly the same; nothing has changed from what happens today. We add a service aspect union which basically pulls together all of these individual pieces of metadata: the service key, the info aspect we just defined, and then there's this third thing that I'll quickly talk about, which is pretty fun, which is the new browse paths aspect. This is something that the no-code initiative has allowed us to address.

E
This is an ask that's come up repeatedly over the past few months: the ability to customize browse paths. Browse paths are what you see when you're navigating the explore hierarchy; we're seeing prod, snowflake, something else right now. All of those are generated based on hard-coded logic that sits in GMS. We've actually changed that, such that you can provide custom browse paths as a normal aspect, as you would any other metadata, and so here we're actually adding that aspect.
E
So we can provide browse paths, and we'll demo exactly how you would query for that in just a moment. The final big thing we have to do is define the entity model. This is the snapshot model that everyone's used to. We have one final new annotation that we put on the snapshot model, which is the Entity annotation: in the same way that the aspect annotation allows us to define a common name for an aspect,

E
the Entity annotation allows us to define a common name for an entity, which is globally unique. It allows us to get away from using the fully qualified model name as the de facto name for the entity, and we can talk about the benefits of that in a little while. The second piece of metadata we specify here is the key aspect.

E
So we have this ability to serialize and deserialize the service key into a struct that you can use, such that when you're querying for an entity you will always get back both an URN, which you shouldn't have to look into, and a service key aspect which you can then pull fields out of. So it's sort of this idea of a virtual aspect. And then the final thing is just adding that service snapshot to the list of all snapshots.

E
This is again nothing different from what we do today, and that's pretty much it. So, four steps, and the entity is now added. We redeploy GMS and we redeploy the MAE consumer and a few other containers, and we should now be able to interact with that entity, and that's what I'm going to show now. I've already modeled this entity and redeployed my own local versions, to save you guys some of that awkward silence time, but we're going to go ahead and try to write an entity.

E
The first thing we're doing is using a newly created generic entities endpoint, which allows you to read and write any entity in DataHub, and we're going to write this service snapshot into it. Some things to call out: the description is "my demo service", and there's an owner marked here; remember, that's the foreign-key relationship.

E
This is going to be indexed in search, and then we have these custom browse paths: "my custom browse path 1", "my custom browse path 2". An interesting thing to note is that we've also made it such that you can specify multiple browse paths, so you can actually access this entity from multiple explore traversals, which I think is a pretty cool feature. So we're going to go into this terminal and I'm going to paste this in, and hopefully everything works.
E
Okay, so nothing came back, which is good: it means there are no exceptions, and we can validate that by reading that entity using the exact same resource. This is the same set of endpoints, all generic; we'll go ahead and curl that, and you can see, okay, here we are, we've got our new entity back. It has all the data that we think it should, and we can also call out that we have that service key coming back.

E
So now you can actually ask it "what's your name" in a much cleaner way. And we're going to actually search. Here we have a search endpoint; again, it's generic, you can search across any entity. In this case we're going to search across the service entity in particular, and we're going to pass in "my demo". If you remember, our description was "my demo service", and we'll get back "my demo service", so this is saying yes, this matched your search.

E
Okay, we've got one back; "my demo service" is the browse entity there, so this seems to be working. And then finally, we have this new endpoint called relationships, which allows you to fetch arbitrary relationships between DataHub entities, and in this case what we're going to fetch is an OwnedBy relationship that is incoming into a corp user. So basically we're trying to test the inverse relationship.

E
We're saying "get me anything that I own as user one". We're going to go ahead and run that, and you can see, okay, we got the service back, so the edge has been indexed and it's available via this generic relationships endpoint now. And that's basically the demo; that's the process of adding a new entity. It takes, you know, less than 15 minutes; I think I was talking through it, so maybe 15.
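For readers who want to poke at the same generic endpoints, here is a minimal Python sketch approximating the curl calls from the demo. The "service" entity, its snapshot and aspect names, and the URNs are hypothetical (they mirror the entity John modeled locally and do not exist in a stock deployment), and the exact request shapes for the rest.li endpoints should be checked against the DataHub API docs for your version.

```python
# Minimal sketch using requests against a local GMS; the service entity below is hypothetical.
import urllib.parse

import requests

GMS = "http://localhost:8080"  # placeholder GMS endpoint
service_urn = "urn:li:service:my-demo-service"  # hypothetical URN for the demo entity

# Write: ingest a snapshot through the generic entities endpoint.
snapshot = {
    "entity": {
        "value": {
            # Hypothetical snapshot/aspect names matching the entity modeled in the demo.
            "com.linkedin.metadata.snapshot.ServiceSnapshot": {
                "urn": service_urn,
                "aspects": [
                    {
                        "com.linkedin.service.ServiceInfo": {
                            "description": "my demo service",
                            "owner": "urn:li:corpuser:user1",
                        }
                    }
                ],
            }
        }
    }
}
requests.post(f"{GMS}/entities?action=ingest", json=snapshot).raise_for_status()

# Read the entity back, then run a search against the same generic endpoints.
print(requests.get(f"{GMS}/entities/{urllib.parse.quote(service_urn, safe='')}").json())
print(
    requests.post(
        f"{GMS}/entities?action=search",
        json={"input": "my demo", "entity": "service", "start": 0, "count": 10},
    ).json()
)
```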
E
And we're going to go back to talking about how we did it. So, quickly, I'll go over what the architecture used to look like. In a nutshell, the theme is that at every one of these segments you had to define components on a per-entity basis: at the client layer you have individual classes or components for each entity, datasets, users, charts, dashboards, and that propagates over to the resource, the rest.li endpoint layer, as well.

E
You had a different set of endpoints for datasets, users, charts, dashboards, and inside of those components you had another layer: individual DAOs for searching, for writing and reading to the key-value store, and for getting relationships, and these are each specific to the individual entity. So you can see this just scales with the number of entities.

E
If you head over to the MAE consumer side, which is responsible for updating the search index and the graph store as metadata changes come through the system, you'll see that we have the exact same pattern: you have a search builder that's specific to a dataset, you have a user search builder, you have a user graph builder, you have a dataset graph builder, and so it's just scaling with the number of entities.

E
That's the common theme across all of this, and now I'll show you what the "after" looks like. This is the after: we've revised that such that we have two generic sets of endpoints. One is about entities, so it provides the ability to read, write, search and browse any entity in DataHub.

E
We have the relationship endpoints, which allow you to effectively do the same but for edges, and then we have these service classes that sit behind those, and those are the key: they're the generic read and write abstractions over the persistence layer. So we have the entity service, we have the search service, we have the graph service. And then heading over to the MAE consumer side, we followed a very similar pattern, where we have a generic entity search index builder.

E
We have a generic relationship graph builder, and this is all driven by those annotations in the model; that's how we compute what the updates need to be. I'm going to go quickly over this because we already talked through it; there are a few slides just going deeper that you can feel free to look through in your own time. But in a nutshell, here we have four new annotations, the Entity, Aspect, Relationship and Searchable annotations that we talked about, and it's very easy to just define them alongside the model as additional metadata.

E
I'll talk quickly about one of the key abstractions that unlocked all of this, which is the entity registry. The entity registry is really a runtime source of truth for both models and metadata, and it's what all of the services and the index builders now depend on to get information about the model and information about the metadata. So you can think about the storage configurations, the annotations: how should I build the search index?

E
How should I build the relationship index? A key part of this is that we decoupled the whole service and index-builder layer from the metadata models and configuration itself, such that in the future, you know, maybe we aren't using Pegasus, maybe we're using protobuf, maybe we're using something else, or maybe we have a completely dynamic entity registry where you can curl in a new schema like a database, and from the graph service's perspective, from the services' perspective,
E
and from the index builders' perspective, nothing would change, which is pretty exciting. So again, the services all allow you to do generic things to each entity and relationship. One key point here is that they're decoupled from storage technologies, so they're all based on DataHub-specific abstractions.

E
One interesting thing here is that these service classes we've introduced are now actually used both on the read path, from GMS, and on the write path, from the index builders. So the search service and graph service are common across multiple parts of the stack, which makes changing and updating things much easier, and we have one central abstraction that everything else depends on. So, just revisiting the non-functional requirements we talked about earlier and how we may have achieved them, starting with the declarative one:

E
We provide a DSL for defining models as well as storage configurations without any Java required, no coding. We defined an extensible model where it's easy to add new indexed field types; you remember the "text partial" one we saw earlier, and it's very easy to add something new there. It's easy to add new relationships, as easy as defining a new Relationship annotation, and it's easy to plug in new storage implementations. And then finally, it's usable.

E
You'll know at build time, so that we can avoid these runtime exceptions. And then there are a couple of fun features, we think, which are configurable browse paths, as well as moving away from the requirement to have these strongly typed URN PDLs as well as Java classes, which is the current requirement. So, quickly, the impact of this initiative: I think firstly the biggest impact is just the reduction in complexity across GMS.

E
That's what's unlocked from this. So I don't know if you guys want to call out or write in the chat how many files you think may have been made redundant or can be removed with the no-code initiative; if anyone has any guesses, let me know before I...

E
Now I'm going to talk a little bit about where no-code goes from here. We're going to look to actually move up the stack: right now, all of the work that you've seen is really limited to the metadata platform layer, so GMS and beyond, but we want to actually auto-generate the GraphQL API at the datahub-frontend layer, as well as explore the ability to dynamically generate UI configurations.

E
There are a couple of other things we want to do as quick follow-ups. We want to clean up all of that legacy code, so 271 files we want to get rid of if we can, and then we want to continue to expand on those APIs we've developed. They're very minimal, they're slim.

E
They do exactly what DataHub's UI needs them to do, but we can certainly foresee the ability to add new capabilities to those APIs as well. And then I'm just going to close by talking briefly about the vision of DataHub as we see it. We really want DataHub to become this sort of true metadata platform, where you can do things like dynamic model registration and storage configuration, perhaps even at runtime, as we touched on before.

E
We kind of think of this as having multiple layers, where we have the metadata storage engine responsible for access controls, index building, the commit log and maintaining that entity registry; then a set of APIs on top, both synchronous and asynchronous, so Kafka-based and REST APIs; and then on top of that, the client SDKs, which can then interact with all of those. And then finally, I'll just conclude by talking a little bit about how we get to no-code.
E
So how is this going to be released? The code itself is coming early next week, and it includes a few different things. One is a newly introduced DataHub upgrade container and CLI that allows you to actually perform the upgrade against a running instance of DataHub, which would be required to move to no-code. So there are two ways to do it: you can either restart the DataHub instance from a clean slate, and that'll just work, or you can run this DataHub upgrade CLI against a running instance

E
if you have a lot of data; we've tested this against a few hundred thousand rows in a SQL store and things look good. The second part is that we have all these guides that allow you to run the no-code upgrade against docker-compose deployments, Helm deployments, or manually if you have a different setup. But basically, in a nutshell, it'll be: deploy the new DataHub containers, run this migration script, and then verify and validate.
A
That's fine, John. I think we can all deal with the productivity gains you've gotten us; you've saved us a lot of time as well. A couple of things I wanted to point out: as we did the design exercise for this, we didn't want to boil the ocean too much, so a few things we actually kept the way they are. For example, the MCE schema stays the same for now, and even for strong types, we kept them as a constraint.

A
We actually wanted that: the models that you're using to define your entities and your aspects are still used to create serializable structs that you can use to send metadata over, so you don't lose strong types as a result of this; we just created generic entity endpoints. So I'm really excited about this, and I hope everyone is.

A
I saw a lot of good feedback in the chat, so we're looking forward to rolling this out next week and then helping you as you upgrade your ecosystems, and, you know, we'll be there online to help you out with any of this. We've tested it out internally quite heavily, and we've done lots of backwards and forwards compatibility testing, so we're comfortable that this works. But you only know when you finally run it, so we're looking forward to helping you all get over the hurdle.

A
We've actually introduced storage formats now as a concept, so the next upgrade is going to be much more seamless, because now we have versioned metadata around. So that's pretty much all we had. I was looking at the questions that came in just before the...

A
We are going to publish cloud-hosted documentation about how to run these systems on AWS and GCP as well as Azure; I think the AWS one is out. If the question is about the ETA of a cloud implementation, we are hosting DataHub on AWS, so contact us if you want the flexibility of having a hosted instance of DataHub run for you by us. On the struct type for ingestion via Hive: Harshal looked into it, and I think there are some details with how PyHive does it.

A
All right, so we'll see you in a month, but hopefully next week we get the new release out, and we're looking forward to hearing your feedback.