From YouTube: D3L2: Migrating from a Data Warehouse to a Lakehouse with Structured Streaming and Delta Lake
Description
In this D3L2 episode, we sit down with Christina Taylor, a data engineer who has worked at Carvana, Bread Finance, and the Walt Disney Company, to discuss her path from data warehousing to the lakehouse. In the process, she led her teams to an open data lake that unifies batch and streaming workloads with Delta Lake, decoupling data storage from proprietary formats and dramatically reducing data extraction costs.
Denny: Welcome to this next section of the D3L2 vidcast and podcast. My name is Denny Lee, and I'm really happy to have Christina Taylor here for our next session, about migrating from a data warehouse to a lakehouse with Structured Streaming and Delta Lake. But before we start, I really wanted to let Christina introduce herself. So, hey Christina, why don't you tell the audience a little about who you are and how you even got involved in the data engineering space?
Christina: Sure thing. Hey folks, I'm a data engineer today at Carvana, but I have kind of an interesting personal story: I originally had a background in teaching and education.
Christina: I graduated into a recession and worked in universities in career advisement for three years. I could not afford to pay rent in New York City on a teacher's salary, so, with the insight I'd gained as a career advisor, I got into data and analytics. I thought, oh okay, this is hot, let's see what's going on here. I started off as an entry-level analyst and made it all the way to analytics manager, but then I realized: okay, for people to do the cool business intelligence, visualization, and data science stuff, you really need a strong data engineering foundation.
Christina: So, let me be that janitor with a keyboard. I took a stab at joining the founding team at Disney+, learned Spark streaming and Structured Streaming on the job, and migrated from EMR to Databricks and from Redshift to Snowflake. Fast forward a few years, I grew to be the staff engineer for a fintech startup called Bread and came over to Carvana after the acquisition. So: very strong startup mentality, and I really love the Spark distributed computing framework.
Christina: And Delta Lake, I've seen that shirt. It brings a transactional nature to Parquet files and adds all these cool features like time travel, version control, and safer overwrite and replace, so I really enjoy working with that as well. So here we are.
Denny: Okay, well, this is pretty awesome, and I'm actually sort of impressed by that fact. So let's go back just a little bit, because I find your history very interesting. You originally were going to be a teacher, and you hit the recession, which sucks, of course, but then you noticed, as you were doing career advisement, that the hot topic, or the hot career, was within the realm of data engineering, and that's actually what got you involved in this space, yeah?
Denny: I love that approach to it, because it definitely continues that push for the idea of self-learning, where you're constantly educating, changing, and updating yourself based on the newest technologies and the newest pushes, which is really cool. And so then, okay, you got involved; what ultimately led you into the realm of even just Spark and Structured Streaming? I mean, I'm really happy that you like the DataFrame API, which is great.
Denny: You know, a shout-out to the Spark community that helped build that. But what got you to care about distributed processing in the first place? Because you jumped right to it right away, and I'm like, well, how did you get there? What led you to actually needing to utilize something like Spark in the first place, yeah?
Christina: So I'll take Spark first and then streaming. I became a huge advocate for open source technologies, and open formats as well, after seeing a lot of the opposite trends in some of the more traditional industries. I felt like, with the Spark framework, first of all, the code you write is naturally parallel, and in the second place, you have the support of the entire open source community for contributions, rather than just using a particular patented technology within your organization.
Christina: So that's extremely powerful, and it probably explains Spark's popularity and growth. And open format, I think I'll touch on a little bit more when we talk about data warehouses; I also feel like it's a very misunderstood but necessary concept as of today. The idea of owning your data, with no vendor lock-in, keeping it whether on-prem or secure in your own cloud environment, is really, really important. As a matter of fact, it's this kind of thinking that contributed to the $500,000 cloud cost reduction here at Carvana, so I'll discuss that.
Christina: Streaming can be real time, but I think the most powerful aspect of Spark Structured Streaming is the idea of checkpointing and disaster recovery. Coupled with technologies such as AWS SQS, a notification system, you can achieve file detection and ingestion on demand, as well as this trigger-based streaming pattern.
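To make the checkpointing and trigger-based pattern concrete, here is a minimal PySpark sketch. The bucket, paths, and schema are placeholders, and the SQS-backed file notification Christina mentions maps to features like Databricks Auto Loader's notification mode rather than anything in core Spark; the idea shown here is just a checkpointed, trigger-driven file stream.

```python
# Minimal sketch: checkpointed, trigger-based file ingestion with Structured Streaming.
# All paths, bucket names, and the schema are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StringType, TimestampType

spark = SparkSession.builder.appName("events-ingest").getOrCreate()

schema = (StructType()
          .add("event_type", StringType())
          .add("timestamp", TimestampType())
          .add("payload", StringType()))

events = (spark.readStream
          .schema(schema)
          .json("s3://example-bucket/raw-events/"))  # plain file source; a notification-based
                                                     # listing (e.g. SQS-backed Auto Loader on
                                                     # Databricks) can detect new files instead

query = (events.writeStream
         .format("delta")
         .option("checkpointLocation", "s3://example-bucket/_checkpoints/raw-events")
         .trigger(once=True)                         # trigger-based: drain new files, then stop
         .start("s3://example-bucket/delta/events_raw"))

query.awaitTermination()
# Restarting the job with the same checkpointLocation resumes exactly where it left off,
# which is the disaster-recovery property: no reprocessed files, no gaps.
```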
Denny: You know, that makes a lot of sense. I mean, exactly to your point, I think one of the key facets that a lot of people forget about when it comes to things like Spark or Structured Streaming is that one of the most complex parts of this is actually the state management: just understanding what files you processed, or what rows you processed. Having a system with that state management pre-built right from the get-go actually simplifies your life, whether you're dealing with real-time scenarios or batch scenarios. The fact that you can basically think about the problem purely in terms of latency, while the business logic stays exactly the same, I think that's a pretty powerful concept. And I think this naturally segues to exactly what the title of today's session is, which is migrating from a data warehouse to a lakehouse with Structured Streaming and Delta Lake. Well, yeah, once you give your motivation and your background: you're saying you're going to migrate from a warehouse to a lakehouse, so what was the infrastructure or the setup that you had in terms of data warehousing in the first place, and why? You know, let's just start with that, what is the current state, and then afterwards we'll discuss why the transition to the lakehouse, yeah.
Christina: For sure. Let me start with the business impact and what problems we're trying to solve with data. Being from an analytics background really gave me sympathy for stakeholders. My group at Carvana handles customer communications, so that's customer engagement, retention, workforce management, conversational AI, and so on. The crown jewel of our front-end services is called Sebastian; that's the chatbot you talk to when you visit the Carvana website.
Christina: It's often your first point of interaction with a customer, and it also routes communications to a relevant advocate. So we keep track of all customer communications, and that goes into something called a comm router service. That's a legacy monolith service that collects a lot of different things that go through it, like: a customer started a chat, the chat is assigned to an advocate, a customer clicked the escalation button, and so on. It's a lot of different things.
Christina: All mixed together, and as I said, this is a legacy service; it was never designed for analytics use cases. But then people want to help Sebastian work smarter, and help workforce management assign conversations more effectively, so we definitely want that insight. So what do we do there? Okay, this was before we had a proper data engineering team, so the quickest and fastest turnkey solution...
Christina: ...was one of these export targets; for example, Google Cloud BigQuery is one of them. From the Google Cloud Logging service, you click a button, you choose a BQ table as a destination, and boom, in a couple of minutes you have an analytics table. And I have to say that, having worked with SQL Server, Redshift, Snowflake, and Google Cloud BigQuery, BQ is actually, in my opinion, one of the most powerful analytics data warehouses there is. Unlike all the others, it's truly serverless; you don't have to provision at all.
Christina: Even some of the serverless data warehouses, like Snowflake for example, require you to somewhat size the warehouse, right? You choose, like, small or extra small, and quite often there's a contract model, so you have to have some thought in mind. But with BQ you can truly just pay as you go, and the pricing is also fairly transparent: it all depends on how much data you process in the query, and for on-demand pricing it's five dollars per terabyte. It's been pretty fast, and I've actually not had performance complaints.
Christina: Cost is not a problem until it becomes a problem, right? But one of the reasons cost got out of control is that when you're using these out-of-the-box things, you don't write any code, right, but there are also very few customizations you can make. At best, you could only partition that destination table by timestamp; BQ is smart enough to recognize that as a datetime, so it's not really partitioning on the raw high-cardinality value, but you can't do anything else.
Christina: You won't be able to cluster, or Z-order, by event name or event type, and the result is that every time somebody does analytics on that table, it's a full table scan. So think about a two-terabyte table: every time someone does a select star from the table where event equals something, it does a full table scan, so that's ten dollars per query. That gets very expensive, but it's only one problem.
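One way to see this before paying for it is BigQuery's dry-run mode, which reports how many bytes a query would scan. The sketch below uses the google-cloud-bigquery Python client with a made-up project, dataset, and table, purely to illustrate the on-demand pricing math described above.

```python
# Illustrative only: estimate on-demand query cost from bytes scanned (dry run).
# The project/dataset/table names are placeholders, not Carvana's actual setup.
from google.cloud import bigquery

client = bigquery.Client()
sql = """
    SELECT *
    FROM `example-project.logs.events`   -- time-partitioned only, no clustering
    WHERE event_type = 'chat_assigned'
"""
cfg = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
job = client.query(sql, job_config=cfg)

tb_scanned = job.total_bytes_processed / 1e12
print(f"~{tb_scanned:.2f} TB scanned, ~${tb_scanned * 5:.2f} at $5/TB on-demand pricing")
# Without clustering on event_type, the WHERE clause prunes nothing:
# on a ~2 TB table that is roughly a $10 full scan per query.
```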
Christina: This target doesn't really have schema evolution capabilities, and neither do many data warehouses; it doesn't just add the column for you. So if your event has a new field, it doesn't add it; that's not something a typical data warehouse can support. We've been very lucky that this is a legacy service and the schema doesn't change a lot, so a schema change, which means standing up a new target and doing a tedious backfill, was just a once-a-year event.
Christina: But when that happens, it's quite tricky and really annoying. Okay, so I talked about the full table scans and not being able to break out fields, right? And so, okay, we realized that one of these jobs, because we need to recompute the metric every five minutes, even though it runs against just a two-terabyte table, that job by itself costs a thousand dollars a day. It's crazy. And using DML is very tricky in BQ; it's not like in Delta Lake.
Christina: Okay, yeah, so we're like, this is really unsustainable, so what do we do? Our first try was actually to have one separate sink for each event type. And then people would keep requesting new event types and new tables, and we do use manifests to release these things, so Kubernetes Config Connector, but that quickly became a deployment nightmare. A senior developer needs to be involved, you are messing with production GCP resources, and we just cannot keep up with the demand, right?
Denny: Right, because, basically, just to roll back a little bit and provide people context: the idea is that every single time anybody wants to go ahead and create a new topic of any type, it basically generates a whole new set of tables. You actually have to have somebody like a senior person involved; it's not automated, it's not metadata-driven, and even if you could automate it, now there's a maintenance nightmare, because you have all of these tables to go keep track of.
Denny: Yeah, no, no, no! Okay, sorry, this is story time, just for the sake of posterity, for fun. Actually, if you think generating these things from Excel is bad, I had worked on a system in my past, I'm not going to mention which system it is, where they designed the pipelines using the precursor to Visio.
Denny: Oh, and SQL scripts, not code; I don't want to claim SQL is the code. So yeah, I could empathize with the pain, but it's also really enlightening that, going back to when I had done it, and I'm pretty much dating myself really nicely here, we seem to keep reinventing the wheel and doing more or less the same type of hacks, even when we're talking about completely different projects. In my case I was talking about Visio and Perl, and you're talking about Scala and SQL on BigQuery. It's just interesting that we're constantly shifting back and forth like that, making the same mistakes, I mean.
Christina: In terms of pain points, I haven't finished with all of mine yet.
Denny: Oh!
Christina: There's one last pain point for us, and that's the data export. Now, Carvana is a very interesting organization: we don't have a CTO, and our CPO's vision is for each engineering group to be run like its own independent startup. Great. So my team is a team of people with a very strong startup mentality: we roll up our sleeves, we picked our own favorite cloud tools and technologies based on our skills and interests, and other teams did the same.
Christina: So we ended up on all three clouds: the engineering operational database is in Azure, with some SQL Server stuff in GCP; the communications team is all on GCP; and the core analytics team is on AWS. It's come to the point where we need to understand, from a centralized perspective, how much business the company has done, and my comms team needs to provide data to other organizations, so we're taking data from GCP into AWS every day. So, that two-terabyte table I was talking about, right?
Christina: There are actually three bills we have to pay; I've realized this painfully. First, you select something from that big table: five dollars per terabyte. Then we pay a BQ Storage API cost, which is more than the network calls; they didn't give me a breakdown. And then we pay network egress, 12 cents per gigabyte, from...
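As a rough back-of-the-envelope check, and purely for illustration (it assumes the whole two-terabyte table crosses clouds, which is exactly what the incremental extraction described later avoids), the quoted rates add up quickly:

```python
# Back-of-the-envelope: moving ~2 TB across clouds at the rates quoted above.
table_tb = 2
select_fee = table_tb * 5.00          # $5 per TB scanned by the SELECT
egress_fee = table_tb * 1024 * 0.12   # $0.12 per GB of network egress
print(select_fee, egress_fee)         # ~$10 + ~$245 per full copy,
                                      # before the BQ Storage API charge on top
```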
Christina: The funny thing is, when you look at the bill, it just says network egress, North America. I have an English degree; I looked at the GCP pricing page, which is pretty transparent compared to a lot of other providers, and I still had to write to support and have them translate for me: hey, this is my understanding, is that correct? So we're actually paying because of the cross-cloud nature of it.
Denny: Now I get it, because even though you started off talking about how much you loved using BQ as a data warehouse, the overall operations, the overall infrastructure, resulted in the fact that you need a data platform that goes way outside the boundaries of what a traditional data warehouse can do. And so this invariably, I guess, led you to thinking about needing to build a lakehouse, per se. Basically, is that the transition, or did you end up making any steps before you decided that the lakehouse was the right approach?
Christina: A lot of factors went into consideration here. I think one of the core principles is the open format: I really like the idea of decoupling storage from the compute, and from a specific vendor or technology. This was particularly relevant because I came from fintech, and our customers at that time were very adamant that each tenant would have its own AWS account and that the data lives...
Christina: ...in our AWS account, not in some other data warehouse provider's account. So that was extremely important for us. We also like the idea of open format because one of the draws of data warehouses is that they're a very easy, turnkey solution with a great user experience, and there's usually no cost or low cost when you put data in. The price problem starts when you have this repeated compute, and there's only so much SQL can do; and then, when you want to take data out, that's where you start paying handsomely.
Denny: Oh sorry, I muted myself by accident. So basically, what it came down to is that before you even started that lakehouse discussion, you were already thinking: I need to store this in an open format, so that way you can actually make sure you're not locked into any vendor. Such that, even if that's the tool du jour today, the reality, with how fast this industry is changing and how we're introducing new systems all the time, is that maybe tomorrow you want to use blah, right, and this project blah is great.
Christina: Yeah, exactly, and it's a super important consideration for us, and sadly one of the least covered aspects when people are evaluating these technologies. It's like worse than a divorce.
Denny: That's pretty brutal. All right, so let's switch gears. So then, that invariably, I guess, led you, before even the lakehouse, to Delta Lake. I mean, is that what made you go to the open format, or did you start with Parquet first, or JSON first? I'm just curious, what was the modus operandi for you?
Christina: Yeah, so we've worked on both semi-structured data and structured data in the past, and there were a lot of legacy systems that are still writing Parquet, but I think of Delta as supercharged Parquet files. It really enforces the schema, so it's not like you can append two incompatible schemas together, just layer them over each other in a Parquet file, and then have to make these expensive merge-schema calls every time. And it's transactional in nature.
Christina: So you don't have to worry about partially corrupted files, and it has schema evolution capabilities, which was really important for us, so we don't have to redeploy a Delta sink every time the service wants to change the schema. And perhaps one of the more powerful features we like is how easy it is to do an upsert; on a Parquet table, if you want to do an update, it's like a seven-step process.
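For contrast, here is roughly what those two features look like with the delta-spark library; this is a minimal sketch, and the table path, key column, and the `new_events` and `updates` DataFrames are hypothetical, assuming an active SparkSession.

```python
# Sketch of Delta Lake schema evolution and an upsert (placeholder paths and columns).
from delta.tables import DeltaTable

# Schema evolution: new columns appearing in `new_events` are added to the table
# on write instead of requiring a redeploy and backfill.
(new_events.write
 .format("delta")
 .mode("append")
 .option("mergeSchema", "true")
 .save("s3://example-bucket/delta/events_bronze"))

# Upsert: a single MERGE replaces the multi-step rewrite a plain Parquet table needs.
target = DeltaTable.forPath(spark, "s3://example-bucket/delta/events_bronze")
(target.alias("t")
 .merge(updates.alias("s"), "t.event_id = s.event_id")
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 .execute())
```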
Denny: Oh no, I mean, don't get me wrong, that's pretty awesome. So basically the features of Delta Lake in and of itself, plus the openness, allowed you to feel comfortable saying: we're not going to need to worry about those egress costs anymore; we have this one lake that allows us to put all of our data in and then process and query our data against it.
Christina: What we ended up doing for the legacy service is that we first ship the data to cloud storage, and then we take it out from cloud storage. We have to pay the network cost, but it's something that only has to be done once, right? And because of Structured Streaming, this extraction is incremental in nature. It's not like every time anyone wants to query an event I pay this BigQuery select fee, plus the Storage API cost, plus network.
Christina: No, I only have to do this once, and once the data has landed in the bronze Delta table, there are very interesting things we can do. Not only can we partition on event date, we can Z-order on event type, so it's like a multi-dimensional index and the lookup is much faster. And we can also do interesting things that we were not able to do in SQL, and that's called star expansion in PySpark, because what we're really only interested in is just the JSON payload.
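A sketch of what that looks like in PySpark follows; the paths, column names, and payload schema are assumptions, and the OPTIMIZE ... ZORDER BY command is a Delta Lake feature (Databricks, and Delta Lake 2.0+ in open source). The payload column is parsed once and then "star-expanded" into top-level columns.

```python
# Star expansion of a JSON payload column, plus Z-ordering the bronze table.
# Paths, column names, and the payload schema are illustrative placeholders;
# assumes an active SparkSession `spark` with Delta Lake configured.
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StringType

payload_schema = (StructType()
                  .add("conversation_id", StringType())
                  .add("advocate_id", StringType()))

bronze = spark.read.format("delta").load("s3://example-bucket/delta/events_bronze")

expanded = (bronze
            .withColumn("payload", F.from_json("payload", payload_schema))
            .select("event_date", "event_type", "payload.*"))   # star expansion

# Z-order co-locates rows by event_type within the files of each partition,
# so a lookup on event_type no longer has to scan the whole table.
spark.sql("""
  OPTIMIZE delta.`s3://example-bucket/delta/events_bronze`
  ZORDER BY (event_type)
""")
```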
Denny: That's pretty cool. So basically, because of the nature of combining PySpark with Structured Streaming and Delta Lake, you're actually able to have that reliable store, and because of the JSON payload attributes that you're pulling out, even though we sort of bashed on SQL a little bit before, it's actually nice and simple now. So for any of the analysts that want to get access to it, it's a very simple SQL statement for them to extract this data.
Christina: Exactly. So we have all incoming messages passing into a bronze Delta sink that's being incrementally ingested using Structured Streaming. We don't do it 24/7; we use trigger availableNow, so we update this bronze sink once an hour. And we do use a data medallion architecture here, so bronze, silver, gold, and the people who are writing the silver and gold jobs actually have a lot more understanding of which attributes to look for for a particular event type.
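A minimal sketch of that hourly bronze ingestion is below, assuming the exported event files land under a cloud storage prefix; all names and the schema are placeholders. The availableNow trigger processes whatever is new and then stops, so the same streaming query can be scheduled like an hourly batch job.

```python
# Hourly bronze ingestion: a batch-style run of a streaming query (placeholder names).
# Assumes an active SparkSession `spark` with Delta Lake configured.
from pyspark.sql.types import StructType, StringType, TimestampType

schema = (StructType()
          .add("event_date", StringType())
          .add("event_type", StringType())
          .add("timestamp", TimestampType())
          .add("payload", StringType()))

raw = (spark.readStream
       .schema(schema)
       .json("gs://example-bucket/pubsub-export/"))

(raw.writeStream
 .format("delta")
 .option("checkpointLocation", "gs://example-bucket/_checkpoints/events_bronze")
 .partitionBy("event_date")
 .trigger(availableNow=True)        # drain whatever has arrived, then stop
 .start("gs://example-bucket/delta/events_bronze")
 .awaitTermination())
```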
Christina: So even though the bronze table is a little ugly, okay, it has 100 columns and lots of them are nulls, this is overall a good trade-off between the number of things you have to develop and the cleanliness. Ideally, probably one event group should have one destination, but as I said, this service really wasn't designed with analytics in mind. A schema registry and publishing events to a queue, that's the holy grail!
Denny: So that way, in case you ever need to go back, because the business logic was wrong or there was some failure in processing, you can always go back to the original source and reprocess it from there. And then subsequently, as you called out, there's the Delta medallion architecture, where we're talking about a data quality framework going from bronze to silver to gold: the bronze is where the data drops down, silver is filtered, and gold is basically the business-level aggregations or roll-ups.
Christina: That is exactly it, yeah, exactly. And there are so many cool things you can do with Spark Structured Streaming that with a data warehouse sink you would not be able to easily accomplish, like streaming aggregation and streaming deduplication. So, because we...
Christina: There's actually a lot of consideration that went into this design. From the log router, there are actually several destinations you can choose other than BigQuery; you could choose Cloud Storage or Pub/Sub, for instance. But if you use Cloud Storage as a native destination, the file gets rotated every hour.
Christina: So that's a huge latency penalty, and you also don't have a lot of control over the layout of the storage; you can't do things like lexical ordering for Structured Streaming optimization. And then, okay, we tried Pub/Sub, and technically you don't have to use Spark Structured Streaming, there's a Python library that can ingest from a Pub/Sub source, but we like Structured Streaming because of the fault tolerance and checkpointing mechanism, and we really don't want to worry too much about tuning the Python job and worrying about, okay...
Christina: ...how many messages go in a batch and that sort of thing; Spark handles that for us. And we actually used an Apache Beam job, also open source technology, running on Dataflow, which is Google Cloud's managed streaming pipeline, to copy the Pub/Sub messages to cloud storage. Why? Because we want that infinite retention and easy backfill: Pub/Sub doesn't retain messages forever, but analytics events are a mission-critical data set for us.
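Very roughly, that Beam job has the shape sketched below; the subscription, bucket, and pipeline options are placeholders, and Google's pre-built "Pub/Sub to Text Files on Cloud Storage" Dataflow template covers the same pattern, so treat this as an illustration rather than the team's actual code.

```python
# Rough sketch of a Beam/Dataflow job copying Pub/Sub messages to Cloud Storage
# in fixed five-minute windows. Subscription and bucket names are placeholders.
import apache_beam as beam
from apache_beam.io.gcp.pubsub import ReadFromPubSub
from apache_beam.io import fileio
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms import window

# Add runner/project/region/temp_location options to actually submit to Dataflow.
options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (p
     | "ReadPubSub" >> ReadFromPubSub(
           subscription="projects/example-project/subscriptions/analytics-events")
     | "Decode" >> beam.Map(lambda msg: msg.decode("utf-8"))
     | "FiveMinuteWindows" >> beam.WindowInto(window.FixedWindows(5 * 60))
     | "WriteFiles" >> fileio.WriteToFiles(path="gs://example-bucket/pubsub-export/"))
```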
Christina: It impacts the way that we train Sebastian and the way advocates get evaluated on their speed to answer, so we want to be correct as well; it's a lot of balancing between speed, reliability, and scale. So we ended up exporting Pub/Sub messages to cloud storage every five minutes, and right now we trigger the ingestion pipelines to run every hour. But another factor that we really like about Structured Streaming is that it's actually one unified API for both batch and streaming.
Christina: So you use trigger availableNow for a batch-style workload, and with the same code and data layout you can change it to go real time. All of our bronze and silver jobs are streaming aggregations in nature, because we know that all the events should be received within five minutes, that's how the file rotation works, so we're able to compute data as they arrive near each other. We're using a watermark with a window of 10 minutes for...
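A sketch of what those streaming operations look like follows; the column names and paths are assumptions. The watermark tells Spark how long to keep state for late events before a window is finalized, which is what bounds the state for both the deduplication and the aggregation.

```python
# Streaming deduplication and a windowed aggregation with a 10-minute watermark.
# Column names and paths are illustrative; assumes an active SparkSession `spark`.
from pyspark.sql import functions as F

events = (spark.readStream
          .format("delta")
          .load("s3://example-bucket/delta/events_bronze")
          .withWatermark("event_timestamp", "10 minutes"))

# Drop duplicate deliveries of the same event; state is bounded by the watermark.
deduped = events.dropDuplicates(["event_id", "event_timestamp"])

# Streaming aggregation: counts per event type in 10-minute windows, emitted once
# the watermark says no more late data is expected for that window.
counts = (deduped
          .groupBy(F.window("event_timestamp", "10 minutes"), "event_type")
          .count())
```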
Denny: There's no change in business logic outside of literally a line of code that you comment out. I think Reynold Xin had actually explained this when he talked about Spark streaming all those years ago; I believe the term was continuous applications.
Denny: I'm referring to, I think, a Spark Summit from about four years ago, where the context was basically: what's great about Spark streaming is the idea that you don't actually have to think about streaming anymore; you're decoupling the business logic from the latency, basically, yeah.
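Continuing the sketch above, that one line of code is literally the trigger on the write; everything upstream of it stays the same. The paths are placeholders.

```python
# Same business logic, different latency: only the trigger line changes.
writer = (counts.writeStream
          .format("delta")
          .outputMode("append")
          .option("checkpointLocation", "s3://example-bucket/_checkpoints/events_silver"))

# Batch-style, scheduled run:
writer.trigger(availableNow=True).start("s3://example-bucket/delta/events_silver")

# Near-real-time run: comment out the line above and use a processing-time trigger instead.
# writer.trigger(processingTime="1 minute").start("s3://example-bucket/delta/events_silver")
```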
Christina: Exactly. So, at a high level, these are the things we consider when we design a pipeline: what is the format of the data, where does it come from, how often does it get updated? But, more importantly, the one-time historical load will be easy; how do you accommodate new data, and how do you avoid recomputing historical data? And that's exactly what Structured Streaming offered us, with these window-based functions and streaming deduplication and aggregation.
Denny: Got it. So, almost as a wrap-up then, it seems to me that what it boils down to is that you've migrated, like the title says, from a data warehouse to a lakehouse with Structured Streaming and Delta Lake. Structured Streaming, as we've talked about, gives you the ability to deal with batch and streaming at the exact same time, and streaming aggregation and streaming deduplication are amazingly powerful tools.
Denny: Delta Lake gives you the reliability, but the crux of it all is that it's all open. And so, from your perspective, the migration from the warehouse to a lakehouse really is because you have multiple clouds, multiple warehouses, all these different systems where there are egress costs and ingress costs, and you're basically trying to prevent that lock-in. In essence, that's what it boils down to: if I switch to a lakehouse, I've got an open system, and then you're able to build whatever you...
Christina: Yeah, so it's not only the vendor lock-in, but cloud lock-in as well, right? I mean, it's easy if all of your infrastructure is, say, in AWS us-west-2, but the reality, especially at bigger organizations, is that we increasingly need to live with a multi-cloud environment. So that's where I think open format really shines.
Denny: Got it, got it, cool. I mean, this is probably a good end for this particular episode; this has been super interesting. Any tidbits, any advice you want to give to people who are facing the very thing that you're doing right now, which is multi-cloud, multi-system migration from legacy? Any other little tidbits? Because this has been super interesting, super helpful, and I just want to leave you the last little note, basically, yeah.
Christina: I guess one of the most important lessons that I have learned is that it's actually incredibly difficult to compare vendors or clouds. You're always comparing apples to oranges, and the cost estimate itself could require a separate degree; I really wish we had one for cloud cost understanding, or cost reduction. But I think I've been fortunate to be able to work in both AWS and GCP and understand some of the nuances when it comes to network ingress and egress, but I...
Denny: Thank you very much for saying that; this has been a wonderful session, and you've been super helpful. So, for any of the folks that are watching us on the vidcast, or eventually on Spotify for the podcast, join us on the Slack at go.delta.io if you have any questions. But Christina, I want to say thank you very much for taking your time to speak to me today.