From YouTube: Stream Processing at Scale with Decentralized Data Pipelines - Bernhard Borges

Description

Data is the lifeblood of decision making, and many enterprises have been trying to evolve into data-driven organizations. Despite significant resources allocated to these transformations, progress has been limited. Outdated notions of centralized data storage and ill-fitting tooling, such as traditional ETL, ELT and the data pipelines in between, are generally recognized as the root impediments to achieving the desired transformation. Contemporary notions such as the (decentralized) Data Mesh and the Data Product have been at the forefront of realizing the benefits of becoming data-driven.

A revised notion of data pipelines, and of how they are implemented, plays a critical role in enabling the Data Mesh. We introduce a model for processing potentially large data streams in near real time with distributed, and eventually decentralized, data pipelines. Data pipelines in our model are intended to verifiably integrate data from different sources and to provide data preparation, transformation, integration and analytics suitable for use across data ownership boundaries. We provide an implementation example of our model and propose that modern, decentralized data pipeline models play an integral part in enabling (trustless) privacy-preserving technologies, such as fully homomorphic encryption (FHE), at scale.