From YouTube: Lightning Talk: CI/CD for Data Pipelines with Argo Workflows - J.P. Zivalich, Pipekit

Description

Making sure that data pipelines don't break in production is no small feat. To address this, a new paradigm is emerging in the data space, influenced by modern software engineering discipline. It borrows heavily from CI/CD concepts but applies them to data pipelines. In this talk, we'll go over how to implement CI/CD for data pipelines with Argo Workflows as the data pipeline orchestrator. We'll cover storing Workflows and WorkflowTemplates in git, validating them on pull requests, and syncing them to one or more clusters when releasing new versions of the data pipeline. Additionally, we will cover how to test WorkflowTemplates using sample data, so that teams can be sure all base, corner, and edge cases are validated every time they change their production workflows.
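
As a rough illustration of the "validate on pull requests" step, here is a minimal Python sketch a CI job could run. It is not taken from the talk. The `workflows` directory name and the helper names `find_manifests`, `lint_command`, and `main` are hypothetical. It assumes the `argo` CLI is installed on the CI runner, and that the installed version supports `argo lint --offline` (worth confirming with `argo lint --help`).

```python
from pathlib import Path
import subprocess

# Assumption: manifests are stored as YAML files in the repository.
MANIFEST_SUFFIXES = {".yaml", ".yml"}


def find_manifests(root):
    """Return all YAML files under root, sorted for stable output."""
    root = Path(root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix in MANIFEST_SUFFIXES
    )


def lint_command(manifests, offline=True):
    """Build an `argo lint` invocation for the given manifest paths."""
    cmd = ["argo", "lint"]
    if offline:
        # Lint without contacting a cluster; assumes the CLI supports it.
        cmd.append("--offline")
    cmd.extend(str(p) for p in manifests)
    return cmd


def main(root="workflows"):
    """Lint every manifest under root and return the linter's exit code."""
    manifests = find_manifests(root)
    if not manifests:
        print(f"no manifests found under {root}")
        return 0
    # A non-zero return code should fail the pull request check.
    return subprocess.run(lint_command(manifests)).returncode
```

A CI step would call `main()` and exit with its return value, so that a lint failure blocks the merge. Syncing to a cluster and testing with sample data are separate steps that this sketch does not cover.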