From YouTube: Product Recommendation system on Kubernetes using Argo Workflows by Osinachi Chukwujama

Description

DataOps is the process of automating the delivery of data from where it is produced to where it is analyzed. It mainly consists of three processes: Extract, Transform, and Load. These processes are automated using pipelines. A pipeline consists of extract, transform, and load tasks that are connected using a scripting language, often Python. You can set up a pipeline using a tool like Apache Airflow, but doing so means giving up some of what Kubernetes offers. If you want to take advantage of Kubernetes autoscaling, consider using Argo Workflows for your pipelines. Argo Workflows is a tool for setting up pipelines that run on Kubernetes. For example, you can set up a pipeline that extracts insight from IoT time-series data using components like Spark and MinIO, deployed on Kubernetes. This talk is aimed at data engineers and DevOps engineers looking to set up data pipelines on Kubernetes. It walks through a typical setup of a product recommendation system, using Argo Workflows to define the pipeline, MinIO to store blob files, and Apache Spark on Kubernetes to run the analysis tasks.
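
To make the extract, transform, and load idea concrete, here is a minimal Python sketch of the three tasks chained into one pipeline. It is an illustration only and is not taken from the talk: the sample orders, the function names, and the co-purchase counting used as a stand-in for a recommendation step are all assumptions. In the setup the talk describes, each task would more likely run as its own step in an Argo Workflows pipeline, with MinIO as the store and Spark doing the analysis.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw input; a real pipeline would read this from object storage.
RAW_ORDERS = [
    "order1,apple,bread",
    "order2,apple,bread,milk",
    "order3,bread,milk",
]


def extract(lines):
    """Extract: parse each raw line into a list of product names."""
    return [line.split(",")[1:] for line in lines]


def transform(baskets):
    """Transform: count how often each pair of products is bought together."""
    pairs = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            pairs[(a, b)] += 1
    return pairs


def load(pairs, sink):
    """Load: write the pair counts into a dict standing in for a data store."""
    sink.update(pairs)
    return sink


def run_pipeline(lines):
    """Chain the three tasks in order, as a pipeline would."""
    return load(transform(extract(lines)), {})


result = run_pipeline(RAW_ORDERS)
```

Products that often appear together in `result` (here, apple with bread, and bread with milk) are the kind of signal a recommendation step could build on.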

---
KCD Africa 2022 is the 2nd iteration of the Kubernetes Community Days Africa, a CNCF-powered free community event. Visit https://kcdafrica.com for more information.