From YouTube: Scaling Machine Learning Workflows to Big Data with Fugue - Kevin Kho, Prefect & Han Wang, Lyft

Description



Data scientists often use Pandas for data that fits on a single machine, and Spark or Dask for larger datasets that need distributed computing power. What happens, though, when the data starts small and then grows beyond what Pandas can handle? Data scientists often find themselves reimplementing the same code to move to Spark, so even code with identical business logic ends up with two separate implementations. Fugue is an open-source abstraction layer that addresses this. In this talk, we'll show how Fugue lets users port native Python code to Spark or Dask with minimal code changes. With Fugue, data science code is written in a framework-agnostic and scale-agnostic manner, so it can be ported to different execution environments. We demonstrate this by scaling data compute from a single machine to a Spark cluster set up on Kubernetes.
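
To make the idea concrete, here is a minimal sketch of what "native Python code ported with minimal changes" might look like. It is not taken from the talk: the function add_fare_per_km, the column names (ride_id, fare, distance_km), and the sample rows are hypothetical. The business logic is plain Python with type annotations; the helper run_with_fugue passes it to Fugue's transform function, where the engine argument selects the backend. Running on Spark or Dask assumes the corresponding Fugue extras and a reachable cluster are available.

```python
from typing import Any, Dict, List


# Business logic in native Python: no Pandas, Spark, or Dask imports.
# Column names are hypothetical and chosen only for illustration.
def add_fare_per_km(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not mutated
        new_row["fare_per_km"] = row["fare"] / row["distance_km"]
        out.append(new_row)
    return out


def run_with_fugue(rows: List[Dict[str, Any]], engine: Any = None):
    """Run the same function through Fugue.

    engine=None runs locally on Pandas; engine="spark" or engine="dask"
    would run on those backends, assuming they are installed and configured.
    """
    import pandas as pd
    from fugue import transform

    return transform(
        pd.DataFrame(rows),
        add_fare_per_km,
        schema="*, fare_per_km:double",  # output = input columns + new column
        engine=engine,
    )


if __name__ == "__main__":
    sample = [
        {"ride_id": 1, "fare": 12.0, "distance_km": 4.0},
        {"ride_id": 2, "fare": 30.0, "distance_km": 10.0},
    ]
    # The function is testable on its own, without any framework.
    print(add_fare_per_km(sample))
```

The point of the sketch is that add_fare_per_km never changes: moving from a laptop to a cluster would only change the engine value passed to run_with_fugue, rather than requiring a second implementation of the same logic.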