From YouTube: Adding CI/CD to ML Pipeline with MLflow, Dagster, and Github Actions

Description

This is an ML pipeline with both CI & CD components*.

The story today is about CD - Continuous Deployment**. I’m assuming that, as the data scientist, I don’t have access to the production environment (no “one-click deploy to prod” for me)***.

I develop my model (new feature branch) locally, using our dev database and local compute. I track experiments with MLflow and orchestrate my code with Dagster.

Once satisfied with my new model, I do the following to deploy:
- Check my code into a feature branch in source control
- Open a pull request to the dev branch of our code base
- Kick back and watch

This starts a CI job via GitHub Actions that:
- Builds my project
- Tests my code
- Deploys to the dev environment (teeny little deployment :)

If the CI job succeeds, another team member merges my code into the dev branch.

The merge into dev automatically triggers (dare I say, continuously) a deployment job:
- Deploys my code to Staging
- In my case, this job re-runs my ML pipeline and tests on staging data
- The benefit here is that typically staging data is closer to production data than whatever I was using in dev
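To make the "same code, different data" idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: `EnvConfig`, `run_pipeline`, the toy mean-predictor "model", and the hard-coded rows standing in for dev and staging databases. It is not the Dagster/MLflow code from the video.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Sequence


@dataclass(frozen=True)
class EnvConfig:
    """Hypothetical per-environment settings; only the data source differs."""
    name: str
    load_rows: Callable[[], Sequence[float]]  # stands in for a database query


def run_pipeline(config: EnvConfig) -> dict:
    """Train and evaluate from scratch on the data this environment can see."""
    rows = list(config.load_rows())
    split = max(1, len(rows) * 3 // 4)  # first 75% for training, rest held out
    train, holdout = rows[:split], rows[split:]
    model = mean(train)  # toy "model": always predict the training mean
    mae = mean(abs(y - model) for y in holdout) if holdout else 0.0
    return {"env": config.name, "model": model, "holdout_mae": mae}


# Made-up rows standing in for the dev and staging databases.
DEV = EnvConfig("dev", lambda: [1.0, 2.0, 3.0, 4.0])
STAGING = EnvConfig("staging", lambda: [10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 14.0, 10.0])

dev_result = run_pipeline(DEV)
staging_result = run_pipeline(STAGING)
```

The pipeline function is identical in both calls. Only the config changes, so the staging run produces its own model from staging data rather than reusing the one trained in dev.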

If the staging run succeeds, this job initiates a manual review process for deployment to prod:
- Prompts an admin to review my code and choose whether to run the final job in the workflow: deploying to production.
- As the admin, I approve the deployment and the CD job finishes by training and deploying the model in prod.
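The whole flow, from pull request to prod, can be sketched as a small function that maps a git event to the jobs it triggers. This is only a model of the workflow described here, not the actual GitHub Actions configuration. The event names, job names, and the `jobs_for` function are all hypothetical.

```python
from typing import List


def jobs_for(event: str, staging_ok: bool = True, admin_approved: bool = False) -> List[str]:
    """Return the jobs a git event would trigger, in order (all names hypothetical)."""
    if event == "pull_request_to_dev":
        # CI: build, test, and a small deployment to the dev environment.
        return ["build", "test", "deploy_dev"]
    if event == "merge_to_dev":
        # CD: deploy to staging and re-run the pipeline on staging data.
        jobs = ["deploy_staging", "rerun_pipeline_on_staging"]
        if not staging_ok:
            return jobs  # a staging failure stops the workflow here
        jobs.append("request_admin_review")
        if admin_approved:
            # The final job only runs after a human approves it.
            jobs.append("train_and_deploy_prod")
        return jobs
    return []  # any other event triggers nothing in this sketch
```

Note that there is no path to `train_and_deploy_prod` that skips staging or the admin review, which is the point of the gate.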

There's a lot of hand-waving in this example, but I hope it shows how a git-based workflow moves code between environments, and how much work it takes to actually deploy an ML project.

* This is part of my ongoing saga to get closer to something resembling an actual production deployment, instead of the notebook-based fit/predict/API patterns you tend to see.

** I have a tendency to use CI and CD interchangeably (read: incorrectly). Setting up this example has really helped clarify where

*** In my examples, I only move code between environments, never models. This is the pattern I see most often with ML teams. Alternatively, you might promote the trained model itself from dev to staging to prod.

Continual - we're lucky to work with ML teams that care about software engineering best practices. If that sounds like you, please hit us up.
#python #ml #dagster #mlflow

Feel free to connect with me on LI: https://www.linkedin.com/in/gustafrcavanaugh/