24 Aug 2023
Tony Tzeng, Chief Product Officer, OctoML
Abstract:
Generative AI is suddenly the hottest workload on the planet. There isn’t a company on earth that isn’t working to build LLMs, audio, image, and video generation features into existing apps (or build new ones from scratch). If the hype around generative AI is an oncoming hurricane, product and engineering leaders are standing in the eye of the storm.
In this talk, OctoML's Chief Product Officer Tony Tzeng will share strategies for how to balance the competing priorities of speed, cost, and quality in AI deployments. This presentation is for leaders of AI, engineering, and product teams, from startups to enterprises.
You will learn:
- Methods for selecting the right model to work with (build vs. buy, open-source vs. closed)
- Why fine-tuning and customization are the secret ingredients for killer generative AI apps
- The practical and cost benefits of hardware flexibility for AI workloads
- How to leverage model optimizations to improve user experience and reduce cost
- 1 participant
- 18 minutes
15 Aug 2023
Learn the three main techniques to customize your image generation models.
- 5 participants
- 18 minutes
15 Aug 2023
Instantly transform any image with one prompt. Powered by OctoAI.
Filmed on 8/3 at Seattle Tech Week at OctoML HQ.
- 8 participants
- 47 minutes
22 Jun 2023
Introducing OctoAI, a self-optimizing compute service to run, tune, and scale models so you can focus on building AI-powered applications that wow your users.
OctoAI is currently in public beta and available to try at octoai.cloud.
- 11 participants
- 56 minutes
7 Apr 2023
Learn to unlock deep insights into model performance with OctoML's Profiler
In this session, you’ll learn to use the OctoML Profiler to gain insight into the end-to-end performance of a prediction function on popular CPUs and GPUs using different compilation techniques. This includes the PyTorch 2.0 native compiler, TorchInductor!
Use these insights to optimize your deep learning application and right-size your hardware selections to avoid costly over-provisioning. The best part? It can all be completed from your development environment.
An and Ben will conduct a live tutorial using Google’s Flan-T5 transformer model. They’ll demonstrate how to add a few lines of code to any PyTorch inference to see how an optimized version of that code will perform on popular GPU and CPU hardware targets. By the end, you’ll have the skills to profile your own PyTorch models so you can confidently select the right model, software configuration, and hardware for your AI application.
Try Profiler yourself by signing up here https://profiler.app.octoml.ai/
Here are the GitHub instructions to get started https://github.com/octoml/octoml-profile
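To make the idea of profiling a prediction function concrete, here is a minimal local timing sketch in plain Python. It is not the OctoML Profiler API and does not run on remote hardware; `profile_callable` and `fake_predict` are hypothetical names introduced for illustration. In practice you would pass in a function that calls your own model.

```python
import statistics
import time
from typing import Callable, Dict


def profile_callable(fn: Callable[[], object], warmup: int = 3, runs: int = 20) -> Dict[str, float]:
    """Time a zero-argument prediction function and return latency stats in milliseconds."""
    # Warm-up calls let caches, lazy initialization, and any JIT compilation settle
    for _ in range(warmup):
        fn()

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)

    return {
        "runs": float(runs),
        "mean_ms": statistics.mean(samples),
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
    }


def fake_predict() -> int:
    # Hypothetical stand-in for a model's prediction function
    return sum(i * i for i in range(10_000))


if __name__ == "__main__":
    print(profile_callable(fake_predict))
```

Running the same harness against each candidate model or software configuration gives comparable latency numbers, which is the kind of comparison the session describes doing across hardware targets.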
- 2 participants
- 32 minutes
2 Nov 2022
OctoML CLI 🔎
https://bit.ly/OctomlYOLO
YOLOv5 Model Deployment Tutorial 🔎
https://github.com/octoml/octoml-cli-tutorials/tree/main/tutorials/yolov5
YOLOv5 Object Detection Model Deployment to Docker Desktop (Full tutorial)
Object detection is a highly popular use case of the famous YOLO computer vision model. It is widely used in many industries and its popularity continues to grow. In this video we deploy the YOLOv5 machine learning model and pass input images and videos into it. In the end we get output videos and images with object detection bounding boxes.
This follow-along tutorial is designed to help you quickly get YOLOv5 computer vision models deployed to your local computer for inference. Below you’ll be introduced to the OctoML CLI, a free command line utility that packages machine learning models into deployable Docker containers with NVIDIA Triton Inference Server. When you’re ready to deploy to production, OctoML CLI can also be used to accelerate and deploy YOLOv5 to over 100 instance types in AWS, Azure and GCP.
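Once a model is running in a Triton Inference Server container, a client typically talks to it over HTTP using the KServe v2 inference protocol. The sketch below only builds the request URL and JSON body; it sends nothing. The helper names, the `images` input name, the `yolov5s` model name, and the tiny tensor shape are assumptions for illustration, so check the model configuration inside your own container for the real values.

```python
import json
from typing import List


def build_infer_request(input_name: str, shape: List[int], data: List[float]) -> str:
    """Build a JSON body in the KServe v2 inference format accepted by Triton's HTTP endpoint."""
    expected = 1
    for dim in shape:
        expected *= dim
    if expected != len(data):
        raise ValueError(f"shape {shape} needs {expected} values, got {len(data)}")

    body = {
        "inputs": [
            # Tensor data is sent flattened, in row-major order
            {"name": input_name, "shape": shape, "datatype": "FP32", "data": data}
        ]
    }
    return json.dumps(body)


def infer_url(host: str, model_name: str, port: int = 8000) -> str:
    # 8000 is Triton's default HTTP port; the model name depends on how the container was packaged
    return f"http://{host}:{port}/v2/models/{model_name}/infer"


if __name__ == "__main__":
    # A tiny 1x3x2x2 tensor keeps the example readable; a real image input is much larger
    payload = build_infer_request("images", [1, 3, 2, 2], [0.0] * 12)
    print(infer_url("localhost", "yolov5s"))
    print(payload)
```

The body can then be sent as a POST request with any HTTP client. The response carries the raw output tensors, which still need post-processing to become bounding boxes.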
🔑 TIMESTAMPS
================================
00:00 - Intro
00:55 - Download Docker
03:52 - YOLOv5 Model
07:35 - OctoML CLI
11:26 - Recap
12:17 - Model Inference
- 1 participant
- 14 minutes
11 Oct 2022
Did you know that over 80% of Machine Learning models never make it into production? But it doesn’t have to be this way!
In her talk, Automated ML Deployment: A Single Stack for Hardware Independence and Maximization of Performance Per Cloud Dollar, Vanessa Yan, Staff Product Manager at OctoML, will teach you how to automate the work away.
For more information on #YOLOVISION22 please visit: https://ultralytics.com/yolo-vision
Ultralytics ⚡ resources
- About Us – https://ultralytics.com/about
- Join Our Team – https://ultralytics.com/work
- Contact Us – https://ultralytics.com/contact
YOLOv5 🚀 resources
- Vision API – https://ultralytics.com/yolov5
- GitHub – https://github.com/ultralytics/yolov5
- Wiki – https://github.com/ultralytics/yolov5...
- Tutorials – https://github.com/ultralytics/yolov5...
- Docs – https://docs.ultralytics.com
- 1 participant
- 24 minutes
21 Jun 2022
See how you can use the new OctoML CLI tool to develop and deploy intelligent applications faster. Take a trained ML model from popular frameworks like TensorFlow, PyTorch, and ONNX and deploy it into your app in three easy steps.
Try the OctoML CLI and TransparentAI on your own at: https://try.octoml.ai/cli.
0:00 How Developers Use The OctoML CLI
1:40 Local Deployment
3:47 Acceleration
5:10 Cloud Deployment
- 1 participant
- 8 minutes
21 Jun 2022
See how IT Operations teams can accelerate and deploy ML models into production, at scale. This demo shows how to use the @OctoML CLI to improve the reliability, performance, and cost efficiency of your ML deployments.
Try the OctoML CLI yourself at try.octoml.ai/cli
0:00 How IT Ops Uses OctoML CLI
1:09 Improve Reliability
3:30 Improve Performance
5:22 Improve Cost Efficiency
7:00 Deploy to Kubernetes
- 1 participant
- 10 minutes
14 Jan 2022
Matthai Philipose from Microsoft speaks about the work his team did in partnership with OctoML on flexible, bulk video analysis (millions of hours of video and billions of images analyzed per month). At this scale, inference is a significant portion of the total compute cost, and the team is working to make inference as efficient as possible. Microsoft ran experiments to optimize key ML models with TVM, varying input size, batching, and processor targets, and compared the inference throughput against the production baseline. The results typically showed 1.2x to 3x higher throughput when optimizing with TVM. Microsoft is now planning to move TVM-optimized models into production.
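The comparison described above, optimized throughput against a baseline across input sizes, batch sizes, and processor targets, comes down to a per-configuration ratio. The following sketch shows that arithmetic under stated assumptions: the `speedups` helper and the configuration tuple layout are hypothetical, and the throughput numbers are placeholders, not Microsoft's measurements.

```python
from typing import Dict, Tuple

# (input size, batch size, processor target); this layout is an assumption for illustration
Config = Tuple[int, int, str]


def speedups(baseline: Dict[Config, float], optimized: Dict[Config, float]) -> Dict[Config, float]:
    """Return the optimized/baseline throughput ratio for each configuration measured in both runs."""
    return {
        cfg: optimized[cfg] / baseline[cfg]
        for cfg in baseline
        if cfg in optimized and baseline[cfg] > 0
    }


# Placeholder throughputs in images per second, for illustration only
baseline_ips = {(224, 1, "cpu"): 100.0, (224, 8, "cpu"): 400.0}
optimized_ips = {(224, 1, "cpu"): 150.0, (224, 8, "cpu"): 800.0}

ratios = speedups(baseline_ips, optimized_ips)
for cfg, ratio in sorted(ratios.items()):
    print(cfg, f"{ratio:.2f}x")
```

Keeping the ratio per configuration, rather than averaging everything into one number, makes it easier to see which batch sizes or targets benefit most.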
- - -
Recorded at TVMCon (https://www.tvmcon.org), the Machine Learning Acceleration Conference in Dec 2021.
TVMCon covers the state of the art of deep learning compilation and optimization, with a range of tutorials, research talks, case studies, and industry presentations. We discuss recent advances in ML frameworks, compilers, systems and architecture support, security, training and hardware acceleration.
Connect with us for the latest in ML Acceleration and Deployment:
Website: https://octoml.ai
LinkedIn: https://www.linkedin.com/company/octoml
Twitter: https://twitter.com/OctoML
- 1 participant
- 6 minutes
10 Jan 2022
Day 2 Keynote from TVMCon (Dec 2021), the Apache TVM and Open Source ML Acceleration Conference.
SPEAKERS: Luis Ceze, Jason Knight, Vanessa Yan, Sameer Farooqui, Matthai Philipose
--
Learn about OctoML, a startup headquartered in Seattle, WA that is focused on making artificial intelligence faster and easier to deploy.
The product demo features the OctoML Model Zoo, with pre-accelerated, extremely fast (sub-millisecond), ready-to-download computer vision and language models for both cloud and edge targets.
About OctoML:
OctoML was spun out of the University of Washington Paul G. Allen School of Computer Science & Engineering, where a group of computer science experts worked on helping companies deploy machine learning models on varied hardware configurations. That work led to the creation of the open-source deep learning compiler Apache TVM, which has quickly become the de facto deep learning compiler used by companies like Amazon and Facebook.
++ Contents ++
00:00 - OctoML Company and Product Vision - by Luis Ceze
14:55 - OctoML Product Vision Intro - by Jason Knight
16:27 - OctoML Product Demo - by Vanessa Yan, Sameer Farooqui
24:17 - OctoML Product Vision Continued - by Jason Knight
35:17 - TVM-Optimized Video Analysis in Microsoft Watch For - by Matthai Philipose (Microsoft)
41:13 - Future of the OctoML Platform - by Jason Knight
- - -
Recorded at TVMCon (https://www.tvmcon.org), the Machine Learning Acceleration Conference in Dec 2021.
TVMCon covers the state of the art of deep learning compilation and optimization, with a range of tutorials, research talks, case studies, and industry presentations. We discuss recent advances in ML frameworks, compilers, systems and architecture support, security, training and hardware acceleration.
Connect with us for the latest in ML Acceleration and Deployment:
Website: https://octoml.ai
LinkedIn: https://www.linkedin.com/company/octoml
Twitter: https://twitter.com/OctoML
- 5 participants
- 45 minutes