Description
02:45 Streamlined machine learning workflows with Argo and Ploomber - Eduardo Blancas
18:45 Automating computer vision workflows with Onepanel and Argo - Rush Tehrani
40:30 Argo Workflows and Events Survey Results 2021 - Alex Collins
http://bit.ly/argo-wf-cmty-mtng
A
There we go — good morning. Thank you for coming to the Argo Workflows and Events community meeting here on the 17th of March 2021. I'm just going to give you a quick overview of what we're going to do today. We've got two presentations today. The first one is from Eduardo Blancas, and it's going to be on streamlined machine learning using Argo Workflows with Ploomber. Then we've got a second presentation from Rush Tehrani at Onepanel; he's going to show you a demonstration of using Argo with Onepanel, which is an open-source piece of software to do computer vision automation, which I think is going to be really interesting. And then finally, I'm going to share some of the results from the Argo Workflows survey that we ran earlier this year, tell you what we think about them, and hopefully it will be interesting to get people's commentary on that as well.

If you haven't already, please do add yourself to the attendees list here. I know there are more attendees than people listed here — it's always great to find out who's coming along to these meetings and what you're interested in. For those of you who don't know Argo Workflows and Events:

they are basically a small ecosystem of cloud-native, highly scalable workflow automation tools. Argo Events is typically used for triggering workflows or other actions on Kubernetes, and Argo Workflows is typically used for large workflows on Kubernetes. Myself, I'm one of the principal engineers on Argo Workflows and Argo Events, and we have a couple of other Argo core engineers here as well. If you want to ask questions, this is a great opportunity to pick the brains of people who work on this every day.

If you do want to ask questions, you can typically just unmute and ask them yourselves, or you can ask in the Zoom chat and somebody will read your question out if your microphone's not ready. You can also ask further questions by coming onto our Slack channel afterwards. We do record these meetings, and we typically upload the recordings to YouTube to share with other people.

If you've got some colleagues who you think might be interested, I do try to make sure that all the timestamps are in those videos, so you can just fast-forward to the particular piece of interest to you. Okay, so that's all the preamble and introduction done with. Eduardo, are you ready? Yes? Okay, I'll hand over control to you now.
B
Okay, is everyone able to see my screen?

All right, so thanks everyone for joining today, and thanks Alex for inviting me to present my project. As Alex mentioned, I'm going to be speaking about Ploomber, which is a project that I've been working on for a little over a year, and specifically about how Ploomber integrates with Argo for lean machine learning workflows.

Just some background about me: I've been doing data science and machine learning for the last six years, in projects spanning government work, some academic research and, more recently, industry. For the last few years I've been focusing on the deployment of machine learning in business-critical settings in the financial industry.

The motivation to start this project was really my daily job, because I just wanted something that gave me some assurance that the way my model works in development is the way it will work in production, and to automate the whole process of taking my notebooks, my scripts and my functions to a place where we could make predictions and use the model.

As data scientists, we usually use a combination of Jupyter notebooks, scripts, functions and frameworks like PyTorch or scikit-learn, and when we want to take our models to production, we use a production-ready system — Argo Workflows in this case. One of the problems is that the difference between these tools forces us to change our code if we want to run it in a production environment.

So we may want to take our notebooks and scripts, and if we want to make predictions using Argo Workflows, we may have to change the code. For example, if we have a notebook, we're going to have to convert it into a script and maybe expose a command-line interface, so we can call it as a step in a larger workflow. This is a really risky process, because the moment we start moving around different pieces of our pipeline — the pipeline that we were developing locally or on a single server —

we may break things. It usually happens that when we move a lot of things around, we're no longer able to reproduce our results. Maybe we have certain quality metrics for our model, and when we compute them again they no longer hold, because we moved so many things and we don't know exactly at which point we broke the model. And even before we make a deployment, we may want to use a system like Argo Workflows for training.

So it's a combination of these two ideas: deployment, but also using these systems for training models at a larger scale. As I mentioned, when we start modifying our code we may break our model, and on top of that, this is not something we only have to do once. Every time we change our model, every time we find something we can improve, or maybe we find an error in our pre-processing steps, we have to modify the pipeline and deploy again.

So this is something that has to happen very often, and the easier it is for us to streamline this process, the more robust our machine learning pipeline is going to be, because the moment we find an issue we'll be able to fix it and automate the deployment.

That's the motivation behind Ploomber. The idea is to provide a simple API and a good development environment, so that we can use whatever tools we like — Python scripts, Jupyter notebooks, deep learning with PyTorch or classical machine learning with scikit-learn. Whatever tools make us the most productive, we should use those, without being forced to make a lot of changes in order to run things in production.

That's the main motivation for Ploomber, and one of its main features is a really simple API. The reason is that when we're doing machine learning, we don't know what's going to work, so we experiment a lot: we try different models, and we may try different pre-processing steps. There should be an easy way for us to modify our pipeline and see if we're making progress — machine learning projects are really experimental. So this is the API.
B
It's actually really simple, similar to Argo Workflows: we just have a YAML file, and the only two things we have to specify per task are where the source code is and where the output goes. In this case we have a task, which is a function called get in a module called tasks, and it saves an output here. We just repeat the same logic for every task in our pipeline — for example, to generate some features.
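Based on this description, a pipeline.yaml of the kind being shown might look like the following (a minimal sketch — the module, function and file names are placeholders, not the ones from the demo):

```yaml
tasks:
  # a function named get() inside tasks.py; Ploomber injects `product`
  - source: tasks.get
    product: output/raw.csv

  # a second task whose code references the first one via `upstream`
  - source: tasks.features
    product: output/features.csv
```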
B
Now, just to make things clearer, I'm going to show a demo, and you can take a look at the code if you want. First, there's a repo with all the examples, and you actually don't have to install anything. I'm showing the repo with the examples — you can click here, and it's going to open a JupyterLab environment where you can just try the examples. Or, if you want to run them locally, you can use these commands.

So now I'm going to run one of the examples. First, I'm going to show what the training pipeline looks like using a plot — I'm using Ploomber to generate this chart.

Okay, so as you can see, the top one is the training pipeline. The two pipelines look really similar, and they should: when we're training a model, we get historical data, we generate some features — the three tasks here in the middle — and then we train a model. When we want to make predictions, we get new data, maybe data that was generated in the previous month or week, or even one day.

So we get new data, we run the same pre-processing steps in the middle, and then we load a previously trained model and make predictions. This is also one of the most common problems when deploying a machine learning pipeline: we have to make sure that all the pre-processing steps — everything that we do to the data in training — happen in exactly the same way in production.

This is a problem called training-serving skew, and it happens when the pre-processing steps don't match. Ploomber offers a way to compose pipelines so you can make sure the steps stay consistent. Okay, so I'll just keep one of these here, and I'm going to show what this looks like in code.

You can see on the left side that I have the declaration of the pipeline, and all I'm saying is: I'm going to get some data using a function here — I'll show the code in a little bit — we're going to get the data and save it here, and then we're going to import our feature generation pipeline.

This is optional; the reason I'm importing tasks from a different file is that I want to use them for both training and serving, but you could just have everything in one place. Finally, I'm going to train the model using this script, and this generates this pipeline. Now, if we take a look at the serving pipeline, we're going to see it's really similar. This is the training pipeline; this is the serving pipeline.

We're also importing the features — these three tasks. But instead, we're getting some new data, let's say data from the last month, and then we're using a previously trained model to make a prediction. So we're saying: use this model, and save the predictions here. Now let's take a look at the actual code — this was just the declaration of the pipeline.

I'm going to open the task that gets some data. It's really simple: all you have to do is declare it as a function, and you can use whatever framework you want inside it. The only requirement is to follow this convention: you should have a product argument, and you should use its value to save your output. So here I'm just getting some sample data and using the product argument to save my output.
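A task function following this product convention might be sketched like this — a stdlib-only illustration of the convention, not Ploomber itself; the file name and data are made up:

```python
import csv
from pathlib import Path


def get(product):
    """Fetch some sample data and save it to the declared product path.

    In Ploomber, `product` is injected from the path declared for this
    task in pipeline.yaml; here we just accept any path.
    """
    rows = [{"x": 1, "y": 2}, {"x": 3, "y": 4}]
    path = Path(product)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["x", "y"])
        writer.writeheader()
        writer.writerows(rows)
```

The body is free to use pandas, PyTorch or anything else; the only contract is "write your output to `product`".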
B
Okay, here's a really similar thing, with just one new element: we have an upstream argument. Now I'm saying that this task — the one here — has to use the output of get as its input. All I have to do is write upstream['get'], and Ploomber is going to use these references to build the pipeline, so you don't have to explicitly state dependencies.

You only have to make references to other tasks, and then, when you run this function, this value is going to be the output of that task. And that's it — that's all we have to do. We repeat the same process for different functions with the same logic. So now I'm going to run this locally.
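A downstream task following this upstream convention might look like the sketch below — again a stdlib-only illustration of the convention, not Ploomber itself; the column names are invented:

```python
import csv
from pathlib import Path


def features(upstream, product):
    """Build features from the output of a task named `get`.

    In Ploomber, `upstream["get"]` resolves to the product path declared
    for the `get` task; the literal key is how the dependency is inferred.
    """
    with open(upstream["get"]) as f:
        rows = list(csv.DictReader(f))
    out = [{"x_plus_y": int(r["x"]) + int(r["y"])} for r in rows]
    path = Path(product)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["x_plus_y"])
        writer.writeheader()
        writer.writerows(out)
```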
B
This is one of the important features: running these pipelines locally. Sometimes, before we run these in Argo Workflows, we want to make sure the pipeline actually runs — we may be adding some features or some pre-processing steps. If we run the pipeline with a sample, let's say one percent of the data, it's going to run really fast, and it's going to make sure that everything runs.

Very often it has happened to me that I have a pipeline where I've modified just a really tiny thing and I think it's going to work, and then I send it to a cluster, wait for two hours, and it breaks. So this is just a nice sanity check. So I run the training pipeline.
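One common way to get this kind of fast smoke run is to thread a sample flag through the data-loading task, so the whole pipeline can execute end-to-end on a fraction of the data. A plain-Python sketch of the idea (the names and the one-percent figure are illustrative, not from the demo):

```python
def load_rows(sample=False):
    """Stand-in data source: return all rows, or ~1% when smoke-testing."""
    rows = list(range(10_000))
    if sample:
        # keep roughly one row in a hundred so the pipeline runs in seconds
        rows = rows[::100]
    return rows
```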
B
This is how you run it — just a simple command — and I'm going to show the output of the training step, because this is also a nice feature that Ploomber has. Let's go back to the pipeline declaration: I'm saying I'm going to use this script, and then generate the model and an HTML report.

After all, you want to train a model, but you also want to know if the model is any good. So this is how it works — and this is the difference between using a function and a script: if you use a script, Ploomber is going to convert it to a notebook first and then execute it. So you have your fit script here on the left side, and here you have the executed version. You can hide the code if you want, but this is really, really handy.

You can just train your model, get this nice report back, and take a look at the numbers and the metrics — and you don't have to save any extra files. You get a single standalone HTML report. This has been really convenient for me when I train multiple models and I want to find which is the best one. So that's how it works locally. Now I'm going to play a video that shows how it runs in Argo, just to make things smoother.
B
This is the same example that I've been showing — same code. The first thing I do is run this Soopervisor export command. Soopervisor is another package that we're developing, and it takes care of exporting to Argo. You can see I just run this command in the project folder, and it generates the YAML spec to use in Argo Workflows. You can see on the left side that I got the YAML spec, and then I'm going to run this pipeline.

Then it should run the task — here it runs a task — and then it's going to start running the upcoming tasks, which are the two that generate the features here. It's really simple: you only have to run this export command, and it generates the YAML spec for you.
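To give a feel for the shape of such an exported spec, here is a small sketch that assembles the skeleton of an Argo Workflow DAG from a task-dependency map. This is a hand-rolled illustration of the output's structure, not Soopervisor's actual code; the image name and command are placeholders:

```python
import json


def make_argo_spec(tasks, image="my-project:latest"):
    """Build a minimal Argo Workflow manifest with one DAG task per
    pipeline task; `tasks` maps task name -> list of upstream names."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": "ml-pipeline-"},
        "spec": {
            "entrypoint": "dag",
            "templates": [
                {
                    "name": "dag",
                    "dag": {
                        "tasks": [
                            {
                                "name": name,
                                "template": "run-task",
                                "arguments": {
                                    "parameters": [
                                        {"name": "task_name", "value": name}
                                    ]
                                },
                                # only emit dependencies when there are any
                                **({"dependencies": deps} if deps else {}),
                            }
                            for name, deps in tasks.items()
                        ]
                    },
                },
                {
                    "name": "run-task",
                    "inputs": {"parameters": [{"name": "task_name"}]},
                    "container": {
                        "image": image,
                        "command": ["ploomber", "task"],
                        "args": ["{{inputs.parameters.task_name}}"],
                    },
                },
            ],
        },
    }


spec = make_argo_spec({"get": [], "features": ["get"], "fit": ["features"]})
print(json.dumps(spec["spec"]["templates"][0]["dag"]["tasks"], indent=2))
```

Each DAG node runs one pipeline task in a container, and the `dependencies` lists are what Ploomber infers from the `upstream` references.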
B
Another thing that I forgot to mention, which is also really useful when you're working locally, is the interactive console. If you run ploomber interact, you get an interactive command-line interface and you get your pipeline back as a variable, so you can ask which tasks are in the pipeline.

So, I'm looking for early adopters and contributors — if you're interested in being part of this project, please reach out to me. You can pip install ploomber; the code is on GitHub and, as I mentioned before, there's a repo with some examples. You can run them from the command line, or you can use Binder, which is going to set up a JupyterLab environment for you, and you can try it with no setup — you don't have to install anything. So that's it. I think we have a few minutes for questions.
A
I think Dan's saying "great presentation". So, Eduardo, it doesn't sound like we have too many questions. Thank you very much for coming and doing that presentation for us today — I really enjoyed it. I'd like to hear about the different use cases, particularly machine learning, so that's great. Thank you very much.

Okay! So up next we have Rush. Oh — did we have a question? I'm hearing things. So next up we have Rush from Onepanel; he's going to be talking about computer vision. Rush, are you ready to start? — I am.
C
Perfect. Hey everyone, I'm Rush Tehrani, and thank you for having me and the Onepanel team — and thanks Alex for inviting us to give this presentation today. I'll be talking about automating computer vision workflows with Onepanel and Argo. Just so you know who I am: I'm actually one of the founders and creators of Onepanel, one of the developers, and we have other core team members on this call as well.

To give you an idea of what Onepanel is: we call it an open and extensible IDE for computer vision. The way it works is that we provide built-in tools for the entire ML life cycle, specifically for computer vision, and then one thing we do on top of that — not only do you have these built-in tools —

is that we allow you to bring your own tools and models into this whole process. The way we do that is by making sure we follow open standards, and our open standards are very much Kubernetes-native: YAML and Docker images. That's also where Argo comes into play in this whole picture.
D
Okay, Rush, I'm sorry to interrupt, but you seem to be sharing your presenter view as opposed to the other view, in case you wanted to correct it.
C
Thank you. So, to give you an idea of what this end-to-end workflow looks like — the ML life cycle for computer vision:

Onepanel covers a big subset of it. You have the data preparation stage of the life cycle, where you use computer vision annotation tools — you do automatic annotation in this phase, and you also do data augmentation and pre-processing for the build. Then the next step is actually building your models.

In that case you're using JupyterLab and VS Code workspaces, and you can use our built-in algorithms or bring your own, as always, and you can also bring your own tools into this whole pipeline — I'll show this in a demo. Then one of the other steps is training and tuning your models.

So we have the training pipelines and the hyperparameter tuning pipelines, which let you try different hyperparameters for your models as you're training them, and you can run these on multiple GPUs in parallel as you like. Finally, you can visualize how your model is being trained and tuned as well.

And then, finally, you want to deploy the model, which you can do. You also want to orchestrate this entire workflow and automate it. For example, you want to come back and have your annotators do the annotation, use the model that's been trained for pre-annotation, and so on — so you can actually automate this entire workflow as well. And just so you know, this whole process is very continuous and iterative.

We also use it for data pipelines and training pipelines, and we use Argo for orchestrating that end-to-end workflow I just mentioned. We also have TensorFlow and PyTorch, the deep learning frameworks, and again you can bring in your own tools and frameworks here as well.

It's again just YAML and Docker images. Then we have CVAT for image and video annotation, and we use NNI for hyperparameter tuning. Soon enough we'll have NNI for neural architecture search, which allows you to come up with your own model architectures on the fly and even compare multiple model architectures for your particular data.

You can code your data augmentation pipelines, and then you can finally deploy that into a workflow right from the Jupyter notebook — I'll show that in the demo. And last but not least, we have a Couler integration, so you're able to declare these workflows in Python as opposed to YAML, and again run these workflows directly from JupyterLab as you're testing and coding.

So, just a quick overview of why we picked Argo versus some other options. Our requirements: it should be Kubernetes-native — that was a hard requirement because it goes with our goals — proven scalability, meaning organizations actually using the product at scale, an active community, and speed of bug fixes and new features, which you can see by looking at GitHub. Argo checked all those requirements.

Airflow was a no-go for us because it's not Kubernetes-native. You can run it in Kubernetes as an operator, I believe, but you can't declare your workflows in Kubernetes-native YAML. Tekton looked great overall, but it's very much specifically designed for CI/CD — even if you go talk to the developers, that's what they say — whereas Argo is a more general-purpose workflow engine.

So, as I mentioned earlier, there's data preparation — there's the entire life cycle in Onepanel — and where it usually starts is with annotation. To give you an idea: with annotation, you're basically annotating and labeling images, let's say for object detection or semantic segmentation. In this case we're trying to identify potholes — this is an actual real use case — and we're annotating that in CVAT.

Some of the other use cases for Onepanel, just out of the community, are defect detection in manufacturing, object detection for robotic arms, and tree damage detection — insect damage detection for trees, say caused by the rhino beetle, is another one. So a lot of it is all about object detection and semantic segmentation. This is a semantic segmentation example.

If you look at this one — this other task here — we're trying to identify guardrail damage. This one is not damaged, but there are other ones here that are damaged; this one's obviously damaged. What's great about this whole environment is that once you do your annotation, you can easily train right from your CVAT environment. This is enabled by a Python SDK, so you can see the training workflows and pipelines that are available in the system.

So this one is, again, Argo Workflow-based, and I'll show you what it looks like.

You can actually define these in Onepanel, and they will show up in any kind of environment that you bring in via our Python SDK. This is roughly what the workflow looks like: it has a pre-processing piece —

which pre-processes your data — then hyperparameter tuning, which I'll show you in more detail, and finally the metrics writer, which writes some metrics; you can even add something to send notifications in Slack when a training run is complete. One feature we have that's kind of nice is that you can define these parameters with different types. Argo has parameters, but we also add these different types, so you can have a YAML type where you can define your pre-processing parameters. In this case, you can define how your hyperparameter tuning is done — the search space and all the information regarding hyperparameter tuning — and put it in as YAML. We also have this node pool / node selector module, where you can add as many of these as you like and have each task in your pipeline run on different machines: for example, you don't necessarily want to run pre-processing on GPUs, but you do want them for training.
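For reference, NNI — which backs the hyperparameter tuning here — takes a search space in a small JSON dialect, so the YAML parameter described above would carry something equivalent to this (the parameter names and ranges are made up for illustration):

```python
import json

# NNI-style search space: each entry names a sampling strategy (_type)
# and its candidate values or range (_value)
search_space = {
    "learning_rate": {"_type": "loguniform", "_value": [1e-5, 1e-1]},
    "batch_size": {"_type": "choice", "_value": [16, 32, 64]},
    "num_epochs": {"_type": "randint", "_value": [5, 50]},
}

print(json.dumps(search_space, indent=2))
```

The tuner samples a point from this space for each trial and hands the chosen values to the training task.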
C
So if you execute this workflow, it's going to take a little bit, so I've actually got a pre-launched one for you and we can take a look at it together. As I mentioned earlier, you have the pre-processing and hyperparameter tuning. What's really interesting about this hyperparameter tuning task is that, obviously, you have the logs, as you would get with Argo, but there's also this concept of what we call interactive sidecars, which allows you to bring in different visualization tools.

You have TensorBoard, which shows your metrics as things are training, and then you can also bring in what we call Glances, which is already integrated and which actually lets you see the resource usage for that particular task as it's training.

Another interesting thing: we have another sidecar here, which is a file server sidecar. This shows what's happening in the mounted volumes of this task, so you can actually see your model being generated live here — it's essentially exposing the volumes from that particular task's pod or container. So this is basically how the training works, and then I'll show you how this all works.

In the end, when this is finally trained, it creates the best model out of the possible models — depending on the different parameters — and you can actually take that model and bring it back into the annotation tool I showed earlier. What's great about our platform is that you can essentially grab artifacts from object storage and push them into any of these workspaces.

And you can select that model for automatic annotation — in this case, it's a Mask R-CNN object detector. Now, instead of your people going in and doing this manually every time, you can use the same model that you're training for object detection or semantic segmentation to also pre-annotate your data. That cuts a lot of the manual process down, and again cuts down the whole end-to-end process we have here.

So what I showed you was what we have built in. We also have this concept of workspaces — you saw the annotation tool in here — and with workspaces you can actually bring in your different tools, and again these are actually built on the back end.

We use an Argo Workflow to actually deploy these, pause these, and resume these, and you can actually switch between CPUs and GPUs. So if you're, say, using JupyterLab to build your code, you can switch to a GPU machine and do your tests there.

So what you can do is go through this whole process: do your data augmentation in a JupyterLab notebook and see the results — you can see the results here, as I've augmented the data. Then the next step in the pipeline is the actual training on that data, and I can again train in a JupyterLab notebook and test my model out. Then, once I'm ready, I can actually define those steps as a DAG via our Couler integration and submit them to the platform.

And we've just created a workflow template in Onepanel — it's a Python-defined workflow template, so it just got created by that Couler example I showed earlier — and now it's also running the workflow as well. Then, last but not least, we can actually, like I said, automate the entire workflow, and this is a quick example of that: a pre-annotation workflow.

What it does is launch a workspace — the annotation workspace I showed earlier — and you can create a CVAT instance, create the tasks, set the data for the task, and then use the model that you just trained to pre-annotate the data. So when your annotators show up the next day, everything's ready to go; they can just make adjustments and then kick off the whole training process again automatically.

Building this, one of the main challenges was running it on multiple platforms. You have to deal with different ways of handling autoscaling — EKS has its own ways of doing it; GKE and AKS are a little bit easier to deal with. You have to have different GPU daemon sets depending on the provider — they each have their own daemon sets — and also storage classes and so on. So EKS was a little bit harder than the rest of them.

Another interesting challenge for us was those interactive sidecars I showed earlier: how do you do those, and how do you make sure they're garbage-collected at the end? We use Istio, and we create Kubernetes resources — deployments, services and virtual services — to make those happen.

You have to make sure those are cleaned up when the workflow is done. We also needed to support Azure Blob Storage and GCS across the board — not just in Argo, but across the board.

What we did there is put a MinIO gateway in between our S3 APIs, so we're using S3 across the board, but the MinIO gateway in between makes it possible to go to Azure Blob Storage and GCS at the same time. And last but not least, on our back end, a network policy for namespace isolation — that's a big deal.

These namespaces need to be separate because it could be different teams working on different projects, and one team shouldn't necessarily have access to the other's pieces, or annotators shouldn't have access to certain types of data. So you want to make sure that separation exists.
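A Kubernetes NetworkPolicy achieving this kind of isolation — allowing traffic only from pods in the same namespace — might look like the following (a generic sketch with a placeholder namespace, not Onepanel's actual policy):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: same-namespace-only
  namespace: team-a          # applied once per tenant namespace
spec:
  podSelector: {}            # selects every pod in the namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}    # a bare podSelector matches only pods
                             # in this same namespace
```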
C
So thanks for listening. We have our GitHub repository listed over here, and our documentation is at docs.onepanel.ai. If you like what we're doing and you find it useful, give us a star. We're always looking for contributors, and if there are any features you want to ask us to develop for you — ones we think the entire community will find useful — let us know.

If you have any questions or need help, join our Slack, or ping me in the Argo Slack too — feel free to DM me there as well.
E
Thanks, Rush. I see there's a question in the chat from Yuan: you mentioned that NNI is used to build the hyperparameter tuning functionality — is there a particular reason for that choice, and have you looked into any of the Kubernetes-native solutions, such as Kubeflow's Katib?
C
Yeah, that's a very good question. We did look at Katib. NNI has a big community around it, so we decided to go with that, but Katib is always an option, and I know it's definitely more Kubernetes-native — we could potentially bring Katib in as well. We do want to support multiple tools, so I think that's something we'll probably look at later. But we did look at it and just picked NNI because

I think its neural architecture search and hyperparameter tuning were a little more mature than Katib's at this point.
A
Rush,
thank
you
very
much
for
that.
That's
great
presentation
really
interesting.
I
really
you
know
this
already,
but
I
was
really
interested
in
your
kind
of
sidecar
model.
This
kind
of
you
got
an
interactive
sidecar
and
I
wondered
if
anybody
else
had
been
using
something
like
this
with
with
their
workloads.
I
would
definitely
be
interested
in
typically
any
kind
of
workflow.
I
kind
of
want
to
get
that
feedback
fast,
especially
if
that
workflow
takes
a
while
to
execute
and
that
workflow
itself
can
often
tell
me
better
than
the
infrastructure.
A
What
it's
doing
you
know
where
it's
progressing.
If
it's
having
issues
if
tests
are
failing,
you
know
I
want
to
know
if
those
tests
are
failing
as
soon
as
they
fail.
Not
you
know
wait.
The
entire
suite
has
ended,
and
I
wonder
if
anybody
else
had
similar
use.
Cases
that
were
an
interactive
sidecar
would
be
useful
to
them.
A
That
sounds
like
a
no,
so
it's
just
me
in
that
case
fine.
Okay,
let's
moving
on
to
the
last
topic
for
today,
let
me
see
if
I
can
find
the
right
window
here.
Is
the
argo
workflows
survey
so
about
a
month
and
a
half
ago
we
sent
out
four
separate
surveys.
I
believe
one
to
about
our
go
workflows,
one
about
argo
events,
one
about
arko
cd
and
one
about
argo
rollouts,
which
henrik
was
involved
in
the
last
two
of
those
now.
A
This is obviously not the best forum to share the Argo CD results, but it is a good forum to talk about Argo Workflows and Argo Events. I'm not going to talk about Argo Events today, and the reason is that I have not really collated the results yet, because we didn't have quite so many respondents for that survey. I think we had fewer than 20 respondents, and I'm cautious about drawing conclusions from those numbers.
A
So today we're just going to focus on the Argo Workflows survey results, and I'm really interested in getting people's thoughts on this as well, if anything jumps out at them as being interesting, and maybe I have a few questions for the people who are listening today as well. So we did send out the survey; we got 60 responses, which is fantastic, and we computed an NPS score on that.
A
That's
net
promoter
score
and
that
came
out
at
66,
which
is
classified
as
I
think
exceptional,
so
we're
very
pleased
to
find
out
that
people
thought
you
know
would
be.
You
know.
The
average
rating
was
exceptional
people
very
pleased
to
recommend
the
product
to
us.
When
we
looked
at
the
kind
of
roles
and
use
cases,
it
probably
shouldn't
surprise
anybody.
This
was
dominated
by
software
engineers,
90
of
it.
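For anyone unfamiliar with how a figure like that 66 is derived, NPS is computed from 0-10 "how likely are you to recommend" ratings: the percentage of promoters (scores 9-10) minus the percentage of detractors (scores 0-6). A minimal sketch, using made-up ratings rather than the actual survey responses:

```python
def nps(ratings):
    """Net Promoter Score: % promoters (scores 9-10) minus % detractors
    (scores 0-6), rounded to a whole number between -100 and 100."""
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return round(100 * (promoters - detractors) / len(ratings))

# Hypothetical distribution of 60 responses: 45 promoters, 10 passives,
# 5 detractors. Passives (scores 7-8) count in the total but on neither side.
sample = [10] * 30 + [9] * 15 + [8] * 6 + [7] * 4 + [5] * 3 + [2] * 2
print(nps(sample))  # → 67
```

Note that passives dilute the score without contributing to either side, which is why a high NPS requires most respondents to rate 9 or 10.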
A
We did wonder if we'd see more data scientists among the respondents, and it maybe hints at the fact that the people we got responding to the survey were operators of Argo Workflows rather than users of Argo Workflows, but there's not very much we can do about that, I don't think. And the use cases were the same kind of six use cases that dominated in 2020: data processing use cases, machine learning, and an amount of CI/CD and infrastructure automation.
A
One
of
the
things
we
try
and
get
feedback
on
a
regular
basis
is
about
new
features
that
we
introduce.
So
we
typically
look
to
develop,
features
that
are
popular
and
have
clear
use.
Cases
and
popular
features
are
basically
the
ones
that
have
got
the
most
thumbs
up
on
in
our
issues
list.
So
these
are
some
of
the
features
we
developed
over
the
last
year,
and
you
know
we
weren't
surprised
to
see
things
like
workflow
templates
at
the
top
of
the
list,
and
we
were
we
were
I.
A
So
it's
kind
of
harder
to
develop
those
features
out.
You
know
if
they're,
not
so
popular,
so
it's
good.
You
know
it's
good
to
kind
of
always
feedback
on
features
you're
using
on
that,
and
you
may
have
noticed
that
memorization,
I
think,
appeared
in
roughly
august.
We
haven't
really
kind
of
progressed
that
feature
as
a
great
deal,
and
I
think
this
kind
of
reflects
that
we
were
quite
interested
at
people
running
at
scale
so
into
it.
A
We
run
we're
probably
getting
on
for
350
installations,
350
clusters,
where
we
install
our
go
workflows
for
a
variety
of
different
use
cases,
and
we
we've
seen
a
lot
more
people
using
our
foes
at
very
large
scale.
You
know
workflows
with
ten
thousand
twenty
thousand
and
thirty
thousand
pods
of
them
simon.
I
think
you
saw
a
larger
one
recently.
D
Yeah, we had a user notify us that gRPC can't even handle the message size required to transmit his workflow, which contains over 135,000 individual nodes in a single workflow. So that may be the limit.
A
People found that a lot of users were unfamiliar with YAML and would prefer Python, but we know that we have a couple of ways to author workflows in Python. Specifically, the most popular way of doing that is using the Couler DSL, and I wondered if people were aware of that, or felt like their users were aware of that, or not.
A
No
response,
no
questions
on
that.
Okay.
What
we're
trying
to
understand
at
the
moment
is,
is
you
know
how
much
more
we
need
to
invest
in
a
python
dsl?
There
is
a
ticket
in
the
github
and,
if
you're
interested
in
that,
you
can
subscribe
to
that.
A
We
people
also
thought
about
a
couple
other
areas
so
improved
ui
and
so
our
go
workflows,
treatment,
we're
kind
of
ahead
of
the
curve
there
on
that,
with
the
new
user
base,
interface
in
argo,
workplace,
3.0
and
improved
areas
of
documentation,
people
were
not
clear
about
any
areas
of
specific
improvement,
so
we've
created
a
ticket
for
people
to
highlight
areas
and
documentation
where
they
want
specific
improvements.
A
One
question
is:
why
do
people
choose
argo
workflows?
I
have
anonymized
this
section.
I
removed
the
products
that
people
talked
about.
This
took
that
so
not
the
users,
but
the
products
that
people
compared
to,
but
the
biggest
the
biggest
factor
was
the
fact
it
was
cloud
native.
A
So
there
are
other
workflow
managers
out
there,
but
they're,
not
cloud
native,
and
so
our
go
workflows.
You
know
dominates
us
as
the
best
one
that
was
there
to
do,
and
people
also
found
it
was
really
easy
to
use.
So
that
was
another
key
point,
and
the
final
thing
is
that
people
like
the
interrupt
with
other
tools
inside
the
argo
ecosystem,
particularly
I'm
using
argo
cd,
and
we
do
that
ourselves.
A
We
use
argo
cd
to
insta,
to
install
and
manage
arca,
workflows
and
I'll
go
work
for
templates,
and
you
know
core
components
are
managed
using
argo
cds,
installation
just
a
little
bit
of
next
steps.
I
think
I've
mentioned
this
previously
in
this
discussion.
We
need
to
determine
figure
out
what
we're
going
to
do
about.
Like
python
supports.
Do
we
need
to
promote
this
solution
more
or
do
we
or
do
we
need
to
provide
more
first
class
support
for
python
in
the
software,
and
you
know
what
areas
documentation
specifically
improving,
how?
A
Okay, a bit of silence there. Okay, well, that was the last topic for today in the community meeting. Thank you all for coming along today. If you do want to ask more questions or follow up, come and hunt us down on Slack to ask those questions and we'll get back to you, and hopefully I'll have the recording available to share probably about this time tomorrow.
A
Certainly
by
the
end
of
the
week,
if
you're
interested
in
presenting
or
coming
along,
we're
really
keen
to
have
people
come
along
and
talk
about
the
stuff
they've
been
doing
and
about
argo,
workflows
and
interesting
use
cases
they've
been
doing.
If
you
want
to
present,
you
know
just
look
us
up
on
slack,
we're
also
looking
for
people
to
come
along
and
write
guest
blog
posts.