Description
02:00 WebAssembly Plugin Lightning Demo with Felix Seidel
14:10 Netflix Metaflow Demo with Savin Goyal
A
B
There we go, let's fix that. So again, welcome to the Argo Workflows and Events community meeting. Today, an overview of the agenda: we have two items. The first is going to be a lightning demo from Felix on a WebAssembly plugin, which is a really interesting project that I think will be available quite soon for people running Argo Workflows 3.3 and using template plugins. And the second demo is going to be from Savin — is that right, am I getting the pronunciation right?
B
Yes — on Netflix Metaflow, which is a popular new framework that I think is really interesting. If you've not already done so, please add yourself to the list of attendees in the document, so we know where you're from, and don't be scared to speak up and say hi to everybody.
B
For those who don't know what Argo Workflows and Events are, just a very brief introduction: Argo Workflows is the most popular workflow execution platform on Kubernetes, and Argo Events is often used alongside it to trigger workflows based on things like Kafka messages and so forth. My name is Alex. I am a principal engineer at Intuit, and I work predominantly on Argo Workflows, as well as operational analytics, and previously on Argo CD.
B
If you want to ask any more questions — we are recording this, and the recording is usually available within about 24 to 48 hours if you want to share it. We'll tweet it and share it in Slack as well, if you want to know where you'll be able to find it. Okay, Felix, are you ready?
C
The presentation — do you see it? Yes, it's very good. Okay, thanks for having me. If you have questions, you can put them in the chat as they come up, or ask them at the end — it will not be a very long presentation, so you will hopefully remember your questions until the end. I'm Felix, a master's student at the Hasso Plattner Institute in Potsdam, which is very close to Berlin in Germany.
C
I'm currently writing my master's thesis on WebAssembly and workflow systems, which is why I decided to give the new executor plugins feature in Argo Workflows version 3.3 a try. That version is released today, so you can also try the plugin beginning today. It is publicly available on GitHub, and I really invite you to give it a try if you want. But first, let me explain: why WebAssembly? So, you've come to know and love containers, which is what you're using today.
C
With all these libraries and all the dependencies — but as an application developer, what you really care about is only your business logic, and that is really the smallest part of the stack. So 90% of what you ship to the cloud and what you run with Argo Workflows is boilerplate, and your business logic is just a very tiny part of it. This is why WebAssembly could be a good idea to use in the cloud as well, because with WebAssembly you don't need all the boilerplate anymore.
C
Okay, what do you get if you use WebAssembly? First of all, it's about developer experience and ease of use: you don't need to build or test Docker images anymore; you can just ship your application as a tiny module. What's also good about this: I think many of you could already be using MacBooks with an M1 or M2 processor — a custom CPU architecture — and what's good about WebAssembly is that everything you build with it is completely architecture-independent and will run just anywhere.
C
Another point is security. Kubernetes security and container security are notoriously hard to get right: if you use third-party images, or if you don't rebuild your images periodically every week, then you're basically in a lot of trouble. WebAssembly is a lot less complex and easier to reason about, which is why it's easier to secure. The third point is performance. Once a container is running, the code executes approximately as fast as a WebAssembly module, but every time you start or stop a container...
C
You have some overhead. This is what you also see in Kubernetes — it just takes some time until it begins running — and this management overhead is not the case with WebAssembly.
C
So it's just a lot faster. Okay, enough of the talking, let me show you a quick demo. Hopefully you can see my VS Code here. What I did here is just a quick demonstration of how it looks to use plugins and WebAssembly with Argo Workflows 3.3.
C
What we're trying to do is print a QR code which encodes the link to my repository and the current temperature in Berlin. Let me quickly start this workflow, and in the meantime I can explain some things to you. For fetching the weather, I'm using a really out-of-the-box Python container, which just talks to an API and produces an output parameter — that's what you already know from Argo Workflows. But the second and third steps are actually more interesting, because they use the new plugin.
B
C
They say that they want to retrieve a WebAssembly module from this identifier. In this case, they are hosted on GitHub's container registry, and this is basically all you need to run WebAssembly modules in Argo Workflows once you have installed the plugin. And now we can see — just like a normal workflow — the container, or the pod, has finished.
C
The pod has given some information to the message step, and if we now look at the workflow, we'll find the name at the top — this one.
C
It's really cool to use — if you're interested, give it a try. And if you're interested in any of the topics listed here below — WebAssembly in Kubernetes, WebAssembly in serverless use cases, or capabilities, or something like this — please don't hesitate to shoot me a message on Twitter or contact me on the CNCF Slack. And with this, I give it back to you, Alex.
A
Yeah, I have a question on how the plugin development experience was for you — any feedback you have on things that could be improved, or things you liked or didn't like?
C
So, I mean, initially I had some struggles with just understanding how to build the ConfigMap, but for this I already submitted pull requests. Otherwise, there was one minor thing about the HTTP implementation, but apart from that I thought it was really straightforward to build the plugins. So I would suppose that many people will start building plugins for useful use cases.
C
For the WebAssembly modules, I'm thinking about two or three ways. What I'm currently doing is using basically the same mechanism as Docker: an OCI registry — a Docker registry — because a Docker registry can store arbitrary files, anything you want, so you can also store WebAssembly modules. But there are also other projects which could be useful.
C
There's a WebAssembly application package manager, and there's also a project called Bindle, which really wants to find a new way of bundling applications together. So there are a few ways, but the way that's implemented right now is the most interoperable, because it just uses the same stuff as containers.
B
So it's like pulling an image and running it, isn't it?
C
I mean, the whole supply-chain security aspect is a lot easier with WebAssembly, because there are no library dependencies or anything like that. So, for example, you could imagine that a WebAssembly module would not even depend on something like OpenSSL or any other library, because all the stuff the module uses to talk to the outside world is provided by the runtime. You would update the runtime periodically, but that can be done way easier than updating each image. So that's that.
B
C
So, I mean, there are a few options, but the thing that would follow the goal of WebAssembly the most is that the WebAssembly runtime provides a way — a method or a library — to execute HTTP requests.
C
B
A
Well, on a related note — about file processing or access. I see from your example that the WebAssembly module can accept text input or parameters from the workflow, like the text that you passed to the whalesay.
C
Sure. That is actually a feature that I still need to build — supporting the artifacts feature of Argo Workflows — and this could be represented as files, as reading and writing to files. Yes, but as opposed to containers, or as opposed to normal programs, with WebAssembly it's all whitelisting: by default the module doesn't have access to any folder on your machine, or any machine.
A
I see, okay. In terms of the actual execution of the WebAssembly module — where is that done? Is it offloaded to some deployment that is just running all the WebAssembly modules, or is a pod spun up to execute it? How does that work?
C
So right now — because this is really the first iteration of what I'm trying to do — all the WebAssembly modules are run within the agent pod, which is created for every workflow. So every workflow gets its own agent pod; that is the scope of the WebAssembly modules. But as a goal for my master's thesis, I want to run the modules in a distributed fashion, and for this I can also link the project — it's called Krustlet.
C
B
Cool, cool, okay! Well, folks, thank you very much. Of course, if you've got any questions, you can ask them — I'm sure Felix would love it if you want to try that out as well; it'd be interesting to see what other use cases we have. Okay — should we give Felix a bit of a round of applause? We haven't done this before.
B
We could do more of this. Okay, cool, thank you. So for our next demo today we have Savin; he's going to be doing a demo on Netflix Metaflow. Savin, are you ready? — Yes, let me just...
F
Awesome. So I'm here to tell you about a project that I've been working on for close to half a decade now, and then I'll follow up with a short demo around the project.
F
As for my background: from 2016 till mid-2021 I was at Netflix, where my team was responsible for building Netflix's ML platform, and as part of that we built a framework that we ended up open-sourcing in 2019, called Metaflow. That framework is pretty much responsible for running mostly all of ML at Netflix.
F
All the way from how a data scientist prototypes an ML model to how they take that prototyped model into production. And since 2019 there are many organizations outside of Netflix as well that have adopted the framework: in the streaming universe we have Netflix, HBO, Amazon; bioinformatics-based companies like 23andMe; many financial-services firms; and many e-commerce firms as well.
F
So many organizations with diverse needs around machine-learning projects are in the market for an ML platform these days, and Metaflow happens to be one of the choices they end up making. Now, at Netflix...
F
Of course, if you look at the kind of audience we were trying to cater to: these were data scientists who had a strong background in machine learning — advanced degrees in maths, stats, physics, chemistry, any quantitative discipline — but they weren't necessarily software engineers by training, and what that meant was...
F
There was usually a huge gap between prototyping something on their laptop — which was usually quick and easy, on a limited set of data, with limited compute resources — and scaling it out to the cloud and then eventually deploying that model into production. And production pretty much varied from project to project as well.
F
There were certain projects, like Netflix recommendation systems, where you needed to deploy a model with really high SLAs, and usually there were a bunch of engineering teams involved whenever a new model would be rolled out. Versus...
F
There were many other internal projects which still required a huge amount of compute to train and score a model, but the SLAs weren't really that high — as well as certain models that people were just training on their laptop, but wanted to train reliably so that they could write memos about how the model was performing, maybe some offline analysis of an A/B test that was happening. And then many of these ML projects...
F
They ranged from a solo data scientist focused on one, to a team of 10-15 data scientists working on a singular project. So our goal was: how could we ensure that all of these projects, in their different stages of maturity, can all benefit from a cohesive ML platform? And the key — sorry, what's happening?
G
F
Was frozen — yep. Now, if you look at the key concerns our data scientists had: they essentially had to battle two different kinds of complexity. One was the complexity on the ML side of the house.
F
How do you take a business problem, model it in mathematical terms, and use all the machinery that's available in the machine-learning universe to get your work done? And then there was the entire engineering complexity: okay, how do I run my compute reliably when I have to scale out to the cloud to access more resources? How do I do that when I have to think about A/B testing my models, or when I have to think about debugging anything that goes wrong?
F
How am I thinking about all of those engineering concerns? And by and large, our data scientists very much enjoyed tackling the ML complexity, because that's what they had trained for and what they really wanted to do over and over again. But the engineering complexity was a huge drain on their productivity, and for us...
F
The goal was: how can we make sure that our customers — our end users — aren't always fighting the engineering systems to get even the simplest of models into production, and can instead focus their time on data-science work, the work that's a lot more valuable to an enterprise like Netflix and many others?
F
So for us, rather than reinvent the wheel: if you look at the basic components that organizations need for any ML activity, usually there's some sort of data layer — and these days you can get access to S3 or GCS, or you can buy Snowflake, Databricks, or many other vendors offering mature solutions for data platforms.
F
When you look at compute platforms, there's Kubernetes; there are a bunch of options from AWS, Google, Microsoft; you can deploy Ray or Dask as well — and many of these tools are, again, very mature from an engineering point of view. A similar argument holds for workflow orchestration: you have Argo, Airflow, Step Functions, and that layer is maturing pretty rapidly too. Then, when you look at the IDE side of the house, most data scientists...
F
So that's also yet another mature stack. And then everything above that, in terms of your training frameworks like TensorFlow or PyTorch — or if the organization wants to roll their own — again, there is some degree of maturity that we can expect. So, in terms of the raw building blocks of this ML infrastructure ecosystem...
F
Very mature pieces have existed for a really long time, and we did not want to reinvent those wheels either. Our goal was: okay, can we assemble this stack and provide an opinionated UX on top of it, so that our data scientists can be productive and we can essentially just piggyback on the great work that all of these communities are already doing for us?
F
So let me quickly go through, at a very high level, what Metaflow's UX looks like, and then I'll quickly jump into a demo. Let's walk through a hypothetical project life cycle. A data scientist — let's say they start — okay, yeah, so usually they'll start exploring within the confines of a Jupyter notebook.
F
What's the right way of organizing notebooks, versioning them, sharing notebooks with one another? For us, we position ourselves either inside the notebook or outside the notebook: once you have done some early exploration, it's the right time to start creating a workflow, and we want to lower the barrier of how easy — or how difficult — it should be to create this workflow.
F
So, for example, in the case of Metaflow, you can essentially lift your individual cell blocks into this workflow specification.
F
You can have this class — a FlowSpec — and a bunch of Python functions that you annotate with this decorator called step, and you can just embed your cells within them. Now this is a fully functional Metaflow workflow that you can execute through python myflow.py run. And the interesting thing here is the state transfer: in notebooks, within different cells, you can access state from the previous cells you have executed, and in this particular example, if you see, I have this variable self.x in the start step, and that variable self.x is available in the end step.
F
So that's how you deal with state transfer, and when you execute a Metaflow flow through python myflow.py run, at that point Metaflow will essentially snapshot and store your whole execution. It will assign a unique ID — an execution ID — to that workflow, and then you can go back in time: for any given workflow ID, you can access the code that executed that workflow.
F
So you do not have to rely on explicit git versioning; you get access to all the internal state that's been generated, and you can essentially, within a notebook or any other Python process, inspect either your workflow or your colleague's workflow in a very straightforward manner. And that's all fine: you've now taken the contents of your notebook and you're able to create a workflow in a very simple and straightforward manner.
F
But now, one issue that comes up very often is: even when you are using a notebook, or running a workflow locally, you are still confined to the resources available within your workstation. Many times it may be the case that there is one step that needs GPUs, but you do not have GPUs attached to your instance. So how do you scale out?
F
Maybe you need more RAM to process some data frame — your laptop, let's say, maxes out at 16 gigs of RAM, but you need something like 500 gigs.
F
In that particular scenario, you can very simply just slap in this decorator, @resources, and specify what resources you need. Then, when you execute this workflow, Metaflow will execute the start step in your cloud environment, making sure that your code is already available on that particular instance with the resources you need, while the rest of the workflow can still continue executing on your workstation.
F
So in this particular example, you're processing this pandas data frame, and that can execute on an instance with 128 gigs of RAM, while the end step may very well just execute on your local workstation — and Metaflow will take care of passing data around, so you do not have to think about how data transfer happens behind the scenes. But then, one common use case in machine learning...
F
Is that at any given point in time you don't want to train just one single model. You may want to train multiple different models — say a model for every single country-language combination. If you think about Netflix, that's already something like a thousand models you're training. Or maybe you want to train models for multiple different hyperparameters.
F
So it's very likely that very soon you will be in a scenario where you want to run the same job over a different set of inputs, and we allow you to do that very trivially using this notion of self.next, where you define your next state transition — you can also pass in a foreach list.
F
In this particular scenario, I'm trying to train a model over this list of parameters — which has, let's say, 100 elements — and what that allows you to do is very trivially launch this workflow: it will run your first step on your workstation, then essentially run 100 containers in your cloud, and each of those containers will train the model.
F
And then the next step can still just run on your laptop, where you can collect all 100 models that you have trained and then decide what to do next.
F
Yep. So now, with Metaflow, you have the capability of accessing compute in the cloud — not just one instance or a handful of instances, but potentially hundreds or thousands of instances. But then one of the biggest bottlenecks that quickly comes up is that all of these steps running in the cloud, or even on your workstation, need data. Even if you're able to launch these instances reliably and run this high-scale workload...
F
If you're not able to access data quickly, then it's very likely you are paying for a really expensive GPU instance whose GPU isn't engaged — you're just waiting on data availability. What we enable is efficient data access from, say, S3. So here's an example where, let's say, you have a data engineer who has built a source table for you through some ETL process.
F
Now you want to do some feature engineering, and one way to do that feature-engineering step could be that you launch a Spark job that creates a temporary table, and this data may be backed by Parquet files — and then, with Metaflow's S3 client...
F
You can then essentially read these Parquet files directly from S3, and our implementation usually gets you upwards of 10 Gbps of throughput. In many scenarios it's more efficient to read from S3 than from the local disk attached to your compute instance — and that just unlocks a variety of new paradigms for how people think about doing these high-performance-computing applications. So now you have this workflow that's still running from your laptop, you are able to access the cloud, and you're able to run this really large-scale compute.
F
But then there comes a time when you want to run this workflow on some schedule — run it asynchronously, maybe every single night, or maybe as soon as an upstream table partition appears; that's when you may want to kick off your execution. And again, rather than re-implement the entire stack, we piggyback on production-grade workflow orchestrators. One example of that is AWS Step Functions.
F
The good thing with these workflow orchestrators is that they execute a DAG, so we can essentially compile a Metaflow workflow into a Step Functions state machine. At that point you get the benefits of Metaflow — a great user experience for authoring these workflows — as well as all the benefits Metaflow provides around state transfer and versioning.
F
Just like we do Step Functions — there is a PR that I've been working on. The PR is feature-complete, and over the next couple of weeks we're just going to finish up the documentation; hopefully early next month we'll be able to do a proper announcement of that, and I'll follow up with a demo of what that looks like after this presentation.
F
Right — still, there are many things that can go wrong. Let's say you depend on some version of TensorFlow: you still have to worry about where your library dependencies are coming from. And if you decide to piggyback on a specific version of TensorFlow — even if you're always reinstalling that version — it's very likely that one of the transitive dependencies can break at any time.
F
Now, one of the issues with machine-learning workflows, compared to traditional software-engineering workflows, is that traditional software-engineering workflows fail loudly; with ML workflows, your results are going to be just slightly off, and at that point it's really difficult to triage what really went wrong. Was it something wrong with the data? Something wrong with the implementation? Or did the universe...
F
...change when I tried to reinstall the exact same version of TensorFlow? One alternative is that you essentially make your own Docker image and ship that image to all of your consumers, but that still begs the question: how are you installing TensorFlow inside that Docker image, and how are you getting reproducible Docker images or Dockerfiles?
F
It's still really complicated, and we did not want our data scientists to concern themselves with that either. So for every single step you can also specify the dependencies that you need, and what Metaflow will do is create a dependency lock file for you behind the scenes; it will snapshot all the dependencies and store them.
F
So even if, let's say, TensorFlow 2.5.0 and all of its transitive dependencies disappear from the upstream package repository, you will still be able to recreate this execution environment — that's something Metaflow will guarantee for you. So you can lift and shift: if it works on your laptop, you can essentially get it working on Argo, Step Functions, Kubernetes, AWS Batch...
F
Wherever you want to execute your compute. Now we are at a spot where you have a workflow that's running reliably; we have frozen the execution environment; user code is already frozen; and you only have to monitor for data drift, in case something is broken. Now comes a scenario where you have reliably deployed one workflow...
F
Your business stakeholders are happy, and now they want you to work on different variants of the same model — maybe experiment with different features. So what you may want to do is run isolated variations of the same workflow in parallel, to try out different models. Now how do you do that?
F
Because if somebody just copies over your workflow and executes it again, there is a strong likelihood that it will just overwrite results somewhere you did not intend. With Metaflow, everything is very strongly namespaced.
F
So you can essentially create a project, create multiple branches of different projects, and run them reliably with a strong guarantee that none of the results are going to be overwritten inadvertently. What that means is: if you have a new data scientist on your team, they can very easily fork your project and just start executing and experimenting on top of it, without stepping over the production runs.
F
Okay, yep — but things will still fail; that's how things happen all the time. You deploy something on Argo Workflows, everything is working fine, data changes, something breaks. There are some failures that you want to handle gracefully — so you may want to retry a bunch of times — or maybe certain failures...
F
You just expect them to happen, and you may want to catch those failures and move on to the next step; Metaflow provides you with that capability. But then, many times you may want to actually dig into what really went wrong. Let's say you had a workflow that was executing on Argo Workflows, and midway through, that workflow failed.
F
Now you may want to replicate that failure on your workstation, change the code a little bit, and then resume that workflow from where it collapsed.
F
So that you can get the confidence that for any changes you're shipping, you're able to recreate all of that quickly. And once you've figured out the right fix, you can again deploy your workflow back to Argo Workflows. So we need to complete that loop of prototype to production and back to prototyping and debugging, which is what Metaflow makes super simple.
F
G
Now it's demo time, so let me quickly get you to that. Let me share.
F
All right — let me walk you through one of the workflows that I've written. This is a very simple foreach workflow. In this workflow I have a start step, and in the start step, let's say, I have this list of movie titles, and I'm running step a on every single element of this titles list — I'm specifying foreach over titles. And in this, very simply...
F
What I'm doing is assigning self.title — the given title I'm processing, let's say "Stranger Things" — and just printing it here, and then I'm joining. In the join step I collect all the titles that were individually processed, and then in the end step, let's say, I'm just printing the results.
F
So what I can do very simply, and just one note here: you can see over here how people can access the inputs. For example, in the join step I assign self.results, and because this is just plain Python, in the end step I naturally expect access to self.results as well.
F
And this run command will essentially run my workflow: it assigns a run id and runs all of these steps. So, for example, I expect three copies of step a to execute, so I should see three of those print statements, and then just one copy of the end step, and I can see that over here.
F
There are three different pids for the a step and just one single pid for the end step and the join step, so Metaflow is just running some subprocesses on your instance. You can also very simply tag any step with @kubernetes or @batch, and we'll just lift and shift that particular step onto Kubernetes for you. And then you can also ask, let's say: what is the data that I saw?
So I can just do this, and I get access to the data over here. I can also look at step a's tasks, and it gives me all the tasks that executed for this step, and I can click into, say, the first task.
F
Now I can also execute the entire workflow on top of Argo Workflows. So I can just do, let's say.
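Deploying to Argo Workflows is a built-in Metaflow CLI command; assuming a flow saved as `my_flow.py` (an illustrative name):

```shell
# Compile the flow into an Argo Workflows template and deploy it to the cluster.
python my_flow.py argo-workflows create

# Kick off an execution of the deployed workflow on the cluster.
python my_flow.py argo-workflows trigger
```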
F
Okay, so I've shared my entire desktop, so you can just go to Argo Workflows and you will essentially see this foreach flow executing. It will execute all of these steps one after the other, and you get all the benefits of all the nice little utilities that Argo provides; all the logs are available within that.
F
So, for example, now you have three of these steps running, and right before this I actually ran one of the workflows. For example, this is the foreach workflow that I just executed, and you can essentially look at the logs for every single step. So from an Argo point of view it's a native Argo workflow, but from a Metaflow point of view it's a native Metaflow flow as well.
F
So you essentially get the benefits of both universes, and you can run super complicated workflows as well. You can have dynamic nesting within a workflow; this is one workflow that I ran.
F
So we have the start step that leads to a foreach, and then we run an a step, then there's a linear transition to a b step, and then we have a split step, or branching, into a step d and an e step, and then that step by itself runs yet another map task.
F
So we have three different iterations of it, and then everything joins back. So that's pretty much where we are at with our integration, and yep, that's pretty much it.
B
Awesome, thank you, Savin. We do have a few questions in the chat; the first one's from Jesse. I don't know if you want to ask your question directly, Jesse?
F
No, I mean, that's the beauty of the advances in networking that AWS has shipped. I'm happy, after this call, to share some of the work that we have done in that regard. There's already a component in Metaflow that you can use independently of the rest of the ecosystem, if you're curious to benchmark it. But essentially what's really happening is that if you look at the boto3 S3 client, it doesn't maximize your network throughput, so we essentially just run multiple instances of boto3 to fetch data, to get the most performance that we can, and that network card has a lot more throughput than any volume that you attach to your instance.
A
Cool, okay. And my second question was: you showed two examples, one using the Kubernetes adapter, or the integration, and then the Argo integration. Since Argo Workflows is Kubernetes, what's the difference that they provide?
F
So what you saw me executing through Metaflow was all the Python library. When you're executing your Metaflow workflow locally on your laptop, at that point it's just that Python library running a topological sort and running one step after the other, and people may want to farm out compute for some steps of their workflow to Kubernetes. But then there comes a point where you want to just run this workflow every single day; you do not want to wake up in the middle of the night and schedule that execution from your laptop.
F
So at that point we would want to lift and shift that entire workflow and run it on top of, say, Airflow, Argo Workflows, or Step Functions.
A
Between the way Metaflow with Kubernetes works versus Metaflow with Argo?
F
So we have tried to make sure that all the functionality you get with Metaflow with Kubernetes is the exact same set of functionality you get with Metaflow with Argo Workflows on Kubernetes. The additional functionality you get with Argo Workflows on Kubernetes is that a lot of work has gone into making sure that Argo Workflows itself is fault-tolerant and highly available, so we do not have to redo that engineering work.
F
Yeah, so you can essentially run your entire workflow on Kubernetes as well, or directly from your laptop. But when it comes to scheduled executions or asynchronous executions, at that point we would just want to offload it to some other service that can take care of that for us. One integration that we want to execute on is integrating with Argo Events, so that, let's say, when some data appears in your data warehouse, that can trigger a bunch of events that a few machine-learning workflows may be waiting on. Or, if you want to, say, string together multiple different machine-learning workflows, then we can have some event-based triggering for that and achieve it that way.
F
Yeah, yeah, great question. When it comes to Metaflow, what we are building is an end-to-end platform, so people come to us with a variety of different needs. There are folks who are looking in the market for an experimentation-management solution, and given that Metaflow provides that capability out of the box, many tools in that universe see us as competitive.
F
Then, of course, there are similar workflow-orchestration, or ML-focused workflow-orchestration, tools too, so that could be yet another comparison that's usually made. Historically, what has happened is that we have been compared quite a bit with Argo Workflows, Step Functions, and Airflow, and with these integrations we want to dispel that notion: rather than being competitive, we are very complementary.
F
I mean, we do not want to reimplement our own workflow orchestrator, and maintaining and deploying a workflow orchestrator isn't really that straightforward either. And many organizations may have other use cases for a workflow orchestrator beyond machine learning, so we are.
B
Okay, Brent, should we give Savin a round of applause?
B
Well done! Thank you. Okay, so that's the end of the presentations for today. Our next community meeting will be on the 20th of April.
B
We will have a demo of Plummer from Eric Meadows. I'm actually not sure which organization he's from; I guess we'll find out next week. If you're interested in presenting at a community meeting, or you'd like to get involved in more community stuff, such as blog posts and the like, do let us know. You can drop us a Slack message; we're super keen for people to do more blog posts and so forth.
B
And it's a lot easier than you think, so do let us know. If you have any more questions, of course, jump into the Slack channel and ask them, and the recording for this will be available before the end of this week. Okay, thank you very much, and all have a lovely day.