From YouTube: OpenShift Commons ML SIG Meeting, March 1, 2019. Full MLflow introduction and MLflow Operator demo.
Description
OpenShift Commons ML SIG Meeting
March 1, 2019, full meeting recording
MLflow introduction and MLflow Operator demo
Databricks: Mani Parkhe
Red Hat: Zak Hassan and Diane Mueller
A
All right, there will be a couple of small files there when I look at the recordings. So today I'm welcoming everybody back to the Machine Learning on OpenShift SIG. My name is Diane Mueller; I am the co-chair of this regular SIG meeting, and today we have a couple of speakers lined up to talk about MLflow and an MLflow operator that's being worked on. If you have a chance, there's a link in the chat to the meeting notes that you can see on your screen right now.
A
If you can add your name to the attendees, that would be great. As I was mentioning to Mani and Zak before I hit the record button, one of my goals for this meeting is to figure out what the next steps are in terms of the MLflow operator. So I'd like to invite Mani to take the screen over now and explain what MLflow is and give us some context for it, and then Zak, who has been working on an MLflow operator in his own repo.
A
We'd like to get that work moved into a space in the MLflow repo where other people can collaborate and work on it, if the maintainers are willing and if that's the appropriate thing to do; that's kind of what I'm trying to suss out today too. And for those of you who don't know, yesterday we launched something called OperatorHub.io, where, one day in, there are now about a dozen operators that run on generic, standard Kubernetes as well as on OpenShift.
B
All right, hello, everybody! Good morning. My name is Mani Parkhe; I am an engineer at Databricks working on MLflow. I'm here to talk to you all about what MLflow is, the motivation behind building this machine learning framework, what the components of MLflow look like, and where we're going to take it. I'm going to introduce all of this in the next 15 minutes or so as a preamble to Zak's presentation. I'm sure everybody on this team appreciates how complex machine learning development is, and also appreciates that it's slightly harder than traditional software development as we've practiced it in the last few decades.

B
To tease this out a little bit, I want to talk about some differences between traditional software development and machine learning development. Let's start with the problems in each. Take a problem in traditional software: say you're building a credit card transaction system, or a functional verification system, or anything like that. You start with a functional specification.
B
You know exactly the terms and conditions and what product you're trying to build, so the goal is pretty clear. In machine learning, the goal is to optimize a metric, so there is no perfect answer; you just try to get better and better. The metric could be increasing accuracy, or it could be a vector of different metrics that you're trying to optimize. The other difference is quality: in traditional software, quality comes from the code and the system.
B
When we go to machine learning, there's not only all the code and the system, but also the data that we use for training the models, how well the model is tuned, and how we regularly continue to update it with fresh data. We have to tune the models, use different algorithms, and so on and so forth. So quality is a shifting goal and a moving target, just like the goal itself that we talked about. The third thing I want to talk about is the development stack in traditional software.
B
You develop off of a common software stack, something the team has worked with in the past and released different projects on, so you're pretty well acquainted with it. Machine learning, on the other hand: you want to constantly experiment with the new libraries that keep coming out, different frameworks, different algorithms, very different types of models. And not only do you experiment with them, you must also know how to productionize them; just having a model built with a new framework is not good enough.
B
And then you have all of these different frameworks where you can deploy, and using all of this presents its own challenges. Typically, machine learning practitioners are incentivized to use different algorithms and different frameworks in order to get the best model quality that we've been talking about. So if optimizing metrics is the goal, then you pretty much end up trying out various different tools, and on top of that, just being able to train and create a model is not sufficient.
B
We all know that using the right parameters for tuning is quintessential to getting the right model. It can make all the difference between guesswork and using the right set of parameters to get the ideal results you've been looking for in product development. And beyond that, as we talked about, deploying these models and managing the lifecycle of each of them becomes a challenge in itself.
B
You may have a team of data engineers working on the ETL and logs part of it, a set of data scientists working on the training aspect, and a set of systems engineers on the deployment part of it. So you have to work through the lifecycle of a model, how it passes through these different teams, and the governance of it.
B
So this, in itself, is what we have seen to be an extremely complex process. There is not a single tool out there that makes it easy; in fact, you end up having to use multiple tools that you and the team are comfortable with. So as we started looking at this, we built MLflow as an open platform for users to manage their models and machine learning frameworks, and we wanted it to work with all of these existing tools.
B
It makes it easier to work across the stack of these various tools; it is not necessarily one of those tools itself, but it works with a lot of them to solve the problems that have not been solved. So, moving on: what is MLflow, in three quick bullets? MLflow is an open platform that helps you manage the machine learning development lifecycle, and it does it in three ways. One: we have lightweight APIs to be able to work with any ML library.
B
As we talked about, everybody has their favorite library to use, or in fact you want to try out different libraries, and they could be in different programming languages. The key insight we saw here was that data scientists and machine learning practitioners don't want to get locked in to one particular library or one particular language.
B
You want to be able to use any kind of language, so we built MLflow with an API-first approach. You can talk to MLflow's components using the REST API, or Python, Java, and R; they are all built on top of this basic REST API and let you interact with MLflow. The second thing we wanted to do was make reproducibility of runs a primary concern.
B
So, for instance, you typically train your model on your local machine, and then you want to make sure it reproduces the exact same results when you run it on any kind of cloud platform. What makes this hard is getting that reproducibility; again, if you send your code to some other engineer on your team, they should be able to do it the exact same way.
B
Having your ML runs execute the exact same way on any cloud was another of the principles behind building MLflow. And then, finally, simplicity and ease of ramping up was the other goal: it should be useful for one engineer in an org, but also easy to scale up to a thousand people or so. Scale and ease of use were also thought through when we designed it. Okay, so from here: MLflow supports three different components.
B
To begin with, one is the MLflow tracking server; we're going to spend some time talking about that. It's essentially a centralized repository to store all the critical information that is generated from, and required to generate, your machine learning run. This could be the parameters and configs going in, and also the metrics that come out. And then it has mechanisms to query all of that, so it's like a database for all your machine learning runs.
B
The second component, MLflow Projects, is a code packaging format targeted at making your runs reproducible when you run them on any cloud platform, or at any point in time later on. It's a way to store everything such that you can guarantee you are reproducing those runs anywhere, and anybody else can do it as well. And the third component is MLflow Models.
B
MLflow Models is a generic packaging format for models, such that models once written out by MLflow can be deployed across a variety of production platforms, whether in a real-time scoring format, batch scoring, or streaming platforms. So those are the three big components, and I'm going to spend a few minutes talking about each one of them.
B
So let's jump right into the key concepts of MLflow Tracking. We talked about tuning your machine learning algorithm, so parameters, which could be numeric or string parameters, are key to making sure you get what you want out of your machine learning run. Data scientists try out thousands of experiments with the same algorithm.
B
They do that to get the right results they are looking for. So one of the key concepts is a dictionary of parameters associated with your particular run. Another: after your run is done, you generate numeric values by scoring your model on test data, so the metrics, accuracy, error, and so on and so forth, become another key concept that the engineer might want to keep track of in this consolidated repo.
B
Finally, if you start looking at your models: the model generated from the training run could itself be an artifact that you might want to store away for tracking or governance, or even to use for scoring later on. Along with that, you want to keep track of metadata around the run: how the model was created, what source code was used, the exact version, maybe the Git artifact ID or Git hash.
B
These are some of the interesting things you may want to store, along with your training data. And then, finally, there could be some text you might want to store, some document capturing the details of your model, and for that we have high-level tags or notes that you can record for your training run. And then at the bottom right you see a screenshot that shows how this would appear in the MLflow UI.
B
This is the query layer I was referring to, where you have the ability to show all your runs and then, using a SQL-like syntax, query those runs by metrics and parameters and slice and dice your data based on specific metrics. For example, you can say metrics.accuracy > 0.98, and so on and so forth. Okay, so how do people use this consolidated storage?
B
Your MLflow tracking server, as I said, is like a database, and you could have various people writing into this database for each training run, either through hosted notebooks, a local app running on your machine, or a cloud job. They're writing to this centralized database through the REST API or the Python API, and that's how data is stored, as we'll see. And then we have built a UI layer, and there is an API layer, for querying that database.
B
In this picture we've abstracted away the database. When we first released MLflow we started off using a file-backed store, and now there are various different stores; we'll soon be releasing a SQL-backed store to hold all the metadata around runs, and you can query it using the UI we've built or using the Python API. All right.
B
Finally, what does the lightweight API I was talking about look like? It starts off very simple: you start a run, and then you can record the parameters that are important for that particular run; you can see that highlighted in green here. Then you paste in your training algorithm, and at the end of it you compute some metric based on some test data, and you can log that metric. You can also log an artifact, or something higher level.

B
You can log the model or some high-level artifact, for instance a plot of some metric data, and that can be logged as well. In this particular example, the model happens to be a TensorFlow graph, so we are logging it as a TensorFlow graph, and MLflow has packages within it to understand that this is a TensorFlow graph and write it out as a model. So any tool that understands a TensorFlow graph can use that for deployment.
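The pattern Mani describes, starting a run, logging parameters, training, then logging metrics and the model, looks roughly like the following minimal sketch. This assumes MLflow is installed (pip install mlflow); the parameter names and values are illustrative, not taken from the talk's slide.

```python
import mlflow

# Illustrative hyperparameters; real values come from your own training setup.
params = {"alpha": 0.5, "l1_ratio": 0.1}

with mlflow.start_run():
    # Record the parameters that matter for this particular run.
    for name, value in params.items():
        mlflow.log_param(name, value)

    # ... your existing training code goes here ...
    accuracy = 0.97  # placeholder: compute this by scoring on your test data

    # Log the metric computed on test data.
    mlflow.log_metric("accuracy", accuracy)

    # The trained model itself can also be logged as an artifact, e.g. for
    # a scikit-learn model: mlflow.sklearn.log_model(model, "model")
```

Everything logged inside the `with` block is recorded against that run on the tracking server (or in a local `mlruns` directory if no server is configured).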
B
So with a very simple API that surrounds your existing training algorithm, you can now very easily use a tracking server, and whatever happens during the run gets logged to it. All right, so that was Tracking. Now let's talk about Projects. As a reminder, this is a code packaging mechanism to make it easy for you, or anybody on your team, to reproduce your runs on any cloud platform, any time later.
B
So what do we need to be able to do that? In this packaging format, we obviously need to include the code needed for training. You may also need some configs and some pointer to data. This effectively constitutes what would be needed to reproduce the run, whether you're running it locally on somebody else's machine or running it on some remote cloud.
B
But what exactly does this packaging format look like? Here's an example. It's essentially a directory structure containing a file called MLproject, and if you look at the file (this is a very small version of it), right off the bat it tells you that it needs a conda environment. This is the conda environment the data scientist used to train their model, and its contents are a YAML file included within that directory structure.
B
And then, if you look at the next thing, it defines one entry point, a main entry point, for running this project, and it says: okay, you need these parameters, which are a path for the training data and a float parameter called lambda, defined here with a default value of 0.1. So how does MLflow know what to do when it sees that entry point?
B
It says: okay, run a simple Python command, main.py, and give it the training data and lambda. And obviously the code, main.py, is there too, and there could be other dependent modules included alongside it; this is the code you have written to create your training run. So, pretty simply: when you write a training algorithm, you include it within that directory structure, create a simple MLproject file, capture your conda environment, and boom.
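Putting the pieces Mani lists together, an MLproject file along the lines he describes might look like this sketch (the file and conda-environment names follow MLflow's conventions; the exact contents of the talk's slide are not reproduced here):

```yaml
# MLproject (a plain YAML file at the root of the project directory)
name: example-project

# The conda environment the data scientist trained with, captured as YAML.
conda_env: conda.yaml

entry_points:
  main:
    parameters:
      training_data: path
      lambda: {type: float, default: 0.1}
    command: "python main.py {training_data} {lambda}"
```

The directory would also contain conda.yaml, main.py, and any dependent modules.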
B
Anybody is then able to reproduce your run using a couple of simple MLflow commands that take your existing code, use their data and their parameters, and create a run, or reproduce your run exactly. How does somebody do that? Through the command line, with mlflow run followed by a Git repo URL or a link to a local directory structure. If you want to do this programmatically, there is also a Python API to do that.
B
So let's take an example where you have created a model, it could be a TensorFlow graph model, in your local environment, and then you want to write it out in such a format that it can be loaded on, let's say, Amazon SageMaker, and also in some other kind of environment that runs Python.
B
This would be a way of saying that the same model used by one team for, let's say, batch scoring is also used by another team for some real-time scoring, and both use the same model written out by MLflow. So what does this look like? Again, this is a packaging format, and it will be very similar to what you've seen in Projects. It is an MLmodel file, but it has a few extra things.
B
Any platform that can run a TensorFlow format would be able to utilize this model and make it executable, and then any environment that uses Python can use MLflow's TensorFlow module to load this model and execute it like it would any Python function. Again, it all depends on making sure that all the dependencies are in place.
B
So how can you learn more? This was a little flavor of what MLflow looks like; there is a lot more in terms of documentation and training examples, if you want to do some type of hyperparameter tuning, multi-step workflows, or batch scoring. You can start off by installing MLflow through PyPI, or you can download it through the GitHub repo and try it out. The key place to go would be mlflow.org, which hosts the website and has all the examples.
B
Okay, in the last few minutes I just want to talk about the open source community growth and how MLflow, which started about eight months back, has been well appreciated and embraced by the community. These are a few quick metrics I wanted to use to show that: it started months ago and was released during the Spark + AI Summit in June.
B
Look at mlflow.org, where there's a Google Form for a survey. We want to know if you have used MLflow and, if you have not, what sort of things you are looking forward to. We want to know more about the requirements you're looking for and what would make it easy for you to use MLflow.
B
Currently we are at version 0.8.3, and as we start going toward 0.9 and 1.0, we're thinking about what to add. In 0.9 we'll be adding a SQL-backed store for the tracking server, which is one of the most frequently asked requests, and then we're adding Java and Scala APIs and a lot of UI scalability work.
B
A lot of requests came in around: okay, I want to use a model, I'm going to save it, and then I want to inject some custom code. It could be querying some form of feature store, or it could be transforming the features a little bit. So that sort of custom logging and custom code is something that is being worked on actively and is slated for release in 0.9 and 1.0.
B
You can already do that using the MLproject file, since you can have multiple entry points, but would it make sense to have a UI to edit your multi-step workflow? And then we're looking at adding some telemetry components for logging data and metrics and taking you back into some analytics tools. We're looking for feedback around that, so again, another plug for the survey on mlflow.org.
B
We want to know how you use MLflow, your libraries and frameworks, and what you would like us to add as integrations in MLflow. So that's the survey. Thank you all for your time. I want to again point you at mlflow.org; join us on Slack, or there's an email thread you can join. And thank you for giving me the time to talk at the summit. If anybody wants to come to Spark Summit, here is a discount code I wanted to share with you all.
A
Well, that was great. Quick question: what are the dates for the Spark + AI Summit?
C
What are these things called hyperparameters? As Mani pointed out in his presentation, when we're creating machine learning models we may want to do some hyperparameter tuning. When we're creating ML models, you'll be presented with different ways to define your model architecture, and in the beginning you don't know what the optimal architecture should look like for your model.
C
You would need to do multiple runs and test your model with different parameters to find the optimal model architecture. So the parameters that define the model architecture are called hyperparameters, and the process of searching for the ideal model architecture is called hyperparameter tuning. Let me tie this back to our use case.
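The search loop Zak describes, trying multiple parameter settings and keeping the one with the best score, can be sketched in plain Python. The scoring function here is a stand-in for a real train-and-evaluate step, deliberately rigged so that lr=0.1, depth=4 scores best:

```python
def evaluate(params):
    """Stand-in for training a model with `params` and scoring it on test data.
    This toy function peaks at lr=0.1, depth=4."""
    lr, depth = params["lr"], params["depth"]
    return 1.0 - abs(lr - 0.1) - 0.01 * abs(depth - 4)

def grid_search(grid):
    """Try every combination of hyperparameters and return the best one."""
    best_params, best_score = None, float("-inf")
    for lr in grid["lr"]:
        for depth in grid["depth"]:
            params = {"lr": lr, "depth": depth}
            score = evaluate(params)
            if score > best_score:
                best_params, best_score = params, score
    return best_params, best_score

grid = {"lr": [0.01, 0.1, 1.0], "depth": [2, 4, 8]}
best, score = grid_search(grid)
# best == {"lr": 0.1, "depth": 4}
```

Each inner-loop iteration is one "run" in MLflow terms; logging `params` and `score` to the tracking server is what lets you compare the runs afterwards.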
C
Let's say we have an unsupervised machine learning problem and a machine learning technique such as k-means. You might ask: what is k-means? K-means is used to partition data points into K clusters. Let's suppose K is 3, so we have three clusters, and we have ten data points. What k-means does is take the features of the ten data points and assign each point to cluster one, two, or three, and the data points that are similar will be grouped under one cluster.
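Zak's description, ten points partitioned into K=3 clusters by similarity, can be sketched with a deliberately tiny one-dimensional version of Lloyd's algorithm (the points and starting centroids are made up for illustration):

```python
def kmeans_1d(points, centroids, iterations=10):
    """Tiny 1-D k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = {i: [] for i in range(len(centroids))}
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in clusters.items()]
    return centroids, clusters

# Ten data points that visibly form three groups, and K = 3 starting centroids.
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 5.1, 9.0, 9.3, 8.7]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0, 10.0])
# Similar points end up grouped under the same cluster:
# cluster 0 holds the points near 1, cluster 1 near 5, cluster 2 near 9.
```

A real workload would use a library implementation (e.g. scikit-learn's KMeans) on multi-dimensional features, with K itself being one of the hyperparameters you tune.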
C
So that's more on the use case. But the ML lifecycle is a complex thing, and we wanted something off the shelf, open source, that we could pull in. We're running on OpenShift, which is our distribution of Kubernetes, so we wanted something that runs on OpenShift to do things like training the model, doing hyperparameter tuning, building, and versioning. We found MLflow to be a great tool that has all of these great features, plus experiment tracking.
C
So we have OpenShift here, and we have our operator here, the MLflow tracking operator. I've already deployed it; we have one instance of the server, and we have MLflow here, already running. That one instance is pointing to an S3 bucket called zak-hassan, and that S3 bucket is here. Okay. So what I'm going to do is run this code here, which is the standard example from the MLflow website, but I'll just point you to where the code lives.
C
The code is here; it's just a standard example, and it's using the same MLflow APIs: it's logging the parameters, it's logging the metrics, and then it's storing the model into the server, and it's choosing different parameters so we can compare three different runs. But without further ado, this is our UI. We're going to refresh, just so there's no smoke and mirrors, and we can see there are no runs in there right now.
C
So if I go ahead and do a run: we already did a build of that Git repo, it's already here, and we're going to inject a secret. That secret is here; since we don't want to ask users for their AWS access key and secret key, we want to keep them in Kubernetes secrets, and then these secrets get injected into the container. Okay.
C
So, just to wrap this up: what's next? The next thing we did was create a PR in the MLflow repo, and, like all things open source, it always starts with a PR. We have a PR in MLflow, and we're contributing this operator to the MLflow community and collaborating there. And that's pretty much it.
A
I'm muting everyone; you are all now free to unmute yourself and ask questions. I have one question for Zak about your operator, and that is: are there any dependencies on the OpenShift distribution of Kubernetes, or could this just run nicely on any standard Kubernetes distribution?
A
So what are the next steps that you need? I would love to see this, by the way, because I'm totally biased, and I put the link in there to contribute it to OperatorHub.io. So if you can take a look at that link and see if we can get it in there anytime soon, that would be awesome from my point of view, and it would give some more visibility to MLflow and what the folks at Databricks are doing. And I'm just wondering: is this ready?
C
I'd like to, first off, have more rigorous testing and run it on more Kubernetes environments, perhaps Google Cloud and other cloud providers, and see how it runs there, and probably have some documentation on how to run it on Google Cloud, IKS, and Azure Kubernetes. It should all run perfectly fine there.
C
I also want to get feedback from the MLflow community, where there are multiple users, and have them try it out, maybe run it on Minishift or Minikube, and get feedback there, and then we can continuously improve it and make it better. So I think it should be ready to use today. Okay.
A
Well, that's good news, and I think if we can get it into OperatorHub.io, and maybe, Mani, if we can send an email out with the videos from today and the links to the mailing list, we ought to be able to get you some more eyeballs on the codebase, and maybe you won't have to be doing all of the testing yourself. That's one of my hopes here.
D
Yeah, hi, this is Hema. I'm currently working with the AI CoE team; I had a conversation with Zak as well about the different use cases for MLflow, particularly the machine learning aspect. As a data scientist I was testing and trying it, so I just had a couple of questions, Mani.
D
It was great to know that there are new versions of MLflow coming with some of the features that you mentioned. The questions I had came from testing out MLflow. Let's say we have three or four data scientists working on and testing different kinds of models: one is running k-means, someone else is running k-means++ or some spectral clustering.
D
So when we run the MLflow server, which is one server shared by multiple users, is there a way to identify runs if we're all pushing to one experiment? Is there a way to analyze this, or do we all just push under one particular experiment ID, or give it a name?
B
That's a really good question, Hema; I can answer it a couple of ways. An experiment, if you start thinking about it, is your way of organizing different runs. So let's say all four or five data scientists are working on the same project, you want to work off of each other's runs, you want to share your data with each other, and it's all related to one exact problem you're solving. Then it actually does make sense to wrap them up as one experiment.
B
Then you all submit your runs to the exact same experiment, and they will be recorded as different runs. If you look at it, runs have unique IDs, and in the UI they will be recorded with the specific user ID of whoever recorded them, so it's very easy to do that.
B
It might get a little harder to compare across those, but if you really need them all together, MLflow supports that. If you prefer to have them as different experiments, you can just create different experiments and store them that way. So, for example, let's say all of you are working on some area, like tracking, and you want two different experiments for two different classes of models: say one is for deep learning models.
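In code, the two ways of organizing runs that Mani describes look roughly like this sketch (assuming MLflow is installed; the experiment names are made up for illustration):

```python
import mlflow

# Option 1: everyone logs to one shared experiment; runs stay
# distinguishable by their run IDs and recorded user IDs.
mlflow.set_experiment("shared-clustering-project")
with mlflow.start_run():
    mlflow.log_param("algorithm", "k-means")

# Option 2: separate experiments per class of model.
mlflow.set_experiment("deep-learning-models")
with mlflow.start_run():
    mlflow.log_param("algorithm", "cnn")
```

Each `set_experiment` call creates the experiment if it does not exist, and subsequent runs are recorded under it.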
B
If it's a traditional model, it goes to the other one. So it all depends on how you want to use it; MLflow supports all of this, and as I said, it is designed to scale across a lot of users. In fact, a lot of users of open source MLflow have it set up such that the server runs on an EC2 instance, the data is stored on S3, and dozens of people in the organization report data to the same server.
A
I'm just looking to see if there are any other questions; we're almost at the top of the hour. I always love it when I think I'm giving everybody 15 minutes, but you just blow it out of the water and use the time wisely. So I'm going to thank everybody for their time today, and there are a couple of things I'm looking for people to add into the notes here.
A
So if you can link in your slides, or send me a PDF version of them, I'll send it out through the mailing list as well as post it in a blog on OpenShift.com with the video. And if there's a topic that someone listening today wants to talk about at our next meeting, which is on April 5th, please just reach out to myself, dmueller at redhat.com, or via the mailing list, and we'll add you to the agenda for the next one.
A
Of note, there are a couple of things coming up. Okay, those are all just thank-yous; you're all welcome. I'm really thrilled to host this, because I learn something new every day with the stuff the guys are sharing. As I mentioned at the beginning, on March 11th there's going to be an OpenShift Commons gathering, where NVIDIA is coming to talk about their inference work on GPUs and deep learning, and Uber is coming to talk about their M3DB operator. There's also going to be a hands-on operators workshop in the afternoon in a second room, a parallel track. So if you're in the area, come for that as well. I'll also add in the link here for the Spark Summit, and thank you, Mani, for the Databricks discount, and for any other events coming up.