Description
Join us for Kubernetes Forums Seoul, Sydney, Bengaluru and Delhi - learn more at kubecon.io
Don't miss KubeCon + CloudNativeCon 2020 events in Amsterdam March 30 - April 2, Shanghai July 28-30 and Boston November 17-20! Learn more at kubecon.io. The conference features presentations from developers and end users of Kubernetes, Prometheus, Envoy, and all of the other CNCF-hosted projects.
KubeFlow BoF: David Aronchick, Microsoft & Yaron Haviv, Iguazio
https://sched.co/PiUF
A: Before I begin, can anyone volunteer? Wow, that is loud; we turned it on just a little bit. Would anyone volunteer to just write down questions? Usually I like to do these very open-ended and just get a bunch of questions, and then we can republish them later; we'll publish them to the Kubeflow discuss list and people can talk about it. Thank you, you always step up. Josh.
A: Okay, well, let's get started. This is the Kubeflow birds of a feather. Thank you all for coming. I like to actually not have any agenda whatsoever here. The idea of a birds of a feather is that we're just supposed to get together and talk about what issues we're having or what questions we have. This is actually a merging of two different birds-of-a-feather sessions that we're trying to get together right now.
A: There's the Kubeflow side, where I'm happy to answer questions, but then also we're doing a cross-project effort around metadata, seeing what we can do to bring Kubeflow metadata to other platforms, and we can answer questions about that as well. The metadata folks, let's make sure to loop up afterward and we can talk about what we should do. But let's start it up: who's using Kubeflow? Who has questions, comments, points of view, things you'd like to bring up? Who's using it in production, good experiences? What do you got?
B: My question is: what is on the roadmap for Kubeflow in the sense of managing these new challenges, the architectural and process ones related to data science? I'm an ops guy, but I'd like to know what the future architecture will be: storage, how to deploy. For example, there's a part where you can create a pipeline from a Jupyter notebook, but it's not fully automatic.
A: The future comes tomorrow, I strongly suggest. The question was about Kubeflow and Kubeflow Pipelines: how do you think about moving these things to production? And I'm going to add on top of that: without forcing the data scientists to think through Kubernetes or operations or things like that. Let me recommend, very selfishly, two talks that we have tomorrow.
A: The first shows the ability to do all of those steps without having to understand Kubernetes at all; Fairing is the bridge behind the scenes that takes care of it, so in just a few lines of Python code they're able to do that. The second talk, which will be at 2:55, is one where I will be demoing MLOps, as we call it, which is machine learning operations derived from GitOps, and we will be demoing how to drive a Kubeflow pipeline from a git check-in.
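The git-driven flow described here can be sketched in a few lines. This is a hypothetical illustration, not the actual demo code: the handler name, the validation step, and the pipeline submitter are all assumptions standing in for a real webhook and a real Kubeflow Pipelines client.

```python
# Hypothetical sketch of the GitOps-style flow described above: a git
# check-in triggers validation of the model code, and if it looks
# correct, a Kubeflow pipeline run is kicked off.
import ast

def validate_model_code(source: str) -> bool:
    """Parse the checked-in code to make sure it is at least valid Python."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False

def on_git_checkin(source: str, submit_pipeline) -> str:
    """Validate the commit; if it passes, submit a pipeline run."""
    if not validate_model_code(source):
        return "rejected"
    return submit_pipeline()  # e.g. a Kubeflow Pipelines client call

# A stubbed submitter stands in for the real pipelines client here.
run_state = on_git_checkin("x = 1", lambda: "run-submitted")
```

The real system would do much deeper checks (notebook correctness, dependencies) before submitting, but the shape of the trigger is the same.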
A: So you'll do your git check-in of your model code; it will parse it, make sure it's appropriate, that the notebook is correct and everything is correct, and then it will kick off a Kubeflow pipeline right there that does your training and potentially even moves it out to production. And then, in the demo we have tomorrow, just to show what we see as a very common scenario, it hands that trained model off to an inference endpoint.
A: So instead of self-hosting on your Kubeflow cluster, you host it using something like Google Cloud Machine Learning Engine or Azure Machine Learning and so on, which is a very common scenario, because you don't necessarily want to host on the same cluster you're doing your training on. And we also have a Thursday session from McGraw Co; you should check out their stuff as well.
A: Thank you very much. And they're even extending that further, to go into serverless, using the exact same abstractions but serverless. So rest assured, we have heard your concerns loud and clear. We want to get there very quickly, but we're trying to be thoughtful about it and do this in a very production-ready way.
D: Hi Matt, I have a question regarding multi-tenancy. I want to add that there are lots of different technologies to do it at the moment in Kubernetes: we have namespaces and network policies, OPA, and so on and so on. And I know that Kubeflow consists of several tools, and honestly I don't know what JupyterHub is using or not using. So when do you think will be the time, or how will you find a good way, to have multi-tenancy and all the roles?
E: The plan is to leverage the functionality that's landing in Kubernetes to try to deliver the multi-user story. Obviously that story is still evolving in Kubernetes, so we're not going to be able to get very far ahead of Kubernetes, but mostly what we're looking at doing is using things like Istio to isolate.
E: You know, network traffic, so that you can't access somebody else's notebook if you shouldn't, and then using namespaces to isolate teams and users, and RBAC roles. So you're going to be somewhat limited by how well you trust those mechanisms, and as those mechanisms become more secure, you'll be able to do stronger tenancy isolation than you can today. But that's our current thinking, yeah.
A: Yeah, so that's it. But regardless, we have heard it. At the end of the day, again, we want this to be fully abstracted for a data scientist: they just go to an interface, they click a button, and they are provisioned, within whatever their namespace or their scope is, a Jupyter notebook that they can operate. So that is absolutely our goal. It's just a matter of when, and we can get to "when" faster if we have your feedback.
A: You're speaking my language. One of the biggest things we're doing around metadata right now is a cross-group collaboration working on exactly what you're describing. It's called KFServing; it's an open discussion for those that are interested in working together. This is people like Seldon, and then we have people from Google.
A: There will be a way, explicitly, through a standard set of APIs, to provision a model to be served in any of a number of serving formats, and you should jump on board. I know the team is working really hard to get there, but the long and the short of it is: yes, they are working towards a standard API around that.
A: So I know a lot about MLflow. MLflow doesn't standardize; what they do is abstract away the actual interface, and so when they do an MLflow deployment, they're doing the hard work to translate between what happens in MLflow and the ultimate serving stack. Unfortunately, that means a lot of the goodness from the provider of the serving stack doesn't come through, because there's no standard way for them to extend it or anything like that.
A: We would love MLflow's collaboration; they haven't seemed to do that yet, but we think with what we're doing with KFServing we're going to give you something that actually works across a number of different serving stacks and is supported, most importantly, by the vendors who will be doing the serving, rather than supported by the framework.
A: So again, we've talked to them many times. It's not malice; they're just moving very quickly and trying to figure it out, and from my understanding they're looking at doing Kubeflow integrations, but to date they haven't been interested in working with other open source projects.
A: On installation: look, we've heard that installation is not easy, and we would like to get much, much better at it. The reality is we moved away from what we started with: we initially were a bash script, and then we moved to this binary. I think we would love to consume whatever is the most flexible cloud-native way to do it at the end of the day, but we think that the Go binary the team has worked on is probably the best we've found so far.
A
That
said,
I
suspect
in
order
of
weeks
there
will
be
a
sample
Hjelm
chart
for
installation
out
there
and
other
things
like
that.
So
you
know
we're
again
we're
open
to
whatever
the
team
would
like.
I
think
that
what
you're
really
seeing
is
the
number
of
layers
that
for
a
long
time,
we're
merged
together
and
we're
really
trying
to
tease
them
apart.
So
you
say
like
well:
here's
me
provisioning
there,
the
hardware
or
the
Aya's
layer.
E: I think David's exactly right. kfctl is basically, I would say, syntactic sugar, trying to address a usability issue that we saw. Ideally we would just be using native installation tools, whether that's Kustomize or something else, but what we saw was that if you just write down the commands a user would have to run to actually do it, it was just too many. And so, as David said, we started providing bash scripts, and then moved away for various reasons.
A: I think you came late; we actually have a talk tomorrow about MLOps, but I'm going to steal the second point of what you said and talk about that for a moment. One of the things we want to leverage here is really thinking about metadata, and the Kubeflow team has done a great job integrating MLMD from TensorFlow Extended, which is the standard library for reading and writing metadata and things like that.
A: That's great, but that still leaves it as our job, or a number of different people's jobs, to define what the metadata is. Think of MLMD as the gRPC, simple way to read and write metadata; it's up to us to define the specs, and in the kubeflow/metadata repo there's already a great start. You should all jump in and think about what looks good there.
A: My proposal, which is coming soon, will be to figure out a way for the ML Commons effort, which is a very broad industry effort with people like Google and Cisco and Dell and Stanford all coming together, to establish some standards around ML specification schemas. Because if you do that, then writing portable ML pipelines becomes very, very powerful. Imagine, let's say, you're a Google Cloud customer and you're using Kubeflow.
A: If the metadata between those steps were standard, then doing that translation would be much, much easier. And the steps I'm talking about schematizing, the ones I think are really powerful, are, for example: when you finish your data processing job, you have a schema that describes what you did in your data processing, literally what the source data was, what the transformation was, what the statistics of the output are; and likewise after you finish training your model.
A: Perhaps you output standard statistics around that: what framework you used, what version, so on and so forth. When you want to package your model for production, you might want to output profiling information: this needs GPU, this needs CPU, so on and so forth. That's another bit of metadata. And then, obviously, how to serve this in production: how many regions, how many zones, what kind of machines, so on and so forth.
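The per-step schemas described above can be sketched as small records. This is a minimal sketch with assumed field names, not a proposed standard: each pipeline stage emits a typed record that a downstream consumer could read mechanically.

```python
# Illustrative sketch of per-step metadata schemas: data processing,
# training, and packaging each emit a small standard record.
from dataclasses import dataclass, asdict

@dataclass
class DataPrepMetadata:
    source_uri: str        # where the source data came from
    transformation: str    # what was done to it
    row_count: int         # summary statistics of the output

@dataclass
class TrainingMetadata:
    framework: str         # e.g. "tensorflow"
    framework_version: str
    accuracy: float        # standard output statistics

@dataclass
class PackagingMetadata:
    needs_gpu: bool        # profiling info for provisioning
    cpu_request: str
    regions: int           # serving topology hints

record = asdict(TrainingMetadata("tensorflow", "1.13", 0.91))
```

If such records were standard, the "translation between platforms" mentioned above reduces to reading and writing one shared shape.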
C: Okay, so hi, I'm Yaron, Iguazio CTO. We've been doing machine learning platforms for a living. We're also adopting Kubeflow in many of our solutions, but we've examined all the different solutions in the market, including the learnings of MLflow and other solutions, and we're trying to create a common denominator where we can work with the cloud provider tools, you know, Azure or SageMaker or whatever, our platform, et cetera.
C: So one of the themes for this session is that we want to present a way to deal with metadata. We also opened some GitHub issues around a proposal, to get some consensus from all the vendors as well as users on what this needs to look like. I think it's very critical for us to agree on that layer, and once we agree on that layer, then David has a lot of vision around building those, sort of, what I call schemas, or domain-specific metadata mechanisms.
A: I feel like I'm in an airport, but the net of your point is absolutely correct. First off, I cannot overstate that Kubeflow is the first public project to have set even the idea towards metadata, so they are setting the bar right now, and I want to have it plug in. All I want to do is take it slightly further than Kubeflow, so that other folks can plug in as well, because we see this need all over the place.
A: Okay, ksonnet? Yes, ksonnet. No, it's dead. So: we are a 0.5 project. We knew that we would make decisions, as part of being before 1.0, that may not have been the right ones, and ksonnet is on me as much as anyone. We needed a framework to allow for portability, one that could be parameterized based on the environment that you ran on. We looked at ksonnet.
A
It
looked
like
a
good
mapping,
unfortunately,
with
it
being
deprecated,
it's
just
not
a
good
fit
for
us
anymore,
so
we're
moving
everything
to
customize
which
is
native
to
kubernetes.
So
we
feel
much
more
confident
that
it's
gonna
be
a
long
term
bet
for
us
and
it
is
absolutely
part
of
our
p0,
but
ideally
everyone
in
the
community
to
help
us
translate
taste
in
it.
C: I'll take a few minutes of your time to talk about the metadata and experiment tracking problem and how we think we should go about it. Everyone is already familiar with the data science pipeline; I think one thing people tend to forget is that it's not always batch ETL. We're also seeing streaming and dynamic fetching of data and all that. And every step has some inputs, which consist of code and some other things, and some outputs, which consist of data and potentially notifications and metadata.
C: When you move into the inferencing or serving side, the more real-time side, there's also a different type of pipeline. Your experiment tracking is not a one-time thing: when you do a batch, you run something and you have a result; when you run something in inferencing, it could actually be real time-series data that keeps on pumping. So we need to think about those things as well when we create a common execution or metadata model. By the way, in my session on Thursday we'll demonstrate some of those things.
C: So there are many monitoring tools; I just listed a few of them. There's Uber's tool, there's MLflow, there's ModelDB, and I could list another five here. Each one is trying to do model experiment tracking, and you can see the commonality among them: they're all trying to record some metadata about experiments, some parameters, and some output values. So those are the typical experiment tracking tools. There are other tools from a different neighborhood, somewhat related, somewhat not: they deal with data versioning. So, in Kubeflow Pipelines:
C: We do have a mechanism for data versioning; it has some challenges, but it's there. But there are other tools like DVC, if you've looked into that, Pachyderm, Databricks with Delta, and I assume there are a few other tools in that space as well. One of the challenges, by the way, is that these are different types of tools: one tracks experiment outputs, one tracks the inputs and the data side. We want one mechanism to track both. The other thing is that I'm also really hands-on.
C: I like to code, and I do machine learning and all that, so I'm feeling the pain when it comes to dealing with secrets, you know, moving environment variables around. In my Jupyter notebook, or in my IDE, I don't have secrets. So how do we mesh with all those servers and still maintain security? At least we removed the minio123 password from the serving layer of Kubeflow.
C: Now it's an environment variable, but there is still an issue: if you have a serving server and different users accessing that same serving server, then all of them have the same credentials. So security is a challenge. Other things are runtime specific. When I need to run an experiment in my notebook, I typically just do shift-enter; I don't have any wrapper that wraps it and injects my different parameters. When I'm running a CLI, you can do it on the command line; then there's running it in a container.
C
I
need
a
way
to
pass
those
parameters
to
the
container
and
if
it's
in
a
workflow
I
need.
You
know
another
way
to
pass
it
across
steps
and
if
it's
a
service
function,
it's
even
per
invocation,
so
there
for
every
runtime.
We
have
different
ways
of
passing
metadata
and
passing
it
around,
and
it's
also
platform
specific,
whether
it's
a
google
cloud
or
haze
or
cloud
or
kubernetes
or
queue
or
data
breaks,
it
will
be
different
ways
of
doing
that.
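The problem described here, that every runtime passes parameters differently, is often hidden behind a thin resolver with a fixed precedence. This is a hedged sketch, not the speaker's prototype; the function name and lookup order are assumptions.

```python
# Illustrative resolver: look for a parameter in explicitly injected
# CLI args first, then in environment variables, then fall back to a
# notebook default. The data scientist calls one function either way.
import os

def resolve_param(name, cli_args=None, default=None):
    """Resolve a parameter with precedence: CLI > env var > default."""
    if cli_args and name in cli_args:
        return cli_args[name]
    env_val = os.environ.get(name.upper())
    if env_val is not None:
        return env_val
    return default

# In a notebook, cli_args is None and the default just works; a runtime
# wrapper would populate cli_args from its own injection mechanism.
p = resolve_param("batch_size", cli_args={"batch_size": 3}, default=1)
```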
C: So what we want to do is provide a common way of managing metadata, data, and results. It's not just focused on experiment tracking; I think we need a holistic approach to all of those, and I'll show you a prototype I did over the weekend. It's not productized, it's just to show you how you can go about that. What we want is to maintain execution metadata, inputs, and outputs, so the focus is on an execution.
C: What is an execution? An execution can be something that does training; it could also be a function that does inferencing. One of the things I think I told Jeremy is that we say "experiments" in pipelines, but no one said that an experiment cannot be a CI/CD pipeline. I can essentially create a workflow that's always alive, not a one-time run.
C: An experiment can essentially drive a CI/CD pipeline with exactly the same tool, so we may even need to consider changing "experiments" to some other, more generic term. We want to do it in a runtime-independent and platform-independent way, and the key focus is how we abstract that for the user, for those data scientists who want to write code. And potentially every vendor may want to build their own visualization tools; maybe the cloud guys want to store it in their own systems.
C: In BigQuery or DynamoDB or whatever, so we don't necessarily want to lock that in. Just like in MLflow there's an abstraction of the way things are stored, we want to keep some abstraction. So what do we mean when we say tracking those runs? The first thing we need is metadata tracking: every run has its unique ID, but it usually also has some name of a task or a job; it usually falls under some project or workflow, some parent above it; and it comes from some source.
C: The source may have come from git, with some hash key or some tag associated with it, and we can go on and on with metadata through labels and annotations, like every Kubernetes element. Obviously there is an owner associated, and maybe a group of owners. So that's the metadata. Then we have the inputs. We can define three or four types of inputs; one is parameters.
C: Another is data sets or data frames. And then we need to think about outputs; again, roughly three categories of outputs. I think we're confusing the term when we say "metrics" in Kubeflow: it's actually what I call values, or output values, which are singular values; this is the result of the experiment. A metric is more of an array or a time series.
C: That's essentially the same definition as in MLflow; in MLflow, a metric means an array. And if you're running inferencing, a metric may be endless, because it keeps on pumping data; every run, every invocation, it could be an ongoing time series. And then we also have artifacts, which could again be files, objects, and tables. There are also things like status: when did it start, when did it end, what's the current state of this execution, et cetera. So I think, in general, those are the types of things.
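The run record outlined here (metadata, typed inputs, typed outputs, status) can be sketched as one structure. Field names are assumed for illustration; this is not the prototype's actual schema.

```python
# Illustrative run record: metadata (ID, name, project, git source,
# labels, owner), inputs (parameters), and the three output categories
# (singular values, metrics as series, artifacts), plus status.
import uuid
from dataclasses import dataclass, field

@dataclass
class Run:
    name: str
    project: str
    owner: str
    source: str = ""                                 # e.g. git URL + commit hash
    uid: str = field(default_factory=lambda: uuid.uuid4().hex)
    labels: dict = field(default_factory=dict)
    parameters: dict = field(default_factory=dict)   # inputs
    values: dict = field(default_factory=dict)       # singular output values
    metrics: dict = field(default_factory=dict)      # arrays / time series
    artifacts: dict = field(default_factory=dict)    # files, objects, tables
    state: str = "created"                           # start / end / current state

run = Run(name="train", project="demo", owner="yaron")
run.metrics["loss"] = [0.9, 0.5, 0.2]   # a series, not a single value
run.state = "completed"
```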
C: Maybe you could add more, and as a community we want to do that work, and everyone will need to find a home for it. But once we do, then we need to start standardizing what we want as part of metadata and context, and then, using those semantics, allow a very simple approach for data scientists to go and consume and use that context. Then we'll figure out how we implement that, in Go and Python and whatever; that's the general idea.
C: We want to start this execution by getting some parameters. Now, the nice thing is that I want to supply some defaults, because if I'm running in my notebook and there is no runtime that injects parameters, I want this to just run, so I can edit it, go delete and change and run, et cetera.
C: The next thing you want to do is establish a context. Who generates the context, we'll talk about, but when I'm running something I want a context, and this context preserves all the state of inputs, outputs, and metadata. So again, I could have parameters, and I could override parameters from an external context.
C: I may want to have access to metadata, like in this case the name and UID, and you'll see that the parameters, once I print them, are not necessarily the ones I used as defaults, because maybe the runtime injected new parameters, because I'm going to run exactly the same execution just with different parameters. We want simple access to secrets; it could be an interface for getting secrets and using them, and we may pipe those secrets into resources like storage, and I'll show that in a minute. And then we want to access artifacts.
C: So a simple use of that mechanism would be to wrap it in a function that I can just execute: generate a context object and run it. That's one example, but we could do things that are slightly more sophisticated. Again, I did it as a weekend job, but it's the same code that we discussed, just with slightly more explanation, and I can run exactly the same thing from my notebook, just running this function.
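The context pattern demoed here can be sketched as follows. This is an illustrative reconstruction, not the actual prototype code: class and method names are assumptions. The point is that the function runs unchanged in a notebook using defaults, while a runtime can inject a context that overrides parameters, supplies secrets, and rebinds artifacts.

```python
# Illustrative context object: preserves the state of inputs, outputs,
# and metadata, and lets runtime-injected values override defaults.
class Context:
    def __init__(self, parameters=None, secrets=None, artifacts=None):
        self.uid = "auto-generated"        # the runtime would generate a UUID
        self.parameters = parameters or {}
        self.secrets = secrets or {}
        self.artifacts = artifacts or {}   # e.g. {"infile.txt": "s3://..."}
        self.outputs = {}

    def get_param(self, name, default=None):
        # a runtime-injected value wins over the notebook default
        return self.parameters.get(name, default)

    def log_output(self, name, value):
        self.outputs[name] = value

def my_job(ctx: Context):
    # with no injection this just runs on the default, shift-enter style
    p1 = ctx.get_param("p1", default=1)
    ctx.log_output("result", p1 * 2)

ctx = Context(parameters={"p1": 3})        # e.g. injected from the CLI
my_job(ctx)
```

Run with an empty `Context()` the same function uses the default; that is the property the prototype demonstrates.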
C: I won't show you the secrets, because those are my credentials to S3, and I can run the same thing now, injecting a context through the command line. So, for example, I'm going to run it changing the parameter, which, remember, used to be one, and I'm essentially changing it to three. When I run this script, you see that it auto-generates the UUID; it changed the parameter, because I injected it; it also changed the output; and it's reading the file from a local file called, whatever it was, infile.txt.
C: Okay, so this one is a local file. I also have my S3 bucket, and in there I put a different file, also called infile, and you can tell it doesn't say it's the local file. Then I can run exactly the same thing and just say: this time I want to override an artifact; infile.txt is not really infile.txt, it's s3://my-bucket/whatever. And then:
C: What I'm getting as the result of the file is now my S3 object, and again I can override parameters, and you can see that all the results are metadata that could be written to a database. I didn't manage to write it to a database in a couple of days, but that's the general idea. So this way we can develop using a very common semantics which is very abstract, and every data scientist can get it.
C: It will actually simplify all the environment-variable meshing around, and all those things that I injected could be more automated. When I'm running in a workflow, this injection process could be something that Pipelines injects into the function, and maybe the secrets, when I'm running on Kubernetes, will be gathered from Kubernetes secrets, whether they're manifested through environment variables or through a file.
C: By the way, the reason I didn't need to say anything about the secrets to get to my S3 bucket is that the secrets file essentially has all the secrets that are passed across all those things that need access to secrets. But this just demonstrates a simple way that we can all, as a group, try; and again, I'm not pushing any of my agendas, but I'm saying that if we agree together that this is the standard we want to follow, then it will make life easier for building the next layer in the chain. Okay, any questions, comments?
A: Let me be incredibly clear, in all of these capacities: the people who show up with code and passion and real-world use cases, no boiling the ocean, if you solve a real-world use case, like the folks collaborating on the standard for KFServing are doing, that's going to... I won't say win, it's not win or lose, but that's going to have the most momentum. So if you're passionate about this, and you have the time and energy and coding, guess what: that's what people are going to gravitate to.
A: Is there something to visualize the metadata? No, not to my knowledge. Obviously Kubeflow Pipelines has a great visualization built in when you go to the dashboard. I don't know how decoupled that is; it's just not my area. As for whether you could theoretically visualize it in some other capacity, I can't imagine why not; it's probably just stored in MLMD.
A: But regardless, if you haven't seen it: I think a standardized visualization library, standard inputs and outputs for visualizing, would be really powerful. If we were able to do that, then ModelDB could use it, and MLflow could use it, and Kubeflow Pipelines could ideally use it, and things like that. So if someone wants to show up with that, that's great too.
A: So there are two things, and I don't want to blur them. Kubeflow has its own repo called kubeflow/metadata, with issues and discussions and things like that. In addition to that, there is an effort right now called ML Spec, with an ml-spec discussion list, and that is a second, ongoing effort. My dream is that the two are coincident, but Kubeflow has to move faster right now, and ML Spec is waiting for ML Commons to land before we make it official.
A: So if I could wave a magic wand, Kubeflow would continue on its path, and all the people who wanted to work on ML Spec would work on Kubeflow metadata and make sure it looked good and was portable and things like that, and then, when it came time to graduate, we'd move it into ML Commons and everyone's happy. But we'll see; I'm sure there will be some "whoa, we want to use this UUID, no, I want to use this one." I'm sure that will happen.
A: It doesn't matter. There's already a great one for serving; my proposal is that we collaborate on one for model packaging or experiment tracking, both of which are big needs right now. If we figured out a schema for model packaging, for example, that would be really powerful, because then that would allow provisioning the underlying infrastructure based on the model package: oh, I see this is a TensorFlow model, and it needs GPU and this much RAM; I know how to provision for that.
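The model-packaging idea here can be sketched in a few lines. This is a hedged illustration with assumed field names, not a proposed spec: if the package declares its framework and resource needs in a standard shape, a platform can provision for it mechanically.

```python
# Illustrative model-package schema and a toy provisioner that reads it.
from dataclasses import dataclass

@dataclass
class ModelPackage:
    framework: str     # e.g. "tensorflow"
    needs_gpu: bool
    ram_gb: int

def provision(pkg: ModelPackage) -> dict:
    """Turn the declared needs into a (toy) provisioning request."""
    node = "gpu-node" if pkg.needs_gpu else "cpu-node"
    return {"node_pool": node, "memory": f"{pkg.ram_gb}Gi"}

req = provision(ModelPackage("tensorflow", needs_gpu=True, ram_gb=8))
```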
A: Fairing is great, but Fairing would need an interactive mode to something like a SaaS, and that communication would have to occur over some form of standard, schematized metadata. That's all I'm saying. I'm not saying I want to write the libraries for this; what I really want is just standard schemas.
A: The dream is that if we get to standard metadata, for the Googlers here, it's like you develop protos, and those protos can then be represented and bound to your language of choice. So, for example, we have a schema; you could take that schema, you could build a Python package for it, and then all of a sudden you're able to pick this thing up and instantiate a native object in your notebook that says:
A: Oh, this was the training run, and this is the model, and this is how it was done, and you can make all kinds of intelligent decisions around that. That's really powerful. Again, that's not tomorrow, but it's also not terribly hard; once you have a proto, there are a whole bunch of standard ways to translate it into specific language bindings, yeah.
A: So, as we mentioned earlier, ksonnet was a choice, and it was not a good choice; we're moving towards Kustomize as the new templating, which is native to Kubernetes. And I don't know if you heard what Jeremy said earlier: really, what we're trying to do is abstract away, or provide clean layers for, the different things that need to be provisioned.
A: What I expect you will see is both a packaging format for particular components that is made out of Kustomize, and then something like Helm that allows deploying many of those things together; or, if you're on Google Cloud, maybe it's a Deployment Manager template, and if you're on Azure, you have Azure Resource Manager, which then provisions Helm, which provisions using Kustomize. So I think what you're going to see, as we clean up the layers, is breaking these apart.
A: I hear you, and I think we would be open to that. Installation will be a mountain we will climb forever: making installations simpler, more portable, and things like that. Writing an operator, or some other higher-level structure, to accomplish that, I think we are more than open to. I think we have to get rid of ksonnet first, and then we would go from there, but I have absolutely no objection to using an operator.
C: TensorFlow could also live without Kubeflow, so I think it has to be componentized: each component needs to take care of itself, and then you can install it whether it's in the context of Kubeflow or not. There are some specific tools that Kubeflow utilizes, but if you want to install, say, Seldon or Nuclio or any of those, they should have an independent solution. Yeah.