From YouTube: CNCF SIG Runtime 2020-09-17
A
Today we have Seldon Core and the project, and Clive is here. Thank you for deciding to show us the project and what it's all about.

B
Hey.

A
Excited to learn about it. I think this is probably the first AI/machine learning type of project that has presented in SIG Runtime, so we're pretty excited about it.
B
All right, cool, yeah, great. I'll share my screen, if I can. Let's see... cool, let me see if I can move this... cool. Can you see that? Yeah? All right, cool. So, great to see you. I'm Clive, I'm CTO of Seldon, and I'm just going to give you an introduction to some of the projects that we work on. It would be good to get your feedback and to see how it connects to the things that you're interested in; that would be an interesting thing for me.
B
I'm going to go through some of the rationale of what Seldon is trying to do, to set the landscape. One of the things is this paper from Google from 2015, which really set the scene for what we were trying to do. They're saying that when you do ML code, the data scientists sometimes think, or even the whole organization thinks, it's just...
B
...that you just have to work on your little algorithm and that's it. But actually there's a whole set of other things surrounding it. This was a paper by Google from their own internal analysis, which some of you might know; it got quite a lot of fame afterwards. The sizes of the boxes are the amount of code they had to write for these other things surrounding the ML code. So there's a lot of technical debt that gets created in organizations, and at Seldon that's sort of what inspired us to get started.
B
We've seen a trend in recent years that organizations are looking for best-of-breed components for the whole ML lifecycle, from initial data analysis through to training and then serving, and that's really the direction we see things taking, obviously also with the cloud-native world and tools fitting into that; we'll discuss that a little bit. So that's one of the rationales for Seldon: to help organizations in that area.
B
To handle these things surrounding their ML code, so their data scientists can focus on the core code part. That's another way of looking at what we do. The issue that we find in organizations is this: on one side you have the data scientists, and they've got their own set of tools that they know a lot about.
B
Obviously all the training tools, TensorFlow, NVIDIA, Spark, etc. And then on the other side you've got the DevOps people, who've got their own tools in the cloud-native world, all the tools they know well, and there really is quite a hard divide between the two. The data scientists sometimes don't care too much about what the DevOps people are doing, and the DevOps people are scared of the machine learning and all this stuff they don't really understand.
B
It's not as simple as a normal app that they can just deploy onto their Kubernetes cluster, so you really get this issue of these two teams working together. That's also what we're trying to help with: trying to put some DevOps into data science, and to help the data scientists move their stuff out so they can get it into production. The point is to really speed up the time it takes to get those machine learning projects from just being a project out into production, making a difference for companies and organizations.
B
So that's another rationale. Now, this one should obviously make a lot of sense to you guys, and we're saying nothing new here: a data scientist has their toolkits and wants to deploy their model, scale it and update it on a compute cluster with CPUs, GPUs and TPUs. And obviously we all know about the stack that's been built up over the years, starting from the container runtimes.
B
Then Kubernetes comes in to orchestrate more complex projects on top of those container runtimes; then projects such as Istio for service meshes, with everything that allows you to do in terms of handling network management; and then interesting projects like Knative and others building on top of that for serverless. And there's that sort of gap between all that and what the data scientists need to do. So really the issue is: if you wanted to use that stack, you've got all those things that I think we all know about, but that data scientists are probably less interested in.
B
So I'm not going to go through the big list; I think it will all be familiar, as there are some Kubernetes aficionados here. But it's quite a challenge for a data scientist, or for some companies, to get all these things in place when they're trying to get their machine learning out onto the stack. So at Seldon, what we're trying to do with our projects Seldon Core and KFServing is to build on this stack and bridge that gap.
B
Obviously there's a lot of interest in society in how AI is going to be used when it's put into production, and that has built up a lot of momentum. It's obviously key for all the projects and companies that have started to use AI and apply it to their customer bases, and there's now beginning to be a lot of regulation in certain areas.
B
How do you know that it's actually not harming your user base, that it's really doing what it says, that it can be audited, et cetera? There's a gap there, and that's also what we're trying to fill with some of the tech. So this space is really emerging: at the base, how society reacts to AI as it affects them more; then, in the middle layer, companies and the regulations being applied to them, and how that's working, which I think is still being worked out.
B
How that's going to apply, and then the tools at the top. But it's a space that we're in, and we're trying to help out there too. Then, finally, I just wanted to bring up one more academic angle: there are some really big conferences every year on machine learning, and one of the biggest is ICML. This year was the first time they had a workshop at ICML on challenges in deploying and monitoring machine learning systems. So it's now really being looked at in the academic world as well.
B
What are the challenges when you try to put machine learning into production? We were actually lucky enough to get two talks into that workshop: one on serverless inferencing on Kubernetes, by myself, and another on the work we're doing in Alibi on monitoring and explainability of models in production. There's a lot of research being applied in this area, on the challenges when you're actually deploying a machine learning model, so I just wanted to highlight that as an emerging area.
B
It's coming also from academia, with a lot of research being done. Okay, so that's the background. I hope you understand what we're trying to solve at Seldon; now for the projects that we actually work on to help achieve some of those goals.
B
This is our stack of projects, starting with the layer in the middle, which is all open source. In the middle is Seldon Core, which is probably the most mature project; it must be four or five years old now, maybe, I can't remember, but it's got a lot of traction. I'll go into more detail, but it provides an abstraction, a custom resource, that allows you to define how you want to put your machine learning model, or a whole inference graph, out into production, and it manages that for you.
Next to it sits KFServing, asking how serverless can help in the area of machine learning deployment. So that's the middle layer. Then at the bottom, feeding into that middle layer, is a suite of projects we've been working on that focus on some of the things you need to do once you've put your model out. The core concern for a company is creating the data science model and putting it out there; these are the things that surround it. So, things like explanations.
B
Why is the model giving the response it's actually given? We try to make that understandable for various stakeholders: the customers, maybe auditors, or the actual data scientists themselves who want to understand how the model is behaving. I'll discuss that a little bit. Then we have another project called Alibi Detect, which focuses on the ways you need to monitor models when you put them into production: things like drift detection, outlier detection, adversarial detection and so on.
B
These are also open source; you can find them on GitHub and I'll give links to them. They feed into the projects above, which actually allow you to deploy those models on Kubernetes and add these techniques around your model. And then at the top: obviously we're a company and we need to make money.
B
We have our core enterprise product, which brings this all together for companies, providing a full enterprise stack for machine learning. We're like an open-core company: we take all this open source, feed it in, and provide a full solution so companies can scale and manage their models. I'll give a quick glimpse of that at the end; obviously I'll focus more on the open source. So, going into a bit more detail on some of those projects: first we have Seldon Core.
B
So what is that trying to do? It basically allows you to build up a whole graph of containerized components that do inference, put them together, make them reusable across projects, and then manage all of that for you. That graph is defined by a custom resource, so we have our own operator running, and in that custom resource you define the various components and the various customizations you want.
B
In terms of what the model is, what type of model it is, and how you want it all to connect together. You might define just a single model in your inference graph, or you might have much more complex things. We have customers, for instance, doing large inference graphs with things like multi-armed bandits, which decide in real time, based on the input traffic, which of the underlying models a particular request will go to.
B
So you might add that in with a suite of different models, and then you might tie in further things earlier in the inference chain: maybe some feature transformation that needs to be done before the request gets to the model, maybe some transformations that need to be done after the response has come back from the model, and then adding in other things like I discussed, such as outlier detection and explanations. So basically it allows you to define this whole graph in YAML or JSON as a custom resource, and then deploy it, manage it and update it.
B
That really handles the core things data scientists want to do: deploy, scale and update their model. They do that by updating the custom resource with the various definitions. Then the next question, I suppose, is how they actually derive these containerized boxes in their inference graph, and there are really two ways that we provide. One is out-of-the-box machine learning servers.
B
All
they
need
to
do
is
okay,
here's,
my
artifact
on
s3
or
google
storage
and
then
we'll
fire
up
a
say,
tensorflow
serving
or
a
triton
server.
You
know
for
that
artifact
and
manage
it
and
tie
it
all
into
the
inference
graph
so
that
the
actual
api
can
be
used
through
that.
That's
one
way,
that's
very
popular.
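As a rough illustration of the custom resource being described, here is a minimal sketch of a SeldonDeployment that points a prebuilt scikit-learn server at an artifact in a bucket, applied with the official Kubernetes Python client; the deployment name, namespace and S3 path are made-up examples.

```python
# Minimal sketch: ask Seldon Core to run a prebuilt server for an S3 artifact.
from kubernetes import client, config

seldon_deployment = {
    "apiVersion": "machinelearning.seldon.io/v1",
    "kind": "SeldonDeployment",
    "metadata": {"name": "income-classifier", "namespace": "models"},
    "spec": {
        "predictors": [{
            "name": "default",
            "replicas": 1,
            "graph": {
                "name": "classifier",
                # Prebuilt server type; TENSORFLOW_SERVER and TRITON_SERVER
                # exist for other artifact formats.
                "implementation": "SKLEARN_SERVER",
                # A storage initializer downloads this artifact onto a local
                # volume before the server starts, as described later.
                "modelUri": "s3://my-bucket/income-model",
            },
        }]
    },
}

config.load_kube_config()  # or load_incluster_config() inside a pod
client.CustomObjectsApi().create_namespaced_custom_object(
    group="machinelearning.seldon.io", version="v1",
    namespace="models", plural="seldondeployments",
    body=seldon_deployment,
)
```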
B
That's obviously the simplest way they can use Seldon: once they've trained a model and got the artifact into some location, either on the cluster or in some bucket in the cloud. Then there's the other way, which is surprisingly quite popular as well with lots of organizations: we have particular language SDKs, in different languages, for when you have custom code. What we allow is for the data scientist to just focus on the prediction code, say in Python.
B
We
have
a
python
wrapper
as
we
call
it
and
they
can
just
focus
on
the
predict
call
of
the
python
wrapper
and
maybe
some
other
stuff
to
set
up
their
model
at
the
start
and
then
we'll
manage
that
in
terms
of
wrapping
it
up
into
a
it's,
a
micro
service
allow
them
to
and
containerized
add
in
the
metrics
tracing
and
other
other
parts,
and
so
it
can
easily
be
slotted
in
here
as
part
of
that
inference
graph
and
that's
actually
quite
popular
with
a
lot
of
our
customers
who
have
like
custom
code
in
different
languages.
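A minimal sketch of the Python wrapper idea described here: the data scientist writes only the load and predict logic, and seldon-core wraps it into a containerized microservice. The model path and the joblib artifact are assumptions for illustration.

```python
# Sketch of a seldon-core Python wrapper class: load once, predict per request.
import joblib

class MyModel:
    def __init__(self):
        # Runs once at startup: load the trained artifact into memory.
        self.model = joblib.load("/mnt/models/model.joblib")

    def predict(self, X, features_names=None):
        # Called for every inference request; X arrives as a numpy array.
        return self.model.predict_proba(X)

# Served with something like:
#   seldon-core-microservice MyModel REST
# (exact CLI flags depend on the seldon-core version in use)
```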
B
We have people using Java, some people using R. There was actually a talk at the last KubeCon Europe by the guys from Nasdaq using Seldon Core, and they've got some models in R. So there are various language wrappers that allow you to wrap up your code, letting the data scientist focus purely on the machine learning code and then put it into the graph.
A
I have a question. You mentioned the models on the right side, and that they can be artifacts in S3. But once those models are created, are they loaded into memory, or into some other place where the inference can actually happen faster? Or is it still in S3, or maybe EBS storage? Typically some of these models may be kind of large, right?
B
Absolutely, that is quite a good point. What we like people to do is define a location, say on S3, and then the actual model server will download the model from S3 onto a local volume and then run it in memory.
B
There is still work to be done for very large organizations that maybe have lots of replicas using the same model from different locations, and how you can use caching there; we're looking into that sort of caching layer that sits between S3 and the local model server, which holds the model in memory. But at present we provide standard downloaders, like an init container that runs when the model server starts up.
A
Okay, got it, cool. Thank you.

B
Cool. So one other thing we add as part of what we do is a service orchestrator. You just define this graph, and you don't need to decide how the graph connects together (well, you can; that's one way, you can define how the graph connects together), but we will manage the request and response flow through the graph.
B
So we add in a component, which you don't see here, the service orchestrator, which takes the request and manages that flow. It will say: okay, first I need to call, say, the feature transformer; then, once the response comes back from that, since they've defined this to go to a multi-armed bandit, I'll send the response on to that. And then the multi-armed bandit says...
B
...okay, I want to send this to, say, model A. In that case it sends the request to model A, model A responds, and the response is sent back out of the graph. So that's an extra component, or sidecar, that we add into the graph. But apart from that they've got complete flexibility, so they can define parts of the graph to be in certain pods that scale in different ways.
B
So, say, the feature transformations could have an HPA that scales one way, and the models can be in another pod that scales on different metrics. You've got a lot of flexibility in how you define it as well.
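As a hypothetical illustration of the kind of graph the orchestrator walks, here is just the graph section of a SeldonDeployment spec, with a transformer feeding a router (such as a multi-armed bandit) that picks between two models per request; all component names are made up, and each would map to a container in the same deployment.

```python
# Sketch of an inference graph with a router, as described above.
graph = {
    "name": "transformer",
    "type": "TRANSFORMER",          # pre-processes each request
    "children": [{
        "name": "bandit",
        "type": "ROUTER",           # decides which child gets each request
        "children": [
            {"name": "model-a", "type": "MODEL"},
            {"name": "model-b", "type": "MODEL"},
        ],
    }],
}
```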
B
Then, in terms of how people use it: in our lifecycle we focused initially more on RPC use cases, real-time machine learning inference, and that's probably how most of our customers come to us and use us at present. But what we're also seeing is that customers want a unified solution: once they've created their model, they don't just want to expose it via RPC.
B
They also want to send batch requests to it, or use streaming, say with Kafka or Knative eventing. So we allow them to do all three, basically, and it's very easy to use the same components irrespective of how they're going to consume that model in their organization.
C
B
Yeah, also a good question. We provide examples in our docs of how to use s2i, and we actually have s2i builders to let them use it easily; those bring in the appropriate dependencies, say in Python, to wrap the model. But we also provide docs for how they can just use a standard Dockerfile to create their model image, and we have people using different methods. Obviously not everybody wants to use s2i; some people like it.
B
Quite
agnostic
to
that
so
yeah,
it's
not
like
a
unified
solution,
that's
that
we
provide
them
with
different
sort
of
resources
to
really
create
their
container
in
the
way
that
they
want
to.
Okay.
I
was
just.
B
Okay, cool, that's Seldon Core. So, as I said, there's another area that we work on, which is looking at the things you need to surround your model with, and that's two projects: Alibi Explain and Alibi Detect. Alibi Explain looks at explanations: once your model is out there, if you get a particular prediction back, why did the model give that prediction? We have different techniques for this, both black box and white box.
B
Black box has the advantage that it treats the model, as the name says, as a black box, and you just talk to the model over an API. So the big advantage is that you don't care how it was created or with what technique: it could be a deep neural network, it could be a tree-based model, it could be just a simple linear regression.
B
It
doesn't
matter
because
all
the
techniques
do
is
just
query
the
model
many
many
times
normally
by
changing
some
of
the
inputs
from
the
like
initial
input
and
trying
to
understand
how
the
model
is
actually
responding
to
those
slight
slight
changes
in
the
input
and
from
that
it
builds
up
a
picture
of
how
of
what
the
model
is
taking
into
consideration,
and
then
it
can
give
like
a
human,
understandable
response,
and
that
has
a
lot
of
interest
and
especially
in
organizations
who
want
to
keep
it
quite
separate,
so
the
team
that
they
built
the
model
they
can
keep
it
completely
separate
from
the
team
that
needs
to
explain
it.
B
It is obviously also more challenging, though, because you're treating the model, as I said, purely as a black box; you're just talking to it over an API. So we also have white-box explanations, which apply when you know how the model was created and you actually have access to the model weights.
B
If
it's
in
your
network,
you
have
access
to
the
keras
saved
model,
then
you
can
load
that
saved
kos
model,
and
then
you
can
look
at
the
weights
and
you
can
do
analysis
of
that
again,
probably
doing
various
techniques
to
actually
understand
how
it's
working
or
if
you've,
got
like
a
tree
based
model.
You
can
actually
load
that
tree
based
model
and
try
to
understand
it
and
give
an
explanation
for
this
partic
particular
prediction.
B
So
each
of
each
has
that
both
their
pros
and
cons
and
there's
not
like
a
single
way
to
do
it
and
also
just
to
quickly
jump
into
the
alibi
project.
Just
to
illustrate
the
sort
of
different
ways
of
looking
at
it.
That
we
have
a
large
number
of
different
state-of-the-art
techniques
and
and
the
key
thing
is
those
techniques
give
different
ways
of
viewing
those
exclamations.
B
They also have different focuses on the type of input data. Some techniques work with classification (actually most of the ones here work with classification), but some are more focused on regression; and some are more focused on certain types of input data. If your data is just tabular, then fine; if it has some categorical variables, some work better or worse with that; some are focused more on text, some on images, and other modalities like that.
B
So
there's
not
really
one
way
of
solving
the
explanation,
question
in
terms
of
machine
learning
models
and
also
the
way
these
give
these
models
give
their
responses
is
also
quite
different,
and
so
some
are
suited
more
for
data
scientists.
Where
say
some
likes
of
counterfactuals
may
be
suited
more
for
the
actual
customers.
So,
for
example,
counterfactuals
would
tell
you
what
do
you
need
to
change?
So
if
you
have
like
a
loan
system,
say,
that's
all
rejected
your
loan.
B
...an automated system. If you use the counterfactual explanation technique, it can tell you what you, as a customer, would need to change for the system to change its mind and actually give you the loan.
B
That's easier to understand for them, whereas other techniques might help the actual data scientists understand which features are being taken into account by the model. So they all have their pros and cons, and we need to work with the organization concerned on which one would be most appropriate for the actual models being put out.
B
This is also quite an active research area for us. The other challenge is that it's also quite complex: you need to train some of these techniques on the actual training data. Even though some of them are black box, in that they don't care what technique you used to train your model, some of them still need to understand the training data, because if they're going to perturb the input, they need to know the valid ranges. Say a feature is an age...
B
If
this
feature
is
an
age,
then
it
only
makes
sense
to
sort
of
perturb
it
between
say,
1
and
110,
or
something
there's
no
point
putting
in
the
value
of
1000
and
seeing
how
the
model
behaves,
because
it's
going
to
give
you
strange
results.
So
there's
certain
you
know
things
you
need
to
look
at
in
terms
of
that.
A
Yeah, this is very interesting. So do you actually integrate with some other graphing tools? I would imagine some people want to understand how these models actually behave, for example with something like Tableau. People want to see how people are using the model, or how it's making its decisions, right?
B
Oh yeah, absolutely, I think there's quite a challenge there. One of the large challenges with explanations is really how you show that data to the user and how you tie it back into how the actual model is being used. Actually, maybe I can quickly dip into our enterprise product just to show you the explanations in action; hopefully I can give a quick demo. So this is one model; this is our...
B
...top-level viewing panel of Seldon Deploy, which shows the different models you've got running. I'm not going to go into too much detail here, since we're focusing more on the open source, but this will hopefully help answer your question. I've got one model here, which is a sort of income or loan categorization model that makes predictions based on demographic features; it's actually a standard data set.
B
It's taken from the U.S. census in 1996, an open data set. So we've got various demographic features, and it will predict whether a person is going to have high or low income. We can take a particular example and send it to the model; it's a binary classifier, and it says this person has an 86% chance of having low income. And because we've added an explainer to this model, it can basically investigate the model and understand why.
B
This is using a technique called anchors. We've got the core demographic features here, and basically this is showing that the model is quite focused on two of them. It's saying Marital Status = Separated and Sex = Female: say 95% of the time, if you just had those two features, the model would predict low income.
B
Obviously this is key, because it raises the question: is it acceptable to put this model out? It looks like it's quite biased in terms of gender. It may be that this is completely unacceptable, or it may be more that you need to get more data for various sections of your model. So this technique, anchors, which is the one I'm showing here, really gives you the core anchors that the model is using.
B
You can set a threshold; in this case it's 90%, so it tries to look for features that hold at that precision. Obviously there are a lot of features here.
B
There are features on whether their occupation is white-collar, their race and other things, whether they're in the United States, their age and so on. It's trying to find the core features, from the initial request we're trying to explain, that really made the model go in one direction; here it seems to be mostly marital status. And because you set the confidence, this explanation should hold true at least 90% of the time.
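A minimal sketch of the anchors technique shown in this demo, using Alibi's AnchorTabular; the trained classifier `clf`, the data splits and the feature metadata are assumed to exist already.

```python
# Anchors: perturb inputs around one instance and find the feature conditions
# that keep the prediction stable at least `threshold` of the time.
from alibi.explainers import AnchorTabular

predict_fn = lambda x: clf.predict(x)   # black box: only the API is needed

explainer = AnchorTabular(
    predict_fn,
    feature_names=feature_names,        # e.g. ["Age", "Marital Status", ...]
    categorical_names=category_map,     # {column index: list of categories}
)
explainer.fit(X_train)                  # learns feature ranges for perturbation

explanation = explainer.explain(X_test[0], threshold=0.90)  # 90%, as in the demo
print(explanation.anchor)     # e.g. ['Marital Status = Separated', 'Sex = Female']
print(explanation.precision)  # fraction of perturbed samples keeping the prediction
print(explanation.coverage)   # how much of the input space the anchor applies to
```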
C
B
Yeah, this model is actually just a very simple one, I think it's a linear regression, and this technique is black box, so, as I say, it doesn't depend on the actual underlying model. But it's a very simple model, just to illustrate, and it's quite biased. Actually, this data set from the census is very unbalanced: there's actually only one percent of people who have high income, so most people in the data set have low income, and it produces very biased values.
C
B
Yeah, I suppose that's slightly related, actually. This technique needs to have access to the training data so it can understand the range of each feature (I'm not sure which one here is the age one), and in doing what it was doing it would probably have sent several hundred requests to the model.
C
Okay,
so
it's
not
really
rule
based
of
oh
you're,
doing
linear
regression
in
the
wrong
way
or
something
it's
not
going
to
tell
you.
B
Yeah, no, it's not like that. It's really just, as I say, treating the model as a black box and trying to understand how the model behaves just around this particular value. So it's changing features and seeing if the model changes its opinion, and what the core things are that make it settle on that result.
B
Yeah, it's that sort of set of techniques. And down here we just give access to the actual training data set: you can see examples which actually fit that explanation, and then there should also be some examples which don't fit it. These are people who have those features, separated and female, but have high income, so you can see: okay, maybe I need to expand my training set by getting more of these, labeling them and so on. It just helps the data scientists.
B
With this technique, obviously, as I said, you need to be very aware of the actual stakeholder that's going to look at it. If you gave this to a customer, it would probably be very confusing to them; this is a technique that's probably more aimed at the data scientists, or maybe the auditor, as opposed to some of the other techniques.
A

B
Yeah, absolutely. It can be used at various different stages of the machine learning lifecycle: in development, in auditing, but obviously in production as well. You might get a request, maybe after this has gone live, with someone saying: hey, why was I rejected for a loan? And then we can say: look, these are the reasons the model was looking at. So you can get an early warning of that.
B
I'll just show you one more. This is a different model, for text-based explanations: a model that takes movie reviews and tries to decide whether the sentiment is positive or negative. So let me find an example, predict, and then explain that example, and we can have a look at it. So this is a movie review: "a visually exquisite but narratively opaque and emotionally vapid experience of style and mystification".
B
Obviously that's negative, and what the explanation here is saying, for text, is: which words was the model looking at to make this negative? It's highlighting these two words, "emotionally vapid", as the core things. So it shows, from another angle, that you need different explanation techniques for different types of data, and how those techniques work for different types of data will be different. So, obviously, this one is different.
Cool, so that's Alibi Explain. Then we also have Alibi Detect, which looks more at the monitoring side. So, outlier detection: obviously you don't want to return responses from your model on an outlier, because the model is quite likely to give very strange results, so you should probably throw that result away. Then other things like adversarial attack detectors, which are another area that we do research on.
B
Obviously that's quite niche; there are particular areas it's important for. This is an example taken from a traffic sign detector, a classic one: you've got a stop sign here and the model says "stop", great. But if you attack it in certain ways, by adding a few pixels, it still pretty much looks like a stop sign to us, but the model gives a completely different answer, and the same for these other traffic signs.
B
I always wondered why that stop sign was blue. I think it's taken from a German data set, and maybe stop signs are blue in Germany, I'm not sure.
B
Yeah, that's true. So we do that, and we also look at things like drift detectors, which tell you when you need to retrain your models. If the input distribution the model is seeing is completely different from what it was trained on, it's probably going to be giving bad answers, and you probably need to retrain it on the new type of data; it's the same sort of thing.
B
This is the Alibi Detect GitHub repo, and again there's a suite of different techniques, based on the different modalities and decision points. Is your data tabular, image, time series, text? Does it have categorical features? Do you want to do online outlier detection, or do you want outliers at the particular feature level?
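As one concrete example from that suite, here is a minimal sketch of a drift detector from alibi-detect, a Kolmogorov-Smirnov test comparing live feature distributions against a reference sample from training; `X_ref` and `X_live` are assumed NumPy arrays with matching columns.

```python
# Drift detection: compare live inputs against a reference sample per feature.
from alibi_detect.cd import KSDrift

detector = KSDrift(X_ref, p_val=0.05)   # reference data + significance level

preds = detector.predict(X_live)
if preds["data"]["is_drift"]:
    # Input distribution has shifted: a signal to investigate and retrain.
    print("drift detected, p-values per feature:", preds["data"]["p_val"])
```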
B
So you can add these detectors and explainers to your model, to give you the things surrounding it; that's obviously the goal, to allow organizations to do that. Cool, so I'll go on to another project we work on, which is KFServing.
So
this
is
like
focused
on
using
some
of
the
things
from
say,
k
native,
so
we're
focusing
on
scale
to
zero,
because,
obviously
you
know
things
like
gpus
and
other
aspects
of
machine
learning
influence
are
quite
costly.
B
If you've got a model that's not being used, wouldn't it be great to just get rid of the infrastructure from your Kubernetes cluster? That's really great stuff you can do in Knative, so this is building on top of that. We're also looking at GPU autoscaling in this project, because GPU autoscaling is actually quite a challenge. I'll talk in the next slide about how Knative solves that, and about what we're trying to do.
B
We actually founded this project with some very large partners, Google, Bloomberg, Microsoft and IBM, and we're trying to create some standards for machine learning inference and spread them across the whole industry. One thing we've done is create a standard protocol for machine learning inference, and we're starting to get people from these organizations to actually buy into using it.
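A minimal sketch of what a request in this standardized ("V2") inference protocol looks like: tensors are described by name, shape and datatype, so any compliant server can accept them. The host, model name and tensor contents are made-up examples.

```python
# One V2-protocol inference request against a compliant model server.
import requests

body = {
    "inputs": [{
        "name": "input-0",
        "shape": [1, 4],
        "datatype": "FP32",
        "data": [[5.1, 3.5, 1.4, 0.2]],
    }]
}
resp = requests.post(
    "http://localhost:8080/v2/models/my-model/infer", json=body
)
print(resp.json()["outputs"])  # predictions come back in the same tensor format
```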
B
Obviously that takes time, as all standards do, but it's an interesting direction, and part of what the project was created for. This project is part of Kubeflow. Kubeflow, in case you haven't heard of it, is a sort of ecosystem of machine learning projects, right from data analysis, through training at scale and hyperparameter optimization, to serving; this is the serving part. So it's a great place to join together with the other projects in that ecosystem to work on machine learning on top of Kubernetes.
B
So, focusing on one thing in KFServing that we solve: one thing that's really difficult is GPU autoscaling. Why is it difficult? When you've got models using GPUs, you've got various metrics: you'll have metrics from the actual server using the CPU, and you'll have metrics from the GPU itself. Those GPU metrics are sometimes hard to get from your Kubernetes cluster, and it's then also harder to combine all of them into a single rule.
B
If you want to do autoscaling, you'd want to say: okay, if my CPU or my server gets over this amount, and my GPU stats say this, then I scale up. That's actually quite hard for people to do. Luckily, we can simplify that by using the ideas from Knative, which basically just looks at the number of in-flight requests going to your actual pods, in this case machine learning ones, though Knative is quite generic.
B
So all you have to do is say, for these pods, how many concurrent requests they can manage. Maybe this model server can handle 10 requests, or maybe it's just one, and Knative takes that into account. It actually uses various sidecars it puts in, like the queue proxies, to understand how many requests are in flight, and then from those stats and how many requests are coming in it can decide...
B
...should I scale up or should I scale down? So it makes it much easier for the actual machine learning user sitting at the top, who just wants their thing to scale automatically without doing much. All they need to do is say how many requests their model server can serve at the same time, and that really solves it. And obviously there are some other great things in Knative.
B
If there's a burst of requests, they get stored in a component called the activator before they're pushed on to the actual replicas once those have scaled up, so you don't get too many requests hitting your model at the same time. So it's really interesting, and we're trying to build on top of that technology for machine learning inference.
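A hypothetical sketch of the Knative knob being described: a Knative Service whose autoscaling is driven by in-flight request counts via `containerConcurrency`; the image and names are placeholders, and it would be applied the same way as the SeldonDeployment sketch earlier.

```python
# Concurrency-driven autoscaling: the queue-proxy sidecar counts in-flight
# requests, and Knative scales replicas (down to zero) to stay near the target.
knative_service = {
    "apiVersion": "serving.knative.dev/v1",
    "kind": "Service",
    "metadata": {"name": "my-model", "namespace": "models"},
    "spec": {
        "template": {
            "metadata": {
                # Soft target for the autoscaler: requests per replica.
                "annotations": {"autoscaling.knative.dev/target": "10"}
            },
            "spec": {
                # Hard limit: each replica handles at most 10 requests at once.
                "containerConcurrency": 10,
                "containers": [{"image": "example.com/my-model-server:latest"}],
            },
        }
    },
}
```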
A
The question here is: when it comes to handling many, many requests, and we're talking about millions of requests, the standard practice is to keep shared storage for these pods, so they can actually do the serving from that storage, if you have a really big model. Because spinning up pods for every single request, and then creating model storage for every single pod, would take a lot of time, right?
B
And that's a very good point. I don't think I have it on these slides, but I gave a talk at ICML just on KFServing, and one of the slides was about the present challenges of using it, and that is exactly what you said: the challenge is that once you scale up, you've got big lag times.
B
You've got all your requests waiting for this replica to start, and that is a clear challenge. At the moment there aren't really many ways to solve it. One is to have your model locally, so it doesn't have to be downloaded; that gets rid of some of the network time, but even then you're still going to have the time for the model server to start up.
B
The other thing people do is just get rid of the scale-to-zero and say: okay, I want at least a certain number of replicas. But then that's obviously throwing away the whole point of Knative. So yeah, it's definitely an open challenge, and it is something we're actually looking at as part of KFServing, together with the Knative community: how can that be solved?
B
You've put your finger on one of the key points there. Cool, so yeah. I've shown you Seldon Deploy, which is our sort of closed-source product that really brings everything together: it allows you to use Seldon Core and KFServing and Alibi Explain and Detect, tying it all up with standard components.
B
Things like GitOps, which we're big believers in, where, as I'm sure you know, everything gets stored in source control before it's put into the cluster, so you've got a clear representation.
B
As you define a model and deploy it, it's pushed out to a Git repo and comes back in using things like Argo CD, which really gives you that full audit trail and things like that. And we tie it together with metrics, the Elastic stack, and auth layers, with Dex and Keycloak tying into LDAP, plus an enterprise API. So that's how we tie everything together for enterprise customers, and obviously that's quite key.
B
So, for some of the models: I've got a model running here where you get all the stats, but the key thing is just to highlight the GitOps. You've got the full history, and you can go and see all the actions you've taken on the model, because everything you did is stored in Git, all the canaries you created or other updates you made, and you can see the difference between any two states, along with any documentation.
B
And obviously, if you feel there's been a mistake, you can go back to a particular point in that chain in your Git repo and just restore to that state if you wish; it really allows you to do that. So yeah, that's what we're doing there, cool. That's the enterprise product. That's pretty much my last slide; I just wanted to finish on some things for the future, things which may interest you, I'm not sure, but some things which are coming. Yeah?
C
B
Yeah, this is actually something slightly different. This is another project in the Argo ecosystem, called Argo CD. It's all about GitOps, syncing things from source control onto the cluster. Okay.
I think the Argo CD people need to get a logo. So yeah, some final points, some things we're looking at. GPU sharing comes up quite a lot, because you've got these very costly GPUs. With some of our partners on KFServing, like Bloomberg...
B
...they've got thousands of models which are, say, very slightly different, all scikit-learn models or something, and what they need to do is share a single server across those models to decrease cost; also, some of them might not be used very much.
B
So that's definitely a challenge, and as part of the KFServing project we're looking at an extension to do what we call multi-model serving, to allow you to have multiple models on one server and pack them in. There are also other things we're looking at, like Volcano, which has a GPU scheduler to allow you to share GPUs and so on. It's very early stages, and I think the Volcano GPU scheduler itself is also very much in alpha.
B
There's interesting stuff there coming back from our customers that we need to look at, like edge. As a company, and in the code that you've seen, we're not really edge-focused at the moment, but that's definitely an area which, as we all know, is growing, and it would be great to get your feedback on what you're seeing.
B
From other people who have presented, because I saw you had KubeEdge presenting some weeks ago, it would be interesting to get your feedback on that point, actually, rather than mine; we've certainly seen customers in that area, definitely. And general model optimization is something we're looking at for the future. Customers will sometimes want to just give us the model, which could be just a TensorFlow model, and then we can do various optimizations and optimize it for various endpoints; it could be for edge.
B
Or other ways of optimizing the models: perhaps take a scikit-learn model and turn it into one that can be run on cheaper hardware, and so on. So there's lots of interesting stuff there that we're looking at. And then just one final thing, a shout-out about KFServing and the general machine learning data plane we're working on.
B
As I said, we've created a general machine learning protocol which is then going to be supported by various people in the industry: supported by the NVIDIA Triton Inference Server at present, by Seldon itself and KFServing, and then hopefully, in the near future, by TorchServe, with Facebook doing some work there. So that's an interesting development.
Cool. So hopefully I haven't taken up too much time; it's probably been longer than I thought. But I'm happy to open up a discussion on any points.
A
It's a very interesting space, yeah. I think a lot of people are moving more towards having more of a CI/CD type of system, where you can have the data scientists create some of these models and have the cluster operators, the DevOps people in an organization, handle some of the serving parts. So I think this kind of fits in there, to fill that gap.
B

A
One question I did have: is the Seldon team looking at maybe having one of their projects join the CNCF, or at being part of the CNCF?
B

C
Are those components also part of Kubeflow, or is that... I somehow had the idea that all of Seldon was in Kubeflow, or at least all the open source parts, but I guess that's not right, yeah.
B
Yeah, so Seldon is separate from Kubeflow, but we have integrations in Kubeflow; if people want to use Seldon Core in Kubeflow, they can. And the project KFServing actually is in Kubeflow. Yes, it is a bit confusing: KFServing is part of the Kubeflow domain, and that project lives inside Kubeflow, for serving. So it's both, for the two projects: one is outside, but you can use it from there, and one is inside, being developed there.
C
So
if
seldom
became
part
of
cncf,
then
kubeflow
might
do
that
separately
and
they
just
interact
together.
That
would
be
the
way
to
do
that.
B
Yeah, I suppose it's quite separate. I suppose with Kubeflow, because it's under Google, it's probably up to them. I think there's actually an open discussion in Kubeflow about what the governance of the individual projects would be, and whether individual projects could then move into the CNCF or LF AI or wherever. However, I think Google is interested in keeping it together at the moment.
A
Yeah. I think with KubeEdge there's also a good fit there: maybe edge can be used to manage some of these workloads, and then some of them can be used, maybe, to send these workloads, or the models and the serving mechanism, over to the edge, where maybe you want a faster response time. I've seen some use cases where you do some of that inference on, say, stop signs or maybe license plates, for example, right?
B
Yeah, absolutely. I don't know too much about KubeEdge, but it's part of our research path to see how we can get closer to those guys and see how we can collaborate.