From YouTube: CDF - SIG MLOps Meeting 2021-08-12 (2 of 2)
Description
For more Continuous Delivery Foundation content, check out our blog: https://cd.foundation/blog/
A
So, have you been along to one of these sessions before?
A
Okay, great. Well, what we're doing here really is just trying to coordinate some of the work on the MLOps roadmap, and I don't know if you've had a look at the roadmap document.
A
What we're doing here is trying to collate a set of documentation which basically paints the picture of all of the challenges that you can expect to encounter when working with machine learning assets in production environments.
A
And so the MLOps roadmap document sets out our vision for what MLOps should be within a best-practice environment.
A
And it proposes a set of technology requirements that are necessary to be able to successfully address those challenges, and then it also tries to monitor the available solutions and potential solutions out there.
A
And the idea really is to encourage teams that are working on continuous delivery and MLOps solutions to fully understand what the customer problems are in that space, and what capabilities we need in the tooling overall, so that people can actually achieve what they need to achieve.
B
Right, I've been interested in MLOps for some time. I have some background in machine learning and that kind of part, but now I'm moving towards the cloud and much more the developer-operations side of things, and that's how I found out about the CD Foundation. Before this I spent some time looking at what MLOps was. First, I think there was a conference, cdCon, and there was a presentation there as well, so that content captured my interest, and I've been thinking, as a developer...
B
I would like to be self-sufficient, to be able to make some products of my own, like projects, and incorporate machine learning into them. So that's why I really wanted to understand how machine learning operations kind of work together, all the technologies out there, what the practices are. What problems do you guys have?
B
So that's kind of where my interest comes from, and that's what I'm trying to look ahead at and learn from the SIG here: just trying to understand what kind of problems there are and then, if there is some way, how I can help and contribute to that. If I can share my knowledge, or if I can help with my time, that would be pretty good as well.
B
I'm just a student. I'm in the last year of my university, so I'm kind of exploring these different spaces and learning about these kinds of things, because open source is really something I'm interested in. So that's about me.
A
Okay, thank you. Well, there are lots of currently unsolved problems in this space. So it's a fascinating area if you're looking for something to get your teeth into, because there are lots and lots of challenges and relatively few people really working on them.
B
So I have a question about MLOps, right. How is this different from data engineering versus machine learning? I'm expecting that you have a component for machine learning that sits somewhere within the pipeline that you have with data engineering: you have some ingress, you have some egress, you have a pipeline, and then you're probably doing some predictions with the machine learning component. Is that right?
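The pipeline shape described here can be sketched as follows. This is a purely illustrative toy, not anything from the roadmap: the stage names and the trivial threshold "model" are assumptions made for the example.

```python
# Minimal sketch of the pipeline shape described above:
# ingress -> feature transform -> prediction -> egress.

def ingress():
    # Stand-in for reading from a queue, object store, or database.
    return [{"feature": x} for x in (1.0, 2.0, 3.0)]

def transform(records):
    # Feature engineering step: derive model inputs from raw records.
    return [r["feature"] * 2.0 for r in records]

def predict(inputs, threshold=3.0):
    # Stand-in for a trained model; here a trivial threshold rule.
    return ["high" if x > threshold else "low" for x in inputs]

def egress(predictions):
    # Stand-in for writing results downstream.
    return list(predictions)

def run_pipeline():
    return egress(predict(transform(ingress())))

print(run_pipeline())  # ['low', 'high', 'high']
```

The point of the shape is that the model is just one stage between the data-engineering stages, which is close to what the discussion below calls best practice.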
A
So that's close to what we see as being best practice, but it's actually a very long way away from what people are actually doing right now.
A
So,
to
give
you
a
little
bit
of
background
a
lot
of
what
goes
on
in
the
machine
learning
space
at
the
moment
has
come
from
the
data
science
field
rather
than
from
the
sort
of
software
engineering
field,
so
in
in
that
space
you're.
Typically
looking
at
a
group
of
people
who
come
from
a
mathematical
background
and
studied.
B
Your voice is cut off. I cannot hear you.
B
I'm not sure... I can hear myself, and I can listen to the audio on YouTube. I checked the Zoom and it's recording for me. I'm not really sure why this is happening.
A
Okay, right, I've switched to an alternate microphone. Must be a driver problem.
A
So yeah, what I was saying was that the majority of people who are working in the machine learning field at the moment have come from an academic background where they've studied a lot of machine learning techniques and statistics.
A
Now, in practice, that's not a very good strategy for productionizing your machine learning models, because it's very hard to make it consistent and reliable.
A
It's
hard
to
hard
to
make
it
actually
a
repeatable
process
with
versioning,
and
there
are
no
a
number
of
problems
that
you
can
run
into
if
you're
relying
on
you
know,
dupes
notebooks
as
the
way
of
managing
the
the
essential
parts
of
your
process.
B
A
And then there are a whole bunch of broader challenges that expand out from that. So in an academic situation, Python is really useful because it's easy to learn and it's got lots and lots of machine learning libraries associated with it.
A
But it's inefficient, and it's also very difficult to manage dependencies in a Python environment.
B
That makes a lot of sense, and I think that's pretty important as well, because you may also want to introduce testing in between, and then there's also the concept of A/B testing, as in: try and test whether your model performs the same as it did some months ago versus what it's doing right now. Maybe it's making some predictions that you may personally not expect, but the machine learning model is giving them out, and you may want to test whether the data operations that feed into it are correct or not.
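The check described here, whether today's model still behaves like the one from some months ago on the same inputs, can be sketched as a simple baseline-agreement test. Everything here is a hypothetical illustration: the two toy models, the fixed input set, and the 90% agreement threshold are all assumptions, not an established MLOps standard.

```python
# Illustrative sketch of a model regression check: compare a new model's
# predictions against a baseline model on a fixed input set.

def old_model(x):
    # Baseline classifier from some months ago (toy threshold rule).
    return 1 if x >= 0.5 else 0

def new_model(x):
    # A retrained model that disagrees on borderline inputs.
    return 1 if x >= 0.55 else 0

def agreement(model_a, model_b, inputs):
    # Fraction of inputs on which the two models agree.
    matches = sum(model_a(x) == model_b(x) for x in inputs)
    return matches / len(inputs)

fixed_inputs = [i / 100 for i in range(100)]  # 0.00 .. 0.99
score = agreement(old_model, new_model, fixed_inputs)
print(f"agreement: {score:.2f}")  # agreement: 0.95
assert score >= 0.9, "model drifted too far from the baseline"
```

In practice the fixed input set would be a curated evaluation dataset, and the acceptable drift threshold a product decision rather than a hard-coded constant.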
B
Another part of the pipeline process is trying to understand what data was used to create a given model, and trying to make sense of it, right. Is there any difference in the data that's coming in in the future? As in data versioning, right: do we have some systems there, like we have for source-code versioning with Git, so GitHub, Bitbucket, stuff like that? So I think that's also a component of it, because I've seen people treat machine learning models as black boxes, where you just give it some data...
B
It
gives
you
some
output
and,
most
of
the
time
I've
seen
innovation
come
from
people
learn
to
do
feature
engineering
properly,
where
they
manage
to
extract
the
right
properties
and
the
features
from
a
given
data
set.
I
think
that's
where
the
most
value
I
think
comes
from
when
it
comes
to
deploying
models
and
testing
out
the
accuracy.
Obviously,
every
organization
would
want.
I
don't
want
to
do
the
best,
but
there's
100
machine
learning.
It's
never
really
possible.
A
Yeah, you're absolutely right. There are some very significant challenges, and in fact the data versioning one is very easy to say but incredibly difficult to actually implement, because if you've got, say, 10 petabytes of training data, then there are real practical challenges in managing versioned snapshots of that much data, especially if it's changing significantly on a daily or weekly cadence.
B
Right, so what does MLOps propose as a solution? There are different solutions. I also have an interest in software architecture and database design, stuff like that, and coming from that perspective, I think there's always a trade-off you have to make when you're working with such a data set: you have to think about backups, you have to think about versioning, you have to think about whether everything is consistent, and there's also the provenance of the assets.
B
If,
for
example,
you
have
an
application
that
has
some
kind
of
streaming
kind
of
machine
learning
where
you
say
like
facebook
right,
facebook
has
a
lot
of
different
mechanisms
built
into
it
and
some
functionalities
or
features
try
to
guess.
You
recommend
you
something
based
on
what
kind
of
drinks
like
things
are
you
liking
your
posting
or
stuff?
Like
that,
I
mean
I
should
recommend
you
the
same
kind
of
things
in
the
future
as
well.
B
So there's also the idea where you have to continuously get data from the user, and then you have to process the output somewhere, and then you have to make it a pipeline in such a way that you have a machine learning model that is learning from, you know, new data that is coming into it all the time, and it has to give some predictions on top of that. So it's a very changing kind of model that you have to think about as well.
A
There are multiple problems in this space. One of the real challenges is that you've got two different categories of machine learning, if you like. You've got the type of machine learning that is run offline against pre-classified data, where it's easy to put that through some sort of pipeline and manage the whole process end to end. But then you've also got a class of learning which is happening online. So the system is learning from its inputs in real time and is retraining itself dynamically, on the fly, and at that point you're operating outside of a build environment: you're in your operational environment and you're training in production.
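The online case being described, a model that updates itself from each incoming observation rather than being retrained offline on a batch, can be sketched with a minimal incremental learner. The running-mean "model" below is a deliberately simple stand-in chosen for the example, not a real recommender.

```python
# Toy sketch of online learning: the model updates incrementally from
# each new observation, i.e. "training in production", instead of being
# rebuilt offline from a fixed training set.

class OnlineMean:
    """Incrementally tracks the mean of a stream; a stand-in for a model
    that retrains itself on the fly from production inputs."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def predict(self):
        return self.mean

    def update(self, x):
        # One incremental training step per incoming observation.
        self.n += 1
        self.mean += (x - self.mean) / self.n

model = OnlineMean()
for observation in [2.0, 4.0, 6.0]:
    model.update(observation)  # each production input changes the model

print(model.predict())  # 4.0
```

The operational consequence is the one the transcript draws out: there is no build-time artifact to test and sign off, because the deployed model's state depends on the live input stream.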
A
So there are major differences between those two scenarios, and it's difficult to do things in the first scenario and very difficult to do things in the second scenario. And so in a lot of cases right now, people are just not bothering to do a lot of the basic best practice to, you know, properly test and audit things. So it's the Wild West out there in many cases right now. The quality controls are limited, and we're likely to see significant numbers of large incidents over the next few years as, you know, some of these problems start to turn into issues.
B
So I can see people from software engineering coming in and learning about how to deploy machine learning models as easily as possible, because I think businesses will want to put the features they can get from machine learning into their products as soon as possible, and for the principle of agile that we have, we would obviously like a solution that removes the barriers and the difficulty of making a machine learning model.
B
This
doesn't
have
to
be
the
soda
right,
the
standard,
the
state-of-the-art
kind
of
model
that
we
talked
about
in
research,
your
kind
of
your
academics,
but
it
can
be
a
simple
model
which,
for
example,
has
access
to
seven
percent
accuracy.
I
think
most
businesses
will
be
okay
with
that
and
they
would
like
software
engineers
to
come
in
and
and
then
help
them
easily
deploy
and
manage
this
kind
of
machine
learning
models
in
their
applications.
B
So
they
can
provide
some
kind
of
systems
right,
because
you
often
have
I've
like
been
through
some
machine
learning
courses,
and
we
often
have
the
examples
of
recommendation
systems
where
you
can
recommend
a
user
something.
But
what
bothered
me
about?
Those
was
that
when
I
was
sitting
down,
I
shoot
and
implement
kind
of
that
kind
of
thing
in
jupyter
notebooks.
I
realized
I
was
given
a
data
that
was
already
collected
from
somewhere
and
while
I
could
give
some
protection
like
I,
I
can
give
you
some
accuracy
or
position
over
there.
B
B
A
Now,
what's
what's
supported
in
in
the
main
cloud
environments
is
based
on
the
demands
from
customers
for
particular
types
of
solution.
So
it's
a
bit
of
a
circular
situation.
Right
now
in
the
the
cloud
environments
are
providing
jupiter
notebooks
in
production,
because
that's
what
the
customers
are
asking
for,
and
so
there's
there's
been
a
drift
towards
that
particular
way
of
working,
but
there
hasn't
been
sufficient
work
to
evaluate
whether
that
was
the
right
direction
or
whether
it's
going
to
lead
to
escalating
problems
in
the
future.
A
Hence why we set up the roadmap itself, so that we could paint the bigger picture. Because, if you think about it, most of the work that's going on in machine learning at the moment is small teams who are experimenting to see if they can build models that will improve their existing products or allow them to release new products. So they've got minimal practical experience, and they're just trying to do whatever is easiest to run an experiment and try and incorporate it into an application. So that means that most people are at the bottom of the ladder in terms of the journey to having a machine learning application in production, at scale, for an extended period of time. So they're not yet aware of all of the problems that are likely to be encountered as they climb up that ladder. So lots of people have made lots of approximations.
A
So
what
there's
there's
a
there's?
A
large
number
of
people
trying
to
enter
at
the
bottom
of
the
ladder
and
statistically
about
80
percent
of
them,
are
failing
to
make
it
into
production,
because
the
the
bits
that
they're
doing
only
get
them
so
far,
and
then
they
run
into
a
show
stopper
problem
and
they
they
can't
go,
live.
A
So what we've been trying to do is work back the other way, rather than try and create new technologies for MLOps as a standalone practice.
A
Within Jenkins X, you can treat machine learning as just another asset in your build pipeline.
A
You can use a quickstart for a machine learning project, and you can select a particular machine learning platform and a basic algorithm, and it will generate a template project for you with all of that set up.
B
Right. So for today, what should we talk about in today's meeting?
A
So the thing that's significant from our perspective is that we need to make sure that we've got all of the updates in for the 2021 document. So it'd be really helpful if you were able to find half an hour at some point to read through the MLOps roadmap as it stands today, and then let me know if you've seen any patterns or challenges that we haven't included in last year's document, and then we'll get an update in for that.
B
So
you
mean
this
thing
right.
I
found
it
from
github.
B
B
So
I
will
go
through
this
and
I
will
read
more
about
it
and
if
there's
anything
addition
that,
I
think
can
be
done
I'll,
let
you
know
or
I'll
make
a
podcast
if
that
kind
of
works
as
well.
A
Okay, that's brilliant, right! Well, thank you for coming along. It's been interesting chatting with you, and I look forward to having you along in the future and having some contributions on the document.
B
Thank you. That's awesome. Yeah, I'm super excited to be a part of something like this, because I feel like MLOps is something that will blow up in the future as machine learning has done before, but instead of the R&D part of it we're moving towards the production part of it, right, and how we can quickly operationalize machine learning changes in the model and stuff like that.
B
So
I
think
mlaps
is
the
right
kind
of
perspective
on
that
kind
of
thing
and
there's
something
that
I
can
unlock.
I
actually
want
to
work
professionally
sometime,
if
I
can,
as
a
machine
learning
engineer.
So
I
guess
a
bit
more
background
on
me
was
with
them.
I
started
learning
on
machine
learning
back
in
last
year's
june
and
july
much
more
seriously.
I
got
a
certification
and
and
indeed
running
from
deep
learning.ai.
B
But
after
that
I
felt
like
I
shall
think
about
what
are
some
ideas
or
points
that
I
can
make
that
have
machine
learning
in
them
right
and
I
was
quickly
dumb
struck
because
I
could
not
find
any
easy
way
of
putting
my
models
out
there
or
I
was
very
limited
to
what
technology
I
was
using.
So
I
have
like.
I
only
knew
one
way
of
doing
this
in
this
kind
of
thing,
but
I
did
not
even
know
if
this
came
in
this
other
technology
right,
because
I
could
not
find
a
drill
on
it.
B
So I think that's been an interesting journey, but yeah, I'll be sure to contribute and ask you if I have any questions. That would be pretty good as well, so I'll leave it with that.
A
Sure, brilliant, all right! Well, it's been good to chat to you, and I look forward to hearing your feedback on the document.