From YouTube: CDF SIG MLOps Meeting 2020-07-02a
B
Nice to meet you. Well, I guess we could start. I didn't really have much to follow up from last week. I did spend some time fleshing out some more of the technical requirements, the technology requirements, in the roadmap. There were a few sections I sort of skipped, and I thought we could talk over them today.
B
There was a lot of interesting stuff in the news on bias and AI algorithms in the meantime. I know we were talking a lot about it last week, so that was interesting. I put a few links in the notes for it, because I know people are interested in that. There was one other thing I didn't put in there.
B
You still should be able to go back to a point in time and explain how it got there, but Facebook being Facebook, it's not really there, I think. So if I can find that article again, I'll add a link to the paper. I thought that was interesting. I've got it here. It was a link to, sort of, enterprise applications of reinforcement learning, which I hadn't really seen before.
B
It's not that well known. It's usually used more in gaming, or it's been used a bit in some of the self-driving stuff, at least sort of learning with simulators. In enterprise settings, if you're applying machine learning to some domain, you might not be able to simulate the environment, so reinforcement learning doesn't usually work. That last link there was an interesting read, yeah.
A
But if the context shifts: there are currently no good mechanisms for machine learning systems to recognize context, and so you actually end up with models breaking very quickly because of a shifting context. So, for example, most models that learned things from people's behaviors pre-COVID are now completely broken, because everybody's behavior has changed.
A
Because there's no ability for the system to recognize a shift in context, it's very hard to manage that scenario. You can throw everything away and start from scratch, relearning, but then the context is going to switch back again, and then you've got the same problem in reverse, and you've lost your original model.
B
Does that mean you could potentially have multiple context-specific models? Like, you've trained a model previously, and if you could enumerate these contexts... I'm trying to think of an analogy. We've probably all had to do a talk to a group of people, customers or a community or whatever, and you kind of have to calibrate based on your audience, like, what level am I talking at? And then you kind of swap in the context.
B
Okay, I mean, I took it this way: that's not appropriate for all audiences. I imagine a similar solution for this, which does complicate things a bit. If you're going to have multiple, potentially concurrent models deployed, you're not just replacing the old version with the new version; you've got concurrent ones that switch in and out. Yeah, I could see that.
A
Being with their mates down the pub will be a radically different set of behavior. Yes, but you also see situations where humans are able to do certain things very successfully in one context and then are somehow unable to replicate that same behavior in a different context. So the same patterns do exist within natural neural network systems: if you get the context flagging wrong, then you can learn something, but you can only do that thing in the context in which you originally learned it.
B
I was just trying to look it up: I've seen ensemble neural networks, where there are multiple networks or models and there's also a neural network itself that decides which one to use. I don't know how much those have practically been used, but they sound like a solution that's been tried. If you do have different contexts, then there's some learning involved to even understand that. I mean, just the same as us; we learn when we're young, when you're a youth.
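A minimal sketch of the gating idea discussed above, where a separate component decides which context-specific model to use. All names here are illustrative; in a real ensemble the gate would itself be a learned network rather than a hard-coded rule.

```python
# Two "expert" models trained for different contexts, plus a gate that
# picks which one answers. Purely illustrative numbers and names.

def model_pre_shift(x):
    # Expert trained on behavior before the context shift.
    return 2 * x

def model_post_shift(x):
    # Expert trained on behavior after the context shift.
    return 10 * x

def gate(context):
    # In a real mixture-of-experts this would be a learned network;
    # here it is a hard-coded rule for illustration.
    return model_post_shift if context == "post" else model_pre_shift

def predict(x, context):
    # Route the input to the context-appropriate expert.
    return gate(context)(x)
```

The point is only that the old model is kept alongside the new one, so a context switching back does not mean relearning from scratch.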
B
So there are, if you look at ensemble neural networks, different articles on that, but I think they're viewed as very, very difficult to do, because it's hard enough to train one, let alone training a bunch of them and then trying combinations of them. If you knew what those contexts were, then you could kind of partition your problem up, like if you knew you're in wartime or peacetime or pandemic time, and then ahead of time you could prepare.
B
But I guess that's kind of interesting. I'm trying to find a good analogy from traditional software. When you would deploy it that way, you wouldn't normally have that many different versions of things running. Well, maybe. I can take the analogy, and so for a phone you would...
A
So it's going to be helpful to be able to build things in that way, but again, there's this big shift between the sort of structured and controlled training mechanisms that you can do within a CI/CD environment, versus completely unstructured learning, which tends to happen at runtime in an unsupervised context.
A
And
it
may
be,
what
we
need
to
do
is
look
at
ways
of,
rather
than
building
an
application
that
learns
at
runtime.
You
build
an
application
that
is
reporting,
real-time
metrics
back
to
its
own
development
environment,
where
the
the
changes
to
the
model
of
being
reintegrated
under
the
control
of
the
CI
CD
system
and
then
redeployed
in
near
real-time,
put
in
a
more
controlled
fashion.
B
You get the same end result, but you've got the pipeline in the loop there, so it's as if it was supervised training, triggered by a person, or a timer or whatever, going in from time to time to retrain a model. It's the same mechanism; it's just being triggered in a more continuous way, so I guess the pipeline in that case becomes part of production.
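The "pipeline in the loop" idea above could be sketched roughly like this: the deployed application reports a live metric back, and a drift threshold triggers the same retraining path a person or a timer would. The threshold value and function names are assumptions for illustration, not from any specific tool.

```python
# Hypothetical drift check: retrain when live accuracy falls too far
# below the accuracy recorded at deployment time.
DRIFT_THRESHOLD = 0.15  # illustrative value

def should_retrain(baseline_accuracy, live_accuracy):
    """True when the live metric has drifted past the threshold,
    i.e. when the CI/CD retraining job should be triggered."""
    return (baseline_accuracy - live_accuracy) > DRIFT_THRESHOLD

def report_metric(baseline_accuracy, live_accuracy, trigger_retrain):
    # The app only reports its metric; the pipeline, not the app,
    # does the retraining in a controlled, repeatable way.
    if should_retrain(baseline_accuracy, live_accuracy):
        trigger_retrain()
```

The design choice here matches the discussion: learning still happens, but it is funneled through the same structured mechanism as a manually triggered retrain.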
A
If you're doing positive or negative reinforcement, so if you're saying, you know, work towards this reward or work away from this punishment, the system will massively over-optimize on that, often to the detriment of the task, or it might decide to opt out of the problem once and for all. So if it identifies that the pattern of punishment is coming from a particular source, then it may refuse to do anything that involves engaging that source, or it may try and remove the source. Yes.
B
Because, right, reinforcement is unsupervised. I guess it can do some of the job of supervised, like you give it goals and rewards to try and optimize something. Whereas with supervised, I guess you're explicitly telling it, or giving it, you know, rows of data, or records, or some data that's labeled, saying this is this category, or this is good, this is bad, this has this score or that score; there's some judgment.
B
That
goes
along
with
the
example
and
then
it
kind
of
fits
its
view
of
the
world
to
match
that,
whereas
reinforcement,
you're,
sort
of
saying
I
would
like
this
value
to
be
as
low
as
possible
and
here's
the
environment
you
can
play
in.
Maybe
there's
a
simulator.
Maybe
there's
not
that's
sort
of
that
link
that
last
link
in
the
agenda
thing
there's
an
interesting
write-up
on
it,
nothing
the
interesting
bit,
but
it's
the
idea
is
to
achieve
the
same
sort
of
thing.
It's
just.
B
It
might
have
unpredictable
results
because
you,
you
told
it
to
optimize
something
so
like
I've
seen,
there's
a
company
that
does
this
to
optimize
settings
for
auto
scale
or
JVM
flags.
You
know
things
as
mundane
as
that
using
reinforcement
learning
and
they
run
it
in
a
staging
environment.
While
your
applications
running
under
a
integration
test
workload,
but
a
naive
version
of
that.
If
done
wrong
it
might
you
know
you
might
think
okay,
I'm
gonna
tell
it
to
use
as
little
memory
as
possible,
and
so
it
decides
well.
I
just
won't
run
your
app.
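The degenerate optimum described above ("I just won't run your app") can be shown with a toy reward function. All numbers and names are made up; the point is only that a naive "minimize memory" objective is maximized by doing nothing, and that a throughput constraint removes that option.

```python
# Candidate configurations an optimizer might explore (invented values).
candidates = [
    {"name": "dont_run_app", "memory_mb": 0,   "requests_per_s": 0},
    {"name": "tight_heap",   "memory_mb": 256, "requests_per_s": 900},
    {"name": "default_heap", "memory_mb": 512, "requests_per_s": 1000},
]

def naive_reward(c):
    # "Use as little memory as possible" with no other constraint.
    return -c["memory_mb"]

def constrained_reward(c):
    # The app must actually serve traffic, or the candidate is invalid.
    if c["requests_per_s"] < 500:
        return float("-inf")
    return -c["memory_mb"]

best_naive = max(candidates, key=naive_reward)
best_constrained = max(candidates, key=constrained_reward)
```

Under the naive reward the "winner" is the configuration that never runs the app; the constrained reward instead picks the smallest heap that still serves the workload.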
B
So I've answered your problem. There are a bunch of cases; that's probably an obvious one, but you could have it come up with something. And then there's the swearing problem, as I like to call it, like that time with the Microsoft bot. That's definitely one of the technology challenges for this kind of learning, which is emerging, yeah.
A
Yeah,
obviously
you
see
this
in
in
a
business
environment,
give
teams
a
set
of
gave
the
eyes
they
will
optimize
those
and
everything
else,
except
that
they
typically
operate
within
some
sort
of
overarching
moral
framework.
So
they're
self-regulating
against
that
framework,
but
yeah
our
machine
learning
systems
don't
have.
B
Yeah, it's still an emerging field here. I'm wondering, I don't know, Carrie, if you know anything about transfer learning? I've heard that mentioned a few times, where you have sort of pre-trained models with some basis of knowledge, and then anything you do, you add on top. Maybe that could be done by having some sort of hardwired data in there, I don't know. Have you come across that in your travels? I have.
B
I've seen there's sort of a marketplace. I mean, I'm really familiar with Google, and I think SageMaker has some too, but there are a bunch of sort of vertical ones. Like, I've been using some natural language stuff from Google, which seems to work pretty well, but I actually don't know how it works.
B
They
don't
really
they'll
sort
of
tell
you
how
it
breaks
apart,
syntax
and
those
certain
things,
and
but
there's
no
underlying
explanation,
which
is
you
know
not
great,
but
I
haven't
I,
don't
know
if
people
do
that
yet
like,
but
there
certainly
is
a
marketplace
for
things.
You
can
sort
of
find
one
that
you
know
will
categorize
certain
things
a
certain
way
and
from
what
I've
seen
is
that
there
they
come
with.
B
You
know
the
training
data
or
the
scripts
and
the
the
notebooks
and
everything,
and
then
you
can
try
it
out
and
adapt
it,
and
then
you
kind
of
change
the
data
a
bit
bring
your
own
daughter.
I,
haven't
really
seen
sort
of
binary
models
that
you
bring
along
and
then
transfer
your
stuff
into
it,
but
Amazon
claim
they
do
that
with
a
cog.
Cognito
is
one
of
their
natural
language.
B
Api's
has
maybe
it's
Google
has
an
understanding
of
you
know
in
different
languages,
but
you
can
also
bring
along
your
own
set
of
training
data
in
you
know,
either
at
a
beulah
format
or
whatever
that's
categorized
or
ranked
in
some
way,
and
then
it
will
learn
from
that,
and
it
says
it
transfers
that
on
top
of
the
existing
model,
they've
got
so
I.
Guess
there
are
is
commercial
examples
of
it
out
there,
but
I
haven't
seen
like
he's
one
for
working
with.
You
know
automobile
insurance,
and
it's
got
all
the
common-sense
in
there.
A
The reality is that for most commercial applications of machine learning at the moment, you have a certain model which is detecting certain features, and then, following on from that detection, you've got a bunch of conventional programming to try and catch and block all of the bad decisions that you're also getting out of the model. So it's a general case plus a bunch of hard-coded exceptions to manage real-world scenarios.
B
I'll take a note and look at the document. I'll try and add those two challenges: one is the swearing problem and one would be some sort of emergency cut-out. They're probably related; it might even be one thing. But I'll have a go at adding that to the challenges for this section in the coming week.
B
So that one is providing mechanisms by which changes to training sets, training scripts and so on are auditable across their full lifecycle, which is very closely related to "versioned appropriately". What I'm having trouble understanding is what auditing means in this context. Like, I often think of auditing as: well, if there's a clear record in something like signed git commits, then that's pretty good for a lot of legal requirements. What are you thinking here?
A
And I would also expect that we'll see a move towards legislation that says that third parties are required to audit sensitive decision-making systems. So you may see some of the large accounting and audit companies moving into providing, you know, ethics or bias audits on decision-making systems for their customers in the future.
A
Over the next few years, much of it quite knee-jerk in reaction. So I would fully expect that the compliance load on organizations will go up very steeply when it comes to managing anything associated with machine learning, and therefore there will be a long-term need for mature tooling that helps you to manage the end-to-end problems.
B
Is
that
sufficiently
different
from
like
two
ones
above
as
providing
mechanisms
by
which
training
sets
training
scripts
and
service
wrappers?
They
all
even
versions
like
what's
the
restraint,
treating
it
as
like?
What's
a
managed
asset,
I
guess
I'm
asking
is
that
any
sadder
is
this
kind
of
a
big
data
problem
like
training
sites
could
be?
You
know
gigantic
yeah
well,.
B
In the second case, it's about the... right, so a managed asset, that would be... say you've got a change in your training set. If it's source code, you do a diff. If it's data, you don't really do a diff unless it's a trivial change, but you might, you know, have some statistical report, like: what's the proportion of data in this category versus that, and how does that differ from the old one? Is that what you mean by a managed asset?
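The "statistical report instead of a diff" idea could look something like this sketch, comparing category proportions between an old and a new version of a training set. The labels are invented purely for illustration.

```python
from collections import Counter

def category_proportions(labels):
    """Fraction of the data set falling into each category."""
    counts = Counter(labels)
    total = len(labels)
    return {k: counts[k] / total for k in counts}

def proportion_shift(old_labels, new_labels):
    """Per-category change in proportion between two data set versions."""
    old_p = category_proportions(old_labels)
    new_p = category_proportions(new_labels)
    keys = set(old_p) | set(new_p)
    return {k: new_p.get(k, 0.0) - old_p.get(k, 0.0) for k in keys}

# Invented example: the label mix shifted between versions.
old = ["approve", "approve", "reject", "approve"]
new = ["approve", "reject", "reject", "reject"]
shift = proportion_shift(old, new)
```

A report like this is readable where a line-level diff of a large data file would not be.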
A
So
you,
if
you've,
if
you've,
got
version
1.1
as
the
sort
of
a
compiled
application,
then
you
know
you
can
you
can
obviously
check
the
you
put
check
sums
on
on
the
set
of
bytes
that
you've
you've
got
in
the
data
to
validate
that?
What
you've
got
is
an
identical
copy
of
fixed
asset
that
was
Bradley's
organization,
yeah.
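A minimal sketch of that checksum idea: record a digest when the data snapshot is captured, and verify later that the copy is identical. The hashing scheme here (SHA-256 over row representations) is one possible choice, not a prescribed one.

```python
import hashlib

def dataset_digest(rows):
    """Deterministic SHA-256 digest over an ordered snapshot of the data."""
    h = hashlib.sha256()
    for row in rows:
        h.update(repr(row).encode("utf-8"))
    return h.hexdigest()

# Invented snapshot; record its digest at capture time.
snapshot = [("claim-1", 120.0), ("claim-2", 80.5)]
recorded = dataset_digest(snapshot)

# Later: validate that what you've got is an identical copy.
assert dataset_digest(list(snapshot)) == recorded
```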
B
And, you know, later on, the solution might be: well, if you're extracting data to learn from a mutable source, then you're going to need to keep a copy of it. You might be using a relational database that's not using a ledger or anything like that to keep track of all changes, in which case, you know, the solution is you're just going to have to keep a copy of that data somehow. That's kind of the implied requirement there.
B
And
then
copy
copy
on
write
things
deltas
or
whatever
it
is
yeah
yeah,
there's
lots
of
solutions
for
different
sizes,
dota,
yeah,
okay,
that
helps
so
moving
on
to
the
next
one
section:
the
managing
security
of
data
and
the
MLS
process.
We
particularly
focus
on
the
increased
risk
association
wears
aggregated
data
sets
used
for
training
of
batch
processing,
so
by
an
obvious
example
of
that
would
be
personally
identifying
information,
or
you
know
confidential
with
I
think
we
talked
about
this
before
confidential
information
or
information.
That
someone
has
a
claim
to.
B
Is
that
right,
like
it's
so
security
security
from
from
the
point
of
view
of
exposing
daughter,
I
guess?
Is
that
what
it
means
like
you,
you
need
to
have
access
to
this
data
to
train
a
model,
but
the
raw
data
he
is
riskier
to
expose
like
it
might
be
less
risky
to
have
the
model
go
out
there.
Just
like
people
are
more
sensitive
about
source
code
in
a
proprietary
world
than
they
are
about
the
binary.
For
obvious
reasons
like
this
is
kind
of
the
analogy
there
is
that
or
is
there
what
Judas
than
that.
A
So
now
it's
a
often
a
case
that
the
is
a
trade-off
between
convenience
and
security
and
and
you
you
actually
have
to
potentially
set
the
the
convenience
barrier
differently
for
systems
that
are
going
to
be
used
on.
You
know
target
rich
environment,
so
you
know
we.
We
probably
need
to
expand
on
this
section
to
to
flag
the
fact
that
machine
learning
applications
are
typically
going
to
be
in
more
challenging
areas
of
the
customers,
business
and
generally
higher
risk,
and
therefore.
B
They're also going to be target-rich. They're also going to require a lot more data, and a lot more data means a lot more exposure. They're not like a stateless microservice that just gets pushed the data, does its calculation, you know, for some interest rate or some brokerage fee or something. The amount of data that you sort of collect and massage and extract and maybe store... you need to keep it stored somewhere for training a model.
B
It's not just people cracking in and running crypto miners; yeah, your data can be exposed too. So one thing someone asked me about: I was looking at training from issue trackers, things like Jira, you know, sort of data in that semi-structured format, some of which is text, like the names of things, but then there are other categories and priorities and project names and labels and all this other stuff, and people were going, well...
B
Could
you
obfusco
that
data
before
you
know,
as
you
extract
it
out
of
the
sort
of
the
original
system
as
you
extract
it
and
prepare
it
for
a
training
run?
Could
you
obfuscate
it
as
part
of
that,
like
we're
sort
of
one
way
hash,
because
the
you
know
the
mole
doesn't
really
care
that
the
projects
called
Project
X.
You
know
it
could
just
be
some
arbitrary
value
as
long
as
it
can
map
I,
guess,
I!
Guess
that
sort
of
begs
the
question
of
maybe
those
that
of
hashing
is
not
really
cryptographically
secure.
B
So
it's
a
bit
of
security
theater,
but
I
thought
that
was
an
interesting
point.
Someone
brought
up
I,
don't
know
if
people
actually
do
that
you,
obviously
if
it's
numerix
and
you
you
have
to
kind
of
keep
it,
you
need
to
preserve
the
scale.
You
can't
hide
numeric.
If
it's,
if
it's
plaintext,
you
want
to
do
an
LP
on
yeah.
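The one-way hash idea for categorical fields could be sketched as below, using a keyed HMAC so the same input always maps to the same token (which is what training needs), while keeping in mind the caveat raised above: an unkeyed hash of a low-cardinality field is trivially brute-forced, hence "security theater". The field names and key are made up.

```python
import hashlib
import hmac

SECRET_KEY = b"rotate-me"  # hypothetical key; keep it out of the pipeline itself

def pseudonymize(value: str) -> str:
    """Stable one-way token for a categorical value, keyed so that
    brute-forcing guesses requires the secret, not just the hash."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]

# Invented Jira-like record; the model only needs consistent tokens.
record = {"project": "Project X", "priority": "High"}
obfuscated = {k: pseudonymize(v) for k, v in record.items()}
```

Same input, same token, so the model can still learn on the field, but the original names never leave the source system in the clear.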
B
It's not, yeah. There's lots of science to show how things can be recovered, because fundamentally it does match to something in the real world; you have to do that when you do a prediction or feed something to the model. It's kind of a one-way thing. I guess it's a defense at most. Maybe it's a defense-in-depth thing, where it's a little bit more opaque.
B
But
having
said
that,
if
you
so
say
some
of
the
previous
pro
as
you
need
to
be
able
to
prove
how
your
model
was
trained,
then
you
need
to
be
able
to
point
back
to
the
original
source
data
and
it's
unencoded
form.
You
know
in
the
case
of
an
audit,
to
go
like
here's,
the
here's,
the
set
of
you
know
insurance
claims
that
we
fed
it
on
july
2019.
B
You
can't
give
just
the
encoded
form
for
that,
because
you
can't
go
you
can't
its.
Unless
these
things
would
be
one
way,
you
can't
necessarily
go
backwards
like
if
the
original
data
is
gone
because
it's
a
mutable
store,
then
you
would
effectively
be
inaudible,
like
you
wouldn't
be
able
to
show
it's
like
I
guess,
yeah
I,
remember,
I
was
saying
that
I
guess
he
kind
of
it
doesn't
really
completely
help,
there's
no
way
to
just
totally
obfuscate
it
just
by
design.
A
There are situations where there will be pieces of information that are rare enough that they will be identifiable if you have a certain number of them. So if you know roughly where somebody is, you know, at a county level, and you know that they have a certain medical condition or some other rare distinguishing feature...
A
Yeah, you're only going to get two or three hits for that combination in that area, so you're already nearly into the data set. And in many cases, what you can do is adversarial attacks on the model, where you're putting data in which you expect to trigger against certain scenarios, and therefore you can actually detect whether someone is in a model by manipulating the inputs.
B
Some of those other ones are more technical, but I think we might call it a day today; we've just gone past the 45-minute mark. So, yeah, I guess, is there anything else anyone wants to throw in at the end? There's stuff that we can go away and chew on. I know I've got some action items to add some challenges and flesh out some things, but there was nothing to follow up from last week. Was there anything else?