From YouTube: Next Generation AI/ML Workloads on OpenShift using Knative and Infinispan - Ian Lawson (Red Hat)
Description
Next Generation AI/ML Workloads on OpenShift using Knative and Infinispan
Ian Lawson (Red Hat)
This OpenShift Commons Gathering was held on July 6, 2022, live in London, England.
https://commons.openshift.org
So hi everybody, my name is Ian Lawson. I'm an account member working at Red Hat. The reason I'm a little bit vague at the moment is that I've just got over Covid, courtesy of the Dublin OpenShift Commons, which seemed to be the super spreader event.
So what we're going to talk about today is basically the concepts of next generation AI and ML, using some of the new technologies we've got as part of the OpenShift package. I come from an AI background: I worked for some agencies that shall remain nameless for a long, long time, working on huge data loads, so I've got a personal interest in this. I've been sneakily doing this instead of my day job, and I don't think there are any salespeople, apart from Dave, in the audience, so I can't get told off for that.
Firstly, it's the decoupling of application and infrastructure, and this is incredibly important, because if you come from the backgrounds I've been in, in terms of AI and machine learning, all of the experimentation you did was bolted onto machines. You used to have huge Hadoop lakes with which you'd actually process, and all those kinds of things; you'd have to keep the hardware up and running all the time, and there was a very tight bonding between the application and the infrastructure.
The whole point behind containers is that you remove that binding. In the old days I used to call it taint, and the reason I called it taint is that I used to write applications and give them to ops, and I'd say: here's my application, it's a lovely piece of Java; oh, by the way, you have to install this JVM on the machine, and you have to install this library, and you have to install this database. And by the time they'd installed all the bits and pieces they needed for my application,
A
The
entire
machine
was
tainted.
It
wouldn't
run
anybody
else's
application.
The
advantage
of
using
containers
now
is
all
the
taint.
The
things
that
made
ops
hate
us
developers
travels
with
the
containers
and
that's
that
kind
of
distinction
between
the
old
days
of
having
everything
locked
down
to
a
machine
or
traveling
with
the
actual
container.
It's all about agility of application creation as well. We always forget this. When I'm talking to customers, I say: well, I'm going to fire up this application, fire up this demo, and within 30 seconds, bang, I've got a running application. I know when I used to work in development, and I expect it hasn't changed in the real world,
it used to take four to eight weeks to get a virtual machine, and that virtual machine wouldn't be anything like the one I asked for. So this kind of concept, where you can actually just fire things up instantaneously, is amazing, and that's a huge opportunity for AI and ML: on-demand execution. I will get excited about Knative, which again might blow my blood pressure through the roof; it's a new technology we've got as part of OpenShift
that really makes this on-demand execution shine. When I'm talking about on-demand execution, it's the ability to spin up an application and only consume the resources you need for that application for the duration of the call, not the lifetime of the application itself. I'm jumping ahead slightly at this point, but it's like this: say you had an application that consisted of four services. Three of those services are called once every 24 hours; one is called once every 100 milliseconds.
If you're running a standard system, or a standard Kubernetes system, you have to have all four of those applications up 24/7, waiting for those requests to come in; it's part of the design of the way Kubernetes works. What Knative does, and I'll explain in a little bit more depth when we get to it, is actually allow you to have applications that spin up on demand, and they only exist and consume resources for the duration they're being called. And it finally, elegantly solves the classic developer 70/30 problem.
Have you ever heard of this? Good, because I made it up about six years ago, and I was hoping it would have disseminated through the industry by this time. What I mean by the 70/30 problem is that when I was a developer I used to get paid reasonably well; well, really well. In terms of development, developers never get paid well. Anybody? Any employers in the room?
You know, developers don't get paid enough. But I used to get paid a reasonable amount of money, and I used to waste 70% of my time when I was developing. I'd spend 70% of my time building frameworks, installing libraries and writing boilerplate code, and on average I ended up spending 30% of my time actually writing the core code I wanted to write. When you use a technology like OpenShift, when you use the technology of containers, that goes up to about 95%: you're not writing all that boilerplate.
If you're doing an experiment that requires, let's say, a thousand iterations, ten thousand iterations, it needs to spin them up, consume them, and then they go away; these things don't have to persist. In the past this was impossible, just down to the tight binding between the application and the infrastructure itself.
If you wanted to set up a machine to run Hadoop, that machine would only run Hadoop, and it'd be sat there with the Hadoop lake ready to process jobs, but it would be there all the time, assigned to that Hadoop workload. With Kubernetes and containers, and specifically Knative, this has massively changed. So, understanding the container mindset: I love the first line on this slide.
Containers are file systems with delusions of grandeur. That's literally all they are: file systems that think they're operating systems. It's a set of files that's executed in a process-bound space and thinks it's an operating system; it's just a file system with delusions of grandeur. To exploit the design features of Kubernetes correctly, applications and experiments need to follow certain design patterns.
They should be stateless; they should be sausage machines. And it's not a limitation when you're actually designing applications from a container perspective, because you can use a persistence method, or a technology like Data Grid, which I'm going to show a little example of, or persistent volumes, which come as part of the OpenShift package and are absolutely perfect for this. For those who don't know what it does, a persistent volume actually expresses a file system into the back end of a container.
The container sees it as part of its own file system, but when the container goes away and is restored to its original image, you can reattach that file system and it can carry on from where it was, which is brilliant. It adds that kind of point-to-point state persistence you don't get by using standard containers out of the box.
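For reference, here is a minimal sketch of what that looks like in practice; the claim name, image and mount path are illustrative assumptions, not taken from the talk:

```yaml
# A claim for storage that outlives any one container.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: experiment-state          # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
# A pod that mounts the claim; the container just sees /data as part
# of its own file system, and a replacement pod can reattach it later.
apiVersion: v1
kind: Pod
metadata:
  name: experiment
spec:
  containers:
    - name: worker
      image: quay.io/example/worker:latest   # hypothetical image
      volumeMounts:
        - name: state
          mountPath: /data
  volumes:
    - name: state
      persistentVolumeClaim:
        claimName: experiment-state
```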
So, introducing Knative serverless, and this is where I tend to get a little bit overly excited. Knative as a concept is simple. I don't like the word serverless, and I normally get told off at this point, because this is normally being filmed, so I'll stop myself from swearing. But when I'm talking to customers, you know, they talk about serverless, and I say: when you're talking about serverless, you're talking about the unicorn's arse, because there's no such thing as serverless; it's someone else's server, or it's someone else's resource.
...it actually exists on the worker node, but it's offlined to the point where it's not consuming any resources whatsoever, and we've got two triggers that allow it to actually come back to life. One is called Knative Serving, and that's the standard one. For those who know Kubernetes already, what this does is create a service ingress point that sits at the service point for the application, looking for traffic. But what that service point does, rather than just pushing the traffic into the application itself,
is check whether the application exists, and if it doesn't exist, it spins it up. When the application spins up, it processes the ingress traffic, and then there's a time limit during which, if it receives any more traffic, it remains alive; but if it doesn't receive traffic, it spins down to zero.
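As a sketch of what one of those scale-to-zero services looks like as a Knative Service (the name and image are hypothetical; the annotation keys are the standard Knative autoscaling ones, though they have varied slightly between releases):

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: sporadic-service                        # hypothetical: the once-a-day service
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/min-scale: "0"  # allow scale to zero (the default)
        autoscaling.knative.dev/window: "60s"   # roughly, the idle window before it spins down
    spec:
      containers:
        - image: quay.io/example/sporadic:latest  # hypothetical image
```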
So it goes away, consuming no resources, and that's massively efficient. Again, I normally get told off by salespeople when I'm talking about this, because my first pitch to customers is: you need smaller clusters, you need fewer worker nodes; you can put loads more onto far fewer worker nodes.
Of course, the salespeople take me outside and kick me, because we want to sell more worker nodes. But the other type is called Knative Eventing, and this is the one I really like. This is to do with a new concept called cloud events. So, instead of actually having an ingress point that's based on a service, it has events that drive the creation of the actual application.
Now, I used to work with eventing: I used to work with message queues, I used to work with Kafka and all those kinds of things, and every single one has got a different protocol, and every single one is very, very confusing. So what we've done is we've abstracted it and made it incredibly simple. In fact, I made it slightly more simple, because I complained to the people that wrote this.
So you set up your Knative applications as being driven by a trigger, which is driven by a type of cloud event, and the broker works out which of the actual applications to push it to. Now, out of the box, when you install one of these brokers they're actually ephemeral, so when you put one up and throw cloud events at it, it'll go through its triggers.
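A minimal sketch of that wiring: a Broker, plus a Trigger that filters on a cloud event type and points at the Knative Service to wake up. The names and the event type here are hypothetical:

```yaml
apiVersion: eventing.knative.dev/v1
kind: Broker
metadata:
  name: default
---
apiVersion: eventing.knative.dev/v1
kind: Trigger
metadata:
  name: neuron-trigger                   # hypothetical name
spec:
  broker: default
  filter:
    attributes:
      type: dev.example.neuron-fire      # only events of this type reach the subscriber
  subscriber:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: neuron                       # the Knative Service to spin up
```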
So you can have a Kafka topic sitting behind the broker, which is actually feeding those cloud events into the broker, which is driving the actual recreation of the applications through the events. But because it's Kafka, you can wind the temporal stamp back, so you can replay the actual messages through. So you get all the advantages of a complex messaging system such as Kafka, but you get the simplicity of this interface. And a very quick point on this:
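For what that looks like, here is a hedged sketch of a Kafka-backed Broker, assuming the Knative Kafka broker implementation is installed; the ConfigMap name and namespace follow the upstream convention but may differ on a given cluster:

```yaml
apiVersion: eventing.knative.dev/v1
kind: Broker
metadata:
  name: default
  annotations:
    eventing.knative.dev/broker.class: Kafka   # use the Kafka-backed broker class
spec:
  config:
    apiVersion: v1
    kind: ConfigMap
    name: kafka-broker-config    # holds bootstrap.servers, partitions, replication
    namespace: knative-eventing
```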
there was a problem with this, and it comes from the work I used to do with, let's say, crap communications and things like that. If you've got a broken packet (let's say the event arrived at the actual application and the JSON itself was slightly broken), what it was doing was automatically decoding that payload into a form that could be consumed, and if it failed the formatting, if the JSON was broken, it would report it as an error and it wouldn't get to the actual service.
So why is this relevant to artificial intelligence and machine learning? Well, AI/ML workloads are all about size and repetition. They're all about massive experiments, but they're all about repeating, repeating, repeating, and most organizations are limited by resource, either by size or by cost. You know, machines are expensive, AWS is expensive, cloud is expensive.
Containers and Knative technologies allow for massive experiments in much smaller footprints, and that's the key thing with this. In fact, if you took a big system and did functional decomposition down to atomic services, you could represent every one of those atomic services as an individual Knative service, and suddenly you've got massively complex systems made of thousands of these connected services, where the services only exist for the duration of their calls, and suddenly it becomes beautifully elegant and efficient. And, as I say, OpenShift has also got this targeted orchestration and strict resource control.
...what capabilities it had. So now you've got a situation where you can stand up a worker node that's got GPUs, that's got NUMA zones, and those can be expressed through the object model into OpenShift, and OpenShift can use that information to correctly orchestrate jobs. Combine that with the Knative Serving or the Knative Eventing approach, and you can see that you could build a model such that, if parts of your experiment required a huge amount of GPU processing, you could target the GPU hardware, and it all comes out of the box. And, as I say, much more efficient resource consumption means much better results for less outlay. Right, the fun stuff.
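As an illustration of that targeting, a Knative Service can request an extended resource such as a GPU, so its pods only land on nodes that expose one. This assumes the NVIDIA device plugin is installed; the service name and image are hypothetical:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: gpu-stage                 # hypothetical: the GPU-heavy part of an experiment
spec:
  template:
    spec:
      containers:
        - image: quay.io/example/gpu-stage:latest   # hypothetical image
          resources:
            limits:
              nvidia.com/gpu: "1" # schedules only onto GPU-bearing worker nodes
```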
I've been obsessed with neural nets for years. I love the concept of neural nets, partly because I want to know what this thing does when it's not suffering from fever. Now, to demonstrate the theory behind this dynamic execution, I've applied an idea around these things called neural nets. Neural nets work by combining atomic components called neurons.
Your brain is full of them, and what a neuron does is take a number of inputs, aggregate those inputs together against a threshold, and generate events at the far end depending upon those thresholds, and you can build massively complicated systems.
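As a one-line sketch of the textbook model being described here (not anything specific to this demo), each neuron aggregates weighted inputs and fires against a threshold:

```latex
y = f\Big(\sum_{i=1}^{n} w_i x_i\Big), \qquad
f(s) = \begin{cases} 1 & \text{if } s \ge \theta \\ 0 & \text{otherwise} \end{cases}
```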
So what I was thinking about was: using the Knative Serving and cloud events, you could build and simulate these neurons. The problem being that, with Knative Serving and with standard container technology, there's no persistence between calls.
So what I've done is I've actually installed Infinispan, or Red Hat Data Grid, and what that is, is an in-memory data store. What happens is the neurons are very small containers that are spun up on demand, driven by cloud event types; I have different cloud event types that are actually generating different payloads into the neurons. When a neuron starts up, it will go to the data grid and get its latest memory state.
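The repo mentioned at the end is said to include the actual YAML; as a stand-in, a minimal Infinispan cluster created via the Infinispan Operator looks roughly like this (the name is hypothetical, and the spec is an illustrative assumption rather than the speaker's exact file):

```yaml
apiVersion: infinispan.org/v1
kind: Infinispan
metadata:
  name: neuron-memory       # hypothetical: holds each neuron's latest state
spec:
  replicas: 1
  service:
    type: Cache             # plain in-memory cache service
```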
Now, the dangerous part of the presentation: I'm going to do a demo. These never work, and I think people just come to see them fail. But for what I want to show you, I'll log into this cluster here.
What I've got is what looks like a slightly complex setup of some Knative Serving stuff, so I'll let it start and let it just get into a situation where you can see it. Now, you probably can... oh, actually, you can read it. So what we've got here is basically a little setup to show you some of the examples I've been talking about. In the center here, that's the broker; I've got a single broker, and the broker is namespace-bound. And for us old people who were around when the web first came about,
you can actually emit cloud events into the broker just by doing a POST: you put the ce-type as a header on the POST, and you put the actual physical payload into the body of the POST itself. To demonstrate this, I've actually got an application running here called Cloud Emitter, and, as I say, I apologize profusely, because I'm still using style sheets I wrote in 1996.
What this allows me to do is target individual brokers and push named cloud events into them. So what I'm going to do is push a cloud event to that broker. That cloud event has got a type of quarkus-event. So the broker is waiting for these events to come in; the broker's got triggers that fire off these Knative services, which are actually offlined at the moment. I'm not going to send it initially, because I want to show you the trigger.
I've got a Quarkus one set up, because Quarkus is a very, very fast way to actually spin up these kinds of things. I've also got a Camel K definition here, so I've got a little Camel route, and that Camel route is waiting for a certain type of event in the broker itself. You see I've got two triggers: this trigger here is actually looking for the type quarkus-event, and it drives requests into the Quarkus service.
So if I emit a quarkus-event from that cloud event emitter, we should see that spin up. What the Quarkus service is going to do is just log the information it gets and then push another event back to the broker. The event it pushes back to the broker is a techtalk event, for which there is a trigger into the Camel K service.
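Putting the chain together, the two triggers described would look roughly like this; the event type and service names are guesses at the demo's setup, for illustration only:

```yaml
apiVersion: eventing.knative.dev/v1
kind: Trigger
metadata:
  name: quarkus-trigger              # hypothetical name
spec:
  broker: default
  filter:
    attributes:
      type: quarkus-event            # first hop: wakes the Quarkus service
  subscriber:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: quarkus-processor        # hypothetical name
---
apiVersion: eventing.knative.dev/v1
kind: Trigger
metadata:
  name: techtalk-trigger             # hypothetical name
spec:
  broker: default
  filter:
    attributes:
      type: techtalk-event           # second hop: resurrects the Camel K service
  subscriber:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: camelk-processor         # hypothetical name
```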
So what we should see is a chain: when I push the event into the broker itself, the Quarkus service should kick off, the Quarkus service should process, and it should then generate an event back into the broker itself, which drives the creation, or rather the resurrection, of the Camel K one. It's a very pithy example, but I'll show you it running. So I'll emit the cloud event, and I've got to zip back to the topology very quickly.
You'll see, bang, the Quarkus service immediately fires up. So what's happening is the broker has actually seen that event and pushed it into the actual Quarkus processor; the Quarkus processor has pushed one out, which has gone into the Camel K. After a certain timeout, which is configurable, both of them will go away again. It's a very pithy example, but you can see how dynamic this system is, and that's the whole point behind this. Now, it does require that you're actually building experiments and doing stuff like that.
It is a work in progress, but there is a GitHub repo where I've got basically a nice little white paper on what's going on, and all the kinds of bits and pieces you need to do it, including the YAML for creating the Infinispan data grids and my first feeble attempts at writing the Quarkus processors.
But for me, this is the next generation of programming. This is the next generation of development, and it just feels delightfully elegant. That was pretty much it. I'm not sure if we're allowed questions or not; I've managed to stay just within time, but I hope that was useful. I will be around for the rest of the day.