Cloud Foundry CF on K8s Forum, 20 Sep 2022

From YouTube: CF on Kubernetes Working Group Forum 20 Sep 2022

Description

The call was a demo of a POC by Steven Taylor from SAP integrating Cloud Foundry with the new Korifi tool.

A

Hello, good morning, or good afternoon or evening, wherever you may be.

A

All right, as usual, I dropped the link to the document in the chat. I'll do that again in just a little bit. We'll give it a couple of minutes for anyone else to join. If you would like to add any topics, please do so. If you cannot add anything to the topic list, then I can add it for you.

A

Whatever works.

C

Hi, good morning. Hello, Stephen Taylor here from SAP.

A

Glad to have you.

C

Thank you, thanks for having me.

A

All right, maybe just another minute to fill in any topics.

A

And then, as usual, you can add them as needed if something occurs to you.

A

Alrighty, so thank you, everyone that's here, and welcome to the Cloud Foundry on Kubernetes Working Group Forum meeting. There's a document of topics that is going to be posted in chat once again for anyone who would like to add to it, view it, and follow along, so I've just dropped that link in the chat again. If you'd like to add a topic, please do so. Otherwise we're going to get rolling and we'll see where we are from that point.

A

So it looks like right now we just have the one topic, but it should be interesting, which is a CF plus Korifi POC.

B

My colleague decided to give Korifi a try a couple of days ago and he came up with a POC. It looks awesome, so I'm glad to hear from him today. So Stephen, please, the floor is yours.

C

Sure, all right. Let me share this.

C

Tell me if you can see my screen. Yeah, you see that?

B

Yep, I can see it.

C

So I'll give you a bit of background on what we were doing, why I stumbled on Korifi, and how it helps me solve some of my problems. And this is extremely alpha, by the way. I mean, this is beyond wet-paint kind of quality. So don't read anything into the good, the bad, the ugly of it.

C

It was literally hacked together over the past couple of weeks, because I needed to solve a problem and I wanted to see it in a way that highlights some of the benefits of what Korifi is offering, because I think it's just not clear enough yet to a lot of people. When I stumbled upon it, it solved some of the things that I was actively looking into. So once again, this is an SAP hacked-together thing; don't read anything into it.

C

It may or may not ever go anywhere, but as I said, I'll fill you in on the background. It all really started with a problem that we had several months ago, actually going into late last year, with rate limiting on Cloud Foundry. Now, there are many ways that we saw we could skin this cat.

C

But realistically, using applications to limit call rates, or using the Gorouters and services and so forth, just added more pressure onto an already heavily loaded landscape. So what we were looking at is this:

C

If we've got external callers coming in, could we rate limit that externally using something that was a little more dynamic than HAProxy? HAProxy is configuration-driven, versus things like Istio, where we have inbound rules and routes and so forth. Can we then start limiting on particular URLs, and do it very dynamically? So what I did is I built up a landscape.

C

Well, this was actually a very, very large test landscape that we had up in SAP for another POC, which I took over. The question was really: could I wrap this thing in Istio and prove that we could do things like rate limiting? So I could take a particular service, put a rate limit around a particular call, and say that when that call comes into HAProxy, or in this case directly to the Gorouters, we would then apply a rate to it. And it worked. It proved that I could front the clusters with things like Istio, that I'd get more of the intelligence that I needed in front of the cluster, and that I could then add things. As I said, rate limiting was one of them.

C

There could have been transformations; it could have been anything else. So then I looked at that and said, well, maybe I could do a very similar thing where I would essentially move an application to that top cluster and use Istio rules to route based on a particular set of criteria. So I could take an application, move it to this applications cluster, and then call the same domain and have it route seamlessly under the covers.

C

It's a relatively straightforward thing. It seems pretty simple and pretty logical, but nobody had done it. So, as I said, I was heading down this route before the Log4j bug kicked in at the end of last year. That put a brake on everything that we had done, and this cluster had to be terminated because it was potentially a compromised cluster. So we put that on hold for a while, and we thought, well, okay, we'll revisit this as we start to look for different use cases around it.

C

And I picked it up again a few weeks ago, when I was starting to think: what if I were to create some kind of API layer that would sit in front of this applications cluster, that looked like a Cloud Foundry domain but was backed by Kubernetes?

C

So I was looking through a lot of your websites and blogs and bits and pieces, and I stumbled across, I think, another effort to rewrite the Cloud Controller in Go, and in fact that was done by some guys within SAP.

C

So I looked at that, and then I think buried down in one comment was this thing called Korifi. As I said, what I was really looking at, the goal of this particular project, was to allow transparent movement between landscapes. I wanted a very frictionless developer experience, because I didn't want the developer to have to go and relearn new tools.

C

I didn't want them to have to go and learn new paradigms. That means breaking CI/CD pipelines; it means breaking everything. But I also wanted to modernize this on Kubernetes and Istio.

C

So, of course, the question came up: why couldn't I just use separate clusters? Why did I need to do some magic with routing? Well, separate clusters just add more complexity. When I have different URL endpoints, I can't change that seamlessly under the covers, and it gets very complex. I'm not a big fan of separate URLs and separate clusters and having different tenants live within different clusters. I prefer to have more of a generic endpoint.

C

Then I can say: call this endpoint and I will route that for you. This is one of the reasons why I went with Istio in the first place. You have a very simple API endpoint, you call that API endpoint, I do the routing for you, and if anything goes wrong on my side, I know how to route you around it.

C

So this is where Korifi came in. As I said, it solved one of my problems, which was that I needed an API that wrapped this landscape and that looked and felt like a Cloud Foundry cluster.

C

And you also backed it with the modern paradigms of Kubernetes. So you could create an application, deploy an application, and it would look and act like a Cloud Foundry application, and under the covers I could then use it for things like Istio routing. I could start to build extra applications that attach to that within the Kubernetes clusters. I'm not stuck in one or the other. So it gives us that flexibility of saying it is a Cloud Foundry application.

C

A developer doesn't need to go and change everything they do. They understand that paradigm and, as I said, SAP partners and customers have spent a long time understanding Cloud Foundry. To then go and tell them to rewrite their entire application suite to run on Kubernetes, or on a new platform, is an extremely daunting task for them, and it could break anything or everything in between.

C

So if I can keep that facade the same and let them seamlessly migrate over time to a new paradigm, it saves me a bunch of headaches. One of the analogies I like to use is when Apple moved processors and chips. You don't really have a major paradigm shift where everybody has to go and do a whole bunch of different things. Apple creates the tools and the facades around your application to let you repackage and deploy your application exactly the same way.

C

It just runs on a new chipset. There's nothing that says this is a whole new direction: go and learn a whole new GUI, all the tooling is completely different, go ahead and spend the next six to 12 months trying to migrate your applications. I wanted it to be almost a one-day switch for a developer.

C

So this is how I technically laid out the landscape that I'm running. I'm running multiple clusters, and because we have Solo's Gloo Gateway and Gloo Mesh, I'm able to federate Kubernetes clusters and Istio deployments.

C

So what I have in this one is a gateway cluster, a lab cluster, and a management cluster, and of course the BOSH deployment. This is all running on VCR, by the way. So the BOSH cluster is running the Cloud Foundry deployment, which is a basic deployment, but it is a full deployment.

C

What I am doing in the lab cluster is this: because Korifi was failing a few times when I first set it up, I ended up running it under vcluster, because I could tear it down and rebuild it in 10 minutes, versus tearing down a full Kubernetes cluster, rebuilding it, setting up new IP addresses, and setting up all of the configuration again. So I streamlined the deployment of the Korifi cluster down into a simple set of steps.

C

It gives me what I need, albeit not super performant, but it does exactly what I need it to do right now. So what I have done is, once again, install a bunch of Solo enterprise agents within each one of these clusters.

C

That helps me federate configuration down the road. So when I change something or deploy a particular route within the Korifi cluster, it will propagate over to the management cluster and get picked up by other pieces.

C

So, the main pieces that I have set up: I didn't change your Contour because of the complexity of it, because you've built quite heavily around Contour. What I have done is I still expose it as a service endpoint with a load balancer IP address.

C

Within the inbound gateway, I use Wasm filters to change some of the authentication information, and then based on that context switch I will actually route to either one of these clusters.

C

So if I'm talking to the Contour cluster, to the Korifi cluster, I will switch those credentials from the bearer token to the client certificate, and then I will reroute you to Contour. And this only routes based on the API URL, not any of the login or the UAA or any other one, but very specifically for the single API.

C

What I do have is this tenant mapping service, which is running on my laptop, on my PC. What it does: Wasm will call out to this tenant mapper service with a particular ID. I unpack the JWT.

C

If the JWT ID says it's user X, I will then do that token switch for you. Also, when I bring up an application on Korifi, I have a cluster update service which is looking for your CF routes, which get propagated, and that will go and update the actual Kubernetes cluster Istio deployment, as well as update Route 53. And the reason I need to do that is because of the SNI routing.

C

So if a particular route gets created, I need to create a DNS entry, which points to a CNAME entry, and a route has to be added in Gloo as well. That gets sent through Contour because of, once again, your SNI routing: Contour won't take it if it doesn't present, during the TLS handshake, the actual route it wants to go to. So it's a little bit of a pain, but once again, it does work. The Cloud Foundry one is just a wildcard domain.

C

So unless it's like login dot Stephen Taylor dot net, or UAA, or API, or anything else, it will just default everything to that BOSH cluster. The way I set it up inside the routing tables within the gateway is that everything by default will go to BOSH and to Cloud Foundry; only where I've set specific rules or I'm doing dynamic routing will it route to either one of the clusters. Does that make sense?
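The default-plus-override routing described here can be sketched in a few lines of Python. This is only an illustration of the idea, not the gateway configuration used in the demo: the hostnames, the cluster labels, and the `pick_cluster` function are all hypothetical.

```python
# Hypothetical names throughout: the hosts and cluster labels are illustrative.
DEFAULT_CLUSTER = "cloud-foundry"  # the wildcard domain falls through to CF on BOSH


def pick_cluster(host, specific_rules):
    """Return the cluster label a request host should be routed to.

    specific_rules maps exact hostnames to a cluster label, standing in for
    the per-route rules held by the gateway. Any host without a rule goes to
    the default cluster, mirroring a wildcard-domain catch-all.
    """
    return specific_rules.get(host.lower(), DEFAULT_CLUSTER)


if __name__ == "__main__":
    rules = {"my-app.apps.example.com": "korifi"}
    for host in ("my-app.apps.example.com", "login.example.com"):
        print(host, "->", pick_cluster(host, rules))
```

Keeping the catch-all as the default means a missing or stale rule sends traffic to the existing Cloud Foundry deployment rather than failing.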

C

So, as I said, here is what I had proven, at least through the login sequence.

C

During a standard login event, it will go to the gateway. The gateway sees there's no authorization header present and will forward that to Cloud Foundry. Cloud Foundry will do a standard login and return that back to the CLI. The CLI will then follow the login sequence, which works, and then it will go down and say, all right, give me the organizations. But at this point in time there is an authorization header.

C

The gateway Wasm picks that up and calls out to this token exchange service. The token exchange service looks up the user ID and sends back a new token, which I will then substitute into the outgoing header, and then call out to the Korifi cluster. Korifi responds as though you're talking directly to it, and the CLI doesn't know any different.

C

All it's seeing is a bunch of API endpoints coming back to it, and it's able to follow the standard process. It doesn't see anything different; there's nothing that seems to be broken. This also works with cf push and any other subsequent calls, even on a token exchange. If the CLI's token has expired, it once again calls into the login service and gets a new token. So you get a refresh token.
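The gateway-side decision in this flow can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Wasm filter from the demo: the `TENANT_MAP` table stands in for the external tenant mapping service, the `user_id` claim name and the `x-target-cluster` header are assumptions, and the JWT payload is decoded without verifying its signature, which a real gateway should not do.

```python
import base64
import json

# Hypothetical stand-in for the tenant mapping service: user ID -> target
# cluster and replacement credential. All values are illustrative.
TENANT_MAP = {
    "user-x": {"cluster": "korifi", "credential": "korifi-credential-for-user-x"},
}
DEFAULT_CLUSTER = "cloud-foundry"


def jwt_claims(token):
    """Decode the payload of a JWT WITHOUT verifying the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a three-part JWT")
    payload = parts[1] + "=" * (-len(parts[1]) % 4)  # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def route_request(headers):
    """Return (cluster, outgoing_headers) for an inbound API request."""
    auth = headers.get("Authorization", "")
    if auth[:7].lower() != "bearer ":
        # No bearer token yet (for example during login): pass through to CF.
        return DEFAULT_CLUSTER, dict(headers)
    try:
        claims = jwt_claims(auth[7:].strip())
    except ValueError:
        return DEFAULT_CLUSTER, dict(headers)
    mapping = TENANT_MAP.get(claims.get("user_id"))  # assumed claim name
    if mapping is None:
        return DEFAULT_CLUSTER, dict(headers)
    outgoing = dict(headers)
    outgoing["Authorization"] = "Bearer " + mapping["credential"]  # token switch
    outgoing["x-target-cluster"] = mapping["cluster"]  # hypothetical routing header
    return mapping["cluster"], outgoing
```

Requests without a mapped user fall back to Cloud Foundry unchanged, which matches the behavior described for the login sequence and for users who have not been moved.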

C

Is this all clear to you guys? Any questions so far?

C

I know I'm going quite quick on that.

A

No, this is great, and we can always review the recording later for the details. So thank you.

C

So what came out of it? Once again, you can log in to a standard CF cluster. There's nothing that says it's special or has to do anything different. Tokens are exchanged transparently. The Wasm filter was a bit of work, but it does its thing. It's not performant, but it does what it needs to do.

C

It routes correctly to the correct cluster, and every subsequent command that the CLI sends over the wire will be routed based on that particular header. Apps, well, they shouldn't technically know the difference if Korifi has done its job correctly. So as long as I've got the information within the apps, it should work. The developers shouldn't see any of the routing, and it should be very seamless for the developer to continue their current workflows and current processes.

C

That is without having to do a whole bunch of extra work. So far, apps deploy and act as normal. I'm still yet to get some of the cross routing working; I have to create an application that will cross some of these routes. And once again, Cloud Foundry and the CLI are oblivious to any of the routing; it's done before they even know what's going on. So I can just show you a quick demo.

C

Let me show you this one. I can get rid of that.

C

So here is... let me see if I put this in the script. So, if I log in... can you see my screen?

B

Yeah, we do.

C

So once again, if I just do a standard login...

C

Of course now it doesn't want to work.

C

Okay, so it just did a standard login to the Cloud Foundry application.

C

Okay, so as you can see, this is now talking to Cloud Foundry. There was an authorization call, but it didn't do anything with it. As you can see, my application is taking a call from a Wasm filter. So if I do a list in verbose mode (sorry, it's a bit up and down right now), you will see that Wasm is being called.

C

But, as I said, it's talking to Cloud Foundry right now; it's not switched over. But if I do a log out and log in

C

as a different user.

C

We've now switched over to the Korifi cluster.

C

And there is a token exchange going on in the background, which is why it's much slower.

C

And now you have a bunch of different apps which are sitting in a different cluster; this is on the Korifi cluster. Now, if I turn off the Wasm and do the same call, no applications are found, because now it's talking to Cloud Foundry again. It doesn't know any different. If I turn it back on (and once again, I could completely change this out),

C

it will go off and call the Korifi cluster again.

C

And once again, if you're looking at the actual applications, this is the one running on Cloud Foundry. These are the Korifi ones.

C

Because they're on the same domain, they should technically be able to call each other. But, as I said, it's an interesting point to do some routing on later. And as you can see, within Gloo Mesh, within the gateway, I have a bunch of different rules based on imported applications, for example. I create a destination for a particular endpoint.

C

Then that will be added into a route saying that certain hosts coming in route to a particular destination, which happens to be your Contour. Through that they do the SNI handshake for TLS, and that gives me, once again, the ability to traverse that. Now, what would have been interesting is if I could have bypassed the whole SNI thing and just pushed it to a back-end TLS, or directly into the actual routing service in Korifi.

C

Then I could have done almost east-west routing from one cluster to the next.

C

As I said, I get the tokens in, I do a lookup on the user ID, and send back two things: one is the header, and one is just saying this is the cluster that I'm going to. And then based on routes, there's a header rule that says reroute to a particular destination based on that header.

B

So how do you know which one is which? I guess based on the user, because you know how you map them, and based on the state? Because the output is indistinguishable, right? It's behaving the same way.

C

So it's purely done on the user ID in the JWT. So once again, I will just take the user ID that I have from the decoded JWT. I will then replace the header here, and then set the particular cluster. If you want to look at the Wasm, I mean...

B

I guess my question was more about the user. I guess the user will have to know how these mappings are set up, because the output is indistinguishable; it's seamless, right? And what happens, because you're sharing the same domain, if you push the same app? Would that result in the same route? Do you get some different subdomains?

B

Have you thought about that at all? I haven't thought about subdomains yet, no. Yeah, okay. I guess it's a feature gap, because obviously they're going to somehow clash with each other right now. It could do.

C

Yeah, right now, as I said, it's hacked together. The Wasm filter is really the core of the work of doing that routing, along with being able, once again, within Wasm, to call out to an external service to make a better judgment of what to do next. Right now it's brute force, based on a particular user ID. But because I know what the inbound call is, I know all of the information about what you're trying to do, I can actually make a better decision further down the road: I know the URL, I know the subdomain, I know potentially subpaths, and can reroute based on that.

C

That said, this was very brute force to get it to work, but it seems to be possible.

B

Okay, cool, thanks.

C

As you can see, a simple cf apps gets routed differently based on the route, and it seems to work as expected.

C

There are probably a billion different use cases that may fail, but I think that's just something we'd be discovering as we dig through some of this stuff. That's also not handling things like custom domains and routes and so forth.

C

So that's kind of it as far as the POC goes. Any questions?

B

I guess we would be interested in your overall experience and any kind of feedback. One obvious thing would be that you'd probably like to put Istio in there instead of Contour. How would that change the picture? How much simpler would that become if we could just swap that?

C

Yeah. The goal in the original picture was to replace HAProxy with Istio, and that would let me route directly to the Gorouters, and then I could almost build in east-west routing directly from one app to Cloud Foundry. Not to an application, but at least from, let's say, a Kubernetes-deployed Korifi application to a Cloud Foundry cluster.

C

Now, I don't really know what that means right now, because we haven't dug deep enough into it, but it definitely looks like it has possibilities to build more intelligence around inter-cluster routing.

C

From, let's say, one application: could I deploy half an application on Cloud Foundry and the other half on Korifi? I don't know; it's possible. How would the routing look? That's a very good question. Service discovery is a little more tricky. How do you populate services within Istio that are deployed on Cloud Foundry, and vice versa? I don't know yet.

B

But I guess it will be easier for you to do this if you swap Contour for Istio.

C

Likely it would be, because it works within the same... let's say it speaks the same language. Then I don't need to have complexities such as DNS entries. I can just do pure routing and, based on that, I could go almost directly to a service. That said, I haven't looked into how deep that would need to be. But yes.

C

My experience so far with Korifi: look, it's starting to solve some of the problems that are interesting as far as routing goes, as far as building a cloud-native application goes, and as far as, once again, being able to deploy my legacy application to Korifi and then modernize around it over time.

C

As I said, I like the Cloud Foundry deployment and development experience. I think it's much easier for a developer to understand how to do a Cloud Foundry application versus pure Kubernetes and Istio. For a lot of developers, outside of guys that really want to dig into the details, I think having a developer experience that is as simple as a cf push makes a whole bunch of sense. It just needs to be modernized.

C

I think of the underlying pieces that we could have, especially if you look at things like telemetry, canary rollouts, failovers, HA, and geo-distribution.

C

All these things could be built into the Istio layer, versus the application trying to do what it needs to do.

A

Obviously, understandably, live demos are always fun, because everything that can go wrong will go wrong, but

C

For some reason, Korifi is in the background trying to do something; I don't know what it's doing. I have a Harbor Docker registry set up and, for whatever reason, it's in the background pulling images left, right, and crazy and terminating things.

C

Don't know why.

A

Yeah, regardless, we got to see it in action, which was really cool.

C

But yes, it does work and, as I said, we're going to show this at least to some of the SAP folks to get more feedback. As I said, I think there are just far better ways than producing tools to try and migrate people off.

A

Yeah, I'll definitely want to take another look at it later when the recording is published, but awesome demo. Thank you for that. Is there anyone else that wants to pipe up and say something? You have the floor at this point.

A

I also don't see any other topics on the agenda at this point. So if there isn't anything else that people want to talk about, let's make sure everyone gets any stewing thoughts out of their brain on the cool POC demo and flow chart we got.

C

By the way, I'm in Palo Alto, if you guys are in my time zone. I know some of you guys are from VMware.

A

Yeah, we're kind of scattered across the globe, but there are a fair number of us that are in, if not the same time zone, adjacent time zones.

A

Stephen, there's still a bunch of us who are in the Bay Area, even so.

B

Yeah, I guess you can probably use the Slack channel. I don't know if we've invited you there yet, because we've been communicating via email. So I guess you can drop into the channel and ask stuff, propose things, open issues. Any kind of feedback is very welcome. We already created a story to look into the Istio stuff. I don't know exactly when we're going to pick it up, but it's on our list.

C

Yeah, I think also getting up and running to develop some of the code, at least to spin up a developer environment, is probably something that would be interesting to me now, at least to debug, because I can usually function offline most of the time. So I know there's the hacking guide, but

B

Yeah, we're using kind deployments for local development, debugging, and trying out stuff. We're constantly improving the install instructions and they're constantly breaking; I think we're in one of these loops now. A release is pending, so I guess pretty soon they'll be back to normal, but you can try them out.

B

If you have any problems, you can drop into the channel. I'll make sure to send you some sort of a link or invite on how these things work. So you can give it a go if you like.

C

Yeah, as I said, if I can at least get debugging put together... I'm not going to delve into writing a bunch of code, but I want to be able to delve into it and debug this thing when it breaks. I've had to go and patch the CF CLI to give me some of the things I needed, at least to view the redacted variables that I needed to see.

A

I'm curious about your original use case, if you could speak to that. You said the original impetus of this was rate limiting. Could you talk a little bit about that, and why rate limiting was a requirement?

C

We had a lot of applications that were beating Cloud Foundry applications to death.

C

And I wanted to insulate the cluster from those applications. They were out of cluster. And once again, putting in a rate limiting service was doable, but that created a double hop through the Gorouters, which doubled the calls and put even more load on the Gorouters than was needed. So I wanted to insulate that completely, almost like a denial-of-service thing. Now, I could have done that in HAProxy, of course, and there was a brute-force way of doing it.

C

But what I liked about Istio was that I could then do inspection on who's doing the call, and I could dynamically change those rates. I'm not going to say you've got 3,000 calls per minute allocated; instead I could say, right, I'm seeing that you're calling me, but in a burst, and I could smooth that burst out. So I needed the more dynamic capabilities of rate limiting.

C

In fact, we've seen a lot of applications, especially new ones coming in on new clusters and on cloud-native Kubernetes deployments, looking at the more dynamic rate limiting, and not from Istio itself, but from the Solo distribution that we have, because we have a partnership with Solo. So we're using their rate limiting, and teams are actively looking at rate limiting.
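One common way to get a rate that can be changed on the fly while still absorbing short bursts is a token bucket. The sketch below is a generic Python illustration of that technique, not the rate limiter from Istio or the Solo distribution mentioned in the talk; the class and method names are hypothetical, and time is passed in explicitly so the behavior is easy to follow.

```python
class TokenBucket:
    """Token-bucket limiter with a rate that can be adjusted at runtime."""

    def __init__(self, rate_per_sec, burst):
        self.rate = float(rate_per_sec)  # tokens added per second
        self.burst = float(burst)        # maximum tokens that can accumulate
        self.tokens = float(burst)       # start with a full bucket
        self.last = 0.0                  # timestamp of the last refill

    def _refill(self, now):
        # Add tokens for the elapsed time, capped at the burst size.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now

    def set_rate(self, rate_per_sec, now):
        """Change the refill rate; tokens earned so far use the old rate."""
        self._refill(now)
        self.rate = float(rate_per_sec)

    def allow(self, now):
        """Return True and spend one token if the call is within the limit."""
        self._refill(now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

For example, `TokenBucket(50, 10)` corresponds to roughly 3,000 calls per minute with bursts of up to 10 calls, and `set_rate` could be driven by whatever inspects the caller.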

C

So, unfortunately, it seems dead simple to everybody, but it just turns out to be complex.

A

Yeah, I'm guessing from the shape of your test deployment that you haven't actually tested this thing under load at all.

C

Right, what I've got right now is definitely not going to handle load. As I said, that's something we can definitely look into over time. I think the Kubernetes clusters can handle it, but, as I said, with the transformation and so forth, the Wasm stuff doesn't work under load.

A

Well, if there are no other thoughts on this at the moment, and there are no other topics, we can probably wrap it up. I'd say keep this line of thought open and keep discussing it. We have the Slack channels and the repos you can drop issues on, and all sorts of stuff, so we'd love to have that engagement and see what's going on and how people are using it.

A

So I want to thank you for putting in all that effort to create a set of slides and the demo and all that. It's just awesome. So thank you.

C

If you guys have got questions, this is stephen.taylor at sap.com.

A

Awesome. Well, it looks like I might be able to give you all about 20 minutes back, which would be great for everyone, depending on where you are and what you're doing. So unless there are any pressing concerns that someone wants to raise their hand for, I'm going to say thank you all so much for coming out for this. This is cool.

A

I love seeing demos. So if you have ones you want to do in the future, please come out and do it. Otherwise, I wish you all a great couple of weeks, and we'll meet up again in a bit. Awesome, thanks, everyone. Thank you.
