Description
OpenShift Commons Briefing on Strimzi Operator with Tom Bentley (Red Hat) Operator SIG Meeting July 2018
Meeting Notes: https://gist.github.com/dmueller2001/502b1237f783e2062ed5c93b7d7174da
B
Okay, hi folks. Hopefully you can see my screen. Yep, start presentation mode. There we go. Okay, so I'm going to be talking a little bit about Strimzi, which is about running Apache Kafka on Kubernetes and OpenShift. If anyone doesn't know what Apache Kafka is, it's basically a highly scalable messaging system. It came out of LinkedIn quite a few years ago now, and being a messaging system it is quite stateful; for that reason it's something which is quite well suited to using operators.
B
So the Strimzi project is about making that as simple as possible, so that you can just use Kafka in Kubernetes or OpenShift without necessarily having to know all the operational details; the operators embed some of that logic. We've actually got two operators in Strimzi at the moment. The first one is called the cluster operator, and what it does is take a custom resource which represents the Kafka cluster and deploy a corresponding Kafka cluster and a ZooKeeper cluster, which Kafka requires.
B
So the managed resources that the cluster operator is responsible for are a StatefulSet each for ZooKeeper and Kafka, and then some services so that the ZooKeeper cluster and the Kafka cluster can communicate amongst themselves, to provide client access to the Kafka cluster, and for Kafka to access ZooKeeper. We need persistent volume claims because, as I said, these things are stateful, particularly ZooKeeper.
B
It's
important
that
if
a
pod
gets
restarted,
it
has
access
to
the
same
data
that
it
had
before,
because
it
won't
be
able
to
recover
that
from
the
rest
of
the
zookeeper
nodes
in
the
zookeeper
cluster
for
Kefka.
It's
not
quite
that
bad,
because
kefka's
built-in
replication
is
able
to
recover,
but
it
can
take
quite
a
long
time.
So
it's
worthwhile
using
a
persistent
volume
claim
in
order
to
minimize
the
amount
of
time
it
spends
fetching
information
from
the
the
rest
of
the
cluster.
B
So the other operator we've got is what we call the topic operator. Once you've got this Kafka cluster running inside Kubernetes, the next thing you obviously want to do is start using it, and you will want to be able to deploy your Kafka applications inside Kubernetes in order to use it. It would be kind of nice if, as part of that same deployment, you could have a custom resource representing a topic within Kafka, so that you can deploy your topic and configure it at the same time as you're deploying the rest of your application, without having to have extra manual steps or extra scripting in order to set up the topics that your application needs. So that's the rationale behind the topic operator. And this is all well and good, but there's a bit of a problem, which I'll get onto in a minute.
B
Hang on, I'll just talk about this slide first. So this is what our custom resource looks like. We started off using ConfigMaps, actually, to represent the topics, identifying them with a label so that we knew which ConfigMaps represented Kafka topics, and this custom resource is the direction we're now going in, which I've been working on this last week.
B
So,
as
you
can
see,
topic
is
sort
of
a
relatively
simple
thing,
whereas
go
got
built-in
partitioning,
so
we
have
to
say
the
number
of
partitions
in
our
topic
and
we
also
get
to
say
the
number
of
replicas
in
our
topic
and
there's
a
bunch
of
different
configuration
options.
So
there's
about
a
dozen
or
so
different
topic,
configs
that
you
can
set
there.
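As a rough sketch of what that custom resource carries, the essential fields can be modelled as a plain Python dict. This is illustrative only, not Strimzi's exact schema; the field and config names shown are standard Kafka topic settings, but the validation helper is an invented stand-in.

```python
# Illustrative model of a topic resource's spec: partitions, replicas,
# and a handful of the dozen-or-so topic-level configs Kafka supports.

def validate_topic(spec):
    """Check the basic invariants a topic spec must satisfy."""
    if spec["partitions"] < 1:
        raise ValueError("a topic needs at least one partition")
    if spec["replicas"] < 1:
        raise ValueError("a topic needs at least one replica")
    return spec

my_topic = validate_topic({
    "name": "my-topic",
    "partitions": 12,   # Kafka's built-in partitioning
    "replicas": 3,      # replication factor
    "config": {
        # two common topic-level configs shown; many more exist
        "retention.ms": "604800000",
        "segment.bytes": "1073741824",
    },
})
```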
B
So
that's
what
a
the
topics
customers
also
looks
like,
but
the
problem
that
we've
got
is
that
in
Kafka
there's
an
API
for
creating
topics,
applications
can
create
topics
and
so
and
also
there's,
depending
on
how
the
broker
is
configured.
If
you
try
and
consume
from
or
produce
to
a
topic
that
doesn't
exist,
it
will
get
created
automatically
for
you,
which
means
the
scope
for
things
to
get
out
of
sync.
B
So
we
wanted
to
make
this
the
synchronization
work
properly,
which
meant
that
we
would
have
to
create
topics
when
topics
were
created
in
in
Kafka
would
have
to
create
them
in
kubernetes
or
openshift.
So
we
ended
up
having
to
contemplate
having
a
topic
operator
that
would
synchronize
the
topic
information
in
both
directions.
So
we
bidirectional,
in
other
words,.
B
If
it's
going
to
be
by
direction,
we
have
to
consider
what's
going
to
happen
if
the
two
ends
change
the
same
topic
at
the
same
time.
So
on
the
left.
Here,
we've
got
a
topic
as
it
might
exist
as
a
customer
source
inside
kubernetes,
and
someone
might
change
something
there
and
at
the
same
time,
and
they
might
get
changed
on
the
Kafka
side
and
we
don't
want
the
state
to
get
all
out
of
kilter
and
end
up
with
a
mess
where
the
the
resource
in
kubernetes
doesn't
accurately
reflect
the
topic
in
Kafka.
B
So you might think that the chance of that is pretty small, but the window is probably bigger than you realize, because we've got to consider that the topic operator might not always be running. If it crashes and restarts, then the window will be as big as the time that it's down, and if it gets undeployed and then later redeployed, then again the window could be quite a lot bigger. So to do it properly we've got to deal with this inconvenient fact that both ends can change at the same time. If you think about it, if the operator starts up and it sees the state on the left and the state on the right here, it can't tell what's changed: it doesn't see the things which I've highlighted in bold. It just sees some set of properties and some other set of properties, and it can't tell which end has changed since it last tried to reconcile. So to solve that we use a three-way diff.
B
Basically, we have our own copy of the topic state, which we treat as the source of truth. Whenever we reconcile the custom resource, we compare it with that private copy, that private state, and likewise, when we're looking at the topic in Kafka, we compare it with our private state. From that we can figure out how each end has changed, so how the Kubernetes resource changed or the Kafka topic, and then we can check, first of all, that they've not changed incompatibly in the same place.
B
So
if
the
number
of
partitions,
for
instance,
has
been
increased
to
a
different
number
on
each
side,
then
obviously
that's
an
incompatible
change,
and
in
order
to
deal
with
that,
we
have
to
apply
some
sort
of
policy
over
which
side
should
be
the
winner.
But
if
those
two
differences
are
not
are
disjoint,
so
they
have
an
empty
intersection,
then
we
can
construct
the
union
of
those
two
differences
and
apply
that
both
sides
and
that
will
reconcile
the
changes
which
happened
at
both
sides.
So
in
this
case
the
Union
looks
like
this.
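The three-way diff described above can be sketched in a few lines. This is a minimal illustration of the idea, not the operator's actual (Java) implementation: each side is diffed against the operator's private copy, and if the two diffs touch disjoint keys, their union is applied to both sides.

```python
def diff(private, side):
    """Keys whose value on `side` differs from the private copy."""
    return {k: v for k, v in side.items() if private.get(k) != v}

def three_way_merge(private, k8s, kafka):
    d_k8s, d_kafka = diff(private, k8s), diff(private, kafka)
    conflicts = set(d_k8s) & set(d_kafka)
    if conflicts:
        # Both ends changed the same property: a policy must pick a winner.
        raise ValueError(f"incompatible changes to {sorted(conflicts)}")
    # Disjoint diffs: the union reconciles both sides.
    return {**private, **d_k8s, **d_kafka}

private = {"partitions": 12, "retention.ms": "604800000"}
k8s     = {"partitions": 18, "retention.ms": "604800000"}  # changed in Kubernetes
kafka   = {"partitions": 12, "retention.ms": "86400000"}   # changed in Kafka
merged = three_way_merge(private, k8s, kafka)
# merged combines both changes, so both sides end up in the same state
```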
B
The number of partitions has changed to 18, and this config property has changed, and in that way both sides end up having the same state. So obviously this is not the normal way that you would write an operator. The first thing to say, really, is that you only want to make your operator bidirectional if you absolutely have to, because it does increase the logic rather a lot, and obviously the simpler your operator is, the more likely it is to actually work properly.
B
Another consequence of this is that it makes the operator stateful, whereas a normal unidirectional operator is pretty much stateless: it's just consuming the resource on the Kubernetes side and then trying to make other resources match that. With it being stateful, we have to worry about the availability of that persistent state. For us it made sense to use ZooKeeper, because it's required by Kafka anyway, and therefore if ZooKeeper is not available we wouldn't be able to reconcile the topic anyway.
B
But
if
you
were
having
to
contemplate
something
like
this
yourself,
then
you
would
have
to
have
some
highly
available
persistent
store
available
to
you
to
be
able
to
do
that
properly.
And
finally,
this
is
a
little
bullet
point
here.
Is
you
always
want
to
make
sure
that
you
update
your
private
store
of
date
last
so
that
if
your
operator
happened
to
crash
before
that
gets
updated,
but
you've
updated
one
on
the
other
side,
then,
when
your
operator
restarts
it
would
correctly
synchronize
this
to
date.
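The update ordering just described can be sketched as follows. This is a toy illustration under my own naming, with in-memory stand-ins for the Kafka admin client, the Kubernetes API, and the private store: write both external sides first and the private copy last, so a crash mid-reconcile leaves a state the next run's three-way diff can still repair.

```python
class Store(dict):
    """Stand-in for a real client (Kafka, Kubernetes API, or private store)."""
    def write(self, state):
        self.clear()
        self.update(state)

def reconcile(merged, kafka, k8s, private):
    kafka.write(merged)    # 1. external side A
    k8s.write(merged)      # 2. external side B
    private.write(merged)  # 3. private copy LAST: if we crash before this
                           #    line, the next startup still sees the old
                           #    private state and re-applies the diff.

kafka, k8s, private = Store(), Store(), Store({"partitions": 12})
reconcile({"partitions": 18}, kafka, k8s, private)
```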
B
Right then, that's the end of the slides. I'm hoping to do a demo as well now. I've already got the cluster operator up and running, so you can see we've got a bunch of different pods going on there. We've got the cluster operator itself at the bottom here, and we've got a cluster of three ZooKeeper nodes and three Kafka nodes, and I've also got a...
B
This end has got the five, and the number of bytes in the segment ends in seven here, and at the other end they end with 30 and 02. So what I've done is I've prepared it already so that both ends are slightly different, and the topic operator, as we saw when I got the pods, isn't running yet. So I'll now start the topic operator, and hopefully we'll see that both sides match up. Just so that you know, it was this one which changed on the Kafka side.
B
So yeah, it's started up there, but it's not yet ready, so we'll wait for that to become ready. Now, hopefully, I can get the one side, which is showing the five, and get the other end, which is showing the value ending in 30. So it's reconciled both ends when it started up. I think that's pretty much all I've got to show you. Has anyone got any questions?
C
B
So that's not something which the operator actually deals with right now; I haven't put a lot of thought into that. I mean, obviously you're unlikely to have the situation where the same topic is created and deleted indefinitely; it will eventually end in one state or the other, and the operator should deal with that, but in the meantime I imagine it would keep going. But I must admit that's not something I've thought through fully yet. Okay.
B
I mean, certainly in the cluster operator we use a locking strategy so that we're only processing a cluster with one thread at any one time. Basically, although you can have multiple clusters all being reconciled simultaneously, there's only one thread which is operating on any particular cluster. Yeah, okay.
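The per-cluster locking strategy mentioned here can be illustrated with a small sketch. This is my own illustrative code, not the cluster operator's actual implementation (which is Java): one lock per cluster name, so different clusters reconcile in parallel while any single cluster is handled by at most one thread at a time.

```python
import threading
from collections import defaultdict

_locks = defaultdict(threading.Lock)   # one lock per cluster name

def reconcile_cluster(name, work):
    with _locks[name]:   # serialize work on this cluster...
        work(name)       # ...while other clusters proceed concurrently

# Usage: two clusters reconciled by separate threads.
seen = []
t1 = threading.Thread(target=reconcile_cluster, args=("cluster-a", seen.append))
t2 = threading.Thread(target=reconcile_cluster, args=("cluster-b", seen.append))
t1.start(); t2.start()
t1.join(); t2.join()
```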
C
D
Was
gonna
put
to
the
group
just
around?
Does
anybody
else
have
a
need
in
an
operator
or
can
think
of
a
situation
where
you
want
to
have
this
kind
of
bi-directional
traffic?
Moving
where
you
know
you're
gonna
use
some
like
the
kubernetes
api
is
a
creative
CRD
that
might
drive
something
in
the
operator,
but
then
also
have
to
consume
things.
That
would
create
resources
of
that
same
type
from
a
different
API.
A
Yeah, I have a use case like that. We're gonna be building an operator to work against Keycloak, which is an authentication server. I want to allow the developers to define which realms and users and clients they want to consume in their application, and have the operator set those up and drop them into secrets or the like for the users who consume them. But they could potentially log in to that Keycloak and change something about one of those things, which would then mean the two ends would need to sync.
A
For instance, that's probably another situation where you could have this two-way sync. We are kind of going the simple route, where we're saying that Kubernetes is the system of record if you're gonna use the operator, and basically always sync from there. So if they make a change, it'll get replaced, but obviously we document that. Yeah, yeah.
B
If
you
are
in
the
position
to
say
that
you
know
the
kubernetes
is
the
way
and
the
only
way
to
do
it,
then
that's
definitely
the
way
to
go
much
easier
to
have
a
policy
like
that.
We
couldn't
because
of
Africa
streams,
for
instance.
The
way
it
creates
topics
dynamically
just
meant
that
we
can't
turn
around
to
people
and
say:
oh
we're,
you
shouldn't
be
using
kakhovka
streams
because
that's
you
know
one
of
the
main
sort
of
selling
points
of
Kafka
so
yeah.
We
felt
that
we
couldn't
really
take
that
approach.
D
To the previous question: when you can use Kubernetes as your source of truth, you get all the atomic operations that you get from the Kubernetes API, which are really, you know, the etcd primitives being pulled through. You can do those compare-and-swaps and, like, you know, only have one person operating on an object at a time, which is pretty nice.
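The compare-and-swap behaviour mentioned here is what Kubernetes exposes through an object's resourceVersion: an update only succeeds if the version the writer read is still current. A toy in-memory sketch of that idea (my own model, not the Kubernetes client API):

```python
class ConflictError(Exception):
    pass

class VersionedObject:
    """Toy model of optimistic concurrency via a version counter."""
    def __init__(self, data):
        self.data, self.version = data, 1

    def read(self):
        return dict(self.data), self.version

    def update(self, data, seen_version):
        if seen_version != self.version:     # someone wrote in between
            raise ConflictError("stale version, re-read and retry")
        self.data, self.version = data, self.version + 1

obj = VersionedObject({"partitions": 12})
data, v = obj.read()
obj.update({"partitions": 18}, v)        # succeeds and bumps the version
try:
    obj.update({"partitions": 24}, v)    # stale version: rejected
except ConflictError:
    pass                                  # caller would re-read and retry
```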
E
I had a quick question; I think I just heard this. So the topic or stream resources: do you provide all the information to access them? Is that correct? Like a secret, credentials, etc.?
B
E
This is based on the topic on the SDK mailing list that we had, which is: if your resource exposes a secret for connecting to it, for example, and the URL or whatever is needed to talk to it (maybe it's just the name of the service and the port), then what would be really exciting for me is being able to actually just prototype using your Kafka operator with my metering operator. So I just want to know what kind of support there would be for that.
B
Nothing that's on the immediate roadmap, but we do want to add support for subresources in our CRD eventually, so you'll be able to scale the Kafka cluster using the kubectl command, and stuff like that, and likewise have status information so that you can easily consume the endpoint from other things. I think that's what you're asking about. Yeah.
E
Basically, because what I would like to see in the long term is where, if I enable something like auto-connect to Kafka streams, my operator would be able to discover these streams and create database tables, and then fill the data in those tables with the content of the streams. And actually I have very specific use cases in mind for this: like, if there's going to be an operator sending logs to Kafka, I could start analyzing those logs, and that would be, I think, an extremely interesting use case for certain things.
B
That's something we definitely want to get to. I mean, that's sort of one of the benefits of having the resource: it makes it all much more Kubernetes-native, and then, you know, obviously other software, such as your own, is able to consume these resources as well and discover them. So yeah, that's definitely one of the benefits of this approach.