Description
What is CDC? Why is it getting so popular? Get your espresso ready for the OpenShift.TV Coffee Break, as our special guests Hans-Peter Grahsl and Chris Cranford introduce us to Debezium, an open-source project that implements the Change Data Capture pattern with your favorite database and data streaming solutions.
A: Coffee break this morning at 10 A.M. European time; it looks like it's 3 A.M. in the night before for some of our guests, so a big shout-out to Chris. We're going to introduce all the guests in a few moments. Let me please introduce myself: my name is Natale Vinto, I'm a product marketing manager here at OpenShift.TV, and I'm together with my fantastic co-host, Fabio.

All right, all right. For the people that don't know what OpenShift.TV Coffee Break is: it's a weekly show where we talk about OpenShift, Kubernetes, and cloud native on a web TV. This is kind of an EMEA-time-zone show, so I'm very happy to have our super special guests today. Let me first introduce our special guest from another time zone, which is Chris. Hey Chris, how are you?

D: I'm doing fine. How are you guys?

A: Fine, fine, thank you for joining us so early in the morning; special kudos to you for that. And Chris is together with Hans-Peter. Hey Hans-Peter, how are you?
C: Hi, I'm fine, I'm fine. For me it's a pretty good time in the morning, so everything's good.
D: I can go first. Like you said, my name is Chris Cranford, I'm the lead engineer on the Debezium project here at Red Hat. I've been with Red Hat for almost seven years, and most of that time has been spent working on Debezium and change data capture. So it's really nice to be here, and I hope to be able to share with you what Debezium is, why it's important, and how it's really shaping the CDC landscape in a really positive way.
C: Yeah, and hello, my name is Hans-Peter. I joined Red Hat recently, basically pretty much three months ago, as a developer advocate. I'm working in Natale's team with lots of great other colleagues.

My relation to Debezium started a couple of years ago already. Back then I was also starting to get involved with Kafka and the whole ecosystem and things like that, and from then on, I think it's pretty much about five years now since I looked into Debezium for the first time, and I'm really happy that we can join this show and share it with the rest of the community.
A: Fantastic, thank you Chris, and thank you Hans-Peter, for the introduction, because today's topic is really Debezium and change data capture, which is a pattern, right? We noticed in the IT community and in the market that this pattern is gaining momentum. That's why we invited you: please explain to us why this momentum exists and what this pattern really is. But before we go into the technical part, Fabio, I would like to say hello to our YouTube and Twitch attendees, the people who are watching us. Thank you.

Can you please write in the chat where you are from, where you are attending from? Because this is a show that reaches many countries. Let us know in the chat if you want, and please, if you have any question about Debezium, change data capture, and everything related, send the question in the chat; we're going to bring the questions to the speakers today.
C: Yeah, sure, let's start into that, so we can dive right in. We brought a couple of slides, of course, that should help you on the stream a little bit. Debezium, I think, is best described as an open-source change data capture platform, and it basically supports a family of different databases; we come to that in the next couple of slides. It is used to actually read from a database's transaction log.

It has a couple of nice features. It allows you to do, for instance, snapshotting, meaning that when you initially start to work with your data and you want to expose it to other services, you can take a so-called snapshot, and this brings all the existing data in your database into different systems. We come to how you can do that a bit later, because there are different ways to do it; most often it's done by propagating these changes through Kafka. You can also filter the data.

Since not too long ago it also has a really nice web-based UI that you can use to configure the different types of source connectors that you want to run. It's fully open source, it has a very active community, and I think Chris in particular can add a couple more notes here regarding the different types of really large production deployments that they see.
D: Yeah, sure. From the community side, just to point this out: we have over 400 individual contributors to the upstream project already, which is really, really amazing.

So we do have that vibrant community. A lot of people are constantly engaging with one another through the mailing list, through our chat, and other avenues. So a huge shout-out to our community for helping support us through this project and helping us add new features, do regression testing, and so much of that stuff. That's really, really great. In terms of large production deployments: they have been using it now for about, I believe, six to eight months, since when it was going through its incubation period, and things like this. We also have other big, large deployments. I know there's a discussion going on with some customers in Italy, and we have some customers in India that are also looking to use Debezium. So it's really touching the entire global market, with lots and lots of banks, postal services, and retail vendors making use of the project, trying to scale it out and take advantage of what we offer through CDC.

Yeah, and then I guess I can talk about this. In terms of connectors with Debezium, the biggest thing here is that each of these databases has its own format; they have their own way in which we have to interact with the database to get these changes, and so it means that, from a business perspective, we have to integrate with each of these in their own way.

So we have a host of database-vendor-specific connectors to be able to integrate with all of these database vendors. But despite needing a vendor-specific connector, we do have this uniform connector framework that we're able to use to leverage a lot of the features Hans-Peter talked about previously, things like snapshotting and filtering, and that's really the benefit here.

As we work with these different databases, whether it be Oracle, Cassandra, or MySQL, the connector framework that Debezium brings to the table allows us to focus specifically on ingesting the change events using the APIs that each database requires us to integrate with. We'll touch on this a little bit later, but it's not only the connectors themselves that make this really important.
C: No, that's fine. And I think what we should definitely touch upon here are the two fundamentally different ways to do change data capture and integrate with databases, or data stores in general. On that slide we tried to summarize that a bit, because at one point you could think: well, why do I need that? Why can I not just leverage a query-based approach? Basically meaning you find a way to come up with a SQL statement that allows you to check what has changed since you last queried a particular table; that would be the query-based approach. But when you think about it in a little bit more detail, this approach has several limitations.
C: Actually, when you compare it to what Debezium does, namely integrating with the transaction log of a data store: with query-based change data capture, you would basically poll your database, and particular tables in that database, at certain intervals. First of all, that means you have a certain overhead, because, like any other client, you're executing different statements, usually SELECT statements, against certain tables at certain intervals. So that's definitely overhead. What it also means is that you cannot really capture all the changes, and there are two things I think to highlight here.

One is: think about it, maybe something changed and changed again within your polling interval. Then you might lose one of those changes, because you haven't seen it. Why? Because it happened during the poll interval. Another thing that's usually hard to do is to capture deletions, because how would you easily query for something that has been there once but isn't there any longer? Also, very often you want to understand not only how your data looks right now, but what the previous state was before a particular change, and that is also something that you cannot directly do with the query-based approach.

So essentially, all those things that are hard or even impossible to get right with a query-based approach become possible when you directly interact with the transaction log of a data store, and this is exactly what all the Debezium source connectors that we mentioned earlier for those different types of data stores are doing; they can do all of these things.
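For readers who want to make the contrast concrete, here is a rough, self-contained sketch of the query-based polling approach Hans-Peter describes, written in plain JDBC. The connection URL, the customers table, and the updated_at column are purely illustrative; this is the naive alternative being criticized, not how Debezium itself works.

```java
// Naive query-based CDC: poll for rows whose application-maintained "updated_at"
// column is newer than the last value we saw. Intermediate changes inside the
// polling interval are invisible, and deletes never show up at all.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;
import java.time.Instant;

public class NaivePollingCdc {
    public static void main(String[] args) throws Exception {
        Timestamp lastSeen = Timestamp.from(Instant.EPOCH);
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/inventory", "user", "password")) {
            while (true) {
                try (PreparedStatement ps = conn.prepareStatement(
                        "SELECT id, name, updated_at FROM customers "
                                + "WHERE updated_at > ? ORDER BY updated_at")) {
                    ps.setTimestamp(1, lastSeen);
                    try (ResultSet rs = ps.executeQuery()) {
                        while (rs.next()) {
                            System.out.printf("changed: id=%d name=%s%n",
                                    rs.getLong("id"), rs.getString("name"));
                            lastSeen = rs.getTimestamp("updated_at");
                        }
                    }
                }
                Thread.sleep(5_000); // polling interval: anything that changed twice in here is lost
            }
        }
    }
}
```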
D: To point out one thing here about transparency: what's really important, at least for me, is that by using Debezium, and actually using the transaction logs to do this log-based CDC approach, it prevents you from needing to manipulate your data model. You don't want those two things to necessarily be integrated, because, going back to what you said, if you want to do this polling interval, you're going to have to have something in your data model that you use to know which records have changed from this timestamp to that timestamp. When you start mixing those concerns, it creates this really unnecessary coupling that, at least for me, I really dislike, and I think most people would agree in this regard.

So being able to use a log-based approach gives you the ability not only to capture all of those individual changes, as you spoke about, but also to let your data model focus on what it's meant to support in your business application, without mixing in concerns that only exist for change data capture.
B: Yeah Chris, what you are telling is, I think, a great concept in the world of microservices, where loosely coupled integrations are key. Keeping the communication between microservices without any need to couple to a specific data model is great, because it allows me, if I want, to apply all the patterns related to deploying as soon as possible, with the lowest impact on the other surrounding microservices that that specific one is going to be integrated with.
D: Not only are you trying to keep your data model separate from a microservice perspective, but you're also trying to make sure that that data model isn't being leaked from one service to another. Using the outbox pattern, which we'll talk about later, is a really good way of preventing this, by having a particular model that you want to use for your eventing components and that you can morph independently from your data model itself.
C: Right, yeah. So remember, we talked about the fact that there are different ways to deploy, or in fact to use, Debezium, and I think, correct me if I'm wrong, Chris, the most common way of doing it in the wild is still by using it in the original way that was supported from the very early days of Debezium, namely running it within Kafka Connect.

So you have these source connectors for Kafka Connect that are, again, specifically written for these different types of data stores, and you run them within a Kafka Connect environment. Those changes are then propagated into Kafka topics. What you see there is that you essentially have a one-to-one relation between a table in a database and a topic in Kafka, where you find all the changes that happened to that particular table.
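To give a feel for what "running Debezium within Kafka Connect" means in practice, here is a hedged sketch of registering a MySQL source connector through the Kafka Connect REST API from Java. The host names, credentials, and table names are made up, and exact property names vary between Debezium versions (newer releases, for instance, use topic.prefix rather than database.server.name).

```java
// Register a Debezium MySQL source connector with a Kafka Connect worker's REST API.
// All connection details below are placeholders for illustration only.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterConnector {
    public static void main(String[] args) throws Exception {
        String connector = """
            {
              "name": "inventory-connector",
              "config": {
                "connector.class": "io.debezium.connector.mysql.MySqlConnector",
                "database.hostname": "mysql",
                "database.port": "3306",
                "database.user": "debezium",
                "database.password": "dbz",
                "database.server.id": "184054",
                "database.server.name": "dbserver1",
                "table.include.list": "inventory.customers"
              }
            }
            """;
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:8083/connectors"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(connector))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

With a configuration along these lines, changes to the customers table would typically land on a topic named after the server and table, for example dbserver1.inventory.customers, matching the one-table-to-one-topic relation described above.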
C: Once the data is in Kafka, you can work with different types of sink connectors and feed any of these change data events into your target system of choice, and here we've listed just a couple of the pretty popular ones. Think about bringing certain changes into Elasticsearch, where you might want to build some kind of full-text indexing solution, or think about using Infinispan as a cache: you can then employ those change events to update your cache, or also to initially warm it up. Or, in general, use any kind of JDBC sink connector to propagate the changes further into data warehousing infrastructure and things like that. Again, you capture inserts, updates, and deletes, and they are propagated in exactly that way, using Kafka topics in this case. But that's, of course, not the only way to use Debezium.
D: Correct. So, like you said, we've talked about Kafka Connect, and you're right: ninety percent of our users are using Debezium within Kafka Connect. But with the embedded engine, this is pretty much taking Debezium as a library, and it allows you to embed the Debezium engine itself directly inside your application. You may have specific requirements as to why you might want to do this; for example, you might need to interact with those change events at a very, very low level. A good example of this is the Apache Flink CDC connectors: they actually use the embedded engine inside their framework to capture those change events, manipulate them, and emit them through their framework programmatically.

But then there's also another component called Debezium Server, and this is using the embedded engine, but in a ready-made runtime environment. It's very similar to Kafka Connect, it's meant to fulfill the same role, but with a slightly different focus: it's really designed to send these change events out to other messaging infrastructures. For example, we have adapters for Apache Pulsar, Amazon Kinesis, Redis, Google Pub/Sub; we even have one for Kubernetes Knative Eventing. So you can really harness the power of the Debezium source connectors and integrate with pretty much any messaging platform, which is really, really cool, because at the end of the day we want to bring Debezium to the largest audience possible, and having these other ways of running Debezium has been extremely crucial. The Debezium Server implementation itself, in particular, has become quite popular as of late, as more users are finding that it opens up these additional alternatives for working with other messaging infrastructures if you happen not to be a Kafka shop, for example.
C: Right, yeah. And just as a hint: not too long ago Chris showcased essentially this Debezium embedded engine in one of the recent Quarkus Insights episodes, so there's the link to that if you want to look into it. The whole session is basically a live coding session where he nicely shows what it would take to use the Debezium embedded engine in a Quarkus application. It's a really nice way of seeing how this actually works.
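For anyone curious what the library usage looks like outside of that Quarkus session, here is a minimal sketch built on Debezium's embedded engine API (io.debezium.engine). The connector properties are placeholders, they would need to match your database and Debezium version, and this is a sketch of the idea rather than production-ready code.

```java
// Run a Debezium connector in-process and hand every change event to a callback,
// instead of publishing to Kafka. Connection settings below are illustrative.
import io.debezium.engine.ChangeEvent;
import io.debezium.engine.DebeziumEngine;
import io.debezium.engine.format.Json;

import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class EmbeddedEngineSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("name", "embedded-engine");
        props.setProperty("connector.class", "io.debezium.connector.postgresql.PostgresConnector");
        props.setProperty("offset.storage", "org.apache.kafka.connect.storage.FileOffsetBackingStore");
        props.setProperty("offset.storage.file.filename", "/tmp/offsets.dat");
        props.setProperty("offset.flush.interval.ms", "10000");
        props.setProperty("database.hostname", "localhost");
        props.setProperty("database.port", "5432");
        props.setProperty("database.user", "postgres");
        props.setProperty("database.password", "postgres");
        props.setProperty("database.dbname", "inventory");
        props.setProperty("topic.prefix", "app");

        DebeziumEngine<ChangeEvent<String, String>> engine = DebeziumEngine.create(Json.class)
                .using(props)
                .notifying(record ->
                        System.out.println("key=" + record.key() + " value=" + record.value()))
                .build();

        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.execute(engine); // runs until the engine is closed
    }
}
```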
C: Essentially, you include it as a dependency in your project and then you are ready to go: you can start to use it, interact with the change events, and do literally anything you want with them in your application. With that, so far we were mostly talking about the technical foundation, what it basically is and how this log-based change data capture approach works. But besides the technical foundation itself, it gets interesting when you start thinking about the different ways you can use change data capture for various use cases in the wild.

Simple cases are replicating data between two systems, but again, we mentioned: okay, why do you do that? Maybe you feed search indexes; you want to update the search indexes of a full-text search engine, maybe in Solr or in Elasticsearch or wherever. Maybe you have a cache: you need to warm up the cache, or you need to invalidate the cache due to all of these changes that are happening elsewhere in your data store. Those are all very valid use cases that become possible in a pretty straightforward way, I would say, thanks to this really nice idea of change data capture.

Also, in microservices architectures, very often you need to somehow communicate changes happening in one particular service to other services, and here, as we will also see later and as Chris already mentioned, there is one particular so-called CDC pattern that helps a lot in that regard, namely the outbox pattern, and then there are a couple of others. So basically, you can also see CDC as this giant enabler, as it says here: it sets free your data that was originally locked away in a database, where you could not easily get it out and probably had a tough time, because think about all those legacy systems that you might not be able to touch. Maybe you don't have the source code, maybe you are not allowed to touch the source code for that system and it's just running, but you still want to build new functionality, new services, around it. This is how you can tap into all the changes that are happening in the database of such a legacy application and propagate them to any downstream consumer that you want. Maybe a few insights here: maybe Chris wants to explain a little bit how the actual payloads that Debezium generates basically look.
D: Sure, absolutely. For each of our Debezium events, if you think about this from a relational table perspective, there is a key and then there's some payload that comes along with each of these events. Generally we map the key that's associated with these events to the table's primary key, but of course this is configurable by the user, so if you have a set of other columns that you want to map the key to, you can certainly do that as well.

But then, as part of this event, we also provide you with two really key and important components. The first component that we want to talk about is the before and the after state. If you think about this again from the perspective of a relational table, you have your columns and you have your values; the before basically represents what the state of that row was before the change took place, and the after, of course, represents what the state of that row is after the change occurred. In the case of, say, an insert event, you wouldn't have a before state, but you would have an after state; and then, of course, in the case of a delete, you would have a before state and no after state. We can go on to the next slide, and on this one we can see a couple of other components that are part of the message payload.
D: The first here is the source block. This is really, really important, because it provides you a lot of metadata about where we sourced this particular event from. It can contain things such as the database, the schema of the table, maybe even the transaction reference; in the case of something like Oracle, it's going to give you even deeper metadata, such as which particular node in the Oracle RAC cluster the event took place on. So you can use this information as you're going through and scraping the events inside your Kafka topic: you could cross-reference every one of those events back to the transaction log on the database if you wanted to.

Then we also capture things like the operation type: is it an insert, is it an update, is it a delete, or is it something that represents a read event from the initial snapshot? And, of course, things like when we processed this particular event. All of these payloads can be serialized to Apache Kafka, or to any other messaging platform, using a number of different serialization techniques, whether it be JSON or Avro. That's really entirely up to you, because this entire process is completely configurable, and that's really the key: at the end of the day, a lot of Debezium users are not programmers, they are maybe data analysts, or maybe they're a little bit tech savvy, so Debezium is designed to be driven entirely by configuration.
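As a hand-written illustration of the envelope Chris is describing (simplified, not actual connector output), a consumer reading the JSON form of an event could inspect the op, before, after, and source fields roughly like this:

```java
// Inspect a (simplified, hand-written) Debezium change event envelope with Jackson.
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class ChangeEventEnvelope {
    public static void main(String[] args) throws Exception {
        String json = """
            {
              "payload": {
                "op": "u",
                "before": { "id": 1001, "email": "old@example.com" },
                "after":  { "id": 1001, "email": "new@example.com" },
                "source": { "connector": "mysql", "db": "inventory", "table": "customers" },
                "ts_ms": 1670000000000
              }
            }
            """;
        JsonNode payload = new ObjectMapper().readTree(json).get("payload");
        switch (payload.get("op").asText()) {
            case "c", "r" -> System.out.println("insert or snapshot read: " + payload.get("after"));
            case "u" -> System.out.println("update: " + payload.get("before") + " -> " + payload.get("after"));
            case "d" -> System.out.println("delete of: " + payload.get("before"));
            default -> System.out.println("other event type");
        }
    }
}
```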
D: We also support emitting our change events using CloudEvents. If you're not familiar with it, CloudEvents is a standard for emitting events in a particular type of format, and all of our Debezium events can be used with the CloudEvents converter, configured so that those events are emitted using this specific format. If you notice, it follows a very similar technique: there is an identifier at the top of the event, and there are a number of different metadata key-value pairs included as well. They're all prefixed with "iodebezium", followed by some additional descriptor that describes what the value represents, such as whether it's the schema or the table. If you go back and look at the previous slide, you'll see that a lot of these attributes map back to the source info block. So a lot of that information is carried over, but carried over in a way that allows the consumer of that cloud event to have all of the information we had even if you weren't using CloudEvents.
D: It could be, yes. And, again, going back to what we were talking about earlier with Debezium Server and having Knative Eventing as a sink option: Debezium Server really enables this pattern very fully, because at that point you can run Debezium inside of a pod, capture those events, convert them to CloudEvents, and then emit them directly into Kubernetes using Knative Eventing as well.

And then, coming back to the cloud event, we have a data section, and this section very much mirrors what you would see if you were looking at this inside a Kafka event: we have a section with a schema that describes what our payload structure looks like, what all the fields are and their respective data types, and then, of course, the payload field itself.
D: You know, we've been quite at work over the last number of major iterations, really trying to bring a ton of new features to the CDC platform. One of the biggest new features that we added is a feature called incremental snapshots, which we introduced in Debezium 1.6. The idea with incremental snapshots is that they allow you to take a snapshot of your existing data concurrently, while the connector is streaming changes from the transaction logs. This also has the benefit that the snapshot is resumable. So let's say, for example, you're running your connector, you're incrementally snapshotting your data, you're capturing changes from the transaction log, and you need to shut your database down for a moment to make some changes and then bring it back up: that snapshot is completely unaffected. Traditional snapshots in this particular scenario would require you to rerun the snapshot from the beginning, so that is one of the really huge advantages of using incremental snapshots with Debezium.

But we've also improved upon this process in recent versions, adding the ability to stop a running incremental snapshot, or to partially stop parts of it. Let's say you're snapshotting two tables and you want to stop snapshotting one of those but let the other continue: you now have that option. You can additionally choose to pause and resume those incremental snapshots, and there's also a new feature, added I believe in Debezium 1.9, that allows you to apply a predicate condition to the incremental snapshot process. So if you want to capture only a subset of the data from a table instead of all the data, again, you have that option and that flexibility.
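Incremental snapshots are usually driven through Debezium's signaling mechanism, so as a hedged sketch: an application can trigger one by inserting a row into the signaling table the connector is configured to watch. The table name, connection details, and captured table below are illustrative and would need to match your own setup.

```java
// Ask a running connector to start an incremental snapshot of one table by writing
// a signal row. "debezium_signal" and "inventory.customers" are placeholder names.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.UUID;

public class TriggerIncrementalSnapshot {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/inventory", "debezium", "dbz");
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO debezium_signal (id, type, data) VALUES (?, ?, ?)")) {
            ps.setString(1, UUID.randomUUID().toString());
            ps.setString(2, "execute-snapshot"); // related signal types cover stopping, pausing, resuming
            ps.setString(3, "{\"data-collections\": [\"inventory.customers\"]}");
            ps.executeUpdate();
        }
    }
}
```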
D: Some additional changes that we've added in recent versions include change stream support in the MongoDB connector. The MongoDB connector was originally written to read and tail the oplog of the database, its transaction log, and in recent versions we have moved away from this to using MongoDB's native change stream support. This provides so many more features than the traditional oplog approach did; for example, in recent versions, I believe Debezium 1.9 or 2.0, there is the ability to capture the full documents of the change. So there are tons of new features that came along with adding change stream support to the MongoDB connector.

Then, just to jump forward a little bit: in 1.9 we added multi-database support for SQL Server. This is really huge for any environment where you have a single installation and that single database installation supports multiple databases, as you have in SQL Server. In 2.0, Java 11 is now required, and we have also added a new feature for MongoDB 6 and later, which is being able to provide you with the state of the document before the change occurred. This is really starting to take MongoDB from just being able to tell you what has changed, or what the document looks like after the change occurred, toward the before-and-after state that the change events from the relational connectors give you.
C: Yeah, maybe the roadmap very briefly, because we then also have, like we said, two use cases that we want to show, and maybe also a quick demo. But I think it's interesting to learn what lies ahead for Debezium, in particular for next year and after.
D: Yeah, absolutely. Just to highlight a couple of really important ones here: we are looking at implementing our own JDBC sink connector next year, which is really cool. The idea is to be able to ingest raw Debezium events. If you've used other vendors' JDBC sink connectors in the past, you've had to flatten the Debezium events to be able to consume them; we're looking to provide a way of ingesting those events and writing them over JDBC to your database without having to do that flattening step.

We're also looking to bring MariaDB in as a first-class connector, and we're looking at implementing things like exactly-once semantics, which Kafka Connect and Kafka now support. We're planning to do this in a multi-phase approach next year, starting with a couple of connectors initially and then rolling it out to other connectors throughout the year. And then, of course, there are some improvements coming to our Debezium UI component: we want to allow you to do things like editing a connector and seeing some key critical metrics inside the UI, and so much more. There are a lot of advancements and features coming for the UI next year, so be sure to check that out and keep an eye on the release notes.
C: Sure. So with that, let's now talk about two selected CDC patterns, to see how you can make use of CDC: first something called the outbox pattern, and then afterwards we'll look into another pattern, called the Strangler fig pattern. But I think Chris will briefly start by explaining the outbox pattern a bit.
D: The idea is that you want to keep these microservices in their own silos. We want them to have their own database; we want them to be able to morph and evolve independently of the other microservices that might be participating in the topology. In order to do this, you're still going to need some kind of mechanism to share data between those microservices, and one reliable way of doing that is the outbox pattern itself.

If we go to the next slide, what we can see here is that we have this order service on the left, some other microservices on the right, and we're using Apache Kafka here in the middle to facilitate the communication. The idea is that we want to signal the shipment service to do something when we've done something with a particular order. Maybe when we save the order, we need to tell the shipment service: hey, now go ahead and ship this order for the customer. But we want to do this in a decoupled way. In this particular case, even if Kafka could participate in a distributed transaction, and some people may come to this and go, well, I'll just do a distributed transaction and facilitate this kind of orchestrated transaction across these services, you really don't want to do that in the grand scheme of things. And of course, you also don't want to have this kind of dual-write approach, because it's really prone to all kinds of inconsistencies.
D: Out of the box, the outbox adapter expects a number of particular columns to be present. This is somewhat configurable, but it allows you to identify and produce what's called an aggregate and to provide that information to those other consumers, whether it be a shipping service, a customer service or whatnot, in a really decoupled way.
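A minimal sketch of the application side of the pattern, assuming an outbox table with commonly used columns (id, aggregatetype, aggregateid, type, payload); the table names, columns, and values here are illustrative. The point is that the business row and the outbox row are written in one local transaction, so there is no dual write.

```java
// Write the business data and the outbox event atomically in the same transaction.
// Debezium then picks the outbox insert up from the transaction log.
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.UUID;

public class PlaceOrder {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/orderdb", "orderuser", "secret")) {
            conn.setAutoCommit(false);
            try (PreparedStatement order = conn.prepareStatement(
                         "INSERT INTO purchase_order (id, customer_id, total) VALUES (?, ?, ?)");
                 PreparedStatement outbox = conn.prepareStatement(
                         "INSERT INTO outboxevent (id, aggregatetype, aggregateid, type, payload) "
                                 + "VALUES (?, ?, ?, ?, ?::jsonb)")) {
                long orderId = 1L;
                order.setLong(1, orderId);
                order.setLong(2, 42L);
                order.setBigDecimal(3, new BigDecimal("59.98"));
                order.executeUpdate();

                outbox.setObject(1, UUID.randomUUID());
                outbox.setString(2, "Order");                // typically routed to an "Order" topic by the outbox SMT
                outbox.setString(3, Long.toString(orderId)); // typically becomes the Kafka message key
                outbox.setString(4, "OrderCreated");
                outbox.setString(5, "{\"orderId\": 1, \"customerId\": 42, \"total\": 59.98}");
                outbox.executeUpdate();

                conn.commit(); // both rows commit together, or neither does
            }
        }
    }
}
```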
D: Sure, I certainly can; let me move this over here.

First of all, I just want to point out, for those watching, that we do have an examples repository where this particular outbox example is sourced from. If you're ever interested in understanding how to use Debezium in a variety of different ways, there are tons of different implementations here, whether you want to integrate with Camel, integrate with a cache, or do some kind of Kafka Streams or KSQL functionality inside of Apache Kafka. All of that stuff is there, so feel free to check out our examples repository. But we're going to be taking a look at this outbox example here in particular. What I have done is I've gone ahead and started up my Docker Compose file, and we can take a look at it here; let me make it a little bigger. What we have in our Docker Compose is a couple of services: we're running ZooKeeper and Apache Kafka, and we have a couple of databases here.
D: We have one for our order database and one for our shipment database; this again mirrors the silo for the order service and the silo for our shipment service. Then we have Kafka Connect running, and here we're actually leveraging Strimzi in conjunction with Debezium to run them together inside that container. And then, finally, I have our order service and our shipment service.

The way I'm going to do this: I'm just going to register this JSON configuration with Kafka Connect, and if I scroll up here I can take a look at it. What you can see is that I'm connecting to my order database, I'm going to be monitoring, capturing changes from, this outbox event table, and I'm going to be routing those events using the outbox transformation, a single message transformation, into Apache Kafka.
D: If I pull up, I think I have it here somewhere, the image that we were just showing on the previous slide: what I have basically done at this point is set up my CDC with Debezium to monitor this outbox table, this component here, and to send those events into Apache Kafka here in the middle. At this particular point there's no data in my orders table, so I need to generate an order, and I'm going to do that by using the order service's RESTful API.

I'm just going to create an order here, and if I scroll up you can see that I have created an order with ID 1, with a couple of line items: I've decided to order two of the "Debezium in Action" books and one of the "Debezium for Dummies" books. Then let's say I made a mistake and I actually want to cancel one of those, so I'm going to cancel my order line for "Debezium for Dummies", but I'm going to keep my order for "Debezium in Action".
D: So let's jump over into our order database, just to confirm that all of our order information was written to the database, which we could see in the log above us, but just to be sure. We can see here that we have this order and we have two line items, and again we can see that the "Debezium for Dummies" line was canceled, while "Debezium in Action" is still in entered status. And if I take a look at my outbox table, we're going to see there's no data in this particular table, and that's expected. The reason it is expected is that, when you're working with the outbox pattern, we're really only ever interested in the insert events; we're not interested in updates to rows in this table, and we're not interested in deletes of rows in this table.

So typically, when you're using this pattern, you're going to write a record to this table and then immediately remove it again, just so that you don't need any kind of housekeeping on the outbox table over the lifecycle of your service. Now, some users may decide: well, I want to keep this data for debugging purposes. You certainly can do that and then purge the data at some future point or on some interval, but that's completely up to you; you can do it as you see fit.
D: It will have no impact on the actual delivery of those events using the outbox pattern with Debezium. Then, if I jump over into my shipment database, let's take a look at what we have over here: I can see now that I have notified the shipment service, and the shipment service sees that my order has been received here for customer one.

I can use kafkacat to look at my topic and see what events were received inside Kafka; again, these are the events being captured by Debezium, read from the outbox table, and sent into Kafka itself, and I can see that I have two events here.
D: It's all driven by configuration. It really keeps the business concerns of your order service completely separate from your shipment service. It allows you to evolve your data model in each of those services however you need to evolve it. It also allows you to define the message structure that you want to use in your outbox table to communicate between these services; that structure can be very, very different, and you can even evolve that structure over time.
B: Cool. Because when we speak about microservices and data management, usually there can be a lot of potentially dangerous integrations, because most of the time you end up doing strange stuff to keep everything up to date. This is a great way to keep things simple: it makes it really simple, but effective.
D: Exactly, and the beauty about this is that the payload, like I said, the payload that you're sending or communicating between these microservices, is really entirely user driven. So whatever you want this payload column to contain, whether you want to share the data between these microservices as an aggregate, whether you want to send individual events to represent rows, or whatever, it's really entirely up to you and your business requirements.

It does keep everything completely decoupled, and it really allows for the evolution of a specific concern without impacting another concern in your architecture. Absolutely.
A: Super cool. And Chris, I was wondering, looking at this demo as well: how was the world before change data capture, I mean before Debezium? How were people managing this?
D: I mean, it really depends. In a prior life we managed this using things like Microsoft BizTalk, which was a great tool for doing some of this kind of stuff, but it was a lot more complicated. People have also used things like Camel to do polling, to capture those events, and things like this. But then you look at other architectures, other applications: for example, in past lives I've worked with Oracle ERP, and in their scenario they do try to approach this from a perspective where your order entry environment is separate from your accounts receivable and accounts payable, right? Even though all the data is within the same database, they try to silo things as much as possible, and then they use things like interface tables, which is a very similar idea to the outbox pattern: you put data into a table, and you then have another process that's responsible for reading that table, ingesting it, manipulating it, and importing it into another subsystem inside the database, inside of that application. So really, this pattern is something we've been using for quite a long time.
C: Yeah, exactly. So, I'm looking at the time; I mean, we could quickly go through it, I think, our slides very quickly and also a demo. Let's see, maybe I just explain the basic idea and then we can probably look into a demo as well.
A: Yeah, yeah, no problem. I mean, we have eight minutes, but if we go some minutes beyond the time, it's not a problem.
C: So then let me explain it briefly on a couple of slides, and then I'll try to show you a demo for that as well. Okay, great. So, the Strangler fig pattern: we mentioned it earlier as the other pattern that we wanted to show you. Chris already touched upon microservice architectures a couple of times, and sometimes we don't start with a microservices architecture; sometimes we have legacy applications, we have monoliths, that we probably want to gradually evolve into a different architecture. Let's say we want to go from a monolithic application towards microservices, and the idea is that we want to do it step by step; usually we know from experience in the field that trying to do a big rewrite in one go very often just doesn't work out that well. That means, when you think about it, you want to extract different modules or components of your monolith into separate microservices.

You need to find a way for the monolith and the parts that have already been extracted to temporarily coexist, for however long it takes you to migrate the whole application, which can be many years in certain cases. So this coexistence idea is very, very important here as well. Here we have a very simple scenario: it's a monolithic application, and we assume that it is somewhat well structured, in the sense that there are some components, some modules, in that monolith; we don't assume it's a very bad, messy, big-ball-of-mud architecture.
C: The idea is that we want to take some of those modules, as they are called here, out into separate services. The way this works is that you first introduce a proxy that just routes all the read- and write-related traffic still to the monolith. At one point you then start to implement a microservice, but that microservice needs some data. Let's say we do it for the customer module here: you would then set up a Debezium change data capture pipeline that captures everything related to this customer module and propagates it further, over Kafka, into a separate database. Once you have that customer-related data there, thanks to CDC, you can start to implement the microservice that should work with that data. Once that is done, and let's say we also implement that microservice in multiple steps, so we just serve read requests to begin with, we reconfigure our proxy: read requests are now served by the microservice and are not directed to the monolith anymore.

Once we have that, we can also say: well, we want to support write scenarios, so changes to this customer module, now living in its separate microservice, should be written in the microservice. But at the same time that means the monolith, and some modules over there, might need those changes as well, and for that we would set up an additional change data capture pipeline in the opposite direction, so that we can bring the writes, the changes to the customer-related data, back into the old monolithic application.
C: So that is the idea, and you would do that for all those modules, one by one, like I said, stepwise, and at some point you can shut down individual modules or, when you are done with the whole migration, the whole monolith. I have a quick demo here as well; let me go to my command line. You should see that there are a couple of containers up and running here, and the idea is, let me go to the browser once more, because I have it open in a separate tab: there's a proxy.
C: This is nginx, which runs on localhost now, and that proxy will be used to do the redirection of traffic between the monolith and the microservice. Here we have deliberately chosen an old version of the Spring PetClinic as our monolithic application. What it does is, among other things, manage owner data, owners of different pets, and this is backed by a MySQL database. The first thing we want to do is take that owners view and build it in a separate microservice.

For that we briefly go back and reconfigure our proxy. After reconfiguring, nginx will do one thing: it will take us away from the monolith and redirect us to a microservice. In this case now, when I search for owners, I'm redirected to, in this case, a Quarkus microservice, and of course it doesn't find anything.
C: That's because it has no data yet, and this is what we are going to fix now: we are going to run a MySQL source connector for two tables here, for the owners and for the pets table, because we have seen that this view showed not only the owners' data but also the pets of those owners. So we configure that MySQL connector to listen to these two particular tables of the PetClinic application, and once it is in place we have one thing, namely the data is in Kafka, but it's nowhere else yet.

In addition to that, think about what we mentioned: usually we get the changes from tables into separate topics, so now we have an owners topic but also a pets topic, and we need to find a way to bring those together so that we can serve a read view that contains owner- and pet-related data.
C: So basically we need to join those two topics together, and for that, in the background, we have another Quarkus application running Kafka Streams. What it does is listen to those changes and join them. But first, let's see how the owners data looks in Kafka. We have seen this before: this is one record, one owner record from the MySQL database; it's from the initial snapshot, so we don't have a before state here. The pets data looks similar: we have here another topic which contains the different pets of these owners.

Then, like I said, we need to bring that together in flight, basically, and for that, in the background, we have a Kafka Streams application running, written with Quarkus, and it joins these together. What we then get is documents such as this: we have the owner-related data here, with all the owner fields from the database, and we have pets, which is an array.
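A simplified sketch of what such a Kafka Streams join can look like. It assumes the topics already carry flat JSON keyed by owner id, which glosses over the Debezium envelope and the serdes used in the real Quarkus application, and the topic and field names are made up.

```java
// Collect the pets per owner and join them with the owners table into one document,
// which is then written to the topic the sink connector reads from.
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;

import java.util.Properties;

public class OwnersWithPetsTopology {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        KTable<String, String> owners = builder.table("dbserver1.petclinic.owners");

        KTable<String, String> petsByOwner = builder.<String, String>stream("dbserver1.petclinic.pets")
                .selectKey((petId, pet) -> extractOwnerId(pet))
                .groupByKey()
                .aggregate(() -> "[]",
                        (ownerId, pet, agg) -> agg.equals("[]")
                                ? "[" + pet + "]"
                                : agg.substring(0, agg.length() - 1) + "," + pet + "]",
                        Materialized.with(Serdes.String(), Serdes.String()));

        owners.join(petsByOwner,
                        (owner, pets) -> "{\"owner\":" + owner + ",\"pets\":" + pets + "}")
                .toStream()
                .to("owners-with-pets");

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "owners-with-pets");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        new KafkaStreams(builder.build(), props).start();
    }

    // Naive helper for the sketch: pull the owner_id value out of the pet JSON.
    private static String extractOwnerId(String petJson) {
        int start = petJson.indexOf("\"owner_id\":") + "\"owner_id\":".length();
        int end = petJson.indexOf(',', start);
        if (end < 0) end = petJson.indexOf('}', start);
        return petJson.substring(start, end).trim();
    }
}
```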
C: Of course, an owner can have multiple pets. Now that we have this in Kafka, in the next step we can register a sink connector. In our case, our microservice written in Quarkus, which should take that data and serve the read requests, is running MongoDB, so we configure a MongoDB sink connector, and once we have done that, those pre-joined data documents make it over to the microservice. So now, when I refresh, I have that same view that we saw earlier in the Spring PetClinic monolith, here in my microservice, together with the pets' names.

Okay, so let's quickly check one thing: if we want to edit an owner, we are brought back by the proxy; it's a write request, so we are coming back to the Spring PetClinic. Let's make a change here.
C: Let's change the name and write that data; we have now written the change into the MySQL database, and in the background it is propagated into the Kafka topic. The Kafka Streams job will do that join for us and finally bring it over to the microservice. Let's just briefly check if the change made it over, and yes, you see it here: the change made it over into the microservice.

And since we're about to finish, because we are running out of time: I mean, the demo has a couple more things, but I think we are basically running over otherwise. Again, the idea is that we can do the same thing for the write path.
C: We would just reconfigure the proxy once more and set up a CDC pipeline in the other direction. Then we could change our owners data in the microservice and propagate it back into the monolith again: Debezium would capture the changes with a MongoDB source connector into Kafka, and then a JDBC sink connector writes them back into the monolith. Maybe one last thing that we wanted to touch upon, if we have time, let me know: probably Chris wants to quickly tell a little bit about how Debezium is integrated into other parts of Red Hat products.
D: Absolutely. So at Red Hat we do have a number of different products that are either currently in development, going through initial preview, or part of commercial offerings. Some of those include OpenShift Streams for Apache Kafka, known mostly internally, I believe, as RHOSAK. This is pretty much where we expose a fully compliant Kafka broker environment for users to interface with, so they can use any kind of connector to send events to and consume events from an Apache Kafka environment that's managed by Red Hat.

We also have the OpenShift Connectors environment, and our OpenShift Connectors solution actually uses the Debezium connectors, as well as the Camel K connectors, in order to capture changes from various different sources and to propagate those changes to a managed Apache Kafka environment running on Red Hat OpenShift Streams for Apache Kafka. And then, of course, there's Red Hat Integration: this is a stack built on AMQ Streams for integrating your applications across a hybrid infrastructure. It allows you to compose, orchestrate, and transform any kind of data, using CDC and a cloud-native platform, to support any type of modern application.
D: The outbox pattern is there; I don't believe the Strangler fig pattern is, but we could certainly look at adding that in the future. That shouldn't be a problem.
D: And you can use Strimzi to deploy Debezium with Apache Kafka Connect.
A: Wow, wow. So you can basically deploy Strimzi, the operator, and you are able to have Kafka and Debezium, so you can test all those examples, the outbox pattern, the Strangler fig pattern; you have some of the examples in the repository. And folks, today we've seen live demos, and Chris, a big shout-out to you for doing live demos early in the morning; that is super cool. Thank you.
A: Good. And Hans-Peter, your demo is super, super cool; I like it, also the bidirectional pipeline for the reads and the writes. This is a really cool, real example, right? So I'm looking forward to seeing this also in the examples directory, because it's a super practical demo to show. So folks, thank you all, thank you both for joining us today.
A: It was a pleasure, and we would like to have you back for any update on Debezium, whenever you want to come. And Fabio, our appointment today is on OpenShift.TV in the afternoon.
A: Please don't miss the "What's New on OpenShift" today on OpenShift.TV; you will find it in the official OpenShift.TV schedule. We will come back, Fabio, next week with an episode about Dynatrace with OpenShift; we're going to have some folks from Dynatrace. So see you next Wednesday, same time, and please stay connected to OpenShift.TV today for the What's New, so what's new in the next version of OpenShift. Thank you again.