From YouTube: OCB AMA: Scribe - Asynchronous Data Replication with John Strunk and Scott Creeley (Red Hat)
Description
Scribe is exciting for its unique, lightweight, and storage-agnostic data movement capabilities for any storage type, including file, block, and object. Scribe also supports all Kubernetes-based storage drivers, both CSI and non-CSI compliant. It takes advantage of best-of-breed industry data replication technologies, using rsync and rclone controlled by a single CR-based interface. Scribe also utilizes CSI capabilities like snapshots and volume clones if supported by the driver. Join this briefing for an introduction, demo, and live AMA session with the Scribe project leads!
https://github.com/backube/scribe
A: All right, everybody, happy Monday! Welcome back to OpenShift Commons. Today, as we like to do on Mondays, we have an upstream project with us, along with many of the team leaders, and we're going to make them tell us all about their project. If I get this right, it's asynchronous data replication, which is what Scribe does for us. We have John Strunk, Ryan Cook, Parul Singh, I see Scott Creeley somewhere in the background, and Guy Margalit. I'll let you introduce your team, John, and there's going to be some live-ish demos here.
B: Awesome, thank you, Diane. So yeah, we're here today to tell you a little bit about Scribe, and there's a number of folks working on the project right now. As Diane said, I'm John Strunk, and we've got Ryan, Parul, Guy, and Scott also helping out. So let's start with a quick overview of what we're going to cover today.
B: We're going to start with a few intro slides on what exactly Scribe is about, and then we've got three demos, because everybody likes demos. So we've got three demos queued up for you, and then we'll finish it off with a little bit of Q&A. Let's get to it.
B: So let's talk a little bit about your data. Kubernetes and GitOps work really well for stateless applications: if your pod crashes or you lose a node, Kubernetes is more than happy to reschedule your pods somewhere else in the cluster, and in the event that your cluster goes down, if you're managing your application and configuration via GitOps, you have all that information and it's easy enough to just re-apply it.
B: So what Scribe is, is a Kubernetes operator that is designed to do cross-cluster, asynchronous data replication, and it does this in a storage-system-independent way. You don't actually need the underlying storage system to support the data replication; we handle it all on top. One of the nice bits about that is that you're not forced to run the same storage system on all of your sites. So, for example, if one of your clusters is running in the cloud, you can use a storage system that is optimized for that.
B
Scribe
makes
use
of
csi
capabilities
of
clones
and
snapshots
if
they're
available.
So
we
use
that
in
order
to
create
point
in
time
copies
of
your
data
to
replicate.
But
if
your
storage
driver
storage
system
doesn't
support
it,
that's
okay,
we
can
still
copy
your
data
without
and
as
well.
Scribe
is
designed
around
an
extensible
architecture
so
that
if
the
storage
system
does
support
optimized
replication
natively,
it
could
also
be
integrated
with
scribe.
B: When we think about where we might want to use Scribe, probably the first thing that comes to mind is disaster recovery: replicating your application's data from a primary cluster to a secondary cluster. But it's also useful for data distribution scenarios.
B: It's also useful for, say, data migration within your cluster. If you want to swap out your storage system, maybe change vendors, that sort of thing, you could use Scribe to move your data that way, as well as for migrating data between cloud and on-prem environments. You can also use it for off-site analytics, or even just replicating your production data for dev and test scenarios.
B: Scribe is built around this notion of data movers. We have one data mover that is based on rsync, and that's really optimized for one-to-one volume relationships; for example, that asynchronous disaster recovery scenario, where you're trying to replicate a volume from a primary to a secondary cluster.
B: We are in the process of adding a third data mover that is based on restic, and that is to handle more archive-type use cases. We're working on adding metrics to the operator, so that it's easy to keep track of the current status of the replication relationships, and we're also working on adding some helper programs to make it a little bit easier to replicate data into and out of Kubernetes environments as a whole.
B: Right, because we realize that not everybody's IT environment is 100% Kube at this point. And then, finally, where to find us: you can go check out the documentation, it's over at scribe-replication on Read the Docs, or you can check out our code on GitHub. We'll put the slide back up again at the end. So that is the overview.
B
And
now
what
we're
going
to
do
is
we've
got
three
demos
that
are
keyed
up
for
you,
so
the
first
one
is
going
to
be
showing
off
the
rsync
based
data
mover
specifically
in
a
disaster
recovery
scenario.
It
kind
of
takes
a
look
behind
the
scenes
a
little
bit
about
kind
of
what
scribe
is
doing
in
the
background.
B: What we have is two different clusters: a primary site that is running an application with some data volume, and a secondary site. We want to replicate that data over to the secondary site so that we could move our application.
B
And
so
what
we're
going
to
do
in
the
demo
is
we're
going
to
create
a
custom
resource,
a
replication
source
over
on
the
primary
side
that
points
at
that
data
volume
to
replicate
and
then
on
the
secondary
side,
we're
going
to
create
a
replication
destination
that
provides
a
target
for
us
to
replicate
the
data
to
once
those
are
in
place.
What
scribe
is
going
to
do
is
create
a
data
pipeline
from
the
primary
site
over
to
the
secondary.
B: This is a quick demo of cross-cluster replication using Scribe. What we see here is that I have two clusters: a primary cluster in this first window running in us-west, and over here in the other window a secondary cluster running in us-east. We're going to use Scribe to ensure that our data is replicated, so that we can move an application between clusters if necessary. So let's start by taking a look at our primary cluster.
B
So
what
we
see
in
this
cluster
is
that
I
have
a
simple
wiki
application.
That's
running
and
here's
the
pod
for
the
wiki
and
the
data
is
residing
in
a
pvc
also
in
this
namespace,
and
what
we
need
to
do
is
replicate
this
data
over
to
our
secondary
cluster,
so
that
we
could
fail
over
our
application
if
necessary.
If
we
take
a
look
at
the
secondary
cluster,
what
we'll
see
over
here
is
that
we
have
the
same
application
deployed.
B
However,
it's
currently
scaled
down
to
zero
and
you'll
also
notice
that
it
doesn't
have
any
pvc
associated
with
it,
because
we
haven't
replicated
our
data
over
here.
Yet
so,
like
we
saw
earlier
in
the
slides,
let's
go
ahead
and
set
up
scribe
to
do
the
replication,
so
first
thing
that
we're
going
to
do
is
set
up
the
replication
destination
over
here
on
the
secondary
cluster.
B
When
we
take
a
look
at
the
scribe
cr,
what
we
see
is
that
we're
asking
it
to
create
a
10
gig
volume
for
the
incoming
data
and
then
with
each
sync
iteration
to
preserve
a
point
in
time,
image
via
snapshot.
So
let's
go
ahead
and
add
that
to
the
cluster,
now
that
we've
inserted
that
into
the
cluster,
let's
go
and
check
out
our
namespace
again.
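The destination CR described here could be sketched roughly like this. The resource and namespace names are illustrative (not shown in the demo), but the capacity, per-sync snapshot, and load-balancer endpoint follow what is described:

```yaml
# Hypothetical sketch of the ReplicationDestination described above;
# metadata names are illustrative, not taken from the demo.
apiVersion: scribe.backube/v1alpha1
kind: ReplicationDestination
metadata:
  name: wiki-destination
  namespace: wiki
spec:
  rsync:
    # Ask Scribe to provision a 10Gi PVC to receive the incoming data
    capacity: 10Gi
    accessModes: [ReadWriteOnce]
    # Preserve a point-in-time image via snapshot after each sync
    copyMethod: Snapshot
    # Expose an endpoint the source cluster can connect to
    serviceType: LoadBalancer
```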
B: Now what we see is: here's that ReplicationDestination that we just created, and the operator has taken that and is working on setting up the infrastructure necessary to accept the incoming transfers. The first thing that we see is that the operator has created a PVC in the namespace that's going to receive the incoming data. It has also set up a load balancer that's going to act as the endpoint for our source to eventually connect to, and in a minute this PVC will finish binding.
B: Now that that's done, let's take a look at the custom resource. What we see here is that Scribe has added to the status field the connection parameters necessary for us to configure our source so that it can transfer data to this location. That information consists of the address to connect to (that's our load balancer), and it has also given us a set of SSH keys that exist in a secret here in this namespace. We need to transfer that secret over to our primary cluster.
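The status fields being read here look roughly like the following; the address and the secret name are placeholders:

```yaml
# Illustrative status reported on the ReplicationDestination;
# the address and secret name below are placeholders.
status:
  rsync:
    # Endpoint of the load balancer the source will connect to
    address: a1b2c3d4.elb.example.com
    # Secret in this namespace holding the SSH keys; it must be
    # copied over to the source cluster
    sshKeys: scribe-rsync-dest-src-wiki-destination
```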
B: So what I'm going to do is save that secret out to a file, and now let's go over to our primary cluster and set up the ReplicationSource. Over here, the first thing we're going to do is insert that secret, and now let's edit the ReplicationSource CR. What this is going to do is define what data we need to replicate. The first thing we'll notice here is that we're specifying a source PVC to replicate: that is our wiki's PVC.
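The ReplicationSource being edited could look roughly like this, assuming the address and key secret obtained from the destination in the previous step; all names here are illustrative:

```yaml
# Hypothetical sketch of the ReplicationSource; names are illustrative.
apiVersion: scribe.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: wiki-source
  namespace: wiki
spec:
  # The wiki's data PVC to replicate
  sourcePVC: dokuwiki-pvc
  trigger:
    # Replicate on a schedule (every five minutes here)
    schedule: "*/5 * * * *"
  rsync:
    # Address from the destination's status field
    address: a1b2c3d4.elb.example.com
    # The SSH key secret copied over from the destination cluster
    sshKeys: scribe-rsync-dest-src-wiki-destination
    # Take a point-in-time copy before each transfer
    copyMethod: Snapshot
```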
B: In the meantime, let's take a look at our application. This is the endpoint for our application running on our primary cluster. Here we see it's just a simple wiki; we can come in and edit the data, and eventually those changes will get replicated over to the other side. So it looks like right now Scribe is in the process of replicating data over to the remote site. While that happens, let's go and see.
B: Okay, here we see that our next sync iteration has started, so Scribe has taken a snapshot of the application's volume, and we're currently waiting for that to be processed into a usable snapshot. Once that succeeds, we'll again get a new persistent volume that will be used by the rsync data mover to update the volume on the remote side. Okay, so our snapshot is ready to use.
B: There we go. Now, over here on the remote site, it has updated the volume snapshot to contain the most recent data. Let's go and take a look again at our ReplicationDestination, and again take a look at the status. What we see is this latestImage field, and that always tells us what the most recent volume snapshot is, so that if we want to spin up our application, we can do that.

So we're going to take the name of that volume snapshot, and we're going to add that to our Kustomize and use that to scale up our application over here on the secondary site. What happens with that Kustomize is: it goes and creates a new PVC from that latest snapshot, and it also scales up the wiki deployment.
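Restoring the PVC from the preserved snapshot is standard Kubernetes: the Kustomize step boils down to a PVC whose dataSource points at the snapshot named in the latestImage field. The PVC and snapshot names below are placeholders:

```yaml
# PVC restored from the latest replicated snapshot; the snapshot name
# is a placeholder for the value read from status.latestImage.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dokuwiki-pvc
  namespace: wiki
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 10Gi
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: scribe-dest-wiki-destination-snapshot  # from status.latestImage
```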
B: Here's the pod for it; again, we're waiting for that, but as soon as it becomes ready, we'll be able to head back over to our browser, and we should be able to see the edit that we made to the wiki back over on our primary site, except it'll be here on our secondary. Okay, our pod is ready, so I'm going to copy the address here for our secondary site.
C: Oh no, we have... I have the controls, I am showing the slides.
C: Okay, so to wrap up the key points from the first demo: we saw the replication of a wiki application from primary to secondary, and to do that we saw how the Scribe operator replicates by using a point-in-time copy of the application data, and it preserves that image name in its CR on the secondary site. To restore the application on the secondary site, all we need to do is ensure that the destination (the secondary PVC) is restored from the snapshot that is preserved in the CR.
C: At the start of each replication iteration, what the operator does is create a snapshot or copy of the volume using the CSI driver, if available, or it does so in a non-CSI fashion if those capabilities are not available in the cluster. Once it has created the snapshot, it moves that data onto an intermediate storage.
C: So you can see that our rclone-based data mover job is a push-and-pull mechanism, where the central hub, or the primary side, pushes the data onto an intermediate storage, and the edge cluster pulls the data from the intermediate storage. For the demo that I'm going to show next, I have a kind cluster with two namespaces: the source namespace, which acts as the primary site containing the source of truth, and the dest namespace, which acts as the edge side that will pull data.
C: Okay, so it is creating the container. Let's give it a few seconds to get up and started.
C: Okay, as you can see, my MySQL application is up and running. Let's see what PVC it is using to store its data.
C: If you see here, it is using a PVC which is called mysql-pv-claim, and what Scribe will do now is create a point-in-time snapshot of this PVC and copy that data onto the intermediate storage, which is our S3 bucket.
C: So now that we have created a new database, it's time for us to deploy the ReplicationSource CR on the primary side. But before we do that, we need to create a secret, which is called the rclone secret, and the operator will be using that secret to push the data to the intermediate side, which in our case is an AWS S3 object store. So I am going to go ahead and deploy the secret.
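The rclone secret wraps an ordinary rclone config file. It might look something like the following; the config section name and all credential values below are placeholders:

```yaml
# Hypothetical sketch of the rclone secret; the section name,
# bucket region, and credentials are placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: rclone-secret
  namespace: source
stringData:
  rclone.conf: |
    [aws-s3-bucket]
    type = s3
    provider = AWS
    access_key_id = <ACCESS_KEY_ID>
    secret_access_key = <SECRET_ACCESS_KEY>
    region = us-east-1
```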
C: So basically, the Scribe operator creates a snapshot based off of this PVC, and it creates a temporary PVC, which is called scribe-src-database-source, and the rclone-based data mover job uses this PVC to push the data onto the intermediate object store. So let's wait for the operator to finish moving the data.
C: Okay, as you can see, the Scribe operator has finished, and it's now time to verify whether the replication is happening, or has been initiated, on the destination side or not.
C: To do that, I am in my destination namespace, and just like on the source side, you have to create a secret, which you can see over here, and now I'm going to deploy the ReplicationDestination CR.
C: Let's go and verify what the Scribe operator has done on the destination side, and to do that, let's see what is happening in the ReplicationDestination CR.
C: Again, you see that it is trigger-based, scheduled to run every five minutes, and in the status field you can see that the first sync iteration is complete. Over here you can see that it took the data from the object store and created a snapshot out of it, and that snapshot image name is preserved in the latestImage field of the ReplicationDestination.
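The destination side of the rclone relationship could be sketched like this; the names and paths are illustrative, matching the five-minute schedule and the latestImage behavior described:

```yaml
# Hypothetical sketch of the rclone ReplicationDestination;
# names, bucket path, and the snapshot name are placeholders.
apiVersion: scribe.backube/v1alpha1
kind: ReplicationDestination
metadata:
  name: database-destination
  namespace: dest
spec:
  trigger:
    schedule: "*/5 * * * *"            # pull from the bucket every five minutes
  rclone:
    rcloneConfig: rclone-secret        # secret holding rclone.conf
    rcloneConfigSection: aws-s3-bucket # section within rclone.conf
    rcloneDestPath: scribe-demo-bucket # bucket/path to pull from
    copyMethod: Snapshot               # preserve each sync as a snapshot
    accessModes: [ReadWriteOnce]
    capacity: 10Gi
status:
  latestImage:                         # filled in by the operator after a sync
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: scribe-dest-database-destination-snapshot
```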
C: So now, to sync your database application on the edge side, all we need to do is ensure that the database application restores its PVC using this snapshot name. Let's do that and see whether the database that we created, called "synced", is present on the destination side or not.
C: So I will extract the latest point-in-time snapshot image, which is this, and I am going to create a PVC out of it, which the edge database application will be pointing to. I've taken this snapshot name and I'm going to substitute it in to create the PVC out of it.
C: Okay, so back to me. So we saw the rclone-based replication that Scribe uses, and we think that this has potential use cases in edge scenarios. What we did in this demo is a replication of a MySQL-based application from one namespace to a different namespace. The Scribe operator uses an intermediate storage, like an S3 object store: the primary site pushes the data to the intermediate storage, while the edge site pulls the data from the intermediate storage.
C: So far I have been talking about how this wide fan-out replication has potential for edge scenarios, but I didn't actually show you how you can move data between different clusters, did I? All I showed you is how you move from one namespace to another namespace. So, to prove my claim, we have a third demo coming, and Ryan will show you how you can integrate Scribe with Red Hat Advanced Cluster Management to easily scale applications across clusters.
D: There we go. All right, so as Parul said, this last demonstration is going to glue all of the pieces together. John and Parul both talked about and demonstrated how Scribe creates a Kubernetes-centric way of managing replication. Pretty much the easiest way to say it is: there's YAML files to control your replication.
D: The really cool part about that, and John mentioned it early on, is that we can use GitOps tooling, such as Red Hat Advanced Cluster Management or Argo CD, to manage our application placement, and then, with the addition of Scribe, we can handle our replication placement. With both of those combined, we can actually just scale out sites as we need, as we see fit.
D: So, as you see from this page here, we'll start out with the primary cluster; you'll see it's labeled within RHACM as local-cluster, and this will be running OCS as a storage class.
D: We have our local cluster, we have our AWS cluster, and we have a bare metal cluster. Our application will be created and updated here, and then any changes to that application will get a snapshot.
D: Scribe will move the data to a bucket, and then we'll actually be able to update both clusters at the same time. Even if one of those clusters is on a boat or on a plane, whenever that cluster comes back into connectivity with ACM, it will be replicated: the new data will be there, and any application updates will happen.
D: So, as you see, combining both of these technologies is just a huge strength, because you're no longer having to go to each cluster and kind of poke it. Like I said, at our primary site, any changes that we make are going to be sent to these other clusters. And with that, Diane, I think I'm ready for the video.
D: Jumping into the RHACM console, we will take a look at our clusters. Currently, there is only our source cluster that we showed earlier, which by default is named local-cluster. RHACM handles application placement based on labels, shown here: you will see storage=ocs and site=headquarters, as well as some various auto-generated labels. The storage and site labels are used to determine what storage class to use and whether the location is the replication source or destination.
D: To save some time, the Scribe components, the storage class modifications, our ReplicationSource and ReplicationDestination objects (which Parul and John showed earlier), and our DokuWiki site have already been defined within RHACM. Here's our DokuWiki page with a simple hello message. Now it's time to bring up the remote sites.
D: As you can see, the DokuWiki site has been deployed on our new cluster automatically, thanks to RHACM and Scribe, with the same hello message that matches our primary deployment. Now we will import a metal cluster. This cluster was too small to deploy OpenShift Container Storage, so we will use hostpath as our storage class, with the labels storage=hostpath and site=remote.
D: When the metal cluster becomes ready within RHACM, our DokuWiki application will deploy. We will now update our DokuWiki page; the changes will be synchronized to the remote clusters. When we are ready to update our DokuWiki site, we will update our PVC definition and deployment within our Git repository.
D: So actually, that's great that you asked that. I mean, if we can scale out clusters and things are that simple to do, that shows the strength of not only RHACM but Scribe. So that is perfect. There were parts that we did definitely cut out of that: if you have ever spun up an OpenShift cluster before, you know how long it takes to spin up a cluster.
D: So we did cut those parts, but it definitely shows that by implementing a mature process, having a GitOps tool like Argo CD or RHACM, and then with Scribe, you can just have this beautiful scenario. If you're talking about, say, a factory, you could just spin up new factory locations as you see fit. If you're a restaurant, you can pop up faster than you can even imagine. It's just the ability to scale and not have to worry about a new way to figure out how to get your data.
D: That's the key; that's a good thing. Keeping it simple is definitely going to make everybody's lives easier.
A: All right, well, are we ready for a little Q&A and conversation about this now, John and Parul?
A: This seems to me, and I'm going to need a few folks here, and Parul, if you want to join in as well. This seems to be at a very early stage. Someone is sharing their screen with me still and I'm getting their Slack channel; that's all right, happens all the time.
A: It seems to me a very new project, when you guys approached me to do this talk. How long has Scribe been around?
B: I guess we only really got started back around maybe October of last year. It's something that we wanted to get started for a while, but, you know, just in terms of scheduling and that kind of thing. So it's still really, really new, but we're excited about the potential.
A: It seems huge, and lately, no surprise to anyone, we've been having all these edge conversations. When you walked through the beginning of the talk, this edge data distribution problem, and solving that problem for the edge and IoT devices that are out there, is huge. It seems like it's a big part of what we're going to need to move forward successfully in the edge space anyway. So that part's really amazing and cool.
A: I can see lots of applications for that. Are you working with people who are deploying things on the edge? Is that one of the major reasons for building Scribe, or was it just the data replication that drove you to start it?
C: There was a time we were getting a lot of emails, and people were talking about edge clusters and single-node OpenShift, and we kind of thought: how can we have a solution that is storage independent? Because with Scribe, you don't even have to be dependent on the underlying storage system you're using. So, let's say one cluster is using gp2 and another is using hostpath: you can use Scribe to replicate data irrespective of the underlying storage. That is what we thought as we started to build Scribe.
A: So yeah, everybody's talking edge these days; that's sort of the new hotness right now, I think, in my little world of spheres. So it's amazing to see this done, and I totally appreciate it. So the code is in GitHub, correct? Yes, I saw a URL.
A: Besides things like this OpenShift Commons briefing, how are you interacting with the community? Is this something that you're going to try and grow a big community around? Is this just a piece of a bigger project?
B: Anybody that's interested, please come visit the GitHub repo. Open issues, start discussions, whatever; that's kind of our primary way right now. We are trying to get Scribe into various forums, to give talks and that sort of thing.
B: But definitely try it out, check it out on Artifact Hub, and then send us your feedback and open issues. I'm sure there's a bug or two in there somewhere, but we'll get it.
A: It seems there's a couple of SIGs, and maybe, if Paul's around, it would be great to get this in front of the CNCF. I can see a lot of interest coming from that, and maybe this is something we might want to throw into a sandbox in the CNCF in some not-too-distant future, because I think that's a great way to get other people to participate in a project, as well as to reach other Kubernetes folks.
A: So yeah, the Artifact Hub, Helm charts, the operator stuff sounds like it's coming soon to a theater near me, so that's good. Is there an operator section of the repo itself where you're working on that in public, or is that still behind some firewall or something?
B: Yeah, no, it's all there in the Scribe repo. There's the Kubernetes operator, which is one container whenever you build it, and then there are these other data mover containers: one for rsync, one for rclone, and then the one that we are working on for restic.
A: Yeah, it was a new topic to me, Parul; that's why I was really interested in getting you guys on, because it's like, whoa, where did this come out of? And this is because we've had, in the OKD working group, a number of conversations with people, especially from the Fedora IoT and the CoreOS teams, doing kind of interesting stories, especially around bare metal and edge stuff.
A: So it's definitely something we want to take to the OKD working group and make you show off as well; we'll share this with them. But I think this has got a very broad reach, so it'd be very interesting to see how other people respond to this. Is there anything out there that competes with this or is similar to this, any other projects?
B: So the thing is, asynchronous data replication, or just data replication in general, is something that has traditionally been done as a part of the storage system, because in traditional IT environments it was all up to the vendor to do that replication.
B: And this was actually kind of one of the reasons why I thought it was important for us to build this operator: to be a cross-vendor replication engine. There's kind of that lock-in of relying on the storage vendor to do the replication, and not all of them support it, and that sort of thing. Whereas Kubernetes is really good about abstracting away the underlying hardware and environment that you're in, and so we thought that it was really important to be able to provide those advanced data management capabilities.
A: So yeah, this is the question that I always ask myself: what overhead does it add to an application? Can you take a look at that? I noticed the reference to Prometheus and other monitoring things, but I'm wondering what overhead it adds to my application.
B: If the storage system does the replication natively, it's going to do that more efficiently than anything that a Scribe operator is going to be able to do. But the data movers that we've chosen, like rsync, go in and calculate just the changes that have been made in the volume, and so it does try to minimize the amount of traffic that goes over the network.
B: That sort of thing. And so we've tried to make it fairly lightweight in that way, and we think that, for a very broad spectrum of use cases, this will be a good solution.
A: So, any questions out there in the chat room? I'm not seeing any yet. I'm wondering if anyone's actually deployed Scribe yet in production. Is this still that new, that we don't have customers or end users giving you feedback yet?
B: Yeah, it's still pretty new, but we're talking with folks, trying to convince them to give it a shot.
A: I think you've got a really good shot. I can think, right off the top of my head, of two or three folks that have been talking about this problem with me, so I'm going to definitely hit them up. And we've got one question coming in: could the traffic traveling over be compressed, thus reducing the traffic?
B: Yeah, absolutely. In the case of the rsync data mover, for example, it's just the rsync protocol over an SSH tunnel between the two sides, and the SSH connection itself does compression. And rsync is doing the deltas of the files. So, yes.
F: Not a question, but I just wanted to say, before we get too far afield of this topic, that I think the demos and presentation you gave today would be great to show to CNCF SIG Storage when you feel like you're ready.
A: Yeah, definitely, okay. We can set you up with that. That's what I was trying to figure out, which one; definitely Storage, but there's a few others too. I think there's even an edge SIG coming around soon. So there's another question coming in: is it possible to encrypt the data?
B: Whenever the data is going over the wire, at least with the rsync protocol, it is going over an SSH connection. In the demo, we had to copy that secret from one side to the other, and that was basically moving the SSH keys from one side to the other, so that both sides could authenticate each other and properly encrypt that traffic.
F: Any limits on the number of edge sites that can pull right now?
B: So, one of the... sorry, I'm just trying to put up the slide again.
B: So there's that side of it. Now, in terms of the clones and snapshots with CSI, there is a configuration, both on the source side and on the destination side, that allows you to basically enable or disable whether you want to do snapshotting.
B: On the source side, you've actually got three options. You can get your point-in-time copy of the source volume either via clone or via snapshot. The most efficient way to do it is to just directly clone the volume on the source side and hand that to the data mover. In the demos that we did, we were using the EBS CSI driver, which doesn't actually support clone, so we had to use the snapshot mode, where it takes a snapshot and then restores that snapshot as a volume.
B: That gets you point-in-time copies. But then there's the third mode, which is basically to just use the live volume and replicate that on a schedule. That gets you around requiring CSI snapshots or clones, but you lose the instantaneous view of the volume when you do that, so you can potentially get a little bit of skew in there while it's being replicated. And then, over on the destination side, you have your choice of whether to snapshot after each sync iteration or to just leave the volume as-is, which you could do if your storage provider on the destination doesn't support snapshotting.
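The three source-side options described above map to a single field on the CR; a minimal sketch, with everything but the relevant field elided:

```yaml
# The point-in-time behavior is selected per side via copyMethod.
# Source side, pick one (illustrative fragment, other fields elided):
spec:
  rsync:
    copyMethod: Clone      # most efficient: directly clone the volume
    # copyMethod: Snapshot # snapshot, then restore it as a volume
    #                      # (needed e.g. for the EBS CSI driver)
    # copyMethod: None     # replicate the live volume on a schedule;
    #                      # no CSI features required, but the copy is
    #                      # not an instantaneous point-in-time view
```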
E: I just have to think some more, but thanks, John. Okay.
A
All
right
more
in
the
chat,
is
it
reasonable
to
use
for
huge
volumes?
Do
you
have
you
done
benchmarks
or
tests.
B: Right, so we haven't really done a lot of benchmarking of it yet. Obviously, the bigger your volume and the higher your change rate and stuff like that, the more latency you can potentially see, in terms of how long it's going to take to replicate your data.
B: I would say I'm not all that concerned about just having a big volume in terms of data. It could take a while to get the first copy of that over to your secondary site, but less so once the replication is ongoing.
A: So benchmarking will be interesting when we get to that stage, and also what it is we should actually be benchmarking, because everyone will have a different scenario. It's going to be interesting to figure out what the best thing to benchmark is. All right, so we're almost at the end of the hour. Anyone got any more questions? We'll give them a second; otherwise we're going to say thank you, and we're going to have you back in a few iterations.
A: We'll see what comes out in the next releases, and if folks are interested, please do go to the GitHub repo and reach out to John or Parul or Ryan, or anybody on the storage team over here at Red Hat; you can get a hold of them all. We'll definitely keep you posted, because I know this is something near and dear to a number of folks' hearts, and I'm really pleased to see this solution.
A
So
thank
you
for
taking
the
time
today,
everybody
and
for
the
wonderful
demos
and
the
especially
the
very,
very
short,
acm
rackham,
one
ryan
that
was
impressive,
even
with
the
minor
edits
of
launching
the
clusters.
That
was
great.
So
thanks
very
much
and
thanks
everybody
for
joining
us
today
and
we
will
keep
you
all
posted
on
scribe's
progress
and
see
what
we
can
do
about
getting
you
in
front
of
sig,
storage
and
other
places
to
get
the
word
out.
So
thanks
again,.