From YouTube: IBM Db2 Warehouse MPP on OpenShift with Jana Wong (IBM), Michael St-Jean and Sagy Volkov (Red Hat)
Description
IBM Db2 Warehouse MPP on OpenShift Container Storage
Jana J Wong, Db2 Performance Lead Architect (IBM)
with Michael St-Jean and Sagy Volkov (Red Hat)
OpenShift Commons Briefing
July 1, 2020
A
Hello everyone, my name is Karina Angel and I'm here with Michael St-Jean and Sagy Volkov from the Red Hat storage business unit, as well as Jana Wong from IBM, and we are here to talk about IBM Db2 Warehouse on OpenShift Container Storage. I'm really excited about the performance testing they've been doing on OCP. Michael, please take it away.
B
Thanks a lot, Karina. So today I'll give a little introduction to Db2 with Red Hat OpenShift, and then I'm going to turn it over to Sagy and Jana to talk a little bit more about the testing that was done and the performance results. So let's go ahead and get started. If you're not familiar with Db2, well, I think you should be, because Db2 is actually one of the industry's leading database platforms.
B
Db2 is a great piece for that, and as well, Db2 can be used as a data warehouse, in a symmetric multiprocessing (SMP) type deployment as well as a massively parallel processing (MPP) deployment. They also have Event Store, which is an in-memory database, and it can be deployed on the cloud or through Cloud Pak for Data.
B
So for our testing we looked at Db2 Warehouse. Db2 Warehouse includes built-in machine learning, as I mentioned both SMP and MPP processing, and in-database analytics combined with IBM BLU Acceleration.
B
What we did here was use Db2 Warehouse massively parallel processing for our tests. Typically, as we look at that, we have a lot of customers who are looking at business intelligence type workloads and AI/ML type workloads, and so we wanted to focus in on that. But as well, we felt that the Db2 Warehouse MPP tests would give us a good indication because they're very complex workloads.
B
We thought that it would give us a good indication of how well Db2 works in an OpenShift environment with OpenShift Container Storage. And then as well, we're looking at this from a hybrid data warehouse perspective, so you're able to do rapid data retrieval, and you have flexible deployment and scalability regardless of how you're deploying that implementation.
B
So why are people considering running Db2 this way? Typically you think about some of these databases running in a traditional on-prem type of environment, so why are people looking to modernize their data infrastructure and move Db2 to more of a containerized development environment?
B
Well, right now in the industry about 71 percent or more of organizations are planning to containerize existing applications, and so for the past few years IBM has been doing a lot of work to make sure all of their software applications run in a modern, containerized environment. You see this a lot with customers who are trying to modernize and go cloud native with their application development and application deployment, and now with a lot of those machine learning and data intelligence types of applications.
B
First of all, we have the ability to do very rapid deployment. If you're familiar with OpenShift and OpenShift Operators, you're able to deploy your applications much faster. With the Db2 operator, you're able to do a very quick deployment to a worker node, and you have simplified lifecycle management. So in your day-two operations, as new application versions come out, they can be automatically updated.
B
So you have a much easier update process, which gives you, as you see in the middle and at the bottom, faster delivery of new features. Db2 services can be deployed as microservices and, as I mentioned with these day-two operations, they can be developed, spun up, updated, and scaled independently. And then we have that flexibility that I alluded to, for on-prem or across private cloud and public cloud types of deployments.
B
So what are some of the key benefits? Typically, if you're looking at an OpenShift containerized type of platform, this rings true with delivering your Db2 environment as well. It's all around agility: being able to deploy Db2 when you need it, where you need it. One of the key benefits now is that your data scientists or application developers don't need to go back to an administrator to get resources for their projects. They can spin up projects very quickly and very easily.
B
They can spin up sandbox types of environments; if they don't like something, they can just trash it, and it all gets done automatically within the Kubernetes infrastructure. So this is a great deployment strategy for people who want to take advantage of Db2. Then, from a scalability perspective, running on OpenShift Container Storage, we see, and we'll show you in some of the testing that we did, that we meet or exceed resource utilization, scalability, and performance expectations across the board. And then it's also about reducing complexity.
B
So, with OpenShift Container Storage, we provide data services that are provisioned just like you would provision compute resources for the Db2 application, and it's all done within one unified control plane. So what is some of the technical differentiation that we see? Well, by doing a lot of the testing internally with IBM, we have a validated solution, so you can trust in the reliability and the performance of the environment. So, for example:
B
IBM has done a lot of testing around these solutions with other storage environments, so they have a good idea of what works, what doesn't work, and what performs well. This is why we're coming to you today: we want to tell you about the great performance and the exceptional deployment experience that we had. And then, from a security perspective, there's a lot of security already built into IBM Db2, but a lot also comes from using OpenShift and OpenShift Container Storage.
B
We have very strict security standards, so you have a very secure storage layer for Db2 across that entire environment. And then, from a lifecycle management perspective, I already talked a little bit about this, but OpenShift Container Storage is designed and tightly integrated with Red Hat OpenShift. So you have consistency across your user experience in that type of environment, and you're able to manage your compute resources for the application, in this case Db2, as well as your storage layer, independently.
B
And then what does it mean to you, typically? If you take a look at it from a big data, analytics, or AI/ML director's perspective, running IBM Db2 on Red Hat OpenShift Container Storage gives a big data director the ability to scale storage as their needs increase, with reliability and performance, and the ability to utilize Red Hat OpenShift to run both Db2 and the storage that supports it.
B
It makes operations more efficient and better utilizes existing IT skills that you might already have in your IT department. From a data architect or data engineer perspective, think about OpenShift and OpenShift Container Storage providing more of a modern data architecture that's based on containers and Kubernetes orchestration.
B
So with Kubernetes operators, data scientists can work entirely within Red Hat OpenShift to program their infrastructure for both the Db2 application and Red Hat OpenShift Container Storage. They can focus on innovating, focus on their solutions, and not worry about the underlying infrastructure. And here we have a quote from Piotr Mierzejewski; he's the director of Db2 development for IBM Data and Artificial Intelligence.
B
One of the great things about this implementation is that, with no prior experience with OpenShift Container Storage, they were able to ramp it up within a couple of weeks. They actually thought it was going to take a lot longer without any prior experience; it took them about a week to set everything up and another week to get everything tuned and the tests ready, and you'll hear more about that from Jana and Sagy. So I'll pass this on to them to talk a little bit more about the test layout.
C
Thank you, Michael. Let me share. I like how we all look very young in the picture that you put in with the last quote. Jana and I are going to talk about the actual testing that was done: I'll concentrate on the layout of the OpenShift and OCS cluster, and Jana will talk about the results. So, we decided to use the data warehouse version of Db2 for a few reasons.
C
First of all, it's designed to do massively parallel processing of data, so we wanted something that would hammer our storage, the OpenShift Container Storage, as much as possible. It's also pretty much what IBM usually uses when they test a new storage subsystem, the data warehouse version. The "oc" at the end of the long name, Db2 WH OC, means "on cloud", and the reason we decided on the cloud was, well,
C
everyone is doing something on the cloud, and also, in terms of the constraints on what we could use at this point in time, the cloud was easier to do. As you can see in this slide, there are a few calculations that are done to basically match a Db2 data warehouse to the cluster that you are running on. Each host, as in the past, would run a partition of the data warehouse,
C
what is called a logical node, or multiple logical nodes (MLN), and there are a few calculations that need to be done: a minimum of eight gigabytes of RAM per core of whatever you are running. You can see it all here. The other part of this equation is that we also looked at the OpenShift best practices, which say to leave two CPUs and eight gigabytes of RAM per node for OpenShift; the rest you can use for your application.
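A rough sketch of that sizing arithmetic, under the assumptions stated above (8 GB of RAM per core per MLN, 2 CPUs and 8 GB reserved per node for OpenShift, r5a.4xlarge workers with 16 vCPUs and 128 GB), might look like the following. This is an illustration only; the published test ended up requesting slightly more conservative values, about 52 CPUs and roughly 460 GB of RAM across the four MLNs.

```python
# Illustrative sizing sketch for Db2 Warehouse MPP on OpenShift worker nodes.
NODE_CPUS = 16          # vCPUs on an r5a.4xlarge worker (assumption from the talk)
NODE_RAM_GB = 128       # RAM on an r5a.4xlarge worker
OCP_CPU_RESERVE = 2     # leave for OpenShift per node (best practice cited above)
OCP_RAM_RESERVE_GB = 8  # leave for OpenShift per node
RAM_PER_CORE_GB = 8     # Db2 MPP minimum: 8 GB of RAM per core
DB2_WORKER_NODES = 4    # one MLN per Db2 worker node in this layout

cpus_for_db2 = NODE_CPUS - OCP_CPU_RESERVE
ram_for_db2 = NODE_RAM_GB - OCP_RAM_RESERVE_GB

# An MLN can only use as many cores as the remaining RAM can feed at 8 GB per core.
usable_cores = min(cpus_for_db2, ram_for_db2 // RAM_PER_CORE_GB)

print(f"per MLN: {usable_cores} cores, {ram_for_db2} GB RAM")
print(f"total:   {usable_cores * DB2_WORKER_NODES} cores, "
      f"{ram_for_db2 * DB2_WORKER_NODES} GB RAM across {DB2_WORKER_NODES} MLNs")
```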
C
So with that in mind, as I said, we decided on AWS. This is a seven-node OpenShift 4.3 cluster. It's seven because there's only one master; don't try this in production, but from a budget perspective it's easier.
C
We decided to use four r5a.4xlarge nodes for the worker nodes that are going to run the Db2 pods, or the MLNs. These nodes are known to have a very good ratio and communication bandwidth between the cores and the memory; these are AMD nodes, if I'm not mistaken. And we are using three instances of what I think AWS calls storage-optimized instances, i3en.2xlarge.
C
These are basically AWS instances that have directly attached to them, or supposedly directly attached, two NVMe devices, two 2.3-terabyte NVMe devices each. So our OpenShift Container Storage cluster is basically formed out of these three nodes. Each of them has two storage devices, which gives us a total of six devices, and we ran our initial tests with everything in a single availability zone.
C
So doing all of this calculation of how much we need to keep on, let's call them, the Db2 worker nodes and how much we need to keep for OpenShift basically gives us the amount of resources that was used for each of the MLNs, or the Db2 partition pods, and the total Db2 capacity:
C
four Db2 compute nodes, four MLNs, 52 CPUs, and about 460 gigabytes of RAM. This is how it looks in a nicer diagram: on the top, the four OpenShift nodes that are going to run Db2; on the bottom, the three OpenShift Container Storage nodes that are going to provide the storage; and our single master on the side.
C
For the setup itself, we basically needed what I guess in the Db2 world is called a storage zone, and there are two types of storage zone. One is a shared storage zone, and this shared storage zone is using CephFS.
C
Ceph is the building block of OpenShift Container Storage, and a portion of CephFS is used directly to share information between the partitions, the Db2 pods, or the instances of Db2 that are running on different nodes.
C
Also, the test data was created and stored on this CephFS directory, in order to create it once and then load it, using external tables, into all the database pods. And then we also have a storage zone that is not shared, that is per database instance or per database pod, and this is using Ceph RBD, the block device option of Ceph. As stated, this zone needs very high-performance storage, and that's why we chose Ceph RBD for that.
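As a minimal sketch of those two storage zones, the following uses the upstream kubernetes Python client to request one shared ReadWriteMany volume backed by CephFS (for data every partition must see, such as the generated test data loaded via external tables) and one ReadWriteOnce RBD block volume per Db2 partition pod. In the real deployment the Db2 stateful sets request these through their volume claim templates; the storage class names are the usual OCS 4.x defaults, and the sizes, names, and namespace here are placeholders.

```python
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

def make_pvc(name: str, storage_class: str, access_mode: str, size: str) -> client.V1PersistentVolumeClaim:
    """Build a PVC object for the given storage class, access mode, and size."""
    return client.V1PersistentVolumeClaim(
        metadata=client.V1ObjectMeta(name=name),
        spec=client.V1PersistentVolumeClaimSpec(
            access_modes=[access_mode],
            storage_class_name=storage_class,
            resources=client.V1ResourceRequirements(requests={"storage": size}),
        ),
    )

# Shared zone: CephFS, ReadWriteMany, mounted by every MLN pod.
shared = make_pvc("db2wh-shared", "ocs-storagecluster-cephfs", "ReadWriteMany", "1Ti")

# Per-partition zone: Ceph RBD block storage, ReadWriteOnce, one claim per MLN pod.
per_mln = [
    make_pvc(f"db2wh-data-mln{i}", "ocs-storagecluster-ceph-rbd", "ReadWriteOnce", "1Ti")
    for i in range(4)
]

for pvc in [shared, *per_mln]:
    core.create_namespaced_persistent_volume_claim(namespace="db2", body=pvc)
```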
C
This is a little bit of how Db2 looks in the Kubernetes/OpenShift world. Db2, or rather the Db2 MLNs, are installed as stateful sets. There's also another instance of etcd that runs as a stateful set, basically to track information and heartbeat between the different partitions, the different Db2 pods.
C
There are other pods running in the background, some for management, some as a toolbox, but the two most important ones are at the top. As for the OpenShift Container Storage layout and configuration,
C
as I said, we used those NVMe devices that the AWS instances provide via what we call direct-attached storage. We're using the Local Storage Operator to basically hand out these NVMe devices as PVs, and then OpenShift Container Storage, in turn, uses those as the building blocks for the Ceph cluster and provides the storage from there. Because with the nodes that we used we wanted to keep things as cheap as possible, I had to tweak a little bit the CPUs that I gave to other components of OpenShift Container Storage, because we are mainly going to use a little bit of CephFS and mostly RBD block.
C
So you have the resources that I kind of limited. In future versions we might have the ability to control these dynamically, the resources for the different components of OpenShift Container Storage, which will basically mean that we will be able to provide even more performance, or more resources, just to the RBD portion, allowing Db2 to get even more performance from the same layout.
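A rough sketch of the kind of tweak described above is shown below: shifting CPU away from components this workload barely uses by patching the StorageCluster custom resource with the kubernetes Python client. The group, version, plural, and default resource name are the usual OCS 4.x values, but the exact component keys and which fields are tunable vary by release, so treat this purely as an illustration rather than a documented procedure.

```python
from kubernetes import client, config

config.load_kube_config()
crd = client.CustomObjectsApi()

# Hypothetical patch: trim the CPU given to the CephFS metadata server, since this
# workload uses CephFS only lightly and RBD heavily. Values are placeholders.
patch = {
    "spec": {
        "resources": {
            "mds": {"requests": {"cpu": "1"}, "limits": {"cpu": "1"}},
        }
    }
}

crd.patch_namespaced_custom_object(
    group="ocs.openshift.io",
    version="v1",
    namespace="openshift-storage",
    plural="storageclusters",
    name="ocs-storagecluster",
    body=patch,
)
```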
C
Just another quick diagram of how OpenShift Container Storage basically looks. Those at the top are basically our, excuse me, our Db2 pods that are running on some nodes; the red ones are the OpenShift Container Storage pods, and we have several of them. Those many OSDs that you are seeing are basically pods that get attached to a storage device that the Ceph cluster will use. So in our case we had six NVMes, so we had six OSD daemons.
C
There are also monitor pods that keep track of these OSDs and the metadata on them, and provide the information on where to read from and write to, and there are other pods that are also part of OpenShift Container Storage.
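A small sketch of how you could see the pods just described, using the kubernetes Python client: each OSD pod in the openshift-storage namespace maps to one of the six NVMe devices, alongside the monitor pods. The label selectors are the usual Rook/OCS ones, but namespaces and labels can differ slightly between releases, so treat this as an assumption.

```python
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

# List the OSD and monitor pods and the nodes they landed on.
for selector in ("app=rook-ceph-osd", "app=rook-ceph-mon"):
    pods = core.list_namespaced_pod("openshift-storage", label_selector=selector)
    print(selector)
    for pod in pods.items:
        print(f"  {pod.metadata.name} on {pod.spec.node_name}")
```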
A
D
All right, let's take a look at performance. When we looked at performance, we basically wanted to answer four basic questions. One is: how well does Db2 on Red Hat OpenShift Container Storage perform in general? How well are we utilizing our system resources?
D
How does the system scale as we increase the size of our workload? And how do we compare to existing cloud-based storage solutions? Those are the four questions we were going after. In order to answer these questions and to test the performance, we utilized a workload called BDI. I would like to give a little background on what this workload represents, to show that it is really relevant as a typical data warehouse application.
D
The schema of this BDI workload follows that of the TPC-DS benchmark specification, a standard industry benchmark. It comes with seven fact tables, like store sales and store returns, catalog sales and catalog returns, web sales and web returns, and an inventory table, in addition to 17 dimension tables where we store information about the customers, the items, the products, and so on. What's interesting is that we can generate this database at any scale factor in order to analyze the performance of Db2.
D
In order to also see how well the system scales, we set up a two-terabyte BDI workload to see what happens if we increase the size of the data by 2x: how does performance change? Now to the query side. The workload is a query-only workload; it comes with 100 queries that were inspired by Cognos-generated SQL for dashboards and reports, and there are basically three types of users represented in this workload. For one, we have the returns
D
dashboard analyst, so that's a person who would investigate the rate of returns and the impact on the bottom line of the business. These users typically run very simple queries that can be answered in sub-seconds; 17 out of our 100 workload queries fall into this category, and we consider them simple queries.
D
The second user that is represented is a sales report analyst. He would generate sales reports to understand the profitability of the retail enterprise.
D
These users run more intermediate queries, with runtimes of up to one minute; 25 out of our hundred queries fall into this category, and we call them intermediate queries. And then we have a third user, the deep-dive analyst, so the data scientist. They use handcrafted deep-dive analysis to answer questions identified by both the returns dashboard analyst and the sales report analyst. These are very complex queries, with several minutes of runtime, and we have five of those very complex queries.
D
Now, there are two different ways we can run the workload, and we utilized both of them during our testing. For one, there's the serial mode, where we have a single user that basically runs through all 100 queries from beginning to end, and we measure how long it takes on this particular system to finish running all 100 queries. And then there's a second mode that we can run, and that's the concurrent or throughput test mode, where we have a given number of users.
D
In our case, we used 16 and 32 concurrent users that submit a number of queries, or an unending stream of queries of a certain category, to the database for a period of, in our case, one hour. So we want to see how much work we can get done within a one-hour time frame.
D
The queries we chose are only intermediate and heavy queries, not the simple ones, because obviously those can skew the throughput picture quite a bit. So we're looking to stress-test the system by allowing 16 or 32 concurrent users to just run intermediate and heavy queries against the database for a period of one hour, and we get a throughput in queries per hour.
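A minimal sketch of that throughput mode (not the actual IBM test harness) might look like the following: N concurrent users each loop over the intermediate and heavy query set against the database for a fixed one-hour window, and the result is queries completed per hour. The connection string and query list are placeholders you would replace with your own.

```python
import time
import threading
import ibm_db

DSN = "DATABASE=bludb;HOSTNAME=db2-host;PORT=50000;UID=user;PWD=secret;"  # placeholder
QUERIES = ["SELECT 1 FROM SYSIBM.SYSDUMMY1"]  # placeholder for the intermediate/heavy set
WINDOW_SECONDS = 3600                         # one-hour test window, as described above
USERS = 16                                    # 16 or 32 concurrent users were used

completed = 0
lock = threading.Lock()

def user_stream(user_id: int) -> None:
    """One simulated user: loop over the query set until the window closes."""
    global completed
    conn = ibm_db.connect(DSN, "", "")
    i = user_id  # stagger starting points so users do not run in lockstep
    deadline = start + WINDOW_SECONDS
    while time.time() < deadline:
        ibm_db.exec_immediate(conn, QUERIES[i % len(QUERIES)])
        i += 1
        with lock:
            completed += 1
    ibm_db.close(conn)

start = time.time()
threads = [threading.Thread(target=user_stream, args=(u,)) for u in range(USERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

elapsed_hours = (time.time() - start) / 3600.0
print(f"throughput: {completed / elapsed_hours:.1f} queries per hour")
```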
D
So to summarize what we did for our runs, both on the one-terabyte and the two-terabyte setup: we did a serial warm-up run, on a cold buffer pool, of just the 100 queries. We did a serial three-iteration run, where we run each query three times and measure the total elapsed time. And then we have two concurrent or throughput tests, with 16 heavy users and with 32 heavy users, where again we're using the intermediate and complex queries.
D
All right, you can move to the next slide. One more. That's right. So here on this graph you can see the overall performance summary.
D
How long does it take to run all 100 queries? The warm-up run for one terabyte took 3.8 minutes, and the three-iteration serial run 10.87 minutes. For the two terabyte, the warm-up was 7.7 minutes, so about 2x the time of the one-terabyte workload, which we would expect because we doubled the data, and for the two-terabyte serial three-iteration run we had 22.6 minutes, which is also about two times the one-terabyte three-iteration run. Now to the right graph.
D
What is nice to note here, and which we will talk a little bit more about later: as we increase the data size by a factor of 2x, the performance only goes down by a factor of 1.7x, so we don't even see a 2x drop, which means that the system scales pretty well. All right, let's move to the next slide.
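The numbers quoted above reduce to simple ratios, shown in the small calculation below: the serial runs roughly double when the data doubles, while the multi-user throughput discussed later drops by only about 1.7x to 1.75x.

```python
# Elapsed times (minutes) from the runs described above: (1 TB, 2 TB).
runs = {
    "warm-up (serial, cold buffer pool)": (3.8, 7.7),
    "serial, 3 iterations":               (10.87, 22.6),
}
for name, (t_1tb, t_2tb) in runs.items():
    print(f"{name}: {t_2tb / t_1tb:.2f}x longer at 2 TB")
```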
D
Now, that was just the overall overview. The numbers we just saw, in themselves, on one system, don't mean that much other than the scalability factor. We need to compare to something, and we also need to answer: how well does the system utilize its resources?
D
So we took a look at three things: CPU utilization, disk utilization, as well as memory and network utilization. One of the most important things we often look at is CPU utilization, so we look at how busy our CPUs are during our runs. We did this for all serial and multi-user runs; we're going to focus here on the multi-user runs, because that is more interesting. So in this slide, on the top,
D
the top two graphs represent the one-terabyte CPU utilization, and here you see the Db2 nodes are averaging about 65 percent during the 16-heavy-user run, so there's still room; we're not totally maxing it out. The OCS node CPU utilization is also fairly low. But as we increase our data volume to two terabytes, we see that CPU utilization goes up from that 65 percent to 90 percent for the Db2 nodes, and we also see an increase in CPU utilization on the OCS node side.
D
Now let's take a look at disk utilization. Again, the top two graphs represent the one-terabyte run results, the disk utilization on both the Db2 nodes and the OCS nodes, and on the bottom the two little graphs show the two-terabyte ones. Now, I understand that the picture is fairly small, but that's okay, because we only need to see what has changed between the upper graphs and the lower graphs.
D
So on the top, for the one-terabyte runs, you can see that we have fairly low disk utilization, I would say maybe around 25 percent, with a few spikes here and there. One thing to note is that the one-terabyte setup occupied about 42 percent of the available disk space that we had, and almost all the data fit into the buffer pool.
D
So we would expect not to see too much disk I/O, because most of the data is in the buffer pool and we can read straight from there. But that changes as we move to the two-terabyte setup. When we set up the two-terabyte workload, about 85 percent of the space was occupied and we now only fit about 25 percent of our data into the buffer pool, which means we have lots more disk I/O going on: we need to clean pages from the buffer pool.
D
We need to read new pages into the buffer pool, and that is very much reflected in these graphs. If you compare the bottom-left to the top-left graph, you can see that we many times reach a busyness of around 100 percent; it spikes here and there, but it's definitely much higher. We also see on the right side, on the OCS nodes, that the disk utilization increased. One thing to notice, and I think Sagy mentioned this earlier:
D
we have run this on four Db2 nodes and three OCS nodes. The graph here is a representation of one of those, but it's also good to note that the disk utilization is very similar across all four Db2 nodes and across all three Red Hat OCS nodes. So that's a good sign that there's no imbalance going on. All right, next slide please.
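The buffer-pool behavior described above (the 1 TB data set mostly fitting in the buffer pool, the 2 TB set only about 25 percent) could be checked from Db2 itself. The sketch below compares logical and physical data page reads per buffer pool via the MON_GET_BUFFERPOOL table function using the ibm_db driver; the connection string is a placeholder, and this is only an illustration, not part of the published test.

```python
import ibm_db

conn = ibm_db.connect("DATABASE=bludb;HOSTNAME=db2-host;PORT=50000;UID=user;PWD=secret;", "", "")
sql = """
SELECT bp_name,
       pool_data_l_reads AS logical_reads,
       pool_data_p_reads AS physical_reads
FROM TABLE(MON_GET_BUFFERPOOL(NULL, -2))
"""
stmt = ibm_db.exec_immediate(conn, sql)
row = ibm_db.fetch_assoc(stmt)
while row:
    logical = row["LOGICAL_READS"] or 0
    physical = row["PHYSICAL_READS"] or 0
    # A high hit ratio means reads are served from memory rather than disk.
    hit = 100.0 * (1 - physical / logical) if logical else 0.0
    print(f"{row['BP_NAME']}: hit ratio {hit:.1f}% "
          f"({physical} physical / {logical} logical reads)")
    row = ibm_db.fetch_assoc(stmt)
ibm_db.close(conn)
```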
D
Now let's just take a quick look at memory and network utilization. They overall appear healthy and didn't represent any performance bottleneck. We have memory available to the OCS nodes as well as the Db2 nodes, and again they don't represent any problems.
D
Overall, the RAM is utilized as expected, and again we saw that memory and network utilization is very similar across all four Db2 nodes and all three OCS nodes, which shows that we don't have any skew going on; we really have a good balance in how everything is working. Again, I understand that the picture is very small here, but I'll walk you through it. So, for the graph in the middle:
D
the blue bars show the run results for the one-terabyte setup and the purple ones the run results for the two-terabyte runs, and you can see that as we increase, or double, the amount of data, the throughput reduction is only 1.75x, which suggests good scalability. The reason for that is that during the one-terabyte run we're not pushing our resources to the max; we saw that CPU utilization was around 60 percent, and as we increase our data we push it further, to 90 to 100 percent.
D
We see on the disk busyness that we utilize our system resources really well; we are often reaching 90 to 100 percent disk utilization, and our OCS nodes also show this increase. This is pretty significant. We have been testing with other systems in the past and have not seen this great a scalability, so this is really good news for us.
D
Another thing that I want to mention here is that during the four-day test window that we had on the system, the system performed really well; we had no unexpected outages, so the resiliency seemed very good, and that's a very good thing. Now, in order to also evaluate how the performance of Db2 on Red Hat OCS compares to existing cloud-based storage offerings, we have run the same set of tests on different configurations.
D
The one pictured here is the one that comes closest in terms of number of MLNs, number of CPUs, and amount of RAM in comparison to the Red Hat OpenShift Container Storage setup. We measured the same type of tests, the one-terabyte and two-terabyte BDI workloads. Pictured here is the one-terabyte output. We ran the warm-up and serial runs as well as the throughput runs; pictured here are the throughput runs, which again are more relevant, and in the end we normalized them.
D
The cloud-based storage solution had about 50 percent more RAM, so we ended up normalizing the numbers: what would it look like if we had about the same amount of RAM? The number of cores, or CPUs, was the same to start with. When we do that, we can see that we are pretty much on par in those two cases, which suggests the performance is as expected; it's doing really well.
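The sketch below shows one simple way such a normalization could be done, scaling the comparison system's measured throughput linearly by the memory difference before comparing. This is only an illustration of the idea described above, not necessarily the exact adjustment used in the white paper, and the throughput figure is a placeholder, not a published result.

```python
# Normalize the cloud-storage configuration's throughput to the OCS configuration's RAM.
ocs_ram_gb = 460             # approximate RAM available to the four MLNs on OCS
cloud_ram_gb = 690           # placeholder: about 50% more RAM on the comparison config
cloud_throughput_qph = 1000  # placeholder measured throughput (queries/hour)

# Simple linear normalization by memory; CPU counts were already equal.
normalized = cloud_throughput_qph * (ocs_ram_gb / cloud_ram_gb)
print(f"normalized cloud throughput: {normalized:.0f} queries/hour")
```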
C
This is just the beginning of a journey between the OpenShift Container Storage platform and Db2. All this data is already in a published white paper, and the next white papers are going to concentrate on all sorts of failover scenarios, which, from the Db2 customer's perspective, are super important.
C
We're also going to do some bare-metal performance testing and use not only the data warehouse but OLTP and OLAP workloads to test everything, and then also IBM Cloud Pak for Data version 3 will have support for OpenShift Container Storage. I think that's about it. These are the people that helped us, besides me and Jana: Manny Luik, Rishi, and Peter, and we want to thank them as well.
C
Yes, to my understanding it does need its own etcd. It might in the future not have to use that, but right now it does, and it is super lightweight.
C
So I don't know if AWS considers those storage instances as part of EBS; I don't think so. Think of it as an instance that has two storage devices directly attached to it, and OCS, with Ceph, basically creates a cluster from this and manages the storage, protects the data, replicates the data, and all of that.
B
A
C
You can; this actually goes more to the requirements for Db2, and right now the requirements are for a Db2 pod to consume all the resources on a particular OpenShift node minus all the resources that OpenShift needs. So this is a Db2 requirement. I'm not a Db2 expert on that, and I don't know if they are going to change it.
C
I think it's more a reflection of the migration from bare metal, where these Db2 processes just want to consume as many resources as possible on each server; moving into the OpenShift world, or even to the cloud, it's kind of continuing with the same line of thought right now.
C
So technically, for sure you can do this, but right now I think the Db2 requirement is to have the Db2 pod consume all the resources on each node.
A
All right, another question: so Db2, you're testing it on OpenShift; does this also run in IBM Cloud Pak for Data?
B
Yeah, maybe for those that aren't familiar with IBM Cloud Pak for Data, it's one of the ways that you can purchase services for Db2. So with the Cloud Pak,
B
you have a single bundle where you purchase that one bundle and then you can have multiple IBM applications, and Db2 is one of them, both Db2 OLTP and Db2 Warehouse, and that can all be run within that license, and they can all run within the OpenShift environment. And then, in addition, there's the IBM Storage Suite for IBM Cloud Paks that gives you the ability to deploy all your data services.
B
It does include Red Hat OpenShift Container Storage, as well as Red Hat Ceph Storage, for any of the Cloud Pak environments. So that's an interesting way of purchasing overall IBM services for your environment.
C
Yeah, so this was OCP and OCS 4.3. The next white paper, the one that's going to come out on failovers, will be with 4.4.
C
We might do 4.5; it depends on the OCS side, the OpenShift Container Storage side, but it's definitely going to be 4.4 or something higher. Okay.
A
C
Of course it can be; you can run the OpenShift cluster either on bare metal or on some on-prem virtualized environment, and you can do literally the same setup in terms of OpenShift Container Storage: provide your bare-metal devices to the OpenShift Container Storage pods, and you're only going to get better performance.
A
All right: I haven't run OpenShift Container Storage before; do I need to go to training, or send my admins to training? What's the barrier to entry here?
B
Well, I think that's one of the great things about deploying in the OpenShift environment. The operators really streamline the day-one and day-two operations, and, as you saw in the quote from Piotr, getting the environment up and running and performing to scale was very simple, even for a team with no prior experience. Of course, Red Hat does offer services and expertise if you do need help, but getting up and running on day
B
one is pretty fast and easy. And maybe, I don't know if Sagy or Jana have something to add to that, since they ran the environment.
C
Yeah, well, I just want to say that the quote that you showed from Piotr is actually from when the Db2 Cloud Pak for Data team was doing their own initial testing, and there was actually no involvement from Red Hat.
C
At that point they basically installed their own OpenShift cluster and installed OCS on their own, and that's actually where the quote is coming from. As for the environment that we tested on, obviously I know what I'm doing, so I know how to install OCS, but the quote actually comes from a completely separate, Db2-only team doing their own testing.
B
D
Yeah, I had the same experience as what you were describing. I had not been on Red Hat OCS before, but it was really easy to just get up and running. I mean, I probably asked a few questions, how do you do this and that, but it was very simple overall, and we only had four days on the system, and I think we got more done than what we expected we would be able to do in that time.
A
C
Specifically for this, for a Db2 data warehouse environment, I mean from an OCS perspective, we are, and correct me if I'm wrong, Michael, going to come up with some kind of a sizing and configuration guide. And of course, when it comes to storage, things change and vary from cloud to cloud, with each cloud provider and their own specific storage capabilities, and then to on-prem, whether it's bare metal or virtualized.
C
So I do think we're going to come up with a sizing guide to help people understand that.
B
Yes, I think that would be good for us to take a look at. We currently have an internal sizing guide that's kind of based on capacity measures and how to configure the solution across different clusters, but adding that perspective of what you need for performance, what your performance expectations are, how you should configure it, what disks you should use, and so on, would be helpful.
B
I know that there's a knowledge base article out there right now about deploying IBM Db2 on OpenShift. I don't know that it really gets into the storage perspective; perhaps the white paper will help with that, but I agree with Sagy. We should probably look at adding some more information into our OCS documentation as it pertains to sizing for performance.
A
B
Yeah, I just wanted to mention that there is a panel discussion with some of the IBM and Red Hat executives that's scheduled for the 28th of July. Unfortunately I don't have a link to that yet, but I believe it's probably going to be posted on ibm.com events. In any case, if there are questions around that, you can get back in touch with any of us and we can give you some additional information.