Description
Speaker: Sean Usher, Software Engineer
We will present our O365 use case scenarios, why we chose Cassandra + Spark, and walk through the architecture we chose for running DSE on Azure.
Sean: Today we're going to talk to you about how Azure and DataStax Enterprise power the Office 365 user store. I'm going to start by giving an introduction of who we are and what it is that our organizations do, and we're going to talk about what we built and why we needed to build it. We're going to talk then about how we built it using Cassandra and Spark, and then why we chose Azure and DataStax Enterprise as the platform to build it on.
So my name is Sean Usher. I'm a senior software engineer in Office 365; my contact information is there. My team really focuses on currently running Cassandra for other teams in Office, as well as building platforms on top of Cassandra, so that we can understand our customers better. And this is Silvano Coriani.
He is a principal program manager in Azure, and he'll talk more about himself when he tells you how you can build it. So, Office 365: I'm assuming many people here are familiar with it. You've probably heard of Word and Excel and Outlook, but those are client-level products. There are also server-level products, so historically enterprises would buy things like Exchange for running email and SharePoint for document management. They would get the hardware and install it, and they would maintain the networking, the machines, the disks going bad.
In Office 365, we take all that overhead and move it into Microsoft managing it, so IT admins can focus on adding business value on top of that, instead of dealing with which disks broke last night. And then Azure, which I hope all of you know of: it's Microsoft's cloud computing platform. It provides compute, database services, machine learning, mobile services. If any of you aren't familiar with Azure, come talk to Silvano after the talk, and please sign up for a trial so you can start deploying things and see how it works, and maybe even deploy Cassandra.
So what did we build? In Office 365, we're hosting all these services for a lot of large organizations, and they all use the services in different ways. Some people rely heavily on email, some people rely on SharePoint, and some people use features of those services more than others. So what we need to understand, to provide good service to them, is how they're using it and what their experience is like. Historically, we've had synthetic monitoring that would run to try to simulate the user experience.
We would have passive monitoring that would look at server logs and give us aggregated data on how users were experiencing the platform on different pieces of hardware. But then we couldn't say: you, as an organization, are having this experience in our system. So we really wanted to build a way to understand our users and their organizations at a deep level, and to start doing that, we wanted to answer a few easy questions and a few really hard questions. So the first one is: are our users happy with the service they're receiving?
This is a really hard question, because how do you come up with a number that says they're happy or not? And then: are our users fully utilizing the service they're paying for? You know, if we have an organization that signed up for a thousand licenses and they're using 500 licenses, we don't want them to have to pay that extra cost. We want to be able to reach out to them, work with them, and give them a better experience. Are our users hitting issues we can proactively help them with? Issues can happen on our side.
They can also happen on the organization's side. They could go and make some licensing changes, and the users are still trying to use the service but they're not getting access to it, and we need to be able to reach out to them and say: hey, we see that a problem happened, it's impacting a lot of your users, how can we help you fix it? And: what does a user experience over their lifetime?

To provide good business value for a company, you have to be able to show them that you're providing business value, and one of the ways of doing that is to show them: here's your availability, here's the number of support cases, here's the number of incidents you've been impacted by, so they can really go to their leadership and say Office 365 is providing us with value. And then there's a really difficult one: how do we discover patterns in our data that we aren't aware of? Are our users going to certain pages,
following some kind of wizard, and then leaving? If we're seeing a lot of users doing that, that can mean that, well, maybe our page is broken, maybe it's not intuitive, and it's something we need to feed back into the product to say: here's how we can provide a better experience. So to do all of this, we need to store a lot of data, but, more importantly, we need to be able to aggregate on this data fast; if we take 24 hours to come up with an answer, it's not very useful.
So we had a few requirements. We wanted to run on the cloud; we manage bare metal in Microsoft and Office 365 all the time, but that adds overhead if you want to dynamically scale. We also wanted to, of course, be highly scalable, ingesting around 50,000 events per second to start with and rapidly growing, plus a few other common requirements. The biggest one for us is really tunable consistency.
We've used a lot of data services in the past that enforced full consistency, and when you're writing a lot of data and looking at aggregates of data, losing one piece of data is not really the end of the world. But having your whole pipeline backed up because you can't get full consistency means you can't do anything with the data, and then you can't tell your customers anything or help improve their experience. We also wanted to be able to do real-time and batch analytics, as well as machine learning.
Machine learning can help us understand what we don't know about our data. And long-term storage: we need a system that really can scale on the storage side of things. And one big requirement from management was that it gets done in a month. So we started off not knowing what data system we wanted to use.
We started looking at some things that are public and some things that aren't public, but what really met all of our requirements was Cassandra: linear scalability, tunable consistency, running well on Azure, and a big ecosystem around it, so that we wouldn't be taking on a technology with no support out there if something went wrong. So we said, okay, let's take the bet. We knew that some people in Azure were already partnered with DataStax and working on Cassandra, so we decided to try it, and we started deploying it.
We went with one physical data center. Eventually we want to geo-replicate, but for now it's one data center with two logical data centers: one for Cassandra and one for analytics. To deploy these, we decided to put them all in a VNet, assigned static IPs, separated the roles into two different subnets, and then ACLed all the endpoints to the subnet, so that we aren't exposing our back ends to the Internet, where any zero-day flaw could let someone come in and steal all of our data, which would be pretty bad for us.
Because we didn't understand how Cassandra would scale (we'd never used it before), we said: let's go and get really big machines. So we went with 16-core machines, 224 gigs of RAM, three terabytes of disk. And we decided we wanted to be able to set this up fast, so instead of building the integrations between Spark and Cassandra ourselves and configuring that, we would use the DataStax package, which meant we had to use Ubuntu. So here is just a table of what we deployed.
We have larger heaps on Spark because it ends up pulling in a lot of data from Cassandra when it wants to do aggregates, and we use G1 garbage collection, which, when I get to the problems section, was very big for us: it reduced our garbage collection times, our pause times, quite a bit. And everything uses replication factor 3 between our two logical data centers.
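For reference, G1 is usually enabled for Cassandra through the JVM options (via cassandra-env.sh in DSE of that era); the flags below are a generic illustration of that kind of setup, not the team's actual configuration:

```
# Illustrative JVM settings: fixed heap, G1 collector, capped pause target
-Xms24G
-Xmx24G
-XX:+UseG1GC
-XX:MaxGCPauseMillis=500
```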
So here's our OpsCenter picture, in case anyone is curious. We had to get data into Cassandra from Office 365, and I'm not going to say the number of servers we have, but it's a lot of servers in Office 365. To do this, we didn't want the servers talking directly to our Cassandra servers. We wanted to have a queue, so that the clients didn't have to deal with retrying data that needed to be replayed or couldn't be written.
So we decided we'd go with a REST API, and behind that API we'd put a queue, but we chose Azure Event Hubs. Where a lot of people would choose Kafka, we didn't want to go and try a whole other technology while we were learning Cassandra as a technology. Now, there are some people in our group who are looking at Kafka and trying to weigh the benefits of Kafka versus Event Hubs.
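The queue-behind-an-API idea can be sketched in miniature. `EventQueue` and `FlakyStore` below are toy stand-ins of my own (not the actual Event Hubs or Cassandra APIs); the point they illustrate is that retries live in one consumer instead of in every client:

```python
from collections import deque

class EventQueue:
    """Toy stand-in for the ingestion queue (Event Hubs / Kafka)."""
    def __init__(self):
        self._q = deque()
    def publish(self, event):
        # Producer side is fire-and-forget: no retry logic in clients.
        self._q.append(event)
    def drain(self, max_batch=100):
        batch = []
        while self._q and len(batch) < max_batch:
            batch.append(self._q.popleft())
        return batch

class FlakyStore:
    """Toy stand-in for the Cassandra write path; fails once, then works."""
    def __init__(self):
        self.rows = []
        self._failed_once = False
    def write(self, batch):
        if not self._failed_once:
            self._failed_once = True
            raise IOError("transient write failure")
        self.rows.extend(batch)

def consume(queue, store):
    """The consumer owns retries, so producers never replay data."""
    batch = queue.drain()
    while batch:
        try:
            store.write(batch)
            batch = queue.drain()
        except IOError:
            continue  # retry the same batch until it lands

queue = EventQueue()
for i in range(5):
    queue.publish({"event_id": i})
store = FlakyStore()
consume(queue, store)
print(len(store.rows))  # 5: all events land despite the transient failure
```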
So if anyone did look at the data, they wouldn't be able to tie it back to any of our users or organizations, but Microsoft systems can tie it back. For the next two slides I want to show an example of our data model, with two of our main entities: users and orgs (we also call organizations tenants). Our big ingested data is the user table.
The key is the create time. We use the date-tiered compaction strategy, and I'll talk a little bit about why we use date-tiered compaction when I get to the problems. Our other main entity is really the tenant, the organization. That slide might be hard to read, but what's important is that the partition key is the tenant, and then there is a set of SKUs that you can sign up for in Office 365, and that is our clustering key, so one tenant can have multiple SKUs.
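As a rough sketch of what such a pair of tables might look like in CQL (the table and column names here are illustrative assumptions, not the actual Office 365 schema):

```sql
-- Raw per-user events, clustered by creation time, date-tiered compaction
CREATE TABLE user_events (
    user_id    uuid,
    created_at timestamp,
    service    text,
    event_data text,
    PRIMARY KEY (user_id, created_at)
) WITH compaction = {'class': 'DateTieredCompactionStrategy'};

-- Tenant metadata: one partition per tenant, one row per SKU
CREATE TABLE tenant_skus (
    tenant_id      uuid,
    sku            text,
    licenses_total int,
    licenses_used  int,
    auto_renew     boolean,
    PRIMARY KEY (tenant_id, sku)
);
```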
Now, this is not date-time series data; it gets bulk imported once a day, and then we import diffs as needed. This really gives us metadata about an organization: where are they, how many licenses do they have overall, how many licenses per system that they use, be it Exchange, SharePoint, or Skype for Business, are they set to auto-renew, and, what's key, which SKUs they signed up for. So we ended up taking those two tables, and with those two tables we could join them and answer a lot of questions.
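The kind of join those jobs run can be shown in miniature. In practice it's Spark over Cassandra tables; the plain-Python version below, with made-up field names, just illustrates joining user events against tenant metadata to find tenants whose users are hitting failures:

```python
# Toy join: which tenants have users hitting failures, and in which region?
events = [
    {"tenant": "contoso", "user": "u1", "status": "ok"},
    {"tenant": "contoso", "user": "u2", "status": "failed"},
    {"tenant": "fabrikam", "user": "u3", "status": "ok"},
]
tenants = {
    "contoso":  {"licenses": 1000, "region": "US"},
    "fabrikam": {"licenses": 200,  "region": "EU"},
}

failures_by_tenant = {}
for e in events:
    if e["status"] == "failed":
        meta = tenants[e["tenant"]]  # the "join" step against tenant metadata
        entry = failures_by_tenant.setdefault(
            e["tenant"], {"region": meta["region"], "count": 0})
        entry["count"] += 1

print(failures_by_tenant)  # {'contoso': {'region': 'US', 'count': 1}}
```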
Now we know who's using the system. Are users seeing failures? Are there certain failures from certain regions that we can reach out about? But as we were running these Spark jobs, for every piece of insight we wanted to get we were spending 20 or 30 minutes, which really limits the value you can get out of your data. So what we ended up doing was having a Spark job that aggregated all of this raw data into daily, hourly, and, most recently, weekly tables.
The other jobs can then look at those. This hourly job is batch right now, but we're looking to move to Spark Streaming so we don't have to pay the cost of writing data and then reading it back out. Once we switched all the jobs to the aggregated data, using the same schema, they started taking 20 to 30 seconds, so now we can get a lot more value out of the data that we have inside.
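The rollup idea is simple enough to sketch. The real job is Spark writing back to hourly/daily Cassandra tables; this toy Python version with assumed field names just shows collapsing raw events into per-tenant, per-hour buckets, so downstream jobs scan a few aggregate rows instead of the raw stream:

```python
from collections import defaultdict
from datetime import datetime

raw_events = [
    {"ts": datetime(2015, 9, 23, 10, 5),  "tenant": "contoso", "ok": True},
    {"ts": datetime(2015, 9, 23, 10, 40), "tenant": "contoso", "ok": False},
    {"ts": datetime(2015, 9, 23, 11, 2),  "tenant": "contoso", "ok": True},
]

# Bucket key = (tenant, hour); truncate the timestamp to the hour boundary.
hourly = defaultdict(lambda: {"total": 0, "failed": 0})
for e in raw_events:
    bucket = (e["tenant"], e["ts"].replace(minute=0, second=0, microsecond=0))
    hourly[bucket]["total"] += 1
    if not e["ok"]:
        hourly[bucket]["failed"] += 1

print(len(hourly))  # 2 hourly rows now summarize the raw stream
```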
So what were the results? Well, we were able to answer a lot of those questions.
Are users happy with the service they're receiving? We still don't have a good answer for that, other than the feedback we get from customers and being able to see the availability. So we do go reach out to customers, but just from the Cassandra data we don't have a concrete answer. Are users fully utilizing the service they're paying for? That we can see: the metadata tells us how many licenses they have, we can see how many users use those licenses, and we've been able to reach out to organizations and say: hey,
we can work with you to save money. Are users hitting issues we can proactively help them with? We have a whole team at Office 365 that gets the output of this data, the availability of customers and which users are hitting issues, and they're able to reach out to each one of those organizations and say: hey, we see that these users are having problems, can we go work with your admin and try to fix them? Sometimes it's on the admin side, sometimes it's on our side. And how's the user experience over their lifetime?
Well, that's hard to answer, since we just started this recently and we've had users for a long time, but we do now maintain long-term data on user availability and on how users are using our system. The great feedback we got from support and customers really inspired us and let us know we're doing the right thing. Support agents, when you call them, don't have to go and ask you all these questions: what services are you using, where are you failing? Now they just look it up and they know, which is fantastic.
So, everything worked perfectly, no problems? That's not true. Just like everyone else we're seeing at these talks, bad data modeling was the problem: huge partitions, one to two gigabytes. We had no idea what we were doing, and we didn't use date-tiered compaction for date-tiered tables, so we had a lot of compaction overhead going on. Our other biggest issue was not watching metrics: we had too many blocked flush writers, too many dropped mutations.
The file handle limit on the OS was set lower than the number of SSTables that we had, which immediately caused Cassandra to hit an out-of-memory error. And our SSTable count got too high, and that's where you really have to work to bring it back down; you have to monitor that closely. We use OpsCenter now for monitoring that.
So why did we choose Azure? You're probably thinking: well, they're Microsoft, of course they chose Azure. But we really wanted to look at this as a cost-benefit analysis. We knew that we wanted to use the cloud; we didn't want to manage bare metal. So we had to evaluate which cloud provider we wanted to use, and we looked at operational costs, because that's more important than the financial cost that you're going to pay regardless of which cloud provider you use.
We've been using Azure for five-plus years, writing tools around it for deployment, monitoring, and managing the service, so that was really the cheapest way for us to go and have our current DevOps team be able to manage the service. We also work closely with Azure support, and we love trying things out on Azure because we can try to break them. It's always fun to try to break something built by someone in your own company.
Why Cassandra? I talked a bit about this. The biggest thing for us was that it didn't enforce full consistency: we can decide which data is important enough that we get all of it, and which data is okay to lose some of, because we're looking at it in aggregates. The bad part is that we had to run it ourselves; we would have loved a hosted solution, less overhead on the team, but so far it actually hasn't been bad. And then why DataStax? Of course, training, and they provide the integration, so it's easy for us to get Spark and Cassandra working.
We have OpsCenter. But to me, it's support: there were so many times in the beginning (I see Chuck taking a picture over there) where we were actually hitting problems and we couldn't figure them out. We didn't know the code base, so we just went to DataStax and said: please help. And when you have, you know, VPs yelling at you saying, hey, this needs to get fixed, DataStax is there to back you up, which is fantastic.
Silvano: Thanks a lot, Sean. Yeah, I'm from Azure engineering; my name is Silvano Coriani. My team specifically, the customer advisory team, is helping first-party services like Office 365, Skype, and Dynamics, to give you some examples, but also external customers, to onboard onto Azure their largest and more complex solutions.
And, you know, we are talking about end-to-end solutions, including multiple Azure services, and specifically I'm more focused on the data tier; that's why I've had some experience working with customers that onboarded large Cassandra clusters on top of Azure. So, Sean was mentioning hosted services. We're not there yet in having a fully hosted, fully managed Cassandra service available on Azure, but we do have a number of options for you to deploy your own Cassandra topologies
on top of the Azure platform. I would divide this into two main buckets. One is what Scott Guthrie actually presented yesterday during the keynote: our marketplace-based offering for Cassandra. We worked a lot with DataStax to integrate into the marketplace section of the Azure portal a simplified way of deploying Cassandra.
With this offering, you essentially bring your own license and deploy production and non-production clusters based on a number of attributes and configuration options. You can start small with, for example, four-node clusters and go up to 90 nodes. These are pre-baked options that we have in the user interface whenever you click to create a brand new Cassandra cluster on top of Azure, so you can pick essentially some options,
like the VM size, the node size, and the VNet type. This particular marketplace offering today is limited to a single VNet within a single Azure region. When you've entered all the attributes, you basically get your cluster up and running. You can access OpsCenter at the URL that you see on the bottom of the slide, and essentially you can start interacting with it.
You can manage your Cassandra cluster, and you may add additional services, additional compute nodes, or whatever else actually consumes or interacts with the Cassandra service within that VNet, or you may have a more complex networking topology that is part of your overall solution. The other option that we have is to let you define your own deployment topology, and in general this goes through a number of options and selections that you need to make.
First of all, you need to decide how you want to group your resources within your larger deployment topologies. If you have a compute tier, a data tier, multiple services interacting with each other, you will need to define how you want to manage those different resources in Azure. Then you want to define how your compute and storage resources are typically configured.
You need to decide if you want to leverage ephemeral disks, local to the single compute node, or if you prefer to go to persistent, durable storage. We know this is kind of a sensitive topic in terms of selecting which one is best for your particular workload, but we offer both, actually, and we will see a little bit later more details on this and how, for example, to organize
your operational activities, like backups, taking snapshots of your Cassandra databases, and so on. A third typical area of selection for defining your topology is networking. Sean mentioned that they considered in their deployment a single-region, single-data-center deployment within a single VNet.
We have seen other customers actually deploying Cassandra nodes across multiple physical regions, multiple physical data centers, using VNet-to-VNet connectivity, and we will see some of the options and some performance considerations around these three layers: the compute, networking, and storage options that we have available today in Azure. In terms of organizing resources within your Azure deployments, we recently introduced Azure Resource Manager as the backbone and the provisioning and managing mechanism across all other services.
That brings a lot of new capabilities compared to what we used to have in the past. ARM, or as we call it, Azure Resource Manager, is fully RBAC-based, so you can define role-based access control for all resources and all actions or operations that you're doing across your subscriptions. And then deployments are template-driven, so you can define your templates using
a JSON-based dialect describing how all the resources will be deployed in a single fashion through the orchestration engine that is part of Azure Resource Manager itself. It offers you two models: the declarative model, where you're passing a template to the ARM engine, or the imperative one, where you're actually interacting with resources directly.
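As a minimal sketch of the declarative model: an ARM template is a JSON document listing parameters and resources. The fragment below is a simplified illustration of the shape of such a template, not a complete Cassandra deployment template:

```json
{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "nodeCount": { "type": "int", "defaultValue": 4 },
    "adminUsername": { "type": "string" }
  },
  "resources": [
    {
      "type": "Microsoft.Network/virtualNetworks",
      "apiVersion": "2015-06-15",
      "name": "cassandra-vnet",
      "location": "[resourceGroup().location]",
      "properties": {
        "addressSpace": { "addressPrefixes": [ "10.0.0.0/16" ] }
      }
    }
  ]
}
```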
A resource group is a fundamental entity within Azure Resource Manager. It's a logical container, as I said, that each resource needs to belong to, and these resource groups are effectively the unit of lifecycle management for your Azure resources. So whenever you're deploying, for example, a Cassandra cluster, you need to define, as an example, your networking topology and your compute node topology. You may want to separate these into different resource groups, and this gives you the ability to define different lifecycle terms for these different deployed resources.
So you may want to, for example, get rid of your compute nodes because you're in a development environment, keep all the infrastructural components deployed, and deploy, for example, a new configuration on top of that. A deployment is another first-class concept in Azure Resource Manager: basically, you can organize and track template execution. You can read diagnostic information out of these deployments to understand if everything went well, and you can also create nested deployments
if your deployment topology is so complex that it requires some form of organization, like, for example, dependencies and some more complex processes behind the scenes. So Azure Resource Manager is actually helping us describe the topology of our solution deployed to Azure, but within each node, of course, we will need to execute a number of configuration operations, and in Azure Resource Manager we have the option to install on each node a set of VM extensions that are basically dedicated to a particular set of tasks.
For example, we have third-party extensions like Chef and Puppet to automate configuration of single nodes. We have custom script extensions that can deploy scripts from our own repositories and start doing VM configuration. The marketplace offering that Scott showed today, for example, is injecting into each compute node, each Cassandra cluster node, everything from the Java JDK up to OpsCenter on the OpsCenter node, and then it's
automating all the deployment tasks for us, and at the end of the day it gives us the Cassandra cluster up and running. But these extensions can also help you, for example, in the compute tier, in your application server tier, to inject on both Windows and Linux machines whatever application framework your solution actually requires. And with these ARM templates, we see a lot of end customers, but also ISVs and system integrators, starting to rely on these templates
to organize their deployments. These templates may be related to a particular solution area within your bigger infrastructure, for example building a particular Cassandra cluster, or the templates may describe the entire solution end to end; using this nested template mechanism gives you the ability to orchestrate very complex deployments.
What we're seeing is that most of these templates are organized to simplify, essentially, the set of options that you have available for your deployments, grouping these options into t-shirt-size types of deployments, so that if you need a small cluster, you know that you automatically get a given configuration in terms of storage, networking, and compute nodes. If you need a medium or a large one, these configurations will be different, and you can automate these different options within the same template
by adopting this nested template mechanism: based on the parameters that you pass, different sub-templates, or nested templates, will be called behind the scenes. There is a very interesting white paper that actually describes how you can design these complex deployment topologies by nesting and joining together multiple templates. This also gives you the ability, for example, to maintain, evolve, test,
and debug single, smaller units of your deployment process, but orchestrate the entire set of units within a single environment. At the end of the day, every ARM template deployment will be based on a set of parameters that you pass: usernames and passwords for compute nodes,
network configuration, the region where you want to deploy that particular template into, plus a number of other parameters. And you will have this nested template structure that gives you the ability, for example, to configure your OpsCenter nodes differently from the Cassandra nodes in your Cassandra cluster. So you define these roles within your larger deployment, and ARM gives you the ability to automate both
the resource deployment and the resource configuration within each part of the deployment. So if you're interested in this topic, I highly recommend going to these two GitHub repos. The first one is maintained by Microsoft and right now contains more than 200, if I remember correctly, different deployment templates orchestrating large solutions, including a lot of OSS frameworks and applications, from Cassandra to Elasticsearch and a number of others, and it also shows, step by step, how to deploy them.
Basically, today they offer the ability to deploy your DataStax cluster in the best, most optimized way possible, and there are also in the deck a couple of links that can take you directly to the DataStax training center to see, for example, how to deploy Cassandra on top of Azure with the CLI or with the Azure Marketplace. So, in terms of compute and storage options: these are, of course, super interesting topics
for achieving your sweet spot in terms of performance. We do recommend a couple of VM families for hosting Cassandra nodes. We have the D series, which is based on local SSD disks and Intel processors; it's definitely recommended if you want to, for example, balance cost and performance, because it's cheaper than, say, the G series, which is our top-class VM family, but you can still get a lot of performance from the ephemeral disks.
Even the biggest VM from the D series can give you up to 60,000 IOPS and sub-millisecond latency on local disks. But we also offer the ability to attach remote, network-attached storage; in particular, for production clusters we recommend Premium Storage, our provisioned-IOPS mechanism, which gives you low-latency and high-throughput capabilities on top of the DS or GS series of VMs.
So you can, for example, stripe together a number of Premium Storage disks and get up to 80,000 IOPS and, if I remember correctly with the latest developments, up to two gigabytes per second of disk throughput for a single node. So you can start small with a few cores and local SSD disks, and go up to 32 cores and terabytes of high-performance, low-latency storage attached to each node.
Data and commit logs are better suited to the local SSD drives or to the Premium Storage option, in case you want durable storage for what sits in your data tier. In terms of networking deployment options, depending on your network topology strategy, your replication factor, and so on and so forth, we offer as a basic concept this idea of a VNet: a private network environment where your nodes can communicate with each other
with low latency and high bandwidth. Actually, we don't throttle nodes that are talking to each other within the same VNet; the only limit is essentially the bandwidth available to each single VM, and that can go up to 20 gigabits per second for the largest VM that we have, the G5 or GS5.
If you want to implement a cross-region deployment and, for example, partition data across physical data centers, we have VNet-to-VNet gateways available that give you the ability to create a single address space spanning multiple data centers. Right now we have two options in terms of gateways: the standard one and the high-performance one.
The high-performance one can give you up to 200 megabits per second of throughput, and of course the latency will depend on how far apart the two data centers you select are. We do recommend geo-pairing between regions if you want to maintain low latency. We don't have a latency SLA; actually, nobody in the industry has a network latency SLA
on that point, but the measurements that we took are around 20 milliseconds of latency between the geo-paired data centers that we have, for example West US and South Central US, or some of the other 19 or 20 regions (I don't remember exactly the number today) that we have around the world. These gateways also have the ability to disable encryption on the VPN tunnels.
In case you already have application-level encryption, like Cassandra can offer you, then by turning off the gateway encryption you can get some additional bandwidth, just because the CPU and compute capabilities on these gateway nodes don't need to handle all the encryption work, so we can give you some more bandwidth between your different VNets across multiple data centers.
So, just to summarize: in Azure we have multiple options to give you the ability to deploy your Cassandra clusters. We have a highly automated, simplified mode through the marketplace offering, where you just click to create your new cluster and the cluster will be deployed in minutes. The last test that I did, for a 40-node cluster, took something like fifteen to twenty minutes, and you get your environment up and running with your OpsCenter.
Audience: [question inaudible]

Silvano: Yeah, it really depends. To give you an example, the largest VM families and VM sizes that we have actually own the entire physical node, so I'm assuming performance is exactly on par with similar hardware equipment in your own environment. But depending on your requirements, the VM type, and the VM size, you can of course get different behavior. We think that at the hypervisor level we're doing a good job of isolating
the performance characteristics of different tenants sitting on the same physical host, so there is a good match between a given VM of a given size and the equivalent hardware equipment. Of course, depending on your storage option, local storage versus remote storage, this can introduce a little bit of variance.
Audience: I'm interested in your choice of date-tiered compaction, since it's kind of fundamentally broken in its implementation, so I was wondering how you've been working with it.
Sean: The good thing is, any time we've gone to DataStax, they've already gotten that same problem from another customer and they have a patch for us, so they've been able to help us through those problems, and so far our compaction overhead has really gone down with date-tiered; before, we were using size-tiered. But it looks like we're out of time, so we'll be around after the talk if you have any questions, and tweet and email us. Thank you. Thanks.