Description
Speaker: Adrian Cockcroft, Technology Fellow
The SimianViz microservices simulator contains a model of Cassandra that allows large scale global deployments to be created and exercised by simulating failure modes and connecting the simulation to real monitoring tools to visualize the effects. The simulator is open source Go code at github.com/adrianco/spigo and is developing rapidly.
I wanted to start off by asking who was here: I wasn't at Cassandra Summit last year, but I think I was at most of the ones before that. So I wanted to summarize a few things that, if you were around, you might remember. Back in 2011, while I was at Netflix, we did some testing on scalability. We were running 24 nodes and we weren't sure what would happen when we went to 48.
So we tried scaling up, and it kept scaling all the way to 288. I posted a blog post, which DataStax then turned into an advert that kept popping up around the web for the following year or so, saying that's how it scales. So that was good. Then we managed to persuade AWS to do solid state disks, and then we hoovered up the entire supply of solid-state-disk-based instances for about a year after that, so everyone else said: well, they exist, but you can't get them.
So now I'm at Battery Ventures, a VC firm, and these are all the different things I do now. I obviously do due diligence on deals for the VC firm, on the companies that come in. I advise portfolio companies, giving technical support as a sort of consultant to the CTO: scalability, cloud migrations, Cassandra migrations. I'm responsible now for four or five companies that I think are switching to Cassandra, and they seem pretty happy about it.
Networking with interesting people: that's all of you. If you want to contact me after this, I've got a few cards, but I'm easy to find, so just go to Twitter if you want to talk about something, if you want to start a company, or if you want to talk about using some of our portfolio companies. Besides interacting with people, I also mess around with some technologies, and I'll talk a bit later about some code I've been writing, which is this simulation thing. I also do a lot of conferences.
I'm on the program committee, I present a lot, and I travel around presenting, but I also do quite a few internal talks at companies. So if you're particularly interested in having me come and do an internal talk, to kind of persuade your team that you should be doing microservices better or something, I'm happy to discuss that and see if I can fit something in. All right, so topics for today: I'm actually going to start by talking about microservices, from the point of view of why this is an interesting thing.
It was good to see Sam Newman's microservices book up this morning when Jonathan put up the slide, so you know it's a hot topic. What's going on there, and why is it that a lot of the backend for microservices ends up on Cassandra? I think it's a very good fit, and then I'll talk more about simulating architectures.
It really comes down to the canonical CIO (I guess Canonical's a company, but I mean the generic CIOs that are trying to get stuff done). The kinds of things they're struggling with right now they sometimes call the digital transformation, or the data deluge, or whatever, but what they're really trying to do is align the IT department with the business. What does that mean? How do they get more product focused instead of project focused?
There are threat problems: if you can't innovate as fast as some Bay Area company that's got a big web services thing, and you're an old enterprise that can't go that speed, then you're going to get eaten by them. And then also trying not to get breached: there's a lot of focus on security and on building systems that are much more robust now.
So when we were trying to figure out architectures in 2009 at Netflix, we were drawing on one of the original ideas behind DevOps, which is that if you built something, you run it. That means the developers own things in production, everything is self-service, and you're on call. What that drives is that developers have to own building things and own deploying things, and it's really nice.
If you can deploy an entire global Cassandra cluster in a few minutes, rather than having to go and argue with vendors for a few months, or with ops teams or whatever it is, and instead just push a button and it happens, that is an incredibly powerful thing. This is much more than just SaaS, standing up a webpage on Heroku, which you can do in a few seconds: it's the ability to deploy terabytes, or rather petabytes, of storage
trivially, and that's really important. It means developers are responsible for going faster, and they're also responsible for the efficiency of what they're doing: scaling it down when it's not being used, tidying up after themselves, and how efficiently they build product. And then they're also responsible now for the safety of it, how rugged or how secure systems are. Now, I'm not going to go through all of this; I don't have time to go through this much today.
But if you look up my presentations online, I've got lots of presentations that go into this in a lot more detail. This is what continuous delivery looks like nowadays: you're trying to observe what to do, orient yourself, decide which option to take, and then act on it. Observe is really about innovation: can you figure out what you should be doing? Can you find that customer pain point? Then you use big data analytics to figure out what's going on. The typical thing here is you're trying to answer a question
that's never been asked before. This is why you're grepping through log files, importing and cleaning stuff up, looking in places that no one's looked at before, to find this piece of customer pain, you know, one entry sort of hiding in a lot of log files. Then you're going to figure out what to do, and if you have the right culture you can get stuff done quickly, because it's also self-service: you don't have to go and ask for permission, and you're working on small pieces of work.
So it's easy, and then you use cloud to deploy this automatically. Getting around this loop faster than your competition is what makes you competitive nowadays, and a lot of people are suffering from this. You don't just go around the loop once, you bounce around it. Once you've got that, this works fine as long as there's a handful of people, but what happens when there's a large team of people? What happens is, if you build a monolithic application, you have a hundred people working on it.
They keep stomping on each other's toes. They keep checking in code that breaks somebody else's code, and it's hard to get a build put together. What you tend to find, and maybe some people have heard this phrase from their QA team: can we have more time to test this release? Has anyone ever heard that? If you give them more time, it turns out the release after that will be even bigger, because the developers kept putting more features in.
So you get more and more features in a release, and it's harder and harder to test. The correct thing to do to get around that is actually to shrink the size of releases, do smaller releases more often, and break the releases into multiple pieces, and this is really what microservices is about. What you're trying to do is not have one release plan where a hundred people are trying to coordinate around it; you have multiple release plans, with a handful of people for each, and you can do it.
Then you also get the ability to use different languages, different systems, different development approaches. The front end and the middle tier are all different kinds of things; they can use whatever they want, because they're not all lock-stepped on a single release model. Now, when you've got multiple things, you want to bundle them together onto a common platform, and a common way of doing that now is to put things in Docker containers, so your deployment platform just says: I know how to deploy containers.
At Netflix there's no reason not to release as often as you need to. What you can do then is make each release a single thing, which means it's really easy to tell whether it worked, and it's easy to back it out if it didn't, and you're not disrupting everyone else. The coordination time turns out to be the biggest thing that's slowing everyone down. So what's happening, then, is that as we reduce the cost and the size and the risk of change, we increase the rate of change.
So this is a bit weird: we're not doing the typical six-monthly releases, and we're not doing those ten times a day either; we're releasing tiny fragments of a release, one feature or one update or one bug fix at a time, and you've got a continuously evolving system. That's very different if you're coming from a world where we have a project and we're going to upgrade SAP for the next nine months, and then everyone sort of gets exhausted and runs away from it.
There are other labels for this, but mostly when people talk about microservices they're talking about the architectural patterns that people use to do this, and my definition is here; I'll just pop up some more bits. You want it to be loosely coupled, so you can update different pieces of it independently, and you want to have a bounded context, because that's how you decide what is the right size for a microservice. How big should it be?
How many different things should you put in it? What you really want is to have each microservice do one thing. When this applies to the data tier, you end up building data stores that do one thing, so you have lots and lots of Cassandra clusters, rather than one Cassandra cluster with lots of keyspaces in it, or one big huge schema with everything glommed together. You've got lots of denormalized stores and they all work independently.
They scale independently, they're owned independently, and you typically put a data access layer in front of them. You funnel everything through it, so everything looks more like a REST interface, and callers actually don't care what's behind it. This is one of the common patterns: say you're on MySQL
and you want to get to Cassandra. One of the open-source Netflix projects is called Staash, storage tier as a service over HTTP, spelled with two A's. It's a Java app with an HTTP interface, and it has a Cassandra client library in it, and it has a MySQL client library in it. So you put that in front of MySQL and start using it.
Then you add Cassandra, then you start gradually breaking pieces of your schema off, and gradually you end up all on Cassandra. But you've abstracted it away behind this data access layer, so you actually don't care what's behind it anymore.
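As a rough illustration of that migration pattern, here is a minimal Go sketch of a data access layer that hides which backend owns each keyspace. The names and routing logic are hypothetical, not Staash's actual API, but the idea is the same: callers hit one interface while pieces of the schema are cut over one at a time.

```go
package main

import "fmt"

// Store is the minimal backend interface the data access layer hides.
type Store interface {
	Get(key string) (string, error)
}

// Stub backends standing in for real MySQL and Cassandra clients.
type mysqlStore struct{}
type cassandraStore struct{}

func (mysqlStore) Get(key string) (string, error)     { return "mysql:" + key, nil }
func (cassandraStore) Get(key string) (string, error) { return "cassandra:" + key, nil }

// DAL routes each keyspace to whichever backend currently owns it,
// so business logic never knows (or cares) which database is behind it.
type DAL struct {
	oldStore, newStore Store
	migrated           map[string]bool // keyspaces already moved to Cassandra
}

func (d *DAL) Get(keyspace, key string) (string, error) {
	if d.migrated[keyspace] {
		return d.newStore.Get(key)
	}
	return d.oldStore.Get(key)
}

func main() {
	dal := &DAL{
		oldStore: mysqlStore{},
		newStore: cassandraStore{},
		migrated: map[string]bool{"ratings": true}, // cut over one piece at a time
	}
	v, _ := dal.Get("ratings", "movie42") // already migrated
	fmt.Println(v)
	v, _ = dal.Get("billing", "acct7") // still on MySQL
	fmt.Println(v)
}
```

As more keyspaces migrate, entries are added to the map, and when it's all on Cassandra the old backend is simply dropped; no caller ever changes.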
So that's the kind of pattern. There's a whole lot of books that I think are interesting in this space, in particular the Domain-Driven Design book.
You need to understand the Drift into Failure book, because that explains why these perfectly reliable systems will occasionally fall over: they hide all the brokenness until all the things that are broken gang up on you, and finally it tips over, and you find ten things are broken, because the system was so resilient it was hiding the nine other things you didn't know about. So you have to understand how to get in there and root out the things that are going to bite you later, as you build systems that are more and more highly available.
So I'll talk a bit about these microservices and cloud native and monitoring. What we've got in the cloud native world is a very high rate of change. We have code pushes causing floods of new instances and floods of new metrics, and there's a very short baseline for analysis. The configurations are very ephemeral; they don't live for very long. They could even live for less than a second.
Some analysis of Docker container lifetimes, over a three or four month period of New Relic monitoring all the Docker containers they monitored, showed that the most common lifetime was one minute, and the second most common lifetime was zero minutes, less than a minute, and everything more than one minute was way down. It was a graph with two tall spikes and a whole lot of small spikes, minute by minute. And these microservices are calling each other in complex patterns.
If you just look within Cassandra: Cassandra is sort of a microservice architecture itself. The flow between Cassandra nodes is actually fairly complex, with all the gossip and the hinted handoff and all the replication. So when you look at how you scale microservice patterns and how you monitor them, one problem is managing the scale, and it's not just the number of machines. There's this big complex hierarchy: you have it distributed across continents and regions and lots of zones if you're on AWS, or data centers
if you're on your own systems. There are lots of different versions of things running at once, there are lots of containers, and you can have tens of thousands of machines. So how do you deal with that? One problem, then, is: what does the flow look like? When you have a monolithic app, you hit one side of it, you stay inside it, and then it pops out at the database. That's relatively simple; there's no real flow there.
But when you build these microservice systems, you want to know what's going on, so there are flow visualization tools out there. The left one is a Netflix tool that I don't think they've released. The top right is an AppDynamics flow, which works up to a certain amount of scale, but once you get into huge numbers of microservices it can get more challenging to visualize. And the bottom right is from Twitter.
It's the output from a Twitter tool called Zipkin, which is a flow-based monitoring open source tool currently in the process of being turned into a more standardized thing; there's an OpenZipkin project going on right now, so that looks like a place where we could standardize things. But if you look at the architecture diagrams of a lot of sites, this is kind of what they look like: you have hundreds of services talking to each other, and you get what I call death star diagrams, where everything turns into a big circle.
You can no longer see what the structure is, and that's a problem, because we have interesting failure modes that are part of the structure of the system. When we look at failures, we want to understand that a zone went down, not that it looks like about a third of our machines suddenly disappeared. If you get a power cut or you lose connectivity, you need to understand structurally that everything in that zone broke, rather than treating it like everything else. I mean, you should know this, right?
This is a sort of diagram of a three-zone system. You've got a load balancer at the top, then you hit an API proxy, then a bunch of business logic, then one of these data access layers, and in the back I've got a 12-node Cassandra cluster spread across the three zones. The traffic comes in, and when it writes, it writes to Cassandra, and that's how the data gets to the other zones, so the next request can go to a different zone and the data is already there.
This is typical three-way replication; it should be pretty familiar to anyone that's playing with Cassandra. Now, if you lose a zone, what does your monitoring tool say? If you lose an entire zone, you're probably going to get a massive flood of errors. The system is going to explode your logging, and your monitoring system is going to complain mightily. But what should it really do? It should give you one message saying: you're still up, but don't mess with it.
You've lost all your redundancy. Cassandra is designed to run on two out of three zones; that's part of the point, that's why we were doing this. So the fact that you lost an entire zone's worth of systems isn't actually an outage to the end user, or shouldn't be. Maybe there are a few retries as it glitches a bit, and you want to get into the ELB and have it stop sending traffic to the dead zone.
So the code's on GitHub. There's a front end using d3, a JavaScript front end, and there's a back end written in Go, and they aren't really connected together yet. One of the things I'm gradually working on is getting it so that the front end can actually control the back end in real time and visualize what's happening. What happens now is you run the back end on the command line, it saves lots of files, and you then visualize those to see what's there. So how do I define that architecture?
One of the things I have is a chaos monkey, so one of the nodes gets deleted at some point during the simulation; I didn't need that here, so I didn't give it a name. Then I've got these tiers. In the first one you can see I just call Cassandra; I have a package which implements the behaviors, and I'll show you what that looks like later.
This is a Go package which pretends to be the sort of Priam/Cassandra mix; Priam is Netflix's management wrapper for Cassandra, so I put the two names there. It depends on itself, which means the Cassandra nodes talk to each other, so my dependency list for Cassandra has to list Cassandra. You can use any name, and you can create as many Cassandra clusters as you want doing this. I've got six nodes in one region.
The next one is my Staash data access layer, which is just a REST data access layer, so I just called it restdata; that depends on Cassandra, and I've got six nodes for that too. Then I've got my business logic: Karyon is the name of the Netflix project for generic business logic, and I've got 12 of them. This is a very simple app; it's just straight
B
Through
I've
got
an
API
proxy
I
have
a
load
balancer
and
at
the
top,
I
have
sort
of
the
DNS
entry
denominators,
the
name
of
the
DNS
management
layer.
That
thing
that
Netflix
built.
But
the
point
here
is
that
at
the
top
level,
when
you're
doing
globally
distributed
systems,
you
have
to
have
something
that
isn't
in
a
region
right.
B
So
it's
in
zero
regions
and
they're
0
count
because
it's
just
a
DNS
entry
and
you
can
see
that
the
elb
there
isn't
actually
an
instance
for
the
ALB,
it's
just
a
thing,
but
there's
one
per
region.
So
that's
why
those
numbers
are
set
up
that
way,
All right, so when you've done that, this is what it looks like; hopefully you can see the architecture. The leftmost thing is the purest form of this architecture, just one of each. At the front... I don't have a mouse pointer I can bring over.
I brought this simulation up and said: make it two hundred percent bigger, and then for the third one I made it four hundred percent bigger. So now I've got a 24-node and a 48-node Cassandra cluster across three zones, and I can make an arbitrarily complicated architecture and scale it to an arbitrary size. At times I've had a hundred thousand nodes running on my laptop in a couple of gig of RAM, which would be ridiculous to try and do in real life. So that's the point of the simulation.
I can create architectures that you couldn't really create in real life, either because you couldn't create them at all or because you couldn't create them cost-effectively. So what I'm trying to do is model these architectures. The next thing I can do, from that same file without any changes (this is all I specified), is tell it to have up to six regions. So then I do multi-region Cassandra, and the first one on the left is a two-region one; let's hope you can see it.
The contrast isn't that great. The dot in the middle at the top is the endpoint that splits by DNS to the two sides. You can kind of see the six zones there in two regions, and then it goes into this Cassandra cluster that's all clumped together in the middle, and the one on the right is three regions.
So the system can export to monitoring tools as if it was a real machine, with a real name and a real Amazon IP address, even though it's just faking it all on my laptop. And this is what four, five, and six regions look like, and by that time I get back to the same problem I had previously: it still looks like a Death Star. You can see the four-region one, and then I just gave up trying to stretch everything out and it's all collapsed into a big blob. Now, this is a simple system.
It's got one cluster with just a few things connected in front of it. I tried to create something more realistic: this one has three different Cassandra clusters sort of overlapping each other, and two different endpoints, which is again a fairly simple thing, but you can see it's getting harder and harder to visualize these systems. So there are two things I'm trying to do here. One is to create these architectures, which you can then feed to monitoring tools and say: can you figure out how to visualize this better,
please? So I'm trying to offend people by showing them badly visualized things, to get them to build better visualizations. There are some people out there using my tool to generate architectures, which they're then feeding into their monitoring systems in order to figure out how to display things better. There's one vendor that's actually using my tool now, and I've been talking to other tool vendors about this too. So, you know, I could get this to be so complicated
that I can't see it, so when we can render this better, maybe I'll generate some more complicated ones. So, has anyone here ever written any code in Go? A few Go programmers, OK. The code I have isn't very idiomatic Go code; somebody told me it looks more like Erlang. But this is what the thing looks like, and the entire package that implements all my Cassandra behavior ends up being about 200 lines of code.
I wrote about half of it in a few hours on a plane last weekend, while flying back from the UK, so in four or five hours I was able to do a fairly significant upgrade to it. Basically, every goroutine, which is a thread, basically, is simulating a real machine, the way you'd have an instance or a container or something you want to model.
I have a goroutine that just sits there in memory, I have channels for connecting them together, and I can send messages back and forth at a few hundred thousand a second. So that's basically what's going on. Every service looks roughly like this: it has a listener on a channel, a channel that has traffic going over it, so all of those diagrams were fully connected.
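A stripped-down sketch of that mechanism (hypothetical names, not spigo's actual types): each simulated service is a goroutine draining a channel, and a request is just a message carrying a reply channel.

```go
package main

import "fmt"

// msg is what flows between simulated services over channels.
type msg struct {
	body  string
	reply chan string
}

// service models one simulated node: a goroutine sitting on a channel,
// handling whatever traffic arrives, like the simulated nodes described above.
func service(name string, listen chan msg) {
	for m := range listen {
		m.reply <- name + " handled " + m.body
	}
}

func main() {
	// Wire up a tiny one-tier system: caller -> data layer.
	data := make(chan msg)
	go service("restdata", data)

	// Send a request and wait for the response, as one node would.
	reply := make(chan string)
	data <- msg{body: "get movie42", reply: reply}
	fmt.Println(<-reply)
}
```

Because goroutines are so cheap, tens of thousands of these "machines" fit in a laptop's memory, which is what makes the hundred-thousand-node runs possible.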
Now, when you send messages through the system, it records the flow they took through the system, and I'll show you what that looks like. Here's my Staash making one request. This is a single request coming into the Cassandra system that I'm simulating, and the second thing says t, p and s. The s is the span; a span is a connection between two microservices.
Each span has a unique number. The t is the transaction, or the request, and every related request has the same transaction. So when you hit the edge of this system, it creates a new transaction number, and then the p in the middle is the parent span.
Every time you land in an inner microservice and you want to call out, you take the span that got you there and you stick it in as the parent, and that way you can actually build this tree. So what I've got here is a put going into a multi-region Cassandra cluster that wanted to replicate the data, and this is actually what happens in the simulation.
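The transaction/span/parent bookkeeping can be sketched like this. The types and helper are hypothetical, but each hop stores the span that delivered it as its parent, so the recorded flow reassembles into a tree, as in the put-and-replicate example.

```go
package main

import "fmt"

// event records one hop: which transaction it belongs to, the parent
// span that caused it, its own span id, and the two endpoints.
type event struct {
	trans, parent, span int
	from, to            string
}

var nextSpan int

// call records a hop from one service to another, linking it back to
// the span that got us here so the trace forms a tree.
func call(trans, parentSpan int, from, to string, trace *[]event) int {
	nextSpan++
	*trace = append(*trace, event{trans, parentSpan, nextSpan, from, to})
	return nextSpan
}

func main() {
	var trace []event
	trans := 1 // new transaction number created at the edge of the system

	// Edge -> Staash, then Staash -> two Cassandra nodes (replication):
	// both replication hops share the same parent span.
	s1 := call(trans, 0, "edge", "staash", &trace)
	call(trans, s1, "staash", "cassandra-a", &trace)
	call(trans, s1, "staash", "cassandra-b", &trace)

	for _, e := range trace {
		fmt.Printf("t%d p%d s%d %s->%s\n", e.trans, e.parent, e.span, e.from, e.to)
	}
}
```

This is the same parent/child relationship Zipkin-style tracers use, which is why emitting the flows in Zipkin format, as mentioned later, is a natural fit.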
When you're talking to somebody, you don't actually know what their shards are. It's not like Cassandra, which has this full view where every node knows everyone's token in the ring; the way I have it, I don't really know that yet, so I've got to tinker with it a bit more. The reason there's another series of calls is that I landed on a node that turns out not to be the right owner for this piece of data, so I have to copy it to one of the other nodes, in zone B.
It's basically a protocol simulator where you can create arbitrary protocols with arbitrary forwarding logic, and you can go and explore what's going on, and we're gradually getting better simulations and better visualizations of the flow. I want to take these flows and put them into a much better trace visualizer, and I want to get the output to be in Zipkin format, so that it can basically feed anything that knows how to consume that. All right.
So why am I building this? Partly because I think the tools are currently doing a bad job of monitoring these large-scale configurations, particularly systems with Cassandra in them. There's a lot of internal structure to Cassandra that very few tools really understand, so I'm trying to get people to understand those structures. I also want to be able to grow and shrink this, to have autoscaling, so that this network I've created is not a fixed network. It grows and shrinks; it's actually a dynamic graph, and I can grow it and I can shrink it.
I can actually forget nodes; I can basically have chaos monkeys killing things. I can forget links and I can create partitions, so I can have a globally distributed system with traffic running through it, then cause certain types of partitions and certain types of outages, and then try and make sure the simulations can handle that.
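A toy version of that dynamic graph (names hypothetical, not the simulator's actual structures): nodes and links can be removed at any time, the way a chaos monkey would terminate an instance.

```go
package main

import "fmt"

// graph is a tiny dynamic graph of services and directed links, of
// the kind the simulator can grow, shrink, and partition.
type graph map[string]map[string]bool

// link adds a directed edge from a to b, creating a's entry if needed.
func (g graph) link(a, b string) {
	if g[a] == nil {
		g[a] = map[string]bool{}
	}
	g[a][b] = true
}

// chaosKill removes a node and every link pointing at it, like a
// chaos monkey killing an instance out from under its callers.
func (g graph) chaosKill(node string) {
	delete(g, node)
	for _, peers := range g {
		delete(peers, node)
	}
}

func main() {
	g := graph{}
	g.link("lb", "api")
	g.link("api", "cass-a")
	g.link("api", "cass-b")

	g.chaosKill("cass-a") // simulate losing a node

	// api is left with one surviving dependency.
	fmt.Println(len(g["api"]))
}
```

Partitions work the same way: instead of deleting a node, you delete the links that cross a zone or region boundary and watch how traffic reroutes.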
So that's where we're going. Just as a takeaway on that: if you want to know what's really happening in enterprises now, the Lean Enterprise book really documents
the struggle a lot of people are going through to get things brought into this modern age; it's sort of the enterprises trying to learn how to do continuous delivery. Building Microservices is an awesome book for figuring out all the different transitions you have when you're trying to get to this place, and I think Cassandra has a really key place in the whole microservices and continuous delivery transition.
So, we started a little bit late, but I can take a few questions. Like I said, I work at Battery Ventures; these are the companies that I mostly play around with, our current portfolio. If you want to talk to me about any of these companies, I'm happy to do that. Questions? Yep.
[Audience question.] Talking about the mapping of tables to clusters and microservices: you can do it lots of ways, but the sort of best-practice way, I think, of building out a microservice architecture is that you want to own the data sources. You want to own a data access layer and force everybody to go through that data access layer; you don't want anyone else accessing your database. You want to be able to hide any maintenance work or upgrades or anything you might want to be doing.
You can hide it behind that data access layer, which means your business logic code is all using REST calls into the data access layer from everywhere else. At Netflix that's enforced using security groups, so the only thing that Cassandra trusts is its data access layer, and everyone else has to go through that. So that's one model. Then you're optimizing differently here: you've got a read-denormalized data model, your writes are sprinkled across the system, and you have to have checkers that make sure all these databases stay in sync.
B
So
you
do.
There
is
some
work
to
be
done
there,
but
that
can
also
happen
in
the
data
access
layer.
You
can
have
threads
running.
There
may
be
at
night
when
it
gets
quiet
and
traffic
drops.
They've
got
some
extra
capacity
and
you
can
use
that
capacity
to
make
sure
that
all
of
your
foreign
keys
exist,
and
you
know
all
the
references.
All
the
cross
referencing
between
clusters
actually
still
works
right.