Description
At Comcast Silicon Valley we have developed a general-purpose message bus for the cloud. The service is API compatible with Amazon's SQS/SNS and is built on Cassandra and Redis with the goal of linear horizontal scalability. In this webinar we will explore the architecture of the system and how we employ Cassandra as a central component to meet key requirements. We will also take a look at the latest performance numbers.
Speaker: Boris Wolf, Senior Software Engineer at Comcast
Boris Wolf has more than ten years experience working for various technology startups in the Bay Area and is currently the engineering lead for the CMB project at Comcast Silicon Valley.
Moderator: Boris is also going to be at the Cassandra Summit on June eleventh and twelfth and will be presenting there, so if you haven't booked your tickets to that event, please make sure you do so; you can do so on the DataStax website. Before I hand over to Boris, just a couple of housekeeping items. Number one, we always get asked when the slides and video will be available from the webinar. We will aim to have those up tomorrow, and we will email all of the registrants and attendees when they are available.
Moderator: Also, we will be taking questions during the presentation, so if you have a question, please use the Q&A panel in WebEx and type your question there. We will hold them until the end of the presentation; I will read them to Boris and he will tackle as many as he can during that session. A couple of key dates are coming up for you as well: please don't miss the next of the community webinars in the series, and I'll also circle around on that at the end. So without further ado, I hand over to Boris Wolf.
Boris: Perfect, thank you. So today's topic is the Cloud Message Bus, CMB. This is a project that we've been working on at Comcast Silicon Valley for over a year now. What it is: it's a general-purpose message bus infrastructure that we implemented on top of Cassandra and Redis, and we've open sourced it. By now it's up on GitHub under the Apache 2.0 license, so you can check it out there if you're interested. And if you look a little bit closer, you see that CMB really consists of two separate services.
B
So
the
interesting
thing
here
is
both
these
services
are
API
fully
API
compatible
with
their
Amazon
counterparts,
sqf
sand
SNS,
if
you're
familiar
with
them,
and
so,
if
you
are
a
user
of
SGS
or
SNS
today,
you
could
switch
over
to
using
cmd
without
having
to
change
any
of
your
code.
So
the
compatibility
goes
so
far
that
you
can
actually
use
the
Amazon
SDK
to
to
access
our
version
of
it.
This
CMB
service.
B
So
today,
I
will
I
will
actually
talk
about
a
little
bit
about
the
motivation
behind
why
we
did
make
that
effort
of
implementing
our
own
version
of
it.
What
our
requirements
were
and
then
I
will
go
into
a
technical,
deep
dive,
explaining
the
architecture
and
how
we
use
Cassandra
and
Redis
to
our
advantage
and
then
finally,
I'll
present
a
couple
use
cases
how
we
use
this
message:
bus
in
house
and
talk
a
little
bit
about.
What's
what's
next
so
but
first,
the
question
why?
Why
did
we
build
our
own
version
of
it?
B
So
obviously
comcast
is
a
large
organization,
so
we
have
a
lot
of
project
teams
in
different
locations
and
many
of
them
need
some
sort
of
message.
Bus
infrastructure
and
what
we
find
is
that
different
teams
use
different
technologies.
So
some
and
abusing
off-the-shelf
open
source
solutions
that
are
available
out
there,
such
as
listen
to
others,
have
been
building
their
own
custom
solution
tailored
to
their
specific
requirements,
and
what
we
are
trying
to
do
is
really
having
one
infrastructure
that
that
everybody
can
can
standardize
on
and
share.
So
that's
one
reason.
B
The
other
reason
is
we're
operating
a
number
of
data
centers
across
the
country,
so
data
center,
failover
or
data
center
outage
is
a
possibility
for
us,
and
if
that
happens,
if
you
imagine
that
service
being
deployed
in
across
all
these
data
centers,
we
almost
wanted
to
be
agnostic
to
the
fact
that
there
are
multiple
data,
centers
involved.
So
so
one
example,
actually
you
could
think
of
is
if
you,
if
you
just
think
of
two
data
centers,
you
have
clients
in
both
data
centers
and
they
they
access
the
same
queue
by
n,
queuing
and
dq'ing
messages.
Boris: Ideally you want messages to flow freely between data centers, but at the very least, if we have a data center outage, we want to have a smooth failover where there is no disruption in service and we don't lose any data, any messages, in our queues. Then, finally, another aspect is that we need to scale. Obviously you want to scale the traffic you pump through an individual queue, the throughput, but we also need to scale in a different dimension.
Boris: So we need to scale in that respect as well, and then, finally, we have pretty tight latency requirements. At the beginning of this project we somewhat arbitrarily set a goal of 10 milliseconds response time for the 95th percentile. But what it really means is that, even though we want to deploy this service across multiple data centers, clients of the service should experience data-center-local response times.
Boris: We did evaluate a number of other options out there to use for this purpose, and we rejected every one of them for different reasons. And then, finally, we arrived at the Amazon SQS and SNS services, and we really liked their API and how these services are designed. So to get a little bit of an understanding, just in case you're not familiar with SQS, I'm giving you the 60-second primer here on how SQS in Amazon works. They call it the Simple Queue Service.
Boris: Everything is centered around guaranteed delivery there, which is a key requirement for us as well, by the way; message loss is really not an option for us. As for messages being delivered in order, the service makes a best effort there, but no guarantees are made. So in the happy-day scenario you will receive messages in order, but every once in a while you may see out-of-order delivery, and on the client side you would need to take care of that. The same goes for duplicates.
Boris: We have guaranteed at-least-once delivery, but there is a possibility of duplicates, and again the client needs to be able to handle that. Now, there are a few core APIs, very simple and straightforward. There is a send message call and a receive message call that you use to enqueue and dequeue messages, but then there is also a delete message call. So why do you need delete message?
Boris: That's sort of the Amazon twist here in this spec. The way this works is basically that the assumption is you do not trust the message recipients, the clients that pop messages off the queue. So imagine you have multiple identical clients competing for messages from the same queue. Now one client happens to pop a message and then crashes before that client was able to actually adequately process and deal with that message. When that happens, the message becomes visible on the queue again after a timeout, so another client can receive and process it.
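The receive/delete contract just described can be sketched from a client's point of view. This is a minimal in-memory model of the semantics, not Amazon's or CMB's implementation; the class and method names simply mirror the SQS API.

```python
import time

class Queue:
    """Minimal sketch of SQS-style receive/delete semantics."""
    def __init__(self, visibility_timeout=30.0):
        self.visible = []        # messages available to receivers, FIFO
        self.invisible = {}      # msg_id -> (body, redelivery deadline)
        self.timeout = visibility_timeout
        self._next_id = 0

    def send_message(self, body):
        self._next_id += 1
        self.visible.append((self._next_id, body))
        return self._next_id

    def receive_message(self, now=None):
        now = time.time() if now is None else now
        # First make any timed-out messages visible again.
        for mid, (body, deadline) in list(self.invisible.items()):
            if now >= deadline:
                del self.invisible[mid]
                self.visible.append((mid, body))
        if not self.visible:
            return None
        mid, body = self.visible.pop(0)
        self.invisible[mid] = (body, now + self.timeout)
        return mid, body

    def delete_message(self, mid):
        # Only an explicit delete truly removes the message.
        self.invisible.pop(mid, None)

q = Queue(visibility_timeout=30.0)
q.send_message("m1")
mid, body = q.receive_message(now=0.0)   # client A pops "m1" ...
# ... client A crashes without deleting; after the visibility
# timeout the message reappears and client B can receive it.
mid2, body2 = q.receive_message(now=31.0)
assert body2 == "m1"
q.delete_message(mid2)                   # now it is gone for good
```

The point of the three-call design is exactly this: a receive only hides the message, and only a delete acknowledges it, so a crashed consumer never causes message loss.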
Boris: So why did we develop our own version? Why did we clone the service? Yes, we do like the fact that it offers guaranteed delivery; we like the simple, standard API and the scalability of it. But, business concerns aside, the biggest technical reason using SQS didn't work for us is first of all latency, because by definition everything you do in the Amazon cloud is a cross-data-center activity.
B
Unless
you
use
sort
of
like
deploy
all
your
services
in
Amazon,
which,
at
least
in
the
near
term,
is
not
really
an
option
for
us
and
then
also
the
fgs
service
by
amazon
has
certain
limitations
in
terms
of
message
size,
so
message
size
is
limited
to
65
kilobytes,
for
instance,
certain
time
out,
there's
like
a
message,
retention
periods
that
cannot
be
extended
more
than
I
believe
10
days
and
then
finally,
there
you
can
schedule
future
dated
messages
and
again
an
fgs.
We
have
the
limit
of
no
more
than
15
minutes
into
the
future.
Boris: By implementing this spec on our own, all these limitations simply become entries in a properties file for us that we can adjust to our own needs.
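As an illustration, such limits might live in a configuration file like the sketch below. The key names and values here are invented for illustration; they are not CMB's actual configuration keys.

```properties
# Hypothetical CMB-style settings; names and values invented for illustration.
cmb.cqs.maxMessageSizeBytes=262144
cmb.cqs.maxMessageRetentionSeconds=1209600
cmb.cqs.maxDelaySeconds=7200
```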
Boris: So to summarize what we actually set out to do a year ago: we built a horizontally scalable queuing service on top of Cassandra and Redis, which is API compatible with Amazon SQS and SNS, I should say. A little bit about why we chose Cassandra and Redis. Cassandra we chose basically for the cross-data-center persistence and replication ability it comes with.
Boris: Essentially out of the box; we don't have to worry much about it, and it was a key requirement for us. And we know from experience in house and from the literature that it's proven to scale massively horizontally. And then Redis: at first we implemented this purely on Cassandra, and then a little bit later we did some refactoring.
B
We
brought
reddit
into
the
picture
simply
to
meet
our
tight
latency
requirements
and
also,
as
you
will
see,
Redis
is
not
a
simple
just
a
simple
cash
I'd
like
memcache
d,
for
instance,
but
it
also
provides
a
rich
set
of
data
structures
such
as
lists
and
so
forth,
and
we
use
that
to
our
advantage
to
help
with
the
best
effort
ordering
of
messages
and
to
handle
the
visibility.
Timeout
mechanism
that
I
was
just
describing
when
I
explained
the
dilute
message,
a
TI
of
the
sqf
spec,
so
at
first
before
I
go
into
the
entire
architecture.
B
So,
if
you
are
so
this
year,
should
is
supposed
to
be
a
representation
of
that
Callum
family,
that
that
represents
the
messages
that
are
stored
in
a
queue
and
if
you
are
coming
from
a
relational
database
background,
as
we
were
at
the
time
you
may
be
tempted
to
say,
okay
I
want
to
store
all
the
messages
that
are
associated
with
one
queue
in
a
single
column.
So
if
you
do
that,
basically,
what
you're
saying
is
so
you
could?
B
The
roki
could
be,
for
instance,
the
timestamp
when
the
message
was
received,
and
then
you
have
a
single
static,
predefined
column
called
message
body
here
in
this
diagram
and
then
as
messages
come
in
message,
1
2,
3,
4,
5
6
they
come
in
with
their
time
stems
the
time
stamps
will
be
the
row
key
and
the
messages
will
be
values
in
that
message.
Body
column.
Now
the
problem
with
that
is
that
what
kind
of
questions
do
you
want
to
ask
the
Q?
B
You
probably
want
to
look
at
the
head
of
the
queue
and
the
tail
of
the
queue.
Maybe
you
want
to
squeeze
in
future
dated
messages
somewhere
in
between
and
the
problem
is
out
of
the
box.
Cassandra
doesn't
allow
you
to
do
like
range
life.
Queries
that
you
would
need
to
be
able
to
do
to
figure
that
out.
So
basically,
unless
this
would
only
work
if,
instead
of
the
default
random
partitioner,
you
would
use
the
order
preserving
partitioner,
which
sort
of
like
preserves
the
order
of
rows
as
they
are
put
into
nodes
on
the
ring.
B
Basically,
everybody
we
talked
to
who
knows
cuz
hundred
better
than
we
did
at
the
time,
was
sort
of
advising
against
it
and
said
order.
Preserving
partitioner
have
some
disadvantages
and
you
should
really
only
use
them
if
you,
if
you
don't,
have
any
other
options,
so
we
were
like
okay.
That
was
our
first
design.
Why
don't
we
do
it
another
way
around
so
here
we
are
storing
all
the
messages
that
are
in
a
queue
in
a
single
row.
So
again
we
have
time
stamps
and
then
message
bodies
that
are
associated
with
that.
Boris: But now you find all these messages in a single row, and the row key is the queue ID, in this case Q123 as an example. Now you take advantage of a very neat feature of Cassandra, which is that it naturally sorts all the columns in a row, in this case by timestamp.
Boris: So we will see all the messages nicely sorted in chronological order, and you can now do Cassandra slice queries that allow you to access the head and the tail of the queue and do all the kinds of operations that are important to us. Now, there's one problem here with this design, and that is that in Cassandra this row, obviously, if you have a very busy queue, could get pretty wide.
Boris: You will end up having a very wide row, and all the messages that are in one row will always map to one node in the Cassandra ring, so that could be a problem or a potential bottleneck: if you have a very hot queue, you would route all the traffic for that queue through that one single node.
Boris: Also keep in mind here that implementing a message queue on Cassandra is not really the typical use case. In the literature there are a lot of examples of using Cassandra for write-heavy applications and time-series data such as logging, where you keep writing a lot of data in some sequential order, every once in a while you read, and you rarely delete. But in our case this is different: a queue has almost equal amounts of reads, writes, and deletes.
Boris: So we have a lot of churn here, and that will also produce a lot of tombstones, and we need to control that as well. So we figured that, in our case in particular, having all the messages for a queue in one row would be too limiting. What we ended up doing is sort of a hybrid approach, where we shard
all the messages for a queue across a number of rows. In this case, by default, we shard across 100 rows, but you could change that per queue to a different number as well. Here in this example, again you have the queue with the ID 123, and then you have shards that are indexed from 0 through 99.
Boris: As you can see, as messages come in they are written into random shards of that queue, but they still preserve their ordering, because we're using timestamps as column names. Actually, I'm introducing another thing here in this slide: instead of timestamps, you should probably use time UUIDs. Essentially these are still timestamps, but in a composite key that adds some entropy.
Boris: If you had two clients sending messages at the same time, with the same timestamp, those messages would override each other; time UUIDs are globally unique and guarantee that you don't run into that situation, where you would see collisions if there is a lot of traffic going across that queue. So again, by doing this sharding across 100 rows, we now have the advantage that we are automatically distributing the traffic for a queue across up to 100 nodes in the Cassandra ring.
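The sharded layout just described can be sketched as a toy in-memory model. This is an illustration only, with Python's `uuid1` standing in for Cassandra's TimeUUID column names and a plain dict standing in for the column family; the function names are invented.

```python
import random
import uuid

NUM_SHARDS = 100          # CMB's default; configurable per queue

# In-memory stand-in for the column family: one "row" per
# (queue_id, shard_index), with columns keyed by a time-based UUID.
store = {}

def send_message(queue_id, body):
    shard = random.randrange(NUM_SHARDS)     # random shard per write
    col_key = uuid.uuid1()                   # TimeUUID: timestamp + entropy
    store.setdefault((queue_id, shard), {})[col_key] = body
    return (queue_id, shard, col_key)        # the full "message ID"

def head_of_shard(queue_id, shard):
    row = store.get((queue_id, shard), {})
    if not row:
        return None
    # uuid1's time field sorts columns chronologically within a shard,
    # like Cassandra's natural column ordering.
    first = min(row, key=lambda k: k.time)
    return row[first]

ids = [send_message("Q123", f"m{i}") for i in range(6)]
q, shard, col = ids[0]
assert head_of_shard(q, shard) == "m0"   # m0 is the oldest in its shard
```

Because the column keys embed the send time, each shard stays chronologically sorted on its own, which is what preserves the per-shard ordering the talk relies on.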
B
If
your
ring
is
even
that
big,
but
the
pointer
is
that
we
are
using
multiple
nodes
in
the
ring
to
to
deal
with
the
tubes
traffic,
and
that
is,
that
is
exactly
what
we
want
now.
The
only
problem
that
you
are
facing
here
is
now
I
mentioned.
We
are
sending
messages
into
random
shards
and
we
are
also
reading
from
random
shots.
B
So
that
could
mean
in
this
example
here,
if
you
happen
to,
if
you,
if
you
issued
a
read
request-
and
you
happen
to
read
from
from
chard
00,
you
will
lead
message
one,
but
because
it's
random,
you
could
also
end
up
breathing
from
from
row
99,
as
you
would
be
message
three
before
message
one.
However,
you
would
never
read
message
six
before
you
read
message
one
because
that
is
in
the
same
row
00
and
you
would
sort
of
like
get
message
one
first.
So
globally
we
still
have
good
ordering
of
messages
but
locally.
Boris: for any two given consecutive reads, there is still some probability of out-of-order delivery, which is okay; that's not against the contract. But again, we are trying to make a best effort here, and you will see that this is where Redis comes into the picture, which helps us straighten out that ordering problem. But at least from a Cassandra perspective,
B
This
design
here
charting
messages
across
multiple
rows
and
using
tiny
IDs
as
column
names,
because
the
design
we
ultimately
chose-
and
that
has
been
working
well
for
us,
so
to
explain
how
Redis
works
here.
I
will
use
a
a
quick
dataflow
example.
I
will
use
one
queue
and
send
three
messages
onto
that
q
messages
123
in
that
order,
and
then
I
will
receive
a
message
and
of
course
that
will
be
messaged
one,
because
that
was
the
first
1i
sent.
And
finally,
I
will
delete
that
message
to
really
pop
it
off
the
queue.
Boris: Here you see, in the upper left, our web service endpoint, the simple REST API endpoint; we host that on Jetty. In the lower left of the diagram you see the Cassandra column family that I just explained, which stores queue messages across a number of rows. And then on the right you see Redis. As I mentioned before, Redis provides different types of data structures, so we have a list here and then hash tables and so forth, and we will see how these things work together.
B
So
I
friend
in
a
first
message,
the
message
will
be
written
on
a
random
charge
and
that
Cassandra
Collins
family
and
we
put
the
message
ID
on
to
a
list
in
Redis
and
we
have
one
list
in
Redis,
/
q
and
the
message.
Id
is
an
aggregate,
a
composite
really
of
the
row
key
and
the
column
name.
So
we
have
the
full
information.
We
need
to
pinpoint
the
message
in
Cassandra
the
queue
ID
the
shards
index
and
the
time
uuid.
B
Then
we
send
two
more
messages
and
you
can
see
they
trickle
in
they
go
into
different
charts
or
random
shards
in
Cassandra,
but
they
all
go
into
the
same
list
in
Redis
and
they
are
nicely
ordered,
because
essentially,
that
list
is
an
in-memory
queue.
If
you
will
so
that's
what
what
helps
us
doing?
The
sorting
so
now,
when
I,
receive
a
message,
I
pop
that
message
off
that
list
in
Redis
and
notice
what
happens
in
Cassandra?
Essentially,
nothing,
we
don't.
We
don't
delete
anything.
We
don't
write
anything.
We
just
don't
do
anything.
B
Every
Empire
dealing
with
the
disability
time
what
happens
in
in
Redis,
which
saves
us
a
delete
in
the
write
operation
in
Cassandra,
which
helps
us
again
controlling
the
churten
that
we
create
there.
Now
the
visibility
visibility
timeout
is
handled
in
this
hash
table
that
is
labeled
automatic
VTO.
So
you
see
here,
as
the
message
was
popped
off,
that
Redis
list.
As
the
message
ID
was
top
of
that
red
is
list,
it
was
placed
into
that
hash
table.
B
So
the
message
key
here
is
the
key
in
that
hash
table
and
the
value
is
actually
a
time
stamp,
which
is
dated
30
seconds
into
the
future,
because
30
seconds
is
the
default.
Timeout
visibility,
timeout
for
messages,
so
there's
a
background
process
that
keeps
checking
that
hash
table.
If
messages
has
to
be
made
revisit
all.
But
in
this
scenario
we
are
assuming
it's
the
happy
day
scenario.
B
The
client
processes
that
message
one
in
the
correct
way
and
comes
back
and
deletes
the
message,
and
that
is
then
finally,
what
removes
it
from
that
Redis
hash
table
and
then
finally,
we
do
a
delete
from
from
Cassandra,
which
essentially
removes
that
message
from
the
Cassandra
Collins
family,
so
I
know.
This
was
really
quick
and
brief,
but
essentially
this
is
really
the
core
of
how
entering
and
dq'ing
works
together
with
the
Cassandra
and
Pettis.
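The whole send/receive/delete dataflow just walked through can be condensed into a small sketch, with a plain Python list and dicts standing in for the Redis list, the "automatic VTO" hash, and the Cassandra column family; all names here are illustrative, not CMB's code.

```python
import uuid

cassandra = {}          # (queue, shard, timeuuid) -> body
redis_list = []         # per-queue Redis list of message IDs, arrival order
redis_vto = {}          # "automatic VTO" hash: message ID -> revisible-at time

VISIBILITY_TIMEOUT = 30.0

def send(queue, body, shard=0):
    mid = (queue, shard, uuid.uuid1())
    cassandra[mid] = body               # write the body to Cassandra
    redis_list.append(mid)              # and the ID onto the Redis list
    return mid

def receive(now):
    mid = redis_list.pop(0)             # pop off the list; Cassandra untouched
    redis_vto[mid] = now + VISIBILITY_TIMEOUT
    return mid, cassandra[mid]

def delete(mid):
    del redis_vto[mid]                  # leave the VTO hash ...
    del cassandra[mid]                  # ... and only now delete from Cassandra

def revisibility_sweep(now):
    # Background process: anything past its deadline goes back on the list.
    for mid, deadline in list(redis_vto.items()):
        if now >= deadline:
            del redis_vto[mid]
            redis_list.insert(0, mid)

for i in (1, 2, 3):
    send("Q123", f"message {i}")
mid, body = receive(now=0.0)
assert body == "message 1"              # FIFO order via the Redis list
delete(mid)
assert "message 1" not in cassandra.values()
```

Note that a receive touches only Redis; Cassandra sees one write on send and one delete on delete, which is the churn-saving point made above.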
Boris: Now, a quick architecture recap. Cassandra is our persistence layer; messages, as I mentioned, are sharded across 100 rows, and we mainly do that to distribute the queue traffic among the Cassandra nodes in the ring. Redis we introduced as a caching layer to meet latency requirements, but then we realized that it also lends itself to improving the orderly delivery of messages, and we use it to handle all of the message visibility, all of the business that happens between receiving and deleting a message; Cassandra doesn't have to be involved in that.
Boris: So what are some of the key features we used from Cassandra that made Cassandra so appealing to us? Obviously the main reason, the driver, was that Cassandra has this ability to replicate data across data centers pretty much by default; you don't have to do much or configure much to make that happen. The other thing that was appealing to us is the aspect of tunable consistency. For all the reads and writes we are doing, we're choosing local quorum reads and local quorum writes, and what that means is that, within the ring in your local data center,
B
We
are,
we
are
accepting
a
read
or
a
write
operation
as
successful
as
soon
as
a
majority
of
the
replicas
in
your
local
data
center
have
confirmed
the
operation
we.
We
know
that,
eventually,
all
this
data
will
also
travel
to
the
ring
in
as
it
is
it
spans
multiple
data
centers
into
other
data
centers,
but
we're
okay
with
the
fact
that
there
is
a
short
period
of
time
where
we,
where
we,
where
we
cannot
be
sure
that
the
data
has
already
arrived
there.
Now,
you
could
turn
that
consistency
level
app.
B
You
could
turn
it
to
like
either
all
or
all
quorum
all
or
you
could
turn
it
down
to
just
one
or
any,
and
basically
what
you
are
doing,
then,
if
you
are
you're
sort
of
like
trading
response
time
for
guaranty
in
terms
of
whether
your
data
is
actually
persisted
or
not,
but
it
seems
local
quorum
reads:
are
a
really
good
balance
between
response
time,
latency
and
and
availability.
Basically,
so
so
we
ended
up
doing
that,
but
you
could
change
that
if
you
wanted
to.
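For intuition, the quorum arithmetic behind this choice is simple; here is a small sketch, assuming a replication factor of 3 per data center (the function name is just illustrative).

```python
def local_quorum(replication_factor):
    # A majority of the replicas in the local data center.
    return replication_factor // 2 + 1

# With 3 replicas per data center, 2 local acknowledgements suffice,
# so a read or write never has to wait on a cross-data-center round trip.
assert local_quorum(3) == 2
assert local_quorum(5) == 3
```

That is the whole trade-off: LOCAL_QUORUM gives durable acknowledgement within one data center at local latency, while cross-data-center replication completes asynchronously.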
B
The
other
reason
we
chose
cassandra
is
because
we
need
to
scale
to
millions
of
cues,
as
I
mentioned
before,
and
we
know
cassandra
has
the
ability
to
scale
as
far
as
your
resources,
let
you
basically
but
then
once
you
get
into
the
nitty-gritty
detail
of
actually
implementing
it.
There
are
a
number
of
really
neat
features
that
that
make
cassandra
work
very
well
for
us,
for
instance,
I
mentioned
there
is
a
message
retention
period
if
nobody
picks
up
a
message
of
of
a
queue
by
default
after
four
days.
B
This
message
will
just
disappear
and
the
way
we
implemented,
that
is
by
putting
a
TTL
a
time
to
live
of
four
days
on
every
column,
value
that
that
represents
a
message
in
in
a
queue.
So
we
don't
really
have
to
do
anything.
It's
it's
simply
the
cassandra
TTL
that
will
ensure
that
we
guarantee
that
this
message
retention
period
of
net
so
again
I.
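The same idea can be written as a CQL-style query fragment; this is only an illustration, with invented table and column names, not CMB's actual schema.

```sql
-- Illustrative only: hypothetical table/column names, not CMB's schema.
-- 345600 seconds = 4 days; the column silently expires after that,
-- so expired messages need no cleanup job at all.
INSERT INTO cqs_messages (queue_shard, message_id, body)
VALUES ('Q123_00', now(), 'message 1')
USING TTL 345600;
```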
Boris: We came to this from a relational database background, and the team that worked on it didn't really have any Cassandra experience coming into this. So here are a couple of quick lessons learned for us in transitioning from the relational database world to the NoSQL world. A lot of introductory articles make this analogy, saying: oh, you can think of a column family as a table, right, because you have columns and rows and so forth. But we found that this analogy is really not very helpful.
Boris: Rather, and I'm a Java person, so I wrote some Java syntax here for what my understanding of a column family is: you should think of it as a HashMap that maps row keys to TreeMaps, and those TreeMaps in turn map column keys to column values. The only reason I'm choosing TreeMap here is because that's Java talk for saying: I want a map that sorts by its key. So you have this sorting aspect here:
Boris: Column values are sorted by column keys, whereas the rows are not necessarily sorted by row key, unless you choose the order-preserving partitioner. If you look at it this way, all of these "no"s that come with NoSQL start to make a lot of sense, right? So yes, there's no need to specify column names in advance; you don't need a static schema like you would in a relational database, and this is exactly what enables things like wide rows, valueless columns, and composite keys; all those things then start to make sense. Furthermore, you have no unique constraints and no foreign keys, which really means you have no complex queries; you cannot really write complex queries. So the answer is really to identify what the key data structure in your system is, and design for it.
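The speaker's Java analogy (a HashMap of row keys to TreeMaps of column keys) can be rendered as a small sketch; here it is in Python for consistency with the other examples in this write-up, with a plain dict plus sorting standing in for the TreeMap.

```python
# A column family as a map of sorted maps: an unsorted outer map from
# row key to an inner map whose entries are read back sorted by column key.
column_family = {}                  # row key -> {column key -> value}

def put(row_key, column_key, value):
    column_family.setdefault(row_key, {})[column_key] = value

def get_row(row_key):
    # Columns come back sorted by column key, as Cassandra stores them;
    # row order itself is NOT defined under the random partitioner.
    row = column_family.get(row_key, {})
    return sorted(row.items())

put("Q123_00", 3, "message 3")
put("Q123_00", 1, "message 1")
assert get_row("Q123_00") == [(1, "message 1"), (3, "message 3")]
```

Seen this way, a "schema" is just whatever keys you choose to insert, which is why wide rows and dynamic column names fall out naturally.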
Boris: There are indexes, right: you have the primary index on the row key, and you can define secondary indexes on predefined columns, and by all means use those to make your queries more efficient where it makes sense. I just thought that was interesting. Now, quick words on scalability and availability. The key operations on the queue, send, receive, and delete, all scale with the Cassandra ring and with the API web service endpoints, which are stateless, and of course Redis is sharded.
Boris: One thing I didn't mention, by the way, is that we are using sharded Redis, and the sharding key is the queue ID. All operations are constant-time operations, so they scale well. I guess the only potential bottleneck is really introduced by Redis, not by Cassandra, in the sense that, since we use the queue ID as the key for sharding over Redis, all the traffic for one queue will always go over a single Redis shard. But in practice Redis performs really well.
Boris: That hasn't really been an issue for us. As for availability, we're banking on the availability of Cassandra here, and we feel that's a pretty safe bet. The other interesting aspect is that the service functions without Redis: if Redis goes down, the service is still available and there's no message loss. The only things you would observe are degraded performance and, because we lose the management of visibility timeouts, a lot of duplicates; but the service is still available and you still have the property of guaranteed delivery.
Boris: Here we have two data centers. One is the currently active data center, so all traffic that comes from clients requesting operations on queues will be routed to the local load balancer in data center one, which will then round-robin across the service endpoints, and those will then access the Cassandra ring and the Redis shards to do their work. Now, the one thing you notice here is that the component that spans across data centers is the Cassandra ring; this is an eight-node ring with four nodes in each data center.
Boris: The only problem is that, of course, the Redis shards in data center two are completely unaware of all the queues and all the messages; they have just been sitting there in standby. And again we have background processes that kick in in this case: if there are cache misses, background processes will start warming up these caches, and that will take, depending on how much data you have in the system, seconds or a couple of minutes. But after a while the service will be fully available with the same performance.
Boris: We don't really have the time to go into the same level of detail here, but I just want to briefly point out a few things about the CNS service. It's a topic-based publish-subscribe service. The supported protocols for subscribers are primarily HTTP, so usually you do HTTP posts to the subscriber endpoints tied to a topic. But you can also subscribe queues as endpoints, so you can publish into a CQS queue or even into an Amazon SQS queue, because they are API compatible, right? Again, there are a few core
APIs: primarily you just have to deal with creating a topic, then you subscribe endpoints to that topic, and then, finally, you publish messages on a topic, which will then result in sending out those messages to all subscribers. That's really the core of it. I will again explain how this works, or how we implemented it, using a very simple dataflow example. In this case I'm just making one call: we're assuming the topic is already created and the subscribers are already subscribed, and we just publish a message, message
one. The topic we're dealing with here is topic T, and we are assuming there are four subscribers: there's S1 and S2, which are HTTP subscribers, and then there's S5 and S6, which are two CQS queues as subscribers. So again I have a little illustration here; you see how the publish message call comes in at the web service endpoint on the left. Just to clarify, the two green boxes are what you would see
as a user: you see the web service endpoint that you can hit with API calls to CNS, where you publish messages or subscribe or unsubscribe, and of course, on the subscriber side, you also see the endpoints where messages arrive. What you see in the middle, that blue box, is really the internals of how we implement the CNS service; you as a user never see that from the outside. I'm just explaining the architecture here and how we implement the service. And the interesting
thing I want to point out here is that we use CQS queues to implement the CNS service, and we use these queues to distribute the burden of sending out to a potentially large number of subscribers. If you think of a topic with a thousand subscribers or so, that is a lot, right; if you were to do that with a single publisher, that's a problem. We use CQS queues to distribute the workload here and make latency manageable.
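Queue-backed fan-out as just described can be sketched in miniature. This is a toy model with invented names: in the real service the internal queues are CQS queues drained by background publisher workers, whereas here plain deques and a list stand in for them.

```python
from collections import deque

NUM_PUBLISHER_QUEUES = 2          # CNS uses internal CQS queues; 2 for the toy

subscribers = ["s1", "s2", "s5", "s6"]
work_queues = [deque() for _ in range(NUM_PUBLISHER_QUEUES)]
delivered = []

def publish(topic, message):
    # The endpoint only enqueues work: one delivery job per subscriber,
    # spread across the internal queues instead of posting inline.
    for i, sub in enumerate(subscribers):
        work_queues[i % NUM_PUBLISHER_QUEUES].append((sub, message))

def run_workers():
    # Background workers drain their queues; in the real service a
    # crashed worker's jobs reappear thanks to the CQS visibility timeout.
    for q in work_queues:
        while q:
            sub, message = q.popleft()
            delivered.append((sub, message))   # stand-in for an HTTP POST

publish("topicT", "message 1")
run_workers()
assert sorted(s for s, _ in delivered) == ["s1", "s2", "s5", "s6"]
```

The design point is that the publish call returns as soon as the jobs are enqueued; the latency of reaching a thousand endpoints is absorbed by the workers, not the publisher.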
Boris: That's the workflow here. Now, the interesting thing is that using CQS queues as the underlying mechanism works well for us, because messages are preserved even when those background workers that do all the fanning out are down or overloaded, so nothing gets lost. Again we have guaranteed delivery, because we are building on that feature of the CQS queue, and also the visibility timeout of the CQS queue, the built-in visibility
timeout, takes care of the guaranteed delivery aspect as well: if one of those background processes dies while it is publishing to an endpoint, before it gets to delete the message from that internal queue, the message will reappear on the queue and a different worker will take care of it. So that works very well for us. This was brief;
there's a lot more detail, but that was my quick high-level intro to the CNS publish-subscribe service. Now, a comparison of our implementation with Amazon SQS and SNS. Our goal is full API compatibility, and we are pretty much there: we are implementing all APIs, pretty much all parameters are supported, and you can even use the Amazon Java SDK and other language bindings to access the service.
Boris: There are some things we do not implement yet. One example is digital signatures: the latest version, version four, we don't support; versions one and two are okay, and we're working on catching up there as well. But really there are not that many limitations. SMS endpoints we don't support right now, because we didn't really have a need for them, but again, we could add all these things
if there was a need for them. More significantly, however, we do provide a number of enhancements, most notably the fact that you can create an unlimited number of queues, topics, and subscriptions in our system, whereas in SNS in Amazon, I believe, you are limited to a hundred topics per user, with similar restrictions elsewhere. And you can adjust all the other parameters, such as message size and timeout periods and so forth.
Boris: With that I'm now transitioning to an actual use case, what we are using this for in house, and this is a use case for the CNS pub-sub service. What you see here is a screenshot of our X1 set-top box, which we are now rolling out to customers, and what you see here is what we call the sports app.
Boris: On the left, where you see Travel Channel, that would be the area where you typically watch the TV channel you're currently watching; but then you can open apps, and in this case we would call that an L view, because you have an app that occupies part of the screen, and on the left you can still actually see the program. This sports app allows you to follow live sports events; we cover the NHL, the NBA, and a couple of other leagues, and you can see what games are currently on
and what the current score is. You can go into a detail screen that shows you the detailed score, the last play event, and other statistics about the game, and if you stay on that screen, it will update as the game progresses and events come in. In the back end we drive that by using a CNS topic, a sports topic, that publishes and fans out these game event notifications to all sports apps currently running on customers' set-top boxes.
B
So here is a look at the back end: we are looking at a single endpoint receiving sports events. I captured this in December last year, I believe; this is roughly a month's worth of data, and you see these little clusters of messages where, typically, it's around 10 messages a second or so for a certain period of time.
B
Those are times, evenings or afternoons, when games are on, and then you see some times where there's an exceptionally high volume of messages, maybe 60 or 70 per second. That was probably a game night where a lot of different leagues, maybe the NFL and the NBA and a number of others, were having a lot of games at the same time. So again, this is one nice use case we found for our CNS pub/sub service.
B
This diagram is not very pretty, but imagine that the boxes on the right were set-top boxes, X1 set-top boxes in customers' living rooms, and the use case here would be, for instance, the emergency alert system. Say there was an earthquake somewhere in the country and you wanted an emergency alert to pop up on everybody's TV screen, either by zip code or by state or even nationwide. This would be a typical use case for a CNS publish-subscribe topic.
B
You have a topic for the emergency alert and you want to fan out that alert to all set-top boxes. But the problem is, if you use the standard mechanism of pushing HTTP POSTs, you typically cannot really reach those set-top boxes. You cannot reach a port on the set-top boxes because all of them typically sit behind some sort of firewall. So this is a good example use case.
B
If you were wondering why you would ever want to subscribe a queue rather than just a straight HTTP endpoint: in this example we have one dedicated queue per set-top box in our data center, essentially, and the topic publishes messages into that queue, and then the set-top box can go out and poll data from that queue.
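The queue-per-device pattern described here can be sketched roughly as follows. This is a hypothetical illustration, not the actual CMB code: a topic fans each published message out into one dedicated queue per subscriber, and each device then polls its own queue whenever it can reach the data center.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch (not the actual CMB code): a topic that fans out each
// published message into one dedicated queue per subscriber, so devices
// behind a firewall can poll instead of being pushed to over HTTP.
public class TopicFanout {
    private final Map<String, Queue<String>> queuesBySubscriber = new HashMap<>();

    // Create a dedicated queue for a subscriber (e.g. one per set-top box).
    public void subscribe(String subscriberId) {
        queuesBySubscriber.put(subscriberId, new ConcurrentLinkedQueue<>());
    }

    // Publishing copies the message into every subscriber's queue.
    public void publish(String message) {
        for (Queue<String> q : queuesBySubscriber.values()) {
            q.add(message);
        }
    }

    // Each device polls its own queue; returns null when the queue is empty.
    public String poll(String subscriberId) {
        Queue<String> q = queuesBySubscriber.get(subscriberId);
        return q == null ? null : q.poll();
    }

    public static void main(String[] args) {
        TopicFanout alerts = new TopicFanout();
        alerts.subscribe("box-1");
        alerts.subscribe("box-2");
        alerts.publish("EAS: earthquake warning");
        // Each box sees the alert independently, via its own queue.
        System.out.println(alerts.poll("box-1")); // EAS: earthquake warning
        System.out.println(alerts.poll("box-2")); // EAS: earthquake warning
        System.out.println(alerts.poll("box-1")); // null (queue drained)
    }
}
```

The names `TopicFanout`, `subscribe`, `publish` and `poll` are illustrative only; in the real system the per-box queues would live in Cassandra rather than in memory.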
B
So that's pretty much what I have. Just a couple of words on moving forward: as I mentioned, it is open source under Apache 2.0, it's on GitHub, so go there and check it out. Since we decided to clone the SNS and SQS APIs, we're following their specs as they evolve and as they change their APIs as well.
B
Obviously, as time goes on, we are currently doing a significant amount of load and stress testing, and again, I will do a talk at the Cassandra summit in June around the same topic, where I'm planning to present some more data, really hard numbers, around how the system scales.
B
We are currently working on simplifying the deployment and scaling of the system, to make it easier to install and get going with, and we are looking at our first production deployments, which are currently still isolated by application. Of course, the long-term goal would be to have a sort of SQS or SNS as a service that could be shared by many users. And then finally, as CMB is a re-implementation of the Amazon web service
B
APIs, it sort of makes sense to think of OpenStack, and we are currently figuring out where we could fit in the OpenStack world, or how we could integrate with OpenStack. So that's what we're currently doing, and with that, thank you very much for your attention. Here's our GitHub link; we also have a Google Groups forum where you can sign up and ask questions, and then finally there's my email.
A
So just a reminder, because we're getting a couple more questions from people that joined late asking about the recording and the slides, and a couple of people want to share with others on their team: we will have a recording of today's presentation up on our community website, Cassandra.org, probably by end of business tomorrow, along with the slides. We will send an email to all the participants and registrants when those are available. So do not fret if you joined late and missed a little bit of Boris's introduction.
A
Also, Boris will be at the San Francisco Apache Cassandra summit, which is on June eleventh and twelfth at Fort Mason, giving a more detailed deep-dive talk than the one he did today and by then sharing some more data. And if you are looking at your screen right now, SF Summit 25 will get you twenty-five percent off of the ticket price. We have a great event planned. So, Boris, from Saku: do you use an equal number of rows and Cassandra nodes in a multi-row queue setup?
B
So by doing so, you could have a queue with as little as one shard, and you would already get perfect ordering out of Cassandra, without even involving Redis, if you have a queue that doesn't see a lot of traffic. Or you could increase the number of shards to anything between one and a hundred; going beyond 100 usually doesn't make sense. So the answer is: if you just go with the defaults, you will end up with a hundred shards, but you can change that to a different number.
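A minimal sketch of that sharding knob (hypothetical, not the actual CMB implementation): messages are spread across a configurable number of shards for write throughput, and with a single shard you get strict FIFO order for free.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of queue sharding (not the actual CMB code): a queue
// is split across N shards (rows); senders pick a shard at random to spread
// load, so with N = 1 every message lands in the same shard and is read
// back in strict insertion order.
public class ShardedQueue {
    private final List<Deque<String>> shards = new ArrayList<>();
    private final Random random = new Random();

    public ShardedQueue(int numShards) {
        for (int i = 0; i < numShards; i++) {
            shards.add(new ArrayDeque<>());
        }
    }

    public void send(String message) {
        // Random shard selection distributes write load across rows.
        shards.get(random.nextInt(shards.size())).addLast(message);
    }

    public String receive() {
        // Drain shards in a fixed order; only with one shard is global
        // FIFO order guaranteed.
        for (Deque<String> shard : shards) {
            if (!shard.isEmpty()) return shard.pollFirst();
        }
        return null; // queue empty
    }
}
```

With `new ShardedQueue(1)` this behaves as a strict FIFO queue, which mirrors the one-shard configuration described above; the default of a hundred shards trades ordering for throughput.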
B
So when we started out, I think CQL 3 wasn't quite ready yet, so we did use some CQL and we still do that today, but for the most part... by the way, this is implemented in Java and we are using the Hector library, so for the most part we get away with mutators, slice queries and so forth.
A
Okay, so the next question, from Patrick: in case of a Redis failure, how do you transition the client from Redis to Cassandra?
B
Today you will still get the service response for a receive, send or delete call, and internally we log the fact that Redis is unavailable and escalate it to make sure that somebody takes care of bringing the Redis stack back up and so forth. But to the user it's really transparent, and again, there's no real operational activity that has to take place to transition back and forth between Redis and Cassandra; it's built into the code, really.
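One way to picture that transparent behavior is a read path that tries the cache first and silently falls back to the durable store. This is a hypothetical sketch, not the actual CMB code; the two `Supplier`-based paths stand in for a Redis lookup and a Cassandra read.

```java
import java.util.function.Supplier;

// Hypothetical sketch of transparent cache failover (not the actual CMB
// code): reads try the fast path (Redis-like cache) first; if it throws,
// the call falls back to the durable store (Cassandra-like) and the
// failure is logged for operators, so the caller never sees an error.
public class FailoverReader {
    private final Supplier<String> fastPath;    // e.g. Redis lookup
    private final Supplier<String> durablePath; // e.g. Cassandra read

    public FailoverReader(Supplier<String> fastPath, Supplier<String> durablePath) {
        this.fastPath = fastPath;
        this.durablePath = durablePath;
    }

    public String receive() {
        try {
            return fastPath.get();
        } catch (RuntimeException cacheDown) {
            // Log and escalate internally; the caller just gets the answer.
            System.err.println("cache unavailable, falling back: " + cacheDown.getMessage());
            return durablePath.get();
        }
    }
}
```

The point of the design choice is that failover is a code path, not an operational runbook step: nothing has to be reconfigured when Redis goes away or comes back.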
A
Let's have a look here. I'm not quite sure exactly what this one is, but you will know, Boris: what do we look at differently to guarantee FIFO-ness? That's from Thomas, and he then asks a bit of a follow-up clarification: what if you need to support ordered messages, e.g. need to guarantee the timestamp order?
B
One thing I just mentioned in an earlier question was that you have the ability to define the number of shards that you use in the Cassandra column family, and if you change that to one, you will have perfect ordering of your messages no matter what, with or without Redis. And also, as long as Redis is around, and as long as you don't do a data center failover, it's Redis that will guarantee the orderly delivery, so you will get messages in order.
B
It's just that that's the happy-day scenario; there is still the potential for occasional out-of-order deliveries. I guess that was our decision: we said we want guaranteed delivery, availability and robustness, and those things are more important to us than handing duplicate-free and perfectly ordered messages to the client. So yes, that is not really the classic understanding of what a message queue is.
A
Great, thank you. They're still coming thick and fast, so let's see how many we can get through. Shashank asks: was Kafka or RabbitMQ considered for evaluation? If so, what are their disadvantages and advantages? Before you answer, Boris: Shashank, feel free to email me, christian at datastax.com. We have another company, CareerBuilder, that is using RabbitMQ, and I know of another company, Health Market Sciences, using Kafka, that I'd be more than happy to give you an introduction to. But Boris, did you look at anything else?
B
First of all, we had trouble generating hundreds of thousands or millions of queues with RabbitMQ, at least out of the box and at the time that we looked at it, so that may have changed, and we may not have had the appropriate setup for it. But it was one of our key requirements to be able to scale to that degree.
B
We also couldn't fully understand how the garbage collector works, and, since traditionally we have been doing Java development, we felt a little bit uneasy about bringing the Erlang technology into the picture. But having said that, if you don't need millions of queues and you're not worried about Erlang, then I think RabbitMQ would be a fine choice as well.
B
Yeah, well, that is sort of a difficult question to answer. We know that, in theory, with average server hardware, you can probably do around 10,000 operations per second through a Cassandra node or through a plain Redis shard, and maybe a thousand operations through an API server. So you can do the math and come up with the numbers.
B
As for what you need in terms of ring size and how many API servers: the whole goal of it is horizontal scalability, so in theory you can scale to whatever you need. Having said that, we are really in the middle right now of running these types of load tests, and we're assembling the numbers, and we will hopefully be able to present some nice diagrams at the Cassandra summit coming up. So I can't really give you a much more precise answer than that.
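The back-of-the-envelope sizing described above can be written down directly. The per-component throughput constants are just the rough figures quoted in this answer (around 10,000 ops/s per Cassandra node or Redis shard, around 1,000 ops/s per API server), not measured guarantees:

```java
// Back-of-the-envelope capacity sketch using the rough per-node figures
// quoted above. These are illustrative numbers, not benchmarks.
public class CapacityEstimate {
    static final int OPS_PER_CASSANDRA_NODE = 10_000;
    static final int OPS_PER_REDIS_SHARD = 10_000;
    static final int OPS_PER_API_SERVER = 1_000;

    // How many of each component a target throughput roughly requires
    // (ceiling division, since you can't deploy a fraction of a node).
    static int nodesNeeded(int targetOpsPerSecond, int opsPerNode) {
        return (targetOpsPerSecond + opsPerNode - 1) / opsPerNode;
    }

    public static void main(String[] args) {
        int target = 50_000; // desired operations per second
        System.out.println("Cassandra nodes: " + nodesNeeded(target, OPS_PER_CASSANDRA_NODE)); // 5
        System.out.println("Redis shards:    " + nodesNeeded(target, OPS_PER_REDIS_SHARD));    // 5
        System.out.println("API servers:     " + nodesNeeded(target, OPS_PER_API_SERVER));     // 50
    }
}
```

Because the goal is linear horizontal scalability, this kind of math scales in a straight line: doubling the target throughput roughly doubles each component count.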
B
So we don't have a use case for that, and it's not a feature that we have implemented right now, or even on our roadmap for that matter. For the most part, our goal is really retaining API compatibility and feature compatibility with Amazon, so we pretty much do what Amazon does. So, for instance, last fall Amazon introduced long polling for queues, because reading from queues is really not a push thing but a poll thing.
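Long polling, in this sense, means a receive call that waits up to a timeout for a message to arrive instead of returning empty right away, which cuts down on empty-poll round trips. A hypothetical sketch of the semantics (not CMB's or Amazon's actual implementation):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of queue long polling (not the actual CMB code):
// receive() blocks for up to waitSeconds for a message to arrive instead
// of returning empty immediately.
public class LongPollingQueue {
    private final BlockingQueue<String> messages = new LinkedBlockingQueue<>();

    public void send(String message) {
        messages.add(message);
    }

    // Returns a message, or null if none arrived within waitSeconds.
    // waitSeconds = 0 is a classic short poll that returns immediately.
    public String receive(int waitSeconds) throws InterruptedException {
        return messages.poll(waitSeconds, TimeUnit.SECONDS);
    }
}
```

In SQS terms the timeout corresponds to the `WaitTimeSeconds` parameter on a receive call; a client that needs near-push latency simply loops on long-poll receives.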
B
Yes, that's an excellent question. We obviously chose Cassandra to be able to do active-active hot queues, and that is still the goal here. The only reason that we cannot do it right now, the only reason that we have one active data center and one on standby and then we fail over, is really the Redis cache, and I understand there's a project going on in the Redis community to address that; I believe it's called Redis Cluster.
B
But again, that's not quite ready yet, so without that, unless we invent our own solution for it, we have some issues with Redis to truly do the active-active queues. But we have some ideas on how we can get around that, and we have a roadmap for it, just not a set-in-stone date.
A
Okay, well, thank you, everyone, for joining me today. Do not miss the next ones; they're coming thick and fast. Next Thursday we have an introduction to Apache Cassandra 1.2: Aaron Morton, a Cassandra committer and prominent member of the community, will be presenting. On May second there is data modeling, a hot topic in Cassandra: "The Data Model is Dead, Long Live the Data Model", that is Patrick McFadin, and then he'll be doing, on May sixteenth, a follow-up to that, going deeper into data modeling. Boris, thank you so much.