From YouTube: NYC* 2013 - "The Big Data Revolution is an Evolution"
Description
Speaker: Eric Lubow
SlideShare: http://www.slideshare.net/planetcassandra/big-data-revolution-is-an-evolution-ny-cassandra-20130320
Dealing with data doesn't only require a data store, it requires an infrastructure. At SimpleReach, we have 5 data storage layers to service all of our data needs. These range from high volume, high velocity data ingestion with real-time analytics to ad-hoc style historical analysis with search capabilities. To communicate effectively between applications, data stores sit behind a service architecture for consistent data access patterns and failover/redundancy. This talk is a story of how we came to this architecture and some of the lessons we learned along the way.
How users are engaging with content on a social level is time-series data, and we've got a lot of it: everything that happens in terms of tweets, Facebook Likes, LinkedIn shares, anything across the major social networks. These are all things that content creators want to know about. But what's the important stuff? That's something people really haven't figured out, and that's something we're working on figuring out, because there are all these disparate data sources. You get a little bit from Facebook. You get a little bit from Twitter.
We also had to try to figure out what the best thing to do across social media was. That creates a situation where everyone has to know everything about everything, and that's not what we're after. That's not what anybody's after. So what we've done is create a dashboard that allows content creators to take a look at what they've published, and it's sorted by...
We look at millions of URLs every day, so right off the bat that's a big data problem, because you're not only finding out what's going on with each individual URL, you're also finding out all the social actions that take place around each individual URL. That translates to over a billion page views per month. The b-word is a fun word, because it means that everything you've done that previously worked will probably no longer work. It's a total change in scope, which again is why things are an evolution, because everything you did that worked with a few hundred thousand URLs is not going to work with a few million URLs. Unless you did it right the first time, in which case I'd like to hire you, because that doesn't happen very often.
We use the Amazon cloud, and we do that because everything we do needs to scale up and down with traffic patterns. At any given time we're using between 90 and 130 machines, across the peaks and lulls of the day, and with all that information we built a predictive algorithm to measure the social web. So where did that start?
This is not exactly what one would call a homogeneous environment, but what we have works for us to get things done, and this didn't happen overnight either. This is two years in the making. So how did we get there? Well, we knew that we needed something that could handle a lot of volume very, very quickly.
That works well for us: if we want to pull all the tweets that happened on a particular article in a particular time frame, we can do that, because we can just say give me all the tweets. If you tried that in MySQL on a table that had five million rows, good luck, because we did, and it didn't work too well.
We don't need to keep all the data forever. We can just say this data is good for 30 days, and then at the end of 30 days a terabyte of data just disappears and is gone. If you're not sticking around cleaning up a terabyte of data at the end of every 30 days, your life gets a lot easier, especially when you'd otherwise have to do it across multiple systems.
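Cassandra exposes that expiry as a per-write TTL. As a rough sketch (the table and column names here are invented, not from the talk), the 30-day window is just a `USING TTL` clause on the insert; this snippet only renders the CQL rather than talking to a cluster:

```python
from datetime import datetime, timezone

THIRTY_DAYS = 30 * 24 * 60 * 60  # TTL in seconds: 2592000

def insert_with_ttl(table, row, ttl=THIRTY_DAYS):
    """Render a CQL INSERT whose row Cassandra will expire on its own."""
    cols = ", ".join(row)
    placeholders = ", ".join("%s" for _ in row)
    cql = f"INSERT INTO {table} ({cols}) VALUES ({placeholders}) USING TTL {ttl}"
    return cql, list(row.values())

# Hypothetical table and columns: after 30 days this row simply disappears,
# with no cleanup job required on our side.
stmt, params = insert_with_ttl(
    "social_actions",
    {"url": "http://example.com/a",
     "action": "tweet",
     "ts": datetime.now(timezone.utc).isoformat()},
)
```

With a real driver the statement would be executed as a prepared write; the point is that the expiry rides along with the data itself.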
That keeps us up. We actually went through the Amazon outage that happened a couple of months back, where an entire data center, one of the availability zones, went down. We didn't even notice. We got a few alerts, but none of our systems complained other than "hey, I can't talk to those Cassandra servers," and we ran just fine.
You wouldn't put them in the exact same category. One of the things we really like about Mongo is the atomic increments. It's incredibly fast, because a lot of the stuff we do is with JavaScript and in Node.js: it drops down to the wire, sends it across in the BSON protocol, and as a result it is fire-and-forget. Those increments have been faster than on most memcache or Redis servers; I'll get to why that happens in a minute. Yesterday they released their latest version, which includes hashed shard keys. This was a real problem before.
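As a sketch of what those counter bumps look like (the collection schema and field names are hypothetical; the talk doesn't show them), the atomic increment is a single `$inc` update, and fire-and-forget corresponds to a write concern of `w=0` in the drivers; this snippet only builds the documents:

```python
def social_increment(url, network, amount=1):
    """Filter/update pair for one atomic counter bump on a URL document."""
    # $inc is applied atomically on the server, so concurrent workers
    # never read-modify-write over each other's counts.
    return {"url": url}, {"$inc": {f"counts.{network}": amount}}

flt, update = social_increment("http://example.com/a", "twitter")
# With pymongo this would be applied roughly as:
#   coll.with_options(write_concern=WriteConcern(w=0)).update_one(flt, update)
# where w=0 is the fire-and-forget mode: the client doesn't wait for an ack.
```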
Mongo gives us B-tree indexes. Until Cassandra provides some support for deterministic indexing, we can't do range queries there. You can do range queries where you reduce the cardinality, like with secondary indexes, but on really, really large data sets your query response times are just not going to be sufficient, or at least they weren't for us. So being able to have access to B-tree indexes over aggregate data, which is what we store in Mongo, is very handy. We also get to take advantage of document-based storage.
We store mostly aggregated statistics collections in Mongo, and that's why documents work very well for us. And again, TTLs: we don't exactly have terabytes in there, but when you can just get rid of tens of gigs, or in some cases hundreds of gigs, of data at the end of 30, 60, 90 days, that also becomes very handy. Redis: why Redis?
Everything you do with it is going to be fast when you query it. It's got great support for transactional operations; you can roll back if you really need to, because you can do that on the application side. And it's a centralized queuing and locking system. I know John talked this morning about the attempt to bring that locking system into Cassandra, and they're still working on that, but in the interim we use Redis as our queuing and locking system.
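As an illustration of that pattern (the key names and TTL here are invented for the sketch), a Redis-backed queue is usually LPUSH/BRPOP and a lock is SET with the NX and EX options; this sketch only constructs the commands, so no server is needed to follow it:

```python
import uuid

def acquire_lock_cmd(resource, ttl_secs=30):
    """Command for a best-effort lock: SET key token NX EX ttl succeeds only
    if the key is absent, and the TTL keeps a dead worker from holding the
    lock forever."""
    token = uuid.uuid4().hex
    return ("SET", f"lock:{resource}", token, "NX", "EX", str(ttl_secs)), token

def enqueue_cmd(queue, payload):
    """Producers push work onto the left of a list."""
    return ("LPUSH", f"queue:{queue}", payload)

def dequeue_cmd(queue, timeout=5):
    """Consumers block on the right end, so nobody busy-polls."""
    return ("BRPOP", f"queue:{queue}", str(timeout))
```

The token returned by the lock acquire is what lets the owner, and only the owner, safely release the lock later.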
What do we use Infobright for? Well, if you don't know what Infobright is, it's a MySQL-based column store, which basically means storing the columns vertically rather than horizontally. The normal way that MySQL stores data, the way you envision it and the way it comes up to you in the console, horizontally, is exactly the way MySQL stores it on disk. In most cases a column store does the opposite: if you were to just slice out a vertical section, that's the way it stores it on disk. So it can do things like aggregating along a column much, much more quickly, which allows for ad hoc queries. It works with the standard MySQL driver, it's great for ad hoc analytics, you get great compression on your data, and it allows for pre-aggregation.
Why so many programming languages? Well, we've got a few reasons for that. The biggest is that each language has its own benefit. For us, in order to really understand the social space, we found that although we started with Ruby, and Ruby's great for quick development and especially for a lot of the web work, it just didn't support all the data science stuff we needed. If there were a little graphic for R rather than just the letter R, I probably would have put that up there too, but frankly the letter R is not as cool as a gopher. So we use Python for a lot of our data science, and that's what our data science team spends most of their time with. We do a little work in C as well, but for the most part we stick with Python. So anything we do that needs to access any of our data stores, we need to be able to access in Python.
Things do not always come out exactly the way you want them to, so there's a problem with doing all this: each data store and each language comes with its own cost. With Redis you can only use a single core, and if you want to get one of these beefy eight-core machines with 64 gigs of memory in Amazon and just start throwing everything at it, that's great, except it will still only use one core. So the way we solved that was...
We took one of those great eight-core machines and we stuck seven instances of Redis on it, did CPU pinning, and made sure that for anything we wrote, for any transaction, we knew that we were going to have to pay the serialization/deserialization price for anything string-related.
That was much, much faster than the run that took 45 minutes. So again, you need to know the trade-offs that you're making. The other big trade-off that we had with Cassandra is that it didn't have B-tree indexes, and I explained a little bit about why that was important. If you want to be able to run range queries, that's going to be a problem, and it becomes more of a problem as your data set size gets larger. If anybody's used it at all regularly, there are two things that you'll probably notice.
You have a replica repair time that's forced on you, and Cassandra has this idea of the least amount of tunables possible for a working system, which is great in theory, unless you want a working system in something that's not ideal, say the cloud, where you can't predict what the latency is going to be between two machines at any given time.
We were a Node.js shop, and we still use JavaScript very heavily as well, but we found that the exception handling is rather limited. Having to bubble everything all the way up to the top if you get an unknown or unexpected exception is not only very costly to the application, but most programmers eventually start to hate it, even those who love JavaScript. So what's the takeaway? Well, the takeaway, very simply, is that all this stuff takes a lot of work: all these lessons that we learned and all these things that I'm presenting in a single line.
We built a service-oriented architecture, what we like to refer to as our internal API, and what that is, is basically a layer that you can ask any question to, and it knows where to get the data from at any given point. We have all these data stores and all these languages, so you just use that JSON layer: you communicate over HTTP and ask the API for whatever you want.
Say you ask for the most recent hour: it'll go to Cassandra for everything that happened within this UTC time period, then prior to that it'll go to Infobright for the rest of the data, and it'll return all of it to you in a single JSON response. That alleviates the developers from having to know anything other than "I need to ask one particular location this question."
The other thing you have to worry about is your data accuracy. When you're dealing with multiple data stores, you need to know what the data looks like in one place versus what it looks like in the other place, and the only way you find out the first time that it's wrong is when someone else, most likely external to your organization, says A doesn't match B. So how did we handle that? We started out writing checks programmatically, and we found out that sometimes that works.
A
Sometimes
it
doesn't
so
we
had
to
spot-check.
Once
we
figured
out
what
we
were
spot
checking
regularly,
we
were
able
to
write
slowly,
more
programmatic
and
ultimately
algorithmic
tests
to
see.
Does
this
data
make
sense,
and
the
biggest
reason
for
this
was
when
you're
dealing
with
things
that
are
external
to
your
organization,
like
data
that
you
cannot
control?
For
instance,
in
our
case,
we
deal
with
all
the
social
api's.
You
know
Twitter
Pinterest,
Facebook
LinkedIn.
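A minimal version of such an algorithmic sanity check (the 2% tolerance is an arbitrary illustration, not a number from the talk) just compares the same count from two stores within a relative tolerance:

```python
def consistent(count_a, count_b, rel_tol=0.02):
    """True when two stores' counts for the same metric agree within a
    relative tolerance; exact equality always passes."""
    if count_a == count_b:
        return True
    return abs(count_a - count_b) / max(count_a, count_b) <= rel_tol
```

Anything that fails the check gets queued up for a human spot-check rather than silently trusted.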
Then it says seven, and you're like, "I don't know about that, you just told me it was 500," and then you ask it again and it's like, "no, it's 504." That's really confusing, especially when you're looking at it against multiple data stores. So in order to understand how to make those differentiations, you need to first figure out where the problems are, and most times that's a lot easier to do visually than it is programmatically. So the other thing that the service-oriented architecture allowed us to do was build...
It allowed us to build a framework for testing, and what's really cool about this is that any time we want to test something new, we don't actually have to bring our system down. If we want to test a new queuing system, say: right now we use Redis, but we're putting NSQ in there, and we tried RabbitMQ for a little while, and we didn't have to bring anything down.
The example I like to give is that 10gen, the folks who make Mongo, built this great system called MMS, and it works very well. It's just not great to look at: it tells you how your systems are doing on a superficial scale, shows you some graphs, tells you your flush times, things that are important on the system-level side. Then DataStax built this thing called OpsCenter, and if you use DataStax Enterprise, OpsCenter is basically this interface...
...that tells you, on a low level or a high level, exactly what your systems look like, and it's much, much nicer to look at. The benefit is that we get to look at how the systems are performing in a nice way, and if we really want to feel bad about ourselves, we can go look at them en masse and be like: all right.
That's just how the systems look. The internal API is the guard: nothing gets past it, nothing queries those things directly. The internal API holds the drivers for all of those systems, and as long as the developers don't need direct access, and there really isn't a need for them to have direct access, they can go straight to the internal API, ask any question, and have it returned to them concurrently.
The other lesson we learned is to try to keep the path of the packet consistent down the line. What does that mean? Well, for example, if you have what comes in from the internet on one side and the data stores on the other, traveling down the first set of diagrams that you see that are yellow, we take a look at the Twitter firehose...
...or the Facebook and Twitter APIs for things that they don't pass down the firehose, and then we have collection systems for the likes of Google+ and other tiers that don't give us direct access, that we have partners for. Everything that comes down the line gets immediately dropped onto the queuing system; once it's on the queuing system, it's picked up by a consumer, or a worker.
A
Accesses
the
API
to
make
an
equal
to
ask
any
questions
that
it
needs
to
process
the
data.
The
internal
API
will
then
ask
the
questions
of
the
data
stores
and
then
pass
it
back
to
the
consumers.
This
way,
if
there's
a
breakdown,
you
know
roughly
where
the
breakdown
happened,
because
if
the
packet
made
it
this
far,
it
had
to
follow
a
certain
path.
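That fixed path is what makes breakdowns localizable: each hop can stamp the payload, so the last stamp tells you the last hop that succeeded. A toy sketch (the hop names are invented, mirroring the collector-to-worker flow above):

```python
def run_pipeline(payload, hops):
    """Run a payload through an ordered list of (name, fn) hops, recording
    the path it actually followed."""
    for name, hop in hops:
        payload = hop(payload)
        payload.setdefault("path", []).append(name)
    return payload

hops = [
    ("collector", lambda p: {**p, "collected": True}),
    ("queue",     lambda p: {**p, "queued": True}),
    ("worker",    lambda p: {**p, "processed": True}),
]
result = run_pipeline({"tweet_id": 1}, hops)
# result["path"] records every hop the packet followed, in order
```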
The other thing that we did is how we distributed our architecture. When we put everything into the cloud, like most people, we said: us-east-1a, just spin up more machines in us-east-1a. And that was great until us-east-1a went down, and then so did we; everything went down. So what we did was take a look at the distribution of our data and the distribution of our systems and say: okay...
You won't even notice, and that's actually what happened to us, and the same thing goes for all of the other system types. But the key here for us was having this internal API be accessible across any one of the availability zones, because when you ask any one of the internal API servers, it knows about the systems in the other availability zones.
For instance, we tried to offload our queuing system onto Amazon, because we figured, hey, these guys probably got it right; they work with a lot of large-scale systems. So we tried to offload our queuing, remove it from Redis, and we figured out that for us to maintain our own queuing system would cost us roughly twenty-five hundred dollars a month at the time, just given our workload and the machine cost. So then we tried offloading it onto Amazon...
...and it was not cheaper or easier; it came out closer to 10 or 13 grand a month. On the other hand, when we wanted to offload some of our scaling, when we just needed to do simple web-service scaling, Beanstalk would allow us to spin up a new Rails app with the Ember front end on it, with all of our cached data sitting on that machine, and it cost us less than half of what bringing up a new instance would cost just to handle that traffic, period.
So the reason that people run in the cloud isn't just because it's cheaper, or, you know, more highly available. For us, it's because the services that they offer, and are continuing to offer, make the jobs of both the developers and the people that do the ops on our systems a lot easier. Just going down the list...
A
You
can
see
that
that
most
of
these
features
weren't
wearing
around
six
to
nine
months
ago
and
I
think
other
than
maybe
the
queuing
service
and
an
redshifts.
We
probably
use
all
of
them
at
simple
reach
in
at
least
one
form
or
another,
but
to
really
take
advantage
of
all
of
that,
we
we
really
needed
to
figure
out.
You
know
a
good
way
of
expanding
the
role
of
what
one
person
was
capable
of
doing
so
we
have
one
really
really
smart,
DevOps
guy,
who
is
insanely
overworked
and
probably
not
too
happy
at
that
fact.
For those of you who don't know, Chef is a configuration-management system. We started out with me logging into every machine and installing everything that was necessary by hand: copying binaries, bringing the data over, and ultimately hating every single person that I came in contact with after setting up a machine once.
A
Scalable
for
me,
or
anybody
in
my
immediate
vicinity,
we
decided
that
you
know
we
would
shift
over
to
chef
chef
allowed
us
to
bring
up
a
machine
just
by
saying
I
want
another
one
that
looks
like
this,
and
then
amazon
came
in
and
said.
You
know
what
we
can
do
better
than
that
you
can
just
build
these
things
yourself.
You
can
build
a
template
yourself
and
rather
than
saying,
I
want
another
machine
that
looks
like
this.
...any time that we needed to deploy something, it didn't just become a matter of spinning up the machine, building nginx, building Rails, and deploying the latest code base. We would just say: give me one that looks like this, and deploy this code hash from GitHub. The other thing we did is make really extensive documentation, because if our DevOps guy gets hit by a bus, and I know everyone loves that scenario...
You also should know when to build your own tools, when to use existing tools, and how to integrate the tools that are out there. There are a lot of folks that stick strongly to use-only, and there are a lot of folks that stick strongly to build, but I feel like the best environments involve the best tools that not only work for your organization, but that work for the people in your organization.
The last thing is: monitor everything. Automation does not come easy. It takes a lot of time to build up to the point where you can automate, but it will ultimately make your life easier. So once you figure out the best processes for you, and once you figure out the way that your system works and the best way to keep your systems working, automate ways to get yourself there. And I just want to leave you with something I alluded to at the beginning.
Nobody is going to be at the same company for 15 years. I mean, it would be great, but it's just not going to happen. So when you set things up, set things up as if you were going to be the next person walking into them. Evolve the systems so that when it's time for you to move on, somebody can walk in and say, "I understand what this person was thinking," not, "I would love to kill the last person who had this thought." That's the new thank-you slide: thanks for listening.
I know where this data ultimately needs to end up. If it's a piece of data that needs to be available immediately as a real-time number, then it's going to go to Mongo, because our system uses Mongo for almost everything that we do in real time. If it's going to need to be cached at some layer, or invalidate a cache, it's going to go to Redis, or make a query to Redis. And anything that needs to be sort of ETL'd, in the sense where it's going into a more permanent store...
...something like MySQL, where you can't do the adjustments, you can't do the updates, you can't do the deletes, then we put it all off to the side in a separate queuing section where we say: come back later, process this differently, and if there's anything that we've learned about this that may invalidate it, let us know. For instance, to go back to my example with Twitter: Twitter says, you know, hey, this thing's got 500 tweets...
...now it's got seven, and then it's got 504. If, during that period of time, say we use the granularity of a day, that's changed, and then we go to do our daily roll-ups and we're like, whoa, this makes no sense, you will never see tweets drop by that much, then we process it and say: okay, we're going to get rid of this middle data point, most likely.
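That roll-up sanity pass can be sketched as a filter over successive samples (the 50% drop threshold is an invented illustration; the talk only says the count will "never drop by that much"), assuming the counts are cumulative and should not crater and recover between samples:

```python
def drop_implausible(points, max_drop_ratio=0.5):
    """Remove middle data points that crater far below the previous sample
    and then recover, which signals a bad API response, not real data."""
    cleaned = [points[0]]
    for prev, curr, nxt in zip(points, points[1:], points[2:]):
        crater = curr < prev * max_drop_ratio and nxt >= prev
        if not crater:
            cleaned.append(curr)
    cleaned.append(points[-1])
    return cleaned
```

On the talk's example the bad middle reading is discarded while a genuine small dip survives.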
A
Our
clients
say
we
need
real-time
data
on
XYZ
and
we
need
it
and
real-time
has
its
own,
but
in
some
cases
they
need
it
within
a
minute.
In
some
cases
they
needed
within
30
seconds.
It's
not
like
the
financial
social
data.
In
this
case,
it's
not
like
the
financial
world.
It's
not
microseconds,
it's
you
know.
Sometimes
a
minute
I
mean
you
can
get
away
with.
You
know
minute
minute
and
a
half
in
some
cases,
but
for
the
most
part
the
workers
decide
what
goes
where
when
and
that's
decided
in
advance.
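The "decided in advance" part is essentially a static dispatch rule in the workers. A hedged sketch (the flag names are invented, mirroring the Mongo/Redis/ETL split described above):

```python
def dispatch(item):
    """Route a unit of work to its destination, using rules fixed ahead of
    time rather than decided per-request."""
    if item.get("realtime"):
        return "mongo"        # needs to be visible as a real-time number
    if item.get("affects_cache"):
        return "redis"        # cache fill or cache invalidation
    return "etl_queue"        # deferred, reprocessed into the permanent store
```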
We wouldn't, because for the problems that we have to solve there is no one-size-fits-all. There is no single solution that does everything that we need. In some cases we need to organize the data in a purely chronological way, and that's pretty easy, because with Cassandra you just store it by date. Well, what if you want to store it by date and by...
So that piece of the architecture is also very important, and is also a requirement that's given to us. I'd say that the only thing that we probably would have done sooner is build the intermediary layer, that service architecture, earlier, because early on we were doing everything direct to the data store, which meant that every engineer not only had to know how to write effectively to every data store, but needed to know the schema and the storage pattern, and all that stuff probably took longer to figure out and/or document.
That's why we went to something like Chef, because Chef just says: here's what the machine should look like. I mean, we're very particular about using Amazon Linux, but if you wanted to use something like CentOS as your base image, you could say, "I'm going to use CentOS 5.6," and CentOS 5.6 is going to look pretty much the same on Rackspace or Amazon. So you say: I want CentOS 5.6 with all these packages; spin me up.
A
Well,
the
cueing
itself
is
just
merely
an
application
that
sits
on
top
of
one
of
those
systems.
If
we're,
if
we're
using
an
Amazon
specific
service,
then
yeah,
then
we
have
locked
ourselves
in
a
little
bit,
but-
and
we
don't
do
that
in
all
cases
there
are
yes
that
can
become
a
problem,
but
it
has
not
yet
to
date
real.