Description
Speakers: Matt Pfeil (DataStax), Eddie Satterly (Splunk), Edward Capriolo (Media6Degrees), Matt Conway (Backupify), Russell Bradberry (SimpleReach), Jake Luciani (BlueMountain Capital)
What we're handling today is really a name for things that go wrong, and we're going to teach you how to avoid some of those things. Just to show off really quick: my name's Matt Pfeil, I'm the co-founder and VP of customer solutions at DataStax. Why don't we go down the panel real quick, and everyone will do a quick introduction of who you are, what company you work for, and how they interact with Cassandra.
So we were initially implementing search components for the front-end web site. We built a data model that we were all sure was exactly perfect and was going to get the data out the way that we intended, and then we figured out that we had represented it wrong. From a performance perspective, retrieving the data became quite painful and considerably slower than what had been promised, and we had to basically scrap it, restart from the very beginning, and get the data model correct the next time.
There's this mindset of, like, oh yeah, we're going to be able to stuff four terabytes of data on a Cassandra machine and everything's going to read super fast magically. You really have to benchmark these things and figure out how much data you have, how much random read is going to be in your data, and then try to size hardware appropriately. I don't really think anyone's running Cassandra on four-terabyte disks. I could be wrong.
The list of things we did right is a lot shorter, but in terms of doing things wrong: writing millions of rows instead of millions of columns, i.e. data modeling, that's very important, it really impacts performance; not leaving enough RAM for the kernel to cache; giving too much RAM to the JVM, which kind of falls over after a certain size. Things like that, and the list goes on. We started on Cassandra 0.6, and not as much was written about it back then.
I heard a common theme, though, that data modeling is pretty important, and just to make sure everyone's clear on the implications: Cassandra is very, very good at large amounts of data ingestion, but you have to plan how you're going to read that data. So putting a little bit of homework in up front solves a lot of pain after the fact. At DataStax, working with customers of ours, we actually see that that's probably eighty percent of the issues, and if you get that right up front, it actually sort of just works after that.
Obviously there's the thing Jonathan talked about in the morning, about how they're moving to a JBOD setup instead of RAID. It was initially hard to figure out what the right RAID was for Cassandra servers, and there was the assumption that a RAID of 12 disks will give you more seeking capability. You know, rotational disks don't really seek faster in bigger RAIDs; they just stream faster. So that can be an early-on hardware mistake.
We're on EC2, so no hardware per se, but, you know, choosing the right instance size helps a lot. There's more choice these days; you can use SSDs or rotational disks. I think the sweet spot when I was looking into it was the extra larges, EC2 m1.xlarges. Those work pretty well: just enough disk space, just enough RAM, and you can scale out pretty easily.
SSDs were mentioned in there, and one of the common things that we also see today is, I think, this lack of belief about how many random seeks a spinning disk can do. Just to put this in perspective: a 15k SAS drive, that stands for 15,000 rotations per minute. If you do the math, that means it rotates a little over 200 times per second, which means that if you're doing random seeks on spinning media, the most random seeks you can get out of a single hard drive is about 200.
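The arithmetic above can be checked with a back-of-the-envelope sketch. The model here is a deliberate simplification (each random read is assumed to cost roughly one platter rotation, ignoring head-seek time and command queueing), so rotations per second serve as an upper bound on random IOPS; the exact figure for 15,000 RPM is 250, in the same ballpark as the "about 200" quoted.

```python
# Rough ceiling on random reads for a spinning disk: if each random read
# costs about one rotation of latency, rotations/second bounds random IOPS.

def max_random_iops(rpm: int) -> float:
    """Rotations per second for a drive spinning at `rpm`."""
    return rpm / 60.0

print(max_random_iops(15_000))  # 15k SAS drive: 250 rotations per second
print(max_random_iops(7_200))   # commodity 7.2k SATA drive: 120
```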
And people say: wow, SSDs are so much more expensive. Well, first of all, if you look at it from an IOPS perspective, SSDs actually give you at least 10,000 IOPS, but the cost is only about 2 to 3x, especially on the commodity side. So we highly, highly recommend starting with commodity SSDs, because your price point is actually lower when you're comparing to spinning media from a pure input/output standpoint. SSDs are sort of God's grace to databases, it seems like.
Tuning the JVM correctly and making sure you understand how it's configured. Again, that has changed substantially since then, so it's a little easier now, but we ran into knocking over nodes because of JVM garbage collection several times, so that was probably the biggest pain point. The biggest thing we had to play with was changing around the new-gen sizing and making sure we had the heap set correctly. As far as the CMS settings, we actually went to production with that.
Yeah, I would follow up on that; those were a lot of good points. That was an early-on thing. It's a very interesting mix of how much RAM you have in a machine and how much you want to give to the heap, and it seems very appealing to use, like, the row cache. Everyone's like, oh, memcached, memcached, just put everything in memory, but you can do yourself a disservice; like Jonathan mentioned earlier, there are JVM fragmentation issues.
Give yourself that free overhead room, because what will kill you in performance is, like, your 95th-percentile latency. If a hundred requests go in two milliseconds but one takes 10 milliseconds, that's going to hang up your clients, possibly, and hang up your web application. So you're really trying to optimize not for the hundred reads that are fast, but for the one that may be slow.
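The point above can be made concrete with a tiny sketch: a page that fans out to many backend reads is as slow as the slowest one, so the outlier dominates what the client sees even though the average barely moves. The numbers below are the speaker's hypothetical, not measurements.

```python
# 100 fast requests plus one slow one, as in the example above.
latencies_ms = [2.0] * 100 + [10.0]

mean_ms = sum(latencies_ms) / len(latencies_ms)
worst_ms = max(latencies_ms)

print(round(mean_ms, 2))  # the average barely notices the outlier
print(worst_ms)           # what a client waiting on all of them experiences
```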
On top of all that, there are a number of things. Really, you know, we're still doing everything manually, we're not using vnodes, so tips along that front: understand how your token ring works. Know where the data for a range is and where the replicas are. One thing we do that's proven useful is, you know, starting at token zero, naming that node cassandra01, the next one cassandra02, so a node's name matches relatively where it is in the ring. It helps you visualize it when you're doing maintenance tasks. Get familiar with jconsole; you can dig in quite a bit there, and there's a lot of data you can pull out and look at. Even better would be to use something like collectd or Graphite to automatically collect important stats, so you can graph them over time and then sort of aggregate them across your cluster or drill down to individual nodes.
An important note on knowing your ring: especially if you're in the Amazon cloud, knowing where your replicas are is extremely important. We have our ring set up with an RF of three, one replica in each of three different availability zones, so that we could theoretically lose two availability zones and still have an entire set of the data.
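The survivability claim above can be sketched in a few lines. This is a toy model, not Cassandra's actual placement code: it simply assumes every key gets one replica in each of three availability zones (RF=3), which is what the speaker describes.

```python
# Toy placement: one replica of every key in each of three AZs (RF = 3).
azs = ["az-a", "az-b", "az-c"]
keys = ["k1", "k2", "k3", "k4"]
placement = {key: set(azs) for key in keys}

def full_copy_survives(lost):
    """True if every key still has at least one live replica after losing `lost` AZs."""
    return all(placement[k] - set(lost) for k in keys)

print(full_copy_survives(["az-a", "az-b"]))  # losing two AZs: the third still has everything
print(full_copy_survives(azs))               # losing all three: data gone
```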
So the question was about the maintenance tasks. Yes, well, you know, we're using Cassandra 1.2, so fortunately we don't have to worry about the token placement problem, which is certainly one of the largest ones. I would say our biggest maintenance thing, or, you know, sort of our current headache, is really around compaction problems, and this is something I'll be talking about in my talk later on today. But going back to monitoring.
With Graphite and all those things, being able to basically track every stat on every column family on every node in the cluster, and having a tool like Graphite to kind of build your own dashboards, is really, really useful. It really helps us spot and pinpoint problems and fix them.
As we've gone across the group here, we've had a few things come up about running in either Amazon or the cloud. Why don't we touch on that a little bit more deeply and talk about some of the things you recommend, whether it's Amazon or a different cloud provider, that you either highly recommend to do or highly recommend to avoid at all costs. Well, since I don't know which of you guys actually run on Amazon, start with Matt and we'll go over there.
Sure. What I haven't said yet: obviously, you know, get the right instance size for your needs; Cassandra doesn't work too well with too much disk space. We found that the extra larges worked great. It was like a terabyte of disk and, you know, RAM, half of which went to Cassandra and half of which we give to everything else. That works really well. Similar to what was said before, we also spread out across availability zones.
Definitely do not use EBS. Use the ephemeral stores, or, if you can afford it, I think you can do provisioned IOPS, which are effectively SSDs, I think, or you can get SSD instances. What we do is we use the ephemeral stores and we RAID them together in a stripe configuration to get the maximum amount of bandwidth out of them.
We have a very similar setup. We're using the extra large machines that have about 15 gigs of RAM; we give nine to Cassandra and the rest to the operating system. Like he mentioned, EBS is a no-go there. The instances we use come with four ephemeral disks that we stripe all together and just write to those, because it's a lot faster. SSD machines have definitely proven to be faster.
Actually, we don't have any traditional, I guess, backup plan. Our backup plan is our multi-data-center replication and the replication factors between the nodes. So we don't back data up anywhere; if there's a disk failure on a node, we consider that node lost and we replace it.
Since we're a backup company, we also back up our Cassandra cluster, I think nightly or weekly. We basically just tar up the data volume: take a snapshot of the data store, back it up, and dump the tar files in S3. It's just an extra security blanket; we've never needed it, but it's there. Okay.
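The tar-up step described above might look something like the sketch below. This is a hypothetical illustration, not the panelist's actual tooling: it assumes a snapshot directory already exists (for example, as produced by `nodetool snapshot`), and the S3 upload itself is omitted.

```python
# Archive a Cassandra snapshot directory so the tarball can be shipped to S3.
# The directory path is illustrative; the upload step is intentionally omitted.
import tarfile
from pathlib import Path

def archive_snapshot(snapshot_dir: str, out_tar: str) -> str:
    """Tar+gzip the snapshot directory and return the archive path."""
    with tarfile.open(out_tar, "w:gz") as tar:
        # Store the directory under its own name inside the archive.
        tar.add(snapshot_dir, arcname=Path(snapshot_dir).name)
    return out_tar
```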
I believe, also, here in the cloud, Netflix has its Priam tool, and I believe Priam has some features to deal with that. At a really low level, since SSTables are write-once, if you're clever about it you could kind of watch the directory and see when new SSTables and bloom filters are created and when old ones are deleted, and you could copy them incrementally somewhere else. So you don't always have to do the full "let's get everything and move it"; you can usually be slightly more intelligent about it.
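The incremental idea described above leans on SSTables being immutable: a file name you've already copied never needs re-copying. A minimal sketch, with illustrative file names and none of Cassandra's real directory layout:

```python
# Copy only SSTable data files we have not backed up yet. Because SSTables
# are written once and never modified, a previously seen name is already safe.
import shutil
from pathlib import Path

def copy_new_sstables(data_dir: str, backup_dir: str, seen: set) -> list:
    """Copy unseen *-Data.db files to backup_dir; return the new names."""
    dst = Path(backup_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for f in sorted(Path(data_dir).glob("*-Data.db")):
        if f.name not in seen:
            shutil.copy2(f, dst / f.name)
            seen.add(f.name)
            copied.append(f.name)
    return copied
```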
Netflix does everything from taking those snapshots periodically and pushing the history, like every 30 or 60 minutes, down to every single operation, depending on how important the data is and how quickly you need to be able to recover. The nice thing, on the human-error recovery aspect, is that if you are going to recover, you want that data on the machines already; having the snapshots locally on the machines allows you to recover very, very quickly, versus first copying the data over and recovering from somewhere else. And I didn't really comment on the general Q&A policy: if you have a question about what we're doing up here, please raise your hand and we'll get a mic over to you, and I'm going to save some time at the end for general questions all together. So if it's about this sort of topic, just raise your hand and someone will come running with a microphone. So, we talked a little bit about backups and the cloud.
Why don't we bring that right into multi-data-center, and the fact that Cassandra as a whole does have a true multi-data-center offering, which, in my opinion, means you can have as many data centers as you want and they're all active for writes, which is an extreme rarity and a huge advantage for this technology. What are some of the things that you guys have encountered with multi-data-center scenarios that either make your life better or worse, or things you can do to really make that easier? Let's start down there with Jake.
Yeah, we currently run with two data centers, and I would say, you know, the benefit of that is our compute nodes sort of run in both data centers, so it basically means that to access data, the compute nodes don't have to pull data across the WAN. On the other hand, the problem is that currently, you know, we're running at RF 6, so we have six copies of the data, three in each data center. So, you know, that's the downside of that.
The fact of the matter is that all those writes are ultimately going to happen on that three-node cluster, and if the load, the I/O wait, on that three-node cluster goes through the roof, it will ultimately slow down your larger cluster. And be aware of your latency; the latency between the clusters is also something to be aware of.

We're not using any of the multi-DC stuff. You know, for our needs, we just wanted a little extra reliability, or, I guess, fail-safety, so we spread out across the availability zones and that works for us. It was a lot simpler. So if you don't need it, it may not be worth the extra overhead and complexity; kind of go with the simpler approach.
I was just thinking of a scenario that a lot of people run headlong into, which is that Cassandra's logic can infer both a data center name and a rack name, and unless you really, really know what you're doing, you don't want to have different rack names. People kind of think sometimes, oh, this thing's really in rack two, so I should make it rack2, but the replication strategies have some special logic around racks, and it may not do what you expect.
Well, exactly the opposite of that is what we did, and it worked out great. We actually implemented across two data centers. Previously we had two separate SQL clusters that were doing the work, and someone had to go in and aggregate a bunch of the data across them and then figure out how to combine the two.
Traffic was going to one data center and the other. What we got as a benefit, when we rolled DSE out with the multi-data-center strategy, was that we were able to handle rack failures within the data center, because we had had several issues with going over power limits and other things within the space, with machines that shouldn't really have been there. We also had at least one occasion where only one or the other of the data centers could be online at a given time.
We were doing patches and doing testing, so having that capability, and never having to worry about being able to serve it up, because we had a large link between the two sites. So if compute nodes had to connect directly to the other data center, it wasn't that big of a deal, but in certain scenarios it was horrible. So having the capability to run one side or the other, or both, and being able to have all of the search result data written into either side, was huge.
I'm a cloud hater, I guess, really bad, but there's a lot of benefit in it, and it really depends on who you are and what you want to do. I think the cloud is not new-new, but it's pretty new, and everything new has a lot of hype, which makes people tend to use it the wrong way sometimes. Let's keep that in mind.
Well, personally, I don't feel strongly either way, but it depends on the use case; you really have to look at it. Because, from my previous life, we could not possibly have gotten the performance we needed, or the scalability we needed, in a way that made sense financially, versus buying the high-performance computers that we did and deploying them into the data centers. So from a pure cost perspective, as well as your use case and what you need from a performance perspective, I mean, it's easy to make a decision.
You know, I have a five-node cluster running in AWS, and it was great for the use case we were using it for, but we had 144 nodes running in data centers that were necessary to handle the scale and volume, and the cost of that would have been like half a million dollars or something with Amazon, versus, you know, buying the gear, which was considerably less, even with power and everything else to run it yourself.
We've definitely seen, in our customer base, everything from startups to Fortune tens. You know, once you get into roughly the Fortune 500, or maybe even the Fortune 1000, the cost scale is just so much better when you already have the infrastructure in place to go ahead and use your own infrastructure, so it makes sense. But as opposed to Ed, I love the cloud, and I think it makes sense for getting started very quickly. So the lifecycle of machines, especially in these clusters: they generally start smaller and they end up bigger.
Surprisingly, Cassandra does work fairly well with different mixes and matches of servers, as long as they all have enough capability for your load. And now, like I think we mentioned before with SSDs, they're not even really that expensive anymore. You may be buying a four-to-eight-thousand-dollar server, and you're going to either spend, what, forty dollars on a SATA disk, or, you know, three hundred dollars on a SCSI, up to an SSD disk.
C
It's
not
like
a
huge
cost,
so
I'm
pretty
much
all
on
the
SSD
market
as
well.
These
days
are
not
always
easy
to
get
your
hands
on,
especially
the
really
high
fusion-io
cards.
It
sounds
great
and
then
they
like
it
might
take
a
month
to
ship,
but
for
the
general
purpose
SSDs,
you
could,
you
know,
get
those
with
the
servers
and
no
issues
there.
Well, I don't have a particular model, because I'm not really involved in which ones we get and why. But it's interesting; there's a lot, and they're very new. So there are different details: like, some say you should use TRIM, but you can't use TRIM if you're using hardware RAID, but then some disks have their own TRIM. So there's a lot of new stuff; you know, some people suggest you should use a different disk scheduler, since Linux has its own disk scheduling and it works differently.
Maybe you should do LVM instead of software RAID. So there's a lot of it; it's not as cut and dried as it was with hardware RAID, where you just said, I'll get a big RAID, that's it, hardware controller, we know it's fast. If you get to a question where someone's scratching their head, then you have to investigate your decision and say, maybe we should have used TRIM with LVM and ext4. So, those things.
I think a big thing to look at is that you have to determine if it's a cluster-wide issue, or if it's just something going wrong with your application. Maybe it's a hot spot, maybe it's something else that isn't indicative of capacity, in which case growing your cluster will help fix that problem in the short term, but it's ultimately going to resurface.
Yes, I'll do that, and I'll do the same shameless plug for my new employer, because when I was in my previous role, I actually used Splunk very heavily to monitor the environment, especially on our Java web services side. We pulled a lot of stats, but here's the thing that kind of hit harder than anything else.
So we were constantly having to look at it to see how we scaled up, and we basically set a bunch of alerts to start looking when we saw trends that were outside what we expected, and adapted that way. And we used Puppet to deploy all the systems, so it was very easy to just turn up new nodes if we needed to, and we had extra HP nodes sitting around that we could bring up when we needed either to swap out, because we had a failure, or to add capacity.
To your question: what I noticed, especially back pre-SSDs, was that the first thing to go was kind of IOPS, but you can tell that very easily. If you're running top, right, top breaks down your CPU into user, system, and wait, and wait is really kind of the key for those types of systems.
C
It
says
I'm
a
process
running
and
I'm
waiting
on
disk
right
and
then
the
other
thing
to
look
at
is
if
those
are
all
good,
you
have
to
just
count
the
size
of
different
column
families
because
how
they
perform.
If
each
know
it
has
10
gigs
of
data,
there
is
even
with
SSDs
and
lots
of
RAM.
There
is
a
difference
between
how
it
would
perform
with
10
gigs
of
data
and
20
gigs
of
data.
You
know,
there's
bloom
filters
and
every
read
has
to
check
all
the
bloom
filter.
So
all
these
things.
So if you watch the size of column families, the cache hit rate, and those latency numbers, you can see when things, hopefully, degrade slowly, and then you can react in time. But if you turn on a new feature and then you get a ton of load, you have to think about the new feature and what it means to the system as a whole.
Each machine could still, like, be allocated more, but the GC pauses would get a little too high. It kind of depends on your use case. So if you have a small column family with an extremely high write throughput, scaling up might work for you, but if you have very, very large column families, I think scaling out with smaller nodes might be a better idea.
The best part is I don't get woken up at 4 a.m. when a disk dies. You know, it's very hard to totally lose the data. There have been times when you think, like, oh, this went horribly wrong; you know, I lost two nodes at the same time, writes are failing, I'm losing data. And then, when you finally bring it back up and repair, everything that said it wrote at a quorum is still there.
We key rows off of, you know, the hour that the event happened, and that way I can say: okay, if I want to rebuild an hour's worth of data, I just pull this one row out and rebuild it across my cluster. And in addition to that, I also write, you know, a number of counters at the time that each event comes in, for the different ways that I want to read the data. So we have this.
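The hour-bucket-plus-counters scheme described above can be sketched in plain Python. The key format and structures here are illustrative assumptions, not the panelist's actual schema: each event lands in a wide row keyed by its hour, and per-hour counters are maintained at ingest time so reads don't have to scan the row.

```python
# Hour-bucketed rows with counters written at ingest time (illustrative).
from collections import defaultdict
from datetime import datetime

def hour_key(ts: datetime) -> str:
    """Row key for the hour bucket an event belongs to, e.g. '2013061210'."""
    return ts.strftime("%Y%m%d%H")

rows = defaultdict(list)     # hour key -> the wide row of event payloads
counters = defaultdict(int)  # hour key -> event count, kept as we ingest

def ingest(ts: datetime, payload: str) -> None:
    key = hour_key(ts)
    rows[key].append(payload)  # append to the hour's row
    counters[key] += 1         # counter updated at write time, not read time
```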
On top of that, sharding also just works; it shards for scale anyway. Aside from what you have to know about it for getting your data model right, I know that I can just keep putting data in there, and chances are I'll never hit the ceiling. I guess technically there is a ceiling, but we have not come remotely close to it yet, and we write a lot of data.
Peer-to-peer is really the best part, because we have a lot of systems and they all have their pluses and minuses, but the things that Cassandra guarantees are good to me. For example, we have MySQL servers, and the replication breaks, and then all these apps are reading from the slaves. But what are they getting? They're getting something old, because those applications don't know that the three-tiered replication structure is somehow behind.
Unless you build a lot of logic to try to handle that, and your client has to ask all these questions. But Cassandra's consistency model is pretty basic: you read at quorum and write at quorum, and you have consistency even if a node fails. A lot of systems don't have that; like, whenever I have to kick certain things over, you know, I wonder, what happens to the commit log?
Sometimes I really don't know, because there's just one of that thing, and I'm not one hundred percent confident that that one thing is right in all situations. You know, if you have a MySQL master and slave, what if, you know, the disk it swapped on was bad? Could a query be off, or something? Who knows. It's just nice to know that no single piece of something going wrong can make the whole system go really bad. So that's what I like about it.
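The quorum guarantee the speaker is relying on boils down to overlap arithmetic: with replication factor RF, a read is guaranteed to see the latest write whenever the read set and write set must intersect, i.e. R + W > RF. A minimal sketch of that check:

```python
# Quorum overlap: reading and writing at quorum forces the read set and
# write set to share at least one replica, so a quorum read sees the
# latest quorum write even if a node fails.

def quorum(rf: int) -> int:
    """Smallest majority of RF replicas."""
    return rf // 2 + 1

def read_sees_write(rf: int, w: int, r: int) -> bool:
    """True when any R replicas must overlap any W replicas (R + W > RF)."""
    return r + w > rf

rf = 3
print(quorum(rf))                                   # 2 of 3 replicas
print(read_sees_write(rf, quorum(rf), quorum(rf)))  # True: quorum/quorum
print(read_sees_write(rf, 1, 1))                    # False: ONE/ONE can miss
```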
I'd say the write performance was probably the biggest thing. We had 45 SQL servers, which we replaced with basically eight nodes of Cassandra. In order to get the scale of writes we needed, there were 45 SQL servers in a cluster, which is insanely expensive, on boxes that had ninety-six gigs of memory and twenty-four cores, a crazy amount, just to be able to handle the load that was out there. And we replaced it with, well, we used HP blade nodes, but we replaced it with basically eight blades.
Well, it basically depends on what your workload is, but I would definitely recommend getting three or four nodes in the environment. I've been working with a university that doesn't have a lot of funding, so they're standing it up on basically six-year-old Dell nodes that were retired because they were end-of-life, and they're doing a research project on that with a considerable volume of data. So you can run it on anything.
I have an instance of a three-node cluster that runs on my MacBook for when I need to stand it up, test an application, and test failure scenarios. It's a three-node virtual cluster that runs on VirtualBox on my MacBook Pro, and, you know, I can test failure scenarios, I can knock over nodes of the cluster, and it can handle reasonable write volume. There's an SSD in the machine, so it does reasonably well.
I don't have a lot of disk space because of it, but, I mean, you can start very, very small. At home I have three Mac minis that run a cluster. So you don't have to have a lot of expensive hardware; you don't have to have a lot of requirements, depending on your use case, I think.
If you don't want to go through all the trouble of setting up virtual machines and joining them into an actual cluster, there's a great tool out there called CCM, the Cassandra Cluster Manager. It's a very easy-to-use command-line project that will bring up a localized cluster of n nodes on one machine, and you can get started playing.
One of the other things, too, one thing I really like, is that you can use Cassandra embedded very easily if you're developing in Java, like in Tomcat. I actually find it very hard when using MySQL; you can't have an embedded MySQL in Hudson or Jenkins, so you kind of end up hacking in a lot of workarounds, like, I'll use H2, but in H2 the statements aren't exactly the same, and H2 may have a slightly different date-time function. What's nice about Cassandra, for me, is that I'm a Java guy, so I guess I like it this way, but I can bring up my app and Cassandra in a single JVM, test, and tear it down, and I don't need to do anything outside the realm of my JVM, or worry about how Hibernate is going to turn this query on this database and what it means. I can actually test on the real deal.
I mean, if you pull out, you know, the color palette or a histogram, it might make sense to basically store that metadata per row, or you could use a composite table in CQL 3; that might be useful. Also, the graph talk that's going on now probably has, you know, some useful things if you're building a graph of similar features to build a recommendation engine. I know there's a use for that as well.
In an organization that doesn't have Cassandra, and you have an existing application, let's say, for example, it's a MySQL application with memcached or something: what would you consider the best win to get a successful adoption of it? Is it, you know, things you've talked about, like server-side cookies or logging?
What we were using it for previously is basically caching our search results, because we did several common searches that happen on regular windows, so the refresh times were very low. That was a huge win; as a matter of fact, you can watch the presentation from last year's summit. It was a very huge one for us, and basically it helped it grow, and it's continuing to grow its footprint within the Expedia environment.
I haven't used Brisk particularly, but I do use DSE heavily, which is the DataStax edition of Brisk, and it's wonderful; it does exactly what you would expect it to do. We particularly use the Hive portion of it heavily, and it really breaks down the column families for you easily. Especially if you have wide rows, it puts them into a long-and-skinny type table for visualization purposes.
Okay, well, I want to say thanks to everyone, just because we're about five minutes over at this point. If you have any other questions, some of these guys will be available right now, and then we'll have a room full of experts at, I think, 12:50, when lunch starts up. Thank you.