Apache Cassandra Cassandra Summit 2013, 26 Jun 2013

From YouTube: C* Summit 2013: Splunk + Cassandra = New Value to Business

Description

Speaker: Eddie Satterly, Chief Big Data Evangelist at Splunk
Slides: http://www.slideshare.net/planetcassandra/cassandra-summit2013-eddie
The session will demonstrate Splunk integration with Cassandra today and discuss more concepts for integrations to come in the future.

A

Alright guys, I'm Eddie Satterly, I'm chief evangelist at Splunk. In my previous life I built a pretty large big data solution for a large online travel company, including a lot of Cassandra, about 144 nodes, as well as a couple of petabytes of Hadoop and about six and a half terabytes per day ingested into Splunk, so I have a little passion for a whole lot of these things. I'm involved pretty heavily with the Cassandra community when I can, and part of me being at Splunk means I get to help

A

do some cool new things, do some integrations. I think one of the biggest complaints that at least my internal guys had when I was building Cassandra was being able to do searches from some kind of UI that worked, as well as being able to pull everything together and explore the configurations. So I built this little prototype app

A

last year. A couple of people have been running it for a little while, but we'll now have a supported app released within the next month or so, so I thought I'd show it off here. You'll be able to find it on Splunkbase as one of the Splunk apps within a month.

A

So I'll do a quick talk and then we'll jump right into the demo. Basically, Splunk has a lot of capabilities, for those of you who don't know it: you can do searches, you can create monitors and alerts, build dashboards. It's very heavily used [inaudible]; we've got, I think, 5,400 or 5,500 customers, enterprise customers at this point, around the world, using it for a ton of different use cases. But there's still a whole lot of data that doesn't fit really well in Splunk. Splunk is not an OLTP data store.

A

It doesn't handle that kind of transaction load. It wasn't built for that. Cassandra is in a lot of our customer accounts, our larger ones especially. They're using it for an OLTP store, and it'd be great

A

to get some of that information. Like at my old job, where we had this really nice search thing powered by Cassandra, it was great to pull some of that stuff out and combine it with something else. Hadoop is also a great place to store a bunch of stuff and do some computations over time and be able to pull it back in. So we're basically moving to supporting these data stores as a way to be able to get data sets and be able to search against them. There was a Hadoop Connect app released last year

A

that's going to be iterated on in the near future, as well as my Cassandra Connect app, and DB Connect is out as well for the people who still use those relational database things. So I'm going to go to the demo, which is far more interesting than me talking. So, quickly, we'll start off on this side. This is a functioning, normal Splunk UI dashboard. Oh nice, it works even less that way.

A

So we'll just look at exploration first. What I've done is configured this to where it goes out and pulls all the information from the system tables that are available to tell you exactly what the configs of your current system are. So in this case you can look and see what the available keyspaces are that exist out there. This is the flights sample data set that's loaded, so we can do some big searches.
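The kind of lookup described here can be sketched in Python. This is a minimal illustration, not the app's actual code. It assumes a Cassandra 1.2-era cluster, where schema metadata is readable from the `system.schema_keyspaces` and `system.schema_columnfamilies` tables, and it assumes query results arrive as plain dictionaries. The helper `group_by_keyspace`, the keyspace name `schema1`, and the sample rows are hypothetical.

```python
from collections import defaultdict

# CQL that a Cassandra 1.2-era cluster accepts for reading schema metadata.
# (Later Cassandra versions moved this data to the system_schema keyspace.)
LIST_KEYSPACES = "SELECT keyspace_name FROM system.schema_keyspaces"
LIST_COLUMN_FAMILIES = (
    "SELECT keyspace_name, columnfamily_name FROM system.schema_columnfamilies"
)


def group_by_keyspace(rows):
    """Group column family names under their keyspace, sorted for display."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["keyspace_name"]].append(row["columnfamily_name"])
    return {ks: sorted(cfs) for ks, cfs in sorted(grouped.items())}


# Hypothetical rows, shaped like the result of LIST_COLUMN_FAMILIES.
sample_rows = [
    {"keyspace_name": "flights", "columnfamily_name": "flights"},
    {"keyspace_name": "schema1", "columnfamily_name": "users"},
]

for keyspace, column_families in group_by_keyspace(sample_rows).items():
    print(keyspace, column_families)
```

Keeping the grouping separate from the query means the same function works whichever client library actually fetches the rows.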

A

I created something called schema one just to create a little user table to show what a small one would look like, and then I have all the column families that are created under them, all in one place. You can look at any of them and drill down into them from there. We'll go to the schema lookup, so you can basically pull up a search from here that will give you all the information again about the schema. These are all custom search commands that were built using Python extensions within the Splunk UI.

A

So DB schema is a custom search command for Splunk, and if I wanted to use one that already existed, some command out there that is built into Splunk, I

A

can just do a quick count by keyspace when you're there, right? So any of the things that work in the Splunk pipeline, all the analytical functions, all the 300-plus search commands, will work in conjunction with commands returning data back from Splunk and Cassandra together.

A

If we go look into a column family, and this is the flights column family, it goes into the full definition, so you can see exactly what the names and your indexes are. These are all created with CQL. So you have all the information; you can go look at the names of the columns you want to look for.

A

So if you're writing a search and you need to know which particular columns you want, you can find out all the columns that are available from the exploration side. Same thing, I'll show you a simpler one that was built with pycassa. So if we look at the schema for the user table, it's very, very simple: it's age, last name, first name, and age is an index. So you'll be able to go ahead and do explorations. These are all live commands; again, they're hitting directly against the system

A

column families. If I wanted to do a get keys, and I'm doing this on the small store because I'm sitting on a Mac with three VMs running, it'll give you all the keys that are available out there. If they're not textual keys, you'll be able to get them all. So it'll return back the keys, so you can do a search by keys using DB lookup.

A

I also have configs, so I'm also indexing all the lines of the YAML, so you can go look by host and see exactly what your current configuration is.
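One use of having each host's configuration lines indexed is spotting settings that differ between nodes. The following is a rough sketch of that idea in plain Python, not the app's implementation. It assumes each host's `cassandra.yaml` text is already available as a string, and it only reads simple top-level `key: value` lines rather than parsing YAML properly. The function names and the two sample hosts are hypothetical.

```python
def parse_settings(yaml_text):
    """Extract top-level `key: value` scalars from cassandra.yaml-style text."""
    settings = {}
    for line in yaml_text.splitlines():
        # Naive comment stripping: this also truncates values containing '#'.
        stripped = line.split("#", 1)[0].rstrip()
        # Skip blanks, comments, nested (indented) entries and list items.
        if not stripped or line[0] in " \t-" or ":" not in stripped:
            continue
        key, _, value = stripped.partition(":")
        settings[key.strip()] = value.strip()
    return settings


def settings_that_differ(configs_by_host):
    """Return {setting: {host: value}} for settings not identical on all hosts."""
    parsed = {host: parse_settings(text) for host, text in configs_by_host.items()}
    all_keys = set().union(*(s.keys() for s in parsed.values()))
    differing = {}
    for key in sorted(all_keys):
        values = {host: s.get(key) for host, s in parsed.items()}
        if len(set(values.values())) > 1:
            differing[key] = values
    return differing


# Hypothetical two-node example.
configs = {
    "node1": "cluster_name: 'Test Cluster'\nconcurrent_writes: 32\n",
    "node2": "cluster_name: 'Test Cluster'\nconcurrent_writes: 64\n",
}
print(settings_that_differ(configs))
```

A setting missing on one host shows up with the value `None` for that host, which is usually worth flagging as well.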

A

If you need to look for a particular configuration, one of those things that we were just talking about troubleshooting in the last lightning talk, you'll be able to quickly pop through here and look at all the things that are in your Cassandra YAML file by each server in the cluster. I'll jump to troubleshooting real quick, to save the best one for last. So from a troubleshooting perspective, like I said before, I ran this environment, so I know what was important to me; it may not be what's important to everybody.

A

It's very easy within Splunk to add more, but there's pulling up warnings, pulling up errors, looking at your Cassandra system log, and looking at the Cassandra log, which is empty because, unfortunately, nothing happens on this box. But you can see your flushes in the last 24 hours and compactions run in the last 24 hours by host. In this case I only have one of my hosts up, so you're only seeing one host, but if there were a number of hosts in your cluster, they'd all show up here, like when I run against my AWS cluster.

A

So all this is already built. Again, you can add to these any way you want; you can search through them, drill through. Oh yeah, and I actually built some other little searches to go count the number of errors, for instance, using the extended commands, or the number of messages, rather. So this host in the last 24 hours has given you sixteen hundred messages. But this is what's actually interesting: you have the ability to issue a full CQL search using the complete CQL language from within here.

A

So you pipe to DB CQL and issue the CQL command in quotes, separated, and it will parse out the whole thing. If you put 10 or 15 in there, it'll parse them all and return back the results with all the fields. It puts it all into one nice interface, which will let you combine it with data from something else.
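Parsing several statements out of one argument can be sketched as follows. The talk does not say which separator the command expects, so this example assumes semicolons, and it treats single-quoted CQL string literals as opaque so that a semicolon inside a value does not split a statement. The function name and sample statements are hypothetical, and this is not the app's parser.

```python
def split_cql_statements(text):
    """Split `text` into CQL statements on semicolons outside string literals."""
    statements = []
    current = []
    in_string = False
    for ch in text:
        if ch == "'":
            # CQL escapes a quote inside a literal by doubling it (''), which
            # toggles twice here and so leaves us inside the literal.
            in_string = not in_string
        if ch == ";" and not in_string:
            statement = "".join(current).strip()
            if statement:
                statements.append(statement)
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


batch = "SELECT origin FROM flights LIMIT 10; SELECT * FROM users WHERE last_name = 'a;b'"
for statement in split_cql_statements(batch):
    print(statement)
```

Each statement could then be executed in turn and its rows emitted with the column names as field names, which is what makes the results combinable with other data downstream.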

A

So in this search pipeline, I can also go look at my network logs from my devices that are stored in Splunk, or whatever else, and be able to pull it all into one place for a more complicated search. So, doing another CQL search, just pulling back origin, because I looked and saw that origin is the very first field when I was looking through my configuration information. This will give you all the origins.

A

If we pop back over really quickly to the configuration of the actual flights column family, I can see that another field down here is year. So I can go right back, and I'm going to do it the cheating way because it's faster, go right back here and say: okay, I don't really want origin, I want year. I just change the command, and

A

when it's done searching it'll return back. Year is apparently blank, but if we pick another one of the fields that actually has a value in it, you'd be able to go really quickly and just see exactly what's there. Like I showed you before, you can do counts, so you know there are three rows; it'll show you there are three in the stats count. Any of the functions work. I mean, I kept it pretty basic, but any of the functions that are available.

A

Also, in this case, if I decided I don't want my flights table anymore, I can, because I'm logged in as admin on this box. I have full administrator privileges, which roll down to my Cassandra configuration as well. So I have the ability to just change this and say, you know, use flights, drop flights. It will work.

A

You can choose that. You can comment it out; it's pretty easy in the files that are part of the application to tell exactly where to get rid of that if you don't want it to work at all. I've just set it up to where it works for an administrator role: if you're in the admin group from role-based access security here, it's going to give you access to those commands.
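A role check of this kind might look like the sketch below. It illustrates the general idea only: the talk does not describe how the app decides which statements need the admin role, so the list of restricted verbs, the function name `may_run`, and the rule that non-admin users may still run read-only statements are all assumptions made for the example.

```python
import re

# Assumed policy: statements starting with these verbs change schema or data.
RESTRICTED = re.compile(
    r"^\s*(DROP|TRUNCATE|ALTER|CREATE|INSERT|UPDATE|DELETE)\b", re.IGNORECASE
)


def may_run(statement, roles, admin_role="admin"):
    """Allow every statement for admins, and only unrestricted ones otherwise."""
    if admin_role in roles:
        return True
    return RESTRICTED.match(statement) is None


print(may_run("DROP TABLE flights", ["user"]))
print(may_run("DROP TABLE flights", ["admin"]))
print(may_run("SELECT origin FROM flights", ["user"]))
```

Checking the leading verb is a coarse filter; a real deployment would also rely on Cassandra's own permissions rather than on the UI alone.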

A

You can create any... well, anything you can do from CQL, you can do from here. So you could create a couple of new column families really quickly, create a new keyspace, do a new configuration, however you wanted to. Everything will be able to be done from within the one UI from a CQL perspective. There's also a DB lookup command, which is another pipe to DB

A

lookup, which will let you use pycassa to go retrieve data from column families that were not created in a way which can be accessed by CQL 3. So we have pycassa in here as well. I don't use it much, but that's how we can pull all the data sets in. It's all pretty simple

A

if you know how Splunk apps work. Everything is literally just combined within the app, so there's a whole bunch of Python commands that run all these different things, whether it's DB CQL, or DB discover, which is how we looked at the entire configuration of the cluster, or whether it's DB get keys, where you pass it a keyspace and column family combination

A

and it goes and gets all the keys, all the row keys, for you. Be careful with that one: with the flights database, my Cassandra cluster fell over yesterday, because that's several hundred thousand rows and it didn't like that in a 2 GB heap on my little VM. With DB insert, I can actually stream, and I was actually working with Matt Dennis on solving this problem down in Austin a few months ago, but it also has the capability to take a set of results that come from a Splunk search, pipe them to DB insert, and write them directly into a column family that you already know.
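Writing search results into a column family comes down to turning each result row into an insert. The sketch below shows one way to do that in Python. It assumes each result is a flat dictionary of field names to values, and that the target is reachable through CQL with `?` bind markers as used by prepared statements. `build_insert`, the `schema1.users` target, and the sample row are hypothetical, and this is not how the app's insert command is implemented.

```python
import re

# Unquoted CQL identifiers: a letter followed by letters, digits or underscores.
IDENTIFIER = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def build_insert(keyspace, column_family, row):
    """Return (cql, values) for inserting one result row via bind markers."""
    names = sorted(row)
    if not names:
        raise ValueError("row has no fields to insert")
    # Identifiers cannot be bound as parameters, so validate them instead.
    for name in [keyspace, column_family] + names:
        if not IDENTIFIER.match(name):
            raise ValueError("unsafe identifier: %r" % name)
    cql = "INSERT INTO %s.%s (%s) VALUES (%s)" % (
        keyspace,
        column_family,
        ", ".join(names),
        ", ".join("?" for _ in names),
    )
    return cql, [row[name] for name in names]


# Hypothetical search result row.
cql, values = build_insert(
    "schema1", "users", {"last_name": "Smith", "first_name": "Ann", "age": 42}
)
print(cql)
print(values)
```

Binding values instead of formatting them into the statement text keeps quoting problems out of the generated CQL; the statement can be prepared once and reused for every row in the result set.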

A

So we'll also be able to stream data out and into a column family to use as a reference table or lookup, whatever you want. Some of this other stuff is just there because it has to be. DB schema is the way that we looked inside the column family, or inside the keyspace at what the column family configurations were. So literally anything with a DB command is just a custom command that's built within the app. These are all, you know,

A

everything is open within our apps, so you can take it and break this apart, do what you want with it, but these will all be here. It'll basically let you do full searches, and it'll let you merge the searches with data that lives somewhere else. As for showing the config, the actual Cassandra config: this is actually stored in a Cassandra index, so I'm picking up a file. It's actually in a Splunk index, rather. If you look at index equals cass conf, that's

A

the configuration index that comes as part of this app. When you run this, it's going to only have one config.

A

I haven't made any config changes in seven days, but if I'd made config changes, I could look by time and see how many things were deltas from one time to another. So when you're going out and trying to figure out whether or not something's turned on or off, and when it was turned on or turned off, you can also use the alerting engine to quickly say: if anything in the cass conf index changes,

A

you know, alert me. You can do what I used to do and have it kick Puppet and go put everything back. There are lots of ways that you can do it, but it will let you do just about anything you need to do from one place now. So I will open it up to questions. Oh, I'll

B

actually put this slide up, since it's got my information on it too, so you can see that, and then we'll open it up for questions.

B

No, I should probably talk

A

about this too. So this is what's in here now: we actually have the pycassa support and CQL v3 support. All the stuff that I showed you is kind of on the left.

A

What's coming in v1.1 is actual role-based access integration with DSE 3 and being able to provide API integration for OpsCenter, so that if you want to view all of your OpsCenter metrics within here, we can quickly make a call out and just show you a summary of how everything's going. We're not trying to replace the functionality, just to be able to present it in one UI if you want it that way. And

A

right now, column family creation is possible, but only with CQL, and we'll be able to do it with or without CQL. The v1.1 will be released sometime this fall. Like I said, the version that I just showed you today, and all the functions that are there, are going to be released within the next month. I'm getting support

B

to take it over, so it'll have full enterprise support along with all of our other applications.

A

So that's me and ways to contact me. If anyone has any questions, feel free to ask, or come up and catch me if you don't want to talk publicly.

A

Okay, well, everybody just understood that fully, so no questions. That's great, thanks.