From YouTube: Using Spark Streaming for High Velocity Analytics on Cassandra

Description

1.) With each device comes an implicit contract with the end user: you give us the data, we give you the results. Now. Not tomorrow. Not even fifteen minutes from now.

2.) The flip side of getting data in real time is that users expect results in real time.

3.) In return for being wired into the internet 24x7, customers demand a similar level of responsiveness and even better availability.

__

Spark Streaming and Cassandra form the ideal combination: high-velocity complex event processing (CEP) and analytics paired with a high-velocity, always-on database.

Today's solutions don't scale to the Internet of tomorrow. The always-on nature of the emerging Internet of Things space means you need to process information at previously unseen scale and, more difficult, make sense out of that data.

Cassandra is the leader in large-scale, high-velocity time series workloads. While the Hadoop world has been stuck with legacy "batch analytics" technology, Cassandra users have been increasingly focused on the "now": fast answers to easy questions about your data, at any velocity and any scale. But Cassandra has always been weak on the "complex questions" problem. DataStax integrated with Hadoop to overcome this limitation, but it was always an awkward fit. Slow batch analytics on top of fast-moving data really doesn't do you much good.

But Spark, and in this case Spark Streaming, makes high-velocity streaming analytics at scale easier than ever, much as Cassandra pioneered high-velocity data management at scale.

Hadoop is the right choice for batch analytics. Until recently, nobody really knew what the right solution was for real-time processing. We believe that Spark and Cassandra are the clear answer.

---
Tupshin has been helping Cassandra users and DataStax customers build large-scale, high-velocity applications for years. As both a Solutions Architect and the lead Field Strategist for DataStax, he has seen deployments of every scale and in every sector. Recently specializing in the financial services space, he has worked with numerous banking industry customers to build and refine their mission-critical, enterprise-scale operational data stores based on Cassandra and DataStax Enterprise. After 18 years in the Bay Area start-up scene, he recently packed up and moved to New York City.

Al is a father, technologist, musician, and open source advocate working for DataStax. While attending Central Michigan University as a music major, Al got into MUDs, C, and Linux, eventually ending up with a career as a sysadmin. Over the last 15 years, Al has worked on everything from kernel changes to modern web applications, mostly from inside operations teams. These days he goes by the title Open Source Mechanic, which means he tries to do interesting things with Cassandra and other open source software.