14 Mar 2016
Speaker: Darryl Kanouse, Senior Director - Consumer Technology
In 2014, with the launch of Call of Duty: Advanced Warfare, Activision released a system for messaging its users with highly personalized and contextually relevant communications designed to enhance the user experience and deepen user engagement. Now, in year two, the system is being extended to serve all Activision titles and to deliver these experiences in reaction to player behaviors. The key to success is Activision's ability to process massive amounts of data in real-time using a data center built around Cassandra as the primary user profile store.
- 3 participants
- 41 minutes

14 Mar 2016
Speaker: Gordon Worley, Senior Software Engineer
At AdStage we have a large volume of data about ads and their relationships: campaigns, ad groups, keywords, bids, budgets, targeting info - the list goes on. We started out storing all this data in Postgres, but even before we reached public beta we were already putting too much strain on the largest Postgres instance we could run. We considered sharding, but instead we decided to embark on a project to store our data in Cassandra. Now, after more than a year of development, we present Monacella, a relational object database that uses Cassandra as the datastore. In this talk we'll examine the architecture of Monacella, its features and use cases, and plans for future development.
- 2 participants
- 32 minutes

14 Mar 2016
Speaker: Chris Burroughs, Engineer
ZFS is an advanced file system, RAID, and volume manager originally developed by Sun Microsystems. 'The Last Word in File Systems' was unavailable on Linux until recently. AddThis uses ZFS to more effectively scale up dedicated hardware, getting twice the performance at half the cost. ZFS is also fundamental to containerization, allowing nodes from multiple clusters to be co-located with safe persistent storage.
- 4 participants
- 36 minutes

14 Mar 2016
Speaker: Adrian Cockcroft, Technology Fellow
The SimianViz microservices simulator contains a model of Cassandra that allows large scale global deployments to be created and exercised by simulating failure modes and connecting the simulation to real monitoring tools to visualize the effects. The simulator is open source Go code at github.com/adrianco/spigo and is developing rapidly.
- 2 participants
- 34 minutes

14 Mar 2016
Speaker: Randy Fradin, Vice President
At BlackRock, we use Apache Cassandra in a variety of ways to help power our Aladdin investment management platform. In this talk I will give an overview of our use of Cassandra, with an emphasis on how we manage multi-tenancy in our Cassandra infrastructure. Multi-tenancy can mean different things to different people, but it often comes with added requirements related to security, isolation, and administration. I'll talk about how we operate (and make changes to) Cassandra to accommodate these needs in our platform.
- 6 participants
- 38 minutes

14 Mar 2016
Speaker: Javed Roshan, Director of Data Services
As a leader in the financial industry, Capital One applications generate huge amounts of data that require fast and accurate handling, storage and analysis. We are transforming how we report operational data to our internal users so that they can make quick and precise business decisions to serve our customers. As part of this transformation, we are building a new Go-based data processing framework that will enable us to transfer data from multiple data stores (RDBMS, files, etc.) to a single NoSQL database - Cassandra. This new NoSQL store will act as a reporting database that will receive data on a near real-time basis and serve the data through scorecards and reports. We would like to share our experience in defining this fast data platform and the methodologies used to model financial data in Cassandra.
- 8 participants
- 41 minutes

14 Mar 2016
Featuring Billy Bosworth, CEO of DataStax; Jonathan Ellis, Apache Cassandra™ Project Chair; and Scott Guthrie, EVP of Microsoft.
The Cassandra Summit 2015 Keynote dives into the continued rise of NoSQL databases, Cassandra 3.0, and live demos featuring the leading distributed database technology, Apache Cassandra.
- 7 participants
- 1:35 hours

14 Mar 2016
Speaker: Matthias Niehoff, IT Consultant
CQRS (Command Query Responsibility Segregation) is a pattern that separates querying data from updating it. A query only returns data without any side effects, whereas a command is designed to change data. CQRS is often combined with Event Sourcing, an architecture in which all changes to an application's state are stored as a sequence of events.
Because of its great capability to store time series data, Cassandra is the perfect fit for implementing the event store. But there are still a lot of open questions: What about the data modeling? What techniques will be used to process and store data in the Cassandra database? How can the current state of the application be accessed without replaying every event? And what about failure handling?
In this talk, I will give a brief introduction to CQRS and the Event Sourcing pattern and will then answer the questions above using a real life example of a data store for customer data.
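The question of reading current state without replaying every event is commonly answered with snapshots. The following is a minimal in-memory sketch of that idea, not the approach presented in the talk: the `Event` and `Snapshot` shapes, the `apply_event` merge rule, and the sample customer data are all hypothetical, and a real event store would read events from Cassandra rather than from a list.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    """One immutable change to a customer record (hypothetical shape)."""
    customer_id: str
    sequence: int   # position in the customer's event stream
    kind: str       # e.g. "name_changed"
    payload: dict


@dataclass
class Snapshot:
    """State folded from all events up to and including `sequence`."""
    sequence: int = 0
    state: dict = field(default_factory=dict)


def apply_event(state: dict, event: Event) -> dict:
    """Return a new state with the event's payload merged in."""
    return {**state, **event.payload}


def current_state(snapshot: Snapshot, events: list[Event]) -> Snapshot:
    """Fold only the events newer than the snapshot into its state."""
    state, seq = dict(snapshot.state), snapshot.sequence
    for event in sorted(events, key=lambda e: e.sequence):
        if event.sequence > seq:
            state, seq = apply_event(state, event), event.sequence
    return Snapshot(seq, state)


# Usage: snapshot after two events, then catch up with the third.
events = [
    Event("c1", 1, "created", {"name": "Ada"}),
    Event("c1", 2, "email_changed", {"email": "ada@example.com"}),
    Event("c1", 3, "name_changed", {"name": "Ada L."}),
]
snap = current_state(Snapshot(), events[:2])
latest = current_state(snap, events)
```

Because events at or below the snapshot's sequence number are skipped, a reader only needs the latest snapshot plus the events written after it.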
- 4 participants
- 30 minutes

14 Mar 2016
Speaker: Daniel Chia, Software Engineer
Like many startups, Coursera began its data storage journey with MySQL, a familiar and industry-proven database. As Coursera's user base grew from several thousand to many millions, we found that MySQL provided limited availability and restricted our ability to scale easily. New product initiatives and requirements provided a perfect opportunity to revisit our choice of core workhorse database.
After evaluating several NoSQL databases, including MongoDB, DynamoDB and HBase, we elected to transition to Cassandra. Cassandra's relative maturity, masterless architecture (for availability), tunable consistency, and stable low-latency performance made it a clear winner for our needs.
Learn more about what it takes to transition from SQL to Cassandra in this talk.
- 6 participants
- 41 minutes

14 Mar 2016
Speaker: Al Tobey, Partner Architect
Al has been using Cassandra since version 0.6 and has spent the last few months doing little else but tune Cassandra clusters. In this talk, Al will show how to tune Cassandra for efficient operation using multiple views into system metrics, including OS stats, GC logs, JMX, and cassandra-stress.
- 1 participant
- 36 minutes

14 Mar 2016
Speaker: Patrick McFadin, Chief Evangelist
You know Cassandra works and can solve a lot of problems, but then you try to design it into your application and things start falling apart. Stop! This is where we need to have some real talk. I've been helping organizations implement Cassandra for years. True story. I can help! It's easy to get lost in the details, but making the switch to Cassandra is a journey of many steps. This will be a system of the next 30 years so take your time, do it right and feel the happiness. It's all there for you. Your health and sanity will be intact and most importantly, your job will be better!
- 1 participant
- 31 minutes

14 Mar 2016
Speaker: Sebastian Estevez, Solutions Architect
The startup program has had over 600 applicants, from over 20 market verticals, leveraging a wide range of DSE features for their use cases. This talk is my best effort to synthesize key learnings to benefit future DSE-powered startups (with major constraints on time and money). The talk will range from design and development to operations and production.
1) Program overview and startup breakdown
2) Development
3) Operations
4) Q&A
- 1 participant
- 40 minutes

14 Mar 2016
Speaker: Mario Lazaro, Big Data - Software Engineer
GumGum relies heavily on Cassandra for storing different kinds of metadata. Currently GumGum reaches 1 billion unique visitors per month using 3 Cassandra datacenters in Amazon Web Services spread across the globe.
This presentation will detail how we scaled out from one local Cassandra datacenter to a multi-datacenter Cassandra cluster and all the problems we encountered and choices we made while implementing it.
How did we architect multi-region Cassandra in AWS? What were our experiences in implementing multi-datacenter Cassandra? How did we achieve low latency with multi-region Cassandra and the DataStax driver? What are the different Cassandra use cases at GumGum? How did we integrate Cassandra with Spark?
- 6 participants
- 41 minutes

14 Mar 2016
Speaker: Chris Fregly, Data Solutions Engineer
The audience will participate in a live, interactive demo that generates high-quality recommendations using the latest Spark-Cassandra integration for real time, approximate, and advanced analytics including machine learning, graph processing, and text processing.
- 2 participants
- 39 minutes

14 Mar 2016
Speaker: Christopher Reedijk, Distributed Applications Engineer, and Gary Stewart, Sr IT Specialist
ING is truly becoming an engineering company. Over the past 2 years ING has been adopting Cassandra fast while also focusing on the customer experience, using a touch-point architecture based more and more on microservices. For most APIs, Cassandra is a perfect match, and it eases availability challenges by being active-active and having an always-on architecture.
However, at ING we have plenty of relatively small use cases, which makes dedicated clusters hard to justify given our aim of avoiding SAN and the hardware choices already made. Therefore a shared cluster is inevitable. We call this KaaS (Keyspace as a Service), and it can be compared to a hotel with tenants. One crucial aspect is that we need to create a sandbox environment for the entire organization, as implementing Cassandra is hard and requires experience and un-learning.
Our talk will share our experiences on how we are making the switch to NoSQL accessible for the whole organization. It is far from easy and we will also talk about how we are addressing containment and a cost model for the hotel rooms.
- 3 participants
- 38 minutes

14 Mar 2016
Speaker: Peter Connolly, Senior Architect
This presentation recounts the story of Macys.com and Bloomingdales.com's migration from legacy RDBMS to NoSQL Cassandra in partnership with DataStax.
One thing that differentiates this talk from others on Cassandra is Macy's philosophy of "doing more with less." You will see why we emphasize the performance tuning aspects of iterative development when you see how much processing we can support on relatively small configurations.
This session will cover:
1) The process that led to our decision to use Cassandra
2) The approach we used for migrating from DB2 & Coherence to Cassandra without disrupting the production environment
3) The various schema options that we tried and how we settled on the current one. We'll show you a selection of some of our extensive performance tuning benchmarks, as well as how these performance results figured into our final schema designs.
4) Our lessons learned and next steps
- 6 participants
- 41 minutes

14 Mar 2016
Speaker: Sean Usher, Software Engineer
We will present our O365 use case scenarios, explain why we chose Cassandra + Spark, and walk through the architecture we chose for running DSE on Azure.
- 4 participants
- 40 minutes

14 Mar 2016
Speaker: Rob Bagby, Cloud Architect
We have the challenge of how to reliably store massive quantities of data that are available even in the face of infrastructure failures. We have similar challenges on the application side. The most successful cloud architectures break applications down into microservices. How then do we deploy, upgrade and manage the scale of those microservices? This session will illustrate how to tackle these challenges by taking advantage of both Cassandra and Microsoft's next generation PaaS infrastructure called Azure Service Fabric.
- 4 participants
- 41 minutes

14 Mar 2016
Speaker: Carlos Alonso, Software Engineer
This talk will be a step by step walkthrough of a developer troubleshooting a real performance issue we had at MyDrive, from the very first steps diagnosing the symptoms, through looking at metric charts down to CQL queries, the Ruby CQL driver, and Ruby code profiling.
- 1 participant
- 36 minutes

14 Mar 2016
Speaker: Christos Kalantzis, Director of Engineering
This talk will cover how Netflix monitors its Cassandra fleet and the steps we take to make sure we can survive even the worst unplanned outages.
- 8 participants
- 40 minutes

14 Mar 2016
Speaker: Michael Laing, Systems Architect
The Internet of Things uses Topics to tag information.
Topics are segmented named channels that are attached when information is sent or stored.
IoT Brokers use Retained Storage to persistently store information by Topic.
Retained Storage is a data store that is searchable using Wildcards in Topics.
Wildcards are reserved characters that match a single level or multiple levels in a Topic.
By externalizing Retained Storage to Cassandra, IoT broker instances can autoscale, potentially handling tens of millions of clients.
This requires efficient queries using Wildcards in Cassandra to access Retained Storage.
I will present strategies for implementing fast Wildcard queries composed of sub-strategies such as:
- Cluster key shuffling and auto-inversion to help determine partition key and narrow the row slice
- Sparse secondary indexes to minimize filtering
- Stratio's Cassandra Lucene Index to augment or replace other sub-strategies
illustrated by comparative benchmarks.
I will further discuss integration with IoT brokers at scale.
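To make the single-level and multi-level wildcard semantics concrete, here is a small in-memory matcher. It is only a sketch under stated assumptions: the abstract does not name the wildcard characters, so the MQTT-style convention is assumed here (`/` separates levels, `+` matches one level, `#` matches all remaining levels), and the function name and sample topics are hypothetical. It does not represent the Cassandra query strategies covered in the talk.

```python
def topic_matches(pattern: str, topic: str) -> bool:
    """Return True if `topic` matches `pattern` level by level.

    Assumes MQTT-style syntax: "/" separates levels, "+" matches exactly
    one level, and "#" (only valid as the last level) matches the rest.
    """
    pattern_levels = pattern.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(pattern_levels):
        if level == "#":
            # Multi-level wildcard: valid only as the final pattern level.
            return i == len(pattern_levels) - 1
        if i >= len(topic_levels):
            # The pattern has more levels than the topic.
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    # Without "#", both sides must have the same number of levels.
    return len(pattern_levels) == len(topic_levels)


# Usage with hypothetical topics.
single = topic_matches("sensors/+/temp", "sensors/kitchen/temp")
multi = topic_matches("sensors/#", "sensors/kitchen/temp")
too_deep = topic_matches("sensors/+", "sensors/kitchen/temp")
```

Matching every stored topic against a pattern in application code scans the whole store, which is why a retained store at scale needs query-side strategies like those listed above.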
- 2 participants
- 39 minutes

14 Mar 2016
Speaker: Donny Nadolny, Scala Developer
Despite being a highly available system, we have had three outages caused by problems with our production Cassandra clusters over the past year. We'll take a look at each of these outages: what we saw from the inside, the actions we took to recover, and most importantly the procedures and monitoring that will help prevent it from happening to you.
- 7 participants
- 34 minutes

14 Mar 2016
Speaker: Paul Rechsteiner, Principal Engineer
Most Cassandra usages take advantage of its exceptional performance and ability to handle massive data sets. At PagerDuty, we use Cassandra for entirely different reasons: to reliably manage mutable application states and to maintain durability requirements even in the face of full data center outages. We achieve this by deploying Cassandra clusters with hosts in multiple WAN-separated data centers, configured with per-data center replica placement requirements, and with significant application-level support to use Cassandra as a consistent datastore. Accumulating several years of experience with this approach, we've learned to accommodate the impact of WAN network latency on Cassandra queries, how to horizontally scale while maintaining our placement invariants, why asymmetric load is experienced by nodes in different data centers, and more. This talk will go over our workload and design goals, detail the resultant Cassandra system design, and explain a number of our unintuitive operational learnings about this novel Cassandra usage paradigm.
- 5 participants
- 44 minutes

14 Mar 2016
Speaker: Aaron Stannard, CTO
The .NET ecosystem spent years on the sidelines, watching the NoSQL and distributed computing movements flourish in ecosystems like Java, Node.js, and others.
Over the past year or so, the .NET ecosystem took matters into its own hands and has feverishly started adopting new ideas like NoSQL, reactive programming, the actor model, and more!
In this talk we're going to explore what the modern .NET enterprise stack looks like: Cassandra, Akka.NET, and Windows Azure. Also, we'll share what exciting new possibilities this has been able to create for some of the largest .NET shops in the world.
- 2 participants
- 42 minutes

14 Mar 2016
Speaker: Ben Laplanche, Product Manager
Companies turn to PaaS and Cloud Native Applications to gain agility and speed. To provide customer value, a fault tolerant infrastructure is essential. But what happens if an entire data centre, region, or even country should go offline?
Cassandra holds the key to keeping application state in sync through replication, whilst Pivotal Cloud Foundry provides easy deployment to multiple IaaS providers. It also comes complete with a managed service offering for DataStax Enterprise.
This talk will discuss how this setup can be deployed in one day, including demonstrations and a walk-through of the key concepts, approaches, and considerations.
- 1 participant
- 29 minutes

14 Mar 2016
Speaker: Harold Nguyen, Data Scientist
Social media has become the new frontier for cyber-attackers. The explosive growth of this new communications platform, combined with the potential to reach millions of people through a single post, has provided a low barrier for exploitation. In this talk, we will focus on how Cassandra is used to enable our fight against bad actors on social media. In particular, we will discuss how we use Cassandra for anomaly detection, social mob alerting, trending topics, and fraudulent classification. We will also speak about our Cassandra data models, integration with Spark Streaming, and how we use KairosDB for our time series data. Watch us don our superhero-Cassandra capes as we fight against the bad guys!
- 6 participants
- 29 minutes

14 Mar 2016
Speaker: Anastasia Zamyshlyaeva, VP Platform - Product Management
Cassandra's flexibility and scalability make it an ideal foundation for a modern data management architecture. Come hear how Reltio is using Cassandra, in combination with graph technologies and Spark, to deliver a new breed of data-driven applications.
In this presentation you'll find out:
- How we ended up selecting Cassandra
- The unique characteristics of data-driven applications
- The best practices we learned by combining Cassandra, graph technology, Spark and more
- 5 participants
- 34 minutes

14 Mar 2016
Speaker: Jerome Louvel
Starting from the persistence needs of an API PaaS, we'll explain how we selected Cassandra and, finally, DSE Search; the main challenges we faced in terms of both development and operations; and the solutions we have implemented.
- 6 participants
- 36 minutes

14 Mar 2016
Speaker: Alexander Filipchik, Principal Software Engineer
It has been 2 years and 20 million+ consoles sold since the PlayStation 4 launch, and Cassandra is still alive and well within our infrastructure. We will cover various aspects of running Cassandra at large scale, share our findings, and discuss some tricks that can make your lives easier. We will share how we handle varying use cases, from batch analytics using Spark to real-time personalized search. And just like before, we will be having a raffle (last time, one lucky attendee walked away with a brand new PS4 Destiny edition!).
- 8 participants
- 47 minutes

14 Mar 2016
Speaker: Amit Bhayani, Co-Founder
Cassandra is commonly perceived as a write-heavy NoSQL database. However, at TeleStax, Cassandra is used even for read operations! It processes millions of SMS messages per day and writes billions of CDRs. The case study shows how TelScale SMSC Gateway leverages Cassandra's scalability and high availability without compromising performance. We will talk about the architecture used to make sure the SMSC can read records in milliseconds and achieve linear scalability. This new-generation SMSC GW offers a 100x better performance/price ratio than traditional SMSCs from incumbent NEPs.
- 4 participants
- 37 minutes

14 Mar 2016
Speaker: Amrith Kumar, Founder and CTO
This presentation is aimed at IT managers, DevOps, or developers in an IT organization, as it presents an in-depth exploration of the architecture and internals of Cassandra databases with OpenStack Trove.
The presentation will start with an overview of the Trove architecture, exploring such concepts as "How does Trove interact with other OpenStack services", "What are the various components of Trove", "What are guest agents", "How are requests to Trove processed", and "How does Trove support multiple database types".
It will next explain how Trove supports the Apache Cassandra NoSQL database and how to deploy and manage a Cassandra database with Trove. It will then describe Trove's framework for implementing clustering and how Cassandra clusters are deployed through Trove.
It helps the participant understand the internals and architecture of Trove and provides the participant with knowledge that would be useful in assessing, deploying and managing a Cassandra database with Trove.
- 4 participants
- 40 minutes

14 Mar 2016
Speaker: Mick Semb Wever, Team Member
Monitoring provides information on system performance; however, tracing is necessary to understand individual request performance. Detailed query tracing has been provided by Cassandra since version 1.2 and is invaluable when diagnosing problems, although knowing what queries to trace and why the application makes them still requires deep technical knowledge. By merging application tracing via Zipkin and Cassandra query tracing, we automate the process and make it easier to identify and resolve problems. In this talk Mick Semb Wever, Team Member at The Last Pickle, will introduce Cassandra query tracing and Zipkin. He will then propose an extension that allows clients to pass a trace identifier through to Cassandra, and a way to integrate Zipkin tracing into Cassandra. Driving all this is the desire to create one tracing view across the entire system.
- 3 participants
- 41 minutes

14 Mar 2016
Speaker: Nate McCall, Co-Founder
Security is always at odds with usability, particularly in the context of operations and development, and more so when dealing with a distributed system such as Apache Cassandra. In this presentation, we'll walk through the steps required to completely secure a Cassandra cluster to meet most regulatory and compliance guidelines.
Topics will include:
- Encrypting cross-DC traffic
- Different types of at-rest disk encryption options available (and how to tune them)
- Configuring SSL for inter-cluster communication
- Configuring SSL between clients and the API
- Configuring and managing client authentication
Attendees will leave this presentation with the knowledge required to harden Cassandra to meet most guidelines imposed by regulations and compliance.
- 4 participants
- 34 minutes

14 Mar 2016
Speaker: Aaron Morton, Co-Founder
Apache Cassandra makes it possible to write code on a laptop and deploy to multi-region clusters with a few configuration changes. But what does it take to create repeatable, scalable, reliable, and observable clusters?
In this talk Aaron Morton, Co-Founder at The Last Pickle and Apache Cassandra Committer, will discuss the tools and techniques they use. From environment planning to implementation with tools such as Chef, Sensu, Graphite, Riemann and LogStash, this will be a discussion of the full stack ecosystem for successful projects.
- 4 participants
- 41 minutes

14 Mar 2016
Speaker: Jianmin Wang, Professor
In this talk, we will share our experiences of applying Cassandra with two real customers in China. In the first use case, we deployed Cassandra at Sany Group, a leading machinery manufacturing company, to manage the sensor data generated by construction machinery. By designing a specific schema and optimizing the write process, we successfully managed over 1.5 billion historical data records and achieved an online write throughput of 10k write operations per second with 5 servers. MapReduce is also used on Cassandra for value-added services, e.g. operations management, machine failure prediction, and abnormal behavior mining. In the second use case, Cassandra is deployed in the China Meteorological Administration to manage meteorological data. We designed a hybrid schema to support both slice queries and time-window-based queries efficiently. We also explored optimized compaction and deletion strategies for meteorological data in this case.
- 4 participants
- 37 minutes

14 Mar 2016
Speaker: John Gao, Manager - Product Development
Online analytical processing (OLAP) enables real-time processing of large multidimensional databases where the potential query is not known a priori. A more lightweight approach is preferred where queries are known and where the data is not hierarchical or not structured in relational sets. The advantage of a NoSQL database such as Cassandra in this lightweight model is that it can be used for caching and pre-aggregation of data sets against known query constructs. In this way, an Apache Cassandra data layer can hide data complexity while decoupling the underlying data sources and consuming solution layers. Vertafore uses Apache Cassandra to create an enterprise platform data abstraction layer and thus provides an analytic solution to both on-premises and online users.
- 2 participants
- 25 minutes

14 Mar 2016
Speaker: Robert Strickland, Director of Software Engineering
We hear a lot about lambda architectures and how Cassandra and Spark can help us crunch our data both in batch and real-time. After a year in the trenches, I'll share how we at The Weather Company built a general-purpose, weather-scale event processing pipeline to make sense of billions of events each day. If you want to avoid much of the pain of learning how to get it right, this talk is for you.
- 8 participants
- 41 minutes

14 Mar 2016
Speaker: Ted Wilmes, Senior Data Warehouse Engineer
The graph database TitanDB, with Cassandra as its backing store, provides a powerful platform for modeling and extracting insights from the connected world of today's internet of things. This talk will briefly cover graph database basics and then dive into IoT-specific use cases with a focus on data modeling and performance considerations.
- 7 participants
- 46 minutes

14 Mar 2016
Speaker: Julien Anguenot, VP Software Engineering at iland
iland has built a global data warehouse across multiple data centers, collecting and aggregating data from core cloud services including compute, storage and network as well as chargeback and compliance. iland's warehouse brings actionable intelligence that customers can use to manipulate resources, analyze trends, define alerts and share information.
In this session, we would like to present the lessons learned around Cassandra, at both the development and operations levels, as well as the technology and architecture we put into action on top of Cassandra, such as Redis, syslog-ng, RabbitMQ, Java EE, etc.
Finally, we would like to share insights on how we are currently extending our platform with Spark and Kafka and what our motivations are.
- 1 participant
- 35 minutes
