Apache Cassandra Cassandra Summit 2014 Open Meetings

3 Mar 2015

In this talk we'll dive into how Cassandra nodes discover and communicate with each other, and share global state information via gossip. As the gossip subsystem seems shrouded in mystery to many folks, we'll peel back the layers and learn how it powers the underbelly of Cassandra.

1 participant
34 minutes

gossiping

gossip

cassandra

communicate

rumor

disseminating

sharing

networking

protocols

whatnot

10 Oct 2014

Speakers: Seán O Sullivan, Service Reliability Engineer & Tim Czerniak, Software Engineer, at Demonware

This presentation covers the eight-month evaluation process we underwent to migrate some of Call of Duty’s core services from MySQL to Cassandra. We will outline our requirements, the process we followed for the evaluation, decisions we made around our schema, configuration and hardware, and some issues we encountered.

2 participants
31 minutes

server

demon

activision

deploying

hosts

diablo

consoles

dev

cassandra

ware

10 Oct 2014

Speaker: Claudiu Barbura, Senior Director of Engineering at Atigeo

xPatterns is a big data analytics platform-as-a-service that enables rapid development of enterprise-grade analytical applications. It provides tools, API sets and a management console for building an ELT pipeline with data monitoring and quality gates; a data warehouse for ad-hoc and scheduled querying, analysis, model building and experimentation; tools for exporting data to Cassandra and SolrCloud clusters for real-time access through low-latency, high-throughput (automatically generated) APIs; and dashboard and visualization APIs and tools that leverage the available data and models. In this talk I'll share some of the hard lessons we've learned in the past three years while leveraging Cassandra (and Hector) in large-scale enterprise-grade deployments. We will focus on three specific areas in which we identified consistent best practices and design patterns: data model optimization as a result of exporting data from HDFS/Hive/Shark into Cassandra through Spark/Hadoop MR jobs under Mesos with throttling, instrumentation and resilience features; automatically publishing geo-replicated, instrumented and monitored REST APIs on top of the exported Cassandra data; and lessons learned from running Cassandra at scale from 0.6 to 2.0.6, including performance tuning, tips and tricks. You will see live demos of our Publish to NoSql tools (Spark/Shark, Mesos, Hive, Cassandra), a dashboard application built on top of generated data APIs (D3.js, Cassandra) and xPatterns' monitoring and instrumentation consoles (Graphite, Ganglia, Nagios).

4 participants
42 minutes

cassandra

experience

analytics

interface

demos

context

provisioning

intelligent

servers

a3

10 Oct 2014

Speaker: Patrick McFadin, Chief Evangelist of Apache Cassandra at DataStax

A lot has changed since I gave one of these talks and man, has it been good. 2.0 brought us a lot of new CQL features and now with 2.1 we get even more! Let me show you some real life data models and those new features taking developer productivity to an all new high. User Defined Types, New Counters, Paging, Static Columns. Exciting new ways of making your app truly killer!

1 participant
36 minutes

cassandra

conference

summit

discussion

conversations

comments

great

went

join

fiasco

10 Oct 2014

Speaker: John Sumsion, Software Developer at FamilySearch

FamilySearch hosts a collaborative family tree with over a billion editable records. The tree currently serves as many as 10,000 concurrent users at peak weekly load. These users come from across the globe and collectively maintain and enhance the tree around the clock. Recent efforts to port the tree from a relational database to Cassandra have resulted in drastically improved performance and scalability. The database consists of more than 5 billion records in journaled form, and we anticipate having over 10TB of live data available for user view & edit, with that data size growing significantly as our user base grows. The dataset has resisted sharding in the past, so the port involved rethinking the core data model. The model we chose retains the consistency that our users demand and can be implemented without requiring ACID transactions. Specifically, we combined a Convergent and Commutative Replicated Data Type (CvRDT and CmRDT) with Cassandra's atomic batch implementation to form the basis for a consistency model that met the demanding needs of the family tree application.

3 participants
37 minutes

familysearch

genealogical

mormons

ancestors

family

records

wikipedia

summarize

old

cassandra

10 Oct 2014

Speaker: Michael Nelson, Development Manager at FamilySearch

A recent research project at FamilySearch.org pushed Cassandra to very high scale and performance limits in AWS using a real application. Come see how we achieved 250K reads/sec with latencies under 5 milliseconds on a 400-core cluster holding 6 TB of data while maintaining transactional consistency for users. We'll cover tuning of Cassandra's caches, other server-side settings, client driver, AWS cluster placement and instance types, and the tradeoffs between regular & SSD storage.

1 participant
30 minutes

mormons

mormon

cassandra

genealogy

oracle

tweaking

family

tree

million

backlog

10 Oct 2014

Speaker: Ben Vanberg, Senior Software Engineer at FullContact

Here at FullContact we have lots and lots of contact data. In particular we have more than a billion profiles over which we would like to perform ad hoc data analysis. Much of this data resides in Cassandra, and we have many analytics MapReduce jobs that require us to iterate across terabytes of Cassandra data. To solve this problem we've implemented our own splittable input format which allows us to quickly process large SSTables for downstream analytics.

1 participant
36 minutes

contact

contacts

details

leveraging

profile

database

people

implementation

cassandra

helper

10 Oct 2014

Speaker: Mohammed Guller, Application Architect & Lead Developer at Glassbeam

Learn how Cassandra can be used to build a multi-tenant solution for analyzing operational data from Internet of Complex Things (IoCT). IoCT includes complex systems such as computing, storage, networking and medical devices. In this session, we will discuss why Glassbeam migrated from a traditional RDBMS-based architecture to a Cassandra-based architecture. We will discuss the challenges with our first-generation architecture and how Cassandra helped us overcome those challenges. In addition, we will share our next-gen architecture and lessons learned.

1 participant
26 minutes

cassandra

discussion

analytics

insights

audience

taking

iot

management

technical

middleware

10 Oct 2014

Speaker: Dave Gardner, Software Developer at Hailo

Building a successful startup is hard work. This talk will explain how Hailo has succeeded by putting C* front and center, building an award-winning, hundred-million-dollar startup that operates on three continents. I will cover our journey and the challenges involved in migrating from MySQL to Cassandra, moving from a single data center to multiple data centers, and upgrading versions of Cassandra, all with zero downtime. I will talk about the myriad use cases to which we have applied C*, including simple entity storage, time series indexes, and realtime analytics using HyperLogLog. Finally, I'll cover some of the operational and organizational challenges of running Cassandra.

1 participant
39 minutes

halo

cassandra

location

migrations

app

consultancy

immortan

microservice

taxi

launched

10 Oct 2014

Speaker: Adam Zegelin, CTO at Instaclustr

In this presentation we discuss a method of provisioning and running an Apache Cassandra deployment split between multiple heterogeneous data centers which, rather than allocating per-node public IPv4 addresses or configuring mesh VPNs, uses Port Address Translation (PAT) for node↔internet connectivity and is self-configuring and discoverable via DNS Service Discovery (DNS-SD or wide-area Bonjour). While Cassandra has built-in support for AWS EC2 multi-region/data center topologies (via Ec2MultiRegionSnitch, etc), the existing solution requires the wasteful allocation of public IPv4 addresses per node. Additionally, there is little support for topologies that either mix in or deploy completely on alternative infrastructure providers. Our solution uses a single public IP address per data center, is provider-agnostic, doesn’t introduce the configuration and management overheads of a mesh VPN between data centers, and allows nodes to automatically discover each other.

1 participant
30 minutes

datacenter

vpn

cluster

cassandra

connectivity

servers

dcs

multi

deployments

mapping

10 Oct 2014

Speaker: Eiti Kimura, Software Engineer at Movile International

Apache Cassandra was adopted by Movile in 2009 and became a fundamental piece of the robust and scalable architecture that supports more than 50 products used by over 200MM users in Latin America. In this case study we present the architecture of our ring, configuration details, detailed tuning, the hardware used to achieve our performance requirements (on the order of a few milliseconds), information storage strategies for network and disk space optimization, and best practices, and we show how the architecture evolved from simple systems into scalable and distributed platforms. We introduced our cluster with a relatively low number of nodes (6) using commodity hardware to support critical high-performance applications. After this talk, you'll understand how Apache Cassandra was essential to evolve our systems and leverage the growth of our business. Movile is the leading mobile content company in Latin America. Movile’s products include mobile content, mobile TV, mobile learning, mobile games, mobile payment, mobile marketing and mobile commerce. Every month, it publishes content and services to more than 20 million mobile customers. It has grown substantially over the last few years (with a more than 25-fold increase in its revenue over the last five years) both organically and through an aggressive M&A strategy, including five acquisitions in the last five years. Movile is positioning itself as a kind of Silicon Valley company based in Brazil. For the last two years, Movile has been named in the “Great Place to Work” list for technology companies in Brazil. The company shareholders include the founders of the company plus Naspers, a South African media conglomerate.

1 participant
26 minutes

cassandra

app

users

model

servers

business

modern

camera

query

architectures

10 Oct 2014

Speaker: Roopa Tangirala, Senior Cloud Data Architect at Netflix

High availability is an important requirement for any online business. Architecting around failures, expecting infrastructure to fail and remaining highly available even then, is the key to success. One such effort here at Netflix was the Active-Active implementation, where we provided region resiliency. This presentation will give a brief overview of the active-active implementation and how it leveraged Cassandra’s architecture in the backend to achieve its goal. It will cover our journey through A-A from Cassandra’s perspective and the data validation we did to prove the backend would work without impacting customer experience. The various problems we faced, like long repair times and gc_grace settings, plus lessons learned and what we would do differently next time around, will also be discussed.

1 participant
37 minutes

netflix

active

streaming

services

capacity

binge

access

enterprise

amazon

cassandra

10 Oct 2014

Speaker: Puneet Oberai, Senior Software Engineer at Netflix

In this session, we'll give a quick introduction to the Astyanax Java client driver, its powerful features, a comparison to the Java Driver, and what to do with CQL3.

1 participant
34 minutes

cassandra

asthenics

protocol

astx

netflix

sdn

streaming

virtual

services

conversation

10 Oct 2014

Speaker: Harold Nguyen, Senior Data Scientist at Nexgate

In this talk, we focus on a use case, showing how Cassandra can be used to detect spam and spammers on social media. We also show how we use Cassandra to train our 100+ social-media-security classifiers. The accuracy of any security product is directly tied to the breadth of the corpus of data upon which it is built. For Nexgate, this means that the success of our products is inextricably tied to our ability to save everything we've ever scanned, but in a way that is still readily accessible. In the days before NoSQL, this was hard. This talk is about how DataStax and Cassandra make it easy.

1 participant
21 minutes

twitter

profile

monitoring

gate

facebook

enterprise

security

accounts

query

think

10 Oct 2014

Speaker: Tammer Saleh, Director of Product - Cloud Foundry Services at Pivotal

Pivotal is dedicated to bringing best-of-breed data services to Pivotal CF, and there is no other open source data technology with as much potential as Cassandra. We’ll discuss the strategies and techniques for deploying and managing a multi-user Cassandra installation that integrates with Cloud Foundry.

- Making Cassandra manage itself
- Single-tenant versus Multi-tenant usage
- Deploying Cassandra with BOSH
- Cloud Foundry services architecture

1 participant
28 minutes

pivotal

services

servers

deploying

staging

dashboard

cloud

cassandra

datastax

cf

10 Oct 2014

Speaker: Chris Lohfink, Engineer at Pythian

This session will cover a walk-through to provide an understanding of key metrics critical to operating a Cassandra cluster effectively. Without context to the metrics, we just have pretty graphs. With context, we have a powerful tool to determine problems before they happen and to debug production issues more quickly.

1 participant
37 minutes

monitoring

cassandra

analytics

dashboard

important

profiling

data

tuning

bother

proactive

10 Oct 2014

Speaker: Ken Krugler, President of Scale Unlimited

Early Warning has information on hundreds of millions of people and companies. When a person wants to open a new bank account, Early Warning needs to accurately find similar entities in this large dataset to provide a risk assessment. Using the combination of Cassandra & Solr via DSE, it can quickly find and evaluate all reasonable candidates.

1 participant
36 minutes

clients

consulting

account

talked

dealing

applicants

thinking

enterprise

decided

sophisticated

10 Oct 2014

Speakers: Alexander Filipchik & Dustin Pham, Staff Software Engineers at Sony Network Entertainment

Since the launch of the PlayStation 4, many of the PSN features have been delivered using Cassandra. We will be talking about our experience as we launched one of the most popular gaming consoles in the world on well over 300 nodes.
- Why we picked Cassandra
- Exactly what PSN features for PS4 are powered by Cassandra
- The infrastructure used to deploy our clusters
- How we monitor system health
- How we design, test and deploy
- Issues we faced and lessons learned along the way

2 participants
42 minutes

cassandra

playstation

performance

launch

customers

avid

having

san

insights

chatty

10 Oct 2014

Speaker: Les Hazlewood, CTO at Stormpath and the Apache Shiro PMC Chair

In this session Les Hazlewood, the Apache Shiro PMC Chair, will cover Shiro's enterprise session management capabilities, how it can be used across any application (not just web or JEE applications) and how to use Cassandra as Shiro's session store, enabling a distributed session cluster supporting hundreds of thousands or even millions of concurrent sessions. As a working example, Les will show how to set up a session cluster in under 10 minutes using Cassandra. If you need to scale user session load, you won't want to miss this!

3 participants
41 minutes

session

cassandra

server

shiro

users

proxy

datastore

presentations

deploying

querying

10 Oct 2014

Speaker: Dan Cundiff, Technical Architect Consultant at Target Corporation

This presentation will cover the problems we needed to solve, the journey we took to get there, and the lessons we learned along the way. We’ll cover the technical and non-technical aspects of this story.

1 participant
28 minutes

api

providers

servers

consumers

target

operationally

data

enterprise

matters

monitoring

10 Oct 2014

Speaker: Robbie Strickland, Software Development Manager at The Weather Channel

As a reformed CQL critic, I'd like to help dispel the myths around CQL and extol its awesomeness. Most criticism comes from people like me who were early Cassandra adopters and are concerned about the SQL-like syntax, the apparent lack of control, and the reliance on a defined schema. I'll pop open the hood, showing just how the various CQL constructs translate to the underlying storage layer--and in the process I hope to give novices and old-timers alike a reason to love CQL.

1 participant
38 minutes

cassandra

discussion

twc

cq

useful

communicating

comm

data

advancing

weather

10 Oct 2014

Speaker: John Berryman, Data Sadist at VividCortex

CQL3 is the newly ordained, canonical, and best-practice means of interacting with Cassandra. Indeed, the Apache Cassandra documentation itself declares the Thrift API as “legacy” and recommends that CQL3 be used instead. But I’ve heard several people express their concern over the added layer of abstraction. There seems to be an uncertainty about what’s really happening inside of Cassandra. In this presentation we will open up the hood and take a look at exactly how Cassandra is treating CQL3 queries. Our first stop will be the Cassandra data structure. We will briefly review the concepts of keyspaces, column families, rows, and columns, and we will explain where this data structure excels and where it does not. Composite row keys and column names are heavily used with CQL3, so we'll cover their functionality as well. We will then turn to CQL3. I will demonstrate the basic CQL syntax and show how it maps to the underlying data structure. We will see that CQL actually serves as a sort of best-practices interface to the internal Cassandra data structure. We will take this point further by demonstrating CQL3 collections (set, list, and map) and showing how they are really just a creative use of this same internal data structure. Attendees will leave with a clear, inside-out understanding of CQL3 and will be able to use CQL with confidence that they are following best practices.

1 participant
32 minutes

cassandras

cassandra

cql

eql

datastore

databases

api

query

connections

understanding

10 Oct 2014

Speaker: Clinton Kelly, Member of Technical Staff at Wibidata

Cassandra’s scalability and robust feature set make it a natural choice for building personalized Big Data Applications such as product recommenders, personalized search engines and fraud detectors. However, creating such applications requires a lot of time, resources, and expertise to build the additional functionality needed on top of the Cassandra platform. Enter: the Kiji Project. Kiji is an open-source, modular platform that provides developers a head start in building real-time, Big Data Applications on Cassandra. Created by engineers with experience building personalized applications at companies like Google, Cloudera and Amazon, Kiji includes modules for capturing data, analyzing data, training machine learning models, and applying machine learning models in real time. Let their expertise work for you! In this talk, we provide an overview of the Kiji Project, detail a case study of how one Fortune 500 retailer uses Kiji in production for product recommendations, and discuss how Kiji works with Cassandra.

1 participant
34 minutes

kiji

kijirest

kigi

wikidata

personalization

implementation

kit

bigtable

recommender

cassandra
