Apache Cassandra / Cassandra Summit 2014

These are all the meetings we have in "Cassandra Summit 2014" (part of the organization "Apache Cassandra"). Click into individual meeting pages to watch the recording and search or read the transcript.

3 Mar 2015

In this talk we'll dive into how Cassandra nodes discover and communicate with each other, and share global state information via gossip. As the gossip subsystem seems shrouded in mystery to many folks, we'll peel back the layers and learn how it powers the underbelly of Cassandra.
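As a rough illustration of the idea behind gossip-based state sharing, here is a minimal Python sketch. It is not Cassandra's implementation: the `merge` and `gossip_round` helpers, the endpoint names, and the status strings are all hypothetical. The only assumption borrowed from Cassandra is that each endpoint's state carries a generation and a version, and that the higher pair wins when two views are reconciled.

```python
import random

# A view maps endpoint -> (generation, version, status).
# Endpoint names and statuses are illustrative assumptions.

def merge(local, remote):
    """Copy into `local` every entry from `remote` that is newer."""
    for endpoint, entry in remote.items():
        known = local.get(endpoint)
        # Compare (generation, version): a higher pair means newer state.
        if known is None or entry[:2] > known[:2]:
            local[endpoint] = entry

def gossip_round(views, pick_peer=None):
    """Each node performs one two-way exchange with a chosen peer."""
    if pick_peer is None:
        pick_peer = lambda node, others: random.choice(others)
    for node in sorted(views):
        others = [n for n in sorted(views) if n != node]
        peer = pick_peer(node, others)
        merge(views[node], views[peer])
        merge(views[peer], views[node])

# Every node starts out knowing only about itself.
views = {name: {name: (1, 1, "NORMAL")} for name in ("a", "b", "c")}
for _ in range(5):
    gossip_round(views)
print({name: sorted(view) for name, view in views.items()})
```

Because peers are picked at random, the number of rounds needed before every node knows about every other node varies from run to run; passing a deterministic `pick_peer` makes the behavior reproducible.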
  • 1 participant
  • 34 minutes
gossiping
gossip
cassandra
communicate
rumor
disseminating
sharing
networking
protocols
whatnot

10 Oct 2014

Speakers: Seán O Sullivan, Service Reliability Engineer & Tim Czerniak, Software Engineer, at Demonware

This presentation covers the eight-month evaluation process we underwent to migrate some of Call of Duty’s core services from MySQL to Cassandra. We will outline our requirements, the process we followed for the evaluation, decisions we made around our schema, configuration and hardware, and some issues we encountered.
  • 2 participants
  • 31 minutes
server
demon
activision
deploying
hosts
diablo
consoles
dev
cassandra
ware

10 Oct 2014

Speaker: Claudiu Barbura, Senior Director of Engineering at Atigeo

xPatterns is a big data analytics platform-as-a-service that enables rapid development of enterprise-grade analytical applications. It provides tools, API sets and a management console for building an ELT pipeline with data monitoring and quality gates; a data warehouse for ad-hoc and scheduled querying, analysis, model building and experimentation; tools for exporting data to Cassandra and SolrCloud clusters for real-time access through automatically generated low-latency, high-throughput APIs; and dashboard and visualization APIs and tools that leverage the available data and models. In this talk I'll share some of the hard lessons we've learned in the past three years while leveraging Cassandra (and Hector) in large-scale enterprise-grade deployments. We will focus on three areas in which we identified consistent best practices and design patterns: data model optimization as a result of exporting data from HDFS/Hive/Shark into Cassandra through Spark/Hadoop MR jobs under Mesos with throttling, instrumentation and resilience features; automatically publishing geo-replicated, instrumented and monitored REST APIs on top of the exported Cassandra data; and lessons learned from running Cassandra at scale from 0.6 to 2.0.6, including performance tuning, tips and tricks. You will see live demos of our Publish to NoSql tools (Spark/Shark, Mesos, Hive, Cassandra), a dashboard application built on top of generated data APIs (D3.js, Cassandra) and xPatterns' monitoring and instrumentation consoles (Graphite, Ganglia, Nagios).
  • 4 participants
  • 42 minutes
cassandra
experience
analytics
interface
demos
context
provisioning
intelligent
servers
a3

10 Oct 2014

Speaker: Patrick McFadin, Chief Evangelist of Apache Cassandra at DataStax

A lot has changed since I gave one of these talks and man, has it been good. 2.0 brought us a lot of new CQL features and now with 2.1 we get even more! Let me show you some real life data models and those new features taking developer productivity to an all new high. User Defined Types, New Counters, Paging, Static Columns. Exciting new ways of making your app truly killer!
  • 1 participant
  • 36 minutes
cassandra
conference
summit
discussion
conversations
comments
great
went
join
fiasco

10 Oct 2014

Speaker: John Sumsion, Software Developer at FamilySearch

FamilySearch hosts a collaborative family tree with over a billion editable records. The tree currently serves as many as 10,000 concurrent users at peak weekly load. These users come from across the globe and collectively maintain and enhance the tree around the clock. Recent efforts to port the tree from a relational database to Cassandra have resulted in drastically improved performance and scalability. The database consists of more than 5 billion records in journaled form, and we anticipate having over 10TB of live data available for user view & edit, with that data size growing significantly as our user base grows. The dataset has resisted sharding in the past, so the port involved rethinking the core data model. The model we chose retains the consistency that our users demand and can be implemented without ACID transactions. Specifically, it combines a Convergent and Commutative Replicated Data Type (CvRDT and CmRDT) with Cassandra's atomic batch implementation to meet the demanding needs of the family tree application.
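To make the CvRDT idea concrete, here is a minimal Python sketch of a state-based (convergent) replicated data type: a grow-only set whose merge is a set union. This is a textbook example, not the FamilySearch data model, which the abstract does not describe; the `GSet` class and the sample record names are hypothetical. The point is that a merge which is commutative, associative and idempotent lets replicas converge regardless of the order in which they exchange state.

```python
class GSet:
    """Grow-only set: a simple state-based CRDT (illustrative only)."""

    def __init__(self, items=()):
        self.items = frozenset(items)

    def add(self, item):
        # Updates only ever grow the state.
        return GSet(self.items | {item})

    def merge(self, other):
        # Union is commutative, associative and idempotent,
        # so replicas converge whatever the merge order.
        return GSet(self.items | other.items)

    def __eq__(self, other):
        return isinstance(other, GSet) and self.items == other.items

    def __hash__(self):
        return hash(self.items)


# Two replicas accept different edits, then exchange state.
replica_a = GSet().add("person:1").add("person:2")
replica_b = GSet().add("person:3")
merged_ab = replica_a.merge(replica_b)
merged_ba = replica_b.merge(replica_a)
print(merged_ab == merged_ba)
```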
  • 3 participants
  • 37 minutes
familysearch
genealogical
mormons
ancestors
family
records
wikipedia
summarize
old
cassandra

10 Oct 2014

Speaker: Michael Nelson, Development Manager at FamilySearch

A recent research project at FamilySearch.org pushed Cassandra to very high scale and performance limits in AWS using a real application. Come see how we achieved 250K reads/sec with latencies under 5 milliseconds on a 400-core cluster holding 6 TB of data while maintaining transactional consistency for users. We'll cover tuning of Cassandra's caches, other server-side settings, client driver, AWS cluster placement and instance types, and the tradeoffs between regular & SSD storage.
  • 1 participant
  • 30 minutes
mormons
mormon
cassandra
genealogy
oracle
tweaking
family
tree
million
backlog

10 Oct 2014

Speaker: Ben Vanberg, Senior Software Engineer at FullContact

Here at FullContact we have lots and lots of contact data. In particular we have more than a billion profiles over which we would like to perform ad hoc data analysis. Much of this data resides in Cassandra, and we have many analytics MapReduce jobs that require us to iterate across terabytes of Cassandra data. To solve this problem we've implemented our own splittable input format which allows us to quickly process large SSTables for downstream analytics.
  • 1 participant
  • 36 minutes
contact
contacts
details
leveraging
profile
database
people
implementation
cassandra
helper

10 Oct 2014

Speaker: Mohammed Guller, Application Architect & Lead Developer at Glassbeam

Learn how Cassandra can be used to build a multi-tenant solution for analyzing operational data from Internet of Complex Things (IoCT). IoCT includes complex systems such as computing, storage, networking and medical devices. In this session, we will discuss why Glassbeam migrated from a traditional RDBMS-based architecture to a Cassandra-based architecture. We will discuss the challenges with our first-generation architecture and how Cassandra helped us overcome those challenges. In addition, we will share our next-gen architecture and lessons learned.
  • 1 participant
  • 26 minutes
cassandra
discussion
analytics
insights
audience
taking
iot
management
technical
middleware

10 Oct 2014

Speaker: Dave Gardner, Software Developer at Hailo

Building a successful startup is hard work. This talk will explain how Hailo has succeeded by putting C* front and center, building an award-winning, hundred-million-dollar startup that operates on three continents. I will cover our journey from MySQL to Cassandra and the challenges involved in migrating between the two, in moving from a single data center to multiple data centers, and in upgrading versions of Cassandra, all with zero downtime. I will talk about the myriad use cases to which we have applied C*, including simple entity storage, time series indexes, and realtime analytics using HyperLogLog. Finally I'll cover some of the operational and organizational challenges of running Cassandra.
  • 1 participant
  • 39 minutes
halo
cassandra
location
migrations
app
consultancy
immortan
microservice
taxi
launched

10 Oct 2014

Speaker: Adam Zegelin, CTO at Instaclustr

In this presentation we discuss a method of provisioning and running an Apache Cassandra deployment split between multiple heterogeneous data centers which, rather than allocating per-node public IPv4 addresses or configuring mesh VPNs, uses Port Address Translation (PAT) for node↔internet connectivity and is self-configuring and discoverable via DNS Service Discovery (DNS-SD or wide-area Bonjour). While Cassandra has built-in support for AWS EC2 multi-region/data center topologies (via Ec2MultiRegionSnitch, etc.), the existing solution requires the wasteful allocation of a public IPv4 address per node. Additionally, there is little support for topologies that are either a mix of or deployed completely on alternative infrastructure providers. Our solution uses a single public IP address per data center, is provider-agnostic, doesn’t introduce the configuration and management overheads of a mesh VPN between data centers, and allows nodes to automatically discover each other.
  • 1 participant
  • 30 minutes
datacenter
vpn
cluster
cassandra
connectivity
servers
dcs
multi
deployments
mapping

10 Oct 2014

Speaker: Eiti Kimura, Software Engineer at Movile International

Apache Cassandra was adopted by Movile in 2009 and became a fundamental piece of the robust and scalable architecture that supports more than 50 products, used by over 200 million users in Latin America. In this case study we present the architecture of our ring, configuration details, detailed tuning, the hardware used to achieve our performance requirements (on the order of a few milliseconds), information storage strategies for network and disk space optimization, and best practices, and we show how the architecture evolved from simple systems into scalable, distributed platforms. We introduced our cluster with a relatively low number of nodes (6) using commodity hardware to support critical high-performance applications. After this talk, you'll understand how Apache Cassandra was essential to evolve our systems and leverage the growth of our business. Movile is the leading mobile content company in Latin America. Movile’s products include mobile content, mobile TV, mobile learning, mobile games, mobile payment, mobile marketing and mobile commerce. Every month, it publishes content and services to more than 20 million mobile customers. It has grown substantially over the last few years (with a more than 25-fold increase in its revenue over the last five years) both organically and through an aggressive M&A strategy, including five acquisitions in the last five years. Movile is positioning itself as a kind of Silicon Valley company based in Brazil. For the last two years, Movile has been named in the “Great Place to Work” list for technology companies in Brazil. The company shareholders include the founders of the company plus Naspers, a South African media conglomerate.
  • 1 participant
  • 26 minutes
cassandra
app
users
model
servers
business
modern
camera
query
architectures

10 Oct 2014

Speaker: Roopa Tangirala, Senior Cloud Data Architect at Netflix

High availability is an important requirement for any online business. Expecting infrastructure to fail, architecting around those failures, and remaining highly available even then is the key to success. One such effort here at Netflix was the Active-Active implementation, in which we provided region resiliency. This presentation will give a brief overview of the active-active implementation and how it leveraged Cassandra’s architecture in the backend to achieve its goal. It will cover our journey through A-A from Cassandra’s perspective and the data validation we did to prove the backend would work without impacting customer experience. The various problems we faced, like long repair times and gc_grace settings, plus lessons learned and what we would do differently next time around, will also be discussed.
  • 1 participant
  • 37 minutes
netflix
active
streaming
services
capacity
binge
access
enterprise
amazon
cassandra

10 Oct 2014

Speaker: Puneet Oberai, Senior Software Engineer at Netflix

In this session, we'll cover a quick introduction to the Astyanax Java client driver, its powerful features, a comparison to the Java Driver, and what to do with CQL3.
  • 1 participant
  • 34 minutes
cassandra
asthenics
protocol
astx
netflix
sdn
streaming
virtual
services
conversation

10 Oct 2014

Speaker: Harold Nguyen, Senior Data Scientist at Nexgate

In this talk, we focus on a use case by showing how Cassandra can detect spam and spammers on social media. We also show how we use Cassandra to train our 100+ social-media-security classifiers. The accuracy of any security product is directly tied to the breadth of the corpus of data upon which it is built. For Nexgate, this means that the success of our products is inextricably tied to our ability to save everything we've ever scanned, but in a way that is still readily accessible. In the days before NoSQL, this was hard. This talk is about how DataStax and Cassandra make it easy.
  • 1 participant
  • 21 minutes
twitter
profile
monitoring
gate
facebook
enterprise
security
accounts
query
think

10 Oct 2014

Speaker: Tammer Saleh, Director of Product - Cloud Foundry Services at Pivotal

Pivotal is dedicated to bringing best-of-breed data services to Pivotal CF, and there is no other open source data technology with as much potential as Cassandra. We’ll discuss the strategies and techniques for deploying and managing a multi-user Cassandra installation that integrates with Cloud Foundry.

- Making Cassandra manage itself
- Single-tenant versus Multi-tenant usage
- Deploying Cassandra with BOSH
- Cloud Foundry services architecture.
  • 1 participant
  • 28 minutes
pivotal
services
servers
deploying
staging
dashboard
cloud
cassandra
datastax
cf

10 Oct 2014

Speaker: Chris Lohfink, Engineer at Pythian

This session will cover a walk-through to provide an understanding of key metrics critical to operating a Cassandra cluster effectively. Without context to the metrics, we just have pretty graphs. With context, we have a powerful tool to determine problems before they happen and to debug production issues more quickly.
  • 1 participant
  • 37 minutes
monitoring
cassandra
analytics
dashboard
important
profiling
data
tuning
bother
proactive

10 Oct 2014

Speaker: Ken Krugler, President of Scale Unlimited

Early Warning has information on hundreds of millions of people and companies. When a person wants to open a new bank account, Early Warning needs to accurately find similar entities in this large dataset in order to provide a risk assessment. Using the combination of Cassandra & Solr via DSE, they can quickly find and evaluate all reasonable candidates.
  • 1 participant
  • 36 minutes
clients
consulting
account
talked
dealing
applicants
thinking
enterprise
decided
sophisticated

10 Oct 2014

Speakers: Alexander Filipchik & Dustin Pham, Staff Software Engineers at Sony Network Entertainment

Since the launch of the PlayStation 4, many of the PSN features have been delivered using Cassandra. We will be talking about our experience as we launched one of the most popular gaming consoles in the world on well over 300 nodes.
- Why we picked Cassandra
- Exactly what PSN features for PS4 are powered by Cassandra
- The infrastructure used to deploy our clusters
- How we monitor system health
- How we design, test and deploy
- Issues we faced and lessons learned along the way
  • 2 participants
  • 42 minutes
cassandra
playstation
performance
launch
customers
avid
having
san
insights
chatty

10 Oct 2014

Speaker: Les Hazlewood, CTO at Stormpath and the Apache Shiro PMC Chair

In this session Les Hazlewood, the Apache Shiro PMC Chair, will cover Shiro's enterprise session management capabilities, how it can be used across any application (not just web or JEE applications) and how to use Cassandra as Shiro's session store, enabling a distributed session cluster supporting hundreds of thousands or even millions of concurrent sessions. As a working example, Les will show how to set up a session cluster in under 10 minutes using Cassandra. If you need to scale user session load, you won't want to miss this!
  • 3 participants
  • 41 minutes
session
cassandra
server
shiro
users
proxy
datastore
presentations
deploying
querying

10 Oct 2014

Speaker: Dan Cundiff, Technical Architect Consultant at Target Corporation

This presentation will cover the problems we needed to solve, the journey we took to get there, and the lessons we learned along the way. We’ll cover the technical and non-technical aspects of this story.
  • 1 participant
  • 28 minutes
api
providers
servers
consumers
target
operationally
data
enterprise
matters
monitoring

10 Oct 2014

Speaker: Robbie Strickland, Software Development Manager at The Weather Channel

As a reformed CQL critic, I'd like to help dispel the myths around CQL and extol its awesomeness. Most criticism comes from people like me who were early Cassandra adopters and are concerned about the SQL-like syntax, the apparent lack of control, and the reliance on a defined schema. I'll pop open the hood, showing just how the various CQL constructs translate to the underlying storage layer--and in the process I hope to give novices and old-timers alike a reason to love CQL.
  • 1 participant
  • 38 minutes
cassandra
discussion
twc
cq
useful
communicating
comm
data
advancing
weather

10 Oct 2014

Speaker: John Berryman, Data Sadist at VividCortex

CQL3 is the newly ordained, canonical, and best-practices means of interacting with Cassandra. Indeed, the Apache Cassandra documentation itself declares the Thrift API as “legacy” and recommends that CQL3 be used instead. But I’ve heard several people express their concern over the added layer of abstraction. There seems to be an uncertainty about what’s really happening inside of Cassandra. In this presentation we will open up the hood and take a look at exactly how Cassandra is treating CQL3 queries. Our first stop will be the Cassandra data structure. We will briefly review the concepts of keyspaces, column families, rows, and columns. And we will explain where this data structure excels and where it does not. Composite row keys and column names are heavily used with CQL3, so we'll cover their functionality as well. We will then turn to CQL3. I will demonstrate the basic CQL syntax and show how it maps to the underlying data structure. We will see that CQL actually serves as a sort of best-practices interface to the internal Cassandra data structure. We will take this point further by demonstrating CQL3 collections (set, list, and map) and showing how they are really just a creative use of this same internal data structure. Attendees will leave with a clear, inside-out understanding of CQL3 and will be able to use CQL with confidence that they are following best practices.
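The mapping from CQL rows to the underlying wide-row structure can be sketched in a few lines of Python. This is a simplified model of the pre-3.0 storage layout, not code from the talk: the `to_storage_cells` helper, the table shape, and the column names are hypothetical, and the sketch leaves out details such as row markers, timestamps and collections. It assumes a table whose primary key is `(user_id, posted_at)`, so `user_id` is the partition (row) key and `posted_at` becomes the leading component of each composite column name.

```python
from collections import defaultdict

def to_storage_cells(rows, partition_key, clustering_keys):
    """Flatten CQL-style rows into {row_key: {composite_name: value}}."""
    storage = defaultdict(dict)
    for row in rows:
        row_key = row[partition_key]
        # Clustering values form the prefix of every cell name.
        prefix = tuple(row[k] for k in clustering_keys)
        for name, value in row.items():
            if name == partition_key or name in clustering_keys:
                continue
            storage[row_key][prefix + (name,)] = value
    # Cells within a storage row are kept sorted by composite name.
    return {key: dict(sorted(cells.items())) for key, cells in storage.items()}

# Hypothetical table: PRIMARY KEY (user_id, posted_at)
cql_rows = [
    {"user_id": "alice", "posted_at": 2, "body": "second", "likes": 3},
    {"user_id": "alice", "posted_at": 1, "body": "first", "likes": 0},
]
for row_key, cells in to_storage_cells(cql_rows, "user_id", ["posted_at"]).items():
    print(row_key, cells)
```

Both CQL rows for `alice` land in a single storage row, and the clustering column determines the order of the cells inside it.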
  • 1 participant
  • 32 minutes
cassandras
cassandra
cql
eql
datastore
databases
api
query
connections
understanding

10 Oct 2014

Speaker: Clinton Kelly, Member of Technical Staff at Wibidata

Cassandra’s scalability and robust feature set make it a natural choice for building personalized Big Data Applications such as product recommenders, personalized search engines and fraud detectors. However, creating such applications requires a lot of time, resources, and expertise to build the functionality needed on top of the Cassandra platform. Enter: the Kiji Project. Kiji is an open-source, modular platform that provides developers a head start in building real-time, Big Data Applications on Cassandra. Created by engineers with experience building personalized applications at companies like Google, Cloudera and Amazon, Kiji includes modules for capturing data, analyzing data, training machine learning models, and applying machine learning models in real time. Let their expertise work for you! In this talk, we provide an overview of the Kiji Project, detail a case study of how one Fortune 500 retailer uses Kiji in production for product recommendations, and discuss how Kiji works with Cassandra.
  • 1 participant
  • 34 minutes
kiji
kijirest
kigi
wikidata
personalization
implementation
kit
bigtable
recommender
cassandra