From YouTube: Cleaning Large Scale Data in Cassandra | Apache Cassandra World Party 2022

Description

Presentation from the second Cassandra World Party on July 20, 2022. Learn more about Apache Cassandra at https://cassandra.apache.org/

Rahul Xavier Singh of Anant presents “Cleaning Large Scale Data in Cassandra”
So you have tons of data in Apache Cassandra. Then you realize years later that you didn't have any TTLs on your data, and now there are terabytes or petabytes of data that need to be cleaned out. We'll cover how we used Apache Airflow, Apache Spark, and some carefully crafted DAGs to democratize data cleanup and other common data operations without giving users access to the Spark cluster. This pattern can also be used to build other common data operations, such as importing and exporting data to and from Apache Cassandra, using a variety of tools including DSBulk or Scylla Migrator.

About the Presenter
Rahul leads a boutique consultancy that helps clients design, build, and operate global-scale platforms that serve and impact large groups of people. He's been working with Cassandra for about 7 years and has been contributing to the community by publishing cassandra.link and running a weekly "Cassandra Lunch" for almost a year. He's recently gotten involved in the Cassandra Kubernetes SIG and helps out by posting updates to the blog.

Apache Cassandra is an open source NoSQL distributed database trusted by thousands of companies for scalability and high availability without compromising performance. Linear scalability and proven fault tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data. Get started: https://cassandra.apache.org/quickstart/

The Apache Cassandra World Party 2022 was a virtual, one-day celebration held on Wednesday, July 20, 2022, in anticipation of the launch of Cassandra 4.1.

Join the Cassandra community: https://cassandra.apache.org/community/

#ApacheCassandra #database #NoSQL