Ceph / Ceph Virtual 2022

These are all the meetings we have in "Ceph Virtual 2022" (part of the organization "Ceph"). Click into individual meeting pages to watch the recording and search or read the transcript.

23 Nov 2022

Presented by: Jiffin Tony Thottan

Introduction to Container Object Storage Interface aka COSI for ceph RGW

For applications in Kubernetes, CSI provides a way to consume file/block storage for their workloads. The main motivation behind the Container Object Storage Interface (COSI) is to provide a similar experience for object stores. The basic idea is to provide a generic, dynamic provisioning API for consuming object storage, so that app pods can access a bucket in the underlying object store much like a PVC. The major challenge for this implementation is that there is no standard protocol defined for objects, and the COSI project needs to be vendor agnostic. It won't handle the orchestration/management of the object store; rather, it acts as another client and provides bucket access on behalf of applications running in Kubernetes. The initial version of the ceph-cosi driver can be found at https://github.com/ceph/ceph-cosi.
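
As an illustration of the "bucket as a PVC" idea, here is a minimal sketch of how an application pod might consume a COSI-provisioned bucket, assuming the bucket coordinates and credentials are injected into the pod by the driver (the environment variable names below are hypothetical, not part of the COSI spec):

    import os
    import boto3

    # Hypothetical variable names: how the bucket endpoint and credentials reach
    # the pod (env vars vs. a mounted secret) depends on the COSI driver in use.
    endpoint = os.environ["BUCKET_ENDPOINT"]        # e.g. the RGW service URL
    bucket = os.environ["BUCKET_NAME"]              # bucket provisioned by ceph-cosi
    access_key = os.environ["BUCKET_ACCESS_KEY"]
    secret_key = os.environ["BUCKET_SECRET_KEY"]

    s3 = boto3.client(
        "s3",
        endpoint_url=endpoint,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )

    # The application just uses the bucket; provisioning and access grants were
    # handled by the COSI controller and the ceph-cosi driver.
    s3.put_object(Bucket=bucket, Key="hello.txt", Body=b"hello from a pod")
    print(s3.list_objects_v2(Bucket=bucket).get("KeyCount", 0), "objects in bucket")
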
  • 2 participants
  • 27 minutes
storage
csa
kubernetes
csi
topics
project
handled
volume
cefcocet
ssf

23 Nov 2022

Presented by: Jonas Pfefferle
NVMe-over-Fabrics support for Ceph

NVMe-over-Fabrics (NVMeoF) is an open, widely adopted, de facto standard for high-performance remote block storage access. More and more storage vendors are introducing NVMeoF target support, with hardware offloads for both NVMeoF targets and initiators. Ceph does not support the NVMeoF protocol for block storage access; its clients use the Ceph RADOS protocol to access RBD images for good reason: RADOS is a distributed m-to-n protocol that provides reliable object access to sharded and replicated (or erasure-coded) Ceph storage. However, there are good reasons to enable NVMeoF for Ceph: to enable its use in datacenters that already deploy hardware with NVMeoF offload capabilities, and to allow existing NVMeoF storage users to easily migrate to Ceph. In this talk we present our effort to integrate a native NVMeoF target for Ceph RBD. We discuss some of the challenges of implementing such support in Ceph, including subsystem/namespace discovery, multi-pathing for fault tolerance and performance, and authentication and access control (e.g., namespace masking). Furthermore, we describe how the NVMeoF target design can be extended to reduce additional network hops by leveraging the Ceph CRUSH algorithm (ADNN).
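
As a hedged sketch of what consuming such a target could look like from a client host, the standard nvme-cli discovery/connect flow is shown below via Python's subprocess module; the address, port, and subsystem NQN are placeholders that a real Ceph NVMeoF gateway would advertise:

    import subprocess

    # Placeholder values; a real deployment would advertise these through the
    # NVMe-oF discovery service exposed by the Ceph gateway.
    target_addr = "192.0.2.10"
    target_port = "4420"
    subsystem_nqn = "nqn.2016-06.io.example:rbd-image"   # hypothetical NQN

    # Standard nvme-cli flow: discover the target's subsystems, then connect.
    subprocess.run(["nvme", "discover", "-t", "tcp",
                    "-a", target_addr, "-s", target_port], check=True)
    subprocess.run(["nvme", "connect", "-t", "tcp", "-n", subsystem_nqn,
                    "-a", target_addr, "-s", target_port], check=True)

    # The RBD image now appears as a local /dev/nvmeXnY block device.
    subprocess.run(["nvme", "list"], check=True)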

Event link: https://ceph.io/en/community/events/2022/ceph-virtual/
  • 2 participants
  • 41 minutes
nvm
routers
fabric
vpn
threads
connection
users
protocols
envy
offloading

23 Nov 2022

Presented by: Deepika Upadhyay, Gaurav Sitlani, and Subham K Rai

Troubleshooting and Debugging in the rook-ceph cluster

This session shares an overview of troubleshooting a rook-ceph cluster and discusses some troubleshooting scenarios. This is followed by an introduction to and overview of the kubectl-rook-ceph krew plugin, and how it makes managing containers easier from a troubleshooting perspective. We'll also discuss the future issues we're planning to solve with it, followed by a short roadmap of the Rook project. Finally, we look forward to discussing and gathering feedback from users about common and challenging problems they face while troubleshooting their clusters.
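
For context, the plugin's basic pattern is to forward Ceph commands into the cluster so you don't have to exec into the toolbox pod yourself. A minimal sketch, assuming the krew plugin is installed and Rook runs in its default namespaces:

    import subprocess

    # Assumes the kubectl-rook-ceph krew plugin is installed and Rook runs in
    # its default namespaces; the plugin forwards "ceph ..." commands into the
    # cluster on your behalf.
    def rook_ceph(*args):
        cmd = ["kubectl", "rook-ceph", "ceph", *args]
        return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

    # Typical first steps in a troubleshooting session: overall health, then OSD usage.
    print(rook_ceph("status"))
    print(rook_ceph("osd", "df"))
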
  • 4 participants
  • 42 minutes
troubleshooting
rook-ceph
rook
debugging
monitors
cluster
kotak
workflow
logs
prepared

22 Nov 2022

Presented by: Nizamudeen A
Event: https://ceph.io/en/community/events/2022/ceph-virtual/
  • 4 participants
  • 32 minutes
dashboard
dashboards
advanced
manage
monitoring
safe
version
workflow
troubleshooting
sf

15 Nov 2022

Presented by: Federico Lucifredi & Ana McTaggart

Data Security and Storage Hardening In Rook and Ceph

We explore the security model exposed by Rook with Ceph, the leading software-defined storage platform of the Open Source world. Digging progressively deeper into the stack, we examine hardening options for Ceph storage appropriate for a variety of threat profiles. Options include defining a threat model, limiting the blast radius of an attack by implementing separate security zones, the use of encryption at rest and in flight with FIPS 140-2 validated ciphers, hardened builds and default configurations, as well as user access controls and key management. Data retention and secure deletion are also addressed. The very process of containerization creates additional security benefits through lightweight separation of domains. Rook makes applying hardening options easier: it becomes a matter of simply modifying a .yaml file with the appropriate security context upon creation, making it a snap to apply Ceph's standard hardening options to a container-based storage system.
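
To make two of those layers concrete, here is a minimal sketch using the underlying Ceph tooling directly; the device path is a placeholder, and Rook exposes equivalent settings declaratively in its CRDs rather than via these commands:

    import subprocess

    # Encryption at rest: create an OSD on a dm-crypt encrypted device.
    # (/dev/sdX is a placeholder; with Rook this is requested in the cluster CRD.)
    subprocess.run(["ceph-volume", "lvm", "create", "--data", "/dev/sdX", "--dmcrypt"],
                   check=True)

    # Encryption in flight: require msgr2 "secure" mode on all connection types.
    for opt in ("ms_cluster_mode", "ms_service_mode", "ms_client_mode"):
        subprocess.run(["ceph", "config", "set", "global", opt, "secure"], check=True)
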
  • 5 participants
  • 43 minutes
administrator
hosts
hacking
mike
protocols
security
ceph
staff
linux
acknowledgment

15 Nov 2022

Presented by: Gabryel Mason-Williams

DisTRaC: Accelerating High-Performance Compute Processing for Temporary Data Storage

There is a growing desire within scientific and research communities to start using object stores to store and process their data in high-performance computing (HPC) clusters. However, object stores are not necessarily designed for performance and are better suited to long-term storage. Therefore, users often use a high-performance file system when processing data. However, networked filesystems have the issue that one user can potentially thrash the network and affect the performance of everyone else's data-processing jobs in the cluster. This talk presents DisTRaC ((Dis)tributed (T)ransient (Ra)m (C)eph), which offers a solution to this problem by providing a method for users to deploy Ceph onto their HPC clusters using RAM. Intermediate data processing can then be done in RAM, taking the pressure off the networked filesystem by using the node interconnect to transfer data. In addition, all the data is localized, creating a hyper-converged HPC cluster for the duration of the job. DisTRaC reduces the I/O overhead of the networked filesystem and offers a potential data-processing performance increase.
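
A rough sketch of the underlying idea only (not DisTRaC itself; see the project link below): stage a RAM-backed block device and hand it to ceph-volume, so the OSD's data lives entirely in memory for the duration of the job. Device paths and sizes are placeholders:

    import os
    import subprocess

    RAM_MOUNT = "/mnt/ramdisk"      # placeholder mount point
    IMG = f"{RAM_MOUNT}/osd.img"    # file backing the loop device
    SIZE = "16G"                    # placeholder size; must fit in node RAM

    # Back a loop device with a file on tmpfs, i.e. with RAM.
    os.makedirs(RAM_MOUNT, exist_ok=True)
    subprocess.run(["mount", "-t", "tmpfs", "-o", f"size={SIZE}", "tmpfs", RAM_MOUNT],
                   check=True)
    subprocess.run(["truncate", "-s", SIZE, IMG], check=True)
    loopdev = subprocess.run(["losetup", "--find", "--show", IMG],
                             capture_output=True, text=True, check=True).stdout.strip()

    # Create an OSD on the RAM-backed device; tear everything down when the job ends.
    subprocess.run(["ceph-volume", "lvm", "create", "--data", loopdev], check=True)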

Learn more about DisTRaC: https://github.com/rosalindfranklininstitute/DisTRaC
Event: https://ceph.io/en/community/events/2022/ceph-virtual/
  • 2 participants
  • 16 minutes
distract
institute
research
project
mechanism
facility
maintainable
informatics
workflow
hpc

15 Nov 2022

Presented by: Josh Salomon & Laura Flores

New workload balancer in Ceph

One of the new features in the Quincy release is the introduction of a new workload balancer (aka primary balancer). While capacity balancing has existed and worked well since the introduction of the upmap balancer, primary balancing, which evens out the load across all the OSDs, was never handled. This proves to be a performance problem, especially in small clusters and in pools with fewer PGs. In this presentation we will discuss the difference (and sometimes the contradiction) between capacity balancing and workload balancing, explain what we did for Quincy, and outline future plans to further improve the Ceph balancing process.
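
To make the capacity-vs-workload distinction concrete, the sketch below counts how many PGs each OSD is acting primary for, i.e. how the read-serving work is spread today. It assumes a running cluster; the JSON field names follow recent ceph pg ls output and may vary slightly between releases:

    import json
    import subprocess
    from collections import Counter

    # Capacity balancing evens out data; workload (primary) balancing evens out
    # which OSD is the acting primary, i.e. which OSD serves the reads.
    raw = subprocess.run(["ceph", "pg", "ls", "--format", "json"],
                         capture_output=True, text=True, check=True).stdout
    data = json.loads(raw)
    pg_stats = data["pg_stats"] if isinstance(data, dict) else data

    primaries = Counter(pg["acting_primary"] for pg in pg_stats)
    for osd, count in sorted(primaries.items()):
        print(f"osd.{osd} is primary for {count} PGs")
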
  • 3 participants
  • 55 minutes
balancer
balancers
rebalancer
capacity
workloads
operation
storage
important
reads
upstream

15 Nov 2022

Presented by: Chunsong Feng

Optimize Ceph messenger Performance

1. NIC SR-IOV is used; each OSD gets an exclusive VF NIC.
2. A DPDK interrupt mode is added.
3. A single CPU core handling multiple NIC queues is implemented to improve performance.
4. An admin socket command is added to obtain NIC status, collect statistics, and locate faults.
5. Ceph throttling parameters and the TCP and DPDK packet send/receive buffer sizes are tuned to prevent packet loss and retransmission.
6. The Crimson messenger component uses Seastar DPDK.
  • 1 participant
  • 21 minutes
optimize
efficient
optimal
tcpm
compression
rdm
maintenance
tester
ost
dps

15 Nov 2022

Presented by: Daniel Gryniewicz

RGW Zipper

RGW was developed to provide object access (S3/Swift) to a Ceph cluster. The Zipper abstraction API divides RGW into an upper half containing the Operations (Ops) for S3 and Swift, and a lower half, called a Store, containing the details of how to store data and metadata. This allows the same Ops code to provide correct S3 and Swift semantics on top of a variety of storage platforms. The primary Store is the current RadosStore, which provides access to a Ceph cluster via RADOS. However, new Stores are possible that keep the data in any desired platform. One such Store, DBStore, has been developed; it stores data in SQL, specifically in a local SQLite database. Additional Stores, such as an S3 Store, are planned to provide further flexibility. Zipper also allows intermediate Filter layers that can transform Ops, apply policy (such as directing different objects to different Stores), or cache data and metadata. The first planned Filter is a LuaFilter, which will allow rapid prototyping and testing of other filters. An individual instance of RGW will consist of a stack of Filters, along with one or more Stores providing the actual data. This presentation covers Zipper, the existing DBStore, and plans for the future.
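
The layering described above can be pictured with a small conceptual sketch; Python is used purely for illustration, and the class and method names are made up (the real Zipper interfaces are C++ inside RGW):

    # Conceptual sketch of the Zipper layering only; not the real C++ API.
    class Store:
        """Lower half: knows how to persist data and metadata somewhere."""
        def put_object(self, bucket, key, data):
            raise NotImplementedError

    class RadosLikeStore(Store):
        def put_object(self, bucket, key, data):
            print(f"store {bucket}/{key} in RADOS pools")              # RadosStore-like

    class DBLikeStore(Store):
        def put_object(self, bucket, key, data):
            print(f"store {bucket}/{key} in a local SQLite database")  # DBStore-like

    class SizePolicyFilter(Store):
        """A 'Filter': sits between the Ops layer and Stores, routing or transforming ops."""
        def __init__(self, small_store, big_store, threshold=1 << 20):
            self.small, self.big, self.threshold = small_store, big_store, threshold
        def put_object(self, bucket, key, data):
            target = self.small if len(data) < self.threshold else self.big
            target.put_object(bucket, key, data)

    # The upper half (S3/Swift Ops code) only ever talks to the top of the stack.
    stack = SizePolicyFilter(small_store=DBLikeStore(), big_store=RadosLikeStore())
    stack.put_object("photos", "cat.jpg", b"\xff" * 4096)
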
  • 1 participant
  • 14 minutes
storage
s3
rs3
zipper
rgw
apps
cache
protocol
compressed
unzipping

15 Nov 2022

Presented by: Satoru Takeuchi

Revealing BlueStore Corruption Bugs in Containerized Ceph Clusters

Cybozu has been running and testing their Rook/Ceph clusters for two years. During this time, they have suffered a number of BlueStore corruptions (e.g., #51034 and #53184). Most corruptions happened just after OSD creation or on restarting OSDs. They have been able to detect these problems because the nodes in their clusters are restarted frequently and many OSDs are created for each integration test. These scenarios are not so common in traditional Ceph clusters but are typical of containerized Ceph clusters. They will share what the known problems are in detail and how they have overcome them together with the Ceph community. In addition, they will propose improvements to the QA process to prevent similar problems in the future.
  • 1 participant
  • 24 minutes
cyborg
cyborgs
cyber
containers
virtualization
infrastructures
kubernetes
machines
modern
firmware

14 Nov 2022

Presented by: Ziye Yang

Accelerating PMEM Device operations in bluestore with hardware based memory offloading technique

With more and more fast devices (especially persistent memory) being deployed in data centers, there is great pressure on the CPU to drive those devices (e.g., Intel Optane DC persistent memory) for persistency purposes under heavy workloads, because persistent memory, unlike HDDs and SSDs, provides no DMA-related capability. The same issue exists in Ceph when using persistent memory. We would like to address this pain point by leveraging memory offloading devices (e.g., DSA). In this talk we will: 1) explain why persistent memory integration has not been very successful in Ceph, due to the high CPU overhead of performing I/O operations on the persistency device; 2) introduce memory offloading devices (e.g., DSA) that can offload the CPU pressure of doing I/O; 3) describe the main changes in the pmem device code (i.e., src/blk/pmemdevice.cc) and how we achieve the offloading, including the challenges; and 4) share some early performance results if Intel's SPR platform is available to the public.
  • 1 participant
  • 28 minutes
cpus
computing
hdm
memory
virtual
devices
ssds
configuration
io80
bottleneck

14 Nov 2022

Presented by: Gal Salomon

S3select: Computational Storage in S3

S3 Select is an S3 operation (introduced by Amazon in 2018) that implements a pushdown paradigm: it pulls out only the data you need from an object, which can dramatically improve the performance and reduce the cost of applications that need to access data in S3. The talk will introduce the s3select operation and architecture. It will describe what the pushdown technique is and why and where it benefits the user. It will cover s3select's supported features and their integration with analytics applications. It will discuss the main differences between columnar and non-columnar formats (CSV vs. Parquet). We'll also discuss recent developments for ceph/s3select. The presentation will show how easy it is to use ceph/s3select.
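
A minimal sketch of the pushdown in practice with boto3, assuming an RGW endpoint with s3select enabled and a CSV object already uploaded; the endpoint, credentials, bucket, and key are placeholders:

    import boto3

    # Placeholders: point boto3 at an RGW endpoint with s3select enabled.
    s3 = boto3.client("s3", endpoint_url="http://rgw.example.com:8000",
                      aws_access_key_id="ACCESS", aws_secret_access_key="SECRET")

    resp = s3.select_object_content(
        Bucket="sales",
        Key="orders.csv",
        ExpressionType="SQL",
        # Pushdown: only the matching rows and columns leave the object store.
        Expression="SELECT s._1, s._3 FROM S3Object s WHERE CAST(s._3 AS FLOAT) > 100.0",
        InputSerialization={"CSV": {"FileHeaderInfo": "NONE"}, "CompressionType": "NONE"},
        OutputSerialization={"CSV": {}},
    )

    # The result arrives as an event stream of record chunks.
    for event in resp["Payload"]:
        if "Records" in event:
            print(event["Records"]["Payload"].decode(), end="")
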
  • 1 participant
  • 30 minutes
push
pushing
s3
ssx
sd
efficient
workflow
querying
allocation
manipulating

14 Nov 2022

Presented by: Yingxin Cheng

Understanding SeaStore through profiling

SeaStore is the new ObjectStore designed to complement Crimson OSD and support a new generation of storage interfaces/technologies (NVMe, ZNS, persistent memory, etc.). As SeaStore matures, profiling becomes increasingly critical to understand the comprehensive performance impact of design choices and to set the direction moving forward as the backend moves to the mainstream. Profiling infrastructure will also help new contributors understand the inner workings of SeaStore. In this session, we will talk about SeaStore's support for performance profiling, optimizations made based on the initial analysis, and the current status and gaps versus BlueStore, along with performance data.

Event link: https://ceph.io/en/community/events/2022/ceph-virtual/
  • 3 participants
  • 28 minutes
profiling
advanced
throughput
implementation
performance
analysis
crimson
monitoring
cpus
inefficient

10 Nov 2022

Event: https://ceph.io/en/community/events/2022/ceph-virtual/
Presented by: Matt Vandermeulen

How we operate Ceph at scale

As clusters grow in both size and quantity, operator effort should not grow at the same pace. In this talk, Matt Vandermeulen will discuss strategies and challenges for operating clusters of varying sizes in a rapidly growing environment, for both RBD and object storage workloads, based on DigitalOcean's experiences.
  • 1 participant
  • 24 minutes
servers
deployments
capacity
storage
digitalocean
dbas
centralized
systems
workflow
cloud

10 Nov 2022

Presented by: Curt Bruns and Anthony D'Atri

Optimizing RGW Object Storage Mixed Media through Storage Classes and Lua Scripting

Ceph enables flexible and scalable object storage of unstructured data for a wide variety of workloads. RGW (RADOS GateWay) deployments experience a wide variety of object sizes and must balance workload, cost, and performance requirements. S3 storage classes are an established way to steer data onto underlying media that meet specific resilience, cost, and performance requirements. One might, for example, define RGW back-end storage classes for SSD or HDD media; for non-redundant, replicated, or erasure-coded pools; etc. Diverting individual objects or entire buckets into a non-default storage class usually requires specific client action. Compliance, however, can be awkward to request and impossible to enforce, especially in multi-tenant deployments that may include paying customers as well as internal users. This work enables the RGW back end to enforce a storage class on uploaded objects based on specific criteria, without requiring client action. For example, one might define a default storage class on performant TLC or Optane media for resource-intensive small S3 objects, while assigning larger objects to dense and cost-effective QLC SSD media.
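
For contrast, the "specific client action" mentioned above looks like this on the S3 API side; the server-side enforcement described in this talk removes the need for clients to pass a storage class at all. Endpoint, credentials, names, and the class name are placeholders, and the class must already be defined in the RGW zone placement configuration:

    import boto3

    # Placeholders: a client explicitly steering one object to a non-default class.
    s3 = boto3.client("s3", endpoint_url="http://rgw.example.com:8000",
                      aws_access_key_id="ACCESS", aws_secret_access_key="SECRET")

    s3.put_object(
        Bucket="media",
        Key="large/video.bin",
        Body=b"\0" * (64 * 1024 * 1024),   # stand-in for a large object
        StorageClass="QLC_ARCHIVE",        # placeholder class backed by QLC media
    )
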
  • 1 participant
  • 23 minutes
cluster
capacity
workloads
servers
bottlenecks
gateway
optimizing
iops
ai
disk

10 Nov 2022

Presented by: Samuel Just

What's new with Crimson and Seastore?

Next-generation storage devices require a change in strategy, so the community has been developing Crimson, an eventual replacement for ceph-osd intended to minimize CPU overhead and improve throughput and latency. Seastore is a new backing store for crimson-osd targeted at emerging storage technologies, including persistent memory and ZNS devices. This talk will explain recent developments in the Crimson project and Seastore.

Event: https://ceph.io/en/community/events/2022/ceph-virtual/
  • 1 participant
  • 26 minutes
iop
throughput
cpu
iops
workloads
threading
efficiency
monitor
parallel
cores

9 Nov 2022

Join us November 3-16th for different Ceph presentation topics! https://ceph.io/en/community/events/2022/ceph-virtual/

Ceph Crash Telemetry - Observability in Action

To increase product observability and robustness, Ceph's telemetry module allows users to automatically report anonymized crash dumps. Ceph's telemetry backend runs tools that detect similarities among these reported crash events and feed them into Ceph's bug tracking system. In this session we will explore Ceph crash telemetry end-to-end and how it helps the developer community detect emerging and frequent issues encountered by production systems in the wild. We will share our insights so far, how users benefit from this module, and how they can contribute.
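
As a quick way to see the raw material the module works with, the sketch below lists the crash reports a cluster has collected locally; these are the events that telemetry, once the crash channel is enabled, reports upstream in anonymized form. Field names follow recent releases and may vary:

    import json
    import subprocess
    from collections import Counter

    # List the crash reports collected by the cluster's crash module.
    raw = subprocess.run(["ceph", "crash", "ls", "--format", "json"],
                         capture_output=True, text=True, check=True).stdout
    crashes = json.loads(raw)

    # Group by daemon type (osd, mon, mgr, ...) to spot recurring offenders.
    by_daemon = Counter(c.get("entity_name", "unknown").split(".")[0] for c in crashes)
    for daemon, count in by_daemon.most_common():
        print(f"{daemon}: {count} crash reports")
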
  • 2 participants
  • 31 minutes
dashboard
cluster
telemetry
self
bot
triage
device
access
privacy
deploying

3 Nov 2022

Join us November 3-16th for different Ceph presentation topics! https://ceph.io/en/community/events/2022/ceph-virtual/
  • 4 participants
  • 34 minutes
stuff
research
enhancements
milestones
initiative
significantly
monitoring
squidward
ceph
quincy