Cloud Native Computing Foundation Observability Practitioners Summit 2019 (San Diego) Open Meetings

29 Nov 2019

Join us for Kubernetes Forums Bengaluru and Delhi - learn more at kubecon.io

Don't miss KubeCon + CloudNativeCon 2020 events in Amsterdam March 30 - April 2, Shanghai July 28-30 and Boston November 17-20! Learn more at kubecon.io. The conference features presentations from developers and end users of Kubernetes, Prometheus, Envoy, and all of the other CNCF-hosted projects.

A Picture is Worth 1,000 Traces - Steve Flanders, Splunk & Yuri Shkuro, Uber Technologies

Distributed tracing has emerged as the go-to solution for understanding what’s going on in ever-changing cloud native architectures. A single trace can reveal many things: network latencies, time spent in databases, a service spinning idly, and so on. But finding the one trace among billions that demonstrates a problem in a large distributed application is very hard. By looking at traces in aggregate, we can avoid having to state and validate hypotheses one by one; instead, answers start to emerge naturally, especially when we use creative visualizations that put our visual cortex to work without overloading it with useless information. This talk will present the power of aggregate analysis of distributed traces by highlighting its applications beyond performance troubleshooting.
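
As a rough illustration of looking at traces in aggregate rather than one at a time, the sketch below pools spans from many traces by (service, operation) and reports a count and a nearest-rank latency percentile for each group. It produces a simple aggregate statistic, not one of the visualizations from the talk. The flat span tuples and the sample data are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical flattened spans pulled from many traces:
# (trace_id, service, operation, duration_ms)
SPANS = [
    ("t1", "frontend", "GET /checkout", 120.0),
    ("t1", "payments", "charge", 95.0),
    ("t2", "frontend", "GET /checkout", 80.0),
    ("t2", "payments", "charge", 60.0),
    ("t3", "frontend", "GET /checkout", 400.0),
    ("t3", "payments", "charge", 370.0),
]


def percentile(values, p):
    """Nearest-rank percentile of a non-empty list, with 0 < p <= 100."""
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def aggregate(spans, p=50):
    """Pool span durations from all traces, keyed by (service, operation)."""
    groups = defaultdict(list)
    for _trace_id, service, operation, duration in spans:
        groups[(service, operation)].append(duration)
    return {
        key: {"count": len(durations), f"p{p}": percentile(durations, p)}
        for key, durations in groups.items()
    }


for key, stats in sorted(aggregate(SPANS).items()):
    print(key, stats)
```

The same grouping could feed a histogram or heat map instead of a single percentile.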

2 participants
32 minutes

Wrap-up

9 participants
28 minutes

Dynatrace Sponsored Session - Observability: Where Are We Headed? - Alois Reitbauer, Dynatrace

Observability is helping us move beyond the traditional paradigm of monitoring. Companies are looking for more answers and gathering more data than traditional alerting can provide. On our journey we learned that simply having more data brings just as many challenges. Ultimately, the value comes from what you do with the insights the data provides. Let’s look at some of these challenges and how they can be addressed.

1 participant
13 minutes

LeitMotif: An Abstraction for Debugging Distributed Applications - Mania Abdi, Northeastern University

Abstractions, such as APIs, allow developers to build complex distributed applications out of smaller building blocks. In contrast, very few abstractions are available to limit the amount of complexity engineers must deal with when diagnosing problems in production applications. This mismatch means that diagnosis will continue to become more challenging as systems continue to scale. We present the workflow motif abstraction, instantiations of which capture frequent or important processing patterns observed in the workflows of requests. We argue that the use of motifs can make existing diagnosis techniques more powerful and enable new use cases. We discuss the features distributed tracing infrastructures need in order to generate useful motifs, our progress on modifying frequent-subgraph mining algorithms to identify motifs from traces, and initial experiences using motifs to debug problems.
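
To make the idea of mining frequent patterns from request workflows concrete, here is a deliberately simplified sketch. It treats each trace as a set of caller-to-callee edges and counts in how many traces each single edge or two-edge call chain appears (its support). It keeps the patterns whose support meets a threshold. Real frequent-subgraph miners search much larger connected subgraphs. This is not the LeitMotif algorithm, and the trace data is hypothetical.

```python
from collections import Counter
from itertools import chain

# Hypothetical request workflows: each trace is a list of (caller, callee) edges.
TRACES = [
    [("api", "auth"), ("api", "db"), ("db", "disk")],
    [("api", "auth"), ("api", "cache")],
    [("api", "auth"), ("api", "db"), ("db", "disk")],
]


def patterns(edges):
    """Single edges plus two-edge call chains (a -> b -> c) within one trace."""
    found = set(edges)
    for a, b in edges:
        for b2, c in edges:
            if b == b2:
                found.add((a, b, c))
    return found


def frequent_patterns(traces, min_support):
    """Patterns present in at least min_support (a fraction) of the traces."""
    # Each trace contributes a set, so support counts traces, not occurrences.
    support = Counter(chain.from_iterable(patterns(t) for t in traces))
    threshold = min_support * len(traces)
    return {pat: n for pat, n in support.items() if n >= threshold}


print(frequent_patterns(TRACES, min_support=0.6))
```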

4 participants
26 minutes

LightStep Sponsored Session: Observability for Deep Systems - Spoons (aka Daniel Spoonhower), LightStep

Software architectures have evolved: applications are not just getting bigger but scaling deeper. Observability tools must adapt to this new environment or leave developers with lots of responsibility but little control. I'll describe deep systems and where they came from, as well as the opportunity they have created for observability practitioners. All this in only 10 minutes!

1 participant
9 minutes

New Relic Sponsored Session - Mike Panchenko, New Relic

At New Relic, we’re going all in with Kubernetes. That doesn’t just mean delivering features that allow customers to observe and monitor Kubernetes; it also means embracing Kubernetes as the de facto standard for orchestrating workloads running on the entire New Relic data platform.
This lightning talk will cover the trials and tribulations of planning and migrating New Relic’s massively scaled distributed database (a database that processes up to 1.5 billion data points a minute) to Kubernetes. You’ll learn how monitoring and observability, as well as the tooling we created for spreading our workloads over many heterogeneous clusters, have been critical to the success of the migration so far. We will also share our perspective on what to expect and prepare for in the future of this fast-growing space.

1 participant
10 minutes

Pythia: An Automated, Cross-layer Instrumentation Framework for Diagnosing Performance Problems in Distributed Applications - Emre Ates, Boston University

It is extremely difficult to know a priori where to enable instrumentation to help diagnose problems that may occur in the future. We present Pythia, an automated cross-layer instrumentation framework that explores the space of possible instrumentation choices and enables the instrumentation needed to diagnose a newly observed problem in production systems. Pythia builds on distributed tracing and uses statistical techniques to identify where instrumentation is needed. This talk will discuss (1) the scalable design of Pythia; (2) our progress on identifying promising data structures to represent the instrumentation search space across multiple data center stack layers (e.g., application and kernel), structures that must trade off compactness, exhaustiveness, and accuracy; and (3) creating algorithms to search this space quickly while staying under a specific instrumentation budget.
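
The sketch below illustrates, in a very reduced form, what "searching the instrumentation space under a budget" can mean. It assumes a statistical signal per candidate point: latency variance among requests that should behave alike. It ranks candidates by that signal and greedily enables the highest-scoring points whose assumed overhead fits the budget. This is one plausible heuristic, not Pythia's actual search algorithm or data structures. The candidate names, samples, costs, and `min_variance` threshold are all hypothetical.

```python
from statistics import pvariance

# Hypothetical candidate instrumentation points across stack layers.
# "samples" are latencies (ms) observed around each point for requests
# that should behave alike; "cost" is an assumed overhead unit.
CANDIDATES = {
    "app:parse_request": {"samples": [2, 2, 3, 2], "cost": 1},
    "app:query_db": {"samples": [10, 50, 12, 90], "cost": 2},
    "kernel:tcp_sendmsg": {"samples": [1, 30, 1, 25], "cost": 3},
    "app:render": {"samples": [5, 6, 5, 7], "cost": 1},
}


def choose_points(candidates, budget, min_variance=1.0):
    """Greedily enable high-variance points while staying within budget."""
    scored = sorted(
        ((pvariance(info["samples"]), name, info["cost"])
         for name, info in candidates.items()),
        reverse=True,
    )
    chosen, spent = [], 0
    for variance, name, cost in scored:
        if variance < min_variance:
            break  # remaining points show too little variation to matter
        if spent + cost <= budget:
            chosen.append(name)
            spent += cost
    return chosen, spent


print(choose_points(CANDIDATES, budget=4))
```

Greedy selection is simple but not optimal. When costs vary widely, a knapsack-style search could fit more useful points into the same budget.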

3 participants
24 minutes

Real-Time Application Maps for Proactive and Actionable Visibility - Aloke Guha, OpsCruise

Today’s observability tooling provides volumes of time-series data, statistical trends including anomaly detection, and correlational analyses. We argue that operations teams need an integrated and cohesive understanding of the application that maps interdependencies across microservices as well as their dependencies on orchestration and infrastructure services. We show that, beyond metrics, logs, and traces, capturing configuration information is necessary for creating complete application maps that give deeper insight into application behavior. In addition, establishing a standard approach to capturing the attributes of the complete application environment will enable automated detection and causal analysis of application problems. We will present some early findings on building real-time, actionable application maps for cloud applications.
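
One minimal way to picture an application map that combines traces with configuration is sketched below. Cross-service edges are derived from parent/child span relationships, and each service node is annotated with configuration captured separately. The span tuples, service names, and configuration fields are hypothetical and do not reflect OpsCruise's data model.

```python
from collections import Counter

# Hypothetical spans: (span_id, parent_span_id, service)
SPANS = [
    ("a", None, "web"),
    ("b", "a", "orders"),
    ("c", "b", "postgres"),
    ("d", "a", "orders"),
    ("e", "d", "orders"),  # internal call, not a cross-service edge
]

# Hypothetical configuration captured alongside metrics, logs, and traces.
CONFIG = {
    "web": {"replicas": 3, "node_pool": "general"},
    "orders": {"replicas": 2, "node_pool": "general"},
    "postgres": {"replicas": 1, "node_pool": "storage"},
}


def build_map(spans, config):
    """Count service-to-service calls and attach config to each node."""
    service_of = {span_id: service for span_id, _parent, service in spans}
    edges = Counter()
    for _span_id, parent, service in spans:
        caller = service_of.get(parent)
        if caller is not None and caller != service:
            edges[(caller, service)] += 1
    nodes = {svc: config.get(svc, {}) for svc in service_of.values()}
    return nodes, dict(edges)


nodes, edges = build_map(SPANS, CONFIG)
print(edges)
```

With configuration on each node, a question like "do all slow calls go to services on the storage node pool?" becomes a lookup on the map rather than a separate investigation.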

5 participants
26 minutes

Reliable Observability at Scale: Error Budgets for 1,000+ - Fred Moyer, Zendesk

Observability and reliability engineering have been on a convergent course for several years. Error Budgets joined the reliability lexicon of engineering organizations in 2016 with the release of the SRE book. In practice, however, implementing techniques at the intersection of observability and reliability has largely been the domain of specialists. How can one democratize these techniques and put them in the hands of a thousand engineers at once?

At Zendesk we developed simple algorithms and practical approaches for implementing SLIs, SLOs, and Error Budgets at scale using a number of observability tools. This talk will show the approaches we developed and how we were able to manage observability instrumentation across dozens of teams quickly in a complex ecosystem (CDN, UI, middleware, backend, queues, databases, etc.).
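
The abstract does not spell out Zendesk's algorithms. The sketch below shows only the standard request-based arithmetic behind an error budget as described in the SRE literature. The SLI is the fraction of good requests, the budget is (1 - SLO target) multiplied by total requests, and the remaining budget is whatever those failures have not yet consumed. The class name and the example numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Request-based error budget for one SLO window (illustrative only)."""

    slo_target: float      # e.g. 0.999: 99.9% of requests must succeed
    total_requests: int
    failed_requests: int

    @property
    def sli(self) -> float:
        """Availability SLI: fraction of good requests in the window."""
        if self.total_requests == 0:
            return 1.0
        return 1 - self.failed_requests / self.total_requests

    @property
    def allowed_failures(self) -> float:
        """Failures the SLO tolerates in this window: (1 - SLO) * total."""
        return (1 - self.slo_target) * self.total_requests

    @property
    def remaining_fraction(self) -> float:
        """Share of the budget left; negative once the budget is overspent."""
        if self.allowed_failures == 0:
            return 0.0 if self.failed_requests else 1.0
        return 1 - self.failed_requests / self.allowed_failures


budget = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
print(f"SLI={budget.sli:.5f} allowed={budget.allowed_failures:.0f} "
      f"remaining={budget.remaining_fraction:.0%}")
```

With a 99.9% target over one million requests, about 1,000 failures are tolerated, so 250 failures leave roughly three quarters of the budget.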

3 participants
30 minutes

SlackTrace: A New Tracing Tool - Suman Karumuri, Slack

Trace data contains very rich information about a request’s execution. However, current tracing tools only expose that information as a trace view or a service graph, which severely limits the questions we can ask of trace data and diminishes the utility of tracing. From past experience, we found that these limitations arise because, unlike logs or metrics, raw trace data can’t be queried.

To query raw trace data easily, we designed a new span format called SpanEvent and built our tracing infrastructure, SlackTrace, around it. In addition to presenting trace data as a trace view and a service graph, the SpanEvent format lets us query raw span data using SQL, which allows us to derive rich insights from trace data that are not possible with existing tracing systems. In this talk, I will present the SpanEvent format and an overview of our SlackTrace infrastructure.
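
To show why queryable raw spans are useful, the sketch below stores spans as flat rows in an in-memory SQLite table and answers a question with plain SQL: per service, how many spans are there, how many errored, and what is the average duration. The actual SpanEvent fields are not listed in the abstract. The schema, service names, and rows here are hypothetical.

```python
import sqlite3
from contextlib import closing

# Hypothetical flat span schema; SpanEvent's real fields may differ.
SCHEMA = """
CREATE TABLE spans (
    trace_id TEXT, span_id TEXT, parent_id TEXT,
    service TEXT, operation TEXT,
    duration_ms REAL, http_status INTEGER
)
"""

ROWS = [
    ("t1", "s1", None, "web", "GET /messages", 120.0, 200),
    ("t1", "s2", "s1", "search", "lookup", 80.0, 200),
    ("t2", "s3", None, "web", "GET /messages", 900.0, 500),
    ("t2", "s4", "s3", "search", "lookup", 850.0, 500),
    ("t3", "s5", None, "web", "POST /send", 60.0, 200),
]

# Ad hoc question over raw spans, not a precomputed trace view.
QUERY = """
SELECT service,
       COUNT(*) AS spans,
       SUM(http_status >= 500) AS errors,
       AVG(duration_ms) AS avg_ms
FROM spans
GROUP BY service
ORDER BY service
"""


def run():
    with closing(sqlite3.connect(":memory:")) as conn:
        conn.execute(SCHEMA)
        conn.executemany("INSERT INTO spans VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)
        return conn.execute(QUERY).fetchall()


for row in run():
    print(row)
```

Because each span is a row, new questions, such as joining child spans to parents through `parent_id`, only require a new query, not a new view in the tracing UI.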

6 participants
34 minutes

Testing in A Distributed Systems World - Fernando Mayo, Undefined Labs

While microservices are becoming the norm thanks to advances in development, deployment, and monitoring techniques over the last few years, we are still using the same testing methodologies we used for monolithic apps. In this talk, we look at how distributed tracing can be applied to testing modern distributed applications, from unit to end-to-end tests, to continuously give developers invaluable insight into how entire applications behave, and when and why they fail, before they are deployed to production. We'll also discuss the power of distributed context propagation and how it can be leveraged for testing purposes, from safely testing in production to failure injection.
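
Here is one way a test could use propagated context for failure injection. The test attaches a key/value item (similar to OpenTelemetry "baggage") to the request context, and a downstream service that sees the item deliberately fails. This lets the test check that the upstream service degrades gracefully. In this in-process sketch, Python's `contextvars` stands in for context that would really travel between services in request headers. The service functions and the `fault` key are hypothetical.

```python
import contextvars

# Request-scoped key/value context standing in for propagated baggage.
# In a real deployment this would travel between services in headers.
_BAGGAGE = contextvars.ContextVar("baggage", default=None)


class InjectedFault(Exception):
    """Raised when the propagated context asks a service to fail."""


def baggage():
    return _BAGGAGE.get() or {}


def with_baggage(fn, **items):
    """Run fn in a copied context with extra baggage visible to its callees."""
    def runner():
        _BAGGAGE.set({**baggage(), **items})
        return fn()
    return contextvars.copy_context().run(runner)


def payments_charge(amount):
    # Hypothetical downstream service honouring a fault-injection flag.
    if baggage().get("fault") == "payments":
        raise InjectedFault("payments: injected failure")
    return {"charged": amount}


def checkout(amount):
    """Hypothetical upstream handler that should degrade gracefully."""
    try:
        return payments_charge(amount)
    except InjectedFault:
        return {"charged": 0, "status": "retry-later"}


def test_checkout_survives_payments_failure():
    result = with_baggage(lambda: checkout(10), fault="payments")
    assert result["status"] == "retry-later"


test_checkout_survives_payments_failure()
print(checkout(10))  # outside the test context, no fault is injected
```

Running the test in a copied context keeps the injected fault scoped to that one request, which is what makes this style of testing safer in shared environments.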

5 participants
30 minutes

Tracing Is for Everyone, Not Just Backend Engineers (How Tracing Could Help Front-end Engineers Build a Better UX) - Nina Stawski, Omnition

There's been a lot of talk about the importance of observability and tracing for microservice-based applications. The use cases involved usually focus on backend engineers and DevOps. But what about us front-end engineers? We also want to know how things work. More often than not, we are the first to be blamed when something breaks, so it is important to understand the whole application, not just the front end.

Currently, observability is not a top concern for front-end engineers, and I will show why it should be. In many cases, even if the application speed cannot be changed significantly, you can apply small tricks and add microinteractions to improve the UX. In addition, emerging tooling in OpenCensus and OpenTelemetry is easy to configure, enriches the existing data, and helps developers correlate traces between the backend and the UI.
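
UI and backend traces are typically correlated by sending a trace context header with each request. The W3C Trace Context `traceparent` header is one such format. It has four fields: a version, a 32-hex-digit trace ID, a 16-hex-digit parent span ID, and trace flags, where bit 0 means "sampled". The hand-rolled sketch below handles only version `00` and is for illustration; in practice OpenTelemetry's browser and server libraries generate and parse this header for you. It is written in Python for illustration; in a browser the same logic would run in JavaScript.

```python
import re
import secrets

# version-00 traceparent: 00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def _random_hex_id(num_bytes):
    """Random lowercase hex ID; all-zero IDs are invalid, so retry on those."""
    while True:
        value = secrets.token_hex(num_bytes)
        if int(value, 16) != 0:
            return value


def make_traceparent(sampled=True):
    """Header value a UI could attach to outgoing requests to start a trace."""
    trace_id = _random_hex_id(16)  # 32 hex characters
    span_id = _random_hex_id(8)    # 16 hex characters
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def parse_traceparent(value):
    """Backend side: return (trace_id, parent_id, sampled) or None if invalid."""
    match = _TRACEPARENT.fullmatch(value.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    if int(trace_id, 16) == 0 or int(parent_id, 16) == 0:
        return None
    return trace_id, parent_id, bool(int(flags, 16) & 0x01)


header = make_traceparent()
print(header, parse_traceparent(header))
```

A backend that starts its spans with the parsed trace ID places front-end and backend work in the same trace.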

4 participants
22 minutes

When Connections are Magic: Understanding Performance in Serverless - James Burns, LightStep

Observability! Cloud Functions! APIs! What could go wrong?! While researching the performance of object storage APIs, there appeared to be some custom runtime magic at work, leading to significant performance differences. Further research showed that it was *not magic*, but it led to even more questions.

Working with modern systems means network connections, many of them. Understanding how those connections impact your customer's experience can be difficult. Distributed tracing helps isolate which parts of the system are failing, but when it is implemented only at the RPC level, the reasons for and scope of network-induced issues can be lost. See how network-level insights can be integrated into distributed traces, and hear how to effectively practice iterative observability, from the specific case of this research to a general framework for investigation.
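
One way to bring network-level detail into a trace is to record connection phases (DNS lookup, TCP connect, TLS handshake) as child spans of the RPC span. The sketch below uses a hypothetical `Span` class, not any tracer's API, and made-up timings. It shows how the same call can look very different depending on whether a new connection had to be set up, for example when a connection is reused.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical minimal span with child spans and millisecond timestamps."""

    name: str
    start_ms: float
    end_ms: float
    children: list = field(default_factory=list)

    @property
    def duration_ms(self):
        return self.end_ms - self.start_ms


SETUP_PHASES = {"dns", "connect", "tls"}


def setup_share(rpc):
    """Fraction of an RPC span's duration spent establishing the connection."""
    setup = sum(c.duration_ms for c in rpc.children if c.name in SETUP_PHASES)
    return setup / rpc.duration_ms if rpc.duration_ms else 0.0


# Hypothetical timings: a cold call opens a new connection, a warm one reuses it.
cold = Span("GET object", 0, 200, [
    Span("dns", 0, 30), Span("connect", 30, 70),
    Span("tls", 70, 150), Span("request", 150, 200),
])
warm = Span("GET object", 0, 50, [Span("request", 0, 50)])

print(f"cold: {setup_share(cold):.0%} setup, warm: {setup_share(warm):.0%} setup")
```

If the trace showed only the RPC span, this cold call would simply look four times slower; the phase spans explain why.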

5 participants
24 minutes
