29 Nov 2019
Join us for Kubernetes Forums Bengaluru and Delhi - learn more at kubecon.io
Don't miss KubeCon + CloudNativeCon 2020 events in Amsterdam March 30 - April 2, Shanghai July 28-30 and Boston November 17-20! Learn more at kubecon.io. The conference features presentations from developers and end users of Kubernetes, Prometheus, Envoy, and all of the other CNCF-hosted projects
A Picture is Worth 1,000 Traces - Steve Flanders, Splunk & Yuri Shkuro, Uber Technologies
Distributed tracing has emerged as the go-to solution for understanding what’s going on in ever-changing cloud native architectures. A single trace can reveal many things: network latencies, time spent in databases, a service spinning idly, and so on. But finding the right trace among billions that demonstrates a problem in a large distributed application is very hard. By looking at traces in aggregate, we can eliminate the need to state and validate hypotheses; instead, answers start to emerge naturally, especially when we use creative visualizations that put our visual cortex to work without overloading it with useless information. This talk will present the power of aggregate analysis of distributed traces by highlighting its applications beyond performance troubleshooting.
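As a rough illustration of what looking at traces in aggregate can mean, the sketch below groups span durations by service across several traces and summarizes each group. It is not taken from the talk: the record layout `(trace_id, service, duration_ms)`, the sample values, and the function name `aggregate_by_service` are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import median

# Hypothetical span records: (trace_id, service, duration_ms).
# Real tracing systems carry far more fields; these are illustrative only.
SPANS = [
    ("t1", "frontend", 120.0),
    ("t1", "db", 80.0),
    ("t2", "frontend", 100.0),
    ("t2", "db", 20.0),
    ("t3", "frontend", 400.0),
    ("t3", "db", 350.0),
]


def aggregate_by_service(spans):
    """Group span durations by service and summarize each group."""
    durations = defaultdict(list)
    for _trace_id, service, duration_ms in spans:
        durations[service].append(duration_ms)
    return {
        service: {
            "count": len(values),
            "median_ms": median(values),
            "max_ms": max(values),
        }
        for service, values in durations.items()
    }


summary = aggregate_by_service(SPANS)
for service, stats in sorted(summary.items()):
    print(service, stats)
```

Comparing the median with the maximum per service hints at which services have outlier traces worth opening individually, without first guessing which trace to inspect.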
- 2 participants
- 32 minutes
29 Nov 2019
Wrap-up
- 9 participants
- 28 minutes
29 Nov 2019
Dynatrace Sponsored Session - Observability Where are We Headed? - Alois Reitbauer, Dynatrace
Observability is helping us to move beyond the traditional paradigm of monitoring. Companies are looking for more answers and gathering more data than traditional alerting can provide. On our journey we learned that simply having more data has just as many challenges. Ultimately, what you do with those insights from the data provides the value. Let’s look at some of these challenges and how they can be addressed.
- 1 participant
- 13 minutes
29 Nov 2019
LeitMotif: An Abstraction for Debugging Distributed Applications - Mania Abdi, Northeastern University
Abstractions, such as APIs, allow developers to build complex distributed applications out of smaller building blocks. In contrast, there are very few abstractions available to limit the amount of complexity engineers must deal with when diagnosing problems in production applications. This mismatch means that diagnosis will continue to become more challenging as systems continue to scale. We present the workflow motif abstraction, instantiations of which capture frequent or important processing patterns observed in the workflow of requests. We argue that use of motifs can make existing diagnosis techniques more powerful and enable new use cases. We discuss features needed from distributed tracing infrastructures to generate useful motifs, progress on modifying frequent-subgraph mining algorithms to identify motifs from traces, and initial experiences using motifs to debug problems.
- 4 participants
- 26 minutes
29 Nov 2019
LightStep Sponsored Session: Observability for Deep Systems - Spoons (aka Daniel Spoonhower), LightStep
Software architectures have evolved: applications are not just getting bigger but scaling deeper. Observability tools must adapt to this new environment or leave developers with lots of responsibility but little control. I'll describe deep systems and where they came from, as well as the opportunity they have created for observability practitioners. All this in only 10 minutes!
- 1 participant
- 9 minutes
29 Nov 2019
New Relic Sponsored Session - Mike Panchenko, New Relic
At New Relic, we’re going all in with Kubernetes. That doesn’t just mean delivering features that allow customers to observe and monitor Kubernetes, but also embracing Kubernetes as the de facto standard for orchestrating workloads running on the entire New Relic data platform.
This lightning talk will cover the trials and tribulations of planning and migrating New Relic’s massively scaled distributed database (a database that processes up to 1.5 billion data points a minute) to Kubernetes. You’ll learn how monitoring and observability, as well as the tooling created for spreading our workloads out over many heterogeneous clusters, have been critical for the success of the migration thus far. We will also share our perspective on what to expect and be prepared for in the future of this fast-growing space.
- 1 participant
- 10 minutes
29 Nov 2019
Pythia: An Automated, Cross-layer Instrumentation Framework for Diagnosing Performance Problems in Distributed Applications - Emre Ates, Boston University
It is extremely difficult to understand where to enable instrumentation a priori to help diagnose problems that may occur in the future. We present Pythia, an automated cross-layer instrumentation framework, which explores the space of possible instrumentation choices and enables the instrumentation needed to diagnose a newly observed problem in production systems. Pythia builds on distributed tracing and uses statistical techniques to identify where instrumentation is needed. This talk will discuss 1) the scalable design of Pythia; 2) our progress on identifying promising data structures to represent the instrumentation search space across multiple data center stack layers (e.g., application and kernel), structures that must trade off compactness, exhaustiveness, and accuracy; and 3) creating algorithms to search this space quickly while staying under a specific instrumentation budget.
- 3 participants
- 24 minutes
29 Nov 2019
Real-Time Application Maps for Proactive and Actionable Visibility - Aloke Guha, OpsCruise
Today’s observability provides volumes of time-series data and statistical trends, including anomaly detection and correlational analyses. We argue that operations teams need an integrated and cohesive understanding of the application that maps interdependencies across microservices and dependencies on the orchestration and infrastructure services. We show that, beyond metrics, logs, and traces, capturing configuration information is necessary for creating complete application maps that give deeper insights into application behavior. In addition, establishing a standard approach to capture the attributes of the complete application environment will enable automated detection and causal analysis of application problems. We will present some early findings on building real-time actionable application maps for cloud applications.
- 5 participants
- 26 minutes
29 Nov 2019
Reliable Observability at Scale: Error Budgets for 1,000+ - Fred Moyer, Zendesk
Observability and reliability engineering have been on a convergent course for several years. Error Budgets joined the reliability lexicon of engineering organizations in 2016 with the release of the SRE book. The practical implementation of techniques at the intersection of observability and reliability has largely been the domain of specialists. How can one democratize these techniques to put them in the hands of a thousand engineers at once?
At Zendesk we developed simple algorithms and practical approaches for implementing SLIs, SLOs, and Error Budgets at scale using a number of observability tools. This talk will show the approaches developed and how we were able to manage observability instrumentation across dozens of teams quickly in a complex ecosystem (CDN, UI, middleware, backend, queues, dbs, etc.).
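For readers unfamiliar with the terms, the arithmetic behind an error budget can be sketched as follows. This is a generic illustration of the SLI/SLO/error-budget relationship, not the algorithm presented in the talk; the function name `error_budget`, its parameters, and the sample numbers are assumptions made for the example.

```python
def error_budget(slo_target, total_events, bad_events):
    """Summarize one SLO window.

    slo_target   -- desired fraction of good events, e.g. 0.999 (assumed input)
    total_events -- all events observed in the window
    bad_events   -- events that violated the SLI definition
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_events <= 0:
        raise ValueError("total_events must be positive")

    # The budget is the number of bad events the SLO tolerates in this window.
    allowed_bad = (1.0 - slo_target) * total_events
    return {
        "sli": (total_events - bad_events) / total_events,
        "allowed_bad_events": allowed_bad,
        "budget_remaining": allowed_bad - bad_events,
        "budget_consumed_fraction": bad_events / allowed_bad,
    }


# Hypothetical window: a 99.9% target over one million requests with 250 failures.
report = error_budget(slo_target=0.999, total_events=1_000_000, bad_events=250)
for key, value in report.items():
    print(f"{key}: {value:.5f}")
```

A consumed fraction above 1.0 would mean the window has exhausted its budget, which is the usual signal to prioritize reliability work over new releases.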
- 3 participants
- 30 minutes
29 Nov 2019
SlackTrace: A New Tracing Tool - Suman Karumuri, Slack
Trace data contains very rich information about a request execution. However, current tracing tools only expose that information as a trace view or a service graph, which severely limits the questions we can ask of trace data and diminishes the utility of tracing. From past experience, we found that these limitations arise because, unlike logs or metrics, we can’t query raw trace data.
To query raw trace data easily, we designed a new span format called SpanEvent and built our tracing infrastructure, called SlackTrace, around it. In addition to presenting the trace data as a trace view and a service graph, the SpanEvent format allows us to query raw span data using SQL, which lets us derive rich insights from trace data that are not possible with existing tracing systems. In this talk, I will present the SpanEvent format and an overview of our SlackTrace infrastructure.
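To give a feel for what querying raw span data with SQL might look like, here is a minimal sketch using Python's built-in `sqlite3` module. The abstract does not describe the SpanEvent schema or the query engine SlackTrace uses, so the `spans` table, its columns, and the sample rows are purely hypothetical, and SQLite is chosen only because it ships with Python.

```python
import sqlite3

# In-memory database standing in for a span store (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE spans (
        trace_id    TEXT,
        span_id     TEXT,
        parent_id   TEXT,
        service     TEXT,
        name        TEXT,
        duration_ms REAL
    )
    """
)

# Hypothetical spans from two traces.
rows = [
    ("t1", "a", None, "web", "GET /home", 120.0),
    ("t1", "b", "a", "db", "SELECT users", 80.0),
    ("t2", "c", None, "web", "GET /home", 100.0),
    ("t2", "d", "c", "db", "SELECT users", 20.0),
    ("t2", "e", "c", "cache", "GET key", 5.0),
]
conn.executemany("INSERT INTO spans VALUES (?, ?, ?, ?, ?, ?)", rows)

# An ad hoc question a fixed trace view cannot answer directly:
# which services are slowest on average across all traces?
query = """
    SELECT service, COUNT(*) AS span_count, AVG(duration_ms) AS avg_ms
    FROM spans
    GROUP BY service
    ORDER BY avg_ms DESC
"""
results = conn.execute(query).fetchall()
for service, span_count, avg_ms in results:
    print(service, span_count, avg_ms)
```

Once spans are rows, the same table can answer other questions (slowest operations, fan-out per parent span) by changing only the query.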
- 6 participants
- 34 minutes
29 Nov 2019
Testing in A Distributed Systems World - Fernando Mayo, Undefined Labs
While microservices are becoming the norm due to advancements in development, deployment and monitoring techniques in the last few years, we are still using the same testing methodologies we used for monolithic apps. In this talk, we look at how distributed tracing can be applied to testing modern, distributed applications, from unit to end-to-end tests, to continuously give developers invaluable insight on how entire applications behave, and when and why they fail, before they are deployed to production. We'll also discuss the power of distributed context propagation and how it can be leveraged for testing purposes, from safely testing in production to failure injection.
- 5 participants
- 30 minutes
29 Nov 2019
Tracing is for Everyone, Not Just Backend Engineers. (How Tracing Could Help Front-end Engineers to Build a Better UX) - Nina Stawski, Omnition
There's been a lot of talk about the importance of observability and tracing for microservice-based applications. The use cases involved are usually focused on backend engineers and DevOps. But what about us front-end engineers? We also want to know how things work. More often than not, we get blamed first when something breaks, so it is important to understand the whole application, not just the front end.
Currently, observability is not the top concern for front-end engineers, and I will show why it should be. In many cases, even if the application speed cannot be changed significantly, you can apply little tricks and add microinteractions to improve the UX. Besides, emerging tooling in OpenCensus and OpenTelemetry is easy to configure, enriches the existing data and helps developers to correlate traces between backend and UI.
- 4 participants
- 22 minutes
29 Nov 2019
When Connections are Magic: Understanding Performance in Serverless - James Burns, LightStep
Observability! Cloud Functions! APIs! What could go wrong?! While researching the performance of object storage APIs, there appeared to be custom runtime magic happening that led to significant performance differences. Further research showed that it was *not magic* but led to even more questions.
Working with modern systems means network connections, many of them. Understanding how those connections impact your customer's experience can be difficult. Distributed tracing helps isolate which parts of the system are failing, but when it is only implemented at the RPC level, the reasons for and scope of network-induced issues can be lost. See how network-level insights can be integrated into distributed traces, and hear how to effectively practice iterative observability, from the specific case of this research to a general framework for investigation.
- 5 participants
- 24 minutes