Description
In our seventh Dagster Community Meeting, Simon Späti from Rohde & Schwarz MNT discusses the production Dagster setup at R&S, and the core team presents our recent API "approachability" improvements, as well as upcoming 0.12.0 features.
👨🏫 Today's Agenda 👩🏫
Introduction: 0:00
Dagster at Rohde & Schwarz MNT: 1:26
Dagster Approachability Improvements: 26:40
Upcoming Dagster 0.12.0 Features: 35:30
Q&A: 39:25
🌟 Socials 🌟
Check out our GitHub ➡️ https://github.com/dagster-io/dagster
Check out our Documentation ➡️ https://docs.dagster.io/
Join our Slack ➡️ http://dagster-slackin.herokuapp.com
Visit our Website ➡️ https://dagster.io/
Follow us on Twitter ➡️ https://twitter.com/dagsterio
A
Hey everybody, thank you so much for coming to our monthly community meeting here at the Dagster team. For those of you who are new here, we hold these on the second Tuesday of every month, and it's just a space for the Dagster community and our Dagster core team to get to know each other, present what they're working on, and, for us, to communicate new ideas to you. So today we have three presentations on the agenda.
A
First up, we'll have Simon from Rohde & Schwarz, who will give a presentation about how Rohde & Schwarz uses Dagster in production. Next, we'll have Sandy from the Dagster core team present a set of approachability improvements we've made in the last release that make it easier to use Dagster. Sandy will also present what's coming up in Dagster 0.12.0, which we're planning to release at the beginning of July. And finally, we'll have some time today for a Q&A, so please prepare any questions you have that you'd like to ask our team. Without any further ado, I'd love to invite Simon to present about how he's using Dagster.
B
So I hope you can see the slides okay. Hi and welcome, and thank you very much for having me. My name is Simon Späti, and I have the pleasure of showing you today how we use Dagster at Rohde & Schwarz Mobile Network Testing. A little bit about myself: I'm a data engineer.
B
I'm the author of a little blog where I also write about data engineering, and I'm an early user of Dagster. Rohde & Schwarz is a big company: we have over 12,000 employees around the world, and we specialize in electronic test equipment, broadcast and media, cybersecurity, radio monitoring, and all kinds of radio communications.
B
I myself am located in Switzerland, where we work on a product called Smart Analytics. It's actionable benchmarking, so you can compare different network providers to each other. You get a kind of business intelligence overview to really see mobile network quality: how is my network doing compared to my competitors, where can I improve, and where are the bottlenecks in my network? And, as mentioned, there's my blog, where I write about the data ecosystem.
B
To go into a little more detail about what we do: our main goal is to provide the tools for our customers to improve the quality and performance of their mobile networks. At the bottom you can see the car — this is mobile network testing in Africa, so there aren't always good roads — but we have equipment that you can see on top of this car.
B
We put all of our smartphones in there, so we have a device that our software runs on, and you can either put that in a car or you can also put it in a backpack. So if you would like to measure the network quality in, for example, a metro or a football stadium, you can use the backpack, put in the smartphones, and then configure what kind of tests you would like to run.
B
We can configure things like YouTube tests; the phones will call each other, make WhatsApp calls, and upload to YouTube — they run all kinds of different test scenarios that hopefully a real user would also do. Then, at the end of the day, you get a measurement file, and that gets uploaded into our Smart Analytics.
B
In Smart Analytics we have time-series and statistical analytics. If you look at the architecture, this is the testing going on: during the day, several cars and several backpacks are collecting these measurement files, and then typically during the night or in the evening, when you finish measuring, you upload them to our data warehouse, where our custom ETL is running. This is where all the magic happens, where the data is transformed into facts and dimensions.
B
We clean the data, we transform it, we aggregate it, and then we put it into SQL Server for the time-series analytics. So if you really want to drill down to, let's say, the protocol level — what happened when a call dropped, why did it drop — that's where you use SQL Server; also, if you want to plot the measurement points on a map, you would typically use SQL Server. And then we use Analysis Services, which is our cube.
B
That's for the statistical side: there you can get overall statistics on how your network is doing compared to your competitors, what tests you have actually measured, for how long, and all kinds of other statistical measurements. And then, the motivation for us to use Dagster.
B
We wanted to bring this ETL into the cloud, and we would like to manage all this ETL logic in a central place, and also be big-data ready in the cloud. As you know, there are a lot of open-source tools which you need to mingle together, and there we just want to be ready to connect all of these tools and also have a state-of-the-art tool. So, as I said, we went from on-premise to the cloud.
B
We also want to scale out. At the moment we have SQL Server, which can only scale up, meaning we need to buy a lot of expensive hardware in case we have a lot of data. That costs a lot — especially if you have idle time, because you pay the cost anyway. In the cloud we want to use cheaper machines and only scale up if we really need the resources, so we can also scale down and save money.
B
There is more and more data coming which we need to be ready for, and also query time: we have a lot of percentile and median queries.
B
We also use Spark for processing the data and creating Delta tables. We have these general services in our Rohde & Schwarz cloud; we use Kubernetes for that, and there we get monitoring, logging, and resource scaling out of the box. And then, of course, the heart is Dagster, where we really put all our code. The import pipeline is mainly based on eventing, so we use a lot of event-driven pipelines.
B
As mentioned, sensors are a big part of it, so if you have never used sensors in Dagster, this is how it looks in the Dagit UI. It's very nicely done — you can really see all the dots.
B
So from the UI perspective it's a very nice way to work, but also, if you implement this, the only thing you need to do is put an @sensor annotation on a function and name the pipeline you would like to start. For us it's very handy: we mostly use S3 with sensors, and there we can build in some glue logic, so we can define which files we would like to read.
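As a minimal sketch of that pattern (the pipeline name, bucket, and listing helper below are illustrative, not from the talk):

```python
from dagster import RunRequest, sensor

# Hypothetical helper and bucket name, purely to illustrate the shape of
# an S3-triggered sensor; replace with your own listing logic.
def list_new_measurement_files(bucket):
    return []

@sensor(pipeline_name="etl_pipeline")
def measurement_file_sensor(context):
    for key in list_new_measurement_files("measurement-uploads"):
        # One run per newly uploaded file; run_key de-duplicates runs.
        yield RunRequest(
            run_key=key,
            run_config={"solids": {"process_file": {"config": {"s3_key": key}}}},
        )
```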
B
This is, let's say, our import pipeline, just to give you an overview of how it looks. We don't have one overall pipeline, as mentioned before: if we upload our zipped files, a sensor gets triggered and it parallelizes per uploaded file; all of the unzipped files get uploaded again, and then a next sensor takes each of these files immediately and runs the ETL pipeline on it.
B
We really try to scale out as much as possible, and that's also why we use sensors, because we have different granularity between these sensors and pipelines. This ETL pipeline then does all the logic we had before, so we have our facts and dimensions coming out as Parquet files; based on the fact tables, this gets ingested into — or created as — a Delta table on our S3 storage, and then we also ingest it into Druid with Python.
B
This is the pipeline part, but we actually also use Dagster for provisioning. Whenever we have a new user or a new tenant in our cluster, we use Dagster to create the buckets, to upload the secrets to our Vault, and to create the Druid account. The more we use Dagster, the more new use cases we find that we can integrate with it — so it's not just for data pipelines.
B
Assets are maybe also a rather newer feature, but they're very well suited if you want to add metadata or you also want to show your persisted data to your customers. We played around with the ETL: we added some sizes and durations to it, and then we could immediately see if there was a spike in some ETL jobs.
B
Then, in the UI, you can also click on that point and really drill down to the actual pipeline to see which file it was and what happened there. This is a very nice feature for us, and we would also like to add more, so we try to create assets for all our persisted tables and also for our Delta tables.
B
In the future, we would also like to try the data lineage that just got added to Dagster, because we have this event-driven pipeline that goes from the zipped files all the way to the actual Druid fact table.
B
If you haven't used assets, it's actually quite easy. The only thing you need to do, if you have your solid — this is one part of your pipeline — is that besides yielding your output, you also yield an AssetMaterialization. That's exactly what we do here to get the output we just saw before: we just add a key, which is the unique identifier. Here we are still playing around.
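A rough sketch of that pattern, with illustrative asset and solid names:

```python
from dagster import AssetKey, AssetMaterialization, Output, solid

@solid
def write_fact_table(context, df):
    path = "s3://my-bucket/delta/fact_measurements"  # illustrative path
    # ... write df as a Delta table at `path` ...
    # Besides the regular output, yield an AssetMaterialization whose
    # asset key uniquely identifies the persisted table.
    yield AssetMaterialization(
        asset_key=AssetKey(["delta", "fact_measurements"]),
        description="Fact table written as a Delta table on S3",
    )
    yield Output(path)
```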
B
So far we've used it, let's say, quite a bit, and the advantage for us of using Dagster is, in the first place, that we could replace the custom processing engine we used on premise. Now we have massive out-of-the-box features: we have restart capabilities, we have backfills, we have dependency management, we can see what's going on, we have the UI, and we have different modes, so you can just switch from production to local.
B
A nice thing for us when we started using Dagster: the beautiful Dagit UI is very nicely done, and it shows exactly what you need to know. You always see the state of each pipeline and which jobs are currently running, and you get rich metadata. That's also a big plus for us — that we can easily add customized metadata and make it available in the UI as well, without doing too much.
B
Problem solving is also a big thing for us. Before, we used other methods, and sometimes it was actually hard to find the error itself: we had to go through pipelines to really find the error, and sometimes we couldn't even find it. With Dagster, you really have the error straight in your face.
B
So you can really start solving errors. And in case you're missing some data or you don't know why the error happened, you just add more meta information, so at the end of the day you have an easy way — you always have all the meta information — to solve these bugs.
B
We also have users — developers — who didn't come from the data sphere, and they have told me that it's easy for them to grasp the concepts: all the concepts with the resources and the solids and the pipelines made sense to them. So we could ramp up new developers on Dagster quite fast, and it's actually very pleasant to write pipelines.
B
When you have such a nice framework around you, there's a lot of thought in it that you don't need to think about — it's just built in — and everything is self-documented. You don't actually need to draw diagrams anymore, because the pipelines in Dagster are self-documented; you can really see each step.
B
You can even put SQL statements inside there to really see what's actually going on with those assets, as mentioned. You spend less time explaining what's going on, and you can really discuss the business transformations and logic — what needs to be done.
B
One of the biggest things for me was the reusability of code. Before, we had microservices in Python, and we could really easily move them to Dagster: we had classes, and we just added them as resources, and then we already had them inside Dagster. That was a very elegant way to reuse existing code.
B
It also reduced the boilerplate code for us, because when you have microservices, you tend to integrate the logging, the restarts, and so on. These kinds of things you always need, but you normally re-implement them for each microservice. With Dagster, you do that once — or you even get it out of the box already, or you build it yourself, but then you do it once — and that was very good for us. It's also functional by design.
B
You do that exactly once, and then you can use it in your pipelines just by specifying the resource, and you can access it through the context. The context is very powerful: with context.resources dot whatever, you have access to all the resources you specify and all of their methods. Besides that, the context also has additional functions — you have, for example, the run ID.
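A minimal sketch of wrapping an existing class as a resource and reaching it through the context (the Druid client and the names here are illustrative):

```python
from dagster import ModeDefinition, pipeline, resource, solid

# An existing Python class, wrapped as a Dagster resource.
class DruidClient:
    def ingest(self, path):
        print(f"ingesting {path}")

@resource
def druid_client(_init_context):
    return DruidClient()

@solid(required_resource_keys={"druid"})
def ingest_fact_table(context, path: str):
    # Both resources and the run id are reachable through the context.
    context.log.info(f"run {context.run_id}: ingesting {path}")
    context.resources.druid.ingest(path)

@pipeline(mode_defs=[ModeDefinition(resource_defs={"druid": druid_client})])
def ingest_pipeline():
    ingest_fact_table()
```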
B
When we started, there wasn't yet a Kubernetes deployment, but just when we were about to start with Kubernetes, there was one, and so we are now using Kubernetes. It's also very handy for us, because we have SQL Server on Linux in one deployment, and in another one we use Spark.
B
It's easy to separate them with the user code deployments, and we can also scale out easily with new pods; it makes things very easy for us and also understandable as to what's going on. Another big plus for us is that it's Python-based — the language of data nowadays is Python — so it's easy to learn and to adopt for engineers, and it also supports SQL statements.
B
You can have a solid that takes SQL statements, and you can easily integrate that as well; there are also powerful integrations like dbt and other things that you can easily use with it.
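For example, a minimal sketch of a solid that takes a SQL statement from config and runs it against a database resource (the db resource with an execute method is an assumption, not from the talk):

```python
from dagster import solid

# Assumes a "db" resource exposing an `execute` method; both the resource
# and the config key are illustrative.
@solid(required_resource_keys={"db"}, config_schema={"sql": str})
def run_sql(context):
    return context.resources.db.execute(context.solid_config["sql"])
```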
That's where we are, and as for the next steps: we're not yet doing much unit testing and smoke testing, so we would like to improve there — at the moment there's a lot of manual testing, and we would of course like to have some test files that automatically get tested. Also the documentation. And we would like to use assets more intensively.
B
Maybe we'll even integrate the data lineage feature in an automated way. We started some Dagster pipeline guidelines, just to have some best practices around, let's say, when to use assets, what the naming conventions are, and how to use resources — it's very basic, but we would like to keep growing that. And specifically, I would also like to try the dynamic orchestration: as I said before, we have many sensors starting low-level pipelines.
B
But I would like to try whether we can instead have one pipeline that dynamically spawns all the sub-pipelines. Also, we're not using partitioning yet, but everything is based on files in our environment, so partitions would make sense there as well. That's it from my side — just contact me anywhere if you have some questions, or feel free to ask them later on.
A
I also wanted to give Simon a special shout-out: he was an extremely early user of Dagster and has produced a ton of great content about using Dagster on his blog, which I highly recommend everyone here check out. We'll also have time for questions for Simon at the end of the community meeting.
A
It looks like Sandy's computer is frozen, so we can actually maybe have some time for questions for Simon right now. If anyone has any, feel free to drop them in the chat or unmute and ask.
B
It's sspaeti.com, right? They can just...
A
We can also do general Q&A earlier now that we have time. So if there are any questions for the Dagster core team, please feel free to ask.
A
D
Okay, cool. My name is Sandy and I'm an engineer on the Dagster team. I'm going to talk about a set of improvements that we've made to Dagster's core APIs in the last couple of months.
D
One of our aims recently has been to make Dagster better at progressive disclosure of complexity. What progressive disclosure of complexity basically means is that doing simple things should feel simple; when you want to do more advanced things, you need advanced concepts, and those concepts should show up just in time.
D
The second area where Dagster can feel heavyweight is boilerplate. Previously, this was the code you needed to write if you wanted an argument to a solid-decorated function to include a Python type annotation for a non-built-in type: you had to write the word "Series" — the type you're trying to annotate — three different times. There are a bunch of other situations where you end up needing to type a lot of characters to get something pretty simple done.
D
We want people developing on top of Dagster to not need to put the same information in multiple places, and we want them to generally just not have to type a bunch of characters. One caveat here, of course, is that we don't want to make things too magical: it's always possible to go further in reducing boilerplate, but at some point clarity can suffer as well.
D
Between Python types and Dagster types: we built Dagster types to help developers write flexible runtime checks on the data that's passed between solids. This is a capability that we think is really important, and we describe Dagster types as gradual, but historically we required users to engage with them in some situations where the user was trying to do something other than just write a custom runtime check. With the recent changes we've made, it's now possible to use Python type annotations without dealing with the Dagster type system.
D
Let me show you what this looks like. This is a code snippet that Dagster would raise an error on until recently. In it, someone is trying to include a type annotation on the argument to a solid-decorated function; they're trying to say that this input should be a Series. Dagster used to raise an error on this, because it tried to find a Dagster type corresponding to the Series type — which is just a regular Python type — and couldn't find one. That code now works.
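A minimal sketch of the pattern now allowed (the solid name and the pandas Series input are illustrative):

```python
import pandas as pd
from dagster import solid

# A plain Python annotation such as pd.Series is accepted directly;
# no matching DagsterType has to be registered.
@solid
def total_duration(context, durations: pd.Series) -> float:
    return float(durations.sum())
```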
D
This brings up the more general question of when you should use Dagster types. Basically, if you want to express the Python type of an input or an output, just use Python type annotations; don't feel like you need to mess with Dagster types. If you want to define custom checking logic for an input or an output — write some function that validates it in some deeper way — use a Dagster type.
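And a sketch of that second case: a Dagster type carrying a custom runtime check (the check itself and the column name are illustrative):

```python
import pandas as pd
from dagster import DagsterType, InputDefinition, solid

# A Dagster type whose runtime check requires a non-empty DataFrame
# containing a "signal_strength" column.
SignalFrame = DagsterType(
    name="SignalFrame",
    type_check_fn=lambda _context, value: isinstance(value, pd.DataFrame)
    and not value.empty
    and "signal_strength" in value.columns,
)

@solid(input_defs=[InputDefinition("measurements", SignalFrame)])
def average_signal(context, measurements):
    return float(measurements["signal_strength"].mean())
```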
D
A related change is that you can now provide an input definition for one of your inputs without being required to provide input definitions for all of them. So in this example, we want to add a description for our first input; we don't care about doing that for our second input. Previously, Dagster would raise an error here, but now it works.
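A sketch of what that looks like (the names and the description are illustrative):

```python
from dagster import InputDefinition, solid

# Only the first input gets an explicit InputDefinition (to carry a
# description); the second input comes from the signature alone.
@solid(
    input_defs=[
        InputDefinition(
            "measurement_file", description="Path to an unzipped measurement file"
        )
    ]
)
def parse_measurement(context, measurement_file: str, encoding: str):
    return (measurement_file, encoding)
```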
D
We also made a small but fairly significant change, which is to make the default config schema type Any. What this essentially means is that now you can provide config to a solid without needing to define a config schema for it. That is to say, the code example on this slide now works: we pass configuration to our solid without explicitly declaring a config schema.
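A minimal sketch of the idea (the solid and config keys are illustrative):

```python
from dagster import execute_pipeline, pipeline, solid

# No config_schema is declared; with the default schema of Any, whatever
# is passed under "config" is accepted as-is.
@solid
def say_hello(context):
    context.log.info(f"Hello, {context.solid_config['name']}!")

@pipeline
def hello_pipeline():
    say_hello()

if __name__ == "__main__":
    execute_pipeline(
        hello_pipeline,
        run_config={"solids": {"say_hello": {"config": {"name": "world"}}}},
    )
```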
D
Declaring a config schema early saves lots of time in the long run, but when you're prototyping something in development or teaching Dagster to someone new, it can often make sense to omit it.
D
Stepping back a little bit: one of the things that can feel intimidating about Dagster is its emphasis on config in general, so new users are often led to believe they need to heavily factor their logic to make use of the configuration system.
D
Overuse of config can result in sprawling YAML documents for relatively simple pipelines. Here's a pretty intense example of heavily factored config: this is an early version of one of our own demo pipelines, and there's tons of redundant information that really can and should be constructed in code.
D
In general, you should use the config system when you want the person or software that's executing a pipeline to be able to make choices about what the pipeline does at the time they're launching it. For example, maybe a pipeline processes data for a particular date, and you want the person launching the pipeline — or the schedule launching the pipeline — to be able to parameterize behavior with that date.
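A hedged sketch of that scenario, using the legacy daily_schedule decorator with illustrative pipeline and solid names:

```python
from datetime import datetime
from dagster import daily_schedule, pipeline, solid

@solid(config_schema={"date": str})
def process_for_date(context):
    context.log.info(f"Processing data for {context.solid_config['date']}")

@pipeline
def daily_pipeline():
    process_for_date()

# The schedule parameterizes each run with the date it is launched for.
@daily_schedule(pipeline_name="daily_pipeline", start_date=datetime(2021, 6, 1))
def daily_pipeline_schedule(date):
    return {
        "solids": {"process_for_date": {"config": {"date": date.strftime("%Y-%m-%d")}}}
    }
```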
D
Or maybe it's operating on a particular file, and you want a kind of self-service interface where less technical users in your org can type in the path to a file and process that file — config is perfect for that scenario.
D
Another area we've tried to streamline is event metadata. Event metadata is the kind of metadata you can include on events like asset materializations, and it then gets nicely displayed in Dagit. The changes we've made here focus on boilerplate elimination. Here's an example of an asset materialization that includes four metadata fields; as you can see, it's quite a bit of boilerplate.
A
Okay, I'll continue — Sandy is having a...
D
Here's what you can do now instead: instead of a list, you can provide a dictionary, and for primitive types like floats and ints, you can just provide them directly instead of needing to construct a wrapper object. Last of all, we shortened the name of the argument. Taken together, these changes might seem small, but you get a much less verbose API, which hopefully lowers the general friction and the threshold required to include any metadata.
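A sketch of the newer, dict-based form (the asset key and metadata values are illustrative):

```python
from dagster import AssetMaterialization, Output, solid

@solid
def build_fact_table(context):
    row_count = 4200
    # The less verbose form: a plain dict passed as `metadata`, with
    # primitives like ints and floats provided directly.
    yield AssetMaterialization(
        asset_key="fact_measurements",
        metadata={"row count": row_count, "size (MB)": 12.5, "format": "parquet"},
    )
    yield Output(row_count)
```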
D
Hopefully that encourages you to log more metadata, so you can look it up in Dagit. And last, but certainly not least, the context argument on solid-decorated functions is now optional, so you only need to declare it if you need to use it. In this example, we have a couple of solids that don't use their context argument: all they're doing is returning a value, or taking some inputs, adding them together, and returning a value.
D
Now they can be rewritten to not even have a context argument. Dagster recognizes the name "context" and knows to treat it in a special way; we never allowed solids to have inputs with that name, so this is not a breaking change.
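For example, something along these lines (illustrative solids):

```python
from dagster import solid

# These solids never touch the context, so the argument can be omitted.
@solid
def get_base_rate() -> float:
    return 2.5

@solid
def add_rates(rate_a: float, rate_b: float) -> float:
    return rate_a + rate_b
```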
D
This, of course, doesn't mean the context argument is no longer available; you can still use it when you need it, like in this example, where we're accessing the config via the context.
D
The context change really shines in combination with another change that's not here yet but is on its way. Currently in Dagster, if you want to execute a solid — for example, in a unit test — you need to do something like this: you invoke a function called execute_solid, which executes your solid and gives you a result object that you can then pull the output value from. It's not terrible, but it's a lot of extra typing every time you want to invoke a function.
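A sketch of that current pattern (the solid under test is illustrative):

```python
from dagster import execute_solid, solid

@solid
def add_rates(rate_a: float, rate_b: float) -> float:
    return rate_a + rate_b

def test_add_rates():
    # The wordier path: execute the solid, then pull the value
    # out of the result object.
    result = execute_solid(add_rates, input_values={"rate_a": 1.0, "rate_b": 2.0})
    assert result.output_value() == 3.0
```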
D
We're working on changes that will allow you to just treat the solid like a regular function and execute it like this.
D
If your solid requires a context object, you'll be able to just create a context and pass it in. Note that we're still finalizing these APIs and they might end up with slightly different names when they make it into a release, but here we're creating a solid context and then directly invoking our solid, which is called "operate", passing in that context and then the inputs expected by that solid. Direct solid invocation is useful in a few different situations.
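A rough sketch of what this could look like; as noted in the talk, the exact API names were still being finalized at the time, so this uses build_solid_context, the helper name that later shipped, together with an illustrative solid:

```python
from dagster import build_solid_context, solid

@solid(required_resource_keys={"multiplier"})
def scale(context, value: float) -> float:
    return value * context.resources.multiplier

def test_scale():
    # Build a context by hand and call the solid like a plain function.
    context = build_solid_context(resources={"multiplier": 2.0})
    assert scale(context, 3.0) == 6.0
```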
D
If you're writing a unit test, there's less boilerplate; same thing if you want to invoke your solid from inside a Jupyter notebook — if you're just trying to experiment and directly invoke some of the components of your pipeline, you don't need to wrap it in a bunch of boilerplate. And third, it becomes easier to switch back and forth between making something a solid and a regular function.
D
Last of all, I'm going to shift gears briefly and talk about a couple of things we're working on that aren't related to approachability.
D
The first one is comprehensive failure notifications. So far, Dagster has solid hooks, which will trigger some logic if a failure happens inside one of your solids. This isn't sufficient, because there are a lot of things that can happen outside of a solid that might cause a pipeline to fail or not even run at all, and if you have a production pipeline, what you care about is whether it runs, not the particular thing that's causing it to fail.
C
D
For example, you might encounter a failure while launching a run, you might encounter a failure while validating configuration for a run, or even inside a schedule or sensor function before the run has been launched at all. We're still exploring what the solution space looks like, but ultimately we want to make it easy for you to write code that executes any time one of these unexpected things happens.
D
The second feature is about relationships between pipelines. One is that a pipeline might consume assets that an upstream pipeline produces, and another is that a pipeline might have a sensor that triggers it when an upstream pipeline completes. In situations where you have a lot of pipelines, it can be difficult to get a handle on how they all relate to each other. So we want to make it easy to understand your entire organization's graph, and not just the sub-segments of those graphs that live inside individual pipelines.
D
These are two of the features that we're targeting for our 0.12.0 release, which will land sometime in July. We have a few other areas that we're interested in as well, but we haven't quite defined what the direction looks like for them yet, so we're interested in your feedback on what's important to include in our upcoming release.
D
We're going to have some extra time for questions, so you can bring it up directly here in this conversation, or if you think of something later or are just more comfortable typing, the Dagster feedback channel is a good place to put it. So yeah, thanks a lot — that's all I had.
D
Yeah, we are exploring what the right way to set this up is, so we don't have anything definitive, but there are some prototype diffs out, at least for the solid level. And then I think the piece that Sandy mentioned — the same system that will back these notifications for pipeline-level failures — will probably have some means for doing pipeline-level retries. So we're working on it; nothing super definitive to talk about at this point.
A
D
The question is: what's the latest on the @graph operator to replace composite solids? It's a great question. We've spent a lot of time discussing how to manage this, and we currently have a different approach that we're exploring. Maybe Sandy would like to speak to it — Sandy, why don't you just kind of give your current pitch of where we're at? So basically, this is really tricky, right — we want to try to simplify.
D
We currently have this setup where there are pipelines and solids, and in between we have the composite solid, which allows you to kind of make a pipeline that you can embed in other pipelines. It's a weird space; we're not happy about it, but none of the different approaches we've explored so far to solve this problem have really made us very happy.
D
That's because of the amount of change users would have to go through here, since we're changing one of the core things in the system. So we've continued to spend a bunch of time getting close on something and then deciding it's not quite good enough, and Sandy has been leading the charge on the current vision for where this could go, so I'll let him speak to it a little bit.
D
For those who aren't familiar, I think one of the main threads we've pulled to get at this problem is essentially that we have this thing called a pipeline and this thing called a composite solid, and they function very similarly — in other words, they both contain a graph of solids — but they're two distinct concepts. This can be confusing for a lot of people and adds extra overhead to understanding the system.
D
So our question was basically: can we consolidate them into a single concept? The primary obstacle we've faced when trying to do that is modes, because if you have a pipeline with a set of modes and then you want to embed that pipeline inside of another pipeline or composite solid, what is the meaning of that? Do you get a combinatorial explosion of modes as you go up the tree of pipelines?
D
Putting pipelines inside of pipelines, or graphs inside of graphs — depending on what naming we decide to go with — doesn't face that same issue, and it also dovetails with a set of changes that we're considering independently, because it can be awkward to be required to define the full set of modes on a pipeline up front.
D
For example, if you want to inject particular resources inside unit tests, that's very awkward in Dagster right now, because the pipeline definition itself is required to hold all of the resources that it's allowed to execute with. So we're considering changes that would allow you to define a pipeline without defining the modes on it, and then supply those resources, or that mode, later.
D
The next question is about launching pipelines from Python. Currently — Andy here is describing how they use the launch command, which is the CLI entry point for launching runs — and they're wondering whether we're going to have an async function to do this, as per a discussion. This is something we explored a little bit last release, related to some other changes in the testing APIs, which is also connected to this graph work.
D
We're thinking about having an execute-graph command and a launch-graph command, and about what the right way to target something for launch is. We've explored this a bit, but we don't have anything ready quite yet — I guess that's the short answer, not super definitive, unfortunately. So we've explored it, and there's some more work to do to figure out exactly what the API for this should be.
D
I'm looking at the next question, which is in reference to the modes discussion and resources: would that include mapping the resource names that are expected to be supplied, so there's less of a global namespace that needs to be managed?
C
No, what I had in mind was — this is Chatai, by the way, hi — that every solid in a pipeline that requires a resource gives a name to that expected resource, right? And that's something you have to supply to the solid.
C
They have to match: if you have two solids that both need a resource named db, but they're actually different kinds of dbs, and the writers of those solids didn't really think about coordinating the namespace, then you're kind of in a hard place — except by changing those solids and every config that uses them in your existing pipelines in production. Like, that's...
D
Got it, got it. Okay, yeah, we've definitely talked about that. We've talked about it as essentially resource key remapping, where — if I understand correctly — you have these collisions in resource keys, and you want to be able to say that this resource key that this solid requires actually corresponds to a different resource key on the pipeline. Chris on our team has done some investigation into this and played around with it a little bit, but to my knowledge we haven't shipped anything yet.
D
Adding an extra layer of indirection makes a lot of things complex, so we've been trying to be careful there, but it's definitely something we would consider incorporating if a lot of people are facing it.
C
D
Okay, so the question was: what does connecting pipelines with different schedules look like in the graph world — in the world with the @graph decorator — imagining pipeline A generates assets hourly, which should then be used by pipeline B, which runs on a daily schedule?
D
I think the design space is fairly open-ended and we're considering a lot of options. To some degree, if you have pipeline A producing assets and pipeline B consuming those assets, that is an inherent relationship between those two pipelines, and we'd like to be able to capture it. If I understand correctly, the two pipelines don't have a super tight execution dependency.
D
It's not like you want to kick off pipeline B every time pipeline A kicks off, but there's still a relationship there, and we would like to be able to capture and visualize that.
A
Cool, if there's nothing else, we can call it. Thanks again, everyone, for coming by. We'll be back next month on the second Tuesday of June, and if you'd like to speak at the next community meeting, please reach out to me or fill out the Google form we shared at the beginning of the presentation, and we can work with you to put a presentation together.