.NET Foundation Virtual User Group, 20 May 2021

From YouTube: Modeling and Partitioning in Cosmos DB (2021-05-20)

Description

Mark Brown, Principal Program Manager Azure Cosmos DB, will be presenting on Modeling and Partitioning in Cosmos DB. A great introduction to using a distributed NoSQL database.

A

I'll make it pretty clear. Let me see, have I pushed it? I pushed the button.

A

Thank you very much. Welcome, everybody, to the May 20th edition of the San Antonio Cloud Computing Group meetup, virtual as it has been. I'd like to thank everyone from the San Antonio Cloud Computing Group and everyone from the .NET Virtual User Group for being here, and I'd like to thank Dave and the .NET Foundation for hosting and running this thing.

A

I appreciate the great work they do in running and hosting these things, as well as their great work educating, advocating, and bringing in the new generation of developers. I'll post that link in the chat here in a moment. We have one announcement.

A

One week from today, on the 27th, our next meetup event will have Mike Benkovich, a former Microsoft developer tools and cloud evangelist who has spent his career helping developers and enterprises explore and apply new technologies to solving information challenges.

A

He'll be here to tell us about "Permit to Cloud," which is about how to migrate your app and land with confidence this year.

A

I don't have a link for that one yet, but I'll post it to our page as soon as I do. And now to our main event of the evening. I'd like to introduce our guest, Mark Brown. Mark is a principal product manager for Azure Cosmos DB, and I'll let him intro his talk since he knows all about it. I'm very excited, because I don't know how often we get to hear this first-hand, you know, from the source.

A

Thank you for being here, Mark, and I'll hand it over to you.

B

Thank you, Kevin. I've known Mike Benkovich for, God, well over a decade probably, so he's a great guy. Good to have him on next. Was it next month? Do you guys meet monthly?

A

No, it kind of depends on the guest's availability. We try to meet twice a month, on Thursday nights around this time.

B

Great.

B

All right, well, thanks for having me, and welcome, everyone. My name is Mark Brown. I'm a principal program manager on the Cosmos DB team. I've been on the Cosmos DB team for around three years, maybe a little over. Prior to that I was on the Azure networking team, and prior to that I took a little time off and worked on Azure Web Apps and Redis Cache. I've been in Azure since about 2011, I think.

B

I've been at Microsoft now just about 20 years all told, so quite a long time, and I've worked on lots of different products and services. I have to say this is probably the most fun I've had. NoSQL distributed databases are a heck of a lot of fun. Lots of good tech in here. My day job for Cosmos is twofold. I'm the program manager for what's called our resource provider. That's basically our control plane back end.

B

So when you provision an account using, say, an ARM template, PowerShell, the CLI, or even REST, all that API stuff, the management API, is what I PM. I also run a team of other PMs, and we're basically trying to help spread love and awareness around Cosmos DB. So today I'm going to talk about data modeling and partitioning in Cosmos DB.

B

This is probably, I would say, the most important topic for anyone new to this type of database to understand. So let me go through the objectives here. First, I want you to get familiar with Cosmos DB's core concepts and our API, and to understand data modeling best practices.

B

Then we're going to apply that to a real-world scenario here today. Throughout all this, I'll help you understand how Cosmos differs from a relational database when designing a data model, because we're going to start with a relational data model.

B

So what is Azure Cosmos DB? Well, it's officially Microsoft's NoSQL database on Azure. When we say NoSQL, what we really mean is that Cosmos DB is both non-relational and horizontally scalable.

B

So let's talk about the horizontal scale aspects of Cosmos DB. Unless your workload is small, with just a low amount of data or requests, your data is likely going to be stored on a bunch of different physical servers, or partitions, within something we call a container. Now, in Cosmos we abstract all this away, but under the hood you're reading and writing data in and out of Cosmos across this cluster of servers. This is better known as scale-out, and it's how we achieve horizontal scalability, enabling both unlimited storage and unlimited throughput.

B

So when you need more storage, we simply add more servers to your cluster. This also provides unlimited throughput, because each of these computers has its own CPU, memory, and I/O, so it adds additional capacity to handle requests when you need it. Now, Cosmos DB is also non-relational.

B

So when working with relational databases, you have the ability to do things like define constraints between the different entities you're storing. This lets you create foreign keys, perform join operations, those types of things. Non-relational databases like Cosmos DB don't implement any of these relational constraints, and the reason is that Cosmos is a horizontally scalable database and your data is likely going to be spread across multiple servers. This could be tens or even hundreds of servers, depending on the storage or throughput needs you have.

B

Now, I don't want to suggest that it's not technically possible to enforce relational constraints across a cluster of servers, but doing so would have an enormous impact on the performance and availability of your database. Because Cosmos is designed to provide predictable performance, we don't really provide a way to declare these types of relational constraints. So you may be asking yourself: well, is Cosmos okay to use for relational workloads? It definitely is, and in fact most workloads on Cosmos are relational in nature.

B

You just need to learn and use different techniques to implement the relationships between your entities, and this approach is very different from designing a data model for a relational database. For those who are new to this type of database: you should not follow your intuitions. Best practices in the relational world don't translate very well to this type of database and may even be anti-patterns.

B

So let's put this into practice. I'm going to spend the rest of this session taking a relational database that anyone who's familiar with SQL Server should know, the AdventureWorks 2017 database. Well, actually, we'll just take a small part of it. These tables represent the canonical e-commerce workload, if you will. We've got customers, with the customer table, customer address, and customer password. For the products, we've got, of course, the product table, and then there are your product categories and product tags. And then, finally, we have our sales orders.

B

So we've got our sales order header here and also our sales order detail. With a non-relational document database like Cosmos DB, there's not much we can do with the tables the way they are. You would never use these tables as is, because the cost and performance of trying to do operations on them would be prohibitive.

B

Another thing I want to point out, too, is that in a relational database, the relationships between the data are what's important for modeling the database. For a NoSQL database, this is only part of the story. What's most important when designing a data model for a NoSQL database is to look at the access patterns of your application.

B

How does your app read and write data in and out of your database? As I'll show you through this talk, we're going to look at the application's access patterns and use them to inform the design and the decisions we make in modeling this database.

B

Okay, so a real-world e-commerce platform would obviously need to implement a lot more functions than this, but this subset is really enough to guide and illustrate the different design techniques we want to showcase. So here's what our little mini e-commerce site is going to do. We're going to create and edit a customer, and then we want to retrieve a customer.

B

We want to create and edit product categories.

B

Then we want to list all product categories, and we'll do the same for product tags: we'll create, edit, and list those. We want to create a product and edit a product, and then we want to list all products from a category. We also want to include the name of the category and the tags in there, because if you look here in product, we don't have the category name, and the tags are sitting in a completely different set of tables with a many-to-many relationship.

B

So we want a query that's going to provide all of that. We also need to create a sales order, and then we want to list all sales orders for a customer. Okay, so that's a different one. Then we have another one, where we want to query the top 10 customers by the number of sales orders they have. So this is kind of like a light analytics scenario, if you will.

B

Maybe I want to send out a nice gift to them, or, I don't know, a coupon or whatever, saying thanks for being a customer, here's a little something for you. Okay, so let's get started remodeling this relational database in Cosmos DB, and we'll start with the customer entities. We have three tables here: customer, customer address, and customer password. First, we need to translate these into their corresponding JSON documents.

B

Now, we could keep these in separate containers in Cosmos, but let's look at the operations we need to support, which include both creating a customer and editing a customer. Given that Cosmos DB stores data as JSON, another approach we could take is to embed the address and password data into our customer document. Now, when I create a new customer, I only need to insert the data into a single container.

B

Reading data is also easier, because I only need to read from a single container as well. So both of these operations would now be faster and less expensive to execute, because I've embedded everything into a single document here.
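As a rough sketch of what that embedding might look like, the three customer tables could collapse into one JSON document along these lines. The field names and values below are illustrative assumptions, not the actual AdventureWorks schema or the data used in the demo.

```python
import json

# Hypothetical customer document: address and password rows are embedded
# instead of living in separate containers. All names and values are made up.
customer = {
    "id": "customer-001",
    "title": "Ms.",
    "firstName": "Ada",
    "lastName": "Example",
    "emailAddress": "ada@example.com",
    "phoneNumber": "555-0100",
    "creationDate": "2021-05-20T00:00:00Z",
    # One-to-few: a small, bounded array of addresses.
    "addresses": [
        {"addressLine1": "1 Main St", "city": "San Antonio", "state": "TX"},
    ],
    # One-to-one: a single embedded object.
    "password": {"hash": "<hash>", "salt": "<salt>"},
}

# Creating or retrieving the customer now involves a single document.
print(json.dumps(customer, indent=2))
```

With this shape, one write creates the customer and one read returns the addresses and password along with it.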

B

So at this point, people often ask: when would you embed data versus reference it within a database like Cosmos? There are actually some good rules of thumb on when you would do these types of things with a JSON database like Cosmos, where you can have hierarchical data like this.

B

Generally, you want to embed data or entities together when there's a one-to-one relationship in the data. For customer and customer password you have that one-to-one relationship, so that makes a good candidate. You also embed if you have what we call a one-to-few relationship. Now, this is different from one-to-many. One-to-few is a little more precise, in that it helps to quantify whether the relationship is unbounded.

B

So if you have one-to-few, there are just a handful of things; it's typically some kind of bounded amount. This would be like address. Address is strictly a one-to-many relationship to a customer, but you don't have an unlimited number of addresses, or an insane number of addresses. You just have a small number, like maybe one or two: you've got yourself, your wife, your kids, your mom or your dad, or whoever.

B

So it's what we like to call a one-to-few relationship. The other aspect is that when the related items are queried or updated together, that's another reason why you would want to embed rather than reference. Now, you want to reference when the relationship is one-to-many, and especially if it's unbounded, and I'll explain why that's important in a little bit.

B

Also, if you have a many-to-many relationship, you need to reference the data rather than embed it. And if the related items are queried or updated independently, that's another good reason to reference the data.
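These rules of thumb can be written down as a small decision helper. This is only a sketch of the guidance as stated in the talk; the function name, its parameters, and the example relationships are assumptions made for illustration, and real modeling decisions usually weigh more factors than this.

```python
def embed_or_reference(cardinality: str, bounded: bool, accessed_together: bool) -> str:
    """Return 'embed' or 'reference' following the rules of thumb above.

    cardinality: 'one-to-one', 'one-to-few', 'one-to-many', or 'many-to-many'.
    bounded: whether the number of related items has a small, known ceiling.
    accessed_together: whether the related items are queried/updated together.
    """
    if cardinality == "many-to-many":
        return "reference"
    if cardinality == "one-to-many" and not bounded:
        return "reference"
    if not accessed_together:
        return "reference"
    return "embed"


# Customer and password: one-to-one, read together.
print(embed_or_reference("one-to-one", bounded=True, accessed_together=True))
# Customer and addresses: a handful per customer, read together.
print(embed_or_reference("one-to-few", bounded=True, accessed_together=True))
# Hypothetical unbounded child collection that is read on its own.
print(embed_or_reference("one-to-many", bounded=False, accessed_together=False))
```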

B

Okay, so because we have these one-to-one and one-to-few relationships between our entities, and because we usually retrieve all the data for a customer at once, it makes sense to embed everything inside a single JSON document here. Now that we've defined our first entity, we need to store our customer data in a container, and we'll call that customer. When creating a new container, we also have to define its partition key within Cosmos DB, and you may be asking: well, what's a partition key? Remember, earlier I said that a Cosmos DB container is basically an abstraction over a cluster of physical servers.

B

When storing documents in a Cosmos DB container, these documents end up getting dispatched to different physical servers. So the question is: how do I decide that this document should go there and that document should go there? Now, technically, we're not directly assigning documents to servers.

B

Instead, documents are written to what we call logical partitions, and it's these logical partitions that sit on different physical servers. So from a user's perspective, you don't really need to care about physical servers when designing your data model, just that your data gets written to these different logical partitions.

B

So again, you may ask: how am I going to decide which logical partition my data should go on?

B

Well, that's where the partition key comes in. When creating a new Cosmos DB container, you define the container's partition key, which is the name of a property that Cosmos will use to decide which logical partition your data should go to. Think of it as an address to route your data to within your database. Now, take the example of a container partitioned by username.

B

We would end up with one logical partition where the data with username equal to Andrew ends up, a second logical partition where the data with username equal to Debra is assigned, and so on. Now, there are two constraints to keep in mind when designing a data model on Cosmos DB.
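To make the routing idea concrete, here is a toy simulation of documents being grouped into logical partitions by their partition key value and then placed on physical partitions. The SHA-256 hash, the fixed count of three physical partitions, and the sample documents are all assumptions for illustration; Cosmos DB uses its own internal hashing and placement, which this does not reproduce.

```python
import hashlib
from collections import defaultdict

PHYSICAL_PARTITIONS = 3  # assumed cluster size for the illustration


def physical_partition_for(partition_key_value: str) -> int:
    """Map a partition key value to a physical partition (illustrative hash)."""
    digest = hashlib.sha256(partition_key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % PHYSICAL_PARTITIONS


def place(documents, partition_key):
    """Group documents into logical partitions, then assign each to a server."""
    layout = defaultdict(lambda: defaultdict(list))
    for doc in documents:
        value = doc[partition_key]
        layout[physical_partition_for(value)][value].append(doc)
    return layout


docs = [
    {"id": "1", "username": "andrew", "favoriteColor": "blue"},
    {"id": "2", "username": "debra", "favoriteColor": "red"},
    {"id": "3", "username": "andrew", "favoriteColor": "green"},
]
layout = place(docs, "username")
for server, logical in sorted(layout.items()):
    for value, items in logical.items():
        print(f"server {server}: logical partition {value!r} holds {len(items)} document(s)")
```

The point of the sketch is that every document with the same username always lands in the same logical partition, while which server hosts that partition is an implementation detail.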

B

First, each document has a maximum size of two megabytes. Remember before, when I talked about when to embed and when to reference? It's because each document can be no larger than two megabytes that you don't want unbounded arrays inside your documents.
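A quick way to see why unbounded arrays are a problem is to estimate a document's size as it grows. The sketch below approximates size as the length of the UTF-8 encoded JSON, which is an assumption; the service's own accounting may differ, and the sample documents are invented.

```python
import json

MAX_DOCUMENT_BYTES = 2 * 1024 * 1024  # the 2 MB per-document limit from the talk


def document_size_bytes(document: dict) -> int:
    """Approximate a document's size as its UTF-8 encoded JSON length."""
    return len(json.dumps(document).encode("utf-8"))


def fits_in_one_document(document: dict) -> bool:
    """Check the approximate size against the per-document limit."""
    return document_size_bytes(document) <= MAX_DOCUMENT_BYTES


# A customer with a bounded list of addresses stays tiny.
small = {"id": "c1", "addresses": [{"city": "San Antonio"}] * 3}
# An embedded array that keeps growing eventually exceeds the limit.
huge = {"id": "c1", "orders": [{"orderId": i, "note": "x" * 100} for i in range(30_000)]}

print(fits_in_one_document(small), fits_in_one_document(huge))
```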

B

Another restriction is that each logical partition has a maximum size of 20 gigabytes, so you can't have more than 20 gigabytes of documents where, say, username equals Andrew. When working with data, what we want to achieve, or at least strive to achieve, is an even distribution of data across our different logical partitions. If one partition gets a lot more data stored on it than another, this is what we call a hot partition, and we want to try to avoid that. The same goes for requests as well.

B

If most or all of your requests hit the same logical partition, this is going to be a bottleneck that's going to drastically limit the overall scalability of your model.

B

What we want to end up with is a design where storage and throughput are more or less evenly distributed across our different logical partitions. So the partition key plays an important role in both performance and scalability for your database.

B

Let's take that example again of a container partitioned by username. Now, if we were to issue this query, you'll notice that it filters on the username property, which, thankfully, is our partition key. Because Cosmos DB understands that, it's going to send this query to a single logical, and therefore physical, partition.

B

That means this query is always going to hit one physical server, so its performance is always going to be really good.

B

Now, if our query filters on something else, say favorite color here, we would have no idea where the results are. So in this case, Cosmos fans out the query to each and every logical partition, and thus every physical partition. As you've probably already guessed, this is going to hit every physical server. This is what we call a cross-partition or fan-out query; more specifically, a fan-out means it's going to hit every single physical partition in your database.
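The difference between the two queries can be sketched by counting how many physical partitions each one has to visit. The hash function and the count of four physical partitions are assumptions for illustration only, not how Cosmos DB actually places data.

```python
import hashlib

PHYSICAL_PARTITIONS = 4  # assumed; a real container's partition count grows over time


def physical_partition_for(value: str) -> int:
    """Illustrative hash from partition key value to physical partition."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % PHYSICAL_PARTITIONS


def partitions_to_query(filter_property: str, filter_value: str, partition_key: str) -> set:
    """Return the physical partitions a query must visit."""
    if filter_property == partition_key:
        # The filter pins the partition key, so one partition can answer.
        return {physical_partition_for(filter_value)}
    # Otherwise matching documents could live anywhere: fan out to all.
    return set(range(PHYSICAL_PARTITIONS))


single = partitions_to_query("username", "andrew", partition_key="username")
fan_out = partitions_to_query("favoriteColor", "blue", partition_key="username")
print(len(single), "partition(s) vs", len(fan_out), "partition(s)")
```

As the assumed partition count grows, the single-partition query still touches one partition, while the fan-out query touches all of them.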

B

Now, such a query will work, but it does have an impact on latency, throughput, cost, and ultimately scalability. Whoops, sorry, I need to go back. Now, for small containers, this performance impact is not that bad. In fact, for a container with just a single physical partition, you won't notice any impact on performance at all. However, as your database grows larger, the impact becomes worse and worse, as you have to hit more physical servers to answer your query.

B

This is precisely the scenario that traps so many people new to this type of database. Typically, developers will do dev and test on a small data set and then conclude that their design is good because the performance they measured was acceptable. In a NoSQL database, you don't truly know if your design is scalable until you actually measure it under heavy load, with a good amount of data in there, and account for the concurrency of your operations.

B

Okay, now, with all that knowledge about partitioning, how do we choose the right partition key for our customers? Remember, before, I showed the operations we need to do for our customers. We had three things we needed to do: we needed to create a customer, we have to edit a customer, and we also need to retrieve a customer. In this case we're going to retrieve them by their ID.

B

So the ID of the JSON documents we're going to store in this container should make for a pretty suitable partition key. Keep in mind, too, that the other thing that's important is the concurrency of these operations. We're not often creating or editing a customer, but we are retrieving them quite often, every time they log into the e-commerce application.

B

We need to look them up, so this is why you want to optimize around, well, certainly the operations that are going to be run, but the frequency and concurrency of those operations also play a huge part.

B

Okay, so the ID of the JSON documents is going to make for a suitable partition key here. Note that when using the ID as the partition key, we end up with as many logical partitions as there are documents in the container, with each partition containing only a single document, and that's perfectly fine. Many users are concerned about this.

B

They worry about having this high number of logical partitions, but there's no need to worry. Logical partitions are a virtual concept, so Cosmos co-locates logical partitions on the same physical servers and then moves them to different physical servers when needed, and there's no upper limit to how many logical partitions you can have in a container. In fact, functionally, a single document per logical partition is a key-value store in Cosmos DB, and you can have key-value stores of basically infinite size in Cosmos, with an infinite number of logical partitions.

B

Okay, so let's do a demo or two here. In this demo, I'm just going to show you how to query for a customer by ID and then return that customer's data, and I'll show you some code here as well. So I've got a little demo app all set up here, and we're going to run this one, query customer, right here. Let me open that up. Here you can see I'm getting a reference to my database and my container, and I'm going to look up this customer ID.

B

So this is the ID of the customer I'm going to get. I've got a SQL statement, so we're using Cosmos DB's SQL API, which has a very SQL-like syntax. It's not ANSI SQL. Keep in mind we're a JSON store, so our flavor of SQL is meant to work with JSON, not with fixed-column row data like you'd find in, say, SQL Server or Postgres.

B

So next, I'm going to create what's called a query iterator, with GetItemQueryIterator, and I'm going to pass in my new query definition there, then pass in a parameter, and then I'm going to create some request options and specify my partition key. Now, for Cosmos DB, you can do one of two things.

B

You don't have to do this: if you pass in the value of the partition key in the WHERE clause, it will kind of suss it out and use that to route the query. However, I'm calling it out here because it's a best practice to put this in the request options, and this is functionally what is going to route the query to the correct physical partition in your database, so it knows where to go find that data.

B

So that's my query definition here. Then I'm just going to put this in a while loop, loop through the results, and call ReadNextAsync, which is going to fetch the first set of results, and then just print out my customer using a handy little print function with the Newtonsoft library there. And that's it, just loop through them all. There's only one customer, but I'm still going to put this in a foreach, just to show that function there.
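The demo itself uses the .NET SDK, and its code is not reproduced in this transcript. As a hedged approximation of the same pattern in Python, the sketch below passes a parameterized query together with an explicit partition key. The call shape is modeled on the `query_items(query=..., parameters=..., partition_key=...)` method of the azure-cosmos Python package; the `InMemoryContainer` class is a hypothetical stand-in so the example runs without an Azure account, and it does not model request charges or paging.

```python
class InMemoryContainer:
    """Hypothetical stand-in so the sketch runs without an Azure account."""

    def __init__(self, items):
        self._items = items

    def query_items(self, query, parameters=None, partition_key=None):
        # Only understands the one query used below; a real container parses SQL.
        wanted = {p["name"]: p["value"] for p in parameters or []}["@id"]
        for item in self._items:
            # The container is assumed to be partitioned by id.
            if item["id"] == wanted and item["id"] == partition_key:
                yield item


def query_customer(container, customer_id):
    """Parameterized query scoped to one partition via partition_key."""
    results = container.query_items(
        query="SELECT * FROM c WHERE c.id = @id",
        parameters=[{"name": "@id", "value": customer_id}],
        partition_key=customer_id,  # routes the query to a single partition
    )
    return list(results)


container = InMemoryContainer(
    [{"id": "c1", "firstName": "Ada"}, {"id": "c2", "firstName": "Bo"}]
)
for customer in query_customer(container, "c1"):
    print(customer)
```

Passing the partition key explicitly, rather than relying on the WHERE clause alone, mirrors the best practice described above.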

B

Okay, so over here in my app, I'm going to run menu item A, and I'm going to query for a single customer.

B

This will take a second because it's just starting the app. It hasn't connected to the database yet.

B

There we go. Okay, so there's our customer record. You can see the original customer data there: the title, first name, last name, email address, phone number, and creation date. Here's address in an array. There's only one address in here, but there it is. And here's the password object that we had in our other table. And here you can see it returned a request charge of 2.83.

B

Now, this may be new to you as well, but Cosmos DB uses what we call request units per second, or RU/s. This is how you measure throughput in Cosmos DB, and it's essentially a proxy for compute, I/O, and memory. So when you provision a new container in Cosmos, you have to specify some level of throughput, and the more throughput you provision, the more compute power you get, the more operations you can handle, the more you can do.

B

You also get more storage with that, because there's an implied level of storage for a certain amount of throughput. So knowing how much your operations cost is a way to figure it out. Okay, I'm going to run this operation 10 times a second, and I know it costs 2.83 RU. Then I add that up with other operations that I know I'm going to run at some level of concurrency.

B

Add all those up to figure out, just roughly, how many RU/s I need to provision in total for my database. Another thing I'll point out, too, is that because throughput here is measured, or governed, per second, the longer you can amortize requests over time, the lower the overall throughput you need. I often hear from customers: "I want to run a batch job that's going to do a whole bunch of work, and I've got to provision an insane amount of throughput to do it. Why?

B

Why is it so much to do that?" Usually my answer is: can you stream the data rather than batch it? Because if you can stream it, you can process it over a longer period of time, which is going to require less throughput, and you're going to pay less for it. So, anyway, that's just talking about throughput, something to know there, but we want to measure that here. So that's 2.83 for that query.
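The budgeting arithmetic described here can be sketched in a few lines. The 2.83 RU query cost and the rate of 10 per second come from the talk, and the 4.04 RU figure is the category listing measured later in the session; the other rate and the total RU figure for the bulk job are made-up numbers for illustration.

```python
def required_throughput(operations):
    """Sum (RU cost x executions per second) over all operations."""
    return sum(cost * per_second for cost, per_second in operations.values())


# (RU cost, executions per second). The second rate is an assumption.
workload = {
    "query customer": (2.83, 10),
    "list categories": (4.04, 1),
}
print(f"steady state: {required_throughput(workload):.2f} RU/s")


def throughput_for_job(total_ru, seconds):
    """RU/s needed to spend total_ru within the given time window."""
    return total_ru / seconds


job = 3_600_000  # hypothetical total RUs consumed by a bulk load
print(f"batch in 1 minute: {throughput_for_job(job, 60):,.0f} RU/s")
print(f"streamed over 1 hour: {throughput_for_job(job, 3600):,.0f} RU/s")
```

The same total work spread over a longer window needs proportionally less provisioned throughput, which is the argument for streaming over batching.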

B

Now, another thing I want to show you: if you're doing a query where you know the partition key and the ID for the data, you can do something that we call a point read. It's part of the non-query operations for, say, inserting, updating, deleting, and reading. Let me show you that. So here I have a function called get customer. Same thing: I get my database and my container, same customer ID.

B

Let's use this function here, which is called ReadItemAsync, and I'm going to pass it precisely two things: the ID for the data I'm looking for, in this case the customer ID, and the partition key, which in this case is the same thing, the customer ID. So I'm going to go and get the same data using a point read, and let's see how that runs. So run menu item B, and there it is, back. Look here at the bottom.

B

The request charge for this was just a single RU, and this is something we guarantee: any point read operation that you make for a kilobyte of data or less is always going to cost you a single RU. The reason is that this goes straight to our back end. We know the cost for a kilobyte of data, and that's why we can guarantee it at a single RU. It's going to be much faster and much cheaper.

B

So if you have a lot of high-concurrency reads, and you can model and partition your data such that you can fetch those single items with a partition key and an ID, that's what you should do, because it's going to give you the best performance and the best bang for the buck for your application.
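A point read can be thought of as a direct key lookup on the pair of partition key value and ID. The sketch below mirrors the shape of the `read_item(item=..., partition_key=...)` method in the azure-cosmos Python package, while the demo in the talk uses ReadItemAsync from the .NET SDK. The `InMemoryContainer` class is a hypothetical stand-in and does not model request charges.

```python
class InMemoryContainer:
    """Hypothetical stand-in keyed by (partition key value, id), like a key-value store."""

    def __init__(self, items, partition_key_property):
        self._by_key = {(i[partition_key_property], i["id"]): i for i in items}

    def read_item(self, item, partition_key):
        # Direct lookup: no query text to parse and nothing to scan.
        return self._by_key[(partition_key, item)]


def get_customer(container, customer_id):
    """Point read: here the id and the partition key value are the same."""
    return container.read_item(item=customer_id, partition_key=customer_id)


container = InMemoryContainer(
    [{"id": "c1", "firstName": "Ada"}, {"id": "c2", "firstName": "Bo"}],
    partition_key_property="id",
)
print(get_customer(container, "c1"))
```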

B

Okay, let's go back to slides here.

B

Are there questions as I go through this, Kellen, or do people just hang out and then ask at the end?

A

Yeah, if anybody wants to ask any questions, that's pretty good. We only had one question, and it had to do with the stream. He was asking if there's a way to up the size of the text in your editor there, or the quality of the stream. But yeah, I'm monitoring for questions.

B

Here, I can make this bigger. Hold on. Yeah, it's funny: I host a podcast, and I'm always telling people when I do tech checks, "You've got to make your stuff bigger." We'll make this bigger, and, whoops, where'd you go? And then let me zoom up Visual Studio here, and we'll get that to, say, 120 maybe. How does that look? That should be better. Yeah? All right, cool. Thank you.

B

Folks, sorry for running small text. I'll make it better. I'm going at 1080p right now, so I can't go any higher. Okay, so that was our query customer demo. Let's move on to products, and we'll look at the product tables next. First up is the product category table. We're going to do the same thing we did earlier, which is translating this into a JSON document, and we need to store that document in a container; we'll call that product category. Next, we need to figure out its partition key.

B

So again, we need to look at the requests we need to support in our app. Here we need to create a product category and edit a product category, and then I want to list all product categories.

B

And again, I want to point out that when you're looking at the operations you need to support, it's important to know the volume of those operations. For instance, we probably aren't creating or editing product categories very often, but, just like with customer, we need to list the product categories quite frequently within our e-commerce application.

B

But there's a problem here: there's no WHERE filter on this query. So how do I make this a single-partition query? What we're going to do is use a little trick. I'm going to create a new property and give it a constant value in our document.

B

So here I've created a property called type, and I've given it a value of category. Now I can partition this container by type and just set this value, category, in every document. Now look, I know this looks a bit weird, but it actually makes sense if you just stick with me. I'm going to iterate on this design through the rest of my talk, and you'll see how this makes a whole lot of sense and is actually a really smart thing to do.
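A small sketch of the constant-value trick: every category document carries the same `type` value, so partitioning by `type` puts them all in one logical partition, and listing them becomes a single-partition read. The IDs and names below are invented sample data.

```python
# Hypothetical category documents; only the "type" property matters here.
categories = [
    {"id": "cat-1", "type": "category", "name": "Bikes"},
    {"id": "cat-2", "type": "category", "name": "Helmets"},
]

PARTITION_KEY = "type"

# Group documents by partition key value, as a container would.
logical_partitions = {}
for doc in categories:
    logical_partitions.setdefault(doc[PARTITION_KEY], []).append(doc)

# Listing every category is now a read of one logical partition.
print({key: [d["name"] for d in docs] for key, docs in logical_partitions.items()})
```

This works because the set of categories is small; the same trick on a large, fast-growing entity would concentrate everything in one partition.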

B

Okay, let's do a demo. In this demo, I'm going to show you a query for product categories, and then we'll measure the RU charge in our app. So, back to my app here. Not query products by category ID, sorry; the thing I'm doing is listing all product categories. This is the one I'm going to run. Okay, so this should look familiar. Let me scroll this thing out of the way.

B

Yeah, that's okay. So the top of this looks just the same as the others. I've got my database, my container, and my query, and here you're going to see it's basically hard-coded: select * from c where c.type = 'category'. Then I'm going to create my query iterator here and pass in the partition key, which is, guess what, hard-coded to category, and then iterate through the results. I'm just going to print these out, and I'm going to print out the request charge.

B

So let's list all product categories here; that's option C. Here you can see a bunch of product categories. I think there are 37 in this database. And there's my query request charge: 4.04. So not bad. That's a pretty good cost there, just four RUs to run that query.

B

So in a real-world scenario, it's not like you would run this query every time. You would run this at startup and then cache it in memory or something. So you wouldn't worry too much about the cost of this query, but certainly you wouldn't want to be running it over and over and over again, because it's the same data and it doesn't ever change. All right, let me go back to slides here.

B

Okay, so next we're going to look at the product tags, and I'm going to translate that into, of course, its JSON document format and store it in its own container. We'll call that product tag. Pretty unique. Now, it turns out tags share the exact same access pattern as categories, so we're just going to apply the same strategy here. I'm going to add a new property called type, give it a value of tag, and set that in each document.

B

Next, moving on to the product table, I'm going to translate it into its JSON. Then I want to look at the relationship from product to product tags. Our product table has a many-to-many relationship with product tags.

B

I need to access the product tags in my application, meaning that when I display a product, I need to display the tags for it as well. I would also want to query for a product using its tags. Now, I could do this in one of two ways: I could store the product info in the product tags table, or I could materialize tags in my product table.

B

Given that there are far fewer tags per product than products per tag, it makes sense to materialize product tags and embed them in my product table, because that's going to be a bounded array. You're not going to have a million product tags for a product, or hopefully you won't. So remember that one-to-few relationship: it's a good candidate for embedding tags into my product table here. Next, we're going to store products in their own container; we'll call that product. And now we need to figure out a good partition key.

B

So again, we're going to look at the operations here and decide on a partition key. We, of course, need to create and edit a product, but the interesting operation here is querying for products by category, because this is likely, at least in our design, how customers are going to search for products, or at least one primary way. So we need to list all products that match a specific category, and the corresponding query, select star from c where c dot category id equals, say, category a, will return all the products for that category.

B

So then, of course, in order to make this single partition, we need to have all the products from the same category sit in the same logical partition. What that means is that for our product container, we're going to use category id as the partition key. So now, every time I run that query, it's going to be within partition. I have another problem here, though, and that is that every time I query for products by category, I get a category id and I get a bunch of tag ids. But what I really want to display is the category name and then a list of tag names for each product, as I render that out to the page.

B

So in order to achieve that, I would first need to run the query that I've defined here on the products. I would then need to issue a second query on my product categories container just to get the name of the category, and then, for each product returned, I would need to issue a query to product tags to go fetch the corresponding tag names. Now this could work, but you may be asking yourself: hey, can I just use joins? Well, frankly, you can't. As I mentioned previously, Cosmos is a non-relational data store and it doesn't support joins across containers.

B

Data that is modeled in this type of data store is optimized such that it can be served in a single request. So to our products table we're going to add additional properties, including the name of the category and also the name for each of the tags that are in there. By doing this, we make sure that we can retrieve all the data we're eventually going to need, return that to the client to render on the page, and do it in just a single request.
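To make the shape of that denormalized product document concrete, here is a minimal sketch in plain Python. The talk's demos use the .NET SDK and do not show the exact JSON, so every field name and value below (`categoryId`, `categoryName`, `tags`, the SKU, the prices) is an assumption for illustration only.

```python
# Hypothetical product document: the category name and the tag names are
# copied into the product so one read returns everything the page needs.
product = {
    "id": "product-1",
    "categoryId": "category-a",               # partition key of the product container
    "categoryName": "Components, Headsets",   # denormalized copy of the category name
    "sku": "HS-0001",
    "name": "Sample Headset",
    "price": 34.20,
    "tags": [                                  # bounded one-to-few array, embedded
        {"id": "tag-1", "name": "Tag-61"},
        {"id": "tag-2", "name": "Tag-98"},
    ],
}


def render(doc):
    """Build the display line from this one document, with no extra lookups."""
    tag_names = ", ".join(tag["name"] for tag in doc["tags"])
    return f'{doc["name"]} [{doc["categoryName"]}] tags: {tag_names}'


line = render(product)
print(line)
```

The point of the sketch is only that `render` never has to consult a category or tag container; the trade-off is that the copied names have to be kept in sync when a category or tag is renamed.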

B

Okay, so another demo here. We're going to see what this looks like when we add the category name and the tag names into our product container. So let's run this here. So here's the query I'm going to run: I'm going to query products by category id. So I get the reference to my database and container, and then here's the category name.

B

The one I'm going to query on is components, headsets. And then here's my query, so select star from c where c dot category id equals at category id. And again, you're going to create your query definition here, pass in your partition key, and then we'll loop through the results and I'll show the request charge for that. So let's run that one. That's option d here, query products by category id.

B

And here you can see I've got one, two, three products. So here's the category id and category name that came back, and then the SKU, the name, the description, the price, and then here's an array of tags where I've got the tag id and the name for each of them. So this one's got three tags, this one has two, this one's got five in there, and our request charge for that was a little less than three: 2.91. So that's a pretty efficient query. Press enter to go back, and there you go. Okay, let's go back to slides.

B

Okay, so now, when we create a new product, we need to populate these additional properties. But what if we rename a category or a tag? How do we manage that referential integrity between the containers?

B

Well, guess what: in a NoSQL database, you still have to maintain referential integrity between data, right? Data can change, and you want to be able to reflect that in other places. And it turns out Cosmos actually has a way to handle this, and it's called change feed. Now, change feed is an API that lives within every Cosmos DB container. Actually, technically it lives within every physical partition, but just kind of ignore that; as far as you or your users are concerned, you access it through the container reference. So whenever data is written to Cosmos DB, such as an insert or an update, change feed streams these to a delegate that you can listen to, and you then use that event to respond to the data that was changed.

B

Okay. So in our case we have to listen to changes that occur in our product category container, as well as our product tags container, and every time that data is updated, we'll propagate those changes to the product container accordingly. Okay, so I'm going to do a demo on this. In this demo I'm going to show you how to use change feed to do this. First, I'm going to query the product container for a specific category and we'll see how many products are in that category. I'll then update that category's name in the product category container, and then we'll show how change feed picks up that change and propagates it to every product in the category. All right, so back into here. Let me show you some code. I'm going to query products for a category here.

B

So here's my category, accessories, tires and tubes, and then I'm going to do a count in my product container, pass in that category id, and then do a group by on the category name. Okay, and that's the first thing I'm going to run. So let's do that: query products by category id. That's option d.

B

Whoops, wrong one. Okay, this is all going to run as a single function. The next thing I want to do is update the product category name, so I have a new function here, same category name, and what I'm going to do is just make a small change. I'm going to replace the word "and" with an ampersand in the category name, creating basically a new category object here. I'm just changing the name of the category right there, and everything else is going to remain the same. Same category id.

B

I'm then going to call a function called ReplaceItemAsync. This is functionally an update for Cosmos, and it takes three things. It takes the partition key, which, by the way, all of these functions take: read item, insert item, replace item and delete. Then it's going to take the id, which is the category id here, and then the item itself, right? So that's the updated product category object that I created here.

B

I want to show you change feed and what that looks like. So here's another little project I have, and it's sitting here running, just listening, and this is the code for change feed in here. Let me walk you through this; there's a couple of things you need to know. First, change feed uses this thing we call a leases container. It's basically a checkpoint for changes that have been read off the container. So when change feed runs, and it polls about every second, it goes to the leases container and says, hey, tell me the last thing I read in this container. If there's a difference between that and what's in there, it'll say, okay, here's some more changes, and it then sends them to this delegate, in a parameter that we've called input here. It sends them as a read-only collection, and we're deserializing that to a specific type. The reason it's a collection is because there may be more than one change that's occurred since the last time you read, so it just calls the delegate with this collection full of changes, and then you iterate or loop through all those changes and do something with them. So this is basically the boilerplate for change feed.

B

What I'm going to do is create a new list of task objects here, and then, as I foreach through every product category item in this collection, I'm going to grab its category id and its name, right? So this is going to be the new name when this thing comes in. And then I'm just going to add to that task list a function that's going to update the product category name.

B

So let's look at that. And then I'm going to just call WhenAll on this, right, because with change feed you can process lots of different stuff at the same time, so you basically just set it on a task and let it go. All right, so here below I've got another function. This is update product category name. Now I need to update all the products with that category name.

B

So the first thing I'm going to do is write a query, select star from c where c dot category id equals that category id I've passed in, and then I have another reference to a product container here. So let me go to the top. At the very top I've got two container references. I've got my product category container, which is what I'm listening to here, right? So GetChangeFeedProcessorBuilder is called off the container that you want to listen to. And then I have another container here.

B

This is my product container. This is the thing I'm going to go do something with. Okay, so down here, I'm going to first do a query against that container, because I want to retrieve all the products for that category, and then I'm going to loop through them all here, right? So just like I was using this foreach to loop through and print them out, I'm now going to use it to update the name of the product category for each product that gets returned in my query.

B

Right, so I'm going to count this too, so you can see how many times it does it. So here, in my returned product object, I'm going to set category name to the category name that gets passed into the function, and then I'm going to call ReplaceItemAsync, and I'm going to update every product in my container that's in this category with that new category name. Okay, all right. So let's close everything up here. Actually, let's just go back to here. Okay, so here we go.
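The flow just described (a checkpoint in a leases container, a delegate that receives a batch of changes, and an update of every product in the renamed category) can be sketched without any SDK. This is an in-memory illustration in plain Python, not the .NET change feed processor from the demo; the lists, the `lease` dictionary and all function and field names are hypothetical, and the talk's version fans the updates out as tasks where this sketch simply loops.

```python
# In-memory stand-ins for the containers (hypothetical data and field names).
product_container = [
    {"id": "p1", "categoryId": "c1", "categoryName": "Accessories, Tires and Tubes"},
    {"id": "p2", "categoryId": "c1", "categoryName": "Accessories, Tires and Tubes"},
    {"id": "p3", "categoryId": "c2", "categoryName": "Components, Headsets"},
]
category_changes = []       # append-only log of writes to the category container
lease = {"position": 0}     # checkpoint: how far into the log we have already read


def update_product_category_name(category_id, new_name):
    """Rewrite the denormalized name on every product in one category."""
    updated = 0
    for doc in product_container:
        if doc["categoryId"] == category_id:
            doc["categoryName"] = new_name
            updated += 1
    return updated


def handle_changes(changes):
    """Delegate: receives a batch, because several changes may have accumulated."""
    return sum(update_product_category_name(c["id"], c["name"]) for c in changes)


def poll_once():
    """Read everything after the checkpoint, process it, then advance the checkpoint."""
    batch = category_changes[lease["position"]:]
    if not batch:
        return 0
    updated = handle_changes(batch)
    lease["position"] += len(batch)
    return updated


# Rename a category, then let the poller pick the change up.
category_changes.append({"id": "c1", "name": "Accessories, Tires & Tubes"})
updated_count = poll_once()
second_poll = poll_once()   # nothing new since the checkpoint
print(updated_count, second_poll)
```

Advancing the checkpoint only after the batch has been handled means a crash mid-batch would replay those changes on the next poll, which is harmless here because the update is idempotent.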

B

I want you to watch what I do up on top and then watch the change feed down here below, because you're going to see it. All right, so update product category name here, that's option e. All right, so here the product category count is 11, and that's the category accessories, tires and tubes.

B

Now I'm going to press any key.

B

And I'm going to update that. Now, look down below.

B

There: one change received, and I've updated 11 products with a new category name, accessories, tires ampersand tubes. Okay, cool, huh? And then I'm going to do another count on that. You can see I've got the same 11 products in there, so everything is consistent.

B

Referential integrity maintained. Now, let's change it back, so you can see it one more time. Here you can see it picked it up, right back to accessories, tires and tubes, and that's it. Okay, let's go back to slides. Any questions, I guess, at this point? I can't tell. All right, well, let me know if there's questions, and hopefully you guys are liking this talk.

A

Yeah, it's fantastic so far. I haven't seen any questions, but if I see some I'll stop and let you know. Maybe people's minds are just blown and they can't put their hands on their keyboard. I think it's looking really streamlined and really cool, so I'm definitely.

B

I love giving this talk, and I've given it a bunch of times, because a lot of customers don't understand these concepts, and it's absolutely critical to being successful using this database, or this type of database. Understanding the concepts and the techniques for modeling data is the only way to get what you need out of this thing. I mean, the promise of a NoSQL distributed database is, functionally, you can get theoretically unlimited scale, right? It's a scale-out database, so you just keep adding servers to it. In a relational world, like SQL Server, there's a limit; there's a four-terabyte limit for an unsharded database in SQL, right? By the way, sharding is the same exact thing. We call it partitioning, but basically it's scale-out, and then you have to figure out which data goes on which shard or partition.

B

We're doing that already here, right? We try to make it as simple as possible by saying: pick a property in your data and use that to distribute your data to these different physical partitions, or shards. And because it's distributed out like that, accessing that data is always going to be fast if you're doing something like a point read, right? Because you just need to know the partition key.

B

So I know the address of that server, so I can go get it. So, you know, we're unique in the fact that we're the only database in Azure that has an SLA on latency. For a read or write of a kilobyte of data or less, using direct mode with our SQL API, we have an SLA of less than 10 milliseconds at p99. So 99 out of every 100 requests, we guarantee, are going to be less than 10 milliseconds, and the p50 of that is generally about four or five milliseconds.

B

So it's generally quite fast, and that's true whether your database is a megabyte in size or a petabyte in size. It doesn't matter, because we're scale-out, right? Your data is just spread out amongst all these different servers. You provide the partition key and the id to get your data, and we'll get it for you and return it in less than 10 milliseconds.
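Why a point read stays cheap regardless of total data size can be illustrated with a small sketch: the partition key alone determines which partition holds the document, so the read touches exactly one of them. The code below is plain Python with a simple hash-modulo scheme chosen for illustration; it is not how Cosmos DB actually maps keys to physical partitions, and the partition count, function names and sample data are all assumptions.

```python
import hashlib

PHYSICAL_PARTITIONS = 4   # hypothetical fixed count, for illustration only


def partition_for(partition_key: str) -> int:
    """Deterministically map a partition key to one partition."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % PHYSICAL_PARTITIONS


# One dictionary per partition, keyed by (partition key, id).
storage = [dict() for _ in range(PHYSICAL_PARTITIONS)]


def write(doc, partition_key):
    storage[partition_for(partition_key)][(partition_key, doc["id"])] = doc


def point_read(partition_key, doc_id):
    """Touches exactly one partition: no scan and no fan-out."""
    return storage[partition_for(partition_key)].get((partition_key, doc_id))


# Spread 1,000 documents across 50 partition key values.
for n in range(1000):
    key = f"customer-{n % 50}"
    write({"id": f"order-{n}", "customerId": key}, key)

found = point_read("customer-7", "order-7")
missing = point_read("customer-7", "order-8")   # order-8 belongs to customer-8
print(found, missing)
```

Adding more partitions changes how the data is spread, but a lookup by partition key and id still goes to a single dictionary, which is the intuition behind latency not depending on overall size.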

A

Okay. I just wanted to say I think it's really cool how you have the metrics in there as well. I mean, theoretically, I understand that it would be very quick at doing those things, but proving it is definitely a great thing to do as well.

B

Yep, absolutely. In fact, this is what we tell customers: they need to prove this out when they go from dev/test to prod. You need to put serious load on your database, because you need to know if it's actually going to scale, right? This is the trap. People that don't understand these concepts of modeling and partitioning, they go and they dev/test their thing.

B

They go into prod, and then a month, six months, a year later, they realize that they made a poor choice. And the problem is that you can't go back and change the partition key, because it actually determines where the data is physically stored. So you need to go and create a new container, pick a better partition key, hopefully, and then copy your data from one container to the other and then switch it over.

B

It can be very painful for customers, so having this knowledge up front makes you a winner in the end. So, okay, let's move on to our last set of entities here, the sales orders. We'll do the same exact thing here: we're going to create JSON documents out of these guys.

B

And of course this is also a good candidate for embedding, because it's a one-to-few relationship, right? You're not going to have a sales order with a billion items in there, unless you're crazy and you can't stop buying. Next, we store it in its own container; we'll call that sales order. And let's look at the operations, because we need to decide on a partition key here.

B

So, of course, we need to create a sales order, and then we need to list all sales orders for a customer. So that's what this query looks like here: we're going to select star from c where c dot customer id equals customer a. So if we partitioned by customer id here, this would make it a single-partition query, which is good. I think that makes sense, right? This would be quite a frequently run operation, listing all the sales orders for a customer.

B

It doesn't impact creating a sales order, so that kind of makes sense as well. So I think for this one we're going to use customer id as our partition key. Okay, before we go any further, I want to take a step back. When looking at the containers we've designed so far, it's interesting to note that we already have a container that's partitioned by the customer's id, and that's the customer container itself. So could we store, say, the customer record and the sales orders in the same container?

B

Yes, absolutely we can do that. Not only is this technically possible in a database like Cosmos DB, this is actually a best practice for this type of database. Cosmos is schema agnostic, right? It's a NoSQL database, so it does not enforce schema at the database level. So this is something that's totally supported, and it's also quite suitable when data shares similar access patterns and, of course, shares the partition key, and that's true here as well. The customer's going to log in, so you're going to access the customer data, and then, of course, the customer is going to access their sales orders or create a new sales order.

B

So in this case it makes sense to store these things in the same collection. So what we'll do is, instead of storing these in different containers, we're going to store everything in a customer container. Now I've got to make some other changes here as well. I'm going to change the partition key from id to customer id, and I'm going to add customer id to the customer document itself.

B

So now the customer document is going to have the same value for its id as it does for its customer id, the new partition key property.

B

Another thing I need to do is I need a way to distinguish between a sales order and a customer within the container, so I'm going to add what we call a discriminator property, this type property, and I'm going to give it a value of customer or sales order for each of those different entities, and this will allow me to query for them individually if I want.

B

Okay, so now what we have is a customer container where each logical partition will contain exactly one customer row and then all of that customer's sales orders. And so now, to get all the sales orders for a customer, I simply run this updated query here: select star from c where c dot customer id equals customer a and c dot type equals sales order.
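As a rough illustration of the shared container and the discriminator property, the sketch below stores customers and sales orders in one Python list and filters on both the partition key and the type. It is an in-memory stand-in, not a Cosmos DB query; the field names (`customerId`, `type`) and the values `customer` and `salesOrder` are assumptions about how the spoken names might be spelled.

```python
# One "container" holding two entity types, both keyed by customerId.
customer_container = [
    {"id": "customer-a", "customerId": "customer-a", "type": "customer", "firstName": "Ada"},
    {"id": "order-1", "customerId": "customer-a", "type": "salesOrder"},
    {"id": "order-2", "customerId": "customer-a", "type": "salesOrder"},
    {"id": "customer-b", "customerId": "customer-b", "type": "customer", "firstName": "Bo"},
    {"id": "order-3", "customerId": "customer-b", "type": "salesOrder"},
]


def query(container, customer_id, doc_type=None):
    """Filter by partition key, and optionally by the discriminator property."""
    return [
        doc for doc in container
        if doc["customerId"] == customer_id
        and (doc_type is None or doc["type"] == doc_type)
    ]


orders_a = query(customer_container, "customer-a", "salesOrder")
everything_a = query(customer_container, "customer-a")
print([doc["id"] for doc in orders_a], len(everything_a))
```

Note that the customer document carries its own id as `customerId`, which is what lets it live in the same logical partition as its orders.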

B

Okay, so let's go back into our code and we'll take a look at that.

A

We have a quick question here, if you're available to take one.

B

Sure.

A

It says: why would you use Cosmos DB for well-structured data like your schema, as opposed to using something like Azure SQL?

B

Well, for a number of reasons. Speed is one, so we're the only database that provides a latency SLA. We're also the only database that has five nines of availability, and that is because we're distributed, in that we run in every single region. So we can survive regional outages with a very low RPO, a minimum RPO of five minutes, or you can have an RPO of zero.

B

If you want to run strong consistency, however, you're going to pay for that in terms of latency on your writes, because every time you write data, it then needs to write to every region that you have your account set up in. You can also get an RTO of zero, with a higher RPO.

B

If you want to run multi-master or active-active, we call it multi-region writes, and what that does is that every region you provision is now a writable endpoint, rather than just having a single write region, where you're basically writing in a single region and then you replicate to a secondary region where you either do reads or you use it for failover for HA/DR.

B

This is good in a couple of ways. One, you get RTO zero, because what happens is, if the primary region stops responding, the SDK client will automatically redirect the request to a secondary region, and it does it within 30 seconds and, frankly, probably does it even faster than that.

B

So you have zero downtime, or RTO zero, with that. There are other reasons you would use it as well. I mean, just because your data is structured doesn't mean that this is a database you couldn't use. There's plenty of customers that have very structured data, but they have higher requirements for latency, availability, those sorts of things. So that's, I think, one of the primary reasons that we see customers using it.

B

I mean, consider that Cosmos was born in Azure, right? SQL as a database has been around since 1970, so it's 51 years old now. Now granted, back when SQL was created, the cost of a megabyte of storage was like a hundred thousand dollars or something like that, right? Now storage is cheap. I mean, I've got a terabyte of storage on my phone, and that just cost me a few hundred bucks.

B

What's really expensive now is the cost of compute; relative to storage, it's very expensive. And Cosmos, being a relatively new database, it's only been around since 2015, 2016, is a NoSQL database designed to optimize around the request, or the compute end of it, which is why a database like Cosmos is fine with duplicating data in your account, because the cost of storage is fundamentally cheap. What we want to do is design our database such that it serves data exactly as it's needed by your application, with as few changes to it as possible.

B

So you want to optimize around the request with a database like Cosmos, and you do that by taking advantage of the fact that it's schemaless, or schema agnostic. You do have schema, right? Like if I go into my models class here, here's my schema for my database: I've got a customer object, customer address, location object, password, product. This is where you enforce your schema: you do it at the application level. And this, of course, gives you flexibility, and if you want to change it, you certainly can.

B

Whereas with something like SQL Database, you're going to have downtime as you go in and alter table and add additional properties to it. So there's lots of reasons; there's no one reason. But you don't need to have unstructured data to use a database like Cosmos. Like I said, if you have insane needs around availability or latency or something else, this is also a good choice.

B

Okay, any more questions?

A

That's the only one, yeah. That's pretty good. I could see where you'd have that massive performance gain, and of course weighing the cost of the storage versus the compute, so yeah, it definitely makes sense.

B

That, and you know, the other thing too is it's a conscious decision to not enforce relational constraints, right? We want to be write-optimized, so that there's nothing blocking when you're trying to write data to the database. I mean, we physically could enforce relational constraints across physical partitions, but here's the problem: say a physical partition went down. And by the way, there are four replicas for every physical partition: every time you write data, we store it in four different replicas, on four different pieces of compute.

B

So we keep multiple copies of your data for availability, because if one of those replicas goes down, you've still got three more. And every time you do a write in Cosmos, you write into three replicas and then it copies over to the fourth one. So that's just additional bulletproofing, if you will, and that's within a region, right? So forget about replicating to another region, where we do the same thing again. Every time you write data into Cosmos, it's stored four different times, okay? And that just gives you additional availability.

B

So what I'm saying is that if you had to enforce relational constraints across physical servers and one of those servers failed, well, then your availability, all up, is gone in that sense. So it's a conscious decision to not enforce that type of thing. But, like I'm showing, you can totally maintain referential integrity. You just need to know what kind of technologies and techniques to use. So, okay, let's keep going; I'm getting close to the end here. Finally, I want to look at this last request.

B

We need to serve our top 10 customers by the number of sales orders they've got. So this request requires you to count the number of sales orders for each customer, sort those in descending order, and then return the first 10 that come back out of that. Now, even though customers and sales orders sit in the same logical partition, this isn't actually a query I could do with Cosmos DB, at least today.

B

So again, what we're going to do is denormalize, but in this case we're going to denormalize an aggregate, and we're going to store that in the customer entity within our customer container. So I've got this new property here, sales order count, and I'm going to store it in there.

B

So what we want to achieve is that every time I add a new sales order into my customer container, I'm going to increment this sales order count on my customer object. And here we can benefit from the fact that, because customers and sales orders sit in the same logical partition, we can actually use transactions. So Cosmos does support transactions, which is a very relational concept, when the data sits in the same logical partition.

B

So remember, it's a conscious decision not to do these types of relational constraints across partitions. But because it's in the same partition, we don't have the same issues with regards to the loss of availability or other kinds of weirdness that can happen when you're trying to do these kinds of distributed transactions across different pieces of compute, and so we can do this in a transaction. Now in Cosmos, you can do transactions one of two ways.

B

You can use a stored procedure, which is written in JavaScript, and I'm not a huge fan of that. But we also have a way to support this through our SDKs, in both the Java and .NET SDKs, using this feature called transactional batch. Okay, so now what I can do is write a query that looks like this: I'm going to select top 10 from c where c dot type equals customer, and I'm going to do an order by on my sales order count in descending order.

B

Okay, so in this demo let me show you how all this works. I'll close these.

B

I'm going to query my customer and sales orders; I'm going to call this function here first. All right, and I've got my customer id here. This is the same one I was querying earlier. And then notice, in my query, I'm not using the type property. So I'm going to get back the customer record and I'm going to get back all their sales orders too, which is another really kind of cool, tricky thing: I can get two different types of entities with a single query and pass that back to my client. So again, really focusing on how you model your data can have a huge impact on how efficient you can get with your applications.

B

Now before, I was deserializing these queries into specific classes. Here I can't do that. I have to use a dynamic type when I deserialize this data, because I'm getting different types of objects in here. The rest of this all looks the same, right? And then I've got a customer object here, and I'm going to create a list of sales orders. Then, as I iterate through each of the results, I'm going to inspect the type property: if it's type customer, I'll deserialize that into a customer object, and if it's type sales order, I'm going to do orders dot add and deserialize it into that. Okay, and then I'm going to print this out.
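The "inspect the type property and deserialize accordingly" step can be sketched as follows. This is plain Python over dictionaries rather than the .NET dynamic deserialization used in the demo, and the class names, field names and sample results are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    id: str
    first_name: str
    sales_order_count: int


@dataclass
class SalesOrder:
    id: str
    customer_id: str


def split_results(results):
    """Route each document to the right class based on its discriminator."""
    customer, orders = None, []
    for doc in results:
        if doc["type"] == "customer":
            customer = Customer(doc["id"], doc["firstName"], doc["salesOrderCount"])
        elif doc["type"] == "salesOrder":
            orders.append(SalesOrder(doc["id"], doc["customerId"]))
    return customer, orders


# What a single query over one logical partition might return (hypothetical data).
results = [
    {"id": "customer-a", "customerId": "customer-a", "type": "customer",
     "firstName": "Ada", "salesOrderCount": 2},
    {"id": "order-1", "customerId": "customer-a", "type": "salesOrder"},
    {"id": "order-2", "customerId": "customer-a", "type": "salesOrder"},
]

customer, orders = split_results(results)
print(customer, len(orders))
```

One design note: unknown `type` values are silently skipped here; whether to ignore or reject them is a choice the application has to make, since the database will not enforce it.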

B

Query for customer and all orders here. Okay, so I've got a product here, product here, product here, and here you can see sales order count. Has this thing been sitting in here for a while? Oh, I've got to get rid of that. Let me fix this up; I didn't clean up after myself here. That order's been sitting there a while. Okay, so let me query for the customer and all their orders. That's option g here. Okay, so I've got a product here, a product here, and then what I'm supposed to show you is a sales order count of two, right here. Okay, so there's my denormalized aggregate. So that's that query. Now I'm going to create a new order and update the customer order total. So let me show you that code here. Get this back to normal.

B

Okay, so here's my customer. Now, the first thing I need to do is fetch my customer, and I'm going to do that using the point read that I showed earlier, right? So I'm going to pass the id, which is the customer id, and then the partition key, which is also the customer id. Remember, before, we created a new property, customer id, and that's the partition key, but for the customer record it's the same as the id, right? So I don't need to look anything up here.

B

I know that id is the same as the partition key, and I'm going to deserialize the response into my customer object here, and then I'm going to increment sales order count right here. Okay, so just sales order count plus plus. Now I'm going to create a new dummy order here. So I've got a GUID; I'd normally use new GUID here. And then here's a new sales order.

B

So I need to create a sales order, and I need to specify the type, right, because I need that discriminator property, and pass in my customer id. I've got an order date and then, of course, a blank ship date, because we haven't shipped yet, and then I'll put a couple of products in here. I've got a new mountain bike frame, that's black and 38 inches, I guess, and then some racing socks as well to complete my order. And then below that I'm going to use this thing called transactional batch.

B

So this comes off the container object here, and I'm going to call CreateTransactionalBatch. And because transactions happen within a single logical partition, I have to pass in my partition key, and that here, of course, is going to be our customer id. Then I'm going to call a couple of functions. I'm going to call CreateItem and pass in the sales order we just created above, and then I'm going to call ReplaceItem, which is the update, and pass in my customer id and then the customer object, and then call ExecuteAsync.
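The all-or-nothing behaviour of a batch scoped to one logical partition can be illustrated with a small in-memory model. This is plain Python and only mimics the idea; it is not the SDK's transactional batch, and `execute_batch`, `BatchError`, the operation tuples and the field names are all hypothetical.

```python
import copy


class BatchError(Exception):
    """Raised when any operation in the batch cannot be applied."""


def execute_batch(partition, operations):
    """Apply every operation to one logical partition, or none of them."""
    working = copy.deepcopy(partition)          # stage changes on a copy
    for op, doc_id, doc in operations:
        if op == "create":
            if doc_id in working:
                raise BatchError(f"{doc_id} already exists")
            working[doc_id] = doc
        elif op == "replace":
            if doc_id not in working:
                raise BatchError(f"{doc_id} not found")
            working[doc_id] = doc
        elif op == "delete":
            if doc_id not in working:
                raise BatchError(f"{doc_id} not found")
            del working[doc_id]
        else:
            raise BatchError(f"unknown operation {op}")
    partition.clear()                           # commit only after every step succeeded
    partition.update(working)


# One logical partition: a customer and their existing orders (hypothetical data).
partition = {
    "customer-a": {"id": "customer-a", "customerId": "customer-a",
                   "type": "customer", "salesOrderCount": 2},
    "order-1": {"id": "order-1", "customerId": "customer-a", "type": "salesOrder"},
    "order-2": {"id": "order-2", "customerId": "customer-a", "type": "salesOrder"},
}

# Create the order and bump the count together.
customer = dict(partition["customer-a"])
customer["salesOrderCount"] += 1
new_order = {"id": "order-3", "customerId": "customer-a", "type": "salesOrder"}
execute_batch(partition, [("create", "order-3", new_order),
                          ("replace", "customer-a", customer)])

# A batch whose second step fails leaves the partition untouched.
stale = dict(partition["customer-a"])
stale["salesOrderCount"] += 1
try:
    execute_batch(partition, [("replace", "customer-a", stale),
                              ("create", "order-3", new_order)])
    failed = False
except BatchError:
    failed = True
print(partition["customer-a"]["salesOrderCount"], failed)
```

Deleting an order would follow the same pattern with a `delete` operation paired with a `replace` that decrements the count, so the aggregate and the orders can never drift apart.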

B

Okay. So here we're going to create a new order and update the order total; that's option h here. And all successful. Now I'm going to query for my customer and all their orders, so option g, and here's my new sales order, made just now, with my new HL mountain bike frame, that's black, and then a pair of racing socks. And if I scroll up, here you go: sales order count is three. Okay, so awesome sauce. And just like creating orders, you can do that in a transaction.

B

You can also delete an order and do that in a transaction as well. So here I've got my customer id and order id. I'm going to call ReadItemAsync to get my customer object, and then I'm going to decrement sales order count by one. Then I'll call transactional batch again, and this time I'm going to call DeleteItem and just pass in the order id, and then ReplaceItem on the customer object again, and then it'll update the customer. So I'm going to go ahead and delete an order here, that same order. Now, if I go back and query for my customer and all their orders.

B

You saw this before; I did this because I wasn't ready. And then here you can see that new order is gone and my sales order count is now updated to two. Okay, so I showed you a whole bunch of stuff here, and the reason is because I needed to do this so I could run my top 10 customer query.

B

So here's my function for that, and here's my query: SELECT TOP 10 first name, last name, sales order count from my container here, customer, WHERE c.type equals customer, right, because I have to make sure I'm not pulling sales orders in there, and then do an ORDER BY on sales order count in descending order. And then I'm just going to print all that out, and we'll run it, and this will take a second to run.

B

And here you go, so we've got Dalton, Mason, Henry, Samantha. All ten of these guys are going to get a special discount or, I don't know, a gift card, and here you can see our request charge of 13.08 RUs.
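
A sketch of what that query might look like when written out, with an in-memory Python equivalent so the filtering and ordering can be checked. The property names (`firstName`, `lastName`, `salesOrderCount`, `type`) and the sample documents are assumptions for illustration; the talk only names the fields in speech.

```python
# The query as described in the talk (property names are assumed).
TOP_CUSTOMERS_QUERY = """
SELECT TOP 10 c.firstName, c.lastName, c.salesOrderCount
FROM c
WHERE c.type = 'customer'
ORDER BY c.salesOrderCount DESC
"""


def top_customers(items, limit=10):
    """In-memory equivalent of TOP_CUSTOMERS_QUERY over a list of documents."""
    # The type filter keeps sales orders, which share the container, out of the result.
    customers = [doc for doc in items if doc["type"] == "customer"]
    customers.sort(key=lambda doc: doc["salesOrderCount"], reverse=True)
    return [
        {key: doc[key] for key in ("firstName", "lastName", "salesOrderCount")}
        for doc in customers[:limit]
    ]


# Hypothetical sample data: two customers and one sales order in the same container.
items = [
    {"type": "customer", "firstName": "Dalton", "lastName": "Example", "salesOrderCount": 5},
    {"type": "salesOrder", "customerId": "cust-1"},
    {"type": "customer", "firstName": "Samantha", "lastName": "Example", "salesOrderCount": 7},
]
result = top_customers(items)
```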

B

Okay. So let's go back to slides here.

B

So I don't know if you noticed, but that last query was actually a cross-partition query, right? Because the partition key of our customer container is customer ID, and here we were running a query that went across customers.

B

So I know before I said you should try to avoid these, but in reality it's kind of hard to avoid them in every situation, and in scenarios where you're just going to run it, say, once a month or something like that, it's okay. It's okay for operations that aren't run very frequently. It's when they're high-concurrency queries that you want to avoid that type of thing. And of course, it's also okay if they're in small containers.
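
A toy illustration of why the cost differs, assuming a container spread over a fixed number of physical partitions. The hash function and partition count below are made up for the sketch and are not how the service actually places data; the idea is only that a query carrying the partition key touches one partition, while a query without it fans out to all of them.

```python
import hashlib

PHYSICAL_PARTITIONS = 4  # assumption: a small container for illustration


def partition_for(customer_id):
    """Map a partition key value to a physical partition (toy hashing)."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % PHYSICAL_PARTITIONS


def partitions_touched(partition_key=None):
    """Return the set of physical partitions a query has to visit."""
    if partition_key is not None:
        # In-partition query: routed straight to the partition that owns the key.
        return {partition_for(partition_key)}
    # Cross-partition query: no key to route on, so every partition is asked.
    return set(range(PHYSICAL_PARTITIONS))


single = partitions_touched("cust-1")
fan_out = partitions_touched()
```

With four partitions the fan-out is cheap; with thousands, the same query has to visit thousands of partitions on every run, which is where frequency and concurrency start to matter.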

B

At some point, a cross-partition query that was once okay may get to a point where it's too expensive to run. This can happen when you get into containers that are maybe thousands of partitions in size. In that case, what you want to do is denormalize that aggregate. Basically you're creating a materialized view of that data, and you're going to store that in another container.

B

So what would happen is you don't do that in a transaction there. You're basically going to create an upsert statement, and it's going to upsert the value for that sales order count into another collection, or container, excuse me, and then you use that container to serve the query. The materialized view pattern is another very frequently used type of trick, and it's just like the name explains, right? Anybody who uses SQL and knows what materialized views are will understand this. In this case we're basically materializing an aggregate and then using that to serve queries, and this is common in workloads where you have, say, high write throughput and high read throughput. And another thing that's not uncommon is that customers are often kind of frozen by the fact that they may need to optimize around different partition key values. It's quite common that data gets written into one container, and then they use change feed to write it into another container with a completely different partition key, so that each of the containers serves different purposes, or maybe serves different queries.

B

It takes a little bit of math and figuring to know whether you need to do that. But that's why, again, understanding the operations that you're running and the concurrency of those operations is so important to being able to do a good design, a scalable design, for a database like Cosmos DB.
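
The materialized view idea can be sketched as follows, again as an in-memory Python model rather than SDK code. The list standing in for the change feed, the `apply_change` helper, and the choice of a single partition key value for the view are all assumptions made for illustration; the talk only says the count is upserted into another container that then serves the query.

```python
VIEW_PARTITION_KEY = "customerOrderCounts"  # assumption: one partition holds the whole view


def apply_change(view, doc):
    """Upsert one changed customer document into the view container."""
    if doc.get("type") != "customer":
        return  # sales orders do not belong in this view
    view[doc["id"]] = {
        "id": doc["id"],
        "pk": VIEW_PARTITION_KEY,
        "firstName": doc["firstName"],
        "lastName": doc["lastName"],
        "salesOrderCount": doc["salesOrderCount"],
    }


def top_from_view(view, limit=10):
    """Serve the top-customers query from the view instead of fanning out."""
    ranked = sorted(view.values(), key=lambda d: d["salesOrderCount"], reverse=True)
    return ranked[:limit]


# Stand-in for the change feed: documents arrive in the order they were written.
change_feed = [
    {"id": "cust-1", "type": "customer", "firstName": "Dalton", "lastName": "Example", "salesOrderCount": 2},
    {"id": "order-3", "type": "salesOrder", "customerId": "cust-1"},
    {"id": "cust-2", "type": "customer", "firstName": "Mason", "lastName": "Example", "salesOrderCount": 4},
    {"id": "cust-1", "type": "customer", "firstName": "Dalton", "lastName": "Example", "salesOrderCount": 5},
]

view = {}
for changed in change_feed:
    apply_change(view, changed)
top = top_from_view(view)
```

Because the write is an upsert keyed on the customer's id, replaying a change is harmless, and the second version of `cust-1` simply overwrites the first.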

B

So, okay, our final design. Here we've got a customer container with customers and sales orders. I've got a product container with my products, a product tag container with tags, and a category container with categories. There's one more optimization we can do here, and that is, I can create another container, we'll call it product meta, and store both the tags and the categories in it, because they both use type as a partition key with their own unique values. So this is another way you can optimize data like this, just store it in a single container, because unless you're using shared throughput, every container needs its own throughput, and if I store them in the same container, they can share it. This is actually a best practice for master data, or reference data: just store it all in the same container and then use the type of data as its partition key, because you're never going to run into 20 gigs of product categories or 20 gigs of product tags. And if you do, you can use a composite key to get a higher level of cardinality, so that you can easily keep that data within your 20 GB. So here's our final design, our three containers from the nine original relational tables that we had from AdventureWorks, and this database will scale and perform to essentially petabytes in size, unlimited size, within our application.
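
A small sketch of the reference-data layout described above: tags and categories living in one container, partitioned by their `type` value. The `product_meta` dictionary, helper names, and sample documents are hypothetical; this models the shape of the data, not the SDK.

```python
from collections import defaultdict

# One container for all reference data: {partition key (type): {id: document}}.
product_meta = defaultdict(dict)


def upsert(doc):
    """Store a reference document in the partition named by its type."""
    product_meta[doc["type"]][doc["id"]] = doc


def read_partition(doc_type):
    """Read every document of one type; this stays within a single partition."""
    return list(product_meta.get(doc_type, {}).values())


# Hypothetical sample documents.
upsert({"id": "cat-1", "type": "category", "name": "Bikes"})
upsert({"id": "cat-2", "type": "category", "name": "Clothing"})
upsert({"id": "tag-1", "type": "tag", "name": "racing"})

categories = read_partition("category")
tags = read_partition("tag")
```

Listing all categories or all tags is then an in-partition read, and both kinds of data are served from one container's throughput.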

B

So that's it for my talk. Everything you saw here is available on GitHub, so you go to github.com/AzureCosmosDB/CosmicWorks. All the code I showed is there. Actually, all the data is there as well. I am working on writing a data loader. It's actually written, I just haven't merged it into this repo, but you can take a look at this and look at all the code in there and look at the data and get it set up.

B

I've got an ARM template in there that'll set it up. You can also run a CLI bash script I wrote in there that'll also set it up for you. There's a good article, Practical Cosmos DB. This is basically showing modeling and partitioning using, say, a WordPress or a blog platform. Lots of great videos on YouTube, and then, of course, we've got a little microsite called gotcosmos that I run, and on there we've got a weekly podcast that I host every Thursday at 1 p.m. Pacific. I think you guys are Central, so what would that be? Like 2 p.m., I guess, for you guys in Austin? 3 p.m.? Oh, is it? Are you guys two hours? Okay, two hours, yeah. Okay, so here's gotcosmos.com/tv. You can come see me every week.

B

If I didn't burn you out watching me now. Lots of great stuff: next week we're going to recap all of our Build announcements. We've got a lot of cool stuff coming for Build, we always do, and then just more great episodes coming up. You can check that out as well. We just ran our own first Cosmos DB Conf. This is kind of like a .NET Conf, but at a much, much lower level than that. .NET Conf is like three, four days or something like that, non-stop. We can't quite do that yet, but lots of sessions in here.

B

There's live sessions you can go and see in here, a keynote. There's some really great content here. Lots of good on-demand sessions in here as well, so come and check that out. Here's the repo with all the stuff in here, and you can come and check that out as well. It's even got a Deploy to Azure thing in there as well. So, any questions from anyone?

B

Let's see, I see why, uh okay, that you asked that one earlier from.

A

Yeah, okay, the only question I might have is, are there security benefits? Because I know it's not strictly relational, but you demonstrated that you could, you know, use it relationally.

B

Security benefits. So let me talk about security. The way we do this is we use these master keys in our SDK to access Cosmos DB. So you secure the database using a pair of master keys; there's a read-write and a read-only. We just recently announced support for Azure AD and RBAC.

B

So now what you can do is authenticate using the service principal ID from an AAD token and then pass that when you create a new Cosmos client using our SDK, and you can get all the RBAC goodness out of that as well. So I can actually now go and create AAD groups and then give them permissions using the new RBAC model, like read from this container, write to this container, I can query this, do that. And then that's all going to be managed basically like you would a SQL database, in terms of having that AAD support where you basically single sign-on into your app using your AAD identity and then get all the RBAC goodness out of there as well.

B

We also, of course, and I actually just did an episode on my podcast today on network security, have support for an IP firewall, so you can secure the database and limit it to a handful or a CIDR range of IP addresses. We have support for service endpoints, so you can restrict access to people on a VNet or a subnet, and we also support private endpoints, which essentially remove the public IP address, right, because Cosmos is on the public internet. The endpoint is out there; the URI for your endpoint, you know, myaccount.documents.azure.com, resolves to a public IP address. So you can use private endpoints to remove that, and then you basically get a 10-dot address and you connect to that, or it resolves to that, right? And that's also using private DNS, because you need a way to resolve that FQDN so that you can access that 10-dot address. So lots of options for security, of course, and authentication.
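
To make the firewall rule concrete, here is a small Python sketch of what limiting access to "a handful or a CIDR range of IP addresses" means, using only the standard library. The ranges and addresses are documentation examples chosen for illustration, and `is_allowed` is a hypothetical helper; the real check happens inside the service, not in your code.

```python
import ipaddress

# Assumed allow-list: one CIDR range and one single address (documentation ranges).
ALLOWED_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.17/32"),
]


def is_allowed(client_ip):
    """Return True if the client address falls inside any allowed range."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in ALLOWED_RANGES)


inside_range = is_allowed("203.0.113.45")
exact_match = is_allowed("198.51.100.17")
outside = is_allowed("198.51.100.18")

# A "10-dot" private endpoint address lives in private address space.
private_endpoint_is_private = ipaddress.ip_address("10.0.0.5").is_private
```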

B

So we've got your RBAC and your authn covered, from the network all the way down to the app. Oh, what else? Customer-managed keys. So I guess we're in the security realm: you can encrypt your data as well. I mean, we already encrypt it using a Microsoft-managed key. If you want to encrypt it again, you can go and create a key stored in Key Vault, pass the resource URI for the key in Key Vault to us, and put it in your database account, and then we will encrypt your data again. And then at Build, we've got some cool announcements coming up around security, but no spoilers. Sorry, no spoilers here, so I gotta keep my job.

A

Fair enough. Yeah, I'll be excited to see what's coming up.

A

B

Yeah, definitely come check us out. I mean, Build for us is the big event, because, you know, Cosmos is a developer's database, right? You'll find SQL is, you know, big in the DBA community, because you need DBAs to run it. Cosmos is fully managed, and the only way to access it really is, I mean, you have a portal, but you don't really use the portal to do much of anything other than getting things started. Really, all the work happens when you're writing code using our SDK to read and write data in and out of your database, right? So it's very developer focused, very developer friendly, so we're kind of a different database, if you will, from the others.

A

Excellent, yep. Is there anything else you wanted to add, or is that...

B

That's it really for me, unless you guys have got any more questions.

A

No, I guess we'll call it a show. Thank you very much, Mark, for coming out. You really highlighted some great strengths and some great reasons why you would make the informed decision to go with Cosmos DB, and really why you would consider it versus some alternatives, so.

B

Well, what I want you to walk away with is, you know, use the right database for the job. You can use Cosmos; it doesn't make sense in every scenario, although you can use it for a lot of different workloads. But, you know, be smart about how you use a database. How do you model, how do you design for it? Why is it built the way it is? There's a reason why Cosmos is built like this, and it's to give you that basically unlimited scalability and just insanely fast performance, but you can't get the promise unless you understand the concepts and design for it, and that's what I hope you all got today.

A

That's definitely the takeaway I get, and I'm pretty new to NoSQL. So that's, yeah, mission.

B

Cool. Well, follow us on Twitter at @AzureCosmosDB. I'm on Twitter all day as well. In fact, I monitor our Twitter account, so if you've got questions you can ask us on Twitter. I'm also quite frequently on Stack Overflow, pretty much answering questions there, and on Microsoft Q&A. So, you know, my job is to try and help developers be successful on Cosmos. That's kind of my, you know, my other job too.

A

Perfect. Well, yeah, it was a great demo, a great presentation, so we appreciate you gracing us with your knowledge and your presence here with the San Antonio group, and I'm sure with the .NET group as well. So great. Okay, we'll wrap it up. Thank you very much, everybody, for coming out. We appreciate it, and please join us next time.

A