Tuning and Managing Scalability with Cosmos DB

In this chapter, we will analyze the many aspects involved in designing and maintaining scalable architectures with Cosmos DB. We will use our sample application to understand how the most important pieces work, and we will also work with other examples to understand complex topics related to scalability.

Understanding request units and how they affect billing

In Chapter 3, Writing and Running Queries on NoSQL Document Databases, we learned how to execute SQL queries against a document collection with the SQL API by using different tools. We also learned how to check the request units consumed by each query. In Chapter 4, Building an Application with C#, Cosmos DB, a NoSQL Document Database, and the SQL API, we performed operations and composed queries in strings and we executed them against a document collection with the Cosmos DB .NET Core SDK. In Chapter 5, Working with POCOs, LINQ, and a NoSQL Document Database, we performed operations with POCOs and we composed queries with LINQ and POCOs. However, in these last two chapters, we didn't check request unit consumption.

Every operation and query performed on Cosmos DB consumes request units. We can consider request units as the main currency for Cosmos DB services.

Now we will learn what request units are, how they affect billing, and how we can measure the request unit consumption for different operations and queries with the Cosmos DB .NET Core SDK.

If you want to run a NoSQL database server on your development computer, the service will use your existing hardware resources; that is, some memory, some CPU cores, and some storage. Depending on the operations performed, the database server will consume an amount of memory, use a percentage of the available CPU cores, and perform read and write operations against the storage resources.

In order to perform operations and run queries in Cosmos DB, you just need request units. Request units combine all the resources required to execute any operation or query into a single currency. You don't need to worry about how much memory, CPU time, or storage throughput a particular query might require. You just need to pay attention to a cost expressed in request units that combine all these factors.

Request units are also known as RU in the Azure portal and many Cosmos DB tools.

Request units are the currency used for purchasing resources in Cosmos DB, and we use them as the unit to provision resources for our containers or databases. In our previous examples, we provisioned resources for a specific collection; that is, for a container of a document database.

Cosmos DB provisions request units on a per-second basis, and therefore, we will see instances of RU/s, which means request units per second, in the Azure portal and many of the Cosmos DB tools. Azure bills the provisioned request units on a per-hour basis. Thus, whenever we configure the request units per second we want to provision for a container in the Azure portal, the portal displays the estimated hourly and daily spend in the currency that Azure has configured.
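
For example, using the sample figures that we will see later in this chapter (which depend on your region and the pricing at the time you provision), 1,000 RU/s at 0.080 USD per hour works out to 0.080 * 24 = 1.92 USD per day, and doubling the throughput to 2,000 RU/s doubles the estimates to 0.16 USD per hour and 3.84 USD per day.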

Dynamically adjusting throughput for a collection with the Azure portal

Now we will use the Azure portal to check the settings we used to create the Competitions1 document collection with the C# code we wrote in Chapter 4, Building an Application with C#, Cosmos DB, a NoSQL Document Database, and the SQL API.

In the Azure portal, make sure you are on the page for the Cosmos DB account. Click on the Data Explorer option, click on the database name you used in the configuration for the examples in Chapter 4, Building an Application with C#, Cosmos DB, a NoSQL Document Database, and the SQL API, and Chapter 5, Working with POCOs, LINQ, and a NoSQL Document Database (Competition), to expand the collections for the database, and then click on the collection name you used for the examples (Competitions1). Click on Scale & Settings and the portal will display a new tab that allows us to check the existing provisioned throughput for the collection in the Scale section. The following screenshot shows the storage capacity set to Unlimited, the throughput set to 1000 RU/s, and the hourly and daily estimated spend expressed in USD for the provisioned storage and throughput we have defined.

Note that the values might be different based on your account settings. The CreateCollectionIfNotExistsAsync method created the document collection with the following settings:
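
In essence, it configured a partitioned collection with a provisioned throughput of 1,000 RU/s. The following minimal sketch shows equivalent code; the partition key path shown here is an assumption for illustration purposes and must match the one you used in your Chapter 4 configuration:

var collection = new DocumentCollection
{
    Id = collectionId
};
// Assumption: replace with the partition key path you used in Chapter 4
collection.PartitionKey.Paths.Add("/location/zipCode");
await client.CreateDocumentCollectionIfNotExistsAsync(
    UriFactory.CreateDatabaseUri(databaseId),
    collection,
    new RequestOptions { OfferThroughput = 1000 });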

Replace 1000 with 2000 in the Throughput (400 - unlimited RU/s) textbox. Note that the estimated hourly and daily spend increase proportionally. In the sample configuration, which might differ from yours, the hourly estimated spend increases from 0.080 to 0.16 USD. The following screenshot shows how the portal previews the estimated spend for a new throughput value:

Note that we need to specify values equal to or greater than 1,000 RU/s to be able to work with unlimited storage capacity because we are working with a partitioned document collection. Keep in mind that any partitioned container in Cosmos DB needs to have a minimum throughput of 1,000 RU/s to enable unlimited storage.

Once we have calculated the RU/s throughput that we need for the tasks performed by our application and for our scalability needs, we might have specific days (perhaps weekends or other specific times) when we expect more operations than usual. For example, let's consider our eSports competition sample application. On some days, the number of competitions might be higher than on others. In fact, Fridays and Saturdays have a higher number of eSports competitions than other days of the week.

As we have previously learned, an excess in the provisioned throughput means we will be billed for throughput that we don't need. Hence, it's useful to know that it is extremely easy to make changes to the provisioned throughput with the Azure portal, the other tools we have analyzed, and the SDKs. In addition, keep in mind that we can automate this task through scripting, as happens with many Azure maintenance tasks. However, the automation options are outside of the scope of this book.

Now we will provision more throughput for the Competitions1 collection. Just click the Save button on the toolbar at the top of the Scale & Settings tab. After a few seconds, you will see a message similar to the following one in the Notifications tab at the bottom of the page:

Successfully updated offer for resource dbs/prUNAA==/colls/prUNAJOG6yw=/

Once the eSports competitions application's peak demand is finished, we can follow the same steps to go back to the previously provisioned throughput of 1,000 RU/s to avoid being excessively billed for throughput we don't need.

Working with client-side throughput management

One of the nice things about Cosmos DB is that every operation we can perform against the Cosmos DB service is also exposed by the SDKs. Now we will use the .NET Core 2 console application that we coded in the previous chapter as a baseline, and we will make changes to our existing application to adjust the provisioned throughput for the existing Competitions1 collection with C# code that uses the Cosmos DB .NET Core SDK.

Specifically, we will increase the provisioned throughput to 2,000 RU/s before we start performing the different operations and running the queries, and then, we will reduce the provisioned throughput to 1,000 RU/s before the application finishes its execution. In this way, we will understand how to dynamically scale the provisioned throughput.

Note that each different available Cosmos DB SDK ends up calling a REST API to interact with the Cosmos DB service. However, we are always working with C# and the Cosmos DB .NET Core SDK in this book.

The code file for the solution with the new sample is included in the learning_cosmos_db_06_01 folder in the SampleApp3/SampleApp1.sln file.

Now open the Program.cs file and add the following UpdateOfferForCollectionAsync static method to the existing Program class. The code file for the sample is included in the learning_cosmos_db_06_01 folder in the SampleApp3/SampleApp1/Program.cs file:

private static async Task<Offer> UpdateOfferForCollectionAsync(string collectionSelfLink, int newOfferThroughput)
{
    // Create an asynchronous query to retrieve the current offer for the document collection
    // Notice that the current version of the API only allows using the SelfLink for the collection
    // to retrieve its associated offer
    Offer existingOffer = null;
    var offerQuery = client.CreateOfferQuery()
        .Where(o => o.ResourceLink == collectionSelfLink)
        .AsDocumentQuery();
    while (offerQuery.HasMoreResults)
    {
        foreach (var offer in await offerQuery.ExecuteNextAsync<Offer>())
        {
            existingOffer = offer;
        }
    }
    if (existingOffer == null)
    {
        throw new Exception("I couldn't retrieve the offer for the collection.");
    }
    // Set the desired throughput to newOfferThroughput RU/s for the new offer built based on the current offer
    var newOffer = new OfferV2(existingOffer, newOfferThroughput);
    var replaceOfferResponse = await client.ReplaceOfferAsync(newOffer);
    return replaceOfferResponse.Resource;
}

In a nutshell, the code uses an asynchronous query to retrieve the Microsoft.Azure.Documents.Offer instance that represents the offer bound to the document collection. Then, the code creates a new instance of the Microsoft.Azure.Documents.OfferV2 class, which is a subclass of Offer, and sets its desired provisioned throughput to the RU/s received in the newOfferThroughput argument. Finally, the code replaces the offer related to the document collection with the new OfferV2 instance in an asynchronous operation and returns the persisted instance. This way, we have a new method that allows us to set the desired throughput for our partitioned document collection.

The method receives the value of the SelfLink property for a Collection instance in the collectionSelfLink parameter, because the version of the API that we are using doesn't allow us to use a URI to retrieve the offer related to a collection. Hence, we must work with the old-fashioned self link, which was more common for other operations in the first versions of the API.

The first line declares the existingOffer variable as an Offer instance. Then, the code creates a LINQ query with a call to the client.CreateOfferQuery method without arguments chained to a Where query method, which specifies that we want to retrieve the Offer instance whose ResourceLink property matches the collectionSelfLink value received as an argument. This way, we indicate that we want to retrieve the offer that is related to the collection.

The client.CreateOfferQuery method returns a System.Linq.IOrderedQueryable<Offer> object, which the code converts to Microsoft.Azure.Documents.Linq.IDocumentQuery<Offer> by chaining a call to the AsDocumentQuery method after the chained LINQ query method. The IDocumentQuery<Offer> object supports pagination and asynchronous execution and it is saved in the offerQuery variable.

At this point, the query hasn't been executed. The usage of the AsDocumentQuery method enables the code to access the HasMoreResults bool property in a while loop that makes calls to the asynchronous ExecuteNextAsync<Offer> method to retrieve more results as long as they are available. In this case, we will only retrieve one Offer instance that matches the specified criteria. If there is a match, the code in the foreach loop will assign the retrieved Offer instance to the existingOffer variable.

Note that the previous method doesn't chain a FirstOrDefault method to the query, because that method would execute the query synchronously. In our examples, we always execute queries asynchronously, and we won't use synchronous methods to run them.

In our case, we will always be able to retrieve an offer related to the collection, and therefore, existingOffer will always have a value after the while loop finishes. The method then runs the previously explained code to create the new OfferV2 instance, set the new desired throughput, and persist it with an asynchronous operation.

Now replace the code for the existing CreateAndQueryCompetitionsWithLinqAsync static method in the Program class with the following lines. The added lines are the ones that call the new UpdateOfferForCollectionAsync method at the beginning and at the end of the code. The code file for the sample is included in the learning_cosmos_db_06_01 folder in the SampleApp3/SampleApp1/Program.cs file:

private static async Task CreateAndQueryCompetitionsWithLinqAsync()
{
    var database = await RetrieveOrCreateDatabaseAsync();
    Console.WriteLine(
        $"The database {databaseId} is available for operations with the following AltLink: {database.AltLink}");
    var collection = await CreateCollectionIfNotExistsAsync();
    Console.WriteLine(
        $"The collection {collectionId} is available for operations with the following AltLink: {collection.AltLink}");
    // Increase the provisioned throughput for the collection to 2000 RU/s
    var offer1 = await UpdateOfferForCollectionAsync(collection.SelfLink, 2000);
    Console.WriteLine(
        $"The collection {collectionId} has been re-configured with a provisioned throughput of 2000 RU/s");
    collectionUri = UriFactory.CreateDocumentCollectionUri(databaseId, collectionId);
    var competition3 = await GetCompetitionByTitleWithLinq("League of legends - San Diego 2018");
    if (competition3 == null)
    {
        competition3 = await InsertCompetition3();
    }
    bool isCompetition4Inserted = await DoesCompetitionWithTitleExistWithLinq("League of legends - San Diego 2019");
    Competition competition4;
    if (isCompetition4Inserted)
    {
        competition4 = await GetCompetitionByTitleWithLinq("League of legends - San Diego 2019");
        Console.WriteLine(
            $"The {competition4.Status} competition with the following title exists: {competition4.Title}");
    }
    else
    {
        competition4 = await InsertCompetition4();
    }
    var updatedCompetition4 = await UpdateScheduledCompetitionWithPlatforms("4",
        "92075",
        DateTime.UtcNow.AddDays(300),
        10,
        new List<GamingPlatform>
        {
            GamingPlatform.PC, GamingPlatform.XBox
        });
    await ListScheduledCompetitionsWithLinq();
    await ListFinishedCompetitionsFirstWinner(GamingPlatform.PS4, "90210");
    await ListFinishedCompetitionsFirstWinner(GamingPlatform.Switch, "92075");
    // Decrease the provisioned throughput for the collection to 1000 RU/s
    var offer2 = await UpdateOfferForCollectionAsync(collection.SelfLink, 1000);
    Console.WriteLine(
        $"The collection {collectionId} has been re-configured with a provisioned throughput of 1000 RU/s");
}

The new lines added at the beginning of the code call the previously defined UpdateOfferForCollectionAsync asynchronous method with the collection.SelfLink value and 2000 as the values for the collectionSelfLink and newOfferThroughput arguments. This way, the call provides the value of the SelfLink property for the previously retrieved collection and increases the provisioned throughput for the collection to 2,000 RU/s.

The new lines added at the end of the code call the previously defined UpdateOfferForCollectionAsync asynchronous method with the collection.SelfLink value and 1000 as the values for the collectionSelfLink and newOfferThroughput arguments. This way, the code decreases the provisioned throughput for the collection to 1,000 RU/s.

We can easily make changes to the provisioned throughput based on our specific needs with a few lines of code and a generic method that does the job, such as the UpdateOfferForCollectionAsync static method.

Understanding rate limiting and throttling

If we hit the provisioned RU/s rate limit with any operation or query, the Cosmos DB service won't execute the operation, and the API will throw a DocumentClientException exception with the StatusCode property set to 429.

This HTTP status code means that the request made to Azure Cosmos DB has exceeded the provisioned throughput and it couldn't be executed.

In some cases, the only way to execute the request would be to increase the provisioned throughput. For example, if we have a single operation that requires more than 1,000 RU/s but we have provisioned only 1,000 RU/s, there will be no way to execute the operation unless we increase the provisioned throughput. No matter the number of times we retry, the operation will always fail. Of course, we should avoid operations that require a huge amount of RU/s.

If we have two operations that require 501 RUs each and we have provisioned only 1,000 RU/s, both operations cannot be executed within the same second, because together they would consume 501 * 2 = 1,002 RUs. One of them will fail and throw the previously explained exception. However, in this case, it would be possible to retry the failed operation in the next second, because the next second will have 1,000 RU/s available again and we only need 501 RUs. Hence, if no other operation that requires more than 499 RUs is executed concurrently, the operation that initially failed will succeed on the next retry.
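
The following minimal sketch shows how we could handle this situation ourselves by catching the exception and honoring the RetryAfter interval that the service suggests; the document creation call is just an illustrative example:

try
{
    await client.CreateDocumentAsync(collectionUri, competition);
}
catch (DocumentClientException ex) when ((int?)ex.StatusCode == 429)
{
    // Wait for the interval suggested by the service before retrying
    await Task.Delay(ex.RetryAfter);
    // Retry once; a real application would loop with a maximum number of attempts
    await client.CreateDocumentAsync(collectionUri, competition);
}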

There are many settings in the SDK that allow us to configure the number of retries that the SDK automatically performs for any operation that hits the provisioned RU/s rate limit before the DocumentClientException exception is thrown. Thus, we have to consider the behavior of the Cosmos DB service and the configuration of the SDK. There is no single configuration that fits all scenarios. It is necessary to decide which is the most convenient way of handling rate limiting and retries based on our specific application needs.
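
For example, the following minimal sketch configures the automatic retries when creating the DocumentClient instance; the values shown match the SDK defaults and are only illustrative:

var connectionPolicy = new ConnectionPolicy();
// Maximum number of automatic retries for requests that receive an HTTP 429 status code
connectionPolicy.RetryOptions.MaxRetryAttemptsOnThrottledRequests = 9;
// Maximum cumulative time, in seconds, that the SDK will wait across all the retries
connectionPolicy.RetryOptions.MaxRetryWaitTimeInSeconds = 30;
var client = new DocumentClient(new Uri(endpointUrl), authorizationKey, connectionPolicy);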

We can easily use the Azure portal to check the requests that exceeded the provisioned throughput and received an HTTP 429 status code from the Azure Cosmos DB service. Follow the next steps to check the metrics for the collection we are using in our sample application:

  1. In the Azure portal, make sure you are on the page for the Cosmos DB account.
  2. Click on the Metrics option.
  3. Select the desired database from the Database(s) dropdown (Competition).
  4. Select the desired collection from the Collection(s) dropdown (Competitions1).
  5. Click 1 hour at the right-hand side of the dropdowns to summarize the overview of the metrics for the last hour.
  6. The Overview tab will display the following information about the selected database and collection for the last hour:
    1. A map with the region
    2. The average throughput in RU/s
    3. The average number of requests per second
    4. The data size
    5. The index size
    6. A chart with the number of requests and their status codes
    7. A chart with the number of requests that exceeded capacity; that is, those requests that generated an HTTP 429 status code
    8. A chart with the data and index storage available and consumed

By inspecting the metrics, we can easily check whether we require more throughput or specific optimizations to reduce the RU/s consumed. The following screenshot shows an example of the metrics after running our sample application. Note that there are no requests that exceeded the provisioned throughput:

Tracking consumed request units with client-side code

There are many variables that determine the way Cosmos DB calculates the request unit charge for each operation. The first variable is the amount of data an operation or query reads or writes: 1 RU is the effort it takes to read 1 KB of data from Cosmos DB by directly referencing the document with its URI or self link. Writes are more expensive than reads because they require more resources. The number of properties and the amount of data you have in a document affect the cost as well. Data consistency levels, such as strong or bounded staleness, can cause more reads. Indexes affect your query costs. Finally, your query patterns and the stored procedures and triggers you define will add more request units as query execution becomes more complicated. These are all factors that can be optimized, fine-tuned, and monitored.

Now we will establish a breakpoint in one of the methods that reads and updates a scheduled competition with platforms to inspect the properties provided by the DocumentResponse and ResourceResponse instances for the read and update operations. First, make sure you remove all the existing breakpoints.

Go to the following line within the UpdateScheduledCompetitionWithPlatforms static method: readCompetitionResponse.Document.DateTime = newDateTime;

Right-click on the line and select Breakpoint | Insert breakpoint in the context menu.

Now go to the following line within the same method: if (updatedCompetitionResponse.StatusCode == System.Net.HttpStatusCode.OK)

Right-click on the line and select Breakpoint | Insert breakpoint in the context menu.

Start debugging the application.

Inspect the value for the readCompetitionResponse variable in the Locals or Auto panel. This variable holds the DocumentResponse<Competition> instance with the result of the call to the client.ReadDocumentAsync<Competition> method. Expand readCompetitionResponse and check the different properties. The RequestCharge property provides us with the request units that have been charged for this document read operation: 1. There are many other properties that also provide valuable information about the operation with the document, as well as properties that indicate values that apply to the document collection and the database. The following screenshot shows the properties, their values, and their types:

The CollectionSizeQuota property specifies the maximum size for the collection in which we have performed the operation: 10485760. The CollectionSizeUsage property indicates the size that we are consuming from the collection. These properties provide information that is related to the collection, and therefore, we can use them to check whether we are running out of space within a collection. There are many other properties that provide quotas and actual usage for different resources.

Continue debugging the application until it hits the next breakpoint.

Inspect the value for the updatedCompetitionResponse variable in the Locals or Auto panel. This variable holds the ResourceResponse<Document> instance with the result of the call to the client.ReplaceDocumentAsync method. Expand updatedCompetitionResponse and check the different properties. In this case, the RequestCharge property provides a value of 13.26. However, remember that this value might be different in your configuration. A document replacement operation consumes more request units than a document read operation. The following screenshot shows the properties, their values, and their types:
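
Instead of inspecting these values in the debugger, we can also log them with a couple of lines; a minimal sketch that assumes the readCompetitionResponse and updatedCompetitionResponse variables from the method we are debugging:

Console.WriteLine(
    $"The document read operation was charged {readCompetitionResponse.RequestCharge} RUs");
Console.WriteLine(
    $"The document replacement operation was charged {updatedCompetitionResponse.RequestCharge} RUs");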

So far, we have been working with the ExecuteNextAsync method to retrieve each of the pages for the results of the asynchronous queries. However, in order to check the request charge for asynchronous queries, we have to make a small edit to the existing code to be able to inspect the FeedResponse instance.

Now stop debugging the application and replace the code for the existing GetCompetitionByTitleWithLinq static method in the Program class with the following lines. The added lines save the result of ExecuteNextAsync in a variable before enumerating it. The code file for the sample is included in the learning_cosmos_db_06_01 folder in the SampleApp3/SampleApp1/Program.cs file:

private static async Task<Competition> GetCompetitionByTitleWithLinq(string title)
{
    // Build a query to retrieve a Competition with a specific title
    var documentQuery = client.CreateDocumentQuery<Competition>(collectionUri,
        new FeedOptions()
        {
            EnableCrossPartitionQuery = true,
            MaxItemCount = 1,
        })
        .Where(c => c.Title == title)
        .Select(c => c)
        .AsDocumentQuery();
    while (documentQuery.HasMoreResults)
    {
        var feedResponse = await documentQuery.ExecuteNextAsync<Competition>();
        foreach (var competition in feedResponse)
        {
            Console.WriteLine(
                $"The Competition with the following title exists: {title}");
            Console.WriteLine(competition);
            return competition;
        }
    }
    // No matching document found
    return null;
}

The edited code saves the results of calling the documentQuery.ExecuteNextAsync<Competition> method in the feedResponse variable. Then, the next line uses feedResponse to get its enumerator and retrieve each Competition instance.

Go to the following line within the GetCompetitionByTitleWithLinq static method: foreach (var competition in feedResponse)

Right-click on the line and select Breakpoint | Insert breakpoint in the context menu.

Start debugging the application and continue debugging it until it hits the recently added breakpoint.

Inspect the value for the feedResponse variable in the Locals or Auto panel. This variable holds the FeedResponse<Competition> instance with the result of the call to the documentQuery.ExecuteNextAsync<Competition> method. Expand feedResponse and check the different properties. The RequestCharge property provides us with the request units that have been charged for this query: 3. However, remember that this value might be different in your configuration. A query consumes more request units than a document read operation. The following screenshot shows the properties, their values, and their types:

We can write the code necessary for summing the request units charged to the different operations and queries we perform and have a better understanding of our required provisioned throughput.
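
The following minimal sketch shows one possible approach: a hypothetical helper method added to the Program class that accumulates the charges reported by each response:

private static double totalRequestCharge = 0;

private static void TrackRequestCharge(double requestCharge)
{
    // Accumulate the request units charged by each operation or query
    totalRequestCharge += requestCharge;
    Console.WriteLine(
        $"Charged: {requestCharge} RUs. Total so far: {totalRequestCharge} RUs");
}

We would then call TrackRequestCharge(feedResponse.RequestCharge) after each call to ExecuteNextAsync, and do the same with the RequestCharge property of the responses for the read, insert, and replace operations.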

Understanding the options for provisioning request units

So far, we have been provisioning throughput for each collection. This gives us more granular control over how many request units we provision per container. This option is usually better if you have a smaller number of containers and require guaranteed, SLA-backed throughput for each container. Keep in mind that all the physical partitions of a Cosmos DB container equally share the number of request units available for the container.

The other option is to provision throughput at the database level. In this case, all the containers within the database share the total pool of request units you have provisioned. This can be a more comfortable management option when you have a high number of containers and do not necessarily want to manage them all individually.

One crucial difference between container-level provisioning and database-level provisioning is the minimum throughput we can provision. For partitioned containers, the minimum provisioned throughput allowed is 1,000 RU/s. For database-level provisioning, the minimum provisioned throughput is 50,000 RU/s.
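
For reference, the following minimal sketch provisions throughput at the database level with the SDK; the database name is hypothetical:

var databaseResponse = await client.CreateDatabaseIfNotExistsAsync(
    new Database { Id = "SharedThroughputDatabase" },
    new RequestOptions { OfferThroughput = 50000 });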

Learning partitioning strategies

To implement the right partitioning strategy for your needs, it is essential to understand how Cosmos DB does partitioning internally. Cosmos DB works with two types of partitions: physical and logical. A physical partition may or may not exist at any given time; out of all the partition key values we have in our dataset, Cosmos DB decides when a new physical partition is needed and moves an appropriate set of logical partitions to a separate physical partition when required. A physical partition consists of a reserved SSD storage area and variable compute resources. The management of physical partitions is done entirely by Cosmos DB. We do not have any control over it, except in picking the right partition key design to influence how request units might end up being distributed across physical partitions. The number of physical partitions for a container varies based on the RU load on the container, and can increase up to the number of unique partition key values we have in the container.

In the following diagram, a container with two physical and six logical partitions shares RU/s allocations across physical partitions equally:

A logical partition is made up of all the documents that share the same unique partition key value in a container. Depending on their RU load, multiple logical partitions can live in a single physical partition, or a logical partition can be hosted in its own reserved physical partition.

When a container is first provisioned for X number of RUs, Cosmos DB will look at the consumed data and decide how many physical partitions are needed. If the RU consumption is higher than a single physical partition's maximum throughput, Cosmos DB will create as many partitions as needed to serve the X number of RUs required. Every physical partition will get an equal share of the total RUs provisioned for the container.

Logical partitions will be distributed across the available physical partitions.

The choice of the perfect partition key is critical for a scalable, long-lasting Cosmos DB implementation. When a single logical partition is stuck in a separate physical partition, there will be no way to scale it further. For example, at the time I was writing this book, a single physical partition in Cosmos DB had a limit of 10 GB of data storage. This limit could only be surpassed by adding new partition key values to the container so that Cosmos DB could create a new physical partition for the new logical partition. It is not just about storage; compute needs to be evenly distributed as well. The amount of RUs provisioned for a container is equally shared by all physical partitions. If you need 5,000 RUs in a partition called Seattle and have a total of 10 physical partitions, you will need to scale up to 10 * 5,000 = 50,000 RUs to make sure your Seattle partition gets its fair share, even though your other partitions might only need 1,000 RUs each.

We already analyzed how to monitor RU consumption and data storage in the Metrics section of the Azure portal. The next screenshot shows how much storage space is used by the top partitions, including the data size and the index size. Even though the numbers in this particular sample dataset are small, the fact that Beef Products has almost three times the data of most other partitions is a red flag. If the incoming data keeps the same balance, we will hit the limit with this partition before the other partitions fill up:

The next screenshot shows another chart from the Azure portal that displays the top partition key ranges. These are physical partitions hosting multiple logical partitions. In this example, you can see that the top physical partition, with 30 MB data in it, hosts logical partitions with the Beef Products, Vegetables, and Baked Products partition keys:

By navigating through the reporting screens in the Throughput page, you can find out how many physical partitions you have. In the next screenshot, you can read the Provisioned throughput is evenly distributed across these partitions (2000 RU/s per partition) statement. The Nutrition collection is currently set up for 10,000 RU/s, and because of the five partitions Cosmos DB had to create to serve enough RUs for all queries, the total RU is divided to provide each partition with 2,000 RU/s:

Looking at the Throughput metrics page, a critical graph is at the bottom of the screen, titled Max consumed RU/s by each partition key range. In the next screenshot, we have four physical partitions; the dark blue horizontal line in the graph represents the RU/s limit we have on every partition, and all four partitions have exceeded it. The fact that all partitions had the same load is a good sign: we do not have a hot partition; what we have is an overall outage. If just one of the partitions had a high load, it would be a hot partition issue, where a single partition exceeds the limit. If that hot physical partition hosted multiple logical partitions, Cosmos DB would scale and create a new physical partition to distribute the load. If a single logical partition created the hot partition, there would be no way out other than scaling the full container to make sure the RUs divided per partition were higher than the hot partition's requirement. This is why a good, balanced design for partition key selection is crucial:

Partition key selection is not only for proper RU distribution and consumption. A lousy partitioning design can cause cross-partition queries, which are very costly and disabled by default.

Note that Cosmos DB throughput SLAs apply only to single-partition queries. Running cross-partition queries in Cosmos DB is not ideal and should be considered a last resort. We worked with cross-partition queries in our previous examples to demonstrate how we could run them.

When you run a cross-partition query, the SDK will fetch key ranges from Cosmos DB and run all the single-partition queries in parallel to fulfill the requirement of the cross-partition query you wrote. Depending on the degree of parallelism, a cross-partition query may sometimes provide the same performance as a single-partition query, and sometimes it may not. The upper limits for parallel query execution in Cosmos DB can be configured with the MaxDegreeOfParallelism and MaxBufferedItemCount parameters within the SDK. MaxDegreeOfParallelism helps to increase the number of concurrent connections for a cross-partition query. By default, it is set to 0.
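
The following minimal sketch configures these options for one of our queries; the values are only illustrative and should be tuned for each workload:

var feedOptions = new FeedOptions
{
    EnableCrossPartitionQuery = true,
    // Number of concurrent operations run client side during parallel query execution
    MaxDegreeOfParallelism = 4,
    // Maximum number of items that can be buffered client side during parallel query execution
    MaxBufferedItemCount = 100,
};
var documentQuery = client.CreateDocumentQuery<Competition>(collectionUri, feedOptions)
    .Where(c => c.Title == "League of legends - San Diego 2018")
    .AsDocumentQuery();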

What if you can't find a single property in your documents that can help you achieve a proper distribution of data and query load? You can combine multiple fields and create synthetic keys.

For example, imagine that you have the following document:

{
    "city": "seattle",
    "date": 2018
}

If you are not sure that using the city key as a partition key will be good enough, you can combine two fields and create a new one by concatenating the two values to be used as a partition key. For example, the following document combines city and year to create a partitionKey key that we will use as the partition key:

{
    "city": "seattle",
    "year": 2018,
    "partitionKey": "seattle-2018"
}
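
With the POCO approach we used in Chapter 5, a minimal sketch of generating such a synthetic key could look like the following hypothetical class, which uses Newtonsoft.Json attributes for the JSON key names:

public class CityDocument
{
    [JsonProperty("city")]
    public string City { get; set; }

    [JsonProperty("year")]
    public int Year { get; set; }

    // Synthetic partition key that concatenates the two values
    [JsonProperty("partitionKey")]
    public string PartitionKey => $"{City}-{Year}";
}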

What if we just assign a unique value to every document and use that as the partition key? That sounds pretty flexible. The issue is that if you wanted to use cross-document transactions with stored procedures or triggers, the requirement would be that both documents be in the same partition.

By using unique values and having random GUIDs for all your documents as the partition key value, you will make it almost impossible to have multiple documents in the same partition. The container will be able to scale pretty well, but you will never know what documents can be used in a single transaction.

The obvious choices for a partition key are things like user ID, company ID, tenant ID, and device ID. Whatever ID you pick, you need to make sure that a single partition key value will never need more than the maximum performance and storage of a single physical partition; that way, you will never hit any limits. Once that is accomplished, the second evaluation point is the RU load distribution. A good practice is to look into your application and figure out what common parameters exist in the most used queries. Having those parameters in the partition key will make sure that you run single-partition queries and distribute your load. You should consider your dataset saturation as well. If 75% of your customers are in the USA, picking the country field as the partition key is a recipe for a hot partition.

Deploying to multiple regions

The ability to operate at the global scale is one of the significant capabilities of Cosmos DB. It is not merely a database built for an on-premises server farm that has been moved into the cloud. Cosmos DB is a database built for the cloud from scratch. A cloud-native database needs to be global and be able to accommodate the needs of a global-scale application.

Deploying a multi-region database with Cosmos DB requires just a couple of mouse clicks or taps on the Azure portal. Navigate to the Replicate data globally blade of your Cosmos DB account in the Azure portal and start picking as many regions as you need. Once your selection is complete, hit Save and watch the magic happen. However, don't forget that deploying to multiple regions will have an impact on billing.

In the following screenshot, you can see that the global replication for Cosmos DB means selecting regions in the Azure portal. This will help to spread and scale your reads across multiple regions and help your applications read data from the nearest data center with the help of multi-homing:

The Cosmos DB SDK helps client applications decide their preferred read regions. When a request to a particular region fails, the client application will try the same request in the next region in the preferred read regions list.
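
The following minimal sketch declares the preferred read regions when creating the client; the region names are only illustrative:

var connectionPolicy = new ConnectionPolicy();
// The SDK will try West US first, then North Europe, before falling back to other regions
connectionPolicy.PreferredLocations.Add(LocationNames.WestUS);
connectionPolicy.PreferredLocations.Add(LocationNames.NorthEurope);
var client = new DocumentClient(new Uri(endpointUrl), authorizationKey, connectionPolicy);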

Understanding the five consistency levels

Now that we have multiple replicas of our data around the globe, what about data consistency? Cosmos DB provides the following five options:

  • Strong
  • Bounded staleness
  • Session
  • Consistent prefix
  • Eventual

In a replicated dataset, there is always a trade-off between consistency, availability, throughput, and latency. In theoretical computer science, Brewer's theorem (named after computer scientist Eric Brewer), also known as the CAP theorem, states that a distributed data store can only provide two of the following three guarantees: consistency, availability, and partition tolerance. With this context in mind, Cosmos DB offers five different consistency levels, letting us fine-tune our priorities.

Keep in mind that high consistency levels will require more RUs as well.

The following screenshot shows the Default Consistency option in the Azure portal, which allows us to pick the desired consistency level:

For those of you experienced with distributed databases, strong and eventual consistency might sound familiar. Strong consistency is where everything is in perfect synchronization; all read operations get the latest data, and a write operation never becomes visible until all replicas synchronously commit the data change. If you pick strong consistency, you can only add one region to your Cosmos DB account. In that case, what replicas are we talking about? Let's clarify that. Every physical partition in Cosmos DB is replicated four times across four resource partitions; these copies are called replica sets. They are local to the region and help achieve multiple SLAs. On the other hand, replication between regions is asynchronous. It would be very costly to provide strong consistency across regions. Considering the SLAs Cosmos DB is trying to hit, it makes sense that strong consistency is only available in single-region accounts.

What about the global database? How is that possible with single-region accounts? Of course, it is not. We will need to loosen up the consistency level a little bit – good thing that we've got four other consistency levels in our list.

Bounded staleness is somewhere between session consistency and strong consistency. It allows reads to lag behind writes for a specific amount of time or versions. This consistency level provides total global order. This is like strong consistency but with some lag. At this level, you can have as many Azure regions in your Cosmos DB account as you want. Regarding RU consumption, bounded staleness will take its toll on your operations. The cost of queries will be higher than session consistency, but the same as strong consistency.

Session consistency provides a perfect read-your-own-writes environment. It offers strong consistency scoped into a client session. The cost for session consistency is lower than bounded staleness but higher than eventual consistency.

Consistent prefix is between session consistency and eventual consistency. Consistent prefix makes sure read operations always return a prefix of all data updates without any gaps.

Eventual consistency is the weakest consistency level in Cosmos DB. At this level, when a client sends requests across multiple regions, it might get older data than what it received in the past from another region. Eventual consistency provides the lowest latency and RU consumption.

Consistency levels for a container can be set in the Azure portal in the Default consistency blade. You can override consistency levels per query depending on the operation. For example, you can ask for a lazy read with a low RU cost even if the default consistency level is higher.
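
Keep in mind that a request can only relax the default consistency level; it cannot ask for a stronger one. The following minimal sketch requests an eventual (lazy) read for a single operation, reusing the document ID and partition key values from our sample data:

var readResponse = await client.ReadDocumentAsync<Competition>(
    UriFactory.CreateDocumentUri(databaseId, collectionId, "4"),
    new RequestOptions
    {
        // Override the default consistency level for this read only
        ConsistencyLevel = ConsistencyLevel.Eventual,
        PartitionKey = new PartitionKey("92075"),
    });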

Taking advantage of regional failover

You might have noticed that on the Replicate data globally page in the Azure portal, there is only one region defined as Write region. This is because Cosmos DB is designed to get all writes into a single region and distribute the reads.

Cosmos DB started supporting multiple write regions with a feature called Multi-Master, available as a preview during the last quarter of 2018. With a single write region, it is important to plan for regional failovers. When a read region fails, another region will take over. When a write region fails, you might want to explicitly prioritize which region takes over. To do so, you can find an Automatic Failover button on top of the region replication blade. Once you open up the page, you can enable automatic failover and drag and drop regions to create your prioritized failover list.

The following screenshot shows a sample configuration for the read regions and its priorities:

In addition to automatic failovers, you can change the write region to another region by simply triggering a manual failover. The manual failover button is available in the region management blade. Once you click on it, you will get a list of regions to fail over the write region to. During a failover, there is no code change required in client applications. For some global applications, it makes sense to fail over the write region to different regions at different times of the day to make sure writes go to the nearest location. Manual failover can be automated through the Cosmos DB SDKs if needed.

Understanding indexing in Cosmos DB

The default configuration for indexing in Cosmos DB makes indexing happen automatically. Hence, whenever we create or update a document in a document collection, all the keys included in the document are indexed. This might sound counter-intuitive, but it is how the system is designed to work. There is no need for index management, unless you want to better optimize your costs or you require specific queries.

Keep in mind that every index you have in your dataset will have its toll on request units consumed and storage space used. Hence, if you are indexing keys that you are never going to use in search criteria, you are wasting request units in every write operation.

In contrast, sometimes removing an index can increase the request unit cost of a query as well. Thus, we must make sure that we don't remove indexes for keys that are included in search criteria. It is vital to use indexing strategically to come up with the best implementation. Let's look at the options Cosmos DB has to offer.

Cosmos DB has the following three index update modes:

  • Consistent index: This is the default mode. As the name suggests, this indexing mode helps to keep an always consistent index. This mode of indexing will have its share of load, especially during writes. In this mode, the index is updated synchronously as part of the operation that persists or deletes a document in a collection. However, as soon as the operation has finished, the document is indexed and it can be queried immediately.
  • Lazy indexing: This indexing mode updates the index asynchronously when the collection provisioned throughput is not fully utilized. The big risk of this indexing mode is that documents might be indexed slowly when the provisioned throughput is being consumed at high rates by all the operations. Queries might provide results that aren't consistent. For example, a COUNT query with specific criteria won't include the documents that aren't indexed.
  • None: There is no indexing at all. This mode is only useful when we work with documents that are accessed by ID and we don't need to execute queries. Hence, we should only consider this option when we use a collection as key-value storage. If we run any query against a collection that isn't indexed, it is necessary to set the EnableScanInQuery property to true in the FeedOptions instance passed as an argument to the CreateDocumentQuery method, as shown in the sketch after this list. However, these queries will be executed as full scans that will consume a significant amount of request units.
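
The following minimal sketch shows the feed options needed to run a query with a full scan against a collection that isn't indexed:

var feedOptions = new FeedOptions
{
    EnableCrossPartitionQuery = true,
    // Allow the query to run as a full scan because there is no index to use
    EnableScanInQuery = true,
};
var documentQuery = client.CreateDocumentQuery<Competition>(collectionUri, feedOptions)
    .Where(c => c.Title == "League of legends - San Diego 2018")
    .AsDocumentQuery();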

Cosmos DB has the following three different indexing types that are suitable for diverse data types:

  • Hash: This index type is mainly used for equality and JOIN queries. The data type for hash indexes can be String or Number.
  • Range: This index type comes with the maximum index precision by default. This index type is used for range queries, equality queries, and sort operations (ORDER BY). Range indexes support String or Number as well. In Cosmos DB, DateTime values are stored as ISO 8601 strings, and therefore, range indexes help with range queries related to DateTime keys as well.
  • Geospatial: This index type works with the Point, Polygon, and LineString data types specified in GeoJSON documents. Geospatial indexes support spatial queries and many spatial operations on the indexed types.

If we customize the indexing policies but we don't pick the right index types, our queries might be limited. For example, we can't use range operators in a query if the field does not have a range index. We can always force a query to run with a full scan by setting the previously explained EnableScanInQuery property to true in the FeedOptions instance passed as an argument to the CreateDocumentQuery method. However, we will always want to avoid full scan queries to reduce the RU charge.

Range and hash indexes can be further fine-tuned with an index precision parameter. This parameter helps us balance the storage overhead for the index and query performance. For numbers, the default precision is -1 (maximum). Yeah, it sounds crazy, but -1 is the maximum precision. If you increase the value, the index data size will decrease, but queries will need to scan more documents because the index record will point to a broader range of documents.

For string ranges, the precision has more effect because of the size of data for a single key. However, in order to be able to do sort queries (ORDER BY) for string keys, the precision needs to be -1 (maximum).

Checking indexing policies for a collection with the Azure portal

An index policy specifies the indexing mode for a collection and includes a list of paths to index, or to exclude. The includedPaths key lists all the indexes in a container, the index types to be used, matching data types, and the index precision. Indexing policies can be manipulated on the fly by editing them on the Azure portal or with the Cosmos DB SDK.
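
For example, the following minimal sketch switches a collection to lazy indexing with the SDK; it assumes that the collection variable holds the DocumentCollection instance we retrieved earlier:

// Change the indexing mode and persist the updated collection definition
collection.IndexingPolicy.IndexingMode = IndexingMode.Lazy;
await client.ReplaceDocumentCollectionAsync(collection);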

Now we will use the Azure portal to check the indexing policy for the Competitions1 document collection.

In the Azure portal, make sure you are on the page for the Cosmos DB account.

Click on the Data Explorer option, click on the database name you used in the configuration for the examples in previous chapters (Competition) to expand the collections for the database, and click on the collection name you used for the examples (Competitions1). Click on Scale & Settings and scroll down to the JSON document shown under Indexing Policy. The following lines show the JSON document that defines the indexing policy for the collection:

{
    "indexingMode": "consistent",
    "automatic": true,
    "includedPaths": [
        {
            "path": "/*",
            "indexes": [
                {
                    "kind": "Range",
                    "dataType": "Number",
                    "precision": -1
                },
                {
                    "kind": "Hash",
                    "dataType": "String",
                    "precision": 3
                }
            ]
        }
    ],
    "excludedPaths": []
}

Test your knowledge

Let's see whether you can answer the following questions correctly:

  1. On what basis does Azure bill provisioned RUs?
    1. Per-day basis
    2. Per-hour basis
    3. Per-second basis
  2. How many RUs does it cost to read 1 KB of data from Cosmos DB directly referencing the document with its URI or self link?
    1. 1 RU
    2. 10 RUs
    3. 1,000 RUs
  3. Which of the following numbers defines the maximum precision for a Cosmos DB index?
    1. -1
    2. 256
    3. 65535
  4. If a collection isn't indexed but you still want to run a query, which of the following properties must be set to true in the FeedOptions instance, which specifies the feed options for the query?
    1. EnableFullScan
    2. EnableNonIndexedCollectionQuery
    3. EnableScanInQuery
  5. If a collection has 10,000 RUs provisioned and you have five physical partitions, how many RUs can be consumed on a single partition?
    1. 50,000 RUs
    2. 10,000 RUs
    3. 2,000 RUs

Summary

In this chapter, we analyzed many aspects of Cosmos DB that allow us to design and maintain scalable architectures. We used our sample application to understand how many important features work, and we worked with other examples to understand complex topics related to scalability.