Technologies

Experience using Azure Cosmos DB in a commercial project

Marketing Team

WaveAccess

Published July 6, 2018

Let us tell you more about our projects

Start here

In this article, we will share with you our experience using the Azure Cosmos DB database service in a commercial project. We will tell you what the DB is for, and the nuances we came across throughout development.

In this article we will share with you our experience using the Azure Cosmos DB database service in a commercial project. We will tell you what the DB is for, and the nuances we came across throughout development.

What is Azure Cosmos DB?

Azure Cosmos DB is a commercial, globally distributed database service with a multi-model paradigm, provided as a PaaS solution. It is the next generation of Azure DocumentDB. The database was developed in 2017 at Microsoft Corporation with the participation of Dr.Sci. Leslie Lamport (winner of the Turing Award 2013 for a fundamental contribution to the theory of distributed systems, the developer of LaTex, the creator of the TLA + specification).

The main characteristics of Azure Cosmos DB are:

Non-relational database
Storage of documents in the JSON format
Horizontal scaling with the ability to select geographic regions

Map 1_EN (1)

Multi-model data paradigm: key-value, document, graph, family of columns
Low latency for 99% of queries: less than 10 ms for read operations and less than 15 ms for (indexed) write operations
Designed for high throughput
Ensures availability, consistency of data, delay at SLA level of 99.999%
Configurable throughput
Automatic replication (master-slave)
Automatic data indexing
Configurable levels of consistency of data Five different levels (Strong, Bounded Staleness, Session, Consistent Prefix, Eventual);

Notice the dependence of different levels of consistency on availability, performance, and consistency of data.

The graph is up – availability, productivity; graph down – consistency.

For a convenient transition to Cosmos DB from your database, there are many APIs for accessing data: SQL, JavaScript, Gremlin, MongoDB, Cassandra, Azure Blob
Customizable firewall
Configurable DB size

Let us tell you more about our projects

Start here

The problem that we solved

Thousands of sensors located around the world transmit notifications every N seconds. These notifications should be stored in the database, and allow for searching and display in the UI of the system operator.

Customer requirements:

Using the Microsoft technology stack, including the Azure cloud
Throughput of 100 requests per second
Notifications do not have a clear structure and could be further expanded
For critical notifications, processing speed is important
High fault tolerance of the system

Based on customer requirements, we were perfectly suited for a non-relational, globally distributed, reliable commercial database.

Map 2_EN

If you look at databases similar to Cosmos DB, you would find Amazon DynamoDB, Google Cloud Spanner. But Amazon DynamoDB is not globally distributed, and Google Cloud Spanner has fewer levels of consistency and types of data models (only a table view, a relational one).

For these reasons, we settled on Azure Cosmos DB. Azure Cosmos DB SDK for .NET was used to interact with the database, since the backend was written in .NET.

The nuances we encountered

Database Management Tools

In order to start using the database, first of all, you need to select a tool to manage it. We used Azure Cosmos DB Data Explorer in the Azure portal and DocumentDbExplorer. Also, there is a utility Azure Storage Explorer.

Configuring Database Collections

In Cosmos DB, each database consists of collections and documents.

Customizable characteristics of the collection, which should be noted:

Size of collection: fixed or unlimited
Bandwidth per request units per second RU/s (from 400 RU/s)

Scale _EN

Indexing policy (Including or excluding documents and paths to and from the index and from it, configuring various index types, setting up index update modes)

Example of a typical index:

        
        {
          "id": "datas",
          "indexingPolicy": {
          "indexingMode": "consistent",
          "automatic": true,
          "includedPaths": [
          {
          "path": "/*",
          "indexes": [
          {
          "kind": "Range",
          "dataType": "Number",
          "precision": -1
          },
          {
          "kind": "Hash",
          "dataType": "String"
          },
          {
          "kind": "Spatial",
          "dataType": "Point"
          }
          ]
          }
          ],
          "excludedPaths": []
          }
        }

To search for a substring in string fields, you need to use the Hash-index ("kind": "Hash").

Database transactions

In the database, transactions are implemented at the stored procedures level (Execution of a stored procedure is an atomic operation). Stored procedures are written in JavaScript.

        
          var helloWorldStoredProc = {
            id: "helloWorld",
            body: function () {
              var context = getContext();
              var response = context.getResponse();
              response.setBody("Hello, World");
            }
          }

Database change channel

Change Feed listens for changes in the collection. When there are changes in the collection documents, the database throws a change event to all the subscribers of this channel.

We used Change Feed to track changes to the collection. When creating a channel, you must first create an auxiliary AUX collection, which coordinates the processing of the change channel for several work roles.
Database restrictions
- Absence of bulk operations (used stored procedures for mass deletion, updating of documents)
- No partial update of the document
- There is no operation SKIP (complexity of the implementation of pagination). To implement the pagination in requests to receive notifications, we used the parameters RequestContinuation (reference to the last element as a result of issuance) and MaxItemCount (the number of items returned from the database). By default, results are returned in packets (no more than 100 items and no more than 1 MB in each package). The number of items returned can be increased to 1000 by using the MaxItemCount parameter
Processing the Error 429

When the collection Throughput reaches a maximum, the database starts throwing an error "429 Too Many Requests". To process it, you can use the RetryOptions setting in the SDK, where MaxRetryAttemptsOnThrottledRequests is the number of attempts to execute the query, and MaxRetryWaitTimeInSeconds is the total time that connection attempts are attempted.
Forecasting the cost of using the database

To predict the cost of using the database, we used the online calculator RU / s. In the base plan, 1 query unit for a 1 KB element corresponds to a simple GET command by reference to itself or the identifier of that element.

Resume

Azure Cosmos DB is easy to use, easily and flexibly configured through the Azure portal. A lot of APIs for accessing data allow you to quickly transition to Cosmos DB. You do not need to get the database administrator to maintain the database. Financial guarantees of SLA, global horizontal scaling make this database very attractive in the market. It is great for use in corporate and global applications that require high demands on fault tolerance and throughput. We at WaveAccess continue to use Cosmos DB in our projects.

To get your custom Azure Media Services based project, contact us!

Microsoft ecosystem