18. January 2024 By Yannik Rust
AWS DynamoDB: insights into the NoSQL database in the cloud
AWS DynamoDB is a high-performance NoSQL database service designed to serve as a key-value store. As a fully managed, serverless service, DynamoDB offers a fast, flexible and cost-effective way to store and retrieve data in the cloud. In this blog post, I take a deep dive into the main features, design patterns and best practices relating to DynamoDB.
What is NoSQL?
NoSQL is short for ‘Not Only SQL’ or ‘Non-SQL’ and refers to a large class of database management systems that differ from conventional relational databases. NoSQL databases were developed to meet the requirements of modern applications that need to process large amounts of unstructured or semi-structured data, such as the databases used in web applications, on social media platforms, in big data and in other scenarios.
A brief introduction to DynamoDB
DynamoDB is a non-relational key-value store built on NoSQL principles. As a fully managed service in the AWS cloud, the database requires no manual server management, allowing developers to focus on application development rather than on the underlying infrastructure.
DynamoDB can also replicate tables across several AWS regions. This makes it possible to manage data globally and ensures a high level of resilience and availability, while users can be served from a region close to them. Requests are typically answered within milliseconds, and latencies in the microsecond range are possible with the DAX (DynamoDB Accelerator) caching extension. Beyond that, DynamoDB supports backups and point-in-time recovery (PITR), which restores data to its state at a specific point in time and is vital to protecting data from unintentional loss. DynamoDB offers APIs for carrying out CRUD (Create, Read, Update, Delete) operations. Transactions that comply with the ACID (Atomicity, Consistency, Isolation, Durability) principles can also be carried out across several tables. Joins, however, are not supported.
Structure of a DynamoDB
AWS DynamoDB features a clear hierarchical structure in which tables act as top-level entities. Unlike relational databases, there are no strict relationships between the tables. Each table instead represents a separate, independent entity, which enables flexible data modelling.
Performance control takes place at the table level, meaning that developers can optimise the performance of each table individually without this affecting other tables in the database. The data or items are stored in a special JSON format from DynamoDB, enabling both efficient storage and rapid retrieval.
The one component of the structure that is strictly required is the primary key; the rest of the schema is flexible. The primary key can be simple (one attribute, the partition key) or composite (two attributes: the partition key and the sort key). All other attributes are optional and can differ from item to item.
The following diagram shows the structure of a table using computer game data as an example. Here, the primary key consists of a partition key (player ID) and a sort key (game ID). The individual attributes are not available for every entry and are therefore highly flexible.
The data shown in the table is stored in the background in DynamoDB’s own JSON format, which is why DynamoDB can also be regarded as a document database. For the first row of the example presented above, the JSON representation looks like this:
{
"Player ID": {"S": "Player 1"},
"Game ID": {"S": "Game 1"},
"Date": {"S": "2020-01-02"},
"Time": {"N": "62"},
"Score": {"N": "34000"},
"Ranking": {"N": "20"},
"Award": {"S": "Champion"}
}
The type descriptors ‘S’ and ‘N’ stand for the data types String and Number, respectively, and are stored together with each attribute value. Note that numbers are also written as quoted strings. The other scalar types are Binary (B), Boolean (BOOL) and Null (NULL). Beyond that, there are the document types List (L) and Map (M) and the set types String Set (SS), Number Set (NS) and Binary Set (BS).
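To make the typed notation more tangible, here is a minimal Python sketch that wraps plain values in this format. The function names `to_attribute_value` and `to_item` are made up for this illustration, and only a subset of the types is covered. In practice, the AWS SDK for Python (boto3) ships its own `TypeSerializer` for this job.

```python
from decimal import Decimal


def to_attribute_value(value):
    """Wrap a plain Python value in DynamoDB's typed JSON notation (subset)."""
    if value is None:
        return {"NULL": True}
    # bool has to be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return {"BOOL": value}
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, (int, float, Decimal)):
        # Numbers are transported as strings.
        return {"N": str(value)}
    if isinstance(value, (set, frozenset)):
        if value and all(isinstance(v, str) for v in value):
            return {"SS": sorted(value)}
        raise TypeError("this sketch only supports non-empty sets of strings")
    if isinstance(value, (list, tuple)):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")


def to_item(plain):
    """Convert a flat dict of attributes into a DynamoDB item."""
    return {name: to_attribute_value(v) for name, v in plain.items()}


item = to_item({"Player ID": "Player 1", "Score": 34000, "Award": "Champion"})
```

Here, `item["Score"]` is `{"N": "34000"}`, which matches the notation of the example above.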
Indexes in DynamoDB
The use of indexes in DynamoDB makes it possible to more efficiently retrieve data. There are two types of index: the local secondary index (LSI) and the global secondary index (GSI).
Local secondary index (LSI)
An LSI uses the same partition key as the primary key of the table but a different sort key, so its key is always composite. A table can have no more than five LSIs, and all items that share one partition key value, together with their index entries, must not exceed ten gigabytes. In addition, LSIs must be defined when the table is created and cannot be added at a later time.
Global secondary index (GSI)
A GSI can use a partition key and an optional sort key that differ from the primary key of the table, so its key can be either simple or composite. Each table can have up to 20 GSIs by default, and there are no size limits on the indexed elements. GSIs can be created and deleted at any time and allow you to make requests across partitions. It is important to note that GSIs only offer eventual consistency (see below).
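The two index types can be sketched as request parameters for the `create_table` call of the AWS SDK for Python (boto3). The table name `GameScores` and the index names are assumptions for this illustration, and the attribute names follow the computer game example. The function only builds the parameters and does not call AWS.

```python
def build_create_table_params(table_name="GameScores"):
    """Parameters for boto3's client.create_table (hypothetical table)."""
    return {
        "TableName": table_name,
        # Only attributes used in a key schema are declared up front.
        "AttributeDefinitions": [
            {"AttributeName": "Player ID", "AttributeType": "S"},
            {"AttributeName": "Game ID", "AttributeType": "S"},
            {"AttributeName": "Score", "AttributeType": "N"},
        ],
        "KeySchema": [
            {"AttributeName": "Player ID", "KeyType": "HASH"},   # partition key
            {"AttributeName": "Game ID", "KeyType": "RANGE"},    # sort key
        ],
        # LSI: same partition key as the table, different sort key.
        "LocalSecondaryIndexes": [
            {
                "IndexName": "ScoreByPlayer",
                "KeySchema": [
                    {"AttributeName": "Player ID", "KeyType": "HASH"},
                    {"AttributeName": "Score", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "KEYS_ONLY"},
            }
        ],
        # GSI: a partition key that differs from the table's partition key.
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "ScoreByGame",
                "KeySchema": [
                    {"AttributeName": "Game ID", "KeyType": "HASH"},
                    {"AttributeName": "Score", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        # On-demand billing, so no ProvisionedThroughput is required.
        "BillingMode": "PAY_PER_REQUEST",
    }


params = build_create_table_params()
```

With credentials configured, the parameters could be passed on as `boto3.client("dynamodb").create_table(**params)`.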
Data consistency
The AWS DynamoDB consistency models play a key role when it comes to managing read and write operations. The data to be saved is stored in a distributed manner in order to ensure the high availability of DynamoDB. Generally speaking, DynamoDB follows the principle of eventual consistency. This means that once the data has been written, there is no guarantee that all copies of the data are identical at any given moment, although they converge after a short time. That said, DynamoDB lets you select a certain level of consistency for read and write operations depending on the application.
Consistency for read operations
A strongly consistent read always returns the most up-to-date data, reflecting all writes that were acknowledged before the read. It has to be requested explicitly in the read operation.
Conversely, an eventually consistent read offers no guarantee that the data returned is the most recent version. It is the default consistency level and costs half as much as a strongly consistent read.
Transactional consistency provides ACID support for one or more tables within a single AWS account and region. Although it offers the most reliable level of consistency, it costs twice as much as strong read consistency.
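As a rough sketch, the three read modes differ only in how the request is put together. The helpers below build parameters for boto3's `get_item` and `transact_get_items` calls. The table name `GameScores` and the helper names are assumptions for this illustration, and nothing is sent to AWS.

```python
TABLE = "GameScores"  # hypothetical table name


def build_key(player_id, game_id):
    """Primary key of the computer game example in DynamoDB JSON."""
    return {"Player ID": {"S": player_id}, "Game ID": {"S": game_id}}


def build_get_item_params(player_id, game_id, strongly_consistent=False):
    """Parameters for client.get_item.

    ConsistentRead=False (the default) is an eventually consistent read,
    ConsistentRead=True requests a strongly consistent read.
    """
    return {
        "TableName": TABLE,
        "Key": build_key(player_id, game_id),
        "ConsistentRead": strongly_consistent,
    }


def build_transact_get_params(keys):
    """Parameters for client.transact_get_items (transactional read)."""
    return {
        "TransactItems": [
            {"Get": {"TableName": TABLE, "Key": key}} for key in keys
        ]
    }


eventual = build_get_item_params("Player 1", "Game 1")
strong = build_get_item_params("Player 1", "Game 1", strongly_consistent=True)
transactional = build_transact_get_params(
    [build_key("Player 1", "Game 1"), build_key("Player 1", "Game 2")]
)
```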
Write consistency
A standard write is the default and requires no special settings. After it has been acknowledged, the copies of the data may be inconsistent for a short period before DynamoDB harmonises them internally.
By contrast, transactional writes offer ACID support for write operations. However, this comes at a cost: a transactional write is twice as expensive as a standard write.
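A transactional write across two tables might look roughly like the following parameters for boto3's `transact_write_items` call. The second table `PlayerStats`, its `GamesPlayed` counter and the helper name are hypothetical and only serve the illustration. Either both operations succeed or neither does.

```python
def build_transact_write_params(player_id, game_id, score):
    """Parameters for client.transact_write_items (sketch, not executed)."""
    return {
        "TransactItems": [
            {
                # Store the new game result, but only if it does not exist yet.
                "Put": {
                    "TableName": "GameScores",
                    "Item": {
                        "Player ID": {"S": player_id},
                        "Game ID": {"S": game_id},
                        "Score": {"N": str(score)},
                    },
                    "ConditionExpression": "attribute_not_exists(#pid)",
                    "ExpressionAttributeNames": {"#pid": "Player ID"},
                }
            },
            {
                # Increment a counter in a second, hypothetical table.
                "Update": {
                    "TableName": "PlayerStats",
                    "Key": {"Player ID": {"S": player_id}},
                    "UpdateExpression": "ADD GamesPlayed :one",
                    "ExpressionAttributeValues": {":one": {"N": "1"}},
                }
            },
        ]
    }


tx = build_transact_write_params("Player 1", "Game 2", 41000)
```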
Price structure
The AWS DynamoDB pricing model offers developers both the flexibility and scalability they need to meet the requirements of their applications. The two main options available here are provisioned capacity and on-demand capacity.
Under the provisioned capacity model, users pay for the capacity they defined in advance. This model is primarily used for production systems where the required capacity is already known. In this context, capacity is defined as the number of read and write operations per second and is expressed in read and write capacity units (RCUs and WCUs), whereby one capacity unit represents one read or write per second.
- An RCU (read capacity unit) is measured in blocks of four kilobytes, whereby the last block is always rounded up. One RCU corresponds to one strongly consistent read operation, two eventually consistent read operations or half a transactional read operation per second.
- A WCU (write capacity unit) is measured in blocks of one kilobyte, with the last block always being rounded up. One WCU corresponds to one standard write operation or half a transactional write operation per second.
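To provide greater context, consider writing and reading a 15 kilobyte object to and from the database with the different consistency types. Reading it takes four blocks of four kilobytes, which comes to 4 RCUs for a strongly consistent read, 2 RCUs for an eventually consistent read and 8 RCUs for a transactional read. Writing it takes 15 blocks of one kilobyte, which comes to 15 WCUs for a standard write and 30 WCUs for a transactional write.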
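The rounding rules for capacity units can be reproduced with a few lines of Python. This is a simplified sketch that covers a single operation per second. The function names are made up for this illustration.

```python
import math

# Cost factors relative to one strongly consistent read.
READ_FACTORS = {"eventual": 0.5, "strong": 1, "transactional": 2}


def read_capacity_units(item_size_kb, mode="strong"):
    """RCUs for one read per second of an item of the given size."""
    blocks = math.ceil(item_size_kb / 4)  # 4 KB blocks, last block rounded up
    return blocks * READ_FACTORS[mode]


def write_capacity_units(item_size_kb, transactional=False):
    """WCUs for one write per second of an item of the given size."""
    blocks = math.ceil(item_size_kb / 1)  # 1 KB blocks, last block rounded up
    return blocks * (2 if transactional else 1)


for mode in ("eventual", "strong", "transactional"):
    print(mode, read_capacity_units(15, mode))
print("standard write", write_capacity_units(15))
print("transactional write", write_capacity_units(15, transactional=True))
```

For the 15 kilobyte object, this yields 2, 4 and 8 RCUs for the three read modes and 15 and 30 WCUs for the two write modes.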
If the number of requests exceeds the provisioned capacity, DynamoDB throttles the table and rejects further requests. To avoid this problem, one can use the auto-scaling function, which automatically adjusts the capacity within predefined limits.
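Auto scaling for a provisioned table is configured via the Application Auto Scaling service. The sketch below builds the parameters for the boto3 calls `register_scalable_target` and `put_scaling_policy` for the read capacity of a table. The table name, the limits and the target utilisation of 70 per cent are assumptions for this illustration. Write capacity would need a second pair of calls.

```python
def build_read_autoscaling_params(table_name, min_units, max_units,
                                  target_utilisation=70.0):
    """Parameters for the 'application-autoscaling' client (sketch)."""
    common = {
        "ServiceNamespace": "dynamodb",
        "ResourceId": f"table/{table_name}",
        "ScalableDimension": "dynamodb:table:ReadCapacityUnits",
    }
    # Predefined limits between which the capacity may move.
    scalable_target = {
        **common,
        "MinCapacity": min_units,
        "MaxCapacity": max_units,
    }
    # Target tracking keeps the consumed capacity near the target value.
    scaling_policy = {
        **common,
        "PolicyName": f"{table_name}-read-target-tracking",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_utilisation,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "DynamoDBReadCapacityUtilization"
            },
        },
    }
    return scalable_target, scaling_policy


target, policy = build_read_autoscaling_params("GameScores", 5, 100)
```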
Under the on-demand capacity model, users pay for each read and write request. It is not necessary to provide capacity in advance. Instead, capacity is adjusted dynamically to the actual utilisation. This is a great option particularly for environments where the amount of data traffic is unpredictable, such as test and development environments. Under this model, billing is performed on the basis of request units. These are calculated in the same way as capacity units, though each unit is more expensive.
In addition to the capacity costs, there are also charges imposed for storage, backups, replication, caching and external data transfer.
Best practices for DynamoDB
If you want to optimise the performance, scalability and efficiency of AWS DynamoDB, here are a few best practices to follow:
Efficient key design
- The partition key should have a large number of unique values to ensure an even distribution of data across the partitions.
- ‘Hot’ data, that is, data that is accessed frequently, should be stored in separate tables from the ‘cold’ data.
- Having a good understanding of the expected request patterns before the database is created allows you to optimise the sort keys and indexes.
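The effect of the number of unique partition key values can be illustrated with a small simulation. DynamoDB's internal hash function is not public, so SHA-256 and a fixed number of four partitions serve as stand-ins here. Both are assumptions for this sketch.

```python
import hashlib
from collections import Counter


def partition_for(key, partitions=4):
    """Map a partition key value to a partition (stand-in hash function)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def distribution(keys, partitions=4):
    """Number of items that land on each partition."""
    counts = Counter(partition_for(k, partitions) for k in keys)
    return [counts.get(p, 0) for p in range(partitions)]


# 1,000 items with 1,000 distinct player IDs spread across all partitions.
many_values = distribution([f"Player {i}" for i in range(1000)])

# 1,000 items with only two distinct values use two partitions at most.
few_values = distribution(["EU" if i % 2 else "US" for i in range(1000)])

print(many_values)
print(few_values)
```

With only two distinct key values, at least two of the four simulated partitions stay empty while the others carry the full load.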
Storing large attribute values
- The use of compression techniques is recommended for large attribute values as a way to reduce storage requirements.
- In the case of very large volumes of data, certain attributes can be transferred to Amazon S3, with only the path being saved in DynamoDB.
- Large attributes should be distributed across several items so as not to exceed the limit of 400 KB per item.
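The first two recommendations can be combined in a short sketch that compresses a value with `zlib` and falls back to an S3 reference if the result is still too large. The S3 URI is a made-up example, and the comparison against the limit is only approximate, since attribute names and all other attributes also count towards the size of an item.

```python
import zlib

MAX_ITEM_BYTES = 400 * 1024  # limit per item mentioned above


def compress_attribute(payload: bytes) -> bytes:
    """Compress a large value before storing it as a Binary attribute."""
    return zlib.compress(payload)


def decompress_attribute(blob: bytes) -> bytes:
    """Restore the original value after reading the item."""
    return zlib.decompress(blob)


def plan_storage(payload: bytes, s3_uri: str, limit: int = MAX_ITEM_BYTES):
    """Return ("inline", blob) if the compressed value fits, else ("s3", uri).

    In the second case the payload would be uploaded to S3 separately and
    only the path would be saved in DynamoDB.
    """
    blob = compress_attribute(payload)
    if len(blob) <= limit:
        return ("inline", blob)
    return ("s3", s3_uri)


replay = b"move;" * 200_000  # roughly 1 MB of highly repetitive data
kind, value = plan_storage(replay, "s3://example-bucket/replays/game-1")
```

Because the sample data is highly repetitive, it compresses well below the limit and `kind` is `"inline"`. Data that compresses poorly would end up with the S3 reference.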
Read operations
- Scans and filters can consume a lot of resources and generate high costs. It is recommended that you use targeted requests with indexes.
- Eventual consistency can be used to reduce the cost of read operations where real-time consistency is not of the essence.
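A targeted request could be sketched as parameters for boto3's `query` call, which reads only the items of one partition key value instead of scanning the whole table. The table name `GameScores` and the helper name are assumptions for this illustration. The placeholder `#pid` is needed because the attribute name contains a space.

```python
def build_query_params(player_id, index_name=None, strongly_consistent=False):
    """Parameters for client.query: all games of one player (sketch)."""
    params = {
        "TableName": "GameScores",
        # Restricts the read to a single partition key value.
        "KeyConditionExpression": "#pid = :pid",
        "ExpressionAttributeNames": {"#pid": "Player ID"},
        "ExpressionAttributeValues": {":pid": {"S": player_id}},
        # False keeps the cheaper eventually consistent read.
        "ConsistentRead": strongly_consistent,
    }
    if index_name is not None:
        # Query a secondary index instead of the table itself.
        params["IndexName"] = index_name
    return params


query = build_query_params("Player 1")
```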
Local secondary indexes (LSIs)
- Local secondary indexes should be used sparingly, since they can increase the amount of resources used.
- Limited storage of attributes in an LSI helps keep the size of the index down.
Global secondary indexes (GSI)
- Limited storage of attributes in GSIs may be a good option to ensure requests are performed efficiently.
- GSIs are a great way to create eventually consistent read replicas of a DynamoDB table as a means of reducing the load on the main table.
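Limiting the attributes stored in an index is done via its projection. The following sketch builds parameters for boto3's `update_table` call that add a GSI with only one extra attribute to an existing table. The table name, the index name and the choice of attributes are assumptions, and the sketch assumes a table with on-demand capacity, since a provisioned table would additionally need a `ProvisionedThroughput` entry for the index.

```python
def build_add_gsi_params(table_name="GameScores"):
    """Parameters for client.update_table that create a lean GSI (sketch)."""
    return {
        "TableName": table_name,
        # Key attributes of the new index have to be declared.
        "AttributeDefinitions": [
            {"AttributeName": "Game ID", "AttributeType": "S"},
            {"AttributeName": "Ranking", "AttributeType": "N"},
        ],
        "GlobalSecondaryIndexUpdates": [
            {
                "Create": {
                    "IndexName": "RankingByGame",
                    "KeySchema": [
                        {"AttributeName": "Game ID", "KeyType": "HASH"},
                        {"AttributeName": "Ranking", "KeyType": "RANGE"},
                    ],
                    # INCLUDE copies only the listed non-key attributes
                    # into the index, in addition to the key attributes.
                    "Projection": {
                        "ProjectionType": "INCLUDE",
                        "NonKeyAttributes": ["Award"],
                    },
                }
            }
        ],
    }


gsi_params = build_add_gsi_params()
```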
Conclusion
AWS DynamoDB is a robust, highly scalable NoSQL database for modern cloud applications. Because it offers flexible data modelling, ACID transaction support and a variety of consistency models, it can be used as a platform for a wide array of applications. In addition to that, dynamic pricing models, efficient key design and the intelligent use of indexes help keep the cost of scaling low. That being said, when creating a DynamoDB table, it is important to choose the right partition and sort keys and to define indexes to match the expected request patterns so that the table is used efficiently and does not drive up costs.