Introducing DynamoQuery: Python AWS DynamoDB ORM

DynamoDB is a great fit for serverless architectures: it is scalable and fast, it supports role-based permissions, and, most importantly, it is itself serverless.

However, a common barrier that keeps engineering teams from adopting DynamoDB is the lack of a widely used, generic, and flexible object-relational mapper (ORM) for it. This is especially striking in the Python ecosystem, which has great ORM tools for relational databases.

At Altitude Networks, we adopted a serverless architecture and have been using DynamoDB from day one. To fill the gap left by the lack of a Pythonic ORM for DynamoDB, our team developed its own library, DynamoQuery. We have been using DynamoQuery in our application for more than a year and have steadily improved it. We are now excited to open-source it so that the larger community can use it and contribute to it.

Key Features

  • Type hints: DynamoQuery has full typing support, per PEP 484.
  • Test coverage: DynamoQuery has full test coverage.
  • Support for partitioned tables and large data sets. At Altitude Networks, we use DynamoDB to store large data sets. To support such workloads, we have included features such as partitioned tables and iterators, which make DynamoQuery a suitable interface for projects of any size.
  • Built-in caching.
  • Pythonic interface. We were inspired by successful ORMs such as SQLAlchemy and Django, and built a Python interface that is familiar to Python developers.
  • BYOD. In designing DynamoQuery, we adopted the BYOD (“bring-your-own-data”) paradigm, which lets callers interact with DynamoDB using their own data objects. To support working with table-like data, we also defined a lightweight data class called DataTable (loosely inspired by pandas.DataFrame). Another key data type is DynamoRecord, which is a regular Python dict, so it can be used in boto3.client('dynamodb') calls directly.
  • Full feature support. In addition to the ORM, DynamoQuery provides access to the low-level DynamoDB interface via boto3.client and boto3.resource objects. This exposes the entire DynamoDB API, so developers can use the latest features as soon as AWS introduces them.
  • Pythonic logging. DynamoQuery registers and uses its own logger everywhere, so it is easy to integrate into a larger framework such as Django.
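
Because the library logs through its own named logger, a host application can tune that logger with the standard logging module alone. The following is a minimal sketch, not taken from the DynamoQuery documentation: the logger name "dynamo_query" is an assumption based on the import package name, and configure_library_logging is a hypothetical helper. Check the README for the actual logger name.

```python
import logging

# Assumption: the library's logger is named after its import package.
# Verify the real name in the project README before relying on this.
ASSUMED_LOGGER_NAME = "dynamo_query"


def configure_library_logging(level: int = logging.WARNING) -> logging.Logger:
    """Attach a stream handler to the (assumed) library logger and set its level."""
    logger = logging.getLogger(ASSUMED_LOGGER_NAME)
    handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    logger.setLevel(level)
    return logger


if __name__ == "__main__":
    library_logger = configure_library_logging(logging.DEBUG)
    # Stand-in for a message the library itself would emit.
    library_logger.debug("example message")
```

In a Django project, the same logger name would go under the "loggers" key of the LOGGING setting instead of being configured by hand.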

How to use and contribute

DynamoQuery is published on PyPI, so you can install it with pip by running the command:

python -m pip install dynamoquery

The package README file has information on how to use and contribute to the project:

https://github.com/altitudenetworks/dynamoquery

Here is a simple example of how to use DynamoQuery to define and interact with a DynamoDB table.

"""
Usage examples for `DynamoTable` class.
"""
from typing import Optional

import boto3
from mypy_boto3_dynamodb.service_resource import DynamoDBServiceResource, Table

from dynamo_query.dictclasses.dynamo_dictclass import DynamoDictClass
from dynamo_query.dynamo_table import DynamoTable
from dynamo_query.dynamo_table_index import DynamoTableIndex


class UserRecord(DynamoDictClass):
    pk: str
    project_id: str
    company: str
    email: str
    name: Optional[str] = None
    age: Optional[int] = None
    dt_created: Optional[str] = None
    dt_modified: Optional[str] = None

    @DynamoDictClass.compute_key("pk")
    def get_pk(self) -> str:
        return self.project_id

    @DynamoDictClass.compute_key("sk")
    def get_sk(self) -> str:
        return self.company


class UserDynamoTable(DynamoTable[UserRecord]):
    gsi_name_age = DynamoTableIndex("gsi_name_age", "name", "age", sort_key_type="N")
    global_secondary_indexes = [gsi_name_age]
    record_class = UserRecord

    read_capacity_units = 50
    write_capacity_units = 10

    @property
    def table(self) -> Table:
        resource: DynamoDBServiceResource = boto3.resource("dynamodb")
        return resource.Table("test_dq_users_table")  # pylint: disable=no-member


def main() -> None:
    user_dynamo_table = UserDynamoTable()
    user_dynamo_table.create_table()
    user_dynamo_table.wait_until_exists()
    user_dynamo_table.clear_table()

    user_dynamo_table.batch_upsert_records(
        [
            UserRecord(
                project_id="my_project",
                email="john_student@gmail.com",
                company="IBM",
                name="John",
                age=34,
            ),
            UserRecord(
                project_id="my_project",
                email="mary@gmail.com",
                company="CiscoSystems",
                name="Mary",
                age=34,
            ),
        ]
    )

    print("Get all records:")
    for user_record in user_dynamo_table.scan():
        print(user_record)

    print("Get John's record:")
    print(
        user_dynamo_table.get_record(
            UserRecord({"email": "john_student@gmail.com", "company": "IBM"})
        )
    )

    print("Query by a specific index:")
    print(
        list(
            user_dynamo_table.query(
                index=UserDynamoTable.gsi_name_age, partition_key="Mary", sort_key=34
            )
        )
    )

    print("Using iterators for batch methods:")
    record = UserRecord({"email": "john_student@gmail.com", "company": "IBM"})
    for full_record in user_dynamo_table.batch_get_records((i for i in [record])):
        print(full_record)

    user_dynamo_table.batch_upsert_records([record])
    user_dynamo_table.batch_delete_records((i for i in [record]))


if __name__ == "__main__":
    main()
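
The batch methods in the example above are called with generators rather than lists. The sketch below shows why that matters for large data sets: records can be grouped into request-sized batches lazily, without loading the whole input into memory. It is an illustration of the idea, not DynamoQuery's internal implementation. The chunked helper is hypothetical; the limit of 25 is the maximum number of put or delete requests that DynamoDB's BatchWriteItem accepts per call.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List

# DynamoDB's BatchWriteItem accepts at most 25 put/delete requests per call.
MAX_BATCH_WRITE_ITEMS = 25


def chunked(
    records: Iterable[Dict[str, Any]], size: int = MAX_BATCH_WRITE_ITEMS
) -> Iterator[List[Dict[str, Any]]]:
    """Yield lists of up to `size` records without materializing the input."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(records)
    while True:
        # islice pulls only the next `size` items from the underlying iterator.
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    # A generator stands in for a large data source.
    source = ({"pk": f"user-{i}", "age": i} for i in range(60))
    for batch in chunked(source):
        print(len(batch))
```

With 60 input records, the loop prints 25, 25, and 10: only one batch is held in memory at a time.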

Alternatives

There are a few open-source options for a Python interface to DynamoDB, but most of them are either no longer maintained or support only a fraction of DynamoDB features. The only open-source package we are aware of that has full feature support and is actively maintained is PynamoDB. It has nice features such as support for scans and queries, transactions, and polymorphism.

PynamoDB requires defining a Model class based on the DynamoDB table data, and introduces methods on the Model class to view or interact with the table data. While this pattern is familiar to developers who have used traditional Model-View-Controller (MVC) architectures, it limits their ability to use their data store of choice. For instance, if you read data from an external source such as a relational database or JSON blob, you first need to store it as a Model object before you can work with DynamoDB. If you come from a web application, MVC, or Django background, this abstraction may be desirable. However, if you work with data pipelines, large data sets, and application programming, we believe DynamoQuery is more flexible and scales better with your application’s growing needs.
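
To make the JSON blob case concrete, here is a minimal sketch of the bring-your-own-data side of that workflow: external JSON is parsed into plain Python dicts, with no model class in between. The iter_records helper, the sample payload, and the choice of required keys are all hypothetical and only mirror the field names from the example earlier in this post. Whether a given DynamoQuery method accepts such dicts as they are or expects them wrapped in a record class is not shown here; consult the README for that.

```python
import json
from typing import Any, Dict, Iterator

# Hypothetical payload, e.g. the body of an API response or an exported file.
RAW_JSON = """
[
  {"project_id": "my_project", "company": "IBM", "email": "john_student@gmail.com", "age": 34},
  {"project_id": "my_project", "company": "CiscoSystems", "email": "mary@gmail.com", "age": 34}
]
"""

# Assumption: these fields feed the table keys, as in the earlier example.
REQUIRED_KEYS = ("project_id", "company")


def iter_records(raw: str) -> Iterator[Dict[str, Any]]:
    """Yield one plain dict per JSON object, checking that key fields are present."""
    for item in json.loads(raw):
        missing = [key for key in REQUIRED_KEYS if key not in item]
        if missing:
            raise ValueError(f"record is missing key fields: {missing}")
        yield dict(item)


if __name__ == "__main__":
    for record in iter_records(RAW_JSON):
        # Each record is an ordinary dict, ready to hand to a storage layer.
        print(record["company"], record["email"])
```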