Using Appcues in both your web and mobile applications allows you to deliver beautiful, contextual, targeted messaging to your users - and drive deeper adoption of your products.
As users navigate through our customers' products, the Appcues SDK generates events that help our customers target Appcues experiences. We store these events in AWS DynamoDB, a NoSQL database with strong performance, scalability, and high availability.
Problem to solve
Our monthly write costs for the AWS DynamoDB table used to store user events were far higher than we would have liked - around five times higher.
We needed to reduce our DynamoDB costs so we could allocate the savings to build more cool new features for our customers. Alternatives to DynamoDB didn’t have the performance we needed, and frequently they cost more too.
This post describes how we optimized our DynamoDB write operations to reduce our costs.
Our DynamoDB usage in a nutshell
The DynamoDB table that stores our user events data uses a simple schema: user_id is the partition key and event_name is the sort key. The table stores every event sent to our backend, with a list of timestamps for each occurrence. With this information, the system can query how many times an event has previously occurred.
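To make the legacy schema concrete, here is a minimal sketch of what one of these items might look like. The field names (user_id, event_name, timestamps) come from the post; the concrete values are purely illustrative, not real data:

```python
# Sketch of the legacy item shape: one item per (user, event),
# with every occurrence appended to a single timestamps list.
legacy_item = {
    "user_id": "user-123",           # partition key
    "event_name": "button_clicked",  # sort key
    "timestamps": [                  # ever-growing list of occurrences
        "2023-01-05T10:00:00Z",
        "2023-01-05T10:02:31Z",
        "2023-01-06T09:15:12Z",
    ],
}

# Counting past occurrences means reading the whole list:
occurrences = len(legacy_item["timestamps"])
print(occurrences)  # → 3
```

Note that every new occurrence rewrites this one item, and the item only ever gets bigger.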
Unfortunately, this straightforward schema became costly over time. Why?
The team started looking at various metrics, but they didn't provide any insights. Several calls with our AWS support team didn't help either.
We built custom graphs derived from AWS data and our system metrics to investigate further. When we analyzed one of these custom charts, we found that the average write capacity consumed per request was around 35 to 40 units, even when the only change was appending a single timestamp to the list. That meant that on every request we were updating a table item of about 40 KB - an expensive operation to perform that frequently!
Since the timestamps form an ever-growing list, the cost of writing any modification to an item also grows linearly with the number of events recorded.
From DynamoDB docs: "A write capacity unit represents one write per second for an item up to 1 KB in size. Item sizes for writes are rounded up to the next 1 KB multiple. For example, writing a 500-byte item consumes the same throughput as writing a 1 KB item."
Even updating a single attribute of an item - even a boolean field - consumes DynamoDB write capacity based on the whole item size (surprise!). From the UpdateItem docs: "DynamoDB considers the size of the item as it appears before and after the update. The provisioned throughput consumed reflects the larger of these item sizes. Even if you update just a subset of the item's attributes, UpdateItem will still consume the full amount of provisioned throughput (the larger of the "before" and "after" item sizes)."
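The two docs rules above (round up to 1 KB, charge for the larger of the before/after sizes) are enough to reproduce the ~40-unit figure we were seeing. A small sketch, with illustrative byte sizes:

```python
import math

def write_units_consumed(before_size_bytes: int, after_size_bytes: int) -> int:
    """Write units for an UpdateItem, per the DynamoDB docs quoted above:
    the larger of the before/after item sizes, rounded up to the next 1 KB."""
    larger = max(before_size_bytes, after_size_bytes)
    return max(1, math.ceil(larger / 1024))

# Appending one ~25-byte timestamp to an item that is already ~40 KB
# still bills for the full item:
print(write_units_consumed(40_000, 40_025))  # → 40

# Writing a small standalone item costs a single unit:
print(write_units_consumed(0, 120))  # → 1
```

So a tiny append to a large list-of-timestamps item costs the same as rewriting the whole item.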
We then started analyzing the patterns of events for the customers sending the most data. We discovered that, while most events were sent to our backend a few times per user, some would be sent thousands of times. These events were increasing our DynamoDB costs because the event timestamp list would get bigger (and more expensive to write) every time we stored them.
This issue was hard to detect since most table items had only a few timestamps in the events list. After identifying which events were sent most often for a specific account, we found examples of the DynamoDB table items driving costs up. When our system received thousands of occurrences of the same event, it created very large DynamoDB records that required multiple write units to record each new occurrence.
This table was expensive because, on average, it consumed 40 write units every time we updated a table item. We experienced the perfect cost storm given DynamoDB pricing is based on consumed capacity and our system is write-heavy.
Armed with this information we were ready to plan a solution.
After spending some time scratching our heads at a virtual whiteboard, we proposed a solution: store the user event timestamps not as a list, but in a separate table where the timestamp is the sort (range) key and the partition key is a composite of user_id and event_name.
This schema was a fundamental shift in how the system stored the user's data. Instead of appending a list of timestamps, now each timestamp would be stored as a separate item. These are some advantages of this schema:
- Writing a timestamp has a fixed cost of one write unit, because each item consists of just the composite key (user_id + event_name) and the timestamp range key. This keeps the cost of writing event timestamps constant, reducing our costs for this table by an order of magnitude.
- It enabled us to make better use of DynamoDB’s filter clauses and count, by leveraging the timestamp as the range key. Rather than handling the logic in code, we used a DynamoDB query to retrieve all items or a count where the timestamp is greater than a provided value.
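The sketch below simulates the new schema's write and query paths in plain Python (an in-memory dict stands in for the table; the helper names and the `#` key separator are our illustrative choices, not DynamoDB API). The count function mirrors what a DynamoDB Query with a `KeyConditionExpression` on the timestamp and `Select='COUNT'` would do:

```python
from bisect import bisect_right
from collections import defaultdict

# In-memory stand-in for the new table:
# composite partition key -> timestamps (the sort key keeps items ordered).
table = defaultdict(list)

def record_event(user_id: str, event_name: str, ts: str) -> None:
    """Each write is one small item, so it costs a fixed 1 write unit."""
    key = f"{user_id}#{event_name}"  # composite partition key
    table[key].append(ts)

def count_events_after(user_id: str, event_name: str, after_ts: str) -> int:
    """Analogue of Query with KeyConditionExpression ts > :after and
    Select='COUNT' - no application-side filtering needed."""
    items = sorted(table[f"{user_id}#{event_name}"])
    return len(items) - bisect_right(items, after_ts)

record_event("user-123", "button_clicked", "2023-01-05T10:00:00Z")
record_event("user-123", "button_clicked", "2023-01-06T09:15:12Z")
record_event("user-123", "button_clicked", "2023-01-07T08:00:00Z")

print(count_events_after("user-123", "button_clicked", "2023-01-05T23:59:59Z"))  # → 2
```

Because ISO-8601 timestamps sort lexicographically, the range-key comparison doubles as a time comparison.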
We validated this solution by running multiple inserts via CLI to ensure that the consumed write units were constant and that we could query all the information we wanted, like count, filtering, etc.
We didn't want a big-bang migration that would take a ton of time, so we changed our application logic to read from the new schema first and fall back to the legacy schema if it found no data. The legacy table became a read-only database, since the application now writes only to the new table.
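The read-path fallback can be sketched as follows. This is a minimal simulation with dicts standing in for the two tables; the function name and table representations are our assumptions, not the production code:

```python
def get_event_timestamps(user_id, event_name, new_table, legacy_table):
    """Read new-schema data first; consult the legacy table only when the
    new table has nothing for this (user, event) pair."""
    key = f"{user_id}#{event_name}"
    new_items = new_table.get(key, [])
    if new_items:                      # new schema: one item per timestamp
        return sorted(new_items)
    legacy = legacy_table.get((user_id, event_name))
    return legacy["timestamps"] if legacy else []  # legacy: one item, list attr

# Illustrative data: one event already migrated, one only in the legacy table.
new_table = {"user-9#page_viewed": ["2023-03-01T12:00:00Z"]}
legacy_table = {("user-9", "signed_up"): {"timestamps": ["2022-11-20T08:30:00Z"]}}

print(get_event_timestamps("user-9", "page_viewed", new_table, legacy_table))
print(get_event_timestamps("user-9", "signed_up", new_table, legacy_table))
```

Since all new writes land in the new table, the legacy data naturally becomes less and less relevant over time without an explicit backfill.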
Rolling it out
After making the necessary changes in our codebase, we began testing this new solution on a few test accounts first.
Next, we rolled out the newly designed schema to a sampling of accounts to validate that our solution used significantly less write capacity. Even though the new schema was only ingesting about 25% of the incoming events during the rollout period, the write capacity it consumed was far less than that of the legacy table.
After validating our assumptions, we rolled it out to all accounts.
The new DynamoDB table schema we implemented has proven highly cost-effective for our company. A 97.4% savings! This has allowed us to allocate these funds toward other essential business areas.
If you're struggling with high AWS bills due to the write-heavy usage of DynamoDB, you're not alone. Many companies face this challenge, but there are ways to reduce costs and optimize usage.
First and foremost, it's essential to understand the pricing model of DynamoDB. DynamoDB pricing is influenced by consumed capacity, and with write-heavy usage, costs can quickly add up. One common issue we faced was the cost of writing a single item, which could consume multiple write units even for a small update.
Additionally, constantly updating large items can significantly increase costs over time. We recommend designing a proper table schema to address these issues and avoid updating large items frequently. In our case, we shifted from appending to a list of timestamps to storing each timestamp as a separate item. This reduced our costs by an order of magnitude and let us use DynamoDB's more efficient filter clauses and count. Even duplicating stored data across multiple tables is an option when a single simple table schema can't serve all your access patterns.
Take advantage of DynamoDB's query count (Select='COUNT') instead of keeping a count attribute, which requires multiple updates and consumes write capacity.
Monitoring the average write/read size on a DynamoDB table is also important. This is key to controlling costs and may not be obvious on the provided AWS graphs.
So, if you're experiencing high AWS bills due to the write-heavy usage of DynamoDB, don't worry. There are ways to optimize usage and reduce costs. By following the tips we've shared and monitoring your usage, you may be able to decrease DynamoDB costs significantly.