How to simplify distributed app development with CRDTs

Scaling applications globally has many challenges, many of which have to do with how to handle simultaneous updates to the same data set. Imagine a business account being used for transactions by multiple people in different locations, or an organization that ships products globally having to deal with multiple requests for the same item.

Developers will often try to solve this problem by making the same application scale to multiple geographies using brute force. They handle every condition at the application level and use locks, semaphores, and other mechanisms to force applications out of concurrent update situations.

There is a better way: Let the database do all the heavy lifting.

Advances in distributed computing have introduced the notion of CRDTs (conflict-free replicated data types). These data structures allow multiple replicas in different regions to mathematically resolve to the same state without coordination or slowing down individual regional replicas.

Using CRDTs in applications simplifies global distribution challenges enormously, allowing developers to write applications as if for a single region and to deploy globally in a hassle-free manner.

Here are the top techniques and scenarios where CRDTs can be used to radically simplify the implementation of globally distributed applications.

Gartner Magic Quadrant for Software Test Automation 2017

A common global scaling scenario

Traditionally, applications deployed in several different locations required developers to handle conflict scenarios related to shared datasets at the application level. But CRDTs have lifted this burden. 

Imagine someone shopping at an online store on a mobile phone while riding (hopefully not driving) in a car traveling a long distance through the Bay Area in California. As the person moves, the online store serves the user from multiple data centers to ensure the best possible response times as they interact with the application. The San Jose data center might serve the user at the south end of the Bay Area, while the San Francisco data center takse over as the user nears that city.

When the user browses the catalog and adds items to the shopping cart (while being served from San Jose), the same items must be replicated to other data centers, including San Francisco. When the user moves north, the app will be served from San Francisco, so additional shopping cart items should be added to the shared dataset and must not overwrite previous items.

The shopping cart must show all the items, not just the new ones. This task is typically accomplished at the application level while providing specific instructions to how shared copies of the data must be updated.

When this scenario is implemented as a CRDT, all of these steps are completed correctly and automatically. The database handles updates to a set that contains the shopping cart items, regardless of which data center they are added from, while also replicating multiple active copies.

Here are the shopping cart updates received while on the San Jose data center:

+ shoes

+ sandals

+ bag

And here are the updates received while on the San Francisco data center:

- shoes

+ jacket

+ scarf

Here is the desired state of the shopping cart in every active copy:

  • sandals
  • shoes
  • bag
  • jacket
  • scarf 

Some databases that use only passive copies in multiple data centers require that the end user's writes (adds to or deletions from the shopping cart) always be routed to the master active copy, which usually results in unacceptable latencies. Other databases that maintain multiple active copies of the data in each data center require updates to be propagated to a certain number of copies before acknowledging the write. This also imposes latencies. But it also typically means the app is unable to present a consistent view of the shared dataset to the user as he or she moves between locations.

CRDT-based databases ensure that multiple active copies present accurate views of the shared datasets at low latencies, while also providing highly intelligent mechanisms for dealing with simultaneous updates to the same shared dataset.

Using CRDTs

Redis and Riak are two open-source NoSQL data stores that support CRDT data structures. AWS Dynamo and Redis Enterprise are two commercial databases that support it as well. Here’s what the scenario would look like using Redis Sets to store shopping cart items:

Figure: Redis Sets in Redis Enterprise

Now, imagine a shared mobile account being used by family members in different locations. As dad uses his phone in California while grandpa is calling his friend from New York and mom checks her messages during a visit to Oregon, the account must be updated simultaneously with increments to the total minutes consumed. If there are multiple active copies of the account meter being maintained in California, New York, and other data center locations, updates should not clash and overwrite each other. With a CRDT implementation, simple increments of the distributed counter are additive.  


This kind of CRDT implementation is highly efficacious because it gracefully handles changes in the states of data types (for which multiple active copies are maintained) and provides a consistent view of the shared data without the need for any conflict resolution. All active copies eventually converge to the same state. And, when implemented with an in-memory database such as Redis, any lags happen over a minuscule time period.

Get up to speed on CRDT implementations

Now that you understand the advantages that you can gain from using CRDTs in your mobile apps, consider reading this more technical introduction to state-based CRDTs. Data management in distributed systems is going to become increasingly harder as more organizations move their applications to microservices architectures.

How has your team put CRDTs to work for geographically-distributed apps? Share your experiences or questions in the comments below.

Gartner Magic Quadrant for Software Test Automation 2017
Topics: App DevDevOps