Serious Consistency Shit For Google App Engine Datastore

June 28, 2013

Based on Structuring Data for Strong Consistency

The Google App Engine’s High Replication Datastore (HRD) provides high availability for your reads and writes by storing data synchronously in multiple datacenters. However, the delay from the time a write is committed until it becomes visible in all datacenters means that queries across multiple entity groups (non-ancestor queries) can only guarantee eventually consistent results. Consequently, the results of such queries may sometimes fail to reflect recent changes to the underlying data.

What does this mean? If you don't use an ancestor/entity group for your model, consistency is not ensured (e.g. if you add a model and expect it to show up in the very next query, good luck!)
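To make the difference concrete, here is a toy model in plain Python. It is not the Datastore API: `ToyDatastore`, `replicate`, `ancestor_query` and `global_query` are all made-up names, and the "replication" step is a deliberate oversimplification of what HRD does. It only shows the behaviour described above, where a query scoped to one entity group sees a write at once and a query across groups sees it later.

```python
class ToyDatastore(object):
    """Toy model only: not how HRD is implemented."""

    def __init__(self):
        self._groups = {}        # ancestor -> entities in that entity group
        self._global_index = []  # what non-ancestor queries can see
        self._pending = []       # written, but not yet visible globally

    def put(self, ancestor, entity):
        # The write is visible inside its own entity group right away.
        self._groups.setdefault(ancestor, []).append(entity)
        self._pending.append(entity)

    def replicate(self):
        # Stand-in for the delay before a write is visible everywhere.
        self._global_index.extend(self._pending)
        self._pending = []

    def ancestor_query(self, ancestor):
        # Strongly consistent: scoped to a single entity group.
        return list(self._groups.get(ancestor, []))

    def global_query(self):
        # Eventually consistent: may miss recent writes.
        return list(self._global_index)


store = ToyDatastore()
store.put("guestbook-1", "hello")
fresh = store.ancestor_query("guestbook-1")  # already contains "hello"
stale = store.global_query()                 # still empty at this point
store.replicate()
caught_up = store.global_query()             # now contains "hello"
```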

What is the drawback of using ancestor/entity group?

It’s read-only: the ancestor is part of the entity's key, so you cannot change the ancestor/entity group after the model is created.

This approach achieves strong consistency by writing to a single entity group per guestbook, but it also limits changes to the guestbook to no more than 1 write per second (the supported limit for entity groups).

I am not sure exactly what “1 write per second” means, but I assume the worst case, where it would take 10 seconds to make 10 writes to guestbooks in the same ancestor/entity group. OMG!

Is strong consistency important?

If you just added a model, you can’t expect it to be available in the next query (unless the two operations are at least a few seconds apart), e.g. this case.

If you want to enforce a unique property/field, it gets tricky. Why not use ID/Key Name? Because these are read-only, and even an ISBN for a book could change. There is no sign of GAE supporting unique fields yet.

How to achieve strong consistency without using ancestor/entity group (performance penalty)?

Does NDB caching help?

Queries do not look up values in any cache.

To generate a unique property, use ID/Key Name with Model.get_or_insert (drawback: read-only).

Do transactions ensure consistency?

All Datastore operations in a transaction must operate on entities in the same entity group if the transaction is a single group transaction, or on entities in a maximum of five entity groups if the transaction is a cross-group (XG) transaction.

Caching (memcache) seems to be one of the most viable options: store the recent models (those affected by the current user/session) that are needed to ensure consistency (custom code required).

If your application is likely to encounter heavier write usage, you may need to consider using other means: for example, you might put recent posts in a memcache with an expiration and display a mix of recent posts from the memcache and the Datastore, or you might cache them in a cookie, put some state in the URL, or something else entirely.
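A minimal sketch of the "mix of recent posts" idea, assuming each post is a dict with an `id` and a `created` timestamp (both field names are my assumption, as is the `merge_recent` helper). The cached list would come from memcache and the queried list from a Datastore query; here they are plain lists so the merging logic can be seen on its own.

```python
def merge_recent(cached_posts, queried_posts, limit=10):
    """Combine cached and queried posts, newest first, without duplicates."""
    by_id = {}
    # Queried posts go in first so that a cached copy of the same post,
    # which is at least as fresh, replaces the queried one.
    for post in list(queried_posts) + list(cached_posts):
        by_id[post["id"]] = post
    merged = sorted(by_id.values(), key=lambda p: p["created"], reverse=True)
    return merged[:limit]


# The query has not caught up with post 3 yet, but the cache has it.
queried = [{"id": 1, "created": 100}, {"id": 2, "created": 200}]
cached = [{"id": 2, "created": 200}, {"id": 3, "created": 300}]
page = merge_recent(cached, queried)
```

Once the cache entry expires, the query results are expected to have caught up, so the post stays visible either way.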

How to ensure a Unique Property?

  1. Before setting the property, check whether the value already exists in RecentValueCache (e.g. store the latest values in memcache, expiring in 10s). If it already exists in RecentValueCache, raise UniqueConstraintException.
  2. Store the value in RecentValueCache (this locks the value immediately and temporarily prevents further use; the memcache entry expires in 10s).
  3. Check whether the value already exists by performing a query. If it exists, raise UniqueConstraintException.
  4. If not, proceed to save/put.
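The steps above can be sketched roughly as follows. This runs in plain Python with stand-ins: `RecentValueCache` here is an in-memory dict rather than memcache, and `put_unique`, `query_exists` and `save` are hypothetical names for the function and the two callables that would wrap the real query and put. Steps 1 and 2 are folded into a single `add` call, on the assumption that the real cache offers an "add only if absent" operation (memcache's `add` behaves this way, as far as I know).

```python
import time


class UniqueConstraintException(Exception):
    """Raised when a value is already taken."""


class RecentValueCache(object):
    """In-memory stand-in for memcache; reservations expire after ttl seconds."""

    def __init__(self, ttl=10, clock=time.time):
        self._ttl = ttl
        self._clock = clock
        self._expires = {}  # value -> expiry timestamp

    def add(self, value):
        """Reserve value; return False if a live reservation already exists."""
        now = self._clock()
        expiry = self._expires.get(value)
        if expiry is not None and expiry > now:
            return False
        self._expires[value] = now + self._ttl
        return True


def put_unique(value, cache, query_exists, save):
    # Steps 1 and 2: check the recent-value cache and reserve in one call.
    if not cache.add(value):
        raise UniqueConstraintException(value)
    # Step 3: ask the (eventually consistent) query about older values.
    if query_exists(value):
        raise UniqueConstraintException(value)
    # Step 4: no conflict seen, so save.
    save(value)


stored = set()  # pretend datastore
cache = RecentValueCache(ttl=10)
put_unique("isbn-1", cache, stored.__contains__, stored.add)
```

A second `put_unique("isbn-1", ...)` within 10 seconds is rejected by the cache, and after that by the query. Note that the dict-based stand-in is not itself safe across threads or processes; it only illustrates the flow.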

This method is not 100% safe if the code can run concurrently in multiple threads or processes in a web environment (e.g. with threadsafe: true in app.yaml, or when launching tasks), because two requests can both pass the checks before either one saves.

Thread-safety

Put the above code in a function and use Python thread locking; see "python lock method annotation" and "What are some common uses for Python decorators?".

I doubt Python thread locking plays an important role here, since Google App Engine could launch multiple instances, which might be in different processes or on different servers.
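For reference, a lock decorator along those lines might look like this. `synchronized` and `claim` are names I made up for the sketch, and `_taken` is an in-memory set standing in for the real uniqueness check. As noted above, the lock only serialises threads inside one process, so it does nothing across instances.

```python
import functools
import threading


def synchronized(lock):
    """Decorator factory: run the wrapped function while holding lock."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with lock:
                return func(*args, **kwargs)
        return wrapper
    return decorator


_unique_lock = threading.Lock()
_taken = set()  # stand-in for the check-then-save logic


@synchronized(_unique_lock)
def claim(value):
    """Return True if value was free and is now claimed, else False."""
    if value in _taken:
        return False
    _taken.add(value)
    return True
```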

PS: I might have missed out a few things due to my limited understanding, do suggest :)

This work is licensed under a
Creative Commons Attribution-NonCommercial 4.0 International License.