Google App Engine Datastore Composite Index Zigzag Merge Join

When a entity model class is defined, basic property class such as StringProperty, IntergerProperty, DateTimeProperty, etc are indexed by default (you can disable it by using indexed=False). TextProperty is not indexed.

class Read(Model):    posted_date = ndb.DateTimeProperty()    name = ndb.StringProperty(indexed=False)    description = ndb.TextProperty()    parent_key = ndb.KeyProperty()    is_active = ndb.BooelanProperty(default=True)

Only indexed property can be queried.

items = Read.query(Read.is_active == True).fetch()

NOTE: if you declare a new property after created N items, the new property query doesn't work for old items until you save/put the old items.

When you need to perform sorting on the queried result, you would need to declare composite index.

Edit index.yaml.

- kind: Read  properties:  - name: is_active  - name: posted_date    direction: desc
items = Read.query(Read.is_active == True).order(-Read.posted_date).fetch()

NOTE: To make sure all the necessary composite index are declared during development (no auto generation), use dev_appserver.py --require_indexes.

When you need query Read.is_active and Read.parent_key, you have 2 options.

Option 1: Create a full index. The benefit of this index is best performance.

- kind: Read  properties:  - name: is_active  - name: parent_key  - name: posted_date    direction: desc

It only works for the following query (both query property must be specified).

items = Read.query(Read.is_active == True, Read.parent_key == key).order(-Read.posted_date).fetch()

Options 2: Create 2 partial index. The performance is slower as it might involve multiple query using zigzag merge join algorithm.

- kind: Read  properties:  - name: is_active  - name: posted_date    direction: desc- kind: Read  properties:  - name: parent_key  - name: posted_date    direction: desc    

An extra benefit of option 2 is that all the following query are supported.

# query by is_activeitems = Read.query(Read.is_active == True).order(-Read.posted_date).fetch()# query by parent_keitems = Read.query(Read.parent_key == key).order(-Read.posted_date).fetch()# query by is_active and parent_keyitems = Read.query(Read.is_active == True, Read.parent_key == key).order(-Read.posted_date).fetch()

For best performance and flexibility, you can declare all 3 composite index.

- kind: Read  properties:  - name: is_active  - name: posted_date    direction: desc- kind: Read  properties:  - name: parent_key  - name: posted_date    direction: desc - kind: Read  properties:  - name: is_active  - name: parent_key  - name: posted_date    direction: desc

There are also other considerations (besides performance and flexibility) such as index limit and size. Read the references section for a better understanding.

NOTE: It is advisable not to declare composite index which you don't intent to use.

References:

❤️ Is this article helpful?

Buy me a coffee ☕ or support my work via PayPal to keep this space 🖖 and ad-free.

Do send some 💖 to @d_luaz or share this article.

✨ By Desmond Lua

A dream boy who enjoys making apps, travelling and making youtube videos. Follow me on @d_luaz

👶 Apps I built

Travelopy - discover travel places in Malaysia, Singapore, Taiwan, Japan.