Create Idempotent Cloud Functions for Firestore Trigger (Python)

June 27, 2019

I am writing a Cloud Function that listens to Firestore changes. As of now, there is no guarantee that the trigger fires exactly once, so I need to do my own checking.

We can use the Event ID as the key to check whether this function has been executed before.

One way to fix this is to use the event ID, a value that uniquely identifies an event that triggers a background function and, importantly, remains unchanged across function retries for the same event.

def test_idempotent(data, context):
    event_id = context.event_id # d0b97305-85a8-4e02-943c-9021454d618f-0

Then we need a storage mechanism to store and check for this Event ID.

  • Firestore - Free Tier: 1GB Storage, 50K Read / 20K Write per Day
  • Datastore (Cloud Firestore in Datastore mode) - Free Tier: 1GB Storage, 50K Read / 20K Write per Day
  • Memorystore - No Free Tier, $0.049 per GB per hour
  • Cloud SQL - No Free Tier, $0.0150 per hour

I will use Firestore as I am familiar with it.

Solution 1: Create Document with Event ID

Create a document using the Event ID as the Document ID. Proceed only if a document with that Event ID does not already exist.

Refer: https://cloud.google.com/blog/products/serverless/cloud-functions-pro-tips-building-idempotent-functions

Assuming we have to create a document for idempotency purposes:

  • A document is created for each request/Event ID (read + write cost), with a timestamp for deletion at a later stage.
  • We will need to delete these documents eventually (additional read/query + delete cost).

Solution 2: Create Single Document storing list of Event IDs

We can create a single document storing a list of Event IDs.

  • Each request appends its Event ID with a timestamp into the document (same read + write cost as Solution 1). Use a transaction and a partial update.
  • Create a daily batch job to delete Event IDs older than 1 day (involves only 1 read + 1 write per day, cheaper than Solution 1).
  • Each document has a 1MB size limit and a 40K index entry limit (assuming each field has 2 index entries (ascending & descending), each document supports 20K indexed fields). We could disable indexing since we don't plan to query the document, shard across multiple documents based on Event ID, or increase the deletion frequency to once an hour.
  • Might suffer from the maximum write rate to a document of 1 per second, with a 60-second idle expiration time for transactions. Probably not suited to functions triggered more than 60 times per second, unless sharded across multiple documents.
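The sharding mentioned above could be as simple as hashing the Event ID into a fixed number of shard documents; a sketch (the shard count and the document naming are my own assumptions):

```python
import zlib

NUM_SHARDS = 8  # assumption: tune to the expected write rate


def shard_doc_id(event_id, num_shards=NUM_SHARDS):
    # Deterministic hash so retries of the same event always map to the
    # same shard document, keeping the duplicate check correct.
    shard = zlib.crc32(event_id.encode('utf-8')) % num_shards
    return f"test_idempotent_{shard}"
```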
import datetime
import logging

from google.cloud import firestore

log = logging.getLogger(__name__)

firestore_client = firestore.Client()

def test_idempotent(data, context):
    transaction = firestore_client.transaction()
    idempotent_ref = firestore_client.collection('function_idempotent').document('test_idempotent')

    @firestore.transactional
    def check_idempotent(transaction, idempotent_ref, event_id):

        idempotent_doc = idempotent_ref.get(transaction=transaction)

        # Event IDs (e.g. d0b97305-85a8-4e02-943c-9021454d618f-0) contain
        # hyphens, so escape with backticks to comply with field path constraints
        field_path = f"data.`{event_id}`"
        try:
            if idempotent_doc.get(field_path):
                return False  # Event ID already recorded: duplicate delivery
        except KeyError:
            pass  # field not found: first time we see this Event ID
        except ValueError as e:
            log.error(e)

        now = datetime.datetime.utcnow()
        if idempotent_doc.exists:
            transaction.update(idempotent_ref, {field_path: now, 'modified': now})
        else:
            # update() fails on a missing document, so create it on first use
            transaction.set(idempotent_ref, {'data': {event_id: now}, 'modified': now})
        return True

    if not check_idempotent(transaction, idempotent_ref, context.event_id):
        log.warning(f"Event {context.event_id} already processed, quit.")
        return

    # do something
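The daily batch deletion could look like the sketch below. The `drop_expired` helper and the `daily_cleanup` name are mine; the collection and document names follow the example above:

```python
import datetime


def drop_expired(event_ids, cutoff):
    # Pure helper: keep only Event IDs recorded at or after the cutoff.
    return {event_id: ts for event_id, ts in event_ids.items() if ts >= cutoff}


def daily_cleanup(data, context):
    from google.cloud import firestore  # deferred: needs credentials at runtime

    client = firestore.Client()
    ref = client.collection('function_idempotent').document('test_idempotent')
    doc = ref.get()  # the 1 read per day
    if not doc.exists:
        return
    # Firestore returns timezone-aware datetimes, so compare against an
    # aware cutoff.
    now = datetime.datetime.now(datetime.timezone.utc)
    fresh = drop_expired(doc.to_dict().get('data', {}), now - datetime.timedelta(days=1))
    # the 1 write per day: rewrite the map with only the fresh Event IDs
    ref.set({'data': fresh, 'modified': now})
```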

Solution 3: Don’t Create a Document for Idempotency

Assuming our Cloud Functions are going to update some document anyway, we can store the Event ID in the existing documents (to reduce the additional read + write cost): either use the Event ID as the Document ID, or store a list of Event IDs in the document. The downside is the data and code will be messier.
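A sketch of Solution 3, assuming a hypothetical orders collection that the function updates anyway; the collection, document, and field names here are all illustrative:

```python
def is_duplicate(doc_data, event_id):
    # Pure helper: has this Event ID already been recorded on the document?
    return event_id in (doc_data or {}).get('processed_events', [])


def on_order_change(data, context):
    from google.cloud import firestore  # deferred: needs credentials at runtime

    client = firestore.Client()
    order_ref = client.collection('orders').document('some-order')  # hypothetical document
    transaction = client.transaction()

    @firestore.transactional
    def update_once(transaction, order_ref, event_id):
        snapshot = order_ref.get(transaction=transaction)
        if is_duplicate(snapshot.to_dict(), event_id):
            return False  # duplicate delivery: skip
        transaction.update(order_ref, {
            'status': 'processed',  # the real business update goes here
            # append the Event ID to the document's own list
            'processed_events': firestore.ArrayUnion([event_id]),
        })
        return True

    update_once(transaction, order_ref, context.event_id)
```

Because the Event ID list and the business fields are written in the same transaction, they change atomically, so a retry can never see the business update without the recorded Event ID.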

This work is licensed under a
Creative Commons Attribution-NonCommercial 4.0 International License.