I am writing a Cloud Function that listens to Firestore changes. As of now, there is no guarantee that the trigger will fire only once, so I need to do my own de-duplication check.
We can use the Event ID as the key to check whether this function has already been executed. The event ID uniquely identifies the event that triggers a background function and, importantly, remains unchanged across function retries for the same event.
```python
def test_idempotent(data, context):
    event_id = context.event_id  # d0b97305-85a8-4e02-943c-9021454d618f-0
```
Then we need a storage mechanism to record and look up these Event IDs. Some options on Google Cloud:
- Firestore - Free Tier: 1GB Storage, 50K Read / 20K Write per Day
- Datastore (Cloud Firestore in Datastore mode) - Free Tier: 1GB Storage, 50K Read / 20K Write per Day
- Memorystore - No Free Tier, $0.049 per GB per hour
- Cloud SQL - No Free Tier, $0.0150 per hour
I will use Firestore as I am familiar with it.
Solution 1: Create Document with Event ID
Create a document using the Event ID as the Document ID, and proceed only if the document does not already exist (a minimal sketch follows the list below).
Since we have to create a document purely for idempotency:
- A document is created for each request/Event ID (read + write cost), with a timestamp so it can be deleted at a later stage.
- We will need to delete them eventually (additional read/query + delete cost).
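A minimal sketch of this approach using the google-cloud-firestore client; the collection name `function_idempotent` and the `created` field are just illustrative, and `create()` raises `AlreadyExists` when a document with the same ID already exists:

```python
import datetime

from google.api_core.exceptions import AlreadyExists
from google.cloud import firestore

firestore_client = firestore.Client()


def test_idempotent(data, context):
    # One idempotency document per event, keyed by the Event ID.
    doc_ref = firestore_client.collection('function_idempotent').document(context.event_id)
    try:
        # create() only succeeds if the document does not exist yet, so only
        # the first delivery of this event gets past this point.
        doc_ref.create({'created': datetime.datetime.utcnow()})
    except AlreadyExists:
        return  # already processed, quit

    # do something
```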
Solution 2: Create Single Document storing list of Event IDs
We can create a single document storing a list of Event IDs.
- Each request appends its Event ID with a timestamp to the document (same read + write cost as Solution 1), using a transaction and a partial update.
- Create a daily batch job to delete Event IDs older than 1 day (involves only 1 read + 1 write per day, cheaper than Solution 1); a sketch of this cleanup follows the code below.
- Each document has a 1MB size limit and a 40K index-entry limit (assuming each field gets 2 index entries, ascending & descending, that is roughly 20K indexed fields per document). We could disable indexing since we don't plan to query the document, shard across multiple documents based on Event ID, or increase the deletion frequency to once an hour.
- Might suffer from the maximum sustained write rate to a single document of 1 per second, combined with the 60-second idle expiration time for transactions. Probably not suited to Cloud Functions triggered more than about once per second unless we shard across multiple documents.
```python
import logging
import datetime

from firebase_admin import firestore

log = logging.getLogger(__name__)
firestore_client = firestore.Client()


def test_idempotent(data, context):
    transaction = firestore_client.transaction()
    # Single shared document holding all Event IDs for this function.
    # Note: it must already exist, since transaction.update() fails on a missing document.
    idempotent_ref = firestore_client.collection('function_idempotent').document('test_idempotent')

    @firestore.transactional
    def check_idempotent(transaction, idempotent_ref, event_id):
        idempotent_doc = idempotent_ref.get(transaction=transaction)
        # event_id looks like d0b97305-85a8-4e02-943c-9021454d618f-0;
        # escape it with backticks to comply with field path constraints
        event_id = f"data.`{event_id}`"
        try:
            if idempotent_doc.get(event_id):
                return False  # already recorded, tell the caller to quit
        except KeyError:
            pass  # field not present yet: first time we see this Event ID
        except ValueError as e:
            log.error(e)
        now = datetime.datetime.utcnow()
        transaction.update(idempotent_ref, {event_id: now, 'modified': now})
        return True

    if not check_idempotent(transaction, idempotent_ref, context.event_id):
        log.warning(f"Event {context.event_id} already processed, quit.")
        return

    # do something
```
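For the daily cleanup, here is a minimal sketch. It assumes a separate function (named `cleanup_idempotent` here) wired to a daily trigger such as Cloud Scheduler via Pub/Sub, which is not part of the setup above; it reads the single idempotency document once and removes Event ID fields older than one day in a single partial update.

```python
import datetime

from google.cloud import firestore

firestore_client = firestore.Client()


def cleanup_idempotent(event, context):
    # Hypothetical daily cleanup job (e.g. Cloud Scheduler -> Pub/Sub trigger).
    idempotent_ref = firestore_client.collection('function_idempotent').document('test_idempotent')
    snapshot = idempotent_ref.get()  # the 1 read per day
    cutoff = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=1)

    # Fields written by test_idempotent() live under the 'data' map: event_id -> timestamp.
    stored = (snapshot.to_dict() or {}).get('data', {})
    expired = [event_id for event_id, written_at in stored.items() if written_at < cutoff]

    if expired:
        # Backtick-escape the field paths, as in test_idempotent(), and delete
        # only the expired entries in one update -- the 1 write per day.
        updates = {f"data.`{event_id}`": firestore.DELETE_FIELD for event_id in expired}
        updates['modified'] = datetime.datetime.now(datetime.timezone.utc)
        idempotent_ref.update(updates)
```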
Solution 3: Don't create Document for idempotent purpose
Assuming our Cloud Functions are going to update some document anyway, we can store the Event ID in those existing documents (avoiding the additional read + write cost), either by using the Event ID as the Document ID or by keeping a list of Event IDs inside the document, as in the solutions above. The downside is that the data and code will be messier. A sketch follows below.
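A minimal sketch under assumptions: the `orders` collection, the `some-order-id` document, and the `status`/`event_ids` fields are hypothetical placeholders for whatever the function already writes. The Event ID rides along in the same transaction as the business update, so no extra document or extra write is needed:

```python
import datetime

from google.cloud import firestore

firestore_client = firestore.Client()


def on_order_update(data, context):
    # Hypothetical business document that this function would update anyway.
    order_ref = firestore_client.collection('orders').document('some-order-id')
    transaction = firestore_client.transaction()

    @firestore.transactional
    def update_if_new(transaction, order_ref, event_id):
        snapshot = order_ref.get(transaction=transaction)
        processed = (snapshot.to_dict() or {}).get('event_ids', {})
        if event_id in processed:
            return False  # this event was already applied
        transaction.update(order_ref, {
            # the business update we would be making anyway ...
            'status': 'processed',
            # ... plus the Event ID, piggybacking on the same write (backtick-escaped field path)
            f"event_ids.`{event_id}`": datetime.datetime.utcnow(),
        })
        return True

    if not update_if_new(transaction, order_ref, context.event_id):
        return  # duplicate delivery, quit
```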