Using Datastore in App Engine Standard Python 3.7

August 30, 2018
Enable Datastore Emulator for Development

NOTE: Since the Python 3.7 runtime for Google App Engine was released only a month ago and is still in beta, the documentation and libraries have not quite caught up. The following article is relevant as of 20 Aug 2018, but things might change later.

One of the major differences between Python 2.7 and 3.7 on App Engine is how you access Datastore.

The ndb ORM library is not available for Python 3. Instead, you access Cloud Datastore through the Cloud Datastore API, using the Google Cloud client library (google-cloud-datastore) to store and retrieve data.

Edit requirements.txt to include the latest version of google-cloud-datastore (1.7.0 at the time of writing).

google-cloud-datastore==1.7.0

Install the module.

pip install -r requirements.txt

NOTE: You now have the option to choose either Cloud Datastore or Cloud Firestore in Datastore mode. Refer to Google Cloud Datastore vs Firestore.

For development, you will probably want to run a local copy of Datastore. Install the Datastore emulator:

gcloud components install cloud-datastore-emulator

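Start the Datastore emulator. The sample output below appears to have been captured without --host-port, which is why it reports the ::1:8693 endpoint.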

gcloud beta emulators datastore start --no-legacy --data-dir=<DATASTORE_DIR> --project <PROJECT_ID> --host-port "127.0.0.1:8585"
[datastore] Aug 28, 2018 11:23:25 PM com.google.cloud.datastore.emulator.CloudDatastore$FakeDatastoreAction$9 apply
[datastore] INFO: Provided --allow_remote_shutdown to start command which is no longer necessary.
[datastore] Aug 28, 2018 11:23:25 PM com.google.cloud.datastore.emulator.impl.LocalDatastoreFileStub <init>
[datastore] INFO: Local Datastore initialized:
[datastore]   Type: High Replication
[datastore]   Storage: /code/datastore/jobs/WEB-INF/appengine-generated/local_db.bin
[datastore] Aug 28, 2018 11:23:25 PM com.google.cloud.datastore.emulator.impl.LocalDatastoreFileStub load
[datastore] INFO: Time to load datastore: 94 ms
[datastore] Aug 28, 2018 11:23:25 PM io.gapi.emulators.netty.NettyUtil applyJava7LongHostnameWorkaround
[datastore] INFO: Applied Java 7 long hostname workaround.
[datastore] API endpoint: http://::1:8693
[datastore] If you are using a library that supports the DATASTORE_EMULATOR_HOST environment variable, run:
[datastore] 
[datastore]   export DATASTORE_EMULATOR_HOST=::1:8693
[datastore] 
[datastore] Dev App Server is now running.
[datastore] 
[datastore] The previous line was printed for backwards compatibility only.
[datastore] If your tests rely on it to confirm emulator startup,
[datastore] please migrate to the emulator health check endpoint (/). Thank you!

NOTE: JRE 8 is required.
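Since the emulator runs on Java, it can help to confirm the installed version before starting it. The sketch below uses hypothetical helper names (java_major_version, installed_java_major_version); it reads the quoted version string that java -version prints to stderr and handles both the old "1.8.0_181" numbering and the newer "11.0.2" / "17" numbering. Whether a newer JRE also works with the emulator is not something this article states, so the sketch only reports the major version.

```python
import re
import shutil
import subprocess


def java_major_version(version_output):
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "([^"]+)"', version_output)
    if match is None:
        raise ValueError("no version string found")
    numbers = [int(n) for n in re.findall(r"\d+", match.group(1))]
    if not numbers:
        raise ValueError("version string has no numeric part")
    # Java 8 and earlier report "1.8.0_181"; Java 9+ report "11.0.2" or "17".
    if numbers[0] == 1 and len(numbers) > 1:
        return numbers[1]
    return numbers[0]


def installed_java_major_version():
    """Run `java -version` (it writes to stderr); None if java is not on PATH."""
    java = shutil.which("java")
    if java is None:
        return None
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    return java_major_version(result.stderr)
```

Calling installed_java_major_version() should give 8 on a JRE 8 install, or None when java is not on the PATH.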

NOTE: If I don't specify --host-port, I get DATASTORE_EMULATOR_HOST=::1:8693 (an IPv6 loopback address), which causes google.api_core.exceptions.ServiceUnavailable: 503 Connect Failed when I initialize the Datastore client.
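One plausible reason the unbracketed value misbehaves: "::1:8693" is itself a complete, valid IPv6 address (0:0:0:0:0:0:1:8693), so nothing tells a parser where the address ends and the port begins. The usual convention is to bracket IPv6 literals, as in "[::1]:8693". The sketch below (split_host_port is a hypothetical helper, not part of the client library) shows a parser that rejects the ambiguous form; whether this is exactly what trips up the client library is my assumption.

```python
import ipaddress


def split_host_port(value):
    """Split 'host:port', requiring IPv6 literals to be bracketed ('[::1]:8693')."""
    if value.startswith("["):
        host, sep, rest = value[1:].partition("]")
        if not sep or not rest.startswith(":"):
            raise ValueError(f"malformed bracketed address: {value!r}")
        # Raises a ValueError subclass if the bracketed part is not IPv6.
        ipaddress.IPv6Address(host)
        return host, int(rest[1:])
    host, sep, port = value.rpartition(":")
    if not sep or ":" in host:
        # Either there is no port at all, or this is an unbracketed IPv6
        # literal whose address/port boundary cannot be determined.
        raise ValueError(f"ambiguous or missing port: {value!r}")
    return host, int(port)
```

With --host-port "127.0.0.1:8585" the value splits cleanly into ("127.0.0.1", 8585), which matches the workaround above.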

Set up the environment variables.

$(gcloud beta emulators datastore env-init --data-dir=<DATASTORE_DIR>)

NOTE: After shutting down the emulator, run $(gcloud beta emulators datastore env-unset) to remove the variables.
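The $(...) trick works because env-init prints export NAME=VALUE lines that the shell then executes. If you drive the emulator from a Python test runner instead, a rough equivalent is to parse those lines yourself. This is a sketch with hypothetical helper names; the sample output uses this article's --host-port and a made-up project ID, and apart from DATASTORE_EMULATOR_HOST the exact variable names may differ between gcloud versions.

```python
import os
import shlex

# Illustrative env-init output (hypothetical project ID "my-project").
SAMPLE_ENV_INIT = """\
export DATASTORE_DATASET=my-project
export DATASTORE_EMULATOR_HOST=127.0.0.1:8585
export DATASTORE_EMULATOR_HOST_PATH=127.0.0.1:8585/datastore
export DATASTORE_HOST=http://127.0.0.1:8585
export DATASTORE_PROJECT_ID=my-project
"""


def parse_env_init(output):
    """Turn 'export NAME=VALUE' lines into a dict, ignoring anything else."""
    env = {}
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("export "):
            continue
        name, sep, value = line[len("export "):].partition("=")
        if sep:
            # shlex strips optional shell quoting around the value.
            parts = shlex.split(value)
            env[name.strip()] = parts[0] if parts else ""
    return env


def apply_env(env, environ=os.environ):
    """Roughly what $(... env-init) does: set the variables."""
    environ.update(env)


def unset_env(names, environ=os.environ):
    """Roughly what $(... env-unset) does: remove the variables if present."""
    for name in names:
        environ.pop(name, None)
```

In practice you would feed parse_env_init the stdout of subprocess.run(["gcloud", "beta", "emulators", "datastore", "env-init"], capture_output=True, text=True) rather than the sample string.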

Python code to access Datastore (main.py).

import logging

from flask import Flask
from google.cloud import datastore

app = Flask(__name__)

# Running `python main.py` locally means development mode; on App Engine the
# module is imported by the web server, so __name__ is not '__main__'.
IS_DEV = __name__ == '__main__'

if IS_DEV:
    import mock
    import google.auth.credentials
    # Mock credentials so the client does not look for real ones, see
    # https://github.com/GoogleCloudPlatform/google-cloud-python/blob/master/datastore/tests/unit/test_client.py
    credentials = mock.Mock(spec=google.auth.credentials.Credentials)
    client = datastore.Client(credentials=credentials)  # , _use_grpc=False

    logging.basicConfig(level=logging.INFO)
else:
    # Production: use the default credentials (the App Engine service account)
    client = datastore.Client()


@app.route('/test_datastore')
def test_datastore():
    # A partial key (kind only); Datastore allocates the numeric ID on put()
    key = client.key('Task')
    item = datastore.Entity(key)
    item.update({
        'category': 'Personal',
        'done': False,
        'priority': 4,
        'description': 'Learn Cloud Datastore'
    })
    # put() returns None; it completes item.key in place
    client.put(item)

    return f"id={item.key.id}"


@app.route('/test_datastore_query')
def test_datastore_query():
    # https://cloud.google.com/datastore/docs/concepts/queries
    query = client.query(kind='Task')
    # query.add_filter('done', '=', False)
    # query.add_filter('priority', '>=', 4)
    # query.order = ['-priority']

    items = query.fetch()
    # items = list(query.fetch())

    size = 0
    for item in items:
        logging.info(item.key.id)
        # https://googlecloudplatform.github.io/google-cloud-python/latest/datastore/keys.html
        # Round-trip the key through its legacy (App Engine/ndb) urlsafe string
        urlsafe = item.key.to_legacy_urlsafe()
        logging.info(f"key.urlsafe={urlsafe}")
        key = datastore.Key.from_legacy_urlsafe(urlsafe)
        fetch_item = client.get(key)
        logging.info(item.key == fetch_item.key)

        logging.info(f"key.flat_path={item.key.flat_path}")
        size += 1

    return f"size={size}"


if __name__ == '__main__':
    # Local development server; on App Engine the entrypoint (e.g. gunicorn) serves `app`
    app.run(host='127.0.0.1', port=8080, debug=True)
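The urlsafe round trip in test_datastore_query relies on a simple encoding. As far as I can tell from the client library source, to_legacy_urlsafe() serializes the key as a legacy App Engine Reference protobuf, base64url-encodes it and strips the "=" padding (returning bytes), and from_legacy_urlsafe() restores the padding before decoding. The sketch below shows only that encoding layer, with arbitrary bytes standing in for the serialized protobuf; to_urlsafe and from_urlsafe are hypothetical names.

```python
import base64


def to_urlsafe(raw):
    """base64url-encode bytes ('-' and '_' instead of '+' and '/'), dropping '=' padding."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=")


def from_urlsafe(urlsafe):
    """Accept str or bytes, restore the padding, and decode back to raw bytes."""
    if isinstance(urlsafe, str):
        urlsafe = urlsafe.encode("ascii")
    padding = b"=" * (-len(urlsafe) % 4)
    return base64.urlsafe_b64decode(urlsafe + padding)
```

Stripping the padding keeps the string safe to drop into a URL path or query parameter without further escaping, which is why it suits links like /task/<urlsafe>.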

NOTE: I get a DefaultCredentialsError when I don't pass in credentials (I believe this behaviour is a bug), so for now we need to pass in mock credentials. The error looks like this:

google.auth.exceptions.DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://developers.google.com/accounts/docs/application-default-credentials.
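Why a Mock is enough: when created with spec=SomeClass, a Mock reports SomeClass as its __class__, so it passes isinstance() checks while only exposing that class's attributes. My assumption is that the client constructor validates the credentials argument with such an isinstance() check. The sketch uses a hypothetical stand-in Credentials class and check_credentials function, and the standard library's unittest.mock, which behaves the same as the standalone mock package for this purpose.

```python
from unittest import mock


class Credentials:
    """Hypothetical stand-in for google.auth.credentials.Credentials."""

    def refresh(self, request):
        raise NotImplementedError


def check_credentials(credentials):
    """Hypothetical sketch of the isinstance() check a client constructor might do."""
    if credentials is not None and not isinstance(credentials, Credentials):
        raise ValueError("credentials must be a Credentials instance")
    return credentials


# spec= makes the mock report Credentials as its __class__, so it passes
# isinstance(), and attributes not defined on Credentials raise AttributeError.
fake_credentials = mock.Mock(spec=Credentials)
```

A plain mock.Mock() without spec would fail the same check, which is why the development branch passes spec=google.auth.credentials.Credentials.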

Install the mock package, which the development branch imports:

pip install mock

Test the app.

python main.py

NOTE: DefaultCredentialsError is also thrown if the Datastore emulator is not set up properly, because the client then tries to connect to the production Datastore.
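To get a clearer message than DefaultCredentialsError when the emulator variables are missing, one option is to check for them before creating the client in the development branch. This is a sketch with hypothetical names (EmulatorNotConfigured, require_emulator); it only checks DATASTORE_EMULATOR_HOST, the variable the emulator output above tells you to export.

```python
import os


class EmulatorNotConfigured(RuntimeError):
    """Hypothetical error for a shell where env-init has not been run."""


def require_emulator(environ=os.environ):
    """Return DATASTORE_EMULATOR_HOST, or fail fast with a readable message."""
    host = environ.get("DATASTORE_EMULATOR_HOST", "").strip()
    if not host:
        raise EmulatorNotConfigured(
            "DATASTORE_EMULATOR_HOST is not set; start the emulator and run "
            "$(gcloud beta emulators datastore env-init --data-dir=<DATASTORE_DIR>)"
        )
    return host
```

Calling require_emulator() at the top of the IS_DEV branch, before datastore.Client(...), would stop python main.py early with that message instead of letting the client fall back to production credentials.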

PS: I hope Google brings back an equivalent ndb library for Datastore in the near future. The ndb API makes Datastore programming easy and fun.

This work is licensed under a
Creative Commons Attribution-NonCommercial 4.0 International License.