Partial Search On Google App Engine With Search API

October 31, 2012

Objective

Enable partial search (substring search) on Google App Engine using the Search API.

For the phrase “Hello World”, we want to be able to search with “hello”, “world”, “he”, “ell”, “lo wo”, etc.

Problems

  1. The GAE Search API doesn’t support partial text search (the equivalent of a SQL LIKE statement).
  2. Datastore can fake a prefix match (“he”, “hel”), but it cannot simulate an intermediate or suffix match (“ell”, “llo”).
  3. When the data is big (more than a few thousand records), loading all of it and performing a substring check on each record is no longer feasible.
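
To see why a sorted index can answer prefix queries but not infix ones, here is a small pure-Python sketch. It does not touch Datastore at all: the sorted list only stands in for an index on the name property, and `prefix_matches` is a hypothetical helper. The `prefix + u'\ufffd'` upper bound mirrors the range trick commonly used for Datastore prefix queries.

```python
import bisect


def prefix_matches(sorted_names, prefix):
    """Return all names starting with prefix, using two binary searches."""
    lo = bisect.bisect_left(sorted_names, prefix)
    # u'\ufffd' sorts after ordinary characters, so it closes the range.
    hi = bisect.bisect_left(sorted_names, prefix + u'\ufffd')
    return sorted_names[lo:hi]


names = sorted([u'hello', u'help', u'jello', u'world'])
print(prefix_matches(names, u'hel'))  # the two names starting with "hel"
print(prefix_matches(names, u'ell'))  # empty, although "hello" contains "ell"
```

A range scan only finds names that sort next to each other, and names sharing an infix such as “ell” are scattered across the index.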

Hack

Inspired by Google App Engine (python) : Search API : String Search.

First, tokenize the data string into all possible substrings of each word (hello = h, he, hel, ell, etc.)

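def tokenize_autocomplete(phrase):
    """Return every substring of every whitespace-separated word in phrase."""
    tokens = []
    for word in phrase.split():
        # j is the substring length, from 1 up to the full word.
        for j in range(1, len(word) + 1):
            # Slide a window of length j across the word.
            for i in range(len(word) - j + 1):
                tokens.append(word[i:i + j])
    return tokens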
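
As a sanity check on what the tokenizer produces, here is a standalone sketch. `unique_substrings` is a hypothetical variant, not part of the original code: it handles a single word and uses a set to drop repeated tokens.

```python
def unique_substrings(word):
    """Return the distinct substrings of word, shortest first."""
    found = set()
    for length in range(1, len(word) + 1):
        for start in range(len(word) - length + 1):
            found.add(word[start:start + length])
    return sorted(found, key=lambda s: (len(s), s))


tokens = unique_substrings('hello')
print(tokens)
print(len(tokens))  # 14 distinct tokens; 15 if the repeated 'l' is kept
```

A word of length n has n(n+1)/2 substrings, so the token list grows quadratically with word length. That is worth keeping in mind for long names.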

Build a Search API index with one document per item, using the tokenized strings as the field value.

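from google.appengine.api import search

index = search.Index(name='item_autocomplete')
for item in items:  # each item is an ndb.Model instance
    # The URL-safe key doubles as the document ID, so a hit maps back to its entity.
    doc_id = item.key.urlsafe()
    # Store all tokens in a single comma-separated text field.
    name = ','.join(tokenize_autocomplete(item.name))
    document = search.Document(
        doc_id=doc_id,
        fields=[search.TextField(name='name', value=name)])
    index.put(document)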
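
With many items, one `put` call per document means one round trip per item. `Index.put` also accepts a list of documents. The sketch below chunks the list first; the limit of 200 documents per call is an assumption from memory of the Search API docs, so check it against your SDK version. `put_in_batches` and `FakeIndex` are hypothetical names, and `FakeIndex` only stands in for `search.Index` so the example runs without the SDK.

```python
def put_in_batches(index, documents, batch_size=200):
    """Call index.put() once per chunk of at most batch_size documents."""
    for start in range(0, len(documents), batch_size):
        index.put(documents[start:start + batch_size])


class FakeIndex(object):
    """Stand-in for search.Index that only records the batch sizes."""

    def __init__(self):
        self.batch_sizes = []

    def put(self, documents):
        self.batch_sizes.append(len(documents))


fake = FakeIndex()
put_in_batches(fake, list(range(450)))  # integers stand in for documents
print(fake.batch_sizes)
```

In real code you would build the `search.Document` objects in a list first and pass the real index instead of `FakeIndex`.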

Perform the search, and voilà!

results = search.Index(name="item_autocomplete").search("name:ell")
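
To see why this works, here is an in-memory sketch of the same idea in plain Python. It is not the Search API: the dictionary-based index, the lowercasing, and the rule that every whitespace-separated query term must match are all assumptions made for illustration, and `tokenize`, `build_index` and `search_tokens` are hypothetical names.

```python
def tokenize(phrase):
    """Return the set of all substrings of each word, lowercased."""
    tokens = set()
    for word in phrase.lower().split():
        for length in range(1, len(word) + 1):
            for start in range(len(word) - length + 1):
                tokens.add(word[start:start + length])
    return tokens


def build_index(docs):
    """Map each token to the set of doc IDs whose text contains it."""
    index = {}
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index.setdefault(token, set()).add(doc_id)
    return index


def search_tokens(index, query):
    """Return doc IDs that match every whitespace-separated query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    hits = [index.get(term, set()) for term in terms]
    return set.intersection(*hits)


docs = {'a': 'Hello World', 'b': 'Yellow Submarine'}
idx = build_index(docs)
print(sorted(search_tokens(idx, 'ell')))    # both documents contain "ell"
print(sorted(search_tokens(idx, 'lo wo')))  # only "Hello World" has both terms
print(sorted(search_tokens(idx, 'elo')))    # no word contains "elo"
```

Note that a query like “lo wo” is treated as two independent terms here, so it would also match a phrase where the words appear in the opposite order.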
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.