Objective
Enable partial search / substring search on Google App Engine using the Search API.
For the phrase "Hello World", we want to be able to search with "hello", "world", "he", "ell", "lo wo", etc.
Problems
- GAE Search API doesn't support partial text search (the equivalent of a SQL LIKE statement).
- Datastore can fake a prefix match ("he", "hel"), but cannot simulate an intermediate or suffix match ("ell", "llo").
- When the data is big (more than a few thousand records), loading all the data and performing a substring check on each record is no longer feasible.
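The prefix-only limitation in the second point can be illustrated without App Engine. A Datastore prefix match is commonly written as a pair of inequality filters over a sorted index (name >= prefix and name < prefix + u'\ufffd'). The sketch below imitates that range scan with a sorted Python list and `bisect`; the `names` data and the `prefix_match` helper are hypothetical and only stand in for the real query.

```python
import bisect

def prefix_match(sorted_names, prefix):
    # Range scan over a sorted list, imitating the filter pair
    # name >= prefix AND name < prefix + u'\ufffd'.
    lo = bisect.bisect_left(sorted_names, prefix)
    hi = bisect.bisect_left(sorted_names, prefix + u'\ufffd')
    return sorted_names[lo:hi]

# Hypothetical sample data, sorted the way an index would be.
names = sorted([u'hello', u'help', u'hero', u'jello', u'world'])

print(prefix_match(names, u'hel'))  # names that start with 'hel'
print(prefix_match(names, u'ell'))  # empty: 'ell' is not a prefix of any name
```

A range over sorted values can only pin down how a string starts, which is why "ell" finds nothing even though it occurs inside "hello" and "jello".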
Hack
Inspired by Google App Engine (python) : Search API : String Search.
First, tokenize the data string into all possible substrings of each word (hello = h, he, hel, ell, etc.)
    def tokenize_autocomplete(phrase):
        a = []
        for word in phrase.split():
            j = 1  # current substring length
            while True:
                # Collect every substring of length j in this word
                for i in range(len(word) - j + 1):
                    a.append(word[i:i + j])
                if j == len(word):
                    break
                j += 1
        return a
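Because every substring of every word is emitted, a word of n characters yields n(n+1)/2 tokens, with repeats whenever a substring occurs more than once. A minimal variant, assuming lowercasing and de-duplication are acceptable for your data, keeps the indexed field smaller. The `unique_tokens` name is hypothetical and not part of the original hack.

```python
def unique_tokens(phrase):
    """Return the distinct lowercase substrings of each word in `phrase`."""
    tokens = set()
    for word in phrase.lower().split():
        for start in range(len(word)):
            for end in range(start + 1, len(word) + 1):
                tokens.add(word[start:end])
    return sorted(tokens)

tokens = unique_tokens('Hello')
print(len(tokens))      # 14 distinct tokens; the raw list has 15 ('l' appears twice)
print('ell' in tokens)  # True
print('elo' in tokens)  # False: 'elo' is not a contiguous substring of 'hello'
```

Note that tokens are still built per word, so a fragment spanning two words is never stored as a single token.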
Build an index + document (Search API) using the tokenized strings.
    from google.appengine.api import search

    index = search.Index(name='item_autocomplete')
    for item in items:  # each item is an ndb.Model instance
        doc_id = item.key.urlsafe()
        # Store every substring token in one text field, separated by commas
        name = ','.join(tokenize_autocomplete(item.name))
        document = search.Document(
            doc_id=doc_id,
            fields=[search.TextField(name='name', value=name)])
        index.put(document)
Perform the search, and voilà!
    results = search.Index(name="item_autocomplete").search("name:ell")