How Queries Are Transformed During Search

The article on crawling and indexing showed that, instead of searching through a content source, SearchUnify matches a query against the data stored in its index, which is a highly processed and simplified snapshot of the data on content sources. This article continues that theme. Here you will learn how queries are transformed before they are matched against documents in the index.

Overview

The previous article explained that when a user runs a query, SearchUnify searches through its index instead of entire content sources. But how exactly does it find relevant documents? Before any matching takes place, the query goes through three stages: operator extraction, text processing, and query building.

Operator Extraction

Search parameters are reserved keywords which refine a query. SearchUnify supports three Boolean parameters.

Boolean Operators: A Boolean search is written as # followed by the search terms and operators:

  • AND: Finds documents that contain all the words separated by AND or && in the search query. This is equivalent to a set intersection. ‘# laptop AND charger’ finds results containing both laptop and charger.
  • OR: Finds documents that contain at least one of the terms separated by OR in the search query. ‘# laptop OR charger’ finds results containing either laptop or charger or both.
  • NOT: Finds documents that do not contain the keyword that follows NOT. ‘# laptop NOT charger’ finds results containing laptop but not charger.
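
The set interpretation of these operators can be sketched in a few lines of Python. This is an illustration only, not SearchUnify's implementation: the toy inverted index and the docs_with helper are hypothetical.

```python
# Toy inverted index: term -> set of document IDs (hypothetical data).
INDEX = {
    "laptop": {1, 2, 3},
    "charger": {2, 3, 4},
}

def docs_with(term):
    """Return the IDs of documents containing the term (empty set if none)."""
    return INDEX.get(term.lower(), set())

laptop, charger = docs_with("laptop"), docs_with("charger")

and_result = laptop & charger   # laptop AND charger: intersection
or_result = laptop | charger    # laptop OR charger: union
not_result = laptop - charger   # laptop NOT charger: difference
```

With this data, AND keeps only the documents present in both sets, OR keeps documents present in either set, and NOT removes every document that contains charger from the laptop set.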

Besides Boolean parameters, SearchUnify also supports grouping and wildcard search.

  • Grouping: All operators mentioned above can be grouped using parentheses. To search for laptops or printers from HP, use (laptops OR printers) AND hp.
  • Wildcard search: Matches documents that have fields matching a wildcard expression (not analyzed). Supported wildcards are:
    • * matches any character sequence (including the empty one)
    • ? matches any single character. This operator can seriously degrade query performance.
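
Grouping and wildcard matching can be sketched as follows. This is a minimal illustration under stated assumptions, not how SearchUnify evaluates queries: the toy index and the helper names wildcard_to_regex and docs_matching are hypothetical, and the wildcards are translated to a Python regular expression purely for demonstration.

```python
import re

# Hypothetical toy index: term -> set of document IDs.
INDEX = {
    "laptops": {1, 2},
    "printers": {3, 4},
    "printing": {6},
    "hp": {2, 3, 5},
}

def wildcard_to_regex(pattern):
    """Translate * (any sequence, possibly empty) and ? (one character)."""
    escaped = re.escape(pattern)  # escapes * and ? along with other specials
    return re.compile(escaped.replace(r"\*", ".*").replace(r"\?", "."))

def docs_matching(pattern):
    """Union of the documents for every indexed term matching the pattern."""
    regex = wildcard_to_regex(pattern)
    result = set()
    for term, doc_ids in INDEX.items():  # every term is tested in this sketch
        if regex.fullmatch(term):
            result |= doc_ids
    return result

# (laptops OR printers) AND hp: parentheses fix the evaluation order.
grouped = (INDEX["laptops"] | INDEX["printers"]) & INDEX["hp"]

star = docs_matching("print*")   # matches "printers" and "printing"
question = docs_matching("h?")   # matches "hp" only
```

Because this sketch tests the pattern against every term in the index, it also hints at why a wildcard query can cost more than an exact term lookup.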

From Advanced Search Parameters

The search parameters can also be inserted using the Advanced Search form available right under the search box. Four options are available.

  • With the Exact Phrase: Find documents that contain the query exactly as entered. Use it to find documents containing a specific phrase, sentence, or sequence of terms, rather than scattered occurrences of the keywords throughout the index items. You can use a phrase match query syntax to find such index items. Phrase search is not case-sensitive.
  • Without the Words: Find documents that don't have the specified keyword(s).
  • With One or More Words: Find documents that have at least one keyword from the query.
  • Results per Page: Change the number of results displayed on each page.

Text Processing

As soon as a search query is entered, the search terms are sent for processing, which involves:

  1. Correcting Misspellings. The language in which the query is made is identified, then the search terms are matched with a standard dictionary of that language and a custom SearchUnify dictionary to identify and correct misspelled words.
  2. Converting Search Terms to Lowercase. The terms, corrected where necessary, are converted to lowercase if their lowercase forms exist. For languages whose scripts have no letter case, such as Japanese, Arabic, and Hindi, this step is skipped.
  3. Removing Stop Words. Articles, conjunctions, and other common terms with little meaning are removed from a search query. In a query such as "how to install SearchUnify in ServiceNow?", only install, SearchUnify, and ServiceNow are kept.
  4. Applying Synonyms. A search for SSO fetches documents containing single sign-on as well. SearchUnify has inbuilt support for standard synonyms (such as "kill", "halt", and "abort" in "kill a process", "halt a process", and "abort a process"). Admins can further Add Synonyms to Improve Search Experience.
  5. Stemming Search Terms. A search for integration returns documents containing integration, integrate, integrating, and integrated as well. Users can enclose search terms in quotes to prevent stemming.
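
The five steps above can be sketched as a small pipeline. This is a simplified illustration, not SearchUnify's code: the vocabulary, stop-word list, synonym map, and suffix list are hypothetical stand-ins for the real dictionaries, spelling correction is approximated with Python's difflib, and language detection and the quoting rule are not modeled.

```python
import difflib
import re

# Hypothetical stand-ins for the standard and custom dictionaries.
VOCABULARY = ["install", "searchunify", "servicenow", "how", "to", "in", "sso"]
STOP_WORDS = {"how", "to", "in", "a", "an", "the"}
SYNONYMS = {"sso": ["single sign-on"]}
SUFFIXES = ("ation", "ating", "ated", "ate", "ing", "ed")  # longest first

def correct(term):
    """Step 1: replace a term with its closest dictionary word, if any."""
    matches = difflib.get_close_matches(term.lower(), VOCABULARY, n=1, cutoff=0.8)
    return matches[0] if matches else term

def stem(term):
    """Step 5: naive suffix stripping (real stemmers are more careful)."""
    for suffix in SUFFIXES:
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term

def process(query):
    tokens = re.findall(r"[A-Za-z0-9-]+", query)          # split into terms
    tokens = [correct(t) for t in tokens]                 # 1. fix misspellings
    tokens = [t.lower() for t in tokens]                  # 2. lowercase
    tokens = [t for t in tokens if t not in STOP_WORDS]   # 3. drop stop words
    expanded = []
    for t in tokens:                                      # 4. add synonyms
        expanded.append(t)
        expanded.extend(SYNONYMS.get(t, []))
    return [stem(t) for t in expanded]                    # 5. stem

terms = process("how to instal SearchUnify in ServiceNow?")
```

In this sketch, the misspelled instal is corrected to install, the stop words how, to, and in are dropped, and all four forms of integration reduce to the same stem, which is why a search for one form can match the others.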

Query Building

The processed search query has to be transformed into a form that search algorithms can understand. The transformation involves appending parameters to the search query that limit the scope of the search and return results faster.

Some parameters (resultsPerPage and uid) are common across search clients, but others are not. For example, the parameter permissions is available only for Salesforce search clients.

This table lists the parameters that are frequently used for query building.

Parameter         Significance
resultsFrom       Search results offset. If resultsFrom=x, the first x results are skipped and result number x+1 is displayed first.
sortby            Takes one of two values: score or post_time. The most relevant documents are displayed first if sortby=score. The most recent documents are displayed first if sortby=post_time.
orderBy           Always set to descending. Documents with the lowest scores or the oldest post_time are served last.
pageNo            The number of the search results page the user is on.
aggregations      Facet values. This field is empty if no facets are checked.
uid               The search client ID.
resultsPerPage    The number of results on each page. The default value is 10.
exactPhrase       Search terms enclosed in quotation marks.
withOneOrMore     Search terms surrounding the Boolean operator OR.
withoutTheWords   Search terms preceded by the Boolean operator NOT.
sid               Session ID.
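
To make the parameters concrete, here is a rough sketch of how a query might be assembled from them. It is not SearchUnify's actual request format: the build_query function, the searchString key, the value types, the literal "descending", and the rule resultsFrom = (pageNo - 1) * resultsPerPage are all assumptions made for illustration.

```python
import re

def build_query(search_string, uid, sid, page_no=1, results_per_page=10,
                sortby="score"):
    """Assemble a parameter dictionary for one search request (hypothetical)."""
    if sortby not in ("score", "post_time"):
        raise ValueError("sortby must be 'score' or 'post_time'")

    # Terms on either side of OR, in order of appearance, without duplicates.
    tokens = search_string.split()
    one_or_more = []
    for i, token in enumerate(tokens):
        if token == "OR" and 0 < i < len(tokens) - 1:
            for neighbor in (tokens[i - 1], tokens[i + 1]):
                if neighbor not in one_or_more:
                    one_or_more.append(neighbor)

    return {
        "searchString": search_string,       # hypothetical key name
        "uid": uid,
        "sid": sid,
        "pageNo": page_no,
        "resultsPerPage": results_per_page,
        # Assumption: the offset skips all results shown on earlier pages.
        "resultsFrom": (page_no - 1) * results_per_page,
        "sortby": sortby,
        "orderBy": "descending",             # assumed literal value
        "aggregations": [],                  # empty: no facets checked
        "exactPhrase": re.findall(r'"([^"]+)"', search_string),
        "withOneOrMore": one_or_more,
        "withoutTheWords": re.findall(r"\bNOT\s+(\w+)", search_string),
    }

query = build_query("laptop OR charger NOT hp", uid="client-1", sid="s-1",
                    page_no=3)
```

Under these assumptions, page 3 with 10 results per page yields resultsFrom=20, so the first 20 results are skipped and result number 21 is displayed first.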

Previous article: Crawling and Indexing Content Sources

Next article: How Documents Are Ranked in SearchUnify

Last updated: Friday, September 25, 2020