How Queries Are Transformed During Search
The article on crawling and indexing showed that instead of searching through a content source, SearchUnify matches a query with the data stored in its index, which is a highly processed and simplified snapshot of the data on content sources. This article continues that theme. Here you will learn how queries are transformed before they are matched with documents from the index.
Overview
The previous article explained that when a user runs a query, SearchUnify searches through its index instead of entire content sources. But how exactly does it find relevant documents? This is a story in itself.
Operator Extraction
Search parameters are reserved keywords which refine a query. SearchUnify supports three Boolean parameters.
Boolean Operators: This is performed with # followed by search terms along with operators:
- AND: Finds documents which contain all words separated by AND or && in the search query. This is equivalent to an intersection using sets. ‘# laptop AND charger’ finds results containing both laptop and charger.
- OR: Finds documents which have at least one of the terms separated by OR in the search query. ‘# laptop OR charger’ finds results containing either laptop or charger or both.
- NOT: Finds documents which do not have the keyword preceded by NOT. ‘# laptop NOT charger’ finds results containing laptop but not charger.
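The set analogy can be made concrete with a small sketch. It assumes a toy inverted index that maps each term to the IDs of the documents containing it. The `index` data and the `docs` helper are hypothetical and only illustrate the idea, not how SearchUnify stores or queries its index.

```python
# Toy inverted index (hypothetical data): term -> IDs of documents containing it.
index = {
    "laptop": {1, 2, 3},
    "charger": {2, 3, 4},
}


def docs(term):
    """Return the IDs of documents containing the term (empty set if unknown)."""
    return index.get(term.lower(), set())


# laptop AND charger: documents containing both terms (set intersection).
both = docs("laptop") & docs("charger")

# laptop OR charger: documents containing at least one term (set union).
either = docs("laptop") | docs("charger")

# laptop NOT charger: documents containing laptop but not charger (set difference).
only_laptop = docs("laptop") - docs("charger")

print(both, either, only_laptop)
```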
Besides Boolean parameters, SearchUnify also supports grouping and wildcard search.
- Grouping: All operators mentioned above can be grouped using parentheses. To search for laptops or printers from HP, use ‘(laptops OR printers) AND hp’.
- Wildcard search: Matches documents that have fields matching a wildcard expression (not analyzed). Supported wildcards are:
  - * matches any character sequence (including the empty one)
  - ? matches any single character. This operator can have a serious impact on query performance.
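Grouping and wildcards can be illustrated with the same kind of toy index. In this sketch the `index` data and the helper names are hypothetical. Python's standard `fnmatch` module stands in for the wildcard matcher because it gives `*` and `?` the same meaning as described above (it also treats `[...]` specially, which the sketch does not rely on). Note that a wildcard has to be compared against every indexed term, which hints at why such queries can be costly.

```python
from fnmatch import fnmatchcase

# Toy inverted index (hypothetical data): term -> IDs of documents containing it.
index = {
    "laptops": {1, 2},
    "printers": {3, 4},
    "hp": {2, 3, 5},
}


def docs(term):
    """Return the IDs of documents containing the term."""
    return index.get(term, set())


# (laptops OR printers) AND hp: the parentheses are evaluated first.
grouped = (docs("laptops") | docs("printers")) & docs("hp")


def expand_wildcard(pattern):
    """Return every indexed term matching the pattern, checking each term in turn."""
    return sorted(term for term in index if fnmatchcase(term, pattern))


plural_terms = expand_wildcard("*s")   # * matches any character sequence
two_letter = expand_wildcard("h?")     # ? matches exactly one character

print(grouped, plural_terms, two_letter)
```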
From Advanced Search Parameters
The search parameters can also be inserted using the Advanced Search form available right under the search box. Four options are available.
- With the Exact Phrase: Find documents that contain the query exactly as entered. Use it to find documents containing a specific phrase, sentence, or sequence of terms, rather than scattered occurrences of the keywords throughout the index items. You can use a phrase match query syntax to find such index items. Phrase search is not case-sensitive.
- Without the Words: Find documents that don't have the specified keyword(s).
- With One or More Words: Find documents that have at least one keyword from the query.
- Results per Page: Change the number of results displayed on each page.
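The three filtering options can be pictured as conditions applied to each document. The following is a minimal sketch under simplifying assumptions: documents are plain strings, words are split on whitespace, and the `matches` function with its argument names is hypothetical rather than part of SearchUnify.

```python
def matches(text, exact_phrase="", without_words=(), one_or_more=()):
    """Check one document against the three Advanced Search filters."""
    lowered = text.lower()
    words = set(lowered.split())

    # With the Exact Phrase: the phrase must appear as typed, ignoring case.
    if exact_phrase and exact_phrase.lower() not in lowered:
        return False

    # Without the Words: none of the listed words may appear.
    if any(word.lower() in words for word in without_words):
        return False

    # With One or More Words: at least one listed word must appear.
    if one_or_more and not any(word.lower() in words for word in one_or_more):
        return False

    return True


documents = [
    "Reset your password from the login page",
    "Password policy for admins",
    "Install the laptop charger",
]

phrase_hits = [d for d in documents if matches(d, exact_phrase="Reset Your Password")]
no_admins = [d for d in documents if matches(d, without_words=["admins"])]
any_hits = [d for d in documents if matches(d, one_or_more=["charger", "policy"])]

print(phrase_hits, no_admins, any_hits)
```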
Text Processing
As soon as a search query is entered, the search terms are sent for processing, which involves:
- Correcting Misspellings. The language in which the query is made is identified, then the search terms are matched with a standard dictionary of that language and a custom SearchUnify dictionary to identify and correct misspelled words.
- Converting Search Terms to Lowercase. The corrected (and misspelled) terms are converted to lowercase, if their lowercase forms exist. For languages such as Japanese, Arabic, and Hindi, this process is skipped.
- Removing Stop Words. Articles, conjunctions, and other common terms with little meaning are removed from a search query. In a query like ‘how to install SearchUnify in ServiceNow?’, only ‘install’, ‘SearchUnify’, and ‘ServiceNow’ are kept.
- Applying Synonyms. A search for ‘SSO’ fetches documents containing ‘single sign-on’ as well. SearchUnify has inbuilt support for standard synonyms (such as "kill", "halt", and "abort" in "kill a process", "halt a process", and "abort a process"). Admins can add more synonyms; see Synonyms to Improve Search Experience.
- Stemming Search Terms. A search for ‘integration’ returns documents containing ‘integration’, ‘integrate’, ‘integrating’, and ‘integrated’ as well. Users can enclose search terms in quotes to stop stemming.
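The steps above can be sketched in a few lines. This is an illustration only: the stop-word list, the synonym map, and the suffix-stripping stemmer are hypothetical stand-ins, and spelling correction is left out. SearchUnify's actual dictionaries and stemming rules are not described in this article.

```python
# Hypothetical stop words, synonyms, and suffixes, chosen only for this example.
STOP_WORDS = {"how", "to", "in", "a", "an", "the", "of", "and", "or"}
SYNONYMS = {"sso": ["single sign-on"]}
SUFFIXES = ("ation", "ating", "ated", "ate", "ing", "ed", "s")


def process(query):
    """Strip punctuation, lowercase the terms, then drop stop words."""
    tokens = [token.strip("?.,!") for token in query.split()]
    tokens = [token.lower() for token in tokens if token]
    return [token for token in tokens if token not in STOP_WORDS]


def expand(terms):
    """Add the known synonyms after each term."""
    expanded = []
    for term in terms:
        expanded.append(term)
        expanded.extend(SYNONYMS.get(term, []))
    return expanded


def stem(term):
    """Naive stemmer: remove the first matching suffix, keeping at least 3 letters."""
    for suffix in SUFFIXES:
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


kept = process("how to install SearchUnify in ServiceNow?")
with_synonyms = expand(process("SSO"))
stems = {stem(w) for w in ["integration", "integrate", "integrating", "integrated"]}

print(kept, with_synonyms, stems)
```

Because all four word forms reduce to the same stem, a query for any one of them can match documents containing the others.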
Query Building
The processed search query has to be transformed into a form comprehensible to search algorithms. The transformation involves appending parameters to the search query that limit the scope of the search and return results faster.
Some parameters (resultsPerPage and uid) are common across search clients, but others are not. For example, the parameter permissions is available only for Salesforce search clients.
This table lists the parameters that are frequently used for query building.
Parameter | Significance |
resultsFrom | Search results offset function. If resultsFrom=x, then the first x results will be eliminated. The (x+1)th result will be the first result. |
sortby | It can have one of these two values: score and post_time. Most relevant documents are displayed first if sortby=score. Conversely, the most recent documents are displayed on top if sortby=post_time. |
orderBy | It's always set to descending. Documents with the lowest scores or oldest post_time are served last. |
pageNo | Tells a user the search results page he or she is on. |
aggregations | Facet values. This field is empty if no facets are checked. |
uid | The search client ID. |
resultsPerPage | The default value is 10. |
exactPhrase | Search terms enclosed in quotation marks. |
withOneOrMore | Search terms surrounding the Boolean operator OR. |
withoutTheWords | Search terms preceded by the Boolean operator NOT. |
sid | Session ID. |
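As a rough illustration, the parameters in the table could be assembled as follows. The parameter names come from the table, but everything else is an assumption made for this sketch: the `build_params` function, the example IDs, the `"desc"` spelling of the descending order, and the rule that resultsFrom equals (pageNo - 1) × resultsPerPage. The actual request format is not documented here.

```python
def build_params(uid, sid, page_no=1, results_per_page=10,
                 sortby="score", aggregations=None):
    """Assemble query parameters for one page of results (hypothetical helper)."""
    if sortby not in ("score", "post_time"):
        raise ValueError("sortby must be 'score' or 'post_time'")
    if page_no < 1:
        raise ValueError("page_no starts at 1")

    return {
        "uid": uid,                          # search client ID
        "sid": sid,                          # session ID
        "pageNo": page_no,
        "resultsPerPage": results_per_page,  # defaults to 10, as in the table
        # Assumption: the offset skips all results shown on earlier pages.
        "resultsFrom": (page_no - 1) * results_per_page,
        "sortby": sortby,
        "orderBy": "desc",                   # assumed spelling of "descending"
        "aggregations": aggregations or [],  # empty when no facets are checked
    }


# Example IDs are made up for illustration.
params = build_params(uid="client-123", sid="session-456", page_no=3)
print(params["resultsFrom"])
```

With the default of 10 results per page, page 3 skips the first 20 results, so the 21st result is displayed first.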
Related
Previous article: Crawling and Indexing Content Sources
Next article: How Documents Are Ranked in SearchUnify