How Documents Are Ranked in SearchUnify
Once the relevant documents have been found, the next challenge is to sort them. SearchUnify considers more than a dozen factors before returning results in any particular order to a user. They have all been touched upon in this article.
The processed search terms are matched against the index using Lucene's search and sort algorithm. The best matches are found with a scoring formula that has the following variables:
- Term Frequency (tf). How often the query (q) appears in a document (d).
- Inverse Document Frequency (idf). How often q appears across the index.
- Coord. Number of terms in q that are found in d.
- lengthNorm. Importance of q in d.
- queryNorm. A variable to compare queries.
- Boost (Index). Boost at the time of indexing.
- Boost (Query). Boost at the time of query building.
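To make the variables concrete, here is a minimal sketch of how they combine in Lucene's classic TF-IDF practical scoring function. It assumes the default formulas of Lucene's classic similarity (square-root term frequency, logarithmic inverse document frequency); the function names, the token-list document model, and the sample data are illustrative only, and SearchUnify's actual configuration may differ.

```python
import math

def tf(freq):
    # Term frequency: grows with the square root of the raw count.
    return math.sqrt(freq)

def idf(doc_freq, num_docs):
    # Inverse document frequency: rarer terms weigh more.
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def length_norm(num_terms):
    # Shorter documents (fields) get a higher norm.
    return 1.0 / math.sqrt(num_terms)

def classic_score(query_terms, doc_terms, doc_freqs, num_docs,
                  index_boost=1.0, query_boosts=None):
    """Score one document (a list of tokens) against a list of unique query terms."""
    if not query_terms or not doc_terms:
        return 0.0
    query_boosts = query_boosts or {}

    # queryNorm: makes scores from different queries comparable.
    sum_sq = sum(
        (idf(doc_freqs.get(t, 0), num_docs) * query_boosts.get(t, 1.0)) ** 2
        for t in query_terms
    )
    query_norm = 1.0 / math.sqrt(sum_sq) if sum_sq else 0.0

    # coord: fraction of query terms found in the document.
    matched = [t for t in query_terms if t in doc_terms]
    coord = len(matched) / len(query_terms)

    # lengthNorm combined with the index-time boost.
    norm = length_norm(len(doc_terms)) * index_boost

    total = sum(
        tf(doc_terms.count(t))
        * idf(doc_freqs.get(t, 0), num_docs) ** 2
        * query_boosts.get(t, 1.0)   # query-time boost
        * norm
        for t in matched
    )
    return coord * query_norm * total

# Hypothetical two-document index.
doc_a = ["pyramid", "construction", "pyramid"]
doc_b = ["pyramid", "history", "of", "egypt"]
doc_freqs = {"pyramid": 2, "construction": 1}
query = ["pyramid", "construction"]

score_a = classic_score(query, doc_a, doc_freqs, num_docs=2)
score_b = classic_score(query, doc_b, doc_freqs, num_docs=2)
```

In this sample, `doc_a` matches both query terms and therefore scores higher than `doc_b`, which matches only one.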
The mapped documents are further ranked based on a proprietary algorithm. The variables in the algorithm include:
Documents with the highest score are likely to end up near the top.
Auto Tuning plays a big role here. If it's turned on, then user activity history is taken into account in deciding how the relevant documents will be ranked. The search results for one user might differ from those of another user. For details, check out How Auto Tuning Works and Its Features.
Search Tuning Settings
Three tuning configurations are available to SearchUnify admins:
Keyword Tuning
Assign a document any rank between 1 and 10 for a query or a series of queries. You cannot assign two or more documents the same rank for a query; document-1 and document-2 cannot have the same rank for query-1. However, a document can have the same rank for one or multiple queries; document-1 can have the same rank for query-1 and query-2. A setting in keyword tuning overrules all other factors and returns the document at its assigned rank. For details, check out Boost Documents for Specific Keywords.
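The rank rules above can be sketched in a few lines. This is an illustration under stated assumptions, not SearchUnify's implementation: the `KeywordTuning` class, its method names, and the document and query identifiers are all hypothetical.

```python
class KeywordTuning:
    """Hypothetical store of admin-assigned ranks, keyed by query."""

    def __init__(self):
        self._pins = {}  # query -> {rank: doc_id}

    def pin(self, query, doc_id, rank):
        if not 1 <= rank <= 10:
            raise ValueError("rank must be between 1 and 10")
        ranks = self._pins.setdefault(query, {})
        holder = ranks.get(rank)
        if holder is not None and holder != doc_id:
            # Two documents cannot share a rank for the same query.
            raise ValueError(f"rank {rank} is already taken for {query!r}")
        # Assumption: a document holds at most one rank per query.
        for old_rank, pinned_doc in list(ranks.items()):
            if pinned_doc == doc_id:
                del ranks[old_rank]
        ranks[rank] = doc_id

    def apply(self, query, ranked_ids):
        """Place pinned documents at their ranks; keep the rest in score order."""
        pins = self._pins.get(query, {})
        pinned = set(pins.values())
        result = [doc for doc in ranked_ids if doc not in pinned]
        for rank in sorted(pins):
            result.insert(min(rank - 1, len(result)), pins[rank])
        return result

tuning = KeywordTuning()
tuning.pin("query-1", "document-3", 1)
tuning.pin("query-2", "document-3", 1)  # same document, same rank, other query: allowed

by_score = ["document-1", "document-2", "document-3"]
tuned = tuning.apply("query-1", by_score)
```

Because pinned documents are inserted after scoring, the assigned rank wins regardless of the underlying relevance score.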
Content Source Tuning
Instead of moving one document up or down at a time, admins can boost all the documents in a content source with content source tuning. Content boosting increases the default relevancy score of all documents in that source. In the images, you can see what happens when StackOverflow is boosted as a content source.
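As a rough illustration, a content source boost can be modeled as a multiplier applied to every hit from that source before the results are re-sorted. Whether SearchUnify applies the boost multiplicatively is an assumption here, and the function name, hit structure, and boost values are hypothetical.

```python
def apply_source_boost(hits, source_boosts):
    """Return hits re-sorted after scaling each score by its source's boost.

    Sources without an entry in source_boosts keep their original score.
    """
    boosted = [
        dict(hit, score=hit["score"] * source_boosts.get(hit["source"], 1.0))
        for hit in hits
    ]
    return sorted(boosted, key=lambda hit: hit["score"], reverse=True)

hits = [
    {"id": "a", "source": "docs", "score": 2.0},
    {"id": "b", "source": "stackoverflow", "score": 1.5},
]
reordered = apply_source_boost(hits, {"stackoverflow": 2.0})
```

With the boost in place, the StackOverflow hit moves ahead of the documentation hit even though its original score was lower.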
Custom Tuning
Supported fields are a document's title, its status, age, and popularity. An admin can alter the default emphasis on each field. For instance, a keyword match in the title can multiply a document's relevancy score by two, three, or more. In the images, you can see the impact of title boosting for the keyword "unavailability." For details, check out Custom Tuning: Boost Articles Based on Keyword Match in a Field.
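Field emphasis can be pictured as a per-field weight on the match score. The sketch below assumes a simple weighted sum; the function name, field names, and weights are hypothetical and only illustrate the idea of title boosting.

```python
def field_boosted_score(field_scores, field_weights):
    """Combine per-field match scores, scaling each by its admin-set weight.

    Fields without an explicit weight default to 1.0.
    """
    return sum(
        score * field_weights.get(field, 1.0)
        for field, score in field_scores.items()
    )

# A keyword matched in both the title and the body of a document.
matches = {"title": 1.0, "body": 0.5}
default_score = field_boosted_score(matches, {})
title_boosted = field_boosted_score(matches, {"title": 3.0})
```

Tripling the title weight lifts this document's score from 1.5 to 3.5, so documents with the keyword in the title move up relative to those that match only in the body.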
Recent documents are preferred over older docs.
- Offset. The period during which the score of a newly updated document doesn't change.
- Scale. Once Offset has passed, the score begins to drop. It continues to drop for a time period known as Scale.
- Decay [Rate]. The rate at which the relevance score drops during Scale is controlled by Decay Rate.
Example. A search for ‘pyramid construction’ before and after the creation of a new doc. The new doc is prioritized in the second image.
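The interplay of Offset, Scale, and Decay Rate can be sketched as a multiplier on a document's score. This sketch assumes an exponential curve, similar in spirit to the decay functions found in Lucene-based search engines, and it assumes the multiplier stops dropping once Scale has elapsed. The function name and the parameter values are hypothetical.

```python
def recency_multiplier(age_days, offset_days, scale_days, decay):
    """Score multiplier for a document last updated age_days ago.

    - Within offset_days the score is untouched (multiplier 1.0).
    - Over the following scale_days it falls exponentially to `decay`.
    - After that it stays at `decay` (an assumption of this sketch).
    """
    if not 0.0 < decay <= 1.0:
        raise ValueError("decay must be in (0, 1]")
    if scale_days <= 0:
        raise ValueError("scale_days must be positive")
    if age_days <= offset_days:
        return 1.0
    elapsed = min(age_days - offset_days, scale_days)
    return decay ** (elapsed / scale_days)

fresh = recency_multiplier(age_days=3, offset_days=7, scale_days=30, decay=0.5)
midway = recency_multiplier(age_days=22, offset_days=7, scale_days=30, decay=0.5)
old = recency_multiplier(age_days=100, offset_days=7, scale_days=30, decay=0.5)
```

With these sample settings, a document updated three days ago keeps its full score, one updated 22 days ago is partway down the curve, and one updated 100 days ago sits at the floor set by the decay value.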
Some search queries activate SearchUnify's machine learning algorithms. As a result, a user receives up to 15 (instead of the usual 10) results. Machine learning algorithms fetch those extra results by applying facets and autocorrecting some search terms.
Last updated: Friday, November 27, 2020