Difference between Keyword, Neural and Hybrid search
Admins can choose between two retrieval algorithms for search clients: the long-established Keyword Search and the newly introduced Neural Search. For those who want to have their cake and eat it too, Hybrid Search is the way forward. It combines the best of Keyword Search and Neural Search.
Keyword Search
Keyword Search has been around for a long time. In Keyword Search, the data stored in a content source is broken into pieces called "tokens". Those tokens, along with the document information, are stored in a file called the "index".
Imagine a document that contains only one sentence: "Bagira is a black cat." That document can produce four tokens: "Bagira", "is", "black", and "cat". The index stores all four tokens with mapping information, so that you can tell a token's source document by looking at the token.
When a user runs a search query, the query is matched against the tokens stored in the index. If a match is found, the mapped document is returned.
This kind of search works very well for exact-match queries. For example, a user can enter a case ID in the search box to find that case quickly.
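The tokenize, index, and match steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the product's implementation: the function names, the lowercasing, and the one-word stop list (which drops "a" so that the sentence yields the four tokens from the example) are all assumptions made for the sketch.

```python
import re
from collections import defaultdict

# Assumption: a tiny stop-word list, used only so that the example
# sentence produces the four tokens named in the text.
STOP_WORDS = {"a"}

def tokenize(text):
    """Lowercase the text, split it into words, and drop stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]

def build_index(documents):
    """Map each token to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def keyword_search(index, query):
    """Return the IDs of documents that contain every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

# Hypothetical document IDs and contents.
documents = {
    "doc0": "Bagira is a black cat.",
    "docX": "I hate the color black.",
}
index = build_index(documents)
print(tokenize(documents["doc0"]))         # tokens for the example sentence
print(keyword_search(index, "black cat"))  # documents containing both tokens
```

Because the lookup goes straight from token to document, an exact string such as a case ID is found immediately, while a query worded differently from the document finds nothing.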
Neural Search
Neural Search is different. Instead of storing tokens, it stores numbers. The entire sentence "Bagira is a black cat." can be stored as a series of three numbers, [8, 6, 3].
The entire series of numbers is known as a “vector” and each number in that series is called a “vector distance.” When data is stored or “vectorized”, the vector distances for each document are measured. Let’s assume that we have four documents in the content source.
Document 0: "Bagira is a black cat."
Document X: "I hate the color black."
Document Y: "Nocturnal animals are everywhere."
Document Z: "I like black cats."
Again, assume that the vectorization values of Document 0 are [8, 6, 3]. Here's how those numbers are obtained:
Document X is about colors and is only tangentially related to Document 0, so it is 8 units away from Document 0 along the dimension "color". Document Y is about animals in general, not just cats, so it is 6 units away along the dimension "animal". Document Z is only 3 units away from Document 0 because it also talks about "black cats."
Our sentence was "encoded" in three dimensions: "color", "animal", and "black cats." In a large dataset, the encoding can be based on thousands of dimensions. In each dimension, the document is a certain distance away from other documents. "Cats are good." and "Streaming is good." may be thousands of units away in one dimension (subject: "cats" and "streaming") but very close in another dimension (quality: "good").
When Neural Search is on and a user runs a query, the query is encoded into a vector and the document vectors nearest to it in the content sources are found. This works very well for generic searches. For example, "best movies of 2024" can produce a list based on several criteria, including the release year and the ratings on different sites.
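The "find the nearest vectors" step can be illustrated with a small Python sketch. Everything numeric here is an assumption: the three dimensions and the hand-assigned vectors are toy values chosen for readability, whereas a real system would obtain vectors with many more dimensions from an embedding model. Cosine similarity is used as the closeness measure, which is one common choice and not necessarily the one the product uses.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Assumption: hand-assigned toy vectors over three made-up dimensions
# (color, animal, cat). They are not produced by any real model.
doc_vectors = {
    "Document 0": [0.7, 0.8, 0.9],  # "Bagira is a black cat."
    "Document X": [0.9, 0.0, 0.0],  # "I hate the color black."
    "Document Y": [0.0, 0.9, 0.1],  # "Nocturnal animals are everywhere."
    "Document Z": [0.6, 0.7, 0.9],  # "I like black cats."
}

def neural_search(query_vector, vectors, top_k=2):
    """Return the top_k document IDs whose vectors are closest to the query."""
    scored = [(cosine_similarity(query_vector, v), doc_id) for doc_id, v in vectors.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:top_k]]

# Assumption: a hypothetical encoding of a query about dark-colored cats.
query_vector = [0.6, 0.8, 1.0]
print(neural_search(query_vector, doc_vectors))
```

With these toy values, the two documents about black cats rank above the ones about colors or animals in general, even though the query shares no exact words with them.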
Differences between Keyword Search and Neural Search
Neural Search | Keyword Search |
Finds semantic (meaning) matches | Finds syntactic (word) matches |
Highly context dependent | Moderately context dependent |
Works excellently on short queries | Works better on long queries |
Language agnostic | Needs to be configured for each language (e.g., Japanese, English) |
Supports image and video search | Supports text search |
To use Neural Search, check out Relevancy Configurations: Default Search Operator, Special Character Search, Neural Search, Hybrid Search and Full Neural Search (with Hybrid)
Hybrid Search
For a search engine that can handle multiple kinds of queries, you need Neural Search for generic queries and Keyword Search for exact-match queries. Hybrid Search gives you both.
In Hybrid Search, each search query is treated both as plain text (Keyword Search) and as a vector (Neural Search). For exact-match queries, such as "ticket AB123CD", and for general queries, such as "most used SearchUnify features", results are retrieved through both Keyword Search and Neural Search. Those two streams of results are then combined, and a Re-Ranker orders them before they are presented to the user. The Re-Ranker is smart enough to prioritize Keyword Search for the query "ticket AB123CD" and Neural Search for the query "most used SearchUnify features".
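To make the "combine two streams, then order them" step concrete, here is a small Python sketch. The article does not describe how the Re-Ranker works, so this sketch substitutes reciprocal rank fusion (RRF), a widely used generic method for merging ranked lists. The function name, the constant k=60, and the document IDs are assumptions for illustration only.

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked highly by both retrievers rise to the top.
    """
    scores = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by document ID for stable output.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical result streams for the query "ticket AB123CD".
keyword_results = ["case-AB123CD", "doc-7"]
neural_results = ["doc-3", "case-AB123CD", "doc-7"]

fused = reciprocal_rank_fusion([keyword_results, neural_results])
print(fused)
```

Unlike the Re-Ranker described above, plain RRF weighs both streams equally for every query. It does not decide per query whether Keyword Search or Neural Search should dominate.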
To use Hybrid Search, check out Relevancy Configurations: Default Search Operator, Special Character Search, Neural Search, Hybrid Search and Hybrid Search