Content Source

In SearchUnify, a Content Source is an external data repository that SearchUnify connects to in order to index and retrieve information for search experiences. Content sources include Knowledge Bases, Community Platforms, CMS & Document Repositories, and Learning Management Systems (LMS), among others.

Each content source is integrated through a crawler that fetches data at scheduled intervals. The indexed data is then processed with vectorization and ranking algorithms in Neural and Hybrid Search to deliver more relevant, context-aware search results.

NOTE:

Large documents slow down search. A document is any item returned in search results: an HTML page, a Jira issue, a knowledge article, a Khoros community post, a Google Drive file, or any other object in your content source. To maintain performance, an upper size limit of 12 MB has been introduced. If the combined textual content in all the content fields of a document reaches or exceeds 12 MB, the document is not crawled.

Scenario 1: You need to crawl Zendesk articles, and you've indexed three fields: Title, Description, and Publication Status. If the combined text in those fields amounts to 12 MB or more, the article is not crawled.

Scenario 2: You need to crawl Zendesk articles, and you've indexed four fields: Title, Description, Publication Status, and Attachments. In this case, the crawler first extracts text from the attached file and then measures its size. If the combined size of the text stored in all four indexed fields is less than 12 MB, the article is crawled.
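The size rule in the two scenarios can be sketched as a simple check. This is an illustrative sketch, not SearchUnify's implementation. It assumes that size is measured as UTF-8 bytes, that 1 MB means 1024 × 1024 bytes, and that attachment text has already been extracted into a plain string. The names `combined_text_size` and `should_crawl` are hypothetical.

```python
# Illustrative sketch of the 12 MB document size rule (not SearchUnify code).
# Assumptions: size = UTF-8 bytes, 1 MB = 1024 * 1024 bytes, and attachment
# text has already been extracted to a string before measuring.

SIZE_LIMIT_BYTES = 12 * 1024 * 1024  # assumed interpretation of "12 MB"


def combined_text_size(fields: dict[str, str]) -> int:
    """Return the total UTF-8 byte size of the text in all indexed fields."""
    return sum(len(text.encode("utf-8")) for text in fields.values())


def should_crawl(fields: dict[str, str]) -> bool:
    """Crawl a document only if its combined field text is under the limit."""
    return combined_text_size(fields) < SIZE_LIMIT_BYTES


# Scenario 2: four indexed fields, with the attachment already converted to text.
article = {
    "Title": "Reset your password",
    "Description": "Steps to reset a forgotten password.",
    "Publication Status": "published",
    "Attachments": "x" * (13 * 1024 * 1024),  # hypothetical 13 MB of extracted text
}
print(should_crawl(article))  # False
```

Because the check sums every indexed field, adding a field such as Attachments can push an otherwise small article over the limit.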


Supported Platforms

SearchUnify supports more than 40 platforms, including Salesforce, Higher Logic (Vanilla and Thrive), Adobe Experience Manager, and websites. Each platform can be indexed and searched in a few clicks. Admins can connect several instances of a platform to SearchUnify, although the recommended practice is to connect only one of each kind.

To view all the SearchUnify-compatible content sources, first navigate to Content Sources and then click Add new content source.

The content sources are grouped into seven categories: Cloud Storage, Collaboration, Content Management System, CRM and Support, Learning Management System, Search Engine, and Others. The screen, but not the dropdown, also shows an eighth category, Popular, which lists the most frequently used content sources on the SearchUnify platform.

Use the dropdown to find all the content sources in a category.

You can also find a content source through the search box. Enter its name, and results appear as soon as you start typing.

Supported Languages

SearchUnify can crawl, index, and search content in more than 30 languages, including Arabic, Armenian, Basque, Brazilian, Bulgarian, Catalan, Chinese, Czech, Danish, Dutch, English, Finnish, French, Galician, German, Greek, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Korean, Kurdish, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Spanish, Swedish, Turkish, and Thai.

Installed and New Content Sources

An installed icon on a content source tab shows that it has already been installed. In the next image, YouTube and Vimeo are already installed, but Box, Dropbox, Github, and Google Drive aren't.

If you spot the icon that marks a new content source, you can safely assume that the content source was added in the previous release. In the next image, you can see that SkillJar is a newcomer.

To install a content source, click Add (red). To open its official documentation, click Know More (blue).

If a content source isn't listed, you can suggest it as an idea in the SearchUnify Community by clicking Create an Idea in SearchUnify Community.