Index RSS Feeds

The easiest way to search RSS feed content is to link the feeds to the SearchUnify RSS Crawler. Once linked, SearchUnify indexes new content as soon as it appears. You can limit indexing to the latest batch or keep all content in the index. Up to twelve content fields are supported, including links, descriptions, and titles.

PERMISSIONS.

SearchUnify ignores user permissions during searches. All indexed files can be searched by all users.

Establish a Connection

  1. Navigate to Content Sources.

  2. Click Add New Content Source.

  3. Find RSS through the search box and click Add.

  4. Under the Authentication tab, enter the following details and click Connect:
  • Name. Give your content source a name.
  • Client URL. Each RSS feed has a URL usually ending in .rss or .xml. Enter the URL of your RSS feed here.
  • Language. Select the content language. You can select multiple languages if your RSS feed content is in more than one language. English is selected by default.

Set up Crawl Frequency

The first crawl is always manual and is performed after configuring the content source. In Choose A Date, pick a date. Only the data created after the selected date will be crawled.
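The date cutoff can be pictured with a short Python sketch. It is only an illustration, not SearchUnify's implementation: it assumes the creation date corresponds to each item's RSS pubDate value, and the function name created_after and the sample items are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def created_after(items, cutoff):
    """Keep only the items whose pubDate is later than the cutoff."""
    kept = []
    for item in items:
        # RSS 2.0 dates follow the RFC 822 format, which email.utils can parse.
        published = parsedate_to_datetime(item["pubDate"])
        if published > cutoff:
            kept.append(item)
    return kept


# Hypothetical feed items, reduced to the two fields the sketch needs.
items = [
    {"title": "Old post", "pubDate": "Mon, 03 Jan 2022 09:00:00 +0000"},
    {"title": "New post", "pubDate": "Tue, 01 Mar 2022 09:00:00 +0000"},
]

# Stand-in for the date picked in Choose A Date.
cutoff = datetime(2022, 2, 1, tzinfo=timezone.utc)
recent = created_after(items, cutoff)
print([item["title"] for item in recent])
```

Only the item dated after the cutoff survives the filter, which mirrors the behavior described for the selected date.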

Smart Crawls is an option for frequency crawls, which are set up at the end of this article. When Smart Crawls is inactive, each frequency crawl removes the previously indexed data and generates a fresh index with up-to-date data. When it is turned on, the previously indexed data remains unaffected and new data is appended to the existing index.
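The difference between the two modes can be sketched in Python. This is a simplified model under stated assumptions, not SearchUnify's code: the index is represented as a dictionary keyed by item link, and the function name frequency_crawl is hypothetical. How SearchUnify treats documents that changed since the last crawl is not covered here.

```python
def frequency_crawl(index, fetched, smart_crawl):
    """Return the index as it would look after one scheduled crawl.

    index and fetched both map a document key (here, the item link)
    to the document content.
    """
    if smart_crawl:
        # Smart Crawls on: existing entries stay untouched, new ones are added.
        updated = dict(index)
        for key, document in fetched.items():
            updated.setdefault(key, document)
        return updated
    # Smart Crawls off: the old index is dropped and rebuilt from the feed.
    return dict(fetched)


index = {"https://example.com/a": "Post A", "https://example.com/b": "Post B"}
fetched = {"https://example.com/b": "Post B", "https://example.com/c": "Post C"}

rebuilt = frequency_crawl(index, fetched, smart_crawl=False)
appended = frequency_crawl(index, fetched, smart_crawl=True)
print(sorted(rebuilt))
print(sorted(appended))
```

In this model, Post A disappears from the rebuilt index because it is no longer in the feed, while the appended index keeps it alongside the newly fetched Post C.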

For now, ignore Smart Crawls, leave Frequency set to Never, and click Set.

Rules

Define the RSS content fields to be indexed. Currently, items is the only supported object. Click the icon to view the list of all preconfigured fields.

NOTE.

Only admins should edit or delete the fields.
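To see what fields such as title, link, and description look like in a feed, the following Python sketch reads them from the item elements of a small RSS 2.0 document. It is an illustration only: the sample feed, the function name extract_items, and the choice of three fields are assumptions, and the exact list of preconfigured fields in SearchUnify may differ.

```python
import xml.etree.ElementTree as ET

# A minimal, made-up RSS 2.0 feed with a single item.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>https://example.com/</link>
    <description>Sample feed</description>
    <item>
      <title>First post</title>
      <link>https://example.com/first</link>
      <description>Hello, world.</description>
    </item>
  </channel>
</rss>"""


def extract_items(feed_xml, fields=("title", "link", "description")):
    """Return one dictionary per <item>, holding the requested fields."""
    root = ET.fromstring(feed_xml)
    records = []
    for item in root.iter("item"):
        # A field missing from an item is returned as an empty string.
        records.append({name: item.findtext(name, default="") for name in fields})
    return records


records = extract_items(SAMPLE_FEED)
print(records)
```

Each dictionary corresponds to one feed item, which is the unit the Rules tab refers to as items.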

Save the Settings.

After the First Crawl

Return to the Content Sources screen and click in Actions. The number of indexed documents is updated after the crawl is complete. You can view crawl progress in Actions. Documentation on crawl progress is in View Crawl Logs.

NOTE 1

Review the settings in Rules if there is no progress in Crawl Logs.

NOTE 2

For Mamba '22 and newer instances, search isn't impacted during a crawl. In older instances, however, some documents remain inaccessible while a crawl is running.

Once the first crawl is complete, click in Actions to open the content source for editing, and set a crawl frequency.

  1. In Choose a Date, click to open a calendar and select a date. Only the data created after the selected date is indexed.

  2. Use the Frequency dropdown to select how often SearchUnify should index the data. For illustration, the frequency has been set to Weekly and Tuesday has been chosen as the crawling day. Whenever Frequency is set to anything other than Never, a third dropdown appears where you can specify the interval. Manual crawls are disabled whenever Frequency is set to Hourly.

  3. Click Set to save the crawl frequency settings. You are then taken to the Rules tab.