Set up a Content Source with API

SearchUnify supports more than 30 content sources out-of-the-box. Any data repository that is not on the list can be turned into a content source through its APIs. The connection is secure because the API framework is OAuth 1.0 compliant. This article walks you through the process.


Prerequisites

  • A content source with a working API and structured data
  • Familiarity with essential REST API terminology
  • Access to the API of the platform where the content resides

Establish a Connection

  1. Navigate to Content Sources.

  2. Click Add New Content Source.

  3. Find API and click Add.

  4. Enter the following details for authentication:
    • Name. Insert a label for your content source.

    • Client URL. Enter the base URL of your API.

    • Language. Select the language of your API platform. English-en is the default selection.

    • Permission. Leave this field blank.

  5. Click Connect.

Configure the Calls

The Configuration section covers the set-up process to crawl and index your content through APIs.

  1. Select either GET or POST from Method Type.

  2. Select an Authentication Type. You can pick:
    • No Auth. No details are needed to authenticate the connection with your content source.
    • Basic Auth. Enter the user name and password required to make calls to your REST API.
    • OAuth1. Enter the Consumer Key, Consumer Secret, Token, Token Secret, and Realm to make calls to your REST API.

  3. Write all the key-value pairs to be used in Header in JSON. If you want to keep the field empty, write a pair of curly braces ({}).
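For instance, a token-protected API (the header names here are hypothetical) might need a Header value like the one below; round-tripping it through `json.dumps` is a quick way to confirm the value is valid JSON before pasting it in:

```python
import json

# Hypothetical header block for a token-protected API.
# Paste {} into the Header field if no headers are required.
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer <your-access-token>",
}

# The Header field expects valid JSON, so verify the pairs serialize cleanly.
print(json.dumps(headers))
```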

  4. Open Request Parameters.

  5. Enter query parameters in JSON to consume your REST API.
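As an example, a platform that pages by number (all field names here are assumptions) might take request parameters like these; `page` is the kind of field you would later select as the Page No. Tag and `pageSize` as the Page Size Tag:

```python
import json

# Hypothetical query parameters for a platform that pages by number.
params = {
    "page": 1,          # incremented by the crawler on each call
    "pageSize": 50,     # number of documents returned per call
    "updatedAfter": "2024-01-01",
}

# Like Header, Request Parameters expects valid JSON.
print(json.dumps(params))
```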

  6. Select a pagination method:
    • By Page No. Almost no platform returns all of its data in a single call, so the crawler has to make multiple calls, each slightly different from the last. The difference is usually one field in your query parameters, which you select in the Page No. Tag dropdown (the dropdown appears when you pick By Page No. as your pagination method). The crawler increments that field on each call to fetch a new batch of documents. Default Page Size Tag, which also appears when you select By Start Offset, determines the size of each page; if its parameter is set to 10, each page holds 10 results.
    • By Start Offset. It is similar to By Page No.; the only difference is that the variable used to differentiate calls is Page Start Offset Tag.
    • By Next Link. It is an object with two key-value pairs: a Boolean and a link. The crawler uses the value stored in the link as the differentiating variable as long as the Boolean is true. For Next Link Tag, select the URL key used to move to the next page.
    • No Pagination. All the data is indexed in one go. It is extremely rare to encounter a content source that accepts this method.
    • Has More & Next ID. It is an object with two key-value pairs: a Boolean and a tag. The crawler uses the value stored in the tag as the differentiating variable as long as the Boolean is true. In the NEXT ID field, select the key in the response that holds the ID of the next page.
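As a sketch, the By Page No. method combined with a By Total Count stop condition behaves roughly like the loop below. The field names (`page`, `pageSize`, `results`, `totalCount`) and the stubbed fetcher are assumptions standing in for real API calls:

```python
# Stand-in for the content source: 25 documents behind a paged API.
FAKE_STORE = [{"id": i, "title": f"Doc {i}"} for i in range(1, 26)]

def fetch_page(page: int, page_size: int) -> dict:
    """Simulates one GET/POST call to the content source."""
    start = (page - 1) * page_size
    return {
        "totalCount": len(FAKE_STORE),                   # Total Count Tag
        "results": FAKE_STORE[start:start + page_size],  # Result Iteration Tag
    }

def crawl(page_size: int = 10) -> list:
    indexed, page = [], 1
    while True:
        batch = fetch_page(page, page_size)
        indexed.extend(batch["results"])
        # Stop once the indexed count reaches the reported total
        # (the "By Total Count" flavor of Total Results Method).
        if len(indexed) >= batch["totalCount"]:
            return indexed
        page += 1  # the Page No. Tag is incremented for the next call

docs = crawl()
print(len(docs))  # 25
```

Three calls of 10, 10, and 5 documents cover the store; the loop ends as soon as the running count matches the reported total.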

  7. Open Response Parameters and enter a JSON snippet that contains all the response parameters in Response.

  8. Select a Result Iteration Tag and an Index Tag. Result Iteration Tag is the response field that contains the data you wish to index. Index Tag contains a unique ID that distinguishes documents from one another.
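In the hypothetical response below, `items` would play the role of the Result Iteration Tag and `docId` the role of the Index Tag; keying the documents by `docId` shows why the ID must be unique:

```python
# Hypothetical response from a content source.
response = {
    "totalCount": 2,
    "items": [                                   # Result Iteration Tag
        {"docId": "a1", "title": "Install guide"},   # docId: Index Tag
        {"docId": "b2", "title": "Release notes"},
    ],
}

# Each document lands in the index under its unique ID.
indexed = {item["docId"]: item for item in response["items"]}
print(sorted(indexed))  # ['a1', 'b2']
```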

  9. SearchUnify stops making calls to your content source once all your documents have been indexed. Total Results Method keeps count of the total and indexed documents; there are two ways to keep this count: By Total Count or By Has More Flag. Select one.

  10. From Total Count Tag, select the parameter that returns the total count. Then click Set.

Set Up Crawl Frequency

The first crawl is always manual and is performed after configuring the content source. In Choose A Date, select a date to start crawling; only the data created after the selected date is crawled. For now, keep Frequency at its default value, Never, then click Set and move to the next section.
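The date filter can be pictured as a simple comparison; the document structure and the `created` field below are assumptions for illustration:

```python
from datetime import date

# Hypothetical documents with an ISO-formatted creation date.
documents = [
    {"id": 1, "created": "2023-12-15"},
    {"id": 2, "created": "2024-02-01"},
    {"id": 3, "created": "2024-03-10"},
]

start = date(2024, 1, 1)  # the date picked in Choose A Date

# Only content created after the selected date is crawled.
to_crawl = [d for d in documents if date.fromisoformat(d["created"]) > start]
print([d["id"] for d in to_crawl])  # [2, 3]
```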

Add the Objects and Fields for Indexing

  1. Enter an object where your data is stored, give it a label, and click Add Object.

  2. Repeat the previous step if you wish to add more than one object.
  3. Click to manage fields.

  4. Add all the fields that you wish to index and click Save.

  5. Click Save.

After the First Crawl

Return to the Content Sources screen and click in Actions. The number of indexed documents is updated after the crawl is complete. You can view crawl progress in Actions. Documentation on crawl progress is in View Crawl Logs.


Review the settings in Rules if there is no progress in Crawl Logs.


For Mamba '22 and newer instances, search isn't impacted during a crawl. However, in older instances, some documents remain inaccessible while a crawl is going on.

Once the first crawl is complete, click in Actions, open the content source for editing, and set a crawl frequency.

  1. In Choose a Date, click to open a calendar and select a date. Only the data created after the selected date is indexed.

  2. Use the Frequency dropdown to select how often SearchUnify should index the data. For illustration, the frequency has been set to Weekly with Tuesday as the crawling day. Whenever Frequency is set to anything other than Never, a third dropdown appears where you can specify the interval. When Frequency is set to Hourly, manual crawls are disabled.

  3. Click Set to save crawl frequency settings. On clicking Set, you are taken to the Rules tab.

Smart Crawls

Smart Crawls reduce indexing time. When it's toggled on, the search index isn't created from scratch during each frequency crawl. Instead, it's updated to reflect the data changes in your content source. Say two new docs have been created in your content source since the previous crawl; in this scenario, only the new docs are added to the index.

When Smart Crawls is toggled off, the search index is generated from scratch during each frequency crawl. Instead of merely adding the docs that have been created in your content source since the last crawl, the crawler deletes the current index entirely and builds a brand-new one.
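The difference between the two modes can be sketched as follows; the dict-based index and the merge logic are illustrative assumptions, not the crawler's actual implementation:

```python
# Content source after two new docs were created since the previous crawl.
source = {"a1": "Install guide", "b2": "Release notes", "c3": "FAQ"}
index = {"a1": "Install guide", "b2": "Release notes"}  # previous crawl

def smart_crawl(index: dict, source: dict) -> dict:
    """Smart Crawls on: keep the index and fold in only the changes."""
    updated = dict(index)
    updated.update(source)  # only new/changed docs are touched
    return updated

def full_crawl(source: dict) -> dict:
    """Smart Crawls off: discard the index and rebuild it from scratch."""
    return dict(source)

# Both modes end with the same index; Smart Crawls just gets there
# by processing far fewer documents.
print(sorted(smart_crawl(index, source)))  # ['a1', 'b2', 'c3']
```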