Use SharePoint As a Content Source

This article walks you through adding SharePoint as a content source in SearchUnify.

Prerequisites

  • Only documents, pages, and lists can be crawled.

  • You must be an admin to crawl data from all SharePoint sites. Non-admin users can crawl only the sites they have access to.

  • Any SharePoint user can crawl community sites.

  • Data returned from SharePoint may be throttled. Review the latest SharePoint rate limits.

  • SearchUnify crawls the following attachment types: PDF, DOCX, TXT, DOC, PPT, PPTX, CSV, XLS, and POTX.
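The attachment rule above is, in effect, a check on the file extension. The Python sketch below illustrates it under stated assumptions: the constant and the helper `is_crawlable_attachment` are hypothetical, and SearchUnify may detect file types differently (for example, by content rather than by extension).

```python
from pathlib import PurePosixPath

# Attachment types listed as crawlable in this article.
# Hypothetical constant; not part of any SearchUnify API.
SUPPORTED_ATTACHMENT_TYPES = {
    "pdf", "docx", "txt", "doc", "ppt", "pptx", "csv", "xls", "potx",
}


def is_crawlable_attachment(filename: str) -> bool:
    """Return True if the file extension is one of the listed types.

    Illustrative sketch only: the extension is compared case-insensitively
    and the file's actual contents are ignored.
    """
    suffix = PurePosixPath(filename).suffix.lower().lstrip(".")
    return suffix in SUPPORTED_ATTACHMENT_TYPES


if __name__ == "__main__":
    for name in ["Q3 Report.PDF", "notes.txt", "budget.xlsx", "README"]:
        print(name, is_crawlable_attachment(name))
```

With this list, `budget.xlsx` is rejected because XLSX is not among the listed types, and a file with no extension is rejected as well.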

Establish a Connection

  1. Navigate to Content Sources and click Add New Content Sources.

  2. Use the search box to find SharePoint and click Add.

  3. Under the Authentication tab, enter the required details:

    • Name. Enter a label. Labels help you distinguish content sources from one another.

    • Client URL. Enter the web address of your SharePoint instance.

    • Authentication Type. Select either Basic or OAuth.

      • Basic requires your SharePoint login ID and password.

      • OAuth requires a Client ID and Client Secret. To learn how to get them, see Obtain Client ID and Client Secret for SharePoint Authentication.

    • Language. Select the content language.

  4. Click Connect.
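The fields required on the Authentication tab depend on the authentication type. As an illustration only, the following Python sketch models the form as a dictionary and reports which required fields are missing; the field keys and the `missing_fields` helper are hypothetical and do not reflect a documented SearchUnify API.

```python
from typing import Dict, List

# Hypothetical field keys for the Authentication tab (not a SearchUnify API).
COMMON_FIELDS = ("name", "client_url", "language")
FIELDS_BY_AUTH_TYPE = {
    "Basic": ("login_id", "password"),        # SharePoint login ID and password
    "OAuth": ("client_id", "client_secret"),  # app credentials
}


def missing_fields(form: Dict[str, str]) -> List[str]:
    """Return the required fields that are absent or empty for the chosen auth type."""
    auth_type = form.get("auth_type")
    if auth_type not in FIELDS_BY_AUTH_TYPE:
        raise ValueError("auth_type must be 'Basic' or 'OAuth'")
    required = COMMON_FIELDS + FIELDS_BY_AUTH_TYPE[auth_type]
    return [field for field in required if not form.get(field)]


# Example: an OAuth form with the Client Secret left blank.
form = {
    "name": "Intranet",
    "client_url": "https://example.sharepoint.com/sites/intranet",  # placeholder URL
    "auth_type": "OAuth",
    "language": "English",
    "client_id": "00000000-0000-0000-0000-000000000000",  # placeholder value
    "client_secret": "",
}
print(missing_fields(form))  # ['client_secret']
```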

After the connection is established, you are taken to the next step: Set Frequency.

Set Up Crawl Frequency

The first crawl is always manual and is run after the content source is configured. In Choose a Date, select a start date; only data created after this date is crawled. For now, leave Frequency at its default value, Never, click Set, and move on to the next section.
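The start date acts as a filter on creation time. The Python sketch below shows that rule on hypothetical records; the record layout and the `created_after` helper are assumptions, and so is the strict comparison, since this article does not say whether items created on the selected date itself are included.

```python
from datetime import datetime, timezone

# Hypothetical records; real SharePoint items carry many more properties.
items = [
    {"title": "Onboarding guide", "created": datetime(2023, 1, 15, tzinfo=timezone.utc)},
    {"title": "Release notes", "created": datetime(2024, 3, 2, tzinfo=timezone.utc)},
]


def created_after(records, start):
    """Return the records whose creation time is strictly later than `start`.

    Strict comparison is an assumption; the inclusive/exclusive boundary
    is not specified for SearchUnify.
    """
    return [record for record in records if record["created"] > start]


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
print([record["title"] for record in created_after(items, start)])  # ['Release notes']
```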

Select Fields and Websites for Indexing

SearchUnify can index three SharePoint content types: list, page, and document. On the Rules tab, you land on the By Content Type subtab, where you can define which properties (content fields) of each content type are indexed.

  1. Click to view the pre-configured properties of a content type.

    NOTE. You can add or delete content fields. However, only admins should make changes to the fields.

  2. Go to the By Sites subtab and use the alphabetical index to find and select the SharePoint websites you want to index.

  3. Save your settings.
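Taken together, the two subtabs decide which items are indexed and which of their fields are kept. The following Python sketch is a hypothetical model of those settings: the property names, site names, and the `to_index_record` helper are illustrative and are not the pre-configured SearchUnify properties.

```python
from typing import Dict, List, Optional, Set

# Hypothetical Rules tab settings.
# By Content Type: properties (content fields) indexed for each content type.
CONTENT_TYPE_FIELDS: Dict[str, List[str]] = {
    "list": ["title", "created"],
    "page": ["title", "body"],
    "document": ["title", "author", "file_type"],
}
# By Sites: SharePoint sites selected for indexing.
SELECTED_SITES: Set[str] = {"HR", "Engineering"}


def to_index_record(item: dict) -> Optional[dict]:
    """Return the configured fields of `item`, or None if it is out of scope."""
    fields = CONTENT_TYPE_FIELDS.get(item.get("content_type", ""))
    if fields is None or item.get("site") not in SELECTED_SITES:
        return None  # unsupported content type or unselected site
    # Keep only the configured properties that the item actually has.
    return {name: item[name] for name in fields if name in item}


page = {
    "content_type": "page",
    "site": "HR",
    "title": "Leave policy",
    "body": "Employees accrue leave monthly.",
    "editor": "jdoe",
}
print(to_index_record(page))
```

Here the `editor` property is dropped because it is not configured for pages, and an item from an unselected site would return `None`.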

You have successfully added SharePoint as a content source in SearchUnify. Run a manual crawl to start indexing your SharePoint data.

Related

Difference between Manual and Frequency Crawls

After the First Crawl

Return to the Content Sources screen and start a manual crawl from Actions. The number of indexed documents is updated once the crawl is complete. You can also track crawl progress from Actions; for details, see View Crawl Logs.

Once the first crawl is complete, open the content source for editing from Actions and set a crawl frequency.

  1. In Choose a Date, open the calendar and select a date. Only data created after the selected date is indexed.

  2. Use the Frequency dropdown to select how often SearchUnify indexes the data. For example, you can set Frequency to Weekly and choose Tuesday as the crawl day. When Frequency is set to any value other than Never, a third dropdown appears in which you can specify the interval. When Frequency is set to Hourly, manual crawls are disabled.

  3. Click Set to save the crawl frequency settings. You are then taken to the Rules tab.
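The two Frequency rules in step 2 can be expressed as a small function. This is a hypothetical Python sketch of the UI behavior described above, not SearchUnify code; `schedule_controls` and its keys are illustrative names.

```python
def schedule_controls(frequency: str) -> dict:
    """Describe which scheduling controls apply for a Frequency value.

    Encodes two rules from this article:
    - any value other than "Never" shows the third (interval) dropdown;
    - "Hourly" disables manual crawls.
    """
    return {
        "interval_dropdown_shown": frequency != "Never",
        "manual_crawl_enabled": frequency != "Hourly",
    }


for value in ("Never", "Weekly", "Hourly"):
    print(value, schedule_controls(value))
```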