Use Contentful as a Content Source

This article shows how to establish a connection between SearchUnify and your Contentful CMS platform. Through the connection, SearchUnify can index data stored in your Contentful spaces and environments and make it searchable.

NOTE.

Deleted and archived documents are removed from the SearchUnify index during every frequency crawl.

Establish a Connection

  1. Navigate to Content Sources and click Add New Content Sources.

  2. Find Contentful and click Add.

  3. Enter the following details:

    • Name. A descriptive label used to identify your content source. It is useful when you have several content sources in an instance.

    • Language. Select the language your data is in. More than 20 languages are supported.

    • Space ID. Enter the ID of the space to be indexed. For details, see Get Contentful Space ID and API Key.

    • Access Token. Enter the Content Delivery API access token corresponding to the space ID.

  4. After entering the details, click Connect.
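Before entering the Space ID and Access Token, it can help to confirm that Contentful accepts the pair. The following is a minimal sketch, not part of SearchUnify: it assumes the standard Content Delivery API host (cdn.contentful.com) and bearer-token authentication, and the function names are hypothetical.

```python
import urllib.error
import urllib.request

# Contentful Content Delivery API host (assumed default, non-EU region).
CDA_BASE = "https://cdn.contentful.com"


def build_space_request(space_id: str, access_token: str) -> urllib.request.Request:
    """Build a GET request for the space resource, authenticated with a bearer token."""
    url = f"{CDA_BASE}/spaces/{space_id}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {access_token}"})


def check_credentials(space_id: str, access_token: str) -> bool:
    """Return True if Contentful accepts the Space ID and token pair."""
    request = build_space_request(space_id, access_token)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # A 401 (bad token) or 404 (unknown space) means the pair is unusable.
        return False
```

Calling check_credentials with your own Space ID and token returns True when the request succeeds. Network failures other than HTTP errors are deliberately left unhandled in this sketch.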

Once the connection has been set up successfully, you are taken to the next step, Set Frequency.

Re-Connect

The Authentication screen is displayed when an already-created Content Source is opened for editing. An admin can edit a Content Source for multiple reasons, including:

  • To reauthenticate

  • To fix a crawl error

  • To change frequency

  • To add or remove an object or a field for crawling

When a Content Source is edited, either a Connect or a Re-Connect button is displayed.

Case 1: When the Connect button is displayed:

The Connect button is displayed if the Content Source authentication is successful. Along with the button, the following message is displayed: "There are no crawl errors and the Content Source authentication is valid."

Fig. The Connect button is displayed on the Authentication tab.

Case 2: When the Re-Connect button is displayed:

The Re-Connect button is displayed when the authentication details change or the authentication fails for any reason.

In both cases, the Content Source connection must be authenticated again. To reauthenticate a Content Source, enter the authentication details, and click Re-Connect.

Fig. The Re-Connect button is displayed on the Authentication tab.

Set Up Crawl Frequency

The first crawl is always performed manually after configuring the content source. In the Choose a Date field, select a date to start the crawl; only data created after the selected date will be crawled. For now, leave the frequency set to its default value, Never, and click Set.

Fig. The Frequency tab when "Frequency" is set to "Never".
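To make the effect of the Choose a Date field concrete, here is a small sketch of a date cutoff applied to entries. It is an illustration only and does not show how SearchUnify implements the filter. It assumes entries shaped like Contentful API responses, with a creation timestamp in sys.createdAt; the function names and sample data are hypothetical.

```python
from datetime import datetime, timezone


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2024-03-01T09:30:00.000Z'."""
    # Older Python versions do not accept a trailing 'Z', so convert it to an offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def created_after(entries: list[dict], start: datetime) -> list[dict]:
    """Keep only the entries created strictly after the start date."""
    return [e for e in entries if parse_timestamp(e["sys"]["createdAt"]) > start]


# Hypothetical sample entries.
entries = [
    {"sys": {"id": "old-post", "createdAt": "2023-11-15T08:00:00.000Z"}},
    {"sys": {"id": "new-post", "createdAt": "2024-03-01T09:30:00.000Z"}},
]
start_date = datetime(2024, 1, 1, tzinfo=timezone.utc)
selected = [e["sys"]["id"] for e in created_after(entries, start_date)]
```

With a start date of 1 January 2024, only the entry created in March 2024 is kept.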

Select Content Types for Indexing

After setting up the connection and configuring the frequency, define the content models to index. You can find a list of all content models on the Content model page in Contentful. In the image, you can see two content models: Article and Author.

  1. In Rules, enter a content model in Object Api and give it a name in Search Label. If the name of the content model is correct, click Add Object. To add more than one content model, repeat the process.

  2. Click to specify the fields in a Content model that should be indexed. A new screen appears where you can configure the fields. Once you are done, click Apply.

  3. Save the Settings.
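When deciding which content models and fields to add, it can be useful to list them from a Contentful content types response. The sketch below assumes the response shape returned by the Content Delivery API content types endpoint (an items array whose elements carry sys.id and a fields list). The sample payload is hypothetical, and whether Object Api expects the content type ID shown here is an assumption, not something this article confirms.

```python
def summarize_content_types(payload: dict) -> dict[str, list[str]]:
    """Map each content type ID to the IDs of its fields."""
    return {
        item["sys"]["id"]: [field["id"] for field in item.get("fields", [])]
        for item in payload.get("items", [])
    }


# Hypothetical payload modeled on the Article and Author content models.
sample_payload = {
    "items": [
        {
            "sys": {"id": "article", "type": "ContentType"},
            "name": "Article",
            "fields": [{"id": "title", "type": "Symbol"}, {"id": "body", "type": "Text"}],
        },
        {
            "sys": {"id": "author", "type": "ContentType"},
            "name": "Author",
            "fields": [{"id": "name", "type": "Symbol"}],
        },
    ]
}

summary = summarize_content_types(sample_payload)
```

The resulting dictionary pairs each content type ID with its field IDs, which gives a quick checklist of what is available to index.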

You have successfully added Contentful as a content source in SearchUnify. Perform a manual crawl to start indexing data in SearchUnify.

Related

Difference between Manual and Frequency Crawls

After the First Crawl

Return to the Content Sources screen and click in Actions. The number of indexed documents is updated after the crawl is complete. You can view crawl progress in Actions. Documentation on crawl progress is in View Crawl Logs.

Once the first crawl is complete, click in Actions to open the content source for editing, and set a crawl frequency.

  1. In Choose a Date, click to open a calendar and select a date. Only the data created or updated after the selected date is indexed.

  2. The following options are available for the Frequency field:

    • When Never is selected, the content source is not crawled until an admin opts for a manual crawl on the Content Sources screen.

    • When Minutes is selected, a new dropdown appears where the admin can choose between three values: 15, 20, and 30. Picking 20 means that the content source crawling starts every 20 minutes.

    • When Hours is selected, a new dropdown is displayed where the admin can choose one of eight values: 1, 2, 3, 4, 6, 8, 12, and 24. Selecting 8 initiates content crawling every 8 hours.

    • When Daily is selected, a new dropdown is displayed where the admin can pick a value between 0 and 23. If 15 is selected, the content source crawling starts at 3:00 p.m. (1500 hours) each day.

    • When Day of Week is selected, a new dropdown is displayed where the admin can pick a day of the week. If Tuesday is chosen, then content source crawling starts at 0000 hours on every Tuesday.

    • When Day of Month is selected, a new dropdown appears where the admin can select a value between 1 and 30. If 20 is chosen, then content source crawling starts on the 20th of each month.

      It is recommended to pick a date between the 1st and 28th of the month. If 30 is chosen, then the crawler may throw an error in February. The error will be “Chosen date will not work for this month.”

    • When Yearly is selected, the content source crawling starts at midnight on 1 January each year.

    Fig. The content source crawling starts at 00:00 on each Tuesday.

  3. Click Set to save the crawl frequency settings.

  4. Click Save.
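The frequency options above can be sketched in a few lines of code. This is an illustration of the documented values only, not SearchUnify's scheduler. It assumes that interval crawls are measured from the previous crawl start, and all function names are hypothetical. The last helper shows why a Day of Month value of 30 cannot work in February.

```python
import calendar
from datetime import datetime, timedelta

MINUTE_CHOICES = (15, 20, 30)
HOUR_CHOICES = (1, 2, 3, 4, 6, 8, 12, 24)


def next_interval_crawl(last: datetime, unit: str, value: int) -> datetime:
    """Next start for the Minutes and Hours options (assumed relative to the last start)."""
    if unit == "Minutes" and value in MINUTE_CHOICES:
        return last + timedelta(minutes=value)
    if unit == "Hours" and value in HOUR_CHOICES:
        return last + timedelta(hours=value)
    raise ValueError(f"unsupported frequency: {unit} {value}")


def next_daily_crawl(now: datetime, hour: int) -> datetime:
    """Next start for the Daily option, where hour is a value from 0 to 23."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    # If today's slot has already passed, the crawl runs tomorrow.
    return candidate if candidate > now else candidate + timedelta(days=1)


def day_exists(year: int, month: int, day: int) -> bool:
    """Check whether a Day of Month value is a real date in the given month."""
    return 1 <= day <= calendar.monthrange(year, month)[1]
```

For example, day_exists(2024, 2, 30) is False because February never has 30 days, while any value from 1 to 28 is valid in every month.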