Use Confluence As a Content Source

SearchUnify can index the pages and blogs stored in your Confluence instance. This article walks you through the process of setting up Confluence as a content source.

PERMISSIONS

  • You should have access to the projects or spaces to be crawled.

  • The number of responses received from Confluence over a given period depends on Confluence's rate limits. You can find the latest rate limits on their website.
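Rate limits cap how many responses Confluence returns over a given period. The Python sketch below is not SearchUnify's crawler. It only illustrates the usual client-side reaction, assuming Confluence signals throttling with HTTP 429 and an optional Retry-After header given in seconds, as Atlassian Cloud APIs generally do. The `fetch` and `sleep` callables and the `Response` tuple are hypothetical stand-ins so the logic can run without a network connection.

```python
import time
from typing import Callable, Dict, Tuple

# (status code, headers, body): a simplified response shape assumed for this sketch.
Response = Tuple[int, Dict[str, str], str]


def fetch_with_backoff(fetch: Callable[[], Response], max_retries: int = 5,
                       base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call fetch() and retry while the server answers HTTP 429 (Too Many Requests)."""
    for attempt in range(max_retries + 1):
        status, headers, body = fetch()
        if status != 429:
            return status, headers, body
        if attempt == max_retries:
            break
        # Honor Retry-After (assumed to be in seconds) when the server sends it;
        # otherwise back off exponentially: base_delay, 2x, 4x, ...
        retry_after = headers.get("Retry-After")
        delay = float(retry_after) if retry_after else base_delay * 2 ** attempt
        sleep(delay)
    raise RuntimeError(f"still rate-limited after {max_retries} retries")
```

In a real client, `fetch` would wrap an HTTP request to your Confluence instance; passing a no-op `sleep` in tests avoids actually waiting.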

Establish a Connection

  1. Navigate to Content Sources.

  2. Click Add New Content Source.

  3. Find "Confluence" through the search box and click Add.

  4. Give your content source a name.

  5. In Client URL, enter the web address of your Confluence instance followed by wiki/.

  6. Select an Authentication Method.
    • Basic. Select it to crawl the spaces that the Confluence user whose username and API token you enter is allowed to access. Anybody who is part of an organization can authenticate and crawl data. Every user needs to create their own API token. Check out SearchUnify's doc on Create an API Token in Atlassian (Jira and Confluence) or the Confluence doc on Create an API Token.
    • OAuth. Select it to crawl all the spaces and projects to which the client application has access. The level of access and permissions granted to the client application depends on the admin's role and the permissions assigned to them in Confluence. Only a Confluence admin can create the client application. The instructions are in Create an App in Confluence (from SearchUnify) and in Configure an Incoming Link (from Confluence).

  7. Click Connect.
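To make the Client URL and Basic authentication steps concrete, the sketch below shows the two values a client would derive from them. It assumes the Atlassian Cloud convention in which the username is the account's email address and the API token serves as the password in an HTTP Basic `Authorization` header. The function names and the host `example.atlassian.net` are hypothetical; you do not need to run this, it only illustrates what the fields contain.

```python
import base64


def client_url(instance_address: str) -> str:
    """Append wiki/ to a Confluence instance address for the Client URL field."""
    base = instance_address.strip().rstrip("/")
    if base.endswith("/wiki"):  # already includes wiki, just restore the slash
        return base + "/"
    return base + "/wiki/"


def basic_auth_header(username: str, api_token: str) -> dict:
    """HTTP Basic header: base64 of "username:api_token"."""
    raw = f"{username}:{api_token}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


print(client_url("https://example.atlassian.net"))  # https://example.atlassian.net/wiki/
print(basic_auth_header("jane@example.com", "<api-token>"))
```

Because the token is only base64-encoded, not encrypted, treat the resulting header as a secret, just like the token itself.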

Set Up Crawl Frequency

The first crawl is always manual and is performed after you configure the content source. In Choose A Date, select a date to start crawling; only data created after the selected date is crawled. For now, leave the frequency at its default value, Never, click Set, and move to the next section.
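The start-date rule can be pictured as a filter over document creation dates. This is a minimal sketch, assuming each document is a dict with a hypothetical `created` datetime and that "after" means strictly later than the selected date; SearchUnify's own comparison may treat the boundary day differently.

```python
from datetime import date, datetime
from typing import Dict, Iterable, List


def select_for_crawl(docs: Iterable[Dict], start: date) -> List[Dict]:
    """Keep only documents created after the date picked in Choose A Date."""
    # Assumption: "after" means strictly later than the start date.
    return [doc for doc in docs if doc["created"].date() > start]


docs = [
    {"title": "Old page", "created": datetime(2023, 12, 31, 9, 0)},
    {"title": "New page", "created": datetime(2024, 1, 2, 14, 30)},
]
print([d["title"] for d in select_for_crawl(docs, date(2024, 1, 1))])  # ['New page']
```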

Select Types and Fields for Indexing

SearchUnify can index Confluence pages and blogs. You can choose to index them both, or select just one of them. You can further index all blog and page fields, or only a few of them.

  1. Click to select content fields.

  2. Use the dropdown in the Name column to add content fields one at a time.

  3. OPTIONAL. SearchUnify assigns each field a label, type, and either an isSearchable or isFilterable tag. The values don't require a change, but advanced users can edit them.

  4. Press Save.

  5. Repeat steps 1-4 for the second content type.

  6. Navigate to By Place.

  7. Use the index to find the spaces you want to crawl and check Enable for each of them.

  8. Press Save.
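The field and place settings chosen above amount to a small configuration. The sketch below models it with Python dataclasses; the class names, the field names `title` and `labels`, the type values, and the space key `DOCS` are all hypothetical and are not SearchUnify identifiers. It shows how the isSearchable/isFilterable tag splits the fields of a content type.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Literal

Tag = Literal["isSearchable", "isFilterable"]


@dataclass
class FieldSetting:
    name: str    # field chosen in the Name dropdown
    label: str   # label SearchUnify suggests; editable
    type: str    # illustrative type names such as "string" or "date"
    tag: Tag     # each field carries one of the two tags


@dataclass
class IndexingConfig:
    # content type ("page" or "blog") -> fields selected for it
    content_types: Dict[str, List[FieldSetting]] = field(default_factory=dict)
    # spaces enabled under By Place (hypothetical space keys)
    enabled_places: List[str] = field(default_factory=list)

    def fields_with_tag(self, content_type: str, tag: Tag) -> List[str]:
        """Names of the fields of content_type that carry the given tag."""
        return [f.name for f in self.content_types.get(content_type, []) if f.tag == tag]


config = IndexingConfig(
    content_types={
        "page": [
            FieldSetting("title", "Title", "string", "isSearchable"),
            FieldSetting("labels", "Labels", "string", "isFilterable"),
        ],
        "blog": [FieldSetting("title", "Title", "string", "isSearchable")],
    },
    enabled_places=["DOCS"],
)
print(config.fields_with_tag("page", "isFilterable"))  # ['labels']
```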

You have successfully added Confluence as a content source.

Find and Replace

Those on the Q2 '24 release or a later version will notice a new button next to each object on the Rules screen. It resembles a magnifying glass and is labeled "Find and Replace." You can use this feature to find and replace values in a single field or across all fields. The changes occur in the search index, not in your content source.

Find and Replace is useful in various scenarios. A common use case is a product name change. Suppose your product name has changed from "SearchUnify" to "SUnify," and you want search result titles to reflect this change immediately.

  1. To make the change, click Find and Replace.

  2. Now, choose either "All" or a specific content source field from the "Enter Name" dropdown. When "All" is selected, any value in the "Find" column is replaced with the corresponding value in the "Replace" column across all content source fields. If a particular field is chosen, the old value is replaced with the new value solely within the selected field.

  3. Enter the value to be replaced in the Find column and the new value in the Replace column. Both columns accept regular expressions.

  4. Click Add. You will see a warning if you are replacing a value in all fields.

  5. Save the settings.

  6. Run a crawl so that the updated values appear in search results.
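Find and Replace behaves like a regular-expression substitution applied either to every field ("All") or to one field. The Python sketch below mimics that on a single indexed document represented as a dict. It uses Python's `re` module, whose syntax may differ in details from the regular-expression flavor SearchUnify accepts, and the document fields are hypothetical. The `\b` word boundaries keep the pattern from matching "SearchUnify" inside a longer word.

```python
import re
from typing import Any, Dict, Optional


def find_and_replace(doc: Dict[str, Any], find: str, replace: str,
                     field_name: Optional[str] = None) -> Dict[str, Any]:
    """Return a copy of doc with every match of the regex `find` replaced.

    field_name=None mimics the "All" option: every string field is rewritten.
    Otherwise only the named field changes, mimicking a single-field rule.
    """
    pattern = re.compile(find)
    updated = dict(doc)  # leave the original document untouched
    targets = list(updated) if field_name is None else [field_name]
    for key in targets:
        value = updated.get(key)
        if isinstance(value, str):  # skip numbers, dates, missing fields
            updated[key] = pattern.sub(replace, value)
    return updated


doc = {"title": "SearchUnify release notes", "body": "Welcome to SearchUnify.", "views": 3}
# Single field: only the title changes.
print(find_and_replace(doc, r"\bSearchUnify\b", "SUnify", field_name="title"))
# "All": every string field changes.
print(find_and_replace(doc, r"\bSearchUnify\b", "SUnify"))
```

As in SearchUnify, the change applies to the copy that is searched, not to the source: here the original `doc` dict is left as it was.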

After the First Crawl

Return to the Content Sources screen and start a crawl from Actions. The number of indexed documents is updated after the crawl is complete. You can view crawl progress from Actions. Documentation on crawl progress is in View Crawl Logs.

NOTE 1

Review the settings in Rules if there is no progress in Crawl Logs.

NOTE 2

For Mamba '22 and newer instances, search isn't impacted during a crawl. However, in older instances, some documents remain inaccessible while a crawl is going on.

Once the first crawl is complete, use Actions to open the content source for editing, and set a crawl frequency.

  1. In Choose a Date, open the calendar and select a date. Only data created after the selected date is indexed.

  2. Use the Frequency dropdown to select how often SearchUnify should index the data. For example, you can set Frequency to Weekly and choose Tuesday as the crawl day. Whenever Frequency is set to anything other than Never, a third dropdown appears where you can specify the interval. When Frequency is set to Hourly, manual crawls are disabled.

  3. Click Set to save crawl frequency settings. On clicking Set, you are taken to the Rules tab.
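The Weekly/Tuesday example can be illustrated by computing the date of the next scheduled crawl. This is a hypothetical helper, not SearchUnify's scheduler: it assumes that if today is already the crawl day, that day's run has happened and the next one is a week away, and it ignores the interval dropdown and the time of day.

```python
from datetime import date, timedelta

TUESDAY = 1  # datetime.date.weekday(): Monday=0 ... Sunday=6


def next_weekly_crawl(today: date, crawl_weekday: int = TUESDAY) -> date:
    """Return the first date strictly after `today` that falls on crawl_weekday."""
    days_ahead = (crawl_weekday - today.weekday()) % 7
    # days_ahead == 0 means today is the crawl day; assume it already ran.
    return today + timedelta(days=days_ahead or 7)


print(next_weekly_crawl(date(2024, 6, 3)))  # a Monday -> 2024-06-04
```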

Data Deletion and SU Index

All the data deleted from Confluence is removed from the SearchUnify index within 24 hours.