Use GitHub as a Content Source

SearchUnify can crawl, index, and search the issues in your GitHub repositories.


SearchUnify ignores user permissions during searches. All indexed documents can be searched by all users.


You must have access to the repositories you want to crawl.

Establish a Connection

  1. Navigate to Content Sources.

  2. Click Add New Content Source.

  3. From the search box, find GitHub and click Add.

  4. Give your content source a Name.

  5. Pick either No Authentication or OAuth from Authentication Type. No Authentication lets you crawl issues on public GitHub repositories; OAuth also lets you index issues on private GitHub repositories.

  6. Select the language of the issues in Language.

  7. Select the type of your GitHub account in Repository Type. Individual developers should select User; companies should select Organization.

  8. Enter the account name in Organization/User Name.

  9. Click Connect.

  10. Click Next when a window pops up with the "Connection Successful" message.
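The two authentication modes map onto GitHub's REST API: unauthenticated requests can read public repositories only, while an OAuth token also grants access to private ones. A minimal sketch of how such a request could be built (the repository name is just an example; this is not SearchUnify's internal code):

```python
import urllib.request

GITHUB_API = "https://api.github.com"

def build_issues_request(owner, repo, token=None):
    """Build a GitHub REST API request for a repository's issues.

    Without a token ("No Authentication"), only public repositories
    are readable. With an OAuth token, private repositories are too.
    """
    req = urllib.request.Request(
        f"{GITHUB_API}/repos/{owner}/{repo}/issues",
        headers={"Accept": "application/vnd.github+json"},
    )
    if token:  # OAuth mode: private repositories become accessible
        req.add_header("Authorization", f"Bearer {token}")
    return req

# Sending the request is one call; urlopen raises on 401/404, which is
# how an access problem would surface before a crawl even starts:
# with urllib.request.urlopen(build_issues_request("octocat", "Hello-World")) as r:
#     issues = json.load(r)
```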

Set Up Crawl Frequency

The first crawl is always manual and is performed after configuring the content source. For now, keep the frequency at its default value, Never, click Set, and move to the next section.

Select Fields and Repositories for Indexing

SearchUnify indexes only one GitHub object: issues. Only the data visible under the Issues tab on GitHub can be indexed and made searchable.

The issues object consists of several components, such as the issue title, issue description, assignee, labels, projects, and milestones. Each component is called a field. In the Rules tab, you can select the fields whose data should be indexed.
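Conceptually, field selection amounts to projecting a subset of keys out of each issue record returned by GitHub. A hypothetical sketch of that filtering step (the sample record and selected fields are illustrative, not SearchUnify's actual pipeline):

```python
# A trimmed example of the kind of record GitHub returns for an issue.
sample_issue = {
    "title": "Crawler times out on large repositories",
    "body": "Steps to reproduce ...",
    "assignee": {"login": "octocat"},
    "labels": [{"name": "bug"}],
    "milestone": {"title": "v2.0"},
    "reactions": {"+1": 3},
}

# Fields chosen in the Rules tab (hypothetical selection).
selected_fields = {"title", "body", "labels"}

def project_fields(issue, fields):
    """Keep only the selected fields; everything else is not indexed."""
    return {k: v for k, v in issue.items() if k in fields}

indexed_doc = project_fields(sample_issue, selected_fields)
```

Data from the unselected fields (assignee, milestone, reactions) never reaches the index.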

  1. Click to edit issues.

  2. Add, edit, or remove fields and click Save. Only the data from the selected fields is indexed.

  3. Navigate to By Spaces, use the index to find your repositories, check them, and click Save. If you don't specify repositories, then all your GitHub repositories are indexed.

After the First Crawl

Return to the Content Sources screen and start the crawl from Actions. The number of indexed documents is updated after the crawl is complete. You can view crawl progress in Actions. Documentation on crawl progress is in View Crawl Logs.


Review the settings in Rules if there is no progress in Crawl Logs.


For Mamba '22 and newer instances, search isn't impacted during a crawl. However, in older instances, some documents remain inaccessible while a crawl is going on.

Once the first crawl is complete, open the content source for editing from Actions and set a crawl frequency.

  1. In Choose a Date, click to open a calendar and select a date. Only data created after the selected date is indexed.

  2. Use the Frequency dropdown to select how often SearchUnify should index the data. For illustration, the frequency has been set to Weekly and Tuesday has been chosen as the crawling day. Whenever Frequency is anything other than Never, a third dropdown appears where you can specify the interval. When Frequency is set to Hourly, manual crawls are disabled.

  3. Click Set to save crawl frequency settings. On clicking Set, you are taken to the Rules tab.
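To make the Weekly/Tuesday example concrete, the next crawl date is simply the next upcoming occurrence of the chosen weekday. A sketch of that calculation (an illustration, not SearchUnify's actual scheduler):

```python
from datetime import date, timedelta

def next_weekly_crawl(today, weekday):
    """Next occurrence of `weekday` (Mon=0 ... Sun=6) strictly after today."""
    days_ahead = (weekday - today.weekday() - 1) % 7 + 1
    return today + timedelta(days=days_ahead)

# Frequency: Weekly, day: Tuesday (weekday 1).
# 2024-02-27 is itself a Tuesday, so the next crawl is a week later.
print(next_weekly_crawl(date(2024, 2, 27), 1))  # -> 2024-03-05
```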

Data Deletion and SU Index

One way to update the index in real time is to enable event subscriptions, which supplement scheduled crawls and synchronize data between your GitHub account and SearchUnify in real time. Check out Enable Event Subscription in GitHub.
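Event subscriptions are delivered as GitHub webhooks. Each delivery carries an X-Hub-Signature-256 header: an HMAC-SHA256 of the raw request body computed with the shared webhook secret, which the receiver should verify before applying the change to the index. A minimal verification sketch (the payload and secret are placeholders):

```python
import hashlib
import hmac

def verify_signature(payload, secret, signature_header):
    """Check a GitHub webhook's X-Hub-Signature-256 header.

    GitHub signs the raw request body with HMAC-SHA256 and sends
    "sha256=<hexdigest>"; the comparison must be constant-time.
    """
    expected = "sha256=" + hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)
```

A delivery that fails this check should be discarded rather than indexed.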

Last updated Tuesday, February 27, 2024
