Use Stack Overflow As a Content Source
Stack Overflow is a popular forum for programmers. SearchUnify can crawl the public content on Stack Overflow. This article walks you through the installation process.
SearchUnify ignores user permissions during searches. All indexed files can be searched by all users.
Establish a Connection
- Navigate to Content Sources.
- Click Add New Content Source.
- Find StackOverflow using the search box and click Add.
- Under the Authentication tab, fill in the following details:
- Name. Give your Stack Overflow content source a name.
- Organization Type. Select whether your organization type is ‘Public’ or ‘Enterprise’ from the drop-down.
Note: If you have an Enterprise-level (non-public) Stack Overflow account, create an app in Stack Overflow and enter the generated API key in the Stack Overflow content source to configure your account.
- Language. Select the language that your community uses. It's usually English.
- Authentication Type. Select your authentication method from the drop-down. Select ‘No Authentication’ if the admin doesn't have a Stack Overflow account and ‘OAuth’ if the admin has one. Functionally, both are identical: indexing doesn't depend on whether the admin has an account.
- Click Connect.
Set Up Crawl Frequency
The first crawl is always manual and is performed after configuring the content source. For now, keep the frequency at its default value, Never, click Set, and move to the next section.
Select Content Types and Tags for Indexing
You can add or remove available content fields in By Content Type. Each question in Stack Overflow has tags. You can limit which questions are crawled by listing tags in By Tags.
The API rate limit depends on the credentials used to configure Stack Overflow.
- Open the properties of a content type.
- A dialog box opens. From here, you can remove a content field. Removed content fields are not indexed. You can use the Name column to find content types, the Label column to rename them, and the Type column to change the default data type. Existing content fields can also be edited from this dialog. Once the configurations are complete, click Save. Note: Editing these fields is not recommended.
- Navigate to By Tags and define the scope of indexing by adding tags. If an admin adds the tag "Python", only the questions tagged "Python" are indexed. In the Search Tags Here section, type the name of the tag whose questions you want to crawl and click Add Tag. When you have added all the required tags, click Save.
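To get a feel for what a tag scope covers, the public Stack Exchange API offers a rough analogy. The sketch below builds one request URL per tag and reads the quota fields that the API reports in its response wrapper. It is an illustration only: how SearchUnify queries Stack Overflow internally is not described in this article, the helper names build_questions_url and quota_status are hypothetical, and the sample response is made up.

```python
from urllib.parse import parse_qs, urlencode, urlparse

API_ROOT = "https://api.stackexchange.com/2.3"  # public Stack Exchange API


def build_questions_url(tag, api_key=None, page_size=50):
    """Build a URL that lists Stack Overflow questions carrying `tag`.

    Hypothetical helper for illustration; not part of SearchUnify.
    """
    params = {
        "site": "stackoverflow",
        "tagged": tag,
        "order": "desc",
        "sort": "creation",
        "pagesize": page_size,
    }
    if api_key:
        # Requests made with an app key are counted against that key's quota.
        params["key"] = api_key
    return f"{API_ROOT}/questions?{urlencode(params)}"


def quota_status(wrapper):
    """Return (remaining, maximum) request quota from an API response wrapper."""
    return wrapper.get("quota_remaining"), wrapper.get("quota_max")


if __name__ == "__main__":
    # One request per configured tag; special characters are URL-encoded.
    for tag in ["python", "c++"]:
        url = build_questions_url(tag)
        print(url)
        print(parse_qs(urlparse(url).query)["tagged"])

    # Made-up wrapper with the quota fields the API includes in responses.
    sample = {"items": [], "has_more": False, "quota_max": 300, "quota_remaining": 299}
    print(quota_status(sample))
```

Opening one of the generated URLs in a browser shows roughly which questions a given tag selects, which can help when deciding which tags to add.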
After the First Crawl
Return to the Content Sources screen and start the crawl from the Actions column. The number of indexed documents is updated after the crawl is complete. You can also view crawl progress from the Actions column. Crawl progress is documented in View Crawl Logs.
Review the settings in Rules if there is no progress in Crawl Logs.
For Mamba '22 and newer instances, search isn't impacted during a crawl. However, in older instances, some documents remain inaccessible while a crawl is going on.
Once the first crawl is complete, open the content source for editing from the Actions column and set a crawl frequency.
- In Choose a Date, open the calendar and select a date. Only the data after the selected date is indexed.
- Use the Frequency drop-down to select how often SearchUnify should index the data. For example, you can set the frequency to Weekly and choose Tuesday as the crawl day. Whenever Frequency is set to anything other than Never, a third drop-down appears where you can specify the interval. When Frequency is set to Hourly, manual crawls are disabled.
- Click Set to save the crawl frequency settings. You are then taken to the Rules tab.
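The date and frequency settings can be pictured with a short sketch. It computes the next crawl day for a Weekly frequency on Tuesday and converts a start date into the Unix timestamp that the public Stack Exchange API accepts in its fromdate parameter. SearchUnify's scheduler is not described in this article, so the function names are hypothetical, and both the "next matching weekday" rule and the midnight UTC boundary are assumptions.

```python
from datetime import date, datetime, timedelta, timezone

TUESDAY = 1  # date.weekday() numbers Monday as 0


def next_weekly_crawl(today, weekday=TUESDAY):
    """Return the first date strictly after `today` that falls on `weekday`.

    Assumption: a weekly crawl runs on the next matching weekday.
    """
    days_ahead = (weekday - today.weekday() - 1) % 7 + 1
    return today + timedelta(days=days_ahead)


def to_fromdate(start_date):
    """Convert a start date to a Unix timestamp at midnight UTC.

    Assumption: the cutoff is midnight UTC at the start of the selected date.
    """
    midnight = datetime(
        start_date.year, start_date.month, start_date.day, tzinfo=timezone.utc
    )
    return int(midnight.timestamp())


if __name__ == "__main__":
    print(next_weekly_crawl(date(2024, 2, 5)))  # a Monday
    print(to_fromdate(date(2024, 1, 1)))
```

Under these assumptions, a crawl configured on a Tuesday would first run on the following Tuesday, not the same day.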
Last updated: Tuesday, February 6, 2024