Use Google Drive As a Content Source

SearchUnify can crawl, index, and search the data in your Google Drive instance. This article explains how to start using Google Drive as a content repository for your search clients.

PERMISSIONS.

The person authenticating Google Drive can index only those files to which they have view-access.

SearchUnify respects user permissions during searches. If a user has access to a file named "Accounts", then they can find "Accounts" through search.
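The permission rule above can be pictured as a filter applied on top of ordinary matching. The following is only a minimal sketch under stated assumptions: the `IndexedFile` class, its `viewers` set, and the `search` function are hypothetical names invented for illustration, and they do not describe how SearchUnify stores or checks permissions internally.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedFile:
    """Hypothetical index entry: a file name plus the users allowed to view it."""
    name: str
    viewers: set[str] = field(default_factory=set)


def search(index: list[IndexedFile], query: str, user: str) -> list[str]:
    """Return the names that match the query AND that the user may view."""
    q = query.lower()
    return [f.name for f in index if q in f.name.lower() and user in f.viewers]


index = [
    IndexedFile("Accounts", {"asha@example.com"}),
    IndexedFile("Accounts Archive", {"ben@example.com"}),
]
print(search(index, "accounts", "asha@example.com"))  # ['Accounts']
```

A user with no view access to a matching file simply gets no result for it, which is the behavior the "Accounts" example describes.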

SearchUnify needs read-only access to Google Drive. In Google's parlance, this means view and download access.

Files on which export is disabled aren't crawled.
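To check in advance which files are likely to be skipped, you could inspect file metadata of the kind the Google Drive API v3 returns. The sketch below is an assumption, not SearchUnify's documented logic: it treats a file as not crawlable when `capabilities.canDownload` is false or when `copyRequiresWriterPermission` is true. Both are fields of the Drive v3 file resource, but the rule that combines them is a guess made for illustration.

```python
def is_crawlable(file_meta: dict) -> bool:
    """Guess whether a file could be crawled, from Drive-style metadata.

    Assumption (not confirmed by SearchUnify): a file is skipped when the
    authenticating user cannot download it, or when the owner has restricted
    download, print, and copy for viewers.
    """
    can_download = file_meta.get("capabilities", {}).get("canDownload", False)
    export_restricted = file_meta.get("copyRequiresWriterPermission", False)
    return bool(can_download) and not export_restricted


files = [
    {"name": "Roadmap", "capabilities": {"canDownload": True}},
    {"name": "Payroll", "capabilities": {"canDownload": False},
     "copyRequiresWriterPermission": True},
]
print([f["name"] for f in files if is_crawlable(f)])  # ['Roadmap']
```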

If file permissions are altered, the changes are reflected in the index after the next frequency crawl. Modifications to shared drive and folder permissions are also updated in the SearchUnify index.

Suppose a person creates a folder with multiple files and shares it with a teammate. If the teammate removes a file from the folder, the file data is not deleted from the SearchUnify index. However, if the creator removes a file, an event is triggered and the data is removed from the SearchUnify index.

With Google OAuth 2.0-based authentication in new SearchUnify instances, an API rate limit of 10,000 grants per day applies. This limit might affect crawling in SearchUnify.
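One general way a client stays under a per-day grant limit is to reuse an access token until it is about to expire instead of requesting a new one for every call. The sketch below illustrates that idea only. `TokenCache`, `fetch_token`, and the 60-second safety margin are hypothetical, and nothing here describes how SearchUnify itself manages tokens.

```python
import time
from typing import Callable, Optional


class TokenCache:
    """Reuse an access token until shortly before it expires."""

    def __init__(self, fetch_token: Callable[[], tuple[str, float]],
                 skew: float = 60.0,
                 clock: Callable[[], float] = time.time):
        self._fetch = fetch_token  # hypothetical: returns (token, lifetime in seconds)
        self._skew = skew          # refresh this many seconds before expiry
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0
        self.grants = 0            # number of grants actually requested

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._skew:
            self._token, lifetime = self._fetch()
            self._expires_at = now + lifetime
            self.grants += 1
        return self._token
```

With a token lifetime of one hour, a crawler that calls `get()` thousands of times in that hour still consumes a single grant.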

Establish a Connection

  1. Navigate to Content Sources.

  2. Click Add New Content Source.

  3. Find Google Drive and click Add.

  4. Under the Authentication tab, enter details. If you are on the first quarterly release of 2024 or a later release, enter the name of the content source, pick a content language, and enter the Client ID and Client Secret of the OAuth 2.0 app.

  5. Administrators on a release older than Q1.24 can enter the name of the content source, pick a content language, and click Connect. A permissions window pops up asking for required permissions. Click Allow.

  6. A pop-up window asking for view and download permissions will appear. Click Allow to let SearchUnify index the files on your Google Drive.

  7. A connection successful message will appear. Click Next.
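For readers curious about what the Client ID is used for, the consent pop-up in the steps above is a standard Google OAuth 2.0 authorization request. The sketch below builds such a URL with Google's public authorization endpoint. It assumes the read-only Drive scope (`drive.readonly`), which matches the "view and download" wording but is not confirmed as the exact scope SearchUnify requests. The client ID, redirect URI, and state value are placeholders.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"
# Assumed scope: read-only access to Drive files (view and download).
READONLY_SCOPE = "https://www.googleapis.com/auth/drive.readonly"


def build_consent_url(client_id: str, redirect_uri: str, state: str) -> str:
    """Build the URL a browser opens to show Google's consent screen."""
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,  # placeholder; must match the OAuth app settings
        "response_type": "code",       # authorization-code flow
        "scope": READONLY_SCOPE,
        "access_type": "offline",      # ask for a refresh token as well
        "state": state,                # opaque value echoed back to the caller
    }
    return f"{GOOGLE_AUTH_ENDPOINT}?{urlencode(params)}"


url = build_consent_url("YOUR_CLIENT_ID", "https://example.com/oauth/callback", "abc123")
print(parse_qs(urlsplit(url).query)["scope"][0])
```

The Client Secret never appears in this URL. In the authorization-code flow it is used later, when the returned code is exchanged for tokens.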

Google hasn't verified this app

If you encounter an error where Google indicates that the app has not been verified, please disregard the warning and continue with the authentication process. This issue is slated for resolution in future updates.

Set Up Crawl Frequency

The first crawl is always manual and is performed after configuring the content source. In Choose A Date, select a date to start crawling; only the data created after the selected date will be crawled. For now, keep the frequency at its default value, Never, click Set, and move to the next section.
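The date cutoff behaves like a "created after" filter. As an illustration, the Google Drive API v3 accepts a search query with a `createdTime` term in RFC 3339 format, and the sketch below builds one from a chosen date. Whether SearchUnify issues this exact query is an assumption. The function name and the added `trashed = false` clause are illustrative choices.

```python
from datetime import date, datetime, time, timezone


def created_after_query(start: date) -> str:
    """Build a Drive-style search query for files created after `start` (UTC)."""
    start_of_day = datetime.combine(start, time.min, tzinfo=timezone.utc)
    stamp = start_of_day.strftime("%Y-%m-%dT%H:%M:%S")
    # Drive treats timestamps without an offset as UTC.
    return f"createdTime > '{stamp}' and trashed = false"


print(created_after_query(date(2024, 1, 15)))
# createdTime > '2024-01-15T00:00:00' and trashed = false
```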

Select Types and Fields for Indexing

Google Drive supports only one content type: file. Under the Rules tab, you land on the By Content Type subtab. Click EDIT to see the list of preconfigured fields.

Edit or remove the fields and click Apply.

  1. Switch to the By Folders subtab.

  2. From My Folders, Shared Folders, and Shared Drive, select the directories and move them to the Select Files and Folders section for indexing.

  3. After selecting the repositories, click Save.

You have successfully added Google Drive as a content source in SearchUnify. Perform a manual crawl to start indexing the Drive data in SearchUnify.

Related

Difference between Manual and Frequency Crawls

Find and Replace

Those on the Q2 '24 release or a later version will notice a new button next to each object on the Rules screen. It resembles a magnifying glass and is labeled "Find and Replace." You can use this feature to find and replace values in a single field or across all fields. The changes occur in the search index, not in your content source.

Find and Replace proves valuable in various scenarios. A common use case is when a product name is altered. Suppose your product name has changed from "SearchUnify" to "SUnify," and you wish for the search result titles to immediately reflect this change.

  1. To make the change, click the Find and Replace (magnifying glass) button.

  2. Now, choose either "All" or a specific content source field from the "Enter Name" dropdown. When "All" is selected, any value in the "Find" column is replaced with the corresponding value in the "Replace" column across all content source fields. If a particular field is chosen, the old value is replaced with the new value solely within the selected field.

  3. Enter the value to be replaced in the Find column and the new value in the Replace column. Both columns accept regular expressions.

  4. Click Add. You will see a warning if you are replacing a value in all fields.

  5. Save the settings.

  6. Run a crawl for the updated values to reflect in search results.
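Because both columns accept regular expressions, it can help to try a pattern before saving it. The sketch below mimics the "All" versus single-field behavior described in the steps above, using Python's `re` module. The regex dialect SearchUnify uses is not stated in this article, so treat the pattern syntax as an assumption. The function and the sample document are illustrative only.

```python
import re


def find_and_replace(doc: dict[str, str], find: str, replace: str,
                     field: str = "All") -> dict[str, str]:
    """Return a copy of `doc` with `find` replaced in one field or in all fields."""
    pattern = re.compile(find)
    return {
        name: pattern.sub(replace, value) if field in ("All", name) else value
        for name, value in doc.items()
    }


doc = {"title": "SearchUnify Release Notes", "body": "SearchUnify adds a new connector."}

print(find_and_replace(doc, r"\bSearchUnify\b", "SUnify", field="title"))
# {'title': 'SUnify Release Notes', 'body': 'SearchUnify adds a new connector.'}

print(find_and_replace(doc, r"\bSearchUnify\b", "SUnify"))
# {'title': 'SUnify Release Notes', 'body': 'SUnify adds a new connector.'}
```

The word-boundary anchors (`\b`) keep the pattern from rewriting longer strings that merely contain the old name, which matters most when "All" is selected.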

After the First Crawl

Return to the Content Sources screen and click in Actions. The number of indexed documents is updated after the crawl is complete. You can view crawl progress in Actions. Documentation on crawl progress is in View Crawl Logs.

NOTE 1

Review the settings in Rules if there is no progress in Crawl Logs.

NOTE 2

For Mamba '22 and newer instances, search isn't impacted during a crawl. However, in older instances, some documents remain inaccessible while a crawl is going on.

Once the first crawl is complete, click in Actions to open the content source for editing, and set a crawl frequency.

  1. In Choose a Date, click to open a calendar and select a date. Only the data created after the selected date is indexed.

  2. Use the Frequency dropdown to select how often SearchUnify should index the data. For illustration, the frequency has been set to Weekly and Tuesday has been chosen as the crawling day. Whenever the Frequency is other than Never, a third dropdown appears where you can specify the interval. When Frequency is set to Hourly, manual crawls are disabled.

  3. Click Set to save crawl frequency settings. On clicking Set, you are taken to the Rules tab.
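To see what a Weekly schedule on Tuesday implies, the sketch below computes the next run after a given moment. It is a rough illustration only: the midnight start time and the function name are assumptions, and the actual time of day at which SearchUnify starts a scheduled crawl is not stated in this article.

```python
from datetime import datetime, timedelta

TUESDAY = 1  # datetime.weekday() numbering: Monday is 0


def next_weekly_crawl(after: datetime, weekday: int = TUESDAY, hour: int = 0) -> datetime:
    """Return the first occurrence of `weekday` at `hour`:00 strictly after `after`."""
    candidate = after.replace(hour=hour, minute=0, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - after.weekday()) % 7)
    if candidate <= after:
        # Same weekday but the slot has already passed: move to next week.
        candidate += timedelta(days=7)
    return candidate


print(next_weekly_crawl(datetime(2024, 5, 15, 9, 30)))  # 2024-05-21 00:00:00
```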

Data Deletion and SU Index

All the data deleted from Google Drive is removed from the SearchUnify index within 12 hours.

OAuth 2.0 Setup

If you are an existing SearchUnify user and you migrate your instance to Q1 '21 or a newer version, your YouTube and Google Drive content sources will continue to work. However, you will see the following error on your YouTube and Google Drive content sources if you haven't authenticated them with OAuth 2.0.

Error

You need to set up your Google OAuth to continue using this content source. Click here to know more.

Set up OAuth 2.0 on your Google account and re-authenticate your content sources using the client ID and client secret.

Help Article: Setting up OAuth 2.0