Find Duplicate Documents in a Content Source or Across Content Sources

Duplicacy Checker is an add-on that detects identical and similar documents within a single content source or across content sources.

Install Content Duplicacy

  1. Log in to your SearchUnify instance and navigate to Addons in the menu bar on the left.
  2. Click Add New SearchUnify Addon.
  3. Scroll down to Duplicacy Checker and install it.

If the installation succeeded, Content Duplicacy appears in the menu bar.

Detect Duplicate Documents

  1. Click Content Duplicacy, and then click Add new duplicacy checker.
  2. Using the Comparing dropdowns in the first column, select a content source and content type.
  3. In the second column, use the dropdowns to select the content source and content type to compare against.

NOTE. To find duplicate documents within a single content source and content type, select the same source and type in both columns.

  4. Use the Percentage dropdown to set the threshold at which two documents are considered duplicates.

NOTE. For example, if you select 30 percent, two documents are treated as duplicates when at least 30 percent of their content is identical.

  5. Click Compare.
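This article does not say how Duplicacy Checker measures the overlap between two documents. Purely to illustrate how a percentage threshold behaves, the sketch below assumes a common technique, word-shingle (Jaccard) overlap; it is not SearchUnify's implementation, and the functions `shingles`, `overlap_percent`, and `is_duplicate` are hypothetical.

```python
import re


def shingles(text: str, k: int = 3) -> set:
    """Split text into lowercase word k-grams (shingles)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short texts become a single shingle so they can still be compared.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def overlap_percent(a: str, b: str, k: int = 3) -> float:
    """Share of shingles common to both texts (Jaccard similarity x 100).

    Assumption: this is one possible reading of "percent of content
    identical"; the add-on may measure it differently.
    """
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 100.0
    return 100.0 * len(sa & sb) / len(sa | sb)


def is_duplicate(a: str, b: str, threshold_percent: float = 30.0) -> bool:
    """Hypothetical rule: duplicates when the overlap meets the threshold."""
    return overlap_percent(a, b) >= threshold_percent


doc1 = "Reset your password from the account settings page"
doc2 = "Reset your password from the login page"
print(overlap_percent(doc1, doc2))  # 37.5
```

Under this assumed measure, the two sample documents share 3 of 8 distinct shingles (37.5 percent), so they would count as duplicates at a 30 percent threshold but not at 50 percent. A lower threshold therefore flags more document pairs, including pairs that are only loosely related.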

View Duplicate Documents

  1. Navigate to Content Duplicacy and click .

If any duplicate documents are found, they are listed on the screen. In this example, the duplicate documents are in Mindtouch.

Last updated: Friday, November 27, 2020