Automatic Text Classification with Content Annotation

Content Annotation in Content Sources categorizes data into predefined categories. A popular use case is tagging the threads in a community with product names. To understand how it works, assume a small community with only 10 threads:

Fig. An illustration of a community with ten threads.

These 10 threads are about 4 products. If you were to tag them by product, here's what the classification would look like:

Fig. An illustration of a community with threads organized into categories.

This approach to taxonomy, where a human tags each thread, has its advantages but does not scale. It's not feasible to hire a person to tag a large community with 10,000 or a million threads. With Content Annotation, you can instead apply a set of tags to an entire content source automatically.

The tags are defined in Taxonomy, where an admin uses a controlled vocabulary to define Entities and Values. This article walks you through the process of applying an already defined set of tags to a content source.

Check out the article, Taxonomy, for the definitions and instructions on how to add tags.
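
To make the idea concrete, here is a minimal Python sketch of tagging threads against a controlled vocabulary. It illustrates the concept only and is not how SearchUnify implements Content Annotation. The product names, the TAXONOMY structure, and the annotate function are all hypothetical, and plain whole-word matching is assumed as the tagging rule.

```python
import re

# Hypothetical taxonomy: one entity ("Product") whose values each list
# the controlled-vocabulary terms that identify them. All names are
# invented for illustration.
TAXONOMY = {
    "Product": {
        "Alpha Router": ["alpha router"],
        "Beta Cam": ["beta cam"],
        "Gamma Hub": ["gamma hub"],
        "Delta Lock": ["delta lock"],
    }
}


def annotate(text, taxonomy):
    """Return {entity: [values]} for every value with a term found in text.

    Assumes case-insensitive, whole-word matching.
    """
    tags = {}
    for entity, values in taxonomy.items():
        matched = [
            value
            for value, terms in values.items()
            if any(
                re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE)
                for term in terms
            )
        ]
        if matched:
            tags[entity] = matched
    return tags


threads = [
    "How do I reset my Alpha Router?",
    "Beta Cam firmware update fails",
    "Pairing a Delta Lock with the Gamma Hub",
]

for thread in threads:
    print(thread, "->", annotate(thread, TAXONOMY))
```

A thread that mentions no known term gets no tags, which is why the quality of the vocabulary defined in Taxonomy matters.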

Apply Taxonomy to Content Source

  1. From Content Sources, go to Content Annotation.

  2. Click Add Annotation Rule.

  3. Select the Content Source and, in Select Object, the object where tagging will be applied.

  4. From Entity, pick an entity from Taxonomy.

  5. The values described in Taxonomy can be supplemented with content source fields. To group Taxonomy values and content fields together, click the fields in Select Fields.

  6. Taxonomy values are grouped together into a facet. Name the facet in Content Field Name and Label. The value in Content Field Name is used by SearchUnify in the backend, and the value in the Label field is shown to search client users.

    The Content Field Name must be unique and must not be present in other content sources. If the name already exists in another content source, activate Derive name from existing field and select a Content Source, Object, and Content Field to name the facet after the selected field.

  7. Click Send Annotation Request. Depending on the data size, annotation can take anywhere from a few minutes to several hours.

Synonyms Configurations

You can configure Content Annotation to support synonyms. The instructions are in Synonyms Configuration.
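
As a rough illustration of why synonyms help, the Python sketch below maps alternate spellings of a product to a single taxonomy value before tagging. It assumes that a synonym is simply another term pointing at the same value. The SYNONYMS data and both function names are hypothetical and do not reflect the product's actual configuration format.

```python
import re

# Hypothetical synonym list: each taxonomy value and the alternate
# terms that should resolve to it.
SYNONYMS = {
    "Alpha Router": ["a-router", "alpha rtr"],
    "Beta Cam": ["betacam"],
}


def build_lookup(synonyms):
    """Map every lowercase term (the value itself plus its synonyms) to its value."""
    lookup = {}
    for value, terms in synonyms.items():
        lookup[value.lower()] = value
        for term in terms:
            lookup[term.lower()] = value
    return lookup


def canonical_values(text, lookup):
    """Return the sorted taxonomy values whose terms appear in text as whole words."""
    found = set()
    for term, value in lookup.items():
        if re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE):
            found.add(value)
    return sorted(found)


lookup = build_lookup(SYNONYMS)
print(canonical_values("My A-Router keeps dropping the BetaCam stream", lookup))
```

Without the synonym entries, a thread that only says "A-Router" would be left out of the Alpha Router facet.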

Troubleshooting

  • If you cannot click Send Annotation Request after editing an annotation because the indicator shows that a job is in progress, wait for the backend job to complete. The indicator disappears once the job is done.

  • Content is annotated once a week. Search performance may be temporarily impacted while content annotation is in progress.

  • After creating one or more filters using content annotation and running a manual crawl, please log a ticket with the support team. They will initiate a backend job to ensure the content annotation is applied to the new documents.