Thursday, May 15, 2014

Content Categorization Role in Content Management

An ability to find content in a content management system is crucial. One of main goals of having a content management system is to make content easy to find, so you can take an action, make a business decision, do research and development work, etc.

The main challenge to findability is anticipating how users might look for information. That's where categorization comes into play. The quality of the categorization of each piece of content makes or breaks its findability. Theoretically, good tagging will last the lifetime of the content. You would think that if you do it well initially, then you can forget about it until it is time to retire that content. But reality can be very different.

Durable Categorization

Many issues complicate content categorization. They include:
  • the sheer volume, velocity, and variety of internal and external-facing content which needs management;
  • evolving/emerging regulations and compliance issues, some of which need to be retroactively applied; 
  • the need to limit the company's exposure and to support the strength of its position in any legal activity.
Some organizations face the added challenge of integrating content from acquisitions or mergers, which most likely use content management structure, categorization, and methodologies that are incompatible and of inconsistent quality.

Considering these issues, the success factor for good content categorization are the automatic categorization techniques and processes.

Traditionally, keywords, dictionaries, and thesauri are used to categorize content. This type of categorization model poses several problems:
  • taxonomy quality - it depends on the initial vision and attention to detail, and whether it has been kept current;
  • term creep - initial categorization will not always accommodate where and how the content will be used over time, or predict relevancy beyond its original focus;
  • policy evolution - it can't easily apply new or evolving policies, regulations, compliance requirements, etc.;
  • cost and complexity - it is difficult and costly, if not practically impossible, to retroactively expand the original categorization of the existing content if big amount of content is added.
Automatic Categorization

Using technology to automatically categorize content is a solution. It applies the rules more consistently than people do. It does it faster. It frees people from having to do the task, and therefore has less costs. And, it can actively or retroactively categorize batches or whole collections of documents.

You can experience these benefits by using concept-based categorization driven by an analytics engine integrated into the content management system. These systems mathematically analyze example documents you provide to calculate concepts that can be used to categorize other documents. Identifying hundreds of keywords per term, they are able to distinguish relevance that escapes keyword and other traditional taxonomy approaches. They are even highly likely to make connections that a person would miss.

Consider 3D printers as an example. These are also known as "materials printers", "fabbers", "3D fabbers", and as "additive manufacturing". If all of those terms are not in the taxonomy, then relevant documents that use one or more of them, but not 3D printer, would not be optimally categorized.

People looking for information about 3D printers who are not aware of the alternative terms would miss related documents of potential significance. This particularly impacts  external facing websites that sell products on their websites. Their business depends on fast and easy delivery of accurate and complete information to their customers, even when the customer doesn't know all of the various terms used to describe the product they are looking for.

In contrast, through example-based mathematical analysis and comparison along multiple keywords, conceptual analytics systems understand that these documents are all related. They would be automatically categorized and tagged as relevant to 3D printing.

Another difference is that taxonomy systems require someone to enter the newly developed or discovered terms. In conceptual analytics, it is simply a matter of providing additional example documents that automatically add to the system's conceptual understanding.

The days of keeping everything "just in case" are long gone. From cost and risk exposure concerns, organizations need to keep only what is necessary, particularly as the volume and variety of content continue to grow. Good categorization and tagging systems are essential to good content management and to controlling expense and exposure.

Outdated and draft documents unnecessary expand every company's content repositories. Multiple copies of the same or very similar content are scattered throughout the organization. By some estimates, these compose upwards of 20% or more of a company's content.

Efficiently weeding out that content means 20% less active and backup storage, bandwidth, cloud storage for offsite disaster recovery, and archive volume. Effective and thorough tagging can identify such elements to reduce these costs, and simultaneously reduce the company's cost and exposure related to legal or regulatory requirements.

The Value Beyond Cost Savings

An effectively managed content delivers better cost of content management and reduced exposure to risk. While this alone is reason to implement improvements in categorization, there are other reasons.

Superior categorization through conceptual analysis also affects operational efficiency by enabling fast, accurate, and complete content gathering. A significant benefit for any enterprise is that it allows more time for actual work by reducing the time it takes to find necessary information. It is of critical importance for companies whose revenue depends on their customers quickly and easily finding quality information.

Conceptual analytics systems deliver two other advantages over traditional taxonomy methods and manual categorization. It creates a mathematical index, so it is useless to anyone trying to discover private information or clues about the company. Also, it is deterministic and repeatable. It will give the same result every time and so it is very valuable in legal or regulatory activities.

Concept-based analysis makes content findable and actionable, regardless of language, by automatically categorizing it based on understanding developed from example documents you provide. Both internally and externally, the company becomes more competitive with one of its most important assets which is unstructured information.


  1. Wonderful post however , I was wanting to know if you could write a litte more on this subject?
    I'd be very thankful if you could elaborate a little
    bit more. Bless you!

    Here is my website :: true beauty tips

  2. Thank you for your comment. I will write more on this subject in the coming week. Please check back or become a follower. I look forward to you comments.