Thursday, June 20, 2013

Intelligent Search and Automated Metadata

The inability to identify the value in unstructured content is the primary challenge in any application that requires the use of metadata. Without good-quality metadata, search cannot find and deliver relevant information in the right context and at the right time.

An information governance approach is required that creates the infrastructure framework for automated intelligent metadata generation, auto-classification, and the use of goal- and mission-aligned taxonomies. From this framework, intelligent metadata-enabled solutions can be rapidly developed and implemented. Only then can organizations leverage their knowledge assets to support search, litigation, e-discovery, text mining, sentiment analysis and open source intelligence.

Manual tagging is still the primary approach used to describe content, and it often lacks any alignment with enterprise business goals. The resulting subjectivity and ambiguity carry over into search, producing inaccurate results and an inability to find relevant information across the enterprise.

Metadata used by search engines may consist of end-user tags or pre-defined tags, or it may be generated using system-defined metadata, keyword and proximity matching, extensive rule building, end-user ratings, or artificial intelligence. Typically, search engines provide no way to rapidly adapt to meet organizational needs or account for an organization’s unique nomenclature.

A more effective approach is to implement an enterprise metadata infrastructure that consistently generates intelligent metadata using concept identification. This is a profoundly different approach: relevant documents, regardless of where they reside, will be retrieved even if they don’t contain the exact search terms, because the concepts and relationships between similar content have been identified. Eliminating end-user tagging and the resulting organizational ambiguity enables the enriched metadata to be used by any search engine index, for example, ConceptSearch, SharePoint, Solr, Autonomy or Google Search Appliance.
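To make the contrast with exact-term matching concrete, here is a minimal Python sketch of retrieval by shared concept. It is an illustration only, not how any of the products named above work: the CONCEPTS map, the sample documents, and the function names are all hypothetical, and a real system would identify concepts automatically rather than from a hand-written list.

```python
import re

# Hypothetical concept map (an assumption for illustration): each concept
# lists terms that signal it. A real system would derive these automatically.
CONCEPTS = {
    "employee_termination": {"layoff", "layoffs", "redundancy", "dismissal", "severance"},
    "data_privacy": {"gdpr", "personal data", "privacy", "pii"},
}

# Hypothetical sample documents.
DOCS = {
    "memo-1": "Severance terms for staff affected by the redundancy programme.",
    "memo-2": "Quarterly sales figures for the northern region.",
    "memo-3": "Guidance on handling PII under the new privacy policy.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def concepts_in(text):
    """Return the names of all concepts with at least one term in the text."""
    normalized = " ".join(tokenize(text))
    found = set()
    for concept, terms in CONCEPTS.items():
        if any(re.search(r"\b" + re.escape(term) + r"\b", normalized) for term in terms):
            found.add(concept)
    return found


def keyword_search(query, docs):
    """Return ids of documents that contain at least one exact query token."""
    query_tokens = set(tokenize(query))
    return [doc_id for doc_id, text in docs.items() if query_tokens & set(tokenize(text))]


def concept_search(query, docs):
    """Return ids of documents that share at least one concept with the query."""
    query_concepts = concepts_in(query)
    return [doc_id for doc_id, text in docs.items() if query_concepts & concepts_in(text)]


if __name__ == "__main__":
    print(keyword_search("layoffs", DOCS))  # exact-term matching
    print(concept_search("layoffs", DOCS))  # matching on shared concepts
```

In this sketch, a query for "layoffs" finds nothing by exact term, while the concept-based lookup returns the memo that discusses severance and redundancy, because both the query and the document map to the same concept.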

Only when metadata is consistently accurate and trusted by the organization can improvements be achieved in text analytics, e-discovery and litigation support.

In the age of big data, and more specifically text analytics, sentiment analysis and even open source intelligence, the ability to harness the meaning of unstructured content in real time improves decision-making and enables organizations to act proactively, and with greater certainty, on rapidly changing business complexities.

An effective information governance strategy for unstructured content is predicated on the ability to find information and eliminate inappropriate information. The core enterprise search component must be able to incorporate and digest content from any repository, including faxes, scanned content, social sites (blogs, wikis, communities of interest, Twitter), emails, and websites. This provides a 360-degree corporate view of unstructured content, regardless of where it resides or how it was acquired.

Ensuring that the right information is available to end users and decision makers is fundamental to trusting the accuracy of the information and is another key requirement in intelligent search. Organizations can then find the descriptive needles in the haystack to gain competitive advantage and increase business agility.

An intelligent metadata-enabled solution for text analytics analyzes and extracts highly correlated concepts from very large document collections. This enables organizations to attain an ecosystem of semantics that delivers understandable and trusted results that are continually updated in real time.

In e-discovery and litigation, traditional information retrieval systems use "keyword searches" of text and metadata as a means of identifying and filtering documents. The challenges and costs of e-discovery and litigation support continue to escalate. Intelligent search reduces these costs and alleviates many of the challenges.

Content can be presented to knowledge professionals in a manner that enables them to more rapidly identify relevant information and increase accuracy. Significant benefits can be achieved by removing the ambiguity in content and identifying concepts within a large corpus of information. This methodology delivers efficiencies and reduces costs, offering an effective solution that overcomes many of the challenges typically left unsolved in e-discovery and litigation support.

Organizations must incorporate an approach that addresses the lack of an intelligent metadata infrastructure. Intelligent search, a by-product of the infrastructure, must encourage, not hamper, the use and reuse of information and be rapidly extendable to address text mining, sentiment analysis, e-discovery, and litigation support.

The additional components of auto-classification and taxonomies complete the core infrastructure needed to deploy intelligent metadata-enabled solutions, including records management, data privacy, and migration. Search should no longer be evaluated on features, but on proven results that deliver insight into all unstructured content.
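As a rough illustration of how auto-classification against a taxonomy can produce metadata without manual tagging, the Python sketch below scores a document against weighted clue terms for each taxonomy node. Everything in it is an assumption made for the example: the TAXONOMY nodes, the clue weights, the threshold, and the classify function are hypothetical, and production classifiers use far richer rules or statistical models.

```python
import re
from collections import Counter

# Hypothetical taxonomy (an assumption for illustration): each node maps
# clue terms to weights that indicate how strongly a term signals the node.
TAXONOMY = {
    "Legal/Contracts": {"contract": 2, "indemnity": 3, "clause": 1},
    "HR/Records": {"employee": 2, "payroll": 3, "retention": 1},
}


def classify(text, threshold=3):
    """Return {node: score} for taxonomy nodes scoring at or above threshold."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {}
    for node, clues in TAXONOMY.items():
        # Score is the weighted count of clue terms found in the text.
        score = sum(weight * counts[term] for term, weight in clues.items())
        if score >= threshold:
            scores[node] = score
    return scores


if __name__ == "__main__":
    doc = "The indemnity clause in this contract survives termination."
    # The matched nodes become metadata that a search index could consume.
    metadata = {"taxonomy_nodes": sorted(classify(doc))}
    print(metadata)
```

The same document always receives the same nodes, which is the consistency that manual tagging lacks; the threshold simply keeps weak, incidental matches out of the metadata.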