Galaxy Consulting Blog

Wednesday, September 18, 2013

The Mystery of How Enterprise Search Works (For You!)

Enterprise search starts with a user looking for information and submitting a search query. A search query would be a list of keywords (terms) or a phrase. The search engine would look for all records that match the request and return a list to the user. The list would contain results that are ranked in order of most relevant to least relevant for the request.

Let's look at search in more detail.

Performance Measures

There are two performance measures for evaluating the quality of query results: precision and recall.

Precision refers to the fraction of relevant documents from all documents retrieved. Recall is the fraction of relevant documents retrieved by a search from the total number of all relevant documents in the collection. It is said that precision is a measure of usefulness of a result while recall is a measure of the completeness of the result.

Modern search engines provide a high recall with good precision. It is easy to achieve high recall by simply returning all documents in the collection for every query. However, the precision in this case would be poor. A key challenge is how to increase precision without sacrificing recall. For example, most web search engines today provide reasonably good recall but poor precision. In other words, a user gets some relevant results, usually in the first 10 to 20 results, along with many non-relevant results.
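To make the two measures concrete, here is a minimal sketch in Python. The document IDs and the ten-document collection are made up for illustration; the function name `precision_recall` is my own and does not come from any particular search product.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually returned
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical collection of ten documents, three of which are relevant.
collection = [f"d{i}" for i in range(1, 11)]
relevant = ["d1", "d3", "d7"]

# A focused result list: two of the four returned documents are relevant.
focused = precision_recall(["d1", "d2", "d3", "d4"], relevant)

# Returning the whole collection: perfect recall, poor precision.
everything = precision_recall(collection, relevant)
```

Returning everything finds all three relevant documents, but only three out of ten results are useful, which is the trade-off described above.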

Relevancy

Relevancy is a numerical score assigned to a search result representing how well the result meets the information the user who submitted the query is looking for. Relevancy is therefore a subjective measure of the quality of the results as defined by the user. The higher the score, the higher the relevance.

For every document in a result, a search engine calculates and assigns a relevancy score. TF-IDF is a standard relevancy heuristic used by many search engines. It combines the TF and IDF values to produce a ranking score for each document.

TF stands for Term Frequency. This is the number of times a word (or term) appears in a single document, expressed as a percentage of the total number of terms in the document. Term frequency assumes that when evaluating two documents, document A and document B, the one that contains more occurrences of the search term is probably also more "relevant" to the user.

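IDF stands for Inverse Document Frequency. This is a measure of the general importance of the term, based on the ratio of all documents in the set to the documents that contain the term (usually the logarithm of that ratio). IDF lowers the weight of terms that appear in many documents, so that common words do not dominate the score. Expressing TF as a share of the document's length is what prevents a bias towards longer documents.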
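Here is a small sketch of the calculation, assuming one common variant: TF as a fraction of the document's terms, IDF as the natural logarithm of the document ratio, and the score as their product. Real engines use tuned variations of this, and the three-document corpus is invented for the example.

```python
import math
from collections import Counter

def tf(term, doc_tokens):
    """Share of the document's terms that are `term`."""
    if not doc_tokens:
        return 0.0
    return Counter(doc_tokens)[term] / len(doc_tokens)

def idf(term, corpus):
    """Log of (all documents / documents containing the term)."""
    containing = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / containing) if containing else 0.0

def tf_idf(term, doc_tokens, corpus):
    return tf(term, doc_tokens) * idf(term, corpus)

# Hypothetical corpus, already split into terms.
corpus = [
    "enterprise search index".split(),
    "search query ranking".split(),
    "document control policy".split(),
]

scores = [tf_idf("index", doc, corpus) for doc in corpus]
```

Only the first document contains "index", so it is the only one with a non-zero score. A term like "search", which appears in two of the three documents, gets a lower IDF than "index".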

Additional techniques may put more emphasis on other attributes to determine relevancy. One example is freshness: when the document was created or last updated. Another is the part of the document that matched the term: a match in the document title or author may score higher than a match in the text body.

Modern search engines provide good relevancy scoring across a wide range of document formats, but more importantly, allow users to create and use their own relevancy scoring profiles optimized for their queries. These user-defined weights, also called boosting, can be set up and run for a user, group of users, or per query. This is extremely helpful for personalizing the search experience by roles or departments within the organization.
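Boosting can be pictured as a set of per-field weights. The sketch below is only an illustration: the field names, the weights, and the two profiles are assumptions of mine, and the score is a plain weighted count of term matches rather than any vendor's formula.

```python
# Hypothetical scoring profiles: field name -> boost weight.
DEFAULT_BOOSTS = {"title": 3.0, "author": 2.0, "body": 1.0}
BODY_HEAVY_BOOSTS = {"title": 1.0, "body": 2.0}

def boosted_score(doc, term, boosts=DEFAULT_BOOSTS):
    """Sum of (boost weight x number of matches) over the boosted fields."""
    term = term.lower()
    return sum(
        weight * doc.get(field, "").lower().split().count(term)
        for field, weight in boosts.items()
    )

doc_a = {"title": "Search basics", "body": "An overview of indexing"}
doc_b = {"title": "Overview", "body": "How search works and why search matters"}

default_scores = (boosted_score(doc_a, "search"), boosted_score(doc_b, "search"))
body_heavy_scores = (
    boosted_score(doc_a, "search", BODY_HEAVY_BOOSTS),
    boosted_score(doc_b, "search", BODY_HEAVY_BOOSTS),
)
```

With the default profile the title match wins and document A ranks first. With the body-heavy profile document B ranks first, which is how a profile per role or department can change the order of the same results.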

Linguistics

Linguistics is a vital component of any search solution. It refers to the processing and understanding of text in unstructured documents or text fields. There are two parts to linguistics: syntax and semantics.

Syntax is about breaking text into words and numbers which is also called tokenization. Semantics is the process of finding the meaning behind text, from the levels of words and phrases to the level of paragraphs, a document or a set of documents. Semantic analysis often involves grammatical description and deconstruction, morphology, phonology, and pragmatics. One major challenge is ambiguity of language.

Linguistics therefore improves relevancy and affects precision and recall. Common linguistic features in a search solution include stemming and lemmatization of words (reducing words to their root or stem form), phrasing (the recognition and grouping of idioms), removal of stop words (words that appear often in documents but contain little meaning, for example articles), spelling corrections, etc.
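The sketch below strings three of these features together: tokenization, stop word removal, and stemming. It is deliberately crude. The stop word list is tiny and the suffix stripping is a toy stand-in for a real stemmer, so treat the names and rules here as assumptions for illustration only.

```python
import re

STOP_WORDS = {"a", "an", "the", "of", "and", "in", "to", "is"}  # tiny illustrative list
SUFFIXES = ("ing", "ed", "es", "s")  # crude rules, not a real stemming algorithm

def tokenize(text):
    """Break text into lowercase words and numbers."""
    return re.findall(r"[a-z0-9]+", text.lower())

def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def analyze(text):
    return [naive_stem(t) for t in tokenize(text) if t not in STOP_WORDS]

document_terms = analyze("The engine is indexing the documents")
query_terms = analyze("indexed document")
```

Both the document and the query end up with the terms "index" and "document", so the query matches even though the original word forms differ. This is how stemming improves recall.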

Navigation

One way search engines overcome the challenges of semantics and language ambiguity is navigation. In this case, the search engine uses linguistic features, such as extraction of entities (nouns and noun phrases, places, people, concepts, etc.), and a predefined taxonomy to narrow the results by clustering related documents together or by providing useful dimensions, called facets, to slice the data, for example by price or name.
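Facets can be sketched as counts of field values over the current result set, plus a filter that is applied when the user picks a value. The result records and field names below are invented for the example.

```python
from collections import Counter

# Hypothetical result set for one query.
results = [
    {"title": "Laptop A", "brand": "Acme", "price_range": "500-999"},
    {"title": "Laptop B", "brand": "Acme", "price_range": "1000+"},
    {"title": "Laptop C", "brand": "Globex", "price_range": "500-999"},
]

def facet_counts(docs, field):
    """Count how many results fall under each value of a facet field."""
    return Counter(doc[field] for doc in docs if field in doc)

def refine(docs, field, value):
    """Narrow the results to those matching the chosen facet value."""
    return [doc for doc in docs if doc.get(field) == value]

brand_facet = facet_counts(results, "brand")
narrowed = refine(results, "brand", "Acme")
```

The counts are what a search page would show next to each facet value, and `refine` is what happens when the user clicks one of them.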

The Search Index

At the heart of every search engine is the search index. An index is a searchable catalog of documents created by the search engine. The search engine receives content from all source systems to place in the index. This process is called ingestion. The search engine then accepts search queries to match against the index. The index is used to quickly find relevant documents for a search query out of a collection of documents.

A common index structure is the inverted index which maps every term in the collection to all of its locations in this collection. For example, a search for the term "A" would check the entry for "A" in the index that contains links to all the documents that include "A".
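A minimal inverted index can be sketched in a few lines. This simplified version maps each term only to the IDs of the documents that contain it, not to positions within them, and it assumes a query means "all terms must match". The documents are made up.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map every term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return the IDs of documents containing every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)

# Hypothetical documents keyed by ID.
docs = {1: "enterprise search index", 2: "search query", 3: "document control"}
index = build_inverted_index(docs)

single_term = search(index, "search")
two_terms = search(index, "search index")
no_match = search(index, "missing")
```

The lookup never scans the documents themselves. It only reads the index entries for the query terms, which is why the index makes search fast.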

Wednesday, July 31, 2013

ISO 9001 and Documentation

ISO 9001 compliance is becoming increasingly important in regulated industries. How does it affect documentation? Here is how...

What is Document Control?

Document control means that the right persons have the current version of the documents they need, while unauthorized persons are prevented from use.

We all handle many documents every day. These documents include forms that we fill out, instructions that we follow, invoices that we enter into the computer system, holiday schedules that we check for the next day off, rate sheets that we use to bill our customers, and many more.

An error on any of these documents could lead to problems. Using an outdated version could lead to problems. Not knowing if we have the latest version or not could lead to problems. Just imagine us setting up a production line to outdated specifications or making strategic decisions based on a wrong financial statement.

ISO 9001 gives us tools (also referred to as "requirements") that show us how to control our documents.

ISO 9001 Documents

There are no "ISO 9001 documents" that need to be controlled and "non ISO 9001 documents" that don't. The ISO 9001 system affects an entire company, and all business-related documents must be controlled. Only documents that have no impact on products, services or the company are exempt from control.

However, how much control you apply really depends on the document.

The extent of your approval record, for example, may vary with the importance of the document (remember, documents are approved before they are published for use).

The Quality Policy, an important corporate policy document, shows the signatures of all executives.

Work instructions often just show a note in the footer indicating approval by the department manager.

Some documents don't even need any approval record: if the person who prepared a document is also responsible for its content (e.g., the Quality Manager prepares instructions for his auditors), a separate approval is superfluous.

On the other hand, identifying a document with a revision date, source and title is basic. It really should be done as a good habit for any document we create.

Please note that documents could be in any format: hard copy or electronic. This means that, for example, the pages on the corporate intranet need to be controlled.

Responsibility for Document Control

Document control is the responsibility of all employees. It is important that all employees understand the purpose of document control and how to control documents in accordance with ISO 9001.

Please be aware that if you copy a document or print one out from the Intranet and then distribute it, you are responsible for controlling its distribution! The original author will not know that you distributed copies of this document, so the original author can't control your distribution.

Dating Documents

ISO 9001 requires every document to show when it was created or last updated. Many of us may have thought of using our word processor's automatic date function for this, but should we use the automatic date field on documents?

Generally not. If you enter the automatic date field into a document, the field will automatically be updated to always show the current date, no matter when you actually created or updated the document.

For example, if you use the automatic date field in a fax and you save the fax on your computer for future reference, you won't be able to tell when you wrote the fax: when you open the fax on your computer, it will always show today's date.

The automatic date field is not suitable for document control. Therefore, as a general rule, don't use the automatic date field to identify revision status.

ISO 9001 Documentation

ISO 9001 documentation includes:

the Quality Procedures Manual, which also includes corporate policies and procedures affecting the entire company;
work instructions, which explain in detail how to perform a work process;
records, which serve as evidence of how you meet ISO 9001 requirements.

Policies and Procedures

Our ISO 9001 Quality Manual includes the corporate Quality Policy and all required ISO 9001 Procedures. While most procedures affect only managers, every employee must be familiar with the Quality Policy and with the Document Control procedures. The Quality Policy contains the corporate strategy related to quality and customer satisfaction; all other ISO 9001 documents must follow this policy. The Document Control procedures show how to issue documents, as well as how to use and control documents.

Continuous Improvement

Implementing ISO 9001 is not a one-time benefit to a company. While you are utilizing the quality manual, quality procedures and work instructions in daily business activities, you are not only benefiting from better quality and increased efficiency but you are also continually improving. In fact, the ISO 9001 requirements are designed to make you continually improve. This is a very important aspect because companies that don't continue to improve are soon overtaken by the competition.

Thursday, June 20, 2013

Intelligent Search and Automated Metadata

The inability to identify the value in unstructured content is the primary challenge in any application that requires the use of metadata. Search cannot find and deliver relevant information in the right context, at the right time without good quality metadata.

An information governance approach that creates the infrastructure framework to encompass automated intelligent metadata generation, auto-classification, and the use of goal and mission-aligned taxonomies is required. From this framework, intelligent metadata enabled solutions can be rapidly developed and implemented. Only then can organizations leverage their knowledge assets to support search, litigation, e-discovery, text mining, sentiment analysis and open source intelligence.

Manual tagging is still the primary approach used to describe content, and it often lacks any alignment with enterprise business goals. The resulting subjectivity and ambiguity carry over into search, leading to inaccuracy and the inability to find relevant information across the enterprise.

Metadata used by search engines may consist of end user tags or pre-defined tags, or may be generated using system defined metadata, keyword and proximity matching, extensive rule building, end-user ratings, or artificial intelligence. Typically, search engines provide no way to rapidly adapt to meet organizational needs or account for an organization’s unique nomenclature.

More effective is implementing an enterprise metadata infrastructure that consistently generates intelligent metadata using concept identification. This is a profoundly different approach: relevant documents, regardless of where they reside, will be retrieved even if they don’t contain the exact search terms, because the concepts and relationships between similar content have been identified. The elimination of end-user tagging and the resulting organizational ambiguity enables the enriched metadata to be used by any search engine index, for example, ConceptSearch, SharePoint, Solr, Autonomy or Google Search Appliance.

Only when metadata is consistently accurate and trusted by the organization can improvements be achieved in text analytics, e-discovery and litigation support.

In the exploding age of big data, and more specifically text analytics, sentiment analysis and even open source intelligence, the ability to harness the meaning of unstructured content in real time improves decision-making and enables organizations to proactively act with greater certainty on rapidly changing business complexities.

To achieve an effective information governance strategy for unstructured content, results are predicated on the ability to find information and eliminate inappropriate information. The core enterprise search component must be able to incorporate and digest content from any repository, including faxes, scanned content, social sites (blogs, wikis, communities of interest, Twitter), emails, and websites. This provides a 360-degree corporate view of unstructured content, regardless of where it resides or how it was acquired.

Ensuring that the right information is available to end users and decision makers is fundamental to trusting the accuracy of the information and is another key requirement in intelligent search. Organizations can then find the descriptive needles in the haystack to gain competitive advantage and increase business agility.

An intelligent metadata enabled solution for text analytics analyzes and extracts highly correlated concepts from very large document collections. This enables organizations to attain an ecosystem of semantics that delivers understandable and trusted results that are continually updated in real time.

Consider intelligent search applied to e-discovery and litigation. Traditional information retrieval systems use "keyword searches" of text and metadata as a means of identifying and filtering documents, and the challenges and costs of e-discovery and litigation support continue to escalate. The use of intelligent search reduces costs and alleviates many of the challenges.

Content can be presented to knowledge professionals in a manner that enables them to more rapidly identify relevant information and increase accuracy. Significant benefits can be achieved by removing the ambiguity in content and identifying concepts within a large corpus of information. This methodology speeds up the work and reduces costs, offering an effective solution that overcomes many of the challenges typically left unsolved in e-discovery and litigation support.

Organizations must incorporate an approach that addresses the lack of an intelligent metadata infrastructure. Intelligent search, a by-product of the infrastructure, must encourage, not hamper, the use and reuse of information and be rapidly extendable to address text mining, sentiment analysis, e-discovery, and litigation support.

The additional components of auto-classification and taxonomies complete the core infrastructure to deploy intelligent metadata enabled solutions, including records management, data privacy, and migration. Search can no longer be evaluated on features, but on proven results that deliver insight into all unstructured content.

Wednesday, May 29, 2013

Digital Assets Management System - Autonomy Virage MediaBin

Autonomy Virage MediaBin is the advanced and comprehensive solution to index, analyze, categorize, manage, retrieve, process, and distribute all types of digital assets within an organization.

Autonomy Virage MediaBin helps organizations with globally distributed teams to effectively manage, distribute, and publish digital assets used to promote their messaging, products, and brands.

Companies would benefit from higher-impact marketing and communications, greater agility, stronger brand equity, increased team productivity, and the security of knowing valuable corporate assets will be fully leveraged and preserved for the future. By providing self-service access to digital assets, marketing personnel no longer have to spend time fulfilling content requests.

Autonomy Virage MediaBin delivers rapid return on investment and can support implementations scaling up to the largest global enterprises.

Major Features:

Unified Management: a single environment which supports standardized and automated tagging to accelerate search and streamline the creation, management, delivery, and archival of all digital assets.

Intelligent Analytics: leverages Autonomy IDOL to automate manual processes such as metadata tagging, summarization, and categorization.

Next-Gen Rich Media Technology: leverages next generation video and speech analytics technology that extracts concepts to enable cross-referencing with other forms of information.

Effective and Agile Content Reuse: provides secure access to all content for all users. Internal and external teams can collaborate more effectively to improve coordination and productivity in all marketing programs.

Transform and Transcode on the Fly: Multi-threaded transformation task engine can handle large quantities of simultaneous complex transformations involving format conversions, color-space conversions, color adjustments, resolution, cropping, sizing, padding, watermarking, and a wide variety of advanced graphics adjustments that would normally require a user to open an editing application on their desktop.

Other Features:

browser based system;
permissions can be defined based on user roles or by folders; search incorporates permissions;
content can be pulled from CMS such as TeamSite and rendered on the fly;
each asset has a unique ID which is passed over to TeamSite; TeamSite "knows" when there is a different or a new revision. If an asset gets updated in MediaBin, TeamSite gets notified;
has a set of workflows such as approval and review; rules can be defined so that once assets are approved, they move to the publishing area; also includes Process Studio, which is the workflow tool, and Template, which is the form builder;
assets can be uploaded by "drag and drop" and can be dragged and dropped to TeamSite from MediaBin;
there is no limitation to size of the files;
upload can be automated for assets to go to specific folders;
after the download, assets will be preserved for individual users;
how assets are used is reported in TeamSite;
can pull content from SharePoint;
metadata is preserved, and it is searchable and indexable;
content is automatically categorized by asset type and resolution; asset type is recognized on ingest, so no manual metadata entry is required;
TeamSite pulls images from MediaBin;
supports 29 languages;
ability to link assets together (for example: associated assets) using existing metadata;
ability to create a taxonomy of assets;
search includes saved searches, recent searches, both preset and executed searches, custom search;
ability to search for words in a video and then go to that place in the video;
once a user finds content, an action can be taken such as download, send it by e-mail, send a shortcut to the content, or add it to a light-box, which is defined by permissions;
there is an Activity Manager which includes all taken actions and an ability to get to users' tasks.

Benefits:

eliminates human error and ensures quicker access to content through automatic metadata extraction and accurate search results;
reduces costs by automating the production, review, and distribution of digital assets;
increases efficiency by providing users with self-service access at any time;
speeds time-to-market while maintaining accuracy and consistency;
facilitates quick reuse and re-purposing of images, as well as rapid content creation;
produces higher-impact marketing and communications, greater agility, and stronger brand consistency;
increases compliance by security controlled access, complete audit trail, and control of licensed content.

Thursday, May 9, 2013

Search Engine Technology

Modern web search engines are highly intricate software systems that employ technology that has evolved over the years. There are a few categories of search engines, each applicable to specific search needs.

These include web search engines (e.g. Google), database or structured data search engines (e.g. Dieselpoint), and mixed search engines or enterprise search.

The more prevalent search engines such as Google and Yahoo! utilize hundreds of thousands of computers to process trillions of web pages in order to return fairly well-aimed results. Due to this high volume of queries and text processing, the software is required to run in a highly distributed environment with a high degree of redundancy.

Search Engine Categories

Web search engines

These are search engines that are specifically designed for searching web pages. They were developed to facilitate searching through a large number of web pages. They are engineered to follow a multi-stage process: crawling a vast number of pages to extract the significant terms from their contents, indexing those terms in a semi-structured form (for example, a database), and returning mostly relevant results as links to those pages.

Crawl

In the case of a wholly textual search, the first step in classifying web pages is to find an "index item" that might relate expressly to the "search term". Most search engines use sophisticated algorithms to "decide" when to revisit a particular page, to check its relevance. These algorithms range from a constant visit interval with higher priority for more frequently changing pages to an adaptive visit interval based on several criteria such as frequency of change, popularity, and overall quality of the site. The speed of the web server running the page as well as resource constraints like the amount of hardware or bandwidth also figure in.
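An adaptive visit interval can be pictured with a very simple rule. The sketch below assumes a halve-or-double policy with fixed limits. Both the policy and the limits are my own simplification for illustration, not how any specific crawler schedules its visits.

```python
def next_interval(current_hours, changed, min_hours=1.0, max_hours=720.0):
    """Revisit sooner if the page changed since the last visit, later if it did not."""
    factor = 0.5 if changed else 2.0
    return max(min_hours, min(max_hours, current_hours * factor))

# A page that keeps changing is visited more and more often.
busy_page = next_interval(next_interval(24.0, changed=True), changed=True)

# A page that never changes drifts toward the maximum interval.
quiet_page = next_interval(500.0, changed=False)
```

A real scheduler would also weigh the other criteria mentioned above, such as popularity, site quality, and available bandwidth.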

Link map

The pages that are discovered by web crawls are often distributed and fed into another computer that creates a map of the uncovered resources. This looks a little like a graph, in which pages are represented as small nodes connected by the links between them. The large volume of data is stored in multiple data structures that allow quick access by algorithms that compute the popularity score of pages on the web based on how many links point to a certain web page.
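The simplest possible popularity score is a count of inbound links. The sketch below uses a made-up four-page link map. Production algorithms are far more refined (they also weigh the importance of the linking pages), so this is only the basic idea.

```python
from collections import Counter

# Hypothetical link map: page -> pages it links to.
link_graph = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html"],
    "c.html": ["a.html"],
    "d.html": ["c.html"],
}

def inlink_counts(graph):
    """Count, for every page, how many other pages link to it."""
    counts = Counter({page: 0 for page in graph})
    for source, targets in graph.items():
        for target in set(targets):  # count each linking page once
            if target != source:     # ignore self-links
                counts[target] += 1
    return counts

popularity = inlink_counts(link_graph)
ranked = [page for page, _ in popularity.most_common()]
```

Here "c.html" comes out on top because three other pages point to it, while "d.html", which nobody links to, comes last.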

Database Search Engines

Searching for text-based content in databases presents a few special challenges, in response to which a number of specialized search engines have developed. Databases are slow when solving complex queries (with multiple logical or string matching arguments). Databases allow pseudo-logical queries which full-text searches do not use. There is no crawling necessary for a database since the data is already structured. However, it is often necessary to index the data in a more compact form that allows faster searching.

Mixed Search Engines

Sometimes, searched data contains both database content and web pages or documents. Search engine technology has developed to respond to both sets of requirements. Most mixed search engines are large Web search engines, like Google. They search through both structured and unstructured data sources. Pages and documents are crawled and indexed in a separate index. Databases from various sources are indexed as well. Search results are then generated for users by querying these multiple indices in parallel and combining the results according to "rules".
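Querying several indices in parallel and combining the results can be sketched as follows. The two search functions are stand-ins that return canned hits, and the "rule" used here, a weight per source applied before sorting, is just one assumption about what such rules might look like.

```python
from concurrent.futures import ThreadPoolExecutor

def search_documents(query):
    """Stand-in for a full-text index of pages and documents (hypothetical hits)."""
    return [("doc:handbook.pdf", 0.82), ("doc:faq.html", 0.40)]

def search_database(query):
    """Stand-in for an index built over database records (hypothetical hits)."""
    return [("db:customers/17", 0.65)]

# Assumed merge rule: each source's scores are multiplied by a weight.
SOURCES = [(search_documents, 1.0), (search_database, 1.2)]

def mixed_search(query):
    """Query every source in parallel and merge the hits by weighted score."""
    with ThreadPoolExecutor() as pool:
        futures = [(pool.submit(fn, query), weight) for fn, weight in SOURCES]
        merged = [
            (hit, score * weight)
            for future, weight in futures
            for hit, score in future.result()
        ]
    return sorted(merged, key=lambda pair: pair[1], reverse=True)

results = mixed_search("customer handbook")
```

The database hit moves ahead of the second document hit because of its source weight, which shows how the combining rules shape the final list.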

Tuesday, April 30, 2013

Big Data and Content Management

There has been a lot of talk lately about big data. What is big data?

Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand, commonly used software tools or traditional data processing applications. The challenges include capture, governance, storage, search, sharing, transfer, analysis, and visualization.

What is considered "big data" varies depending on the capabilities of the organization managing the data set, and on the capabilities of the applications that are traditionally used to process and analyze the data set in its domain.

Big data sizes are a constantly moving target, as of 2012 ranging from a few dozen terabytes to many petabytes of data in a single data set. Because of this difficulty, new platforms of "big data" tools are being developed to handle various aspects of large quantities of data.

Every day, we create 2.5 quintillion bytes of data — so much that 90% of the data in the world today has been created in the last two years alone. How does it apply to us and what we do in content management?

The sheer numbers, covered in most enterprise content management (ECM) analyst reports, also extend to all aspects of the information technology sector, prompting developers to create a new generation of software and technology or distributed computing frameworks in an effort to cope with this scalability phenomenon.

Content growth is everywhere. From traditional data warehouses to new consolidated big data stores, IT infrastructure must be ready for this continuing scale; it impacts the entire IT industry, especially ECM.

Content is getting bigger. Applications are growing more complex, challenging IT as never before. How will these changes impact content management technologies? It's difficult to predict exactly, but there are insights to be found and used to plan for the future.

ECM technology is evolving toward a platform-based approach, enabling organizations to make their own content-centric and content-driven applications smarter. Analysts, vendors and users all agree: The time for "out-of-the-box" CMS applications has passed. Now each project can meet specific needs and individual requirements.

Content and data, more often than not, come with embedded intelligence, whether through custom metadata and in-text information or through attached media and binary files. This intelligence can be utilized whether the content is structured or unstructured.

This can be observed on many different levels across various domains. One instance is the arrival of what some have started to call "Web 3.0": the semantic Web and the related technology that derives intelligence from raw content through advancements like semantic text analysis, automated relations and categorization, sentiment analysis, etc., effectively giving meaning to data.

More traditional ECM components, such as workflows, content lifecycle management and flexibility, demonstrate much of the same. Smart content architecture, along with intelligent, adaptive workflows and processes and deep integration with the core applications within information systems, is making enterprise content-centric applications smarter and refining the way intelligence is brought to content.

In short, content is getting smarter on the inside as much as on the outside.

In fact, such disruptive phenomena as Big Data or the new semantic technology on the scene are huge opportunities for enterprise content management solutions. They are bringing new solutions and possibilities in business intelligence, semantic text analysis, data warehousing and caching that require integration into existing content-centric applications, all without rewriting them.

As a result, Big Data and smart content will push more of enterprise content management toward technical features such as software interoperability, extensibility and integration capabilities.

These developments will also demand a clean and adaptive architecture that is flexible enough to evolve as new standards arise to bridge CMS and semantic technologies, as well as connectors to back-end storage systems and text-analysis solutions.

This underscores the advancements made in the development of modular and extensible platforms for content-centric applications. Taking the traditional approach of employing large enterprise content management suites that rely on older software architecture will make it harder to leverage these new and nimble opportunities.

In order to get the most value out of smart content and refine methods of dealing with Big Data, enterprise content management architects must incorporate a modern and well designed content management platform upon which to build, one that not only looks at end-user features but stays true to the development side. Enterprise content management will not be reinvented; Big Data and smart content are evolutions, not revolutions, in the industry.

I will continue on this subject in my future posts.

Sunday, April 28, 2013

Optimize Web Experience Management

Leading enterprises strive to achieve higher levels of customer engagement through online channels, and this means they must easily, quickly and cost-effectively provide fresh, personal, relevant content anytime, anywhere, on any device, through a consistent and dynamic user experience.

Traditional web content management system (CMS) solutions are no longer sufficient, and a richer and broader range of capabilities that enable web experience management - managing and optimizing the site visitor experience across the web, mobile apps, social networks and more - must now be leveraged in this new era of engagement.

The Need for Web Experience Management

Over the last few years, the Internet landscape has undergone fundamental change in three areas: social, personal and mobile.

1. Social - The Web is becoming increasingly more social and much less anonymous. The power of sharing can enhance or destroy brands in seconds.

2. Personal - While the Internet is continuously expanding in terms of ubiquity, at the same time it's becoming much more local and much more personal in terms of user experience.

3. Mobile - Mobile access to the Internet is rapidly expanding to the point where access from tablets and phones will soon exceed that from desktops and laptops.

The very way we communicate with customers is changing, and when fundamental change like this occurs, those who recognize the change and move quickly to adapt will benefit the most.

A New Era of Engagement

Each of these trends reinforces the others and fuels further adoption and innovation. It is these technologies, and the behaviors and capabilities they foster, that have brought us to a new era which Forrester calls the "era of engagement."

Driving these trends are people - our friends, leads, customers, critics, and fans. This is our audience and the other half of the conversation, and in today's age of engagement, they want to participate and expect us to engage them on their terms, on their schedule, in the context of their location, in their language and optimized for their device. To effectively tackle this challenge of serving a mass audience with limited resources, enterprises require strategy and effective tools to help get the job done.

Web experience management (WEM) provides us with the tools to take on this otherwise daunting task. The capabilities of WEM allow you to create, manage and deliver dynamic, targeted and consistent content across various online channels including your website, social media, marketing campaign sites, mobile applications, etc. It takes a lot more than a traditional Web CMS to meet these new demands.

Key Principles of Web Experience Management

To effectively implement WEM, enterprises must start with their business strategy and goals which should drive their messaging and engagement strategy and which in turn should drive their content strategy. In other words, the strengths, weaknesses, threats and opportunities that businesses face should be considered first and foremost.

Too often organizations fail to do this by jumping straight into a technology selection without due consideration of the business drivers. Around this foundation, we wrap the fundamentals of basic Web content management. It is important to remember that content is still king. Business users and marketers need easy to use, yet powerful, content authoring and publishing capabilities.

They need rich content models that allow them to create engaging visitor experiences, to easily create new content assets, to quickly find and re-purpose existing content, and to preview content and the site visitor experience for all online channels.

Upon this foundation, an effective WEM solution provides a comprehensive collection of capabilities that allow organizations to create, manage and deliver dynamic, targeted and consistent content and visitor experiences across multiple touch points: corporate website, dedicated marketing campaign sites, mobile applications, social media sites, etc.

While WEM requirements are going to vary from organization to organization, some of the most critical features needed by essentially all enterprises include content targeting and personalization, mobile device support, faceted search and navigation, multi-channel publishing, integrated Web analytics, and campaign management.

Tuesday, April 23, 2013

Knowledge Management Applications - Coveo for Service and Support

In my last two posts about Coveo products, I described Coveo search applications - Coveo for advanced web search and Coveo for advanced enterprise search. Today, I will finish describing Coveo products with the Coveo knowledge management application - Coveo for service and support.

With Coveo, knowledge required to solve cases faster can be found wherever it resides, within and beyond the knowledge base. Many companies are challenged with the proliferation of data, in multiple systems, communities, on-premise and in the cloud. Knowledge is everywhere and hard to manage.

Coveo solves this challenge by placing information from anywhere, related to the agent’s context, directly in front of them. Coveo technology automatically "reads" case information, establishes context, and instantly shows contextually relevant content and experts directly within a CRM such as Salesforce, or within a separate Insight Console. Coveo creates information mash-ups regardless of where the information resides, combined with advanced enterprise search and navigation abilities that bring your entire knowledge ecosystem to your agents.

Such knowledge availability decreases case resolution time, increases first contact resolution, and empowers lower level agents to become productive faster and to solve more complex cases. The results show dramatic impact on contact center capacity and customer satisfaction.

Features

Solutions and experts from anywhere - Coveo automatically presents 360° views of customer, case, or product information and communications, as well as experts who can help. Using advanced data enrichment, solutions and customer insight can stem from multiple sources, across enterprise, community, and social content.

Advanced enterprise search and navigation - expanded views enable deep, broad, knowledge exploration for cases, securely, across any enterprise content.

Unified indexing - Coveo federates searches and mash-ups from cloud, enterprise, and social data securely and in real time, regardless of format or source. It indexes source data from Salesforce, SharePoint, databases, file shares, Exchange, Dropbox, Lithium, Gmail, etc.

Expertise finding - dynamically, through context and topics, from internal colleagues to external experts, Coveo locates people with experience relevant to each case and customer.

Customer is in the center - Coveo cuts across departmental and system silos and enriches cases with sales or engineering content, thus providing richer and more relevant customer interactions. Conversely, other departments benefit from information generated by agents to inform product development and sales.

Virtual interaction - consolidates all customer and prospect communication and interactions from any channel, bringing together opportunities, cases, transactions, e-mails, events, calls, tweets, etc.

Customization - The intuitive admin interface enables customization of any objects and combinations of information, including custom fields.

Saturday, March 30, 2013

Search Applications - Coveo - Advanced Website Search

In my last post, I described Coveo for advanced enterprise search. In this post, I will describe Coveo for advanced website search.

Coveo takes your existing Web Content Management to a new level of insight and productivity. Coveo creates a virtual integration layer between your WCM and all your company’s key information sources (knowledge bases, databases, cloud content) to provide powerful, consolidated insight.

Coveo recommends the most relevant content to visitors, powered by indexing and search across any set of diverse systems. You can increase customer satisfaction by finding the missing content your customers are looking for. Smart navigation and search makes relevant content quickly apparent for your users. Website visitors will be presented with relevant, related content without the administrator having to create the links and without the visitor having to search!

You can instantly correlate content from your WCM/CMS such as Sitecore with other key information sources, including your Salesforce, SharePoint, Exchange, Lithium, and more.

Features

WCM Features

Out-of-the-box usability - you can start getting insight immediately without any difficult set up or configuration.

Related Topics - automatically correlate site content to relevant content based on similar themes and attributes.

Composite Views - views that combine relevant site content with content from other corporate systems outside the WCM (Communities, Intranet, CRM, Social, etc.).

Modular, flexible design - Template-based rendering for easy customization, reusable and extensible user controls for deep customization.

Computed Facets - configurable facets ideal for eCommerce websites, providing dynamic calculations of relevant product information such as average or summed prices, as well as filtering by price ranges.

Native integration with leading WCMs - API-Level Integration with Sitecore, SharePoint and SDL Tridion provides support for live indexing, security trimming and metadata search.

Faceted Search and Navigation - More intuitive, complementing traditional keyword searches with guided navigation and conversational search that leverages metadata for increased relevance and precision.

Search Analytics - Provides valuable information about visitor search behavior, content usage (top queries) and gaps in content (unsuccessful queries), offering unparalleled insights into key trends and more agile decision-making.
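
To make the idea of top queries and unsuccessful queries more concrete, here is a minimal sketch in Python. It does not use any Coveo API. The query log, its format, and the function names are all assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical query log: (query text, number of results returned).
query_log = [
    ("return policy", 12),
    ("warranty", 7),
    ("return policy", 12),
    ("gift wrap", 0),
    ("warranty", 7),
    ("return policy", 12),
    ("gift wrap", 0),
    ("store hours", 0),
]


def top_queries(log, n=3):
    """Most frequent queries, whether or not they returned results."""
    counts = Counter(query for query, _ in log)
    return counts.most_common(n)


def unsuccessful_queries(log):
    """Queries that returned no results, most frequent first."""
    counts = Counter(query for query, hits in log if hits == 0)
    return counts.most_common()


if __name__ == "__main__":
    print(top_queries(query_log))
    print(unsuccessful_queries(query_log))
```

Queries that keep coming back with zero results are a reasonable first place to look for gaps in content.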

Indexing

Audio-video Indexing - The speech in audio or video files can be indexed with the optional Audio Video Search module. It creates an accurate transcript of speech content that is aware of the enterprise's vocabulary (e.g. proper names, employee names, domain terms), and allows users to search audio and video content as easily as they search document content. When searching, the exact locations of the searched terms are highlighted in the timeline of the audio or video player.

Connector Framework - Connector APIs enable easy integration with most repositories, including a flexible security API to support the security models of the indexed repositories.

Converters - Dozens of file formats are supported out of the box, including PDFs, Office documents, Lotus Notes, HTML, XML, text files, etc. Metadata contained in audio and image file formats is also indexed, while the text contained in images can be indexed with the optional OCR module.

Languages - Languages are automatically identified at indexing time, improving content processing and relevance algorithms.

Metadata mapping - Regardless of the actual naming for the metadata in the indexed repositories, the system supports configurable mapping to a specified internal field representation. For instance, an index containing both Exchange and Lotus Notes emails will merge the “From” and “To” and “Subject” metadata even if they use different names for these fields.

OCR - The Optical Character Recognition (OCR) module allows the indexing of text content from files such as scanned documents stored in image or PDF files.

Pre/Post conversion scripts - Conversion scripts are hooks in the indexing pipeline that allow administrators to fully customize the way documents are indexed. There are two types of scripts: those executed before and those executed after the conversion of the document from its binary representation to indexable metadata and text.

Push API - Provides a simple way to integrate with external systems. All the calls necessary to support all the advanced features of the indexing pipeline are available through this API.

Tagging - Metadata can be injected on documents at search time, enabling search and facets on these new metadata in real-time. An example of usage is the addition of user-created tags on documents.
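
The metadata mapping described above can be pictured with a short Python sketch. This is not Coveo's configuration format or API. The mapping table, the source field names, and the function name are assumptions invented for illustration only.

```python
# Hypothetical per-source mappings from repository field names to
# normalized internal field names. The source field names are invented
# for illustration and may not match the real repositories.
FIELD_MAPPINGS = {
    "exchange": {"Sender": "from", "Recipients": "to", "Subject": "subject"},
    "lotus_notes": {"From": "from", "SendTo": "to", "Subject": "subject"},
}


def normalize_metadata(source, raw_metadata):
    """Rename source-specific fields to the shared internal field names."""
    mapping = FIELD_MAPPINGS[source]
    normalized = {}
    for name, value in raw_metadata.items():
        # Unmapped fields are kept under their lowercased original name.
        normalized[mapping.get(name, name.lower())] = value
    return normalized


if __name__ == "__main__":
    print(normalize_metadata("exchange", {"Sender": "a@example.com", "Subject": "Hi"}))
    print(normalize_metadata("lotus_notes", {"From": "b@example.com", "SendTo": "a@example.com"}))
```

Once both sources land in the same "from", "to" and "subject" fields, one query or facet can cover emails from either system.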

Security

Document Level Security - Data sources can be configured to index document permissions with content, making Early-binding security possible, or permissions can be set directly for all documents of this source.

Index Security - Security is integrated directly in the index structures to ensure that users only see content they are entitled to see. Early and Late security binding are both handled at the index level to deliver the best performance and security.

Index Segmentation - In addition to the document level securities reflecting the underlying repository permissions, the index can be segmented into collections with their own access restrictions.

Security Freshness - Changes in the group/user structure are constantly monitored and refreshed in Coveo’s security cache. An administrator can also force a refresh of the cache if required.

Security Normalization - Securities from different systems are normalized within the index so that users are automatically assigned with all proper security identifiers when accessing Coveo. This ensures that users see all the content they are entitled to see.

Super User Access - The main system administrator can grant temporary and audited rights to a specified user to search and access content for which that user normally does not have access rights. Typical uses are e-Discovery, forensics, etc.
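
As a rough illustration of early-binding security and security normalization, here is a minimal Python sketch. It assumes that permissions are captured at indexing time and stored with each document, and that one login expands to the identities the user holds in each source system. The data structures, identity strings, and function name are hypothetical. Query matching is left out so that only the trimming step is visible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedDocument:
    title: str
    allowed: frozenset  # identities permitted to read, stored at index time


# Hypothetical normalized identity map: one login expands to the
# identities that user holds in each source system.
IDENTITIES = {
    "alice": {"sp:alice", "sf:alice@example.com", "group:support"},
    "bob": {"sp:bob", "group:sales"},
}

INDEX = [
    IndexedDocument("Support runbook", frozenset({"group:support"})),
    IndexedDocument("Sales forecast", frozenset({"group:sales"})),
    IndexedDocument("Alice's notes", frozenset({"sp:alice"})),
]


def visible_titles(user, index=INDEX):
    """Return only the documents whose stored permissions match the user."""
    identities = IDENTITIES.get(user, set())
    return [doc.title for doc in index if doc.allowed & identities]


if __name__ == "__main__":
    print(visible_titles("alice"))
    print(visible_titles("bob"))
```

Because the permissions already sit in the index, trimming does not require a call back to the source repository for each result.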

Reporting and Analytics

3rd party analytics integration - The Coveo analytics database allows the use of third-party reporting tools for more complex or custom reporting. An administrator can also easily configure the search interfaces to integrate third-party Web Analytics systems such as Google Analytics.

Advanced Query Analytics - Captures data on all user interactions with the search interfaces, including result click-through and the use of different search UI functions. The reporting interface allows administrators to analyze the captured data and to elevate the most popular results, or select the correct result, for given queries.

Query & Indexing Logs - Comprehensive reports and statistics with graphical views on system status, queries, content, history, etc. A live console gives administrators a real-time view of what is going on in the system.

Text Analytics

Configurable Text Analytics - An administrator can configure a workflow that will create new metadata based on content analysis, rules and context, such as Themes, Named-entities, Regular Expressions.

Incremental updates - An administrator can configure update schedules to capture recent changes in the index.

Interactive Fine Tuning - Extraction parameters, normalization and blacklisting can be refined and metadata regenerated without re-indexing the full documents set.

Named Entity Extraction - Entities such as persons, locations, and organizations are automatically extracted from indexed content. Additional entities can be configured in the system.

Plug-ins - Additional, 3rd party, plugins can be added to the Text Analytics workflow. For example, domain/organization specific taxonomies can be used in the process.

Rule-based Extraction - Configurable rules can be used to add specific metadata to documents.

Theme Extraction - Themes are Topics and Concepts automatically extracted from indexed content.
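
Rule-based extraction with regular expressions can be sketched in a few lines of Python. This is a generic illustration, not Coveo's text analytics workflow. The rule names, the patterns, and the sample text are assumptions.

```python
import re

# Hypothetical rules: metadata field name -> pattern to look for.
RULES = {
    "case_id": re.compile(r"\bCASE-\d{4,}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def extract_metadata(text, rules=RULES):
    """Return new metadata fields built from the rule matches in the text."""
    metadata = {}
    for field_name, pattern in rules.items():
        # De-duplicate and sort so the output is stable.
        matches = sorted(set(pattern.findall(text)))
        if matches:
            metadata[field_name] = matches
    return metadata


if __name__ == "__main__":
    sample = "See CASE-10234 and CASE-10234; contact jane.doe@example.com."
    print(extract_metadata(sample))
```

The extracted values could then be stored as metadata on the document and used for facets or filtering.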

Next time, I will complete describing Coveo products with Coveo for service and support.

Monday, March 11, 2013

Search Applications - Coveo - Advanced Enterprise Search - Part 2

Yesterday, I mentioned that Coveo offers three products - Coveo for advanced enterprise search, Coveo for advanced website search, and Coveo for service and support. I presented some features of the Coveo for advanced enterprise search product.

Today, I will complete presenting this product.

User Interface

Coveo InsightBox - different field values are suggested while a user types a query.

Facets: AND/OR mode - for specific facets, multi-selection can be switched from OR (default) to AND.

Facets: Computed Fields - computations can be applied to specific facets: sums, averages, minimums, and maximums.

Facets: Search-as-you-type - search-as-you-type on all unique values of specified facets.

Facets: Sort-by - facet values can be sorted based on their label, count or computed value.

Mobile UI - user interface compatible with iOS browsers.

Relevance Ranking - results are ordered by default based on user profile, social and other context. This can be easily tuned and configured by an administrator.

Result Sort-by any Field - results can be ordered by any field. This is configured by the administrator.

Secure, Federated search - log in once in Coveo and search, navigate and consolidate dozens of different data sources simultaneously. Trim results based on user permissions.

Sort-by any Field - results can be ordered by any field, as configured by the UI administrator.

Faceted Search and Navigation - relevant metadata and fields can be used to populate facets within a result set. User can select multiple facet values dynamically and get instant changes in the result list.
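
The AND/OR facet mode and computed fields described above can be sketched in Python as follows. The product data, field names, and function names are hypothetical. This is an illustration of the general idea and not how Coveo implements facets.

```python
from statistics import mean

# Hypothetical result set with one multi-valued facet field ("tags").
PRODUCTS = [
    {"name": "Trail shoe", "tags": {"outdoor", "running"}, "price": 120.0},
    {"name": "Road shoe", "tags": {"running"}, "price": 90.0},
    {"name": "Tent", "tags": {"outdoor"}, "price": 300.0},
]


def filter_by_facet(items, field, selected, mode="OR"):
    """Keep items matching any (OR) or all (AND) of the selected values."""
    selected = set(selected)
    if not selected:
        return list(items)
    if mode == "AND":
        return [item for item in items if selected <= item[field]]
    return [item for item in items if selected & item[field]]


def computed_facet(items, field, value_field):
    """Count, sum, average, min and max of value_field per facet value."""
    groups = {}
    for item in items:
        for value in item[field]:
            groups.setdefault(value, []).append(item[value_field])
    return {
        value: {
            "count": len(nums),
            "sum": sum(nums),
            "avg": mean(nums),
            "min": min(nums),
            "max": max(nums),
        }
        for value, nums in groups.items()
    }


if __name__ == "__main__":
    both = ["outdoor", "running"]
    print([p["name"] for p in filter_by_facet(PRODUCTS, "tags", both, "OR")])
    print([p["name"] for p in filter_by_facet(PRODUCTS, "tags", both, "AND")])
    print(computed_facet(PRODUCTS, "tags", "price"))
```

In OR mode each extra selection widens the result list, while in AND mode each extra selection narrows it.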

Advanced User Interface

Export Results to Excel - results can be downloaded and opened in Excel. An administrator can select which metadata to include in the export.

Floating Searchbar - Windows users can activate a floating Search bar and search Coveo without the need to start a Web browser.

Outlook Sidebar - Outlook users get contextual results based on their context and selection. They can also search for emails, files, people and SharePoint without leaving Outlook.

Tagging - users have the ability to add custom tags and annotations to results. Tags are searchable, are applied in the index in real-time and are available to other users.

Widgets - an administrator can configure widgets to display results in advanced visual representation.

Windows Desktop Indexing - Windows users can search their local files and email archives. A desktop agent is required to capture content and synchronize it with the centralized Coveo index.

Relevance

Configurable Ranking - an administrator can assign weights to more than 20 ranking attributes, such as term proximity, TF-IDF, dates, terms in title, and content reputation.

Query Correction - the spelling of the query is checked against the index content in order to suggest proper spelling even for words that are not normally part of general dictionaries (for example, internal project codes, people's names, etc.).

Query Ranking Expressions (QRE) - for each UI, an administrator can configure specific ranking rules, based on context and result set. This is an easy and flexible way to promote content based on profile attributes of the current user, such as locations, history, languages, roles.

Stemming - variations of a keyword with a similar basic meaning are treated as synonyms, broadening the search when required.

Thesaurus - an administrator can create a thesaurus and link it to the query. Thesaurus can be created from scratch or imported from existing enterprise content.

Top Results - an administrator can assign specific results to appear at the top of the list for specific queries.

Collaborative/Social ranking - click-through data and manual document ratings are used in relevance calculation. This is automatically shared among colleagues based on their social proximity.
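
Configurable ranking and top results can be pictured as a weighted sum of ranking attributes, with selected results pinned ahead of the scored list. The Python sketch below shows that idea under stated assumptions. The attribute names, the weights, the signal values, and the function names are invented for illustration, and the actual Coveo ranking formula is not described in this post.

```python
# Hypothetical administrator-assigned weights for three ranking attributes.
WEIGHTS = {"tfidf": 0.5, "term_in_title": 0.3, "recency": 0.2}

# Hypothetical results, each with per-attribute signals between 0 and 1.
RESULTS = [
    {"id": "faq", "signals": {"tfidf": 0.4, "term_in_title": 1.0, "recency": 0.1}},
    {"id": "manual", "signals": {"tfidf": 0.9, "term_in_title": 0.0, "recency": 0.5}},
    {"id": "memo", "signals": {"tfidf": 0.2, "term_in_title": 0.0, "recency": 1.0}},
]


def score(signals, weights=WEIGHTS):
    """Weighted sum of the ranking attributes; missing signals count as 0."""
    return sum(weight * signals.get(name, 0.0) for name, weight in weights.items())


def rank(results, weights=WEIGHTS, pinned=()):
    """Order results by score, with pinned ids placed first in the given order."""
    by_id = {r["id"]: r for r in results}
    top = [by_id[i] for i in pinned if i in by_id]
    rest = [r for r in results if r["id"] not in pinned]
    rest.sort(key=lambda r: score(r["signals"], weights), reverse=True)
    return [r["id"] for r in top + rest]


if __name__ == "__main__":
    print(rank(RESULTS))
    print(rank(RESULTS, pinned=["memo"]))
```

Changing a weight changes the order without touching the documents, which is what makes this kind of ranking tunable by an administrator.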

Administration and Configuration

Admin Roles - the main system administrator can delegate partial administration permissions based on roles (interface designer, system administrator, collection administrator, etc.).

APIs - administration APIs allow custom development and integration of the administration functions into external systems.

Audience Management - administrator can define multiple audiences and assign specific UIs to them.

Installation Kit - everything is installed and initially configured using an install kit for easy deployment.

Interface Editor - for each User Interface, an administrator can configure Result templates, CSS, Facets, Sort-keys and other parameters.

Monitoring/Email alerts - different system conditions are monitored and email alerts can be sent to report important system events, such as disk space running low.

Web-based Administration UI - simple, Web-based UI for easy administration.

I will describe Coveo for advanced website search in my next post.