Showing posts with label Search Applications. Show all posts

Wednesday, September 30, 2015

conceptClassifier for SharePoint

conceptClassifier for SharePoint is the enterprise automatic semantic metadata generation and taxonomy management solution. It is based on an open architecture with all APIs based on XML and Web Services. conceptClassifier for SharePoint supports all versions of SharePoint, SharePoint Online, Office 365, and OneDrive for Business.

Incorporating industry recognized Smart Content Framework™ and intelligent metadata enabled solutions, conceptClassifier for SharePoint provides a complete solution to manage unstructured and semi-structured data regardless of where it resides.

Utilizing unique compound term processing technology, conceptClassifier for SharePoint natively integrates with SharePoint and solves a variety of business challenges through concept identification capabilities.

Key Features
  • Tag content across the enterprise with conceptual metadata leveraging valuable legacy data.
  • Classify enterprise content with consistent, meaningful conceptual metadata, preventing incorrect meta tagging.
  • Migrate tagged and classified content intelligently to locations both within and outside of SharePoint.
  • Retrieve precise information from across the enterprise when and how it is needed.
  • Protect sensitive information from exposure with intelligent tagging.
  • Preserve information in accordance with records guidelines by identifying documents of record and eliminating inconsistent end user tagging.
Components

conceptClassifier

Both automated and manual classification are supported, to one or more term sets within the Term Store and across content hubs.

conceptTaxonomyManager

This is an advanced enterprise class, easy-to-use taxonomy and term set development and management tool. It integrates natively with the SharePoint Term Store reading and writing in real-time ensuring that the taxonomy/term set definition is maintained in only one place, the SharePoint Term Store. Designed for use by Subject Matter Experts, the Term Store and/or taxonomy is easily developed, tested, and refined.

Term Set Migration tools are also a component of conceptTaxonomyManager. They enable term sets to be developed on one server (e.g. an on-premise server) and then migrated incrementally to another server (e.g. an Office 365 server) while preserving all GUIDs, which is a key requirement in migration.
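The idea of an incremental migration that preserves GUIDs can be pictured with a small sketch. This is an illustration only, not how conceptTaxonomyManager works internally: the `Term` class and `migrate_terms` function are hypothetical names, and a term set is modeled as a plain dictionary keyed by GUID.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    guid: uuid.UUID  # identity of the term; must survive migration unchanged
    label: str


def migrate_terms(source, target):
    """Copy new or changed terms from source into target, keyed by GUID.

    Both arguments are dicts mapping GUID -> Term (a hypothetical model).
    Terms already present and unchanged in target are left alone, so the
    function can be re-run as the source term set grows.
    Returns the list of GUIDs that were added or updated.
    """
    changed = []
    for guid, term in source.items():
        if target.get(guid) != term:
            target[guid] = term  # the same GUID is used on both sides
            changed.append(guid)
    return changed


# Usage: two incremental runs from an "on-premise" set to a "cloud" set.
on_premise = {}
cloud = {}

first = Term(uuid.uuid4(), "Contracts")
on_premise[first.guid] = first
migrate_terms(on_premise, cloud)

second = Term(uuid.uuid4(), "Invoices")
on_premise[second.guid] = second
# Only the newly added term is copied on the second run.
newly_migrated = migrate_terms(on_premise, cloud)
```

Because terms are matched by GUID rather than by label, anything that already refers to a term by its GUID would still resolve after the move.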

conceptSearch Compound Term Processing Engine

Licensed for the sole use of building and refining the taxonomy/term set, the engine provides automatic semantic metadata generation that extracts multi-word terms or concepts along with keywords and acronyms. conceptSearch is an enterprise search engine and is sold as a separate product.

SharePoint Feature Set

Provides SharePoint integration and an additional multi-value pick-list browse taxonomy control enabling users to combine free text and taxonomy browse searching.

Products

These are base platform and optional products that are needed to solve your particular business process challenge and leverage your SharePoint investment.

Search Engine Integration

This functionality is provided via conceptClassifier for SharePoint to integrate with any Microsoft search engine being used within SharePoint. conceptClassifier for SharePoint also supports integration with most non-SharePoint search engines and can perform on the fly classification with search engines calling the classify API.

Search engine support includes SharePoint, the former FAST products, Solr, Google Search Appliance, Autonomy, and IBM Vivisimo. If the FAST Pipeline Stage is required, this is sold as a separate product.

Intelligent Document Classification

This functionality is provided via conceptClassifier for SharePoint, to classify documents based upon concepts and multi-word terms that form a concept. Automatic and/or manual classification is included.

Content managers with the appropriate security can also classify content in real time. Content can be classified not only from within SharePoint but also from diverse repositories including File Shares, Exchange Public Folders, and websites. All content can be classified on the fly and classified to one or more taxonomies.

Taxonomy Management and Term Store Integration

With the Term Store functionality in SharePoint, organizations can develop a metadata model using out-of-the-box SharePoint capabilities. conceptClassifier for SharePoint provides native integration with the term store and the Managed Metadata Service application, where changes in the term store will be automatically available in the taxonomy component, and any changes in the taxonomy component will be immediately available in the term store.

A compelling advantage is the ability to consistently apply semantic metadata to content and auto-classify it to the Term Store metadata model. This solves the challenges of applying the metadata to a large number of documents and eliminates the need for end users to correctly tag content. Utilizing the taxonomy component, the taxonomies can be tested, validated, and managed, which is not a function provided by SharePoint.

Intelligent Migration

Using conceptClassifier for SharePoint, an intelligent approach to migration can be achieved. As content is migrated, it is analyzed for organizationally defined descriptors and vocabularies, which will automatically classify the content to taxonomies, or optionally the SharePoint Term Store, and automatically apply organizationally defined workflows to process the content to the appropriate repository for review and disposition.

Intelligent Records Management

The ability to intelligently identify, tag, and route documents of record to either a staging library and/or a records management solution is a key component to driving and managing an effective information governance strategy. Taxonomy management, automatic declaration of documents of record, auto-classification, and semantic metadata generation are provided via conceptClassifier for SharePoint and conceptTaxonomyWorkflow.

Data Privacy

Fully customizable to identify unique or industry standard descriptors, content is automatically meta-tagged and classified to the appropriate node(s) in the taxonomy based upon the presence of the descriptors, phrases, or keywords from within the content.

Once tagged and classified the content can be managed in accordance with regulatory or government guidelines. The identification of potential information security exposures includes the proactive identification and protection of unknown privacy exposures before they occur, as well as monitoring in real time organizationally defined vocabulary and descriptors in content as it is created or ingested. Taxonomy, classification, and metadata generation are provided via conceptClassifier for SharePoint.

eDiscovery, Litigation Support, and FOIA Requests

Taxonomy, classification, and metadata generation are provided via conceptClassifier for SharePoint. This is highly useful when relevance, identification of related concepts, and vocabulary normalization are required to reduce time and improve the quality of search results.

Text Analytics

Taxonomy, classification, and metadata generation are provided via conceptClassifier for SharePoint. A third party business intelligence or reporting tool is required to view the data in the desired format. This is useful to cleanse the data sources before using text analytics: it removes content noise and irrelevant content, and identifies any unknown privacy exposures or records that were never processed.

Social Networking

Taxonomy, classification, and metadata generation are provided via conceptClassifier for SharePoint. Integration with social networking tools can be accomplished if the tools are available in .NET or via SharePoint functionality. This is useful to provide structure to social networking applications and provide significantly more granularity in relevant information being retrieved.

Business Process Workflow

conceptTaxonomyWorkflow serves as a strategic tool managing migration activities and content type application across multiple SharePoint and non-SharePoint farms and is platform agnostic. This add-on component delivers value specifically in migration, data privacy, and records management, or in any application or business process that requires workflow capabilities.

conceptTaxonomyWorkflow is required to apply action on a document, optionally automatically apply a content type and route to the appropriate repository for disposition.

An additional add-on product, conceptContentTypeUpdater, is deployed at the site collection level and can be used by site administrators. It changes the SharePoint content type based on results from pre-defined workflows and is used only in the SharePoint environment.

Where does conceptClassifier for SharePoint fill the gaps?
  • SharePoint has no ability to automatically create and store classification metadata.
  • SharePoint has no taxonomy management tools to manage, test, and validate taxonomies based on the Term Store.
  • SharePoint has no auto-classification capabilities.
  • SharePoint has no ability to generate semantic metadata and surface it to search engines to improve search results.
  • SharePoint has no ability to automatically tag content with vocabulary or retention codes for records management.
  • SharePoint has no ability to automatically update the content type for records management or privacy protection and route to the appropriate repository.
  • SharePoint has no ability to provide intelligent migration capabilities based on the semantic metadata within content, identify previously undeclared documents of record, unidentified privacy exposures, or information that should be archived or deleted.
  • SharePoint has no ability to provide granular and structured identification of people, content recommendations, and organizational knowledge assets.
Leveraging Your SharePoint Investment

When evaluating a technology purchase and the on-going investment required to deploy, customize, and maintain, the costs can scale quickly. Because conceptClassifier for SharePoint is an enterprise infrastructure component, you can leverage your investment through:
  • Native real-time read/write with the term store.
  • Ability to implement workflow and automatic content type updating.
  • Reduce IT Staff requirements to support diverse applications.
  • Reduce costs associated with the purchase of multiple, stand-alone applications.
  • Deploy once, utilize multiple times.
  • Rapidly integrate with any SharePoint or .NET application.
  • Used by Subject Matter Experts, not IT staff, does not require outside resources to manage and maintain.
  • Eliminate unproductive and manual end user tagging and the support required by business units and IT.
  • Reduce hardware expansion costs due to scalability and performance features.
  • Deployable as an on-premise, cloud, or hybrid solution.
Leveraging Your Business Investment

The real value of your investment includes both technology and the demonstrable ROI that can be generated from improving business processes. conceptClassifier for SharePoint has been deployed to solve individual or multiple challenges including:
  • Enables concept based searching regardless of search engine.
  • Reduces organizational costs associated with data exposures, remediation, litigation, fines and sanctions.
  • Eliminates manual metadata tagging and human inconsistencies that prohibit accurate metadata generation.
  • Prevents the portability and electronic transmission of secured assets.
  • Assists in the migration of content by identifying records as well as content that should have been archived, contains sensitive information, or should be deleted.
  • Protects record integrity throughout the individual document lifecycle.
  • Creates virtual centralization through the ability to link disparate on-premise and off-premise content repositories.
  • Ensures compliance with industry and government mandates enabling rapid implementation to address regulatory changes.
Benefits

The combination of the Smart Content Framework™, conceptClassifier for SharePoint, and the deployment of intelligent metadata enabled solutions results in a comprehensive and complete approach to SharePoint enterprise metadata management. Specific benefits are:
  • Eliminate manual tagging.
  • Improve enterprise search.
  • Facilitate records management.
  • Detect and automatically secure unknown privacy exposures.
  • Intelligently migrate content.
  • Enhance eDiscovery, litigation support, and FOIA requests.
  • Enable text analytics.
  • Provide structure to Enterprise 2.0.

Tuesday, June 30, 2015

Search Applications - Concept Searching

Concept Searching Limited is a software company which specializes in information retrieval software. It has products for Enterprise search, Taxonomy Management and Statistical classification.

Concept Searching Technology Platform

The Concept Searching Technology Platform is based on our Smart Content Framework™ for information governance, and incorporates best practices for developing an enterprise framework to mitigate risk, automate processes, manage information, protect privacy, and address compliance issues. Underlying the framework is the technology to:
  • Automatically generate semantic metadata using Compound Term Processing.
  • Auto-classify content from diverse repositories.
  • Easily develop, deploy, and manage taxonomies.
The framework is being used to enable intelligent metadata enabled solutions to improve search, records management, enterprise metadata management, text analytics, migration, enterprise social networking, and data security.

Features
  • Compound terms are extracted when content is indexed from internal or external content sources, enabling the delivery of greater precision of relevant content at the top of search results.
  • Relevance ranking displays extracts from the documents based on the query.
  • Search refinement delivers to the end user highly correlated concepts that may be used to refine the search.
  • Taxonomy browse capabilities are standard.
  • Documents can be classified into one or more taxonomy nodes, enhancing the precision of documents returned.
  • In addition to static summaries, Dynamic Summarization, a modified weighting system, can be applied that will identify in real-time short extracts that are most relevant to the user’s query.
  • Related Topics will return results based on the conceptual meaning of the search terms used, using the ability to generate compound terms in a search. For example, ‘triple’ is a single word term but ‘triple heart bypass’ is a compound term that provides a more granular meaning.
  • Based on previous queries, or on extracts retrieved, end users can use the text to perform additional searches to retrieve more granular results.
  • The product is based on an open architecture with all APIs based on XML and Web Services. Transparent access to system internals including the statistical profile of terms is standard.
  • Highly scalable.
  • High performance specifically with classification occurring in real time.
  • Easily customized to achieve your organization’s objectives.
Base Components in the Concept Searching Technology Framework

Conceptual Search Platform

conceptSearch is Concept Searching’s enterprise search product and a key component in the Concept Searching Technology Platform. It is a unique, language independent technology and is the first content retrieval solution to integrate relevance ranking based on the Bayesian Inference Probabilistic Model and concept identification based on Shannon’s Information Theory.

Unlike other enterprise search engines that require significant customization with marginal results, conceptSearch is delivered with an out-of-the-box application that demonstrates a simple search interface and indexing facilities for internal content, web sites, file systems, and XML documents. Application developers experience a minimal learning curve and the organization can look forward to a rapid return on investment.

Because of the innovative technology, conceptSearch delivers both high precision and high recall. Precision and recall are the two key performance measures for information retrieval. Precision is the retrieval of only those items that are relevant to the query. Recall is the retrieval of all items that are relevant to the query. Yet most information retrieval technologies are less than 22% accurate for both precision and recall. The ideal goal is to have these features balanced. Compound term processing has the ability to increase precision with no loss of recall.
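The two measures can be made concrete with a short calculation. The sketch below uses the standard set-based definitions (precision is the share of retrieved items that are relevant, recall is the share of relevant items that were retrieved). The function name and the sample document ids are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: ids the search engine returned
    relevant:  ids a human judged relevant to the query
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant items that were actually returned
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 documents returned, 3 documents are truly relevant.
retrieved = ["doc1", "doc2", "doc3", "doc4"]
relevant = ["doc2", "doc4", "doc7"]
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Returning every document in the index would push recall to 1.0 while precision collapses, which is why the two measures are usually reported together.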

conceptSearch is particularly important for organizations that need sophisticated search and retrieval solutions. By weighting multi-word phrases, instead of single words, or words in proximity, the retrieval experience is more accurate and relevant. The ability for the search engine to identify concepts enables organizations to improve the search experience for a variety of business requirements.
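To see why multi-word phrases carry more specific meaning than single words, consider a simple word n-gram count. This is only an illustration of the general idea, assuming plain n-gram counting. It is not Concept Searching's Compound Term Processing, and the function name and sample text are invented for the example.

```python
import re
from collections import Counter


def ngram_counts(text, max_len=3):
    """Count every run of 1..max_len consecutive words in the text.

    Punctuation is ignored, so runs may span sentence boundaries;
    that is acceptable for this small illustration.
    """
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts


text = (
    "The patient had a triple heart bypass. "
    "Recovery after a triple heart bypass takes weeks. "
    "The team scored a triple in the final game."
)
counts = ngram_counts(text)
print(counts["triple"])               # 3
print(counts["triple heart bypass"])  # 2
```

The single word "triple" matches all three sentences, including the unrelated sports one, while the three-word term matches only the two medical sentences.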

Search Engine Integration

This functionality is provided via the Concept Searching Technology platform to integrate with any search engine. The Concept Searching Technology platform can perform on-the-fly classification with search engines calling the classify API. Search engine support includes SharePoint, the former FAST products, Office 365 Search, Solr, Google Search Appliance, Autonomy, and IBM Vivisimo. If the FAST Pipeline Stage is required, this is sold as a separate product.

conceptClassifier

conceptClassifier is a leading-edge rules based categorization module providing control of rules-based descriptors unique to an organization. conceptClassifier delivers a categorization descriptor table, which is easy to implement and maintain, through which all rules and terms can be defined and managed. This approach eliminates the error-prone results of ‘training’ algorithms typically found in other text retrieval solutions and enables human intervention to effectively tune classification results.

Functionality is provided via the Concept Searching Technology platform, to classify documents based upon concepts and multi-word terms that form a concept. Automatic and/or manual classification is included. Knowledge workers with the appropriate security rights can also classify content in real time. Content can be classified from diverse repositories including SharePoint, Office 365, file shares, Exchange public folders, and websites. All content can be classified on the fly and classified to one or more taxonomies.
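A rules-based descriptor table of the kind described above can be sketched as a mapping from taxonomy nodes to weighted phrases. The table contents, the weights, the threshold, and the `classify` function are all hypothetical; the real conceptClassifier rule format is not shown in this post.

```python
# Hypothetical descriptor table: taxonomy node -> {phrase: weight}.
DESCRIPTORS = {
    "Cardiology": {"triple heart bypass": 3.0, "cardiac": 1.0},
    "Finance": {"invoice": 1.0, "purchase order": 2.0},
}


def classify(text, descriptors=DESCRIPTORS, threshold=2.0):
    """Return {node: score} for every node whose matched weights reach threshold.

    Matching is a plain case-insensitive substring test, which keeps the
    sketch short; a document may be classified to more than one node.
    """
    lowered = text.lower()
    scores = {}
    for node, phrases in descriptors.items():
        score = sum(w for phrase, w in phrases.items() if phrase in lowered)
        if score >= threshold:
            scores[node] = score
    return scores


doc = "Cardiac notes: patient scheduled for a triple heart bypass."
print(classify(doc))  # {'Cardiology': 4.0}
```

Because the rules sit in a table rather than in a trained model, a subject matter expert can tune a result by editing a phrase or a weight and re-running the classification.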

conceptTaxonomyManager

This is an advanced enterprise class, easy-to-use taxonomy development and management tool, still unique in the industry. Developed on the premise that a taxonomy solution should be used by business professionals, and not the IT team or librarians, the end result is a highly interactive and powerful tool that has been proven to reduce taxonomy development by up to 80% (client source data).

conceptTaxonomyManager is simple to use, has an intuitive user interface designed for Subject Matter Experts, and does not require IT or Information Scientist expertise to build, maintain, and validate taxonomies for the enterprise. conceptTaxonomyManager has the capability to automatically group unstructured content together based on an understanding of the concepts and ideas that share mutual attributes while separating dissimilar concepts.

This approach is instrumental in delivering relevant information via the taxonomy structure as well as using the semantic metadata in enterprise search to reduce time spent finding information, increase relevancy and accuracy of the search results, and enable the re-use and re-purposing of content. Using one or more taxonomies, unstructured content can be leveraged to improve any application that uses metadata. This flexibility extends to records management, information security, migration, text analytics, and collaboration.

Intelligent Migration

Using the Concept Searching Technology platform an intelligent approach to migration can be achieved. As content is migrated it is analyzed for organizationally defined descriptors and vocabularies, which will automatically classify the content to taxonomies, or in the SharePoint environment, the SharePoint Term Store, and automatically apply organizationally defined workflows to process the content to the appropriate repository for review and disposition.

conceptSQL

This product provides the ability to define a document structure based on information held in a Microsoft SQL Server. A document can include any number of text and metadata fields and can span multiple tables if required. conceptSQL supports SQL 2005, 2008, and 2012. A powerful but easy to use configuration tool is supplied eliminating the need for any programming. Templates are provided for out of the box support for Documentum, Hummingbird, and Worksite/Interwoven DMS.

SharePoint Feature Set

The SharePoint Feature Set includes the following components: farm solution with feature sets, Term Store integration, taxonomy tree control for editing, refinement panel integration, event handlers for notification of changes, management of classification status column, web service advanced functionality (implement system update or preserve GUIDS), automated site column creation.

Intelligent Records Management

The ability to intelligently identify, tag, and route documents of record to either a staging library and/or a records management solution is a key component in driving and managing an effective information governance strategy. Taxonomy management, automatic declaration of documents of record, auto-classification, and semantic metadata generation are provided via the Concept Searching Technology platform and conceptTaxonomyWorkflow.

Data Privacy

Fully customizable to identify unique or industry standard descriptors, content is automatically meta-tagged and classified to the appropriate node(s) in the taxonomy based upon the presence of the descriptors, phrases, or keywords from within the content. Once tagged and classified the content can be managed in accordance with regulatory or government guidelines.

The identification of potential information security exposures includes the proactive identification and protection of unknown privacy exposures before they occur, as well as real-time monitoring of organizationally defined vocabulary and descriptors in content as it is created or ingested. Taxonomy, classification, and metadata generation are provided via the Concept Searching Technology platform and conceptTaxonomyWorkflow.

eDiscovery and Litigation Support

Taxonomy, classification, and metadata generation are provided via the Concept Searching Technology platform. This is highly useful when relevance, identification of related concepts, and vocabulary normalization are required to reduce time and improve the quality of search results.

Text Analytics

Taxonomy, classification, and metadata generation are provided via the Concept Searching Technology platform. A third party business intelligence or reporting tool is required to view the data in the desired format. This is useful to cleanse the data sources before using text analytics: it removes content noise and irrelevant content, and identifies any unknown privacy exposures or records that were never processed.

Social Networking

Taxonomy, classification, and metadata generation are provided via the Concept Searching Technology platform. Integration with social networking tools can be accomplished if the tools are available in .NET or via SharePoint functionality. This is useful to provide structure to social networking applications and provide significantly more granularity in relevant information being retrieved.

Business Process Workflow

conceptTaxonomyWorkflow serves as a strategic tool managing migration activities and content type application across multiple SharePoint and non-SharePoint farms and is platform agnostic. This add-on component delivers value specifically in migration, data privacy, and records management, or in any application or business process that requires workflow capabilities.

conceptTaxonomyWorkflow is required to apply action on a document, optionally automatically apply a content type and route to the appropriate repository for disposition.

Tuesday, December 30, 2014

Latest Applications in Enterprise Search

In my previous post, I described the future of enterprise search. In this post, I will describe a few new search applications that could be interesting.

Concept Searching

Founded in 2002, Concept Searching provides software products that deliver automatic semantic metadata generation, auto-classification, and powerful taxonomy management tools. Concept Searching is the only platform independent statistical metadata generation and classification software company in the world that uses concept extraction and compound term processing to significantly improve access to unstructured information. The Concept Searching Microsoft suite of technologies runs in all versions of SharePoint, Office 365, and OneDrive for Business.

The technologies are being used to improve search outcomes, deploy an enterprise metadata repository, enable effective records management, identify and secure sensitive information, improve governance and compliance, social tagging, collaboration, text analytics, facilitate eDiscovery, and drive intelligent migration.

Concept Searching, developer of the Smart Content Framework™, provides organizations with a method to mitigate risk, automate processes, manage information, protect privacy, and address compliance issues. This infrastructure framework utilizes a set of technologies that encompasses the entire portfolio of unstructured information assets, resulting in increased organizational performance and agility.

Lexalytics, Inc.

Lexalytics provides enterprise and hosted text analytics software to transform unstructured text into structured data. The software extracts entities (people, places, companies, products, etc.), sentiment, quotes, opinions, and themes (generally noun phrases) from text. Text is considered unstructured data which comprises somewhere between 31% and 85% of what is stored in any given enterprise.

Lexalytics is an OEM vendor of text analytics and sentiment analysis technology for social media monitoring, brand management, and voice-of-customer industries. The software uses natural language processing technology to extract the above-mentioned items from social media and forums; the voice of the customer in surveys, emails, and call-center feedback; traditional media; pharmaceutical research and development; internal enterprise documents; and others.

Lexalytics provides a text mining engine that is used by a number of search partners like Coveo, Playence, and Oracle to add additional metadata to their search. This is additional intelligence around "just what do those words actually mean?" In other words, this engine is boosting the value of search by providing more information into the index. This enables other applications, and helps search be "smarter".

MaxxCAT

MaxxCAT provides enterprise search solutions for corporate intranets, web sites, databases, file systems and applications, and other environments that require rapid document retrieval from multiple data sources. The flagship products offered by MaxxCAT are the SB-250 series and the EX-5000 series network search appliances. Also available is a series of cloud-enabled storage appliances.

Basis Technology

Founded in 1995, this software company specializes in applying artificial intelligence techniques to understanding documents written in different languages. Their software enhances parsing tools by classifying the role of words and provides metadata on the role of words to other algorithms. Software from Basis Technology will, for instance, identify the language of an incoming stream of characters and then identify the parts of each sentence like the subject or the direct object.

The company is best known for its Rosette Linguistics Platform, which uses Natural Language Processing techniques to improve information retrieval, text mining, search engines, and other applications. The tool is used by major search engines and translators to create normalized forms of text. Basis Technology software is also used by forensic analysts to search through files for words, tokens, phrases, or numbers that may be important to investigators.

dtSearch

Founded in 1991, this company specializes in text retrieval software. Its current range of software includes products for enterprise desktop search, Intranet/Internet spidering and search, and search engines for developers (SDK) to integrate into other software applications.

LTU technologies

Founded in 1999, this company is in the field of image recognition for commercial and government customers. The company provides technologies for image matching, similarity and color search for integration into applications for mobile, media intelligence and advertisement tracking, ecommerce and stock photography, brand and copyright protection, law enforcement and more.

Sematext Group, Inc.

This company's product SSA (Site Search Analytics) continuously monitors, measures, and improves the search experience. It identifies top queries, problematic zero-hit queries, common misspellings, etc. It measures and compares search relevance and improves conversion rates. It is available on-premises and in the cloud.

Exorbyte

This is a privately held software company which was founded in 2000 in Konstanz, Germany, with an additional office in the United Kingdom (Bristol). The company develops intelligent software for search and analysis of structured and semi-structured data.

Their product MatchMaker is the leading error-tolerant search & match platform for huge master data volumes. The multiple award-winning software technology thinks, searches and finds like a human – but dramatically faster, in much more complex configurations and with no serious data restriction using keys or similar methods. It is available on-premises and in the cloud.

Federal authorities, insurance agencies, ICT firms, and more use this software for identity resolution in diverse, data-intensive business processes such as input management, enterprise search, and data quality. It is easy to customize and integrate.

Inbenta

Founded in 2005, this company provides enterprise semantic search technology based on artificial intelligence and natural language processing. It offers intuitive search solutions and intelligent content support for websites and corporate intranets.

Content Analyst Company

This is a privately held software company which develops concept-aware text analytics software called CAAT, which is licensed to software product companies for use in eDiscovery. In 2013, five CAAT-powered products were named in the Gartner eDiscovery Magic Quadrant Report, and the analyst firm 451 Group referred to CAAT as The Hottest Product in eDiscovery.

Content Analyst's CAAT analytics software is a machine learning system based on latent semantic indexing technology. CAAT provides several text analytics capabilities using both supervised learning and unsupervised learning methods including concept search, categorization, conceptual clustering, email conversation threading, language identification, near-duplicate identification, auto summarization and difference highlighting.

SearchYourCloud

With SearchYourCloud and its patented, federated search technology, a single search request in Outlook simultaneously and transparently searches your email, desktop and all of your cloud storage sources and delivers highly targeted results. You get exactly the information you need with just one query.

Docurated

Docurated aggregates all your documents in one place, turning them into a searchable and customizable database. Docurated will now provide Dropbox integration as well. It accelerates sales in companies looking for fast growth by making the best marketing content readily available to Sales around the world. Docurated works with your existing content stores and uses machine learning to enable your team to find and re-use the most effective content with no manual tagging or uploading.

This is the next generation visual knowledge management platform which solves the information retrieval problem for leading companies like Clorox, Omnicom, Netflix, Weather Channel, and many others. Docurated enables sales, marketing, and technology teams to surface and use the exact chart or slide they need, no matter where it is stored, without slogging through folders and files. Docurated seamlessly integrates with existing folder-based repositories.

Lucene

Apache Lucene is free, open source information retrieval software. It is supported by the Apache Software Foundation and is released under the Apache Software License. While suitable for any application which requires full text indexing and searching capability, Lucene has been widely recognized for its utility in the implementation of Internet search engines and local, single-site searching.

At the core of Lucene's logical architecture is the idea of a document containing fields of text. This flexibility allows Lucene's API to be independent of the file format. Text from PDFs, HTML, Microsoft Word, and OpenDocument documents, as well as many others (except images), can all be indexed as long as their textual information can be extracted.
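The idea of a document as a set of named text fields can be sketched in a few lines of Python. This is a toy illustration only, not Lucene's actual API: the class FieldedIndex, its methods and the sample file names are all hypothetical, and the text is assumed to have been extracted from the original files already.

```python
from collections import defaultdict


class FieldedIndex:
    """Toy inverted index keyed by (field, term); not Lucene's actual API."""

    def __init__(self):
        self.postings = defaultdict(set)  # (field, term) -> set of document ids
        self.docs = {}

    def add(self, doc_id, fields):
        """Index a document given as a dict of field name -> extracted text."""
        self.docs[doc_id] = fields
        for field, text in fields.items():
            for term in text.lower().split():
                self.postings[(field, term)].add(doc_id)

    def search(self, field, term):
        """Return the sorted ids of documents whose field contains the term."""
        return sorted(self.postings.get((field, term.lower()), set()))


index = FieldedIndex()
# The index never sees the file format, only the extracted fields.
index.add("report.pdf", {"title": "Annual report", "body": "Revenue grew this year"})
index.add("memo.docx", {"title": "Staff memo", "body": "Annual leave policy"})

print(index.search("title", "annual"))  # ['report.pdf']
print(index.search("body", "annual"))   # ['memo.docx']
```

Because the index only receives field names and text, the same code handles a PDF and a Word file alike once their text has been extracted.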

These are just a few of the search applications currently on the market. There are many others. Choosing the right application depends on your organization's requirements.

Tuesday, March 25, 2014

Search Applications - Vivisimo

Vivisimo was a privately held technology company that worked on the development of computer search engines. The company's product, Velocity, provided federated search and document clustering. Vivisimo's public web search engine Clusty was a metasearch engine with document clustering; it was sold to Yippy, Inc. in 2010.

The company was acquired by IBM in 2012, and the Vivisimo Velocity Platform is now IBM InfoSphere Data Explorer. It stays true to its heritage of providing federated navigation, discovery and search over a broad range of enterprise content, covering data sources and types both inside and outside an organization.

In addition to the core indexing, discovery, navigation and search engine, the software includes a framework for developing information-rich applications that deliver a comprehensive, contextually-relevant view of any topic for business users, data scientists, and a variety of targeted business functions.

InfoSphere Data Explorer solutions improve return on all types of information, including structured data in databases and data warehouses, unstructured content such as documents and web pages, and semi-structured information such as XML.

InfoSphere Data Explorer provides analytics on text and metadata that can be accessed through its search capabilities. Its focus on scalable but secure search is part of why it became one of the leaders in enterprise search. The software’s security features are critical, as organizations do not want to make it faster for unauthorized users to access information.

Also key is the platform’s flexibility at integrating sources across the enterprise. It also supports mobile technologies such as smart phones to make it simpler to get to and access information from any platform.

Features and benefits

1. Secure, federated discovery, navigation and search over a broad range of applications, data sources and data formats.
  • Provides access to data stored in a wide variety of applications and data sources, both inside and outside the enterprise, including: content management, customer relationship management, supply chain management, email, relational database management systems, web pages, networked file systems, data warehouses, Hadoop-based data stores, columnar databases, cloud and external web services.
  • Includes federated access to non-indexed systems such as premium information services, supplier or partner portals and legacy applications through the InfoSphere Data Explorer Query Routing feature.
  • Relevance model accommodates diverse document sizes and formats while delivering more consistent search and navigation results. Relevance parameters can be tuned by the system administrator.
  • Security framework provides user authentication and observes and enforces the access permissions of each item at the document, section, row and field level to ensure that users can only view information they are authorized to view in the source systems.
  • Provides rich analytics and natural language processing capabilities such as clustering, categorization, entity and metadata extraction, faceted navigation, conceptual search, name matching and document de-duplication.
2. Rapid development and deployment framework to enable creation of information-rich applications that deliver a comprehensive view of any topic.
  • InfoSphere Data Explorer Application Builder enables rapid deployment of information-centric applications that combine information and analytics from multiple sources for a comprehensive, contextually-relevant view of any topic, such as a customer, product or physical asset.
  • Widget-based framework enables users to select the information sources and create a personalized view of information needed to perform their jobs.
  • Entity pages enable presentation of information and analytics about people, customers, products and any other topic or entity from multiple sources in a single view.
  • Activity Feed enables users to "follow" any topics such as a person, company or subject and receive the most current information, as well as post comments and view comments posted by other users.
  • Comprehensive set of Application Programming Interfaces (APIs) enables programmatic access to key capabilities as well as rapid application development and deployment options.
3. Distributed, highly scalable architecture to support large-scale deployments and big data projects.
  • Compact, position-based index structure includes features such as rapid refresh, real-time searching and field-level updates.
  • Updates can be written to indices without taking them offline or re-writing the entire index, and are instantly available for searching.
  • Provides highly elastic, fault-tolerant, vertical and horizontal scalability, master-master replication and "shared nothing" deployment.
4. Flexible data fusion capabilities to enable presentation of information from multiple sources.
  • Information from multiple sources can be combined into "virtual documents".
  • Large documents can be automatically divided into separate objects or sub-documents that remain related to a master document for easier navigation and comprehension by users.
  • Enables creation of dynamic "entity pages" that allow users to browse a comprehensive, 360-degree view of a customer, product or other item.
5. Collaboration features to support information-sharing and improved re-use of information throughout the organization.
  • Users can tag, rate and comment on information.
  • Tags, comments and ratings can be used in searching, navigation and relevance ranking to help users find the most relevant and important information.
  • Users can create virtual folders to organize content for future use and optionally share folders with other users.
  • Navigation and search results can return pointers to people to enable location of expertise within an organization and encourage collaboration.
  • Shared Spaces allow users to collaborate about items and topics that appear in their individualized views.

Thursday, May 9, 2013

Search Engine Technology

Modern web search engines are highly intricate software systems which employ technology that has evolved over the years. There are a few categories of search engines, each suited to specific search needs.

These include web search engines (e.g. Google), database or structured data search engines (e.g. Dieselpoint), and mixed search engines or enterprise search.

The more prevalent search engines such as Google and Yahoo! utilize hundreds of thousands of computers to process trillions of web pages in order to return fairly well-aimed results. Due to this high volume of queries and text processing, the software is required to run in a highly distributed environment with a high degree of redundancy.

Search Engine Categories

Web search engines

These are search engines that are specifically designed for searching web pages. They were developed to facilitate searching through a large number of web pages. They are engineered to follow a multi-stage process: crawling the vast stock of pages to skim the key terms from their contents, indexing those terms in a semi-structured form (for example, a database), and resolving user queries to return mostly relevant results as links to the skimmed documents or pages in the inventory.

Crawl

In the case of a wholly textual search, the first step in classifying web pages is to find an "index item" that might relate expressly to the "search term". Most search engines use sophisticated algorithms to "decide" when to revisit a particular page, to check its relevance. These algorithms range from constant visit-interval with higher priority for more frequently changing pages to adaptive visit-interval based on several criteria such as frequency of change, popularity, and overall quality of site. The speed of the web server running the page as well as resource constraints like amount of hardware or bandwidth also figure in.
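An adaptive visit interval can be illustrated with a very small rule: revisit sooner when a page was found changed, and later when it was not. The function name, the halving and doubling factors and the one-hour and 30-day bounds below are assumptions chosen for illustration; real crawlers weigh more criteria, such as popularity and site quality.

```python
def next_visit_interval(current_hours, changed, min_hours=1.0, max_hours=720.0):
    """Halve the interval if the page changed, otherwise double it, within bounds."""
    proposed = current_hours / 2 if changed else current_hours * 2
    return max(min_hours, min(max_hours, proposed))


# A page checked daily that changed twice in a row and then stayed the same.
interval = 24.0
for changed in [True, True, False]:
    interval = next_visit_interval(interval, changed)

print(interval)  # 12.0
```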

Link map

The pages that are discovered by web crawls are often distributed and fed into another computer that creates a map of the uncovered resources. This map looks a little like a graph, on which pages are represented as small nodes connected by the links between them. The data is stored in multiple data structures that allow quick access by algorithms that compute a popularity score for each page based on how many links point to it. This is how a search on a topic such as diagnosing psychosis can surface the most widely linked resources first.
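The simplest popularity score of this kind is a count of incoming links. The sketch below assumes a link map held as a plain dictionary of page to outgoing links, with made-up page names; production engines use more elaborate link analysis than a raw count.

```python
from collections import Counter


def inlink_counts(link_map):
    """Count how many distinct pages link to each page (self-links ignored)."""
    counts = Counter()
    for source, targets in link_map.items():
        for target in set(targets):  # repeated links from one page count once
            if target != source:
                counts[target] += 1
    return counts


link_map = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example", "c.example"],
}
counts = inlink_counts(link_map)
# Most linked-to pages first; ties broken alphabetically.
ranked = sorted(counts, key=lambda page: (-counts[page], page))

print(ranked)  # ['c.example', 'a.example', 'b.example']
```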

Database Search Engines

Searching for text-based content in databases presents a few special challenges, from which a number of specialized search engines developed. Databases are slow when solving complex queries (with multiple logical or string matching arguments). Databases allow pseudo-logical queries which full-text searches do not use. There is no crawling necessary for a database since the data is already structured. However, it is often necessary to index the data in a more compact form designed to allow faster searching.

Mixed Search Engines

Sometimes, searched data contains both database content and web pages or documents. Search engine technology has developed to respond to both sets of requirements. Most mixed search engines are large Web search engines, like Google. They search through both structured and unstructured data sources. Pages and documents are crawled and indexed in a separate index. Databases from various sources are also indexed. Search results are then generated for users by querying these multiple indices in parallel and combining the results according to "rules".
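Combining results from several indices can be sketched as a merge step. In this illustration the "rules" are reduced to one assumed rule: each index has a weight, and a document found in more than one index keeps its best weighted score. The function name, the index names and the scores are hypothetical, and the parallel querying itself is left out.

```python
def merge_results(result_lists, source_weights):
    """Merge per-index (doc_id, score) lists into one ranked list.

    Each score is multiplied by its source's weight; a document that appears
    in several indices keeps its best weighted score.
    """
    best = {}
    for source, results in result_lists.items():
        weight = source_weights.get(source, 1.0)
        for doc_id, score in results:
            weighted = score * weight
            if weighted > best.get(doc_id, float("-inf")):
                best[doc_id] = weighted
    # Highest score first; ties broken by document id for a stable order.
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))


results = {
    "documents": [("handbook.pdf", 0.9), ("faq.html", 0.4)],
    "database": [("customer:42", 0.7), ("faq.html", 0.5)],
}
merged = merge_results(results, {"documents": 1.0, "database": 2.0})

print([doc_id for doc_id, _ in merged])
# ['customer:42', 'faq.html', 'handbook.pdf']
```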

Saturday, March 30, 2013

Search Applications - Coveo - Advanced Website Search

In my last post, I described Coveo for advanced enterprise search. In this post, I will describe Coveo for advanced website search.

Coveo takes your existing Web Content Management to a new level of insight and productivity. Coveo creates a virtual integration layer between your WCM and all your company’s key information sources (knowledge bases, databases, cloud content) to provide powerful, consolidated insight.

Coveo recommends the most relevant content to visitors, powered by indexing and search across any set of diverse systems. You can increase customer satisfaction by finding the missing content your customers are looking for. Smart navigation and search makes relevant content quickly apparent for your users. Website visitors will be presented with relevant, related content without the administrator having to create the links and without the visitor having to search!

You can instantly correlate content from your WCM/CMS such as Sitecore with other key information sources, including your Salesforce, SharePoint, Exchange, Lithium, and more.

Features

WCM Features

Out-of-the-box usability - you can start getting insight immediately without any difficult set up or configuration.

Related Topics - automatically correlate site content to relevant content based on similar themes and attributes.

Composite Views - composite views that combine relevant site content with other corporate systems outside the WCM (Communities, Intranet, CRM, Social, etc.).

Modular, flexible design - Template-based rendering for easy customization, reusable and extensible user controls for deep customization.

Computed Facets - configurable facets ideal for eCommerce websites, providing dynamic calculations of relevant product information such as average or summed prices, as well as filtering by price ranges.

Native integration with leading WCMs - API-Level Integration with Sitecore, SharePoint and SDL Tridion provides support for live indexing, security trimming and metadata search.

Faceted Search and Navigation - More intuitive, complementing traditional keyword searches with guided navigation and conversational search that leverages metadata for increased relevance and precision.

Search Analytics - Provides valuable information about visitor search behavior, content usage (top queries) and gaps in content (unsuccessful queries), offering unparalleled insights into key trends and more agile decision-making.

Indexing

Audio-video Indexing - The speech in audio or video files can be indexed with the optional Audio Video Search module. It creates an accurate transcript of speech content that is aware of the enterprise's vocabulary (e.g. proper names, employee names, domain terms), and allows users to search audio and video content as easily as they search document content. When searching, the exact locations of the searched terms are highlighted in the timeline of the audio or video player.

Connector Framework - Connector APIs enable easy integration with most repositories, including a flexible security API to support the security models of the indexed repositories.

Converters - Dozens of file formats are supported out of the box, including PDFs, Office documents, Lotus Notes, HTML, XML, Text files, etc. Metadata contained in audio and image file formats is also indexed, while the text contained in images can be indexed with the optional OCR module.

Languages - Languages are automatically identified at indexing time, improving content processing and relevance algorithms.

Metadata mapping - Regardless of the actual naming for the metadata in the indexed repositories, the system supports configurable mapping to a specified internal field representation. For instance, an index containing both Exchange and Lotus Notes emails will merge the “From” and “To” and “Subject” metadata even if they use different names for these fields.

OCR - The Optical Character Recognition (OCR) module allows the indexing of text content from files such as scanned documents stored in image or PDF files.

Pre/Post conversion scripts - Conversion scripts are hooks in the indexing pipeline that allow administrators to fully customize the way documents are indexed. There are two types of scripts: those that are executed before and those executed after the conversion of the document from its binary representation to indexable metadata and text.

Push API - Provides a simple way to integrate with external systems. All the calls necessary to support all the advanced features of the indexing pipeline are available through this API.

Tagging - Metadata can be injected on documents at search time, enabling search and facets on these new metadata in real-time. An example of usage is the addition of user-created tags on documents.
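The metadata mapping feature listed above can be pictured as a lookup table per repository. The sketch below is not Coveo's configuration format or API; the repository keys, the source field names ("Sender", "SendTo" and so on) and the function name are hypothetical and only show how differently named fields end up in one internal representation.

```python
# Hypothetical source field names mapped to shared internal field names.
FIELD_MAP = {
    "exchange": {"Sender": "from", "Recipients": "to", "Subject": "subject"},
    "lotus_notes": {"From": "from", "SendTo": "to", "Subject": "subject"},
}


def normalize_metadata(source, raw):
    """Rename a repository's metadata fields to the internal field names.

    Fields without a configured mapping are dropped.
    """
    mapping = FIELD_MAP[source]
    return {mapping[name]: value for name, value in raw.items() if name in mapping}


exchange_mail = normalize_metadata(
    "exchange", {"Sender": "ann@example.com", "Subject": "Budget", "Size": 2048}
)
notes_mail = normalize_metadata(
    "lotus_notes", {"From": "bob@example.com", "SendTo": "ann@example.com"}
)

print(exchange_mail)  # {'from': 'ann@example.com', 'subject': 'Budget'}
print(notes_mail)     # {'from': 'bob@example.com', 'to': 'ann@example.com'}
```

With both sources normalized, a single "from" facet or query field covers emails from either system.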

Security

Document Level Security - Data sources can be configured to index document permissions with content, making Early-binding security possible, or permissions can be set directly for all documents of this source.

Index Security - Security is integrated directly in the index structures to ensure that users only see content they are entitled to see. Early and Late security binding are both handled at the index level to deliver the best performance and security.

Index Segmentation - In addition to the document level securities reflecting the underlying repository permissions, the index can be segmented into collections with their own access restrictions.

Security Freshness - Changes in the group/user structure are constantly monitored and refreshed in Coveo’s security cache. An administrator can also force a refresh of the cache if required.

Security Normalization - Securities from different systems are normalized within the index so that users are automatically assigned with all proper security identifiers when accessing Coveo. This ensures that users see all the content they are entitled to see.

Super User Access - The main system administrator can grant temporary and audited rights to a specified user to search and access content for which he normally does not have access rights. Typical uses are e-Discovery, forensic, etc.

Reporting and Analytics

3rd party analytics integration - The Coveo analytics database allows the use of third-party reporting tools for more complex or custom reporting. An administrator can also easily configure the search interfaces to integrate third-party Web Analytics systems such as Google Analytics.

Advanced Query Analytics - Captures data on all user interactions with the search interfaces including result click-through and the use of different search UI functions. Reporting interface allows administrators to analyze the captured data, to elevate the most popular results, or select the correct result, for given queries.

Query & Indexing Logs - Comprehensive reports and statistics with graphical views on system status, queries, content, history, etc. Live console gives administrators a real-time view of what is going on in the system.

Text Analytics

Configurable Text Analytics - An administrator can configure a workflow that will create new metadata based on content analysis, rules and context, such as Themes, Named-entities, Regular Expressions.

Incremental updates - An administrator can configure update schedules to capture recent changes in the index.

Interactive Fine Tuning - Extraction parameters, normalization and blacklisting can be refined and metadata regenerated without re-indexing the full documents set.

Named Entity Extraction - Entities such as persons, locations, and organizations are automatically extracted from indexed content. Additional entities can be configured in the system.

Plug-ins - Additional, 3rd party, plugins can be added to the Text Analytics workflow. For example, domain/organization specific taxonomies can be used in the process.

Rule-based Extraction - Configurable rules can be used to add specific metadata to documents.

Theme Extraction - Themes are Topics and Concepts automatically extracted from indexed content.

Next time, I will complete describing Coveo products with Coveo for service and support.

Monday, March 11, 2013

Search Applications - Coveo - Advanced Enterprise Search - Part 2

Yesterday, I mentioned that Coveo offers three products - Coveo for advanced enterprise search, Coveo for advanced website search, and Coveo for service and support. I presented some features of the Coveo for advanced enterprise search product.

Today, I will complete presenting this product.

User Interface

Coveo InsightBox - different field values are suggested while a user types a query.

Facets: AND/OR mode - for specific facets, multi-selection can be switched from OR (default) to AND.

Facets: Computed Fields - computations can be applied to specific facets: sums, averages, minimums and maximums.

Facets: Search-as-you-type - search-as-you-type on all unique values of specified facets.

Facets: Sort-by - facet values can be sorted based on their label, count or computed value.

Mobile UI - user interface compatible with iOS browsers.

Relevance Ranking - results are ordered by default based on user profile, social and other context. This can be easily tuned and configured by an administrator.

Result Sort-by any Field - results can be ordered by any field. This is configured by the administrator.

Secure, Federated search - log in once in Coveo and search, navigate and consolidate dozens of different data sources simultaneously. Trim results based on user permissions.

Sort-by any Field - results can be ordered by any field, as configured by the UI administrator.

Faceted Search and Navigation - relevant metadata and fields can be used to populate facets within a result set. User can select multiple facet values dynamically and get instant changes in the result list.
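The facet features listed above (AND/OR mode and computed fields) can be illustrated on a small in-memory result set. The function names and the sample documents are hypothetical and do not reflect Coveo's own implementation; the sketch only shows what the two modes and a summed facet mean.

```python
def filter_by_facet(docs, field, selected, mode="OR"):
    """Keep docs whose multi-valued field matches any (OR) or all (AND) selected values."""
    selected = set(selected)
    if mode == "AND":
        return [d for d in docs if selected <= set(d[field])]
    return [d for d in docs if selected & set(d[field])]


def facet_summary(docs, field, numeric_field):
    """For each facet value, return (document count, sum of numeric_field)."""
    summary = {}
    for d in docs:
        for value in d[field]:
            count, total = summary.get(value, (0, 0))
            summary[value] = (count + 1, total + d[numeric_field])
    return summary


docs = [
    {"id": 1, "tags": ["red", "sale"], "price": 10},
    {"id": 2, "tags": ["red"], "price": 20},
    {"id": 3, "tags": ["blue", "sale"], "price": 5},
]

print([d["id"] for d in filter_by_facet(docs, "tags", ["red", "sale"])])         # [1, 2, 3]
print([d["id"] for d in filter_by_facet(docs, "tags", ["red", "sale"], "AND")])  # [1]
print(facet_summary(docs, "tags", "price"))
# {'red': (2, 30), 'sale': (2, 15), 'blue': (1, 5)}
```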

Advanced User Interface

Export Results to Excel - results can be downloaded and opened in Excel. Administrator can select which metadata to include in the export.

Floating Searchbar - Windows users can activate a floating Search bar and search Coveo without the need to start a Web browser.

Outlook Sidebar - Outlook users get contextual results based on their context and selection. They can also search for emails, files, people and SharePoint without leaving Outlook.

Tagging - users have the ability to add custom tags and annotations to results. Tags are searchable, are applied in the index in real-time and are available to other users.

Widgets - an administrator can configure widgets to display results in advanced visual representation.

Windows Desktop Indexing - Windows users can search their local files and email archives. A desktop agent is required to capture content and synchronize it with the centralized Coveo index.

Relevance

Configurable Ranking - an administrator can assign weights to more than 20 ranking attributes, such as term proximity, TF-IDF, dates, terms in title, and content reputation.

Query Correction - the spelling of the query is checked against the index content in order to suggest proper spelling even for words that are not normally part of general dictionaries (example: internal project codes, people names, etc.)

Query Ranking Expressions (QRE) - for each UI, an administrator can configure specific ranking rules, based on context and result set. This is an easy and flexible way to promote content based on profile attributes of the current user, such as locations, history, languages, roles.

Stemming - variations of a keyword with a similar basic meaning are treated as synonyms, broadening the search when required.

Thesaurus - an administrator can create a thesaurus and link it to the query. Thesaurus can be created from scratch or imported from existing enterprise content.

Top Results - an administrator can assign specific results to appear at the top of the list for specific queries.

Collaborative/Social ranking - click-through data and manual document ratings are used in relevance calculation. This is automatically shared among colleagues based on their social proximity.
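The Query Correction feature listed above checks spelling against the terms in the index rather than a general dictionary. A minimal sketch of that idea, using Python's standard difflib module, is shown below. The vocabulary, the project code "falcon9x", the function name and the 0.75 similarity cutoff are assumptions for illustration; this is not how Coveo implements the feature.

```python
import difflib


def suggest_spelling(query, index_vocabulary, cutoff=0.75):
    """Replace each unknown query word with its closest term from the index, if any."""
    candidates = sorted(index_vocabulary)
    suggestions = []
    for word in query.lower().split():
        if word in index_vocabulary:
            suggestions.append(word)
            continue
        matches = difflib.get_close_matches(word, candidates, n=1, cutoff=cutoff)
        suggestions.append(matches[0] if matches else word)
    return " ".join(suggestions)


# Terms taken from indexed content, including an internal project code
# that no general-purpose dictionary would contain.
vocabulary = {"project", "falcon9x", "budget", "report"}

print(suggest_spelling("falcn9x budgte", vocabulary))  # falcon9x budget
```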

Administration and Configuration

Admin Roles - the main system administrator can delegate partial administration permissions based on roles (interface designer, system administrator, collection administrator, etc.)

APIs - administration APIs allow custom development and integration of the administration functions into external systems.

Audience Management - administrator can define multiple audiences and assign specific UIs to them.

Installation Kit - everything is installed and initially configured using an install kit for easy deployment.

Interface Editor - for each User Interface, an administrator can configure Result templates, CSS, Facets, Sort-keys and other parameters.

Monitoring/Email alerts - different system conditions are monitored and email alerts can be sent to report important system events, such as disk space running low.

Web-based Administration UI - simple, Web-based UI for easy administration.

I will describe Coveo for advanced website search in my next post.

Sunday, March 10, 2013

Search Applications - Coveo - Advanced Enterprise Search - Part 1


Coveo offers three products - Coveo for advanced enterprise search, Coveo for advanced website search, Coveo for service and support. Today, I am going to present Coveo for advanced enterprise search. This product has many features, so I will start presenting them today and will finish tomorrow.

Coveo for advanced enterprise search is the enterprise search solution that automatically organizes your company’s information into actionable, on-demand knowledge. Coveo's powerful enterprise search engine correlates and analyzes all your company’s information sources, wherever they reside. All the information in your SharePoint, CRM, email, cloud content, and file servers is instantly accessible from one place.

Features

Access Real-time information from anywhere - federate searches on enterprise, social and cloud data securely and in real time—regardless of format or source.

Transform how your users access information - seamlessly integrate within existing applications and workflows to maximize impact and minimize disruption.

Digest, synthesize and utilize information faster - automatic metadata and entity extraction, themes and tagging combine to help users discover content and share findings.

Navigate content with ease - dynamic, searchable facets provide an ability to navigate to the most relevant content.

Simple to set-up and deploy with existing resources - as easy to use as any consumer web app, coupled with enterprise-grade robustness and scalability.

No hassle security integration - secure configuration out of the box is safe and easy.

Indexing

Audio-video Indexing - the speech in audio or video files can be indexed with the optional Audio Video Search module. It creates an accurate transcript of speech content that is aware of the enterprise's vocabulary (e.g. proper names, employee names, domain terms), and allows users to search audio and video content as easily as they search document content. When searching, the exact locations of the searched terms are highlighted in the timeline of the audio or video player.

Connector Framework - connector APIs enable easy integration with most repositories, including a flexible security API to support the security models of the indexed repositories.

Converters - multiple file formats are supported out of the box, including PDFs, Office documents, Lotus Notes, HTML, XML, Text files, etc. Metadata contained in audio and image file formats is also indexed, while the text contained in images can be indexed with the optional OCR module.

Languages - languages are automatically identified at indexing time, improving content processing and relevance algorithms.

Metadata mapping - regardless of the actual naming for the metadata in the indexed repositories, the system supports configurable mapping to a specified internal field representation. For instance, an index containing both Exchange and Lotus Notes emails will merge the "From", "To" and "Subject" metadata even if they use different names for these fields.

OCR - the Optical Character Recognition (OCR) module allows the indexing of text content from files such as scanned documents stored in image or PDF files.

Pre/Post conversion scripts - conversion scripts are hooks in the indexing pipeline that allow administrators to fully customize the way documents are indexed. There are two types of scripts: those that are executed before and those executed after the conversion of the document from its binary representation to indexable metadata and text.

Push API - provides a simple way to integrate with external systems. All the calls necessary to support all the advanced features of the indexing pipeline are available through this API.

Tagging - metadata can be injected on documents at search time, enabling search and facets on these new metadata in real-time. An example of usage is the addition of user-created tags on documents.

Reporting and Analytics

3rd party analytics integration - Coveo analytics database allows the use of third-party reporting tools for more complex or custom reporting. An administrator can also easily configure the search interfaces to integrate third-party web analytics systems such as Google Analytics.

Advanced Query Analytics - captures data on all user interactions with the search interfaces including result click-through and the use of different search UI functions. Reporting interface allows administrators to analyze the captured data, to elevate the most popular results, or select the correct result for given queries.

Query and Indexing Logs - comprehensive reports and statistics with graphical views on system status, queries, content, history, etc. Live console gives administrators a real-time view of what is going on in the system.

Scalability and Fault Tolerance

Distributed Indexing - indexing process distributed in many Index Slices, each one indexing part of the content. Slices can be hosted locally (on local drives or on a SAN) or on separate servers (through IP connection) providing highly scalable architecture.

Failover and Query Scalability - index mirroring system provides high availability (if one mirror fails, the others can continue serving queries). The number of queries that can be answered per second can be doubled by doubling the number of automatically synchronized mirrors.

Performance profiles - configurable performance profiles to balance indexing total throughput, query performance and time-to-index.

Query Federation/GDI - federate queries to other instances of Coveo and merge the results from all instances into a single result page while also leveraging the ranking algorithms from the different instances.
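The failover behavior listed above (if one mirror fails, the others continue serving queries) can be sketched as a round-robin pool that skips mirrors which raise an error. The class MirrorPool and the stand-in mirror functions are hypothetical; mirrors are modelled as plain callables, and real index synchronization is outside the scope of the sketch.

```python
import itertools


class MirrorPool:
    """Round-robin over index mirrors, skipping any that fail to answer."""

    def __init__(self, mirrors):
        self.mirrors = list(mirrors)
        self._cycle = itertools.cycle(range(len(self.mirrors)))

    def query(self, text):
        """Send the query to the next mirror in turn, trying each at most once."""
        for _ in range(len(self.mirrors)):
            mirror = self.mirrors[next(self._cycle)]
            try:
                return mirror(text)
            except ConnectionError:
                continue  # this mirror is down; try the next one
        raise ConnectionError("no mirror available")


def healthy_mirror(name):
    """Build a stand-in mirror that answers every query."""
    return lambda text: f"{name} answered '{text}'"


def failed_mirror(text):
    """Stand-in for a mirror that is currently unreachable."""
    raise ConnectionError("mirror down")


pool = MirrorPool([failed_mirror, healthy_mirror("mirror-2")])
print(pool.query("quarterly report"))  # mirror-2 answered 'quarterly report'
```

Spreading queries across mirrors in turn is also why adding mirrors raises the number of queries that can be answered per second.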

Security

Document Level Security - data sources can be configured to index document permissions with content, making early-binding security possible, or permissions can be set directly for all documents of this source.

Index Security - security is integrated directly in the index structures to ensure that users only see content they are entitled to see. Early and late security binding are both handled at the index level to deliver the best performance and security.

Index Segmentation - in addition to the document level securities reflecting the underlying repository permissions, the index can be segmented into collections with their own access restrictions.

Security Freshness - changes in the group/user structure are constantly monitored and refreshed in Coveo’s security cache. An administrator can also force a refresh of the cache if required.

Security Normalization - securities from different systems are normalized within the index so that users are automatically assigned with all proper security identifiers when accessing Coveo. This ensures that users see all the content they are entitled to see.

Super User Access - the main system administrator can grant temporary and audited rights to a specified user to search and access content for which he normally does not have access rights. Typical uses are e-Discovery, forensic, etc.
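Early-binding security, as listed above, means that each document's permissions are stored in the index alongside its content, so results can be trimmed without calling back to the source system. The sketch below assumes permissions are stored as a set of identity strings per document; the field name "allowed", the identity format and the function name are hypothetical.

```python
def trim_results(results, user_identities):
    """Keep only results whose indexed permissions share an identity with the user."""
    return [doc for doc in results if doc["allowed"] & user_identities]


results = [
    {"id": "hr-policy", "allowed": {"group:hr", "user:alice"}},
    {"id": "public-faq", "allowed": {"group:everyone"}},
    {"id": "board-minutes", "allowed": {"group:board"}},
]

# Identities resolved for the current user, e.g. after security normalization.
visible = trim_results(results, {"user:bob", "group:everyone", "group:hr"})

print([doc["id"] for doc in visible])  # ['hr-policy', 'public-faq']
```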

Text Analytics

Configurable Text Analytics - an administrator can configure a workflow that will create new metadata based on content analysis, rules and context, such as Themes, Named entities, Regular Expressions.

Incremental Updates - an administrator can configure update schedules to capture recent changes in the index.

Interactive Fine Tuning - extraction parameters, normalization and blacklisting can be refined and metadata regenerated without re-indexing the full documents set.

Named Entity Extraction - entities such as persons, locations, and organizations are automatically extracted from indexed content. Additional entities can be configured in the system.

Plug-ins - additional, 3rd party, plugins can be added to the text analytics workflow. For example, domain/organization specific taxonomies can be used in the process.

Rule-based Extraction - configurable rules can be used to add specific metadata to documents.
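
To make rule-based extraction concrete, here is a small sketch that uses regular expressions as the rules. The field names and patterns are invented for illustration and do not reflect Coveo's rule syntax.

```python
import re

# Hypothetical rules: metadata field name -> pattern whose matches become values.
RULES = {
    "invoice_number": re.compile(r"\bINV-\d{6}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}


def extract_metadata(text):
    """Apply every rule and keep the fields that matched at least once."""
    metadata = {}
    for field, pattern in RULES.items():
        values = sorted(set(pattern.findall(text)))
        if values:
            metadata[field] = values
    return metadata


doc = "Invoice INV-004211 was sent to billing@example.com on Monday."
metadata = extract_metadata(doc)
```

Because the rules live in a table rather than in the code, they can be changed and metadata regenerated without touching the extraction logic.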

Theme Extraction - themes are topics and concepts that are automatically extracted from indexed content.

More features of this product tomorrow...

Tuesday, July 17, 2012

Search Applications - Exalead

Exalead provides search platforms and search-based applications. Search-based applications (SBA) are software applications in which a search engine platform is used as the core infrastructure for information access and reporting. SBAs use semantic technologies to aggregate, normalize and classify unstructured, semi-structured and/or structured content across multiple repositories, and employ natural language technologies for accessing the aggregated information.

Exalead has a platform that uses advanced semantic technologies to bring structure, meaning, and accessibility to previously unused or under-utilized data in the disparate, heterogeneous enterprise information overload.

The system collects data from virtually any source, in any format, and transforms it into structured, pervasive, contextualized building blocks of business information that can be directly searched and queried, or used as the foundation for a new breed of lean, innovative information access applications.

Exalead products include the CloudView platform and the ii Solutions Suite of packaged SBAs, all built on the same powerful CloudView platform.

Exalead CloudView

Exalead CloudView enables organizations to meet demands for real-time, in-context, accurately delivered information, accessed from diverse web and enterprise big data sources, yet delivered faster and at lower cost than with traditional application architectures. This platform is used for both online and enterprise SBAs as well as enterprise search.

Available for on-premises or cloud delivery, Exalead CloudView is the infrastructure that powers all Exalead solutions, including Exalead’s public web search engine, the company’s custom SBAs, and the Exalead ii Solution Suite of packaged, vertical SBAs.

Exalead ii Solutions Suite

Exalead ii ("information intelligence") applications are packaged, workflow-specific SBAs that transform large volumes of heterogeneous, multi-source data into meaningful, real-time information intelligence, and deliver that information intelligence in context to users to improve business processes.

On the data side, the Exalead information infrastructure uses semantic technologies to non-intrusively aggregate, align and enhance multi-source data to create a powerful base of actionable knowledge (i.e., information intelligence).

Exalead Advanced Search options appear as a drop-down menu below the search form, where users select the search criteria that will be entered directly into the search form. A different set of advanced search options is available for each search type.

Registered users can select the "Bookmark" option below any search result to add it to a list of saved sites, accessible on the Exalead homepage as a collection of image thumbnails.

Strengths:
  • truncation, proximity, and many other advanced operators not available from other search engines;
  • includes thumbnails of pages;
  • provides excellent narrowing options on right side.

Exalead appears to support Boolean operators and nested searching with the operators AND, OR, and NOT. Searching can be nested using parentheses. Operators must be in upper case. Exalead can also use a minus sign (-) for NOT, but only when it is not used along with the Boolean operators. In the Advanced Search, it also has drop-down choices for "containing," "not containing," and "preferably containing." There is also an OPT operator, which means that the word following it is optional.

Phrase searching is available by using "double quotes" around a phrase. It also supports a NEXT operator for ordered proximity of one word (in other words, the same thing as a phrase search). So "double quotes" should get the same results as double NEXT quotes. Exalead also supports the NEAR operator for 16-word proximity. You can change it to NEAR/5 (or any other number) to specify a different proximity value.
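
The proximity operators can be modeled with word positions. The sketch below is an approximation for illustration only: it assumes NEAR ignores word order and counts distance as the difference between word positions, neither of which is confirmed for Exalead. The function names are hypothetical.

```python
def positions(words, term):
    """Indexes at which `term` occurs in the word list (case-insensitive)."""
    term = term.lower()
    return [i for i, word in enumerate(words) if word == term]


def near(text, left, right, distance=16):
    """True if the two terms occur within `distance` words, in either order."""
    words = text.lower().split()
    return any(
        abs(i - j) <= distance
        for i in positions(words, left)
        for j in positions(words, right)
    )


def next_to(text, first, second):
    """Ordered proximity of one word: `second` directly follows `first`."""
    words = text.lower().split()
    followers = positions(words, second)
    return any(i + 1 in followers for i in positions(words, first))


sample = "enterprise search platforms index content from many enterprise sources"
```

Under these assumptions, `near(sample, "search", "content", distance=3)` is true because the words are three positions apart, while `next_to(sample, "search", "enterprise")` is false because the order matters.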

Exalead's Advanced Search also offers some unusual types of special searches:
  • phonetic spelling with the sounds like: operator;
  • approximate spelling with the spells like: operator;
  • regular expression using regex syntax.

On a search with two or more words, stemming is automatic. Exalead also supports truncation using an asterisk * symbol. Stemming is also controlled on the preferences page. Exalead has no case-sensitive searching: lower, upper, or mixed case returns the same results. Exalead supports a title search.
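
Truncation and case-insensitive matching are easy to model. This sketch uses Python's `fnmatch` wildcards as a stand-in for the asterisk operator; it is not how Exalead matches terms internally, and `truncation_match` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def truncation_match(pattern, words):
    """Match a trailing-asterisk pattern against words, ignoring case."""
    pattern = pattern.lower()
    return [word for word in words if fnmatchcase(word.lower(), pattern)]


matches = truncation_match("Librar*", ["library", "Libraries", "librarian", "liberty"])
```

Lowercasing both sides before comparing mirrors the behavior described above, where the case of the query makes no difference.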

Exalead has limits for language, country, file type, site, and date available on the Advanced Search page. The file type limit includes text, PDF, Word, Excel, PowerPoint, Rich Text Format, Corel WordPerfect, and Macromedia Shockwave Flash. You can also place the file type search command into the search box. The site limit restricts results to those from the specified domain.

Some common words such as 'a,' 'the' and 'in' are ignored, but they can be searched with a + in front. Within a phrase search, all words are searched.

Results are sorted by a relevance algorithm. Pages are also clustered by site. Only one page per site will be displayed. Others are available via the yellow folder and domain name. The Advanced Search page used to include two date sort options, but those disappeared with the new interface in October 2006. They are still available via the field prefix of sort. Two options are available: sort:new and sort:old.
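
Clustering by site can be sketched as a single pass over the ranked results. The code below is a simplified assumption about the behavior described above: the first page from each site is shown and later pages from the same site are set aside. The URLs and the `cluster_by_site` function are hypothetical.

```python
from urllib.parse import urlparse


def cluster_by_site(ranked_urls):
    """Keep the top-ranked page per site; park the rest under that site's name."""
    shown, more = [], {}
    for url in ranked_urls:  # assumed to be in relevance order already
        site = urlparse(url).netloc
        if site in more:
            more[site].append(url)
        else:
            more[site] = []
            shown.append(url)
    return shown, more


ranked = [
    "http://example.com/a",
    "http://example.org/x",
    "http://example.com/b",
]
shown, more = cluster_by_site(ranked)
```

Here `shown` plays the role of the visible result list, and `more` holds the extra pages that would sit behind each site's folder link.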