Galaxy Consulting Blog

Thursday, September 20, 2012

Faceted Search

Faceted search, also called faceted navigation or faceted browsing, is a technique for accessing information organized according to a faceted classification system, allowing users to explore a collection of information by applying multiple filters. A faceted classification system classifies each information element along multiple explicit dimensions, enabling the classifications to be accessed and ordered in multiple ways rather than in a single, pre-determined, taxonomic order.

Facets correspond to properties of the information elements. They are often derived by analysis of the text of an item using entity extraction techniques or from pre-existing fields in a database such as author, descriptor, language, and format. Thus, existing web-pages, product descriptions or online collections of articles can be augmented with navigational facets.

Faceted search has become the de facto standard for e-commerce and product-related web sites, and other content-heavy sites use it as well. Users have grown accustomed to it and now expect it.

Faceted search lets users refine or navigate a collection of information by using a number of discrete attributes, the so-called facets. A facet represents a specific perspective on content that is typically clearly bounded and mutually exclusive. The values within a facet can be a flat list that allows only one choice (e.g. a list of possible shoe sizes) or a hierarchical list that allows you to drill down through multiple levels (e.g. product types, Computers > Laptops). The combination of all facets and values is often called a faceted taxonomy. These faceted values can be added directly to content as metadata or extracted automatically using text mining software.

For example, a recipe site using faceted search can allow users to decide how they’d like to navigate to a specific recipe, offering multiple entry points and successive refinements.

As users combine facet values, the search engine is really launching a new search based on the selected values, which allows the users to see how many documents are left in the set corresponding to each remaining facet choice. So while users think they are navigating a site, they are really doing the dreaded advanced search.
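The re-query behavior described above can be illustrated with a minimal Python sketch. The recipe records, the field names, and the functions `search` and `facet_counts` are all hypothetical; a real search engine computes these counts from its index rather than by scanning a list.

```python
from collections import Counter

# Hypothetical recipe records; the field names are illustrative only.
RECIPES = [
    {"title": "Lentil soup", "course": "Soup", "cuisine": "Indian", "diet": "Vegan"},
    {"title": "Minestrone", "course": "Soup", "cuisine": "Italian", "diet": "Vegan"},
    {"title": "Lasagna", "course": "Main", "cuisine": "Italian", "diet": "Vegetarian"},
    {"title": "Chicken curry", "course": "Main", "cuisine": "Indian", "diet": "Contains meat"},
]

FACETS = ("course", "cuisine", "diet")


def search(items, selected):
    """Return the items that match every selected facet value."""
    return [
        item for item in items
        if all(item[facet] == value for facet, value in selected.items())
    ]


def facet_counts(items, selected):
    """Count the remaining values of each facet that is not yet selected."""
    results = search(items, selected)
    return {
        facet: Counter(item[facet] for item in results)
        for facet in FACETS
        if facet not in selected
    }


# Each facet selection is simply a new search with one more filter.
results = search(RECIPES, {"course": "Soup"})
counts = facet_counts(RECIPES, {"course": "Soup"})
```

Each time the user picks another value, the same two functions run again with a larger `selected` dictionary, which is why the counts next to the remaining facet values always reflect the current result set.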

The following are best practices for establishing facets:

do not create too many facets - presenting users with 20 different facets will overwhelm them; users will generally not scroll too far down beyond the initial screen to locate your more obscure facets;

base facets on key use cases and known user access patterns - identify key ways users search and navigate your site. Analysing search logs, evaluating competitor sites, and user research and testing are great ways to figure out what key access points users are looking for. Interviewing as few as 10 users will often give you great insight into what the facet structure should be;

order facets and values based on importance - not all facets are equally important. Some access points are more important than others depending on what users are doing and where they are in the site. Present the most popular facets at the top. When determining the order for navigation, again think about your users and why they are coming to your site.

leverage the tool to show and hide facets and values - while the free or low-cost faceted search tools don’t all offer these configuration options, more sophisticated faceted search solutions allow you to create rules to progressively disclose facets.

Think of a site offering online greeting cards. While the visual theme of the card – teddy bears, a sunset, golf – might eventually be important to a user, it probably isn’t the first place they will start their search. They will likely start with occasion (birthday, Christmas), or recipient (father, friend), and then become interested in themes further down the line. Accordingly, we might hide the “themes” facet until a user has selected an occasion or recipient. You can selectively present facets based on your understanding of your users and their typical search patterns (as described in the practice above on basing facets on key use cases).

Also take advantage of the search engine’s clutter-reducing features, such as the “more...” link. This allows you to present only the most popular items and hide the rest until the user specifically requests to see them. You can also do this at the facet level, collapsing lesser-used facets to present just the category name and let users who are interested expand that facet.

facet display should be dependent on the area of the site. If you are in the first few layers of your site, you should show fewer facets with more values exposed, whereas if you are deeper into product information you should show more facets, some with values exposed and others hidden.

create your taxonomy with faceted search in mind - a good taxonomy goes a long way in making a successful faceted search interface.

There are some important guidelines to follow in taxonomy design. Facets need to be well defined, mutually exclusive and have clear labels. For example, having one facet called “Training” and another “Events” is confusing: where do you put a seminar? Is it training or an event? If you have to wonder, your users will too. The taxonomy depth (how many levels deep does it go) and breadth (how many facets wide is it) are other important considerations. Faceted search works better with a broad taxonomy that is relatively shallow, as this lets users combine more perspectives rather than get stuck in an eternal drill down, which causes fatigue. The facet configuration and display rules will help you create the optimal progressive presentation of these facets so as to not overwhelm users with the breadth.
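A display rule of the kind used in the greeting-card example can be sketched in a few lines of Python. The rule table, the cutoff of three visible values, and the function names are assumptions made for illustration; commercial faceted search tools expose this kind of configuration in their own ways.

```python
# Hypothetical display rules: each facet maps to the facets that unlock it.
# An empty tuple means the facet is always shown.
DISPLAY_RULES = {
    "occasion": (),
    "recipient": (),
    "theme": ("occasion", "recipient"),
}

MAX_VISIBLE_VALUES = 3  # values shown before a "more..." link (assumed cutoff)


def visible_facets(selected):
    """Return the facets to display, given the facets already selected."""
    shown = []
    for facet, unlocked_by in DISPLAY_RULES.items():
        # Show the facet if it has no precondition or one precondition is met.
        if not unlocked_by or any(f in selected for f in unlocked_by):
            shown.append(facet)
    return shown


def truncate_values(value_counts):
    """Split facet values into the most popular ones and those behind 'more...'."""
    ranked = sorted(value_counts, key=value_counts.get, reverse=True)
    return ranked[:MAX_VISIBLE_VALUES], ranked[MAX_VISIBLE_VALUES:]
```

With these rules, a user who has selected nothing sees only the occasion and recipient facets, and the theme facet appears once either of them has been chosen.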

If you are torn between two places in the taxonomy for a term, consider putting it in both places. This is called polyhierarchy, and it is a good way to ensure findability from multiple perspectives. Polyhierarchy is best served within a facet rather than across multiple facets. Since facets should be mutually exclusive, you shouldn’t have much need to repeat terms across facets, which can be more confusing than helpful.

The most important thing, however, is to be prepared to break any of these rules in the name of usability. Building a faceted taxonomy involves understanding your users’ search behavior.

As the trend towards increased social computing continues, Web 2.0 concepts are entering the realm of faceted search. We are starting to see social tags being used in faceted search and browse interfaces. Buzzillions.com, a product-review site, is using social tag-based facets in its navigation, allowing users to refine results based on tags grouped as "Pros" or "Cons". This site uses a nice blend of free social tagging and control to ensure good user experience; when you type in a tag to add to a product review, type-ahead verifies existing tags and prompts you to select one from the existing list of matches to maximize consistency.

Ultimately, navigation and search are among the main interactions users have with your site, so getting them right is not just a matter of good design; it affects the bottom line. Faceted search is a very popular and powerful solution when done well; it allows users to deconstruct a large set of results into bite-size pieces and navigate based on what’s important to them. But faceted search by itself is not necessarily going to make your users’ lives easier. You need to understand your users’ mental models (how they seek information), test your assumptions about how they will interpret your terms and categories, and spend time refining your approach.

Faceted search can just add more complexity and frustrate your users if not considered from the user perspective and carefully thought through with sound usability principles in mind. Faceted search is raising the bar in terms of findability and how well you execute will determine whether your site meets the new standard.

Monday, August 27, 2012

SharePoint - Project Management Features

SharePoint has project management features to manage projects and keep track of project information on a site. You can track team events with a calendar, manage a list of tasks, and log and respond to issues.

A team can use a calendar to track team events, vacations, conferences, and other events. Team members can connect this calendar to Microsoft Office Outlook 2007, where they can overlay it with their personal calendars to avoid scheduling conflicts. They can copy events back and forth between the calendars.

The team can use tasks lists to manage the work for large projects, such as planning a convention or managing a marketing campaign. Tasks can be set up with a standard list view or as a project tasks list, which provides a visual overview of the tasks and their progress known as a Gantt view. Templates are available for creating lists in either format.

The team can use an issue tracking list to track logistical problems that are related to the conference planning, such as registration database issues. A team member logs the issue, and then people record any updates and fixes until the issue is resolved.

Working with lists

When you create a team site, several lists are created for you. These default lists range from a discussion board to a calendar list. You can customize and add items to these lists, create additional lists from templates, and create custom lists with just the settings and columns that you choose.

Lists can include many types of information, ranging from text to dates to pictures. Lists can also include calculations, such as totals or a calculated date, such as a week from today's date.

By using lists, you can do the following:

Track versions - You can track versions of list items, so that you can see which list items have changed, as well as who changed them. If mistakes are made in a newer version, you can restore a previous version of an item.

Require approval - Your organization can specify that approval for a list item is required before it can be viewed by everyone.

Integrate e-mail with a list - If incoming or outgoing e-mail is enabled on your site, some lists can take advantage of e-mail features. Some types of lists, such as calendars and discussions, can be set up so that people can add content to them by sending e-mail. Additionally, Office Outlook 2007 integrates with calendar, tasks, and contacts lists.

Customize permissions - Your organization can specify custom permissions for a list or even a single list item. This feature can be useful, for example, if a specific item contains confidential information.

Create and manage views - Your group can create different views of the same list. The contents of the actual list don't change, but the items are organized or filtered so that people can find the most important or interesting information.

Keep informed about changes - You can subscribe to RSS Feeds of lists and views to see updates to lists in your RSS viewer, such as Outlook 2007. If your organization has set up incoming e-mail, you can receive e-mail alerts when items change.

Manage lists and work offline with lists in Microsoft Office Access 2007 - You can manage lists with database tools and take lists offline with Office Access 2007.

View lists on mobile devices - You can view many lists, such as tasks lists and calendars, and document libraries on mobile devices. To view a mobile list, type /m after the Web address of the site. Mobile views are not available for some list types, such as discussions, and may not display all column types.

Common types of lists for collaboration

The following are some of the more common types of lists your organization can use:

Calendar - Use a calendar for all of your team's events or for specific situations, such as a project calendar or company holidays.

Tasks and project tasks - Use a tasks list to track information about projects and other events for your group. You can assign tasks to people, as well as track the status and percentage complete as the task moves toward completion. A project tasks list displays the tasks with progress bars, known as a Gantt view.

Issue tracking - Use an issue-tracking list to store issues, their status, and resolution. This is a common type of list for tracking support issues or incidents, such as customer service, quality assurance, or technical support.

Discussion boards - Use a discussion board to provide a central place to record and store team discussions that is similar to the format of newsgroups.

Announcements - Use an announcements list to share news and status and to provide reminders.

Contacts - Use a contacts list to store information about people or groups that you work with.

Links - Use a links list as a central location for links to the Web, your company's intranet, and other resources.

Surveys - Use a survey to collect and compile feedback, such as an employee satisfaction survey or a quiz.

Custom - Although you can customize any built-in list, you can start with a custom list and then add just the settings that you want. You can also create a list that is based on a spreadsheet, such as a Microsoft Office Excel 2007 workbook for managing contracts.

Thursday, August 23, 2012

Content Management Systems Reviews - Documentum - Automatic Classification - Captiva Dispatcher

In my last post, I mentioned that Documentum has two tools for automatic classification: Content Intelligence Services (CIS) and EMC Captiva Dispatcher. I also described the Content Intelligence Services (CIS) tool. In this post, I am going to describe EMC Captiva Dispatcher.

EMC Captiva Dispatcher delivers high-speed automatic content classification, data extraction, and document routing. With Dispatcher, companies can scan multiple batches of structured, semi-structured, and unstructured content within a single flow, without the need for separator sheets, barcodes, or patch codes. By combining EMC Captiva Dispatcher with the Captiva InputAccel intelligent enterprise capture platform, you can scan, classify, extract, and deliver data from almost any kind of electronic or paper document, often without the need for manual sorting or data entry.

The result is lower cost and a more efficient business process: organizations save time and money while improving their ability to manage the flow of incoming documents.

One of the greatest strengths of Dispatcher lies in its ability to identify similar document types. It uses both text-based and image-based analysis to determine document types, automatically capture business data for search and archiving or to drive transactional processes, and route documents to the appropriate department for processing. The technology works by automatically learning the attributes of existing documents and using them as a basis for classifying new incoming documents.

By analyzing a document's layout design, such as logos or other graphical elements, Dispatcher classifies documents independently of language and format. In the case of unstructured and semi-structured documents, the system uses full-text engine results, looking for keywords and text phrases contained in a document to determine the document type. Because the system learns documents from their visual layout, new document types can be added automatically.
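To make the keyword-based part of this idea concrete, here is a minimal Python sketch of classifying a document by counting keyword matches. It is not Dispatcher's algorithm, which is not described in detail here; the keyword lists and the `classify` function are hypothetical and only illustrate the general technique.

```python
import re

# Hypothetical keyword lists per document type; a real system would learn
# these from sample documents rather than hard-code them.
KEYWORDS = {
    "invoice": {"invoice", "amount", "due", "payment"},
    "contract": {"agreement", "party", "term", "hereby"},
}


def classify(text):
    """Return the document type whose keywords appear most often, or None."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = {
        doc_type: sum(1 for word in words if word in keywords)
        for doc_type, keywords in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # A document with no keyword matches stays unclassified.
    return best if scores[best] > 0 else None
```

A document that matches no keywords is left unclassified instead of being forced into a type, which mirrors the need for a fallback when automatic classification is not possible.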

Dispatcher performs automated data extraction and validation, reducing the need for manual data entry and ensuring that accurate information is passed to back-office systems. Dispatcher includes several recognition engines that allow you to extract machine printed and handwritten text, check marks, and barcode information.

For structured forms, Dispatcher extracts data using fast and accurate pre-defined zones. For less structured documents, like invoices or contracts, Dispatcher extracts data using more flexible, free form recognition, enabling data to be extracted regardless of where it exists on a page.

This broad set of recognition technologies and methods ensures that data is extracted with the highest performance from structured forms, while also providing maximum flexibility to extract data from all document types.

As part of the EMC Captiva intelligent enterprise capture solution, Dispatcher integrates seamlessly with InputAccel, providing a capture platform that supports both centralized and distributed environments. InputAccel custom capture process flows manage the end-to-end process, ensuring that documents are classified, data is extracted and validated, and information is delivered to all relevant content repositories and business systems. Using Dispatcher and InputAccel together provides organizations with a complete solution that is capable of processing volumes ranging from a few thousand documents a day to several million.

Friday, August 17, 2012

Content Management Systems Reviews - Documentum - Automatic Classification

Documentum has two tools for automatic classification: Content Intelligence Services (CIS) and EMC Captiva Dispatcher. The subject of today’s post is Content Intelligence Services (CIS). In my next post, I will describe EMC Captiva Dispatcher.

Content Intelligence Services (CIS) is an extension to the EMC Documentum content management platform that enables automatic classification and categorization of content in the Documentum repository. Its benefit is well organized, classified, and categorized content. With CIS, content is parsed and analyzed and classification rules are applied. The results of the classification can then be used for categorization as keywords to populate content metadata.

The capability of automatically creating keywords to populate content metadata can remove the burden from end users who otherwise have to do it manually. Many users struggle to populate metadata consistently as content is being created, which significantly limits its future use, since metadata is what enables processing of the content.

CIS eliminates this dependency on users. CIS can propose metadata to users, who can accept or modify it as needed. CIS can support a combination of automatic and manual classification with a special user interface for category owners. Category owners can make classification decisions manually in cases where the automatic rules cannot classify content with a preset certainty level. The user interface is built into every Documentum client, such as Webtop, and becomes active upon detection of CIS in the system.
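The combination of automatic and manual classification can be pictured as a simple routing step based on a certainty threshold. The Python sketch below is an illustration only and does not use any Documentum or CIS API; the `Suggestion` class, the `route` function, and the threshold value of 0.8 are assumptions.

```python
from dataclasses import dataclass

CERTAINTY_THRESHOLD = 0.8  # assumed preset certainty level


@dataclass
class Suggestion:
    document: str
    category: str
    confidence: float  # 0.0 to 1.0, produced by some classifier


def route(suggestions, threshold=CERTAINTY_THRESHOLD):
    """Split suggestions into auto-assigned ones and ones needing manual review."""
    assigned, review_queue = [], []
    for suggestion in suggestions:
        if suggestion.confidence >= threshold:
            assigned.append(suggestion)
        else:
            # Below the threshold, a category owner decides manually.
            review_queue.append(suggestion)
    return assigned, review_queue
```

Raising the threshold sends more documents to category owners and lowers the risk of a wrong automatic assignment; lowering it does the opposite.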

With CIS, the results of the classification can be used for content categorization, which assigns content to appropriate categories. Typically, categories are represented by a folder structure to which content is linked. A category hierarchy – or taxonomy – is usually common to a department or an organization and allows all users to share the same navigation view for content in active projects or content that has been archived.

CIS comes with prepackaged taxonomies for various industries. These taxonomies can be used out of the box or as a starting point for customization. Users can add categories and sub-categories to these taxonomies.

CIS supports major European languages. This enables the classification of local content in its native language against an enterprise-wide or local taxonomy. Using these multilingual capabilities, companies can deploy CIS globally, enhancing the globalization capabilities of Documentum, which include pervasive Unicode compatibility and localized user interfaces.

Next time: EMC Captiva Dispatcher.

Monday, August 13, 2012

SharePoint Libraries for Content Management

A library is a place on a site where team members can work together to create, update, and manage files. Each library displays a list of files and key information about the files.

Why work with libraries?

Storing your documents in a central location can help your team work on files together, especially if your files tend to be scattered among people's computers or in multiple shared folders on your network.

For example, the Marketing team uses a document library named Marketing Documents for managing its press releases, budget files, contracts, and other types of files. The library stores information that is relevant to the type of file, such as the name of the project that the file is associated with. The Marketing team also uses a slide library to share and reuse slides for presentations.

The Shared Documents library is created automatically when your team creates a new site. You can start using this library right away, customize it, or create other libraries. Your team can also create more specialized libraries, such as slide libraries, picture libraries, and form libraries.

The Marketing team tracks versions in its libraries, so the team has a history of how files have evolved and can restore a previous version if someone makes a mistake. Team members check out documents when they work on them, so that no one else can overwrite their changes.

If you want a workspace where you can coordinate work on a document or a small number of related documents, you can create a Document Workspace site. A Document Workspace site includes a document library in addition to a tasks list, schedules, and a list of workspace members. For storing your team's primary set of documents, which your team uses on a routine basis, use your team's Shared Documents library.

By default, people in the Members group can add files to and edit files in a library. If you don't have permission, contact the person who owns your site or library. If you are a site owner or designer, you can customize the library by changing how the files are displayed and managed.

Some key advantages of working with libraries

The following are some key features of libraries that enable your team to manage its files and work more efficiently. Advanced features for managing content, such as policies on how documents are used and shared, are explained in other topics.

Central location - a library is a central location where your team can update and manage documents. If your team members struggle to keep up with files stored on individual shares or sent in separate e-mail messages, a library can help reduce the chaos.

Checkout - you can check out a file to reserve it for your use so that others cannot change it while you are working on it. If you are using the 2007 Microsoft Office system, you can work with files on your computer, and even take them offline, when you check them out.

Versions - a library can track versions, which provides a version history and enables previous versions to be restored.

Alerts and RSS - you can set up e-mail alerts or subscribe to RSS Feeds so that you are updated on changes to files.

Views - your team can create views that show content in multiple ways that may be especially relevant or meaningful. For example, the Marketing team has views of files grouped by department and contracts that expire this month.

Search - libraries are searchable. For example, you can search on a title or property of a document, such as the document author.

Client integration - If you are running some 2007 Office release programs, such as Microsoft Office Word 2007, you can work with server features directly from the client, such as checking out files, updating server properties, or viewing a version history.

Approval - your library can be set up to require someone to approve files before they are displayed to others. This feature can be helpful if your library contains important guidelines or procedures that need to be final before others see them.

Content types - your team can set up content types for the types of documents it uses most often, such as marketing presentations, budget worksheets, and contracts. The content types include templates as a starting point, for formatting and any boilerplate text and for properties that apply to the documents of that type, such as department name or contract number.

Workflow - your group can apply business processes to its documents, known as workflows, which specify actions that need to be taken in a sequence, such as approving or translating documents.

Thursday, August 9, 2012

E-Discovery and Records Management

Discovery is the pre-trial phase in a lawsuit in which each party can request documents and other evidence from opposing parties. E-discovery deals with discovery of electronically stored information (ESI), including documents and e-mails.

E-discovery preparedness makes it imperative for organizations to develop an enterprise-wide strategy to manage the volume of electronic information. The discovery process affects many individuals in an organization, not just lawyers and others involved in discovery, but also IT professionals and records managers, who have to be prepared to produce electronic content for discovery and litigation.

For legal counsel, it means having a review process to determine what discovered content is relevant to the case. For an IT person, it means restoring backup tapes to show evidence on file shares, content management systems, e-mail systems, or other applications. But for records managers, this work will have begun long before any lawsuit with managing records for retention, placing legal holds, and finalizing disposition.

ESI presents special issues for discovery:

ESI can be replicated at a very low cost, resulting in tremendous volume;
Electronic content can be easily changed and deleted;
ESI can be backed up, creating more volume as content is copied;
Electronic content may require certain software to access and read;
ESI can reflect relationships based upon how it is distributed;
ESI may have associated metadata;
ESI can be searched.

E-discovery can be costly because it requires organizations to retrieve content from servers, archives, backup tapes, and other media.

In some cases, an organization is unable to execute a discovery order because it is unable to locate all content in a timely manner, or it is unable to place holds on all content and some of it is deleted during the lawsuit. The inability to do this correctly also has a cost, and it can be considerable.

To address these costs, many organizations are looking at e-discovery solutions that will enable them to review the found content and take it through litigation.

But organizations can also lower costs for archiving and restoring, legal review, and sanctions by simply cutting down how much content they retain. Less stored content means less content on which to perform discovery.

On the other hand, because all ESI is now discoverable, organizations may be tempted to destroy that information as soon as possible to reduce the cost of discovery. But, some information must be kept for regulatory and compliance reasons. For example, many organizations are governed by regulatory bodies that require business information to be retained for a specific period of time. Some of that information might also be important to support the organization in case of litigation. Destroying the wrong information can lead to fines and unfavorable judicial decisions.

Some organizations may randomly pick through content to remove content that is deemed most risky. But in litigation, it will be necessary to prove that the deletion of this content was consistent with a policy that has been applied rigorously. Without audit trails and certificates of destruction, it can be difficult to prove compliance with an organization’s policies.

To avoid this situation, many organizations are simply choosing to keep everything. But experience shows that the cost of restoring backup and archive tapes, as well as the cost of discovery and the inability to identify content and place immediate holds, can make this policy economically disastrous in the event of litigation.

Developing a strategy and a plan of action for handling e-discovery will help organizations mitigate their risk and save them a significant amount of money in the event of litigation. Organizations need to have a retention policy to determine which content can be destroyed and at what time and which content should be kept and for how long. The key is to have a retention program that is flexible enough to keep content for the right retention period.

Retention periods have historically been thought of in terms of calendar dates: a document that was created in 2000 may no longer be required in 2012, and so it may be destroyed. In practice, however, retention periods for content are often driven by events, such as the length of a project, the duration of a contract, or the termination of an employee. The retention policies that apply to these content types must reflect the lifecycle of the content.

Organizations may choose to keep project information for x number of years after the end of the project. A workflow event that signals the end of a project, such as the publishing of a report, may commence the retention period for the associated e-mails and files. An organization may create a retention policy that a contract will be retained for x number of years after the end of the contract period. The end of the contract could then trigger a lifecycle action for that document.

There are many types of events that could trigger a retention policy: content expired (e.g. a contract), usage statistics (e.g., document has not been accessed in six months), business event (e.g., environmental impact filing), content lifecycle event (e.g., new revision checked in).

There are many actions an organization can take based upon the retention policy: delete, notify author, archive, move, delete revisions, revise. These different actions can be applied to retained content over the course of its lifecycle as it moves from its active use to inactive status to its deletion.
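Event-based retention of this kind can be sketched in Python. The `Record` class, its field names, and the sample values in the comments are hypothetical; the sketch only shows how a trigger event, a retention period counted from it, and a legal hold could interact, and it does not represent any particular records management product.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Record:
    name: str
    trigger_date: Optional[date]  # e.g. contract end; None until the event occurs
    retention_years: int
    on_hold: bool = False  # a legal hold suspends disposition


def add_years(d, years):
    """Return the same calendar day `years` later (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def disposition_due(record, today):
    """True when the retention period has elapsed and no hold is in place."""
    if record.on_hold or record.trigger_date is None:
        return False
    return today >= add_years(record.trigger_date, record.retention_years)
```

Because the clock starts at the trigger event and not at the creation date, a record whose event has not yet occurred is never due for disposition, and a record on legal hold is skipped regardless of its age.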

The best approach to records management is where authors create content using their familiar tools and systems, and retention management is enforced on that content where it lives, from a centralized place. This approach has a number of benefits:

retention policies are centrally administered through a single interface;
a catalog of discoverable content is created;
holds can be placed instantly across these different systems, ensuring that evidence is not deleted during litigation;
disposition can be performed from a central place.

By categorizing content, creating a catalog of the content, creating a retention plan, implementing a hold methodology, and having disposition procedures, an organization will benefit in many ways. They include:

Decreased Risk – by keeping less content, an organization decreases the risk of adverse evidence being found;
Higher Productivity – by organizing content through a file plan, key information, such as regulatory filings, tax information, business licenses, invoices, and other content, can be more easily found;
Lower Discovery Costs – with less information available for discovery, an organization will reduce the cost of restoration of content and the cost of legal review;
Increased Flexibility – an organization will be prepared to present a catalog of discoverable content, which is a requirement in the case of litigation;
Stronger Legal Action – by knowing the evidence that an organization possesses, legal counsel can more quickly assess strategy and pursue a settlement, which can save a great deal of money;
Less Vulnerability – organizations that are unable to comply with electronic discovery requirements are beginning to see nuisance lawsuits. When an organization cannot comply with discovery requirements, it may set a cost threshold – stating, for instance, that any lawsuit under $100,000 is not worth the discovery effort and should be settled. This exposes the organization to nuisance lawsuits that are brought at just under the threshold.

If you have not already done so, now is the time to develop ESI retention programs. Now is the time to create committees within your organizations and to bring their expertise together with legal counsel and IT to prepare for e-discovery and litigation. And, now is the time to focus on one of any organization’s greatest assets, its information.

Friday, August 3, 2012

SharePoint and Collaboration

Most people spend the greater part of their work day involved in collaborative tasks. They share information, they work together in teams, and they manage projects. It can be a challenge to collaborate effectively if you do not have tools to easily communicate, share information, and coordinate project details and deadlines among a large group of people.

SharePoint can help you get your work done more efficiently because it provides organizations with a platform for sharing information and working together in teams. A SharePoint site offers specific kinds of tools and workspaces that you can use to communicate with team members, track projects, coordinate deadlines, and collaboratively create and edit documents.

Manage Projects More Efficiently

Users can create a site from the Team Site template to manage a range of team projects and document related tasks. They would use their team site every day to create and manage documents, track issues and tasks, and share links and contacts. Because they have one location for these activities, members of a team can save time and enjoy increased productivity.

The site template for a team site includes:

Shared documents library;
Announcements list;
Calendar;
Team discussion list;
Tasks list;
Links list.

The site can store long-term routine information for a single department or short-term information for a special project that spans several departments. By creating a team site to use as a collaborative workspace, your team can become both more efficient and more productive and ultimately achieve better business results. You can also customize your site to meet the needs of your team or project by adding lists, libraries, or other features to the site. The calendar can be used for tracking events, meetings, and so on. Users can link the calendar to their personal calendars in Microsoft Office Outlook so that they can view this information along with their personal calendar information. Users can create a Project Tasks list to visualize and track the key phases of projects.

There are several different ways you can use a team site to manage projects more efficiently:

use built-in features such as the Project Tasks list template, which enables you to visualize task relationships and project status with automated Gantt charts;
coordinate the team's work with shared calendars, alerts, and notifications. You can connect a calendar on your SharePoint site to your calendar in Office Outlook 2007, where you can view and update it just as you do your personal calendar;
create Meeting Workspace sites to gather materials and documents related to a meeting.

Create, Review, and Share Documents

Groups of people can create and edit documents collaboratively. For example, team members can save general documents to a shared Documents library, where other team members can easily read them or check them out and edit them. A team can also use slide or picture libraries to save and reuse slides and pictures for various presentations. For special projects that involve only a few people, team members can create Document Workspace subsites on their team site. Document Workspace sites help users to coordinate work on a single document or a group of documents.

There are several different ways to save and work on documents and other files on a team site:

use document libraries to store and manage important documents. Features such as versioning and check-out help you keep track of revisions to a document and prevent multiple people from making changes at the same time;
create Document Workspace sites to coordinate the development of specific documents;
use Slide Libraries to share and reuse slides in a central location;
take document libraries offline to enable people to view and edit documents while they are not connected to the network;
use workflows to manage collaborative tasks such as document review or approval.

Capture and Share Team Knowledge

SharePoint provides organizations with a central location to capture best practices, share information, and promote standardized business processes. Teams can use both a wiki site and a blog site to capture and communicate information of interest to the team. A team can use a wiki to compile general information about company and team processes that will be helpful to new team members. Any member of the team can add information to the wiki or update the wiki posts. A team can also routinely post industry-related or marketing-related information to a blog site, where other team members can read the posts and comment on them. The blog provides team members with a forum to share new ideas, opinions, or inspiration.

Here are some ways you can use SharePoint to capture and share collective team knowledge or important information:

track updates and information with alerts or Really Simple Syndication (RSS);
use blogs to share or promote information;
capture community knowledge or document internal processes by using a wiki;
use surveys or discussions to gather information or encourage dialogue.

Tuesday, July 31, 2012

Content Management Systems Reviews - Open Text - ECM Suite - Portal

OpenText Portal (formerly Vignette Portal) is a part of the OpenText ECM Web Content Management Solution. It lets you create web sites with rich content and applications that support customized user interactions. It provides a highly scalable and efficient means of aggregating content and applications for use across a variety of initiatives inside and outside the firewall.

It enables users to combine web services, repository data, and user interfaces in meaningful ways to create valuable business applications without IT help. Users can create web pages by simply selecting portlets from OpenText’s library of over 200 portlets.

Portal layout management allows users to easily apply a variety of page layouts via visually intuitive tools. There is interaction between portlets. Pages refresh only as needed. Portlets load separately, so the end user does not have to wait until the entire page loads. Pages are dynamic and load quickly.

Users can create a template from an existing site to be used later to create similar sites. Portlets can be embedded in any web site. Users can take advantage of pre-defined portlets that allow rapid creation of portals with common functionality that is integrated with existing applications and data. There are portlets that enable teams to share and publish portal documents as part of any business process.

Social features such as blogs, wikis, rating, ranking, and tagging can be integrated. Out-of-the-box federated search and taxonomy management tools are also available. It is also possible to build custom connections to 3rd party search engines.

All portals can be managed from a unified permissions-based management console. Administration can also be delegated to individual portals, which allows diverse, multi-dialect administrators to manage virtually all of their portal objectives.

Content can be delivered to a customer's PDA, cell phone, or other device of choice. Content can be targeted from different repositories based on dynamic user segments.

User experience can be improved by enabling inter-portlet communications, web services integration and display of 3rd party portlets in an integrated, contextual way.

One can enhance security and auditing of the site activity via a native reporting interface that reports on site modifications.

Web presence is globalized with extensive internationalization for portal users and administrators to support diverse audiences.

Live sites can be updated faster via incremental item changes instead of import/export of the entire navigation tree. Components, categories of components, or versions of components can be imported or exported using batch processes.

Friday, July 27, 2012

Content Management Systems Reviews - Open Text - ECM Suite - Web Content Management

The following products deliver the Web Content Management component of the OpenText ECM Suite:

OpenText Web Experience Management is a comprehensive solution for managing content in high-performance, scalable, transaction-oriented web applications.

OpenText Portal works in tandem with OpenText Web Experience Management to allow you to rapidly create mashups and composite applications built on Web services, repository data, and user interfaces.

OpenText Dynamic Portal for Third-Party Portals works in tandem with OpenText Web Experience Management to allow you to publish content directly into portals such as Liferay, IBM WebSphere, or Oracle WebCenter.

OpenText High Performance Web Delivery provides a unique, integrated combination of real-time caching and intelligent cache management capabilities. It improves Web site performance, makes Web sites more scalable, and in many cases reduces costs and manual overhead.

OpenText Semantic Navigation is a powerful and engaging technology that combines content analytics with information retrieval to automatically present Web site visitors with content that is relevant to what they’re looking for or viewing.

OpenText's Web Experience Optimization products give you the capabilities to optimize each phase of your on-line marketing campaign lifecycle and provide customers with a more relevant Web experience.

OpenText Campaign Management helps deliver highly personalized content to individual recipients through online and offline touch points. From simple campaigns to more sophisticated marketing programs, it enables the easy design, execution and measurement of multipart, results-driven communications across a variety of channels.

OpenText Business Integration Studio is a graphical development environment for rapidly integrating business applications, processes and information that facilitates the integration of OpenText's Web content management, social media, and portal management applications with disparate applications and content repositories inside and outside the enterprise.

Today's post is about OpenText Web Experience Management.

OpenText Web Experience Management (formerly Vignette Content Management) is a solution for creating and managing content for enterprise Internet, extranet, or intranet applications.

Users can:

create new sites from site templates derived from their existing sites or launch a sample site with out-of-the-box content types, workflows, and presentation assets;
apply graphical themes, page and region layouts to pages, layouts or whole sites;
browse content in contextual, multi-dimensional workspaces by site, content type, folder, category or explorer views.

It offers a user-friendly console, branded themes, and preferences that empower content owners to easily create and manage web content while automatically adjusting to day-to-day authoring actions. Users with no prior experience can edit pages and content using non-intrusive toolbars, improve productivity with contextual views that present the information they need when and where they need it, and publish in one click.

A powerful and easy-to-use site layout, theme, and content templating interface enables users to control how site content is presented and helps ensure consistent branding and communication to a variety of audiences, while reducing site development and maintenance cost.

Users can manage all content through an intuitive and configurable role-based management console. The console includes a ribbon menu and properties toolbar for commonly used items, a content tracker, a task inbox, and content search with saved queries. There are ergonomic controls for faster editing, including language, time zone, filters, and page and content settings.

Content items can be reused across multiple sites. For example, one article can be published on 100+ sites with a single management workflow.

Vanity URLs can be completely automated or manually defined to help increase site rankings in major search engines and support marketing campaigns, promotions, and messaging that can help increase the number of visitors to your site.

It integrates well with social and collaboration sites.

Users can create content using their favorite tools and web forms. Content from other repositories can be dynamically integrated or migrated for full cycle workflow and publishing management. Metadata management for images, podcasts, Adobe Flash files, and video allows editors to streamline approval, metadata tagging, and publishing of these assets.

Support for roles allows organizations to customize access for content creators, approvers, developers, and other users. This allows individuals to participate in selected processes automatically while standardizing and enforcing business practices that are exposed to users through delegated administration.

The sample site includes good workflows, content type modeling, and a best-practices template. The content type modeler provides an intuitive interface to create and modify content objects such as articles, products, news, etc. Content type evolution allows you to make common modifications.

Content could be in any data format including files, database records, XML documents, rich media assets such as images, videos, and podcasts. There are library services such as check-in and check-out, version control, rollback, content history, security, content classification, metadata indexing, and search. Content could be in any language.

There are tools to optimize the staging and delivery of managed content through web sites, portals, and other applications. The application streamlines the retrieval of content items according to their multi-faceted taxonomies and then transforms the items to suit the intended delivery context, application or device.

Users can publish content through automated workflows that deliver content to multiple delivery applications (web servers, databases, application servers). The publishing engine manages content dependencies, so content retains its context throughout the content lifecycle.

Search capabilities allow parametric search across content, content attributes, and metadata, both within the management console and in the site search features, and include a framework for 3rd party search, enabling high degrees of search accuracy.

There is an ability to manage user access centrally including delegated administration based on LDAP standards.

I will continue describing other products of the Web Content Management component of the OpenText ECM Suite in my next posts. Follow me and stay tuned!

Wednesday, July 25, 2012

Automatic Classification

In my previous posts, I mentioned that a taxonomy is necessary to create navigation to content. If users know what they are looking for, they are going to search. If they don't know what they are looking for, they will look for ways to navigate to content, in other words, browse through content. Taxonomies can also be used as a method of filtering search results so that results are restricted to a selected node in the hierarchy.

Once documents have been classified, users can browse the document collection, using an expanding tree-view to represent the taxonomy structure.
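
As a rough illustration of restricting search results to a selected taxonomy node, here is a minimal Python sketch. It assumes each document is tagged with a single slash-separated taxonomy path; the sample documents, the `path` field, and the function names are all made up for this example and do not come from any particular product.

```python
# Hypothetical sample data: each document is tagged with one taxonomy path.
DOCUMENTS = [
    {"title": "Laptop buying guide", "path": "Products/Computers/Laptops"},
    {"title": "Desktop repair notes", "path": "Products/Computers/Desktops"},
    {"title": "Printer setup guide", "path": "Products/Printers"},
]


def in_node(doc_path, node):
    """Return True if doc_path is the node itself or sits somewhere below it."""
    return doc_path == node or doc_path.startswith(node + "/")


def search(documents, keyword, node=None):
    """Keyword search over titles, optionally restricted to a taxonomy node."""
    hits = [d for d in documents if keyword.lower() in d["title"].lower()]
    if node is not None:
        hits = [d for d in hits if in_node(d["path"], node)]
    return hits


if __name__ == "__main__":
    # Unfiltered search, then the same search limited to one branch of the tree.
    print([d["title"] for d in search(DOCUMENTS, "guide")])
    print([d["title"] for d in search(DOCUMENTS, "guide", node="Products/Computers")])
```

Selecting a node higher in the tree widens the filter, and selecting a leaf narrows it, which is the same behavior a user sees when expanding or collapsing a tree view.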

When there are many documents involved, creating a taxonomy can be time consuming. There are a few tools on the market that provide automatic classification. Another use of automatic classification is to automatically tag content with controlled metadata (also known as Automatic Metadata Tagging) to increase the quality of the search results.

The tools that provide automatic classification are: Autonomy, ClearForest, Documentum, Interwoven, Inxight, Moxomine, Open Text, Oracle, SmartLogic.

These tools can classify any type of text documents. Classification is either performed on a document repository or on a stream of incoming documents.

Here is how this software works. Example: "International Business Machines today announced that it would acquire Widget, Inc. A spokesperson for IBM said: 'Big Blue will move quickly to ensure a speedy transition'."

The software classifies concepts rather than words. Words are first stemmed, that is, they are reduced to their root form. Next, stop words are eliminated. These include words such as a, an, in, the - words that add little semantic information. Then, words with similar meanings are equated using a thesaurus. For example, the words IBM, International Business Machines, and Big Blue are treated as equivalent.

Next, the software will use statistical or language processing techniques to identify noun phrases or concepts such as "red bicycle". Further, using a thesaurus, these phrases are reduced to distinct concepts that will be associated with the document. In this example, there are 3 instances of IBM, 2 instances of acquisition (acquire, speedy transition), and 1 instance of Widget, Inc.
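
The counting in this example can be reproduced with a small Python sketch. It covers only stop-word removal and thesaurus lookup; stemming and statistical phrase detection are left out, and the stop-word list and thesaurus entries are hand-made assumptions for this one sentence, not how any of the commercial tools is implemented.

```python
import re
from collections import Counter

# Assumed stop-word list, just large enough for the example sentence.
STOP_WORDS = {"a", "an", "in", "the", "it", "that", "for", "to", "will", "would"}

# Hypothetical thesaurus: phrase (tuple of lowercase words) -> concept.
THESAURUS = {
    ("international", "business", "machines"): "IBM",
    ("ibm",): "IBM",
    ("big", "blue"): "IBM",
    ("acquire",): "acquisition",
    ("speedy", "transition"): "acquisition",
    ("widget", "inc"): "Widget, Inc.",
}

TEXT = ('International Business Machines today announced that it would acquire '
        'Widget, Inc. A spokesperson for IBM said: "Big Blue will move quickly '
        'to ensure a speedy transition".')


def extract_concepts(text):
    """Count thesaurus concepts in text, trying the longest phrases first."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
    max_len = max(len(phrase) for phrase in THESAURUS)
    concepts = Counter()
    i = 0
    while i < len(words):
        for n in range(max_len, 0, -1):
            phrase = tuple(words[i:i + n])
            if phrase in THESAURUS:
                concepts[THESAURUS[phrase]] += 1
                i += len(phrase)  # skip past the words that were matched
                break
        else:
            i += 1  # no phrase starts at this word
    return concepts


if __name__ == "__main__":
    print(extract_concepts(TEXT))
```

Words that map to no thesaurus entry (such as "today" or "spokesperson") are simply ignored here; a real system would keep them as candidate concepts of their own.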

Approaches to Classification

Manual - requires individuals to assign each document to one or more categories. It can achieve a high degree of accuracy. However, it is labor intensive and therefore is more costly than automatic classification in the long run.

Rule-based - keywords or Boolean expressions are used to categorize a document. This is typically used when a few words can adequately describe a category. For example, if a collection of medical papers is to be classified by disease, each disease's scientific, common, and alternative names can be used to define the keywords for its category.
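
A rule-based classifier of this kind can be sketched in a few lines of Python. The two categories and their keyword lists below are illustrative assumptions; a real rule set would be written by subject experts and would usually support full Boolean expressions rather than a simple "any keyword matches" test.

```python
import re

# Hypothetical keyword rules: category -> names that signal it.
RULES = {
    "Influenza": ["influenza", "flu", "grippe"],
    "Hypertension": ["hypertension", "high blood pressure"],
}


def classify_by_rules(text, rules=RULES):
    """Return every category that has at least one keyword in the text."""
    lowered = text.lower()
    matches = []
    for category, keywords in rules.items():
        # \b keeps "flu" from matching inside words such as "fluid".
        if any(re.search(r"\b" + re.escape(k) + r"\b", lowered) for k in keywords):
            matches.append(category)
    return matches


if __name__ == "__main__":
    print(classify_by_rules("Seasonal flu vaccination outcomes"))
    print(classify_by_rules("Managing high blood pressure after influenza"))
```

A document can land in more than one category, and a document that matches no rule is left unclassified.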

Supervised Learning - most approaches to automatic classification require a human expert to initiate a learning process by manually classifying or assigning a number of "training documents" to each category. The classification system first analyzes the statistical occurrences of each concept in the example documents and then constructs a model or "classifier" for each category that is used to classify subsequent documents automatically. The system refines its model, in a sense "learning" the categories as documents are processed.
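
To make the idea of learning from training documents concrete, here is a minimal sketch that uses a Naive Bayes model. Naive Bayes is only one common statistical technique, chosen here for brevity; it is not necessarily what any of the vendors above use, and the class name and training sentences are invented for the example.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Tiny multinomial Naive Bayes text classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word frequencies
        self.doc_counts = Counter()              # category -> training documents seen
        self.vocabulary = set()

    def train(self, text, category):
        """Record one manually classified training document."""
        words = text.lower().split()
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocabulary.update(words)

    def classify(self, text):
        """Return the most probable category for a new document."""
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_category, best_score = None, -math.inf
        for category, counts in self.word_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[category] / total_docs)
            denominator = sum(counts.values()) + len(self.vocabulary)
            for word in words:
                score += math.log((counts[word] + 1) / denominator)
            if score > best_score:
                best_category, best_score = category, score
        return best_category


if __name__ == "__main__":
    clf = NaiveBayesClassifier()
    clf.train("company announced merger acquisition deal", "business")
    clf.train("quarterly earnings revenue profit", "business")
    clf.train("patients treated vaccine clinical trial", "medicine")
    clf.train("disease symptoms diagnosis treatment", "medicine")
    print(clf.classify("acquisition deal announced"))
    print(clf.classify("vaccine trial symptoms"))
```

Calling `train` again with newly reviewed documents updates the counts, which loosely mirrors the way a system refines its model as more documents are processed.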

Unsupervised Learning - these systems identify both groups or clusters of related documents as well as the relationship between these clusters. Commonly referred to as clustering, this approach eliminates the need for training sets because it does not require a preexisting taxonomy or category structure. However, clustering algorithms are not always good at selecting categories that are intuitive to users. On the other hand, clustering will often expose useful relationships and themes implicit in the collection that might be missed by a manual process. For these reasons, clustering generally works hand-in-hand with supervised learning techniques.
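
The sketch below shows clustering in its simplest form: documents are grouped by word overlap with no categories supplied in advance. It uses a greedy single-pass method with cosine similarity, which is a deliberately simple stand-in for the algorithms real products use; the similarity threshold and the sample sentences are assumptions made for the example.

```python
import math
from collections import Counter


def vectorize(text):
    """Represent a document as a bag of lowercase words."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def cluster(texts, threshold=0.3):
    """Assign each text to its most similar cluster, or start a new cluster."""
    clusters = []  # each entry: {"centroid": Counter, "members": [text, ...]}
    for text in texts:
        vec = vectorize(text)
        best = max(clusters, key=lambda c: cosine(vec, c["centroid"]), default=None)
        if best is not None and cosine(vec, best["centroid"]) >= threshold:
            best["members"].append(text)
            best["centroid"].update(vec)  # summed word counts act as the centroid
        else:
            clusters.append({"centroid": Counter(vec), "members": [text]})
    return [c["members"] for c in clusters]


if __name__ == "__main__":
    docs = [
        "merger acquisition deal announced",
        "vaccine clinical trial results",
        "acquisition deal closes",
        "clinical trial vaccine approved",
    ]
    for members in cluster(docs):
        print(members)
```

Note that the output is a set of unnamed groups. Deciding what each group means, and whether it is a sensible category, is still left to a person.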

Each of these approaches is optimal for a different situation. As a result, classification vendors are moving to support multiple methods.

Most real world implementations combine search, classification, and other techniques such as identifying similar documents to provide a complete information retrieval solution. Organizations having document repositories will generally benefit from a customized taxonomy.

Once documents are clustered, an administrator can first rearrange, expand or collapse the auto-suggested clusters or categories, and then give them intuitive names. The documents in each cluster serve as initial training sets for supervised-learning algorithms that will be used subsequently to refine the categories. The end result is a taxonomy and a set of topic models that are fully customized for an organization's needs.

Building an extensive custom taxonomy can be a large expense. However, automated classification tools can reduce the taxonomy development and maintenance cost.

Organizations with document collections that span complex areas such as medicine, biotechnology, and aerospace will have a large taxonomy. However, there are ways to refine a taxonomy so that it does not become an overwhelming task.

Together, enterprise search and classification provide an initial response to information overload.