Thursday, May 31, 2012

User Study

You have decided to implement a content management or document control system. Everything you create in this system is for users. If it meets your users' requirements, they will use it; if it does not, they will not.

They are the ultimate designers. Design a system that confuses users and they will go somewhere else; build a system that frustrates users and they will not use it. No matter what you do, they will find every possible excuse for why they cannot and should not use your system. User adoption will be very difficult, almost impossible, if you deploy a system that is not based on your users' requirements.

But who are your users? Why are they looking for information? What information are they looking for? How are they looking for it? How would they like to search for it? How would they like to author it? How would they like to use your system? These and similar questions should be put to your users before you deploy any system. You ask them during a user study, which should be done at the beginning of your project. This is the subject of today's post.

How do you study users and their requirements? There are a few methods: surveys, focus groups, interviews, and user testing. Select a broad spectrum of users across the entire organization. Include major stakeholders, department or unit managers, and the major authors and consumers of information.

Surveys

This research tool provides an opportunity to gather input from a large number of people. Surveys can be used to gather qualitative or quantitative data. You can send them by email or use a free survey tool like SurveyMonkey. When creating a survey, limit the number of questions if you want a reasonable response rate; if there are too many questions, users may not return the survey. You may also have to guarantee anonymity and offer an incentive.

Since there is little opportunity for follow-up questions or dialogue, surveys do not allow you to gather rich data about users' information-seeking behavior. However, survey results can provide you with a powerful political tool. For example, if 90% of users say that they have a problem searching for documents and are frustrated, then this could be a compelling reason to improve search by acquiring a better system or improving the existing one.

Focus Groups

When conducting focus groups, you gather groups of people who would be users of your system. You might ask them what features they would like to see, demonstrate a prototype of the system, and then ask for their perceptions of it and their recommendations for improvement.

Focus groups are great for generating ideas about possible content and functions for the system. By getting several people from your target audience together and facilitating a brainstorming session, you can quickly find yourself with a list of suggestions. Focus groups can also be used to prove that a particular approach does or does not work.

Interviews

Face-to-face sessions involving one user at a time are a central part of a user study. You would typically begin with questions, such as:

  • What do you do in your current role?
  • What information do you need to do your job?
  • How do you search for this information?
  • What information is most difficult to find?
  • What do you do when you cannot find something?
  • Do you create documents that are used by other people or departments?
  • What do you know about the life cycle of your documents?
  • What happens after you create them?
  • If you could select a few features for the upcoming content management system, what would they be?

In determining what questions to ask, and especially how to learn what features users would like in the system, it is important to remember that users are not content managers or information architects. They do not have the understanding or vocabulary for a technical dialog about the system or its architecture. So, be prepared to interpret their general comments, map them to specific system features that you already know about, and then present those features back to them in your response.

User Testing

In basic user testing, you ask a user to perform a task, for example, to find a piece of information in the current environment. You can ask the user to browse or to search. Allowing about three minutes per task, ask users to think out loud while they navigate. Take good notes and make sure you capture what they say and where they go. You may want to count clicks and time each session. Include a range of audience types. It is particularly important to include people who are technically minded (for example, engineers) and people who are not (for example, marketing), as they will demonstrate different behavior.
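The click counting and timing described above can be captured with a simple session log. A minimal sketch, assuming nothing about your tooling; the participant ID and task name are invented for illustration:

```python
import time

class SessionLog:
    """Records one user-testing session: per-task elapsed time and click count."""
    def __init__(self, participant):
        self.participant = participant
        self.tasks = {}          # task name -> {"seconds": float, "clicks": int}
        self._task = None
        self._start = None
        self._clicks = 0

    def start_task(self, name):
        self._task, self._clicks, self._start = name, 0, time.monotonic()

    def click(self):
        self._clicks += 1

    def end_task(self):
        # Store what was observed for the task just finished.
        self.tasks[self._task] = {
            "seconds": time.monotonic() - self._start,
            "clicks": self._clicks,
        }

log = SessionLog("engineer-01")
log.start_task("find latest datasheet")
log.click(); log.click(); log.click()
log.end_task()
print(log.tasks["find latest datasheet"]["clicks"])  # 3
```

Comparing such logs across technical and non-technical participants makes the behavioral differences mentioned above concrete and countable.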

A user study is an iterative process, so you may have to repeat the same method a few times. But whatever you do, do not underestimate its value. If you want user adoption in the end, conduct the user study at the beginning.

Wednesday, May 30, 2012

Content Management Systems Reviews - Drupal

Drupal is a free and open-source content management system (CMS) and content management framework (CMF) written in PHP and distributed under the GNU General Public License. It is used as a back-end system for at least 1.5% of all websites worldwide ranging from personal blogs to corporate, political, and government sites. It is also used for content management and business collaboration.

The standard release of Drupal, known as Drupal core, contains basic features common to content management systems. These include user account registration and maintenance, menu management, RSS feeds, page layout customization, and system administration. The Drupal core installation can be used as a brochureware website, a single- or multi-user blog, an Internet forum, or a community website providing for user-generated content.

As of March 2012 there are more than 15,648 free community-contributed addons, known as contrib modules, available to alter and extend Drupal's core capabilities and add new features or customize Drupal's behavior and appearance. Because of this plug-in extensibility and modular design, Drupal is sometimes described as a content management framework. A content management framework (CMF) is a system that facilitates the use of reusable components or customized software for managing web content. It shares aspects of a web application framework and a content management system (CMS). Drupal is also described as a web application framework, as it meets the generally accepted feature requirements for such frameworks.

Although Drupal offers a sophisticated programming interface for developers, no programming skills are required for basic website installation and administration. Drupal runs on any computing platform that supports both a web server capable of running PHP and a database to store content and settings.

Drupal Core

In the Drupal community, the term "core" means anything outside of the "sites" folder in a Drupal installation. Drupal core is the stock element of Drupal. In its default configuration, a Drupal website's content can be contributed by either registered or anonymous users (at the discretion of the administrator) and is made accessible to web visitors by a variety of selectable criteria. Drupal core also includes a hierarchical taxonomy system, which allows content to be categorized or tagged with key words for easier access. Drupal maintains a detailed changelog of core feature updates by version.

Core Modules

Drupal Core includes optional modules which can be enabled by the administrator to extend the functionality of the core website. The core Drupal distribution provides a number of features, including:
  • Access statistics and logging
  • Advanced search
  • Blogs, books, comments, forums, and polls
  • Caching and feature throttling for improved performance
  • Descriptive URLs
  • Multi-level menu system
  • Multi-site support
  • Multi-user content creation and editing
  • OpenID support
  • RSS feed and feed aggregator
  • Security and new release update notification
  • User profiles
  • Various access control restrictions (user roles, IP addresses, email)
  • Workflow tools (triggers and actions)
Core Themes

Drupal core includes core themes, which customize the "look and feel" of Drupal sites, for example, Garland, Blue Marine etc. The Color Module, introduced in Drupal core 5.0, allows administrators to change the color scheme of certain themes via a browser interface.

Localization

Drupal is available in 55 languages. Drupal localization is built on top of gettext, the GNU internationalization and localization (i18n) library.

Auto-update Notification

Drupal can automatically notify the administrator about new versions of modules, themes, or the Drupal core. Such a feature can be useful for security fixes.

Extending the Core

Drupal core is modular, defining a system of hooks and callbacks, which are accessed internally via an API. This design allows third-party contributed (often abbreviated to "contrib") modules and themes to extend or override Drupal's default behaviors without changing Drupal core's code. Drupal isolates core files from contributed modules and themes. This increases flexibility and security and allows administrators to cleanly upgrade to new releases without overwriting their site's customizations.
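The hook-and-callback design described above can be illustrated outside of PHP. The sketch below shows the general pattern in Python; the hook name and node fields are invented for illustration and are not Drupal's actual API:

```python
# Registry mapping hook names to the callbacks that "contrib modules" register.
hooks = {}

def register_hook(name, callback):
    """A contributed module extends behavior by registering, not by patching core."""
    hooks.setdefault(name, []).append(callback)

def invoke_hook(name, *args):
    """Core calls every callback registered for a hook; core code never changes."""
    return [callback(*args) for callback in hooks.get(name, [])]

# Two hypothetical modules add their own rendering for a content node.
register_hook("node_view", lambda node: f"<h1>{node['title']}</h1>")
register_hook("node_view", lambda node: f"<p>{node['body']}</p>")

html = invoke_hook("node_view", {"title": "Hello", "body": "First post"})
print("".join(html))  # <h1>Hello</h1><p>First post</p>
```

The point of the pattern is the one the paragraph makes: extensions live outside the core dispatch loop, so upgrading the core does not overwrite customizations.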

Contributed Modules

Contributed modules offer image galleries, custom content types and content listings, WYSIWYG editors, private messaging, third-party integration tools and more. The Drupal website lists over 11,000 free modules.

Some of the most commonly used contribution modules include:

Content Construction Kit (CCK): allows site administrators to dynamically create content types by extending the database schema. "Content type" describes the kind of information. Content types include, but are not limited to, events, invitations, reviews, articles, and products.
Views: facilitates the retrieval and presentation of content to site visitors through a database abstraction system.
Panels: a drag-and-drop layout manager that allows site administrators to visually design their site.

Drupal Distributions

Distributions are collections of pre-configured themes and modules for feature-rich websites, giving you a head start on building your site. With distributions, users can build their own online communities, media portals, online stores, and more.

Friday, May 25, 2012

Enhancing SharePoint Through Information Governance

According to Microsoft, 20,000 new SharePoint users have been added every day for the past five years. As one of the most popular departmental content management solutions, SharePoint has left silos littering the organizational landscape with little or no centralized control. Enterprises are seeking to do more with less, leverage what they already own, and take advantage of SharePoint 2010 functionality.

Technologies are available to tag content, classify it against organizational taxonomies, preserve and protect information through the automatic identification of records and privacy data, and support migration. These building blocks work well in the SharePoint environment and add functionality transparently to the end user.

Building Block #1: Metadata

An enterprise metadata repository is the primary building block in the framework, enabling the proactive management of content. This component is tightly integrated with the management of the content life cycle. Enterprises struggle with managing content because end users are unable to tag content accurately and consistently for search, storage, records identification, and archiving purposes. Most organizations still rely on the end user for appropriate tagging. Only by eliminating the human factor can enterprise metadata management, and subsequently content life-cycle management, be achieved.

Through automatic semantic metadata generation and auto-classification as content is created or ingested, the taxonomy component integrates with the Term Store to seamlessly manage the metadata. By eliminating end-user tagging, a comprehensive metadata repository can be easily developed, deployed, and managed.

Building Block #2: Search

For many organizations, content exists in numerous locations, in diverse repositories, and replicated across various silos. Most end users are unable to find relevant information to support business objectives, resulting in the inability to re-use and re-purpose content. This leads to impaired decision making and decreased organizational agility.

Whether the enterprise search is SharePoint or FAST, the delivery of meaningful results depends on the ability to effectively index and classify content and utilize taxonomies to better manage the content. The search engine provides the features, functions and interface, while the technologies provide the tagging and classification structure to deliver relevant results.

Building Block #3: Governance

The enterprise governance structure allows employees to work in the most efficient and effective way possible by giving them access to information in a controlled and secure manner. This building block consists of tools that ensure information quality, maintain content life-cycle, address the retention and disposition of records, secure and protect privacy, and establish standards when dealing with information.

Building Block #4: Policy

The application of policy must be deployed from an enterprise perspective and address the entire portfolio of information assets. The technology automates the identification of concepts, records, and privacy data. The assignment of custom content types and workflows can be initiated for disposition, requiring much less user involvement. This solution ensures consistency, improves record-keeping, and enables the establishment of monitoring and auditing processes to ensure proof of compliance and data protection.

Building Block #5: Privacy

The demarcation of who is responsible for the protection of privacy data is becoming blurred. Each business function, such as legal, human resources, and product development, may have a unique view of what is confidential. It remains the responsibility of the organization to set the policies, and of the stakeholders to protect and hold certain information confidential.

By leveraging content types to drive information rights management, coupled with automatic semantic metadata generation and organizationally defined descriptions, unknown privacy exposures can be identified and routed automatically to the appropriate repository for disposition.

Building Block #6: Enterprise and Web 2.0

SharePoint provides technology to implement collaboration tools. These tools encourage collaboration and link employees, partners, suppliers, and customers to share information. Adding structure to chaos provides more control over collaboration while still giving the audience the ability to interact and share information. By adding control via classification and providing an integrated view of organized content through the taxonomy structure, end users retain the ability to contribute freely while the enterprise can more effectively use these tools as a business advantage.

Tuesday, May 22, 2012

Content Management Systems Reviews - Documentum - XML Platform for Content Reuse

Designed for content-oriented applications such as publishing, archiving, information mashups, regulatory filings, collaboration, and knowledge management, XDB provides a scalable architecture to warehouse content in an application-neutral format that does not depend on any particular application for information retrieval.

The XML-based Documentum platform reduces costs by enabling technical writers to reuse rather than reinvent content. You can:
  • manage content at a granular level thus increasing the likelihood that a particular piece of content can be used without modification;
  • automate the assignment of attributes to content so that writers spend less time describing content;
  • leverage advanced search techniques to enable writers to easily find, reuse, and repurpose content;
  • manage images and other types of rich media in a common content repository;
  • separate content from structure and format, using external DTD and schemas to control the document structure;
  • leverage automatic transformation and publishing capabilities to package information for different delivery channels;
  • significantly reduce localization and translation costs by translating only the content that has changed.
XDB allows content schemas to be easily modified to adapt to changing information requirements, supports queries against complex structures, and supports automatic versioning of content and schemas.

Common applications for XDB include the following:

Dynamic Publishing

XDB has scalability, performance, and functionality to power high-volume dynamic sites that deliver highly relevant targeted content. You can also take advantage of Dynamic Delivery Services, a platform that makes it easy to build, maintain, and deploy content delivery applications on XDB.

Content Warehousing

XML is a perfect format for aggregating content into a content warehouse to support archiving. It is application neutral. It is also self-describing, which means that it retains its information value even when the application used to create it is no longer available. You can easily modify content definitions (schemas) to adapt to changing needs. What is required is a high-performance repository for efficient storage and access and a flexible query language for mining the content. This is where XDB comes in.


XDB Features

XDB offers the following features:

  • Scalable, high-performance architecture: unlimited storage capacity, minimal storage overhead, low memory requirements, extensive tuning options, high concurrency, reduced processing overhead for compound operations, unlimited horizontal scaling.
  • High reliability with simplified administration: high availability, robust transactions, intuitive navigation, powerful and easy-to-use administration, embeddability, easy copying between databases.
  • Comprehensive application development: complete API, built-in transformations, rapid development.
  • Powerful search, retrieval, linking, and updates.
  • Integration and interoperability: content validation, WebDAV interface, database import, file import.
  • Extended content management: manage non-XML content, versioning, metadata, XML differencing.

Saturday, May 19, 2012

Content Analysis

Content analysis is the cornerstone of your content management initiative. It is a careful review of the documents and objects that exist, and it helps to define the scope of your project.

The main purpose of content analysis is to provide data that is critical to the development of a solid information architecture. It helps you reveal patterns and relationships within content and metadata that can be used to better structure, organize, and provide access to that content. It will also help you configure your content management system accordingly. It will help you in the design phase when you begin defining content types and metadata, and it provides valuable input into the broader design of organization, labeling, navigation, and search systems.

I recommend that my clients conduct content analysis in the form of a detailed audit. Early in the research phase, a high-level content survey can be a useful tool for learning about the scope and nature of content. Later in the process, a content audit or inventory produces a roadmap for the project and facilitates an organized approach to authoring and managing content.

Gathering Content

To begin, you will need to find and analyze a representative sample of your organization's content. Try to capture a few items of each content type. Your content could include white papers, annual reports, forms, drawings, financial documents, marketing brochures, press releases, etc. Capture a diverse set of content types, and include content in different formats, such as textual documents, video and audio files, archived e-mail messages, etc.

Make sure that you get samples from engineering, marketing, customer support, finance, human resources, sales, research, etc. Try to represent a broad range of subjects in your content sample. Consider intended audience, document length, languages, etc. Consider also the importance of certain content types over others.

If there is already a content management system in place, you can use it to get information about existing content.

During content analysis, note the following:

Structural Metadata - describes the information hierarchy of the object. Is there a title? Are there discrete sections or chunks of content? Might users want to access these chunks independently?

Descriptive Metadata - think of all the different ways you might describe the object: topic, audience, format, and so on. There should be at least a dozen different ways to describe many of the objects in your analysis.

Administrative Metadata - describes how the object relates to its business context. Who created it? Who owns it? When was it created? When should it be removed?
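The three metadata groups above can be captured in a simple inventory record as you audit each item. A minimal sketch; the field names and sample values are invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class ContentItem:
    # Structural metadata: the information hierarchy of the object.
    title: str
    sections: list = field(default_factory=list)
    # Descriptive metadata: the different ways the object can be described.
    topic: str = ""
    audience: str = ""
    fmt: str = ""
    # Administrative metadata: business context and life cycle.
    author: str = ""
    owner: str = ""
    created: str = ""
    remove_after: str = ""

item = ContentItem(
    title="2011 Annual Report",
    sections=["Letter to Shareholders", "Financials"],
    topic="finance", audience="investors", fmt="PDF",
    author="J. Smith", owner="Finance", created="2012-01-15",
)
print(item.topic)  # finance
```

Collecting one such record per sampled object gives you a dataset in which the patterns and relationships discussed below become visible.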

You might ask yourself these questions:

  • What is this object?
  • How can I describe this object?
  • What distinguishes this object from others?
  • How can I make this object findable?

Look for patterns and relationships that emerge as you study many content objects. Because you need to recognize patterns within the context of the full sample, content analysis is by necessity an iterative process. It may be on the second or third pass over a particular document that you discover a useful solution.

Content Map

At the end of this process, you may want to create a content map: a visual representation of the existing information environment and a tool for understanding your content. It will help you visualize relationships between content categories, explore navigation pathways within content areas, figure out the structure, organization, and location of existing content, and ultimately come up with ideas for providing improved access to your organization's content.

Monday, May 14, 2012

Document Control System Implementation

Document control is the revision control of documents: assigning and tracking document numbers, change control management, assuring document compliance, and document routing and tracking. It can also include Bill of Materials (BOM) and Approved Vendor List (AVL) management. Document control can either exist separately or be part of content management activities.

In many companies, especially in regulated industries, dedicated document control staff perform document control functions separately and have no content management duties. Document control is usually part of QA. It is a mandated function in regulated industries and is part of ISO 9001 and GMP/GxP requirements.

The primary purpose of document control is to ensure that only current documents, and not superseded ones, are used to perform work, and that obsolete versions are removed. Document control also ensures that current documents are approved by the persons competent in and responsible for the specific job, and are distributed to the places where they are used.

How do you implement a document control system, and where do you start? This is the subject of today's post.

In order for your document control project to be successful, I recommend that you follow these steps in this specific order.

Select a System

Select an electronic system in which you are going to control your documents. If you already have a content management system in place, it can serve the document control purpose as well. If you don't have a content management system, select a system specifically designed for document control. The most widely used systems of this kind are Agile, Arena, and Omnify. If in the future you decide to implement a content management system, you will be able to integrate it with your document control system.

Define Controlled Documents

A controlled document is any document that is used to perform work. A reference document is any document that is used for reference only and must NOT be used to perform work.

Identify all your document types, for example specifications, drawings, schedules, meeting minutes, etc. Among these documents, identify which documents are going to be controlled documents and which are going to be reference documents.

Be careful about designating documents as controlled. A controlled document must be approved by authorized approvers before it is valid and can be used; the author or modifier of a controlled document must get it approved before it can be used to perform work. Each controlled document needs a number, all controlled documents must be accounted for, and their distribution is strictly controlled. For these reasons, choose your controlled document types with great care.
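The rule that a controlled document may not be used until it is fully approved can be enforced with a simple status check. A hypothetical sketch; the document number and the required approver roles are assumptions, not a prescription:

```python
class DocumentNotApproved(Exception):
    """Raised when a controlled document is released before full approval."""

class ControlledDocument:
    def __init__(self, number, title):
        self.number = number                     # every controlled document gets a number
        self.title = title
        self.approvals = set()                   # roles that have signed off
        self.required = {"QA", "Engineering"}    # assumed approver roles

    def approve(self, role):
        self.approvals.add(role)

    def release_for_work(self):
        """Only a fully approved document may be used to perform work."""
        missing = self.required - self.approvals
        if missing:
            raise DocumentNotApproved(f"{self.number} missing: {sorted(missing)}")
        return True

doc = ControlledDocument("SPEC-0042", "Assembly Procedure")
doc.approve("QA")
doc.approve("Engineering")
print(doc.release_for_work())  # True
```

A document with any approval missing raises an error instead of being released, which is exactly the gate the paragraph above describes.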

So, if a document does not need to be approved and its distribution does not need to be strictly controlled, make it a reference document. Conversely, if a document is going to be used to perform work, and that work must be controlled for quality and safety, then the document should be controlled.

Define Document Approvers

Define who on your staff is going to approve documents when they are created or changed, and define the procedure for document approval.

ECO Process

When a new document is created or an existing document is changed, an Engineering Change Order (ECO) is used to document and approve the creation or change. It can also be called an Engineering Change Notice (ECN) or Document Change Notice (DCN). The ECO outlines the proposed change, lists the product or part(s) that would be affected, and requests review and approval from the individuals who would be impacted or charged with implementing the change. ECOs are used to make modifications to components, assemblies, associated documentation, and other types of documents.

The change process starts when someone identifies an issue that may need to be addressed with a change to the product, and it ends when the agreed-upon change is implemented. ECOs are used in between to summarize the modifications, finalize the details, and obtain all necessary approvals. Every time a document is created or changed, an ECO needs to be created and used to get the document approved.

You will need to assign a number to each ECO, and if you are not using an electronic system for generating ECOs or maintaining an ECO list, you will have to scan and upload each ECO into your system.
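Sequential ECO numbering can be sketched as a small counter over a change log. The numbering format and field names below are assumptions for illustration:

```python
import itertools

class EcoLog:
    """Assigns sequential ECO numbers and records what each change covers."""
    def __init__(self, prefix="ECO"):
        self._seq = itertools.count(1)   # next sequence number to hand out
        self.prefix = prefix
        self.entries = {}

    def open_eco(self, description, affected_documents):
        number = f"{self.prefix}-{next(self._seq):04d}"
        self.entries[number] = {
            "description": description,
            "affected": list(affected_documents),
            "approved": False,
        }
        return number

log = EcoLog()
n1 = log.open_eco("Update torque spec", ["SPEC-0042"])
n2 = log.open_eco("New drawing revision", ["DWG-0107"])
print(n1, n2)  # ECO-0001 ECO-0002
```

Whatever system you use, the essential properties are the ones shown: every ECO gets a unique number, and the affected documents are recorded against it.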

Taxonomy

Every information system should include two access points to information: a search function and a browse function. Users use search when they know exactly what they are looking for, and browse when they do not. A taxonomy needs to be created to accommodate the browse function in the system.

Users do not always know what they are looking for. In fact, in many cases, users either do not know what they are looking for or know it but cannot find it using search, so they will look for other ways to find documents. It is easy to find uncategorized documents when there are just a few of them in the system; when there are many items, finding them becomes very difficult.

Create a taxonomy for your documents. The taxonomy should be validated in the user study and in user-side testing when necessary, and adjusted as needed.
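A browsable taxonomy is essentially a tree of categories that users can walk down. A minimal sketch; the category names are invented for illustration:

```python
# Each node maps a category name to its subcategories; leaves have no children.
taxonomy = {
    "Engineering": {
        "Specifications": {},
        "Drawings": {},
    },
    "Quality": {
        "Procedures": {},
        "Audit Reports": {},
    },
}

def browse(tree, path=()):
    """Yield every navigation path a user could browse to."""
    for name, children in tree.items():
        here = path + (name,)
        yield " > ".join(here)
        yield from browse(children, here)

paths = list(browse(taxonomy))
print(paths[1])  # Engineering > Specifications
```

Enumerating the paths this way is also a cheap validation aid: reviewing the full list with users during the user study quickly exposes categories they would never think to look under.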

Metadata, Naming Conventions, Controlled Vocabulary

Metadata

Metadata values for documents need to be defined to accommodate searching for documents in the system. Each document type should have metadata assigned to it. Metadata values are the criteria that users will use to search for documents.

The general system search accommodates full-text search of content. This is sufficient when there are just a few documents in the system; when there are many, full-text search retrieves long lists of irrelevant items, and users are not going to browse through them. To make search precise, metadata is necessary: when metadata is present, searches can be performed against metadata rather than full text.
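The precision difference between full-text and metadata search can be sketched directly. The document records and field names below are invented for illustration:

```python
documents = [
    {"title": "Pump Assembly Spec", "type": "specification",
     "department": "Engineering", "text": "torque values for pump assembly"},
    {"title": "Pump Marketing Brochure", "type": "brochure",
     "department": "Marketing", "text": "our pump is the best pump"},
]

def full_text_search(docs, term):
    # Matches the term anywhere in the body: high recall, low precision.
    return [d for d in docs if term in d["text"]]

def metadata_search(docs, **criteria):
    # Matches only documents whose metadata satisfies every criterion.
    return [d for d in docs
            if all(d.get(k) == v for k, v in criteria.items())]

print(len(full_text_search(documents, "pump")))               # 2
print(len(metadata_search(documents, type="specification")))  # 1
```

With two documents the difference is trivial; with tens of thousands, the full-text result list becomes the "long list of irrelevant items" described above, while the metadata query stays precise.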

Metadata should be validated in the user study and in user-side testing when necessary, and adjusted as needed.

Naming Conventions

Naming conventions are very important: they let users identify documents in a list without opening each one. Naming conventions should be created for each document type, validated in the user study and in user-side testing when necessary, and adjusted as needed.
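A naming convention per document type can be checked automatically at submission time. A sketch assuming a hypothetical "TYPE-NNNN Title" pattern; the type codes are illustrative, not a standard:

```python
import re

# Assumed convention: a type code, a four-digit number, then a title.
NAMING_RULES = {
    "specification": re.compile(r"^SPEC-\d{4} .+$"),
    "drawing":       re.compile(r"^DWG-\d{4} .+$"),
}

def valid_name(doc_type, name):
    """True only if the name matches the convention defined for its type."""
    rule = NAMING_RULES.get(doc_type)
    return bool(rule and rule.match(name))

print(valid_name("specification", "SPEC-0042 Pump Assembly"))  # True
print(valid_name("drawing", "pump drawing final v2"))          # False
```

Rejecting non-conforming names at entry keeps document lists scannable, which is the whole point of the convention.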

Controlled Vocabulary

A controlled vocabulary is a list of controlled terms that should be used for some of the metadata fields. These controlled terms should be the standard terms used in standard publications and documents and by the majority of users. A controlled vocabulary helps ensure that metadata values are consistent, and consistent metadata ensures high-precision search.
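Enforcing the controlled vocabulary at tagging time is what keeps metadata consistent. A minimal sketch with invented fields and terms:

```python
# Allowed terms per metadata field; anything else is rejected at entry time.
CONTROLLED_VOCABULARY = {
    "department": {"Engineering", "Marketing", "Quality", "Finance"},
    "status": {"Draft", "Approved", "Obsolete"},
}

def validate_metadata(metadata):
    """Return the fields whose values fall outside the controlled vocabulary."""
    errors = {}
    for name, value in metadata.items():
        allowed = CONTROLLED_VOCABULARY.get(name)
        if allowed is not None and value not in allowed:
            errors[name] = value
    return errors

print(validate_metadata({"department": "Engineering", "status": "Approved"}))  # {}
print(validate_metadata({"department": "Eng."}))  # {'department': 'Eng.'}
```

Catching "Eng." before it is saved is what prevents the same department from being tagged three different ways, which would otherwise fragment search results.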

Assure Documents Distribution

During this step, you need to make sure that everyone who needs the document gets a copy.

Distribution may be physical (paper documents) or electronic. When posting a document on an intranet or other electronic system, ensure that everybody who needs the new document knows about the posting (e.g., through email or workflow notifications). When distribution is physical, documents need to be stamped to identify that they are controlled documents and that users must verify that they have the most current version before starting work.

Controlled documents need to be watermarked so that if they are printed, users know that they need to verify their version before using them.

An inventory of controlled documents should be created with the exact location of each controlled document.

Remove Obsolete Documents

This is easy if you use an electronic document management system but is more complicated with hard copies: each hard copy must be replaced when the document has been changed.

You may request that the receivers of new documents send back the obsolete ones. If for some reason you need to retain obsolete versions, mark them to avoid unintended use; many organizations use an "obsolete document" stamp.

Sunday, May 13, 2012

Content Management Systems Review - Open Text - ECM Suite - Records Management and Archiving

In my last post on Open Text ECM Suite Content Lifecycle Management group of products, I mentioned that this group consists of document management, imaging, records management, and archiving. I described document management in my previous post and I described imaging solution yesterday.

Today, I am going to describe records management and archiving solutions of Content Lifecycle Management group of products.

Records Management

OpenText Records Management (formerly Livelink ECM - Records Management) delivers records management functions and capabilities to provide full lifecycle document and records management for the entire organization. This product allows your organization to file all corporate holdings according to the organizational policies, thereby ensuring regulatory compliance and reducing the risks associated with audit and litigation. Records management can manage content in a number of different repositories.

The records management product provides services to core ECM Suite components. Its features are embedded in the interfaces of the respective applications, enabling users to access records management functions in the interface they are most familiar with.

Users can access records management solution from a standard Web browser. It provides a common interface to access all forms of information, such as images, paper, word processing documents, spreadsheets and email. Users can apply metadata to submitted documents to enhance search capabilities. Metadata is indexed and can be used to more easily find, retrieve and generate reports on documents based on your custom criteria.

Metadata, retention and disposition rules can be applied immediately upon the classification of a record to all content regardless of type. The product supports the application of multiple file classifications, holds, and retention schedules to individual records. Content can hold two or more record classifications and be retained according to multiple retention schedules. You can combine classifications and schedules to meet the unique retention and disposition needs of content.

Users can create Record Series Identifiers (RSIs) and define a disposition schedule for each RSI. Apply rules can be created that define which records belong to which RSI, and rule searches can return the documents marked with a given RSI value. A file plan can be created, with which an RSI or a Records Management object can be associated.

The product includes the ability to manage physical items such as paper records, equipment, and more, adding representative object graphics to electronic storage repositories. In addition, it supports the use of XML-based color labels and barcode labels for physical records such as folders, boxes and shelves directly from within the Records Management interface.

There are a few options for classifying records. You can classify records interactively with a single click, or have many records automatically inherit retention schedules and classifications by moving them into folders at the same time. You can also automatically import retention policies and other data. Records Management maps record classifications to retention schedules.

All Records Management objects have Access Control Lists. In addition, security settings can be modified globally. Administrators can periodically review vital records to ensure appropriate classification and disposition.

All activities can be fully audited. You can track which records were purged from the system and generate high-level views of all system activity.

Disposition of records can be automated according to organizational requirements. You can create lists of records that are ready for review or final disposition and route them to individuals for review and approval.

You can perform disposition searching against items. Searching calculates the disposition date of the items based on the RSI schedule and returns the records that are ready for deletion, archiving, or moving on to the next stage in their lifecycle.

The product supports vital records identification and the cycling of vital records at pre-set periods, such as monthly, quarterly, or annually.

You can make records official to prevent users from modifying them. You can also apply legal holds: you can suspend retention schedules and protect content from deletion with legal holds. You can apply multiple legal holds to documents at the same time.
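The retention, disposition, and legal hold rules described above come down to a simple calculation: a record's disposition date is its classification date plus the RSI retention period, and records under a hold are excluded no matter how old they are. A minimal sketch in Python (the record structure and retention values are illustrative, not OpenText's actual data model):

```python
from datetime import date, timedelta

# Illustrative records: each carries a classification date, an RSI retention
# period in days, and any legal holds currently applied to it.
records = [
    {"id": "R1", "classified": date(2005, 1, 10), "retention_days": 365 * 7, "holds": []},
    {"id": "R2", "classified": date(2011, 6, 1),  "retention_days": 365 * 7, "holds": []},
    {"id": "R3", "classified": date(2004, 3, 5),  "retention_days": 365 * 7, "holds": ["Case-42"]},
]

def ready_for_disposition(records, today):
    """Return ids of records past their disposition date, excluding any under legal hold."""
    ready = []
    for r in records:
        disposition_date = r["classified"] + timedelta(days=r["retention_days"])
        if disposition_date <= today and not r["holds"]:
            ready.append(r["id"])
    return ready
```

Applying a hold is then just appending to the `holds` list, which suspends the retention schedule for that record until the hold is released.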

You can also manage physical records:
  • barcode label management supports the use of XML-based color labels and barcode labels for physical records such as folders, boxes, and shelves;
  • warehouse management - box items and send them to off-site storage;
  • circulation management - allow users to borrow items, request items for future borrowing, and pass single or multiple records along in a single step.
You can manage records "in place" in their source repositories, or physically extract them into a secure centralized repository and automatically replace them with shortcuts. This enables compliant archiving in a centralized storage environment while still allowing users to access records directly from their applications.


Archiving

The OpenText archiving solution is powered by records and retention management.

The following products deliver the archiving component of the OpenText ECM Suite:

OpenText Archiving for SAP Solutions - links document content to the SAP business context. Archiving for SAP Solutions enables you to create, access, manage, and securely archive all SAP content. Archiving for SAP Solutions is a highly scalable and secure repository for business-critical SAP business documents and data. It is designed for the complete range of business documents such as incoming/outgoing invoices, orders, delivery notes, quality certificates, HR employee documents, archived SAP data, and more.

OpenText File System Archiving (formerly Livelink ECM - File System Archiving) provides secure, long-term storage of content from file system drives, while ensuring content integrity, avoiding redundancy, reducing storage costs, and enabling records management. Physical files are managed by a secure document archive.

All contextual metadata information, including auditing, version control, security permissions, and more, is handled by a dedicated metadata management layer. You can configure storage rules according to relevant criteria, such as file size, date, or folder, to control what file system content is archived and to which storage media.

You can migrate content out of file systems and replace items with hyperlinked shortcuts to archived content. Clicking a shortcut retrieves the corresponding archived document, creating a seamless end-user experience.
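The two mechanisms just described — storage rules that route content to a tier, and shortcut replacement in the source file system — can be sketched roughly as follows. The rule predicates, tier names, and shortcut structure are all illustrative assumptions, not File System Archiving's actual configuration:

```python
# Illustrative storage rules, evaluated in order: the first predicate that
# matches a file's metadata decides the target storage tier.
RULES = [
    (lambda f: f["size"] > 100 * 1024 * 1024, "cheap-nearline"),
    (lambda f: f["age_days"] > 365,           "archive-worm"),
    (lambda f: True,                          "standard"),
]

def target_tier(file_info):
    for predicate, tier in RULES:
        if predicate(file_info):
            return tier

def archive(file_info):
    """Return the archive location and the shortcut left behind in the file system."""
    tier = target_tier(file_info)
    archived_path = f"{tier}://{file_info['name']}"
    shortcut = {"name": file_info["name"] + ".lnk", "target": archived_path}
    return archived_path, shortcut
```

The key design point is that the rules are declarative: administrators describe which content goes where (by size, age, folder, etc.), and the archiving engine applies the rules continuously rather than requiring per-file decisions.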

Alternatively, you can copy files into the OpenText managed repository. This scenario accommodates organizations that want to consolidate access to content stored in file systems but do not want to disturb the physical files stored there. You can also move files into the secure storage repository entirely; this scenario accommodates organizations that want to discontinue use of file systems altogether.

You can archive content to secure storage media such as WORM, DVD, UDO, or write-once hard disks. Time stamps and system signatures ensure the integrity of documents. Furthermore, you can quickly review activity logs around archived content, including who viewed or edited documents, when, and why.

File System Archiving enables you to automate the process of storing content safely in multiple physical locations or on hot stand-by devices. In addition, you can automatically render content into standardized formats, such as PDF and TIFF, to ensure future readability.

You can detect multiple instances of content and eliminate redundancies. Content can be compressed automatically to minimize wasted space. You can execute powerful full-text searches across archived file systems, consolidating content from multiple file systems into a single result set.
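Detecting multiple instances of content is typically done by storing each blob under a digest of its bytes, so identical content is kept only once regardless of how many documents reference it. A minimal single-instance store sketch (the class and its interface are illustrative, not the product's API):

```python
import hashlib

class SingleInstanceStore:
    """Stores each unique content blob exactly once, keyed by its SHA-256 digest."""

    def __init__(self):
        self._blobs = {}

    def put(self, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        # setdefault means duplicate content is never stored a second time.
        self._blobs.setdefault(digest, content)
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]

    def unique_count(self) -> int:
        return len(self._blobs)
```

Two documents with identical bytes yield the same digest, so the second `put` costs only a dictionary lookup, which is where the storage savings come from.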

Multiple archived documents can be restored to the original file system (or to a new, specified location) with a single click from the search results. If appropriate, shortcuts are replaced with original files, and content can be optionally left in or removed from the OpenText managed repository.

OpenText Email Management products are characterized by a centralized foundation of compliant archiving and records management, enabling you to securely store, manage, and retrieve email content and ensure regulatory compliance. This is done through the archiving, control, and monitoring of Lotus Notes and Microsoft Exchange email to reduce the size of the email database, improve server performance, and control the lifecycle of email content.

OpenText Application Governance and Archiving for Microsoft SharePoint provides integrated, end-to-end management of SharePoint sites and documents across an entire enterprise.

OpenText Storage Services for Microsoft SharePoint stores Microsoft SharePoint content on external storage devices and reduces wasted space by automatically detecting multiple instances of the same content. This ensures the scalability and performance of SharePoint and reduces the cost of keeping content on the expensive production environments that host it. Redirecting content storage also increases efficiency, enabling the storage and management of a larger number of documents, simplifying backup and restore processes, and allowing customers to house SharePoint content on less expensive storage devices.

The content metadata attached to documents is still stored and maintained within SharePoint. No stubbing or linking is used in this method of externalization, so end-users can create and edit content within the SharePoint environment seamlessly with no knowledge of the storage management going on behind the scenes.

In addition to lowering overall storage costs, Storage Services for SharePoint can help businesses to meet requirements for information retention. Storage Services for SharePoint ensures that business-critical content is secured in multiple physical locations, and enables the housing of information on a number of different storage devices to meet business and compliance requirements.

The product detects multiple file instances to ensure that only a single instance of every file is stored, provides optional content compression (configurable for each individual logical archive), and monitors the archive server through events and notifications. It uses secure encryption to ensure content is always protected.

OpenText Integration Center for Data Archiving enables full audit and records management for data archiving alongside files and emails from any business application to the ECM Suite. It is a data and content integration platform that unifies information silos that cross application boundaries, consolidating and transforming data and content throughout the entire information ecosystem, including leading-edge ERP, CRM, and ECM systems as well as legacy applications. This product enables you to:
  • control the lifecycle of your data with integrated Records Management and Archiving in the ECM Suite;
  • archive data to the ECM Suite's Archive Server from any application;
  • utilize full record extraction from source systems and transport to the Archive Server;
  • automatically apply lifecycle management rules to archived data;
  • transform data, enhance content metadata, and deliver records into the Archive Server as one process;
  • perform monitoring and generate audit trails for reporting.
See more about Open Text products in my upcoming posts.

Monday, May 7, 2012

Content Management Systems Reviews - Open Text - ECM Suite - Content Lifecycle Management

In my last post about Open Text, I started describing the Open Text ECM Suite. The subject of today's post is Open Text ECM Suite - Content Lifecycle Management, which includes document management, imaging, records management, and archiving.

Managing, controlling, and securing content is critical to an organization’s overall information governance strategy. OpenText ECM Suite, Content Lifecycle Management gives organizations ECM solutions to manage content throughout its entire lifecycle.


Fully featured, highly scalable, web-based document management provides a secure, single repository for organizing and sharing enterprise content.

Workflow automates processes, such as change requests and approval, for accuracy and consistency. Processes can be designed according to corporate or regulatory standards.

Imaging transforms physical records into valuable digital assets. Scanned information is indexed and classified with customizable metadata to make it fully searchable.

Records management enables full lifecycle management of all enterprise content, electronic or physical, enabling you to control retention and ensure destruction at the right time.

Intelligent storage management optimizes storage according to business context and metadata, leveraging less expensive media and providing high-end storage reduction services.

Last time I described Document Management (formerly Livelink ECM - Document Management) solution of this suite of products. Now: imaging solution.

Imaging Solution

OpenText Imaging accommodates the need to scan paper documents and the need to view electronic documents. It covers both aspects of business document integration—capturing documents from various sources including scanners, faxes, email, and other office applications, and retrieving documents for users in many different environments, including in remote offices and via the Internet.

The OpenText Imaging solution provides interfaces for integration with workflow, OpenText Archive Server, OpenText Content Server, and other business applications. With OpenText Imaging, documents can be captured, archived, and linked to all types of business objects within enterprise applications.


The Enterprise Scan feature of OpenText Imaging:
  • scales from hundreds to many thousands of documents per day;
  • supports complex business processes by integrating with and initiating SAP Webflow, OpenText BPM Server, or Content Server workflows;
  • integrates with Content Server, including single sign-on and scanning directly from a Content Server folder;
  • integrates with SAP Business Suite applications, supporting SAP ArchiveLink and interchange of metadata between SAP systems and Content Server;
  • provides built-in barcode support and automatic document separation;
  • provides sophisticated pre-indexing capabilities, including pick lists and sticky fields;
  • provides full-page views with the ability to zoom;
  • generates scanned documents in TIFF, PDF, Searchable PDF, or PDF/A formats.
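Automatic document separation via barcodes usually works like this: a scanned batch arrives as a flat stream of pages, and each page carrying a barcode (for example, a cover sheet) marks the start of a new document. A rough sketch — the page representation is an assumption for illustration, not Enterprise Scan's actual model:

```python
def separate(pages):
    """Split a flat list of scanned pages into documents at each barcode page."""
    documents, current = [], []
    for page in pages:
        # A barcode page starts a new document (unless it is the very first page).
        if page.get("barcode") and current:
            documents.append(current)
            current = []
        current.append(page)
    if current:
        documents.append(current)
    return documents
```

The barcode value itself (an invoice number, say) would typically also be captured as pre-index metadata for the resulting document.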

The DesktopLink feature of OpenText Imaging:
  • integrates with Microsoft Word, Excel, and PowerPoint, allowing users to store and archive office documents directly from the originating application and link them to the appropriate transactions in business applications;
  • stores and archives documents in Microsoft Windows Explorer via drag-and-drop or the file menu;
  • stores documents in their original format or renders them in standard formats, such as TIFF or PDF, for long-term archiving and readability.

The OpenText Imaging ExchangeLink and OpenText Imaging NotesLink options archive and link Microsoft Outlook or Lotus Notes emails or email attachments to the appropriate transaction in the enterprise application (such as SAP ERP), allowing the email to be searched and displayed directly within the enterprise application.

The Web Viewer, Java Viewer, and Windows Viewer features of OpenText Imaging:
  • scroll, rotate, and zoom via an easy-to-use thumbnail view of document pages;
  • append documents with notes that are automatically tagged with the current date, time, and user name;
  • add, edit, and view annotations, including drawing elements such as arrows, lines, marks, checkmarks, and text elements;
  • support form overlays, enabling documents to be displayed together in the original form in which they were printed;
  • print or save documents locally, with automatic document rendering;
  • perform free-text search in text-based documents such as ASCII, ALF (Advanced List Format), and OTF (Output Text Format) documents;
  • support ASCII, ALF, OTF, TIFF/FAX, and JPEG formats; PDF documents are supported in the Web Viewer and Windows Viewer.

In my next post, I will describe records management and archiving solutions of ECM Suite - Content Lifecycle Management.

Friday, May 4, 2012

Content Management Initiative Implementation

A common problem in the content management arena is that employees spend a lot of time searching for information and re-creating information; while they are doing this, they are not being efficient and productive, and the company loses money. Employees also use obsolete documents in their work, putting the integrity of their work and compliance at risk.

How is this problem solved? By implementing a content management initiative. You have decided to implement one. Where do you start? This is the subject of today's post.

In order for your content management project to be successful, I recommend that you follow these steps in this specific order.

Business Analysis

This is the first step in the initiative. It involves the requirements gathering and development process: you identify the specific needs of the business and then develop and implement solutions to meet them. This could be, for example, a new content management system deployment, modification of a current system, integration of a few content management systems, design of a search solution, etc.

You start with the user study. The user study includes users' requirements gathering and user-side testing when necessary. Identify the main stakeholders in your organization and include them in your user study; involve as many stakeholders as you possibly can, since all of them might be users of your system. During this process, the specific needs of users as they pertain to content management need to be identified and documented. Current content processes, as they would apply to a new content management environment, should be discussed with users. Your solution should be based on these requirements.

Based on the user study, the project requirements document (PRD) should be created. This document should include all user requirements and identify the scope of the project. This document would serve as the foundation of your project and will determine your specific actions.

User-centered design is paramount to the project's success. When a system is deployed based on users' requirements, they are going to use it. Users will have a sense of ownership of the system, which provides an excellent starting point in the user adoption process: they know that the system being deployed will be what they need. This will also greatly help the change management processes associated with the deployment.

The user study will help to avoid friction in the content management environment. Users will experience discomfort and stress if they find the system difficult to use or cannot find the features they need. You want to make sure that their experience is easy and the environment engaging, so that there are no disconnects between the system and its users. This will also ensure that everyone responsible for content creation and management is on the same page, even if they are not on the same team or in the same department.

Without the user study, the user adoption of the system and change management processes would be very difficult.

Content Strategy

Content strategy refers to the planning, development, and management of content. Content strategy evaluates business and users’ needs and provides strategic direction on how content and content processes can help to achieve specific objectives. Content management initiative is much more likely to succeed with a solid strategy supporting it. It can also help to save cost.

Content strategy starts with the big picture and then drills down to a granular level that can be implemented and measured. It encompasses everything that impacts content, including workflow and information governance. It looks across organizational silos and integrates the different business needs, goals, and tactics. It makes sure that the end product promotes consistent, effective and efficient user experiences and business processes.

User adoption will also be much easier with clearly defined goals, content processes, and tactics identified by the content strategy. The content strategy will also ensure that everyone responsible for content creation is on the same page, even if they are not on the same team or in the same department. Developing a strategy and plan will not only help things run smoothly but actually ensure the business impact that your organization is looking to achieve.

The content strategy should outline the following:
  • content types in scope of this project, content types outside of scope of this project and where they are going to be stored and managed;
  • unstructured vs. structured content management environment;
  • content creation processes;
  • content flow including collaboration, review, approval of content as well as localization and translation processes;
  • lifecycle of content from its creation to its archiving and destruction;
  • content archiving processes;
  • technology to be used or modified depending on your situation;
  • vendor selection if you are going to acquire a new system;
  • relationship between the existing systems where content currently resides;
  • permissions to the system and type of these permissions;
  • administrative support to the system;
  • users training and support;
  • migration of legacy content;
  • content owners for each content type;
  • content output formats and publishing processes;
  • post-publication processes;
  • information governance processes.

The content strategy is paramount to the project's success. It could be outlined for the short term and the long term, together with what the project will mean in terms of business objectives. It should determine project goals, resourcing, workflow, and success metrics, which can save your organization from the high cost of an ineffective content management initiative.


At this point, if you have not yet acquired a content management system, you would do a vendor selection and acquire one based on your user requirements. You would then work with your IT department and, if necessary, consultants to deploy the system. If you are modifying an existing system, you would work with your IT department to coordinate the modification effort. You will have to write a functional specification document which outlines the functions of the new and/or modified system.

Content Audit and Structure

Before any content is uploaded into a system, it is important to know what that content is and what type of content will be uploaded into the system in the future. Proliferation of content without analysis and structure will create a situation in which it is very difficult to find, reuse, and manage it. If you are looking into having a structured content management environment, this task becomes even more important. Chunks of content should be consistent and categorized; inconsistent content cannot be efficiently reused and/or published. The taxonomy and metadata framework and the archiving processes are based on the content structure.

Content structure development should be preceded by a detailed audit and analysis of existing content and a projection of the content types that might be uploaded into the system in the future. Product and content types should be identified and a detailed content audit conducted.

If you are in a structured content management environment and using DITA, random samples of each content type should be analyzed for similarity after the audit, and a list of DITA topics and their types should be created. Topics should be consistent, as should each topic's beginning and end.

Content structure should also include definitions of how different types of content (e.g. notes, warnings, precautions) would be handled: using a topic, conditional reuse, filtered reuse, conrefs, images, tables, and what content is going to be handled through style sheets, etc.
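The conditional and filtered reuse mentioned above works roughly like a DITA ditaval filter: content chunks carry attributes such as audience or product, and a filter pass includes or excludes them at publish time. A minimal sketch in Python — the chunk structure and attribute names are illustrative assumptions, not DITA markup itself:

```python
# Each chunk carries optional properties (audience, product, etc.).
chunks = [
    {"text": "Connect the power supply.",       "props": {}},
    {"text": "Warning: high voltage inside.",   "props": {"audience": "technician"}},
    {"text": "Contact support before opening.", "props": {"audience": "end-user"}},
]

def apply_filter(chunks, **conditions):
    """Keep chunks with no properties, or whose properties match the build's conditions."""
    kept = []
    for chunk in chunks:
        if all(conditions.get(key) == value for key, value in chunk["props"].items()):
            kept.append(chunk["text"])
    return kept
```

The same source then produces a technician manual and an end-user manual from one set of chunks, which is the point of filtered reuse.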


Taxonomy

Every information system should include two access points to information: a search function and a browse function. Users use the search function when they know exactly what they are looking for; they use the browse function when they do not. A taxonomy needs to be created to accommodate the browse function in the system.

Users do not always know what they are looking for. In fact, in most cases, users either do not know what they are looking for or know it but are not able to find it using search, so they will look for other ways to find content. It is easy to find uncategorized content when there are just a few content items in the system; when there are many, it becomes very difficult.

In a structured content management environment, where the number of content items is larger than in an unstructured environment, this problem is much more serious. If you are going to use DITA, with its component-oriented reuse, finding content items becomes harder still, because users are looking for smaller needles in bigger haystacks. In an environment that involves localization and translation into multiple languages, there are going to be thousands of content items in the system. The presence of a taxonomy in such an environment is absolutely critical.

Having uncategorized content in the system will cause content proliferation. Proliferated content will be very difficult to find and reuse, and duplicate content will unavoidably be created.
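To make the browse function concrete, here is a minimal taxonomy sketch: categories form a tree, each content item is filed under one or more category paths, and browsing a path returns the items filed there. The category names and items are invented for illustration:

```python
# A small category tree (the taxonomy itself).
taxonomy = {
    "Products":   {"Widgets": {}, "Gadgets": {}},
    "Procedures": {"Installation": {}, "Maintenance": {}},
}

# Items are filed under one or more category paths, so the same item
# can be found from several branches of the tree.
items = [
    {"title": "Widget install guide",
     "categories": [("Products", "Widgets"), ("Procedures", "Installation")]},
    {"title": "Gadget datasheet",
     "categories": [("Products", "Gadgets")]},
]

def browse(*path):
    """Return the titles filed under a category path."""
    return [i["title"] for i in items if path in i["categories"]]
```

Note that the install guide is reachable both by product and by procedure; multi-filing is what lets users who "do not know what they are looking for" arrive at the content from whichever angle they think in.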

Metadata, Naming Conventions, Controlled Vocabulary


Metadata

Metadata values for content items need to be defined to accommodate the search function in the system. Each content type should have metadata assigned to it. Metadata values are the criteria that users use to search for content items.

The general system search will accommodate full-text search of content. This would be sufficient when there are just a few content items in the system. When there are many content items, full-text search will retrieve long lists that include irrelevant items, and users are not going to browse through long lists. To make search precise, metadata is necessary: if metadata is present, searches can be performed against metadata rather than full text.

Metadata should be based on the content structure. Metadata should be validated in the user study and user side testing when necessary and adjusted as needed.
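The difference between full-text search and metadata search can be sketched in a few lines: items carry indexed metadata fields, and a query filters on exact field values rather than matching words anywhere in the text. The fields and sample items below are illustrative:

```python
# Items with indexed metadata fields (the fields would come from the
# metadata framework defined for each content type).
items = [
    {"title": "Pump manual v2",    "doc_type": "manual",    "product": "pump",  "language": "en"},
    {"title": "Pump manual v2 DE", "doc_type": "manual",    "product": "pump",  "language": "de"},
    {"title": "Valve datasheet",   "doc_type": "datasheet", "product": "valve", "language": "en"},
]

def search(**criteria):
    """Return titles of items whose metadata matches every given field/value pair."""
    return [i["title"] for i in items
            if all(i.get(field) == value for field, value in criteria.items())]
```

A full-text search for "pump" would return everything that mentions pumps; the metadata query `search(doc_type="manual", language="en")` returns exactly the one item the user wants, which is the precision argument made above.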

Naming Conventions

The role of naming conventions is very important: they allow users to identify content items in a list without opening each one. Naming conventions should be created for each content type and should be based on the content structure. They should be validated in the user study and user-side testing when necessary and adjusted as needed.

Controlled Vocabulary

A controlled vocabulary is a list of controlled terms that should be used for some of the metadata fields. These should be standard terms used in authoritative publications and documents and familiar to the majority of users. A controlled vocabulary helps to ensure that metadata values are consistent, and consistent metadata ensures high-precision search.
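The naming convention and controlled vocabulary ideas can be enforced together: a name must follow a fixed pattern, and the terms inside it must come from the controlled lists. The convention below (`<doc-type>_<product>_<language>_v<version>`) and its vocabularies are invented for illustration:

```python
import re

# Controlled vocabularies for two of the name components.
DOC_TYPES = {"manual", "datasheet", "sop"}
LANGUAGES = {"en", "de", "fr"}

# Hypothetical convention: <doc-type>_<product>_<language>_v<version>
NAME_PATTERN = re.compile(
    r"^(?P<doc_type>[a-z]+)_(?P<product>[a-z0-9-]+)_(?P<language>[a-z]{2})_v(?P<version>\d+)$"
)

def validate_name(name):
    """True if the name follows the convention and uses only controlled terms."""
    m = NAME_PATTERN.match(name)
    return bool(m) and m["doc_type"] in DOC_TYPES and m["language"] in LANGUAGES
```

A check like this can run when content is uploaded, so inconsistent names and free-form metadata never enter the system in the first place.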

QA and System Set Up

After all the above items have been completed and the system has been deployed or modified, perform thorough QA testing of the system, fix any bugs, and then perform regression testing.

Then set up your content management system based on the above criteria and per user requirements. After the system has been set up, demo content can be uploaded in preparation for user acceptance testing.

User Acceptance Testing

Prepare the system and the script for user acceptance testing. Invite all user groups to test the system. User acceptance testing helps to validate that the system meets user requirements and encourages users to start using the system. Participating in this process gives them a feeling of ownership of the system. It will also help to uncover problems and/or bugs in the system and surface any suggestions users might have. User acceptance testing is paramount to user adoption and change management processes.

Pilot project

If user acceptance testing has been successful, upload content for a pilot project into the system. Start with a simple project that does not have many content items; the reason for the pilot is that if there are problems in the system or in the process, they are easier to fix with just a few content items in the system. Invite all user groups to test the system. If the pilot project goes well, continue with the next project, which could include more content items. Upload a few more projects into the system, then create a plan for the migration of legacy content.

Thursday, May 3, 2012

Content Management Systems Reviews - Alfresco

In my post about open source content management systems (CMS), I mentioned that Alfresco, Drupal, Joomla, Apache Jackrabbit, and Liferay are just a few of the open source CMSs available. In this post, I will describe Alfresco, which is a very popular CMS.

Alfresco is an enterprise content management system for Microsoft Windows and Unix-like operating systems. There are two editions of Alfresco: Alfresco Community Edition and Alfresco Enterprise Edition. The Community Edition is free software; the Enterprise Edition is commercially licensed open source for the enterprise. Its design is geared towards users who require a high degree of modularity and scalable performance.

It includes a content repository, an out-of-the-box web portal framework for managing and using standard portal content, a CIFS interface that provides file system compatibility on Microsoft Windows and Unix like operating systems, a web content management system capable of virtualizing webapps and static sites via Apache Tomcat, Lucene indexing, and Activiti workflow. The Alfresco modular architecture is developed using Java technology.

Alfresco has been built on leading industry standards, including: REST, RSS, Atom publishing, JSON, OpenSearch, OpenSocial, OpenID, Web Services, JSR 168, JSR 170 level 2, MyFaces, CIFS, FTP, WebDAV, SQL, ODF and CMIS.

The system has been designed for high scalability. It can be architected to support a large community of users and to manage the high volumes of content associated with enterprise-wide deployments. Simple-to-configure clustering allows companies to scale their Alfresco deployment.

Simple administration tasks, such as changing server settings, can be done via standard JMX tools without the need to stop the Alfresco server.

Content Platform

Content platform is used for the system modules and includes the following features:

Rules and Aspects Services - create content rules on a folder, start a workflow, convert content into another format, move to another folder, notify a set of users via email, and extract the properties such as author, keywords, etc. from an office document.

Library services - check-in/out; minor and major version control.

Auditing services - who created, who updated, when created, when updated, when read, when logged in.

Search services - combined metadata, content, location, object type and tag search.

Transformation services – extensible engine with large number of in-built transformations including Office to PDF or Flash.

Thumbnailing services – content thumbnailing of first page.

Content modeling – create new content types without the overhead of inheritance.

Collaboration Services - REST based services - site, person, invite, activities, preferences, discussion, blogging and commenting.

Activity services – activity feed on the "who, what, when and where" of repository services – new or edited content, comments, new team members, critical calendar dates

Share interface

The Share interface enables global teams to collaborate on content and projects. It includes social features such as status updates, content activity streams, tagging, and search. Team tools include a document library, blog, wiki, calendar, and simple workflow.

It includes the following features:

RSS Feeds - proactive feeds automatically update team members of changes – who did what, where and when.

Create Virtual Teams with user invitations and easy control of permissions.

Personal dashboard - allow users to setup and view information in a variety of ways.

Project Dashboard - each project has a dashboard to provide access to all project information including activities, team members, project calendars, modified content and project links.

Project Calendars - team calendars capture and share critical project dates.

Discussion Forums - team members can use online discussion forums to raise issues, discuss topics and capture thoughts to be shared with other team members.

Project Blogs - team members can draft project blogs. These can be reviewed within the team before being published externally.

Wiki pages.

Project Data Lists - users can create and share lists of items.

Social Tagging - social content (documents, blogs, wiki pages, discussion posts, etc.) can be tagged by team members, providing easy navigation to content.

Image Light Box - used to browse images managed within each project.

Alfresco includes document management, records management, and web content management modules.

Document Management

The document management module is architected to support a large number of users and to manage very high volumes of content.

It includes full ECM functionality delivered through the modern, consumer-like Share interface. There is a single unified repository to manage any type of content – documents, images, video, and audio. With support for the CIFS, WebDAV, IMAP, and SharePoint protocols, you can drag and drop files right into Alfresco just like into a shared network drive. Alfresco can be mounted as an IMAP service in your email client, so you can drag and drop content into Alfresco right from email.

There is inline preview: you can preview popular file types (such as Microsoft Office documents, PDFs, and images) directly within your browser, without having to download them.

Alfresco provides the ability to automatically create more than one document format for any content within the system. For example, Microsoft Word documents can have a PDF version automatically created at the end of an approval workflow for later publication on the website.

Alfresco looks just like SharePoint to Microsoft Office, allowing users to upload, check in, check out, and modify content right from MS Office. It includes version control and allows users to track major and minor versions of documents with an audit trail.

Users can define unique Types and the associated metadata. More powerful than Types is the ability to create Aspects. Aspects can hold a set of custom metadata and be applied to any document, regardless of content type.
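As a sketch, custom types and aspects are defined in a content model XML file. The `acme` namespace, type, and property names below are hypothetical; `cm:content` and `d:text` are standard Alfresco dictionary constructs.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical model: an "acme:contract" type plus an "acme:clientDetails"
     aspect that can be applied to any document, regardless of its type. -->
<model name="acme:contractModel" xmlns="http://www.alfresco.org/model/dictionary/1.0">
  <imports>
    <import uri="http://www.alfresco.org/model/dictionary/1.0" prefix="d"/>
    <import uri="http://www.alfresco.org/model/content/1.0" prefix="cm"/>
  </imports>
  <namespaces>
    <namespace uri="http://www.acme.example/model/content/1.0" prefix="acme"/>
  </namespaces>
  <types>
    <type name="acme:contract">
      <parent>cm:content</parent>
      <properties>
        <property name="acme:contractNumber">
          <type>d:text</type>
        </property>
      </properties>
    </type>
  </types>
  <aspects>
    <aspect name="acme:clientDetails">
      <properties>
        <property name="acme:clientName">
          <type>d:text</type>
        </property>
      </properties>
    </aspect>
  </aspects>
</model>
```

Because the aspect is defined independently of the type, its metadata can be attached to a plain `cm:content` document just as easily as to an `acme:contract`.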

Alfresco provides workflows to help automate the processing of documents. Workflows can be built to support simple review and approval processes or can be configured to support more complex business processes. Users can create simple document workflows by themselves.

The system includes fine-grained security levels, based on user, group, and role management, to control access to content.

Content can be replicated between Alfresco systems. Remote offices can have read-only access to content locally providing them with quick access and reducing wide area network traffic.

Lightweight scripting allows developers to create new reusable components using JavaScript, PHP, and FreeMarker.
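As an illustration, a hypothetical GET web script consists of three files, shown together below with file names in comments: a descriptor that binds the script to a URL, a JavaScript controller that populates the model, and a FreeMarker template that renders the response. The controller runs inside the repository's script engine, not standalone; `person` and `model` are root objects Alfresco provides to it.

```
<!-- hello.get.desc.xml : descriptor, binds the script to a URL -->
<webscript>
  <shortname>Hello</shortname>
  <url>/sample/hello</url>
  <format default="json"/>
  <authentication>user</authentication>
</webscript>

// hello.get.js : JavaScript controller, runs in the repository
model.greeting = "Hello, " + person.properties.userName;

// hello.get.json.ftl : FreeMarker template, renders the model as JSON
{ "greeting": "${greeting}" }
```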

It is compliant with open standards such as CMIS and JSR 168.

Records Management

Alfresco is used to manage the lifecycle of content before it becomes a record. This allows managing the review and approval process of a company report as it goes through multiple revisions before the final, approved version is filed as a record.

The records management module is built on top of Alfresco's document management repository, including the Share interface. You can upload records using drag and drop from the desktop, an email client, or any web browser. Alfresco can be added as an IMAP service in any standard email client, allowing users to drag and drop emails into Alfresco for uploading.

It includes multiple interfaces, so end users can choose the most appropriate one for adding new records.
Using the Microsoft SharePoint protocol, users can upload and file records from within standard Office tools. Users can also use the same web interface that they use to manage other content to upload, manage, and declare records.

Alfresco supports a multi-stage process for filing and declaring records. This allows users to file and then, at a later date, add the required information to enable them to declare the record. Users use the same Alfresco system to store and manage all of their content.

Users can create record series, record categories, and record folders. Simple point-and-click configuration allows users to create unique record retention schedules for each record category. Users can identify records that need to be reviewed, with the system automatically prompting the user at the end of the review period. Transfer support provides the ability to transfer records at the end of the disposition cycle. Full audit logs enable users to track who did what and when for each record.

The system provides support for a range of record types, including electronic records (standard documents, scanned records, PDF records, and web records) and physical records. It also supports a wide range of relationships (e.g. Supersedes, Versions, References). Administrators can add their own relationships to support unique business requirements.

There is full support for holding records in the case of litigation, ensuring that records are not destroyed as part of the normal retention schedule when legal discovery is involved.

Multiple roles enable users to control which activities are available to each user. Role definitions can easily be extended to support each company's requirements.

This module is DoD 5015.2 certified.

Web Content Management

This module provides an integrated collaboration environment that allows web teams to work together. The advanced social collaboration capabilities of the Share interface can be used to work with globally distributed virtual teams.

Content can be created and modified directly within the web application. Transformation tools automatically convert office files into web-ready formats for publishing – removing manual conversion processes. Office-to-web automatically publishes enterprise content. Users can choose from a range of different interfaces for creating and updating web content.

This module acts as a shared network drive. Users can continue to use existing desktop tools and simply drag and drop to upload new content, without the need for downloads or plugins. Using the Microsoft SharePoint protocol, users can seamlessly upload and modify web content from within standard Office tools. Users can quickly create new sites or micro-sites.

There is support for business processes that control how new content is managed through a review and approval process. Transformation services can repurpose content for delivery through multiple channels – web, smart phone, tablets, etc. Content can be published between multiple environments.

The module provides a scalable development platform that can be quickly downloaded and easily extended to meet business needs.

You can add more features as your requirements change. Using repository clustering, flexible deployment, and transfer services, architects can define and build web infrastructures that scale to meet future business needs. The system provides repository interoperability, reduces vendor lock-in, and simplifies content migration.