
Thursday, April 19, 2012

Cloud Content Management

Cloud computing is the delivery of computing as a service rather than a product, whereby shared resources, software, and information are provided to computers and other devices as a utility over a network, typically the internet. With the power of cloud computing, small businesses can enjoy the same level of IT infrastructure as Fortune 500 companies at a fraction of the overhead.

End users access cloud-based applications through a web browser or a lightweight desktop or mobile application, while the business software and data are stored on servers at a remote location. Cloud application providers strive to give the same or better service and performance than if the software programs were installed locally on end-user computers.

In the past, all of a business's computing had to be done on its own servers. Now that we can compute over the web, servers do not have to be located within business offices, and with cloud computing, they do not even have to be owned by the business!

In addition to hosted servers, small businesses can now purchase software as a service (SaaS) that is hosted online and completely scalable. With software purchased as a service, a small business no longer needs IT personnel on site to install and maintain software and hardware. SaaS allows businesses to purchase software without multi-year contracts and without painful software installation.

An added benefit of cloud computing is its document storage capabilities. The cloud revolutionizes the way you store and access data. A great way for businesses to harness the power of the cloud is by utilizing cloud content management systems that store and organize documents online. By doing this, a business can securely leverage all the benefits of cloud computing for its content. Accessibility, scalability, sharing and collaboration are only a few of the benefits cloud content management can offer.

Because on-premise enterprise content management (ECM) software requires significant commitment of time and money, only big organizations have been able to take advantage of the efficiency, productivity, and cost savings of automated document management and workflow. These organizations can afford highly-customized on-premise systems that help them gain competitive advantage over smaller rivals.

Cloud enterprise content management solutions level the playing field for organizations of any size. Now, the smallest company, or any budget-pinched department within a larger organization, can have all the computing power and the efficiency and productivity gains of the biggest companies.

Cloud enterprise content management (ECM) offers tremendous advantages over traditional on-premise content management software implementations. Solving a typical content management business problem requires the integration of multiple technologies like document management, workflow, scanning, capture, email management, etc.

For many organizations this involves a lengthy implementation process from initial business specifications to hardware and technology planning to cross-departmental functional groups to actual software installation and customization. It may be many months from the initial business need for a content management solution to the actual workable solution.

A cloud ECM solution, however, comes as a pre-loaded and immediately usable business solution. A cloud enterprise content management platform is accessible via an internet connection. There is no need to add additional modules or pay for expensive and time consuming integration services. A business unit can be up and running in days with an enterprise content management and workflow solution, while the on-premise ECM software implementation project remains stuck in its planning phase.

In content management, these advantages play an even larger role in project success. This is why analyst firms like Gartner and Forrester predict that cloud providers will grow much faster than on-premise ECM software vendors in the coming years.

Benefits of Cloud Content Management

Rapid Deployment

With no hardware or software to install and no servers to buy, cloud content management has virtually no setup time, so it can be deployed very quickly.

Access Anywhere

For a typical enterprise content management (ECM) solution to work, all of its components must be available when they are needed. An implementation that includes multiple technologies like document management software, automated workflow, scanning equipment, document capture, email management, etc. means that each will have a service lifecycle, and a service level, that must be monitored and tracked to keep the entire solution in working order.

A cloud content management solution, however, offers an application available anytime and from any internet browser. With cloud content management, you have all of your data right at your fingertips. By managing your documents online, information is always accessible and data can be shared instantly.

Easy Collaboration

Since they can be accessed from anywhere, cloud content management systems allow any authorized personnel to access and collaborate on content. Sharing lets you get information to those who need it instantly, from anywhere in the world. With cloud content management, you can bring important documents to everyone in your business.

Integration

Solving a typical enterprise content management (ECM) business problem requires the integration of multiple technologies like document management, workflow, scanning, document capture, email management, etc. In the installed content management software world, this often means a great deal of time and dollars dedicated to making multiple technologies work together.

By contrast, content management technologies can be integrated in the cloud, including capture, document management, workflow, e-signature, eForms, and much more.

Low Cost

When it comes to enterprise content management (ECM) solutions, on-premise software carries a large price tag. Add up the software license, implementation services, additional hardware and networking costs, and annual maintenance fees and on-premise content management software can be out of reach of many organizations. Cloud ECM solutions, however, offer a highly-affordable alternative to automate document-intensive processes.

Not only are initial costs much lower, but the cloud content management model also brings you reduced costs over the long-term. There are no servers or software to administer and no annual maintenance fees. This is one of the major reasons why organizations that start with one cloud application, such as content management and workflow, tend to seek cloud solutions for subsequent applications.

Secure Content

A cloud ECM platform can offer a high level of security, often including security controls that typical on-premise content management products lack.

Enhanced Business Agility

In the cloud, content management solutions can be quickly and easily tailored to meet your changing document management and workflow needs. In addition, you can take advantage of new features and enhancements as they become available.

Thursday, March 22, 2012

Structured Content Management

Organizations of all sizes are beginning to realize how content and its reuse across the enterprise can improve productivity. The need for change is driven by the desire to better manage information assets (documents, creative ideas, illustrations, charts, graphics, multimedia, etc.) and eliminate costly processes that fail to facilitate the effective and consistent re-use of content.

Content reuse can take a variety of forms. The most common reuse scenario is dynamically updating multiple web pages when content is added or removed from a site. There are also content reuse opportunities across multiple web sites, as in the case of co-branding and syndication. Content reuse is critical and often complex when supporting print and web publishing. Perhaps the biggest impact of content reuse is in efficient multilingual publishing.

To reuse content, it must be structured. Structured content simply means that the information is stored in a format that defines and describes the content. Extensible Markup Language (XML) is a simple and effective format for creating and managing information. Using XML you can describe the content that you are managing, so a headline will actually be defined as a headline, and likewise for a price, a product description, a caption, etc.
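As a minimal sketch (the element names are illustrative, not from any particular system), structured content like this can be created and read with Python's standard library:

```python
import xml.etree.ElementTree as ET

# Hypothetical structured-content fragment: each piece of information
# is described by an element rather than by its formatting.
doc = """
<product>
  <headline>All-Road Touring Bike</headline>
  <price currency="USD">1299.00</price>
  <description>A lightweight frame built for long-distance touring.</description>
</product>
"""

root = ET.fromstring(doc)
headline = root.findtext("headline")  # the headline is *defined* as a headline
price = root.findtext("price")
print(headline)  # All-Road Touring Bike
print(price)     # 1299.00
```

Because the price is an element rather than just bold text, any publication channel can find and format it appropriately.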

Although structuring takes some planning, the benefit is enormous. You can easily re-use text and media for a variety of purposes. You can create publications quickly because images and text are easy to find and put together. Updating your publications is easier because you only need to make changes in one place, and it updates everywhere the content is used. Managing structured content happens in an XML-based content management system (CMS).

There are great benefits to structuring content, including:
  • making content more retrievable and re-usable;
  • reducing costs and complexity of translation;
  • enforcing authoring, style, and branding guidelines;
  • improving information interchange.
XML is the industry standard format for structuring content. It is very easy to work with and is easy to migrate to other formats. Graphics, video, Word documents, PDFs, and other files are wrapped in XML to provide structure and metadata that makes the files easy to find and manage. XML was explicitly designed to represent hierarchical models of content.

There are four basic parts critical to structuring information:
  • defining content types;
  • identifying rules of content hierarchy;
  • creating modular content units;
  • applying standards consistently.
Defining Content Types

When you begin to analyze your existing documentation and future requirements, think about your content according to its informational type rather than its format. Procedures, topics, facts, terms, definitions, prices, product numbers, and product descriptions are common information types.

As you continue to analyze the content you create, you will likely discover that many content types are reusable. For instance, you may discover that there is no reason that your product description should be any different regardless of where it is published.

Identifying Rules of Content Hierarchy

The most significant way that structured documents differ from unstructured ones is that structured documents include rules. These rules formalize the order in which text, graphics, and tables may be entered into a document by an author. For example, in an unstructured document, a paragraph has specific formatting - font, size, and spacing. In a structured document, this same paragraph also has an exterior wrapper that governs the elements that are allowed to appear before and after it. The elements' rules are defined in a document type definition (DTD) or schema.

Structured content management implies moving away from formatting cues to signal such relationships within a document and instead working with information rules. This is where the power of the information model comes from, but it is also the source of difficulties in change management, because it changes the way authors are used to working with a CMS.

Creating Modular Content Units

Structured content management requires that you begin to look at the content you create as separate, identifiable chunks of information that can be reassembled differently depending on audience, purpose, or delivery method. This is an intention-based analysis, not an academic exercise: how, where, and when you intend to re-use that content should drive your modularity.

These chunks of information, once identified and tagged, can be reassembled (reused/repurposed) in other information products. They can even be reused in a different order. Modular content from a source document could be reused in a marketing brochure, user manual, and customer-facing web site.
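A toy sketch of this idea, with hypothetical chunk names: the same tagged chunks are reassembled, in different orders and combinations, for different information products.

```python
# Hypothetical content chunks, each identified by a tag.
chunks = {
    "intro":    "Meet the new Model X router.",
    "specs":    "Dual-band, 4 ports, 1 Gbps.",
    "warranty": "Two-year limited warranty.",
}

# The same chunks, reassembled for different outputs.
brochure    = [chunks[c] for c in ("intro", "warranty")]
user_manual = [chunks[c] for c in ("intro", "specs", "warranty")]

print(brochure)
print(user_manual)
```

Updating the warranty text in one place updates it in every product that reuses the chunk.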

Using Standards Consistently

On some level, you may understand the importance of following internal standards, branding guidelines, and formalized structure. But it is human nature to keep finding reasons to override templates or alter the format "just this one time."

Breaking the rules is not allowed when it comes to structured authoring. Reuse is only possible when your information is consistently structured. Imagine how useless a phone directory would be if the data entry clerks at the phone company were allowed to enter information in any order they chose: some clerks using the first field for the first name, others for the last name. And instead of last name, first name, ordered alphabetically, what if some of the listings were first name, last name?

Of course, most enterprise content is not as highly structured as a phone book. But if your goal is to reuse content, it must be structured consistently. If adhering to a particular document standard seems painful, re-examine whether the content is really as structured as you think, or change your expectations about how much information can be re-used and easily shared. Document models can be made easier and more flexible, but at a cost in the downstream utility of the information.

XML Building Blocks

You have identified your content types, chunked them into modular components intended for re-use, established the relationships among those chunks, and decided that you can live with them in a componentized fashion to the extent that your team will follow that structure consistently.

These concepts are important in structured XML content management:
  • Elements
  • Attributes
  • DTDs and Schemas
Elements

The basic unit of information is called an element. Elements can be text, graphics, tables, or even containers for other elements. In short, everything is an element.

When you create an information model, you define a document hierarchy. A hierarchy specifies the order in which elements are allowed to be used in a particular information product.

For example, for a set of user documentation, a ChapterTitle always begins a chapter, followed by a synopsis and a bulleted list of topics in the chapter. Elements are powerful tools that allow you to create structured content appropriate for reuse.
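A minimal sketch of such a hierarchy rule, checked in Python (the element names follow the example above; a real system would enforce this through a DTD or schema rather than hand-written code):

```python
import xml.etree.ElementTree as ET

# Hypothetical hierarchy rule: every chapter must begin with a
# ChapterTitle, followed by a Synopsis, then a TopicList.
REQUIRED_ORDER = ["ChapterTitle", "Synopsis", "TopicList"]

def follows_hierarchy(chapter):
    """Return True if the chapter's children start in the required order."""
    tags = [child.tag for child in chapter]
    return tags[: len(REQUIRED_ORDER)] == REQUIRED_ORDER

doc = ET.fromstring(
    "<Chapter><ChapterTitle>Install</ChapterTitle>"
    "<Synopsis>How to install.</Synopsis>"
    "<TopicList><Topic>Setup</Topic></TopicList></Chapter>"
)
print(follows_hierarchy(doc))  # True
```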

Attributes

XML elements can be extended to contain more information than just a label. Elements can have attributes, which provide additional information about each element. For example, a chapter element can have an optional author attribute and the author's university affiliation. These attributes allow you to find all instances of a specific author or university.

Because you can classify information based on attributes, you can create new information products from your source content that you would otherwise have to cobble together manually.

Documentation authors have long benefited from adding attributes to the elements of content they create, allowing readers to use "help" applications and user guides more intelligently. Attributes can help indicate in which information products an element should appear, and in which languages. For example, some elements should be present on a web site, but may not be appropriate for a printed guide; others should appear in the Spanish version of a document, but not in the Portuguese.

Attributes make content smart enough to know where to go. For example, elements and attributes can be harnessed to create dynamic content for web-based information products, based on the personal preferences of your users.
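A small sketch of attribute-driven selection, using Python's standard library (the element and attribute names are illustrative): sections carry audience and language attributes, and the publishing pipeline selects only the ones that belong in a given output.

```python
import xml.etree.ElementTree as ET

# Hypothetical guide with sections tagged by audience and language.
doc = """
<guide>
  <section audience="web" lang="es">Instrucciones de instalacion</section>
  <section audience="print" lang="en">Installation instructions</section>
  <section audience="web" lang="en">Installation instructions</section>
</guide>
"""

root = ET.fromstring(doc)
# Select only the elements destined for the web site, in English.
web_en = [s.text for s in root.findall("section")
          if s.get("audience") == "web" and s.get("lang") == "en"]
print(web_en)  # ['Installation instructions']
```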

DTDs and Schemas

You define the structure of an information product in a document type definition (DTD) or a schema. A schema, unlike a DTD, is an actual XML document, but both are used to define information models. Both provide considerable modeling power and can help facilitate content reuse and multi-channel publishing.

Friday, March 9, 2012

Case Study - Wind River - Twiki Information Architecture

In the "Case Studies" series of my posts, I describe the projects that I worked on and the lessons learned from them. In this post, I am going to describe the project of re-structuring the content and information architecture of a TWiki-based content management system at Wind River.

Wind River is a software engineering company that used TWiki as their content management system. TWiki is a Perl-based structured wiki application, typically used to run a collaboration platform, a content management system, a knowledge base, or a team portal.

Content organization and information architecture in TWiki were such that users had difficulty finding the information they were looking for, which made content hard to find, re-use, and update.

This also discouraged adding new content, which created areas where no documentation existed; instead of being shared, knowledge was stored on personal computers, where it was at risk of being lost because it was not backed up.

There was a lot of obsolete content because no content owners had been formally identified and no retention schedule had been set up. Collaborative work on documents and projects was accomplished by users sending links to TWiki pages; without these links, it was very difficult to find information. There was no information governance in place, so content management processes were sporadic.

The task was to re-organize the content structure and information architecture of the system and to set up information governance to solve these problems.

I strongly believe in user-centered design, so I performed a user study. I identified stakeholders within each Wind River team and created a questionnaire to collect users' requirements for the system re-organization and to surface usability issues.

Based on these requirements, I re-organized the content structure and information architecture of the system. The key to this re-organization was keeping the structure simple and intuitive. I simplified navigation, created intuitive labels, made sure that no page carried too much information or forced a user to scroll through a very long page, and verified that each page had a correct breadcrumb. I also created a taxonomy of webs (the building blocks of TWiki) and, based on this taxonomy, re-organized the location of documents. Finally, I enhanced the system search.

For each content type, document owners were identified and a retention schedule was set up, with a function to flag content approaching its expiration date. The flag sent an email notification to the content administrator that a document had reached its expiration date, allowing the administrator to contact the document owner for a decision on what should be done with the document: review and update, move to an archive, or delete.
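The flagging logic described above can be sketched roughly like this (the record fields, names, and dates are hypothetical, and the real system sent email notifications rather than printing):

```python
from datetime import date

# Hypothetical document records with owners and expiration dates.
documents = [
    {"name": "HR Policy 2011", "owner": "jsmith", "expires": date(2012, 1, 15)},
    {"name": "Q2 Roadmap",     "owner": "mlee",   "expires": date(2013, 6, 30)},
]

def flag_expired(docs, today):
    """Return documents past their retention date so the content
    administrator can contact the owner (review, archive, or delete)."""
    return [d for d in docs if d["expires"] <= today]

for doc in flag_expired(documents, date(2012, 3, 1)):
    # In the real system, this step triggered an email notification.
    print(f"Notify admin: '{doc['name']}' (owner: {doc['owner']}) has expired")
```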

User acceptance testing of the system was performed. Users were satisfied with the system's new information architecture and indicated that it became much easier to find information.

The system with new content structure and information architecture was deployed.

Information governance was set up. Group and individual training was conducted on an ongoing basis.

The project was a success, and company management and users were very cooperative in making it one. It increased efficiency and productivity and thus saved Wind River costs, because employees no longer wasted time searching for documents or recreating documents that already existed.

Lessons learned

1. User-centered design is paramount to project success. When you design and build the system based on users' requirements, they are going to use it. Users gain a sense of ownership of the system, which provides an excellent starting point: they know that the system you are building will be what they need.

2. Top-down support is critical for project success. Management support is a huge factor in encouraging employees to use the system and in setting up and enforcing procedures for information governance.

3. Assuring users from the very beginning that they would not be left alone with the system secured their cooperation.

4. User acceptance testing helped encourage employees to start using the system. Participating in this process gives them a feeling of ownership of the system.

5. Ongoing training after the system deployment with the new content structure and information architecture made user adoption smooth.

Wednesday, February 8, 2012

Developing Enterprise Search Strategy

During the last ten years, the volume and diversity of digital content have grown at unprecedented rates. There is increased use of departmental network drives, collaboration tools, content management systems, messaging systems with file attachments, corporate blogs and wikis, and databases. Duplicate and untraceable documents crowd out the valuable information needed to get work done.

Unfortunately, not all content makes it into a managed content repository, like a portal or a content management system. Some companies have more than one content management system. Having a search solution that can search across all content repositories becomes very important.

Expectations for quality search continue to rise. Many users like to use an expression: "we would like a search like Google". So, how do we formulate a search strategy?

Here are a few key points:

  • Security within enterprise search strategies should be carefully designed. Information like employee pay rates, financial information, or confidential communications should not end up in general search results. 
  • Search results should deliver high quality, authoritative, up-to-date information. Obsolete information should not end up in the search results. 
  • Search results should be highly relevant to keywords entered in a search box. 
  • The ability to limit the search should be included.

Steps to Develop an Enterprise Search Strategy

Step 1: Define Specific Objectives for Your Search Strategy

People don’t search for the sake of searching. They search because they are looking to find and use information to get their jobs done. Answer these questions:

1. Who is searching? Which roles within the organization are using the search function, and what requirements do they have?

For example, a corporate librarian is likely familiar with Boolean search and advanced search forms, while a layperson likely prefers a simple search box. A sales professional may need instant access to past proposals for an upcoming meeting, while compliance professionals conducting investigations often use deep search across massive message archiving and records management systems.

2. What categories of information are they looking for?

Define the big buckets of information that are the most relevant to different roles. Realize that not all roles need all information. Part of why desktop search tools are popular is they inherently define a bucket called "stuff on my machine". Defining categories for searching project information, employee information, sales tools, and news helps searchers formulate the right query for the right type of search.

3. What are they likely to do with the information when they find it? After defining broad information categories, work to understand context and answer the question: why are people searching?

For example, if a marketer is collecting information on a particular competitor by searching on the company’s name, it is often useful to expand that query to include related information, like other competitors in the industry, specific business units or product lines, pricing information, past financial performance. Related information can be included in search experiences through a variety of methods, including the search results themselves or methods like faceted navigation.

It is impossible to account for every type of information that users may be looking for, but defining broad user roles, like sales professionals or market researchers, and identifying their most common search scenarios is a great way to scope a search project. Use methods such as personas and use cases, and interview users to validate assumptions about the processes they are involved in and to identify the information that best supports those processes.

Step 2: Define the Desired Scope and Inventory Repositories

When using the search function built into a particular content management system, the product itself limits the scope of the search to whatever is stored in this system. Search engines such as Autonomy, Endeca Technologies, Google, Vivisimo, and others will search across multiple content management systems and databases. Increasingly, portal products and collaboration platforms from companies like IBM, Microsoft, Oracle, and Open Text will also let you search content that is stored inside and outside of their systems.

Use search to reach outside the confines of a single repository. Cross-repository search becomes essential when companies use different content repositories for different purposes.

Match roles and search categories to relevant content sources. Search requirements often include multiple repositories, such as document libraries, file systems, databases, etc. These repositories usually consist of multiple technology products, such as Lotus Notes, EMC Documentum, Microsoft SharePoint, and others. Using the roles and types of searches you are looking to support, identify all of the relevant repositories necessary to achieve your desired search scope.

Create an inventory of required repositories. When creating your inventory, document the name of each repository, a repository owner, a description of its content, an assessment of the quality of this content, and the quantity and rate of growth of content in each repository. Also document the technology product used as well as any specific security access policies in place.
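Such an inventory can be sketched as a simple structure; the fields below mirror the list above, and the example records are hypothetical. The growth rate also supports the sizing exercise described later in this step.

```python
from dataclasses import dataclass

@dataclass
class Repository:
    name: str
    owner: str
    technology: str         # e.g. "SharePoint", "Documentum"
    content_description: str
    doc_count: int
    growth_per_year: float  # fraction, e.g. 0.30 = 30% per year
    security_policy: str

# Hypothetical inventory entries.
inventory = [
    Repository("Sales Proposals", "A. Rivera", "SharePoint",
               "past proposals and pricing", 120_000, 0.30, "AD groups"),
    Repository("Contracts", "B. Chen", "Documentum",
               "executed contracts", 45_000, 0.10, "role-based"),
]

# Rough index size in three years, useful when sizing vendor bids.
for r in inventory:
    projected = r.doc_count * (1 + r.growth_per_year) ** 3
    print(f"{r.name}: ~{projected:,.0f} documents in 3 years")
```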

Consider a phased rollout and select simple but telling data source repositories for kick-off. When rolling out a project such as search strategy that involves disparate sources and complex UIs, a phased rollout may be preferable depending upon factors such as resource constraints and time-to-launch pressure. By approaching the project in phases, you can vet the process and workflow while familiarizing users with the objectives.

Inventory and prioritize the repositories at the start of your project so that you can identify and start with the repositories that will have a big impact. For example, basic queries into a CRM system can add a lot of value while remaining relatively straightforward. Throughout this process, it is important to set expectations with your users, since this approach may lengthen their involvement with the project.

Documenting your repositories lets software vendors effectively size and bid on your project. Most search software gets priced based on the number of documents (or data items) in the index plus additional fees for premium connectors that ingest content from repositories like enterprise content management systems.

For example, strategies that require a limited set of commodity connectors are priced altogether differently than those with premium connectors for content management systems and enterprise applications. Thus, knowing which repositories are relevant and understanding the rate of content growth within them can help avoid unnecessary overspending.

Step 3: Evaluate and Select the Best Method for Enriching Content

When addressing content with very little descriptive text and metadata, evaluate several methods for enriching the content to improve the search experience. Methods range from manual application of metadata to automatic categorization. Some companies use a mix of both methods.

Step 4: Define Requirements and List Products and Vendors to Consider

After specifying a search scope, define requirements for users. The most important thing is not to get distracted by irrelevant features, but instead to focus on products that adequately meet the organization's requirements over a specified time period. Consider factors like ease of implementation, product strategy, and market presence in any product evaluation.

Score and select vendors on criteria that are relevant for your needs. There are many vendors to choose from. Search vendors include Autonomy, Coveo Solutions, Endeca Technologies, Exalead, Google Enterprise, ISYS Search Software, Recommind, Thunderstone Software, Vivisimo, and others. Also large software providers such as IBM, Microsoft, Oracle, and SAP have one or more search products on the market.

Product capabilities range from highly sophisticated, large-scale, secure searches that mix advanced navigation and filtering, to basic keyword searches across file systems. Products also differ depending on whether the content being searched consists primarily of structured data or unstructured documents. For example, high-end search companies like Endeca offer robust tools for searching structured data from databases, while small-scale basic file system search needs can be met with products like the Google Mini or the IBM OmniFind Yahoo! Edition.

Step 5: Define a Taxonomy of Logical Types of Searches

While it is impossible to predict and account for everything people search for, it is possible to organize the search experience so it is intuitive to use. Start with defining logical types of searches. For example:

People Search. Searching for employees has gained acceptance as a valuable type of search within enterprises for finding expertise on a subject. A search for people, whether a simple name look-up or a more advanced expertise search, requires attention to everything from how the query gets processed to how results appear in the interface. For example, searchers typically want to see an alphabetical list of names in people search results, as opposed to results ranked by relevance.
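A toy illustration of that difference (the names and relevance scores are made up): people results are presented in name order, not score order.

```python
# Hypothetical people-search hits with engine relevance scores.
people = [
    {"name": "Rivera, Ana", "score": 0.91},
    {"name": "Chen, Bo",    "score": 0.72},
    {"name": "Lee, Dana",   "score": 0.88},
]

# For people search, sort alphabetically by name rather than by score.
by_name = sorted(people, key=lambda p: p["name"])
print([p["name"] for p in by_name])  # ['Chen, Bo', 'Lee, Dana', 'Rivera, Ana']
```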

Product Search. A search for products frequently needs to include product brand names (e.g., Trek), concepts and terms related to the product (e.g., bike, bicycle, road race, touring), product description, and specific product attributes, like frame size, material, and color. Knowing where all of this information is stored and how it should be optimally presented to end users is essential.

Customer Search. It is now possible to search and return results for virtually any logical item in an enterprise, like orders, customers, products, and places. You should look into sources like enterprise data warehouses, ERP systems, order histories, and others to create a full picture of the item being searched for.

Documents Search. Documents usually reside in a few repositories, so be sure to include them in your search sources. Users expect search results to be highly relevant, with the most relevant at the top of the list.

By bucketing types of searches into logical categories, you can also improve their quality. Methods include applying type-specific thesauri, taxonomies, and controlled vocabularies.

Administrators can influence the relevance algorithm in a way that returns the right information the right way, like weighting hits in a product description more heavily than a product attribute field.
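A toy sketch of that kind of field weighting (the weights, documents, and query are illustrative, not from any real engine): hits in the description field count more heavily than hits in an attribute field.

```python
# Hypothetical field weights: description hits count 3x attribute hits.
FIELD_WEIGHTS = {"description": 3.0, "attributes": 1.0}

def score(doc, query_terms):
    """Sum weighted term-occurrence counts across the document's fields."""
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        text = doc.get(field, "").lower()
        total += weight * sum(text.count(t) for t in query_terms)
    return total

docs = [
    {"name": "Cable A", "description": "copper electrical cable", "attributes": "red"},
    {"name": "Cable B", "description": "fiber cable", "attributes": "electrical grade"},
]
ranked = sorted(docs, key=lambda d: score(d, ["electrical", "cable"]), reverse=True)
print([d["name"] for d in ranked])  # ['Cable A', 'Cable B']
```

Cable A wins because both query terms hit its description, even though Cable B also mentions "electrical" in an attribute field.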

Step 6: Plan for a Relevant User Experience

Recognize that not all search experiences should be the same. Google, Yahoo!, and MSN’s popularity on the Web have generated strong interest in offering simple-to-use wide search boxes and tabbed interfaces within the enterprise. But in the enterprise, it is often helpful to use more advanced interface techniques to clarify what users are looking for, including:

Faceted navigation adds precision to search. It exposes attributes of the items returned to an end user directly into the interface. For example, a search through a product information database for "electrical cables" might return cables organized by gauge, casing materials, insulation, color, and length, giving an engineer clues to find exactly what he is looking for.
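A facet list like this can be computed by simply counting attribute values across the result set. The sketch below is illustrative Python; the field names and sample products are made up:

```python
from collections import defaultdict

def build_facets(results, facet_fields):
    """Count attribute values across a result set to drive facet display."""
    facets = {field: defaultdict(int) for field in facet_fields}
    for item in results:
        for field in facet_fields:
            value = item.get(field)
            if value is not None:
                facets[field][value] += 1
    return {f: dict(counts) for f, counts in facets.items()}

cables = [
    {"name": "cable A", "gauge": "12 AWG", "color": "red"},
    {"name": "cable B", "gauge": "12 AWG", "color": "blue"},
    {"name": "cable C", "gauge": "14 AWG", "color": "red"},
]

# Each facet value carries its count, e.g. gauge: 12 AWG (2), 14 AWG (1),
# so the engineer sees at a glance how the results break down.
facets = build_facets(cables, ["gauge", "color"])
```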

Statistical clustering methods remove ambiguity. Methods like statistical clustering automatically organize search results by frequently occurring concepts. Clusters provide higher level groupings of information than the individual results can provide, and can make lists of millions of documents easier to scan and navigate.

Best bets guide users to specific information they need. Creating best bets is the process of writing a specific rule such as: when a person enters the term "401K plan" into the search box on the corporate intranet, they should see a link to the "401K plan" page on the intranet.

Additionally, products like Google OneBox and SAP’s Enterprise Search Appliance enable retrieval of frequently searched facts, such as sales forecast data, dashboards, and partner information, from back-end ERP systems. Best bets help users avoid a lot of irrelevant results and are very effective for frequently executed queries.
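Conceptually, a best-bets table is just a lookup of hand-written rules consulted before the relevance-ranked results appear. The terms and URLs in this Python sketch are invented for illustration:

```python
# Hand-curated rules: exact query -> destination link. Checked before
# the normal relevance-ranked search runs. All entries are made up.
BEST_BETS = {
    "401k plan": "https://intranet.example.com/hr/401k",
    "expense report": "https://intranet.example.com/finance/expenses",
}

def search(query, fallback_engine):
    """Return a curated best bet (if one matches) alongside normal results."""
    best = BEST_BETS.get(query.strip().lower())
    return {"best_bet": best, "results": fallback_engine(query)}

# Case and surrounding whitespace are normalized before the lookup.
hit = search("401K Plan", lambda q: [])
```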

Use basic interface mock-ups and pilot efforts to test, refine, and make these concepts useful for employees in your organization. Many companies use a "Google Labs" style page on their intranets to test out search user interface concepts and tools prior to exposing them more broadly to the enterprise.

Step 7: Implement, Monitor, and Improve

For large projects, allow a lot of time for change management. Teams should maintain the interface between the search engine and all of its back-end content sources.

It is essential to keep IT staff informed of product evaluation and selection plans so that the final implementation supports the security and regulatory policies that are in place for these systems.

Create a plan for ongoing maintenance of search indexing processes and exceptions. Create a monthly reporting plan that lists most frequent searches performed, searches that did not retrieve results, and overall usage of the search function. This can help you troubleshoot existing implementations and drive future decisions on how to enhance the search experience over time.
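Such a report can be derived directly from the search engine's query log. A minimal Python sketch, assuming the log is available as (query, result count) pairs:

```python
from collections import Counter

def monthly_report(log):
    """Summarize a search log given as (query, result_count) tuples."""
    queries = Counter(q for q, _ in log)
    zero_hits = Counter(q for q, n in log if n == 0)
    return {
        "total_searches": len(log),                    # overall usage
        "top_queries": queries.most_common(5),         # most frequent searches
        "no_result_queries": zero_hits.most_common(5), # gaps to investigate
    }

# Sample log entries; real data would come from the engine's query log.
log = [
    ("vacation policy", 12),
    ("401k", 8),
    ("vacation policy", 12),
    ("parking permit", 0),
]
report = monthly_report(log)
```

Queries that return zero results are the most actionable part of the report: they point at missing content, missing synonyms, or indexing gaps.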

Enhancements typically include adding types of searches to the experience, further enriching content assets for better retrieval, and incorporating new, valuable content into the overall experience.

In my future posts, I will describe search products such as Autonomy, Coveo Solutions, Endeca Technologies, Exalead, ISYS Search Software, Recommind, Thunderstone Software, Vivisimo, and others.

Monday, January 16, 2012

Component Content Management

In my last post, I described how DITA is used in dynamic content management. I will continue the subject of dynamic content management in this post.

DITA was conceived as a model for improving reuse through topic-oriented modularization of content. Instead of creating new content or copying and pasting information which may or may not be current and authoritative, organizations manage a repository of content assets – or DITA topics – that can be centrally managed, maintained and reused across the enterprise. This helps to accelerate the creation and maintenance of documents and other deliverables and to ensure the quality and consistency of the content organizations publish.

Dynamic content management is also known as component content management or single-source publishing, and DITA is its foundation. A component content management system (CCMS) is a content management system that manages content at a granular, or component, level rather than at the document level. Examples of such systems are Interwoven, Documentum, AuthorIT, DocZone, Vasont, SiberLogic, Trisoft, Astoria, and Tridion.

What exactly is a component? Each component represents a single topic, concept, or asset (e.g., an image, a table, a product description). Components can be as large as a chapter or as small as a definition or even a word. Components can be assembled into multiple deliverables and viewed either individually or as traditional documents. Reuse allows the core component to be edited and maintained in one place, then assembled into the thousands of documents where it is needed.

Each component is only stored one time in the content management system, providing a single, trusted source of content. These components are then reused (rather than copied and pasted) within a document or across multiple documents. This ensures that content is consistent across the entire documentation set. Each component has its own lifecycle (owner, version, approval, use) and can be tracked individually or as part of an assembly.
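The single-source idea can be illustrated in a few lines: store each component once, reference it from any number of assemblies, and an edit in one place flows into every document. This Python sketch uses made-up component IDs and content:

```python
# Toy single-source store: each component exists exactly once and is
# referenced by id from any number of document assemblies.
components = {
    "warn-voltage": "Disconnect power before servicing.",
    "intro-widget": "The widget converts rotary motion to linear motion.",
}

# Documents are just ordered lists of component references, not copies.
assemblies = {
    "user-guide": ["intro-widget", "warn-voltage"],
    "quick-start": ["warn-voltage"],
}

def render(doc_id):
    """Assemble a document by resolving its component references."""
    return "\n".join(components[cid] for cid in assemblies[doc_id])

# Editing the component once updates every document that reuses it.
components["warn-voltage"] = "WARNING: disconnect power before servicing."
```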

Component Content Management can be regarded as an overall process for originating, managing, and publishing content right across the enterprise and to any output.

Component content management provides significant benefits and cost savings over traditional document authoring and maintenance methods. Some of these are:

  • greater consistency and accuracy;
  • reduced maintenance costs;
  • reduced delivery costs;
  • reduced translation costs.

And more specifically:

  • Faster time to market: authors spend far less time creating and recreating the same content, reviewers spend less time reviewing, and translators spend less time translating. Publishing to print, Help, and Web formats is fully automated. This is achieved by controlling standards, eliminating duplication, and effectively managing creation, localization, and publishing of content.
  • Efficient use of resources: by eliminating repetitive creation and maintenance, more of your resources can be devoted to improving the quality of the content and adding value to your documentation.
  • Slashed translation costs: content is translated only once no matter how often it is reused. Translators only ever work on new or changed source content, so you don’t pay for them to handle unchanged text. Real projects have shown reductions in translation word count in excess of 30%.
  • Improved quality and usability of content: through easy definition and enforcement of standards you can guarantee consistent documentation structure and formatting, increasing readability and usability. Using single-source content ensures 100% consistency wherever it appears.
  • Improved workplace satisfaction: free authors from tedious, time-consuming tasks such as formatting and repetitive updates, so they can concentrate on creating and improving content. Reviewers gain by reviewing content only once, regardless of the number of end deliverables. Writers save 95% of the time they usually spend formatting content.
  • Increased customer satisfaction: consistent, accurate documentation of all types means fewer calls to customer support, because you are providing the right information, at the right time, in the right format.

Generating content takes time and money. As such, content should be treated as the valuable business asset that it is. To get maximum value from your content, you should be able to do a number of things:

  • You should be able to re-use content across documents without copying, so that you can write it once, and maintain it in a single place no matter how many times you have used it.
  • You should be able to use content created for one purpose equally well in other contexts and for other purposes.
  • You should be able to translate re-used content once and have it automatically reflected anywhere it is used.
  • You should be able to publish to print, help, and web outputs without having to modify or make different versions of your content.

These measures provide the potential for increasing the quality and consistency of your documentation, for reducing the cost and time involved in producing it, and for gaining more value from every piece of content that you create.

In my future posts, I will describe component content management systems.

Saturday, January 14, 2012

DITA and Dynamic Content Management

In my previous post on DITA, I mentioned that DITA, Darwin Information Typing Architecture, is an XML-based architecture for authoring, producing, and delivering information. In this post, I am going to describe more details about DITA and how it is used in content management.

At the heart of DITA, representing the generic building block of a topic-oriented information architecture, is an XML document type definition (DTD) called the topic DTD. The point of the XML-based Darwin Information Typing Architecture (DITA) is to create modular technical documents that are easy to reuse with varied display and delivery mechanisms.

Main features of the DITA architecture

As the "Architecture" part of DITA's name suggests, DITA has unifying features that serve to organize and integrate information:

Topic orientation. The highest standard structure in DITA is the topic. Any higher structure than a topic is usually part of the processing context for a topic, such as a print-organizing structure or the navigation for a set of topics.

Reuse. A principal goal for DITA has been to reduce the practice of copying content from one place to another as a way of reusing content. Reuse within DITA occurs on two levels:

Topic reuse. Because of the non-nesting structure of topics, a topic can be reused in any topic-like context.

Content reuse. DITA provides each element with a conref attribute that can point to any equivalent element in the same topic or in any other topic.
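For illustration, here is a simplified Python sketch of same-document conref resolution; a real DITA processor handles cross-file references, element-type matching, nested content, and much more:

```python
import xml.etree.ElementTree as ET

# A tiny DITA-like topic: the <p> pulls its content from the <note>
# via a conref reference. Ids and text are invented for illustration.
topic_xml = """<topic id="maintenance">
  <body>
    <note id="voltage-warning">Disconnect power before servicing.</note>
    <p conref="#maintenance/voltage-warning"/>
  </body>
</topic>"""

def resolve_conrefs(root):
    """Copy the target element's text into each conref-bearing element
    (same-document references only; a real processor does far more)."""
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    for el in root.iter():
        ref = el.get("conref")
        if ref:
            target = by_id[ref.rsplit("/", 1)[-1]]
            el.text = target.text
    return root

root = resolve_conrefs(ET.fromstring(topic_xml))
resolved = root.find("body")[1].text  # the <p> now carries the note's text
```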

Specialization. Any DITA element can be extended into a new element.

Topic specialization. Applied to topic structures, specialization is a natural way to extend the generic topic into new information types (or infotypes), which in turn can be extended into more specific instantiations of information structures. For example, a recipe, a material safety data sheet, and an encyclopedia article are all potential derivations from a common reference topic.

Domain specialization. Using the same specialization principle, the element vocabulary within a generic topic can be extended by introducing elements that reflect a particular information domain served by those topics. For example, a keyword can be extended as a unit of weight in a recipe, as a part name in a hardware reference, or as a variable in a programming reference.

Property-based processing. The DITA model provides metadata and attributes that can be used to associate or filter the content of DITA topics with applications such as content management systems, search engines, etc.

Extensive metadata to make topics easier to find. The DITA model for metadata supports the standard categories for the Dublin Core Metadata Initiative. In addition, the DITA metadata enables many different content management approaches to be applied to its content.

Universal properties. Most elements in the topic DTD contain a set of universal attributes that enable the elements to be used as selectors, filters, content referencing infrastructure, and multi-language support.

Taking advantage of existing tags and tools. Rather than being a radical departure from the familiar, DITA builds on well-accepted sets of tags and can be used with standard XML tools.

Leveraging popular language subsets. The core elements in DITA's topic DTD borrow from HTML and XHTML, using familiar element names like p, ol, ul, and dl within an HTML-like topic structure. In fact, DITA topics can be written much like HTML and rendered directly in a browser.

Leveraging popular and well-supported tools. The XML processing model is widely supported by a number of vendors and translates well to the design features of the XSLT and CSS stylesheet languages defined by the World Wide Web Consortium and supported in many transformation tools, editors, and browsers.

Typed topics are easily managed within content management systems as reusable, stand-alone units of information. For example, selected topics can be gathered, arranged, and processed within a delivery context to provide a variety of deliverables to varied audiences. These deliverables might be a booklet, a web site, a specification, etc.

At the center of these content management systems are fundamental XML technologies for creating modular content, managing it as discrete chunks, and publishing it in an organized fashion. These are the basic technologies for "one source, one output" applications, sometimes referred to as Single Source Publishing (SSP) systems.

These capabilities can be pictured as three concentric rings. The innermost ring contains capabilities that are needed even when using a dedicated word processor or layout tool, including editing, rendering, and some limited content storage. The middle ring holds the technologies that enable single-sourcing content components for reuse in multiple outputs: a more robust content management environment, often with workflow management tools, as well as multi-channel formatting and delivery capabilities and structured editing tools. The outermost ring includes the technologies for smart content applications.

It is good to note that smart content solutions rely on structured editing, component management, and multi-channel delivery as foundational capabilities, augmented with content enrichment, topic component assembly, and social publishing capabilities across a distributed network.

Content Enrichment/Metadata Management: Once a descriptive metadata taxonomy is created or adopted, its use for content enrichment will depend on tools for analyzing and/or applying the metadata. These can be manual dialogs, automated scripts and crawlers, or a combination of approaches. Automated scripts can be created to interrogate the content to determine what it is about and to extract key information for use as metadata. Automated tools are efficient and scalable, but generally do not apply metadata with the same accuracy as manual processes. Manual processes, while ensuring better enrichment, are labor intensive and not scalable for large volumes of content. A combination of manual and automated processes and tools is the most likely approach in a smart content environment. Taxonomies may be extensible over time and can require administrative tools for editorial control and term management.
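A crude illustration of the automated side of enrichment is term-frequency keyword suggestion, with a human confirming or correcting the result. This Python sketch is a deliberately naive assumption of how such a script might start; production tools use far richer linguistic analysis:

```python
from collections import Counter
import re

# Minimal stopword list; a real enrichment tool would use a taxonomy
# and controlled vocabulary rather than raw term frequency.
STOPWORDS = {"the", "a", "an", "to", "of", "and", "is", "in", "for", "on"}

def suggest_keywords(text, n=3):
    """Propose the most frequent non-stopword terms as candidate metadata,
    for a human editor to confirm, correct, or discard."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [w for w, _ in counts.most_common(n)]

doc = ("The pump housing protects the pump impeller. Inspect the pump "
       "housing for cracks before each use of the impeller assembly.")
keywords = suggest_keywords(doc)
```

This captures the trade-off the paragraph describes: the script is fast and scalable but blunt, so its output works best as input to a manual review step.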

Component Discovery/Assembly: Once data has been enriched, tools for searching and selecting content based on the enrichment criteria will enable more precise discovery and access. Search mechanisms can use metadata to improve search results compared to full text searching. Information architects and content managers can use search to discover what content exists, and what still needs to be developed to proactively manage and monitor the content. These same discovery and search capabilities can be used to automatically create delivery maps and dynamically assemble content organized using them.

Distributed Collaboration/Social Publishing: Componentized information lends itself to a more granular update and maintenance process, enabling several users to simultaneously work on topics that may appear in a single deliverable, which shortens schedules. Subject matter experts, both remote and local, may be included in review and content creation processes at key steps. Users of the information may want to "self-organize" the content of greatest interest to them, and even augment or comment upon specific topics. A distributed social publishing capability will enable a broader range of contributors to participate in the creation, review, and updating of content in new ways.

Federated Content Management/Access: Smart content solutions can integrate content without duplicating it in multiple places, rather accessing it across the network in the original storage repository. This federated content approach requires the repositories to have integration capabilities to access content stored in other systems, platforms, and environments. A federated system architecture will rely on interoperability standards (such as CMIS), system agnostic expressions of data models (such as XML Schemas), and a robust network infrastructure (such as the Internet).

These capabilities address a broader range of business activity and therefore fulfill more business requirements than single-source content solutions. Assessing your ability to implement these capabilities is essential in evaluating your organization's readiness for a smart content solution.

Tuesday, January 10, 2012

Enterprise Search

Enterprise search is the practice of identifying and enabling specific content across the entire organization to be indexed, searched, and displayed to authorized users.

How does Enterprise Search affect your organization? When individuals within your company search for documents, do they find the correct and most current information? When customers visit your website, do they locate your products and services with ease? These are essential questions for making sure your company maximizes its profitability.

Business intelligence and enterprise search solutions allow both your employees and customers to locate the information they need to make the most informed decisions. Universal Enterprise Search is the key way in which companies organize their information and allow individuals to locate exactly what they are looking for.

Companies may have a few managed content repositories: several content management systems, knowledge-base applications, wiki applications, CRM, ERP, and so on. These are usually isolated from one another. Companies may also have unmanaged content repositories, such as network drives. These repositories grow fast, resulting in disconnected enterprise knowledge assets.

In a situation like this, it is a good idea to have a search tool that allows users to search all of these repositories at the same time; hence the term "Universal Search" was born. Another term that emerged is "Content Management Interoperability Services" (CMIS), a standard that offers to connect some of those repositories. Companies are looking to Enterprise Search (ES) as a shortcut to their findability problems.

Enterprise search is not an alternative to ECM systems. Even though there are search applications that allow the search across different platforms, enterprise search alone is not a solution to the findability problem. While enterprise search provides great value, it does not mean that you can give up integrating your repositories or leave your content on network drives.

Whit Andrews, Gartner VP, distinguished analyst, and author of Gartner's 2009 "Magic Quadrant for Information Access Technology" report (Gartner includes enterprise search as part of "Information Access Technology"), stated that the right enterprise search can be very effective, but that is not the same as 100% success. He also acknowledged that you cannot simply buy, install, and walk away from powerful ES systems. Both ES and ECM systems require diligence and governance.

Leslie Owens, a Forrester Research analyst, is also skeptical about the likelihood of an ES quick fix, although she too is a strong proponent of targeted search applications.

In addition, out-of-the-box search may get you part of the way to findability, but it generally works best when enhanced with supporting semantic information, such as controlled vocabularies.

Every content repository should have two access points - search and browse. Users are going to use the Search access point if they know exactly what they are looking for. They are going to enter keywords in the search field and retrieve results. If they retrieve too many results, the presence of metadata will allow them to limit the search to specific results.
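The interplay of keyword search and metadata narrowing can be sketched in a few lines of Python; the documents and field names are invented for illustration:

```python
def search(docs, keywords, **filters):
    """Full-text keyword match first, then narrow by metadata filters."""
    hits = [d for d in docs
            if all(k.lower() in d["text"].lower() for k in keywords)]
    for field, value in filters.items():          # metadata narrowing
        hits = [d for d in hits if d.get(field) == value]
    return hits

docs = [
    {"text": "Q3 sales report", "department": "sales", "year": 2011},
    {"text": "Q3 sales forecast", "department": "marketing", "year": 2011},
]

# A keyword search alone returns both; adding a metadata filter
# lets the user limit the results to exactly what they need.
all_hits = search(docs, ["sales"])
narrowed = search(docs, ["sales"], department="sales")
```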

If users don't know what they are looking for, they are going to browse the content to get some ideas of what is available in the content repository. At some point during browsing, they may switch to search.

Therefore, having search as the only access point is not enough to ensure findability of content. Metadata and taxonomy are invaluable tools for making sure that both the search and browse access points work properly. An intuitive taxonomy greatly helps the browse access point. A successful taxonomy is one that is neither too deep nor too wide: users should be able to get to content in as few clicks as possible, and yet the resulting list of documents should not be too long. Users are not going to scroll through long lists of documents.

Naming conventions are extremely important. If users can figure out the contents of a document from its name without opening it, it will greatly speed up the time the user will spend browsing for documents.

Having information governance in place will make sure that content resides in the correct place, that it has correct metadata, and that it follows procedures that keep it accurate and up-to-date.

So is it ECM or ES? The answer is both.

In my future posts, I will review search applications and will further address search strategy. I will also review a few case studies in ECM and ES.

Wednesday, December 21, 2011

Content Strategy

Content strategy refers to the planning, development, and management of content. In other words, content strategy plans for the creation, publication, and governance of useful, usable content.

The purpose of content strategy has been described as achieving business goals by maximizing the impact of content.

Necessarily, the content strategist must work to define not only which content will be published, but why we are publishing it in the first place. Otherwise, content strategy is not strategy at all: it is just a glorified production line for content nobody really needs or wants.

Content strategists strive to achieve content that is readable and understandable, but also findable, actionable and shareable in all of its various forms. Content strategy development is necessarily preceded by a detailed audit and analysis of existing content.

A content strategy defines:
  • key themes and messages;
  • recommended topics;
  • content purpose (i.e., how content will bridge the space between audience needs and business requirements);
  • content gap analysis;
  • metadata and taxonomy frameworks and related content attributes;
  • search engine optimization (SEO);
  • implications of strategic recommendations on content creation, publication, and governance. 
Content strategy may include:

Editorial strategy 

Editorial strategy defines the guidelines by which all online content is governed: values, voice, tone, legal and regulatory concerns, user-generated content, and so on. This practice also defines an organization’s online editorial calendar, including content life cycles.

Web writing

Web writing is the practice of writing useful, usable content specifically intended for online publication. This is a whole lot more than smart copywriting. An effective web writer must understand the basics of user experience design, be able to translate information architecture documentation, write effective metadata, and manage an ever-changing content inventory.

Metadata and taxonomy strategy

Metadata and taxonomy strategy identifies the type and structure of metadata, also known as “data about data” (or content) and taxonomy. Smart, well-structured metadata helps publishers to identify, organize, use, and reuse content in ways that are meaningful to key audiences.

Search engine optimization

Search engine optimization is the process of editing and organizing the content on a page or across a web site (including metadata) to increase its potential relevance to specific search engine keywords.

Content management strategy

Content management strategy defines the technologies needed to capture, store, deliver, and preserve an organization’s content. Publishing infrastructures, content life cycles and workflows are key considerations of this strategy.

Content channel distribution strategy

Content channel distribution strategy defines how and where content will be made available to users.

The content life cycle is a repeatable system that governs the management of content. The processes within a given content lifecycle are system-agnostic. The processes are established as part of a content strategy, and implemented during the content life cycle.

Aspects of a content life cycle 

The content life cycle covers four macro stages: the strategic analysis, the content collection, management of the content, and publishing, which includes publication and post-publication activities. 

The content lifecycle is in effect whether the content is controlled within a content management system or not, whether it gets translated or not, whether it gets deleted at the end of its life or revised and re-used.

The analysis quadrant comprises the content strategy. The other three quadrants are more tactical in nature, focusing on the implementation of the content strategy.

Analysis 

In the analysis phase, the content life cycle is concerned with the strategic aspects of content. A content strategist (or business analyst or information architect or writer) examines the need for various types of content within the context of both the business and of the content consumers, and for multiple outputs on multiple platforms.

The analysis has a bearing on how the content strategy is implemented in the other quadrants of the content life cycle. On a new project with new content, this is the beginning of the process. Much of the time, the process will start somewhere else in the cycle; a lot depends on a multitude of factors involved in changing content from a current state to its future state.

Collection 

Content collection includes the garnering of content for use within the framework set out in the analysis phase. Collection may happen through content development (creating content or editing the content of others), content ingestion (syndicating content from other sources or incorporating localized content), or a hybrid of content integration and convergence, such as integrating product descriptions from an outside organization with prices from a costing system, or converging editorial and user-generated content from social media for simultaneous display.

Publish 

The publishing quadrant deals with the aspects of content that happen when the content is delivered to its output platform and ensuing transformations, manipulations, or uses of the content. Publishing the content is only a point in the first life cycle iteration; there are post-publishing considerations such as re-use and retention policies that require attention.

Management 

The management quadrant is concerned with the efficient and effective use of content. In organizations using technology to automate the management of content, the management aspect assumes use of a content management system (CMS) of some sort. In organizations with smaller amounts of content, with little need for workflow control, and virtually no single-sourcing requirements, manual management is possible.

However, in large enterprises, there is too much content, and there are too many variations of content output, to manage the content without some sort of system to automate whatever functions can be automated. The content configuration potential is enormous, and builds on the information gathered during the analysis and collection phases.

The solutions will be highly situational, and revolve around the inputs and outputs, the required content variables, the complexity of the publishing pipeline, and the technologies in play. The most basic questions are around adoption of standards and technologies, and determining components, content granularity, and how far up or down the publishing pipeline to implement specific techniques.

At its core, content strategy is a way of thinking that has direct impact on the way we do business. And the way we do business must include a clear focus on how we create, deliver, and govern our content. Because more than ever before, content has become one of the most valuable business assets.

Monday, December 5, 2011

CMS Types

There are three major types of CMS: offline processing, online processing, and hybrid systems. These terms describe the deployment pattern for the CMS in terms of when presentation templates are applied to render content output from structured content.

Offline processing

These systems pre-process all content, applying templates before publication to generate Web pages. Since pre-processing systems do not require a server to apply the templates at request time, they may also exist purely as design-time tools.

Online processing

These systems apply templates on-demand. HTML may be generated when a user visits the page or pulled from a cache. Most open source CMS have the capability to support add-ons, which provide extended capabilities including forums, blog, wiki, web stores, photo galleries, contact management, etc. These are often called modules, nodes, widgets, add-ons, or extensions. Add-ons may be based on an open-source or paid license model. Different CMS have significantly different feature sets and target audiences.

Hybrid systems

Some systems combine the offline and online approaches. Some systems write out executable code (e.g., JSP, ASP, PHP, ColdFusion, or Perl pages) rather than just static HTML, so that the CMS itself does not need to be deployed on every web server. Other hybrids operate in either an online or offline mode.
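The offline/online distinction boils down to when the template is applied. A toy Python sketch, with a made-up template and page, showing pre-rendering versus render-on-request with a cache:

```python
# A trivial presentation template; real CMS templates are far richer.
TEMPLATE = "<html><body><h1>{title}</h1><p>{body}</p></body></html>"

pages = {"about": {"title": "About Us", "body": "We manage content."}}

# Offline processing: apply templates ahead of publication and serve
# the resulting static files; no server-side rendering at request time.
prebuilt = {name: TEMPLATE.format(**content) for name, content in pages.items()}

# Online processing: apply the template when a user requests the page,
# optionally keeping the rendered HTML in a cache for later requests.
cache = {}

def serve(name):
    if name not in cache:                      # cache miss: render now
        cache[name] = TEMPLATE.format(**pages[name])
    return cache[name]                         # cache hit: reuse rendered HTML
```

Either path yields the same HTML; the trade-off is build-time work and simple hosting (offline) versus freshness and per-request flexibility (online).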

There are literally thousands of content management systems available on the internet, each catering to different users and offering a different set of features.

There are four main types of content management systems that each of the thousands fall under. The systems include:

1) Homegrown

2) Commercial

3) High-end

4) Open Source

Homegrown content management systems are built by a development company for use with its own products. Consequently, every aspect of the system is tailored to the company's specific needs, since it is the only organization using it. The main issue is that the organization depends on a single vendor, itself, for bug fixes and patches.

The second type is commercial content management systems. This is the most widespread type, offering many different pricing options, plans, and features. Unlike homegrown systems, these are rarely customizable.

The third type is high-end content management systems. One nice feature is their reliability as high-end content management systems deliver robust solutions.

The final content management system type is open source, which essentially means the software is available to anyone for free. The primary advantages are the price (free!) and full customizability, since the source code is open. The main limitation is product quality: open source systems can lack the stability, security, and infrastructure support you would expect from commercial software.

The selection of the type of content management system is based on a customer's need. If they are a company that needs customization and price is not an issue, a homegrown system or high-end system might be the most viable option.

On the other hand, if a price tag is problematic and customization is not important then a commercial content management system is the best choice. Finally, if the consumer is not concerned with stability, security and support and likes the price tag and customization options, open source is the way to go.

Like any type of product, you get what you pay for. Those that have the money will purchase the best content management system, those that do not will function with a potentially unreliable product. Either way, the selection is based on an individual need and due to the availability of thousands of content management systems, there are many options out there.

Friday, December 2, 2011

Content Management Systems (CMS)

In order to manage content efficiently, a content management system is required. A CMS is a tool that enables a variety of (centralised) technical and (de-centralised) non-technical staff to create, edit, manage, and finally publish (in a number of formats) a variety of content (such as text, graphics, video, and documents), whilst being constrained by a centralised set of rules, processes, and workflows that ensure coherent, validated electronic content.

A Content Management System (CMS) has the following benefits:

  1. allows a large number of people to contribute to and share stored data;
  2. increases the ability to collaborate;
  3. facilitates document control, auditing, editing, and timeline management;
  4. controls access to data based on user roles, which define what information each user can view or edit;
  5. aids in easy storage and retrieval of data;
  6. reduces repetitive, duplicate input;
  7. documents workflow tasks, coupled with messaging that allows for formal review and approval of documents;
  8. tracks and manages multiple versions of a single instance of content.

Content management systems come in all shapes and sizes. The best-known and most widely used commercial CMSs are Microsoft SharePoint, Interwoven, Vignette, Documentum, Livelink, and the Oracle ECM suite. There are also open source CMSs; the best known are Alfresco, Drupal, and Joomla.

One type of CMS is the web content management system (WCMS), which is used to control a dynamic collection of web material, including HTML documents, images, and other forms of media.

A CMS typically has the following features:

Automated templates

Create standard output templates (usually HTML and XML) that can be automatically applied to new and existing content, allowing the appearance of all content to be changed from one central place.
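The idea of applying one central template to every content item can be sketched in a few lines. The sketch below uses Python's standard `string.Template`; the template markup and content fields are illustrative, not any particular CMS's API.

```python
from string import Template

# One central template: changing it changes the appearance of all pages.
# (Hypothetical page structure, for illustration only.)
PAGE_TEMPLATE = Template(
    "<html><head><title>$title</title></head>"
    "<body><h1>$title</h1><div>$body</div></body></html>"
)

def render(item):
    """Apply the central template to a single content item."""
    return PAGE_TEMPLATE.substitute(title=item["title"], body=item["body"])

# New and existing content alike pass through the same template.
pages = [
    {"title": "About", "body": "We build things."},
    {"title": "News", "body": "Version 2.0 released."},
]
html_pages = [render(p) for p in pages]
```

Because presentation lives only in `PAGE_TEMPLATE`, a redesign touches one place rather than every stored document.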

Access Control

CMS systems support user groups, which let you control how registered users interact with the site. A page can be restricted to one or more groups. This means an anonymous user (someone not logged on), or a logged-on user who is not a member of a group the page is restricted to, will be denied access to that page.
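The group restriction rule can be captured in a single check. This is a hypothetical sketch, not any CMS's real API: a page lists the groups allowed to view it, and a user with no overlapping group (including an anonymous user, who belongs to no groups) is denied.

```python
def can_view(user_groups, page_groups):
    """Return True if a user with the given groups may view the page.

    user_groups: set of group names the user belongs to (empty = anonymous).
    page_groups: set of groups the page is restricted to (empty = public).
    """
    if not page_groups:
        return True  # unrestricted page: visible to everyone
    # restricted page: visible only if the user shares at least one group
    return bool(user_groups & page_groups)
```

For example, `can_view(set(), {"editors"})` is `False` (anonymous user, restricted page), while `can_view({"editors", "staff"}, {"editors"})` is `True`.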

Scalable expansion

Most modern CMSs can expand a single implementation (one installation on one server) across multiple domains, depending on the server's settings. CMS sites may also be able to create microsites/web portals within a main site.

Easily editable content

Once content is separated from the visual presentation of a site, it usually becomes much easier and quicker to edit and manipulate. Most CMS software includes WYSIWYG editing tools allowing non-technical individuals to create and edit content.

Scalable feature sets

Most CMS software includes plug-ins or modules that can be easily installed to extend an existing site's functionality.

Web standards upgrades

Active CMS software usually receives regular updates that include new feature sets and keep the system up to current web standards.

Workflow management

Workflow is the process of creating cycles of sequential and parallel tasks that must be accomplished in the CMS. For example, one or many content creators can submit a story, but it is not published until the copy editor cleans it up and the editor-in-chief approves it.
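The story example above is essentially a small state machine: each transition is allowed only for a particular role. A minimal sketch, with illustrative state and role names (not drawn from any real CMS):

```python
# Allowed transitions: (current state, acting role) -> next state.
TRANSITIONS = {
    ("draft", "copy_editor"): "copyedited",        # copy editor cleans it up
    ("copyedited", "editor_in_chief"): "published", # editor-in-chief approves
}

def advance(state, role):
    """Return the next workflow state, or raise if the move is not allowed."""
    try:
        return TRANSITIONS[(state, role)]
    except KeyError:
        raise PermissionError(f"{role!r} cannot advance a {state!r} story")
```

A submitted story thus cannot reach `"published"` without passing through the copy editor first: `advance("draft", "editor_in_chief")` raises `PermissionError`.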

Collaboration

CMS software may act as a collaboration platform, allowing content to be retrieved and worked on by one or many authorized users. Changes can be tracked and either authorized for publication or rejected, reverting to older versions. Other advanced forms of collaboration allow multiple users to modify (or comment on) a page at the same time in a collaboration session.

Delegation

Some CMS software allows for various user groups to have limited privileges over specific content on the website, spreading out the responsibility of content management.

Document management

CMS software may provide a means of collaboratively managing the life cycle of a document from initial creation time, through revisions, publication, archive, and document destruction.

Content virtualization

CMS software may provide a means of allowing each user to work within a virtual copy of the entire web site, document set, and/or code base. This enables changes to multiple interdependent resources to be viewed and/or executed in-context prior to submission.

Content syndication

CMS software often assists in content distribution by generating RSS and Atom data feeds for other systems. It may also e-mail users when updates are available as part of the workflow process.
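Feed generation is simple to sketch: the CMS serializes its recent items into a small XML document that other systems poll. The sketch below builds a minimal RSS 2.0 feed with Python's standard library; the item fields are illustrative.

```python
import xml.etree.ElementTree as ET

def build_rss(channel_title, items):
    """Serialize content items as a minimal RSS 2.0 feed string.

    items: list of dicts with "title" and "link" keys (illustrative schema).
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["link"]
    return ET.tostring(rss, encoding="unicode")

feed = build_rss("Site News", [
    {"title": "Hello", "link": "http://example.com/hello"},
])
```

A real feed would also carry dates, descriptions, and GUIDs per the RSS 2.0 specification, but the principle is the same: content stored once in the CMS is re-serialized for syndication.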

Multilingual

The ability to display content in multiple languages.

Versioning

A CMS supports versioning: documents are checked in and out of the system, allowing authorized editors to retrieve previous versions and to continue work from a selected point. Versioning is useful for content that changes over time and requires updating, when it may be necessary to go back to or reference a previous copy.
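The check-in/retrieve cycle can be sketched as a document that keeps its full version history. This is an illustrative model, not a real CMS interface:

```python
class VersionedDocument:
    """Keeps every checked-in revision; version numbers start at 1."""

    def __init__(self, initial_text):
        self._versions = [initial_text]

    def check_in(self, text):
        """Store a new revision and return its version number."""
        self._versions.append(text)
        return len(self._versions)

    def check_out(self, version=None):
        """Retrieve a given version (1-based), or the latest by default."""
        if version is None:
            version = len(self._versions)
        return self._versions[version - 1]

doc = VersionedDocument("First draft.")
doc.check_in("Second draft, revised intro.")
# The latest copy and any earlier copy remain retrievable:
latest, original = doc.check_out(), doc.check_out(1)
```

An editor can thus continue from `check_out(1)` and check the result in as a new version, exactly the "continue work from a selected point" behavior described above.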

Next time: CMS Types