Thursday, December 8, 2011

Information Architecture

Information architecture is defined by the Information Architecture Institute as the art and science of organizing and labeling web sites, intranets, online communities, and software to support findability and usability.

Information architecture is the term used to describe the structure of a system: the way information is grouped, the navigation methods, and the terminology used within the system. An effective information architecture enables people to step logically through a system, confident they are getting closer to the information they require. Information architecture is most commonly associated with websites, intranets, and content management systems, but it can be applied to any information structure or computer system.

Information architecture involves categorizing information into a coherent structure, preferably one that the intended audience can understand quickly, if not intuitively, so that they can easily retrieve the information they are searching for. The organizing structure is usually hierarchical.
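To make the hierarchical idea concrete, here is a minimal sketch in Python of a small site's structure, with a helper that lists every navigation path a visitor could step through (all labels are invented for illustration):

```python
# A minimal sketch of a hierarchical information architecture.
# The site labels are invented for illustration.
site = {
    "Products": {
        "Hardware": ["Laptops", "Monitors"],
        "Software": ["Operating systems", "Applications"],
    },
    "Support": {
        "Documentation": ["User guides", "FAQs"],
        "Contact": ["Email", "Phone"],
    },
}

def paths(tree, trail=()):
    """Yield every navigation path from the root to a leaf page."""
    if isinstance(tree, dict):
        for label, subtree in tree.items():
            yield from paths(subtree, trail + (label,))
    else:  # a list of leaf pages
        for leaf in tree:
            yield trail + (leaf,)

for p in paths(site):
    print(" > ".join(p))
# e.g. Products > Hardware > Laptops
```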

Organizing functionality and content into a structure that people are able to navigate intuitively doesn’t happen by chance. Organizations must recognize the importance of information architecture or else they run the risk of creating great content and functionality that no one can ever find. Most people only notice information architecture when it is poor and stops them from finding the information they require.

An effective information architecture comes from understanding business objectives and constraints, the content, and the requirements of the people that will use the site.

Information architecture is often described in terms of three overlapping dimensions: business context, content, and users.

Business Context

Understanding an organization's business objectives, politics, culture, technology, resources and constraints is essential before considering development of the information architecture. Techniques for understanding context include:
  • Reading existing documentation: mission statements, organization charts, previous research, and vision documents are a quick way of building up an understanding of the context in which the system must work.
  • Stakeholder interviews: speaking to stakeholders provides valuable insight into business context and can unearth previously unknown objectives and issues.

Content

The most effective method for understanding the quantity and quality of content (i.e. functionality and information) proposed for a system is to conduct a content inventory. Content inventories identify all of the proposed content for a system, where the content currently resides, who owns it and any existing relationships between content. Content inventories are also commonly used to aid the process of migrating content between the old and new systems.
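As a rough sketch, a content inventory can start life as a flat list of records; the fields below mirror the attributes just mentioned (all values are invented):

```python
# A minimal content inventory: one record per piece of content.
# Field names follow the attributes described above; values are invented.
inventory = [
    {
        "id": "C-001",
        "title": "Returns policy",
        "location": "intranet/policies/returns.html",
        "owner": "Customer Services",
        "related": ["C-002"],  # existing relationships between content
    },
    {
        "id": "C-002",
        "title": "Refund request form",
        "location": "intranet/forms/refund.pdf",
        "owner": "Finance",
        "related": ["C-001"],
    },
]

# A typical inventory question: which content does each owner hold?
by_owner = {}
for item in inventory:
    by_owner.setdefault(item["owner"], []).append(item["title"])
print(by_owner)
```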

Users 

An effective information architecture must reflect the way people think about the subject matter. Techniques for getting users involved in the creation of an information architecture include card sorting and card-based classification evaluation.

Card sorting involves representative users sorting a series of cards, each labelled with a piece of content or functionality, into groups that make sense to them. Card sorting generates ideas for how information could be grouped and labelled.
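A common way to analyse card-sort results is a co-occurrence count: how often each pair of cards landed in the same group across participants. A minimal sketch with invented data:

```python
from collections import Counter
from itertools import combinations

# Each participant's sort: groups of card labels (invented data).
sorts = [
    [{"Invoices", "Receipts"}, {"Holidays", "Sick leave"}],
    [{"Invoices", "Receipts", "Expenses"}, {"Holidays", "Sick leave"}],
    [{"Invoices", "Expenses"}, {"Receipts"}, {"Holidays", "Sick leave"}],
]

# Count how often each pair of cards was placed in the same group.
together = Counter()
for sort in sorts:
    for group in sort:
        for a, b in combinations(sorted(group), 2):
            together[(a, b)] += 1

for pair, n in together.most_common():
    print(pair, n)
# Pairs with high counts are strong candidates for grouping in the IA.
```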

Card-based classification evaluation is a technique for testing an information architecture before it has been implemented. The technique involves writing each level of an information architecture on a large card, and developing a set of information-seeking tasks for people to perform using the architecture.

More about information architecture next time...

Wednesday, December 7, 2011

Engineering Change Process

Let's look at the engineering change process and how an engineering change order (ECO) is used in this process.

The stages of the engineering change process are listed below (a short sketch of the overall flow follows the list):

1. Issue identification & scoping:

Someone identifies a problem or issue and determines that it may require a change. The scope of the issue and its possible impact are estimated.

2. ECR creation:

An engineering change request (ECR) is created to examine the necessity and feasibility of the change, to identify parts, components and documentation that might be affected, to estimate costs and to list the resources required to implement the change.

3. ECR review:

The ECR is circulated for review and discussion among key stakeholders and is modified as needed.

4. ECO creation:

Once the ECR is approved, an engineering change order (ECO) is generated, which lists the items, assemblies and documentation being changed and includes any updated drawings, CAD files, standard operating procedures (SOPs) or manufacturing work instructions (MWIs) required to make a decision about the change.

5. ECO review:

The ECO is then circulated to a change review board made up of all stakeholders (including external partners when appropriate) who need to approve the change.

6. ECN circulation:

Once the ECO has been approved, an engineering change notification/notice (ECN) is sent to affected individuals to let them know that the ECO has been approved and the change should now be implemented.

7. Change implementation:

Those responsible for implementation use the information in the ECO and ECN to make the requested change. While an engineering change order is used for changes that are executed by engineering, other types of change orders may be used by other departments. These include:

• Manufacturing change order (MCO) — A change order describing modifications to the manufacturing process or equipment.

• Document change order (DCO) — A change order detailing modifications to documents, specifications or SOPs.
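Taken together, the stages above form an essentially linear pipeline with two approval gates (ECR review and ECO review). Here is a minimal sketch of that flow; the class and stage names are invented for illustration and are not taken from any particular PLM tool:

```python
from enum import Enum, auto

class Stage(Enum):
    ISSUE_IDENTIFIED = auto()
    ECR_CREATED = auto()
    ECR_APPROVED = auto()
    ECO_CREATED = auto()
    ECO_APPROVED = auto()
    ECN_CIRCULATED = auto()
    IMPLEMENTED = auto()

# Allowed transitions: each stage may only advance to the next one.
NEXT = {
    Stage.ISSUE_IDENTIFIED: Stage.ECR_CREATED,
    Stage.ECR_CREATED: Stage.ECR_APPROVED,
    Stage.ECR_APPROVED: Stage.ECO_CREATED,
    Stage.ECO_CREATED: Stage.ECO_APPROVED,
    Stage.ECO_APPROVED: Stage.ECN_CIRCULATED,
    Stage.ECN_CIRCULATED: Stage.IMPLEMENTED,
}

class Change:
    def __init__(self, description):
        self.description = description
        self.stage = Stage.ISSUE_IDENTIFIED
        self.approvals = []  # names of stakeholders who signed off

    def advance(self, approver=None):
        if self.stage not in NEXT:
            raise ValueError("Change already implemented")
        # The review gates (ECR and ECO approval) require a named approver.
        if NEXT[self.stage] in (Stage.ECR_APPROVED, Stage.ECO_APPROVED):
            if approver is None:
                raise ValueError("Approval gate requires an approver")
            self.approvals.append(approver)
        self.stage = NEXT[self.stage]

change = Change("Replace connector X with connector Y")
change.advance()                     # ECR created
change.advance(approver="J. Chen")   # ECR approved
change.advance()                     # ECO created
change.advance(approver="Review board")
change.advance()                     # ECN circulated
change.advance()                     # implemented
print(change.stage, change.approvals)
```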

ECO benefits

While you may groan at the prospect of pulling together another set of documentation, an ECO is a critical part of keeping product development on track and making sure product information is accurate. A good ECO contains the full description, analysis, cost and impact of a change, and a good ECO process ensures that all stakeholders have bought in to the change. Having an organized method of handling product changes reduces potential design, manufacturing and inventory errors, minimizes development delays and makes it easy to get input from different departments, key suppliers and contract manufacturers.

Following good ECO practices also makes it easy to document a full history of what changes have been made to a product and when they occurred. In industries with regulatory requirements, like the medical device industry, having a full history of every change to a product is mandatory. Depending on the industry, change orders and even the change process itself may be audited by a regulatory body. Keeping a record of product changes will also help you debug any problems that occur after your product launches. The task of identifying and fixing the root cause of any problem is easier when you have a complete product change history.

Without a clear ECO process in place, making a change to a product can set off a chain of costly, time-consuming and avoidable events. Take a part switch that happens late in the development process. Engineering may tell manufacturing to be aware of the new part, but if that information is never conveyed to the purchasing department, the old part will be ordered. When the components arrive, manufacturing will not be able to assemble the product, and its launch will be delayed until the new part is obtained (most likely with some rush charges incurred along the way).

Engineering change orders make it possible to accurately identify, address and implement product changes while keeping all key stakeholders in the loop and maintaining a historical record of your product. Without them, miscommunications occur that lead to delays, incorrect purchase orders and improper product builds.

Companies need to be able to adapt quickly in today’s constantly changing environment, and often that means making changes to their products. Engineers make modifications during development and production with the intent of adding functionality, improving manufacturing performance or addressing the availability of a particular part.

To make sure proposed changes are appropriately reviewed, a solid process is critical, especially if members of your product team are scattered across multiple locations (for instance, design engineers in Boston, the manufacturing team in St. Louis and component manufacturers all over the world). At the heart of a solid change process is the engineering change order.

Engineering Change Orders: Paper-Based vs. Electronic Documentation Systems


Cycle Time

Paper ECO:
• The ECO is generally reviewed one person at a time.
• If multiple copies are distributed, edits must be consolidated and reviewed again.
• A paper ECO can be misplaced.
• ECO review can be a long process (weeks).

Electronic ECO:
• The ECO can be reviewed by many people at once.
• All edits are made to a single version, so no consolidation is needed.
• The ECO is always available online.
• ECO review is a significantly shorter process (days).

Signature Process

Paper ECO:
• Early approvers won't be aware of later edits, necessitating additional rounds of review.
• Official approval disappears if the ECO file is lost.
• It is harder to maintain a clean, complete history of changes.

Electronic ECO:
• All approvers sign off on the same set of documentation.
• Electronic signatures are 21 CFR Part 11 compliant, a requirement for the medical device industry.
• A clean history is maintained automatically for audits.

Issue Resolution

Paper ECO:
• Individuals need to be tracked down to resolve problems.
• You may need to wait for a change control review board meeting to connect with other approvers.

Electronic ECO:
• Approvers' comments can be viewed, so hold-ups can be quickly resolved.
• It is easy to see who hasn't signed and to request approval electronically.

Package Format

Paper ECO:
• A large paper file of documents and drawings must be printed.
• It is tedious and labor-intensive to pull together information from many locations.

Electronic ECO:
• Electronic documentation is environmentally friendly.
• The ECO is easy to create and access when managed in the same system as the underlying product information.

Tuesday, December 6, 2011

Document Control

The primary purpose of document control is to ensure that only current documents, not superseded ones, are used to perform work, and that obsolete versions are removed. Document control also ensures that current documents are approved by persons who are competent and responsible for the specific job, and that documents are distributed to the places where they are used. In regulated industries, this function is mandatory.

If a company is ISO 9001 compliant, it has document control in place. Document control is part of ISO 9001 and GMP/GxP requirements.

There are four steps in the document control system:

1. Define the scope of the document control system (which documents must be controlled).

2. Develop a document authorization or approval system.

3. Ensure that only current versions of documents are available at the places where they are used.

4. Out-of-date documents that need to be archived must be suitably identified.

Let's look at these steps more closely.

1. Identify which documents need to be controlled

The first task is to identify which documents need to be controlled.

The most important rule of document control is that only current documents must be used for work. You may find that other documents are less vital and it may not be worth the effort of controlling them.

You may even decide to have different levels of document control for different types of documents, e.g. formal approval and controlled distribution for procedures and work instructions, controlled distribution for your list of legal requirements, etc.

2. Define a document approval system

An approval/authorization system ensures that distributed documents are appropriate for the persons receiving them. A responsible person must approve these documents. Approval can take two forms: on paper documents it will be a signature; on electronic documents it will be either an electronic signature, or documents may only be published once they are approved. If there is no electronic signature, approval can usually be verified through the electronic workflow the document went through to get approved.

When a new document is created or a document goes through a change, an engineering change order (ECO) is used to document and approve the creation or the changes. It may also be called an engineering change notice (ECN) or document change notice (DCN). The ECO outlines the proposed change, lists the product or part(s) that would be affected, and requests review and approval from the individuals who would be impacted or charged with implementing the change. ECOs are used to make modifications to components, assemblies, associated documentation and other types of product information.

The change process starts when someone identifies an issue that may need to be addressed with a change to the product. It ends when the agreed-upon change is implemented. ECOs are used in between to summarize the modifications, finalize the details, and obtain all necessary approvals.

3. Assure appropriate distribution

During this step, you need to make sure that everyone who needs the document gets a copy.

Distribution may be physical (paper documents) or electronic. When posting a document on an intranet or another electronic system, ensure that everybody who needs the new document knows about the posting (e.g. through email or workflow notifications). When distribution is physical, documents need to be stamped to identify them as controlled documents, and a user of the document needs to verify that it is the most current version before starting work.

An inventory of controlled documents should be created with the exact location of each controlled document.

4. Remove old and obsolete documents

This is easy if you use an electronic documentation management system but is more complicated with paper documents.

You may request the receiver of new documents to send back obsolete ones. If for some reason you need to retain obsolete versions of documents, they need to be marked to avoid unintended use. Many organizations use a stamp: "obsolete document".
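Pulling the four steps together: a document-control record essentially tracks, for each controlled document, its current approved version, its status, and where controlled copies live. A minimal sketch, with no particular document management product implied:

```python
from dataclasses import dataclass, field

@dataclass
class ControlledDocument:
    doc_id: str
    title: str
    version: int = 1
    status: str = "draft"    # draft -> approved -> obsolete
    approved_by: str = ""
    locations: list = field(default_factory=list)  # where copies are held

registry = {}

def approve(doc, approver):
    """Step 2: only a responsible person may approve a document."""
    doc.status = "approved"
    doc.approved_by = approver
    registry[doc.doc_id] = doc

def distribute(doc_id, location):
    """Step 3: record exactly where each controlled copy lives."""
    registry[doc_id].locations.append(location)

def supersede(doc_id, approver):
    """Step 4: a new version makes the old one obsolete."""
    old = registry[doc_id]
    old.status = "obsolete"   # marked to avoid unintended use
    new = ControlledDocument(doc_id, old.title, version=old.version + 1)
    approve(new, approver)
    new.locations = list(old.locations)  # same places need the new copy
    return new

sop = ControlledDocument("SOP-12", "Cleaning procedure")
approve(sop, "QA manager")
distribute("SOP-12", "Lab A noticeboard")
supersede("SOP-12", "QA manager")
print(registry["SOP-12"].version, registry["SOP-12"].status)  # 2 approved
```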

Next time: I am going to talk more about ECOs and the engineering change process.

Monday, December 5, 2011

CMS Types

There are three major types of CMS: offline processing, online processing, and hybrid systems. These terms describe the deployment pattern for the CMS in terms of when presentation templates are applied to render content output from structured content.

Offline processing

These systems pre-process all content, applying templates before publication to generate Web pages. Since pre-processing systems do not require a server to apply the templates at request time, they may also exist purely as design-time tools.

Online processing

These systems apply templates on-demand. HTML may be generated when a user visits the page or pulled from a cache. Most open source CMS have the capability to support add-ons, which provide extended capabilities including forums, blog, wiki, web stores, photo galleries, contact management, etc. These are often called modules, nodes, widgets, add-ons, or extensions. Add-ons may be based on an open-source or paid license model. Different CMS have significantly different feature sets and target audiences.

Hybrid systems

Some systems combine the offline and online approaches. Some systems write out executable code (e.g., JSP, ASP, PHP, ColdFusion, or Perl pages) rather than just static HTML, so that the CMS itself does not need to be deployed on every web server. Other hybrids operate in either an online or offline mode.
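The difference between offline and online processing comes down to when the template is applied. A toy sketch (the template and content are invented):

```python
TEMPLATE = "<html><body><h1>{title}</h1><p>{body}</p></body></html>"

content = {
    "about": {"title": "About us", "body": "We make widgets."},
    "news": {"title": "News", "body": "New widget released."},
}

def render(page):
    return TEMPLATE.format(**content[page])

# Offline processing: apply templates to everything before publication,
# then serve only the resulting static files.
published = {page: render(page) for page in content}

# Online processing: apply the template when a page is requested,
# optionally keeping a cache of already-rendered pages.
cache = {}
def serve(page):
    if page not in cache:
        cache[page] = render(page)   # rendered on first visit
    return cache[page]

print(published["about"] == serve("about"))  # same output, different timing
```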

There are literally thousands of content management systems available on the internet. Each caters to different users and offers its own mix of features.

There are four main types of content management systems that each of the thousands fall under. The systems include:

1) Homegrown

2) Commercial

3) High-end

4) Open Source

Homegrown content management systems are built by a single organization for its own use. Consequently, every aspect of the system is catered to that organization's specific needs, since it is the only one utilizing it. The main issue is that the organization relies on a single vendor to fix bugs and create patches.

The second type is commercial content management systems. This is the most widespread offering many different pricing options, plans and features. Unlike homegrown systems, these are rarely customizable.

The third type is high-end content management systems. Their chief appeal is reliability: high-end content management systems deliver robust solutions.

The final type of content management system is open source. This essentially means that the software is available to anyone for free. The primary advantages are the price (free!) and that these systems are fully customizable, since the source code is open. The main limitation is the quality of the product: open source systems often lack the stability, security, and infrastructure support you would expect from commercial software.

The selection of the type of content management system is based on a customer's need. If they are a company that needs customization and price is not an issue, a homegrown system or high-end system might be the most viable option.

On the other hand, if a price tag is problematic and customization is not important then a commercial content management system is the best choice. Finally, if the consumer is not concerned with stability, security and support and likes the price tag and customization options, open source is the way to go.

Like any type of product, you get what you pay for. Those that have the money will purchase the best content management system, those that do not will function with a potentially unreliable product. Either way, the selection is based on an individual need and due to the availability of thousands of content management systems, there are many options out there.

Friday, December 2, 2011

Content Management Systems (CMS)

In order to efficiently manage content, a content management system is required. A CMS is a tool that enables a variety of (centralised) technical and (de-centralised) non-technical staff to create, edit, manage and finally publish (in a number of formats) a variety of content (such as text, graphics, video, and documents), whilst being constrained by a centralised set of rules, processes and workflows that ensure coherent, validated electronic content.

A Content Management System (CMS) has the following benefits:

  1. allows a large number of people to contribute to and share stored data;
  2. increases the ability to collaborate;
  3. facilitates document control, auditing, editing, and timeline management;
  4. controls access to data based on user roles (user roles define what information each user can view or edit);
  5. aids in easy storage and retrieval of data;
  6. reduces repetitive duplicate input;
  7. documents workflow tasks, coupled with messaging, to allow formal review and approval of documents;
  8. tracks and manages multiple versions of a single instance of content.

Content management systems come in all shapes and sizes. The best known and most widely used are Microsoft SharePoint, Interwoven, Vignette, Documentum, Livelink, and the Oracle ECM suite. There are also open source CMSs; the best known are Alfresco, Drupal, and Joomla.

One type of CMS is the web content management system (WCMS). It can be used to control a dynamic collection of web material, including HTML documents, images, and other forms of media.

A CMS typically has the following features:

Automated templates

Create standard output templates (usually HTML and XML) that can be automatically applied to new and existing content, allowing the appearance of all content to be changed from one central place.

Access Control

CMS systems support user groups, which control how registered users interact with the site. A page on the site can be restricted to one or more groups. This means that an anonymous user (someone not logged on), or a logged-on user who is not a member of a group the page is restricted to, will be denied access to the page.
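As a sketch, that access check reduces to a set intersection between the user's groups and the page's allowed groups (the group and page names below are invented):

```python
# Pages restricted to one or more groups; an empty set means public.
page_groups = {
    "/": set(),
    "/staff/handbook": {"employees"},
    "/admin/settings": {"administrators"},
}

def can_view(user_groups, page):
    """Anonymous users have no groups and can see only public pages."""
    required = page_groups[page]
    return not required or bool(user_groups & required)

print(can_view(set(), "/"))                        # anonymous: True
print(can_view(set(), "/staff/handbook"))          # anonymous: False
print(can_view({"employees"}, "/staff/handbook"))  # True
print(can_view({"employees"}, "/admin/settings"))  # False
```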

Scalable expansion

Available in most modern CMSs is the ability to expand a single implementation (one installation on one server) across multiple domains, depending on the server's settings. CMS sites may be able to create microsites/web portals within a main site as well.

Easily editable content

Once content is separated from the visual presentation of a site, it usually becomes much easier and quicker to edit and manipulate. Most CMS software includes WYSIWYG editing tools allowing non-technical individuals to create and edit content.

Scalable feature sets

Most CMS software includes plug-ins or modules that can be easily installed to extend an existing site's functionality.

Web standards upgrades

Active CMS software usually receives regular updates that include new feature sets and keep the system up to current web standards.

Workflow management

Workflow is the process of creating cycles of sequential and parallel tasks that must be accomplished in the CMS. For example, one or many content creators can submit a story, but it is not published until the copy editor cleans it up and the editor-in-chief approves it.

Collaboration

CMS software may act as a collaboration platform, allowing content to be retrieved and worked on by one or many authorized users. Changes can be tracked and authorized for publication, or rejected, reverting to older versions. Other advanced forms of collaboration allow multiple users to modify (or comment on) a page at the same time in a collaboration session.

Delegation

Some CMS software allows for various user groups to have limited privileges over specific content on the website, spreading out the responsibility of content management.

Document management

CMS software may provide a means of collaboratively managing the life cycle of a document from initial creation time, through revisions, publication, archive, and document destruction.

Content virtualization

CMS software may provide a means of allowing each user to work within a virtual copy of the entire web site, document set, and/or code base. This enables changes to multiple interdependent resources to be viewed and/or executed in-context prior to submission.

Content syndication

CMS software often assists in content distribution by generating RSS and Atom data feeds to other systems. They may also e-mail users when updates are available as part of the workflow process.

Multilingual

Ability to display content in multiple languages.

Versioning

A CMS supports versioning, by which documents are checked into or out of the CMS, allowing authorized editors to retrieve previous versions and to continue work from a selected point. Versioning is useful for content that changes over time and requires updating, and it may be necessary to go back to or reference a previous copy.
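A minimal sketch of check-out/check-in versioning, not modelled on any specific CMS's API: each check-in appends a new version, and earlier versions remain retrievable:

```python
class VersionedDocument:
    def __init__(self, name, text):
        self.name = name
        self.versions = [text]   # version 1 is at index 0
        self.checked_out_by = None

    def check_out(self, editor):
        if self.checked_out_by:
            raise ValueError(f"Locked by {self.checked_out_by}")
        self.checked_out_by = editor
        return self.versions[-1]        # work starts from the latest copy

    def check_in(self, editor, new_text):
        if self.checked_out_by != editor:
            raise ValueError("Check the document out first")
        self.versions.append(new_text)  # previous versions are kept
        self.checked_out_by = None

    def get(self, version):
        """Retrieve any previous version to reference or continue from."""
        return self.versions[version - 1]

doc = VersionedDocument("press-release", "Draft text")
draft = doc.check_out("alice")
doc.check_in("alice", draft + " with final quote")
print(doc.get(1))   # the original draft is still available
print(doc.get(2))   # the latest version
```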

Next time: CMS Types

Thursday, December 1, 2011

What is DITA?

The Darwin Information Typing Architecture (DITA) is an XML-based architecture for authoring, producing, and delivering information. Although its main applications have so far been in technical publications, DITA is also used for other types of documents such as policies and procedures. The DITA architecture and a related DTD and XML Schema were originally developed by IBM. The architecture incorporates ideas in XML architecture, such as modular information architecture, various features for content reuse, and specialization, that had been developed over previous decades. DITA is now an OASIS standard.

The first word in the name "Darwin Information Typing Architecture" is a reference to the naturalist Charles Darwin. The key concept of "specialization" in DITA is in some ways analogous to Darwin's concept of evolutionary adaptation, with a specialized element inheriting the properties of the base element from which it is specialized.

DITA content is written as modular topics, as opposed to long "book-oriented" files. A DITA map contains links to topics, organized in the sequence (which may be hierarchical) in which they are intended to appear in finished documents. A DITA map defines the table of contents for deliverables. Relationship tables in DITA maps can also specify which topics link to each other. Modular topics can be easily reused in different deliverables. However, the strict topic-orientation of DITA makes it an awkward fit for content that contains lengthy narratives that do not lend themselves to being broken into small, standalone chunks. Experts stress the importance of content analysis in the early stages of implementing structured authoring.

Fragments of content within topics (or less commonly, the topics themselves) can be reused through the use of content references (conref), a transclusion mechanism.

DITA includes extensive metadata elements and attributes, which make topics easier to find.

DITA specifies three basic topic types: Task, Concept and Reference. Each of the three basic topic types is a specialization of a generic Topic type, which contains a title element, a prolog element for metadata, and a body element. The body element contains paragraph, table, and list elements, similar to HTML.

A Task topic is intended for a procedure that describes how to accomplish a task. A Task topic lists a series of steps that users follow to produce an intended outcome. The steps are contained in a taskbody element, which is a specialization of the generic body element, and the steps element is a specialization of an ordered list element. Concept information is more objective, containing definitions, rules, and guidelines.
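To make the topic structure concrete, here is a sketch that builds a minimal Task topic with Python's standard xml.etree.ElementTree module. The element names (task, title, taskbody, steps, step, cmd) are DITA's; the content is invented, and a real topic would also carry the DOCTYPE declaration its DTD requires:

```python
import xml.etree.ElementTree as ET

# A minimal DITA Task topic: title, taskbody, steps (content invented).
task = ET.Element("task", id="replace_filter")
ET.SubElement(task, "title").text = "Replacing the filter"

taskbody = ET.SubElement(task, "taskbody")
steps = ET.SubElement(taskbody, "steps")
for text in ("Switch the unit off.",
             "Remove the old filter.",
             "Insert the new filter until it clicks."):
    step = ET.SubElement(steps, "step")
    ET.SubElement(step, "cmd").text = text  # the command for each step

print(ET.tostring(task, encoding="unicode"))
```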

A Reference topic is for topics that describe command syntax, programming instructions, and other reference material, and usually contains detailed, factual material. DITA allows adding new elements and attributes through specialization of base DITA elements and attributes. Through specialization, DITA can accommodate new topic types, element types, and attributes as needed for specific industries or companies.

The extensibility of DITA permits organizations to specialize DITA by defining specific information structures and still use standard tools to work with them. The ability to define company-specific information architectures enables companies to use DITA to enrich content with metadata that is meaningful to them, and to enforce company-specific rules on document structure. DITA map and topic documents are XML files. As with HTML, any images, video files, or other files which need to appear in output are inserted via reference. Any XML editor can therefore be used to write DITA content, with the exception of editors that support only a limited set of XML schemas (such as XHTML editors). Various editing tools have been developed that provide specific features to support DITA, such as visualization of conrefs.

Next time: technology for managing content.

Wednesday, November 30, 2011

Metadata Schemes

In my last post, I mentioned that there are three types of metadata: descriptive, structural, and administrative. Today, I am going to talk more about metadata schemes.

Many different metadata schemes are being developed in a variety of user environments and disciplines. I will discuss the most common ones in this post.

Dublin Core

The Dublin Core Metadata Element Set arose from discussions at a 1995 workshop sponsored by OCLC and the National Center for Supercomputing Applications (NCSA). As the workshop was held in Dublin, Ohio, the element set was named the Dublin Core. The continuing development of the Dublin Core and related specifications is managed by the Dublin Core Metadata Initiative (DCMI).

The original objective of the Dublin Core was to define a set of elements that could be used by authors to describe their own Web resources. Faced with a proliferation of electronic resources and the inability of the library profession to catalog all these resources, the goal was to define a few elements and some simple rules that could be applied by noncatalogers. The original 13 core elements were later increased to 15: Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, and Rights.
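As a sketch, a Dublin Core record is simply a set of element-value pairs drawn from those fifteen elements, all of which are optional and repeatable (the values below are invented). One common use is embedding the record in an HTML page as meta tags:

```python
# A simple Dublin Core record as element -> value(s); values are invented.
record = {
    "Title": "Introduction to Metadata",
    "Creator": "J. Smith",
    "Subject": ["metadata", "cataloging"],
    "Description": "An overview of descriptive metadata.",
    "Publisher": "Example University",
    "Date": "2011-11-30",
    "Type": "Text",
    "Format": "text/html",
    "Identifier": "http://example.org/metadata-intro",
    "Language": "en",
}

# Emit the record as HTML <meta> tags, one per value.
for element, value in record.items():
    for v in (value if isinstance(value, list) else [value]):
        print(f'<meta name="DC.{element.lower()}" content="{v}">')
```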

Because of its simplicity, the Dublin Core element set is now used by many outside the library community: researchers, museums, and music collectors, to name only a few. There are hundreds of projects worldwide that use the Dublin Core either for cataloging or to collect data.

Meanwhile, the Dublin Core Metadata Initiative has expanded beyond simply maintaining the Dublin Core Metadata Element Set into an organization that describes itself as "dedicated to promoting the widespread adoption of interoperable metadata standards and developing specialized metadata vocabularies" for discovery systems.

The Text Encoding Initiative (TEI)

The Text Encoding Initiative is an international project to develop guidelines for marking up electronic texts such as novels, plays, and poetry, primarily to support research in the humanities.

The TEI guidelines also specify a header portion, embedded in the resource, that consists of metadata about the work. The TEI header, like the rest of the TEI, is defined as an SGML DTD (Document Type Definition): a set of tags and rules, defined in SGML syntax, that describes the structure and elements of a document. This SGML markup becomes a part of the electronic resource itself.

Metadata Encoding and Transmission Standard (METS)

The Metadata Encoding and Transmission Standard (METS) was developed to fill the need for a standard data structure for describing complex digital library objects. METS is an XML Schema for creating XML document instances that express the structure of digital library objects, the associated descriptive and administrative metadata, and the names and locations of the files that comprise the digital object.

Next time: an architecture for authoring, producing, and delivering information.

Monday, November 28, 2011

Metadata

What is metadata? Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. Metadata is often called data about data or information about information.

For example, a digital image may include metadata that describes how large the picture is, the color depth, the image resolution, when the image was created, and other data. A text document's metadata may contain information about how long the document is, who the author is, when the document was written, and a short summary of the document.

There are three main types of metadata:

Descriptive metadata describes a resource for purposes such as discovery and identification. It can include elements such as title, abstract, author, keywords.

Structural metadata indicates how compound objects are put together, for example, how pages are ordered to form chapters.

Administrative metadata provides information to help manage a resource, such as when and how it was created, file type and other technical information, and who can access it. There are several subsets of administrative metadata; two of them are sometimes listed as separate metadata types:

Rights management metadata, which deals with intellectual property rights.

Preservation metadata, which contains information needed to archive and preserve a resource.

Metadata can describe resources at any level. It can describe a collection, a single resource, or a component which is a part of a larger resource (for example, a photograph in an article).

Metadata can be embedded in a digital object or it can be stored separately.

Metadata is often embedded in HTML documents and in the headers of image files.

Storing metadata with the object it describes ensures the metadata will not be lost, eliminates problems of linking between data and metadata, and helps to ensure that the metadata and object will be updated together.

However, it is impossible to embed metadata in some types of objects, for example, physical artifacts. Also, storing metadata separately can simplify the management of the metadata itself and facilitate search and retrieval. Therefore, metadata is commonly stored in a database system and linked to the objects described.

More about metadata next time...

Tuesday, November 22, 2011

Taxonomy Development Process

Guided by the key factors, we can define and follow a taxonomy development process that addresses business context, content, and users. The steps in creating a taxonomy are: assemble a team, define the scope, create, implement, test, and maintain.

Assemble a team

Successful taxonomy development requires both taxonomy expertise and in-depth knowledge of the corporate culture and content. Therefore a taxonomy team should include subject matter experts or content experts from the business community who have in-depth knowledge of corporate culture and content. For small projects, the group may simply be part of a user focus group that is concentrating on the taxonomy task. Taxonomy interrelates with several aspects of web development, including website design, content management, and web search. So, these roles should be included in the taxonomy team. Common considerations are overall project scope, target audience, existing organizational taxonomy initiatives, and corporate culture.

Define scope
Answering the following questions will help to define the scope of the taxonomy:

Business context

  1. What is the purpose of the taxonomy?
  2. How is the taxonomy going to be used?

Content

  3. What is the content scope? (Possibilities include company-wide, within an organizational unit, etc.)
  4. What content sources will the taxonomy be built upon? (Specifically, the locations of the content to be covered in the taxonomy.)

Users

  5. Who will be using the taxonomy? (Possibilities include employees, customers, partners, etc.)
  6. What are the user profiles?

This step should also define metrics for measuring the taxonomy's value. For websites, baselines should be established for later comparison with the new site. An example would be the number of clicks it takes a site visitor to locate certain information.

Create taxonomy

Taxonomy creation can either be manual, automated, or a combination of both. It involves analyzing context, content, and users within the defined scope. The analysis results serve as input for the taxonomy design, including both taxonomy structure and taxonomy view. The taxonomy development team is responsible for the actual mechanics of taxonomy design, whereas the taxonomy interest group is responsible for providing consultation on content inclusion, nomenclature, and labeling.

The design of the taxonomy structure and taxonomy view may run in tandem, depending on the resources available and project time frame. All concepts presented through the taxonomy view need to be categorized properly according to the taxonomy structure. This will ensure that every content item is organized centrally through the same classification schema.

Along with taxonomy structure and taxonomy view, standards and guidelines must be defined. There should be a categorizing rule for each category in taxonomy view and taxonomy structure. In short, you must define what type of content should go under any given category. Content managers can then refer to these rules when categorizing content. If an automated tool is used for content tagging, these rules can be fed to the tagging application. Standards and guidelines help ensure classification consistency, an important attribute of a quality content management system and search engineering process.
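A categorizing rule can be as simple as a list of keywords that pull content into a category; this is also the kind of rule set that could be fed to a tagging application. A toy sketch with invented rules:

```python
import re

# One rule per category: keywords that pull content into that category.
rules = {
    "Benefits": {"pension", "insurance", "holiday"},
    "Payroll": {"salary", "payslip", "tax"},
    "IT support": {"password", "laptop", "network"},
}

def categorize(text):
    """Return every category whose rule matches the content."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return [cat for cat, keywords in rules.items() if words & keywords]

print(categorize("How do I reset my network password?"))
# ['IT support']
print(categorize("Updating your pension and insurance details"))
# ['Benefits']
```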

Implement the taxonomy

The next step includes setting up the taxonomy and tagging content against it. This is often referred to as "populating" the taxonomy. Similar to taxonomy creation, implementation can be manual, automated, or a combination of both. The goal here is to implement the taxonomy into the website design, search engineering, and content management.

For website design, taxonomy view provides the initial design for the site structure and interface. The focus is on the concepts and groupings, not so much on nomenclature, labeling, or graphics. There may be a need to go through multiple iterations, moving from general to specific in defining levels of detail for the content. Types of taxonomy view include site diagrams, navigation maps, content schemes, and wire frames. The final site layout is built by applying graphical treatment to the last iteration of taxonomy view.

For search engineering, implementation can be accomplished in various ways. Taxonomy structure as a classification schema can be fed into a search engine for training purposes or integrated with the search engine for a combination of category browsing and searching. In the latter case, the exposed taxonomy structure is essentially a type of taxonomy view. One of the most challenging aspects of taxonomy implementation is the synchronization between the search engine and the taxonomy, especially for search engines that do not take taxonomic content tagging into account in the indexing process. In such cases, a site visitor may receive different results from searching and browsing the same category, which could prove confusing.

Taxonomy structure needs to be integrated within the content management process. Content categorization should be one of the steps within the content management workflow, just like review and approval. If a content management tool is available, the taxonomy structure is loaded into the tool, either through a manual setup process, or imported from a taxonomy created externally. Through the content management process, content is tagged manually or automatically against the taxonomy. In other words, the taxonomy is populated with content.

Test

The goal of testing is to identify errors and discrepancies. The test results are then used to refine the taxonomy design. The testing should be incorporated into the usability testing process for the entire web application, including back-end content management testing and front-end site visitor testing. Here is a sample checklist of testing topics:

  • Given specific information topics, can the site visitors find what they need easily, in terms of coverage and relevancy?
  • Given specific information topics, how many clicks does it take before a site visitor arrives at the desired information?
  • Given specific tasks, can the site visitors accomplish them within a reasonable time frame?
  • Do the labels convey the concepts clearly, or is there ambiguity?
  • Are the content priorities in sync with the site visitors' needs?
  • Does the structure allow content managers to categorize content easily?

Testing results are recorded and can later be compared with the baseline statistics to derive the measurements of improvements.

Maintain

Taxonomy design and fine-tuning is an ongoing process similar to content management. As an organization grows or evolves, its business context, content, and users change. New concepts, nomenclature, and information need to be incorporated into the taxonomy. A change management process is critical to ensure consistency and currency.

Better structure equals better access

Taxonomy serves as a framework for organizing the ever-growing and changing information within a company. The many dimensions of taxonomy can greatly facilitate website design, content management, and search engineering. If well done, taxonomy will allow for structured web content, leading to improved information access.

Next time: what is metadata?

Monday, November 21, 2011

Taxonomy and Enterprise Content Management


Taxonomy is a hierarchical structure for the classification and/or organization of data. In content management and information architecture, taxonomy is used as a tool for organizing content. Development of an enterprise taxonomy requires the careful coordination and cooperation of departments within your organization. 

Once the taxonomy is created, it needs to be managed. There is no such thing as a "finished" taxonomy. A taxonomy needs to be revisited and revised periodically. Why? The business changes, new content is created, and old content is archived.

The two key aspects of taxonomy are taxonomy structure and taxonomy view. Taxonomy structure provides a classification schema for categorizing content within the content management process. Taxonomy view is a conceptual model illustrating the types of information, ideas, and requirements to be presented on the Web. It represents the logical grouping of content visible to a site visitor and serves as input for Web site design and search engineering. Together, these concepts can guide your Web development efforts to maximize return on investment. Build it right, and they will come.

There are three key factors in taxonomy development: business context, users, and content.

These factors reflect the fundamental business requirements for most taxonomy projects. Strategically, they provide a "trinity compass" for the road of taxonomy development.

Here's a description of each factor:

"Business context" is the business environment for the taxonomy efforts in terms of business objectives, Web applications where taxonomy will be used, corporate culture, past or current taxonomy initiatives, and artifacts within the organization and across the industry.

"Users" refers to the target audience for the taxonomy, user profiles, and user characteristics in terms of information usage patterns.

"Content" is the type of information that will be covered by the taxonomy or that the taxonomy will be built upon.

There are two common techniques for taxonomy strategy.

Universal Taxonomy

A single taxonomy is used to store and deliver content. When content contributors utilize the content management system, they add, remove, and manage content in a structure that closely resembles the navigation and hierarchy of the delivery framework (your website or application). The navigation structure is the taxonomy.

This method is conceptually simple and makes it quite easy to dynamically build your navigation from knowledge of this hierarchy. However, this model does have drawbacks:

Every time you reorganize the website, the organization of content in your management application shifts. Admittedly, this isn’t much of a drawback if you’re managing content for one moderately sized site or if your team of contributors is small.

It is difficult to reuse content in this structure. If you hope to reuse assets throughout your website, where are they organized in this structure?

In an environment with many contributors and diverse security requirements, organizing content (in the management application) in another way, say by contributor or by department, may be more intuitive.

Content Mapping

A more robust, albeit more complex, method of managing content is to maintain structures and metadata in the content management application that are independent of the delivery system's organization (navigation).

Content is organized, at the source, as may be required by your security, workflow, or organizational needs. Perhaps your data lives in a content management system or database where different organizational mechanisms exist. Unfortunately, the navigation for your consuming application (the presentation framework) is often managed by some other means.

By some rule or algorithm, leveraging your content classification data, material gets “mapped” to the presentation framework.
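In code, such a mapping is just a rule that takes an item's classification metadata and returns where it should surface in the navigation; two different rule sets yield two different site structures over the same content. A sketch with invented metadata and rules:

```python
# Content organized at the source by department, with classification tags.
content = [
    {"title": "Holiday policy", "department": "HR", "tags": ["policy"]},
    {"title": "Laptop request", "department": "IT", "tags": ["form"]},
    {"title": "Expense form", "department": "Finance", "tags": ["form"]},
]

# Two mapping algorithms over the same content and classification data.
def by_tag(item):
    return ["Forms"] if "form" in item["tags"] else ["Policies"]

def by_department(item):
    return [item["department"]]

def build_navigation(mapping):
    """Map every item into the navigation sections the rule returns."""
    nav = {}
    for item in content:
        for section in mapping(item):
            nav.setdefault(section, []).append(item["title"])
    return nav

print(build_navigation(by_tag))
# {'Policies': ['Holiday policy'], 'Forms': ['Laptop request', 'Expense form']}
print(build_navigation(by_department))
# {'HR': ['Holiday policy'], 'IT': ['Laptop request'], 'Finance': ['Expense form']}
```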

Advantages of this model:

There may be more than one way to organize content (think: content reuse). Given the same set of content, same set of classification criteria, but multiple algorithms, we can now build a delivery framework that allows for many methods of organization.

You no longer need to reorganize your content management application to change the delivery application. Just the algorithms (mappings) change.

Drawbacks:

If you hope to build your navigation dynamically, often you’ll need to build a tool or alternate hierarchy. You may not find much value in the content’s taxonomy.

Content, in your management environment, may be orphaned in your presentation framework if there are no rules mapping to an accessible part of the site.

Parts of the site may only be sparsely populated. It may not be readily obvious that you are creating gaps (with little or no content) in your site.

While powerful, this technique can be difficult to administer without having a fairly comprehensive understanding of the site design and algorithms for "mapping".

Assuming there are hierarchical structures within your content classification system, there is a very good chance that valuable information exists in the hierarchy. By taking advantage of relationships within your hierarchical metadata structures, richer algorithms may be developed for your content delivery framework.