Monday, June 11, 2012
Taxonomy and Controlled Vocabulary
A taxonomy is an organizing principle. It is a foundation on which to base any kind of system. Whatever kind of project you are involved in, it will benefit from clearly defined, concise language and terminology. A taxonomy and controlled vocabulary help to fine-tune search tools, create a common language for sharing concepts, and allow efficient organization of documents and content across information sources.
Whether a structured tool such as a CRM system, or a less structured one, like a content management system that organizes information for web sites or intranets, all technologies that deal with information require a basis in taxonomy. This is even more important when various systems must interact.
Taxonomies use controlled vocabularies. Consider the issue of language: I call the person I do business with a Customer. Someone else calls them a Client. When we need to exchange, combine, or analyze data, which entity are we talking about? What do we call the document that outlines what we are providing: a statement of work, a proposal, an SOW, or something else? A controlled vocabulary helps to make terms consistent.
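As a rough illustration of the idea, a controlled vocabulary can be thought of as a lookup from every variant label to one agreed preferred term. The sketch below is only an assumption about how that might look; the mapping `PREFERRED_TERMS` and the function `normalize` are hypothetical names, and a real vocabulary would be maintained by the people who own the terms rather than hard-coded.

```python
# Hypothetical controlled vocabulary: each variant label (lowercase)
# maps to the single preferred term the organization has agreed on.
PREFERRED_TERMS = {
    "customer": "Customer",
    "client": "Customer",
    "statement of work": "Statement of Work",
    "sow": "Statement of Work",
}


def normalize(label: str) -> str:
    """Return the preferred term for a label, or the label unchanged if unknown."""
    # Lowercase and collapse whitespace so "  SOW " and "sow" match the same key.
    key = " ".join(label.lower().split())
    return PREFERRED_TERMS.get(key, label)


if __name__ == "__main__":
    for raw in ["Client", "customer", "SOW", "Invoice"]:
        print(raw, "->", normalize(raw))
```

Labels that are not in the vocabulary pass through unchanged here, which makes gaps visible instead of hiding them; whether to reject unknown labels instead is a governance decision.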
When employees search for information, do they use language that is unambiguous? Can this information be easily found and re-purposed? Are employees sure they are not recreating information that already exists?
These are important questions, but there are larger issues that can have an even greater impact on the organization. Are all of these challenges of business going to be magically solved with a taxonomy? Of course not, but if the underlying structure is not in place, then essential tools, technologies and processes will not function together. Connecting system A to system B makes little sense when a common language has not been established to have information make sense in the new context.
Consider what happens if each department does its job, but accounting people speak British English, IT speaks a Cajun dialect, legal speaks inner-city slang, and business people speak the language of scientific researchers. For all practical purposes, the languages they use in communicating with their professional peers are as different as these corners of the English language. In order for documents and pieces of content to be reusable and understandable in all of these different contexts and for these different audiences, we need to develop a Rosetta Stone of the enterprise. That is an enterprise taxonomy and controlled vocabulary.
Some people think that this is an insurmountable task: getting people to agree on common terms and meanings. Language is too ambiguous and variable, and needs are too diverse, to develop a common denominator of communication for all circumstances. Instead, we create a structure for defining and applying terms and for managing change. The alternative is uncontrolled and chaotic, but too much control is impractical. Determining where to control and centralize and where to allow variability is part of the process of developing and implementing an enterprise taxonomy and controlled vocabulary.
There is a prevalent opinion that a Google-like search interface is the answer to the search problem. There are many reasons why this is not true. One is that in a company, many of the clues that Google uses to deliver results are missing. Google will use links between sites to determine how to rank results. If lots of other sites point to a document then that document is deemed to be more valuable. In the corporate intranet, there is no equivalent way of ranking results.
Another fundamental flaw with pure search solutions is that meaning, value, and applicability are context dependent. The usefulness of a piece of content is in the eye of the beholder. A document is useful to a person if this person can use it to solve a problem. This depends upon the person's role, task, and background.
A search engine cannot determine these factors and present results based on this person's needs. However, if you perform some process analysis in order to understand a user’s tasks and how they go about solving their problem, you can present information in anticipation of their needs. The role of a taxonomy and controlled vocabulary is to define the labels that correspond to user tasks, experience, needs, and context, and that help to refine their search or guide their navigation.
Part of the analysis phase in taxonomy development is to understand what users are trying to accomplish, and then present a set of documents that users should look at when they are performing these tasks. For example, a sales person may be preparing a proposal for a customer. If he/she searches in a large repository for documents, he/she will likely pull up a lot of documents that may contain the term "proposal", but they may not be example proposals that he/she can use.
On the other hand, if the business development function is defined as including proposal creation, the sales person can find sample proposals that will be useful. You can define a tag called "sample proposal", or some other label that everyone agrees will designate documents that are useful for this purpose.
You may want to go further and define the specific industry, the product or service offering, the size of the deal and so on. By carefully defining labels for the documents, you can search based on these labels or navigate to a place where these documents reside. The results will be specific to the task at hand and will save you from creating a proposal from scratch or from endless searching for relevant documents.
So in the first case, a search on "proposal" retrieved perhaps hundreds of documents containing the term. In the second case, the search returns a smaller subset of documents that more closely meet your criteria.
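The contrast between the two cases can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of any particular search product: the `Document` class, the tag names `doc_type` and `industry`, and the sample documents are all invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    title: str
    text: str
    # Hypothetical taxonomy labels applied to the document, e.g. doc_type, industry.
    tags: dict = field(default_factory=dict)


# Invented sample repository for illustration only.
DOCS = [
    Document("Retail outsourcing proposal",
             "Proposal for customer service outsourcing.",
             {"doc_type": "sample proposal", "industry": "retail"}),
    Document("Banking outsourcing proposal",
             "Proposal for back-office outsourcing.",
             {"doc_type": "sample proposal", "industry": "banking"}),
    Document("Meeting notes",
             "We discussed the proposal timeline.",
             {"doc_type": "meeting notes"}),
    Document("Expense policy",
             "Travel expenses must be approved in advance.",
             {"doc_type": "policy"}),
]


def full_text_search(docs, term):
    """First case: return every document whose text mentions the term."""
    term = term.lower()
    return [d for d in docs if term in d.text.lower()]


def tag_search(docs, **required):
    """Second case: return only documents whose tags match every required label."""
    return [d for d in docs
            if all(d.tags.get(name) == value for name, value in required.items())]


if __name__ == "__main__":
    print([d.title for d in full_text_search(DOCS, "proposal")])
    print([d.title for d in tag_search(DOCS, doc_type="sample proposal",
                                       industry="retail")])
```

In this toy repository the full-text search also pulls in the meeting notes, because they mention the word, while the tag search returns only the document labeled as a sample proposal for the chosen industry.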
Imagine that in one repository you refer to proposals for customer service outsourcing as "service outsourcing" and in another repository, you refer to it as "business process outsourcing". If you search on one term, you really also want the documents with the other term. These terms are synonymous. You could make a note of terms that may be used interchangeably and apply a synonym ring to the search mechanism, enabling search on one term to return documents containing the other terms.
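A synonym ring of this kind might be sketched as follows. The names `SYNONYM_RINGS`, `expand_query`, and `search` are hypothetical, and simple substring matching stands in for whatever matching a real search engine performs.

```python
# Hypothetical synonym rings: every term in a ring is treated as equivalent.
SYNONYM_RINGS = [
    {"service outsourcing", "business process outsourcing"},
]


def expand_query(term):
    """Return the term together with all of its synonyms (lowercased)."""
    key = term.lower().strip()
    for ring in SYNONYM_RINGS:
        if key in ring:
            return set(ring)
    # Terms outside every ring are searched as-is.
    return {key}


def search(docs, term):
    """Return documents that contain the term or any synonym from its ring."""
    terms = expand_query(term)
    return [doc for doc in docs if any(t in doc.lower() for t in terms)]


if __name__ == "__main__":
    # Invented sample texts standing in for two repositories plus an unrelated document.
    docs = [
        "Proposal: customer service outsourcing for a retailer",
        "Proposal: business process outsourcing for a call center",
        "Quarterly budget review",
    ]
    for hit in search(docs, "service outsourcing"):
        print(hit)
```

Searching on either term returns both proposals, while a term that belongs to no ring behaves like an ordinary search.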
As we just observed, search is one area where taxonomies can be leveraged. What about navigation, also called browsing? Some people equate taxonomy with navigation, but it is more accurate to say that taxonomy makes navigation possible. By understanding the underlying structure of information and how people access that information, you can propose a structure by which users can click through the content. Navigational structures directly reflect the taxonomy. For example, if you organize content according to departments or functional areas, with geographies comprising navigational nodes, this would be your taxonomy. In other cases, users may navigate according to a task or business process that could start out with a geography and then move to a task, such as customer service.
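To make the link between taxonomy and navigation concrete, here is a small sketch in which the taxonomy is a nested structure and each click moves one level down. The department and geography labels, and the `children` helper, are assumptions made for the example.

```python
# Hypothetical taxonomy: functional areas at the top level, geographies beneath.
TAXONOMY = {
    "Sales": {"North America": {}, "Europe": {}},
    "Customer Service": {"North America": {}, "Asia": {}},
}


def children(taxonomy, path):
    """Return the navigation choices available after following the given path."""
    node = taxonomy
    for label in path:
        node = node[label]  # raises KeyError if the path is not in the taxonomy
    return sorted(node)


if __name__ == "__main__":
    print(children(TAXONOMY, []))         # top-level navigation nodes
    print(children(TAXONOMY, ["Sales"]))  # nodes shown after clicking "Sales"
```

A navigation that starts with geography and then moves to a task would simply use a taxonomy nested in the opposite order.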
Taxonomy Development and Maintenance
Taxonomy and controlled vocabulary development and maintenance is an ongoing and very important process. It is essential that we agree on terminology in order to integrate, collaborate, and communicate most effectively. Not addressing this issue will lead to more problems of information overload, difficulties in integrating systems, and inefficiencies in the organization.
The short-term goal should be to educate your organization on these issues. The medium-term goal is to begin formalizing the sharing and application of consistent language across systems and processes. The long-term goal is to develop a mature process for ongoing maintenance and governance of enterprise taxonomies. It is important to start the process now, rather than wait for search, navigation, and access to information to become a big problem.