Galaxy Consulting Blog

Friday, November 8, 2013

New Trends in Records Management

Records management has changed considerably, and several new trends are reshaping the practice.

Records management professionals should adapt to these changes and be open to collaborating and working cross-functionally with business users to develop a road map for identifying and exploring new sources of corporate records.

Technology has evolved since the first wave of records management tools entered the market. Regulations have changed and now cover a broader spectrum of electronically stored information, for example, video, social media or instant messaging, not just the images or office documents that have traditionally been called "records." Information governance is emerging as a term that describes and supports a holistic, life cycle view of the creation, use, and protection of digital information.

Let's look more closely at these new trends.

Records management shifts to information governance.

Work-in-progress documents and structured or semi-structured information require governance, including protection from unauthorized access, use, deletion or disclosure across their life cycle. In other words, the frequently changing data in your systems reflects your business activities and must be governed as well. Holds and discovery to meet litigation or audit needs must be available regardless of an item's record status or storage location, i.e. on-premises or in a cloud service.

Many businesses continue to lack confidence in the progress of their electronic records management programs, compliance initiatives, and e-discovery preparedness. The shift to a more comprehensive and proactive management of information across its entire life cycle has begun.

The concepts of file plans, folders, file parts and cutoffs have imposed constraints on how electronic information is organized and managed. The vocabulary of the paper records center is no longer relevant in a digital-first organization. Today, an information worker can compose an agreement in a cloud application, such as Google Docs, share it with a colleague, edit it on a tablet device and push revisions to a customer collaboration site even while on an airplane. Business records, in other words, continue to be generated outside traditional desktop applications.

New vendors are taking a fresh approach to addressing compliance, categorization and retention requirements. They approach governance of information beyond the "record," managing information across its entire business life cycle rather than just at its end. Technologies with strong roots in search, archiving and retention management offer the capabilities to manage many forms of electronically stored information, such as social activity or rich media, even when such information is not formally flagged as a record.

Cloud and social platforms render "file and declare" ineffective.

Traditional records management tools are slow to make the leap to the cloud. Records managers may be at risk of holding obsolete assumptions about importing or filing content into an on-premises records repository.

Records and compliance managers remain wary of cloud and social platforms. Enterprise architects and their peers in records management practice groups are not aligned on the benefits of cloud computing. Cloud providers are increasingly supporting content segregation, security, privacy and data sovereignty requirements to attract regulated industries, and they are offering service-level agreements designed to reduce risks. In spite of that, records managers still cite security, legal and privacy risks as the top three reasons to stall adoption of cloud services and SaaS.

Current records management systems are already missing many forms of electronically stored information. Older types of electronically stored information, such as images, e-mail or office documents, are often captured into traditional records management applications. Newer content types are less likely to have policy-driven life cycle or retention rules applied. Mobile messages, social media posts or websites are important sources of discoverable information, but application of legal holds to that content can be difficult.

Digital preservation forces itself onto the governance agenda.

Digital records that have a long-term retention schedule are at risk when hardware devices, software applications and file formats become obsolete. Many first-generation business and personal productivity tools have been retired, and the inability to retrieve or view older digital records is becoming a reality.

Organizations are slowly waking up to digital preservation concerns. Migration, conversion and adoption of open standards are accepted approaches to solve the problem of accessibility over time. Those approaches, however, are not widely adopted at this time.

Decisions to retire older enterprise applications raise content preservation concerns. As organizations begin infrastructure renewal projects, particularly as new SaaS and cloud-based applications become viable alternatives, IT and records professionals must assess the risk of losing information in those older systems. Decisions to maintain older systems in read-only mode, to migrate data into newer systems or to dispose of older systems altogether must be made in accordance with business, legal and compliance needs.

Open standards and open source change the sourcing landscape.

Governments drive significant change in the software acquisition landscape by calling for deliberate adoption of open standards and open source. They are hedging against the potential loss of electronic information, software obsolescence and increased costs, as well as demanding more portable data. Between 2011 and 2012, several national governments published directives to help their IT, records and procurement managers understand, investigate and select more open technology platforms.

Open standards help to address preservation, accessibility and interoperability needs. Open source helps to reduce cost and minimize vendor and platform lock-in. Programs developed by governments around the world have raised the profile and acceptance of open source software. Governments in the United Kingdom, the United States, and Europe have taken a proactive approach to using software systems and file formats based on open standards.

Auto-categorization becomes viable and approachable.

Transactional, regulated and semi-structured content is ready for automated capture, categorization and application of retention policies. Electronic information that uses a consistent structure and embedded metadata, or that includes predictable patterns of numbers or text, lends itself to content analytics, entity extraction and categorization tools for ingestion and for the application of retention, disposition, and security or privacy access controls. Opportunities to use auto-classification technologies for routine, high-volume, predictable electronic content are increasing as the technology matures, more vendors provide integrated offerings, and use cases are identified.
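To make the idea of "predictable patterns of numbers or text" concrete, here is a minimal sketch of pattern-based categorization in Python. The rule names, the identifier formats (INV-######, PO-#####) and the retention periods are all hypothetical and chosen only for illustration; a real deployment would take them from the organization's own retention schedule and would likely rely on a dedicated classification product rather than hand-written rules.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    category: str           # label applied to matching content
    pattern: re.Pattern     # predictable text pattern that identifies it
    retention_years: int    # hypothetical retention period


# Hypothetical rules: identifier formats and periods are assumptions.
RULES = [
    Rule("invoice", re.compile(r"\bINV-\d{6}\b"), 7),
    Rule("purchase_order", re.compile(r"\bPO-\d{5}\b"), 5),
]


def categorize(text: str) -> Optional[Rule]:
    """Return the first rule whose pattern occurs in the text, or None."""
    for rule in RULES:
        if rule.pattern.search(text):
            return rule
    return None


match = categorize("Payment due for INV-004211 by end of month")
if match is not None:
    print(match.category, match.retention_years)
```

Content that matches no rule returns None and would fall back to manual review, which is why this approach suits routine, high-volume content better than free-form documents.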

Auto-classification joins compliance needs and business priorities. High-volume, transactional information is a pain point when storage costs escalate and discovery requests are made. Capture, categorization and retention scheduling are functions that reduce costs, streamline customer service, and increase digitization of processes. Consistent organization creates a foundation for content analytics and predictive technologies. Consistent disposal of obsolete information reduces the need for more storage resources, facilitates faster retrieval of data, and lowers the cost of e-discovery.

Big content is as important as big data and requires well-thought-out governance. Big data gets a lot of attention, but organizations must also cope with information stored in semi-structured or unstructured forms. Tabular data often sits unnoticed and unanalyzed in files created by individuals or small teams. E-mail, spreadsheets and ad hoc databases are used for critical business decisions and often sit under the radar of compliance or audit managers on file shares, collaboration sites or personal computers. 70% of companies use spreadsheets for critical business decisions, but fewer than 34% apply governance or controls to them.

Technology enforces and automates defensible approaches to disposition. Organizations that demonstrate consistent and predictable approaches to information handling, including its final deletion, are more successful when e-discovery orders compel extensive search, retrieval, review and production activities. Automation of routine processes, including scheduled disposal, lends weight to retention programs when challenged by legal counsel or auditors. Auto-classification tools ensure that retention and disposition rules are applied within specific parameters and are supported by documented policy rationale and citations.

Thursday, October 31, 2013

Information Governance With SharePoint

The goals of any enterprise content management (ECM) system are to connect an organization's knowledge workers, streamline its business processes, and manage and store its information.

Microsoft SharePoint has become a leading content management system in today's competitive business landscape as organizations look to foster information transparency and collaboration by providing efficient capture, storage, preservation, management, and delivery of content to end users.

A recent study by the Association for Information and Image Management (AIIM) found that 53% of organizations currently utilize SharePoint for ECM. SharePoint's growth can be attributed to its ease of use, incorporation of social collaboration features, as well as its distributed management approach, allowing for self-service. With the growing trends of social collaboration and enhancements found in the latest release of SharePoint 2013, Microsoft continues to facilitate collaboration among knowledge workers.

As SharePoint continues to evolve, it is essential to have a solution in place that would achieve the vision of efficiency and collaboration without compromising on security and compliance. The growing usage of SharePoint for ECM is not without risk. AIIM also estimated that 60% of organizations utilizing SharePoint for ECM have yet to incorporate it into their existing governance and compliance strategies. It is imperative that organizations establish effective information governance strategies to support secure collaboration.

Two new features in SharePoint 2013 help with compliance. The eDiscovery Center is a SharePoint site that gives you more control over your data: it allows you to identify, hold, search, and export documents needed for e-discovery. The In-Place Hold feature allows you to preserve documents and place a hold on them while users continue working on them. These features are available for both on-premises and cloud deployments.

SharePoint 2013 has been integrated with Yammer, which provides many social features. This presents a new compliance challenge. Yammer is planning to integrate more security in future releases, but for now, organizations need to create policies and procedures for these social features. Roles such as "Community Manager", "Yambassadors", and "Group Administrators" might be introduced.

There are third-party tools that can be used with SharePoint for compliance and information governance: Metalogix and AvePoint for governance and compliance; CipherPoint and Stealth Software for encryption and security; ViewDo Labs and Good Data for Yammer analytics and compliance.

In order to most effectively utilize SharePoint for content management, there are several best practices that must be incorporated into information governance strategies as part of an effective risk management lifecycle. The goal of any comprehensive governance strategy is to mitigate risk, whether this entails downtime, compliance violation or data loss. In order to do so, an effective governance plan must be established that includes the following components:

Develop a plan. When developing your plan, it is necessary for organizations to understand the types of content SharePoint contains before establishing governance procedures. It is important to involve the appropriate business owners and gather any regulatory requirements. These requirements will help to drive information governance policies for content security, information architecture and lifecycle management.

When determining the best approach to implement and enforce content management and compliance initiatives, chief privacy officers, chief information security officers, compliance managers, records managers, SharePoint administrators, and company executives will all have to work together to establish the most appropriate processes for their organization as well as an action plan for how to execute these processes. During the planning phase, your organization should perform an assessment, set your organization's goals, and establish appropriate compliance and governance requirements based on the results of the assessment to meet the business objectives.

Implement your governance architecture. Once your organization has developed a good understanding of the various content that will be managed through SharePoint, it is time to implement the governance architecture. In this phase, it is important to plan for technical enforcement, monitoring and training for employees that address areas of risk or noncompliance. It is important to note that while SharePoint is known for its content management functionality, there are specific challenges that come with utilizing the platform as a content management system for which your governance architecture must account: content growth and security management.

In order to implement effective content management, organizations should address and plan to manage growth of sites, files, storage, and the overall volume of content. Organizations without a governance strategy often struggle with proliferation of content with no solutions to manage or dispose of it. This is a huge problem with file servers. Over time, file servers grow to the point where they become a bit like the file cabinet collecting dust in the corner of your office. It is easy to add in a new file, but you will not find it later when you need it. The challenge comes from the planning on how to organize and dispose of out-of-date content.

SharePoint offers the technology to address these challenges, but only if it is enabled as part of your governance plan. Information management policies can be used to automatically delete documents, or you may use third-party solutions to archive documents, libraries and sites. In SharePoint 2013, Shredded Storage is enabled by default to reduce the overall storage footprint of organizations that use versioning. Remote BLOB Storage (RBS) can also be enabled in SharePoint or through third-party tools to reduce SharePoint's storage burden on SQL Server.

Tagging and classification play a key role in information governance. Proper classification can improve content findability. Organizations can utilize SharePoint's extensive document management and classification features, including Content Types and Managed Metadata, to tag and classify content. Third-party tools that extend SharePoint's native capabilities can also filter for specified content when applying management policies for storage, deletion, archiving, or preservation. Ultimately, however, the people in your organization will play the biggest role here. As such, your plan should identify who the key data owners are and the areas for which they are responsible. This role is often filled by a "site librarian" or those responsible for risk management in the enterprise.

In order to minimize risk to the organization, it is imperative to ensure that information is accessible to the people who should have it and protected from the people who should not. SharePoint has a very flexible system of permissions that can accommodate this need.

Ongoing assessments. In order to ensure that established governance procedures continue to meet your business requirements, ongoing assessment is required. Conduct ongoing testing of business solutions, monitoring of system response times, service availability and user activity, as well as assessments to ensure that you have complied with your guidelines and requirements for properly managing the content. The content is essentially your intellectual property, the lifeblood that sustains your organization.

React and revise as necessary. In order to continue to mitigate risk, respond to evolving requirements, and harden security and access controls, you must take the information gathered in your ongoing assessments and use it to make more intelligent management decisions. Continue to assess, react and revise as necessary. With each change, continue to validate that your system meets necessary requirements.

The risk has never been higher, especially as more data is created alongside growing regulatory compliance mandates that require organizations to ensure that their content is properly managed.

If you develop a plan, implement a governance architecture that supports that plan, assess the architecture on an ongoing basis, and react and revise as necessary, your organization will have the support and agility necessary to truly use all of the content it possesses to improve business processes, innovation, and competitiveness while lowering total costs.

Monday, October 28, 2013

Meeting the Social Media Challenge

When social media volume is low, it is typically handled manually by one or more people in a company. These people are assigned to check Facebook and/or Twitter a couple of times a day and respond when appropriate.

As the volume of inquiries grows, it becomes expensive to respond manually to the posts and comments, and nearly impossible to do it on a timely basis. After a while, it becomes clear that automation is necessary to respond to the large number of social media comments in appropriate time frames.

During the next few years, organizations of all sizes will need to build a social media technology servicing framework to handle an increasing volume of inquiries, complaints, and comments. As social media is conceptually just another channel, it should be incorporated into the enterprise's overall servicing framework. However, the unique characteristics and demands of social media interactions require specialized solutions and processes, even though the responses should be consistent in all channels.

There are many applications to help organizations handle their social media servicing challenges, and new ones are constantly being introduced. However, currently, there is no single solution that addresses all necessary requirements. Enterprises that want a complete solution need to purchase several applications and integrate them. They should also merge these applications with their existing servicing infrastructure to ensure an excellent customer experience.

The underlying technical components required to build a social media servicing infrastructure are:

Tools for monitoring social media sites for brand and company mentions.
Data acquisition/capture tools to identify and gather relevant social media interactions for the company.
Data extraction tools that separate "noise" from interactions that require immediate or timely responses.
An engine for defining business rules that generates alerts, messages, pop-ups, alarms, and events.
Integration tools to facilitate application-to-application communication, typically using open protocols such as Web services. Prebuilt integration tools, along with published application programming interfaces, should be provided for contact center applications.
Storage to house and access large volumes of historical data, and an automated process to retain and purge both online and archived data. Additional capabilities may include the ability to access archived data via other media, such as a CD-ROM, and the ability to store and retrieve data in a corporate storage facility, such as a network-attached storage or storage area network.
Database software for managing large volumes of information.
Workflow tools to automate business processes by systematically passing information, documents, tasks, notifications, or alerts to another business process (or person) for additional or supplementary action, follow-up, or expertise.

The core administrative tools needed are:

User administration capability with prebuilt tools to facilitate system access, user set-up, user identification and rights (privileges), password administration, and security.
Alert management capability that allows thresholds to be set so that alarms, alerts, or notifications are triggered when predefined levels or time frames are reached, whether these represent violations or achievements (examples include alerts to signal changes in topics, emerging issues, and sentiment).
Metrics management, including the ability to enter, create, and define key performance indicators (KPIs) and associated metrics.
System configuration with an integrated environment for managing application set-up, and parameters for contact routing, skill groups, business rules, etc.

The core servicing functionality includes:

Skills-based routing tools to deliver identified interactions to agents or other employees with the proficiency to address them.
The ability to queue and route transactions (calls, emails, chat/IM, and social media posts) to the appropriate agent, employee, or team.
Text analytics software that uses a combination of statistical or linguistic modeling methods to extract information from unstructured textual data.
Filtering tools that separate "noise" from social media customer interactions that require immediate or timely responses.
Topic categorization software that identifies themes and trends within social media interactions.
Root cause analysis, a problem-solving tool that enables users to strip away layers of symptoms to identify the underlying reasons for problems or issues.
Search and retrieval abilities that allow large volumes of data to be searched, based on user-defined queries, to retrieve specific instances.
Sentiment analysis capability that can identify positive or negative sentiment about a company, person, or product, and assign a numerical score based on linguistic and statistical analysis.
A social CRM servicing solution that logs and tracks received social media interactions so that agents or employees can view the post/comment, create a customized response, and issue or post it.
Response templates that comprise a library of customizable responses to common social media posts.
A social media publishing tool that enables users to publish posts to social media sites.
Reporting functionality in which reports can be set up based on collected data, metrics, or KPIs in a preferred presentation format (chart or graph); this should also include the ability to create custom reports based on ad hoc queries.
Scorecards/dashboards for all constituents in an organization - agents, supervisors, managers, other departments, and executives.
An analytics tool that conducts multidimensional analyses of social media data, used to look for trends and data relationships over time, identify emerging issues and root causes, etc.
Recording software to capture social media inputs and responses.
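As a rough illustration of how the filtering, sentiment analysis and skills-based routing items above fit together, here is a toy Python sketch. Everything in it is an assumption made for the example: the brand name "acme", the tiny word lists, and the queue names. Real text analytics products use statistical or linguistic models rather than keyword lookups.

```python
from typing import Optional

# Hypothetical vocabulary; a real system would use trained models.
BRAND_TERMS = {"acme"}
NEGATIVE = {"broken", "refund", "terrible", "late"}
POSITIVE = {"great", "love", "thanks"}
# Hypothetical mapping from issue keywords to skill queues.
QUEUES = {"refund": "billing", "broken": "technical_support", "late": "shipping"}


def tokens(post: str) -> list:
    """Lowercase the post and strip common punctuation from each word."""
    return [w.strip(".,!?#@").lower() for w in post.split()]


def sentiment_score(words: list) -> int:
    """Count positive words minus negative words."""
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def triage(post: str) -> Optional[dict]:
    """Drop noise, score sentiment, and pick a queue for the post."""
    words = tokens(post)
    if not BRAND_TERMS & set(words):
        return None  # no brand mention: treated as noise
    score = sentiment_score(words)
    queue = next((QUEUES[w] for w in words if w in QUEUES), "general")
    return {"sentiment": score, "queue": queue, "urgent": score < 0}


print(triage("@Acme my order is late and the box is broken!"))
print(triage("Lovely weather today"))
```

The first post mentions the brand, scores negative and is routed to the queue for the first issue keyword found; the second has no brand mention and is discarded as noise.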

Organizations also need a number of management applications to ensure that their social media teams or departments are properly trained and staffed. These tools are:

Quality assurance functionality to measure the quality of social media comments and posts by agents, to ensure that they are adhering to the organization's guidelines.
Coaching and e-learning software to deliver appropriate training courses and best practice clips to agents and other employees involved in responding to social media interactions.
A workforce management solution to forecast the expected volume of social media interactions that will require agent/employee assistance, and to identify and create optimal schedules (this also tracks adherence to service levels for each inquiry type).
Surveying software to determine if customers/commenters were satisfied with the company's responses.
Desktop analytics to provide an automated and systematic approach to monitor, capture, structure, analyze, report, and react to all agent/employee desktop activity and process workflows.
An analytics-oriented performance management module that creates scorecards and dashboards to help contact center and other managers measure performance against preset goals.

Social media is going to change the servicing landscape for many organizations within the next five to eight years. This is because the volume of social media comments and posts is expected to grow rapidly, comprising 50 percent of all service interactions. Companies that build a servicing strategy incorporating social media will have a major advantage over their competitors.

Companies do not need all of the solutions identified above; they need to select the ones that allow them to incorporate social media into their servicing strategy and infrastructure so that customers can interact with them in their preferred channel.

Saturday, October 5, 2013

Knowledge Management Adoption Through Gamification

One of the most important components of a successful knowledge management program is its ability to promote and support a culture of collaboration and knowledge sharing.

Tools, processes and organizational policies are important elements but they will only get you so far. Culture is the cornerstone that will determine the willingness of your employees to participate in knowledge management.

How do you influence employees in your organization to adopt productive behaviors around collaboration and knowledge sharing? The answer may be found in a new concept called gamification.

What is gamification? It is a new and rapidly evolving area, but the following description is a good starting point: gamification is the use of game elements and game design techniques in non-game contexts.

That definition of gamification contains three distinct elements:

Game elements - this is about leveraging the components, design patterns, and feedback mechanisms that you would typically find in video games, such as points, badges and leader-boards. It is sometimes referred to as the engineering side of gamification.

Game design techniques - this is the artistic, experimental side of gamification. It includes aesthetics, narrative, player journey, progression, surprise, and, of course, fun. Games are not just a collection of elements; designing them is a way of thinking about and approaching challenges like a game designer.

Non-game contexts - some common areas in which gamification has taken hold include health and wellness, education, sustainability, and collaboration and knowledge sharing in the enterprise.

There are three key types of knowledge management behavior:

connect: how people connect to the content and communities they need to do their job;
contribute: the level at which people are contributing their knowledge and the impact of those contributions on other people;
cultivate: the willingness to interact with and build upon the ideas and perspectives of other employees, to help nurture a spirit of collaboration.

The unique selling point of gamification is the potential to learn from games and to draw on what makes games so engaging and attractive and to apply those components in other contexts. What is behind this philosophy? While people can be drawn in to collaborate and share via extrinsic motivation, the more you can tap into their intrinsic motivations and help people realize the inherent benefits of collaboration, the more successful and sustained that engagement will be.

We can identify three ways to affect intrinsic motivation: mastery, autonomy, and purpose.

Mastery

Getting really good at something, be it a skill, sport or mental discipline, has its own benefits. The goal of gamifying collaboration is to help people get good at it and, therefore, realize its inherent benefits. As participants progress through the "game", they gradually learn the skills to find expertise, build their network, and share their knowledge in a way that makes them more effective, and advances their careers.

Autonomy

Autonomy is about giving people the freedom to make meaningful choices. Instead of dictating a prescribed path, an autonomous approach allows them to set their own goals and choose how they wish to collaborate, ultimately providing a sense of ownership. The more individuals feel that they are in control, the more engaged they are going to be. Participants can share a document, write a blog, post a microblog or create a video. It is about giving participants choices, equipping them with the tools, and rewarding them for their knowledge-sharing behaviors regardless of the specific mechanism they use.

Purpose

While there are plenty of personal benefits to collaboration, people are more engaged when they feel socially connected to others as part of a larger purpose. As part of that wider organization, they can take pride in the fact that they are making a broader impact on their organization and collaboration is a key part of that experience.

The use of gamification assumes that you already have a knowledge management program in place. Assuming that gamification can magically transform the absence of a knowledge management program into something engaging is a common error. A well-thought-out and sustainable approach to gamification offers significant potential to make collaboration fun and engaging.

Gamification Tips

Don't lose sight of your objectives

Start with your business objectives, expressed in terms of their outcomes. Keep your eyes on those objectives and validate them as you design, develop and implement your knowledge management program.

Focus on behaviors, not activities

It is very easy to get caught up in focusing exclusively on activities and end up having people busy doing "stuff". Similar to objectives, keep a focus on the behaviors you want your people to adopt and identify activities that are indicators of those behaviors.

Data is king

You need to be able to capture, store and retrieve data. Without a way to quantify and measure it, you will be stuck in the first step.

Spread the recognition

Don't limit the number of people who can be recognized through your program. In addition, recognize people's efforts in a variety of meaningful ways. Some examples of recognition are:

e-cards with 100 recognition points (monetary value of $100);
thank-you notes from leadership;
shout-outs in internal corporate communications;
badges on employees' profile pages;
feedback during the employee's performance review process.

People will game the system

You will need to pay attention to people who want to "game" the system. Where possible, build in approaches to limit the ability of people to do so.

Start small and evolve

Gamifying collaboration is not something you build all at once. To arrive at a good and sustainable knowledge management program, you need to be iterative, creating rough versions and play-testing continuously.

No silver bullet exists

Gamification is not a silver bullet. However, the available evidence suggests that it can be leveraged to embed the collaborative behaviors that make up a meaningful culture of collaboration and knowledge sharing across an organization.

Wednesday, September 18, 2013

The Mystery of How Enterprise Search Works (For You!)

Enterprise search starts with a user looking for information and submitting a search query. A search query is a list of keywords (terms) or a phrase. The search engine looks for all records that match the request and returns a list to the user. The list contains results ranked from most relevant to least relevant for the request.

Let's look at search in more detail.

Performance Measures

There are two performance measures for evaluating the quality of query results: precision and recall.

Precision refers to the fraction of relevant documents from all documents retrieved. Recall is the fraction of relevant documents retrieved by a search from the total number of all relevant documents in the collection. It is said that precision is a measure of usefulness of a result while recall is a measure of the completeness of the result.

Modern search engines aim to provide high recall with good precision. It is easy to achieve high recall by simply returning all documents in the collection for every query. However, the precision in this case would be poor. A key challenge is how to increase precision without sacrificing recall. For example, most web search engines today provide reasonably good recall but poor precision. In other words, a user gets some relevant results, usually in the first 10 to 20 results, along with many non-relevant results.
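To make the two measures concrete, here is a minimal sketch in Python. The document names and the relevance judgments are made up for illustration; the function simply applies the two definitions above and also shows what happens when a search returns every document in the collection.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical collection of 10 documents, 4 of which are relevant to the query.
collection = [f"doc{i}" for i in range(1, 11)]
relevant = ["doc1", "doc2", "doc3", "doc4"]

# A focused result list: 3 documents returned, 2 of them relevant.
focused = precision_recall(["doc1", "doc2", "doc7"], relevant)

# Returning the whole collection: perfect recall, poor precision.
everything = precision_recall(collection, relevant)

print("focused:", focused)
print("everything:", everything)
```

Returning everything finds all four relevant documents, but only 4 of the 10 results are useful, which is exactly the trade-off described above.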

Relevancy

Relevancy is a numerical score assigned to a search result representing how well the result meets the information the user who submitted the query is looking for. Relevancy is therefore a subjective measure of the quality of the results as defined by the user. The higher the score, the higher the relevance.

For every document in a result, a search engine calculates and assigns a relevancy score. TF-IDF is a standard relevancy heuristic used by many search engines. It combines the TF and IDF variables to provide a ranking score for each document.

TF stands for Term Frequency. This is the number of times a word (or term) appears in a single document as a percentage of the total number of terms in the document. Term frequency assumes that when evaluating two documents, document A and document B, the one that contains more occurrences of the search term is probably also more "relevant" to the user.

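IDF stands for Inverse Document Frequency. This is a measure of the general importance of the term, based on the ratio of all documents in the set to the documents that contain the term (many implementations take the logarithm of this ratio). IDF prevents a bias towards terms that are common across the whole collection; the bias towards longer documents is handled by expressing TF as a percentage of the document's length.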
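A small Python sketch may help show how the two variables work together. It assumes one common variant of the formula, where the score is TF multiplied by the logarithm of the document ratio; real search engines use their own tuned variations, and the three-document corpus here is invented for illustration.

```python
import math

def tf(term, doc_tokens):
    """Occurrences of the term as a fraction of all terms in the document."""
    return doc_tokens.count(term) / len(doc_tokens)

def idf(term, corpus):
    """Log of (all documents / documents containing the term); 0 if none contain it."""
    containing = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / containing) if containing else 0.0

def tf_idf(term, doc_tokens, corpus):
    return tf(term, doc_tokens) * idf(term, corpus)

# Hypothetical corpus, already split into lowercase terms.
corpus = [
    "the contract was signed by the customer".split(),
    "the invoice was sent to the customer".split(),
    "the contract renewal contract terms".split(),
]

scores = [tf_idf("contract", doc, corpus) for doc in corpus]
for doc, score in zip(corpus, scores):
    print(round(score, 3), " ".join(doc))
```

The third document scores highest because "contract" makes up a larger share of its terms. A word such as "the", which appears in every document, gets an IDF of zero and so contributes nothing to the ranking.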

Additional techniques may put more emphasis on other attributes to determine relevancy. Examples are freshness (when the document was created or last updated) and which part of the document matched the term (a match in the document title or author may score higher than a match in the text body).

Modern search engines provide good relevancy scoring across a wide range of document formats, but more importantly, allow users to create and use their own relevancy scoring profiles optimized for their queries. These user-defined weights, also called boosting, can be set up and run for a user, group of users, or per query. This is extremely helpful for personalizing the search experience by roles or departments within the organization.
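As a rough illustration of boosting, the sketch below scores a document by adding up a weight for every field that contains the query term. The field names, the weights and the two profiles are hypothetical; an actual search engine would combine such weights with its own relevancy formula.

```python
def boosted_score(query_term, document, weights):
    """Sum the weight of every field whose text contains the query term."""
    term = query_term.lower()
    return sum(
        weight
        for field, weight in weights.items()
        if term in document.get(field, "").lower().split()
    )

# Hypothetical document with three searchable fields.
document = {
    "title": "Travel policy",
    "author": "Finance team",
    "body": "Employees must submit travel expenses within 30 days",
}

# Two hypothetical scoring profiles: one neutral, one that favors title matches.
default_profile = {"title": 1.0, "author": 1.0, "body": 1.0}
finance_profile = {"title": 3.0, "author": 2.0, "body": 1.0}

print(boosted_score("travel", document, default_profile))
print(boosted_score("travel", document, finance_profile))
```

The same document and the same query produce a higher score under the second profile, because a match in the title counts for more there.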

Linguistics

Linguistics is a vital component of any search solution. It refers to the processing and understanding of text in unstructured documents or text fields. There are two parts to linguistics: syntax and semantics.

Syntax is about breaking text into words and numbers which is also called tokenization. Semantics is the process of finding the meaning behind text, from the levels of words and phrases to the level of paragraphs, a document or a set of documents. Semantic analysis often involves grammatical description and deconstruction, morphology, phonology, and pragmatics. One major challenge is ambiguity of language.

Linguistics therefore improves relevancy and affects precision and recall. Common linguistic features in a search solution include stemming and lemmatization of words (reducing words to their root or stem form), phrasing (the recognition and grouping of idioms), removal of stop words (words that appear often in documents but contain little meaning, for example articles), spelling corrections, etc.
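Here is a toy Python sketch of three of these features: tokenization, removal of stop words, and stemming. The stop word list is tiny and the stemmer only strips a few English suffixes, so treat it as an illustration of the idea rather than a real algorithm such as the Porter stemmer.

```python
import re

# Deliberately small, illustrative lists.
STOP_WORDS = {"a", "an", "the", "of", "and", "in", "to", "is", "are"}
SUFFIXES = ("ing", "ed", "es", "s")

def tokenize(text):
    """Break text into lowercase words and numbers."""
    return re.findall(r"[a-z0-9]+", text.lower())

def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def analyze(text):
    """Tokenize, drop stop words, then stem what is left."""
    return [stem(token) for token in tokenize(text) if token not in STOP_WORDS]

print(analyze("The auditors are reviewing the signed contracts"))
```

Because the same analysis is applied to documents and to queries, a search for "contract" can match a document that only mentions "contracts", which improves recall.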

Navigation

One way search engines overcome the challenges of semantics and language ambiguity is navigation. In this case, the search engine uses linguistic features, such as extraction of entities (nouns and noun phrases, places, people, concepts, etc.), and a predefined taxonomy to narrow the results. It does so by clustering related documents together or by providing useful dimensions, called facets, to slice the data, for example using price, name, etc. to narrow down the search results.
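Facets can be sketched in a few lines of Python. The result list and its fields (brand and price band) are invented for illustration; the point is that the engine counts the values of a field across the results and lets the user narrow down by picking one.

```python
from collections import Counter

# Hypothetical search results with two facet fields.
results = [
    {"title": "Laptop stand", "brand": "Acme", "price_band": "under 50"},
    {"title": "Laptop sleeve", "brand": "Acme", "price_band": "under 50"},
    {"title": "Laptop dock", "brand": "Globex", "price_band": "50 to 100"},
    {"title": "Laptop bag", "brand": "Globex", "price_band": "under 50"},
]

def facet_counts(docs, field):
    """Count how many results fall under each value of the given field."""
    return Counter(doc[field] for doc in docs)

def narrow(docs, field, value):
    """Keep only the results that match the selected facet value."""
    return [doc for doc in docs if doc[field] == value]

print(facet_counts(results, "price_band"))
print([doc["title"] for doc in narrow(results, "brand", "Globex")])
```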

The Search Index

At the heart of every search engine is the search index. An index is a searchable catalog of documents created by the search engine. The search engine receives content from all source systems to place in the index. This process is called ingestion. The search engine then accepts search queries to match against the index. The index is used to quickly find relevant documents for a search query out of a collection of documents.

A common index structure is the inverted index which maps every term in the collection to all of its locations in this collection. For example, a search for the term "A" would check the entry for "A" in the index that contains links to all the documents that include "A".
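A minimal inverted index can be sketched in Python as a dictionary from each term to the set of documents that contain it. This simplified version records only document ids, not positions within a document, and the three sample documents are made up.

```python
from collections import defaultdict

def build_inverted_index(documents):
    """Map every term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)

# Hypothetical documents keyed by id.
documents = {
    1: "quality policy manual",
    2: "document control policy",
    3: "holiday schedule",
}
index = build_inverted_index(documents)

print(search(index, "policy"))
print(search(index, "document policy"))
```

The lookup never scans the documents themselves. It only reads the entries for the query terms, which is why an index makes search fast even for large collections.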

Wednesday, July 31, 2013

ISO 9001 and Documentation

ISO 9001 compliance is becoming increasingly important in regulated industries. How does it affect documentation? Here is how...

What is Document Control?

Document control means that the right persons have the current version of the documents they need, while unauthorized persons are prevented from use.

We all handle many documents every day. These documents include forms that we fill out, instructions that we follow, invoices that we enter into the computer system, holiday schedules that we check for the next day off, rate sheets that we use to bill our customers, and many more.

An error on any of these documents could lead to problems. Using an outdated version could lead to problems. Not knowing if we have the latest version or not could lead to problems. Just imagine us setting up a production line to outdated specifications or making strategic decisions based on a wrong financial statement.

ISO 9001 gives us tools (also referred to as "requirements") that show us how to control our documents.

ISO 9001 Documents

There are no "ISO 9001 documents" that need to be controlled and "non ISO 9001 documents" that don't need control. The ISO 9001 system affects an entire company, and all business-related documents must be controlled. Only documents that don't have an impact on products, services or the company are exempt.

However, how much control you apply really depends on the document.

The extent of your approval record, for example, may vary with the importance of the document (remember, documents are approved before they are published for use).

The Quality Policy, an important corporate policy document, shows the signatures of all executives.

Work instructions often just show a note in the footer indicating approval by the department manager.

Some documents don't even need any approval record: if the person who prepared a document is also responsible for its content (e.g., the Quality Manager prepares instructions for his auditors), a separate approval is superfluous.

On the other hand, identifying a document with a revision date, source and title is basic. It really should be done as a good habit for any document we create.

Please note that documents could be in any format: hard copy or electronic. This means that, for example, the pages on the corporate intranet need to be controlled.

Responsibility for Document Control

Document control is the responsibility of all employees. It is important that all employees understand the purpose of document control and how to control documents in accordance with ISO 9001.

Please be aware that if you copy a document or print one out from the Intranet and then distribute it, you are responsible for controlling its distribution! The original author will not know that you distributed copies of this document, so the original author can't control your distribution.

Dating Documents

ISO 9001 requires every document to show when it was created or last updated. Many of us may have thought to use our word processor's automatic date function for this, but... should we use the automatic date field on documents?

Generally not. If you enter the automatic date field into a document, the field will automatically be updated to always show the current date, no matter when you actually created or updated the document.

For example, if you use the automatic date field in a fax and you save the fax on your computer for future reference, you won't be able to tell when you wrote the fax: when you open the fax on your computer, it will always show today's date.

The automatic date field is not suitable for document control. Therefore, as a general rule, don't use the automatic date field to identify revision status.

ISO 9001 Documentation

ISO 9001 documentation includes:

the Quality Procedures Manual, which also includes corporate policies and procedures affecting the entire company;
work instructions, which explain in detail how to perform a work process;
records, which serve as evidence of how you meet ISO 9001 requirements.

Policies and Procedures

Our ISO 9001 Quality Manual includes the corporate Quality Policy and all required ISO 9001 Procedures. While most procedures affect only managers, every employee must be familiar with the Quality Policy and with the Document Control procedures. The Quality Policy contains the corporate strategy related to quality and customer satisfaction; all other ISO 9001 documents must follow this policy. The Document Control procedures show how to issue documents, as well as how to use and control documents.

Continuous Improvement

Implementing ISO 9001 is not a one-time benefit to a company. While you are utilizing the quality manual, quality procedures and work instructions in daily business activities, you are not only benefiting from better quality and increased efficiency but you are also continually improving. In fact, the ISO 9001 requirements are designed to make you continually improve. This is a very important aspect because companies that don't continue to improve are soon overtaken by the competition.

Thursday, June 20, 2013

Intelligent Search and Automated Metadata

The inability to identify the value in unstructured content is the primary challenge in any application that requires the use of metadata. Search cannot find and deliver relevant information in the right context, at the right time without good quality metadata.

An information governance approach that creates the infrastructure framework to encompass automated intelligent metadata generation, auto-classification, and the use of goal and mission-aligned taxonomies is required. From this framework, intelligent metadata enabled solutions can be rapidly developed and implemented. Only then can organizations leverage their knowledge assets to support search, litigation, e-discovery, text mining, sentiment analysis and open source intelligence.

Manual tagging is still the primary approach used to identify the description of content, and often lacks any alignment with enterprise business goals. This subjectivity and ambiguity is applied to search, resulting in inaccuracy and the inability to find relevant information across the enterprise.

Metadata used by search engines may consist of end-user tags or pre-defined tags, or may be generated using system-defined metadata, keyword and proximity matching, extensive rule building, end-user ratings, or artificial intelligence. Typically, search engines provide no way to rapidly adapt to meet organizational needs or account for an organization’s unique nomenclature.

More effective is implementing an enterprise metadata infrastructure that consistently generates intelligent metadata using concept identification. This is a profoundly different approach: relevant documents, regardless of where they reside, will be retrieved even if they don’t contain the exact search terms, because the concepts and relationships between similar content have been identified. The elimination of end-user tagging and the resulting organizational ambiguity enables the enriched metadata to be used by any search engine index, for example, ConceptSearch, SharePoint, Solr, Autonomy or Google Search Appliance.

Only when metadata is consistently accurate and trusted by the organization can improvements be achieved in text analytics, e-discovery and litigation support.

In the exploding age of big data, and more specifically text analytics, sentiment analysis and even open source intelligence, the ability to harness the meaning of unstructured content in real time improves decision-making and enables organizations to proactively act with greater certainty on rapidly changing business complexities.

An effective information governance strategy for unstructured content is predicated on the ability to find information and eliminate inappropriate information. The core enterprise search component must be able to incorporate and digest content from any repository, including faxes, scanned content, social sites (blogs, wikis, communities of interest, Twitter), emails, and websites. This provides a 360-degree corporate view of unstructured content, regardless of where it resides or how it was acquired.

Ensuring that the right information is available to end users and decision makers is fundamental to trusting the accuracy of the information and is another key requirement in intelligent search. Organizations can then find the descriptive needles in the haystack to gain competitive advantage and increase business agility.

An intelligent metadata enabled solution for text analytics analyzes and extracts highly correlated concepts from very large document collections. This enables organizations to attain an ecosystem of semantics that delivers understandable and trusted results that are continually updated in real time.

Consider applying the concept of intelligent search to e-discovery and litigation. Traditional information retrieval systems use "keyword searches" of text and metadata as a means of identifying and filtering documents. The challenges and costs of e-discovery and litigation support continue to escalate. The use of intelligent search reduces costs and alleviates many of the challenges.

Content can be presented to knowledge professionals in a manner that enables them to more rapidly identify relevant information and increase accuracy. Significant benefits can be achieved by removing the ambiguity in content and identifying concepts within a large corpus of information. This methodology speeds up the work and reduces costs, offering an effective solution that overcomes many of the challenges typically not solved in e-discovery and litigation support.

Organizations must incorporate an approach that addresses the lack of an intelligent metadata infrastructure. Intelligent search, a by-product of the infrastructure, must encourage, not hamper, the use and reuse of information and be rapidly extendable to address text mining, sentiment analysis, e-discovery, and litigation support.

The additional components of auto-classification and taxonomies complete the core infrastructure to deploy intelligent metadata enabled solutions, including records management, data privacy, and migration. Search can no longer be evaluated on features, but on proven results that deliver insight into all unstructured content.

Wednesday, May 29, 2013

Digital Assets Management System - Autonomy Virage MediaBin

Autonomy Virage MediaBin is an advanced and comprehensive solution to index, analyze, categorize, manage, retrieve, process, and distribute all types of digital assets within an organization.

Autonomy Virage MediaBin helps organizations with globally distributed teams to effectively manage, distribute, and publish digital assets used to promote their messaging, products, and brands.

Companies would benefit from higher-impact marketing and communications, greater agility, stronger brand equity, increased team productivity, and the security of knowing valuable corporate assets will be fully leveraged and preserved for the future. By providing self-service access to digital assets, marketing personnel no longer have to spend time fulfilling content requests.

Autonomy Virage MediaBin delivers rapid return on investment and can support implementations scaling up to the largest global enterprises.

Major Features:

Unified Management: a single environment which supports standardized and automated tagging to accelerate search and streamline the creation, management, delivery, and archival of all digital assets.

Intelligent Analytics: leverages Autonomy IDOL to automate manual processes such as metadata tagging, summarization, and categorization.

Next-Gen Rich Media Technology: leverages next generation video and speech analytics technology that extracts concepts to enable cross-referencing with other forms of information.

Effective and Agile Content Reuse: provides secure access to all content for all users. Internal and external teams can collaborate more effectively to improve coordination and productivity in all marketing programs.

Transform and Transcode on the Fly: Multi-threaded transformation task engine can handle large quantities of simultaneous complex transformations involving format conversions, color-space conversions, color adjustments, resolution, cropping, sizing, padding, watermarking, and a wide variety of advanced graphics adjustments that would normally require a user to open an editing application on their desktop.

Other Features:

browser based system;
permissions can be defined based on users' roles or by folders; search incorporates permissions;
content can be pulled from CMS such as TeamSite and rendered on the fly;
each asset has unique ID which is passed over to TeamSite; TeamSite "knows" when there is a different or a new revision. If an asset gets updated in MediaBin, TeamSite gets notified;
has a set of workflows such as approval and review; rules can be defined so that once assets are approved, they move to the publishing area; also includes Process Studio, which is the workflow tool, and Template, which is the form builder;
assets can be uploaded by "drag and drop" and can also be dragged and dropped to TeamSite from MediaBin;
there is no limitation to size of the files;
upload can be automated for assets to go to specific folders;
after the download, assets will be preserved for individual users;
how assets are used is reported in TeamSite;
can pull content from SharePoint;
metadata is preserved, it is searchable and indexable;
content is automatically categorized by asset type and resolution; asset type is recognized on ingest, so no entering metadata is required;
TeamSite pulls images from MediaBin;
supports 29 languages;
ability to link assets together (for example: associated assets) using existing metadata;
ability to create a taxonomy of assets;
search includes saved searches, recent searches, both preset and executed searches, custom search;
ability to search for words in video and then go to that place in the video;
once a user finds content, an action can be taken such as download, send it by e-mail, send a shortcut to the content, or add it to a light-box, which is defined by permissions;
there is an Activity Manager, which includes all actions taken and the ability to get to users' tasks.

Benefits:

eliminates human error and ensures quicker access to content through automatic metadata extraction and accurate search results;
reduces costs by automating the production, review, and distribution of digital assets;
increases efficiency by providing users with self-service access at any time;
speeds time-to-market while maintaining accuracy and consistency;
facilitates quick reuse and re-purposing of images, as well as rapid content creation;
produces higher-impact marketing and communications, greater agility, and stronger brand consistency;
increases compliance through security-controlled access, a complete audit trail, and control of licensed content.

Thursday, May 9, 2013

Search Engine Technology

Modern web search engines are highly intricate software systems which employ technology that has evolved over the years. There are a few categories of search engines that are applicable to specific browsing needs.

These include web search engines (e.g. Google), database or structured data search engines (e.g. Dieselpoint), and mixed search engines or enterprise search.

The more prevalent search engines such as Google and Yahoo! utilize hundreds of thousands of computers to process trillions of web pages in order to return fairly well-aimed results. Due to this high volume of queries and text processing, the software is required to run in a highly distributed environment with a high degree of redundancy.

Search Engine Categories

Web search engines

These are search engines that are specifically designed for searching web pages. They were developed to facilitate searching through a large number of web pages. They are engineered to follow a multi-stage process: crawling a vast number of pages to extract the salient text from their contents, indexing the extracted keywords in a semi-structured form (for example, a database), and returning mostly relevant links to those documents or pages from the index.

Crawl

In the case of a wholly textual search, the first step in classifying web pages is to find an "index item" that might relate expressly to the "search term". Most search engines use sophisticated algorithms to "decide" when to revisit a particular page, to check its relevance. These algorithms range from constant visit-interval with higher priority for more frequently changing pages to adaptive visit-interval based on several criteria such as frequency of change, popularity, and overall quality of site. The speed of the web server running the page as well as resource constraints like amount of hardware or bandwidth also figure in.
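To give a feel for an adaptive visit-interval, here is a very simplified Python sketch. It assumes a single criterion (whether the page changed since the last visit) and a made-up rule: halve the interval when the page changed, double it when it did not, and stay within fixed bounds. Real crawlers weigh many more signals, such as popularity and site quality.

```python
# Hypothetical bounds for how often a page may be revisited.
MIN_INTERVAL_HOURS = 1
MAX_INTERVAL_HOURS = 24 * 30

def next_visit_interval(current_hours, page_changed):
    """Visit sooner if the page changed, later if it did not."""
    if page_changed:
        proposed = current_hours / 2
    else:
        proposed = current_hours * 2
    # Clamp the result so a page is never polled too often or forgotten.
    return max(MIN_INTERVAL_HOURS, min(MAX_INTERVAL_HOURS, proposed))

print(next_visit_interval(24, page_changed=True))
print(next_visit_interval(24, page_changed=False))
```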

Link map

The pages that are discovered by web crawls are often distributed and fed into another computer that creates a veritable map of uncovered resources. This looks a little like a graph, on which different pages are represented as small nodes that are connected by links between the pages. The large volume of data is stored in multiple data structures that allow quick access to this data by certain algorithms that compute the popularity score of pages on the web based on how many links point to a certain web page.
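The simplest possible popularity score is a count of inbound links, which the Python sketch below computes for a small, invented link map. Production algorithms are considerably more elaborate (for example, they also weigh how popular the linking pages are), so this only illustrates the basic idea.

```python
from collections import Counter

# Hypothetical link map: each page lists the pages it links to.
link_map = {
    "home": ["products", "about"],
    "products": ["home", "pricing"],
    "about": ["home"],
    "pricing": ["home", "products"],
}

def inbound_link_counts(links):
    """Count, for every page, how many other pages link to it."""
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):  # count each linking page once
            if target != source:     # ignore self-links
                counts[target] += 1
    return counts

popularity = inbound_link_counts(link_map)
print(popularity.most_common())
```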

Database Search Engines

Searching for text-based content in databases presents a few special challenges from which a number of specialized search engines developed. Databases are slow when solving complex queries (with multiple logical or string matching arguments). Databases allow pseudo-logical queries which full-text searches do not use. There is no crawling necessary for a database since the data is already structured. However, it is often necessary to index the data in a more economized form designed to allow a faster search.

Mixed Search Engines

Sometimes, searched data contains both database content and web pages or documents. Search engine technology has developed to respond to both sets of requirements. Most mixed search engines are large Web search engines, like Google. They search through both structured and unstructured data sources. Pages and documents are crawled and indexed in a separate index. Databases are also indexed from various sources. Search results are then generated for users by querying these multiple indices in parallel and combining the results according to "rules".
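The idea of querying several indices in parallel and combining the results can be sketched in Python as follows. The two backend functions are stand-ins that return canned hits, and the combining "rule" used here (sort everything by score) is an assumption that only works if the scores from different indices are comparable.

```python
from concurrent.futures import ThreadPoolExecutor

def search_documents(query):
    """Stand-in for a document index; returns (name, score, source) hits."""
    hits = [("policy.pdf", 0.9), ("minutes.docx", 0.4)]
    return [(name, score, "documents") for name, score in hits]

def search_database(query):
    """Stand-in for a database index; returns (name, score, source) hits."""
    hits = [("customers row 42", 0.7)]
    return [(name, score, "database") for name, score in hits]

def federated_search(query, backends):
    """Query every backend in parallel, then merge the hits by score."""
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda backend: backend(query), backends))
    merged = [hit for hits in result_lists for hit in hits]
    return sorted(merged, key=lambda hit: hit[1], reverse=True)

results = federated_search("policy", [search_documents, search_database])
for name, score, source in results:
    print(score, source, name)
```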

Tuesday, April 30, 2013

Big Data and Content Management

There has been a lot of talk lately about big data. What is big data?

Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand commonly used software tools or traditional data processing applications. The challenges include capture, governance, storage, search, sharing, transfer, analysis, and visualization.

What is considered "big data" varies depending on the capabilities of the organization managing the data set, and on the capabilities of the applications that are traditionally used to process and analyze the data set in its domain.

Big data sizes are a constantly moving target, as of 2012 ranging from a few dozen terabytes to many petabytes of data in a single data set. Because of this difficulty, new platforms of "big data" tools are being developed to handle various aspects of large quantities of data.

Every day, we create 2.5 quintillion bytes of data — so much that 90% of the data in the world today has been created in the last two years alone. How does it apply to us and what we do in content management?

The sheer numbers, covered in most enterprise content management (ECM) analyst reports, also extend to all aspects of the information technology sector, prompting developers to create a new generation of software and technology or distributed computing frameworks in an effort to cope with this scalability phenomenon.

Content growth is everywhere. From traditional data warehouses to new consolidated big data stores, IT infrastructure must be ready for this continuing scale; it impacts the entire IT industry, especially ECM.

Content is getting bigger. Applications are growing more complex, challenging IT as never before. How will these changes impact content management technologies? It's difficult to predict exactly, but there are insights to be found and used to plan for the future.

ECM technology is evolving toward a platform-based approach, enabling organizations to make their own content-centric and content-driven applications smarter. Analysts, vendors and users all agree: The time for "out-of-the-box" CMS applications has passed. Now each project can meet specific needs and individual requirements.

Content and data, more often than not, come with embedded intelligence, whether through custom metadata and in-text information or through attached media and binary files. This intelligence can be utilized whether the content is structured or unstructured.

This can be observed on many different levels across various domains. For instance, consider the arrival of what some have started to call "Web 3.0": the semantic Web and the related technology that draws intelligence out of raw content through advancements like semantic text analysis, automated relations and categorization, sentiment analysis, etc., effectively giving meaning to data.

More traditional ECM components, such as workflows, content lifecycle management and flexibility, demonstrate much of the same. Smart content architecture, intelligent and adaptive workflows and processes, and deep integration with the core applications within information systems are all making enterprise content-centric applications smarter and are refining the way intelligence is brought to content.

In short, content is getting smarter on the inside as much as on the outside.

In fact, such disruptive phenomena as Big Data or the new semantic technology on the scene are huge opportunities for enterprise content management solutions. They are bringing new solutions and possibilities in business intelligence, semantic text analysis, data warehousing and caching that require integration into existing content-centric applications, all without rewriting them.

As a result, Big Data and smart content will push more of enterprise content management toward technical features such as software interoperability, extensibility and integration capabilities.

These developments will also demand a clean and adaptive architecture that is flexible enough to evolve as new standards arise to bridge CMS and semantic technologies, as well as connectors to back-end storage systems or to text-analysis solutions.

This underscores the advancements made in the development of modular and extensible platforms for content-centric applications. Taking the traditional approach of employing large enterprise content management suites that rely on older software architecture will make it harder to leverage these new and nimble opportunities.

In order to get the most value out of smart content and refine methods of dealing with Big Data, enterprise content management architects must incorporate a modern and well-designed content management platform upon which to build, one that not only looks at end-user features but stays true to the development side. Enterprise content management will not be reinvented; Big Data and smart content are evolutions, not revolutions, in the industry.

I will continue on this subject in my future posts.