Wednesday, August 13, 2014

Dublin Core Metadata

The word "metadata" means "data about data". Metadata describes a context for objects of interest such as document files, images, audio and video files. It can also be called resource description. As a tradition, resource description dates back to the earliest archives and library catalogs. The modern "metadata" field that gave rise to Dublin Core and other recent standards emerged with the Web revolution of the mid-1990s.

The Dublin Core Schema is a small set of vocabulary terms that can be used to describe different resources.

Dublin Core Metadata may be used for multiple purposes, from simple resource description, to combining metadata vocabularies of different metadata standards, to providing inter-operability for metadata vocabularies in the Linked Data cloud and Semantic Web implementations.

"Dublin" refers to Dublin, Ohio, USA where the schema originated during the 1995 invitational OCLC/NCSA Metadata Workshop hosted by the Online Computer Library Center (OCLC), a library consortium based in Dublin, and the National Center for Supercomputing Applications (NCSA). "Core" refers to the metadata terms being broad and generic, and therefore usable for describing a wide range of resources. The semantics of Dublin Core were established and are maintained by an international, cross-disciplinary group of professionals from librarianship, computer science, text encoding, museums, and other related fields of scholarship and practice.

The Dublin Core Metadata Initiative (DCMI) provides an open forum for the development of inter-operable online metadata standards for a broad range of purposes and of business models. DCMI's activities include consensus-driven working groups, global conferences and workshops, standards liaison, and educational efforts to promote widespread acceptance of metadata standards and practices.

In 2008, DCMI separated from OCLC and incorporated as an independent entity. All changes made to the Dublin Core standard are reviewed by a DCMI Usage Board within the context of a DCMI Namespace Policy. This policy describes how terms are assigned and also sets limits on the extent of editorial changes allowed to the labels, definitions, and usage comments.

Levels of the Standard

The Dublin Core standard originally included two levels: Simple and Qualified. Simple Dublin Core comprised 15 elements; Qualified Dublin Core included three additional elements (Audience, Provenance and RightsHolder), as well as a group of element refinements (also called qualifiers) that could refine the semantics of the elements in ways that may be useful in resource discovery. Since 2012 the two have been incorporated into the DCMI Metadata Terms as a single set of terms using the Resource Description Framework (RDF).

The original Dublin Core Metadata Element Set, which is the Simple level, consists of 15 metadata elements:

Title
Creator
Subject
Description
Publisher
Contributor
Date
Type
Format
Identifier
Source
Language
Relation
Coverage
Rights

Each Dublin Core element is optional and may be repeated. The DCMI has established standard ways to refine elements and encourage the use of encoding and vocabulary schemes. There is no prescribed order in Dublin Core for presenting or using the elements. The Dublin Core became the ISO 15836 standard in 2006 and is used as a base-level data element set for the description of learning resources in ISO/IEC 19788-2.
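To make the "optional and repeatable" rule more concrete, here is a minimal sketch in Python that builds a Simple Dublin Core record as XML. The element names and the namespace URI come from the Dublin Core Metadata Element Set. The `metadata` wrapper element, the `build_record` helper, and the sample values are my own assumptions for illustration, not part of the standard.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15-element Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"

DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def build_record(fields):
    """Build an XML record from (element, value) pairs.

    Any element may be left out or repeated, and the pairs are
    written in the order given, since no order is prescribed.
    """
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields:
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Simple Dublin Core element: {name}")
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Sample values are made up; "subject" is repeated, most elements are omitted.
record = build_record([
    ("title", "Dublin Core Metadata"),
    ("creator", "Example Author"),
    ("subject", "metadata"),
    ("subject", "resource description"),
    ("date", "2014-08-13"),
    ("language", "en"),
])
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

The record above is valid even though it uses only five of the fifteen elements and repeats one of them.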

Qualified Dublin Core

Subsequent to the specification of the original 15 elements, an ongoing process began to identify and develop additional terms that extend or refine the Dublin Core Metadata Element Set (DCMES). Element refinements make the meaning of an element narrower or more specific. A refined element shares the meaning of the unqualified element, but with a more restricted scope.
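Because a refined element shares the meaning of the element it refines, a qualified record can be collapsed back to Simple Dublin Core by replacing each refinement with its parent element. The sketch below illustrates the idea. The `REFINES` table is only a small illustrative subset of refinements, and the `to_simple` helper and sample record are hypothetical.

```python
# Illustrative subset: refinement -> the element it refines.
REFINES = {
    "created": "date",
    "modified": "date",
    "abstract": "description",
    "alternative": "title",
    "isPartOf": "relation",
    "spatial": "coverage",
}


def to_simple(qualified):
    """Replace each refinement with its unqualified element.

    Terms that are not refinements are passed through unchanged.
    """
    return [(REFINES.get(term, term), value) for term, value in qualified]


# Made-up sample record using two refinements of "date" and one of "title".
qualified_record = [
    ("title", "Annual Report"),
    ("alternative", "Report 2013"),
    ("created", "2014-01-15"),
    ("modified", "2014-02-01"),
]
simple_record = to_simple(qualified_record)
print(simple_record)
```

The collapsed record loses precision (we no longer know which date is the creation date), but every value is still correctly described by its broader element.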

In addition to element refinements, Qualified Dublin Core includes a set of recommended encoding schemes, designed to aid in the interpretation of an element value. These schemes include controlled vocabularies and formal notations or parsing rules.

Syntax

Syntax choices for Dublin Core metadata depend on a number of variables, and "one size fits all" forms rarely apply. When considering an appropriate syntax, it is important to note that Dublin Core concepts and semantics are designed to be syntax independent and are equally applicable in a variety of contexts, as long as the metadata is in a form suitable for interpretation both by machines and by human beings.

The Dublin Core Abstract Model provides a reference model against which particular Dublin Core encoding guidelines can be compared, independent of any particular encoding syntax. Such a reference model allows users to gain a better understanding of the descriptions they are trying to encode and facilitates the development of better mappings and translations between different syntaxes.

I will describe some applications of Dublin Core in my future posts.

Tuesday, July 22, 2014

Hadoop and Big Data

During the last ten years, the volume and diversity of digital information grew at unprecedented rates. The amount of information is doubling every 18 months, and unstructured information volumes grow six times faster than structured ones.

Big data is the current trend. Big data has been defined as data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time.

Hadoop was created in 2005 by Doug Cutting and Mike Cafarella to address the big data issue. Doug Cutting named it after his son's toy elephant. It was originally developed for the Nutch search engine project. Nutch is an effort to build an open source web search engine based on Lucene and Java for the search and index component.

As of 2013, Hadoop adoption is widespread. A number of companies offer commercial implementations or support for Hadoop. For example, more than half of the Fortune 50 use Hadoop. Hadoop is designed to process terabytes and even petabytes of unstructured and structured data. It breaks large workloads into smaller data blocks that are distributed across a cluster of commodity hardware for faster processing.
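The idea of breaking a workload into blocks and processing them independently can be sketched in a few lines of Python. This is not Hadoop's API. It is a single-process simulation of the MapReduce pattern (split, map, shuffle, reduce) applied to counting words in log lines, and all function names and sample data are assumptions for illustration.

```python
from collections import defaultdict


def split_into_blocks(lines, block_size):
    """Split the input into fixed-size blocks of lines."""
    return [lines[i:i + block_size] for i in range(0, len(lines), block_size)]


def map_block(block):
    """Map step: emit a (word, 1) pair for every word in the block."""
    pairs = []
    for line in block:
        for word in line.lower().split():
            pairs.append((word, 1))
    return pairs


def shuffle(mapped_blocks):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for pairs in mapped_blocks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


# Made-up log lines standing in for a large data set.
log_lines = [
    "error disk full",
    "warning disk slow",
    "error network down",
    "error disk full",
]
blocks = split_into_blocks(log_lines, block_size=2)
# On a real cluster each block would be handled by a different node;
# here the blocks are simply processed one after another.
mapped = [map_block(block) for block in blocks]
counts = reduce_counts(shuffle(mapped))
print(counts)
```

The map step never needs to see more than one block, which is what makes it possible to spread the work across many machines.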

Ventana Research, a benchmark research and advisory services firm, published the results of its groundbreaking survey on enterprise adoption of Hadoop to manage big data. According to this survey:
  • More than one-half (54%) of organizations surveyed are using or considering Hadoop for large-scale data processing needs.
  • More than twice as many Hadoop users report being able to create new products and services and enjoy cost savings beyond those using other platforms; over 82% benefit from faster analysis and better utilization of computing resources.
  • 87% of Hadoop users are performing or planning new types of analysis with large scale data.
  • 94% of Hadoop users perform analytics on large volumes of data not possible before; 88% analyze data in greater detail; while 82% can now retain more of their data.
  • 63% of organizations use Hadoop in particular to work with unstructured data such as logs and event data.
  • More than two-thirds of Hadoop users perform advanced analysis such as data mining or algorithm development and testing.
Today, Hadoop is being used as a:
  • Staging layer: the most common use of Hadoop in enterprise environments is as “Hadoop ETL” — pre-processing, filtering, and transforming vast quantities of semi-structured and unstructured data for loading into a data warehouse.
  • Event analytics layer: large-scale log processing of event data: call records, behavioral analysis, social network analysis, clickstream data, etc.
  • Content analytics layer: next-best action, customer experience optimization, social media analytics. MapReduce provides the abstraction layer for integrating content analytics with more traditional forms of advanced analysis.

Most existing vendors in the data warehousing space have announced integrations between their products and Hadoop/MapReduce.

Hadoop is particularly useful when:
  • Complex information processing is needed.
  • Unstructured data needs to be turned into structured data.
  • Queries can’t be reasonably expressed using SQL.
  • Heavily recursive algorithms are involved.
  • Complex but parallelizable algorithms are needed, such as geo-spatial analysis or genome sequencing.
  • Machine learning is involved.
  • Data sets are too large to fit into database RAM or disks, or require too many cores (tens of TB up to PB).
  • Data value does not justify expense of constant real-time availability, such as archives or special interest information, which can be moved to Hadoop and remain available at lower cost.
  • Results are not needed in real time.
  • Fault tolerance is critical.
  • Significant custom coding would be required to handle job scheduling.

Do Hadoop and Big Data Solve All Our Data Problems?

Hadoop provides a new, complementary approach to traditional data warehousing that helps deliver on some of the most difficult challenges of enterprise data warehouses. Of course, it’s not a panacea, but by making it easier to gather and analyze data, it may help move the spotlight away from the technology towards the more important limitations on today’s business intelligence efforts: information culture and the limited ability of many people to actually use information to make the right decisions.

Monday, July 14, 2014

Benefits of Automatic Content Classification

I received a few questions about my posts on automatic content classification. I would like to thank my blog readers for these questions. This post is to follow up on those questions.

Organizations receive countless paper documents every day. These documents can be mail, invoices, faxes, or email. Even after organizations scan these paper documents, it is still difficult to manage and organize them.

To overcome the inefficiencies associated with paper and captured documents, companies should implement an intelligent classification system to organize captured documents.

With today’s document processing technology, organizations do not need to rely on manual classification or processing of documents. Organizations that replace manual sorting and classification with an automated document classification and processing system can realize a significant reduction in manual entry costs and improve the speed and turnaround time of document processing.

Recent research has shown that two-thirds of organizations cannot access their information assets or find vital enterprise documents because of poor information classification or tagging. The survey suggests that much of the problem may be due to manual tagging of documents with metadata, which can be inconsistent and riddled with errors, if it has been done at all.

There are a few solutions for automated document classification and recognition. Some of them are: SmartLogic's Semaphore, OpenText, Interwoven Metatagger, Documentum, CVISION Trapeze, and others. These solutions enable organizations to organize, access, and control their enterprise information.

They are cost effective and eliminate inconsistency, mistakes, and the huge manpower costs associated with manual classification. Putting an effective and consistent automatic content classification system in place that ensures quick and easy retrieval of the right documents means better access to corporate knowledge, improved risk management and compliance, superior customer relationship management, enhanced findability for key audiences and an improved ability to monetize information.

Specific benefits of automatic content classification are:

More consistency. It produces the same unbiased results over and over. It might not always be 100% accurate or relevant, but if something goes wrong, it is at least easy to understand why.

Larger context. Enforces classification from the whole organization's perspective, not the individual's. For example, a person interested in sports might tag an article that mentions a specific player but forget or not consider the team and country topics.

Persistent. A person can only handle a certain number of incoming documents per day, whilst an automatic classifier works round the clock.

Cost effective. Possible to handle thousands of documents much faster than a person.

Automatic document classification can be divided into three types: supervised document classification, where some external mechanism (such as human feedback) provides information on the correct classification for documents; unsupervised document classification (also known as document clustering), where the classification must be done entirely without reference to external information; and semi-supervised document classification, where only some of the documents are labeled by the external mechanism.
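As an illustration of the supervised case, here is a minimal sketch of a naive Bayes text classifier in plain Python. The human-provided labels play the role of the "external mechanism". Naive Bayes is my choice for the example, not necessarily what the commercial products use, and the training documents and labels are made up.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """Count word and label frequencies from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labeled_docs:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        # Prior: share of training documents carrying this label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one (Laplace) smoothing so unseen words do not zero out a label.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training set: a person has already labeled these documents.
training = [
    ("invoice total amount due payment", "invoice"),
    ("payment due invoice number", "invoice"),
    ("meeting agenda minutes attendees", "memo"),
    ("agenda for staff meeting", "memo"),
]
model = train(training)
predicted = classify("invoice payment overdue", model)
print(predicted)
```

In a semi-supervised setup, only a few documents would carry labels like these and the rest would be used unlabeled. In the unsupervised case there would be no labels at all, and documents would be grouped purely by their similarity to each other.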

Automate your content classification and consider using manual labor mainly for quality checking and approval of content.

In my next post on this topic, I will describe the role of automatic classification in records management and information governance.

Tuesday, June 10, 2014

Unified Data Management

In most organizations today, information is managed in isolated silos by independent teams using various tools for data quality, data integration, data governance, metadata and master data management, B2B data exchange, content management, database administration, information life-cycle management, and so on.

In response to this situation, some organizations are adopting Unified Data Management (UDM), a practice that holistically coordinates teams and integrates tools. Other common names for this practice include enterprise data management and enterprise information management.

Regardless of what you call it, the big picture that results from bringing diverse data disciplines together yields several benefits, such as cross-system data standards, cross-tool architectures, cross-team design and development synergies, leveraging data as an organizational asset, and assuring data’s integrity and lineage as it travels across multiple departments and technology platforms.

But unified data management is not purely an exercise in technology. Once data becomes an organizational asset, the ultimate goal of UDM becomes to achieve strategic, data-driven business objectives, such as fully informed operations and business intelligence, plus related goals in governance, compliance, business transformation, and business integration. In fact, the challenge of UDM is to balance its two important goals: uniting multiple data management practices and aligning them with business goals that depend on data for success.

What is UDM? It is the best practice for coordinating diverse data management disciplines, so that data is managed according to enterprise-wide goals that promote efficiency and support strategic, data-oriented business goals. UDM is the unification of both technology practices and business management.

For UDM to be considered successful, it should satisfy and balance the following requirements:

1. UDM must coordinate diverse data management areas. This is mostly about coordinating the development efforts of data management teams and enabling greater inter-operability among their participants. UDM may also involve the sharing or unifying of technical infrastructure and data architecture components that are relevant to data management. There are different ways to describe the resulting practice, and users who have achieved UDM call it a holistic, coordinated, collaborative, integrated, or unified practice. The point is that UDM practices must be inherently holistic if you are to improve and leverage data on a broad enterprise scale.

2. UDM must support strategic business objectives. For this to happen, business managers must first know their business goals, then communicate data-oriented requirements to data management professionals and their management. Ideally, the corporate business plan should include requirements and milestones for data management. Hence, although UDM is initially about coordinating data management functions, it should eventually lead to better alignment between data management work and information-driven business goals of the enterprise. When UDM supports strategic business goals, UDM itself becomes strategic.

More Definitions

UDM is largely about best practices from a user’s standpoint. Most UDM work involves collaboration among data management professionals of different specialties (such as data integration, quality, master data, etc.). The collaboration fosters cross-solution data and development standards, inter-operability of multiple data management solutions, and a bigger concept of data and data management architecture.

UDM is not a single type of vendor-supplied tool. Even so, a few leading software vendors are shaping their offerings into UDM platforms. Such a platform consists of a portfolio of multiple tools for multiple data management disciplines, the most common being BI, data quality, data integration, master data management, and data governance.

For the platform to be truly unified, all tools in the portfolio should share a common graphical user interface (GUI) for development and administration, servers should inter-operate in deployment, and all tools should share key development artifacts (such as metadata, master data, data profiles, data models, etc.). Having all these conditions is ideal, but not an absolute requirement.

UDM often starts with pairs of practices. In other words, it’s unlikely that any organization would want or need to coordinate 100% of its data management work via UDM or anything similar. Instead, organizations select combinations of data management practices whose coordination and collaboration will yield appreciable benefits.

The most common combinations are pairs, as with data integration and data quality or data governance and master data management. Over time, an organization may extend the reach of UDM by combining these pairs and adding in secondary, supporting data management disciplines, such as metadata management, data modeling, and data profiling. Hence, the scope of UDM tends to broaden over time into a more comprehensive enterprise practice.

A variety of organizational structures can support UDM. It can be a standalone program or a subset of larger programs for IT centralization and consolidation, IT-to-business alignment, data as an enterprise asset, and various types of business integration and business transformations. UDM can be overseen by a competency center, a data governance committee, a data stewardship program, or some other data-oriented organizational structure.

UDM is often executed purely within the scope of a program for BI and data warehousing (DW), but it may also reach into some or all operational data management disciplines (such as database administration, operational data integration, and enterprise data architecture).

UDM unifies many things. As its name suggests, it unifies disparate data disciplines and their technical solutions. On an organizational level, it also unifies the teams that design and deploy such solutions. The unification may simply involve greater collaboration among technical teams, or it may involve the consolidation of teams, perhaps into a data management competency center.

In terms of deployed solutions, unification means a certain amount of inter-operability among servers, and possibly integration of developer tool GUIs. Technology aside, UDM also forces a certain amount of unification among business people, as they come together to better define strategic business goals and their data requirements. When all goes well, a mature UDM effort unifies both technical and business teams through IT-to-business alignment.

Why Care About UDM?

There are many reasons why organizations need to step up their efforts with UDM:

Technology drivers. From a technology viewpoint, the lack of coordination among data management disciplines leads to redundant staffing and limited developer productivity. Even worse, competing data management solutions can inhibit data’s quality, integrity, consistency, standards, scalability, architecture, and so on. On the upside, UDM fosters greater developer productivity, cross-system data standards, cross-tool architectures, and cross-team design and development synergies, and it assures data’s integrity and lineage as it travels across multiple organizations and technology platforms.

Business drivers. From a business viewpoint, data-driven business initiatives (including BI, CRM, regulatory compliance, and business operations) suffer due to low data quality and incomplete information, inconsistent data definitions, non-compliant data, and uncontrolled data usage. UDM helps avoid these problems, plus it enables big picture data-driven business methods such as data governance, data security and privacy, operational excellence, better decision making, and leveraging data as an organizational asset.

To be successful, an organization needs a data strategy that integrates a multitude of sources, case studies, deployment technologies, regulatory and best practices guidelines, and any other operating parameters.

Many organizations have begun to think about the contributions of customer information in a more holistic way as opposed to taking a more fragmented approach. In the past, these efforts were undertaken mainly by organizations traditionally reliant on data for their model (like finance and publishing), but today, as more and more businesses recognize the benefits of a more comprehensive approach, these strategies are gaining much wider deployment.

What should a good strategy look like?

First, identify all data assets according to their background and potential contribution to the enterprise.

Second, outline a set of data use cases that will show how information will support any of a variety of customer marketing functions.

Next, create rules and guidelines for responsible use and access, making sure that the process is flexible and transparent. Keep in mind that not all data should be treated the same way; rather, it should be managed according to its sensitivity and need.

Finally, make sure that this process is ongoing so that tactics can be evaluated and adjusted as needed.

Such a strategy combines best practices with responsible data governance and smart organization. Everyone wins: the employees who gain quick access to essential information, the enterprise that runs more smoothly, and of course the customers who are served by a resource-rich organization!

Thursday, May 15, 2014

Content Categorization Role in Content Management

The ability to find content in a content management system is crucial. One of the main goals of having a content management system is to make content easy to find, so you can take an action, make a business decision, do research and development work, etc.

The main challenge to findability is anticipating how users might look for information. That's where categorization comes into play. The quality of the categorization of each piece of content makes or breaks its findability. Theoretically, good tagging will last the lifetime of the content. You would think that if you do it well initially, then you can forget about it until it is time to retire that content. But reality can be very different.

Durable Categorization

Many issues complicate content categorization. They include:
  • the sheer volume, velocity, and variety of internal and external-facing content which needs management;
  • evolving/emerging regulations and compliance issues, some of which need to be retroactively applied; 
  • the need to limit the company's exposure and to support the strength of its position in any legal activity.
Some organizations face the added challenge of integrating content from acquisitions or mergers, which most likely use content management structure, categorization, and methodologies that are incompatible and of inconsistent quality.

Considering these issues, the key success factor for good content categorization is the use of automatic categorization techniques and processes.

Traditionally, keywords, dictionaries, and thesauri are used to categorize content. This type of categorization model poses several problems:
  • taxonomy quality - it depends on the initial vision and attention to detail, and whether it has been kept current;
  • term creep - initial categorization will not always accommodate where and how the content will be used over time, or predict relevancy beyond its original focus;
  • policy evolution - it can't easily apply new or evolving policies, regulations, compliance requirements, etc.;
  • cost and complexity - it is difficult and costly, if not practically impossible, to retroactively expand the original categorization of the existing content if a large amount of content is added.

Automatic Categorization

Using technology to automatically categorize content is a solution. It applies the rules more consistently than people do. It does it faster. It frees people from having to do the task, and therefore costs less. And, it can actively or retroactively categorize batches or whole collections of documents.

You can experience these benefits by using concept-based categorization driven by an analytics engine integrated into the content management system. These systems mathematically analyze example documents you provide to calculate concepts that can be used to categorize other documents. Identifying hundreds of keywords per term, they are able to distinguish relevance that escapes keyword and other traditional taxonomy approaches. They are even highly likely to make connections that a person would miss.

Consider 3D printers as an example. These are also known as "materials printers", "fabbers", "3D fabbers", and as "additive manufacturing". If all of those terms are not in the taxonomy, then relevant documents that use one or more of them, but not 3D printer, would not be optimally categorized.

People looking for information about 3D printers who are not aware of the alternative terms would miss related documents of potential significance. This particularly affects external-facing websites that sell products. Their business depends on fast and easy delivery of accurate and complete information to their customers, even when the customer doesn't know all of the various terms used to describe the product they are looking for.

In contrast, through example-based mathematical analysis and comparison along multiple keywords, conceptual analytics systems understand that these documents are all related. They would be automatically categorized and tagged as relevant to 3D printing.
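The difference between a single-keyword match and example-based categorization can be sketched as follows. Real conceptual analytics engines use more sophisticated mathematics than this. The sketch below stands in with a simple bag-of-words profile per category and cosine similarity, and the category names, example documents, and helper functions are all assumptions for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn text into a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_profiles(examples):
    """Sum the vectors of each category's example documents."""
    profiles = {}
    for category, docs in examples.items():
        profile = Counter()
        for doc in docs:
            profile.update(vectorize(doc))
        profiles[category] = profile
    return profiles


def categorize(text, profiles):
    """Pick the category whose profile is most similar to the text."""
    vec = vectorize(text)
    return max(sorted(profiles), key=lambda c: cosine(vec, profiles[c]))


# Made-up example documents provided for each category.
examples = {
    "3d printing": [
        "a 3d printer deposits plastic filament layer by layer",
        "additive manufacturing builds parts layer by layer",
        "fabbers extrude plastic filament to build objects",
    ],
    "accounting": [
        "the invoice lists the payment amount due",
        "the ledger records each payment and invoice",
    ],
}
profiles = build_profiles(examples)

new_doc = "additive manufacturing with plastic filament"
keyword_hit = "3d printer" in new_doc
category = categorize(new_doc, profiles)
print(keyword_hit, category)
```

The new document never mentions "3D printer", so the keyword test misses it, but it still lands in the right category because it shares vocabulary with the examples. Adding a new term to the system is then a matter of adding another example document rather than editing a taxonomy.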

Another difference is that taxonomy systems require someone to enter the newly developed or discovered terms. In conceptual analytics, it is simply a matter of providing additional example documents that automatically add to the system's conceptual understanding.

The days of keeping everything "just in case" are long gone. Because of cost and risk exposure concerns, organizations need to keep only what is necessary, particularly as the volume and variety of content continue to grow. Good categorization and tagging systems are essential to good content management and to controlling expense and exposure.

Outdated and draft documents unnecessarily expand every company's content repositories. Multiple copies of the same or very similar content are scattered throughout the organization. By some estimates, these make up 20% or more of a company's content.

Efficiently weeding out that content means 20% less active and backup storage, bandwidth, cloud storage for offsite disaster recovery, and archive volume. Effective and thorough tagging can identify such elements to reduce these costs, and simultaneously reduce the company's cost and exposure related to legal or regulatory requirements.
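As a rough illustration of how redundant copies can be found, the sketch below groups documents by a hash of their normalized text. This is a simple stand-in for exact-duplicate detection, not the concept-based analysis discussed in this post, and it will not catch documents that are merely similar. The file names and contents are made up.

```python
import hashlib
from collections import defaultdict


def fingerprint(text):
    """Hash the text after normalizing case and whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicates(documents):
    """Return groups of document names that share the same fingerprint."""
    groups = defaultdict(list)
    for name, text in documents.items():
        groups[fingerprint(text)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical repository contents.
documents = {
    "policy_v1.txt": "Travel policy:  book flights early.",
    "copy_of_policy.txt": "travel policy: book flights early.",
    "memo.txt": "Quarterly results are attached.",
    "policy_final.txt": "Travel policy: book flights early.\n",
    "draft.txt": "Travel policy: book early.",
}
duplicates = find_duplicates(documents)
# Every copy beyond the first in each group could be removed.
removable = sum(len(group) - 1 for group in duplicates)
print(duplicates, removable)
```

Note that the draft, which differs by one word, is not flagged. Catching near-duplicates like that one requires similarity-based techniques.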

The Value Beyond Cost Savings

Effectively managed content delivers lower content management costs and reduced exposure to risk. While this alone is reason to implement improvements in categorization, there are other reasons.

Superior categorization through conceptual analysis also affects operational efficiency by enabling fast, accurate, and complete content gathering. A significant benefit for any enterprise is that it allows more time for actual work by reducing the time it takes to find necessary information. It is of critical importance for companies whose revenue depends on their customers quickly and easily finding quality information.

Conceptual analytics systems deliver two other advantages over traditional taxonomy methods and manual categorization. First, they create a mathematical index, which is useless to anyone trying to discover private information or clues about the company. Second, they are deterministic and repeatable: they give the same result every time, which makes them very valuable in legal or regulatory activities.

Concept-based analysis makes content findable and actionable, regardless of language, by automatically categorizing it based on understanding developed from example documents you provide. Both internally and externally, the company becomes more competitive with one of its most important assets, which is unstructured information.

Wednesday, April 30, 2014

Video Instant Messaging for Collaboration and Knowledge Management

Due to falling costs, high-availability data connections, smart mobile devices, and the growth of cloud computing, knowledge management and enterprise collaboration are undergoing something of a rebirth. Most enterprise collaboration remains centered on documents. There is an opportunity to extend beyond document-centric tasks to other collaborative, linear, process-oriented work. Simple-to-use video instant messaging (VIM) technology may be very useful in this much-needed shift.

Today, most companies use video for:
  • team conferencing;
  • customer interaction/service support;
  • pre-recorded corporate communications;
  • an alternative to text for training/education.

Beyond those areas, VIM functionality is just starting to arrive in companies. There are different opinions about VIM. Some like it and feel it helps to create a better "human connection," while others don't like it and claim that it invades their privacy. In fact, the most common arguments in favor of and against the use of video are surprisingly similar to one another and suggest a cultural split that can be difficult to bridge or even manage.

Video instant messaging, as its name makes clear, is potentially the most invasive use of video in companies. For example, it can conjure up images of unexpected video calls on an active laptop coming from an angry boss, with the recipient of the call being unprepared. One-to-one, face-to-face discussions are the best use of such real-time technology. Even so, there are valuable ways of using the technology.

What Is Video Conferencing?

Video conferencing is when multiple people, typically four or more, in more than one geographic location use audio and video technology to connect via a virtual conference room environment to conduct a pre-arranged meeting.

Web conferencing (or webinar) is where multiple users watch a single remote screen; this method is typically used for one-to-many presentations.

Video instant messaging (VIM) is distinct from the other two in that the interactions are typically one-to-one and are also most typically not pre-arranged, but rather ad hoc.

Opportunities

VIM is not particularly new, but its use (and misuse) has until recently been largely confined to the consumer world. Some knowledge management vendors such as Citrix, TIBCO and Teambox are adding VIM to their product suites because VIM functionality is particularly relevant in the mobile era: almost all handheld devices contain cameras. This is key to understanding the value of VIM, since it's not so much the video as the mobile camera that has much potential.

Future use cases for VIM technology in knowledge management will be situations where a physical object or environment needs to be collaboratively viewed. For example, in a service situation, a technician is looking at a complex wiring structure and using a mobile device to stream video of the situation to a remote expert, enabling them to share visuals to resolve the situation. Similarly, designers and creative people can discuss and view samples and compare options in real time. In situations like these, the ability to share a clear, real-time moving image of the real environment via a handheld device can be invaluable and a highly efficient use of collaboration technology.

Pros and Cons of Using Video

Pros
  • Seeing the other attendees builds a deeper connection.
  • It reduces travel costs and time on the road.
  • Meetings can be recorded.
  • Processes can be more efficient.
  • Collaboration can be enhanced.
Cons
  • People are self-conscious on camera.
  • Audio/video quality varies.
  • A poor or lost connection means no conference at all.
  • It is not as good as face-to-face meetings.
  • Signal delays can ruin the meeting.
New Horizons

For traditional users, there is a somewhat parallel use case in that they often want to share a document, drawing or whiteboard in real time with a colleague. It is about seeing the same thing and collaborating on it. VIM can have a valuable role in supporting collaborative processes in areas like law enforcement, healthcare, and engineering. The established users of video are people who create a lot of text-based communications and occasionally need to see one another from their home offices.

Moving from a fixed camera position to a fully mobile situation opens the technology to many more uses. A video stream can be used for all the key management and productivity functions, such as calendaring, expertise location, file access, editing and viewing, knowledge sharing, and team and project management.

Three Priority Objectives

On a more tactical level, three objectives should guide the use of VIM.

First, focus on tying collaboration and knowledge sharing back to clearly defined business processes that involve human interaction. Take existing technology and configure it to meet your organization's needs, and also use VIM to engage with work objects and environments, rather than simply using it as a face-to-face conversation tool.

Second, in an organizational setting, communication can't simply stop when someone hangs up. Whether with video or text or voice, the trail needs to be kept traceable, and the files and data created should be contextualized and made relevant. Documents need to be retained intact until a decision has been made to remove or destroy them, depending on governance, housekeeping and compliance requirements. This option should be available for video and other forms of interaction. The information lifecycle needs to be considered.

Third, if VIM is to be used for collaboration, then still-image functionality that freezes and captures views in HD should be provided within the VIM frame. This enables a detailed visual examination of a specific point in time, and also creates a file of record. Shifting between moving and still images in a collaborative engagement, allowing a free-flowing interaction and the capture of specific elements, would be of great value to many industry and process-specific collaborative situations.

Future

VIM has a key role to play in taking enterprise collaboration to the next level. This role is not so much about being able to see the face of the other person as it is about viewing and capturing in real time what the other party is seeing. Refining the use cases and collaborative processes that leverage VIM will take some effort. Over time, VIM will prove its value in the growth of enterprise collaboration. It will be especially useful for industries such as healthcare, law enforcement, engineering, and maintenance. Enterprise collaboration tools and suites already have many of the key pieces of the puzzle; now they need to put them together into a coherent, practical whole.

Tuesday, March 25, 2014

Search Applications - Vivisimo

Vivisimo was a privately held technology company that worked on the development of computer search engines. The company's product, Velocity, provided federated search and document clustering. Vivisimo's public web search engine Clusty was a metasearch engine with document clustering; it was sold to Yippy, Inc. in 2010.

The company was acquired by IBM in 2012, and the Vivisimo Velocity Platform is now IBM InfoSphere Data Explorer. It stays true to its heritage of providing federated navigation, discovery and search over a broad range of enterprise content. It covers a broad range of data sources and types, both inside and outside an organization.

In addition to the core indexing, discovery, navigation and search engine, the software includes a framework for developing information-rich applications that deliver a comprehensive, contextually-relevant view of any topic for business users, data scientists, and a variety of targeted business functions.

InfoSphere Data Explorer solutions improve return on all types of information, including structured data in databases and data warehouses, unstructured content such as documents and web pages, and semi-structured information such as XML.

InfoSphere Data Explorer provides analytics on text and metadata that can be accessed through its search capabilities. Its focus on scalable but secure search is part of why it became one of the leaders in enterprise search. The software’s security features are critical, as organizations do not want to make it faster for unauthorized users to access information.

Also key is the platform’s flexibility at integrating sources across the enterprise. It also supports mobile technologies such as smart phones to make it simpler to get to and access information from any platform.

Features and benefits

1. Secure, federated discovery, navigation and search over a broad range of applications, data sources and data formats.
  • Provides access to data stored in a wide variety of applications and data sources, both inside and outside the enterprise, including: content management, customer relationship management, supply chain management, email, relational database management systems, web pages, networked file systems, data warehouses, Hadoop-based data stores, columnar databases, cloud and external web services.
  • Includes federated access to non-indexed systems such as premium information services, supplier or partner portals and legacy applications through the InfoSphere Data Explorer Query Routing feature.
  • Relevance model accommodates diverse document sizes and formats while delivering more consistent search and navigation results. Relevance parameters can be tuned by the system administrator.
  • Security framework provides user authentication and observes and enforces the access permissions of each item at the document, section, row and field level to ensure that users can only view information they are authorized to view in the source systems.
  • Provides rich analytics and natural language processing capabilities such as clustering, categorization, entity and metadata extraction, faceted navigation, conceptual search, name matching and document de-duplication.
2. Rapid development and deployment framework to enable creation of information-rich applications that deliver a comprehensive view of any topic.
  • InfoSphere Data Explorer Application Builder enables rapid deployment of information-centric applications that combine information and analytics from multiple sources for a comprehensive, contextually-relevant view of any topic, such as a customer, product or physical asset.
  • Widget-based framework enables users to select the information sources and create a personalized view of information needed to perform their jobs.
  • Entity pages enable presentation of information and analytics about people, customers, products and any other topic or entity from multiple sources in a single view.
  • Activity Feed enables users to "follow" any topics such as a person, company or subject and receive the most current information, as well as post comments and view comments posted by other users.
  • Comprehensive set of Application Programming Interfaces (APIs) enables programmatic access to key capabilities as well as rapid application development and deployment options.
3. Distributed, highly scalable architecture to support large-scale deployments and big data projects.
  • Compact, position-based index structure includes features such as rapid refresh, real-time searching and field-level updates.
  • Updates can be written to indices without taking them offline or re-writing the entire index, and are instantly available for searching.
  • Provides highly elastic, fault-tolerant, vertical and horizontal scalability, master-master replication and “shared nothing” deployment.
4. Flexible data fusion capabilities to enable presentation of information from multiple sources.
  • Information from multiple sources can be combined into “virtual documents.”
  • Large documents can be automatically divided into separate objects or sub-documents that remain related to a master document for easier navigation and comprehension by users.
  • Enables creation of dynamic "entity pages" that allow users to browse a comprehensive, 360-degree view of a customer, product or other item.
5. Collaboration features to support information-sharing and improved re-use of information throughout the organization.
  • Users can tag, rate and comment on information.
  • Tags, comments and ratings can be used in searching, navigation and relevance ranking to help users find the most relevant and important information.
  • Users can create virtual folders to organize content for future use and optionally share folders with other users.
  • Navigation and search results can return pointers to people to enable location of expertise within an organization and encourage collaboration.
  • Shared Spaces allow users to collaborate about items and topics that appear in their individualized views.
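
The first feature above, federated search that respects each item's access permissions, can be illustrated with a small sketch. This is not InfoSphere Data Explorer's API; the `Hit` class, the `federated_search` function and the group-based permission model are hypothetical names and assumptions chosen only to show the idea of merging results from several sources while hiding items the user is not authorized to see.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One search result from one source (hypothetical structure)."""
    source: str
    doc_id: str
    score: float
    allowed_groups: frozenset  # groups permitted to view this item


def federated_search(results_by_source, user_groups):
    """Merge per-source hits, dropping any the user may not view."""
    visible = [
        hit
        for hits in results_by_source.values()
        for hit in hits
        # Keep a hit only if the user shares at least one group with it.
        if hit.allowed_groups & user_groups
    ]
    # Highest score first, regardless of which source the hit came from.
    return sorted(visible, key=lambda hit: hit.score, reverse=True)


results = {
    "crm": [Hit("crm", "c1", 0.9, frozenset({"sales"}))],
    "wiki": [
        Hit("wiki", "w1", 0.7, frozenset({"sales", "support"})),
        Hit("wiki", "w2", 0.95, frozenset({"hr"})),
    ],
}
for hit in federated_search(results, frozenset({"sales"})):
    print(hit.source, hit.doc_id, hit.score)
```

Filtering before ranking means an unauthorized item never influences what the user sees, which matches the stated goal that users can only view information they are authorized to view in the source systems.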

Thursday, March 13, 2014

Compliance With Privacy Regulations

Recently, high-profile cases involving breaches of privacy revealed the ongoing need to ensure that personal information is properly protected. The issue is multidimensional, involving regulations, corporate policies, reputation concerns, and technology development.

Organizations often have an uneasy truce with privacy regulations, viewing them as an obstacle to the free use of information that might help the organization in some way.

But like many compliance and governance issues, managing privacy will offer benefits, protecting organizations from breaches that violate laws and damage an organization's reputation. Sometimes the biggest risks in privacy compliance arise from the failure to take some basic steps. A holistic view is beneficial.

Privacy Compliance Components

Rather than being in conflict with business objectives, privacy should be fully integrated with them. Privacy management should be part of a knowledge management program.

An effective privacy management program has three major components: establish clear policies and procedures, follow those procedures to make sure that the organization's operations comply with the policies, and provide oversight to ensure accountability. Examples of questions to consider: is data being shared with third parties, why is the information being collected, and what is being done with it?

Expertise about privacy compliance varies widely across industries, corresponding to some degree with the size of an organization. Although large companies are far from immune to privacy violations, they might at least be aware and knowledgeable about the issue.

The biggest mistake that organizations make in handling privacy is to collect data without a clear purpose. You should know not just how you are protecting personal information but also why you are collecting it. It is important for organizations to identify and properly classify all their data.

International Considerations

Increasingly, organizations must consider the different regulations that apply in countries throughout the world, as well as the fact that the regulations are changing. For example, on March 12, 2014, the Australian Privacy Principles (APPs) replaced the existing National Privacy Principles and Information Privacy Principles.

The new principles will apply to all organizations, whether public or private, and contain a variety of requirements including open and transparent management of personal information. Of particular relevance to global companies are principles on the use and disclosure of personal information for direct marketing, and cross-border disclosure of personal information.

It is important to consider international regulations in those countries where an organization has operations.

Technology Role

The market for privacy management software products is still relatively small, but it is expected to grow rapidly over the coming years. The current reform process for data protection has created a need for privacy management technology.

Products from companies such as Compliance 360 automate the process of testing the risk for data breaches, which is required for the audits mandated by the American Recovery and Reinvestment Act of 2009, the economic stimulus legislation. This act expanded the Health Insurance Portability and Accountability Act (HIPAA) of 1996 requirements through its Health Information Technology for Economic and Clinical Health (HITECH) provisions.

These provisions include increased requirements for patient confidentiality and new levels of enforcement and penalties. In the absence of suitable software products, organizations must carry out the required internal audits and other processes manually, which is time consuming and subject to errors.

Enterprise content management (ECM), business process management (BPM) and business intelligence (BI) technologies have an important role in privacy compliance because content, processes, and reporting are critical aspects of managing sensitive information.

As generic platforms, they can be customized, which has both advantages and disadvantages. They have a broad reach throughout the enterprise, and can be used for many applications beyond privacy compliance. However, they are generally higher priced and require development to allow them to perform that function.

Privacy in the Cloud

Cloud applications and data storage have raised concerns about security in general, and personally identifiable information (PII) in particular. Although many customers of cloud services have concluded that cloud security is as good as or better than the security they provide in-house, the idea that personally identifiable information could be "out there" is unsettling.

PerspecSys offers a solution for handling sensitive data used in cloud-based applications that allows storage in the cloud while filtering out personal information and replacing it with an indecipherable token or encrypted value.

The sensitive data is replaced by a token or encrypted value that takes its place in the cloud-based application. The "real" data is retrieved from local storage when the token or encrypted value is retrieved from the cloud. Thus, even though the application is in the cloud, the sensitive information is neither stored in the cloud nor viewable there. It physically resides behind the firewall and can only be seen from there.
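
The token-substitution idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not PerspecSys's implementation: `TokenVault`, `to_cloud` and `from_cloud` are hypothetical names, the vault is an in-memory dictionary standing in for local storage behind the firewall, and tokens are random strings with a made-up `tok_` prefix.

```python
import secrets


class TokenVault:
    """Hypothetical local store mapping opaque tokens to real values."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value):
        # Reuse the existing token so one value always maps to one token.
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)  # random, reveals nothing
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token):
        # Strings that were never tokenized are returned unchanged.
        return self._token_to_value.get(token, token)


def to_cloud(record, sensitive_fields, vault):
    """Return a copy of the record with sensitive fields tokenized."""
    return {
        key: vault.tokenize(value) if key in sensitive_fields else value
        for key, value in record.items()
    }


def from_cloud(record, sensitive_fields, vault):
    """Restore the real values; only possible where the vault lives."""
    return {
        key: vault.detokenize(value) if key in sensitive_fields else value
        for key, value in record.items()
    }


vault = TokenVault()
original = {"name": "Jane Doe", "plan": "gold"}
stored_in_cloud = to_cloud(original, {"name"}, vault)
print(stored_in_cloud)                                # name is a token
print(from_cloud(stored_in_cloud, {"name"}, vault))   # name is restored
```

Because the mapping lives only in the vault, the cloud application can store and return the record without ever holding the real name.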

This feature is especially useful in an international context where data residency and sovereignty requirements often specify that data needs to stay within a specific geographic area.

Challenges for Small Organizations

Small to medium-sized organizations generally do not have a dedicated compliance or privacy officer, and may be at a loss as to where to start.

Information Shield provides a set of best practices including a policy library with prewritten policies, detailed information on U.S. and international privacy laws, checklists and templates, as well as a discussion of the Organization for Economic Co-operation and Development (OECD) Fair Information Principles. Those resources are aimed at companies that may not have privacy policies in place but need to do so to provide services to larger healthcare or financial services organizations.

Among the resources is a list of core privacy principles based on OECD principles. Each principle has a question, brief discussion and suggested policy. For example, the purpose specification principle states, "The purposes for which personal information is collected should be specified no later than the time of data collection, and the subsequent use should be limited to fulfilling those purposes or such others that are specified to the individuals at the time of the change of purpose." The discussion includes comments on international laws and a citation of several related rulings.

Plans for Future

Business users and consumers alike have become accustomed to the efficiency and speed of digital data. However, stricter regulations are inevitable. Organizations should become more aware of the need to prevent privacy breaches, and make sure they have the systems in place to do this. Companies should also be concerned about reputation damage, which can severely affect business. Along with reliable technology, the best way forward is to follow best practices with respect to data privacy. Technology is essential, but it also has to be supported by people and processes.

Tuesday, February 25, 2014

Unified Knowledge Management

This scenario might be familiar to many organizations.

Inside an organization, valuable information is not being used. It is scattered in pieces across multiple repositories and organizational silos where no one even bothers to look for it. Valuable content also resides outside your organization: in social media, communities, etc., created by your customers and industry experts, which is used and shared by other customers when they need answers.

In many organizations, employees spend a significant amount of time trying to find and process information, often at a high cost. A recent report found that knowledge workers spend anywhere from 15% to 35% of their time searching for, assembling, and then (unfortunately) recreating information that already exists. And studies show that much of this time is spent not only looking for content, but also looking for experts. Most companies are unable to reuse the majority of work that is created every day.

This is the growing challenge of knowledge management today: how to leverage meaningful knowledge through constant reuse by each and every employee and each and every customer when they need it, no matter where it resides.

Return on Knowledge

Here are a few points to consider:
  • Data on its own is meaningless. It must be organized into information before it can be used.
  • Data is factual information: measurements, statistics or facts. In and of itself, data provides limited value.
  • Information is data in context: organized, categorized or condensed.
  • Knowledge is a human capability to process information to make decisions and take action.
Knowledge keeps organizations competitive and innovative, and is the most valuable intangible asset. Yet, knowledge is one of the most difficult assets to generate a return on (with repeated access, use and re-use), simply because information is so widespread, fractured, and changing at an accelerated pace.

Connecting the dots between relevant content and associated experts on that content is critical to leveraging the collective knowledge of an organization's ecosystem for the greatest return.

How to Get a Higher Return on Knowledge

The key to a higher return on knowledge is accessibility to information from anywhere, presented within any system, and personalized for the user's context.

The following tips can help your organization improve the return on its investment in managing knowledge.

1. Consolidate the knowledge ecosystem. Bring together information from enterprise systems and data sources, employees and customer social networks, social media such as Twitter, Chatter and more. Connect overwhelming amounts of enterprise and social information to get a complete picture of your customers, their interaction histories, products, levels of satisfaction, etc.

2. Connect people to knowledge in context. Connect users to the information they need (no matter where it resides) within their context.

3. Connect people to experts in context. Connect the people (the experts) associated with the contextually relevant content to assist in solving a case, answer a key challenge or provide additional insight to a particular situation.

4. Empower contribution. Allow users to create, rate content, and share knowledge about customers, cases, products, etc.

5. Personalize information access. Present employees and customers with the information and people connections that are relevant, no matter where they are, and no matter what they are working on. Just like the suggested items on the e-commerce websites you visit, the experience is personalized, because the system knows what you are working on.

Bringing this content to the fingertips of your employees and customers will increase organizational productivity, result in more innovative and customer-pleasing products, create happy employees, and drive customer satisfaction as well as profitability.

Unified Indexing

Unified indexing and insight technology is the way that forward-thinking companies will access knowledge in the 21st century. The technology brings content into context: assembling fragments of structured and unstructured information on demand and presenting them, in context, to users.

Designed for the enterprise, unified indexing and insight technology works in a similar way to Google on the Internet, but on the heterogeneous systems (e.g. email, databases, CRM, ERP, social media, etc.), locations (cloud and on-premise), and varied data formats of business today. The technology securely crawls those sources, unifies the information in a central index, normalizes the information and performs mash-ups on demand, within the user's context. The user creates the context based on his or her needs and interests.
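
As a rough illustration of what "unifies the information in a central index" can mean, the sketch below builds one small inverted index over documents that come from different sources. It is a toy under stated assumptions, not any vendor's product: `UnifiedIndex`, `normalize` and the sample source names are hypothetical, crawling and security are left out, and normalization is reduced to lowercasing and splitting on non-alphanumeric characters.

```python
from collections import defaultdict


def normalize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


class UnifiedIndex:
    """One inverted index shared by every source system."""

    def __init__(self):
        # term -> set of (source, doc_id) pairs containing that term
        self._postings = defaultdict(set)

    def add(self, source, doc_id, text):
        for term in normalize(text):
            self._postings[term].add((source, doc_id))

    def search(self, query):
        """Return documents, from any source, containing every query term."""
        terms = normalize(query)
        if not terms:
            return []
        hits = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            hits &= self._postings.get(term, set())
        return sorted(hits)


index = UnifiedIndex()
index.add("email", "m1", "Printer error E42 reported by customer")
index.add("crm", "c7", "Customer called about printer error")
index.add("wiki", "w3", "Holiday schedule")
print(index.search("printer error"))
```

A single query returns matches from both the email and CRM sources, which is the practical difference from searching each system separately.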

Advantages of Unified Indexing:
  • Customers will see a personalized and relevant view of information from the entire knowledge ecosystem (from inside or outside your company) intuitively presented so they can solve their own challenges.
  • Service and support agents can solve cases faster. Support agents no longer need to search across multiple systems or waste time trying to find the right answer or someone who knows the answer. They will have relevant information about the customer or case at hand, right at their fingertips: suggested solutions, recommended knowledge base articles, similar cases, experts who can help, virtual communication timelines and more.
  • Knowledge workers can stop reinventing the wheel. When every employee can access relevant information, locate experts across the enterprise, and know what does and does not exist, they can finally stop reinventing the wheel.
The new age of knowledge is here and it is powered by instantly accessible, collective, crowd-sourced and contextually relevant information that comes from everywhere and is presented as knowledge workers go about their work and customers look for information they need.

Friday, January 31, 2014

Unified Data Strategy

The amount of data being created, captured, and managed worldwide is increasing at a rate that was inconceivable a few years ago. Data is a collection of discrete units of information that, like the stars in the night sky, form an organized structure when taken together.

Unstructured data comes in many different formats, including pictures, videos, audio, PDF files, spreadsheets, documents, and email.

Sometimes unstructured data lives within a database. Sometimes the database acts as an index for the unstructured data. Often the metadata (information about the data) associated with the unstructured data is larger than the data itself. Consider the example of a set of videos. Although the files may be small in size, the information stored regarding the content within a particular video may be very big. Often unstructured data is also called big data.

Certain business functions require analysis of massive amounts of data.

Multiple systems are being utilized to manage different forms of disparate data. Companies need to adopt a comprehensive and holistic approach to managing these many systems and incorporating them into a combined system.

Modern IT systems should be able to ingest, access, store, manipulate and protect data in a wide variety of disparate formats. Handling so many data formats can come at the expense of the flexibility, elasticity and alacrity that many modern business functions require. There are situations in which data must be accessed very quickly, and data management systems should be able to accommodate them. Each of these systems recognizes a particular style of data with a fairly well-defined set of attributes and manages that data to satisfy a particular business function.

A Unified Data Strategy (UDS) is a broad concept that describes how massive amounts of data in a multitude of forms can and should be understood and managed. UDS is also a specific individualized methodology developed by each data owner to manage that data in all its forms in a comprehensive but interrelated manner.

By adopting a UDS, data owners will be able to develop comprehensive, customized methodologies to manage their data. By taking into account the interconnected nature of the various sources of data and tailoring the management of that data to the specific business requirements, the maximum value can be achieved.

UDS can be used to address the task of comprehensive data management. Cloud computing may provide the solution to this data management and recognition problem. Virtualization, the foundation of cloud computing, is the cornerstone of this strategy. The capabilities and architecture enabled via a virtual/cloud infrastructure can help companies to develop a UDS to address the movement in data management and practice.

Exciting new technologies and methodologies are evolving to address this phenomenon of science and culture creating huge new opportunities. These new technologies are also fundamentally changing the way we look at and use data.

The rush to monetize big data makes various solutions appealing. But companies should perform proper due diligence to fully understand the current state of their data management systems. Companies must learn to recognize disparate and seemingly extraneous forms of information as data and develop a plan to manage and utilize all their data assets as a single, more powerful whole.

The transition from traditional relationally-structured data to a UDS could be complicated, but can be navigated effectively with an organized and managed approach to this effort.

To successfully adopt a Unified Data Strategy, companies should focus on the following:

1. Develop a thorough understanding of how the business consumes, produces, manipulates and uses information of all types.

2. Determine how the business can use data to both understand external factors and to assist in making internal decisions, as well as to understand how the data itself is relevant to influencing the business.

3. Analyze the "personality" of each data form so that it can be matched with tools that appropriately acquire, filter, store, safeguard and disperse the data into useful information.

4. Select infrastructure and tools that automate or eliminate traditional high-cost tasks such as import, provisioning, scalability, and disaster tolerance. A highly virtualized infrastructure with complementary tools should provide the majority of these capabilities.

5. Commit to learning what is an entirely new approach to technology, and to adopting it in risk-appropriate increments.

Any organization with a significant data infrastructure should be aware of the pitfalls that could occur if a company rushes into acquiring new technologies without understanding their requirements. Thorough analysis will lead to an understanding of the current state of their data management systems, and subsequently to better control of their existing data.

Ultimately, organizations should be able to recognize, manage, and utilize new forms of disparate and seemingly extraneous information as data. Companies that develop a plan to comprehensively address all their issues around managing and utilizing all useful data will gain significant strategic advantages.