Galaxy Consulting Blog

Sunday, April 30, 2017

E-Discovery Tools

Electronic discovery, or e-discovery, refers to discovery in legal proceedings such as litigation or government investigations where the information sought is in electronic format. The ever-increasing amount of litigation, greater volumes of data and a move toward in-house e-discovery capabilities all call for strong e-discovery tools.

Data is scattered throughout companies and has become progressively more difficult to manage. Companies are dealing with big data, data in shared repositories such as Box.com, data on mobile devices, etc.

Data must be protected during e-discovery just as it is when it is part of any other business activity. The degree of security risk depends on the nature of the data. Standard business contracts might not be highly sensitive and thus create minimal risk, but exposure of intellectual property that represents the crown jewels of a company could be a major risk.

When legal hold is used effectively, companies can meet their preservation duties and then do targeted collections as needed in the case. A good hold process plus targeted collections can significantly reduce the amount of information that attorneys must review, and that review accounts for 70 percent of e-discovery costs.

Another value proposition in using an automated legal hold solution that is integrated with collections and first-pass review is the ability to re-purpose a collection.

Cloud offerings could be used to centralize all this data in one place for efficient reuse and risk management.

Several trends are contributing to strong growth in e-discovery tools. In addition to a group of large e-discovery vendors, many smaller vendors have products that are working well for their customers, and there is also room for new entrants that improve performance or address specific needs.

Each product has particular strengths, and that wide array offers options that can be used very selectively or in conjunction with each other to meet a company’s goals.

Legal holds are required when a company might reasonably expect litigation and therefore should not delete information that might be relevant to it.

Legal Hold Pro

This application has templates for the system and a database with the contact information for employees who are custodians of data. The system can also be used to track the information and people affected, automate the interviews with custodians, send reminders and release holds when appropriate. It allows users to check the information of terminated employees to see if it might be subject to hold, and to review responses from custodians to create the collection plan.

The same collection and review tagging can be used again by adding only the incremental data generated since the original collection.

As a cloud product, Legal Hold Pro is quick and easy to launch, and is updated frequently.

Technology-assisted review (TAR)

Once a set of documents is located that may be responsive to the e-discovery request, it needs to be searched. The effective use of human skills in conjunction with computer capabilities is a key ingredient in reducing the volume of data that needs to be reviewed by attorneys or other legal professionals.

Technology-assisted review (TAR), also called predictive coding, is a method for training a computer to spot documents that may be relevant and distinguish them from those that are not.

Catalyst

Catalyst provides e-discovery software and services.

Catalyst Insight is a secure cloud-based platform where clients can search, review, mark and produce documents. It can be augmented with Insight Predict, a predictive ranking TAR 2.0 solution that uses continuous active learning (CAL) to speed the review process by allowing technology to work alongside the judgments that human reviewers make. The solution brings the most relevant documents to the top of the list rather than working in a linear fashion.

The company’s TAR 2.0 software is specially designed for e-discovery. Some of the early TAR products were re-purposed machine learning tools. They can work in situations where the target documents are a large proportion of the total, but if you are looking for the one percent that are ‘hot docs,’ then they are not as effective. With TAR 2.0, attorneys and legal professionals who are subject matter experts do the initial coding for relevancy. Each of their judgments about the relevancy of a document is fed back to the system as a means of “training” to identify others that also might be relevant.

In the case of earlier versions of TAR, adding new documents caused the random sampling assumptions to no longer be correct. Unlike earlier products, which had a finite learning phase and then a production phase, TAR 2.0 allows new coding to be immediately incorporated into the algorithm for searching the document repository so that it is correctly tuned to the current problem domain.

It allows every decision made by an attorney to be put to maximum use, letting humans do what they do best and then letting the computer do what it does best, which is to quickly surface the relevant documents.

One practical limitation of early versions of TAR was that they could not handle small volumes of documents, because the usual percentage of samples did not provide enough examples from which the computer could learn. This improved in later versions of the tool.
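As a rough illustration of the feedback loop described above, the following Python sketch ranks the unreviewed documents, asks a reviewer to code the top-ranked one, and folds that judgment back into the ranking right away. It is not Catalyst's algorithm. The TermScorer class, its simple word-count weighting, the function names and the sample documents are all assumptions made for illustration only.

```python
from collections import Counter


def tokenize(text):
    """Split text into lowercase words."""
    return text.lower().split()


class TermScorer:
    """Toy relevance model (an assumption, not a real TAR engine):
    words seen in relevant documents gain weight, others lose weight."""

    def __init__(self):
        self.weights = Counter()

    def update(self, text, relevant):
        delta = 1 if relevant else -1
        for term in set(tokenize(text)):
            self.weights[term] += delta

    def score(self, text):
        return sum(self.weights[term] for term in set(tokenize(text)))


def continuous_active_learning(docs, reviewer, seed_id, budget):
    """Review up to `budget` documents, always taking the current top-ranked one.

    docs: dict of document id -> text
    reviewer: callable returning True if the document id is relevant
    """
    model = TermScorer()
    unreviewed = dict(docs)
    review_order = []
    next_id = seed_id
    for _ in range(min(budget, len(docs))):
        text = unreviewed.pop(next_id)
        label = reviewer(next_id)        # human judgment
        model.update(text, label)        # fed back immediately
        review_order.append((next_id, label))
        if not unreviewed:
            break
        # Re-rank everything still unreviewed with the updated model.
        next_id = max(unreviewed, key=lambda d: model.score(unreviewed[d]))
    return review_order


# Hypothetical sample data.
docs = {
    "a": "merger pricing memo confidential",
    "b": "lunch menu friday",
    "c": "pricing agreement merger draft",
    "d": "holiday party lunch",
    "e": "confidential merger term sheet",
}
relevant_ids = {"a", "c", "e"}
order = continuous_active_learning(
    docs, lambda doc_id: doc_id in relevant_ids, seed_id="a", budget=3
)
print(order)
```

With this sample, one relevant seed document is enough to pull the other two relevant documents to the front of the review queue ahead of the unrelated ones. A production system would use a far stronger model, but the loop has the same shape.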

Recommind

In 2006, the federal rules for discovery changed to include discovery of electronic information. E-discovery includes the collection, processing and analysis of e-mail and other electronic documents that might be relevant to a case, including determination of whether the documents are indeed relevant.

What sets Recommind apart from many industry solutions is the ability to prioritize records and pull together similar records.

Recommind’s Axcelerate product can research, collate and assemble electronic records into reports. The electronic records for a single case can sometimes number into the millions.

Axcelerate’s adaptive batching expedites the feedback loop on search or analytics-based document sets, making continued batching not just automatic, but also conditional on the relevancy found through sampling. That enables a law firm to determine by batch if certain records are indeed relevant to a case, rather than reviewing them individually.
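The idea of making further batching conditional on sampled relevancy can be sketched in a few lines of Python. This is only an illustration of the general idea, not Recommind's implementation. The function names, the batch size, the sample size and the 50 percent cutoff are all assumed values.

```python
import random


def sample_relevancy(batch, is_relevant, sample_size, rng):
    """Estimate the share of relevant documents in a batch from a random sample."""
    sample = rng.sample(batch, min(sample_size, len(batch)))
    return sum(1 for doc in sample if is_relevant(doc)) / len(sample)


def adaptive_batches(ranked_docs, is_relevant, batch_size=4, sample_size=2,
                     min_rate=0.5, seed=0):
    """Keep releasing batches for review while the sampled relevancy stays
    at or above min_rate; stop at the first batch that falls below it."""
    rng = random.Random(seed)
    accepted = []
    for start in range(0, len(ranked_docs), batch_size):
        batch = ranked_docs[start:start + batch_size]
        if sample_relevancy(batch, is_relevant, sample_size, rng) < min_rate:
            break
        accepted.append(batch)
    return accepted


# Hypothetical ranked list: ids starting with "r" are relevant, "n" are not.
ranked = ["r1", "r2", "r3", "r4", "r5", "r6", "n1", "n2", "n3", "n4", "n5", "r7"]
# sample_size equals batch_size here so the demo does not depend on the random draw.
accepted = adaptive_batches(ranked, lambda doc: doc.startswith("r"),
                            batch_size=4, sample_size=4)
print(accepted)
```

In this sample the first two batches pass the cutoff and the third does not, so review stops there instead of continuing through the whole list.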

Magnum Software

Opus 2 Magnum allows users to quickly search, annotate and link to portions of documents. The collaboration capability is quite robust. Users can share their work product with any other users or groups of users via a one-click e-mail alert.

The alert automatically includes a direct link to the note and passage so the recipient can log in from anywhere, review the remarks and continue the discussion thread within Opus 2 Magnum. Additionally, multiple users can “chat” within the application.

The application works much better with smaller files than when everything is loaded into one large database, but Magnum can also scale to handle larger files.

Exterro

This is an excellent tool for e-discovery. It addresses e-discovery and other records management needs in a single platform. A Genome data mapping module can be added to provide a strong data mapping solution.

With the increasing number of records and the need to keep track of them and pull them together efficiently, the demand for KM technology for records and information management will continue to grow.

Galaxy Consulting has 17 years of experience in ensuring that the e-discovery process goes smoothly.

Sunday, January 22, 2017

Five Trends of Knowledge Management

Many issues affect knowledge management. The five most important are big data, cybersecurity, mobility, social analytics, and customer engagement.

The availability of big data has opened many options for understanding everything from customer preferences to medical outcomes.

Amidst all that data, concerns about security have grown, so cybersecurity is taking on new importance. Mobility has become pervasive and affects nearly every element in KM, while social analytics is providing insights at a personal level that were never possible before.

Finally, although those four factors feed into many KM objectives, enhancing customer engagement has taken a place at the top of the priority list for virtually every company and is likely to remain there for some time.

Big data

The most dramatic trend impacting knowledge management is harvesting and analyzing big data. An esoteric phenomenon just a few years ago with a new set of technologies and terminology, big data is now wrapped into the strategic plans of many organizations, and not just the big ones.

There are a few applications to help with this challenge.

One of them is Hadoop. It can help to integrate complex sets of data to support business decisions and marketing efforts.

Actian Analytics Platform is a big data analytics solution that is accessible and affordable for small businesses, but also scalable to large ones. It can be used to target the right customers. It can also be used to generate an economic case for potential buyers.

For example, Yahoo uses Actian to segment millions of users across 10,000 variables, looking for clues that will help predict customer behavior. Amazon uses Actian to provide the core technology components for its cloud-based data warehouse.

The technology can pull together diverse data in near real time as it flows through the data pipeline, supporting marketing, customer engagement, risk assessment and many other applications. At both ends of the spectrum, from startups to large-scale users, big data is the central force in converting large amounts of data to decision-supporting information.

Cybersecurity

With so much information at large, unauthorized access to it has the potential to be destructive. Knowledge management is focused on information. What makes KM so important is that people can get information and analyze it better. In the past, it was hard to find out who was buying products and how they felt about them. Now an enormous amount of information is available, which has benefits, but that information can also be stolen and used for financial gain.

The cybersecurity market is expected to increase from $95.6 billion in 2014 to $155.7 billion by 2019, an increase of 10.3% per year over that period. This amount includes network, endpoint, application, content and wireless security as well as many other types of technology. Innovative products are emerging in response to increased threats.
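For readers who want to check a growth figure like this one, the yearly rate implied by two endpoint values is the compound annual growth rate. The Python sketch below shows the arithmetic. The function name is an assumption, and the inputs are the two market sizes and the five-year span quoted above.

```python
def compound_annual_growth_rate(start_value, end_value, years):
    """Constant yearly growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1 / years) - 1


# Market size in billions of dollars, 2014 to 2019.
rate = compound_annual_growth_rate(95.6, 155.7, 5)
print(f"{rate:.2%}")
```

The result should land close to the quoted yearly figure. A small difference can come from rounding in the published endpoint values.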

The volume of data, including an entire new collection from the Internet of Things, the challenges of mobile devices, greater use of the cloud for data storage and the broad impact of consumer concern are all sparking the growth.

Cybercrime comes in many forms, from stealing credit card numbers out of a merchant’s database to identity theft of consumers. A common strategy is for a cyberthief to obtain some publicly available information about an individual and use it to open an account or figure out a password that provides them access to an account. Users need to be vigilant about changing their passwords and making them strong. Technological safeguards can be put into place, but security depends a great deal on the human effort.

Mobile devices add another element of risk. They are much easier to lose or to steal, and often contain sensitive information such as bank passwords. Technological advances such as the ability to remotely disable a phone will continue to emerge to protect users from the impact of cybertheft. However, the result of users being careless with physical security, such as leaving a laptop in an unlocked car, remains a threat.

Companies can mitigate the impact on their customers by limiting the responsibility of users in the event of fraud or identity theft. Industries are growing up around providing insurance for such scenarios, either to the merchant or the customer.

Mobility

Although mobility brings hazards, it has brought even more advantages, and it will continue to drive the pervasiveness of knowledge management. Increasingly, knowledge management solutions, including content management, process management and analytics, come with mobile versions. No longer a miniaturized version of the desktop browser, mobile apps are delivering usable KM applications.

Mobility is also forging new paths. For example, Apple Pay allows use of the smartphone as a wallet.

One mistake merchants make in designing mobile apps is to try to duplicate a physical purchase experience on a mobile device. Merchants should not necessarily automate an existing process, but instead should look at the experience holistically. Mobile experiences have to be simpler and as good as, if not better than, the non-mobile experience in order to gain loyalty from the customer.

Barriers remain in the use of mobile devices for enterprise applications, but the barriers also represent opportunities. In a study of U.S. and U.K. information technology decision makers conducted by Vanson Bourne, respondents reported that although more than 400 enterprise applications were typically deployed in each organization, only 22% of them could be easily accessed on mobile devices.

One reason for that is the diversity of enterprise applications. Some are custom, some are SaaS and some are off-the-shelf, and the technology for accessing each one is different. Therefore, development of mobile apps for such applications is needed, but organizations are hampered by the high cost. More efficient development techniques would be a big benefit.

The proliferation of mobile devices has also spurred growth in a number of supporting sectors besides mobile application management (MAM), including mobile content management (MCM) and mobile device management (MDM). Each of them has a touchpoint to knowledge management and should be viewed in conjunction with an overall KM strategy.

Social analytics

Social analytics is a booming market that is expected to triple over the next five years to nearly $9 billion, a growth rate of nearly 25% per year. Initially based on simple counts of the number of times a brand was mentioned in social media, analytics has evolved to the point where it is using sophisticated algorithms that support the use of social data for targeted marketing and for initiating customer service.

Social analytics moved from hindsight to insight and now to foresight, with predictive capabilities. SAS social media solutions include integration and storage of social data, general text analytics and analysis of comments for sentiment, and a social conversation module that can work directly or integrate with third-party engagement solutions.

Real-time analysis allows marketing or brand campaigns to be synchronized with the topic threads that are emerging. Decision trees allow ‘what-if’ scenarios such as the impact of increasing the frequency of an ad, or combining customer segments. These analyses allow users to determine the relationships among various factors and to present visualizations of the relationships for better marketing decisions.

The value of social media analytics is also increased by combining it with data such as purchasing information from the data warehouse, to compare customers’ stated intentions with actual behavior. There is tremendous growth in analyzing social media information along with data from the Internet of Things which measures physical activity to build a profile not just of transactions but of tone and behavior along the customer journey.

Social media analytics should not be isolated. The information should be tightly connected to upstream data so different departments can use it to drive the customer experience.

Customer engagement

The driving force behind all of the above (collecting and managing big data, keeping information secure, enabling mobility and analyzing social media inputs) is customer engagement. The ultimate goal is to engage the customer, whether for marketing, customer support, participation in loyalty programs or some other outcome.

The key for customer engagement is omni-channel. Whether the interaction is initiated by the customer or the organization, customers want options in the delivery channels.

Customer engagement is not a static business area. The feedback obtained through social analytics and traditional business intelligence can be merged to explain both what customers are doing and why. That information can guide the delivery of marketing materials and help provide better customer service.

Galaxy Consulting has 17 years of experience in knowledge management. We have led knowledge management initiatives. Contact us for a free consultation.

Saturday, December 31, 2016

Search in the Land of Information Silos

Information access and retrieval within most organizations is a work in progress. There might be a general search system for marketing information, and probably one or more database search systems.

The larger the organization, the greater the number of information retrieval systems. Each laptop and mobile device has a search system. Mobile phone apps sport their own search systems. The lawyers in an organization may have different search systems for specific types of legal matters. The enterprise resource planning (ERP) users have a search system. When it comes to enterprise search, there are many silos.

A “silo” is a content collection available to certain users. Given the reality of silos, providing access to “all” information might be an impractical idea. “All” may not mean all or even some available information. Big data is easy to talk about but difficult to make accessible. The same challenge exists for images, audio recordings, and engineering drawings with details hidden in a proprietary system’s database.

Search that is variously called universal, unified or federated is one solution to the challenge of information silos. The term meta-search is often used to describe an integrating function that passes the user’s query across discrete content indexes and returns a single results list to the user. Endeca, Inxight Software, Northern Light, Sagemaker and Vivisimo are search applications that can be used for universal, unified or federated search in an organization.

The initial query might not unlock the information stored in the system’s index. Facets, topics and suggestions make it easy for the user to click through the links without having to craft additional queries.

Behind the scenes, federated search requires some maintenance. A user does not want to know the file format in which the information he or she needs is stored. The user wants answers. Early federating systems like WAIS relied on standards for content representation. Today, however, there are many “standards,” and content processing systems must be able to process content in the hundreds of formats found in organizations.
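The integrating function described above can be sketched in Python: the same query goes to each silo's index, and the hits come back as one ranked list. This is a minimal illustration, not how any of the named products work. The KeywordIndex class, its word-overlap scoring and the sample silos are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    source: str    # which silo the hit came from
    doc_id: str
    score: float


class KeywordIndex:
    """Stand-in for one silo's search system (hypothetical)."""

    def __init__(self, name, documents):
        self.name = name
        self.documents = documents    # doc id -> text

    def search(self, query):
        terms = set(query.lower().split())
        hits = []
        for doc_id, text in self.documents.items():
            words = text.lower().split()
            matches = sum(1 for word in words if word in terms)
            if matches:
                # Score is the share of the document's words that match the query.
                hits.append(Hit(self.name, doc_id, matches / len(words)))
        return hits


def federated_search(query, indexes, limit=10):
    """Pass one query to every index and merge the hits into a single list."""
    merged = []
    for index in indexes:
        merged.extend(index.search(query))
    merged.sort(key=lambda hit: hit.score, reverse=True)
    return merged[:limit]


# Hypothetical silos.
legal = KeywordIndex("legal", {
    "L1": "supplier contract renewal terms",
    "L2": "patent filing schedule",
})
erp = KeywordIndex("erp", {
    "E1": "supplier invoice overdue",
    "E2": "warehouse inventory count",
})
results = federated_search("supplier contract", [legal, erp])
for hit in results:
    print(hit.source, hit.doc_id, round(hit.score, 2))
```

Here every silo scores documents the same way, so the scores can be sorted together. Real search systems usually score differently from one another, and making their scores comparable is part of the maintenance work that federated search requires.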

It is important to deliver a system that makes an organization’s disparate types of digital content available.

There are barriers to unified, federated or integrated search.

Some digital content cannot be included in a general purpose search system for security, business or legal reasons. Technical content such as chemical structure information at a pharmaceutical company requires special purpose systems. The same need applies to product manufacturing data, legal information and engineering drawings.

Most search applications exclude video streams from the index. If video is indexed, the system processes the text included in the digital file or indexing provided by the video owner.

The cost of building connectors for certain content types could be too high, or license fees could be required to gain access to the file formats.

The computational burden required to process certain types of content might exceed the organization’s ability to fund the content processing. Big data, for example, requires a computing capability able to handle the Twitter stream, RSS feeds and telemetry data from tracking devices. Cost could be prohibitive for processing all content types.

The most important challenge is the need for confidentiality. The legal department does not want information related to a legal matter to be accessed without authorization or to leave its control.

Some government contracts require that information related to certain types of government work be kept separate. Common sense dictates that plans for a new product and its pricing remain protected. If someone needs access to that information, a different search system may be used to ensure confidentiality.

Even in the absence of business or legal requirements, some professionals do not want to share content. That may be a management problem. When a manager locks up information in a no-access silo, a software script will skip the flagged server.

To summarize, silos of information are a challenge for organizations to process and use effectively. In the enterprise, integration should take place within silos of content.

Galaxy Consulting has 17 years of experience in integrating information silos using universal, unified or federated search. We have experience with search applications. Contact us for a free, no-obligation consultation!

Wednesday, November 30, 2016

Three Values of Big Data

Big Data is everywhere. But to harness its potential, organizations should understand the challenges that come with collecting and analyzing Big Data.

The three V’s that are important in managing big data are volume, velocity, and variety. These three factors serve as guidance for Big Data management, highlighting what businesses should look for in solutions.

But even as organizations have started to get a handle on these three V’s, two other V’s, veracity and value, are just as important, if not more so.

Volume is the ability to ingest, process, and store very large data sets. The definition of "very large" can vary by business and depends on the particular circumstances of the business problem, as well as on the volumes that business has previously handled.

Volume can also be defined as the number of rows, or the number of events that are happening in the real world that are getting captured in some way, a row at a time. Accordingly, the more rows that you have, the bigger the data set is going to be.

Bigger Volumes, Higher Velocities

In today’s digital age, having huge volumes of data is hardly rare. The proliferation of mobile devices ensures that companies can gather more data on consumers than ever before, and the rise of the Internet of Things will only increase this plethora of data. Moreover, businesses will have even more information on customers as they begin to use one-on-one messaging channels to interact directly with them.

The sheer volume of data available to us is greater than ever before. In fact, in many ways, nearly every human action can be quantified and logged in a bank of data that’s growing at an incredibly fast rate. All of this data can be turned into actionable insights that drive business decisions and can help transform every customer interaction, create operational efficiency, and more.

This increase in data volume is paired with a simultaneous increase in speed. Both the volume itself and the rate at which it grows are increasing. These increases have forced IT staff to spend more time trying to figure out how to process and analyze that data.

Velocity is the key V of the three V’s. For example, a customer will visit a company’s site or use its mobile application but only for a short amount of time. The business may have just seconds to gather customer information and deliver a relevant response based on that information, usually just one message or offer.

This quick turnaround time requires you to process all of that real-time behavioral data as fast as possible. If you learn only the day after that your customer was on your Web site, you have missed the chance to reach them. One aspect of a successful customer journey is being able to send the right message at the right time to the right customer. Timeliness and relevancy are the foundation of delivering personalized customer experiences in real time.

A Variety of Formats

Data sets are in a variety of formats, and the number of data types continues to grow. Radio-frequency identification (the use of electromagnetic fields to gather information from tags attached to objects), smart metering (devices that monitor information on energy consumption for billing purposes), and the ubiquity of mobile devices with geo-location capabilities are only a few examples of diverse sources of consumer information.

All of these technologies have their own methods of capturing and publishing data, which adds to the complexity of the information environment.

But overcoming these data complexities could be well worth it. Having a large variety of data is crucial for creating a holistic customer view. Access to data such as a customer’s purchasing history, personal preferences based on social media postings, exercising habits, caloric intake, and time spent in the car can help companies understand that customer on a deeper level, and thus build experiences that are tailored to that customer.

But this diversity of data sources can be a blessing and a curse. A blessing because businesses have an increasingly large range of channels from which to pull customer information, but a curse because it can be difficult to filter through that information to find the most valuable content.

Variety is a little overstated in discussions of Big Data. Audio and video are examples of data formats that can be particularly difficult to analyze. Usually companies try to come up with an intermediate representation of that data, and then use that intermediate representation to apply old or new algorithms to try to extract signals, whatever the definition of signal is for the business problem they’re trying to solve.

Volume, velocity, and variety are undoubtedly important to managing customer information. But companies should also keep in mind other important aspects of big data if they want to make the most of it.

Data tools such as Apache Hadoop and Apache Spark have enabled new methods of data processing that were previously out of reach for most organizations. While the growing volume of data, the time needed to process it, and the sheer number of input sources pose challenges for businesses, all three can largely be addressed through technology.

New V's Emerge

Investment in Big Data has begun to stabilize and enter a maturity phase over the past year. It will take time for infrastructure and architectures to mature, and best practices should be developed and refined against these architectures.

Organizations should consider how to use Big Data to bring about specific outcomes. In other words, organizations should examine the challenges of Big Data from a business perspective as opposed to a technical one. A framework that incorporates the business-oriented characteristics of veracity and value can help enterprises harness Big Data to achieve specific goals.

Not all data is the same, but organizations may not be paying enough attention to changes within individual data sets. Contextualizing the structure of the data stream is essential. This includes determining whether it is regular and dependable or subject to change from record to record, or even with each individual transaction. Organizations need to determine how the nature and context of data content in all its forms, text, audio, or video, can be interpreted in a way that makes it useful for analytics.

This is where the veracity, or trustworthiness, of data comes in. Determining trustworthiness is particularly important when it comes to third-party data, which should pass through a set of edits and validation rules.

Veracity entails verifying that data is suitable for its intended purpose, and usable within a given analytic model. Organizations should use several measurements to determine the trustworthiness and usefulness of a given data set. Establishing the degree of confidence in data is crucial so that analytic outputs based on that data can be a stimulus for business change.

Important metrics for evaluating and cleaning up data records are:

completeness measurements, or the percentage of instances of recorded data versus all available data within a business ecosystem or market (or the percentage of missing fields within a data record);
uniqueness measurements, or the percentage of alternate or duplicate data records;
accessibility measurements, or the number of business processes and personnel that can benefit from access to specific data, or that can actually access that data;
relevancy measurements, or the number of business processes that utilize or could benefit from specific data;
scarcity measurements, or the probability that other organizations including competitors and partners have access to the same data (the scarcer the data, the more it has impact).
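Two of the metrics above, completeness and uniqueness, are easy to compute directly. The Python sketch below shows one way to do it. The function names, the choice of which fields identify a duplicate and the sample records are assumptions for illustration, and completeness is measured here as the share of expected fields that are filled in.

```python
def completeness(records, fields):
    """Share of expected field values that are actually filled in."""
    total = len(records) * len(fields)
    filled = sum(
        1
        for record in records
        for field in fields
        if record.get(field) not in (None, "")
    )
    return filled / total if total else 0.0


def duplicate_rate(records, key_fields):
    """Share of records that repeat an earlier record on the key fields."""
    seen = set()
    duplicates = 0
    for record in records:
        key = tuple(record.get(field) for field in key_fields)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates / len(records) if records else 0.0


# Hypothetical customer records.
records = [
    {"name": "Ann Lee", "email": "ann@example.com", "city": "Austin"},
    {"name": "Bob Roy", "email": "", "city": "Boston"},
    {"name": "Ann Lee", "email": "ann@example.com", "city": None},
    {"name": "Cy Diaz", "email": "cy@example.com", "city": "Denver"},
]
print(f"completeness: {completeness(records, ['name', 'email', 'city']):.0%}")
print(f"duplicates:   {duplicate_rate(records, ['name', 'email']):.0%}")
```

In this sample, 10 of the 12 expected values are present, and one of the four records repeats an earlier name and e-mail pair.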

Value is Paramount

While veracity can’t be overlooked, value is the most important factor. The first three V’s are really about architecture, infrastructure and representation of data, things that are important to IT organizations and far less interesting to the business stakeholders.

The business stakeholders really don’t care about the first three; they care only about the value they can extract from the data. Executives often expect the analytical teams at their organizations to hide the first three V’s (volume, velocity, and variety) and deliver only the last V, the value that is fundamental to the success of the business.

The concept of value is essential for organizations to succeed in monetizing their data assets. Value is a property that helps identify the purpose, scenario, or business outcomes that analytic solutions seek to address. It helps to confirm what questions are to be answered and what actions will be taken as a result, and defines what benefits are anticipated from collecting and analyzing the data.

Value is a motivating force when it comes to developing new and innovative ideas that can be tested by exploring data in different ways.

The ability to pull valuable information from Big Data and use that information to build a holistic view of the customer is absolutely critical. It’s no longer just an option to develop one-to-one relationships with customers; it’s a requirement. And to build that relationship, companies have to leverage all the customer information they can to personalize every interaction with them.

By using such information to lead customers on a personal journey, companies can help ensure that customers will stay with them long term, and even become brand advocates. Value is derived from making the data actionable. Organizations can have all the information about a customer, but it’s what they can do with it that drives value for the business.

The Three V’s model of volume, velocity, and variety is useful for organizations that are just beginning to take control of their data, and certainly should not be forgotten by enterprises that have advanced further in their management of customer information.

The first three V’s are equally important. In the digital age, companies have accumulated more data than ever before, are pulling data from a variety of sources, and are increasing the rate at which that data flows, and a combination of these three factors can help organizations to create relevant, personal, and one-on-one customer interactions.

Deriving value is the ultimate business goal for any enterprise. The standard Three V’s model does not satisfactorily identify any data properties from a business usage perspective. Even though Big Data, and data in general, provides organizations with a lot of capabilities, the challenge for businesses is to make sure that they adapt how they think about the business processes, how they report on them, and how they define key performance indicators.

Organizations should try to get to the value. They need to turn that data into value, which means figuring out how to use it to optimize business processes. In the end, the Three V’s model for Big Data is a useful starting point. But then it becomes about the ultimate goal, the one organizations must not lose sight of: driving value.

Galaxy Consulting has 17 years of experience in big data management. We are at the forefront of driving value from big data.
