Big Data is everywhere. But to harness its potential, organizations should understand the challenges that come with collecting and analyzing Big Data.
The three V’s that matter most in managing Big Data are volume, velocity, and variety. These three factors serve as guidance for Big Data management, highlighting what businesses should look for in solutions.
But even as organizations have started to get a handle on these three V’s, two other V’s, veracity and value, are just as important, if not more so.
Volume is the ability to ingest, process, and store very large data sets. The definition of "very large" can vary by business and depends on the particular circumstances of the business problem, as well as on the volumes that business has previously handled.
Volume can also be defined as the number of rows, or the number of events that are happening in the real world that are getting captured in some way, a row at a time. Accordingly, the more rows that you have, the bigger the data set is going to be.
Bigger Volumes, Higher Velocities
In today’s digital age, having huge volumes of data is hardly rare. The proliferation of mobile devices ensures that companies can gather more data on consumers than ever before, and the rise of the Internet of Things will only increase this plethora of data. Moreover, businesses will have even more information on customers as they begin to use one-on-one messaging channels to interact directly with them.
The sheer volume of data available to us is greater than ever before. In fact, in many ways, nearly every human action can be quantified and logged in a bank of data that’s growing at an incredibly fast rate. All of this data can be turned into actionable insights that drive business decisions and can help transform every customer interaction, create operational efficiency, and more.
This increase in data volume is paired with a simultaneous increase in speed. Both the volume itself and the rate at which it grows are increasing. These increases have forced IT staff to spend more time trying to figure out how to process and analyze that data.
Velocity is the key V of the three V’s. For example, a customer will visit a company’s site or use its mobile application but only for a short amount of time. The business may have just seconds to gather customer information and deliver a relevant response based on that information, usually just one message or offer.
This quick turnaround time requires you to process all of that real-time behavioral data as fast as possible. If you learn only a day later that your customer was on your website, you have lost the chance to reach them in the moment. One aspect of a successful customer journey is being able to send the right message at the right time to the right customer. Timeliness and relevancy are the foundation of delivering personalized customer experiences in real time.
A Variety of Formats
Data sets come in a variety of formats, and the number of data types continues to grow. Radio-frequency identification (the use of electromagnetic fields to gather information from tags attached to objects), smart metering (devices that monitor information on energy consumption for billing purposes), and the ubiquity of mobile devices with geo-location capabilities are only a few examples of diverse sources of consumer information.
All of these technologies have their own methods of capturing and publishing data, which adds to the complexity of the information environment.
But overcoming these data complexities could be well worth it. Having a large variety of data is crucial for creating a holistic customer view. Access to data such as a customer’s purchasing history, personal preferences based on social media postings, exercising habits, caloric intake, and time spent in the car can help companies understand that customer on a deeper level, and thus build experiences that are tailored to that customer.
But this diversity of data sources can be a blessing and a curse. It is a blessing because businesses have an increasingly large range of channels from which to pull customer information, but a curse because it can be difficult to filter through that information to find the most valuable content.
Variety is somewhat overstated in discussions of Big Data. Audio and video are examples of data formats that can be particularly difficult to analyze. Companies usually come up with an intermediate representation of that data, and then apply old or new algorithms to that representation to extract signals, however "signal" is defined for the business problem they’re trying to solve.
Volume, velocity, and variety are undoubtedly important to managing customer information. However, companies should keep in mind other important aspects of Big Data if they want to make the most of it.
Data tools such as Apache Hadoop and Apache Spark have enabled new methods of data processing that were previously out of reach for most organizations. While the growing volume of data, the time needed to process it, and the sheer number of input sources pose challenges for businesses, all three can largely be addressed through technology.
New V's Emerge
Investment in Big Data has begun to stabilize and enter a maturity phase over the past year. It will take time for infrastructure and architectures to mature, and best practices should be developed and refined against these architectures.
Organizations should consider how to use Big Data to bring about specific outcomes. In other words, organizations should examine the challenges of Big Data from a business perspective as opposed to a technical one. A framework that incorporates the business-oriented characteristics of veracity and value can help enterprises harness Big Data to achieve specific goals.
Not all data is the same, but organizations may not be paying enough attention to changes within individual data sets. Contextualizing the structure of the data stream is essential. This includes determining whether it is regular and dependable or subject to change from record to record, or even with each individual transaction. Organizations need to determine how the nature and context of data content in all its forms (text, audio, or video) can be interpreted in a way that makes it useful for analytics.
This is where the veracity, or trustworthiness, of data comes in. Determining trustworthiness is particularly important when it comes to third-party data, which should pass through a set of edits and validation rules.
Veracity entails verifying that data is suitable for its intended purpose, and usable within a given analytic model. Organizations should use several measurements to determine the trustworthiness and usefulness of a given data set. Establishing the degree of confidence in data is crucial so that analytic outputs based on that data can be a stimulus for business change.
Important metrics for evaluating and cleaning up data records are:
- completeness measurements, or the percentage of instances of recorded data versus all available data within a business ecosystem or market (or the percentage of missing fields within a data record);
- uniqueness measurements, or the percentage of alternate or duplicate data records;
- accessibility measurements, or the number of business processes and personnel that can benefit from access to specific data, or that can actually access that data;
- relevancy measurements, or the number of business processes that utilize or could benefit from specific data;
- scarcity measurements, or the probability that other organizations, including competitors and partners, have access to the same data (the scarcer the data, the greater its impact).
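As a rough illustration of the first two metrics in the list above, the following Python sketch computes a completeness percentage and a duplicate-record percentage for a small set of customer records. The record layout, the field names, and the choice of email as the duplicate key are all assumptions made for the example; a real data set would need its own definitions of "required field" and "duplicate."

```python
# Hypothetical required fields; a real data set would define its own.
REQUIRED_FIELDS = ("customer_id", "email", "country")


def completeness(records, fields=REQUIRED_FIELDS):
    """Percentage of required fields that are populated across all records."""
    total = len(records) * len(fields)
    if total == 0:
        return 0.0
    filled = sum(
        1
        for record in records
        for field in fields
        if record.get(field) not in (None, "")
    )
    return 100.0 * filled / total


def duplicate_rate(records, key_fields=("email",)):
    """Percentage of records whose key repeats that of an earlier record."""
    if not records:
        return 0.0
    seen = set()
    duplicates = 0
    for record in records:
        key = tuple(record.get(f) for f in key_fields)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return 100.0 * duplicates / len(records)


# Illustrative sample data, not drawn from any real system.
records = [
    {"customer_id": 1, "email": "a@example.com", "country": "US"},
    {"customer_id": 2, "email": "b@example.com", "country": ""},
    {"customer_id": 3, "email": "a@example.com", "country": "CA"},
    {"customer_id": 4, "email": None, "country": "US"},
]

print(f"completeness: {completeness(records):.1f}%")
print(f"duplicate rate: {duplicate_rate(records):.1f}%")
```

With these sample records, 10 of the 12 required fields are populated (83.3% completeness) and one of the four records repeats an earlier email address (25.0% duplicate rate). Accessibility, relevancy, and scarcity depend on knowledge of business processes and the market rather than on the records themselves, so they are not attempted here.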
Value is Paramount
While veracity can’t be overlooked, value is the most important factor. The first three V’s are really about architecture, infrastructure, and the representation of data: things that are important to IT organizations and far less interesting to the business stakeholders.
The business stakeholders really don’t care about the first three; they only care about the value they can extract from the data. Executives often expect the analytical teams at their organizations to hide the first three V’s (volume, velocity, and variety) and deliver only the last V: the value that is fundamental to the success of the business.
The concept of value is essential for organizations to succeed in monetizing their data assets. Value is a property that helps identify the purpose, scenario, or business outcomes that analytic solutions seek to address. It helps to confirm what questions are to be answered and what actions will be taken as a result, and defines what benefits are anticipated from collecting and analyzing the data.
Value is a motivating force when it comes to developing new and innovative ideas that can be tested by exploring data in different ways.
The ability to pull valuable information from Big Data and use that information to build a holistic view of the customer is absolutely critical. It’s no longer just an option to develop one-to-one relationships with customers; it’s a requirement. And to build that relationship, companies have to leverage all the customer information they can to personalize every interaction with them.
By using such information to lead customers on a personal journey, companies can help ensure that customers will stay with them long term, and even become brand advocates. Value is derived from making the data actionable. Organizations can have all the information about a customer, but it’s what they can do with it that drives value for the business.
The Three V’s model of volume, velocity, and variety is useful for organizations that are just beginning to take control of their data, and certainly should not be forgotten by enterprises that have advanced further in their management of customer information.
The first three V’s are equally important. In the digital age, companies have accumulated more data than ever before, are pulling data from a variety of sources, and are increasing the rate at which that data flows. A combination of these three factors can help organizations create relevant, personal, one-on-one customer interactions.
Deriving value is the ultimate business goal for any enterprise. The standard Three V’s model does not satisfactorily identify any data properties from a business usage perspective. Even though Big Data, and data in general, provides organizations with a lot of capabilities, the challenge for businesses is to make sure that they adapt how they think about the business processes, how they report on them, and how they define key performance indicators.
Organizations should try to get to the value. They need to turn that data into value, which means figuring out how to use it to optimize business processes. In the end, the Three V’s model for Big Data is a useful starting point. But then it becomes about the ultimate goal, the one organizations must not lose sight of: driving value.
Galaxy Consulting has 17 years of experience in Big Data management. We are at the forefront of driving value from Big Data.