There has been a lot of talk lately about big data. What is big data?
Big data is a collection of data sets so large and complex that they become difficult to process using readily available software tools or traditional data processing applications. The challenges include capture, governance, storage, search, sharing, transfer, analysis, and visualization.
What is considered "big data" varies depending on the capabilities of the organization managing the data set, and on the capabilities of the applications that are traditionally used to process and analyze the data set in its domain.
Big data sizes are a constantly moving target. As of 2012, they ranged from a few dozen terabytes to many petabytes of data in a single data set. To meet this challenge, new platforms of "big data" tools are being developed to handle various aspects of large quantities of data.
Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone. How does this apply to us and to what we do in content management?
These numbers, cited in most enterprise content management (ECM) analyst reports, extend to every part of the information technology sector, prompting developers to build a new generation of software and distributed computing frameworks to cope with this growth in scale.
Content growth is everywhere. From traditional data warehouses to new consolidated big data stores, IT infrastructure must be ready to keep scaling. This growth affects the entire IT industry, and ECM in particular.
Content is getting bigger. Applications are growing more complex, challenging IT as never before. How will these changes impact content management technologies? It's difficult to predict exactly, but there are insights to be found and used to plan for the future.
ECM technology is evolving toward a platform-based approach that enables organizations to make their own content-centric and content-driven applications smarter. Analysts, vendors and users all agree: the time for "out-of-the-box" CMS applications has passed. Now each project can be built to meet its specific needs and individual requirements.
More often than not, content and data carry embedded intelligence, whether in custom metadata and in-text information or in attached media and binary files, and that intelligence can be put to use whether the content is structured or unstructured.
This can be observed at many levels and across many domains. Consider the arrival of what some have started to call "Web 3.0": the semantic Web and related technologies that extract intelligence from raw content through advances such as semantic text analysis, automated relation extraction and categorization, and sentiment analysis, effectively giving meaning to data.
More traditional ECM components, such as workflows, content lifecycle management and flexibility, show much the same trend. Smart content architecture, intelligent and adaptive workflows and processes, and deep integration with the core applications of the information system are all making enterprise content-centric applications smarter and refining the way intelligence is brought to content.
In short, content is getting smarter on the inside as much as on the outside.
In fact, disruptive phenomena such as Big Data and the new semantic technologies on the scene are huge opportunities for enterprise content management solutions. They bring new solutions and possibilities in business intelligence, semantic text analysis, data warehousing and caching, all of which must be integrated into existing content-centric applications without rewriting them.
As a result, Big Data and smart content will push enterprise content management further toward technical capabilities such as software interoperability, extensibility and integration.
These developments will also demand a clean, adaptive architecture that is flexible enough to evolve as new standards emerge to bridge CMS and semantic technologies, and that can accommodate connectors to back-end storage systems and to text-analysis solutions.
This underscores the importance of the advances made in modular and extensible platforms for content-centric applications. The traditional approach of deploying large enterprise content management suites built on older software architectures will make it harder to take advantage of these new and nimble opportunities.
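To make the idea of a modular, connector-based architecture a little more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any particular ECM product: the interfaces StorageConnector and TextAnalysisConnector, the toy InMemoryStorage and KeywordCategorizer implementations, and the ContentRepository class are all hypothetical names invented for this example. The point is only that the application core depends on the two interfaces, so a back-end store or a text-analysis service can be swapped in without rewriting the application.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


# Hypothetical connector interfaces, invented for illustration;
# they are not the API of any real ECM product.
class StorageConnector(ABC):
    """Connector to a back-end storage system."""

    @abstractmethod
    def save(self, doc_id: str, body: str) -> None: ...

    @abstractmethod
    def load(self, doc_id: str) -> str: ...


class TextAnalysisConnector(ABC):
    """Connector to a text-analysis service (here: categorization only)."""

    @abstractmethod
    def categorize(self, body: str) -> List[str]: ...


class InMemoryStorage(StorageConnector):
    """Toy storage back end; a real one might wrap a database or object store."""

    def __init__(self) -> None:
        self._docs: Dict[str, str] = {}

    def save(self, doc_id: str, body: str) -> None:
        self._docs[doc_id] = body

    def load(self, doc_id: str) -> str:
        return self._docs[doc_id]


class KeywordCategorizer(TextAnalysisConnector):
    """Toy analyzer: assigns a category when any of its keywords appears as a word.

    Deliberately naive (whitespace tokenization, no punctuation handling).
    """

    def __init__(self, rules: Dict[str, List[str]]) -> None:
        self.rules = rules

    def categorize(self, body: str) -> List[str]:
        words = set(body.lower().split())
        return sorted(cat for cat, keywords in self.rules.items() if words & set(keywords))


class ContentRepository:
    """Hypothetical application core that depends only on the connector interfaces."""

    def __init__(self, storage: StorageConnector, analyzer: TextAnalysisConnector) -> None:
        self.storage = storage
        self.analyzer = analyzer
        self.metadata: Dict[str, List[str]] = {}

    def add(self, doc_id: str, body: str) -> List[str]:
        # Store the content, then enrich it with metadata from the analyzer.
        self.storage.save(doc_id, body)
        categories = self.analyzer.categorize(body)
        self.metadata[doc_id] = categories
        return categories


if __name__ == "__main__":
    repo = ContentRepository(
        InMemoryStorage(),
        KeywordCategorizer({"finance": ["invoice", "payment"], "legal": ["contract"]}),
    )
    print(repo.add("doc-1", "Invoice for the signed contract"))  # ['finance', 'legal']
```

Replacing InMemoryStorage or KeywordCategorizer with another implementation of the same interface requires no change to ContentRepository; that separation is the kind of extensibility a modular platform is meant to provide.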
To get the most value out of smart content and refine methods of dealing with Big Data, enterprise content management architects must build on a modern, well-designed content management platform, one that attends not only to end-user features but also to the needs of developers. Enterprise content management will not be reinvented; Big Data and smart content are evolutions, not revolutions, in the industry.
I will continue on this subject in my future posts.