The year 2013 will be important for a couple of reasons. Believe it or not, 2013 marks the twentieth anniversary of the World Wide Web. It is true that Tim Berners-Lee developed the essential technologies of the web at the CERN laboratory in Switzerland in 1989-90. However, it was the first graphical browser, Mosaic, developed by a team at the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign and released in April 1993, that made the web enormously popular.
Without Mosaic, the brainchild of NCSA team member Marc Andreessen, the explosive growth of the web in the 1990s could not have happened. Mosaic brought the web outside the walls of academia and transformed it into something that anyone could use. In June 1993 there were only 130 web sites; two years later there were 230,000 sites. In 2007 there were 121 million web sites; it is estimated that there are now 620 million web sites. Now that qualifies as exponential growth.
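The numbers bear that claim out. A quick back-of-the-envelope Python sketch, using only the site counts quoted above (and treating the 620 million figure as a 2013 count, twenty years after June 1993), shows the compound growth rate:

```python
import math

# Site counts quoted in the article: 130 (June 1993),
# 230,000 (mid-1995), ~620 million (taken here as 2013).
def annual_growth(start, end, years):
    """Compound annual growth rate between two counts."""
    return (end / start) ** (1 / years) - 1

# 1993 -> 1995: the number of sites multiplied roughly 42x per year.
print(f"{annual_growth(130, 230_000, 2):.0%} per year (1993-1995)")

# 1993 -> 2013: still doubling roughly every 11 months on average.
rate = annual_growth(130, 620_000_000, 20)
doubling_months = 12 * math.log(2) / math.log(1 + rate)
print(f"doubling every {doubling_months:.0f} months on average over 20 years")
```

Even averaged over two decades, a doubling time of under a year is exponential growth by any definition.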
This brings me to the second reason why this year is important: worldwide digital information will likely surpass 4 zettabytes of data in 2013. This is up from 1.2 zettabytes in 2010. Most of us are familiar with terabytes; a zettabyte is 1 billion terabytes. In between these two are petabytes (1 thousand terabytes) and exabytes (1 million terabytes). 2013 is going to be a big year for Big Data.
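For readers who want the unit ladder spelled out, here is a minimal Python sketch of the decimal (SI) byte units the paragraph walks through; the names and factors are standard, only the printed format is my own:

```python
# Decimal (SI) byte units: each step up the ladder is a factor of 1,000.
TERABYTE = 10**12  # bytes

UNITS = ["terabyte", "petabyte", "exabyte", "zettabyte", "yottabyte"]
for i, name in enumerate(UNITS):
    print(f"1 {name} = 10^{12 + 3 * i} bytes")

# Sanity check of the claim above: a zettabyte is one billion terabytes.
assert 10**21 // TERABYTE == 10**9
```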
Companies that grew up in the age of the World Wide Web are experts at Big Data. As of 2009, Google was processing 24 petabytes of data each day to provide contextual responses to web search requests. Wal-Mart records one million consumer transactions per hour and imports them into a database that contains 2.5 petabytes. Facebook stores, accesses and analyzes 30+ petabytes of user-generated data.
The expansion of worldwide Big Data, along with the metric terms used to describe it (yottabytes, or 1,000 zettabytes, come next; beyond that is TBD), has become the subject of much discussion and debate. Big Data is most often discussed in terms of the four V's: volume, velocity, variety and value.
The accumulation of Big Data volume is being driven by a number of important technologies. Smartphones, tablets and social media networks such as Facebook, YouTube and Twitter are important Big Data sources. There is another less visible, but nonetheless important, source of Big Data: the “Internet of Things.” This is the collection of sensors, digital cameras and other data-gathering systems (such as RFID tags) attached to a multitude of objects and devices all over the world. These systems generate enormous amounts of data 24/7/365.
The speed of Big Data generation is related to the expansion and increased performance of data networks, both wired and wireless. It is also the result of improved capture technologies. For example, one minute of high-definition video generates between 100 and 200 MB of data, and capturing it is something anyone with a smartphone can do, and is doing, all the time.
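That figure translates directly into a sustained network bit rate. A small Python sketch, assuming the 100-200 MB-per-minute range quoted above and decimal megabytes:

```python
MB = 10**6  # decimal megabyte, in bytes

for mb_per_minute in (100, 200):
    bytes_per_second = mb_per_minute * MB / 60
    megabits_per_second = bytes_per_second * 8 / 10**6
    print(f"{mb_per_minute} MB/min is about {megabits_per_second:.1f} Mbit/s sustained")
```

That works out to roughly 13 to 27 megabits per second, per camera, for as long as the video rolls, which is why network capacity and data velocity are inseparable.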
The Big Data conversation is as much about the quality of the information as it is about its size and speed. Our world is full of information that lies outside structured datasets. Much of it cannot be captured, stored, managed or analyzed with traditional software tools. This poses many problems for IT professionals and business decision makers: what is the value of information that is largely “exhaust data”?
There are good internal as well as external business reasons for sharing Big Data. Internally, if exhaust data is missed in the analytical process, executives are making decisions based upon intuition rather than evidence. Big Data can also be used externally as a resource for customers that otherwise would be unable to gain real-time access to detailed information about the products and services they are buying. It is the richness and complexity of Big Data that makes it so valuable and useful for both the executive process and customer relationships.
Every organization today is gathering Big Data in the course of its daily activities. In most cases, the bulk of the information is collected in a central EMS or ERP system that connects the different units and functional departments of the organization. More likely than not, however, such systems are insufficient and cannot support all the data-gathering activities within the organization. There are probably systems that have been created ad hoc to serve specialized needs and solve problems the centralized system cannot address. The challenge of Big Data is to capture all the ancillary data that is getting “dropped on the floor” and make it useful by integrating it with the primary sources.
Making Big Data available offers organizations the ability to establish a degree of transparency internally and externally that was previously impossible. Sharing enables organization members and customers to respond quickly to rapidly changing conditions and circumstances. Some might argue that sharing Big Data is bad policy because it allows too much of a view “behind the curtain.” But the challenge for managers is to securely collect, store, organize, analyze and share Big Data in a manner that makes it valuable to those who have access and can make use of it.
I remember how thrilling it was, after downloading the Mosaic browser in 1993 over a dial-up connection on my desktop computer, to browse the web freely for the first time. It seemed like Mosaic was the ultimate information-gathering tool. I also remember how excited I was to get my first 80 MB hard disk drive for data storage. The capacity seemed nearly limitless. As we look back and appreciate the achievements of twenty years ago, we now know that those were really the beginnings of something enormous that we could not have fully predicted at the time.
With the benefit of those experiences, and many more over two decades of transition from analog to electronic and online media, it is important to understand as best one can what Big Data means in 2013 and where it is headed. The organizations that recognize the implications and respond decisively to the challenges of the explosive growth of structured and unstructured data will be the ones to establish a competitive advantage in their markets.