Data is a lot like electricity – a little bit at a time, not too much direct contact, and you’re fine. For example, a single nine-volt battery doesn’t provide enough power to light a single residential bulb. In fact, it would take about a dozen nine-volt batteries to light that single bulb, and it would only last about an hour. It’s only when you get massive amounts of electricity flowing in a controlled way that you can see real results, like running electric motors, lighting up highway billboards, heating an oven or powering commuter trains.
It’s the same with data. The sale of a single blue shirt at a single outlet store is not much data. And it’s still not much even when you combine it with all the sales for that store in a single day. But what about a year’s worth of data, from multiple locations? Massive amounts of data can do amazing things for us as well. We have all seen in today’s data centric business environment what controlled usage of data can do.
Some examples include:
- The National Oceanic and Atmospheric Administration (NOAA) can now more accurately predict a hurricane’s path thanks to data that has been collected over time
- Marketing firms can save money on culled down distribution lists based on customer demographics, shopping habits and preferences.
- Medical experts can identify and treat conditions and diseases much better based on a patient’s history, life risks and other factors.
- Big ‘multi-plex’ movie houses can predict more accurately the number of theatres it will need to provision for the latest summer block buster by analyzing Twitter and other social media feeds as related to the movie.
All of this can be done thanks to controlled data analytics.
The key word here is “controlled.” With a background in marine engineering and shore-side power generation, I have seen my share of what can happen when electricity and other sources of energy are not kept ‘controlled.’ Ever see what happens when a handful of welding rods go through a steam turbine spinning at 36,000 RPM and designed for tolerances of thousandths of an inch? It’s not pretty. After as many years in database technologies, data analysis and visualizations, I have also seen the damage resulting from large quantities of uncontrolled data. In his book Signal: Discerning What Matters Most in a World of Noise, author Steven Few shows a somewhat tongue-in-cheek graph that ‘proves’ that the sale of ice cream is a direct cause of violent crime. Or was it the other way around? It’s an obvious comic hyperbole that serves to illustrate his point that we need to be careful of how we analyze and correlate data.
With the ‘big data’ explosion, proponents will tell you that ‘if a little is good, then more is better.’ It’s an obvious extension, but is it accurate? Is there such a thing as ‘too much data’?
Let’s say you are a clothing retail store in the mall. Having data for all of your sales over the past ten years, broken down by item, store, date, salesperson and any number of other dimensions may be essential. What if we were to also include ALL the sales of ALL competitors’ products, seasonal weather history, demographic changes, foot traffic patterns in the mall and just about anything else that could influence a customer’s decision to buy your product even down to what they had for lunch just before they made the purchase? The result would most likely be UN-controlled data analysis. This tends to lead to erroneous correlations and bad decisions. For instance, you just might discover that customers are four times as likely to make a purchase if they had pizza for lunch, never realizing that there are more pizza restaurants near your stores than any other type of food service!
When it comes to data, stick with what you know is good data. Make sure it’s clean, reliable and most of all, relevant. Most of all, make sure you CONTROL your data. Without control, there may be no stopping the damage that results.
About Todd: Todd Chittenden started his programming and reporting career with industrial maintenance applications in the late 1990’s. When SQL Server 2005 was introduced, he quickly became certified in Microsoft’s latest RDBMS technology and has added certifications over the years. He currently holds an MCSE in Business Intelligence. He has applied his knowledge of relational databases, data warehouses, business intelligence and analytics to a variety of project for Blum Shapiro since 2011.