Where Einstein Meets Edison

Using Big Data Takes Discipline

Jun 26, 2013

“It’s like saying, I gave Stacy a pen, therefore she’s a journalist,” said ZestFinance CEO Douglas Merrill at GigaOM’s Structure:Data conference this March. We’re not automatically great decision makers just because we have big data.

In fact, the hazards of relying strictly on data to make decisions have received a good amount of press, including a set of columns from David Brooks (see here, here, and here). He says that while data helps us keep tabs on our confidence, a worldview bounded by data won’t predict, for example, the customer trust a business might lose if it neglects a market—even if financial performance data puts the market at a lower priority. Out of context, data on the health risks of smoking might have us think we’ll quit—yet we don’t.

Arguments such as Brooks’s and Merrill’s have prompted a few rebuttals (here’s one good one), but entrepreneurs, especially those exploring data-intensive business models, should heed their cautions. Here are a few human-centered activities any entrepreneur should make time for when making decisions with data.

1. Think about the nuanced ways humans may react. Netflix employs data remarkably effectively: in 2012, 75 percent of its viewings were the result of data-driven recommendations. Yet despite its analytical prowess, Netflix didn’t predict how people would react in 2011 when it considered separating its streaming business. Within weeks, nearly 800,000 subscribers canceled, and an unpolished apology email only made people more upset.

Of course, it’s not that data can’t predict human behavior at all. My favorite example is the Bureau of Labor Statistics’ National Longitudinal Survey of Youth. It tracks a sample of Americans from childhood through age 40, monitoring thousands of micro-behaviors, from religious attendance to school club memberships to particular crimes committed and even whether the person has ever “hit someone.” Brookings Institution researchers have been able to use the data to predict school dropouts.

Nonetheless, humans behave in subtle ways, and even the richest data sets are no substitute for deliberation when estimating how humans will react to our actions.

2. Decide to measure something currently not measured. As Nate Silver discusses in The Signal and the Noise, a baseball analyst could traditionally measure a pitcher’s throwing speed precisely but could only estimate the shape of a pitch’s arc qualitatively. Recognizing the arc shape’s importance, Sportvision developed PITCHf/x, a camera-based technology that observes it precisely, and analysts now incorporate the results into their pitcher evaluations.

An entrepreneur might be able to precisely measure a competitor’s market share or the number of website hits received during a product launch. But savvy entrepreneurs might also look for ways to quantify phenomena previously evaluated only qualitatively. A competitor’s expertise in a niche industry, for instance, could be estimated from the number of industry-tenured staff the competitor has hired.

3. Recognize the seemingly charted, but actually uncharted, territory. When U.S. housing prices softened in 2006, data models correctly predicted that banks’ mortgage investments would also soften. But because average U.S. housing prices had only ever risen since the 1930s, there was no data to predict the loan defaults that would accompany the price declines, or the insolvency risk that would come to characterize entire financial institutions. Ahead of time, the territory seemed charted (Wall Street asset valuations frequently rise and fall), but subsequent events made it clear the charts were incomplete.

In a sad example, the U.S. Space Shuttle Challenger was launched under conditions the data suggested were safe. The thermal distress of a shuttle part known as an O-ring had been measured at several temperatures. Collectively, the measurements followed a trend, convincing the parties involved that the part would function even at temperatures at which it had never actually been measured (credit kelly). The shuttle was launched at 31 degrees Fahrenheit, lower than ever before, resulting in a tragic loss of life.

At GigaOM’s Structure:Data conference, Vibrant Data Labs Founder Eric Berlow spoke of how relying on limited data can also be self-reinforcing. For example, a newspaper website might recommend articles based on what we’ve previously read. But if the only articles we read are those the website recommends, we create a circular reference precluding the algorithm from recommending alternative interesting content.
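The circular reference can be made concrete with a toy simulation. This is only a sketch under invented assumptions: the section names, the starting clicks, and the rule that the reader clicks everything recommended (and nothing else) are all hypothetical, and no real recommendation system is this simple.

```python
from collections import Counter


def recommend(click_counts, k):
    """Return the k most-clicked sections (ties broken alphabetically)."""
    ranked = sorted(click_counts, key=lambda a: (-click_counts[a], a))
    return ranked[:k]


def simulate(articles, initial_clicks, k, rounds):
    """Return every section that was ever recommended over all rounds.

    Assumption: the reader clicks each recommended section and nothing else.
    """
    counts = Counter({a: 0 for a in articles})
    counts.update(initial_clicks)
    ever_recommended = set()
    for _ in range(rounds):
        recs = recommend(counts, k)
        ever_recommended.update(recs)
        counts.update(recs)  # clicks feed straight back into the ranking
    return ever_recommended


# Hypothetical sections and reading history, invented for illustration.
articles = ["arts", "politics", "science", "sports", "travel"]
seen = simulate(articles, ["politics", "sports", "politics"], k=2, rounds=50)
print(sorted(seen))
```

Under these assumptions, fifty rounds later the reader has still been shown only the two sections he or she started with. The other three never get a chance to earn a click.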

By nature, entrepreneurs venture into new territory, the completeness of any guiding data unknown. Entrepreneurs should be thoughtful about what the data doesn’t tell us—the range of situations it hasn’t encountered—and whether it warrants pausing and learning more first.

4. Know when an algorithm might mis-detect a pattern. Imagine a student preparing for an exam. To perform well, the student might need both to study and to get a reasonable night’s sleep the night before. In statistics, this kind of effect is known as an interaction between variables. An algorithm that examines studying and sleeping independently might tell us that time spent studying is worth more than time spent sleeping. This may be true to a point, but a more refined algorithm would pick up on variable interactions and tell us that studying is effective only with sufficient sleep.
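The studying-and-sleeping example can be sketched in a few lines. The exam scores below are invented, and they assume an extreme case in which studying helps only when the student has slept enough; the point is just to show how a pooled estimate hides the interaction.

```python
def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Invented rows: (hours studied, slept enough? 0 or 1, exam score).
# Assumed rule: each hour of study adds 10 points, but only after enough sleep.
rows = [(h, s, 50 + 10 * h * s) for s in (0, 1) for h in range(5)]

# Looking at studying on its own, ignoring sleep entirely.
pooled = slope([h for h, _, _ in rows], [y for _, _, y in rows])

# Looking at studying separately for rested and tired students.
rested = slope([h for h, s, _ in rows if s == 1],
               [y for _, s, y in rows if s == 1])
tired = slope([h for h, s, _ in rows if s == 0],
              [y for _, s, y in rows if s == 0])

print(pooled, rested, tired)
```

The pooled estimate credits every hour of studying with a modest gain, while the split estimates show the gain is twice as large for rested students and absent for tired ones. A real analysis would more likely fit a regression with an explicit interaction term, but the comparison of slopes makes the same point.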

Unfortunately, with larger data sets, tracking all possible interactions between all possible variables can require enormous amounts of computation, even by today’s standards. To optimize the tradeoff, entrepreneurs and anyone making decisions with data should first consider just which interactions an algorithm should look for.
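A little arithmetic shows why. The sketch below counts candidate interactions for a given number of variables; the function names are my own, and it counts only which combinations exist, not the cost of evaluating each one.

```python
from math import comb


def pairwise_interactions(n_vars):
    """Number of distinct pairs of variables."""
    return comb(n_vars, 2)


def all_interactions(n_vars):
    """Number of subsets containing two or more variables."""
    return 2 ** n_vars - n_vars - 1


for n in (10, 100, 1000):
    print(n, pairwise_interactions(n))

print(all_interactions(20))
```

Pairs alone grow roughly with the square of the number of variables, and once three-way and higher combinations are allowed the count doubles with every variable added. That is why choosing which interactions to look for is a human judgment call rather than something to brute-force.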

At GigaOM’s Structure:Data conference, Digital Reasoning CEO Timothy Estes discussed differences in how certain natural language processing (NLP) algorithms extract meaning from human text. Typically, once fed pages of raw text, the algorithm begins to discover the language’s structures. Yet teaching a computer English and teaching it simplified or traditional Chinese require different algorithms. An English-learning algorithm can look to the whitespace between words for hints at the language’s structure, but whitespace does not characterize Chinese in the same way. A human must instruct the computer which algorithm to use.
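The whitespace point is easy to see with the simplest possible tokenizer. This is a sketch of my own rather than anything Estes presented, and the Chinese phrase is my rough rendering of the English one, included only for illustration.

```python
def whitespace_tokens(text):
    """Split text into tokens wherever whitespace appears."""
    return text.split()


english = "big data takes discipline"
chinese = "大数据需要纪律"  # assumed rough translation of the English phrase

english_tokens = whitespace_tokens(english)
chinese_tokens = whitespace_tokens(chinese)

print(len(english_tokens), english_tokens)
print(len(chinese_tokens), chinese_tokens)
```

The English sentence falls apart neatly into words, while the Chinese sentence comes back as a single undivided token. Finding word boundaries in Chinese takes a different technique, and someone has to know to reach for it.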

5. Communicate results in compelling ways. Stories and narratives matter when communicating data, and not only because they humanize results and make them feel less technical. They also matter because the competition will use them too, and potentially very well. Economists have been studying the effects of taxes on the economy for years, and their data analyses have become rather nuanced. But in a televised debate, the politician who earns voters’ trust to go ahead and tweak taxes one way or another is often the politician with the greatest emotional appeal.

For an entrepreneur, a story might complement one’s marketing data showing the effectiveness of one’s product. Or, the story might be part of the product itself. Redstar Ventures’ Chris Howard is exploring a service that combines personal health data with a personal advising service to assist individuals with weight loss. “We’re confident that advancements in behavior-tracking technology will enable a revolution in weight loss. But you can’t just give the customer his or her data. You have to weave it into a narrative that compels the person to change his or her long-term behavior.”

Big data clearly creates opportunities. Wal-Mart now combines customer data with Twitter feeds and other social information to recommend purchases to customers, including possible gifts a customer’s friends may be interested in. David Brooks noted the speed with which Google has been able to identify flu outbreak locations using search keyword data. But for all its power, using data properly takes discipline: we must recognize its limitations and treat it as a supplement to our existing kit of decision tools.


Neil McQuarrie


Neil McQuarrie is an MBA candidate at the MIT Sloan School of Management and an MPA candidate at the Harvard Kennedy School of Government. Before returning to school, Neil analyzed data as a consultant to the healthcare industry. He holds a BA in physics from Cornell University and a master’s in information technology from Rensselaer Polytechnic Institute.