The Almost of Everything
By Adrian Bonifacio | September 29, 2016
Data is the new oil.
This is the unofficial slogan of big data advocates. It refers to big data’s immense value once it has been refined through analytics. In this community, there is a palpable sense that big data is on the precipice of a revolution.
For those hearing the slogan for the first time, big data evokes a field composed purely of numbers, where only statisticians are welcome. But data scientists and analytics firms don’t gather data just because they like numbers. They want the method in the madness. What they actually like is pictures: the insightful images drawn up by the numbers.
Corporations and businesses usually gather and analyze data to grasp the nuances of their customers’ behavioral patterns. When and where is it best to sell? Who will buy the most?
Government-led initiatives on data, meanwhile, revolve around transparency, mapping, recording, and improving social services. In the Philippines, there is so much potential for the country to leverage data to drive social progress, such as better weather forecasting for agriculture and disaster preparedness or the redesign of city mobility based on vehicular and pedestrian traffic.
For development practitioners such as myself, data serves as evidence of impact. We extract data to prove that a solution works well. This is where refining comes in—what qualifies as good data? What should be tracked—which numbers represent impact?
As such, refinement is an integral and crucial aspect of big data. Unrefined data is inefficient and even dangerous. Consider the decisions stakeholders make, from policy-level choices that affect an entire nation to very personal ones that chart a family’s future. These are transformative decisions that should not rely on unreliable or faulty data.
Unlike oil, data is an unlimited resource. Every day, we are creating new touchpoints that can be marked and measured.
Somewhere in the rural countryside, a farmer shields herself from the heat by wrapping her head and arms in strips of old cloth. She is harvesting spinach from her backyard. She stands up and considers whether to cook or sell the produce, and wonders when the next harvest will be. Lost on her is the importance of this moment: a data touchpoint, arguably just as valuable a resource, has been created anew.
Given a large enough dataset, an algorithm will likely predict what happens to our farmer: what her yield will be, when the best time to plant her spinach is, whether she should cook or sell.
Big data bridges gaps in knowledge and lays out the world as a giant pattern of predictable behavior.
This is where the discussion on privacy enters. Do you know what data you are sharing? How much data are you sharing? How much of it should you share in the first place?
This is why people are unnerved when they see ads and posts on Facebook that sound a little too similar to their Google searches. How valuable is your personal data? It’s so valuable that corporations spend billions and billions acquiring other companies when all they really want is their data. Your data.
The 2011 film Moneyball depicts the real-life story of how data and analytics were leveraged by Billy Beane, then general manager of the baseball team the Oakland Athletics. Beane’s challenge was how to compete against other teams with much larger budgets that could afford to put the biggest star players on their payrolls. With his staff, Beane took the unconventional route of putting together a team of undervalued and unheralded players, an “island of misfits,” based purely on a statistical model.
It didn’t matter if the guy was a confident young star on the rise or a savvy veteran with a great curveball. The question became: what was his win-share rating? What was his on-base percentage? What was his salary?
The shift in mindset broke with traditional intuition-driven scouting in favor of empirical analysis. Midway through the movie, Beane finally had his team of statistically sound players.
Yet they still kept on losing games. It turns out that the coach wasn’t playing the line-ups Beane drew up. Beane’s model had failed to account for the coach’s stubborn refusal to venture into uncharted territory. It wasn’t until Beane forced the coach’s hand (by trading away all of the coach’s favored players) that the team started using the data-driven line-ups. Right after, the team rattled off 20 wins in a row, a record-breaking achievement that carried them to the playoffs. They lost three games to two.
Data is only useful when it’s being used to drive action; otherwise, it remains a jumble of numbers and text. And there are x-factors, like the coach, that remain outside the grasp of prediction.
And while it may be the precursor to knowing the almost of everything, big data will not always spit out the exact answer you are looking for. In this massive space of bits and pieces, the compelling possibilities lie deeper: in the discovery of answers to questions that haven’t even been asked yet.
Where do you think data will take us next?