Grow Data Awareness

March 2, 2017. 15-minute read.

I picked up Big Data: A Revolution That Will Transform How We Live, Work, and Think by Viktor Mayer-Schönberger and Kenneth Cukier expecting a technical book about Hadoop clusters and MapReduce algorithms. What I got instead was a philosophical shift in how I think about data itself.

The book covers the well-known 5 V's of Big Data (Volume, Velocity, Variety, Veracity, and Value). The V that struck me hardest was Value. Not value in the abstract "data is the new oil" sense that everyone parrots, but two specific concepts that changed how I see the world: the value of data exhaust and the value of open data.

Data Exhaust: Your Digital Shadow Has Worth

Data exhaust is the trail of data you leave behind as a byproduct of doing something else. When you search on Google, the search results are the product. But the data about what you searched, when, how you phrased it, what you clicked, and what you ignored: that is data exhaust. It was never the point. It was the residue.

Mayer-Schönberger and Cukier argue that this exhaust is often more valuable than the primary data. Google did not become an advertising giant because of its search algorithm alone. It became one because it realized the exhaust (search behavior patterns) could predict intent. And intent is what advertisers pay for.

This concept hit me because it applies everywhere. A university's student attendance system generates data exhaust (patterns of when students skip, which courses have declining attendance, which time slots are dead). A hospital's appointment system generates exhaust (no-show rates, seasonal patterns, demographic correlations). A city's traffic light system generates exhaust (flow patterns, congestion hotspots, accident-prone intersections).

Most organizations throw this exhaust away. They collect the primary data (the attendance record, the appointment, the traffic count) and ignore the patterns hiding in the residue. The book made me realize: the exhaust is the gold. You just need to know how to refine it.
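To make "refining the exhaust" concrete, here is a minimal sketch using the attendance example. The records, field names, and the `absence_rate_by` helper are all made up for illustration; a real attendance system would have its own schema. The point is only that the same rows collected for record-keeping can be regrouped to expose which time slots are dead.

```python
from collections import defaultdict
from typing import NamedTuple


class Attendance(NamedTuple):
    # Hypothetical schema, assumed for this sketch only.
    student_id: str
    course: str
    time_slot: str
    present: bool


# Primary data: one row per student per session (invented sample rows).
records = [
    Attendance("s1", "Databases", "Mon 07:00", False),
    Attendance("s2", "Databases", "Mon 07:00", False),
    Attendance("s3", "Databases", "Mon 07:00", True),
    Attendance("s1", "Networks", "Wed 10:00", True),
    Attendance("s2", "Networks", "Wed 10:00", True),
    Attendance("s3", "Networks", "Wed 10:00", False),
    Attendance("s1", "Algorithms", "Fri 13:00", True),
    Attendance("s2", "Algorithms", "Fri 13:00", True),
]


def absence_rate_by(rows, field):
    """Group rows by the given field and return {value: share of absences}."""
    totals = defaultdict(int)
    absences = defaultdict(int)
    for row in rows:
        key = getattr(row, field)
        totals[key] += 1
        if not row.present:
            absences[key] += 1
    return {key: absences[key] / totals[key] for key in totals}


# The exhaust: patterns nobody set out to collect.
by_slot = absence_rate_by(records, "time_slot")
by_course = absence_rate_by(records, "course")
worst_slot = max(by_slot, key=by_slot.get)
```

Calling `absence_rate_by(records, "student_id")` on the same rows would surface a different pattern again, which is the whole idea: one primary dataset, many possible refinements.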

Open Data: When Sharing Multiplies Value

The second concept that stuck with me was the value of open data. The book presents cases where governments and organizations released their data publicly, and the resulting innovations far exceeded anything the original data collectors could have imagined.

The classic example: the US government opened GPS to civilian use. It had built the system for military navigation. But once the positioning signal was freely available, civilians built everything from car navigation to Uber to Pokémon Go. The value created by opening it up dwarfed the original military application by orders of magnitude.

This is the counterintuitive insight: data often becomes more valuable when you give it away. Not because you are being generous, but because you cannot predict all the ways data can be used. A single dataset in the hands of a thousand creative minds will generate a thousand applications you never imagined. Keeping it locked in a silo limits it to the imagination of whoever collected it.

For Indonesia, this is especially relevant. We have massive amounts of government data (education statistics, health records, agricultural output, transportation patterns) sitting in silos across ministries and local governments. If even a fraction of this data were opened in machine-readable formats, the innovation potential would be enormous. Startups could build tools we have not even conceived yet.

Why This Matters for Me

What struck me even deeper was how the book's concept of data collection aligned with what I had already been thinking about in my own research. In my 2014 paper "Redefining Data Provider: The REST Approach To Solve Indonesia Lecturer Administrative Problems", I was essentially tackling the same question from a different angle: how do you gather, structure, and redistribute data that is scattered across disconnected systems? The book talks about it at a macro level (governments, corporations, entire industries). My research dealt with it at the micro level (lecturer data across Indonesian universities). But the core principle is identical: data that sits in isolation loses value. Data that flows, connects, and becomes accessible multiplies in value.

Reading this book felt like finding the theoretical foundation for what I had been building instinctively. The REST-based data provider I designed was, in essence, an attempt to turn siloed administrative data into something open and reusable. The book gave me the vocabulary and the framework to articulate why that matters beyond just "making the system work."
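For readers who have not seen the idea in code, the following is a rough sketch of what "turning siloed data into something reusable" can look like. It is not the design from the paper. The two in-memory "systems", the shared lecturer ID, the `/lecturers/<id>` path, and every function name are assumptions made up for this illustration.

```python
import json

# Hypothetical silos: each system holds different fields for the same
# lecturer, keyed by a shared identifier (an assumption of this sketch).
HR_SYSTEM = {
    "L001": {"name": "Lecturer A", "department": "Informatics"},
}
RESEARCH_SYSTEM = {
    "L001": {"publications": 12},
}


def get_lecturer(lecturer_id):
    """Merge what every silo knows about one lecturer, or return None."""
    merged = {}
    for source in (HR_SYSTEM, RESEARCH_SYSTEM):
        merged.update(source.get(lecturer_id, {}))
    return merged or None


def handle_get(path):
    """Resolve a path like /lecturers/L001 to (status code, JSON body)."""
    parts = path.strip("/").split("/")
    if len(parts) == 2 and parts[0] == "lecturers":
        record = get_lecturer(parts[1])
        if record is not None:
            body = {"id": parts[1], **record}
            return 200, json.dumps(body, sort_keys=True)
    return 404, json.dumps({"error": "not found"})


status, body = handle_get("/lecturers/L001")
```

A real service would sit behind an HTTP server and talk to actual databases, but the shape is the same: one addressable resource per lecturer, assembled from sources that previously could not see each other.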

As a lecturer in IT, this book rewired how I teach data-related courses. I no longer start with "how to store data" or "how to query data." I start with "what is the value of this data?" and "who else could use it?"

It also influenced my own projects. When I worked on the Ministry of National Education database, I was focused on integration and deduplication. After reading this book, I started asking: what data exhaust does this system generate? What patterns are hiding in the educator data that nobody is looking at? What would happen if parts of this data were opened for researchers?

The answers to those questions are more interesting than the database schema itself.

If you work with data in any capacity (and in 2017, who does not?), read this book. Not for the technical details. For the mindset shift. Data is not just something you collect and store. It is something that grows in value when you look at it from unexpected angles, when you combine it with other data, and when you let others see it too.

The most valuable data is often the data nobody meant to collect.
@hepidad