Business Standard

Of human lies and digital truths

Everybody Lies makes Big Data seem like a lot of fun.

The conventional approach to data ignores a large chunk of information: behaviour patterns on the internet. Take door-to-door surveys. People are often less than candid in surveys, which distorts the conclusions drawn from them. Conventional data also does not allow us to zoom in on specific data subsets. This is where Big Data swoops in. For instance, think of how many people are likely to admit to being racist in a conventional door-to-door survey. Not many, surely. The author, however, demonstrates that their online behaviour tells a different story. He shows how running a search on Google Trends with the right keywords can yield astounding insights into consumer behaviour, racism, and even criminal tendencies.

Mr Stephens-Davidowitz begins the book with a Thanksgiving anecdote about his family urging him to get married, and advising him on the kind of woman he must marry. Aside from being completely relatable, this anecdote sets the tone for the rest of the book. Mr Stephens-Davidowitz uses this incident to explain that data science is intuitive because it is all about spotting patterns in behaviour and predicting how one data point will impact another. In fact, our “gut feeling” is probably our most trusted subconscious dataset. Over the following chapters, he adds nuance to this observation.

Just like the gut, Big Data is best when it is intuitive and simple. So, the more complicated the data analysis, the more it fails. Mr Stephens-Davidowitz also places heavy emphasis on data available on Google, Facebook and other websites, turning innocuous information on the internet into a data goldmine. In fact, an entire chapter in the book is dedicated to Freudian slips, and how Big Data from the internet can be used to debunk the supposed connection between slips of the tongue and Freudian slips.

Mr Stephens-Davidowitz argues, and reiterates through the book, that Big Data has four big powers: It keeps offering new types of data; it is honest; it allows zooming in on small subsets; and it allows causation to be detected.

True to the data scientist in him, the author dedicates a few case studies to explaining the first holy tenet of data scientists: Correlation is not causation. The gut can sometimes be wrong. He explains this using counter-intuitive case studies across the book, conceding that sometimes “the world works in precisely the opposite way as I would have guessed”.

In the second part, the author unravels Big Data’s prophetic powers. If you ask the right questions, a good dataset can tell you how successful you will be one day. Big Data is also big on doppelgangers, the author shows. It relies on the information it has on people similar to you, and draws logical conclusions about you. Mr Stephens-Davidowitz submits that these discoveries can be milked to make pointed predictions about human behaviour.

In the final part, Mr Stephens-Davidowitz skilfully addresses Big Data’s leading worry: Does it threaten personal privacy? The author does not think so. He concludes that Big Data cannot predict an individual’s actions based on her online history. While it may be possible to predict the actions of clusters of people (for instance, which district is least likely to vote at the upcoming elections), it is not possible to apply the same logic to individuals, not just because it is unethical but also because it is impractical. This is probably why, even if a person googles how to murder someone, it is unlikely that the police will come after her immediately. Big Data thankfully leaves our embarrassing (and sometimes worrisome) searches alone.

However, addressing a tangential concern, Mr Stephens-Davidowitz says nothing stops companies from using Google to know a person better. Banks can determine their borrowers’ creditworthiness and potential employers can gauge a candidate’s employability on the basis of the search results. Still, if it’s any consolation, Big Data empowers consumers equally, potentially allowing them to impact corporations (for instance, the author observes that customer reviews on Yelp have been shown to impact restaurants’ revenues significantly).

Arguably, Big Data and data protection are topics of the future but, often, their analyses are too technical to comprehend. Everybody Lies, on the other hand, superbly demystifies Big Data for the reader. It breaks down technical aspects of data science with ease and engages the reader with fascinating data experiments. But above all, this book reminds the reader that although everybody lies, Big Data is the powerful digital truth serum we need.

Everybody Lies: Big Data, New Data and What the Internet Can Tell Us About Who We Really Are
Seth Stephens-Davidowitz
Dey Street Books
352 pages; ~884