Big Data: Does Size Matter?
Transcript
Welcome to the latest episode of Book Insights, from Mind Tools. I'm Cathy Faulkner.
In today's podcast, lasting around 15 minutes, we're looking at "Big Data: Does Size Matter?" by Timandra Harkness.
Big data is a big deal. It seems as though every week brings a news story about how much information large corporations have on all of us. Somewhere, a data center logs every consumer choice we make online, and an algorithm swings into action to bring new products to our attention.
It's not just what we buy, either. Simply checking out a web page sends our data to someone. Our social media profiles offer a rich harvest for marketers and retailers.
Facial recognition can see who we are. Our phones can track our movements. Even seemingly innocent domestic appliances can log details of how we live our lives, and what choices we make. And increasingly, they can influence those choices, by pushing ads for products to our phones.
So, are we doomed to a future dominated by an all-knowing data culture, in which unseen people know everything there is to know about us? Can big data help manipulate us into doing things we don't really want to do?
These are questions "Big Data: Does Size Matter?" addresses. And along the way, it puts those questions into their proper historical context.
Timandra Harkness is a stand-up comedian, writer and broadcaster. She's written on a range of subjects for newspapers and magazines in the U.K., and regularly appears on national radio. Her performances feature science, statistics, and math, which may not seem obvious subjects for humor at first. But her fascination with how they affect our everyday lives comes through strongly in this book.
"Big Data: Does Size Matter?" is for nonspecialist readers who want a general overview of the history, development, scale, and uses of data. It's also a good primer for some of the ethical issues surrounding how we share and use our own information, and other people's.
So, keep listening to hear how William the Conqueror understood the principles of big data, how truly huge data can be harnessed to track the smallest possible events, and why big data matters to our sense of identity.
The book is organized into three main parts. Part One outlines the history of the subject, from the earliest Stone Age record of tally-keeping to the development of statistics and computing.
Part Two covers the changes big data has brought about in our lives. It's pretty upbeat. The term "big data" echoes Big Brother, and we often use it in a negative sense. But big data has enabled huge positive change for many people.
Part Three gets to grips with some of the big ideas behind big data. It also asks some difficult questions about the nature of our relationship with it. For example, what happens to personal identity in a data-driven world? Are we now just the sum of our data?
We'll return to some of those questions later. But first, let's consider what the author says about how we got here.
The book has a wealth of detail, anecdotes and interviews to help readers understand the phenomenon of big data. It also has some quirky, recurring motifs. One of these is an old bone.
People have been collecting data for a very long time. One of the earliest known records of data is 57 tally marks notched into the jawbone of a wolf. It dates from 30,000 years ago.
What these marks originally represented is lost. But what they represent now is the beginning of the process of data collection and recording. This would lead eventually to supercomputers, search algorithms, and artificial intelligence.
Data collection and analysis have undergone an explosion since the development of the internet. But for much of human history, the recording and use of data have been a much more downscale, slow-paced business.
When King William the First of England wanted to record all the possessions of the English crown after the Norman Conquest, he sent inspectors to every part of the country to collect information. The Domesday Book, which resulted from this labor, was still incomplete when William died 20 years later.
But the importance of data was established. Only by knowing where everything was, and who had it, could William know for sure what he owned. More importantly, this knowledge enabled him to levy taxes accurately, and so plan for the future.
Wanting to know what the future holds has driven some of the most sustained efforts in data collection and analysis. It's also bred some remarkable theories about what statistical analysis can do.
The Marquis Laplace, an 18th-century French mathematician, imagined the universe was run by an all-knowing entity. Later thinkers called this Laplace's Demon.
You might not think this is so very different from an orthodox idea of God. But Laplace believed the Demon could explain and predict everything through the scientific analysis of data. Literally everything, including all the choices an individual might make during their life. It's a world view that might well appeal to modern developers of marketing algorithms and artificial intelligence.
Certainly, businesses have gained a huge amount from the advent of big data. Consider loyalty schemes, as an example. These capture information on every purchase a customer makes, allowing retailers to tailor offers and rewards to individuals.
As the capacity of servers has increased, so has the scope of such schemes to refine and sort data, and to profile their users. This is hugely valuable information for retailers. The couple who initially developed the store card for the British supermarket chain Tesco sold their share in it for over $100 million.
The volume and sophistication of data analysis means that retailers are increasingly able to understand more than simple buying patterns. They can begin to understand their customers' emotions, predicting when they might want to buy comfort food, or insurance, or a holiday in the sun.
If that's a little disquieting, Harkness goes on to try to blow the reader's mind in discussing the use of big data in science.
Science uses really big data. On a visit to the European Organization for Nuclear Research, CERN, in Switzerland, the author learns that the processing of data from a single experiment needs to be shared among 170 data centers in 40 countries. That's after computers have culled the data, so that only the potentially useful stuff is left.
In total, one day's experimental data amounts to 30 million gigabytes. All this to try to find the tiny anomalies that might indicate the presence of subatomic particles.
It isn't just very small things that produce very big data. Scientists need similar data-processing capacity to discover more about space. And living things pose further data-storage problems. If you want to use an MRI scanner to watch brain activity in nearly real time, for example, you will generate vast amounts of data.
Applying big data to health and medicine has achieved some significant advances. If you've got a huge amount of information and can analyze it accurately, you can spot patterns that might not be immediately obvious. That's particularly true across large populations, in which clusters of disease might be found where a researcher might not think to look for them.
But this kind of research comes with a downside. There's a risk that relatively small data groupings will gain a significance they might not deserve. We often see headlines like "Barbecued meat can cause cancer" or "Red wine prevents heart disease."
In fact, whether helpful or harmful, the effects of barbecued meat or red wine are likely to be more marginal than the headlines, and perhaps the data, would have you believe.
To some extent, this is a result of journalists looking for big headlines. But there's also a tendency in society to fear negative effects flagged up by data. This happens even when the real changes in circumstances are very small.
Harkness does a good job of keeping such downsides to data in the reader's mind, without being alarmist. For example, she describes the collection and use of social data, such as whether a subject is from a single-parent family, or the victim of domestic violence.
This data is sometimes used to stage interventions with those deemed to be "at risk." The intentions behind this are good. But the effect can be to stigmatize potentially vulnerable people. The author handles the issue thoughtfully.
This ethical dimension should be central to any consideration of the value of big data, and it's a key element of Part Three of the book.
This part is entitled "Big Ideas," and that's what it delivers. Here, Harkness gets to grips with the big questions of ethics, public policy, and personal identity. Having spent the previous two parts describing what big data is and what it does, she now addresses what it should be, and how it should be used.
Three aspects of the use of big data that generally get a bad press are facial recognition, voice recognition, and profiling technology. Perhaps that's because they seem so obviously intrusive and personal. They take aspects of our individuality and reduce them to a collection of data points.
The use of recognition and profiling software means that we're all potentially under surveillance by law enforcement agencies. The potential for such surveillance can seem like an erosion of privacy, and if you have no privacy, you can have no freedom.
At a community level, the use of profiling to predict likely trends in crime seems like a good idea. But what if inherent biases in the system lead the local police department to focus on a particular community? That can have a corrosive effect on relations between that community and law enforcement.
And what about profiling that decides the suitability of someone for a loan, or a job offer? Profiling allows banks and potential employers to collect data from a huge range of sources, including social media. If the decision-making algorithm sees a pattern it doesn't like, you're refused. And when the decision is taken by a machine, there's no scope for an appeal.
So far, so bad. Inevitably, Big Brother gets a namecheck. But it's worth remembering that algorithms only repeat the biases of the people who devise them. They understand patterns, but they don't necessarily provide accurate insights.
Fairer algorithms with fewer biases might be possible, however. And they might actually remove some of the biases that HR professionals bring to job selection, for example, even if those biases are unconscious.
One of the key takeaways from this book is that big data is often used as an engine for optimization. Whether it's getting the best search result, the best candidate, increasing customer engagement, or nudging people to change unhealthy lifestyles, big data helps us do things better.
But a sense of fear goes along with this. We fear that in seeking to know us better, people who use big data may use it to shape us into being what they want us to be. That may not necessarily be what we want to be. That raises serious questions about what constitutes human identity in a data-driven world.
At this point in the argument, Harkness makes something of a statement of faith. She writes, "The autonomous self cannot be quantified."
Algorithms can, as yet, only make decisions about things that can be counted. They can predict and nudge, but so far, they can't actually make the decisions fundamental in our lives. Those are still down to us.
The last numbered chapter in the book is an update, written between the original edition in 2016 and the paperback version, which came out the following year. Harkness acknowledges that updating a book on big data is a fool's errand; its influence grows faster than she can write about it. But she does have to process some interesting new material, on the 2016 U.S. Presidential election, and the U.K.'s Brexit referendum that same year.
So, can the use of big data help swing opinion to such an extent that it can affect elections and referenda? Can algorithms target ads to make people vote in a way they might not otherwise have done? The author thinks not, in a conclusion that retains some optimism about human critical intelligence.
She argues that fears about the negative aspects of big data are in danger of becoming a moral panic. Certainly, there are questions to be answered about privacy, profiling, and the reduction of human beings to the sum of their data. But the potential of big data to achieve good is at least as powerful as its ability to do wrong. There's nothing inherently evil about it.
The book wraps up with an appendix on how to manage your own privacy. The advice is all valuable, and well worth paying attention to. It's unlikely to provide any new insights for people who work in data-sensitive workplaces – which is most of us. But it's useful to have a list of the security options open to us, and how they can protect us.
Harkness writes with the pacy style you might expect from a seasoned stand-up comic. She switches smartly between anecdote and argument, interweaving material from interviews with experts. There's quite a lot of humor, too, although some of it is heavy-handed. The repeated footnotes about differences between British and American English, and the author's tea-drinking habits, are certainly too predictable.
"Big Data" is aimed at the nonexpert reader, which means that a lot of the content may be familiar to those who've already dipped a toe into the subject area. Even so, the book conveys the sheer scale of the sea of data that surrounds us, with a keen eye for the "wow" moment.
But perhaps the book's most important achievement is to strike a balance between enthusiasm and concern. Big data has enabled so much, and could enable so much more. But its value in the future will depend on how people use it, regulate it, and recognize its limits.
"Big Data: Does Size Matter?" by Timandra Harkness is published by Bloomsbury Sigma.
That's the end of this episode of Book Insights. Thanks for listening.