Big Data and small data

We live in an era of Big Data. More information is being recorded and stored in more ways than ever before. I am reminded of a phrase that a friend uses, light-heartedly, about the super-rich: they have “more money than a million people could count in a million years”. That phrase, and the way he delivers it, always makes me smile. Something similar is happening with data. There is more information being gathered than a million people could count in a million years. That’s why we use machines. Of the millions of ways that information is being recorded locally, nationally, globally, at regular intervals, which ones should I use to illustrate the point? Here are two.

Since the congestion charge was introduced here in London, for vehicles moving through the centre of town on weekdays, every single number plate entering the charging zone in its hours of operation has been photographed. ANPR (Automatic Number Plate Recognition) links up this information with data held about every car in the UK and whether the congestion charge has been paid. As drivers we only see the results of this process if we fail to pay the charge. We are sent photographic evidence of our vehicle in the charging zone and are fined. It has happened to us as a family once, back in the days when the zone was extended out to Shepherd’s Bush. Imagine how many people would be needed if this process were conducted manually: to note every number plate as it entered central London, and to cross-check each one against driver details and whether the fee had been paid. Of course it wouldn’t work that way. We’d probably have toll-booths, staffed by human beings to collect the fees, and huge delays to enter or leave the centre of town.

Another example of millions of bits of data being gathered and processed quickly is parkrun results from events taking place around the world every weekend. Information about every completed race, for every runner who has registered and had their parkrun barcode scanned, is available within an hour or two to anyone who can access the web. The availability of so much information, about so many runners, with such immediacy, is still a pleasant surprise to me. The only information about the times and distances that I ran as a child was held in notebooks (which I no longer have) and a pocket diary (which I still have, and wrote about here). Again, if parkrun details were collected manually, imagine how many human hours would be needed to record and share information from each weekend’s events.

We do not have much control over the Big Data being collected about us, about our shopping habits or web searches, our use of social media or the journeys we take, but we should be able to control small data, the information we store for ourselves. I have spent rather a lot of time in recent days thinking about this small data. I have been reviewing information that I stored more than 10 years ago, information that is not readily available to anyone apart from me. It began with a review of some of the hundreds of thousands of words that I typed in 2006 and 2007. As I noted in this piece in 2015:

“I write a thousand words per day at least five days a week. Or, more accurately, I type a thousand words per day, at least five days a week. I started in January 2006 and now, nearly ten years later, I have finally set up this Blog to allow some of these millions of words to be read by others. Welcome to my world.”

In the early weeks of 2007 there were days when I would type 10,000 words or more, getting into the habit of writing regularly. I recorded the minutiae of our daily lives, and produced early drafts of pieces that have now appeared on this Blog and in “1000 Memories”. Included in these thousands of sentences there is a huge amount of detail about our daughter’s first months – her sleep patterns, feeding patterns, visits to post-natal clinics and so on – and plenty of detail about our son’s development. (He was two years old at the time.)

Also, in those early years, we used our digital camera just about every day to take photos of both children. These photos have been backed up in every conceivable way (CDs, DVDs, USB sticks, cloud storage) and copied onto every computer hard disk that we have used since then, so if we want to see what the children looked like, or what they were wearing, or where we went with them on any given day we can do so. Nothing has been lost.

I spent several hours last week reviewing 2007 in far more detail than I have ever reviewed any other period of my life. It was prompted by memories of my Uncle Jimmy, who featured in two pieces last month (“The Gambler” and “Uncle Jimmy and a trip to The Emperor”). I wanted to check, specifically, the date that he died (5 February 2007) and found that I had written many thousands of words about him, about the funeral and about our families. I had even written an early draft of that piece about “The Gambler”, and the words that I used about our trip to the Woodman pub in Southampton were almost the same as the words I drafted afresh last month. The way I remembered things in 2007 is clearly the way I still remember them. My memories haven’t changed.

I also viewed every photo taken in those first weeks of 2007 and then zig-zagged back and forth through the years either side. I relived February 2006 and February 2008 and then returned to the present day. Although I have categorized this information (the photos, the thousands of words typed) as small data, it took several hours of my time to review just a few weeks’ worth. I create and store less of this information now, and share more of it. I no longer type as many as 10,000 words in a day but usually publish at least 10,000 words a month on this Blog. We gave up taking daily photos of the children when they were still toddlers, but there are still over 12,000 pictures from that first digital camera, recording the years from 2004 to 2010. Since then we have mostly used the cameras on our phones to take pictures and videos. Even though we create and store less of this “small data”, it would still take thousands of human hours to review it all. It’s not “more information than a million people could count in a million years”, but it’s still pretty daunting.

