Blackbeard Blog

This is a blog by Tom Ewing about the intersection of online culture and market research. I work for BrainJuicer in this area: everything on this blog is my own personal viewpoint, rather than BrainJuicer's. Here is a good place to start if you're interested in what I think about all this stuff. Contact me at Tom.Ewing@brainjuicer.com, or via @tomewing on Twitter.
Oct 07

SMALL DATA

This is Thucydides. He gets called “the father of modern history” sometimes. He doesn’t get to be the father of history overall because that was Herodotus. But Herodotus kept putting awesome stuff in his book like monsters, giant snakes, visions of the divine, crocodiles etc. Thucydides’ book is by and large crocodile-free. Like, it has to be said, most modern history books. So there you go.

I did ancient history at school and university. I don’t know how many other researchers did; I guess marketing and business and suchlike are more common. Doing ancient history meant I had to read Herodotus and Thucydides and a handful of other sources. When I first encountered them the subtext was that Thucydides was SRS BUSINESS and reliable, whereas Herodotus was a bit of a bullshit artist but he was the best source we had.

Then at university I discovered the other interpretation: Thucydides was a good deal less reliable than he thought, but he’d worked out how to write all reliable-like. Herodotus, meanwhile, might or might not have been credulous re. the snake monsters but he was a much sharper historian, because he knew how important it was to record people’s customs and beliefs: what they said about what they did, as well as what they did. He was the first interdisciplinarian.

This idea was vindication for my own personal prejudices, so of course I liked it a lot.

Anyway it hardly mattered, because you had to read both. To do ancient history was to be plunged into the world of small data - where every inscription, every shard and chit, every scrap of archaeological evidence, and most certainly every text had to be scoured for meaning, its authenticity and reliability carefully judged.

THE AUTHORITY OF DATA

Doing this I realised how sensitive historians were to the authority of data - and how this authority was something they themselves helped create. The methodical mind and impassive style of Thucydides created one kind of authority, the digressive enthusiasm and bottomless curiosity of Herodotus another. Since these were, and would forever remain, our main sources, we came back to them again and again, preferring one but relying on both.

At a conference last week I heard people talking - as they do these days - about “big data”. And I started wondering about the authority of data, and where it comes from these days. The authority of small data derived partly from style and methodology but mostly from its existence: there wasn’t much, so you worked with what you had.

The authority of big data must come from somewhere else. Authority isn’t the same thing as validity, by the way. Often they go hand in hand, but sometimes the authority of a data point sweeps its validity away. The 90:9:1 rule; the Pareto Principle; the idea that Henry Ford once said something about faster horses - these are authoritative without being necessarily valid. You might call authority the “ring of truth”. It’s a bundle of factors - style, truthfulness, timeliness, source, spreadability….

Where does authority come from now? It’s more important than ever, after all: not only is there more data produced than ever, that data is sliced, presented and made public in more ways than ever. There’s always someone out there with a different dataset than yours, and potentially more authority. In the world of small data every piece is valuable - in the world of big data you’re looking for reasons to discard as much as you can.

Part of the clue is in the phrase “big data”: right now scale is a big driver of authority. If you’ve analysed a billion tweets, that’s better than a million, right? No more significant, maybe, but it sounds like it must be. If you’re able to talk at world population level - 4 billion mobile users, 750 million (or is it 800 million?) Facebook users - it looks even better. Aggregation helps guarantee authority, it seems.

Owning the data helps, too - the big online services, in particular, latched quickly onto the fact that there were a lot of people hungry for information out there, and have worked to provide it. Facebook, LinkedIn, Twitter et al. regularly dripfeed stats, metrics and mind-boggling data into the infosphere - often through some kind of boffinish “data blog” type of outlet - and my feeling is that these numbers have a high level of authority. Even if they almost certainly don’t have much to do with the data the services actually use.

And, just as in the days of Herodotus and Thucydides, style is a source of authority. Nowadays it’s particularly visual style. A well-designed infographic spreads, but - a hypothesis only - I suspect it also feels more right than less attractively presented information.

It’s important to stress that none of these three sources of authority has anything to do with “quality”, “validity”, “truth”, “insight”, and so on. If you’re lucky enough to have data which does trade on these noble qualities then I would hope it gains the authority it deserves. But it’s far from certain it will.
