This was very interesting. I’d go so far as to say it’s the most interesting thing I’ve read all week. It’s good because it’s a clear explanation of what a top data scientist thinks “Big Data” is and what it does, and because it’s open and honest about the assumptions being worked from.
It’s also remarkable for its unashamed techno-topianism - a rather unfashionable ‘look’ for the last few years. Sandy Pentland believes big data can and will change the world for the better and is open about how this will happen.
It’s a long talk and quite a long read so I summarised it in two sections - the first “What It Is” and the second “What It Could Do”. I’m not offering any comment on it in this post, but I certainly don’t agree with all of it.
WHAT IT IS
- Big Data isn’t to do with declared data (Facebook, Google searches) – it’s about behaviour not opinions.
- Big Data is a breadcrumb trail of behavioural data. Who you are is what you do, not what you present.
- Fusing this data helps us predict behaviour – if we can see some behaviour we can infer the rest by looking at who your social group is.
- Big Data is therefore about connections, especially between people (unlike older systems analyses which left people out).
- In really large datasets “statistical significance” stops working as a guide because almost any shift is ‘real’.
- Human intuition and judgement are needed to prevent false correlations from being acted on just because they fit the model.
- Big Data will allow “social physics” – a granular look at interactions between individuals rather than thinking in aggregate terms. It will also allow better personalisation.
- Most high-level decision makers are very new to this stuff.
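To make the point about statistical significance a bit more concrete, here is a rough sketch in Python. It is an illustration only, not something from the talk: the function name, the 0.01 shift and the sample sizes are all assumptions, and it uses a textbook two-sample z-test with the same known standard deviation in both groups. The idea is that the same tiny difference is nowhere near “significant” with a hundred people per group but overwhelmingly “significant” with ten million.

```python
import math


def two_sample_p_value(mean_diff, sd, n_per_group):
    """Two-sided p-value for a two-sample z-test.

    Assumes (for illustration only) that both groups have the same known
    standard deviation `sd` and the same size `n_per_group`.
    """
    # Standard error of the difference between two group means.
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    # Two-sided tail probability of a standard normal: P(|Z| > |z|).
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # Hypothetical numbers: a shift of 0.01 standard deviations, which is
    # far too small to matter for most practical purposes.
    tiny_shift = 0.01
    for n in (100, 10_000, 1_000_000, 10_000_000):
        p = two_sample_p_value(tiny_shift, sd=1.0, n_per_group=n)
        print(f"n per group = {n:>10,}  p-value = {p:.3g}")
```

The p-value shrinks purely because the sample grows, while the size of the shift stays exactly the same. That is why, at this scale, the size of an effect and human judgement about whether it matters are better guides than the significance test on its own.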
WHAT IT COULD DO
- Because Big Data is mostly about people there are huge privacy issues.
- There are also huge opportunities to remake social systems based on how people actually interact and create a more globally fair world with stable, crash-proof systems.
- It could also help tackle issues such as the spread of disease and global warming (for example, by designing cities based on actual use and movement).
- Your data is worth more if you share it, but there will also be a drive towards transparency and away from “data silos”.
- Governments and most firms are happy with the idea of personal data ownership – Google and Facebook are holding out.
- Distributed data systems are more robust so will help protect people from centralised “grabs” of data.