Categories: Backgammon, Epistemology, Philosophy, Philosophy of Science

The Data Deluge Makes the Scientific Method Obsolete

Do qualitative and quantitative changes in our capacity to gather and process (big) scientific data change the way we do science? Might they actually usurp the scientific method itself? In fact, is there anything more to the scientific method than just analysing data?

This is a discussion of an article that appeared in Wired: “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete”.

To me the article seems a little speculative – and gives me a sense of philosophical déjà vu.

Radical empiricism

This sort of “radical empiricism” – the attempt to eliminate theory altogether – isn’t really a new idea in the philosophy of science.

In fact, you could say that the scientific process really came out of the elevation of empiricism (ie. understanding the world through direct observation) over rationalism (ie. the idea that we can come to useful conclusions about the world through the application of rational human thought alone) around the time of the Enlightenment. This overturned the Aristotelian “rational” model of the universe – the celestial spheres and ordered cosmology that had dominated medieval thinking.

Today, any science worth its salt needs a healthy dose of empiricism.

Because of the association of science (and the “scientific method”) with empiricism, there’s a tendency to go the whole way and try to derive an entire scientific worldview from empirical data – big or otherwise – without any contaminating theory at all.

The death of falsificationism

Probably the most famous instance of this was Popper’s falsificationism, and more broadly logical positivism, at the beginning of the 20th century. Falsificationism attempted to build a whole theory of science on deductive reasoning alone – allowing hypotheses to be eliminated directly by a single piece of contrary data – in order to do away with induction. Induction, unlike deduction, is not logically supportable from first principles, and so has bothered philosophers of science since Hume.

Unfortunately – or fortunately, depending on your perspective – falsificationism and logical positivism are among the few areas of philosophy where there seems to be a broad consensus that they are as dead as a doornail.

According to my favourite source, Wikipedia, logical positivism is, “as John Passmore expressed it, dead, or as dead as a philosophical movement ever becomes”. Gödel’s incompleteness theorems, and Quine’s and Kuhn’s attacks on falsificationism, are largely responsible.

What scientists actually do

In hindsight, it’s fairly clear that real science rarely operates according to falsificationism – even the Wired article notes that Newton’s laws of motion fail to explain many “falsifying” anomalies, yet we certainly shouldn’t throw them out as “wrong” – and that the long chains of assumptions needed to get from measuring instrument to empirical data (sometimes called the “theory-laden” nature of observation) rule out any purely deduction-based scientific process. Induction and theory-forming are alive and well in the philosophy of science – probably more so than at any point in the last couple of hundred years.

Where I think this gets interesting is that, as others have pointed out, purely data-based models can be very useful and effective.

So what’s the problem? Well, I can think of a real-world example where a purely statistical approach has given us a model that makes almost perfect predictions – but has led to the destruction of theory and a loss of understanding. The downside is that it involves backgammon, so bear with me…

The Backgammon Oracle

Backgammon is a game of luck and skill involving dice. There is no way in backgammon to “rationally” prove that one move is better than another – and so no way to prove how a newbie should play. The only (approximate) way is to simulate the move by performing what is called a “rollout”: essentially a Monte Carlo simulation of the results of the game after the move is made. Back in the 60s, backgammon became popular, and the experts who won tournaments started writing books teaching people how to play. They used their experience to derive simple, easy-to-learn theories (always make your 5-point!) because testing those theories empirically, by performing thousands of rollouts by hand, was laborious to say the least.
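
To make the idea concrete, here is a minimal sketch of a rollout on a deliberately toy model of the game – a pure bear-off race where each side is reduced to a pip count. Real rollouts simulate full backgammon positions (hits, points, the doubling cube, and so on), and the pip counts below are invented purely for illustration, but the structure – play the position out many times with random dice and count the wins – is the same.

```python
import random

def roll():
    """Roll two dice; doubles count four times, as in backgammon."""
    a, b = random.randint(1, 6), random.randint(1, 6)
    return 4 * a if a == b else a + b

def rollout_race(my_pips, opp_pips, games=10000):
    """Estimate our winning chances by playing the race out many times.

    my_pips and opp_pips are the pip counts left by a candidate move,
    with the opponent on roll next.
    """
    wins = 0
    for _ in range(games):
        me, opp = my_pips, opp_pips
        while True:
            opp -= roll()            # opponent rolls first after our move
            if opp <= 0:
                break                # opponent bears off first: we lose
            me -= roll()
            if me <= 0:
                wins += 1            # we bear off first: we win
                break
    return wins / games

# Compare two candidate moves by the positions they leave behind.
# (The pip counts here are made up purely for illustration.)
print("move A:", rollout_race(my_pips=60, opp_pips=65))
print("move B:", rollout_race(my_pips=62, opp_pips=65))
```

Doing this by hand for even one move means physically playing out hundreds of games – which is why the 60s experts reached for rules of thumb instead.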

Cut to today, and we have sophisticated neural-net backgammon software. eXtremeGammon can reliably beat any human player in the world in the long run.

These nets are made by taking an initial, barely competent neural net and training it by playing against itself. The interesting thing is that we can use these models to perform rollouts of any move we like, and because the net is such a strong player, it can tell us with a high degree of precision what the correct move is.
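
For a flavour of how this self-play bootstrapping works – and only a flavour; this is emphatically not how eXtremeGammon is actually built – here is a TD(0)-style sketch on the same toy race game as above, with a three-weight logistic evaluator standing in for a real neural net:

```python
import math
import random

def roll():
    """Two dice; doubles count four times, as in backgammon."""
    a, b = random.randint(1, 6), random.randint(1, 6)
    return 4 * a if a == b else a + b

def value(w, me, opp):
    """Estimated win probability for the side on roll.

    A three-weight logistic model in place of a large neural net;
    the only features are normalised pip counts.
    """
    z = w[0] + w[1] * (me / 100.0) + w[2] * (opp / 100.0)
    return 1.0 / (1.0 + math.exp(-z))

def self_play_train(games=20000, lr=0.1):
    """TD(0)-style self-play on the toy race game from the sketch above.

    Start from an arbitrary evaluator and, after every move, nudge its
    prediction for the position it just left toward the value of the
    position it arrives at (or toward the final result).
    """
    w = [0.0, 0.0, 0.0]
    for _ in range(games):
        me, opp = random.randint(30, 90), random.randint(30, 90)
        while True:
            v = value(w, me, opp)               # prediction before moving
            new_me = me - roll()
            if new_me <= 0:
                target = 1.0                    # the mover bears off and wins
            else:
                # The opponent is on roll next, so the mover's value of
                # the new position is one minus the opponent's.
                target = 1.0 - value(w, opp, new_me)
            err = (target - v) * v * (1.0 - v)  # squared-error gradient through the sigmoid
            w[0] += lr * err
            w[1] += lr * err * (me / 100.0)
            w[2] += lr * err * (opp / 100.0)
            if new_me <= 0:
                break
            me, opp = opp, new_me               # hand the dice to the other side
    return w

w = self_play_train()
# The trained evaluator should now prefer being ahead in the race:
print(value(w, 50, 70), value(w, 70, 50))
```

With only three weights, all this evaluator can ever learn is that being ahead in the race is good; the real programs do the same kind of bootstrapping with vastly richer board features and far larger networks – which is exactly why their knowledge is so hard to read back out.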

But here’s the problem – we cannot convert the knowledge stored in the program’s neural net into a human-comprehensible theory. The program has become an oracle: we ask it for the best move, and it cryptically gives us the best answer – but we can never know why it is the best move.

More mystery?

And it turns out the experts of the 60s were wrong about a lot of things. It isn’t always wrong to make your 1-point, and it isn’t always wrong to leave two blots in your home board. Most of the hard-and-fast rules – the theory – have disintegrated.

And if you’re a new player, this makes the game much harder to learn. You can play against the program, and even try to emulate it – but you can never fully replicate its play without becoming one of the best players in the world.

This gap between predictive power on the one hand, and meaning and understanding on the other, is stark here – and it is unlikely to go away, however sophisticated machine-learning models become.