by Scott Ramsay

If you’ve ever been told to critically appraise or critically analyse a source, you’d be forgiven for wondering what exactly that might involve. Well, it turns out scientists aren’t necessarily all that great at doing it either (or, at least, they’re creative when it comes to avoiding their own shortcomings).

If it’s a scientific paper, one of the first places to look for critical analysis fodder is the data section. Do the authors present numerical measurements or recordings? Are they trying to say something about a hypothesis? If so, there’s a good chance they’ll have done some statistical analysis (don’t worry – I’ll keep it light). Broadly speaking, the point of a statistical test is to determine whether there’s a REAL difference between two or more things that have been measured. Perhaps a coin was flipped 100 times and the authors found that it landed heads-up 48 times and tails-up 52 times. It’s a difference, definitely, but is it really outwith the range of results you’d expect? It’s very uncommon to find a system without random fluctuations, and for this reason we need to distinguish between *statistically significant* and non-significant results.

Traditionally, researchers have based *significance* on a rule of thumb built around the number 20. If you feed all of the information about your experiment into a stats test and the test says that 10 times out of 20 (i.e. half the time) your measured result would have appeared just by random chance, we say it’s not statistically significant. If, however, the test says that you’d need to repeat the experiment 1,000,000 times in order to get your result had everything been occurring at random, you can be pretty certain that you’ve not just been a lucky researcher and obtained that 1 in a million result, but rather that things *weren’t* happening at random. A clearer way of putting it is to say that your result was actually caused by your actions (e.g. a drug treatment).
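The “how often would chance alone produce this?” idea can also be sketched as a simulation (again my own illustration, reusing the coin example): flip 100 fair virtual coins thousands of times and count how often the split is at least as lopsided as 48–52:

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable
repeats = 10_000
observed = 48  # heads seen in the real 'experiment'

extreme = 0
for _ in range(repeats):
    # one simulated experiment: 100 flips of a fair coin
    heads = sum(random.random() < 0.5 for _ in range(100))
    if abs(heads - 50) >= abs(observed - 50):
        extreme += 1

print(extreme / repeats)  # roughly 0.76
```

Around three-quarters of the purely random runs look at least as “lopsided” as the real result, which is exactly why a stats test would shrug at a 48–52 split.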

This is where we come to the climax. As I mentioned briefly above, scientists have historically chosen something a bit less restrictive than 1 in a million as a measure of statistical ‘proof’ – if your stats test shows that you’d need to repeat an experiment 20 times in order to have found your result at random, it gets the scientist’s seal of approval. The way a researcher talks about this in a report is to say that the *probability* (let’s call it “p”, or “the p value”) of such a result occurring is *less than 1 in 20*. 1 divided by 20 equals 0.05, so the shorthand for this magic cut-off is “p < 0.05”. That means that stats tests that churn out p values of 0.06, 0.07, 0.08 and so on, which are larger than 0.05, are bad news for researchers, and that p values of 0.049, 0.045, 0.04, 0.03 and so on, which are smaller, are what every scientist hopes to find.
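In code, the cut-off rule boils down to a single comparison (the `verdict` helper and the p values below are made up for illustration; 0.05 is the conventional 1-in-20 threshold):

```python
alpha = 0.05  # the traditional 1-in-20 cut-off

def verdict(p):
    """Hypothetical helper: apply the conventional significance rule."""
    return "statistically significant" if p < alpha else "not significant"

for p in [0.06, 0.049, 0.03]:
    print(f"p = {p}: {verdict(p)}")
```

Note the strict inequality: a p value sitting exactly at 0.05 doesn’t make the cut, which is precisely what makes results *just* over the line so painful.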

But what do you do if you’ve spent months and months doing your meticulous research, spending squillions of [insert your favourite currency here] and sleeping under the desk in your office… just to find you’ve got a result with a p value *just very slightly* on the wrong side of the line?

Well, it turns out there are 509 (and counting) different phrases researchers use to cheat! Have a peek at the link below before you critically analyse your next source. With that many variations on a cheat’s phrase to be aware of, there’s a significant probability you might just spot one yourself…

http://mchankins.wordpress.com/2013/04/21/still-not-significant-2/