Although the Bem-Honorton article on psi is written to be accessible to nonspecialists, this additional guide has been prepared for those not familiar with the concept of meta-analysis or with statistical reporting in the psychological literature.
Meta-analysis: Meta-analysis is a set of statistical procedures for combining the results from several studies to see if there is an overall effect and, if there is, to determine how big it is and whether it varies with variations in experimental procedure. Meta-analysis is thus a quantitative way of evaluating previously reported results and is rapidly replacing the older qualitative literature review. Our discussion of the psi ganzfeld studies illustrates many of the issues involved in meta-analysis.
The Level of Statistical Significance (p): Any numerical result published in a journal article must be accompanied by its level of statistical significance, designated by the letter p: the probability that a result at least that extreme would arise if only chance were operating. By convention, we are permitted to call a result "statistically significant" if p is less than .05 (or 5% or 1/20). To say that a result is statistically significant does not mean that it is necessarily an important result or that readers are morally obligated to be convinced; it simply means that such a result would be unlikely to occur by chance alone.
For example, suppose I claim that a coin is biased toward heads. I flip it 3 times and it comes up heads every time. We can calculate the statistical significance of this result by noting that this 3-trial experiment could have come out 8 different ways: HHH, HHT, HTH, HTT, THH, THT, TTH, and TTT. If only chance is operating, then all 8 outcomes are equally likely, and hence the observed outcome (HHH) could have occurred by chance 1/8 or .125 of the time. This is not less than .05 and thus this result is not statistically significant. On the other hand, if we got 5 heads out of 5 flips, this could occur by chance only 1/32 of the time; this is less than .05 and is thus statistically significant. The hypothesis that only chance is operating is called the "null" hypothesis; if we get a statistically significant result, we say that we can "reject the null hypothesis."
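The coin example can be checked by brute-force enumeration. The following Python sketch is an illustration added to this guide, not part of the article; the function name chance_probability_all_heads is hypothetical. It lists every equally likely sequence of flips and counts how many consist entirely of heads.

```python
from fractions import Fraction
from itertools import product


def chance_probability_all_heads(n_flips):
    """Probability that a fair coin comes up heads on every one of n_flips."""
    # Enumerate every equally likely sequence, e.g. HHH, HHT, ..., TTT.
    outcomes = list(product("HT", repeat=n_flips))
    all_heads = [seq for seq in outcomes if all(flip == "H" for flip in seq)]
    return Fraction(len(all_heads), len(outcomes))


for n in (3, 5):
    p = chance_probability_all_heads(n)
    verdict = "significant" if p < Fraction(5, 100) else "not significant"
    print(f"{n} heads in {n} flips: p = {p} = {float(p):.5f} ({verdict})")
```

For 3 flips this gives 1/8 (not significant), and for 5 flips it gives 1/32 (significant), matching the figures above.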
But even if we decide to reject the null hypothesis, we do not thereby necessarily commit ourselves to believing the experimenter's own hypothesis. The finding might have arisen because of bad experimental methodology or some other cause (e.g., perhaps the coin is not biased but I know how to flip it so that it comes up heads more often than chance would predict).
You now know all you need to know. In reading our article, you can simply look for the p value each time and ignore the details that describe how the statistical tests were conducted. For example, when we note that "the composite Stouffer z score across the 28 studies was 6.60 (p = 2.1 x E-11)", you don't have to know anything about Stouffer's z; just look for the p value. The notation "E-11" means 10-to-the-minus-11th power; thus the pertinent information here is that the probability that this result could have occurred by chance is .000000000021. In other words, not very likely!
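For readers who are curious anyway, here is a minimal Python sketch of how a z score is turned into a p value. It assumes the standard unweighted form of Stouffer's method (the sum of the individual z scores divided by the square root of the number of studies) and a one-tailed test; the article's exact computation may differ. The function names and the four example z scores are hypothetical. Only the composite value of 6.60 comes from the article.

```python
import math


def stouffer_z(z_scores):
    """Unweighted Stouffer combination: sum of z scores over sqrt(count)."""
    return sum(z_scores) / math.sqrt(len(z_scores))


def one_tailed_p(z):
    """Upper-tail probability of a standard normal distribution at z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical z scores for four small studies (illustration only).
example = [1.2, 0.4, 2.1, 1.5]
combined = stouffer_z(example)
print(f"combined z = {combined:.2f}, p = {one_tailed_p(combined):.4f}")

# The composite z score reported in the article.
print(f"z = 6.60 -> p = {one_tailed_p(6.60):.1e}")
```

Under these assumptions, a z of 6.60 gives a p of about 2.1 x E-11, which agrees with the value quoted above.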
The Effect Size: The next important statistic is the effect size: Given that there is a statistically significant effect, how big is it? We provide two measures of effect size. The first is simply the hit rate: How often did a subject correctly identify the target stimulus? Because the subject selects from 4 alternatives, he or she would be expected to select the target correctly 25% of the time by chance. We report that the actual hit rate achieved over many studies is about 34%. (This may not seem spectacular, but if you could achieve this margin over chance at a casino, you would be very rich indeed.) We also use a measure of effect size called pi which indicates how big the effect would be if translated to the coin flipping case. For example, the 34% hit rate is equivalent to having a coin come up heads 61% of the time.
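The conversion from hit rate to pi can be sketched in a few lines of Python. This assumes that pi is the proportion index of Rosenthal and Rubin, pi = P(k - 1) / (1 + P(k - 2)), where P is the observed hit rate and k is the number of alternatives; the function name proportion_index is hypothetical.

```python
def proportion_index(hit_rate, n_alternatives):
    """Convert a hit rate among k alternatives to its two-choice equivalent."""
    k = n_alternatives
    return hit_rate * (k - 1) / (1 + hit_rate * (k - 2))


print(f"{proportion_index(0.34, 4):.2f}")  # 34% hit rate with 4 alternatives
print(f"{proportion_index(0.25, 4):.2f}")  # chance hit rate with 4 alternatives
```

With 4 alternatives, the chance hit rate of 25% maps to .50 (a fair coin) and the 34% hit rate maps to about .61.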
The significance level, the effect size, and the size of the study (how many sessions there were) are all interrelated. If an effect is very small, then many more trials are required to reach statistical significance. For example, if a coin is only slightly biased (say it comes up heads 52% of the time), it will require about 1,700 flips to detect the bias at the conventional .05 level.
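The figure of about 1,700 flips can be reproduced with a rough calculation. The Python sketch below is one way to arrive at it, under assumptions not stated in the text: a one-tailed test at the .05 level, the normal approximation to the binomial, and an observed proportion of heads exactly equal to the coin's true rate. The function name flips_needed is hypothetical.

```python
import math
from statistics import NormalDist


def flips_needed(true_heads_rate, alpha=0.05):
    """Flips needed for the expected result to reach one-tailed significance."""
    z_crit = NormalDist().inv_cdf(1 - alpha)  # about 1.645 for alpha = .05
    # Under the null hypothesis, the standard error of the proportion of
    # heads is 0.5 / sqrt(n). Solve (rate - 0.5) / (0.5 / sqrt(n)) = z_crit.
    n = (z_crit * 0.5 / (true_heads_rate - 0.5)) ** 2
    return math.ceil(n)


print(flips_needed(0.52))  # a slightly biased coin
print(flips_needed(0.61))  # a more strongly biased coin needs far fewer flips
```

Because the observed proportion will fall below the true rate about half the time, this many flips gives only about an even chance of reaching significance; a study designed to detect the bias reliably would need more.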
The Correlation Coefficient: A correlation coefficient is a number (usually designated by the letter r) that ranges from -1 to +1 and indicates the degree to which two variables are correlated. For example, the correlation in my undergraduate course this semester between my students' heights and their weights was about +.80. In the article, we report a correlation of +.20 between extraversion and ESP performance in the studies reviewed. This means that extraverts do slightly better than introverts. We also report a correlation of -.64 between the number of sessions in a study and the effect size. The negative sign means that studies with more sessions had smaller effect sizes. Whenever a correlation is reported, we also give its significance level, which indicates how likely it is that a correlation of that magnitude could have occurred by chance.
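For readers who want to see how r is computed, here is a minimal Python sketch of the usual (Pearson) correlation coefficient. The heights and weights are made-up numbers for illustration, not data from the course or the article, and the function name pearson_r is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means, and sums of squares.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical heights (inches) and weights (pounds) for six students.
heights = [62, 64, 66, 68, 70, 72]
weights = [120, 140, 135, 160, 155, 180]
print(f"r = {pearson_r(heights, weights):+.2f}")
```

Two variables that rise together in perfect lockstep give r = +1, and two that move in exactly opposite directions give r = -1.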