The Department of Psychology
EVIDENTIAL FLAWS IN STATISTICAL REASONING:
Things I Have Learned (so far)
I spend a lot of time thinking, and I often find myself stubbornly preoccupied with trying to solve a problem by finding a solution to an analogous, but easier, problem. I like to believe that this sort of thinking goes on with or without the aid of formal education. That is not to say, of course, that people think effectively about subjects of which they have no knowledge, or that any of us can, without instruction, solve the kinds of problems one finds, say, in statistics textbooks. But over the past year, I've had the opportunity to work with psychology students (some of whom I've found brighter than myself), and I am continually amazed to discover that, without any formal experience with statistics, students nevertheless make inferences, assign effects to causes and causes to effects, imagine the consequences of potential courses of action, and so on.
Despite this observation, however, I have noticed that students' difficulties with understanding statistical concepts such as probability theory and hypothesis testing stem, in part, from a weak understanding of how to evaluate evidence. Consequently, I have observed some patterns of thinking about statistics that are inconsistent with a correct technical understanding, yet appear to be prevalent and pervasive. I should note here that my motivation for paying attention to how people make persistent errors in reasoning and problem solving emanates from a combination of my own early frustrations with trying to understand the particulars of statistics and a lifetime of struggles with mathematics in general.
Confusion over Significance and Chance: I recently heard a student proclaim in lab, "Oh-my-God, the p-value is less than .0001 - that's a damn big effect!" There is a tendency for students to equate asterisks in tables with a sense of satisfaction in discovering important results. However, significance, as it is understood in the statistical sense, depends as much on sample size and experimental power (the probability of avoiding a Type II error) as it does on the strength of an effect. As many statisticians would tell you, with low power you may be overlooking big effects; with excessive power you may be gazing at microscopic effects of no real value. The p-value merely indicates how probable data at least as extreme as yours would be if the null hypothesis were true; it says nothing about the size or importance of an effect.
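The dependence of significance on sample size can be made concrete with a short calculation. The sketch below is a toy illustration of my own (not from any course material): it computes the two-sided p-value of a one-sample z-test for the same small standardized effect (d = 0.1) at several sample sizes, assuming a known standard deviation of 1.

```python
import math

def z_test_p(effect_d, n):
    """Two-sided p-value of a one-sample z-test for a standardized effect d,
    assuming the population standard deviation is known to be 1."""
    z = effect_d * math.sqrt(n)
    # erfc gives the two-tailed tail area of the standard normal for |z|.
    return math.erfc(z / math.sqrt(2))

# The effect is identical in every row; only the sample size changes.
for n in (25, 400, 10000):
    print(f"n = {n:>6}: d = 0.1, p = {z_test_p(0.1, n):.4g}")
```

The effect never changes, yet the p-value swings from clearly non-significant at n = 25 to astronomically small at n = 10,000, which is exactly why a tiny p-value cannot be read as a big effect.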
Gracefully Accepting the Null Hypothesis: A derivative of the idea that students often overemphasize highly significant results is the observation that students rarely appreciate non-significant (i.e., null) results. Greenwald (1975) has wittily proposed a list of unfavorable behavioral symptoms of the prejudice against accepting the null hypothesis. I like to think of these behaviors as sins of commission (e.g., changing otherwise adequate operationalizations of variables when unable to obtain rejection of the null hypothesis, and continuing to revise until the null hypothesis is at last rejected) and sins of omission (e.g., failing to report initial results because the null hypothesis was not rejected).
Indeed, the point is commonly made in experimental and statistics courses that theories predict relationships between variables and that finding these relationships (i.e., non-null results) helps to confirm theories and thereby advance science. This argument, however, ignores the fact that scientific advance is often most powerfully achieved by rejecting theories. It is my belief that one convincing way of doing this is to demonstrate that the relationships predicted by a particular theory are not obtained, and this requires accepting a null hypothesis.
Applying Causal Inferences by Using Sophisticated Statistics: I, unfortunately, had to learn this one the hard way! We have all had it drilled into our heads that "correlation is not causation". Nevertheless, students interpret this statistical slogan as implying that correlational analyses cannot be used for causal analysis; or, my all-time favorite, "ANOVAs are necessary and sufficient for causal inference." Neither of these notions is correct. Helberg (1995) points out that if you randomly assign the values of a predictor variable, it is perfectly acceptable to use a correlation or regression coefficient to support causal inferences. Conversely, it does not matter what sort of painstakingly complicated ANOVA design you think up--you cannot make causal inferences without random assignment. The bottom line on making causal statements is that you must have random assignment!
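To see why random assignment, not the choice of statistical machinery, is what licenses a causal claim, consider this toy simulation (the data are entirely hypothetical and of my own invention). A binary treatment is randomly assigned, so a plain regression slope recovers the true causal effect:

```python
import random

random.seed(42)

def ols_slope(x, y):
    """Slope of the ordinary least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

# Randomly assign treatment (0 = control, 1 = treated); true effect is 2.0.
n = 5000
treatment = [random.randint(0, 1) for _ in range(n)]
outcome = [2.0 * t + random.gauss(0, 1) for t in treatment]

print(f"estimated effect: {ols_slope(treatment, outcome):.2f}")  # close to 2.0
```

Run the same data through a one-way ANOVA and you reach the same conclusion, since with two groups the ANOVA F-test is equivalent to the regression slope test (F = t squared). Remove the random assignment, and neither analysis would support a causal claim.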
For the past four quarters, I've served as a lab instructor for Psy 302 (Inferential Statistics). In that capacity, I've learned that probability and statistical concepts are difficult for students to grasp and often conflict with many of their beliefs and intuitions about data and chance. I suspect this is so because students do not come to class as "blank slates" waiting to be filled, but instead approach learning statistics, as they would approach learning most things, with significant prior knowledge. In learning new concepts, I believe students interpret the new information in terms of the ideas they already have, constructing their own meaning by connecting the new information to what they already believe to be true. Unfortunately, I think this is where most of the flaws in reasoning emerge.
It is my view that students' misconceptions about statistics are resilient and difficult to change. Instructors cannot expect students to ignore strong beliefs merely because they are given contradictory information in class. There are probably no quick solutions for correcting these fallacies and biases in thinking. Instead, long-term answers may lie in identifying some sort of analogue to the problem. For example, there is a concept in introductory statistics known as sampling with replacement; this concept has been found to be related to common faulty "heuristics" in the decision-making literature (e.g., gambler's fallacy, availability). The idea is that each member of the population selected for the sample is returned to the population before the next member is selected. Thus, one conceivable reason why misinterpretations of certain concepts outlast the thin veneer of course content is that students persistently and intently "sample" various hypotheses (or ideas they have about tackling a particular problem) "with replacement": even hypotheses already shown to be incorrect are returned to the pool and drawn again.
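For readers who want the textbook concept itself rather than the analogy, here is a minimal illustration of the two sampling schemes, using nothing beyond Python's standard library:

```python
import random

random.seed(0)
population = ["red", "green", "blue"]

# With replacement: each draw returns the item to the pool, so the pool
# never shrinks and repeats are possible.
with_replacement = [random.choice(population) for _ in range(5)]

# Without replacement: each item can be drawn at most once.
without_replacement = random.sample(population, k=3)

print(with_replacement)
print(without_replacement)
```

With replacement, a previously drawn item can turn up again on a later draw, just as, in the analogy, a refuted hypothesis can be drawn again.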
This learning obstacle is perhaps not unique to statistics or mathematics. As in many unfamiliar domains (e.g., driving stick-shift, giving effective presentations, dancing the M-A-C-A-R-E-N-A), students at first will probably have trouble computing statistics accurately, communicating statistical results clearly, and using statistics effectively in novel situations, unless they are permitted to practice over and over again in many different contexts. In the long run, I imagine that students will recognize their own biases in thinking only after they have been encouraged to create their own meaning for what they have learned.
"The greatest enemy of truth is very often not the lie--deliberate, contrived, and dishonest--but the myth--persistent, pervasive, and unrealistic." John F. Kennedy