Null hypothesis testing is voodoo.
Changes in the mental state of the experimenter should not affect the objective inference of the experiment. An argument for using Bayesian data analysis instead of H0 vs Ha.

Imagine you have a scintillating hypothesis about the effect of some different treatments on a metric dependent variable. You collect some data (carefully insulated from your hopes about differences between groups) and compute a t statistic for two of the groups. The computer program that tells you the value of t also tells you the value of p, which is the probability of getting a t at least that extreme by chance from the null hypothesis.
You want the p value to be less than 5%, so that you can reject the null hypothesis and declare that your observed effect is significant.
What is wrong with that procedure? Notice the seemingly innocuous step from t to p. The p value, on which your entire claim to significance rests, is conjured by the computer program with an assumption about your intentions when you ran the experiment. The computer assumes you intended, in advance, to fix the sample sizes in the groups.
In a little more detail, and this is important to understand, the computer figures out the probability that your t value could have occurred from the null hypothesis if the intended experiment were replicated many, many times. The null hypothesis sets the two underlying populations as normal populations with identical means and variances. If your data happen to have six scores per group, then, in every simulated replication of the experiment, the computer randomly samples exactly six data values from each underlying population, and computes the t value for that random sample. Usually t is nearly zero, because the sample comes from a null hypothesis population in which there is zero difference between groups. By chance, however, sometimes the sample t value will be fairly far above or below zero. The computer does a bizillion simulated replications of the experiment. The top panel of Figure 1 shows a histogram of the bizillion t values. According to the decision policy of NHST, we decide that the null hypothesis is rejectable by an actually observed tobs value if the probability that the null hypothesis generates a value as extreme or more extreme is very small, say p < 0.05. The arrow in Figure 1 marks the critical value tcrit at which the probability of getting a t value more extreme is 5%. We reject the null hypothesis if |tobs| > tcrit. In this case, when N = 6 is fixed for both groups, tcrit = 2.23. This is the critical value shown in standard textbook t tables, for a two-tailed t-test with 10 degrees of freedom.
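
The replication procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the program Kruschke used: the function names, the seed, and the choice of 20,000 replications (standing in for "a bizillion") are assumptions made here, and the populations are taken to be standard normal.

```python
import math
import random

def t_statistic(a, b):
    """Pooled-variance two-sample t statistic for two lists of scores."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var * (1 / na + 1 / nb))

def simulate_null_t(n_per_group, replications, rng):
    """Replicate the fixed-N experiment under the null hypothesis.

    Both groups are drawn from the same normal population, so any
    nonzero t arises purely by chance.
    """
    ts = []
    for _ in range(replications):
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        ts.append(t_statistic(a, b))
    return ts

def critical_value(ts, alpha=0.05):
    """Empirical two-tailed critical value: the (1 - alpha) quantile of |t|."""
    abs_ts = sorted(abs(t) for t in ts)
    return abs_ts[int((1 - alpha) * len(abs_ts))]

rng = random.Random(1)  # arbitrary seed, for repeatability only
null_ts = simulate_null_t(n_per_group=6, replications=20000, rng=rng)
t_crit = critical_value(null_ts)
print(round(t_crit, 2))  # should land near the textbook value of 2.23
```

The simulated critical value only approximates the tabled one, and it depends entirely on the assumption baked into `simulate_null_t`: that every replication draws exactly six scores per group.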
In computing p, the computer assumes that you did not intend to collect data for some time period and then stop; you did not intend to collect more or less data based on an analysis of the early results; you did not intend to have any lost data replaced by additional collection. Moreover, you did not intend to run any other conditions ever again, or compare your data with any other conditions. If you had any of these other intentions, or if the analyst believes you had any of these other intentions, the p value can change dramatically.
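
To make the dependence on intentions concrete, here is a hedged sketch of one alternative intention: collecting more data based on an analysis of the early results. The two-stage rule below (test at N = 6 per group, and if that is not significant, add six more per group and test again) is an illustrative assumption, not a procedure taken from Kruschke's article. The constants are the standard tabled two-tailed 5% critical values for 10 and 22 degrees of freedom.

```python
import math
import random

T_CRIT_DF10 = 2.228  # tabled two-tailed 5% critical value, N = 6 per group
T_CRIT_DF22 = 2.074  # tabled two-tailed 5% critical value, N = 12 per group

def t_statistic(a, b):
    """Pooled-variance two-sample t statistic for two lists of scores."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var * (1 / na + 1 / nb))

def peeking_experiment_rejects(rng):
    """One null-hypothesis replication of the hypothetical two-stage design.

    Returns True if the experimenter would declare significance at either
    look, using the fixed-N critical value each time.
    """
    a = [rng.gauss(0, 1) for _ in range(6)]
    b = [rng.gauss(0, 1) for _ in range(6)]
    if abs(t_statistic(a, b)) > T_CRIT_DF10:
        return True  # stop early and claim significance
    # Not significant yet, so collect six more scores per group and retest.
    a += [rng.gauss(0, 1) for _ in range(6)]
    b += [rng.gauss(0, 1) for _ in range(6)]
    return abs(t_statistic(a, b)) > T_CRIT_DF22

rng = random.Random(2)  # arbitrary seed, for repeatability only
replications = 20000
false_alarm_rate = sum(
    peeking_experiment_rejects(rng) for _ in range(replications)
) / replications
print(false_alarm_rate)
```

Under this intention the null hypothesis is rejected noticeably more often than the nominal 5%, even though each individual test uses the textbook critical value. The same observed t therefore corresponds to a different p depending on which stopping intention is assumed.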

AUTHOR: John Kruschke. The Road to Null Hypothesis Testing is Paved with Good Intentions.