The binomial test is an exact test of the statistical significance of deviations from a theoretically expected distribution of observations into two categories, using sample data.
The binomial test is useful for testing hypotheses about the probability ($\pi$) of success:
$$H_0\colon\pi=\pi_0$$
where $\pi_0$ is a user-specified value between 0 and 1.
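If in a sample of size $n$ there are $k$ successes, while $n\pi_0$ are expected, the binomial distribution gives the probability of observing exactly this count:
$$\Pr(X=k)=\binom{n}{k}\pi_0^k(1-\pi_0)^{n-k}$$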
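If the null hypothesis $H_0$ is correct, the expected number of successes is $n\pi_0$. The $p$-value for the test is the probability, under $H_0$, of an outcome at least as extreme as the one observed. For a one-tailed test this is straightforward to compute. To test whether $\pi<\pi_0$, the $p$-value is
$$p=\sum_{i=0}^{k}\Pr(X=i)=\sum_{i=0}^{k}\binom{n}{i}\pi_0^i(1-\pi_0)^{n-i}$$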
An analogous computation applies when testing whether $\pi>\pi_0$, summing over the range from $k$ to $n$ instead.
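As a minimal sketch of the two one-tailed sums, the following Python uses only the standard library. The function names `binom_pmf`, `p_value_less`, and `p_value_greater` are illustrative choices rather than part of any library, and the small data set is hypothetical.

```python
from math import comb


def binom_pmf(i: int, n: int, pi0: float) -> float:
    """Pr(X = i) for X ~ Binomial(n, pi0)."""
    return comb(n, i) * pi0**i * (1 - pi0) ** (n - i)


def p_value_less(k: int, n: int, pi0: float) -> float:
    """One-tailed p-value for the alternative pi < pi0: Pr(X <= k)."""
    return sum(binom_pmf(i, n, pi0) for i in range(0, k + 1))


def p_value_greater(k: int, n: int, pi0: float) -> float:
    """One-tailed p-value for the alternative pi > pi0: Pr(X >= k)."""
    return sum(binom_pmf(i, n, pi0) for i in range(k, n + 1))


# Hypothetical data: 2 successes in 10 trials, tested against pi0 = 0.5.
print(p_value_less(2, 10, 0.5))     # (1 + 10 + 45) / 1024 = 0.0546875
print(p_value_greater(2, 10, 0.5))  # upper tail for the same data
```

The direct product of a large binomial coefficient and a tiny power is adequate for moderate $n$; for very large samples a log-space or library implementation would likely be more robust.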
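Calculating a $p$-value for a two-tailed test is slightly more complicated, since the binomial distribution is not symmetric when $\pi_0\neq 0.5$, so the one-tailed $p$-value cannot simply be doubled. Instead, consider every outcome that is as likely as, or less likely than, the observed $X=k$. Let $\mathcal{I}=\{i\colon\Pr(X=i)\leq\Pr(X=k)\}$ denote the set of all such outcomes. The two-tailed $p$-value is then
$$p=\sum_{i\in\mathcal{I}}\Pr(X=i)=\sum_{i\in\mathcal{I}}\binom{n}{i}\pi_0^i(1-\pi_0)^{n-i}$$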
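The two-tailed sum over the set of outcomes no more probable than the observed one can be sketched in Python as follows. The function names and the small relative tolerance used when comparing floating-point probabilities are assumptions made for this illustration, and the data are hypothetical.

```python
from math import comb


def binom_pmf(i: int, n: int, pi0: float) -> float:
    """Pr(X = i) for X ~ Binomial(n, pi0)."""
    return comb(n, i) * pi0**i * (1 - pi0) ** (n - i)


def p_value_two_tailed(k: int, n: int, pi0: float, rel_tol: float = 1e-7) -> float:
    """Sum Pr(X = i) over every i with Pr(X = i) <= Pr(X = k)."""
    # The tolerance keeps outcomes that tie with k despite rounding error.
    threshold = binom_pmf(k, n, pi0) * (1 + rel_tol)
    probabilities = (binom_pmf(i, n, pi0) for i in range(n + 1))
    return sum(q for q in probabilities if q <= threshold)


# Symmetric case (pi0 = 0.5): the result equals twice the one-tailed value.
print(p_value_two_tailed(2, 10, 0.5))   # 112 / 1024 = 0.109375

# Asymmetric case (pi0 = 0.25): doubling a one-tailed value would not apply.
print(p_value_two_tailed(1, 10, 0.25))
```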
One common use of the binomial test is the case where the null hypothesis is that two categories occur with equal frequency ($H_0\colon\pi=0.5$).
When there are more than two categories, and an exact test is required, the multinomial test, based on the multinomial distribution, must be used instead of the binomial test.[1]
The most common measures of effect size for binomial tests are Cohen's h and Cohen's g.
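Both effect sizes are simple to compute. The sketch below assumes the usual definitions, $h=2\arcsin\sqrt{p_1}-2\arcsin\sqrt{p_2}$ for two proportions and $g=P-0.5$ for a proportion compared with 0.5. The function names and the example proportion are illustrative only.

```python
from math import asin, sqrt


def cohens_h(p1: float, p2: float) -> float:
    """Difference between two arcsine-transformed proportions."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))


def cohens_g(p: float) -> float:
    """Departure of an observed proportion from 0.5."""
    return p - 0.5


# Hypothetical observed proportion of 0.6 compared with a null value of 0.5.
print(cohens_h(0.6, 0.5))
print(cohens_g(0.6))
```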
For large samples such as the example below, the binomial distribution is well approximated by convenient continuous distributions, and these are used as the basis for alternative tests that are much quicker to compute, such as Pearson's chi-squared test and the G-test. However, for small samples these approximations break down, and there is no alternative to the binomial test.
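The most usual (and easiest) approximation is through the standard normal distribution, in which a z-test is performed on the test statistic $Z$, given by
$$Z=\frac{k-n\pi}{\sqrt{n\pi(1-\pi)}}$$
where $k$ is the number of successes observed in a sample of size $n$ and $\pi$ is the probability of success under the null hypothesis. The approximation can be improved with a continuity correction:
$$Z=\frac{k-n\pi\pm\frac{1}{2}}{\sqrt{n\pi(1-\pi)}}$$
For very large $n$, the continuity correction has a negligible effect.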
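A short Python sketch of the z-statistic, with and without the continuity correction, is given here using the figures from this article's die-roll example (51 sixes in 235 rolls, null probability 1/6). The function names are illustrative, and the sketch assumes the sign of the $\pm\frac{1}{2}$ term is chosen to move $k$ towards $n\pi$.

```python
from math import copysign, erfc, sqrt


def normal_sf(z: float) -> float:
    """Upper-tail probability Pr(Z >= z) of the standard normal distribution."""
    return 0.5 * erfc(z / sqrt(2))


def z_statistic(k: int, n: int, pi: float, continuity: bool = False) -> float:
    """Z for k successes in n trials under null success probability pi."""
    diff = k - n * pi
    if continuity:
        # Shrink |k - n*pi| by one half, without crossing zero.
        diff = copysign(max(abs(diff) - 0.5, 0.0), diff)
    return diff / sqrt(n * pi * (1 - pi))


k, n, pi = 51, 235, 1 / 6
z_plain = z_statistic(k, n, pi)
z_corr = z_statistic(k, n, pi, continuity=True)

# One-tailed (upper) approximate p-values.
print(z_plain, normal_sf(z_plain))
print(z_corr, normal_sf(z_corr))
```

For these figures the corrected one-tailed value lies closer to the exact binomial result of 0.02654 than the uncorrected one.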
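In terms of a measured sample proportion $\hat{p}$, a null-hypothesis proportion $p_0$, and a sample size $n$, where $\hat{p}=k/n$ and $p_0=\pi$, the z-test above can be rearranged as
$$Z=\frac{\hat{p}-p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$$
by dividing both numerator and denominator by $n$.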
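The rearrangement can be checked step by step. Substituting $k=n\hat{p}$ and $\pi=p_0$ into the count form and dividing numerator and denominator by $n$ gives:

$$Z=\frac{k-n\pi}{\sqrt{n\pi(1-\pi)}}=\frac{n\hat{p}-np_0}{\sqrt{np_0(1-p_0)}}=\frac{\hat{p}-p_0}{\sqrt{np_0(1-p_0)}/n}=\frac{\hat{p}-p_0}{\sqrt{p_0(1-p_0)/n}}$$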
Suppose we have a board game that depends on the roll of one die and attaches special importance to rolling a 6. In a particular game, the die is rolled 235 times, and 6 comes up 51 times. If the die is fair, we would expect 6 to come up $235\times 1/6\approx 39.17$ times. The observed number of 6s is therefore higher than we would expect on average by pure chance with a fair die. But is it high enough for us to conclude anything about the fairness of the die? This question can be answered by the binomial test. Our null hypothesis is that the die is fair (the probability of each number coming up is 1/6).
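Under this null hypothesis, the number of 6s follows the binomial distribution $B(N=235,p=1/6)$, with probability mass function
$$f(k,n,p)=\Pr(k;n,p)=\Pr(X=k)=\binom{n}{k}p^k(1-p)^{n-k}$$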
Since the observed value is greater than the expected value, we can consider the probability of observing 51 or more 6s under the null hypothesis, which constitutes a one-tailed test (here we are testing whether the die is biased towards generating more 6s than expected). To calculate the probability of 51 or more 6s in a sample of 235 under the null hypothesis, we add up the probabilities of getting exactly 51 6s, exactly 52 6s, and so on, up to the probability of getting exactly 235 6s:
$$\sum_{i=51}^{235}\binom{235}{i}p^i(1-p)^{235-i}=0.02654$$
If we have a significance level of 5%, then this result (0.02654 < 5%) indicates that we have evidence that is significant enough to reject the null hypothesis that the die is fair.
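The one-tailed sum for the die example can be reproduced with a few lines of standard-library Python. This is only a sketch; the variable names are illustrative.

```python
from math import comb

n, p = 235, 1 / 6   # number of rolls and null probability of a 6
observed = 51       # number of 6s actually rolled

# Pr(X >= 51): add the binomial probabilities for 51, 52, ..., 235 sixes.
p_one_tailed = sum(
    comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(observed, n + 1)
)

print(round(p_one_tailed, 5))  # the text above reports 0.02654
print(p_one_tailed < 0.05)     # comparison with the 5% significance level
```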
Normally, when testing a die for fairness, we are also interested in whether the die is biased towards generating fewer 6s than expected, not only more 6s as in the one-tailed test above. To consider both biases, we use a two-tailed test. Note that we cannot simply double the one-tailed p-value unless the probability of the event is 1/2, because the binomial distribution becomes asymmetric as that probability deviates from 1/2. There are two methods of defining the two-tailed p-value. One method is to sum the probabilities of all outcomes whose deviation from the expected value, in either direction, is at least as large as the observed deviation. The probability of that occurring in our example is 0.0437. The second method is to sum the probabilities of all outcomes that are as unlikely as, or more unlikely than, the observed value, i.e. from a comparison of the probability mass function values. This can create a subtle difference, but in this example it yields the same probability of 0.0437. In both cases, the two-tailed test is significant at the 5% level, indicating that the number of 6s observed differed significantly from the expected number for this die.
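Both two-tailed definitions can be compared directly for the die example. The Python sketch below uses only the standard library; the variable names and the small relative tolerance used to absorb floating-point ties are assumptions made for illustration.

```python
from math import comb

n, p, observed = 235, 1 / 6, 51
expected = n * p  # about 39.17

# Probability of every possible count of 6s, from 0 to 235.
pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]

# Method 1: outcomes at least as far from the expected count as the observed one.
deviation = abs(observed - expected)
p_by_distance = sum(pmf[i] for i in range(n + 1) if abs(i - expected) >= deviation)

# Method 2: outcomes no more probable than the observed one.
threshold = pmf[observed] * (1 + 1e-7)
p_by_likelihood = sum(q for q in pmf if q <= threshold)

print(round(p_by_distance, 4), round(p_by_likelihood, 4))
```

In this example both definitions select the same outcomes (27 or fewer 6s, or 51 or more), which is why the two values agree.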
Binomial tests are available in most statistical software. For example:
In SAS: PROC FREQ DATA=DiceRoll; TABLES Roll / BINOMIAL (P=0.166667) ALPHA=0.05; EXACT BINOMIAL; WEIGHT Freq; RUN;
In SPSS: npar tests /binomial (.5) = node1 node2.