U.S. National Institutes of Health National Cancer Institute

Technical Notes: Statistical Significance

Any estimated statistic is subject to error. In order to test whether two groups (such as the populations of a state and the entire US) have the same or different actual rates, the observed rates for the groups are compared. Statisticians consider that a difference in observed rates can be explained by one of two hypotheses: (H0) the actual rates are really the same, but the observed rates are different because of some combination of error-causing factors, or (H1) the actual rates of the groups are really different. H0 is called the null hypothesis (because it says there is no real difference); H1 is called the alternative hypothesis. Typically, H0 is rejected only if there is strong evidence in favor of H1. (Thus, if the observed rates are equal, we cannot reject H0.)

Using statistical theory, one can determine the distribution of the rate difference under the assumption that H0 is true. Then values of the rate difference that are very unlikely to occur if H0 is true are identified. More specifically, a small positive number, called alpha (α), is chosen; usually, α is 0.05 or 0.01. (Alpha is called the significance level of the hypothesis test.) One can then identify limits for the difference in rates such that, if H0 is true, the probability of the difference being outside of those limits is α. If the observed difference is outside of these limits, then the observed result is very unlikely to happen if H0 is true, so H0 is rejected.
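The limits described above can be sketched in a few lines. This is only an illustration under assumptions that are not part of the original note: death counts are treated as Poisson, the two groups are treated as independent, the rate difference is approximated by a normal distribution, and crude (not age-adjusted) rates are used. The function name and the example counts are hypothetical.

```python
import math
from statistics import NormalDist

def rejection_limits(count_a, pop_a, count_b, pop_b, alpha=0.05):
    """Two-sided limits for (rate_a - rate_b) if H0 (equal rates) is true."""
    # Under H0 both groups share one rate, estimated by pooling the data.
    pooled_rate = (count_a + count_b) / (pop_a + pop_b)
    # Poisson assumption: variance of each observed rate is rate / population.
    se = math.sqrt(pooled_rate * (1 / pop_a + 1 / pop_b))
    # Critical value that leaves alpha/2 probability in each tail.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return -z_crit * se, z_crit * se

# Hypothetical counts: 300 vs. 200 deaths in two populations of 100,000.
lower, upper = rejection_limits(300, 100_000, 200, 100_000, alpha=0.05)
observed = 300 / 100_000 - 200 / 100_000
reject_h0 = observed < lower or observed > upper
print(f"limits: ({lower:.6f}, {upper:.6f}); reject H0: {reject_h0}")
```

With these made-up counts the observed difference falls outside the limits, so H0 would be rejected at α = 0.05.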

Another way of looking at the same process is to calculate, assuming H0 is true, the probability that the observed difference or any greater observed difference would occur; this number is called the P-value of the observed result. If the P-value of a comparison is less than α (that is, the observed difference is very unlikely to happen if the null hypothesis is true), H0 will be rejected. If the P-value of a test is greater than the significance level α, H0 will not be rejected. When a difference in rates is sufficiently large to cause the null hypothesis to be rejected for a given value of α (usually 0.05), it is called a statistically significant difference.
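As a rough illustration of the P-value approach, the sketch below computes a two-sided P-value for a difference between two rates. It assumes Poisson counts, independent groups, a normal approximation, and crude rates; none of these choices is stated in the original note, and the function name and counts are hypothetical.

```python
import math

def two_sided_p_value(count_a, pop_a, count_b, pop_b):
    """P-value for the observed rate difference, assuming H0 (equal rates)."""
    rate_a = count_a / pop_a
    rate_b = count_b / pop_b
    # Common rate under H0, estimated from the pooled data.
    pooled_rate = (count_a + count_b) / (pop_a + pop_b)
    se = math.sqrt(pooled_rate * (1 / pop_a + 1 / pop_b))
    z = (rate_a - rate_b) / se
    # Probability of a standard normal value at least as extreme as |z|.
    return math.erfc(abs(z) / math.sqrt(2))

alpha = 0.05
p = two_sided_p_value(300, 100_000, 200, 100_000)  # hypothetical counts
print(f"P-value = {p:.2e}; statistically significant: {p < alpha}")
```

The decision rule is the one given in the text: H0 is rejected when the P-value is less than α.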

When a null hypothesis is rejected, there remains a small chance that a wrong decision has been made. If many statistical comparisons are done, even with α = 0.01, the chance of making at least one wrong decision becomes a concern. In testing the differences between the total US rate and the rate for each state (or for the District of Columbia) for a given cancer, 51 statistical comparisons of the type described above are performed. Based on one of Bonferroni's inequalities (if there are n events and pi is the probability of success in event i, then P(at least 1 success) ≤ p1 + ... + pn) (Snedecor & Cochran, 1980; pp. 115-117), the significance level α for each individual comparison was set equal to 0.01/51 ≈ 0.0002. Thus, only individual-state-to-total-US comparisons with an associated P-value less than 0.0002 are considered to be statistically significant. That is, a very small significance level α (0.0002) is used for each comparison in order to keep the total risk of falsely deciding that some pair of equal rates is unequal at or below 0.01.
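The arithmetic behind the per-comparison level can be checked directly. The values 0.01 and 51 come from the text; the variable names are illustrative, and the last figure additionally assumes the 51 comparisons are independent, which the Bonferroni bound itself does not require.

```python
FAMILY_ALPHA = 0.01   # total risk accepted across all comparisons
N_COMPARISONS = 51    # 50 states plus the District of Columbia

# Bonferroni: split the total risk evenly across the comparisons.
alpha_each = FAMILY_ALPHA / N_COMPARISONS

# Upper bound on P(at least one false rejection): p1 + ... + pn.
bonferroni_bound = N_COMPARISONS * alpha_each

# For comparison only: the exact risk if the tests were independent.
risk_if_independent = 1 - (1 - alpha_each) ** N_COMPARISONS

print(f"per-comparison alpha: {alpha_each:.6f}")  # 0.000196
print(f"Bonferroni bound:     {bonferroni_bound:.4f}")
print(f"risk if independent:  {risk_if_independent:.5f}")
```

The bound holds whether or not the comparisons are independent, which is why it is a convenient, if conservative, choice.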

Use caution in assessing statistically significant differences. Population size has an important role in any calculation of statistical significance. Some states may have estimated rates that are very close to the estimated total US rate, but because of their large population, the difference between their estimated rate and the estimated total US rate is found to be statistically significant. In this case, the true state rate and the true US rate are almost certainly different, because the observed difference, though small, is nearly impossible if the null hypothesis (equal rates) is true. A small difference in rates, however, may have no practical importance. On the other hand, some smaller states may have estimated rates that differ substantially from the estimated total US rate, but because of their relatively small population, the differences are found to be statistically nonsignificant. When this happens, if the true state rate and the true US rate were equal, the probability of obtaining a difference at least as large as what has been observed is greater than α ≈ 0.0002. Therefore, because the evidence against it isn't strong enough, the null hypothesis (equal rates) is not rejected.
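The effect of population size can be seen in a small numerical sketch. All figures here are made up for illustration, the US rate is treated as a known reference value, and counts are assumed to be Poisson with a normal approximation; this is a simplification and not necessarily the procedure used to produce the published comparisons.

```python
import math

ALPHA = 0.0002  # per-comparison significance level from the text

def p_value_vs_reference(observed_deaths, population, reference_rate_per_100k):
    """Two-sided P-value for a count against a fixed reference rate."""
    expected = population * reference_rate_per_100k / 100_000
    # Poisson assumption: the variance of the count equals its expectation.
    z = (observed_deaths - expected) / math.sqrt(expected)
    return math.erfc(abs(z) / math.sqrt(2))

US_RATE = 200.0  # hypothetical US rate per 100,000

# Large hypothetical state: rate of 204 per 100,000 (2% above the US rate).
p_large = p_value_vs_reference(61_200, 30_000_000, US_RATE)
# Small hypothetical state: rate of 215 per 100,000 (7.5% above the US rate).
p_small = p_value_vs_reference(1_290, 600_000, US_RATE)

print(f"large state: P = {p_large:.1e}, significant: {p_large < ALPHA}")
print(f"small state: P = {p_small:.1e}, significant: {p_small < ALPHA}")
```

In this example the 2% difference in the large state is statistically significant, while the 7.5% difference in the small state is not.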

If the percent difference (PD) between the two rates is small, there may be some question about the importance of the difference. It is difficult to specify a minimally significant absolute PD, below which the difference would always be unimportant, because the observed PD will depend on the populations of the areas involved. It may be of value to consider the size of the PD between a state rate and the US rate in assessing the importance of a statistically significant difference.
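One plausible way to compute the PD is shown below. The note does not say which rate serves as the base, so using the US rate as the denominator is an assumption, and the function name and rates are hypothetical.

```python
def percent_difference(state_rate, us_rate):
    """Percent difference of a state rate relative to the US rate."""
    return 100.0 * (state_rate - us_rate) / us_rate

# Hypothetical rates per 100,000.
print(percent_difference(204.0, 200.0))  # 2.0
print(percent_difference(190.0, 200.0))  # -5.0
```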

Comparing individual state rates with the US rate and assessing statistical significance is not an appropriate procedure for assessing geographic clustering of state rates. Identification of states which may represent regional clusters of high or low rates would require additional statistical and graphical analyses.

For a number of cancers, the District of Columbia has the highest death rates. Use caution when comparing cancer rates for the District with those from the 50 states. The District is an entirely urban area, whereas a state includes urban, suburban, and rural areas. Mortality rates for many cancers are higher in urban areas. Also, the District has a higher percentage of blacks (58% of the total population in 2005) than any state, and the higher mortality rates among blacks for several types of cancer elevate the overall rate for the District.