Rate Algorithms

Crude Rate

A crude rate is the number of new cases (or deaths) occurring in a specified population per year, usually expressed as the number of cases per 100,000 population at risk.

\(\Large cruderate = \frac{count}{population} \times 100,000\Large\)

SEER*Stat allows you to display rates as cases per 1,000; 10,000; 100,000; or 1,000,000.
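The crude rate formula can be sketched in a few lines of Python. This is only an illustration: the function name `crude_rate` and the `per` parameter (standing in for the 1,000; 10,000; 100,000; or 1,000,000 multiplier) are assumptions made for this example and are not part of SEER*Stat.

```python
def crude_rate(count, population, per=100_000):
    """Crude rate: cases per `per` persons at risk.

    `per` is a hypothetical parameter standing in for the display
    multiplier (1,000; 10,000; 100,000; or 1,000,000).
    """
    return count * per / population


# Example with made-up numbers: 250 cases in a population of 1,250,000.
print(crude_rate(250, 1_250_000))          # 20.0 cases per 100,000
print(crude_rate(250, 1_250_000, 1_000))   # 0.2 cases per 1,000
```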

Age-adjusted Rate

An age-adjusted rate is a weighted average of crude rates, where the crude rates are calculated for different age groups and the weights are the proportions of persons in the corresponding age groups of a standard population. Several sets of standard populations are included in SEER*Stat. These include the total U.S. populations (1940, 1950, 1960, 1970, 1980, and 1990), an estimate of the U.S. 2000 population, the 1991 Canadian population, and the world population. The age-adjusted rate for an age group spanning ages x through y is calculated using the following formula:

\(\Large aarate_{x-y} = \sum_{i=x}^{y} \left[ \left( \frac{count_i}{pop_i} \right) \times\ 100,000 \times\ \left( \frac{stdpop_i}{\sum {}_{j=x}^{y} stdpop_j} \right) \right]\Large\)

where count_i is the number of cases for the ith age group, pop_i is the relevant population for the same age group, and stdpop_i is the standard population for the same age group.

If count_i and pop_i are both zero, SEER*Stat treats that rate as zero. In the unusual event that pop_i is zero but count_i is non-zero, SEER*Stat sets pop_i to the same value as count_i for the purposes of this calculation and the w_m calculation below only. When this happens, the resulting matrix cells will be flagged and footnoted to indicate this adjustment.
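As a minimal sketch of the age-adjusted rate, including the two zero-population rules described above, the calculation might look like the following. The function name, its arguments, and the returned `adjusted` flag (a stand-in for the footnote flag) are assumptions for illustration, and the input numbers are made up.

```python
def age_adjusted_rate(counts, pops, stdpops, per=100_000):
    """Weighted average of age-specific crude rates.

    counts, pops, stdpops are parallel sequences, one entry per age group.
    Returns (rate, adjusted), where `adjusted` is True if any age group
    had pop == 0 with count > 0 and pop was replaced by count.
    """
    std_total = sum(stdpops)
    rate = 0.0
    adjusted = False
    for count, pop, stdpop in zip(counts, pops, stdpops):
        if pop == 0:
            if count == 0:
                continue          # rate for this age group is treated as zero
            pop = count           # substitute count for the zero population
            adjusted = True
        rate += (count / pop) * per * (stdpop / std_total)
    return rate, adjusted


# Three age groups with crude rates of 10, 50, and 200 per 100,000 and
# standard-population weights of 0.5, 0.3, and 0.2.
rate, adjusted = age_adjusted_rate(
    counts=[5, 20, 60],
    pops=[50_000, 40_000, 30_000],
    stdpops=[500_000, 300_000, 200_000],
)
print(round(rate, 6), adjusted)
```

With these inputs the weighted average is 0.5 × 10 + 0.3 × 50 + 0.2 × 200 = 60 per 100,000, and no adjustment is flagged.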

Standard Error for a Crude Rate

This calculation assumes that the cancer counts have Poisson distributions.

\(\Large SE_{crude} = \frac{\sqrt{count}}{population} \times\ 100,000\Large\)
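For illustration, the standard error formula above translates directly into Python. The function name and the `per` parameter are assumptions made for this sketch.

```python
import math


def se_crude_rate(count, population, per=100_000):
    """Standard error of a crude rate under the Poisson assumption."""
    return math.sqrt(count) * per / population


# Example with made-up numbers: 100 cases in a population of 1,000,000
# gives a rate of 10 per 100,000 with a standard error of 1 per 100,000.
print(se_crude_rate(100, 1_000_000))   # 1.0
```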

Standard Error for an Age-adjusted Rate

This calculation assumes that the cancer counts have Poisson distributions. Suppose that the age-adjusted rate is comprised of age groups x through y.

\(\Large SE_{AArate} = \sqrt{\sum_{i=x}^{y} \left( \left( \frac{stdpop_i}{\sum {}_{j=x}^{y} stdpop_j} \right)^2 \times\ \left( \frac{count_i}{population_i^2} \right) \right)} \times\ 100,000\Large\)
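A minimal sketch of this standard error follows. It assumes every age group has a non-zero population (the zero-population adjustment described for the age-adjusted rate is not repeated here), and the function and argument names are hypothetical.

```python
import math


def se_age_adjusted_rate(counts, pops, stdpops, per=100_000):
    """Standard error of an age-adjusted rate under the Poisson assumption.

    Assumes pop > 0 in every age group.
    """
    std_total = sum(stdpops)
    variance = sum(
        (stdpop / std_total) ** 2 * count / pop ** 2
        for count, pop, stdpop in zip(counts, pops, stdpops)
    )
    return math.sqrt(variance) * per


se = se_age_adjusted_rate(
    counts=[5, 20, 60],
    pops=[50_000, 40_000, 30_000],
    stdpops=[500_000, 300_000, 200_000],
)
print(round(se, 3))
```

With a single age group the weight is 1 and the expression reduces to the crude-rate standard error, which is a convenient sanity check.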

Crude Rate Confidence Intervals

The endpoints of a (1 - p) × 100% confidence interval are calculated as:

\(\Large CI_{low} = \frac{\frac{1}{2} (ChiInv(\frac{p}{2}, 2 \times\ count)) }{population} \times 100,000\Large\)

\(\Large CI_{high} = \frac{\frac{1}{2} (ChiInv(1 - \frac{p}{2}, 2 \times\ (count+ 1))) }{population} \times 100,000\Large\)

where ChiInv(p, n) is the inverse of the chi-squared distribution function evaluated at p and with n degrees of freedom, and we define ChiInv(p, 0) = 0.

Although the normal approximation may be used with the standard errors to obtain confidence intervals when the count is large, we use the above exact method that holds even with small counts. When the count is large the two methods produce similar results.
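The exact interval can be sketched using only the Python standard library. Instead of calling a chi-squared quantile function, this example uses the equivalent Poisson tail relationship: ½ ChiInv(p/2, 2 × count) is the mean μ at which P(X ≥ count) = p/2, and ½ ChiInv(1 - p/2, 2 × (count + 1)) is the mean at which P(X ≤ count) = p/2. The helper names and the bisection approach are assumptions for illustration, not a description of how SEER*Stat computes the limits. With SciPy available, `scipy.stats.chi2.ppf` could evaluate ChiInv directly.

```python
import math


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu). Suitable for modest mu only,
    because exp(-mu) underflows to zero for mu above roughly 745."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return min(total, 1.0)


def bisect(f, lo, hi, iterations=200):
    """Root of a monotone function f that changes sign on [lo, hi]."""
    lo_positive = f(lo) > 0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0) == lo_positive:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def crude_rate_ci(count, population, p=0.05, per=100_000):
    """Exact (1 - p) x 100% confidence interval for a crude rate."""
    if count == 0:
        mu_low = 0.0                      # matches ChiInv(p, 0) = 0
    else:
        mu_low = bisect(
            lambda mu: 1.0 - poisson_cdf(count - 1, mu) - p / 2,
            0.0, float(count))
    upper = count + 1.0
    while poisson_cdf(count, upper) > p / 2:
        upper *= 2.0                      # widen until the root is bracketed
    mu_high = bisect(lambda mu: poisson_cdf(count, mu) - p / 2, 0.0, upper)
    return mu_low * per / population, mu_high * per / population


# Example with made-up numbers: 10 cases in a population of 100,000.
low, high = crude_rate_ci(10, 100_000)
print(round(low, 2), round(high, 2))      # 4.8 18.39
```

For comparison, the normal approximation for the same example would give 10 ± 1.96 × √10, or roughly 3.8 to 16.2, which shows how the two methods differ when the count is small.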

See:

Johnson NL, Kotz S. Distributions in Statistics: Discrete Distributions. John Wiley, New York, 1969.

Fay MP, Feuer EJ. Confidence intervals for directly standardized rates: a method based on the gamma distribution. Statistics in Medicine 1997 Apr 15;16(7):791-801.

Age-adjusted Rate Confidence Intervals

Suppose that the age-adjusted rate is comprised of age groups x through y, and let:

\(\Large w_i = \frac{stdpop_i}{(pop_i \times \sum_{j=x}^y stdpop_j)}\Large\)

If using the Fay and Feuer method (see above):

\(\large w_m = max ( w_i )\)

\(\large z = {w_m^{2}}\)

If using the Tiwari et al. modification:

\(\large w_m = avg (w_i)\)

Note: this calculation is restricted to age groups with pop_i > 0 or count_i > 0.

\(\large z = avg( w_i^{2})\)

For either method:

\(\large v = \sum_{i = x}^{y} ( w_i^{2} \times count_i)\large\)

The endpoints of a (1 - p) × 100% confidence interval are calculated as:

\(\Large CI_{low} = \left( \frac{v}{2 \times rate} \right) \times \left( ChiInv\left(\frac{p}{2}, \frac{\left(2 \times rate^2 \right)}{v} \right) \right) \times 100,000\Large\)

\(\Large CI_{high} = \left( \frac{v + z}{2 \left( rate + w_m\right)} \right) \times \left( ChiInv\left(1 - \frac{p}{2}, \frac{2\left(rate + w_m \right)^2}{v + z} \right) \right)\times 100,000\Large\)

This method for calculating the confidence interval was developed in Fay and Feuer (1997). The method produces similar confidence limits to the standard normal approximation when the counts are large and the population being studied is similar to the standard population. In other cases, the above method is more likely to ensure proper coverage.

Note: The rate used in the above formulas is not per 100,000 population.
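The formulas in this section can be sketched with the standard library alone. Because the degrees of freedom are generally not integers, the example includes its own chi-squared inverse, built from a textbook series and continued-fraction evaluation of the regularized incomplete gamma function plus bisection. All function names, the `method` argument, and the numerical approach are assumptions for illustration, and the sketch assumes every age group has a non-zero population. It is not a description of SEER*Stat's internal implementation.

```python
import math


def gamma_p(a, x):
    """Regularized lower incomplete gamma function P(a, x), a > 0."""
    if x <= 0.0:
        return 0.0
    log_front = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:                        # series expansion
        n, term = a, 1.0 / a
        total = term
        for _ in range(10_000):
            n += 1.0
            term *= x / n
            total += term
            if term < total * 1e-15:
                break
        return total * math.exp(log_front)
    tiny = 1e-300                          # continued fraction for 1 - P
    b = x + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return 1.0 - math.exp(log_front) * h


def chi_inv(p, df):
    """Inverse chi-squared CDF by bisection, with chi_inv(p, 0) = 0."""
    if df <= 0.0 or p <= 0.0:
        return 0.0
    lo, hi = 0.0, max(df, 1.0)
    while gamma_p(df / 2.0, hi / 2.0) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_p(df / 2.0, mid / 2.0) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def age_adjusted_ci(counts, pops, stdpops, p=0.05,
                    method="fay-feuer", per=100_000):
    """(1 - p) x 100% interval for an age-adjusted rate (all pops > 0)."""
    std_total = sum(stdpops)
    w = [s / (n * std_total) for n, s in zip(pops, stdpops)]
    rate = sum(wi * c for wi, c in zip(w, counts))   # not per 100,000
    v = sum(wi ** 2 * c for wi, c in zip(w, counts))
    if method == "fay-feuer":
        w_m = max(w)
        z = w_m ** 2
    else:                                  # Tiwari et al. modification
        w_m = sum(w) / len(w)
        z = sum(wi ** 2 for wi in w) / len(w)
    low = 0.0 if v == 0 else v / (2 * rate) * chi_inv(p / 2, 2 * rate ** 2 / v)
    high = ((v + z) / (2 * (rate + w_m))
            * chi_inv(1 - p / 2, 2 * (rate + w_m) ** 2 / (v + z)))
    return low * per, high * per


groups = dict(counts=[5, 20, 60], pops=[50_000, 40_000, 30_000],
              stdpops=[500_000, 300_000, 200_000])
print(age_adjusted_ci(**groups))
print(age_adjusted_ci(**groups, method="tiwari"))
```

With a single age group both methods reduce to the crude rate interval, since w_m = 1/pop, z = 1/pop², and the degrees of freedom become 2 × count and 2 × (count + 1). The lower endpoint is the same under both methods because it does not depend on w_m or z.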

Rate Ratios

A rate ratio is one rate divided by another. See Include Rate Ratios on Last Row Variable Groupings for information on the rate ratios that SEER*Stat can generate, and Statistics in a Rate Matrix for information on other figures related to rate ratios.

For the formula for confidence interval calculations, see:

Fay MP. Approximate confidence intervals for rate ratios from directly standardized rates with sparse data. Communications in Statistics: Theory and Methods 1999;28(9):2141-2160.

For the formula for p-value calculations, see:

Fay MP, Tiwari RC, Feuer EJ, Zou Z. Estimating average annual percent change for disease rates without assuming constant change. Biometrics 2006 Sep;62(3):847-54.