Technical Notes: Reporting Delay
Timely and accurate calculation of cancer incidence rates is hampered by reporting delay, the time lapse before a diagnosed cancer case is reported to the NCI or the delay in receiving updated information for an existing case. Currently, the NCI allows a standard delay of 22 months between the end of the diagnosis year and the time the cancers are reported to the NCI in November, almost two years later. The data are released to the public in the spring of the following year. For example, cases diagnosed in 2006 were first reported to the NCI in November 2008 and released to the public in April 2009. However, in each subsequent release of the SEER data, records from all prior diagnosis years (e.g., diagnosis years 2005 and earlier in the 2008 submission to the NCI) are updated as either new cases are found or new information is received about previously submitted cases.
The submissions for the most recent diagnosis year are, in general, about two percent below the total number of cancers that will eventually be submitted for that year, although this varies by cancer site and other factors.
The idea behind modeling reporting delay is to adjust the recent rates to anticipate future corrections (additions, changes, and deletions) to the data. These adjusted rates and the associated delay model are valuable in more precisely determining current cancer trends, as well as in monitoring the timeliness of data collection—an important aspect of quality control (Clegg et al., 2002). Reporting delay models have been previously used in the reporting of AIDS cases (Brookmeyer & Damiano, 1989; Pagano et al., 1994; Harris, 1990).
In this report, we show SEER age-adjusted incidence rates and trends, along with their calculated delay adjustments for SEER 9 and SEER 13 areas. The adjusted rates, factors, and trends are available for all cancers combined (malignant only except for urinary bladder), for female breast in situ, for urinary bladder (in situ and malignant), and for 22 malignant cancer sites: melanoma (for all races combined and whites only), lung/bronchus, colon/rectum, prostate, female breast, liver and intrahepatic bile duct, pancreas, cervix uteri, corpus and uterus, ovary, testis, kidney and renal pelvis, brain and other nervous system, Hodgkin lymphoma, non-Hodgkin lymphoma, all leukemias, esophagus, larynx, myeloma, oral cavity and pharynx, thyroid, and stomach.
Estimates of observed incidence rates, delay-adjusted incidence rates, and delay adjustment factors may be found in the Cancer Query Systems.
The SEER 9 Delay Model
For each cancer site, many combinations of covariates were considered in prediction models of delay probabilities. Potential covariates included delay time, year of diagnosis, age at diagnosis, sex, race, and reporting year effect (Zou et al., 2009). Models were evaluated by fitting the SEER 9 models to the 1983 through 2007 annual submissions, with a maximum 26-year delay, and then predicting the counts for the 2008 submission. For each cancer site, the model that minimized the sum of squared prediction errors was chosen as the default final model. However, to choose a more parsimonious model, we added a selection step in which competing models were identified using the following criteria:
- the competing model had fewer parameters than the default model, and
- the percent change between the prediction errors of the competing and the default models per extra parameter (i.e., percent change in prediction errors divided by the difference in the numbers of parameters between the two models) was less than 1 percent.
If more than one competing model met the criteria, the model with the smallest percentage change per extra parameter was generally selected. However, if other competing models had fewer parameters and the differences between their percentage changes per extra parameter and the smallest one did not exceed 0.02, the competing model with the fewest parameters (rather than the model with the smallest percentage change per extra parameter) was selected. The chosen model was then refitted using all data (1983-2008 submissions, 1981-2006 diagnosis years) to estimate delay distributions and calculate delay-adjusted estimates of the cancer counts.
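The selection rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used for the report: the `Candidate` class, the function name, and the example error values are hypothetical, and the 0.02 tolerance is assumed to be in the same units as the percentage change per extra parameter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    n_params: int
    sse: float  # sum of squared prediction errors for the held-out submission


def select_model(candidates, max_pct_per_param=1.0, tie_tolerance=0.02):
    # Default model: smallest sum of squared prediction errors (assumed > 0).
    default = min(candidates, key=lambda m: m.sse)

    def pct_per_param(m):
        # Percent change in prediction error relative to the default model,
        # divided by the number of parameters saved.
        pct_change = 100.0 * (m.sse - default.sse) / default.sse
        return pct_change / (default.n_params - m.n_params)

    # Competing models: fewer parameters and under 1 percent per extra parameter.
    competing = [
        m for m in candidates
        if m.n_params < default.n_params and pct_per_param(m) < max_pct_per_param
    ]
    if not competing:
        return default

    best = min(competing, key=pct_per_param)
    # Prefer an even smaller model if it is within the tolerance of the best one.
    near_ties = [
        m for m in competing
        if m.n_params < best.n_params
        and pct_per_param(m) - pct_per_param(best) <= tie_tolerance
    ]
    return min(near_ties, key=lambda m: m.n_params) if near_ties else best


# Hypothetical candidates, for illustration only.
candidates = [
    Candidate("full", 12, 100.0),
    Candidate("mid", 9, 101.5),
    Candidate("small", 6, 103.06),
    Candidate("tiny", 3, 130.0),
]
chosen = select_model(candidates)
```

With these made-up numbers, "mid" has the smallest percentage change per extra parameter (0.50), but "small" is within 0.02 of it (0.51) and has fewer parameters, so "small" is chosen. "tiny" fails the 1 percent criterion.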
Age-adjusted (using the 2000 US standard million population) cancer incidence rates were then calculated with and without adjusting for reporting delay. Joinpoint linear regression was used to obtain the annual percentage changes for the 1975-2006 incidence rates for the data series with and without delay adjustment. Because the delay distribution was assumed complete after 26 years, incidence rates for diagnosis years prior to 1982 were not adjusted for reporting delay. In joinpoint regression analyses, up to four change points (i.e., five trend-line segments) were allowed, and these were modeled to fall at either whole years or midway between diagnosis years. Change points were constrained to be at least 2 years away from both the beginning and the end of the data series and at least 2 years apart. Models were fitted using the weighted least squares option of the joinpoint regression software (weighted by appropriate variances of age-adjusted incidence rates).
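To make the annual percentage change concrete, the sketch below fits a single log-linear segment by weighted least squares and converts the slope b to APC = 100 * (exp(b) - 1). It is a simplified illustration, not the joinpoint software: it fits one segment only and does not search for change points. The function name and the delta-method weights (rate squared divided by variance) are assumptions made for this example.

```python
import math


def annual_percent_change(years, rates, variances):
    """Weighted least-squares slope of log(rate) on year, expressed as an APC."""
    log_rates = [math.log(r) for r in rates]
    # Assumed weights: inverse of the approximate variance of log(rate),
    # Var(log r) ~ Var(r) / r^2 (delta method).
    weights = [r * r / v for r, v in zip(rates, variances)]
    total_w = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, years)) / total_w
    mean_y = sum(w * y for w, y in zip(weights, log_rates)) / total_w
    sxy = sum(w * (x - mean_x) * (y - mean_y)
              for w, x, y in zip(weights, years, log_rates))
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, years))
    slope = sxy / sxx
    return 100.0 * (math.exp(slope) - 1.0)


# Hypothetical series that grows by exactly 2 percent per year.
years = list(range(2000, 2007))
rates = [100.0 * 1.02 ** (y - 2000) for y in years]
variances = [4.0] * len(years)
apc = annual_percent_change(years, rates, variances)
```

Because the made-up series grows by a constant 2 percent per year, the fitted APC is 2 percent up to rounding error. Running the same function on observed and on delay-adjusted rates shows how the adjustment changes the trend in the final segment.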
Results show that adjusting for delay tends to raise cancer incidence rates in the more recent diagnosis years. Although this adjustment increases the rate of change over the most recent diagnosis years, it is likely to lead to the detection of a new joinpoint only rarely. See Clegg et al. (2002) for details on the impact of reporting-delay adjustment on SEER cancer incidence rates.
The SEER 13 Delay Model
Starting with the April 2009 release of the Cancer Statistics Review, we estimated delay-adjusted rates for SEER 13 registries. SEER 13 consists of the SEER 9 registries, covering diagnosis years 1975 through the present, plus 4 newer registries (Los Angeles, San Jose-Monterey, Rural Georgia, Alaska Native Tumor Registry) covering diagnosis years 1992 through the present. These four registries will be referred to here as SEER 13-9. Delay-adjusted rates for SEER 13 were obtained through a two-step process. First, the delay adjustment factors are derived separately for SEER 9 and SEER 13-9.
Second, delay-adjusted age-specific case counts are computed for SEER 9 and SEER 13-9 using their respective delay adjustment factors. The SEER 9 and SEER 13-9 age-specific case counts are then combined as weighted averages (with the weights equal to the populations in each registry group) to compute the delay-adjusted case counts for SEER 13. These adjusted case counts are paired with the appropriate denominator to obtain age-specific rates, which are age-adjusted in the usual manner. The formula to compute the age-adjusted combined SEER 13 rates is:

![]()
Here i indexes the age group and j indexes the stratum defined by the variables included in the delay model.
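The combination described above can also be sketched in code. The sketch is one reading of the text, not the published formula: it assumes that, within each age group i, delay-adjusted counts are summed over strata j and over the two registry groups, divided by the combined population, and then weighted by the standard population proportions. The function name, the data layout, and every number in the example are hypothetical.

```python
def combined_age_adjusted_rate(groups, std_weights, per=100_000):
    """Delay-adjusted, age-adjusted rate for several registry groups combined.

    groups: list of dicts with
        "counts"     -> counts[i][j], observed cases for age group i, stratum j
        "factors"    -> factors[j], delay adjustment factor for stratum j
        "population" -> population[i], person-years for age group i
    std_weights: standard population proportions by age group (sum to 1).
    """
    rate = 0.0
    for i, weight in enumerate(std_weights):
        # Delay-adjusted cases in age group i, pooled over registry groups.
        adjusted_cases = sum(
            sum(c * f for c, f in zip(g["counts"][i], g["factors"]))
            for g in groups
        )
        population = sum(g["population"][i] for g in groups)
        rate += weight * adjusted_cases / population
    return per * rate


# Hypothetical inputs: two age groups, one stratum per registry group.
seer9 = {"counts": [[100], [200]], "factors": [1.021],
         "population": [100_000, 50_000]}
seer13_9 = {"counts": [[50], [80]], "factors": [1.018],
            "population": [60_000, 30_000]}
rate = combined_age_adjusted_rate([seer9, seer13_9], std_weights=[0.6, 0.4])
```

Dividing pooled adjusted counts by the pooled population is equivalent to a population-weighted average of the two groups' age-specific delay-adjusted rates, which is how the sketch interprets the weighting described in the text.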
Future developments will include an application program to allow the computation of SEER 13 combined delay adjusted rates based on this formula.
Consecutive data submissions were not available for the Alaska Native Tumor Registry for the entire period of interest. Modeling of SEER 13-9 was therefore conducted using only Los Angeles, San Jose-Monterey, and Rural Georgia, although the final delay adjustment factors were applied across all four registries.
In creating the SEER 13-9 model, we first followed the same process of model selection and delay adjustment factor estimation as was used for SEER 9. We then modified the SEER 13-9 factors to share the same delay adjustment factors as SEER 9 under the assumption that the data have the same delay distribution prior to 1992. The modified delay adjustment factors are then the final estimated factors for SEER 13-9. As with SEER 9, we also assume that in SEER 13-9 there is no delay after 26 years of reporting.
The example in the graph below shows white female breast cancer delay adjustment factors for SEER 9, SEER 13-9, and modified SEER 13-9 for diagnosis years 1981 through 2006. The maximum delay time for the registries in SEER 13-9 is only 15 years. We assume that the delay adjustment factors for SEER 13-9 are the same as those for SEER 9 when the delay time is greater than 15 years.

The black line represents the fit to the SEER 9 registries and assumes there is no delay after 26 years. The 26-year maximum delay was set because it spans all of the data submissions currently available in our archive (1983-present) and because updates to case information beyond 26 years are deemed minimal. The blue line represents the fit to the SEER 13-9 registries and has a maximum 15-year delay, which corresponds to the first NCI data submission for these registries in 1994.
The SEER 9 line shows that at 15 years there is a delay adjustment of 1.002 (a 0.2% further adjustment from 15 to 26 years). Finally, the red line corresponds to the modified delay adjustment factors for SEER 13-9, assuming that 0.2% also remains to be reported by SEER 13-9 after 15 years. The combined SEER 13 delay-adjusted rates for 2006 are obtained by using the delay adjustment factors for both SEER 9 (1.021) and SEER 13-9 (1.018).
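A minimal sketch of how the modified SEER 13-9 factors could be assembled follows. The text does not state the exact arithmetic, so the sketch assumes a multiplicative reading: up to the 15-year maximum delay, the fitted SEER 13-9 factor is scaled by the SEER 9 factor at 15 years, and beyond 15 years the SEER 9 factor is used unchanged. The function name and all factor values other than 1.002 and 1.021 are hypothetical.

```python
def modified_factors(seer9, seer13_9, max_delay_13_9=15):
    """Combine fitted factors, keyed by delay time in years.

    seer9:    delay -> factor, fitted with a 26-year maximum delay
    seer13_9: delay -> factor, fitted with a 15-year maximum delay
    """
    # Adjustment still outstanding at 15 years according to SEER 9.
    tail = seer9[max_delay_13_9]
    modified = {}
    for delay, factor9 in seer9.items():
        if delay > max_delay_13_9:
            # Beyond the SEER 13-9 data: borrow the SEER 9 factor.
            modified[delay] = factor9
        else:
            # Assumed multiplicative carry-over of the SEER 9 tail.
            modified[delay] = seer13_9[delay] * tail
    return modified


# Illustrative values; only 1.021 and 1.002 come from the text.
seer9 = {2: 1.021, 15: 1.002, 20: 1.001, 26: 1.0}
seer13_9 = {2: 1.016, 15: 1.0}
final = modified_factors(seer9, seer13_9)
```

Under this reading the modified factor at 15 years equals the SEER 9 value of 1.002, so the two curves join at that point instead of the SEER 13-9 curve dropping to 1.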
Cancer Sites and Variables
Delay-adjusted incidence rates and trends are reported for all cancers combined (malignant only except for urinary bladder), for female breast in situ, for urinary bladder (in situ and malignant), and for 22 malignant cancer sites: melanoma (for all races combined and whites only), lung/bronchus, colon/rectum, prostate, female breast, liver and intrahepatic bile duct, pancreas, cervix uteri, corpus and uterus, ovary, testis, kidney and renal pelvis, brain and other nervous system, Hodgkin lymphoma, non-Hodgkin lymphoma, all leukemias, esophagus, larynx, myeloma, oral cavity and pharynx, thyroid, and stomach.
A delay distribution models the probability of a cancer being reported after a delay of d years (d = 2, 3, ..., 25). The number of cancers reported at each delay year is assumed to follow a Poisson distribution. Cases are removed as corrections to the data are made, and the probability of removing cases is modeled as a binomial distribution. To reduce the number of parameters that have to be estimated and to achieve stability in the tails of the delay distributions, an assumption is made that all cancer cases will be reported within 25 years of diagnosis.
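The link between a delay distribution and a delay adjustment factor can be shown with a short sketch. It is a simplification that ignores case removals (the binomial component) and assumes the factor at delay d is the reciprocal of the cumulative probability of having been reported by d. The function name and the probabilities are hypothetical.

```python
def delay_adjustment_factors(delay_probs):
    """Map each delay d to 1 / P(case reported by delay d).

    delay_probs: delay in years -> probability that a case is first
    reported at that delay; the probabilities must sum to 1.
    """
    total = sum(delay_probs.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("delay probabilities must sum to 1")
    cumulative = 0.0
    factors = {}
    for delay in sorted(delay_probs):
        cumulative += delay_probs[delay]
        factors[delay] = 1.0 / cumulative
    return factors


# Toy distribution truncated at 5 years, for illustration only.
probs = {2: 0.98, 3: 0.01, 4: 0.005, 5: 0.005}
factors = delay_adjustment_factors(probs)
adjusted_count = 1000 * factors[2]  # 1000 observed cases at the first submission
```

With 98 percent of cases reported at the first submission, the factor is 1 / 0.98, or roughly 1.02, so 1000 observed cases correspond to about 1020 eventual cases. This matches the roughly two percent shortfall noted earlier for the most recent diagnosis year.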
The delay distributions were modeled as a function of covariates using a discrete-time proportional hazards model. The following potential covariates were included: age at diagnosis, sex, diagnosis year, delay time, and race/ethnicity. For each cancer site, a delay distribution was calculated for all races combined, and separate delay distributions were calculated for whites and for blacks. In the distributions for all races combined, if a patient's race value changed between two submission years, the change of value did not contribute to the delay model. For melanoma, only all races combined and whites were analyzed because melanoma is rare among blacks. Visit the Statistical Research & Applications Branch website for a complete list of covariates and this year's models for each cancer site.
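As an illustration of how covariates can shift a delay distribution, the sketch below uses one common formulation of a discrete-time proportional hazards model, in which the hazard at delay d is h(d | x) = 1 - (1 - h0(d)) ** exp(x'b). Whether the SEER models use exactly this parameterization is not stated here, so treat it as an assumption. The baseline hazards and the linear predictor are hypothetical.

```python
import math


def delay_distribution(baseline_hazards, linear_predictor):
    """Reporting probabilities by delay under a discrete-time PH model.

    baseline_hazards: list of (delay, h0) pairs in increasing delay order;
        h0 is the baseline probability of being reported at that delay given
        not yet reported. A final hazard of 1 forces complete reporting.
    linear_predictor: x'b for the covariate pattern of interest.
    """
    scale = math.exp(linear_predictor)
    not_yet_reported = 1.0
    probs = {}
    for delay, h0 in baseline_hazards:
        hazard = 1.0 - (1.0 - h0) ** scale
        probs[delay] = not_yet_reported * hazard
        not_yet_reported *= 1.0 - hazard
    return probs


# Hypothetical baseline with complete reporting at a delay of 4 years.
baseline = [(2, 0.9), (3, 0.5), (4, 1.0)]
reference = delay_distribution(baseline, 0.0)       # baseline covariates
faster = delay_distribution(baseline, math.log(2))  # hazard exponent doubled
```

In this toy example a positive linear predictor moves probability toward shorter delays (0.99 of cases at the first delay instead of 0.90), which in turn lowers the delay adjustment factor for that covariate pattern.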
