Explore three types of data used to calculate age-adjusted rates and use the Selection options to carefully define your analysis cohort.
Age-adjustment minimizes the effect of a difference in age distributions when comparing rates. An age-adjusted rate is a weighted average of the age-specific (crude) rates, where the weights are the proportions of persons in the corresponding age groups of a standard population. The potential confounding effect of age is reduced when comparing age-adjusted rates computed using the same standard population. Before proceeding, learn how to calculate age-adjusted rates.
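The weighted average described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not SEER*Stat's implementation. The function name `age_adjusted_rate` and the counts in the example are hypothetical, chosen so the result is easy to check by hand.

```python
def age_adjusted_rate(cases, populations, std_populations, per=100_000):
    """Directly age-adjusted rate: a weighted average of age-specific rates.

    Each argument is a sequence with one entry per age group, in the same order.
    """
    if not (len(cases) == len(populations) == len(std_populations)):
        raise ValueError("all three inputs must use the same age groups")
    # Weights are proportions of the standard population over the age groups
    # supplied, so they always sum to 1 for the selected cohort.
    total_std = sum(std_populations)
    rate = 0.0
    for count, pop, std in zip(cases, populations, std_populations):
        weight = std / total_std
        rate += weight * (count / pop)  # weighted age-specific rate
    return rate * per


# Hypothetical two-group example: weights are 0.75 and 0.25,
# age-specific rates are 100 and 300 per 100,000.
print(round(age_adjusted_rate([10, 30], [10_000, 10_000], [3, 1]), 1))  # 150.0
```

Because the weights come from the standard population rather than from the population under study, two cohorts adjusted to the same standard can be compared without their own age distributions driving the difference.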
Exercise
Create a table showing incidence rates (age-adjusted to the 2000 U.S. standard) and frequencies for malignant lung and bronchus cancer. Calculate these rates for persons age 65 and older diagnosed from 2017 through 2021 in the SEER 22 Registries. Display the rates and frequencies by sex.
When you finish this exercise, save the matrix as "Rate Exercise 1a.sim". The saved matrix file will be used in future exercises.
Key Points
- Keep in mind that the calculation of an age-adjusted rate requires three types of data: case, population, and standard population. When defining the analysis cohort on the Selection options, choose variables carefully to ensure that the numerators match the denominators and that the standard populations are for the appropriate cohort. Follow these guidelines:
- Standard populations are the age distributions used as weights to calculate age-adjusted rates. These distributions are provided by age for a given geographic area and time period. To subset your data or display rates by age you must use an age variable that is defined in the same way in case, population, and standard population data. That is, the age variable must use the same age groups in the three types of data.
- In SEER*Stat, the age variable for age-adjusting is always an age recode variable or a user-defined version of an age recode variable. Only the age recode variable that is linked with the population and standard population data can be used. This exercise uses the age recode variable with 19 age groups; other age recode variables use single-year ages.
- Population counts are stratified by age, year, race, sex, and geographic area. To subset your data or display rates by these demographic characteristics you must use variables that are included in both the case and population files. For example, be careful not to use variables from the "Race and Age (case data only)" category of variables. SEER collects cancer case data for single ages and for more races than are defined in the populations from the U.S. Census Bureau.
- SEER 22 Registry databases include data from registries that joined the SEER Program in 2018 or later. They have a more limited set of available variables than other databases and are only available for limited statistics. See Registry Groupings in SEER Data and Statistics for more information.
- SEER*Stat can report exclusion counts with the output matrix properties. In order to report exclusion counts in rate sessions, the session must be run using individual records instead of summary files. The report is available under Session Properties from the Matrix window.
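The requirement that case, population, and standard population data share the same age groups can be pictured as a simple consistency check. The sketch below assumes each data source has been summarized as a dictionary keyed by age group label. That layout and the function name `age_groups_match` are hypothetical and are not part of SEER*Stat.

```python
def age_groups_match(case_counts, pop_counts, std_pop):
    """Return True only if all three inputs are keyed by the same age groups."""
    return set(case_counts) == set(pop_counts) == set(std_pop)


# Hypothetical counts keyed by age recode groups.
pop = {"65-69 years": 1000, "70-74 years": 800}
std = {"65-69 years": 40, "70-74 years": 30}

# Case data grouped the same way can be combined with pop and std.
print(age_groups_match({"65-69 years": 5, "70-74 years": 7}, pop, std))  # True

# Case data keyed by single ages cannot be lined up with the other two.
print(age_groups_match({"65": 1, "66": 2}, pop, std))  # False
```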
Instructions
Step 1: Create a New Rate Session
- Start SEER*Stat.
- Start a new Rate Session from the New Session menu. The Select Database dialog opens.
Step 2: Select a Database
It is extremely important that you select the database as the first step. The other choices you will make in this session will be based on variables in the selected database. The correct database must be selected in order to see the correct list of variables in selection statements, table statements, and the dictionary editor.
- On the Select Database dialog, select "Incidence - SEER Research Limited-Field Data, 22 Registries, Nov 2023 Sub (2000-2021)" and press the OK button.
- When you select the SEER 22 Regs database in a rate session, a warning dialog appears with a linked database alert that the database contains data from several sources. A link for more information is provided. Note that the warning has a checkbox labeled, Do not show this message in future. Mark this checkbox to prevent this warning from being displayed in future sessions. If you have done so in the past, it will not be displayed now. Press the OK button on the warning dialog, if it opened.
- Make sure the Age Variable is set to "Age recode with <1 year olds".
- The Age Variable selection on the Data options determines which age recode variable is available throughout the rest of the session on the statistic, selection, and table options, as well as in the dictionary editor.
- The only databases listed on the Data options Select Database dialog are databases appropriate for rate sessions.
Step 3: Choose the Statistics to Display
- Select Statistic from the sidebar menu.
- In the Statistics box, select Rates (Age-Adjusted). Frequencies (and populations) are automatically included in the output when rates are calculated.
- In the Parameters box, verify that:
- Standard Population is set to "2000 US Std Population (19 age groups - Census P25-1130)".
- Age Variable is set to "Age recode with <1 year olds".
- Recode variables are derived from SEER data fields when SEER*Prep creates the database. SEER*Prep creates the age recode variable based on the age at diagnosis variable (single-year ages). The groupings used in the age recode variable are determined by the age groupings in the population data. This exercise uses the age recode variable with 19 age groups (00 years, 1-4 years, 5-9 years, ..., 85+ years).
- To age-adjust using different age groupings, you may create a variable based on the age recode variable. That is, you can collapse the age groups in the database's age recode variable. For example, you could age-adjust using 10 year age groupings.
- If you need finer control over the age groups, you would select the age recode variable with single age groups, with one group for 85+ on the data options.
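Collapsing age groups amounts to summing the counts of the finer groups into coarser ones, using the same mapping for case, population, and standard population data. The Python sketch below illustrates that idea only; it is not how the SEER*Stat dictionary editor works internally. The `collapse` function, the mapping, and the counts are hypothetical.

```python
def collapse(counts, mapping):
    """Sum counts from fine age groups into the coarser groups named in mapping."""
    collapsed = {}
    for fine_group, value in counts.items():
        coarse_group = mapping[fine_group]
        collapsed[coarse_group] = collapsed.get(coarse_group, 0) + value
    return collapsed


# Hypothetical mapping from 5-year groups to 10-year groups for ages 65+.
mapping = {
    "65-69 years": "65-74 years",
    "70-74 years": "65-74 years",
    "75-79 years": "75-84 years",
    "80-84 years": "75-84 years",
    "85+ years": "85+ years",
}

# Made-up case counts; the same mapping would be applied to the
# population and standard population counts.
cases = {
    "65-69 years": 5,
    "70-74 years": 7,
    "75-79 years": 4,
    "80-84 years": 3,
    "85+ years": 2,
}
print(collapse(cases, mapping))
# {'65-74 years': 12, '75-84 years': 7, '85+ years': 2}
```

Groups can only be merged, never split: 10-year groups can be built from 5-year groups, but finer groups require starting from the single-age recode variable.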
Step 4: Define the Analysis Cohort with the Selection Options
As discussed in other tutorials, the statements on the Selection options define the subset used in your analysis. However, you will notice that in a rate session these statements are separated into three boxes:
- The Age at Diagnosis (Std Pop, Pop, Case Files) box at the top must be used to create selection statements based on variables that are found in all three types of data used to calculate age-adjusted rates. Age is the only variable used to stratify standard population, population, and case data. In this box, SEER*Stat will not allow you to create selection statements based on variables other than the age variable that is in all three data sources (in SEER databases this would be an age recode variable).
- The Race, Sex, Year Dx (Pop, Case Files) middle box must be used to select records based on variables found in both the population and case data. The label above the box is provided as a guide: "Race, Sex, Year Dx" are the variables that are not in the standard population data but are in both the case and population data.
- The third, Other (Case Files), box must be used to select records based on variables that are found only in the case data. This would include cancer-specific variables such as stage at diagnosis, histology, site, etc. Be aware that there are demographic variables in case data files that are not included in the population or standard population data. These include marital status, place of birth, and alternate race, age, and date variables.
When making selections in a rate session, make your selection in the topmost possible box. For example, never make selections based on race or age in the case only box at the bottom. To correctly calculate rates, selections based on race must be made in the middle box and age must be made in the top box. Specific click-by-click instructions for creating individual selection statements were given in previous tutorials (see Frequency Exercise 1a). Use those techniques to create three selection statements. Be sure to consider each box on the Selection options in order as you review the problem statement.
- Select Selection from the sidebar menu.
- Make sure that the Malignant Behavior option is checked in the Select Only box.
- Verify that the Known Age option is checked and disabled. When calculating age-adjusted rates, all records must have values that are included in the U.S. Population and Standard Population data. Unknown age is not a valid value, so records with unknown ages are always excluded from the analysis.
- The Age at Diagnosis (Std Pop, Pop, Case Files) box is for making selections based on age at diagnosis. The problem statement specifies that the rates should be calculated for persons age 65 and older. Use the top Edit button to create a statement selecting persons age 65 and older using the age recode variable. When finished, the Age at Diagnosis (Std Pop, Pop, Case Files) statement should read:
{Age at Diagnosis.Age recode with <1 year olds} = '65-69 years','70-74 years','75-79 years','80-84 years','85+ years'
- The Race, Sex, Year Dx (Pop, Case Files) box must be used to make selections based on race, sex, year of diagnosis, registry, or county. The problem statement specifies that the rates be calculated for cases diagnosed from 2017-2021 in the SEER 22 Registries. The database selected on the Data options contains data from the SEER 22 Registries with cases diagnosed from 2000-2021. By selecting this database, you have automatically selected the correct registries. Therefore, you must make a selection based on year of diagnosis but not on registry. Use the middle Edit button to create a selection statement for cases diagnosed from 2017-2021. When finished, the Race, Sex, Year Dx (Pop, Case Files) statement in the middle box should read:
{Race, Sex, Year Dx.Year of diagnosis} = '2017', '2018', '2019', '2020', '2021'
- The Other (Case Files) box must be used to make all other selections, that is, selections based on variables that are only in the case data. Create a statement selecting lung and bronchus cancer in this box. When finished, the statement in the bottom box should read:
{Site and Morphology.Site recode ICD-O-3/WHO 2008} = ' Lung and Bronchus'
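To see why each selection belongs in the topmost possible box, it can help to picture the three data sources as separate tables and to apply each filter to every table that carries the variable. The Python sketch below is a simplified model under that assumption. The record layout (`age`, `year`, `site` keys) and the function name `select_cohort` are hypothetical and do not reflect SEER*Stat's file formats.

```python
AGES_65_PLUS = {"65-69 years", "70-74 years", "75-79 years",
                "80-84 years", "85+ years"}
YEARS = {2017, 2018, 2019, 2020, 2021}
SITE = "Lung and Bronchus"


def select_cohort(cases, populations, std_populations):
    """Apply each selection to every data source that contains the variable."""
    # Top box: age is in all three sources, so it filters all three.
    std = [r for r in std_populations if r["age"] in AGES_65_PLUS]
    # Middle box: year of diagnosis is in the population and case data only.
    pop = [r for r in populations
           if r["age"] in AGES_65_PLUS and r["year"] in YEARS]
    # Bottom box: cancer site exists only in the case data.
    case = [r for r in cases
            if r["age"] in AGES_65_PLUS
            and r["year"] in YEARS
            and r["site"] == SITE]
    return case, pop, std
```

If the age or year filter were applied to the case records alone, the numerator would shrink while the denominator and the weights still covered everyone, and the resulting rate would be too low.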
Step 5: Set Table Variables
It is important to understand the distinct purposes of the Selection and Table options, because the two are often confused. Before you continue, please consider why some variables are being used on the Table options while others were used on the Selection options.
- Selection options are used to reduce the number of records included in an analysis based on specific variables. In this exercise, we excluded records based on age at diagnosis, years of diagnosis, cancer site and behavior. We do not want to reduce the number of records based on sex; rather, we simply want to control the way in which they are displayed in the table.
- Table options are used to set display variables and do not affect the number of records analyzed in any way. In this exercise the statistics are to be shown by sex. Therefore, the sex variable needs to be used as a display variable on the Table options.
- Move to the Table options.
- Select "Sex" from the "Race, Sex, Year Dx" category.
- Click Row to add "Sex" as a row variable in the Display Variables box.
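The difference between selecting and displaying can be shown with a small Python sketch: grouping records by a display variable splits them into rows but never changes how many records are analyzed. The records and the helper name `count_by` are hypothetical.

```python
from collections import Counter


def count_by(records, variable):
    """Tabulate records by a display variable without dropping any of them."""
    return Counter(record[variable] for record in records)


# Made-up records that have already passed the Selection options.
records = [{"sex": "Male"}, {"sex": "Female"}, {"sex": "Male"}]
by_sex = count_by(records, "sex")

# Two rows in the table, but the total still equals the number of records.
print(by_sex["Male"], by_sex["Female"], sum(by_sex.values()))  # 2 1 3
```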
Step 6: Specify a Title
- Move to the Output options.
- Enter the following title:
Lung and Bronchus Cancer
Incidence Rates for SEER 22 Registries, 2017-2021
Rate Exercise 1a
- Set Display Rates as Cases Per to 100,000. SEER publications and reports typically show incidence rates expressed as the number of new cases per 100,000 population at risk.
- Set the Number of Decimal Places for Rates/Trends. The output for this sample exercise was created using a setting of 0.1.
The number of decimal places used can be changed after the session has been executed. In a results matrix you can change this setting for a selected column of data by setting the Filter on the right mouse menus (see Applying a Filter for more information).
Step 7: Execute SEER*Stat and View the Results
- Select Execute from the Actions menu. An alert dialog will appear with a choice to execute the session using summary files or individual records. The default is to use summary files because they process faster, but no exclusion counts will be reported.
- Select Use individual records (report exclusion counts) to run the session without summary files. You can set this behavior as the default for every run.
- Press the OK button on the alert. A dialog will display the progress of the job. When the job completes a new window will open containing the output table or matrix. Results shown in the SEER*Stat matrix window cannot be edited. You do have the ability to print the matrix, export the results to a text file, and copy-and-paste data into other applications. See Results Matrix for more information about the SEER*Stat matrix and its features.
- To see the case exclusion counts, select Session Properties from the Matrix window sidebar menu. The report includes the total number of records read and the number of records excluded for each selection statement.
- If you are running SEER*Stat in client-server mode you will also have the option to Execute Remotely. This feature is intended for jobs that take several minutes or longer to complete. SEER*Stat automatically sends an email message to you as soon as the execution is complete. The email message includes a link to the location of the resultant matrix.
- To change the default option for the "Exclusion Counts and Summary Files", please go to Edit Profile Preferences.
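An exclusion report of the kind described above can be pictured as a running tally kept while records are read one by one. The sketch below assumes that statements are applied in order and that each record is charged to the first statement it fails. That ordering is an assumption made for illustration, not documented SEER*Stat behavior, and `exclusion_report` and the sample records are hypothetical.

```python
def exclusion_report(records, statements):
    """Apply (label, keep) statements in order and tally exclusions.

    Returns the records that pass every statement and a report dictionary.
    """
    report = {"records read": len(records)}
    kept = records
    for label, keep in statements:
        remaining = [r for r in kept if keep(r)]
        report[f"excluded by {label}"] = len(kept) - len(remaining)
        kept = remaining
    report["records selected"] = len(kept)
    return kept, report


# Made-up records and two selection statements.
records = [
    {"year": 2018, "site": "Lung and Bronchus"},
    {"year": 2010, "site": "Lung and Bronchus"},
    {"year": 2019, "site": "Breast"},
]
statements = [
    ("year of diagnosis", lambda r: 2017 <= r["year"] <= 2021),
    ("site", lambda r: r["site"] == "Lung and Bronchus"),
]
selected, report = exclusion_report(records, statements)
print(report)
```

A tally like this needs every individual record, which is consistent with the note that exclusion counts are not reported when the session runs from summary files.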
Step 8: Save and Print the Matrix
- Use the Save As command on the File menu to save the matrix. Enter "Rate Exercise 1a" as the filename. SEER*Stat will assign the "sim" extension to indicate that this is a "SEER*Stat Rate Matrix" file.
- Use the Print command on the File menu to print the matrix. You will notice that the session information is printed along with the results matrix. The session information serves as documentation for the results. Notice that the name of the matrix file is printed at the top of each page. For more information, see the matrix options and print options topics.
- Compare your results to this SEER*Stat matrix file: Exercise Matrix 1a Results (sim, 21.4 KB).
Step 9: Use the Results in Other Software
Two methods can be used to take results from a SEER*Stat matrix and use them in another program:
- Copy data from the matrix to the Windows clipboard. In the other program, paste the contents of the clipboard to the work space. This technique would work well for programs that allow the pasting of data, including most graphing packages such as Excel and PowerPoint. See Copying Results to a Windows Clipboard for specific instructions.
- Export the data from the matrix to a delimited text file. Some programs, such as Excel, will allow you to open a delimited text file. In other programs, such as Joinpoint and DevCan, you must select the delimited text file as the input file. Please refer to Exporting Results for instructions.
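Once the matrix has been exported, the delimited text file can also be read by a script. The Python sketch below uses the standard `csv` module and assumes a tab-delimited export with a header row. The delimiter, the column names (`Sex`, `Rate`, `Count`), and the values are hypothetical stand-ins; the actual layout depends on the export settings you choose.

```python
import csv
import io


def read_export(text, delimiter="\t"):
    """Parse delimited text with a header row into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


# Made-up export contents; a real file would be read from disk with
# open(path, newline="") and passed to csv.DictReader in the same way.
sample = "Sex\tRate\tCount\nMale\t1.5\t10\nFemale\t2.5\t20\n"
rows = read_export(sample)
print(rows[0]["Sex"], float(rows[1]["Rate"]))  # Male 2.5
```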