SEER*Stat Rate Exercise 1a: Using the Selection Tab

Age-adjustment minimizes the effect of a difference in age distributions when comparing rates. An age-adjusted rate is a weighted average of the age-specific (crude) rates, where the weights are the proportions of persons in the corresponding age groups of a standard population. The potential confounding effect of age is reduced when comparing age-adjusted rates computed using the same standard population.
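The weighted average described above can be sketched in a few lines of Python. This is a minimal illustration, not SEER*Stat's implementation: the function name, the three age groups, and all counts are hypothetical, and the weights are assumed to be each group's share of the standard population over the age groups included.

```python
def age_adjusted_rate(cases, population, standard, per=100_000):
    """Weighted average of age-specific rates.

    cases, population, standard: dicts keyed by the same age-group labels.
    Each weight is the group's share of the standard population, taken over
    the age groups included in the calculation (an assumption of this sketch).
    """
    if not (cases.keys() == population.keys() == standard.keys()):
        raise ValueError("age groups must match across all three data sources")
    std_total = sum(standard.values())
    rate = 0.0
    for group in cases:
        age_specific = cases[group] / population[group]
        weight = standard[group] / std_total
        rate += weight * age_specific
    return rate * per


# Hypothetical numbers for illustration only (not SEER data).
cases = {"65-74": 300, "75-84": 400, "85+": 150}
population = {"65-74": 100_000, "75-84": 50_000, "85+": 10_000}
standard = {"65-74": 60_000, "75-84": 30_000, "85+": 10_000}

adjusted = age_adjusted_rate(cases, population, standard)
crude = sum(cases.values()) / sum(population.values()) * 100_000
print(f"age-adjusted: {adjusted:.1f}, crude: {crude:.1f} per 100,000")
```

With these made-up inputs the adjusted rate differs from the crude rate because the study population is younger than the standard; using the same standard for two populations removes that difference in age structure from the comparison.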

Create a table showing incidence rates (age-adjusted to the 2000 U.S. standard) and frequencies for malignant lung and bronchus cancer. Calculate these rates for persons age 65 and older diagnosed from 2017 through 2021 in the SEER 22 Registries. Display the rates and frequencies by sex.

When you finish this exercise, save the matrix as "Rate Exercise 1a.sim". The saved matrix file will be used in future exercises.

Key Points

Keep in mind that the calculation of an age-adjusted rate requires 3 types of data: case, population, and standard population. When defining the analysis cohort on the Selection Tab, choose variables carefully to ensure that the numerators match the denominators and that the standard populations are for the appropriate cohort. Follow these guidelines:

Standard populations are the age distributions used as weights to calculate age-adjusted rates. These distributions are provided by age for a given geographic area and time period. To subset your data or display rates by age you must use an age variable that is defined in the same way in case, population, and standard population data. That is, the age variable must use the same age groups in the 3 types of data.
In SEER*Stat, the Age Variable for age-adjusting is always an age recode variable or a user-defined version of an age recode variable. Only the age recode variable that is linked with the population and standard population data can be used. This exercise uses the age recode variable with 19 age groups; other age recode variables use single ages.
Population counts are stratified by age, year, race, sex, and geographic area. To subset your data or display rates by these demographic characteristics you must use variables that are included in both the case and population files. For example, be careful not to use variables from the "Race and Age (case data only)" category of variables. SEER collects cancer case data for single ages and for more races than are defined in the populations from the U.S. Census Bureau.

SEER 22 Registry databases include data from registries that joined the SEER Program in 2018 or later. They have a more limited set of available variables than other databases and can be used for only a limited set of statistics. See Registry Groupings in SEER Data and Statistics for more information.
SEER*Stat can report exclusion counts with the output matrix properties. In order to report exclusion counts in rate sessions, the session must be run using individual records instead of summary files. The report is available under Properties... from the Matrix menu.

Step 1: Create a New Rate Session

Start SEER*Stat.
Start a new Rate Session from the File menu or use the corresponding option on the toolbar.

Step 2: Select a Database (Data Tab)

It is extremely important that you select the database as the first step. The other choices you will make in this session will be based on variables in the selected database. The correct database must be selected in order to see the correct list of variables in selection statements, table statements, and the dictionary editor.
On the Data Tab select "Incidence - SEER Research Limited-Field Data, 22 Registries, Nov 2023 Sub (2000-2021)".
When you select the SEER 22 Regs database in a rate session, a warning dialog will appear, alerting you that this is a linked database containing data from several sources. The dialog provides a link for more information.
Note that the warning has a checkbox labeled "Do not show this message in future". You can mark this checkbox before clicking OK to prevent this warning from being displayed in future sessions. If you have done so in the past, it will not be displayed now.
Make sure the Age Variable is set to "Age recode with <1 year olds".

Learn More...

The Age Variable selection on the Data Tab determines which age recode variable is available throughout the rest of the session on the statistic, selection, and table tabs, as well as in the dictionary editor.

The only databases listed on the Data Tab are databases appropriate for rate sessions.
See Preferences in the SEER*Stat help system for more information on the primary and secondary data locations.

Step 3: Choose the Statistics to Display (Statistic Tab)

In the Statistics box, select Rates (Age-Adjusted). Frequencies (and populations) are automatically included in the output when rates are calculated.
In the Parameters box:
- Make sure that the Standard Population is set to "2000 U.S. Std Population (19 age groups - Census P25-1130)".
- Make sure the Age Variable is set to "Age recode with <1 year olds".

Learn More...

Recode variables are derived from SEER data fields when SEER*Prep creates the database. SEER*Prep creates the age recode variable based on the age at diagnosis variable (single-year ages). The groupings used in the age recode variable are determined by the age groupings in the population data. This exercise uses the age recode variable with 19 age groups (< 1 year, 1-4 years, 5-9 years, ..., 85+ years).
To age-adjust using different age groupings, you may create a variable based on the age recode variable. That is, you can collapse the age groups in the database's age recode variable. For example, you could age-adjust using 10 year age groupings.

If you need finer control over the age groups, select on the Data Tab the age recode variable that uses single ages with one group for 85+.
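Collapsing age groups amounts to summing the finer groups into coarser ones, and the same mapping has to be applied to the case, population, and standard population data so that the groups still match. The following is a minimal sketch under that assumption; the mapping, the function name, and the counts are hypothetical and only illustrate the idea of a user-defined variable built from the age recode.

```python
from collections import defaultdict

# Hypothetical mapping from 5-year age recode groups to 10-year groups.
COLLAPSE = {
    "65-69 years": "65-74 years",
    "70-74 years": "65-74 years",
    "75-79 years": "75-84 years",
    "80-84 years": "75-84 years",
    "85+ years": "85+ years",
}


def collapse(counts, mapping=COLLAPSE):
    """Sum counts from fine age groups into the coarser groups in `mapping`."""
    merged = defaultdict(int)
    for group, value in counts.items():
        merged[mapping[group]] += value
    return dict(merged)


# Illustrative counts only (not SEER data).
cases = {
    "65-69 years": 120,
    "70-74 years": 180,
    "75-79 years": 210,
    "80-84 years": 190,
    "85+ years": 150,
}
collapsed_cases = collapse(cases)
print(collapsed_cases)
```

The same `collapse` call would be applied to the population and standard population counts before any rates are computed.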

Step 4: Understanding the Selection Tab in a Rate Session

Before you continue, please take time to consider the following:

As discussed in other tutorials, the statements on the Selection Tab define the subset used in your analysis. However, you will notice that in a rate session these statements are separated into three boxes.
The box at the top must be used to create selection statements based on variables that are found in all 3 types of data used to calculate age-adjusted rates. Age is the only variable used to stratify standard population, population, and case data. In this box, SEER*Stat will not allow you to create selection statements based on variables other than the age variable that is in all 3 data sources (in SEER databases this would be an age recode variable).
The middle box must be used to select records based on variables found in both the population and case data. The label above the box is provided as a guide: "Race, Sex, Year Dx" are the variables that are not in the standard population data but are in both the case and population data.
The third box must be used to select records based on variables that are found only in the case data. This would include cancer-specific variables such as stage at diagnosis, histology, site, etc. Be aware that there are demographic variables in case data files that are not included in the population or standard population data. These include marital status, place of birth, and alternate race, age, and date variables.
When making selections in a rate session, make your selection in the topmost possible box. For example, never make selections based on race or age in the case only box at the bottom. To correctly calculate rates, selections based on race must be made in the middle box and age must be made in the top box.
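The "topmost possible box" rule can be expressed as a small lookup: a variable belongs in the highest box whose required data sources all contain it. The sketch below is illustrative only. The `SOURCES` table and the `selection_box` function are hypothetical, and some variable names (for example "Race recode") are placeholders rather than names taken from the database dictionary.

```python
# Which data sources carry each variable. This table is an illustrative
# assumption, not SEER*Stat's internal dictionary.
SOURCES = {
    "Age recode with <1 year olds": {"case", "population", "standard"},
    "Race recode": {"case", "population"},
    "Sex": {"case", "population"},
    "Year of diagnosis": {"case", "population"},
    "Site recode ICD-O-3/WHO 2008": {"case"},
    "Marital status": {"case"},
}


def selection_box(variable):
    """Return the topmost Selection Tab box that can hold this variable."""
    sources = SOURCES[variable]
    if {"case", "population", "standard"} <= sources:
        return "top"
    if {"case", "population"} <= sources:
        return "middle"
    return "bottom"


for name in SOURCES:
    print(f"{name}: {selection_box(name)} box")
```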

Step 5: Defining the Analysis Cohort (Selection Tab)

Specific click-by-click instructions for creating individual selection statements were given in previous tutorials (see Frequency Exercise 1a). Use those techniques to create three selection statements. Be sure to consider each box on the Selection Tab in order as you review the Problem Statement.

Make sure that the Malignant Behavior option is checked in the Select Only box.
The Known Age option is checked and disabled. When calculating age-adjusted rates, all records must have values that are included in the U.S. Population and Standard Population data. Unknown age is not a valid value, so records with unknown ages are always excluded from the analysis.
The top box is for making selections based on age at diagnosis. The problem statement specifies that the rates should be calculated for persons age 65 and older. Use the top Edit button to create a statement selecting persons age 65 and older in the top box using the age recode variable. When finished, the statement should read:
{Age at Diagnosis.Age recode with <1 year olds} = '65-69 years','70-74 years','75-79 years','80-84 years','85+ years'
The middle box must be used to make selections based on race, sex, year of diagnosis, registry, or county. The problem statement specified that the rates be calculated for cases diagnosed from 2017-2021 in the SEER 22 Registries. The database selected on the Data Tab contains data from the SEER 22 Registries with cases diagnosed from 2000-2021. By selecting this database, you have automatically selected the correct registries. Therefore, you must make a selection based on year of diagnosis but not on registry. Use the middle Edit button to create a selection statement for cases diagnosed from 2017-2021. When finished, the statement in the middle box should read:
{Race, Sex, Year Dx.Year of diagnosis} = '2017', '2018', '2019', '2020', '2021'
The third box must be used to make all other selections, that is, selections based on variables that are only in the case data. Create a statement selecting lung and bronchus cancer in this box. When finished, the statement in the bottom box should read:
{Site and Morphology.Site recode ICD-O-3/WHO 2008} = ' Lung and Bronchus'

Step 6: Set Table Variables (Table Tab)

Move to the Table Tab.
It is important to understand the purpose of the Selection and Table tabs, because they are often confused. Before you continue, please consider why some variables are being used on the Table Tab while others were used on the Selection Tab.
- The Selection Tab is used to reduce the number of records included in an analysis based on specific variables. In this exercise, we excluded records on the Selection Tab based on age at diagnosis, years of diagnosis, cancer site and behavior. We do not want to reduce the number of records based on sex -- we simply want to control the way in which they are displayed in the table.
- The Table Tab is used to set display variables and does not affect the number of records analyzed in any way. In this exercise the statistics are to be shown by sex. Therefore, the sex variable needs to be used as a display variable on the Table Tab.
Select "Sex" from the "Race, Sex, Year Dx" category.
Click Row to add "Sex" as a row variable in the Display Variables box.
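The difference between the two tabs is the difference between filtering and grouping. As a rough analogy in Python (the records and field names are hypothetical, not SEER data), a selection removes records, while a table variable only splits the remaining records into rows:

```python
from collections import Counter

# Hypothetical case records for illustration only.
records = [
    {"sex": "Male", "year": 2016, "age_group": "65-69 years"},
    {"sex": "Male", "year": 2018, "age_group": "70-74 years"},
    {"sex": "Female", "year": 2019, "age_group": "60-64 years"},
    {"sex": "Female", "year": 2020, "age_group": "85+ years"},
    {"sex": "Female", "year": 2021, "age_group": "65-69 years"},
]
AGES_65_PLUS = {
    "65-69 years", "70-74 years", "75-79 years", "80-84 years", "85+ years",
}

# Selection Tab analogue: records failing a condition are dropped.
selected = [
    r for r in records
    if 2017 <= r["year"] <= 2021 and r["age_group"] in AGES_65_PLUS
]

# Table Tab analogue: the selected records are only grouped for display.
by_sex = Counter(r["sex"] for r in selected)
print(len(selected), dict(by_sex))
```

Grouping by sex leaves the total unchanged: the row counts always add up to the number of selected records.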

Step 7: Specify a Title (Output Tab)

Move to the Output Tab.
Enter the following title:

Lung and Bronchus Cancer
Incidence Rates for SEER 22 Registries, 2017-2021
Rate Exercise 1a

Set Display Rates as Cases Per to 100,000. SEER publications and reports typically show incidence rates expressed as the number of new cases per 100,000 population at risk.
Set the Number of Decimal Places for Rates/Trends. The output created in this sample exercise was created using a setting of 0.1.
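Both settings affect only how a rate is presented, not the underlying count and population. A minimal sketch, with a hypothetical helper name and made-up numbers:

```python
def format_rate(cases, population, per=100_000, decimals=1):
    """Scale a rate to 'cases per `per`' and format it to a fixed number of decimals."""
    rate = cases / population * per
    return f"{rate:.{decimals}f}"


# Illustrative numbers only (not SEER data).
per_100k = format_rate(123, 250_000)
per_100k_2dp = format_rate(123, 250_000, decimals=2)
print(per_100k, per_100k_2dp)
```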

Learn More...

The number of decimal places used can be changed after the session has been executed. In a results matrix you can change this setting for a selected column of data by setting the Filter from the right-click menu or the Matrix menu (see Applying a Filter in the SEER*Stat help system).

Step 8: Execute SEER*Stat and View the Results

Select Execute from the Session menu or use the option on the toolbar. (Execute Offline is a 3rd option that is described below in the Learn More section.)
An alert dialog will appear with a choice to execute the session using summary files or individual records. The default is to use summary files because they process faster, but no exclusion counts will be reported.
- Select "Use individual records (report exclusion counts)" to run the session without summary files. You can set this behavior as the default for every run.
A dialog will display the progress of the job. When the job completes a new window will open containing the output table or matrix.
Results shown in the SEER*Stat matrix window cannot be edited. You do have the ability to print the matrix, export the results to a text file, and copy-and-paste data into other applications. The Results Matrix section of the help system contains more information about the SEER*Stat matrix and its features.
Notice that only the first line of the title (as entered on the Output Tab) appears on the title bar. The entire title will appear on printouts.
To see the case exclusion counts, select Properties... from the Matrix menu. The report includes the total number of records read and the number of records excluded for each selection statement.

Learn More...

If you are running SEER*Stat in client-server mode you will also have the option to Execute Offline. This feature is intended for jobs that take several minutes or longer to complete. SEER*Stat automatically sends an email message to you as soon as the execution is complete. The email message includes a link to the location of the resultant matrix.

Step 9: Save and Print the Matrix

Use the Save As command on the File menu to save the matrix. Enter "Rate Exercise 1a" as the filename. SEER*Stat will assign the "sim" extension to indicate that this is a "SEER*Stat Rate Matrix" file.
Use the Print command on the File menu to print the matrix. You will notice that the session information is printed along with the results matrix. The session information serves as documentation for the results. Notice that the name of the matrix file is printed at the top of each page. More information regarding the matrix options and print options is included in the SEER*Stat help system.
Compare your results to this SEER*Stat matrix file: Exercise Matrix 1a Results.

Step 10: Using the Results in Other Software

Two methods can be used to take results from a SEER*Stat matrix and use them in another program:

Copy data from the matrix to the Windows clipboard. In the other program, paste the contents of the clipboard to the work space. This technique would work well for programs that allow the pasting of data, including most graphing packages such as Excel and PowerPoint. More specific instructions are provided in the SEER*Stat help system.
Export the data from the matrix to a delimited text file. Some programs, such as Excel, will allow you to open a delimited text file. In other programs, such as Joinpoint and DevCan, you must select the delimited text file as the input file. Please refer to Exporting Results in the SEER*Stat help system for instructions.
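Once the matrix has been exported, a delimited text file can be read with any general-purpose language. The sketch below uses Python's standard csv module. The tab delimiter, the column names, and the values are assumptions made for illustration; the actual layout depends on the export options chosen in SEER*Stat.

```python
import csv
import io

# Stand-in for an exported file. Delimiter, column names, and values are
# hypothetical; with a real export, use open(path, newline="") instead.
exported = io.StringIO(
    "Sex\tRate\tCount\tPopulation\n"
    "Male\t310.5\t1200\t400000\n"
    "Female\t250.1\t1100\t450000\n"
)

rows = list(csv.DictReader(exported, delimiter="\t"))
rates_by_sex = {row["Sex"]: float(row["Rate"]) for row in rows}
print(rates_by_sex)
```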