Define groupings for variables with unlabeled values.
This exercise focuses on the techniques for working with variables that have unlabeled values. Most variables in the database have labeled values (e.g., the values for race are "White", "Black", etc.), but labeled values are not practical for all variables. For example, histologic type has nearly 2,000 values represented as integers (8000-9993). While there are histologic type variables with labeled values available in the database, it is sometimes easier to use numeric ranges to group the values when creating user-defined variables.
This tutorial was designed exclusively to explain the steps to create and modify groupings for variables with unlabeled values. If you are just getting started with SEER*Stat, be sure to do the introductory tutorials first.
Exercise
Create a table showing incidence rates (age-adjusted to the 2000 U.S. Std Population) and frequencies for malignant lung and bronchus cancer. Calculate these for persons age 65 and older diagnosed from 2017-2021 in the SEER 22 Registries.
Display the rates and frequencies by year of diagnosis (include the 2017-2021 total), sex, and histologic type. Use the following histology groupings: All Histologies (histology codes = 8000-9993); Small Cell (histology codes = 8002, 8041-8045); and All Histologies Excluding Small Cell (histology codes not included in the small cell ranges).
Create a table with a row for each year and a column for each sex. Show the histology groups on separate pages. The only difference between this exercise and Rate Exercise 1b is the addition of histologic type as a table variable. Therefore, you may start by extracting the session from the matrix file saved for Rate Exercise 1b.
Key Points
- Each variable in the database has a default set of "groupings". A grouping is a label associated with a value or set of values. As you will see, the histology variable has only one grouping (all values combined). To display data for individual histologies or grouped values, you need to create a variable.
- The "Histologic Type ICD-O-3" variable was selected for this exercise as an example of a variable with unlabeled values. In Rate Exercise 1b, "Year of diagnosis" was used as an example of a labeled variable. You will see that the steps for editing the two types of variables differ slightly.
- The definitions for Small Cell Lung and Bronchus cancer and All Excluding Small Cell Lung and Bronchus cancer require histology and site information. The SEER site recode variables, available in SEER databases, are derived from primary site and histology. By definition, a site recode value of Lung and Bronchus does not include lymphomas or leukemias (or, in some cases, mesotheliomas and Kaposi sarcoma). These are defined in separate groupings in the site recode variables. Although the histology groupings in this exercise include values for lymphomas, leukemias, mesotheliomas, and Kaposi sarcoma, those cases will not be included in the analysis because only records with Site recode ICD-O-3/WHO 2008 = Lung and Bronchus are selected.
- Add All and Add Rest are time-saving features that can be used when editing variables. These shortcuts are demonstrated in this exercise.
Instructions
Step 1: Create a Rate Session
The only difference between this exercise and Rate Exercise 1b is the addition of histologic type as a table variable. Therefore, you can either create a new session or skip a few steps by extracting the session from 1b's results matrix. If you do not have 1b's matrix but feel comfortable with the basic steps in SEER*Stat, you can use our version of Exercise 1b Results Matrix (sim, 23.5 KB).
- Start SEER*Stat.
- Create a new Rate Session either by
- Starting a new Rate Session from the New Session menu and proceeding to Step 2, or
- Using Rate Exercise 1b and proceeding to Step 6:
- Open the file saved in exercise 1b. The filename should be "Rate Exercise 1b.sim".
- From the Matrix window Actions menu, select Retrieve Session. Two windows should now be open.
- Close the matrix window containing the results calculated in exercise 1b. You should now have one window labeled "Rate Session-x" where x is the number of rate session windows that you have created since starting SEER*Stat.
- Verify that the settings on the Data, Statistic, Selection, and Table options match those described in Steps 2-5, and then follow the instructions starting in Step 6.
SEER*Stat matrix files include the session information used to generate the table. This information serves as documentation for the results and provides a convenient method for generating similar statistics.
Step 2: Select a Database
It is extremely important that you select the database as the first step. The other choices you will make in this session will be based on variables in the selected database.
- On the Data options, select "Incidence - SEER Research Limited-Field Data, 22 Registries, Nov 2023 Sub (2000-2021)".
- Make sure the Age Variable is set to "Age recode with <1 year olds".
The only databases listed on the Data options Select Database dialog are databases appropriate for rate sessions that are available from the Data Locations defined on the Profile dialog. There are two types of data locations, server and local:
- A server data location is the address of a dedicated computer that stores databases and can perform SEER*Stat analyses.
- A local data location is the address of a directory on your computer or local network in which SEER*Stat databases are stored.
Step 3: Choose the Statistics to Display
- Move to the Statistic options.
- In the Statistic box, select Rates (Age-Adjusted).
- In the Parameters box, make sure:
- Standard Population is set to "2000 US Std Population (19 age groups - Census P25-1130)".
- Age Variable is set to "Age recode with <1 year olds".
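To make the Rates (Age-Adjusted) statistic more concrete, here is a minimal Python sketch of direct age adjustment for the 65-and-older cohort used in this exercise. It is not SEER*Stat's implementation. It assumes the standard population weights are renormalized over the selected age groups, and the case counts and populations are hypothetical. The weights are intended to match the published 2000 U.S. standard million for these five age groups and should be checked against the official table before being relied on.

```python
# Standard population weights (per million) for the five age groups in this
# exercise. Intended to match the 2000 U.S. standard million; verify against
# the official table.
STD_MILLION_2000 = {
    "65-69 years": 34264,
    "70-74 years": 31773,
    "75-79 years": 26999,
    "80-84 years": 17842,
    "85+ years": 15508,
}


def age_adjusted_rate(cases, population, std_weights, per=100_000):
    """Weighted average of age-specific rates (direct method).

    Assumption: weights are renormalized so they sum to 1 over the
    age groups present in `cases`.
    """
    total_weight = sum(std_weights[group] for group in cases)
    rate = 0.0
    for group, count in cases.items():
        age_specific_rate = count / population[group]
        rate += age_specific_rate * std_weights[group] / total_weight
    return rate * per


# Hypothetical counts and populations, for illustration only.
cases = {
    "65-69 years": 120,
    "70-74 years": 150,
    "75-79 years": 140,
    "80-84 years": 100,
    "85+ years": 80,
}
population = {
    "65-69 years": 50_000,
    "70-74 years": 40_000,
    "75-79 years": 30_000,
    "80-84 years": 20_000,
    "85+ years": 15_000,
}

print(round(age_adjusted_rate(cases, population, STD_MILLION_2000), 1))
```

The result is expressed per 100,000 population, which is the scale this exercise sets on the output options.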
Step 4: Define the Analysis Cohort
Specific click-by-click instructions for creating individual selection statements were given in previous tutorials (see Frequency Exercise 1a). Use those techniques to create three selection statements. Be sure to consider each box on the Selection options, from top to bottom, as you review the problem statement.
- Make sure that the Malignant Behavior option is checked in the Select Only box.
- Verify that the Known Age option is checked and disabled. When calculating age-adjusted rates, all records must have values that are included in the U.S. Population and Standard Population data. Unknown age is not a valid value, so records with unknown ages are always excluded from the analysis.
- The Age at Diagnosis (Std Pop, Pop, Case Files) box is for making selections based on age at diagnosis. The problem statement specifies that the rates should be calculated for persons age 65 and older. Use the top Edit button to create a statement selecting persons age 65 and older using the age recode variable. When finished, the statement should read:
{Age at Diagnosis.Age recode with <1 year olds} = '65-69 years','70-74 years','75-79 years','80-84 years','85+ years'
- The Race, Sex, Year Dx (Pop, Case Files) box must be used to make selections based on race, sex, and year of diagnosis. The problem statement specified that the rates be calculated for cases diagnosed from 2017-2021 in the SEER 22 Registries. The database selected on the Data options contains data from the SEER 22 Registries with cases diagnosed from 2000-2021. By selecting this database, you have automatically selected the correct registries. Therefore, you must make a selection based on year of diagnosis but not on registry. Use the middle Edit button to create a selection statement for cases diagnosed from 2017-2021. When finished, the statement in the middle box should read:
{Race, Sex, Year Dx.Year of diagnosis} = '2017', '2018', '2019', '2020', '2021'
- The Other (Case Files) box must be used to make all other selections, that is, selections based on variables that are only in the case data. Create a statement selecting lung and bronchus cancer in this box. When finished, the statement in the bottom box should read:
{Site and Morphology.Site recode ICD-O-3/WHO 2008} = ' Lung and Bronchus'
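The combined effect of the Select Only box and the three selection statements above can be sketched as a single filter: a record is kept only if it satisfies every condition. The Python below is an illustration, not SEER*Stat's internal logic. The record layout and the field names (behavior, age_recode, year_dx, site_recode) are hypothetical, as are the sample records.

```python
AGE_65_PLUS = {
    "65-69 years", "70-74 years", "75-79 years", "80-84 years", "85+ years",
}
YEARS_DX = {"2017", "2018", "2019", "2020", "2021"}
SITE = "Lung and Bronchus"


def in_cohort(record):
    """Return True when a record meets all of the selection criteria."""
    return (
        record["behavior"] == "Malignant"          # Select Only: Malignant Behavior
        and record["age_recode"] in AGE_65_PLUS    # also drops unknown age
        and record["year_dx"] in YEARS_DX
        and record["site_recode"] == SITE
    )


# Hypothetical records, for illustration only.
records = [
    {"behavior": "Malignant", "age_recode": "70-74 years",
     "year_dx": "2019", "site_recode": "Lung and Bronchus"},
    {"behavior": "Malignant", "age_recode": "60-64 years",
     "year_dx": "2019", "site_recode": "Lung and Bronchus"},
    {"behavior": "Malignant", "age_recode": "85+ years",
     "year_dx": "2015", "site_recode": "Lung and Bronchus"},
    {"behavior": "Malignant", "age_recode": "Unknown",
     "year_dx": "2020", "site_recode": "Lung and Bronchus"},
]

cohort = [record for record in records if in_cohort(record)]
print(len(cohort))
```

Only the first sample record passes every test. The others fail on age group, year of diagnosis, or unknown age, respectively.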
Step 5: Set the Display Variables
The exercise specifies that the incidence rates are to be displayed by year, sex, and histologic type (All Histologies, Small Cell, All excluding Small Cell). The variables are listed in categories in the Available Variables box on the Table options.
- First, add year as a row variable. This variable may be available in the "User-Defined" category (if you did Rate Exercise 1b and saved the variable to the dictionary). If the variable is not available, create it now. Click-by-click instructions for creating variables were given in the previous tutorial.
- Next, add sex as a column variable.
Step 6: Create a Histology Variable
The values of the "Histologic Type ICD-O-3" variable would be grouped in different ways for the various types of cancer; therefore, it only has one grouping defined by the minimum and maximum values (shown in the Unlabeled Values box on the Edit Variable dialog). A new variable is needed for this exercise that has three groupings (All Histologies, Small Cell, and All Excluding Small Cell). Work through the steps below to create the three groupings and learn about the Add All and Add Rest features.
- In the Available Variables box, use the "+" to expand the "Site and Morphology" category, double-click "Histologic Type ICD-O-3" to open the dictionary, and select the Create button to view the values and groupings for this variable. The Edit Variable dialog opens.
- Edit the Name field and give the variable this name: "Hist (Lung: Small Cell, All Excl Small Cell)".
- Delete the 8000-9993 Grouping.
- In the Groupings box on the left side of the screen, select the "8000-9993" grouping. Looking to the right side of the screen, you will see that the values in this grouping (shown in the Selected text box) match the range of all possible values (shown just above Selected). This is a grouping required for this exercise. However, in order to learn new features on this dialog, please delete this grouping.
- Click the Delete button that is just below the Groupings box (or you could use the DELETE key on your keyboard). The Groupings box should now be empty.
- Create the Small Cell grouping (Values: 8002,8041-8045).
- Type "8002,8041-8045" in the Selected values box on the right.
- Click the Add button.
- On the Add Selected Values dialog, the Added as One Grouping (all values combined) option is selected. Click the OK button to add the specified range as a single grouping.
- Change the name of the grouping to "Small Cell".
- Create the All Excluding Small Cell grouping. By definition, this grouping is all values that are not small cell histologies. Therefore, this grouping could be defined as "everything not used in the currently defined groupings."
- Click the Add Rest button. An Add Rest dialog opens.
- Select Added as One Grouping (all values combined) and press the OK button.
- Change the name of the grouping to "All Excl Small Cell".
- With the All Excl Small Cell grouping selected, look to the right of the screen. The values in this grouping should be 8000-8001, 8003-8040, 8046-9993. These values are fixed at the time the grouping is created: subsequent changes to the "Small Cell" grouping will not affect the "All Excl Small Cell" grouping. You would have to edit or recreate the "All Excl Small Cell" grouping yourself.
- Create the All Histologies grouping. SEER*Stat also provides a convenient method for adding all values of a variable. These can be added in a single grouping (all values combined) or in separate groupings (one per value).
- Click the Add All button. An Add All dialog opens.
- Be sure that Added as One Grouping (all values combined) is selected and press the OK button.
- Change the name of the grouping to "All Histologies".
- Use the Up button located at the bottom of the Groupings box to move the "All Histologies" grouping to the top of the list. The order in which the groupings appear in the Edit Variable dialog is the order they will appear in the output matrix.
- Press the OK button on the Edit Variable dialog and close the Dictionary.
- Add the new user-defined histology variable as a page variable.
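The grouping logic in this step can be sketched with ordinary set operations. The Python below illustrates what Add, Add Rest, and Add All produce. It is not SEER*Stat code, and the helper names parse_values and format_values are hypothetical. It also treats every integer from 8000 to 9993 as a possible value, which is a simplification.

```python
def parse_values(text):
    """Expand a string such as '8002,8041-8045' into a set of integer codes."""
    codes = set()
    for part in text.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-")
            codes.update(range(int(low), int(high) + 1))
        else:
            codes.add(int(part))
    return codes


def format_values(codes):
    """Collapse a set of integer codes back into range notation."""
    ordered = sorted(codes)
    if not ordered:
        return ""
    parts = []
    start = prev = ordered[0]
    for code in ordered[1:]:
        if code != prev + 1:
            parts.append(str(start) if start == prev else f"{start}-{prev}")
            start = code
        prev = code
    parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ", ".join(parts)


ALL_VALUES = parse_values("8000-9993")

groupings = {}
# Add: the typed values become one grouping.
groupings["Small Cell"] = parse_values("8002,8041-8045")
# Add Rest: every value not used by the groupings defined so far.
used = set().union(*groupings.values())
groupings["All Excl Small Cell"] = ALL_VALUES - used
# Add All: every value, regardless of what is already used.
groupings["All Histologies"] = set(ALL_VALUES)

# Up button: put "All Histologies" first, matching the order of the pages.
order = ["All Histologies", "Small Cell", "All Excl Small Cell"]
groupings = {name: groupings[name] for name in order}

for name, codes in groupings.items():
    print(f"{name}: {format_values(codes)}")
```

Because Add Rest takes a snapshot of the unused values, editing "Small Cell" afterward leaves "All Excl Small Cell" unchanged in this sketch as well. The groupings may also overlap: "All Histologies" contains every value that appears in the other two.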
Step 7: Specify a Title
- Enter the following title:
Lung and Bronchus Cancer
Incidence Rates for SEER 22 Registries, 2017-2021
Rate Exercise 2
- Set Display Rates as Cases Per to 100,000. SEER publications and reports typically show incidence rates expressed as the number of new cases per 100,000 population at risk.
- Set the Number of Decimal Places for Rates/Trends to 0.1. The sample output for this exercise was created using that setting.
Step 8: Execute SEER*Stat
- Select Execute from the Actions menu. A dialog will display the progress of the job. When the job completes, a new window will open containing the results matrix. Results shown in the SEER*Stat matrix window cannot be edited, but you can print the matrix, export the results to a text file, and copy-and-paste data into other applications. See the Results Matrix topic for more information about the SEER*Stat matrix and its features.
- Use the arrow buttons or the checkboxes on the Matrix window sidebar menu to move to different pages of the matrix.
- Compare your results to this SEER*Stat matrix file: Rate Exercise 2 Results Matrix (sim, 25.9 KB).