SEER*Stat Rate Exercise 2: Variables With Unlabeled Values

This exercise focuses on the techniques for working with variables that have unlabeled values. Most variables in the database have labeled values (e.g., the values for race are "White", "Black", etc.), but labeled values are not practical for all variables. For example, histologic type has nearly 2,000 values represented as integers (8000-9992). While there are histologic type variables with labeled values available in the database, it is sometimes easier to use numeric ranges to group the values when creating user-defined variables.

This tutorial was designed exclusively to explain the steps to create and modify groupings for variables with unlabeled values. If you are just getting started with SEER*Stat, be sure to do the introductory tutorials first.

Problem Statement

Create a table showing incidence rates (age-adjusted to the 2000 US Std Population) and frequencies for malignant lung and bronchus cancer. Calculate these for persons age 65 and older diagnosed from 2007-2011 in the SEER 18 Registries.

Display the rates and frequencies by year of diagnosis (include the 2007-2011 total), sex, and histologic type. Use the following histology groupings: All Histologies (histology codes = 8000-9992); Small Cell (histology codes = 8002,8041-8045); and All Histologies Excluding Small Cell (histology codes not included in the small cell ranges).

Create a table with a row for each year and a column for each sex. Show the histology groups on separate pages.
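
The requested layout can be pictured as a three-level tally: page (histology group), row (year, plus a 2007-2011 total), and column (sex). The Python sketch below only illustrates that structure and is not a description of how SEER*Stat works internally; the records are made-up placeholders and the function names are hypothetical.

```python
from collections import Counter

SMALL_CELL = {8002} | set(range(8041, 8046))  # 8002, 8041-8045

def histology_pages(code):
    """Return the page (histology group) labels a code contributes to."""
    pages = ["All Histologies"]
    pages.append("Small Cell" if code in SMALL_CELL else "All Excl Small Cell")
    return pages

def tally(records):
    """Count cases by (page, row, column) = (histology group, year, sex)."""
    counts = Counter()
    for year, sex, hist in records:
        for page in histology_pages(hist):
            counts[(page, str(year), sex)] += 1
            counts[(page, "2007-2011", sex)] += 1  # total row
    return counts

# Made-up placeholder records: (year of diagnosis, sex, histology code)
records = [(2007, "Male", 8041), (2007, "Female", 8140), (2011, "Male", 8070)]
counts = tally(records)
print(counts[("All Histologies", "2007-2011", "Male")])  # 2
```

Every case is counted on the All Histologies page and on exactly one of the other two pages, so the two narrower pages sum to the first.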

Key Points and Reminders

  • Each variable in the database has a default set of "groupings". A grouping is a label associated with a value or set of values. As you will see, the histology variable has only one grouping (all values combined). To display data for individual histologies or grouped values, you need to create a variable.
  • The "Histologic Type ICD-O-3" variable was selected for this exercise as an example of a variable with unlabeled values. In Rate Exercise 1b, "Year of diagnosis" was used as an example of a labeled variable. You will see that the steps for editing the two types of variables differ slightly.
  • The definitions for Small Cell Lung and Bronchus cancer and All Excluding Small Cell Lung and Bronchus cancer require both histology and site information. The SEER site recode variables, available in SEER databases, are derived from primary site and histology. By definition, a site recode value of Lung and Bronchus does not include lymphomas and leukemias (and, in some cases, mesotheliomas and Kaposi sarcoma); these are defined in separate groupings in the site recode variables. Although the histology groupings in this exercise include values for lymphomas, leukemias, mesotheliomas, and Kaposi sarcoma, those cases will not be included in the analysis because only records with Site recode ICD-O-3/WHO 2008 = Lung and Bronchus are selected.
  • Add All and Add Rest are time-saving features that can be used when editing variables. These shortcuts are demonstrated in this exercise.

Step 1:  Create a Rate Session

The only difference between this exercise and Rate Exercise 1b is the addition of histologic type as a table variable. Therefore, you can either create a new session or skip a few steps by extracting the session from 1b's results matrix. If you do not have 1b's matrix but feel comfortable with the basic steps in SEER*Stat, you can use our version of the Exercise 1b Results Matrix.

  • Start SEER*Stat.
  • To Create a New Rate Session:
    • From the File menu select New > Rate Session or use the Rate button on the toolbar.
    • Proceed to Step 2.
  • To Use Rate Exercise 1b as a Starting Point:
    • From the File menu select Open > Rate File or use the Open file/folder button on the toolbar.
    • Open the file saved in exercise 1b. The filename should be "rate exercise 1b.sim".
    • From the Matrix menu select Retrieve Session.
    • Two windows should now be open. Close the matrix window containing the results calculated in exercise 1b. You should now have one window labeled "Rate Session-x" where x is the number of rate session windows that you have created since starting SEER*Stat.
    • Verify the settings on the Data, Statistic, and Table tabs in Steps 2-5 and then follow the instructions starting in Step 6.

Learn More...

  • SEER*Stat matrix files include the session information used to generate the table. This information serves as documentation for the results and provides a convenient method for generating similar statistics.

Step 2:  Select a Database (Data Tab)

  • It is extremely important that you select the database as the first step. The other choices you will make in this session will be based on variables in the selected database.
  • On the Data Tab select "Incidence - SEER 18 Regs Research Data + Hurricane Katrina Impacted Louisiana Cases, Nov 2013 Sub (2000-2011) <Katrina/Rita Population Adjustment>"
  • Make sure the Age Variable is set to "Age recode with <1 year olds"

Learn More...

The only databases listed on the Data Tab are databases appropriate for rate sessions that are available from the Data Locations defined on the Preferences dialog. There are two types of data locations, server and local:
  • A server data location is the address of a dedicated computer that stores databases and can perform SEER*Stat analyses.
  • A local data location is the address of a directory on your computer or local network in which SEER*Stat databases are stored.

Step 3:  Choose the Statistics to Display (Statistic Tab)

  • In the Statistics box, select Rates (Age-Adjusted).
  • In the Parameters box:
    • Make sure that the Standard Population is set to "2000 US Std Population (19 age groups - Census P25-1130)".
    • Make sure the Age Variable is set to "Age recode with <1 year olds"
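
For orientation, an age-adjusted rate by the direct method is a weighted sum of age-specific rates, with weights taken from the standard population. The Python sketch below illustrates the arithmetic only. The counts, populations, and standard-population figures are invented placeholders (not values from the 2000 US Std Population), and renormalizing the weights over the age groups supplied is an assumption about how a restricted age range would be handled.

```python
def age_adjusted_rate(cases, population, std_population, per=100_000):
    """Direct age adjustment: weighted sum of age-specific rates.

    All three arguments are dicts keyed by age group. Weights are the
    standard population shares, renormalized over the groups supplied.
    """
    total_std = sum(std_population[g] for g in cases)
    rate = 0.0
    for group in cases:
        weight = std_population[group] / total_std
        rate += weight * cases[group] / population[group]
    return rate * per

# Invented placeholder numbers, for illustration only.
cases = {"65-69 years": 30, "70-74 years": 40}
population = {"65-69 years": 10_000, "70-74 years": 8_000}
std_population = {"65-69 years": 600, "70-74 years": 400}

print(round(age_adjusted_rate(cases, population, std_population), 1))  # 380.0
```

Here the weights are 0.6 and 0.4 and the age-specific rates are 300 and 500 per 100,000, giving 0.6 × 300 + 0.4 × 500 = 380.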

Step 4:  Define the Analysis Cohort (Selection Tab)

Specific click-by-click instructions for creating individual selection statements were given in previous tutorials (see Frequency Exercise 1a). Use those techniques to create three selection statements. Be sure to consider each box on the Selection Tab, from top to bottom, as you review the Problem Statement.

  • The top box is for making selections based on age at diagnosis. The Problem Statement specifies that the rates should be calculated for persons age 65 and older. Use the top Edit button to create a statement selecting persons age 65 and older in the top box using the age recode variable. When finished, the statement should read:
    {Age at Diagnosis.Age recode with <1 year olds} = '65-69 years','70-74 years','75-79 years','80-84 years','85+ years'
  • The middle box must be used to make selections based on race, sex, year of diagnosis, registry, or county. The Problem Statement specifies that the rates be calculated for cases diagnosed from 2007-2011 in the SEER 18 Registries. The database selected on the Data Tab contains data from the 18 registries with cases diagnosed from 2000-2011. By selecting this database, you have automatically excluded data from any other registry, so you must make a selection based on year of diagnosis but not on registry. Use the middle Edit button to create a selection statement for cases diagnosed from 2007-2011. When finished, the statement in the middle box should read:
    {Race, Sex, Year Dx, Registry, County.Year of diagnosis} = '2007','2008','2009','2010','2011'
  • The third box must be used to make all other selections, that is, selections based on variables that are only in the case data. Create a statement selecting lung and bronchus cancer in this box. When finished, the statement in the bottom box should read:
    {Site and Morphology.Site recode ICD-O-3/WHO 2008} = 'Lung and Bronchus'
  • Make sure that the Malignant Behavior option is checked in the Select Only box at the top of the tab.
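
Taken together, the three statements and the Malignant Behavior option act as a conjunction of filters. A minimal Python sketch of that logic follows; the record field names (`age_recode`, `year_dx`, `site_recode`, `behavior`) are hypothetical and do not correspond to SEER*Stat's internal data layout.

```python
AGE_GROUPS = {"65-69 years", "70-74 years", "75-79 years",
              "80-84 years", "85+ years"}
YEARS = {"2007", "2008", "2009", "2010", "2011"}

def in_cohort(record):
    """True if a record satisfies every selection made in Step 4."""
    return (record["age_recode"] in AGE_GROUPS
            and record["year_dx"] in YEARS
            and record["site_recode"] == "Lung and Bronchus"
            and record["behavior"] == "Malignant")

# Made-up example record using the hypothetical field names above.
example = {"age_recode": "70-74 years", "year_dx": "2009",
           "site_recode": "Lung and Bronchus", "behavior": "Malignant"}
print(in_cohort(example))  # True
```

A record diagnosed in 2005 would fail the year test even though the database itself covers 2000-2011, which is why the year selection is needed and the registry selection is not.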

Step 5:  Set the Display Variables (Table Tab)

  • The Problem Statement specifies that the incidence rates are to be displayed by year, sex, and histologic type (All Histologies, Small Cell, All excluding Small Cell).
  • The variables are listed in categories in the Available Variables box at the top of the screen. First, add year as a row variable. This variable may be available in the User-Defined category (if you did Rate Exercise 1b and saved the variable to the dictionary). If the variable is not available, create it now. Click-by-click instructions for creating variables were given in previous tutorials.
  • Next, add sex as a column variable.

Step 6:  Create a Histology Variable (Edit Variable Dialog)

Check the groupings for the histologic type variable to see if it meets the needs of this exercise.

  • Use the "+" to expand the "Site and Morphology" category.
  • Double-click "Histologic Type ICD-O-3" to open the dictionary.
  • Click Create to view the values and groupings for this variable (the Edit variable dialog will open).

The values of the "Histologic Type ICD-O-3" variable would be grouped in different ways for the various types of cancer; therefore, it only has one grouping defined by the minimum and maximum values (shown in the Unlabeled Values box on the right). A new variable is needed for this exercise that has three groupings (All Histologies, Small Cell and All Excluding Small Cell). Work through the steps below to create the three groupings and learn about the Add All and Add Rest features.

  • Edit the Name field and give the variable this name: "Hist (Lung: Small Cell, All Excl Small Cell)"
  • Delete the 8000-9992 Grouping
    • In the Groupings box on the left side of the screen, select the "8000-9992" grouping. Looking to the right side of the screen, you will see that the values in this grouping (shown in the Selected text box) match the range of all possible values (shown just above Selected). This grouping is required for this exercise; however, so that you can learn other features of this dialog, delete it now. You will recreate it later in this step.
    • Click the Delete button that is just below the Groupings box (or you could use the delete key on your keyboard).
    • The groupings box should now be empty.
  • Create the Small Cell Grouping (Values: 8002,8041-8045)
    • Create a new grouping. First, type 8002,8041-8045 in the selected values box on the right.
    • Click Add.
    • On the Add Selected Values dialog, "Added as one grouping (all values combined)" will be selected. Click OK to add the specified range as a single grouping.
    • Change the name of the grouping to "Small Cell".
  • Create the All Excluding Small Cell Grouping
    • By definition, this grouping is all values that are not small cell histologies. Therefore, this grouping could be defined as "everything not used in the currently defined groupings."
    • Click Add Rest.
    • Select "Added as One Grouping (all values combined)" and click OK.
    • Change the name of the grouping to "All Excl Small Cell".
    • With the All Excl Small Cell grouping selected, look to the right of the screen. The values in this grouping should be 8000-8001, 8003-8040, 8046-9992.
    • Subsequent changes to the "Small Cell" grouping will not affect the "All Excl Small Cell" grouping. You would have to edit or recreate this group.
  • Create the All Histologies Grouping
    • SEER*Stat also provides a convenient method for adding all values of a variable. These can be added in a single grouping (all values combined) or in separate groupings (one per value).
    • Click Add All.
    • Be sure that "Added as One Grouping (all values combined)" is selected and click OK.
    • Change the name of the grouping to "All Histologies".
    • Use the Up button located at the bottom of the Groupings box to move the "All Histologies" grouping to the top of the list. The order in which the groupings appear in the Edit Variable window is the order they will appear in the output matrix.
  • Close the dictionary window and add the new histology variable as a page variable.
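
The result of Add Rest can be checked by hand: it is the set of values in 8000-9992 not already used by another grouping. The Python sketch below reproduces that arithmetic under the assumption that a grouping is written as comma-separated integers and inclusive ranges. The helper names are hypothetical and nothing here is part of SEER*Stat.

```python
def parse_values(spec):
    """Expand a spec such as '8002,8041-8045' into a set of integers."""
    values = set()
    for part in spec.split(","):
        low, _, high = part.strip().partition("-")
        values.update(range(int(low), int(high or low) + 1))
    return values

def to_ranges(values):
    """Collapse a set of integers into a compact range string."""
    ranges, run = [], []
    for v in sorted(values):
        if run and v != run[-1] + 1:
            ranges.append(run)  # gap found: close the current run
            run = []
        run.append(v)
    if run:
        ranges.append(run)
    return ", ".join(str(r[0]) if len(r) == 1 else f"{r[0]}-{r[-1]}"
                     for r in ranges)

all_values = parse_values("8000-9992")
small_cell = parse_values("8002,8041-8045")
rest = all_values - small_cell  # the values Add Rest would pick up
print(to_ranges(rest))  # 8000-8001, 8003-8040, 8046-9992
```

Because `rest` is computed once, later changes to `small_cell` would not alter it, which mirrors the note that the "All Excl Small Cell" grouping must be edited or recreated if "Small Cell" changes.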

Step 7:  Specify a Title (Output Tab)

  • Enter the following title:
    Lung and Bronchus Cancer
    Incidence Rates for SEER 18 Registries, 2007-2011
    Rate Exercise 2
  • Set Display Rates as Cases Per to 100,000. SEER publications and reports typically show incidence rates expressed as the number of new cases per 100,000 population at risk.
  • Set the Number of Decimal Places for Rates/Trends. The sample output for this exercise was created using a setting of 0.1.

Step 8:  Execute SEER*Stat

  • Use the Execute button or select Execute from the Session menu to execute the session.
  • A dialog will display the progress of the job. When the job completes, a new window will open containing the results matrix. Results shown in the SEER*Stat matrix window cannot be edited, but you can print the matrix, export the results to a text file, and copy-and-paste data into other applications. The Results Matrix section of the help system contains more information about the SEER*Stat matrix and its features.
  • Notice that only the first line of the title (as entered on the Output Tab) appears on the title bar. The entire title will appear on printouts.
  • Use the arrow buttons or the drop down box on the SEER*Stat toolbar to move to different pages of the matrix.
  • Compare your results to this SEER*Stat matrix file: Rate Exercise 2 Results Matrix.