SEER*Stat Frequency Exercise 1b: User-defined Variables

In Frequency Exercise 1a you created a matrix showing the frequency of lung and bronchus cancer for cases of known age by stage at diagnosis for the years 1973-2011. The table included a "Blank(s)" row, which contained the lung and bronchus cancer cases diagnosed in years for which stage is not coded. In this exercise, you will create the same table without the unwanted row, and include only the years when stage was coded.

Using the matrix created in Frequency Exercise 1a, create a new table that does not include the "Blank(s)" row, and select only cases diagnosed from 1998-2011 in the SEER 18 registries. To start, extract the session from the matrix saved in Frequency Exercise 1a.

Key Points and Reminders

This exercise builds on the previous exercise to illustrate file management strategies and provides an introduction to the SEER*Stat data dictionary. In addition, it demonstrates potential inconsistencies in the data.

  • Session information is stored within a matrix and can be extracted when needed. It is a good idea to save matrix files containing statistics that you plan to publish. You are likely to need the matrix to verify your graphs or reports. Typically, you only need to save matrix files and not session files. When you have a matrix you can always get the session information by using Retrieve Session on the Matrix menu.
  • New variables may be created using existing variables as a starting point. In this exercise, a new stage variable is created that does not include the "Blank(s)" grouping.
  • The SEER Program strives to make all Localized/Regional/Distant (L/R/D) stage variables consistent for all cancer sites for the appropriate years. However, there are certain site/year combinations where this is not possible. For example, lung and bronchus cancer has been blanked out in the Summary stage 2000 (1998+) variable for all cases diagnosed prior to 1998. To see which cancer sites were affected by the stage adjustments, click on the "For More Information..." link located on the warning dialog that appears when you execute this session. This link is also available on the selection dialog and within the dictionary editor when working with the Summary stage 2000 (1998+) variable.
  • When using a variable in a table, the groupings shown in the table are determined by the variable's definition in the database. As you will see, the databases provided with SEER*Stat contain variables that are formatted for common use. However, you will often need to add or remove groupings for your particular analysis. See Working With Variables in the SEER*Stat help system for more information.

Step 1:  Open Exercise 1a's Matrix

  • Start SEER*Stat.
  • From the File menu select Open > Frequency File or use the Open file/folder button on the toolbar.
  • Open the file saved in exercise 1a. The filename should be "Frequency Exercise 1a.sfm".
  • If you did not save the output for exercise 1a you may open our version of the output: Exercise Matrix 1a Results.

Learn More...

  • The Open file/folder button on the toolbar can be used to open any type of SEER*Stat file. It is equivalent to selecting Any SEER*Stat File from the File > Open menu. SEER*Stat will automatically determine the file type and open the appropriate session or matrix. If you prefer, you can specify the type of file you wish to open using the File menu. The only benefit to specifying the type of file is to reduce the number of files listed in the folder. See SEER*Stat's Help System for more information.
  • SEER*Stat uses the ".sfm" extension for "SEER*Stat Frequency Matrix" files.

Step 2:  Extract the Session

  • SEER*Stat matrix files include the session information used to generate the table. This information serves as documentation for the results and provides a convenient method for generating similar statistics.
  • From the Matrix menu select Retrieve Session.
  • Two windows should now be open. Close the matrix window containing the results calculated in exercise 1a. You should now have one window labeled "Frequency Session-x" where x is the number of frequency session windows that you have created since starting SEER*Stat.

Learn More...

  • SEER*Stat uses two modes of execution. If you are using SEER*Stat in client-server mode, you may have been prompted for a username and password. See Local vs. Client-Server Mode in the SEER*Stat help system to learn more.
  • SEER*Stat is an MDI application. MDI stands for Multiple Document Interface. Word and Excel are both MDI applications. That is, they allow you to work with more than one document at a time. With SEER*Stat that means that you can have any combination of session and matrix windows open at the same time.
  • SEER*Stat displays the filename in the title bar of an open window. If the session or matrix is not saved then a generic, numbered label is used ("Frequency Session-x", "Frequency Matrix-x", etc.).

Step 3:  View the Extracted Session

  • Since we extracted this session from exercise 1a's matrix file, most of the tabs have the correct settings. Take a look at each tab.
    • There are no changes necessary on the Data Tab -- the SEER 18 Registries database should be selected.
    • There are no changes to be made on the Statistics Tab -- frequencies and column percentages should already be selected.
    • Move to the Selection Tab. When you created this session for exercise 1a the statement to select lung and bronchus cases was added, and in the Select Only box the behavior option was turned off. Since all lung and bronchus cancer cases from 1973-1997 are coded as "Blank(s)" in the "Summary stage 2000 (1998+)" variable, you will need to make changes to the selection statement to only include cases diagnosed from 1998-2011.

Step 4:  Modify the Selection Statement

  • On the Selection Tab, select Edit.
  • Use the controls at the top of the Case Selection window to modify the search statement.
  • In the Variable box, use the "+" to expand the "Race, Sex, Year Dx, Registry, County" category.
  • Select "Year of diagnosis". Notice that a new line is added to your original selection statement joined by AND.
  • Moving to the center of the window, check to see that "is = to" is selected as the Operator.
  • In the Values box, select "1998" and then extend the selection to include all the years from 1998 through 2011.
  • At this time, the following should appear in the Selection Statement box at the bottom of the window:
    {Site and Morphology.Site recode ICD-O-3/WHO 2008} = ' Lung and Bronchus'
    AND {Race, Sex, Year Dx, Registry, County.Year of diagnosis} = '1998','1999','2000','2001','2002','2003','2004','2005','2006','2007','2008','2009','2010','2011'
  • Use the OK button to close the Case Selection window.
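
The year list in the selection statement follows a regular pattern, so it can be reproduced outside SEER*Stat as a cross-check. The Python sketch below is only an illustration: SEER*Stat builds the statement through the Case Selection window, and the function name year_clause is hypothetical. It assembles the same quoted, comma-separated list of years that appears in the Selection Statement box.

```python
def year_clause(first_year, last_year):
    """Build the year-of-diagnosis condition as a quoted, comma-separated list."""
    # Variable reference copied from the Selection Statement box in this exercise.
    variable = "{Race, Sex, Year Dx, Registry, County.Year of diagnosis}"
    years = ",".join(f"'{year}'" for year in range(first_year, last_year + 1))
    return f"{variable} = {years}"


clause = year_clause(1998, 2011)
print(clause)
```

Comparing the printed line with the second line of your selection statement is a quick way to confirm that no year in the range was skipped.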

Step 5:  Verify the Table Tab

  • Move to the Table Tab.
  • "Summary stage 2000 (1998+)" is listed as the row variable. As we saw in the output for the previous exercise, the format of this variable in the selected database includes 6 groupings: "In situ", "Localized", "Regional", "Distant", "Unknown/unstaged", and "Blank(s)". Since we are calculating frequencies for lung and bronchus cancer for 1998 through 2011, we need to create a new variable to remove the blank grouping.

Learn More...

  • The databases provided with SEER*Stat contain variables that are formatted for common use. Part of learning SEER*Stat is familiarizing yourself with the variables distributed with each database.

Step 6:  Open the Dictionary Editor

  • Before you continue, please take time to consider the following:
    1. SEER*Stat is distributed with several databases containing pre-formatted variables. That is, each variable has one or more defined "groupings", groups of values with an associated label. Commonly used groupings are set by default in the databases distributed with the software.
    2. You may create new variables based on existing variables. When you create a variable you are not adding data to the database. You are simply defining a new set of groupings for an existing variable.
    3. For this exercise, you need to create a new variable based on "Summary stage 2000 (1998+)".
  • There are several ways to open the SEER*Stat dictionary editor. Open the dictionary now by:
    1. selecting Dictionary from the File menu,
    2. using the Dictionary button on the toolbar, or
    3. double-clicking on the "Summary stage 2000 (1998+)" variable listed in the Available Variables box or the Display Variables box on the Table Tab.

Learn More...

  • The method used to open the data dictionary is strictly a matter of personal preference. When the data dictionary is opened by double-clicking a variable, that variable is highlighted in the Dictionary window. That can save one step if you are creating a user-defined variable based on the selected variable.

Step 7:  Edit the Summary stage 2000 (1998+) Variable

  • The Dictionary window should now be open.
  • If it is not already selected, select the "Summary stage 2000 (1998+)" variable from the "Stage - LRD (Summary and Historic)" category. (Use the + sign to expand the variable categories.)
  • The Create button will be enabled when a variable is selected. Use the Create button to open the Edit Variable window. You will be creating a new variable by editing "Summary stage 2000 (1998+)" and saving the revised variable with a new name.

Learn More...

These are the main features and controls of the Edit Variable window:

  • Name - Every variable in the dictionary must have a unique name.
  • Groupings - A grouping is a group of values with an associated label. When you click on a label in the Groupings box, the values associated with the label will be highlighted in the Values box. Groupings are essentially format statements that allow you to label individual or groups of values. Throughout these exercises you will be adding and deleting groupings to create tables.
  • Values - All values occurring in the database for the variable are listed. The values for most variables will be listed with descriptive labels. The list of values cannot be changed; it is determined when the database is created.

Step 8:  Create a New Stage Variable

  • Edit the Name field and give the variable this name: "Summary stage 2000 (1998+) (IS/L/R/D/U)". This naming convention is a common shorthand to say that this is:
    • a variable based on the "Summary stage 2000 (1998+)" variable,
    • and the groupings are in situ (IS), localized (L), regional (R), distant (D), and unstaged (U).
  • You will develop your own naming conventions as you become more experienced with SEER*Stat. You will find that some variables are generic and can be used for a variety of sessions. By using meaningful variable names you will be able to easily identify the variables in your data dictionary.
  • Select the "Blank(s)" grouping. Delete this grouping using either your delete key or the Delete button below the Groupings box.
  • Click OK.
  • You will notice that a new category, "User-Defined", has been added to the dictionary. Click Close to close the dictionary.
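
Conceptually, a user-defined variable is a renamed copy of an existing variable's groupings; the underlying data are untouched. The Python sketch below models that idea under stated assumptions: the dictionary layout, the function name create_variable, and the placeholder value strings are all hypothetical and do not reflect SEER*Stat's internal format or actual SEER codes.

```python
from copy import deepcopy

# Hypothetical model of a variable: a name plus groupings (label -> values).
# The value strings are placeholders, not actual SEER codes.
summary_stage = {
    "name": "Summary stage 2000 (1998+)",
    "groupings": {
        "In situ": ["in situ"],
        "Localized": ["localized"],
        "Regional": ["regional"],
        "Distant": ["distant"],
        "Unknown/unstaged": ["unstaged"],
        "Blank(s)": ["blank"],
    },
}


def create_variable(base, new_name, drop_labels):
    """Copy a variable, give it a new name, and drop the listed groupings."""
    new_variable = deepcopy(base)  # the base variable is left unchanged
    new_variable["name"] = new_name
    for label in drop_labels:
        del new_variable["groupings"][label]
    return new_variable


user_defined = create_variable(
    summary_stage, "Summary stage 2000 (1998+) (IS/L/R/D/U)", ["Blank(s)"]
)
print(list(user_defined["groupings"]))
```

The original variable still has all six groupings after the copy is made, which mirrors the point in Step 6 that creating a variable does not add data to the database or alter it.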

Learn More...

  • Other features of the dictionary editor will be explored in later exercises. You will see that the Save to Dictionary option allows you to use a variable in other sessions that use the same database. Give your variables meaningful names to avoid clutter in your dictionary.

Step 9:  Replace the Row Variable

  • In the Display Variables box on the Table Tab remove "Summary stage 2000 (1998+)" from the row. To do this, select the variable and use either your delete key or the Remove button on the right side of the screen.
  • All variables are listed in categories in the Available Variables box at the bottom of the screen.
  • Use the "+" to expand the "User-Defined" category.
  • Select the new stage variable that you created in the previous step.
  • Click Row on the right side of the screen.
  • At this time, your newly created stage variable should be listed as a row variable in the Display Variables box at the top of the window.

Step 10:  Review the Title

  • Move to the Output Tab.
  • Remember that you are working with a session that was extracted from exercise 1a's matrix file. Therefore, change the title to reflect the new year range and that this is exercise 1b:
  • Lung and Bronchus Cancer
    Stage Distribution SEER 18
    Varying Years 1998-2011
    Frequency Exercise 1b

Step 11:  Execute SEER*Stat

  • Use the Execute button or select Execute from the Session menu to execute the session.
  • A dialog will display the progress of the job. When the job completes, a new window will open containing the output table or matrix. Results shown in the SEER*Stat matrix window cannot be edited. You can, however, print the matrix, export the results to a text file, and copy-and-paste data into other applications. The "Output Matrix" section of the help system contains more information about the SEER*Stat matrix and its features.
  • Compare your results to this SEER*Stat matrix file: Exercise Matrix 1b Results.
  • Notice that the "Blank(s)" row has been removed.
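
Removing the blank cases also changes the column percentages, because each percentage is a share of the column total. The Python sketch below illustrates the arithmetic, assuming percentages are taken over the rows shown in the table. The counts are made up for illustration and are not SEER results; the function name column_percentages is hypothetical.

```python
def column_percentages(counts):
    """Return each row's share of the column total, rounded to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Made-up counts for illustration only; these are not SEER results.
with_blanks = {"Localized": 20, "Regional": 30, "Distant": 50, "Blank(s)": 100}
without_blanks = {k: v for k, v in with_blanks.items() if k != "Blank(s)"}

print(column_percentages(with_blanks))
print(column_percentages(without_blanks))
```

In this toy example the "Localized" share doubles from 10.0 to 20.0 once the blank cases are left out, which is why the stage distribution in this exercise's table is not directly comparable with the one from exercise 1a.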

Learn More...

  • If you are running SEER*Stat in client-server mode you will also have the option to Execute Offline. This feature is intended for jobs that take several minutes or longer to complete. SEER*Stat automatically sends an email message to you as soon as the execution is complete. The email message includes a link to the location of the resultant matrix.