Frequency Exercise 1b: User-Defined Variables

Learn how to extract a session from the results matrix, use the data dictionary, and create your own variables.

In Frequency Exercise 1a, you created a matrix showing the frequency of lung and bronchus cancer for cases of known age by stage at diagnosis for the years 2000-2021. The table included a cell for "Blank(s)", which covered lung and bronchus cancer cases for years in which stage is not coded, as well as a cell for "N/A" (not applicable to this cancer site). In this exercise, you will create the same table without the unwanted cells and include only the years when stage was coded.

Exercise

Using the matrix created in Frequency Exercise 1a, create a new table that does not include the rows that are not applicable, and select only cases diagnosed from 2004-2021 in the SEER 22 registries. To start, extract the session from the matrix saved in Frequency Exercise 1a.

Key Points

This exercise builds on the previous exercise to illustrate file management strategies and introduces the SEER*Stat data dictionary. In addition, it demonstrates potential inconsistencies in the data.

  • Session information is stored within a matrix and can be extracted when needed. It is a good idea to save matrix files containing statistics that you plan to publish. You are likely to need the matrix to verify your graphs or reports. Typically, you only need to save matrix files and not session files. When you have a matrix, you can always get the session information by using the Retrieve Session button on the Actions menu.
  • New variables may be created using existing variables as a starting point. In this exercise, a new stage variable is created that does not include the unwanted groupings.
  • The SEER Program strives to make all Localized/Regional/Distant (L/R/D) stage variables consistent for all cancer sites for the appropriate years. However, there are certain site/year combinations where this is not possible. For example, the Combined Summary Stage (2004+) variable only includes cases diagnosed in 2004 or later. To see which cancer sites were affected by the stage adjustments, click on the For More Information... link located on the warning dialog that appears when you execute this session.
  • When using a variable in a table, the groupings shown in the table are determined by the variable's definition in the database. As you will see, the databases provided with SEER*Stat contain variables that are formatted for common use. However, you will often need to add or remove groupings for your particular analysis. See Working With Variables for more information.

Instructions

Step 1:  Open Exercise 1a's Matrix

  1. Start SEER*Stat.
  2. Select Open from the File menu.
  3. Use the Browse button or Recent Files list to open the file saved in exercise 1a. The filename should be "Frequency Exercise 1a.sfm". If you did not save the output for exercise 1a you may open our version of the output: Exercise Matrix 1a Results (sfm, 20.1 KB).

  • The folder icon on the title bar can be used to open any type of SEER*Stat file. It is equivalent to selecting Open from the File menu. Select the Browse button to access the Open dialog where, next to the File Name, you can specify a SEER*Stat file type to limit the list of available files.
  • SEER*Stat uses the ".sfm" extension for "SEER*Stat Frequency Matrix" files.

Step 2:  Extract the Session

SEER*Stat matrix files include the session information used to generate the table. This information serves as documentation for the results and provides a convenient method for generating similar statistics.

  1. From the Actions menu select the Retrieve Session button.
  2. Press the OK button on the Linked Database Change warning, if it opens. Two windows should now be open.
  3. Close the matrix window containing the results calculated in exercise 1a. You should now have one window labeled "Frequency Session-x" where x is the number of frequency session windows that you have created since starting SEER*Stat.

  • SEER*Stat uses two modes of execution. If you are using SEER*Stat in client-server mode, you may have been prompted for a username and password. See Local vs. Client-Server Mode to learn more.
  • SEER*Stat is a Multiple Document Interface (MDI) application. Like Word and Excel, it allows you to work with more than one document at a time. With SEER*Stat, that means you can have any combination of session and matrix windows open at the same time.
  • SEER*Stat displays the filename in the title bar of an open window. If the session or matrix is not saved, a generic, numbered label is used ("Frequency Session-x", "Frequency Matrix-x", etc.).

Step 3:  View the Extracted Session

Since we extracted this session from exercise 1a's matrix file, most of the session options are correct. Use the sidebar menu to take a look at each option set:

  1. There are no changes necessary on the Data options - the "Incidence - SEER Research Limited-field Data, 22 Registries, Nov 2023 Sub (2000-2021)" database should be selected.
  2. There are no changes to be made on the Statistics options - column percentages should already be selected.
  3. Click Selection from the sidebar menu. When you created this session for exercise 1a, the statement to select lung and bronchus cases was added, and in the Select Only box the Malignant Behavior option was turned off. Since the "Combined Summary Stage (2004+)" variable only has cases coded for 2004+, you will need to make changes to the selection statement to only include cases diagnosed from 2004-2021.

For a complete list of cancer sites affected by the stage adjustment, see Localized/Regional/Distant Stage Adjustments.

Step 4:  Modify the Selection Statement

  1. In the Selection Statement box, press the Edit button to open the Case Selection dialog.
  2. Check that the "And" logical operator is selected and press the New Line button to open the Case Selection Line dialog.
  3. In the Variable box, use the "+" to expand the "Race, Sex, Year Dx" category and select "Year of diagnosis".
  4. Moving to the center of the window, check that "is = to" is selected as the Operator.
  5. In the Values box, select all the years from 2004 through 2021. The following should appear in the Selection Statement Line box at the bottom of the dialog:
     {Race, Sex, Year Dx.Year of diagnosis} = '2004', '2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021'
  6. Select the OK button to close the Case Selection Line dialog. The Selection Statement on the Case Selection dialog should read:
     {Site and Morphology.Site recode ICD-O-3/WHO 2008} = '    Lung and Bronchus'
     AND {Race, Sex, Year Dx.Year of diagnosis} = '2004','2005','2006','2007','2008','2009','2010','2011','2012','2013','2014','2015','2016','2017','2018','2019','2020','2021'
  7. Select the OK button to close the Case Selection dialog.
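
The selection statement in Step 4 lists eighteen individual years, which makes it easy to misread when checking it by eye. The following Python sketch is not part of SEER*Stat and does not drive the software. It only rebuilds the expected text so you can compare it with what the Case Selection dialog shows. The helper name `year_selection_clause` and the line layout are assumptions made for illustration.

```python
# Illustrative only: SEER*Stat builds the selection statement through its dialogs.
# This helper just reconstructs the expected text for a visual comparison.

def year_selection_clause(first_year: int, last_year: int) -> str:
    """Return a year-of-diagnosis clause listing every year in the inclusive range."""
    years = ",".join(f"'{year}'" for year in range(first_year, last_year + 1))
    # Doubled braces produce literal { and } in an f-string.
    return f"{{Race, Sex, Year Dx.Year of diagnosis}} = {years}"

# Site clause copied from the exercise text (note the leading spaces in the label).
site_clause = "{Site and Morphology.Site recode ICD-O-3/WHO 2008} = '    Lung and Bronchus'"

statement = site_clause + "\nAND " + year_selection_clause(2004, 2021)
print(statement)
```

If the printed text and the dialog differ in anything other than spacing, a year was probably missed or added in the Values box.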

Step 5:  Edit the Combined Summary Stage (2004+) Variable

In the Table options, "Combined Summary Stage (2004+)" is listed as the row variable. As we saw in the output for the previous exercise, the format of this variable in the selected database includes 7 groupings: "In situ", "Localized", "Regional", "Distant", "N/A", "Unknown/unstaged", and "Blank(s)". Since we are calculating frequencies for lung and bronchus cancer for 2004-2021, we need to create a new variable to remove the groupings that are not applicable. Before you continue, please take time to consider the following:

  • SEER*Stat is distributed with several databases containing pre-formatted variables. That is, each variable has one or more defined "groupings", groups of values with an associated label. Commonly used groupings are set by default in the databases distributed with the software.
  • You may create new variables based on existing variables. When you create a variable you are not adding data to the database. You are simply defining a new set of groupings for an existing variable.
  • For this exercise, you need to create a new variable based on "Combined Summary Stage (2004+)".

  1. Select Table from the sidebar menu.
  2. Select the Dictionary button from the Actions menu, or double-click on the "Combined Summary Stage (2004+)" variable listed in the Available Variables box or the Display Variables box. The Dictionary editor opens.
  3. If it is not already selected, select the "Combined Summary Stage (2004+)" variable from the "Stage - Summary/Historic" category. (Use the + sign to expand the variable categories.)
  4. The Create button will be enabled when a variable is selected. Use the Create button to open the Edit Variable dialog. In the next step, you will create a new variable by editing "Combined Summary Stage (2004+)" and saving the revised variable with a new name.

  • The method used to open the data dictionary is strictly a matter of personal preference. When the data dictionary is opened by double-clicking a variable, that variable is highlighted in the Dictionary editor. That can save one step if you are creating a user-defined variable based on the selected variable.
  • These are the main features and controls of the Edit Variable window:
    • Name - Every variable in the dictionary must have a unique name.
    • Groupings - A grouping is a group of values with an associated label. When you click on a label in the Groupings box, the values associated with the label will be highlighted in the Values box. Groupings are essentially format statements that allow you to label individual values or groups of values. Throughout these exercises you will be adding and deleting groupings to create tables.
    • Values - All values occurring in the database for the variable are listed. The values for most variables will be listed with descriptive labels. The list of values cannot be changed; it is determined when the database is created.
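
One way to picture the relationship between Values and Groupings is as a fixed list of codes plus a separate label-to-codes mapping. The Python sketch below is only an analogy, not SEER*Stat's internal format. The codes `v0` through `v6` are hypothetical placeholders, not real SEER codes, and `label_for` is an illustrative helper.

```python
# The list of values is fixed when the database is created.
# These codes are hypothetical placeholders, not real SEER codes.
VALUES = ("v0", "v1", "v2", "v3", "v4", "v5", "v6")

# Groupings are labels attached to one or more of those values.
# The labels are the seven groupings named in this exercise.
groupings = {
    "In situ": {"v0"},
    "Localized": {"v1"},
    "Regional": {"v2"},
    "Distant": {"v3"},
    "N/A": {"v4"},
    "Unknown/unstaged": {"v5"},
    "Blank(s)": {"v6"},
}

def label_for(value, groupings):
    """Return the label of the first grouping that contains value, or None."""
    for label, values in groupings.items():
        if value in values:
            return label
    return None  # the value is not covered by any grouping

for value in VALUES:
    print(value, "->", label_for(value, groupings))
```

Editing a variable changes only the mapping. The tuple of values stays the same, which mirrors the point that creating a variable does not add data to the database.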

Step 6:  Create a New Stage Variable

  1. Edit the Name field and give the variable this name: "Combined Summary Stage (2004+) (IS/L/R/D/U)". This naming convention is a common shorthand to say that this is a variable based on the "Combined Summary Stage (2004+)" variable, and the groupings are in situ (IS), localized (L), regional (R), distant (D), and unknown/unstaged (U).
  2. Select the "N/A" and "Blank(s)" Groupings and delete them using either your DELETE key or the Delete button below the Groupings box.
  3. Click the OK button. A new category, "User-Defined", is added to the dictionary. Click the Close button to close the dictionary.


  • Other dictionary features are explored in later exercises.
  • The Save to Dictionary checkbox on the Edit Variable dialog is selected by default and allows you to use a variable in other sessions that use the same database.
  • You will develop your own naming conventions as you become more experienced with SEER*Stat. Some variables are generic and can be used for a variety of sessions. By using meaningful variable names you will be able to easily identify the variables in your data dictionary.
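
To see why removing groupings changes the table, consider the following Python sketch. It is an analogy rather than SEER*Stat code. The counts are invented for illustration and are not SEER results. It also assumes that cases falling in a removed grouping are left out of the table and its totals, which is how this exercise describes the outcome ("the unwanted rows have been removed").

```python
# Hypothetical counts for one table column; not real SEER data.
original_counts = {
    "In situ": 10,
    "Localized": 190,
    "Regional": 250,
    "Distant": 500,
    "N/A": 0,
    "Unknown/unstaged": 50,
    "Blank(s)": 250,
}

UNWANTED = {"N/A", "Blank(s)"}

def drop_groupings(counts, unwanted):
    """Return a copy of counts without the unwanted grouping labels."""
    return {label: n for label, n in counts.items() if label not in unwanted}

def column_percentages(counts):
    """Return each grouping's share of the column total, as a percentage."""
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in counts}
    return {label: round(100 * n / total, 1) for label, n in counts.items()}

new_counts = drop_groupings(original_counts, UNWANTED)
print(column_percentages(original_counts))
print(column_percentages(new_counts))
```

With these made-up numbers, the Distant share is larger in the second printout because the denominator no longer includes the dropped rows. The same effect is worth keeping in mind when comparing column percentages between the 1a and 1b matrices.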

Step 7:  Replace the Row Variable

  1. In the Table options Display Variables box, remove "Combined Summary Stage (2004+)" from the row by using either your DELETE key or the Remove button.
  2. All variables are listed in categories in the Available Variables box at the bottom of the screen. Use the "+" to expand the "User-Defined" category.
  3. Select the new stage variable that you created in the previous step and select the Row button. The newly created stage variable is listed as a row variable in the Display Variables box.

Step 8:  Review the Title

  1. Select Output from the sidebar menu.
  2. Remember that you are working with a session that was extracted from exercise 1a's matrix file. Therefore, change the title to reflect the new year range and that this is exercise 1b:

Lung and Bronchus Cancer
Stage Distribution SEER 22
Years 2004-2021
Frequency Exercise 1b

Step 9:  Execute SEER*Stat

  1. Select the Execute button from the Actions menu and press OK on the variable warning. A dialog will display the progress of the job. When the job completes, a new window will open containing the output table or matrix. Results shown in the SEER*Stat matrix window cannot be edited, but you can print the matrix, export the results to a text file, and copy-and-paste data into other applications. See Results Matrix for more information about the SEER*Stat matrix and its features.
  2. Compare your results to this SEER*Stat matrix file: Exercise Matrix 1b Results (sfm, 20.4 KB). Notice that the unwanted rows have been removed.


  • If you are running SEER*Stat in client-server mode, you have the option to Execute Remotely from the Actions menu. This feature is intended for jobs that take several minutes or longer to complete. SEER*Stat automatically sends an email message to you as soon as the execution is complete. The email message includes a link to the location of the resultant matrix. 
  • If you have already started executing a session but want to execute remotely, select the Finish Offline button on the status dialog.
