After producing a results matrix, you can create a text data file containing the results of your analysis. This data file can be used as input to other software packages. Any software package that can read text files can import the exported file. Use the options on the Export dialog to configure the data file so that it is compatible with your target software package.
Depending on your needs, it may be more efficient to copy and paste the SEER*Stat data into the other application.
To export a SEER*Stat results matrix, follow these steps:
- Click on the matrix window to make sure it is active.
- Open the Matrix menu and select Export.
- Type or use the Browse buttons to select locations and filenames for the two exported files.
- Select options which will produce an ASCII text file that is compatible with your target software package.
- Click OK when ready.
Exported SEER*Stat results are stored in an ASCII text data file. The format description of the data file is written to a second text file, the export dictionary file. You may also choose to output a third file containing SAS code that reads in the text data file.
When the Export dialog (shown above) first appears, default names and directories for each file are presented. Use the Browse buttons or type in the fields to specify your own names and directories.
- If you have not saved the matrix, the default path for the files will be the same as the user variables path specified on the Preferences dialog. The default filenames will be "export.txt" for the data file, "export.dic" for the export dictionary file, and "export.sas" for the SAS code file.
- If you have saved the matrix, the default path for the files will be the same as the path of the saved matrix. Their default names will be the first part of the matrix filename, plus the appropriate extension (".txt", ".dic", or ".sas").
Data File
The results in the matrix will be exported to this file. The data may be saved as ASCII text or in compressed text using gzip compression. SEER*Stat determines whether to store it as text or gzipped text based on the extension that you use in the filename. The ".txt" extension is recommended for uncompressed files. Use ".gz" as the extension if you would like the data to be compressed.
The data file is a delimited text file. You can choose the character used as the field delimiter, as well as other file format specifications, using the various options on the Export dialog. A detailed description of the data file's format is given in the export dictionary file.
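The extension rule and the delimited layout described above can be mirrored on the reading side. The following is a minimal Python sketch, not part of SEER*Stat: the helper names `open_export` and `read_rows` are hypothetical, and it assumes a tab field delimiter unless another one is passed in.

```python
import csv
import gzip

def open_export(path):
    """Open an export data file as text, decompressing when the name ends in .gz."""
    if path.lower().endswith(".gz"):
        # gzip.open in text mode ("rt") decompresses transparently.
        return gzip.open(path, "rt", newline="", encoding="ascii")
    return open(path, "r", newline="", encoding="ascii")

def read_rows(path, delimiter="\t"):
    """Return every row of the delimited file as a list of strings."""
    with open_export(path) as handle:
        # newline="" lets the csv module handle CR/LF and LF line breaks itself.
        return list(csv.reader(handle, delimiter=delimiter))
```

For example, `read_rows("export.txt")` and `read_rows("export.gz")` would return the same rows if both files hold the same exported data.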
Export Dictionary File
The export dictionary file includes the following information:
- the name of the data file being described, as well as the name of the database used to generate the results
- variable names and format information
- file format information, including the field delimiter, line delimiter, and other options specified on the Export dialog
The information in the export dictionary file is presented in a Windows INI file format. This format presents information in a structured arrangement that can be understood by people, but that can also be easily used as input to other software packages. The sections of the export dictionary file are as follows.
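Because the dictionary uses an INI layout, a generic INI parser can list whatever sections and keys it contains. The sketch below uses Python's standard `configparser` and deliberately assumes no particular section or key names. The function name `load_dictionary` is hypothetical, and a real dictionary file may need further parser options depending on its contents.

```python
import configparser

def load_dictionary(path):
    """Read an INI-style export dictionary into {section: {key: value}}."""
    parser = configparser.ConfigParser(
        strict=False,        # tolerate repeated keys instead of raising an error
        interpolation=None,  # treat "%" as a literal character
    )
    parser.optionxform = str  # keep the original capitalization of key names
    with open(path, "r", encoding="ascii", errors="replace") as handle:
        parser.read_file(handle)
    return {name: dict(parser[name]) for name in parser.sections()}
```

Printing the keys of the returned dictionary shows which sections a given export dictionary actually has.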
SAS Code File
When the Generate SAS Code to Read Data check box is marked, SEER*Stat will also output a file containing code that can be used in a SAS program to read in the accompanying data file. This code includes the associated formats, and takes into account your settings, such as the line and field delimiters you specify.
Output Variables as
Table variables may be coded in the export data file as numbers or as the label text, with or without quotes. The code equivalents will appear in the export dictionary file. Choose the coding method that will be the most compatible with your target software package. The Numeric Representation option is disabled for Case Listing matrices.
Numeric Representation |
The groupings of each variable will be recoded to numbers, starting with zero, in the order in which the groupings are listed in the variable's definition. For example, if the variable were "Sex" with the values in the order "Male and female", "Male", and "Female", then the recoding would be: 0 = "Male and female", 1 = "Male", and 2 = "Female". If the order of the variable groupings in the database dictionary were changed to "Male", "Female", and "Male and female", then the recoding would be: 0 = "Male", 1 = "Female", and 2 = "Male and female". |
Labels Enclosed in Quotes |
The labels of the variable groupings and values will be the same as the text shown on your matrix, with double quotes surrounding them. For a Case Listing matrix, Unformatted labels will not appear in quotes in the data file. |
Labels without Quotes |
The labels of the variable groupings and values will be the same as the text shown on your matrix, without double quotes surrounding them, unless you have selected the Enclose Fields Containing Delimiter in Quotes option. |
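The Numeric Representation rule (codes start at zero and follow the order of the groupings in the variable's definition) can be illustrated with a few lines of Python. This is only a sketch of the rule as described; `numeric_codes` is a hypothetical helper, not something SEER*Stat provides.

```python
def numeric_codes(groupings):
    """Map each grouping label to its zero-based position in the definition order."""
    return {label: code for code, label in enumerate(groupings)}

print(numeric_codes(["Male and female", "Male", "Female"]))
# {'Male and female': 0, 'Male': 1, 'Female': 2}

print(numeric_codes(["Male", "Female", "Male and female"]))
# {'Male': 0, 'Female': 1, 'Male and female': 2}
```

The same label receives a different code when the order changes, which is why the code equivalents in the export dictionary file should be consulted rather than assumed.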
Line Delimiter
Line breaks in the data file are significant; depending on the type of matrix, they may indicate the end of a row of data, or they may separate different sets of statistics. DOS/Windows and UNIX use different character sequences to represent a line break. Choose the option that matches the platform on which you will be using your exported data file.
DOS/Windows (CR/LF) |
DOS/Windows expects both a carriage return (CR) and a line feed (LF). |
UNIX (LF) |
UNIX expects only a line feed (LF). |
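If you are unsure which line delimiter an existing export file uses, the raw bytes can be inspected. The following Python sketch assumes the file content is ASCII. The function names are hypothetical.

```python
def describe_line_breaks(raw: bytes) -> str:
    """Report which line break convention appears in a chunk of file bytes."""
    if b"\r\n" in raw:
        return "CR/LF"
    if b"\n" in raw:
        return "LF"
    return "none"

def split_lines(raw: bytes) -> list[str]:
    """Split file bytes into lines; str.splitlines accepts both CR/LF and LF."""
    return raw.decode("ascii").splitlines()
```

For example, `describe_line_breaks(open("export.txt", "rb").read())` reports the convention in use, and `split_lines` yields the same lines for either convention.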
Field Delimiter
Export data files are in a delimited format; that is, each data item (field) is separated by a certain character. Select the character to be used as the field delimiter from this list. Tab, space, comma, and semicolon are allowed. Variables that are output as labels will likely include spaces and commas, and may include semicolons. Therefore, when choosing the character to be used as the field delimiter, follow these guidelines:
- Tab is the recommended delimiter, since tab characters never appear within a field. Most commercial software will accept tab-delimited data. However, if your software does not support tab-delimited data, then the semicolon is recommended, since it is less likely to appear in a data field than a space or comma.
- If space, comma, or semicolon is your field delimiter and you choose to output the variables as labels, you should either select Labels Enclosed in Quotes or check the Enclose Fields Containing Delimiter in Quotes option. Either of these will ensure that a data field that has a space, comma, or semicolon in its text will be enclosed in quotes, and therefore will not be mistaken for multiple fields.
- If you select comma as the field delimiter, then it is recommended that you do not use comma as the missing character.
- If you select comma as the field delimiter, then it is recommended that you select Remove All Thousands Separators for the numeric fields. This will ensure that a field containing the value 1,500 is stored as 1500, so that it is not mistaken for two fields with values equal to 1 and 500.
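The guidelines above can be seen by parsing comma-delimited lines with Python's standard `csv` module. This is an illustrative sketch: the sample lines are made up for the example and do not come from an actual export file.

```python
import csv
import io

def parse_line(line, delimiter=","):
    """Parse one delimited line, honoring double-quoted fields."""
    return next(csv.reader(io.StringIO(line, newline=""), delimiter=delimiter))

# A quoted label keeps its embedded comma as part of one field.
print(parse_line('"Uterus, NOS",1500\r\n'))
# ['Uterus, NOS', '1500']

# Splitting on commas without regard to quotes breaks the label apart.
print('"Uterus, NOS",1500'.split(","))
# ['"Uterus', ' NOS"', '1500']

# A thousands separator left in an unquoted number is read as a field break.
print(parse_line("All Sites,1,500\r\n"))
# ['All Sites', '1', '500']
```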
Missing Character
When a statistic cannot be calculated in an analysis, the cell of the matrix is marked with a footnote. No number appears in the cell. Choose a missing character to represent the empty cell.
Space |
A space is placed wherever the value of a missing cell would appear. If you have chosen a space as the field delimiter, you may not want to choose a space as the missing character. |
Period |
A period is placed wherever the value of a missing cell would appear. You should consider whether your software package will handle a period as a numeric missing value rather than as text. |
"NA" |
"NA" is placed wherever the value of a missing cell would appear. You should use "NA" when exporting to R programs, as these do not recognize spaces or periods as a missing character. |
Enclose Fields Containing Delimiter in Quotes
This option will place double quotes around any field value which contains the field delimiter character. For example, if you selected the comma for your field delimiter, the label All Sites will not appear with quotes in your data file, but "Uterus, NOS" will because it contains a comma.
Remove All Thousands Separators (Commas)
This option removes all commas acting as thousands separators from the calculated statistics. You may wish to select this option, even if you are using a field delimiter other than the comma. The absence of the thousands separators may be easier for your software package to handle.
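When reading the data file in your own code, the Missing Character and any remaining thousands separators need to be handled before a field can be treated as a number. The Python sketch below shows one way to do this. The helper name `parse_statistic` is hypothetical, and the sketch assumes that footnote flags have been removed from the field.

```python
# Markers that stand for a missing cell: an empty field (from a space), a period, or NA.
MISSING_MARKERS = {"", ".", "NA"}

def parse_statistic(field):
    """Convert one exported statistic to a float, or None if the cell is missing."""
    # Drop surrounding whitespace and, if present, enclosing double quotes.
    text = field.strip().strip('"')
    if text in MISSING_MARKERS:
        return None
    # Assumption: any comma inside a statistic is a thousands separator.
    return float(text.replace(",", ""))

print(parse_statistic("1,500"))  # 1500.0
print(parse_statistic("."))      # None
print(parse_statistic("NA"))     # None
```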
Remove Flags (Footnote Characters)
When checked, this option removes all footnote characters from the data file. Footnote characters can indicate such things as a positive significance test or the reason for an empty cell. When deciding whether to select this option, consider whether the footnote information will be helpful in your analysis of the data file. This option is disabled when working with a Case Listing matrix.
Output Variable Names Before Data
This option lists the variable names before the data in the export data file. The variable names are always in quotes. This can be useful when you import or copy data into another software package that uses field names.
Preserve Matrix Columns and Rename Fields
Checking this option displays the Edit Export Column Names dialog, where you can make changes to the variable names.
Defaults
Click this button if you make changes to the export options and wish to return to the default settings.
CSV Defaults
Click this button if you make changes to the CSV export options and wish to return to the default settings.
Set Default
Once you have edited the export options, you may set the current settings as the defaults by clicking this button. This may be useful if you use certain export settings regularly. These options will now be the defaults any time you open a session or matrix in SEER*Stat, and will be applied whenever you click the Defaults button.
For additional information, please see: