Introduction to SAS


I.  SAS:  Current release: 9.1.3

II.  Documentation

    A.  Version 9 Documentation
        1. Available through SAS Institute
    B.  User-written documentation
        1.  DiIorio, SAS Applications Programming: A Gentle Introduction
        2.  Books for advanced techniques, e.g., SAS System for 
            Mixed Models
        3. The Little SAS Book, Delwiche and Slaughter
        4. Applied Statistics and the SAS Programming Language, Cody &
           Smith
    C.  U-System Documentation
        1.  Getting Started With SAS at
            http://www.u.arizona.edu/udocs/stat/sas/sas.html
        2.  Other links at same location.

III.  Running SAS

    A.  Interactive mode on U system
        1.  "sas" with no arguments
        2.  Needs an X Windows display (or run SAS under Windows instead)
        3.  setenv DISPLAY localnodename:0.0
        4.  xhost +bast.u.arizona.edu
           a.  3 & 4 may not be necessary with SSH
        5.  most efficient to write syntax 
        6.  log and listing windows

    B.  Interactive mode in Windows
        1.  Click on SAS Program icon
        2.  Will bring up Program and Log windows; output will appear when 
            program is executed

    C.  Noninteractive mode
        1.  uses SAS command file (e.g., myfile.sas)
            a. Commands for SAS (similar to SPSS files)
            b. Usual extension is .sas
        2.  "sas myfile" to run (no extension if .sas file)
        3.  produces log and listing files (.log, .lst)
            a. Log file shows commands executed, error messages, warnings
            b. Listing file contains output of procedures (no syntax)
            c. Misc. Output files
                i. Data written to file in data step commands
               ii. Graphics files (postscript format)
              iii. SAS data sets
        4.  batch queue for very long jobs (> 10 min. CPU, lg. memory)


    D.  Data files
        1.  Text data
            a. Inline (after "datalines;" or "cards;" statement)
            b. External
                i. Format same as for SPSS
               ii. Missing data may be indicated with "."
              iii. Column formatted data (fixed field width), 
                   freefield, comma delimited data 
        2.  SAS datasets
            a. Similar to SPSS system files
            b. transport data sets can be created to transfer 
              between platforms
        3.  Other formats available
            a. Can read SPSS export files (cannot read SPSS system files)
                 i. GET FILE='myfile.save' 
                    ... 
                    EXPORT OUTFILE='myfile.xpt'
            b. Mainframe is more restricted regarding data types:
                i. dBase
               ii. OSIRIS data sets
              iii. Usually cannot read spreadsheets
               iv. Tab and comma delimited files (can include 
                   variable names in first line of file)
                v. Cannot read Excel spreadsheets, although can output
                   files in RTF format
            c. Windows version:
                i. Excel
               ii. dBase
              iii. Lotus
               iv. Access
                v. Tab and comma delimited files as above
               vi. Generally more flexible for data import and export.
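Item D.2.b mentions transport data sets. The following is a minimal sketch, assuming the XPORT engine is available at your site. The libref TRANS, the file name scores.xpt, and the data values are hypothetical.

```sas
/* Build a small temporary data set to transfer (made-up values). */
data scores ;
  input id score ;
  datalines ;
1 85
2 92
;
run ;

/* Point a libref at a transport file using the XPORT engine. */
libname trans xport 'scores.xpt' ;

/* Copy the data set from WORK into the transport file. */
proc copy in=work out=trans ;
  select scores ;
run ;
```

On the receiving platform the same LIBNAME statement, followed by PROC COPY with IN=TRANS and OUT=WORK, should read the file back in.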

IV.  Getting Help

    A.  Interactive version has entry for help on Toolbar

    B.  Extensive online documentation at www.sas.com

    C.  STATHELP@listserv.arizona.edu

    D.  sample files in /usr/local/sas/sas9.1.3/samples/x 
        1.  ls -l /usr/local/sas/sas9.1.3/samples/stat/regex*.sas
            a. Copy into own directory and run

V.  Structure of SAS command file

    A.  Data step
        1.  Purpose is to create one or more SAS data sets
        2.  Read in data, select cases, transform values, select 
          or drop variables
        3.  May have multiple data steps in program  
        4.  Data step begins with statements defining data set-- 
            may read in external text file or SAS data set or 
            data set from previous data step in program 
        5.  Data step ends when another data step begins 
            (keyword DATA encountered at the start of a statement) or a 
            PROC begins.

    B.  PROCs
        1.  Procedures to work with data sets
        2.  Statistical procedures 
          (PROC MEANS, PROC GLM, PROC REG, PROC PRINCOMP, PROC ARIMA)
        3.  Data set manipulation and examination procedures 
          (PROC CONTENTS, PROC DATASETS, PROC IMPORT, PROC COPY, PROC SORT)
        4.  Utility procedures 
          (PROC PRINT, PROC CALENDAR, PROC PLOT, PROC TABULATE)
        5.  Graphics procedures (PROC GPLOT, PROC GCHART, PROC G3D)
        6.  Documentation for PROCS
            a. Base system documentation for PROCs for descriptive 
              statistics (MEANS, FREQ, CORR), data set manipulation 
              and examination (SORT, CONTENTS)(SAS Procedures Guide)
            b. STAT documentation for multivariate procedures 
              (FACTOR, PRINCOMP, ANOVA, GLM, REG, CATMOD, LOGISTIC)
            c. Time series procedures in ETS documentation 
              (ARIMA, STATESPACE, X11)
            d. Other statistical procedures documented in OR and QC 
              manuals (CAPABILITY, GANTT)
            e. Interactive Matrix Language documented in IML 
              manual (PROC IML)
                i. Ex:  
                     PROC IML; 
                     A ={1.1  4.5 3.4, 2.1 7.6 4.8} ; 
                     AT = A`;
            f. Graphics procedures in SAS/GRAPH manual (GPLOT, GCHART) 
              as well as statements to set up AXIS and SYMBOLS.
        7.  Other statements controlling environment can go anywhere: 
          options, comment, goptions, libname, filename.  
          (must appear before needed)
        8.  run ;  after procedures (necessary after last procedure in 
          interactive mode, not needed otherwise)
        9.  endsas; statement to end program (not required, 
            should not be used interactively)
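The overall layout described in this section can be sketched as one small command file. The data set name HEIGHTS, the variables, and the values are made up for illustration.

```sas
options linesize=80 nodate ;            /* environment statement */

data heights ;                          /* data step creates a SAS data set */
  input name $ sex $ height ;
  datalines ;
Ann F 64
Bob M 70
Cal M 68
;
run ;

proc print data=heights ;               /* utility procedure */
run ;

proc means data=heights n mean std ;    /* statistical procedure */
  var height ;
run ;
```

Saved as myfile.sas, this would be run noninteractively with "sas myfile", producing myfile.log and myfile.lst.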

VI.  Syntax rules

    A.  Statements begin in any column

    B.  Statements end with ";"

    C.  Multiple statements allowed per line, separate with ";"

    D.  May continue on next line, indentation not required

    E.  Comments begin with "*", end with ";"

    F.  Block comments "/*", "*/" (semicolon has no effect)

    G.  Inline data after "cards" or "datalines", appears at 
        END of data step
        1.  Terminate data with ";" or PROC, 
            do not end with blank line

    H.  Not case sensitive
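Rules A through H can be seen together in a short sketch. The data set name DEMO and its values are hypothetical.

```sas
* A statement-style comment ends at the semicolon ;
/* A block comment may contain semicolons; they have no effect. */

data demo ; input v1-v3 ;   * two statements on one line ;
  total = sum(of v1-v3)
          ;                 * a statement may continue onto the next line ;
  datalines ;
1 2 3
4 5 6
;

PROC PRINT DATA=DEMO ;      * names and keywords are not case sensitive ;
RUN ;
```

Note that DATALINES is the last statement of the data step and that the data end at the line holding the lone semicolon.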

    I.  Names: 
        1.  Up to 32 characters
        2.  May begin with letter of alphabet or underscore 
            (but be careful)
            a. Reserved names: _N_, _TYPE_, _NAME_
            b. Cannot use comparison operator mnemonics (GE, LT, EQ, ...)
            c. Generally same rules as SPSS, but a little more flexible
        3.  May contain digits, but may not begin with digit
            (_1 is permissible, 1 or 1X is not)
        4.  May contain embedded blanks only if written as a quoted name 
            literal ('my var'n) with OPTIONS VALIDVARNAME=ANY in effect
        5.  V1-V75 to generate list of 75 variables beginning with "V"
            a. V1 ^= V01
        6.  Must be spelled consistently (V01 has different 
            "spelling" from V1)

VII.  DATA step

    A.  Purpose
        1.  Create SAS data set to use in other procedures
        2.  Read in data from external or inline file (text or 
            special formats)
        3.  Merge or concatenate data sets
        4.  Select/drop cases/variables
        5.  Create/transform variables
            a. Use SAS functions for creating variables if needed 
              (trig, date, statistical)
        6.  Define missing data
        7.  Restructure, rewrite data set
        8.  Conceptually a loop in which cases are read in and 
            processed one at a time.

     B.  Begins with DATA dsn ; where "dsn" is name of one or more 
         SAS data sets
        1.  DATA one ;  
            creates a temporary SAS data set named "one" which can be 
            used in PROCs or read into other SAS data sets
        2.  DATA one two ; creates two temporary SAS data sets 
        3.  DATA "one" ; creates permanent SAS data set called one.sas7bdat 
          which is stored in current directory.
        4.  DATA "~/edp548/data/mysasfile" ; creates a SAS data set called 
          mysasfile.sas7bdat which is stored in ~/edp548/data.
        5.  No SAVE statement as with SPSS.  Whether a data set is saved 
            permanently depends on how it is named.
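The difference between temporary and permanent data sets in item B might look like this in practice. This is a sketch only; the names ONE and mydata are hypothetical, and the permanent file is written to whatever the current directory is when SAS runs.

```sas
/* Temporary: WORK.ONE disappears when the SAS session ends. */
data one ;
  input id x ;
  datalines ;
1 10
2 20
;
run ;

/* Permanent: a quoted name writes mydata.sas7bdat to the current directory. */
data 'mydata' ;
  set one ;
run ;

/* A later program can read the permanent file by the same quoted name. */
proc print data='mydata' ;
run ;
```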

    C.  DATA step terminates when PROC, DATA, or DATALINES is encountered.

    D.  Reading in raw data:
        1.  INFILE statement defines location of external data file
            a. Not required for inline data
            b. Analogous to FILE= in SPSS DATA LIST statement
            c. form INFILE 'mytextfile.dat' MISSOVER ;
                i. Purpose of MISSOVER
        2.  INPUT statement names variables, defines types, locations (columns)
            a. INPUT A $ -  A is character variable
            b. INPUT A 1-3 B 4 C $ 15-24 ;
            c. More flexible than SPSS.  May mix freefield, 
              column specific, formatted
            d. INPUT A 3. ; would read in first 3 columns of numeric data, 
              store in A
                i. Same as INPUT A 1-3 ;
            e. Cannot do automatic assignment as with SPSS (V1 TO V75 6-80). 
                i. Use a format list instead:  INPUT @6 (V1-V75) (1.) ; 
                   (@6 starts reading in column 6)
            f. Purpose of @, +
            g. numbered lines (#2, #3, etc.)
            h. Trailing @, @@
                i. Data step as a loop
        3.  PROC IMPORT as alternative to DATA step with INFILE and INPUT
            a. Can be used with CSV, Tab-delimited text files or 
              Excel spreadsheets, DBF files (Excel won't work in Unix).
            b. Transformations and selection then occur in a data step 
              (have to read in data set created by PROC IMPORT).
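The INPUT features listed under D.2 (column input, formatted input, the @ and + pointer controls, MISSOVER, and trailing @@) can be sketched with inline data. The variable names and values are made up, and INFILE DATALINES is used here only so that MISSOVER can be shown without an external file.

```sas
/* Column, formatted, and list input mixed in one INPUT statement. */
data styles ;
  infile datalines missover ;   /* short line: set remaining variables to missing */
  input id 1-3                  /* column input, columns 1-3 */
        @5 sex $1.              /* @5 moves the pointer to column 5 */
        +1 (item1-item5) (1.)   /* +1 skips a column, then five 1-column items */
        score ;                 /* list (freefield) input */
  datalines ;
001 M 12345 88
002 F 54321
;
run ;

/* Trailing @@ holds the line across iterations of the data step,
   so several cases can be read from one line. */
data pairs ;
  input x y @@ ;
  datalines ;
1 2  3 4  5 6
;
run ;
```

In the first step SCORE is missing for the second case because of MISSOVER. The second step reads three cases from the single data line, which illustrates the data step acting as a loop.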

    E.  Creating new variables

        1.  COMPUTE command not required, just write expression
            a. PC1=.345*X1 + .410*X2  -.613*X3 ;
            b. XBAR=MEAN(OF X1-X5)  ;
            c. XBAR=MEAN(X1,X2,X3,X4,X5) ;
            d. X2Y2 = X**2 * Y**2 ;
            e. SQRTX= SQRT(X) ;
        2.  Conditional computations
            a. IF...THEN, ELSE
            b. THEN is required
            c. Block IF...THEN...ELSE
        3.  SAS Functions
            a.   Similar to SPSS
            b.   Argument lists
            c.  Types of functions
                1)  Mathematical: EXP, LOG, LOG10, SQRT, ARSIN, ABS
                2)  Statistical, data:  N, NMISS, MEAN, SUM, STD, VAR
                3)  Sampling, probability:  RANUNI, PROBNORM(Z),
                    PROBF(F*,ndf,ddf), PROBT(t*,df),PROBIT(p), 
                    FINV(p,ndf,ddf),TINV(p,df)
                4)  General:  Date/time (MDY), ZIPSTATE, UPCASE...
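Assignment statements, IF...THEN/ELSE, a block IF, and a few functions from the list above are combined in this sketch. All variable names and data values are hypothetical.

```sas
data scored ;
  input x1-x5 ;
  xbar      = mean(of x1-x5) ;     /* mean of the nonmissing values */
  n_missing = nmiss(of x1-x5) ;    /* how many of x1-x5 are missing */

  /* Simple conditional: THEN is required. */
  if xbar >= 4 then level = 'high' ;
  else level = 'low' ;

  /* Block IF: DO ... END groups several statements. */
  if n_missing > 0 then do ;
    flag = 1 ;
    xbar = . ;
  end ;
  else flag = 0 ;

  p = probnorm(1.96) ;             /* normal CDF at 1.96, about 0.975 */
  datalines ;
3 4 5 5 4
2 . 3 1 2
;
run ;
```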

     F. Case selection
        1.  IF SEX='M' ; selects only cases for which SEX has code of "M".  
        2.  IF SEX ^= 'M' ; retains any case with any value other than 'M' ;
        3.  IF SEX='F' THEN DELETE ; deletes females with code of 'F'; 
          all other cases retained
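The subsetting IF and DELETE forms can be compared side by side. The data set PEOPLE and its values are made up for this sketch.

```sas
data people ;
  input name $ sex $ age ;
  datalines ;
Ann F 34
Bob M 41
Cal M 29
;
run ;

data males ;
  set people ;
  if sex = 'M' ;                 /* subsetting IF: keep only males */
run ;

data nofemales ;
  set people ;
  if sex = 'F' then delete ;     /* drop females, keep everything else */
run ;
```

With these data both steps keep the same two cases; they would differ if SEX had other codes or blanks.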

     G. Arrays and Looping
        1.  Array statement form
        2.  DO loops
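The outline gives no detail for arrays and loops, so the following is only one plausible illustration: an ARRAY statement naming five existing variables and a DO loop that reverse-scores them. The variable names and the 1-5 scale are assumptions.

```sas
data reversed ;
  input item1-item5 ;
  array items {5} item1-item5 ;    /* ARRAY name {size} variable-list ; */
  do i = 1 to 5 ;
    items{i} = 6 - items{i} ;      /* reverse-score a 1-5 scale */
  end ;
  drop i ;                         /* loop index is not needed in the data set */
  datalines ;
1 2 3 4 5
5 5 1 2 3
;
run ;
```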

     H.  Counting
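No detail is given for counting. One common reading, assumed here, is the equivalent of the SPSS COUNT command: counting how many of a set of variables take a given value within each case. The variables Q1-Q4 and the target value 5 are hypothetical.

```sas
data counted ;
  input id q1-q4 ;
  array q {4} q1-q4 ;
  n_fives = 0 ;                    /* reset the counter for each case */
  do i = 1 to 4 ;
    if q{i} = 5 then n_fives = n_fives + 1 ;
  end ;
  drop i ;
  datalines ;
1 5 3 5 1
2 2 2 2 2
;
run ;
```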

     I.  Missing values
        1. No MISSING VALUES command, no user-defined missing values
        2. blank numeric data is automatically treated as missing
        3. Treat "missing codes" by assigning system-defined value:
            a. IF INCOME=-99 THEN INCOME=. ;
        4. For a list of variables, combine #3 with ARRAY and 
           DO statements
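Item 4 can be sketched as follows, assuming a missing code of -99 as in item 3. The variable names INCOME, RENT, and FOOD are made up.

```sas
data clean ;
  input id income rent food ;
  array money {3} income rent food ;
  do i = 1 to 3 ;
    if money{i} = -99 then money{i} = . ;   /* recode -99 to system missing */
  end ;
  drop i ;
  datalines ;
1 25000 -99 300
2 -99 650 -99
;
run ;
```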

     J.  Recoding values
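The outline does not say how recoding is done. A common approach, shown here as an assumption rather than the author's method, is a chain of IF...THEN/ELSE IF statements that creates a new variable. Missing values are handled first because SAS treats a missing numeric value as smaller than any number in comparisons.

```sas
data recoded ;
  input id age ;
  if age = . then agegrp = . ;          /* keep missing as missing */
  else if age < 30 then agegrp = 1 ;
  else if age < 50 then agegrp = 2 ;
  else agegrp = 3 ;
  datalines ;
1 24
2 45
3 .
4 67
;
run ;
```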

     K.  Variable labels
        1.  label varname='label here' ; (equals sign needed)
        2.  may have multiple labels in one label statement.

     L.  Value labels
        1. No VALUE LABEL statement
        2. Handled with formats which overlay values of variables 
           (print labels instead of values)
        3.   Have to create outside of data step using PROC FORMAT 
             and then apply as desired.
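Variable labels (K) and value labels through formats (L) might be set up as in this sketch. The format name SEXFMT, the codes, and the label text are hypothetical.

```sas
/* Define the value labels once, outside the data step. */
proc format ;
  value sexfmt 1 = 'Male'
               2 = 'Female' ;
run ;

data survey ;
  input id sex income ;
  label sex    = 'Sex of respondent'
        income = 'Annual income' ;     /* several labels in one LABEL statement */
  format sex sexfmt. ;                 /* attach the format to the variable */
  datalines ;
1 1 30000
2 2 45000
;
run ;

proc freq data=survey ;
  tables sex ;                         /* prints Male/Female instead of 1/2 */
run ;
```

The stored values of SEX remain 1 and 2; the format only changes how they are displayed.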

     M.  Character variables
        1.  STRING declaration not required, but should declare length 
            (or first value must be longest)
        2.  LENGTH declaration:  LENGTH STATE $ 2 ;  simultaneously
            declares that STATE is character variable and that it will 
            be two characters long.
            a.   Longer values are truncated
        3.  Example:  IF CITY='Tucson' THEN STATE='AZ' ;
        4.  Values in quotes refer only to character variables:

            If variable is character, then values in quotes:
              INPUT GROUP $ ;
              IF GROUP='1' THEN DO ;
            vs.
              INPUT GROUP ;
              IF GROUP=1 THEN DO ;

            Refer to a missing numeric value as . without quotes (not '.'); 
            a missing character value is a blank (' ').
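The effect of the LENGTH declaration, including truncation of longer values, can be seen in a short sketch. The city names are made up.

```sas
data cities ;
  length state $ 2 ;                   /* character variable, 2 characters */
  input city $ ;
  if city = 'Tucson' then state = 'AZ' ;
  else if city = 'Denver' then state = 'Colorado' ;   /* stored as 'Co' */
  datalines ;
Tucson
Denver
;
run ;

proc print data=cities ;
run ;
```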
   
      N.  SET statement
         1.  DATA NEW ;
              SET OLD ;
         2.  DATA COMBINE ;
              SET A B C ;
         3.  DATA COMBINE ;
              MERGE A B ;
              BY ID ;
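Concatenating with SET and match-merging with MERGE can be contrasted using two small data sets. The names A and B follow the outline; the variables and values are hypothetical. MERGE with BY expects each data set to be sorted by the BY variable, so PROC SORT is run first.

```sas
data a ;
  input id x ;
  datalines ;
2 20
1 10
;
run ;

data b ;
  input id y ;
  datalines ;
1 100
2 200
;
run ;

proc sort data=a ; by id ; run ;
proc sort data=b ; by id ; run ;

data stacked ;       /* concatenate: cases of A followed by cases of B */
  set a b ;
run ;

data combine ;       /* match-merge: one case per ID with both X and Y */
  merge a b ;
  by id ;
run ;
```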

      O.  Determining variables for data set
         1. DROP varlist ; statement
         2. KEEP varlist ; statement
            a.  Affects data set created, not active file.
         3. dataset DROP or KEEP:
             DATA NEW (DROP=var1-var3) ;
             DATA NEW (KEEP=score1-score15) ;
             a.  Affects data set created, not active file.
         4. SET DROP or KEEP:
             DATA NEW ;
                SET OLD(DROP=var1-var3) ;
             DATA NEW ;
                SET OLD(KEEP=score1-score15) ;
             a.  Affects data set read into active file and data set
                 created.
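The practical difference between items 3 and 4 is whether the dropped variables are still available inside the data step. A sketch, with made-up data set and variable names:

```sas
data old ;
  input id var1-var3 score ;
  datalines ;
1 5 6 7 90
;
run ;

/* DROP= on the output data set: var1-var3 can still be used in the step. */
data new1 (drop=var1-var3) ;
  set old ;
  total = sum(of var1-var3) ;
run ;

/* DROP= on the SET data set: var1-var3 are never read in. */
data new2 ;
  set old (drop=var1-var3) ;
  double = 2 * score ;
run ;
```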

      P.  Writing out text files
         1.  File statement to name output file
         2.  PUT statement to write variables to file:

         DATA UNIVAR ;
           INFILE 'multivariate.dat' ;
           INPUT ID PRE POST FU ;
           ARRAY MEASURE {3} PRE POST FU ;
           FILE 'univariate.dat' ;      * name the output text file ;
           DO TIME=1 TO 3 ;             * one output line per measurement ;
             Y=MEASURE{TIME} ;
             PUT ID TIME Y ;
           END ;
         RUN ;

             


Sample SAS command file

Sample SAS log file

Sample SAS listing file

GZII command file for checking data

GZII log file for checking data

GZII listing file for checking data

Last updated: March 23, 2004