Overview
This document provides general guidelines for writing SAS programs.
SAS programs are divided into DATA steps and PROCs. The purpose of the
data step is to create one or more SAS data sets. The data step contains
statements which read in raw data files or existing SAS data sets. Other
data step tasks include transforming, creating, and selecting variables,
selecting cases, defining missing data, and providing labels for
variables. The data step begins with the word DATA followed by the name
of a data set. See Sample Program #1 for an
example of a simple data step.
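As a rough illustration of the tasks listed above, here is a minimal sketch of a data step. The data set name, variable names, and data values are hypothetical and are not taken from Sample Program #1.

```sas
/* Hypothetical example: names and values are made up for illustration. */
data scores;
    input id gender $ test1 test2;     /* read raw data; $ marks a character variable */
    if test1 = 999 then test1 = .;     /* define 999 as a missing value */
    total = test1 + test2;             /* create a new variable */
    if gender = 'F';                   /* select cases (subsetting IF) */
    label total = 'Sum of test 1 and test 2';   /* provide a variable label */
    datalines;
1 F 80 90
2 M 75 88
3 F 999 85
;
run;
```

The subsetting IF keeps only the observations for which the condition is true, so the resulting data set holds the two rows with gender F.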
SAS PROCs are used to analyze or graph data or provide information about a
SAS data set. For example, PROC REG performs multiple regression on sample
data, while PROC CONTENTS tells the user the name and location of the variables
in a SAS data set.
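The two procedures mentioned above might be called as in the sketch below. It assumes the sample data set SASHELP.CLASS that ships with SAS is available; the choice of model variables is purely illustrative.

```sas
/* Assumption: the SASHELP.CLASS sample data set is installed. */
proc contents data=sashelp.class;   /* describe the data set and its variables */
run;

proc reg data=sashelp.class;
    model weight = height age;      /* regress Weight on Height and Age */
run;
quit;                               /* PROC REG stays active until QUIT */
```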
A SAS program may contain one or more data steps and/or one or more
procedures.
The following texts provide useful information for writing your SAS program:
- Online Documentation from SAS Institute.
- The Little SAS Book: A Primer, by Lora D. Delwiche and Susan
J. Slaughter (Cary, NC: SAS Institute).
- Applied Statistics and the SAS Programming Language, by Cody
and Smith.
The SAS data step consists of a series of statements. Rules for
writing these statements follow.
Words:
- Words in statements must be separated by one or more blanks.
- A word may not be split between lines.
- Words may be in upper, lower, or mixed case.
- Character values are case-sensitive: a quoted value in a comparison
must match the data value exactly, including case.
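The case rules for words can be sketched as follows. The data set and its values are hypothetical; the point is that keywords and names may be typed in any case, while quoted character values are compared exactly.

```sas
/* Hypothetical data: two pets whose species differ only in case. */
DATA Pets;                        /* keywords and names may be in any case */
    input name $ species $;
    datalines;
Rex Dog
Fido dog
;
run;

proc print data=pets;             /* pets refers to the same data set as Pets */
    where species = 'dog';        /* selects Fido only; 'Dog' is a different value */
run;
```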
Variable names:
- Variable names must be 32 or fewer characters in length.
- All variable names must begin with an alphabetic character (A-Z, a-z)
or an underscore (_). Subsequent characters may include digits.
- A variable list such as V1-V5 means V1, V2, V3, V4, and V5.
- SAS variable names are not case-sensitive, but they must otherwise
match character for character. That is, V1 is not the same as V01,
but V1 is the same as v1.
- Variable names may not contain embedded blanks. V1 and V_1 are
acceptable; V 1 is not.
- Certain names are reserved for use by SAS, e.g., _N_, _TYPE_, and
_NAME_. Similarly, operator keywords such as GE, LT, EQ, and AND
should not be used as variable names.
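A short sketch of these naming rules in use. The data set name and the variables Total and _score are hypothetical.

```sas
/* Hypothetical example of variable lists and name matching. */
data items;
    input V1-V5;                 /* same as: input V1 V2 V3 V4 V5; */
    Total = sum(of v1-v5);       /* v1 refers to the same variable as V1 */
    _score = V1 * 2;             /* a name may begin with an underscore */
    datalines;
1 2 3 4 5
;
run;
```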
Statements:
- A statement may begin anywhere on a line and may be continued on additional
lines as necessary.
- Each statement ends with a semicolon (;).
- A statement that begins with an asterisk (*) is treated as a
comment and is not executed. The comment ends at the next semicolon.
- Text between /* and */ is ignored (block comment). Semicolons
inside /* ... */ have no effect.
- Multiple statements may appear on a line; they must be separated by
semicolons.
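The statement rules above can be illustrated with a small sketch. The data set demo and its variables are hypothetical.

```sas
* A comment statement ends at the next semicolon;

/* A block comment may contain semicolons; they have no effect.
   The comment ends only at the closing delimiter. */

data
    demo;                        /* a statement may continue across lines */
    x = 1; y = 2; z = x + y;     /* several statements on one line */
run;
```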
The Data Step
SAS PROCS
- SAS PROCs (procedures) are used for many purposes including carrying
out statistical analysis (e.g.,
PROC ANOVA, PROC MEANS), displaying information about a SAS data set
(e.g., PROC CONTENTS, PROC PRINT), and creating graphs (PROC GPLOT).
- Most PROCs produce output of some kind. The output of statistical
PROCs usually appears in the listing file. The output of graphics
PROCs often takes the form of a graphics file or graphics stream
output to the screen.
- A PROC must appear after the data step that creates the SAS data
set used in the procedure.
- The word PROC automatically terminates a SAS data step.
- Data step commands may not appear after a PROC unless a new data step
is initiated with the word DATA.
- A SAS PROC begins with the word PROC followed by the name of the
specific procedure (e.g., PROC REG).
- Some PROCs have options or subcommands which allow the user to output
information into a SAS data set (e.g., PROC MEANS, PROC REG).
- By default, a PROC uses the data set most recently created by a
preceding data step or PROC. To use a different data set, specify
the DATA= option on the PROC statement.
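The default data set, the DATA= option, and output to a data set might look like the sketch below. It assumes the SASHELP.CLASS sample data set is available; the names class, stats, mean_height, and mean_weight are hypothetical.

```sas
/* Assumption: the SASHELP.CLASS sample data set is installed. */
data class;
    set sashelp.class;
run;

proc means;                      /* no DATA= option: uses class, the last data set created */
    var height weight;
run;

proc means data=sashelp.class noprint;   /* DATA= names the input explicitly */
    var height weight;
    output out=stats mean=mean_height mean_weight;   /* save the means in data set stats */
run;

proc print data=stats;
run;
```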
Miscellaneous Commands
- The OPTIONS statement sets options for the current session. For
example, OPTIONS NOCENTER LINESIZE=80 ; sets the line size in the
listing file to 80 columns and left-aligns the output on the page
instead of centering it.
- LIBNAME defines a libref, a short name for the library (directory)
in which permanent SAS data sets are stored. FILENAME defines a
fileref, a short name that refers to a specific external file.
- ENDSAS; is used to terminate the SAS program.
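A minimal sketch combining these statements. The paths, the libref mylib, the fileref rawin, and the variable names are hypothetical and must be replaced with locations that exist on your system.

```sas
options nocenter linesize=80;

/* Hypothetical paths: substitute a real directory and file. */
libname mylib '/home/user/sasdata';
filename rawin '/home/user/raw/scores.txt';

data mylib.scores;               /* two-level name: stored permanently in mylib */
    infile rawin;                /* read from the file the fileref points to */
    input id test1 test2;
run;

endsas;                          /* end the SAS program */
```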