# DATA ANALYSIS UTILITIES

## MINF

### WHY MINF IS NEEDED

When an analysis of variance based on subject means (i.e. mean RT for each subject for each condition) yields a significant F-ratio, this should be interpreted as meaning that if the experiment were run again with the same items but new subjects from the same subject pool, it is likely that the same pattern of results would occur. A significant F-ratio from an analysis based on item means (mean RT for each item over all subjects) signifies that if the experiment were run again with the same subjects but a new set of items from the same item pool, it is likely that the same pattern of results would occur.

The experimenter usually wants to claim that his or her results can be generalised over both subjects and items simultaneously. To justify this claim, it is not sufficient to show that the subject-F and the item-F are separately significant. The quasi-ratio F' may be used to test whether a result may be generalised over subjects and items simultaneously. The F' statistic is often expensive to calculate, however, and instead we usually calculate min F' which is a conservative estimate of F' - that is, min F' will only be significant if F' is significant.

The MINF program calculates min F' according to the formula

$$\min F'(i, j) = \frac{F_1 \times F_2}{F_1 + F_2}$$

$$j = \frac{(F_1 + F_2)^2}{F_1^2 / n_2 + F_2^2 / n_1}$$

where $F_1(i, n_1)$ is the F-ratio from the subject analysis and $F_2(i, n_2)$ is the F-ratio from the item analysis.

(The foregoing discussion is based on H.H. Clark (1973). The language-as-fixed-effect fallacy. Journal of Verbal Learning and Verbal Behavior, 12, 335-359.)
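The formula above can be checked by hand with a few lines of Python. This is only a minimal sketch of the arithmetic, not the MINF program itself; the function name `min_f_prime` and the example values are made up for illustration.

```python
def min_f_prime(f1, f2, n1, n2):
    """Return (min F', denominator df j) from the subject and item F-ratios.

    f1, n1: F-ratio and denominator df from the subject analysis.
    f2, n2: F-ratio and denominator df from the item analysis.
    The numerator df (i) is shared by F1, F2 and min F', so it is not needed here.
    """
    if f1 <= 0 or f2 <= 0:
        raise ValueError("both F-ratios must be positive")
    min_f = (f1 * f2) / (f1 + f2)
    j = (f1 + f2) ** 2 / (f1 ** 2 / n2 + f2 ** 2 / n1)
    return min_f, j


# Illustrative values only: F1(1,20) = 10.0 and F2(1,30) = 5.0.
min_f, j = min_f_prime(f1=10.0, f2=5.0, n1=20, n2=30)
print(f"min F' = {min_f:.3f}, denominator df = {j:.1f}")
```

The exact probability that MINF reports would additionally need an F-distribution tail function (for example `scipy.stats.f.sf`), which is left out here to keep the sketch free of dependencies.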

### RUNNING THE PROGRAM

This program does not use any files, so it does not matter what drive or directory you are in. Simply run the program by typing

MINF

at the terminal. The program will respond by typing its name and version number and the current date. It will then ask you if you want printed output. Do not answer Yes if there is no printer attached to the computer you are using.

Next you will be prompted to enter the F1 and F2 values. These are the F-values obtained from the subject and item analyses respectively. Then you will be required to enter the denominator degrees of freedom associated with the two F values. Make sure you enter these in the same order as the F values. Finally you will be asked for the numerator degrees of freedom. This is just one value, which applies to F1, F2 and min F'.

When you have entered all these values, the program will print out all three F values and their associated degrees of freedom and the exact probability corresponding to the min F' value. It will then request values for the next calculation. When you have finished all your calculations, just press Return in response to the prompt for more F values and the program will terminate. If you terminate the program with Control-C, some of your printed output (if it was requested) may be lost.

## HIST

### DESCRIPTION

HIST produces descriptive statistics and optionally produces 1 or more histograms based on a sample of data read from either a DAT file (see Section 4) or an ASCII file created by CONCAT or by an editor such as WordStar.

### RUNNING THE PROGRAM

Change the default drive and directory to the drive and directory containing the input file. Run the program by typing

HIST

at the terminal. The program will respond by typing its name and version number and the current date. Enter the name of the input file in response to the prompt. The default filename extension is DAT. The program will automatically determine whether the file is a DAT file or an ASCII file, and proceed accordingly.

### ASCII FILE AS INPUT

An ASCII input file for HIST consists of 1 or more data sets, one after the other. Each data set consists of a title line followed by any number of lines containing data values. Only 1 data value will be read from each line. The program completes processing of each data set before proceeding to the next. A DAS file produced by CONCAT is a suitable input file for HIST.

The program will print out the title of the first data set and ask for the format telling it where to read a value in each data line. The default format is (F5.0). If you enter a format, it should contain an F field and be enclosed in parentheses. The program will then ask for the number of values to be read. You should enter the number of data lines (not including the title) in your data set.

The program will then print out sample statistics and histograms based on the data read (see SAMPLE STATISTICS and HISTOGRAM below). When processing of the first data set is complete, the program will proceed to the next data set. For each subsequent data set, the default format is the format that was used for the previous data set.
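The file layout described above (a title line, then one value per line, read with a fixed-width format) can be sketched as follows. This is an illustration under assumptions, not HIST's actual reader: the function name `read_data_set` is hypothetical, only the default (F5.0) case is modelled by taking the first five characters of each line, and a blank field is assumed to count as zero.

```python
def read_data_set(lines, start, n_values, width=5):
    """Read one data set: a title line followed by n_values data lines.

    Only the first `width` characters of each data line are used, which
    mimics the default (F5.0) format. Returns (title, values, next_index).
    """
    title = lines[start].strip()
    values = []
    for line in lines[start + 1 : start + 1 + n_values]:
        field = line[:width].strip()
        # Assumption: a blank field counts as zero.
        values.append(float(field) if field else 0.0)
    if len(values) != n_values:
        raise ValueError("file ended before all values were read")
    return title, values, start + 1 + n_values


# Two data sets, one after the other, as in an ASCII input file.
sample = [
    "Condition 1",
    "  512",
    "  498",
    "  530",
    "Condition 2",
    "  601",
    "  587",
]
title, values, nxt = read_data_set(sample, 0, 3)
title2, values2, _ = read_data_set(sample, nxt, 2)
```

Because the file does not record how many values each data set holds, the caller has to supply `n_values`, just as HIST asks for the number of data lines.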

### DAT FILE AS INPUT

If the input file is a DAT file, you will be asked to specify from which subject or condition the data is to be taken. For example, if you enter S2 the data will be taken from subject 2 (all conditions); if you enter C4 the data will be taken from condition 4 (all subjects); if you enter Q the program will terminate.

If you enter a subject number, the program will tell you whether that subject's data was incorporated by UPDATE. If you enter a condition number, the program will ask whether you want to read data from all subjects, only those incorporated by UPDATE, or only those rejected by UPDATE. In either case (subject or condition), the program will then ask whether you want to read all reaction times, only reaction times for correct responses, or only reaction times for incorrect responses.

The program will then print out sample statistics and histograms based on the data read (see SAMPLE STATISTICS and HISTOGRAM below). When processing of the first data set is complete, the program will prompt for the next subject or condition number.
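The S2 / C4 / Q entries described above amount to a very small command syntax. The sketch below shows one way such an entry could be interpreted; the function name `parse_selector` is hypothetical, and accepting lower-case letters is an assumption rather than documented behaviour.

```python
def parse_selector(entry):
    """Turn an entry such as 'S2', 'C4' or 'Q' into a (kind, number) pair."""
    text = entry.strip().upper()  # assumption: case is ignored
    if text == "Q":
        return ("quit", None)
    kinds = {"S": "subject", "C": "condition"}
    if len(text) < 2 or text[0] not in kinds or not text[1:].isdigit():
        raise ValueError(f"unrecognised selector: {entry!r}")
    return (kinds[text[0]], int(text[1:]))


print(parse_selector("S2"))
print(parse_selector("C4"))
print(parse_selector("Q"))
```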

### SAMPLE STATISTICS

When the data set has been read, the program calculates and displays on the screen a variety of descriptive statistics for that sample. If N > 150, the skewness is converted to z-score form. If N > 1000, the kurtosis is converted to z-score form. These z-scores may be used to test the hypothesis that the sample is taken from a normal distribution.
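The manual does not say how the z-scores are obtained. One common approach, shown here purely as an assumption, divides the sample skewness and excess kurtosis by their large-sample standard errors under normality, sqrt(6/N) and sqrt(24/N). The function name `shape_z_scores` is hypothetical.

```python
import math


def shape_z_scores(values):
    """Return (skewness z, kurtosis z) using large-sample standard errors."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    if m2 == 0:
        raise ValueError("all values are identical")
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    # Assumed standard errors under normality: sqrt(6/N) and sqrt(24/N).
    return skewness / math.sqrt(6.0 / n), excess_kurtosis / math.sqrt(24.0 / n)


# A symmetric, flat sample of 200 values: skewness z is 0, kurtosis z is negative.
z_skew, z_kurt = shape_z_scores([-2, -1, 0, 1, 2] * 40)
```

A z-score beyond roughly plus or minus 1.96 would then suggest, at the 5% level, that the sample departs from normality in that respect.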

If a histogram is later sent to the printer, these statistics will also be sent to the printer.

### HISTOGRAM

After displaying the sample statistics, the program will offer you the choice of a histogram on the screen, a histogram on the printer, or no histogram at all. If a histogram is requested, the program will ask you to enter the desired interval size. The histogram is based on 14 intervals centred about the mean, so the default interval size of 0.5 standard deviation units will usually give a good picture of the data. If an interval size is entered, the 14 intervals will be centred about the nearest multiple of the interval size below the mean, so entering a round number such as 50 or 100 will ensure that all intervals begin and end on a multiple of that number. When a histogram has been generated, you will be offered the choice of another histogram based on that sample or proceeding to the next sample. If a histogram is sent to the printer, it will be accompanied by information identifying the data sample and by the sample statistics.
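The interval rule described above can be written out as a short sketch. The function names `histogram_edges` and `histogram_counts` are hypothetical, and the handling of values that fall outside the 14 intervals (they are simply not counted here) is an assumption, since the manual does not describe it.

```python
import math


def histogram_edges(mean, sd, interval=None, n_intervals=14):
    """Return the 15 boundaries of the 14 histogram intervals."""
    if interval is None:
        # Default: intervals of 0.5 SD centred on the mean itself.
        interval = 0.5 * sd
        centre = mean
    else:
        # User-supplied size: centre on the nearest multiple below the mean.
        centre = math.floor(mean / interval) * interval
    half = n_intervals // 2
    return [centre + (k - half) * interval for k in range(n_intervals + 1)]


def histogram_counts(values, edges):
    """Count the values in each half-open interval [edge k, edge k+1)."""
    counts = [0] * (len(edges) - 1)
    for x in values:
        for k in range(len(counts)):
            if edges[k] <= x < edges[k + 1]:
                counts[k] += 1
                break  # assumption: out-of-range values are ignored
    return counts


# With a mean of 523 and an interval size of 50, the centre is 500.
edges = histogram_edges(mean=523.0, sd=80.0, interval=50)
counts = histogram_counts([480, 510, 523, 560, 905], edges)
```

In this example every boundary is a multiple of 50, which is the effect the manual describes for round interval sizes.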

## DASFILE

### DESCRIPTION

Program DASFILE supersedes programs T2, CORR and LINEAR. It performs all of the functions of those programs. That is, it reads from a DAS file (or other ASCII file of similar format) a data matrix containing a number of columns of data (variables), it optionally creates new variables (columns) which are linear combinations of the existing variables, and if new variables are created it optionally writes the augmented data matrix to a disk file. It then optionally calculates means, standard deviations, correlations and Student's t values for all variables or pairs of variables.

### RUNNING THE PROGRAM

Change the default drive and directory to the drive and directory containing your DAS file (or other similar ASCII file). Run the program by typing

DASFILE

at the terminal. The program will respond by typing its name and version number and the current date. It will then ask you to enter the name of your input file. The default filename extension is DAS.

### CREATING NEW VARIABLES

The program will type out the title of the first data matrix in the input file and then prompt you for the format of the data. The default format is (25F5.0) and this can be used to read DAS files created by CONCAT (unless you wish to skip some of the columns). It also asks you how many lines of data to read for the first data matrix.

The program now displays on the screen the data read and asks you if you want to create new variables. New variables are created by adding together multiples of the variables which already exist - that is, they are linear combinations of existing variables. To specify how a new variable is to be created, it is necessary to specify the co-efficient by which each variable will be multiplied when it is added into the new variable. For instance, if there are already 4 variables and you wish to create a new variable which is the average of all of them, you would enter the co-efficients .25, .25, .25 and .25. If you want to create a new variable which is the difference between variables 1 and 2 you would enter the co-efficients 1, -1, 0, 0. Any co-efficient which is zero can be typed simply as a comma - e.g. the preceding sequence could have been entered as 1, -1,,,.

When a new variable has been created, the program will type on the screen the transpose of the new column which has been added to the data matrix (that is, the data is typed as a row instead of a column to save space).
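The two examples above (an average and a difference) can be reproduced with the following sketch. The function names are hypothetical, and the way the coefficient entry is split is an assumption: empty fields are read as zero, and the list is padded or trimmed to the number of existing variables, so that the entry `1, -1,,,` yields 1, -1, 0, 0 for four variables.

```python
def parse_coefficients(entry, n_variables):
    """Parse a comma-separated coefficient entry; empty fields mean zero."""
    fields = [f.strip() for f in entry.split(",")]
    coeffs = [float(f) if f else 0.0 for f in fields]
    # Assumption: pad with zeros or drop surplus fields to match the variables.
    return (coeffs + [0.0] * n_variables)[:n_variables]


def add_linear_combination(matrix, coeffs):
    """Append a new column that is a linear combination of existing columns."""
    new_column = [sum(c * x for c, x in zip(coeffs, row)) for row in matrix]
    augmented = [row + [value] for row, value in zip(matrix, new_column)]
    return augmented, new_column


matrix = [[1.0, 2.0, 3.0, 4.0],
          [5.0, 6.0, 7.0, 8.0]]

# Average of all four variables.
_, average = add_linear_combination(matrix, parse_coefficients(".25, .25, .25, .25", 4))

# Difference between variables 1 and 2, with the zero coefficients left empty.
augmented, difference = add_linear_combination(matrix, parse_coefficients("1, -1,,,", 4))
print(difference)  # the new column, shown as a row
```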

### DISK FILE OUTPUT

If any new columns are added to the data matrix, the program will offer you the chance to output the augmented matrix (i.e. the input matrix plus the new columns) to a disk file. If you do request disk file output, the program will allow you to select which columns are output.

### STATISTICS - mean, sd, r and t

The program will now optionally print out the mean and standard deviation of each variable and/or the (Pearson) correlation of each pair of variables and/or the matched sample Student's t value for each pair of variables.
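For reference, the three statistics can be computed as in the sketch below. It uses the standard textbook definitions, which may differ in detail from DASFILE; in particular, the use of n - 1 in the standard deviation is an assumption, and the function names are hypothetical.

```python
import math


def mean_sd(xs):
    """Mean and sample standard deviation of one variable."""
    n = len(xs)
    m = sum(xs) / n
    # Assumption: sample standard deviation (n - 1 in the denominator).
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    return m, sd


def pearson_r(xs, ys):
    """Pearson correlation between two variables of equal length."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def matched_t(xs, ys):
    """Matched-sample Student's t and its degrees of freedom."""
    diffs = [x - y for x, y in zip(xs, ys)]
    m, sd = mean_sd(diffs)
    return m / (sd / math.sqrt(len(diffs))), len(diffs) - 1


# Illustrative data: two variables measured on the same four cases.
x = [10.0, 12.0, 14.0, 16.0]
y = [9.0, 10.0, 13.0, 13.0]
t, df = matched_t(x, y)
r = pearson_r(x, y)
```

The matched-sample t is computed on the pairwise differences, so both variables must have one value per case in the same order.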

## DASCONV

### DESCRIPTION

The analysis of variance program PQRAOV is limited to 3 factors. To analyse data from more complex designs, it is necessary to use ANOVA, which is one of the Perlman suite of programs. The Perlman programs are stored in the /PERLMAN directory on PC-3. Instructions on how to set up your data for analysis by the Perlman program ANOVA are contained in the file ANOVA.MAN. It is possible to take a DAS file created by CONCAT and convert it to a form suitable for input to ANOVA by using the program DASCONV.

### RUNNING THE PROGRAM

A separate output file is generated for each data matrix in the input file. There is no default extension for output filenames.

DASCONV writes every data value from the input file on a separate line of the output file along with a specification of which subject or item and which level of each factor the data value corresponds to. Thus it is necessary to tell DASCONV for each data matrix in the input DAS file how many factors are in the design, how many levels they have, the name of each level of each factor and whether each factor is repeated.

Operation of the program is self-explanatory.
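The conversion described above, from one row per subject to one line per data value, can be pictured with the following sketch. It is not DASCONV's actual output format: the column order (subject, then factor levels, then value), the rule that the last factor varies fastest across columns, and the factor and level names are all assumptions made for illustration, and only repeated factors are handled.

```python
from itertools import product


def das_to_long(matrix, factors):
    """Write each data value on its own line with its subject and factor levels.

    matrix:  one row per subject, one column per cell of the design.
    factors: list of (factor name, [level names]); the last factor is
             assumed to vary fastest across the columns.
    """
    cells = list(product(*(levels for _, levels in factors)))
    lines = []
    for subject, row in enumerate(matrix, start=1):
        if len(row) != len(cells):
            raise ValueError("row length does not match the design")
        for levels, value in zip(cells, row):
            lines.append(" ".join([f"s{subject}", *levels, f"{value:g}"]))
    return lines


# Hypothetical 2 x 2 repeated-measures design with two subjects.
factors = [("frequency", ["high", "low"]), ("prime", ["related", "unrelated"])]
matrix = [[512, 530, 560, 601],
          [498, 515, 549, 587]]
long_lines = das_to_long(matrix, factors)
print("\n".join(long_lines))
```

Consult ANOVA.MAN for the layout that the Perlman ANOVA program actually expects before relying on any particular column order.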