Carrying out multiple comparison tests for interaction effects in both SPSS and SAS can be difficult within the procedure in which the data were analyzed. The purpose of this document is to show how to do contrasts for situations where cell means, sample sizes, and MSE (or cell standard deviations) are known.
A companion document shows how to do followup analyses using SPSS ONEWAY. This document shows how to carry out the analysis using SAS PROC GLM. The example below shows how to set up an analysis for a two-way between subjects design, but it can be easily modified to allow analysis of within subjects designs and more complex multiway interactions.
The following information is necessary: the cell means, the cell sample sizes, and either the MSE or the cell standard deviations.
The cell means and standard deviations can be obtained from raw data, if available, using the MEANS statement in PROC GLM. If raw data are not available and only the MSE term is available, then the root MSE (RMSE) can be calculated by taking the square root of MSE.
If the analysis has been carried out using raw data and the interaction effect of interest is the highest order interaction, then the followup tests can be generated by converting the multi-way ANOVA to a oneway ANOVA. This is done by generating a variable for the oneway levels from the variables for the interaction. For instance, for an A2B3 analysis, we can generate a variable whose values represent each combination of the levels of A and B. Since SAS treats these values as categorical data, their exact values are of no interest, but they do serve to indicate the cells in the design.
data oneway ;
  input a $ b $ y ;
  c=trim(a)||trim(b) ;

In the example above, C will have values 11, 12, 13, 21, 22, and 23. The variable C can be used in a oneway analysis to generate the necessary posthoc tests:
proc glm ;
  title 'oneway analysis with raw data' ;
  class c ;
  model y = c ;
  means c / tukey cldiff ;
  lsmeans c / tdiff ;
The MODEL statement generates the ANOVA sums of squares and degrees of freedom for the oneway model. These results should be ignored; the test displayed is the significance of the model as a whole (combined main effects and interaction terms), not the test of the interaction of interest.
The MEANS statement with the TUKEY option generates the Tukey pairwise comparisons for the cells. This test protects against inflation of the type I error rate due to multiple t-tests and uses a constant error term for the analysis. Other procedures for posthoc comparisons are available in the SAS documentation for PROC GLM. The CLDIFF option prints the comparison results as confidence intervals for the difference between the means of each specific pair. A confidence interval which includes 0 in its range (negative lower bound, positive upper bound) indicates a difference that is not significant. Significant comparisons are shown with "***" after the confidence intervals. If the CLDIFF option is eliminated in a balanced design, SAS will generate a table showing homogeneous groupings of the means.
The LSMEANS statement presents the least squares means. These are means generated under a balanced model. The TDIFF option generates all possible pairwise comparisons of the LS means with the associated probabilities. To prevent inflation of type I error, the probabilities should be compared with one generated under a multiple comparison procedure (such as that using the Bonferroni adjustment where the nominal alpha level is divided by the number of comparisons) when determining significance.
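As a worked illustration of the Bonferroni adjustment for the A2B3 example, the arithmetic below assumes that all pairwise comparisons among the six cells are of interest and that the nominal alpha level is .05. Both are assumptions made here for illustration, and the symbol m for the number of comparisons is introduced only for this sketch.

```latex
\[
m = \binom{6}{2} = \frac{6 \cdot 5}{2} = 15,
\qquad
\alpha_{\text{per comparison}} = \frac{\alpha}{m} = \frac{.05}{15} \approx .0033
\]
```

Under these assumptions, a probability from the TDIFF output would be treated as significant only if it falls below about .0033. Testing a smaller planned set of comparisons would give a less stringent cutoff.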
When only cell statistics are available (means, cell sample sizes, standard deviations), the situation becomes more difficult. These statistics are "sufficient" statistics for an ANOVA, but PROC GLM is constructed to work with raw data only. Therefore, it is necessary to generate a data set with the same characteristic summary statistics. If the cell standard deviations are not available, then the root MSE (the square root of the error term for the statistical test of the interaction effect) can be substituted for each of the cell standard deviations. The goal here is to generate a data set with the same cell means and mean square error as the original data set, so the individual observations which generate the values are of no interest.
The following code as well as the rationale described above were given by David A. Larson in the American Statistician (May, 1992; v. 46, pp. 151-152). Briefly, the code generates Nj-1 observations whose values are equal to the cell mean plus the standard error of the cell mean (actually, it generates a single observation which is later weighted by Nj-1). The last observation is equal to Nj x the cell mean (the sum of the observations in the cell) - the sum of the previous Nj-1 observations. This gives a mean value equal to the supplied cell mean. Furthermore, it can be shown that the variance of these observations is equal to the cell variance. Since SAS pools the cell variances to obtain MSE, PROC GLM has the necessary information to generate the tests of interest. In this example, the cell variances are simulated through the use of MSE, the error term for the test.
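The claim that the constructed observations reproduce the cell mean and variance can be sketched in a few lines. The notation is introduced here for illustration only: n stands for Nj, \bar{y} for the supplied cell mean, and s^2 for the value used as the cell variance (MSE in this example).

```latex
\begin{align*}
y_1 = \dots = y_{n-1} &= \bar{y} + \frac{s}{\sqrt{n}}, \\
y_n &= n\bar{y} - (n-1)\left(\bar{y} + \frac{s}{\sqrt{n}}\right)
     = \bar{y} - \frac{(n-1)\,s}{\sqrt{n}}, \\[4pt]
\frac{1}{n}\sum_{i=1}^{n} y_i
  &= \frac{1}{n}\left[(n-1)\left(\bar{y} + \frac{s}{\sqrt{n}}\right)
     + \bar{y} - \frac{(n-1)\,s}{\sqrt{n}}\right] = \bar{y}, \\[4pt]
\sum_{i=1}^{n} (y_i - \bar{y})^2
  &= (n-1)\,\frac{s^2}{n} + \frac{(n-1)^2 s^2}{n}
   = \frac{(n-1)\,s^2}{n}\,\bigl[1 + (n-1)\bigr] = (n-1)\,s^2, \\[4pt]
\frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2 &= s^2 .
\end{align*}
```

Each cell therefore contributes (n-1)s^2 to the error sum of squares on n-1 degrees of freedom, which is why the pooled error term matches the value that was supplied.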
data oneway ;
  /* Analysis for A2B3 Design */
  * input means and nj from previous analysis ;
  input nj ybarj ;
  * mse below is obtained from error term for test for interaction ;
  mse=11.49 ;
  * simulate cell values ;
  yis=ybarj + sqrt(mse/nj) ;
  yns=nj*ybarj-(nj-1)*yis ;
  group+1 ;
  y=yis ; freq=nj-1 ; output ;
  y=yns ; freq=1 ; output ;
  datalines ;
20 11.18
20 6.79
20 6.33
20 3.99
20 1.59
20 2.49
;
proc format ;
  value cellfmt 1='A1B1' 2='A1B2' 3='A1B3' 4='A2B1' 5='A2B2' 6='A2B3' ;
proc glm ;
  class group ;
  freq freq ;
  model y = group ;
  means group / tukey ;
  lsmeans group / tdiff ;
  estimate 'A1B1 vs A1B2' group 1 -1 ;
  estimate 'A1B1 vs A2B1' group 1 0 0 -1 ;
  estimate 'B1-B2 in A1 vs B1-B2 in A2' group 1 -1 0 -1 1 0 ;
  format group cellfmt. ;
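If the cell standard deviations are available, they can be used in place of MSE by reading them in the INPUT statement (input nj ybarj sdj ;) and then
substituting sdj**2 for mse in the data step above.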
This approach is documented in "One-way ANOVA from
Summary Statistics" from the University of Texas.
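As a hand check on the ESTIMATE statements in the PROC GLM step above, the contrasts can be worked directly from the summary statistics. This is a sketch using the standard formula for a contrast tested against a pooled error term. The symbols \hat{\psi} (contrast estimate) and c_j (contrast coefficients) are notation introduced here, and the results are rounded hand calculations rather than SAS output.

```latex
\[
t = \frac{\hat{\psi}}{\sqrt{MSE \sum_j c_j^2 / n_j}},
\qquad
df_{error} = \sum_j (n_j - 1) = 6 \times 19 = 114
\]

% A1B1 vs A1B2: coefficients (1, -1, 0, 0, 0, 0)
\[
\hat{\psi} = 11.18 - 6.79 = 4.39,
\qquad
SE = \sqrt{11.49 \cdot \tfrac{2}{20}} = \sqrt{1.149} \approx 1.072,
\qquad
t \approx 4.10
\]

% B1-B2 in A1 vs B1-B2 in A2: coefficients (1, -1, 0, -1, 1, 0)
\[
\hat{\psi} = (11.18 - 6.79) - (3.99 - 1.59) = 1.99,
\qquad
SE = \sqrt{11.49 \cdot \tfrac{4}{20}} = \sqrt{2.298} \approx 1.516,
\qquad
t \approx 1.31
\]
```

If the simulated data set has been built correctly, the estimates and t values reported by PROC GLM should agree with these figures up to rounding.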
In a mixed analysis containing both between- and within-groups effects, contrasts representing between group differences for specific levels of a repeated measure (e.g., A1 vs. A2 in B1 where A is between and B is within) should be tested against a pooled error term. The details of obtaining the necessary ms(w.cell) error term and its degrees of freedom are given in Posthoc Tests for Interactions using SPSS. The degrees of freedom should be apportioned among the cells such that the correct error df are obtained.