Path Analysis
Developed by Sewall Wright, path analysis is a method employed to determine whether or not a multivariate set of nonexperimental data fits well with a particular (a priori) causal model. Elazar J. Pedhazur (Multiple Regression in Behavioral Research, 2nd edition, Holt, Rinehart and Winston, 1982) has a nice introductory chapter on path analysis which is recommended reading for anyone who intends to use path analysis. This lecture draws heavily upon the material in Pedhazur’s book.
Each oval represents a variable. We have data on each variable for each subject. In this diagram SES and IQ are considered to be exogenous variables: their variance is assumed to be caused entirely by variables not in the causal model. The connecting line with arrows at both ends indicates that the correlation between these two variables will remain unanalyzed because we choose not to identify one variable as a cause of the other variable. Any correlation between these variables (1 = SES, 2 = IQ) may actually be causal (1 causing 2 and/or 2 causing 1) and/or may be due to 1 and 2 sharing common causes.
For example, having a certain set of genes may cause one to have the physical appearance that is necessary to obtain high SES in a particular culture and may independently also cause one to have a high IQ, creating a spurious correlation between 1 and 2 that is totally due to their sharing a common cause, with no causal relationship between 1 and 2. Alternatively, some genes may cause only the physical appearance necessary to obtain high SES, and high SES may cause high IQ (more money allows you to eat well, be healthy, afford good schools, etc., which raises your IQ). Alternatively, the genes may cause only elevated IQ, and high IQ may cause one to advance socially to high SES. In this model we have chosen not to decide among these alternatives.
GPA and nAch are endogenous variables in this model — their variance is considered to be explained in part by other variables in the model. Paths drawn to endogenous variables are directional (arrowhead on one end only). Variance in GPA is theorized to result from variance in SES, IQ, nAch, and extraneous (not in the model) sources. The influence of these extraneous variables is indicated by the arrow from EY. Variance in nAch is theorized to be caused by variance in SES, IQ, and extraneous sources.
Please note that the path to an endogenous variable must be unidirectional in path analysis. Were we to decide that not only does high SES cause high nAch but that also high nAch causes high SES, we could not use path analysis.
For each path to an endogenous variable we shall compute a path coefficient, $p_{ij}$, where “i” indicates the effect and “j” the cause. If we square a path coefficient we get the proportion of the affected variable’s variance that is caused by the causal variable. The coefficient may be positive (increasing the causal variable increases the dependent variable if all other causal variables are held constant) or negative (increasing the causal variable decreases the dependent variable, again with all other causal variables held constant).
A path analysis can be conducted as a hierarchical (sequential) multiple regression analysis. For each endogenous variable we shall conduct a multiple regression analysis predicting that variable (Y) from all other variables which are hypothesized to have direct effects on Y. We do not include in this multiple regression any variables which are hypothesized to affect Y only indirectly (through one or more intervening variables). The beta weights from these multiple regressions are the path coefficients shown in the typical figures that are used to display the results of a path analysis.
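As a sketch of what these regressions look like for the model in Figure 1, number the variables 1 = SES, 2 = IQ, 3 = nAch, and 4 = GPA (a numbering I am assuming from the subscripts on the coefficients reported later in this lecture) and express each variable as a z score. The symbols $e_3$ and $e_4$ are my own labels for the extraneous influences on nAch and GPA. Under those assumptions, the two multiple regressions are:

```latex
\begin{aligned}
z_4 &= p_{41} z_1 + p_{42} z_2 + p_{43} z_3 + e_4 \\
z_3 &= p_{31} z_1 + p_{32} z_2 + e_3
\end{aligned}
```

Each endogenous variable gets one equation, and only its hypothesized direct causes appear on the right side of that equation.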
Consider these data from Pedhazur:
For our analysis, let us make one change in Figure 1: make IQ an endogenous variable, with SES a cause of variance in IQ (draw a unidirectional arrow from SES to IQ). Our revised model is illustrated in Figure 1A, to which I have added the path coefficients computed below.
Obtain and run Path-1.sas from my SAS Programs page. Here is the code that produced the coefficients for the model in the figure above:
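PROC REG;
  * Regress GPA on its three hypothesized direct causes;
  Figure_1_GPA: MODEL GPA = SES IQ NACH;
  * Regress nAch on its two hypothesized direct causes;
  Figure_1_nACH: MODEL NACH = SES IQ;
RUN;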
Our diagram indicates that GPA is directly affected by SES, IQ, and nAch. We regress GPA on these three causal variables and obtain $R^2_{4.123} = .49647$, $\beta_{41.23} = p_{41} = .009$, $\beta_{42.13} = p_{42} = .501$, and $\beta_{43.12} = p_{43} = .416$.
R-Square 0.4965
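If you would like to check the SAS results by hand, here is a minimal Python sketch that solves the normal equations directly from a correlation matrix. It is not the Path-1.sas program. The correlations entered in `R` are an assumption on my part: they are the values as I recall them from Pedhazur's example, so verify them against the matrix in Path-1.sas before relying on the output. The helper names `solve` and `standardized_regression` are hypothetical, chosen only for this illustration.

```python
# Correlations among the four variables, in the order listed in NAMES.
# ASSUMPTION: recalled from Pedhazur's example; check against Path-1.sas.
NAMES = ["SES", "IQ", "NACH", "GPA"]
R = [
    [1.00, 0.30, 0.41, 0.33],
    [0.30, 1.00, 0.16, 0.57],
    [0.41, 0.16, 1.00, 0.50],
    [0.33, 0.57, 0.50, 1.00],
]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def standardized_regression(effect, causes):
    """Return ({cause: beta weight}, R-squared) for one endogenous variable."""
    e = NAMES.index(effect)
    idx = [NAMES.index(c) for c in causes]
    rxx = [[R[i][j] for j in idx] for i in idx]  # correlations among the causes
    rxy = [R[i][e] for i in idx]                 # correlations of causes with the effect
    betas = solve(rxx, rxy)
    r_squared = sum(b * r for b, r in zip(betas, rxy))
    return dict(zip(causes, betas)), r_squared


# One regression per MODEL statement in the PROC REG step.
gpa_paths, gpa_r2 = standardized_regression("GPA", ["SES", "IQ", "NACH"])
nach_paths, nach_r2 = standardized_regression("NACH", ["SES", "IQ"])

for label, paths, r2 in [("GPA", gpa_paths, gpa_r2), ("NACH", nach_paths, nach_r2)]:
    print(label, {k: round(v, 3) for k, v in paths.items()}, "R-Square", round(r2, 4))
```

Because the input is a correlation matrix, the solved weights are beta weights, which is why they can be read directly as path coefficients. If the correlations are entered correctly, the GPA line should agree with the coefficients and R-Square reported above.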
The path coefficient from extraneous variables is $\sqrt{1 - R^2_{4.123}} = \sqrt{1 - .49647} = .710$.
Note that the program contains the correlation matrix from Pedhazur. I decided to use an N of 50, but did not enter means and standard deviations for the variables, so the parameter estimates that SAS produces are standardized (the slope is a beta).