Topic 10 Mediation analysis
10.1 Overview
- What are mediation and moderation?
- Mediation analysis example
- Packages needed
- Baron and Kenny approach in R
- Mediation package approach in R
10.2 What is mediation?
Where the relationship between a predictor (X) and an outcome (Y) is mediated by another variable (M).
In the above model, we theorise that socio-economic status predicts education level, which predicts future prospects.
10.3 What is moderation?
There is a direct relationship between X and Y but it is affected by a moderator (M)
In the above model, we theorise that socio-economic status predicts future prospects but the strength of the relationship is changed by education level
10.4 Why different models?
This might be more appropriate if higher education costs money
This might be more appopriate if access to higher education is free
10.5 Mediation analysis
10.5.1 What is a mediation design?
Whether a mediation analysis is appropriate is determined as much by the design as by statistical criteria.
We must consider whether it makes sense to predict this relationship between variables
10.5.2 What is mediation analysis?
- Based on regression
A summary of the logic of mediation:
- The direct relationship between X and Y should be significant
- The relationship between X and M should be significant
- The relationship between M and Y (controlling for X) should be significant
- When controlling for M, the strength of the relationship between X and Y decreases and is not significant
- The direct relationship between X and Y should be significant
- The relationship between X and M should be significant
- The relationship between M and Y (controlling for X) should be significant
- When controlling for M, the strength of the relationship between X and Y decreases and is not significant
Baron & Kenny (1986) originally used a 4-step regression model to test each of these relationships.
10.6 Mediation analysis (the Baron and Kenny Approach)
10.6.1 Conducting mediation analysis (the Baron and Kenny Approach)
- Baron & Kenny (1986) originally used a 4-step regression model to test each of these relationships.
- The sobel test is then used to test the significance of mediation
10.6.2 Step 1: Total Effect
#1. Total Effect
<- lm(Y ~ X, data=Meddata)
fit summary(fit)
##
## Call:
## lm(formula = Y ~ X, data = Meddata)
##
## Residuals:
## Min 1Q Median 3Q
## -10.917 -3.738 -0.259 2.910
## Max
## 12.540
##
## Coefficients:
## Estimate Std. Error
## (Intercept) 19.88368 14.26371
## X 0.16899 0.08116
## t value Pr(>|t|)
## (Intercept) 1.394 0.1665
## X 2.082 0.0399 *
## ---
## Signif. codes:
## 0 '***' 0.001 '**' 0.01 '*' 0.05
## '.' 0.1 ' ' 1
##
## Residual standard error: 5.16 on 98 degrees of freedom
## Multiple R-squared: 0.04237, Adjusted R-squared: 0.0326
## F-statistic: 4.336 on 1 and 98 DF, p-value: 0.03993
10.6.3 Step 2: Path A (X on M)
#2. Path A (X on M)
<- lm(M ~ X, data=Meddata)
fita summary(fita)
##
## Call:
## lm(formula = M ~ X, data = Meddata)
##
## Residuals:
## Min 1Q Median 3Q
## -9.5367 -3.4175 -0.4375 2.9032
## Max
## 16.4520
##
## Coefficients:
## Estimate Std. Error
## (Intercept) 6.04494 13.41692
## X 0.66252 0.07634
## t value Pr(>|t|)
## (Intercept) 0.451 0.653
## X 8.678 8.87e-14 ***
## ---
## Signif. codes:
## 0 '***' 0.001 '**' 0.01 '*' 0.05
## '.' 0.1 ' ' 1
##
## Residual standard error: 4.854 on 98 degrees of freedom
## Multiple R-squared: 0.4346, Adjusted R-squared: 0.4288
## F-statistic: 75.31 on 1 and 98 DF, p-value: 8.872e-14
10.6.4 Step 3: Path B (M on Y, controlling for X)
#3. Path B (M on Y, controlling for X)
<- lm(Y ~ M + X, data=Meddata)
fitb summary(fitb)
##
## Call:
## lm(formula = Y ~ M + X, data = Meddata)
##
## Residuals:
## Min 1Q Median 3Q
## -9.3651 -3.3037 -0.6222 3.1068
## Max
## 10.3991
##
## Coefficients:
## Estimate Std. Error
## (Intercept) 17.32177 13.16216
## M 0.42381 0.09899
## X -0.11179 0.09949
## t value Pr(>|t|)
## (Intercept) 1.316 0.191
## M 4.281 4.37e-05 ***
## X -1.124 0.264
## ---
## Signif. codes:
## 0 '***' 0.001 '**' 0.01 '*' 0.05
## '.' 0.1 ' ' 1
##
## Residual standard error: 4.756 on 97 degrees of freedom
## Multiple R-squared: 0.1946, Adjusted R-squared: 0.1779
## F-statistic: 11.72 on 2 and 97 DF, p-value: 2.771e-05
10.6.5 Step 4: Reversed Path C (Y on X, controlling for M)
#4. Reversed Path C (Y on X, controlling for M)
<- lm(X ~ Y + M, data=Meddata)
fitc summary(fitc)
##
## Call:
## lm(formula = X ~ Y + M, data = Meddata)
##
## Residuals:
## Min 1Q Median 3Q
## -14.438 -2.573 -0.030 3.010
## Max
## 11.779
##
## Coefficients:
## Estimate Std. Error
## (Intercept) 96.11234 9.27663
## Y -0.11493 0.10229
## M 0.69619 0.08356
## t value Pr(>|t|)
## (Intercept) 10.361 < 2e-16 ***
## Y -1.124 0.264
## M 8.332 5.27e-13 ***
## ---
## Signif. codes:
## 0 '***' 0.001 '**' 0.01 '*' 0.05
## '.' 0.1 ' ' 1
##
## Residual standard error: 4.823 on 97 degrees of freedom
## Multiple R-squared: 0.4418, Adjusted R-squared: 0.4303
## F-statistic: 38.39 on 2 and 97 DF, p-value: 5.233e-13
10.6.6 Viewing output
Summary Table
stargazer(fit, fita, fitb, fitc, type = "text", title = "Baron and Kenny Method")
10.6.7 Interpreting Baron and Kenny approach
A reminder of the logic of mediation:
- The direct relationship between X and Y should be significant
- The relationship between X and M should be significant
- The relationship between M and Y (controlling for X) should be significant
- When controlling for M, the strength of the relationship between X and Y decreases and is not significant
10.6.8 Running the Sobel test
- The Sobel test checks the singificance of indirect effects
#Sobel Test
library(multilevel)
sobel(Meddata$X, Meddata$M, Meddata$Y)
## $`Mod1: Y~X`
## Estimate Std. Error
## (Intercept) 19.8836805 14.2637142
## pred 0.1689931 0.0811601
## t value Pr(>|t|)
## (Intercept) 1.394004 0.16646905
## pred 2.082220 0.03992761
##
## $`Mod2: Y~X+M`
## Estimate Std. Error
## (Intercept) 17.3217682 13.16215851
## pred -0.1117904 0.09949262
## med 0.4238113 0.09899469
## t value Pr(>|t|)
## (Intercept) 1.316028 1.912663e-01
## pred -1.123605 2.639537e-01
## med 4.281152 4.371472e-05
##
## $`Mod3: M~X`
## Estimate Std. Error
## (Intercept) 6.0449365 13.41692114
## pred 0.6625203 0.07634187
## t value Pr(>|t|)
## (Intercept) 0.4505457 6.533122e-01
## pred 8.6783345 8.871741e-14
##
## $Indirect.Effect
## [1] 0.2807836
##
## $SE
## [1] 0.07313234
##
## $z.value
## [1] 3.83939
##
## $N
## [1] 100
However, the above code only gives us to information we need to test the significance of the indirect effect, not the significance itself. Thereore, we can use the following, to get the actual significance of the indirect effect:
library(bda)
## Loading required package: boot
##
## Attaching package: 'boot'
## The following object is masked from 'package:psych':
##
## logit
## The following object is masked from 'package:car':
##
## logit
## bda v15 (Bin Wang, 2021)
mediation.test(Meddata$M, Meddata$X, Meddata$Y)
## Sobel Aroian
## z.value 3.8393902040 3.8190525305
## p.value 0.0001233403 0.0001339652
## Goodman
## z.value 3.8600562907
## p.value 0.0001133609
10.7 Mediation analysis (the Mediation package)
10.7.1 Preacher & Hayes (2004) mediation approach
- Mediation package in R uses the Preacher & Hayes (2004) bootstrapping approach
- They argue that few people test the significance of the indirect effect
“Baron and Kenny simply state that perfect mediation has occurred if c’ becomes nonsignificant after controlling for M, so researchers have focused on that requirement.” (Preacher & Hayes, 2004, p. 719)
- Sobel test has low power (requires larger sample sizes)
- Sobel test assumes normality (often violated)
10.7.2 What is bootstrapping?
“Bootstrapping is a nonparametric approach to effect-size estimation and hypothesis testing that makes no assumptions about the shape of the distributions of the variables or the sampling distribution of the statistic” (Preacer & Hayes, 2004, p. 722)
Bootstrapping takes a large number of samples from our data and runs the analysis on each of these samples
The sampling is done randomly with replacement, and each sample in the bootstrap is the same size as our dataset
Using this method, we can create estimates with that fall within a narrower confidence interval (since we have now run the analysis on 100’s of samples)
Bootstrapping overcomes concerns about the distribution of our original dataset
10.7.3 Mediation example
Is the relationship between No of hours awake and wakefulness mediated by caffiene consumption?
This example is from Demos & Salas (2019). A Language, not a Letter: Learning Statistics in R (Chapter 14)
10.7.4 Step 1: Run the models
#Mediate package
library(mediation)
<- lm(M ~ X, data=Meddata) #IV on M; Hours since waking predicting coffee consumption
fitM <- lm(Y ~ X + M, data=Meddata) #IV and M on DV; Hours since dawn and coffee predicting wakefulness fitY
10.7.5 Step 2: Check assumptions
gvlma(fitM)
##
## Call:
## lm(formula = M ~ X, data = Meddata)
##
## Coefficients:
## (Intercept) X
## 6.0449 0.6625
##
##
## ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
## USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
## Level of Significance = 0.05
##
## Call:
## gvlma(x = fitM)
##
## Value p-value
## Global Stat 8.833 0.06542
## Skewness 6.314 0.01198
## Kurtosis 1.219 0.26949
## Link Function 1.076 0.29959
## Heteroscedasticity 0.223 0.63674
## Decision
## Global Stat Assumptions acceptable.
## Skewness Assumptions NOT satisfied!
## Kurtosis Assumptions acceptable.
## Link Function Assumptions acceptable.
## Heteroscedasticity Assumptions acceptable.
# We can see that the data is positively skewed. We might need to transform the data (we will discuss this another time).
gvlma(fitY)
##
## Call:
## lm(formula = Y ~ X + M, data = Meddata)
##
## Coefficients:
## (Intercept) X M
## 17.3218 -0.1118 0.4238
##
##
## ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
## USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
## Level of Significance = 0.05
##
## Call:
## gvlma(x = fitY)
##
## Value p-value
## Global Stat 3.41844 0.4904
## Skewness 1.85648 0.1730
## Kurtosis 0.77788 0.3778
## Link Function 0.71512 0.3977
## Heteroscedasticity 0.06896 0.7929
## Decision
## Global Stat Assumptions acceptable.
## Skewness Assumptions acceptable.
## Kurtosis Assumptions acceptable.
## Link Function Assumptions acceptable.
## Heteroscedasticity Assumptions acceptable.
10.7.6 Step 3.1: Run the mediation analysis on the models
The mediate function gives us:
- Average Causal Mediation Effects (ACME)
- Average Direct Effects (ADE)
- combined indirect and direct effects (Total Effect)
- the ratio of these estimates (Prop. Mediated).
The ACME here is the indirect effect of M (total effect - direct effect) and thus this value tells us if our mediation effect is significant.
<- mediate(fitM, fitY, treat="X", mediator="M")
fitMed summary(fitMed)
##
## Causal Mediation Analysis
##
## Quasi-Bayesian Confidence Intervals
##
## Estimate 95% CI Lower
## ACME 0.28159 0.14991
## ADE -0.11100 -0.30382
## Total Effect 0.17059 0.00862
## Prop. Mediated 1.62837 0.55308
## 95% CI Upper p-value
## ACME 0.42 <2e-16
## ADE 0.09 0.260
## Total Effect 0.33 0.038
## Prop. Mediated 9.84 0.038
##
## ACME ***
## ADE
## Total Effect *
## Prop. Mediated *
## ---
## Signif. codes:
## 0 '***' 0.001 '**' 0.01 '*' 0.05
## '.' 0.1 ' ' 1
##
## Sample Size Used: 100
##
##
## Simulations: 1000
10.7.7 Step 3.2: Plot the mediation analysis of the models
The plot below reiterates what was on the previous slide:
- The confidence intervals of Total Effect and ACME are significant
- The confidence interval of ADE is not significant
Translation:
- Total effect is signficant: there is a relationship between X and Y (direct and indirect)
- ADE is not significant: the relationship between X and Y is not direct
- ACME is significant: the relationship between X and Y is mediated by M
plot(fitMed)
10.7.8 Step 4: Bootstrap the mediation model
The plot below changes our interpretation slightly:
- The confidence interval ACME is significant
- The confidence interval of Total Effect and ADE are not significant
Translation:
Total effect is not signficant: the relationship between X and Y is not significant when we combine direct and indirect effects
ADE is not significant: the relationship between X and Y is not direct
ACME is significant: the relationship between X and Y is mediated by M
<- mediate(fitM, fitY, boot=TRUE, sims=999, treat="X", mediator="M") fitMedBoot
## Running nonparametric bootstrap
summary(fitMedBoot)
##
## Causal Mediation Analysis
##
## Nonparametric Bootstrap Confidence Intervals with the Percentile Method
##
## Estimate 95% CI Lower
## ACME 0.2808 0.1409
## ADE -0.1118 -0.3080
## Total Effect 0.1690 -0.0123
## Prop. Mediated 1.6615 -3.7235
## 95% CI Upper p-value
## ACME 0.42 <2e-16
## ADE 0.12 0.31
## Total Effect 0.34 0.07
## Prop. Mediated 11.33 0.07
##
## ACME ***
## ADE
## Total Effect .
## Prop. Mediated .
## ---
## Signif. codes:
## 0 '***' 0.001 '**' 0.01 '*' 0.05
## '.' 0.1 ' ' 1
##
## Sample Size Used: 100
##
##
## Simulations: 999
plot(fitMedBoot) ##
10.8 References
Demos & Salas (2019). A Language, not a Letter: Learning Statistics in R (Chapter 14). https://ademos.people.uic.edu/ Accessed Jan 2020.
Preacher, K. J., & Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior research methods, instruments, & computers, 36(4), 717-731.