Conduct and interpret hierarchical multiple regression
8.1 What is multiple regression?
Multiple regression is an extension of simple regression that allows us to predict an outcome variable (Y) based on multiple predictors (X1, X2 …).
\[ Y = b_1X_1 + b_2X_2 + b_0 \]
(The constant can be referred to in the equation as c or \(b_0\).)
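To see how the equation works, take some purely illustrative values: with \(b_0 = 10\), \(b_1 = 2\), and \(b_2 = -1\), a client scoring 3 on \(X_1\) and 4 on \(X_2\) would have a predicted outcome of \(2(3) - 1(4) + 10 = 12\).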
8.2 What are the assumptions of multiple regression?
They are largely the same as those of simple regression, so look back at the assumptions covered in that section. Because there are multiple predictors, one additional assumption applies: no multicollinearity. This means that predictors should not be highly correlated with each other; if they were, it would be difficult to determine the unique contribution of each predictor to the model.
8.2.1 What is multicollinearity?
Multicollinearity means that predictors are highly correlated with each other. This is a problem because it:
makes it difficult to determine the role of individual predictors
increases the error of the model (higher standard errors)
makes it difficult to identify significant predictors
widens the confidence intervals
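Before running formal diagnostics, a quick first check is to inspect the pairwise correlations between your continuous predictors. A minimal sketch, assuming the predictor names used later in this chapter (treatment_group is categorical, so it is left out here):
# correlation between the two continuous predictors
cor(regression_data[, c("treatment_duration", "trust_score")])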
8.2.2 Testing multicollinearity
To test for multicollinearity, we can use the mctest() function in R. This function is part of the mctest package. It performs several tests for multicollinearity, including the Variance Inflation Factor (VIF) and the Condition Index (CI).
To run the test, you pass the regression model to the mctest() function. The function will then return the results of the tests.
## use the mctest package
# install.packages('mctest')
library(mctest)
m1 <- lm(aggression_level ~ treatment_group + treatment_duration + trust_score, data = regression_data)
mctest(m1)
1. In this code, we are running a regression model with three predictors (treatment group, treatment duration, and trust score) and the outcome variable aggression level.
2. We then pass the regression model to the mctest() function to test for multicollinearity.
Call:
omcdiag(mod = mod, Inter = TRUE, detr = detr, red = red, conf = conf,
theil = theil, cn = cn)
Overall Multicollinearity Diagnostics
MC Results detection
Determinant |X'X|: 0.9229 0
Farrar Chi-Square: 7.7960 0
Red Indicator: 0.1547 0
Sum of Lambda Inverse: 3.1728 0
Theil's Method: -0.8800 0
Condition Number: 13.6549 0
1 --> COLLINEARITY is detected by the test
0 --> COLLINEARITY is not detected by the test
The mctest() function also takes an additional argument, type, which specifies whether you want the overall tests (type = "o", the default), the diagnostics for each individual predictor (type = "i"), or both (type = "b").
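For example, a sketch of requesting the individual-predictor diagnostics (which include each predictor's VIF) for the model m1 fitted above:
# individual multicollinearity diagnostics for each predictor in m1
mctest(m1, type = "i")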
8.3 Sample size for multiple regression
As the number of predictors increases, so does the sample size needed to run a multiple regression analysis. A common rule of thumb is to have at least 10-15 participants per predictor. However, this is a loose guideline; the number of participants actually needed will depend on the complexity of the model and the expected effect size. Always run a power analysis to determine the sample size needed for your study.
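A power analysis can be run in R. A minimal sketch using the pwr package (not used elsewhere in this chapter), assuming three predictors and a medium expected effect size of \(f^2 = 0.15\):
# install.packages('pwr')
library(pwr)
# u = number of predictors, f2 = expected effect size, power = desired power
pwr.f2.test(u = 3, f2 = 0.15, sig.level = 0.05, power = 0.80)
# the returned v is the denominator degrees of freedom;
# the required sample size is approximately n = u + v + 1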
8.4 Approaches to multiple regression: All predictors at once
There are different ways we could run a multiple regression analysis, depending on our research question and how we conceptualise the relationship between the predictors and the outcome. One way is to include all predictors at once. This is useful when we want to know the combined predictive power of all the predictors.
Research question: Do a client’s treatment duration and treatment group predict aggression level?
Here we are including all of the predictors at the same time. Note that we are using a plus sign (+) between the predictors; this means that no interactions will be tested.
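A sketch of this model, consistent with the summary output shown below:
# predict aggression level from both predictors at once (no interactions)
model1 <- lm(aggression_level ~ treatment_duration + treatment_group, data = regression_data)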
8.4.1 Using categorical predictors in R
Treatment group is a categorical (also called “nominal” or “factor”) variable
No special “dummy coding” is required in R to use categorical predictors in regression
R will use the first group as the reference category and test whether being in another group shows a significant difference
R chooses the reference group based on numerical value or alphabetical order
If you want, you can change the reference category or "force" it using the relevel() function, as shown after this list
More information on categorical predictors is in section @ref(catreg)
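A minimal sketch of relevel(), assuming treatment_group is (or is converted to) a factor with a level named "therapy2", as in the output below:
# make sure the variable is a factor, then set "therapy2" as the reference category
regression_data$treatment_group <- as.factor(regression_data$treatment_group)
regression_data$treatment_group <- relevel(regression_data$treatment_group, ref = "therapy2")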
8.4.2 Reviewing the output
The output from this approach will look the same as for simple regression, except that the coefficients table will contain an additional row for each predictor in the model.
summary(model1)
Call:
lm(formula = aggression_level ~ treatment_duration + treatment_group,
data = regression_data)
Residuals:
Min 1Q Median 3Q Max
-2.9468 -1.1104 0.0205 0.9621 3.4481
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.58713 0.77331 14.984 < 2e-16 ***
treatment_duration -0.66024 0.07119 -9.274 4.96e-15 ***
treatment_grouptherapy2 0.85032 0.30449 2.793 0.0063 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.5 on 97 degrees of freedom
Multiple R-squared: 0.5206, Adjusted R-squared: 0.5107
F-statistic: 52.67 on 2 and 97 DF, p-value: 3.267e-16
As a reminder of what the output tells us:
Multiple \(R^2\) = Total variance in outcome that is explained by the model
p-value = Statistical significance of the model
Coefficients = Contribution of each predictor to the model
Pr = Significance of the individual predictor
Estimate = Change in the outcome that occurs when the predictor increases by 1 unit of measurement (for continuous predictors), or the difference in the mean outcome between that group and the reference category (for categorical predictors)
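For example, in the output above, the estimate of -0.66 for treatment_duration means that aggression level decreases by 0.66 points for each additional unit of treatment duration, and the estimate of 0.85 for treatment_grouptherapy2 means that clients in therapy2 score 0.85 points higher, on average, than clients in the reference group.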
8.4.3 All predictors at once (testing interactions)
It might be the case that we are interested in whether the predictors interact with each other to predict the outcome. For example, we might want to know if the effect of treatment duration on aggression level depends on the treatment group.
Research questions:
Do a client's treatment duration and treatment group predict aggression level?
Does the effect of treatment duration on aggression level depend on the treatment group?
Here we are including all of the predictors at the same time. Note that we are using an asterisk (*) between the predictors; this means that interactions will be tested.
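A sketch of the interaction model (the name model_interaction is illustrative):
# the * operator fits both main effects plus their interaction
model_interaction <- lm(aggression_level ~ treatment_duration * treatment_group, data = regression_data)
summary(model_interaction)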
In the output, we get additional information in the coefficients table about the interaction between the variables. We can see from the output that the interaction is not significant. This being the case, we would interpret the main effects of the predictors in the model. Had the interaction been significant, we would instead interpret the results based on the interaction between the predictors rather than their main effects.
There is more information on interactions in regression in the section on moderation.
8.4.4 Hierarchical multiple regression: Theory driven “blocks” of variables
When we have multiple predictors, it might be the case that we have previous research or theory to guide how we run the analysis. For example, we might know that treatment duration and therapy group are likely to predict the outcome, based on the results of previous studies. However, we might also want to check whether a client's level of trust in the clinician has any additional impact on our ability to predict the outcome (aggression level).
This being the case, we could run a hierarchical multiple regression analysis. This is where we run the regression in “blocks” (groups) of variables. We start with a baseline model (Model 0) and then add additional predictors in each subsequent model.
How we group our predictors will depend on our research question and the theory behind the relationship between the predictors and the outcome. As such, we should have a clear rationale for how we group our predictors.
To do this, we run three regression models:
Model 0: the constant (intercept)
Model 1: treatment duration and therapy group
Model 2: treatment duration and therapy group and trust score
We then compare the regression models to see if:
Model 1 is better than Model 0 (the intercept)
Model 2 is better than Model 1
The intercept (null) model is the simplest model we can run. It is a model that only includes the constant (intercept) and no predictors. This model is used as a baseline to compare the other models to. Therefore, we don’t really interpret the null model in isolation, rather, we examine whether another model (with predictors) is more useful in predicting the outcome than the null model.
Hierarchical multiple regression: Running and comparing the models
## run regression using the same method as above
model0 <- lm(data = regression_data, aggression_level ~ 1)
model1 <- lm(data = regression_data, aggression_level ~ treatment_duration + treatment_group)
model2 <- lm(data = regression_data, aggression_level ~ treatment_duration + treatment_group + trust_score)
## use the anova() command to compare the models
anova(model0, model1, model2)
1. Here we are running the null model (Model 0) with only the intercept.
2. Here we are running Model 1 with treatment duration and treatment group.
3. Here we are running Model 2 with treatment duration, treatment group, and trust score.
4. We then use the anova() command to compare the models.
Analysis of Variance Table
Model 1: aggression_level ~ 1
Model 2: aggression_level ~ treatment_duration + treatment_group
Model 3: aggression_level ~ treatment_duration + treatment_group + trust_score
Res.Df RSS Df Sum of Sq F Pr(>F)
1 99 455.27
2 97 218.26 2 237.013 52.2195 4.507e-16 ***
3 96 217.86 1 0.399 0.1757 0.676
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
When we run the anova() command, we get an output that tells us whether the additional predictors in each model significantly change the \(R^2\) value of the model (reducing the residual variance). We can see from the output that Model 1 is significantly better than the null model (Model 0), and Model 2 is not significantly better than Model 1.
8.4.5 Model performance metrics
When we run a hierarchical multiple regression analysis, we are interested in whether the additional predictors in each model significantly change the \(R^2\) value of the model. However, adding predictors to the model will always increase the \(R^2\) value somewhat, even if the predictors are not useful. This is because the \(R^2\) value is based on the amount of variance in the outcome that is explained by the predictors in the model.
Because of this, we should also consider other metrics of model performance, such as the AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion). These metrics take into account the number of predictors in the model and penalise the model for having too many predictors. A lower AIC or BIC value indicates a better model. The AIC() and BIC() functions in R can be used to calculate these metrics.
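For example, to compare the three models fitted above (both functions accept multiple models at once):
# lower AIC/BIC = better fit, penalised for the number of predictors
AIC(model0, model1, model2)
BIC(model0, model1, model2)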
We can see from the output that Model 1 has the lowest AIC and BIC values, indicating that it is the best model of the three. We phrase this as having the best fit to the data, given the number of predictors in the model. Although, in this example, Model 2 is not significantly better than Model 1, you may have a different research question or dataset where the anova() test shows that the model with the additional predictors is significantly better. You could then use the AIC and BIC values to determine which model is actually the best fit to the data.
Remember that the accuracy of your models will depend on the quality of your data and the assumptions of regression being met. How you interpret the results will depend on your research question and the theory behind the relationship between the predictors and the outcome. The analysis cannot prove causation, only association. It is up to you to design your study and analysis to best answer your research question.