8  Multiple Regression and Hierarchical Regression

Tip

By the end of this session, you will be able to:

  • Compare multiple regression to simple regression
  • Describe the assumptions of multiple regression
  • Consider sample size in regression
  • Interpret the output of multiple regression
  • Conduct and interpret hierarchical multiple regression

8.1 What is multiple regression?

Multiple regression is an extension of simple regression that allows us to predict an outcome variable (Y) based on multiple predictors (X1, X2 …).

\[ Y = b_0 + b_1X_1 + b_2X_2 + \dots \]

(The constant can be referred to in the equation as c or b0.)
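For example, with hypothetical coefficients \(b_0 = 10\), \(b_1 = 2\) and \(b_2 = -1\), a client scoring \(X_1 = 3\) and \(X_2 = 5\) would have a predicted outcome of:

\[ Y = 10 + (2 \times 3) + (-1 \times 5) = 11 \]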

8.2 What are the assumptions of multiple regression?

They are largely the same as for simple regression, so look back at the assumptions covered in that section. One additional assumption applies because we now have multiple predictors: no multicollinearity. This means that the predictors should not be highly correlated with each other. If they were, it would be difficult to determine the unique contribution of each predictor to the model.

8.2.1 What is multicollinearity?

Multicollinearity means that the predictors are highly correlated with each other. This is a problem because it:

  • makes it difficult to determine the role of individual predictors
  • increases the error of the model (higher standard errors)
  • makes it difficult to identify significant predictors
  • produces wider confidence intervals

8.2.2 Testing multicollinearity

To test for multicollinearity, we can use the mctest() function in R. This function is part of the mctest package. It performs several diagnostics for multicollinearity, including the Variance Inflation Factor (VIF) and the Condition Number.

To run the tests, you pass the regression model to the mctest() function, which then returns the results of each diagnostic.

## use the mctest package
# install.packages('mctest')

library(mctest)

m1 <- lm(aggression_level ~ treatment_group + treatment_duration + trust_score, data = regression_data)  # <1>

mctest(m1)  # <2>

1. In this code, we are running a regression model with three predictors (treatment group, treatment duration, and trust score) and the outcome variable aggression level.
2. We then pass the regression model to the mctest() function to test for multicollinearity.

Call:
omcdiag(mod = mod, Inter = TRUE, detr = detr, red = red, conf = conf, 
    theil = theil, cn = cn)


Overall Multicollinearity Diagnostics

                       MC Results detection
Determinant |X'X|:         0.9229         0
Farrar Chi-Square:         7.7960         0
Red Indicator:             0.1547         0
Sum of Lambda Inverse:     3.1728         0
Theil's Method:           -0.8800         0
Condition Number:         13.6549         0

1 --> COLLINEARITY is detected by the test 
0 --> COLLINEARITY is not detected by the test

The mctest() function also takes an additional argument, type, which specifies whether you want the overall tests (as shown above), the diagnostics for each individual predictor (such as the VIF for each variable), or both. By default, only the overall tests are run.
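For example, the individual diagnostics can be requested explicitly. This is a minimal sketch; the type codes below follow the package's documentation, but check ?mctest to confirm them on your installed version:

## individual-predictor diagnostics (e.g. VIF for each variable)
mctest(m1, type = "i")

## both the overall and the individual diagnostics
mctest(m1, type = "b")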

8.3 Sample size for multiple regression

As the number of predictors increases, the sample size needed to run a multiple regression analysis also increases. A common rule of thumb is to have at least 10-15 participants per predictor. However, this is a loose rule, and the actual number of participants needed will depend on the complexity of the model and the effect size. Always run a power analysis to determine the sample size needed for your study.
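If you want a concrete starting point, the pwr package is one common option for regression power analysis. The values below (a "medium" effect size of f2 = 0.15, alpha = .05, power = .80) are purely illustrative, not recommendations:

# install.packages('pwr')
library(pwr)

## power analysis for a regression with 3 predictors (u = 3)
pwr.f2.test(u = 3, f2 = 0.15, sig.level = 0.05, power = 0.80)

## the output gives v (denominator degrees of freedom);
## the required sample size is roughly v + u + 1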

8.4 Approaches to multiple regression: All predictors at once

There are different ways we could run a multiple regression analysis depending on our research question and how we conceptualise the relationship between the predictors and the outcome. One way is to include all predictors at the same time. This is useful when we want to know the combined predictive power of all the predictors at the same time.

Research question: Do a client’s treatment duration and treatment group predict aggression level?

model1 <- lm(data = regression_data, aggression_level ~ treatment_duration + treatment_group)  # <1>

1. Here we are including all of the predictors at the same time. Note that we are using a plus sign + between the predictors - this means that no interactions will be tested.

8.4.1 Using categorical predictors in R

  • Treatment group is a categorical (also called “nominal” or “factor”) variable
  • No special “dummy coding” is required in R to use categorical predictors in regression
  • R will use the first group as the reference category and test whether being in another group shows a significant difference
  • R chooses the reference group based on numerical value or alphabetical order
  • If you want, you can change the reference category or “force” it using the relevel() function:
regression_data$treatment_group <- relevel(regression_data$treatment_group, ref = "therapy1")
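Note that relevel() only works on factors. If your grouping variable is stored as text, convert it first (a minimal sketch):

## convert to a factor if needed, then check which level is the reference (the first)
regression_data$treatment_group <- factor(regression_data$treatment_group)
levels(regression_data$treatment_group)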

More information on categorical predictors in section @ref(catreg)

8.4.2 Reviewing the output

The output from this approach will look the same as simple regression, except there will be an additional row for each predictor in the model in the coefficients table.

summary(model1)

Call:
lm(formula = aggression_level ~ treatment_duration + treatment_group, 
    data = regression_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.9468 -1.1104  0.0205  0.9621  3.4481 

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)    
(Intercept)             11.58713    0.77331  14.984  < 2e-16 ***
treatment_duration      -0.66024    0.07119  -9.274 4.96e-15 ***
treatment_grouptherapy2  0.85032    0.30449   2.793   0.0063 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.5 on 97 degrees of freedom
Multiple R-squared:  0.5206,    Adjusted R-squared:  0.5107 
F-statistic: 52.67 on 2 and 97 DF,  p-value: 3.267e-16

As a reminder of what the output tells us:

  • Multiple \(R^2\) = Total variance in outcome that is explained by the model
  • p-value = Statistical significance of the model
  • Coefficients = Contribution of each predictor to the model
    • Pr = Significance of the individual predictor
    • Estimate = Change in the outcome that occurs when the predictor increases by 1 unit of measurement (for continuous predictors), or the difference in the mean outcome between that group and the reference category (for categorical predictors)
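For example, from the output above: the estimate for treatment_duration (-0.66) means that each additional unit of treatment duration is associated with a 0.66-point decrease in aggression level, and the estimate for treatment_grouptherapy2 (0.85) means that clients in therapy2 score, on average, 0.85 points higher on aggression level than clients in the reference group (therapy1).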

8.4.3 All predictors at once (testing interactions)

It might be the case that we are interested in whether the predictors interact with each other to predict the outcome. For example, we might want to know if the effect of treatment duration on aggression level depends on the treatment group.

Research questions:

  • Do a client’s treatment duration and treatment group predict aggression level?
  • Do the predictors interact?
model2 <- lm(data = regression_data, aggression_level ~ treatment_duration * treatment_group)  # <1>

1. Here we are including all of the predictors at the same time. Note that we are using an asterisk * between the predictors. This means that the interaction will also be tested.

Reviewing the output

summary(model2) %>% coefficients
                                             Estimate Std. Error    t value
(Intercept)                                12.3529190  1.1006127 11.2236751
treatment_duration                         -0.7334435  0.1033086 -7.0995381
treatment_grouptherapy2                    -0.5615517  1.4753596 -0.3806202
treatment_duration:treatment_grouptherapy2  0.1394649  0.1425977  0.9780305
                                               Pr(>|t|)
(Intercept)                                3.599000e-19
treatment_duration                         2.166226e-10
treatment_grouptherapy2                    7.043260e-01
treatment_duration:treatment_grouptherapy2 3.305175e-01

In the output, we get additional information in the coefficients table about the interaction between the variables. We can see that the interaction is not significant. This being the case, we would interpret the main effects of the predictors rather than the interaction. (Had the interaction been significant, we would instead focus on interpreting the interaction, because the effect of one predictor would depend on the level of the other.)
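If we wanted a single formal test of whether adding the interaction improves the model, we could also compare the two models directly with anova() (the same model-comparison approach used for the hierarchical regression below):

## compare the main-effects model (model1) to the interaction model (model2)
anova(model1, model2)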

There is more information on interactions in regression in the section on moderation.

8.4.4 Hierarchical multiple regression: Theory driven “blocks” of variables

When we have multiple predictors, it might be the case that we have previous research or theory to guide how we run the analysis. For example, we might know that treatment duration and therapy group are likely to predict the outcome, based on the results of previous studies. However, we might also want to check whether a client’s level of trust in the clinician has any additional impact on our ability to predict the outcome (aggression level).

This being the case, we could run a hierarchical multiple regression analysis. This is where we run the regression in “blocks” (groups) of variables. We start with a baseline model (Model 0) and then add additional predictors in each subsequent model.

How we group our predictors will depend on our research question and the theory behind the relationship between the predictors and the outcome. As such, we should have a clear rationale for how we group our predictors.

To do this, we run three regression models:

  • Model 0: the constant (intercept)
  • Model 1: treatment duration and therapy group
  • Model 2: treatment duration and therapy group and trust score

We then compare the regression models to see if:

  • Model 1 is better than Model 0 (the intercept)
  • Model 2 is better than Model 1

The intercept (null) model is the simplest model we can run. It is a model that only includes the constant (intercept) and no predictors. This model is used as a baseline to compare the other models to. Therefore, we don’t really interpret the null model in isolation, rather, we examine whether another model (with predictors) is more useful in predicting the outcome than the null model.
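A quick way to see what the null model is doing: its only coefficient is the mean of the outcome variable (a small check you can run yourself, assuming regression_data is loaded):

## the intercept of the null model is just the mean of the outcome
coef(lm(aggression_level ~ 1, data = regression_data))
mean(regression_data$aggression_level)   # same value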

Hierarchical multiple regression: Running and comparing the models

## run regression using the same method as above
model0 <- lm(data = regression_data, aggression_level ~ 1)  # <1>
model1 <- lm(data = regression_data, aggression_level ~ treatment_duration + treatment_group)  # <2>
model2 <- lm(data = regression_data, aggression_level ~ treatment_duration + treatment_group + trust_score)  # <3>


## use the anova() command to compare the models
anova(model0, model1, model2)  # <4>

1. Here we are running the null model (Model 0) with only the intercept.
2. Here we are running Model 1 with treatment duration and treatment group.
3. Here we are running Model 2 with treatment duration, treatment group, and trust score.
4. We then use the anova() command to compare the models.
Analysis of Variance Table

Model 1: aggression_level ~ 1
Model 2: aggression_level ~ treatment_duration + treatment_group
Model 3: aggression_level ~ treatment_duration + treatment_group + trust_score
  Res.Df    RSS Df Sum of Sq       F    Pr(>F)    
1     99 455.27                                   
2     97 218.26  2   237.013 52.2195 4.507e-16 ***
3     96 217.86  1     0.399  0.1757     0.676    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

When we run the anova() command, we get an output that tells us whether the additional predictors in each model significantly improve the fit (reducing the residual variance of the model, and so increasing \(R^2\)). We can see from the output that Model 1 is significantly better than the null model (Model 0), but Model 2 is not significantly better than Model 1.
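If you want to see the change in \(R^2\) at each step directly, you can extract it from the model summaries (a quick sketch using the models fitted above):

## change in R-squared at each step
summary(model1)$r.squared - summary(model0)$r.squared   # Model 0 -> Model 1
summary(model2)$r.squared - summary(model1)$r.squared   # Model 1 -> Model 2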

8.4.5 Model performance metrics

When we run a hierarchical multiple regression analysis, we are interested in whether the additional predictors in each model significantly change the \(R^2\) value of the model. However, adding predictors will always increase the \(R^2\) value somewhat, even if the predictors are not useful. This is because \(R^2\) is based on the amount of variance in the outcome that is explained by the predictors in the model.

Because of this, we should also consider other metrics of model performance, such as the AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion). These metrics take into account the number of predictors in the model and penalise the model for having too many predictors. A lower AIC or BIC value indicates a better model. The AIC() and BIC() functions in R can be used to calculate these metrics.

AIC(model0, model1, model2)  # <1>

BIC(model0, model1, model2)  # <2>

1. Here we are using the AIC() function to calculate the AIC value for each model.
2. Here we are using the BIC() function to calculate the BIC value for each model.
       df      AIC
model0  2 439.3603
model1  4 369.8394
model2  5 371.6566
       df      BIC
model0  2 444.5707
model1  4 380.2601
model2  5 384.6825

We can see from the output that Model 1 has the lowest AIC and BIC values, indicating that it is the best model of the three. We phrase this as having the best fit to the data, given the number of predictors in the model. Although, in this example, Model 2 is not significantly better than Model 1, with a different research question or dataset the anova() test might show that a model with additional predictors is significantly better. You could then use the AIC and BIC values to determine which model is actually the best fit to the data.

Remember that the accuracy of your models will depend on the quality of your data and the assumptions of regression being met. How you interpret the results will depend on your research question and the theory behind the relationship between the predictors and the outcome. The analysis cannot prove causation, only association. It is up to you to design your study and analysis to best answer your research question.