Multiple Linear Regression
Example: Predicting Blood Pressure
In a study to predict systolic blood pressure (SBP) using age, body
mass index (BMI), and smoking status (smoker or non-smoker) as
predictors, we can use multiple linear regression. The model can be
represented as:
\[
\text{SBP} = \beta_0 + \beta_1 \text{Age} + \beta_2 \text{BMI} + \beta_3
\text{SmokingStatus} + \epsilon
\]
Here, SBP is the dependent variable, Age and BMI are numerical
independent variables, SmokingStatus is a categorical independent
variable coded as a 0/1 indicator (1 = smoker, 0 = non-smoker), and
\(\epsilon\) is the random error term.
Steps for Interpretation:
Fit the Model: Use statistical software to fit
the model and obtain the regression coefficients, which are usually
estimated by ordinary least squares (OLS).
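As a rough illustration of what the software does in this step, the sketch below estimates the coefficients by ordinary least squares in plain Python, solving the normal equations \(X^\top X \beta = X^\top y\) by Gaussian elimination. The data are hypothetical and noise-free, generated from the coefficients used in this example (90, 0.5, 1.2, 15), so the fit should recover them almost exactly. The helper names `solve` and `fit_ols` are our own, not part of any library.

```python
# Sketch: ordinary least squares for SBP = b0 + b1*Age + b2*BMI + b3*SmokingStatus,
# solving the normal equations (X^T X) b = X^T y. The data below are hypothetical.

def solve(a, b):
    """Solve the square linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, y):
    """Return [intercept, slope_1, ..., slope_k] minimizing the sum of squared residuals."""
    X = [[1.0] + list(r) for r in rows]  # prepend a column of ones for the intercept
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical, noise-free data generated from 90 + 0.5*Age + 1.2*BMI + 15*SmokingStatus
rows = [(age, bmi, smoker)
        for age in (30, 45, 60, 75)
        for bmi in (20, 25, 30)
        for smoker in (0, 1)]
sbp = [90 + 0.5 * age + 1.2 * bmi + 15 * smoker for age, bmi, smoker in rows]

coefficients = fit_ols(rows, sbp)
print([round(c, 4) for c in coefficients])
```

With real measurements the residuals are not zero, so the estimates will only approximate whatever relationship generated the data; in practice a statistics package also reports standard errors alongside these estimates.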
Check the Coefficients: Suppose the software returns the fitted
equation
\[
\widehat{\text{SBP}} = 90 + 0.5 \times \text{Age} + 1.2 \times \text{BMI}
+ 15 \times \text{SmokingStatus}
\]
- Intercept (\(\beta_0\)): The intercept of 90 mmHg
is the expected SBP when Age, BMI, and SmokingStatus are all zero (a
non-smoker). Because no adult has an age or BMI of zero, this value
anchors the equation rather than describing a realistic patient.
- Age (\(\beta_1\)):
A coefficient of 0.5 means that for each additional year of age, the
expected SBP increases by 0.5 mmHg, holding other variables constant.
- BMI (\(\beta_2\)):
A coefficient of 1.2 means that for each one-unit increase in BMI
(1 kg/m²), the expected SBP increases by 1.2 mmHg, holding other
variables constant.
- SmokingStatus (\(\beta_3\)): The coefficient of 15
indicates that smokers have, on average, an expected SBP 15 mmHg higher
than non-smokers, holding other variables constant.
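These interpretations can be checked by plugging values into the fitted equation. In this small Python sketch the function name `predict_sbp` and the example patient (age 50, BMI 25) are hypothetical. Changing one predictor while holding the others fixed moves the prediction by exactly that predictor's coefficient.

```python
# Fitted equation from the example:
# SBP_hat = 90 + 0.5*Age + 1.2*BMI + 15*SmokingStatus (1 = smoker, 0 = non-smoker)
COEFS = {"intercept": 90.0, "age": 0.5, "bmi": 1.2, "smoker": 15.0}


def predict_sbp(age, bmi, smoker):
    """Predicted systolic blood pressure (mmHg) from the fitted linear model."""
    return (COEFS["intercept"] + COEFS["age"] * age
            + COEFS["bmi"] * bmi + COEFS["smoker"] * smoker)


base = predict_sbp(age=50, bmi=25, smoker=0)       # hypothetical non-smoker
one_year_older = predict_sbp(age=51, bmi=25, smoker=0)
as_smoker = predict_sbp(age=50, bmi=25, smoker=1)  # same age and BMI, but a smoker

print(base)                    # 145.0
print(one_year_older - base)   # 0.5, the Age coefficient
print(as_smoker - base)        # 15.0, the SmokingStatus coefficient
```

Because the model is additive, these differences do not depend on the starting values chosen for the other predictors.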
Assess Model Fit:
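- R-squared: Indicates the proportion of the variance
in the dependent variable that is explained by the independent
variables (from 0 to 1; higher values mean a closer fit).
- Residual Plots: Plot residuals against fitted values
to check linearity and homoscedasticity (constant variance), and use a
normal Q-Q plot to check normality of the residuals.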
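A minimal sketch of the raw ingredients of both checks, in plain Python: residuals (observed minus predicted SBP, the quantity a residual plot displays) and R-squared computed as \(1 - SS_{\text{res}}/SS_{\text{tot}}\). The five patients and their observed SBP values are invented purely to show the arithmetic; in practice R-squared is computed on the same data the model was fitted to.

```python
# Fitted equation from the example: SBP_hat = 90 + 0.5*Age + 1.2*BMI + 15*SmokingStatus
def predict_sbp(age, bmi, smoker):
    return 90 + 0.5 * age + 1.2 * bmi + 15 * smoker


# Hypothetical patients: (age, BMI, smoker flag, observed SBP in mmHg)
patients = [
    (40, 22, 0, 138),
    (55, 28, 1, 170),
    (62, 31, 0, 157),
    (35, 24, 1, 150),
    (70, 26, 0, 158),
]

observed = [sbp for *_, sbp in patients]
predicted = [predict_sbp(age, bmi, smoker) for age, bmi, smoker, _ in patients]
residuals = [o - p for o, p in zip(observed, predicted)]  # what a residual plot shows


def r_squared(obs, pred):
    """1 - SS_res / SS_tot: share of outcome variance accounted for by the predictions."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - ss_res / ss_tot


print([round(r, 2) for r in residuals])
print(round(r_squared(observed, predicted), 3))
```

Plotting `residuals` against `predicted` gives the residual-versus-fitted plot; a funnel shape would suggest non-constant variance.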
Check Significance:
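- p-values: Determine whether each coefficient is
significantly different from zero (null hypothesis \(\beta_j = 0\)); a
small p-value, conventionally below 0.05, is evidence that the
predictor is associated with SBP after adjusting for the others.
- Confidence Intervals: Provide a range of plausible
values for each coefficient (commonly at the 95% level); a 95% interval
that excludes zero corresponds to a p-value below 0.05.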
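For intuition about where these numbers come from, here is a sketch of a Wald-type 95% confidence interval and two-sided p-value for the Age coefficient, using Python's `statistics.NormalDist`. The standard error of 0.1 is a hypothetical value (software reports the real one). The sketch uses the normal approximation; exact OLS inference uses a t distribution with \(n\) minus the number of estimated coefficients as degrees of freedom, which gives slightly wider intervals in small samples.

```python
from statistics import NormalDist


def wald_inference(estimate, std_error, level=0.95):
    """Normal-approximation z statistic, two-sided p-value, and confidence interval."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_crit = std_normal.inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    z = estimate / std_error
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    ci = (estimate - z_crit * std_error, estimate + z_crit * std_error)
    return z, p_value, ci


# Age coefficient from the example; the standard error 0.1 is hypothetical
z, p_value, (ci_low, ci_high) = wald_inference(0.5, 0.1)
print(f"z = {z:.2f}, p = {p_value:.2g}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

The interval excludes zero and the p-value is far below 0.05, which are two views of the same conclusion.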
Logistic Regression
Example: Predicting the Presence of a Disease
Suppose we are predicting the presence of a certain disease (Yes/No)
using age, BMI, and family history (Yes/No) as predictors. The logistic
regression model can be written as:
\[
\text{logit}(P(Y = 1)) = \log\left(\frac{P(Y=1)}{1-P(Y=1)}\right) =
\beta_0 + \beta_1 \text{Age} + \beta_2 \text{BMI} + \beta_3
\text{FamilyHistory}
\]
Here, \(Y\) is the binary dependent variable (1 = disease present,
0 = absent), Age and BMI are numerical independent variables, and
FamilyHistory is a categorical independent variable coded 1 = yes,
0 = no.
Steps for Interpretation:
Fit the Model: Use statistical software to fit
the model and obtain the regression coefficients, which are estimated
by maximum likelihood.
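A minimal sketch of maximum-likelihood fitting with Newton-Raphson (the same iterations as iteratively reweighted least squares), in plain Python. The training data are simulated from the coefficients used in this example, assuming age uniform on 20 to 80, BMI uniform on 18 to 40, and a 30% family-history rate; these settings and all helper names are illustrative assumptions. At the maximum-likelihood estimate the score (gradient of the log-likelihood) is zero, which is a convenient convergence check.

```python
import math
import random


def solve(a, b):
    """Solve the square linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def inverse_logit(z):
    return 1.0 / (1.0 + math.exp(-z))


def design_matrix(rows):
    """Prepend an intercept column of ones to each row of predictors."""
    return [[1.0] + list(r) for r in rows]


def score_and_information(beta, X, y):
    """Gradient of the log-likelihood, X^T (y - p), and Fisher information, X^T W X."""
    probs = [inverse_logit(sum(b * x for b, x in zip(beta, row))) for row in X]
    k = len(beta)
    score = [sum((yi - pi) * row[i] for row, yi, pi in zip(X, y, probs)) for i in range(k)]
    info = [[sum(pi * (1 - pi) * row[i] * row[j] for row, pi in zip(X, probs))
             for j in range(k)] for i in range(k)]
    return score, info


def fit_logistic(rows, y, max_iter=50, tol=1e-8):
    """Maximum-likelihood logistic regression by Newton-Raphson, starting from zeros."""
    X = design_matrix(rows)
    beta = [0.0] * len(X[0])
    for _ in range(max_iter):
        score, info = score_and_information(beta, X, y)
        step = solve(info, score)  # Newton step: information^{-1} * score
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Simulated data (assumption): outcomes drawn from the example's coefficients
rng = random.Random(0)
rows, y = [], []
for _ in range(500):
    age, bmi, history = rng.uniform(20, 80), rng.uniform(18, 40), int(rng.random() < 0.3)
    prob = inverse_logit(-2 + 0.03 * age + 0.1 * bmi + 0.8 * history)
    rows.append((age, bmi, history))
    y.append(int(rng.random() < prob))

beta_hat = fit_logistic(rows, y)
print([round(b, 3) for b in beta_hat])
```

Because the outcomes are random draws, the estimates scatter around the generating values rather than matching them; Newton's method can also fail to converge if the outcomes are perfectly separated by the predictors.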
Check the Coefficients: \[
\text{logit}(P(Y = 1)) = -2 + 0.03 \times \text{Age} + 0.1 \times
\text{BMI} + 0.8 \times \text{FamilyHistory}
\]
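- Intercept (\(\beta_0\)): The intercept of -2 is
the log-odds of having the disease when Age, BMI, and FamilyHistory are
all zero, which corresponds to a probability of
\(1/(1 + e^{2}) \approx 0.12\). As in the linear model, this reference
point lies outside the range of realistic patients.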
- Age (\(\beta_1\)):
A coefficient of 0.03 means that each additional year of age increases
the log-odds of having the disease by 0.03, holding other variables
constant.
- BMI (\(\beta_2\)):
A coefficient of 0.1 means that each unit increase in BMI increases the
log-odds of having the disease by 0.1, holding other variables
constant.
- FamilyHistory (\(\beta_3\)): The coefficient of 0.8
indicates that individuals with a family history of the disease have
log-odds of having the disease 0.8 higher than those without, holding
other variables constant.
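To connect log-odds to probabilities, the sketch below evaluates the fitted equation for a hypothetical 50-year-old with a BMI of 25, with and without a family history, and applies the inverse logit \(P(Y=1) = 1/(1 + e^{-\text{logit}})\). The function names `log_odds` and `inverse_logit` are our own.

```python
import math

# Coefficients of the fitted logistic model in the example (log-odds scale)
B0, B_AGE, B_BMI, B_FH = -2.0, 0.03, 0.1, 0.8


def log_odds(age, bmi, family_history):
    """Linear predictor: logit(P(Y = 1)) for one person."""
    return B0 + B_AGE * age + B_BMI * bmi + B_FH * family_history


def inverse_logit(z):
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


baseline = inverse_logit(B0)          # all predictors zero
no_history = log_odds(50, 25, 0)      # hypothetical 50-year-old, BMI 25
with_history = log_odds(50, 25, 1)

print(round(baseline, 3))
print(round(with_history - no_history, 3))  # gap on the log-odds scale
print(round(inverse_logit(no_history), 3), round(inverse_logit(with_history), 3))
```

The family-history gap is exactly 0.8 on the log-odds scale for every person, but the corresponding gap in probability depends on the values of the other predictors.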
Convert Log-Odds to Odds Ratios:
\[
\begin{aligned}
\text{Odds Ratio (Age)} &= e^{0.03} \approx 1.03 \\
\text{Odds Ratio (BMI)} &= e^{0.1} \approx 1.11 \\
\text{Odds Ratio (FamilyHistory)} &= e^{0.8} \approx 2.23
\end{aligned}
\]
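- Age: Each additional year of age multiplies the odds
of having the disease by about 1.03, an increase of about 3%.
- BMI: Each one-unit increase in BMI multiplies the odds
of having the disease by about 1.11, an increase of about 11%.
- FamilyHistory: Having a family history of the
disease multiplies the odds of having the disease by about 2.23, an
increase of about 123%.
These are changes in the odds, not in the probability; the
corresponding change in probability depends on the baseline risk.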
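The conversion is a single exponentiation per coefficient. This Python sketch reproduces the odds ratios above and also shows that the odds ratio for a larger change is obtained by exponentiating the coefficient times that change, for example \(e^{10 \times 0.03}\) for ten additional years of age.

```python
import math

# Log-odds coefficients from the fitted logistic model in the example
coefficients = {"Age": 0.03, "BMI": 0.1, "FamilyHistory": 0.8}

odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}
for name, ratio in odds_ratios.items():
    # (OR - 1) * 100 is the percentage change in the odds per one-unit increase
    print(f"{name}: OR = {ratio:.2f}, change in odds = {(ratio - 1) * 100:+.1f}%")

# Odds ratio for a 10-year increase in age: exp(10 * beta), not 10 * OR
or_age_10y = math.exp(10 * coefficients["Age"])
print(f"Age (+10 years): OR = {or_age_10y:.2f}")
```

Odds ratios compound multiplicatively, which is why a ten-year change gives roughly 1.35 rather than 1.3.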
Assess Model Fit:
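- Likelihood Ratio Test: Compare the fitted model to
a null (intercept-only) model; twice the difference in log-likelihoods
is approximately chi-square distributed, with degrees of freedom equal
to the number of predictors (here, 3).
- Hosmer-Lemeshow Test: Assess calibration by comparing
observed and expected numbers of cases across groups of predicted risk
(typically deciles); a non-significant result means there is no
evidence of poor fit.
- ROC Curve: Evaluate the model’s discriminative
ability; the area under the curve (AUC) ranges from 0.5 (no better than
chance) to 1 (perfect discrimination).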
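A minimal sketch of two of these checks in plain Python: the likelihood ratio statistic \(2(\ell_{\text{full}} - \ell_{\text{null}})\), where the intercept-only model predicts the overall event rate for everyone, and the AUC computed as the share of (case, non-case) pairs that the model ranks correctly. The outcomes and predicted probabilities are hypothetical; for a genuine likelihood ratio test the probabilities must come from the maximum-likelihood fit of the full model to those same outcomes.

```python
import math


def log_likelihood(y, probs):
    """Bernoulli log-likelihood of 0/1 outcomes under predicted probabilities."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, probs))


def lr_statistic(y, probs_full):
    """2 * (LL_full - LL_null); the null model predicts the overall event rate."""
    rate = sum(y) / len(y)
    ll_null = log_likelihood(y, [rate] * len(y))
    return 2 * (log_likelihood(y, probs_full) - ll_null)


def auc(y, scores):
    """ROC AUC as the share of (case, non-case) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, yi in zip(scores, y) if yi == 1]
    neg = [s for s, yi in zip(scores, y) if yi == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes and predicted probabilities
y = [1, 0, 1, 1, 0, 0, 1, 0]
probs = [0.9, 0.3, 0.7, 0.4, 0.5, 0.2, 0.8, 0.1]
print(round(lr_statistic(y, probs), 3), auc(y, probs))
```

The statistic would then be compared with a chi-square distribution with 3 degrees of freedom; if SciPy is available, `scipy.stats.chi2.sf(statistic, 3)` gives the p-value.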
Check Significance:
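- p-values: Determine whether each coefficient is
significantly different from zero, commonly with a Wald test,
\(z = \hat\beta_j / \mathrm{SE}(\hat\beta_j)\).
- Confidence Intervals: Provide a range of plausible
values for each coefficient on the log-odds scale; exponentiating the
endpoints gives a confidence interval for the odds ratio, and an
odds-ratio interval that excludes 1 indicates a significant predictor.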
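As a sketch of that last step, the code below computes a 95% Wald interval for the FamilyHistory coefficient on the log-odds scale and exponentiates the endpoints to get an interval for the odds ratio. The standard error of 0.3 is a hypothetical value chosen for illustration, and `odds_ratio_inference` is our own helper name.

```python
import math
from statistics import NormalDist


def odds_ratio_inference(beta, std_error, level=0.95):
    """Wald CI on the log-odds scale, exponentiated to the odds-ratio scale."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    p_value = 2 * (1 - std_normal.cdf(abs(beta / std_error)))
    low, high = beta - z_crit * std_error, beta + z_crit * std_error
    return math.exp(beta), (math.exp(low), math.exp(high)), p_value


# FamilyHistory coefficient from the example; the standard error 0.3 is hypothetical
or_fh, (or_low, or_high), p_fh = odds_ratio_inference(0.8, 0.3)
print(f"OR = {or_fh:.2f}, 95% CI = ({or_low:.2f}, {or_high:.2f}), p = {p_fh:.3f}")
```

The resulting interval is not symmetric around the odds ratio, because it is symmetric on the log-odds scale before exponentiation.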
Summary
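In both multiple linear regression and logistic regression, it’s
crucial to interpret the coefficients, assess model fit, and check the
significance of predictors. Numerical and categorical predictors are
read the same way in both models (the change per one-unit increase, or
the difference from the reference category, holding other variables
constant), but the scale differs: linear regression coefficients are in
the units of the outcome (mmHg for SBP), whereas logistic regression
coefficients are on the log-odds scale, which is why they are usually
converted to odds ratios for easier interpretation.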