Step 0: Substantive Meaning
The example used to demonstrate multiple linear regression is a small expansion of the example used for simple linear regression. It relates how much county-level voting shifted towards Donald Trump in 2016 to two predictors: county-level income and county-level education. The multiple linear regression function can be expressed as
\[\begin{align} y &= \beta_1 x_1 + \beta_2 x_2 + \beta_0 + \epsilon\\ \text{shift to Trump} & = \beta_1 * \text{median income (\$1,000s)} + \beta_2 * \text{\% college experience} + \beta_0 + \epsilon, \end{align}\]where
- the unit of observation is one county,
- \(y\) represents the shift to Trump,
  - that is, how much more the county voted for Trump in 2016 than it voted for Romney in 2012,
- \(x_1\) represents county median income ($1,000s), and
- \(x_2\) represents county % college experience,
  - that is, the percent of the county's residents with at least some college experience, regardless of whether they earned a degree.
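To make the functional form concrete, here is a minimal sketch of estimating \(\beta_0\), \(\beta_1\), and \(\beta_2\) by least squares. It does not use the county data from this example: the data are simulated, the "true" coefficients are made-up values chosen only for illustration, and `fit_ols` is a hypothetical helper written with the standard library rather than a function from any statistics package.

```python
import random

def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(a[idx][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

rng = random.Random(0)
# Hypothetical coefficients used only to simulate data (not the estimates in the text).
beta0, beta1, beta2 = 20.0, -0.01, -0.35

rows, y = [], []
for _ in range(1000):                     # 1,000 simulated counties
    income = rng.uniform(30, 90)          # x1: median income in $1,000s
    college = rng.uniform(20, 80)         # x2: % with some college experience
    eps = rng.gauss(0, 2)                 # error term
    rows.append([1.0, income, college])   # the leading 1.0 carries the intercept
    y.append(beta0 + beta1 * income + beta2 * college + eps)

b0_hat, b1_hat, b2_hat = fit_ols(rows, y)
print(f"constant={b0_hat:.3f}, income={b1_hat:.3f}, college={b2_hat:.3f}")
```

With enough observations, the printed estimates should land close to the values used to simulate the data. In practice a statistics package would do this step and also report standard errors.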
The regression results appear in the last column of the table below. The results from the simple linear regression are also included, both to emphasize that the coefficient for county median income changed and because it is common practice to include results from different regression models in the same table.
Dependent variable: % shift to Trump, 2012-2016
| | (1) Simple regression | (2) Multiple regression |
| County median income ($1,000s) | -0.158* | -0.013 |
| | (0.006) | (0.007) |
| County college experience | | -0.344* |
| | | (0.012) |
| Constant | 8.337* | 20.103* |
| | (0.351) | (0.512) |
| Observations | 3,111 | 3,111 |
| Adjusted R² | 0.203 | 0.371 |
Note: * p<0.05; standard errors in parentheses.
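One common reason a coefficient changes when a second predictor is added is that the two predictors are correlated. The sketch below illustrates that mechanism on simulated data only; it makes no claim about the real county data. It assumes, purely for illustration, that income rises with college experience and that the outcome depends on college experience alone. The slopes are computed from sample covariances using the closed-form formulas for one and two predictors, and all names are hypothetical.

```python
import random

def cov(u, v):
    """Sample covariance of two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

rng = random.Random(1)
college = [rng.uniform(20, 80) for _ in range(2000)]
# Assumption for illustration: income rises with college experience.
income = [20 + 0.6 * c + rng.gauss(0, 8) for c in college]
# Simulated outcome depends on college only; income's true coefficient is 0.
shift = [20 - 0.35 * c + rng.gauss(0, 3) for c in college]

# Simple regression of shift on income alone.
simple_b1 = cov(income, shift) / cov(income, income)

# Two-predictor regression: closed-form slopes from the covariances.
s11, s22, s12 = cov(income, income), cov(college, college), cov(income, college)
s1y, s2y = cov(income, shift), cov(college, shift)
det = s11 * s22 - s12 ** 2
multiple_b1 = (s22 * s1y - s12 * s2y) / det   # income, holding college fixed
multiple_b2 = (s11 * s2y - s12 * s1y) / det   # college, holding income fixed

print(f"income coefficient, simple:   {simple_b1:.3f}")
print(f"income coefficient, multiple: {multiple_b1:.3f}")
print(f"college coefficient:          {multiple_b2:.3f}")
```

In this simulation the income coefficient is clearly negative when income is the only predictor, because income stands in for the omitted college variable, and it moves close to zero once college experience enters the model.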
The regression function, including the estimates for the best fit regression surface, looks like this:
\[\begin{align} \text{shift to Trump} & = -0.013 * \text{median income (\$1,000s)} - 0.344 * \text{\% college experience} + 20.103 + \epsilon. \end{align}\]
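As a quick worked example, the fitted equation can be evaluated for a given county. The sketch below plugs in two hypothetical counties (the input values are made up for illustration) and uses the estimated coefficients from the table; `predicted_shift` is an illustrative name, and the error term is dropped because a prediction uses only the fitted surface.

```python
def predicted_shift(median_income_k, pct_college):
    """Predicted shift to Trump from the fitted multiple regression.

    median_income_k: county median income in $1,000s
    pct_college: percent of the county with at least some college experience
    """
    return -0.013 * median_income_k - 0.344 * pct_college + 20.103

# Hypothetical county: $50,000 median income, 40% college experience.
base = predicted_shift(50, 40)
# Same college share, but median income $10,000 higher.
richer = predicted_shift(60, 40)

print(round(base, 3))           # 5.693
print(round(richer - base, 3))  # -0.13
```

Holding college experience fixed, a $10,000 difference in median income changes the prediction by only 10 × (-0.013) = -0.13, which reflects how small the income coefficient is once education is in the model.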