# 18 Midterm review!

Below are some sample questions that you should be able to answer before you take the midterm! This does not necessarily reflect the questions you may be asked on the midterm.

• Why might I want to take the log of a variable?

To reduce the skew in the data. Taking the log brings large values closer to smaller values, which makes the variable look more like a normal distribution.
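As a rough illustration of what the log does, here is a small Python sketch. The income values are made up for this example (each one is double the previous one), and comparing the mean to the median is just one simple way to spot skew.

```python
import math
import statistics

# Hypothetical, right-skewed incomes: each value doubles the previous one.
incomes = [10_000 * 2**k for k in range(7)]  # 10,000 up to 640,000
log_incomes = [math.log(value) for value in incomes]

# In skewed data the mean gets dragged toward the long tail,
# so it sits far away from the median.
raw_gap = statistics.mean(incomes) - statistics.median(incomes)

# After logging, these values are evenly spaced, so mean and median agree.
log_gap = statistics.mean(log_incomes) - statistics.median(log_incomes)

print(f"raw mean - median:    {raw_gap:,.0f}")
print(f"logged mean - median: {log_gap:.6f}")
```

Real data will rarely become perfectly symmetric after logging; this example is built so that it does, to make the idea easy to see.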

• What does the central tendency of a univariate descriptive statistic refer to?

It is one way of describing a value I might expect to get if I randomly grabbed an observation from my data.

• What are two calculations I can make to describe the central tendency of a variable?
• Mean
• Median
• What does the dispersion or the spread of a univariate descriptive statistic refer to?

It goes beyond the average value of a variable in my data and tells me how spread out the values are. I need to know more than what the median house price is; I may want to know whether all houses are around that median or if there are some really cheap houses and some really expensive houses.
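The house price example can be sketched in Python. Both sets of prices below are hypothetical; they are chosen so that the median is identical while the spread is very different.

```python
import statistics

# Hypothetical house prices in thousands of dollars (illustrative only).
neighborhood_a = [190, 195, 200, 205, 210]  # every house is close to the median
neighborhood_b = [50, 120, 200, 400, 900]   # some very cheap, some very expensive

for name, prices in [("A", neighborhood_a), ("B", neighborhood_b)]:
    median_price = statistics.median(prices)
    spread = statistics.pstdev(prices)  # population standard deviation
    print(f"Neighborhood {name}: median = {median_price}, sd = {spread:.1f}")
```

The median alone would make these two neighborhoods look the same; the standard deviation is what tells them apart.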

• What are the two calculations I can make to describe the spread or dispersion of a variable?
• Variance ($$\sigma^2$$)
• Standard deviation ($$\sigma$$)
• When you are asked to describe a variable, what are the two things that you should include to describe it?
• Central tendency
• Spread (dispersion)

I need both because neither one is sufficient on its own. Without the central tendency (mean or median), I won't understand what the bulk of observations look like on that variable. Without the spread or dispersion (variance or standard deviation), I won't understand how spread out observations are from that central tendency.
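For reference, these quantities can be written out as formulas. This is a sketch that assumes the population versions, with $$N$$ observations $$x_1, \dots, x_N$$; the notation $$\mu$$ for the mean is my own addition, and sample versions of the variance divide by $$N - 1$$ instead of $$N$$.

$$
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2, \qquad
\sigma = \sqrt{\sigma^2}
$$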

• An independent variable refers to what?

The variable that we think explains, has an effect upon or predicts another variable.

• A dependent variable refers to what?

The variable that we think is the outcome, is explained by, or is dependent on some other variable.

• Does a bivariate regression refer to a regression including two variables or more than two variables?

Two variables. Bi = two; variate = variables.

• What is a confounding variable?

A variable that affects both the dependent and independent variable. It is not a variable that is affected by either of the two.
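To see why a confounder matters, here is a minimal Python sketch with made-up data. The variable `z` is the hypothetical confounder: it shifts both `x` and `y`, while `y` is constructed so that it does not depend on `x` at all. The `slope` helper is my own and simply computes the bivariate regression slope.

```python
import statistics

def slope(x, y):
    """Bivariate OLS slope: covariance of x and y divided by variance of x."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

# Hypothetical data: z affects x and z affects y, but x has no effect on y.
z = [0, 0, 0, 0, 1, 1, 1, 1]
noise_x = [-1, 1, -1, 1, -1, 1, -1, 1]
noise_y = [-1, -1, 1, 1, -1, -1, 1, 1]
x = [2 * zi + e for zi, e in zip(z, noise_x)]
y = [3 * zi + e for zi, e in zip(z, noise_y)]

# Ignoring z, x looks like it has an effect on y.
naive_slope = slope(x, y)

# Holding z fixed (looking within each z group), the relationship disappears.
within_slopes = [
    slope([xi for xi, zi in zip(x, z) if zi == group],
          [yi for yi, zi in zip(y, z) if zi == group])
    for group in (0, 1)
]

print("slope ignoring z:", naive_slope)
print("slopes within each z group:", within_slopes)
```

Comparing observations that share the same value of `z` is a simple stand-in for including `z` as a control variable in the regression.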

• What plot is appropriate for describing the bivariate relationship between a categorical variable and a continuous variable?

A two-way boxplot! The categorical variable goes on the x-axis and the continuous variable would be on the y-axis. Make sure to know which plots are most appropriate for different types of data!
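A boxplot is built from a handful of summary numbers per category. The sketch below computes them with Python's standard library rather than drawing the plot; the group names and scores are hypothetical, and `five_number_summary` is a helper I made up for this example.

```python
import statistics

# Hypothetical continuous scores (0-100) for two categories; illustrative only.
scores = {
    "Group A": [55, 60, 70, 75, 80, 85, 95],
    "Group B": [5, 10, 15, 20, 30, 40, 60],
}

def five_number_summary(values):
    """Minimum, first quartile, median, third quartile, maximum."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)

# One summary per category: each would become one box on the x-axis.
for group, values in scores.items():
    print(group, five_number_summary(values))
```

Each category gets its own box, which is why the categorical variable goes on the x-axis and the continuous variable on the y-axis.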

• How would I interpret Table 18.1 from a bivariate regression model?

Table 18.1: The effect of family income on feelings toward Hillary Clinton

| | (1) |
|:--|--:|
| (Intercept) | 43.643*** |
| | (1.218) |
| faminc | -0.032 |
| | (0.035) |
| Num.Obs. | 1178 |
| R2 | 0.001 |
| AIC | 11822.2 |
| BIC | 11837.4 |
| Log.Lik. | -5908.086 |
| RMSE | 36.47 |

Notes: + p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
Data source: Waffles dataset (McElreath 2020).
Credit: damoncroberts.com
Coefficient estimates from OLS.
Standard errors in parentheses.

For every one-unit increase in family income, I would expect a 0.032-point decrease in feelings toward Hillary Clinton. The probability that the effect of family income on feelings toward Hillary Clinton would be this large or larger if the true effect were actually 0 is 37.336%. Because that is well above the usual 5% threshold, it seems relatively plausible that the effect of income on feelings toward Hillary Clinton is actually zero.
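A p-value like this can be roughly reproduced from the coefficient and standard error printed in the table. The Python sketch below assumes a standard normal approximation (regression software normally uses the t distribution, which is nearly identical with 1178 observations) and uses the rounded table values, so it lands close to, but not exactly on, the value reported above.

```python
import math

# faminc row of Table 18.1, rounded as printed in the table.
coefficient = -0.032
standard_error = 0.035

# How many standard errors the estimate is away from zero.
test_statistic = coefficient / standard_error

# Two-sided p-value under a standard normal approximation (an assumption).
# erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)).
p_value = math.erfc(abs(test_statistic) / math.sqrt(2))

print(f"test statistic: {test_statistic:.3f}")
print(f"approximate p-value: {p_value:.3f}")
print("significant at 0.05?", p_value < 0.05)
```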

The intercept (43.643) is the y-intercept. It reflects the baseline feeling toward Hillary Clinton for people that have 0 family income; in other words, it is the value of the dependent variable that my model predicts when the independent variable equals zero.
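One way to see the y-intercept at work is to write the fitted line from Table 18.1 as a small Python function. The function name `predicted_feeling` is made up for this sketch; the two numbers come straight from the table.

```python
# Coefficients from Table 18.1 (rounded as printed).
INTERCEPT = 43.643
FAMINC_SLOPE = -0.032

def predicted_feeling(faminc):
    """Predicted feeling toward Hillary Clinton for a given family income value."""
    return INTERCEPT + FAMINC_SLOPE * faminc

# At faminc = 0 the prediction is just the intercept.
print(predicted_feeling(0))

# Each additional unit of family income lowers the prediction by 0.032.
print(round(predicted_feeling(10), 3))
```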

• What does the p-value of a model represent and what does it tell me about statistical significance?

The p-value states the probability that I'd observe an effect (my $$\beta$$ coefficient) that large or larger if the actual effect is zero.

Smaller values mean it is more implausible that I would have come up with a $$\beta$$ coefficient this large if the effect of the independent variable on the dependent variable were actually zero.

This means that the smaller the p-value, the better for statistical significance! Usually the standard is: if your p-value is less than 0.05, then you have a statistically significant result on your hands.
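The stars in the regression tables follow the same logic. Here is a small Python sketch of the legend printed under Table 18.1; the function name `significance_stars` is my own.

```python
def significance_stars(p_value):
    """Map a p-value to the markers used in the table notes."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    if p_value < 0.1:
        return "+"
    return ""  # not statistically significant at any listed threshold

for p in (0.0005, 0.03, 0.37336):
    print(p, repr(significance_stars(p)))
```

A coefficient with no marker next to it, like `faminc` in Table 18.1, did not clear even the 0.1 threshold.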

• What is a residual?

It is the difference between the observed value (what I have in my data) and the predicted value I get from my regression model (or line of best fit if I plot it). It reflects how well my particular regression fits my data. The larger the residuals, the worse my model is doing in predicting my observed values.
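Here is a minimal Python sketch of that calculation. The five data points are hypothetical, and the line of best fit is computed by hand with the usual bivariate OLS formulas.

```python
import statistics

# Hypothetical observed data (illustrative only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Fit the line of best fit: slope = cov(x, y) / var(x).
mean_x, mean_y = statistics.mean(x), statistics.mean(y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual = observed value minus predicted value.
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

for xi, yi, pi, ri in zip(x, y, predicted, residuals):
    print(f"x={xi}  observed={yi}  predicted={pi:.1f}  residual={ri:+.1f}")
```

With an intercept in the model, OLS residuals always sum to zero, so it is their size, not their sum, that tells me how well the line fits.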

• Knowing this about a residual, what does my standard error tell me?

The standard error is an estimate of how uncertain we are about our model. It tells us that, when I am wrong (when my residual is not equal to zero), just how "off" am I? When I am wrong, is my residual huge or small? The smaller the standard error, the better. It would mean that, if I am off, my residuals aren't all that large on average.
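To connect residuals to the numbers in parentheses in the tables, here is a Python sketch using hypothetical data. It assumes the textbook OLS formulas: the typical size of a residual (often called the residual standard error) and, built from it, the standard error of the slope.

```python
import math
import statistics

# Hypothetical observed data (illustrative only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

# Bivariate OLS fit.
mean_x, mean_y = statistics.mean(x), statistics.mean(y)
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
intercept = mean_y - slope * mean_x
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Typical size of a residual; n - 2 because two coefficients were estimated.
residual_se = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))

# Standard error of the slope: larger residuals mean more uncertainty.
slope_se = residual_se / math.sqrt(sxx)

print(f"residual standard error: {residual_se:.3f}")
print(f"standard error of slope: {slope_se:.3f}")
```

Smaller residuals shrink both numbers, which is why small standard errors go along with a model that is not far "off" on average.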

• Say I give you Table 18.2 to interpret, how would you go about doing that?

Table 18.2: The effect of family income on feelings toward Hillary Clinton, conditional on gender.

| | (1) |
|:--|--:|
| (Intercept) | 40.746*** |
| | (1.749) |
| faminc | -0.069 |
| | (0.053) |
| genderFemale | 5.702* |
| | (2.429) |
| faminc × genderFemale | 0.063 |
| | (0.071) |
| Num.Obs. | 1178 |
| R2 | 0.010 |

When I am looking at male respondents (when genderFemale equals zero), for every one-unit increase in family income, there is a 0.069-point decrease in feelings toward Hillary Clinton. This effect does not appear to be statistically significant. Female respondents with zero family income tend to report feelings toward Hillary Clinton that are 5.702 points higher than males with zero family income. This difference is statistically significant at the 0.05 level (it has one star in the table). The interaction term tells me that the effect of a one-unit increase in family income is 0.063 points more positive for women than for men, which makes the slope for women -0.069 + 0.063 = -0.006. This difference in slopes does not appear to be statistically significant.
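The pieces of an interaction model are easier to keep straight when the fitted equation is written out. Below is a Python sketch using the coefficients printed in Table 18.2; the function name `predicted_feeling` and the 0/1 coding of gender are assumptions made for this example.

```python
# Coefficients from Table 18.2 (rounded as printed).
INTERCEPT = 40.746
B_FAMINC = -0.069
B_FEMALE = 5.702
B_INTERACTION = 0.063

def predicted_feeling(faminc, female):
    """Predicted feeling toward Hillary Clinton; female is True or False."""
    f = 1 if female else 0
    return INTERCEPT + B_FAMINC * faminc + B_FEMALE * f + B_INTERACTION * faminc * f

# Slope of family income for each group.
male_slope = B_FAMINC
female_slope = B_FAMINC + B_INTERACTION

print("male slope:", male_slope)
print("female slope:", round(female_slope, 3))
print("male, zero income:", predicted_feeling(0, female=False))
print("female, zero income:", round(predicted_feeling(0, female=True), 3))
```

The `genderFemale` coefficient is the gap between the two groups only at zero family income; at any other income level the gap also depends on the interaction term.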