## Information about Midterm 2

### General Information

• Midterm 2 is an in-class test.
• Midterm 2 will cover all topics of Chapters 2 and 3 and Section 4.2, with the exception of Section 2.5 (two-way tables).
• The number of questions is approximately 15.
• The questions are primarily multiple-choice.
• Several questions require calculator use.
• Table A (normal distribution) and Table B (random digits) will be provided if needed.

### List of Chapter 2 topics covered

• Association and relationship.
• Form (linear, non-linear, no association).
• Strength.
• Direction.
• Plots for two variables.
• Multiple box plots (when one variable is categorical).
• Scatter plots (when both variables are quantitative).
• Response vs. explanatory variables.
• Relationship vs. association.
• Causality.
• Being able to identify these concepts in examples.
• Scatter plots
• Identifying form from the plot (linear, non-linear, no association).
• Being able to draw for tiny samples (3 and 4 points).
• Determining association strength (weak, strong, very strong) from a scatter plot.
• Identifying outliers; understanding how outliers in a scatter plot differ from outliers in the single-variable case.
• Correlation and correlation coefficient.
• Understand the formula (corrected) $r = \frac{1}{n-1} \sum_{i=1}^n \left(\frac{x_i-\bar{x}}{s_{x}}\right) \left(\frac{y_i-\bar{y}}{s_{y}}\right)$
• Being able to calculate correlation coefficient by hand for 3 and 4 points.
• Knowledge of basic properties such as range (-1 to 1), lack of units, symmetry with respect to swapping variables.
• The significance of high vs. low correlation.
• Correlation and direction of a relationship.
• Understanding the limitations of the correlation coefficient for non-linear relationships (the correlation coefficient does not capture non-linear relationships and may be zero even in the presence of a strong non-linear relationship).
• Understanding that correlation only applies to quantitative variables; for example, there cannot be a correlation between gender and life span, although it is known that women live a few years longer than men.
• Least squares, linear regression.
• Formulas for the slope $$b_1$$ and intercept $$b_0$$ of a regression line $$y = b_0 + b_1\cdot x$$: $\begin{aligned} b_1 &= r \frac{s_y}{s_x}\\ b_0 &= \bar{y}-b_1\,\bar{x} \end{aligned}$
• Ability to calculate the regression line, the predicted values $\hat{y}_i = b_0 + b_1\,x_i$, the residuals $$y_i-\hat{y}_i$$, and $$ESS$$, the sum of squared errors of prediction (see the Wikipedia page for variations in the naming of this quantity): $ESS = \sum_{i=1}^n (y_i - \hat{y}_i)^2$
• The total sum of squares, i.e. the sum of squared deviations of $$y$$ from its mean $$\bar{y}$$: $TSS = \sum_{i=1}^n (y_i - \bar{y})^2$
• Interpolation and extrapolation using the regression line.
• Understanding predicted values; ability to calculate them, given the regression line equation.
• Understanding residuals; ability to calculate them, given the regression line equation.
• Ability to calculate fitted values and residuals by hand for 3-point datasets.
• Understanding the interpretation of R-squared as the percent of the variation of $$y$$ in the vertical direction explained by the variation of $$x$$; calculation of the coefficient of determination $R^2 = 1-\frac{ESS}{TSS}$
• Knowing that $$R^2 = r^2$$ for two-variable linear least squares regression (the only kind of regression we have done so far; there will be other kinds of regression for which this formula does not hold!).
• NOTE: The fraction $\frac{ESS}{TSS}$ is interpreted as the unexplained portion of the variation of $$y$$.
• There is a third sum of squares, the regression sum of squares, or $$RSS$$: $RSS = \sum_{i=1}^n (\hat{y}_i - \bar{y})^2$ A famous equation, the partition of the sum of squares, states that $TSS = ESS + RSS$ which is reminiscent of the Pythagorean Theorem. For a least-squares regression line (which includes an intercept), this is a mathematical theorem and holds exactly. Dividing both sides by $$TSS$$ gives $1 = \frac{ESS}{TSS} + \frac{RSS}{TSS}$ which implies $R^2 = \frac{RSS}{TSS}$
• See also the Wikipedia article on the coefficient of determination. Unfortunately, notations differ between sources: in many texts $$ESS$$ denotes the explained sum of squares and $$RSS$$ the residual sum of squares, the reverse of the convention used here.
• Here is a summary of various notations related to the sums of squares (there are many more!): $\begin{aligned} ESS &= SSE = S_{err} \qquad\text{The sum of squares of the prediction errors (residuals)}\\ RSS &= SSR = S_{reg} = SSM \qquad\text{The regression (explained) sum of squares}\\ TSS &= SST = S_{tot} \qquad\text{The total sum of squares} \end{aligned}$ NOTE: SSM stands for "model sum of squares", where the "model" refers to a linear model (the regression line).
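To check hand calculations for the Chapter 2 formulas, here is a minimal Python sketch (not part of the course materials) that computes $$r$$, the least-squares line, the residuals, and the three sums of squares for a small made-up 3-point dataset. It uses the conventions above ($$ESS$$ for prediction errors, $$RSS$$ for regression) and the sample standard deviation with the $$n-1$$ divisor. The function names are illustrative only.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    # Sample standard deviation s, using the n - 1 divisor
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(x, y):
    # r = 1/(n-1) * sum of products of standardized x and y values
    xbar, ybar = mean(x), mean(y)
    sx, sy = sample_sd(x), sample_sd(y)
    total = sum(((xi - xbar) / sx) * ((yi - ybar) / sy) for xi, yi in zip(x, y))
    return total / (len(x) - 1)

def least_squares_line(x, y):
    # Slope b1 = r * s_y / s_x; intercept b0 = ybar - b1 * xbar
    b1 = correlation(x, y) * sample_sd(y) / sample_sd(x)
    b0 = mean(y) - b1 * mean(x)
    return b0, b1

def sums_of_squares(x, y):
    b0, b1 = least_squares_line(x, y)
    ybar = mean(y)
    y_hat = [b0 + b1 * xi for xi in x]                     # predicted (fitted) values
    ess = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # squared prediction errors
    tss = sum((yi - ybar) ** 2 for yi in y)                # total sum of squares
    rss = sum((yh - ybar) ** 2 for yh in y_hat)            # regression sum of squares
    return ess, tss, rss

# Hypothetical 3-point dataset, chosen only for illustration
x = [1, 2, 3]
y = [2, 4, 5]

r = correlation(x, y)
b0, b1 = least_squares_line(x, y)
ess, tss, rss = sums_of_squares(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

print(f"r = {r:.4f}")                                        # r = 0.9820
print(f"y = {b0:.4f} + {b1:.4f} x")                          # y = 0.6667 + 1.5000 x
print([round(e, 4) for e in residuals])                      # [-0.1667, 0.3333, -0.1667]
print(f"ESS = {ess:.4f}, TSS = {tss:.4f}, RSS = {rss:.4f}")  # ESS = 0.1667, TSS = 4.6667, RSS = 4.5000
print(f"R^2 = {1 - ess / tss:.4f}, r^2 = {r * r:.4f}")       # R^2 = 0.9643, r^2 = 0.9643
```

For this dataset the exact values are $$b_1 = 3/2$$, $$b_0 = 2/3$$, $$ESS = 1/6$$, $$TSS = 14/3$$, $$RSS = 9/2$$, so $$ESS + RSS = TSS$$ and $$R^2 = r^2 = 27/28$$.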

### List of Chapter 3 topics covered

• The three principles of experimental design.
• Observational vs. experimental studies.
• Identification of experimental units.
• Identification of population.
• Sampling techniques using a table of random digits.
• Basic experimental designs:
• Block (=Stratified)
• Matched pair
• Multi-stage
• Lurking variables (including information in Section 2.6). Definition, identification, when to watch out for.
• Confounding (including information in Section 2.6). Definition, identification, when to watch out for. You may study this note, which gives contrasting examples of confounding and lurking. Study more examples to be comfortable with the difference.
• Bias and variability. Differentiating between the two.
• Controlling bias, in particular by randomization.
• Problems when using anecdotal evidence.
• Problems when using polling.
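The table-of-random-digits procedure for choosing a simple random sample can be sketched in Python as follows. This is a minimal illustration, assuming one common labeling scheme: units are labeled 1 to N, every label written with the same number of digits (for example 01 to 30 for 30 units). The function name and the digit string are hypothetical; the digits are made up and are not a line of Table B.

```python
def srs_from_digits(digits, population_size, sample_size):
    """Pick a simple random sample by reading a string of random digits.

    Assumed labeling scheme: units are labeled 1..population_size, each label
    written with the same number of digits (e.g. 01..30 for 30 units).
    """
    width = len(str(population_size))                      # digits per label
    digits = "".join(ch for ch in digits if ch.isdigit())  # ignore spaces
    chosen = []
    # Read the digits in consecutive groups of `width`
    for start in range(0, len(digits) - width + 1, width):
        label = int(digits[start:start + width])
        # Skip groups that are not valid labels, and labels already chosen
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
            if len(chosen) == sample_size:
                break
    return chosen  # may be shorter than sample_size if the digits run out

# Made-up digits for illustration only (NOT an actual line of Table B)
print(srs_from_digits("7309 4109 8827 1530 5206", population_size=30, sample_size=4))
# [9, 27, 15, 30]
```

Reading the pairs 73, 09, 41, 09, 88, 27, 15, 30: the values 73, 41 and 88 are skipped because no unit has those labels, and the second 09 is skipped because unit 9 is already in the sample.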

### List of Chapter 4 topics covered

#### Probability Models

• Know the meaning of outcomes.
• Be familiar with basic set theory: elements, sets, curly brace notation, pairs, tuples, union, intersection, complement.
• Know the meaning of sample spaces and events.
• Know the difference between outcomes and elementary events.
• Be able to identify and construct sample spaces. Be able to describe sample spaces using set notation.
• Be able to use union, intersection and complement to describe events described in plain English, using connectives such as "or", "and" and "not".
• Be familiar with standard examples used in class such as multiple coin tosses, die tosses, free throws in basketball, picking M&M candy out of a jar (with and without replacement), tosses of a bottle cap, etc.
• Be able to perform calculations of probabilities of events, based on laws of probability and set notation (union, intersection, complement).
• Know the addition rule for disjoint events and its generalization, the Inclusion-Exclusion Principle, for 2 and 3 events: $P(A\cup B) = P(A) + P(B) \quad\text{if } A\cap B=\emptyset$ $P(A\cup B) = P(A) + P(B) - P(A\cap B) \quad\text{(always)}$ $P(A\cup B \cup C) = P(A) + P(B) + P(C) - P(A\cap B) - P(A\cap C) - P(B\cap C) + P(A\cap B \cap C)$
• Know the meaning of independence of events. Be able to apply the Multiplication Rule for independent events.
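The set-based view of probability can be tried out with Python sets. The sketch below builds the sample space for two tosses of a fair die, assuming all 36 outcomes are equally likely (and hence that the two tosses are independent), describes events in set notation, and checks the complement rule, the addition rule, Inclusion-Exclusion, and the Multiplication Rule with exact fractions. The event names A, B, C, D are illustrative choices.

```python
from fractions import Fraction
from itertools import product

# Sample space for two tosses of a fair die: ordered pairs (first, second)
S = set(product(range(1, 7), repeat=2))

def P(event):
    # All 36 outcomes are assumed equally likely, so P(E) = |E| / |S|
    return Fraction(len(event), len(S))

A = {s for s in S if s[0] == 6}    # "first toss is a 6"
B = {s for s in S if s[1] == 6}    # "second toss is a 6"
C = {s for s in S if sum(s) == 7}  # "total is 7"
D = {s for s in S if sum(s) == 2}  # "total is 2" (disjoint from C)

# "or" = union (|), "and" = intersection (&), "not" = complement (S - E)
assert P(S - A) == 1 - P(A)                            # complement rule
assert C & D == set() and P(C | D) == P(C) + P(D)      # addition rule, disjoint events
assert P(A | B) == P(A) + P(B) - P(A & B)              # inclusion-exclusion, 2 events
assert P(A | B | C) == (P(A) + P(B) + P(C)
                        - P(A & B) - P(A & C) - P(B & C)
                        + P(A & B & C))                # inclusion-exclusion, 3 events
assert P(A & B) == P(A) * P(B)                         # multiplication rule (A, B independent)

print(P(A | B))      # 11/36
print(P(A | B | C))  # 5/12
```

Note that $$P(A\cup B) = 11/36$$, not $$1/6 + 1/6 = 12/36$$: the outcome (6, 6) lies in both events and would be counted twice without the correction term.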