Information about Midterm 2
General Information

Midterm 2 is an in-class test.

Midterm 2 will cover all topics of Chapters 2 and 3 and Section 4.2, with the exception of Section 2.5 (two-way tables).

The number of questions is approximately 15.

The questions are primarily multiple-choice.

Several questions require calculator use.

Table A (normal distribution), Table B (random digits)
will be provided if needed.
List of Chapter 2 topics covered
 Association and relationship.
 Form (linear, nonlinear, no association).
 Strength.
 Direction.
 Plotting for 2 variables.
 Multiple box plots (when one variable is categorical).
 Scatter plots (when both variables are quantitative).
 Response vs. explanatory variables.
 Relationship vs. association.
 Causality.
 Being able to identify in examples.
 Scatter plots
 Identifying form from the plot (linear, nonlinear, no association).
 Being able to draw for tiny samples (3 and 4 points).
 Determining association strength (weak, strong, very strong) from a scatter plot.
 Identifying outliers, understanding the difference vs. the single variable case.
 Correlation and correlation coefficient.
 Understand the formula (corrected)
\[ r = \frac{1}{n-1} \sum_{i=1}^n
\left(\frac{x_i-\bar{x}}{s_{x}}\right)
\left(\frac{y_i-\bar{y}}{s_{y}}\right)
\]
 Being able to calculate correlation coefficient by hand for 3 and 4 points.
 Knowledge of basic properties such as range (-1 to 1), lack
of units, symmetry with respect to swapping variables.
 The significance of high vs. low correlation.
 Correlation and direction of a relationship.
 Understanding the limitations of the correlation coefficient for nonlinear relationships
(the correlation coefficient does not capture nonlinear relationships and may be zero
even in the presence of a strong nonlinear relationship).
 Understanding that correlation only applies to quantitative variables;
for example, there cannot be a correlation between gender and life span, although it is
known that women live a few years longer than men.
 Least squares, linear regression.
 Formulas for the slope \(b_1\) and intercept \(b_0\) of a regression line
\( y = b_0 + b_1\cdot x \):
\[
\begin{eqnarray}
b_1 &=& r \frac{s_y}{s_x}\\
b_0 &=& \bar{y}-b_1\,\bar{x}
\end{eqnarray}
\]
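The hand calculations above can be checked with a short script. Here is a minimal sketch for a hypothetical 3-point dataset (the numbers are made up for illustration); it uses sample standard deviations, matching the \(n-1\) in the formula for \(r\):

```python
from statistics import mean, stdev

# Hypothetical 3-point dataset (made up for illustration)
x = [1, 2, 3]
y = [2, 4, 5]

n = len(x)
xbar, ybar = mean(x), mean(y)
sx, sy = stdev(x), stdev(y)  # sample standard deviations (divide by n - 1)

# r = 1/(n-1) * sum of products of standardized values
r = sum(((xi - xbar) / sx) * ((yi - ybar) / sy)
        for xi, yi in zip(x, y)) / (n - 1)

# Regression line y = b0 + b1*x
b1 = r * sy / sx         # slope
b0 = ybar - b1 * xbar    # intercept

print(round(r, 3), round(b1, 3), round(b0, 3))  # → 0.982 1.5 0.667
```

The same arithmetic can be carried out by hand with a calculator; the script only confirms the result.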

Ability to calculate the regression line, predicted values \[\hat{y}_i = b_0 + b_1\,x_i\]
and residuals \(y_i-\hat{y}_i\); \(ESS\) is the sum of squared errors of prediction
(see the Wikipedia page for variation in the naming of this quantity):
\[ ESS = \sum_{i=1}^n (y_i - \hat{y}_i)^2\]

The total sum of squares is the sum of squared deviations of \(y\) from its mean:
\[ TSS = \sum_{i=1}^n (y_i - \bar{y})^2 \]

Interpolation and extrapolation using the regression line.

Understanding predicted value; ability to calculate, given the regression line equation

Understanding residuals; ability to calculate, given the regression line equation

Ability to calculate fitted values and residuals by hand for 3point datasets.

Understanding the interpretation of R-squared as the
percent of the variation of \(y\) in the vertical
direction explained by the variation of \(x\); calculation
of the coefficient of determination
\[ R^2 = 1-\frac{ESS}{TSS}\]

Knowing that \(R^2 = r^2\) for two-variable linear
least squares regression (the only kind of regression we
have done so far; there are other kinds of regression
for which this formula does not hold!).

NOTE: The fraction
\[\frac{ESS}{TSS}\]
is interpreted as the unexplained portion of the variation of \(y\).

There is a third sum of squares, the regression sum of squares or \(RSS\):
\[ RSS = \sum_{i=1}^n (\hat{y}_i - \bar{y})^2 \]
There is a famous equation, the partition of the sum of squares which states
\[ TSS = ESS + RSS \]
reminiscent of the Pythagorean Theorem. This is a mathematical theorem and it is always exact.
This implies that
\[ R^2 = \frac{RSS}{TSS} \]
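The partition \(TSS = ESS + RSS\) and the identity \(R^2 = r^2\) can be verified numerically. Here is a sketch using a hypothetical 4-point dataset (values made up for illustration):

```python
from statistics import mean, stdev

# Hypothetical 4-point dataset (made up for illustration)
x = [1, 2, 3, 4]
y = [2, 3, 5, 4]

n = len(x)
xbar, ybar = mean(x), mean(y)
r = sum((xi - xbar) * (yi - ybar)
        for xi, yi in zip(x, y)) / ((n - 1) * stdev(x) * stdev(y))

b1 = r * stdev(y) / stdev(x)        # slope
b0 = ybar - b1 * xbar               # intercept
y_hat = [b0 + b1 * xi for xi in x]  # fitted values

ESS = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # error sum of squares
RSS = sum((yh - ybar) ** 2 for yh in y_hat)            # regression sum of squares
TSS = sum((yi - ybar) ** 2 for yi in y)                # total sum of squares

print(ESS, RSS, TSS)                              # partition: TSS = ESS + RSS
print(round(1 - ESS / TSS, 4), round(r ** 2, 4))  # R^2 equals r^2
```

Trying a few other small datasets confirms that the partition holds regardless of the numbers, as the theorem asserts.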

See also Wikipedia article
on the coefficient of determination. Unfortunately, there is an incompatibility of notations.

Here is a summary of various notations related to the sums of squares (there are many more!):
\[
\begin{eqnarray}
ESS &=& SSE = S_{err} \qquad\text{The sum of squares of the prediction errors}\\
RSS &=& SSR = S_{reg} = SSM \qquad\text{The regression (model) sum of squares}\\
TSS &=& SST = S_{tot} \qquad\text{The total sum of squares}
\end{eqnarray}
\]
NOTE: SSM stands for "sum of squares of the model", where the "model" refers to
the linear model (regression line).
List of Chapter 3 topics covered

The three principles of experimental design.

Observational vs. experimental studies.

Identification of experimental units.

Identification of population.

Sampling techniques using a table of random digits.
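The table-based procedure (label the units, read groups of digits, skip out-of-range labels and repeats) can be sketched as follows; the digit string below is made up for illustration and is not a row of the actual Table B:

```python
# Hypothetical row of random digits (made up; not from the actual Table B)
digits = "0712250307154418"

# Population labeled 01..15; read two digits at a time, skipping
# out-of-range labels and repeats, until a sample of size 3 is chosen.
population_size = 15
sample_size = 3
sample = []
for i in range(0, len(digits) - 1, 2):
    label = int(digits[i:i + 2])
    if 1 <= label <= population_size and label not in sample:
        sample.append(label)
    if len(sample) == sample_size:
        break
print(sample)  # → [7, 12, 3]
```

This mirrors the by-hand procedure: 07 and 12 are accepted, 25 is skipped as out of range, and 03 completes the sample.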

Basic experimental designs:
 Block (=Stratified)
 Matched pair
 Multistage

Lurking variables (including information in Section 2.6).
Definition, identification, when to watch out for.

Confounding (including information in Section 2.6).
Definition, identification, when to watch out for. You may
study this note, which gives contrasting examples of
confounding and lurking variables. Study more examples to
be comfortable with the difference.

Bias and variability. Differentiating between the two.

Controlling bias. Controlling by randomization.

Problems when using anecdotal evidence.

Problems when using polling.
List of Chapter 4 topics covered
Probability Models

Know the meaning of outcomes.

Be familiar with basic set theory: elements, sets, curly
brace notation, pairs, tuples, union, intersection,
complement.

Know the meaning of sample spaces and events.

Know the difference between outcomes and elementary events.

Be able to identify and construct sample spaces.
Be able to describe sample spaces using set notation.
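As a small sketch of constructing a sample space and an event in set notation, consider the three-coin-toss example (Python sets stand in for the curly-brace notation):

```python
from itertools import product

# Sample space for three coin tosses: all ordered triples of H and T
S = set(product("HT", repeat=3))
print(len(S))  # → 8

# The event "exactly one head", written as a set of outcomes
E = {omega for omega in S if omega.count("H") == 1}
print(sorted(E))  # → [('H', 'T', 'T'), ('T', 'H', 'T'), ('T', 'T', 'H')]
```

Listing the outcomes this way makes it easy to count them and hence to compute probabilities for equally likely outcomes.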

Be able to use union, intersection and complement to
describe events described in plain English, using
connectives such as "or", "and" and "not".

Be familiar with standard examples used in class such as
multiple coin tosses, die tosses, free throws in
basketball, picking M&M candy out of a jar (with and without
replacement), tosses of a bottle cap, etc.

Be able to perform calculations of probabilities of events,
based on laws of probability and set notation
(union, intersection, complement).

Know the addition rule for disjoint events and its generalization,
the Inclusion-Exclusion Principle for 2 and 3 events:
\[ P(A\cup B) = P(A) + P(B) \quad\text{if $A\cap B=\emptyset$} \]
\[ P(A\cup B) = P(A) + P(B)  P(A\cap B) \quad\text{(always)} \]
\[ P(A\cup B \cup C) = P(A) + P(B) + P(C)  P(A\cap B)  P(A\cap C)  P(B\cap C) +
P(A\cap B \cap C) \]
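These rules can be checked on a finite sample space by counting outcomes. A sketch with two die tosses (the events are chosen arbitrarily for illustration), using exact fractions:

```python
from fractions import Fraction

# Sample space: ordered outcomes of two fair die tosses
S = {(i, j) for i in range(1, 7) for j in range(1, 7)}

def P(E):
    # Equally likely outcomes: P(E) = |E| / |S|
    return Fraction(len(E), len(S))

A = {(i, j) for (i, j) in S if (i + j) % 2 == 0}  # "the sum is even"
B = {(i, j) for (i, j) in S if i == 6}            # "the first toss is a 6"

# Inclusion-Exclusion for two events (A | B is union, A & B is intersection)
assert P(A | B) == P(A) + P(B) - P(A & B)
print(P(A), P(B), P(A | B))  # → 1/2 1/6 7/12
```

Note that A and B here are not disjoint, so the correction term \(P(A\cap B)\) is essential; dropping it would overcount the three outcomes in both events.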

Know the meaning of independence of events. Be able to
apply the Multiplication Rule for independent events.
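A quick sketch of the Multiplication Rule using the free-throw example; the success probability 0.7 is made up for illustration:

```python
# Two independent free throws, each made with (hypothetical) probability 0.7.
p = 0.7

# Multiplication Rule: for independent events, P(A and B) = P(A) * P(B)
p_both_made = p * p
p_both_missed = (1 - p) * (1 - p)
p_at_least_one = 1 - p_both_missed  # complement rule

print(round(p_both_made, 2), round(p_at_least_one, 2))  # → 0.49 0.91
```

The complement rule step is the standard shortcut: "at least one made" is easier to compute as 1 minus "both missed".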