Midterm 1 Information

Information about Midterm 1 (Updated for Fall 2012)

General Information

Midterm 1 will cover topics of Chapter 1.
The number of questions is approximately 15.
The questions are primarily multiple-choice.
There may be a problem involving calculation of regression.
Several questions require calculator use.
Table A for normal distribution will be provided.
Some questions will require good judgment. For instance, a statement that 165 is approximately 180 may be valid. If a variable has mean 180 and standard deviation 20, then 165 is approximately 180, because the gap of 15 is less than one standard deviation. If the standard deviation is 2, it is not a good approximation.
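The judgment described in the last item can be made concrete by measuring the gap in units of the standard deviation. The following is a minimal sketch, assuming that "approximately" is judged by this ratio; the helper name `gap_in_sds` is hypothetical and not part of the course materials.

```python
def gap_in_sds(value, mean, sd):
    """Return how many standard deviations `value` lies from `mean`."""
    return abs(value - mean) / sd

# The same 15-unit gap leads to very different verdicts.
wide = gap_in_sds(165, 180, 20)    # 0.75 standard deviations: a reasonable approximation
narrow = gap_in_sds(165, 180, 2)   # 7.5 standard deviations: not a good approximation
print(wide, narrow)
```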

List of Chapter 1 topics covered

Quantitative vs. categorical variables.
Stem plots. Please make sure you are familiar with the notion of splitting.
Calculation of median, quartiles $Q_1$ and $Q_3$, range and $IQR=Q_3-Q_1$ using the two major methods: Method 1 (excludes median when calculating quartiles) and Method 2 (includes median when calculating quartiles). Understand the differences in calculating quartiles. Your calculator may give different values from the calculation method in the book (Method 1). Rule for the Exam: Unless explicitly asked for Method 2, you must use the method discussed in the book, i.e. Method 1 (exclude the median if it is a data point, when calculating the quartiles; this is only a concern when the number of data points is odd; for even sample sizes, Method 1 and 2 yield the same quartiles).
Here is some additional information on this issue, which reflects the unsatisfactory situation caused by the lack of a standard definition of quartiles. Even in this course, you will encounter this issue. Excel, R and the book differ in how they calculate quartiles, and WebAssign seems to be in flux on this issue (it does not always appear to follow the book!). You may need to sort the data and calculate the quartiles by hand if your calculator uses Method 2 or some other method. Method 2, which is used by R, is dissected on this page. Please read the Wikipedia article, which precisely defines the two methods. You may be asked explicitly to compute a quartile by a specific method, i.e. a request "Use Method 1" or "Use Method 2" may appear on your test. For reference, here are the most likely situations where you have to know which method is used:
- Method 1 is used by TI-83 and the book.
- Method 2 is used by the R function fivenum.
These are by no means the only methods. More ways to find the positions of quartiles are found here. Excel appears to do its own thing, which is neither Method 1 nor Method 2. It calculates $r=1/4\times(n-1)$ where $n$ is the sample size. If $r$ is a whole number, it is the position of the quartile. Otherwise, Excel takes the numbers at positions $\lfloor r \rfloor$ and $\lfloor r \rfloor + 1$ and computes a weighted average of the data at these positions, where the weight may be $k/4$, with $k=1,2,3$. See this page and the link there to another page that describes Excel's method. In general, there is a great deal of confusion about this matter because of inadequate documentation of MS Office.
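The difference between Method 1 and Method 2 is easiest to see on a small sample with an odd number of points. The sketch below is one possible way to code both methods as they are described above; the function name `quartiles` and its `include_median` flag are hypothetical, and the Excel-style calculation is deliberately left out.

```python
from statistics import median

def quartiles(data, include_median=False):
    """Return (Q1, median, Q3) for a sample with at least two points.

    include_median=False -> Method 1 (median excluded from both halves)
    include_median=True  -> Method 2 (median included in both halves)
    The flag only matters when the number of data points is odd.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    if n % 2 == 0:
        # Even sample size: the median is not a data point, both methods agree.
        lower, upper = xs[:half], xs[half:]
    elif include_median:
        lower, upper = xs[:half + 1], xs[half:]
    else:
        lower, upper = xs[:half], xs[half + 1:]
    return median(lower), median(xs), median(upper)

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
method1 = quartiles(sample)                       # Q1 = 2.5, Q3 = 7.5
method2 = quartiles(sample, include_median=True)  # Q1 = 3,   Q3 = 7
print(method1, method2)
```

For this sample the $IQR$ is 5 under Method 1 and 4 under Method 2, so the choice of method can change an answer.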
Bar Charts
- Applicable to one quantitative and one categorical variable.
- Bar chart vs. histogram
Histograms
- One quantitative variable. Contrast with bar chart.
- Bins. Under- and over-summarized histograms.
- Recognizing right/left skewed distributions.
- Recognizing outliers.
- Modality (unimodal/bimodal/multimodal).
The mean
- What does it measure?
- Understanding the formula \[\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i\] where the sample consists of the numbers $x_1,x_2,\ldots,x_n$ and $n$ is the sample size.
- Calculating for small samples.
- Estimation based on graphs.
Standard deviation.
- What does it measure?
- Understanding the formula \[s_x^2 = \frac{1}{df}\sum_{i=1}^n (x_i-\bar{x})^2\] where the sample consists of the numbers $x_1,x_2,\ldots,x_n$, $n$ is the sample size, and $df = n-1$ is the number of degrees of freedom; in particular, do not make the mistake of dividing by $n$. The standard deviation is $s_x=\sqrt{s_x^2}$.
- Be aware that $s_x^2$ is called sample variance.
- Calculation for tiny samples (1-4 elements).
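The formulas for the mean and the sample variance can be checked on a tiny sample. This is a minimal sketch; the function names and the four-point data set are made up for illustration, and the result is cross-checked against Python's standard `statistics.variance`, which also divides by $n-1$.

```python
from math import sqrt
import statistics

def sample_mean(xs):
    """Mean: the sum of the data divided by the sample size n."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Sample variance: squared deviations divided by df = n - 1, not by n."""
    n = len(xs)
    if n < 2:
        raise ValueError("df = n - 1 is zero for a single data point")
    x_bar = sample_mean(xs)
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)

data = [2, 4, 4, 6]            # illustrative sample, n = 4
x_bar = sample_mean(data)      # 16 / 4 = 4.0
s2 = sample_variance(data)     # (4 + 0 + 0 + 4) / 3 = 8/3
s = sqrt(s2)                   # standard deviation is the square root of the variance

# Cross-check against the standard library.
assert abs(s2 - statistics.variance(data)) < 1e-12
print(x_bar, s2, s)
```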
$IQR$
- Estimating from data.
- Estimating from stemplots.
- Estimating from boxplots.
Understanding boxplots, including representation of outliers
$1.5\cdot IQR$ rule; a data point that is not in the interval: \[ \bigg[Q_1-1.5\cdot IQR, Q_3+1.5\cdot IQR \bigg] \] is an outlier, where $Q_1$ is the first quartile and $Q_3$ is the third quartile.
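The $1.5\cdot IQR$ rule can be applied mechanically once the quartiles are known. The sketch below assumes the quartiles are computed by Method 1 (median excluded for odd sample sizes); the function name `iqr_fences` and the data are hypothetical.

```python
from statistics import median

def iqr_fences(data):
    """Return (Q1 - 1.5*IQR, Q3 + 1.5*IQR), with quartiles found by Method 1."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    # For an odd sample size, skip the median itself (Method 1).
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    q1, q3 = median(lower), median(upper)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [1, 2, 3, 4, 5, 6, 7, 8, 30]
low, high = iqr_fences(data)          # Q1 = 2.5, Q3 = 7.5, IQR = 5
outliers = [x for x in data if x < low or x > high]
print(low, high, outliers)
```

Here the interval is $[-5, 15]$, so 30 is the only point flagged as an outlier.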
Uniform distribution.
- Density curve \[ f(x) = \begin{cases} \frac{1}{b-a} & \text{$a\leq x\leq b$}\\ 0 &\text{otherwise} \end{cases} \]
- Understanding that the area under the curve is 1
- Evaluating mean, median, quartile, range, $IQR$
- Estimating probability
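Because the uniform density is a rectangle of height $1/(b-a)$, every quantity in the list above follows from simple geometry. The following sketch illustrates this under that assumption; the function names and the endpoints $a=2$, $b=10$ are chosen only for illustration.

```python
def uniform_summary(a, b):
    """Summary measures of the uniform distribution on [a, b]."""
    width = b - a
    return {
        "mean": (a + b) / 2,        # balance point of the rectangle
        "median": (a + b) / 2,      # half of the area lies on each side
        "Q1": a + width / 4,        # one quarter of the area lies to the left
        "Q3": a + 3 * width / 4,    # three quarters of the area lie to the left
        "IQR": width / 2,
        "range": width,
    }

def uniform_prob(a, b, c, d):
    """P(c < X < d): length of the overlap with [a, b] times the height 1/(b - a)."""
    lo, hi = max(a, c), min(b, d)
    return max(hi - lo, 0) / (b - a)

summary = uniform_summary(2, 10)
p = uniform_prob(2, 10, 3, 5)       # overlap of length 2 times height 1/8
print(summary, p)
```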
Normal distribution
- Density curve.
- Parameters $\mu$ and $\sigma$, center, symmetry, spread.
- Familiarity with the form of the equation when given: \[f(x)=\frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\] Memorization is not necessary, but be able to distinguish it from the uniform distribution density curve.
- Understanding the distinction between the mean $\mu$ of the normal distribution (a parameter of the normal density curve) and a sample mean \[\bar{x}=\frac{1}{n}\sum_{i=1}^n x_i\] calculated from a sample drawn from a normally distributed general population.
- Understanding the distinction between the standard deviation $\sigma$ of the normal distribution (a parameter of the normal density curve) and a sample standard deviation \[s_x=\sqrt{\frac{1}{n-1}\sum_{i=1}^n (x_i-\bar{x})^2}\] calculated from a sample drawn from a normally distributed general population.
- Understanding notation $N(\mu,\sigma)$
- Understanding the standard normal distribution $N(0,1)$ and Table A. Note that Table A tabulates the area under the density curve: the value in the table for given $z$ is: \[F(z)=\frac{1}{\sqrt{2\pi}} \int_{-\infty}^z e^{-\frac{1}{2}x^2}\,dx\]
- Understand that $F(z)$ is an increasing function, $\lim_{z\to\infty}F(z)=1$ and $\lim_{z\to-\infty}F(z)=0$.
- Standardization and z-score.
- 68-95-99.7 rule.
- Calculating probabilities based on Table A, both straight (given z, find p) and inverse (given p, find z).
- Interpretation of questions in terms of inequalities ($Z>z$, $z_1<Z<z_2$).
- Identification of probability as area under the curve.
- Linear interpolation based on Table A (both straight and inverse). This is optional and it should not be needed on the test. This is used to increase accuracy of results obtained by using tables, and is covered in one of the videos. Also, the inverse lookup technique based on the tables is covered in this video.
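Table A lookups can be checked with Python's standard `statistics.NormalDist`, whose `cdf` method plays the role of $F(z)$ and whose `inv_cdf` method performs the inverse lookup. This is only a sketch for checking practice answers, since the exam itself uses Table A; the example assumes a hypothetical variable distributed as $N(180, 20)$, and the helper `z_score` is not part of the course materials.

```python
from statistics import NormalDist

Z = NormalDist(0, 1)   # standard normal distribution N(0, 1)

def z_score(x, mu, sigma):
    """Standardize x: how many standard deviations x lies from mu."""
    return (x - mu) / sigma

# Straight lookup: given z, find p.
z = z_score(165, 180, 20)           # -0.75
p_below = Z.cdf(z)                  # P(Z < -0.75), the Table A entry
p_above = 1 - Z.cdf(z)              # P(Z > -0.75), the area to the right

# Area between two z-values; this one is the "68" in the 68-95-99.7 rule.
p_between = Z.cdf(1) - Z.cdf(-1)    # P(-1 < Z < 1)

# Inverse lookup: given p, find z.
z_90 = Z.inv_cdf(0.90)              # z with 90% of the area to its left

print(round(p_below, 4), round(p_above, 4), round(p_between, 4), round(z_90, 2))
```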

Topics explicitly excluded