Information about Midterm 1 (Updated for Fall 2012)
General Information
- Midterm 1 will cover topics of Chapter 1.
- The number of questions is approximately 15.
- The questions are primarily multiple-choice.
- There may be a problem involving calculation of regression.
- Several questions require calculator use.
- Table A for normal distribution will be provided.
-
Some questions will require good judgment. For instance,
a statement that 165 is approximately 180 may or may not be valid. If a variable
has mean 180 and standard deviation 20, then 165 is approximately
180, because it lies less than one standard deviation from the mean.
If the standard deviation is 2, it is not a good approximation.
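One way to make this judgment concrete is to express the gap in units of the standard deviation. The following is only a sketch: the helper name `gap_in_sd` is made up for illustration, and where to draw the line between "close" and "far" is still a judgment call.

```python
def gap_in_sd(value, mean, sd):
    """Return how many standard deviations `value` lies from `mean`."""
    return abs(value - mean) / sd

# 165 compared with a mean of 180, for the two standard deviations in the text
print(gap_in_sd(165, 180, 20))  # 0.75 (less than one standard deviation away)
print(gap_in_sd(165, 180, 2))   # 7.5  (many standard deviations away)
```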
List of Chapter 1 topics covered
- Quantitative vs. categorical variables.
- Stem plots. Please make sure you are familiar with the notion of splitting.
- Calculation of median, quartiles \(Q_1\) and \(Q_3\), range and \(IQR=Q_3-Q_1\)
using the two major methods: Method 1 (excludes median when calculating quartiles) and Method 2 (includes median when
calculating quartiles). Understand the differences in calculating quartiles. Your calculator
may give different values from the calculation method in the book (Method 1).
Rule for the Exam: Unless explicitly asked for Method 2, you must use the method discussed in the book,
i.e. Method 1 (exclude the median if it is a data point, when calculating the quartiles; this is only a concern when
the number of data points is odd; for even sample sizes, Method 1 and 2 yield the same quartiles).
-
Here is some additional information on this issue, reflecting the unsatisfactory
situation resulting from the lack of standardization of quartile calculation. Even in this
course, you will encounter this issue. Excel, R and the book differ in
the calculation of quartiles, and WebAssign seems to be in flux on this issue
(it does not always appear to follow the book!).
You may need to sort the data and calculate the quartiles by hand if
your calculator does Method 2, or some other method.
Method 2, which is used by R, is dissected on this page.
Please read the Wikipedia article
which precisely defines the two methods. You may be asked explicitly
to compute a quartile by a specific method, i.e. a request "Use Method 1" or "Use Method 2"
may appear in your test. For reference, here are the most likely situations
where you have to know which method is used:
- Method 1 is used by TI-83 and the book.
- Method 2 is used by the R function fivenum.
These are by no means the only methods. More ways to find the
positions of quartiles are found here.
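The difference between Method 1 and Method 2 can be sketched in a few lines of Python. This is an illustration of the two rules as described above, not the code used by any calculator or by R; the function names and the `include_median` flag are made up for this sketch, and it assumes at least two data points.

```python
def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def quartiles(data, include_median=False):
    """Return (Q1, median, Q3).

    include_median=False: Method 1 (median excluded from both halves).
    include_median=True:  Method 2 (median included in both halves).
    The flag only matters when the number of data points is odd.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    if n % 2 == 1 and include_median:
        lower, upper = xs[:half + 1], xs[half:]
    else:
        lower, upper = xs[:half], xs[n - half:]
    return median(lower), median(xs), median(upper)

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(quartiles(data))                       # (2.5, 5, 7.5)  Method 1
print(quartiles(data, include_median=True))  # (3, 5, 7)      Method 2
```

For an even number of data points both calls return the same quartiles, in line with the rule stated above.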
Excel appears to do its own thing, which is neither Method 1 nor Method 2.
It calculates \(r=\frac{1}{4}\times(n-1)\), where \(n\) is the sample size. If \(r\) is
a whole number, it is the position of the quartile. Otherwise, Excel
takes the numbers at positions \(\lfloor r \rfloor\) and
\(\lfloor r \rfloor + 1\) and computes the weighted average
of the data at these positions, where the weight may be \(k/4\),
with \(k=1,2,3\). See
this page and link there to
another page
which describes the method of Excel. In general, there is massive confusion about this
matter because of the inadequate documentation of MS Office.
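The interpolation rule described above can be sketched as follows. Two assumptions are mine and not stated in the text: positions are counted from zero in the sorted data, and the rule for the \(k\)-th quartile uses \(r = k(n-1)/4\). The function name is hypothetical, and the result should be checked against your own version of Excel before relying on it.

```python
import math

def interpolated_quartile(data, k):
    """k-th quartile (k = 1, 2, 3) by linear interpolation between neighbors.

    Assumption: r = k * (n - 1) / 4 is a zero-based position in the sorted data.
    """
    xs = sorted(data)
    r = k * (len(xs) - 1) / 4
    lo = math.floor(r)
    frac = r - lo            # fractional part: 0, 1/4, 1/2 or 3/4
    if frac == 0:
        return xs[lo]        # r is a whole number: take that data point
    # weighted average of the two neighboring data points
    return xs[lo] + frac * (xs[lo + 1] - xs[lo])

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(interpolated_quartile(data, 1))  # 3.25
print(interpolated_quartile(data, 3))  # 7.75
```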
- Bar Charts
- Applicable to one quantitative and one categorical variable.
- Bar chart vs. histogram
- Histograms
- One quantitative variable. Contrast with bar chart.
- Bins. Under- and over-summarized histograms.
- Recognizing right/left skewed distributions.
- Recognizing outliers.
- Modality (unimodal/bimodal/multimodal).
- The mean
- What does it measure?
- Understanding the formula
\[\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i\]
where the sample consists of the numbers \(x_1,x_2,\ldots,x_n\) and \(n\) is the sample size.
- Calculating for small samples.
- Estimation based on graphs.
- Standard deviation.
- What does it measure?
- Understanding the formula
\[s_x^2 = \frac{1}{df}\sum_{i=1}^n (x_i-\bar{x})^2\]
where the sample consists of the numbers \(x_1,x_2,\ldots,x_n\) and \(n\) is the sample size
and \(df = n-1\) is the number of degrees of freedom; in particular, do not
make the mistake of dividing by \(n\).
- Be aware that \(s_x^2\) is called the sample variance; the sample standard deviation \(s_x\) is its square root.
- Calculation for tiny samples (1-4 elements).
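For checking hand calculations on tiny samples, the formulas for \(\bar{x}\) and \(s_x^2\) can be written out directly. This is a minimal sketch with made-up function names and a made-up sample; note that with a single data point \(df = 0\), so the variance formula cannot be evaluated.

```python
import math

def sample_mean(xs):
    """x-bar: the sum of the data divided by the sample size n."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """s_x^2: sum of squared deviations divided by df = n - 1 (not n).

    Requires at least two values, since df = 0 for a single value.
    """
    x_bar = sample_mean(xs)
    df = len(xs) - 1
    return sum((x - x_bar) ** 2 for x in xs) / df

def sample_sd(xs):
    """s_x: the square root of the sample variance."""
    return math.sqrt(sample_variance(xs))

sample = [2, 4, 4, 6]           # illustrative data
print(sample_mean(sample))      # 4.0
print(sample_variance(sample))  # 8/3, about 2.667
print(sample_sd(sample))        # about 1.633
```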
- \(IQR\)
- Estimating from data.
- Estimating from stemplots.
- Estimating from boxplots.
- Understanding boxplots, including representation of outliers
- \(1.5\cdot IQR\) rule; a data point that is not in the interval:
\[ \bigg[Q_1-1.5\cdot IQR, Q_3+1.5\cdot IQR \bigg] \]
is an outlier, where \(Q_1\) is the first quartile and \(Q_3\) is the
third quartile.
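The \(1.5\cdot IQR\) rule can be sketched as below. The quartiles are computed with Method 1 (median excluded), which is the book's convention according to this page; the function name and the data are made up for illustration, and the sketch assumes at least two data points.

```python
from statistics import median

def iqr_outliers(data):
    """Return ((lower fence, upper fence), list of outliers) by the 1.5*IQR rule."""
    xs = sorted(data)
    half = len(xs) // 2
    q1 = median(xs[:half])             # lower half, median excluded (Method 1)
    q3 = median(xs[len(xs) - half:])   # upper half, median excluded (Method 1)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in xs if x < low or x > high]
    return (low, high), outliers

fences, outliers = iqr_outliers([1, 2, 3, 4, 5, 6, 7, 8, 30])
print(fences)    # (-5.0, 15.0)
print(outliers)  # [30]
```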
- Uniform distribution.
- Density curve
\[ f(x) = \begin{cases} \frac{1}{b-a} & \text{$a\leq x\leq b$}\\
0 &\text{otherwise}
\end{cases}
\]
- Understanding that the area under the curve is 1
- Evaluating mean, median, quartile, range, \(IQR\)
- Estimating probability
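Because the uniform density is a rectangle of height \(1/(b-a)\), every quantity in this list reduces to lengths and areas of rectangles. A hedged sketch follows; the function names and the example interval are made up for illustration.

```python
def uniform_summary(a, b):
    """Summary measures of the uniform distribution on [a, b]."""
    width = b - a
    return {
        "mean": (a + b) / 2,
        "median": (a + b) / 2,     # symmetric, so the median equals the mean
        "Q1": a + width / 4,       # a quarter of the area lies to the left
        "Q3": a + 3 * width / 4,   # three quarters of the area lie to the left
        "IQR": width / 2,
        "range": width,
    }

def uniform_probability(a, b, c, d):
    """P(c <= X <= d): area of the rectangle over the part of [c, d] inside [a, b]."""
    overlap = max(0.0, min(b, d) - max(a, c))
    return overlap / (b - a)

print(uniform_summary(0, 10))
print(uniform_probability(0, 10, 2, 5))   # 0.3
print(uniform_probability(0, 10, 0, 10))  # 1.0 (total area under the curve)
```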
- Normal distribution
- Density curve.
- Parameters \(\mu\) and \(\sigma\), center, symmetry, spread.
- Familiarity with the form of the equation when given:
\[f(x)=\frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\]
Memorization is not necessary, but be able to distinguish it from the uniform distribution density curve.
-
Understanding the distinction between the mean \(\mu\) of the normal
distribution (a parameter of the normal density curve) and
a sample mean \[\bar{x}=\frac{1}{n}\sum_{i=1}^n x_i\] calculated from
a sample drawn from a normally distributed general population.
-
Understanding the distinction between the standard deviation \(\sigma\) of the normal
distribution (a parameter of the normal density curve) and
a sample standard deviation
\[s_x=\sqrt{\frac{1}{n-1}\sum_{i=1}^n (x_i-\bar{x})^2}\] calculated from
a sample drawn from a normally distributed general population.
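A small simulation may help separate the parameters from the sample statistics. In this sketch the values \(\mu = 180\), \(\sigma = 20\), the sample size 25 and the seed are arbitrary choices made for illustration; the point is only that \(\bar{x}\) and \(s_x\) come out near, but generally not equal to, \(\mu\) and \(\sigma\).

```python
import random
from statistics import mean, stdev

mu, sigma = 180, 20            # parameters of the population (assumed values)
rng = random.Random(2012)      # fixed seed so the run is repeatable

# draw a sample of 25 values from N(mu, sigma)
sample = [rng.gauss(mu, sigma) for _ in range(25)]

x_bar = mean(sample)           # sample mean
s_x = stdev(sample)            # sample standard deviation (divides by n - 1)

print("parameters:", mu, sigma)
print("statistics:", round(x_bar, 2), round(s_x, 2))
```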
-
Understanding the notation \(N(\mu,\sigma)\).
-
Understanding the standard normal distribution \(N(0,1)\) and Table A. Note that Table A
tabulates the area under the density curve: the value in the table for given \(z\) is:
\[F(z)=\frac{1}{\sqrt{2\pi}} \int_{-\infty}^z e^{-\frac{1}{2}x^2}\,dx\]
-
Understand that \(F(z)\) is an increasing function, \(\lim_{z\to\infty}F(z)=1\)
and \(\lim_{z\to-\infty}F(z)=0\).
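The behavior of \(F(z)\) can be checked numerically. The sketch below evaluates the integral through the error function from Python's standard `math` module, using the identity \(F(z)=\frac{1}{2}\left(1+\operatorname{erf}(z/\sqrt{2})\right)\); this is a convenience for experimenting and is not something the exam requires.

```python
import math

def F(z):
    """Area under the standard normal density curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# F increases with z, approaching 0 far to the left and 1 far to the right
for z in (-6, -2, -1, 0, 1, 2, 6):
    print(z, F(z))
```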
-
Standardization and z-score.
-
68-95-99.7 rule.
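Standardization and the 68-95-99.7 rule can be verified with `statistics.NormalDist` from the Python standard library (available from Python 3.8). The helper name `z_score` is made up for this sketch, and the numbers 165, 180 and 20 are reused from the example at the top of this page.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standardize x: how many standard deviations x lies above the mean."""
    return (x - mu) / sigma

print(z_score(165, 180, 20))  # -0.75

std_normal = NormalDist()     # the standard normal distribution N(0, 1)
for k in (1, 2, 3):
    # area within k standard deviations of the mean
    area = std_normal.cdf(k) - std_normal.cdf(-k)
    print(k, round(area, 4))  # 0.6827, 0.9545, 0.9973
```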
-
Calculating probabilities based on Table A, both straight (given z, find p)
and inverse (given p, find z).
-
Interpretation of questions in terms of inequalities (\(Z>z\), \(z_1<Z<z_2\)).
-
Identification of probability as area under the curve.
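Each type of question maps onto one expression in \(F\), the area to the left that Table A tabulates. The sketch below uses `statistics.NormalDist` in place of the printed table; the function names are made up for illustration, and on the exam the same arithmetic is done with table values.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal N(0, 1)

def p_less(z):
    """P(Z < z): the table value itself."""
    return Z.cdf(z)

def p_greater(z):
    """P(Z > z): one minus the table value."""
    return 1 - Z.cdf(z)

def p_between(z1, z2):
    """P(z1 < Z < z2): difference of two table values."""
    return Z.cdf(z2) - Z.cdf(z1)

def z_for_area(p):
    """Inverse lookup: the z whose area to the left equals p."""
    return Z.inv_cdf(p)

print(round(p_less(1.0), 4))         # 0.8413
print(round(p_greater(1.0), 4))      # 0.1587
print(round(p_between(-1, 2), 4))    # 0.8186
print(round(z_for_area(0.975), 2))   # 1.96
```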
-
Linear interpolation based on Table A (both straight and inverse). This is optional
and it should not be needed on the test. This is used to increase accuracy of
results obtained by using tables, and is covered in
one of the videos.
Also, the inverse lookup technique based on the tables is covered in
this video.
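Linear interpolation between two neighboring table entries can be sketched as follows. To avoid copying table values by hand, the stand-in `table_a` below rounds a computed area to four decimals, which is an assumption about how the printed table is laid out; the function names and the example values 1.234 and 0.8915 are made up for illustration.

```python
from statistics import NormalDist

def table_a(z):
    """Stand-in for a Table A entry: area to the left of z, rounded to 4 decimals."""
    return round(NormalDist().cdf(z), 4)

def interpolate(x, x0, y0, x1, y1):
    """Value at x on the straight line through (x0, y0) and (x1, y1)."""
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)

z0, z1 = 1.23, 1.24            # neighboring z values in the table
p0, p1 = table_a(z0), table_a(z1)

# straight: given z = 1.234, estimate the area p
p_interp = interpolate(1.234, z0, p0, z1, p1)

# inverse: given area p = 0.8915, estimate z (roles of z and p swapped)
z_interp = interpolate(0.8915, p0, z0, p1, z1)

print(p_interp, z_interp)
```

Both results agree with the directly computed values to about the accuracy of the table itself.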
Topics explicitly excluded