# Math 263, Section 001 and 003 - authored by Marek Rychlik

Last updated on September 17, 6:56PM.

## Excel Assignment 2

In this assignment you will:
• Make scatter plots.
• Fit regression lines and find correlation coefficients.
• Calculate coefficient of determination.
• Compute ESS, TSS and RSS.
• Draw conclusions about data.

## Software used

Excel with Data Analysis Pack. Open Office is sufficient.

## The data file

The topic of the data file is "Development of Lung Capacity with Age and the Effect of Smoking". The file FEV.xlsx shows the Forced Expiratory Volume for a number of children and teenagers, along with information about their age, height, and whether they smoke. In this assignment we will look at the relationship between lung capacity and age (repeated for lung capaticity and height) using scatter plots and regression.

## Important: Smoking is ignored

In this assignment, you ignore the information about smoking.

## Steps for regression of FEV on age

• Make a scatter plot of FEV against age. This means FEV is the response variable, that is the dependent variable and age is the explanatory variable, that is the independent variable. Make sure to properly label your data.
• Compute the correlation coefficient between FEV and age.
• Fit a regression line to the data.

## Repeat the above steps for regression of FEV on height

Use the results to:
• Find the prediction of FEV for a 5 foot (60 inch) child

## Compute the partition of squares for the regression you performed

Do this part for regression of FEV on age only.

Find the values of ESS, TSS and RSS. The definitions are given in Midterm 1 study guide Verify the equation $$TSS = ESS + RSS$$ for the specific situation you have. This is a good check that you performed your calculations correctly. Here are some hints:
• Create cells containing sample means $$\bar{x}$$ and $$\bar{y}$$, sample standard deviations $$s_x$$ and $$s_y$$, and correlation coefficient $$r$$
• Create a cell containing the slope $$b_1=r\,s_y/s_x$$and intercept $$b_0=\bar{y}-b_0\,\bar{x}$$ of the regression line
• Compute a column of deviations of FEV from the formula $$= y_i - \bar{y}$$
• Compute a column of predictions $$\hat{y}_i = b_0 + b_1\,x_i$$
• Compute a column of residuals from the formula $$= y_i - \hat{y}$$
• Use the formulas for ESS, TSS and RSS.
NOTE: I did this part by filling columns A-M of my spreadsheet with formulas (you may get fewer columns if you use Excel functions for things such as correlation coefficient and standard deviation). I also used the "Names" facility of Excel to avoid having to remember the addresses of certain important cells. Thus, I used names such as sx, sy, r, yhat, ybar, etc. I also used the "fill handle" a lot. Please make sure you know what it is and how to use it.

## Calculate coefficient of determination

Do this part for regression of FEV on age only.

Calculate $$R^2$$ using several methods:
• Using the fact that $$R^2 = r^2$$
• Using the fact that $$R^2 = 1-ESS/TSS$$
• Using the fact that $$R^2 = RSS/TSS$$
• Making Excel compute the value of $$R^2$$

### Plot the residuals

Repeat this part for both regression of FEV on age and FEV on height. That is, in both cases y is FEV and x is age in the first version and height in the second.

Using Excel options for Regression command in the Data Analysis pack, plot the residuals against the predicted values.

### Plot the normal qqplot for the residuals

Repeat this part for both regression of FEV on age and FEV on height. That is, in both cases y is FEV and x is age in the first version and height in the second.

Using Excel options for Regression command in the Data Analysis pack, plot the normal quantile-quantile plot for the residuals.

### Draw conclusions about the data

Please answer the following questions for both the regression of FEV on age and FEV on height:
• Is there an association between FEV and the explanatory variable (age or height)?
• If the association exists, is it strong or weak?
• Is the relationship linear or non-linear
• What is the direction of the relationship?
• Do the residuals look normally distributed?
• What percent of the variation of FEV is explained by the variation of the explanatory variable (age or height)?
NOTE: When deciding whether the relationship is linear or non-linear, you may fit a quadratic trendline, to see if the resulting parabola differs substantially from the regression line. Also, you may check what happens to the coefficient of determination for the quadratic fit.