# Math 263, Section 001 and 003 - Excel/R Assignment 6

Last updated on December 9, 1:35 AM.

## A note on a typo fixed in the last update

Note that you should always call 'oneway.test' with a '~', not a ','. Thus, write
> oneway.test(Sodium ~ Type, data=hot.dog.data)

not
> oneway.test(Sodium, Type, data=hot.dog.data)


## Excel/R Assignment 6

In this assignment you will:
• Read data from a text file using R.
• Perform one-way ANOVA for data with a stratified design.
• Perform two-sample t-tests to test for differences between groups.
• Draw conclusions about the respective means.

## Software used

The statistical package R. Although RExcel may be used, the best way is to use R without RExcel. You may also use Excel Data Analysis Pack ANOVA, but no instructions are provided here.

## The data file

The data file is Dataset6.txt. The file is prepared to be read with the R commands:
> hot.dog.data <- read.table("Dataset6.txt", header=TRUE)
> attach(hot.dog.data)
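To see what 'read.table' produces, here is a minimal sketch on a made-up miniature file. The column layout (Type, Calories, Sodium) is an assumption based on the variable names used in this assignment, and the numbers are invented; they are not the real data.

```r
# Made-up miniature file: a header row, then one hot dog per row.
# The numbers are invented for illustration and are NOT the real data.
toy.file <- tempfile(fileext = ".txt")
writeLines(c("Type Calories Sodium",
             "Beef 180 490",
             "Beef 150 400",
             "Meat 170 450",
             "Poultry 110 520"),
           toy.file)

# header=TRUE says that the first line holds the column names;
# stringsAsFactors=TRUE makes Type a factor in every version of R.
toy.data <- read.table(toy.file, header = TRUE, stringsAsFactors = TRUE)

str(toy.data)           # shows the type of every column
levels(toy.data$Type)   # the group labels
toy.data$Sodium         # a column can be used without attach()
```

Writing 'toy.data$Sodium' is an alternative to 'attach' that avoids name conflicts altogether.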


This is a well-known dataset from the DASL library at CMU: http://lib.stat.cmu.edu/DASL/Stories/Hotdogs.html . Please read the story to understand the data.

## Variables in the dataset

• Please identify the quantitative and categorical variables present in the dataset.

## Experimental design

• Please identify the experimental design used (SRS? Block?).

## Look at the data (with boxplots)

For each of the quantitative variables (e.g. "Sodium"), split the variable into groups according to the levels of the factor (= categorical variable) "Type":
• Draw side-by-side boxplots of the three groups: Beef, Meat and Poultry.

### Splitting variables according to a factor (= categorical variable)

#### Rationale

When your observations are recorded in a spreadsheet or a datafile, the group is identified by a level of a factor. We need to split the observations into groups.

#### How to split and create a boxplot with R?

It is simple:
> sodium.split <-  split(Sodium, Type)
> boxplot(sodium.split, ylab="Sodium")

The result for sodium is a single plot with three boxplots side by side, one each for Beef, Meat and Poultry.

#### Under the hood

The variable "sodium.split" is a list of variables "Beef", "Meat" and "Poultry". It is not (and cannot be) a dataframe, because the number of observations differs in each variable. However, you can still do this:
> attach(sodium.split)

to make the variables "Beef", "Meat" and "Poultry" available. You will need them to conduct the t-tests in the last part of the assignment.
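The structure of such a list can be inspected directly. The following is a minimal sketch with made-up numbers (not the hot dog data); the group sizes are deliberately unequal.

```r
# Made-up values for illustration only (not the hot dog data).
Sodium <- c(480, 510, 390, 450, 400, 520, 430, 540, 580)
Type   <- factor(c("Beef", "Beef", "Beef",
                   "Meat", "Meat", "Meat", "Meat",
                   "Poultry", "Poultry"))

sodium.split <- split(Sodium, Type)

class(sodium.split)          # "list"
names(sodium.split)          # "Beef" "Meat" "Poultry"
lengths(sodium.split)        # group sizes 3, 4, 2: unequal, so not a data frame
sapply(sodium.split, mean)   # one mean per group

# Elements can be reached by name even without attach():
sodium.split$Beef
```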

#### Detach unused dataframes or lists

Please make sure to detach the list after use, and before analysing "Calories"; otherwise you will have a name conflict between variables.
> detach(sodium.split)


#### An even faster way to get a side-by-side boxplot

Do this:
> boxplot(Sodium ~ Type)

This uses the '~', which indicates "the formula syntax". Formulas are a powerful mechanism in R, but using formulas correctly requires some experience.

#### What is a name conflict?

If you already have a variable "Beef" from splitting "Sodium", you cannot also have a "Beef" variable from splitting "Calories". R will complain if you try to attach the second list while the first one is still attached. Thus, you need to detach the "Sodium" list before you can attach the "Calories" list. The complaint refers to one variable "masking" the other.
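One way to sidestep the conflict, sketched below with made-up numbers, is not to attach the lists at all. The '$' operator and the function 'with' both reach the elements of a list through the name of the list, so two elements called "Beef" never collide.

```r
# Made-up values for illustration only (not the hot dog data).
Type     <- factor(c("Beef", "Beef", "Meat", "Meat", "Poultry", "Poultry"))
Sodium   <- c(480, 500, 450, 410, 540, 560)
Calories <- c(180, 150, 170, 140, 110, 100)

sodium.split   <- split(Sodium, Type)
calories.split <- split(Calories, Type)

# Both lists have an element called "Beef"; the list name keeps them apart.
sodium.split$Beef
calories.split$Beef

# with() makes the names visible inside one expression only.
with(sodium.split,   mean(Beef) - mean(Poultry))   # -60
with(calories.split, mean(Beef) - mean(Poultry))   #  60
```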

## Perform ANOVA, using a built-in command 'oneway.test'

Below there are instructions on how to perform the test with R, using built-in commands for maximum time savings. For each of the two quantitative variables, "Calories" and "Sodium":
• Perform one-way ANOVA.
• Include the software output in your paper.
• Carefully state what conclusions can be made based on the sample, at the 90% confidence level (that is, at significance level 0.10).

There are several ways to perform ANOVA in R. For example, you may use the following command:
> oneway.test(Q ~ F, data=x)

which will perform a one-way ANOVA for a quantitative variable Q and a categorical variable F (also called a "factor"). The values of Q will be split into groups according to the factor F and a significance test for the means will be conducted for you. All that remains is to draw the conclusions.

### An example

Thus, the following will perform the ANOVA on Sodium split according to (meat) "Type":
> oneway.test(Sodium ~ Type, data=hot.dog.data)
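One detail worth knowing: by default 'oneway.test' does not assume equal variances in the groups (argument var.equal=FALSE), so its output can differ from a textbook ANOVA table. The sketch below, on made-up numbers, compares the default call with the equal-variance version; which of the two your report should use is not specified in this assignment.

```r
# Made-up values for illustration only (not the hot dog data).
toy <- data.frame(
  Type   = factor(rep(c("Beef", "Meat", "Poultry"), times = c(3, 4, 3))),
  Sodium = c(480, 510, 390, 450, 400, 520, 430, 540, 580, 500)
)

# Default: Welch-type test, equal variances NOT assumed.
welch <- oneway.test(Sodium ~ Type, data = toy)

# Classical one-way ANOVA, equal variances assumed.
classic <- oneway.test(Sodium ~ Type, data = toy, var.equal = TRUE)

welch$method        # says that equal variances are not assumed
classic$method
classic$parameter   # numerator and denominator degrees of freedom: 2 and 7
classic$statistic   # the F-statistic
classic$p.value     # the P-value
```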


## Perform ANOVA, using R as a super-calculator

### The step-by-step script example

Please use the script script.R, which illustrates the approach. In principle, you can also use Excel for a similar calculation, or perform the calculations by hand or with a plain calculator without statistical functions.
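The script script.R is not reproduced here. As a rough sketch of the kind of calculation involved, the following computes the classical one-way ANOVA (equal variances assumed) step by step. The numbers are made up and all variable names are chosen for this illustration only.

```r
# Made-up values for illustration only (not the hot dog data).
Sodium <- c(480, 510, 390, 450, 400, 520, 430, 540, 580, 500)
Type   <- factor(rep(c("Beef", "Meat", "Poultry"), times = c(3, 4, 3)))

groups     <- split(Sodium, Type)
n.i        <- lengths(groups)        # group sizes
mean.i     <- sapply(groups, mean)   # group means
grand.mean <- mean(Sodium)
k          <- length(groups)         # number of groups
N          <- length(Sodium)         # total number of observations

# Sum of squares between groups and within groups.
SSG <- sum(n.i * (mean.i - grand.mean)^2)
SSE <- sum(sapply(groups, function(g) sum((g - mean(g))^2)))

df1 <- k - 1   # numerator degrees of freedom
df2 <- N - k   # denominator degrees of freedom

F.stat  <- (SSG / df1) / (SSE / df2)
p.value <- pf(F.stat, df1, df2, lower.tail = FALSE)

c(df1 = df1, df2 = df2, F = F.stat, P = p.value)
```

Every line except the call to 'pf' can be checked with a plain calculator; the result should agree with 'oneway.test' called with var.equal=TRUE.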

### A note regarding the Final Exam

The best way to master one-way ANOVA for the Final Exam is to follow every step and confirm the calculation results with a simple calculator (the one you will bring to the Final Exam).

### What to include in your report

• The number of degrees of freedom for the numerator and the denominator.
• The value of Fisher's F-statistic.
• The P-value.
• The null and alternative hypotheses.
• The test conclusion.

## Perform two-sample t-tests for differences in each pair

### Rationale

Often, t-tests are used to reveal the differences between individual groups. This procedure is suspect (see the comments below on the Bonferroni procedure). However, it is widely used in practice.

### Which t-tests should you conduct?

You should conduct a t-test for the difference between each pair of the three groups (levels of the factor "Type"): "Beef", "Meat" and "Poultry". Please repeat for all quantitative variables (e.g. "Sodium"). Thus, you will have three pairs of variables, and three t-tests for each quantitative variable.

### How to use R to answer the question?

Please follow the following steps for both quantitative variables ("Calories" and "Sodium"):
• Let X be one of the variables ("Calories" or "Sodium"). Split the variable according to the factor "Type". The following commands will do this (here X="Calories"):
> hot.dog.data = read.table("Dataset6.txt", header=T)
> attach(hot.dog.data)
> calories.by.type = split(Calories, Type)
> attach(calories.by.type)

Now you will have three variables: "Beef", "Meat" and "Poultry". They will hold the calories in each kind of hot dog. (For convenience, we repeated the commands which read the data and attach the data frame.)
• Conduct the two-sample t-test on each pair of variables (thus, you will have three t-tests to perform for each quantitative variable). The t-test may be conducted in the following manner (using the pair Beef-Meat as an example):
> t.test(Beef, Meat)

• Report the value of the t-statistic and the corresponding P-value. It is sufficient to include the output of the above command.
• Draw conclusions about each of the 6 pairs of variables. Minimally, reject the null hypothesis, or say there is no reason to reject it, at the 90% confidence level.
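The three t-tests for one quantitative variable can also be run in a loop instead of typing each pair. This is a minimal sketch on made-up numbers; it uses 'combn' to list the pairs and the default 't.test', which does not assume equal variances.

```r
# Made-up values for illustration only (not the hot dog data).
Sodium <- c(480, 510, 390, 450, 400, 520, 430, 540, 580, 500)
Type   <- factor(rep(c("Beef", "Meat", "Poultry"), times = c(3, 4, 3)))

groups <- split(Sodium, Type)

# All pairs of group names, one pair per column:
# Beef-Meat, Beef-Poultry, Meat-Poultry.
pairs <- combn(names(groups), 2)

for (j in seq_len(ncol(pairs))) {
  a <- pairs[1, j]
  b <- pairs[2, j]
  result <- t.test(groups[[a]], groups[[b]])   # two-sample t-test for this pair
  cat(a, "vs", b,
      ": t =", round(result$statistic, 3),
      ", P =", round(result$p.value, 3), "\n")
}
```

For the report itself, the full output of each 't.test' call is still what should be included.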

### A word of caution about multiple t-tests

Performing multiple t-tests on the same data is not a valid statistical procedure. Basically, the overall confidence level of all the tests taken together is lower than the confidence level of each individual test.
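A back-of-the-envelope calculation shows the size of the effect. The sketch below assumes, as a simplification, that the three tests are independent; pairwise tests on the same groups are not, so the first number is only indicative.

```r
alpha <- 0.10   # significance level of each single test (90% confidence)
m     <- 3      # number of pairwise t-tests per quantitative variable

# Chance of at least one false rejection if the m tests were independent.
1 - (1 - alpha)^m   # 0.271

# Upper bound on that chance, valid with or without independence.
m * alpha           # 0.3
```

So the overall confidence level for the three tests together is roughly 73%, not 90%.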

### The Bonferroni procedure

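See the Bonferroni procedure for the correct way to use multiple t-tests. There is also a Wikipedia article: http://en.wikipedia.org/wiki/Bonferroni_correction. R provides a fast way to perform all pairwise t-tests with P-values adjusted for multiple testing. Note that, by default, 'pairwise.t.test' applies Holm's adjustment, a step-down refinement of the Bonferroni correction; to get the plain Bonferroni correction, add the argument p.adjust.method="bonferroni". The default call (for Sodium) is: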
> pairwise.t.test(Sodium, Type)

Pairwise comparisons using t tests with pooled SD

data:  Sodium and Type

        Beef Meat
Meat    0.58 -
Poultry 0.21 0.43


Note that the P-values (corrected due to multiple t-tests) are
• 0.58 between Beef and Meat
• 0.21 between Beef and Poultry
• 0.43 between Meat and Poultry
None of them is significant at the 0.10 level. Note that 'pairwise.t.test' also splits the variable "Sodium" into groups according to the levels of "Type" by itself.

NOTE: Your results obtained without the correction will be different!
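To see what the correction does to individual P-values, here is a minimal sketch using the base R function 'p.adjust' on three hypothetical uncorrected P-values, followed by 'pairwise.t.test' with the adjustment method named explicitly. All numbers are made up.

```r
# Hypothetical uncorrected P-values from three pairwise tests.
raw.p <- c(0.01, 0.04, 0.30)

p.adjust(raw.p, method = "bonferroni")   # 0.03 0.12 0.90 (each P times 3, capped at 1)
p.adjust(raw.p, method = "holm")         # 0.03 0.08 0.30 (what pairwise.t.test does by default)

# Made-up values for illustration only (not the hot dog data).
Sodium <- c(480, 510, 390, 450, 400, 520, 430, 540, 580, 500)
Type   <- factor(rep(c("Beef", "Meat", "Poultry"), times = c(3, 4, 3)))

# The same pairwise procedure, with and without the Bonferroni correction.
bonf <- pairwise.t.test(Sodium, Type, p.adjust.method = "bonferroni")
none <- pairwise.t.test(Sodium, Type, p.adjust.method = "none")

bonf$p.value
none$p.value
```

Comparing the two tables shows how much larger the corrected P-values are.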

## Troubleshooting

### Transferring graphics to Word or another word processor, when using the R console only

If you use RExcel or RCommander, this is a simple matter of cut-and-paste. However, if you are using the R console, there is an extra step: you need to store your graph in a file. This is how it is done:
> sodium.split <-  split(Sodium, Type)
> jpeg("myboxplot.jpg")
> boxplot(sodium.split, ylab="Sodium")
> dev.off()
null device
1

After you do this, there is a graphics file "myboxplot.jpg" in your working directory, which you can open and include in your documents (you can simply drop this file onto your Word document).

#### An explanation

The command 'jpeg("myboxplot.jpg")' tells R to put the graphics in a file with the designated name "myboxplot.jpg", as a JPEG file.

The command 'dev.off()' turns off the current graphics 'device' (in this case, the JPEG file). You must do this, because this causes the file to be actually written. If you do not do this, the file will exist, but it will be empty. As we say, the graphics will be "flushed" at this moment. After you do 'dev.off()', the graphics will go again to your screen.

If you are interested in having graphics in another format, or controlling things like size of the image, please read the manual page:

> ?jpeg
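As one example of another format, the same open-plot-close pattern works with the 'pdf' device, where the size of the image is given in inches. This is a minimal sketch on made-up numbers; the file name and the size are arbitrary choices for this illustration.

```r
# Made-up values for illustration only (not the hot dog data).
Sodium <- c(480, 510, 390, 450, 400, 520, 430, 540, 580, 500)
Type   <- factor(rep(c("Beef", "Meat", "Poultry"), times = c(3, 4, 3)))

# Write into R's temporary directory; use a plain file name such as
# "myboxplot.pdf" to write into the working directory instead.
plot.file <- file.path(tempdir(), "myboxplot.pdf")

pdf(plot.file, width = 6, height = 4)     # open the file device, 6 x 4 inches
boxplot(Sodium ~ Type, ylab = "Sodium")   # the plot goes to the file, not the screen
dev.off()                                 # close the device so the file is written

file.exists(plot.file)                    # TRUE once the device is closed
```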