Math 263, Section 001 and 003 - Excel/R Assignment 6
Last updated on December 9, 1:35 AM.
A note on a typo eliminated by the last update
Note that you should always call 'oneway.test' with a '~', not a ','. Thus, write
> oneway.test(Sodium ~ Type, data=hot.dog.data)
not
> oneway.test(Sodium, Type, data=hot.dog.data)
Excel/R Assignment 6
In this assignment you will:
- Read data from a text file using R.
- Perform one-way ANOVA for data with a stratified design.
- Perform two-sample t-tests to test for differences between groups.
- Draw conclusions about the respective means.
Software used
The statistical package R. Although RExcel may be used, the best way is to use R without RExcel. You may also use Excel Data Analysis Pack ANOVA, but no instructions are provided here.
The data file
Loading data
The data file is Dataset6.txt. The file is prepared to be read with the R commands:
> hot.dog.data <- read.table("Dataset6.txt", header=T)
> attach(hot.dog.data)
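If you want to see what 'read.table' does before touching the real file, the following is a minimal sketch. It writes a few made-up rows (NOT the real Dataset6.txt) to a temporary file and reads them back; the column names "Type", "Calories" and "Sodium" are the ones used in this assignment, while the file path and all the numbers are assumptions for illustration only.

```r
# Made-up rows standing in for Dataset6.txt (illustration only, not the real data).
lines <- c("Type Calories Sodium",
           "Beef 180 490",
           "Beef 150 400",
           "Meat 170 450",
           "Poultry 120 520")
path <- tempfile(fileext = ".txt")   # hypothetical stand-in for the data file
writeLines(lines, path)

# header = TRUE: the first line holds the variable names.
# stringsAsFactors = TRUE: read the text column "Type" as a factor.
toy <- read.table(path, header = TRUE, stringsAsFactors = TRUE)

str(toy)           # shows one factor and two numeric columns
levels(toy$Type)   # the groups defined by the factor
```

Writing 'toy$Sodium' reaches a column without attaching the data frame, which avoids the name conflicts discussed later in this assignment.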
About the dataset
The dataset is a famous one from the CMU Data and Story Library (DASL); see http://lib.stat.cmu.edu/DASL/Stories/Hotdogs.html for the story. Please read it to understand the data.
Variables in the dataset
- Please identify the quantitative and categorical variables present in the dataset.
- Please identify the experimental design used (SRS? Block?).
Look at the data (with a boxplot)
For each of the quantitative variables (e.g. "Sodium"), split the variable into groups according to the levels of the factor (= categorical variable) "Type":
- Draw side-by-side boxplots of the three groups: Beef, Meat and Poultry.
Splitting variables according to a factor (= categorical variable)
Rationale
When your observations are recorded in a spreadsheet or a datafile, the group is identified by a level of a factor. We need to split the observations into groups.
How to split and create a boxplot with R?
It is simple:
> sodium.split <- split(Sodium, Type)
> boxplot(sodium.split, ylab="Sodium")
The result for sodium is presented below:
[Figure: boxplots of Sodium for the groups Beef, Meat and Poultry]
Under the hood
The variable "sodium.split" is a list of variables "Beef", "Meat" and "Poultry". It is not (and cannot be) a dataframe, because the number of observations differs in each variable. However, you can still do this:
> attach(sodium.split)
to make the variables "Beef", "Meat" and "Poultry" available. You will need them to conduct the t-tests in the last part of the assignment.
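To see what 'split' returns, here is a small sketch with made-up numbers (not the hot dog data). It shows that the result is a list with one vector per level of the factor, that the vectors may have different lengths, and that each group can be reached with '$' even without 'attach'.

```r
# Made-up values for illustration only (not the real dataset).
Type   <- factor(c("Beef", "Beef", "Beef", "Meat", "Meat",
                   "Poultry", "Poultry", "Poultry", "Poultry"))
Sodium <- c(495, 477, 425, 458, 506, 430, 375, 396, 383)

sodium.split <- split(Sodium, Type)

is.list(sodium.split)          # a list, not a dataframe
sapply(sodium.split, length)   # group sizes differ between groups
sodium.split$Beef              # one group, reached without attach()
sapply(sodium.split, mean)     # the group means
```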
Detach unused dataframes or lists
Please make sure to detach the list after use, and before analysing "Calories", because you would have a name conflict between variables.
An even faster way to get a side-by-side boxplot
Do this:
> boxplot(Sodium ~ Type)
This uses the '~', which indicates "the formula syntax". Formulas are a powerful mechanism in R, but using formulas correctly requires some experience.
What is a name conflict?
If you already have a variable "Beef" from splitting "Sodium", a second variable "Beef" from splitting "Calories" would have the same name. If you attach the second list while the first one is still attached, R will complain that one variable is "masking" the other, and it becomes easy to use the wrong values by mistake. Thus, you need to detach the "Sodium" list before you attach the "Calories" list.
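The attach-then-detach routine can be sketched as follows. The two lists hold made-up numbers (not the real data); only the order of the commands matters here.

```r
# Made-up values for illustration only.
sodium.split   <- list(Beef = c(495, 477), Meat = c(458, 506), Poultry = c(430, 375))
calories.split <- list(Beef = c(186, 181), Meat = c(173, 191), Poultry = c(129, 132))

attach(sodium.split)
beef.sodium.mean <- mean(Beef)     # "Beef" means sodium here
detach(sodium.split)               # remove the list from the search path

attach(calories.split)             # no masking, because the first list is detached
beef.calories.mean <- mean(Beef)   # "Beef" now means calories
detach(calories.split)
```

If you skip the first 'detach', R still attaches the second list, but it prints a message about masked objects, and "Beef" silently refers to the most recently attached list.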
Perform ANOVA, using a built-in command 'oneway.test'
Below are instructions on how to perform the test with R, using built-in commands for maximum time savings. For each of the two quantitative variables, "Calories" and "Sodium":
- Perform one-way ANOVA.
- Include the software output in your paper.
- Carefully state what conclusions can be made based on the sample, at the 90% confidence level.
The command to use has the form
> oneway.test(Q ~ F, data=x)
which performs a one-way ANOVA for a quantitative variable Q and a categorical variable F (also called a "factor"). The values of Q will be split into groups according to the factor F and a significance test for the means will be conducted for you. All that remains is to draw the conclusions.
An example
Thus, the following will perform the ANOVA on Sodium split according to (meat) "Type":
> oneway.test(Sodium ~ Type, data=hot.dog.data)
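One detail worth knowing when you compare the software output with a hand calculation: by default 'oneway.test' does not assume equal variances in the groups (this is Welch's version of the test), while the argument var.equal=TRUE gives the classical one-way ANOVA F-test. Which version your report should use is not stated here, so treat the following as a sketch only. The data frame "toy" and its numbers are made up for illustration.

```r
# Made-up data for illustration only (not the hot dog dataset).
toy <- data.frame(
  Type   = factor(rep(c("Beef", "Meat", "Poultry"), times = c(4, 3, 3))),
  Sodium = c(495, 477, 425, 322, 458, 506, 473, 430, 375, 396)
)

# Default call: Welch's test, group variances are not assumed equal.
welch <- oneway.test(Sodium ~ Type, data = toy)

# Classical one-way ANOVA: group variances are assumed equal.
classic <- oneway.test(Sodium ~ Type, data = toy, var.equal = TRUE)

classic$statistic   # the F-statistic
classic$parameter   # numerator and denominator degrees of freedom
classic$p.value     # the P-value
```

With var.equal=TRUE the degrees of freedom are (number of groups - 1) and (number of observations - number of groups); with the default, the denominator degrees of freedom are usually not a whole number.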
Perform ANOVA, using R as a super-calculator
The step-by-step script example
Please use the script script.R illustrating the approach. In principle, you can also use Excel for a similar calculation, or perform the calculations by hand or with the aid of a plain calculator without statistical functions.
A note vis-à-vis the Final Exam
The best way to master one-way ANOVA for the Final Exam is to follow every step and confirm the calculation results with a simple calculator (the one you will bring to the Final Exam).
What to include in your report
Please report the following:
- The number of degrees of freedom for the numerator and the denominator.
- The value of Fisher's F-statistic.
- The P-value.
- The null and alternative hypotheses.
- The test conclusion.
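The quantities in the list above can be computed step by step, using R as a calculator. The following is a minimal sketch of the classical (equal-variance) one-way ANOVA on made-up numbers; it is not the course script script.R, and the names "SSG", "SSE", "F.stat" and so on are chosen here for illustration.

```r
# Made-up groups for illustration only (not the hot dog dataset).
groups <- list(Beef    = c(495, 477, 425, 322),
               Meat    = c(458, 506, 473),
               Poultry = c(430, 375, 396))

k <- length(groups)                  # number of groups
n <- sum(sapply(groups, length))     # total number of observations
grand.mean <- mean(unlist(groups))

# Sum of squares between groups and within groups.
SSG <- sum(sapply(groups, function(g) length(g) * (mean(g) - grand.mean)^2))
SSE <- sum(sapply(groups, function(g) sum((g - mean(g))^2)))

df1 <- k - 1                         # numerator degrees of freedom
df2 <- n - k                         # denominator degrees of freedom

F.stat  <- (SSG / df1) / (SSE / df2)
p.value <- pf(F.stat, df1, df2, lower.tail = FALSE)   # upper tail of the F distribution

c(df1 = df1, df2 = df2, F = F.stat, P = p.value)
```

The same numbers should come out of 'oneway.test' called with var.equal=TRUE on the same data, which gives you a way to check each step.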
Perform two-sample t-tests for differences in each pair
Rationale
Often, the t-test is used to reveal the differences between individual groups. This procedure is suspect (see the comments below on the Bonferroni procedure). However, it is often used.
What t-tests should you conduct?
You should conduct a t-test for the difference between each pair of the three groups (levels of the factor "Type"): "Beef", "Meat" and "Poultry". Please repeat for all quantitative variables (e.g. "Sodium"). Thus, you will have three pairs of variables, and three t-tests for each quantitative variable.
What to report?
- Please confirm the statement in the story http://lib.stat.cmu.edu/DASL/Stories/Hotdogs.html regarding t-tests.
How to use R to answer the question?
Please follow the following steps for both quantitative variables ("Calories" and "Sodium"):
Let X be one of the variables ("Calories" or "Sodium").
Split the variable according to the factor "Type". The following commands will do this (X="Calories"):
> hot.dog.data = read.table("Dataset6.txt", header=T)
> attach(hot.dog.data)
> calories.by.type = split(Calories, Type)
> attach(calories.by.type)
Now you will have three variables: "Beef", "Meat" and "Poultry". They will hold the calories in each kind of hot dog. (For convenience, we repeated some commands which read data and attach frames.)
Conduct the two-sample t-test on each pair of variables (thus, you will have three t-tests to perform for each quantitative variable). The t-test may be conducted in the following manner (using the pair Beef-Meat as an example):
> t.test(Beef, Meat)
Report the following:
- The corresponding P-value.
- Your conclusions about each of the 6 pairs of variables. At a minimum, either reject the null hypothesis or say there is no reason to reject it, at the 90% confidence level.
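If you prefer not to type the three t-tests one by one, the steps above can be sketched as a short loop over all pairs. The list "groups" holds made-up numbers (not the real data), and the helper names "pairs" and "p.values" are chosen here for illustration.

```r
# Made-up groups for illustration only (not the hot dog dataset).
groups <- list(Beef    = c(186, 181, 176, 149),
               Meat    = c(173, 191, 182),
               Poultry = c(129, 132, 102))

# All pairs of group names: Beef-Meat, Beef-Poultry, Meat-Poultry.
pairs <- combn(names(groups), 2, simplify = FALSE)

# One two-sample t-test per pair; keep only the P-value.
p.values <- sapply(pairs, function(p) {
  t.test(groups[[p[1]]], groups[[p[2]]])$p.value
})
names(p.values) <- sapply(pairs, paste, collapse = "-")

p.values          # the three uncorrected P-values
p.values < 0.10   # TRUE where the null hypothesis is rejected at the 10% level
```

These P-values are not corrected for multiple testing; see the word of caution that follows.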
A word of caution about multiple t-tests
Performing multiple t-tests on the same data is not a valid statistical procedure. Each additional test adds another chance of a false rejection, so the overall confidence level is lower than the level of each individual test.
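The Bonferroni procedure
See the Bonferroni procedure to correctly use multiple t-tests. Also, there is a Wikipedia article: http://en.wikipedia.org/wiki/Bonferroni_correction. There is also a fast way to perform the pairwise t-test procedure in R with the P-values adjusted for multiple comparisons. By default the adjustment is Holm's method, a refinement of the Bonferroni correction, as the last line of the output shows; pass p.adjust.method="bonferroni" for the plain Bonferroni correction. For Sodium: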
> pairwise.t.test(Sodium, Type)

        Pairwise comparisons using t tests with pooled SD

data:  Sodium and Type

        Beef Meat
Meat    0.58 -
Poultry 0.21 0.43

P value adjustment method: holm

Note that the P-values (corrected due to multiple t-tests) are
- 0.58 between Beef and Meat
- 0.21 between Beef and Poultry
- 0.43 between Meat and Poultry
NOTE: Your results done without correction will be different!
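To see what a correction does to a set of P-values, here is a small sketch using the built-in function 'p.adjust'. The three raw P-values and the data frame "toy" are made up for illustration; they are not results from the hot dog data.

```r
# Hypothetical raw P-values from three pairwise t-tests (illustration only).
raw <- c(0.02, 0.04, 0.30)

bonf <- p.adjust(raw, method = "bonferroni")   # each P-value times 3, capped at 1
holm <- p.adjust(raw, method = "holm")         # step-down version, never larger than Bonferroni

bonf   # 0.06 0.12 0.90
holm   # 0.06 0.08 0.30

# The same choice is available directly in pairwise.t.test (made-up data).
toy <- data.frame(
  Type   = factor(rep(c("Beef", "Meat", "Poultry"), times = c(4, 3, 3))),
  Sodium = c(495, 477, 425, 322, 458, 506, 473, 430, 375, 396)
)
pw <- pairwise.t.test(toy$Sodium, toy$Type, p.adjust.method = "bonferroni")
pw$p.value   # matrix of adjusted P-values, one entry per pair
```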
Transferring graphics to Word or another processor, when using R console only
If you use RExcel or RCommander, this is a simple matter of cut-and-paste.
However, if you are using R console, there is an extra step: you need to store
your graph in a file. This is how this is accomplished:
> jpeg("myboxplot.jpg")
> sodium.split <- split(Sodium, Type)
> boxplot(sodium.split, ylab="Sodium")
> dev.off()
After you do this, there is a graphics file "myboxplot.jpg" in your working directory,
which you can open and include in your documents (you can simply drop this file
onto your Word document).
The command 'jpeg("myboxplot.jpg")' tells R to put the graphics in a file with the designated name "myboxplot.jpg", as a JPEG file.
The command 'dev.off()' turns off the current graphics 'device' (in this case, the JPEG file). You must do this, because this causes the file to be actually written. If you do not do this, the file will exist, but it will be empty. As we say, the graphics will be "flushed" at this moment. After you do 'dev.off()', the graphics will go again to your screen.
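The open-plot-close sequence described above can be tried in a self-contained way as follows. This sketch uses 'pdf' instead of 'jpeg' only because the PDF device is available in every R installation; the pattern is the same. The data are made up, and the file name and the size (in inches) are assumptions for illustration.

```r
# Made-up values for illustration only (not the hot dog dataset).
Type   <- factor(rep(c("Beef", "Meat", "Poultry"), times = c(4, 3, 3)))
Sodium <- c(495, 477, 425, 322, 458, 506, 473, 430, 375, 396)

out.file <- file.path(tempdir(), "myboxplot.pdf")   # hypothetical output file

pdf(out.file, width = 6, height = 4)      # open the file device (size in inches)
boxplot(Sodium ~ Type, ylab = "Sodium")   # the plot goes to the file, not the screen
dev.off()                                 # close the device so the file is written

file.exists(out.file)      # the file is there
file.size(out.file) > 0    # and it is not empty
```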
If you are interested in having graphics in another format, or controlling things like size of the image, please read the manual page: