By Sammy Ngugi

(This article was first published on **R-exercises**, and kindly contributed to R-bloggers)

When we want to know whether the means of two groups differ statistically, we use the t-test. With more than two groups the t-test no longer applies directly, and we use analysis of variance (ANOVA) instead. A one-way ANOVA has one continuous dependent variable and one independent grouping variable, or factor. With exactly two groups, the one-way ANOVA and the t-test (the pooled-variance version that assumes equal variances) are equivalent.
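The two-group equivalence can be checked numerically. The sketch below is only an illustration: it uses R's built-in `PlantGrowth` data (columns `weight` and `group`) as a stand-in, keeps two of its three groups, and compares the squared t statistic with the ANOVA F statistic.

```r
# Stand-in data: keep two of the three groups in the built-in PlantGrowth set.
two_groups <- droplevels(subset(PlantGrowth, group %in% c("ctrl", "trt1")))

# Student's t-test with pooled variance (var.equal = TRUE). The default Welch
# test does not assume equal variances and would not match ANOVA exactly.
t_res <- t.test(weight ~ group, data = two_groups, var.equal = TRUE)

# One-way ANOVA on the same two groups.
a_tab <- summary(aov(weight ~ group, data = two_groups))[[1]]

t_stat  <- unname(t_res$statistic)
f_stat  <- a_tab[["F value"]][1]
p_anova <- a_tab[["Pr(>F)"]][1]

# The squared t statistic equals F, and the two p-values agree.
c(t_squared = t_stat^2, F = f_stat)
c(p_t = t_res$p.value, p_anova = p_anova)
```

Note that `t.test()` defaults to the Welch test, so `var.equal = TRUE` is needed for the match to be exact.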

For the results of a one-way ANOVA to be valid, several assumptions need to be satisfied. They are listed below.

- The dependent variable is continuous.
- The independent variable is categorical with two or more categories.
- Every row of data has a value for both the dependent and the independent variable.
- Observations in each group are independent.
- The dependent variable is approximately normally distributed in each group.
- The variance is approximately equal across all groups.
- There are no outliers in any group.

When the data show non-normality, unequal variances, or outliers, we can transform the data or use a non-parametric test such as Kruskal-Wallis. Note that the Kruskal-Wallis test does not require normally distributed data, but it still assumes that the variance is equal across groups.

For this exercise we will use data on patients with stomach, colon, ovary, bronchus, or breast cancer. The objective of the study was to determine whether the number of days a patient survived was influenced by the organ affected. Our dependent variable is Survival, measured in days. Our independent variable is Organ. The data are available at http://lib.stat.cmu.edu/DASL/Datafiles/CancerSurvival.html, and a cancer-survival file has been uploaded.

Solutions to these exercises can be found here

Exercise 1

Load the data into R
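As a rough sketch of the loading step, the hypothetical helper below reads a CSV file and converts the grouping column to a factor. The helper name, the file name `cancer-survival.csv`, and the CSV format are assumptions, since the post does not describe the uploaded file. A temporary copy of the built-in `PlantGrowth` data stands in for it so the example runs as written.

```r
# Hypothetical helper: read a CSV and make sure the grouping column is a factor.
load_grouped_data <- function(path, group_col) {
  dat <- read.csv(path, stringsAsFactors = FALSE)
  dat[[group_col]] <- factor(dat[[group_col]])
  dat
}

# Stand-in file: write the built-in PlantGrowth data to a temporary CSV.
tmp <- tempfile(fileext = ".csv")
write.csv(PlantGrowth, tmp, row.names = FALSE)

plants <- load_grouped_data(tmp, "group")
str(plants)

# For the exercise data the call might look like this
# (file name and column name are assumptions):
# cancer <- load_grouped_data("cancer-survival.csv", "Organ")
```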

Exercise 2

Create summary statistics for each organ
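One possible way to get per-group summaries in base R is to split the response by the factor and compute the statistics for each piece. The sketch uses the built-in `PlantGrowth` data as a stand-in, with `weight` playing the role of Survival and `group` the role of Organ.

```r
# Split the response by group and compute a few summary statistics per group.
stats_by_group <- t(sapply(
  split(PlantGrowth$weight, PlantGrowth$group),
  function(x) c(n = length(x), mean = mean(x), median = median(x),
                sd = sd(x), min = min(x), max = max(x))
))

# One row per group, one column per statistic.
round(stats_by_group, 3)
```

Comparing the mean with the median, and the standard deviations across groups, gives an early hint about skewness and unequal variance.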

Exercise 3

Check if we have any outliers using a boxplot
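A minimal sketch of the outlier check, again on the stand-in `PlantGrowth` data: `boxplot()` draws the plot and also returns the points it flags, that is, values lying more than 1.5 times the interquartile range beyond the box.

```r
# Draw one box per group and keep the statistics that boxplot() returns.
bp <- boxplot(weight ~ group, data = PlantGrowth,
              xlab = "Group", ylab = "Weight",
              main = "Weight by group")

# bp$out holds the flagged values and bp$group the index of the group
# each one belongs to. The data frame is empty if nothing is flagged.
outliers <- data.frame(group = bp$names[bp$group], value = bp$out)
outliers
```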

Exercise 4

Check for normality using the Shapiro-Wilk test
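Because the normality assumption applies within each group, the test is run once per group rather than on the pooled data. The sketch below does this with `shapiro.test()` on the stand-in `PlantGrowth` data.

```r
# Run the Shapiro-Wilk test separately for each group.
sw <- lapply(split(PlantGrowth$weight, PlantGrowth$group), shapiro.test)

# Collect the p-values. A small p-value suggests a departure from normality.
sw_p <- sapply(sw, function(res) res$p.value)
round(sw_p, 3)
```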

Exercise 5

Check for equality of variance
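The exercise does not name a specific test, so the choice here is an assumption: the sketch uses Bartlett's test and the Fligner-Killeen test, both available in base R, on the stand-in `PlantGrowth` data. Levene's test, available as `leveneTest()` in the `car` package, is another common option.

```r
# Bartlett's test: sensitive to departures from normality.
bart <- bartlett.test(weight ~ group, data = PlantGrowth)

# Fligner-Killeen test: a rank-based alternative, more robust to non-normality.
flig <- fligner.test(weight ~ group, data = PlantGrowth)

# Small p-values suggest the group variances differ.
c(bartlett_p = bart$p.value, fligner_p = flig$p.value)
```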

Exercise 6

Transform your data and check for normality and equality of variance.
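A log transformation is a common first attempt for right-skewed, positive data such as survival times. Whether it is the right choice for the cancer data is an assumption to verify. The sketch applies it to the stand-in `PlantGrowth` data and repeats both checks on the transformed variable.

```r
# Add a log-transformed copy of the response (values must be positive).
dat <- transform(PlantGrowth, log_weight = log(weight))

# Re-check normality within each group on the transformed scale.
sw_p_log <- sapply(split(dat$log_weight, dat$group),
                   function(x) shapiro.test(x)$p.value)

# Re-check equality of variance on the transformed scale.
bart_log <- bartlett.test(log_weight ~ group, data = dat)

round(sw_p_log, 3)
bart_log$p.value
```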

Exercise 7

Run a one-way ANOVA test
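In R a one-way ANOVA can be fitted with `aov()` using a formula of the form response ~ factor. The sketch below uses the stand-in `PlantGrowth` data. For the cancer data the formula would presumably be `Survival ~ Organ`, or the transformed response if a transformation was needed.

```r
# Fit the one-way ANOVA: continuous response on the left, factor on the right.
fit <- aov(weight ~ group, data = PlantGrowth)

# The summary table holds the degrees of freedom, sums of squares,
# F statistic, and p-value for the grouping factor.
summary(fit)

# The same table as a data frame, convenient for extracting single values.
anova_table <- summary(fit)[[1]]
anova_table[["Pr(>F)"]][1]
```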

Exercise 8

Perform a Tukey HSD post hoc test
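A significant ANOVA only says that at least one group mean differs. Tukey's HSD compares every pair of groups while controlling the family-wise error rate. A minimal sketch on the stand-in `PlantGrowth` data:

```r
# TukeyHSD() works on a fitted aov object.
fit <- aov(weight ~ group, data = PlantGrowth)
tukey <- TukeyHSD(fit, conf.level = 0.95)

# One row per pair of groups: difference in means, confidence limits,
# and the adjusted p-value.
tukey$group

# Confidence intervals that exclude zero indicate a significant difference.
plot(tukey)
```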

Exercise 9

Interpret results

Exercise 10

Use a Kruskal-Wallis test
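The Kruskal-Wallis test takes the same formula interface as `aov()`. The sketch below runs it on the stand-in `PlantGrowth` data, on the untransformed response, since the test is based on ranks.

```r
# Rank-based alternative to one-way ANOVA.
kw <- kruskal.test(weight ~ group, data = PlantGrowth)
kw

# The pieces of the result can be extracted individually.
c(statistic = unname(kw$statistic),
  df        = unname(kw$parameter),
  p_value   = kw$p.value)
```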
