ANOVA (Analysis of variance)

What is Analysis of Variance (ANOVA)?

Analysis of variance (ANOVA) is an analysis tool used in statistics that splits an observed aggregate variability found inside a data set into two parts: systematic factors and random factors. The systematic factors have a statistical influence on the given data set, while the random factors do not. Analysts use the ANOVA test to determine the influence that independent variables have on the dependent variable in a regression study.

In other words, ANOVA is a statistical test used to analyze the difference between the means of more than two groups.

The ANOVA test

ANOVA will tell you if there are differences among the levels of the independent variable, but not which differences are significant. To find out which groups differ, follow a significant ANOVA result with a post hoc test such as the Tukey test. The Tukey test runs pairwise comparisons among each of the groups, and uses a conservative error estimate to find the groups which are statistically different from one another.

The formula for ANOVA is:

F = \frac{MST}{MSE}

where:

F = ANOVA coefficient (the F statistic)

MST = mean sum of squares due to treatment

MSE = mean sum of squares due to error
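
To make the ratio concrete, here is a minimal sketch that computes MST, MSE and F by hand for a one-way layout. The function name `one_way_f` and the yield numbers are made up for illustration; in practice a statistics package would also report the p-value.

```python
from statistics import mean

def one_way_f(groups):
    """Return (F, MST, MSE) for a list of groups of numeric observations."""
    k = len(groups)                     # number of groups (treatments)
    n = sum(len(g) for g in groups)     # total number of observations
    grand_mean = mean(x for g in groups for x in g)

    # Sum of squares due to treatment (between groups)
    sst = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Sum of squares due to error (within groups)
    sse = sum((x - mean(g)) ** 2 for g in groups for x in g)

    mst = sst / (k - 1)                 # mean square, k - 1 degrees of freedom
    mse = sse / (n - k)                 # mean square, n - k degrees of freedom
    return mst / mse, mst, mse

# Hypothetical crop yields for three fertilizer mixtures (made-up numbers)
yields = [
    [20, 21, 19],   # mixture 1
    [22, 23, 24],   # mixture 2
    [26, 25, 27],   # mixture 3
]
f_stat, mst, mse = one_way_f(yields)
print(f"MST={mst:.2f}  MSE={mse:.2f}  F={f_stat:.2f}")
```

With these numbers the group means are 20, 23 and 26, so MST = 27, MSE = 1 and F = 27. A large F means the variation between group means is large relative to the variation inside each group.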

There are two main types of ANOVA:

  • one-way ANOVA
  • two-way ANOVA

ONE-WAY ANOVA (or unidirectional): 

A one-way ANOVA uses one independent variable, while a two-way ANOVA uses two independent variables.

One-way ANOVA example

As a crop researcher, you want to test the effect of three different fertilizer mixtures on crop yield.

You can use a one-way ANOVA to find out if there is a difference in crop yields between the three groups.

When to use a one-way ANOVA

Use a one-way ANOVA when you have collected data about one categorical independent variable and one quantitative dependent variable. The independent variable should have at least three levels (i.e. at least three different groups or categories).

ANOVA tells you if the dependent variable changes according to the level of the independent variable. For example:

  • Your independent variable is social media use, and you assign groups to low, medium, and high levels of social media use to find out if there is a difference in hours of sleep per night.
  • Your independent variable is brand of soda, and you collect data on Coke, Pepsi, Sprite, and Fanta to find out if there is a difference in the price per 100ml.
  • Your independent variable is type of fertilizer, and you treat crop fields with mixtures 1, 2 and 3 to find out if there is a difference in crop yield.

The null hypothesis (H0) of ANOVA is that there is no difference among group means. The alternative hypothesis (Ha) is that at least one group differs significantly from the overall mean of the dependent variable.

If you only want to compare two groups, use a t-test instead.
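
One reason the t-test is enough for two groups: with exactly two groups, the one-way ANOVA F statistic equals the square of the equal-variance (Student's) t statistic, so the two tests agree. The sketch below checks this numerically on made-up sleep data; the variable names are illustrative only.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical hours of sleep for two groups (made-up numbers)
low_use = [8, 7, 9, 8]
high_use = [6, 5, 7, 6]

n1, n2 = len(low_use), len(high_use)
m1, m2 = mean(low_use), mean(high_use)

# Pooled variance: the two-group counterpart of MSE
pooled_var = ((n1 - 1) * variance(low_use)
              + (n2 - 1) * variance(high_use)) / (n1 + n2 - 2)

# Student's t statistic for two independent samples (equal variances assumed)
t_stat = (m1 - m2) / sqrt(pooled_var * (1 / n1 + 1 / n2))

# One-way ANOVA F statistic on the same two groups (k - 1 = 1)
grand = mean(low_use + high_use)
mst = n1 * (m1 - grand) ** 2 + n2 * (m2 - grand) ** 2
f_stat = mst / pooled_var

print(f"t = {t_stat:.3f}, t^2 = {t_stat ** 2:.3f}, F = {f_stat:.3f}")
```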

TWO-WAY ANOVA

A two-way ANOVA is used to estimate how the mean of a quantitative variable changes according to the levels of two categorical variables. Use a two-way ANOVA when you want to know how two independent variables, in combination, affect a dependent variable.

Example:

You are researching which type of fertilizer and planting density produces the greatest crop yield in a field experiment. You assign different plots in a field to a combination of fertilizer type (1, 2, or 3) and planting density (1=low density, 2=high density), and measure the final crop yield in bushels per acre at harvest time.

You can use a two-way ANOVA to find out if fertilizer type and planting density have an effect on average crop yield.
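
As a rough sketch of what the two-way test computes, the code below splits the total variability into a fertilizer effect, a density effect, their interaction, and error. It assumes a balanced design (the same number of plots in every cell), and the yields are made-up numbers. Unbalanced data need a more general method than this.

```python
from statistics import mean

# Hypothetical yields (bushels per acre), keyed by (fertilizer, density).
# Balanced design: every cell holds the same number of plots.
cells = {
    (1, 1): [10, 12], (1, 2): [14, 16],
    (2, 1): [12, 14], (2, 2): [16, 18],
    (3, 1): [17, 19], (3, 2): [18, 20],
}

fert_levels = sorted({f for f, _ in cells})
dens_levels = sorted({d for _, d in cells})
a, b = len(fert_levels), len(dens_levels)
r = len(next(iter(cells.values())))      # replicates per cell

grand = mean(y for ys in cells.values() for y in ys)
fert_mean = {f: mean(y for (fi, _), ys in cells.items() if fi == f for y in ys)
             for f in fert_levels}
dens_mean = {d: mean(y for (_, di), ys in cells.items() if di == d for y in ys)
             for d in dens_levels}
cell_mean = {key: mean(ys) for key, ys in cells.items()}

# Sums of squares for the two main effects, the interaction, and the error
ss_fert = b * r * sum((fert_mean[f] - grand) ** 2 for f in fert_levels)
ss_dens = a * r * sum((dens_mean[d] - grand) ** 2 for d in dens_levels)
ss_inter = r * sum(
    (cell_mean[f, d] - fert_mean[f] - dens_mean[d] + grand) ** 2
    for f in fert_levels for d in dens_levels
)
ss_error = sum((y - cell_mean[key]) ** 2
               for key, ys in cells.items() for y in ys)

# Each F statistic is a mean square divided by the error mean square
mse = ss_error / (a * b * (r - 1))
f_fert = (ss_fert / (a - 1)) / mse
f_dens = (ss_dens / (b - 1)) / mse
f_inter = (ss_inter / ((a - 1) * (b - 1))) / mse
print(f"F(fertilizer)={f_fert:.2f}  F(density)={f_dens:.2f}  "
      f"F(interaction)={f_inter:.2f}")
```

Each F statistic is then compared against an F distribution with the matching degrees of freedom to get a p-value, a step left out here to keep the sketch dependency-free.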

When to use a two-way ANOVA

You can use a two-way ANOVA when you have collected data on a quantitative dependent variable at multiple levels of two categorical independent variables.

A quantitative variable represents amounts or counts of things. Its values can be averaged to find a group mean.

Bushels per acre is a quantitative variable because it represents the amount of crop produced. It can be averaged to find the mean bushels per acre.

A categorical variable represents types or categories of things. A level is an individual category within the categorical variable.

Fertilizer types 1, 2, and 3 are levels within the categorical variable fertilizer type. Planting densities 1 and 2 are levels within the categorical variable planting density.

You should have enough observations in your data set to be able to find the mean of the quantitative dependent variable at each combination of levels of the independent variables.
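
A quick way to check this before running the test is to count the observations in each combination of levels. The sketch below uses made-up records and flags empty cells and cells with a single observation. The threshold of two is an assumption here: a mean can be computed from one value, but without replication the interaction cannot be separated from error.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (fertilizer, density, yield). Made-up numbers.
records = [
    (1, 1, 10), (1, 1, 12), (1, 2, 14), (1, 2, 16),
    (2, 1, 12), (2, 1, 14), (2, 2, 16),
    (3, 1, 17), (3, 1, 19),
]

# Group the yields by (fertilizer, density) cell
cells = defaultdict(list)
for fert, dens, y in records:
    cells[fert, dens].append(y)

empty, single = [], []
for fert in (1, 2, 3):
    for dens in (1, 2):
        ys = cells.get((fert, dens), [])
        if not ys:
            empty.append((fert, dens))       # no mean can be computed
            print(f"fertilizer {fert}, density {dens}: no observations")
            continue
        if len(ys) == 1:
            single.append((fert, dens))      # a mean, but no replication
        print(f"fertilizer {fert}, density {dens}: "
              f"n={len(ys)}, mean={mean(ys):.1f}")

print("empty cells:", empty)
print("single-observation cells:", single)
```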

Both of your independent variables should be categorical. If one of your independent variables is categorical and one is quantitative, use an ANCOVA instead.

One-Way ANOVA Versus Two-Way ANOVA

There are two main types of ANOVA: one-way (or unidirectional) and two-way. There are also variations of ANOVA. For example, MANOVA (multivariate ANOVA) differs from ANOVA in that the former tests multiple dependent variables simultaneously, while the latter assesses only one dependent variable at a time.

One-Way ANOVA

  • "One-way" refers to the number of independent variables in the analysis of variance test.
  • A one-way ANOVA evaluates the impact of a single factor on a single response variable.
  • It tests whether all the samples share the same mean.
  • It is used to determine whether there are any statistically significant differences between the means of three or more independent (unrelated) groups.

Two-Way ANOVA

  • A two-way ANOVA is an extension of the one-way ANOVA.
  • With a two-way ANOVA, there are two independent variables.
  • For example, a two-way ANOVA allows a company to compare worker productivity based on two independent variables, such as salary and skill set.
  • It is used to observe the interaction between the two factors, and it tests the effect of both factors at the same time.