Analysis of Variance (AOV or ANOVA)

AOV is a form of data analysis that can be understood in terms of three essential components:

  • formulation of an additive model, i.e., one that adds up “effects” on some observed trait that are associated with moving from one genetically defined variety to the next, or from one defined location or situation to the next, or both.
  • estimation of the effects that minimize the variance of the “residuals” for the specific model used.
  • comparison of the variances of the different effects to assess their relative sizes.

Additive model

Consider, for example, an agricultural crop trial in which a trait, say, yield, is observed for a number of cultivated varieties (cultivars) in one or more replicates in each of a number of locations:

Deviation of the yield of cultivar i grown in replicate k of location j from the overall mean yield
= cultivar i effect + location j effect + interaction effect for cultivar-location combination i,j + residual
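The same model can be written in symbols. The notation is introduced here for illustration and is not from the original page: $y_{ijk}$ is the yield of cultivar $i$ in replicate $k$ of location $j$, $\bar{y}_{\cdot\cdot\cdot}$ is the overall mean, $c_i$ the cultivar effect, $l_j$ the location effect, $(cl)_{ij}$ the cultivar-location interaction effect and $e_{ijk}$ the residual.

```latex
y_{ijk} - \bar{y}_{\cdot\cdot\cdot} = c_i + l_j + (cl)_{ij} + e_{ijk}
```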

Estimation of effects

When this model is expressed in terms of the effects that minimize the variance of the residuals, it becomes:
Deviation of the yield of cultivar i grown in replicate k of location j from the overall mean yield

= the mean of cultivar i over all locations and replicates – overall mean

+ mean in location j of all replicates of all cultivars – overall mean

+ mean over all replicates of cultivar i grown in location j – the sum of the two effects above – overall mean

+ residual
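A minimal NumPy sketch of these estimates follows. It assumes a balanced trial stored as an array `y[i, j, k]` (cultivar i, location j, replicate k); the function name `two_way_effects` and the yield numbers are hypothetical and serve only to illustrate the arithmetic. The final check confirms that the overall mean plus the estimated effects plus the residual rebuilds every observation, which is what makes the model additive.

```python
import numpy as np


def two_way_effects(y):
    """Estimate the effects of the additive model from a balanced trial.

    y has shape (cultivars, locations, replicates), indexed y[i, j, k].
    """
    grand = y.mean()                  # overall mean
    cult_mean = y.mean(axis=(1, 2))   # cultivar i over all locations and replicates
    loc_mean = y.mean(axis=(0, 2))    # location j over all cultivars and replicates
    cell_mean = y.mean(axis=2)        # cultivar i in location j, over replicates

    cultivar = cult_mean - grand
    location = loc_mean - grand
    # Cell mean minus the sum of the two effects above, minus the overall mean.
    interaction = cell_mean - (cultivar[:, None] + location[None, :]) - grand
    residual = y - cell_mean[:, :, None]
    return grand, cultivar, location, interaction, residual


# Hypothetical yields: 2 cultivars x 2 locations x 2 replicates.
y = np.array([[[5.0, 6.0], [7.0, 8.0]],
              [[4.0, 4.0], [9.0, 11.0]]])
grand, cultivar, location, interaction, residual = two_way_effects(y)

# Additivity: overall mean + effects + residual rebuilds every observation.
rebuilt = (grand + cultivar[:, None, None] + location[None, :, None]
           + interaction[:, :, None] + residual)
assert np.allclose(rebuilt, y)
print(grand, cultivar, location)  # 6.75 [-0.25  0.25] [-2.  2.]
```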

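In a balanced trial (every cultivar grown in every location, with the same number of replicates), these estimated effects are uncorrelated with one another and with the residuals, so the variance of yield splits into the sum of their variances:

Variance of yield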

= variance of cultivar effects

+ variance of location effects

+ variance of cultivar-location interaction effects

+ variance of residuals
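The decomposition can be checked numerically. In this sketch, "variance of cultivar effects" is read as the variance, taken over all observations, of the cultivar effect attached to each observation (likewise for the other terms). That reading, the population variance (`np.var` with its default `ddof=0`), the function name `variance_components` and the yield numbers are all assumptions made for illustration.

```python
import numpy as np


def variance_components(y):
    """Split var(y) into cultivar, location, interaction and residual parts.

    Assumes a balanced trial: y[i, j, k] with every cultivar grown in every
    location with the same number of replicates.
    """
    grand = y.mean()
    cult = y.mean(axis=(1, 2), keepdims=True)   # shape (I, 1, 1)
    loc = y.mean(axis=(0, 2), keepdims=True)    # shape (1, J, 1)
    cell = y.mean(axis=2, keepdims=True)        # shape (I, J, 1)

    # Give every observation its own copy of each effect, then take variances.
    parts = {
        "cultivar": np.broadcast_to(cult - grand, y.shape),
        "location": np.broadcast_to(loc - grand, y.shape),
        "interaction": np.broadcast_to(cell - cult - loc + grand, y.shape),
        "residual": y - cell,
    }
    return {name: float(np.var(v)) for name, v in parts.items()}


# Hypothetical yields: 2 cultivars x 2 locations x 2 replicates.
y = np.array([[[5.0, 6.0], [7.0, 8.0]],
              [[4.0, 4.0], [9.0, 11.0]]])
comps = variance_components(y)
print(comps)
# {'cultivar': 0.0625, 'location': 4.0, 'interaction': 1.0, 'residual': 0.375}
assert np.isclose(sum(comps.values()), np.var(y))  # total variance 5.4375
```

If the trial is unbalanced (for example, one replicate is missing), the four parts computed this way generally no longer add up exactly to the total, which is why the sketch assumes balance.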

Comparison of the variances

The last formulation shows why the process of assessing the relative sizes of the associations between the trait (here, yield) and the effects in an additive model is called Analysis of Variance. It makes it possible to assess, for example, whether moving among genetically defined varieties is associated with larger shifts in the observed trait than moving among the defined locations or situations.
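One common way to make this comparison, not spelled out on the original page, is the fixed-effects two-way ANOVA table: each effect's sum of squares is divided by its degrees of freedom to give a mean square, and that mean square is compared with the residual mean square as an F ratio. The sketch below assumes a balanced trial `y[i, j, k]` with at least two replicates per cell; the function name `anova_table` and the data are hypothetical.

```python
import numpy as np


def anova_table(y):
    """Fixed-effects two-way ANOVA for a balanced trial y[i, j, k].

    Requires at least two replicates per cell (K >= 2), otherwise the
    residual has zero degrees of freedom.
    """
    I, J, K = y.shape
    grand = y.mean()
    cult = y.mean(axis=(1, 2), keepdims=True)
    loc = y.mean(axis=(0, 2), keepdims=True)
    cell = y.mean(axis=2, keepdims=True)

    # Sums of squares: each effect counted once per observation it applies to.
    ss = {
        "cultivar": J * K * float(np.sum((cult - grand) ** 2)),
        "location": I * K * float(np.sum((loc - grand) ** 2)),
        "interaction": K * float(np.sum((cell - cult - loc + grand) ** 2)),
        "residual": float(np.sum((y - cell) ** 2)),
    }
    df = {
        "cultivar": I - 1,
        "location": J - 1,
        "interaction": (I - 1) * (J - 1),
        "residual": I * J * (K - 1),
    }
    ms = {name: ss[name] / df[name] for name in ss}
    # F ratio: each effect's mean square relative to the residual mean square.
    f = {name: ms[name] / ms["residual"] for name in ss if name != "residual"}
    return ss, df, ms, f


# Hypothetical yields: 2 cultivars x 2 locations x 2 replicates.
y = np.array([[[5.0, 6.0], [7.0, 8.0]],
              [[4.0, 4.0], [9.0, 11.0]]])
ss, df, ms, f = anova_table(y)
print({name: round(v, 2) for name, v in f.items()})
# {'cultivar': 0.67, 'location': 42.67, 'interaction': 10.67}
```

In these made-up numbers, moving between locations shifts yield far more than moving between cultivars. A p-value would come from comparing each F ratio with an F distribution on (effect df, residual df) degrees of freedom; that step is left out to keep the sketch NumPy-only.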

Agricultural Crop Trials vs. Human Variation

AOV (like many other statistical techniques) was originally developed in agricultural research, where it is possible to replicate genetic types in many defined locations or situations and even to control some of the environmental factors. In humans, a genetic type is replicated just twice, in the case of identical twins, and even twins raised in the same household do not necessarily experience identical environmental factors. Nor is it easy to define or control what those environmental factors are. Nevertheless, techniques based on Path Analysis make it possible to get some handle on the effects and variances related to an observed trait by comparing relatives of different degrees of genetic relatedness (e.g., monozygotic vs. dizygotic twins) and of different relatedness of the situations in which they are raised (e.g., twins separated at birth).

Most analyses of human variation are specific to a given location. The term “location” need not be taken literally; it can refer to distinguishable situations of many kinds. For example, the experience of membership in different racial groups should be analyzed as different locations. This is not to say that the environmental factors are disjoint across racial groups, only that replications in observational studies of humans are always within, not across, racial groups. Even if a child identified as African-American is raised in a white family, that child's experience of membership in the African-American group is not the same as the white children's experience of membership in the white group in the same family.

(Original page by pjt)