You’ve acquired your long-awaited data – and it looks great! There are some exciting trends in your experimental groups, but how do you determine whether they deserve that oh-so-coveted asterisk of significance? You’ve already done the heavy lifting, so don’t let the statistical analysis trip you up. Here we will dive into the nitty-gritty details of comparison statistical tests – when to use them, what they can tell you, and how to identify which one is right for you.
Statistics – The basics
Statistical analysis determines whether there is statistical significance among data sets. Significance is a numerical value quantifying the likelihood that a result is due to chance rather than to the factor being experimentally manipulated. Below are several core statistical concepts relevant to many downstream statistical tests.
 Independent variables – factors which are not affected by other variables (cause).
 Dependent variables – factors which are affected by other variables (effect).
 Central tendency – mean, median, and mode give you a summary snapshot without telling you anything about outliers or data spread within those data sets.
 Measures of dispersion – standard deviation/error, variance, and range; these describe how closely data points fall to each other (see Fig. 1).
 Alpha values – numerical risk allowance in a statistical test.
 Distribution – spread of data; typically normal (bell curve) or skewed (see Fig. 2).
 Assumptions – characteristics of a data set; the ‘rules’ a statistical test assumes your data follows.
Fig. 1: Examples of comparable groups with and without equal levels of variance among data.


Fig. 2: Examples of data distribution profiles. 
Parametric tests
Statistical tests fall into two general categories: parametric and nonparametric. In this blog we will be discussing parametric tests. Parametric tests are suitable for data sets that “play by the rules” and which we have certain information about. The assumptions and requirements that need to be met for a parametric test are as follows:
Parametric test assumptions:
Distribution – Normal
Variables – Independent
Variance – Approximately equal
Data type – Quantitative
If your data doesn’t fit the requirements above, a nonparametric test may be the way to go, which we will discuss more in another blog.
T-tests
T-tests compare the means of two groups to determine if they are statistically different. They are classic parametric tests and among the simplest statistical tests available.
Tailed t-tests
If you are solely interested in whether or not two samples are statistically different from one another, a two-tailed test should be performed. This test will assess whether the mean of an experimental group is significantly greater or lesser than the mean of a control group. If you only care whether one group differs in a certain direction (i.e., you just want to measure if a sample is greater than a control), then a one-tailed test is suitable. Of course, a one-tailed test also works for exclusively less-than comparisons.
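To make the one- vs. two-tailed distinction concrete, here is a minimal sketch in Python. It uses the standard normal distribution as a large-sample stand-in for the t distribution (a common approximation), so the numbers are illustrative rather than exact t-test p values.

```python
from statistics import NormalDist

def tail_p_values(z):
    """One-tailed ('greater') and two-tailed p values for a test statistic z,
    using the standard normal as a large-sample approximation."""
    p_one = 1 - NormalDist().cdf(z)             # is the sample mean greater?
    p_two = 2 * (1 - NormalDist().cdf(abs(z)))  # different in either direction?
    return p_one, p_two

# z = 1.96 sits right at the classic two-tailed 0.05 threshold
p_one, p_two = tail_p_values(1.96)
```

For a positive statistic, the two-tailed p value is simply twice the one-tailed one, which is why a one-tailed test reaches significance more easily in its chosen direction.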
Paired and unpaired t-tests
Samples that come from the same subject or population – before and after a treatment, for example – should be analyzed with a paired t-test, because it eliminates subject-to-subject variation (the test assesses the mean of the per-pair differences). Samples arising from different subjects or populations (a treatment group vs. a control group), on the other hand, should be analyzed with an unpaired t-test, which does account for subject-to-subject variation in its analysis (the test assesses the difference of the group means). Unpaired t-tests are also referred to as two-sample or independent t-tests, but these designations all refer to the same test. On some occasions, an experimental group is compared to a standard value rather than a control group. This situation is best served by a one-sample t-test, as the standard value has no variation that needs to be accounted for.
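The difference between the two designs can be sketched in plain Python. The before/after measurements below are hypothetical; notice how much larger the paired statistic is when the per-subject differences are consistent, because subject-to-subject variation has been removed.

```python
import math
from statistics import mean, stdev

def unpaired_t(x, y):
    """Student's two-sample t statistic (assumes roughly equal variances):
    difference of group means over the pooled spread."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))

def paired_t(before, after):
    """Paired t statistic: a one-sample test on the per-subject differences,
    which removes subject-to-subject variation."""
    d = [b - a for b, a in zip(before, after)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))

# hypothetical measurements on the same five subjects
before = [12.1, 14.3, 13.0, 15.2, 12.8]
after  = [11.0, 13.1, 12.2, 14.0, 11.9]
t_pair = paired_t(before, after)      # ~12.8: each subject dropped consistently
t_unpair = unpaired_t(before, after)  # ~1.4: swamped by between-subject spread
```

Treating paired data as unpaired throws away the pairing and can miss a real effect entirely.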
ANOVA tests
ANOVA tests are essentially t-tests with room for more variables and groups. ANOVAs can test three or more groups (two samples and a control, for example) simultaneously, with up to two independent variables. This test will tell you whether there are significant differences between the means of your groups. However, it will only tell you that differences exist somewhere within your data; it does not indicate which groups the differences exist between. For example, if an ANOVA on a five-sample data set yields statistical significance, you won’t know if only two of those samples are different from each other, or if every permutation of pairs within the five is different. To identify significant pairings, additional tests, referred to as post hoc analysis, must be performed. These will be discussed later.
One- and two-way ANOVAs
ANOVA tests can compare either one (one-way ANOVA) or two (two-way ANOVA) independent variables on a single dependent variable. A two-way ANOVA could test the effect of temperature and humidity on an outcome such as total crop production, while a one-way ANOVA can test the effect of either temperature or humidity on crop production.
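As a sketch of what a one-way ANOVA actually computes, the F statistic is the between-group variance divided by the within-group variance; a large F suggests the group means differ. The crop-yield numbers below are hypothetical.

```python
from statistics import mean

def one_way_anova_f(*groups):
    """F statistic for a one-way ANOVA: mean square between groups
    over mean square within groups."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total observations
    grand = mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)    # df between = k - 1
    ms_within = ss_within / (n - k)      # df within  = n - k
    return ms_between / ms_within

# hypothetical crop yields at three growing temperatures
low  = [4.1, 4.5, 4.3]
mid  = [5.0, 5.4, 5.2]
high = [4.0, 3.8, 4.2]
f = one_way_anova_f(low, mid, high)
```

Comparing F against the F distribution with (k − 1, n − k) degrees of freedom (via a table or statistical software) gives the p value; and even a significant F still needs a post hoc test to say *which* temperatures differ.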
MANOVA
The MANOVA is a multivariate form of ANOVA. Isn’t that what a two-way ANOVA is, you might ask? Not exactly! A MANOVA assesses the effect of up to two independent variables on two or more dependent variables. For example, a one-way MANOVA could test the effect of temperature on total crop production and the production window. A two-way MANOVA could measure the effect of temperature and humidity on total crop production and the production window.
Factorial ANOVA
This type of ANOVA assesses multiple independent variables on a single dependent variable. I know what you’re thinking – this really sounds like a two-way ANOVA. And this time, it is! Factorial ANOVAs include two-way ANOVAs but can include many more than two independent variables. For example, you could test the impact of temperature, humidity, sunlight exposure time, and watering volume on total crop production (though you would not be able to tell if the independent variables worked synergistically, antagonistically, additively, etc.).
ANCOVA and MANCOVA
All variants of ANOVA discussed so far assume that the independent variables are independent of each other. That is, multiple independent variables are assumed to have no effect on one another, to have discrete effects on the dependent variable, and to be the only things that matter. However, this is often an oversimplification. If your independent variables are affecting each other – or if you think they may be – then an ANCOVA or a MANCOVA can measure this effect, known as covariance. This is the “C” that differentiates these tests from ANOVAs/MANOVAs. Quantifying and controlling for any covariance between manipulated independent variables allows for a more accurate measurement of the effect on an outcome. These tests can also assess how variables you are not directly manipulating (age, demographic, diet, etc., in a cancer drug clinical trial, for example) affect an outcome.

Fig. 3: Question flowchart for choosing an ANOVA/MANOVA statistical test 
Post hoc tests
Knowing that significance exists somewhere amongst multiple groups is rarely enough information. A post hoc test, performed after your initial analysis, can tell you specifically which groups are statistically different within your ANOVA/ANCOVA analysis. If you have performed a MANOVA/MANCOVA and want this information, multiple ANOVAs/ANCOVAs must be performed first, and then post hoc tests can be run on those results.
All sample comparisons
Do you really need to know every pairwise comparison within your population? Sometimes this can’t be avoided, but if it can be, you can reduce the familywise error rate by only performing the comparisons you really care about. Familywise error is the probability of at least one false positive occurring when many comparisons are performed. For example, if your alpha is set at 0.05, then each individual test carries a 5% chance of a false positive. If you have 5 samples and perform every pairwise comparison (10 in total), your likelihood of a false positive somewhere is much higher than if you only look at 2 or 3 pairwise comparisons. Related to familywise error is the false discovery rate, which controls the proportion of false positives allowed relative to all positive results. Both metrics can be used to reduce false positives within a test.
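Under the simplifying assumption that the comparisons are independent, the familywise error rate for m comparisons each run at level alpha is 1 − (1 − alpha)^m, which makes the cost of extra comparisons easy to see:

```python
from math import comb

def familywise_error(alpha, m):
    """Chance of at least one false positive across m comparisons,
    assuming each is an independent test at level alpha."""
    return 1 - (1 - alpha) ** m

m_all = comb(5, 2)                        # every pairwise comparison among 5 groups
fwer_all = familywise_error(0.05, m_all)  # all 10 comparisons
fwer_few = familywise_error(0.05, 3)      # only the 3 you care about
```

With all 10 comparisons, the chance of at least one spurious hit climbs to roughly 40%, versus roughly 14% for just three.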
The Bonferroni correction and the Benjamini-Hochberg procedure are designed to reduce false positive rates and control for large numbers of comparisons. Bonferroni does this by establishing a significance threshold: a p value is significant only if it is less than or equal to the alpha value divided by the number of hypotheses tested. Benjamini-Hochberg uses a complementary approach that identifies a p value cutoff. This method entails arranging all calculated p values from smallest to largest and ranking them (1 being the smallest value). The cutoff for significance is the largest p value that is less than or equal to its rank divided by the total number of comparisons, multiplied by the alpha value. There are drawbacks to these procedures, as they can be quite stringent when many samples are assessed. However, if many comparisons need to be performed, Bonferroni and Benjamini-Hochberg will greatly reduce the familywise error rate and the false discovery rate, respectively.
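Both procedures are simple enough to sketch directly; the p values and the alpha of 0.05 below are hypothetical.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Bonferroni: each p value must clear the threshold alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg_significant(p_values, alpha=0.05):
    """Benjamini-Hochberg: sort the p values, find the largest p(k) with
    p(k) <= (k / m) * alpha, and call everything at or below it significant."""
    m = len(p_values)
    cutoff = 0.0
    for k, p in enumerate(sorted(p_values), start=1):
        if p <= (k / m) * alpha:
            cutoff = p
    return [p <= cutoff for p in p_values]

p_vals = [0.001, 0.008, 0.020, 0.041, 0.20]  # hypothetical comparison results
bonf = bonferroni_significant(p_vals)
bh = benjamini_hochberg_significant(p_vals)
```

On these numbers Bonferroni (threshold 0.05/5 = 0.01) keeps only the first two comparisons, while Benjamini-Hochberg also admits the third – illustrating that FDR control is less stringent than familywise control.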
Many other tests exist to identify significant pairwise combinations – Tukey’s test, the Newman-Keuls method, Duncan’s multiple range test, and several others. Each test has a different likelihood of returning false positives/negatives, but they all generally yield a similar result and are appropriate when all pairwise comparisons are needed and the number of samples is modest (~5).
Tukey’s test – Most conservative test. Higher false negatives, lower false positives.
Newman-Keuls method – More powerful. Moderate false negatives, moderate false positives.
Duncan’s multiple range test – Most powerful. Low false negatives, high false positives.
Control vs. all samples
Oftentimes there is a control group and multiple experimental groups, and the only comparisons that matter are control vs. experimental. In this case, save yourself from familywise error and perform only the relevant comparisons using Dunnett’s test, a test that compares all groups to a single group. Dunnett’s is more powerful and has a lower error rate than the multiple comparison tests described above.
Violated assumptions and statistics
Statistical power, the ability to correctly identify true positives within a test, depends on the input data ‘following the rules’ of that test. All statistical tests have assumptions, many of which have been described in this blog. If your data does not meet the assumptions for the tests described above (normal distribution, equal variance, and independent variables), it’s best to find a test that does. If you perform a statistical test when assumptions are broken, you run the risk of higher error, both false positives and negatives (or, as the mathematicians say, Type I and Type II errors), and can lose a significant amount of power.
Not sure if your data meets the assumptions of parametric tests? If you aren’t sure whether your variance is equal or unequal, an F-test of equality of variance can determine this for you. F-tests measure the variance between samples and determine if there is a statistically significant difference in that variance. If you want to know what type of distribution you are working with, graphing the data as a histogram is sometimes sufficient to determine normality (or lack thereof). Alternatively, a quantile-quantile (Q-Q) plot can determine if your data deviates from normality. These plots sort and plot your data in ascending order against quantiles derived from a theoretical normal distribution. If your data is normally distributed, it will form (more or less) a straight line.
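The F statistic for comparing two variances is just the ratio of the sample variances (conventionally larger over smaller, so F ≥ 1). A minimal sketch with hypothetical samples; turning the ratio into a p value requires the F distribution, via a table or statistical software.

```python
from statistics import variance

def variance_ratio_f(x, y):
    """F statistic for comparing two sample variances: larger variance
    over smaller. Look it up against the F distribution with
    df = n - 1 for each sample to get a p value."""
    vx, vy = variance(x), variance(y)
    return max(vx, vy) / min(vx, vy)

tight = [9.8, 10.1, 10.0, 9.9, 10.2]   # tightly clustered measurements
loose = [8.0, 12.5, 9.0, 11.8, 9.7]    # widely spread measurements
f = variance_ratio_f(tight, loose)
```

An F close to 1 is consistent with equal variances (a parametric-test assumption); a large F, as here, is a warning sign.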
If your data does not fit parametric assumptions, then a nonparametric test is a better option. Stay tuned for our upcoming blog on nonparametric statistical tests!
Adding it all up
There are many, many statistical tests beyond the ones detailed in this article. These are just some of the most common parametric comparison tests, along with some general statistics knowledge to have in your tool belt. To analyze your hard-earned data in the most effective and accurate way possible, it’s important to select the right test for the job (and, uh, gentle reminder: keep your friendly neighborhood biostatistician happy by making your statistical analysis plan when you design your experiment, instead of after you’ve already run it). You don’t need to be a mathematician to select the right statistical test – just do your homework, and happy testing!