What statistical test would be used with interval or ratio data with multiple dependent variables?

Choosing a statistical test can be a daunting task for those starting out in the analysis of experiments. This chapter provides a table of tests and models covered in this book, as well as some general advice for approaching the analysis of your data.

Plan your experimental design before you collect data

It is important to have an experimental design planned out before you start collecting data, and to have some idea of how you plan to analyze the data. One of the most common mistakes people make in doing research is collecting a bunch of data without having thought through what questions they are trying to answer, what specific hypotheses they want to test, and what statistical tests they can use to test these hypotheses.

What is the hypothesis?

The most important consideration in choosing a statistical test is determining what hypothesis you want to test. Or, more generally, what question you are trying to answer.

Often people have a notion about the purpose of the research they are conducting, but haven’t formulated a specific hypothesis. It is possible to begin with exploratory data analysis, to see what interesting secrets the data have to tell. But ultimately, choosing a statistical test relies on having in mind a specific hypothesis to test.

For example, we may know that our goal is to determine if one curriculum works better than another. But then we must be more specific in our hypothesis. Perhaps we wish to compare the mean of scores that students get on an exam across the different curricula. Then a specific null hypothesis is: “There is no difference among the mean student scores across curricula.”

In this example, we identified the dependent variable as Student scores, and the independent variable as Curriculum.

Of course, we might make things more complicated. For example, if the curricula were used in different classrooms, we might want to include Classroom as an independent blocking variable.
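As a rough sketch of how this hypothesis could be expressed in R, the following uses simulated scores. The data frame, the variable names (Curriculum, Classroom, Student, Score), and the simulated values are all assumptions made for illustration; they are not data from this book.

```r
### Hypothetical data: exam scores under three curricula in four classrooms

set.seed(42)

Data = expand.grid(Curriculum = c("A", "B", "C"),
                   Classroom  = c("C1", "C2", "C3", "C4"),
                   Student    = 1:5)

Data$Score = round(rnorm(nrow(Data), mean = 75, sd = 8))

### Null hypothesis: mean Score is the same across levels of Curriculum,
###   with Classroom included as a blocking variable

model = lm(Score ~ Curriculum + Classroom, data = Data)

Table = anova(model)

Table
```

The row for Curriculum in the resulting table addresses the null hypothesis stated above. Because the scores here are simulated without any curriculum effect, the result itself has no meaning beyond showing the form of the analysis.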

What number and type of variables do you have?

To a large extent, the appropriate statistical test for your data will depend upon the number and types of variables you wish to include in the analysis.

Consider the type of dependent variable you wish to include.

• If it is of interval/ratio type, you can consider parametric tests or nonparametric tests.

• However, if it is an ordinal variable, you would look toward ordinal regression models, permutation tests, nonparametric tests, or tests for ordinal tables.

• Nominal variables arranged in contingency tables can be analyzed with chi-square and similar tests. Nominal dependent variables can be related to independent variables with logistic regression.

• Count data dependent variables can be related to independent variables with Poisson regression and related models. If the dependent variable is a proportion or percentage, beta regression might be appropriate.

The number and type of independent variables will also be taken into account, as will whether there are paired observations or random blocking variables.
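The guidance above on types of dependent variable can be written down as a simple lookup. This is only a sketch: the function name suggestTests and the short labels used for each type are hypothetical, and the function merely restates the list above rather than choosing a test for you.

```r
### Hypothetical helper: families of tests to consider for a type of DV

suggestTests = function(dvType) {
  switch(dvType,
         "interval/ratio" = c("parametric tests", "nonparametric tests"),
         "ordinal"        = c("ordinal regression", "permutation tests",
                              "nonparametric tests", "tests for ordinal tables"),
         "nominal"        = c("chi-square and similar tests",
                              "logistic regression"),
         "count"          = c("Poisson regression and related models"),
         "proportion"     = c("beta regression"),
         stop("Unknown type of dependent variable: ", dvType))
}

suggestTests("ordinal")

suggestTests("count")
```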

The table below lists the tests in this book according to their number and types of variables.

Note that each test has its own set of assumptions for appropriate data, which should be assessed before proceeding with the analysis.

Also note that the tests in this book cover cases with a single dependent variable only. There are other statistical tests, included under the umbrella of multivariate statistics, that can analyze multiple dependent variables simultaneously. These include multivariate analysis of variance (MANOVA), canonical correlation, and discriminant function analysis.

The “References” and “Optional readings” sections of this chapter include a few other guides to choosing statistical tests.
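Although multivariate tests are outside the scope of this book, a minimal sketch of a MANOVA may help show what “multiple dependent variables” looks like in practice. It uses the manova function in base R and the iris data set included with R; the choice of variables is an assumption made only for illustration.

```r
### MANOVA: two interval/ratio dependent variables,
###   one nominal independent variable

fit = manova(cbind(Sepal.Length, Petal.Length) ~ Species,
             data = iris)

Result = summary(fit, test = "Pillai")

Result

### Follow-up: a separate univariate ANOVA for each dependent variable

summary.aov(fit)
```

As with the other tests in this chapter, MANOVA has its own assumptions, which are not assessed in this sketch.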

Test	DV type, or variable type when there is no DV	DV observations (independent or paired)	IV type	Number of IVs	Levels in IV	Test type
One-sample Wilcoxon	Ordinal or interval/ratio	Independent	Single default value	N/A	N/A	Nonparametric
Sign test for one-sample	Ordinal or interval/ratio	Independent	Single default value	N/A	N/A	Nonparametric
Two-sample Mann–Whitney	Ordinal or interval/ratio	Independent	Nominal	1	2	Nonparametric
Mood’s median test for two-sample	Ordinal or interval/ratio	Independent	Nominal	1	2	Nonparametric
Two-sample paired rank-sum	Ordinal or interval/ratio	Paired	Nominal	1, or 2 when one is blocking	2	Nonparametric
Sign test for two-sample paired	Ordinal or interval/ratio	Paired	Nominal	1, or 2 when one is blocking	2	Nonparametric
Kruskal–Wallis	Ordinal or interval/ratio	Independent	Nominal	1	2 or more	Nonparametric
Mood’s median	Ordinal or interval/ratio	Independent	Nominal	1	2 or more	Nonparametric
Friedman	Ordinal or interval/ratio	Independent blocked, or paired	Nominal	2 when one is blocking, in unreplicated complete block design	2 or more	Nonparametric
Quade	Ordinal or interval/ratio	Independent blocked, or paired	Nominal	2 when one is blocking, in unreplicated complete block design	2 or more	Nonparametric
One-way Permutation Test of Independence	Ordinal or interval/ratio	Independent	Nominal	1	2 or more	Permutation
One-way Permutation Test of Symmetry	Ordinal or interval/ratio	Independent blocked, or paired	Nominal	2 when one is blocking	2 or more	Permutation
Two-sample CLM	Ordinal	Independent	Nominal	1	2	Ordinal regression
Two-sample paired CLMM	Ordinal	Paired	Nominal	2 when one is blocking	2	Ordinal regression
One-way ordinal Regression CLM	Ordinal	Independent	Nominal	1	2 or more	Ordinal regression
One-way repeated ordinal regression CLMM	Ordinal	Independent	Nominal	2 when one is blocking	2 or more	Ordinal regression
Two-way ordinal regression CLM	Ordinal	Independent	Nominal	2	2 or more	Ordinal regression
Two-way repeated ordinal regression CLMM	Ordinal	Independent	Nominal	3 when one is blocking	2 or more	Ordinal regression
Goodness-of-fit tests for nominal variables • binomial test • multinomial test • G-test goodness-of-fit • Chi-square test goodness-of-fit	Nominal	Independent	Expected counts	N/A	Overall: vector of counts and expected proportions	Nominal
Association tests for nominal variables • Fisher exact test of association • G-test of association • Chi-square test of association	Nominal	Independent	Nominal	N/A	Overall: 2-way contingency table	Nominal
Tests for paired nominal data • McNemar • McNemar–Bowker	Nominal	Paired	Nominal	N/A	Overall: 2-way marginal contingency table	Nominal
Cochran–Mantel–Haenszel	Nominal	Independent	Nominal	N/A	Overall: 3-way contingency table	Nominal
Cochran’s Q	Nominal (2 levels only)	Paired	Nominal	2 when one is blocking	2 or more	Nominal
Linear-by-linear	Ordered nominal (ordinal)	Independent	Ordered nominal (ordinal)	N/A	Overall: 2-way or 3-way contingency table	Nominal
Cochran–Armitage (extended)	Ordered nominal (ordinal)	Independent	Nominal	N/A	Overall: 2-way or 3-way contingency table	Nominal
Log-linear model (multiway frequency analysis)	Nominal	Independent	Nominal	N/A	Overall: contingency table with 2 or more dimensions	Generalized linear model
Logistic regression (standard)	Nominal with 2 levels	Independent	Interval/ratio or nominal	1 or more	2 or more	Generalized linear model
Multinomial logistic regression	Nominal with 2 or more levels	Independent	Interval/ratio or nominal	1 or more	2 or more	Generalized linear model
Mixed-effects logistic regression	Nominal with 2 levels	Independent or paired	Interval/ratio or nominal	1 or more when one is blocking or random	2 or more	Generalized linear model
One-sample t-test	Interval/ratio	Independent	Single default value	N/A	N/A	Parametric
Two-sample t-test	Interval/ratio	Independent	Nominal	1	2	Parametric
Paired t-test	Interval/ratio	Paired	Nominal	1, or 2 when one is blocking	2	Parametric
One-way ANOVA	Interval/ratio	Independent	Nominal	1	2 or more	Parametric
One-way ANOVA with blocks	Interval/ratio	Independent	Nominal	2 when one is blocking	2 or more	Parametric
One-way ANOVA with random blocks	Interval/ratio	Independent	Nominal	2 when one is blocking	2 or more	Parametric
Two-way ANOVA	Interval/ratio	Independent	Nominal	2	2 or more	Parametric
Repeated measures ANOVA	Interval/ratio	Paired across time	Nominal	2 or more when one is time effect	2 or more	Parametric
Multiple correlation	Interval/ratio or ordinal, depending on type selected	Independent	Interval/ratio or ordinal, depending on type selected	1 or more	Overall: multiple vectors of interval/ratio or ordinal data	Parametric or nonparametric depending on type selected
Pearson correlation	Interval/ratio	Independent	Interval/ratio	1	Overall: two vectors of interval/ratio data	Parametric
Kendall correlation	Interval/ratio or ordinal	Independent	Interval/ratio or ordinal	1	Overall: two vectors of interval/ratio or ordinal data	Nonparametric
Spearman correlation	Interval/ratio or ordinal	Independent	Interval/ratio or ordinal	1	Overall: two vectors of interval/ratio or ordinal data	Nonparametric
Linear regression	Interval/ratio	Independent	Interval/ratio	1	N/A	Parametric
Polynomial regression	Interval/ratio	Independent	Interval/ratio	2 or more that are polynomial terms	N/A	Parametric
Nonlinear regression and curvilinear regression	Interval/ratio	Independent	Interval/ratio	1	N/A	Parametric
Multiple regression	Interval/ratio	Independent	Interval/ratio	2 or more	N/A	Parametric
Robust linear regression	Interval/ratio	Independent	Interval/ratio	1	N/A	Robust parametric
Kendall–Theil regression	Interval/ratio	Independent	Interval/ratio	1	N/A	Nonparametric
Linear plateau and quadratic plateau models	Interval/ratio	Independent	Interval/ratio	1	N/A	Parametric
Cate–Nelson analysis	Interval/ratio	Independent	Interval/ratio	1	N/A	Mostly nonparametric
Poisson and related regression • Hermite regression • Poisson regression • Negative binomial regression • Zero-inflated regression	Count	Independent	Interval/ratio or nominal	1 or more	2 or more	Generalized linear model
Beta regression	Proportion or percentage	Independent	Interval/ratio or nominal	1 or more	2 or more	Generalized linear model

Optional discussion: Sometimes it’s all about the hypothesis

Tests that have analogous purposes, like comparing a measurement variable across two groups, may test very different hypotheses.

For example, imagine you are investigating the income of two towns. Let’s say the income of Town A is normally distributed about a mean and median of $48,000. The income of Town B has a similar median, but has right skew, with some observations close to $1 million.

What test or statistic would you use to compare the income of these two towns?

You might be tempted to compare the means of the two towns with a t-test. In this case, however, means may not be the best statistic for skewed data, and this data may not meet the assumptions of the t-test.

You might be interested in comparing the median of the income of the two towns, for example with Mood’s median test. This might make sense for some regulatory purpose that is concerned with medians.

On the other hand, looking for a systematic difference in income between the two towns may make more sense. The higher incomes in Town B may give the town a different character, such as some streets with larger homes or upscale stores. For this, you might use the Mann–Whitney test.

Another approach is to use a permutation test.
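To make the idea of a permutation test concrete, here is a hand-rolled sketch in base R that compares the mean income of two towns. The income values, the helper function meanDiff, and the number of permutations are all hypothetical. In practice a package function would normally be used instead.

```r
### Hypothetical incomes, in thousands of dollars

TownA = c(41, 44, 47, 48, 49, 52, 55, 58)
TownB = c(30, 33, 35, 46, 48, 90, 250, 880)

Income = c(TownA, TownB)
Town   = rep(c("A", "B"), times = c(length(TownA), length(TownB)))

### Test statistic: difference in mean income, Town B minus Town A

meanDiff = function(y, g) { mean(y[g == "B"]) - mean(y[g == "A"]) }

Observed = meanDiff(Income, Town)

### Shuffle the town labels many times to approximate the distribution
###   of the statistic under the null hypothesis

set.seed(1234)

Permuted = replicate(5000, meanDiff(Income, sample(Town)))

### Two-sided p-value, counting the observed statistic itself

p.value = (sum(abs(Permuted) >= abs(Observed)) + 1) / (length(Permuted) + 1)

Observed

p.value
```

The p-value is the proportion of shuffled data sets in which the difference in means is at least as extreme as the observed difference.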

Or you might compare the overall distributions of incomes for the two towns using the Kolmogorov–Smirnov test.

Finally, you might want to compare the 75th percentile of income for the two towns. This could be done using quantile regression.

Example

The following code compares some of these results for a hypothetical data set of income in two towns.

Note that the assumptions and pitfalls of these tests are not discussed here, but should be considered in real situations.

### load required packages

if(!require(FSA)){install.packages("FSA")}
if(!require(psych)){install.packages("psych")}
if(!require(RVAideMemoire)){install.packages("RVAideMemoire")}
if(!require(coin)){install.packages("coin")}
if(!require(quantreg)){install.packages("quantreg")}

### Read the data frame

TwoTowns = read.table("//rcompanion.org/documents/TwoTowns.csv",
                      header=TRUE, sep=",")

### Check the data frame

library(psych)

headTail(TwoTowns)

summary(TwoTowns)

### Summarize the data

library(FSA)

Summarize(Income ~ Town,
          data=TwoTowns,
          digits=3)

Town n mean sd min Q1 median Q3 max
1 Town.A 101 48146.43 10851.67 23560 40970 48010 56420 77770
2 Town.B 101 115275.22 163878.17 29050 34140 47220 108200 880000

boxplot(Income ~ Town,
        data=TwoTowns)

### Mood’s median test

library(RVAideMemoire)

mood.medtest(Income ~ Town,
             data = TwoTowns)

Mood's median test

X-squared = 0, df = 1, p-value = 1

### Mann–Whitney test

wilcox.test(Income ~ Town,
            data=TwoTowns)

Wilcoxon rank sum test with continuity correction

W = 4672, p-value = 0.3029

alternative hypothesis: true location shift is not equal to 0

### Permutation test

library(coin)

independence_test(Income ~ Town,
                  data = TwoTowns)

Asymptotic General Independence Test

Z = -3.9545, p-value = 7.669e-05

### Kolmogorov–Smirnov test

library(FSA)

ksTest(Income ~ Town,
       data = TwoTowns)

Two-sample Kolmogorov-Smirnov test

D = 0.35644, p-value = 5.349e-06

### Quantile regression considering the 75th percentile

library(quantreg)

model.q = rq(Income ~ Town,
             data = TwoTowns,
             tau = 0.75)

model.null = rq(Income ~ 1,
                data = TwoTowns,
                tau = 0.75)

anova(model.q, model.null)

Quantile Regression Analysis of Deviance Table

Df Resid Df F value Pr(>F)
1 1 200 5.7342 0.01756 *

References

[IDRE] Institute for Digital Research and Education. 2015. What statistical analysis should I use? UCLA. stats.idre.ucla.edu/other/mult-pkg/whatstat/.

“Choosing a statistical test” in McDonald, J.H. 2014. Handbook of Biological Statistics. www.biostathandbook.com/testchoice.html.

Optional readings

[Video] “Choosing which statistical test to use” from Statistics Learning Center [Dr. Nic]. 2014. www.youtube.com/watch?v=rulIUAN0U3w.

What statistical test would be used with interval or ratio data?

To determine whether changes in one interval/ratio variable are associated with changes in an interval/ratio dependent variable, a Pearson’s r correlation test would be used. To determine whether one independent variable (linear regression) or more than one (multiple regression) has a significant influence on a dependent variable, a regression test would be used.
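As a brief sketch of the correlation and regression cases just described, the following uses the mtcars data set included with R. The choice of mpg as the dependent variable and wt as the independent variable is only for illustration.

```r
### Pearson correlation between two interval/ratio variables

Correlation = cor.test(~ mpg + wt,
                       data = mtcars,
                       method = "pearson")

Correlation

### Linear regression with one interval/ratio independent variable

model = lm(mpg ~ wt,
           data = mtcars)

summary(model)
```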

What type of test has multiple dependent variables?

Multivariate ANOVA (MANOVA) extends the capabilities of analysis of variance (ANOVA) by assessing multiple dependent variables simultaneously. ANOVA statistically tests the differences between three or more group means.

What statistical test do you use for ratio data?

For ratio data that meet the assumptions of parametric tests, such as normally distributed residuals, parametric tests are usually a good choice for testing hypotheses. When their assumptions are met, parametric tests are generally more powerful than nonparametric tests.

What test is used in situations where the dependent variable is at the interval or ratio level and believed to be normally distributed?

In ANOVA, the dependent variable must be a continuous (interval or ratio) level of measurement. The independent variables in ANOVA must be categorical (nominal or ordinal) variables. Like the t-test, ANOVA is a parametric test and has some assumptions. In particular, ANOVA assumes that the residuals from the model are normally distributed.
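One way to look at the normality assumption for an ANOVA is to examine the residuals of the fitted model rather than the raw data. The sketch below uses the PlantGrowth data set included with R, chosen only for illustration, along with a Shapiro–Wilk test. Visual inspection of the plots is often more informative than the test alone.

```r
### One-way ANOVA on the PlantGrowth data set

model = aov(weight ~ group,
            data = PlantGrowth)

summary(model)

### Examine the residuals from the model

Residuals = residuals(model)

hist(Residuals)

qqnorm(Residuals)
qqline(Residuals)

shapiro.test(Residuals)
```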