What is the degree of correlation among independent variables in a regression model called?

We don’t have your requested question, but here is a suggested video that might help.

What is the coefficient of determination for two variables that have perfect positive linear correlation or perfect negative linear correlation? Interpret your answer.

Video Transcript

If we have two variables that are perfectly linearly correlated, either positively or negatively, then the coefficient of determination will always be one. Let me say that again: for two variables with perfect positive or perfect negative linear correlation, the coefficient of determination, R squared, will always be one, meaning that either of the two variables explains 100 percent of the variation in the other variable.

Correlated Chronometric and Psychometric Variables

Arthur R. Jensen, in Clocking the Mind, 2006

Multiple Correlation

A multiple correlation coefficient (R) yields the maximum degree of linear relationship that can be obtained between two or more independent variables and a single dependent variable. (R is never signed as + or −. R2 represents the proportion of the total variance in the dependent variable that can be accounted for by the independent variables.) The independent variables are each optimally weighted such that their composite will have the largest possible correlation with the dependent variable. Because the determination of these weights (beta coefficients) is, like any statistic, always affected (the R is always inflated) by sampling error, the multiple R is properly “shrunken” so as to correct for the bias owing to sampling error. Shrinkage of R is based on the number of independent variables and the sample size. When the number of independent variables is small (<10) and the sample size is large (>100), the shrinkage procedure has a negligible effect. Also, the correlations among the independent variables that go into the calculation of R can be corrected for attenuation (measurement error), which increases R. Furthermore, R can be corrected for restriction of the range of ability in the particular sample when its variance on the variables entering into R is significantly different from the population variance, assuming it is adequately estimated. Correction of correlations for restriction of range is frequently used in studies based on students in selective colleges, because they typically represent only the upper half of the IQ distribution in the general population.
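
The effect of shrinkage is easy to see in a short sketch. The snippet below uses the common Wherry-type adjusted R² formula, 1 − (1 − R²)(n − 1)/(n − k − 1); this is one standard shrinkage correction and not necessarily the exact procedure used in the studies cited here.

```python
def shrunken_r2(r2, n, k):
    """Wherry-type adjusted (shrunken) R^2.

    r2 : unadjusted squared multiple correlation
    n  : sample size
    k  : number of independent variables
    """
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return max(adj, 0.0)  # negative adjusted values are usually reported as 0

# With few predictors and a large sample the shrinkage is negligible,
# as the text notes (k < 10, n > 100):
print(shrunken_r2(0.25, n=500, k=5))   # barely below 0.25
print(shrunken_r2(0.25, n=30, k=5))    # noticeably smaller
```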

Two examples of the multiple R between several RT variables and a single “IQ” score are given below. To ensure a sharp distinction between RTs based on very simple ECTs and timed scores on conventional PTs, the following examples were selected to exclude any ECTs on which the mean RTs are greater than 1 s for normal adults or 2 s for young children. Obviously, not much cogitation can occur in so little time.

The simplest example is the Hick paradigm. Jensen (1987a) obtained values of R in large samples, where the independent variables are various parameters of RT and MT derived from Hick data, viz. mean RT, RTSD, the intercept and slope of the regression of RT on bits, and mean MT.

Without corrections for attenuation and restriction of range in the samples, R=.35; with both of these corrections, R=.50. This is the best estimate we have of the population value of the largest correlation that can be obtained between a combination of variables obtained from the Hick parameters and IQ as measured by one or another single test, most often the Raven matrices.

Vernon (1988) analyzed four independent studies totaling 702 subjects. Each study used a wide variety of six to eight ECTs that were generally more complex and far more heterogeneous in their processing demands than the much simpler and more homogeneous Hick task. The average value of the multiple R (shrunken but not corrected for restriction of range) relating RT and IQ was .61, and for RTSD and IQ R was .60. For RT and RTSD combined, R was .66.

URL: https://www.sciencedirect.com/science/article/pii/B9780080449395500100

Applying the Tools to Multivariate Data

J. Douglas Carroll, Paul E. Green, in Mathematical Tools for Applied Multivariate Analysis, 1997

6.2.2 Strength of Overall Relationship and Statistical Significance

The squared multiple correlation coefficient is R2, and this measures the portion of variance in Y (as measured about its mean) that is accounted for by variation in X1 and X2. As mentioned in Chapter 1, the formula is

R^2 = 1 - \frac{\sum_{i=1}^{12} e_i^2}{\sum_{i=1}^{12} (Y_i - \bar{Y})^2} = 1 - \frac{34.099}{354.25} = 0.904

The statistical significance of R, the positive square root of R2, is tested via the analysis of variance subtable of Table 6.2 by means of the F ratio:

F=42.25

which, with 2 and 9 degrees of freedom, is highly significant at the α = 0.01 level. Thus, as described in Chapter 1, the equivalent null hypotheses

R_p = 0 \quad \text{and} \quad \beta_1 = \beta_2 = 0

are rejected at the 0.01 level, and we conclude that the multiple correlation is significant.

Up to this point, then, we have established the estimating equation and measured, via R2, the strength of the overall relationship between Y versus X1 and X2.
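
A minimal sketch of these two computations, assuming only the quantities quoted above (SSE = 34.099, total sum of squares 354.25, n = 12 observations, k = 2 predictors):

```python
from scipy import stats

sse, sst = 34.099, 354.25      # error and total sums of squares from the text
n, k = 12, 2                   # observations and predictors

r2 = 1 - sse / sst                           # coefficient of determination
f = (r2 / k) / ((1 - r2) / (n - k - 1))      # F ratio with k and n-k-1 df

print(f"R^2 = {r2:.3f}")                     # 0.904
print(f"F({k},{n-k-1}) = {f:.2f}")           # about 42.25
print("critical F at alpha = 0.01:",
      round(stats.f.ppf(0.99, k, n - k - 1), 2))   # about 8.02
```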

If we look at the equation again

\hat{Y}_i = -2.263 + 1.550\,X_{i1} - 0.239\,X_{i2}

we see that the intercept is negative. In terms of the current problem, a negative 2.263 days of absenteeism is impossible, illustrating, of course, the possible meaninglessness of extrapolation beyond the range of the predictor variables used in developing the parameter values.

The partial regression coefficient for X1 seems reasonable; it says that predicted absenteeism increases 1.55 days per unit increase in attitude rating. This is in accord with the scatter plot (Fig. 1.2) that shows the association of Y with X1 alone.

The partial regression coefficient for X2, while small in absolute value, is negative, even though the scatter plot of Y on X2 alone (Fig. 1.2) shows a positive relationship. The key to this seeming contradiction lies in the strong positive relationship between the predictors X1 and X2 (also noted in the scatter plot of Fig. 1.2). Indeed, the correlation between X1 and X2 is 0.95. The upshot of all of this is that once X1 is in the equation, X2 is so redundant with X1 that its inclusion leads to a negative partial regression coefficient that effectively is zero (given its large standard error).

URL: https://www.sciencedirect.com/science/article/pii/B978012160954250007X

Correlation

Milan Meloun, Jiří Militký, in Statistical Data Analysis, 2011

Problem 7.13 Significance of the relationship between the nitrogen content in soil and in corn

In Problem 7.6 the multiple correlation coefficient expressing the relationship between the nitrogen content in corn and a linear combination of organically bound nitrogen and inorganically bound nitrogen in soil is equal to \hat{R}_{1(2,3)} = 0.6945. Examine the null hypothesis H0: R_{1(2,3)} = 0.

Solution: According to Eq. (7.58), the test criterion F_R = \frac{(18-3)\times 0.6945^2}{(3-1)(1-0.6945^2)} = 6.988 is higher than the quantile of the Fisher–Snedecor distribution F_{0.95}(2, 15) = 3.682, and therefore the null hypothesis H0: R_{1(2,3)} = 0 is rejected at significance level α = 0.05.

Conclusion: The content of nitrogen in soil significantly affects the content of nitrogen in corn. Inorganically bound nitrogen contributes predominantly.
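
The arithmetic of this test is easy to reproduce. Eq. (7.58) itself is not shown in this excerpt, but the form implied by the worked numbers is F_R = (n − m)R̂²/[(m − 1)(1 − R̂²)], with m the total number of variables; a sketch under that assumption:

```python
from scipy import stats

r_hat, n, m = 0.6945, 18, 3    # sample multiple correlation, sample size, number of variables

f_r = (n - m) * r_hat**2 / ((m - 1) * (1 - r_hat**2))   # test criterion
f_crit = stats.f.ppf(0.95, m - 1, n - m)                # F_{0.95}(2, 15)

print(round(f_r, 3))      # about 6.988
print(round(f_crit, 3))   # about 3.682
print("reject H0" if f_r > f_crit else "do not reject H0")
```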

Case R_m > 0: To work with the distribution of the sample multiple correlation coefficient \hat{R}_m^2, either a complicated exact expression or a convenient approximation may be used. Gurland [6] has proposed a relatively precise approximation

(7.62) \frac{\hat{R}_m^2}{1 - \hat{R}_m^2} \approx \frac{(n-1)K + m - 1}{n - m}\, F_{r,\,n-m}

where the quantity F_{r,n−m} has the F-distribution with r and (n − m) degrees of freedom. Then

(7.62a) r = \frac{(n-1)K + m - 1}{Z}

where

(7.62b) Z = \frac{(n-1)K(K+2) + m - 1}{(n-1)K + m - 1}

(7.62c) and K = \frac{R_m^2}{1 - R_m^2}

For large sample sizes, the square of the multiple correlation coefficient approximately follows a normal distribution with mean value E(\hat{R}_m^2) = R_m^2 and variance D(\hat{R}_m^2) = \frac{4 R_m^2 (1 - R_m^2)^2}{n-1}. The random variable

(7.63) u_R = \frac{\sqrt{n-1}\,\bigl(\hat{R}_m^2 - R_m^2\bigr)}{2 R_m \bigl(1 - R_m^2\bigr)}

has the normalized normal distribution. Also, the Fisher and other transformations for speeding up convergence to normality can be used.
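
As an illustration, the normal approximation (7.63) can be used to test whether the population multiple correlation equals a specified nonzero value. A minimal sketch, assuming the form of (7.63) given above and purely hypothetical numbers:

```python
from scipy import stats

def u_r_test(r2_hat, r_m, n):
    """Large-sample test of H0: R_m equals a specified value r_m, via Eq. (7.63).

    r2_hat : observed squared multiple correlation
    r_m    : hypothesized population multiple correlation (0 < r_m < 1)
    n      : sample size
    """
    u = (n - 1) ** 0.5 * (r2_hat - r_m**2) / (2 * r_m * (1 - r_m**2))
    p = 2 * (1 - stats.norm.cdf(abs(u)))   # two-sided p-value from N(0, 1)
    return u, p

# Hypothetical example: observed R^2 = 0.55, hypothesized R_m = 0.80, n = 120
u, p = u_r_test(r2_hat=0.55, r_m=0.80, n=120)
print(round(u, 3), round(p, 4))
```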

For the mean value of the squared multiple correlation coefficient

(7.64) E(\hat{R}_m^2) = R_m^2 + \frac{m-1}{n-1}\bigl(1 - R_m^2\bigr) - \frac{2(n-m)}{n^2 - 1}\, R_m^2 \bigl(1 - R_m^2\bigr) + \cdots

The variance is given by

(7.65) D(\hat{R}_m^2) = \frac{4 R_m^2 (1 - R_m^2)^2 (n-m)^2}{(n^2 - 1)(n + 3)} \approx \frac{4 R_m^2 (1 - R_m^2)^2}{n}

For smaller sample sizes, the estimate \hat{R}_m^2 is overestimated. The corrected multiple correlation coefficient is expressed by

(7.66) \hat{R}_m^{*2} = \hat{R}_m^2 - \frac{m-3}{n-m}\bigl(1 - \hat{R}_m^2\bigr) - \frac{2(n-3)}{(n-m)^2}\bigl(1 - \hat{R}_m^2\bigr)^2 + \cdots

It can be seen that \hat{R}_m^{*2} \le \hat{R}_m^2. For small values of \hat{R}_m^2, the corrected \hat{R}_m^{*2} can even be negative, and therefore it should be restricted to the interval [0, 1].

URL: https://www.sciencedirect.com/science/article/pii/B9780857091093500078

Reduction of Dimensionality

Zhidong Bai, P.R. Krishnaiah, in Encyclopedia of Physical Science and Technology (Third Edition), 2003

IX Tests for Rank of Canonical Correlation Matrix

It is known that the multiple correlation coefficient is the maximum correlation between a variable and linear combinations of a set of variables. Hotelling (1935, 1936) generalized this concept to two sets of variables x1′ : 1 × p1 and x2′ : 1 × p2 and introduced canonical correlation analysis. Canonical correlation analysis is useful in studying the relationship between the two sets of variables. Let the covariance matrix of x′ = (x1′, x2′) be Σ, where

(89) \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}

and Σ_{ii} : p_i × p_i is the covariance matrix of x_i. Then Σ_{11}^{-1}Σ_{12}Σ_{22}^{-1}Σ_{21} is known to be the canonical correlation matrix. Without loss of generality, we assume that p_1 < p_2, and we let ρ_1^2 ≥ ⋯ ≥ ρ_{p_1}^2 denote the eigenvalues of Σ_{11}^{-1}Σ_{12}Σ_{22}^{-1}Σ_{21}. Here ρ_1, ⋯, ρ_{p_1} are known as canonical correlations, where ρ_i is the positive square root of ρ_i^2. Now let α_i and β_i denote the eigenvectors of Σ_{11}^{-1}Σ_{12}Σ_{22}^{-1}Σ_{21} and Σ_{22}^{-1}Σ_{21}Σ_{11}^{-1}Σ_{12}, respectively, corresponding to ρ_i^2. Then α_1′x_1, ⋯, α_{p_1}′x_1 and β_1′x_2, ⋯, β_{p_1}′x_2 are known as canonical variables. One of the important problems in the area of canonical correlation analysis is to find the number of canonical correlations that are significantly different from zero. In this section, we discuss some procedures for testing the hypothesis on the rank of the canonical correlation matrix when the underlying distribution is multivariate normal.

Let X : n × p be a random matrix such that E(X) = 0 and E(X′X) = nΣ. Also let

(90) S = X'X = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix}

where S_{ij} is of order p_i × p_j. In addition, let r_1^2 ≥ ⋯ ≥ r_{p_1}^2 denote the eigenvalues of S_{11}^{-1}S_{12}S_{22}^{-1}S_{21}. Then r_1, ⋯, r_{p_1} are known as the sample canonical correlations, where r_i is the positive square root of r_i^2. Various functions of r_1^2, ⋯, r_{p_1}^2 were proposed in the literature as test statistics for determination of the rank of the canonical correlation matrix. We review these procedures in this section.

We first assume that the rows of X are imnd. In this case, Bartlett (1948) proposed a procedure for testing the hypothesis H_t, where H_t denotes ρ_{t+1}^2 = ⋯ = ρ_{p_1}^2 = 0; he also derived the asymptotic distribution of the preceding statistic. Fujikoshi (1974) showed that the foregoing test statistic is the LRT statistic. Hsu (1941b) derived the asymptotic joint distribution of the sample canonical correlations when H_t is true. When the population canonical correlations ρ_1, ⋯, ρ_{p_1} have multiplicities, and none of them is equal to zero, Fujikoshi (1978) derived the nonnull distribution of a single function of the sample canonical correlations, whereas Krishnaiah and Lee (1979) derived the asymptotic joint distribution of functions of the sample canonical correlations. The expressions derived by Krishnaiah and Lee involve multivariate normal density and multivariate Hermite polynomials. When the underlying distribution is not multivariate normal, Fang and Krishnaiah (1981, 1982) obtained results analogous to those obtained in the paper of Krishnaiah and Lee.
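
In practice such a rank test is often carried out with the familiar large-sample chi-square form of Bartlett's statistic. The exact statistic of Bartlett (1948) is not reproduced in this excerpt, so the sketch below should be read as a standard stand-in: it computes the sample canonical correlations as eigenvalues of S_{11}^{-1}S_{12}S_{22}^{-1}S_{21} and refers −(n − 1 − (p_1 + p_2 + 1)/2) Σ_{j>t} ln(1 − r_j^2) to a chi-square distribution with (p_1 − t)(p_2 − t) degrees of freedom.

```python
import numpy as np
from scipy import stats

def canonical_correlations(X1, X2):
    """Sample canonical correlations between two data blocks (rows = observations)."""
    n = X1.shape[0]
    X1c, X2c = X1 - X1.mean(0), X2 - X2.mean(0)
    S11, S22 = X1c.T @ X1c / n, X2c.T @ X2c / n
    S12 = X1c.T @ X2c / n
    M = np.linalg.solve(S11, S12) @ np.linalg.solve(S22, S12.T)   # S11^-1 S12 S22^-1 S21
    r2 = np.sort(np.linalg.eigvals(M).real)[::-1]                 # r_1^2 >= ... >= r_p1^2
    return np.sqrt(np.clip(r2, 0, 1))

def bartlett_rank_test(r, n, p1, p2, t):
    """Chi-square approximation for H_t: rho_{t+1} = ... = rho_{p1} = 0."""
    stat = -(n - 1 - (p1 + p2 + 1) / 2) * np.sum(np.log(1 - r[t:] ** 2))
    df = (p1 - t) * (p2 - t)
    return stat, 1 - stats.chi2.cdf(stat, df)

# Hypothetical example: one real canonical correlation linking the two blocks
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 1))
X1 = z @ np.ones((1, 2)) + rng.normal(size=(200, 2))
X2 = z @ np.ones((1, 3)) + rng.normal(size=(200, 3))
r = canonical_correlations(X1, X2)
print(r)
print(bartlett_rank_test(r, n=200, p1=2, p2=3, t=1))   # tests whether rho_2 = 0
```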

Now, let the joint distribution of the elements of X be elliptically symmetric, with density given by

(91) f(X) = |\Sigma|^{-n/2}\, h\bigl(\mathrm{tr}\, \Sigma^{-1} X'X\bigr)

Then, Krishnaiah, Lin, and Wang (1985a) showed that the LRT statistic for testing the hypothesis ρ_{t+1} = ⋯ = ρ_{p_1} = 0 is given by

(92) L(k) = \prod_{j=t+1}^{p_1} \bigl(1 - r_j^2\bigr)^{n/2}.

So the LRT statistic is the same as when the underlying distribution is multivariate normal. They also noted that the distribution of any function of r_1^2, ⋯, r_{p_1}^2 is independent of the form of the underlying distribution as long as the underlying distribution belongs to the family of elliptical distributions. We now review some of the work reported in the literature on canonical correlation analysis when it is assumed that the observations are iiesd. with the common density

(93) f(x) = |\Sigma|^{-1/2}\, h\bigl(x' \Sigma^{-1} x\bigr).

Now, let

(94) c_i = \frac{\sqrt{n}\,\bigl(r_i^2 - \rho_i^2\bigr)}{2 \rho_i \bigl(1 - \rho_i^2\bigr)}.

Then Muirhead and Waternaux (1980) showed that c_1, …, c_{p_1} are asymptotically distributed independently as normal, with mean 0 and variance (κ + 1), when ρ_1^2, …, ρ_{p_1}^2 are distinct. This is a special case of a result of Fang and Krishnaiah (1981, 1982). Krishnaiah, Lin, and Wang (1985a) derived the asymptotic joint distribution of the sample canonical correlations when the population canonical correlations have multiplicities and the last few population canonical correlations are zero. In particular, they showed that the joint asymptotic distribution of n r_{s+1}^2/(κ+1), …, n r_{p_1}^2/(κ+1), when H_s: ρ_{s+1}^2 = ⋯ = ρ_{p_1}^2 = 0 is true, is the same as the joint distribution of the eigenvalues of the central Wishart matrix W_{p_1−s} with (p_2 − s) degrees of freedom and E(W_{p_1−s}) = (p_2 − s) I_{p_1−s}. This result is useful in the implementation of certain test procedures for H_s when the sample size is large. For example, we can use r_{s+1}^2 or (r_{s+1}^2 + ⋯ + r_{p_1}^2) as a test statistic for H_s.

We now discuss the problem of testing for the rank of the canonical correlation matrix under the correlated multivariate regression equations (CMRE) model considered by Kariya, Fujikoshi, and Krishnaiah (1984). Consider the CMRE model,

(95) Y_i = X_i \theta_i + E_i,

for i = 1, 2. In this model, the rows of (E1, E2) are imnd. with mean vector 0 and covariance matrix Σ, where

(96) \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}

and Σ_{ij} is of order p_i × p_j. Also, X_i : n × r_i is the design matrix and θ_i : r_i × p_i is the matrix of unknown parameters for i = 1, 2. Without loss of generality, we assume that p_1 ≤ p_2. Now, let

(97) S = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix}

where S_{ij} = Y_i′Q_iQ_jY_j and Q_i = I − X_i(X_i′X_i)^{-1}X_i′. Also, let R = S_{11}^{-1}S_{12}S_{22}^{-1}S_{21}. Kariya, Fujikoshi, and Krishnaiah (1984) investigated the problem of testing the hypothesis that ρ_t^2 = ⋯ = ρ_{p_1}^2 = 0. They also derived the asymptotic distributions of three statistics in the null case and under local alternatives. We can test the hypothesis that ρ_t^2 = ⋯ = ρ_{p_1}^2 = 0 by considering suitable functions of r_t^2, ⋯, r_{p_1}^2, such as r_t^2, r_t^2 + ⋯ + r_{p_1}^2, and so on, where r_1^2 ≥ ⋯ ≥ r_{p_1}^2 are the eigenvalues of the sample canonical correlation matrix S_{11}^{-1}S_{12}S_{22}^{-1}S_{21}.

For an application of the canonical correlation analysis in econometrics, the reader is referred to Hannan (1967) and Chow and Ray-Chowdhuri (1967).

URL: https://www.sciencedirect.com/science/article/pii/B012227410500466X

Multiple Regression

Gary Smith, in Essential Statistics, Regression, and Econometrics, 2012

The Coefficient of Determination, R2

As with the simple regression model, the model's predictive accuracy can be gauged by the multiple correlation coefficient R or the coefficient of determination, R2, which compares the sum of squared prediction errors to the sum of squared deviations of Y about its mean:

(10.7) R^2 = 1 - \frac{\sum \bigl(Y - \hat{Y}\bigr)^2}{\sum \bigl(Y - \bar{Y}\bigr)^2}

In our consumption model, R2 = 0.999, indicating that our multiple regression model explains an impressive 99.9 percent of the variation in the annual consumption.

URL: https://www.sciencedirect.com/science/article/pii/B9780123822215000106

Multiple Regression

Donna L. Mohr, ... Rudolf J. Freund, in Statistical Methods (Fourth Edition), 2022

Concept Questions

1.

Given that SSR=50 and SSE=100, calculate R2.

2.

The multiple correlation coefficient can be calculated as the simple correlation between________ and________.

3.(a)

What value of R2 is required so that a regression with five independent variables is significant if there are 30 observations? [Hint: Use the 0.05 critical value for F(5,24)].

(b)

Answer part (a) if there are 500 observations.

(c)

What do these results tell us about the R2 statistic?

4.

If x is the number of inches and y is the number of pounds, what is the unit of measure of the regression coefficient?

5.

What is the common feature of most “influence” statistics?

6.

Under what conditions is least squares not the best method for estimating regression coefficients?

7.

What is the interpretation of the regression coefficient when using logarithms of all variables?

8.

What is the basic principle underlying inferences on partial regression coefficients?

9.

Why is multicollinearity a problem?

10.

List some reasons why variable selection is not always an appropriate remedial method when multicollinearity exists.

11.

________ (True/False) When all VIF are less than 10, then multicollinearity is not a problem.

12.

________ (True/False) The adjusted R-square attempts to balance good fit against model complexity.

13.

________ (True/False) The t statistic for an individual coefficient measures the contribution of the corresponding independent variable, after controlling for the other variables in the model.

14.

________ (True/False) Because polynomials are smooth functions, it is permissible to extrapolate slightly beyond the range of the independent variable when fitting quadratic models.

15.

You fit a full regression model with five independent variables, obtaining an SSE with 40 df. Then you fit a reduced model that has only three of the independent variables, but now you obtain an SSE with 46 df. Does this make sense? What is the most likely explanation? What should you do?

16.

The null hypothesis for the test for the model (Section 8.3) does not include the intercept term β0. Give the interpretation of a null hypothesis that did include β0, H0: β0 = β1 = … = βm = 0. Explain why this hypothesis would rarely be of interest.

URL: https://www.sciencedirect.com/science/article/pii/B9780128230435000084

Methods You Might Meet, But Not Every Day

R.H. Riffenburgh, in Statistics in Medicine (Third Edition), 2012

Canonical Correlation

Multiple regression, met in Chapters 22 and 23, is a form of multivariate analysis. In this case, one dependent variable is predicted by several independent variables. A coefficient of determination R2 is calculated and may be considered as a multiple correlation coefficient, that is, the correlation between the dependent variable and the set of independent variables. If this design is generalized to multiple dependent variables, a correlation relationship between the two sets is of interest.

Canonical correlation is a term for an analysis of correlation among items in two lists (vectors of variables). For example, does a list of lab test results correlate with a list of clinical observations on a patient? The goal is to find two linear combinations, one for each list of variables, that maximize the correlation between them. The coefficients (multipliers of the variables) act as weights on the variables providing information on the interrelationships.

Suppose an investigator is interested in differentiating forms of meningitis. Blood test results (C-reactive protein, blood counts, et al) along with MRI and lumbar puncture (LP) findings are taken from samples of meningitis patients having different forms. Canonical correlations are generalizations of simple correlations between individual variables to correlations between groups. In this case, canonical correlations are found between blood test results as a group and MRI/LP results as a group for each form of meningitis and may then be compared with one another.

URL: https://www.sciencedirect.com/science/article/pii/B9780123848642000287

Factor Analysis

Christof Schuster, Ke-Hai Yuan, in Encyclopedia of Social Measurement, 2005

Principal-Axis Factor Analysis

Principal-axis factoring starts by considering the matrix S_r = S − Ψ̃, where Ψ̃ contains initial estimates of the uniquenesses. One popular method of obtaining these initial estimates is to calculate ψ̃_{ii} = s_{ii}(1 − R_i^2), where s_{ii} is the ith variable's variance and R_i^2 is the squared multiple correlation coefficient obtained from a regression of x_i on the remaining variables.

Because S_r is symmetric, it is possible to write S_r = ΓΘΓ′, where the columns of Γ contain the p eigenvectors of S_r and Θ = diag(θ_1, …, θ_p) contains the corresponding eigenvalues θ_j, j = 1, …, p. Without loss of generality, the eigenvalues can be assumed ordered such that θ_1 ≥ θ_2 ≥ … ≥ θ_p. Note that some of these eigenvalues may be negative. Provided the number of positive eigenvalues is greater than or equal to q, the matrix Λ_q = Γ_q diag(θ_1^{1/2}, …, θ_q^{1/2}) can be defined, where Γ_q contains the first q columns of Γ.

If one defines Σ̃ = Λ_qΛ_q′, then it can be shown that this matrix minimizes the least-squares discrepancy function

(6) F(\Lambda) = \mathrm{trace}\bigl[(S_r - \tilde{\Sigma})^2\bigr]

for fixed q. In other words, S_r can be optimally approximated in the least-squares sense by Λ_qΛ_q′, and therefore S is closely approximated by Λ_qΛ_q′ + Ψ̃. It is possible to apply this procedure iteratively. Having estimated Λ_q using the procedure just explained, the initial estimate of Ψ can be updated by calculating Ψ̃ = diag(S − Λ_qΛ_q′). An updated S_r matrix can then be calculated, which leads to a new estimate of Λ_q. The iteration is continued in this manner until the change in the factor loadings across successive iterations becomes negligible. Minimizing Eq. (6) has been recommended as more likely to recover real but small factors, compared with the factors extracted by minimizing the maximum-likelihood discrepancy function [see Eq. (5)].
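
The iteration just described can be sketched in a few lines. The function below assumes a covariance (or correlation) matrix S and a fixed number of factors q, starts from the squared-multiple-correlation uniquenesses ψ̃_ii = s_ii(1 − R_i²) = 1/(S⁻¹)_ii, and repeats the eigendecomposition update; safeguards (for example against Heywood cases) are omitted.

```python
import numpy as np

def principal_axis_factoring(S, q, max_iter=200, tol=1e-6):
    """Iterative principal-axis factoring of a covariance/correlation matrix S
    with q factors. Returns the loading matrix Lambda (p x q) and the uniquenesses."""
    p = S.shape[0]
    # Initial uniquenesses via squared multiple correlations:
    # psi_ii = s_ii * (1 - R_i^2) = 1 / (S^{-1})_ii
    psi = 1.0 / np.diag(np.linalg.inv(S))
    L_old = np.zeros((p, q))
    for _ in range(max_iter):
        Sr = S - np.diag(psi)                       # reduced matrix S_r
        theta, G = np.linalg.eigh(Sr)               # eigenvalues in ascending order
        theta, G = theta[::-1][:q], G[:, ::-1][:, :q]
        L = G * np.sqrt(np.clip(theta, 0, None))    # Lambda_q = Gamma_q diag(theta^(1/2))
        psi = np.diag(S - L @ L.T)                  # updated uniquenesses
        if np.max(np.abs(L - L_old)) < tol:         # loadings stopped changing
            break
        L_old = L
    return L, psi

# Hypothetical example: six variables with a two-block factor structure
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 6))
X[:, :3] += rng.normal(size=(300, 1))   # common factor for the first three variables
X[:, 3:] += rng.normal(size=(300, 1))   # common factor for the last three variables
R = np.corrcoef(X, rowvar=False)
L, psi = principal_axis_factoring(R, q=2)
print(np.round(L, 2))
```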

URL: https://www.sciencedirect.com/science/article/pii/B0123693985001626

ASYMPTOTICALLY UNIMPROVABLE SOLUTION OF MULTIVARIATE PROBLEMS

VADIM I. SERDOBOLSKII, in Multiparametric Statistics, 2008

Special Cases

We consider “shrinkage-ridge estimators” defined by the function η(x) = α ind(x ≥ t), t ≥ 0. The coefficient α > 0 is an analog of the shrinkage coefficient in Stein-type estimators, and 1/t plays the role of a regularization parameter. In this case, by Theorem 4.10, the leading part of the quadratic risk (3) is

R_0(\rho) = R_0(\alpha, t) = \sigma^2 - 2\alpha\,\varphi(t s(t)) + \alpha^2\,\Delta(t,t)/s^2(t).

If α = 1, we have

R_0(\rho) = R_0(1, t) = \frac{1}{s^2(t)}\,\frac{d}{dt}\bigl[t\bigl(\sigma^2 - \kappa(t)\bigr)\bigr].

In this case, the empirical risk is R_{emp}(t) = s^2(t) R_0(t). For the optimum value α = α_{opt} = s^2(t)\,\varphi(t s(t))/\Delta(t, t), we have

R_0(\rho) = R_0(\alpha_{opt}, t) = \sigma^2\left(1 - \frac{s^2(t)\,\varphi^2(t s(t))}{\Delta(t, t)}\right).

Example 1. Let λ → 0 (the transition to the case of fixed dimension under the increasing sample size N → ∞). To simplify formulas, we write out only the leading terms of the expressions. If λ = 0, then s(t) = 1, h(t) = n^{-1}\,\mathrm{tr}(I + tΣ)^{-1}, κ(t) = φ(t), and Δ(t, t) = φ(t) − t φ′(t). Set Σ = I. We have

\varphi(t) \approx \frac{\sigma^2 r^2 t}{1+t}, \qquad h(t) \approx \frac{1}{1+t}, \qquad \Delta(t,t) \approx \frac{\sigma^2 r^2 t^2}{(1+t)^2},

where r2 = g2/σ2 is the square of the multiple correlation coefficient. The leading part of the quadratic risk (3) is

R_0 = \sigma^2\left[1 - \frac{2\alpha r^2 t}{1+t} + \frac{\alpha^2 r^2 t^2}{(1+t)^2}\right].

For the optimal choice of d, as well as for the optimal choice of t, we have α = (1 + t)/t and Ropt = σ2(1 − r2), i.e., the quadratic risk (3) asymptotically attains its a priori minimum.
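
A quick symbolic check of the Example 1 algebra: the sketch below minimizes the quadratic risk R_0(α) given above over α and recovers α_opt = (1 + t)/t and the minimum σ²(1 − r²).

```python
import sympy as sp

alpha, t, r, sigma = sp.symbols("alpha t r sigma", positive=True)

# Quadratic risk for Example 1 (lambda -> 0, Sigma = I), as displayed above
R0 = sigma**2 * (1 - 2*alpha*r**2*t/(1 + t) + alpha**2*r**2*t**2/(1 + t)**2)

alpha_opt = sp.solve(sp.diff(R0, alpha), alpha)[0]
print(sp.simplify(alpha_opt))                    # equals (1 + t)/t
print(sp.simplify(R0.subs(alpha, alpha_opt)))    # equals sigma^2 * (1 - r^2)
```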

Example 2. Let N → ∞ and n → ∞ so that λ = n/N → λ_0. Assume that the matrices Σ are nondegenerate for each n, σ^2 → σ_0^2, r^2 = g^T Σ^{-1} g/σ^2 → r_0^2, and the parameters γ → 0. Under the limit transition, for each fixed t ≥ 0, the remainder terms in Theorems 4.8–4.11 vanish. Let d = 1 and t → ∞ (the transition to the standard nonregularized regression under the increasing-dimension asymptotics). Under these conditions,

s(t) \to 1 - \lambda_0, \quad s'(t) \to 0, \quad \varphi(t s(t)) \to \sigma_0^2 r_0^2, \quad \kappa(t) \to \kappa(\infty) \stackrel{\mathrm{def}}{=} \sigma_0^2 r_0^2 (1 - \lambda_0) + \sigma_0^2 \lambda_0, \quad t\,\kappa'(t) \to 0.

The quadratic risk (3) tends to R0 so that

\lim_{t \to \infty}\,\lim_{\gamma \to 0}\,\lim_{N \to \infty} \bigl|\,\mathbf{E}\,R(t) - R_0\,\bigr| = 0,

where R_0 \stackrel{\mathrm{def}}{=} \sigma_0^2 (1 - r_0^2)/(1 - \lambda_0). This limit expression was obtained by I. S. Yenyukov (see [2]). It presents an explicit dependence of the quality of the standard regression procedure on the dimension of observations and the sample size. Note that under the same conditions, the empirical risk R_{emp} \to \sigma_0^2 (1 - r_0^2)(1 - \lambda_0), which is less than \sigma_0^2 (1 - r_0^2).

Example 3. Under the same conditions as in Example 2, let the coefficients d be chosen optimally and then t → ∞. We have α = α_{opt}(t) = s^2(t)\,\varphi(t s(t))/\Delta(t, t) and t → ∞. Then,

s(t) \to 1 - \lambda_0, \quad \varphi(t s(t)) \to \sigma_0^2 r_0^2, \quad \Delta(t, t) \to \sigma_0^2 (1 - \lambda_0)\bigl[\lambda_0 (1 - r_0^2) + (1 - \lambda_0) r_0^2\bigr], \quad \alpha_{opt} \to \frac{r_0^2 (1 - \lambda_0)}{\lambda_0 (1 - r_0^2) + (1 - \lambda_0) r_0^2}.

By (23), the quadratic risk (3) R0(t, αopt) → R0 as t → ∞, where

R_0 = \frac{\sigma_0^2 (1 - r_0^2)\bigl[\lambda_0 + (1 - \lambda_0) r_0^2\bigr]}{\lambda_0 (1 - r_0^2) + (1 - \lambda_0) r_0^2} \le \frac{\sigma_0^2 (1 - r_0^2)}{1 - \lambda_0}.

If λ_0 = 1, the optimal shrinkage coefficient α_{opt} → 0 and the quadratic risk remains finite (tends to σ_0^2) in spite of the absence of a regularization, whereas the quadratic risk for the standard linear regression tends to infinity.

URL: https://www.sciencedirect.com/science/article/pii/B9780444530493500072

Volume 3

J. Ferré, in Comprehensive Chemometrics, 2009

3.02.3.5.1 The collinearity problem

The columns of the design matrix X can be anything between orthogonal and perfectly correlated (one column being a multiple of another column, or a linear combination of several columns). When the columns are orthogonal, the estimated regression coefficients are independent and their variance (or a combined measure of their variance) is smaller than in the non-orthogonal case for that same number of training points. Independence and low variance are important when the objective is to interpret the coefficients (the effect of the variables in the measured response y) and also to obtain predictions with a low variance. Hence, orthogonality of the x-variables is often sought by the experimenter whenever it is possible to fix the x-values (e.g., in designed experiments). For many experimental situations, statistical experimental design presents optimal X matrices that have orthogonal columns (e.g., full factorial designs, fractional factorial designs, Plackett–Burman designs), and offers criteria such as D- and A-optimality to select the points from a list of candidates, in such a way that the X matrix formed with the selected points has columns that are as orthogonal as possible. The opposite situation is a perfect correlation between two or more columns of X, a situation called singularity. This can be expressed as83

(74) \sum_{k=1}^{K} w_k x_k = 0

where xk is the kth column of X and the wk are constants, not all of which are zero. In this case, the rank of XTX is less than K, and XTX is singular and cannot be inverted; hence, it is not possible to calculate the OLS estimates of the regression coefficients with Equation (7). Usually (e.g., near-infrared (NIR) spectra), the x-variables are neither completely correlated nor orthogonal and the coefficients can be estimated by OLS solution but their variance–covariance will be larger than for the case of orthogonal x’s. The most problematic situation is the existence of nearly exact linear combinations among the independent variables, that is when Equation (74) holds approximately. This situation is usually called collinearity, multicollinearity, or ill-conditioning. A more precise definition of collinearity can be found in Gunst84 (p 81). Collinearity has adverse effects on the regression results. The more collinear the x-variables are, the more unstable the OLS estimators of coefficients are: the estimated coefficients may change substantially with small changes in the observed y due to random errors. This instability means large (inflated) variances and covariances of the coefficients of variables involved in the linear dependencies, which make it difficult to interpret the impact of each regressor on the response. Moreover, the coefficient estimates cannot be interpreted separately, they are often too large or of wrong sign, and the t-tests of significance (Equation (51)) can indicate that the coefficients are statistically insignificant. In addition, despite the coefficients being estimated poorly, the model can have a good fit; hence, the traditional analysis of the model adequacy with summary statistics such as SSE, multiple correlation coefficient, or residual plots, will not signal the collinearity problems. Note that these statistics reflect how well the fitted model estimates the observed y but not necessarily the validity of the model for prediction. Actually, the prediction for new x measurements may be good at points whose combination of x’s are similar to those in the training data (so collinearity in X is not that big a problem if we are only interested in predictions at points with the same collinearity pattern). However, prediction at the points that do not have the same pattern of collinearity as X or extrapolation beyond the range of the data can be very adversely affected and have large errors. Gunst and Mason83 show an example with the problems associated with collinearity. Mandel85 reasoned that collinearity must be seen as a warning to limit the use of the regression model for predictions in a specific subspace of the x-space. This amounts to making a difference between the sample domain (SD), defined by the maximum and minimum value of each x-variable, and the effective prediction domain (EPD), which is the part of the x-space in which training data lie, and in which and near which prediction is safe. The collinearity problem and these domains are illustrated in Figure 9 for two independent variables, x1 and x2. (Similar discussions can be found in Sergent et al.,86 Belsley et al.1 (p 87), Mandel,87 and Larose88 (p 117).) Four data sets with five points each are simulated with given values of x1 and x2 (Table 2). The SD is the rectangle ABCD. The model was given by Equation (4), E(y∣xi) = β0 + β1xi,1 + β2xi,2, with β0 = 0.5, β1 = 0.2, and β2 = 0.15. For each point, random error ɛi from a normal distribution with mean zero and standard deviation 0.03 was added. 
Hence, the measured y is yi = 0.5 + 0.2xi,1 + 0.15xi,2 + ɛi. Data sets R1 and R2 were simulated with the same values of x1 and x2, but with different values of random error, to test the stability of the coefficients to the variation of the random error. Data sets R3 and R4 were also simulated with the same values of x1 and x2 but with different values of random error. The independent variables in R1 and R2 illustrate a situation where x1 and x2 are not correlated with each other; that is, they are orthogonal. The x-variables in R3 and R4 illustrate a collinear situation where x1 and x2 are correlated with each other, so that as one increases, so does the other. The random error that was added to data sets R1 and R3 was the same, and the random error added to data sets R2 and R4 was the same. For each data set, the model ŷ = b0 + b1x1 + b2x2 (Equation (8)) was calculated. The values of x1 and x2, E(y∣xi), ɛi, and yi are listed in Table 2. The estimated coefficients, the coefficient of multiple determination, and the variance inflation factors (VIFs) (Section 3.02.3.5.3(iii)) are given in Table 3. The four models are plotted in Figure 9. Figure 9(a) plots the models of data sets R1 and R2. The points are well spread over the whole regression domain and form a solid basis for the model. A bit larger or smaller yi values due to random error do not vary excessively the coefficients. This is translated into stable coefficient estimates b1 and b2, each with small variances as follows:

Figure 9. (a) Plot of the fitted model for data sets R1 and R2 with an orthogonal model matrix X. (b) Plot of the fitted model for data sets R3 and R4, with collinear data: the x1 values increase as the corresponding values for x2 increase.

Table 2. Simulated data corresponding to Figure 9. Data sets R1 and R2 correspond to Figure 9(a). Data sets R3 and R4 correspond to Figure 9(b)

Data sets R1 and R2:

x1     x2     E(y|xi)    εi (R1)    yi (R1)    εi (R2)    yi (R2)
0.2    0.2    0.570       0.026      0.596     −0.013      0.557
0.2    0.8    0.660       0.003      0.663     −0.033      0.627
0.5    0.5    0.675      −0.026      0.649      0.012      0.687
0.8    0.2    0.690       0.026      0.716     −0.029      0.661
0.8    0.8    0.780      −0.013      0.767      0.005      0.785

Data sets R3 and R4:

x1     x2     E(y|xi)    εi (R3)    yi (R3)    εi (R4)    yi (R4)
0.2    0.2    0.570       0.026      0.596     −0.013      0.557
0.3    0.2    0.590       0.003      0.593     −0.033      0.557
0.5    0.5    0.675      −0.026      0.649      0.012      0.687
0.7    0.6    0.730       0.026      0.756     −0.029      0.701
0.8    0.8    0.780      −0.013      0.767      0.005      0.785

Table 3. Estimated coefficients, coefficient of multiple determination, and VIF for the data sets in Table 2

        R1        R2        R3        R4
b0      0.536     0.473     0.509     0.489
b1      0.187     0.218     0.361    −0.099
b2      0.098     0.162    −0.038     0.473
R2      0.934     0.949     0.946     0.995
VIF1    1         1         22.7      22.7
VIF2    1         1         22.7      22.7

VIF, variance inflation factor.

\operatorname{var}(\mathbf{b}) = \begin{bmatrix} 1.59 & -1.39 & -1.39 \\ -1.39 & 2.78 & 0.00 \\ -1.39 & 0.00 & 2.78 \end{bmatrix} \sigma^2

Figure 9(b) illustrates the problems caused by collinearity among the columns of X in data sets R3 and R4. One of the dimensions of the x-space is very poorly spanned, with almost no data dispersion: the data mainly varies along the diagonal of the SD, whereas the perpendicular direction is hardly spanned. Consequently, the model is stable along the direction with higher x-variability, but easily modifiable by random errors in the direction of low variability. This means very poor, high variance, coefficient estimates for the variables that are involved in the collinearity:

\operatorname{var}(\mathbf{b}) = \begin{bmatrix} 1.29 & -5.25 & 3.33 \\ -5.25 & 87.18 & -83.33 \\ 3.33 & -83.33 & 83.33 \end{bmatrix} \sigma^2

The high variability associated with the estimated coefficients b1 and b2 means that y samples may produce coefficient estimates with very different values (note how despite the y having the same random errors as R1 and R2, the coefficients are much more affected). Clearly, this instability is unacceptable to the experimenter if the coefficients must be interpreted. This situation, however, cannot be detected from the fit. Note how the fit in the models R1 and R2 is not better than for the models R3 and R4.

Predictions are also severely affected by collinearity. A prediction at a point P2 in the direction AC will have low uncertainty for all four models. However, a prediction at a point P1 near vertex A (but inside the SD) will be predicted very differently by the models from R1 and from R3. The model from R1 will produce a prediction with a low variance, whereas the model from R3 will produce a prediction with a large variance due to the uncertainty of the model in that direction. Point P1 will have a large leverage and be detected as an outlier for this model. A possible solution to these problems is to reconsider the domain, and shift from the SD to the EPD. Figure 9(b) illustrates the EPD. The two new axes correspond to the PCA decomposition used in the PCR model. PCA defines a new variable in the direction AC and another in the direction BD, and the new EPD is now defined as the ‘sample domain’ for these new variables (as the largest and smallest values of the scores along these two PCs). In this zone, predictions are stable, and point P1, which in the original MLR model would be predicted wrongly, now will be outside the limits of the EPD and be detected as an outlier.

This example shows how interrelationships among the x’s can severely restrict the effective use of the model, which may only be adequate for prediction in limited regions of the predictor variables. The analyst must investigate the correlation structure among the predictor variables with regression diagnostics to determine if the multivariate data being analyzed corresponds to case R1 or to case R3. The next section identifies some possible causes of collinearity, and possible solutions.
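
The contrast between R1 and R3 can be reproduced directly from the data in Table 2. The sketch below fits ŷ = b0 + b1x1 + b2x2 by ordinary least squares for both data sets and computes the VIF from the correlation between x1 and x2 (with two predictors the two VIFs coincide); the results should match Table 3 up to rounding.

```python
import numpy as np

def fit_and_vif(x1, x2, y):
    """OLS fit of y on an intercept, x1 and x2, plus the VIF shared by the two predictors."""
    X = np.column_stack([np.ones_like(x1), x1, x2])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r12 = np.corrcoef(x1, x2)[0, 1]
    vif = 1.0 / (1.0 - r12**2)        # with two predictors, VIF1 = VIF2
    return b, vif

# Data set R1 (orthogonal predictors) and R3 (collinear predictors), from Table 2
x1_R1 = np.array([0.2, 0.2, 0.5, 0.8, 0.8]); x2_R1 = np.array([0.2, 0.8, 0.5, 0.2, 0.8])
y_R1  = np.array([0.596, 0.663, 0.649, 0.716, 0.767])

x1_R3 = np.array([0.2, 0.3, 0.5, 0.7, 0.8]); x2_R3 = np.array([0.2, 0.2, 0.5, 0.6, 0.8])
y_R3  = np.array([0.596, 0.593, 0.649, 0.756, 0.767])

print(fit_and_vif(x1_R1, x2_R1, y_R1))   # b close to (0.536, 0.187, 0.098), VIF = 1
print(fit_and_vif(x1_R3, x2_R3, y_R3))   # b close to (0.509, 0.361, -0.038), VIF about 22.7
```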

URL: https://www.sciencedirect.com/science/article/pii/B9780444527011000764

What is it called when independent variables are correlated?

Multicollinearity is a statistical concept where several independent variables in a model are correlated. Two variables are considered to be perfectly collinear if their correlation coefficient is +/−1.0. Multicollinearity among independent variables will result in less reliable statistical inferences.

What is the independent variable known as in regression analysis?

Independent variables are also known as predictors, factors, treatment variables, explanatory variables, input variables, x-variables, and right-hand variables—because they appear on the right side of the equals sign in a regression equation.

What is correlation coefficient in regression?

The square of the correlation coefficient, r², is a useful value in linear regression. This value represents the fraction of the variation in one variable that may be explained by the other variable.