Statistics for the Behavioral Sciences | Lesson 7 Correlation | Roger N. Morrissette, PhD |
I. Correlations [Video Lesson 7 I] [YouTube version]
A correlation is a statistical test that demonstrates the relationship between two variables. Even though you may be able to show a significant relationship between two variables, a correlation does not show a causal relationship between them. For example, although depression and self-esteem are two variables that are significantly correlated with each other, we cannot say that low self-esteem causes depression. Likewise, we cannot say that depression causes low self-esteem. The two variables may be significantly correlated, but no causal relationship is assumed. Correlations are best represented graphically by a scatterplot and best calculated using the Pearson Product Moment Correlation formula.
Let's test the hypothesis that depression scores are negatively correlated with self-esteem scores. We design our surveys and sample 8 subjects. Their data are presented below. Data for a correlation are always presented in two columns like the data set shown below. Depression scores are our X data and Self-Esteem scores are our Y data:
Depression [X] | Self-Esteem [Y] |
10 | 104 |
12 | 100 |
19 | 98 |
4 | 150 |
25 | 75 |
15 | 105 |
21 | 82 |
7 | 133 |
II. Scatterplots [Video Lesson 7 II] [YouTube version]
A scatterplot is a graphical representation of the two sets of data you are comparing. The X-axis plots your first or "X" set of data, and the Y-axis plots your second or "Y" set of data. The scatterplot can tell you two important things about the relationship between your two variables. First, it can show you whether you have a weak or strong relationship between your variables. Second, it can tell you whether your variables are negatively or positively related.
A. Scatterplots can show the strength of the relationship between two variables
1. Weak relationships will have a wide scattering of the plots
2. Strong relationships will have a minimal scattering of the plots
B. Scatterplots can show the direction or type of the relationship between two variables
1. Positive Correlation
both factors vary in the same direction
as one factor increases, the other increases
2. Negative Correlation
both factors vary in opposite directions
as one factor increases, the other decreases
3. Zero or Neutral Correlation
the two factors show no relationship to one another
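The direction described above can also be checked numerically: the sign of the sum of cross-products of deviations, Σ(X - X̄)(Y - Ȳ), matches the direction of the relationship you would see in the scatterplot. A minimal Python sketch (the function name and toy data here are illustrative, not part of the lesson):

```python
# Sketch: classify the direction of a relationship by the sign of
# the sum of cross-products of deviations, sum((x - mean_x)(y - mean_y)).
# Positive sum: the factors vary in the same direction.
# Negative sum: the factors vary in opposite directions.
# Zero sum: no linear relationship between the factors.

def relationship_direction(xs, ys):
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if cross > 0:
        return "positive"
    if cross < 0:
        return "negative"
    return "zero"

# Toy data: as one factor increases, the other decreases
print(relationship_direction([1, 2, 3, 4], [8, 6, 4, 2]))  # negative
```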
III. The Pearson Product Moment Correlation [Correlation Coefficient] [Video Lesson 7 III] [YouTube version] [Correlation Calculation - YouTube version] [mp4 version]
The correlation coefficient is a statistic that quantifies the actual relationship between two variables. It has a range between -1.00 and +1.00; you cannot get a correlation of 1.5. A value of -1.00 would be a perfect [very strong] negative correlation, a value of +1.00 would be a perfect [very strong] positive correlation, and a value of 0.00 would be a [very weak] zero or neutral correlation. To calculate the correlation coefficient we use the Pearson Product Moment Correlation [r]:
r = [n(ΣXY) - (ΣX)(ΣY)] / √{[nΣX² - (ΣX)²] x [nΣY² - (ΣY)²]}
The formula reads: r equals, in the numerator, n [the number of pairs] multiplied by the sum of the XY products, minus the sum of X times the sum of Y. In the denominator, take the square root of the following: n times the sum of X² minus the quantity [sum of X] squared, multiplied by n times the sum of Y² minus the quantity [sum of Y] squared.
Depression [X] | Self-Esteem [Y] |
10 | 104 |
12 | 100 |
19 | 98 |
4 | 150 |
25 | 75 |
15 | 105 |
21 | 82 |
7 | 133 |
To calculate the correlation coefficient [r] for the data above, we first need to expand the columns just as we did when we calculated standard deviation. If you look at the formula above you will see that we need an X² column, a Y² column, and an X times Y [XY] column. This first step is shown below:
X | Y | X² | Y² | XY |
10 | 104 | 100 | 10816 | 1040 |
12 | 100 | 144 | 10000 | 1200 |
19 | 98 | 361 | 9604 | 1862 |
4 | 150 | 16 | 22500 | 600 |
25 | 75 | 625 | 5625 | 1875 |
15 | 105 | 225 | 11025 | 1575 |
21 | 82 | 441 | 6724 | 1722 |
7 | 133 | 49 | 17689 | 931 |
The next step is to calculate the sums of our columns:
X | Y | X² | Y² | XY |
10 | 104 | 100 | 10816 | 1040 |
12 | 100 | 144 | 10000 | 1200 |
19 | 98 | 361 | 9604 | 1862 |
4 | 150 | 16 | 22500 | 600 |
25 | 75 | 625 | 5625 | 1875 |
15 | 105 | 225 | 11025 | 1575 |
21 | 82 | 441 | 6724 | 1722 |
7 | 133 | 49 | 17689 | 931 |
113 | 847 | 1961 | 93983 | 10805 |
n = 8 |
Now we have all the information we need to solve our equation:
r = [[8 x 10805] - [113 x 847]] / square root [[[8 x 1961] - [113]²] x [[8 x 93983] - [847]²]]
r = [86440 - 95711] / square root [[15688 - 12769] x [751864 - 717409]]
r = -9271 / square root [2919 x 34455]
r = -9271 / square root [100574145]
r = -9271 / 10028.666
r = -0.9244
Our correlation coefficient is negative and very close to -1.00, which tells us that we have a strong negative relationship between our two variables. If we look at the scatterplot of our data, we can see that it is aligned with our correlation coefficient.
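The hand calculation above can be verified with a short Python sketch that applies the Pearson formula directly to the lesson's eight pairs of scores:

```python
from math import sqrt

# Depression [X] and Self-Esteem [Y] scores from the lesson
xs = [10, 12, 19, 4, 25, 15, 21, 7]
ys = [104, 100, 98, 150, 75, 105, 82, 133]

n = len(xs)                                  # n = 8 pairs
sum_x, sum_y = sum(xs), sum(ys)              # 113, 847
sum_x2 = sum(x * x for x in xs)              # 1961
sum_y2 = sum(y * y for y in ys)              # 93983
sum_xy = sum(x * y for x, y in zip(xs, ys))  # 10805

# Pearson Product Moment Correlation
r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
print(round(r, 4))  # -0.9244
```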
IV. Determining Significance [Video Lesson 7 IV] [YouTube version]
Now that we have calculated our correlation coefficient, we need to determine how significant it is. There are two ways to evaluate a correlation: the first is to calculate the Coefficient of Determination and the second is to use the R Table.
A. The Coefficient of Determination
The coefficient of determination measures how much of the variance in one factor can be explained by the variance in the factor with which it is correlated. To calculate the coefficient of determination we simply square the r value.
Coefficient of Determination = r²
For our r of -0.9244, the coefficient of determination is r² = 0.8545. This means that 85% of the variance in our depression scores can be explained by the variance in our self-esteem scores. This is a high value and suggests a very strong relationship between the two variables. Although the coefficient of determination is a good indicator of the strength of the relationship between two variables, it does not establish significance. We will need to use the R Table to confirm whether our correlation is statistically significant.
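A quick check of this arithmetic in Python, squaring the rounded r from above:

```python
r = -0.9244                 # correlation coefficient from the lesson
r_squared = r ** 2          # coefficient of determination
print(round(r_squared, 4))  # 0.8545, i.e. about 85% of variance explained
```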
B. The R Table
The R Table is located in its entirety in Appendix A in the back of the textbook, starting on page 435; a shortened version is also available at the bottom of this lecture. It gives the critical r values based on the degrees of freedom of your sample, the level of significance of the statistical test, and whether your hypothesis is one- or two-tailed. These three factors, plus the critical r values themselves, are represented in the R Table and will be explained one at a time.
1. Degrees of Freedom.
The term degrees of freedom refers to the number of scores within a data set that are free to vary. In any sample with a fixed mean, the sum of the deviation scores is equal to zero. If your sample has an n equal to 10, the first 9 scores are free to vary, but the 10th score must be the specific value that makes the sum of the deviations equal to zero. Therefore, in a single sample the degrees of freedom are equal to n - 1. The degrees of freedom for a correlation are slightly different because n equals the number of pairs, not simply the sample size. Therefore, the degrees of freedom for a correlation are n - 2. So to calculate the degrees of freedom you simply take the number of pairs and subtract two. For our data set of depression and self-esteem scores the degrees of freedom are calculated the following way:
df = n -2
df = 8 - 2
df = 6
The R Table shows the degrees of freedom values in the far left column as shown below:
Levels of Significance for a One-Tailed Test
df | .05 | .025 | .01 | .005 |
[For a Two-Tailed Test, the same columns correspond to levels .10 | .05 | .02 | .01]
1 | .988 | .997 | .9995 | .9999 |
2 | .900 | .950 | .980 | .990 |
3 | .805 | .878 | .934 | .959 |
4 | .729 | .811 | .882 | .917 |
5 | .669 | .754 | .833 | .874 |
6 | .622 | .707 | .789 | .834 |
7 | .582 | .666 | .750 | .798 |
8 | .549 | .632 | .716 | .765 |
9 | .521 | .602 | .685 | .735 |
10 | .497 | .576 | .658 | .708 |
The table continues through higher degrees of freedom in the textbook.
2. One- or Two-tailed hypotheses.
The number of tails of a hypothesis reflects whether the hypothesis predicts a direction for the effect. This concept will be discussed in greater detail in chapter 11. For now, you should know that if a correlation hypothesis simply predicts an effect without predicting either a negative or positive direction for that effect, it is considered a Two-Tailed hypothesis. If the hypothesis predicts either a negative or positive direction, then it is a One-Tailed hypothesis. Since our hypothesis as stated predicts a negative correlation, it is a One-Tailed test. The two kinds of hypothesis tests appear in the header rows of the R Table above.
3. Levels of Significance.
The levels of significance or "p values" will also be discussed in greater detail in chapters 11, 12, and 13. For now you should simply know that a level of significance of .05 [p = .05] means that there is only a 5% probability that a correlation this strong would arise by chance alone; equivalently, we can be 95% confident in the result. The .05 value is considered standard in science. Smaller levels of significance indicate stronger evidence of significance, and larger values indicate weaker evidence. This value must be given to you in the problem. For our example let's use p = .05. The levels of significance appear as the column headings of the R Table above.
4. Critical R Values.
Critical values are threshold values for significance. The absolute value of your calculated r must exceed the critical r value in the R Table for the correlation to be considered significant. The critical r values make up the body of the R Table above.
Now let's put it all together. Using the criteria of our example [a One-Tailed test, p = .05, and df = 6], we consult the R Table above to determine if our calculated r value is significant.
According to the R Table, for a One-Tailed test at p = .05 with 6 degrees of freedom, the critical value we must exceed for our calculated r value to be significant is .622.
Since the absolute value of our calculated r = -0.9244 exceeds .622,
we conclude that our correlation is significant.
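The decision rule just applied can be sketched as a small table lookup in Python. The dictionary below holds only the one-tailed p = .05 critical values for df 1 through 10 from the R Table excerpt in this lecture; the function name is illustrative:

```python
# One-tailed, p = .05 critical r values from the R Table, keyed by df
CRITICAL_R_ONE_TAILED_05 = {
    1: .988, 2: .900, 3: .805, 4: .729, 5: .669,
    6: .622, 7: .582, 8: .549, 9: .521, 10: .497,
}

def is_significant(r, n_pairs):
    """Compare |r| to the critical value at df = n_pairs - 2."""
    df = n_pairs - 2
    return abs(r) > CRITICAL_R_ONE_TAILED_05[df]

print(is_significant(-0.9244, 8))  # True: |-0.9244| > .622 at df = 6
```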