How do you determine if there is a relationship between two quantitative variables?

Today I was feeling extra hungry, so I went to an all-you-can-eat buffet. There was a wide variety of dishes and desserts, but what really caught my attention was the pricing chart for children under the age of \(15\): the younger the child, the lower the price.

After thinking about it for a while, it made perfect sense to me. Because toddlers don't eat as much as teenagers, it is fine to charge them less. It seems that there is a relationship between age and the amount of food a person eats. How can you study this relationship? Maybe you could run a survey!

Whenever you look at the relation between two variables that you can measure, you are dealing with two quantitative variables. Here you will learn how to study their relationship and the techniques used for their representation.

Relationship between Two Quantitative Variables

Before proceeding, it is important to review the difference between quantitative and categorical variables.

A quantitative variable is a variable that can be measured with units.

It does not matter which units you use: as long as you can measure a variable, it is a quantitative variable. What about categorical variables?

A categorical variable, also known as a qualitative variable, is a variable whose properties are described rather than measured.

Categorical variables are usually things like colors, names, favorite meals, and so on.

Suppose you are doing a survey in your neighborhood, and you will ask for the following data:

  • Age
  • Gender
  • Height
  • Last name
  • Favorite activity

Which variables are quantitative, and which are categorical?

Answer:

To find which variables are quantitative, you must ask yourself which variables can be measured. From the given list, age is typically measured in years, while you can measure height in feet, inches, meters, or other units. This means that the quantitative variables are:

  • Age
  • Height

The rest of the variables will be given as words rather than numbers, so it is easier to think of them either as labels (in the case of the last name) or as a description of themselves (like gender and their favorite activity). So, the categorical variables are:

  • Gender
  • Last name
  • Favorite activity

Typically, a survey is conducted to gather data for analysis. Decisions are made based on conclusions drawn from data, so it is important to analyze the relationship between variables.

When comparing two quantitative variables you can have a clearer picture of the data by organizing it according to the numerical values that are being represented. This is not the case for categorical variables, as you will see in the next example.

Suppose you want to make a graph to study the relationship between these two pairs of variables:

  • Height and weight of male high school students.
  • Favorite sport and favorite color of male high school students.

When making a graph of two quantitative variables, as in the case of the height and weight of the students, you can arrange data in numerical order. That is, each axis will represent a number line, so before filling it with data, your graph will look like this:

Figure 1. Empty graph of height vs. weight

The graph comparing two quantitative variables is more insightful: as you move to the right you are looking at taller people, and as you move up you are looking at heavier people. You can tell this even if the graph is empty!

If you were to use a graph to represent two categorical variables, as in the case of the favorite sport and color, there is no clear arrangement for the data. You might organize it in alphabetical order, or maybe you will arrange it according to your preferences, but this arrangement does not tell you anything beforehand.

It is important to keep in mind the context of the survey in order to properly classify a variable as quantitative or categorical. For example, you might think that a zip code is a quantitative variable because it is a number, but since it is just a label, it is a categorical variable instead.
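The zip code caveat can be made concrete with a small sketch. The record below is hypothetical (the field names and values are made up for illustration), and the kind of each variable is declared by hand, since the context of the survey decides it rather than the way the value happens to be stored.

```python
# Hypothetical survey record; field names and values are made up for illustration.
record = {
    "age": 34,                      # measured in years
    "gender": "female",
    "height_cm": 168.0,             # measured in centimeters
    "last_name": "Rivera",
    "favorite_activity": "hiking",
    "zip_code": "90210",            # made of digits, but it is only a label
}

# The kind of each variable is assigned by hand, based on the survey context.
VARIABLE_KIND = {
    "age": "quantitative",
    "gender": "categorical",
    "height_cm": "quantitative",
    "last_name": "categorical",
    "favorite_activity": "categorical",
    "zip_code": "categorical",
}

quantitative = [name for name in record if VARIABLE_KIND[name] == "quantitative"]
categorical = [name for name in record if VARIABLE_KIND[name] == "categorical"]

print("Quantitative:", quantitative)
print("Categorical:", categorical)
```

Storing the zip code as a string is one way to signal that arithmetic on it is meaningless.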

If you want to know how to analyze categorical variables, please check out our Two Categorical Variables article.

How to Compare Two Quantitative Variables

A natural question that arises whenever you are given two variables is: Are these two variables related to each other?

Consider the case of height and weight. The taller a person is, the more they tend to weigh. This does not mean that a taller person will always weigh more than a shorter person, but rather it tells you that there is a relation between these variables.

It might also be possible to have two unrelated variables, like the age and height of a population of full-grown men. Whenever you are dealing with two variables, be they related or not, you are dealing with bivariate data.

Bivariate data is data that is given as pairs of variables.

In the height and weight example, when you are doing a survey you will be asking for both the height and the weight of each individual, so each of these values will be paired. This is an example of bivariate data.

Bivariate quantitative data is bivariate data that consists of two quantitative variables.

Bivariate quantitative data can be represented in many ways. For example, you can use a table of values, where each column represents one of the variables.

Suppose you want to investigate if there is a relationship between consumption habits and age. For this reason, you go to your local mall and politely ask each person leaving if they are up for a survey. In this survey, you just ask for their age and how many items they bought, if any. Your data can be arranged in a table like this:

| Age (Years) | Number of Bought Items |
|---|---|
| \(12\) | \(0\) |
| \(36\) | \(4\) |
| \(21\) | \(12\) |
| \(24\) | \(5\) |
| \(15\) | \(2\) |
| \(23\) | \(7\) |
| \(45\) | \(2\) |
| \(67\) | \(1\) |
| \(11\) | \(1\) |

From the above table, you can begin to note some patterns. It looks like children tend to buy fewer things, maybe because they lack money. On the other hand, young adults seem to like getting their hands on a lot of stuff. Of course, there are many more factors involved in consumption habits, but this is a good start!

You can rearrange the above table by ordering the data by age, in which case you need to make sure that each age stays paired with its own number of items.

| Age (Years) | Number of Bought Items |
|---|---|
| \(11\) | \(1\) |
| \(12\) | \(0\) |
| \(15\) | \(2\) |
| \(21\) | \(12\) |
| \(23\) | \(7\) |
| \(24\) | \(5\) |
| \(36\) | \(4\) |
| \(45\) | \(2\) |
| \(67\) | \(1\) |
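As a minimal sketch of reordering by age without breaking the pairing, the survey answers can be stored as (age, items) tuples and sorted as whole pairs. The variable names `survey` and `by_age` are chosen here for illustration.

```python
# (age in years, number of items bought), in the order the answers were collected
survey = [(12, 0), (36, 4), (21, 12), (24, 5), (15, 2),
          (23, 7), (45, 2), (67, 1), (11, 1)]

# Sorting the pairs, rather than each column separately, keeps every age
# attached to its own item count.
by_age = sorted(survey, key=lambda pair: pair[0])

for age, items in by_age:
    print(f"{age:>3}  {items:>3}")
```

Sorting the two columns independently would scramble the pairs and destroy the bivariate structure of the data.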

Please keep in mind that the table can also be written horizontally, in which case each row will represent one of the variables and each column will represent an inquiry.

Another way of representing bivariate quantitative data is by drawing points in a plane, as you will see in the next section.

Two Quantitative Variables Graphs

There are many ways of displaying quantitative data. For example, if you are interested in doing a survey about the ages of high school students, you can use a histogram, a dot plot, or a stem-and-leaf display. However, all these graphs are used to display a single variable along with its frequency.

Suppose you are given a set of bivariate quantitative data. This means that both variables are quantitative variables, so each observation is a pair of numbers. This makes graphing bivariate quantitative data a straightforward task, as you can represent data by points on the plane. In order to do this, you need to assign an axis to each variable.

Consider our mall example. You can assign either variable to either axis, but you will usually assign the \(x\)-axis to variables like age and height, which either change at a constant rate or are less likely to change.

On the other hand, variables like weight, the number of bought items, or bottles of water drunk in a week, are more likely to be assigned to the \(y\)-axis.

Figure 2. The age of the customers is on the \(x\)-axis and the number of items bought is on the \(y\)-axis

Note that the ages of customers range between \(11\) and \(67\) years, so the \(x\)-axis is scaled accordingly. Likewise, the \(y\)-axis ranges from \(0\) to \(12\).
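The axis ranges can be read straight from the data. Here is a short sketch using the values from the mall table; the names `ages`, `items`, `x_range`, and `y_range` are chosen here for illustration.

```python
# Mall survey data: the i-th age belongs with the i-th item count.
ages = [12, 36, 21, 24, 15, 23, 45, 67, 11]
items = [0, 4, 12, 5, 2, 7, 2, 1, 1]

# Smallest and largest value of each variable give the range each axis must cover.
x_range = (min(ages), max(ages))
y_range = (min(items), max(items))

print("x-axis (age):", x_range)      # (11, 67)
print("y-axis (items):", y_range)    # (0, 12)
```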

Now that you have the plane labeled in a representative way, it is time to draw a lot of points. Here, each point represents an inquiry.

Figure 3. Scatter plot of the ages of customers vs. the number of items bought at a mall

The graph shown in the previous example is known as a scatter plot, and it is one of the most common ways of displaying bivariate quantitative data.

For more information about these plots, please check out our Scatter Plots article!

Correlation between Two Quantitative Variables

One of the reasons scatter plots are often used to represent bivariate quantitative data is that it is possible to identify patterns in data. Consider the following scatter plot.

Figure 4. Scatter plot of the ages of children between \(4\) and \(12\) years versus their height

From the above scatter plot you might have found a pattern in which, in general, as the ages of children increase, they become taller, which makes perfect sense. In this case, we say that both variables are correlated.

Correlation is a measure of how much two quantitative variables are associated with each other.

It is worth noting that correlation only applies to two quantitative variables. If you are dealing with bivariate data where one or both variables are categorical, then you should not be looking for correlation.

When two variables are correlated, you can draw a straight line that more or less describes how the data behaves. This line is known as the line of best fit, which is obtained by means of linear regression.

Check out our Linear Regression article for more information about this topic!
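For a rough idea of how such a line can be computed, here is a minimal sketch of the usual least-squares formulas for a line \(y = \text{slope} \cdot x + \text{intercept}\). The function name `best_fit_line` and the ages and heights are made up for illustration; they are not the data behind the figures in this article.

```python
def best_fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared deviations in x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up ages (years) and heights (cm) of children, for illustration only.
ages = [4, 6, 8, 10, 12]
heights = [102, 115, 127, 138, 150]

slope, intercept = best_fit_line(ages, heights)
print(f"height = {slope:.2f} * age + {intercept:.2f}")
```

The sketch assumes the \(x\) values are not all equal; otherwise the division by `sxx` fails.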

If two variables are correlated, you expect the change of one to impact the other in a significant way. Because of this, if the variables are correlated then the line of best fit will either be an increasing or a decreasing line.

Figure 5. Graph of strongly correlated bivariate data along with its line of best fit

On the other hand, if two variables are not correlated at all, you should expect the line of best fit to be horizontal, as the change in one variable does not impact the other at all. In this scenario, the data will be scattered all over the graph.

Figure 6. Graph of weakly correlated bivariate data along with its line of best fit

In order to measure how correlated two variables are, you need to look at the Pearson correlation coefficient.

The Pearson correlation coefficient, also known as Pearson's \(r\), or just as correlation coefficient, is a number that ranges between \(-1\) and \(1\), which is used to measure the correlation of bivariate data.

For more information about the correlation coefficient, and how it is obtained, please take a look at our article about Linear Correlation.
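As a quick illustration, the textbook definition \(r = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}\) can be sketched in a few lines. This formula is not stated in the article and is assumed here as the standard one; the function name `pearson_r` is also chosen for illustration. The data is the mall survey from earlier.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired data (standard textbook formula)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Mall survey data: age in years and number of items bought.
ages = [12, 36, 21, 24, 15, 23, 45, 67, 11]
items = [0, 4, 12, 5, 2, 7, 2, 1, 1]

r = pearson_r(ages, items)
print(f"r = {r:.3f}")  # a small negative value, close to 0
```

The result is close to \(0\) even though the table shows a pattern (purchases rise for young adults and fall again for older customers), because that pattern is not a straight line.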

There are some things to keep in mind when talking about correlation, which can be addressed using the correlation coefficient.

Shape of the Correlation

Correlation typically refers to the linear association between two variables, but it is also possible to find that some variables are related by other types of relations, like quadratic or exponential.

Figure 7. A scatter plot with exponential correlation

These other types of correlation will not be discussed further in this article.

The Pearson correlation coefficient only applies to linearly correlated bivariate data!
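A small sketch shows why this warning matters. The data below is made up: it follows \(y = x^2\) exactly, so the variables are perfectly related, yet the numerator of Pearson's \(r\) (the sum of products of deviations from the means, under the standard formula) cancels to zero.

```python
# y = x**2 on x values symmetric around 0: a perfect quadratic relation.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# Numerator of Pearson's r: the sum of products of deviations from the means.
numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

print(numerator)  # 0.0, so r is 0 as well
```

A coefficient of \(0\) here does not mean the variables are unrelated, only that a straight line does not describe the relation.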

Direction of the Correlation

In the previous examples, you have seen how in the case of two correlated variables, as one increases, so does the other. This is a particular type of correlation, called positive correlation.

Two variables are positively correlated if the Pearson correlation coefficient is positive.

When two variables are positively correlated, the line of best fit has a positive slope. However, it is also possible to have negatively correlated variables.

Two variables are negatively correlated if the Pearson correlation coefficient is negative.

Likewise, when two variables are negatively correlated, the line of best fit has a negative slope.
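The link between the sign of \(r\) and the sign of the slope follows from a standard identity of simple linear regression, which is not derived in this article. The symbols \(s_x\) and \(s_y\) are notation assumed here for the sample standard deviations of the two variables. If the line of best fit is written as \(y = a + bx\), its slope is

\[ b = r \, \frac{s_y}{s_x}. \]

Since \(s_x\) and \(s_y\) are positive whenever neither variable is constant, \(b\) and \(r\) always have the same sign.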

Figure 8. A scatter plot of negatively correlated data along with its line of best fit

Remember that negative correlation means that the data is correlated. The word negative is used to address the slope of the line of best fit.

Strength of the Correlation

Sometimes you might notice that a scatter plot strongly resembles a linear graph, while others have data so scattered around that it looks almost as if it was random! The absolute value of the Pearson correlation coefficient will give you insight into this matter.

Let \(r\) be the Pearson correlation coefficient of a set of bivariate quantitative data. The closer \(|r|\) is to \(1\), the stronger the correlation. A Pearson correlation coefficient of exactly \(1\) or \(-1\) means the data is completely linear, which is a scenario so perfect that it is unlikely to happen.

On the other hand, if the Pearson correlation coefficient is close to \(0\), then the data suggests that the variables are not correlated, or are correlated in a weak way.
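To see the strength of a correlation in numbers, here is a sketch comparing two made-up data sets that share the same \(x\) values: one hugs a straight line and the other is widely scattered. The helper `pearson_r` uses the standard textbook formula, which is assumed here since the article does not state it.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired data (standard textbook formula)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys_tight = [2.1, 3.9, 6.2, 7.8, 10.1]   # almost exactly on a line
ys_loose = [6, 1, 9, 3, 8]              # no clear linear trend

r_tight = pearson_r(xs, ys_tight)
r_loose = pearson_r(xs, ys_loose)

print(f"tight: |r| = {abs(r_tight):.3f}")
print(f"loose: |r| = {abs(r_loose):.3f}")
```

The first value comes out very close to \(1\), while the second is much closer to \(0\).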

Examples of Two Quantitative Variables

Here you can look at some examples of scatter plots of two quantitative variables and tell whether they represent correlated data or not.

A survey was made on female adults about their reading habits, obtaining the following scatter plot.

Figure 9. Scatter plot of the average number of books read in a year versus height of female adults

  1. What is this scatter plot about?
  2. Are the variables involved quantitative variables?
  3. What kind of conclusion can you draw from this scatter plot?

Answer:

  1. By looking at the axes of the graph, you can note that the height is on the horizontal axis, while the average number of books read in a year is on the vertical axis. Therefore, this scatter plot is used to study if there is a relation between the heights and the reading habits of female adults.
  2. This scatter plot is relating height, which can be measured, and the average number of books read in a year, which can be counted. Therefore, both variables are quantitative variables.
  3. By looking at the scatter plot you can note how the data is widely spread, with no clear tendency towards a positive or negative correlation. Therefore, the data suggests that there is no linear relation between the height of a female adult and the number of books she reads in a year.

You can find correlations even in your grocery store!

Suppose you are on a diet and you are recommended to avoid added sugar in canned beverages like juice and soda.

You are skeptical about this suggestion, so you decide to study if there is a relationship between added sugar and calories per serving.

In order to do so, you go to the grocery store and check the nutrition facts on \(20\) different products that you would consume. When you get back home, you make the following scatter plot.

Figure 10. Scatter plot of added sugar versus calories per serving on canned beverages

Should you follow the recommendation?

Answer:

By looking at the data you can find that as the amount of added sugar increases, so does the number of calories per serving in these canned beverages.

You can conclude that there is a positive correlation between the amount of added sugar and the calories of canned beverages. Since you are on a diet, you should limit the calories you consume, so you should follow the recommendation.

Two Quantitative Variables - Key takeaways

  • A quantitative variable is a variable that can be measured with units.
  • Bivariate quantitative data is data that is given as pairs of quantitative variables.
  • Bivariate quantitative data is represented on a plane. Each dot in the plane corresponds to an inquiry.
  • Scatter plots are used to represent bivariate quantitative data.
  • A line of best fit can be used in a scatter plot to represent the tendency of the data.
    • If the slope of a line of best fit is positive, the data is positively correlated.
    • If the slope of a line of best fit is negative, the data is negatively correlated.
  • If the data of a scatter plot is widely spread around the line of best fit, then the data is weakly correlated.

How would you test the relationship between two quantitative variables?

A scatter plot is the most useful display technique for comparing two quantitative variables. We plot the variable we consider the response variable on the y-axis, and we place the explanatory or predictor variable on the x-axis.

What is a quantitative relationship between variables?

Correlation measures the linear relationship between two quantitative variables. Correlation is possible when we have bivariate data. In other words, when the subjects in our dataset have scores on two separate quantitative variables, we have bivariate data.