# Covariance vs Correlation: What’s the difference?

In statistics, covariance and correlation are two closely related measures. Both are used to describe the relationship between two variables. This blog explains how covariance and correlation differ. Let’s get started!

**Introduction**

Covariance and correlation are two mathematical concepts used in probability theory and statistics. Both terms describe how two variables relate to each other: covariance measures how two variables change together, while correlation measures how strongly, and in which direction, they are linearly related. The two terms are very similar, so what is the difference between them? Let’s understand this by going through each of these terms.

Covariance can be positive, negative, or zero. A positive covariance means that the two variables tend to increase or decrease together. A negative covariance means that the two variables tend to move in opposite directions. A zero covariance means that there is no linear relationship between the two variables.

Correlation is calculated as the covariance of the two variables divided by the product of their standard deviations, so it can only be between -1 and 1. A correlation of -1 means that the two variables are perfectly negatively correlated: as one variable increases, the other decreases along a straight line. A correlation of 1 means that the two variables are perfectly positively correlated: as one variable increases, the other also increases along a straight line. A correlation of 0 means that the two variables have no linear relationship.

**Contributed by: Deepak Gupta**

In statistics, we frequently come across the terms covariance and correlation, and the two are often used interchangeably. Both are used to assess the linear relationship and measure the dependency between two random variables. But are they the same? **Not really.**

Covariance describes how two variables vary together, whereas correlation describes how strongly, and in which direction, a change in one variable is associated with a change in the other. Neither measure by itself shows that one variable causes the other to change.

In this article, we will define covariance and correlation, compare the two, and look at the applications of both.

**What is covariance?**

Covariance signifies the direction of the linear relationship between two variables. By direction we mean whether the variables tend to move in the same direction or in opposite directions. (An increase in the value of one variable might be accompanied by an increase or a decrease in the value of the other variable.)

Covariance can take any value from -∞ to +∞. It is also important to mention that covariance only measures how two variables change together, not the dependency of one variable on the other.

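The covariance between two variables is obtained by summing the products of the deviations of each variable from its mean and, for a sample, dividing by the number of observations minus one:

Cov(x, y) = Σ (xi – x̅)(yi – ȳ) / (n – 1)

Here x̅ and ȳ are the means of x and y, and n is the number of paired observations. For a whole population, the sum is divided by N, the population size, instead of n – 1.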

The upper and lower limits for the covariance depend on the variances of the variables involved. These variances, in turn, can vary with the scaling of the variables. Even a change in the units of measurement can change the covariance. Thus, covariance is only useful for finding the direction of the relationship between two variables, not its magnitude.
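The effect of units on covariance can be seen in a small Python sketch. The helper `sample_covariance` and the height and weight figures are invented for illustration (they are not data from this article), and the helper assumes the sample form of covariance with an n – 1 divisor.

```python
from statistics import mean


def sample_covariance(xs, ys):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x, mean_y = mean(xs), mean(ys)
    products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)


# Hypothetical measurements, made up for this illustration.
heights_m = [1.50, 1.60, 1.70, 1.80]    # metres
weights_kg = [50.0, 58.0, 66.0, 74.0]   # kilograms

# The same heights expressed in centimetres.
heights_cm = [h * 100 for h in heights_m]

cov_m = sample_covariance(heights_m, weights_kg)
cov_cm = sample_covariance(heights_cm, weights_kg)

print(f"covariance with heights in metres:      {cov_m:.4f}")
print(f"covariance with heights in centimetres: {cov_cm:.4f}")
```

Both results are positive, so the direction of the relationship is unchanged, but the second value is 100 times the first purely because the unit of height changed.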

**Example:**

Consider the paired data X = 10, 12, 14, 8 and Y = 40, 48, 56, 32.

**Step 1: Calculate the mean of X and Y**

Mean of X (x̅) = (10 + 12 + 14 + 8) / 4 = 11

Mean of Y (ȳ) = (40 + 48 + 56 + 32) / 4 = 44

**Step 2: Calculate the deviations from the means**

| xi – x̅ | yi – ȳ |
| --- | --- |
| 10 – 11 = -1 | 40 – 44 = -4 |
| 12 – 11 = 1 | 48 – 44 = 4 |
| 14 – 11 = 3 | 56 – 44 = 12 |
| 8 – 11 = -3 | 32 – 44 = -12 |

**Step 3: Substitute the above values in the formula**

Cov(x, y) = [(-1)(-4) + (1)(4) + (3)(12) + (-3)(-12)] / (4 – 1)

Cov(x, y) = (4 + 4 + 36 + 36) / 3 = 80 / 3 ≈ **26.67**

**Hence, the sample covariance for the above data is approximately 26.67.** Dividing by n = 4 instead of n – 1 = 3 gives the population covariance, 80 / 4 = 20.
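The arithmetic for the data in the example above (X = 10, 12, 14, 8 and Y = 40, 48, 56, 32) can be checked with a few lines of Python. This is only a sketch that mirrors the hand calculation; the variable names are illustrative, and it reports both the n – 1 (sample) and n (population) divisors so the two conventions can be compared.

```python
xs = [10, 12, 14, 8]
ys = [40, 48, 56, 32]
n = len(xs)

# Step 1: means of X and Y.
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Step 2: deviations of each value from its mean.
dev_x = [x - mean_x for x in xs]
dev_y = [y - mean_y for y in ys]

# Step 3: sum the products of paired deviations, then divide.
sum_of_products = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
sample_cov = sum_of_products / (n - 1)   # sample covariance
population_cov = sum_of_products / n     # population covariance

print("means:", mean_x, mean_y)
print("sum of products of deviations:", sum_of_products)
print("sample covariance (n - 1):", round(sample_cov, 2))
print("population covariance (n):", population_cov)
```

The sum of the products of deviations is 80, so the result is 80 / 3 with the sample divisor and 80 / 4 with the population divisor.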

**What is correlation?**

Correlation analysis is a method of statistical evaluation used to study the strength of the relationship between two numerically measured, continuous variables.

It shows not only the kind of relationship (in terms of direction) but also how strong the relationship is. Correlation values are standardized, whereas covariance values are not, so covariance cannot be used to compare how strong or weak a relationship is: its magnitude has no direct significance. Correlation can assume values from -1 to +1.

To determine whether the covariance of the two variables is large or small, we need to assess it relative to the standard deviations of the two variables.

To do so, we normalize the covariance by dividing it by the product of the standard deviations of the two variables, which gives the correlation between the two variables:

Correlation(x, y) = Cov(x, y) / (σx × σy)

The main result of a correlation is called the correlation coefficient.

The correlation coefficient is a dimensionless metric and its value ranges from -1 to +1.

The closer it is to +1 or -1, the more closely the two variables are linearly related.

If two variables are independent, their correlation coefficient is 0 (in a sample it will be close to 0). The converse does not hold: if the coefficient is 0, we can only say that there is no linear relationship. There could exist other functional relationships between the variables.
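A short Python sketch makes this point concrete. The data below are made up for illustration: y is fully determined by x (y = x²), yet the Pearson correlation coefficient is zero because the relationship is not linear. The helper name `pearson_r` is an assumption of this sketch, not something defined in the article.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # The n - 1 divisors of covariance and variance cancel out here.
    return sxy / sqrt(sxx * syy)


# Illustrative data: y depends entirely on x, but not linearly.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print("correlation between x and x squared:", r)
```

The positive and negative products of deviations cancel exactly, so the coefficient is 0 even though knowing x tells us y with certainty.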

When the correlation coefficient is positive, an increase in one variable tends to be accompanied by an increase in the other. When the correlation coefficient is negative, the two variables tend to change in opposite directions.

**Example:**

We use the same data as in the covariance example: X = 10, 12, 14, 8 and Y = 40, 48, 56, 32.

**Step 1: Calculate the mean of X and Y**

Mean of X (x̅) = (10 + 12 + 14 + 8) / 4 = 11

Mean of Y (ȳ) = (40 + 48 + 56 + 32) / 4 = 44

**Step 2: Calculate the covariance**

| xi – x̅ | yi – ȳ |
| --- | --- |
| 10 – 11 = -1 | 40 – 44 = -4 |
| 12 – 11 = 1 | 48 – 44 = 4 |
| 14 – 11 = 3 | 56 – 44 = 12 |
| 8 – 11 = -3 | 32 – 44 = -12 |

Cov(x, y) = [(-1)(-4) + (1)(4) + (3)(12) + (-3)(-12)] / (4 – 1) = 80 / 3 ≈ **26.67**

**Step 3: Find the standard deviations of X and Y**

For X, square each deviation from the mean and sum the squares:

1 + 1 + 9 + 9 = 20

Divide the sum of squares by n – 1, that is 4 – 1 = 3, to get the variance:

20 / 3 ≈ 6.67

Take the square root of the variance:

√6.67 ≈ 2.582

**Therefore, the standard deviation of X ≈ 2.582**

Using the same method for Y: 16 + 16 + 144 + 144 = 320, then 320 / 3 ≈ 106.67, and √106.67 ≈ 10.328.

**Therefore, the standard deviation of Y ≈ 10.328**

**Step 4: Substitute the values in the correlation formula**

Correlation = 26.67 / (2.582 × 10.328) = 26.67 / 26.67

**Correlation = 1**

The correlation is exactly 1 because every Y value is 4 times the corresponding X value, so the points lie on a straight line with a positive slope.

So, now you can see the difference between covariance and correlation in practice.
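The correlation for the example data (X = 10, 12, 14, 8 and Y = 40, 48, 56, 32) can also be checked in Python. This is a minimal sketch using the standard library's `statistics.stdev`, which computes the sample standard deviation (n – 1 divisor); the covariance is computed by hand with the same divisor so that the two are consistent.

```python
from statistics import mean, stdev

xs = [10, 12, 14, 8]
ys = [40, 48, 56, 32]
n = len(xs)

mean_x, mean_y = mean(xs), mean(ys)

# Sample covariance (n - 1 divisor), matching statistics.stdev.
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

sd_x = stdev(xs)   # sample standard deviation of X
sd_y = stdev(ys)   # sample standard deviation of Y

# Correlation = covariance / (product of standard deviations).
r = cov_xy / (sd_x * sd_y)

print(f"covariance:  {cov_xy:.2f}")
print(f"std dev (X): {sd_x:.3f}")
print(f"std dev (Y): {sd_y:.3f}")
print(f"correlation: {r:.3f}")
```

Because each Y value is exactly 4 times the matching X value, the correlation comes out as 1 (up to floating-point rounding), while the covariance of about 26.67 would change if either variable were rescaled.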

**Applications of covariance**

- Covariance is used in biology, in genetics and molecular biology, to measure certain DNAs.
- In financial markets, covariance between assets is used to decide how much to invest in different assets.
- Covariance is widely used to collate data obtained from astronomical and oceanographic studies to arrive at final conclusions.
- In statistics, the covariance matrix is used in principal component analysis to analyze a data set.
- It is also used to study signals obtained in various forms.

**Applications of correlation**

- Relating the time and the money spent by a customer on e-commerce websites
- Comparing previous weather forecast records with those of the current year
- Pattern recognition, where correlation is widely used
- Analyzing the rise in temperature during summer against water consumption among family members
- Gauging the relationship between population and poverty

**Methods of calculating the correlation**

- The graphic method
- The scatter diagram method
- Correlation table
- Karl Pearson’s coefficient of correlation
- Coefficient of concurrent deviation
- Spearman’s rank correlation coefficient
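Two of the methods listed above, Karl Pearson's coefficient and Spearman's rank correlation coefficient, can be contrasted in a short Python sketch. The data are made up for illustration, the helper names are assumptions of this sketch, and the simple `ranks` helper assumes there are no tied values (ties would need average ranks, which are not handled here).

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Rank values from 1 (smallest) upward. Assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(xs, ys):
    """Spearman's coefficient: the Pearson coefficient of the ranks."""
    return pearson_r(ranks(xs), ranks(ys))


# Illustrative data: y always increases with x, but not along a line.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]

pearson = pearson_r(xs, ys)
spearman = spearman_rho(xs, ys)

print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```

Here Spearman's coefficient is 1 because the ordering of the two variables agrees perfectly, while Pearson's coefficient is below 1 because the points do not lie on a straight line.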

Before going into the details, let us first try to understand variance and standard deviation.

**Variance**

Variance is the expectation of the squared deviation of a random variable from its mean. Informally, it measures how far a set of numbers is spread out from its average value.

**Standard Deviation**

Standard deviation is a measure of the amount of variation or dispersion of a set of values. A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range. It essentially measures the absolute variability of a random variable.
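Both quantities are available in Python's standard `statistics` module. The sketch below uses the X values from this article's worked examples and shows the sample versions (`variance`, `stdev`, which divide by n – 1) next to the population versions (`pvariance`, `pstdev`, which divide by n).

```python
from statistics import pstdev, pvariance, stdev, variance

data = [10, 12, 14, 8]   # X values from the worked examples

sample_var = variance(data)        # sum of squared deviations / (n - 1)
population_var = pvariance(data)   # sum of squared deviations / n

sample_sd = stdev(data)            # square root of the sample variance
population_sd = pstdev(data)       # square root of the population variance

print(f"sample variance:     {sample_var:.3f}")
print(f"population variance: {population_var:.3f}")
print(f"sample std dev:      {sample_sd:.3f}")
print(f"population std dev:  {population_sd:.3f}")
```

The sum of squared deviations is 20 in both cases; only the divisor differs, which is why the sample figures are slightly larger than the population figures.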

Covariance and correlation are related to each other: covariance indicates the direction of the linear relationship between two variables, while correlation indicates both the direction and the strength of that relationship.

**Differences between Covariance and Correlation**

Both the Covariance and Correlation metrics evaluate two variables throughout the entire domain and not on a single value. The differences between them are summarized in a tabular form for quick reference. Let us look at Covariance vs Correlation.

| Covariance | Correlation |
| --- | --- |
| Covariance is a measure of the extent to which two random variables change in tandem. | Correlation is a measure of how strongly two random variables are related to each other. |
| Covariance is an unscaled measure of how two variables vary together. | Correlation is the scaled form of covariance. |
| Covariance indicates the direction of the linear relationship between variables. | Correlation measures both the strength and the direction of the linear relationship between two variables. |
| Covariance can vary between -∞ and +∞. | Correlation ranges between -1 and +1. |
| Covariance is affected by a change in scale. If all the values of one variable are multiplied by a constant and all the values of the other variable are multiplied by the same or a different constant, the covariance changes. | Correlation is not influenced by a change in scale (multiplying one variable by a negative constant only flips its sign). |
| Covariance takes its units from the product of the units of the two variables. | Correlation is dimensionless, i.e. it is a unit-free measure of the relationship between variables. |
| Covariance of two dependent variables measures how much, in real quantities (e.g. cm, kg, liters), they co-vary on average. | Correlation of two dependent variables measures the proportion of how much, on average, these variables vary with respect to one another. |
| Covariance is zero for independent variables, because such variables do not tend to move together. | Independent movements do not contribute to the total correlation. Therefore, completely independent variables have zero correlation. |
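The scale row of this comparison can be illustrated with a hedged Python sketch. The temperature and water-consumption figures are invented for illustration, and the helper names are assumptions of this sketch. Converting temperatures from Celsius to Fahrenheit changes both the location and the scale of one variable: the covariance changes, while the correlation stays the same.

```python
from statistics import mean, stdev


def sample_covariance(xs, ys):
    """Sample covariance with an n - 1 divisor."""
    mean_x, mean_y = mean(xs), mean(ys)
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


def correlation(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    return sample_covariance(xs, ys) / (stdev(xs) * stdev(ys))


# Hypothetical daily readings, made up for this illustration.
temps_c = [20.0, 25.0, 30.0, 35.0]         # degrees Celsius
water_l = [110.0, 135.0, 150.0, 190.0]     # litres of water consumed

# Same temperatures in Fahrenheit: a change of location and scale.
temps_f = [c * 9 / 5 + 32 for c in temps_c]

cov_c = sample_covariance(temps_c, water_l)
cov_f = sample_covariance(temps_f, water_l)
corr_c = correlation(temps_c, water_l)
corr_f = correlation(temps_f, water_l)

print(f"covariance  (Celsius):    {cov_c:.2f}")
print(f"covariance  (Fahrenheit): {cov_f:.2f}")
print(f"correlation (Celsius):    {corr_c:.4f}")
print(f"correlation (Fahrenheit): {corr_f:.4f}")
```

The covariance grows by the factor 9/5 used in the unit conversion, whereas the two correlation values agree, which is what makes correlation suitable for comparisons across different units.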

**Conclusion**

Correlation and covariance are very closely related to each other, and yet they differ a lot.

When it comes to choosing between covariance and correlation, the latter is usually the first choice because it remains unaffected by changes in dimensions, location, and scale, and can also be used to compare two pairs of variables. Since it is limited to a range of -1 to +1, it is useful for drawing comparisons between variables across domains. However, an important limitation is that both concepts measure only linear relationships.

Source : https://www.mygreatlearning.com/blog/artificial-intelligence/