Covariance Matrix

4.3 Covariance

Variance and SD are purely one-dimensional measures. Data sets like this could be the heights of all the people in the room, the marks for the last CSC378 exam, and so on. However, many data sets have more than one dimension, and the aim of the statistical analysis of these data sets is usually to see whether there is any relationship between the dimensions. For example, our data set might contain both the height of each student in a class and the mark they received for that paper. We could then perform statistical analysis to see whether the height of a student has any effect on their mark. It is useful to have a measure of how much the dimensions vary from their means with respect to each other.

Covariance is such a measure. It is always measured between two dimensions. If we calculate the covariance between one dimension and itself, we get the variance. So if we had a three-dimensional data set (x, y, z), we could measure the covariance between the x and y dimensions, the x and z dimensions, and the y and z dimensions. Measuring the covariance between x and x, y and y, or z and z would give us the variances of the x, y and z dimensions respectively.

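The formula for covariance is very similar to the formula for variance: where the variance averages the squared deviation of one dimension from its mean, the covariance averages the product of the deviations of two dimensions from their respective means.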
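For reference, here are the two formulas side by side. They are written in the common sample form with an $n-1$ denominator, which is an assumption here because some texts divide by $n$ instead. $\bar{X}$ and $\bar{Y}$ are the means of the $X$ and $Y$ dimensions, and $n$ is the number of data points:

```latex
\operatorname{var}(X) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})}{n - 1}
\qquad
\operatorname{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}
```

The only difference is that the second factor in each product uses the deviation of $Y$ rather than repeating the deviation of $X$. Setting $Y = X$ therefore turns the covariance formula back into the variance formula.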

How does this work? Let's use some example data. Imagine we have gone out into the world and collected some two-dimensional data. Say we have asked a group of students how many hours in total they spent studying CSC309, and the mark that they received. So we have two dimensions: the first is the H dimension, the hours studied, and the second is the M dimension, the mark received.

So what does the covariance between H and M tell us? The exact value is not as important as its sign (i.e. positive or negative), because the magnitude depends on the units in which each dimension is measured. If the value is positive, then both dimensions increase together, meaning that, in general, as the number of hours of study increased, so did the final mark.

If the value is negative, then as one dimension increases the other decreases. If we had ended up with a negative covariance, that would mean the opposite: as the number of hours of study increased, the final mark decreased.

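Finally, if the covariance is zero, there is no linear relationship between the two dimensions (they are uncorrelated). This does not necessarily mean they are independent, because a non-linear relationship can still produce a covariance of zero.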

4.4 The Covariance Matrix

A useful way to get all the possible covariance values between all the different dimensions is to calculate them all and put them in a matrix. As an example, we will make up the covariance matrix for an imaginary three-dimensional data set, using the usual dimensions x, y and z. The covariance matrix then has 3 rows and 3 columns, and its values are:

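$$
C = \begin{pmatrix}
\operatorname{cov}(x,x) & \operatorname{cov}(x,y) & \operatorname{cov}(x,z) \\
\operatorname{cov}(y,x) & \operatorname{cov}(y,y) & \operatorname{cov}(y,z) \\
\operatorname{cov}(z,x) & \operatorname{cov}(z,y) & \operatorname{cov}(z,z)
\end{pmatrix}
$$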

Points to note: down the main diagonal, each covariance value is between one of the dimensions and itself, so these are the variances for that dimension. Also, since cov(a,b) = cov(b,a), the matrix is symmetric about the main diagonal.