Variance

4.2 Variance

The variance is a measure of the spread of data. Statisticians usually work with a sample taken from a population. To use election polls as an example, the population is all the people in the country, whereas a sample is the subset of the population that the statisticians actually measure. The great thing about statistics is that by measuring only a sample of the population, we can estimate what the measurement would most likely be if we had measured the entire population.

Let's take an example:

X = [1 2 4 6 12 25 45 68 67 65 98]

We can use the symbol X to refer to this entire set of numbers. To refer to an individual number in the data set, we put a subscript on the symbol X, so that X_i is the i-th number in the set. There are a number of things that we can calculate about a data set. For example, we can calculate the mean of the sample. It is given by the formula:

mean = (sum of all numbers) / (count of numbers)
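As a quick illustration, here is a minimal Python sketch that applies this formula to the sample X above. The function name `mean` and the variable `sample_mean` are chosen only for this example. Note that Python counts list positions from 0, so `X[0]` is the first number in the set.

```python
# The sample from the text.
X = [1, 2, 4, 6, 12, 25, 45, 68, 67, 65, 98]


def mean(values):
    """Return the arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)


sample_mean = mean(X)

print("first number:", X[0])      # an individual number, picked out by its index
print("sum:", sum(X))             # sum of all numbers
print("count:", len(X))           # count of numbers
print("mean:", sample_mean)       # sum divided by count
```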

Unfortunately, the mean doesn't tell us much about the data beyond a sort of middle point. For example, these two data sets have exactly the same mean (10), but are obviously quite different:

[0 8 12 20] and [8 9 11 12]

So what is different about these two sets? It is the spread of the data that is different. The variance is a measure of how spread out the data is. It is closely related to the standard deviation: the variance is simply the square of the standard deviation.
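To make the difference concrete, the following Python sketch compares the two data sets using the standard library's `statistics` module. It assumes we treat each set as a sample, so `statistics.variance` and `statistics.stdev` divide by n - 1. The names `a`, `b` and `summarize` are chosen only for this example.

```python
import statistics

# The two data sets from the text: same mean, different spread.
a = [0, 8, 12, 20]
b = [8, 9, 11, 12]


def summarize(data):
    """Return (mean, sample variance, sample standard deviation) for a data set."""
    m = statistics.mean(data)
    var = statistics.variance(data)  # sample variance: divides by n - 1
    sd = statistics.stdev(data)      # square root of the sample variance
    return m, var, sd


for name, data in (("a", a), ("b", b)):
    m, var, sd = summarize(data)
    print(f"{name}: mean={m}, variance={var:.3f}, sd={sd:.3f}")
```

Both sets report a mean of 10, but the first has a much larger variance and standard deviation than the second, which matches the visual impression that its values lie further from the middle.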

The standard deviation (SD) is, roughly speaking, "the average distance from the mean of the data set to a point". To calculate it, compute the square of the distance from each data point to the mean of the set, add them all up, divide by n-1 (where n is the number of data points), and take the positive square root. As a formula: