Measures of variability in data, Part 2: standard deviations

Earlier I explained how to calculate and use statistical variance (see my post of Feb. 18, 2011: variance in statistics). I used as an example patent applications handled by a group of law firms. Variance figures aren’t natural because they are expressed in squared units. To solve that problem, statisticians compute the square root of the variance, easily done on a spreadsheet, which translates it back to the everyday unit: patents prosecuted. In statistical-speak, the standard deviation, as it is called, is simply the square root of the variance.
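For readers who prefer a script to a spreadsheet, here is a minimal sketch of that square-root step in Python. The patent counts are made up for illustration (they are not the figures from the earlier post), and I assume the ten firms are the whole population of interest, so the code uses the population variance rather than the sample variance.

```python
import math
import statistics

# Hypothetical patent counts for ten firms (illustrative only).
patents = [12, 15, 9, 22, 18, 7, 14, 11, 20, 16]

mean = statistics.mean(patents)

# Population variance: the average squared distance from the mean.
# Its unit is "patents squared", which is why it feels unnatural.
variance = statistics.pvariance(patents)

# The standard deviation is the square root of the variance,
# which brings the figure back to the everyday unit: patents.
std_dev = math.sqrt(variance)

print(f"mean = {mean:.1f} patents")
print(f"variance = {variance:.2f} patents squared")
print(f"standard deviation = {std_dev:.2f} patents")
```

If the ten firms were instead a sample drawn from a larger group, statistics.variance and statistics.stdev (which divide by n - 1) would be the more usual choice.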

Standard deviation is the most common statistic for describing how spread out the values in a data set are. Assuming a normal distribution of patents worked on, roughly 68.2 percent of the firms will lie within one standard deviation on either side of the average number handled. One more standard deviation on either side accounts for roughly 27.2 percent more of the firms; thus, roughly 95.4 percent of all the firms will fall within two standard deviations of the average. With only ten firms, the distribution may be lumpy, but if you were to imagine data on 100 patent firms, the familiar bell curve would start to be distinguishable.
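The percentages for a normal distribution can be checked with a few lines of Python. This is only a sketch: the helper name share_within is my own, and it relies on the standard result that the share of a normal distribution lying within k standard deviations of the mean equals erf(k / sqrt(2)).

```python
import math

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

within_one = share_within(1)
within_two = share_within(2)

# Extra share picked up by moving from one to two standard deviations.
added_by_second = within_two - within_one

print(f"within one standard deviation:  {within_one:.1%}")
print(f"added by the second:            {added_by_second:.1%}")
print(f"within two standard deviations: {within_two:.1%}")
```

The computed shares land within a small rounding difference of the figures quoted above. They hold only to the extent that the data really are close to normally distributed, which a set of ten firms may not be.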

Among other uses, as we shall see in a later post, standard deviations allow a general counsel to compare the degree of dispersion between two unrelated sets of data. For example, how do you compare the spread in the output of a law department’s patent firms with the spread in the amounts of the invoices received from those firms?