Standard deviation is the most common statistic for describing how spread out the values in a data set are. (OK, if you insist: standard deviation is the square root of the data’s variance.)
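For readers who want to see that parenthetical in action, here is a minimal sketch in Python. The invoice amounts are made up for illustration, and I assume the population form of the variance (dividing by the number of values); the sample form divides by one less.

```python
import math
import statistics

# Hypothetical invoice amounts in dollars (invented for illustration only).
invoices = [1200.0, 1500.0, 1800.0, 2100.0, 2400.0]

# Population variance: the average squared distance from the mean.
variance = statistics.pvariance(invoices)

# Standard deviation is simply the square root of that variance.
std_dev = math.sqrt(variance)

print(f"variance: {variance:.1f}")
print(f"standard deviation: {std_dev:.1f}")
```

The standard library's `statistics.pstdev` does both steps in one call and should give the same figure.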

Standard deviations allow a person to compare the degree of dispersion between two unrelated sets of numbers. For example, how do you compare the distribution of a law department’s invoices to the sizes of its law firms in the same terms? Assuming a normal distribution of invoices, about 68.2 percent of them will lie within one standard deviation on either side of the average invoice amount. One more standard deviation on either side accounts for about 27.2 percent more of the invoices; thus, about 95.4 percent of all the invoices will fall within two standard deviations on either side of the average amount.
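Those percentages come straight from the normal curve, and they can be checked with a short sketch. It uses `NormalDist` from Python's standard library; the helper name `share_within` is my own, and the result holds only to the extent the invoices really are normally distributed.

```python
from statistics import NormalDist

# The standard normal curve: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)


def share_within(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


within_one = share_within(1.0)
within_two = share_within(2.0)
added_by_second = within_two - within_one

print(f"within one standard deviation:  {within_one:.1%}")
print(f"added by the second:            {added_by_second:.1%}")
print(f"within two standard deviations: {within_two:.1%}")
```

The exact values are a hair above the figures quoted here, which are the commonly cited approximations.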

If the law department also graphs the number of lawyers in each of the firms that represented it over several years, assuming a statistically normal distribution, the department could describe their sizes in terms of standard deviations.

The department could then show similar extremes, whether an invoice amount or a law firm size, even though the two sets of numbers have no relation. (See my posts that refer to standard deviation: Oct. 24, 2005 on the bell curve; Jan. 20, 2006 on Bayesian statistics; May 31, 2006 on the value of statistical acumen; and June 30, 2006 on statistics to describe dispersion.)
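One way to put unrelated numbers in the same terms is to express each value as its distance from the average, counted in standard deviations (statisticians call this a z-score). The sketch below is only an illustration: both data sets are invented, the function name `z_score` is mine, and I again assume the population standard deviation.

```python
import statistics

# Hypothetical data, invented for illustration only.
invoice_amounts = [1200, 1500, 1800, 2100, 2400, 6000]  # dollars
firm_sizes = [10, 25, 40, 60, 80, 400]                   # lawyers per firm


def z_score(value: float, data: list) -> float:
    """How many standard deviations `value` lies from the mean of `data`."""
    mean = statistics.mean(data)
    std_dev = statistics.pstdev(data)  # population standard deviation
    return (value - mean) / std_dev


# The largest invoice and the largest firm, each in standard-deviation terms.
z_invoice = z_score(max(invoice_amounts), invoice_amounts)
z_firm = z_score(max(firm_sizes), firm_sizes)

print(f"largest invoice: {z_invoice:.2f} standard deviations above average")
print(f"largest firm:    {z_firm:.2f} standard deviations above average")
```

Dollars and headcounts cannot be compared directly, but the two z-scores can: each says how unusual its value is within its own set.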
