Statistics and concepts to describe dispersion of law department data

Let’s take an example and look at some tools. A law department knows the number of matters each of its law firms handled during a year. The easiest ways to describe that data include the average, median, and mode of matters per firm (see my post of Nov. 30, 2005 for definitions of those terms). A scatter-gram and its trend line, once you sort the data, depict another aspect of the data’s dispersion (see my post of June 6, 2006 about scatter-grams and trend lines). Here are three more concepts regarding how to show the dispersion of such a data set: central tendency, shape and variance.
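As a minimal sketch of those three basic statistics, the Python snippet below uses the standard library's statistics module. The matter counts are invented for illustration and do not come from any real law department.

```python
from statistics import mean, median, multimode

# Hypothetical matter counts, one number per law firm (invented for illustration).
matters_per_firm = [1, 1, 2, 2, 2, 3, 4, 5, 8, 12, 20, 40]

average = mean(matters_per_firm)           # total matters divided by number of firms
middle = median(matters_per_firm)          # half the firms fall at or below this figure
most_common = multimode(matters_per_firm)  # list of the most frequent count(s)

print(f"average={average:.2f}  median={middle}  mode={most_common}")
```

With a few heavily used firms in the list, the average sits well above the median, which is a first hint that the data are lopsided.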

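The minimum, maximum, and range of matter numbers per firm mark out the extent of the data. The minimum might be one; the maximum could be scores of matters; and the range is the difference between those two numbers. Strictly speaking, those three figures describe spread rather than central tendency. To depict central tendency while muting the effect of outliers, you can use a truncated mean: discard the lowest and the highest group of matter numbers, then calculate the average of the remaining numbers. The inter-quartile mean is the truncated mean that discards the lowest and highest quartiles; other versions discard the lowest and highest quintiles or some other fraction.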
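The figures in the paragraph above can be sketched in a few lines of Python. The data are invented, and the trimming rule here is a simplification that assumes the number of firms divides evenly by four; a strict inter-quartile mean gives partial weight to the boundary values when it does not.

```python
# Hypothetical matter counts, one number per law firm (invented for illustration).
matters_per_firm = [1, 1, 2, 2, 2, 3, 4, 5, 8, 12, 20, 40]

lowest = min(matters_per_firm)
highest = max(matters_per_firm)
spread = highest - lowest  # the range

# Simplified inter-quartile mean: sort, drop the bottom and top quarter of
# the firms, then average what remains.
ordered = sorted(matters_per_firm)
drop = len(ordered) // 4
kept = ordered[drop:len(ordered) - drop]
iq_mean = sum(kept) / len(kept)

print(f"min={lowest}  max={highest}  range={spread}  inter-quartile mean={iq_mean}")
```

Dropping the three busiest and three least-used firms leaves a mean that one or two unusually heavy users cannot pull upward.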

A histogram shows the “shape” of the data: each column covers a range of matter counts, and its height shows how many firms fall in that range. A histogram looks like a single-humped mountain with its peak at the mode if the distribution is reasonably normal (see my post of Oct. 24, 2005 on bell curves and the standard distribution). You can go further and show the percentage of the total at each stage of a cumulative presentation. Skewness is a measure of the asymmetry of the distribution of data. Roughly speaking, a distribution has positive skew (right-skewed) if the higher-figure tail is longer and negative skew (left-skewed) if the lower-figure tail is longer. Kurtosis is traditionally described as a measure of the “peakedness” of the data’s distribution; it also reflects how heavy the tails are compared to a normal distribution.
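Here is a rough sketch of those shape measures in plain Python. The data and the bin width of ten matters are assumptions made for illustration. Skewness and kurtosis are computed with the simple moment-based formulas, which is only one of several conventions; statistical packages often apply small-sample corrections and so may report slightly different figures.

```python
from collections import Counter

# Hypothetical matter counts, one number per law firm (invented for illustration).
matters_per_firm = [1, 1, 2, 2, 2, 3, 4, 5, 8, 12, 20, 40]

# Histogram: count the firms in each band of matter counts.
bin_width = 10  # assumed band size: 0-9, 10-19, and so on
counts = Counter((m // bin_width) * bin_width for m in matters_per_firm)

n = len(matters_per_firm)
running = 0
cumulative_pct = {}
for start in range(0, max(matters_per_firm) + 1, bin_width):
    running += counts[start]  # a Counter returns 0 for an empty band
    cumulative_pct[start] = 100 * running / n
    bar = "#" * counts[start]
    print(f"{start:>2}-{start + bin_width - 1:<2} {bar:<10} {cumulative_pct[start]:5.1f}%")

# Moment-based skewness and excess kurtosis (population formulas).
avg = sum(matters_per_firm) / n

def central_moment(k):
    """Average of the k-th power of each deviation from the mean."""
    return sum((m - avg) ** k for m in matters_per_firm) / n

skewness = central_moment(3) / central_moment(2) ** 1.5
excess_kurtosis = central_moment(4) / central_moment(2) ** 2 - 3  # 0 for a normal curve

print(f"skewness={skewness:.2f}  excess kurtosis={excess_kurtosis:.2f}")
```

For data like these, where most firms handle a handful of matters and a few handle many, the skewness comes out positive, matching the long higher-figure tail in the histogram.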

Variance is a measure of statistical dispersion, indicating how far from the expected value the actual values typically are (see my post of Nov. 13, 2005 on fractals and standard deviations.). It is the average of the squared deviations from the mean, and it equals the square of the standard deviation. Other measures of dispersion include the range, the inter-quartile range, and the absolute deviation. For absolute deviations, the point from which the deviation is measured is typically either the median or the mean of the data set. The average absolute deviation of a data set is the average of those absolute deviations and is a summary statistic of statistical dispersion.
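To make the dispersion measures concrete, here is a minimal Python sketch using the standard library. The data are invented, and the sketch assumes the list covers every firm the department used, so it applies the population formulas (pvariance and pstdev) rather than the sample versions.

```python
from statistics import mean, median, pstdev, pvariance

# Hypothetical matter counts, one number per law firm (invented for illustration).
matters_per_firm = [1, 1, 2, 2, 2, 3, 4, 5, 8, 12, 20, 40]

variance = pvariance(matters_per_firm)  # average squared deviation from the mean
std_dev = pstdev(matters_per_firm)      # square root of the variance

# Average absolute deviation, measured from the mean and from the median.
center_mean = mean(matters_per_firm)
center_median = median(matters_per_firm)
aad_from_mean = mean(abs(m - center_mean) for m in matters_per_firm)
aad_from_median = mean(abs(m - center_median) for m in matters_per_firm)

print(f"variance={variance:.1f}  standard deviation={std_dev:.1f}")
print(f"avg abs deviation from mean={aad_from_mean:.2f}  from median={aad_from_median:.2f}")
```

The standard deviation and the average absolute deviations are in the same units as the data (matters per firm), which makes them easier to explain than the variance, whose units are matters squared.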