Spring Release of GC Metrics published; take part now to get the Summer Release!

We sent out the Spring Release last week. It provides benchmark data on staffing and spending from 286 companies. The release shows medians on six fundamental benchmarks, such as total legal spend as a percentage of revenue, as well as a range of other results.

If you would like to get a copy of the Spring Release, complete the confidential online survey here (https://novisurvey.net/n/GCM2014.aspx). Aside from some demographic questions such as name, email and industry, the no-cost survey asks for your 2013 number of lawyers, paralegals, and other staff; internal and external legal spend; and revenue. Participants will receive the Summer Release in August.

Plots with All CAPITAL LETTER axis labels, locations at the left and top and no title

The graphical visualization skills of the New York Times leave me envious. For that reason, a plot in the sports section on August 12, 2015 regarding aces by male tennis players caught my eye. Not having the data available to the Times, I sort-of re-created the plot below. In the same style, mine shows how many law departments participated in the GC Metrics benchmarking survey, sponsored by Major, Lindsey & Africa, during its first five years.

8-14-15 NY Times

The three salient features of the plot that are discussed below are (1) the location of the axis labels, (2) the case used for them, and (3) the absence of a title.

 

The axis label for the vertical, y-axis perches on top of that axis and reads left-to-right instead of in the usual location in the middle and rotated 90 degrees. It is easier to read left to right, to be sure, but it takes a moment to locate a label far from the customary place. The axis label for the horizontal axis lurks on the far left, instead of in the middle of the plot. Perhaps this placement has advantages, perhaps not.

 

Generally speaking, WHEN YOU WRITE IN ALL CAPITAL LETTERS it comes across as shouting (“flaming” in the online world). Even odder to me is that the names of the tennis players next to each point are not in all caps, so the discrepancy is even more jarring.

 

As to the final feature, the lack of a title to the plot might be excused by the surrounding presence of the article. The article talks about career aces by notable tennis players and their number of aces per match. (An ace is a serve that the opponent does not even touch, a bit like a swing and a miss by a batter in baseball.) Career aces are on the bottom axis; aces per match are on the vertical axis, and both axes are labeled. Even so, plots should convey self-contained stories, which means putting in a brief summary for a title.

 

To show how the Times plot would look with the three elements above reconsidered, the plot below incorporates my changes. Note: I like the placement of the labels, so I kept it; I lower-cased the labels and added a title.

8-14-15 NY Times revised

Plot with useless grid lines, colors without significance, and curious sort order of bars

Let’s take a look at a plot from a survey conducted by DigitalWarRoom, its “2015 Ediscovery IQ Meter.” On page 12 of the report, which was published in July 2015, there is a plot that looks quite similar to the plot below. (The reproduction does not have tiny tick marks on the horizontal axis placed at the ends of the axis and between the vertical bars, nor does it match the green color gradient of the bars.) Nevertheless, we can draw from it a few lessons in graphical presentation.

Rplot01 Digital green

First, if you label bars with values, such as the four percentages on top of the four bars, you don’t gain anything from horizontal grid lines. In truth, you clutter the plot. Even odder, the vertical y-axis has no values, so the reader can’t even calibrate the lines to values!

Second, although the plot above does not reproduce the original’s gradient, in which each bar shades from darker green at the bottom to a lighter green (or white) at the top, it still conveys how little meaning the color scheme of the bars carries. Color should not be splashed on graphics unless it serves a purpose.

Third, this graph sorts the bars from high on the left to low on the right, but that is not the most sensible sort. Most people would read left to right and assume “Not prepared” would be the first bar, “Somewhat prepared” would be to its right, “Prepared” and then “Very prepared” on the right. As it is, the eye has to hopscotch around to make sense of the progression of preparedness.
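To make the point about sort order concrete, here is a minimal Python sketch that arranges the four responses along the preparedness scale rather than by bar height. The percentages are made up for illustration and are not the figures from the report; the function name `order_by_scale` is likewise my own.

```python
# Natural order of the response scale, from least to most prepared.
SCALE = ["Not prepared", "Somewhat prepared", "Prepared", "Very prepared"]


def order_by_scale(percent_by_response, scale=SCALE):
    """Return (label, percent) pairs in scale order, not in order of bar height."""
    rank = {label: position for position, label in enumerate(scale)}
    return sorted(percent_by_response.items(), key=lambda item: rank[item[0]])


# Hypothetical percentages, NOT the values in the survey report.
example = {
    "Prepared": 40,
    "Somewhat prepared": 35,
    "Very prepared": 15,
    "Not prepared": 10,
}

# Print a crude text bar chart, one '#' per five percentage points.
for label, pct in order_by_scale(example):
    print(f"{label:<18} {'#' * (pct // 5)} {pct}%")
```

Whatever plotting tool you use, the idea is the same: give it the categories in their logical order rather than letting it sort by value or alphabet.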

What would an improved plot look like?

Rplot09

 

Reproducible research regarding legal management surveys – lessons from pharma

Bearing in mind the benefits of more reproducible research regarding legal management, a piece in the Economist, July 25, 2015 at page 8, makes a good point. That short article explains how pharmaceutical companies have not been publishing results from clinical trials regarding their drug research that are negative or inconclusive. Without the full results, no one can accurately and comprehensively assess the efficacy of a new drug.

 

Stated differently, someone is not practicing reproducible research if they cherry pick only the results that show their drug in clinical trials has met with success. It would be akin to a surveyor who asks law firms or law departments a set of questions but then publicizes only the data that puts the surveyor’s views or products or services in a favorable light. In contrast, research that is done with integrity discloses contradictory findings, unexplained findings, as well as favorable findings. Reproducible research implies full disclosure.

Weighting survey responses so that the findings better represent underlying demographics

Surveyors sometimes weight their data to make the findings more representative of some other set of information. This point comes through in an article in the New York Times, July 23, 2015 at 83 regarding political polls. Pollsters may get too few responses from some demographic slice, such as farmers, and want to correct for that imbalance when they present conclusions respecting the entire population. The polling company weights the few farmer respondents more heavily to make up for the imbalance and represent the locations of residents more in line with reality.

 

How does this transformation of data apply in surveys for the legal industry? Let’s assume that we know roughly how many companies in the United States there are that have revenue over $100 million by each major industry. Let’s also assume that a benchmark survey of law departments has gathered compensation data regarding the lawyers in the responding law departments.

 

If the participants in the law department survey materially under-represent some industry (the proportions in each industry don’t match the proportions that we know to be true), it is not hard to adjust the compensation data. One way would be to replicate respondents in the under-represented industries enough times to make up the difference. This is what happens when a surveyor weights survey data to present more proportional data.

 

To summarize, you need to have some basis for an underlying distribution of data, such as numbers of companies above a certain size or industry. Secondly, you need a survey data set that you can adjust so that it reflects the proportions of the first data set.
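A small sketch may help show the arithmetic. It assumes the simplest form of weighting (often called post-stratification): each respondent’s weight is the known share of its industry divided by that industry’s share of the survey responses. The industries, pay figures, and shares below are invented for illustration, and the function names are my own.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_share):
    """Weight per group = known population share / share of survey responses."""
    counts = Counter(sample_groups)
    n = len(sample_groups)
    return {group: population_share[group] / (counts[group] / n) for group in counts}


def weighted_mean(values, groups, weights):
    """Average of values where each respondent counts by its group's weight."""
    total_weight = sum(weights[g] for g in groups)
    return sum(v * weights[g] for v, g in zip(values, groups)) / total_weight


# Hypothetical survey: three tech departments and one manufacturer responded,
# but we assume the two industries are equally common among all companies.
industries = ["tech", "tech", "tech", "manufacturing"]
base_pay = [200_000, 220_000, 240_000, 160_000]
population_share = {"tech": 0.5, "manufacturing": 0.5}

weights = poststratification_weights(industries, population_share)
unweighted = sum(base_pay) / len(base_pay)
weighted = weighted_mean(base_pay, industries, weights)

print(f"Unweighted average: {unweighted:,.0f}")
print(f"Weighted average:   {weighted:,.0f}")
```

The lone manufacturer receives a weight of 2 and each tech department a weight of two-thirds, which has the same effect as replicating the manufacturer until the industries are in proportion. Note that an industry with no respondents at all cannot be rescued this way: there is nothing to weight.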

Nything but trivial – the crucial ubiquity of “N = “ in survey findings

A precept of reproducible research, such as survey results that allow readers to understand the methodology and credibility of the findings, is to make generous use of “N = some number”. That conventional shorthand for “how many are we talking about” shows up in almost every reproducible-research graphic. Whether in the title of a plot, the text that relates to it, on the plot itself or in a footnote, a reader should always be quickly able to learn how many respondents answered each question or how many documents were reviewed or how many law departments had a given benchmark, or whatever pertains to the topic of the plot.

 

The larger the N, the more reliable the averages or medians that result from the data. For example, if the “average base compensation of general counsel” rose 2% from one year to the next, it makes a huge difference whether that change applies to N = 8 [general counsel] or N = 80.  Changes in small numbers of observations have much less credibility than changes in large numbers.
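One rough way to see why N matters is the standard error of the mean, which is the sample standard deviation divided by the square root of N. The sketch below assumes simple random sampling and uses invented year-over-year changes; the larger sample simply repeats the smaller one ten times so that the average stays at 2% and only N changes.

```python
import math
import statistics


def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Hypothetical year-over-year changes in base compensation, in percent.
small_sample = [-6, 9, 1, -3, 12, -8, 5, 6]   # N = 8, average change of 2%
large_sample = small_sample * 10              # N = 80, same average (artificial)

for name, sample in [("N = 8", small_sample), ("N = 80", large_sample)]:
    mean = statistics.mean(sample)
    se = standard_error(sample)
    print(f"{name}: mean change {mean:.1f}%, standard error {se:.2f} points")
```

With eight general counsel the 2% rise is smaller than its own standard error, so it could easily be noise; with eighty, the same rise is several times its standard error.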

Choices on plots that involve flipping axes, using points instead of bars, and axis labels for intervals

We can take one more look at the seminal Winston & Strawn plot, now streamlined and improved as discussed previously. A few graphical design choices deserve comment. We emphasize, however, that graphical design choices are many, which means the permutations and combinations of them are even more numerous. Experience (and some research on how humans perceive and interpret graphs) suggests quite a few well-accepted guidelines, such as simplicity and clarity, but graphical visualization remains in the subjective domain of what feels appropriate to the designer. We could analogize to writing style.

 

A convention in plotting is that the so-called factors run along the x-axis at bottom and the quantitative values run up the y-axis on the left. With such long axis labels, however, that choice has no appeal here. If we shorten the labels and rotate them, it is possible, as seen in the plot below.

 

Another choice would have eschewed bars in favor of points.

 

Finally, had there been finer interval numbering on the lower axis, there would have been no need for the obtrusive numbers at the end. The plot below shows how this would have looked with points and intervals and short, rotated labels.

Rplot points angles

Attractive spacing and width of bars on plots; informative labels

Returning once again to the same plot from the Winston & Strawn survey report, but shifting from criticism, we should praise several aspects of the original plot.

Screenshot (6)_snip Winston pg19

The somewhat-narrow width of the bars makes a more appealing impression than when bars are thick and therefore tightly packed shoulder to shoulder. Compare the version below where thick bars put more ink on the plot, but offer no more insights or clarity.

Rplot08nojunk

Similarly, the spacing between the bars helps a reader take in the message of the plot, and better than very narrow lines. The version above takes away that spacing although it adds around each box a frame colored black to clarify individual bars. This is not an improvement!

 

Third, the labels for each risk element are clearly written and spelled out on the left, vertical axis. An alternative choice could have been placing the text above the bars. The plot below shows labels on top of the bars.

Rplot label over bars

 

Fourth, the plot takes up most of the page and has been placed squarely in the middle of it, and therefore becomes the obvious focus of attention.

Superfluous elements – chart junk – but two useful additions

We revisit the same Winston & Strawn plot, in the improved re-incarnation shown in the most recent post. Now, let’s take up four more observations.

 

The thick black line on the vertical y-axis adds nothing: It is an example of what is referred to as “chart junk”, an element of a plot that adds no useful information but clutters up the plot and makes it that much harder to grasp.

 

Second, neither axis has a label to explain what the axis represents. Labels are generally a good thing so that a plot can stand on its own without explanations in the report text.

 

Third, the plot lacks a title, which also helps make it self-contained. By that term I mean that a reader can understand what the plot has to offer without having to read elsewhere. It is true that the header of the page serves like a plot title, but it is in a different color and font and location than the plot itself. For PowerPoint decks, headers often serve a different purpose than as a surrogate plot title.

 

A final two steps took out ticks and panel borders. The text labels quite adequately match up to the horizontal bars, so the tiny tick marks on the left, y-axis fall into disfavor. And, nothing is added by the gray border around the plot, in my opinion. Just the facts, ma’am.

 

Let’s unveil the de-cluttered, self-contained plot!

Rplot08nojunk

Excessive use of colors in a plot; sorting an axis

Another aspect of the plot that has been discussed previously [Click here for the latest post in this series] should be called out.

Whoever prepared the plot chose to color differently each bar of the three risks most often selected. The blue bar represents “geographic locations in which the company operates”, a sort-of red bar represents another risk, and a yellow bar represents the third. In addition to those color distinctions, the plot also embeds the labels of those three risks in black boxes with white font. Shown below is the plot as it originally appeared.

Screenshot (6)_snip Winston pg19

Neither of these graphical techniques adds value to the plot or, indeed, makes sense. They make readers work more to figure them out. Are the choices of colors significant, as in red-yellow-green means something? Is there a linkage between the coloring and the boxing? What do either or both tell us that the length of the bar and the label at the end don’t?

To emphasize the three leading risks, this plot could have sorted the risks in decreasing order of selection, as shown below.

Winstnocolorsorted

It is conventional to place the largest item at the top and the others in descending order down to the smallest on the bottom. Sorting data by something meaningful makes a clearer point than random coloring and redundant boxing.
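As a minimal sketch of that sort, the Python below orders a set of risks from most to least often selected, so the first item would sit at the top of a horizontal bar plot. The risk labels and counts are placeholders, not data from the Winston & Strawn report.

```python
def sort_descending(counts):
    """Return (label, count) pairs from the largest count to the smallest."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Hypothetical labels and selection counts, NOT the survey's data.
risks = {
    "Hypothetical risk A": 14,
    "Hypothetical risk B": 31,
    "Hypothetical risk C": 22,
}

ranked = sort_descending(risks)

# Largest first, which is the row that belongs at the top of the plot.
for label, count in ranked:
    print(f"{label:<20} {'#' * count} {count}")
```

With the bars in this order, the three leading risks stand out by position alone, with no need for special colors or boxes.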

Multiple and superfluous typography used on a plot

We return to the same survey plot and our topic of effective visualization of survey results. To see the previous post that explains the source data and the purpose of this series, click here. The version shown below incorporates the changes recommended previously regarding redundant data and serves as the starting point for the improvements discussed here. Let’s focus on the typography.

Winstonpg19noredundantdata2

 

First, a font comes from a font family, such as the familiar Helvetica, Courier or Times Roman. Second, the face of a font could be normal, italic, bold, upper case, or other formats. Third, with any family and face, the size of a letter, number or symbol can be small, medium, large or some specified size. There are other ways to characterize type (such as kerning and left or right alignment), but we will limit ourselves here to the three of family, face and size. We will use the term “typeset” to summarize family, face, and size.

 

The font on the left-hand, y-axis labels is different from the font on the x axis along the bottom, and both of those fonts differ from the bulky numbers at the ends of the columns. Additionally, on the original plot, but not shown here, there are black rectangles around three of the labels, which also have white coloring instead of black, so we could say that there are four different typesets employed in this one plot.

 

Compounding the multiplicity of fonts and colors, the typeset comes in at least three sizes.

 

Sometimes the designer of a plot deliberately interjects a different font/family/size, such as for emphasis, or to bring to the reader’s attention something important. But none of the four variations on the original plot convey any special meaning (although the numbers at the ends of the bars give the gist of the plot and might therefore justify the bold face).

 

To show how one might improve the plot by unifying the typeset, the plot below renders each of the text elements in Helvetica, 12 point, plain, black. Unless there is an informational reason to change fonts, stick with the same set.

Winstonsamefont