Spring Release of GC Metrics published; take part now to get the Summer Release!

We sent out the Spring Release last week. It provides benchmark data on staffing and spending from 286 companies. The release shows medians on six fundamental benchmarks, such as total legal spend as a percentage of revenue, as well as a range of other results.

If you would like to get a copy of the Spring Release, complete the confidential online survey here (https://novisurvey.net/n/GCM2014.aspx). Aside from some demographic questions like name, email and industry, the no-cost survey asks for your 2013 number of lawyers, paralegals, and other staff; inside and external legal spend; and revenue. Participants will receive the Summer Release in August.

Law schools and their graduates who are Fortune 500 GCs, adjusted for enrollment of law school

My hypothesis was that the larger the law school, in terms of students enrolled, the more graduates it would have who are the general counsel of a Fortune 500 company. To keep the data tractable and the plot below legible, I took the 150 top-ranked schools from US News & World Report and kept only the 34 schools that have at least three such graduates (some of the GCs do not have a law school associated with them).

Next, I divided that graduate number for each law school by the school’s enrollment (it is unclear as I write this whether enrollment includes LLM candidates, JSD candidates and other students aside from the standard three-year LLB or JD students). So that the numbers on the graph are not minuscule, I first divided the enrollment by 100.
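The calculation just described can be sketched in a few lines of Python. This is only an illustration: the function name, school names, and numbers are hypothetical placeholders, not figures from the ALM or US News data.

```python
def gcs_per_100_enrollees(gc_graduates: int, enrollment: int) -> float:
    """Return Fortune 500 GC graduates per 100 enrolled students."""
    if enrollment <= 0:
        raise ValueError("enrollment must be positive")
    # Divide enrollment by 100 first so the result is not a tiny fraction.
    return gc_graduates / (enrollment / 100)


# Hypothetical inputs for illustration only: (GC graduates, enrollment).
schools = {
    "School A": (12, 600),
    "School B": (3, 1000),
}

rates = {name: gcs_per_100_enrollees(g, e) for name, (g, e) in schools.items()}
```

In this made-up example, School A comes out at 2 graduates per 100 enrollees and School B at 0.3, which is the same as three per 1,000 enrolled students.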

Accordingly, the chart below presents the law schools on the bottom axis in alphabetical order and, above each, a blue dot showing its Fortune 500 GC graduates per 100 enrollees. Chicago and Harvard stand highest at two or more graduates per 100. Note that the latter is a very large school; the former modest. At the other end, Brooklyn and Miami are lowest at about 0.3 (meaning that for roughly every 330 enrolled students they have produced one Fortune 500 GC graduate).

F500 GCs and school by enrollment

My hypothesis fails, judging from the fairly random distribution of the dots.

Sure, spreadsheet analyses can mislead, but consider the benefits

An article in the Harvard Business Review, June 2014, at 67, by Clayton Christensen and another author, includes a sidebar that criticizes overuse of spreadsheets. When strategic decisions are based on spreadsheet analysis, the authors believe managers are often misguided.

Without a doubt, spreadsheets can mislead or can create a false sense of certainty. Nevertheless, efforts to gather data and look at what that data suggests help combat the well-known shortcomings of intuition, selectively remembered experience, and less-than-rational gut instinct.

More than that, spreadsheets require data. Deciding which data to collect, how to collect it, and how to weight the various pieces should prompt discussions that help planners think through future scenarios. Data has value in its own right, that is to say, as well as value in stimulating thinking.

To plot cities of law schools or law firms, you need longitude and latitude values

If you want to plot cities on a map, for example to show locations of law schools on a map of the United States, you need to have the longitude and latitude of each school’s city.

The brute force way to get those geographic intersections of longitude and latitude, which I call “coordinates”, is to search one at a time on a web site that provides them when you enter the city’s name. This takes a long time if you have hundreds of cities.

A second way to do it is to find some compilation of coordinates, extract the ones you need, and merge that information with the appropriate law school. This, too, can be a long process prone to errors.
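A minimal sketch of that merge in Python follows. The compilation, the school list, and the function name are all hypothetical, and the coordinates are approximate. Matching on city and state together, and listing whatever fails to match, is one way to catch the errors this approach invites.

```python
# Hypothetical compilation of coordinates keyed by (city, state).
# Latitude and longitude values are approximate.
coordinates = {
    ("Cambridge", "MA"): (42.37, -71.11),
    ("Ann Arbor", "MI"): (42.28, -83.74),
}

# Hypothetical list of law schools and their cities.
schools = [
    {"school": "School A", "city": "Cambridge", "state": "MA"},
    {"school": "School B", "city": "Ann Arbor", "state": "MI"},
    {"school": "School C", "city": "Berlin", "state": "CT"},
]


def attach_coordinates(schools, coordinates):
    """Merge coordinates into each school row; report schools with no match."""
    merged, unmatched = [], []
    for row in schools:
        key = (row["city"], row["state"])
        if key in coordinates:
            lat, lon = coordinates[key]
            merged.append({**row, "lat": lat, "lon": lon})
        else:
            unmatched.append(row["school"])
    return merged, unmatched


merged, unmatched = attach_coordinates(schools, coordinates)
```

Here `unmatched` holds the one school whose city is missing from the compilation, so it can be looked up by hand.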

A third way is to use an application programming interface (API) to the vast resources of Google. If you give Google a list of city names, it will return the coordinates. The trick, however, is that the cities must be appropriately identified. Berlin, Connecticut cannot be put in just as “Berlin” or you will get back the German city.
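As a rough sketch of the API route, the Python below assumes the JSON endpoint of Google's Geocoding web service, which requires an API key. The post does not say which Google API or which software was used, so the helper names here are illustrative, and the state is included in every query to avoid the "Berlin" problem.

```python
import json
from typing import Optional, Tuple
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: Google's Geocoding web service (JSON output).
GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(city: str, state: str, api_key: str) -> str:
    """Build a request URL that names the state, so 'Berlin' is unambiguous."""
    query = urlencode({"address": f"{city}, {state}", "key": api_key})
    return f"{GEOCODE_URL}?{query}"


def parse_coordinates(payload: dict) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) from a geocoding response, or None."""
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    location = payload["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


def geocode(city: str, state: str, api_key: str) -> Optional[Tuple[float, float]]:
    """Look up one city over the network (needs a valid API key)."""
    with urlopen(build_geocode_url(city, state, api_key)) as response:
        return parse_coordinates(json.load(response))
```

Calling `geocode("Berlin", "Connecticut", api_key)` with your own key would query for the Connecticut town rather than the German capital. Looping over a list of (city, state) pairs handles hundreds of cities the same way.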

A choropleth showing the states whose law schools produced Fortune 500 GCs

As explained before, a choropleth map colors geographic regions according to some variable. The choropleth map below shows the United States and the regions are some of its states. Those states are colored on a gradient by a variable: the number of graduates from highly-ranked law schools in the state who serve as the general counsel of a Fortune 500 company.

F500 GCs and states of top 50 law schools


Many states are missing because they have no law school in them that has a US News & World Report ranking better than 50 (the best-ranked school has a 1 ranking). I took that subset of the full 150 law schools that have rankings simply to make the creation of this plot easier.

The gradient color code ranges from very light blue for those states with the fewest graduates from their law schools (Alabama had one, for example) up to bright red for Massachusetts, which, mostly because of Harvard Law School, can boast the most Fortune 500 general counsel who graduated from its law schools (50).
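To make the idea of a gradient concrete, here is a small Python sketch that maps a state's count to a color by linear interpolation. The plotting software behind the map is not named in this post, so this is not its method. The function name and the exact endpoint colors (a light blue and pure red) are assumptions, and a real map would leave the drawing to a plotting library.

```python
def gradient_color(value, low, high,
                   start_rgb=(198, 219, 239),  # assumed light blue
                   end_rgb=(255, 0, 0)):       # assumed bright red
    """Linearly interpolate between two RGB colors; return a hex string."""
    if high == low:
        fraction = 0.0
    else:
        fraction = (value - low) / (high - low)
    # Clamp so out-of-range values still map to an endpoint color.
    fraction = min(max(fraction, 0.0), 1.0)
    channels = [round(s + fraction * (e - s)) for s, e in zip(start_rgb, end_rgb)]
    return "#{:02x}{:02x}{:02x}".format(*channels)


# The two counts mentioned in the text: Alabama (1) and Massachusetts (50).
state_counts = {"Alabama": 1, "Massachusetts": 50}
low, high = min(state_counts.values()), max(state_counts.values())
state_colors = {s: gradient_color(n, low, high) for s, n in state_counts.items()}
```

States with counts in between would receive colors partway along the blue-to-red scale.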

Fortune 500 general counsel and the ranking of their law school

Continuing the analysis of Fortune 500 chief legal officers, let’s test a hypothesis: the better the law school, the more of its graduates lead one of these illustrious legal departments. To have data regarding which schools are better, I incorporated the rankings of about 150 law schools in 2013 by US News & World Report.

The plot below has a point for each law school that had more than one general counsel of a Fortune 500 company, as reported by American Lawyer Media. Note that it does not have complete data because ALM did not report the law school of about 50 of the GCs. Finally, if US News did not rank a law school, that school is not on this plot. The plot sorts the schools from the best ranking on the left to the worst ranking (the highest number) on the right.

F500 GCs and LS Ranking

The high-flying point on the left is Harvard Law School, which was ranked in a tie for second and has 42 graduates serving as a Fortune 500 GC. Yale Law School, ranked number one, has 8 such graduates.

My hypothesis is somewhat supported, in that the top 50 ranked schools account for many more graduates-as-GCs than the next 50 schools. Even so, the representation in the third 50 of the ranked schools is quite robust.

Which law schools have graduated the most Fortune 500 general counsel?

ALM publishes data about the Fortune 500 companies and their chief legal officers. One of the pieces of information is the law school from which the CLO graduated. Firing up my trusty software for data analysis, I looked at the distribution of those graduates.

The plot below shows how many of that select group of general counsel graduated from each law school where the school had at least three graduates. Thus, the eight schools at the bottom left claim three graduates each. Sixty-eight law schools (out of a total of 117 different schools) had a single graduate or two graduates. I left them out because the graph becomes much harder to read with so many schools on the left axis. By the way, at least two of them are not U.S. law schools!

F500 GCs law schools


With the schools sorted by increasing numbers of GC-graduates, it is clear that primus inter pares, by far, is Harvard Law School. Virginia (19), Michigan (16), and Georgetown (15) trail by quite a bit.
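The counting, filtering, and sorting behind a plot like this one can be sketched in Python. The records below are hypothetical stand-ins for the ALM data, and the cutoff of three reflects the statement above that schools with one or two graduates were left out.

```python
from collections import Counter

# Hypothetical records: one law school name per general counsel.
gc_law_schools = (
    ["School A"] * 5 + ["School B"] * 3 + ["School C"] * 2 + ["School D"]
)

# Count how many general counsel graduated from each school.
counts = Counter(gc_law_schools)

# Keep schools with at least three graduates, sorted by increasing count.
MIN_GRADUATES = 3
plotted = sorted(
    ((school, n) for school, n in counts.items() if n >= MIN_GRADUATES),
    key=lambda pair: pair[1],
)

# Number of schools dropped from the plot for having too few graduates.
left_out = sum(1 for n in counts.values() if n < MIN_GRADUATES)
```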

Governments are the best sources of data

A column in Bloomberg BusinessWeek, July 2014 at 10, argues against House Republicans’ efforts to prevent the Department of Education from collecting and publishing data on college costs. Without good information on such matters as all-in costs of attending a school or graduation rates, prospective students will be left mostly in the dark.

The column brought to mind that when governments require data to be submitted and make it available to the public, the data is much more reliable, comprehensive, and timely than data collected by other means. Voluntary efforts lead to low compliance and selection bias; efforts by publishers or players in a market can never reach a government agency’s level of certitude; and privately collected data is, well, private. If you want data collected over time so that you can tease out trends, the problems of non-governmental data are magnified.

To my knowledge, no Federal or state government agency obtains and makes public any information about either corporate legal departments or private law firms. There is data about the legal industry sector and labor numbers (employees, gross revenue, possibly numbers of firms) but nothing else. Particularly, data is lacking about individual law firms. You can painfully extract some data from sources such as EDGAR filings or patent applications, but the collecting agencies are not focused on metrics regarding legal industry participants.

The best that legal-industry analysts can obtain comes from their own efforts or the data collection of others, flawed and incomplete though those sources may be. Even with that somewhat pessimistic summary, I stoutly maintain that much more can be learned from legal industry data sources and analyses.

One of the Ten Commandments of programmers working with data: record each change you make!

One hugely important lesson branded into me from analyzing data is the importance of step-by-step procedures. This may sound elementary, but when you start with an Excel file of data from a client, it is crucially important to keep an audit trail of each step of your transformations and calculations.

If you change the names of columns so that they are consistent with code you have already written, you should record and store each change. If you add another variable [think of a variable as a column in Excel], you need to track how you made that addition. Do likewise for any calculations, such as calculating and storing external spending per lawyer. And, by the way, comments along the way complement your efforts to be logical and measured.

Data preparation always involves learning as you go, so if you haven’t saved the steps you have taken, you create nightmares of uncertainty about the quality of your data. Or you can’t figure out how you got (or failed to get) some result later on.

I visualize data preparation as starting from the original data and then methodically molding it: cleaning it, re-arranging it, adding to it, sub-setting it, and naming it. When you save that sculpting, you can go back and confirm each step, or alter one or more of them, and be confident that the final data set gives you a consistent, accurate, and reliable platform for graphics and exploratory data analysis.
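One possible way to keep such an audit trail is sketched below in Python. The class, helper names, column names, and client figures are all hypothetical. The idea is simply that the original data stays untouched and every transformation is applied through one method that records a description of the step.

```python
import copy


class AuditedData:
    """Wrap a list of row dicts and record every transformation applied."""

    def __init__(self, rows):
        self.original = copy.deepcopy(rows)  # untouched copy of the source data
        self.rows = copy.deepcopy(rows)      # working copy that gets transformed
        self.log = []                        # ordered descriptions of each step

    def apply(self, description, func):
        """Apply func to the working rows and record what was done."""
        self.rows = func(self.rows)
        self.log.append(description)
        return self


def rename_column(old, new):
    """Return a step that renames one column in every row."""
    return lambda rows: [
        {(new if key == old else key): value for key, value in row.items()}
        for row in rows
    ]


def add_spend_per_lawyer(rows):
    """Add external spending per lawyer as a new variable."""
    return [
        {**row, "ext_spend_per_lawyer": row["external_spend"] / row["lawyers"]}
        for row in rows
    ]


# Hypothetical client data for illustration only.
data = AuditedData([{"Ext Spend": 1_000_000, "lawyers": 10}])
data.apply("Rename 'Ext Spend' to 'external_spend'",
           rename_column("Ext Spend", "external_spend"))
data.apply("Add external spend per lawyer", add_spend_per_lawyer)
```

Afterwards `data.log` lists the steps in order, and `data.original` still holds the file as the client sent it, so any step can be checked or redone.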

Subsetting and aggregating: two fundamental programming steps for analysts of data

Two very common steps for a data analyst are to subset data or to aggregate data. When you write code that subsets data, you instruct the computer to pick out a portion of the data and work with that smaller set. For example, if you have data on law firm mergers, you might want to isolate the mergers in a single state or for a particular year. You would subset the larger data collection so that only the particular state or year would be worked on thereafter. Or you might want to isolate the states of a particular region. In all these instances, you would need the work-horse of programming: subset.

The reciprocal function of subsetting is aggregating. Pivot tables in Excel perform aggregation quite easily. In fact, every programming language that does quantitative analysis has the function. Very commonly, a data analyst writes code so that data is combined. Staying with the law-firm merger example, a short program segment (actually, only a line or two of code) would add up all of the lawyers in the acquiring firms of a particular state. The computer will dutifully aggregate that amount.

Many graphical plots present either subsetted data or aggregated data, or both. The two concepts and the program code that carry them out are ubiquitous in data science.
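Both steps can be shown with the law-firm merger example in a few lines of plain Python. The merger records and field names below are hypothetical, made up only to show the two operations.

```python
from collections import defaultdict

# Hypothetical merger records for illustration only.
mergers = [
    {"state": "NY", "year": 2013, "acquirer_lawyers": 120},
    {"state": "NY", "year": 2014, "acquirer_lawyers": 80},
    {"state": "CA", "year": 2014, "acquirer_lawyers": 200},
]

# Subset: keep only the mergers in a single state.
ny_mergers = [m for m in mergers if m["state"] == "NY"]

# Subset again: keep only the mergers of a particular year.
mergers_2014 = [m for m in mergers if m["year"] == 2014]

# Aggregate: add up the lawyers in acquiring firms, state by state.
lawyers_by_state = defaultdict(int)
for m in mergers:
    lawyers_by_state[m["state"]] += m["acquirer_lawyers"]
```

The subsets feed plots of one state or one year, while `lawyers_by_state` is the kind of total a pivot table would produce.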

Ideologies trump arguments based on data

I had just written about levels of state regulatory burdens when I read two editorials in the New York Times, July 7, 2014 at A17. One of them describes four ways that GDP calculations mismeasure the size of our economy. For example, the author writes “In its first 20 years, the Clean Air Act generated health savings and other benefits valued at $22 trillion, compared with $500 billion in compliance costs.” He points out that the net gain is not counted in GDP. But my point is that some people will cheer that finding and accept it; others will jeer at it and vehemently reject the methodology as well; almost no one will reconsider their views.

Coincidentally, right next to that editorial, Paul Krugman bemoans the disjunction he perceives in many people between the beliefs they hold and how they process facts: “Confronted with a conflict between evidence and what they want to believe for political and/or religious reasons, many people reject the evidence.” Worse, the better informed they are, the more fervently opponents will toss out the contrary findings.

Those of us who collect and analyze data that reflect law department or law firm management decisions come to realize that the best benchmark data, the most insightful correlations, the clearest graphs stand almost no chance of persuading, or even informing, those who “just know” something different. Incentives work; money matters; technology speeds up; law firms gouge; convergence saves …. All of us, even as we cherish our self-image as being thoughtful, willing to change our minds, and open to different beliefs, are for the most part in ideological straitjackets.