Published on:

A calculation called “entropy” can tell us how concentrated the companies in an industry are. Concentration refers to how large a share of industry revenue goes to the largest company, then to the two largest companies combined, then to the three largest, and so on. Specifically, entropy is calculated from the shape of the probability distribution of market shares. A higher entropy index indicates a large number of participants with more evenly spread shares, which means lower concentration and consequently higher competition in the industry.
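As a minimal sketch of that calculation, the Python below computes an entropy index and the cumulative share of the largest companies from a list of revenues. The function names and the sample revenue figures are hypothetical, and the natural logarithm is an assumption; other bases only rescale the index.

```python
import math

def entropy_index(revenues):
    """Entropy of market shares: sum of p * ln(1/p) over all companies."""
    total = sum(revenues)
    shares = [r / total for r in revenues if r > 0]
    return sum(p * math.log(1 / p) for p in shares)

def concentration_ratio(revenues, k):
    """Combined revenue share of the k largest companies."""
    total = sum(revenues)
    largest = sorted(revenues, reverse=True)[:k]
    return sum(largest) / total

# Hypothetical industries, each with four companies (revenues in any unit)
even_industry = [25, 25, 25, 25]      # no dominant company
dominated_industry = [97, 1, 1, 1]    # one dominant company

even_entropy = entropy_index(even_industry)
dominated_entropy = entropy_index(dominated_industry)
top_two_share = concentration_ratio(dominated_industry, 2)
```

With four equal companies the index reaches its maximum for four participants, ln 4 (about 1.386). The dominated industry scores far lower, which matches the reading that lower entropy means higher concentration.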

It would be my hypothesis that law departments in lower entropy industries – those dominated by a few companies – would enjoy better benchmarks than law departments in higher entropy industries – those with many companies and no dominant entity.

According to the Journal of Management In Engineering, Jan. 2005 at 19, several alternative indices of entropy gauge industry concentration, including “ogive, national average, portfolio, McLaughlin, and information theory”. However, the article argues, the entropy measure is superior to other measurements in that “entropy can be decomposed into additive elements which define the contribution of diversification at each level of product aggregation to the total”. Accordingly, entropy has frequently been used to measure the degree of industrial concentration and thus competition within an industry.

Published on:

Some scatterplots have so many data points close to each other that you can’t distinguish much from the cloudy mass where they cluster. That problem is known to data scientists as over-plotting. An example would be a plot that shows for a large law department the amounts of incoming invoices clustered by month, where each invoice is represented by a separate point. For invoices of a common amount range, such as $3,000 to $5,000, there would be lots of them each month, and the plot would blur them together.

One technique to deal with such unintelligible blobs of points is to plot the same data on two, three or more plots. Each one has the same axes but displays only the data of law firms of a certain size or matter type. With that approach, readers are much more likely to discern individual points. Some data visualizers refer to sets of comparable plots as “panels” or “small multiples.”

A second graphing technique uses hexbins, six-sided polygons that are shaded to indicate the density of points in a certain area. You lose precision with hexbins, but if there were that many points in an area, you would not have been able to pick out individual points anyway. The color gradient of the shadings, however, readily conveys relative density: darker hexbins, for example, tell us that many more invoices of that type arrived than lighter-colored hexbins do.
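To make the idea of hexbins concrete, here is a rough sketch in plain Python of the counting step behind them: each point is assigned to the hexagon that contains it, and the count per hexagon is what a plot would turn into a shade. The function names, the hexagon size, and the sample points are hypothetical. The sketch also assumes both axes have already been rescaled to comparable units, since month numbers and dollar amounts differ by orders of magnitude.

```python
import math
from collections import Counter

def hex_cell(x, y, size):
    """Return the axial (q, r) index of the pointy-top hexagon containing (x, y)."""
    # Fractional axial coordinates of the point
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    # Convert to cube coordinates, which always sum to zero
    cx, cz = q, r
    cy = -cx - cz
    rx, ry, rz = round(cx), round(cy), round(cz)
    # Rounding can break the zero-sum rule; recompute the coordinate
    # that moved the most so the cell index is valid again
    dx, dy, dz = abs(rx - cx), abs(ry - cy), abs(rz - cz)
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return (rx, rz)

def hexbin_counts(points, size):
    """Count how many points fall in each hexagon."""
    return Counter(hex_cell(x, y, size) for x, y in points)

# Hypothetical, already-rescaled points: two close together, one far away
sample_points = [(0.0, 0.0), (0.1, 0.1), (5.0, 5.0)]
counts = hexbin_counts(sample_points, size=1.0)
```

In practice a plotting library would do this binning and the shading for you; matplotlib, for one, offers a `hexbin` function. The sketch is only meant to show why nearby invoices collapse into a single, darker cell.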

Published on:

If we had more data on the sizes of law firms retained by U.S. law departments, the industry would have some guidelines for typical distributions. For example, it might be a rule of thumb that roughly one-third of the law firms retained by a typical U.S. law department would be small, say with 10 or fewer lawyers. Perhaps one-third of the firms would be in the intermediate category of 11 to 20 lawyers. The final third would be larger firms with 21 or more lawyers.

The guidelines would vary depending on how you rank the firms. If it is by total fees paid, perhaps larger firms predominate; for example, 60% of fees might go to the top third of firms by size. If you rank by number of matters handled, that could present a very different picture. Such data from a representative group of law departments would likely undermine claims about convergence.
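A short sketch can show how much the picture changes with the ranking measure. Everything in it is hypothetical: the six firms, their fees and matter counts, and the size cutoffs (10 or fewer, 11 to 20, 21 or more lawyers) are made up for illustration only.

```python
from collections import defaultdict

# Hypothetical firms: (number of lawyers, fees paid in USD, matters handled)
firms = [
    (4, 40_000, 6),
    (8, 60_000, 9),
    (15, 150_000, 5),
    (18, 150_000, 4),
    (60, 300_000, 3),
    (200, 300_000, 3),
]

def size_band(lawyers):
    """Assign a firm to an assumed size category."""
    if lawyers <= 10:
        return "small"
    if lawyers <= 20:
        return "intermediate"
    return "large"

def shares_by_band(firms):
    """Return each band's share of (firm count, fees, matters)."""
    totals = defaultdict(lambda: [0, 0, 0])
    for lawyers, fees, matters in firms:
        band = totals[size_band(lawyers)]
        band[0] += 1
        band[1] += fees
        band[2] += matters
    grand = [sum(t[i] for t in totals.values()) for i in range(3)]
    return {
        name: tuple(t[i] / grand[i] for i in range(3))
        for name, t in totals.items()
    }

shares = shares_by_band(firms)
```

In this invented data each band holds one-third of the firms, yet the large firms collect 60% of fees while the small firms handle half of the matters.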

Published on:

A Palo Alto startup, Ayasdi, builds software that uses the branch of mathematics known as topology. Topology concerns how shapes interact with space, and has application to portraying large collections of data.

As described in Bloomberg BusinessWeek, January 28, 2013 at 34, Ayasdi’s software can take huge amounts of data and help users find patterns in it. Users can upload their information to the company’s data centers, which then apply the company’s algorithms to look for relationships and interconnections. Those findings appear as colorful 3-D pictures on screen, and users can ask questions about that data and manipulate it.

Topological computations and graphics sound like complements to other classification and clustering tools available to data analysts. Specifically, benchmark surveys amass many variables and hundreds of participating companies. Software of the kind offered by Ayasdi may in the future allow those who manage law departments to visualize better and learn more about such accumulations of metrics.

 

Published on:

In the sciences, a recent movement is often referred to as “reproducible research.”  What it espouses is a philosophy of transparency regarding data and analysis – share the data you collected, what you did to it, and how you did calculations and graphics.  Those who conduct surveys, for example, should make every step of what they did clear to others and available to them for review.  They should explain how they gathered their data, what they did to prepare it for analysis, the steps they carried out in the mathematical analyses and then, of course, their conclusions.

 

In the sciences, reproducible research has gone even further to make the actual data sets available to others.  Unfortunately, too many times scientific findings have failed to be corroborated by others.  Indeed, there have been some well-publicized instances of fraudulent research, fake data, and unsupportable conclusions.  That sort of check on quality is possible only if someone else can follow your tracks.

 

To the extent that law department data developed by vendors, consultants, and academics is used to produce findings, reproducible research should be the aspiration.  We may not be able to go so far as to expose the actual proprietary data that is collected, but all of us can go much farther than we do now to explain how the data was collected and what was done with it.  Explain your methodology!  Moving in that direction would improve the quality of findings and the reliability of results.

Published on:

Mostly for lack of a better way to classify companies, benchmark surveys ask respondents to choose from a list of “industries.”  We see those lists all the time: manufacturing, technology, pharmaceutical, and so on.  In the messy real world, we all realize, companies are not so neatly boxed and defined.  Indeed, almost any company of much size does business in what could be considered more than one industry.

One way to measure diversification, and therefore to study more accurately the effects of industry on legal staffing and spending, would be to make use of an entropy calculation.  For a firm with N different four-digit SIC business units, where P_i is the proportion of firm sales in SIC code i, the entropy of the company is:

$$\text{Total entropy} = \sum_{i=1}^{N} P_i \ln\left(\frac{1}{P_i}\right)$$
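
As a worked illustration of the formula, assume a hypothetical company whose sales split 50%, 30% and 20% across three four-digit SIC codes (figures invented for the example):

$$\text{Total entropy} = 0.5\ln\frac{1}{0.5} + 0.3\ln\frac{1}{0.3} + 0.2\ln\frac{1}{0.2} \approx 0.347 + 0.361 + 0.322 = 1.030$$

A company with all of its sales in a single SIC code would score 0, because ln(1/1) = 0. The most diversified three-unit company, with equal thirds, would score ln 3, or about 1.099.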

Published on:

Some benchmark surveys ask for spending data in U.S. dollars and leave it to the participants to convert their non-dollar spending however they choose to do so.  Other surveys, including GC Metrics, accept data in whatever currency the participant uses and then have to decide on a conversion rate.

What I have done is take the approximate average of the currency against the U.S. dollar for the calendar year involved.  By approximate I mean that I eyeball the exchange rate for the year and pick a figure that seems as representative as possible.  There are undoubtedly more precise ways to convert currencies, but they would be much more computationally intensive and harder to explain to those who receive the report.
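
One of those more precise alternatives might look like the sketch below, which replaces the eyeballed figure with the arithmetic mean of twelve monthly rates. This is not the method described above. The monthly rates, the spending amount, and the function names are all hypothetical, and rates are assumed to be quoted as U.S. dollars per one unit of the foreign currency.

```python
from statistics import mean

def average_rate(monthly_rates):
    """Arithmetic mean of the monthly rates (USD per unit of foreign currency)."""
    return mean(monthly_rates)

def to_usd(amount, monthly_rates):
    """Convert a full-year spending figure using the year's average rate."""
    return amount * average_rate(monthly_rates)

# Hypothetical month-end rates for one calendar year
rates = [1.30, 1.31, 1.29, 1.32, 1.33, 1.31, 1.30, 1.28, 1.29, 1.31, 1.32, 1.30]

# Hypothetical outside counsel spending reported in the foreign currency
spend_usd = to_usd(2_000_000, rates)
```

A simple mean treats every month equally. Weighting each month by the spending incurred in it would be more precise still, but it requires monthly spending data that most survey participants do not submit.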

*******************

Published on:

It is a mistake to think that your data has to be complete and clean for you to push ahead with analytics.  You will leave on the table significant savings and insights that could be realized even from imperfect and provisional models or conclusions based on partial or not-fully-scrubbed data.  For example, if you did no more than study the distribution of timekeepers who bill time to you from the five firms you use the most, that would be progress.

 

Lawyers like completeness and tidiness, but neither is a feature of complex data.  Resist the conservative reins!  This idea came from the Deloitte Review (undated) at page 16.  Data is never perfect, so it is better to get your hands dirty and work with what you have than to delay, spend money, grow frustrated and perhaps never learn anything.  Plunging in will help you figure out better what to collect and how to collect it.

Published on:

Let’s assume that in the coming years general counsel who give a thought to law department benchmarks can readily find some of those basic metrics.  If they can find them without submitting their own department’s data, they may decide not to submit if they know they compare unfavorably.  If they foresee, for instance, that their total legal spending is out of line with their industry peers, they may conclude that they should let the sleeping dogs of embarrassing metrics lie.

 

If that decision is made often enough, then law department benchmark participants will tip more and more toward those departments that believe themselves well situated in comparison to the metrics.  There will be a “race to the top” where relatively poor performers drop out and benchmarks will become tougher and tougher.  They will also grow less and less representative.

 

This would be a shame, because then the entire industry will have nothing but a distorted sense of the typical range of metrics.  After all, you would not want to base your sense of body mass index on those BMI metrics gathered only from runners of marathons.
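
The drift can be sketched with a toy simulation. It rests on a deliberately crude assumption that is not in the argument above: after each survey, every department whose metric is worse than the published median declines to submit next time. The metric values are hypothetical, with lower meaning better.

```python
from statistics import median

def simulate_dropout(metrics, rounds):
    """Return the published median for each survey round."""
    pool = sorted(metrics)
    published = []
    for _ in range(rounds):
        benchmark = median(pool)
        published.append(benchmark)
        # Assumed behavior: departments worse than the median stop submitting
        pool = [m for m in pool if m <= benchmark]
    return published

# Hypothetical metric for nine departments (lower is better)
departments = [1, 2, 3, 4, 5, 6, 7, 8, 9]
medians = simulate_dropout(departments, rounds=3)
```

Under that assumption the published median falls from 5 to 3 to 2 over three rounds, even though no department improved. The benchmark grows tougher only because the participant pool shrinks.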

Published on:

A network graph, when the term is used by mathematicians, means a structure composed of nodes and edges.  For example, a law department could represent the law firms it retained during the previous year by means of such a graph.  The department would be the central node on the graph and each law firm would be at the end of an edge.

The visual depiction would be more informative if the edges were of different thicknesses to indicate the amount paid to the law firm during the year.  Likewise, the length of the edge could be proportional to the number of matters handled by the law firm.

If the data from a law department were represented this way, you could also show the size of the law firms by the area of their node shape (a circle).  One further use of this graph would be to color or shape the node of each law firm according to some other factor, such as the number of timekeepers used by the firm or the number of areas of law serviced.
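
The structure described above can be sketched as plain Python data before any drawing tool is involved. The firm names, figures, attribute names, and the maximum edge width are all hypothetical.

```python
# Hypothetical firms retained last year
firms = {
    "Firm A": {"fees": 500_000, "matters": 10, "lawyers": 400},
    "Firm B": {"fees": 250_000, "matters": 25, "lawyers": 60},
    "Firm C": {"fees": 50_000, "matters": 4, "lawyers": 8},
}

def star_graph(center, firms, max_width=10.0):
    """Build nodes and edges with the law department as the central node."""
    top_fees = max(f["fees"] for f in firms.values())
    nodes = {center: {"kind": "department"}}
    edges = []
    for name, f in firms.items():
        # Node area stands for the size of the firm
        nodes[name] = {"kind": "firm", "area": f["lawyers"]}
        edges.append({
            "source": center,
            "target": name,
            # Thickness is proportional to fees paid
            "width": max_width * f["fees"] / top_fees,
            # Length is proportional to matters handled
            "length": f["matters"],
        })
    return nodes, edges

nodes, edges = star_graph("Law Department", firms)
```

Each edge dictionary carries the visual encodings as numbers, so a graph-drawing tool could later map `width`, `length`, and `area` onto the picture.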