Create multiple plots to avoid over-plotting of points

Some scatterplots have so many data points close to each other that you can’t distinguish much within the cloudy mass where they cluster. Data scientists call that problem over-plotting. An example would be a plot for a large law department that shows the amounts of incoming invoices by month, with each invoice represented by a separate point. Many invoices each month would fall in a common range, such as $3,000 to $5,000, and the plot would blur them together.

One technique to deal with such unintelligible blobs of points is to spread the same data over two, three or more plots. Each plot has the same axes but displays only a subset of the data, such as the invoices from law firms of a certain size or for a certain matter type. With fewer points on each plot, readers are much more likely to discern individual points. Some data visualizers refer to sets of comparable plots as “panels” or “small multiples.”
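As a rough illustration, here is a minimal Python sketch of the panel approach. The invoice records are made up by a hypothetical `make_invoices` helper, the three firm-size labels are assumptions, and the plotting function assumes the matplotlib library is installed. It is one possible way to build such panels, not a prescribed method.

```python
import random
from collections import defaultdict


def make_invoices(n=3000, seed=0):
    """Generate hypothetical (month, amount, firm_size) invoice records."""
    rng = random.Random(seed)
    sizes = ["small", "midsize", "large"]  # assumed grouping, for illustration only
    return [
        (rng.randint(1, 12), rng.uniform(3000, 5000), rng.choice(sizes))
        for _ in range(n)
    ]


def group_by_panel(invoices):
    """Split the records into one list of (month, amount) points per firm size."""
    panels = defaultdict(list)
    for month, amount, firm_size in invoices:
        panels[firm_size].append((month, amount))
    return dict(panels)


def plot_panels(panels):
    """Draw one scatterplot per group, side by side, all sharing the same axes."""
    # Imported here so the data helpers above work without matplotlib.
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(
        1, len(panels), sharex=True, sharey=True,
        figsize=(4 * len(panels), 4), squeeze=False,
    )
    for ax, (name, points) in zip(axes[0], sorted(panels.items())):
        months, amounts = zip(*points)
        ax.scatter(months, amounts, s=8, alpha=0.5)
        ax.set_title(name)
        ax.set_xlabel("Month")
    axes[0][0].set_ylabel("Invoice amount ($)")
    fig.tight_layout()
    return fig
```

Calling `plot_panels(group_by_panel(make_invoices()))` and then saving or showing the returned figure produces three plots, each holding roughly a third of the points. Sharing the axes (`sharex` and `sharey`) is what keeps the panels comparable.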

A second graphing technique uses hexbins, six-sided polygons that are shaded to indicate the density of points in a certain area. You lose precision with hexbins, but if there were that many points in an area, you would not have been able to pick out individual points anyway. The color gradient of the shadings, however, readily conveys relative density: a darker hexbin, for example, tells us that many more invoices of that amount arrived in that period than a lighter hexbin does.
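A hexbin plot can be sketched in a few lines of Python, again assuming matplotlib is available. The data come from a hypothetical `make_points` helper that invents invoice amounts and treats each arrival date as a fractional month; the grid size and color map are arbitrary choices to adjust for real data.

```python
import random


def make_points(n=5000, seed=0):
    """Generate hypothetical invoice data: arrival time in months and amount in dollars."""
    rng = random.Random(seed)
    months = [rng.uniform(0.5, 12.5) for _ in range(n)]
    amounts = [rng.gauss(4000, 600) for _ in range(n)]
    return months, amounts


def plot_hexbin(months, amounts, gridsize=25):
    """Shade hexagons by how many invoices fall inside each one."""
    # Imported here so the data helper above works without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(6, 4))
    # mincnt=1 leaves hexagons with no invoices blank instead of lightly shaded.
    hb = ax.hexbin(months, amounts, gridsize=gridsize, cmap="Blues", mincnt=1)
    fig.colorbar(hb, ax=ax, label="Invoices per hexagon")
    ax.set_xlabel("Month")
    ax.set_ylabel("Invoice amount ($)")
    return fig
```

Calling `plot_hexbin(*make_points())` returns a figure in which the darkest hexagons sit around the most common invoice amounts. A larger `gridsize` gives smaller hexagons and finer detail, at the cost of fewer points per hexagon.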
