How to tell if a correlation between two sets of numbers is statistically significant

I have written frequently about correlations (see my post of Feb. 13, 2008: correlations with 16 references). What I haven’t explained is how to find out whether a correlation is one that you can rely on.

The statistician’s term is “statistically significant,” a standard that has three components. To explain them let’s start with a correlation, such as between median partner hourly rates and median associate hourly rates. You collect that data for 20 law firms, enter it into a spreadsheet, and use a built-in function to calculate the correlation between the two sets of figures. The correlation is 0.45. How confident can you be that the correlation really means something and isn’t just some chance finding? I found a very clear explanation online of statistical significance and tinkered with it.
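For readers who would rather script the calculation than use a spreadsheet, here is a minimal sketch of what a built-in correlation function does. It assumes the Pearson correlation, which is what spreadsheet correlation functions typically compute. The function name `pearson_r` and the hourly rates are made up for illustration; they are not real survey data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (numerator of r)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (used in the denominator of r)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical median hourly rates for five imaginary firms (illustration only)
partner_rates = [450, 520, 610, 480, 700]
associate_rates = [260, 310, 330, 300, 390]
print(round(pearson_r(partner_rates, associate_rates), 2))
```

With real data you would pass in the 20 partner medians and the 20 associate medians in the same firm order.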

The easiest way to find out is to look in a statistics book that has a table of critical values of r (the correlation figure, here 0.45). You need to decide on a significance level, which is commonly called alpha and set at .05. This means that, if the two sets of numbers were actually unrelated, a correlation at least this strong would turn up by chance no more than 5 times out of 100.

You also have to compute something called the degrees of freedom, or df. The df is the number of data points you have minus two (N - 2): in this example, 20 - 2 = 18. The more data, the more reliable the findings. For the third component you have to decide whether you are doing a one-tailed or two-tailed test. In this example, since you believe the relationship between partner rates and associate rates is positive, you’ll opt for the one-tailed test.
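The degrees-of-freedom arithmetic can be sketched in a few lines. The second function goes one step beyond the explanation above: tables of critical r are usually derived from a t statistic, and the conversion t = r * sqrt(df / (1 - r^2)) is a standard formula, not something taken from the explanation I adapted. The function names are my own.

```python
import math

def degrees_of_freedom(n):
    """Degrees of freedom for a correlation: number of data points minus two."""
    return n - 2

def t_from_r(r, n):
    """Convert a correlation r from n data points into a t statistic."""
    df = degrees_of_freedom(n)
    return r * math.sqrt(df / (1 - r * r))

print(degrees_of_freedom(20))        # 18
print(round(t_from_r(0.45, 20), 2))  # 2.14
```

The larger the t value, the less likely it is that a correlation of that size would turn up by chance with that many data points.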

With these three pieces of information (the significance level, alpha = .05; the degrees of freedom, df = 18; and the type of test, one-tailed), you can look up the significance of the correlation you found. If your correlation is larger than the critical value in the table, it is statistically significant. Online sites, such as Vassar’s, will calculate it for you.
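If you have neither a table nor an online calculator handy, the lookup can be approximated directly. The sketch below is one possible approach, not necessarily how any particular site does it: it converts r to a t statistic and integrates the t distribution numerically with Simpson's rule, using only Python's standard library. The names `t_pdf`, `p_value`, and `is_significant` are hypothetical, and the one-tailed version assumes you predicted a positive correlation.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def p_value(r, n, tails=1, steps=2000):
    """Approximate p-value for a correlation r from n data points.

    tails=1 tests for a positive correlation; tails=2 tests for any correlation.
    steps must be even (a requirement of Simpson's rule).
    """
    df = n - 2
    t = r * math.sqrt(df / (1 - r * r))
    # Simpson's rule: area under the t density between 0 and t
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    upper_tail = 0.5 - area  # chance of a t this large or larger
    if tails == 1:
        return upper_tail
    return 2 * min(upper_tail, 1 - upper_tail)

def is_significant(r, n, alpha=0.05, tails=1):
    """True if the correlation clears the chosen significance level."""
    return p_value(r, n, tails) <= alpha

print(is_significant(0.45, 20, alpha=0.05, tails=1))  # True
```

For the example in this post (r = 0.45, 20 firms, alpha = .05, one-tailed), the p-value comes out below .05, so the correlation counts as statistically significant.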