Understanding the underlying metrics: an example from search engine results

I was interested in how many times certain law departments show up in Google search results. When I searched “Google law department”, Google returned what it determined to be the top 10 web pages for that search. At the top of the first page, in modest grey font, it said “Page 1 of 4,060 results (0.16 seconds)”. In fact, that “results” figure is merely an estimate of the total number of “hits” the search would have found had the search engine carefully scoured everything it had indexed on the Web. It is not a count of actual hits.

Moreover, the grey results number drops as you call up subsequent pages of 10 results each. The second page showed 4,050 results, while the third and final page showed 25 results. Eventually the results estimate stabilizes, as it did on this search at 25. My second search, for “Microsoft law department”, started at 242 results, but that estimate shrank to 37 by the fifth and final page.
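To put that shrinkage in perspective, here is a minimal Python sketch that compares each first-page estimate with the count shown on the final page. The figures are the ones quoted above; the helper name `overstatement` is my own, chosen only for illustration.

```python
# First-page estimate vs. the count shown on the final results page,
# using the figures quoted in this post.
searches = {
    "Google law department": (4060, 25),
    "Microsoft law department": (242, 37),
}


def overstatement(first_estimate, final_count):
    """Return how many times larger the first estimate is than the final count."""
    return first_estimate / final_count


for query, (first, final) in searches.items():
    factor = overstatement(first, final)
    print(f"{query}: {first:,} estimated, {final} at the end ({factor:.1f}x)")
```

On these numbers the first estimate was about 162 times the final count for the first search and about 6.5 times for the second, so the size of the gap itself varies widely from one query to the next.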

Out of curiosity, I ran the identical searches on Bing. There, the “Google law department” search ended at 15 results on the last of two pages, while the “Microsoft law department” search showed 22 results on its second page. I do not know why Google's counts stabilized roughly two-thirds higher than Bing's for both searches (25 versus 15, and 37 versus 22).
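A quick check of the arithmetic behind the comparison, again as a small Python sketch. It uses only the final-page counts reported in this post; `percent_higher` is a hypothetical helper name, not anything supplied by either search engine.

```python
# Final-page counts reported in this post, as (Google, Bing) pairs.
final_counts = {
    "Google law department": (25, 15),
    "Microsoft law department": (37, 22),
}


def percent_higher(a, b):
    """Return how much larger a is than b, as a percentage of b."""
    return (a - b) / b * 100


for query, (google, bing) in final_counts.items():
    gap = percent_higher(google, bing)
    print(f"{query}: Google {google} vs. Bing {bing} ({gap:.0f}% higher)")
```

With only two queries, a gap of this size says little about either engine in general; it mainly shows that the same search can yield noticeably different counts depending on where you run it.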

My point goes to the heart of data analysis. You have to do your best to understand the accuracy and reliability of numbers that you use. Then, you owe it to those who might rely on your results to explain them as well as you can and to point out possible limitations in those numbers.