Software that finds documents based on tools other than search words

Analytical discovery software determines the relevance of documents in a collection based on the contents of the documents rather than on the presence in them of specific words or phrases. According to discovery consultant Conor Crowley in Met. Corp. Counsel, Dec. 2010 at 15, three different capabilities are offered on the market. Some software gathers similar documents into clusters, some makes binary relevance determinations, and some ranks the documents it finds according to an algorithm for relevance.

According to Crowley, the newer generations of analytical software use sampling and iterative learning. The software becomes more accurate as human beings outline what it should evaluate and then repeatedly assess what it finds and refine the filters, rankings, terms and other elements of the software’s functions. Once tested and taught, the software applies that learning to cull relevant documents from the entire set.
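To make the sample, assess and refine loop concrete, here is a minimal sketch in Python. It is not how any commercial product works. The word-weight scoring, the choice to sample the documents the model is least sure about, and every name in it (`train`, `score`, `iterative_review`, the `reviewer` function, the six toy documents) are assumptions made purely for illustration.

```python
from collections import Counter


def tokenize(text):
    """Split a document into lowercase words (a deliberately crude tokenizer)."""
    return text.lower().split()


def train(labeled):
    """Build word weights from (text, is_relevant) pairs.

    Each distinct word gains 1 for every relevant document it appears in
    and loses 1 for every irrelevant one. This is an illustrative stand-in
    for whatever statistical model a real product uses.
    """
    weights = Counter()
    for text, is_relevant in labeled:
        for word in set(tokenize(text)):
            weights[word] += 1 if is_relevant else -1
    return weights


def score(weights, text):
    """Sum the weights of the distinct words in a document (unknown words count as 0)."""
    return sum(weights[word] for word in set(tokenize(text)))


def iterative_review(documents, reviewer, rounds=2, sample_size=2):
    """Alternate between human review of a small sample and retraining.

    `reviewer` is a hypothetical callable standing in for the human being
    who decides whether a sampled document is relevant.
    Returns document indexes ranked from most to least relevant, plus the weights.
    """
    labeled = {}  # document index -> reviewer's decision
    weights = Counter()
    for _ in range(rounds):
        unlabeled = [i for i in range(len(documents)) if i not in labeled]
        if not unlabeled:
            break
        # Assumption: sample the documents whose score is closest to zero,
        # i.e. the ones the current model is least sure about.
        unlabeled.sort(key=lambda i: abs(score(weights, documents[i])))
        for i in unlabeled[:sample_size]:
            labeled[i] = reviewer(documents[i])
        # Refine the model with everything reviewed so far.
        weights = train([(documents[i], label) for i, label in labeled.items()])
    # Apply what was learned to the entire set.
    ranked = sorted(
        range(len(documents)),
        key=lambda i: score(weights, documents[i]),
        reverse=True,
    )
    return ranked, weights


# Toy collection and a stand-in for the human reviewer (both hypothetical).
documents = [
    "merger contract draft",
    "lunch menu friday",
    "contract amendment signed",
    "office party friday",
    "signed merger agreement",
    "parking menu notice",
]


def reviewer(text):
    return "contract" in text or "merger" in text


ranked, weights = iterative_review(documents, reviewer, rounds=2, sample_size=2)
for i in ranked:
    print(score(weights, documents[i]), documents[i])
```

In this toy run the reviewer sees only four of the six documents, yet the two unreviewed ones still receive scores, because they share words with documents that were reviewed. That carry-over from a reviewed sample to the rest of the collection is the point of the iterative approach.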

An example of such software appears in the following article, by Randall Burrows of Xerox Litigation Services. He explains that CategoriX embeds learning from linguists and statisticians to find the right documents from the training set as efficiently as possible. The result of improving the software over repeated runs, fine-tuning what it will seek in the larger document set, is a “ranked value of where all those documents stand relative to that original training set.”