Published on:

Concept searching: semantic and linguistic approaches

In document search, some software locates documents based on concepts, whereas other software uses linguistic analysis. I am not competent to evaluate the two approaches against each other, but I will give my understanding of the basic difference.

Semantic-search software takes the significant words in a set of documents and compares each one with every other such word in the set. It draws conclusions about which documents are related to each other according to how frequently certain words show up in proximity to other words. The software builds up relationships so that it can link one document to another based on those proximity and frequency connections. A leading vendor of such search software is Attenex with its Attenex Patterns 4.0. I appreciate the assistance of Attenex’s Michael Korch, who sent me material on this topic, but all errors are my own.
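To make the proximity-and-frequency idea concrete, here is a minimal sketch in Python. It is my own illustration, not Attenex's method: the stopword list, the three-word window, the scoring formula, and the sample memos are all assumptions chosen to keep the example short.

```python
from collections import Counter
from itertools import combinations

# Assumed, tiny stopword list; a real product would use something far richer.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "for", "on", "was", "with"}

def significant_words(text):
    """Lowercase the text, strip punctuation, and drop stopwords."""
    words = [w.strip(".,;:!?()\"'").lower() for w in text.split()]
    return [w for w in words if w and w not in STOPWORDS]

def proximity_pairs(text, window=3):
    """Count how often two significant words appear within `window` words of each other."""
    words = significant_words(text)
    pairs = Counter()
    for i, word in enumerate(words):
        for other in words[i + 1 : i + 1 + window]:
            if other != word:
                pairs[tuple(sorted((word, other)))] += 1
    return pairs

def relatedness(pairs_a, pairs_b):
    """Weighted overlap of two documents' word-pair counts (0.0 means nothing shared)."""
    shared = set(pairs_a) & set(pairs_b)
    overlap = sum(min(pairs_a[p], pairs_b[p]) for p in shared)
    total = sum(pairs_a.values()) + sum(pairs_b.values()) - overlap
    return overlap / total if total else 0.0

def related_documents(docs, window=3):
    """Score every pair of documents by the word pairs they have in common."""
    profiles = {name: proximity_pairs(text, window) for name, text in docs.items()}
    return {
        (a, b): relatedness(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }

# Hypothetical sample documents.
docs = {
    "memo1": "The patent license agreement covers royalty payments.",
    "memo2": "Royalty payments under the patent license are overdue.",
    "memo3": "The cafeteria menu changes on Monday.",
}
scores = related_documents(docs)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 3))
```

In this toy collection the two memos that mention "patent license" and "royalty payments" score above zero against each other, while the cafeteria memo shares no word pairs with either and scores zero. No one told the program what a patent or a royalty is; the link comes entirely from which words sit near which.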

By contrast, with linguistic-concept software, humans have identified and defined the key words in advance. One of the leading vendors of this approach is Cognition Technologies with CognitionSearch. Cognition Technologies’ linguists have spent years compiling a linguistic analysis of the English language (think of it as a dictionary) that handles almost all of common English. With this work done, Cognition no longer needs new human work to handle a particular document base. If new terminology has to be learned, such as specialized terms or company-specific product names, the software learns it automatically. CognitionSearch uses computational linguistic science to analyze a hierarchy of meanings of a word (ontology), all the forms the word might take (morphology), and a thesaurus of related concepts (synonymy). I appreciate the advice of Brian Maser of Cognition for this summary.
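A rough sketch of how those three ingredients might fit together follows, again in Python. It is not CognitionSearch's implementation: the three small lookup tables stand in for the hand-compiled dictionary, and every entry, function name, and sample document is hypothetical. It also leaves out the automatic learning of new terminology.

```python
# Hypothetical, hand-built lexicon standing in for years of linguists' work.
MORPHOLOGY = {  # word form -> base form
    "cars": "car",
    "automobiles": "automobile",
    "trucks": "truck",
    "vehicles": "vehicle",
}
SYNONYMS = {"automobile": "car"}                  # base form -> preferred concept
ONTOLOGY = {"car": "vehicle", "truck": "vehicle"}  # concept -> broader concept

def concept_of(word):
    """Normalize a word: strip punctuation, reduce to base form, map to its concept."""
    cleaned = word.strip(".,;:!?()\"'").lower()
    base = MORPHOLOGY.get(cleaned, cleaned)
    return SYNONYMS.get(base, base)

def broader_concepts(concept):
    """Walk up the hierarchy of meanings from a concept to its ancestors."""
    ancestors = []
    while concept in ONTOLOGY:
        concept = ONTOLOGY[concept]
        ancestors.append(concept)
    return ancestors

def concepts_in(text):
    """All concepts a document mentions, directly or through a narrower term."""
    found = set()
    for word in text.split():
        concept = concept_of(word)
        found.add(concept)
        found.update(broader_concepts(concept))
    return found

def search(query, docs):
    """Return the names of documents that contain every concept in the query."""
    wanted = {concept_of(word) for word in query.split()}
    return sorted(name for name, text in docs.items() if wanted <= concepts_in(text))

# Hypothetical sample documents.
docs = {
    "a": "Two automobiles were leased.",
    "b": "The trucks arrived late.",
    "c": "Lunch was served.",
}
print(search("car", docs))       # ['a']
print(search("vehicles", docs))  # ['a', 'b']
```

A search for "car" finds the document that says "automobiles" even though the word "car" never appears in it, and a search for "vehicles" finds both the automobile and the truck documents. Here the connections come from the dictionary that people built, not from counting which words appear near each other.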

As mentioned, I won’t try to sort out the pros and cons of the two schools of document search. It does seem worthwhile, however, to point out the fundamental difference in approach for those who care about in-house management.
