We have become overwhelmed with electronic information, and it seems our situation is not going to improve. When computers first came to be thought of as instruments to assist us and make our lives easier, we envisioned a manageable future: a day when documents, no matter when they were produced, would be as close as a click of the mouse and the typing of a few words. Locating information of interest was not going to take all day. What we have found instead is that technology changes faster than we can keep up with it. This thesis looks at how we can provide faster access to the information we are looking for. Previous research in the area of document and information retrieval has mainly focused on the automated creation of abstracts and indexes, but today’s requirements are more closely related to searching for information through the use of queries. At the heart of the query process is the removal of search terms with little or no significance to the search being performed. More often than not, the stop-lists used to filter out such terms are constructed from the most commonly occurring words in the English language. This approach may be fine for systems that handle information from very broad categories.

Developing A Corpus Specific Stop-list Using Quantitative Comparison
