Salton and Buckley’s Landmark Research in Experimental Text Information Retrieval
Abstract
Objectives – To compare the performance of the vector space model and the probabilistic weighting model of relevance feedback in order to determine the most useful relevance feedback procedures, and to measure the amount of improvement that can be obtained from searching several test document collections with only one feedback iteration of each relevance feedback model.
Design – The experimental design consisted of 72 different tests: 2 relevance feedback methods, each in 6 permutations (3 variants, each at 2 levels of query expansion), on 6 test document collections of various sizes. A residual collection method was utilized to ascertain the “true advantage provided by the relevance feedback process” (Salton & Buckley, 1990, p. 293).
Setting – Department of Computer Science at Cornell University.
Subjects – Six test document collections.
Methods – Relevance feedback is an effective technique for query modification that provides significant improvement in search performance. Relevance feedback entails both “term reweighting,” the modification of term weights based on term use in retrieved relevant and non-relevant documents, and “query expansion,” which is the addition of new terms from relevant documents retrieved (Harman, 1992).
Salton and Buckley (1990) evaluated two established relevance feedback models based on the vector space model (a spatial model) and the probabilistic model, respectively. Harman (1992) describes the two key differences between these competing models of relevance feedback.
[The vector space model merges] document vectors and original query vectors. This automatically reweights query terms by adding the weights from the actual occurrence of those query terms in the relevant documents, and subtracting the weights of those terms occurring in the non-relevant documents. Queries are automatically expanded by adding all the terms not in the original query that are in the relevant documents and non-relevant documents. They are expanded using both positive and negative weights based on whether the terms are coming from relevant or non-relevant documents. Yet, no new terms are actually added with negative weights; the contribution of non-relevant document terms is to modify the weighting of new terms coming from relevant documents. . . . The probabilistic model . . . is based on the distribution of query terms in relevant and non-relevant documents. This is expressed as a term weight, with the rank of each retrieved document then being the sum of the term weights for terms contained in the document that match query terms. (pp. 1-2)
Second, while the vector space model “has an inherent relationship between term reweighting and query expansion” (p. 2), the probabilistic model does not. Thus, query expansion is optional in the probabilistic model, but given its usefulness, various schemes have been proposed for expanding queries using terms from retrieved relevant documents.
In the Salton and Buckley study, 3 variants of each of the 2 relevance feedback methods were utilized, each at 2 levels of query expansion, and run on 6 different test collections. More specifically, they queried test collections that ranged in size from small to large and that represented different domains of knowledge, including medicine and engineering, for a total of 72 experimental runs.
Salton and Buckley examined 3 variants of the vector space model, the second and third of which were based on the first. The first was the classic Rocchio (1971) algorithm, which uses reduced document weights to modify the queries. The second was the “Ide regular” algorithm, which reweights both relevant and non-relevant query terms (Ide, 1971). The third was the “Ide dec-hi” algorithm, which reweights all identified relevant items but only one retrieved non-relevant item, the one ranked highest in the initial set of search results (Ide & Salton, 1971).
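To make the three variants concrete, the following is a minimal sketch of their update rules, with queries and documents represented as sparse term-weight dictionaries. The function names and the unit values of the Rocchio coefficients are illustrative assumptions, not notation from the paper:

from collections import defaultdict

def add_scaled(target, vector, factor):
    # Accumulate factor * vector into the sparse target dictionary.
    for term, weight in vector.items():
        target[term] += factor * weight

def rocchio(query, relevant, non_relevant, alpha=1.0, beta=1.0, gamma=1.0):
    # Classic Rocchio (1971): the original query plus the relevant
    # document vectors and minus the non-relevant ones, each reduced
    # by a coefficient and by the number of documents contributing.
    new_query = defaultdict(float)
    add_scaled(new_query, query, alpha)
    for doc in relevant:
        add_scaled(new_query, doc, beta / len(relevant))
    for doc in non_relevant:
        add_scaled(new_query, doc, -gamma / len(non_relevant))
    # New terms never keep negative weights; drop them.
    return {t: w for t, w in new_query.items() if w > 0}

def ide_regular(query, relevant, non_relevant):
    # "Ide regular" (Ide, 1971): as above, but without reducing the
    # document weights by the number of feedback documents.
    new_query = defaultdict(float)
    add_scaled(new_query, query, 1.0)
    for doc in relevant:
        add_scaled(new_query, doc, 1.0)
    for doc in non_relevant:
        add_scaled(new_query, doc, -1.0)
    return {t: w for t, w in new_query.items() if w > 0}

def ide_dec_hi(query, relevant, non_relevant):
    # "Ide dec-hi" (Ide & Salton, 1971): use all relevant items but
    # subtract only the highest-ranked non-relevant item.
    return ide_regular(query, relevant, non_relevant[:1])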
As well, 3 variants of the probabilistic model developed by S.E. Robertson and colleagues (Robertson, 1986; Robertson & Sparck Jones, 1976; Robertson, van Rijsbergen, & Porter, 1981; Yu, Buckley, Lam, & Salton, 1983) were examined: the conventional probabilistic approach with a 0.5 adjustment factor, the adjusted probabilistic derivation with a different adjustment factor, and an adjusted derivation with enhanced query term weights. The 6 vector space model and probabilistic model relevance feedback techniques are described in Table 3 (p. 293).
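The term weight underlying the conventional probabilistic variant is the Robertson–Sparck Jones relevance weight with a 0.5 adjustment factor. The sketch below is one standard formulation of that weight and of the ranking rule quoted from Harman above; the variable names are illustrative:

import math

def relevance_weight(r, R, n, N, adj=0.5):
    # Conventional probabilistic term weight (Robertson & Sparck Jones)
    # with the 0.5 adjustment factor:
    #   r = retrieved relevant documents containing the term,
    #   R = retrieved relevant documents,
    #   n = documents in the collection containing the term,
    #   N = documents in the collection.
    p = (r + adj) / (R + 2 * adj)          # estimated P(term | relevant)
    q = (n - r + adj) / (N - R + 2 * adj)  # estimated P(term | non-relevant)
    return math.log((p * (1 - q)) / (q * (1 - p)))

def rank_score(document_terms, query_weights):
    # Each retrieved document is ranked by the sum of the weights of
    # the query terms it contains (as described in the Harman quote).
    return sum(w for t, w in query_weights.items() if t in document_terms)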
The performance of the first iteration feedback searches was compared solely with the results of the initial searches performed with the original query statements. The first 15 documents retrieved from the initial searches were judged for relevance by the researchers, and the terms contained in these relevant and non-relevant retrieved items were used to construct the feedback queries. The authors utilized the residual collection method, which entails removing all items previously seen by the searcher (whether relevant or not) and evaluating both the initial and any subsequent queries against the reduced collection only.
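A minimal sketch of the residual collection method as described above, assuming ranked lists and sets of document identifiers (the function name is illustrative):

def residual_rankings(initial_ranking, feedback_ranking, judged, relevant):
    # Residual collection method: drop every document already shown to
    # the searcher (in this study, the set of the first 15 judged items,
    # relevant or not) from both rankings and from the relevant set, then
    # evaluate both searches against the reduced collection only. This
    # prevents the feedback query from earning credit for re-retrieving
    # items the searcher has already seen.
    residual_initial = [d for d in initial_ranking if d not in judged]
    residual_feedback = [d for d in feedback_ranking if d not in judged]
    residual_relevant = relevant - judged
    return residual_initial, residual_feedback, residual_relevant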
Both multi-valued (partial) and binary (1 = term present, 0 = term absent) weights were used on the document terms (Table 6, p. 296). Also, two types of query expansion were applied: expansion by the most common terms and expansion by all terms (Table 4, p. 294). While it is possible to use no query expansion and rely solely on reweighting relevant and non-relevant query terms, this option was not examined. Three measures were calculated to assess relative relevance feedback performance: the rank order (recall-precision value); search precision (the average precision at the 3 particular recall points of 0.75, 0.50, and 0.25); and the percentage improvement in 3-point precision between the feedback and original searches.
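The 3-point precision measure and the percentage improvement can be sketched as follows. The interpolation rule used here (taking the best precision at or beyond each recall point) is a common convention and an assumption on our part, not a detail stated in the abstract:

def three_point_precision(ranking, relevant, points=(0.25, 0.50, 0.75)):
    # Average interpolated precision at recall 0.25, 0.50, and 0.75.
    recall_precision = []
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            recall_precision.append((hits / len(relevant), hits / rank))
    averaged = 0.0
    for point in points:
        # Interpolation: best precision at or beyond the recall point.
        eligible = [p for r, p in recall_precision if r >= point]
        averaged += max(eligible, default=0.0)
    return averaged / len(points)

def percent_improvement(feedback_p3, initial_p3):
    # Percentage improvement of the feedback search over the initial one.
    return 100.0 * (feedback_p3 - initial_p3) / initial_p3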
Main Results – The best results were produced by the same relevance feedback models for all test collections examined, and conversely, the poorest results were produced by the same models (Tables 4, 5, and 6, pp. 294-296). In other words, all 3 relevance feedback algorithms based on the vector space retrieval model outperformed the 3 algorithms based on the probabilistic retrieval model, with the best relevance feedback results obtained for the “Ide dec-hi” model. This finding suggests that improvements in relevance from term reweighting are attributable primarily to reweighting relevant terms. However, the probabilistic method with adjusted derivation, specifically the variant with extra weight assignments for query terms, was almost as effective as the vector space model relevance feedback algorithms.
Paired comparisons between full query expansion (all terms from the initial search are utilized in the feedback query) and partial query expansion (only the most common terms from the relevant items) demonstrate that full expansion is better; however, the difference between the two expansion methods is small.
Conclusions – Relevance feedback methods that reformulate the initial query by reweighting existing query terms and adding new terms (query expansion) can greatly improve the relevance of search results after only one feedback iteration. The amount of improvement achieved was highly variable across the 6 test collections, ranging from 50% to 150% in 3-point precision. Other variables thought to influence relevance feedback performance were the length of the initial queries and characteristics of the collection, including the specificity of its terms, its size (number of documents), and the average term frequency in its documents. The authors recommend that the relevance feedback process be incorporated into operational text retrieval systems.