Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Computer Sciences

University of Windsor

Computer Science Publications

Series

2010

Full-Text Articles in Entire DC Network

Ranking Bias In Deep Web Size Estimation Using Capture Recapture Method, Jianguo Lu Jan 2010

Computer Science Publications

Many deep web data sources are ranked data sources, i.e., they rank the matched documents and return at most the top k results even when more than k documents match the query. When estimating the size of such a ranked deep web data source, it is well known that there is a ranking bias: traditional methods tend to underestimate the size when queries overflow (match more documents than the return limit). Numerous estimation methods have been proposed to overcome the ranking bias, such as by avoiding overflowing queries during the sampling process, or by adjusting the initial …
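
The ranking bias described in the abstract can be illustrated with a toy sketch. It is not the paper's method: it assumes the classic Lincoln-Petersen capture-recapture estimator, a hypothetical data source of 1000 documents, two made-up queries, and a ranking in which a lower document id means a higher rank. The names `lincoln_petersen` and `run_query` are invented for this illustration.

```python
def lincoln_petersen(sample_a, sample_b):
    """Classic capture-recapture estimate: |A| * |B| / |A ∩ B|."""
    overlap = len(sample_a & sample_b)
    if overlap == 0:
        raise ValueError("samples do not overlap; estimate is undefined")
    return len(sample_a) * len(sample_b) / overlap


def run_query(matches, k=None):
    """Return the matched document ids, keeping only the top k when k is set.

    Toy ranking assumption: a lower document id means a higher rank.
    """
    ranked = sorted(matches)
    return set(ranked if k is None else ranked[:k])


N = 1000  # true size of the toy data source (assumed for illustration)
docs = range(N)
matches_a = [d for d in docs if d % 2 == 0]  # hypothetical query A: 500 matches
matches_b = [d for d in docs if d % 3 == 0]  # hypothetical query B: 334 matches

# No return limit: both queries return everything they match.
full = lincoln_petersen(run_query(matches_a), run_query(matches_b))

# Return limit k = 100: both queries overflow and only top-ranked documents come back.
truncated = lincoln_petersen(run_query(matches_a, k=100), run_query(matches_b, k=100))

print(full, truncated)
```

In this toy setting the unrestricted estimate recovers the true size of 1000, while the truncated estimate falls to roughly 294. Both overflowing queries return documents from the same top-ranked region, so the overlap between the samples is inflated and the size is underestimated.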