The enormous growth of the World Wide Web has made it important to perform resource discovery efficiently for any given topic. Several new techniques have been proposed in the recent years for this kind of topic specific web-mining, and among them a key new technique called focused crawling which is able to crawl topic-specific portions of the web without having to explore all pages. Most existing research on focused crawling considers a simple topic definition that typically consists of one or more keywords connected by an OR operator. However this kind of simple topic definition may result in too many irrelevant pages in which the same keyword appears in a wrong context.
In this research we explore new strategies for crawling topic specific portions of the web using complex and precise predicates. A complex predicate will allow the user to precisely specify a topic using Boolean operators such as "AND", "OR" and "NOT". Our work will concentrate on defining a format to specify this kind of a complex topic definition and secondly on devising a crawl strategy to crawl the topic specific portions of the web defined by the complex predicate, efficiently and with minimal overhead. Our new crawl strategy will improve the performance of topic-specific web crawling by reducing the number of irrelevant pages crawled.
In order to demonstrate the effectiveness of the above approach, we have built a complete focused crawler called "Eureka" with complex predicate support, and a search engine that indexes and supports end-user searches on the crawled pages.
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks.
You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users─please
sign in
to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on
SPIE.org.
INSTITUTIONAL Select your institution to access the SPIE Digital Library.
PERSONAL Sign in with your SPIE account to access your personal subscriptions or to use specific features such as save to my library, sign up for alerts, save searches, etc.