WEB HARVESTING
Web harvesting begins by identifying a list of URLs that defines a specialized collection or
body of knowledge and supplying that list as input to a computer program.
The program then downloads the content found at each of these URLs.
Hyperlinks embedded in the downloaded content can be either followed or ignored, depending
on human or machine guidance.
A key difference between Web harvesting and general-purpose Web
crawling is that a harvest operates to a defined crawl depth: the crawl need not recursively
follow URLs until every link has been exhausted.
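The bounded-depth harvest described above can be sketched as a breadth-first traversal that stops following links once the depth limit is reached. This is a minimal illustration, not a production harvester: the URLs and the in-memory `FAKE_WEB` fetcher are hypothetical stand-ins so the sketch runs offline, where a real program would fetch over HTTP (e.g. with `urllib.request`).

```python
from collections import deque
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href targets from anchor tags in a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def harvest(seed_urls, fetch, max_depth):
    """Breadth-first harvest: download each seed URL, then follow
    embedded hyperlinks only up to max_depth levels deep."""
    pages = {}                            # url -> downloaded text
    queue = deque((url, 0) for url in seed_urls)
    seen = set(seed_urls)
    while queue:
        url, depth = queue.popleft()
        text = fetch(url)
        if text is None:
            continue
        pages[url] = text
        if depth < max_depth:             # crawl depth is bounded, unlike
            parser = LinkExtractor()      # a general-purpose crawler
            parser.feed(text)
            for link in parser.links:
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return pages

# Hypothetical in-memory "web" standing in for real HTTP fetches.
FAKE_WEB = {
    "http://a.example/": '<a href="http://b.example/">b</a>',
    "http://b.example/": '<a href="http://c.example/">c</a>',
    "http://c.example/": "leaf page",
}
fetch = FAKE_WEB.get

# With max_depth=1 the crawl stops before following b's link to c.
print(sorted(harvest(["http://a.example/"], fetch, max_depth=1)))
# → ['http://a.example/', 'http://b.example/']
```

Raising `max_depth` to 2 would pull in the third page; the depth parameter is exactly the "human or machine guidance" that distinguishes a harvest from an exhaustive crawl.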
The downloaded content is then indexed by a
search engine application and offered to information customers as a searchable Web application.
Customers can search this application and follow hyperlinks from results that meet their
search criteria back to the original URLs.
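The indexing and search step can be illustrated with a simple inverted index that maps each term to the URLs whose harvested text contains it; a search then intersects the URL sets for the query terms, and each result links back to its original URL. The page texts below are hypothetical examples, and a real search engine would add ranking, stemming, and phrase handling omitted here.

```python
import re
from collections import defaultdict

def build_index(pages):
    """Map each term to the set of URLs whose harvested text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(url)
    return index

def search(index, query):
    """Return URLs containing every query term (simple AND search)."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

# Hypothetical harvested pages, keyed by their original URL.
pages = {
    "http://a.example/moths": "silk moth wing scales",
    "http://b.example/bees": "honey bee wing venation",
}
index = build_index(pages)
print(sorted(search(index, "wing moth")))
# → ['http://a.example/moths']
```

The returned keys are the original URLs, so the searchable application can present them directly as hyperlinks for the information customer to follow.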