Crawler
-
The seeker algorithm is relatively straightforward. Both keywords
and URLs are used to seed the search. Keywords are submitted to
online search engines to retrieve web pages, through a module
that learns effective queries; seed URLs are spidered directly.
Speculative fetching is performed based on the expectation that a
site is a project URL or a metasite, as classified by WebKB
tools. In this way, a database of project URLs is built.
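The seeding, spidering, and classification steps above can be sketched as follows. This is a minimal illustration, not the actual implementation: the search-engine, spidering, and classification functions are placeholder stubs standing in for the query-learning module and the WebKB classifier.

```python
from collections import deque

def search_engine(keyword):
    # Placeholder: a real module would issue learned queries to an
    # online search engine and return result URLs.
    return [f"http://example.org/{keyword}/result"]

def classify(url):
    # Placeholder for the WebKB-style classifier: label a page as a
    # project URL, a metasite (a page linking to projects), or other.
    if "project" in url:
        return "project"
    if "list" in url:
        return "metasite"
    return "other"

def spider(url):
    # Placeholder: fetch the page and return its outgoing links.
    return [url + "/project-a", url + "/project-b"]

def seek(keywords, seed_urls):
    """Build a database of project URLs from keyword and URL seeds."""
    frontier = deque(seed_urls)
    for kw in keywords:
        frontier.extend(search_engine(kw))
    projects, seen = set(), set()
    while frontier:
        url = frontier.popleft()
        if url in seen:
            continue
        seen.add(url)
        label = classify(url)
        if label == "project":
            projects.add(url)
        elif label == "metasite":
            # Speculative fetching: follow links from a suspected
            # metasite, expecting them to lead to project pages.
            frontier.extend(spider(url))
    return projects
```

Seeding with one keyword and one metasite-like URL yields the project pages linked from the metasite, illustrating how the frontier grows from both kinds of seed.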
Next, we use information extraction to populate knowledge bases
about software systems, then use these to initiate further
searches. Eventually we would like to extend this to a set of
tactics for retrieving all information related to packaging and
systems integration.
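The feedback step, in which extracted knowledge initiates new searches, might look like the sketch below. The record fields and query templates are illustrative assumptions, not taken from the system itself.

```python
def queries_from_record(record):
    """Generate follow-up search queries from an extracted KB record.

    `record` is assumed to be a dict of attributes produced by
    information extraction, e.g. a system's name and author.
    """
    name = record["name"]
    # Quote the name so the search engine treats it as a phrase.
    queries = [f'"{name}" download', f'"{name}" homepage']
    for field in ("author", "language"):
        if field in record:
            queries.append(f'"{name}" {record[field]}')
    return queries
```

Each generated query would be fed back to the search-engine module, closing the loop between extraction and retrieval.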