New active, community-directed crawler.

Good results are what matters most in search, and the most valuable raw material for good results is a well-filled index.
Of course, in the long run the whole internet should be indexed. In the meantime, we can also improve search results with a smaller index, provided the crawler indexes exactly the pages users are looking for. This way, an index of the same size becomes more efficient.

This is exactly what our new active, community-directed crawler is intended for. In addition to crawling visited pages, FAROO is now able to crawl autonomously. Crawler start points are derived from the pages visited and the searches made by FAROO users. If a search returns few or no results, pages are crawled in real time and included in the results of that search. In this way, the community-directed crawler grows the index exactly where it is needed: whenever results are missing, the gaps are closed instantly.
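The post does not describe how FAROO implements this, so the following is only a minimal Python sketch of the behavior outlined above, under several assumptions. The names `CommunityCrawler`, `MIN_RESULTS` and the crawl `budget` are hypothetical; the "internet" is simulated by an in-memory `web` dict mapping a URL to its text and outgoing links; and "start points derived from searches" is interpreted here as following the links of the pages that already matched a sparse query.

```python
from collections import deque

# Hypothetical threshold: a search with fewer hits triggers a real-time crawl.
MIN_RESULTS = 3


class CommunityCrawler:
    def __init__(self, web):
        # `web` stands in for the internet: url -> (text, list of outgoing links).
        self.web = web
        self.index = {}        # inverted index: term -> set of URLs
        self.seeds = deque()   # crawler start points
        self.crawled = set()

    def index_page(self, url):
        """Fetch a page (from the simulated web), index its terms, queue its links."""
        if url in self.crawled or url not in self.web:
            return
        self.crawled.add(url)
        text, links = self.web[url]
        for term in text.lower().split():
            self.index.setdefault(term, set()).add(url)
        # Links found on indexed pages become new crawler start points.
        self.seeds.extend(link for link in links if link not in self.crawled)

    def on_visit(self, url):
        # Passive part: a page a user visits is indexed immediately.
        self.index_page(url)

    def crawl(self, budget):
        # Active part: crawl from start points, independent of browsing activity.
        while budget > 0 and self.seeds:
            url = self.seeds.popleft()
            if url not in self.crawled:
                self.index_page(url)
                budget -= 1

    def search(self, term, budget=5):
        term = term.lower()
        hits = self.index.get(term, set())
        if len(hits) < MIN_RESULTS:
            # Assumption: prioritise links of pages that already match the query,
            # then crawl in real time and re-run the lookup.
            for url in hits:
                for link in self.web[url][1]:
                    if link not in self.crawled:
                        self.seeds.appendleft(link)
            self.crawl(budget)
            hits = self.index.get(term, set())
        return sorted(hits)
```

For example, after a user visits only page `a`, a search for `search` finds one hit, falls below the threshold, and crawls `b`, `c` and `d` on the spot, so the returned result list also contains `b` and `d`, which nobody had visited. A real peer-to-peer crawler would of course fetch pages over HTTP and distribute the index across peers; the sketch only shows the control flow.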

Active crawling grows the index at a faster pace and overcomes the chicken-and-egg problem of crawling only visited pages while there are still relatively few users. With active crawling, passive peers can contribute as well. Growing the index no longer depends on browsing activity, so pages that nobody in the current FAROO community has visited before also get indexed.

The improved efficiency and speed of crawling and indexing will provide you with richer results every day.

