The discovery of topical, fresh and novel information has always been an important aspect of search. Recent events in sports, culture and economics often trigger the demand for more information.
But the perception of what counts as recent has changed dramatically with the popularity of services like Twitter.
Once an index was considered up to date if pages were re-indexed once a week, but under the term “Real time search” documents are now expected to appear in search results within minutes of their creation.
There are two main challenges:
- First, the discovery of relevant, changed documents, because a brute force approach of re-indexing the whole web every minute is not feasible.
- Second, those documents need to be ranked right away when they appear. With the dramatically increased number of participants creating content in social networks, blogging and micro-blogging, the amount of noise has increased as well. To make real time search feasible, it is necessary to separate the relevant documents from the growing stream of noise. Traditional ranking methods based on links fail, as new documents naturally have no history and no record of incoming links. Ranking based on the absolute number of votes also penalizes new documents, which is the opposite of what we want for real time search.
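To illustrate why an absolute vote count works against new documents, here is a minimal sketch that compares it with a count of votes inside a recent time window. The names `Doc`, `absolute_score` and `recent_score`, the one-hour window and the sample data are assumptions made for illustration, not FAROO's actual ranking.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    url: str
    # Timestamps (in seconds) at which the document received a vote.
    vote_times: list = field(default_factory=list)

def absolute_score(doc):
    # Total votes ever received: favours documents with a long history.
    return len(doc.vote_times)

def recent_score(doc, now, window=3600.0):
    # Votes received within the last `window` seconds (hypothetical window size).
    return sum(1 for t in doc.vote_times if 0 <= now - t <= window)

now = 100_000.0
# An old document with 50 votes, all of them long ago.
old = Doc("http://example.com/old", [float(t) for t in range(0, 50_000, 1_000)])
# A new document with only 3 votes, all within the last ten minutes.
new = Doc("http://example.com/new", [now - 600, now - 300, now - 60])

by_absolute = sorted([old, new], key=absolute_score, reverse=True)
by_recent = sorted([old, new], key=lambda d: recent_score(d, now), reverse=True)
```

With the absolute count the old document ranks first, while the windowed count puts the new document on top.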
The answer to both challenges is a crowd sourced approach to search, in which the users discover and rank new and relevant documents.
This sounds similar to FAROO’s P2P architecture of instant, user driven crawling and attention based ranking. And in fact all the required genes for real-time search have been inherent parts of FAROO’s P2P architecture, long before real time search became so ubiquitously popular.
Really utilizing the wisdom of crowds and delivering competitive results requires a large user base. But we will unleash the power of our approach right now by opening up in several ways:
- First, with the introduction of attention connectors to other social services we are now able to leverage a much more representative base of attention data for discovery and ranking. We do deep link crawling for all discovered links and use the number of votes among other parameters for ranking.
- And second, by providing browser based access to our real time search service we are removing all installation hurdles and platform barriers. Our P2P client additionally offers enhanced privacy, personalized results and continuous search.
So, apart from Social Discovery and Attention Based Ranking how does FAROO differ from other real time search services?
Social Noise Filter
We analyze the trust and reputation of the original source and of the recommending middle man, as well as the attention and popularity of the information among the final consumers, in order to separate the relevant documents from the constant real time stream of noise.
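As a rough sketch of how such signals could be combined, the following example scores each item by a weighted sum of three values between 0 and 1 and drops everything below a threshold. The function names, the weights and the threshold are hypothetical; the actual signals and their weighting are not described here.

```python
def relevance(source_trust, recommender_reputation, consumer_attention,
              weights=(0.4, 0.3, 0.3)):
    """Weighted sum of three signals in [0, 1]; the weights are assumptions."""
    signals = (source_trust, recommender_reputation, consumer_attention)
    for s in signals:
        if not 0.0 <= s <= 1.0:
            raise ValueError("signals must lie in [0, 1]")
    return sum(w * s for w, s in zip(weights, signals))

def filter_noise(items, threshold=0.5):
    # Keep only items whose combined score reaches the (hypothetical) threshold.
    return [item for item in items if relevance(*item["signals"]) >= threshold]

stream = [
    {"url": "http://example.com/report", "signals": (0.9, 0.8, 0.7)},
    {"url": "http://example.com/spam", "signals": (0.1, 0.2, 0.1)},
]
kept = filter_noise(stream)
```

Only the first item survives the filter, because its source, recommender and audience signals are all high.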
There is nothing as powerful as the human brain for categorizing information. We again use the collective intelligence of the users and aggregate the tags from all users and all connected services for a specific document. Of course you can search for tags and use them as filters in the faceted search.
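Aggregating tags across users and services can be sketched as a simple count per normalized tag. The helper `aggregate_tags` and the sample tag lists are illustrative assumptions, not the service's real data model.

```python
from collections import Counter

def aggregate_tags(tag_lists):
    """Merge the tag lists of many users or services for one document."""
    counts = Counter()
    for tags in tag_lists:
        # Normalize case and whitespace; count each tag once per contributor.
        counts.update({t.strip().lower() for t in tags if t.strip()})
    return counts

# Tags assigned to the same document by three different contributors.
tags_for_doc = aggregate_tags([
    ["Search", "P2P"],
    ["search", "realtime"],
    ["p2p", "Search "],
])
top_tag = tags_for_doc.most_common(1)
```

Counting each tag once per contributor keeps a single user from inflating a tag by repeating it.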
Rich Visual Preview
A picture says a thousand words. Whenever possible a teaser picture from the article is shown in front of the text summary, not just a thumbnail of the whole webpage.
The author is displayed if available, and can be used for filtering.
It’s not just the pure news, but also the emotions which involve us and make information outstanding. FAROO detects and visualizes which kinds of sentiments have been triggered in the conversation.
RSS and ATOM result feeds
You can subscribe to the result streams, applying any combination of the faceted search filters. That way you get notified and can browse through the news in your preferred web or client based feed reader.
Multi Language support
Real time search services are still dominated by English content. But by now the country with the most internet users is China, and due to the long tail the vast majority of internet users use languages other than English. So voting, ranking and searching that ignore language are certainly not appropriate. Multi language search results come together with a localized user interface.
Our faceted search lets you navigate a multi-dimensional information space by combining text search with a progressive narrowing of choices in each dimension. This helps to cope with the increasing flow of information by narrowing, drilling down, refining and filtering.
Faceted search also provides a simple statistical overview of the current and recent activities in different languages, sources and topics.
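The combination of text search, per-dimension narrowing and a statistical overview can be sketched in a few lines. The document fields (`language`, `source`, `topic`), the functions `search` and `facet_counts`, and the sample documents are assumptions chosen for illustration.

```python
from collections import Counter

DOCS = [
    {"title": "Real time search launches", "language": "en", "source": "twitter", "topic": "search"},
    {"title": "Echtzeitsuche gestartet", "language": "de", "source": "blog", "topic": "search"},
    {"title": "Real time sports results", "language": "en", "source": "blog", "topic": "sports"},
]

def search(docs, text="", **facets):
    """Substring text match, narrowed by exact matches on the given facets."""
    text = text.lower()
    return [
        d for d in docs
        if text in d["title"].lower()
        and all(d.get(name) == value for name, value in facets.items())
    ]

def facet_counts(docs, facet):
    # Statistical overview: how many hits fall into each value of one facet.
    return Counter(d[facet] for d in docs)

hits = search(DOCS, "real time")                       # text search only
overview = facet_counts(hits, "source")                # counts per source
narrowed = search(DOCS, "real time", source="blog")    # drill down on one facet
```

Each additional facet passed to `search` narrows the result set further, while `facet_counts` shows which choices remain in a dimension.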
Architecture and Approach
But the most significant difference is that for us real time search is just one part of a much broader, unified and distributed web search approach.
We believe that the era of document centered search is over. The equally important role of users and conversation, both as targets of search and as contributors to discovery and ranking, should be reflected in an adequate infrastructure.
As long as both the sources and the recipients of information are distributed, the natural design for search is distributed. This holds despite the increasing tendencies to incapacitate the collective force of users by removing the distributed origins of the internet through cloud services and cloud based operating systems. P2P provides an efficient alternative to those concentration and centralization tendencies in search.
In the longer perspective, with an increased peer-to-peer user base, the real time search capability based on a client approach with implicit discovery and attention ranking is superior to explicit mentions, as every visited web page is covered. This is important, as the majority of links belong to the long tail, in real time search as well. They appear once or not at all in the Twitter stream, and can’t be discovered and ranked by popularity through explicit voting.
In real time search the amount of index data is limited, because only recent documents with high attention and reputation need to be indexed. This allows a centralized infrastructure at moderate cost. But as soon as search moves beyond the short head of real time search and aims to fully index the long tail of the whole web, then our distributed peer-to-peer architecture provides a huge cost advantage.
Scaling & Market Entry Barrier
In web search we have three different types of scaling issues:
1. Search load grows with user number
P2P scales organically, as every additional user also provides additional infrastructure
2. With the growth of the internet more documents need to be indexed (requiring more index space)
P2P scales, as the average hard disk size of the users grows, and the number of users who might provide disk space grows as well
3. With the growth of the internet more documents need to be crawled in the same amount of time
P2P scales as the average bandwidth per user grows, and the number of users who might take part in crawling grows as well.
Additionally P2P users help to smarten up the crawling by discovering the most relevant and recently changed documents.
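The scaling argument can be made concrete with a back-of-the-envelope model in which total capacity is the number of participating peers times the resource each peer contributes. The function name and all figures below are hypothetical and only show how both factors multiply.

```python
def p2p_capacity(participating_peers, per_peer_resource):
    """Total capacity offered by the network (disk space, bandwidth, ...)."""
    return participating_peers * per_peer_resource

# Hypothetical index space: peers each donating some GB of disk.
index_small = p2p_capacity(10_000, 1)        # 10,000 peers x 1 GB
index_large = p2p_capacity(1_000_000, 2)     # more peers, larger average disks

# Growth factor when both the peer count and the per-peer share increase.
growth = index_large // index_small
```

Because both factors grow over time, capacity grows faster than the user base alone.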
For market dominating incumbents the scaling in web search is not so much a problem.
For now they simply solve it with money, derived from a quasi-monopoly in advertising and a giant existing user base. But this brute force approach of replicating the whole internet into one system doesn’t leave the internet unchanged. It bears the danger that one day the original is replaced by its copy.
But for small companies the huge infrastructure costs pose an effective market entry barrier. Unlike other services, where the infrastructure requirements are proportional to the number of users, web search requires you to index the whole internet from the very first user on in order to provide competitive search results.
This is where P2P comes in, effectively reducing the infrastructure costs and lowering the market entry barrier.
Try our beta at search.faroo.com or see the screencast: