FAROO at TMT.Communities’09

June 24th, 2009

Gosia of FAROO was a speaker and special guest at the TMT.Communities’09 conference in Warsaw, Poland.

The conference took place on July 18th at the Warsaw Stock Exchange, in the Chamber of Listings, and was held under the motto “Generation C”.

Here is a short excerpt from the conference web site:
“What is Generation C? It’s a group of people all over the world aged 15 to 45 choosing a digitally-enhanced lifestyle and thus empowering hardware, application and service providers but also grassroots organizations like creative commons or Piratbyrån. In the world of Generation C it’s all about content, communication and cooperation. And since the content is digital it doesn’t exist without a proper medium and your favorite device. It all ads up to a digital world where people are the most important component individually and form powerfull and influencial communities all together. ”

We used the chance to evangelize the power of P2P search again ;-)


Wyszukiwanie P2P - demokratyzacja wyszukiwania (Peer-to-peer Search - Democratic Search)


Warsaw Stock Exchange, Chamber of Listings

Lightweight Chinese Word Segmentation

June 9th, 2009

If you are building an internet search engine, you know that there are a lot of different languages and character sets to consider. You might try to keep your algorithms language independent, and with Unicode the character problems seem to be solved.

But this is still not sufficient for some languages whose internet population has meanwhile grown larger than that of the U.S.

In the Chinese language, words are not separated by whitespace.
But words are an important unit in information retrieval. Many operations, such as indexing and search, are based on words. Therefore the word segmentation of unsegmented Chinese text is essential for a truly international search engine.

An additional challenge is a lightweight word segmentation algorithm that can be integrated into a distributed p2p search client. While decent recall and precision are a prerequisite, small size and high speed are essential. The small size is required to keep the installation package of the p2p search client compact and the memory requirements low, while the high speed is necessary for real-time crawling. The small size in particular is a challenge, as many segmentation approaches are based on large dictionaries.
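To make the dictionary dependence concrete, here is a minimal sketch of greedy forward maximum matching, a common dictionary-based baseline. It is not FAROO’s algorithm; the tiny `LEXICON`, the `segment` function and the `max_len` parameter are illustrative assumptions only.

```python
# Illustrative baseline only: greedy forward maximum matching.
# LEXICON is a toy word list (an assumption); real dictionaries are far larger,
# which is exactly the size problem described above.
LEXICON = {"搜索", "引擎", "搜索引擎", "中文", "分词"}


def segment(text, lexicon, max_len=4):
    """Split text by taking the longest lexicon word at each position."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


print(segment("中文搜索引擎", LEXICON))  # ['中文', '搜索引擎']
```

The quality of this approach depends directly on the size of the word list, which is why it is hard to reconcile with a compact client.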

FAROO’s lightweight word segmentation algorithm handles full-width and half-width as well as traditional and simplified characters.
If you enter your search terms in one character form, you will still find all documents written in the other form.
The same applies to term highlighting. Of course, documents with mixed Latin and Chinese characters are also processed properly.
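One way to get this kind of form-independent matching is to normalize both query and document text to a single form before comparing them. The sketch below is an assumption about how that could look, not a description of FAROO’s implementation: it uses Unicode NFKC normalization to fold full-width characters, and a three-entry `TRAD_TO_SIMP` table that stands in for a real traditional-to-simplified mapping.

```python
import unicodedata

# Toy mapping (assumption): a real table covers thousands of characters.
TRAD_TO_SIMP = str.maketrans({"網": "网", "頁": "页", "尋": "寻"})


def normalize(text):
    """Fold full-width forms via NFKC, then map traditional to simplified."""
    folded = unicodedata.normalize("NFKC", text)
    return folded.translate(TRAD_TO_SIMP)


# Query and document match after normalization, whatever form each was typed in.
print(normalize("ＦＡＲＯＯ 網頁") == normalize("FAROO 网页"))  # True
```

For highlighting, matching would be done on the normalized text while the original characters are the ones displayed.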

The Sleeping Power

May 2nd, 2009

A plea for restoring end-to-end connectivity

When the internet was born, it was truly decentralized. The most natural, core function was that users could communicate directly with each other.

But then an unholy alliance of unfounded security fears, technology naysayers, and advocates of centralized technology and walled gardens degraded the Internet, virtually removing end-to-end connectivity. In theory you can still connect from one public IP to another, but in a world where almost every user is behind a router, this has ceased to work.

You may say there is port forwarding, but it requires configuring the router manually, which is simply beyond the scope of the average user. Wait, there is UPnP, which lets the application configure the router automatically! Great, but in most routers this functionality is disabled by default. Enabling it requires configuring the router manually, which is beyond the scope of the average user. Here we go again! And then there are STUN, STUNT, TURN and ICE, more hole-punching hacks than standards, all operating in a gray area of specifications and differing router implementations, or again requiring auxiliary constructions in the form of additional, centralized traversal servers. But with IPv6 all will be better, right? The IPv6 address space, with 2^128 addresses, is large enough to give every device its own public address many times over. But before this has even taken effect, there are already proposals for IPv6-to-IPv6 NAT.

It’s unbelievable: after 30 years, the Internet has almost completely lost its end-to-end connectivity.

Don’t let them fool you. Neither the limited address space of IPv4 nor security is a well-founded reason to remove end-to-end connectivity. As long as the operating system asks the user to confirm whether he wants to allow inbound access to this computer for this specific application, everything should be fine.

Over time, people forgot about the decentralized origins of the Internet and got used to a centralized architecture. There, users connect to a centralized service provider and can communicate only through this middleman, on whom they now depend and whom they have to pay in one way or another. The current move toward the cloud is only the next step toward a fully centralized system, controlled by a few big players, cementing monopolies, and imposing additional borders and taxes. Due to the lack of standards, it removes what independence is left to users and small companies.

Of course, users still have plenty of unused resources (disk space, bandwidth, processor cycles) that they have already paid for and that would be more than sufficient to serve as infrastructure for all kinds of services. Together, they are far more powerful than all those big guys out there. Utilizing their own resources would spare users from paying a second time and make them independent of providers who lock them and their data into walled gardens.

It’s just that somebody “forgot” to standardize the way all those users could unite their forces.

In such a system people would own their data; they could grant or revoke access at will. They wouldn’t be exposed to unsolicited data mining, and their communication couldn’t be blocked, censored, inspected or monitored. There simply wouldn’t be central instances where providers are made to act as deputies for entrenched monopolistic industries or political interests.

The average user does not feel the loss, because he never bothered with that technical stuff. He just doesn’t know about the potential applications, and the healthy competition to the big centralized incumbents, that he is missing due to the connectivity restrictions.
Social networks, instant messaging, microblogging: all those naturally decentralized services are still forced into a centralized corset, keeping users dependent on divided, walled communities.

We believe that the sleeping power of the masses can be unleashed by overcoming their artificial isolation …

FAROO introduces Continuous Search

February 15th, 2009

About 40 percent of the searches people make on the Internet are duplicate queries they have made at least once before.

Now FAROO assists in the time-consuming task of staying up to date and alerts you in real time about relevant news. Based on attention data, FAROO automatically detects queries with long-term relevance to the user. Unlike other solutions, no extra action is required from the user.

This also serves as a smart discovery search, automatically providing the user with updates in his fields of interest.
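The post does not say which attention signals are used. As a rough sketch under that caveat, one simple stand-in signal is a query that the same user repeats on several distinct days. The function name `recurring_queries`, the `(day, query)` log format and the `min_days` threshold are all hypothetical.

```python
from collections import defaultdict
from datetime import date


def recurring_queries(query_log, min_days=3):
    """Return queries that were issued on at least min_days distinct days."""
    days_seen = defaultdict(set)
    for day, query in query_log:
        # Normalize trivially so "FAROO " and "faroo" count as the same query.
        days_seen[query.strip().lower()].add(day)
    return sorted(q for q, days in days_seen.items() if len(days) >= min_days)


log = [
    (date(2009, 2, 1), "faroo"),
    (date(2009, 2, 2), "FAROO "),
    (date(2009, 2, 3), "faroo"),
    (date(2009, 2, 1), "weather"),
    (date(2009, 2, 1), "weather"),
]
print(recurring_queries(log))  # ['faroo']
```

Counting distinct days rather than raw repetitions keeps a burst of identical queries in one session from being mistaken for a long-term interest.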

Additionally, a list of current Hot Topics and related images is displayed. In many cases this provides good visual feedback on breaking events or topics dominating the news.

Currently we are using Twitter data for update detection and real-time search, as our index is not yet comprehensive enough.
But in the long run, our own p2p data on all visited web pages will provide even more relevant results.

The new set of features will be available with our next release.

Follow FAROO on Twitter!

February 2nd, 2009

Now we are also twittering! Follow us on twitter.com/faroo_p2p to stay informed or send a tweet @faroo_p2p to get in touch.

After integrating Twitter search into FAROO we thought we should give some content back ;-)

P.S. You can also follow us on friendfeed.com/faroo

FAROO Redesign

January 25th, 2009

After our previous user interface had served us for more than three years, we decided to look for a cleaner, more consistent, more modern (Web 2.0-ish) design.

We tried to remove all non-essential detail and to focus on search. It’s time to move on from the developer and technology perspective and to give the user experience first priority.

And finally, we integrated the first stage of social search into the new design.

Search page

Result page

Preview page

While our p2p technology has many advantages, we don’t expect the average user to look into it deeply. In our fast-moving times, the decision whether he likes our search is made within seconds rather than after intense evaluation.

We hope that our new design will help to make a good first impression. It will be shipped with the next release.

Let us know what you think.

Goodbye 2008, Welcome 2009!

January 1st, 2009

As every year, it’s time to pause for a second, look back at what has been accomplished, and venture an outlook on the things to come.

2008 has been very intense and successful for us at FAROO.

We significantly enhanced our Peer-to-peer technology

FAROO attracted attention as a promising alternative

We launched the product to a wider audience

We continued to build up the company

  • by strengthening our team
  • and securing the base for our future growth.

While last year we demonstrated the technological feasibility of p2p search, this year we will concentrate on large-scale distribution and indexing as the basis for much broader adoption in the market.

We would like to thank everybody who supported us during the past year, our long-term friends and the exceptional people we met along the way, and we hope that in 2009 you will again be at our side to further explore the future of search.

FAROO a Top 10 Alternative Search Engine 2008

December 26th, 2008

FAROO has been selected as one of the Top 10 Alternative Search Engines of 2008 by ReadWriteWeb and Charles Knight of AltSearchEngines.

We also made it into the ReadWriteWeb list of the Top 100 Products of 2008.

Of course we are very proud to be honored in this way. While we know that the recognition in the tech community is an important step, the ultimate proof of any radical paradigm shift is the broad adoption in the mass market.

We are committed to making this happen.

Codename Locust: A collaborative crawler swarm

October 25th, 2008

While FAROO already contains a distributed crawler, each peer still crawls the web independently and without coordination.
The next generation is a swarm of collaborating crawlers.
They graze the web in the shortest possible time, with no overlap and without leaving any blind spots.

The new species exhibits the following behavior:

  • No overlap and no blind spots, even under churn.
  • Dynamic task sharing for a growing number of users, with low communication cost.
  • Complete crawling and re-crawling: Detects crawling completion and switches to re-crawl mode.
  • Politeness: low impact on websites by both the individual crawler and the swarm.
  • Low impact on the peer and workgroup.
  • Limited crawl queue size.
  • Exploiting geographic proximity.
  • Exploiting interest and linguistic proximity: what matters is not the absolute size of the index, but the overlap between user interest and indexed content.
  • Relevance based crawling prioritization.
  • Spam reduction.
  • Spider trap proof.
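The post does not describe how the swarm divides the work. As one possible illustration of “no overlap and no blind spots, even under churn”, the sketch below uses rendezvous (highest-random-weight) hashing to assign each host to exactly one peer. This is a swapped-in textbook technique, not necessarily what Locust does; `owner`, `_score` and the peer IDs are hypothetical, and the sketch assumes all peers share the same view of the live peer list.

```python
import hashlib
from urllib.parse import urlsplit


def _score(peer_id, host):
    """Deterministic pseudo-random weight for a (peer, host) pair."""
    digest = hashlib.sha256(f"{peer_id}|{host}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def owner(url, peers):
    """Pick the single peer responsible for crawling the URL's host."""
    # Assigning by host keeps all pages of a site with one crawler,
    # which also makes per-site politeness easy to enforce.
    host = urlsplit(url).hostname or url
    return max(peers, key=lambda peer: _score(peer, host))


peers = ["peer-a", "peer-b", "peer-c"]  # hypothetical peer IDs
print(owner("http://example.com/page1", peers) == owner("http://example.com/page2", peers))  # True
```

Because every peer computes the same owner for a host, each host is crawled by exactly one peer. When a peer leaves, only the hosts it owned are reassigned; all other assignments stay unchanged.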

A small swarm is already harvesting the green leaves from the web. With the next release we will set the rest of the breed free.

FAROO at Web 2.0 Expo in Berlin

September 25th, 2008


Web 2.0 Expo Europe 2008
This year we will again attend the Web 2.0 Expo in Berlin (October 21-23).

On the occasion of the Expo there will be an event organized by Charles Knight of AltSearchEngines and other search engines. We will join the event and are already looking forward to it.