Scoble-Arrington on Google’s SearchWiki

In a recent post Robert Scoble explains why he believes Mike Arrington is wrong to hate SearchWiki so much. Interesting discussion.

I’m probably with Scoble on this one, but for a different reason. See, I don’t believe Google have added this feature to make me spend more time on their page so they can make money from advertising. An easier route would have been to add some ads on the search page (Schmidt told Cramer on Mad Money recently that they could have made a good few million per day just from that, but obviously decided against it) or to make the search result page more cumbersome on purpose (ridiculous, I know).

Time on page is not Google’s game. After all, their utility, from an end-user perspective, is to help users find the things they are looking for as quickly as possible. Their game is accuracy and relevance, and so far it has played out well at the level of the static (albeit fast-changing) page. The problem is that this is becoming more and more difficult. As information chunks get smaller (from blog posts to Twitter messages), the intrinsic ‘information value’ of those submissions becomes very elusive. Augmenting the underlying text with user comments helps to beef up the indexed body, making it more ‘searchable’ and easier for AdSense to handle. Google’s new SearchWiki does just that!

Larger haystack, same needle

Search engine company Cuil launched a new service earlier today, claiming a 120 billion page index, larger than that of any of its competitors, including Google. Building an index of this magnitude is definitely impressive (or shall I say cool), but it addresses only one aspect of the ever-growing problem of information overload on the web. The others are maintaining freshness, scaling query execution, delivering relevance and providing usable presentation.

Perhaps the greatest achievement of Google is index freshness and query scalability. They don’t need thousands of machines for nothing. PageRank did introduce an ingenious and elegant way of ranking results, but it has not fundamentally changed the TF-IDF algorithms used by information retrieval engines 30 years its senior. So where is the brilliance? In my humble opinion it’s the sheer engineering required to incrementally update a very large weighted reachability matrix, scale the crawler and deliver sub-second query execution time.
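For readers who have not met TF-IDF, here is a minimal sketch of the idea in Python. It is an illustration only, not how Google or Cuil actually score pages: the function name tf_idf_scores, the whitespace tokenizer and the three toy documents are all my own assumptions. A term counts for more when it is frequent in a document (TF) and rare across the collection (IDF).

```python
import math
from collections import Counter


def tf_idf_scores(query, docs):
    """Score each document against the query with plain TF-IDF.

    Hypothetical helper for illustration: tokenization is a naive
    lowercase whitespace split, with no stemming or stop words.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if counts[term] == 0:
                continue  # term absent from this document
            tf = counts[term] / len(tokens)      # term frequency
            idf = math.log(n_docs / df[term])    # inverse document frequency
            score += tf * idf
        scores.append(score)
    return scores


# Toy collection, invented for this example.
docs = [
    "needle in a haystack",
    "a bigger haystack",
    "search engine index freshness",
]
scores = tf_idf_scores("needle haystack", docs)
for doc, score in zip(docs, scores):
    print(f"{score:.3f}  {doc}")
```

The first document ranks highest because it contains the rarer term ‘needle’ as well as ‘haystack’. Nothing in the arithmetic is hard, which is rather the point: the difficulty lies in doing this over billions of pages while keeping the index fresh.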

Cuil are claiming a massive improvement in these areas, which could potentially reduce the cost of index maintenance and consequently increase its freshness, as well as deliver lightning-fast performance to end-users. However, Cuil are also among the first large-scale search engines to introduce some level of semantic analysis. The ‘Explore by Category’ widget on the top right of the results page is probably the most significant usability improvement delivered by Cuil today. This small feature, if backed by a rich taxonomy (it’s too early to say, but the few queries I have tried did come up with relevant categories), holds huge promise. It has the potential to change the way people interact with search results: navigating deeper into a concept rather than manually refining the query.

Relevance of the result set is a different story. Here, again through a brief experiment, I was not overly impressed. Yes, results on the first page (depending on how broad the query was) came up relevant, but are they the most relevant? This, of course, is highly dependent on the subjective measure of what one considers relevant to a search term.

Now this is what really got me puzzled: presentation. What is this nonsense? If all pages are considered equal candidates for a match and users expect the ‘answer’ to be in one ‘most relevant’ page, what value does a news portal arrangement provide? Is the second block on the third row more relevant than the one right next to it? Just show results in a way that enables a quick scan. At the end of the day, users are looking for the needle. They don’t want to stay and read, and they certainly don’t care about a bigger haystack.