Mar 22, 2007 3:59 PM PDT
Why I believe Google has about 8.2 billion documents
 
Press coverage quotes Google as having anywhere between 17 billion and 30 billion documents. I believe the number is much lower.

This is how I arrived at that conclusion. Let me know if I am wrong. I started by searching for keywords on Google. For instance, when I searched for "IT", it returned:

"Results 1 - 10 of about 3,070,000,000 for IT. (0.07 seconds)"

This indicates that approximately 3 billion documents contain the word IT, which is impressive. I would think it is much harder to get a good page rank for that keyword, as there are so many competing pages.

So then I got greedy. I searched for "html", and that returned:

"Results 1 - 10 of about 3,840,000,000 for html [definition]. (0.07 seconds)"

which means 3.8 billion documents contain that term.

So this sparked an idea. Perhaps if I searched for really elementary terms like "the" or "and", I would get even more documents. I tried "the":

"Results 1 - 10 of about 5,180,000,000 for the. (0.05 seconds)"

and I was right! So I figured that the number of documents returned (N) should approach some large upper limit: as I enter successively simpler terms, I should get larger and larger result sets. So I went further and tried individual letters like "a", "b", "c", and very quickly I realized that "a" returns the most documents:

"Results 1 - 10 of about 7,680,000,000 for a. (0.25 seconds)"

In fact, a few days before I wrote this post, there were about 8.1 billion. So something sinister must have happened! Anyway, you get the idea: the number keeps increasing as the terms get simpler.

A few minutes later, I thought of numbers like "0", "1", "2", and soon I realized that "1" returns even more documents than "a" does:

"Results 1 - 10 of about 8,210,000,000 for 1. (0.05 seconds)"

So it looks like we are converging on this magic 8 billion number (and some change). I concluded that Google probably doesn't have any more documents. Or any more documents worth returning :-) I would rather not be among those remaining documents that they claim to have but never return.
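
The comparison above can be scripted. What follows is only a rough sketch: it parses the result lines quoted in this post and picks the term with the largest reported count. The regular expression and the helper name `parse_result_line` are my own illustration, not anything Google provides, and the counts are Google's own "about" estimates rather than exact figures.

```python
import re

# Result lines exactly as quoted in this post.
RESULT_LINES = [
    "Results 1 - 10 of about 3,070,000,000 for IT. (0.07 seconds)",
    "Results 1 - 10 of about 3,840,000,000 for html [definition]. (0.07 seconds)",
    "Results 1 - 10 of about 5,180,000,000 for the. (0.05 seconds)",
    "Results 1 - 10 of about 7,680,000,000 for a. (0.25 seconds)",
    "Results 1 - 10 of about 8,210,000,000 for 1. (0.05 seconds)",
]

# Assumed line format: "... of about <count> for <term>..."
PATTERN = re.compile(r"of about ([\d,]+) for (\S+)")


def parse_result_line(line):
    """Return (term, count) from one quoted result line (hypothetical helper)."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError(f"unrecognized result line: {line!r}")
    count = int(match.group(1).replace(",", ""))
    term = match.group(2).rstrip(".")  # drop the sentence-ending period
    return term, count


counts = dict(parse_result_line(line) for line in RESULT_LINES)

# The largest reported count is the estimate this post settles on.
best_term = max(counts, key=counts.get)
print(best_term, f"{counts[best_term]:,}")
```

With the five lines above, the winning term is "1" with 8,210,000,000 reported results.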

If you ever manage to get a larger result set, please let me know. I am most interested in the keyword that you used.

So as far as I know, "1" wins. Yeah. So what's special about "1"? Only literary geniuses could tell you. For now I can't think of any reason. But I do know that 8 billion is nowhere close to 17 billion, let alone 25 billion. (http://www.ams.org/featurecolumn/archive/pagerank.html)
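
To put "nowhere close" into numbers, here is a small sketch that divides the largest count from this post (8,210,000,000 for "1") by the press figures mentioned in it. The variable names are illustrative, and the result is only as good as Google's reported estimate.

```python
# Largest result count observed in this post (for the term "1").
largest_count = 8_210_000_000

# Press figures mentioned in this post: 17, 25 and 30 billion documents.
press_figures = [17_000_000_000, 25_000_000_000, 30_000_000_000]

# Share of each press figure that the observed count accounts for.
shares = {figure: largest_count / figure for figure in press_figures}

for figure, share in shares.items():
    print(f"{figure:,}: {share:.0%}")
```

That works out to roughly 48% of 17 billion, 33% of 25 billion and 27% of 30 billion.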
