• The web is a trillion pages strong and growing, says Google

  • milkyway
    The Google index started with 26 million pages in 1998, shot to a billion pages in 2000, and has now hit a new milestone: 1 trillion (as in 1,000,000,000,000) unique URLs. Google says it does not index every one of the trillion pages, since indexing is expensive and many of them are similar to each other or are auto-generated content. The way Google indexes has also evolved: it now indexes blogs and other rapidly changing websites every 15 minutes. Michael Arrington of TechCrunch hints at something big coming up next week that may challenge Google's position as the search engine with the most comprehensive index.

    From Google's announcement:

    “To keep up with this volume of information, our systems have come a long way since the first set of web data Google processed to answer queries. Back then, we did everything in batches: one workstation could compute the PageRank graph on 26 million pages in a couple of hours, and that set of pages would be used as Google’s index for a fixed period of time. Today, Google downloads the web continuously, collecting updated page information and re-processing the entire web-link graph several times per day. This graph of one trillion URLs is similar to a map made up of one trillion intersections. So multiple times every day, we do the computational equivalent of fully exploring every intersection of every road in the United States. Except it’d be a map about 50,000 times as big as the U.S., with 50,000 times as many roads and intersections.”
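For readers wondering what "computing the PageRank graph" in a batch involves, here is a minimal sketch of the textbook power-iteration form of PageRank on a three-page toy graph. It is an illustration only, not Google's production system: the function name `pagerank`, the `example_links` graph, the damping factor of 0.85 and the fixed iteration count are all assumptions made for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank (illustrative sketch only).

    links maps each page to the list of pages it links to.
    Returns a dict mapping each page to its rank; ranks sum to 1.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with rank spread evenly over all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Pass this page's rank evenly along its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank everywhere.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: a links to b and c, b links to c, c links to a.
example_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(example_links)
for page in sorted(ranks):
    print(page, round(ranks[page], 3))
```

Each pass touches every link once, which is why a batch run over 26 million pages fit on one workstation while a trillion-URL graph recomputed several times a day is a very different scale of problem.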

    Source
