Data harvest – a look at how Google sees users on Google+
Last Updated on Friday, 29 July 2011 04:30 Written by Bot Friday, 29 July 2011 03:56
Back in 1998, there were these two guys who figured out that if all of the stuff on the internet could be indexed, sorted, and presented in a useful fashion, it would be really great. These guys formed Google, and they figured out that being able to collect as much information as possible about how people use the internet was a pretty big deal, even if it was not immediately useful. Since then, Google has grown in every direction imaginable to collect more and more information, harvest all the data around it, and sort it until it shows a trend or becomes otherwise useful.
There are still a few avenues of data collection that Google hasn’t quite figured out, but new services are coming out all the time to fix that. One of these services is the barely month-old Google+. As the social network rapidly expands, Google will have gobs and gobs of new data to collect and figure out how to use. It’s not often we get a peek into how exactly Google does things, but a recent tip reveals a little bit of how Google is already sorting G+ users for harvest.
Most website owners are familiar with sitemaps. These serve as indexes of the pages on any given website, making it easier for services like Google’s indexing algorithms to gather data. Since Google’s robots read these so frequently, it’s not a shock to see that Google itself uses sitemaps for its products. In fact, one area in which Google is deploying sitemaps consistently is its Google Profiles service.
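For readers who have never opened one, here is a minimal sketch of how a crawler might pull the links out of a sitemap. It assumes the standard sitemaps.org XML format; the sample document and its example.com URLs are made up for illustration and are not taken from Google’s actual files.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol for <urlset> documents.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample sitemap; the URLs are placeholders, not real profiles.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/profiles/1001</loc></url>
  <url><loc>https://example.com/profiles/1002</loc></url>
  <url><loc>https://example.com/profiles/1003</loc></url>
</urlset>"""


def extract_links(sitemap_xml):
    """Return every <loc> URL listed in a sitemap document, in order."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


links = extract_links(SAMPLE_SITEMAP)
print(len(links), "links found")
for link in links:
    print(link)
```

A real crawler would fetch each sitemap over HTTP and queue the links for indexing, but the parsing step would look much the same.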
Buried deep within Gstatic.com are over 7,000 sitemaps for Google Profiles. Each sitemap holds 5,000 links for the robots to index. The last sitemap, number 7103, has just under 4,000 links, leading us to believe that it is the most recent sitemap and is still being filled. Added up, that makes just over 35 million Google Profiles, each ready to be indexed and have its content scoured for useful information.
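The arithmetic behind that total can be sketched in a few lines. This assumes the sitemaps are numbered consecutively from 1 with no gaps, that every sitemap before the last is full, and it rounds "just under 4,000" up to 4,000, so the result is a slight overestimate rather than an exact count.

```python
LINKS_PER_FULL_SITEMAP = 5000
LAST_SITEMAP_NUMBER = 7103
LINKS_IN_LAST_SITEMAP = 4000  # assumed upper bound for "just under 4,000"


def estimate_profiles(last_number, links_per_full, links_in_last):
    """Estimate total links: every sitemap but the last is assumed full."""
    return (last_number - 1) * links_per_full + links_in_last


total = estimate_profiles(
    LAST_SITEMAP_NUMBER, LINKS_PER_FULL_SITEMAP, LINKS_IN_LAST_SITEMAP
)
print(f"{total:,}")  # prints 35,514,000
```

That works out to roughly 35.5 million profiles, in line with the "just over 35 million" figure.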
What does this have to do with Google+? Each of the links will redirect to a Google+ account profile if Google+ has been activated for that profile. The link is in Google’s sitemap regardless of whether or not you have requested that your account not be indexed in Google+, but that doesn’t necessarily indicate that Google will do anything with the information that it shouldn’t. The accounts are organized in much the same way a list of webpages is on a sitemap, and Google likely scans them for information the same way. This not only enables Google to gather information from Google+ quickly and efficiently, but also means that the information in Google+ will rapidly become available in Google’s search engine.
News of your social postings being so rapidly searchable may come as a concern for some. Every now and again you read about someone who got fired after their Facebook or Twitter account was discovered. The notion that what is published on a publicly viewable and searchable profile is none of your employer’s business is naive, in my opinion. Still, as long as you adjust your profile so that Google does not index your account, this should never be a problem for you unless you slip up and put your boss in the wrong circle.
With some 20 million Google+ accounts activated since its inception, I was a little surprised to see that there were only 35 million Google Profiles, which certainly indicates just how few people had been using Google socially until now. There’s no shortage of opinion as to how Google+ is doing, and while these numbers are nowhere near Facebook’s monolithic 500 million users, it’s clear that Google+ is doing something right.