Testing Google's Panda algorithm: CNET analysis
CNET evaluates nearly 100,000 search results to learn which Web sites are the winners and losers after Google's latest changes to its algorithm.
Google's sweeping changes to Web site rankings, including last week's announcement that its algorithms now incorporate more "user feedback signals," have roiled the Web industry.
The reason Google made such a dramatic change to how it ranks Web sites is simple: Search engine optimizers had learned how to game the earlier algorithm to make low-quality writing more visible than quality content. Instead of preparing Web pages designed to benefit readers, SEO-focused content farms were writing for search engines.
To test the changes and provide a rare glimpse into Google's algorithmic workings, CNET compiled nearly 100,000 results by testing Google.com in March and again last Friday after the most recent alterations took effect. (See below for charts and downloadable data.)
News sites generally benefited from the changes. According to our rankings based on the number of appearances on the first page of Google results, Fox News moved up from the No. 89 spot to No. 23. ABC News had a similarly impressive uptick, and ESPN, The New York Times, and Yahoo News became more visible as well.
The "Panda" algorithm change dramatically lowered traffic to sites like AssociatedContent.com, FindArticles.com, and EZineArticles.com, according to a post by SearchMetrics.com. It also hurt some perfectly legitimate sites, including Cult of Mac and the British Medical Journal.
CNET's analysis found no significant change among the very top sites: Wikipedia, YouTube, Amazon.com, and IMDB kept their enviable top-tier positions. Hulu.com surged to No. 22 from No. 51.
Twitter, Facebook, and Huffington Post each moved up a single notch, with Yelp, Flickr, Apple.com, and WebMD slipping a bit. Government Web sites got a boost, with WhiteHouse.gov climbing from No. 125 to No. 79, and NASA, the Centers for Disease Control, and the National Institutes of Health moving up as well.
Among the Web sites that slid in visibility were WikiHow and eHow, consistent with other reports that Panda lowered the ranking of so-called content farms. The comparison site Nextag.com also slid.
"People who got hit were trying everything to get their sites out of it," says Barry Schwartz, news editor of Search Engine Land. "It was targeting low quality content sites."
Google declined to elaborate. "We typically don't comment on how specific algorithmic improvements impact specific Web sites," a spokeswoman said.
How we did this
To generate these results, we compiled approximately 2,000 search terms from a sampling of Google Insights' Web, news, and shopping searches. We then removed the duplicates, resulting in a total of 1,656 search terms, and tested those with Google.com (while not logged in) to see what the results would be.
We ignored advertisements, Google shopping results, and "searches related to" suggestions. We did decide to include Google News results, even though they're relatively ephemeral and can change by the hour. Plus, our analysis showed that excluding them wouldn't have changed the results very much.
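The counting step described above can be sketched in Python. This is a minimal illustration under our own assumptions, not CNET's actual script: it assumes the results for each search term have already been collected as a list of first-page URLs with ads, shopping results, and "searches related to" suggestions removed. It counts a hostname at most once per term, strips a leading "www.", and breaks ties alphabetically. The function names `normalize_terms` and `rank_hosts` are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def normalize_terms(raw_terms):
    """Drop duplicate search terms (ignoring case and extra whitespace), keeping first-seen order."""
    seen = set()
    unique = []
    for term in raw_terms:
        key = " ".join(term.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


def rank_hosts(results_by_term):
    """Rank hostnames by how many search terms put them on the first results page.

    results_by_term maps each search term to its list of first-page result URLs.
    Assumption: a host is counted at most once per term, however many links it has.
    Returns a list of (rank, hostname, count) tuples.
    """
    counts = Counter()
    for urls in results_by_term.values():
        hosts = set()
        for url in urls:
            host = urlparse(url).hostname  # lowercased; None if the URL has no host
            if host:
                hosts.add(host.removeprefix("www."))  # assumed normalization; Python 3.9+
        counts.update(hosts)
    # Most appearances first; ties broken alphabetically (an arbitrary, assumed rule).
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, host, count) for rank, (host, count) in enumerate(ordered, start=1)]
```

Counting each host once per term keeps a site with several links on a single results page from being counted more than once for that term; the article does not say whether CNET counted that way.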
Now, the disclaimers: Google, as it will be the first to tell you, is constantly altering its algorithm, and by the time you read this, the results from Friday's searches could well be out of date. Our first scan was in March, after Panda's appearance in late February, so it likely didn't capture the most significant changes.
Also, this shouldn't be viewed as a representative cross-section of Web searches. Google Insights only includes the most popular requests, not the more obscure ones. Our term list also skews toward current events and, because we borrowed terms from the shopping searches, toward products, especially tech gadgets.
Then again, "charlie sheen teeth" and "venereal disease" appeared in our list of search terms. Thank you, Google Insights!
Google's localization algorithm
We also tested what happens if you connect to Google.com from an overseas Internet address. We picked one in London. We performed the same searches on the same day--the only variable that should have changed, in other words, was our location.
The results? Google engages in significant localization efforts, as you might imagine, with Yelp.com being the largest beneficiary by far.
In searches originating from the U.K., Yelp appeared only twice. In U.S. searches, by contrast, it was the ninth-most popular Web site, with both its topic and individual business pages woven seamlessly into the main search results.
From our California address, Yelp garnered an enviable 45 first-page appearances for generic searches like "chocolate," "cleaning," "food," "lights," "laundry," "tv," and "weddings."
Other big localization beneficiaries that appeared prominently in U.S. searches but not from the U.K.: Davidsbridal.com, BarnesandNoble.com, and Walgreens.com.
In addition to highlighting nearby bookstores and drugstores operated by national chains, Google also heavily favors local businesses.
For our U.S. tests, we used an Internet address near Palo Alto, Calif., which prompted Google to rank nearby businesses and municipal Web sites near the top of search results.
The City of Palo Alto's Web site appears in the first page of search results for terms including "adventures," "art," "business," "gas," and "jobs." PaloAltoOnline.com makes repeat appearances ("budget cuts," "restaurants"), as do Stanford, the Palo Alto Medical Foundation, and Mike's Bikes.
There's not as much localization in the other direction. But the BBC's Web site leaps from the No. 66 spot to No. 5, and the U.K.'s National Health Service (which made no appearance in the U.S.) shows up at No. 26. The visibility of Amazon.co.uk, the U.K. pharmacy chain Boots, and NetDoctor.co.uk also jumps dramatically.
We wondered if connections to Google.com last month from abroad bypassed Panda and used the earlier algorithm, which would have made for another intriguing test. But an informed source close to the company, alas, says that's not the case.
See for yourself
Below you'll find an Excel file with multiple spreadsheets containing the raw data. If you use the data for any purpose, please attribute it to CNET and include a link to this article.
Four of the spreadsheets (March U.S., April U.S., March U.K., April U.K.) should be self-explanatory. The others show comparisons and may require a bit of explanation: the first column is the hostname, and the second and third columns show how the ranking has changed from the point of comparison. The final columns represent the search terms that bring up that Web site on the first page of Google.com results.
In the case of the U.K. spreadsheet comparing March to April, the second and third columns indicate that Facebook.com moved from position #11 to #7. The difference is 4, a positive number, meaning the site moved up; a negative number means a site moved down. For the "April U.S. vs. U.K." spreadsheet, those columns show that Yelp moved from rank #1328 in the U.K. to an enviable #9 in the U.S. because it benefited from Google's localization efforts in the United States.
"NA" means the Web site didn't appear in the spreadsheet being used for comparison. In the "April U.S. vs. U.K." spreadsheet, for example, "NA" shows up for the U.K.'s National Health Service Web site because it doesn't appear in any U.S. searches for the terms we tested.
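The comparison columns can likewise be sketched, again as a hypothetical Python illustration rather than the code behind the spreadsheet. It assumes each ranking is a dictionary mapping hostname to rank, subtracts the current rank from the baseline rank so that a move from #11 to #7 yields 4, and marks hosts absent from the baseline as "NA". The name `compare_rankings`, the row layout, and the choice to omit hosts that appear only in the baseline are our assumptions.

```python
def compare_rankings(baseline, current):
    """Build comparison rows of (hostname, baseline rank, current rank, change).

    baseline and current map hostname -> rank (1 = most first-page appearances).
    change = baseline rank - current rank, so a positive number means the site
    moved up. Hosts missing from the baseline get "NA" for both rank and change.
    Hosts that appear only in the baseline are left out (an assumption).
    """
    rows = []
    # List hosts in order of their rank in the current data set.
    for host, now in sorted(current.items(), key=lambda item: item[1]):
        before = baseline.get(host)
        if before is None:
            rows.append((host, "NA", now, "NA"))
        else:
            rows.append((host, before, now, before - now))
    return rows
```

For example, with a baseline of {"facebook.com": 11} and a current ranking of {"facebook.com": 7}, the function returns the single row ("facebook.com", 11, 7, 4).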
If you find anything interesting, or have any suggestions, please contribute to the discussion below!
Excerpts on Google Docs (limited because Google Docs allows only 400,000 cells)
Full spreadsheet in Excel format (.xlsx.gz)
Disclosure: McCullagh is married to a Google employee who is not involved with Panda.