Posted by randfish
UPDATE: Please read the tail end of this post as well, as there were multiple problematic issues affecting the subdirectory in question.
The story starts with a smart SEOmoz member, Per Svanström, getting stumped by a perfectly legitimate, white hat subdirectory, with plenty of PageRank, dropping out of Google's index:

You can see from the image that the single URL was dropped, but a site:birdstep.com/database query reveals that in fact, all of those pages are out of the index. Time for some detective work.
Jane & I spent a few minutes trying to puzzle out if bad links were pointing in or if the pages were somehow cloaking or violating TOS. As we were digging through the backlink profile, we saw that, naturally, the birdstep.com domain was linking to the subdirectory on most every page. When we viewed the source code of those pages (for example, the homepage - www.birdstep.com), we saw something strange. Below is the tail end of the source code for their top nav bar:
<li class="menuObject"><a href="http://www.birdstep.com/Corporate/"><img src="/images/menu/Corporate.gif" border="0" alt="Corporate" /></a></li>
<li class="menuObject"><a href="http://www.birdstep.com/Contact-us/"><img src="/images/menu/Contact_us_active.gif" border="0" alt="Contact us" /></a></li>
<li class="menuObject"><a href="http://www.birdstep.com/database/"><img src="/images/menu/Database.gif" border="0" alt="Database" /></a></li>
Looks fine, right? Just a regular menu serving up images as the clickable link. Only problem is...

Notice the navbar? See the missing link? That's where the "database" section should be linked to, only the image is missing. Apparently, it was just a design mistake, so they used a 1x1 pixel gif until they could get it fixed. There are plenty of other visible links in the content body of many pages over to the database section, but that top link in the navbar is invisible - technically violating Google's rules. Despite the fact that plenty of other sites and pages link to the database section legitimately, and Birdstep certainly has no reason or intention to hide that link (other than a miscalculation on pixel width), the whole subdirectory was removed from the index.
Luckily, we caught it. Birdstep has removed the link, and they'll hopefully have the subdirectory re-included in the near future. They also generously gave us permission to discuss the Q+A issue on the blog, which we very much appreciate. I think this serves as a wise warning to developers and designers everywhere - unintentional, white-hat spirited mistakes can be just as dangerous and have just as dire consequences as black hat manipulation. Watch your code!
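For anyone who wants to audit their own templates for this kind of accidental invisible link, here is a minimal Python sketch. It is not a tool we used on Birdstep, and the names (TinyImageLinkFinder, find_tiny_image_links, max_px) are made up for illustration. It only flags links whose sole content is an image with explicit width and height attributes of 1 pixel or less; Birdstep's markup above has no such attributes, so catching a case like theirs would also require downloading the image and checking its real dimensions.

```python
from html.parser import HTMLParser


class TinyImageLinkFinder(HTMLParser):
    """Collect hrefs of <a> elements whose only content is a tiny image.

    Hypothetical helper for illustration; "tiny" means explicit width and
    height attributes that are both <= max_px.
    """

    def __init__(self, max_px=1):
        super().__init__()
        self.max_px = max_px
        self.suspects = []
        self._href = None          # href of the link we are currently inside
        self._has_text = False     # link contains visible text
        self._visible_img = False  # link contains an image not known to be tiny
        self._tiny_img = False     # link contains an image declared as tiny

    def _is_tiny(self, attrs):
        # Images without numeric width/height cannot be judged from markup alone.
        try:
            width = int(attrs.get("width", ""))
            height = int(attrs.get("height", ""))
        except (TypeError, ValueError):
            return False
        return width <= self.max_px and height <= self.max_px

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self._href = attrs["href"]
            self._has_text = False
            self._visible_img = False
            self._tiny_img = False
        elif tag == "img" and self._href is not None:
            if self._is_tiny(attrs):
                self._tiny_img = True
            else:
                self._visible_img = True

    def handle_data(self, data):
        if self._href is not None and data.strip():
            self._has_text = True

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            if self._tiny_img and not self._visible_img and not self._has_text:
                self.suspects.append(self._href)
            self._href = None


def find_tiny_image_links(html, max_px=1):
    """Return the hrefs of links that contain nothing but a tiny image."""
    parser = TinyImageLinkFinder(max_px)
    parser.feed(html)
    parser.close()
    return parser.suspects
```

Feeding it the source of a page (for example, the navbar markup of your homepage) returns a list of link targets worth a second look. An empty list does not prove the page is clean, for the reason given above.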
One more point of interest - in searching around on this issue, I noticed that a Google search for http://www.birdstep.com/database/. (with the added period at the end) brought up this result:

I ran another query on a page I know was removed from the index, and it also yielded a result like the one above (unfortunately, I can't share that page publicly). It's possible that this might help diagnose future pages that are removed for bad behavior and exhibit similar symptoms - definitely not a bad query to have in your arsenal if it really does work consistently.
UPDATE: Looks like although this hidden nav element could be a problem, it wasn't actually the issue coming into play here. The answer was... capital letters cloaking 404 pages to Google (an excellent find from John Mueller). Basically, Birdstep was using some user-agent and port detection to redirect Googlebot to a 404 error page (obviously, not the intentional, "we're cloaking because we want to trick Google" kind, but the "oops, that was dumb" kind). The odd part is, it looks like Yahoo! and MSN/Live got it right (and there are plenty of links), but Googlebot was being treated differently.
We didn't notice this initially due to multiple problems. First, just switching your user agent to Googlebot in Firefox won't expose the issue. Neither will using search spider emulators like SEO-Browser. You need to actually telnet to Port 80 (as Matt Cutts notes in the comments). Second, you will see the page in Yahoo! and MSN (making it feel more like a penalty than a crawl issue). I seriously doubt they'll be banned for this - the intent to spam or deceive isn't there - but once again, it's a fascinating detective story about the problems a site can have. Big thanks to Matt and to John for their help.
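If you don't have telnet handy, the same kind of raw Port 80 check can be sketched in a few lines of Python. This is only an illustration of the idea, not the exact test Matt or John ran. The function names, the browser user-agent string, and the choice of headers are all assumptions, and a server that keys on Googlebot's IP addresses rather than its user agent won't be fooled by it.

```python
import socket

# Googlebot's published user-agent string, plus an arbitrary browser-like one
# (the browser string is an assumption chosen only for comparison).
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"


def build_request(host, path, user_agent):
    """Build the raw bytes you would otherwise type into a telnet session."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        f"User-Agent: {user_agent}",
        "Accept: */*",
        "Connection: close",
        "",  # blank line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status(response):
    """Extract the numeric status code from the first line of a raw response."""
    status_line = response.split(b"\r\n", 1)[0].decode("latin-1")
    parts = status_line.split(" ", 2)
    if len(parts) < 2 or not parts[0].startswith("HTTP/"):
        raise ValueError(f"not an HTTP status line: {status_line!r}")
    return int(parts[1])


def fetch_status(host, path, user_agent, port=80, timeout=10.0):
    """Open a plain TCP connection, send the request, return the status code."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path, user_agent))
        received = b""
        # Only the status line is needed, so stop once the first CRLF arrives.
        while b"\r\n" not in received:
            chunk = sock.recv(4096)
            if not chunk:
                break
            received += chunk
    return parse_status(received)


def compare_user_agents(host, path):
    """Return the status code the server gives each user agent for one URL."""
    agents = {"browser": BROWSER_UA, "googlebot": GOOGLEBOT_UA}
    return {name: fetch_status(host, path, ua) for name, ua in agents.items()}
```

Calling compare_user_agents with a hostname and path returns a small dictionary of status codes. If the "googlebot" entry comes back as 404 while the "browser" entry is 200, you are likely looking at the same kind of accidental cloaking described above.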
p.s. Removed the bottom part of the original post due to overwhelming feelings of sheepishness.
p.p.s. Dave Naylor has a tool that can help detect this sort of thing (though it wasn't originally intended for that use).