What Google Doesn't See CAN Hurt You (Update: But, in this case, it was cloaking)

Posted by randfish

UPDATE: Please read the tail end of this post as well, as there were multiple problematic issues affecting the subdirectory in question.

It's been a big month for false positives and getting caught with spam, and I've never been one to break up a theme. Short post, but an important one that every dev team should be aware of.

The story starts with a smart SEOmoz member, Per Svanström, getting stumped by a perfectly legitimate, white hat subdirectory, with plenty of PageRank, dropping out of Google's index:

[Image: Birdstep Database out of Google]

You can see from the image that the single URL was dropped, but a site:birdstep.com/database query reveals that in fact, all of those pages are out of the index. Time for some detective work.

Jane & I spent a few minutes trying to puzzle out if bad links were pointing in or if the pages were somehow cloaking or violating TOS. As we were digging through the backlink profile, we saw that, naturally, the birdstep.com domain was linking to the subdirectory on most every page. When we viewed the source code of those pages (for example, the homepage - www.birdstep.com), we saw something strange. Below is the tail end of the source code for their top nav bar:

<li class="menuObject"><a href="http://www.birdstep.com/Corporate/"><img src="/images/menu/Corporate.gif" border="0" alt="Corporate" /></a></li>
<li class="menuObject"><a href="http://www.birdstep.com/Contact-us/"><img src="/images/menu/Contact_us_active.gif" border="0" alt="Contact us" /></a></li>
<li class="menuObject"><a href="http://www.birdstep.com/database/"><img src="/images/menu/Database.gif" border="0" alt="Database" /></a></li>

Looks fine, right? Just a regular menu serving up images as the clickable link. Only problem is...

Notice the navbar? See the missing link? That's where the "database" section should be linked to, only the image is missing. Apparently, it was just a design mistake and so they used a 1x1 pixel gif until they could get it fixed. There are plenty of other visible links in the content body of many pages over to the database section, but that top link in the navbar is invisible - technically violating Google's rules. Despite the fact that plenty of other sites and pages link to the database section legitimately, and Birdstep certainly has no reason or intention to hide that link (other than a miscalculation on pixel width), the whole subdirectory was removed from the index.
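If you want to audit a template for this kind of accidental invisible link, here is a minimal Python sketch of one way to go about it. It is not what we ran on Birdstep's site. The names `ImageOnlyLinkFinder`, `find_image_only_links`, and `gif_size` are made up for illustration. The idea is to list every link whose only content is an image, then read the width and height out of the GIF header so that 1x1 placeholders stand out.

```python
import struct
from html.parser import HTMLParser


class ImageOnlyLinkFinder(HTMLParser):
    """Collect (href, img_src) pairs for links whose only content is an image."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._href = None        # href of the <a> currently open, if any
        self._imgs = []          # image sources seen inside that <a>
        self._has_text = False   # set once visible text appears inside that <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self._href = attrs.get("href")
            self._imgs = []
            self._has_text = False
        elif tag == "img" and self._href is not None:
            self._imgs.append(attrs.get("src"))

    def handle_data(self, data):
        if self._href is not None and data.strip():
            self._has_text = True

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # A link with images and no text depends entirely on those images.
            if self._imgs and not self._has_text:
                self.results.extend((self._href, src) for src in self._imgs)
            self._href = None


def find_image_only_links(html):
    """Return a list of (href, img_src) for links that contain only images."""
    finder = ImageOnlyLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.results


def gif_size(data):
    """Return (width, height) from a GIF header, or None if data is not a GIF."""
    if len(data) < 10 or data[:6] not in (b"GIF87a", b"GIF89a"):
        return None
    # Bytes 6-9 hold the logical screen width and height, little-endian.
    return struct.unpack("<HH", data[6:10])


if __name__ == "__main__":
    nav = (
        '<li class="menuObject"><a href="http://www.birdstep.com/database/">'
        '<img src="/images/menu/Database.gif" border="0" alt="Database" /></a></li>'
    )
    for href, src in find_image_only_links(nav):
        print(href, "depends on image", src)
```

Fetching each image and passing its first bytes to `gif_size` is left out on purpose. Any link whose image comes back as (1, 1) deserves a second look.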

Luckily, we caught it; Birdstep has removed the link, and they'll hopefully have the subdirectory re-included in the near future. They also generously gave us permission to discuss the Q+A issue on the blog, which we very much appreciate. I think this serves as a wise warning to developers and designers everywhere - unintentional, white-hat spirited mistakes can be just as dangerous and have just as dire consequences as black hat manipulation. Watch your code!

One more point of interest - in searching around on this issue, I noticed that a Google search for http://www.birdstep.com/database/. (with the added period at the end) brought up this result:

[Image: Birdstep database search with trailing period]

I ran another query on a page I know was removed from the index, and it also yielded a result like the one above (unfortunately, I can't share that page publicly). It's possible that this might help diagnose future pages that are removed for bad behavior and exhibit similar symptoms - definitely not a bad query to have in your arsenal if it really does work consistently.

UPDATE: Looks like although this hidden nav element could be a problem, it wasn't actually the issue coming into play here. The answer was... capital letters cloaking 404 pages to Google (an excellent find from John Mueller). Basically, Birdstep was using some user-agent and port detection to redirect Googlebot to a 404 error page (obviously not the intentional, "we're cloaking because we want to trick Google" kind, but the "oops, that was dumb" kind). The odd part is, it looks like Yahoo! and MSN/Live got it right (and there are plenty of links), but Googlebot was being treated differently.

We didn't notice this initially due to multiple problems. First, just switching your user agent to Googlebot in Firefox won't expose the issue. Neither will using search spider emulators like SEO-Browser. You need to actually telnet to port 80 (as Matt Cutts notes in the comments). Second, you will see the page in Yahoo! and MSN (making it feel more like a penalty than a crawl issue). I seriously doubt they'll be banned for this - the intent to spam or deceive isn't there - but it's once again a fascinating detective story about the problems a site can have. Big thanks to Matt and to John for their help.
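For anyone who would rather script the telnet check than type it by hand, here is a rough Python sketch. It opens a plain socket to port 80, sends a hand-written HTTP request with a chosen User-Agent, and reads back the status code. The function names are made up for illustration, and `BROWSER_UA` is a placeholder string, not a real browser's. I'm also assuming a raw request like this is close enough to the telnet session to trigger the same behavior. We don't know exactly what Birdstep's server keyed on, so treat a matching result as a hint, not proof.

```python
import socket

# Googlebot's published desktop user-agent string.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
# Placeholder for comparison; swap in your own browser's string.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"


def build_request(host, path, user_agent):
    """Build the raw bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        f"User-Agent: {user_agent}",
        "Accept: */*",
        "Connection: close",
        "",
        "",  # the two empty entries produce the blank line ending the headers
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status(response):
    """Extract the numeric status code from the first line of a raw response."""
    status_line = response.split(b"\r\n", 1)[0].decode("iso-8859-1")
    parts = status_line.split(" ", 2)
    if len(parts) < 2 or not parts[0].startswith("HTTP/") or not parts[1].isdigit():
        raise ValueError(f"not an HTTP status line: {status_line!r}")
    return int(parts[1])


def fetch_status(host, path, user_agent, port=80, timeout=10.0):
    """Send one raw request and return the status code the server answers with."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path, user_agent))
        response = b""
        # Only the status line is needed, so stop at the first line break.
        while b"\r\n" not in response:
            chunk = sock.recv(4096)
            if not chunk:
                break
            response += chunk
    return parse_status(response)


def compare_user_agents(host, path):
    """Fetch the same URL as Googlebot and as a browser, for a side-by-side look."""
    return {
        "googlebot": fetch_status(host, path, GOOGLEBOT_UA),
        "browser": fetch_status(host, path, BROWSER_UA),
    }
```

Calling `compare_user_agents("www.birdstep.com", "/database/")` would return the two status codes side by side. If the Googlebot request gets a 404 and the browser request gets a 200, that's the symptom described above.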

p.s. Removed the bottom part of the original post due to overwhelming feelings of sheepishness.

p.p.s. Dave Naylor has a tool that can help detect this sort of thing (though it wasn't originally intended for that use).


>>Blog: SEOmoz Daily SEO Blog
>>Publish Date: 6/27/2008 7:02:59 AM