If everything is coming your way, you’re in the wrong lane.

Robot Control

November 7, 2003 - 11:13pm

Lazy Web, I invoke thee…

Google is sending searchers to the RSS feeds of my pages rather than to the pages themselves. As far as I can tell, robots.txt only matches on the start of the path, so it cannot single out a file extension or, in my case, a query argument. There also appear to be no HTTP headers that control crawling (the caching headers are not the right tool for this).

Is there any way to prevent Google from going to any URL ending in ?rss on the site?

Answer: It was in Google’s FAQ, of all places. Grumble.

==

12. How do I tell Googlebot not to crawl dynamically generated pages on my site?
The following robots.txt file will achieve this.
User-agent: Googlebot
Disallow: /*?

==
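To see why that rule catches the ?rss URLs, here is a rough Python sketch of how a wildcard Disallow pattern could be matched against a path. It is only an illustration, not Googlebot's actual code: the helper names `rule_to_regex` and `is_blocked` and the example paths are made up, and the narrower `/*?rss$` rule is my own assumption, relying on Googlebot treating a trailing `$` as "end of URL".

```python
import re
from typing import Iterable


def rule_to_regex(rule: str) -> "re.Pattern[str]":
    """Translate a wildcard Disallow pattern into a compiled regex.

    '*' stands for any run of characters, and a trailing '$' anchors the
    pattern to the end of the URL. Everything else is taken literally and
    matched from the start of the path.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces, then rejoin them with ".*" for each '*'.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path: str, rules: Iterable[str]) -> bool:
    """Return True if any Disallow pattern matches the given path."""
    return any(rule_to_regex(rule).match(path) for rule in rules)


# Hypothetical paths, for illustration only.
faq_rule = ["/*?"]          # the rule quoted from the FAQ
narrow_rule = ["/*?rss$"]   # assumed narrower variant: only URLs ending in ?rss

print(is_blocked("/archives/robot-control?rss", faq_rule))
print(is_blocked("/archives/robot-control", faq_rule))
print(is_blocked("/archives/robot-control?page=2", narrow_rule))
```

Under this sketch the FAQ rule blocks every URL that contains a query string and leaves the plain page alone, while the narrower variant would block only the ?rss URLs. Either way, the rule sits under `User-agent: Googlebot`, so it only speaks to Google's crawler.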

“Self-denial is the test and definition of self-government.” — “The Field of Blood” Alarms and Discursions – G. K. Chesterton