View Story | 317 comments
The robots.txt file is the one that search engine crawlers like Google's read in order to respect the wishes of the site owner while indexing the site.
The Robots.txt file can be found here: http://www.whitehouse.gov/...
It's interesting to note that it lists a great deal of content that the White House does not want Google to search and index. The reasons why are not clear.
The other search engines are discouraged from indexing this content as well. That comes from the directive "User-agent: *", which addresses all search engine robots.
The Disallow directive tells robots not to crawl, and so not to index, whatever is found at the given location, like:
Disallow: 911progress/text
Disallow: 911remembrance/text
Disallow: 911response/text
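For anyone who wants to check how a crawler would read rules like these, here is a rough sketch using Python's standard urllib.robotparser module. It is only an illustration, not the actual White House file: the rules are typed in by hand, I have assumed a leading "/" on each path (the robots.txt convention expects paths to start with one, and the lines quoted above omit it), and "/index.html" is a made-up path used only for comparison.

```python
from urllib.robotparser import RobotFileParser

# Hand-typed sample rules modeled on the lines quoted above.
# ASSUMPTION: the leading "/" on each path is added for this sketch.
ROBOTS_TXT = """\
User-agent: *
Disallow: /911progress/text
Disallow: /911remembrance/text
Disallow: /911response/text
"""

BASE = "http://www.whitehouse.gov"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "User-agent: *" applies to every robot, so the robot name does not matter here.
# "/index.html" is a hypothetical path that no rule mentions.
for path in ("/911progress/text", "/index.html"):
    allowed = parser.can_fetch("Googlebot", BASE + path)
    print(path, "allowed" if allowed else "disallowed")
```

Because the rules sit under "User-agent: *", asking as any other robot name should give the same answers.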
For those searching WhiteHouse.gov with the White House's own web search, try a Google search instead, like this: site:whitehouse.gov "last throes", substituting your own search word(s) for "last throes".
For the latest on John McCain: John "100-years" McCain
by wmholt on Mon Feb 26, 2007 at 06:08:42 PM PDT