Daily Kos

View Story | 317 comments

  •  The WhiteHouse.gov Robots.txt file (3+ / 0-)

    Recommended by:
    smintheus, shirah, ArmyWife

    The robots.txt file is how a site owner tells search engine crawlers, such as Google's, which parts of the site they should not crawl. Well-behaved crawlers read it and respect those wishes while indexing the site.

    The Robots.txt file can be found here:
    http://www.whitehouse.gov/...

    It's interesting to note that it lists a great deal of content that the White House does not want Google to search and index.  The reasons why are not clear.

    The same goes for the other search engines: they too are discouraged from indexing this content. The rules sit under the directive "User-agent: *", which means they apply to all search engine robots.

    The Disallow directive tells robots not to index whatever is found at the given location, for example:

    Disallow: 911progress/text
    Disallow: 911remembrance/text
    Disallow: 911response/text
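
    For anyone who wants to check how a crawler would read rules like these, here is a rough sketch using Python's standard urllib.robotparser. The excerpt in ROBOTS_TXT is hypothetical and only modeled on the lines quoted above; the leading slashes are my assumption (robots.txt paths are normally rooted at "/"), and the helper name is_allowed and the example page paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical excerpt modeled on the lines quoted in the comment.
# Assumption: paths are written with a leading "/" as robots.txt normally expects.
ROBOTS_TXT = """\
User-agent: *
Disallow: /911progress/text
Disallow: /911remembrance/text
Disallow: /911response/text
"""


def is_allowed(path, agent="Googlebot"):
    """Return True if `agent` may fetch `path` under the rules above."""
    parser = RobotFileParser()
    # Parse the rules from a string instead of fetching them over the network.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, "http://www.whitehouse.gov" + path)


if __name__ == "__main__":
    # A page under a disallowed prefix, and one outside every listed prefix.
    print(is_allowed("/911progress/text/index.html"))
    print(is_allowed("/news/"))
```

    Because the rules are listed under "User-agent: *", the agent name passed in makes no difference here: any robot gets the same answer.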

    For those who search the WhiteHouse.gov site using the White House web search, try a Google search like this: site:whitehouse.gov "last throes" and substitute your own search word(s) for "last throes".
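
    If you want to build that kind of search as a link, a minimal sketch in Python could look like the following. The function name google_site_query is made up for illustration, and the https://www.google.com/search address with a "q" parameter is my assumption about how Google's search URL is formed, not something taken from the White House site.

```python
from urllib.parse import urlencode


def google_site_query(site, phrase):
    """Build a Google search URL restricted to `site` for an exact `phrase`."""
    # The query text is the same thing you would type into the search box.
    query = 'site:{} "{}"'.format(site, phrase)
    # urlencode percent-escapes the colon and quotes and turns spaces into "+".
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    print(google_site_query("whitehouse.gov", "last throes"))
```

    Swap in your own phrase as the second argument to search for something else.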
