Tuesday, January 10, 2012

Stop search engines from crawling a web site

SEO (Search Engine Optimization) aims to get a web site indexed by search engines (Google, Yahoo, Bing, Baidu, etc.). Sometimes, however, a webmaster wants to keep search engines from indexing a web site for various reasons (security, privacy, policy, etc.). There are usually three ways to do this:
  1. Create a robots.txt file and put it in the web site's root folder
  2. Protect sensitive content with a password
  3. Add an X-Robots-Tag to the HTTP response headers
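The X-Robots-Tag in item 3 is sent by the server instead of being written into the page, so it also covers non-HTML files such as PDFs. Below is a minimal sketch, assuming the site is a Python WSGI application; the names `add_x_robots_tag` and `hello_app` and the default value `noindex, nofollow` are hypothetical choices for illustration, not part of any framework:

```python
from typing import Callable, Iterable, List, Optional, Tuple

Headers = List[Tuple[str, str]]


def add_x_robots_tag(app: Callable, value: str = "noindex, nofollow") -> Callable:
    """Hypothetical WSGI middleware: add an X-Robots-Tag header to every response."""

    def wrapped(environ: dict, start_response: Callable) -> Iterable[bytes]:
        def start_response_with_tag(status: str, headers: Headers,
                                    exc_info: Optional[tuple] = None):
            # Drop any X-Robots-Tag the app already set, then append ours.
            headers = [(k, v) for k, v in headers if k.lower() != "x-robots-tag"]
            headers.append(("X-Robots-Tag", value))
            return start_response(status, headers, exc_info)

        return app(environ, start_response_with_tag)

    return wrapped


def hello_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Tiny example app standing in for a real web site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"private page\n"]


# Wrap the app so every response carries the header.
application = add_x_robots_tag(hello_app)
```

To try it, serve `application` with `wsgiref.simple_server.make_server` and inspect the response headers with `curl -I`. Doing this in middleware means individual pages do not need to remember to set the header.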
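Page-by-page control is also possible with robots meta tags in each page's <head>:
  1. <meta name="robots" content="nofollow" /> (do not follow the links on this page)
  2. <meta name="robots" content="noindex" /> (do not index this page)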
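Both directives can also be combined in one tag as content="noindex, nofollow". To check which robots directives a page actually declares, here is a minimal sketch using Python's standard `html.parser`; the class `RobotsMetaParser` and function `robots_directives` are hypothetical helpers, not a standard API:

```python
from html.parser import HTMLParser
from typing import Set


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            # content may hold several comma-separated directives.
            for directive in attributes.get("content", "").split(","):
                directive = directive.strip().lower()
                if directive:
                    self.directives.add(directive)


def robots_directives(html: str) -> Set[str]:
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = """<html><head>
<meta name="robots" content="noindex" />
<meta name="robots" content="nofollow" />
</head><body>Private</body></html>"""

print(sorted(robots_directives(page)))  # ['nofollow', 'noindex']
```

Self-closing tags such as <meta ... /> still reach `handle_starttag`, because `HTMLParser`'s default `handle_startendtag` forwards to it.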
This web page has a detailed explanation:
http://antezeta.com/news/avoid-search-engine-indexing

Here is an example robots.txt that disallows all search engines from crawling your website (note that some bad bots may ignore robots.txt):
# Disallow all search engines from crawling your website
# Put this file in the root directory of your website
User-agent: *
Disallow: /
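To confirm that these rules block every crawler, they can be fed to Python's standard `urllib.robotparser`. This is a sketch; the URLs under `example.com` and the list of user-agent names are only illustrative:

```python
from urllib.robotparser import RobotFileParser

# The same rules as the robots.txt example above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("Googlebot", "Bingbot", "Baiduspider"):
    for url in ("https://example.com/", "https://example.com/private/page.html"):
        # can_fetch() returns False when the rules block this agent from this URL.
        print(agent, url, parser.can_fetch(agent, url))
```

Every line prints False: `User-agent: *` applies to any crawler without its own section, and `Disallow: /` matches every path. This only shows what the file requests; a badly behaved bot can still ignore it.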
