Most of the time we want all search engines to spider our websites and index every page, because that is very important for search engine optimization. But there are times when we don't want search engines to crawl our websites. For example, if the site is still in development, or you have made a duplicate of your original site for testing purposes, you don't want search engines to spider those sites. The reason is that if you allow search engines to crawl these duplicate sites or pages, they may be treated as duplicate content by Google and other search engines. The easiest way to stop this is with a robots.txt file.

This is how to stop all search engines from spidering your site. Just create a plain text file called robots.txt in the root directory of your site and put these lines in it:

User-agent: *
Disallow: /
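If you want to check what a rule set like this blocks before uploading it, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name is_allowed and the example.com URL are made up for illustration, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The "block everything" rules from above.
RULES = """\
User-agent: *
Disallow: /
"""


def is_allowed(rules: str, user_agent: str, url: str) -> bool:
    """Return True if these robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines.
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


# Every URL on the site is disallowed for every crawler.
print(is_allowed(RULES, "Googlebot", "https://example.com/index.html"))  # False
```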

How to tell all crawlers not to enter some directories of a website

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /tmp/
Disallow: /test/
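If the list of directories changes often, the file can be generated instead of edited by hand. This is only a sketch: build_robots_txt is a hypothetical helper, not part of any library, and it assumes every entry is a directory name relative to the site root.

```python
def build_robots_txt(directories, user_agent="*"):
    """Build robots.txt content that disallows each listed directory."""
    lines = [f"User-agent: {user_agent}"]
    for directory in directories:
        name = directory.strip("/")
        if not name:
            # Skip empty entries rather than emit a rule for the whole site.
            continue
        lines.append(f"Disallow: /{name}/")
    return "\n".join(lines) + "\n"


# Prints the same four Disallow rules shown above.
print(build_robots_txt(["cgi-bin", "images", "tmp", "test"]), end="")
```

Save the output as robots.txt in the root directory of the site.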

How to tell a specific crawler not to enter one specific directory

User-agent: Googlebot
Disallow: /DirectoryName/
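A rule written for one crawler does not affect the others. The sketch below shows this with Python's standard urllib.robotparser. The second crawler name, Bingbot, and the page paths are just examples chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# The Googlebot-only rules from above.
RULES = """\
User-agent: Googlebot
Disallow: /DirectoryName/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Googlebot is kept out of the directory...
print(parser.can_fetch("Googlebot", "/DirectoryName/page.html"))  # False
# ...but may still fetch everything else.
print(parser.can_fetch("Googlebot", "/other/page.html"))  # True
# A crawler with no matching User-agent entry is not restricted at all.
print(parser.can_fetch("Bingbot", "/DirectoryName/page.html"))  # True
```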

How to tell all crawlers not to fetch one specific file

User-agent: *
Disallow: /directory/file.html

I think that's enough to give you a pretty good idea of how to use a robots.txt file. It can be very useful at times. In particular, for keeping crawlers out of whole sections of a site, a robots.txt file is much preferred over 'nofollow' links.