The robots.txt file plays a vital role in SEO. It tells search engine robots whether a website or web page may be crawled. The file must be placed in the root path of the site.
When a search engine robot wants to visit a web page (say, www.domain.com/about-me/), it first requests www.domain.com/robots.txt to check whether there is any instruction about crawling www.domain.com/about-me/.

- The website doesn’t have a robots.txt file.
- The file has no instruction about the page the robot is looking for.
These are the two cases in which search engine robots are free to crawl the web page. Every website should have this file to keep pages that are not SEO-friendly out of search engines' sight.
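The two cases can be checked with Python's standard urllib.robotparser module. This is only a minimal sketch, not how any particular search engine implements the check. The /print/ folder is a hypothetical example, and an empty rule list stands in for a missing robots.txt file.

```python
from urllib.robotparser import RobotFileParser

def can_crawl(robots_lines, user_agent, url):
    """Return True if the given robots.txt lines let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)

PAGE = "http://www.domain.com/about-me/"

# Case 1: the site has no robots.txt (simulated with an empty rule list).
no_file = can_crawl([], "Googlebot", PAGE)

# Case 2: a robots.txt exists but says nothing about the page.
# The /print/ folder is a hypothetical example.
rules = ["User-agent: *", "Disallow: /print/"]
no_rule_for_page = can_crawl(rules, "Googlebot", PAGE)

# For contrast: a page inside the disallowed folder.
blocked_page = can_crawl(
    rules, "Googlebot", "http://www.domain.com/print/about-me.html"
)

print(no_file, no_rule_for_page, blocked_page)
```

In both cases the page is reported as crawlable, while the page under the disallowed folder is not.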
Which pages should be blocked through robots.txt?
1) Printer friendly pages
2) Thank-you / success pages
3) Archives
4) Pages which you consider duplicate content.
How to write the robots.txt file?
# This blocks all search engines from crawling any of your web pages.
User-agent: *
Disallow: /
# The following tells only Google not to crawl the whole website.
User-agent: Googlebot
Disallow: /
# The following tells only Yahoo not to crawl the images folder.
User-agent: yahoo-slurp
Disallow: /images/
Every search engine has its own robot name. Click here to get the list of search engine robot names.
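Because each group of rules applies only to the robot named in its User-agent line, it can help to test a file against several robot names before publishing it. The sketch below uses Python's standard urllib.robotparser module. Putting the two examples into one file, the robot name SomeOtherBot, and the file logo.png are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Two of the groups shown above, combined into one file for this test.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: yahoo-slurp
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(agent, path):
    """Return True if the robot named agent may fetch path on the site."""
    return parser.can_fetch(agent, "http://www.domain.com" + path)

google_home = allowed("Googlebot", "/")                # whole site blocked
yahoo_images = allowed("yahoo-slurp", "/images/logo.png")
yahoo_about = allowed("yahoo-slurp", "/about-me/")
# SomeOtherBot is a hypothetical robot that no group mentions.
other_images = allowed("SomeOtherBot", "/images/logo.png")

print(google_home, yahoo_images, yahoo_about, other_images)
```

A robot that matches no group and finds no `User-agent: *` group is not restricted at all.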
# The following lets Yahoo crawl the images folder and blocks Google from the “includes” folder.
User-agent: Googlebot
Disallow: /includes/
User-agent: yahoo-slurp
Allow: /images/
# The following tells only Google not to crawl the “includes” folder and to crawl the images folder.
User-agent: Googlebot
Disallow: /includes/
Allow: /images/
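Anything that is not disallowed is crawlable by default, so an Allow line only changes the outcome when it overlaps a Disallow rule. The sketch below, using Python's standard urllib.robotparser module, compares the Googlebot group with and without its Allow line. The sample file names are hypothetical.

```python
from urllib.robotparser import RobotFileParser

def build(lines):
    """Parse robots.txt lines into a ready-to-query parser."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser

with_allow = build([
    "User-agent: Googlebot",
    "Disallow: /includes/",
    "Allow: /images/",
])
without_allow = build([
    "User-agent: Googlebot",
    "Disallow: /includes/",
])

# Hypothetical sample paths on the site.
paths = ["/includes/header.php", "/images/photo.jpg", "/about-me/"]

def verdicts(parser):
    """Return the crawl verdict for each sample path, in order."""
    return [
        parser.can_fetch("Googlebot", "http://www.domain.com" + p)
        for p in paths
    ]

results_with = verdicts(with_allow)
results_without = verdicts(without_allow)
print(results_with, results_without)
```

Both rule sets give the same verdicts here, which suggests the explicit Allow line mainly documents intent in this example.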
# The following tells Google not to crawl any file ending with the “.php” extension.
User-agent: Googlebot
Disallow: /*.php$
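The `*` and `$` characters are pattern extensions: `*` stands for any run of characters and `$` anchors the end of the URL. Python's urllib.robotparser compares paths as plain prefixes, so the sketch below translates a pattern into a regular expression instead. It is an illustration of the matching idea only, not the code any search engine runs, and the helper names and sample paths are hypothetical.

```python
import re

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern with * and $ into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal text and let each * match any run of characters.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.compile(regex)

def is_blocked(pattern, path):
    """Return True if path matches the Disallow pattern from its start."""
    return pattern_to_regex(pattern).match(path) is not None

RULE = "/*.php$"

# Hypothetical sample paths.
php_page = is_blocked(RULE, "/index.php")
nested_php = is_blocked(RULE, "/blog/page.php")
with_query = is_blocked(RULE, "/index.php?id=1")   # does not end in .php
html_page = is_blocked(RULE, "/index.html")

print(php_page, nested_php, with_query, html_page)
```

Note that because of the trailing `$`, a URL such as /index.php?id=1 is not covered by this rule.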
# The following tells all search engines not to crawl any file in a folder. To exempt one file, add an Allow line for that file's path to the same group.
User-agent: *
Disallow: /~flare/wings/
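To keep a single file crawlable inside a blocked folder, an Allow line for that file can be placed in the same group. The sketch below checks the idea with Python's standard urllib.robotparser module. The file name keep.html is hypothetical, and the Allow line is placed before the Disallow line on purpose: this parser applies the first matching rule, while parsers that pick the longest matching rule reach the same result.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_LINES = [
    "User-agent: *",
    "Allow: /~flare/wings/keep.html",   # hypothetical file to keep crawlable
    "Disallow: /~flare/wings/",
]

parser = RobotFileParser()
parser.parse(ROBOTS_LINES)

def allowed(path):
    """Return True if any robot may fetch path under these rules."""
    return parser.can_fetch("Googlebot", "http://www.domain.com" + path)

kept_file = allowed("/~flare/wings/keep.html")
other_file = allowed("/~flare/wings/other.html")   # hypothetical sibling file
outside_folder = allowed("/~flare/index.html")     # hypothetical file outside

print(kept_file, other_file, outside_folder)
```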
Using combinations of the rules above, you can write your robots.txt file more effectively and take full control of how search engine robots crawl your web pages.

Hi Elan,
One doubt: how can I write it if I just want to block one URL?
Hi George,
Yep, you can use the instruction below:
Disallow: /filename.php
That's it.
But for blocking one particular URL you can also use the meta robots tag.
Hey Elen. Thanks for the simple tips!
Before this I didn’t know that we could block search engines from crawling PHP files.
By the way, is this command only for Googlebot?
User-agent: Googlebot
Disallow: /*.php$
Zack, if you want to block PHP files from Yahoo as well, go with this:
User-agent: yahoo-slurp
Disallow: /*.php$
You can use the same rule for Live also.