How to write a robots.txt file effectively?

The robots.txt file plays a vital role in SEO. It tells search engine robots which parts of a website they may or may not crawl. The file must be placed in the root path of the site.

When a search engine robot wants to visit a web page (say, www.domain.com/about-me/), it first fetches www.domain.com/robots.txt to check whether there is any instruction about crawling www.domain.com/about-me/.

  • The website doesn’t have a robots.txt file.
  • The file has no instruction covering the page the robot is looking for.

These are the two cases in which search engine robots are free to crawl the page. Every website should have this file so that pages which are not SEO-friendly can be kept away from search engine crawlers.
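The two cases can be tried out with a minimal sketch using Python's standard `urllib.robotparser` module. The helper name `can_crawl` and the `/print/` path are assumptions made up for this illustration, and a missing file is modelled simply as empty content.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the lines of the file; no lines means no rules at all.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


page = "http://www.domain.com/about-me/"

# Case 1: the site has no robots.txt (modelled here as empty content).
print(can_crawl("", "Googlebot", page))

# Case 2: the file exists, but no rule covers this page.
# "/print/" is a hypothetical folder used only for this example.
print(can_crawl("User-agent: *\nDisallow: /print/\n", "Googlebot", page))

# For contrast: a rule that does cover the page blocks the crawl.
print(can_crawl("User-agent: *\nDisallow: /about-me/\n", "Googlebot", page))
```

The first two calls report that crawling is allowed, while the last one reports that it is blocked.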

Which pages should be blocked through robots.txt?

1) Printer-friendly pages

2) Thank-you / success pages

3) Archives

4) Pages which you consider duplicate content

How to write the robots.txt file?

# This does not allow any search engine to crawl any of your web pages.
User-agent: *
Disallow: /

# The following instructs only Google not to crawl the whole website.
User-agent: Googlebot
Disallow: /
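The difference between blocking every robot and blocking only Google can be checked with a small sketch based on Python's standard `urllib.robotparser`. The helper `parse_rules` and the robot name `ExampleBot` are hypothetical and stand in for "any crawler other than Googlebot".

```python
from urllib.robotparser import RobotFileParser


def parse_rules(text: str) -> RobotFileParser:
    """Build a parser from robots.txt content given as a string."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# The two rule sets shown above.
block_everyone = parse_rules("User-agent: *\nDisallow: /\n")
block_google_only = parse_rules("User-agent: Googlebot\nDisallow: /\n")

url = "http://www.domain.com/about-me/"

# "ExampleBot" is a made-up robot name for illustration only.
for bot in ("Googlebot", "ExampleBot"):
    print(
        bot,
        "| blocked by '*' rule:", not block_everyone.can_fetch(bot, url),
        "| blocked by Googlebot rule:", not block_google_only.can_fetch(bot, url),
    )
```

With the `*` record both robots are blocked; with the `Googlebot` record only Googlebot is blocked and the other robot may still crawl.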

# The following instructs only Yahoo not to crawl the images folder.
User-agent: Slurp
Disallow: /images/

Every search engine has its own robot name, so check each engine's documentation for the exact name to use in the User-agent line.

# The following instructs Google not to crawl the “includes” folder and allows Yahoo to crawl the images folder.
User-agent: Googlebot
Disallow: /includes/

User-agent: Slurp
Allow: /images/

# The following instructs only Google not to crawl the “includes” folder, while allowing it to crawl the images folder.
User-agent: Googlebot
Disallow: /includes/
Allow: /images/
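A rough way to verify a combined Disallow/Allow record is to feed it to Python's standard `urllib.robotparser` and ask about a few paths. The file names `header.php` and `logo.png` are assumptions invented for this sketch.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /includes/
Allow: /images/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path: str) -> bool:
    """Ask whether Googlebot may fetch the given path on www.domain.com."""
    return parser.can_fetch("Googlebot", "http://www.domain.com" + path)


# header.php and logo.png are hypothetical file names.
print(allowed("/includes/header.php"))  # inside the disallowed folder
print(allowed("/images/logo.png"))      # inside the explicitly allowed folder
print(allowed("/about-me/"))            # not mentioned, so crawlable by default
```

Only the path under `/includes/` is reported as blocked.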

# The following instructs Google not to crawl any file ending with the “.php” extension.
User-agent: Googlebot
Disallow: /*.php$
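Python's `urllib.robotparser` treats rule paths as plain prefixes, so it is not a good way to try out the `*` and `$` wildcards. The sketch below is a hand-written approximation of how such a pattern could be matched against a URL path; the function names `pattern_to_regex` and `is_blocked` are assumptions for this illustration, not part of any crawler.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern using * and $ into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces, then let each * stand for any characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    # A trailing $ pins the match to the end of the path.
    return re.compile(body + (r"\Z" if anchored else ""))


def is_blocked(pattern: str, path: str) -> bool:
    """Return True if the Disallow pattern covers the given URL path."""
    # match() anchors at the start, mirroring prefix-style rule matching.
    return pattern_to_regex(pattern).match(path) is not None


for path in ("/index.php", "/blog/page.php", "/index.php?id=1", "/index.html"):
    print(path, is_blocked("/*.php$", path))
```

Under this approximation `/index.php` and `/blog/page.php` are blocked, while `/index.php?id=1` and `/index.html` are not, because the `$` requires the URL to end in `.php`.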

# The following instructs all search engines not to crawl any file in a folder. To exempt one file, add an Allow line with that file's path.
User-agent: *
Disallow: /~flare/wings/
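Blocking a folder while leaving one file crawlable can be sketched as follows, again with Python's standard `urllib.robotparser`. The file name `keep.html` and the robot name `ExampleBot` are hypothetical. The Allow line is placed first because Python's parser applies the first rule that matches, and that ordering should also be safe for crawlers that prefer the most specific rule.

```python
from urllib.robotparser import RobotFileParser

# keep.html is a made-up file name used only for this illustration.
ROBOTS_TXT = """\
User-agent: *
Allow: /~flare/wings/keep.html
Disallow: /~flare/wings/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path: str) -> bool:
    """Ask whether a generic robot may fetch the given path."""
    return parser.can_fetch("ExampleBot", "http://www.domain.com" + path)


print(allowed("/~flare/wings/keep.html"))   # the single exempted file
print(allowed("/~flare/wings/other.html"))  # any other file in the folder
print(allowed("/~flare/"))                  # outside the blocked folder
```

Only `other.html` is reported as blocked; the exempted file and everything outside the folder stay crawlable.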

Using combinations of the rules above, you can write your robots.txt file more effectively and take full control of how search engine robots crawl your web pages.


4 Responses to “How to write a robots.txt file effectively?”

  1. George
    October 10, 2008 at 11:17 am #

    hi elan,

    one doubt. how can i write if i just want to block one url

  2. October 15, 2008 at 11:58 am #

    Hi George,

    Yep, you can use the following instruction:

    Disallow: /filename.php

    That’s it.

    But for blocking one particular URL you can also use the robots meta tag.

  3. zack
    November 9, 2008 at 11:33 pm #

    Hey Elen. Thanks for the simple tips!

    Before this I didn’t know that we could block search engines from crawling PHP files.

    By the way, is this command only for Googlebot?

    User-agent: Googlebot
    Disallow: /*.php$

  4. November 10, 2008 at 12:36 am #

    Zack, if you want to block PHP files for Yahoo as well, then go with this:

    User-agent: Slurp
    Disallow: /*.php$

    You can use the same rule for Live as well.
