How to Use the Disallow and Allow Directives in Robots.txt?
SEO Help and Tips
Here are the Disallow and Allow directives used in robots.txt:
Disallow:
------------
User-agent:*
Disallow: /
# prohibits crawling for the entire site
User-agent:*
Disallow: /catalogue
# prohibits crawling the pages that start with /catalogue
User-agent:*
Disallow: /page?
# Prohibits crawling the pages with a URL that contains parameters
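If you want to sanity-check prefix rules like these, Python's built-in urllib.robotparser can do it. It only performs plain prefix matching (it does not understand the '*' and '$' wildcards shown later), and the example.com URLs below are placeholders, so treat this as a rough sketch:

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse("""
User-agent: *
Disallow: /catalogue
""".splitlines())

# pages whose path starts with /catalogue are blocked, everything else is crawlable
print(rp.can_fetch("*", "https://example.com/catalogue/phones"))  # False
print(rp.can_fetch("*", "https://example.com/about"))             # True
# with 'Disallow: /' instead, every URL would return False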
Allow:
------------
User-agent:*
Allow: /cgi-bin
Disallow: /
# prohibits downloading anything except pages
# starting with '/cgi-bin'
User-agent:*
Allow: /file.xml
# allows downloading the file.xml file
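The standard-library parser also understands Allow lines, so the '/cgi-bin' example can be checked the same way. It applies rules in file order, which happens to agree with the intended result here because the Allow line comes first; again a sketch with made-up URLs:

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse("""
User-agent: *
Allow: /cgi-bin
Disallow: /
""".splitlines())

# only paths under /cgi-bin survive the blanket Disallow: /
print(rp.can_fetch("*", "https://example.com/cgi-bin/search"))  # True
print(rp.can_fetch("*", "https://example.com/index.html"))      # False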
Common:
------------
User-agent:*
Allow: /archive
Disallow: /
# allows everything that starts with '/archive'; everything else is prohibited
User-agent:*
Allow: /obsolete/private/*.html$ # allows HTML files
# in the '/obsolete/private/...' path
Disallow: /*.php$ # prohibits all '*.php' on the website
Disallow: /*/private/ # prohibits all subpaths containing
# '/private/', but the Allow above negates
# a part of this prohibition
Disallow: /*/old/*.zip$ # prohibits all '*.zip' files contained
# in the path '/old/'
User-agent:*
Disallow: /add.php?*user=
# prohibits all 'add.php?' scripts with the 'user' option
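urllib.robotparser does not support the '*' and '$' wildcards used in this section, so here is a minimal hand-rolled check that translates a rule into a regular expression ('*' = any run of characters, '$' = end of the URL). The function name and the sample paths are my own, purely for illustration:

import re

def robots_pattern_matches(rule_path, url_path):
    # build a regex: '*' matches anything, '$' anchors the end, the rest is literal;
    # re.match already anchors the comparison at the start of the URL path
    regex = ""
    for ch in rule_path:
        if ch == "*":
            regex += ".*"
        elif ch == "$":
            regex += "$"
        else:
            regex += re.escape(ch)
    return re.match(regex, url_path) is not None

# '/*.php$' only matches URLs that end in .php
print(robots_pattern_matches("/*.php$", "/shop/cart.php"))         # True
print(robots_pattern_matches("/*.php$", "/shop/cart.php?step=2"))  # False

# '/obsolete/private/*.html$' matches HTML files in that path
print(robots_pattern_matches("/obsolete/private/*.html$",
                             "/obsolete/private/old.html"))        # True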
Combining directives:
---------------------------
# Source robots.txt:
User-agent:*
Allow: /
Allow: /catalog/auto
Disallow: /catalog
# Sorted robots.txt:
User-agent:*
Allow: /
Disallow: /catalog
Allow: /catalog/auto
# prohibits downloading pages starting with '/catalog',
# but allows downloading pages starting with '/catalog/auto'.
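The sorting rule described here (the longest matching directive wins, and Allow beats Disallow when the lengths are equal) can be sketched in a few lines. The helper below only handles plain prefixes, which is enough for the catalog example, and the sample paths are made up:

def is_allowed(rules, url_path):
    # rules: (directive, path) pairs from one User-agent group
    matching = [(len(path), directive == "Allow")
                for directive, path in rules
                if url_path.startswith(path)]
    if not matching:
        return True              # no rule applies, so crawling is allowed
    matching.sort()              # longest path ends up last; Allow wins ties
    return matching[-1][1]

rules = [("Allow", "/"),
         ("Allow", "/catalog/auto"),
         ("Disallow", "/catalog")]

print(is_allowed(rules, "/catalog/phones"))    # False: '/catalog' is the longest match
print(is_allowed(rules, "/catalog/auto/bmw"))  # True: '/catalog/auto' is longer still
print(is_allowed(rules, "/about"))             # True: only 'Allow: /' matches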
Allow and Disallow directives without parameters:
----------------------------------------------------------------
User-agent:*
Disallow: # same as Allow: /
User-agent:*
Allow: # isn't taken into account by the robot
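A quick check of the empty-parameter case with the standard parser (made-up URL):

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse("""
User-agent: *
Disallow:
""".splitlines())

# an empty Disallow forbids nothing, exactly like 'Allow: /'
print(rp.can_fetch("*", "https://example.com/any/page"))  # True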
Using the special characters:
-------------------------------------
User-agent:*
Disallow: /cgi-bin/*.aspx # prohibits '/cgi-bin/example.aspx'
# and '/cgi-bin/private/test.aspx'
Disallow: /*private # prohibits both '/private'
# and '/cgi-bin/private'
By default, the * character is appended to the end of every rule described in the robots.txt file.
Example:
User-agent:*
Disallow: /cgi-bin* # blocks access to pages
# starting with '/cgi-bin'
Disallow: /cgi-bin # the same
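Because that '*' is implied, '/cgi-bin*' and '/cgi-bin' describe exactly the same set of URLs. A small check with the same kind of regex translation used earlier (sample paths are my own):

import re

def robots_pattern_matches(rule_path, url_path):
    # '*' -> any characters, '$' -> end of URL, everything else literal
    regex = "".join(".*" if ch == "*" else "$" if ch == "$" else re.escape(ch)
                    for ch in rule_path)
    return re.match(regex, url_path) is not None

for path in ("/cgi-bin", "/cgi-bin/stats", "/cgi-binary", "/home"):
    same = robots_pattern_matches("/cgi-bin*", path) == robots_pattern_matches("/cgi-bin", path)
    print(path, same)   # prints True for every path: the two rules are equivalent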
Examples of how directives are interpreted:
--------------------------------------------------------
User-agent:*
Allow: /
Disallow: /
# everything is allowed
User-agent:*
Allow: /$
Disallow: /
# everything is prohibited except the main page
User-agent:*
Disallow: /private*html
# prohibits '/private*html',
# '/private/test.html', '/private/html/test.aspx', etc.
User-agent:*
Disallow: /private$
# prohibits only '/private'
User-agent: *
Disallow: /
User-agent: ExampleBot
Allow: /
# a robot that has its own group (here 'ExampleBot', an illustrative name)
# selects the entries that include 'User-agent: ExampleBot',
# so everything is allowed for it despite the 'User-agent: *' rules
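Group selection is also something urllib.robotparser gets right: a crawler that finds a group addressed to its own name ignores the generic 'User-agent: *' group. 'ExampleBot' is again just an illustrative name:

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse("""
User-agent: *
Disallow: /

User-agent: ExampleBot
Allow: /
""".splitlines())

# ExampleBot uses its own group; every other robot falls back to 'User-agent: *'
print(rp.can_fetch("ExampleBot", "https://example.com/page"))  # True
print(rp.can_fetch("OtherBot", "https://example.com/page"))    # False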
Sitemap:
------------
User-agent:*
Allow: /
sitemap: https://example.com/site_structure/my_sitemaps1.xml
sitemap: https://example.com/site_structure/my_sitemaps2.xml
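Sitemap lines sit outside the Allow/Disallow groups, and on Python 3.8+ the standard parser exposes them through site_maps(). A sketch using the sample URLs above:

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse("""
User-agent: *
Allow: /
sitemap: https://example.com/site_structure/my_sitemaps1.xml
sitemap: https://example.com/site_structure/my_sitemaps2.xml
""".splitlines())

print(rp.site_maps())
# ['https://example.com/site_structure/my_sitemaps1.xml',
#  'https://example.com/site_structure/my_sitemaps2.xml']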
Clean-param:
------------
User-agent:*
Disallow:
Clean-param: ref /some_dir/get_book.pl
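Clean-param is a Yandex-specific directive; other parsers (including urllib.robotparser) simply ignore it. What it asks the crawler to do is treat URLs that differ only in the listed parameter as one page. A rough sketch of that normalisation for the get_book.pl rule, with query strings I made up for illustration:

from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

def clean_param(url, param, path_prefix):
    # drop 'param' from the query string when the path starts with path_prefix;
    # the result is the canonical URL the crawler should keep
    parts = urlparse(url)
    if not parts.path.startswith(path_prefix):
        return url
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != param]
    return urlunparse(parts._replace(query=urlencode(query)))

# Clean-param: ref /some_dir/get_book.pl
print(clean_param("http://www.example.com/some_dir/get_book.pl?ref=site_1&book_id=123",
                  "ref", "/some_dir/get_book.pl"))
print(clean_param("http://www.example.com/some_dir/get_book.pl?ref=site_2&book_id=123",
                  "ref", "/some_dir/get_book.pl"))
# both print .../get_book.pl?book_id=123 - the same canonical URL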
Additional examples:
-----------------------------
#for showthread.php URLs that differ only in the 's' session parameter,
#robots.txt will contain:
User-agent:*
Disallow:
Clean-param: s /forum/showthread.php
#for URLs like:
www.example2.com/index.php?page=1&sid=2564126ebdec301c607e5df
www.example2.com/index.php?page=1&sid=974017dcd170d6c4a5d76ae
#robots.txt will contain:
User-agent:*
Disallow:
Clean-param: sid /index.php
#if there are multiple parameters like this:
www.example1.com/forum_old/showthread.php?s=681498605&t=8243&ref=1311
www.example1.com/forum_new/showthread.php?s=1e71c417a&t=8243&ref=9896
#robots.txt will contain:
User-agent:*
Disallow:
Clean-param: s&ref /forum*/showthread.php
#if the parameter is used in multiple scripts:
www.example1.com/forum/showthread.php?s=681498b9648949605&t=8243
www.example1.com/forum/index.php?s=1e71c4427317a117a&t=8243
#robots.txt will contain:
User-agent:*
Disallow:
Clean-param: s /forum/index.php
Clean-param: s /forum/showthread.php
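As the forum examples show, a Clean-param line can list several parameters joined with '&', and its path part may use '*'. One way such a line could be taken apart and applied, again only a sketch (the helper name is mine; the URLs are the forum samples from above):

import re
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

def apply_clean_param(rule, url):
    # a rule looks like 's&ref /forum*/showthread.php':
    # parameter names separated by '&', then a path pattern where '*' means anything
    params_part, path_pattern = rule.split()
    params = set(params_part.split("&"))
    path_regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in path_pattern)
    parts = urlparse(url)
    if not re.match(path_regex, parts.path):
        return url
    query = [(k, v) for k, v in parse_qsl(parts.query) if k not in params]
    return urlunparse(parts._replace(query=urlencode(query)))

print(apply_clean_param("s&ref /forum*/showthread.php",
      "http://www.example1.com/forum_old/showthread.php?s=681498605&t=8243&ref=1311"))
print(apply_clean_param("s&ref /forum*/showthread.php",
      "http://www.example1.com/forum_new/showthread.php?s=1e71c417a&t=8243&ref=9896"))
# the 's' and 'ref' parameters are stripped, leaving ?t=8243 in both cases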