All Robots.txt
Here are the robots.txt Allow and Disallow directives for a website, with code examples and details:
Allow:
------------
User-agent:*
Allow: /cgi-bin
Disallow: /
# prohibits downloading anything except pages
# starting with '/cgi-bin'
User-agent:*
Allow: /file.xml
# allows downloading the file.xml file
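If you want to check how a crawler that follows the documented "longest matching rule wins" behavior would read the first example above, a minimal Python sketch of that rule might look like this. It is only an illustration under that assumption: it handles plain path prefixes and ignores the * and $ wildcards covered later.

# Minimal sketch: the longest matching prefix decides; Allow wins on a tie.
rules = [
    ("allow", "/cgi-bin"),
    ("disallow", "/"),
]

def is_allowed(path):
    best = None  # (prefix_length, is_allow)
    for kind, prefix in rules:
        if path.startswith(prefix):
            candidate = (len(prefix), kind == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]

print(is_allowed("/cgi-bin/search.cgi"))  # True  - the Allow rule is longer
print(is_allowed("/index.html"))          # False - only 'Disallow: /' matches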
Allow and Disallow directives without parameters:
----------------------------------------------------------------
User-agent:*
Disallow: # same as Allow: /
User-agent:*
Allow: # isn't taken into account by the robot
Disallow:
------------
User-agent:*
Disallow: /
# prohibits crawling for the entire site
User-agent:*
Disallow: /catalogue
# prohibits crawling the pages that start with /catalogue
User-agent:*
Disallow: /page?
# prohibits crawling pages whose URL contains parameters
Common examples:
------------
User-agent:*
Allow: /archive
Disallow: /
# allows everything that starts with '/archive'; everything else is prohibited
User-agent:*
Allow: /obsolete/private/*.html$ # allows HTML files
# in the '/obsolete/private/...' path
Disallow: /*.php$ # prohibits all '*.php' on the website
Disallow: /*/private/ # prohibits all subpaths containing '/private/',
# but the Allow above negates a part of this prohibition
Disallow: /*/old/*.zip$ # prohibits all '*.zip' files
# whose path contains '/old/'
User-agent:*
Disallow: /add.php?*user=
# prohibits all 'add.php?' scripts with the 'user' option
Combining directives:
---------------------------
# Source robots.txt:
User-agent:*
Allow: /
Allow: /catalog/auto
Disallow: /catalog
# Sorted robots.txt:
User-agent:*
Allow: /
Disallow: /catalog
Allow: /catalog/auto
# prohibits downloading pages starting with '/catalog',
# but allows downloading pages starting with '/catalog/auto'.
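To make the sorting step concrete, here is a small Python sketch (an illustration only, not a full robots.txt parser) that orders the directives by prefix length and lets the longest matching prefix decide, exactly as the sorted file above suggests.

# Sort directives from shortest to longest prefix, as in the sorted robots.txt.
directives = [
    ("Allow", "/"),
    ("Allow", "/catalog/auto"),
    ("Disallow", "/catalog"),
]
sorted_directives = sorted(directives, key=lambda d: len(d[1]))
print(sorted_directives)
# [('Allow', '/'), ('Disallow', '/catalog'), ('Allow', '/catalog/auto')]

def decision(path):
    verdict = "Allow"  # a page that matches nothing is allowed
    for kind, prefix in sorted_directives:
        if path.startswith(prefix):
            verdict = kind  # a longer matching prefix overrides a shorter one
    return verdict

print(decision("/catalog/tools"))     # Disallow
print(decision("/catalog/auto/bmw"))  # Allow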
Using the special characters:
-------------------------------------
User-agent:*
Disallow: /cgi-bin/*.aspx # prohibits '/cgi-bin/example.aspx'
# and '/cgi-bin/private/test.aspx'
Disallow: /*private # prohibits both '/private'
# and '/cgi-bin/private'
By default, the * character is appended to the end of every rule described in the robots.txt file. Example:
User-agent:*
Disallow: /cgi-bin* # blocks access to pages
# starting with '/cgi-bin'
Disallow: /cgi-bin # the same
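A quick way to see how the * and $ characters behave, including the implicit * appended to every rule, is to translate a pattern into a regular expression. The following Python sketch does that; it is a simplification that assumes patterns are matched against the URL path only.

import re

def rule_to_regex(pattern):
    """Translate a robots.txt path pattern into a regular expression.

    '*' matches any sequence of characters, '$' at the end anchors the
    match, and an implicit '*' is assumed at the end of every other rule.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ".*"))

print(bool(rule_to_regex("/cgi-bin/*.aspx").match("/cgi-bin/private/test.aspx")))  # True
print(bool(rule_to_regex("/*private").match("/cgi-bin/private")))                  # True
print(bool(rule_to_regex("/cgi-bin").match("/cgi-bin/test.html")))                 # True (implicit '*')
print(bool(rule_to_regex("/private$").match("/private/test.html")))                # False ('$' anchors)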
Examples of how directives are interpreted:
--------------------------------------------------------
User-agent:*
Allow: /
Disallow: /
# everything is allowed
User-agent:*
Allow: /$
Disallow: /
# everything is prohibited except the main page
User-agent:*
Disallow: /private*html
# prohibits '/private*html',
# '/private/test.html', '/private/html/test.aspx', etc.
User-agent:*
Disallow: /private$
# prohibits only '/private'
User-agent: *
Disallow: /
User-agent:*
Allow: /
# since the robot combines all the entries for 'User-agent: *'
# and Allow wins over Disallow when the rules are equal,
# everything is allowed
User-agent:*
Allow: /
Sitemap: https://example.com/site_structure/my_sitemaps1.xml
Sitemap: https://example.com/site_structure/my_sitemaps2.xml
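If you need to read these Sitemap entries programmatically, Python's standard urllib.robotparser can return them (the site_maps() method requires Python 3.8 or newer); a small sketch:

from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Allow: /
Sitemap: https://example.com/site_structure/my_sitemaps1.xml
Sitemap: https://example.com/site_structure/my_sitemaps2.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# site_maps() lists every Sitemap URL found in the file.
print(parser.site_maps())
# ['https://example.com/site_structure/my_sitemaps1.xml',
#  'https://example.com/site_structure/my_sitemaps2.xml']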
User-agent:*
Disallow:
Clean-param: ref /some_dir/get_book.pl
Additional examples:
-----------------------------
#robots.txt will contain:
User-agent:*
Disallow:
Clean-param: s /forum/showthread.php
#for URLs like:
www.example2.com/index.php?page=1&sid=2564126ebdec301c607e5df
www.example2.com/index.php?page=1&sid=974017dcd170d6c4a5d76ae
#robots.txt will contain:
User-agent:*
Disallow:
Clean-param: sid /index.php
#if there are multiple parameters like this:
www.example1.com/forum_old/showthread.php?s=681498605&t=8243&ref=1311
www.example1.com/forum_new/showthread.php?s=1e71c417a&t=8243&ref=9896
#robots.txt will contain:
User-agent:*
Disallow:
Clean-param: s&ref /forum*/showthread.php
#if the parameter is used in multiple scripts:
www.example1.com/forum/showthread.php?s=681498b9648949605&t=8243
www.example1.com/forum/index.php?s=1e71c4427317a117a&t=8243
#robots.txt will contain:
User-agent:*
Disallow:
Clean-param: s /forum/index.php
Clean-param: s /forum/showthread.php
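Clean-param is a Yandex-specific directive, so most other crawlers ignore it. To see the effect it describes, the Python sketch below strips the listed parameters from a URL with urllib.parse, which is roughly how the duplicate forum addresses above collapse into one canonical URL. The parameter names and paths come from the s&ref example; the http:// scheme is added only so the URLs parse correctly.

from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def apply_clean_param(url, params):
    """Drop the given query parameters, as Clean-param tells the robot to do."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in params]
    return urlunsplit(parts._replace(query=urlencode(kept)))

# Clean-param: s&ref /forum*/showthread.php
for url in [
    "http://www.example1.com/forum_old/showthread.php?s=681498605&t=8243&ref=1311",
    "http://www.example1.com/forum_new/showthread.php?s=1e71c417a&t=8243&ref=9896",
]:
    print(apply_clean_param(url, {"s", "ref"}))
# http://www.example1.com/forum_old/showthread.php?t=8243
# http://www.example1.com/forum_new/showthread.php?t=8243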
User-agent: *
This line specifies the user agent to which the subsequent directives apply. In this case, the asterisk (*) is a wildcard character that means "all user agents." So, the directives that follow will apply to all web robots or crawlers.
Disallow: /search
This line indicates that the "/search" directory or folder should not be crawled by web robots. The "/search" path typically serves the search result pages generated by the website's search functionality; by disallowing it, you are instructing search engine crawlers not to crawl those search result pages.
Allow: /
This line allows web robots to crawl and index the rest of the website, excluding the "/search" directory. The forward slash ("/") represents the root of the website, so allowing it permits web robots to crawl every other part of your site.
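You can double-check this behavior with Python's standard urllib.robotparser by feeding it the same three directives; in the sketch below, example.com and the two test URLs are placeholders.

from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /search
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

print(parser.can_fetch("*", "https://example.com/search?q=seo"))       # False
print(parser.can_fetch("*", "https://example.com/2024/01/post.html"))  # True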