All Robots.txt

SEO Help and Tips

Complete robots.txt Allow and Disallow directive examples, with explanations:

Allow:
------------
User-agent: *
Allow: /cgi-bin
Disallow: /
# prohibits downloading anything except pages
# starting with '/cgi-bin'
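The precedence above can be checked with Python's standard urllib.robotparser module (a quick sketch; example.com and the sample paths are placeholders):

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Allow: /cgi-bin
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Paths under /cgi-bin are fetchable; everything else is blocked.
print(rp.can_fetch("*", "https://example.com/cgi-bin/search"))  # True
print(rp.can_fetch("*", "https://example.com/index.html"))      # False
```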

User-agent: *
Allow: /file.xml
# allows downloading the file.xml file

Allow and Disallow directives without parameters:
----------------------------------------------------------------
User-agent: *
Disallow: # same as Allow: /

User-agent: *
Allow: # isn't taken into account by the robot
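The empty-Disallow case can be confirmed with Python's standard urllib.robotparser (a quick sketch; example.com is a placeholder):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow:"])

# An empty Disallow blocks nothing, so any path is fetchable.
print(rp.can_fetch("*", "https://example.com/any/page.html"))  # True
```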

Disallow:
------------
User-agent: *
Disallow: /
# prohibits crawling the entire site

User-agent: *
Disallow: /catalogue
# prohibits crawling pages that start with /catalogue

User-agent: *
Disallow: /page?
# prohibits crawling pages with a URL that contains parameters

Common:
------------
User-agent: *
Allow: /archive
Disallow: /
# allows everything that starts with '/archive'; everything else is prohibited

User-agent: *
Allow: /obsolete/private/*.html$ # allows HTML files
                                 # in the '/obsolete/private/...' path
Disallow: /*.php$ # prohibits all '*.php' on the website
Disallow: /*/private/ # prohibits all subpaths containing
                      # '/private/', but the Allow above negates
                      # a part of this prohibition
Disallow: /*/old/*.zip$ # prohibits all '*.zip' files
                        # whose path contains '/old/'

User-agent: *
Disallow: /add.php?*user=
# prohibits all 'add.php?' scripts with the 'user' option


Combining directives:
---------------------------
# Source robots.txt:
User-agent: *
Allow: /
Allow: /catalog/auto
Disallow: /catalog

# Sorted robots.txt:
User-agent: *
Allow: /
Disallow: /catalog
Allow: /catalog/auto
# prohibits downloading pages starting with '/catalog',
# but allows downloading pages starting with '/catalog/auto'.
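The sorting step reflects the precedence rule most modern crawlers use: the longest matching rule wins, and Allow wins a tie. A minimal Python sketch of that logic (my own illustration for non-wildcard rules, not any crawler's actual code):

```python
def is_allowed(path, rules):
    """rules: list of (directive, prefix) tuples, e.g. ("Allow", "/catalog/auto").
    The longest matching prefix decides; 'Allow' wins ties; no match means allowed."""
    best = None  # (prefix length, directive is Allow)
    for directive, prefix in rules:
        if path.startswith(prefix):
            candidate = (len(prefix), directive == "Allow")
            if best is None or candidate > best:
                best = candidate
    return best is None or best[1]

rules = [("Allow", "/"), ("Allow", "/catalog/auto"), ("Disallow", "/catalog")]
print(is_allowed("/catalog/page", rules))       # False
print(is_allowed("/catalog/auto/page", rules))  # True
```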

Using the special characters:
-------------------------------------
User-agent: *
Disallow: /cgi-bin/*.aspx # prohibits '/cgi-bin/example.aspx'
                          # and '/cgi-bin/private/test.aspx'
Disallow: /*private # prohibits both '/private'
                    # and '/cgi-bin/private'

By default, the * character is appended to the end of every rule described in the robots.txt file. Example:

User-agent: *
Disallow: /cgi-bin* # blocks access to pages
                    # starting with '/cgi-bin'
Disallow: /cgi-bin # the same
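Python's standard robotparser does not understand the '*' and '$' special characters, so here is a small hand-rolled matcher (an illustrative sketch, not an official implementation) that translates a rule into a regular expression:

```python
import re

def rule_to_regex(rule):
    """Translate a robots.txt path rule into a compiled regex.
    '*' matches any character sequence; a trailing '$' anchors the end;
    without '$', a trailing wildcard is implied."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    if anchored:
        pattern += "$"
    return re.compile(pattern)

def rule_matches(rule, path):
    # re.match anchors at the start, mirroring how rules match from the path start.
    return rule_to_regex(rule).match(path) is not None

print(rule_matches("/cgi-bin/*.aspx", "/cgi-bin/private/test.aspx"))  # True
print(rule_matches("/*private", "/cgi-bin/private"))                  # True
print(rule_matches("/private$", "/private/test.html"))                # False
```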


Examples of how directives are interpreted:
--------------------------------------------------------
User-agent: *
Allow: /
Disallow: /
# everything is allowed

User-agent: *
Allow: /$
Disallow: /
# everything is prohibited except the main page

User-agent: *
Disallow: /private*html
# prohibits '/private*html',
# '/private/test.html', '/private/html/test.aspx', etc.

User-agent: *
Disallow: /private$
# prohibits only '/private'

User-agent: *
Disallow: /
User-agent: *
Allow: /
# both groups apply to all robots, so their rules are combined;
# when equally specific rules conflict, the less restrictive
# directive (Allow) wins, so everything is allowed

User-agent: *
Allow: /
Sitemap: https://example.com/site_structure/my_sitemaps1.xml
Sitemap: https://example.com/site_structure/my_sitemaps2.xml
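Sitemap directives can be read back with Python's urllib.robotparser (the site_maps() method is available in Python 3.8+; the URLs are the placeholders from the example above):

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Allow: /
Sitemap: https://example.com/site_structure/my_sitemaps1.xml
Sitemap: https://example.com/site_structure/my_sitemaps2.xml
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())
print(rp.site_maps())  # list of both sitemap URLs
```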

User-agent: *
Disallow:
Clean-param: ref /some_dir/get_book.pl

Additional examples:
-----------------------------

#for URLs like:
www.example1.com/forum/showthread.php?s=681498b9648949605&t=8243
www.example1.com/forum/showthread.php?s=1e71c4427317a117a&t=8243
#robots.txt will contain:
User-agent: *
Disallow:
Clean-param: s /forum/showthread.php

#for URLs like:
www.example2.com/index.php?page=1&sid=2564126ebdec301c607e5df
www.example2.com/index.php?page=1&sid=974017dcd170d6c4a5d76ae
#robots.txt will contain:
User-agent: *
Disallow:
Clean-param: sid /index.php

#if there are multiple parameters like this:
www.example1.com/forum_old/showthread.php?s=681498605&t=8243&ref=1311
www.example1.com/forum_new/showthread.php?s=1e71c417a&t=8243&ref=9896
#robots.txt will contain:
User-agent: *
Disallow:
Clean-param: s&ref /forum*/showthread.php

#if the parameter is used in multiple scripts:
www.example1.com/forum/showthread.php?s=681498b9648949605&t=8243
www.example1.com/forum/index.php?s=1e71c4427317a117a&t=8243
#robots.txt will contain:
User-agent: *
Disallow:
Clean-param: s /forum/index.php
Clean-param: s /forum/showthread.php
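Clean-param is a Yandex-specific directive telling the crawler to ignore the listed query parameters when comparing URLs. The deduplication it implies can be sketched in Python (my own illustration; the function name is made up):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def strip_params(url, ignored):
    """Drop the listed query parameters, as a crawler honoring
    Clean-param would when deduplicating URLs."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in ignored]
    return urlunsplit(parts._replace(query=urlencode(kept)))

a = strip_params("http://www.example2.com/index.php?page=1&sid=2564126ebdec301c607e5df", {"sid"})
b = strip_params("http://www.example2.com/index.php?page=1&sid=974017dcd170d6c4a5d76ae", {"sid"})
print(a == b)  # True: both collapse to .../index.php?page=1
```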

User-agent: *
This line specifies the user agent to which the subsequent directives apply. The asterisk (*) is a wildcard character meaning "all user agents," so the directives that follow apply to every web robot or crawler.

Disallow: /search
This line tells web robots not to crawl the "/search" directory. That directory typically contains the result pages generated by the website's own search functionality; disallowing it instructs crawlers to skip those pages. Note that Disallow controls crawling, not indexing: a blocked URL can still appear in search results if other pages link to it.

Allow: /
This line permits web robots to crawl the rest of the website. The forward slash ("/") represents the root of the site, so this rule covers every page not matched by a more specific Disallow rule such as "/search" above.
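Those three lines can be verified together with Python's standard urllib.robotparser (example.com and the sample paths are placeholders):

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /search
Allow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Search result pages are blocked; ordinary pages remain crawlable.
print(rp.can_fetch("*", "https://example.com/search?q=robots"))  # False
print(rp.can_fetch("*", "https://example.com/blog/post-1"))      # True
```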
