How to Use the Disallow and Allow Directives in Robots.txt

 SEO Help and Tips


Here are the Disallow and Allow directives used in robots.txt:
Disallow:
------------
User-agent:*
Disallow: / 

# prohibits crawling for the entire site
User-agent:*
Disallow: /catalogue 

# prohibits crawling the pages that start with /catalogue
User-agent:*
Disallow: /page? 

# Prohibits crawling the pages with a URL that contains parameters
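The prefix rules above can be checked with Python's standard `urllib.robotparser`. This is a minimal sketch for plain prefix rules (the stdlib parser does not implement wildcard matching, so keep it to simple paths like these):

```python
from urllib.robotparser import RobotFileParser

# Parse the rules from the second example above.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /catalogue",
])

# Any path starting with /catalogue is blocked for every crawler...
print(rp.can_fetch("*", "https://example.com/catalogue/item-1"))  # False
# ...while other paths remain crawlable.
print(rp.can_fetch("*", "https://example.com/about"))             # True
```

The same `can_fetch()` call is what a well-behaved Python crawler runs before each request.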
Allow:
------------
User-agent:*
Allow: /cgi-bin
Disallow: /

# prohibits downloading anything except pages
# starting with '/cgi-bin'
User-agent:*
Allow: /file.xml

# allows downloading the file.xml file
Common:
------------
User-agent:*
Allow: /archive
Disallow: /

# allows everything that contains '/archive', everything else is prohibited
User-agent: *
Allow: /obsolete/private/*.html$ # allows HTML files in the
                                 # '/obsolete/private/...' path
Disallow: /*.php$        # prohibits all '*.php' on the website
Disallow: /*/private/    # prohibits all subpaths containing
                         # '/private/', but the Allow above negates
                         # a part of this prohibition
Disallow: /*/old/*.zip$  # prohibits all '*.zip' files whose
                         # path contains '/old/'
User-agent:*
Disallow: /add.php?*user= 

# prohibits all 'add.php?' scripts with the 'user' option
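A minimal sketch of how the `*` and `$` characters in these patterns can be interpreted (an illustration of the matching rules, not any crawler's actual implementation): `*` matches any character sequence, and a trailing `$` anchors the pattern to the end of the URL.

```python
import re

def rule_to_regex(path: str) -> "re.Pattern":
    """Translate a robots.txt path pattern into an anchored regex:
    '*' matches any character sequence, a trailing '$' anchors the end."""
    anchored = path.endswith("$")
    if anchored:
        path = path[:-1]
    pattern = ".*".join(re.escape(part) for part in path.split("*"))
    return re.compile("^" + pattern + ("$" if anchored else ""))

# '/add.php?*user=' blocks add.php calls that carry a 'user' parameter.
rule = rule_to_regex("/add.php?*user=")
print(bool(rule.match("/add.php?cat=1&user=bob")))  # True
print(bool(rule.match("/add.php?cat=1")))           # False

# '/*.php$' matches only URLs that end in '.php'.
rule = rule_to_regex("/*.php$")
print(bool(rule.match("/index.php")))    # True
print(bool(rule.match("/index.php?x")))  # False
```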
Combining directives:
---------------------------
# Source robots.txt:
User-agent:*
Allow: /
Allow: /catalog/auto
Disallow: /catalog

# Sorted robots.txt:
User-agent:*
Allow: /
Disallow: /catalog
Allow: /catalog/auto

# prohibits downloading pages starting with '/catalog',
# but allows downloading pages starting with '/catalog/auto'.
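The sorting step above can be sketched as follows: rules are ordered by path length, and for a given URL the longest (most specific) matching rule decides; on a tie, Allow takes precedence. This sketch assumes plain prefix matching for simplicity:

```python
def decide(url_path, rules):
    """rules: list of (directive, path) pairs, e.g. ('Allow', '/catalog/auto').
    The longest matching path wins; with no match, crawling is allowed."""
    matching = [(d, p) for d, p in rules if url_path.startswith(p)]
    if not matching:
        return "Allow"
    # Most specific rule (longest path) wins; Allow wins an exact tie.
    directive, _ = max(matching, key=lambda dp: (len(dp[1]), dp[0] == "Allow"))
    return directive

rules = [("Allow", "/"), ("Allow", "/catalog/auto"), ("Disallow", "/catalog")]
print(decide("/catalog/auto/bmw", rules))  # Allow
print(decide("/catalog/page", rules))      # Disallow
print(decide("/about", rules))             # Allow
```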
Allow and Disallow directives without parameters:
----------------------------------------------------------------
User-agent:*
Disallow: # same as Allow: /
User-agent:*
Allow: # isn't taken into account by the robot
Using the special characters:
-------------------------------------
User-agent:*
Disallow: /cgi-bin/*.aspx # prohibits '/cgi-bin/example.aspx'
                          # and '/cgi-bin/private/test.aspx'
Disallow: /*private # prohibits both '/private'
                   # and '/cgi-bin/private'
   
By default, the * character is appended to the end of every rule described in the robots.txt file. 

Example:
User-agent:*
Disallow: /cgi-bin* # blocks access to pages 
                    # starting with '/cgi-bin'
Disallow: /cgi-bin # the same
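The implicit trailing `*` can be illustrated with a small matcher: `/cgi-bin` and `/cgi-bin*` block exactly the same set of paths (a sketch of the rule, not a crawler implementation):

```python
import re

def matches(rule: str, path: str) -> bool:
    """Match a robots.txt path rule; '*' is a wildcard and an
    implicit '*' is appended to the end of every rule."""
    regex = ".*".join(re.escape(p) for p in rule.split("*")) + ".*"
    return re.match(regex, path) is not None

# '/cgi-bin' and '/cgi-bin*' behave identically for any path:
for path in ("/cgi-bin/test.cgi", "/cgi-binary", "/home"):
    print(matches("/cgi-bin", path) == matches("/cgi-bin*", path))  # True
```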
Examples of how directives are interpreted:
--------------------------------------------------------
User-agent:* 
Allow: /
Disallow: /

# everything is allowed
User-agent:* 
Allow: /$
Disallow: /

# everything is prohibited except the main page
User-agent: *
Disallow: /private*html

# prohibits '/private*html', '/private/test.html',
# '/private/html/test.aspx', etc.

User-agent: *
Disallow: /private$

# prohibits only '/private'
User-agent: *
Disallow: /
User-agent: *
Allow: /

# the robot merges all entries under the same 'User-agent:',
# and when Allow and Disallow rules are equally specific,
# Allow takes precedence, so everything is allowed
User-agent:*
Allow: /
Sitemap: https://example.com/site_structure/my_sitemaps1.xml
Sitemap: https://example.com/site_structure/my_sitemaps2.xml

User-agent: *
Disallow:
Clean-param: ref /some_dir/get_book.pl

# Clean-param (a Yandex-specific directive) tells the robot to
# ignore the listed GET parameter ('ref') on the given path, so
# URLs that differ only in that parameter are crawled as one page
Additional examples:
-----------------------------
#robots.txt will contain:
User-agent:*
Disallow:
Clean-param: s /forum/showthread.php

#for URLs like:
www.example2.com/index.php?page=1&sid=2564126ebdec301c607e5df
www.example2.com/index.php?page=1&sid=974017dcd170d6c4a5d76ae

#robots.txt will contain:
User-agent:*
Disallow:
Clean-param: sid /index.php

#if there are multiple parameters like this:
www.example1.com/forum_old/showthread.php?s=681498605&t=8243&ref=1311
www.example1.com/forum_new/showthread.php?s=1e71c417a&t=8243&ref=9896

#robots.txt will contain:
User-agent:*
Disallow:
Clean-param: s&ref /forum*/showthread.php

#if the parameter is used in multiple scripts:
www.example1.com/forum/showthread.php?s=681498b9648949605&t=8243
www.example1.com/forum/index.php?s=1e71c4427317a117a&t=8243

#robots.txt will contain:
User-agent:*
Disallow:
Clean-param: s /forum/index.php
Clean-param: s /forum/showthread.php 
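The effect of Clean-param can be sketched as URL normalization: stripping the listed parameter makes duplicate URLs collapse to one canonical form (an illustration only; the actual deduplication happens on the search engine's side, and the `http://` scheme is added here since the example URLs omit it):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def strip_params(url, params):
    """Remove the given query parameters, as a Clean-param rule would."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in params]
    return urlunsplit(parts._replace(query=urlencode(kept)))

# The two session URLs from the example above collapse to one:
a = strip_params("http://www.example1.com/forum/showthread.php"
                 "?s=681498b9648949605&t=8243", {"s"})
b = strip_params("http://www.example1.com/forum/showthread.php"
                 "?s=1e71c417a&t=8243", {"s"})
print(a)       # http://www.example1.com/forum/showthread.php?t=8243
print(a == b)  # True — same canonical URL
```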
