How to Use the Disallow and Allow Directives in Robots.txt

Here are examples of the Disallow and Allow directives in robots.txt:
Disallow:
------------
User-agent: *
Disallow: / # prohibits crawling for the entire site

User-agent: *
Disallow: /catalogue # prohibits crawling the pages that start with /catalogue

User-agent: *
Disallow: /page? # prohibits crawling '/page' URLs that contain parameters,
                 # for example '/page?id=1'
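The rules above are plain prefixes: a rule covers every URL whose path (including the query string) starts with the rule's text. A minimal Python sketch of that comparison, assuming the rule contains no special characters; `rule_matches` and the sample paths are hypothetical:

```python
def rule_matches(rule_path: str, url_path: str) -> bool:
    """Prefix match for a rule without special characters (sketch).

    url_path is the URL path plus any query string, e.g. '/page?id=1'.
    """
    return url_path.startswith(rule_path)


# Hypothetical sample paths checked against the Disallow rules above.
checks = [
    ("/", "/any/page.html"),             # Disallow: / covers every page
    ("/catalogue", "/catalogue/shoes"),  # starts with '/catalogue'
    ("/catalogue", "/catalogue-old"),    # also starts with '/catalogue'
    ("/catalogue", "/shop/catalogue"),   # does not start with it
    ("/page?", "/page?id=1"),            # '/page' with parameters
    ("/page?", "/page"),                 # no parameters, not covered
]

for rule, path in checks:
    print(rule, path, rule_matches(rule, path))
```

The '/catalogue-old' case shows that '/catalogue' also covers any path that merely begins with those characters.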
Allow:
------------
User-agent: *
Allow: /cgi-bin
Disallow: /
# prohibits downloading anything except pages
# starting with '/cgi-bin'

User-agent: *
Allow: /file.xml
# allows downloading the file.xml file

Common examples:
------------
User-agent: *
Allow: /archive
Disallow: /
# allows everything that starts with '/archive'; everything else is prohibited

User-agent: *
Allow: /obsolete/private/*.html$ # allows HTML files
                                 # in the '/obsolete/private/...' path
Disallow: /*.php$ # prohibits all URLs ending in '.php'
Disallow: /*/private/ # prohibits all subpaths containing
                      # '/private/', but the Allow above negates
                      # a part of this prohibition
Disallow: /*/old/*.zip$ # prohibits all '.zip' files whose
                        # path contains '/old/'

User-agent: *
Disallow: /add.php?*user= # prohibits all 'add.php?' URLs
                          # that include the 'user' parameter
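Combining directives:
---------------------------
The Allow and Disallow directives in a User-agent group are sorted by the length of their URL prefix, from shortest to longest, and applied in that order. If several directives match a page, the robot uses the last matching one in the sorted list; if an Allow and a Disallow with prefixes of the same length conflict, Allow takes precedence.

# Source robots.txt: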
User-agent:*
Allow: /
Allow: /catalog/auto
Disallow: /catalog

# Sorted robots.txt:
User-agent:*
Allow: /
Disallow: /catalog
Allow: /catalog/auto

# prohibits downloading pages starting with '/catalog',
# but allows downloading pages starting with '/catalog/auto'.
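The sorting rule can be sketched in Python for prefix-only rules (no * or $). The `(directive, path)` tuples, `sort_rules`, and `is_allowed` are hypothetical names; the tie-break that places Allow after Disallow assumes that Allow wins when two prefixes have the same length:

```python
# Rules from the "Source robots.txt" example, as (directive, path) pairs.
source = [
    ("Allow", "/"),
    ("Allow", "/catalog/auto"),
    ("Disallow", "/catalog"),
]


def sort_rules(rules):
    """Sort by path length, shortest first; on a tie, Allow goes last."""
    return sorted(rules, key=lambda rule: (len(rule[1]), rule[0] == "Allow"))


def is_allowed(rules, url_path):
    """The last matching rule in the sorted list decides (prefix match only)."""
    allowed = True  # no matching rule means the page may be crawled
    for directive, path in sort_rules(rules):
        if url_path.startswith(path):
            allowed = directive == "Allow"
    return allowed


print(sort_rules(source))
# [('Allow', '/'), ('Disallow', '/catalog'), ('Allow', '/catalog/auto')]
print(is_allowed(source, "/catalog/cars"))      # False
print(is_allowed(source, "/catalog/auto/bmw"))  # True
```

The sorted list printed first is the same order as the "Sorted robots.txt" listing, so the result no longer depends on the order in which the lines were written.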
Allow and Disallow directives without parameters:
----------------------------------------------------------------
User-agent: *
Disallow: # same as Allow: /

User-agent: *
Allow: # isn't taken into account by the robot
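A robot reading these lines has to treat the two empty directives differently. A minimal Python sketch of a line parser for one User-agent group, assuming comments start with # and directive names are case-insensitive; `parse_group` is a hypothetical name:

```python
def parse_group(lines):
    """Turn the lines of one User-agent group into (directive, path) pairs."""
    rules = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a "name: value" line
        name, value = name.strip().lower(), value.strip()
        if name == "disallow":
            # An empty Disallow blocks nothing, so it behaves like Allow: /
            rules.append(("Allow", "/") if value == "" else ("Disallow", value))
        elif name == "allow" and value:
            # An empty Allow is skipped entirely
            rules.append(("Allow", value))
    return rules


print(parse_group(["User-agent: *", "Disallow: # same as Allow: /"]))  # [('Allow', '/')]
print(parse_group(["User-agent: *", "Allow: # not taken into account"]))  # []
```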
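Using the special characters:
-------------------------------------
In Allow and Disallow paths, the * character matches any sequence of characters, including an empty one.
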
User-agent:*
Disallow: /cgi-bin/*.aspx # prohibits '/cgi-bin/example.aspx'
                          # and '/cgi-bin/private/test.aspx'
Disallow: /*private # prohibits both '/private'
                   # and '/cgi-bin/private'

By default, the * character is appended to the end of every rule in the robots.txt file. To cancel it, end the rule with the $ character.

Example:
User-agent:*
Disallow: /cgi-bin* # blocks access to pages 
                    # starting with '/cgi-bin'
Disallow: /cgi-bin # the same
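One way to model * and $ is to translate each rule into a regular expression. A minimal Python sketch, assuming rules are matched against the URL path plus query string starting from its first character; `rule_matches` is a hypothetical helper, and the paths come from the examples above:

```python
import re


def rule_matches(rule_path: str, url_path: str) -> bool:
    """Match a robots.txt rule that may contain * and a trailing $."""
    anchored = rule_path.endswith("$")  # $ = rule must reach the end of the URL
    body = rule_path[:-1] if anchored else rule_path
    # Escape the literal pieces and join them with '.*' (the * wildcard).
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    if anchored:
        return re.fullmatch(regex, url_path) is not None
    # re.match anchors only at the start, which gives the implicit trailing *.
    return re.match(regex, url_path) is not None


print(rule_matches("/cgi-bin/*.aspx", "/cgi-bin/private/test.aspx"))  # True
print(rule_matches("/*private", "/cgi-bin/private"))                  # True
same = rule_matches("/cgi-bin*", "/cgi-bin/page") == rule_matches("/cgi-bin", "/cgi-bin/page")
print(same)                                                           # True
print(rule_matches("/cgi-bin$", "/cgi-bin/page"))                     # False
```

The last line shows the $ character cancelling the implicit trailing *: '/cgi-bin$' covers only '/cgi-bin' itself.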
Examples of how directives are interpreted:
--------------------------------------------------------
User-agent:* 
Allow: /
Disallow: /

# everything is allowed
User-agent:* 
Allow: /$
Disallow: /

# everything is prohibited except the main page
User-agent: *
Disallow: /private*html
# prohibits '/private*html',
# '/private/test.html', '/private/html/test.aspx', etc.

User-agent: *
Disallow: /private$
# prohibits only '/private'
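The interpretations above follow from combining two mechanisms: translating * and $, then sorting the rules by length and letting the last match decide. A minimal Python sketch under those assumptions, with hypothetical helper names; it treats the rule length as the number of characters in the rule as written and lets Allow win a tie:

```python
import re


def rule_matches(rule_path, url_path):
    """* matches any characters; a trailing $ anchors the end of the URL."""
    anchored = rule_path.endswith("$")
    body = rule_path[:-1] if anchored else rule_path
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    match = re.fullmatch if anchored else re.match
    return match(regex, url_path) is not None


def is_allowed(rules, url_path):
    """Sort by rule length (Allow after Disallow on a tie); last match wins."""
    ordered = sorted(rules, key=lambda rule: (len(rule[1]), rule[0] == "Allow"))
    allowed = True  # a page matched by no rule may be crawled
    for directive, path in ordered:
        if rule_matches(path, url_path):
            allowed = directive == "Allow"
    return allowed


main_page_only = [("Allow", "/$"), ("Disallow", "/")]
print(is_allowed(main_page_only, "/"))       # True
print(is_allowed(main_page_only, "/about"))  # False

# The '/obsolete/private/' group from the common examples.
obsolete = [
    ("Allow", "/obsolete/private/*.html$"),
    ("Disallow", "/*.php$"),
    ("Disallow", "/*/private/"),
    ("Disallow", "/*/old/*.zip$"),
]
print(is_allowed(obsolete, "/obsolete/private/page.html"))  # True
print(is_allowed(obsolete, "/docs/private/page.html"))      # False
```

In the last group the longer Allow rule sorts after '/*/private/', which is how it negates part of that prohibition.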
User-agent: *
Disallow: /

User-agent: Yandex
Allow: /
# since the Yandex robot selects the group
# whose 'User-agent:' line names it,
# everything is allowed

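Group selection can be sketched as a lookup that prefers a group naming the robot over the * group. This is a simplification, assuming an exact, case-insensitive comparison of the robot's name; `select_group` is hypothetical, and `Yandex` and `OtherBot` are used only as sample names:

```python
def select_group(groups, robot_name):
    """Return the rules of the group that names the robot, else the * group."""
    wanted = robot_name.lower()
    for agent, rules in groups.items():
        if agent != "*" and agent.lower() == wanted:
            return rules
    return groups.get("*", [])


groups = {
    "*": [("Disallow", "/")],
    "Yandex": [("Allow", "/")],
}
print(select_group(groups, "Yandex"))    # [('Allow', '/')]
print(select_group(groups, "OtherBot"))  # [('Disallow', '/')]
```

In this sketch only one group applies: a robot with its own group never sees the rules of the * group.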
User-agent:*
Allow: /
sitemap: https://example.com/site_structure/my_sitemaps1.xml
sitemap: https://example.com/site_structure/my_sitemaps2.xml
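Sitemap lines sit outside the Allow/Disallow logic, and Python's standard `urllib.robotparser` module can read them: `RobotFileParser.site_maps()` (Python 3.8 and later) returns the listed URLs, or None if there are none. A short sketch using the example above:

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Allow: /
sitemap: https://example.com/site_structure/my_sitemaps1.xml
sitemap: https://example.com/site_structure/my_sitemaps2.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())  # parse the text directly, no download
print(parser.site_maps())
```

Note that `can_fetch()` in the same module applies Allow and Disallow rules as plain prefixes in file order, so it does not follow the * and $ rules or the length-based sorting described in this post.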

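User-agent: *
Disallow:
Clean-param: ref /some_dir/get_book.pl
# tells the robot that the 'ref' parameter in URLs starting with
# '/some_dir/get_book.pl' doesn't change the page content
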
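The Clean-param value has two parts: one or more parameter names joined with &, then an optional path prefix. A minimal Python sketch that splits the value, assuming whitespace separates the two parts; `parse_clean_param` is a hypothetical name:

```python
def parse_clean_param(value):
    """Split a Clean-param value into (set of parameter names, path prefix).

    The path is returned as None when the directive gives only parameters.
    """
    parts = value.split()
    if not parts:
        raise ValueError("Clean-param needs at least one parameter name")
    params = set(parts[0].split("&"))
    path = parts[1] if len(parts) > 1 else None
    return params, path


print(parse_clean_param("ref /some_dir/get_book.pl"))
print(parse_clean_param("s&ref /forum*/showthread.php"))
```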
Additional examples:
-----------------------------
#for '/forum/showthread.php' URLs whose 's' parameter doesn't change the page content, robots.txt will contain:
User-agent:*
Disallow:
Clean-param: s /forum/showthread.php

#for URLs like:
www.example2.com/index.php?page=1&sid=2564126ebdec301c607e5df
www.example2.com/index.php?page=1&sid=974017dcd170d6c4a5d76ae

#robots.txt will contain:
User-agent:*
Disallow:
Clean-param: sid /index.php

#if there are multiple parameters like this:
www.example1.com/forum_old/showthread.php?s=681498605&t=8243&ref=1311
www.example1.com/forum_new/showthread.php?s=1e71c417a&t=8243&ref=9896

#robots.txt will contain:
User-agent:*
Disallow:
Clean-param: s&ref /forum*/showthread.php

#if the parameter is used in multiple scripts:
www.example1.com/forum/showthread.php?s=681498b9648949605&t=8243
www.example1.com/forum/index.php?s=1e71c4427317a117a&t=8243

#robots.txt will contain:
User-agent:*
Disallow:
Clean-param: s /forum/index.php
Clean-param: s /forum/showthread.php 
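To see what these rules achieve, here is a minimal Python sketch that strips the listed parameters from a URL when its path matches the Clean-param prefix, so URLs that differ only in those parameters reduce to one address. It assumes * in the prefix matches any characters and that the sample URLs have no scheme; `clean_url` is a hypothetical helper, and the URLs are the examples above:

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit


def clean_url(url, params, prefix):
    """Drop the query parameters in `params` if the path starts with `prefix`."""
    parts = urlsplit("//" + url)  # the sample URLs have no scheme
    # The prefix may contain *, which matches any sequence of characters.
    regex = ".*".join(re.escape(piece) for piece in prefix.split("*"))
    if re.match(regex, parts.path) is None:
        return url  # the directive does not apply to this path
    kept = [(name, val)
            for name, val in parse_qsl(parts.query, keep_blank_values=True)
            if name not in params]
    query = urlencode(kept)
    return parts.netloc + parts.path + ("?" + query if query else "")


for url in [
    "www.example2.com/index.php?page=1&sid=2564126ebdec301c607e5df",
    "www.example2.com/index.php?page=1&sid=974017dcd170d6c4a5d76ae",
]:
    print(clean_url(url, {"sid"}, "/index.php"))  # www.example2.com/index.php?page=1

print(clean_url("www.example1.com/forum_old/showthread.php?s=681498605&t=8243&ref=1311",
                {"s", "ref"}, "/forum*/showthread.php"))
# www.example1.com/forum_old/showthread.php?t=8243

print(clean_url("www.example1.com/forum/index.php?s=1e71c4427317a117a&t=8243",
                {"s"}, "/forum/showthread.php"))
# unchanged: the prefix does not match '/forum/index.php'
```

The last call shows why the multiple-scripts example needs a separate Clean-param line for each script: a rule for '/forum/showthread.php' leaves '/forum/index.php' untouched.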
