All Robots.txt
Here are the robots.txt Allow and Disallow directives for a website, with code examples and details:
Allow:
------------
User-agent:*
Allow: /cgi-bin
Disallow: /
# prohibits downloading anything except pages
# starting with '/cgi-bin'
User-agent:*
Allow: /file.xml
# allows downloading the file.xml file
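If you want to check how a crawler that follows the documented "longest matching rule wins" behavior would read the first example above, a minimal Python sketch of that rule might look like this. It is only an illustration under that assumption: it handles plain path prefixes and ignores the * and $ wildcards covered later.

# Minimal sketch: the longest matching prefix decides; Allow wins on a tie.
rules = [
    ("allow", "/cgi-bin"),
    ("disallow", "/"),
]

def is_allowed(path):
    best = None  # (prefix_length, is_allow)
    for kind, prefix in rules:
        if path.startswith(prefix):
            candidate = (len(prefix), kind == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]

print(is_allowed("/cgi-bin/search.cgi"))  # True  - the Allow rule is longer
print(is_allowed("/index.html"))          # False - only 'Disallow: /' matches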
Allow and Disallow directives without parameters:
----------------------------------------------------------------
User-agent:*
Disallow: # same as Allow: /
User-agent:*
Allow: # isn't taken into account by the robot
Disallow:
------------
User-agent:*
Disallow: /
# prohibits crawling for the entire site
User-agent:*
Disallow: /catalogue
# prohibits crawling the pages that start with /catalogue
User-agent:*
Disallow: /page?
# prohibits crawling pages whose URL contains parameters
Common examples:
------------
User-agent:*
Allow: /archive
Disallow: /
# allows everything that starts with '/archive'; everything else is prohibited
User-agent:*
Allow: /obsolete/private/*.html$ # allows HTML files
# in the '/obsolete/private/...' path
Disallow: /*.php$ # prohibits all '*.php' on the website
Disallow: /*/private/ # prohibits all subpaths containing '/private/',
# but the Allow above negates a part of this prohibition
Disallow: /*/old/*.zip$ # prohibits all '*.zip' files
# whose path contains '/old/'
User-agent:*
Disallow: /add.php?*user=
# prohibits all 'add.php?' scripts with the 'user' option
Combining directives:
---------------------------
# Source robots.txt:
User-agent:*
Allow: /
Allow: /catalog/auto
Disallow: /catalog
# Sorted robots.txt:
User-agent:*
Allow: /
Disallow: /catalog
Allow: /catalog/auto
# prohibits downloading pages starting with '/catalog',
# but allows downloading pages starting with '/catalog/auto'.
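To make the sorting step concrete, here is a small Python sketch (an illustration only, not a full robots.txt parser) that orders the directives by prefix length and lets the longest matching prefix decide, exactly as the sorted file above suggests.

# Sort directives from shortest to longest prefix, as in the sorted robots.txt.
directives = [
    ("Allow", "/"),
    ("Allow", "/catalog/auto"),
    ("Disallow", "/catalog"),
]
sorted_directives = sorted(directives, key=lambda d: len(d[1]))
print(sorted_directives)
# [('Allow', '/'), ('Disallow', '/catalog'), ('Allow', '/catalog/auto')]

def decision(path):
    verdict = "Allow"  # a page that matches nothing is allowed
    for kind, prefix in sorted_directives:
        if path.startswith(prefix):
            verdict = kind  # a longer matching prefix overrides a shorter one
    return verdict

print(decision("/catalog/tools"))     # Disallow
print(decision("/catalog/auto/bmw"))  # Allow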
Using the special characters:
-------------------------------------
User-agent:*
Disallow: /cgi-bin/*.aspx # prohibits '/cgi-bin/example.aspx'
# and '/cgi-bin/private/test.aspx'
Disallow: /*private # prohibits both '/private'
# and '/cgi-bin/private'
By default, the * character is appended to the end of every rule described in the robots.txt file. Example:
User-agent:*
Disallow: /cgi-bin* # blocks access to pages
# starting with '/cgi-bin'
Disallow: /cgi-bin # the same
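A quick way to see how the * and $ characters behave, including the implicit * appended to every rule, is to translate a pattern into a regular expression. The following Python sketch does that; it is a simplification that assumes patterns are matched against the URL path only.

import re

def rule_to_regex(pattern):
    """Translate a robots.txt path pattern into a regular expression.

    '*' matches any sequence of characters, '$' at the end anchors the
    match, and an implicit '*' is assumed at the end of every other rule.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ".*"))

print(bool(rule_to_regex("/cgi-bin/*.aspx").match("/cgi-bin/private/test.aspx")))  # True
print(bool(rule_to_regex("/*private").match("/cgi-bin/private")))                  # True
print(bool(rule_to_regex("/cgi-bin").match("/cgi-bin/test.html")))                 # True (implicit '*')
print(bool(rule_to_regex("/private$").match("/private/test.html")))                # False ('$' anchors)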
Examples of how directives are interpreted:
--------------------------------------------------------
User-agent:*
Allow: /
Disallow: /
# everything is allowed
User-agent:*
Allow: /$
Disallow: /
# everything is prohibited except the main page
User-agent:*
Disallow: /private*html
# prohibits '/private*html',
# '/private/test.html', '/private/html/test.aspx', etc.
User-agent:*
Disallow: /private$
# prohibits only '/private'
User-agent: *
Disallow: /
User-agent:*
Allow: /
# since the robot combines all the entries for 'User-agent: *'
# and Allow wins over Disallow when the rules are equal,
# everything is allowed
User-agent:*
Allow: /
Sitemap: https://example.com/site_structure/my_sitemaps1.xml
Sitemap: https://example.com/site_structure/my_sitemaps2.xml
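If you need to read these Sitemap entries programmatically, Python's standard urllib.robotparser can return them (the site_maps() method requires Python 3.8 or newer); a small sketch:

from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Allow: /
Sitemap: https://example.com/site_structure/my_sitemaps1.xml
Sitemap: https://example.com/site_structure/my_sitemaps2.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# site_maps() lists every Sitemap URL found in the file.
print(parser.site_maps())
# ['https://example.com/site_structure/my_sitemaps1.xml',
#  'https://example.com/site_structure/my_sitemaps2.xml']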
User-agent:*
Disallow:
Clean-param: ref /some_dir/get_book.pl
Additional examples:
-----------------------------
#robots.txt will contain:
User-agent:*
Disallow:
Clean-param: s /forum/showthread.php
#for URLs like:
www.example2.com/index.php?page=1&sid=2564126ebdec301c607e5df
www.example2.com/index.php?page=1&sid=974017dcd170d6c4a5d76ae
#robots.txt will contain:
User-agent:*
Disallow:
Clean-param: sid /index.php
#if there are multiple parameters like this:
www.example1.com/forum_old/showthread.php?s=681498605&t=8243&ref=1311
www.example1.com/forum_new/showthread.php?s=1e71c417a&t=8243&ref=9896
#robots.txt will contain:
User-agent:*
Disallow:
Clean-param: s&ref /forum*/showthread.php
#if the parameter is used in multiple scripts:
www.example1.com/forum/showthread.php?s=681498b9648949605&t=8243
www.example1.com/forum/index.php?s=1e71c4427317a117a&t=8243
#robots.txt will contain:
User-agent:*
Disallow:
Clean-param: s /forum/index.php
Clean-param: s /forum/showthread.php
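Clean-param is a Yandex-specific directive, so most other crawlers ignore it. To see the effect it describes, the Python sketch below strips the listed parameters from a URL with urllib.parse, which is roughly how the duplicate forum addresses above collapse into one canonical URL. The parameter names and paths come from the s&ref example; the http:// scheme is added only so the URLs parse correctly.

from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def apply_clean_param(url, params):
    """Drop the given query parameters, as Clean-param tells the robot to do."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in params]
    return urlunsplit(parts._replace(query=urlencode(kept)))

# Clean-param: s&ref /forum*/showthread.php
for url in [
    "http://www.example1.com/forum_old/showthread.php?s=681498605&t=8243&ref=1311",
    "http://www.example1.com/forum_new/showthread.php?s=1e71c417a&t=8243&ref=9896",
]:
    print(apply_clean_param(url, {"s", "ref"}))
# http://www.example1.com/forum_old/showthread.php?t=8243
# http://www.example1.com/forum_new/showthread.php?t=8243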
User-agent: *
This line specifies the user agent to which the subsequent directives apply. In this case, the asterisk (*) is a wildcard character that means "all user agents." So, the directives that follow will apply to all web robots or crawlers.
Disallow: /search
This line indicates that the "/search" directory or folder should not be crawled by web robots. The "/search" path typically serves the search result pages generated by the website's search functionality; by disallowing it, you are instructing search engine crawlers not to crawl those search result pages.
Allow: /
This line allows web robots to crawl and index the rest of the website, excluding the "/search" directory. The forward slash ("/") represents the root of the website, so allowing it permits web robots to crawl every other part of your site.
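You can double-check this behavior with Python's standard urllib.robotparser by feeding it the same three directives; in the sketch below, example.com and the two test URLs are placeholders.

from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /search
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

print(parser.can_fetch("*", "https://example.com/search?q=seo"))       # False
print(parser.can_fetch("*", "https://example.com/2024/01/post.html"))  # True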