Details of Robots.txt
All web page content is allowed to be crawled and indexed
The default robots.txt file allows search engine crawlers to access all pages and
posts on the website. Here is the default robots.txt file:
User-agent: Mediapartners-Google
Disallow:
User-agent: *
Disallow: /search
Allow: /
Sitemap: [Your-Blog-URL]/sitemap.xml
The robots.txt file above allows specific user-agents to access different types of
content on the website. Here is a breakdown of the directives in the robots.txt file:
Which robots.txt directives should you use or not?
User-agent: * (this directive applies to all other user-agent crawlers)
Allow: / (allows crawling of all other pages on the website)
Disallow: /search (disallows crawling of pages under the /search directory)
User-agent: Mediapartners-Google (used for Google AdSense)
User-agent: Googlebot (used for regular web crawling)
Disallow: /nogooglebot/ (tells a crawler which parts of a website it should not crawl; see the example file after this list)
User-agent: Adsbot-Google (used for Google Ads/AdWords campaigns)
User-agent: Googlebot-News (used for crawling news content)
User-agent: Googlebot-Image (used for crawling images)
User-agent: Googlebot-Video (used for crawling video content)
User-agent: Googlebot-Mobile (used for crawling mobile content)
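The Googlebot-specific directives above each sit in their own block inside the file. Here is a minimal sketch of how they could be combined; /nogooglebot/ is only an illustrative placeholder path, and you should keep only the user-agent sections your blog actually needs:

User-agent: Googlebot
Disallow: /nogooglebot/

User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /

Sitemap: [Your-Blog-URL]/sitemap.xml

An empty Disallow line, as in the Mediapartners-Google block, means that user-agent is not blocked from anything.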
For the remaining Google-specific user-agents, no Disallow rules are provided, which means they are allowed to crawl all content on the website.
This robots.txt file allows specific Google user-agents to access all content on the website, while other user-agents are allowed to access all content except for pages under the /search directory.
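If you want to confirm how these rules behave before publishing them, you can test them with Python's standard urllib.robotparser module. The sketch below assumes a placeholder blog address (https://example.blogspot.com) and made-up post paths; substitute your own URLs.

from urllib.robotparser import RobotFileParser

# The same rules as the default robots.txt shown above,
# with a placeholder blog URL filled in.
rules = """
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /

Sitemap: https://example.blogspot.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Ordinary crawlers are blocked from /search pages but may fetch posts.
print(parser.can_fetch("*", "https://example.blogspot.com/search/label/seo"))          # False
print(parser.can_fetch("*", "https://example.blogspot.com/2024/05/sample-post.html"))  # True

# Mediapartners-Google (AdSense) has an empty Disallow, so it may fetch everything.
print(parser.can_fetch("Mediapartners-Google", "https://example.blogspot.com/search/label/seo"))  # True

If the checks print what the comments say, the file does what the explanation above describes: only pages under /search are off-limits, and only for crawlers other than the AdSense bot.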