Hi
I was investigating how to reduce 302 and 404 errors in Magento. If we add the following to the robots.txt file, will it be good for SEO? Is there anything else that we should include? Please update me.
User-agent: *
Disallow: /*?
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /wishlist/
Disallow: /admin/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /onestepcheckout/
Disallow: /customer/
Disallow: /review/product/
Disallow: /sendfriend/
Disallow: /enable-cookies/
Disallow: /LICENSE.txt
Disallow: /LICENSE.html
Disallow: /skin/
Disallow: /js/
Disallow: /directory/
Put the sitemap link at the top, like this:
Sitemap: http://www.yourdomain.com/sitemapurl.xml
Disallow: /*? - This might cause trouble because it blocks every URL that contains a query string. Test it against the URLs in your sitemap, and if it blocks any important resource, please remove it. The rest looks good.
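One way to test this before going live is a small script. This is only a rough sketch: it assumes Google-style matching ('*' matches any run of characters, a trailing '$' anchors the end of the URL), it ignores Allow rules and longest-match precedence, and the sample paths are made up for illustration.

```python
import re

def rule_to_regex(rule):
    """Translate a robots.txt path rule into a regex.
    Assumes '*' = any characters and a trailing '$' = end of URL."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_blocked(path, disallow_rules):
    """True if any Disallow rule matches the path (Allow rules are not modelled)."""
    return any(rule_to_regex(rule).match(path) for rule in disallow_rules)

# A few of the rules from the question.
rules = ["/*?", "/checkout/", "/catalogsearch/"]

# Hypothetical paths; replace them with real URLs from your sitemap.
sample_paths = [
    "/shoes.html",
    "/shoes.html?p=2",
    "/checkout/cart/",
]

for path in sample_paths:
    print(path, "->", "blocked" if is_blocked(path, rules) else "allowed")
```

If any path you want indexed comes back as blocked, that is a sign the /*? rule is too broad for your store.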
Hi all, I got it.
If you have a multi-country, multi-language website, then it looks like this:
# Website Sitemaps
Sitemap: http://www.yourdomain.com/sitemap.xml
Sitemap: http://www.yourdomain.com/br/pt/sitemap.xml
Sitemap: http://www.yourdomain.com/de/de/sitemap.xml
Sitemap: http://www.yourdomain.com/au/en/sitemap.xml
Sitemap: http://www.yourdomain.com/us/en/sitemap.xml
User-agent: Googlebot-Image
Allow: /media/catalog/product/
# Crawlers Setup
User-agent: *
# Ask crawlers to wait 10 seconds between requests (Googlebot ignores Crawl-delay)
Crawl-delay: 10
## Don't crawl development files and folders
Disallow: /CVS
Disallow: /.svn
Disallow: /*.svn$
Disallow: /*.idea$
Disallow: /*.sql$
Disallow: /*.tgz$
# Now come the rules: keep robots from crawling the following directories:
User-agent: *
Disallow: /admin/
Disallow: /app/
Disallow: /downloader/
Disallow: /errors/
Disallow: /cgi-bin/
Disallow: /magento/
Disallow: /includes/
Disallow: /404/
Disallow: /js/
Disallow: /lib/
Disallow: /pkginfo/
Disallow: /shell/
Disallow: /skin/
Disallow: /var/
Disallow: /report/
Disallow: /scripts/
Disallow: /stats/
Disallow: /media/captcha/
Disallow: /media/customer/
Disallow: /media/dhl/
Disallow: /media/downloadable/
Disallow: /media/import/
Disallow: /media/pdf/
Disallow: /media/sales/
Disallow: /media/tmp/
Disallow: /media/wysiwyg/
Disallow: /media/xmlconnect/
# Paths (clean URLs)
User-agent: *
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/product/gallery/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /onestepcheckout/
Disallow: /tag/
Disallow: /poll/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customer/account/
Disallow: /customer/account/login/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /wishlist/
# Files
User-agent: *
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /api.php
Disallow: /get.php
Disallow: /mage
Disallow: /RELEASE_NOTES.txt
# Paths (no clean URLs)
User-agent: *
Disallow: /*.php$
Disallow: /*?p=*&
Disallow: /*?SID=
Disallow: /*?dir*
Disallow: /*?dir=desc
Disallow: /*?dir=asc
Disallow: /*?limit=all
Disallow: /*?mode*
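A file this long is easy to get slightly wrong when it is assembled by hand, so a quick lint pass can help. The sketch below is only an illustration: the function name lint_robots and the sample text are made up, it handles one User-agent line per group, and it checks just two things (repeated rules and paths that do not start with '/' or '*').

```python
def lint_robots(text):
    """Return warnings for duplicate Allow/Disallow rules and for
    paths that do not start with '/' or '*'."""
    warnings = []
    seen = {}      # (agent, field, path) -> line number of first occurrence
    agent = None
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()   # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            agent = value.lower()
        elif field in ("allow", "disallow"):
            if value and value[0] not in "/*":
                warnings.append(f"line {number}: path '{value}' should start with '/'")
            key = (agent, field, value)
            if key in seen:
                warnings.append(f"line {number}: duplicate of line {seen[key]}")
            else:
                seen[key] = number
    return warnings

# Made-up sample containing one of each problem.
sample = """\
User-agent: *
Disallow: .svn
Disallow: /pkginfo/
Disallow: /shell/
Disallow: /pkginfo/
"""

for warning in lint_robots(sample):
    print(warning)
```

To check a real file, read it with open("robots.txt").read() and pass the text to lint_robots.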
Hi @dhmagento
It's better to remove the Disallow directive for /js/, given the recent Google notifications about blocked JavaScript and CSS: http://searchengineland.com/google-search-console-warnings-issued-for-blocking-javascript-css-226227
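To see which asset URLs a given set of rules would block, something like the following could be used. It is a minimal sketch using Python's standard urllib.robotparser, which only does plain prefix matching (no '*' or '$' wildcards the way Google interprets them), so it suits simple directory rules such as /js/ and /skin/. The asset URLs are example paths, not taken from a real store.

```python
from urllib.robotparser import RobotFileParser

# A cut-down version of the rules posted above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /js/
Disallow: /skin/
Disallow: /checkout/
"""

def blocked_assets(robots_txt, urls, agent="Googlebot"):
    """Return the URLs that the given robots.txt text disallows for the agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]

# Example asset URLs (assumed paths; substitute files your pages really load).
asset_urls = [
    "http://www.yourdomain.com/js/prototype/prototype.js",
    "http://www.yourdomain.com/skin/frontend/default/default/css/styles.css",
    "http://www.yourdomain.com/media/catalog/product/sample.jpg",
]

for url in blocked_assets(ROBOTS_TXT, asset_urls):
    print("blocked:", url)
```

Any script or stylesheet that shows up as blocked is a candidate for removing (or narrowing) the matching Disallow line.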
Hey,
We put together an SEO-focused robots.txt file for Magento ourselves. It's available to copy and paste from GitHub at the following address:
https://github.com/Creare/magento-robots/blob/master/robots.txt
Is there a specific issue you're trying to solve, or are you just looking to launch a new site and ensure that pages that aren't supposed to get cached don't get cached?
James