What is the meaning, in detail, of each of these entries in robots.txt?
User-agent: *
Disallow: /index.php/
Disallow: /*?
Disallow: /checkout/
Disallow: /app/
Disallow: /lib/
Disallow: /*.php$
Disallow: /pkginfo/
Disallow: /report/
Disallow: /var/
Disallow: /catalog/
Disallow: /customer/
Disallow: /sendfriend/
Disallow: /review/
Disallow: /*SID=
Sorry, I am not sure what you know about robots.txt, so not sure what level you are asking at. Put simply, it is trying to guide search engines about what to index and what not to index on a site. For example, don't index checkout pages. Are all entries there required? I suspect not - there may be a bit of noise in the file. For example, /app is a directory - you should never see it in a URL.
Where did you find this file? I cannot find it shipped with M2, so I assume it is from some project you are working on (not an M2-specific question)?
Hi @kinan11, did I answer your question, or do you have any additional information to provide? If it's not a file shipped with Magento 2, I cannot really explain why it contains what it does (other than educated guesses).
From the Magento 2 admin:
Stores - Configuration
General - Design - Search Engine Robots
Under "Edit custom instruction of robots.txt File"
I click "Reset to Default" ("This action will delete your custom instructions and reset robots.txt file to system's default settings.")
Then in the box I see this:
User-agent: *
Disallow: /index.php/
Disallow: /*?
Disallow: /checkout/
Disallow: /app/
Disallow: /lib/
Disallow: /*.php$
Disallow: /pkginfo/
Disallow: /report/
Disallow: /var/
Disallow: /catalog/
Disallow: /customer/
Disallow: /sendfriend/
Disallow: /review/
Disallow: /*SID=
Well, that is definitely us then! I will poke around a little internally. I don't understand why all the entries are there myself. I think there might be some historical baggage in there. In particular, the /app and /lib paths don't look right to me. There is no harm, but it does not seem right.
Here is a first summary of the items however:
I am guessing some of the links might be in case someone posts a URL to a forum or similar. We don't want a search engine picking up that URL and recommending it. Better to tell the search engine "forget this URL completely". So it might be overly prescriptive for normal usage, but being safe.
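For the entries that contain special characters, it may help to see how the matching works. Under the robots exclusion standard (RFC 9309), a rule is compared against the URL path starting from its first character, `*` stands for any run of characters, and a trailing `$` means "the URL must end here". Below is a minimal sketch of that matching in Python. The function name `rule_matches` and the sample paths are made up for illustration, and this is not how any particular search engine implements it:

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path (with query string)."""
    # A trailing '$' anchors the pattern to the end of the URL.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters in each literal piece, then let '*' mean "any characters".
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match only matches at the start of the string, like a robots.txt rule.
    return re.match(regex, path) is not None


# Hypothetical paths, for illustration only.
print(rule_matches("/*?", "/shoes.html?color=red"))    # True: any URL with a query string
print(rule_matches("/*?", "/shoes.html"))              # False: no '?' in the URL
print(rule_matches("/*.php$", "/index.php"))           # True: URL ends in .php
print(rule_matches("/*.php$", "/index.php/admin"))     # False: does not end in .php
print(rule_matches("/*SID=", "/shoes.html?SID=abc"))   # True: session ID in the URL
print(rule_matches("/checkout/", "/checkout/cart/"))   # True: plain prefix match
```

So `/*?` covers any URL with a query string, `/*.php$` covers URLs ending in `.php`, `/*SID=` covers URLs carrying a session ID, and the plain entries such as `/checkout/` are simple prefixes.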
I have the same question.
Disallow: /catalog/ -- doesn't that mean bots are blocked from crawling any URL that has /catalog/ in it, from that folder on? So, "wwwexamplesite.com/pub/media/catalog/product/cache/small_image/140x140/ex23le4443am/awesome-product.html" would block access to "/catalog/product/cache/small_image/140x140/ex23le4443am/awesome-product.html" - correct?
Few of our images are indexed by Google, and I'm assuming this "default" robots.txt is the reason why. Most of our images use that basic path and exist in the catalog folder.
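On the /catalog/ question: as far as I understand the standard, a Disallow path is matched from the start of the URL path, not anywhere inside it, so `Disallow: /catalog/` should not apply to a URL that starts with `/pub/media/catalog/`. One way to check is Python's standard-library `urllib.robotparser`. The sketch below uses only that one rule, and the host name and paths are hypothetical stand-ins. Note that this parser does plain prefix matching and, as far as I know, does not interpret the `*` and `$` wildcards, so it is only suitable for checking the plain entries:

```python
from urllib.robotparser import RobotFileParser

# Only the rule under discussion; the wildcard entries are left out on purpose.
rules = """\
User-agent: *
Disallow: /catalog/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

base = "https://www.example.com"  # hypothetical host
image_url = base + "/pub/media/catalog/product/cache/small_image/140x140/awesome-product.jpg"
page_url = base + "/catalog/product/view/id/42"

# The image path starts with /pub/, so the /catalog/ rule does not apply to it.
print(parser.can_fetch("Googlebot", image_url))  # True
# This path starts with /catalog/, so it is disallowed.
print(parser.can_fetch("Googlebot", page_url))   # False
```

If that holds for your URLs, this rule on its own would not explain images under /pub/media/catalog/ being missing from the index.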