Hi all,
I’ve noticed Googlebot and Bingbot requesting some unusually over-encoded URLs on our Magento site. These URLs contain multiple levels of URL encoding (e.g., %252525…) and appear under the /catalogsearch/result/index/… path. I did not create or link to such URLs myself.
Here are two examples from our server logs:
https://www.example.com/catalogsearch/result/index/%2525252525252525E2%252525252525252580%25252525252525259C/yfk-1234-6-b.html%E2%80%9D/.../?cat=1234
https://www.example.com/catalogsearch/result/index/lp/kanto-sr/%2525E2%252580%25259C/yfk-1679-6-b.html%E2%80%9D/?cat=2442
User-agent: Googlebot, Bingbot
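To see how many encoding layers a segment carries, a small Python sketch like the one below can help. It is only an illustration: the function name decode_layers and the max_rounds cap are hypothetical choices of mine, not anything from Magento or the crawlers. It percent-decodes the string repeatedly until it stops changing and reports how many rounds that took.

```python
from urllib.parse import unquote


def decode_layers(segment, max_rounds=20):
    """Percent-decode repeatedly until the string stops changing.

    Returns (fully_decoded_string, rounds_that_changed_it).
    max_rounds is an arbitrary safety cap (an assumption, not a standard).
    """
    current = segment
    rounds = 0
    while rounds < max_rounds:
        decoded = unquote(current)
        if decoded == current:
            break  # nothing left to decode
        current = decoded
        rounds += 1
    return current, rounds


if __name__ == "__main__":
    # Segment taken from the second log line above.
    text, rounds = decode_layers("%2525E2%252580%25259C")
    print(repr(text), rounds)
```

For the segment from the second log line, this takes three rounds and ends at U+201C, the left curly double quotation mark. The trailing %E2%80%9D in both URLs is the matching right curly quote, U+201D.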
I’d like to understand:
1. Why Googlebot and Bingbot might attempt to crawl these over-encoded URLs.
2. Whether this could be caused by malformed internal links or by external websites linking incorrectly.
3. The best way to handle them: should we block them under the catalogsearch path, or return a 404/410 response?
I’m asking here to gather insights from the community, but I also need to investigate the root cause and prepare a detailed report for my manager.
Any guidance or similar experiences would be greatly appreciated.
Thank you very much!