I am experiencing a strange problem on my website. The number of uses recorded for certain search terms keeps climbing into the thousands. I can't tell whether it's a bot, because the terms being searched are genuinely common search terms. The affected term changes over time, and then the count for a different term starts to climb. I have already tried deleting all search terms, without result.
I suppose it could be a bad bot, but how can I confirm this? How do I find out where these searches are coming from? Any guidance is appreciated!
It could be that bots/scrapers are finding the popular search terms page in Magento, e.g. http://magento-demo.lexiconn.com/catalogsearch/term/popular/.
That would also explain why it comes back even after deleting all search terms.
Do you have a link to this page anywhere on the site that you know of? If you want to make sure it doesn't exist you could override the template and remove the output as a quick solution.
Indeed, it could be a bot that finds my popular terms, but it keeps coming back with old terms that no longer exist at all. I deleted all of them and they keep showing up again, even when no living person is making any searches. My site is new and has barely 5 users per day, so I am sure that, at least immediately after I delete all the terms, none of my actual users are searching for them again and again. Is there any log in Magento that shows which IPs my visits are coming from?
EDIT: I checked Google Search Console today and it seems that my indexed pages have increased over the past two days. However, I have a Disallow for /catalogsearch/ in my robots.txt, so I assume Google should not be following these terms. Could Google somehow be following these terms from somewhere else?
EDIT2: I have found this in my host's access log. This crawler is apparently ignoring my robots.txt. Should I block this IP from my website?
- - [03/Jan/2018:21:25:03 -0200] "GET /catalogsearch/result/index?cat=https%3A%2F%2Fwww.*****.com.br%2Faudio-video%3F___SID%3DU&dir=asc&mode=list&order=relevance&p=5&q=caia+de+som HTTP/1.1" 200 157843 "-" "DomainCrawler/3.0 (info@domaincrawler.com; http://www.domaincrawler.com/*****.com.br)"
- - [03/Jan/2018:21:25:07 -0200] "GET /catalogsearch/result/index?cat=https%3A%2F%2Fwww.*****.com.br%2Faudio-video%3F___SID%3DU&dir=desc&mode=list&order=relevance&q=caia+de+som HTTP/1.1" 200 156641 "-" "DomainCrawler/3.0 (info@domaincrawler.com; http://www.domaincrawler.com/****.com.br)"
- - [03/Jan/2018:21:25:09 -0200] "GET /catalogsearch/result/index?cat=https%3A%2F%2Fwww.*****.com.br%2Faudio-video%3F___SID%3DU&dir=asc&mode=grid&order=relevance&q=caia+de+som HTTP/1.1" 200 159727 "-" "DomainCrawler/3.0 (info@domaincrawler.com; http://www.domaincrawler.com/****.com.br)"
Below is my robots.txt:
User-agent: *
Allow: /
#Disallow: /
############### SITEMAP ###############
Sitemap: https://www.******.com.br/sitemap/sitemap.xml
################ PAGES ################
Disallow: /privacy-policy-cookie-restriction-mode/
Disallow: /terms/
##### Directories #####
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /includes/
Disallow: /lib/
Disallow: /magento/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /stats/
Disallow: /var/
##### Paths (clean URLs) #####
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalog/product/gallery/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/
##### Files #####
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt
######## QUERY STRING BLOCKER #########
#Uncomment if your site is a brand new un-cached site.
#Disallow: /*?*
##### Uncomment if using Wordpress in subdirectory #####
#Disallow: /blog/wp-content/upgrade/
#Disallow: /blog/wp-admin/
#Disallow: /blog/wp-includes/
########### SCREAMING FROG ############
User-agent: Screaming Frog SEO Spider
Allow: /
Disallow: /*.gif$
Disallow: /*.jpg$
Disallow: /*.png$
Disallow: /*.bmp$
Disallow: /*.xml$
Disallow: /*.css$
Disallow: /*.js$
The crawler does appear to be ignoring your request to stay out of catalogsearch URLs. To double-check that your robots.txt is set up correctly, you could go into Google Search Console and paste a catalogsearch URL into the Fetch as Google tester to see whether Google knows it shouldn't access those URLs.
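As a local sanity check alongside Search Console, the rules can also be run through a parser. The sketch below uses Python's standard urllib.robotparser on a shortened copy of the robots.txt from the question; the example.com URL and the trimmed rule set are stand-ins, not the real site. Keep in mind that this particular parser applies the first matching rule in file order, whereas Google uses the most specific match, so it only illustrates how a simple first-match crawler might read the file. Whether DomainCrawler actually behaves this way is an assumption on my part.

```python
from urllib.robotparser import RobotFileParser

# Shortened stand-in for the robots.txt in the question.
ROBOTS_WITH_ALLOW = """\
User-agent: *
Allow: /
Disallow: /catalogsearch/
Disallow: /checkout/
"""

# Same rules without the leading "Allow: /" line.
ROBOTS_WITHOUT_ALLOW = """\
User-agent: *
Disallow: /catalogsearch/
Disallow: /checkout/
"""

# Hypothetical URL; example.com stands in for the real shop domain.
SEARCH_URL = "https://www.example.com/catalogsearch/result/index?q=caia+de+som"


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if this first-match parser would let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for label, text in (("with Allow: /", ROBOTS_WITH_ALLOW),
                        ("without Allow: /", ROBOTS_WITHOUT_ALLOW)):
        print(label, "->", is_allowed(text, "DomainCrawler", SEARCH_URL))
```

With this parser the search URL counts as allowed while "Allow: /" sits above the Disallow lines, and as blocked once that line is removed. If a crawler reads the file the same way, dropping the redundant "Allow: /" would be a cheap thing to try.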
It is at least an honest crawler (http://www.domaincrawler.com/), so maybe there's a configuration problem. If you don't want them to crawl your site, though, you can block the user agent or the IP, since that IP appears to be dedicated to Internet Vikings, the company behind DomainCrawler. They may use more than one IP, though.
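To see which clients are generating the searches before blocking anything, the host access log can be tallied by IP and user agent. This is a minimal sketch, assuming the log is in the common Apache/nginx "combined" format; the file name access.log, the function name count_search_hits, and the sample lines (with documentation-only IP addresses) are all made up for illustration.

```python
import re
from collections import Counter

# Combined log format: ip ident user [time] "request" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def count_search_hits(lines, prefix="/catalogsearch/"):
    """Count requests under prefix, grouped by (client IP, user agent)."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("path").startswith(prefix):
            hits[(match.group("ip"), match.group("agent"))] += 1
    return hits


# Made-up sample lines; 203.0.113.7 and 198.51.100.20 are placeholder addresses.
SAMPLE = [
    '203.0.113.7 - - [03/Jan/2018:21:25:03 -0200] "GET /catalogsearch/result/index?q=caia+de+som HTTP/1.1" 200 157843 "-" "DomainCrawler/3.0 (info@domaincrawler.com)"',
    '203.0.113.7 - - [03/Jan/2018:21:25:07 -0200] "GET /catalogsearch/result/index?dir=desc&q=caia+de+som HTTP/1.1" 200 156641 "-" "DomainCrawler/3.0 (info@domaincrawler.com)"',
    '198.51.100.20 - - [03/Jan/2018:21:26:00 -0200] "GET /audio-video HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    # With a real log: with open("access.log") as fh: hits = count_search_hits(fh)
    for (ip, agent), n in count_search_hits(SAMPLE).most_common(10):
        print(n, ip, agent)
```

The pairs at the top of the list are the candidates for a user-agent or IP block; grouping by both makes it easier to spot a crawler that uses several addresses.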