Our Magento site has around 3,500 pages in its sitemap, but Google Webmaster Tools is now showing nearly 100K pages in its 'Index Status > Total indexed' count.
Hi, thanks for responding.
No, we're not using any SEO or Extended Layered Navigation plugins that I'm aware of. We are now using Solr for the site search, but I'm not sure if that's relevant.
This is an example of a URL which Google is crawling and indexing:
As you can see, it's a search query with a lot of apparently nonsense search terms.
To try and stop Google indexing these pages, since around three months ago, we've set robots.txt to block Googlebot from the pages in these folders:
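The original folder list isn't shown here, but for illustration, a robots.txt blocking the standard Magento 1 search paths (these paths are an assumption; your install may use different ones) would look something like this:

```text
User-agent: Googlebot
Disallow: /catalogsearch/result/
Disallow: /catalogsearch/advanced/
```

Note that robots.txt only stops Googlebot from crawling these URLs; pages it has already indexed can remain in the index.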
Just to add to my last post: is it possible to completely switch off Magento's underlying search functions, which Googlebot seems to be accessing?
We are using Solr for our customer site search, so the standard Magento search no longer serves any practical purpose.
Is it possible to simply turn off the default Magento search so that it no longer serves pages when Googlebot requests search URLs like
In fact, anything in these folders it would be much better for Googlebot to find nothing at all:
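One possible way to achieve that, assuming Magento 1 and that no installed extension depends on the built-in search, is to disable the Mage_CatalogSearch module with a custom module declaration file, e.g. app/etc/modules/ZZ_Disable_CatalogSearch.xml (the filename here is just an example). Requests to the search controllers then return a 404 instead of a results page:

```xml
<?xml version="1.0"?>
<!-- Example sketch: deactivates the core CatalogSearch module.
     Test on a staging copy first - other code may depend on it. -->
<config>
    <modules>
        <Mage_CatalogSearch>
            <active>false</active>
        </Mage_CatalogSearch>
    </modules>
</config>
```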
I'm really desperate to find a solution here. Any help very gratefully received.
All the best, Alex
How many rows do you have in your core_url_rewrite table?
Do you have any products that have the same name as one another?
Could you tell me how to view the core_url_rewrite table, please? It's not something I'm familiar with.
Thanks for your help.
It is a table in the database.
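If you have database access (phpMyAdmin or the MySQL command line), you can check its size with a query like this (assuming the default Magento table name with no prefix; yours may differ):

```sql
-- Total rows in the URL rewrite table
SELECT COUNT(*) FROM core_url_rewrite;

-- Rough breakdown: system-generated vs. custom rewrites
SELECT is_system, COUNT(*) AS cnt
FROM core_url_rewrite
GROUP BY is_system;
```

A row count far above your product/category count would support the duplicate-name theory.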
I was just wondering if you had multiple products with the same name. That would cause this table to fill up faster, and the URLs it creates may have reached Google. It's a bit of a guess, though, as I can't see any URLs I wouldn't expect to be in Google.
The number of pages Google has indexed for us is on a grand scale: in Webmaster Tools the total index count has suddenly gone up to 100K.
Are there any other folders which Magento generates that could hold search results pages, apart from the ones I'm already aware of:
I did post about this problem on Friday, but the situation has become even more desperate since then.
Does anyone have any ideas about what might be happening here? Google's number of pages indexed for our site has now jumped almost another 100k up to 184,000.
We don't know what these extra pages are; we should only have about 5,000 indexed at the most. We thought they were the site's internal search results pages, but those all have 'noindex' tags in them, and we are also blocking them via robots.txt, so Googlebot shouldn't even reach them in the first place.
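For reference, the noindex tag on those pages is the standard robots meta tag, along the lines of:

```html
<meta name="robots" content="noindex, nofollow">
```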
Google is reporting that our robots.txt file is blocking many millions of pages.
Has anyone got any ideas? We're tearing our hair out.
All the best,