
Which rule in our robots.txt is blocking the colour filter?

I just want to ask which rule in our robots.txt file is blocking the colour filter, e.g. .html?color=2694.

Our website was hacked, but there is still one page with the malicious content on it: this specific colour-filter page, .html?color=2694.

I can't ask Google to recrawl this page because it is blocked by the robots.txt. My plan is to unblock it and let Google recrawl it, but I don't know which line in the robots.txt is blocking the colour filter.

I hope someone can help me out. Below is our robots.txt:

User-agent: *
Disallow:

Disallow: /lib/
Disallow: /*.php$
Disallow: /pkginfo/
Disallow: /report/
Disallow: /var/
Disallow: /catalog/
Disallow: /customer/
Disallow: /sendfriend/
Disallow: /review/
Disallow: /*SID=
Disallow: /*?

# Disable checkout & customer account
Disallow: /checkout/
Disallow: /onestepcheckout/
Disallow: /customer/
Disallow: /customer/account/
Disallow: /customer/account/login/

# Disable Search pages
Disallow: /catalogsearch/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/

# Disable common folders
Disallow: /app/
Disallow: /bin/
Disallow: /dev/
Disallow: /lib/
Disallow: /phpserver/
Disallow: /pub/
Allow: /pub/media/catalog/product

# Disable Tag & Review (Avoid duplicate content)
Disallow: /tag/
Disallow: /review/

# Common files
Disallow: /composer.json
Disallow: /composer.lock
Disallow: /CONTRIBUTING.md
Disallow: /CONTRIBUTOR_LICENSE_AGREEMENT.html
Disallow: /COPYING.txt
Disallow: /Gruntfile.js
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /nginx.conf.sample
Disallow: /package.json
Disallow: /php.ini.sample
Disallow: /RELEASE_NOTES.txt

# Disable sorting (Avoid duplicate content)
Disallow: /*?*product_list_mode=
Disallow: /*?*product_list_order=
Disallow: /*?*product_list_limit=
Disallow: /*?*product_list_dir=

# Disable version control folders and others
Disallow: /*.git
Disallow: /*.CVS
Disallow: /*.Zip$
Disallow: /*.Svn$
Disallow: /*.Idea$
Disallow: /*.Sql$
Disallow: /*.Tgz$


Re: Which rule in our robots.txt is blocking the colour filter?

Hi @jameskahonc68f ,

 

Disallow: /*?

is the rule that is blocking the query string.

In robots.txt patterns, * is a wildcard and ? is treated as a literal question mark, so /*? matches any path that contains a ? (i.e. any URL with query parameters), including your colour-filter URL.

It's not recommended to remove the Disallow: /*? rule, since it is also keeping all your other query-string URLs (session IDs, sorting parameters, etc.) out of the index.

Instead, you can add Allow: /*.html?color=, which is a more specific rule. Google applies the matching rule with the longest path pattern, so this longer Allow overrides the shorter Disallow: /*? for the colour-filter URLs and lets Googlebot recrawl the page.
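If you want to sanity-check which rule wins before touching the live file, here is a minimal Python sketch of the longest-match precedence described in Google's robots.txt documentation (longest matching pattern wins; Allow wins a tie). The function names and the sample path /shoes.html?color=2694 are my own illustration, not from your site; note that Python's built-in urllib.robotparser does not handle * wildcards, which is why this is done by hand:

```python
import re

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern to an anchored regex.
    '*' matches any sequence of characters; a trailing '$' anchors
    the end of the URL; everything else (including '?') is literal."""
    regex = "^"
    for i, ch in enumerate(pattern):
        if ch == "*":
            regex += ".*"
        elif ch == "$" and i == len(pattern) - 1:
            regex += "$"
        else:
            regex += re.escape(ch)
    return re.compile(regex)

def is_allowed(path, allows, disallows):
    """Pick the matching rule with the longest pattern; on a tie,
    Allow wins. No matching rule at all means the path is allowed."""
    best_len, allowed = -1, True
    for rules, verdict in ((allows, True), (disallows, False)):
        for pat in rules:
            if pat and pattern_to_regex(pat).match(path):
                if len(pat) > best_len or (len(pat) == best_len and verdict):
                    best_len, allowed = len(pat), verdict
    return allowed

# Disallow: /*? alone blocks the colour-filter URL...
print(is_allowed("/shoes.html?color=2694", [], ["/*?"]))                  # False
# ...but the longer Allow: /*.html?color= overrides it.
print(is_allowed("/shoes.html?color=2694", ["/*.html?color="], ["/*?"]))  # True
```

This shows why adding the Allow line works without removing the broad Disallow: /*? rule: the 14-character Allow pattern outranks the 3-character Disallow pattern for the colour-filter URLs only.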

Problem Solved? Accept as Solution!

 

Thanks,

Ankit

Ankit Jasani