Report viewed products does not seem to work correctly

The database table "report_viewed_product_index" is updated each time a product is viewed on the front-end. I have noticed many problems regarding the way this table is updated:

 

  1. The file "vendor/magento/module-customer/etc/di.xml" contains a section listing the User Agents of bots whose product views are supposed to be ignored. However, this does not actually happen. To test it, I added an entry for my own User Agent to that section, recompiled the code, and regenerated the static files. When I then visited a product page with that User Agent, an entry was still created in the table "report_viewed_product_index".
  2. The user agents to be ignored are matched as complete strings, exactly as listed in the aforementioned di.xml file. By default, the di.xml file contains 3 user agents associated with GoogleBot. However, the GoogleBot user agent contains the string "Chrome/W.X.Y.Z", which changes from time to time with the version of the Chrome browser used by that user agent, e.g. "Chrome/41.0.2272.96". As a result, the Apache or Nginx log files have to be checked regularly to collect the new user agents associated with GoogleBot. The same applies to BingBot, and possibly to other bots. A partial (substring) match would be better, for example matching any user agent that includes the string "Googlebot/2.1".
  3. It is not clear how the visitor_id and customer_id fields of the table "report_viewed_product_index" are populated. When a logged-in customer views a product page, the customer_id field gets a value while the visitor_id field is NULL. When the customer logs out and views another product page, the visitor_id field gets a value while the customer_id field is NULL. However, when the visitor has never logged in while browsing the store, both the visitor_id and customer_id fields are NULL.
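
To illustrate the partial match suggested in point 2, here is a minimal sketch in Python. It is not Magento code and does not reflect how Magento reads di.xml. The function name, the token list, and the sample user agent strings are all assumptions made for illustration. The idea is simply to ignore a request when a stable bot token appears anywhere in the user agent, so that a changing "Chrome/W.X.Y.Z" part no longer matters.

```python
# Hypothetical tokens identifying bots; these are NOT the entries shipped in Magento's di.xml.
BOT_TOKENS = ("Googlebot/2.1", "bingbot/2.0")


def is_ignored_user_agent(user_agent, tokens=BOT_TOKENS):
    """Return True if any bot token occurs anywhere in the user agent (case-insensitive)."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in tokens)


# Sample user agents for illustration only.
GOOGLEBOT_CHROME_41 = (
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 "
    "(compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)
# Same bot with a different (made-up) Chrome version: an exact-string list would miss it.
GOOGLEBOT_OTHER_CHROME = GOOGLEBOT_CHROME_41.replace("Chrome/41.0.2272.96", "Chrome/99.0.0.0")
REGULAR_BROWSER = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.0.0 Safari/537.36"
)

for ua in (GOOGLEBOT_CHROME_41, GOOGLEBOT_OTHER_CHROME, REGULAR_BROWSER):
    print(is_ignored_user_agent(ua), ua[-60:])
```

With this approach both GoogleBot variants are ignored without touching the list when the Chrome version changes, while the regular browser is still recorded.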

Due to the above problems, the table "report_viewed_product_index":

  • includes data coming from both real visitors and bots, and there is no way to tell the bot data apart, not even to delete the relevant records afterwards. As a result, the statistics are polluted by bots.
  • can become really huge within a short time, depending on the number of products.
  • generates slow queries in the database once it grows large, as it is used in queries via INNER JOINs. A slow query causes a general performance issue on the database, as the involved tables stay open longer while waiting for the query to finish.
  • is updated on every product page view, generating unnecessary workload on the database, given that a bot can crawl thousands of pages within a day.