I import products from a csv with a import script.
Currently I'm skipping the re-indexing process for catalog_url because they take too long.
But for Google indexing reasons I now want to create these url rewrites ONLY for products t
hat are new and whose rewrites don't exist yet. Thats where I'm stuck!
how can i start / what should i look for to solve this?
What version of Magento are you using, how many products do you have? And also; you say ;too long' - how long does the URL generation take?
Andrew
about 4k products with magento 1.7.0.2
and the url generation takes about 12 minutes.
Okay,
And based on the 4000 products - how many URL rewrites do you have in you core_url_rewrite table in total?
And then ALSO.. how many of these URL rewrites are not system ones?? e.g
SELECT count(url_rewrite_id) from core_url_rewrite WHERE is_system = 0
Where I'm going with this.. is I think instead of coding around to only add the indexes for 'new products'.. we try and find out why you have so many and why it takes so long. I have a feeling that you're going to have a lot of custom redirects - this occurs when one or more products have the same url-key and magento then adds a new key for each reindex and has to calculate what number to add on the end of the url - which is one of the more expensive parts of the rewrite generation.
After i deleted and reimpoted all the prducts the reindexing somehow was way faster... but i noticed that after running the import script the cout (output of the query you sent) gets higher and higher each time and is alredy at 20k and it takes 1minute to reindex.
Unfortunately i cant access the database directly to take a look inside the tables
The key here will be to make sure that none of your url-keys are getting duplicated... the more URL duplicates you have - the longer it will take.
You will ideally need access to your tables to diagnose this properly.
i now have access to the database directly, what exactly should i look out for? i searched the table for duplicates but i didn't find any.
As neil said; If you run the following query on your table
SELECT count(url_rewrite_id) from core_url_rewrite WHERE is_system = 0
And report back how many items it says...
I checked it and it returns: 46528
after i ran the import script again i re-checked and it then returned: 49436
EDIT: it alredy took 1:41 min to complete...
Okay
I think its likely that you hae duplicate URL keys in your product data that is causing this high number of unesscary rewrites being added to the table.
If you run the below query.. this will output 2 columns.. the first column is how many rewrites (urls) exist / have existed for a product.. and the 2nd column will tell you the Product ID. You ideally should edit the individual products in the query and ensure they have unique url keys.
SELECT count(url_rewrite_id), product_id FROM core_url_rewrite WHERE is_system = 0 AND isnull(category_id) GROUP BY product_id ORDER BY count(url_rewrite_id) DESC
Once you've done that.. you can run the following to clear up the rewrites.. after you've ran this the rewrite index should operate a lot faster.
DELETE FROM core_url_rewrite WHERE is_system = 0
Ensure you reindex after; and ensure you take a database backup before running it just in-case anything goes wrong.