cancel
Showing results for 
Search instead for 
Did you mean: 

Preg_replace should be the last resort as should all preg_* functions

Preg_replace should be the last resort as should all preg_* functions

I have only recently started seriously working with Magento and one issue I find appalling is Magento performance.  The common excuse is "Magento is a heavy platform"  but in truth, it seems more to be either ignorance or laziness that is the culprit.

 

A stunning example of this is preg_replace.   Every programmer should know that operations that involve regular expressions are expensive and should therefore be kept to a minimum.  There are times when they are mandatory and the only solution.  There are also times when they are the most economical solution.   If you use them because they are the most economical, then you mark it in your code as a business decision and you leave a some sort of todo/fixme marker to go back later and fix it.

 

I first discovered this issue because a performance extension was adding a full second to every initial page generation - it uses preg_replace to scan the code and modify all the script/css tags.    I will grudgingly grant that this is an acceptable reason to use preg_replace.   However, even here the extension had room for improvement.

Never assume the generated content is html.   Due to the way the htaccess file is setup, index.php can be invoked for missing files, and in addition it could be json or xml being generated for some api call.  Yet this extension blindly executes preg_replace for all executions.  Wrap that function call with something like:

$tag = '<script';
$body = &$response->getBody();
$tagExists = strpos($mystring, $findme);

if ($tagExists !== false) {
// FIXME: replace this stupidly expensive preg_replace call with something better

// stupidly expensive preg_replace call
}

For those ready to yell "gotcha!" about assigning $body by reference - and hey, in response to a cranky old farts complaint gotcha is perfectly valid - my experience is that you cannot assume that an api will not change.  As I write this - in the 1.9.x code I am looking at - getBody() will return a string or an array.  In the future it may return an object.  And then it could go back to an array or a string.   Being explicit cannot hurt - it can only generate strict or deprecated messages which should be ignored in production.

 

PHP does not clean up after itself by default!   Don't make extra copies in memory of huge string data if you don't need them.  

 

 

4 REPLIES 4

Re: Preg_replace should be the last resort as should all preg_* functions

As a further addendum, and which really made me blow my top, was never use preg_replace for something simple.

 

Once I saw the extra second being added for silly little 404 errors,  My gut told me that it was a preg_* function call.  I ran a search on the entire code base for preg_ and the enumeration was incredible.  Limiting it to just preg_replace and it became a bit more manageable while making me more concerned.  I found this little nugget in a free extension - making it justified as you get what you pay for.  However, I found almost similar passages in commercial extensions so be buyer beware:

$desc = preg_replace("/[\n\r\t]{1,}/", '', $desc);

What does it do?  It strips some formatting characters from a string.  It does this using the most expensive function call in php.  And there is an extremely cheap alternative.

$controlCharArray = array(’\n’, ’\r’,‘\t’);
$cleanCharArray = array('','','');
$desc = str_replace($controlCharArray, $cleanCharArray, $desc);

This is not rocket science.   It is even well documented!  "If you don't need fancy replacing rules (like regular expressions), you should always use this function instead of preg_replace()"

 

Never go with "well I have to use preg_* for X anyway, so I might as well use it for everything"  - minimize your usage.

 

Re: Preg_replace should be the last resort as should all preg_* functions

My last rant on this topic...for now.

 

Every time you call a regular expression function in php, PHP will make a copy of the data it is executing on in memory and it will re-initialize the regular expression engine.   Because of this, you want to minimize these calls - which PHP allows you to do quite easily because you are allowed to pass strings OR arrays to these functions.  Instead of stupidly calling it multiple times, such as:

$result = preg_replace("/\n/", '', $result);
$result = preg_replace("/\r/", '', $result);
$result = preg_replace("/\t/", '', $result);

Make a single call:

$controlCharArray = array(’\n’, ’\r’,‘\t’);
$cleanCharArray = array('','','');
$result = preg_replace($controlCharArray, $cleanCharArray, $result);

The above code example is for demonstration purposes only!

 

If you go to my second post in this thread, there is a much better way to strip control charectors using str_replace.

 

This example comes from an actual snippet of code - I am not making up something that is ridiculously stupid to prove a point.  It is ridiculously stupid.  And it does prove my point.  It just is not something I made up as an example.

 

 

Re: Preg_replace should be the last resort as should all preg_* functions

There is a very tiny difference comparing to other heavy operations in Magento. Replacing all preg_replace calls with str_replace even won't improve avg load time of the page by a fraction of the percent. It is better to be concerned about redundant or heavy DB operations, Configuration Merging and Layout building processes. 

Re: Preg_replace should be the last resort as should all preg_* functions

Also using references (&) slowing down your operations on big arrays, as PHP by default using copy on write. Your sample is not going to make it faster or more memory efficient. As well you will need to call setBody() method anyway in the end. Better to avoid references, unless you really need them. 

$body = &$response->getBody();