Are php strings immutable?

One of the most essential speed optimization techniques in many languages is instance reuse. In that case the speed increase comes from at least 2 factors:

1. Less instantiations means less time spent on construction.

2. The less the amount of memory that the application uses, the less CPU cache misses there probably are.

For applications, where the speed is the #1 priority, there exists a truly tight bottleneck between the CPU and the RAM. One of the reasons for the bottleneck is the latency of the RAM.

The PHP, Ruby, Python, etc., are related to the cache-misses by a fact that even they store at least some (probably all) of the run-time data of the interpreted programs in the RAM.

String instantiation is one of the operations that is done pretty often, in relatively "huge quantities", and it may have a noticeable impact on speed.

Here's a run_test.bash of a measurement experiment:

#!/bin/bash

for i in `seq 1 200`;
do
        /usr/bin/time -p -a -o ./measuring_data.rb  php5 ./string_instantiation_speedtest.php
done

Here are the ./string_instantiation_speedtest.php and the measurement results:

<?php

// The comments on the
// next 2 lines show arithmetic mean of (user time + sys time) for 200 runs.
$b_instantiate=False; // 0.1624 seconds
$b_instantiate=True;  // 0.1676 seconds
// The time consumed by the reference version is about 97% of the
// time consumed by the instantiation version, but a thing to notice is
// that the loop contains at least 1, probably 2, possibly 4,
// string instantiations at the array_push line.
$ar=array();
$s='This is a string.';
$n=10000;
$s_1=NULL;

for($i=0;$i<$n;$i++) {
    if($b_instantiate) {
        $s_1=''.$s;
    } else {
        $s_1=&$s;
    }
    // The rand is for avoiding optimization at storage.
    array_push($ar,''.rand(0,9).$s_1);
} // for

echo($ar[rand(0,$n)]."\n");

?>

My conclusion from this experiment and one other experiment that I did with Ruby 1.8 is that it makes sense to pass string values around by reference.

One possible way to allow the "pass-strings-by-reference" to take place at the whole application scope is to consistently create a new string instance, whenever one needs to use a modified version of a string.

To increase locality, therefore speed, one may want to decrease the amount of memory that each of the operands consumes. The following experiment demonstrates the case for string concatenations:

<?php

// The comments on the
// next 2 lines show arithmetic mean of (user time + sys time) for 200 runs.
$b_suboptimal=False; // 0.0611 seconds
$b_suboptimal=True;  // 0.0785 seconds
// The time consumed by the optimal version is about 78% of the
// time consumed by the suboptimal version.
//
// The number of concatenations is the same and the resultant
// string is the same, but what differs is the "average" and maximum
// lengths  of the tokens that are used for assembling the $s_whole.
$n=1000;
$s_token="This is a string with a Linux line break.\n";
$s_whole='';

if($b_suboptimal) {
    for($i=0;$i<$n;$i++) {
        $s_whole=$s_whole.$s_token.$i;
    } // for
} else {
    $i_watershed=(int)round((($n*1.0)/2),0);
    $s_part_1='';
    $s_part_2='';
    for($i=0;$i<$i_watershed;$i++) {
        $s_part_1=$s_part_1.$i.$s_token;
    } // for
    for($i=$i_watershed;$i<$n;$i++) {
        $s_part_2=$s_part_2.$i.$s_token;
    } // for
    $s_whole=$s_part_1.$s_part_2;
} // else

// To circumvent possible optimization one actually "uses" the
// value of the $s_whole.
$file_handle=fopen('./it_might_have_been_a_served_HTML_page.txt','w');
fwrite($file_handle, $s_whole);
fclose($file_handle);

?>

For example, if one assembles HTML pages that contain considerable amount of text, then one might want to think about the order, how different parts of the generated HTML are concated together.

A BSD-licensed PHP implementation and Ruby implementation of the watershed string concatenation algorithm is available. The same algorithm can be (has been by me) generalized to speed up multiplication of arbitrary precision integers.


PHP already optimises it - variables are assigned using copy-on-write, and objects are passed by reference. In PHP 4 it doesn't, but nobody should be using PHP 4 for new code anyway.

Tags:

Php

String