Why is require_once so bad to use?

This thread makes me cringe, because there's already been a "solution posted", and it's, for all intents and purposes, wrong. Let's enumerate:

  1. Defines are really expensive in PHP. You can look it up or test it yourself, but the only efficient way of defining a global constant in PHP is via an extension. (Class constants are actually pretty decent performance wise, but this is a moot point, because of 2)

  2. If you are using require_once() appropriately, that is, for inclusion of classes, you don't even need a define; just check if class_exists('Classname'). If the file you are including contains code, i.e. you're using it in the procedural fashion, there is absolutely no reason that require_once() should be necessary for you; each time you include the file you presume to be making a subroutine call.

So for a while, a lot of people did use the class_exists() method for their inclusions. I don't like it because it's fugly, but they had good reason to: require_once() was pretty inefficient before some of the more recent versions of PHP. But that's been fixed, and it is my contention that the extra bytecode you'd have to compile for the conditional, and the extra method call, would by far overweigh any internal hashtable check.

Now, an admission: this stuff is tough to test for, because it accounts for so little of the execution time.

Here is the question you should be thinking about: includes, as a general rule, are expensive in PHP, because every time the interpreter hits one it has to switch back into parse mode, generate the opcodes, and then jump back. If you have a 100+ includes, this will definitely have a performance impact. The reason why using or not using require_once is such an important question is because it makes life difficult for opcode caches. An explanation for this can be found here, but what this boils down to is that:

  • If during parse time, you know exactly what include files you will need for the entire life of the request, require() those at the very beginning and the opcode cache will handle everything else for you.

  • If you are not running an opcode cache, you're in a hard place. Inlining all of your includes into one file (don't do this during development, only in production) can certainly help parse time, but it's a pain to do, and also, you need to know exactly what you'll be including during the request.

  • Autoload is very convenient, but slow, for the reason that the autoload logic has to be run every time an include is done. In practice, I've found that autoloading several specialized files for one request does not cause too much of a problem, but you should not be autoloading all of the files you will need.

  • If you have maybe 10 includes (this is a very back of the envelope calculation), all this wanking is not worth it: just optimize your database queries or something.


require_once and include_once both require that the system keeps a log of what's already been included/required. Every *_once call means checking that log. So there's definitely some extra work being done there but enough to detriment the speed of the whole app?

... I really doubt it... Not unless you're on really old hardware or doing it a lot.

If you are doing thousands of *_once, you could do the work yourself in a lighter fashion. For simple apps, just making sure you've only included it once should suffice but if you're still getting redefine errors, you could something like this:

if (!defined('MyIncludeName')) {
    require('MyIncludeName');
    define('MyIncludeName', 1);
}

I'll personally stick with the *_once statements but on silly million-pass benchmark, you can see a difference between the two:

                php                  hhvm
if defined      0.18587779998779     0.046600103378296
require_once    1.2219581604004      3.2908599376678

10-100× slower with require_once and it's curious that require_once is seemingly slower in hhvm. Again, this is only relevant to your code if you're running *_once thousands of times.


<?php // test.php

$LIMIT = 1000000;

$start = microtime(true);

for ($i=0; $i<$LIMIT; $i++)
    if (!defined('include.php')) {
        require('include.php');
        define('include.php', 1);
    }

$mid = microtime(true);

for ($i=0; $i<$LIMIT; $i++)
    require_once('include.php');

$end = microtime(true);

printf("if defined\t%s\nrequire_once\t%s\n", $mid-$start, $end-$mid);

<?php // include.php

// do nothing.

I got curious and checked out Adam Backstrom's link to Tech Your Universe. This article describes one of the reasons that require should be used instead of require_once. However, their claims didn't hold up to my analysis. I'd be interested in seeing where I may have misanalysed the solution. I used PHP 5.2.0 for comparisons.

I started out by creating 100 header files that used require_once to include another header file. Each of these files looked something like:

<?php
    // /home/fbarnes/phpperf/hdr0.php
    require_once "../phpperf/common_hdr.php";

?>

I created these using a quick Bash hack:

for i in /home/fbarnes/phpperf/hdr{00..99}.php; do
    echo "<?php
    // $i" > $i
    cat helper.php >> $i;
done

This way I could easily swap between using require_once and require when including the header files. I then created an app.php to load the one hundred files. This looked like:

<?php
    // Load all of the php hdrs that were created previously
    for($i=0; $i < 100; $i++)
    {
        require_once "/home/fbarnes/phpperf/hdr$i.php";
    }

    // Read the /proc file system to get some simple stats
    $pid = getmypid();
    $fp = fopen("/proc/$pid/stat", "r");
    $line = fread($fp, 2048);
    $array = split(" ", $line);

    // Write out the statistics; on RedHat 4.5 with kernel 2.6.9
    // 14 is user jiffies; 15 is system jiffies
    $cntr = 0;
    foreach($array as $elem)
    {
        $cntr++;
        echo "stat[$cntr]: $elem\n";
    }
    fclose($fp);
?>

I contrasted the require_once headers with require headers that used a header file looking like:

<?php
    // /home/fbarnes/phpperf/h/hdr0.php
    if(!defined('CommonHdr'))
    {
        require "../phpperf/common_hdr.php";
        define('CommonHdr', 1);
    }
?>

I didn't find much difference when running this with require vs. require_once. In fact, my initial tests seemed to imply that require_once was slightly faster, but I don't necessarily believe that. I repeated the experiment with 10000 input files. Here I did see a consistent difference. I ran the test multiple times, the results are close but using require_once uses on average 30.8 user jiffies and 72.6 system jiffies; using require uses on average 39.4 user jiffies and 72.0 system jiffies. Therefore, it appears that the load is slightly lower using require_once. However, the wall clock time is slightly increased. The 10,000 require_once calls use 10.15 seconds to complete on average and 10,000 require calls use 9.84 seconds on average.

The next step is to look into these differences. I used strace to analyse the system calls that are being made.

Before opening a file from require_once the following system calls are made:

time(NULL)                              = 1223772434
lstat64("/home", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat64("/home/fbarnes", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat64("/home/fbarnes/phpperf", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat64("/home/fbarnes/phpperf/h", {st_mode=S_IFDIR|0755, st_size=270336, ...}) = 0
lstat64("/home/fbarnes/phpperf/h/hdr0.php", {st_mode=S_IFREG|0644, st_size=88, ...}) = 0
time(NULL)                              = 1223772434
open("/home/fbarnes/phpperf/h/hdr0.php", O_RDONLY) = 3

This contrasts with require:

time(NULL)                              = 1223772905
lstat64("/home", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat64("/home/fbarnes", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat64("/home/fbarnes/phpperf", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat64("/home/fbarnes/phpperf/h", {st_mode=S_IFDIR|0755, st_size=270336, ...}) = 0
lstat64("/home/fbarnes/phpperf/h/hdr0.php", {st_mode=S_IFREG|0644, st_size=146, ...}) = 0
time(NULL)                              = 1223772905
open("/home/fbarnes/phpperf/h/hdr0.php", O_RDONLY) = 3

Tech Your Universe implies that require_once should make more lstat64 calls. However, they both make the same number of lstat64 calls. Possibly, the difference is that I am not running APC to optimize the code above. However, next I compared the output of strace for the entire runs:

[fbarnes@myhost phpperf]$ wc -l strace_1000r.out strace_1000ro.out
  190709 strace_1000r.out
  210707 strace_1000ro.out
  401416 total

Effectively there are approximately two more system calls per header file when using require_once. One difference is that require_once has an additional call to the time() function:

[fbarnes@myhost phpperf]$ grep -c time strace_1000r.out strace_1000ro.out
strace_1000r.out:20009
strace_1000ro.out:30008

The other system call is getcwd():

[fbarnes@myhost phpperf]$ grep -c getcwd strace_1000r.out strace_1000ro.out
strace_1000r.out:5
strace_1000ro.out:10004

This is called because I decided to relative path referenced in the hdrXXX files. If I make this an absolute reference, then the only difference is the additional time(NULL) call made in the code:

[fbarnes@myhost phpperf]$ wc -l strace_1000r.out strace_1000ro.out
  190705 strace_1000r.out
  200705 strace_1000ro.out
  391410 total
[fbarnes@myhost phpperf]$ grep -c time strace_1000r.out strace_1000ro.out
strace_1000r.out:20008
strace_1000ro.out:30008

This seems to imply that you could reduce the number of system calls by using absolute paths rather than relative paths. The only difference outside of that is the time(NULL) calls which appear to be used for instrumenting the code to compare what is faster.

One other note is that the APC optimization package has an option called "apc.include_once_override" that claims that it reduces the number of system calls made by the require_once and include_once calls (see the PHP documentation).