Yesterday I found some people on my favorite reddit wondering about the output of the following code:
<?php
$a = 1;
$c = $a + $a++;
var_dump($c); // int(3)
$a = 1;
$c = $a + $a + $a++;
var_dump($c); // int(3)
As you can see, the expressions $a + $a++ and $a + $a + $a++ have the same result, which is rather unexpected. What's happening here?
At this point many people seem to think that the order in which an expression is evaluated is determined by operator precedence and associativity. But that's not true. Precedence and associativity only tell you how the expressions are grouped:
<?php
// in the first expression
$a + $a++;
// "++" has higher precedence than "+", so "$a++" is grouped:
$a + ($a++);
// in the second expression
$a + $a + $a++;
// "++" again has higher precedence than "+":
$a + $a + ($a++);
// and "+" is a left-associative operator, so the left "+" is grouped:
($a + $a) + ($a++);
What does this tell us about the order of evaluation? Nothing. Operator precedence and associativity specify grouping, but they do not specify in which order the groups are executed. In the last example either ($a + $a) or ($a++) could run first.
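A small sketch may make the distinction concrete. The helper traced() is hypothetical and exists only to record when each operand is evaluated. Multiplication groups more tightly than addition, yet on the PHP versions I'm aware of the operands here still happen to be evaluated left to right; nothing in the language promises that.

```php
<?php
// Hypothetical helper: logs its name at the moment it is evaluated, then returns a value.
function traced(string $name, int $value, array &$log): int
{
    $log[] = $name;
    return $value;
}

$log = [];

// Precedence groups this as f + (g * h) ...
$result = traced('f', 1, $log) + traced('g', 2, $log) * traced('h', 3, $log);

var_dump($result); // int(7)

// ... but the log records the order in which the operands actually ran.
echo implode(', ', $log), "\n";
```

The grouping decides that the product is computed before the sum, but it says nothing about which operand gets evaluated first.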
PHP does not specify what will actually happen. One version of PHP can give you one result and a different version another. Don't write code that depends on some particular evaluation order.
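If the order matters to you, one way to make it explicit is to split the expression into separate statements, since statements do run one after another. A minimal sketch (the names $left and $right are only illustrative):

```php
<?php
// Variant 1: read $a first, then increment.
$a = 1;
$left  = $a;          // 1
$right = $a++;        // 1, and $a is now 2
$c = $left + $right;
var_dump($c);         // int(2)

// Variant 2: increment first, then read $a.
$a = 1;
$right = $a++;        // 1, and $a is now 2
$left  = $a;          // 2
$d = $left + $right;
var_dump($d);         // int(3)
```

Either variant is fine; the point is that the reader (and the engine) no longer has to guess which one you meant.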
Even though PHP does not define an order, it would still be interesting to know why you get that rather odd result in the first code sample (this result is consistent across all recent PHP versions).
The reason behind it is the "compiled variables" (CV) optimization that was introduced in PHP 5.1. This optimization basically comes down to allowing simple variables (like $a, but not $a->b or $a['b']) to directly act as operands of an opcode. (Opcodes are what PHP generates from your script and what the Zend VM executes. Every opcode has at most two operands and an optional result.)
Now, let's look at the opcodes generated by the two code snippets. We'll start with $a + $a + $a++:
// code:
$a = 1;
$c = ($a + $a) + ($a++);
// opcodes:
ASSIGN $a, 1
$tmp_1 = ADD $a, $a
$tmp_2 = POST_INC $a
$tmp_3 = ADD $tmp_1, $tmp_2
ASSIGN $c, $tmp_3
The generated opcodes should be rather intuitive: first assign $a = 1, add $a + $a and store the result in $tmp_1, then perform a post-increment on $a and store the result in $tmp_2, then add both temporary variables and assign the result to $c.
The evaluation here happened left-to-right (first $a + $a was run, then $a++) as you would probably expect. Now let's look at the $a + $a++ case:
// code:
$a = 1;
$c = $a + ($a++);
// opcodes:
ASSIGN $a, 1
$tmp_1 = POST_INC $a
$tmp_2 = ADD $a, $tmp_1
ASSIGN $c, $tmp_2
As you can see, in this case the POST_INC ($a++) happens first and the value of $a is only read after that in the ADD opcode. Why? Because reading the value of a variable does not require an extra opcode. Any opcode can handle reading the value of a simple variable. This is what the CV optimization does.
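To convince yourself of this, the two opcode listings can be replayed by hand as ordinary PHP statements. This is only a model of what the VM does, not real VM output: each opcode becomes one statement, the $tmp_N variables mirror the temporaries in the listings, and $first and $second are names I made up to keep the two results apart.

```php
<?php
// Model of the opcodes for ($a + $a) + ($a++)
$a = 1;                      // ASSIGN $a, 1
$tmp_1 = $a + $a;            // ADD $a, $a         -> 1 + 1 = 2
$tmp_2 = $a; $a = $a + 1;    // POST_INC $a        -> yields old value 1, $a becomes 2
$tmp_3 = $tmp_1 + $tmp_2;    // ADD $tmp_1, $tmp_2 -> 2 + 1 = 3
$first = $tmp_3;             // ASSIGN $c, $tmp_3
var_dump($first);            // int(3)

// Model of the opcodes for $a + ($a++)
$a = 1;                      // ASSIGN $a, 1
$tmp_1 = $a; $a = $a + 1;    // POST_INC $a        -> yields old value 1, $a becomes 2
$tmp_2 = $a + $tmp_1;        // ADD $a, $tmp_1     -> $a is read only now: 2 + 1 = 3
$second = $tmp_2;            // ASSIGN $c, $tmp_2
var_dump($second);           // int(3)
```

In the second sequence $a is read as a direct operand of ADD, so it already holds the incremented value by the time it is looked at.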
There are some (rare) circumstances in which the CV optimization is not performed, e.g. when the @ error suppression operator is in use.
Let's try it out. We use the $a + $a++ expression again, but this time put an @ in front of it:
<?php
$a = 1;
@ $c = $a + $a++;
var_dump($c); // int(2)
With the error-suppression operator present, the result suddenly becomes 2 rather than 3. To figure out why, let's look at the opcodes once again:
ASSIGN $a, 1
$tmp_1 = BEGIN_SILENCE
$var_3 = FETCH_R 'a'
$tmp_4 = POST_INC $a
$tmp_5 = ADD $var_3, $tmp_4
$var_2 = FETCH_W 'c'
ASSIGN $var_2, $tmp_5
END_SILENCE $tmp_1
Several things changed here. Firstly, everything is now wrapped in BEGIN_SILENCE and END_SILENCE opcodes for handling of @. Those are of no interest to us. Secondly, $a and $c are now fetched using FETCH_R (fetch for read) and FETCH_W (fetch for write) rather than being used directly as operands.
Because the fetch of $a now has an actual opcode, the fetch will happen before the increment and as such the result changes.
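The same hand-replay works for this listing. Again this is just a model in plain PHP, under the assumption that FETCH_R can be treated as copying the current value of $a into $var_3; the BEGIN_SILENCE and END_SILENCE opcodes are left out because they do not affect the arithmetic.

```php
<?php
// Model of the opcodes for @ $c = $a + $a++ (silence opcodes omitted)
$a = 1;                      // ASSIGN $a, 1
$var_3 = $a;                 // FETCH_R 'a'        -> reads 1 before the increment
$tmp_4 = $a; $a = $a + 1;    // POST_INC $a        -> yields old value 1, $a becomes 2
$tmp_5 = $var_3 + $tmp_4;    // ADD $var_3, $tmp_4 -> 1 + 1 = 2
$c = $tmp_5;                 // FETCH_W 'c' followed by ASSIGN
var_dump($c);                // int(2)
```

The only difference from the CV case is that the left operand is captured by its own step before POST_INC runs.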
If you take anything away from this, let it be these two things:
- Don't rely on order of evaluation within an expression. It is undefined.
- @ disables CV optimizations and as such hurts performance. @ also hurts performance in other ways.
Nice article (and thanks to the Reddit people to bring it up)!
How would you measure the impact (in performance) of using @ (or, better said, of not using Compiled Variables)? Has that been measured and published somewhere? (Do you know of any place with more information on that?)