Understanding PHP Opcodes Andy Wharmby
PHP Opcodes Presentation splits into 3 sections  Generation of opcodes  ZEND_COMPILE  Generation of the Interpreter code  Interpreter comes in many flavours!! Execution of opcodes  ZEND_EXECUTE
Execution path for a script php_execute_script()‏ zend_execute_scripts()‏ zend_execute()‏ user call (function/method )‏ include/require zend_compile_file()‏
zend_compile_file Function ptr that can be overridden to call alternative compiler, e.g. by B-compiler  By default resolves to a call to compile_file() in zend_language_scanner.c Compilation is broken down into 2 steps: Lexical analysis of source PHP script into tokens Parsing of resulting tokens into opcodes PHP script Lexical Analyser Parser byte codes tokens
Lexical Analysis Lexical analyser code in zend_language_scanner.c, generated from zend_language_scanner.l using “flex” Exposed to userspace as token_get_all() <?php $tokens = token_get_all("<?php echo 'Hello World'; ?>"); foreach($tokens as $token) { if (is_array($token)) { printf("%s \t %s\n", token_name($token[0]), $token[1]); } else { printf("\t'%s'\n", $token); } } ?> Output: T_OPEN_TAG <?php T_ECHO echo T_WHITESPACE T_CONSTANT_ENCAPSED_STRING 'Hello World' ';' T_CLOSE_TAG ?>
Parsing Next the tokens are compiled into opcodes Parser code in zend_language_parser.c which is generated from zend_language_parser.y by “Bison” Calls code in zend_compile.c to generate opcodes Parser T_OPEN_TAG <?php T_ECHO echo T_WHITESPACE T_CONSTANT_ENCAPSED_STRING 'Hello World' ';' T_CLOSE_TAG ?> ZEND_OP ZEND_OP ZEND_OP ZEND_OP
Non-PHP statements What does the compiler do with any non-PHP statements in the input script, e.g. HTML? All such statements are compiled into ECHO statements So at execution time the statements are just output as is <!-- example for PHP 5.0.0 final release --> <?php $domain = "localhost"; $user = "root";#note "MIKE" is unacceptable $password = ""; $conn = mysql_connect( $domain, $user, $password ); if($conn) { $msg = "Congratulations !!!! $user, You connected to MySQL"; } ?> <html> <head> <title>Connecting user</title> </head> <body> <h3> <?php echo( $msg ); ?> </h3> </body> </html>
Non-PHP statements  line  #  op  fetch  ext  operands ------------------------------------------------------------------------------- 3  0  ECHO  '%3C%21--+example+for+ PHP+5.0.0+final+release+--%3E%0D%0A%0D%0A' 5  1  ASSIGN  !0, 'localhost' 6  2  ASSIGN  !1, 'root' 7  3  ASSIGN  !2, '' 9  4  INIT_FCALL_BY_NAME  'mysql_connect' …… .. snip …. 22  ADD_STRING  ~5, ~5, 'MySQL' 23  ASSIGN  !4, ~5 14  24  JMP  ->25 26  25  ECHO  '%0D%0A%3Chtml%3E%0D%0A %0D%0A+%3Chead%3E%0D%0A++%3Ctitle%3EConnecting+user%3C%2Ftitle%3E%0D%0A+%3C%2Fhead%3E%0D %0A%0D%0A+%3Cbody%3E%0D%0A++%3Ch3%3E+%0D%0A+++' 26  ECHO  !4 30  27  ECHO  '+%0D%0A++%3C%2Fh3%3E %0D%0A+%3C%2Fbody%3E%0D%0A%0D%0A%3C%2Fhtml%3E' 28  RETURN  1 29  ZEND_HANDLE_EXCEPTION
Opcodes Each opcode consists of: Opcode handler 1 or 2 input operands Optional result operand Optional “Extended value” Meaning opcode dependent, e.g. on a ZEND_CAST it defines target type Line number in original source script Opcode. Range 0 - 151. All listed in zend_vm_opcodes.h Total size of each zend_op is 96 bytes Some operations consist of 2 opcodes, e.g. ZEND_ASSIGN_OBJ 2nd opcode set to ZEND_OP_DATA struct _zend_op { opcode_handler_t handler; znode result; znode op1; znode op2; ulong extended_value; uint lineno; zend_uchar opcode; };
znode One for each operand and result each znode is 24 bytes  Type can be as follows: IS_CONST (0x1)‏ program literal  IS_TMP_VAR (0x2)‏ temporary variable with no name intermediate result IS_VAR (0x4)‏ temporary variable with a name defined in symbol table IS_UNUSED (0x8)‏ operand not specified IS_CV (0x10)‏ optimized version of VAR For some opcodes type of znode is implied, e.g. for a JMP opcode’s op1 znode defines jump target address in “jmp_addr” EA defines “extended attributes” meanings opcode dependent  e.g. on a ZEND_UNSET_VAR it defines if variable is static or not  typedef struct _znode { int op_type; union { zval constant; zend_uint var; zend_uint opline_num;  zend_op_array *op_array; zend_op *jmp_addr; struct { zend_uint var; /*dummy */ zend_uint type; } EA; } u; } znode;
zend_compile_file() Returns a pointer to zend_op_array for global scope Note: despite its name, zend_op_array is not an array but a structure zend_op_array contains a pointer to an array of opcodes, plus much more including: pointer to array of compiled variable details. More on these later. count of number of temporaries (TMP + VAR) required by opcodes i.e. the number of Zend Engine registers used pointer to hash table for all statics defined by the function the Hashtable is created and populated by the compiler if needed Compiler produces one zend_op_array for: global scope this is the one returned to caller of zend_compile_file and is saved in EG(active_op_array) each user function added to thread’s function table by compiler each user class method added to function table for class by compiler
zend_compile_file() Initial opcode array allocated by init_op_array() in zend_opcode.c allocated from heap sufficient for just 64 opcodes Reallocated by get_next_op() each time it is full Reallocates new array 4 times current size Storage for opcode array freed by call to destroy_op_array() at request end For global scope called from zend_execute_scripts() For functions and methods called by Hash Table dtor routine. More later struct _zend_op_array { …… . zend_uint *refcount; zend_op *opcodes; zend_uint last, size; zend_compiled_variable *vars; int last_var, size_var; zend_uint T; zend_brk_cont_element *brk_cont_array; zend_uint last_brk_cont; zend_uint current_brk_cont; zend_try_catch_element *try_catch_array; int last_try_catch; /* static variables support */ HashTable *static_variables; … .. etc. }; ZEND_OP ZEND_OP ZEND_OP ZEND_OP ZEND_OP
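The growth policy described on this slide (start with 64 slots, grow to 4 times the current size when full) can be sketched in plain C. This is only an illustration under stated assumptions: the `sketch_*` names and the one-field opcode struct are hypothetical stand-ins, and `malloc`/`realloc` replace the engine's own allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for zend_op and zend_op_array; only the fields
 * needed to show the growth policy are kept. */
typedef struct { unsigned char opcode; } sketch_op;

typedef struct {
    sketch_op *opcodes;  /* heap-allocated opcode array */
    unsigned int last;   /* number of slots in use */
    unsigned int size;   /* number of slots allocated */
} sketch_op_array;

enum { SKETCH_INITIAL_SIZE = 64, SKETCH_GROWTH_FACTOR = 4 };

/* Allocate the initial array of 64 slots. Returns 0 on success, -1 on failure. */
static int sketch_init_op_array(sketch_op_array *a)
{
    assert(a != NULL);
    a->opcodes = malloc(SKETCH_INITIAL_SIZE * sizeof(sketch_op));
    a->last = 0;
    a->size = (a->opcodes != NULL) ? SKETCH_INITIAL_SIZE : 0;
    return (a->opcodes != NULL) ? 0 : -1;
}

/* Hand out the next free slot, growing the array to 4x its size when full. */
static sketch_op *sketch_get_next_op(sketch_op_array *a)
{
    assert(a != NULL);
    if (a->size == 0) {
        return NULL;  /* array was never initialised */
    }
    if (a->last >= a->size) {
        unsigned int new_size = a->size * SKETCH_GROWTH_FACTOR;
        sketch_op *grown = realloc(a->opcodes, new_size * sizeof(sketch_op));
        if (grown == NULL) {
            return NULL;  /* the original array is left intact */
        }
        a->opcodes = grown;
        a->size = new_size;
    }
    return &a->opcodes[a->last++];
}

/* Release the array, as destroy_op_array() would at request end. */
static void sketch_destroy_op_array(sketch_op_array *a)
{
    assert(a != NULL);
    free(a->opcodes);
    a->opcodes = NULL;
    a->last = 0;
    a->size = 0;
}
```

With this sketch the 65th request for a slot triggers the first reallocation, taking the array from 64 to 256 slots. Because reallocation may move the array, pointers into it are not stable while compilation is still adding opcodes.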
Opcodes Not all opcode information can be determined as opcodes are generated by compiler, e.g. target address for a JMP opcode. So after all opcodes are generated a 2nd pass is made over the opcode array to fill in the missing information: set target for all jump opcodes during compilation jump targets are opcode array indexes. These are changed to absolute addresses set opcode handler as defined by the generated executor: CALL: address of handler function GOTO: address of label SWITCH: identifier (int) for handling CASE block for any operands (op1 or op2) which are CONSTANTS modify zval to is_ref=1, refcount=2 to ensure zval copied trim opcode array to required size; i.e. free unused storage See pass_two() in zend_opcode.c
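The jump-fixing part of the second pass can be illustrated with a small C sketch. It is not the engine's pass_two(): the `sketch_*` names are hypothetical, and the index and the pointer are kept in separate fields for clarity, whereas the real znode stores opline_num and jmp_addr in the same union.

```c
#include <assert.h>
#include <stddef.h>

typedef struct sketch_jmp_op sketch_jmp_op;
struct sketch_jmp_op {
    int is_jump;              /* non-zero for jump opcodes */
    size_t target_index;      /* set by the compiler: index into the opcode array */
    sketch_jmp_op *jmp_addr;  /* set by the second pass: absolute address */
};

/* Second pass: convert every jump's array index into an absolute address. */
static void sketch_pass_two(sketch_jmp_op *ops, size_t count)
{
    assert(ops != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (ops[i].is_jump) {
            assert(ops[i].target_index < count);
            ops[i].jmp_addr = &ops[ops[i].target_index];
        }
    }
}
```

One plausible reason for deferring the conversion is that the opcode array can be reallocated while opcodes are still being generated, so addresses only become stable once the array has reached its final size.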
Functions and Classes During MINIT 2 hash tables are built which the compiler uses GLOBAL_FUNCTION_TABLE Populated with names of all built-in functions, and functions defined by any enabled extensions GLOBAL_CLASS_TABLE Populated with default classes and any classes defined by enabled extensions The compiler adds to both tables during the compile step Any new entries are then removed again at request shutdown In a non-ZTS environment compiler updates the GLOBAL hash tables In a ZTS environment GLOBAL tables are read-only A separate r/w copy of each table is created for each new thread and populated from GLOBAL table in compiler_globals_ctor()
Function tables One function table per thread Address stored in executor globals (EG) Function table is a Hashtable mapping function name to “zend_function” zend_function structure is itself a union of structures Populated with built-in functions, extension functions etc. by copying GLOBAL_FUNCTION_TABLE built during MINIT in compiler_globals_ctor() type == ZEND_INTERNAL_FUNCTION zend_function == zend_internal_function During each request functions defined by user script are added to function table at compile time type == ZEND_USER_FUNCTION zend_function == zend_op_array typedef union _zend_function { zend_uchar type; struct { zend_uchar type; /* never used */ char *function_name; zend_class_entry *scope; ……… . <SNIP > …………. zend_bool pass_rest_by_reference; unsigned char return_reference; } common; zend_op_array op_array; zend_internal_function internal_function; } zend_function;
State of play after compile complete <?php function add5($a) { return $a + 5; } function sub5($a) { return $a - 5; } $a = add5(10); $b = sub5(15); ?> [Diagram: executor_globals with fields active_op_array, symbol_table, active_symbol_table, function_table and class_table. active_op_array addresses the zend_op_array for global scope and its opcodes; symbol_table holds GLOBALS, _ENV, HTTP_ENV_VARS, ...; function_table holds <internal func> entries (zend_internal_function) followed by add5 and sub5 (each a zend_op_array).]
Function tables User entries removed from global function table during RSHUTDOWN processing by call to shutdown_executor() As user function entries are added after all internal functions the code uses the zend_hash_reverse_apply() function to traverse the thread's function table entries backwards, removing entries until type != ZEND_USER_FUNCTION Removal triggers HT dtor routine ZEND_FUNCTION_DTOR which in turn calls destroy_op_array() to free opcode array and other structures which hang off zend_op_array
Class_table One class table per thread Address stored in executor globals (EG)‏ Class table is a Hashtable mapping class name to  “zend_class_entry”  Populated with default classes and extension defined classes by copying GLOBAL_CLASS_TABLE built during MINIT in compiler_globals_ctor()‏ During each request classes defined by user script are  added to class table at compile time  Each class has its own function table and compiler adds an entry for each method defined by a class
State of play after compile complete: <?php class Dog { function bark() { print "Woof!"; } function sit() { print "Sit!!"; } } $pooch = new Dog; $pooch->bark(); $pooch->sit(); ?> [Diagram: executor_globals with fields active_op_array, symbol_table, active_symbol_table, function_table and class_table. active_op_array addresses the zend_op_array for global scope and its opcodes; symbol_table holds GLOBALS, _ENV, HTTP_ENV_VARS, ...; function_table holds <internal func> entries (zend_internal_function); class_table holds <internal class> entries followed by Dog. Each class has its own function_table: Dog's holds bark and sit (each a zend_op_array), internal classes hold <internal method> entries.]
Class table User class entries are removed by shutdown_executor() by traversing the thread's class table backwards, removing all entries until type != ZEND_USER_CLASS Removal triggers HT dtor routine ZEND_CLASS_DTOR which in turn calls destroy_zend_class() destroy_zend_class() calls zend_hash_destroy() on the class’s function_table which walks the HT and calls dtor ZEND_FUNCTION_DTOR on each entry as described earlier
Static variables Local scope but value retained across calls  Hashtable allocated by compiler per function or method when first static variable defined Referenced by zend_op_array structure Statics added to Hashtable as found by compiler
Examining compile results  Two tools available for analysing results of compile  VLD Parsekit Both available from PECL
VLD Dumps opcodes for a given PHP script Written by Derick Rethans Download from PECL http://pecl.php.net/package/vld/0.8.0 Simple configuration --enable-vld[=shared] Invoked via command line switches php -dvld.active=[0|1] -dvld.execute=[0|1] -f <php script> Can override defaults in php.ini
VLD No config.w32 file is supplied for Windows, so create one containing: ARG_ENABLE("vld", "Enable Vulcan Opcode decoder", "no"); if (PHP_VLD != "no") { EXTENSION("vld", "vld.c srm_oparray.c"); }
VLD output
Script (test.php): <?php $a = 5; $b = 10; $c = $a + $b + 99; echo "c= $c \n"; ?>
Command: php -f test.php -dvld.active=1
line  #   op                     operands
--------------------------------------------------
 2    0   ECHO                   '%0A'
 4    1   ASSIGN                 !0, 5
 5    2   ASSIGN                 !1, 10
 6    3   ADD                    ~2, !0, !1
      4   ADD                    ~3, ~2, 99
      5   ASSIGN                 !2, ~3
 8    6   INIT_STRING            ~5
      7   ADD_STRING             ~5, ~5, 'c'
      8   ADD_STRING             ~5, ~5, '%3D+'
      9   ADD_VAR                ~5, ~5, !2
     10   ADD_STRING             ~5, ~5, '+'
     11   ADD_CHAR               ~5, ~5, 10
     12   ECHO                   ~5
11   13   RETURN                 1
     14   ZEND_HANDLE_EXCEPTION
Script output: c= 114
KEY: ! == compiled variable, $ == variable, ~ == temporary
There are TMPs defined for the results of the ASSIGN opcodes here, but they are not used and VLD does not list them
Why all these “+” in VLD output for CONSTs? <?php echo "Hello World"; echo "Hello  World"; echo "Hello + World"; ?> line # op fetch ext operands ------------------------------------------------------------------------------- 2 0 ECHO '%0D%0A+' 4 1 ECHO 'Hello+World' 5 2 ECHO 'Hello++++++++++++++++++++++++++++++++World' 6 3 ECHO 'Hello+%2B+World' 9 4 RETURN 1 5 ZEND_HANDLE_EXCEPTION Answer: VLD calls php_url_encode() on the CONST to format it before output which amongst other things converts all spaces to “+”. Internally white space is stored as 0x20 as you would expect.
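To make the effect concrete, here is a minimal C sketch of a URL-style encoder that reproduces the strings in the VLD listing above. It is not the source of php_url_encode(): the `sketch_*` names are hypothetical, and the rule used (letters, digits, '-', '_' and '.' pass through, space becomes '+', everything else becomes %XX) is an assumption chosen to match the output shown on the slide.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Encode src into dst. Returns the encoded length, or -1 if dst is too small. */
static int sketch_url_encode(const char *src, char *dst, size_t dst_size)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    if (dst_size == 0) {
        return -1;
    }
    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;
        if (c == ' ') {
            if (out + 1 >= dst_size) return -1;
            dst[out++] = '+';               /* space is written as '+' */
        } else if (isalnum(c) || c == '-' || c == '_' || c == '.') {
            if (out + 1 >= dst_size) return -1;
            dst[out++] = (char)c;           /* safe characters pass through */
        } else {
            if (out + 3 >= dst_size) return -1;
            dst[out++] = '%';               /* everything else becomes %XX */
            dst[out++] = hex[c >> 4];
            dst[out++] = hex[c & 0x0F];
        }
    }
    dst[out] = '\0';
    return (int)out;
}

/* Convenience helper: returns 1 if src encodes to exactly expected. */
static int sketch_encodes_to(const char *src, const char *expected)
{
    char buf[256];
    int len = sketch_url_encode(src, buf, sizeof buf);
    return len >= 0 && strcmp(buf, expected) == 0;
}
```

Under these rules a literal "+" in the script is shown as %2B, which is how the listing distinguishes it from an encoded space.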
parsekit PHP opcode analyser written by Sara Golemon meant for development and debug only; some code not thread safe Download from PECL http://pecl.php.net/package/parsekit Simple configuration --enable-parsekit[=shared] Implements 5 functions parsekit_compile_string parsekit_compile_file parsekit_func_arginfo parsekit_opcode_flags parsekit_opcode_name
parsekit array parsekit_compile_string ( string phpcode [, array &errors [, int options]] ) compiles and then analyzes supplied string errors: 2 dimensional array of errors encountered during compile example of use in parsekit/examples options: either PARSEKIT_SIMPLE or PARSEKIT_QUIET PARSEKIT_QUIET results in more verbose output array parsekit_compile_file ( string filename [, array &errors [, int options]] ) As above but takes name of a .php file as input array parsekit_func_arginfo (mixed function) Return the arg_info data for a given user defined function/method long parsekit_opcode_flags (long opcode) Return flags which define return type, operand types etc for an opcode string parsekit_opcode_name (long opcode) Return name for given opcode
parsekit_compile_string: SIMPLE output <?php $oparray = parsekit_compile_string('echo "HelloWorld";', $errors, PARSEKIT_SIMPLE); var_dump($oparray); ?> array(5) { [0]=> string(36) "ZEND_ECHO UNUSED 'HelloWorld' UNUSED" [1]=> string(30) "ZEND_RETURN UNUSED NULL UNUSED" [2]=> string(42) "ZEND_HANDLE_EXCEPTION UNUSED UNUSED UNUSED" ["function_table"]=> NULL ["class_table"]=> NULL }
parsekit-compile-file:   QUIET output  <?php $oparray = parsekit_compile_string('echo &quot;HelloWorld&quot;;', $errors, PARSEKIT_QUIET); var_dump($oparray); ?> array(20) { [&quot;type&quot;]=>  int(2)‏ [&quot;type_name&quot;]=>  string(18) &quot;ZEND_USER_FUNCTION&quot; [&quot;fn_flags&quot;]=>  int(0)‏ [&quot;num_args&quot;]=>  int(0)‏ [&quot;required_num_args&quot;]=>  int(0)‏ [&quot;pass_rest_by_reference&quot;]=>  bool(false)‏ [&quot;uses_this&quot;]=>  bool(false)‏ [&quot;line_start&quot;]=>  int(0)‏ [&quot;line_end&quot;]=>  int(0)‏ [&quot;return_reference&quot;]=>  bool(false)‏ [&quot;refcount&quot;]=>  int(1)‏ [&quot;last&quot;]=>  int(3)‏ [&quot;size&quot;]=>  int(3)‏ [&quot;T&quot;]=>  int(0)‏ [&quot;last_brk_cont&quot;]=>  int(0)‏ [&quot;current_brk_cont&quot;]=>  int(-1)‏ [&quot;backpatch_count&quot;]=>  int(0)‏ [&quot;done_pass_two&quot;]=>  bool(true)‏ [&quot;filename&quot;]=>  string(49) &quot;C:\Testcases\helloWorld.php&quot; [&quot;opcodes&quot;]=> array(3) { [0]=>  array(5) { [&quot;opcode&quot;]=>  int(40)‏ [&quot;opcode_name&quot;]=>  string(9) &quot;ZEND_ECHO&quot; [&quot;flags&quot;]=>  int(768)‏ [&quot;op1&quot;]=>  array(3) { [&quot;type&quot;]=>  int(1)‏ [&quot;type_name&quot;]=>  string(8) &quot;IS_CONST&quot; [&quot;constant&quot;]=>  &string(11) &quot;Hello World&quot; } [&quot;lineno&quot;]=>  int(3)‏ etc…..
parsekit_func_arginfo <?php function foo ($a, stdClass $b, &$c) { } $oparray = parsekit_func_arginfo('foo'); var_dump($oparray); ?> array(3) { [0]=> array(3) { ["name"]=> string(1) "a" ["allow_null"]=> bool(true) ["pass_by_reference"]=> bool(false) } [1]=> array(4) { ["name"]=> string(1) "b" ["class_name"]=> string(8) "stdClass" ["allow_null"]=> bool(false) ["pass_by_reference"]=> bool(false) } [2]=> array(3) { ["name"]=> string(1) "c" ["allow_null"]=> bool(true) ["pass_by_reference"]=> bool(true) } }
parsekit_opcode_name and parsekit_opcode_flags <?php $opname = parsekit_opcode_name(61); var_dump($opname); ?> string(21) "ZEND_DO_FCALL_BY_NAME" <?php $opflags = parsekit_opcode_flags(61); var_dump($opflags); ?> int(16777218) flags define whether opcode takes op1 and op2, defines EA, sets a result etc
Execution path for a script php_execute_script()‏ zend_execute_scripts()‏ zend_execute()‏ user call (function/method )‏ include/require zend_compile_file()‏
PHP Interpreter Can be generated in many flavours 12 different versions possible Generated by a chunk of PHP code: zend_vm_gen.php You need to understand regular expressions before attempting to read this code Interpreter generated from definition of each opcode in zend_vm_def.h, and skeletal interpreter body in zend_vm_execute.skl
Interpreter generation process [Diagram: zend_vm_gen.php reads zend_vm_execute.skl and zend_vm_def.h and generates zend_vm_execute.h and zend_vm_opcodes.h]
zend_vm_execute.skl {%DEFINES%} ZEND_API void {%EXECUTOR_NAME%}(zend_op_array *op_array TSRMLS_DC) { zend_execute_data execute_data; {%HELPER_VARS%} {%INTERNAL_LABELS%} if (EG(exception)) { return; } /* Initialize execute_data */ EX(fbc) = NULL; EX(object) = NULL; EX(old_error_reporting) = NULL; if (op_array->T < TEMP_VAR_STACK_LIMIT) { EX(Ts) = (temp_variable *) do_alloca(sizeof(temp_variable) * op_array->T); } else { EX(Ts) = (temp_variable *) safe_emalloc(sizeof(temp_variable), op_array->T, 0); } …… etc The {%...%} markers are triggers for zend_vm_gen.php to insert generated code
zend_vm_def.h ZEND_VM_HANDLER(1, ZEND_ADD, CONST|TMP|VAR|CV, CONST|TMP|VAR|CV) { zend_op *opline = EX(opline); zend_free_op free_op1, free_op2; add_function(&EX_T(opline->result.u.var).tmp_var, GET_OP1_ZVAL_PTR(BP_VAR_R), GET_OP2_ZVAL_PTR(BP_VAR_R) TSRMLS_CC); FREE_OP1(); FREE_OP2(); ZEND_VM_NEXT_OPCODE(); } Slide callouts: opcode opcode name types accepted for op1 types accepted for op2 .. although this is just a macro! helper function triggers to php code to replace text
Interpreter generation process Usage information: php zend_vm_gen.php [options] Options: --with-vm-kind=CALL|SWITCH|GOTO - select threading model (default is CALL) --without-specializer - disable executor specialization --with-old-executor - enable old executor --with-lines - enable #line directives --with-vm-kind defines execution method CALL: Each opcode handler is defined as a function SWITCH: Each opcode handler is a case block in one huge switch statement GOTO: Label defined for each opcode handler --without-specializer means only one handler per opcode With specializers a handler is generated for each possible combination of operand types A reported 20% speedup with specializers enabled over old executor
Interpreter generation process --with-old-executor enables runtime decision to call old pre-ZE2 type executor which is a CALL type executor with no specializers zend_vm_use_old_executor() defined to switch executor model no current callers though --with-lines results in addition of #line directives to generated zend_vm_execute.h #line 28 "C:\PHPDEV\php5.2-200612111130\Zend\zend_vm_def.h" static int ZEND_ADD_SPEC_CONST_CONST_HANDLER(ZEND_OPCODE_HANDLER_ARGS) { zend_op *opline = EX(opline); add_function(&EX_T(opline->result.u.var).tmp_var, … . etc default interpreter which is checked into CVS is generated as follows php zend_vm_gen.php --with-vm-kind=CALL
Specialization With specialization enabled a handler is generated for each valid combination of input operand types As each input operand (op1 and op2) can take 1 of 5 types TMP VAR CV CONST UNUSED This gives a theoretical 25 opcode handlers for each opcode
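The arithmetic can be checked with a small C sketch that counts handlers from the operand type masks an opcode declares. The `SK_*` and `sketch_*` names are hypothetical; the bit values simply mirror the IS_CONST to IS_CV values (0x1 to 0x10) listed on the znode slide.

```c
#include <assert.h>

/* Operand type bits, mirroring IS_CONST (0x1) .. IS_CV (0x10). */
enum {
    SK_IS_CONST  = 0x01,
    SK_IS_TMP    = 0x02,
    SK_IS_VAR    = 0x04,
    SK_IS_UNUSED = 0x08,
    SK_IS_CV     = 0x10
};

/* Count how many of the five operand types are set in a mask. */
static int sketch_count_types(unsigned int mask)
{
    int n = 0;
    for (unsigned int bit = SK_IS_CONST; bit <= SK_IS_CV; bit <<= 1) {
        if (mask & bit) {
            n++;
        }
    }
    return n;
}

/* One specialized handler per valid (op1 type, op2 type) pair. */
static int sketch_handler_count(unsigned int op1_mask, unsigned int op2_mask)
{
    int n1 = sketch_count_types(op1_mask);
    int n2 = sketch_count_types(op2_mask);
    assert(n1 > 0 && n2 > 0);
    return n1 * n2;
}
```

An opcode that accepts all five types for both operands gets 5 * 5 = 25 handlers. An opcode declared with CONST|TMP|VAR|CV for both operands, as ZEND_ADD is in zend_vm_def.h, gets 4 * 4 = 16.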
zend_vm_def.h ZEND_VM_HANDLER(1, ZEND_ADD, CONST|TMP|VAR|CV, CONST|TMP|VAR|CV) { zend_op *opline = EX(opline); zend_free_op free_op1, free_op2; add_function(&EX_T(opline->result.u.var).tmp_var, GET_OP1_ZVAL_PTR(BP_VAR_R), GET_OP2_ZVAL_PTR(BP_VAR_R) TSRMLS_CC); FREE_OP1(); FREE_OP2(); ZEND_VM_NEXT_OPCODE(); }
ZEND_ADD without specialization static int ZEND_ADD_HANDLER(ZEND_OPCODE_HANDLER_ARGS)‏ { zend_op *opline = EX(opline); zend_free_op free_op1, free_op2; add_function(&EX_T(opline->result.u.var).tmp_var, get_zval_ptr(&opline->op1, EX(Ts), &free_op1, BP_VAR_R), get_zval_ptr(&opline->op2, EX(Ts), &free_op2, BP_VAR_R)‏   TSRMLS_CC); FREE_OP(free_op1); FREE_OP(free_op2); ZEND_VM_NEXT_OPCODE(); } Handler calls  non-type specific  routines to get zval * for op1 and op2
ZEND_ADD with specialization static int ZEND_ADD_SPEC_CONST_CONST_HANDLER (ZEND_OPCODE_HANDLER_ARGS)‏ { zend_op *opline = EX(opline); add_function(&EX_T(opline->result.u.var).tmp_var, &opline->op1.u.constant, &opline->op2.u.constant TSRMLS_CC); ZEND_VM_NEXT_OPCODE(); } static int ZEND_ADD_SPEC_CONST_TMP_HANDLER (ZEND_OPCODE_HANDLER_ARGS)‏ { zend_op *opline = EX(opline); zend_free_op free_op2; add_function(&EX_T(opline->result.u.var).tmp_var, &opline->op1.u.constant, _get_zval_ptr_tmp(&opline->op2, EX(Ts), &free_op2 TSRMLS_CC)‏ TSRMLS_CC); zval_dtor(free_op2.var); ZEND_VM_NEXT_OPCODE(); } static int ZEND_ADD_SPEC_CONST_VAR_HANDLER (ZEND_OPCODE_HANDLER_ARGS)‏ { zend_op *opline = EX(opline); zend_free_op free_op2; add_function(&EX_T(opline->result.u.var).tmp_var, &opline->op1.u.constant, _get_zval_ptr_var(&opline->op2, EX(Ts), &free_op2 TSRMLS_CC)‏ TSRMLS_CC); if (free_op2.var) {zval_ptr_dtor(&free_op2.var);}; ZEND_VM_NEXT_OPCODE(); } … . and 13 other handlers  Handlers call  type specific  routines to get zval * for op1 and op2
zend_vm_gen.php $op1_get_zval_ptr = array( &quot;ANY&quot;  => &quot;get_zval_ptr(&opline->op1, EX(Ts), &free_op1,  \\1 )&quot;, &quot;TMP&quot;  => &quot;_get_zval_ptr_tmp(&opline->op1, EX(Ts), &free_op1 TSRMLS_CC)&quot;, &quot;VAR&quot;  => &quot;_get_zval_ptr_var(&opline->op1, EX(Ts), &free_op1 TSRMLS_CC)&quot;, &quot;CONST&quot;  => &quot;&opline->op1.u.constant&quot;, &quot;UNUSED&quot; => &quot;NULL&quot;, &quot;CV&quot;  => &quot;_get_zval_ptr_cv(&opline->op1, EX(Ts), \\1 TSRMLS_CC)&quot;, ); $op2_get_zval_ptr = array( &quot;ANY&quot;  => &quot;get_zval_ptr(&opline->op2, EX(Ts), &free_op2, \\1)&quot;, &quot;TMP&quot;  => &quot;_get_zval_ptr_tmp(&opline->op2, EX(Ts), &free_op2 TSRMLS_CC)&quot;, &quot;VAR&quot;  => &quot;_get_zval_ptr_var(&opline->op2, EX(Ts), &free_op2 TSRMLS_CC)&quot;, &quot;CONST&quot;  => &quot;&opline->op2.u.constant&quot;, &quot;UNUSED&quot; => &quot;NULL&quot;, &quot;CV&quot;  => &quot;_get_zval_ptr_cv(&opline->op2, EX(Ts), \\1 TSRMLS_CC)&quot;, … ..<snip> function gen_code(….)‏ …… $code = preg_replace( array( ......... &quot;/GET_OP1_ZVAL_PTR\(([^)]*)\)/&quot;, &quot;/GET_OP2_ZVAL_PTR\(([^)]*)\)/&quot;, ........ ), array( ....... ....... $op1_get_zval_ptr[$op1], $op2_get_zval_ptr[$op2], ....... ), $code);
Generated code not always the best !! Input: zend_vm_def.h ZEND_VM_HANDLER(71, ZEND_INIT_ARRAY, CONST|TMP|VAR|UNUSED|CV, CONST|TMP|VAR|UNUSED|CV) { zend_op *opline = EX(opline); array_init(&EX_T(opline->result.u.var).tmp_var); if (OP1_TYPE == IS_UNUSED) { ZEND_VM_NEXT_OPCODE(); #if !defined(ZEND_VM_SPEC) || OP1_TYPE != IS_UNUSED } else { ZEND_VM_DISPATCH_TO_HANDLER(ZEND_ADD_ARRAY_ELEMENT); #endif } } Output: zend_vm_execute.h static int ZEND_INIT_ARRAY_SPEC_CONST_CONST_HANDLER(ZEND_OPCODE_HANDLER_ARGS) { zend_op *opline = EX(opline); array_init(&EX_T(opline->result.u.var).tmp_var); if (IS_CONST == IS_UNUSED) { ZEND_VM_NEXT_OPCODE(); #if 0 || IS_CONST != IS_UNUSED } else { return ZEND_ADD_ARRAY_ELEMENT_SPEC_CONST_CONST_HANDLER(ZEND_OPCODE_HANDLER_ARGS_PASSTHRU); #endif } }
Mapping opcode to a handler Generated zend_vm_execute.h contains an array to map opcodes to handlers without specializers array has just 151 entries with specializers 3775 (151 * 25) entries zend_execute.c defines a function to enable compiler to determine correct handler for a given opcode zend_vm_set_opcode_handler(zend_op *op) Decodes type information for op1 and op2 in supplied “zend_op” and picks appropriate handler from array of handlers. Handler returned will be either: function pointer for handler when CALL id of handler routine for SWITCH address of handler's label for GOTO Mapping performed at compile time pass_two() of compiler calls zend_vm_set_opcode_handler() to patch handler into all generated opcodes
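One way to picture the lookup is as an index into a flat table of 151 * 25 entries. The C sketch below is an assumption about the general shape, not a copy of the generated code: the `sketch_*` names are hypothetical and the ordering of the five type codes is chosen only for illustration.

```c
#include <assert.h>

enum { SKETCH_NUM_TYPES = 5, SKETCH_NUM_OPCODES = 151 };

/* Map an operand type flag (IS_CONST=1, IS_TMP_VAR=2, IS_VAR=4,
 * IS_UNUSED=8, IS_CV=16) to a dense code 0..4. The ordering is an
 * assumption made for this sketch. */
static int sketch_type_code(int op_type)
{
    switch (op_type) {
        case 1:  return 0;  /* CONST */
        case 2:  return 1;  /* TMP */
        case 4:  return 2;  /* VAR */
        case 16: return 4;  /* CV */
        case 8:             /* UNUSED */
        default: return 3;  /* anything unrecognised is treated as UNUSED */
    }
}

/* Index of the handler for (opcode, op1 type, op2 type) in a table laid out
 * as 25 consecutive entries per opcode. */
static int sketch_handler_index(int opcode, int op1_type, int op2_type)
{
    assert(opcode >= 0 && opcode < SKETCH_NUM_OPCODES);
    return opcode * SKETCH_NUM_TYPES * SKETCH_NUM_TYPES
         + sketch_type_code(op1_type) * SKETCH_NUM_TYPES
         + sketch_type_code(op2_type);
}
```

With this layout opcode 1 (ZEND_ADD) with two CONST operands lands on entry 25, and the highest index is 3774, which matches the 3775 entries quoted above. Combinations an opcode does not accept would presumably hold a null or placeholder handler.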
zend_execute By default the zend_execute function pointer addresses the generated execute() routine in zend_vm_execute.h This is called by zend_execute_scripts() with: a pointer to the zend_op_array for global scope, and if ZTS enabled the tsrm_ls pointer Executor keeps state data for current user function in zend_execute_data structure which is allocated in execute() stack frame Address of the currently executing function's zend_execute_data stored in EG struct _zend_execute_data { struct _zend_op *opline; zend_function_state function_state; zend_function *fbc; /* Function Being Called */ zend_op_array *op_array; zval *object; union _temp_variable *Ts; zval ***CVs; zend_bool original_in_execution; HashTable *symbol_table; struct _zend_execute_data *prev_execute_data; zval *old_error_reporting; };
execute() On entry acquire storage for Temporary variables Number of temporary variables used by function stored in “T” field of zend_op_array Storage allocated on stack if alloca() available and T < 2000 If alloca not available or 2000+ temporaries then allocated by emalloc from heap CV cache Number of compiled variables used stored in “last_var” field of zend_op_array Allocated on stack regardless of size if alloca available or emalloc otherwise Initialize zend_execute_data Initialize EX(opline) to address first opcode to execute EX(symbol_table) = EG(active_symbol_table) EX(prev_execute_data) = EG(current_execute_data); EG(current_execute_data) = &execute_data; <?php function foo() { … } …… foo(); ?> [Diagram: while foo() runs, current_execute_data in executor_globals addresses the zend_execute_data for foo(), whose prev_execute_data addresses the zend_execute_data for global scope, whose prev_execute_data is null.]
Operand Types Operands Op1 and Op2 can be either: VAR ($) Temporary variable into which interpreter caches zval * and zval ** for a defined symbol. TMP (~) Temporary variable where interpreter keeps an intermediate result. For example $a = $b + $c, the sum of $b and $c will be stored in a TMP before being assigned to $a CV (!) Compiled variable. Optimized version of a VAR. More to follow shortly CONSTANT Program literal, e.g. $a = “hello” Symbols are also constants ZVAL allocated by compiler ZVAL has is_ref=1 refcount=2 to force split on assignment UNUSED Operand not defined for opcode Result operand can be VAR, TMP or CV
Temporary Variables: VAR and TMP “ Ts” field of zend_execute_data addresses an array of temp_variables  Size of array based on information gathered by compiler.  The “var” field in the operands znode contains the  offset  into the “temp_variables” array Temporaries are each 24 bytes  T and EX_T macros provided to do this Temporary variables are NOT re-used by compiler   struct _zend_execute_data { struct _zend_op *opline; …… . union _temp_variable *Ts ; … .. etc }; typedef struct _znode { int op_type; union { zval constant; zend_uint var; zend_uint opline_num;  zend_op_array *op_array; zend_op *jmp_addr; struct { zend_uint var; /*dummy */ zend_uint type; } EA; } u; } znode; typedef union _temp_variable { zval tmp_var ; struct { zval **ptr_ptr; zval *ptr; zend_bool fcall_returned_reference; } var; struct { zval **ptr_ptr; zval *ptr; zend_bool fcall_returned_reference; zval *str; zend_uint offset; } str_offset; zend_class_entry *class_entry; } temp_variable;
VAR variables FETCH_W  $0, 'a'   /* Retrieve the $a variable for writing */  ASSIGN  $0, 123  /* Assign the numeric value 123 to retrieved variable 0 */  FETCH_W  $2, 'b'  /* Retrieve the $b variable for writing */  ASSIGN  $2, 456  /* Assign the numeric value 456 to retrieved variable 2 */  FETCH_R  $5, 'a'   /* Retrieve the $a variable for reading */  FETCH_R  $6, 'b'  /* Retrieve the $b variable for reading */  ADD  ~7, $5, $6  /* Add the retrieved variables (5 & 6) together and store the result in 7 */  FETCH_W  $4, 'c'  /* Retrieve the $c variable for writing */  ASSIGN  $4, ~7  /* Assign the value in temporary variable 7 into retrieved variable 4 */  FETCH_R  $9, 'c'  /* Retrieve the $c variable for reading */  ECHO  $9  /* Echo the retrieved variable 9 */  <?php  $a = 123;  $b = 456;  $c = $a + $b; echo $c;  ?>  Note: Each time $a is accessed we look it up in symbol table and store result in a different VAR
VAR variables typedef union _temp_variable { zval tmp_var; struct { zval **ptr_ptr; zval *ptr; zend_bool fcall_returned_reference; } var; …  etc } temp_variable; ZVAL typedef union _temp_variable { zval tmp_var; struct { zval **ptr_ptr; zval *ptr; zend_bool fcall_returned_reference; } var; …  etc } temp_variable; ZVAL After FETCH_R After FETCH_RW or FETCH_W pDataPtr symbol_table pDataPtr symbol_table
Compiled Variables Introduced in PHP 5.1 Avoids need for expensive symbol table lookup EVERY TIME a symbol is referenced The “var” field in the operand's znode contains the index into the CV cache When variable is initialized at runtime engine looks up symbol in symbol table and stores zval ** in a CV cache addressed from zend_execute_data Hash value of variable calculated at compile time which allows “quick” HT functions to be used at runtime Subsequent uses of CV avoid symbol table lookup All references to same symbol by a function/method refer to same CV, unlike temporary variables Only supported for simple variables i.e. not object properties, auto globals or “this” pointer For more information see Sara Golemon’s blog on the subject: http://blog.libssh2.org/index.php?/archives/21-Compiled-Variables.html
Compiled Variables – compile time processing An array of eligible variables constructed at compile time by lookup_CV() Address of array stored in “vars” field of zend_op_array For any variable eligible to be a CV compiler walks the current “vars” array to check for a match. If found, index returned. If not found then it is added in the next free slot, recording: Name and length of symbol Hash code Array allocated from heap. When array fills it is extended by 16 entries by erealloc struct _zend_op_array { …… zend_compiled_variable *vars; int last_var, size_var; … etc. }; last_var contains index of last slot used, size_var last available index typedef struct zend_compiled_variable { char *name; int name_len; ulong hash_value; } zend_compiled_variable;
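The search-or-append behaviour described for lookup_CV() can be sketched in self-contained C. This is an illustration under stated assumptions rather than the engine's code: the `sketch_*` names are hypothetical, `realloc` stands in for erealloc, the hash is a simple times-33 hash picked for the example, and `last_var` is treated here as the count of slots in use.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *name;
    int name_len;
    unsigned long hash_value;  /* computed once, at compile time */
} sketch_compiled_variable;

typedef struct {
    sketch_compiled_variable *vars;
    int last_var;  /* slots in use (assumption for this sketch) */
    int size_var;  /* slots allocated */
} sketch_cv_table;

/* Simple times-33 string hash, used here only for illustration. */
static unsigned long sketch_hash(const char *s, int len)
{
    unsigned long h = 5381;
    for (int i = 0; i < len; i++) {
        h = h * 33 + (unsigned char)s[i];
    }
    return h;
}

/* Return the CV index for name, appending a new entry if it is not yet known.
 * Returns -1 if memory cannot be allocated. */
static int sketch_lookup_cv(sketch_cv_table *t, const char *name, int name_len)
{
    assert(t != NULL && name != NULL && name_len >= 0);
    unsigned long h = sketch_hash(name, name_len);

    /* Walk the existing entries looking for a match. */
    for (int i = 0; i < t->last_var; i++) {
        if (t->vars[i].hash_value == h &&
            t->vars[i].name_len == name_len &&
            memcmp(t->vars[i].name, name, (size_t)name_len) == 0) {
            return i;
        }
    }

    /* Not found: extend the array by 16 entries if it is full. */
    if (t->last_var == t->size_var) {
        int new_size = t->size_var + 16;
        sketch_compiled_variable *grown =
            realloc(t->vars, (size_t)new_size * sizeof *grown);
        if (grown == NULL) {
            return -1;
        }
        t->vars = grown;
        t->size_var = new_size;
    }

    char *copy = malloc((size_t)name_len + 1);
    if (copy == NULL) {
        return -1;
    }
    memcpy(copy, name, (size_t)name_len);
    copy[name_len] = '\0';

    t->vars[t->last_var].name = copy;
    t->vars[t->last_var].name_len = name_len;
    t->vars[t->last_var].hash_value = h;
    return t->last_var++;
}

/* Free all names and the array itself. */
static void sketch_free_cv_table(sketch_cv_table *t)
{
    assert(t != NULL);
    for (int i = 0; i < t->last_var; i++) {
        free(t->vars[i].name);
    }
    free(t->vars);
    t->vars = NULL;
    t->last_var = 0;
    t->size_var = 0;
}
```

The linear walk is cheap because a function usually has only a handful of simple variables, and storing the hash alongside the name is what lets the runtime use the "quick" hash table functions later.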
Compiled Variables – At runtime To access a CV, we extract the CV number from the znode and use it as an index into the CV cache. If CV cache slot non-zero then you have the zval **!! No symbol table lookup If CV cache slot is zero then it’s the first reference to X so: Look up X in symbol table using pre-computed hash, i.e. zend_hash_quick_find() First lookup of a symbol in a function will also fail, so the symbol is added to the symbol table at this point if the lookup is for W or RW, using information in “vars” array. Uses zend_hash_quick_update() using pre-computed hash again. If lookup is for R then CV cache set to address EG(uninitialized_zval) Save returned zval ** for X in CV cache On userspace “unset” the CV cache entry is set to NULL to force HT lookup on next reference to symbol [Diagram: znode.var indexes the CV cache (zval *** CVs in execute_data); each cache slot is a zval ** addressing the pDataPtr slot of a HT bucket in the symbol table, which holds the zval * for the ZVAL.]
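The caching idea can be shown with a short C sketch in which a counter records how often the symbol table is actually searched. Everything here is a simplified stand-in: the `sketch_*` names are hypothetical, the symbol table is a small fixed array rather than a HashTable, and the insert-on-write and uninitialized_zval cases are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct { int value; } sketch_zval;

/* One symbol table entry; slot plays the role of a bucket's pDataPtr. */
typedef struct {
    const char *name;
    sketch_zval *slot;
} sketch_symbol;

typedef struct {
    sketch_symbol entries[8];
    int count;
    int lookups;  /* how many times the table has been searched */
} sketch_symbol_table;

/* Add a symbol. Returns 0 on success, -1 if the table is full. */
static int sketch_symtab_add(sketch_symbol_table *st, const char *name,
                             sketch_zval *zv)
{
    assert(st != NULL && name != NULL);
    if (st->count >= 8) {
        return -1;
    }
    st->entries[st->count].name = name;
    st->entries[st->count].slot = zv;
    st->count++;
    return 0;
}

/* The expensive path: search the symbol table by name. */
static sketch_zval **sketch_symtab_find(sketch_symbol_table *st,
                                        const char *name)
{
    st->lookups++;
    for (int i = 0; i < st->count; i++) {
        if (strcmp(st->entries[i].name, name) == 0) {
            return &st->entries[i].slot;
        }
    }
    return NULL;
}

/* Fetch CV number var: use the cached zval ** if present, otherwise
 * look the name up once and remember the result. */
static sketch_zval **sketch_get_cv(sketch_zval ***cvs, int var,
                                   const char **names,
                                   sketch_symbol_table *st)
{
    assert(cvs != NULL && names != NULL && st != NULL && var >= 0);
    if (cvs[var] == NULL) {
        cvs[var] = sketch_symtab_find(st, names[var]);
    }
    return cvs[var];
}
```

Reading the same CV twice searches the table only once. Resetting the cache slot to NULL, as the slide describes for unset, forces the next access back through the symbol table.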
Compiled variables: with regular variables  <?php  $a = 123;  $b = 456;  $c = $a + $b; echo $c;  ?>  FETCH_W  $0, 'a'  /* Retrieve the $a variable for writing */  ASSIGN  $0, 123  /* Assign the numeric value 123 to retrieved variable 0 */  FETCH_W  $2, 'b'  /* Retrieve the $b variable for writing */  ASSIGN  $2, 456  /* Assign the numeric value 456 to retrieved variable 2 */  FETCH_R  $5, 'a'  /* Retrieve the $a variable for reading */  FETCH_R  $6, 'b'  /* Retrieve the $b variable for reading */  ADD  ~7, $5, $6  /* Add the retrieved variables (5 & 6) together and store the result in 7 */  FETCH_W  $4, 'c'  /* Retrieve the $c variable for writing */  ASSIGN  $4, ~7  /* Assign the value in temporary variable 7 into retrieved variable 4 */  FETCH_R  $9, 'c'  /* Retrieve the $c variable for reading */  ECHO  $9  /* Echo the retrieved variable 9 */  ASSIGN  !0, 123  /* Assign the numeric value 123 to compiled variable 0 */  ASSIGN  !1, 456  /* Assign the numeric value 456 to compiled variable 1 */  ADD  ~2, !0, !1  /* Add compiled variable 0 to compiled variable 1 */  ASSIGN  !2, ~2  /* Assign the value of temporary variable 2 to compiled variable 2 */  ECHO  !2  /* Echo the value of compiled variable 2 */  Without CV With CV
Compiled variables: with object variables

    <?php
    $f->a = 123;
    $f->b = 456;
    $f->c = $f->a + $f->b;
    echo $f->c;
    ?>

With CV:

    ASSIGN_OBJ   !0, 'a'       /* Assign the numeric value 123 to property 'a' of compiled variable 0 object */
    OP_DATA      123           /* Additional data for ASSIGN_OBJ opcode */
    ASSIGN_OBJ   !0, 'b'       /* Assign the numeric value 456 to property 'b' of compiled variable 0 object */
    OP_DATA      456           /* Additional data for ASSIGN_OBJ opcode */
    FETCH_OBJ_R  $3, !0, 'a'   /* Retrieve property 'a' from compiled variable 0 object */
    FETCH_OBJ_R  $4, !0, 'b'   /* Retrieve property 'b' from compiled variable 0 object */
    ADD          ~5, $3, $4    /* Add those values and store the result in temp var 5 */
    ASSIGN_OBJ   !0, 'c'       /* Assign the ADD result to property 'c' of compiled variable 0 object */
    OP_DATA      ~5            /* Additional data for ASSIGN_OBJ opcode */
    FETCH_OBJ_R  $6, !0, 'c'   /* Retrieve property 'c' from compiled variable 0 object */
    ECHO         $6            /* Echo the value */

Note: Properties are re-fetched every time a read or write is performed on them. This cannot be avoided because the magic methods __get(), __set() etc. can return a different variable on different fetches.

Symbol tables
- One for global scope, created during RINIT processing by a call to init_executor()
  - A "GLOBALS" entry is added to the new symbol table; this is a recursive reference
  - Populated with the requested superglobals by a call to php_hash_environment()
    - _POST, _GET and _COOKIE
    - if INI register_argc_argv=on is specified, argv and argc are added
    - if INI auto_globals_jit=off is specified, _ENV, _SERVER and _REQUEST are added
    - if auto_globals_jit=on, they are added by the compiler when a reference is found; see zend_is_auto_global()
    - if INI register_long_arrays=on is specified, the long versions of _ENV, _POST etc. are added
    - if INI register_globals=yes, all globals are added to the symbol table
- Symbol tables are created for each called function/method at the time of the call
  - created and freed in zend_do_fcall_common_helper()
  - The HashTable for the symbol table is allocated from a cache if one is available
    - Up to 32 symbol table HashTables are cached
    - HashTables are cleared before being added to the cache, so there is no need to initialize them on allocation from the cache
  - Otherwise a new HashTable is allocated and initialized
- Symbols are added to the symbol table at runtime on first reference to a symbol
  - On reference to a symbol we do a HashTable lookup; if the lookup fails we add it
  - Initially it is set to reference the special zval uninitialized_zval, until it is assigned to

Which symbol table to use?
- EG(active_symbol_table) contains a reference to the current function/method symbol table
- However, super/auto globals are only stored in the global scope symbol table, so what happens if a function references one?
  - This is where the EA attribute of the znode comes into play
  - When it finds a reference to an auto_global, the compiler sets EA for op2 to direct the interpreter to FETCH from the global symbol table EG(symbol_table) rather than the current symbol table EG(active_symbol_table)
  - The compiler checks every symbol against the auto_globals hash table; see fetch_simple_variable_ex()
  - A different flag is set in EA for a static variable, class static etc.; see zend_get_target_symbol_table()

    <?php
    function foo() {
        $a = $_ENV;
    }
    foo();
    ?>

    line  #   op                     fetch   ext  operands
    -------------------------------------------------------------------------------
    9     0   FETCH_R                global       $0, '_ENV'
          1   ASSIGN                              !0, $0
    10    2   RETURN                              null
          3   ZEND_HANDLE_EXCEPTION

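As a rough sketch of the two halves described above, the compile-time check against the auto globals list and the run-time choice of table can be written as follows. The type names, the FETCH_* enum and both functions are hypothetical stand-ins, and the list of names is only an illustrative subset; the real logic lives in fetch_simple_variable_ex() and zend_get_target_symbol_table().

```c
#include <stddef.h>
#include <string.h>

/* Stand-ins for the EA fetch type and the engine's tables (hypothetical names). */
typedef enum { FETCH_LOCAL, FETCH_GLOBAL, FETCH_STATIC } fetch_type_t;

typedef struct { const char *label; } hashtable_t;

typedef struct {
    hashtable_t  symbol_table;          /* global scope symbol table */
    hashtable_t *active_symbol_table;   /* current function/method table */
} executor_globals_t;

/* Illustrative subset of auto globals, not the engine's registry. */
static const char *const auto_globals[] = {
    "GLOBALS", "_GET", "_POST", "_COOKIE", "_SERVER", "_ENV", "_REQUEST"
};

/* Compile time: decide which fetch type to record in the operand's EA. */
fetch_type_t compile_fetch_type(const char *name)
{
    size_t n = sizeof auto_globals / sizeof auto_globals[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(auto_globals[i], name) == 0) {
            return FETCH_GLOBAL;
        }
    }
    return FETCH_LOCAL;
}

/* Run time: pick the table the FETCH opcode should search. */
hashtable_t *target_symbol_table(executor_globals_t *eg, fetch_type_t type,
                                 hashtable_t *static_variables)
{
    switch (type) {
    case FETCH_GLOBAL:
        return &eg->symbol_table;
    case FETCH_STATIC:
        return static_variables;
    case FETCH_LOCAL:
    default:
        return eg->active_symbol_table;
    }
}
```

The design point is that the decision is made once by the compiler, so the interpreter only has to switch on a flag instead of searching the auto globals list on every fetch.
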
Static Variables
- A HashTable referenced by the zend_op_array details all statics, if any, defined for a function/method
- When a static is accessed at runtime an entry is added to the local symbol table, EG(active_symbol_table)
- The entry is made to reference the same zval referenced by the static_variables HashTable, in a "change on write" set
  - just like a parameter passed by reference

Static Variables

    <?php
    function foo() {
        static $count = 0;
        $count++;
    }
    foo(); foo(); foo();
    ?>

vld output for foo():

    line  #   op                     fetch   ext  operands
    -------------------------------------------------------------------------------
    5     0   FETCH_W                static       $0, 'count'
          1   ASSIGN_REF                          !0, $0
    6     2   POST_INC                            ~1, !0
          3   FREE                                ~1
    8     4   RETURN                              null
          5   ZEND_HANDLE_EXCEPTION

[Diagram: the zval for count across three animation steps.
  Referenced by static_variables only: value=0 is_ref=0 refcount=1
  Referenced by static_variables and EG(active_symbol_table): value=0 is_ref=1 refcount=2
  After the increment: value=1 is_ref=1 refcount=2]

Function calling
- The opcode sequence depends on whether the target is known at compile time
- The target function name is not known at compile time if
  - the call site comes before the user function definition
  - conditional functions are used
  - Referred to in the code as a "dynamic function call"
- If the compiler cannot verify the function name at compile time then the sequence is
  - INIT_FCALL_BY_NAME, which performs a runtime check on the function name
  - SEND_* for each argument, with EA set to force arg checks for "pass by ref" at runtime
  - DO_FCALL_BY_NAME
- If the compiler can verify the function name then the sequence is just
  - SEND_* for each argument
  - DO_FCALL
- Example: foo(&$a, $b, 100)
    &$a   SEND_REF
    $b    SEND_VAR
    100   SEND_VAL

Function calling

    <?php
    $a = 10;
    $b = 5;
    foo($a, $b);
    function foo(&$x, &$y) {
        echo "foo called with: $x $y";
    }
    foo($a, $b);
    ?>

Opcodes for global scope:

    line  #   op                     fetch   ext  operands
    -------------------------------------------------------------------------------
    2     0   ECHO                                '%0D%0A+'
    4     1   ASSIGN                              !0, 10
    5     2   ASSIGN                              !1, 5
    6     3   INIT_FCALL_BY_NAME                  'foo'
          4   SEND_VAR                            !0
          5   SEND_VAR                            !1
          6   DO_FCALL_BY_NAME               2    0
    8     7   NOP
    14    8   SEND_REF                            !0
          9   SEND_REF                            !1
          10  DO_FCALL                       2    'foo', 0
    17    11  RETURN                              1
          12  ZEND_HANDLE_EXCEPTION

- The SEND_VAR opcode's extended_value is set for an FCALL_BY_NAME to force the SEND_VAR handler to check the expected args at RUNTIME; if call by REF is expected it re-dispatches to the SEND_REF handler
  - It uses EX(fbc), set by INIT_FCALL_BY_NAME, to access the required arg info
- extended_value on the FCALL opcode is the number of arguments passed

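The runtime re-dispatch described above might look roughly like this. It is a sketch under stated assumptions: zval_t, arg_info_t, function_t, send_var() and send_ref() are hypothetical stand-ins, `check_at_runtime` plays the role of the extended_value flag, `fbc` plays the role of EX(fbc), and zval separation is deliberately left out.

```c
#include <stddef.h>

/* Hypothetical stand-ins; not the engine's real structures. */
typedef struct { long value; int refcount; int is_ref; } zval_t;
typedef struct { int pass_by_reference; } arg_info_t;
typedef struct { const arg_info_t *arg_info; unsigned num_args; } function_t;
typedef enum { SENT_BY_VALUE, SENT_BY_REF } send_kind_t;

/* By-reference send: mark the zval as a reference and share it. */
send_kind_t send_ref(zval_t *zv)
{
    zv->is_ref = 1;
    zv->refcount++;
    return SENT_BY_REF;
}

/*
 * By-value send. When the call was compiled as a "by name" call, the
 * callee's arg info is consulted at run time and the send is handed
 * over to send_ref() if the callee declared that parameter by reference.
 * arg_num is 1-based, as in the RECV opcode's operand.
 */
send_kind_t send_var(const function_t *fbc, int check_at_runtime,
                     unsigned arg_num, zval_t *zv)
{
    if (check_at_runtime && fbc != NULL &&
        arg_num >= 1 && arg_num <= fbc->num_args &&
        fbc->arg_info[arg_num - 1].pass_by_reference) {
        return send_ref(zv);
    }
    zv->refcount++;   /* share the zval; a real engine may need to split it */
    return SENT_BY_VALUE;
}
```
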
Calling other user functions
- When a userspace function calls another user function or method, the arguments are pushed onto a LIFO argument stack
  - Its address is held in EG(argument_stack)
  - An initial argument stack of 64 slots is allocated by init_executor()
  - When it is full, a new stack twice the size is allocated
- The ZEND_SEND_* opcode for each argument pushes a zval * onto the argument stack
  - zvals are split when necessary

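A minimal sketch of such a stack, assuming only what the slide states (64 initial slots, doubling when full, one zval pointer pushed per SEND). The names arg_stack_t, arg_stack_push() and send_var() are hypothetical and this is not the engine's own pointer stack implementation.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a zval. */
typedef struct { long value; int refcount; int is_ref; } zval_t;

typedef struct {
    void  **elements;
    size_t  top;   /* number of slots in use */
    size_t  max;   /* number of slots allocated */
} arg_stack_t;

/* Allocate the initial 64-slot stack; returns 0 on success, -1 on failure. */
int arg_stack_init(arg_stack_t *s)
{
    s->top = 0;
    s->max = 64;
    s->elements = malloc(s->max * sizeof *s->elements);
    return s->elements != NULL ? 0 : -1;
}

/* Push one pointer, doubling the allocation when the stack is full. */
int arg_stack_push(arg_stack_t *s, void *item)
{
    if (s->top == s->max) {
        size_t  new_max = s->max * 2;
        void  **grown   = realloc(s->elements, new_max * sizeof *grown);
        if (grown == NULL) {
            return -1;
        }
        s->elements = grown;
        s->max      = new_max;
    }
    s->elements[s->top++] = item;
    return 0;
}

/* By-value send without splitting: share the zval and push its address. */
int send_var(arg_stack_t *s, zval_t *zv)
{
    zv->refcount++;
    return arg_stack_push(s, zv);
}

void arg_stack_destroy(arg_stack_t *s)
{
    free(s->elements);
    s->elements = NULL;
    s->top = s->max = 0;
}
```

Pushing a 65th argument in this sketch grows the stack from 64 to 128 slots.
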
Example 1: Passing arguments by value without splitting

    <?php
    function foo($x, $y) {
        echo "foo called with: $x $y";
    }
    $a = 10;
    $b = 5;
    $c = $a;
    foo($a, $b);
    ?>

Opcodes for global scope:

    line  #   op                     ext  operands
    --------------------------------------------------
    3     0   NOP
    8     1   ASSIGN                      !0, 10
    9     2   ASSIGN                      !1, 5
    10    3   ASSIGN                      !2, !0
    12    4   SEND_VAR                    !0
          5   SEND_VAR                    !1
          6   DO_FCALL               2    'foo', 0
    15    7   RETURN                      1
          8   ZEND_HANDLE_EXCEPTION

After opcode #5:
- symbol_table: a and c share the zval value=10 refcount=3 is_ref=0; b has the zval value=5 refcount=2 is_ref=0
- argument_stack: NULL, NULL, arg1, arg2, 2 (number of args)
- There is no need to split the zval for $a as it is part of a "copy on write" set

Example 2: Passing arguments by value when splitting required

    <?php
    function foo($x, $y) {
        echo "foo called with: $x $y";
    }
    $a = 10;
    $b = 5;
    $c = &$a;
    foo($a, $b);
    ?>

Opcodes for global scope:

    line  #   op                     ext  operands
    --------------------------------------------------
    3     0   NOP
    8     1   ASSIGN                      !0, 10
    9     2   ASSIGN                      !1, 5
    10    3   ASSIGN                      !2, !0
    12    4   SEND_VAR                    !0
          5   SEND_VAR                    !1
          6   DO_FCALL               2    'foo', 0
    15    7   RETURN                      1
          8   ZEND_HANDLE_EXCEPTION

After opcode #5:
- We have to split the zval for $a as it is part of a "change on write" set
- symbol_table: a and c share the zval value=10 refcount=2 is_ref=1; b has the zval value=5 refcount=2 is_ref=0
- The split copy pushed for arg1 is value=10 refcount=1 is_ref=0
- argument_stack: NULL, NULL, arg1, arg2, 2

Example 3: Passing arguments by reference when splitting required

    <?php
    function foo($x, $y) {
        echo "foo called with: $x $y";
    }
    $a = 10;
    $b = 5;
    $c = $a;
    foo(&$a, &$b);
    ?>

    line  #   op                     ext  operands
    --------------------------------------------------
    3     0   NOP
    8     1   ASSIGN                      !0, 10
    9     2   ASSIGN                      !1, 5
    10    3   ASSIGN                      !2, !0
    11    4   SEND_REF                    !0
          5   SEND_REF                    !1
          6   DO_FCALL               2    'foo', 0
    14    7   RETURN                      1
          8   ZEND_HANDLE_EXCEPTION

Note on the SEND_REF lines: the CV is actually in op1, not the result operand.

After opcode #5:
- Here we have to split the zval for $a as it is part of a "copy on write" set
- zval states shown on the slide for the symbol_table entries a, b, c: value=10 refcount=1 is_ref=0; value=10 refcount=2 is_ref=1; value=10 refcount=2 is_ref=1
- argument_stack: NULL, NULL, arg1, arg2, 2

Example 4: Passing arguments by reference (compile time)

    <?php
    $a = 10;
    $b = 5;
    $c = $a;
    foo($a, $b);
    function foo(&$x, &$y) {
        echo "foo called with: $x $y";
    }
    ?>

    line  #   op                     ext  operands
    ----------------------------------------------------
    5     0   ASSIGN                      !0, 10
    6     1   ASSIGN                      !1, 5
    7     2   ASSIGN                      !2, !0
    8     3   INIT_FCALL_BY_NAME          'foo'
          4   SEND_VAR                    !0
          5   SEND_VAR                    !1
          6   DO_FCALL_BY_NAME       2    0
    10    7   NOP
    16    8   RETURN                      1
          9   ZEND_HANDLE_EXCEPTION

After opcode #5:
- As it is an FCALL_BY_NAME, extra checks in SEND_VAR kick in to check the arg info for a compile-time call by ref. If so, it re-dispatches to SEND_REF to do the right thing.
- Not shown, but the op2 znode is used to save the argument number, which is used to index into the arg info structure
- zval states shown on the slide for the symbol_table entries a, b, c: value=10 refcount=1 is_ref=0; value=10 refcount=2 is_ref=1; value=10 refcount=2 is_ref=1
- argument_stack: NULL, NULL, arg1, arg2, 2

Receiving arguments passed by value

    <?php
    function foo($x, $y) {
        echo "foo called with: $x $y";
    }
    $a = 10;
    $b = 5;
    $c = $a;
    foo($a, $b);
    ?>

    line  #   op                     ext  operands
    --------------------------------------------------
    3     0   RECV                        1
          1   RECV                        2
    4     2   INIT_STRING                 ~0
          3   ADD_STRING                  ~0, ~0, 'foo'
          4   ADD_STRING                  ~0, ~0, '+'
          5   ADD_STRING                  ~0, ~0, 'called'
          etc.

Notes on RECV: the operand shown is the argument number; the result op is the CV for the arg.

After opcode #1:
- zval states shown on the slide: value=10 refcount=4 is_ref=0; value=10 refcount=3 is_ref=0
- caller's symbol_table: a, b, c
- callee symbol_table: x, y
- argument_stack: NULL, NULL, arg1, arg2, 2

Receiving arguments passed by reference

    <?php
    function foo($x, $y) {
        echo "foo called with: $x $y";
    }
    $a = 10;
    $b = 5;
    $c = $a;
    foo(&$a, &$b);
    ?>

    line  #   op                     ext  operands
    --------------------------------------------------
    3     0   RECV                        1
          1   RECV                        2
    4     2   INIT_STRING                 ~0
          3   ADD_STRING                  ~0, ~0, 'foo'
          4   ADD_STRING                  ~0, ~0, '+'
          5   ADD_STRING                  ~0, ~0, 'called'
          etc.

After opcode #1:
- zval states shown on the slide: value=10 refcount=1 is_ref=0; value=10 refcount=3 is_ref=1; value=10 refcount=3 is_ref=1
- caller's symbol_table: a, b, c
- callee symbol_table: x, y
- argument_stack: NULL, NULL, arg1, arg2, 2

Function binding

    <?php
    function a() {
        echo "called function a";
    }
    function b() {
        echo "called function b";
    }
    a();
    b();
    ?>

    line  #   op                     fetch   ext  operands
    -------------------------------------------------------------------------------
    3     0   NOP
    6     1   NOP
    10    2   DO_FCALL                       0    'a', 0
    11    3   DO_FCALL                       0    'b', 0
    14    4   RETURN                              1
          5   ZEND_HANDLE_EXCEPTION

What are these NOPs?

Function binding
- They are artefacts of "function binding"
- When the compiler encounters a function declaration in a script it generates a ZEND_DECLARE_FUNCTION opcode in the current opcode array
  - op1 is the long function name: \0<function name><file name><address>
    - e.g. " fooC:\Testcases\tes.php0012CD38", where the address is the character position of the last char of the function prototype in the script's buffer
  - op2 is the short name, i.e. just "foo"
  - a function table entry is added by the compiler for the long name
- After parsing the function body and generating its zend_op_array, the compiler then performs "early binding" for unconditional functions
  - This effectively executes the opcode at compile time. See zend_do_early_binding().
  - The opcode checks for duplicate function names:
    - It looks up the function table entry for the long name. This should always be successful.
    - It attempts to add a function entry with the short name using the zend_function_entry just retrieved. If this fails we have a duplicate function name, and an error message is produced detailing the filename and line number of the previous declaration.
  - If there is no duplicate then the ZEND_DECLARE_FUNCTION opcode is converted to a NOP
    - opcode set to ZEND_NOP
    - op1 and op2 set to UNUSED, and the zvals for the name strings freed
  - The function table entry for the "long name" is deleted

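The early binding steps above can be sketched as follows. Everything here is a hypothetical stand-in: the function table is a small array instead of a HashTable, a function body is just an int, the opcode values are illustrative, and the long name uses a printable "#" prefix because a real long name starts with a \0 byte that strcmp() could not handle.

```c
#include <stddef.h>
#include <string.h>

enum { OP_NOP = 0, OP_DECLARE_FUNCTION = 1 };   /* illustrative values only */

typedef struct {
    int         opcode;
    const char *long_name;    /* op1 in the slide */
    const char *short_name;   /* op2 in the slide */
} op_t;

#define MAX_FUNCS 16

typedef struct {
    const char *names[MAX_FUNCS];
    int         bodies[MAX_FUNCS];   /* stands in for a zend_op_array */
    int         count;
} func_table_t;

/* Returns the index of `name`, or -1 if it is not present. */
static int ft_find(const func_table_t *ft, const char *name)
{
    for (int i = 0; i < ft->count; i++) {
        if (strcmp(ft->names[i], name) == 0) {
            return i;
        }
    }
    return -1;
}

/* Adds an entry; fails with -1 if the name exists or the table is full. */
static int ft_add(func_table_t *ft, const char *name, int body)
{
    if (ft->count == MAX_FUNCS || ft_find(ft, name) >= 0) {
        return -1;
    }
    ft->names[ft->count]  = name;
    ft->bodies[ft->count] = body;
    ft->count++;
    return 0;
}

/* Removes entry `idx` by moving the last entry into its place. */
static void ft_del(func_table_t *ft, int idx)
{
    ft->count--;
    ft->names[idx]  = ft->names[ft->count];
    ft->bodies[idx] = ft->bodies[ft->count];
}

/*
 * Bind at compile time. Returns 0 on success, -1 if the long name is
 * missing, -2 if the short name is already declared (duplicate function).
 */
int early_bind(func_table_t *ft, op_t *op)
{
    int idx = ft_find(ft, op->long_name);
    if (idx < 0) {
        return -1;
    }
    if (ft_add(ft, op->short_name, ft->bodies[idx]) != 0) {
        return -2;                  /* the engine reports a redeclaration here */
    }
    ft_del(ft, idx);                /* drop the long name entry */
    op->opcode     = OP_NOP;        /* this is the NOP seen in the vld output */
    op->long_name  = NULL;
    op->short_name = NULL;
    return 0;
}
```
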
Conditional Functions
- The same function name can be defined multiple times with different content and/or signature
- A zend_op_array is generated for each different version of a conditional function
- Which function gets executed is not known until runtime
  - So function binding is delayed until runtime
  - ZEND_DECLARE_FUNCTION persists in the compiler output

    <?php
    $a = 10;
    if (a > 10) {
        function foo()
        {
            echo "foo has no parms";
        }
    } else {
        function foo($a)
        {
            echo "foo has 1 parm";
        }
    }
    if (a > 10) {
        foo();
    } else {
        foo($a);
    }

Conditional Functions

    line  #   op                     fetch   ext  operands
    -------------------------------------------------------------------------------
    2     0   ECHO                                '%0D%0A+'
    5     1   ASSIGN                              !0, 10
    7     2   FETCH_CONSTANT                      ~1, 'a'
          3   IS_SMALLER                          ~2, 10, ~1
          4   JMPZ                                ~2, ->7
    8     5   ZEND_DECLARE_FUNCTION               '%00fooC%3A%5CEclipse-PHP%5Cworkspace%5CTestcases%5Ctest.php0140E973', 'foo'
    12    6   JMP                                 ->8
    13    7   ZEND_DECLARE_FUNCTION               '%00fooC%3A%5CEclipse-PHP%5Cworkspace%5CTestcases%5Ctest.php0140E9B9', 'foo'
    20    8   FETCH_CONSTANT                      ~3, 'a'
          9   IS_SMALLER                          ~4, 10, ~3
          10  JMPZ                                ~4, ->14
          ... etc. ...

Conditional Functions

zend_op_array for foo():

    line  #   op                     fetch   ext  operands
    -------------------------------------------------------------------------------
    10    0   ECHO                                'foo+has+no+parms'
    11    1   RETURN                              null
          2   ZEND_HANDLE_EXCEPTION

zend_op_array for foo($a):

    line  #   op                     fetch   ext  operands
    -------------------------------------------------------------------------------
    13    0   RECV                                1
    15    1   ECHO                                'foo+has+1+parm'
    16    2   RETURN                              null
          3   ZEND_HANDLE_EXCEPTION

Exception Handling

    <?php
    function foo($x)
    {
        if ($x > 1) {
            throw new Exception;
        }
    }
    try {
        foo(1);
    }
    catch (Exception $e) {
        echo "exception 1";
    }
    try {
        foo(2);
    }
    catch (Exception $e) {
        echo "exception 2";
    }
    ?>

Opcodes for global scope:

    line  #   op                     fetch   ext  operands
    -------------------------------------------------------------------------------
    3     0   NOP
    11    1   SEND_VAL                            1
          2   DO_FCALL                       1    'foo', 0
    13    3   ZEND_FETCH_CLASS                    :1, 'Exception'
          4   ZEND_CATCH                          null, 'e'
    14    5   ECHO                                'exception+1'
    18    6   SEND_VAL                            2
          7   DO_FCALL                       1    'foo', 0
    20    8   ZEND_FETCH_CLASS                    :3, 'Exception'
          9   ZEND_CATCH                          null, 'e'
    21    10  ECHO                                'exception+2'
    25    11  RETURN                              1
          12  ZEND_HANDLE_EXCEPTION

Opcodes for foo():

    line  #   op                     fetch   ext  operands
    -------------------------------------------------------------------------------
    3     0   RECV                                1
    5     1   IS_SMALLER                          ~0, 1, !0
          2   JMPZ                                ~0, ->8
    6     3   ZEND_FETCH_CLASS                    :1, 'Exception'
          4   NEW                                 $2, :1
          5   DO_FCALL_BY_NAME               0    0
          6   ZEND_THROW                          $2
    7     7   JMP                                 ->8
    8     8   RETURN                              null
          9   ZEND_HANDLE_EXCEPTION

- Not shown by VLD, but the extended value of the CATCH opcode contains the opcode number to branch to if no exception was thrown.
- If no exception is thrown during the TRY block then we actually execute the ZEND_FETCH_CLASS and ZEND_CATCH opcodes. ZEND_CATCH, on finding no exception thrown, dispatches the first opcode after the end of the catch block, i.e. 6 in this case.

Exception handling
- When the ZEND_THROW opcode executes it sets EG(exception) and EG(opline_before_exception) before dispatching the ZEND_HANDLE_EXCEPTION opcode at the end of the current op array
  - see zend_throw_exception_internal()
- The ZEND_HANDLE_EXCEPTION opcode handler checks all try/catch blocks in the current scope to see if the range they cover includes the last opcode executed in the current scope before the exception
  - If one does, it dispatches the first opcode of the catch block, which will be ZEND_FETCH_CLASS
    - It uses an array built by the compiler which defines the scope of each try/catch block
    - The array records scope in terms of the opcode number of the first try block opcode and the opcode number of the first catch block opcode
  - If none does, it returns to the caller
    - On seeing EG(exception) still set, return processing sets EG(opline_before_exception) to the last opcode executed in the caller, i.e. the FCALL opcode, and then sets the next opcode in the caller to the ZEND_HANDLE_EXCEPTION opcode at the end of the caller's opcode array
- This repeats until a catch block is found or we return from global scope, at which point, if EG(exception) is set, an "uncaught exception" error message is produced

Exception handling

    struct _zend_op_array {
        /* ... */
        zend_try_catch_element *try_catch_array;
        int last_try_catch;
        /* ... etc */
    };

    typedef struct _zend_try_catch_element {
        zend_uint try_op;
        zend_uint catch_op;  /* ketchup! */
    } zend_try_catch_element;

    struct _zend_execute_data {
        struct _zend_op *opline;
        /* ... */
        zend_op_array *op_array;
        /* ... etc */
    };

- zend_try_catch_element contains the opcode numbers of the first opcode of the try and catch blocks
- try_catch_array is reallocated as each try/catch block is found by the compiler

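The range check that ZEND_HANDLE_EXCEPTION performs against this array can be sketched as below. This is an illustration only: try_catch_element_t and find_catch_op() are hypothetical names, and the sketch assumes that entries are recorded in the order the compiler meets the try blocks, so that a later match is the more deeply nested one.

```c
#include <stddef.h>

/* Stand-in for zend_try_catch_element (hypothetical name). */
typedef struct {
    unsigned try_op;     /* opcode number of the first try block opcode */
    unsigned catch_op;   /* opcode number of the first catch block opcode */
} try_catch_element_t;

/*
 * Given the number of the last opcode executed before the exception,
 * return the opcode number of the catch block to dispatch, or -1 if
 * no try block in this op array covers it (return to the caller).
 */
long find_catch_op(const try_catch_element_t *try_catch_array,
                   int last_try_catch, unsigned op_num)
{
    long target = -1;
    for (int i = 0; i < last_try_catch; i++) {
        if (op_num >= try_catch_array[i].try_op &&
            op_num <  try_catch_array[i].catch_op) {
            target = (long)try_catch_array[i].catch_op;
        }
    }
    return target;
}
```

With two entries {1, 3} and {6, 8}, an exception raised while opcode 7 is the last one executed resolves to catch opcode 8, while an op array with last_try_catch = 0 always yields -1 and the exception propagates to the caller.
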
Exception Handling

zend_op_array for global scope (last_try_catch = 2; try_catch_array = { try_op = 1, catch_op = 3 }, { try_op = 6, catch_op = 8 }):

    line  #   op                     fetch   ext  operands
    -------------------------------------------------------------------------------
    3     0   NOP
    11    1   SEND_VAL                            1
          2   DO_FCALL                       1    'foo', 0
    13    3   ZEND_FETCH_CLASS                    :1, 'Exception'
          4   ZEND_CATCH                          null, 'e'
    14    5   ECHO                                'exception+1'
    18    6   SEND_VAL                            2
          7   DO_FCALL                       1    'foo', 0
    20    8   ZEND_FETCH_CLASS                    :3, 'Exception'
          9   ZEND_CATCH                          null, 'e'
    21    10  ECHO                                'exception+2'
    25    11  RETURN                              1
          12  ZEND_HANDLE_EXCEPTION

zend_op_array for foo() (last_try_catch = 0; try_catch_array = null):

    line  #   op                     fetch   ext  operands
    -------------------------------------------------------------------------------
    3     0   RECV                                1
    5     1   IS_SMALLER                          ~0, 1, !0
          2   JMPZ                                ~0, ->8
    6     3   ZEND_FETCH_CLASS                    :1, 'Exception'
          4   NEW                                 $2, :1
          5   DO_FCALL_BY_NAME               0    0
          6   ZEND_THROW                          $2
    7     7   JMP                                 ->8
    8     8   RETURN                              null
          9   ZEND_HANDLE_EXCEPTION

Php opcodes sep2008

  • 8. Non-PHP statements line # op fetch ext operands ------------------------------------------------------------------------------- 3 0 ECHO '%3C%21--+example+for+ PHP+5.0.0+final+release+--%3E%0D%0A%0D%0A' 5 1 ASSIGN !0, 'localhost' 6 2 ASSIGN !1, 'root' 7 3 ASSIGN !2, '' 9 4 INIT_FCALL_BY_NAME 'mysql_connect' …… .. snip …. 22 ADD_STRING ~5, ~5, 'MySQL' 23 ASSIGN !4, ~5 14 24 JMP ->25 26 25 ECHO '%0D%0A%3Chtml%3E%0D%0A %0D%0A+%3Chead%3E%0D%0A++%3Ctitle%3EConnecting+user%3C%2Ftitle%3E%0D%0A+%3C%2Fhead%3E%0D %0A%0D%0A+%3Cbody%3E%0D%0A++%3Ch3%3E+%0D%0A+++' 26 ECHO !4 30 27 ECHO '+%0D%0A++%3C%2Fh3%3E %0D%0A+%3C%2Fbody%3E%0D%0A%0D%0A%3C%2Fhtml%3E' 28 RETURN 1 29 ZEND_HANDLE_EXCEPTION
  • 9. Opcodes Each Opcodes consists of: Opcode handler 1 or 2 input operands Optional result operand Optional “Extended value” Meaning opcode dependent, e.g on a ZEND_CAST it defines target type Line number in original source script Opcode. Range 0 - 151. All listed in zend_vm_opcodes.h Total size of each zend_op is 96 bytes Some operations consist of 2 opcodes e.g ZEND_ASSIGN_OBJ 2nd Opcode set to ZEND_OP_DATA struct _zend_op { opcode_handler_t handler; znode result; znode op1; znode op2; ulong extended_value; uint lineno; zend_uchar opcode; };
  • 10. znode One for each operand and result each znode is 24 bytes Type can be as follows: IS_CONST (0x1)‏ program literal IS_TMP_VAR (0x2)‏ temporary variable with no name intermediate result IS_VAR (0x4)‏ temporary variable with a name defined in symbol table IS_UNUSED (0x8)‏ operand not specified IS_CV (0x10)‏ optimized version of VAR For some opcodes type of znode is implied, e.g. for a JMP opcode’s op1 znode defines jump target address in “jmp_addr” EA defines “extended attributes” meanings opcode dependent e.g. on a ZEND_UNSET_VAR it defines if variable is static or not typedef struct _znode { int op_type; union { zval constant; zend_uint var; zend_uint opline_num; zend_op_array *op_array; zend_op *jmp_addr; struct { zend_uint var; /*dummy */ zend_uint type; } EA; } u; } znode;
  • 11. zend_compile_file()‏ Returns a pointer to zend_op_array for global scope first, it's not an array but a structure zend_op_array contains a pointer to an array of opcodes, plus much more including: pointer to array of complied variables details. More on these later. count of number of temporaries (TMP + VAR) required by opcodes i.e. the number of Zend Engine registers used pointer to hash table for all static's defined by the function the Hashtable is created and populated by the compiler if needed Compiler produces one zend_op_array for: global scope this is the one returned to caller of zend_compile_file and is saved in EG(active_op_array) each user function added to thread’s function table by compiler each user class method added to function table for class by compiler
  • 12. zend_compile_file()‏ Initial opcode array allocated by init_op_array() in zend_opcode.c allocated from heap sufficient for just 64 opcodes Reallocated each time it is full when by get_next_op() Reallocates new array 4 times current size Storage for opcode array freed by call to destroy_op_array() at request end For global scope called from zend_execute_scripts()‏ For functions and methods called by Hash Table dtor routine. More later struct _zend_op_array { …… . zend_uint *refcount; zend_op *opcodes; zend_uint last, size; zend_compiled_variable *vars; int last_var, size_var; zend_uint T;zend_brk_cont_element *brk_cont_array; zend_uint last_brk_cont; zend_uint current_brk_cont; zend_try_catch_element *try_catch_array; int last_try_catch; /* static variables support */ HashTable *static_variables; … .. e.t.c ; ZEND_OP ZEND_OP ZEND_OP ZEND_OP ZEND_OP
  • 13. Opcodes Not all opcode information can be determined as opcodes are generated by compiler, e.g. target address for a JMP opcode. So after all opcodes generated a 2 nd pass is made over opcode array to fill in the missing information: set target for all jump opcodes during compilation jump targets are opcode array index’s. These are changed to absolute addresses set opcode handler to as defined by executor generated: CALL: address of handler function GOTO: address of label SWITCH: identifier (int) for handling CASE block for any operands (op1 or op2) which are CONSTANTS modify zval to is_ref=1, refcount=2 to ensure zval copied trim opcode array to required size; i.e. free unused storage See pass_two() in zend_opcode.c
  • 14. Functions and Classes During MINIT 2 hash tables are built which the compiler uses GLOBAL_FUNCTION_TABLE Populated with names of all built-in functions, and functions defined by any enabled extensions GLOBAL_CLASS_TABLE Populated with default classes and any classes defined by enabled extensions The complier ADDs to both tables during compile step Any new entries are then removed again at request shutdown In a non-ZTS environment compiler updates the GLOBAL hash tables In a ZTS environment GLOBAL tables are read-only A separate r/w copy of each table is created for each new thread and populated from GLOBAL table in compiler_globals_ctor()‏
  • 15. Function table’s One function table per thread Address stored in executor globals (EG)‏ Function table is a Hashtable mapping function name to “zend_function” zend_function structure is itself a union of structure’s Populated with built in functions, extension functions e.t.c by copying GLOBAL_FUNCTIONS_TABLE built during MINIT in compiler_globals_ctor()‏ type == ZEND_INTERNAL_FUNCTION zend_function == zend_internal_function During each request functions defined by user script are added to function table at compile time type == ZEND_USER_FUNCTION zend_function == zend_op_array typedef union _zend_function { zend_uchar type; struct { zend_uchar type; /* never used */ char *function_name; zend_class_entry *scope; ……… . <SNIP > …………. zend_bool pass_rest_by_reference; unsigned char return_reference; } common; zend_op_array op_array; zend_internal_function internal_function; } zend_function;
  • 16. State of play after compile complete opcodes zend_op_array active_oparray symbol_table active_symbol_table function_table class_table executor_globals symbol_table function_table <?php function add5($a) { return $a + 5; } function sub5($a) { return $a - 5; } $a = add5(10); $b = sub5(15); ?> zend_op_array zend_op_array zend_internal_fucntion op_array for global scope zend_internal_fucntion GLOBALS _ENV HTTP_ENV_VARS …… . <internal func> add5 sub5 …… . <internal func>
  • 17. Function tables User entries removed from global function table during RSHUTDOWN processing by call to shutdown_executor()‏ As user function entries are added after all internal functions the code uses the zend_hash_reverse_apply() function to traverse threads function table entries backwards removing entries until type != ZEND_USER_FUNCTION Removal triggers HT dtor routine ZEND_FUNCTION_DTOR which in turn calls destroy_op_array() to free opcode array and other structures which hang of zend_op_array
  • 18. Class_table One class table per thread Address stored in executor globals (EG)‏ Class table is a Hashtable mapping class name to “zend_class_entry” Populated with default classes and extension defined classes by copying GLOBAL_CLASS_TABLE built during MINIT in compiler_globals_ctor()‏ During each request classes defined by user script are added to class table at compile time Each class has its own function table and compiler adds an entry for each method defined by a class
  • 19. State of play after compile complete : opcodes zend_op_array active_oparray symbol_table active_smbol_table function_table class_table executor_globals symbol_table class_table <?php class Dog { function bark()‏ { print &quot;Woof!&quot;; } function sit()‏ { print “Sit!!”; } } $pooch = new Dog; $pooch ->bark(); $pooch ->sit(); ?> zend_op_array function_table function_table zend_op_array function_table zend_internal_fucntion zend_internal_fucntion GLOBALS _ENV HTTP_ENV_VARS …… . <internal class> Dog …… . bark sit …… . <internal func> <internal func> …… . <internal method> <internal method> …… .
  • 20. Class table User class entries are removed by shutdown executor() by traversing threads class table backwards removing all entries until type != ZEND_USER_CLASS Removal triggers HT door routine ZEND_CLASS_DTOR which in turn calls destroy_zend_class()‏ destroy_zend_class() calls zend_hash_destroy() on the class’s function_table which walks the HT and calls dtor ZEND_FUNCTION_DTOR on each entry as described earlier
  • 21. Static variables Local scope but value retained across calls Hashtable allocated by compiler per function or method when first static variable defined Referenced by zend_op_array structure Statics added to Hashtable as found by compiler
  • 22. Examining compile results Two tools available for analysing results of compile VLD Parsekit Both available from PECL
  • 23. VLD Dumps opcodes for a given PHP script Written by Derick Rethans Download from PECL http://pecl.php.net/package/vld/0.8.0 Simple configuration --enable-vld[=shared] Invoked via command line switches php -dvld.active=[ 0 |1] –dvld.execute=[0| 1 ] –f <php script> Can override defaults in php.ini
  • 24. VLD No config.w32 file for Windows ARG_ENABLE(&quot;vld&quot;, “Enable Vulcan Opcode decoder&quot; , &quot;no&quot;); if (PHP_VLD != &quot;no&quot;) { EXTENSION(&quot;vld&quot;, &quot;vld.c srm_oparray.c&quot;); }
  • 25. VLD output line # op fetch ext operands -------------------------------------------------------------------------- 2 0 ECHO '%0A' 4 1 ASSIGN !0, 5 5 2 ASSIGN !1, 10 6 3 ADD ~2, !0, !1 4 ADD ~3, ~2, 99 5 ASSIGN !2, ~3 8 6 INIT_STRING ~5 7 ADD_STRING ~5, ~5, 'c' 8 ADD_STRING ~5, ~5, '%3D+' 9 ADD_VAR ~5, ~5, !2 10 ADD_STRING ~5, ~5, '+' 11 ADD_CHAR ~5, ~5, 10 12 ECHO ~5 11 13 RETURN 1 14 ZEND_HANDLE_EXCEPTION c= 114 <?php $a = 5; $b = 10; $c = $a + $b + 99; echo &quot;c= $c \n&quot;; ?> php -f test.php -dvld.active=1 KEY ! == compiler variable $ == variable ~ == temporary There are TMP’s defied for results here but they are not used and VLD does not list them
  • 26. Why all these "+" in VLD output for CONSTs? <?php echo "Hello World"; echo "Hello World"; echo "Hello + World"; ?> line # op fetch ext operands ------------------------------------------------------------------------------- 2 0 ECHO '%0D%0A+' 4 1 ECHO 'Hello+World' 5 2 ECHO 'Hello++++++++++++++++++++++++++++++++World' 6 3 ECHO 'Hello+%2B+World' 9 4 RETURN 1 5 ZEND_HANDLE_EXCEPTION Answer: VLD calls php_url_encode() on the CONST to format it before output, which amongst other things converts all spaces to "+". Internally white space is stored as 0x20 as you would expect.
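The effect described on this slide can be reproduced with a small stand-alone encoder. This is only a sketch of URL-style encoding as the VLD output suggests it behaves; `sketch_url_encode` is a hypothetical name, and the code is not taken from VLD or from PHP's php_url_encode().

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not PHP's implementation): spaces become '+',
 * ASCII letters, digits and "-_." pass through unchanged, and every other
 * byte becomes %XX with upper-case hex digits.
 * dst must have room for 3 * strlen(src) + 1 bytes. */
void sketch_url_encode(const char *src, char *dst)
{
    static const char hex[] = "0123456789ABCDEF";

    assert(src != NULL && dst != NULL);
    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;
        if (c == ' ') {
            *dst++ = '+';
        } else if (isalnum(c) || c == '-' || c == '_' || c == '.') {
            *dst++ = (char)c;
        } else {
            *dst++ = '%';
            *dst++ = hex[c >> 4];
            *dst++ = hex[c & 0x0F];
        }
    }
    *dst = '\0';
}
```

Passing "Hello + World" through this function gives the same text VLD prints for the third ECHO above: the two spaces become "+" and the literal plus sign becomes "%2B".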
  • 27. parsekit PHP opcode analyser written by Sara Golemon. Meant for development and debug only; some code is not thread safe. Download from PECL: http://pecl.php.net/package/parsekit Simple configuration: --enable-parsekit[=shared] Implements 5 functions: parsekit_compile_string parsekit_compile_file parsekit_func_arginfo parsekit_opcode_flags parsekit_opcode_name
  • 28. parsekit array parsekit_compile_string ( string phpcode [, array &errors [, int options]] ) Compiles and then analyzes the supplied string. errors: 2-dimensional array of errors encountered during compile; example of use in parsekit/examples. options: either PARSEKIT_SIMPLE or PARSEKIT_QUIET; PARSEKIT_QUIET results in more verbose output. array parsekit_compile_file ( string filename [, array &errors [, int options]] ) As above but takes the name of a .php file as input. array parsekit_func_arginfo (mixed function) Returns the arg_info data for a given user defined function/method. long parsekit_opcode_flags (long opcode) Returns flags which define return type, operand types etc. for an opcode. string parsekit_opcode_name (long opcode) Returns the name for a given opcode.
  • 29. parsekit_compile_string: SIMPLE output <?php $oparray = parsekit_compile_string('echo "HelloWorld";', $errors, PARSEKIT_SIMPLE); var_dump($oparray); ?> array(5) { [0]=> string(36) "ZEND_ECHO UNUSED 'HelloWorld' UNUSED" [1]=> string(30) "ZEND_RETURN UNUSED NULL UNUSED" [2]=> string(42) "ZEND_HANDLE_EXCEPTION UNUSED UNUSED UNUSED" ["function_table"]=> NULL ["class_table"]=> NULL }
  • 30. parsekit_compile_file: QUIET output <?php $oparray = parsekit_compile_string('echo "HelloWorld";', $errors, PARSEKIT_QUIET); var_dump($oparray); ?> array(20) { ["type"]=> int(2) ["type_name"]=> string(18) "ZEND_USER_FUNCTION" ["fn_flags"]=> int(0) ["num_args"]=> int(0) ["required_num_args"]=> int(0) ["pass_rest_by_reference"]=> bool(false) ["uses_this"]=> bool(false) ["line_start"]=> int(0) ["line_end"]=> int(0) ["return_reference"]=> bool(false) ["refcount"]=> int(1) ["last"]=> int(3) ["size"]=> int(3) ["T"]=> int(0) ["last_brk_cont"]=> int(0) ["current_brk_cont"]=> int(-1) ["backpatch_count"]=> int(0) ["done_pass_two"]=> bool(true) ["filename"]=> string(49) "C:\Testcases\helloWorld.php" ["opcodes"]=> array(3) { [0]=> array(5) { ["opcode"]=> int(40) ["opcode_name"]=> string(9) "ZEND_ECHO" ["flags"]=> int(768) ["op1"]=> array(3) { ["type"]=> int(1) ["type_name"]=> string(8) "IS_CONST" ["constant"]=> &string(11) "Hello World" } ["lineno"]=> int(3) etc…..
  • 31. parsekit_func_arginfo <?php function foo ($a, stdClass $b, &$c) { } $oparray = parsekit_func_arginfo('foo'); var_dump($oparray); ?> array(3) { [0]=> array(3) { ["name"]=> string(1) "a" ["allow_null"]=> bool(true) ["pass_by_reference"]=> bool(false) } [1]=> array(4) { ["name"]=> string(1) "b" ["class_name"]=> string(8) "stdClass" ["allow_null"]=> bool(false) ["pass_by_reference"]=> bool(false) } [2]=> array(3) { ["name"]=> string(1) "c" ["allow_null"]=> bool(true) ["pass_by_reference"]=> bool(true) } }
  • 32. parsekit_opcode_name <?php $opname = parsekit_opcode_name (61); var_dump($opname); ?> string(21) "ZEND_DO_FCALL_BY_NAME" <?php $opflags = parsekit_opcode_flags (61); var_dump($opflags); ?> int(16777218) The flags define whether the opcode takes op1 and op2, defines EA, sets a result etc.
  • 33. Execution path for a script php_execute_script() zend_execute_scripts() zend_execute() user call (function/method) include/require zend_compile_file()
  • 34. PHP Interpreter Can be generated in many flavours; 12 different versions are possible. Generated by a chunk of PHP code, zend_vm_gen.php. You need to understand regular expressions before attempting to read this code. The interpreter is generated from the definition of each opcode in zend_vm_def.h and a skeletal interpreter body in zend_vm_execute.skl
  • 35. Interpreter generation process zend_vm_gen.php takes zend_vm_execute.skl and zend_vm_def.h as input and generates zend_vm_execute.h and zend_vm_opcodes.h
  • 36. zend_vm_execute.skl {%DEFINES%} ZEND_API void {%EXECUTOR_NAME%}(zend_op_array *op_array TSRMLS_DC) { zend_execute_data execute_data; {%HELPER_VARS%} {%INTERNAL_LABELS%} if (EG(exception)) { return; } /* Initialize execute_data */ EX(fbc) = NULL; EX(object) = NULL; EX(old_error_reporting) = NULL; if (op_array->T < TEMP_VAR_STACK_LIMIT) { EX(Ts) = (temp_variable *) do_alloca(sizeof(temp_variable) * op_array->T); } else { EX(Ts) = (temp_variable *) safe_emalloc(sizeof(temp_variable), op_array->T, 0); } …… etc The {%...%} markers are triggers for zend_vm_gen.php to insert generated code
  • 37. zend_vm_def.h ZEND_VM_HANDLER(1, ZEND_ADD, CONST|TMP|VAR|CV, CONST|TMP|VAR|CV) { zend_op *opline = EX(opline); zend_free_op free_op1, free_op2; add_function(&EX_T(opline->result.u.var).tmp_var, GET_OP1_ZVAL_PTR(BP_VAR_R), GET_OP2_ZVAL_PTR(BP_VAR_R) TSRMLS_CC); FREE_OP1(); FREE_OP2(); ZEND_VM_NEXT_OPCODE(); } Annotations: opcode; opcode name; types accepted for op1; types accepted for op2; .. although this is just a macro!; helper function; triggers to PHP code to replace text
  • 38. Interpreter generation process Usage information: php zend_vm_gen.php [options] Options: --with-vm-kind=CALL|SWITCH|GOTO - select threading model (default is CALL) --without-specializer - disable executor specialization --with-old-executor - enable old executor --with-lines - enable #line directives. --with-vm-kind defines the execution method. CALL: each opcode handler is defined as a function. SWITCH: each opcode handler is a case block in one huge switch statement. GOTO: a label is defined for each opcode handler. --without-specializer means only one handler per opcode. With specializers, a handler is generated for each possible combination of operand types. A reported 20% speedup with specializers enabled over the old executor
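To make the difference between the CALL and SWITCH kinds concrete, here is a minimal sketch of both dispatch styles over a toy instruction set. The opcodes, structures and function names are invented for illustration and are not Zend code; the GOTO kind is omitted because it relies on the compiler-specific "labels as values" extension.

```c
#include <assert.h>
#include <stddef.h>

/* Toy instruction set, invented for this sketch (not Zend opcodes). */
enum { OP_ADD = 0, OP_MUL = 1, OP_HALT = 2 };

typedef struct {
    int opcode;
    long operand;
} toy_op;

/* CALL kind: one function per opcode, reached through a table of function
 * pointers. A handler returns 0 to continue and non-zero to stop. */
typedef int (*toy_handler)(const toy_op *op, long *acc);

static int handle_add(const toy_op *op, long *acc)  { *acc += op->operand; return 0; }
static int handle_mul(const toy_op *op, long *acc)  { *acc *= op->operand; return 0; }
static int handle_halt(const toy_op *op, long *acc) { (void)op; (void)acc; return 1; }

static const toy_handler toy_handlers[] = { handle_add, handle_mul, handle_halt };

/* Assumes every opcode in ops is one of the three defined above. */
long run_call(const toy_op *ops)
{
    long acc = 0;

    assert(ops != NULL);
    while (toy_handlers[ops->opcode](ops, &acc) == 0) {
        ops++;
    }
    return acc;
}

/* SWITCH kind: every handler is a case block inside one loop. */
long run_switch(const toy_op *ops)
{
    long acc = 0;

    assert(ops != NULL);
    for (;; ops++) {
        switch (ops->opcode) {
        case OP_ADD: acc += ops->operand; break;
        case OP_MUL: acc *= ops->operand; break;
        default:     return acc; /* OP_HALT, or anything unrecognised */
        }
    }
}
```

Both functions run the same program and return the same accumulator; only the way control reaches each handler differs, which is the choice --with-vm-kind makes for the generated interpreter.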
  • 39. Interpreter generation process --with-old-executor enables a runtime decision to call the old pre-ZE2 type executor, which is a CALL type executor with no specializers. zend_vm_use_old_executor() is defined to switch executor model; there are no current callers though. --with-lines results in the addition of #line directives to the generated zend_vm_execute.h: #line 28 "C:\PHPDEV\php5.2-200612111130\Zend\zend_vm_def.h" static int ZEND_ADD_SPEC_CONST_CONST_HANDLER(ZEND_OPCODE_HANDLER_ARGS) { zend_op *opline = EX(opline); add_function(&EX_T(opline->result.u.var).tmp_var, … . etc The default interpreter, which is checked into CVS, is generated as follows: php zend_vm_gen.php --with-vm-kind=CALL
  • 40. Specialization With specialization enabled a handler is generated for each valid combination of input operand types. Each input operand (op1 and op2) can take 1 of 5 types: TMP VAR CV CONST UNUSED. This gives a theoretical 25 opcode handlers for each opcode
  • 41. zend_vm_def.h ZEND_VM_HANDLER(1, ZEND_ADD, CONST|TMP|VAR|CV, CONST|TMP|VAR|CV) { zend_op *opline = EX(opline); zend_free_op free_op1, free_op2; add_function(&EX_T(opline->result.u.var).tmp_var, GET_OP1_ZVAL_PTR(BP_VAR_R), GET_OP2_ZVAL_PTR(BP_VAR_R) TSRMLS_CC); FREE_OP1(); FREE_OP2(); ZEND_VM_NEXT_OPCODE(); }
  • 42. ZEND_ADD without specialization static int ZEND_ADD_HANDLER(ZEND_OPCODE_HANDLER_ARGS) { zend_op *opline = EX(opline); zend_free_op free_op1, free_op2; add_function(&EX_T(opline->result.u.var).tmp_var, get_zval_ptr(&opline->op1, EX(Ts), &free_op1, BP_VAR_R), get_zval_ptr(&opline->op2, EX(Ts), &free_op2, BP_VAR_R) TSRMLS_CC); FREE_OP(free_op1); FREE_OP(free_op2); ZEND_VM_NEXT_OPCODE(); } Handler calls non-type specific routines to get zval * for op1 and op2
  • 43. ZEND_ADD with specialization static int ZEND_ADD_SPEC_CONST_CONST_HANDLER (ZEND_OPCODE_HANDLER_ARGS) { zend_op *opline = EX(opline); add_function(&EX_T(opline->result.u.var).tmp_var, &opline->op1.u.constant, &opline->op2.u.constant TSRMLS_CC); ZEND_VM_NEXT_OPCODE(); } static int ZEND_ADD_SPEC_CONST_TMP_HANDLER (ZEND_OPCODE_HANDLER_ARGS) { zend_op *opline = EX(opline); zend_free_op free_op2; add_function(&EX_T(opline->result.u.var).tmp_var, &opline->op1.u.constant, _get_zval_ptr_tmp(&opline->op2, EX(Ts), &free_op2 TSRMLS_CC) TSRMLS_CC); zval_dtor(free_op2.var); ZEND_VM_NEXT_OPCODE(); } static int ZEND_ADD_SPEC_CONST_VAR_HANDLER (ZEND_OPCODE_HANDLER_ARGS) { zend_op *opline = EX(opline); zend_free_op free_op2; add_function(&EX_T(opline->result.u.var).tmp_var, &opline->op1.u.constant, _get_zval_ptr_var(&opline->op2, EX(Ts), &free_op2 TSRMLS_CC) TSRMLS_CC); if (free_op2.var) {zval_ptr_dtor(&free_op2.var);}; ZEND_VM_NEXT_OPCODE(); } … . and 13 other handlers. Handlers call type specific routines to get zval * for op1 and op2
  • 44. zend_vm_gen.php $op1_get_zval_ptr = array( "ANY" => "get_zval_ptr(&opline->op1, EX(Ts), &free_op1, \\1)", "TMP" => "_get_zval_ptr_tmp(&opline->op1, EX(Ts), &free_op1 TSRMLS_CC)", "VAR" => "_get_zval_ptr_var(&opline->op1, EX(Ts), &free_op1 TSRMLS_CC)", "CONST" => "&opline->op1.u.constant", "UNUSED" => "NULL", "CV" => "_get_zval_ptr_cv(&opline->op1, EX(Ts), \\1 TSRMLS_CC)", ); $op2_get_zval_ptr = array( "ANY" => "get_zval_ptr(&opline->op2, EX(Ts), &free_op2, \\1)", "TMP" => "_get_zval_ptr_tmp(&opline->op2, EX(Ts), &free_op2 TSRMLS_CC)", "VAR" => "_get_zval_ptr_var(&opline->op2, EX(Ts), &free_op2 TSRMLS_CC)", "CONST" => "&opline->op2.u.constant", "UNUSED" => "NULL", "CV" => "_get_zval_ptr_cv(&opline->op2, EX(Ts), \\1 TSRMLS_CC)", … ..<snip> function gen_code(….) …… $code = preg_replace( array( ......... "/GET_OP1_ZVAL_PTR\(([^)]*)\)/", "/GET_OP2_ZVAL_PTR\(([^)]*)\)/", ........ ), array( ....... ....... $op1_get_zval_ptr[$op1], $op2_get_zval_ptr[$op2], ....... ), $code);
  • 45. Generated code not always the best !! Input (zend_vm_def.h): ZEND_VM_HANDLER(71, ZEND_INIT_ARRAY, CONST|TMP|VAR|UNUSED|CV, CONST|TMP|VAR|UNUSED|CV) { zend_op *opline = EX(opline); array_init(&EX_T(opline->result.u.var).tmp_var); if (OP1_TYPE == IS_UNUSED) { ZEND_VM_NEXT_OPCODE(); #if !defined(ZEND_VM_SPEC) || OP1_TYPE != IS_UNUSED } else { ZEND_VM_DISPATCH_TO_HANDLER(ZEND_ADD_ARRAY_ELEMENT); #endif } } Output (zend_vm_execute.h): static int ZEND_INIT_ARRAY_SPEC_CONST_CONST_HANDLER(ZEND_OPCODE_HANDLER_ARGS) { zend_op *opline = EX(opline); array_init(&EX_T(opline->result.u.var).tmp_var); if (IS_CONST == IS_UNUSED) { ZEND_VM_NEXT_OPCODE(); #if 0 || IS_CONST != IS_UNUSED } else { return ZEND_ADD_ARRAY_ELEMENT_SPEC_CONST_CONST_HANDLER(ZEND_OPCODE_HANDLER_ARGS_PASSTHRU); #endif } }
  • 46. Mapping opcode to a handler The generated zend_vm_execute.h contains an array to map opcodes to handlers: without specializers the array has just 151 entries; with specializers 3775 (151 * 25) entries. zend_execute.c defines a function to enable the compiler to determine the correct handler for a given opcode: zend_vm_set_opcode_handler(zend_op *op) Decodes the type information for op1 and op2 in the supplied "zend_op" and picks the appropriate handler from the array of handlers. The handler returned will be either: a function pointer for the handler when CALL; the id of the handler routine for SWITCH; the address of the handler's label for GOTO. Mapping is performed at compile time: pass_two() of the compiler calls zend_vm_set_opcode_handler() to patch the handler into all generated opcodes
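The 151 * 25 arithmetic implies a flat table indexed by opcode and by the two operand types. The sketch below shows one way such an index could be computed; the slot ordering of the operand types and all names are assumptions made for illustration, not the engine's actual definitions.

```c
#include <assert.h>

#define SKETCH_OPCODE_COUNT 151 /* figure quoted on the slide */
#define SKETCH_TYPE_COUNT   5   /* CONST, TMP, VAR, UNUSED, CV */

/* Operand type slots; this ordering is an assumption of the sketch. */
enum sketch_type { T_CONST = 0, T_TMP = 1, T_VAR = 2, T_UNUSED = 3, T_CV = 4 };

/* Total number of entries in a fully specialized handler table. */
int sketch_table_size(void)
{
    return SKETCH_OPCODE_COUNT * SKETCH_TYPE_COUNT * SKETCH_TYPE_COUNT;
}

/* Each opcode owns a block of 25 consecutive entries; op1 selects a row of
 * five within the block and op2 selects the entry within that row. */
int sketch_handler_index(int opcode, enum sketch_type op1, enum sketch_type op2)
{
    assert(opcode >= 0 && opcode < SKETCH_OPCODE_COUNT);
    return opcode * SKETCH_TYPE_COUNT * SKETCH_TYPE_COUNT
         + (int)op1 * SKETCH_TYPE_COUNT
         + (int)op2;
}
```

With this layout opcode 1 (ZEND_ADD) with two CONST operands lands on entry 25, and the last entry of the table is index 3774. Combinations an opcode does not accept would simply hold a null or default handler.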
  • 47. zend_execute By default the zend_execute function pointer addresses the generated execute() routine in zend_vm_execute.h This is called by zend_execute_scripts() with: a pointer to the zend_op_array for global scope, and if ZTS is enabled the tsrm_ls pointer. The executor keeps state data for the current user function in a zend_execute_data structure which is allocated in the execute() stack frame. The address of the currently executing function's zend_execute_data is stored in EG. struct _zend_execute_data { struct _zend_op *opline; zend_function_state function_state; zend_function *fbc; /* Function Being Called */ zend_op_array *op_array; zval *object; union _temp_variable *Ts; zval ***CVs; zend_bool original_in_execution; HashTable *symbol_table; struct _zend_execute_data *prev_execute_data; zval *old_error_reporting; };
  • 48. execute() On entry acquire storage for: Temporary variables: the number of temporary variables used by the function is stored in the "T" field of zend_op_array. Storage is allocated on the stack if alloca() is available and T < 2000; if alloca() is not available or there are 2000+ temporaries it is allocated from the heap by emalloc. CV cache: the number of compiled variables used is stored in the "last_var" field of zend_op_array. Allocated on the stack regardless of size if alloca() is available, or by emalloc otherwise. Initialize zend_execute_data: initialize EX(opline) to address the first opcode to execute; EX(symbol_table) = EG(active_symbol_table); EX(prev_execute_data) = EG(current_execute_data); EG(current_execute_data) = &execute_data; Example: <?php function foo() { … } …… foo(); [Diagram: current_execute_data in executor_globals addresses the zend_execute_data for foo(), which chains to the zend_execute_data for global scope, which chains to null]
  • 49. Operand Types Operands op1 and op2 can be either: VAR ($): temporary variable into which the interpreter caches zval * and zval ** for a defined symbol. TMP (~): temporary variable where the interpreter keeps an intermediate result. For example, for $a = $b + $c the sum of $b and $c will be stored in a TMP before being assigned to $a. CV (!): compiled variable. Optimized version of a VAR. More to follow shortly. CONSTANT: program literal, e.g. $a = "hello". Symbols are also constants. ZVAL allocated by compiler. ZVAL has is_ref=1 refcount=2 to force split on assignment. UNUSED: operand not defined for opcode. The result operand can be VAR, TMP or CV
  • 50. Temporary Variables: VAR and TMP The "Ts" field of zend_execute_data addresses an array of temp_variables. The size of the array is based on information gathered by the compiler. Temporaries are each 24 bytes. The "var" field in the operand's znode contains the offset into the "temp_variables" array; T and EX_T macros are provided to do this. Temporary variables are NOT re-used by the compiler. struct _zend_execute_data { struct _zend_op *opline; …… . union _temp_variable *Ts; … .. etc }; typedef struct _znode { int op_type; union { zval constant; zend_uint var; zend_uint opline_num; zend_op_array *op_array; zend_op *jmp_addr; struct { zend_uint var; /* dummy */ zend_uint type; } EA; } u; } znode; typedef union _temp_variable { zval tmp_var; struct { zval **ptr_ptr; zval *ptr; zend_bool fcall_returned_reference; } var; struct { zval **ptr_ptr; zval *ptr; zend_bool fcall_returned_reference; zval *str; zend_uint offset; } str_offset; zend_class_entry *class_entry; } temp_variable;
  • 51. VAR variables FETCH_W $0, 'a' /* Retrieve the $a variable for writing */ ASSIGN $0, 123 /* Assign the numeric value 123 to retrieved variable 0 */ FETCH_W $2, 'b' /* Retrieve the $b variable for writing */ ASSIGN $2, 456 /* Assign the numeric value 456 to retrieved variable 2 */ FETCH_R $5, 'a' /* Retrieve the $a variable for reading */ FETCH_R $6, 'b' /* Retrieve the $b variable for reading */ ADD ~7, $5, $6 /* Add the retrieved variables (5 & 6) together and store the result in 7 */ FETCH_W $4, 'c' /* Retrieve the $c variable for writing */ ASSIGN $4, ~7 /* Assign the value in temporary variable 7 into retrieved variable 4 */ FETCH_R $9, 'c' /* Retrieve the $c variable for reading */ ECHO $9 /* Echo the retrieved variable 9 */ <?php $a = 123; $b = 456; $c = $a + $b; echo $c; ?> Note: Each time $a is accessed we look it up in symbol table and store result in a different VAR
  • 52. VAR variables typedef union _temp_variable { zval tmp_var; struct { zval **ptr_ptr; zval *ptr; zend_bool fcall_returned_reference; } var; … etc } temp_variable; [Diagram: two panels, "After FETCH_R" and "After FETCH_RW or FETCH_W", each showing the var member of temp_variable, the pDataPtr slot of the symbol_table bucket, and the ZVAL]
  • 53. Compiled Variables Introduced in PHP 5.1. Avoids the need for an expensive symbol table lookup EVERY TIME a symbol is referenced. The "var" field in the operand's znode contains the index into the CV cache. When a variable is initialized at runtime the engine looks up the symbol in the symbol table and stores the zval ** in a CV cache addressed from zend_execute_data. The hash value of the variable is calculated at compile time, which allows the "quick" HT functions to be used at runtime. Subsequent uses of the CV avoid the symbol table lookup. All references to the same symbol by a function/method refer to the same CV, unlike temporary variables. Only supported for simple variables, i.e. not object properties, auto globals or the "this" pointer. For more information see Sara Golemon's blog on the subject: http://blog.libssh2.org/index.php?/archives/21-Compiled-Variables.html
  • 54. Compiled Variables – compile time processing An array of eligible variables is constructed at compile time by lookup_CV(). The address of the array is stored in the "vars" field of zend_op_array. For any variable eligible to be a CV the compiler walks the current "vars" array to check for a match. If found, the index is returned. If not found then it is added in the next free slot, recording the name and length of the symbol and its hash code. The array is allocated from the heap. When the array fills it is extended by 16 entries by erealloc. struct _zend_op_array { …… zend_compiled_variable *vars; int last_var, size_var; … e.t.c }; last_var contains the index of the last slot used, size_var the last available index. typedef struct zend_compiled_variable { char *name; int name_len; ulong hash_value; } zend_compiled_variable;
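The compile-time walk described on this slide can be sketched as follows. This is a simplified stand-in, not the engine's lookup_CV(): all names are hypothetical, plain realloc/malloc replace erealloc/emalloc, the hash value is left out, and last_var and size_var are kept as counts rather than indexes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *name;
    int name_len;
} sketch_cv;

typedef struct {
    sketch_cv *vars;
    int last_var;  /* number of slots in use (a count, in this sketch) */
    int size_var;  /* number of slots allocated */
} sketch_op_array;

/* Returns the CV index for name, adding it when absent, or -1 if memory
 * runs out. */
int sketch_lookup_cv(sketch_op_array *oa, const char *name)
{
    int i, len;
    char *copy;

    assert(oa != NULL && name != NULL);
    len = (int)strlen(name);

    /* Walk the existing entries looking for a match. */
    for (i = 0; i < oa->last_var; i++) {
        if (oa->vars[i].name_len == len &&
            memcmp(oa->vars[i].name, name, (size_t)len) == 0) {
            return i;
        }
    }

    /* Array full: extend it by 16 entries. */
    if (oa->last_var == oa->size_var) {
        sketch_cv *grown = realloc(oa->vars,
                                   sizeof(sketch_cv) * (size_t)(oa->size_var + 16));
        if (grown == NULL) {
            return -1;
        }
        oa->vars = grown;
        oa->size_var += 16;
    }

    copy = malloc((size_t)len + 1);
    if (copy == NULL) {
        return -1;
    }
    memcpy(copy, name, (size_t)len + 1);

    oa->vars[oa->last_var].name = copy;
    oa->vars[oa->last_var].name_len = len;
    return oa->last_var++;
}

/* Releases everything sketch_lookup_cv() allocated. */
void sketch_free_cvs(sketch_op_array *oa)
{
    int i;

    assert(oa != NULL);
    for (i = 0; i < oa->last_var; i++) {
        free(oa->vars[i].name);
    }
    free(oa->vars);
    oa->vars = NULL;
    oa->last_var = oa->size_var = 0;
}
```

Looking up "a", then "b", then "a" again returns 0, 1 and 0, which is why every reference to the same symbol within one function ends up with the same CV number.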
  • 55. Compiled Variables – At runtime To access a CV, we extract the CV number from the znode and use it as an index into the CV cache. If the CV cache slot is non-zero then you have the zval **!! No symbol table lookup. If the CV cache slot is zero then it's the first reference to X so: look up X in the symbol table using the pre-computed hash, i.e. zend_hash_quick_find(). The first lookup of a symbol in a function will also fail, so the symbol is also added to the symbol table at this point if the lookup is for W or RW, using information in the "vars" array. This uses zend_hash_quick_update() with the pre-computed hash again. If the lookup is for R then the CV cache is set to address EG(uninitialized_zval). The returned zval ** for X is saved in the CV cache. On userspace "unset" the CV cache entry is set to NULL to force a HT lookup on the next reference to the symbol. [Diagram: the var field of the znode indexes the CV cache (zval *** CVs in execute_data); each cache slot holds a zval ** addressing the pDataPtr slot of a HT bucket in the symbol table, which holds the zval * for the ZVAL]
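The runtime side can be illustrated with a pointer cache in front of a deliberately slow lookup. This is a toy model only: a linear array stands in for the symbol table HashTable, a long stands in for a zval, there is no read/write distinction, and every name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_SYMBOLS 8

typedef struct {
    const char *name;
    long value;
} sketch_symbol;

typedef struct {
    sketch_symbol symbols[SKETCH_MAX_SYMBOLS]; /* stand-in for the symbol table */
    int symbol_count;
    int lookups;                               /* counts slow-path lookups */
} sketch_symtab;

/* Slow path: find the symbol by name, adding it when missing. */
static long *sketch_symtab_fetch(sketch_symtab *st, const char *name)
{
    int i;

    st->lookups++;
    for (i = 0; i < st->symbol_count; i++) {
        if (strcmp(st->symbols[i].name, name) == 0) {
            return &st->symbols[i].value;
        }
    }
    assert(st->symbol_count < SKETCH_MAX_SYMBOLS);
    st->symbols[st->symbol_count].name = name;
    st->symbols[st->symbol_count].value = 0;
    return &st->symbols[st->symbol_count++].value;
}

/* Fast path: an empty cache slot triggers exactly one symbol table lookup;
 * later accesses to the same CV number reuse the cached pointer. */
long *sketch_get_cv(sketch_symtab *st, long **cache, const char **names, int cv)
{
    assert(st != NULL && cache != NULL && names != NULL && cv >= 0);
    if (cache[cv] == NULL) {
        cache[cv] = sketch_symtab_fetch(st, names[cv]);
    }
    return cache[cv];
}
```

In this model, writing to CV 0 and then reading it back costs a single symbol table lookup; the second access is served from the cache slot, which is the saving the slide describes.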
  • 56. Compiled variables: with regular variables <?php $a = 123; $b = 456; $c = $a + $b; echo $c; ?> Without CV: FETCH_W $0, 'a' /* Retrieve the $a variable for writing */ ASSIGN $0, 123 /* Assign the numeric value 123 to retrieved variable 0 */ FETCH_W $2, 'b' /* Retrieve the $b variable for writing */ ASSIGN $2, 456 /* Assign the numeric value 456 to retrieved variable 2 */ FETCH_R $5, 'a' /* Retrieve the $a variable for reading */ FETCH_R $6, 'b' /* Retrieve the $b variable for reading */ ADD ~7, $5, $6 /* Add the retrieved variables (5 & 6) together and store the result in 7 */ FETCH_W $4, 'c' /* Retrieve the $c variable for writing */ ASSIGN $4, ~7 /* Assign the value in temporary variable 7 into retrieved variable 4 */ FETCH_R $9, 'c' /* Retrieve the $c variable for reading */ ECHO $9 /* Echo the retrieved variable 9 */ With CV: ASSIGN !0, 123 /* Assign the numeric value 123 to compiled variable 0 */ ASSIGN !1, 456 /* Assign the numeric value 456 to compiled variable 1 */ ADD ~2, !0, !1 /* Add compiled variable 0 to compiled variable 1 */ ASSIGN !2, ~2 /* Assign the value of temporary variable 2 to compiled variable 2 */ ECHO !2 /* Echo the value of compiled variable 2 */
  • 57. Compiled variables: with Object variables <?php $f->a = 123; $f->b = 456; $f->c = $f->a + $f->b; echo $f->c; ?> With CV: ASSIGN_OBJ !0, 'a' /* Assign the numeric value 123 to property 'a' of compiled variable 0 object */ OP_DATA 123 /* Additional data for ASSIGN_OBJ opcode */ ASSIGN_OBJ !0, 'b' /* Assign the numeric value 456 to property 'b' of compiled variable 0 object */ OP_DATA 456 /* Additional data for ASSIGN_OBJ opcode */ FETCH_OBJ_R $3, !0, 'a' /* Retrieve property 'a' from compiled variable 0 object */ FETCH_OBJ_R $4, !0, 'b' /* Retrieve property 'b' from compiled variable 0 object */ ADD ~5, $3, $4 /* Add those values and store the result in temp var 5 */ ASSIGN_OBJ !0, 'c' /* Assign the ADD result to property 'c' of compiled variable 0 object */ OP_DATA ~5 /* Additional data for ASSIGN_OBJ opcode */ FETCH_OBJ_R $6, !0, 'c' /* Retrieve property 'c' from compiled variable 0 object */ ECHO $6 /* Echo the value */ Note: Properties are re-fetched every time a read or write is performed on them. This cannot be avoided because the magic methods __get(), __set() etc. can return a different variable on different fetches
  • 58. Symbol tables One for global scope, created during RINIT processing by a call to init_executor(). Adds a "GLOBALS" entry to the new symbol table, which is a recursive reference. Populated with requested super globals by a call to php_hash_environment(): _POST, _GET, and _COOKIE; if INI register_argc_argv=on is specified then argv and argc are added; if INI auto_globals_jit=off is specified then _ENV, _SERVER and _REQUEST are added. If auto_globals_jit=on they are added by the compiler if a reference is found; see zend_is_auto_global(). If INI register_long_arrays=on is specified, long versions of _ENV, _POST etc. are added; if INI register_globals=yes then all globals are added to the symbol table. Symbol tables are created for each called function/method at the time of the call; created and freed in zend_do_fcall_common_helper(). The Hashtable for a symbol table is allocated from a cache if any are available. Up to 32 symbol table HashTables are cached. Hashtables are cleared before being added to the cache so there is no need to initialize on allocation from the cache. Otherwise a new HashTable is allocated and initialized. Symbols are added to the symbol table at runtime on first reference to a symbol. On reference to a symbol we do a Hashtable lookup; if the lookup fails we add it. Initially it's set to reference a special zval, uninitialized_zval, until it's assigned to
  • 59. Which symbol table to use? EG(active_symbol_table) contains a reference to the current function/method symbol table. However, as super/auto globals are only stored in the global scope symbol table, what happens if a function references one? This is where the EA attribute of znode comes into play!! EA for op2 is set by the compiler to direct the interpreter to FETCH from the global symbol table EG(symbol_table) rather than the current symbol table EG(active_symbol_table) when it finds a reference to an auto_global. The compiler checks every symbol against the auto_globals hash table; see fetch_simple_variable_ex(). A different flag is set in EA for static variables, class statics etc.; see zend_get_target_symbol_table() <?php function foo() { $a = $_ENV; } foo(); ?> line # op fetch ext operands ------------------------------------------------------------------------------- 9 0 FETCH_R global $0, '_ENV' 1 ASSIGN !0, $0 10 2 RETURN null 3 ZEND_HANDLE_EXCEPTION
  • 60. Static Variables The Hashtable referenced by zend_op_array details all statics, if any, defined for a function/method. When a static is accessed at runtime an entry is added to the local symbol table (EG(active_symbol_table)). The entry is made to reference the same zval referenced by the static_variables Hashtable, in a "Change on Write" set, just like a parameter passed by reference
  • 61. Static Variables <?php function foo() { static $count = 0; $count ++; } foo(); foo(); foo(); ?> vld output for foo(): line # op fetch ext operands ------------------------------------------------------------------------------- 5 0 FETCH_W static $0, 'count' 1 ASSIGN_REF !0, $0 6 2 POST_INC ~1, !0 3 FREE ~1 8 4 RETURN null 5 ZEND_HANDLE_EXCEPTION [Diagram: three states of the zval for count, referenced from static_variables and from EG(active_symbol_table): value=0 is_ref=0 refcount=1; then value=0 is_ref=1 refcount=2; then value=1 is_ref=1 refcount=2]
  • 62. Function calling The opcode sequence depends on whether the target is known at compile time. The target function name is not known at compile time if: the call site is before the user function definition; conditional functions are used. Referred to in code as a "dynamic function call". If the compiler cannot verify the function name at compile time then the sequence is: INIT_FCALL_BY_NAME, which performs a runtime check on the function name; SEND_* for each argument, with EA set to force arg checks for "pass by ref" at runtime; DO_FCALL_BY_NAME. If the compiler can verify the function name then the sequence is just: SEND_* for each argument; DO_FCALL. Example: foo(&$a, $b, 100) generates SEND_REF SEND_VAR SEND_VAL
  • 63. Function calling <?php $a= 10; $b= 5; foo($a, $b); function foo(&$x, &$y) { echo "foo called with: $x $y"; } foo($a, $b); ?> Opcodes for global scope: line # op fetch ext operands ---------------------------------------------------- ------------------- 2 0 ECHO '%0D%0A+' 4 1 ASSIGN !0, 10 5 2 ASSIGN !1, 5 6 3 INIT_FCALL_BY_NAME 'foo' 4 SEND_VAR !0 5 SEND_VAR !1 6 DO_FCALL_BY_NAME 2 0 8 7 NOP 14 8 SEND_REF !0 9 SEND_REF !1 10 DO_FCALL 2 'foo', 0 17 11 RETURN 1 12 ZEND_HANDLE_EXCEPTION The SEND_VAR opcode's extended_value is set when FCALL_BY_NAME to force the SEND_VAR handler to check expected args at RUNTIME; if call by REF is expected it re-dispatches to the SEND_REF handler. It uses EX(fbc), set by INIT_FCALL_BY_NAME, to access the required arg info. extended_value on the FCALL opcode is the number of arguments passed
  • 64. Calling other user functions When a user space function calls another user function or method the arguments are pushed to a LIFO argument stack, addressed by EG(argument_stack). An initial argument stack of 64 slots is allocated by init_executor(). When full, a new stack twice the size is allocated. The ZEND_SEND_* opcode for each argument pushes a zval * to the argument stack. zvals are split when necessary
  • 65. Example 1: Passing arguments by value without splitting <?php function foo($x, $y) { echo "foo called with: $x $y"; } $a= 10; $b= 5; $c= $a; foo($a, $b); ?> Opcodes for global scope: line # op ext operands -------------------------------------------------- 3 0 NOP 8 1 ASSIGN !0, 10 9 2 ASSIGN !1, 5 10 3 ASSIGN !2, !0 12 4 SEND_VAR !0 5 SEND_VAR !1 6 DO_FCALL 2 'foo', 0 15 7 RETURN 1 8 ZEND_HANDLE_EXCEPTION After opcode #5: no need to split the zval for $a as it's part of a "copy on write" set. [Diagram: symbol_table entries a, b, c and argument_stack slots NULL, NULL, arg1, arg2, 2 (number of args); zvals shown: value=10 refcount=3 is_ref=0; value=5 refcount=2 is_ref=0]
  • 66. Example 2: Passing arguments by value when splitting required <?php function foo($x, $y) { echo "foo called with: $x $y"; } $a= 10; $b= 5; $c= &$a; foo($a, $b); ?> Opcodes for global scope: line # op ext operands ------------------------------------------------- 3 0 NOP 8 1 ASSIGN !0, 10 9 2 ASSIGN !1, 5 10 3 ASSIGN !2, !0 12 4 SEND_VAR !0 5 SEND_VAR !1 6 DO_FCALL 2 'foo', 0 15 7 RETURN 1 8 ZEND_HANDLE_EXCEPTION After opcode #5: we have to split the zval for $a as it's part of a "change on write" set. [Diagram: symbol_table entries a, b, c and argument_stack slots NULL, NULL, arg1, arg2, 2; zvals shown: value=10 refcount=2 is_ref=1; value=5 refcount=2 is_ref=0; value=10 refcount=1 is_ref=0]
  • 67. Example 3: Passing arguments by reference when splitting required <?php function foo($x, $y) { echo "foo called with: $x $y"; } $a= 10; $b= 5; $c = $a; foo(&$a, &$b); ?> line # op ext operands ------------------------------------------------ 3 0 NOP 8 1 ASSIGN !0, 10 9 2 ASSIGN !1, 5 10 3 ASSIGN !2, !0 11 4 SEND_REF !0 5 SEND_REF !1 6 DO_FCALL 2 'foo', 0 14 7 RETURN 1 8 ZEND_HANDLE_EXCEPTION Note: the CV is actually in op1, not the result operand. After opcode #5: here we have to split the zval for $a as it's part of a "copy on write" set. [Diagram: symbol_table entries a, b, c and argument_stack slots NULL, NULL, arg1, arg2, 2; zvals shown: value=10 refcount=1 is_ref=0; value=10 refcount=2 is_ref=1; value=10 refcount=2 is_ref=1]
  • 68. Example 4: Passing arguments by reference (compile time) <?php $a= 10; $b= 5; $c= $a; foo($a, $b); function foo(&$x, &$y) { echo "foo called with: $x $y"; } ?> line # op ext operands ---------------------------------------------------- 5 0 ASSIGN !0, 10 6 1 ASSIGN !1, 5 7 2 ASSIGN !2, !0 8 3 INIT_FCALL_BY_NAME 'foo' 4 SEND_VAR !0 5 SEND_VAR !1 6 DO_FCALL_BY_NAME 2 0 10 7 NOP 16 8 RETURN 1 9 ZEND_HANDLE_EXCEPTION After opcode #5: as it's FCALL_BY_NAME, extra checks in SEND_VAR kick in to check the arg info for compile time call by ref. If so, it redispatches to SEND_REF to do the right thing! Not shown, but the op2 znode is used to save the argument number, which is used to index into the arg info structure. [Diagram: symbol_table entries a, b, c and argument_stack slots NULL, NULL, arg1, arg2, 2; zvals shown: value=10 refcount=1 is_ref=0; value=10 refcount=2 is_ref=1; value=10 refcount=2 is_ref=1]
  • 69. Receiving arguments passed by value <?php function foo($x, $y) { echo "foo called with: $x $y"; } $a= 10; $b= 5; $c = $a; foo($a, $b); ?> line # op ext operands -------------------------------------------------- 3 0 RECV 1 1 RECV 2 4 2 INIT_STRING ~0 3 ADD_STRING ~0, ~0, 'foo' 4 ADD_STRING ~0, ~0, '+' 5 ADD_STRING ~0, ~0, 'called' e.t.c The RECV operand is the argument number; the result op is the CV for the arg. After opcode #1: [Diagram: caller's symbol_table entries a, b, c, callee symbol_table entries x, y, and argument_stack slots NULL, NULL, arg1, arg2, 2; zvals shown: value=10 refcount=4 is_ref=0; value=10 refcount=3 is_ref=0]
  • 70. Receiving arguments passed by reference <?php function foo($x, $y) { echo "foo called with: $x $y"; } $a= 10; $b= 5; $c = $a; foo(&$a, &$b); ?> line # op ext operands -------------------------------------------------- 3 0 RECV 1 1 RECV 2 4 2 INIT_STRING ~0 3 ADD_STRING ~0, ~0, 'foo' 4 ADD_STRING ~0, ~0, '+' 5 ADD_STRING ~0, ~0, 'called' e.t.c After opcode #1: [Diagram: caller's symbol_table entries a, b, c, callee symbol_table entries x, y, and argument_stack slots NULL, NULL, arg1, arg2, 2; zvals shown: value=10 refcount=1 is_ref=0; value=10 refcount=3 is_ref=1; value=10 refcount=3 is_ref=1]
  • 71. Function binding <?php function a() { echo &quot;called function a&quot;; } function b() { echo &quot;called fucntio b&quot;; } a(); b(); ?> line # op fetch ext operands ------------------------------------------------------------------------------- 3 0 NOP 6 1 NOP 10 2 DO_FCALL 0 'a', 0 11 3 DO_FCALL 0 'b', 0 14 4 RETURN 1 5 ZEND_HANDLE_EXCEPTION What are these NOP’s ?
  • 72. Function binding They are artefacts of “function binding” When compiler encounters a function declaration in a script it generates a “ZEND_DECLARE_FUNCTION” opcode in current opcode array op1 is long function name \0<function name><file name><address> “ fooC:\Testcases\tes.php0012CD38” where address is character position of last char of function prototype in scripts buffer op2 is short name, i.e. just ”foo” a function table entry is added by compiler for long name After parsing function body and generating its zend_op_array compiler then performs “early binding” for the unconditional functions Effectively executes opcode at compile time. See zend_do_early_binding(). Opcode checks for duplicate function names Looks up function table entry for long name. This should always be successful !! Attempts to add a function entry with short name using zend_function_entry just retrieved. If this fails we have a duplicate function name and an error message is produced detailing filename and line number of previous declaration. If no duplicate then the ZEND_DECLARE_FUNCTION opcode is converted to a NOP opcode set to ZEND_NOP op1 and op2 set to UNUSED and zval’s for name strings freed Deletes function table entry for “long name”
  • 73. Conditional Functions
        The same function name can be defined multiple times with different content and/or signature.
        A zend_op_array is generated for each different version of a conditional function.
        Which function gets executed is not known until runtime, so function binding is delayed until runtime.
        ZEND_DECLARE_FUNCTION persists in the compiler output.

        <?php
        $a = 10;
        if (a > 10) {
            function foo()
            {
                echo "foo has no parms";
            }
        } else {
            function foo($a)
            {
                echo "foo has 1 parm";
            }
        }
        if (a > 10) {
            foo();
        } else {
            foo($a);
        }

        (Note: the conditions test the bare constant a rather than the variable $a, which is why the compiled output contains FETCH_CONSTANT opcodes for 'a'.)
  • 74. Conditional Functions

        line   #  op                      fetch  ext  operands
        -------------------------------------------------------------------------------
           2   0  ECHO                                '%0D%0A+'
           5   1  ASSIGN                              !0, 10
           7   2  FETCH_CONSTANT                      ~1, 'a'
               3  IS_SMALLER                          ~2, 10, ~1
               4  JMPZ                                ~2, ->7
           8   5  ZEND_DECLARE_FUNCTION               '%00fooC%3A%5CEclipse-PHP%5Cworkspace%5CTestcases%5Ctest.php0140E973', 'foo'
          12   6  JMP                                 ->8
          13   7  ZEND_DECLARE_FUNCTION               '%00fooC%3A%5CEclipse-PHP%5Cworkspace%5CTestcases%5Ctest.php0140E9B9', 'foo'
          20   8  FETCH_CONSTANT                      ~3, 'a'
               9  IS_SMALLER                          ~4, 10, ~3
              10  JMPZ                                ~4, ->14
                  . . . etc. . . .
  • 75. Conditional Functions

        zend_op_array for foo():

        line   #  op                      fetch  ext  operands
        -------------------------------------------------------------------------------
          10   0  ECHO                                'foo+has+no+parms'
          11   1  RETURN                              null
               2  ZEND_HANDLE_EXCEPTION

        zend_op_array for foo($a):

        line   #  op                      fetch  ext  operands
        -------------------------------------------------------------------------------
          13   0  RECV                                1
          15   1  ECHO                                'foo+has+1+parm'
          16   2  RETURN                              null
               3  ZEND_HANDLE_EXCEPTION
  • 76. Exception Handling

        <?php
        function foo($x)
        {
            if ($x > 1) {
                throw new Exception;
            }
        }

        try {
            foo(1);
        } catch (Exception $e) {
            echo "exception 1";
        }

        try {
            foo(2);
        } catch (Exception $e) {
            echo "exception 2";
        }
        ?>

        Main script:

        line   #  op                      fetch  ext  operands
        -------------------------------------------------------------------------------
           3   0  NOP
          11   1  SEND_VAL                            1
               2  DO_FCALL                       1    'foo', 0
          13   3  ZEND_FETCH_CLASS                    :1, 'Exception'
               4  ZEND_CATCH                          null, 'e'
          14   5  ECHO                                'exception+1'
          18   6  SEND_VAL                            2
               7  DO_FCALL                       1    'foo', 0
          20   8  ZEND_FETCH_CLASS                    :3, 'Exception'
               9  ZEND_CATCH                          null, 'e'
          21  10  ECHO                                'exception+2'
          25  11  RETURN                              1
              12  ZEND_HANDLE_EXCEPTION

        Function foo():

        line   #  op                      fetch  ext  operands
        -------------------------------------------------------------------------------
           3   0  RECV                                1
           5   1  IS_SMALLER                          ~0, 1, !0
               2  JMPZ                                ~0, ->8
           6   3  ZEND_FETCH_CLASS                    :1, 'Exception'
               4  NEW                                 $2, :1
               5  DO_FCALL_BY_NAME               0    0
               6  ZEND_THROW                          $2
           7   7  JMP                                 ->8
           8   8  RETURN                              null
               9  ZEND_HANDLE_EXCEPTION

        Not shown by VLD, but the extended value of the CATCH opcode contains the opcode number to branch to if no exception was thrown.
        If no exception is thrown during the try block, the ZEND_FETCH_CLASS and ZEND_CATCH opcodes are still executed. ZEND_CATCH, on finding no exception thrown, dispatches the first opcode after the end of the catch block, i.e. 6 in this case.
  • 77. Exception handling
        When the ZEND_THROW opcode executes it sets EG(exception) and EG(opline_before_exception) before dispatching the ZEND_HANDLE_EXCEPTION opcode at the end of the current op array; see zend_throw_exception_internal().
        The ZEND_HANDLE_EXCEPTION opcode handler checks all try/catch blocks in the current scope to see if the range they cover includes the last opcode executed in the current scope before the exception.
          - If any does, it dispatches the first opcode of the catch block, which will be ZEND_FETCH_CLASS.
              - It uses an array built by the compiler which defines the scope of each try/catch block.
              - The array records the scope in terms of the opcode number of the first try block opcode and the opcode number of the first catch block opcode.
          - If none does, it returns to the caller.
              - On seeing EG(exception) still set, return processing sets EG(opline_before_exception) to the last opcode executed in the caller, i.e. the FCALL opcode, and then sets the next opcode in the caller to the ZEND_HANDLE_EXCEPTION opcode at the end of the caller's opcode array.
        This repeats until a catch block is found or we return from global scope, at which point, if EG(exception) is still set, an "uncaught exception" error message is produced.
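The range check and the walk back through callers can be sketched in C as follows. This is a simplified model, not the engine's handler: the `toy_*` types and functions are hypothetical, the covered range is assumed to be try_op up to (but not including) catch_op, and keeping the last matching entry is an assumption made so that nested blocks would resolve to the innermost one.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NO_HANDLER (-1)

/* Hypothetical stand-ins modelled on the try_op / catch_op pair. */
typedef struct {
    unsigned int try_op;    /* opcode number of the first try block opcode   */
    unsigned int catch_op;  /* opcode number of the first catch block opcode */
} toy_try_catch;

typedef struct {
    const toy_try_catch *try_catch_array;
    int                  last_try_catch;  /* number of entries in the array */
} toy_op_array;

/* One active scope: its op array plus the last opcode executed in it
 * before the exception (for a caller, that is its FCALL opcode). */
typedef struct {
    const toy_op_array *ops;
    unsigned int        opline_before_exception;
} toy_frame;

/* Returns the opcode number of the catch block to dispatch,
 * or TOY_NO_HANDLER when no try block in this scope covers the opcode. */
int toy_find_catch(const toy_op_array *ops, unsigned int op_num)
{
    int target = TOY_NO_HANDLER;
    assert(ops != NULL);
    for (int i = 0; i < ops->last_try_catch; i++) {
        const toy_try_catch *tc = &ops->try_catch_array[i];
        if (op_num >= tc->try_op && op_num < tc->catch_op) {
            target = (int)tc->catch_op;  /* keep the last match */
        }
    }
    return target;
}

/* frames[0] is the scope where the exception was thrown, frames[depth - 1]
 * is global scope. Returns the index of the frame that catches and stores
 * the catch opcode number in *catch_op, or returns -1 for "uncaught". */
int toy_unwind(const toy_frame *frames, int depth, int *catch_op)
{
    for (int i = 0; i < depth; i++) {
        int target = toy_find_catch(frames[i].ops,
                                    frames[i].opline_before_exception);
        if (target != TOY_NO_HANDLER) {
            *catch_op = target;
            return i;
        }
    }
    return -1;
}
```

Using the opcode numbers from the dump on slide 76: foo() has no try/catch entries, so an exception thrown at its opcode 6 falls through to the main script, where the pending opcode is the DO_FCALL at 7. With main-script entries {1, 3} and {6, 8}, the lookup selects opcode 8, the ZEND_FETCH_CLASS that starts the second catch block.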
  • 78. Exception handling

        struct _zend_op_array {
            ...
            zend_try_catch_element *try_catch_array;
            int last_try_catch;
            ... etc
        };

        typedef struct _zend_try_catch_element {
            zend_uint try_op;
            zend_uint catch_op;  /* ketchup! */
        } zend_try_catch_element;

        struct _zend_execute_data {
            struct _zend_op *opline;
            ...
            zend_op_array *op_array;
            ... etc
        };

        zend_try_catch_element contains the opcode numbers of the first opcode of the try block and of the catch block.
        The try_catch_array is reallocated as each try/catch block is found by the compiler.
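A minimal sketch of how such an array might grow by one element per try block during compilation is shown below. The `toy_*` names are hypothetical and the two-step split (record try_op when the try block starts, fill in catch_op once the catch block's first opcode is known) is an assumption for illustration, not a description of the engine's actual helper functions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the structures on this slide. */
typedef struct {
    unsigned int try_op;
    unsigned int catch_op;
} toy_try_catch;

typedef struct {
    toy_try_catch *try_catch_array;
    int            last_try_catch;
} toy_op_array;

/* Grows the array by one element for a try block starting at try_op.
 * Returns the index of the new element, or -1 if allocation fails
 * (the existing array is left untouched in that case). */
int toy_add_try(toy_op_array *ops, unsigned int try_op)
{
    assert(ops != NULL);
    size_t count = (size_t)ops->last_try_catch + 1;
    toy_try_catch *grown = realloc(ops->try_catch_array, count * sizeof *grown);
    if (grown == NULL) {
        return -1;
    }
    ops->try_catch_array = grown;
    grown[count - 1].try_op = try_op;
    grown[count - 1].catch_op = 0;  /* not known yet */
    return ops->last_try_catch++;
}

/* Records the first opcode of the catch block for an earlier try entry. */
void toy_set_catch(toy_op_array *ops, int index, unsigned int catch_op)
{
    assert(ops != NULL);
    assert(index >= 0 && index < ops->last_try_catch);
    ops->try_catch_array[index].catch_op = catch_op;
}

/* Releases the array and resets the counters. */
void toy_free_try_catch(toy_op_array *ops)
{
    assert(ops != NULL);
    free(ops->try_catch_array);
    ops->try_catch_array = NULL;
    ops->last_try_catch = 0;
}
```

Calling `toy_add_try` with 1 and 6 and then `toy_set_catch` with 3 and 8 produces a two-element array with last_try_catch equal to 2. Growing by exactly one element per block keeps the sketch close to the slide's wording; a real allocator might well grow in larger steps.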
  • 79. Exception Handling

        Main script zend_op_array: last_try_catch = 2, try_catch_array = { try_op = 1, catch_op = 3 }, { try_op = 6, catch_op = 8 }

        line   #  op                      fetch  ext  operands
        -------------------------------------------------------------------------------
           3   0  NOP
          11   1  SEND_VAL                            1
               2  DO_FCALL                       1    'foo', 0
          13   3  ZEND_FETCH_CLASS                    :1, 'Exception'
               4  ZEND_CATCH                          null, 'e'
          14   5  ECHO                                'exception+1'
          18   6  SEND_VAL                            2
               7  DO_FCALL                       1    'foo', 0
          20   8  ZEND_FETCH_CLASS                    :3, 'Exception'
               9  ZEND_CATCH                          null, 'e'
          21  10  ECHO                                'exception+2'
          25  11  RETURN                              1
              12  ZEND_HANDLE_EXCEPTION

        Function foo() zend_op_array: last_try_catch = 0, try_catch_array = null

        line   #  op                      fetch  ext  operands
        -------------------------------------------------------------------------------
           3   0  RECV                                1
           5   1  IS_SMALLER                          ~0, 1, !0
               2  JMPZ                                ~0, ->8
           6   3  ZEND_FETCH_CLASS                    :1, 'Exception'
               4  NEW                                 $2, :1
               5  DO_FCALL_BY_NAME               0    0
               6  ZEND_THROW                          $2
           7   7  JMP                                 ->8
           8   8  RETURN                              null
               9  ZEND_HANDLE_EXCEPTION