
The Magic Behind PHP

Atakan Demircioğlu
Published in Jotform Tech
Sep 7, 2023


Are you curious about how PHP works internally? This post is the first in a series about understanding PHP and how it works. In the next posts, I will cover OPcache, preloading, and JIT. But first, let's look at how PHP runs a script without any of these optimizations.

Definition of PHP

If you search for the definition of PHP in any search engine, you will find results like this: “Widely-used open source general-purpose scripting language that is especially suited for web development and can be embedded into HTML.”

Let’s look at the “scripting” part of this definition.

What is scripting?

In programming, there are two broad approaches: compiled languages and interpreted languages. PHP is an interpreted language.

The two approaches differ in when the source code is translated. If you write C or C++, you have to compile the program into a binary (an executable) after writing the code. Running the program again reuses the same executable, but if you change the source, you need to compile it again.

In interpreted languages like PHP, there is no separate compile step during development. Instead, an interpreter processes the source code each time the script runs. In the PHP world, this interpreter is the Zend Engine.

How does PHP work internally?

In simple terms, PHP runs a script in four steps: tokenizing, parsing, compiling, and executing.

Tokenize: In this step, PHP scans the source code character by character and groups the characters into tokens, a simplified data structure that the next steps work with. PHP's tokenizer (lexer) is generated by re2c, a free, open-source lexer generator, from a set of tokenizing rules. Comments are also removed in this step.

Note: Some characters like =, ;, :, ? are considered tokens by themselves.
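The tokenizing step can be sketched in a few lines of Python. This is a toy model, not PHP's lexer. The rules cover only a tiny PHP-like subset. The token names (T_VARIABLE, T_LNUMBER, T_ECHO, ...) are borrowed from PHP, but the regular expressions and the tokenize() function are assumptions made for illustration. In real PHP, you can inspect the actual tokens with the built-in token_get_all() function.

```python
import re

# Hypothetical rules for a tiny PHP-like subset; PHP's real rules are written for re2c.
# Order matters: the first pattern that matches at the current position wins.
TOKEN_RULES = [
    ("T_OPEN_TAG", r"<\?php\b"),
    ("T_COMMENT", r"//[^\n]*"),
    ("T_WHITESPACE", r"\s+"),
    ("T_VARIABLE", r"\$[A-Za-z_][A-Za-z0-9_]*"),
    ("T_LNUMBER", r"\d+"),
    ("T_ECHO", r"echo\b"),
    ("T_STRING", r"[A-Za-z_][A-Za-z0-9_]*"),  # identifiers
    ("CHAR", r"[=;:?+\-*/()]"),  # characters that are tokens by themselves
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_RULES))
SKIPPED = {"T_OPEN_TAG", "T_COMMENT", "T_WHITESPACE"}  # not passed on to the parser


def tokenize(source):
    """Scan the source left to right and return a list of (kind, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind, text = match.lastgroup, match.group()
        if kind == "CHAR":
            tokens.append((text, text))  # single-character tokens are named after themselves
        elif kind not in SKIPPED:
            tokens.append((kind, text))
        pos = match.end()
    return tokens


print(tokenize("<?php $a = 1 + 2; // add\necho $a;"))
```

In this sketch, the comment and the whitespace never reach the token list, and =, + and ; come out as their own single-character tokens.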

Parse (syntax analysis):

In this step, PHP

  • Checks whether the script follows the syntax rules
  • Uses the tokens to build an Abstract Syntax Tree (AST)

Using the php-ast extension, you can inspect the AST that PHP builds for a given script.
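A small recursive-descent parser shows both jobs of this step: checking the syntax and building a tree. This is a hypothetical Python sketch. The toy grammar, the tuple-based node format, and the Parser class are assumptions for illustration. PHP's real grammar and AST node kinds are much larger and look different. The input is a list of (kind, text) tokens such as a tokenizer would produce.

```python
class Parser:
    """Recursive-descent parser for a hypothetical toy grammar (not PHP's):

        statement := "echo" expr ";"
        expr      := term (("+" | "-") term)*
        term      := factor (("*" | "/") factor)*
        factor    := NUMBER | "(" expr ")"
    """

    def __init__(self, tokens):
        self.tokens = tokens  # list of (kind, text) pairs
        self.pos = 0

    def peek(self):
        """Return the kind of the next token, or None at the end of input."""
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def expect(self, kind):
        """Syntax check: consume the next token only if it has the expected kind."""
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind!r}, got {self.peek()!r}")
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def statement(self):
        self.expect("T_ECHO")
        node = ("ECHO", self.expr())
        self.expect(";")
        return node

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.expect(self.peek())[0]
            node = (op, node, self.term())  # left-associative binary node
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.expect(self.peek())[0]
            node = (op, node, self.factor())
        return node

    def factor(self):
        if self.peek() == "(":
            self.expect("(")
            node = self.expr()
            self.expect(")")
            return node
        return ("NUMBER", int(self.expect("T_LNUMBER")[1]))


def parse(tokens):
    """Parse one echo statement and reject any leftover tokens."""
    parser = Parser(tokens)
    tree = parser.statement()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return tree


# Tokens for: echo 1 + 2 * 3;
tokens = [("T_ECHO", "echo"), ("T_LNUMBER", "1"), ("+", "+"), ("T_LNUMBER", "2"),
          ("*", "*"), ("T_LNUMBER", "3"), (";", ";")]
print(parse(tokens))
```

Because term() sits below expr() in the grammar, 2 * 3 becomes a subtree of the + node. Dropping the final ; makes expect() raise a SyntaxError, which is this toy version of a parse error.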

Compilation: In this step, bytecode (called opcodes in the PHP world) is generated from the AST. Opcodes are essentially instructions for the Zend Engine's virtual machine.

For example, a basic script that echoes a string compiles to just two opcodes: an ECHO with the string as its operand, followed by a RETURN.
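The compilation step can be modeled as a walk over the AST that emits a flat list of instructions. This is a hypothetical Python sketch, not PHP's compiler. The tuple-based AST and compile_program() are assumptions. Each instruction is an (opcode, op1, op2, result) tuple, loosely mirroring PHP's opcode layout of up to two operands and a result. The Tmp class stands in for PHP's temporary variables. The opcode names ECHO, ADD, SUB, MUL, DIV and RETURN do exist in PHP.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Tmp:
    """Hypothetical temporary slot, printed as T1, T2, ..."""
    n: int

    def __repr__(self):
        return f"T{self.n}"


BINARY_OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def compile_program(ast):
    """Flatten a toy AST into a list of (opcode, op1, op2, result) instructions."""
    opcodes = []
    temps = count(1)  # hands out T1, T2, ...

    def compile_expr(node):
        kind = node[0]
        if kind in ("NUMBER", "STRING"):
            return node[1]  # literals are used directly as constant operands
        left = compile_expr(node[1])  # operands are compiled first...
        right = compile_expr(node[2])
        result = Tmp(next(temps))
        opcodes.append((BINARY_OPCODES[kind], left, right, result))  # ...then the operation
        return result

    if ast[0] != "ECHO":
        raise ValueError(f"unsupported statement: {ast[0]!r}")
    opcodes.append(("ECHO", compile_expr(ast[1]), None, None))
    opcodes.append(("RETURN", 1, None, None))  # the script always ends with RETURN
    return opcodes


print(compile_program(("ECHO", ("STRING", "Hello World"))))
# [('ECHO', 'Hello World', None, None), ('RETURN', 1, None, None)]
print(compile_program(("ECHO", ("+", ("NUMBER", 1), ("*", ("NUMBER", 2), ("NUMBER", 3))))))
# [('MUL', 2, 3, T1), ('ADD', 1, T1, T2), ('ECHO', T2, None, None), ('RETURN', 1, None, None)]
```

This sketch leaves out one thing the real compiler does. When an expression contains only literals, such as 1 + 2 * 3, PHP evaluates it at compile time and emits a single ECHO of the result. The toy compiler skips that optimization so the MUL and ADD opcodes stay visible.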

There are several ways to dump the opcodes of a script:

  • PHP’s internal OPcache extension
  • phpdbg debugger
  • Vulcan Logic Dumper (VLD), a PECL extension that makes it easy to dump all the opcodes of a snippet or a file

You will also always see a RETURN opcode at the end of the opcode list. This comes from how the Zend Engine's execution step works: it is essentially an infinite loop that runs the instructions one by one, and it needs a RETURN to stop.
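The execution loop can be sketched as a while True loop that fetches the next instruction, runs it, and moves on, with RETURN as the only way out. This is a hypothetical Python model, not the Zend Engine's executor, which is written in C and is far more involved. Two things here are assumptions: the instruction format, (opcode, op1, op2, result) tuples, and the Tmp class that marks temporaries.

```python
import operator
from dataclasses import dataclass


@dataclass(frozen=True)
class Tmp:
    """Hypothetical marker for a temporary slot (T1, T2, ...)."""
    n: int


# Arithmetic opcodes supported by this toy VM.
BINARY_OPS = {"ADD": operator.add, "SUB": operator.sub,
              "MUL": operator.mul, "DIV": operator.truediv}


def execute(opcodes):
    """Run (opcode, op1, op2, result) tuples one by one until a RETURN is reached."""
    temps = {}   # current values of temporaries, keyed by Tmp
    output = []  # text produced by ECHO
    ip = 0       # instruction pointer: index of the next opcode to run

    def read(operand):
        # A Tmp operand is looked up; anything else is a constant literal.
        return temps[operand] if isinstance(operand, Tmp) else operand

    while True:  # no exit condition here: only RETURN leaves the loop
        opcode, op1, op2, result = opcodes[ip]
        if opcode in BINARY_OPS:
            temps[result] = BINARY_OPS[opcode](read(op1), read(op2))
        elif opcode == "ECHO":
            output.append(str(read(op1)))
        elif opcode == "RETURN":
            return read(op1), "".join(output)
        else:
            raise RuntimeError(f"unknown opcode {opcode!r}")
        ip += 1  # move on to the next instruction


program = [
    ("MUL", 2, 3, Tmp(1)),       # T1 = 2 * 3
    ("ADD", 1, Tmp(1), Tmp(2)),  # T2 = 1 + T1
    ("ECHO", Tmp(2), None, None),
    ("RETURN", 1, None, None),
]
print(execute(program))  # (1, '7')
```

Without the trailing RETURN, this toy loop would run past the end of the list and fail with an IndexError. That mirrors why the opcode list always ends with a RETURN.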

In the next posts, I will cover OPcache, preloading, and also JIT.

Atakan Demircioğlu is a Full Stack Developer currently working at Jotform, a leading online form-builder platform.

He is passionate about blogging, learning, and creating, and shares his experiences on Medium and GitHub.

Twitter: https://twitter.com/atakde

GitHub: https://github.com/atakde

