The Magic Behind PHP

Published in Jotform Tech, Sep 7, 2023 (4 min read)

Are you curious about how PHP works internally? This post is the first in a series about better understanding PHP and how it works. In the next posts, I will cover OPcache, preloading, and also JIT. But first, let’s start with non-optimized PHP.

Definition of PHP

If you search for the definition of PHP in any search engine, you will find results like this: “Widely-used open source general-purpose scripting language that is especially suited for web development and can be embedded into HTML.”

Let’s look at the “scripting” part of this definition.

What is scripting?

In programming, we can look at two different approaches: compiled languages and interpreted languages. PHP is one of the interpreted languages.

There are major differences between these two approaches. If you are writing C or C++, you need to compile the program after writing the code to get a binary (executable). If you re-run the program, it still uses the same executable. If you change something, you need to compile it again.

But in interpreted languages like PHP, we don’t compile the program in the development phase. Instead, an interpreter processes the source code when the script is run. In the PHP world, the Zend Engine handles it.

How does PHP work internally?

In simple terms, PHP works by tokenizing, parsing, compiling, and executing.

Tokenize: In this step, PHP scans the source code character by character and creates tokens, which are basically a simplified data structure for the next steps. For tokenizing, PHP uses a lexer generated by re2c (a free, open-source lexer generator) from a given set of tokenizing rules. Comments are also removed in this step.

Note: Some characters like =, ;, :, ? are considered tokens by themselves.
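To see tokens for yourself, here is a minimal sketch using token_get_all() and token_name() from PHP's bundled tokenizer extension. The helper name describeTokens and the sample snippet are made up for this example. Note that token_get_all() also reports whitespace and comments (T_WHITESPACE, T_COMMENT), which the parser ignores.

```php
<?php
// Turn a source string into a list of [token name, token text] pairs.
// describeTokens() is a hypothetical helper written for this post.
function describeTokens(string $source): array
{
    $rows = [];
    foreach (token_get_all($source) as $token) {
        // Named tokens come back as [id, text, line];
        // single-character tokens such as "=" or ";" come back as plain strings.
        $rows[] = is_array($token)
            ? [token_name($token[0]), $token[1]]
            : ['(character)', $token];
    }
    return $rows;
}

foreach (describeTokens('<?php $a = 1; // set a') as [$name, $text]) {
    printf("%-14s %s\n", $name, json_encode($text));
}
```

In the printed list, = and ; show up as plain characters, while $a and 1 show up as named tokens (T_VARIABLE and T_LNUMBER).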

Parse (syntax analysis): In this step, PHP checks whether the script matches the syntax rules and uses the tokens to build an Abstract Syntax Tree (AST).

Using the php-ast extension, you can preview an example of this structure.
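If you don't have php-ast installed, the idea can still be sketched with a toy parser. The example below is not the Zend parser: it only understands integers, + and *, and the node layout (kind, children, value) and all function names are invented for illustration. It does show both jobs of the parse step: rejecting input that breaks the grammar and turning tokens into a tree.

```php
<?php
// Toy lexer: reuse token_get_all() and keep only numbers and operators.
function lexExpression(string $expr): array
{
    $tokens = [];
    foreach (token_get_all('<?php ' . $expr . ';') as $token) {
        if (is_array($token)) {
            if ($token[0] === T_LNUMBER) {
                $tokens[] = ['num', (int) $token[1]];
            } elseif ($token[0] !== T_OPEN_TAG && $token[0] !== T_WHITESPACE) {
                throw new InvalidArgumentException("Unexpected token: {$token[1]}");
            }
        } elseif ($token !== ';') {
            $tokens[] = ['op', $token];
        }
    }
    return $tokens;
}

// Grammar rule: sum := product ('+' product)*
function parseSum(array $tokens, int &$pos): array
{
    $node = parseProduct($tokens, $pos);
    while (($tokens[$pos] ?? null) === ['op', '+']) {
        $pos++;
        $node = ['kind' => 'add', 'children' => [$node, parseProduct($tokens, $pos)]];
    }
    return $node;
}

// Grammar rule: product := number ('*' number)*
function parseProduct(array $tokens, int &$pos): array
{
    $node = parseNumber($tokens, $pos);
    while (($tokens[$pos] ?? null) === ['op', '*']) {
        $pos++;
        $node = ['kind' => 'mul', 'children' => [$node, parseNumber($tokens, $pos)]];
    }
    return $node;
}

function parseNumber(array $tokens, int &$pos): array
{
    $token = $tokens[$pos] ?? null;
    if ($token === null || $token[0] !== 'num') {
        throw new InvalidArgumentException('Syntax error: expected a number');
    }
    $pos++;
    return ['kind' => 'num', 'value' => $token[1]];
}

function parseExpression(string $expr): array
{
    $tokens = lexExpression($expr);
    $pos = 0;
    $ast = parseSum($tokens, $pos);
    if ($pos !== count($tokens)) {
        throw new InvalidArgumentException('Syntax error: unexpected trailing token');
    }
    return $ast;
}

print_r(parseExpression('1 + 2 * 3'));
```

For 1 + 2 * 3 the root node is an add whose right child is a mul, so operator precedence is already encoded in the shape of the tree. An incomplete input such as 1 + is rejected with a syntax error.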

Compilation: In this step, the bytecode (opcodes in the PHP world) is generated from the AST. Opcodes are basically VM instructions.

For example, a basic script that echoes a string literal compiles to just two opcodes: an ECHO that prints the string and a RETURN that ends the script.

There are several ways to dump the opcodes of a given script:

PHP’s internal OPcache extension
phpdbg debugger
Vulcan Logic Dumper (VLD) PECL extension, which provides easy ways to dump all the opcodes for a given snippet or a file.

You will also always see a RETURN opcode at the end of the opcode list. This is related to how the Zend execution step works: it is basically an infinite loop that runs the instructions one by one, and it needs a return to stop that loop.
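The loop-until-return idea can be sketched in a few lines of PHP. This is a toy virtual machine, not the Zend Engine: ECHO and RETURN are real opcode names, but the array layout and the function runOpcodes are assumptions made for this example.

```php
<?php
// Toy VM: an endless loop that executes one instruction at a time.
// Each instruction is a hypothetical [name, operand] pair.
function runOpcodes(array $opcodes)
{
    $ip = 0; // instruction pointer
    while (true) {
        [$name, $operand] = $opcodes[$ip++];
        switch ($name) {
            case 'ECHO':
                echo $operand;
                break;
            case 'RETURN':
                return $operand; // the only way out of the loop
            default:
                throw new RuntimeException("Unknown opcode: $name");
        }
    }
}

$result = runOpcodes([
    ['ECHO', "Hello\n"],
    ['RETURN', 1],
]);
```

Without the trailing RETURN, this loop would run past the end of the instruction list, which is why the compiler always appends one.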

In the next posts, I will cover OPcache, preloading, and also JIT.


Atakan Demircioğlu is a Full Stack Developer currently working at Jotform, a leading online form-builder platform.

He is passionate about blogging, learning, and creating, and shares his experiences on Medium and GitHub.

Twitter: https://twitter.com/atakde

GitHub: https://github.com/atakde
