A couple of weeks ago, I read Emil Stenström’s write-up on building JustHTML, then Simon Willison’s post about porting it to JavaScript with Codex. That inspired me to create a PHP port in the same vein: a pure PHP library for HTML parsing that passes the html5lib test suite and stays compatible with as many PHP versions as possible, while still feeling natural to use for PHP developers.
The result is justhtml-php, an HTML5 parser for PHP with CSS selectors, streaming, and text/HTML/Markdown output:
- GitHub: https://github.com/diffen/justhtml-php
- Composer/Packagist: https://packagist.org/packages/diffen/justhtml
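To give a feel for the API, here’s a tiny example. Treat it as a sketch: the class and method names are my shorthand and may not match the shipped API exactly, so check the README for the canonical Quickstart.

```php
<?php
// Sketch of typical usage; names here are illustrative assumptions,
// not necessarily the library's exact public API.
require 'vendor/autoload.php';

use JustHTML\JustHTML; // assumed namespace

$doc = new JustHTML('<ul><li class="first">One</li><li>Two</li></ul>');

// CSS selectors over the parsed tree
foreach ($doc->query('li') as $li) {
    echo $li->text(), "\n"; // prints "One" then "Two"
}
```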
The Setup
- ChatGPT $20-per-month plan.
- codex-cli 0.84.0; model GPT-5.2-Codex, reasoning set to High.
- Yolo mode: Nope
- Ralph loop: Nope
This was done over a couple of intermittently used Codex sessions across the week, with each session paused for many hours at a time. Nothing was Ralphed or Yoloed; I was hitting Yes for most of the actions Codex wanted to take.
How it went
My initial prompt was just a link to Emil’s JustHTML Python project on GitHub and a request to port it to PHP. The initial port came together quickly and without much input from me. Most of my time went into performance tuning, benchmarking, testing on all PHP versions, and improving the documentation to make it easier to understand. The agent also helped me publish the library as a Composer package on Packagist and hook it up to GitHub for auto-updates.
As part of performance tuning, Codex suggested adding a queryFirst method that consumes less time and memory than loading all nodes matching a CSS selector. This was a good example of the AI suggesting features, not just writing code.
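Roughly the idea (queryFirst is the method name Codex proposed; the surrounding API in this sketch is my illustration):

```php
// query() materializes every node that matches the selector;
// queryFirst() can return as soon as the first match is found,
// so on large documents it does less work and allocates less memory.
$all   = $doc->query('a[href]');      // full result set
$first = $doc->queryFirst('a[href]'); // stops at the first match
```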
What I learned
Codex CLI rarely hits limits
Claude Code tends to hit limits faster for me; Codex just kept going. I’m on the $20-per-month plan for both ChatGPT and Anthropic, but I didn’t encounter rate limiting even once on Codex. I pulled in Claude Opus 4.5 for one round of perf tuning after all the tests were already passing, and it couldn’t finish the job before it was rate limited. It also hit a token limit reading one of the files. Codex didn’t break a sweat. Compaction in Codex also seemed to work great: I could see the available context shrinking with every step, then suddenly it would be back up to something like 97% and it would keep chugging along without any apparent loss in quality.
I’m still too chicken for YOLO mode
I never ran Codex with `--yolo`. I did hit “yes” 99% of the time, but there were still moments where I felt it was important to plan first, or to review the edits Codex made before I allowed it to git commit and push. I don’t see myself getting into full delegation mode any time soon. Hats off to the masters who are doing that and running dozens of agents in parallel.
Approvals for commands don’t work well in codex
When prompting for permission to run something like `git commit -m "Optimize tokenizer hot paths"`, Codex gives me three options: 1) Yes; 2) Yes, and don’t ask again for commands that begin with `git commit -m "Optimize tokenizer hot paths"`; 3) No, do something else. I wanted to give it permission for all `git commit` commands, but that option was never presented. Claude Code is smarter about this. Codex’s behavior is very odd here, and I’m surprised no one at OpenAI has noticed.
Audio notifications would be very useful
The new development workflow in the age of LLMs is: give an agent a task, go doomscroll on Twitter, then come back. It’s easy to get distracted and forget to check in on your agent, which is then just sitting there waiting for your input. The real masters are proponents of Ralphing and yoloing, but for control freaks like me it would be useful if a ding played every time the agent is blocked on your input. Apparently you can set notification hooks for this, but I couldn’t manage to get them to work.
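For anyone who wants to try: the Codex CLI reads a notify setting from its config file that runs an external program on certain events, passing a JSON payload describing the event. Something like this should ding on macOS, though the exact key and supported events vary by version (which may be why I couldn’t get it working):

```toml
# ~/.codex/config.toml
# Codex invokes this program with a JSON argument describing the event;
# here we ignore the payload and just play a system sound (macOS).
notify = ["bash", "-c", "afplay /System/Library/Sounds/Ping.aiff"]
```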
Humans can help with documentation quality
I think I added some value in making the documentation more developer-friendly. Codex didn’t have the best Quickstart examples in the README; they were either very basic, lacked comments, or didn’t show all the main features. Being dumber than Codex helped: in order to understand the examples myself, I pushed Codex to explain them better.
The agent introduced silent fallbacks
Early on, a `DOM\HTMLDocument` benchmark would silently fall back to `DOMDocument` on older PHP versions. That made older versions report metrics for a feature that is only available in PHP 8.4+. I changed things so older versions show “not installed” instead of silently using libxml. You need to be skeptical about the agent’s outputs. It helps to have an intuition for what could go wrong, and if something is out of place, don’t ignore it. If you dig a little you might find the agent has made strange decisions you don’t agree with.
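The fix is an explicit capability check instead of a silent substitution. Schematically (a sketch, not the library’s exact benchmark code):

```php
// Dom\HTMLDocument ships with PHP 8.4+; on older versions, report the
// backend as unavailable instead of quietly benchmarking DOMDocument.
if (class_exists(\Dom\HTMLDocument::class)) {
    $doc = \Dom\HTMLDocument::createFromString($html);
    // ... benchmark the new parser here
} else {
    echo "DOM\\HTMLDocument: not installed\n";
}
```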
GPT-5.2-Codex is genuinely strong
The quality was excellent. The initial port was essentially a single shot. Excellent test cases are already available for this problem, which makes it a good fit for an autonomous agent. Then I ran some benchmarks and asked Codex how it could improve speed. It came up with ideas that brought per-document parsing time down from 20ms to about 11ms; when I pushed it further, it got down to about 7ms. Then I brought in Claude Opus 4.5 to see what it could do. Claude made a couple of performance tweaks, but the before/after benchmark numbers did not move. I love Anthropic (full disclosure: I’m an investor in Anthropic), but credit where it’s due: Codex performed really well. [Aside: I recently built a Lichess extension that delays your moves for a few seconds so you can undo them. I tried Claude first, then Gemini, and neither worked; Codex was the only one able to implement the extension. I might blog about that separately when I publish it.]
I’m never going to write another line of code ever again
I will tell agents what to do. I will get them to interview me and write detailed specs. I will look for the kinds of software development mistakes I have made myself or seen other coders make. I will ask the agent about security, performance, re-usability, good documentation. I will try to engineer a good system. I will try to verify it. I will be skeptical and perhaps even a little bit anal. But code I shall not write. Not any more. I learned C when I was 16, so this is the end of an era.
Agency, taste and high standards will matter
Much as I would like to think that my guidance helped improve this library, I don’t think technical acumen will be necessary in a few years. Non-technical people will be able to download skills or plugins that ask all the security/performance/load-balancing questions you need to ask for well-designed software. And models will get better at doing it right the first time. What matters is agency: do you want to build something? Do you have good taste, so you can discern what to improve? Do you have high standards and the patience to keep pushing the agent to improve? For example, I could have iterated on this blog post with an LLM and improved its flow, readability, and “hook” or virality. These three attributes will be the limiting factor in the future, not technical ability.
Benchmarks and PHP version support
A deliberate goal was PHP 7.4+ compatibility, so I ran tests and benchmarks across 7.4, 8.0, 8.1, 8.2, 8.3, 8.4, and 8.5. The README includes the PHP 8.4 results; benchmarks/README.md has the rest.
Benchmarking Dataset
Emil’s original project mentions the web100k dataset. The only version I could actually find was on Hugging Face, and after downloading the Parquet file I realized the 100k rows were plain text, not HTML. That made it unsuitable for parser benchmarking.
I asked Codex to find an alternative, and it suggested using Common Crawl instead. Codex then wrote a script to download pages directly from Common Crawl; the only decision I made was how many pages to pull down. That became the basis for the benchmarks.
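The gist of the approach (my reconstruction of the idea, not the actual script Codex wrote): query a Common Crawl index for a capture, then fetch just that record’s bytes from the WARC archive with an HTTP range request.

```php
<?php
// Look up one capture in a Common Crawl index (the crawl ID is an example).
$index = 'https://index.commoncrawl.org/CC-MAIN-2024-33-index';
$line  = file_get_contents($index . '?url=' . urlencode('example.com') . '&output=json&limit=1');
$rec   = json_decode(strtok($line, "\n"), true); // one JSON record per line

// Each record is an individually gzipped member inside a large WARC file,
// so a byte-range request returns a valid standalone gzip stream.
$start = (int) $rec['offset'];
$end   = $start + (int) $rec['length'] - 1;
$ctx   = stream_context_create(['http' => ['header' => "Range: bytes=$start-$end"]]);
$gz    = file_get_contents('https://data.commoncrawl.org/' . $rec['filename'], false, $ctx);

// The decompressed slice contains WARC headers, HTTP headers, and the HTML body.
echo substr(gzdecode($gz), 0, 200);
```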
Links and sources
- Emil Stenström’s JustHTML and his write-up:
- Simon Willison’s porting write-up:
- This PHP port: https://github.com/diffen/justhtml-php (Composer: https://packagist.org/packages/diffen/justhtml)