URLs are a fundamental building block of the Web we rely on every day.
Their familiarity makes them appear deceptively simple: Seemingly clearly delineated components like scheme, hostname, path, and some others suggest that it’s trivial to extract information from a URL. In reality, there are thousands of custom parsers built over the years, each with their own take on details.
For us web developers, there are two main standards specifying how URLs are supposed to work. RFC 3986, which is the original URI standard from 2005; and the WHATWG URL Living Standard, which is followed by web browsers. Because things are not as simple as they appear at first glance, these two commonly used standards are incompatible with each other! Mixing and matching different standards and their parsers, especially when they do not exactly follow the standard, is something that commonly leads to security issues.
Despite the importance of correctly parsing URLs, PHP did not include any
standards-compliant parser within the standard library for the longest time.
There is the
parse_url()
function,
which has existed since PHP 4, but it does not follow any standard and is
explicitly documented not to be used with untrusted or malformed URLs.
Nevertheless, it is commonly used for lack of a better alternative that is
readily available and also because it appears to work correctly for a majority
of well-formed inputs that developers encounter in day-to-day work. This can
mislead developers to believe that the security issues of parse_url()
are a
purely theoretical problem rather than something that will cause issues
sooner or later.
As an example, the input URL example.com/example/:8080/foo
is a valid URL
consisting of only a relative path according to RFC 3986. It is invalid
according to the WHATWG URL standard when not resolved against a base URL.
However, according to parse_url()
it is a URL for the host example.com
,
port 8080 and path /example/:8080/foo
, thus including the 8080 in two of
the resulting components:
<?php
var_dump(parse_url('example.com/example/:8080/foo'));
/*
array(3) {
["host"]=> string(11) "example.com"
["port"]=> int(8080)
["path"]=> string(18) "/example/:8080/foo"
}
*/
This changes with PHP 8.5. Going forward, PHP will include standards-compliant parsers for both RFC 3986 and the WHATWG URL standard as an always-available part of its standard library within a new “URI” extension. Not only will this enable easy, correct, and secure parsing of URLs according to the respective standard, but the URI extension also includes functionality to modify individual components of a URL.
<?php
use Uri\Rfc3986\Uri;
$url = new Uri('HTTPS://thephp.foundation:443/sp%6Fnsor/');
$defaultPortForScheme = match ($url->getScheme()) {
'http' => 80,
'https' => 443,
'ssh' => 22,
default => null,
};
// Remove default ports from URLs.
if ($url->getPort() === $defaultPortForScheme) {
$url = $url->withPort(null);
}
// Getters normalize the URL by default. The `Raw`
// variants return the input unchanged.
echo $url->toString(), PHP_EOL;
// Prints: https://thephp.foundation/sponsor/
echo $url->toRawString(), PHP_EOL;
// Prints: HTTPS://thephp.foundation/sp%6Fnsor/
In this post we not only want to showcase the functionality but also tell you the story of how this project developed and how work gets done in PHP to keep the language modern and a great choice for web development. There is often more work behind new PHP features than meets the eye. We hope to provide some insight into why we prefer doing things right rather than fast.
Máté Kocsis from The PHP Foundation’s dev team
initially started discussion for his RFC of a new URL parsing
API in June 2024. Given PHP’s
strong backwards compatibility promise, the new API needed to get things right
on the first attempt in order to serve the PHP community well for the decade to
come without introducing disruptive changes. Thus, over the course of almost
one year, more than 150 emails on the PHP Internals
list were sent. Additionally,
several off-list discussions in various chat rooms have been had. Throughout
this process, various experts from the PHP community continuously refined the
RFC. They discussed even seemingly insignificant details, to provide not just a
standards-compliant implementation, but also a clean and robust API that will
guide developers towards the right solution for their use case. We also planned
ahead and made sure that the new URI extension with its dedicated Uri
namespace provides a clear path forward to add additional URI/URL-related
functionality in future versions of PHP.
The RFC ultimately went to vote in May 2025 and was accepted with a 30:1 vote. But work didn’t stop there: The proposed API also had to be implemented and reviewed. Instead of building a PHP-specific solution, Máté opted to stand on the shoulders of giants and selected two libraries to perform the heavy lifting. The uriparser library provides the RFC 3986 parser, and the Lexbor library, which is already used by PHP 8.4’s new DOM API, provides the WHATWG parser.
As part of the integration, Máté and The PHP Foundation worked together with
the upstream maintainers to include missing functionality in the respective
libraries. As an example, neither library included functionality to cheaply
duplicate the internal data structures, which was necessary to support cloning
the readonly PHP objects representing the parsed URL when attempting to modify
individual components with the so-called with-er methods (e.g.,
->withPort(8080)
). The uriparser library also did not include any functions
for modifying components of a parsed URL. All this functionality is now
available in the upstream libraries for everyone to use and benefit from.
The review and testing of Máté’s PHP implementation was carried out by PHP community contributors Niels Dossche and Ignace Nyamagana Butera. This included reviewing and testing the new functionality that had been added to the two upstream libraries. Tideways, a founding member and Silver sponsor of The PHP Foundation, also sponsored engineering time; their contribution came in the form of Tim Düsterhus. During the review and testing, these reviewers discovered several pre-existing bugs in the upstream libraries. They submitted fixes to the upstream maintainers, Sebastian Pipping (uriparser) and Alexander Borisov (Lexbor), who quickly reviewed and applied them.
This work paid off, and PHP’s new URI extension with not just one but two feature-rich and standards-compliant URI implementations is fully available for testing with PHP 8.5 RC 1.
If you'd like to see further improvements to PHP’s standard library please consider sponsoring The PHP Foundation.