Regular Expressions in PHP


This blog has been about JavaScript so far. With the topic of regular expressions, however, we are diving into deeper waters, and, for variety, some other languages will also surface on our horizon. In this post, we will get started with writing and testing simple regular expressions in PHP. Don’t worry if you don’t have PHP installed on your machine: we will simply execute our code in an online environment accessible from the browser.

PHP’s preg_ functions are built on the PCRE (Perl Compatible Regular Expressions) library, and JavaScript, like most programming languages, uses a closely related Perl-style syntax. Our goal here is not a comprehensive overview, but a brief tutorial on getting started with regular expressions in PHP.

If you want a quick way to generate code for matching a regular expression, head over to the code generator of regex101.com.

PHP offers global functions to handle regular expressions. Their names start with one of the following prefixes:

  • preg_ for the Perl-compatible PCRE syntax,
  • ereg for the POSIX ERE dialect (deprecated in PHP 5.3 and removed in PHP 7.0),
  • mb_ereg also based on the ERE dialect, with support for multibyte (e.g. Unicode) characters.

We will only deal with the PCRE syntax here.

In PHP, some modifiers are the same as in JavaScript, while others are different:

  • i stands for case-insensitivity just like in JavaScript,
  • m stands for multiline just like in JavaScript: ^ and $ also match at the start and end of each line,
  • u turns on Unicode matching, similar to the u flag in JavaScript; in PHP, the pattern and the subject are treated as UTF-8,
  • s makes the . character match all characters without exception, including newlines,
  • x turns on free spacing mode for easier readability. Free spacing mode ignores whitespace characters between regex tokens.
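
To see a few of these modifiers in action, here is a minimal sketch. The patterns, subject strings, and variable names are made up for illustration and do not come from the PHP manual:

```php
<?php
// Sketch only: all patterns and subject strings below are illustrative assumptions.

// i: case-insensitive matching
$hasHello = preg_match('/hello/i', 'Hello World');   // 1

// m: ^ also matches right after each newline
$lineStarts = preg_match_all('/^\w+/m', "first line\nsecond line", $words);   // 2

// s: without it, the dot does not match a newline; with it, it does
$withoutS = preg_match('/a.b/', "a\nb");    // 0
$withS    = preg_match('/a.b/s', "a\nb");   // 1

// x: whitespace and # comments inside the pattern are ignored
$datePattern = '/
    (\d{4})  # year
    -
    (\d{2})  # month
/x';
$hasDate = preg_match($datePattern, 'Released 2019-03', $date);   // 1

var_dump($hasHello, $lineStarts, $withoutS, $withS, $hasDate);
```

Modifiers are appended after the closing delimiter of the pattern string, and several can be combined, e.g. '/a.b/is'.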

Consult the PCRE section of the PHP manual for the documentation of all PCRE regex functions.

You can execute PHP code online using one of the many sandbox solutions, such as writephponline.com or the JDoodle PHP online editor.

We can use preg_match to return the first match from a string.
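
A minimal sketch of such a call follows. The subject string is an assumption chosen for illustration, the pattern /xy+/ is borrowed from the preg_match_all discussion later in this post, and PREG_OFFSET_CAPTURE is the flag that makes preg_match report offsets alongside the matched text:

```php
<?php
// Assumed sample subject; not taken from the original post.
$subject = 'abxyyxyyy';

// The third argument is filled by reference with the match data.
// PREG_OFFSET_CAPTURE adds the offset of each match to the result.
$found = preg_match('/xy+/', $subject, $result, PREG_OFFSET_CAPTURE);

echo $found, "\n";                 // 1 (a match was found)
echo json_encode($result), "\n";   // [["xyy",2]]
```

Only the first match is reported, even though the subject contains a second one further to the right.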

When preg_match is called with the PREG_OFFSET_CAPTURE flag, the result is an array containing one element describing the match. The match descriptor is itself an array, where element 0 contains the matched sequence, and element 1 contains the index of the first character of the match in the subject string.

preg_match_all returns all matches, so it is similar to the g flag in JavaScript. Remember, PHP does not have the g modifier, so global matching is selected through the public interface of the PCRE wrapper, by calling preg_match_all instead of preg_match.
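
As a rough sketch, with an assumed subject string of our own choosing and the pattern /xy+/:

```php
<?php
// Assumed sample subject; not taken from the original post.
$subject = 'abxyyxyyyxy';

// preg_match_all returns the number of full matches
// and fills $matches by reference.
$count = preg_match_all('/xy+/', $subject, $matches);

echo $count, "\n";                  // 3
echo json_encode($matches), "\n";   // [["xyy","xyyy","xy"]]
```

With the default ordering (PREG_PATTERN_ORDER), $matches[0] lists the full matches; any capture groups would show up as $matches[1], $matches[2], and so on.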

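The result contains all matches: by default, element 0 of the result array is the list of all substrings matched by the full pattern.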

Note that the regular expression /xy+/ does not contain any parentheses. Parentheses form capture groups, which tell the regex engine to capture whatever the regular expression fragment inside the parentheses matches.

For instance, suppose we would like to validate a string of the form "Price: €19.00", and beyond returning the full match, we are interested in the currency symbol (€), the number value of the price (19.00), and the price string (€19.00). In order to return these values, we need to wrap the corresponding parts of the pattern in parentheses.
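
One possible pattern for this is sketched below. The exact regular expression is an assumption for illustration (for instance, it only accepts the € symbol and exactly two decimal places), and the file is assumed to be saved as UTF-8:

```php
<?php
$subject = 'Price: €19.00';

// Group 1: price string, group 2: currency symbol, group 3: numeric value.
// The u modifier treats the pattern and the subject as UTF-8.
$pattern = '/^Price: ((€)(\d+\.\d{2}))$/u';

$isValid = preg_match($pattern, $subject, $result);

echo $isValid, "\n";   // 1
echo json_encode($result, JSON_UNESCAPED_UNICODE), "\n";
// ["Price: €19.00","€19.00","€","19.00"]
```

The outer pair of parentheses opens first, so the price string becomes group 1, followed by the currency symbol and the numeric value.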

When you run such a match, e.g. in the PHP Sandbox, you can see that the three captured strings are returned as elements 1, 2, and 3 of the result array, after the full match at index 0. The order of these capture groups depends on the position of their corresponding opening parentheses in the regular expression, from left to right.

Although the preg_ functions act as a wrapper for Perl 5 style regular expressions, there are some small differences from the Perl syntax. Consult the php.net documentation on PCRE pattern differences for more details.