How to use preg_split() in PHP?

PHP's str_word_count may be a better choice here.

str_word_count($string, 2) returns an array of all words in the string, including duplicates, keyed by each word's position in the string (format 1 returns a plain list instead).
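As a rough sketch of the difference between the two formats (the sample sentence is made up for illustration), something like this should do. Keep in mind that str_word_count() only treats runs of letters (plus ' and -) as words, so digit strings are not counted.

```php
<?php
// Hypothetical sample sentence, chosen only for illustration.
$string = 'is is, or is it';

// Format 1: a plain, zero-indexed list of the words found.
$words = str_word_count($string, 1);

// Format 2: the same words, keyed by their offset in $string.
$positions = str_word_count($string, 2);

print_r($words);
print_r($positions);
```

Format 1 is usually what you want if you only care about the words themselves.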


"preg" stands for "PCRE REGexp", which is kind of redundant, since "PCRE" already means "Perl Compatible Regular Expressions".

Regexps are a nightmare to the beginner. I still don’t fully understand them and I’ve been working with them for years.

Basically, the example you have there breaks down as follows:

"/[\s,]+/"

/ = delimiter marking the start and end of the pattern
[ ... ] = character class: matches any one of the characters listed inside
+ = one or more of the preceding character or group
\s = any whitespace character (space, tab, newline, etc.)
, = the literal comma character

So you have a search pattern that says: "split on any run of one or more whitespace characters and/or commas".

Other common characters are:

. = any single character (except a newline, by default)
* = zero or more of the preceding character or group
^ (at start of pattern) = the start of the string
$ (at end of pattern) = the end of the string
^ (first character inside [...]) = "NOT": any character except the ones listed
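To see those characters in action, here is a small sketch using preg_match(), which returns 1 on a match and 0 otherwise. The patterns and subject strings are made up for illustration only.

```php
<?php
// . matches exactly one character, here the "x".
$dot = preg_match('/i.s/', 'ixs');

// b* allows zero "b" characters, so "ac" still matches.
$star = preg_match('/^ab*c$/', 'ac');

// ^ and $ force the whole string to be "is", so "this" fails.
$anchored = preg_match('/^is$/', 'this');

// [^,] is any character except a comma.
$negated = preg_match('/^[^,]+$/', 'no commas here');

var_dump($dot, $star, $anchored, $negated);
```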

For PHP there is good information in the official documentation.


Documentation says:

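"The preg_split() function operates exactly like split(), except that regular expressions are accepted as input parameters for pattern." Note that split() was deprecated in PHP 5.3 and removed in PHP 7.0; it took POSIX regular expressions, whereas preg_split() takes PCRE patterns.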

Consider the following code:

<?php

// Numbers separated by a mix of spaces and commas.
$ip = "123 ,456 ,789 ,000";

// Split on every run of whitespace and/or commas.
$iparr = preg_split('/[\s,]+/', $ip);

// Print each piece on its own line.
foreach ($iparr as $part) {
    print "$part <br />";
}

?>

This produces the following result:

123
456
789
000 

So, if you have the subject "is is" and you want array ( 0 => 'is', 1 => 'is', ),

you can narrow your regex to "/[\s]+/" (equivalent to "/\s+/").

If the subject can also contain commas, as in "is ,is", keep the regex you already have: "/[\s,]+/".
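A quick sketch comparing the two patterns on both subjects (the variable names are mine, just for illustration):

```php
<?php
$spacesOnly     = '/\s+/';     // whitespace only
$spacesOrCommas = '/[\s,]+/';  // whitespace and/or commas

// Plain "is is": splitting on whitespace alone is enough.
$a = preg_split($spacesOnly, 'is is');

// "is ,is": the whitespace-only pattern leaves the comma stuck to the second word.
$b = preg_split($spacesOnly, 'is ,is');

// "is ,is": adding the comma to the character class removes it.
$c = preg_split($spacesOrCommas, 'is ,is');

print_r($a);
print_r($b);
print_r($c);
```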


This should work:

$words = preg_split("/(?<=\w)\b\s*[!?.]*/", 'is is.', -1, PREG_SPLIT_NO_EMPTY);

echo '<pre>';
print_r($words);
echo '</pre>';

The output would be:

Array
(
    [0] => is
    [1] => is
)

Before I explain the regex, a word on PREG_SPLIT_NO_EMPTY. With that flag, preg_split() returns only the non-empty pieces. This ensures that every element of $words actually contains data; empty strings can otherwise show up when the subject starts or ends with a separator, or when the pattern can match an empty string.
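To make the effect of the flag visible, here is a minimal sketch with the simpler "/[\s,]+/" pattern and a made-up subject that starts and ends with separators:

```php
<?php
$pattern = '/[\s,]+/';
$subject = ',is is, ';  // hypothetical input with leading and trailing separators

// Default behaviour: the empty strings before the first and after the last separator are kept.
$withEmpty = preg_split($pattern, $subject);

// With the flag: only the non-empty pieces are returned.
$withoutEmpty = preg_split($pattern, $subject, -1, PREG_SPLIT_NO_EMPTY);

var_dump($withEmpty);
var_dump($withoutEmpty);
```

The -1 in the third argument just means "no limit"; it has to be passed so the flag can go in the fourth position.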

And the explanation of that regex can be broken down like this using this tool:

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  (?<=                     look behind to see if there is:
--------------------------------------------------------------------------------
    \w                       word characters (a-z, A-Z, 0-9, _)
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  [!?.]*                   any character of: '!', '?', '.' (0 or more
                           times (matching the most amount possible))

A nicer explanation can be found by entering the full regex pattern /(?<=\w)\b\s*[!?.]*/ in this other tool:

  • (?<=\w) Positive Lookbehind - Assert that the regex below can be matched
  • \w match any word character [a-zA-Z0-9_]
  • \b assert position at a word boundary (^\w|\w$|\W\w|\w\W)
  • \s* match any white space character [\r\n\t\f ]
  • Quantifier: Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
  • [!?.]* match a single character in the list !?. literally, between zero and unlimited times [greedy]

That last regex explanation can be boiled down by a human—also known as me—as the following:

Split at every word boundary that directly follows a word character, and swallow any whitespace and any of the punctuation marks !, ? and . that come right after it.
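One thing worth checking before relying on this pattern: it consumes whitespace first and punctuation second, so punctuation followed by a space leaves that space attached to the next word. The sketch below shows this on a made-up subject and, as an alternative that is my own assumption rather than part of the pattern above, a plain character class that treats whitespace and punctuation alike.

```php
<?php
$subject = 'is is. is';  // hypothetical input: punctuation followed by a space

// The lookbehind pattern from above: the third piece keeps its leading space.
$lookbehind = preg_split('/(?<=\w)\b\s*[!?.]*/', $subject, -1, PREG_SPLIT_NO_EMPTY);

// Alternative sketch: split on any run of whitespace or !?. characters.
$charClass = preg_split('/[\s!?.]+/', $subject, -1, PREG_SPLIT_NO_EMPTY);

var_dump($lookbehind);
var_dump($charClass);
```

If your input always looks like 'is is.' the original pattern is fine; the character class is just less sensitive to the order of spaces and punctuation.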
