Separate street name from street number

Generally speaking, addresses are rarely this clean. Especially if this data is coming straight from users, you have to consider that not everyone has such a standard address. There are PO boxes, rural routes, 31 1/2s, suites, and tons of variations on street types (Road, Street, Circle, Court, etc., plus all their abbreviations). Add spaces in street names and hyphens in house numbers, and the complexity of addresses is very easy to underestimate. Mix in the potential for non-US addresses and the complexity grows considerably.

This giant function tries to make sense of all that (at least as far as the US Post is concerned): http://codepad.org/pkTdUDL6 I had this function kicking around, so it may need tweaking or elaboration. If nothing else, it should give you an idea of the task one is faced with when trying to make user address data sane.

This also makes it tempting to split the house number, street name, and street type into separate fields. If the accuracy of parsing addresses is critical to your system design, you might want to consider it; real estate systems for example would need to have this level of granularity for this data. If your use case does not critically rely on the ability to accurately parse this data, then I would not suggest presenting a user with all those extra fields. Just take their address as they give it, try to clean it up, and anticipate some inconsistencies in the rest of your system's design.
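As a rough illustration of the "take it as given and clean it up" approach, here is a minimal sketch of a light normalization step. The function name normalizeAddressLine is hypothetical and not part of the linked function. It only tidies whitespace and makes no attempt to parse the address:

```php
<?php
// Hypothetical helper (an assumption for illustration): light clean-up of a
// free-form address line before storing it. It does not parse anything.
function normalizeAddressLine(string $raw): string
{
    // Collapse runs of whitespace (spaces, tabs, newlines) into one space.
    // Fall back to the raw input if preg_replace reports an error (null).
    $collapsed = preg_replace('/\s+/', ' ', $raw) ?? $raw;

    // Strip leading and trailing whitespace.
    return trim($collapsed);
}

echo normalizeAddressLine("  Bloomfield   Avenue\t68 "), "\n"; // prints "Bloomfield Avenue 68"
```

Anything beyond this (abbreviation expansion, unit numbers, and so on) is where the inconsistencies mentioned above start to appear, so the rest of the system should tolerate them.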


I would suggest treating the first digit as the start of the number. Thus, you would use

preg_match('/^([^\d]*[^\d\s]) *(\d.*)$/', $address, $match)

Examples:

'Bubbletown 145' => 'Bubbletown', '145'
'Circlet56a' => 'Circlet', '56a'
'Bloomfield Avenue 68' => 'Bloomfield Avenue', '68'
'Quibbit Ave       999a' => 'Quibbit Ave', '999a'
'Singletown551abc' => 'Singletown', '551abc'

It is probably best to decide how you want edge cases to be handled, then write unit tests for your own regex function.
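A table-driven check along these lines is one way to do that. This is only a sketch: the wrapper name splitStreetAndNumber and the two extra edge cases (no digit at all, number first) are assumptions added for illustration, and returning null for them is just one possible policy:

```php
<?php
// Hypothetical wrapper around the pattern shown above. Returns
// [street, number], or null when the pattern does not match.
function splitStreetAndNumber(string $address): ?array
{
    if (preg_match('/^([^\d]*[^\d\s]) *(\d.*)$/', $address, $match)) {
        return [$match[1], $match[2]];
    }
    return null;
}

// Expected results, keyed by input string.
$cases = [
    'Bubbletown 145'          => ['Bubbletown', '145'],
    'Circlet56a'              => ['Circlet', '56a'],
    'Bloomfield Avenue 68'    => ['Bloomfield Avenue', '68'],
    'Quibbit Ave       999a'  => ['Quibbit Ave', '999a'],
    'No Number Lane'          => null, // edge case: no digit at all
    '12 Main Street'          => null, // edge case: number comes first
];

$failures = 0;
foreach ($cases as $input => $expected) {
    $actual = splitStreetAndNumber($input);
    if ($actual !== $expected) {
        $failures++;
        echo "FAIL: '$input' gave ", var_export($actual, true), "\n";
    }
}
echo $failures === 0 ? "All cases passed\n" : "$failures case(s) failed\n";
```

If you decide that number-first addresses should be split rather than rejected, change the expected value in the table first and then adjust the pattern until the check passes.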


Try this and see if it works for you:

$subjects = array( "street 12", "street12", "street 12a", "street12a" );
foreach( $subjects as $subject )
{
    // \D+? is lazy, so whitespace before the number is not captured with the
    // name, and (\d.*) requires the number part to start with a digit.
    if ( preg_match('/^(\D+?)\s*(\d.*)$/', $subject, $result) )
    {
       var_dump( $result );
    }
}

The only part you need is this:

// Find a match and store it in $result.
if ( preg_match('/^(\D+?)\s*(\d.*)$/', $subject, $result) )
{
    // $result[1] will have the street name
    $streetName = $result[1];
    // and $result[2] is the number part.
    $streetNumber = $result[2];
}