Parsing a string for dates in PHP

Something like the following might do it:

$months = array(
                    "01" => "January", 
                    "02" => "February", 
                    "03" => "March", 
                    "04" => "April", 
                    "05" => "May", 
                    "06" => "June", 
                    "07" => "July", 
                    "08" => "August", 
                    "09" => "September", 
                    "10" => "October", 
                    "11" => "November", 
                    "12" => "December"
                );

$weekDays = array(
                    "01" => "Monday", 
                    "02" => "Tuesday", 
                    "03" => "Wednesday", 
                    "04" => "Thursday", 
                    "05" => "Friday", 
                    "06" => "Saturday", 
                    "07" => "Sunday"
                );

foreach($months as $value){
    // stripos() is case-insensitive; compare with false because a match at position 0 is falsy
    if(stripos($string, $value) !== false){
        // extract and assign as you like...
    }
}

You would probably add another loop to check for weekdays or other formats, or just nest the checks.
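As a rough sketch of that idea, the month and weekday checks can share one helper. The function name `findNames` and the sample text are made up for illustration; it uses a word-boundary regex instead of a plain substring search so that "May" does not match inside "maybe".

```php
<?php
// Hypothetical helper: returns every entry of $names that occurs in $text
// as a whole word, compared case-insensitively. Keys are preserved.
function findNames(string $text, array $names): array
{
    $found = [];
    foreach ($names as $key => $name) {
        // \b anchors the match to word boundaries; the i flag ignores case.
        $pattern = '/\b' . preg_quote($name, '/') . '\b/i';
        if (preg_match($pattern, $text) === 1) {
            $found[$key] = $name;
        }
    }
    return $found;
}

// Small sample lists; in practice pass the full $months and $weekDays arrays.
$sampleMonths   = ["05" => "May", "06" => "June"];
$sampleWeekDays = ["05" => "Friday"];
$text = "Maybe we meet on friday, 17th June";

$monthHits   = findNames($text, $sampleMonths);   // ["06" => "June"]
$weekDayHits = findNames($text, $sampleWeekDays); // ["05" => "Friday"]
```

This only tells you which names are present; pulling out the surrounding day number or time is still up to you.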


Inspired by Juan Cortes's broken link, which was based on Dolph's algorithm, I went ahead and wrote it up myself. Note that I decided to just return on the first successful match.

<?php
function extractDatetime($string) {
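    // The whole string already parses as a date/time
    if(strtotime($string) !== false) return $string;
    // Drop filler words that strtotime() does not understand, then retry
    $string = str_replace(array(" at ", " on ", " the "), " ", $string);
    if(strtotime($string) !== false) return $string;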

    $list = explode(" ", $string);
    $first_length = count($list);
    for($j=0; $j < $first_length; $j++) {
        $original_length = count($list);
        for($i=0; $i < $original_length; $i++) {
            $temp_list = array_slice($list, $i); // drop the first $i words
            //echo "<code>".implode(" ", $temp_list)."</code><br/>"; // for visualizing the tests, if you want to see it
            $candidate = implode(" ", $temp_list);
            if(strtotime($candidate) !== false) return $candidate;
        }
        array_pop($list);
    }

    return false;
}

Inputs

$array = array(
        "Gadzooks, is it 17th June already",
        "I’m going to play croquet next Friday",
        "Where was the Rav Four yesterday at 6 PM?",
        "Where was Steve on Monday at 7am?"
);

foreach($array as $a) echo "$a => ".extractDatetime(str_replace("?", "", $a))."<hr/>";

Outputs (with the debugging echo uncommented)

Gadzooks, is it 17th June already
is it 17th June already
it 17th June already
17th June already
June already
already
Gadzooks, is it 17th June
is it 17th June
it 17th June
17th June
Gadzooks, is it 17th June already => 17th June
-----
I’m going to play croquet next Friday
going to play croquet next Friday
to play croquet next Friday
play croquet next Friday
croquet next Friday
next Friday
I’m going to play croquet next Friday => next Friday
-----
Where was Rav Four yesterday 6 PM
was Rav Four yesterday 6 PM
Rav Four yesterday 6 PM
Four yesterday 6 PM
yesterday 6 PM
Where was the Rav Four yesterday at 6 PM? => yesterday 6 PM
-----
Where was Steve Monday 7am
was Steve Monday 7am
Steve Monday 7am
Monday 7am
Where was Steve on Monday at 7am? => Monday 7am
-----

I would do it this way:

First check if the entire string is a valid date with strtotime(). If so, you're done.

If not, determine how many words are in your string (split on whitespace for example). Let this number be n.

Loop over every sequence of n-1 consecutive words and use strtotime() to see if the phrase is a valid date. If so, you've found the longest valid date string within your original string.

If not, loop over every sequence of n-2 consecutive words and use strtotime() to see if the phrase is a valid date. If so, you've found the longest valid date string within your original string.

...and so on until you've found a valid date string or searched every individual word. By finding the longest match, you'll get the most specific date, since a longer phrase carries more information. Since you're dealing with tweets, your strings will never be huge.
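The steps above could be sketched roughly as follows. The function name `longestDatePhrase` and the optional `$isDate` callback are my own additions for illustration; the callback only exists so the validity check can be swapped out, and by default it accepts whatever strtotime() can parse.

```php
<?php
// Hypothetical helper: returns the longest run of consecutive words that
// $isDate accepts, or false if no run is accepted.
function longestDatePhrase(string $text, ?callable $isDate = null)
{
    // Assumed default check: anything strtotime() can parse counts as a date.
    $isDate = $isDate ?? function (string $phrase): bool {
        return strtotime($phrase) !== false;
    };

    // Split on runs of whitespace, dropping empty pieces.
    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $n = count($words);

    // Try lengths n, n-1, ..., 1 so the first hit is the longest phrase.
    for ($len = $n; $len >= 1; $len--) {
        // Slide a window of $len words across the text.
        for ($start = 0; $start + $len <= $n; $start++) {
            $phrase = implode(' ', array_slice($words, $start, $len));
            if ($isDate($phrase)) {
                return $phrase;
            }
        }
    }

    return false; // no word sequence was accepted
}

// Example call using the default strtotime() check.
$result = longestDatePhrase("I'm going to play croquet next Friday");
```

What comes back depends entirely on which phrases strtotime() accepts, so it is worth stripping punctuation and filler words such as "at" or "on" before calling it.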


If you have the horsepower, you could try the following algorithm. I'm showing an example, and leaving the tedious work up to you :)

//Attempt to perform strtotime() on each contiguous subset of words...

//1st iteration
strtotime("Gadzooks, is it 17th June already")
strtotime("is it 17th June already")
strtotime("it 17th June already")
strtotime("17th June already")
strtotime("June already")
strtotime("already")

//2nd iteration
strtotime("Gadzooks, is it 17th June")
strtotime("is it 17th June")
strtotime("it 17th June")
strtotime("17th June") //date!
strtotime("June") //date!

//3rd iteration
strtotime("Gadzooks, is it 17th")
strtotime("is it 17th")
strtotime("it 17th")
strtotime("17th") //date!

//4th iteration
strtotime("Gadzooks, is it")
//etc

And we can assume that strtotime("17th June") is more accurate than strtotime("17th") simply because it contains more words. For example, "next Friday" will always be more accurate than "Friday".
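To make the "more words wins" rule concrete, here is a small sketch that collects every contiguous phrase accepted by a validity check and ranks the hits by word count. The name `rankDatePhrases` and the `$isDate` callback are assumptions of mine, not an existing API; the default check is simply strtotime().

```php
<?php
// Hypothetical helper: lists every run of consecutive words that $isDate
// accepts, ordered with the phrases containing the most words first.
function rankDatePhrases(string $text, ?callable $isDate = null): array
{
    // Assumed default check: anything strtotime() can parse counts as a date.
    $isDate = $isDate ?? function (string $phrase): bool {
        return strtotime($phrase) !== false;
    };

    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $n = count($words);
    $matches = [];

    // Enumerate every contiguous window of words.
    for ($start = 0; $start < $n; $start++) {
        for ($len = 1; $start + $len <= $n; $len++) {
            $phrase = implode(' ', array_slice($words, $start, $len));
            if ($isDate($phrase)) {
                $matches[] = $phrase;
            }
        }
    }

    // Phrases are joined with single spaces, so spaces + 1 = word count.
    // Ties are broken alphabetically to keep the order deterministic.
    usort($matches, function (string $a, string $b): int {
        $byWords = substr_count($b, ' ') <=> substr_count($a, ' ');
        return $byWords !== 0 ? $byWords : strcmp($a, $b);
    });

    return $matches;
}
```

The first element of the returned array is the candidate to prefer; the rest can be kept as fallbacks. This tries every window, which is quadratic in the number of words, so it only makes sense for short strings.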